
Why tools alone don’t fix bad product data

It’s still true that many digital merchants invest in shiny new tech because, deep down, they want the problem to be a question of technology. Whether it’s a new PIM, a data quality module, a supplier portal, or an AI-powered enrichment pilot, they’re all seeking that magic ‘something’ which will turn their messy product information into clean, channel-ready content. After all, the demos are real-life proof! The capabilities exist in abundance.

Yet the same symptoms endure after these tools go live. Merchants are still living with broken filters, rejected marketplace feeds, and inconsistent specs, while their teams stealthily return to those comforting, if unwieldy, spreadsheets. “Better the devil you know…” But the root of the problem isn’t failure on the part of the tech vendor; it’s a category error to assume so. These tools store and distribute product data; what they can’t do is create that data or make it true.

With that mindset, a PIM becomes a ‘better’ container. Best in class. But containers alone can’t clean what goes into them. This article explains some of the fallacies around adopting new tools as a panacea for all your product data problems, and outlines how you can address the real issues at source: human behaviour, and a lack of robust data governance.

An entirely predictable post-implementation shock: visibility without resolution

Centralisation reduces one type of inconsistency: that of different teams using different versions. What it doesn’t reduce is the inconsistency embedded within the record itself. Things like missing attributes, conflicting values, non-standard units, duplicate SKUs, mismatched identifiers, or category drift. What changes after implementing sophisticated tech tools isn’t the quality, but the surface area of the problem. The business feels worse because:

  • errors are now syndicated faster to more channels
  • validation rules block publication and create bottlenecks in workflow
  • exceptions become measurable, visible, and ‘politically contentious’
  • any ‘single source of truth’ claims are killed under the friendly fire of cross-system disagreement

New system adoption weakens under these stress factors. A structured system makes the cost of ambiguity explicit. Users don’t abandon a PIM solution because they dislike governance; they abandon it because governance was never even agreed upon, but the new tool is what forces them to confront this fact ‘mid-flight.’

Tools enforce structure, not truth

Most ‘bad’ data isn’t just incomplete. It’s fundamentally flawed:

  • Incorrect dimensions
  • Out-of-date compliance flags
  • Misleading pack sizes
  • Contradictory descriptions in different channels
  • Images that don’t match variants
  • Pricing that doesn’t reconcile
  • Vague guesses at material specs
  • …

Of course, validation can check formats – whether a field is populated, whether a value matches a pattern, or whether an identifier is present. What it can’t do is verify reality.
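
To make the distinction concrete, here’s a minimal sketch of the kind of format rule a validation layer runs. The field names and rules are hypothetical, not taken from any particular PIM: the point is that a factually wrong value sails straight through, as long as it is well-formed.

```python
import re

# Hypothetical format rules: presence, pattern, and type checks --
# exactly the kind of validation a PIM can run automatically.
RULES = {
    "gtin": lambda v: bool(re.fullmatch(r"\d{13}", str(v or ""))),
    "colour": lambda v: bool(v),
    "width_cm": lambda v: isinstance(v, (int, float)) and v > 0,
}

def format_valid(record):
    """Return the fields that fail format validation."""
    return [field for field, check in RULES.items() if not check(record.get(field))]

# A supplier record with the WRONG width: every format check passes,
# because 250 is a perfectly well-formed positive number.
record = {"gtin": "5012345678900", "colour": "navy", "width_cm": 250}
print(format_valid(record))  # -> [] : format-valid, yet factually wrong
```

The checks catch a missing colour or a malformed identifier, but no amount of pattern-matching can know that the real product is 25 cm wide, not 250.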

If a supplier provides the wrong dimensions, the system will store and publish them, flawlessly. That’s its job! However, if one team calls a colour “navy” and another calls it “dark blue,” the tool simply preserves that discrepancy unless you decide on a standardised vocabulary. Tools are obedient. They want us to be happy. They won’t (at least, for the time being!) arbitrate disputes you yourselves haven’t settled.
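
A controlled vocabulary is precisely that kind of decision: the organisation has to settle it first, and only then does enforcement become trivial. A minimal sketch, with an illustrative synonym map (the canonical terms are the business decision; the code merely applies it):

```python
# Illustrative synonym map: choosing the canonical terms is a
# governance decision; the code only enforces that choice.
COLOUR_VOCAB = {
    "navy": "navy",
    "dark blue": "navy",
    "navy blue": "navy",
    "ivory": "ivory",
    "off-white": "ivory",
}

def normalise_colour(value):
    """Map a free-text colour to its canonical term, or flag it for review."""
    key = value.strip().lower()
    if key not in COLOUR_VOCAB:
        # The tool cannot invent a decision: unknown terms go to a data steward.
        raise ValueError(f"'{value}' is not in the controlled vocabulary")
    return COLOUR_VOCAB[key]

print(normalise_colour("Dark Blue"))  # -> navy
```

Note where the hard work sits: not in the five lines of Python, but in agreeing that ‘dark blue’ and ‘navy’ are the same thing, and who gets to say so.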

Automation accelerates whatever process it touches

Don’t confuse automation with cleansing. Automation is a throughput efficiency enabler. Applied to a well-defined process, it gives you the capacity to scale. If you apply it to a messy existing process, you’ll just get the same mess…at scale.

AI intensifies this phenomenon. AI-generated descriptions, attribute suggestions, deduplication heuristics, and so on, are all marvellous capabilities when your underlying schema is stable and the source attributes are reliable. Without that stability and reliability, AI can easily become an amplification trap: fluent content that is confidently wrong, then multiplied across thousands of SKUs and pushed into feeds, search, advertising, and customer support. Accelerated inaccuracy ends up being much worse, operationally, than slow and incomplete. All it does is create more downstream rework, a higher returns rate, and incalculable damage to a brand’s reputation.

Where the real gap lies: Governance, and the decisions a tool cannot make

When product data programmes underperform, the general pattern isn’t that “we lack features.” Far more often, it’s “we lack decisions.” That’s where governance steps up: the set of choices and accountabilities which define what ‘good’ means, who is responsible for maintaining that standard, and how everyone commits to protocols like the following:

  • Who owns attribute definitions by domain (commercial, compliance, logistics, content)?
  • What is the taxonomy, and what attribute requirements attach to each node?
  • What are the controlled vocabularies (colour, material, finish, fit, sizing)?
  • Which is the master system when records conflict (ERP vs PIM vs marketplace vs supplier file)?
  • What is the approval path for changes affecting legal exposure or returns?
  • What is the supplier standard, and what happens when submissions don’t comply?
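
Take the master-system question as an example. Once the business has decided which source wins for each attribute domain, encoding that decision is almost trivial. A minimal sketch, with hypothetical system and attribute names:

```python
# Illustrative survivorship rule: when systems disagree, a governance
# decision (not the tool) says which source wins per attribute.
MASTER = {"price": "erp", "stock": "erp", "description": "pim", "colour": "pim"}

def resolve(attr, candidates):
    """Pick the value from the agreed master system for this attribute."""
    master = MASTER.get(attr)
    if master is None or master not in candidates:
        # No fallback guess: an unagreed attribute is a decision gap, not a bug.
        raise KeyError(f"No master system agreed for '{attr}'")
    return candidates[master]

print(resolve("price", {"erp": 19.99, "pim": 21.50}))  # -> 19.99
```

The code is deliberately dumb. If `MASTER` has a gap, it refuses to guess, because that gap is exactly the unanswered governance question the PIM cannot answer for you.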

A PIM is excellent at hosting workflows and validation, but it cannot answer the questions above. What’s more, if you leave them unanswered, users have to ‘make do and mend’ – they’ll find ways to route around the system so as to hit their time-to-market targets. Wouldn’t we all, when shipping lots of products is rewarded, while cleaning data is treated as an optional (and, when done, exasperating) overhead?

Bad product data: the behaviour problem

Even with governance in place, on paper, data quality ultimately depends on how people behave under pressure. Think of common scenarios: buyers onboarding ranges late, content teams standardising terms across thousands of items, or suppliers providing exports convenient to themselves rather than the required values. If, in these not uncommon circumstances, the organisation tolerates a ‘fix it later, get it out’ mindset, the tool simply becomes a more expensive place to postpone decisions.

The anecdotes stack up, and the operational tell-tale signs are obvious:

  • PIM completeness dashboards show persistent gaps, but…nothing changes
  • Workflow steps “pause” for manual clarification…then get bypassed
  • Data stewards become a bottleneck because…ownership is unclear
  • Teams maintain ‘shadow’ spreadsheets… “just to get things live.”
  • Marketplace failures frequently spike during peak trading because “standards bend.”

These aren’t training problems. They’re more like incentive problems. It’s not the tool that’s failing, but more a reflection of how the business actually prioritises work.

Why forced adoption is destined to fail

When the C-suite responds with mandates like “everyone has to use the PIM,” adoption rarely improves sustainably. It simply relocates the friction. People will comply on the surface while moving the real work elsewhere, because the underlying mismatch is still present: the business is demanding that the tool and its users somehow ‘compensate’ for the missing upstream production of clean data.

So, where does a realistic improvement sequence start? First things first. You need to carry out an honest data assessment to get to the bottom of some possibly uncomfortable questions:

  • What product data do you actually have?
  • Where is it (and how many versions are lurking around)?
  • What’s missing?
  • What’s contradictory?
  • What standards are undefined?
  • Where do the costs lie?
  • Who currently absorbs this problem (…as well as the costs)?

Only when you’ve done this can you concentrate on ensuring things like configuration, validation, automation, and supplier onboarding operate as multipliers rather than accelerants for permanent fire-fighting.

The structural mismatch behind permanent drag

Half-hearted adoption after a technically successful PIM go-live is usually driven by the following mismatch: the organisation has implemented a system designed to govern and scale product data, but it hasn’t implemented an operating model that produces and sustains that data. In other words, the tool works on the basis that decisions, standards, and ownership exist, while the business assumes that it’s the tool which will create them. That gap is where bad data survives, and where operational drag can very easily become a chronic condition.

Data assessment

If your PIM is live but quality hasn’t moved, get in touch with us today at Start with Data. We’d be happy to use our experience and expertise in partnering with you to carry out a thorough data assessment: let’s quantify the gaps, pinpoint their sources, and work to develop the decisions, clarity, and ownership that your tool, however good it is, just can’t supply.