Product Data Cleansing

Rigorously cleansed data ensures that you use high-quality product data when you syndicate your product information: clean data means accurate and complete data.

The importance of product data cleansing

Using a single ‘golden’ record version of data is clearly a prerequisite for excellent product information management. But that single record must consist of clean data points; otherwise it is, at best, useless and, at worst, actively harmful. That is why product data cleansing is a key part of any PIM project: if you start with clean data, it is useful and adds value to your product information. At Start with Data, we give product data cleansing the significance it deserves, and our dedicated professionals are equipped to deliver clean product data and leave your information fit for use.


Talk to us about Product Data Cleansing

If you are a retailer, brand, manufacturer or distributor interested in cleaning or enhancing your product data, we would love to help you.

What is product data cleansing?

Data cleansing (sometimes referred to as data cleaning or data scrubbing) is the process of detecting and correcting corrupt or inaccurate data points in a data set, table, or database. Such data points are replaced, modified, merged, or deleted.

When it comes to product data cleansing, the key focus is on detecting and fixing irrelevant, useless, and inaccurate values, including:

  • Duplicate values
  • Typos
  • Inappropriate or non-compliant data formats
  • Missing, inaccurate, or inconsistent values, which need to be populated or reconciled
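To make one of these issues concrete, the Python sketch below handles typos by matching raw values against a controlled vocabulary. It assumes a hypothetical list of approved colour values and uses the standard library's difflib to suggest the closest approved value; anything without a close enough match is returned as None so that a person can review it rather than the script guessing.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical controlled vocabulary of approved colour values.
APPROVED_COLOURS = ["Black", "White", "Navy", "Charcoal", "Burgundy"]

def correct_typo(value: str, vocabulary: list, cutoff: float = 0.75) -> Optional[str]:
    """Return the closest approved value, or None if nothing is close enough."""
    # Compare case-insensitively so "BLACK" and "black" both map to "Black".
    lookup = {v.lower(): v for v in vocabulary}
    matches = get_close_matches(value.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None

for raw in ["Blakc", "navy ", "Chracoal", "Purple"]:
    print(f"{raw!r} -> {correct_typo(raw, APPROVED_COLOURS)}")
```

The cutoff is an assumed tuning parameter: lower values correct more aggressively but risk mapping a genuinely new value onto the wrong approved one.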

The benefits of Product Data Cleansing

Regardless of language or structure, you can measure the accuracy, completeness, conformity, and uniqueness of your product information, including de-duplication needs for complex data. Content quality can be improved and maintained using rule-based classification, matching, linking, and normalisation. KPI reporting gives users an overview of the status of their product data, flagging any associated errors so that corrective action can be taken.
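As an illustration of this kind of KPI reporting, the sketch below computes three of the measures named above (completeness, uniqueness, and conformity) as simple ratios over a handful of records. The records, field names, required-field list, and SKU pattern are all hypothetical; in practice a PIM would define these rules in its own data model.

```python
import re

# Hypothetical product records; field names are illustrative only.
products = [
    {"sku": "AB-1001", "name": "Desk Lamp", "colour": "Black", "weight_kg": 1.2},
    {"sku": "AB-1002", "name": "Desk Lamp", "colour": None, "weight_kg": 1.2},
    {"sku": "ab1003", "name": "Floor Lamp", "colour": "White", "weight_kg": None},
    {"sku": "AB-1001", "name": "Desk Lamp", "colour": "Black", "weight_kg": 1.2},
]

REQUIRED = ["sku", "name", "colour", "weight_kg"]
SKU_PATTERN = re.compile(r"[A-Z]{2}-\d{4}")  # assumed house format for SKUs

def quality_report(records):
    total = len(records)
    # Completeness: share of required fields that are actually populated.
    filled = sum(1 for r in records for f in REQUIRED if r.get(f) is not None)
    completeness = filled / (total * len(REQUIRED))
    # Uniqueness: distinct SKUs as a share of all records.
    uniqueness = len({r["sku"] for r in records}) / total
    # Conformity: share of SKUs that match the expected format.
    conformity = sum(1 for r in records if SKU_PATTERN.fullmatch(r["sku"])) / total
    return {"completeness": completeness, "uniqueness": uniqueness, "conformity": conformity}

for kpi, score in quality_report(products).items():
    print(f"{kpi}: {score:.0%}")
```

Tracking these ratios over time, rather than as one-off numbers, is what turns them into useful KPIs.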


Three notable benefits of product data cleansing are: 

Speed and ease

Data feeds are easier to manage when they are complete, standardised, and consistent. Less time is needed to tailor a feed to specific channel requirements, and it is easier to search and filter the feed to analyse it, extract insights, and make adjustments.

Visibility, click-through, and conversion

A product feed with clean and correct product data is more likely to be found in searches, and it allows buyers to make informed purchasing decisions.

Customer satisfaction

Buyers depend on accurate and enriched product information to make decisions. When your product data delivers that quality and meets their expectations, satisfaction levels rise among both individual consumers and B2B customers.

As an indication of the importance of clean data, one widely cited estimate puts the annual cost of dirty or substandard data to the US economy at around $3 trillion.

At the level of internal organisational efficiency:

  • Cleansing data silos and data lakes helps remove errors.
  • If customer experience (CX) problems emerge, it is easier to find and correct the error when the product data is already clean.
  • Businesses can plan more precise strategic roadmaps, and managing the requirements of multiple channels becomes easier and faster.

Product Data Cleansing challenges

How much, and how often, product data changes depends on the product's attributes and metadata. Furthermore, provenance does not guarantee cleanliness, especially when data are being ingested from a large number of suppliers and vendors. Finally, if legacy data is not addressed in the initial stages of a PIM or MDM project, the problem of dirty data is simply being kicked down the road.

In general terms, however, the most common problems fall into the following categories:

Duplicate data

Duplicate records can emerge in different ways and at different times: during data migration, in data exchanges through integrations, through the use of third-party connectors, and through manual data entry and batch imports of product data. Failure to deduplicate can lead to:

  • Inefficient workflows and data retrieval
  • Substandard adoption of software due to inaccessible data
  • Lower ROI on CRM and marketing automation
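A common first step in deduplication is to build a normalised match key and group records that share it. The sketch below assumes hypothetical supplier records with `brand` and `name` fields; it only catches duplicates that differ in case, punctuation, or spacing, so near-duplicates with different wording would need fuzzy matching on top. Matches are returned as candidate groups for review rather than merged automatically.

```python
import re
from collections import defaultdict

# Hypothetical records ingested from two suppliers.
records = [
    {"id": 1, "brand": "Acme", "name": "Cordless Drill 18V"},
    {"id": 2, "brand": "ACME ", "name": "cordless drill, 18v"},
    {"id": 3, "brand": "Acme", "name": "Hammer Drill 18V"},
]

def match_key(record):
    """Build a normalised key: lower case, punctuation removed, whitespace collapsed."""
    text = f"{record['brand']} {record['name']}".lower()
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(text.split())

def find_duplicates(records):
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    # Only keys shared by more than one record are duplicate candidates.
    return [ids for ids in groups.values() if len(ids) > 1]

print(find_duplicates(records))  # [[1, 2]]
```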



Outdated data

Legacy data sets may be years old and are often irrelevant, inaccurate, or in incompatible formats, and hence of little use. So why does outdated data persist?

  • Key stakeholders leave or join the business.
  • Rebranding and takeovers.
  • Legacy systems frequently evolve from previous versions, rendering data incompatible.


The present-day digital ecosystem is prone to very frequent changes, so product data must be fresh and up to date before being used for syndication, insights, or decision-making.
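One simple way to enforce freshness before syndication is to compare each record's last-updated date against a maximum age. In the sketch below, the `last_updated` field and the 180-day threshold are assumptions for illustration; the right threshold depends on how quickly a given product category changes.

```python
from datetime import date, timedelta

# Hypothetical threshold: records untouched for more than 180 days need review.
MAX_AGE = timedelta(days=180)

products = [
    {"sku": "AB-1001", "last_updated": date(2024, 5, 2)},
    {"sku": "AB-1002", "last_updated": date(2022, 11, 15)},
]

def split_by_freshness(records, today, max_age=MAX_AGE):
    """Separate records that can be syndicated from those that need review."""
    fresh, stale = [], []
    for record in records:
        if today - record["last_updated"] > max_age:
            stale.append(record)
        else:
            fresh.append(record)
    return fresh, stale

fresh, stale = split_by_freshness(products, today=date(2024, 6, 1))
print("Ready to syndicate:", [p["sku"] for p in fresh])
print("Needs review:", [p["sku"] for p in stale])
```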



Insecure data

Technological advances, the rise of digital shopping as the norm, and geopolitical trends mean that data security and privacy laws continue to evolve, and new ones are emerging. In an increasingly consumer-centric business landscape, insecure (or non-conforming) data could be seen as the riskiest kind of dirty data in terms of potential cost to the company, both financial and reputational.



Incomplete data

Data are defined as incomplete if they lack key fields required to process incoming information before sales and marketing can act on it. For example, if new or existing product data is missing key feature fields, it cannot be included in product catalogs or on eCommerce sites, and revenue opportunities are missed.
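Because different channels need different fields, completeness is often checked per channel. The sketch below assumes a hypothetical mapping of channel names to required fields (real channels publish their own specifications) and reports which fields would block a product from each channel, treating empty strings the same as missing values.

```python
# Hypothetical per-channel requirements; real channels define their own.
CHANNEL_REQUIRED = {
    "webshop": ["sku", "name", "price", "image_url"],
    "marketplace": ["sku", "name", "price", "image_url", "gtin", "brand"],
}

def missing_fields(record, channel):
    """Return the required fields that are absent or empty for the given channel."""
    return [f for f in CHANNEL_REQUIRED[channel] if record.get(f) in (None, "")]

product = {"sku": "AB-1001", "name": "Desk Lamp", "price": 39.99, "image_url": "", "brand": "Acme"}

for channel in CHANNEL_REQUIRED:
    gaps = missing_fields(product, channel)
    print(channel, "OK" if not gaps else f"blocked, missing {gaps}")
```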



Incorrect and inaccurate data

Incorrect data is data stored in the wrong place, for example a numerical value in a text field. Inaccurate data, by contrast, is a field that is populated but with the wrong information, for example the wrong measurements for a product’s dimensions.

This can cause numerous problems, including imprecise targeting and segmentation, irrelevant or non-personalised messaging or problems with storage and display.
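The two problems call for different checks. In the sketch below, a type check catches incorrect data (a non-numeric value in a weight field), while a plausibility range catches some inaccurate data. The `weight_kg` field, the category, and its range are hypothetical, and a range check cannot detect a wrong value that still looks plausible.

```python
# Hypothetical plausible weight ranges per category, in kilograms.
PLAUSIBLE_WEIGHT_KG = {"desk lamp": (0.2, 10.0)}

def check_weight(record):
    value = record.get("weight_kg")
    # Type check: a text value in a numeric field is incorrect data.
    if not isinstance(value, (int, float)) or isinstance(value, bool):
        return "incorrect: weight_kg is not a number"
    # Range check: an implausible number suggests inaccurate data.
    low, high = PLAUSIBLE_WEIGHT_KG[record["category"]]
    if not low <= value <= high:
        return f"possibly inaccurate: {value} kg is outside {low}-{high} kg"
    return "ok"

samples = [
    {"sku": "AB-1001", "category": "desk lamp", "weight_kg": 1.2},
    {"sku": "AB-1002", "category": "desk lamp", "weight_kg": "Black"},
    {"sku": "AB-1003", "category": "desk lamp", "weight_kg": 120.0},
]
for s in samples:
    print(s["sku"], check_weight(s))
```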



Inconsistent data

One common problem at the discovery phase of a PIM implementation project is the existence of multiple versions of the same data elements across different databases. Such data is inconsistent (or non-standardised): although the versions represent the same thing, they are held in different locations and formats.
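Reconciling inconsistent versions usually means converting each one to a single canonical format. As an example, the sketch below assumes that a product width is stored as free text with different units in three hypothetical systems, and normalises each value to millimetres. Values it cannot parse raise an error so they can be routed for manual review.

```python
import re

# Assumed conversion factors to a single canonical unit (millimetres).
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4}

def to_millimetres(raw):
    """Parse strings such as '10 cm', '100mm' or '3.9 in' into a float in mm."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(mm|cm|m|in)\s*", raw.lower())
    if match is None:
        raise ValueError(f"Unrecognised measurement: {raw!r}")
    number, unit = match.groups()
    return float(number) * TO_MM[unit]

# The same width as held in three different (hypothetical) systems.
for source, value in [("ERP", "10 cm"), ("PIM", "100mm"), ("Supplier feed", "0.1 m")]:
    print(source, to_millimetres(value))
```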



Too much data

Some companies simply have an aversion to culling data, resulting in:

  • Slower data exchange rates
  • Inflated record counts
  • Failure to comply with storage limits


Maintaining ‘lean’ databases is key to overall product data hygiene and organisational agility.

The data cleansing cycle

Data cleansing methods

In general terms, the methodology behind data cleansing is linked to the challenges we have seen above. Product data, however, has its own characteristics: attributes vary widely, in many cases there is no fixed syntax, and the standards that do exist are confined to the organisations that sell and provide content.

For effective product data cleaning, with its almost infinite variables, methods increasingly use the following approaches:

Data cleansing strategy - success factors

The clear business imperatives of data cleansing are:

  • to detect and eliminate major inconsistencies and errors, both when working with single product data sources (such as a supplier) and when combining multiple sources (enriching and contextualising data for export).
  • to implement tools which minimise manual inspection and the need for programming, and which streamline the entire process.
  • to deploy the cleansing project in conjunction with a predetermined and robust data governance framework, so that the entire organisation is aware of the protocols and standards to observe when cleansing data.


Data cleansing best practices

1. Develop a Product Data Quality Plan

Firstly, you need to set quality standards and expectations for your product data. Your data quality KPIs emerge from these, so start by answering the following questions:

  • What are the sources of dirty data?
  • What steps can be taken to deal with the problems at source?
  • How will you track the health of your data?
  • What are the KPIs and metrics used to measure cleanliness?
  • How will you maintain high standards of data hygiene on an ongoing basis?


Understanding the root cause of each data health problem is fundamental to preventing the same problems from recurring.

2. Standardise at the point of entry

Before cleansing, incoming data can be checked at the point of entry to ensure it is standardised as it enters a database, which makes duplicates easier to detect. Creating and following a standard operating procedure ensures that your team only allows quality data into your PIM, CRM, ERP, and other systems.
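A point-of-entry gate can reject a record outright instead of leaving it to be cleansed later. The sketch below assumes, for illustration, that incoming records carry a GTIN and a name; it validates the GTIN with the GS1 mod-10 check digit and returns a list of errors, where an empty list means the record may enter the system. The `accept_at_entry` function and its rules are hypothetical.

```python
def is_valid_gtin(code: str) -> bool:
    """Validate a GTIN-8/12/13/14 using the GS1 mod-10 check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    body, check = code[:-1], int(code[-1])
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check

def accept_at_entry(record):
    """Hypothetical entry gate: return a list of errors (empty means accepted)."""
    errors = []
    if not is_valid_gtin(record.get("gtin", "")):
        errors.append("invalid GTIN")
    if not record.get("name", "").strip():
        errors.append("missing name")
    return errors

print(accept_at_entry({"gtin": "4006381333931", "name": "Desk Lamp"}))  # []
print(accept_at_entry({"gtin": "4006381333932", "name": " "}))
```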

3. Validate accuracy

There are tools available for cleaning data, such as list imports, which allow you to carry out real-time validation of the accuracy of your product data.

Once the scope of the business case for the product data management solution has been established, data profiling can begin. It needs to be carried out early and often, with a benchmark established at the initial quality level (before cleansing). This helps the personnel involved to demonstrate objectively the causal impact of poor-quality product data on business value, and to justify the ongoing funding required.


Is data mining the same as data cleansing?

We have already defined data cleansing. But what about data mining? It is a key part of data cleansing but not a substitute for it. Data mining, pure and simple, is a technique used to extract interesting and useful information from data sets. It is typically used for analytics, gathering insights, and informing strategic decision-making.

Data quality mining is a relatively recent approach: it applies the principles of data mining to identify, correct, or eliminate problematic data in large databases, and it is increasingly used as part of data cleansing initiatives.
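Data quality mining covers many techniques; as one simple illustration (a statistical outlier test, not a full mining method), the sketch below flags prices that sit far from the category median using the modified z-score, which is based on the median absolute deviation and so is not distorted by the outlier itself. The prices are hypothetical, and the 3.5 threshold is a commonly used rule of thumb rather than a fixed standard; flagged values should go to review, not be deleted automatically.

```python
from statistics import median

# Hypothetical prices for one category; 4999.0 looks like a keying error for 49.99.
prices = {"AB-1001": 39.99, "AB-1002": 44.50, "AB-1003": 42.00,
          "AB-1004": 4999.0, "AB-1005": 41.25, "AB-1006": 38.75}

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median absolute deviation) exceeds the threshold."""
    data = list(values.values())
    med = median(data)
    mad = median(abs(x - med) for x in data)
    if mad == 0:
        return {}  # no spread to measure against
    return {k: v for k, v in values.items() if 0.6745 * abs(v - med) / mad > threshold}

print(mad_outliers(prices))
```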

Benefits of outsourcing data cleansing and management

Supervising a data cleansing partner service is typically far less expensive than investing in new technology and recruiting experienced data professionals. Outsourcing data cleansing lets the business benefit from extra resources in a low-cost, low-risk way.

At Start with Data, we partner with vendors and providers who offer data cleansing as an integral part of their product information management solutions. Working closely alongside them, our experts can establish a clear business case for data cleansing and advise on the best solution for your particular circumstances.

Get in touch with us to have a conversation about how we can help you to ensure your data is clean, not only now, but long into the future.

Find out more

If you would like to find out more about how product data management, PIM and MDM can create value for your business, we’d love to hear from you – Ben Adams, CEO Start with Data

Case Study

“Start with Data are helping transform product data management, laying scalable technology and data governance foundations”