Customers don’t buy objects; they buy outcomes, and the way we express those outcomes has changed over time. Today they are encoded as data that spans ERPs, supplier feeds, DAMs and marketplaces. If you want AI tools to perform well at retrieval-augmented generation, semantic search or automated enrichment, your product data needs to be arranged in five clear layers, not in one big messy pile.
This article provides a practical blueprint, underpinned by Kotler’s[1] five product levels and updated for Product Information Management (PIM) in an era where AI can add real value to how product data is managed.
1. Core purpose data
This is the “why” behind the SKU: the job to be done, the primary use cases, and the customer context in which the product is chosen. For instance, “portable power for site tools” rather than “18V battery”. Capturing this layer well narrows the search space and reduces the chances of AI hallucination[2].
What ‘good’ looks like
- A single authoritative category (plus controlled sub-uses) per SKU
- Structured intent fields (such as job, use case, application, environment, or buyer role)
- Short, reusable value statements aligned to search intent and customer language
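To make this concrete, here is a minimal sketch (in Python) of how core purpose might be held as structured, controlled fields rather than free text. The field names and controlled vocabularies are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabularies; in practice these live in the PIM's taxonomy.
ALLOWED_USE_CASES = {"portable power for site tools", "overnight depot charging"}
ALLOWED_BUYER_ROLES = {"site manager", "trade buyer", "diy"}

@dataclass
class CorePurpose:
    """Structured 'why' behind a SKU: job to be done, use case, buyer context."""
    sku: str
    category: str                        # single authoritative category
    job_to_be_done: str                  # e.g. "keep cordless tools running on site"
    use_case: str                        # drawn from a controlled list
    buyer_role: str                      # drawn from a controlled list
    value_statements: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of issues; an empty list means the record is usable."""
        issues = []
        if self.use_case not in ALLOWED_USE_CASES:
            issues.append(f"{self.sku}: use_case '{self.use_case}' is not in the controlled list")
        if self.buyer_role not in ALLOWED_BUYER_ROLES:
            issues.append(f"{self.sku}: buyer_role '{self.buyer_role}' is not in the controlled list")
        return issues
```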
2. Foundational product data
This is the foundation that makes the product unambiguous and computable: identifiers, mandatory specs, and a fixed or controlled set of measurement units so that attributes are interpreted consistently, accurately and safely (‘unit-secure’). If this layer is unstable, every downstream AI task inherits that instability.
What ‘good’ looks like
- Clean identifiers (GTIN/MPN/SKU) with comprehensive lineage and change history
- Normalised attributes with allowed values, units and tolerances
- Automated checks for completeness, conformance, duplicates and conflicts
How PIM + AI help
- Applying schema rules and ML validators during supplier onboarding
- Using entity resolution to merge near-duplicates
- Applying unit conversion and attribute mapping automatically, so that the PIM remains the ‘single source of truth’ (see the sketch below)
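As an illustration of the unit conversion and validation ideas above, here is a simplified sketch. The attribute names, conversion factors and tolerances are hypothetical; a real PIM would drive these from its governed schema:

```python
# Normalise supplier-provided measurements to house units and flag issues.
# Attribute names, units and tolerances here are illustrative, not a real schema.

UNIT_FACTORS = {          # target unit -> {source unit: multiplier}
    "mm": {"mm": 1.0, "cm": 10.0, "in": 25.4},
    "kg": {"kg": 1.0, "g": 0.001, "lb": 0.45359237},
}

HOUSE_SCHEMA = {          # attribute -> (target unit, min allowed, max allowed)
    "blade_length": ("mm", 50, 500),
    "net_weight": ("kg", 0.05, 50),
}

def normalise(attribute: str, value: float, unit: str) -> tuple[float, list[str]]:
    """Convert to the house unit and return (converted value, issues)."""
    issues = []
    target_unit, lo, hi = HOUSE_SCHEMA[attribute]
    factor = UNIT_FACTORS[target_unit].get(unit)
    if factor is None:
        return value, [f"{attribute}: unknown unit '{unit}'"]
    converted = value * factor
    if not (lo <= converted <= hi):
        issues.append(f"{attribute}: {converted}{target_unit} outside {lo}-{hi}{target_unit}")
    return converted, issues

# Example: a supplier sends blade length in inches.
value, problems = normalise("blade_length", 12, "in")   # -> 304.8 mm, no issues
```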
3. Expected experience data
These are the attributes, assurances and conditions that customers treat as deal-makers or deal-breakers: performance ranges, compatibility, safety and compliance, warranty and service information, and localised content. This is also where channel requirements live.
What ‘good’ looks like
- Channel-specific completeness profiles (by marketplace / template)
- Compliance metadata per region, with audit trails and expiry alerts
- Localised titles and attributes driven by rules, not by just copy-pasting
How PIM + AI help
- Using rules and validation to meet each channel’s template criteria
- Applying generative AI to draft variant copy from structured attributes, with guard-rails (such as style guides, checks for unauthorised claims, and reference citations)
- Implementing human oversight protocols and approval for higher-risk categories
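To illustrate the channel-template idea above, here is a small sketch of a completeness check against channel-specific profiles. The channel names, required fields and sample record are invented for the example:

```python
# Check a SKU's attribute record against each channel's required-field profile.
# Channel names and required attributes are illustrative only.

CHANNEL_PROFILES = {
    "marketplace_eu": {"title_de", "gtin", "ce_marking", "energy_label"},
    "marketplace_uk": {"title_en_gb", "gtin", "ukca_marking"},
}

def channel_gaps(record: dict, channel: str) -> set[str]:
    """Return required attributes that are missing or empty for the given channel."""
    required = CHANNEL_PROFILES[channel]
    return {attr for attr in required if not record.get(attr)}

sku_record = {"gtin": "05012345678900", "title_en_gb": "9HX Cordless Drill", "ukca_marking": "yes"}
print(channel_gaps(sku_record, "marketplace_uk"))   # empty set -> ready to syndicate
print(channel_gaps(sku_record, "marketplace_eu"))   # the missing EU fields
```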
4. Augmented context data
This is the rich context that differentiates your offer:
- Long-form descriptions
- Feature-to-benefit mappings
- How-to guides
- Q&A
- User reviews
- Usage instructions
- Imagery and video
- Sustainability credentials
- Comprehensive regulatory compliance documentation
- Digital product passport fields
It’s the difference between humdrum copy pasted straight from a supplier spec sheet and a truly compelling product story.
What ‘good’ looks like
- A ‘product knowledge graph’ linking SKUs to use cases, accessories, parts, standards and content assets
- Review summaries with extracted pros/cons as structured fields
- Rich media tagged with attributes (not just filenames), enabling precise retrieval
- SEO metadata (search terms, synonyms, questions) kept as governed fields
How PIM + AI help
- Using AI to tag assets, cluster reviews, and generate benefit-led copy tailored to the right audience on the right channel
- Feeding the knowledge graph into your RAG pipeline so generated descriptions cite the right facts, not guesswork (see the sketch below)
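The ‘cite the right facts’ point can be sketched roughly as follows: approved, structured facts are retrieved for a SKU and placed in the prompt, so the model generates from them rather than guessing. Here, fetch_facts is a stand-in for a real knowledge-graph or PIM query, and the LLM call itself is omitted:

```python
# Sketch: assemble approved facts for a SKU into a grounded prompt for copy generation.

def fetch_facts(sku: str) -> list[dict]:
    """Hypothetical lookup of governed facts (would query the PIM / knowledge graph)."""
    return [
        {"fact": "voltage: 18 V", "source": "spec_sheet_v3"},
        {"fact": "compatible with Bit Set BS-25", "source": "compatibility_matrix"},
        {"fact": "complies with EN 60745", "source": "compliance_register"},
    ]

def build_prompt(sku: str, audience: str) -> str:
    facts = fetch_facts(sku)
    fact_lines = "\n".join(f"- {f['fact']} [source: {f['source']}]" for f in facts)
    return (
        f"Write benefit-led copy for SKU {sku} aimed at a {audience}.\n"
        "Use ONLY the facts below and cite the source of each claim:\n"
        f"{fact_lines}"
    )

print(build_prompt("9HX", "trade buyer"))
```

In a real pipeline the retrieval step would also be filtered by channel, locale and audience tags, and the generated copy would still pass through the approval workflows described in layer 3.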
5. Potential and learning data
The more your AI learns from real-world signals, the more useful it becomes. This layer captures the signals that reveal what content to change next:
- Search terms with zero results
- Click paths
- Reasons for returns
- Content performance
- Support themes
- Competitor snapshots for benchmarking
- Price movements
What ‘good’ looks like
- Event capture mapped back to SKUs and their attributes
- Evaluation sets and benchmarks for any generative or retrieval model which affects product content
- Regular, systematic reviews of taxonomies and templates, informed by live alerts
- Clear ownership and approval workflows for edits that carry commercial or compliance risk
How PIM + AI help
- Instrumenting your catalogue, turning it from a static database into a dynamic driver of operational effectiveness
- Pushing analytics into the PIM so editors can see where data is outdated, drifting or incomplete
- Tracking prompts, versions and outcomes for generative content, and rolling back when metrics dip below acceptable thresholds (see the sketch below)
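To illustrate the ‘track and roll back’ idea, here is a minimal sketch in which each generated version carries its prompt identifier and an observed metric, and the live version falls back to the best earlier one when the metric dips below a threshold. The metric and threshold are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ContentVersion:
    version: int
    prompt_id: str           # which prompt/template produced the copy
    copy_text: str
    conversion_rate: float   # observed downstream metric (illustrative)

def live_version(history: list[ContentVersion], floor: float = 0.02) -> ContentVersion:
    """Serve the newest version unless its metric dips below the floor,
    in which case roll back to the best-performing earlier version."""
    latest = history[-1]
    if latest.conversion_rate >= floor:
        return latest
    return max(history[:-1], key=lambda v: v.conversion_rate, default=latest)

history = [
    ContentVersion(1, "prompt_v1", "Original copy", 0.031),
    ContentVersion(2, "prompt_v2", "AI-rewritten copy", 0.012),   # metric dipped
]
print(live_version(history).version)   # -> 1 (rolled back)
```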
Leveraging the 5 layers to work effectively together
Design from questions, not systems
Start with the tasks you want AI to support (think findability, compatibility, configuration, selection, service) and derive fields and relationships that answer those questions.
Prioritise supplier onboarding
Give suppliers guided templates with validation as a gatekeeper. Auto-map attributes and route any exceptions to the appropriate team. Remedying issues at source reduces rework and accelerates time-to-market.
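A simple sketch of validation as a gatekeeper: incoming supplier rows are auto-mapped where possible, and anything unmapped or incomplete is routed to an exception queue with a reason. The attribute map and rules are illustrative only:

```python
# Route supplier rows: auto-accept clean values, queue exceptions with a reason.

ATTRIBUTE_MAP = {"EAN": "gtin", "Weight(kg)": "net_weight_kg", "Colour": "colour"}

def onboard(row: dict) -> tuple[dict, list[str]]:
    mapped, exceptions = {}, []
    for supplier_field, value in row.items():
        target = ATTRIBUTE_MAP.get(supplier_field)
        if target is None:
            exceptions.append(f"Unmapped supplier field: {supplier_field}")
        elif value in ("", None):
            exceptions.append(f"Missing value for {target}")
        else:
            mapped[target] = value
    return mapped, exceptions

record, issues = onboard({"EAN": "05012345678900", "Weight(kg)": "", "Finish": "matt"})
# issues -> ["Missing value for net_weight_kg", "Unmapped supplier field: Finish"]
```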
Adopt a composable PIM solution
PIM is your governed product truth; around it, you add modular services for areas like classification, translation, media, analytics or syndication. Clean, trustworthy product data can then be safely exposed to downstream AI via APIs.
Measure what matters
Move beyond generic “completeness” scores by tracking metrics like:
- Search discoverability
- Conversion by category
- Return-rate fluctuations
- Time-to-enrich (from onboarding)
- Share of AI-authored content approved without manual edits
A quick checklist
Using the 5-layer model as a framework, implement quick-check quality criteria:
- Core purpose captured as structured data fields
- Product identifiers and units normalised, and duplicates eliminated
- Coverage of channel-ready “expected” attributes and full compliance
- Feedback loops, analytics and model governance in place
And
- Rich context linked in a product knowledge graph and tagged for RAG (see the Bonus section below)
A bit ‘techie’, that last one: Essentially, it means that your product data isn’t just a pile of text. It’s arranged as connected facts, with labels that make it easy for an AI tool to fetch exactly what it needs and cite it. You’re capturing the who/what/where/why/how around a product: variants, compatible parts, standards, materials, safety notes, images, region rules, lifecycle status, customer segments, use cases, and so on. Think: the story around the SKU, not only the SKU.
The acronym RAG means Retrieval-Augmented Generation: when you tag and chunk content, an LLM can retrieve your approved facts first and generate second.
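In practical terms, that means splitting long-form content into retrievable chunks and attaching governed tags to each one. A minimal sketch, with illustrative tag names and a deliberately naive chunking rule:

```python
# Sketch: chunk product content and attach retrieval tags so a RAG pipeline can
# filter before generating. Tag names and the chunking rule are illustrative.

def chunk_and_tag(sku: str, text: str, tags: dict, max_words: int = 80) -> list[dict]:
    """Split long-form content into retrievable chunks, each carrying the SKU and tags."""
    words = text.split()
    chunks = [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    return [{"sku": sku, "text": chunk, **tags} for chunk in chunks]

chunks = chunk_and_tag(
    "9HX",
    "The 9HX drill delivers high torque for site work ...",   # long-form description
    tags={"locale": "en-GB", "channel": "web", "audience": "trade buyer"},
)
# Each chunk can now be embedded and indexed; at query time, filter by locale/channel
# before retrieval so the model only sees approved, in-scope facts.
```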

Why this matters
- Fewer hallucinations: the model has no option but to retrieve from your approved product facts rather than invent them.
- Sharper answers: “Which drill fits this anchor in gypsum board?” becomes a single hop across the indexed information.
- Personalisation: tags (region, channel, audience) enable the AI to assemble copy suitable for a given market and a given reseller.
- Governance: versioned sources with access tags allow you to keep regulated fields (compliance documentation, for instance) clean and easily auditable.
How Start with Data supports you
If you want AI to work with your product data, not guess at it, let’s talk.
We’ll help you assess which layers are missing, where risk is creeping in, and what to fix first to get real commercial value from AI. If you’re ready to turn your product data into a durable advantage, get in touch with us today.
Bonus: The Product Knowledge Graph
A product knowledge graph is a map of everything your products ‘know’ about themselves, and how those facts connect. So, instead of using flat data tables, you model products as nodes (things) with relationships (how those things link) and properties (the facts). It makes the difference between a bog-standard parts list and a living wiring diagram.
Core elements of a product knowledge graph
- Nodes (entities): Product, Variant, Attribute, Category, Material, Standard, Document, Image, Accessory, Spare Part, Region, Channel, Customer Segment.
- Edges (relationships): hasVariant, belongsToCategory, madeOf, compliesWith, compatibleWith, replaces, requires, soldIn, approvedFor, referencesDocument.
- Properties (facts): Key/value details on nodes/edges (e.g., cut resistance=C, effective from=2024-03-01, locale=en-GB, lifecycle=discontinued).
- Ontology (schema): The ‘controlled’ vocabulary that says which nodes/edges exist and what’s allowed to connect. Reuse standards where possible (GS1, schema.org/Product), extend for your domain.
Example (distributor)
SKU: 9HX Drill
- hasVariant → 9HX-110V, 9HX-240V
- compatibleWith → Bit Set BS-25
- compliesWith → EN 60745
- soldIn → UK, DE
- referencesDocument → “9HX Manual v2” (pages 3–7)
Each item carries properties (power, torque, locale, approvals, effective dates).
As a result, customer queries like “Which drills sold in Germany are gypsum-safe and compatible with anchor X?” become a stroll in the park, not a search for needles in a spreadsheet haystack.
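To make the example concrete, here is a minimal sketch of the 9HX graph as plain triples and properties in Python, plus that style of query. The anchor SKU, the gypsum-safe flag and the compatibility edge to the anchor are invented for the illustration; a production graph would live in a graph database or in the PIM’s relationship model:

```python
# Sketch of the 9HX example as a tiny in-memory graph: (subject, relationship, object)
# triples plus node properties. Names and properties are illustrative.

edges = [
    ("9HX", "hasVariant", "9HX-110V"),
    ("9HX", "hasVariant", "9HX-240V"),
    ("9HX", "compatibleWith", "Bit Set BS-25"),
    ("9HX", "compatibleWith", "Anchor AX-10"),   # hypothetical compatibility edge
    ("9HX", "compliesWith", "EN 60745"),
    ("9HX", "soldIn", "UK"),
    ("9HX", "soldIn", "DE"),
]

properties = {
    "9HX": {"type": "drill", "gypsum_safe": True, "torque_nm": 60},  # illustrative facts
}

def drills_for(region: str, anchor: str) -> list[str]:
    """Which drills are sold in `region`, gypsum-safe, and compatible with `anchor`?"""
    sold = {s for s, rel, o in edges if rel == "soldIn" and o == region}
    compatible = {s for s, rel, o in edges if rel == "compatibleWith" and o == anchor}
    return [
        sku for sku in sold & compatible
        if properties.get(sku, {}).get("type") == "drill"
        and properties.get(sku, {}).get("gypsum_safe")
    ]

print(drills_for("DE", "Anchor AX-10"))   # -> ['9HX']
```

The same question asked against flat spreadsheets would mean joining several exports by hand; asked against the graph, it is a single traversal over governed relationships.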
[1] Philip Kotler, a highly renowned author, educator and consultant, is widely regarded as the “father of modern marketing”.
[2] Instances where an AI model generates product information that is factually incorrect, nonsensical or entirely fabricated, yet presents it with confidence and authority. If the data used to train or ground the AI contains inaccuracies, or is not representative of all products, the model simply learns and perpetuates those errors.