Customers don’t buy objects; they buy outcomes, and the way we express those outcomes has changed over time. Today they are encoded as data that spans ERPs, supplier feeds, DAMs and marketplaces. If you want AI tools to perform well at retrieval-augmented generation, semantic search or automated enrichment, your product data needs to be arranged in five clear layers, not in one big messy pile.
This article provides a practical blueprint, underpinned by Kotler’s[1] five product levels and updated for Product Information Management (PIM) in an era where AI can add real value to how product data is managed.
1. Core purpose data
This is the “why” behind the SKU: the job to be done, the primary use cases, and the customer context in which the product is chosen. For instance, “portable power for site tools” rather than “18V battery”. Capturing this layer well narrows the search space and reduces the chances of AI hallucination[2].
What ‘good’ looks like
- A single authoritative category (plus controlled sub-uses) per SKU
- Structured intent fields (such as job, use case, application, environment, or buyer role)
- Short, reusable value statements aligned to search intent and customer language
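To make this concrete, here is a minimal sketch (in Python) of how core purpose might be held as structured, controlled fields rather than free text. The field names and controlled vocabularies are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabularies; in practice these live in the PIM's taxonomy.
ALLOWED_USE_CASES = {"portable power for site tools", "overnight depot charging"}
ALLOWED_BUYER_ROLES = {"site manager", "trade buyer", "diy"}

@dataclass
class CorePurpose:
    """Structured 'why' behind a SKU: job to be done, use case, buyer context."""
    sku: str
    category: str                        # single authoritative category
    job_to_be_done: str                  # e.g. "keep cordless tools running on site"
    use_case: str                        # drawn from a controlled list
    buyer_role: str                      # drawn from a controlled list
    value_statements: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of issues; an empty list means the record is usable."""
        issues = []
        if self.use_case not in ALLOWED_USE_CASES:
            issues.append(f"{self.sku}: use_case '{self.use_case}' is not in the controlled list")
        if self.buyer_role not in ALLOWED_BUYER_ROLES:
            issues.append(f"{self.sku}: buyer_role '{self.buyer_role}' is not in the controlled list")
        return issues
```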
2. Foundational product data
This is the foundation that makes the product unambiguous and computable: identifiers, mandatory specs, and a fixed or controlled set of measurement units so that attributes are interpreted consistently, accurately and safely (‘unit-secure’). If this layer is unstable, every downstream AI task inherits that instability.
What ‘good’ looks like
- Clean identifiers (GTIN/MPN/SKU) with comprehensive lineage and change history
- Normalised attributes with allowed values, units and tolerances
- Automated checks for completeness, conformance, duplicates and conflicts
How PIM + AI help
- Applying schema rules and ML validators during supplier onboarding
- Using entity resolution to merge near-duplicates
- Applying unit conversion and attribute mapping automatically, so that the PIM remains the ‘single source of truth’ (see the sketch below)
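As an illustration of the unit conversion and validation ideas above, here is a simplified sketch. The attribute names, conversion factors and tolerances are hypothetical; a real PIM would drive these from its governed schema:

```python
# Normalise supplier-provided measurements to house units and flag issues.
# Attribute names, units and tolerances here are illustrative, not a real schema.

UNIT_FACTORS = {          # target unit -> {source unit: multiplier}
    "mm": {"mm": 1.0, "cm": 10.0, "in": 25.4},
    "kg": {"kg": 1.0, "g": 0.001, "lb": 0.45359237},
}

HOUSE_SCHEMA = {          # attribute -> (target unit, min allowed, max allowed)
    "blade_length": ("mm", 50, 500),
    "net_weight": ("kg", 0.05, 50),
}

def normalise(attribute: str, value: float, unit: str) -> tuple[float, list[str]]:
    """Convert to the house unit and return (converted value, issues)."""
    issues = []
    target_unit, lo, hi = HOUSE_SCHEMA[attribute]
    factor = UNIT_FACTORS[target_unit].get(unit)
    if factor is None:
        return value, [f"{attribute}: unknown unit '{unit}'"]
    converted = value * factor
    if not (lo <= converted <= hi):
        issues.append(f"{attribute}: {converted}{target_unit} outside {lo}-{hi}{target_unit}")
    return converted, issues

# Example: a supplier sends blade length in inches.
value, problems = normalise("blade_length", 12, "in")   # -> 304.8 mm, no issues
```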
3. Expected experience data
These are the attributes, assurances and conditions that customers treat as deal-makers or deal-breakers: performance ranges, compatibility, safety and compliance, warranty and service information, and localised content. This is also where channel requirements live.
What ‘good’ looks like
- Channel-specific completeness profiles (by marketplace / template)
- Compliance metadata per region, with audit trails and expiry alerts
- Localised titles and attributes driven by rules, not by just copy-pasting
How PIM + AI help
- Using rules and validation to meet each channel’s template criteria
- Applying generative AI to draft variant copy from structured attributes, with guard-rails (such as style guides, checks for unauthorised claims, and reference citations)
- Implementing human oversight protocols and approval for higher-risk categories
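To illustrate the channel-template idea above, here is a small sketch of a completeness check against channel-specific profiles. The channel names, required fields and sample record are invented for the example:

```python
# Check a SKU's attribute record against each channel's required-field profile.
# Channel names and required attributes are illustrative only.

CHANNEL_PROFILES = {
    "marketplace_eu": {"title_de", "gtin", "ce_marking", "energy_label"},
    "marketplace_uk": {"title_en_gb", "gtin", "ukca_marking"},
}

def channel_gaps(record: dict, channel: str) -> set[str]:
    """Return required attributes that are missing or empty for the given channel."""
    required = CHANNEL_PROFILES[channel]
    return {attr for attr in required if not record.get(attr)}

sku_record = {"gtin": "05012345678900", "title_en_gb": "9HX Cordless Drill", "ukca_marking": "yes"}
print(channel_gaps(sku_record, "marketplace_uk"))   # empty set -> ready to syndicate
print(channel_gaps(sku_record, "marketplace_eu"))   # the missing EU fields
```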
4. Augmented context data
This is the rich context that differentiates your offer:
- Long-form descriptions
- Feature-to-benefit mappings
- How-to guides
- Q&A
- User reviews
- Usage instructions
- Imagery and video
- Sustainability credentials
- Comprehensive regulatory compliance documentation
- Digital product passport fields
It’s the difference between humdrum copy pasted straight from a supplier spec sheet and a truly compelling product story.
What ‘good’ looks like
- A ‘product knowledge graph’ linking SKUs to use cases, accessories, parts, standards and content assets
- Review summaries with extracted pros/cons as structured fields
- Rich media tagged with attributes (not just filenames), enabling precise retrieval
- SEO metadata (search terms, synonyms, questions) kept as governed fields
How PIM + AI help
- Using AI to tag assets, cluster reviews, and generate benefit-led copy tailored to the right audience on the right channel
- Feeding the knowledge graph into your RAG pipeline so generated descriptions cite the right facts, not guesswork (see the sketch below)
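The ‘cite the right facts’ point can be sketched roughly as follows: approved, structured facts are retrieved for a SKU and placed in the prompt, so the model generates from them rather than guessing. Here, fetch_facts is a stand-in for a real knowledge-graph or PIM query, and the LLM call itself is omitted:

```python
# Sketch: assemble approved facts for a SKU into a grounded prompt for copy generation.

def fetch_facts(sku: str) -> list[dict]:
    """Hypothetical lookup of governed facts (would query the PIM / knowledge graph)."""
    return [
        {"fact": "voltage: 18 V", "source": "spec_sheet_v3"},
        {"fact": "compatible with Bit Set BS-25", "source": "compatibility_matrix"},
        {"fact": "complies with EN 60745", "source": "compliance_register"},
    ]

def build_prompt(sku: str, audience: str) -> str:
    facts = fetch_facts(sku)
    fact_lines = "\n".join(f"- {f['fact']} [source: {f['source']}]" for f in facts)
    return (
        f"Write benefit-led copy for SKU {sku} aimed at a {audience}.\n"
        "Use ONLY the facts below and cite the source of each claim:\n"
        f"{fact_lines}"
    )

print(build_prompt("9HX", "trade buyer"))
```

In a real pipeline the retrieval step would also be filtered by channel, locale and audience tags, and the generated copy would still pass through the approval workflows described in layer 3.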
5. Potential and learning data
The more your AI learns from real-world signals, the more useful it becomes. This layer captures the signals that reveal what content to change next:
- Search terms with zero results
- Click paths
- Reasons for returns
- Content performance
- Support themes
- Competitor snapshots for benchmarking
- Price movements
What ‘good’ looks like
- Event capture mapped back to SKUs and their attributes
- Evaluation sets and benchmarks for any generative or retrieval model which affects product content
- Regular, systematic reviews of taxonomies and templates, informed by live alerts
- Clear ownership and approval workflows for edits that carry commercial or compliance risk
How PIM + AI help
- Instrumenting your catalogue, turning it from a static database into a dynamic driver of operational effectiveness
- Pushing analytics into the PIM so editors can see where data is outdated, drifting or incomplete
- Tracking prompts, versions and outcomes for generative content, and rolling back when metrics dip below acceptable thresholds (see the sketch below)
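To illustrate the ‘track and roll back’ idea, here is a minimal sketch in which each generated version carries its prompt identifier and an observed metric, and the live version falls back to the best earlier one when the metric dips below a threshold. The metric and threshold are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ContentVersion:
    version: int
    prompt_id: str           # which prompt/template produced the copy
    copy_text: str
    conversion_rate: float   # observed downstream metric (illustrative)

def live_version(history: list[ContentVersion], floor: float = 0.02) -> ContentVersion:
    """Serve the newest version unless its metric dips below the floor,
    in which case roll back to the best-performing earlier version."""
    latest = history[-1]
    if latest.conversion_rate >= floor:
        return latest
    return max(history[:-1], key=lambda v: v.conversion_rate, default=latest)

history = [
    ContentVersion(1, "prompt_v1", "Original copy", 0.031),
    ContentVersion(2, "prompt_v2", "AI-rewritten copy", 0.012),   # metric dipped
]
print(live_version(history).version)   # -> 1 (rolled back)
```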
Leveraging the 5 layers to work effectively together
Design from questions, not systems
Start with the tasks you want AI to support (think findability, compatibility, configuration, selection, service) and derive fields and relationships that answer those questions.
Prioritise supplier onboarding
Give suppliers guided templates with validation as a gatekeeper. Auto-map attributes and route any exceptions to the appropriate team. Remedying issues at source reduces rework and accelerates time-to-market.
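A simple sketch of validation as a gatekeeper: incoming supplier rows are auto-mapped where possible, and anything unmapped or incomplete is routed to an exception queue with a reason. The attribute map and rules are illustrative only:

```python
# Route supplier rows: auto-accept clean values, queue exceptions with a reason.

ATTRIBUTE_MAP = {"EAN": "gtin", "Weight(kg)": "net_weight_kg", "Colour": "colour"}

def onboard(row: dict) -> tuple[dict, list[str]]:
    mapped, exceptions = {}, []
    for supplier_field, value in row.items():
        target = ATTRIBUTE_MAP.get(supplier_field)
        if target is None:
            exceptions.append(f"Unmapped supplier field: {supplier_field}")
        elif value in ("", None):
            exceptions.append(f"Missing value for {target}")
        else:
            mapped[target] = value
    return mapped, exceptions

record, issues = onboard({"EAN": "05012345678900", "Weight(kg)": "", "Finish": "matt"})
# issues -> ["Missing value for net_weight_kg", "Unmapped supplier field: Finish"]
```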
Adopt a composable PIM solution
PIM is your governed product truth; around it, you add modular services for areas like classification, translation, media, analytics or syndication. Clean, trustworthy product data can then be safely exposed to downstream AI via APIs.
Measure what matters
Move beyond generic “completeness” scores by tracking metrics like:
- Search discoverability
- Conversion by category
- Return-rate fluctuations
- Time-to-enrich (from onboarding)
- Share of AI-authored content approved without manual edits
A quick checklist
Using the 5-layer model as a framework, implement quick-check quality criteria:
- Core purpose captured as structured data fields
- Product identifiers and units normalised, and duplicates eliminated
- Coverage of channel-ready “expected” attributes and full compliance
- Feedback loops, analytics and model governance in place
And
- Rich context linked in a product knowledge graph and tagged for RAG (see the Bonus section below)
A bit ‘techie’, that last one: Essentially, it means that your product data isn’t just a pile of text. It’s arranged as connected facts, with labels that make it easy for an AI tool to fetch exactly what it needs and cite it. You’re capturing the who/what/where/why/how around a product: variants, compatible parts, standards, materials, safety notes, images, region rules, lifecycle status, customer segments, use cases, and so on. Think: the story around the SKU, not only the SKU.
The acronym RAG means Retrieval-Augmented Generation: when you tag and chunk content, an LLM can retrieve your approved facts first and generate second.
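In practical terms, that means splitting long-form content into retrievable chunks and attaching governed tags to each one. A minimal sketch, with illustrative tag names and a deliberately naive chunking rule:

```python
# Sketch: chunk product content and attach retrieval tags so a RAG pipeline can
# filter before generating. Tag names and the chunking rule are illustrative.

def chunk_and_tag(sku: str, text: str, tags: dict, max_words: int = 80) -> list[dict]:
    """Split long-form content into retrievable chunks, each carrying the SKU and tags."""
    words = text.split()
    chunks = [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    return [{"sku": sku, "text": chunk, **tags} for chunk in chunks]

chunks = chunk_and_tag(
    "9HX",
    "The 9HX drill delivers high torque for site work ...",   # long-form description
    tags={"locale": "en-GB", "channel": "web", "audience": "trade buyer"},
)
# Each chunk can now be embedded and indexed; at query time, filter by locale/channel
# before retrieval so the model only sees approved, in-scope facts.
```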

Why this matters
- Fewer hallucinations: the model has no option but to retrieve from your approved product facts rather than invent them.
- Sharper answers: “Which drill fits this anchor in gypsum board?” becomes a single hop across the indexed information.
- Personalisation: tags (region, channel, audience) enable the AI to assemble copy suitable for a given market and a given reseller.
- Governance: versioned sources with access tags allow you to keep regulated fields (compliance documentation, for instance) clean and easily auditable.
How Start with Data supports you
If you want AI to work with your product data, not guess at it, let’s talk.
We’ll help you assess which layers are missing, where risk is creeping in, and what to fix first to get real commercial value from AI. If you’re ready to turn your product data into a durable advantage, get in touch with us today.
Bonus: The Product Knowledge Graph
A product knowledge graph is a map of everything your products ‘know’ about themselves, and how those facts connect. So, instead of using flat data tables, you model products as nodes (things) with relationships (how those things link) and properties (the facts). It makes the difference between a bog-standard parts list and a living wiring diagram.
Core elements of a product knowledge graph
- Nodes (entities): Product, Variant, Attribute, Category, Material, Standard, Document, Image, Accessory, Spare Part, Region, Channel, Customer Segment.
- Edges (relationships): hasVariant, belongsToCategory, madeOf, compliesWith, compatibleWith, replaces, requires, soldIn, approvedFor, referencesDocument.
- Properties (facts): Key/value details on nodes/edges (e.g., cut resistance=C, effective from=2024-03-01, locale=en-GB, lifecycle=discontinued).
- Ontology (schema): The ‘controlled’ vocabulary that says which nodes/edges exist and what’s allowed to connect. Reuse standards where possible (GS1, schema.org/Product), extend for your domain.
Example (distributor)
SKU: 9HX Drill
- hasVariant → 9HX-110V, 9HX-240V
- compatibleWith → Bit Set BS-25
- compliesWith → EN 60745
- soldIn → UK, DE
- referencesDocument → “9HX Manual v2” (pages 3–7)
Each item carries properties (power, torque, locale, approvals, effective dates).
As a result, customer queries like “Which drills sold in Germany are gypsum-safe and compatible with anchor X?” become a stroll in the park, not a search for needles in a spreadsheet haystack.
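To make the example concrete, here is a minimal sketch of the 9HX graph as plain triples and properties in Python, plus that style of query. The anchor SKU, the gypsum-safe flag and the compatibility edge to the anchor are invented for the illustration; a production graph would live in a graph database or in the PIM’s relationship model:

```python
# Sketch of the 9HX example as a tiny in-memory graph: (subject, relationship, object)
# triples plus node properties. Names and properties are illustrative.

edges = [
    ("9HX", "hasVariant", "9HX-110V"),
    ("9HX", "hasVariant", "9HX-240V"),
    ("9HX", "compatibleWith", "Bit Set BS-25"),
    ("9HX", "compatibleWith", "Anchor AX-10"),   # hypothetical compatibility edge
    ("9HX", "compliesWith", "EN 60745"),
    ("9HX", "soldIn", "UK"),
    ("9HX", "soldIn", "DE"),
]

properties = {
    "9HX": {"type": "drill", "gypsum_safe": True, "torque_nm": 60},  # illustrative facts
}

def drills_for(region: str, anchor: str) -> list[str]:
    """Which drills are sold in `region`, gypsum-safe, and compatible with `anchor`?"""
    sold = {s for s, rel, o in edges if rel == "soldIn" and o == region}
    compatible = {s for s, rel, o in edges if rel == "compatibleWith" and o == anchor}
    return [
        sku for sku in sold & compatible
        if properties.get(sku, {}).get("type") == "drill"
        and properties.get(sku, {}).get("gypsum_safe")
    ]

print(drills_for("DE", "Anchor AX-10"))   # -> ['9HX']
```

The same question asked against flat spreadsheets would mean joining several exports by hand; asked against the graph, it is a single traversal over governed relationships.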
[1] Philip Kotler, a highly renowned author, educator and consultant, is widely regarded as the “father of modern marketing”.
[2] Instances where an AI model generates product information that is factually incorrect, nonsensical or entirely fabricated, yet presents it with confidence and authority. If the data used to train or ground the AI contains inaccuracies, or is not representative of all products, the model simply learns and perpetuates those errors.