Relevance in Digital Commerce
First, remember that in digital commerce, relevance requires knowing as much as you can about your buyers so that you can guide them to exactly what they are looking for. This dynamic, guided navigation experience includes facets, autosuggest, type-ahead, and recommendations.
But to do any of the above, you need data. For both B2C and B2B, you need data about your customers: their buying histories, their likes and dislikes, and their demographics. You need data from your loyalty and point-of-purchase systems. You want customer reviews and customer location data.
In B2B, you might not have as many customers as in B2C, so you want to make sure you have all the data you can about your products. That means data from your product catalogs and product descriptions, as well as from your CMS, CRM, and ERP systems.
Unfortunately, “search” has historically meant retrieving data from only one of those repositories, such as the product catalog. Further, it often limits you to searching only the structured data (typically the fields named by the column headings) within that repository. That eliminates a buyer’s ability to search any unstructured data the repository might include.
And what if you have a recurring customer looking to reorder? Can they easily find their prior purchases? Or what if your customers want to search by price: do you let them see it, or must they call a rep for pricing?
Giving your customers information on all these different variables requires a robust, unified index. And the unification process is usually where the appetite to unify outgrows the stomach for the work. An agile index will get you where you need to be, without any ulcers. Here’s how.
How to Integrate Customer & Product Data
If you architected your system more than a decade ago, you can be forgiven for considering it a Herculean task to bring all that data together. After all, those systems likely hold a variety of data types and formats, including documents, data, and even binaries. My colleague Vincent Bernard, a veteran search architect, reminded me that once upon a time, unification required “defining how data should be represented — and it was hard to agree on the objects, their properties, etc.”
“A key struggle has been related to the way information is stored and siloed. Merchants have lots of data, but they are not able to get a 360-degree view of their customers easily. CRMs are good at providing the customer info and their interaction, but the ERP and POS systems aren’t integrated,” he begins.
“On top of that,” he explains, “many merchants are using a variety of systems — some very old, some newer.”
Rise of the Agile Index for Disparate Data
It used to be that, with all these different content shapes, you would create multiple indexes and use a query federator to find the best result. There are many reasons not to do this. One is performance; another is relevance: because each index scores its results independently, you can’t meaningfully rank them against one another. You end up with a lot of segmented results instead of aggregated ones. So, back to Goldilocks: you have lots of recall but not much precision.
Creating a single index was therefore key, but it often required so much massaging of data that certain data types were left out. And if indexing a variety of structured (transactional) data was hard, adding unstructured and semi-structured data such as reviews and documents was even more challenging.
An agile index lets you ingest all this heterogeneous data (binaries, documents, and key-value records) by transforming it into a uniform document model such as JSON. The next step is to index these documents.
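To make the transformation step concrete, here is a minimal sketch, assuming three kinds of input: key-value records, free text, and raw bytes. The helper `to_document` and the field names `source`, `body`, and `binary` are hypothetical, chosen only for illustration; they are not part of any particular product's schema.

```python
import base64
import json

def to_document(source, record):
    """Map one record from a source system onto a uniform, JSON-serializable dict.

    Hypothetical helper: the field names "source", "body", and "binary" are
    illustrative, not a standard schema.
    """
    if isinstance(record, dict):
        # Key-value records (catalog rows, CRM entries) keep their fields as-is.
        doc = dict(record)
    elif isinstance(record, str):
        # Free-text documents such as reviews become a single text field.
        doc = {"body": record}
    elif isinstance(record, bytes):
        # Binaries (e.g., a PDF spec sheet) are base64-encoded so they survive JSON.
        # A fuller pipeline might extract text from them instead.
        doc = {"binary": base64.b64encode(record).decode("ascii")}
    else:
        raise TypeError(f"unsupported record type: {type(record).__name__}")
    doc["source"] = source  # record which system the data came from
    return doc

records = [
    ("catalog", {"sku": "HB-100", "title": "Leather handbag", "price": 120.0}),
    ("reviews", "Great bag, roomy and well made."),
    ("cms", b"%PDF-1.4 spec sheet"),
]
docs = [to_document(source, record) for source, record in records]
print(json.dumps(docs, indent=2))
```

Every record, whatever its origin, now has the same shape: a flat JSON object tagged with its source, ready to be indexed.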
Any time you create an index, you have to think about its structure. For information and knowledge retrieval, the standard structure is an inverted index. An inverted index is similar to the index at the end of a textbook, which tells you where certain words or terms can be found. In this case, it maps each term to the digital locations of the data or documents that contain it.
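A toy inverted index shows the textbook analogy in code. This is a minimal sketch assuming simple lowercase tokenization and boolean AND matching; production engines add stemming, scoring, and compressed posting lists. The names `tokenize`, `build_inverted_index`, and `search` are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents that contain every query term (boolean AND)."""
    term_sets = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*term_sets) if term_sets else set()

docs = {
    "p1": "Red leather handbag with gold clasp",
    "p2": "Canvas tote bag",
    "p3": "Leather wallet, red",
}
index = build_inverted_index(docs)
print(sorted(index["leather"]))                  # ['p1', 'p3']
print(sorted(search(index, "leather handbag")))  # ['p1']
```

A lookup touches only the posting lists for the query terms rather than scanning every document, which is why this structure suits retrieval.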
How to Index Product Catalog Variants, Entitlements & Availability
Indexing documents can be straightforward, but product catalogs can quickly create tremendous overhead. One characteristic of catalogs in both B2C and B2B ecommerce is that products might have many variants, such as color, size, or flavor. In a catalog without variants, the product is obviously the purchasable item. In a catalog with product variants, users search for products and then select the proper variant to purchase.
To optimize search performance, you want your index to be as light as possible, which means your search engine must handle your product catalog structure efficiently. To do that, you need an index that lets you build relationships.
For example, if you have 1,000 SKUs, each with 100 variants, indexing every variant combination means 100,000 items in the index. What if you also had 10,000 users with different entitlements? That would swell the index to 1 billion items (100,000 × 10,000)! You want a system that lets you group your variants in relationships, at either index time or query time.
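The growth in that example is plain multiplication, checked here with the numbers from the text:

```python
skus = 1_000
variants_per_sku = 100
users = 10_000

# One index entry per variant combination.
variant_entries = skus * variants_per_sku
# One entry per (user, variant) pair, if entitlements are flattened into the index.
user_variant_entries = variant_entries * users

print(f"{variant_entries:,}")       # 100,000
print(f"{user_variant_entries:,}")  # 1,000,000,000
```

Each new dimension that is flattened into the index multiplies the entry count, which is why grouping related objects matters.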
Here are some examples:
- Of the three objects above (SKUs, variants, and users), what does the relationship hinge on? In most cases it is the user, who is allowed access to some variants of products but not all of them. So in this case, you would group together the user + variants + SKU, and that is what would be indexed.
- Or think of store availability: a store carries certain variants of a product. Again, you would index the store location + variants + SKU. If the store doesn’t carry the variant, it would not show up in the results.
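One way to picture both groupings is a minimal in-memory sketch, assuming each variant carries the set of users entitled to see it and the set of stores that stock it. Under that assumption the index holds one entry per variant, not one per (user, variant) pair, and the filtering happens at query time. The classes `Product` and `Variant` and the function `visible_variants` are hypothetical. A real search engine would express the same idea through its own nested or parent-child document features.

```python
from dataclasses import dataclass, field

@dataclass
class Variant:
    variant_id: str
    attributes: dict
    # User IDs entitled to see this variant (could equally be entitlement-group IDs).
    allowed_users: set = field(default_factory=set)
    # Store IDs where this variant is available.
    stores: set = field(default_factory=set)

@dataclass
class Product:
    sku: str
    title: str
    variants: list  # variants are grouped under their parent SKU

def visible_variants(products, user_id, store_id=None):
    """Yield (sku, variant_id) pairs the user may see, optionally limited to one store."""
    for product in products:
        for variant in product.variants:
            if user_id not in variant.allowed_users:
                continue  # entitlement check: user + variant + SKU
            if store_id is not None and store_id not in variant.stores:
                continue  # availability check: store + variant + SKU
            yield product.sku, variant.variant_id

catalog = [
    Product("HB-100", "Leather handbag", [
        Variant("HB-100-RED", {"color": "red"}, {"u1", "u2"}, {"s1"}),
        Variant("HB-100-BLK", {"color": "black"}, {"u2"}, {"s1", "s2"}),
    ]),
]
print(list(visible_variants(catalog, "u1")))        # [('HB-100', 'HB-100-RED')]
print(list(visible_variants(catalog, "u2", "s2")))  # [('HB-100', 'HB-100-BLK')]
```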
Product Search Engine and Relevance
One of the benefits of having this agile index is being able to understand what a person is looking for: their intent. When a person queries “handbag,” you can associate it with “purse,” “bag,” or “sac.” One way to discover intent is through log analysis. You can certainly try to do this analysis by hand, such as poring over log files to see what terms people type and adding them to dictionaries and thesauri.
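Once a thesaurus entry exists, applying it can be as simple as query-time expansion. The following minimal sketch assumes a hand-built, one-directional synonym map; `SYNONYMS` and `expand_query` are hypothetical names. The expanded term set would then be looked up in the index (for example, as an OR of terms).

```python
# Hand-maintained thesaurus (hypothetical): a query term maps to extra terms to search for.
SYNONYMS = {
    "handbag": {"purse", "bag", "sac"},
}

def expand_query(query, synonyms):
    """Return each query term plus any synonyms recorded for it."""
    terms = set()
    for term in query.lower().split():
        terms.add(term)
        terms |= synonyms.get(term, set())
    return terms

print(sorted(expand_query("Leather handbag", SYNONYMS)))
# ['bag', 'handbag', 'leather', 'purse', 'sac']
```

Every entry in such a map has to be discovered and maintained by someone, which is the manual effort this approach demands.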
Or you can analyze offline which products people buy and then create rules to promote those items.
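Such rules could be derived from purchase logs roughly like this. The minimal sketch assumes the log already attributes each purchase to the query that led to it, as (query, SKU) pairs. That attribution, along with `build_promotion_rules`, `apply_rules`, and the `top_n` cutoff, is a hypothetical simplification.

```python
from collections import Counter, defaultdict

def build_promotion_rules(purchase_log, top_n=2):
    """Turn (query, purchased_sku) pairs into per-query lists of SKUs to promote."""
    counts = defaultdict(Counter)
    for query, sku in purchase_log:
        counts[query.lower()][sku] += 1  # normalize case so "Handbag" == "handbag"
    return {query: [sku for sku, _ in c.most_common(top_n)]
            for query, c in counts.items()}

def apply_rules(query, results, rules):
    """Move promoted SKUs that already appear in the results to the top."""
    promoted = [sku for sku in rules.get(query.lower(), []) if sku in results]
    return promoted + [sku for sku in results if sku not in promoted]

purchase_log = [
    ("handbag", "HB-100"), ("Handbag", "HB-100"), ("handbag", "HB-100"),
    ("handbag", "HB-200"), ("handbag", "HB-200"), ("handbag", "HB-300"),
    ("tote", "TT-10"),
]
rules = build_promotion_rules(purchase_log)
print(rules["handbag"])  # ['HB-100', 'HB-200']
print(apply_rules("handbag", ["HB-300", "HB-200", "HB-100", "HB-400"], rules))
# ['HB-100', 'HB-200', 'HB-300', 'HB-400']
```

Rules like these are static snapshots: they must be rebuilt as buying patterns change, and each query needs enough purchase history to produce a rule at all.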
However, if you have a million SKUs in your catalog, or even tens of thousands, doing this tuning by hand requires a legion of people to pore over log files, build thesauri, map tail queries to head queries, and write line after line of code.
Or, you can use machine learning.
In our next blog post, we’ll look at how machine learning and your agile, unified index work together to give your shoppers a truly personalized experience.