Data normalization is essential to aligning unstructured and structured data to create actionable information and knowledge.

My health is important to me, particularly how much I weigh. Like all Canadians born under the metric system, of course, I count it in pounds. To this day I have trouble going from pounds to kilos for quantities as big as a person’s weight. The same goes for miles and kilometers, although through some odd intertwining of my neurons I find it easier to convert figures by 1.6 (the km-to-mi ratio) than by 2.2 (the lbs-to-kg ratio). For the life of me I cannot mentally convert Celsius to Fahrenheit; I invariably ask my smartphone to do it through a voice command.

We do many data conversions like these in our daily lives without thinking about it, usually because only one of the two values makes sense to us, whether lbs/kg, mi/km, or °F/°C. Machines deal with such conversions, and much, much more complicated ones, to be able to make sense of all the data they process. This data normalization process is but one part of the data transformation journey outlined by the DIKW pyramid.

 

[Figure: the DIKW data normalization pyramid]

For example: say I weighed 165 pounds. 165 is merely data. The fact that it is a number that measures weight makes it information. Cross-referenced with a body mass index chart that accounts for a person’s height, that information is put in the context of what constitutes a healthy weight for a person of a certain gender, height and age: that’s knowledge. This knowledge enables me to make an informed decision, which in this case will be to keep training and eating properly most of the time (yay, imaginary me!).

Where Data Normalization fits into the DIKW ‘journey’

The DIKW pyramid is a very good tool for visualizing how wisdom, or insight, or whatever you want to call the power to make an informed decision, percolates up from knowledge; how knowledge is a series of conclusions drawn from information; and how that information, in its turn, came from seemingly disparate facts that stemmed from seemingly innocuous bits and pieces in an ocean of data.

The pyramid wisely does not say exactly how the transition between each step happens. The process is bound to vary wildly from one organization to another, but the path to wisdom, or insight, becomes harder every year. In their book Organizational Data Mining: Leveraging Enterprise Data Resources for Optimal Performance (Nemati & Barko, 2003), two enterprise data experts cite an already old (2002) study from the University of California, Berkeley that “found the amount of data organizations collect and store in enterprise databases doubles every year, and slightly more than half of this data will consist of ‘reference information’, which is the kind of information strategic business applications and decision support systems demand.”

Organizations have been generating data faster than they can consume it for a good while now. AI-Powered Search acts as a way to build a pyramid on top of which people in the organization can stand, see the bigger picture that this sea of data actually paints, and base their decisions on the right insight, provided at the right moment.

What’s really interesting is the next-to-last step in the DIKW pyramid: the one that takes us from information up to knowledge. Specifically, how do you take multiple systems that model the same information in different ways, and reconcile these bits of information so you can search across everything efficiently?

At Coveo, we informally call that process data normalization. In the context of Coveo’s Relevance Maturity Model, this applies somewhere at level 2, when organizations move from federated to unified search. It is crucial to get these things right so you have a solid foundation upon which to build the next steps.

[Figure: Coveo Relevance Maturity Model]

How Coveo handles data normalization during solution implementations

Our services team or implementation partners build this foundation when a solution is implemented for our customers. Like the Egyptian architects of old, we bring experience in design and marvelous tools to help shape a vision, and we also have to work with the materials that are available. There’s data sitting across multiple ticketing systems, mail servers, file shares, CRM systems, CMS systems, and so on. Of course, a lot of it is already arranged in finite sets of elements, so a measure of the data is already structured into information; it’s never just a primordial soup of data. What we usually find, however, is that information is structured and expressed in different ways depending on which system you look at. Enter data normalization.

The tricky part in taking data and shaping it into information, and then knowledge, is to find common elements among, say, knowledge articles from a CRM, PDFs sitting in a file share, and JIRA support tickets. Once you have pinpointed those elements, which represent the same concept in a myriad of different ways, they have to be made the same at the level of Coveo’s Unified Index. Let me illustrate this.

Since Coveo is often used by businesses who sell products, many of Coveo’s implementations naturally revolve around product information. Sometimes there is existing metadata in documents that clearly states what product(s) they relate to. Sometimes the information is there but it’s unstructured, as illustrated by the SharePoint data element below, where the product name and version are clearly present but lumped into a single element. More often than not, there are subtle differences in the way Awesome Product v1.2 is tagged in a SharePoint page, a JIRA ticket and a Salesforce Knowledge Article.

 

| Source                | Data element                          | Content             |
|-----------------------|---------------------------------------|---------------------|
| SharePoint            | Item Metadata (name: ows_productfull) | AwesomeProduct v1.2 |
| JIRA Ticket           | Field (name: product)                 | Awesome Product     |
| JIRA Ticket           | Field (name: version)                 | 1.2                 |
| Salesforce KB Article | Field (name: productid)               | 0000AgZxyz12355446  |
| Salesforce KB Article | Field (name: productversion)          | 1                   |
| Salesforce KB Article | Field (name: productsubversion)       | 2                   |

Take it from a Solution Architect: this is not an extreme example. It’s fairly common for the Coveo services team to have such discrepancies to reconcile so that information pertaining to the same concept gets tagged correctly in the Unified Index. This is a team effort, one that requires valuable input from the customer’s system owners, curators and users. Through workshops, our Business Architects and Solution Specialists elicit a sense of which information elements need normalization across the systems to be indexed, automate common structuring among these systems, and build the baseline that makes relevance achievable.

Following up on the example outlined above, the customer and Coveo teams would likely decide that what’s needed to organize knowledge in a meaningful and structured way is two fields in the Coveo index: @productname and @productversion. Those two fields would then be populated differently based on the source being worked on. Once populated, these fields can be used as facets that let users narrow down a result set by selecting the values that are meaningful to them.

SharePoint: using an Indexing Pipeline Extension (IPE), the ows_productfull metadata will be parsed to extract the product name and version from the character string and be applied to the Coveo fields.
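Since Coveo IPEs are Python scripts, the parsing step could be sketched roughly as follows. This is a minimal illustration, not a complete IPE: the `Name vX.Y` convention and the helper name are assumptions based on the `ows_productfull` example above, and a real extension would read and write field values through Coveo's document API.

```python
import re

def split_product_full(ows_productfull):
    """Split a combined 'AwesomeProduct v1.2' string into (name, version).

    Assumes the SharePoint metadata follows a 'Name vX.Y' convention,
    as in the example table; adjust the pattern to the real data.
    """
    match = re.match(r"^(?P<name>.+?)\s*v(?P<version>[\d.]+)$",
                     ows_productfull.strip())
    if not match:
        # No recognizable version suffix: keep the raw value as the name.
        return ows_productfull.strip(), None
    return match.group("name"), match.group("version")
```

In the actual IPE, the two returned values would be mapped to the @productname and @productversion fields.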

JIRA: direct mapping can be applied in the source’s configuration. Simple!

Salesforce: the version is easily taken care of by concatenating the version and subversion information into the @productversion field. The product, however, does not have a name to refer to, only an id. This is solved by again using an IPE, which can go at least two ways:

  1. The IPE makes a call to Salesforce to retrieve the product name based on the id, then mapped to the @productname field. This guarantees best precision, but can add a significant amount of time to the indexing process.
  2. The IPE contains a static id/productname mapping which is used to populate the @productname field.
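The second option could look something like the following sketch. The lookup table and helper name are hypothetical; the sample id and field values come from the example table above, and in a real IPE the result would be written to the Coveo fields through the document API.

```python
# Hypothetical static id-to-name mapping, maintained alongside the IPE
# (option 2 above). Option 1 would call Salesforce instead of this table.
PRODUCT_NAMES = {
    "0000AgZxyz12355446": "Awesome Product",
}

def normalize_salesforce_fields(productid, productversion, productsubversion):
    """Return (@productname, @productversion) values for the Coveo index."""
    # Look up the human-readable name; fall back when the id is unknown.
    name = PRODUCT_NAMES.get(productid, "Unknown product")
    # Concatenate version and subversion, e.g. "1" + "2" -> "1.2".
    version = "{}.{}".format(productversion, productsubversion)
    return name, version
```

The trade-off is the one stated above: a static table indexes fast but must be kept in sync with Salesforce, while a live call guarantees precision at the cost of indexing time.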

Coveo can act as a guide to shape knowledge in your organization

Looking at all this, I imagine one cannot help but ask: so I basically need to be an expert on my own data if I want to get value out of it? Although having someone in a knowledge management role in your organization definitely helps, the short answer is no, you don’t need to be an expert. That’s because our Services team knows what questions to ask and how to explore data from your systems: to figure out what data elements matter to you in your daily workflows, and what data needs to be marked as measurable information to turn it into actionable knowledge, so that Coveo can normalize it and present it to you and your organization in the right context for you to make insightful decisions.

Making sense of all the data and information contained within a company’s systems is a complex task, one that requires hard thinking, exploration and discovery of what your ecosystem of information is made of. As the steward and user of this ecosystem, you are the best person to know what its important elements are.

In your journey to highlight and make the best use of this ecosystem, think of Coveo’s Services team as a compass to help you orient yourself, and Coveo’s platform as a GPS system that uses the map you build to guide everyone in your company to the right insight, every time. Contact us for a demo of Coveo to see its power in action.

About Francois Verpaelst

Francois Verpaelst is a Solution Architect at Coveo. He has over a decade of experience, from development to design to team lead, in software and IT consulting firms and has a passion for Intelligent Search. When he's not helping customers make the best of their Coveo solution, you can find him spending time with his wonderful wife and daughters, running, or practicing martial arts.
