Semantic Similarity Matching Using Contextualized Representations (In a Nutshell)

Semantic similarity matching between texts is a sub-task of Natural Language Understanding (NLU). It has a wide range of applications, such as information retrieval, question answering, and paraphrase detection.

Traditionally, machine learning models based on word frequency or word embedding representations were used for similarity evaluation. While these models can be fast and effective in many cases, they fail to capture the semantic similarity between two pieces of text when the same concept is expressed in different words.
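To make that limitation concrete, here is a minimal sketch of a word-frequency similarity measure (cosine similarity over raw word counts). It is not taken from the paper; the function name `bow_cosine` and the example sentences are assumptions chosen purely for illustration.

```python
import math
from collections import Counter


def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between the raw word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    shared_words = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared_words)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Same intent, no shared words: the score is zero.
paraphrase_score = bow_cosine(
    "how can i reset my password",
    "steps to recover forgotten login credentials",
)

# Different intent, heavy word overlap: the score is high.
overlap_score = bow_cosine(
    "how can i reset my password",
    "how can i reset my router",
)
```

In this toy case the paraphrase pair scores lower than the pair that merely shares surface words, which is exactly the failure mode described above.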

More recently, transformer-based language models have achieved strong performance in textual sequence matching. The problem is that these models are computationally expensive and require powerful graphics processing units (GPUs). These constraints make it vital to develop models that scale well while remaining effective on the performance metrics that matter.

Focusing on scalability and performance metrics, different approaches have been used for semantic similarity matching. These approaches generally fall into one of the following categories:

Those based on the interaction between two textual sequences.
Those based on static representations, paving the way for the pre-computation and reusability of those representations.
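The pre-computation idea behind the second category can be sketched as follows. This is only an illustration under loud assumptions: `toy_encode` is a hypothetical stand-in for a real sentence encoder (it just buckets words into a small count vector), and `RepresentationIndex` is a made-up helper name, not something from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def toy_encode(text: str, dim: int = 16) -> Vector:
    """Hypothetical stand-in for a sentence encoder: buckets words into a count vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Deterministic bucket per word (avoids Python's randomized hash()).
        vec[sum(ord(ch) for ch in word) % dim] += 1.0
    return vec


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class RepresentationIndex:
    """Encodes every candidate once, then reuses those vectors for each query."""

    def __init__(self, candidates: Sequence[str],
                 encode: Callable[[str], Vector] = toy_encode) -> None:
        self.encode = encode
        self.candidates = list(candidates)
        # Pre-computation step: done once, independently of any query.
        self.vectors = [encode(c) for c in self.candidates]

    def search(self, query: str) -> Tuple[str, float]:
        query_vec = self.encode(query)  # the only encoder call at query time
        scores = [cosine(query_vec, v) for v in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.candidates[best], scores[best]


index = RepresentationIndex([
    "reset your password",
    "update billing details",
    "export search analytics",
])
best_match, best_score = index.search("reset your password")
```

Because the candidate vectors never depend on the query, they can be stored and reused, and each new query costs one encoding plus cheap vector comparisons.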

But which approach is the most effective? That’s what Coveo data scientists looked into in their recent paper: Semantic Similarity Matching Using Contextualized Representations.

They experimented with three types of models to determine which is the most effective in terms of performance and required inference time:

An interaction-based model
A representation-based model
A new type of model that combines the mechanisms of interaction-based and representation-based models.

The results showed that the interaction-based model achieved the best accuracy. However, this model type isn’t efficient enough for retrieval tasks, where each new query needs to be compared with many other samples.
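A rough way to see why retrieval is costly for this model type: an interaction-based model has to process the query together with each candidate, so nothing can be computed ahead of time. The sketch below counts those joint passes. `toy_pair_score` is a hypothetical stand-in for a real joint model (here simply word overlap), and all names are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


class CountingPairScorer:
    """Wraps a pair-scoring function and counts how many times it is invoked."""

    def __init__(self, score_pair: Callable[[str, str], float]) -> None:
        self.score_pair = score_pair
        self.calls = 0

    def __call__(self, query: str, candidate: str) -> float:
        self.calls += 1
        return self.score_pair(query, candidate)


def toy_pair_score(query: str, candidate: str) -> float:
    """Hypothetical stand-in for a joint model: Jaccard overlap of the two word sets."""
    q_words = set(query.lower().split())
    c_words = set(candidate.lower().split())
    union = q_words | c_words
    return len(q_words & c_words) / len(union) if union else 0.0


def rank_with_interaction(query: str, candidates: Sequence[str],
                          scorer: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    # Every candidate requires its own joint pass with the query.
    scored = [(c, scorer(query, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


scorer = CountingPairScorer(toy_pair_score)
candidates = [
    "reset your password",
    "update billing details",
    "export search analytics",
]
ranking = rank_with_interaction("how do i reset my password", candidates, scorer)
```

With a real transformer in place of the toy scorer, each of those calls would be a full forward pass, so the cost per query grows linearly with the number of candidates.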

Conversely, the representation-based model achieved the lowest accuracy among the models tested, but it was also the most efficient in terms of inference time. It can therefore be used in situations where there’s no access to GPUs, or in use cases where inference time is the most important factor.

Finally, when GPU resources are available, the model combining the mechanisms of interaction-based and representation-based models achieves better accuracy than the representation-based model and is more efficient than the interaction-based model.

The answer to the “Which approach is the most effective?” question therefore depends on the use case in which the model is used. Does it need to be used for retrieval tasks? Does it have to provide a very low inference time? Is there access to powerful GPUs? This was insightful for our own use cases, as Coveo models have to operate in various environments, and finding the right balance between accuracy and inference time to best adapt to each of our customers’ needs and expectations is key to delivering a relevant service.

Those are questions and challenges that an NLP specialist would regularly face at Coveo. We are always looking for new colleagues, so join the Coveolife if this interests you! For more details on the methods and the architecture of the models used for this experimentation, we invite you to read the full Semantic Similarity Matching Using Contextualized Representations paper, which was published at the 34th Canadian Conference on Artificial Intelligence.