As devoted readers know, we often remark that a huge portion of Ecommerce research is hardly relevant for practitioners outside of the few retail giants who lead the space.
This is due to the peculiar nature of their data distribution and computational resources. Companies like Amazon or Walmart, which are swimming in data and have many repeat customers, are looking to solve problems that might not immediately translate to smaller retailers and brands. The same goes for the solutions: the methods adopted by companies that collect humongous amounts of data are not immediately applicable to the rest of us.
At Coveo, we recognize the need for a trade-off between competitiveness and open (science+source) initiatives. We also believe that every organization at the forefront of AI should think long and hard about how to contribute back to the scientific community. That’s why we are thrilled to be the host of the SIGIR eCom Data Challenge.
Call for Ecommerce Researchers
The Special Interest Group on Information Retrieval, better known as SIGIR, is the world’s top venue for elite organizations to showcase their breakthroughs in search and recommendations every year.
The group spans many industries and has specific areas of interest as well. In 2020, Coveo presented multiple papers in SIGIR eCom, a premier venue to publish works in AI applied to ecommerce. A bit of a humble brag: we shared the stage with leading players in the space such as Alibaba, Shopify, Amazon and Etsy.
This year, Coveo took it up a notch by being selected as the host of the challenge. Previous hosts include industry behemoths such as Rakuten and eBay.
In addition to finding ourselves among the world’s foremost data scientists, we are particularly excited by our role because SIGIR 2021 was supposed to be held in Montreal. And while it’s true the conference will be fully virtual due to the COVID-19 pandemic, it’s still a great opportunity to showcase the AI coming out of Québec.
There is no progress in AI without high-quality datasets. And there is no progress for the AI community if high-quality datasets are not made available to researchers. For this reason, we are particularly proud to be hosting the SIGIR eCom Data Challenge.
The Data Challenge will allow teams from all over the globe to put their modeling skills to the test on high-impact use cases. To do that, Coveo will provide a new anonymized ecommerce dataset of unprecedented breadth. Teams will get more than 30M (fully anonymized and hashed) browsing events, generated over several million shopping sessions by real users.
Together with clicks and views, the dataset includes (anonymized) search interactions and content-based embeddings pre-calculated from the catalog features.
Solving Session-Based Product Recommendations
Armed with this wealth of data, researchers will be asked to come up with original solutions to two core problems in ecommerce. They will be asked to use session data to advance product recommendations and cart abandonment predictions.
Product recommendations are a tried and tested strategy for ecommerce websites of all shapes and sizes. It’s hard to overstate the excitement, and the gains for the industry, that an improvement on benchmarks would bring.
The challenge will focus specifically on the following problem: given anonymous information on product and search interactions during a certain shopping session, can we build a machine learning model that reliably predicts shoppers’ next actions?
This session-based approach to recommendation is particularly relevant when it comes to personalization. The whole idea of using only the behavioural data from the sessions is rooted in the growing need to provide personalization to users as fast as possible, on the basis of the actions undertaken by the user during the very same session.
We believe that this approach to the problem is all the more important for the industry because it addresses the need for reliable personalization strategies on all those ecommerce websites where users do not necessarily register, leaving retailers with nothing more than anonymous information about user behaviour.
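To make the session-based framing concrete, here is a minimal Python sketch of a very simple baseline: count which product tends to follow which, then rank candidates using only the last product seen in the current session. It is only an illustration and not the challenge's official baseline or evaluation; the function names (`fit_transitions`, `predict_next`), the product IDs, and the toy sessions are all made up for the example and do not reflect the actual dataset schema.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count how often each product is immediately followed by another one."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, session, k=3):
    """Rank up to k candidate next products from the last item in the session."""
    if not session:
        return []
    counts = transitions.get(session[-1], Counter())
    return [item for item, _ in counts.most_common(k)]


# Toy sessions of hashed product IDs (hypothetical, for illustration only).
sessions = [
    ["p1", "p2", "p3"],
    ["p1", "p2", "p4"],
    ["p2", "p3"],
    ["p5", "p1", "p2", "p3"],
]
model = fit_transitions(sessions)

# No user profile is needed: only the current anonymous session is used.
print(predict_next(model, ["p9", "p2"]))  # ['p3', 'p4']
```

A baseline like this ignores search interactions and content embeddings entirely, which is exactly the kind of signal a stronger model could exploit.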
Determining Intent Prediction
While somewhat less popular, cart abandonment represents a crucial example of what is more generally called intent prediction. For the cart abandonment task, the problem is framed in the following way: given an anonymous session that includes (among other interactions) an add-to-cart event, can we predict whether the user will complete the purchase before the end of the current session? If so, how many clicks after the add-to-cart event do we need to be sure?
Given our previous research on the topic, we are interested not just in solving the sequence classification task, but also in qualitative considerations that could improve our understanding of the general problem, and the applicability of the findings.
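As a rough illustration of the framing above, the Python sketch below turns one session into a training example for sequence classification: the model sees the session up to a fixed number of events after the first add-to-cart, and the label says whether a purchase follows. The event names (`"view"`, `"add"`, `"remove"`, `"purchase"`) and the helper `make_example` are assumptions made for this example; they are not taken from the actual dataset or from the official task definition.

```python
ADD = "add"            # hypothetical name for the add-to-cart event
PURCHASE = "purchase"  # hypothetical name for the purchase event


def make_example(events, n_after):
    """Build one (prefix, label) pair from a session, or None without add-to-cart.

    events: ordered list of action names for a single session.
    n_after: how many events after the first add-to-cart the model may see.
    """
    if ADD not in events:
        return None
    first_add = events.index(ADD)
    after_add = events[first_add + 1:]
    label = PURCHASE in after_add
    window = after_add[:n_after]
    # Stop the visible window before any purchase so the label cannot leak.
    if PURCHASE in window:
        window = window[: window.index(PURCHASE)]
    return events[: first_add + 1] + window, label


bought = ["view", "add", "view", "view", "purchase"]
abandoned = ["view", "add", "view", "remove"]

print(make_example(bought, 1))     # (['view', 'add', 'view'], True)
print(make_example(abandoned, 2))  # (['view', 'add', 'view', 'remove'], False)
```

Varying `n_after` and measuring accuracy at each value is one simple way to explore how many clicks after the add-to-cart are needed before a prediction becomes reliable.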
See You (on the Leaderboard), Space Cowboys
Participation in the challenge and access to the dataset are completely free, and simply subject to our research-friendly T&C. The call for registrants opens today, April 21, and closes June 1. To learn more, please check the public repository. If you have questions about the data release or ideas for new areas of research (even unrelated to those highlighted above), don’t hesitate to get in touch!
The Coveo Data Challenge is yet another successful collaboration between us and academia. First of all, a huge thank-you to Surya Kallumadi and the SIGIR eCom committee for making this initiative possible, and a shout-out to Luca, Patrick, and Coveo’s legal team for their support; second, we wish to thank our colleagues Jean-Francis and Christine; finally, we are lucky to have Giovanni Cassani and Federico Bianchi on board for this project, and we are even luckier to count them among Coveo’s friends.
While the road ahead to make ecommerce research relevant for the bulk of the market is still long, we know the wisdom in the old saying: a “journey of a thousand miles starts with one step.” In our case, it is a step with 30 million events and an international data challenge.
– Jacopo & Ciro, on behalf of the Data Challenge committee.