DATA MINING CUP 2020

21st edition: 162 teams from 126 universities in 35 countries

It is no secret that the ability to optimize stocks provides many benefits for retail companies. Different advantages may accrue, depending on the type of company, its strategy and its situation. For example, optimized stocks allow store-based retailers to downsize their storage space and enlarge the sales area to provide a more open and inviting shopping experience. Online retailers, on the other hand, may be able to scale up their business without relocating their entire operations to a larger facility.
Overall, optimized inventory planning helps to reduce the number of slow-moving goods, because retailers only stock products that people actually buy. This, in turn, means that customers no longer have to be sent away because products are temporarily unavailable, which increases both revenues and customer satisfaction. Moreover, fewer slow-moving goods mean less reorganization, accounting and clearance, which reduces the working time required and the outlay for logistical services.
For these reasons, forecasting demand is the focus of this year’s DATA MINING CUP.

Scenario

An established retailer wants to optimize its inventory planning in order to significantly reduce not only its storage space, but also its costs and its need for logistical operations. It plans to restock its inventory every other week and to keep in stock only the items that it actually sells during that period.
The goal of the participating teams is to create a machine learning model that predicts the demand for every product over the two-week period. It is important to point out that some products will be promoted for limited periods of time. Products that are promoted during the simulation period will be earmarked. However, the transaction data needs to indicate whether a product is being promoted during the training period. Finally, the model does not need to be able to respond to price changes during the simulation period: to simplify matters, prices will remain unchanged throughout that period.
To create this model, the teams receive the exact time of every transaction over a period of six months, along with other features that describe the products.

Task

Historical data must be used to create a machine learning model to reliably forecast the demand for each item in the “items.csv” file for a period of 14 days. Use the period starting on 30 June 2018 00:00:00, the day after the last date from the transaction files.
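The forecast window and a simple reference point can be sketched as follows. This is only a naive baseline under stated assumptions, not the organizers' method or a recommended model: it predicts each item's 14-day demand as the quantity sold in the 14 days before the window. The function name `naive_forecast`, the tuple layout of `orders` and the toy data are hypothetical; only the start date and the 14-day horizon come from the task description.

```python
from datetime import datetime, timedelta

# Forecast window from the task description: 14 days starting 30 June 2018.
HORIZON_START = datetime(2018, 6, 30)
HORIZON_DAYS = 14
HORIZON_END = HORIZON_START + timedelta(days=HORIZON_DAYS)  # exclusive end


def naive_forecast(item_ids, orders, window_days=HORIZON_DAYS):
    """Predict each item's demand as its sales in the last `window_days`
    before the forecast window (a hypothetical baseline, not a real model)."""
    window_start = HORIZON_START - timedelta(days=window_days)
    # Start every listed item at zero so items without sales still get a row.
    forecast = {item_id: 0 for item_id in item_ids}
    for timestamp, item_id, quantity in orders:
        if item_id in forecast and window_start <= timestamp < HORIZON_START:
            forecast[item_id] += quantity
    return forecast


# Toy data (assumed layout): (timestamp, item id, ordered quantity).
items = [1, 2, 3]
orders = [
    (datetime(2018, 6, 10, 12, 0), 1, 5),   # before the 14-day window, ignored
    (datetime(2018, 6, 20, 9, 30), 1, 2),
    (datetime(2018, 6, 29, 23, 59), 1, 1),
    (datetime(2018, 6, 25, 14, 0), 2, 4),
]

forecast = naive_forecast(items, orders)
print(HORIZON_END)  # 2018-07-14 00:00:00
print(forecast)     # {1: 3, 2: 4, 3: 0}
```

Initializing the forecast from the full item list, rather than from the orders, ensures that every item receives a prediction even if it was never sold in the look-back window.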
The historical demand for an item (e.g. its daily demand) can be derived from the “orders.csv” file by aggregating the orders for each item (e.g. per day). The “orders.csv” file is not pre-aggregated; as a result, participants are free to choose the granularity of their time steps.
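A minimal sketch of such a daily aggregation, using only the Python standard library, might look like this. The column names (`time`, `itemID`, `order`), the `|` delimiter and the sample rows are assumptions made for illustration; the actual file layout is defined in the Data section.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Tiny stand-in for "orders.csv". Column names and the "|" delimiter are
# assumptions for this sketch, not the official schema.
SAMPLE_ORDERS = """time|itemID|order
2018-06-28 09:15:00|1|2
2018-06-28 17:40:00|1|1
2018-06-29 08:05:00|1|4
2018-06-29 12:30:00|2|1
"""


def aggregate_daily_demand(csv_file):
    """Sum the ordered quantities per (item id, calendar day)."""
    demand = defaultdict(int)
    for row in csv.DictReader(csv_file, delimiter="|"):
        day = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S").date()
        demand[(int(row["itemID"]), day)] += int(row["order"])
    return dict(demand)


daily_demand = aggregate_daily_demand(io.StringIO(SAMPLE_ORDERS))
for (item_id, day), quantity in sorted(daily_demand.items()):
    print(item_id, day.isoformat(), quantity)
```

Replacing `.date()` with a coarser or finer key (for example the ISO week or the hour) changes the time step without touching the rest of the function.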
In addition to time-dependent features, participants are allowed to use any attribute provided by the “items.csv”, “orders.csv” and “infos.csv” files.
The solution file must match the specifications described in the Data section, where applicable. Incorrect or incomplete submissions cannot be assessed.