Eduardo Silva

Week #3: The Plan

Updated: Jan 4, 2022

1. The dataset

As promised in the previous week, Hernani provided the team with a dataset containing baskets/receipts from a large supermarket chain in Argentina, with the following features:

Feature | Description
Receipt ID | Receipt/Basket identifier
Date | Date of the purchase
Category | Category of the product (e.g. Vegetables)
Product | Product designation (e.g. Tomato)
Quantity | Quantity/Units of the product that were bought
Cost | Total cost of the product in the receipt

It is worth noting that each entry of the dataset corresponds to an entry in a receipt, identified by its ID. As such, an entire receipt may span multiple entries.
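To make the entry-per-line structure concrete, here is a minimal sketch of how entries could be grouped back into baskets by their receipt ID. The rows, the tuple layout, and the `group_into_baskets` helper are all hypothetical and only mirror the features listed above; they are not taken from the real dataset.

```python
from collections import defaultdict

# Hypothetical rows shaped like the features above:
# (receipt_id, date, category, product, quantity, cost)
rows = [
    ("R001", "2021-12-01", "Vegetables", "Tomato", 2, 3.00),
    ("R001", "2021-12-01", "Dairy", "Milk", 1, 1.20),
    ("R002", "2021-12-01", "Vegetables", "Tomato", 1, 1.50),
]


def group_into_baskets(rows):
    """Collect the products of every entry that shares a receipt ID."""
    baskets = defaultdict(list)
    for receipt_id, _date, _category, product, _quantity, _cost in rows:
        baskets[receipt_id].append(product)
    return dict(baskets)


baskets = group_into_baskets(rows)
print(baskets)
```

Each key is one receipt and each value is the list of products bought together, which is the shape an association-rule algorithm typically expects.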


This week, Hernani could only provide a small sample of the dataset (a single day of receipts), which is enough for the team to analyze the available features and start some exploratory data analysis and pre-processing.


A larger and richer dataset will be provided next week, spanning 2-3 years of the receipt/basket history of one of the supermarkets. This may allow the training of interesting models, such as a Product Demand Forecast throughout the year based on previous years’ trends.

2. Proposed solution

Finally, with a dataset in the team’s hands, it is possible to propose a solution that will aim to solve the problem of Store Layout and Shelf Planning.


Since this is a placement/organization problem, the team believes that a genetic algorithm may be an interesting approach to solve it. It will produce successive generations of store/shelf layouts, with each generation expected to improve on the last. Layouts are evaluated with a set of heuristics, which may be supported by machine learning algorithms that extract knowledge from the available dataset.


These machine learning algorithms may range from Product Demand Forecasting, as stated before, based on a regression algorithm, to a Frequent Product Association ranking, based on the Apriori algorithm.
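As an illustration of the Frequent Product Association idea, the sketch below computes the first two levels of an Apriori-style search: frequent single products, then frequent pairs built only from those products. It is a simplified, standard-library-only example and not the team's implementation. The `frequent_pairs` function, the sample baskets, and the `min_support` value are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for product pairs whose support >= min_support."""
    n = len(baskets)

    # Level 1: fraction of baskets that contain each single product.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {
        item for item, count in item_counts.items() if count / n >= min_support
    }

    # Level 2: count only pairs made of frequent products (the Apriori pruning step).
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }


# Hypothetical baskets, one list of products per receipt.
baskets = [
    ["Tomato", "Lettuce", "Milk"],
    ["Tomato", "Lettuce"],
    ["Milk", "Bread"],
    ["Tomato", "Lettuce", "Bread"],
]
pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
```

A full Apriori run would keep extending frequent pairs into larger itemsets. The pair supports alone could already serve as a heuristic for which products or sections should sit close together.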


To sum this up, the team’s vision is portrayed in the following diagram:

Proposed Solution - Overview

It is worth noting that section and shelf layouts can and should be tackled as separate problems. The team believes that the best approach is to first apply a genetic algorithm to identify the best section layout and then, for each section, identify the best product placement on the shelves, also through a genetic algorithm.
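The first stage described above, finding a section layout with a genetic algorithm, can be sketched as follows. This is a toy example under strong assumptions: sections are placed along a single line, the association weights are made up, and the fitness function simply penalizes distance between associated sections. The names `SECTIONS`, `ASSOCIATION`, `fitness`, `order_crossover`, `swap_mutation` and `evolve` are all hypothetical. The same loop could in principle be reused per section for shelf placement with a different fitness function.

```python
import random

# Hypothetical sections and made-up association strengths between them.
SECTIONS = ["Vegetables", "Fruit", "Dairy", "Bakery", "Drinks"]
ASSOCIATION = {
    frozenset(("Vegetables", "Fruit")): 0.9,
    frozenset(("Dairy", "Bakery")): 0.7,
    frozenset(("Bakery", "Drinks")): 0.4,
}


def fitness(layout):
    """Higher is better: associated sections should sit close together."""
    position = {section: i for i, section in enumerate(layout)}
    penalty = 0.0
    for pair, weight in ASSOCIATION.items():
        a, b = tuple(pair)
        penalty += weight * abs(position[a] - position[b])
    return -penalty


def order_crossover(parent_a, parent_b, rng):
    """Copy a slice from parent_a and fill the rest in parent_b's order."""
    start, end = sorted(rng.sample(range(len(parent_a)), 2))
    middle = parent_a[start:end + 1]
    rest = [section for section in parent_b if section not in middle]
    return rest[:start] + middle + rest[start:]


def swap_mutation(layout, rng, rate):
    """With probability `rate`, swap two positions in a copy of the layout."""
    layout = list(layout)
    if rng.random() < rate:
        i, j = rng.sample(range(len(layout)), 2)
        layout[i], layout[j] = layout[j], layout[i]
    return layout


def evolve(generations=50, population_size=20, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [
        rng.sample(SECTIONS, len(SECTIONS)) for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # truncation selection
        children = [population[0]]  # elitism: the best layout always survives
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            child = order_crossover(a, b, rng)
            children.append(swap_mutation(child, rng, mutation_rate))
        population = children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Elitism is what guarantees that the best layout of a generation is never worse than the best of the previous one. In the proposed solution, the made-up weights would be replaced by heuristics derived from the trained models.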


3. The plan

As in the last project, the team decided to establish a set of milestones to be reached in each week of the challenge's development, although this plan is subject to change:


Week #3 (13/12 - 17/12)

  • Obtain the receipt/basket dataset from Hernani;

  • Identify the problem and propose/define a solution.


Week #4 (20/12 - 23/12)

  • Perform an initial analysis and pre-processing on the dataset;

  • Study the tools and technologies that will be used in the implementation (e.g. Jupyter, Scikit-learn, PyGAD);

  • Write the introduction of the scientific article.

Week #5 (3/1 - 7/1)

  • Clean and preprocess the original raw dataset;

  • Perform an exploratory data analysis on the dataset;

  • Write the chapter "State-of-the-Art" of the scientific article.


Week #6 (10/1 - 14/1)

  • Perform the training and evaluation of the necessary machine learning models (i.e. Apriori and Regression algorithms);

  • Implement a genetic algorithm for both the store layout and shelf planning;

  • Write the chapter "Proposed Solution" of the scientific article.


Week #7 (17/1 - 21/1)

  • Integrate the trained machine learning models with the genetic algorithms;

  • Implement a back-end that grants access to the solution through a REST API;

  • Write the chapter "Solution Implementation" of the scientific article, following the CRISP-DM model.


Week #8 (24/1 - 28/1)

  • Implement a front-end that interacts with the back-end and presents the obtained solution(s);

  • Write the conclusion and abstract of the scientific article;

  • Review and refine the scientific article.

In the following weeks' posts, the team will revisit the plan and verify whether everything is on track by comparing the obtained results with the goals set in the challenge roadmap.
