1. Sales forecasting
Having pre-processed and analysed the data, the team was now ready to train regression models to forecast product and category sales throughout a year.
After some research, the team decided to experiment with two algorithms: linear/polynomial regression and neural networks, as both seem to be among the most used and effective algorithms for sales forecasting. Neural networks in particular are very effective because they can model non-linear relationships, at the cost of being more computationally expensive to train and tune.
After some experimentation and analysis, the team found neural networks to be generally more accurate than an equivalent polynomial regression model. For instance, the team trained two models to predict vegetable sales throughout the year, and the neural network came out on top, with a correlation coefficient of around 0.70 compared to 0.61 for the polynomial regression model.
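As a rough illustration of how such a correlation coefficient can be obtained, the sketch below fits a polynomial regression to weekly sales and computes the Pearson correlation between observed and predicted values on held-out weeks. The data is synthetic, and the function names, polynomial degree and hold-out scheme are assumptions made for this example rather than the team's actual setup.

```python
import numpy as np

def fit_polynomial(t, y, degree=3):
    """Fit a least-squares polynomial and return it as a callable model."""
    return np.poly1d(np.polyfit(t, y, degree))

def correlation(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Synthetic weekly sales with a seasonal shape (illustrative only, not project data).
rng = np.random.default_rng(0)
weeks = np.arange(1, 53)
t = weeks / 52  # scale the input to keep the fit well conditioned
sales = 100 + 30 * np.sin(2 * np.pi * t) + rng.normal(0, 10, weeks.size)

# Hold out every fourth week for evaluation (an assumed split).
test_mask = weeks % 4 == 0
model = fit_polynomial(t[~test_mask], sales[~test_mask], degree=3)
r = correlation(sales[test_mask], model(t[test_mask]))
print(f"correlation on held-out weeks: {r:.2f}")
```

The same `correlation` helper could be applied to the predictions of a neural network, so that both models are compared on identical held-out weeks.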
It is worth noting that, having data from both 2019 and 2020, the team thought it would be interesting to train two different sets of models to predict sales under pandemic and non-pandemic circumstances, as customer behaviour is likely to differ between the two situations.
For instance, the following graph shows that in March 2020, at the start of the pandemic, there was an abrupt decrease in take-away sales, one of the highest-demand categories at the supermarket being studied, while the demand for cleaning/hygiene products rose significantly.
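One simple way to obtain separate pandemic and non-pandemic models is to split the records by year and fit one model per group. The sketch below does this with a polynomial fit on synthetic data. The size and timing of the sales drop, the polynomial degree and the `fit_per_year` helper are hypothetical and only serve the example.

```python
import numpy as np

def fit_per_year(years, weeks, sales, degree=2):
    """Fit one polynomial model per year and return {year: model}."""
    models = {}
    for year in np.unique(years):
        mask = years == year
        coeffs = np.polyfit(weeks[mask] / 52, sales[mask], degree)
        models[int(year)] = np.poly1d(coeffs)
    return models

# Synthetic take-away sales for two years (illustrative only, not project data).
rng = np.random.default_rng(1)
weeks = np.tile(np.arange(1, 53), 2)
years = np.repeat([2019, 2020], 52)
sales = 200 + rng.normal(0, 5, weeks.size)
# Hypothetical drop from week 11 of 2020 onwards, mimicking the start of the pandemic.
sales = sales + np.where((years == 2020) & (weeks >= 11), -80, 0)

models = fit_per_year(years, weeks, sales)
for year, model in models.items():
    print(year, round(float(model(30 / 52)), 1))  # forecast for week 30
```

Keeping the two models apart lets the forecast for a given week be requested under either scenario, instead of averaging two very different demand patterns into one curve.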
2. Deployment
This week, the team felt that the work was better split between two teams of two people. The first team was responsible for the sales forecasting problem, while the second worked on the back-end module where the solution will be deployed, including the genetic and machine learning algorithms. The front-end will interact with the solution through a REST API, thus completing the web application described in the following component diagram.
This back-end module is planned to provide two endpoints: (i) obtain the optimal plan for a given store and time of year; (ii) obtain the sales forecast for a given product/category for a specific time of year.
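To make the two planned endpoints more concrete, the sketch below routes GET request URLs to two placeholder handlers using only the Python standard library. The paths (`/plan`, `/forecast`), the parameter names (`store`, `item`, `week`) and the response shapes are assumptions for illustration; the post does not specify the actual API design or the framework used.

```python
from urllib.parse import urlparse, parse_qs

def get_plan(store, week):
    # Placeholder: a real handler would run the genetic algorithm here.
    return {"store": store, "week": week, "plan": []}

def get_forecast(item, week):
    # Placeholder: a real handler would query a trained forecasting model here.
    return {"item": item, "week": week, "forecast": None}

# Hypothetical routes: path -> (handler, required query parameters).
ROUTES = {
    "/plan": (get_plan, ("store", "week")),
    "/forecast": (get_forecast, ("item", "week")),
}

def handle(url):
    """Return (status_code, body) for a GET request URL."""
    parsed = urlparse(url)
    route = ROUTES.get(parsed.path)
    if route is None:
        return 404, {"error": "unknown endpoint"}
    handler, required = route
    query = parse_qs(parsed.query)
    missing = [name for name in required if name not in query]
    if missing:
        return 400, {"error": "missing parameters: " + ", ".join(missing)}
    args = {name: query[name][0] for name in required}
    try:
        args["week"] = int(args["week"])
    except ValueError:
        return 400, {"error": "week must be an integer"}
    return 200, handler(**args)

print(handle("/forecast?item=vegetables&week=12"))
```

Keeping the routing separate from the handlers means the placeholders can later be swapped for the real genetic algorithm and forecasting models without changing how the front-end calls the API.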
It is worth noting that although the two teams were working on different problems, they stayed in touch, exchanged ideas and reported on the current state of their work, which kept everyone synchronized and up to date.
3. Scientific article
With the state-of-the-art chapter due this week, the team reviewed what had been written last week and also made progress on the other chapters, namely the solution proposal and overview, which was already covered in an earlier post.
Besides the overview, the team also took the opportunity to reflect on the advantages and disadvantages of the proposed solution compared to existing alternatives.
For instance, the modular nature of the genetic algorithm, support for different types of layouts, adaptation to the season/time of year and the innovative integration of the machine learning algorithms with the genetic algorithm all emerge as significant advantages for the proposed solution.
On the other hand, the reliance on historical data and the difficulty in adapting to unknown circumstances come up as disadvantages when compared to solutions that may rely on more data and/or a more programmatic, rule-based approach.
4. Week retrospective
In retrospect, like the previous week, this one was rather productive, with the team making progress on different fronts and laying a solid foundation for next week's main topic: the genetic algorithm implementation, a critical component of the proposed solution's success.
Similar progress is also expected on the scientific article, as the team expects to have the 8th (and final) week available to review the article and write the conclusion and abstract, effectively concluding its writing.