Sales prediction using Machine Learning

One of the capabilities offered by Machine Learning (ML) is time series prediction. Applications include forecasting economic, weather and climate, health, or business indicators.

In a Business Intelligence (BI) environment, sales prediction is especially valuable. Among other things, it would allow us to manage procurement, stock, and logistics more efficiently, or to rethink marketing actions.

Prediction algorithms

As mentioned in the article Artificial Intelligence applied to BI, there are currently multiple tools available to develop Machine Learning algorithms, many of which are accessible to small and medium-sized companies.

For time series prediction we can currently rely on classical statistical and machine learning algorithms, such as ARIMA, Exponential Smoothing, Linear Regression, or Random Forest, and on algorithms based on neural networks, such as classical recurrent networks, LSTM, N-BEATS, or Transformer-type models.
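
As a point of reference for the classical family, the following is a minimal sketch of simple exponential smoothing in plain Python. The function name, the smoothing factor `alpha`, and the sales figures are illustrative assumptions and do not come from any particular library.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, with level_0 = y_0.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not values:
        raise ValueError("values must not be empty")
    levels = [float(values[0])]
    for y in values[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels


# Hypothetical monthly sales, for illustration only.
monthly_sales = [100, 120, 110, 130]
levels = simple_exponential_smoothing(monthly_sales, alpha=0.5)
# The one-step-ahead forecast is the last smoothed level.
next_month_forecast = levels[-1]
```

A larger `alpha` makes the forecast react faster to recent sales, while a smaller one gives more weight to the older history.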

Many of the breakthroughs that are advancing AI applications such as natural language processing (NLP) by leaps and bounds have been made possible by the emergence of Transformers. Broadly speaking, this model uses attention mechanisms to determine which information or elements are relevant and which are not. Transformers are proving successful in more and more applications, such as computer vision, where they compete with the widely used convolutional networks, and, in the case at hand, time series prediction.
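
To make the idea of attention more tangible, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. It is a simplified illustration only: the function names and the toy vectors are assumptions, and a real Transformer adds learned projections, multiple heads, and batching.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over a short sequence."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is a weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Toy vectors (illustrative only): the second key matches the query best.
query = [1.0, 0.0]
keys = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
values = [[10.0], [20.0], [30.0]]
weights, output = attend(query, keys, values)
```

The element whose key is most similar to the query receives the largest weight, which is how the model decides what is relevant at each step.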

Temporal Fusion Transformer

Going further in our goal of predicting our organization’s sales, a more recent neural network model is the Temporal Fusion Transformer (TFT), which combines mechanisms from several previous models, including LSTM-type layers and the attention components of Transformers (attention heads).

One of the advantages offered by neural models such as TFT over non-neural models is the possibility of being trained on more than one time series. They even offer the possibility of making predictions on series with which the network has not been trained.

Sales forecasting

In our sales prediction example, we could train our network with the individual sales history of each product, or with series defined by multiple dimensions such as the product category, the sales channel, or the region or country in which the sale is made. Depending on the number of dimensions used in training and the values that each of them can take, the input could be a very large number of time series. Used to train a model such as TFT, these series would allow us to make predictions on a specific dimension or on several of them.
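
As a rough sketch of how the choice of dimensions multiplies the number of series, the following plain Python snippet aggregates a handful of sales records into one series per combination of the chosen dimensions. The records, field names, and the `build_series` helper are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (month, product, region, units sold).
records = [
    ("2024-01", "cola", "north", 120),
    ("2024-01", "cola", "south", 90),
    ("2024-01", "beer", "north", 60),
    ("2024-02", "cola", "north", 130),
    ("2024-02", "cola", "south", 95),
    ("2024-02", "beer", "north", 70),
]


def build_series(rows, key_fields):
    """Aggregate units per month for every combination of the chosen dimensions."""
    field_index = {"product": 1, "region": 2}
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        key = tuple(row[field_index[name]] for name in key_fields)
        totals[key][row[0]] += row[3]
    # Sort by month so each series is in chronological order.
    return {key: [units for _, units in sorted(months.items())]
            for key, months in totals.items()}


by_product = build_series(records, ["product"])
by_product_region = build_series(records, ["product", "region"])
```

With only two products and two regions the number of series already grows from two to three; with real catalogs, channels, and countries it can quickly reach the thousands.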

In deep learning, several models have shown improved prediction effectiveness after being trained with tens of thousands of time series.

Another feature of many Machine Learning models, and specifically TFT, is that training can draw not only on the history of the target series but also on other series which, although they are not prediction targets, could further improve the effectiveness of our model. These series are called “covariates”. Past covariates are series whose values are known only up to the present, as could be the case for industry sales, nationally or internationally, or the evolution of the economy in general. As a more concrete example, if our product were soft drinks or beers, it could be interesting to use as covariates the evolution of temperatures, or the schedule of events such as concerts, special days, etc.

Many models also support future covariates, whose values are known in advance for the prediction period, such as weather forecasts for the next 7 days.
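
The difference between the two kinds of covariates can be sketched by looking at which slices of data are available at forecast time. The snippet below is a simplified illustration in plain Python; the function, its parameters, and the daily figures are assumptions and do not reflect the input format of any specific library.

```python
def build_model_inputs(target, past_cov, future_cov, origin, lookback, horizon):
    """Slice the inputs available when forecasting `horizon` steps from `origin`.

    `origin` is the index of the first step to predict.
    """
    start = origin - lookback
    if start < 0 or origin > len(target) or origin + horizon > len(future_cov):
        raise ValueError("not enough data for the requested window")
    return {
        # Observed only up to the forecast origin.
        "target_history": target[start:origin],
        "past_covariates": past_cov[start:origin],
        # Known in advance, so the slice extends across the horizon.
        "future_covariates": future_cov[start:origin + horizon],
    }


# Hypothetical daily data, for illustration only (indices start at 0).
units_sold = [50, 55, 53, 60, 62]            # target, observed through index 4
industry_index = [1.0, 1.1, 1.0, 1.2, 1.3]   # past covariate, observed through index 4
temperature = [21, 23, 22, 25, 27, 28, 24]   # measured, then forecast for indices 5 and 6

inputs = build_model_inputs(units_sold, industry_index, temperature,
                            origin=5, lookback=3, horizon=2)
```

Here the future covariate window is two steps longer than the others, because the temperature forecast covers the days we want to predict.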

Probabilistic prediction

Some Machine Learning models available for time series prediction can produce probabilistic forecasts, as opposed to classical or deterministic ones. While a deterministic prediction gives a specific value for a given instant, a probabilistic prediction gives a range of possible values for that instant with a certain degree of confidence. The range will be wider or narrower depending on the uncertainty in the prediction.

Figure: Example of probabilistic prediction

This makes it possible to detect, for example, periods in which seasonality is strong and the trend is very clear, which yield narrower ranges, or, on the contrary, periods where sales have varied more throughout the history, which would suggest a deeper analysis to find out, for example, whether there are covariates that could be relevant in those periods.
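
One common way to obtain such a range is to draw many sampled forecasts for the same instant and read off quantiles. The sketch below assumes that approach and uses made-up samples; the helper names and the 80% coverage are illustrative choices, not the method of any particular model.

```python
def quantile(sorted_samples, q):
    """Empirical quantile with linear interpolation between order statistics."""
    position = q * (len(sorted_samples) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_samples) - 1)
    fraction = position - lower
    return sorted_samples[lower] * (1 - fraction) + sorted_samples[upper] * fraction


def prediction_interval(samples, coverage=0.8):
    """Return (low, median, high) so that about `coverage` of the samples lie inside."""
    ordered = sorted(samples)
    tail = (1 - coverage) / 2
    return quantile(ordered, tail), quantile(ordered, 0.5), quantile(ordered, 1 - tail)


# Hypothetical sampled forecasts for two future weeks (illustrative numbers).
stable_week = [98, 100, 101, 99, 102, 100, 97, 103, 100, 101, 99]
volatile_week = [60, 140, 95, 120, 80, 105, 150, 70, 110, 90, 100]

stable = prediction_interval(stable_week)
volatile = prediction_interval(volatile_week)
stable_width = stable[2] - stable[0]
volatile_width = volatile[2] - volatile[0]
```

Both weeks have the same median forecast, but the much wider interval for the volatile week signals that its prediction deserves less trust and possibly further analysis.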

Conclusion

Concretely, for a hypothetical case, we could therefore consider training a neural network based on the TFT model on multiple time series, supported by both past and future covariate series.

This requires analyzing and determining which target series would provide value for our organization if we could predict them, and which other series could support their prediction. The ultimate aim is a useful tool for decision making.

Some of the tools that greatly facilitate training and prediction on time series and that are available for free are Darts and PyTorch Forecasting. Both are based on the PyTorch library and use the Python programming language. PyTorch, together with TensorFlow, is one of the most widely used frameworks in ML today.

In future articles (Sales Forecasting – Case Study, Sales Forecasting – Case Study Pt. 2) we will describe a concrete case of sales forecasting using one of these tools.
