A4E Case Team Mentors:
 Alexander Efremov (aefremov@gmail.com)
A4E Case Team:
 Gergana Damyanova (gerganavaldamyanova@gmail.com), Irena Lazarova (iglazarova@telenor.bg), Stanislav Georgiev (s.georgiev@softel.bg), Martin Boyanov (mboyanov@gmail.com), Doychin Damyanov (doychin.damyanov@gmail.com), Vladimir Vutov (statistikavladi@gmail.com)
Team Toolset:
 Python: gensim, Keras, TensorFlow, scikit-learn (Random Forest regression), PyWavelets
 R: Apriori
 IBM SPSS Modeler
 SQL Server Express 2016 (SQL), Excel
Business Understanding

Who: Retail Client

What (1): Optimal Recommendation for Combined Offer (CO) for next week

What (2): Market Basket Analysis (MBA)

https://gitlab.com/datasciencesociety/case_a4e/tree/master/case_study
Data Understanding
 Explore datasets' structure
 Discover potential variable dependencies across datasets
 Identify the subsets of data that modelling would be based on
Data Preparation
 Raw Data:
 Dataset 1 > Data Type: Transactions > Data Format: CSV
 Dataset 2 > Data Type: Weather Data > Data Format: CSV
 EDA + ETL + Prep:
 Remove: Duplicates and data waste based on business rules
 Aggregate: Product Total Sales per Day
 Attach: Daily weather data
 Variable reduction: removed "low-varying" variables and predictors weakly correlated with the dependent variables (for Sales Volume Prediction Approach 2).
 For the target forecast, two approaches were taken: forecasting on weekly aggregated time intervals and on a daily basis. Outliers and extreme values were identified, and we verified that no variable has a large percentage of missing/outlier values. The dataset was split 70%/30% into training/testing sets for Sales Volume Prediction Approach 2.
 Produce up to 24 random permutations of each transaction to increase the size of the dataset needed for Neural Network modelling
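The permutation-based augmentation can be sketched in a few lines of standard-library Python (the function name and the sample baskets are illustrative, not from the team's codebase):

```python
import random

def augment_transactions(transactions, n_perm=24, seed=42):
    """Return up to n_perm distinct shuffled copies of each transaction."""
    rng = random.Random(seed)
    augmented = []
    for items in transactions:
        seen = set()
        for _ in range(n_perm):
            perm = items[:]
            rng.shuffle(perm)
            key = tuple(perm)
            if key not in seen:  # skip duplicate orderings
                seen.add(key)
                augmented.append(perm)
    return augmented

baskets = [["coffee1", "cakePiece3", "water"], ["mojito2", "burger"]]
bigger = augment_transactions(baskets)
```

Short transactions yield fewer than 24 distinct orderings, so deduplicating keeps the augmented set from repeating itself.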
 Filter the data frame to the last month, extract the cake-piece products, identify the drinking product that most frequently appears with them, and normalize the data to mean 0 and variance 1 using the "scale" function (for the Graphical Models approach)
Modeling
 Step 1 > Market Basket Analysis (MBA):
 Used the Apriori algorithm in R: convert the data into basket form based on the saleID identifier and find association rules by specifying minimum values for support (0.5%) and confidence (50%). Output of this step: 45 association rules satisfying the support and confidence constraints, with Lift ranging between 1.6 and 17.7. Greater Lift values indicate stronger associations.
 Based on the client's interest in the cakePieces category, we filtered the association rules down to 6 rules with Lift between 1.9 and 2.4.
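The team ran Apriori in R; as a rough Python illustration of the same support/confidence/lift arithmetic, here is a minimal 1-item-to-1-item rule miner (a simplified stand-in for Apriori, with hypothetical item names):

```python
from itertools import combinations
from collections import Counter

def pair_rules(transactions, min_support=0.005, min_confidence=0.5):
    """Mine 1 -> 1 association rules from a list of item baskets."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = set(t)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for ante, cons in ((a, b), (b, a)):
            confidence = c / item_counts[ante]          # P(cons | ante)
            lift = confidence / (item_counts[cons] / n)  # vs. baseline P(cons)
            if confidence >= min_confidence:
                rules.append((ante, cons, support, confidence, lift))
    return rules
```

Full Apriori additionally prunes multi-item antecedents level by level; for the 1-to-1 case the counting above is equivalent.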
Frequencies.pdf
 Step 2 > Sales Volume Prediction > Approach 1:
 The sales volume predictors are various weather and calendar features: date, minTemp, maxTemp, avrTemp, windSpeed, wind16dir, precipit, humidity, pressure, cloudCover, FeelsLike, workDay, bankHoliday, workOff
 Currently the regressor used is Random Forest regression with the usual hyperparameters.
 The sales volume is decomposed with a stationary wavelet transform (SWT, sym14, level 1), which produces one low-frequency and one high-frequency component. These two signals are fed into two separate RF regressors, and the two trained models together represent the overall sales volume model.
 At prediction time, the weather conditions for the desired period are fed to the two models to predict the two signals, which are then passed through the inverse SWT to reconstruct the predicted sales volume.
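The decompose-train-reconstruct loop can be sketched with PyWavelets and scikit-learn (a minimal sketch under the assumptions that X holds the weather/calendar features, y the daily sales, and len(y) is even as SWT requires; function names are illustrative):

```python
import numpy as np
import pywt
from sklearn.ensemble import RandomForestRegressor

def fit_swt_rf(X, y, wavelet="sym14"):
    """Train two RF models on the level-1 SWT decomposition of y."""
    (cA, cD), = pywt.swt(y, wavelet, level=1)  # low- and high-frequency bands
    low = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, cA)
    high = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, cD)
    return low, high

def predict_swt_rf(models, X, wavelet="sym14"):
    """Predict both bands, then reconstruct sales via the inverse SWT."""
    low, high = models
    return pywt.iswt([(low.predict(X), high.predict(X))], wavelet)
```

Unlike the decimated DWT, the SWT keeps both bands at the full signal length, so each band lines up sample-for-sample with the feature rows.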
 Tuning:
Here the regressor could instead be SVR with an RBF kernel, which is harder to tune but could give better results, or another ensemble predictor similar to RF.
The wavelet transform could also be tuned: more decomposition levels, different filters, etc.
For volatile product sales it would be a good idea to tune separate regressors for the low- and high-frequency decompositions.
For example, with an RBF-kernel SVR, epsilon and C should differ between the low- and high-frequency signals.
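Per-band tuning could look like the following scikit-learn sketch (the grid values are illustrative, not tuned on the case data):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

def tune_band_svr(X, y_band):
    """Grid-search C and epsilon separately for one frequency band."""
    grid = {"C": [1, 10, 100], "epsilon": [0.01, 0.1, 0.5]}
    search = GridSearchCV(SVR(kernel="rbf"), grid, cv=3)
    search.fit(X, y_band)
    return search.best_estimator_
```

Calling this once on the low-frequency band and once on the high-frequency band gives each regressor its own C and epsilon, which matters because the high-frequency band is noisier and usually wants a larger epsilon.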
 Step 2 > Sales Volume Prediction > Approach 2:
 Modeling technique: the Expert Modeler option (Time Series modelling node) in SPSS Modeler v15, which automatically finds the best-fitting model for each dependent series. A confidence interval of 95% was chosen. For each of the 11 target variables, apart from the different time intervals, model tuning was done by changing the input explanatory variables and by using the node's option for automatic outlier detection. The weekly time series approach was not successful: no significant models were created. A number of ARIMA-based models were finalized.
Data_stats.xlsx
 Neural Network modelling
 A transaction can be seen as a stream of items.
This allows us to apply models that tackle NLP problems. We were interested in trying out some deep learning techniques: run word2vec on the transactions.
Word2vec is a popular technique for mapping words to a vector space that is supposed to have a semantic interpretation. We applied word2vec to map products to such a vector space; the resulting vectors for similar products are similar. These vectors are then fed to a Recurrent Neural Network with the objective of predicting the last item in the transaction.
 Graphical Models modelling
 Calculate partial correlation
 Plot the links on a tree structure (Reingold-Tilford layout)
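The partial-correlation step can be sketched with NumPy via the inverse covariance (precision) matrix; partial correlation is invariant to the mean-0/variance-1 scaling mentioned above, so standardization is omitted here (function name is illustrative):

```python
import numpy as np

def partial_correlations(X):
    """Partial correlation matrix of the columns of X (n_samples x n_vars)."""
    prec = np.linalg.inv(np.cov(X, rowvar=False))  # precision matrix
    d = np.sqrt(np.diag(prec))
    # pcorr_ij = -prec_ij / sqrt(prec_ii * prec_jj)
    pcorr = -prec / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr
```

Thresholding the off-diagonal entries gives the links that are then laid out on the Reingold-Tilford tree.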
Evaluation
 MBA provides a set of association rules which, combined with the frequency analysis of items sold, helps the business choose a specific combined offer.
 Modelling Approach 1: most suitable for clients interested in long-term CO recommendations and larger datasets (which could be expanded with artificial data). Allows quick model development and relatively easy deployment
 Modelling Approach 2: most suitable for clients interested in short-term CO recommendations. The ARIMA models in SPSS are relatively fast to develop in case the customer product catalogue changes and the MBA is updated, and for most of the forecasted target variables the models show good short-term results.

Neural Network Approach: It is evaluated against the results of the Market Basket Analysis. When given the antecedent, the neural network predicts the consequent ~52% of the time.
The w2v vectors are evaluated manually. Overall, items that humans perceive as similar are also similar in the word2vec model.
Example:
melba1 is similar to ('sundaeYogurt2', 0.6426471471786499), ('sundaeYogurt1', 0.5623082518577576), ('melba2', 0.47845354676246643)
mojito2 is similar to ('mojito3', 0.7359622716903687), ('mojito1', 0.6873572468757629)
burger is similar to ('sandwich7', 0.5521785616874695), ('sandwich6', 0.5061086416244507)
All the similarities can be found here: https://drive.google.com/file/d/0BxYLkQRqdXrcTVd1Y01XVUNEeTA/view?usp=sharing
Deployment
Could be deployed as a self-service SaaS application.
Client loads data in a secured web-based form > data is ingested and validated through an ETL cycle > data is fed to the recommendation engine > the engine outputs recommendations in the client's account dashboard section > the client decides whether to act on the recommendation