Following our ‘Applied Statistics – SAP BO Predictive Analysis’ McCoy training, on January 17th we set out to prove the validity of one of our recurring messages to the 15 enthusiasts: ‘…the nice part of statistics is that it’s the same formula, in every tool’! Once again we invited all our customers with a passion for statistics, as the first training was a great success all the way.
The tool selection for the hands-on exercises this time included SAP HANA, SAP BW and R (Studio). The results of the exercises were linked to each other, and even to some of the results of the SAP BO Predictive Analysis exercises.
1) The first exercise focused on what is described in the ‘What’s HAPpening’-blog: the use of SAP HANA Analysis Process modeling within SAP BW. A dataset of Stores was to be clustered by an ABC Analysis based on their Turnover.
As it turned out, Turnover was quite a good metric to identify the A-level stores, even when considering other metrics such as the Margin. Interestingly, these Stores did not have the largest Size (sqf): the biggest stores appeared to fall short on both Turnover and Margin!
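To make the idea of an ABC Analysis concrete, here is a minimal sketch in plain Python. It is not the SAP HANA Analysis Process itself. The store names, turnover figures and the 70% / 20% / 10% class boundaries are assumptions chosen purely for illustration, and tools differ in how they treat a store that sits exactly on a boundary.

```python
def abc_classify(turnover_by_store, a_share=0.70, b_share=0.20):
    """Label each store A, B or C by cumulative share of total turnover.

    a_share and b_share are illustrative thresholds (assumptions), not
    defaults taken from any SAP tool.
    """
    total = sum(turnover_by_store.values())
    # Rank stores from highest to lowest turnover
    ranked = sorted(turnover_by_store.items(), key=lambda kv: kv[1], reverse=True)
    labels = {}
    cumulative = 0.0
    for store, turnover in ranked:
        # Use the cumulative share *before* this store, so the top store is always an A
        share_before = cumulative / total
        if share_before < a_share:
            labels[store] = "A"
        elif share_before < a_share + b_share:
            labels[store] = "B"
        else:
            labels[store] = "C"
        cumulative += turnover
    return labels


# Hypothetical stores and turnover values
stores = {"S1": 450, "S2": 300, "S3": 120, "S4": 80, "S5": 50}
labels = abc_classify(stores)
print(labels)
```

With these made-up numbers, the two stores that together generate 75% of the turnover end up in class A, which is the kind of split the exercise was looking for.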
2) The second exercise was related to the ‘Outside-in approach’: connecting an R instance to the SAP HANA database tables using ODBC. Having a simple data set with monthly Turnover metrics for a period of 3 years (up to and including December 2017), we wanted to forecast the Turnover for the whole of 2018. After replicating the HANA database tables via the ‘odbcConnect’ and ‘sqlFetch’ functions of the ‘RODBC’ R-Package, a subset of the dataset was transformed into a time-series table. The Holt-Winters forecasting model was then applied to it via the ‘forecast:::forecast.HoltWinters’ function of the ‘forecast’ R-Package.
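For readers who want to see what sits behind a Holt-Winters forecast, the following is a rough sketch of the additive variant in plain Python. It is not the R code used in the training. The smoothing parameters are fixed by hand here (an assumption; R's HoltWinters estimates them from the data), the initialisation is deliberately simple, and the 36 monthly turnover values are invented.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters with fixed smoothing parameters; returns forecasts."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    # Simple initialisation (an assumption): level and trend from the first
    # two seasonal averages, seasonal effects from the first season
    first = sum(series[:period]) / period
    second = sum(series[period:2 * period]) / period
    level = first
    trend = (second - first) / period
    seasonal = [series[i] - first for i in range(period)]
    for t in range(period, len(series)):
        s = seasonal[t % period]
        prev_level = level
        # Update level, trend and the seasonal effect of this month
        level = alpha * (series[t] - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (series[t] - level) + (1 - gamma) * s
    n = len(series)
    # h-step-ahead forecast: level + h * trend + matching seasonal effect
    return [level + h * trend + seasonal[(n + h - 1) % period]
            for h in range(1, horizon + 1)]


# Hypothetical monthly turnover: the same seasonal pattern repeated for 3 years
pattern = [90, 85, 95, 100, 105, 110, 100, 95, 105, 110, 120, 145]
history = pattern * 3
forecast = holt_winters_additive(history, period=12, alpha=0.5, beta=0.25,
                                 gamma=0.5, horizon=12)
print([round(value, 1) for value in forecast])
```

Because the invented history has no trend and a perfectly repeating season, the 12 forecast values simply reproduce the seasonal pattern, which makes the sketch easy to check by eye.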
3) The third exercise took the opposite approach of exercise 2, the Inside-out approach: using R Script directly in your SAP HANA SQL Procedure (with an R Server to execute it). The same dataset as in the first exercise was now used to cluster the Stores with the K-Means algorithm, based on all the different metric vectors (including additional vectors such as the number of Staff members).
The result was to be compared to the same analysis executed during the SAP BO Predictive Analysis training. However, with the initial script the results did not match! As it turns out, the two tools do not use the same version of the ‘kmeans’ algorithm by default: Hartigan and Wong (R) vs. MacQueen (SAP BO Predictive Analysis). Similarly, the default Number of Iterations might differ between the two tools, possibly affecting the outcome.
Changing the script to set both items explicitly did indeed produce the same results as the SAP BO Predictive Analysis output.
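In R, both settings can be controlled through the ‘algorithm’ and ‘iter.max’ arguments of ‘kmeans’. To illustrate why the variant and the iteration limit matter, here is a small sketch in plain Python with two simplified K-Means variants: a MacQueen-style one that moves a centre every time a point is (re)assigned, and a batch (Lloyd-style) one that recomputes centres once per pass. The Lloyd variant is only a stand-in for a different update scheme; it is not Hartigan and Wong. The data points, the seeding with the first k points and the function names are assumptions for illustration.

```python
def _nearest(point, centers):
    """Index of the centre closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    return dists.index(min(dists))


def kmeans_macqueen(points, k, max_iter=10):
    """Simplified MacQueen-style K-Means: centres move on every (re)assignment."""
    centers = [list(p) for p in points[:k]]  # assumption: first k points are the seeds
    dim = len(points[0])
    counts = [0] * k
    sums = [[0.0] * dim for _ in range(k)]
    assign = [None] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, point in enumerate(points):
            j = _nearest(point, centers)
            if assign[i] == j:
                continue
            changed = True
            old = assign[i]
            if old is not None:
                # Remove the point from its previous cluster
                counts[old] -= 1
                sums[old] = [s - p for s, p in zip(sums[old], point)]
                if counts[old]:
                    centers[old] = [s / counts[old] for s in sums[old]]
            # Add the point to its new cluster and move that centre immediately
            counts[j] += 1
            sums[j] = [s + p for s, p in zip(sums[j], point)]
            centers[j] = [s / counts[j] for s in sums[j]]
            assign[i] = j
        if not changed:
            break
    return assign, centers


def kmeans_lloyd(points, k, max_iter=10):
    """Batch K-Means: assign all points first, then recompute every centre."""
    centers = [list(p) for p in points[:k]]  # same seeding assumption as above
    assign = None
    for _ in range(max_iter):
        new_assign = [_nearest(p, centers) for p in points]
        if new_assign == assign:
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers


# Hypothetical two-metric data with two clearly separated groups
data = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
macqueen_assign, macqueen_centers = kmeans_macqueen(data, k=2, max_iter=10)
lloyd_assign, lloyd_centers = kmeans_lloyd(data, k=2, max_iter=10)
print(macqueen_assign, lloyd_assign)
```

On clearly separated data like this both variants agree. On less clear-cut data, or with a low iteration limit, they may stop at different clusterings, which is why it helps to set the variant and the limit explicitly before comparing tools.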
4) The fourth exercise had the participants stay in the SAP HANA database, this time to use the Apriori algorithm of the Predictive Analytics Library (PAL). We were not yet using the SAP HANA 2.0 (> SPS 02) functionality, in which the algorithms come with more pre-defined procedures that are easier to call. In preparation for the training, the more cumbersome steps therefore had to be executed first:
Creating types for: (input) data, results, PMML and control-values in the _SYS_AFL-schema;
Using these types in a signature table;
Calling the AFLLANG_WRAPPER_PROCEDURE with the above input to have a ready-to-use Procedure.
The Apriori algorithm was then used to identify relations between the different items bought in the grocery store. Visualizing the Apriori results with the Tag Cloud data preview showed that the ‘Item 1 => 3’-relation had both the biggest support and the biggest confidence:
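As a reminder of what support and confidence mean, the sketch below computes both measures for all single-item rules in plain Python. It only covers the two measures. It does not implement the level-wise candidate generation of the full Apriori algorithm, and it is not the PAL procedure. The baskets are invented so that the rule 1 => 3 comes out on top, mirroring the training result.

```python
from itertools import permutations


def pair_rules(transactions, min_support=0.0):
    """Support and confidence for every rule 'lhs => rhs' between two single items."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))
    rules = []
    for lhs, rhs in permutations(items, 2):
        both = sum(1 for b in baskets if lhs in b and rhs in b)
        lhs_count = sum(1 for b in baskets if lhs in b)
        # support: share of all baskets containing both items
        support = both / n
        if both == 0 or support < min_support:
            continue
        # confidence: share of the baskets with lhs that also contain rhs
        confidence = both / lhs_count
        rules.append((lhs, rhs, support, confidence))
    # Strongest rules first: by support, then by confidence
    return sorted(rules, key=lambda r: (r[2], r[3]), reverse=True)


# Hypothetical grocery baskets (item numbers only)
baskets = [[1, 3], [1, 3], [1, 2, 3], [2, 3], [1, 3, 4]]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules[:3]:
    print(f"Item {lhs} => {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Here items 1 and 3 appear together in 4 of the 5 baskets (support 0.8), and every basket with item 1 also contains item 3 (confidence 1.0).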
5) The last exercise made use of the Flowgraph designer within SAP HANA, using the same data as the first exercise and the same algorithm (i.e. ABC Analysis) to get the same results.
Interestingly, the ‘Predictive Analysis Library’ folder structure of the Flowgraph-designer does not recognize the ABC Analysis as a Clustering algorithm, but rather as something ‘Miscellaneous’!
We can conclude that the proof of the pudding is in the eating. All tools have identical algorithms, but some start with different default settings. Once you know exactly which details of the algorithm are used, you can see that it's …just a formula.
We will of course continue our journey to bring you statistics. Our next ‘Predictive Analytics’ hands-on training session (on June 14th) will focus on the predictive functionalities with SAP Analytics Cloud (SAC).
McCoy's consultants are specialized in Predictive Analytics and are more than willing to help you on the road of Predictive Analytics. Please feel free to contact us for more information!