Are you looking for a way to visualize data in Snowflake (live) with PowerBI using something else than the standard Visualizations? Then you’ve arrived at the right blog! In this blog, a ‘how to’ is given on the easy connectivity possibilities between PowerBI and Snowflake in general, but also on visualizing the data using R script visuals.
To illustrate the connection you have made after following the steps of this blog, the same dataset is used as in our latest blog explaining how to connect Snowflake and R Studio. To recap, the dataset used is about clustering stores based on certain KPI’s of the stores using the kmeans algorithm.
The table that stores the data in Snowflake is:
The ‘R_TABLE_KMEANS_INPUT’ table (‘input’ from a PowerBI point of view). This table has a set of a 150 ‘Stores’ with certain related KPI’s (Turnover, Size, Staff and Margin) that are used to determine the clusters:
In this blog, the following steps are explained:
Setting up a connection between PowerBI and Snowflake
In-between configuration steps in PowerBI
Creating the R script visual
For this blog, PowerBI Desktop will be used, which can be freely downloaded.
Steps that are required:
Selecting Snowflake as a possible source of data within PowerBI: using the ‘Get data from another source’ option and searching for Snowflake as a possible Source:
Having selected ‘Snowflake’ as a source and choosing ‘Connect’, the ‘Server’ and ‘Warehouse’ need to be added as input for the connection to be established, where:
Server = <your Snowflake account name>.<your region>.<cloud host>.snowflakecomputing.com
Warehouse = COMPUTE_WH
Additionally, credentials are needed to access Snowflake (note: depending on if it is the first time you are connecting or not, the screen might look different compared to the below):
When the connection is established successfully, the Snowflake Databases become visible in the Navigator. Drilling down, the ‘R_TABLE_KMEANS_INPUT’ Table can be selected in the “SNOWFLAKE_PBI”.”PUBLIC” Schema, and loaded with the DirectQuery Connection setting:
When selecting the R script visual in Power BI, a few options become available:
The ‘Visualizations’ section where the ‘R script visual’ was actually selected
The ‘Fields’ section where the fields of the connected Table/View can be selected that will become available in the actual end-dataset. This dataset is then usable by the R script visual
The placement of the R script visual in your dashboard
The ‘R script editor’, where the actual R script coding is added
Depending on which R package is used, it might be the case it is not readily available to be used in the PowerBI environment yet. For this, there are a few possible solutions:
The package can be installed with the ‘install.packages(“…”) command, with the additional parameter ‘repos’ of an accessible repository. For example, for the DT package, this can be initialized via:
Setting the ‘External R IDE’ correctly, via ‘File’ -> ‘Options and settings’ -> ‘Options’ -> ‘R scripting’, and managing the following settings:
If correctly set (e.g. the above example using R Studio as External R IDE), executing the following in the R script editor will automatically open up R Studio and enable the execution of the ‘install.packages’ command indirectly (note: a more detailed how-to is also given here).
Will open up an R Studio ‘REditorWrapper’, with which the package can be installed:
Within the R script editor, the ‘R_TABLE_KMEANS_INPUT’ fields that are used, are merged into the ‘dataset’, which is converted towards a data.frame, only storing unique values:
Using this ‘dataset’, the following steps are now executed to visualize the Clusters:
Via the steps mentioned at ‘Step 2: In-between configuration steps in PowerBI’, install and initiate the relevant packages
Use the ‘kmeans’ algorithm of the ‘cluster’ package to calculate 5 clusters
Use the ‘fviz_cluster’ function of the ‘factoextra’ package to visualize the 5 clusters
Executing the following R script in the Editor of PowerBI, will lead to the desired visualization:
In which the 5 identified Clusters are shown distributed among Principal Component Analysis (PCA) dimensions 1 and 2.