Ana Carolina

SAP Hana Cloud databases with Jupyter notebooks

‘Data-driven’ is the buzzword in most organizational decision making today. Businesses are constantly seeking innovative ways to analyze and extract insights from their vast amounts of data. This blog explores the possibilities opened up by using an open-source language, Python, with SAP Hana Cloud to obtain data and collaborate on business science, namely by integrating Hana Cloud databases with Jupyter notebooks.

Ask a data scientist and they will tell you about the clear advantages this integration offers to them and to the company (Figure 1). You get unified data access, advanced analytics, and real-time, streamlined data, which leads to better collaboration between business teams, scalability and performance. This is what a cloud solution (SAP Hana Cloud) fully integrated with open-source tools (Python libraries and Jupyter Notebooks) can deliver.

Figure 1 - Powerful outputs from connecting your Jupyter Notebook to SAP Hana Cloud

The power of Jupyter notebook

We chose Jupyter Notebooks over other environments because they excel in interactivity and documentation, offer rich visualization support and collaboration features, and support a wide range of languages. They are a natural and powerful choice for data scientists and analysts worldwide. As an SAP partner with in-depth expertise, we help organizations find the best-fit solutions or customizations for their systems. We know how combining open-source tools with SAP can help data scientists transform their data analytics journey, gain a competitive edge, and deliver significant value to their clients.

Secure Connections and Data Collection

Below, we show you how to use Hana Cloud databases in Jupyter Notebooks and design machine learning models that deliver insightful predictions for your business. To extract insights from Hana Cloud databases, you first need to establish a connection from the Jupyter Notebook. A single notebook can connect to multiple containers and schemas, which gives you unified data access. See Figures 2, 3 and 4. By using the "hana_ml" library, we can securely connect Jupyter Notebooks to Hana Cloud and collect the data needed for analysis and preprocessing.

Figure 2 - Reading the JSON configuration file and connecting to Hana Cloud
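A minimal sketch of this step, assuming the credentials live in a JSON file with the keys `address`, `port`, `user` and `password` (the file layout and both helper names are our assumptions for illustration, not a fixed format). The connection itself uses `ConnectionContext` from the hana_ml library.

```python
import json

# Assumed layout of the JSON file; adapt the key names to your own configuration.
REQUIRED_KEYS = {"address", "port", "user", "password"}


def load_hana_config(path):
    """Read connection settings from a JSON file and check that all keys are present."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"Missing connection settings: {sorted(missing)}")
    return config


def connect_to_hana(config):
    """Open an encrypted connection to Hana Cloud with hana_ml."""
    # Imported here so the config helper also works where hana_ml is not installed.
    from hana_ml import dataframe

    return dataframe.ConnectionContext(
        address=config["address"],
        port=int(config["port"]),
        user=config["user"],
        password=config["password"],
        encrypt=True,  # request a TLS-encrypted connection
    )
```

In a notebook this could be called as `conn = connect_to_hana(load_hana_config("hana_config.json"))`, where the file name is a placeholder. Keeping credentials in a separate file means they never appear in the notebook itself.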

After these steps, data can be collected from Hana Cloud in a Jupyter Notebook.

Figure 3 - Table and schema names required by the collect function of the hana_ml library

Figure 4 - Collect data from Hana Cloud
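As a rough sketch of the table lookup and collect steps: the hana_ml connection exposes `table()`, which returns a hana_ml DataFrame, and `collect()` then transfers the rows into a pandas DataFrame. The helper name `collect_table` and the optional `limit` argument are our own additions for illustration.

```python
def collect_table(conn, table_name, schema_name, limit=None):
    """Return a Hana Cloud table as a pandas DataFrame.

    `conn` is expected to be a hana_ml ConnectionContext.
    """
    # Builds a reference to the table; the rows are only fetched by collect().
    hana_df = conn.table(table_name, schema=schema_name)
    if limit is not None:
        # Restrict the number of rows fetched, useful for a first look at the data.
        hana_df = hana_df.head(limit)
    return hana_df.collect()
```

For example, `sales_order_df = collect_table(conn, "T_SALES_ORDER_BIKE_SALES", "MY_SCHEMA")`, where `MY_SCHEMA` stands in for your own schema name.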

Streamlined Data Collection and Preprocessing

After connecting to Hana Cloud, we can collect data from the database, run queries and apply preprocessing methods. This streamlined approach lets organizations use a unified platform for business science, integrating ETL processes and data science algorithms. The outputs can then be integrated with various front-end tools, enabling efficient report generation and insightful analysis.
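One way to run such a query is to chain `select()` and `filter()` on a hana_ml DataFrame, so the database does the work and only the result reaches the notebook. The sketch below is illustrative: the helper name and the `NETAMOUNT > 0` condition are assumptions, not part of the original notebook.

```python
def collect_positive_sales(conn, schema_name, table_name="T_SALES_ORDER_BIKE_SALES"):
    """Select two columns and filter rows in Hana Cloud before collecting.

    `conn` is expected to be a hana_ml ConnectionContext.
    """
    hana_df = (
        conn.table(table_name, schema=schema_name)
        .select("CREATEDAT", "NETAMOUNT")  # keep only the columns we need
        .filter("NETAMOUNT > 0")           # hypothetical condition, applied in the database
    )
    return hana_df.collect()  # pandas DataFrame with the filtered rows
```

Filtering before `collect()` keeps the transferred result small, which matters for large tables.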

Machine Learning in SAP HANA Cloud

In this section, we demonstrate how machine learning models can be trained to establish a data science pipeline using Hana Cloud tables. Our expertise with this application, though, extends well beyond these brief insights, and we are committed to collaborating with customers to achieve the best outcome.

We use data from the "T_SALES_ORDER_BIKE_SALES" table. From the collected sales_order_df we select two columns: CREATEDAT and NETAMOUNT.

Here we propose a random forest regressor as the ML model. A random forest is a meta estimator that fits several decision tree regressors on various sub-samples of the dataset and averages their predictions to improve predictive accuracy and control over-fitting. To begin, we split the sales_order data into train and test datasets (Figure 5).
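A possible way to prepare these two columns in pandas is sketched below. It assumes that CREATEDAT holds dates as yyyymmdd values (for example 20180111) and that simple calendar parts serve as features. Both the date format and the feature choice are our assumptions, so adapt them to your data.

```python
import pandas as pd


def prepare_features(sales_order_df):
    """Build a feature table and target series from CREATEDAT and NETAMOUNT."""
    data = sales_order_df[["CREATEDAT", "NETAMOUNT"]].dropna().copy()

    # Assumption: CREATEDAT is stored as a yyyymmdd number or string.
    created = pd.to_datetime(
        data["CREATEDAT"].astype("int64").astype(str), format="%Y%m%d"
    )

    # Hypothetical features: calendar parts of the creation date.
    features = pd.DataFrame(
        {
            "YEAR": created.dt.year,
            "MONTH": created.dt.month,
            "DAY": created.dt.day,
        },
        index=data.index,
    )
    # Cast to float so decimal values from the database work with scikit-learn.
    target = data["NETAMOUNT"].astype(float)
    return features, target
```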

Figure 5 – Split into train and test datasets
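A minimal sketch of the split, assuming scikit-learn's `train_test_split` is used. The 80/20 ratio and the fixed seed are illustrative defaults, not values taken from the original notebook.

```python
from sklearn.model_selection import train_test_split


def split_sales(features, target, test_size=0.2, seed=42):
    """Split features and target into train and test parts.

    Returns X_train, X_test, y_train, y_test.
    """
    return train_test_split(
        features, target, test_size=test_size, random_state=seed
    )
```

The split is random. For strongly time-dependent sales data, a chronological split (train on earlier orders, test on later ones) may be the better choice.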

Figure 6 – Create and fit the Random Forest Regressor Model
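Creating and fitting the model could look like the sketch below, assuming scikit-learn's `RandomForestRegressor`. The number of trees and the seed are illustrative and should be tuned for your data.

```python
from sklearn.ensemble import RandomForestRegressor


def fit_random_forest(X_train, y_train, n_estimators=100, seed=42):
    """Create a random forest regressor and fit it on the training data."""
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(X_train, y_train)
    return model
```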

Figure 7 – Make predictions on the test dataset, evaluate the model and analyze the predictions made.
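The evaluation can be sketched as follows, using the root mean squared error (RMSE) and the coefficient of determination (R²) from scikit-learn. The helper name `evaluate_model` is ours, and `model` can be any fitted estimator with a `predict` method.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def evaluate_model(model, X_test, y_test):
    """Predict on the test data and return predictions, RMSE and R²."""
    predictions = model.predict(X_test)
    # RMSE is in the same unit as the target, here the net amount.
    rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
    # R² is the share of the target's variance explained by the model.
    r2 = float(r2_score(y_test, predictions))
    return predictions, rmse, r2
```

A lower RMSE and an R² closer to 1 indicate a better fit. Comparing the predictions with the actual test values row by row helps to spot where the model is off.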

Our random forest regressor trained well: it achieves a good root mean squared error and a good coefficient of determination. Applying ML to your database is easier when you work with fully integrated tools and the right knowledge. There are several ways to build ML models in Jupyter Notebooks. What we have shown here is a simple overview of some of the capabilities and a demonstration of how efficiently an ML model can be trained using the Hana Cloud connection. Are you interested in learning more? Then get in touch with us. Here in the Business Science team @McCoy we look forward to creating the best-suited and simplest solutions for your SAP data.

Opportunities and infinite possibilities

The integration of Hana Cloud Databases with Jupyter Notebooks opens an array of possibilities for data scientists, analysts, and businesses. With seamless data access, advanced analytics, real-time streaming analysis, collaborative data science, interactive visualization, and scalable performance, it changes the way data-driven insights are generated and helps organizations make informed decisions faster.

Embrace the limitless potential of Hana Cloud and data science with Business Science @McCoy. Uncover hidden insights, make smarter decisions, and outshine your competitors. The future of intelligent analytics and a world of transformative possibilities await.