Many large companies like Yahoo! and Facebook have embraced Hadoop as their leading big data solution. It is scalable and affordable, and offers a rich ecosystem for storing data. However, it is not well suited to interactive data analytics. As data grows, so does the demand for ad-hoc queries and real-time analytics, especially in customer-focused industries like retail, energy and financial services, where real-time analysis plays a vital role.
Nowadays, combining batch-processed data from systems like Hadoop with structured data is on the agenda of every data-driven organisation. SAP’s new software solution, called SAP HANA Vora, makes this possible. It bridges the gap between traditional transactional data and new big data technologies. The name Vora is short for “voracious”, which, applied to data, evokes a craving to consume large quantities of it.
The main goal of this newly positioned tool is to incorporate distributed and unstructured data quickly and efficiently from data platforms like Hadoop, Spark and HANA. This plugin, as SAP calls it, integrates easily with Apache Spark, the multi-stage in-memory processing engine. Currently, Spark is the most active project in the Apache Software Foundation and in open source big data.
As already stated, SAP HANA Vora is an add-on to the Apache Spark framework. The plugin can be used without SAP HANA, and vice versa. The Vora tool consists of three basic components:
1) a cache layer
2) a SQL interpreter layer and
3) a distributed process framework
The cache layer retrieves the data from the Hadoop Distributed File System (HDFS) and transforms it into in-memory data. The second layer interprets the SQL commands submitted to the Spark engine and compiles them down to native C code. Because compiled C code runs directly as machine code, without an interpreter in between, data is retrieved much faster. Finally, the distributed processing framework takes care of distributing the compiled SQL code among the available nodes, resulting in queries being up to 20 times faster.
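The three layers described above can be pictured with a small toy in plain Python. This is only a conceptual sketch, not Vora's or Spark's API: the data, the names `PARTITIONS`, `compile_filter` and `distributed_sum`, and the use of threads in place of cluster nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import operator

# Layer 1 (cache): partitions already held in memory. In a real cluster
# each partition would be loaded from HDFS blocks on a different node.
PARTITIONS = [
    [{"region": "EU", "amount": 120}, {"region": "US", "amount": 80}],
    [{"region": "EU", "amount": 45}, {"region": "APAC", "amount": 200}],
]

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def compile_filter(column, op, value):
    """Layer 2: turn a parsed predicate into a plain callable, once.

    This stands in for query compilation: the predicate is resolved a
    single time instead of being re-interpreted for every row.
    """
    compare = OPS[op]
    return lambda row: compare(row[column], value)


def run_on_partition(partition, predicate):
    """Work done locally on one node: filter, then partially aggregate."""
    return sum(row["amount"] for row in partition if predicate(row))


def distributed_sum(partitions, predicate):
    """Layer 3: send the same compiled predicate to every partition
    and combine the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda p: run_on_partition(p, predicate), partitions)
        return sum(partials)


# Roughly: SELECT SUM(amount) WHERE region = 'EU'
eu_total = distributed_sum(PARTITIONS, compile_filter("region", "=", "EU"))
print(eu_total)  # 165
```

The point of the sketch is the division of labour: data stays in memory, the query is prepared once, and each partition does its own share of the work before the results are merged.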
In combination with SAP HANA, it implements a so-called high-speed data line. This enables your business to bridge the gap between structured/transactional data (e.g. from SAP ECC) and unstructured data stored in Hadoop. The join is executed without the data being moved from the source system. SAP’s own Smart Data Access tool could be used for this purpose.
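To make the idea of joining without bulk data movement more concrete, here is a minimal sketch in plain Python. It is not Smart Data Access or any SAP API. `ORDERS` is a hypothetical stand-in for the transactional system, `CLICK_LOG` for raw data in Hadoop, and `scan_clicks` and `federated_join` are invented names. The assumed technique is a simple key push-down: only the join keys travel to the other source, and only matching rows come back.

```python
# Hypothetical stand-ins: ORDERS plays the transactional system,
# CLICK_LOG plays raw event data sitting in Hadoop.
ORDERS = [
    {"customer_id": 1, "order_total": 250.0},
    {"customer_id": 2, "order_total": 90.0},
]
CLICK_LOG = [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 3, "page": "/home"},
    {"customer_id": 1, "page": "/checkout"},
    {"customer_id": 2, "page": "/support"},
]


def scan_clicks(wanted_ids):
    """Filter at the source and yield only matching rows (lazy, no bulk copy)."""
    for row in CLICK_LOG:
        if row["customer_id"] in wanted_ids:
            yield row


def federated_join(orders, min_total):
    """Join orders with click data, sending only the key set across."""
    # Build a small lookup from the structured side only.
    big_spenders = {
        o["customer_id"]: o["order_total"]
        for o in orders
        if o["order_total"] >= min_total
    }
    # Push the key set down to the "Hadoop" side and stream back matches.
    for click in scan_clicks(big_spenders.keys()):
        yield {**click, "order_total": big_spenders[click["customer_id"]]}


result = list(federated_join(ORDERS, min_total=100))
for row in result:
    print(row)
```

In this toy only customer 1 passes the order filter, so only that customer's two click rows are returned; the rest of the click log is never copied.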
If your employees, such as data scientists, want to analyze structured data along with data from the Hadoop cluster, then SAP HANA Vora is the answer. SAP HANA Vora does not require an SAP HANA platform, but when using SAP HANA you will “unleash the beast” and give your IT landscape more potential.
Using Vora will also enable your business analysts, data scientists and even software developers to enhance their Hadoop and business data with hierarchies, drill-down capabilities, unit-of-measure conversion and currency conversion, functions already known from traditional BI tools like SAP BW. With these possibilities, unstructured data can be better tailored to the business processes to which it belongs.
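As a rough illustration of what hierarchies, drill-down and currency conversion mean for raw data, the sketch below rolls sales figures up a location hierarchy after converting them to one currency. Everything in it is assumed for the example: the `PARENT` hierarchy, the made-up rates in `RATE_TO_EUR`, the sample `SALES` rows and the `roll_up` function. It does not reflect how Vora or SAP BW implement these features.

```python
from collections import defaultdict

# Hypothetical hierarchy: child node -> parent node.
PARENT = {
    "Amsterdam": "Netherlands",
    "Rotterdam": "Netherlands",
    "Netherlands": "Europe",
    "New York": "USA",
    "USA": "Americas",
}

# Made-up conversion rates to EUR, for illustration only.
RATE_TO_EUR = {"EUR": 1.0, "USD": 0.5}

# Sample rows: (location, amount, currency).
SALES = [
    ("Amsterdam", 100.0, "EUR"),
    ("Rotterdam", 50.0, "EUR"),
    ("New York", 200.0, "USD"),
]


def roll_up(sales):
    """Convert each amount to EUR and add it to every ancestor level."""
    totals = defaultdict(float)
    for node, amount, currency in sales:
        amount_eur = amount * RATE_TO_EUR[currency]
        # Walk from the leaf up to the root of the hierarchy.
        while node is not None:
            totals[node] += amount_eur
            node = PARENT.get(node)
    return dict(totals)


totals = roll_up(SALES)
print(totals["Europe"])       # 150.0
print(totals["Netherlands"])  # 150.0
print(totals["Amsterdam"])    # 100.0
```

Because a total exists at every level, an analyst can start at "Europe" and drill down to "Netherlands" and then "Amsterdam" without recomputing anything.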
SAP launched SAP HANA Vora on September 18th in a couple of different versions, starting with a standard and an enterprise edition. The prices are still unknown, but will most probably be below the cost of Hadoop on a per-node basis. Before actually acquiring licenses, your organisation can try out the tooling for free. This trial, based on Amazon, will be made available shortly by SAP. Think big, start small!