E2Data in a nutshell

E2Data aims to answer two key questions:

How can we improve execution times while using fewer hardware resources?

To address the growing scalability concerns of Big Data processing, both end users and cloud infrastructure vendors (such as Google, Microsoft, Amazon, and Alibaba) are investing in heterogeneous hardware that spans a diverse set of architectures, including CPUs, GPUs, FPGAs, and MICs. Their aim is to further increase performance while containing rising operational costs. Beyond these investments in off-the-shelf heterogeneous resources, large companies such as Google also develop in-house ASICs, with the Tensor Processing Unit (TPU), built to accelerate TensorFlow workloads, being the prime example.

E2Data will provide a new Big Data software paradigm for achieving maximum resource utilization in heterogeneous cloud deployments without affecting current Big Data programming norms (i.e., without requiring any changes to the original source code). The proposed solution takes a cross-layer approach, allowing vertical communication between the four key layers of Big Data deployments: the application, the Big Data software, the scheduler/cloud provider, and the execution runtime.


E2Data dynamically exploits heterogeneous hardware (GPUs, FPGAs) by:

  • Enabling dynamic heterogeneous compilation of arbitrary code;
  • Following a full-stack vertical approach where state-of-the-art software frameworks will be enhanced;
  • Managed Runtimes | Cloud Providers
  • Designing an intelligent elastic system where we can:
    • Profile execution results, communicate them to the scheduler, and assess its decision;
    • Fall back and recompile on the fly;
    • Iterate until the AI-enabled scheduler finds the “best” possible execution configuration.

How can users determine, for each particular business scenario, the highest-performing and cheapest hardware configuration?

The E2Data consortium brings together two distinct groups of cutting-edge EU Big Data practitioners to achieve its ambitious goals. On the one hand, it includes the following four Big Data users in specific markets with strict requirements in terms of performance and infrastructure costs:

  1. EXUS in the Health Sector,
  2. Neurocom in Fintech,
  3. SparkWorks/CTI in Green Building Infrastructure, and
  4. iProov in security and biometric recognition.


On the other hand, four Big Data technology providers will implement the E2Data solution by extending cutting-edge European technologies:

  1. DFKI, the creators of Apache Flink (the number one European competitor of Apache Spark), will provide solutions in the core of the Big Data stack,
  2. ICCS will deliver a novel Big Data scheduler (i.e. the component that assigns hardware resources to tasks during execution) capable of intelligent resource selection,
  3. UNIMAN, with its expertise in heterogeneous computing, will work at the system level to enable dynamic code compilation and execution on diverse heterogeneous hardware resources, and
  4. Kaleao will showcase how its high-performance, low-power cloud architecture, combined with E2Data's proposed technologies, can strengthen the EU's Big Data hardware capabilities.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 780245.