Imagine a Big Data application with the following characteristics:
- it has to process large amounts of complex streaming data,
- the application logic that processes the incoming data must execute and complete within a strict time limit, and
- there is a limited budget for infrastructure resources.
In today’s world, the data would be streamed from the local network or edge devices to a cloud provider, whose infrastructure the customer rents to process the data.
The Big Data software stack, in an application- and hardware-agnostic manner, splits the execution stream into multiple tasks and sends them for processing to the nodes the customer has paid for. If the outcome does not meet the strict time limit set by the business requirements, the customer has three options:
- scale up (by upgrading the processors at node level),
- scale out (by adding nodes to their clusters), or
- manually implement code optimizations specific to the underlying hardware.
E2Data provides a new Big Data software paradigm for achieving maximum resource utilization in heterogeneous cloud deployments without affecting current Big Data programming norms (i.e., no code changes in the original source).