by Florin Blanaru, Juan Fumero, Christos Kotselidis
The University of Manchester
1. Introduction
TornadoVM v0.6 is now compatible with GraalVM and its polyglot runtime, enabling heterogeneous hardware acceleration for the many programming languages implemented on top of it. TornadoVM v0.6 extends its compatibility beyond OpenJDK 8 to GraalVM and OpenJDK 11. With the latest version, developers can invoke their TornadoVM-compatible Java code from within JS, NodeJS, R, etc. and accelerate their existing applications on GPUs and FPGAs. Any existing Java code that can be accelerated through TornadoVM can now be called from GraalVM’s polyglot runtime and reused without any extra effort!
2. Background
Heterogeneous hardware accelerators are becoming pervasive even in commodity personal computers with combinations of CPUs, GPUs, AI acceleration chips etc.
To exploit the performance and energy characteristics of such devices, developers use several programming models (e.g. CUDA, OpenCL, HLS) that are integrated into popular programming languages (e.g. C, C++).
TornadoVM is a framework developed at The University of Manchester that enables transparent hardware acceleration of Java programs onto GPUs and FPGAs yielding orders of magnitude higher performance compared to CPU execution on certain workloads.
Up to and including release v0.5, TornadoVM could be used as a plugin to an OpenJDK build that implements the JVMCI interface. In addition, TornadoVM extends the Graal compiler with performance optimizations specifically targeting heterogeneous hardware acceleration.
With the latest release v0.6, any Java code which is compatible with TornadoVM can be invoked from the programming languages implemented on top of GraalVM and enjoy the performance and energy gains of hardware acceleration.
3. Polyglot Runtimes
Polyglot runtimes have been gaining traction recently mainly due to advancements in compiler technologies allowing aggressive speculative optimizations tailored for specific languages. GraalVM is a prime example of such polyglot runtimes allowing a plethora of programming languages such as JS, R, WASM, Python to be executed on top of a single Virtual Machine. In addition to the programmability and maintainability benefits of having a single polyglot VM, GraalVM also achieves high levels of peak performance compared to existing runtimes for the programming languages it supports.
Until recently, GraalVM supported execution only on CPUs. The recent addition of CUDA extensions by NVIDIA (grCUDA) enabled the integration of pre-existing CUDA kernels with the various programming languages. Although grCUDA is a step forward in utilizing hardware acceleration from managed programming languages, it is limited to NVIDIA GPUs, and it requires developers to have the expertise to write their CUDA kernels manually before these can be plugged into the rest of the system.
4. TornadoVM
In contrast to grCUDA or other similar solutions, TornadoVM does not rely on pre-existing kernels for hardware acceleration. Instead, TornadoVM follows a novel two-tiered compilation approach that JIT-compiles and optimizes pure Java code down to GPU- or FPGA-compatible binaries. This way, developers do not need to learn additional programming languages to exploit such devices, which in some cases, as with FPGAs, requires considerable hardware expertise. In addition, TornadoVM internally runs its own set of bytecodes, which enables the dynamic reconfiguration of applications running on multi-device systems. In essence, TornadoVM can discover which accelerator yields the best results for a running application and remap the application to that device. The reconfiguration takes place transparently and automatically, requiring no intervention from the developers.
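The idea of trying each available device and keeping the fastest one can be illustrated with a minimal sketch in plain Java. It does not use TornadoVM's API and does not reflect how its internal bytecodes work; the class `DeviceSelectionSketch`, the methods `pickFastest` and `simulatedDevice`, and the device names are all hypothetical, and each "device" is simply a `Runnable` standing in for the same task compiled for a different target.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DeviceSelectionSketch {

    // Runs every candidate once, times it, and returns the name of the fastest.
    // Hypothetical helper: a real runtime would also have to account for
    // warm-up, compilation time and data transfers.
    public static String pickFastest(Map<String, Runnable> candidates) {
        String best = null;
        long bestNanos = Long.MAX_VALUE;
        for (Map.Entry<String, Runnable> entry : candidates.entrySet()) {
            long start = System.nanoTime();
            entry.getValue().run();
            long elapsed = System.nanoTime() - start;
            if (elapsed < bestNanos) {
                bestNanos = elapsed;
                best = entry.getKey();
            }
        }
        return best; // null when there are no candidates
    }

    // Stand-in for a workload: sleeps to simulate a slower or faster device.
    public static Runnable simulatedDevice(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Runnable> candidates = new LinkedHashMap<>();
        candidates.put("cpu", simulatedDevice(200));
        candidates.put("gpu", simulatedDevice(10));
        candidates.put("fpga", simulatedDevice(100));
        System.out.println("Selected device: " + pickFastest(candidates));
    }
}
```

The sketch measures each candidate exactly once, which keeps it short but makes it sensitive to noise; the selection policy itself (lowest measured time wins) is only one of several that could be plugged in.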
TornadoVM can be regarded as an external library or a secondary VM dedicated to accelerating Java applications on heterogeneous hardware. It cannot run by itself; it requires a host VM to handle CPU execution. Until v0.5, TornadoVM could only be used with OpenJDK 8. Developers could download TornadoVM as a .jar file and connect it with OpenJDK. Then, when its API is invoked, TornadoVM takes control from the host VM, compiles and runs the relevant portion of the application on a GPU or an FPGA, and returns the results to the host VM. All memory management, synchronization and communication between the two VMs are handled transparently by TornadoVM. TornadoVM is exposed to the host VM only through the lightweight public API that developers use.
5. Merging the two worlds
TornadoVM 0.6 extends its supported platforms (host VMs) to OpenJDK 11 and GraalVM 19.3. Through the integration with GraalVM, TornadoVM can be used from within the various programming languages implemented on top of GraalVM to accelerate existing code on GPUs and FPGAs transparently and automatically. Below is an example of how we can accelerate code run through NodeJS with TornadoVM.
Figure 1 lists a Java snippet that performs the Mandelbrot computation via TornadoVM. As shown, the code follows TornadoVM’s simple coding principles and API: the @Parallel annotation is added to the loops’ induction variables, and a TaskSchedule is created inside the compute method. For more information regarding the API of TornadoVM and how to use it, please refer to: https://github.com/beehive-lab/TornadoVM
Figure 1: Mandelbrot computation with TornadoVM.
The example in Figure 1 runs and accelerates Mandelbrot on the heterogeneous accelerators available in the system. In our examples we use a GPU.
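For readers without access to the figure, the shape of such a kernel can be approximated with a minimal sketch in plain Java. This is not the code from Figure 1: it runs sequentially on the CPU, uses no TornadoVM classes, and the class name `MandelbrotSketch`, the method signature and the coordinate mapping are assumptions made for illustration. A comment marks the two outer loops whose induction variables would carry the @Parallel annotation described above.

```java
public class MandelbrotSketch {

    // Computes a size x size Mandelbrot image as intensity values in 0..255.
    public static short[] mandelbrot(int size, int maxIterations) {
        short[] output = new short[size * size];
        float space = 2.0f / size;
        // In a TornadoVM version, the induction variables i and j of these
        // two loops are where the @Parallel annotation would be placed.
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                // Map the pixel (i, j) to a point c = cr + ci*i in the complex plane.
                float cr = j * space - 1.5f;
                float ci = i * space - 1.0f;
                float zr = 0.0f;
                float zi = 0.0f;
                int count = 0;
                // Iterate z = z^2 + c until |z|^2 exceeds 4 or the budget runs out.
                while (count < maxIterations && zr * zr + zi * zi <= 4.0f) {
                    float nextZr = zr * zr - zi * zi + cr;
                    zi = 2.0f * zr * zi + ci;
                    zr = nextZr;
                    count++;
                }
                output[i * size + j] = (short) ((count * 255) / maxIterations);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        int size = 64;
        short[] image = mandelbrot(size, 1000);
        int inside = 0;
        for (short value : image) {
            if (value == 255) {
                inside++;
            }
        }
        System.out.println(inside + " of " + image.length + " pixels never escaped");
    }
}
```

Every pixel is computed independently of the others, which is what makes the two outer loops natural candidates for parallel execution on an accelerator.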
Figure 2 shows a NodeJS example of how to call the TornadoVM version of Mandelbrot via GraalVM’s polyglot runtime.
Figure 2: Calling TornadoVM Mandelbrot via GraalVM NodeJS.
As shown in Figure 2, the invocation of TornadoVM via GraalVM requires no additional changes compared to invoking standard Java code.
After initializing the Node server and running our TornadoVM polyglot application, we can see the computed output displayed in the web browser.
In our example, TornadoVM runs the Mandelbrot computation on an NVIDIA GTX 1050 GPU at a resolution of 1024x1024.
The GPU-accelerated computation with TornadoVM took 0.3 seconds, compared to ~18 seconds for the sequential Java code compiled by GraalVM, a speedup of roughly 60x.
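The speedup implied by these two timings is simply their ratio. The sketch below, in plain Java, computes it and also shows one simple way a wall-clock timing of this kind could be taken; the class `SpeedupSketch` and its helpers are hypothetical, the workload being timed is a placeholder, and this is not the benchmark harness used for the measurements above.

```java
public class SpeedupSketch {

    // Wall-clock time of a single run, in seconds.
    public static double timeSeconds(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1e9;
    }

    // Speedup = baseline time / accelerated time.
    public static double speedup(double baselineSeconds, double acceleratedSeconds) {
        return baselineSeconds / acceleratedSeconds;
    }

    public static void main(String[] args) {
        // Timings quoted in the article: ~18 s sequential, 0.3 s on the GPU.
        System.out.printf("Speedup from the quoted timings: about %.0fx%n",
                speedup(18.0, 0.3));

        // Timing an arbitrary placeholder workload with the same helper.
        double seconds = timeSeconds(() -> {
            double acc = 0;
            for (int i = 0; i < 1_000_000; i++) {
                acc += Math.sqrt(i);
            }
            if (acc < 0) {
                System.out.println(acc); // uses the result so the loop is not dead code
            }
        });
        System.out.printf("Placeholder workload took %.4f s%n", seconds);
    }
}
```

A single timed run, as here, includes JIT compilation and other warm-up effects; repeated runs would be needed for a more careful comparison.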
6. GraalVM-TornadoVM benefits
The benefits of TornadoVM’s interoperability with GraalVM’s polyglot runtime are listed below:
- Re-use TornadoVM-compatible Java code from within GraalVM’s polyglot runtimes.
- Requires zero knowledge from developers regarding hardware acceleration or manual coding of custom kernels.
- Code-once-accelerate-anywhere approach through code reuse and TornadoVM’s extended device coverage.
- Accelerate workloads on multicore Intel and AMD CPUs, NVIDIA and AMD GPUs, and Intel and Xilinx FPGAs.
- Enable dynamic reconfiguration of executed applications for user-defined optimal results and SLAs.
- TornadoVM guarantees no performance loss and follows a best-effort approach to accelerating the code. If acceleration is not feasible, it falls back to traditional CPU execution on the host VM.
- Dynamic and transparent code optimizations per-device. No need to recompile or retune manually written kernels.
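
The best-effort behaviour mentioned in the list above (try to accelerate, otherwise run on the CPU) follows a general pattern that can be sketched in plain Java. This is an illustration only, not TornadoVM's mechanism: the class `FallbackSketch`, the method `runWithFallback` and the use of a runtime exception to signal "acceleration not possible" are all assumptions.

```java
import java.util.function.Supplier;

public class FallbackSketch {

    // Tries the accelerated path first; if it fails, runs the sequential one.
    public static <T> T runWithFallback(Supplier<T> accelerated, Supplier<T> sequential) {
        try {
            return accelerated.get();
        } catch (RuntimeException e) {
            // Acceleration was not possible; use the ordinary CPU path instead.
            return sequential.get();
        }
    }

    // Ordinary sequential computation used as the fallback.
    public static int sumOnCpu(int[] data) {
        int sum = 0;
        for (int value : data) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        // Hypothetical accelerated path that is unavailable on this machine.
        Supplier<Integer> accelerated = () -> {
            throw new UnsupportedOperationException("no accelerator available");
        };
        Supplier<Integer> sequential = () -> sumOnCpu(data);
        int result = runWithFallback(accelerated, sequential);
        System.out.println("Result: " + result);
    }
}
```

The caller receives the same result whichever path ran, which is the property that lets the fallback stay invisible to application code.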
7. Conclusions
With v0.6, TornadoVM adds interoperability with GraalVM’s polyglot runtime, expanding its reach to the numerous programming languages running on top of GraalVM (JS, NodeJS, R, etc.). The combination of the unique characteristics of GraalVM and TornadoVM makes TornadoVM code reusable across the supported programming languages, enabling the hardware acceleration of suitable workloads on high-performance GPUs and FPGAs.