To cope with the challenges of exascale computing, particularly the massive numbers of heterogeneous processing units, the deep memory hierarchies, and the deep, heterogeneous communication facilities, application developers need support in all phases of the application lifecycle. This includes programming models that allow the construction of efficient yet portable applications, as well as advanced compilation techniques and adaptive runtime environments.

Applications at the exascale will need to exploit every bit of parallelism in their codes. This can be done using a variety of methods: message passing (e.g. MPI between nodes), shared-memory parallelism (e.g. OpenMP among the cores of a node), and vector and other instruction-level parallelism within each core (SSE, AVX, etc.). In addition, GPU-like accelerators may be available, whether on-chip sharing the same memory space, as co-processors attached via the network, or as GPUs connected via a PCI bus. At least in their current form, these accelerators offer a combination of vector and SIMD parallelism that can, and should, be exploited in parallel with the CPU resources.
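To make these levels concrete, the following is a minimal sketch of hybrid parallelism in C, assuming an MPI library and an OpenMP-capable compiler (built, for example, with mpicc -fopenmp): MPI distributes work between nodes, OpenMP shares a node's portion among its cores, and the simple stride-1 inner loop is written so the compiler can vectorise it with SSE or AVX instructions. The array size and the daxpy-style kernel are illustrative only, not taken from any CRESTA code.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1048576  /* illustrative local problem size per rank */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double *x = malloc(N * sizeof(double));
        double *y = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        /* Shared-memory parallelism: OpenMP threads split the loop across
         * the cores of the node; the stride-1 body lets the compiler emit
         * vector (SSE/AVX) instructions for instruction-level parallelism. */
        double local_sum = 0.0;
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = 0; i < N; i++) {
            y[i] += 3.0 * x[i];      /* vectorisable daxpy-style update */
            local_sum += y[i];
        }

        /* Message passing: combine per-node results across the machine. */
        double global_sum = 0.0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %f\n", global_sum);

        free(x); free(y);
        MPI_Finalize();
        return 0;
    }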

How can application codes exploit all these levels of parallelism? CRESTA conducted research into current task- and data-parallel programming models; into possible hybrid models; and into parallel mark-up frameworks and auto-tuning techniques that allow the compiler or runtime to decide where to schedule parallel operations. Active engagement in open standards bodies was a key feature of this work.
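As one concrete instance of the mark-up approach, the sketch below uses OpenACC-style directives, assuming an OpenACC-capable compiler. The directives describe only what may run in parallel and which data must move; the compiler and runtime then decide whether to schedule the loop on an attached accelerator or on the host CPU, so the same source remains portable. The array names and sizes are illustrative, and this is not presented as CRESTA's specific framework.

    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N];

        for (int i = 0; i < N; i++) { a[i] = (double)i; }

        /* Mark the loop as parallel and describe the data movement; the
         * compiler/runtime chooses where (accelerator or CPU) and how
         * (gangs, workers, vector lanes) to execute it. */
        #pragma acc parallel loop copyin(a[0:N]) copyout(b[0:N])
        for (int i = 0; i < N; i++) {
            b[i] = 2.0 * a[i];
        }

        printf("b[N-1] = %f\n", b[N - 1]);
        return 0;
    }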