Mythbusters | Imagination’s GPU Compute special edition


A dangerous trend has been spreading recently in certain circles: the promotion of misleading ideas about what GPU compute is all about. But worry not; we here at Imagination have decided to put an end to all the nonsense and give you the proverbial facts in a no-spin zone.

We’ve assembled the best engineers in one room and asked them to clear up the confusion by coming up with some clear messages on GPU compute APIs and their relation to heterogeneous computing. After five minutes spent debating what Justice League-type name we should adopt, we quickly started working on the assignment.

Going through all the related material, we’ve come up with a list of common myths about compute APIs and successfully separated fact from fiction. Here is an overview of our Judge Dredd-worthy ‘trial, judge and jury’ rebuttal.

Myth #1: GPU compute-based mobile applications will be ported from the desktop space

Totally busted: The mass migration from desktop to mobile is not needed and will not happen for mobile GPU compute apps. Developers are not porting High Performance Computing (HPC) applications to the embedded space: their use cases are new and improved apps for mobile devices with limited battery life. And performing calculations in the least power-hungry way is only possible on an embedded GPU like Imagination’s PowerVR Series6.

One of the most popular GPU compute APIs today is OpenCL. It defines two profiles: Full (or Desktop) and Embedded. The Desktop Profile enables HPC to be performed using GPUs, and the Embedded Profile allows many computationally demanding calculations to be implemented in software, giving tech-savvy OEMs within the mobile space the ability to differentiate their products using software. We have also seen use cases that reduce power consumption, reduce the bill of materials (why add a DSP when you can multi-task the GPU for both graphics and parallel computing?) and improve time-to-market.


Imagination’s PowerVR GPUs running OpenCL on Amazon’s Kindle Fire HD tablet*

Supporting just one profile can quickly limit your embedded needs and options. Instead, Imagination’s PowerVR architecture is designed to handle both profiles, with a wide range of products naturally addressing our customers’ needs, from efficient embedded processing through to high performance computing.

The PowerVR Series6 feature set includes floating point and fixed point data paths optimized for low power, which makes it the perfect choice for integrating a GPU into a system that doesn’t burn through your battery when launching an OpenCL application.

The mandatory elements in the Desktop profile provide the high-precision floating point accuracy required by the HPC market for scientific calculations. The Desktop profile mostly disregards power considerations in favor of raw speedups, and is largely implemented by graphics processors that end up in the large Matrix-like farms we’ve all read about, consuming megawatts of power. All of the real-world embedded use cases we’ve seen fit within 32 bits of relaxed precision and are largely implemented by graphics processors optimized for the Embedded profile. It is these processors that end up in the world’s leading mobile devices.
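To make the precision argument a little more concrete, here is a minimal Python sketch, not taken from any PowerVR driver or from the OpenCL specification. It emulates FP32 and FP16 arithmetic by rounding after every operation and compares an 8-bit luma calculation against the FP64 result. The helper names (`to_fp32`, `to_fp16`, `luma`, `max_code_difference`) and the choice of the common Rec. 601 weights are our own assumptions for illustration.

```python
import struct

def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def to_fp16(x):
    """Round a Python float (FP64) to the nearest FP16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]

# Illustrative weights (Rec. 601 luma coefficients); an assumption, not from the article.
WEIGHTS = (0.299, 0.587, 0.114)

def luma(r, g, b, rnd=lambda x: x):
    """Weighted sum of 8-bit channels, applying rnd after every operation."""
    wr, wg, wb = (rnd(w) for w in WEIGHTS)
    acc = rnd(wr * r)
    acc = rnd(acc + rnd(wg * g))
    acc = rnd(acc + rnd(wb * b))
    return min(255, max(0, int(acc + 0.5)))  # round to the nearest 8-bit code

def max_code_difference(rnd, step=17):
    """Largest gap, in 8-bit codes, between the FP64 result and the reduced-precision one."""
    levels = range(0, 256, step)  # 0, 17, ..., 255
    return max(
        abs(luma(r, g, b) - luma(r, g, b, rnd))
        for r in levels for g in levels for b in levels
    )

diff_fp32 = max_code_difference(to_fp32)
diff_fp16 = max_code_difference(to_fp16)
print("max difference vs FP64 (8-bit codes): FP32 =", diff_fp32, ", FP16 =", diff_fp16)
```

For this particular workload the reduced-precision results never drift more than one 8-bit code away from the FP64 reference on the sampled inputs, which is the kind of behaviour the paragraph above is describing. Other workloads may of course be more sensitive.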

Myth #2: OpenCL for heterogeneous processing is about moving code between the CPU and GPU

Completely busted: This is a classic case of misunderstanding heterogeneous processing and what it means. Current compute algorithms targeting CPUs must be modified or rewritten to benefit from the parallelism of GPUs, because parallel kernels need to scale with the number of execution units and organize data differently from sequential functions running on the CPU.

Heterogeneous comes from joining together two Greek words: ἕτερος (heteros, “different”) and γένος (genos, “kind”). GPUs and CPUs are inherently very different pieces of hardware. Their architecture is different because they are designed to perform distinctive tasks: the CPU is particularly good at handling sequential code performing control and I/O functions while the GPU is very efficient at executing computations with a high degree of parallelism typically found in image and video processing, game physics or augmented reality apps.
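As a rough sketch of why CPU code cannot simply be moved across, consider summing a buffer. The Python below is purely illustrative (the function names `sum_sequential` and `sum_tree` are hypothetical, and real GPU code would be written as an OpenCL kernel rather than in Python). The sequential version carries a dependency from one iteration to the next; the tree version reorganizes the work into passes in which every addition is independent and could be handed to a separate execution unit.

```python
def sum_sequential(values):
    """CPU-style loop: each iteration depends on the result of the previous one."""
    total = 0
    for v in values:
        total += v
    return total

def sum_tree(values):
    """Pairwise (tree) reduction.

    Within one pass, every addition touches a disjoint pair of elements,
    so all of them could run in parallel. Returns (total, passes).
    """
    data = list(values)
    n = len(data)
    stride = 1
    passes = 0
    while stride < n:
        # On a GPU, each index i in this loop would be its own work-item.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
        passes += 1
    return (data[0] if data else 0), passes

total, passes = sum_tree(range(1, 9))
print(sum_sequential(range(1, 9)), total, passes)
```

The tree version needs only about log2(n) passes instead of n dependent steps, but it only pays off when there are enough execution units to run each pass in parallel, which is why the same algorithm is written differently for each kind of processor.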


Imagination’s PowerVR GPUs running OpenCL on a range of platforms**

Today’s most compelling applications are written to make full use of all available processing resources, including CPUs, GPUs and any other programmable units available on-chip. Heterogeneous processing architectures are not in themselves new, since any system-on-chip (SoC) with a CPU and DSP engine fits the criteria. However, the ability to use embedded GPUs for general-purpose computations beyond 3D graphics is a recent development, giving programmers more flexibility in the kinds of applications they can write.

Myth #3: Compute APIs like OpenCL are only for high-end smartphones and tablets

Again, utterly busted: OpenCL was not created just for a certain range of processors. Instead, it is an open standard aimed at all modern CPU and GPU architectures. Consumers expect the same experience across all their devices. Furthermore, computing platforms on a budget usually have a low-cost CPU. Integrating an OpenCL-capable PowerVR graphics core helps improve overall system performance by taking over the heavy lifting that a CPU was not designed to tackle in the first place.

Some graphics vendors have chosen not to offer OpenCL compute capabilities for certain GPUs. This will cause an immediate negative reaction from both consumers and developers. Imagination’s approach was to expand OpenCL support across all our Series5/5XT and Series6 cores. By doing so, we’ve offered all our partners equal opportunities to deliver the right balance of performance they require and not put certain designs at an unfair disadvantage.

This proved to be the right choice, as we are now seeing the latest mobile operating systems like Android 4.2 (Jelly Bean) adopt Renderscript-type APIs such as Filterscript. Filterscript is a perfect match for platforms integrating PowerVR Series5/5XT and Series6 GPUs because it relaxes some of the existing Renderscript APIs, allowing the resulting code to run on a wider variety of processors (CPUs, GPUs, and DSPs). Developers can use Filterscript on PowerVR graphics cores to improve applications dealing with image processing operations, such as those written with an OpenGL ES fragment shader.

In Filterscript, built-in types will not usually exceed 32 bits and relaxed floating point precision is a must. Supporting FP64 and strict IEEE 754 floating point precision therefore translates into wasted silicon and power, as no application running on Android would ever use it.

Final reflections

We are at the start of the heterogeneous, parallel-processor-aware software revolution, where semiconductor IP companies and developers must work together more closely. APIs like OpenCL, Filterscript/Renderscript or DirectCompute, from organizations and companies like the HSA Foundation, the Khronos Group, Google and Microsoft, will grow in popularity and enable the growth of heterogeneous, parallel-processing-aware applications.

Mobile developers will favor the embedded-oriented compute standards as they did with OpenGL ES. Low power implementations that minimize numerical precision will be the key differentiating factor for mobile computing platforms. This is because developers will make full use of FP32 and FP16 floating point, integers, and fixed point numbers as much as they can. It allows them to target maximum performance and portability with minimal power consumption, which has been the essence of mobile and embedded computing for the past two decades.
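For readers less familiar with fixed point, here is a small, hedged Python sketch of the idea. It blends two 8-bit pixel values using only integer multiplies, adds and a shift, and compares the result with a floating point blend. The Q8.8 format and the names `to_fixed`, `blend_fixed` and `blend_float` are assumptions chosen for illustration, not part of any API mentioned in this article.

```python
FRAC_BITS = 8
ONE = 1 << FRAC_BITS  # the value 1.0 in an (assumed) Q8.8 fixed point format

def to_fixed(x):
    """Convert a float in [0, 1] to Q8.8."""
    return int(round(x * ONE))

def blend_fixed(a, b, alpha_q):
    """Blend two 8-bit values with integer multiply, add and shift only."""
    mixed = a * alpha_q + b * (ONE - alpha_q)  # result is scaled by ONE
    return (mixed + ONE // 2) >> FRAC_BITS     # round to nearest, drop the fraction

def blend_float(a, b, alpha):
    """Reference blend in floating point, rounded to the nearest integer."""
    return int(a * alpha + b * (1.0 - alpha) + 0.5)

# Largest disagreement, in 8-bit codes, over every pair of inputs.
worst_quarter = max(
    abs(blend_fixed(a, b, to_fixed(0.25)) - blend_float(a, b, 0.25))
    for a in range(256) for b in range(256)
)
worst_third = max(
    abs(blend_fixed(a, b, to_fixed(0.3)) - blend_float(a, b, 0.3))
    for a in range(256) for b in range(256)
)
print(worst_quarter, worst_third)
```

When the blend factor is exactly representable in Q8.8 (such as 0.25), the integer path matches the floating point one for every input; for a factor such as 0.3 the two stay within one 8-bit code of each other.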

We look forward to working with our PowerVR Insider ecosystem partners on developing the next generation of applications that will run on our groundbreaking PowerVR architecture.

If you want to see a demonstration of the impressive potential behind our graphics and compute technologies, then stop by our booths at the various events we take part in around the world. For more announcements on all things GPU compute, follow us on Twitter (@ImaginationPR, @PowerVRInsider and @GPUCompute).

* Image and video courtesy of Engadget, all rights reserved

** Image courtesy of Anandtech, all rights reserved


  • OpenCL

    > Some graphics vendors have chosen not to offer OpenCL compute
    > capabilities for certain GPUs

    > Imagination’s approach was to expand OpenCL support across all
    > our Series5/5XT and Series6 cores.

    So where is it?
    On which SoC with SGX5 GPU can I (as a developer working for a small company) use OpenCL today? Yeah I think there is NONE.

    It WORKS for example on Pandaboard already – why not just release that?!
    (I don’t care about bugs there are always bugs)

    This has been asked frequently for at least the last 3 years, and there were announcements 2 years ago that it works… but nothing has been released anywhere…

  • alexvoica

    Hi,

    All our PowerVR GPUs are capable of supporting OpenCL, Renderscript Compute and Filterscript. We are working with our ecosystem partners to make these compute APIs available to developers but it is a combination of OEM/ODM manufacturers, operating system providers and silicon manufacturers deciding together which API they choose to expose and that usually takes some time.

    We’ve recently shown the compute performance of mainstream devices like Amazon’s Kindle Fire 8.9 tablet using a proprietary version of Imagination’s drivers (DDK) which is available to our current and future licensees, under NDA.

    http://www.engadget.com/2013/01/04/opencl-mod-for-the-kindle-fire-hd/

    The Pandaboard is not a developer project that Imagination oversees directly, so we have limited control over the drivers released for that platform. You can find out more about it at http://pandaboard.org/content/platform.

    I hope this helps.

    Best regards,
    Alex.
