Mobile GPU compute must be practical GPU compute

By definition, mobile application scenarios must be power efficient, for a simple reason: the device runs from a battery. The goal is to allow a consumer to enjoy the full functionality of a device for as long as possible on a single charge. This means that any usage scenario must be practical and useful, not just something that burns through battery life and leaves an unhappy consumer carrying around an unusable device.

In terms of mobile GPU compute, any compute scenario must be a practical, useful GPU compute scenario. The key characteristics explained in my previous article immediately come to mind: only consider tasks suitable for the GPU. Ideally this means parallel compute tasks with minimal divergence, not serial, divergent tasks – a perfect example of matching the right compute resource to the right task.
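To make the divergence point concrete, here is a minimal sketch in OpenCL C (one of the compute APIs PowerVR GPUs support; the kernel names and the thresholding task are illustrative assumptions, not code from any PowerVR SDK) showing the same per-pixel operation written in a divergent and a branch-free form:

    /* Divergent version: work-items whose inputs fall on different sides
     * of the threshold take different paths through the if/else, which
     * serializes execution within a SIMD group. */
    __kernel void threshold_divergent(__global const float *in,
                                      __global float *out,
                                      const float threshold)
    {
        size_t i = get_global_id(0);
        if (in[i] > threshold)
            out[i] = 1.0f;
        else
            out[i] = 0.0f;
    }

    /* Branch-free version: select() picks one of two values, so every
     * work-item executes the same instruction stream – the minimal
     * divergence profile described above. */
    __kernel void threshold_uniform(__global const float *in,
                                    __global float *out,
                                    const float threshold)
    {
        size_t i = get_global_id(0);
        out[i] = select(0.0f, 1.0f, in[i] > threshold);
    }

Both kernels are fully parallel (one work-item per element); only the second also avoids divergence.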

But passing the processing-type check is not enough: the task itself has to be practical, and the overall usage scenario of the device has to be practical too.

Examples of practical and impractical mobile GPU compute applications

When running a game with console-quality graphics, using GPU compute for some physics calculations does not make sense. While physics calculations are parallel and have limited divergence, in this usage scenario the GPU is already very busy delivering stunning graphical quality to a high-resolution screen; further loading (or, more accurately, overloading) the GPU with a physics task will typically just lead to a reduced consumer experience (e.g. a lower frame rate and/or lower image quality).

On the other hand, when snapping multi-megapixel pictures with your mobile phone camera, you may want to run some image enhancement routines. Offloading these onto the GPU makes sense, as this is a parallel, non-divergent type of task. Additionally, during the processing the user is essentially just waiting to see their picture, and hence the GPU will not be very busy – apart from, probably, showing an idle/waiting animation in the GUI.
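To sketch what such an image-enhancement routine might look like – purely illustrative, in OpenCL C, with hypothetical kernel and parameter names rather than code from any real camera pipeline – a simple per-pixel brightness/contrast adjustment could be written as:

    /* Illustrative only: each work-item handles one pixel independently
     * (fully parallel) and executes the same arithmetic regardless of the
     * pixel's value (non-divergent). */
    __kernel void enhance(__read_only image2d_t src,
                          __write_only image2d_t dst,
                          const float contrast,   /* e.g. 1.2f  */
                          const float brightness) /* e.g. 0.05f */
    {
        const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                              CLK_ADDRESS_CLAMP_TO_EDGE |
                              CLK_FILTER_NEAREST;
        int2 pos = (int2)(get_global_id(0), get_global_id(1));

        float4 px = read_imagef(src, smp, pos);
        /* Scale around mid-grey, then shift; clamp keeps values in range. */
        float4 res = clamp((px - 0.5f) * contrast + 0.5f + brightness,
                           0.0f, 1.0f);
        res.w = px.w; /* preserve alpha */
        write_imagef(dst, pos, res);
    }

Every pixel runs the same arithmetic, so the workload maps cleanly onto the parallel, non-divergent profile described above.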

So two different scenarios both pass the type-of-processing check, but only one passes the practical usage check.

There are other usage scenarios that pass the processing-type check but may be at odds with the practicality check. Video encode and decode fall into this grey area: most devices have dedicated resources for executing these tasks in the form of hardware blocks (video processing units). For example, PowerVR VPUs (essentially fixed-function hardware) are far more power and bandwidth efficient than a programmable, generalized compute resource such as a PowerVR GPU. However, on platforms that do not include dedicated hardware for video transcoding, performing it with GPU compute might be a more realistic and efficient option than using, for example, the CPU.

A failed usage scenario for mobile would be extreme types of compute which require massive processing time and precision, e.g. protein folding or other scientific simulations. These fail the practicality check: they are things you should never even consider doing on a mobile device. While you may want to view the results on your mobile device, this type of compute should run on dedicated servers in the cloud.

Biomedical simulations and weather pattern distributions are some examples of impractical use cases for mobile GPU compute

Most compute usage scenarios for battery-powered mobile devices, at least in the near term, will be practical, common-sense ones dominated by image and video post-processing and camera-vision tasks. All of these meet the compute-type checks as well as the practicality requirement.

Image processing, camera vision and augmented reality applications are some examples of practical use cases for mobile GPU compute

A basic rule to remember: just because a task is parallel and non-divergent doesn’t mean that it should run on a mobile GPU – it must always be a sensible use of the consumer’s valuable battery life.

If you have any questions or feedback about Imagination’s graphics IP, please use the comments box below. To keep up to date with the latest developments on PowerVR, follow us on Twitter (@GPUCompute, @PowerVRInsider and @ImaginationPR) and subscribe to our blog feed.


  • Sean Lumly

    Great post! Succinct, clear, and spot on.

Now if only Android’s Renderscript allowed for specifying the executing hardware. This, and the lack of features, is somewhat frustrating when considering Android’s heterogeneous-compute API – I sincerely hope that it improves to take advantage of the wonderful characteristics of your GPUs.

  • Hi Sean,

    Thanks for the feedback, there will be more on Renderscript in a future blog article.


  • Sean Lumly

Thanks Alex! Any additional insight into Renderscript and Imagination GPUs (and CPUs) would be very welcome – I’m looking forward to the post! I’m actually excited about the API, though its progress seems somewhat slow.

  • verdantchile

The microkernel of PowerVR GPUs should assist them in getting better results from Renderscript even with the API’s lack of functionality in that regard. Future enhancements to Renderscript, as well as more targeted implementations with Filterscript, should bring improvements too.

Determining whether mobile GPU compute makes sense at the task level requires evaluating the trade-offs involved. Processing game physics when it takes away from the visual splendor you were after in the first place can be a bad trade-off, but seeing realistic physical behavior of in-game objects could sometimes be more impressive than simply layering additional scene complexity onto the graphics.

  • sikulas

You always write that the new PowerVR G6xx0 GPUs have a small area and low power consumption. But…

1. What is the REAL area of the PowerVR G6100/G6200/G6400?

2. And what is the frequency of the PowerVR G6400/G6200/G6100 GPUs now (for example, the G6200 in the MT8135)?

  • verdantchile

1. The resulting die size of a core implemented in a system-on-chip can differ depending on whether the semiconductor company optimized the implementation for a smaller area or a higher frequency, or laid it out to better control power consumption and heat dissipation (at the expense of extra die area). Choices to vary on-die buffer sizes, or to provide the support resources that give the core better bandwidth, can also be made and affect the resulting die area.

I assume PowerVR cores might end up somewhat larger than some competing cores, but trading some extra area for better thermals/power efficiency is the right design choice for mobile designs.

    2. Speculation is that the G6200 in the MT8135 may end up getting targeted at around 300 MHz. That’s a low target compared to what some other semiconductor companies have been considering with their Rogue implementations: 400 MHz to 600 MHz and even beyond.

  • sikulas

    Thanks for the answer!
Can you clarify a couple of things about the PowerVR G6200 in the MT8135:

    1. About the area.
If you compare the G6200 with the SGX544 and SGX554, would its area be like an MP1, MP2 or MP4 configuration (or somewhere in between)?

    2. About the frequency.
MediaTek “said” about 80 GFLOPS for the PowerVR G6200 in the MT8135. But if you look at the “formula”:
16 USSE2 pipelines × 2 clusters × 0.300 GHz × 9 = 86.4 GFLOPS
So is the frequency of the PowerVR G6200 in the MT8135 less than 300 MHz? Or what is the TRUE formula for calculating GFLOPS (PowerVR G6xxx)?
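(For reference, taking the commenter’s USSE2-style formula at face value – the reply below disputes its applicability to Series6 – the clock implied by an 80 GFLOPS rating would be:

    f = 80 GFLOPS / (16 × 2 × 9 FLOPs per clock) ≈ 0.278 GHz ≈ 278 MHz

i.e. just under 300 MHz, but only under those assumptions.)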

  • Hi,

Unfortunately, I cannot disclose the exact frequency at which the PowerVR G6200 inside the MT8135 processor operates. The same goes for area; these numbers are usually revealed only under strict NDA.

However, all PowerVR G6xxx GPUs have USCs (Unified Shading Clusters); the USSE v2 shading engine is specific to PowerVR Series5XT GPUs, therefore your formula is incorrect.

You can read more about the PowerVR G6200 in this press release:


  • Hi,

    This is a very detailed and spot-on explanation, thanks for stepping in.


  • Hi,

Indeed, all PowerVR G6x30 GPUs have been optimised for maximum efficiency and still manage to keep power consumption to a minimum even with an incremental increase in area; PowerVR G6x00 GPUs are designed to deliver the best performance in the smallest area possible.

An example feature included in PowerVR G6x30 cores is lossless compression, which reduces GPU bandwidth usage, thus enabling higher performance and reduced power consumption.

    As always, I am unable to comment on the specific GPU frequency for any application processor unless explicitly stated by the silicon vendor.


  • sikulas

Alexandru, would you please write the TRUE formula for calculating GFLOPS for PowerVR Series6?

  • sikulas

I read it all, of course. But it also says that the information is “under NDA”.

OK. Do you know whether MediaTek plans to release new processors later this year (2013) that use PowerVR Series6 (MT6592, MT6588)? … if this information is not “under NDA”, of course 🙂

  • roninja

Rumours suggest the MT6588 is an SGX544 MP1, and the MT6592 might actually be a Mali core now, either 4xx or 6xx, according to some research published a few weeks ago by Maybank. In fact, this whole octa-core thing is questionable, as remarked recently by Anand Chandrasekher, formerly of Intel and now employed by Qualcomm – a person who would be very familiar with the PowerVR family, by the way (I digress!). How the MT6592 compares with the MT8135 will be interesting.

  • sikulas

The Mali-400 MP4 is used in the MT6582 now. According to rumours, the MT6592 may use an SGX544 MP4 or SGX554 MP4. But… the performance of the G6200 (in GFLOPS) is about that of a PowerVR SGX5x4 MP4 – that is why it may also be used… 🙂

Because our PowerVR ‘Rogue’ cluster architecture scales linearly in performance, PowerVR G6200 (two clusters) offers twice the GFLOPS performance of PowerVR G6100 (one cluster) at the same frequency.

    Another great advantage of PowerVR ‘Rogue’ GPUs is that the cluster-based structure avoids replicating coherency-related overhead resources that competing multicore GPUs still need to maintain.


  • Пламен Николов

Alexandru Voica, you’re wrong:

PowerVR G6200 (MediaTek):

16 USSE2 pipelines × 2 clusters × 0.280 GHz × 9 ≈ 80 GFLOPS

This is tantamount to an SGX554 MP4 (80 GFLOPS).

EE Times says:

  • Hi,

    How does that chart contradict any of the statements in the article above (or my comments)?

    Please try to stay on topic, this article is about mobile GPU compute applications and makes no reference to PowerVR G6200 or any other specific GFLOPS numbers.

    If you need more information on the MT8135, please click on the link below.

    Best regards,

  • Пламен Николов

Below, you said that the formula was wrong. On the contrary.
Although that article does not mention it, and it is difficult to know the specific results, it is ultimately almost certain that things look as the formula indicates. Of course, the Series6 GPUs allow much higher clock frequencies, but that is harmful to the battery in PDA devices.
Thanks for the reply!

  • On the contrary, PowerVR Series6 GPUs introduce a number of hardware features designed to keep power consumption to a minimum (lossless image compression, PVRTC/PVRTC2, etc.).

    Please read the articles carefully before jumping to conclusions or speculating on performance numbers.


  • Пламен Николов

Whether it’s USSE2 or USC doesn’t really matter. The important thing is that there are 16 pipelines!!
Ultimately, the PowerVR G6200 (MediaTek) is 80 GFLOPS (280 MHz). EE Times says so.