Architectures and ISAs: what really matters in mobile processing
I’ve recently returned from the Linley Tech Mobile Conference, an exciting event held yearly in Silicon Valley. It features a two-day, single-track marathon of technical presentations on mobile processing from a range of companies such as Synopsys, Intel, GLOBALFOUNDRIES, Qualcomm (all part of Imagination’s growing ecosystem, might I add) and many others.
This year we were greeted by the familiar Santa Clara landscape and the Hyatt Regency Hotel, a suitable location for the large number of attendees from all sides of the tech aisle, including press, analysts, senior engineers and business managers.
My presentation was part of the third session, which focused on mobile CPUs. It explained why, in spite of all the trends towards high-level software development and abstraction away from the underlying CPU and GPU instruction set architectures (ISAs), efficient processor architectures built for scalability from the start make all the difference when designing CPUs, GPUs and other processors for mobile devices.
Power and shrinking product development cycles – the current battlegrounds in mobile processing
Mobile product development currently faces two very different challenges that control the pace of computing advancements and directly link back to the point above.
On one side, power stands as the ultimate battleground which every major processor IP and silicon vendor out there is trying to dominate. Balancing high performance and low power consumption has become a carefully choreographed act for system designers, influencing every major SoC decision. It has already dominated 28nm designs and will continue to define sub-20nm SoCs. But more importantly, it transcends mobile, as keeping thermal envelopes from expanding is a vital aspect for many markets, from embedded computing and mobile processing to networking, M2M and the Internet of Things.
The other big challenge we’ve encountered is the shrinking product development cycle. 18-24 months used to be the norm, but we are now seeing some of our customers pushing new chipsets every 6 to 12 months. This requires an amazing engineering effort and is exacerbated when those products are scaled across tiers of markets.
Convergence is aggressively accelerating design cycles for markets which traditionally were less dynamic, where products like smart TVs, portable gaming consoles or connected cars need to have the latest technology to appeal to consumers that are used to personal compute devices the size of a credit card.
Code portability, both in terms of re-use and better distributed use across different compute resources on a chip, is a solution to both the power and the shrinking development cycle challenges. It is the basis for the heterogeneous processing revolution and for the technologies that make it a reality by facilitating the move away from low-level hardware dependency. CPUs today run the bulk of general-purpose software, but better utilization of existing SoC elements must be the primary contributor to future performance increases. A good example is LLVM: it removes programmers’ exposure to the underlying ISA and enables code portability across devices and architectures.
PowerVR GPUs drive heterogeneous processing efficiency
Imagination’s PowerVR ‘Rogue’ architecture is approaching 1 TFLOPS of performance for mobile devices. The graphics and compute potential of these highly parallel processors now defines an important part of any system’s performance and features.
Historically, the utilization of that performance has been mostly limited to driving the display of the device.
But given such capability, improving SoC efficiency depends on tapping into the GPU’s potential through new and improved APIs, scalable software solutions and a unified programming environment.
Revisiting the RISC architecture philosophy
In spite of this trend, the CPU remains the primary and most used general-purpose programmable unit of a system. Imagination’s MIPS architecture traces its roots back 30 years to John Hennessy and his team, who brought expertise in compiler theory to the definition of a pure RISC architecture implementation, facilitating the development of optimizing compilers.
Thirty years later, this emphasis is perfectly aligned with the industry’s shift towards code portability using JIT/dynamic compiler technologies at run time. MIPS ISA features such as a single operation per instruction, simple addressing modes, no predicated execution and no integer condition bits translate into real-world benefits that help system designers achieve higher performance and build high-end, superscalar, out-of-order CPUs operating at high frequencies while keeping power consumption firmly in check.
In other words, despite technologies that free the industry from historical ties to underlying ISAs, inherent architectural attributes remain important for dynamic compilation performance, how efficiently an architecture implements in silicon, and support for open standards and operating systems.
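To make the "no integer condition bits" point a little more concrete, here is a minimal sketch in Python. It is a toy model only, not a description of any real pipeline: the `Instr` record, the `hazards` helper and the flag-based instruction sequence are all hypothetical names and assumptions introduced for illustration. It lists the ordering constraints (read-after-write, write-after-read, write-after-write) between instructions. In the MIPS-style sequence the comparison result lives in an ordinary register, so the unrelated add carries no constraint at all; in the assumed flag-based ISA, where every ALU operation writes a shared flags resource, the same add becomes entangled with the compare and the branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction with the resources it reads and writes (toy model)."""
    text: str
    reads: frozenset
    writes: frozenset


def instr(text, reads=(), writes=()):
    """Build an instruction record from its explicit and implicit operands."""
    return Instr(text, frozenset(reads), frozenset(writes))


def hazards(program):
    """List (earlier_index, later_index, kind, resource) for each hazard pair."""
    found = []
    for j, later in enumerate(program):
        for i in range(j):
            earlier = program[i]
            for res in sorted(earlier.writes & later.reads):
                found.append((i, j, "RAW", res))
            for res in sorted(earlier.reads & later.writes):
                found.append((i, j, "WAR", res))
            for res in sorted(earlier.writes & later.writes):
                found.append((i, j, "WAW", res))
    return found


# MIPS-style: the comparison result lands in an ordinary register (t0).
# The hard-wired zero register is a constant, so it is not tracked here.
MIPS_STYLE = [
    instr("slt  t0, a0, a1", reads={"a0", "a1"}, writes={"t0"}),
    instr("addu v0, a2, a3", reads={"a2", "a3"}, writes={"v0"}),
    instr("bne  t0, zero, target", reads={"t0"}),
]

# Hypothetical flag-based ISA (an assumption, not a real instruction set):
# every ALU operation implicitly writes one shared "flags" resource.
FLAG_STYLE = [
    instr("cmp a0, a1", reads={"a0", "a1"}, writes={"flags"}),
    instr("add v0, a2, a3", reads={"a2", "a3"}, writes={"v0", "flags"}),
    instr("blt target", reads={"flags"}),
]

if __name__ == "__main__":
    for name, program in (("MIPS-style", MIPS_STYLE), ("flag-style", FLAG_STYLE)):
        print(name)
        for i, j, kind, res in hazards(program):
            print(f"  {program[i].text!r} -> {program[j].text!r}: {kind} on {res}")
```

In this toy model the MIPS-style sequence has a single constraint (the branch reads `t0`), while the flag-style sequence has three, all on the shared flags. Real flag-based designs deal with this through flag renaming or non-flag-setting instruction variants; the sketch only shows where the extra bookkeeping comes from.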
The proAptiv CPU is a perfect example of how Imagination’s uncompromising MIPS architecture translates into a high performance, low power CPU. For example, a lack of predicated instructions in the architecture eases implementation of branch prediction schemes, contributing to the proAptiv CPU’s leading branch prediction performance in its class.
Further architecture efficiencies and microarchitecture design choices resulted in a core delivering the highest CoreMark/MHz scores for a CPU in its class at time of introduction, and at ~60% of the area of competing CPU solutions.
Mobile CPUs – not just application processors
But application processors are not the only CPUs in mobile SoCs. Other functions, such as the communication processing in the baseband that ties the mobile device to the network, are just as important and can benefit from other architecture attributes. Multithreaded processors can deliver more performance in an area and power footprint similar to that of a single-core CPU.
This can be achieved with better support for real-time/deterministic processing requirements and QoS, through a built-in hardware scheduler and yield qualifier. Using our multithreaded and multicore solutions, customers can build superior baseband solutions based around our efforts with partners on optimized LTE baseband stacks and multi-threaded RTOS support from several providers. Using such technology has translated to gains of 37-53% in throughput performance on LTE traffic, relative to use of a single-threaded CPU core.
Overall, the foundation technologies for heterogeneous computing have started to improve the outlook of mobile computing. Imagination offers a wide portfolio of IP processing units and embraces a scalable, portable, highly open, standards-driven future. We believe architectures (CPU, GPU, or otherwise) should compete based on their true merits and that the industry benefits from not being tethered to the ISA dependencies of the past.