Simultaneous multithreading in CPUs and its importance for network processing

The two-day Linley Tech Processor Conference has grown to feature 25 technical presentations, together with panel discussions, exhibits and other networking opportunities. It was an ideal place for Imagination to showcase our Meta simultaneous multithreading (SMT) CPU technologies, as the topics covered ranged from trends in multicore processor design and programming to “manycore” architectures that address high-speed networking, licensable CPUs, platform security and high-performance memories for networking.

The problem: Today’s systems are affected by increased memory latency

Current systems have seen a rise in competing demands on memory subsystems from multiple units, including CPUs, DSPs, VPUs and GPUs. As a result, a processor’s potential throughput is greatly reduced by SoC memory latency. Traditional CPUs cannot hide this latency efficiently; they can stall for as much as 50% of the time, throwing away valuable processing bandwidth.
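To put the 50% figure in context, here is a minimal back-of-the-envelope sketch. It assumes (a simplification of ours, not a Meta measurement) that each hardware thread is stalled on memory independently for a fixed fraction of cycles, so the core can issue an instruction whenever at least one thread is ready. The function name `issue_utilization` is hypothetical.

```python
def issue_utilization(stall_fraction: float, threads: int) -> float:
    """Fraction of cycles in which at least one thread can issue.

    Assumption: every thread is stalled independently with probability
    `stall_fraction` in any given cycle.
    """
    if not 0.0 <= stall_fraction <= 1.0:
        raise ValueError("stall_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # The core idles only when all threads are stalled at the same time.
    return 1.0 - stall_fraction ** threads


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, issue_utilization(0.5, n))
```

Under this toy model, a single thread that stalls half the time keeps the core busy for half the cycles, while four such threads keep it busy for roughly 94% of them. Real workloads will differ, since stalls are rarely independent.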

Simultaneous multithreading CPUs like Imagination’s Meta processor are an ideal solution to this issue, which is widespread in today’s SoCs, because their efficiency-oriented design offers maximum tolerance of system latency.

Imagination's Meta multi-threaded CPU architecture overview

Because the multithreaded Meta CPU benefits from both instruction-level and thread-level parallelism, modern router architectures can use it to implement advanced functionality such as faster database searching or assigning different threads to handle separate sets of incoming traffic. Additionally, Meta can adapt to tasks requiring special resources, providing flexibility without any added software complexity. It also addresses security concerns by running threads independently, so processes are kept private from each other.
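One common way to assign separate sets of incoming traffic to different threads is to hash each packet's flow identifier, so that all packets of a flow land on the same thread. The sketch below is illustrative only: the function `thread_for_flow` and its parameters are hypothetical and do not represent a Meta API.

```python
import zlib


def thread_for_flow(src_ip: str, dst_ip: str, src_port: int,
                    dst_port: int, protocol: int, num_threads: int) -> int:
    """Map a flow 5-tuple to a hardware thread index (hypothetical scheme)."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    # CRC32 is deterministic, so packets of one flow always reach
    # the same thread and stay in order relative to each other.
    return zlib.crc32(key) % num_threads


if __name__ == "__main__":
    flows = [
        ("192.0.2.1", "198.51.100.7", 40000, 80, 6),
        ("192.0.2.2", "198.51.100.7", 40001, 443, 6),
        ("192.0.2.3", "198.51.100.9", 5353, 53, 17),
    ]
    for flow in flows:
        print(flow, "-> thread", thread_for_flow(*flow, num_threads=4))
```

Hashing keeps the mapping stateless, at the cost of possible imbalance when a few flows dominate the traffic.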

The solution: A simultaneous multithreading processor like Imagination’s Meta CPU

32-bit communication processing is divided across a range of specific hardware devices, from mobile and terminal equipment to middle layers and application servers. One segment where 32-bit Meta SMT CPUs can be integrated is low-cost home and enterprise network equipment, which includes the access and distribution layers and the enterprise backbone. As most low-end routers implement a centralized architecture, forwarding performance is limited by the CPU, because all IP lookups and forwarding functions, as well as table creation and packet header extraction, have to be performed in the central processor.
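To give a sense of the per-packet work a centralized router CPU performs, the following is a minimal sketch of a longest-prefix-match IP lookup using Python's standard `ipaddress` module. The routing table contents and interface names are made up for illustration, and the linear scan is chosen for clarity; production routers typically use tries or similar structures.

```python
import ipaddress
from typing import Dict, Optional


def longest_prefix_match(table: Dict[str, str],
                         destination: str) -> Optional[str]:
    """Return the next hop of the most specific matching prefix, if any."""
    addr = ipaddress.ip_address(destination)
    best_len = -1
    best_hop = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix length.
        if addr in net and net.prefixlen > best_len:
            best_len = net.prefixlen
            best_hop = next_hop
    return best_hop


if __name__ == "__main__":
    routes = {  # hypothetical table
        "0.0.0.0/0": "wan0",
        "10.0.0.0/8": "eth0",
        "10.1.0.0/16": "eth1",
    }
    for dst in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
        print(dst, "->", longest_prefix_match(routes, dst))
```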

Networking equipment CPU processing requirements

The usual approach has been to use multiple specialized processors for system control and data plane management. But this approach scales poorly: the more functionality is added, the more dedicated hardware needs to be integrated, often leading to platforms that are inefficient from a power and performance perspective.

The method: Simultaneous Multithreading with Automatic MIPS Allocation

When comparing a typical single-core RISC CPU with a simultaneous multithreading processor, there are some architectural differences that help the latter come out as a clear winner. To achieve ILP (instruction-level parallelism) and TLP (thread-level parallelism), Meta has a separate instruction pipeline for each independent thread.

Imagination’s CPU also comes with a distinct set of local and global registers so there is no need to reorder instructions, predict branches or make any similar speculations to keep the execution unit busy.

Furthermore, Meta’s unique, patented AMA (Automatic MIPS Allocation) provides automatic hardware resource management, which ensures that each thread of execution gets the throughput it requires and an adequate response time. AMA allows each thread’s instruction issue rate and its priority relative to other threads to be controlled dynamically.
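The idea of combining rate control with priority control can be illustrated with a small credit-based arbiter. This is not the AMA algorithm, whose details are not described here; it is a generic sketch in which every name (`ThreadState`, `run_arbiter`, the thread names and rates) is an assumption made for illustration. Each thread earns credit at its target issue rate every cycle, the thread with the most credit wins the issue slot, and priority breaks ties.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ThreadState:
    name: str
    target_rate: float  # desired share of issue slots per cycle (0..1)
    priority: int       # higher value wins when credits are equal
    credit: float = 0.0
    issued: int = 0


def run_arbiter(threads: List[ThreadState], cycles: int) -> Dict[str, int]:
    """Simulate one issue slot per cycle and return issue counts per thread."""
    for _ in range(cycles):
        for t in threads:
            t.credit += t.target_rate  # rate control: earn credit each cycle
        # Pick the thread furthest behind its target; priority breaks ties.
        winner = max(threads, key=lambda t: (t.credit, t.priority))
        winner.credit -= 1.0
        winner.issued += 1
    return {t.name: t.issued for t in threads}


if __name__ == "__main__":
    threads = [
        ThreadState("data_plane", target_rate=0.5, priority=1),
        ThreadState("control", target_rate=0.25, priority=3),
        ThreadState("management", target_rate=0.25, priority=2),
    ]
    print(run_arbiter(threads, cycles=1000))
```

In this toy model the issue counts converge to the requested shares, while priority only decides who goes first when two threads are equally far behind.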

Meta CPU: Multi-threading provides DMIPS per MHz sustained throughput

Since Meta’s implementation of multithreading includes all the options above, computing platforms designed for entry-level routers will not be forced to jump to a multi-core architecture and can rely on Meta for the next generation. For example, dereferencing through large tables can become a slow process for traditional CPUs, as each pointer must be fetched from cache or memory before the next lookup can begin. Superscalar and out-of-order (OoO) execution offer no architectural advantage for such dependent accesses; the core is effectively reduced to old-fashioned single-threaded operation, wasting silicon area and consuming more power.

Multicore vs multithreaded CPUs

The Meta simultaneous multithreading CPU allows N such operations to run in parallel, without the need to add more hardware to increase processing capabilities.
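A simple cycle-count model shows why running N dependent lookups in parallel helps. The sketch assumes, for illustration only, that each dereference costs a fixed memory latency, that a chain cannot issue its next dereference until the previous one returns, and that the memory system serves all in-flight chains without contention. The function name `chase_cycles` and the numbers used are hypothetical.

```python
def chase_cycles(lookups: int, depth: int, latency: int,
                 hw_threads: int) -> int:
    """Cycles needed to finish `lookups` pointer chains of length `depth`.

    Up to `hw_threads` chains are in flight at once; each dereference
    takes `latency` cycles and depends on the previous one in its chain.
    """
    if min(lookups, depth, latency, hw_threads) < 1:
        raise ValueError("all arguments must be at least 1")
    batches = -(-lookups // hw_threads)  # ceiling division
    return batches * depth * latency


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "threads:", chase_cycles(4, depth=8, latency=100,
                                          hw_threads=n), "cycles")
```

With four lookups of eight dereferences each and a 100-cycle latency, the model gives 3200 cycles on one thread and 800 cycles on four, because the waiting time of each chain overlaps with the others.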

In conclusion, simultaneous hardware multithreading allows communication platforms to decompose tasks naturally into multiple separate threads. It provides flexible management of the combined tasks that handle incoming traffic, exploiting instruction-level parallelism more efficiently than superscalar OoO implementations.

Meta CPUs can also add new streams without compromising performance thanks to their AMA (Automatic MIPS Allocation) mechanism, which helps system architects future-proof a design against new functionality. They offer a way of matching the architectural cost points required by different equipment without having to redesign the hardware. Overall, the net benefit of SMT for communication processors is a more efficient, flexible and reliable system with a more maintainable embedded software platform.

Interested in our range of Meta CPUs? Stay in touch with us on Twitter (@ImaginationPR) and keep coming back to our blog for more articles on Imagination’s range of CPUs and their extended range of applications.
