Some 970FX info (new?) from IBM's PPC newsletter #16...

Posted:
in Future Apple Hardware edited January 2014
...Which was released today AFAIK.



Link to the quoted article below, which is an interesting read (even if this might not be new) IMHO



Delivering performance is what the IBM PowerPC® 970FX processors do by leveraging the architectural advantages of the 64-bit IBM POWER4™ server processor and providing industry-leading performance through an advanced superscalar design with multiple, pipelined execution units. Applying this processor to networking applications could potentially change the way that the industry views desktop processors.



It is often thought that applications with poor data locality, such as networking, require either a network processor-based system or a custom ASIC system that has a processor, embedded memory controllers and networking I/Os to achieve the desired function and performance. When one looks at the systems developed by leading manufacturers of networking equipment, one finds custom ASICs as well as specialized processors that are tuned to this specific application with integrated memory controllers and networking I/Os.



The IBM PowerPC 970FX has 10 parallel execution units, which are capable of handling multiple tasks in an efficient manner. Additionally, the PowerPC 970FX implements advanced load/store techniques such as deep reorder queues and deep miss queues as well as instruction prefetch and data prefetch streaming. These features are a key requirement for networking applications, to be able to perform multiple tasks and to hide memory latency in the background.



With the introduction of IBM PowerPC 970FX, networking vendors have another option to develop networking applications with deep packet-processing requirements using desktop processors with:



* Off-the-shelf hardware components

* Off-the-shelf operating systems like Linux

* Industry-standard programming languages

* Industry-standard debug tools



Let's explore the PowerPC 970FX and see how the execution unit can be used to reduce CPU-to-memory latency (see Figure 1).







Figure 1. PowerPC 970FX functional block



L2 cache

The 512KB of L2 cache provides the execution core with very fast 64-GBps access to data and instructions.



L1 cache

Instructions are prefetched from the L2 cache into a large, direct-mapped 64KB L1 cache at 64 GBps. In addition, 32KB of L1 data cache can prefetch up to eight active data streams simultaneously.



Fetch and decode

As instructions are accessed from the L1 cache, up to eight instructions per clock cycle are fetched, decoded and divided into smaller, easy-to-schedule operations. This efficient preparation maximizes processing speed as instructions are dispatched into the execution core and data is loaded into the large number of registers behind the functional units.



Dispatch

Before instructions are dispatched into the functional units, they are arranged into groups of up to five. Within the core alone, the PowerPC 970FX can track up to 20 groups at a time or 100 individual instructions. This efficient group-tracking scheme enables the PowerPC 970FX to manage an unusually large number of instructions "in flight."
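
As a rough illustration of the arithmetic above, here is a small Python sketch that packs an instruction stream into groups of up to five and works out the in-flight ceiling for 20 tracked groups. The names `GROUP_SIZE`, `MAX_GROUPS`, `form_groups` and `in_flight_capacity` are made up for this example, and the real 970FX grouping rules (which instruction may sit in which slot, for instance) are ignored.

```python
GROUP_SIZE = 5    # up to five instructions per dispatch group (from the article)
MAX_GROUPS = 20   # groups the core can track at once (from the article)

def form_groups(instructions, group_size=GROUP_SIZE):
    """Split an instruction list into consecutive groups of at most group_size."""
    return [instructions[i:i + group_size]
            for i in range(0, len(instructions), group_size)]

def in_flight_capacity(max_groups=MAX_GROUPS, group_size=GROUP_SIZE):
    """Upper bound on the number of instructions tracked at the same time."""
    return max_groups * group_size

stream = [f"insn{i}" for i in range(12)]       # hypothetical 12-instruction stream
groups = form_groups(stream)
print(len(groups), [len(g) for g in groups])   # 3 [5, 5, 2]
print(in_flight_capacity())                    # 100
```

Tracking whole groups instead of single instructions means the core only needs 20 tracking entries to cover up to 100 instructions.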



Queue

After an instruction group is dispatched into the execution core, it is broken into individual instructions, which proceed to the appropriate functional unit. Each functional unit has its own dedicated queue, where multiple instructions are arranged for processing in whatever order is required.



Optimized vector units

The PowerPC 970FX uses an optimized dual-issue pipeline with two independent queues and dedicated 128-bit registers and data paths for efficient instruction and data flow in the four parallel vector units. These vector-processing units accelerate data manipulation by applying a single instruction to multiple data sets at the same time, known as SIMD processing.
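
To make the SIMD idea concrete, here is a toy Python model (not real vector code) that treats a 128-bit register as four 32-bit lanes and applies one add to all four lanes at once. The helper name `vec_add_u32` is hypothetical and is only meant to show the lane-wise behaviour, including the per-lane wraparound.

```python
LANES = 4              # a 128-bit register holds four 32-bit elements
MASK32 = 0xFFFFFFFF    # keep each lane within 32 bits

def vec_add_u32(a, b):
    """Lane-wise unsigned add of two 4-element 'registers' (modulo 2**32)."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("each operand must have exactly four 32-bit lanes")
    return [(x + y) & MASK32 for x, y in zip(a, b)]

# One "instruction" produces four results; the last lane wraps around to 0.
print(vec_add_u32([1, 2, 3, 0xFFFFFFFF], [10, 20, 30, 1]))   # [11, 22, 33, 0]
```

A scalar unit would need four separate adds to do the same work.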



Two double-precision floating-point units

Two double-precision floating-point units provide the precision required for highly complex scientific computations. Although 32-bit processors are able to execute double-precision 64-bit calculations by cycling through the floating-point math unit multiple times, a double-precision math unit on a 64-bit processor can complete the same calculation in a single clock cycle.



Two integer units

Integer units perform simple and complex integer mathematics (such as add, subtract and compare), which are commonly used in many basic computer functions, as well as in networking applications, to manipulate the pointers or bits. The PowerPC 970FX has two integer units capable of a broad range of simple and complex instructions involving both 32- and 64-bit calculations. What's more, they take full advantage of the processor's 64-bit registers and data paths to complete 64-bit integer calculations in a single pass.
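
For anyone wondering what "a single pass" buys you, the Python sketch below compares a 64-bit add done in one step with the same add split into two 32-bit halves plus a carry, which is roughly what a 32-bit datapath has to do. The function names are invented for this example, and both inputs are assumed to be unsigned values below 2**64.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def add64_single_pass(a, b):
    """64-bit add on a 64-bit datapath: one operation, wraps modulo 2**64."""
    return (a + b) & MASK64

def add64_two_pass(a, b):
    """Same add on a 32-bit datapath: low halves first, then high halves plus carry."""
    lo = (a & MASK32) + (b & MASK32)      # pass 1: low 32 bits
    carry = lo >> 32                      # carry out of the low half (0 or 1)
    hi = (a >> 32) + (b >> 32) + carry    # pass 2: high 32 bits plus carry
    return ((hi & MASK32) << 32) | (lo & MASK32)

a, b = 0x00000001FFFFFFFF, 1
print(hex(add64_single_pass(a, b)))   # 0x200000000
print(hex(add64_two_pass(a, b)))      # 0x200000000
```

Both give the same answer; the two-pass version just takes an extra dependent step to get there.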



Load/store

At the same time as instructions are queued, the load/store units load the associated data from L1 cache into the data registers behind the units that will be processing the data. After the instructions manipulate the data, these units store it back to L1 cache, L2 cache or main memory. Each functional unit is generously equipped with 32 registers that are 128 bits wide on the vector units, and 64 bits wide on the floating-point units and the integer units. With two load/store units, the PowerPC 970FX is able to keep these registers filled with data for maximum processing efficiency. This is one of the key benefits that one can use for the packet-processing application to reduce latency.



Condition register

The condition register indicates the results of comparison operations and provides a means for testing them as branch conditions. By bridging information between the branch unit and other functional units, the condition register improves the flow of data throughout the execution core.



Three-component branch prediction logic

The PowerPC 970FX usually "knows the answer before it asks the question," using branch prediction and speculative operation to increase efficiency. Branch prediction anticipates which instruction should go next, and speculative operation causes that instruction to be executed. If the prediction is correct, the processor works more efficiently, because the speculative operation has executed an instruction before it is required.
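
The article does not spell out how the three-component predictor works, so the Python sketch below uses a generic textbook two-bit saturating counter instead. It is NOT the 970FX's actual logic; it only shows the basic idea of guessing a branch from its recent history. The class and function names are made up for this example.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter (an assumption, not the 970FX design)."""

    def __init__(self):
        self.counter = 2   # 0-1 predict not taken, 2-3 predict taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move toward 3 on taken branches and toward 0 on not-taken ones.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

def accuracy(outcomes):
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)

# A loop branch: taken nine times, then falls through once, repeated ten times.
loop_branch = ([True] * 9 + [False]) * 10
print(accuracy(loop_branch))   # 0.9
```

Only the loop exit is mispredicted each time around, so the speculative work is wasted on just one branch in ten.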



Completion unit

When operations on the data are complete, the PowerPC 970FX recombines the instructions into the original groups of five and the load/store units store the data in cache or main memory for further processing.





Figure 2. PowerPC 970FX system view







As you can see from the system view in Figure 2, there is plenty of bus bandwidth and memory bandwidth to sustain high throughput for the packet-processing application. An issue that often comes up is memory latency and I/O latency. The memory latency can be further reduced to some very impressive numbers by the use of multiple load/store operations as the PowerPC 970FX has a 32-entry load reorder queue and an 8-entry load miss queue. This means that the PowerPC 970FX can continue to operate with up to 32 outstanding load operations, with up to 8 of them being L1 data cache load misses, effectively pipelining memory accesses. Pipelining of memory access enables the processor to continue execution by effectively hiding memory latency.
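
A back-of-the-envelope Python model of that latency hiding: if the loads are independent, up to 8 L1 misses can be outstanding at once, so they overlap in batches instead of being paid for one after another. The 300-cycle miss latency is a placeholder picked for illustration (the article gives no figure), the function name is invented, and the model ignores bus bandwidth limits and dependent loads such as pointer chasing.

```python
import math

LOAD_MISS_QUEUE = 8   # outstanding L1 data cache load misses (from the article)

def stall_cycles(num_misses, miss_latency, max_outstanding=LOAD_MISS_QUEUE):
    """Cycles spent waiting when independent misses overlap in full batches."""
    batches = math.ceil(num_misses / max_outstanding)
    return batches * miss_latency

MISS_LATENCY = 300    # hypothetical cycles per miss, NOT a 970FX figure
misses = 64           # e.g. one miss per packet header in a batch of 64 packets

serial = stall_cycles(misses, MISS_LATENCY, max_outstanding=1)
overlapped = stall_cycles(misses, MISS_LATENCY)
print(serial, overlapped, serial / overlapped)   # 19200 2400 8.0
```

In this idealised case the waiting time drops by the depth of the miss queue, which is why packet code that touches many independent addresses benefits the most.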



The combination of high-speed I/O bandwidth and a way to hide memory latency, along with a very powerful CPU, makes it possible to use the PowerPC 970FX for packet-processing applications with a rich set of capabilities for handling heavy-activity packets.

Comments

  • Reply 1 of 3
    programmer Posts: 3,458 member
    So what information do you think is new? I read that a couple of days ago and didn't see anything noteworthy...?
  • Reply 2 of 3
    Quote:

    Originally posted by Programmer

    So what information do you think is new? I read that a couple of days ago and didn't see anything noteworthy...?



    yeah, some commentary would have been nice.

    well... at least he did a good job of posting the entire article, including graphics, for those who were too lazy to click the link.



    Programmer, I only clicked back on this "thread" 'cause I was hoping you had some 970FX comments. Maybe you can salvage it before it's locked?

  • Reply 3 of 3
    kroehl Posts: 164 member
    Quote:

    Originally posted by Programmer

    So what information do you think is new? I read that a couple of days ago and didn't see anything noteworthy...?



    He did write:
    Quote:

    (new?)


