As I understand it, when IBM says that it is a dual-core CPU, it means that the CPU has the circuitry for two CPU cores that are interconnected on the silicon. This is different from Hyperthreading because there really are two physical "CPU" cores instead of one CPU with a "virtual" core. Now, if IBM's version of Hyperthreading (I forget the name) works in a similar manner to Intel's, then one G5 processor would appear to the system as 4 CPUs (2 physical and 2 "virtual"). Of course I could be totally wrong on this; I'm not a hardware expert.
Just doing some simple math to illustrate the power of this.
There are 4 Power5 CPUs on that cake, with 2 cores each and SMT, which adds 40% performance. If one 970 @ 2 GHz is equal in performance to one 1.6 GHz Power5 core, that cake will have the performance of 11 970 CPUs. IBM states that one machine can take 16 MCMs. 13 machines packed to the brim with Power5 MCMs will equal the performance of the VT cluster powered by 1100 PowerMac G5s.
Such a setup would probably blow the VT cluster out of the water in the real world, but it is likely to cost like a proper supercomputer too.
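For anyone who wants to check the arithmetic, here is a rough sketch in Python. It uses only the figures from the post, plus one assumption that the post does not state: that each Power Mac G5 in the VT cluster holds two 970 CPUs. The constant names are made up for illustration.

```python
import math

# Figures taken from the post above.
CHIPS_PER_MCM = 4        # Power5 chips on one MCM (the "cake")
CORES_PER_CHIP = 2
SMT_FACTOR = 1.40        # SMT adds 40% per core, as claimed above
MCMS_PER_MACHINE = 16
VT_POWER_MACS = 1100

# Assumption (not stated in the post): two 970 CPUs per Power Mac G5.
CPUS_PER_POWER_MAC = 2

# Premise from the post: one 1.6 GHz Power5 core == one 2 GHz 970.
mcm_in_970s = CHIPS_PER_MCM * CORES_PER_CHIP * SMT_FACTOR
machine_in_970s = MCMS_PER_MACHINE * mcm_in_970s
vt_in_970s = VT_POWER_MACS * CPUS_PER_POWER_MAC

# Round up: a partially filled machine still counts as a machine.
machines_needed = math.ceil(vt_in_970s / machine_in_970s)

print(f"one MCM     ~ {mcm_in_970s:.1f} 970 CPUs")
print(f"one machine ~ {machine_in_970s:.1f} 970 CPUs")
print(f"machines to match the VT cluster: {machines_needed}")
```

Under those assumptions the MCM comes out at roughly 11 970-equivalents and the machine count at 13, which matches the figures in the post. It is a peak-throughput estimate only and says nothing about interconnect or memory bandwidth.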
I think I'll leave my interpretation of the tech articles to the following diagram:
I find the dual-core design pretty interesting, mainly because of its approach to multiprocessing design. Traditionally you have CPUs in a multiprocessor system communicating with the memory and I/O controllers, but I don't believe they generally spend much time talking to each other. The Power4 architecture appears to do so.
I'm a little unsure how a dual-core Power4 or Power5 chip would appear to an operating system, as there are two possible approaches (that I can see). Either the OS is fully aware that there are essentially two processors in the system, but does not know/care that they are sitting on the same die; or the OS would not be aware at all that there are two cores on each die.
The reason I think the latter could be possible is that all the instructions, data, etc. pass through the "fabric controller" before being allocated to one core or the other. I'm speculating that the OS could be unaware of the two cores being present, and would see only a single, albeit fast, unified processor.
If I'm correct, this could potentially mean seeing the benefits of SMP without recoding/optimization of apps.
They appear as two separate processors. I suspect that IBM's OS task scheduler knows that they share cache resources and takes this into account when scheduling on systems with more than one chip.
... Armonk, N.Y.-based IBM's Power4 implemented two processor cores per chip, the Power5 will present four virtual cores to the operating system (two physical cores and two virtual ones), said Ron Kalla, a system designer for IBM...
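To make the counting concrete, here is a small sketch of how many processors the OS would see, assuming each hardware thread shows up as its own logical CPU. The helper name `logical_cpus` is invented for illustration; `os.cpu_count()` is the standard-library call that reports the logical CPU count on whatever machine runs the script.

```python
import os

def logical_cpus(chips, cores_per_chip=2, threads_per_core=2):
    """Logical processors the OS would schedule on, assuming every
    hardware thread is exposed as a separate CPU."""
    return chips * cores_per_chip * threads_per_core

# Power4-style chip: two cores, no SMT.
print("dual-core, no SMT:", logical_cpus(1, threads_per_core=1))

# Power5-style chip as described in the quote: two cores, two threads each.
print("dual-core with SMT:", logical_cpus(1))

# For comparison, what the OS reports on the machine running this script.
print("this machine:", os.cpu_count())
```

The point is only that multi-core and SMT multiply together: the core count is real hardware, while the thread count is how many contexts each core presents.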
I have read that the Power5 core requires 25% more transistors than the Power4 core (compared to 5% more between the P4 and the P4 HT).
I have the feeling that this huge difference does not come only from the larger number of registers relative to a Power4 core. I think it comes from dropping the five-instruction group trick.
In the Power4 and in the G5, the chip tracks the instructions in flight in the pipeline in groups of 5 instructions. By contrast, the P4 tracks fewer instructions simultaneously but tracks them individually.
I think (but I may be wrong) that grouping instructions is bad for SMT, and thus IBM has dropped this feature and exchanged it for individual tracking like the P4's (but much more complicated, due to the wider superscalar architecture).
Grouping was a way to save transistors, but SMT certainly obliged IBM to abandon this trick.
Maybe that's the reason why SMT required so many more transistors in the Power5 architecture.
Comments
Originally posted by JCG
As I understand it, when IBM says that it is a dual-core CPU, it means that the CPU has the circuitry for two CPU cores that are interconnected on the silicon. This is different from Hyperthreading because there really are two physical "CPU" cores instead of one CPU with a "virtual" core. Now, if IBM's version of Hyperthreading (I forget the name) works in a similar manner to Intel's, then one G5 processor would appear to the system as 4 CPUs (2 physical and 2 "virtual"). Of course I could be totally wrong on this; I'm not a hardware expert.
You are right.
Originally posted by wmf
Here's more than you ever wanted to know about the Power4:
http://www.research.ibm.com/journal/rd46-1.html
And a cool cutaway view of the Power4 MCM:
http://www.research.ibm.com/journal/rd46-6x1.html
It's too bad that so many people are getting multi-core and SMT confused, since they're not related.
Big thanks. Electrical engineering isn't my field of interest or aptitude, but I'll try to shed some light on this topic. Interesting read so far.
present four virtual cores to the OS...
eweek
I heard IBM was going to be presenting some G3 'VX' and other variations at the MPF?
Supposedly some stuff to take the G3 beyond the Moto' G4...
I can't recall where I read that...
Lemon Bon Bon
http://www.mdronline.com/mpf/conf.html
http://www.theinquirer.net/?article=12217
Any guesses from our AI geeks here?
Originally posted by Powerdoc
I have read that the Power5 core requires 25% more transistors than the Power4 core (compared to 5% more between the P4 and the P4 HT).
I have the feeling that this huge difference does not come only from the larger number of registers relative to a Power4 core. I think it comes from dropping the five-instruction group trick.
I'm pretty sure that I saw mention of the instruction grouping in the POWER5. This would actually help SMT, since they can track which thread a given group belongs to rather than tracking it per instruction. The instruction grouping doesn't hurt these chips that badly once the code is optimized for it; it's just one of those rules that the optimizer has to be extended to understand.
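As a toy illustration of that bookkeeping argument (not a description of the real POWER5 hardware), the sketch below counts how many thread tags would be needed if the thread were recorded once per group rather than once per instruction. The group size of 5 comes from the posts above; the figure of 100 instructions in flight and the function name are assumptions chosen purely for the example.

```python
import math

GROUP_SIZE = 5  # instructions per dispatch group, per the posts above

def thread_tags_needed(in_flight, per_group):
    """Count thread-ID tags needed to track `in_flight` instructions.

    per_group=True  -> one tag per group of up to GROUP_SIZE instructions
    per_group=False -> one tag per individual instruction
    """
    if per_group:
        return math.ceil(in_flight / GROUP_SIZE)
    return in_flight

in_flight = 100  # hypothetical number of instructions in flight
print("tagged per instruction:", thread_tags_needed(in_flight, per_group=False))
print("tagged per group:      ", thread_tags_needed(in_flight, per_group=True))
```

If grouping works the way the reply suggests, the per-group scheme needs about a fifth of the tags, which is the sense in which grouping could help rather than hurt SMT.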