macbidouille Mars28 info on 970

placebo · March 28, 2003 7:24PM

The Calculator isn't what we'd call an essential feature. I don't think that the whole "services" thing on the Apple menu makes much sense, but that's just my opinion.

And what the hell does "Mars28 info" mean?

kim kap sol · March 28, 2003 7:58PM

Quote:

Originally posted by Placebo

The Calculator isn't what we'd call an essential feature. I don't think that the whole "services" thing on the Apple menu makes much sense, but that's just my opinion. And what the hell does "Mars28 info" mean?

March 28 in French. Who knows why the original poster decided to write the date in French.

john whitney · March 28, 2003 8:16PM

Quote:

Originally posted by Amorph

MaxBus can transfer 64 bits at a time either way, and GigaBus has two 32-bit paths.

Two fetches to fill one register, or are both paths bi-directional, so it can be 64 bits in one direction when needed?

Also, do you know if the 970 uses a 64-bit instruction set? If so, I wonder what kind of code bloat that might lead to.

John

eugene · March 28, 2003 8:34PM

Quote:

Originally posted by JLL

Even if it's soldered, it's still a place to have RAM.

But it doesn't allow you to match two different sticks of RAM, so in practice it's useless. Dual-channel DDR only works with DIMMs of the same capacity. If you only want 256 MB of RAM, have at it.

programmer · March 28, 2003 8:57PM

Quote:

Originally posted by John Whitney

Two fetches to fill one register, or are both paths bi-directional, so it can be 64 bits in one direction when needed?

Also, do you know if the 970 uses a 64-bit instruction set? If so, I wonder what kind of code bloat that might lead to.

The 970 uses PPC64, which is almost exactly the same as PPC32. Instructions are 32 bits each, and the encodings are all the same, although some meanings are changed slightly due to 64-bit GPRs and there are a couple of additional instructions.

As for fetching behaviour -- there are almost no memory accesses these days except as cache line fetches & writes. This means that all memory operations are bursts of (typically) 32 or 64 bytes, depending on the processor's cache setup. As a result it doesn't matter what the actual width of the bus is, it just matters how fast the bus can transmit the required data. The pathway from cache to the registers is probably 128 bits wide to support the AltiVec registers.

MPX is 64 bits wide and up to 167 MHz (so far), but it has separate address lines. To deliver a 32-byte cache line the processor issues an address on the address lines and then the memory controller responds with 4 64-bit data clocks (note that the address and data operations can overlap and these transactions can be pipelined so the time to deliver a full cache line is close to 4 clocks). This gives the MPX a speed of about 1.3 GB/sec maximum.

The 970's FSB has 32 lines in each direction, no address lines, and runs at 450 MHz double pumped. In theory that means 3.6 GB/sec in each direction, but for every 8 32-bit cycles of data (i.e. a 32-byte cache line) there has to be an address sent as well, which probably takes 2 cycles (it's a 48-bit address plus some descriptive information like the length of the message). This overhead drops the performance to 3.2 GB/sec, according to IBM... based on those numbers, however, I'd guess that the 970 is actually bursting 64-byte cache lines (16 data cycles + 2 address cycles gives about 88.9% efficiency and strangely enough 88.9% of 3.6 GB/sec is 3.2 GB/sec -- imagine that!).
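For anyone who wants to check the arithmetic, here is a rough sketch in Python. The function names are made up for illustration, and the 2-cycle address overhead is the guess from the paragraph above, not a figure from IBM.

```python
def peak_bandwidth(bytes_per_transfer, transfers_per_second):
    """Raw bytes/sec if every bus cycle carried data."""
    return bytes_per_transfer * transfers_per_second


def burst_efficiency(data_cycles, address_cycles):
    """Fraction of bus cycles that carry data when the address shares the data lines."""
    return data_cycles / (data_cycles + address_cycles)


# MPX: 64 bits (8 bytes) per clock at 167 MHz, with separate address lines.
mpx_peak = peak_bandwidth(8, 167e6)

# 970 FSB: 32 bits (4 bytes) per direction, 450 MHz double pumped = 900M transfers/sec.
fsb_peak = peak_bandwidth(4, 2 * 450e6)

# Assumed overhead: 2 address cycles per burst (a guess, see the lead-in).
fsb_32_byte_lines = fsb_peak * burst_efficiency(8, 2)    # 8 data cycles per 32-byte line
fsb_64_byte_lines = fsb_peak * burst_efficiency(16, 2)   # 16 data cycles per 64-byte line

for label, value in [
    ("MPX peak", mpx_peak),
    ("970 FSB raw, per direction", fsb_peak),
    ("970 FSB with 32-byte bursts", fsb_32_byte_lines),
    ("970 FSB with 64-byte bursts", fsb_64_byte_lines),
]:
    print(f"{label}: {value / 1e9:.2f} GB/sec")
```

With 32-byte bursts the same overhead would leave only 80% of 3.6 GB/sec, which is why the 64-byte guess fits IBM's 3.2 GB/sec figure better.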

As an aside, due to the encoding of the PPC instruction set, immediate values are limited to 16 bits (there are only 32 bits in the instruction, after all, and 5 bits of that are to specify the destination register), so loading a 64-bit register with an immediate value can take 4 instructions. More typically the compiler won't use immediate loads but will instead load directly from a global table where constants are kept, which takes only one instruction.
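To make the immediate-load point concrete, here is a small Python model of assembling a 64-bit constant from four 16-bit pieces. It is only a sketch: the helper names are invented, and the mnemonics in the comments (lis, ori, sldi, oris) are how I understand the usual PPC64 sequence, so treat them as an assumption rather than compiler output. As far as I know the real sequence also needs a shift in the middle, on top of the four instructions that carry an immediate.

```python
MASK64 = (1 << 64) - 1


def split_16(value):
    """Split a 64-bit value into four 16-bit chunks, most significant first."""
    return [(value >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


def build_constant(value):
    """Rebuild `value` one 16-bit immediate at a time, as a register would see it."""
    c3, c2, c1, c0 = split_16(value)
    reg = (c3 << 16) & MASK64    # like lis: immediate lands in bits 16..31
                                 # (real lis sign-extends; those bits are shifted out below)
    reg |= c2                    # like ori: OR in the next 16 bits
    reg = (reg << 32) & MASK64   # like sldi 32: move both chunks to the upper half
    reg |= c1 << 16              # like oris: OR into bits 16..31
    reg |= c0                    # like ori: OR into bits 0..15
    return reg


value = 0x123456789ABCDEF0
print([hex(chunk) for chunk in split_16(value)])
print(hex(build_constant(value)))
```

A single load from a constant table replaces all of those steps, which is the trade-off described above.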

mooseman · March 28, 2003 10:01PM

Quote:

Originally posted by Clive

No worries about that, US presidents do it every day.

...your "official" court language used to be French, no?

Ahahahahaha. Britain, still smarting from the "Ass kicking of 1066!"

(j/k)

boy_analog · March 28, 2003 10:56PM

Quote:

Originally posted by midwinter

I think it's a little funny that you linked to a page published by a professor of computer science.

You'll note that the bulk of this page is simply lifted from Fowler. Good to see that being a CS professor is no barrier to acquiring a measure of discernment.... Once they ditch those pocket protectors, all hell breaks loose!

March 28, 2003 11:59PM

FWIW, MOSR seems to be echoing these rumors as well...

Quote:

( Friday, March 28 )

Serial ATA, USB 2.0, AGP 8X, PCI-X expansion slots, Firewire 800, and a new keyboard and mouse are all specifically mentioned as intended features in what few Apple documents we've been able to see on the new systems. If anything is cut, USB 2.0 will be first on the block, with Serial ATA and PCI-X tied for a distant second... and at this point, we don't think cuts have been made. In fact, the program to build Apple's HyperTransport-based motherboards that will be home to those impressive IBM processors appears to be coming along well. Prototypes are said to have a competitive PC3200 memory architecture (200MHz DDR SDRAM), and some have even been produced with twin PC3200 memory banks to be able to fully saturate the HyperTransport and PPC 970 processor front-side busses, which can each push 6.4GB/s. PC3200, which is already costly and in relatively short supply when talking about only a single memory bus, is, as its name indicates, only able to provide about 3.2GB/s of bandwidth.

The extra cost, and need to install memory in two paired banks of a twin-PC3200 architecture are not attractive to Apple, but having an utterly leading-edge professional architecture is. The balance between these, paired with the fortunes of world war and an unstable economy, will decide for Apple which architecture it can release right off the bat. We've been hearing single-bank is the choice for the initial release, but it is still too soon to say with certainty.
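The memory numbers in that quote are easy to sanity-check. A minimal sketch in Python, assuming the standard 64-bit (8-byte) DIMM data path; the function name is made up for illustration:

```python
def ddr_bandwidth(clock_hz, bus_bytes=8, channels=1):
    """Peak bytes/sec for DDR memory: two transfers per clock on each channel."""
    return clock_hz * 2 * bus_bytes * channels


single_bank = ddr_bandwidth(200e6)               # one PC3200 bank
twin_bank = ddr_bandwidth(200e6, channels=2)     # two paired PC3200 banks

print(f"single bank: {single_bank / 1e9:.1f} GB/sec")
print(f"twin bank:   {twin_bank / 1e9:.1f} GB/sec")
```

One bank comes out at the 3.2GB/s the name implies, and it takes two banks to reach the 6.4GB/s quoted for the buses.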

applenut · March 29, 2003 12:46AM

most likely ripped it from that French site

john whitney · March 29, 2003 7:07AM

Quote:

Originally posted by Programmer

The 970 uses PPC64, which is almost exactly the same as PPC32. Instructions are 32 bits each, and the encodings are all the same, although some meanings are changed slightly due to 64-bit GPRs and there are a couple of additional instructions.

As for fetching behaviour -- there are almost no memory accesses these days except as cache line fetches & writes. This means that all memory operations are bursts of (typically) 32 or 64 bytes, depending on the processor's cache setup. As a result it doesn't matter what the actual width of the bus is, it just matters how fast the bus can transmit the required data. The pathway from cache to the registers is probably 128 bits wide to support the AltiVec registers.

MPX is 64 bits wide and up to 167 MHz (so far), but it has separate address lines. To deliver a 32-byte cache line the processor issues an address on the address lines and then the memory controller responds with 4 64-bit data clocks (note that the address and data operations can overlap and these transactions can be pipelined so the time to deliver a full cache line is close to 4 clocks). This gives the MPX a speed of about 1.3 GB/sec maximum.

The 970's FSB has 32 lines in each direction, no address lines, and runs at 450 MHz double pumped. In theory that means 3.6 GB/sec in each direction, but for every 8 32-bit cycles of data (i.e. a 32-byte cache line) there has to be an address sent as well, which probably takes 2 cycles (it's a 48-bit address plus some descriptive information like the length of the message). This overhead drops the performance to 3.2 GB/sec, according to IBM... based on those numbers, however, I'd guess that the 970 is actually bursting 64-byte cache lines (16 data cycles + 2 address cycles gives about 88.9% efficiency and strangely enough 88.9% of 3.6 GB/sec is 3.2 GB/sec -- imagine that!).

As an aside, due to the encoding of the PPC instruction set, immediate values are limited to 16 bits (there are only 32 bits in the instruction, after all, and 5 bits of that are to specify the destination register), so loading a 64-bit register with an immediate value can take 4 instructions. More typically the compiler won't use immediate loads but will instead load directly from a global table where constants are kept, which takes only one instruction.

Thanks, Programmer. That was just the information I was interested in. Which compiler, by the way, makes use of a global constant table? GCC certainly doesn't seem to, at least at the kernel level (it hasn't in any of the assembly I've seen come out of it to date).

Are they adding new instructions for immediate loads (li, lis, liss, lisss), or will manual shifts be required and oris used?

netromac · March 30, 2003 2:39AM

Quote:

From Macbidouille (translation by Google)

PowerPC has an instruction set to handle the memory hiding place: dcbt, dcbz, dcba... These instructions depend on the physical characteristics of the memory hiding place. And in particular of the size of a line of mask. The mask is cut out in lines.

For Power4, IBM used a line of mask of 64 bytes. However all PowerPC used by APPLE previously had a line of mask of 32 bytes.

The use of the instruction dcbz can generate bugs with PowerPC 970. This instruction puts at 0 a line of mask. On G4, it puts 32 bytes at 0. On PowerPC 970, it will put 64 bytes at 0. Because PowerPC 970 derives from Power4. From where an involuntary risk of obliteration of data with PowerPC 970.

It is amusing to note that APPLE updated recently its instructions

http://developer.apple.com/technotes/tn/tn1174.html . In this technical note of 1999, APPLE shows how to use the instruction dcbz with G4. APPLE incited the developers then to use it.

http://developer.apple.com/hardware/...ce_memory.html . Now, the instruction is "C not uses dcbz". Strange not! (A my knowledge the instruction dcba is not supported by Power4, it east can be why APPLE declares also "C not uses dcba").

How many applications, and which applications, would be affected by this problem? Are there other reasons than the transition to the 970 that would lead Apple to make these changes?
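As I read the translated note, the risk is that dcbz zeroes one whole cache line, so code that assumes a 32-byte line clears more than it meant to when the line is bigger. A toy model in Python, where a bytearray stands in for memory; the function name is made up, and the 32- and 64-byte line sizes are simply the figures from the note, not something I have verified:

```python
def dcbz(memory, address, line_size):
    """Model of dcbz: zero the entire cache line that contains `address`."""
    start = address - (address % line_size)
    memory[start:start + line_size] = bytes(line_size)
    return start


def clear_32_byte_buffer(line_size):
    """Code written for 32-byte lines clears a 32-byte buffer at offset 64."""
    memory = bytearray(b"\xAA" * 128)   # bytes 96..127 hold unrelated live data
    dcbz(memory, 64, line_size)
    neighbour_intact = memory[96:128] == b"\xAA" * 32
    return neighbour_intact


print("32-byte line, neighbour intact:", clear_32_byte_buffer(32))
print("64-byte line, neighbour intact:", clear_32_byte_buffer(64))
```

With the larger line, the single dcbz also wipes the 32 bytes after the buffer, which is the "involuntary obliteration of data" the note seems to be describing.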

programmer · March 31, 2003 12:37AM

That was an absolutely hilarious translation! Automatic translation of technical documents can be quite a hoot.

Apple could be doing this because there is a different processor with a larger cache line on the way, but that is considerably less likely than the 970 being the cause.

This could affect any application, but in practice I doubt many programmers tried this optimization.

zapchud · March 31, 2003 3:22AM

Hilarious yes, useless yes

Usually I can understand these automatic translations pretty well, since they aren't too messed up, but this is so far the worst example I've seen!

jxfreak · March 31, 2003 5:59AM

I think Apple will use Twin Bank in the 970s. Nvidia used Twin Bank in their nForce 1 and 2 motherboard designs as part of HyperTransport. Apple being part of the HyperTransport consortium...

netromac · March 31, 2003 6:03AM

Quote:

Originally posted by Programmer

That was an absolutely hilarious translation! Automatic translation of technical documents can be quite a hoot.

Apple could be doing this because there is a different processor with a larger cache line on the way, but that is considerably less likely than the 970 being the cause.

This could affect any application, but in practice I doubt many programmers tried this optimization.

Quote:

Originally posted by r-0X#Zapchud

Hilarious yes, useless yes

Usually I can understand these automatic translations pretty well, since they aren't too messed up, but this is so far the worst example I've seen!

Yes, I thought you'd like it

barto · March 31, 2003 6:48AM

Quote:

Originally posted by jxfreak

I think Apple will use Twin Bank in the 970s. Nvidia used Twin Bank in their nForce 1 and 2 motherboard designs as part of HyperTransport. Apple being part of the HyperTransport consortium...

TwinBank is not a technology, it is a trademark. It simply refers to having two channels of memory. It is not revolutionary, but perhaps necessary to take advantage of multiprocessing 970s.

It has nothing whatsoever to do with HyperTransport. Which is not to say that HT doesn't have some interesting possibilities for the next Power Mac. ApplePI makes a lot of sense in light of the 970 announcement. ApplePI, based on HT, to be used to connect a companion chip to an IC, perhaps?

Barto
