A couple questions re: PPC 970 chip


Comments

  • Reply 21 of 45
    So far as I understand it, all the benchmark scores, and possibly even the MHz rating of the first chips to come off the line, have been described by IBM engineers as "conservative" estimates. I don't have a link for the document I saw that in, but I know that Hannibal over at Ars Technica has been working on an in-depth story on the 970 that he said would include more detailed information directly from IBM. I check Ars every day, but I'm still waiting. That article may have some of the answers we are looking for, at least for features that aren't specific to Apple hardware.



    I've all but given up on benchmarks. There seems to be no end to that debate.
  • Reply 22 of 45
    nevyn · Posts: 360
    [quote]Originally posted by Borborygmi:

    <strong>1) MHz for MHz, is the 970 faster than the current G4? If Apple ships a 2x1.25 Ghz 970 based machine, will it be faster? If so, how much (roughly)?</strong><hr></blockquote>



    It, of course, depends on how you plan to use it. The SpecFP results closely model what _I_ use, though they do completely neglect AltiVec.



    Here's where the 1GHz G4 is currently:

    <a href="http://www.heise.de/ct/english/02/05/182/qpic03.jpg" target="_blank">Some Spec2000 FP numbers</a>

    Floating Point Results:

    Athlon 1666 : 596

    P4 2200 : 779

    G4 : 187

    PIII 667 : 222



    And IBM says the 1.8 GHz PPC 970 will reach a SpecFP score of 1051. That's 5.6 times as fast on floating point. But wait - this is comparing a 1 GHz chip to a 1.8 GHz chip -> an estimate for a 1 GHz PPC 970 would be around 584. That's STILL 3.1 times faster at floating point on a clock-for-clock basis. At 20 watts. (The chart of Pentiums & Xeons I have averages more like 60 watts.)
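
    Napkin math, as C (assuming, crudely, that SpecFP scales linearly with clock - real memory systems don't quite cooperate, so treat it as an upper bound):

    [code]
    #include <stdio.h>

    int main(void)
    {
        const double specfp_970_at_1800 = 1051.0; /* IBM's estimate, 1.8 GHz 970 */
        const double specfp_g4_at_1000  = 187.0;  /* measured, 1 GHz G4 */

        /* Scale the 970 score down to 1 GHz, assuming linear clock scaling. */
        double specfp_970_at_1000 = specfp_970_at_1800 / 1.8;
        printf("Estimated 1 GHz 970 SpecFP: %.0f\n", specfp_970_at_1000); /* ~584 */
        printf("Clock-for-clock vs G4: %.1fx\n",
               specfp_970_at_1000 / specfp_g4_at_1000);                   /* ~3.1x */
        return 0;
    }
    [/code]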



    So a _single_ IBM PPC 970, with a single core, at 1.8 GHz should beat a dual 1.25 GHz G4 by 1051 to 467 -> more than double the FP. Just playing with numbers, we'd need either a 5.6 GHz G4 to tie one-on-one, or a dual 2.8 GHz G4...



    _I'm_ ecstatic. And a lot of the benchmarks the G4 has been crushed in lately rely on double-precision floating point - Lightwave, etc.



    I'm not going to list guesses for what a dual-CPU, dual-core variant would do. (For which I would gladly pay through the nose, though.)



    [quote]Originally posted by Borborygmi:

    <strong>2) What are the implications of it being a 64-bit chip? Will it be faster due to this? What is the advantage here?</strong><hr></blockquote>



    There's no reason for the iMac line to _care_. As far as consumer-level products are concerned, it's a checkmark on a feature list - it won't ever affect the speed of anything, for or against - it just isn't that necessary. (Unless it can be leveraged in encoding codecs, maybe, but those should be using AltiVec.)



    In the pro-desktop/workstation/server market, it will be a big deal. The approach IBM and the PPC group have taken seems the most coherent and least headache-inducing. 32-bit things will 'just work', 64-bit things will 'just work'. Apps that use a LOT of data will work _faster_ after being recoded to utilize the larger registers - think Oracle, science/engineering custom code, etc. Normal programs - think iCal - won't _ever_ be recoded to be 64-bit, nor should they even need a recompile. They'll run at the same speed regardless. One other advantage would be more than 4 GB of physical RAM.
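
    A minimal C sketch of what that looks like from ordinary code, assuming an LP64-style model (pointers and longs grow to 8 bytes, int stays at 4 - an assumption, since Apple hasn't published its 64-bit ABI):

    [code]
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* A 32-bit build: pointers are 4 bytes, so a process can only name
           2^32 bytes (4 GB).  An LP64 64-bit build: pointers are 8 bytes,
           but int stays 4 - which is why most 32-bit source just recompiles. */
        printf("sizeof(int)    = %zu\n", sizeof(int));
        printf("sizeof(long)   = %zu\n", sizeof(long));
        printf("sizeof(void *) = %zu\n", sizeof(void *));

        /* Values past 2^32 fit in one register on a 64-bit chip;
           a 32-bit ALU needs a carry-propagating pair of instructions. */
        uint64_t big = 5000000000ULL;
        printf("big = %llu\n", (unsigned long long)big);
        return 0;
    }
    [/code]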



    The nice thing (as far as I can tell) is that a 'borderline' app can cross over if it needed to. That is, there might not be any benefit to Photoshop overall. But perhaps someone came up with a really cool plugin that works faster in a 64-bit environment. My understanding is that that would be fine - individual threads could (carefully) be 64-bit in an application that is mostly 32-bit.
  • Reply 23 of 45
    amorph · Posts: 7,112
    [quote]Originally posted by Nevyn:

    <strong>There's no reason for the iMac line to _care_.</strong><hr></blockquote>



    Not for a while, anyway.



    Although, it hasn't been lost on a number of people that the iMac makes a spiffy little server complete with a built-in console. It's obviously not outfitted for mission-critical work, but there are a lot of servers that don't have to be set up that robustly.



    [quote]<strong>The nice thing (as far as I can tell) is that a 'borderline' app can cross over if it needed to. That is, there might not be any benefit to Photoshop overall. But perhaps someone came up with a really cool plugin that works faster in a 64-bit environment. My understanding is that that would be fine - individual threads could (carefully) be 64-bit in an application that is mostly 32-bit.</strong><hr></blockquote>



    I can see Photoshop and its ilk going this way relatively quickly. Even though output to screen and to print will probably remain 24-bit for some years yet, it would be nice to do all the back-end computation at 64 bits and then round back down to 32 bits for output, to all but eliminate the color distortion introduced by filters and layers and transparency. It would be a level of detail that only pros would care about - or even notice - but I'm sure they'd appreciate it. There is, of course, some output that is more precise - I believe Hollywood digital film is 36-bit, with 12 bits per color channel. If FCP could output media at that quality, then digital would be ready to move into the big time - and bring some of the big time down to a sane price range, as it often does.
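
    A hypothetical sketch of that "compute wide, round once" pipeline in C - not Adobe's actual code, just the shape of the idea, with made-up types. The win is that rounding happens once at output instead of after every filter:

    [code]
    #include <stdint.h>

    typedef struct { uint16_t r, g, b; } Pixel16; /* working precision   */
    typedef struct { uint8_t  r, g, b; } Pixel8;  /* screen/print output */

    /* Apply a gain in wide precision (gain in 8.8 fixed point). */
    static Pixel16 apply_gain(Pixel16 p, uint32_t gain_8_8)
    {
        uint32_t r = (p.r * gain_8_8) >> 8;
        uint32_t g = (p.g * gain_8_8) >> 8;
        uint32_t b = (p.b * gain_8_8) >> 8;
        Pixel16 out = {
            r > 0xFFFF ? 0xFFFF : (uint16_t)r,
            g > 0xFFFF ? 0xFFFF : (uint16_t)g,
            b > 0xFFFF ? 0xFFFF : (uint16_t)b
        };
        return out;
    }

    /* The only rounding step, performed once, at output time. */
    static Pixel8 quantize(Pixel16 p)
    {
        Pixel8 out = { p.r >> 8, p.g >> 8, p.b >> 8 };
        return out;
    }
    [/code]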



    I can see audio doing that as well. OS X supports 32-bit sound, but if you have a lot of effects on a given track it would be nice to know that the processing is introducing an absolute minimum amount of distortion. The effects could be that much subtler, too.
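
    Same idea for audio, as a sketch (hypothetical numbers): run the whole mix bus in doubles and quantize to the 16-bit output format exactly once, so the rounding error stays at half a step total rather than half a step per effect:

    [code]
    #include <stdint.h>

    static int16_t mix_sample(const double *tracks, int n)
    {
        double acc = 0.0;
        for (int i = 0; i < n; i++)
            acc += tracks[i];               /* full-precision mix bus */
        if (acc > 32767.0)  acc = 32767.0;  /* clamp ...              */
        if (acc < -32768.0) acc = -32768.0;
        return (int16_t)(acc + (acc >= 0 ? 0.5 : -0.5)); /* ... then round once */
    }
    [/code]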



    The benefits for 3D should be obvious.



    (I should point out that they could all be doing this now, since the G4 is quite capable of crunching 64-bit floats - but since the 970 should be able to do that a lot faster, they're probably holding out.)



    [ 01-20-2003: Message edited by: Amorph ]



  • Reply 24 of 45
    chip speed!?!

    what about the 900 MHz front-side bus

    christ man, that's the good thing.
  • Reply 25 of 45
    nevyn · Posts: 360
    [quote]Originally posted by Amorph:

    <strong>

    Although, it hasn't been lost on a number of people that the iMac makes a spiffy little server....</strong><hr></blockquote>



    Sure. But there's no reason for the iMac of 2005 running as a home server (DHCP, DNS, print, file, PVR, iTunes server, heck - phone, security, whatever) to need 64 bits. I have a database for recipes; Oracle would benefit from 64-bit -> this does NOT imply that my recipe database will ever be 64-bit. I'm just not going to have more than 32 million records -> pointless. Making the app use 64-bit even when it doesn't need to is just code-bloat/memory-hog heaven.



    [quote]<strong>I can see Photoshop and its ilk going this way relatively quickly. Even though output to screen and to print will probably remain 24-bit for some years yet, it would be nice to do all the back-end computation at 64 bit ...</strong><hr></blockquote>



    I thought all that was going towards floating point -> and the floating-point units are already 64-bit.



    Edit: bad quoting.



    [ 01-21-2003: Message edited by: Nevyn ]
  • Reply 26 of 45
    dfiler · Posts: 3,420
    [quote]Originally posted by Nevyn:

    <strong>...

    I'm just not going to have more than 32 million records -> pointless. Making the app use 64-bit even when it doesn't need to is just code-bloat/memory-hog heaven.

    ...</strong><hr></blockquote>



    Quite true. But I think a more typical use for a 64-bit architecture will be increased RAM capacity. With how cheap memory is getting, many people can afford and use well in excess of 4 GB. All iMovie users could benefit from boatloads of memory.
  • Reply 27 of 45
    programmer · Posts: 3,457
    [quote]Originally posted by dfiler:

    <strong>Quite true. But I think a more typical use for a 64-bit architecture will be increased RAM capacity. With how cheap memory is getting, many people can afford and use well in excess of 4 GB. All iMovie users could benefit from boatloads of memory.</strong><hr></blockquote>



    And a couple of gigabytes isn't enough? I don't see any of the iApps going 64-bit for quite a while, because they don't really benefit that much and most of the market will still be on 32-bit processors. It's going to be some time before we see much software that needs to be 64-bit, so it's better to keep it 32-bit for both performance and compatibility. Some might be made 64-bit just for the "coolness" factor, but that's really just marketing hype.
  • Reply 28 of 45
    cowerd · Posts: 579
    [quote]Quite true. But I think a more typical use for a 64-bit architecture will be increased RAM capacity. With how cheap memory is getting, many people can afford and use well in excess of 4 GB. All iMovie users could benefit from boatloads of memory.<hr></blockquote>

    Doesn't the G4 already support a 36-bit physical address space, allowing a max of 64 GB of physical memory? The 4 GB is a limit in place because of Apple's mobo design.
  • Reply 29 of 45
    whisper · Posts: 735
    [quote]Originally posted by cowerd:

    <strong>Doesn't the G4 already support a 36-bit physical address space, allowing a max of 64 GB of physical memory? The 4 GB is a limit in place because of Apple's mobo design.</strong><hr></blockquote>

    I just did the calculation: 2^36 bytes = 64 × 2^30 bytes, so 36-bit addressing does allow up to 64 GB of RAM.
  • Reply 30 of 45
    nevyn · Posts: 360
    [quote]Originally posted by cowerd:

    <strong>Doesn't the G4 already support a 36-bit physical address space, allowing a max of 64 GB of physical memory?</strong><hr></blockquote>



    Yes. Not sure what the newest G4 variants support.



    But even with pure 32-bit code in an OS running on 64-bit hardware... that's 32-bits (4GB!) per _thread_. Tough to see for iApps. iDVD/FCP maybe.
  • Reply 31 of 45
    There are more reasons to go 64bit than merely the maximum amount of addressable space.



    Plus, if that's all you wanted to do--just redefine what a byte is on the system.



    Some other changes: a 64-bit memory bus moves twice the data per clock, and lots of data exceeds 32 bits now anyway. Instructions can become more complex--not a huge benefit for PPC [which pads all instructions to a fixed length, with the advantage of not having to decode an instruction before knowing where it ends and the data begins], unless they start instituting VLIW and steady-state programming [which can have huge benefits for algorithm speed by parallelizing memory access times].



    If AltiVec remains at 128 bits, then the time it spends waiting for data gets cut in half.



    One drawback is that all pointers are now 2x in size--not that big of a deal for the non-hardware driver folks.



    Integer data can now be represented accurately up to 2^64: that's great, because floating point is slow and inexact. High to extremely high values can be computed using integer math: faster computation.



    Larger memory blocks imply better cache-data locality [until programmers get lazy again and start using unnecessarily massive structures]: higher cache hit rate = faster.



    Then there's communication across the bus to your PCI bus and AGP "bus" . . . .



    No single one of these will revolutionise your desktop experience; each will contribute a little bump in performance for the machines you see next year. On the PC side, there wasn't a massive change in "speed" during the 16->32 bit transition [and 32->64 is less profound]: but if you evolved both architectures side by side, you would see a massive difference now. So it serves to remove the damping effect on performance that would otherwise be seen as hardware improves.



    My question would be: how is VMM handled? What's the caching system for the page directory and page tables, and how many layers? Doing that wrong vs. doing that right is about an order of magnitude of performance lost/gained. If anyone has a white paper on that, please supply a link!
  • Reply 32 of 45
    nevyn · Posts: 360
    [quote]Originally posted by dumpster:

    <strong>There are more reasons to go 64bit than merely the maximum amount of addressable space.</strong><hr></blockquote>



    Sure there are, but the size of the bus doesn't necessarily depend on the size of the registers. If it did, everything would have to have 128-bit buses because of the AltiVec unit. Making an integer unit 32-bit versus 64-bit doesn't prevent either an 8-bit 'memory' bus or a 1024-bit memory bus. (Not that either necessarily makes sense.)



    And switching the bus from 32-bit to 64-bit doesn't inherently double anything... unless there are twice as many wires running around (or a completely different bus setup). Doubling the wires would help a 32-bit-only chip just as much (and cost basically the same).



    And changing the size of everything is not in the cards. The vast majority of programs aren't going to have loops that actually count _past_ 4 billion. That's where having an 'int' be 64 bits starts being useful - it just isn't needed _overall_. It is easy to see why people wanted to go past 8 bits for counters - 256 is way too low. 16 bits is also too low... but at some point you are passing around more 'useless' information than is justified.



    When you are keeping track of the number of windows open, 8 bits is probably too low... but 16 or 32 bits would be a LOT of windows... and a 64-bit counter for the number of open windows is insane. Roughly 56 bits of _every_ such counter would be dead weight dragged across the bus, through the cache, and through the registers. Decisions/variables like that vastly outnumber the ones that need special treatment because they can potentially exceed 4 billion.
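
    A small C illustration of that dead weight (a hypothetical bookkeeping struct): widen every counter to 64 bits and the structure doubles in size, so it occupies twice the cache lines and bus bandwidth while carrying no extra information:

    [code]
    #include <stdio.h>
    #include <stdint.h>

    struct windows_32 { uint32_t open, minimized, hidden, total; };
    struct windows_64 { uint64_t open, minimized, hidden, total; };

    int main(void)
    {
        printf("32-bit counters: %zu bytes\n", sizeof(struct windows_32)); /* 16 */
        printf("64-bit counters: %zu bytes\n", sizeof(struct windows_64)); /* 32 */
        return 0;
    }
    [/code]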



    There are techniques used on 32-bit machines to reach 'arbitrary' bitness - that is, to express numbers like 4,294,967,297 accurate to the nearest integer - but those techniques are slower than native 64-bit would be -> those are the programs that will see _inherent_ speedups. Those programs are 1) few and extremely far between, 2) either expensive or custom, and 3) not likely to ever include iCal amongst them.
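
    For the curious, a sketch of what that emulation costs: on a 32-bit ALU a single 64-bit add becomes two adds plus a carry check, where a 64-bit integer unit does it in one instruction. (Simplified - real bignum libraries handle arbitrary widths, not just 64 bits.)

    [code]
    #include <stdint.h>

    typedef struct { uint32_t lo, hi; } u64_emul;

    /* 64-bit add built from 32-bit halves. */
    static u64_emul add64(u64_emul a, u64_emul b)
    {
        u64_emul r;
        r.lo = a.lo + b.lo;
        r.hi = a.hi + b.hi + (r.lo < a.lo); /* carry out of the low word */
        return r;
    }
    [/code]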



    PS Gosh I wish the spell-checker would stay checked in Safari. I miss OW. Sigh.



    [ 01-21-2003: Message edited by: Nevyn ]



  • Reply 33 of 45
    programmer · Posts: 3,457
    [quote]Originally posted by dumpster:

    <strong>Plus, if that's all you wanted to do--just redefine what a byte is on the system.

    </strong><hr></blockquote>



    Heh, that's funny... you'd break every piece of software in existence.



    A 32-bit machine can address 4 GB of memory per process, not per thread. Each running application is a process and gets one "address space" which must be addressable by a pointer value... 32 bits in the case of a 32-bit app. A 64-bit app has 64-bit pointers, so the address space is actually 16 billion GB. The 970 has "only" 42-bit physical addressing, so it is "limited" to about 4000 GB. The G4 has 36-bit physical addressing, so it could actually use 64 GB of RAM, but a single process is limited to 4 GB (actually a little less because of some OS per-process overhead, I believe).
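
    The arithmetic behind those numbers, worked through in C (each extra address bit doubles the reachable memory):

    [code]
    #include <stdio.h>

    int main(void)
    {
        printf("2^32 bytes = %llu GB\n", 1ULL << (32 - 30)); /* 4 GB per 32-bit process */
        printf("2^36 bytes = %llu GB\n", 1ULL << (36 - 30)); /* 64 GB, G4 physical      */
        printf("2^42 bytes = %llu GB\n", 1ULL << (42 - 30)); /* 4096 GB, 970 physical   */
        printf("2^64 bytes = %llu GB\n", 1ULL << (64 - 30)); /* 17,179,869,184 GB - the
                                                                "16 billion GB" above   */
        return 0;
    }
    [/code]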



    As for 64-bit math... there are very few applications that actually need numbers that big which (a) wouldn't prefer double-precision floating point, or (b) couldn't make do with the 52-bit mantissa (53 bits, counting the implicit leading bit) in the double-precision float. There are a few times when 64-bit integers are very convenient, but rarely are they essential.
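
    A quick C demonstration of where that mantissa runs out: a double represents every integer exactly up to 2^53, then starts skipping values - which a 64-bit integer never does.

    [code]
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t n = (1ULL << 53) + 1;  /* 9007199254740993            */
        double   d = (double)n;         /* rounds to ...992: one short */
        printf("uint64_t: %llu\n", (unsigned long long)n);
        printf("double  : %.0f\n", d);
        printf("equal?    %s\n", ((uint64_t)d == n) ? "yes" : "no"); /* no */
        return 0;
    }
    [/code]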
  • Reply 34 of 45
    costique · Posts: 1,084
    The amount of addressable space is becoming an issue in pre-press. First of all, there are huge jobs which easily reach 1.5 GB per page plate (roughly 6 GB for separated CMYK PostScript). I saw a couple of them last week, brought in by a crazy client of ours. Second, has anyone here tried A1 posters at 300 dpi in Photoshop? Layers, masks, filters, history... Photoshop devours your disk space by the gigabyte like there's no tomorrow. I can hardly imagine the scale of the algorithmic contortions involved in never crossing the 4 GB limit.
  • Reply 35 of 45
    barto · Posts: 2,246
    1) MHz for MHz, is the 970 faster than the current G4? If Apple ships a 2x1.25 Ghz 970 based machine, will it be faster? If so, how much (roughly)?



    2) What are the implications of it being a 64-bit chip? Will it be faster due to this? What is the advantage here?




    Yes. The 970 dispatches 5 instructions per cycle, the 745x (G4) dispatches 4. So automagically it's 25% faster clock for clock (peak).



    Most applications will be much faster because although the 970 can process a bit more data per clock, it can access much more data to process per clock.



    Some applications, such as databases, and in future advanced visual applications like FCP and Photoshop (with 64- and 128-bit colour), will take advantage of 64 bits to be even speedier relative to the G4.



    Barto



    PS

    In South Australia I was born

    Heave away, haul away

    South Australia 'round Cape Horn

    Bound for South Australia
  • Reply 36 of 45
    costique · Posts: 1,084
    [quote]Originally posted by Barto:

    <strong>The 970 dispatches 5 instructions per cycle, the 745x (G4) dispatches 4. So automagically it's 25% faster clock for clock (peak).</strong><hr></blockquote>



    It's true only for peak performance. Holes in the longer pipeline are a greater penalty.
  • Reply 37 of 45
    [quote]Originally posted by Programmer:

    <strong>



    Heh, that's funny... you'd break every piece of software in existence.



    A 32-bit machine can address 4 GB of memory per process, not per thread. Each running application is a process and gets one "address space" which must be addressable by a pointer value... 32 bits in the case of a 32-bit app. A 64-bit app has 64-bit pointers, so the address space is actually 16 billion GB. The 970 has "only" 42-bit physical addressing, so it is "limited" to about 4000 GB. The G4 has 36-bit physical addressing, so it could actually use 64 GB of RAM, but a single process is limited to 4 GB (actually a little less because of some OS per-process overhead, I believe).



    As for 64-bit math... there are very few applications that actually need numbers that big which (a) wouldn't prefer double-precision floating point, or (b) couldn't make do with the 52-bit mantissa (53 bits, counting the implicit leading bit) in the double-precision float. There are a few times when 64-bit integers are very convenient, but rarely are they essential.</strong><hr></blockquote>



    Note: this only applies to Mac OS X. Windows NT has a 2:2 split for the app and the OS: 2 GB are reserved for the OS and shared libraries, and then the app itself gets the other 2. This can be user-configured to a 1:3 split, but I believe that has some side effects. On Mac OS X, each app gets the full 4 GB, though all of the shared libs/frameworks the app loads must fit in there too; it is all within a single address space.

    I'm not sure if apps on the 970 will be able to use the full 42-bit space, or if that will only be available to the OS.
  • Reply 38 of 45
    costique · Posts: 1,084
    [quote]Originally posted by Pilmour Boy:

    <strong>I'm not sure if apps on the 970 will be able to use the full 42-bit space, or if that will only be available to the OS.</strong><hr></blockquote>



    Do you believe the 970 will support such a level of granularity as 5.25 byte addresses?
  • Reply 39 of 45
    [quote] Originally posted by Programmer:

    <strong>



    Heh, that's funny... you'd break every piece of software in existence.</strong><hr></blockquote>



    Even the ones on other machines?



    I wasn't suggesting it as a good idea--though it is worth mentioning that there are plenty of architectures that have bytes != 8 bits [hence all those #defines for BYTE and MAX_INT in the headers of your C projects].



    Nevyn: it looks like we agree, but I don't understand your focus on counters. After all, a typical loop only lasts five iterations (datum a posteriori).



    Sure, doing nominal integer math/operations on signed/unsigned numbers > 32/31 bits gets less likely as the number of bits increases. However, I have run into multiple instances where doing transforms on numbers > 2^32 is required and cannot be done with floating point [encoding, for example]. Of course, you can find a more complex way of handling the requirements--but that's neither the faster nor the more sustainable method, and is therefore not preferred.



    There are lots of operations which fundamentally cannot be achieved through floating-point operations [encryption, for example] because of a zero tolerance for approximation--not to mention that FP is significantly slower.
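
    A hypothetical example of that zero tolerance in C: a modular multiply of two values near 2^31.5, the kind of step that shows up in encryption and hashing. The exact product needs 63 bits; a double silently rounds it to 53 significant bits, so the reduced result is simply wrong:

    [code]
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t a = 3037000507ULL;          /* product a*a needs 63 bits */
        uint64_t m = 4294967291ULL;          /* a 32-bit prime modulus    */
        uint64_t exact = (a * a) % m;
        uint64_t viafp = (uint64_t)((double)a * (double)a) % m;
        printf("exact integer: %llu\n", (unsigned long long)exact);
        printf("via double   : %llu\n", (unsigned long long)viafp); /* differs */
        return 0;
    }
    [/code]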



    As for the bus wires--I did assume that the bus would advance in proportion [though not immediately, of course]. I don't know what the limits are for handling multiplexing on the bus--so my guess is that it's preferable not to break data into multiplexed or multi-clocked blocks.
  • Reply 40 of 45
    amorph · Posts: 7,112
    [quote]<strong>Originally posted by Nevyn:



    But there's no reason for the iMac of 2005 running as a home server (DHCP, DNS, print, file, PVR, iTunes server, heck - phone, security, whatever) to need 64 bits. I have a database for recipes; Oracle would benefit from 64-bit -> this does NOT imply that my recipe database will ever be 64-bit. I'm just not going to have more than 32 million records -> pointless. Making the app use 64-bit even when it doesn't need to is just code-bloat/memory-hog heaven.</strong><hr></blockquote>



    I was being silly, but on the other hand, there are plenty of apps that require 64 bits even though, had they been coded more carefully, they could use 32 bits with little to no loss of precision.



    Also, you might want an Oracle database on an iMac: if it's a little server in a shop with Oracle running on a big box or cluster, it can use Oracle's native routines to talk to the big box (or other Oracle instances) quickly and easily for its own more modest needs.



    Also, for the student 3D artist, it'll run 3D apps at a good clip. If I thought about it I could probably summon up all kinds of interesting uses for a 64-bit iMac - none of which, admittedly, have much to do with the consumer market, but hey, a sale is a sale.



    [quote]I can see Photoshop and its ilk going this way relatively quickly. Even though output to screen and to print will probably remain 24-bit for some years yet, it would be nice to do all the back-end computation at 64 bits ...



    <strong>I thought all that was going towards floating point -> and the floating-point units are already 64-bit.</strong><hr></blockquote>



    Yes, which means that any transformations applied by the graphics card will not introduce significant distortion. However, if PShop isn't tweaked to take advantage of the graphics card's precision, it'll still be storing color as a 32-bit integer, and PShop operations will still introduce color distortion - which the graphics card will reproduce faithfully.



    Also, as I mentioned, the designer might want 64-bit FP color accuracy when she's targeting other media than her screen (e.g., printers, film), in which case the graphics card's capabilities are irrelevant.



    [ 01-22-2003: Message edited by: Amorph ]