gp-ul: Single or Dual?


Comments

  • Reply 21 of 37
    powerdoc Posts: 8,123
    [quote]Originally posted by AirSluf:




    The Athlons are currently superior because they have more/faster floating point units, not bigger ones.



    [/quote]



    Yes, but even if they are fully pipelined, the FPU units of the Athlon are specialized and thus not always able to deliver 3 operations per cycle.

    A design with two general (non-specialized) FP units would be nice, and nearly as effective as the Athlon design.
  • Reply 22 of 37
    [quote]Originally posted by Amorph:

    Also, don't forget that Oracle has started to move to OS X Server in earnest. They can have a 64 bit 9i up and running not long after Apple ships a 64 bit OS X Server - its clustering capabilities are almost custom made for the XServe. Then, of course, Sybase and possibly DB2 are coming, and they'll tax the platform as thoroughly as it can be taxed.



    Lightwave and Maya and the other 3D apps will see an immediate benefit from the 64 bit FP support.



    I would say that there's no shortage of 64 bit apps waiting in the wings. But they're not apps in the Mac's traditional strongholds. 64 bit color is coming, and doubtless Adobe will be on top of that game, but there are more immediate uses for the extra bits. If nothing else, FCP/DVDSP users might appreciate the ability to slurp a >4GB file into memory, and possibly even into >4GB of physical RAM.
    [/quote]



    Yes, once machines start arriving with >4 GB of RAM, we'll start seeing some of the 64-bit advantages. The movie editors will certainly benefit. Photoshop? Well, not your average user, but the high-end guys who are pushing the memory limits will. The database guys. A bunch of scientific applications.



    While these are very important apps, they don't represent anything close to a majority of Apple's users.



    As mentioned already, the PowerPC has had 64-bit floats since day 1, so there isn't a new FP capability, just increased performance (which is very welcome!).
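
    To make the ">4 GB file" point concrete, here's a minimal sketch (my illustration, not from any poster -- the filename and file size are made up) of why 64-bit addressing matters for that kind of work: a 64-bit process can simply mmap() an entire multi-gigabyte capture file and treat it as one big array, while a 32-bit process runs out of address space first.

    [code]
    /* Sketch: map a huge capture file in one shot.
     * On a 32-bit process the address space tops out around 4 GB, so a
     * 6 GB clip can't be mapped in one piece; with 64-bit pointers the
     * same call just works. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "capture.mov";   /* hypothetical multi-GB clip */
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        /* One mapping for the whole file -- the editor can then reach
         * anywhere in it with plain pointer arithmetic. */
        void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        printf("mapped %lld bytes at %p\n", (long long)st.st_size, base);

        munmap(base, (size_t)st.st_size);
        close(fd);
        return 0;
    }
    [/code]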
  • Reply 23 of 37
    mrmister Posts: 1,095
    "64 bit color is coming, and doubtless Adobe will be on top of that game, but there are more immediate uses for the extra bits."



    How many colors will our monitors have...a "billions" entry?



    Does this mean higher color fidelity and photorealism in monitors, or is the monitor tech the limiting factor?
  • Reply 24 of 37
    chych Posts: 860
    64-bit colors, that's 2^64 ≈ 18 quintillion colors, quite a fair bit beyond billions. How many gigs of video RAM would you need to run that?



    [ 10-16-2002: Message edited by: chych ]
  • Reply 25 of 37
    [quote]Originally posted by chych:

    64-bit colors, that's 2^64 ≈ 18 quintillion colors, quite a fair bit beyond billions. How many gigs of video RAM would you need to run that?



    [ 10-16-2002: Message edited by: chych ][/quote]



    Put another way, it's about 18 billion billion (16 × 2^30 × 2^30).



    The video RAM requirement merely doubles, because current pixels are already 32 bits and a 64-bit pixel is only twice that.



    Currently, displays (good ones) only have about 10 bits per channel of colour resolution, so only 30-bit colour is actually needed (note that 32-bit colour is typically 8 bits per channel of RGB plus an alpha channel that isn't visible). Some new boards (Matrox, for example) actually support a 10/10/10/2 mode. All of the new 3D video chipsets, however, will support 128-bit colour, where each pixel holds four single-precision floating-point values. This is really only useful for intermediate buffers during 3D work -- it could be fed to the display, but then the display itself is the limiting factor.



    10 bits is very close to the perceptual limit, and 16 bits is probably well beyond it, so I doubt we'll ever see displays that really use more than 16 bits of precision. Single-precision floating point has a 23-bit mantissa (24 effective bits of precision). The video chipsets, however, will allow the use of floating-point frame buffers that have much higher precision, for reasons other than the visual results. Floating point also has the advantage of much greater dynamic range, although it's not clear that that would be useful for actual display either.
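
    As a concrete illustration of that 10/10/10/2 layout (my sketch, not Programmer's -- the function name is made up): ten bits per colour channel gives 1024 levels each instead of 256, yet the pixel still fits in the same 32-bit word, which is why the memory cost doesn't change.

    [code]
    /* Sketch: pack a pixel in the 10/10/10/2 layout -- 10 bits each for
     * red, green and blue (30 bits of colour) plus 2 leftover alpha bits,
     * all in one 32-bit word. */
    #include <stdint.h>
    #include <stdio.h>

    /* r, g, b in [0, 1023]; a in [0, 3] */
    static uint32_t pack_2_10_10_10(uint32_t r, uint32_t g, uint32_t b, uint32_t a)
    {
        return (a << 30) | (r << 20) | (g << 10) | b;
    }

    int main(void)
    {
        uint32_t mid_grey = pack_2_10_10_10(512, 512, 512, 3);
        printf("10/10/10/2 mid grey = 0x%08X\n", (unsigned)mid_grey);
        printf("levels per channel: %d vs %d for 8-bit\n", 1 << 10, 1 << 8);
        return 0;
    }
    [/code]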
  • Reply 26 of 37
    amorph Posts: 7,112
    I was unaware that the transition to 64-bit color implied a transition to FP. Learn something new every day.



    At any rate, the issue with 16-bit FP is not the sheer number of values it can represent, nor the range: Both are enormous. However, all values are approximated, and the approximations get coarser and coarser toward the extremes, until distortion becomes a real problem. This has been an issue with 16-bit digital sound. I'm not sure whether it would be with 16-bit FP applied to color.



    Also, the main use for such a huge colorspace would be to reduce the amount of color distortion introduced by rounding errors when colors were manipulated and combined (again, in the same way that sound is processed at much higher quality than the eventual CD, or in the same way that you get a better image from drawing at 300dpi and rendering down to 72dpi than you do from simply drawing at 72dpi). The monitor wouldn't really have to keep up with the theoretical maximum that the FP representation was capable of displaying.



    I hope I have this all right.
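
    A quick way to see the rounding-error point (my sketch, not Amorph's): darken an 8-bit ramp to 10% and then brighten it back. If the intermediate result is quantised to 8 bits, most of the distinct levels collapse -- the banding you'd see on screen -- whereas if the intermediate stays in floating point and is quantised only once at the end, all 256 levels survive.

    [code]
    /* Sketch: round-trip an 8-bit ramp through a darken/brighten pair and
     * count how many distinct levels survive with an 8-bit intermediate
     * versus a float intermediate. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int levels_8bit = 0, levels_float = 0;
        int last8 = -1, lastf = -1;

        for (int v = 0; v < 256; v++) {
            /* 8-bit intermediate: quantise after the darken step */
            int dark8 = (int)lroundf(v * 0.1f);
            int back8 = (int)lroundf(dark8 / 0.1f);
            if (back8 > 255) back8 = 255;

            /* float intermediate: quantise only at the very end */
            float darkf = v * 0.1f;
            int backf = (int)lroundf(darkf / 0.1f);
            if (backf > 255) backf = 255;

            /* both sequences are non-decreasing, so counting changes
             * counts distinct output levels */
            if (back8 != last8) { levels_8bit++;  last8 = back8; }
            if (backf != lastf) { levels_float++; lastf = backf; }
        }

        printf("distinct levels after round trip: 8-bit intermediate = %d, "
               "float intermediate = %d\n", levels_8bit, levels_float);
        return 0;
    }
    [/code]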
  • Reply 27 of 37
    The Power4 was originally called the "Giga Processor," but it debuted at 1.3 GHz (I think). The UL -- "ultralite" -- part of GPUL means only one core. The successor to the Power4 is called the GQ (one letter after P). So, GPUL = Power4 with one core.
  • Reply 28 of 37
    mrmister Posts: 1,095
    Thanks, Programmer...very informative.



    I would like a "quadrillions" entry in the Displays panel, however.
  • Reply 29 of 37
    programmer Posts: 3,503
    [quote]Originally posted by Amorph:

    I was unaware that the transition to 64-bit color implied a transition to FP. Learn something new every day.



    At any rate, the issue with 16-bit FP is not the sheer number of values it can represent, nor the range: Both are enormous. However, all values are approximated, and the approximations get coarser and coarser toward the extremes, until distortion becomes a real problem. This has been an issue with 16-bit digital sound. I'm not sure whether it would be with 16-bit FP applied to color.



    Also, the main use for such a huge colorspace would be to reduce the amount of color distortion introduced by rounding errors when colors were manipulated and combined (again, in the same way that sound is processed at much higher quality than the eventual CD, or in the same way that you get a better image from drawing at 300dpi and rendering down to 72dpi than you do from simply drawing at 72dpi). The monitor wouldn't really have to keep up with the theoretical maximum that the FP representation was capable of displaying.



    I hope I have this all right.
    [/quote]



    Yeah, pretty much. "64-bit" colour doesn't have to be floating point, but that is what the 3D cards have done. These cards blend successive polygons over top of the frame buffer, which is why they need the precision. I'd expect that most intermediate buffers will be 32-bit floats so precision shouldn't be a problem at all.
  • Reply 30 of 37
    telomar Posts: 1,804
    This isn't on the PPC970 exactly, but who wants to bet IBM isn't thinking in similar terms?



    [quote] "We've got multicore Hammer already at the electrical level," he said. "At 90 nanometers it's very practical -- but we're not announcing any products today." [/quote]
  • Reply 31 of 37
    sc_markt Posts: 1,404
    [quote]Originally posted by Telomar:

    This isn't on the PPC970 exactly, but who wants to bet IBM isn't thinking in similar terms?



    [/quote]



    I'm sure IBM could make a PPC970 with a dual core if Apple wanted. In fact, I bet one day we'll see it.
  • Reply 32 of 37
    [quote]Originally posted by sc_markt:


    I'm sure IBM could make a PPC970 with a dual core if Apple wanted. In fact, I bet one day we'll see it.
    [/quote]



    Consider that the POWER4 was ~172 million transistors on a 0.18 micron process. Expensive, yes, but IBM did it. Now consider that the new GPUL is only 52 million. That means there is room for 3.3 GPULs on one chip at 0.18 microns. IBM will have 0.09 micron available in the not-too-distant future, so putting 2 or 4 GPUL cores on a single chip won't merely be possible, it'll be economical.
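
    A rough back-of-the-envelope with those same figures (my arithmetic; it assumes transistor density scales with the inverse square of the feature size, which real process shrinks only approximate):

    $$\frac{172\ \text{M}}{52\ \text{M}} \approx 3.3 \qquad\qquad \left(\frac{0.18\ \mu\text{m}}{0.09\ \mu\text{m}}\right)^2 = 4$$

    So the POWER4's transistor budget already buys about three GPUL cores' worth of logic at 0.18 micron, and the same die area at 0.09 micron should hold roughly four times as many transistors -- which is why two or four cores looks economical rather than heroic.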



    Now consider the IBM research paper published a while back that proposes "The Cellular Approach" where one chip of up to a billion transistors is a network of processor cores. In a billion transistors they could fit at least 16 GPUL cores.



    IBM says that GPUL is good for up to 16-way SMP. Hmmm... perhaps we have something approximating a roadmap here?
  • Reply 33 of 37
    telomar Posts: 1,804
    [quote]Originally posted by sc_markt:




    I'm sure IBM could make a PPC970 with a dual core if Apple wanted. In fact, I bet one day we'll see it.
    [/quote]



    The part that I was really pointing out was not so much the multi-core aspect, which every company is picking up, but the timeline AMD is aiming at.
  • Reply 34 of 37
    [quote]Originally posted by Programmer:




    Yeah, pretty much. "64-bit" colour doesn't have to be floating point, but that is what the 3D cards have done. These cards blend successive polygons over top of the frame buffer, which is why they need the precision. I'd expect that most intermediate buffers will be 32-bit floats so precision shouldn't be a problem at all.
    [/quote]



    hey programmer, where did you get so friggen smart?

    And how do you always make it seem so friggen simple?

    And please tell me you are a millionaire, cause that's a talent.
  • Reply 35 of 37
    outsider Posts: 6,008
    [quote]Originally posted by Programmer:




    Consider that the POWER4 was ~172 million transistors on a 0.18 micron process. Expensive, yes, but IBM did it. Now consider that the new GPUL is only 52 million. That means there is room for 3.3 GPULs on one chip at 0.18 microns. IBM will have 0.09 micron available in the not-too-distant future, so putting 2 or 4 GPUL cores on a single chip won't merely be possible, it'll be economical.



    Now consider the IBM research paper published a while back that proposes "The Cellular Approach" where one chip of up to a billion transistors is a network of processor cores. In a billion transistors they could fit at least 16 GPUL cores.



    IBM says that GPUL is good for up to 16-way SMP. Hmmm... perhaps we have something approximating a roadmap here?
    [/quote]



    How about the flip side: instead of a processor with multiple cores, each with their own BPU and cache and an internal bus interconnecting them all, what about a processor with many execution units? A 970-derived processor with like 8 FXUs, 6 FPUs, 4 vector units, 128K I&D L1 caches and a huge 2MB L2 cache? It would have to have a lot of dedicated hyper-threading circuitry, but I think an approach like this would yield better results on the code that's out now.
  • Reply 36 of 37
    programmer Posts: 3,503
    [quote]Originally posted by Outsider:


    How about the flip side: instead of a processor with multiple cores, each with their own BPU and cache and an internal bus interconnecting them all, what about a processor with many execution units? A 970-derived processor with like 8 FXUs, 6 FPUs, 4 vector units, 128K I&D L1 caches and a huge 2MB L2 cache? It would have to have a lot of dedicated hyper-threading circuitry, but I think an approach like this would yield better results on the code that's out now.
    [/quote]



    A single processor with too many execution units doesn't work well because inevitably there are too many bubbles, stalls, and interdependencies. HyperThreading is a solution to that, in that it allows multiple threads to fill in each other's bubbles... unfortunately, if there are no bubbles then it ends up slowing the code down. Throwing more execution units at it could correct that somewhat, but the interconnects between execution units are big and complicated, so it would probably start to run into performance & design walls.
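
    Here's a tiny sketch of the "bubbles" idea (my illustration, not Programmer's; absolute timings depend on the machine and compiler flags): the first loop is one long dependency chain, so every add has to wait for the previous one and extra FP units sit idle; the second loop does the same total work split across four independent accumulators, which a wide core -- or a second hardware thread filling the gaps -- can overlap.

    [code]
    /* Sketch: a serial floating-point dependency chain versus four
     * independent chains doing the same total number of additions. */
    #include <stdio.h>
    #include <time.h>

    #define N 100000000L

    int main(void)
    {
        const double a = 1.0000001;

        /* One long chain: each add depends on the previous result. */
        clock_t t0 = clock();
        double s = 0.0;
        for (long i = 0; i < N; i++)
            s += a;
        clock_t t1 = clock();

        /* Four independent chains: the hardware can overlap them. */
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (long i = 0; i < N; i += 4) {
            s0 += a; s1 += a; s2 += a; s3 += a;
        }
        clock_t t2 = clock();

        printf("serial chain:       sum=%f  %.2fs\n", s,
               (double)(t1 - t0) / CLOCKS_PER_SEC);
        printf("four accumulators:  sum=%f  %.2fs\n", s0 + s1 + s2 + s3,
               (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }
    [/code]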



    A multi-core design has much more isolated and simplified communications -- basically everything talks to the bus/cache interface, and that just has to know how to respond to fixed-size data requests. Very simple. This also has the advantage that the cores aren't interdependent, so if you get a manufacturing flaw in one of them it's much easier to turn off without affecting the others.



    There are tradeoffs. My guess is that the multi-core approach scales much better, although each of the cores might be 2- to 4-way hyperthreaded to try to fill the bubbles that are present with about the current number of execution units. There is probably a "sweet spot" for hyperthreading, and beyond that multi-core makes more sense.
  • Reply 37 of 37
    programmer Posts: 3,503
    [quote]Originally posted by Eupfhoria:

    hey programmer, where did you get so friggen smart?

    And how do you always make it seem so friggen simple?

    And please tell me you are a millionaire, cause that's a talent.
    [/quote]



    Heh, I'm not smart, I'm just good at sounding like it. The downside to making things "seem so friggen simple" is that most of the time they aren't. And no, I'm not even close to being a millionaire, but if you want to pay me lots to explain things, I'd be happy to oblige!