QE with NV30? What's possible?


Comments

  • Reply 21 of 47
    xype Posts: 672 member
    [quote]Originally posted by phobos:

    OK, I don't know, but I think the memory requirements for the graphics card should be more modest. 32MB for a standard resolution is quite demanding. I'm not saying this because I want to use my 8MB ATI Rage (which doesn't have the capabilities) but because I don't understand how QE works and want it to be absolutely great.

    Oh, and one more question: if you play Quake in a window (that takes a lot of texture memory), what's going to happen to the rest of the desktop?

    Sorry for the long post.[/quote]



    I think 32 MB is OK if one also wants room for special "cases" of software which demand a "whole" window and not just a subset of buttons. It might also be that a whole Aqua application renders its entire content as a texture (instead of messing with a hundred little rectangles onto which letters are pasted), which would require additional memory.



    As for Quake, I think the most straightforward way of doing it "windowed" is to simply give all screen control to Quake and not touch anything outside the window (one could say "freeze" all of the desktop except the rectangle where Quake is), since drawing the Mac interface _and_ a 3D game would require a _really good_ 3D card. Then again, most applications don't update the screen often if you don't touch them... though it might also be that the Mac won't support windowed games anymore.



    As for the long posts - as long as a post isn't full of crap, no one has a problem with it. When you start writing a five-page paper on why you're going to switch to Wintel, then you might find people not liking you!
  • Reply 22 of 47
    rogue27 Posts: 607 member
    Well, QE actually only requires 16MB video RAM, but aside from the iBook, all of the GPUs with the hardware capabilities QE needs have 32MB VRAM.



    Anyway, here's the real breakdown.



    3D graphics are rendered offscreen, then "rasterized" into a pixel representation, which is stored in a frame buffer. There are two frame buffers: one onscreen and one offscreen. The one onscreen is what you see, while the one offscreen is being rendered. When the offscreen frame buffer is finished rendering, they swap. This way, there is no flicker as the graphics are drawn to the screen.
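
    To make the swap concrete, here's a toy C sketch (mine, purely illustrative -- no real graphics API, just two buffers and a pointer swap):

        #include <stdio.h>
        #include <string.h>

        #define W 16
        #define H 2

        /* Two frame buffers: 'front' is what the screen shows,
           'back' is where the next frame is rendered. */
        static char buf_a[H][W + 1];
        static char buf_b[H][W + 1];

        int main(void) {
            char (*front)[W + 1] = buf_a;
            char (*back)[W + 1] = buf_b;

            for (int frame = 0; frame < 3; frame++) {
                /* Render the next frame offscreen, into the back buffer. */
                for (int y = 0; y < H; y++) {
                    memset(back[y], 'A' + frame, W);
                    back[y][W] = '\0';
                }
                /* Swap: no pixels are copied, only the pointers change,
                   so the viewer never sees a half-drawn frame. */
                char (*tmp)[W + 1] = front;
                front = back;
                back = tmp;
                printf("visible frame %d: %s\n", frame, front[0]);
            }
            return 0;
        }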



    How big does the frame buffer have to be?



    Well, if you are using 32-bit graphics, each pixel is 32 bits, which is 4 bytes (there are 8 bits in a byte). You multiply that by the number of pixels on the screen. In the case of the iBook, which is 1024x768 pixels, at 4 bytes per pixel you get:



    3,145,728 bytes, or ~3.15 MB, which is what a 2D desktop requires. However, for a 3D screen you need that much again, because there are two frame buffers, and once more for a depth buffer (which I only vaguely understand and don't know how to explain). So rendering a 1024x768x32 screen in OpenGL (or any 3D graphics) requires at least 9.45MB of video RAM (3.15 x 3). Anything above that is used for texture memory, so on the iBook you would have 6.55 MB remaining for texture memory.



    Then each window will be a texture that needs to be stored in video memory. If you have multiple windows open, some will definitely have to be transferred over AGP from main memory on the fly while the screen is being rendered, and if a window is being updated, like a video playback, that texture will keep being updated in main memory and will have to be constantly fed down the AGP bus to be rendered onto the screen. If multiple windows are changing at the same time - say a QT movie playing in the corner of the screen while you scroll down a webpage and receive instant messages in iChat - all of this at once will cause a lot of data to be fed to the video card. AGP can handle this. Apparently PCI could not.
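
    A rough sanity check (my own sketch, using nominal peak bus rates -- real-world throughput is lower): one 640x480 movie window updated at 30fps in 32-bit color is a small slice of AGP, but a big chunk of plain 33MHz PCI, which is shared with everything else on the bus.

        #include <stdio.h>

        int main(void) {
            /* One dynamic window texture: 640x480, 4 bytes/pixel, 30 updates/sec. */
            double stream_mb_s = 640.0 * 480.0 * 4.0 * 30.0 / 1e6;

            /* Nominal peak rates, not sustained throughput. */
            double pci_mb_s   = 133.0;    /* 32-bit/33MHz PCI, shared bus */
            double agp1x_mb_s = 266.0;
            double agp4x_mb_s = 1066.0;

            printf("one 640x480 video window: %.1f MB/sec\n", stream_mb_s);
            printf("fraction of PCI:    %.0f%%\n", 100.0 * stream_mb_s / pci_mb_s);
            printf("fraction of AGP 1x: %.0f%%\n", 100.0 * stream_mb_s / agp1x_mb_s);
            printf("fraction of AGP 4x: %.0f%%\n", 100.0 * stream_mb_s / agp4x_mb_s);
            return 0;
        }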



    As screen resolution increases, the size of the frame buffers increases greatly: doubling the resolution quadruples the size of each frame buffer. If you go from 1024x768 to 2048x1536, one frame buffer would use up 12.6MB of video memory, and it would be 37.8 MB after you add in the second frame buffer and the depth buffer.
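
    All of this arithmetic is easy to verify; here's a quick C version of the numbers above (decimal megabytes, 4 bytes per pixel):

        #include <stdio.h>

        /* Megabytes needed for n full-screen buffers at 4 bytes/pixel. */
        static double buffers_mb(int w, int h, int n) {
            return (double)w * h * 4.0 * n / 1e6;
        }

        int main(void) {
            int res[][2] = { {1024, 768}, {1280, 1024}, {1600, 1200}, {2048, 1536} };
            for (int i = 0; i < 4; i++) {
                int w = res[i][0], h = res[i][1];
                printf("%4dx%-4d  one buffer: %6.2f MB   front+back+depth: %6.2f MB\n",
                       w, h, buffers_mb(w, h, 1), buffers_mb(w, h, 3));
            }
            return 0;
        }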



    Adding a second monitor shouldn't be difficult at reasonable resolutions, since most dual-head cards have at least 64MB video memory, but I would imagine that dual-monitor support with Quartz Extreme will require you to use a dual-head card. Apple's been offering them on PowerMacs for a while now.
  • Reply 23 of 47
    gnurf Posts: 20 member
    [quote]My question in brief: Will Quartz Extreme be able to offload all (most) FCP compositing and effects to your 3D card?

    It would be a revolution that nobody has noticed. [Skeptical][/quote]



    Well, this is how it's been done for years on SGI hardware. Flint, Flame, Inferno, Piranha, etc. do exactly the same thing that QE does. With video, from D1 to HD.



    And if you take a closer look at the PDF, you will see that YUV is supported as a texture format, so that's definitely coming. I figure FCP 4 will have realtime just about everything, utilising QE.



    The only thing holding back the implementation of 8-layer realtime video with multiple CCs and other things is the bandwidth of the motherboard. You need something like 2-3 GB/sec for doing that. And that should be on its way in the new hardware, since there are people at Apple who did it for SGI in the first place. The CPU can handle it as is. Look at Flame on a 2x250 MHz R10K MIPS system: it's realtime, multiple layers, advanced keying, multiple CCs and DVEs. Because of something similar to QE plus the bandwidth, not because of the processor.



    Maybe the next generation PowerMac will have a crossbar-switch or something? Who knows, but then you'd get realtime HD compositing...

    Don't you think Apple knows that this can be done?



    I'm optimistic.



    [Chilling]
  • Reply 24 of 47
    I don't see it in the PDF, but as was explained at WWDC, the GPU's resources are virtualized in Jaguar, such that applications aren't fighting with QE for texture memory and the like... Also, hidden and/or stagnant windows can be compressed in memory and refreshed to the screen on the fly without needing to be decompressed. If you run the Quartz Debugger in Jag and show a window list, you'll notice that next to the memory usage for some windows there's the letter C, which denotes that the particular window is compressed. QE is going to manage memory very, very well...
  • Reply 25 of 47
    [quote]Originally posted by Lemon Bon Bon:

    I wasn't clear that QE treats the interface as 3D graphics?



    Polygons? With textures on?



    That...that's...[/quote]



    ...tremendously cool? Yes.



    [quote]I thought it (don't laugh) 'just sped things up'. Just a compositing aid for the interface...for composited 2D elements. I guess I misunderstood.



    I didn't realise it was this clever.[/quote]



    <vader> Don't underestimate the power of Quartz Extreme. </vader>



    [quote]If Apple push the envelope...can't they offload much of the OS visualisation onto the graphics card?[/quote]



    Quartz 2D has yet to be offloaded to the hardware. It will happen, sooner or later -- and bring even more power and capabilities to developers.



    [quote]That could lead to some incredible things, no?[/quote]



    Yes!



    [quote]Am I missing something?[/quote]



    Microsoft perhaps? They're about two years back that way.
  • Reply 26 of 47
    Programmer Posts: 3,458 member
    [quote]Originally posted by rogue27:

    Well, QE actually only requires 16MB video RAM, but aside from the iBook, all of the GPUs with the hardware capabilities QE needs have 32MB VRAM.



    Anyway, here's the real breakdown.

    ...[/quote]



    Good analysis. A couple of points:



    - AGP is required because it gives the GPU the ability to read from main memory, whereas on PCI everything must be sent to the GPU. Subtle, but important difference since it lets the GPU choose what it needs and when it needs it.



    - The depth buffer (also known as the Z-buffer) stores the distance from the camera of each pixel that is drawn. This is primarily used in 3D. Imagine drawing a polygon that is 100 meters (or whatever scale the software is using) away from the camera to a set of pixels. Then you draw two other polygons to the same set of pixels, one 50m away and one 150m away. The GPU looks at the Z-buffer and realizes that the 150m pixels are "behind" the others and thus won't be visible, whereas the 50m ones are in front and thus need to be drawn. The Z-values typically take 32 bits per pixel, so the depth buffer is the same size as the front and back buffers.
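
    Here is that 100m/50m/150m example as a minimal C sketch of the per-pixel depth test (a single pixel, smaller Z = nearer the camera; purely illustrative):

        #include <stdio.h>
        #include <float.h>

        static float z_buffer = FLT_MAX;   /* depth of the nearest pixel drawn so far */
        static const char *color = "none"; /* what is visible at this pixel */

        /* Draw a fragment at depth z; keep it only if nothing nearer exists. */
        static void draw_fragment(float z, const char *c) {
            if (z < z_buffer) {            /* the depth test */
                z_buffer = z;
                color = c;
            }
            printf("drew at %3.0fm -> visible: %s (z = %.0f)\n", z, color, z_buffer);
        }

        int main(void) {
            draw_fragment(100.0f, "grey polygon");
            draw_fragment(150.0f, "red polygon");  /* behind: rejected by the test */
            draw_fragment(50.0f, "blue polygon");  /* in front: replaces the grey */
            return 0;
        }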



    1024x768 x 3 buffers x 4 bytes/pixel = 9.44 MBytes

    1280x1024 x 3 x 4 = 15.73 MBytes

    1600x1200 x 3 x 4 = 23.04 MBytes



    As you can see, the frame buffer requirements alone get pretty silly pretty fast. For 2D the Z-buffer isn't strictly necessary, so it's probable that QE doesn't create a fullscreen Z-buffer (at least until somebody starts drawing real 3D content using OpenGL).



    I expect many of the static Aqua images will be uploaded to the graphics VRAM as compressed textures (buttons, icons, certain important font sizes, etc).
  • Reply 27 of 47
    rogue27 Posts: 607 member
    [quote]Originally posted by Programmer:

    I expect many of the static Aqua images will be uploaded to the graphics VRAM as compressed textures (buttons, icons, certain important font sizes, etc).[/quote]





    That would make sense. In fact, I think the new Aqua elements in Jaguar were designed in a way that compresses easily. I have no proof of this, but it seems like a smart thing to do for improved performance, and the somewhat straighter appearance of the new elements makes them look easier to compress.
  • Reply 28 of 47
    xype Posts: 672 member
    [quote]Originally posted by Programmer:

    For 2D the Z-buffer isn't strictly necessary, so it's probable that QE doesn't create a fullscreen Z-buffer (at least until somebody starts drawing real 3D content using OpenGL).[/quote]



    Yep. I think for GUI work a Z-buffer of 8 bits per pixel would be quite enough, but you could go even further and keep a "Z-buffer" that only applies to whole windows (each window gets one depth byte) instead of doing it per-pixel. Considering there are seldom more than 256 UI elements on the desktop, one should get by with the (most likely already there) simple per-window depth ordering.
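
    That's basically the painter's algorithm with one depth value per window; a minimal sketch (hypothetical window list, 0 = frontmost):

        #include <stdio.h>
        #include <stdlib.h>

        typedef struct {
            const char *name;
            unsigned char depth;   /* one byte per window, 0 = frontmost */
        } Window;

        /* Sort so the deepest (backmost) window comes first. */
        static int back_to_front(const void *a, const void *b) {
            return (int)((const Window *)b)->depth - (int)((const Window *)a)->depth;
        }

        int main(void) {
            Window wins[] = { {"Terminal", 0}, {"Finder", 2}, {"iTunes", 1} };
            int n = sizeof wins / sizeof wins[0];

            qsort(wins, n, sizeof wins[0], back_to_front);

            /* Composite back-to-front; later draws cover earlier ones,
               so no per-pixel Z test is needed. */
            for (int i = 0; i < n; i++)
                printf("draw %s (depth %d)\n", wins[i].name, wins[i].depth);
            return 0;
        }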



    A per-pixel Z-buffer would only make sense if you'd like Doom III monsters playing with your windows (thus making a "real" 3D scene).
  • Reply 29 of 47
    RazzFazz Posts: 728 member
    [quote]Originally posted by Programmer:

    For 2D the Z-buffer isn't strictly necessary, so it's probable that QE doesn't create a fullscreen Z-buffer.[/quote]



    What about overlapping windows, possibly even translucent ones?



    EDIT: OK, you'd only really need a single Z value for a whole window, but then again, having one per (window) pixel would allow for some nice effects (like a 3D successor to the genie effect).



    Bye,

    RazzFazz



    [ 08-01-2002: Message edited by: RazzFazz ]
  • Reply 30 of 47
    RazzFazz Posts: 728 member
    [quote]Originally posted by Programmer:

    - The depth buffer (also known as the Z-buffer) stores the distance from the camera of each pixel that is drawn. This is primarily used in 3D. Imagine drawing a polygon that is 100 meters (or whatever scale the software is using) away from the camera to a set of pixels. Then you draw two other polygons to the same set of pixels, one 50m away and one 150m away. The GPU looks at the Z-buffer and realizes that the 150m pixels are "behind" the others and thus won't be visible, whereas the 50m ones are in front and thus need to be drawn. The Z-values typically take 32 bits per pixel, so the depth buffer is the same size as the front and back buffers.



    1024x768 x 3 buffers x 4 bytes/pixel = 9.44 MBytes

    1280x1024 x 3 x 4 = 15.73 MBytes

    1600x1200 x 3 x 4 = 23.04 MBytes[/quote]



    Shouldn't Z values be assigned to the pixels of visible polygons (i.e. windows in Quartz) or something like that, rather than pixels in the frame buffer?



    After all, the final frame buffer is already the result of the composition of all the 3D objects into a 2D representation of what's going to be on the screen surface. So, unless I'm missing something, there shouldn't be a need to have a Z buffer for every one of the 1024x768 (e.g.) physical pixels on the screen, and thus it should be (resolution) x 2 x 4 + (whatever textures are stored in VRAM rather than main RAM) instead. No?



    (Damn, wish I could explain my point a little better.)



    Bye,

    RazzFazz



    [ 08-01-2002: Message edited by: RazzFazz ]
  • Reply 31 of 47
    rickag Posts: 1,626 member
    QE: recommended 4X AGP, minimum 2X AGP



    iMac: 2X AGP



    Any effect on performance?
  • Reply 32 of 47
    xype Posts: 672 member
    [quote]Originally posted by RazzFazz:

    (Damn, wish I could explain my point a little better.)[/quote]



    Let me see if I understood:



    I think the frame buffer can be viewed as a "compositing canvas" where the Z-buffer values are simply used to determine what is drawn and what isn't. It's a lot more precise than per-polygon sorting (and, unlike in the 1990s, consumer GPUs are now fast enough for it), since it allows "complicated" intersections. And though the 24-bit color values are what you see on the screen, the alpha and Z depth values the frame buffer has stored are used while the image is drawn.



    Earlier, only SGI-class workstations had enough memory for a hardware-accelerated Z-buffer (and thus its precision). A few additional bytes per pixel are not a lot these days, and the gained precision/speed is worth it. If you have Z-buffered pixels you don't need to depth-sort the polygons; you just draw every pixel that has a lower Z value than whatever is already stored at the same position.
  • Reply 33 of 47
    shannyla
    [quote]Will Quartz Extreme be able to offload all (most) FCP compositing and effects to your 3D card?

    It would be a revolution that nobody has noticed.[/quote]



    Not exactly a revolution this one, more copying...



    SGI have been doing this for donkey's years, and 5D Cyborg on PC does exactly the same.



    All original thoughts already have been...



    [Edit] just noticed gnurf wrote exactly this, except for the bit about Cyborg...



    [ 08-01-2002: Message edited by: shannyla ]
  • Reply 34 of 47
    Programmer Posts: 3,458 member
    The Z information in the source polygons is carried by the vertices (i.e. the corners of the window); the Z value of each pixel being drawn is computed by interpolating between the corners, and this is compared to the nearest Z drawn so far into the frame buffer at that location.



    But like I said above, the Z-buffer isn't needed for 2D, and it isn't even very useful because of all the alpha in Aqua. Polygons which contain alpha must be drawn from farthest to nearest so that all the alpha blending happens correctly, and these polygons are often set up not to use the Z-buffer even if it exists. In the case of QE, I expect that it will not create the Z-buffer except behind windows that are actually using an OpenGL 3D context.
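
    The back-to-front requirement comes straight from the "over" blend: result = src * alpha + dst * (1 - alpha). A one-pixel toy in C showing why the order matters:

        #include <stdio.h>

        /* The "over" operator for one color channel; order-dependent,
           which is why translucent layers must be drawn farthest-first. */
        static float over(float src, float alpha, float dst) {
            return src * alpha + dst * (1.0f - alpha);
        }

        int main(void) {
            float pixel = 0.0f;               /* black desktop */
            pixel = over(1.0f, 1.0f, pixel);  /* opaque white window (far) */
            pixel = over(0.0f, 0.5f, pixel);  /* 50% black shadow (near) */
            printf("back-to-front: %.2f\n", pixel);  /* 0.50 -- correct */

            pixel = 0.0f;
            pixel = over(0.0f, 0.5f, pixel);  /* shadow first... */
            pixel = over(1.0f, 1.0f, pixel);  /* ...then the opaque window on top */
            printf("wrong order:   %.2f\n", pixel);  /* 1.00 -- shadow lost */
            return 0;
        }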



    AGP 2x vs 4x (vs 8x in the future) is a performance issue, just like 16 vs 32 MBytes of VRAM will be. It will be interesting to see how well QE deals with the more extreme cases where the user reaches the limits of what the available hardware is capable of. The frame rate may just drop gracefully, but eventually it'll be low enough to interfere with the user. I think a minimum of about 20fps is needed for a reasonable experience. I wonder if Apple will make a frame rate counter available on the desktop.





    For the video editing stuff, the place where we may see big improvements is the ability to apply filters and processes to the video in real time as it plays, without actually modifying the source data. This will let you preview the effects before you tell it to go ahead and make the change to the data. Applying the change to the data may speed up a little too, but the main problem is the bandwidth required to send the video to the GPU and then bring it back again. I haven't worked through the numbers, so I don't know what the situation is like.



    [ 08-01-2002: Message edited by: Programmer ]
  • Reply 35 of 47
    airsluf Posts: 1,861 member
  • Reply 36 of 47
    gnurf Posts: 20 member
    [quote]Originally posted by shannyla:

    [Edit] just noticed gnurf wrote exactly this, except for the bit about Cyborg...[/quote]



    Well, yes. I forgot about 5D for a second there. Haven't seen/used one (yet!) so I don't know how it's built. The next-generation Strata/Mezzo from Discreet is probably built this way too. Since they're still early in the development of that, I suppose it's a bit of a secret how they got realtime HD on dual Xeons... on Windows... :eek:



    Which makes my mind take off in a fairly obvious direction.



    Now, most of the Discreet people from the Combustion team are with the Nothing Real guys now, so you can expect Apple to know what's going on. [Chilling]



    Now, realtime up the wazoo _is_ coming; the question is when. When Jaguar, DDR PowerMacs and the NV30 are out, that's when. The software to use it won't be out until at least January, though.



    Looking at that Xserve architecture overview (from the announcement stream), you see it's got 4 GB/sec of memory throughput on the 266MHz DDR L3, but only 1 GB/sec from the bridge. That means 266 MHz DDR from the bridge/system controller will yield 4 GB/sec shared between the dual procs.* Good enough for at least 4 streams of uncompressed SD video (20 MB/sec per stream).
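
    The 20 MB/sec figure checks out if you assume 8-bit 4:2:2 SD (my assumption -- roughly 2 bytes per pixel):

        #include <stdio.h>

        int main(void) {
            /* Uncompressed NTSC SD, 8-bit 4:2:2: ~2 bytes per pixel average. */
            double bytes_per_frame = 720.0 * 486.0 * 2.0;
            double mb_per_sec = bytes_per_frame * 29.97 / 1e6;

            printf("one SD stream: %.1f MB/sec\n", mb_per_sec);       /* ~21 */
            printf("eight layers:  %.1f MB/sec\n", 8.0 * mb_per_sec); /* ~168 */
            return 0;
        }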



    Couple that with a good AGP 8X graphics card like the NV30, a top-notch OpenGL implementation and a scaled-down Xserve RAID with a Fibre Channel card. You'd get realtime Shake (i.e. Tremor...**) and FCP. You might get people in the broadcast and film industry buying systems by the truckload... I'll buy one.



    Considering it's Apple, and how aggressively they price themselves in _their_ high end (Xserve), I would think a turnkey system with uncompressed realtime and an hour of storage (IDE, remember) would go for around $10-15K. Which is not very much, all things considered.



    Actually, you can get these bundles at Promax for around $12K today! Now swap the Kona SD for a non-realtime SD card ($1.5-2K), kill the Ultra160 SCSI, add your needed amount of striped IDE drives. Voila!



    In the next couple of years, HD VTRs will probably have FireWire or Gigawire and you can drop the expensive card. Actually, Panasonic DVCPRO50 and DVCPRO HD*** decks will have FireWire. This was announced at NAB. Is this looking good or what?



    Now, that means you might actually get turnkey realtime uncompressed SD finishing systems from Apple at about $8-10K**** within a year. How's that for cool?





    *: Well, I'm not really a hardware guy, but this should be relatively close.



    **: Yes, I know that Tremor isn't really realtime. It has distributed rendering on render slaves connected through Fibre Channel. Since Shake/Tremor is scanline-based, you could farm off the rendering and pull the rendered lines back to your main viewer if you have the throughput. This will be fast enough for close to interactive performance. It already was/is close enough compared to a Flame or Cyborg...



    ***: I think those HD decks only do 720P or 480P, though. DVCPRO50 is roughly the equivalent of Digibeta.



    ****: Well, compare that to a Cyborg, Flame, Flint or an Avid DS. Starting at $60K for Flint, last I checked.



    Wow. Long post. Should have been a separate thread... [Laughing]
  • Reply 37 of 47
    MacRonin Posts: 1,174 member
    Wow. Great post.



    Kind of sums up my thoughts on the whole 'Mac as a viable high-end 3D/composite/edit workstation' controversy.



    If anything is announced this month, it will be a stepping stone. The new year should bring some REALLY great hardware, no matter what the naysayers, well, say!
  • Reply 38 of 47
    gnurf Posts: 20 member
    [quote]Originally posted by MacRonin:

    Wow. Great post.



    Kind of sums up my thoughts on the whole 'Mac as a viable high-end 3D/composite/edit workstation' controversy.



    If anything is announced this month, it will be a stepping stone. The new year should bring some REALLY great hardware, no matter what the naysayers, well, say![/quote]



    Thank you for the kind words.



    One thing, though: Apple will do really great things in both hardware AND software. The key is software and its integration with (arguably) mediocre hardware, IMNHO.



    Well, mediocre would be a bit tough on them, but I will be very surprised if they bring out hardware that is leaps and bounds better than what is currently available in commodity hardware at relatively lower prices with more options. Yes, I know I'm flogging.



    Which is part of the point: they don't actually need super-duper hardware, because they have super-duper software. DDR is inevitable...



    I'll take a surprise with a composed demeanor and run to the store, however.



    [okay, have to stop posting and go out and get some beer before they close the last place in town...]
  • Reply 39 of 47
    johnsonwax
    [quote]Originally posted by Programmer:

    1024x768 x 3 buffers x 4 bytes/pixel = 9.44 MBytes

    1280x1024 x 3 x 4 = 15.73 MBytes

    1600x1200 x 3 x 4 = 23.04 MBytes



    As you can see, the frame buffer requirements alone get pretty silly pretty fast.[/quote]



    Programmer,



    Because most everything on OS X is double-buffered, Apple can compress the backing store for the window graphics - something that the NeXT boxes did. Now, with all of this being handled by the GPU, does that compression play any kind of role? That is, can it be used to minimize the load on the bus and the high memory requirements?



    Also, shouldn't QuickTime be able to shove all of its layers out in the same way and let QE take care of the compositing? It seems that if you're using such an app - Shake, for instance - you could probably pull off real-time compositing of rather complex scenes, especially with enough RAM. That'd be enough to make a Windows version fairly superfluous.



    Of course, that also means that we don't need any kick-ass CPUs to come in and make it all work nice.
  • Reply 40 of 47
    Programmer Posts: 3,458 member
    [quote]Originally posted by johnsonwax:

    Because most everything on OS X is double-buffered, Apple can compress the backing store for the window graphics - something that the NeXT boxes did. Now, with all of this being handled by the GPU, does that compression play any kind of role? That is, can it be used to minimize the load on the bus and the high memory requirements?



    Also, shouldn't QuickTime be able to shove all of its layers out in the same way and let QE take care of the compositing? It seems that if you're using such an app - Shake, for instance - you could probably pull off real-time compositing of rather complex scenes, especially with enough RAM. That'd be enough to make a Windows version fairly superfluous.



    Of course, that also means that we don't need any kick-ass CPUs to come in and make it all work nice.[/quote]



    Unfortunately, the kinds of texture compression supported by GPUs are fairly limited because of the need to do the decompression in hardware and in real time. The compression is usually also fairly lossy -- i.e. the images are degraded to some extent. Third, the compression/decompression cost is asymmetric: it is much faster to decompress than to compress. As a result, compressing large textures isn't a good idea unless you know they aren't going to change for a while, you've got enough processor time to do the work, and the image fidelity is going to be sufficient. I don't believe that any of the current (nor NV30) GPUs support JPEG compression for textures.
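
    For scale, here's roughly what the common S3TC/DXT formats buy you (my numbers: DXT1 stores 4 bits per pixel, DXT5 stores 8):

        #include <stdio.h>

        int main(void) {
            int w = 512, h = 512;                    /* a large window-sized texture */
            double raw_kb  = w * h * 4.0 / 1024.0;   /* 32-bit RGBA, uncompressed */
            double dxt1_kb = w * h * 0.5 / 1024.0;   /* 4 bits/pixel, 1-bit alpha */
            double dxt5_kb = w * h * 1.0 / 1024.0;   /* 8 bits/pixel, full alpha */

            printf("512x512 RGBA: %.0f KB   DXT1: %.0f KB (8:1)   DXT5: %.0f KB (4:1)\n",
                   raw_kb, dxt1_kb, dxt5_kb);
            return 0;
        }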



    Furthermore, all of this compression is on the texture side of things... the numbers I gave above are for the frame buffers, which cannot be compressed at all. Texture memory requirements (which include the window backing stores) are over and above the frame buffer requirements.



    Despite the relatively limited value of GPU texture compression, there will be plenty of other things possible that will make us drool and improve our productivity. Besides, with the new AGP 8x GPUs with 128+ MBytes of VRAM, who needs compression?!



    :eek: