QE with NV30? What's possible?


Comments

  • Reply 41 of 47
    Whisper Posts: 735, member
    Does anybody know how much of Quartz Apple could offload to the NV30-based cards? They're supposed to be so programmable and whatnot... Just wondering if anyone has a good guestimate.
  • Reply 42 of 47
    gnurf Posts: 20, member
    [quote]Originally posted by Whisper:

    Does anybody know how much of Quartz Apple could offload to the NV30-based cards? They're supposed to be so programmable and whatnot... Just wondering if anyone has a good guestimate.[/quote]



    Well, having read up on the Cg language and the available info on the NV30 lately, I'd say _everything_. At least hypothetically, if you were a masochist.



    Anything computational could be done by the GPU, but not necessarily in a way that would bring "acceleration" to mind...



    How practical it would be, and whether it would make sense at all, is a different matter. There would probably be a messaging and bus problem eventually.



    The thing is that there are loads of tasks that are faster to send off to the processor than to push through the GPU, since the GPU is fairly simple compared to the CPU. Some structures are far better off on a complex chip. Not to mention that you'd have to emulate a CPU pipeline on the GPU...



    Anyway, we need to use our procs once in a while.



    [missed the beer, aaargh!]



    [edit: doh! - did I answer your question? - nope.]



    [ 08-01-2002: Message edited by: gnurf ]
  • Reply 43 of 47
    Programmer Posts: 3,457, member
    It's important to understand that the GPU executes code differently than the CPU does. The CPU is completely free-form in its execution: it handles threads and interrupts, and basically executes whatever the programmer wants it to, when the programmer wants it to. It can also touch any data the programmer wants it to, including going to disk or across the network.



    The GPU, on the other hand, has two kinds of code -- vertex programs and fragment programs (also called vertex shaders and pixel shaders). These programs, along with vertices, textures and other parameters, are fed to the GPU, and it decides how best to process it all. Each vertex program is fed just the data for one vertex (plus some parameters), and the outputs of that and nearby vertices are used to compute the inputs to the fragment program. The fragment program is then executed on each pixel that is going to be output, and its result is a colour and Z value that will be drawn to the frame and depth buffers.

    This extremely tightly controlled model is how the GPU achieves its awesome performance levels -- the whole thing is set up to pipeline these computations very deeply and execute them in parallel. Since the order of execution is controlled by the GPU, it can arrange well in advance for the data it needs to be fetched from and stored to memory, and it can do things in a way that optimizes memory access. Those same advantages also mean, however, that what can be done is considerably more limited than on a CPU.
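
    Roughly, that data flow looks like the following toy C model (all the names here are made up for illustration -- real vertex and fragment programs are written in Cg or shader assembly, and the scheduling is done by the hardware, not by a loop):

    [code]
    /* Toy C model of the two kinds of GPU code described above. */
    #include <stdio.h>

    typedef struct { float x, y, z, w, r, g, b, a; } Vertex;
    typedef struct { float r, g, b, a, depth; } Fragment;

    /* "Vertex program": sees one vertex plus constant parameters, nothing else. */
    static Vertex vertex_program(Vertex v, float scale) {
        v.x *= scale; v.y *= scale;        /* stand-in for a 4x4 transform */
        return v;
    }

    /* "Fragment program": sees only interpolated inputs, returns colour + Z. */
    static Fragment fragment_program(Fragment f) {
        f.r *= 0.5f;                       /* stand-in for texturing/lighting */
        return f;
    }

    int main(void) {
        Vertex v = { 1, 2, 3, 1,  1, 0, 0, 1 };
        v = vertex_program(v, 2.0f);

        /* The hardware interpolates vertex outputs and runs the fragment
           program once per covered pixel; here we fake a single pixel. */
        Fragment f = { v.r, v.g, v.b, v.a, v.z };
        f = fragment_program(f);
        printf("pixel colour %.2f %.2f %.2f, depth %.2f\n", f.r, f.g, f.b, f.depth);
        return 0;
    }
    [/code]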



    Despite those limitations the GPU is still an enormously flexible beast; it just requires a new approach by the developer and a deeper understanding of how the hardware works. Some (many) problems can be handled by the GPU, even beyond the realm of graphics, but there will still be many things left to the CPU. Fortunately, many of the things that current CPUs are struggling with (some more than others!) are typically the kinds of things that the GPU can do well.



    To answer the question a bit more directly: while it might be possible to contort the current GPUs into doing Quartz rendering, they aren't particularly well suited to it. The main reason is that Quartz uses bezier curves heavily, and a vertex-based approach is not ideal for rasterizing bezier curves (not to say that it's impossible -- I have some ideas on how to do it, but no feel for how inefficient it would be).
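
    Just to make the "contortion" concrete: one naive way to push a Quartz-style path through a vertex pipeline is to flatten each cubic bezier into line segments on the CPU and hand those to the GPU as vertices. A rough C sketch of that idea (my own illustration, certainly not how Quartz actually does it):

    [code]
    /* Flatten a cubic bezier into vertices by sampling it at fixed steps.
     * Purely illustrative; adaptive subdivision (de Casteljau) would be
     * smarter, and none of this is Apple's actual approach. */
    #include <stdio.h>

    typedef struct { double x, y; } Pt;

    /* Evaluate a cubic bezier with control points p0..p3 at parameter t. */
    static Pt cubic_bezier(Pt p0, Pt p1, Pt p2, Pt p3, double t) {
        double u = 1.0 - t;
        Pt r;
        r.x = u*u*u*p0.x + 3*u*u*t*p1.x + 3*u*t*t*p2.x + t*t*t*p3.x;
        r.y = u*u*u*p0.y + 3*u*u*t*p1.y + 3*u*t*t*p2.y + t*t*t*p3.y;
        return r;
    }

    int main(void) {
        Pt p0 = {0, 0}, p1 = {0, 100}, p2 = {100, 100}, p3 = {100, 0};
        const int segments = 16;           /* fixed subdivision: crude but simple */

        for (int i = 0; i <= segments; i++) {
            Pt p = cubic_bezier(p0, p1, p2, p3, (double)i / segments);
            printf("vertex %2d: (%6.2f, %6.2f)\n", i, p.x, p.y);
        }
        return 0;
    }
    [/code]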



    [ 08-02-2002: Message edited by: Programmer ]
  • Reply 44 of 47
    johnsonwax
    [quote]Originally posted by Programmer:




    Unfortunately the kinds of texture compression supported by GPUs are fairly limited because of the need to do the decompression in hardware and in real time. The compression is usually also fairly lossy -- i.e. the images are degraded to some extent. Third, the compression/decompression cost is asymmetric: it is much faster to decompress than to compress. As a result, compressing large textures isn't a good idea unless you know they aren't going to change for a while, you've got enough processor time to do the work, and the image fidelity is going to be sufficient. I don't believe that any of the current (or NV30) GPUs support JPEG compression for textures.[/quote]



    Well, I think NeXT was just doing RLE or something pretty trivial (Huffman, maybe?). Nothing lossy. If you look at your screen right now, you'll see a lot of RLE compression potential. Surely anything in the Radeon or GeForce family can do RLE in real time...
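
    Something as simple as this per-scanline run-length coding would already collapse all those flat window backgrounds (just a sketch off the top of my head, not NeXT's actual format):

    [code]
    /* Trivial per-scanline RLE: each run of identical 32-bit pixels becomes a
     * (count, pixel) pair. */
    #include <stdint.h>
    #include <stdio.h>

    /* Encode n pixels into out[]; caller provides 2*n words for the worst case.
     * Returns the number of words written. */
    static size_t rle_encode(const uint32_t *px, size_t n, uint32_t *out) {
        size_t w = 0;
        for (size_t i = 0; i < n; ) {
            uint32_t value = px[i];
            uint32_t run = 1;
            while (i + run < n && px[i + run] == value)
                run++;
            out[w++] = run;
            out[w++] = value;
            i += run;
        }
        return w;
    }

    int main(void) {
        uint32_t scanline[8] = { 0xAAAAAAAA, 0xAAAAAAAA, 0xAAAAAAAA,
                                 0xFFFFFFFF, 0xFFFFFFFF,
                                 0xAAAAAAAA, 0xAAAAAAAA, 0xAAAAAAAA };
        uint32_t out[16];
        printf("8 pixels -> %zu words\n", rle_encode(scanline, 8, out));
        return 0;
    }
    [/code]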



    On the encoding side we've got AltiVec, so the asymmetric nature shouldn't be a problem (and it's not really a problem with RLE anyway). Won't help G3 owners, but that's okay.



    [quote]Furthermore, all of this compression is on the texture side of things... the numbers I gave above are for the frame buffers, which cannot be compressed at all. Texture memory requirements (which include the window backing stores) are over and above the frame buffer requirements.[/quote]



    Well, the window backing stores could be compressed, but no, not the frame buffers.



    [quote]Despite the relatively limited value of the GPU texture compression, there will be plenty of other things possible that will make us drool and improve our productivity. Besides, with the new AGP 8x GPUs with 128+ MBytes of VRAM, who needs compression?![/quote]



    Yeah, we're getting along just fine with 640K of RAM, after all. [Skeptical]



    The compression isn't to reduce the RAM needs, just the bus traffic. On the NeXT it was done to prevent paging to disk - those '040s were fast, and double-buffered color megapixel displays ate up a lot of the 32 MB of RAM that they shipped with.
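
    A quick back-of-the-envelope on the bus-traffic point (the numbers here are just illustrative, not anything Apple has published):

    [code]
    /* Rough arithmetic: pushing one full-screen 32-bit backing store across AGP. */
    #include <stdio.h>

    int main(void) {
        const double width = 1600, height = 1200, bytes_per_pixel = 4;
        const double backing_store_mb = width * height * bytes_per_pixel
                                        / (1024.0 * 1024.0);   /* ~7.3 MB */
        const double agp4x_mb_per_s = 1066.0;  /* theoretical AGP 4x peak */

        printf("backing store: %.1f MB\n", backing_store_mb);
        printf("transfer time at AGP 4x peak: %.1f ms\n",
               backing_store_mb / agp4x_mb_per_s * 1000.0);
        /* Even a modest 2:1 RLE win halves that traffic -- which is the point:
           the compression is about the bus, not about fitting in VRAM. */
        return 0;
    }
    [/code]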
  • Reply 45 of 47
    RazzFazz Posts: 728, member
    [quote]Originally posted by AirSluf:


    The z-buffer essentially remembers the closest z-value for a particular pixel in relation to the camera or eye point. The frame buffer remembers the color of that pixel. As the pipeline renders polys into pixels, it checks each individual pixel's z-value to decide whether or not that pixel's color will be sent to the frame buffer. If the z-value is closer than what is already in the z-buffer, you update both buffers simultaneously. If it's farther away, you just discard it and don't worry about it again until the next frame.

    [/quote]



    Ah, OK, didn't think of that.

    Thanks for the explanation.
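
    So in C terms it's basically something like this, if I've got it right (my own sketch of the idea, obviously not what the hardware actually does):

    [code]
    /* Sketch of the per-pixel depth test described above; the real thing is
     * fixed-function hardware, and the resolution here is arbitrary. */
    #include <float.h>
    #include <stdint.h>

    #define WIDTH  640
    #define HEIGHT 480

    static float    zbuffer[WIDTH * HEIGHT];      /* closest z seen so far  */
    static uint32_t framebuffer[WIDTH * HEIGHT];  /* colour of that pixel   */

    static void clear(void) {                     /* start of each frame */
        for (int i = 0; i < WIDTH * HEIGHT; i++) {
            zbuffer[i] = FLT_MAX;
            framebuffer[i] = 0;
        }
    }

    static void plot(int x, int y, float z, uint32_t colour) {
        int i = y * WIDTH + x;
        if (z < zbuffer[i]) {        /* closer than what's already there?  */
            zbuffer[i] = z;          /* update both buffers together       */
            framebuffer[i] = colour;
        }                            /* else: discard and move on          */
    }

    int main(void) {
        clear();
        plot(10, 10, 0.5f, 0xFF0000u);   /* red fragment                       */
        plot(10, 10, 0.9f, 0x0000FFu);   /* blue fragment behind it: discarded */
        return 0;
    }
    [/code]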



    Bye,

    RazzFazz
  • Reply 46 of 47
    Programmer Posts: 3,457, member
    [quote]Originally posted by johnsonwax:

    Well, I think NeXT was just doing RLE or something pretty trivial (Huffman, maybe?). Nothing lossy. If you look at your screen right now, you'll see a lot of RLE compression potential. Surely anything in the Radeon or GeForce family can do RLE in real time...



    The compression isn't to reduce the RAM needs, just the bus traffic. On the NeXT it was done to prevent paging to disk - those '040s were fast, and double-buffered color megapixel displays ate up a lot of the 32 MB of RAM that they shipped with.[/quote]



    Why do you assume that RLE is possible? The compression hardware is not programmable (yet), and it is designed for rapid texture lookups, for which RLE is typically poorly suited. The current compression schemes in hardware are determined by Microsoft's decisions in Direct3D -- DXT1..3.



    The NeXT guys could use whatever they wanted because their GPU was actually an i860 CPU, not fixed function hardware.



    Saving bandwidth is an admirable goal, but depending on the kind of data you're reading, RLE isn't going to help you. Most current GPUs don't read the entire texture, decompress it, and then use it. Instead they read the parts of the texture they need to draw with, as they need to draw it. The decompressed data isn't stored in VRAM, just in the GPU's texture cache(s). As a result, the texture compression methods are designed to allow a few cachelines to be read in isolation, whereas with RLE you generally need a large, variable-sized chunk of the image... and many textures won't compress at all. In the special case of 2D windows RLE may work quite well, but how many GPU-based 2D GUIs were out there when these things were designed...?
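
    To make the random-access point concrete, here's roughly what decoding a single 4x4 DXT1 block looks like (written from memory, so treat the details as approximate): nothing outside the block's own 8 bytes is ever touched, which is exactly what RLE can't give you.

    [code]
    /* Approximate sketch of DXT1 block decoding.  Each 4x4 block is a
     * self-contained 8-byte record, so a block can be fetched and decoded
     * in isolation; with RLE you'd have to walk a variable-length stream
     * from the start of the image. */
    #include <stddef.h>
    #include <stdint.h>

    typedef struct { uint8_t r, g, b; } RGB;

    static RGB rgb565(uint16_t c) {
        RGB o = { (uint8_t)((c >> 11) << 3),
                  (uint8_t)(((c >> 5) & 0x3F) << 2),
                  (uint8_t)((c & 0x1F) << 3) };
        return o;
    }

    /* Decode the block at block coordinates (bx, by) of a DXT1 texture that is
     * `width` texels wide into 16 RGB texels.  (Simplified: the c0 <= c1
     * punch-through mode is ignored.) */
    static void decode_dxt1_block(const uint8_t *tex, int width,
                                  int bx, int by, RGB out[16]) {
        const uint8_t *blk = tex + ((size_t)by * (width / 4) + bx) * 8;
        RGB pal[4];
        pal[0] = rgb565((uint16_t)(blk[0] | (blk[1] << 8)));
        pal[1] = rgb565((uint16_t)(blk[2] | (blk[3] << 8)));
        pal[2].r = (uint8_t)((2 * pal[0].r + pal[1].r) / 3);   /* 2/3 c0 + 1/3 c1 */
        pal[2].g = (uint8_t)((2 * pal[0].g + pal[1].g) / 3);
        pal[2].b = (uint8_t)((2 * pal[0].b + pal[1].b) / 3);
        pal[3].r = (uint8_t)((pal[0].r + 2 * pal[1].r) / 3);   /* 1/3 c0 + 2/3 c1 */
        pal[3].g = (uint8_t)((pal[0].g + 2 * pal[1].g) / 3);
        pal[3].b = (uint8_t)((pal[0].b + 2 * pal[1].b) / 3);

        for (int i = 0; i < 16; i++) {
            int bits = (blk[4 + i / 4] >> ((i % 4) * 2)) & 0x3;  /* 2 bits/texel */
            out[i] = pal[bits];
        }
    }
    [/code]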
  • Reply 47 of 47
    [quote]Originally posted by Programmer:




    Why do you assume that RLE is possible? The compression hardware is not programmable (yet), and it is designed for rapid texture lookups, for which RLE is typically poorly suited.[/quote]



    Thanks for the info.



    It's always tempting to take the work that NeXT (or any pioneering group) did and plop it on today's calendar to see what's still relevant and useful.