Aqua help in NVidia GeForce 4


Comments

  • Reply 21 of 50
    xype Posts: 672member
    [quote]Originally posted by moki:

    <strong>

    If it were really as simple as "using OpenGL to do it", trust me, Apple would have done it already.</strong><hr></blockquote>

    One could do all the window widgets (such as the open/close and scroll buttons and shadows) as OpenGL textures, where you would only need the video RAM for _one_ widget and OpenGL would simply draw it over the screen in immediate mode, sorting it with the Z buffer and automatically doing all the transparency calculations on the graphics card, without the need to burden the CPU too much. All the genie and other UI effects could be accelerated with OpenGL - given the architecture is open and flexible enough. Which you said it is.
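The texture-reuse idea above can be sized with some back-of-envelope arithmetic. This is a sketch with assumed widget dimensions and instance counts, not actual Quartz numbers:

```python
# Rough VRAM math for the idea of keeping ONE texture per widget type in
# VRAM and letting the GPU stamp it wherever it is needed, instead of
# storing a copy per on-screen instance. All sizes are illustrative.

def widget_texture_bytes(w, h, bytes_per_pixel=4):
    """VRAM for one RGBA widget texture (e.g. a close button)."""
    return w * h * bytes_per_pixel

# One 32x32 RGBA close-button texture, reused for every window...
one_copy = widget_texture_bytes(32, 32)   # 4096 bytes

# ...versus storing a separate copy for, say, 50 open windows.
per_instance = one_copy * 50              # 204800 bytes

print(one_copy, per_instance)
```

Reuse keeps the cost constant no matter how many windows are open, which is the point of the post.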
  • Reply 22 of 50
    moki Posts: 551member
    [quote]Originally posted by xype:

    <strong>

    One could do all the window widgets (such as the open/close and scroll buttons and shadows) as OpenGL textures, where you would only need the video RAM for _one_ widget and OpenGL would simply draw it over the screen in immediate mode, sorting it with the Z buffer and automatically doing all the transparency calculations on the graphics card, without the need to burden the CPU too much. All the genie and other UI effects could be accelerated with OpenGL - given the architecture is open and flexible enough. Which you said it is.</strong><hr></blockquote>



    ...and what do you propose to do with the actual contents of the windows, in that case? Again, see what I wrote above re: memory requirements and the way Quartz works...
  • Reply 23 of 50
    hey, just to chime in on the memory thang...

    the shadows do not have to be 32-bit bitmaps in the graphics VRAM, 8-bit alphas would do... aside from that, you really need very little VRAM to totally hardware accelerate all shadow drawing - think about it, you only need a small image to represent top, left, right, bottom, and all four corners - 8 images total. and if you applied some transformations to some of the images, you could cut that down by a large percentage. think about it, all the window shadows are the same... to take that to another level, the menus need a very small pattern that could be repeated, and same with the dock and window title bars [for when they are transparent]. to HW accelerate all the transparency, it wouldn't be THAT hard on VRAM or the GPU...

    The more difficult Quartz bottleneck is font drawing - but that one could be accelerated too, with a quick software pre-render to a texture object/bitmap with alpha, and then use that bitmap in compositing to preserve the antialiased text. There ain't too much you see in Quartz every day that CAN'T be done in HW on today's (and maybe even yesterday's) hardware... I mentioned this once before on AI but got yelled at by the uninformed natives...

    anyway, if you want to see what HW Quartz would feel like, open Interface Builder, and drag an OpenGL view to a window. size the view to the entire size of the window, and test the interface. Window resize on meth. anyway, it's late and I drank a criminally large amount of alcohol this weekend........
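The eight-tile shadow scheme above is cheap to quantify. The tile size here is an assumption for illustration; the post doesn't specify one:

```python
def shadow_tile_vram(tile_px=16, tiles=8, bytes_per_pixel=1):
    """VRAM for a window-shadow tile set: 4 edges + 4 corners,
    stored as 8-bit alpha-only bitmaps as the post suggests."""
    return tile_px * tile_px * tiles * bytes_per_pixel

print(shadow_tile_vram())  # 2048 bytes: ~2 KB covers every window shadow
```

Because every window shadow looks the same, this fixed ~2 KB cost is shared by all windows on screen.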
  • Reply 24 of 50
    to elaborate in advance, well - I'm sure your next [previous] question would be: fine, small bitmaps for shadows, menus, ... would fit in VRAM, but not the window buffers... so big deal, you still have to pump window buffers through the memory and AGP bus before you could composite them anyway. how do you fix THAT?



    the answer to that requires some math. first, how much VRAM is needed for a display? well, at 1600x1200x24 bit you need ~5.5 megs. now the way OS X does drawing today is like this: each window has an offscreen buffer in RAM. all drawing is done from CPU to this buffer, and then the buffer is copied from RAM to onscreen VRAM (over the AGP bus). well, this makes for a flicker-free and slow GUI (with lousy latency).

    How could this be better? well, on Windows when you draw a window, all drawing is done directly to onscreen VRAM by the gfx card = very fast. so, how bout this: instead of buffering each window, just buffer the entire screen once. the union of onscreen window buffers is a LOT smaller than the sum of those buffers. how much VRAM are we talking about? well, the 1600x1200x24 bit screen needed ~5.5 megs, so would a buffer of the screen -> about 11 megs for onscreen gfx and the offscreen buffer. all drawing would occur on the gfx card in native ops to the buffer, or in VRAM to VRAM composites by the gfx card (very fast). EVERYTHING would be quick, and VERY LITTLE would need to be done by the CPU. 11 megs would fit in ANY G4's VRAM with enough room to spare for shadow, menu, titlebar, dock, and font bitmap caches. at lower resolutions or 16 bits, it would be even easier on VRAM.

    so, sounds great right? well, there IS one downside. the current method of drawing has the apps draw once into a buffer, and they are done. they only need to redraw if the window changes. otherwise the processes can be put to sleep til they need to do event handling/something else... in my scheme, each process that has a visible area in an update region would need to be swapped in, and would have to draw its stuff in the shared screen buffer in reverse Z order of the windows... That ain't SOOO bad at all. well, except for one circumstance, and that is that a process that needs to draw has been paged out, and RAM is scarce. normally it wouldn't need to draw, so it wouldn't need to be swapped in, and its pages wouldn't need to be swapped into RAM. in this circumstance lots of thrashing COULD occur. however, think about this: my scenario doesn't use window buffers in RAM. the amount of memory saved by freeing the currently used RAM would probably save more thrashing than it would ever cause... anyway, just my thoughts...



    -D
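The "~5.5 megs" and "~11 megs" figures above check out; a quick sketch of the arithmetic, assuming 24-bit pixels packed 3 bytes each:

```python
def buffer_mb(width, height, bytes_per_pixel=3):
    """Size in MB of a single full-screen framebuffer."""
    return width * height * bytes_per_pixel / (1024 * 1024)

screen = buffer_mb(1600, 1200)   # ~5.49 MB: the "~5.5 megs" in the post
both = 2 * screen                # ~11 MB: onscreen buffer + offscreen buffer
print(round(screen, 2), round(both, 2))
```

At 16 bits per pixel or lower resolutions the totals shrink further, as the post notes.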
  • Reply 25 of 50
    xype Posts: 672member
    [quote]Originally posted by moki:

    <strong>



    ...and what do you propose to do with the actual contents of the windows, in that case? Again, see what I wrote above re: memory requirements and the way Quartz works...</strong><hr></blockquote>

    Umm, do I understand it correctly that you'd reserve memory for the contents of all windows? If that's what you mean, then I think that most Aqua-ized apps wouldn't need much of the video buffer, since the Aqua UI can be generated on the fly - this is different with games or custom-UI software, but that's not what affects the OS/UI performance.
  • Reply 26 of 50
    moki Posts: 551member
    [quote]Originally posted by xype:

    <strong>

    Umm, do I understand it correctly that you'd reserve memory for the contents of all windows? If that's what you mean, then I think that most Aqua-ized apps wouldn't need much of the video buffer, since the Aqua UI can be generated on the fly - this is different with games or custom-UI software, but that's not what affects the OS/UI performance.</strong><hr></blockquote>



    Again, read what I wrote above -- that is precisely what Quartz does -- every window is fully buffered, and indeed, if you want transparency effects, the video card would need access to everything below whatever it is drawing transparently. That means a whole lotta buffering...
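The point that compositing a translucent window needs access to the pixels beneath it is just the standard Porter-Duff "over" operator. A minimal sketch with premultiplied-alpha pixels (illustrative, not Quartz's actual code path):

```python
def over(src, dst):
    """Porter-Duff 'over': composite one premultiplied RGBA pixel onto
    another. The result depends on dst, i.e. on whatever is *below* the
    transparent window, so that content must be available somewhere."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    inv = 1.0 - sa
    return (sr + dr * inv, sg + dg * inv, sb + db * inv, sa + da * inv)

# A 50%-opaque white pixel composited over solid red:
print(over((0.5, 0.5, 0.5, 0.5), (1.0, 0.0, 0.0, 1.0)))
```

Every translucent pixel drawn requires a read of the underlying pixel, which is why "a whole lotta buffering" is needed somewhere in the pipeline.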
  • Reply 27 of 50
    ok, perhaps I don't properly describe my thoughts. so, quite simply:



    today: quartz uses separate buffers for each window in memory



    my way: one buffer for the entire screen, stored on the gfx card



    clear moki?



    PS no offense meant, I truly don't mean to sound condescending
  • Reply 28 of 50
    moki Posts: 551member
    [quote]Originally posted by grad student:

    <strong>ok, perhaps I don't properly describe my thoughts. so, quite simply:



    today: quartz uses separate buffers for each window in memory



    my way: one buffer for the entire screen, stored on the gfx card



    clear moki?



    PS no offense meant, I truly don't mean to sound condescending</strong><hr></blockquote>



    Sure... but there already is one large buffer in VRAM for the screen under Quartz. The technique you're proposing is really just the old method used for Mac OS 8/9, but using OpenGL to paint the various elements to the screen rather than using the video card's 2D acceleration.



    It simply wouldn't be usable -- since you'd need to keep each graphic object for each window in VRAM for OpenGL to draw, it would get incredibly unwieldy rather quickly...
  • Reply 29 of 50
    ok, I am most definitely misunderstood. in bullet point format:



    1. buffer in VRAM for onscreen graphics (required by any/every operating system). for 1600x1200x24 bit = ~5.5 MB



    2. another buffer offscreen in VRAM to draw into. equal size and dimensions to onscreen buffer #1. for 1600x1200x24 bit, again ~5.5 MB. Total for these two buffers: ~11 MB



    3. all drawing occurs in buffer #2. when drawing is complete, dirty region is blitted from buffer #2 to buffer #1. this is a VRAM to VRAM blit, and VRAM to VRAM blits are fast



    4. all drawing to buffer #2 is fast because it's done by the GPU to VRAM. This is NOT how OS 9, OS X, or Windows does its GUI drawing. OS 9 and Windows use the GPU to draw to onscreen VRAM on the gfx card = fast, with flicker.



    5. the GUI would not be drawn by OpenGL, it would simply be native 2D ops on the gfx card.



    6. in a 16 MB gfx card, there would be 5 MB left for swapping in and out any other necessary bitmaps. This could get constraining for OpenGL apps, but for the GUI in general, wouldn't be bad. [also, see #7]



    7. I have other optimizations that could/would cut the footprint of buffer #2 by a LARGE percentage (i.e. to 25%, or just over 1.35 MB); however, mentioning this optimization would further confuse people. I assure you it would work, and is not complex...
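Points 1-4 above can be sketched as a toy model. This is a pure-Python stand-in for what the post assumes would be GPU-native 2D draws and VRAM-to-VRAM blits; buffers are lists of rows rather than real framebuffers:

```python
W, H = 8, 8
onscreen  = [[0] * W for _ in range(H)]   # buffer #1: what the user sees
offscreen = [[0] * W for _ in range(H)]   # buffer #2: all drawing lands here

def draw_rect(buf, x, y, w, h, color):
    """Draw a filled rectangle and report the dirty region it touched."""
    for row in range(y, y + h):
        for col in range(x, x + w):
            buf[row][col] = color
    return (x, y, w, h)

def blit(dst, src, region):
    """Copy only the dirty region: the cheap VRAM-to-VRAM step."""
    x, y, w, h = region
    for row in range(y, y + h):
        dst[row][x:x + w] = src[row][x:x + w]

dirty = draw_rect(offscreen, 2, 2, 3, 3, color=7)  # all drawing hits buffer #2
blit(onscreen, offscreen, dirty)                   # one blit updates the screen
```

Because the screen only ever receives completed regions, the result is flicker-free without per-window buffers, which is the scheme's central claim.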



    I am interested in any further complaints or confusion by readers



    -David



    EDIT: wording...



    [ 01-28-2002: Message edited by: grad student ]
  • Reply 30 of 50
    moki Posts: 551member
    [quote]Originally posted by grad student:

    <strong>

    4. all drawing to buffer #2 is fast because it's done by the GPU to VRAM. This is NOT how OS 9, OS X, or Windows does its GUI drawing. OS 9 and Windows use the GPU to draw to onscreen VRAM on the gfx card = fast, with flicker.

    </strong><hr></blockquote>



    Your proposal simply boils down to doing the same old thing OS 9 did, but double-buffering it instead -- and again, it does not address the transparency issue at all, unless you're going to redraw *everything* (all GUI elements) when there is transparency involved.



    In any event, we should probably spare people here this back and forth discussion -- if you want to chat about it further, feel free to drop me an e-mail.
  • Reply 31 of 50
    [quote]Originally posted by *l++:

    <strong>The GeForce 4 supports 2D transparency effects (alpha channel) in 2D.</strong><hr></blockquote>



    Are you sure this is in fact a hardware feature?

    From the text you linked to, it could just as well be software-based (Win2k and up allow for transparency effects, check AVS in WinAMP for example), which would of course be of little use to Quartz.



    Bye,

    RazzFazz



    [ 01-28-2002: Message edited by: RazzFazz ]
  • Reply 32 of 50
    [quote]Originally posted by xype:

    <strong>Actually Aqua could be accelerated very easily if done trought OpenGL and all computers with a gxf card as good or better than the ATI rage would do quite well.</strong><hr></blockquote>



    Hmm, dunno, wouldn't using OpenGL mean the window contents would be treated as textures, and thus would have to be a power-of-two size originally, and would be limited in size?
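The power-of-two worry is easy to quantify. A sketch of the padding waste under the old OpenGL power-of-two texture restriction (window sizes below are just examples):

```python
def next_pow2(n):
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def texture_waste(w, h):
    """Fraction of texture memory wasted if a w x h window must be padded
    up to a power-of-two texture, as early OpenGL required."""
    tw, th = next_pow2(w), next_pow2(h)
    return 1.0 - (w * h) / (tw * th)

# A 1000x700 window would need a 1024x1024 texture:
print(texture_waste(1000, 700))
```

For awkward window sizes the padding can waste a third or more of the allocation, on top of the size limits RazzFazz mentions.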



    Bye,

    RazzFazz
  • Reply 33 of 50
    [quote]Originally posted by grad student:

    <strong>How could this be better? well, on windows when you draw a window, all drawing is done directly to onscreen VRAM by the gfx card = very fast. so, how bout this: instead of buffer each window, just buffer the entire screen once. the union of onscreen windows buffers is a LOT smaller than the sum of those buffers.

    </strong><hr></blockquote>



    It's smaller, but what use would it be any more?





    [quote]<strong>

    in my scheme, each process who has a visable area in an update region would need to be swaped in, and would have to draw its stuff in the shared screen buffer in reverse Z order of the windows...</strong><hr></blockquote>



    Yeah, but eliminating this need to do a full recomposition of newly-visible window areas is the whole point of per-window buffering.

    Without a bitmap of each window's original, complete, unobscured state, transparency effects would force the window to continuously re-calculate its content area, which, depending on window complexity, might even be slower than the current approach (and seeing it update all the time would look kinda ugly). Also, how would you do something like the genie effect?



    Bye,

    RazzFazz



    [ 01-28-2002: Message edited by: RazzFazz ]
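The trade-off described above can be put as a toy contrast (hypothetical classes, not real window-server APIs): with a per-window backing store, exposing a hidden region is a cheap copy, while a shared-screen-only scheme must re-run the app's drawing code every time.

```python
class BufferedWindow:
    """Quartz-style: keep a complete, unobscured copy of the content."""
    def __init__(self, pixels):
        self.backing = dict(pixels)
    def expose(self, region):
        # Revealing a hidden area is just a copy out of the backing store.
        return {p: self.backing[p] for p in region}

class UnbufferedWindow:
    """Shared-screen-buffer style: only the app knows how to redraw."""
    def __init__(self, draw_fn):
        self.draw_fn = draw_fn
        self.redraws = 0
    def expose(self, region):
        # The app must be scheduled and re-run its drawing code.
        self.redraws += 1
        return {p: self.draw_fn(p) for p in region}

content = {(x, y): x + y for x in range(4) for y in range(4)}
buffered = BufferedWindow(content)
unbuffered = UnbufferedWindow(lambda p: p[0] + p[1])

region = [(0, 0), (1, 2)]
assert buffered.expose(region) == unbuffered.expose(region)  # same pixels out
```

Both produce identical pixels, but the unbuffered window paid a redraw; if that draw function is expensive (or the process is paged out), the cost shows up on every expose and every transparency pass.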
  • Reply 34 of 50
    crayz Posts: 73member
    OK so if this is gonna need custom HW to get faster, how are we ever gonna get that HW? Do you really think nVidia is gonna specially make a GPU that can do PostScript just so Quartz can be fast? How would that be worth it to them?



    Seems Apple either needs to make something themselves, or contract someone else to do it. Would a custom "Quartz accelerator" card be possible, whose sole purpose would be to speed up OS X, or would it be prohibitively expensive to have that in addition to a 3D card?
  • Reply 35 of 50
    [quote]Originally posted by grad student:

    <strong>ok, I am most definately misunderstood. in bullet point format:



    (...)

    </strong><hr></blockquote>



    Still, to use OpenGL for transparency, geometry transforms (genie, ...) and other things, you would need a bitmap of the complete, opaque, unchanged version of each and every object you want OGL to work on in VRAM, since you can't get it from the buffer containing the whole screen (unless it is completely unobscured).



    Bye,

    RazzFazz
  • Reply 36 of 50
    [quote]Originally posted by crayz:

    <strong>OK so if this is gonna need custom HW to get faster, how are we ever gonna get that HW?

    </strong><hr></blockquote>



    I guess the most likely way we'll see such acceleration is by means of faster CPUs which are powerful enough to handle it in software (as has happened with DVD decoding). Probably both cheaper and more practical.



    Bye,

    RazzFazz
  • Reply 37 of 50
    *l++ Posts: 129member
    With a fast graphics card bus and DMA access, the graphics card could do all the compositing work by fetching what it needs directly from main memory. This would offload the CPU. The bottleneck is memory bandwidth; the path from CPU to memory is still faster than the path from GPU to memory over the AGP bus.
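The bandwidth point invites some rough arithmetic. The bus figure below is an assumed 2002-era peak number (AGP 4x at roughly 1.06 GB/s), not a measurement:

```python
def max_fps(frame_bytes, bus_bytes_per_sec):
    """Upper bound on full-screen frames per second a bus can carry,
    from raw bandwidth alone."""
    return bus_bytes_per_sec / frame_bytes

frame = 1600 * 1200 * 3      # one 1600x1200x24-bit frame, ~5.5 MB
agp4x = 1.06e9               # assumed AGP 4x peak, ~1.06 GB/s

print(max_fps(frame, agp4x))  # raw peak; real throughput is lower due to
                              # latency, bus contention, and texture uploads
```

Even the raw peak leaves plenty of headroom for full-screen updates; the practical limit is how much of that bandwidth compositing traffic can actually sustain alongside everything else on the bus.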
  • Reply 38 of 50
    [quote]Originally posted by moki:

    <strong>This allows for some very cool stuff, like flicker-free window drawing, but it also means that everything you draw must effectively be drawn twice: once to the window's buffer, and then again to the screen.</strong><hr></blockquote>



    Those are two different buffers. The double buffering of the display keeps flicker down; that's not related to saving the rendered window contents in a per-window buffer. The two have little to do with one another.
  • Reply 39 of 50
    [quote]Originally posted by Tarbash:

    <strong>OS X = slow basically depends on the machine you're running it on. On my iBook/500, yeah, it could be faster, but on my TiBook/500 with 512 MB RAM and on my friends 867 tower OS X flies.



    </strong><hr></blockquote>



    No it doesn't. That's part of my point. I've read that Java runs faster in Classic than it does in OS X. It's not just a little, but a lot. Why? OpenGL is faster in 9 too, isn't it? Why? OS X has a very real speed issue that no hardware upgrade or faster CPU is going to fix. I've been telling people for a long time that there is a real problem with speed in OS X, but people can't stand to hear it.



    Worst of all is that it has a good chance of costing Apple sales. Slower hardware + slower OS = Chapter 11.
  • Reply 40 of 50
    *l++ Posts: 129member
    [quote]Originally posted by RazzFazz:

    <strong>



    Are you sure this is in fact a hardware feature?

    From the text you linked to, it could just as well be software-based (Win2k and up allow for transparency effects, check AVS in WinAMP for example), which would of course be of little use to Quartz.



    Bye,

    RazzFazz



    [ 01-28-2002: Message edited by: RazzFazz ]</strong><hr></blockquote>



    No, not sure at all; yes, it could be a software thing that uses OpenGL for the effect.