OpenGL on the Mac Pros is multithreaded!
http://www.macrumors.com/pages/2006/...21223856.shtml
Looks like the Mac Pros ship by default with a special build of 10.4.7 that includes multithreaded OpenGL code, which will let multithreaded games and apps get quite a speed boost.
Blizzard has evidently made a beta recompile of WoW that's OpenGL-multithreaded, and they report a DOUBLING in frame rates.
Sounds cool!
Comments
Oh man. OpenGL needs this, as DirectX 10 is a farce.
For it being a farce, it sure has a lot of followers (both hardware and software)...
This doesn't really make sense to me. Just about all Macs now have multiple processors, so it seems like multithreaded OpenGL would benefit all of them. Why hide this feature?
Agreed.
grrrrm ... first, a Logic Pro update enabling 4-way processing on Mac Pros but NOT PowerMac Quads, now these findings of OpenGL being optimized for Intel and not PowerPC .... is this your idea of supporting the installed user base, Apple??!
If they go on in this fashion, Apple will face a new era of really p*ssed customers ...
Is this a ploy to make the Intel-based Macs look much better than the PPC or just an attempt to put the PPC days long behind them in a hurry? Either way, it's not a good way to treat existing G5 users.
I wasn't at WWDC so I don't know exactly what Apple said, but the chances that they multi-threaded OpenGL itself are pretty damn slim. OpenGL isn't a process that happens on the CPU; OpenGL is an API set that is almost entirely shuffled off to the GPU. As in, there are no magic OpenGL threads to make multi-processor aware, just OpenGL calls sprinkled into your own threads.
What do I think really happened? Apple has done some serious work on making their OpenGL drivers safely re-entrant. What does this do? It allows separate application threads to call the re-entrant APIs without trashing each other's state, or having to wait for a serial shot through the code, which was the previous solution to avoid the state trashing. This can remove a serious bottleneck and allow independent threads to progress without spending an inordinate amount of time just waiting for the other thread to get out of a particular OpenGL call.
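To make that concrete, the before/after looks roughly like this (untested sketch in plain C++; context creation is platform-specific and omitted, and streamOneTexture/drawOneFrame are made-up placeholders, not real API):

// Hedged sketch: what re-entrant GL drivers buy an application.
// Assumes each thread already has its own GL context created and made
// current (platform-specific, omitted). streamOneTexture() and
// drawOneFrame() are hypothetical placeholders, not real API.

#include <mutex>
#include <thread>

std::mutex gBigDriverLock;  // the old workaround: serialize every GL call

void streamOneTexture() { /* glTexImage2D(...) etc. on this thread's context */ }
void drawOneFrame()     { /* glDrawArrays(...) etc. on this thread's context */ }

// Old world: the driver wasn't safely re-entrant, so any thread touching
// GL had to take one big lock, and the renderer stalled behind the loader.
void loaderSerialized() {
    for (int i = 0; i < 100; ++i) {
        std::lock_guard<std::mutex> hold(gBigDriverLock);
        streamOneTexture();             // renderer is blocked while this runs
    }
}

// New world: re-entrant driver + one context per thread means the loader
// and the renderer make GL calls concurrently with no app-level lock.
void loaderConcurrent() {
    for (int i = 0; i < 100; ++i)
        streamOneTexture();
}

int main() {
    std::thread loader(loaderConcurrent);
    for (int frame = 0; frame < 100; ++frame)
        drawOneFrame();                 // main thread keeps drawing
    loader.join();
    return 0;
}

Same game code either way; the second form only works if the driver underneath really is re-entrant.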
I can see how some non-developer could mangle "Now I can safely multi-thread my OpenGL dependent game code without bottlenecking" into "OMG, OpenGL just got multi-threaded1!11 OMG!!!" Same end effect, big difference in how and what that means to taking advantage of it.
Part of this re-entrancy might be allowing some OpenGL instructions to call a proxy in an OS-owned thread to handle certain functions that take a while, like loading textures, but most of the API set isn't amenable to this treatment. Re-entrancy, though, is just more hard work under any call to get a nice return with no proxy threading. Apple has been attempting to do this type of thing throughout the OS for a few years now.
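The texture case is the easy one to picture. Something like this (untested sketch; the decoding is faked, and the GL calls are the standard ones, which still need a current context on the calling thread):

// Hedged sketch of the "proxy thread" idea for texture loading: the slow
// part (disk read, decode) runs on a worker thread, and only the cheap
// upload runs on the thread that owns the GL context.

#include <OpenGL/gl.h>   // <GL/gl.h> on non-Mac platforms
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct PendingTexture {
    int width = 0, height = 0;
    std::vector<unsigned char> rgba;   // decoded pixels, ready to upload
};

std::queue<PendingTexture> gReady;
std::mutex gReadyLock;

// Worker thread: everything that takes a while but needs no GL context.
// The game would kick it off with: std::thread(loaderThread).detach();
void loaderThread() {
    PendingTexture t;
    t.width = 256;
    t.height = 256;
    t.rgba.assign(t.width * t.height * 4, 0xFF);  // stand-in for real decoding
    std::lock_guard<std::mutex> hold(gReadyLock);
    gReady.push(std::move(t));
}

// Called once per frame from the thread that owns the GL context.
void uploadPendingTextures() {
    std::unique_lock<std::mutex> hold(gReadyLock);
    while (!gReady.empty()) {
        PendingTexture t = std::move(gReady.front());
        gReady.pop();
        hold.unlock();                 // never hold the app lock across GL calls
        GLuint tex = 0;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, t.width, t.height, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, t.rgba.data());
        hold.lock();
    }
}

With a fully re-entrant driver you could go a step further and give the loader its own shared context so it does the upload itself, which is basically the proxy idea pulled back into the app.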
In a couple years everyone will be doing this; it is a big part of why Cell and the triple-core Xbox CPUs are the way they are. I would also bet that the changes will eventually make it back to the multi-processor PPCs by the time 10.5 is out; this is just a by-product of the order the QA has been certified, and it makes sense to release the big win on the newest box first so it doesn't get trashed for 6 months by the older one.
Six months from now, if the entire product lineup is running quad-optimized code, the G5s will be faster than they are today but still slower than the new quads, and everyone will be happy but still have motivation to buy a new box, which will make Apple happy too.
http://episteme.arstechnica.com/eve/...r=598007950831
They've been talking about it for some time and near as I can tell this isn't a slight against PPC but maybe it was easier to roll out more quickly on Intel.
This link will also help explain things (even some of the comments are useful)
http://arstechnica.com/staff/fatbits.ars/2006/8/17/5024
As well as:
The Low Level Virtual Machine web site (this is what actually does the heavy lifting).
http://llvm.org/
Finally, check this Macworld story for some good quotes like "For the games that are very graphics bound it could give us some very nice frame rate boosts" - Glenda Adams - Aspyr.
http://www.macworld.com/news/2006/08...read/index.php
Dave
Well, this sure sounds like someone mangled a message. And Placebo has it correct, not Ben.
Well, if Glenda Adams from Aspyr can be taken at her word, then it would seem that both you and Placebo have it wrong after all. Unless Glenda was somehow misquoted (see my post above) and the article was never corrected. (Not out of the realm of possibility.)
Dave
As for Glenda, after her last public commenting performance I don't take a thing she says at face value. She has also shown through the abysmal performance of her shop's ports that they don't really understand what they are doing [they can do the porting mechanics, just not good optimizations] or what benefits can be gained from threading outside of OpenGL.
To give her a sliver of credit in word choice though, she said "graphics bound" not GPU bound, and those have very different meanings. Aspyr's serious graphical shortcomings have been in getting the code off the CPU and into the GPU. The Apple changes will help this significantly. This is still an advance that will make a CPU bound graphical app faster, not a GPU bound one. [How much does anyone want to bet that Aspyr's awfully performing ports have been something that spurred Apple to make it easier to avoid those same screw-ups? The tools to do good optimizations have always been there for shops that really understand what they are doing; this will sidestep many of those techniques, allowing less accomplished teams to generate faster code without learning much more.]
Make no bones, this will be a very good thing, it's just not a GPU thing.
LLVM is being used for creating multiply-threaded OGL on the GPU, I thought, and will be out in 10.5, and not before.
This, in 10.4.7, appears to be re-entrant OGL in general, meaning that CPU utilization for multi-threaded OGL apps just got a lot better, but is *independent* of the LLVM technique.
In other words, this is *one* optimization technique, and others are coming with 10.5.
Right?
If so, OGL performance could get a *lot* better, very quickly.
Correct. The bulk of the changes are coming in 10.5, and not yet available, neither on Mac Pros nor elsewhere.
Okay, wait.
Well, everyone who actually knows is under NDA so we are left to best guesses, but I really don't think the concept of thread even applies to the GPU.
The GPU is generally a fixed function state machine that has some number of identical parallel pipelines. We get GPU programmability by replacing the fixed vertex processing unit with a programmable one and the fixed fragment unit with a programmable one, but all the pipelines have to use the current program (or the fixed unit) in the same fixed order and place in the pipeline. There is no way to split this up any finer.
The constraint on multi-threading OpenGL commands is mainly driven by the need to maintain a correct order when feeding vertices into the GPU and to keep those synchronized with the appropriate state change callouts. Once any single vertex enters the GPU we know exactly how it will be processed, because all states that affect it are already present in the pipeline. Nothing Apple can do will change this; it is driven by the spec and hardware implementations.
My read on LLVM is that it will vastly optimize vertex & fragment programs before they are sent to the GPU as a state change. It will also do good things for OpenGL code that never needs to hit the GPU. There is a lot of optimization at a very granular level that no scene graph system could ever hope to incorporate, because trying to capture all that customization would have the opposite effect and would make it too slow. In the bazillion tight loops that OpenGL uses, a well written JIT (just-in-time) compiler can theoretically make optimizations that are not physically possible at static compile time, without adding any complexity to the original program. [This is also why HotSpot Java and C# are getting so much faster nowadays.] I think this is where a great deal of the performance is coming from.
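To make the JIT point concrete, here's a toy illustration (plain C++, nothing Apple-specific): the generic per-vertex path has to assume a full 4x4 transform, but if the current matrix turns out at run time to be a pure uniform scale, a specialized loop can skip most of the arithmetic. A static compiler can't know which case will show up; a JIT can emit the specialized version on the fly.

// Toy illustration (not Apple's code) of the kind of win a JIT gets.
// Matrices use OpenGL's column-major layout: element m[col*4 + row].

#include <cstddef>

struct Vec4 { float x, y, z, w; };

// Generic path: full matrix multiply per vertex.
void transformGeneric(const float m[16], Vec4* v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        Vec4 p = v[i];
        v[i].x = m[0]*p.x + m[4]*p.y + m[8]*p.z  + m[12]*p.w;
        v[i].y = m[1]*p.x + m[5]*p.y + m[9]*p.z  + m[13]*p.w;
        v[i].z = m[2]*p.x + m[6]*p.y + m[10]*p.z + m[14]*p.w;
        v[i].w = m[3]*p.x + m[7]*p.y + m[11]*p.z + m[15]*p.w;
    }
}

// Specialized path, valid only when the matrix is a uniform scale s.
void transformUniformScale(float s, Vec4* v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        v[i].x *= s; v[i].y *= s; v[i].z *= s;   // w untouched
    }
}

Multiply that kind of win across every state combination and shader the driver sees and it adds up fast.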
Re-entrancy is the other big contributor, by assassinating stupid driver-imposed bottlenecks; sure, a good OGL coder could plan around those bottlenecks, but most outfits don't want to pay someone that good to spend the time required. Just eliminating those re-entrancy bottlenecks will allow an app to separate all calls that aren't pipeline related (the pipeline being vertices and state changes) into whatever thread you want without clobbering the whole stack. Now you can do expensive stuff such as loading textures and V/F programs like any other good app handles I/O, off to the side.
The main draw thread will probably stay as a single thread, but it will be doing less so it can iterate faster. Chopping the actual draw thread up introduces too much opportunity for state thrashing, and because state changes necessitate flushing, too many of them are performance killers. Personally I think there is a lot of cruft that could already come out of the draw thread, but so far run of the mill coders are still too scared of synchronization to see where it doesn't really matter and where it is really needed.
Maybe the multi-threading in the stack refers to running culling and cropping in their own threads, not something specific the program calls or does. This would be pretty safe and is what SGI did for a long time with its proprietary OpenGL pipeline and middleware products like Performer. That would offload A LOT of cycles from the main draw thread without causing any dependency issues. Worst case, the GPU gets a few extra vertices, but waits less for them. Since the wait was the killer, not the few extra vertices, absolutely perfect synchronization is not required, just pretty damn close. Considering cull/crop is done on the CPU and we usually know within a few degrees where the view frustum is even in a fast paced environment, this is not too hard. Hell, you could pay a few mil for a Reality Monster setup and get this 5 years ago. Why not today on the desktop for a few thousand?
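For what it's worth, the shape of that cull-thread setup is roughly this (untested sketch in plain C++; isInsideFrustum and submitToGL are stand-ins, and a real test would check all six frustum planes):

// Hedged sketch of culling in its own thread: the cull thread keeps
// refreshing the visible list, the draw thread just grabs the most recent
// finished one. Being a pass behind is fine -- a few extra vertices are
// cheaper than making the GPU wait. Camera/scene updates are omitted.

#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct Object { float x, y, z, radius; };

// Stand-in for a real 6-plane frustum test.
bool isInsideFrustum(const Object& o) { return o.z < 0.0f; }

// Stand-in for the code that issues the actual GL draw calls.
void submitToGL(const Object&) {}

std::vector<Object> gScene;            // assumed stable during a cull pass
std::vector<std::size_t> gVisible;     // latest finished visible list
std::mutex gVisibleLock;
std::atomic<bool> gRunning{true};

void cullThread() {
    while (gRunning) {
        std::vector<std::size_t> pass;
        for (std::size_t i = 0; i < gScene.size(); ++i)
            if (isInsideFrustum(gScene[i]))
                pass.push_back(i);
        std::lock_guard<std::mutex> hold(gVisibleLock);
        gVisible.swap(pass);           // publish the finished list
    }
}

void drawFrame() {                     // runs on the draw thread
    std::vector<std::size_t> visible;
    {
        std::lock_guard<std::mutex> hold(gVisibleLock);
        visible = gVisible;            // cheap copy of indices
    }
    for (std::size_t idx : visible)
        submitToGL(gScene[idx]);
}

// Startup/shutdown from the main thread:
//   std::thread cull(cullThread);
//   ... render loop calling drawFrame() ...
//   gRunning = false;  cull.join();

The draw thread never blocks for more than a vector swap, which is the whole point.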
Sweet!
Hiro, you talk the talk wrt OpenGL and SGI. What do you do for a living? Do you have Barcos in your basement?
Just been in the simulation field awhile. The AI side is where my heart is but I have to have at least half a clue on the graphics side since that generally owns the processing budget. Unfortunately I didn't disguise the half-clue well enough so now I have to teach the stuff as I work on my next degree.
This doesn't really make sense to me. Just about all Macs now have multiple processors, so it seems like multithreaded OpenGL would benefit all of them. Why hide this feature?
It will likely be in 10.4.8.