Snow Leopard's Grand Central, OpenCL boost app by 50%

Comments

  • Reply 21 of 55
    Quote:
    Originally Posted by hdasmith View Post


    Anyone know how much work had to be put into this developer's application to take advantage of these technologies? I'm guessing it's not as simple as checking a couple of check boxes.



    Definitely not as simple as checking a couple of check boxes. GCD has support at the Cocoa level, which makes its use much simpler. Also, Cocoa applications can sometimes enjoy a "free lunch": Apple says that Core Image was rewritten using OpenCL and got a 30% (AFAIR) boost on average. Some of the Leopard classes (NSOperation?) take advantage of GCD without a recompile. But the rest of the code needs changes, and OpenCL may require changes to the software architecture. OpenCL is good for a relatively limited number of tasks, N-body calculations being the showcase example.
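    To give a feel for the kind of change involved, the basic GCD pattern in plain C with blocks looks roughly like this. It's only a sketch; do_heavy_filter and update_ui_when_done are made-up placeholders, not anything from the application in the article:

        #include <dispatch/dispatch.h>

        /* Hypothetical app functions -- placeholders, not real API. */
        extern void do_heavy_filter(void);
        extern void update_ui_when_done(void);

        void run_filter_in_background(void) {
            /* Push the heavy work onto a concurrent background queue... */
            dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
                do_heavy_filter();

                /* ...then hop back to the main queue for anything UI-related. */
                dispatch_async(dispatch_get_main_queue(), ^{
                    update_ui_when_done();
                });
            });
        }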
  • Reply 22 of 55
    Quote:
    Originally Posted by manonthemove View Post


    Just so you guys know, a factor of 10-30 in performance is not uncommon for scientists who use CUDA. Given the similarities between OpenCL and CUDA, we should (hopefully) see a lot of improvement in the near future. Here is a link (http://sussi.megahost.dk/~frigaard/) to a standard piece of scientific code G2X (it does N-body calculations) modified to use CUDA by C. FRIGAARD (go to the bottom) which gets a factor of 30 for one subroutine and a factor of 10 overall.



    Just to clarify a bit about optimization across multiple cores: it all really depends on the algorithm. N-body calculations are pretty much the ABSOLUTE BEST case for multi-core optimization, as the per-body calculations are small and are not dependent on each other's results until the next timestep (and even that can be fudged). I don't know too much about video optimization, but I would guess that it too is pretty open to multi-core optimization; there are quite a few dependencies between frames and the like, though, that make it more complicated and tricky. I think the best test will be something like a word processor, more typical of desktop applications. However, video and pictures are the applications with the bigger computational needs, so maybe they are the more realistic benchmark?
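    To make the N-body point concrete, here is a rough sketch (mine, not taken from the G2X code linked above) of a brute-force O(N^2) force pass written with GCD's dispatch_apply. Each body's acceleration depends only on the previous timestep's positions, so the outer loop parallelizes with essentially no coordination:

        #include <dispatch/dispatch.h>
        #include <math.h>
        #include <stddef.h>

        #define N 4096
        static float px[N], py[N], pz[N];   /* positions (read-only during the pass) */
        static float ax[N], ay[N], az[N];   /* accelerations (one slot per body)     */

        /* One timestep of the brute-force force pass. */
        static void compute_forces(void) {
            dispatch_apply(N,
                           dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                           ^(size_t i) {
                float fx = 0.0f, fy = 0.0f, fz = 0.0f;
                for (size_t j = 0; j < N; j++) {
                    if (j == i) continue;
                    float dx = px[j] - px[i], dy = py[j] - py[i], dz = pz[j] - pz[i];
                    float r2 = dx*dx + dy*dy + dz*dz + 1e-6f;  /* softened distance^2 */
                    float inv = 1.0f / (r2 * sqrtf(r2));
                    fx += dx * inv; fy += dy * inv; fz += dz * inv;
                }
                /* Each iteration writes only its own slot, so there are no data races. */
                ax[i] = fx; ay[i] = fy; az[i] = fz;
            });
        }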
  • Reply 23 of 55
    Let's hope Apple applies all this new technology to a rewrite of their fantastic-quality but fantastically slow Aperture app. Having just spent £130 on it and having to stop using it (except for my portfolio-grade RAW conversions) because it is so slow to process RAW files (and I know I'm not alone in this), I really hope there is some scope for still-image as well as video improvements.
  • Reply 24 of 55
    These new technologies require a paradigm shift from developers. The future seems to be heading there, and Apple skates "where the puck is going to be". An unexpected breakthrough in semiconductor/processor technology could bring back the free ride on core speed, however. Very unlikely, but possible.
  • Reply 25 of 55
    Quote:
    Originally Posted by mikemcfarlane View Post


    Let's hope Apple applies all this new technology to a rewrite of their fantastic-quality but fantastically slow Aperture app. Having just spent £130 on it and having to stop using it (except for my portfolio-grade RAW conversions) because it is so slow to process RAW files (and I know I'm not alone in this), I really hope there is some scope for still-image as well as video improvements.



    I am with you here. Fingers crossed for Aperture.
  • Reply 26 of 55
    Quote:
    Originally Posted by cwingrav View Post


    I don't know too much about video optimization, but I would guess that it too is pretty open to multi-core optimization; there are quite a few dependencies between frames and the like, though, that make it more complicated and tricky.



    Slightly. You should be able to split the video into chunks at keyframes; then the inter-frame dependencies aren't a problem. The process isn't maximally parallelized, but it's a fairly straightforward change with a huge benefit, and it can be implemented long before a serious rewrite that gets in deeper.
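    As a rough sketch of that idea (the helper functions here are hypothetical stand-ins for whatever an encoder actually provides, not a real API), the chunks can be farmed out with dispatch_apply and stitched back together afterwards:

        #include <dispatch/dispatch.h>
        #include <stddef.h>

        /* Hypothetical helpers -- stand-ins, not a real encoder API. */
        extern size_t find_keyframe_chunks(const char *input, size_t **starts);
        extern void   encode_chunk(const char *input, size_t first, size_t last,
                                   const char *output, size_t chunk_index);
        extern void   concatenate_chunks(const char *output, size_t nchunks);

        void encode_by_keyframes(const char *input, const char *output) {
            size_t *starts = NULL;
            size_t nchunks = find_keyframe_chunks(input, &starts);

            /* Each chunk starts on a keyframe, so no frame inside it references a
             * frame in another chunk -- the chunks can be encoded independently. */
            dispatch_apply(nchunks,
                           dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                           ^(size_t c) {
                size_t first = starts[c];
                size_t last  = (c + 1 < nchunks) ? starts[c + 1] - 1
                                                 : (size_t)-1;  /* -1 means "to the end" */
                encode_chunk(input, first, last, output, c);
            });

            /* Stitch the independently encoded pieces back together in order. */
            concatenate_chunks(output, nchunks);
        }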
  • Reply 27 of 55
    Quote:
    Originally Posted by cwingrav View Post


    Just to clarify a bit about optimization across multiple cores: it all really depends on the algorithm. N-body calculations are pretty much the ABSOLUTE BEST case for multi-core optimization, as the per-body calculations are small and are not dependent on each other's results until the next timestep (and even that can be fudged). I don't know too much about video optimization, but I would guess that it too is pretty open to multi-core optimization; there are quite a few dependencies between frames and the like, though, that make it more complicated and tricky. I think the best test will be something like a word processor, more typical of desktop applications. However, video and pictures are the applications with the bigger computational needs, so maybe they are the more realistic benchmark?



    I agree with you that everyday tasks matter more (especially those involving video, pictures and music) and that the optimization is algorithm-dependent. However, the potential for an order-of-magnitude gain in performance is there. I hope developers will figure out better algorithms for everyday tasks so they can get more than just a factor of 2.
  • Reply 28 of 55
    Quote:
    Originally Posted by IQatEdo View Post


    Hi



    Have you tried Windows 7 on your system yet? Just interested in a speed comparison.



    Looking forward to the new iMacs, interesting to see where they are going.



    Windows 7 will now require 4 GB of RAM as a minimum and applications running on Windows 7 will use 30% more CPU cycles to keep it from crashing.
  • Reply 29 of 55
    Quote:
    Originally Posted by coolfactor View Post


    The 130% figure illustrates that the application was able to split across two processors (parallelism), rather than max out just one, but some of the work was also offloaded to the GPUs, which would not be reported in the CPU utilization. The 130% alone is *not* the full processing power being tapped.



    There are two technologies being used here:

    - Grand Central Dispatch - utilizing multiple CPUs/cores

    - OpenCL - utilizing graphics processors



    Yes. But I would add that GCD not only orchestrates the use of available cores/CPUs but also other resources, such as DSPs and OpenCL-capable GPUs. I guess their close association is why folks confuse/conflate the two.
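    For what it's worth, OpenCL itself already treats CPUs and GPUs as generic compute devices; a minimal sketch of enumerating them on a Mac (error handling omitted, just to show the idea) looks like this:

        #include <OpenCL/opencl.h>   /* OpenCL header location on Mac OS X */
        #include <stdio.h>

        int main(void) {
            cl_platform_id platform;
            cl_device_id   devices[8];
            cl_uint        ndev = 0;
            char           name[128];

            clGetPlatformIDs(1, &platform, NULL);

            /* Ask for everything OpenCL can drive: CPUs and GPUs alike. */
            clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 8, devices, &ndev);
            if (ndev > 8) ndev = 8;   /* we only fetched up to 8 entries */

            for (cl_uint i = 0; i < ndev; i++) {
                clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
                printf("compute device %u: %s\n", i, name);
            }
            return 0;
        }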
  • Reply 30 of 55
    Quote:
    Originally Posted by coolfactor View Post


    The 130% figure illustrates that the application was able to split across two processors (parallelism), rather than max out just one, but some of the work was also offloaded to the GPUs, which would not be reported in the CPU utilization. The 130% alone is *not* the full processing power being tapped.



    There are two technologies being used here:

    - Grand Central Dispatch - utilizing multiple CPUs/cores

    - OpenCL - utilizing graphics processors



    I don't pretend to know the details of this kind of stuff. I was just thinking that since I'm using very similar (or perhaps even the exact same) hardware, and since Handbrake is performing the same kind of work, a direct comparison could be made.



    I'm taking the 130% (and my 750%) as direct indications of how many cores are in use (1.3 and 7.5), but as I say I'm not totally sure of that.



    Either way, it's nice to see real-world implementations of this sort of thing so soon. Too often someone invents a really cool and much better way to do things, and yet it's never implemented because of some foolish capitalist or legal reason that has no bearing on the technology itself. Hopefully the uptake on these technologies will be better than that.
  • Reply 31 of 55
    Quote:
    Originally Posted by lilgto64 View Post


    Did I miss it - did the article or that comment mention how many CPUs were in the test machine? When I read that I was thinking dual-core machine where before only 1 core could be tapped and now both cores are used for a 30% boost in performance. I was thinking this because aren't the majority of Intel Macs out there dual core? Core 2 or Core 2 Duo? iMac, Mac Mini, MacBook, MacBook Pro? Sure there are 8 core desktops and xServe - and that is more likely where you would expect to see these apps - just saying that the configuration of the test platform was not obvious to me.



    Yes. Some are single-core, though (the Core Solo in some early Minis, I think).
  • Reply 32 of 55
    Quote:
    Originally Posted by FattyMcButterpants View Post


    You're thinking of all the talk about debug code being left in OS X. Back when OS X was slow (remember how Finder windows would not move around cleanly?), lots of people were saying Apple had left debugging code in the system. That was a bunch of bull.



    The 6GB of disk space reclaimed in Snow Leopard was mostly from removing PowerPC support.



    The performance improvements in SL come from some massive under-the-hood optimizations, not from removing PPC support: the code a Universal app executed was selected at run time, and the other architecture's code simply sat on disk.



    I've read that the reclaimed space is partially from the move to decimal file sizes.

    1 GB (binary) = 1024 MB = 1024*1024*1024 B ≈ 1.074*10^9 B, so reporting in decimal GB gives about a 7% "gain" in apparent disk size.

    I assume files also store a little more efficiently with smaller block sizes?

    Anyone know what the true skinny is on this?
  • Reply 33 of 55
    Quote:
    Originally Posted by shadow View Post


    Definitely not as simple as checking a couple of check boxes. GCD has support at the Cocoa level, which makes its use much simpler. Also, Cocoa applications can sometimes enjoy a "free lunch": Apple says that Core Image was rewritten using OpenCL and got a 30% (AFAIR) boost on average. Some of the Leopard classes (NSOperation?) take advantage of GCD without a recompile. But the rest of the code needs changes, and OpenCL may require changes to the software architecture. OpenCL is good for a relatively limited number of tasks, N-body calculations being the showcase example.



    Definitely not simple, but also definitely not that hard either. Minor changes in the code can enable parallelization by GCD (at least for the CPU cores). I'm not a coder, but here is a link to what I have read:

    http://www.macresearch.org/cocoa-sci...-grand-central
  • Reply 34 of 55
    Quote:
    Originally Posted by camroidv27 View Post


    This type of thing is the wave of the future. GPUs are quite powerful chips too, so I'm glad to see this.



    I was encoding a video file today on my PC (with specs a little better than the lowest Mac Pro, including the Xeon processor) and with CUDA functions enabled along with multi-threading, I was amazed at how fast this thing cranked through video! And that's just on *Gasp* Vista! If my PC can go that fast, I imagine the Mac must be....

    ... well, probably about the same speed, maybe a little faster. But still, it should be amazingly blinding fast too!



    No, it has the potential to be a lot faster. OpenCL may be similar to CUDA, but there is nothing available on Windows that works like Grand Central Dispatch.
  • Reply 35 of 55
    Quote:
    Originally Posted by Denmaru View Post


    Oh my god... we need an OpenCL and GrandCentral ready version of Handbrake...



    It's open source so have at it. There hasn't been a new update for a year so I am doubtful that we'll see anyone take the ball and run with it at this point.





    Quote:
    Originally Posted by DESuserIGN View Post


    I've read that the reclaimed space is partially from the move to decimal file sizes.

    1 GB (binary) = 1024 MB = 1024*1024*1024 B ≈ 1.074*10^9 B, so reporting in decimal GB gives about a 7% "gain" in apparent disk size.

    I assume files also store a little more efficiently with smaller block sizes?

    Anyone know what the true skinny is on this?



    Technically speaking, the change from BASE-2 to BASE-10 does not alter the space used. A Byte is a Byte. You'll get back at least 7GB, but many are reporting 20GB: partly because of other software installed, but mostly because of the BASE change. I'm glad they made the change and everyone else needs to get on board. There is no reason why the user needs to be doing binary calculations when decimal is natural. Let the computer deal with binary; it's good at it.



    As for your 7%, that is only for a GB. With the Terabyte nomenclature (a common size now) that discrepancy jumps to 10%. Apple should have gone a step further and used the Kibi-, Mebi-, Gibi- and Tebibyte prefixes of the IEC standard so that they're not confused with the SI Kilo-, Mega-, Giga-, and Terabyte now that they are using BASE-10 in the OS UI. I can't think of anything else that uses exactly the same writing to represent two similar but very distinct quantities. It's fraught with issues.
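    The 7% and 10% figures are easy to check; a throwaway C snippet that does nothing but the arithmetic:

        #include <stdio.h>

        int main(void) {
            double gib = 1024.0 * 1024.0 * 1024.0;   /* binary "GB" (GiB)  */
            double gb  = 1e9;                        /* decimal GB          */
            double tib = gib * 1024.0;               /* binary "TB" (TiB)   */
            double tb  = 1e12;                       /* decimal TB          */

            /* Prints roughly 1.0737 (~7.4%) and 1.0995 (~10.0%). */
            printf("GiB/GB = %.4f (%.1f%% gap)\n", gib / gb, (gib / gb - 1.0) * 100.0);
            printf("TiB/TB = %.4f (%.1f%% gap)\n", tib / tb, (tib / tb - 1.0) * 100.0);
            return 0;
        }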
  • Reply 36 of 55
    Quote:
    Originally Posted by solipsism View Post


    Technically speaking, the change from BASE-2 to BASE-10 does not alter the space used. A Byte is a Byte. You'll get back at least 7GB, but many are reporting 20GB: partly because of other software installed, but mostly because of the BASE change. I'm glad they made the change and everyone else needs to get on board. There is no reason why the user needs to be doing binary calculations when decimal is natural. Let the computer deal with binary; it's good at it.



    As for your 7%, that is only for a GB. With the Terabyte nomenclature (a common size now) that discrepancy jumps to 10%. Apple should have gone a step further and used the Kibi-, Mebi-, Gibi- and Tebibyte prefixes of the IEC standard so that they're not confused with the SI Kilo-, Mega-, Giga-, and Terabyte now that they are using BASE-10 in the OS UI. I can't think of anything else that uses exactly the same writing to represent two similar but very distinct quantities. It's fraught with issues.



    Sorry, I was not clear. Is there any "real" regained disk space, or is it all just imagined by people who were not aware of the switch to decimal MB and GB? Presumably Apple would not be deceptive about this. The reported disk savings I have seen posted seem more in line with the 7-10% one would expect from the decimal change.



    Also my other thought was that there might be savings due to a change in block sizes (? not sure if that's the right term) on the disk. Smaller blocks might be slightly more efficient at storing some kinds of files (but this is totally wild ass uninformed speculation on my part.)
  • Reply 37 of 55
    Quote:
    Originally Posted by DESuserIGN View Post


    Sorry, I was not clear. Is there any "real" regained disk space, or is it all just imagined by people who were not aware of the switch to decimal MB and GB? Presumably Apple would not be deceptive about this. The reported disk savings I have seen posted seem more in line with the 7-10% one would expect from the decimal change.



    Also my other thought was that there might be savings due to a change in block sizes (? not sure if that's the right term) on the disk. Smaller blocks might be slightly more efficient at storing some kinds of files (but this is totally wild ass uninformed speculation on my part.)



    Yes, at least 7GB is real space that is freed up. Most of the additional amount is just the difference in reporting between binary and decimal (which scales with partition size), which is why you see reports of 20GB and more.
  • Reply 38 of 55
    Quote:
    Originally Posted by Gazoobee View Post


    Sounds good, but Handbrake on my Mac Pro uses between 500% and 750% of the CPU. Unless we are talking different things, 130% seems low.



    Don't expect to see any improvement in handbrake due to these technologies.



    Handbrake can already efficiently use all cores on multicore machines, since the x264 library it uses supports this. It has done so for some time, long before Grand Central Dispatch came along. So there's nothing to gain there.



    Moreover, the x264 devs have already looked into OpenCL/CUDA and (from memory) concluded there is not much they can gain from it. GPUs may well be fast, but they have some serious limitations, and in the case of H.264 encoding those limitations make them less than ideal (note I said encoding, not decoding...).



    Last but not least, Handbrake is multi-platform. They support Linux and Windows as well as OS X, so they are unlikely to start a widespread rewrite for a new technology available on only one platform.



    Chris
  • Reply 39 of 55
    The space saving in 10.6 is a lot more than stripping PPC code and redefining a GB - Apple actually optimized a ton of code AND implemented compression across the board under the hood. If you haven't read the Ars review, here is a full page detailing it: http://arstechnica.com/apple/reviews...s-x-10-6.ars/3
  • Reply 40 of 55
    Quote:
    Originally Posted by cjones051073 View Post


    Don't expect to see any improvement in handbrake due to these technologies.



    Handbrake can already efficiently use all cores on multicore machines, since the x264 library it uses supports this. It has done so for some time, long before Grand Central Dispatch came along. So there's nothing to gain there.



    Moreover, the x264 devs have already looked into OpenCL/CUDA and (from memory) concluded there is not much they can gain from it. GPUs may well be fast, but they have some serious limitations, and in the case of H.264 encoding those limitations make them less than ideal (note I said encoding, not decoding...).



    Last but not least, Handbrake is multi-platform. They support Linux and Windows as well as OS X, so they are unlikely to start a widespread rewrite for a new technology available on only one platform.



    Chris



    True, but I'm curious about what GCD could do for Handbrake when Handbrake is competing for resources with other running processes.



    My understanding is that one of the advantages of GCD is that it is system-aware in a way that a given application cannot be, no matter how carefully coded for multicore optimization it may be, and that it allocates resources accordingly.
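    A small sketch of what that looks like from the application's side (the chunk count and printf are just for illustration): the app only states how urgent its work is, and GCD decides system-wide how many threads actually run, and when.

        #include <dispatch/dispatch.h>
        #include <stdio.h>

        int main(void) {
            /* A low-priority global queue: the app expresses intent, GCD decides
             * how much concurrency to give it based on overall system load. */
            dispatch_queue_t background =
                dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0);
            dispatch_group_t group = dispatch_group_create();

            for (int chunk = 0; chunk < 16; chunk++) {
                dispatch_group_async(group, background, ^{
                    printf("processing chunk %d when the system has spare capacity\n", chunk);
                });
            }

            /* Block until every submitted chunk has finished. */
            dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
            dispatch_release(group);
            return 0;
        }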