Snow Leopard's Grand Central, OpenCL boost app by 50%

Posted in macOS, edited January 2014
A developer reports seeing a 50% jump in real-world performance after adding initial support for two new Snow Leopard technologies: Grand Central Dispatch and OpenCL.



As reported by the French site HardMac, MovieGate developer Christophe Ducommun found that his app jumped from 104 frames per second of encoding under Leopard to 150 fps on the same hardware under Snow Leopard after he implemented support for the new features.



Grand Central Dispatch helps developers efficiently spread work across the multiple processor cores available on the system, while OpenCL enables applications to tap the latent power of available video card GPUs.
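In GCD's case, the gain comes from turning serial loops into work that is fanned out across cores. As a rough, hypothetical sketch of the idea, using Python's thread pool as a stand-in for GCD's C-level dispatch_apply and an invented encode_frame function (not MovieGate's actual code), the pattern looks like this:

```python
from concurrent.futures import ThreadPoolExecutor

def encode_frame(n):
    # Hypothetical stand-in for per-frame work; a real encoder would
    # compress pixel data here.
    return n * n

frames = list(range(8))

# Serial version: one core does all the work, one frame at a time.
serial = [encode_frame(n) for n in frames]

# GCD-style fan-out: the thread pool plays the role of a dispatch queue,
# handing loop iterations to whichever workers are free (conceptually
# like GCD's dispatch_apply; Python threads are only an illustration).
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(encode_frame, frames))

assert serial == parallel  # same results, regardless of scheduling
```

The appeal of the dispatch-queue model is that the system, not the application, decides how many workers to run, so the same code scales from a dual-core laptop to an eight-core Mac Pro.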



In addition to the overall boost in output, Ducommun also reported that CPU utilization for MPEG-2 encoding under the open source ffmpeg library leapt from 100% to 130% on his quad-core Mac Pro, indicating a significant improvement in tapping its multicore potential.



At the same time, CPU utilization for decoding operations dropped from 165% to 70% under Snow Leopard, because a significant amount of the work could be delegated to the machine's GPU.



The observations illustrate how Snow Leopard's Grand Central Dispatch and OpenCL combine to improve performance both in raw computing power and in increased use of otherwise idle hardware.



In particular, they indicate the potential for GPU delegation to speed things up while reducing the load on the CPU for some operations, particularly video playback. The battery life of mobile devices stands to benefit from such optimizations.

Comments

  • Reply 1 of 55
    Sounds awesome.



    Be good to hear (or see) this outcome for a number of other apps too...



    Maybe AppleInsider could keep a list online?
  • Reply 2 of 55
    This type of thing is the wave of the future. GPUs are quite powerful chips too, so I'm glad to see this.



    I was encoding a video file today on my PC (with specs a little better than the lowest Mac Pro, including the Xeon processor) and with CUDA functions enabled along with multi-threading, I was amazed at how fast this thing cranked through video! And that's just on *Gasp* Vista! If my PC can go that fast, I imagine the Mac must be....

    ... well, probably about the same speed, maybe a little faster. But still, it should be amazingly blinding fast too!
  • Reply 3 of 55
    vinney57 Posts: 1,162 member
    I've seen pure compute demos using scientific mapping software that resulted in a 50X speed increase. Many algorithms aren't suited to massive-parallelism but when they are you can see huge gains. Apple have done some seriously good work in Snow Leopard that should attract a lot of previously ambivalent developers.
  • Reply 4 of 55
    Marvin Posts: 14,224 moderator
    Quote:
    Originally Posted by AppleInsider View Post


    Ducommun also reported CPU utilization for MPEG-2 encoding under the ffmpeg open source library leap from 100% to 130% on his quad core Mac Pro, indicating a significant improvement in tapping its multicore potential.



    Not close to the 400% it could be though. It would be nice if video compressors encoded groups of pictures in separate threads in parallel, then just cached the output so it would write to file in order.



    The OpenCL speedup was quite low considering even the 9400M can rival an 8-core Mac Pro CPU in some cases.



    I'm sure it will take time to get to grips with these technologies in the best ways though and it's good to see real-world uses.
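The group-of-pictures scheme Marvin describes, encoding GOPs in parallel, caching the results, and writing them to the file in order, can be sketched like this (a hypothetical encode_gop stand-in in Python, not actual ffmpeg code):

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def encode_gop(index):
    # Hypothetical stand-in for compressing one group of pictures.
    time.sleep(random.uniform(0, 0.01))  # workers finish out of order
    return f"gop-{index}"

output = []
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() hands GOPs to worker threads as they become free, but yields
    # results in submission order -- so the file can be written
    # sequentially while the encoding itself runs in parallel.
    for chunk in pool.map(encode_gop, range(10)):
        output.append(chunk)

assert output == [f"gop-{i}" for i in range(10)]
```

The ordered-yield behavior of map() is exactly the "cache the output so it writes to file in order" step: later GOPs may finish first, but they are held back until their turn.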
  • Reply 5 of 55
    gazoobee Posts: 3,754 member
    Quote:
    Originally Posted by AppleInsider View Post


    ... Ducommun also reported CPU utilization for MPEG-2 encoding under the ffmpeg open source library leap from 100% to 130% on his quad core Mac Pro, ...



    Sounds good, but Handbrake on my Mac Pro uses between 500% and 750% of the CPU. Unless we are talking different things, 130% seems low.
  • Reply 6 of 55
    brucep Posts: 2,823 member
    Quote:
    Originally Posted by vinney57 View Post


    I've seen pure compute demos using scientific mapping software that resulted in a 50X speed increase. Many algorithms aren't suited to massive-parallelism but when they are you can see huge gains. Apple have done some seriously good work in Snow Leopard that should attract a lot of previously ambivalent developers.



    Yes, and Apple has just begun down this new fast, lean road of integrated CPUs and dual GPUs all working together under GCD and OpenCL, if I said that right.
  • Reply 7 of 55
    Quote:
    Originally Posted by Gazoobee View Post


    Sounds good, but Handbrake on my Mac Pro uses between 500% and 750% of the CPU. Unless we are talking different things, 130% seems low.



    The 130% figure illustrates that the application was able to split work across two cores (parallelism) rather than max out just one, but some of the work was also offloaded to the GPU, which would not be reported in the CPU utilization. The 130% alone is *not* the full processing power being tapped.



    There are two technologies being used here:

    - Grand Central Dispatch - utilizing multiple CPUs/cores

    - OpenCL - utilizing graphics processors
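For anyone puzzling over these percentages: Activity Monitor and top count each core as 100%, so a quad-core machine tops out at 400%. A quick back-of-envelope check on the article's numbers (illustrative arithmetic only):

```python
cores = 4
full_scale = cores * 100          # 400% = every core saturated
encode_reported = 130             # percent during MPEG-2 encoding
decode_reported = 70              # percent during decoding (GPU-assisted)

# Fraction of total CPU capacity actually used in each case.
encode_fraction = encode_reported / full_scale
decode_fraction = decode_reported / full_scale

assert encode_fraction == 0.325   # under a third of the machine's CPU
assert decode_fraction == 0.175   # most of the decode moved off the CPU
```

So 130% means roughly 1.3 cores' worth of work, which is why it looks low next to Handbrake's 500-750% on an eight-core machine, and why the GPU offload is invisible in the CPU figure.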
  • Reply 8 of 55
    lilgto64 Posts: 1,147 member
    Quote:
    Originally Posted by coolfactor View Post


    The 130% figure illustrates that the application was able to split across two processors (parallelism), rather than max out just one, but some of the work was also offloaded to the GPU's, which would not be reported in the CPU utilization. The 130% alone is *not* the full processing power being tapped.



    There's two technologies being used here:

    - Grand Central Dispatch - utilizing multiple CPU's/cores

    - OpenCL - utilizing graphics processors



    Did I miss it - did the article or that comment mention how many CPUs were in the test machine? When I read that I was thinking dual-core machine where before only 1 core could be tapped and now both cores are used for a 30% boost in performance. I was thinking this because aren't the majority of Intel Macs out there dual core? Core 2 or Core 2 Duo? iMac, Mac Mini, MacBook, MacBook Pro? Sure there are 8 core desktops and xServe - and that is more likely where you would expect to see these apps - just saying that the configuration of the test platform was not obvious to me.
  • Reply 9 of 55
    iqatedo Posts: 1,606 member
    Quote:
    Originally Posted by camroidv27 View Post


    This type of thing is the wave of the future. GPUs are quite powerful chips too, so I'm glad to see this.



    I was encoding a video file today on my PC (with specs a little better than the lowest Mac Pro, including the Xeon processor) and with CUDA functions enabled along with multi-threading, I was amazed at how fast this thing cranked through video! And that's just on *Gasp* Vista! If my PC can go that fast, I imagine the Mac must be....

    ... well, probably about the same speed, maybe a little faster. But still, it should be amazingly blinding fast too!



    Hi



    Have you tried Windows 7 on your system yet? Just interested in a speed comparison.



    Looking forward to the new iMacs, interesting to see where they are going.
  • Reply 10 of 55
    iqatedo Posts: 1,606 member
    Quote:
    Originally Posted by Marvin View Post


    Not close to the 400% it could be though. It would be nice if video compressors encoded groups of pictures in separate threads in parallel, then just cached the output so it would write to file in order.



    The OpenCL speedup was quite low considering even the 9400M can rival an 8-core Mac Pro CPU in some cases.



    I'm sure it will take time to get to grips with these technologies in the best ways though and it's good to see real-world uses.



    Apple aficionados knew that SL was an 'under-the-hood' revolution - it'll be nice now to hear of and to witness the pay-off.
  • Reply 11 of 55
    Just so you guys know, a factor of 10-30 in performance is not uncommon for scientists who use CUDA. Given the similarities between OpenCL and CUDA, we should (hopefully) see a lot of improvement in the near future. Here is a link (http://sussi.megahost.dk/~frigaard/) to a standard piece of scientific code G2X (it does N-body calculations) modified to use CUDA by C. FRIGAARD (go to the bottom) which gets a factor of 30 for one subroutine and a factor of 10 overall.
  • Reply 12 of 55
    Once again proving that Snow Leopard is more about positioning the Mac platform for the future than trying to drum up massive sales (which is happening anyway). And, I guess, it's also about "encouraging" people to upgrade their hardware, too.
  • Reply 13 of 55
    iqatedo Posts: 1,606 member
    Quote:
    Originally Posted by Cubert View Post


    Once again proving that Snow Leopard is more about positioning the Mac platform for the future than trying to drum up massive sales (which is happening anyway). And, I guess, it's also about "encouraging" people to upgrade their hardware, too.



    Just doubling my MBP (3,1) RAM to 4 GB and replacing the 160 GB HD with a 500 GB one made a huge difference after installing SL. An old dog taught new tricks - love it.
  • Reply 14 of 55
    Quote:
    Originally Posted by lilgto64 View Post


    Did I miss it - did the article or that comment mention how many CPUs were in the test machine? When I read that I was thinking dual-core machine where before only 1 core could be tapped and now both cores are used for a 30% boost in performance. I was thinking this because aren't the majority of Intel Macs out there dual core? Core 2 or Core 2 Duo? iMac, Mac Mini, MacBook, MacBook Pro? Sure there are 8 core desktops and xServe - and that is more likely where you would expect to see these apps - just saying that the configuration of the test platform was not obvious to me.



    I checked Hardmac and came up with "Mac Pro 2007 (Quad Core 2.66 GHz with a GeForce 8800 GT)". I'm not sure if the CPU is hyper-threaded or not, but I would guess 'Yes'. I think the Mac Pro has Xeon CPUs.



    I very much approve of Apple doing a separate Mac OS X release as a foundational release without the distraction of new features orientated toward the consumer. This takes courage, and has been a good software development strategy in my opinion.



    Edit: searched apple.com/support and came up with Xeon 5400 Series, not hyper-threaded. So the test would be on 4 CPU threads.
  • Reply 15 of 55
    A bit off topic, but something this "performance boost" talk reminded me of: I seem to remember a couple of years ago (probably here on AI) there being mention that Leopard was bloated because some sort of developer files were left in the OS that should have been removed when it went to Golden Master.



    Does anyone know what I'm talking about? If you do, do you know if the 6GB freed up in Snow Leopard is a true improvement, or does it come just from cleaning up the bloat that should never have been there in the first place?



    Or maybe I'm getting my info totally crossed
  • Reply 16 of 55
    Quote:
    Originally Posted by LighteningKid View Post


    A bit off topic, but something this "performance boost" talk reminded me of: I seem to remember a couple of years ago (probably here on AI) there being mention that Leopard was bloated because some sort of developer files were left in the OS that should have been removed when it went to Golden Master.



    Does anyone know what I'm talking about? If you do, do you know if the 6GB freed up in Snow Leopard is a true improvement, or does it come just from cleaning up the bloat that should never have been there in the first place?



    Or maybe I'm getting my info totally crossed



    You're thinking of all the talk about debug code being left in OS X. Back when OS X was slow (remember how Finder windows would not move around cleanly?), lots of people were saying Apple had left debugging code in the system. That was a bunch of bull.



    The 6GB of disk space reclaimed in Snow Leopard was mostly from removing PowerPC support.



    The performance improvements in SL come from some massive under-the-hood optimizations, not from removing PPC support: in a Universal app, the code to execute was chosen at run time, and the code for the other CPU architectures simply remained on disk.
  • Reply 17 of 55
    Oh my god... we need an OpenCL and GrandCentral ready version of Handbrake...
  • Reply 18 of 55
    Quote:
    Originally Posted by Denmaru View Post


    Oh my god... we need an OpenCL and GrandCentral ready version of Handbrake...



    Totally. The PC equivalent using ATI's Stream (and MediaEspresso something) is fast but buggy. http://badaboomit.com/ for Nvidia cards on PC promises fast encoding, haven't tried it as I have an ATI 4830 512MB in my PC.



    Freeware DVD and BluRay transcoder, using OpenCL and GrandCentral would equal big Win.
  • Reply 19 of 55
    Anyone know how much work had to be put into this developer's application to take advantage of these technologies? I'm guessing it's not as simple as checking a couple of check boxes.
  • Reply 20 of 55
    I had a previous post regarding OpenCL performance on the MacBook Pro. Anyone who has access to the developer tools and sample code could try this on a Mac Pro and report back.



    Here is the link to the post with more details.



    The other important aspect of OpenCL, which seems misunderstood, is that it is not GPU-only. It is a technology that takes advantage of all the compute resources available, including CPUs, GPUs, DSPs (Digital Signal Processors) or any custom encoding/decoding chips. I am not sure whether Apple has ideas beyond the CPU and GPU right now, but this is what COULD be done with OpenCL.



    Both OpenCL and GCD add flexibility for future architectures, e.g. cell-like processors or Larrabee architecture. Considering the fact that Apple controls the hardware, this could be a great advantage for Apple, if a new powerful architecture emerges.