#1 |
|
Kasper's Automated Slave
Join Date: Nov 1997
Posts: 6,169
|
Snow Leopard's Grand Central, OpenCL boost app by 50%
A developer reports seeing a 50% jump in real-world performance after adding initial support for two new Snow Leopard technologies: Grand Central Dispatch and OpenCL.
As reported by the French site HardMac, MovieGate developer Christophe Ducommun found his app jumped from 104 frames per second encoding under Leopard to 150 fps on the same hardware under Snow Leopard after implementing support for the new features. Grand Central Dispatch helps developers make efficient use of the multiple processors available on the system, and OpenCL enables applications to make use of the latent power within available video card GPUs.
In addition to an overall performance boost in output, Ducommun also reported that CPU utilization for MPEG-2 encoding under the open source ffmpeg library leapt from 100% to 130% on his quad-core Mac Pro, indicating a significant improvement in tapping its multicore potential. At the same time, decoding operations dropped CPU utilization from 165% to 70% under Snow Leopard because a significant amount of the work could be delegated to the machine's GPU.
The observations illustrate how Snow Leopard's Grand Central Dispatch and OpenCL combine to improve performance both in raw computing power and in increased use of otherwise idle hardware. In particular, they indicate the potential for GPU delegation to speed things up while reducing the load on the CPU for some operations, particularly video playback. The battery life of mobile devices stands to benefit from such optimizations. |
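As a quick check on the figures quoted above, the relative changes can be worked out directly. This is only a back-of-the-envelope sketch using the numbers from the article; the helper name `percent_change` is made up for illustration and is not part of any tool mentioned here.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures quoted in the article (Leopard -> Snow Leopard, same hardware).
fps_gain = percent_change(104, 150)    # encoding frame rate
encode_cpu = percent_change(100, 130)  # CPU utilization while encoding
decode_cpu = percent_change(165, 70)   # CPU utilization while decoding

print(f"frame rate: {fps_gain:+.1f}%")
print(f"encode CPU: {encode_cpu:+.1f}%")
print(f"decode CPU: {decode_cpu:+.1f}%")
```

By this arithmetic the frame rate gain comes out at roughly 44%, so the 50% in the headline looks like a rounded figure, while the decoding CPU load falls by a bit more than half.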
|
|
|
|
|
#2 |
|
Registered User
Join Date: Jan 2007
Posts: 49
|
Interesting news...
Sounds awesome.
Be good to hear (or see) this outcome for a number of other apps too... Maybe AppleInsider could keep a list online? |
|
|
|
|
|
#3 |
|
Registered User
Join Date: Nov 2006
Location: Arizona
Posts: 334
|
This type of thing is the wave of the future. GPUs are quite powerful chips too, so I'm glad to see this.
I was encoding a video file today on my PC (with specs a little better than the lowest Mac Pro, including the Xeon processor) and with CUDA functions enabled along with multi-threading, I was amazed at how fast this thing cranked through video! And that's just on *Gasp* Vista! If my PC can go that fast, I imagine the Mac must be.... ... well, probably about the same speed, maybe a little faster. But still, it should be amazingly blinding fast too!
openSuSe 11.2, 32 and 64 bit, for Mac and PC!
"Shiny capt'n. Everything thing is A-Okay." |
|
|
|
|
|
#4 |
|
Registered User
Join Date: Nov 2001
Location: The UK of Englandshire
Posts: 985
|
I've seen pure compute demos using scientific mapping software that resulted in a 50X speed increase. Many algorithms aren't suited to massive-parallelism but when they are you can see huge gains. Apple have done some seriously good work in Snow Leopard that should attract a lot of previously ambivalent developers.
|
|
|
|
|
|
#5 | |
|
Global Moderator
Join Date: Feb 2006
Posts: 5,251
|
Quote:
The OpenCL speedup was quite low considering even the 9400M can rival an 8-core Mac Pro CPU in some cases. I'm sure it will take time to get to grips with these technologies in the best ways though and it's good to see real-world uses. |
|
|
|
|
|
|
#6 |
|
Registered User
Join Date: Feb 2009
Location: Somewhere in the Cheese
Posts: 464
|
Sounds good, but Handbrake on my Mac Pro uses between 500% and 750% of the CPU. Unless we are talking different things, 130% seems low.
It was a widely held belief by the smartest people in late 1400's Europe that human knowledge and indeed civilisation itself, had advanced to such a nearly complete and perfect state, that the "end times" were certainly almost upon them.
|
|
|
|
|
|
#7 | |
|
Registered User
Join Date: Jan 2007
Location: methane seas of neptune
Posts: 1,488
|
Quote:
Change your company's name. Not that big of a deal.
The Beatles . |
|
|
|
|
|
|
#8 | |
|
Registered User
Join Date: Jul 2004
Location: Van Isle, BC, Canada
Posts: 209
|
Quote:
There are two technologies being used here:
- Grand Central Dispatch: utilizing multiple CPUs/cores
- OpenCL: utilizing graphics processors |
|
|
|
|
|
|
#9 | |
|
Registered User
Join Date: Apr 2005
Location: The Northcoast
Posts: 127
|
how many CPUs on the test machine?
Quote:
|
|
|
|
|
|
|
#10 | |
|
Registered User
Join Date: Jun 2003
Location: Where East meets West
Posts: 230
|
Quote:
Have you tried Windows 7 on your system yet? Just interested in a speed comparison. Looking forward to the new iMacs, interesting to see where they are going.
Where are we on the curve? We'll know once it goes asymptotic!
|
|
|
|
|
|
|
#11 | |
|
Registered User
Join Date: Jun 2003
Location: Where East meets West
Posts: 230
|
Quote:
Where are we on the curve? We'll know once it goes asymptotic!
|
|
|
|
|
|
|
#12 |
|
Registered User
Join Date: Sep 2009
Posts: 2
|
good but could be better....
Just so you guys know, a factor of 10-30 in performance is not uncommon for scientists who use CUDA. Given the similarities between OpenCL and CUDA, we should (hopefully) see a lot of improvement in the near future. Here is a link (http://sussi.megahost.dk/~frigaard/) to a standard piece of scientific code G2X (it does N-body calculations) modified to use CUDA by C. FRIGAARD (go to the bottom) which gets a factor of 30 for one subroutine and a factor of 10 overall.
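To give a feel for why N-body codes see such large factors on GPUs, here is a minimal direct-sum sketch in plain Python. It is not the G2X code and it does not use CUDA or OpenCL; the function name, the `softening` parameter and the choice of G = 1 are assumptions made purely for illustration. The point is that every pass of the outer loop is independent of the others, which is the kind of work a GPU can spread across many threads.

```python
import math


def accelerations(positions, masses, softening=1e-3, G=1.0):
    """Direct-sum gravitational acceleration on every body, O(N^2) work."""
    n = len(positions)
    result = []
    for i in range(n):  # each i depends only on read-only inputs
        xi, yi, zi = positions[i]
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            # Softening keeps the force finite when two bodies get very close.
            dist2 = dx * dx + dy * dy + dz * dz + softening * softening
            inv_d3 = 1.0 / (dist2 * math.sqrt(dist2))
            ax += G * masses[j] * dx * inv_d3
            ay += G * masses[j] * dy * inv_d3
            az += G * masses[j] * dz * inv_d3
        result.append((ax, ay, az))
    return result


# Two unit masses one unit apart pull on each other with equal, opposite force.
pair = accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0], softening=0.0)
print(pair)
```

In a GPU version the outer loop would typically become one work item per body, with the inner loop left as is; how much of the quoted factor of 10 to 30 a given code reaches depends on the hardware and on how much of the program looks like this loop.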
|
|
|
|
|
|
#13 |
|
Registered User
Join Date: Jun 2005
Location: Philadelphia
Posts: 472
|
Once again proving that Snow Leopard is more about positioning the Mac platform for the future than trying to drum up massive sales (which is happening anyway). And, I guess, it's also about "encouraging" people to upgrade their hardware, too.
|
|
|
|
|
|
#14 | |
|
Registered User
Join Date: Jun 2003
Location: Where East meets West
Posts: 230
|
Quote:
Where are we on the curve? We'll know once it goes asymptotic!
|
|
|
|
|
|
|
#15 | |
|
Registered User
Join Date: May 2009
Posts: 82
|
Quote:
I very much approve of Apple doing a separate Mac OS X release as a foundational release without the distraction of new features orientated toward the consumer. This takes courage, and has been a good software development strategy in my opinion.
Edit: searched apple.com/support and came up with Xeon 5400 Series, not hyper-threaded. So the test would be on 4 CPU threads.
Last edited by BertP; 09-18-2009 at 12:31 AM. Reason: Info on number of threads. |
|
|
|
|
|
|
#16 |
|
Registered User
Join Date: Jun 2009
Location: Canada
Posts: 20
|
A bit off topic...
A bit off topic, but something this "performance boost" talk reminded me of: I seem to remember a couple of years ago (probably here on AI) there being mention that Leopard was bloated because some sort of developer files were left in the OS that should have been removed when it went to Golden Master.
Does anyone know what I'm talking about? If you do, do you know if the 6GB freed up in Snow Leopard is a true improvement, or does it come just from cleaning up the bloat that should never have been there in the first place? Or maybe I'm getting my info totally crossed. |
|
|
|
|
|
#17 | |
|
Registered User
Join Date: Sep 2009
Location: Columbus, OH
Posts: 6
|
Quote:
The 6GB of disk space reclaimed in Snow Leopard was mostly from removing PowerPC support. The performance improvements in SL come from some massive under-the-hood optimizations, not from removing PPC support as the code which executed in Universal apps was determined at run-time and the other CPU support would remain on disk. |
|
|
|
|
|
|
#18 |
|
Registered User
Join Date: Oct 2004
Location: Vienna
Posts: 182
|
Oh my god... we need an OpenCL and GrandCentral ready version of Handbrake...
Now running on a 20" aluminium iMac (Fall 2008), as well as a MacBook Pro 13" (mid 2009) and an iPhone.
|
|
|
|
|
|
#19 | |
|
Registered User
Join Date: Feb 2007
Posts: 3,706
|
Quote:
The PC equivalent using ATI's Stream (and MediaEspresso something) is fast but buggy. http://badaboomit.com/ for Nvidia cards on PC promises fast encoding; I haven't tried it as I have an ATI 4830 512MB in my PC.
A freeware DVD and Blu-ray transcoder using OpenCL and Grand Central would equal a big win. |
|
|
|
|
|
|
#20 |
|
Registered User
Join Date: Jun 2005
Location: UK
Posts: 114
|
Anyone know how much work had to be put into this developer's application to take advantage of these technologies? I'm guessing it's not as simple as checking a couple of check boxes.
|
|
|
|
|
|
#21 |
|
Registered User
Join Date: Feb 2005
Posts: 347
|
OpenCL performance
I had a previous post regarding OpenCL performance on MacBook Pro. Anyone who has access to the developer tools and sample code could try this on a MacPro and report back.
Here is the link to the post with more details. The other important aspect of OpenCL, which seems misunderstood, is that it is not only a GPGPU technology. It takes advantage of all compute resources available, including CPUs, GPUs, DSPs (digital signal processors) or any custom encoding/decoding chips available. I am not sure whether Apple has ideas beyond CPU and GPU right now, but this is what COULD be done with OpenCL. Both OpenCL and GCD add flexibility for future architectures, e.g. Cell-like processors or the Larrabee architecture. Considering that Apple controls the hardware, this could be a great advantage for Apple if a new powerful architecture emerges. |
|
|
|
|
|
#22 |
|
Registered User
Join Date: Feb 2005
Posts: 347
|
Definitely not as simple as checking a couple of check boxes. GCD has support at Cocoa level, which makes its use much simpler. Also, Cocoa applications can enjoy a "free lunch" sometimes: Apple says that CoreImage was re-written using OpenCL and got a 30% (AFAIR) boost on average. Some of the Leopard classes (NSOperation?) take advantage of GCD without a recompile. But the rest of the code needs to change. And OpenCL may need changes to the software architecture. OpenCL is good for a relatively limited number of tasks, N-body calculations being the showcase example.
|
|
|
|
|
|
#23 | |
|
Registered User
Join Date: Oct 2008
Posts: 14
|
Quote:
|
|
|
|
|
|
|
#24 |
|
Registered User
Join Date: Jun 2009
Posts: 2
|
Let's hope Apple applies all this new technology to a rewrite of their fantastic quality, but fantastically slow, Aperture app. Having just spent £130 on it and having to stop using it (except for my portfolio grade RAW conversions) as it is so slow to process RAW files (and I know I'm not alone in this problem), I really hope there is some scope for still image as well as video improvements.
|
|
|
|
|
|
#25 |
|
Registered User
Join Date: Feb 2005
Posts: 347
|
These new technologies require a paradigm shift from developers. The future seems to be heading there and Apple skates "where the puck is going to be". An unexpected breakthrough in semiconductor/processor technology could bring the free ride on core speed back, however. Very unlikely, but possible.
|
|
|
|
|
|
#26 | |
|
Registered User
Join Date: Feb 2005
Posts: 347
|
Quote:
|
|
|
|
|
|
#27 |
|
Registered User
Join Date: Oct 2008
Location: Detroit, MI
Posts: 123
|
Slightly. You should be able to split the video into chunks by keyframes. Then dependencies aren't a problem. And while the process isn't maximally parallelized, it's a fairly straightforward change for a huge benefit that can be implemented before a serious re-write that gets in deeper.
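A rough sketch of that idea, assuming each frame is tagged with whether it is a keyframe: split the stream at keyframes, hand each chunk to a worker, and stitch the results back together in order. Everything here is hypothetical; the frame dictionaries, `encode_chunk` and the worker count are stand-ins, and a real encoder would replace the placeholder function.

```python
from concurrent.futures import ThreadPoolExecutor


def split_at_keyframes(frames):
    """Group frames into chunks that each start at a keyframe."""
    chunks = []
    for frame in frames:
        # Start a new chunk at every keyframe (and for the very first frame).
        if frame["key"] or not chunks:
            chunks.append([])
        chunks[-1].append(frame)
    return chunks


def encode_chunk(chunk):
    """Placeholder for a real encoder: returns the ids of the frames it handled."""
    return [frame["id"] for frame in chunk]


def encode_parallel(frames, workers=4):
    """Encode independent chunks concurrently and keep the original order."""
    chunks = split_at_keyframes(frames)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        encoded = list(pool.map(encode_chunk, chunks))  # map preserves chunk order
    return [frame_id for chunk in encoded for frame_id in chunk]


# Twelve frames with a keyframe every fifth frame gives three chunks.
frames = [{"id": i, "key": i % 5 == 0} for i in range(12)]
print(len(split_at_keyframes(frames)), encode_parallel(frames))
```

Threads in standard CPython will not actually speed up CPU-bound encoding, so this shows only the structure. On Snow Leopard the chunks would more plausibly be submitted to a GCD queue, and in Python to a process pool.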
|
|
|
|
|
|
#28 | |
|
Registered User
Join Date: Sep 2009
Posts: 2
|
Quote:
|
|
|
|
|
|
|
#29 |
|
Registered User
Join Date: Jun 2008
Posts: 888
|
Windows 7 will now require 4 GB of RAM as a minimum and applications running on Windows 7 will use 30% more CPU cycles to keep it from crashing.
|
|
|
|
|
|
#30 | |
|
Registered User
Join Date: Apr 2007
Location: Chicago
Posts: 82
|
Quote:
|
|
|
|
|
|
|
#31 | |
|
Registered User
Join Date: Feb 2009
Location: Somewhere in the Cheese
Posts: 464
|
Quote:
I'm taking the 130% (and my 750%) as direct indications of how many cores are in use (1.3 and 7.5), but as I say I'm not totally sure of that. It's nice to see real world implementations of this sort of thing so soon either way. Too often someone invents a really cool and much better way to do things and yet it's never implemented because of some foolish capitalist or legal reason that has no bearing on the technology itself. Hopefully the uptake on these technologies will be better than that.
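For what it's worth, that reading can be written out as below, assuming the convention in which 100% means one fully busy core. The function names are made up, the 4 comes from the Xeon 5400 note earlier in the thread, and the 8 cores for the Handbrake machine is an assumption since the thread does not state it.

```python
def cores_in_use(cpu_percent: float) -> float:
    """Cores' worth of work, assuming 100% means one fully busy core."""
    return cpu_percent / 100.0


def share_of_machine(cpu_percent: float, total_cores: int) -> float:
    """Percent of the whole machine's CPU capacity that is in use."""
    return cpu_percent / total_cores


print(cores_in_use(130))          # encoder figure from the article
print(share_of_machine(130, 4))   # quad-core test machine
print(share_of_machine(750, 8))   # assumed 8-core Mac Pro running Handbrake
```

On those assumptions the encoder in the article is using about a third of the quad-core machine, while Handbrake at 750% would be using nearly all of an 8-core one.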
It was a widely held belief by the smartest people in late 1400's Europe that human knowledge and indeed civilisation itself, had advanced to such a nearly complete and perfect state, that the "end times" were certainly almost upon them.
|
|
|
|
|
|
|
#32 | |
|
Registered User
Join Date: Apr 2007
Location: Chicago
Posts: 82
|
Quote:
|
|
|
|
|
|
|
#33 | |
|
Registered User
Join Date: Apr 2007
Location: Chicago
Posts: 82
|
Quote:
1 GB = 1024 MB = 1024*1024*1024 B ≈ 1.074 billion bytes (decimal), or a 7% "gain" in disk size. I assume files also store a little more efficiently with the smaller Byte sized blocks? Anyone know what the true skinny is on this? |
|
|
|
|
|
|
#34 | |
|
Registered User
Join Date: Apr 2007
Location: Chicago
Posts: 82
|
Quote:
http://www.macresearch.org/cocoa-sci...-grand-central
Last edited by DESuserIGN; 09-18-2009 at 01:06 PM. |
|
|
|
|
|
|
#35 | |
|
Registered User
Join Date: Apr 2009
Posts: 80
|
Quote:
|
|
|
|
|
|
#36 | ||
|
Registered User
Join Date: Apr 2006
Location: The Ansible
Posts: 11,895
|
Quote:
There hasn’t been a new update for a year so I am doubtful that we’ll see anyone take the ball and run with it at this point.
Quote:
As for your 7%, that is only for 1GB. At the terabyte scale (a common size now) that discrepancy jumps to 10%. Apple should have gone a step further and used the kibi-, mebi-, gibi- and tebibyte prefixes of the IEC standard so that they are not confused with the SI standard's kilo-, mega-, giga- and terabyte, now that they are using base 10 in the OS UI. I can’t think of anything else that uses exactly the same notation to represent two similar but very distinct quantities in math. It’s fraught with issues.
• http://www.iec.ch/zone/si/si_bytes.htm |
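The 7% and 10% figures follow directly from comparing powers of 1024 with powers of 1000. A small sketch of that arithmetic, with the function and dictionary names invented for illustration:

```python
# Power of 1000 (SI) or 1024 (IEC) behind each prefix.
PREFIXES = {"kilo": 1, "mega": 2, "giga": 3, "tera": 4}


def binary_vs_decimal_gap(power: int) -> float:
    """Percent by which 1024**power exceeds 1000**power."""
    return (1024 ** power / 1000 ** power - 1) * 100


for name, power in PREFIXES.items():
    print(f"{name}: {binary_vs_decimal_gap(power):.1f}%")
```

The gap grows with each prefix: about 7.4% at the giga level and just under 10% at the tera level, which is why the mismatch is more noticeable on today's larger drives.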
||
|
|
|
|
|
#37 | |
|
Registered User
Join Date: Apr 2007
Location: Chicago
Posts: 82
|
Quote:
Also my other thought was that there might be savings due to a change in block sizes (? not sure if that's the right term) on the disk. Smaller blocks might be slightly more efficient at storing some kinds of files (but this is totally wild ass uninformed speculation on my part.) |
|
|
|
|
|
|
#38 | |
|
Registered User
Join Date: Apr 2006
Location: The Ansible
Posts: 11,895
|
Quote:
|
|
|
|
|
|
|
#39 | |
|
Registered User
Join Date: Sep 2009
Posts: 4
|
Quote:
Handbrake can already efficiently use all cores on multicore machines since the x264 library it uses supports this. It has done so for some time, long before Grand Central Dispatch came along. So there is nothing to gain there.
Moreover, the x264 devs have already looked into OpenCL/CUDA and (from memory) concluded there is not much they can gain from it. GPUs may well be fast but they have some serious limitations, which in the case of H.264 encoding make them less than ideal (note I said encoding, not decoding...).
Last but not least, Handbrake is multi-platform. They support Linux and Windows as well as OS X, so they are unlikely to start a widespread rewrite for some new technology only available on one platform.
Chris |
|
|
|
|
|
|
#40 |
|
Registered User
Join Date: Jun 2003
Location: BC Canada
Posts: 6
|
Space saving is more than the loss of PPC code
The space saving in 10.6 is a lot more than stripping PPC code and re-defining a GB: Apple actually optimized a ton of code AND implemented compression across the board under the hood. If you haven't read the Ars review, here is a full page detailing it: http://arstechnica.com/apple/reviews...s-x-10-6.ars/3
|
|
|
|