Originally Posted by Brendon
What if there were a way to design a Velocity engine that had several different specialized cores in it? One for H264, another for gaming physics, another for HDTV Blu-ray, another for DNA sequencing stuff. In short, a few cores that are very fast at what they do and add great value to Apple machines.
This is the opposite of what they should do. Putting in extremely specialized hardware means any given piece of hardware is only used when that task is being performed. Your physics hardware would be inert while you watched a movie, for example. Plus you would only be faster at things you provided accelerators for. Sorry, bad idea.
The more likely approach (and this is inevitably going to happen in the computing world in general... just look at how GPUs are evolving) is to add "general purpose vector processors". This may sound like an oxymoron, but it's not. Most processors are inherently "scalar" in nature -- they are optimized for doing one operation on one data value at a time, and doing those operations sequentially as fast as possible. Most modern scalar processors have some SIMD (i.e. vector) capabilities tacked on the side, and under the hood they speed up their scalar processing using things like pipelining, caching, etc. A general purpose vector processor moves the SIMD capabilities front-and-center, and optimizes for highly regular calculations on long streams of data. This kind of array processing underpins video encoding/decoding, image processing, audio processing, 3D graphics, physics, supercomputing, DNA sequencing, and so on. A gang of general purpose vector processors (like the Cell contains) can do extremely impressive amounts of calculation of this type.
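To make the scalar vs. vector distinction concrete, here is a rough sketch in plain Python. It only models the shape of the two styles (Python itself gains nothing from the second form), and the lane width of 4 and the function names are my own assumptions for illustration, not anything tied to real hardware.

```python
# Illustrative only: this models how the work is grouped, not real SIMD speed.

LANES = 4  # hypothetical vector width (assumption, not a real hardware value)


def saxpy_scalar(a, xs, ys):
    """Scalar style: one multiply-add on one data value per step."""
    out = []
    for x, y in zip(xs, ys):
        out.append(a * x + y)
    return out


def saxpy_vector(a, xs, ys, lanes=LANES):
    """Vector style: each step handles a whole group of `lanes` values."""
    out = []
    for i in range(0, len(xs), lanes):
        x_vec = xs[i:i + lanes]
        y_vec = ys[i:i + lanes]
        # On real SIMD hardware, this whole group would be one instruction.
        out.extend([a * x + y for x, y in zip(x_vec, y_vec)])
    return out
```

Both functions compute the same result; the point is that the second one does the same regular operation across a block of data at once, which is the pattern a vector processor is built around.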
The big problem is usually that they are painful to program using normal programming tools. A few years ago, however, a new way to program certain common specialized vector processors was introduced... shader languages (Cg, HLSL, GLSL) in the OpenGL and Direct3D APIs give developers an easy way to write highly concurrent programs for very specialized vector hardware (GPUs). This has led to a revolution in 3D graphics. More recently nVidia introduced CUDA, which is an attempt to give non-graphics developers a way to more easily write code for their GPUs. Unfortunately it is still fairly low level and rather specific to nVidia's hardware. Very recently Apple announced OpenCL as their initiative to introduce a standardized way for developers to write programs that will run on any processor. The developer writes in OpenCL, and when the program is run it is compiled for the hardware on which it is being run (likely, and I'm speculating here, using LLVM). If there is no programmable GPU and only a conventional x86, then it runs relatively slowly. If there is an 8-core, AVX-equipped x86-64 CPU, then it runs much faster. If there is a programmable GPU, then it runs even faster. And (just maybe) if there is a pool of "wicked fast" custom Apple vector processors in the memory controller, then it runs faster than a speeding bullet.
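The "write once, pick the best hardware at run time" idea can be sketched roughly like this. To be clear, this is not the OpenCL API; the device names, the preference order, and both functions are hypothetical, and the "compile" step is left out entirely.

```python
# Hypothetical device names, ordered fastest to slowest (assumption).
PREFERENCE = [
    "custom_vector_unit",
    "programmable_gpu",
    "simd_cpu",
    "scalar_cpu",
]


def pick_device(available):
    """Return the most preferred device that this machine actually has."""
    for device in PREFERENCE:
        if device in available:
            return device
    raise RuntimeError("no usable device found")


def run_kernel(kernel, data, available):
    """Choose a device, then apply the same kernel to every element.

    A real runtime would compile the kernel for the chosen device here;
    this sketch just runs it in Python, so only the choice is modelled.
    """
    device = pick_device(available)
    results = [kernel(x) for x in data]  # same answer on every device
    return device, results
```

For example, `run_kernel(lambda x: x * x, [1, 2, 3], {"scalar_cpu", "simd_cpu"})` would select `"simd_cpu"`. The developer's code stays the same no matter what is in the machine; only the speed changes.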
And even if a given developer doesn't use OpenCL, Apple will. Apple's OpenGL, CoreAudio, QuickTime, CoreAnimation, etc. will very likely be reimplemented using OpenCL and thus will automatically take advantage of whatever processing horsepower you might have in your machine. Sure, it might not be as fast as if some developer had written their code specifically for a particular piece of hardware, but it will be dramatically faster than ordinary scalar C/C++ code, and it will automatically adapt to whatever your machine happens to have in it.