Intel at CES to show off next-gen of Apple-bound Sandy Bridge processors

marvin · December 6, 2010 5:43AM

Quote:

Originally Posted by blueeddie

So, in the end, will Intel's Sandy Bridge be available for MBP 15/17in during February? If no, why not? Is it because of the Graphics support?

I highly doubt Apple will have no updates to the MacBook Pro in early 2011 (not as late as April, maybe Feb). I'm just wondering: will this include Sandy Bridge?

There's nothing stopping the 15"/17" coming out with Sandy Bridge even in January. The graphics issue only affects the lower-end where Apple rightly choose to have NVidia IGPs instead of Intel's.

13" - GPU comes with motherboard (NVidia) paired with Intel Core 2 Duo as NVidia have a license for it

15" - motherboard is Intel's and NVidia GPU is dedicated as NVidia have no license to make motherboards for Intel i-series processors

Apple have now skipped out on 2 generations of i-series chips in the low-end but the i3 mobile chips aren't all that fast. The desktop ones perform really well as they have a higher clock speed but the mobile i3 only performs about 15% faster than the Core 2 Duo. If they go Intel on the low-end, it's a 15% bump in CPU with a 100% drop in GPU.

The only chips worth using are the mobile i5 and i7 and since they are in the 15/17, they likely won't go into the low-end.

Apparently Intel have hired one of the guys from AMD's GPGPU dept:

http://www.brightsideofnews.com/news...-to-intel.aspx

So the whole OpenCL deal on the GPU might happen but not in time for Sandy Bridge.

MacBook 2.4GHz, 250GB, 2GB, 320M, $999

MBA 1.4GHz, 64GB SSD, 2GB, 320M, $999

MBA 1.86GHz, 128GB SSD, 2GB, 320M, $1299

MBP 2.4GHz, 250GB, 4GB, 320M, $1199

MBP 2.66GHz, 320GB, 4GB, 320M, $1499

MBP 2.4 i5, 320GB, 4GB, 330M, $1799

MBP 2.53 i5, 500GB, 4GB, 330M, $1999

MBP 2.66 i5, 500GB, 4GB, 330M, $2199

What they could do is the following:

MBA 1.86GHz, 128GB SSD, 2GB, 320M, $999 ($1099 with 256GB)

MBA 2.13GHz, 256GB SSD, 4GB, 320M, $1199 ($1399 with 512GB)

MBP 2.5 i5, 256GB SSD, 4GB, 420M, $1699

MBP 2.6 i5, 512GB SSD, 4GB, 420M, $1999

Any lack of internal storage is made up for with Light Peak external drives, and the 15" i5s can have an extra 2.5" internal drive. No optical, all instant-on, all have Light Peak, all have good NVidia GPUs.

The entry Air will be 30% slower than the old MB but the SSD and weight will more than make up for that in terms of overall experience. If it doesn't happen at this revision, it will happen soon. Given that the MBA just launched, it's likely they would save this kind of change for late 2011 but Intel has given them nothing to use for the next update. They have 3 choices on the low-end:

- cram an i5 + 415M dedicated into a 13"

- use minor CPU bumps with NVidia IGP = 2.66GHz C2D + 320M IGP

- drop the 13" MB and MBP in favour of the Air.

Marketing-wise, I reckon the 3rd option is the best if they can get the 1.86GHz into the entry model.

blueeddie · December 6, 2010 1:23PM

Quote:

Originally Posted by Marvin

There's nothing stopping the 15"/17" coming out with Sandy Bridge even in January. The graphics issue only affects the lower-end where Apple rightly choose to have NVidia IGPs instead of Intel's.

That's exactly the thoughtful insight I have been looking for... or probably just because it's the news I'd like to hear. I'm still desperately waiting for an update to the 15in i7 with SSD... hoping the updated one in early 2011 with Sandy Bridge will be faster than the current 15in i7 2.8GHz...

wizard69 · December 6, 2010 2:07PM

Quote:

Originally Posted by backtomac

I don't know this for sure, but I would bet that applications that could leverage OCL on a 320M would be significantly faster than the same application run on a Sandy Bridge processor (CPU only) of the kind that would go in a MB, mini or MBA. Those are likely dual-core CPUs (maybe with hyperthreading) that'll clock under 3.0GHz.

When talking about laptops it is difficult to say exactly what we will be seeing performance-wise. First, most of Sandy Bridge's improvements are in the areas that compete with GPU computing. That is, the x86 integer units are not especially faster than on older hardware, but the FP and SIMD units are vastly improved. So there is not enough data right now to say how good or bad OpenCL on a Sandy Bridge CPU would be.

In general, though, the extremely wide units in a GPU should have a huge advantage over the processor in SB for code that can be run on the GPU. Of course there will be cases (for the marketing department) where CPU-executed OpenCL code will be faster. I do not think this will be the norm, though. Most of the time, well-fitted code will be ten to a hundred times faster on a GPU.

Quote:

IIRC, applications that ran on OCL using a 9400m were done faster than on 2.0 ghz dual C2Ds MPs. That bakeoff was a while ago so I may be mistaken. I bet Marvin would remember and could add more to the discussion.

This isn't really something that can be baked off. The problem is that the GPU's advantage varies widely depending on exactly what is being processed on it. Even if the code runs 1:1 with the CPU, you can still gain from the GPU because it executes in parallel with the CPU. This is a good thing, as the CPU needs to do a lot of data setup to prep the work for the GPU.
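
A toy timing model makes the overlap argument concrete. This is a sketch under stated assumptions, not a benchmark: the function names and unit costs are hypothetical, and it assumes the CPU's per-batch setup and the GPU's per-batch compute each take a fixed time and overlap perfectly. Even in the 1:1 case where the GPU is no faster than the CPU, preparing batch i+1 while the GPU works on batch i nearly halves the total.

```python
def sequential_time(n_batches: int, setup: float, compute: float) -> float:
    """CPU prepares a batch, then sits idle while that batch is computed."""
    return n_batches * (setup + compute)


def pipelined_time(n_batches: int, setup: float, compute: float) -> float:
    """CPU prepares batch i+1 while the GPU computes batch i.

    Two-stage pipeline: the first setup and the last compute cannot overlap
    with anything; every step in between costs the slower of the two stages.
    """
    if n_batches <= 0:
        return 0.0
    return setup + (n_batches - 1) * max(setup, compute) + compute


if __name__ == "__main__":
    # Hypothetical 1:1 case: setup and compute both take 1 time unit per batch.
    print(sequential_time(10, 1.0, 1.0))  # 20.0
    print(pipelined_time(10, 1.0, 1.0))   # 11.0
```

The model ignores contention for memory bandwidth, so real overlap will be less than perfect.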

In many cases the developer of the app has to test to find out the whats and wheres of OpenCL performance. As an end user you might not know the advantages seen by the developer.

Quote:

Its my understanding that massively parallel tasks are far better performed on a GPU. Otherwise our GPUs would look a lot like CPUs, no?

It isn't as if most GPUs are optimized for the code run via things like OpenCL either. Historically they haven't been, but there has been considerable improvement to GPUs to support running arbitrary code. I doubt a computer designer building a machine for scientific computing would model the design on a GPU. The point is that in modern computers GPUs are a requirement; there is no mystery here, display technology requires their existence. So what we have is people leveraging all the engineering effort that goes into making a GPU fast for computing it wasn't originally designed to do. Thankfully the GPU architecture maps well to many parallel programming needs.

Quote:

Tasks like that are well suited for OCL. Its a shame we don't have more applications that leverage the GPU when its advantageous to do so.

Well, this is the other nice thing: we might not know if the app is built to use OpenCL. With the ability to specify a fallback x86 routine, the app can run transparently to the user, so you might not know if an app is accelerated at all. The flip side is that apps don't get OpenCL support overnight; it takes a lot of work to realize the improvements.
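
The fallback idea can be sketched without any real OpenCL calls. Everything here is hypothetical: `accelerated` stands in for whatever launches an OpenCL kernel, and a `RuntimeError` stands in for "no suitable device" or a failed launch. The caller gets the same result either way, which is why a user may never know which path ran.

```python
from typing import Callable, List, Optional, Sequence

Kernel = Callable[[Sequence[float], float], List[float]]


def scale_on_cpu(data: Sequence[float], factor: float) -> List[float]:
    """Plain CPU routine: the fallback that always works."""
    return [x * factor for x in data]


def make_scale(accelerated: Optional[Kernel]) -> Kernel:
    """Return a function that prefers the accelerated path but never requires it."""
    def scale(data: Sequence[float], factor: float) -> List[float]:
        if accelerated is not None:
            try:
                return accelerated(data, factor)
            except RuntimeError:
                pass  # no usable device or the kernel failed: fall through
        return scale_on_cpu(data, factor)
    return scale


if __name__ == "__main__":
    def broken_gpu_kernel(data, factor):
        raise RuntimeError("no OpenCL device")  # simulated failure

    scale = make_scale(broken_gpu_kernel)
    print(scale([1.0, 2.0], 3.0))  # [3.0, 6.0], served by the CPU fallback
```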

One good example here is WebKit where Apple has slowly been accelerating Safari. It is getting there but one has to realize that OpenCL is a very new technology, so you can't expect new stuff overnight. The other thing to realize is that acceleration is often just for parts of an app.
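
Because acceleration usually covers only part of an app, Amdahl's law (a standard rule, not something stated in this thread) shows why a kernel that runs ten to a hundred times faster on a GPU may make the whole app only modestly faster. The fractions below are hypothetical.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Whole-app speedup when only part of the runtime is accelerated.

    accelerated_fraction: share of the original runtime spent in the accelerated part (0..1).
    kernel_speedup: how much faster that part runs (e.g. 100 for "100x on the GPU").
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    # Hypothetical: half of an app's runtime runs 100x faster on the GPU.
    print(round(amdahl_speedup(0.5, 100.0), 2))  # 1.98
```

The unaccelerated half dominates: even an infinitely fast kernel could not push this example past 2x.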

wizard69 · December 6, 2010 2:37PM

Quote:

Originally Posted by Marvin

Any lack of internal storage is made up with Light Peak external drives, the 15" i5s can have a 2.5" extra internal drive. No optical, all instant-on, all have Light Peak, all have good NVidia GPUs.

The entry Air will be 30% slower than the old MB but the SSD and weight will more than make up for that in terms of overall experience. If it doesn't happen at this revision, it will happen soon. Given that the MBA just launched, it's likely they would save this kind of change for late 2011 but Intel has given them nothing to use for the next update. They have 3 choices on the low-end:

This reality kinda sucks. The AIR will likely be stuck with a Core 2 for a while until either AMD or Intel offer up a more suitable chip. The 13" MBP, however, has many potential options; it all depends upon how radical they want to get.

Quote:

- cram an i5 + 415M dedicated into a 13"

This would be very easy to do simply by deleting the optical drive. With a significantly faster processor to go with the GPU, Apple could easily price the machine a bit higher to cover the additional expense. I wouldn't expect the fastest GPU in the world, but it doesn't need to be all that fast to do significantly better than the all-Intel option.

Quote:

- use minor CPU bumps with NVidia IGP = 2.66GHz C2D + 320M IGP

Possible but it is getting to the point where the market will be resistant.

Quote:

- drop the 13" MB and MBP in favour of the Air.

I hope not, at least not in the case of the 13" MBP! Really it would be very sad mainly because the AIRs still come up short for some users. The MB is an issue and frankly it needs to be repositioned, I'd like to see Apple drop the price significantly and make it a very low price intro machine.

That is take the plastic machine, throw in an AMD Zacate chip with a cheap drive and lower the price to around $600.

Quote:

Marketing-wise, I reckon the 3rd option is the best if they can get the 1.86GHz into the entry model.

The problem is the AIRs are for everybody, and the 13" is a significant improvement over them. It would be better for Apple to add whatever is required to the 13" MBP to widen that gap even further. Things like Light Peak, multiple blade SSD slots, a better GPU and other enhancements could still make for differentiation.

In any event, it would be nice if Apple had Sandy Bridge based laptops for sale in January, but previous releases highlight that Apple really doesn't care about Intel's release schedule. Something will arrive when it pleases Apple.

backtomac · December 6, 2010 4:28PM

Quote:

Originally Posted by wizard69

When talking about laptops it is difficult to say exactly what we will be seeing performance-wise. First, most of Sandy Bridge's improvements are in the areas that compete with GPU computing. That is, the x86 integer units are not especially faster than on older hardware, but the FP and SIMD units are vastly improved. So there is not enough data right now to say how good or bad OpenCL on a Sandy Bridge CPU would be.

You make some good points.

One additional thing to remember is that theoretically under OCL the gpu would be used in addition to the cpu. It isn't one or the other. OCL is supposed to be capable of using all processor resources available if they meet the spec.

That alone is a good reason to favor OCL capable gpus.

nht · December 6, 2010 4:42PM

Quote:

Originally Posted by wizard69

What advantages? The Sandy Bridge platform isn't uniformly faster. Further, the opportunities for extremely wide SIMD are not there on a CPU, even though this is the one area where Sandy Bridge has improved the most.

Define uniformly. Sandy Bridge appears to be 20% faster on a test that doesn't scale with anything but IPC. This is one potential measurement of "uniform" improvement.

Anand disagrees with your assessment:

"While Nehalem was an easy sell if you had highly threaded workloads, Sandy Bridge looks to improve performance across the board regardless of thread count. It's a key differentiator that should make Sandy Bridge an attractive upgrade to more people."

http://www.anandtech.com/show/3871/t...ns-in-a-row/13

From a consumer perspective I would guess that a large majority of the speed increase from GPU computation is from video encoding/transcoding. Something that Intel has dedicated silicon for in Sandy Bridge.

"Intel confirmed that Sandy Bridge has dedicated video transcode hardware that it demoed during the keynote. The demo used Cyberlink’s Media Espresso to convert a ~1 minute long 30Mbps 1080p HD video clip to an iPhone compatible format. On Sandy Bridge the conversion finished in a matter of a few seconds (< 10 seconds by my watch).

Dedicated hardware video transcode is Intel’s way of fending off the advance of GPU compute into the consumer market, particularly necessary since you can’t do any compute on Intel’s HD graphics (even on Sandy Bridge).

Given Intel’s close relationship with the software vendors, I suspect we’ll see a lot of software support for this engine when Sandy Bridge ships early next year."

http://www.anandtech.com/show/3916/i...anscode-engine
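
For scale, the numbers in the quoted demo work out as follows. This is back-of-the-envelope arithmetic assuming a constant 30Mbps bitrate and a clip of exactly one minute (the article only says "~1 minute"); the helper names are hypothetical.

```python
def clip_megabytes(bitrate_mbps: float, duration_s: float) -> float:
    """Size of a constant-bitrate stream: megabits/s x seconds / 8 bits per byte."""
    return bitrate_mbps * duration_s / 8


def realtime_factor(clip_duration_s: float, wall_clock_s: float) -> float:
    """How many times faster than playback speed the transcode ran."""
    return clip_duration_s / wall_clock_s


if __name__ == "__main__":
    # Assumptions: exactly 60 s of video, and 10 s taken as the upper bound on wall time.
    print(clip_megabytes(30, 60))   # 225.0 (decimal megabytes)
    print(realtime_factor(60, 10))  # 6.0, so "< 10 s" means better than 6x real time
```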

Tell me what OSX does via OpenCL that represents a huge time savings for the average user? That's an honest question...I simply can't think of anything I really spend lots of time on besides video transcodes.

Quote:

In the end the last thing Apple needs to do is stagnate GPU performance. People will not want to regress.

With the rumored nVidia and Intel settlement they may not need to go with SB alone.

If not, then it provides more differentiation between the light/consumer line and the pro line. As long as the Sandy Bridge only solution is faster than the Core 2 solution then it's not stagnation.

nht · December 6, 2010 5:01PM

Quote:

Originally Posted by backtomac

You make some good points.

One additional thing to remember is that theoretically under OCL the gpu would be used in addition to the cpu. It isn't one or the other. OCL is supposed to be capable of using all processor resources available if they meet the spec.

That alone is a good reason to favor OCL capable gpus.

It depends on how the media encoder hardware in Sandy Bridge is designed I guess:

"It is also possible that Intel's engineers were more interested in creating a general purpose computing engine that somehow emulates the functions of a basic GPU. If the logic there is really just a collection of small processing cores similar to shader units from AMD/NVIDIA then it could be that Intel's Sandy Bridge CPU might not just be faster at video transcoding; it could accelerate the full host of applications for content creation, photo editing and media viewing that are currently entrenched into the world of ATI Stream and NVIDIA CUDA.

Obviously we realize it will take some time to get the drivers and software support out there to enable this acceleration, if it exists. But what software developer in their right mind would NOT support hardware that will eventually be found in nearly every PC sold in 2011?"

Whether Apple would or wouldn't is debatable. How it works is conjecture (perhaps these are just hardware encoders/decoders for MPEG2, VC1 and H.264). But I dunno that you can write off Sandy Bridge as unusable in Apple's low end even without an nVidia settlement.

marvin · December 6, 2010 7:41PM

Quote:

Originally Posted by backtomac

IIRC, applications that ran on OCL using a 9400m were done faster than on 2.0 ghz dual C2Ds MPs. That bakeoff was a while ago so I may be mistaken. I bet Marvin would remember and could add more to the discussion.

Its my understanding that massively parallel tasks are far better performed on a GPU. Otherwise our GPUs would look a lot like CPUs, no? Tasks like that are well suited for OCL. Its a shame we don't have more applications that leverage the GPU when its advantageous to do so.

Yeah, graphics calculations benefit hugely on the GPU and even Intel admit this is the case:

http://www.engadget.com/2010/06/24/n...imes-faster-t/

There is an OpenCL benchmark here with a few tests and some show direct CPU/GPU comparisons:

http://www.macupdate.com/app/mac/32266/opencl-benchmark

The teapot one shows the same output and you switch from CPU to GPU by simply pressing P. The 320M runs it over 60x faster than a Core 2 Duo and it's not really surprising if you do post-production rendering because you will rarely get renders coming out faster than 0.2FPS. Compare that to GPUs that churn out 720p @ 30+ FPS. Obviously the CPU output is usually fully anti-aliased with many more samples and you get complete flexibility to run any code and any algorithm be it rasterisation, raytracing, voxels etc but at present, like-for-like data, the GPU is still processing in the region of an order of magnitude faster than the CPU (comparing CPUs to GPUs that are typically bundled together).

The issue is what types of code can actually run on it. Shader kernels are fine, but x86 programs haven't been built for GPUs, so people need to think a lot harder about how best to leverage the GPU and where in the code to do it. The ideal scenario would be a Core set of function calls that Apple has optimised for OpenCL, so people can just drop them in here and there and at least get some speed-up.

There was a mention of the 320M being not in the same league as a Tesla or Fermi GPU and that's a fair point but a 320M still has 48 SPs running at 950MHz. Compared to 512 SPs @ 1.4GHz in a Fermi card, it falls short but still very capable.
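
Using only the figures quoted above, a rough SP-times-clock comparison puts the gap at roughly 16x. This is a deliberately crude proxy (it is not a FLOPS figure and ignores work per clock and memory bandwidth), and the function name is hypothetical.

```python
def shader_throughput(stream_processors: int, shader_clock_ghz: float) -> float:
    """Crude proxy for raw shader throughput: SP count x shader clock (SP-GHz).

    Ignores operations per clock, memory bandwidth and architecture,
    so it is only good for an order-of-magnitude comparison.
    """
    return stream_processors * shader_clock_ghz


if __name__ == "__main__":
    nv_320m = shader_throughput(48, 0.95)  # figures as quoted in the post
    fermi = shader_throughput(512, 1.4)
    print(round(nv_320m, 1), round(fermi, 1), round(fermi / nv_320m, 1))  # 45.6 716.8 15.7
```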

It's also not about CPU vs GPU but trying to use both. If a 320M only matched a Core 2 Duo, by leveraging it during computation, you could still double your machine performance, which is well worth pursuing. It's certainly better than zero which is what you get with an Intel GPU.
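
The "double your machine performance" point follows from splitting work in proportion to each device's speed. A minimal sketch, assuming the work divides into independent items and ignoring the transfer and setup costs that real offloading adds; the rates and names are hypothetical.

```python
from typing import Tuple


def split_work(total_items: int, cpu_rate: float, gpu_rate: float) -> Tuple[int, int]:
    """Split items in proportion to each device's throughput (items per second)."""
    gpu_items = round(total_items * gpu_rate / (cpu_rate + gpu_rate))
    return total_items - gpu_items, gpu_items


def finish_time(cpu_items: int, gpu_items: int, cpu_rate: float, gpu_rate: float) -> float:
    """Both devices run concurrently, so the job ends when the slower one finishes."""
    return max(cpu_items / cpu_rate, gpu_items / gpu_rate)


if __name__ == "__main__":
    # Hypothetical case from the post: the GPU merely matches the CPU (100 items/s each).
    cpu_items, gpu_items = split_work(1000, 100.0, 100.0)
    print(cpu_items, gpu_items)                             # 500 500
    print(finish_time(1000, 0, 100.0, 100.0))               # 10.0 (CPU alone)
    print(finish_time(cpu_items, gpu_items, 100.0, 100.0))  # 5.0 (both together)
```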

The idea with the Intel CPU is that it runs the OCL code but this would only be beneficial if they used i5s all round. The i3 chips in the laptops aren't that much faster than C2Ds so C2D running normal code + 320M running OCL is better than Core i running both.

Core i + dedicated is the best of both, of course, and has the benefit of giving you two GPUs, so you could in theory run your display off the Core i GPU while doing OCL compute on the dedicated GPU and normal computation on the CPU.

Quote:

Originally Posted by wizard69

This reality kinda sucks. The AIR will likely be stuck with a Core 2 for a while until either AMD or Intel offer up a more suitable chip.

Ivy Bridge will be interesting though as they are supposed to be going quad-core across the lineup. This would mean the chips will be fast enough. Intel have hired one of AMD's GPGPU guys too so there's a chance they could get a quad-core CPU with a built-in GPU that handles GPU computation by late 2011. I would be certain it won't come close to NVidia's or AMD's offerings but even if it matched the 320M at the end of the year, that would be good enough.

It'll be interesting to see what route Apple will take with the 13". As you say, I think at this stage people are getting tired of being shown Core 2 Duos for the 5th year running. It's not Apple's fault though. Intel's i3 isn't fast enough to be worth changing. I think i5 + dedicated would be the best value for the consumer in the 13". If they combine that with an MBA-like design, that would certainly be a great step forward.

I think if they do that, it's almost certain they will drop the white model because it wouldn't be strong enough to design the same way. They could even drop the 13" Air. If they lighten the 13" MBP and sell it $100 cheaper with a faster CPU and GPU but is maybe 0.5lb heavier, there's no point in having the 13" Air.

wizard69 · December 6, 2010 9:09PM

Quote:

Originally Posted by nht

Define uniformly. Sandy Bridge appears to be 20% faster on a test that doesn't scale with anything but IPC. This is one potential measurement of "uniform" improvement.

I'm referring to the fact that some parts of the processor have been enhanced more than others. It isn't a simple clock rate or cache boost but rather a significant reworking of the chip's internals. As for benchmarks, I will wait for shipping hardware.

Quote:

Anand disagrees with your assessment:

"While Nehalem was an easy sell if you had highly threaded workloads, Sandy Bridge looks to improve performance across the board regardless of thread count. It's a key differentiator that should make Sandy Bridge an attractive upgrade to more people."

http://www.anandtech.com/show/3871/t...ns-in-a-row/13

I wouldn't call that disagreeing with me. All I said was that the improvements were not uniform. Major parts of the chip have been overhauled; its performance really needs to be fleshed out in an unbiased evaluation. From my perspective you aren't unbiased if you get prerelease parts from Intel.

Quote:

From a consumer perspective I would guess that a large majority of the speed increase from GPU computation is from video encoding/transcoding. Something that Intel has dedicated silicon for in Sandy Bridge.

I'm not sure I agree with that! The thing is, if your web browser is using GPU computing, that is likely to be a major use of the tech. As far as transcoding and such goes, does it really make sense to put such functionality in the CPU? It is almost like Intel is going out of its way to partition the processor so that they can attack the competition's solutions. They will be able to say "hey, our processors can (en)decode video, AMD's can't." This is all well and good except for the fact that AMD has a fairly good GPU to do such work on.

Quote:

"Intel confirmed that Sandy Bridge has dedicated video transcode hardware that it demoed during the keynote. The demo used Cyberlink’s Media Espresso to convert a ~1 minute long 30Mbps 1080p HD video clip to an iPhone compatible format. On Sandy Bridge the conversion finished in a matter of a few seconds (< 10 seconds by my watch).

Dedicated hardware video transcode is Intel’s way of fending off the advance of GPU compute into the consumer market, particularly necessary since you can’t do any compute on Intel’s HD graphics (even on Sandy Bridge).

Exactly. The question is why put this in the CPU instead of the GPU? As for Intel's marketing, they neglect one important fact: your computer needs a GPU anyway!

Quote:

Given Intel’s close relationship with the software vendors, I suspect we’ll see a lot of software support for this engine when Sandy Bridge ships early next year."

http://www.anandtech.com/show/3916/i...anscode-engine

Is it open enough that VLC and Handbrake can use it? It is a good question because past history here has been tough.

Quote:

Tell me what OSX does via OpenCL that represents a huge time savings for the average user? That's an honest question...I simply can't think of anything I really spend lots of time on besides video transcodes.

It may be an honest question but do you really expect people to read your mind? That is we have no idea about your usage nor about the software that you are using. Further you as a user might not even know if a developer made use of OpenCL.

Quote:

With the rumored nVidia and Intel settlement they may not need to go with SB alone.

If not, then it provides more differentiation between the light/consumer line and the pro line. As long as the Sandy Bridge only solution is faster than the Core 2 solution then it's not stagnation.

Well we don't know that it will be faster. At least not the GPU part.

backtomac · December 6, 2010 9:14PM

@Marvin

Saw your link and browsed the OCL thread at Macupdate.

As I suspected, the mobile CPUs aren't close to GPUs in performance under OCL. While we don't have mobile SB CPUs to bench under OCL yet, I don't see SB able to close that gap as it's quite large.

The desktop SB parts look like they'll give the GPUs a run for their money, however, based upon Nehalem performance on the OCL bench.

nht · December 6, 2010 10:31PM

Quote:

Originally Posted by wizard69

I'm referring to the fact that some parts of the processor have been enhanced more than others. It isn't a simple clock rate or cache boost but rather significant reworking of the chips internals. As for benchmarks I will wait for shipping hardware.

I wouldn't call that disagreeing with me. All I said was that the improvements were not uniform. Major parts of the chip have been overhauled; its performance really needs to be fleshed out in an unbiased evaluation. From my perspective you aren't unbiased if you get prerelease parts from Intel.

I think we have a disagreement over terms. When a review says "improvement across the board" I kinda treat that as uniformly better even if it isn't a straight 26% across the board but 20% here and 30% there. Close enough for uniformly given there's a bunch of different areas to improve.
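
When gains differ from test to test ("20% here and 30% there"), a single "across the board" figure is usually the geometric mean of the per-test ratios, which is how suites such as SPEC CPU summarise results. A small sketch with hypothetical numbers; `summary_speedup` is an illustrative name wrapping the standard library's `statistics.geometric_mean` (Python 3.8+).

```python
from statistics import geometric_mean


def summary_speedup(speedups):
    """Collapse per-test speedups (1.20 means 20% faster) into one figure.

    The geometric mean is the usual way to average ratios, because it
    treats a 2x gain and a 2x loss symmetrically.
    """
    return geometric_mean(speedups)


if __name__ == "__main__":
    # Hypothetical: 20% faster on one test, 30% faster on another.
    print(round(summary_speedup([1.20, 1.30]), 3))  # 1.249
```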

If I'm trading only 5% GPU performance increase for 25% everywhere else from going Core 2+320M to Sandy Bridge i5 I think that's a win.
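
Whether such a trade is a win depends on how much of the workload is GPU-bound. A weighted-time sketch, reading the post as roughly 5% less GPU performance against 25% more everywhere else; the 30% GPU-bound share is a made-up example, not a measurement, and the function name is hypothetical.

```python
def overall_speedup(gpu_bound_fraction: float, gpu_factor: float, other_factor: float) -> float:
    """Whole-system speedup when GPU-bound and other work change by different factors.

    gpu_bound_fraction: share of the old runtime that was GPU-bound (0..1).
    gpu_factor / other_factor: new speed relative to old (0.95 = 5% slower, 1.25 = 25% faster).
    """
    new_time = gpu_bound_fraction / gpu_factor + (1.0 - gpu_bound_fraction) / other_factor
    return 1.0 / new_time


if __name__ == "__main__":
    # Hypothetical split: 30% of the time GPU-bound, GPU 5% slower, everything else 25% faster.
    print(round(overall_speedup(0.3, 0.95, 1.25), 3))  # 1.142
```

A workload that is almost entirely GPU-bound would see close to the 5% loss instead.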

Quote:

I'm not sure I agree with that! The thing is if your web browser is using GPU computing that is likely to be a major use of the tech.

You'll have to show me where WebKit can reasonably use OpenCL calls.

Quote:

As far as transcoding and such does it really make sense to put such functionality in the CPU?

The hardware encoder isn't likely burning CPU cycles to do the encoding...and memory bandwidth being used is also going to get used using the GPU.

Quote:

It is almost like Intel is going out of its way to partition the processor so that they can attack the competition's solutions. They will be able to say "hey, our processors can (en)decode video, AMD's can't." This is all well and good except for the fact that AMD has a fairly good GPU to do such work on.

Um...yes? From the user's perspective I don't care if it is the GPU or CPU that permits me to transcode in half the time...

Quote:

Exactly. The question is why put this in the CPU instead of the GPU? As for Intel's marketing, they neglect one important fact: your computer needs a GPU anyway!

It has a GPU in Sandy Bridge...on the same die...along with the encoder. Sure, you can clearly point to the GPU and the CPU transistors but ah...I don't understand your point.

Quote:

Is it open enough that VLC and Handbrake can use it? It is a good question because past history here has been tough.

Depends on Apple, right? If it is in Apple's interest to do so, they will. There won't be a significant technical reason not to. This presumes that the software that currently uses OpenCL isn't calling OpenCL directly but using it through the Core API.

Presumably if you can write your code to use OpenCL you can also write it to check to see if a hardware encoder is available on die to use as long as the compiler supports it. You can likely assume Intel's compiler will have it when they update for sandy bridge.

Quote:

It may be an honest question but do you really expect people to read your mind? That is we have no idea about your usage nor about the software that you are using. Further you as a user might not even know if a developer made use of OpenCL.

I have a Dell C410x Tesla chassis sitting somewhere in my lab that we'll use for CUDA to accelerate MATLAB. We have folks here that have used GPUs for other computations. Here's the downside they've told me...pushing the data across to the GPU for non-graphics types of computations can often eat up the performance gain (at least for CUDA). You need a reasonably large problem set to justify the setup time so not all problems that generally are parallelizable are necessarily amenable to acceleration via OpenCL.
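
The transfer-cost point can be written as a simple break-even model: offloading only pays once the per-item saving has repaid the fixed setup cost. A sketch under assumed linear costs; all numbers and names are hypothetical, not measurements from the lab described above.

```python
import math


def break_even_items(setup: float, transfer_per_item: float,
                     gpu_per_item: float, cpu_per_item: float) -> float:
    """Smallest problem size n for which offloading beats staying on the CPU.

    Offload cost: setup + n * (transfer_per_item + gpu_per_item)
    CPU cost:     n * cpu_per_item
    Returns math.inf when the per-item offload cost never undercuts the CPU.
    """
    saving_per_item = cpu_per_item - (transfer_per_item + gpu_per_item)
    if saving_per_item <= 0:
        return math.inf
    # Offloading wins when n * saving_per_item > setup, i.e. n > setup / saving_per_item.
    return math.floor(setup / saving_per_item) + 1


if __name__ == "__main__":
    # Hypothetical costs in microseconds: 5000 setup, 1.0 transfer, 0.25 GPU, 2.0 CPU per item.
    print(break_even_items(5000.0, 1.0, 0.25, 2.0))  # 6667
    # If moving an item costs as much as computing it on the CPU, offloading never pays.
    print(break_even_items(5000.0, 2.0, 0.25, 2.0))  # inf
```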

So I'm not asking you to "read my mind". I'm asking you or anyone to provide a real world scenario for the average user where OpenCL makes a significant difference. You can define "average user" as you like.

How do we know if a developer has leveraged OpenCL? We don't.

But I think the burden is on you to show the real world advantage of OpenCL vs 20% IPC improvement. The latter is quantifiable improvement to the overall system.

Quote:

Well we don't know that it will be faster. At least not the GPU part.

The i5 sample seems to indicate it won't be much slower.

nht · December 6, 2010 11:11PM

Quote:

Originally Posted by Marvin

Yeah, graphics calculations benefit hugely on the GPU and even Intel admit this is the case:

Why would you choose CPU rendering over GPU rendering for the normal user? Why does OpenCL represent an advantage over using OpenGL for graphic calculations?

Quote:

http://www.engadget.com/2010/06/24/n...imes-faster-t/

There is an OpenCL benchmark here with a few tests and some show direct CPU/GPU comparisons:

http://www.macupdate.com/app/mac/32266/opencl-benchmark

The teapot one shows the same output and you switch from CPU to GPU by simply pressing P. The 320M runs it over 60x faster than a Core 2 Duo and it's not really surprising if you do post-production rendering because you will rarely get renders coming out faster than 0.2FPS. Compare that to GPUs that churn out 720p @ 30+ FPS. Obviously the CPU output is usually fully anti-aliased with many more samples and you get complete flexibility to run any code and any algorithm be it rasterisation, raytracing, voxels etc but at present, like-for-like data, the GPU is still processing in the region of an order of magnitude faster than the CPU (comparing CPUs to GPUs that are typically bundled together).

Ah...yah, it's a GPU. I sure hope it renders faster than a software render. Again, the challenge is to show a real-world scenario for the average user where OpenCL represents a significant performance gain. The primary scenario is likely video transcode...but Intel has added hardware transcode to Sandy Bridge.

Quote:

There was a mention of the 320M being not in the same league as a Tesla or Fermi GPU and that's a fair point but a 320M still has 48 SPs running at 950MHz. Compared to 512 SPs @ 1.4GHz in a Fermi card, it falls short but still very capable.
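A crude way to compare the two parts is shader count times shader clock, using the figures given in the post. This ignores architecture, memory bandwidth and drivers entirely, so treat the result as a rough on-paper indication rather than a benchmark.

```python
def sp_ghz(shader_processors: int, clock_ghz: float) -> float:
    """Crude peak-throughput proxy: shader count x shader clock (GHz)."""
    return shader_processors * clock_ghz


m320 = sp_ghz(48, 0.95)    # 320M figures from the post
fermi = sp_ghz(512, 1.4)   # Fermi figures from the post
print(f"320M: {m320:.1f}  Fermi: {fermi:.1f}  ratio: {fermi / m320:.1f}x")
```

On this measure the Fermi card has roughly a 16x edge, which is why "falls short but still very capable" is about the fairest summary.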

It's also not about CPU vs GPU but trying to use both. If a 320M only matched a Core 2 Duo, by leveraging it during computation, you could still double your machine performance, which is well worth pursuing. It's certainly better than zero which is what you get with an Intel GPU.
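The "double your machine performance" arithmetic can be sketched under an idealised assumption: the work splits cleanly and there is no transfer or coordination overhead (which, as noted elsewhere in the thread, is often not the case). If each device gets a share proportional to its speed, both finish together and the elapsed time is the total work divided by the combined rate.

```python
def split_work(total: float, cpu_rate: float, gpu_rate: float):
    """Split `total` units of work so the CPU and GPU finish at the same time.

    Idealised: assumes perfectly divisible work and zero transfer/sync overhead.
    Returns (cpu_share, gpu_share, elapsed_time).
    """
    combined = cpu_rate + gpu_rate
    cpu_share = total * cpu_rate / combined
    gpu_share = total - cpu_share
    elapsed = total / combined
    return cpu_share, gpu_share, elapsed


total = 100.0
_, _, t_both = split_work(total, cpu_rate=1.0, gpu_rate=1.0)  # GPU only matches the CPU
t_cpu_only = total / 1.0
print(f"speedup from using both: {t_cpu_only / t_both:.1f}x")
```

With equal rates the speedup is 2x; any overhead from moving data eats into that figure.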

Except it's not zero for an Intel GPU if the task is transcoding. The on die media processor should burn no CPU cycles for transcoding.

CPU+GP GPU computation power is all very well but it's kinda like horsepower measured at the crankshaft and not at the wheel after transmission losses.

Quote:

The idea with the Intel CPU is that it runs the OCL code but this would only be beneficial if they used i5s all round. The i3 chips in the laptops aren't that much faster than C2Ds so C2D running normal code + 320M running OCL is better than Core i running both.

Ah...I thought the Sandy Bridge i3s were all desktop...so it would be Core i5s all around.

And again, if the primary real-world advantage of OpenCL for the average user is video transcoding, then the media processor fits that bill. Intel's design philosophy appears to be that anything that can be defined as a fixed-function unit should be. I would agree that for problems outside the most common ones this strategy is inferior to going the GPGPU route, but I still haven't identified what that use case is.

Quote:

It'll be interesting to see what route Apple will take with the 13". As you say, I think at this stage people are getting tired of being shown Core 2 Duos for the 5th year running. It's not Apple's fault, though. Intel's i3 isn't fast enough to be worth switching to. I think i5 + dedicated GPU would be the best value for the consumer in the 13". If they combine that with an MBA-like design, that would certainly be a great step forward.

I think if they do that, it's almost certain they will drop the white model because it wouldn't be strong enough to design the same way. They could even drop the 13" Air. If they lighten the 13" MBP and sell it $100 cheaper with a faster CPU and GPU but is maybe 0.5lb heavier, there's no point in having the 13" Air.

I could see the 13" MBP getting a dedicated GPU and the MBA, MB and Mini getting stuck with an IGP (whether Sandy Bridge native or nVidia if a settlement occurs); that seems like a viable option given it's the status quo.

You might also see the 13" MBP get the i7 2620M where the 13" MB and Mini get the i5 2540M and the MBAs stay C2D with a lower TDP.

nht · December 6, 2010 11:40PM

Paint me slow but...that 20%+ performance improvement for SB i5 (desktop) is over the current i5 (desktop) without a working turbo boost...not over the C2D or i3.

You can claim 60x performance improvement for specific OpenCL benchmarks but give me the Sandy Bridge i5 w/no OpenCL over the C2D + 320M combo w/OpenCL any day of the week.

wizard69 · December 7, 2010 7:17AM

Quote:

Originally Posted by nht

I think we have a disagreement over terms. When a review says "improvement across the board" I kinda treat that as uniformly better even if it isn't a straight 26% across the board but 20% here and 30% there. Close enough for uniformly given there's a bunch of different areas to improve.

Most likely. Honestly, though, discussions about Sandy Bridge performance don't really mean anything until we have a better picture of what the processor does performance-wise.

Quote:

If I'm trading only 5% GPU performance increase for 25% everywhere else from going Core 2+320M to Sandy Bridge i5 I think that's a win.

That may be fine for you but for many users that isn't going to be the case.

Quote:

You'll have to show me where WebKit can reasonably use OpenCL calls.

I don't follow WebKit closely enough to pinpoint every acceleration effort, but just about every graphical web browser is seeing some attempts at performance improvements through GPU acceleration.

Quote:

The hardware encoder isn't likely burning CPU cycles to do the encoding...and memory bandwidth being used is also going to get used using the GPU.

I guess my problem is that personally it doesn't make sense to have a video encoder any farther away from the GPU than is absolutely required.

Quote:

Um...yes? From the user's perspective I don't care if it is the GPU or CPU that permits me to transcode in half the time...

This is very true for most users.

Quote:

It has a GPU in Sandy Bridge...on the same die...along with the encoder. Sure, you can clearly point to the GPU and the CPU transistors but ah...I don't understand your point.

Depends on Apple, right? If it is in Apple's interest to do so, they will. There won't be a significant technical reason not to. This presumes that the software that currently uses OpenCL isn't calling OpenCL directly but is using it through the Core API.

Intel's current GPU hardware has had technical issues that limit what they can do.

Quote:

Presumably if you can write your code to use OpenCL, you can also write it to check whether an on-die hardware encoder is available, as long as the compiler supports it. You can likely assume Intel's compiler will have it when they update for Sandy Bridge.

I have a Dell C410x Tesla chassis sitting somewhere in my lab that we'll use for CUDA to accelerate MATLAB. We have folks here that have used GPUs for other computations. Here's the downside they've told me...pushing the data across to the GPU for non-graphics types of computations can often eat up the performance gain (at least for CUDA). You need a reasonably large problem set to justify the setup time so not all problems that generally are parallelizable are necessarily amenable to acceleration via OpenCL.

This can be the case: setup time can swamp the gains from the fast calculations. It isn't just setup time, either; parallel code that is branchy won't work that well. In the end the GPU can be seen as a sort of vector processor. It really doesn't do well running the sorts of code that x86 is designed for.
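A toy model of why branchy code hurts on a GPU, assuming lanes run in lockstep groups (32 here, an assumption; real group sizes vary by vendor) and that a group whose lanes disagree on a branch must execute both sides. The costs are arbitrary units.

```python
from typing import Sequence

GROUP_SIZE = 32  # assumed lockstep group width; real hardware varies


def group_cost(taken: Sequence[bool], cost_if: int, cost_else: int) -> int:
    """Cost of one lockstep group: every path that any lane takes gets executed."""
    cost = 0
    if any(taken):          # at least one lane needs the 'if' side
        cost += cost_if
    if not all(taken):      # at least one lane needs the 'else' side
        cost += cost_else
    return cost


def total_cost(taken: Sequence[bool], cost_if: int, cost_else: int) -> int:
    return sum(group_cost(taken[i:i + GROUP_SIZE], cost_if, cost_else)
               for i in range(0, len(taken), GROUP_SIZE))


uniform = [True] * 64                          # every lane agrees
alternating = [i % 2 == 0 for i in range(64)]  # lanes disagree inside each group
print(total_cost(uniform, 10, 10), total_cost(alternating, 10, 10))
```

Each lane does the same amount of useful work in both cases, but the alternating pattern costs twice as much because every group runs both branches. A CPU core handling each item independently pays only for the branch it actually takes.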

Quote:

So I'm not asking you to "read my mind". I'm asking you or anyone to provide a real world scenario for the average user where OpenCL makes a significant difference. You can define "average user" as you like.

I think this has already been covered. It depends upon the user in question and his selection of software.

Quote:

How do we know if a developer has leveraged OpenCL? We don't.

But I think the burden is on you to show the real world advantage of OpenCL vs 20% IPC improvement. The latter is quantifiable improvement to the overall system.

This has already been covered: the performance that can be realized via OpenCL-type acceleration is all over the map depending upon exactly what you are doing. There are good benchmarks floating about the net that support this. You, on the other hand, are claiming a 20% increase in performance from Sandy Bridge based on thin and hardly credible evidence.

I know it rubs people the wrong way, but if you are getting prerelease hardware samples from the likes of Intel, your credibility as a reporter is questionable. So I discount the reportage seen so far and wait for more independently derived info. That is only with respect to comparing the CPU against the competition and other Intel hardware. With respect to GPU computing, I don't think there are any reasonable arguments against it; when practical, it can give the user a huge gain for little effort.

Quote:

The i5 sample seems to indicate it won't be much slower.

I'm going to try to wrap this up in a nutshell.

GPU / OpenCL computing is good because it leverages hardware you already have in place!

Performance of GPUs for GPGPU computing is all over the map. Thus if you don't test, you will never know what has happened.

Sandy Bridge's ability to run x86 code very fast is very interesting, but one has to realize that the GPU world isn't slowing down. AMD has just announced a new high-performance GPU for mobile. If I remember correctly there are like 96 "cores" in this GPU. That will be very difficult to compete with when you have a handful of kitchen hardware.

wizard69 · December 7, 2010 7:27AM

Quote:

Originally Posted by nht

Paint me slow but...that 20%+ performance improvement for SB i5 (desktop) is over the current i5 (desktop) without a working turbo boost...not over the C2D or i3.

You can claim 60x performance improvement for specific OpenCL benchmarks but give me the Sandy Bridge i5 w/no OpenCL over the C2D + 320M combo w/OpenCL any day of the week.

I prefer to wait for the bigger picture. This won't happen in the base MBP range but I'd rather see a Sandy Bridge processor and a good attached GPU. Even better would be AMD's approach to their so called APUs.

In the long run the discrete GPU is a thing of the past for all but upper-end systems. As such I admire AMD's vision to make computing heterogeneous with the GPU having full access to the system bus. That is off in the future, but when it does happen we will have eliminated many of the current problems associated with OpenCL and other GPU compute frameworks.

nht · December 7, 2010 9:42AM

Quote:

Originally Posted by wizard69

Most likely. Honestly, though, discussions about Sandy Bridge performance don't really mean anything until we have a better picture of what the processor does performance-wise.

That may be fine for you but for many users that isn't going to be the case.

A 5% improvement is somehow a detriment? Exactly what are the alternatives? Sure we can hope for a discrete GPU in the 13" MBP but that isn't likely for the MB or Mini. So the alternatives are:

1) i5 + nVidia IGP - requires legal settlement.

2) C2D + nVidia IGP - serious performance penalty

3) AMD CPU/GPU - not frigging likely even if Llano is the second coming.

Quote:

I don't follow WebKit closely enough to pinpoint every acceleration effort, but just about every graphical web browser is seeing some attempts at performance improvements through GPU acceleration.

You need to provide a link. We're talking about OpenCL...not normal GPU acceleration. This is why the teapot is not a very good example. I can render a teapot in OpenGL.

Quote:

I guess my problem is that personally it doesn't make sense to have a video encoder any farther away from the GPU than is absolutely required.

IT'S ON THE SAME DIE. How far away do you think it is?

Quote:

Intel's current GPU hardware has had technical issues that limit what they can do.

And Anand is showing very good performance with the Sandy Bridge i5 he had. Do you think that the Intel IGP drivers are going to suddenly get worse from engineering sample to release?

Quote:

I think this has already been covered. It depends upon the user in question and his selection of software.

This has NOT been covered. The assertion is that OpenCL is so important that Apple cannot use Sandy Bridge without OpenCL GPU support.

So the folks making this assertion have to show AT LEAST ONE USE CASE WHERE OPENCL SUPPORT IS CRITICAL.

The only limitation is that this has to be something a normal user is likely to do and something that the hardware transcoding in Sandy Bridge cannot do. I guess another caveat is that it should be something that can't be done in OpenGL.

Quote:

This has already been covered: the performance that can be realized via OpenCL-type acceleration is all over the map depending upon exactly what you are doing. There are good benchmarks floating about the net that support this. You, on the other hand, are claiming a 20% increase in performance from Sandy Bridge based on thin and hardly credible evidence.

So you're saying that Anand isn't a credible source for performance benchmarks? The engineering sample is most likely SLOWER than production given that turbo boost wasn't working. Not faster.

Benchmarks "all over the map" is not more credible when trying to assess the impact of OpenCL to the general user. You have to show that the specific 600x OpenCL performance increase has real world impact to the general user.

Quote:

I know it rubs people the wrong way, but if you are getting prerelease hardware samples from the likes of Intel, your credibility as a reporter is questionable. So I discount the reportage seen so far and wait for more independently derived info. That is only with respect to comparing the CPU against the competition and other Intel hardware. With respect to GPU computing, I don't think there are any reasonable arguments against it; when practical, it can give the user a huge gain for little effort.

Do you really think Anand is going to lie about the performance of SB or do you believe that Intel thinks they have a real winner and is sampling to some tech sites they like? Anand may or may not have an intel bias but Intel is executing very well at the moment. Do you really think they're going to suck forever at IGPs?

With respect to GPU computing you actually have yet to show where it gives the user a huge gain for little effort. The reasonable argument is "show me a commonly used application that sees significant speedup through OpenCL or Cuda".

Quote:

I'm going to try to wrap this up in a nutshell.

GPU / OpenCL computing is good because it leverages hardware you already have in place!

Then provide an example where OpenCL is leveraged today for the average (not scientific) user.

Quote:

Performance of GPUs for GPGPU computing is all over the map. Thus if you don't test, you will never know what has happened.

This is not a plus.

Quote:

Sandy Bridge's ability to run x86 code very fast is very interesting, but one has to realize that the GPU world isn't slowing down.

Except for the fact that most normal software isn't GPU bound. The PRIMARY use case for GPU-bound software is games. If the Sandy Bridge GPU is capable of running about the same as a Radeon HD 5450, that's not bad. You still probably don't want to go raiding in WoW with it, but for casual gaming, not so bad. No worse than the 9400M I use for casual gaming at home.

Quote:

AMD has just announced a new high-performance GPU for mobile. If I remember correctly there are like 96 "cores" in this GPU. That will be very difficult to compete with when you have a handful of kitchen hardware.

Assuming you're talking about Llano, we'll see how it does when we actually get some benchmarks... engineering samples are just fine.

However, CPU performance for many normal tasks (i.e. not games and video transcode) is probably more important. This is why many mainstream laptops can get away with an IGP.

backtomac · December 7, 2010 10:04AM

Quote:

Originally Posted by nht

So the folks making this assertion have to show AT LEAST ONE USE CASE WHERE OPENCL SUPPORT IS CRITICAL.


There isn't any now, but neither you nor I are privy to Apple's future plans.

It's important to remember that Apple helped develop OCL and baked it into the OS. If they don't use processors that are capable of leveraging that technology, they risk allowing it to 'die on the vine'.

Developers need to know that Apple are going to support OCL for the long term before they commit the time and resources to coding their apps to take advantage of it.

I think it's too early for Apple to give up on OCL.

nht · December 7, 2010 12:55PM

Quote:

Originally Posted by backtomac

There isn't any now, but neither you nor I are privy to Apple's future plans.

It's important to remember that Apple helped develop OCL and baked it into the OS. If they don't use processors that are capable of leveraging that technology, they risk allowing it to 'die on the vine'.

Developers need to know that Apple are going to support OCL for the long term before they commit the time and resources to coding their apps to take advantage of it.

I think it's too early for Apple to give up on OCL.

Apple isn't giving up on OCL in either scenario given Intel is providing OCL support on the CPU. OpenCL is supposed to run on CPUs, GPUs and whatever other hardware happens to be available. As long as the Sandy Bridge MB performs well in the GPGPU benchmarks then who cares if the OpenCL code execution occurs on a GPU or CPU.

The two primary GPGPU benchmarks that matter to the average user are the media transcode test and possibly the cryptography test (disk encryption, DRM, etc.). We've already seen Intel highlight transcoding on Sandy Bridge. For cryptography, this is what SiSoft thinks with their new GPGPU tests:

"The interesting bit is that SiSoft expects how CPUs should be able to challenge GPGPUs in the near future, given that "Using 256-bit register width (instead of 128-bit of SSE/2/3/4) yields further performance gains through greater parallelism in most algorithms. Combined with the increase in processor cores and threads we will soon have CPUs rivaling GPGPUs in performance."

http://www.brightsideofnews.com/news...gpu-tests.aspx

Gee, who has 256-bit wide SIMD units for AVX instructions?
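The arithmetic behind the 256-bit remark: the number of values one SIMD instruction processes is the register width divided by the element size, so going from 128-bit SSE registers to 256-bit AVX registers doubles the lanes. One caveat worth keeping in mind: Sandy Bridge's AVX widens floating-point operations, while 256-bit integer SIMD only arrived later with AVX2.

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """How many elements one SIMD register holds."""
    return register_bits // element_bits


for name, width in (("SSE (128-bit)", 128), ("AVX (256-bit)", 256)):
    print(name, "float32 lanes:", simd_lanes(width, 32),
          "float64 lanes:", simd_lanes(width, 64))
```

Doubling lanes per instruction, multiplied by more cores and threads, is the basis of SiSoft's expectation that CPUs could rival GPGPUs on some workloads.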

So I found another use case for you guys but it happens to be another one that Intel is baking into Sandy Bridge. So if the OpenCL call in the Intel OpenCL CPU drivers hits the media encoder hardware or uses the 256-bit SIMDs rather than a GPU but gets the same performance who cares?

There is an interesting tidbit at the end regarding nVidia's OpenCL drivers...evidently the h.264 transcoding and cryptography benchmarks don't work with their current drivers due to lack of exposure of some OpenCL features.

That Apple might ship a laptop with a GPU that doesn't support specific OpenGL functions in hardware doesn't mean Apple would be abandoning OpenGL. It just means those calls run a little slower.

Likewise for OpenCL. If I do a FFT in OpenCL and it runs on all Apple laptops then Apple is continuing to fully support OpenCL...even if the base machines run slower. If I need to, I'll detect if the hardware can support my needs just like I would today to make sure the GPU has all the shader features and other support I need to render at a certain quality and speed.
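A minimal sketch of the "detect if the hardware can support my needs" step, applied to OpenCL device selection. The `Device` records and their names are hypothetical; in real code the extension list would come from the driver (in the OpenCL C API, `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`). `cl_khr_fp64` is the standard double-precision extension name. The sketch prefers a GPU that reports every required extension and otherwise falls back to a capable CPU device, matching the point that the same OpenCL code can run on either.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Device:
    name: str                 # hypothetical device name
    kind: str                 # "gpu" or "cpu"
    extensions: frozenset     # extension strings the driver reported (assumed pre-queried)


def choose_device(devices: Iterable[Device], required: frozenset) -> Optional[Device]:
    """Prefer a GPU with every required extension, else a capable CPU device."""
    capable = [d for d in devices if required <= d.extensions]
    for kind in ("gpu", "cpu"):
        for d in capable:
            if d.kind == kind:
                return d
    return None  # nothing suitable: fall back to a non-OpenCL code path


devices = [
    Device("ExampleIGP", "gpu", frozenset()),                 # hypothetical: no fp64
    Device("ExampleCPU", "cpu", frozenset({"cl_khr_fp64"})),  # hypothetical
]
print(choose_device(devices, frozenset()).name)                 # no requirements: GPU preferred
print(choose_device(devices, frozenset({"cl_khr_fp64"})).name)  # needs doubles
```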

tailpipe · December 7, 2010 1:58PM

This thread has become so geeky it's terrifying. Can you try and dial it down a couple of notches so lesser mortals can understand what the heck you're saying?

Wiz, Marvin, backtomac, blueeddie and nht, would you care to summarise your analysis in a single para each? Thank you in anticipation!

marvin · December 7, 2010 2:00PM

Quote:

Originally Posted by nht

Why would you choose CPU rendering over GPU rendering for the normal user?

Video effects like you see in iMovie, Motion, Final Cut Express and PhotoBooth can be more complex than basic GPUs (like Intel's) can handle so you'd have no choice but to drop the rendering to the CPU and it wouldn't be real-time or at least couldn't be previewed. I'm not sure if Apple apps use effects that are beyond the Intel GPUs but they run slower at least and After Effects and Maya certainly have GPU rendering that isn't supported on Intel's chips.

Quote:

Originally Posted by nht

Why does OpenCL represent an advantage over using OpenGL for graphic calculations?

Flexibility. OpenCL allows you to do more general-purpose computation, which is important for advanced shader effects. You can also offload physics calculations to the GPU like you see in PhysX. Here's an example of an OpenCL compositor comparing CPU and GPU doing things you can't do with OpenGL alone:

http://www.youtube.com/watch?v=sVkDx_4GP5M

Same code, same output just CPU vs GPU and the GS 220 beats the quad 2.66 i7 920 quite easily. I can't find the spec for the 220GS but usually GS versions are inferior to GT versions and the 220GT has 48 SPs @ 1.4GHz. It's about 50% faster than the 320M but still would suggest that the 320M would outperform a quad-core i7 in this type of task so personally I'd rather have that with a slower CPU than Intel's i5.
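Checking the "about 50% faster" estimate with the figures given (48 SPs at 1.4GHz for the 220GT versus 48 SPs at 950MHz for the 320M): with equal SP counts, the crude SP-times-clock ratio reduces to the clock ratio. This ignores everything except shader count and clock, so it is an on-paper figure only.

```python
def sp_clock_ratio(sp_a: int, ghz_a: float, sp_b: int, ghz_b: float) -> float:
    """Ratio of crude (shader count x clock) figures for card A over card B."""
    return (sp_a * ghz_a) / (sp_b * ghz_b)


ratio = sp_clock_ratio(48, 1.4, 48, 0.95)  # 220GT vs 320M, figures from the post
print(f"220GT / 320M ~ {ratio:.2f}x")
```

That works out to roughly 47% more on paper, consistent with the post's "about 50%".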

Quote:

Originally Posted by nht

Ah...yah, it's a GPU. I sure hope it renders faster than a software render.

AFAIK the test is designed to run only the post-processing code on either the CPU or GPU not the geometry.

Quote:

Originally Posted by nht

Except it's not zero for an Intel GPU if the task is transcoding. The on die media processor should burn no CPU cycles for transcoding.

Only for supported formats though. Once you start transcoding AVCHD to ProRes, it's not any faster. OpenCL can be leveraged to do any transcoding. The specs given for the media transcoder in SB sound amazing though and I would love to have a dedicated processor that can do 400FPS H.264 transcoding and I'm surprised they haven't done this already because MP4 H.264 is used everywhere and is so slow to encode.
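For a sense of scale, here is the conversion from an encode rate to wall-clock time. The 400FPS figure is the one quoted in the post; the 2-hour, 24fps source and the 30FPS comparison rate are hypothetical inputs chosen for illustration, not measurements of any encoder.

```python
def transcode_minutes(duration_min: float, source_fps: float, encode_fps: float) -> float:
    """Wall-clock minutes to encode a clip at a constant encode rate."""
    frames = duration_min * 60 * source_fps
    return frames / encode_fps / 60


movie_min, movie_fps = 120, 24  # hypothetical source: 2-hour film at 24fps
print(f"at 400FPS: {transcode_minutes(movie_min, movie_fps, 400):.1f} min")
print(f"at 30FPS (assumed software rate): {transcode_minutes(movie_min, movie_fps, 30):.1f} min")
```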
