artificialintel wrote: »
I think this is going too far. Samsung et al. did significant work to optimize their SoC for more cores, and heterogeneous ones at that. Building a clever architecture that can spin cores up and down quickly, manage cache contention, and so on is its own kind of engineering work, and several generations in, they've significantly improved their power consumption and their ability to keep the cores properly fed under load.
Having said that, I do still agree with the basic point that Apple is much better positioned to start improving performance by adding cores than Samsung or Qualcomm are to improve performance by improving their cores. There's also absolutely no scope left for adding more than four cores, because few applications even really use three yet. (The Exynos SoCs are really a quad-heavyweight CPU set and a quad-flyweight set for two totally different roles.)
The comparisons between the Exynos 8 and the A9 are interesting, but it's always worth remembering that the Exynos 8 is a brand new chip straight off the production line which isn't used in any phones yet, while the comparison is being made against the A9, which is installed in many tens of millions of iPhones out in the real world. Even so, the brand new Exynos 8 still can't match the A9, and of course it will need an operating system that can properly take advantage of it when used in a smartphone. Apple's A9 already has that.
The problem for Samsung is that Apple continues to refine both its 64-bit chips and iOS, and seems to be doing so at quite a pace. If their new chip can't match the A9 now, then what will they do when Apple's next-generation A-series chip appears?
The more I see of Apple's approach, the more I realise that they have deployed a sophisticated strategy that Samsung simply can't follow indefinitely. Samsung are obliged to spend vast amounts of development money to try and match features that Apple offers, but Samsung can't sell enough of their top-end phones to make it a sufficiently profitable business. In short, Apple is enticing Samsung into a spending war which could greatly reduce Samsung's profitability, while Apple's profits handsomely reward their own development expenditure. Samsung's determination to be a fast follower (in Korean, it's pronounced 'innovative leader'), coupled with Samsung's pride, is being exploited by Apple to encourage Samsung to ruin itself.
Your statement, in bold above, really doesn't support cause and effect; the only implication is that Qualcomm, Samsung, and Mediatek make processors with more than two cores; why they do that is much more complex. Android OS isn't magical for operating on the processor cores it is given. Since Apple has already used a custom three core configuration, one might assume that Apple gamed iOS, applications, and hardware for the most beneficial configuration of the die at the time of the design for the node it would be run on.
Apple isn't "catching up"; they are quite ahead of the competition, which are primarily semi-custom ARM core configurations, not the highly tuned designs of the A series.
iPad Pro features 2 cores as opposed to the 3 that the iPad Air 2 offered. So, if anything, Apple's iOS platform isn't moving away from a 2-core setup (which is good). There is no real benefit in having more cores (when comparing a 1-core CPU against 8 cores, each running at 1/8 the performance of that 1-core CPU), but a many-core design becomes a great pain in the butt when it comes to reaching 100% of the chip's potential with an app that can't use several threads. In that case you are bound to only 12.5% of the max performance no matter what you do.
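The 12.5% ceiling above falls straight out of the arithmetic. A minimal sketch, using hypothetical performance numbers chosen so that one big core matches the aggregate throughput of eight small ones:

```python
# Back-of-the-envelope comparison (hypothetical numbers): one big core
# vs. eight small cores with the same total throughput.
def throughput(cores: int, per_core_perf: float, threads_used: int) -> float:
    """Effective throughput when an app can only occupy `threads_used` cores."""
    return per_core_perf * min(cores, threads_used)

big_single    = throughput(cores=1, per_core_perf=8.0, threads_used=1)  # 8.0
many_serial   = throughput(cores=8, per_core_perf=1.0, threads_used=1)  # 1.0
many_parallel = throughput(cores=8, per_core_perf=1.0, threads_used=8)  # 8.0

# A single-threaded app on the 8-core chip gets 1/8 of the big core's speed.
print(many_serial / big_single)  # 0.125
```

A perfectly parallel app ties the big core, but the single-threaded case pays the full 8x penalty, which is exactly the point being made.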
Multi-core is multi-core regardless of how many there actually are. Apple stuck with a 2 core design because they designed iOS to be extremely efficient and make use of all the resources available to it. A LOT of processing in iOS is pushed to the GPU and always has. Plus there are other processing units including the ISP. Adding more cores in Android makes more sense since developers (and Google) have no idea what other processing units will be available at any given time. This is where and why Apple has a huge advantage with their vertical integration; they can optimize the hell out of the entire platform.
After reading EricTheHalfBee's post, I certainly believe that Apple does have a large lead on Samsung/Qualcomm chips, because as noted the path(s) open to Apple to increase performance by adding a couple more cores (e.g. going to 4) is straightforward. As you note it is how you spend transistors, and Apple has spent them in making great performing cores. Much harder to do than adding more cores. Next couple of years will certainly be interesting.
On the point of local (on device) vs. cloud, it is important to note that today (and IMO the next 5+ years), despite ever increasing mobile/wireline broadband and huge cloud resources, there is still a noticeable difference between services executed locally and those done in the cloud. More power on the device allows more to be done there (hopefully in future a good bit of Siri voice recognition as an example), which makes a better experience. Sub-second responses count. While making people want to upgrade to a premium device every few years is Apple's business model, having a premium device can mean a better experience. We are still a ways away from the Star Trek infinite bandwidth, zero latency, instant cloud processing future.
That is not what Anandtech say in this recent article: The Mobile CPU Core-Count Debate: Analyzing The Real World
The tests are run on Samsung's Galaxy S6 with the Exynos 7420 (4x Cortex A57 @ 2.1GHz + 4x Cortex A53 @ 1.5GHz) which should serve well as a representation of similar flagship devices sold in 2015 and beyond.
Depending on the use-cases, we'll see just how many of the cores on today's many-core big.LITTLE systems are used. Together with having power management data on both clusters, we'll also see just how much sense heterogeneous processing makes and just how much benefit one can gain from it.
The initial web site rendering is clearly done by the big cluster, and it looks like all 4 cores have working threads on them
When looking at the total number of threads on the system, we can see that the S-Browser makes good use of at least 4 CPU cores, with some peaks of up to 5 threads. All in all, this is a scenario which doesn't necessarily make use of 8 cores per se; however, the 4+4 setup of big.LITTLE SoCs does seem to be fully utilized for power management as the computational load shifts between the clusters depending on the needed performance.
This time around, we see a more even distribution of the load on the little cores. Again, most of the 4 CPU cores are active and have threads placed onto them, averaging about 2.5 fully loaded cores.
The big cores seem much less loaded in this scenario; most of the time, except for a small peak, we only have 1 large thread loading the cluster. Because of this, we expect the other cores to be shut down, and a look at the power-state distribution confirms that we guessed correctly.
The total amount of threads on the system doesn't change much compared to the previous scenario: The S-Browser still manages to actively make good use of up to 4 cores with the occasional burst of up to 5 threads.
Chrome is the de-facto browser application on a lot of Android devices. We again use it to load the AnandTech frontpage and to analyse the CPU's behaviour.
Chrome seems to place much higher load on the little cores compared to S-Browser. When looking at the run-queue chart we see that indeed all cores are almost at their full capacity for a large amount of time.
What stands out though is a very large peak around the 4s mark. Here we see the little cores peak at almost 7 threads, which is quite unexpected. This burst seems to overload the little cluster's capacity. The frequency also peaks at 1.3GHz at this point. The reason we don't see it go higher is probably that the threads are still big enough that they're picked up by the scheduler and migrated over to the big cluster at that point.
The big cores also see a fair amount of load. Similarly to the S-Browser we have 1 very large thread that puts a consistent load on 1 CPU. But curiously enough we also see some significant activity on up to 2 other big cores. Again, in terms of burst loads we see up to 3 big CPUs being used concurrently.
The total run-queue depth for the system looks very different for Chrome. We see consistent use of 4-5 cores and a large burst of up to 8 threads. This is a very surprising finding that impacts the way we perceive Chrome's core-count usage.
Chrome is able to consistently make use of a large number of threads, such that we see up to 6 CPUs in use, with small bursts of up to almost 9 threads.
Moving away from browser-based scenarios, we move onto real application use-cases. We start off with Google Hangouts.
During the initial application launch, we don't see much activity on the little cores. Cores 1-3 are mostly power-gated, and there are few to no threads placed onto the cluster during that period. Once the app has opened, we see the threads migrate back onto the little cluster. Here we see full use of all 4 CPU cores, as each core has active threads placed on it.
This is the perfect burst scenario for the big cores. The application launch kicks the cores into high gear as they reach the SoC's full 2.1GHz. We see that all 4 cores are doing work and have threads placed on them.
In general, the workload is optimized towards 4-core CPUs. Because 4+4 big.LITTLE SoCs can, in a sense, be seen as 4-core designs, we don't see an issue here.
I also wanted to have a closer look at CPU behaviour while using the phone's camera. First, we analyze what happens when we launch the camera application.
Most of the work when launching the camera was done by the big cluster. Here we see all 4 cores jumping into action.
Samsung seems able to parallelize the camera application well, as this is again a sensible scenario that makes good use of the SoC's 4+4 big.LITTLE topology.
Real Racing 3 Playing
Of course we also have to measure what happens during a normal play-through. I recorded a 38s section of in-game activity while racing part of a lap around a circuit.
The little cores see at least 3 major threads loaded onto them. The 4th core is doing some work as well, but quite a bit less than the first 3.
I think it's pretty safe to come to the conclusion that Real Racing 3 is coded with quad-core CPUs in mind as we see exactly 4 major threads loading the SoC's CPUs to various extent.
Trying out another popular high-end game, we have a look at Modern Combat 5.
Again this is a case of using parallelization for the sake of power efficiency instead of performance. The 3 smaller threads on the little cores could have well been handled by a single larger CPU at higher frequency, but it wouldn't have been nearly as power efficient as spreading them onto the smaller cores.
Overall Analysis & Conclusion
When I started out this piece, the goal I set out to reach was to either confirm or debunk how useful homogeneous 8-core designs would be in the real world. The fact that Chrome, and to a lesser extent Samsung's stock browser, were able to consistently load up to 6-8 concurrent processes while loading a page suddenly gives a lot of credence to these 8-core designs, which we would otherwise not have thought capable of fully using their designed CPU configurations. In terms of pure computational load, web-page rendering remains one of the heaviest tasks on a smartphone, so it's very encouraging to see that today's web rendering engines are able to make good use of parallelization to spread the load between the available CPU cores.
On the high-performance "big" cluster side, the discussion topic is more about whether 2 or 4 core designs make more sense. I think the decision here is not about performance but rather about power efficiency. A 2-core big-cluster design would provide more than enough performance for most use-cases, but as we've seen throughout our testing during interactive use it's more common than not to have 2+ threads placed on the big cluster. So while a 2-core design could handle bursts where ~3-4 threads are placed onto the big cluster, the CPUs would need to scale up higher in frequency to provide the same performance compared to a wider 4-core design. And scaling up higher in frequency has a quadratically detrimental effect on power efficiency as we need higher operating voltages. At the end of the day I think the 4 big core designs are not only the better performing ones but also the more efficient ones.
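The "quadratically detrimental" claim above is the standard dynamic-power relation P ≈ C·V²·f: since the required voltage rises roughly with frequency, power grows roughly with the cube of frequency while throughput grows only linearly. A back-of-the-envelope sketch with invented coefficients (not measured Exynos data):

```python
# Dynamic CMOS power: P ~ C * V^2 * f. Simplifying assumption: the
# required voltage scales linearly with frequency, V = k * f,
# which makes power scale with f^3.
def dynamic_power(freq_ghz: float, c: float = 1.0, k: float = 1.0) -> float:
    voltage = k * freq_ghz
    return c * voltage**2 * freq_ghz

# Same total throughput: four cores at 1.0 GHz vs. two cores at 2.0 GHz.
four_wide = 4 * dynamic_power(1.0)   # 4 * 1.0 = 4.0
two_fast  = 2 * dynamic_power(2.0)   # 2 * 8.0 = 16.0

print(two_fast / four_wide)  # 4.0 -> the narrow, fast design burns ~4x the power
```

Real silicon doesn't follow V = k·f exactly, but the direction of the effect is why spreading load across a wider cluster at lower clocks wins on efficiency.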
The fact that an SoC has more cores does not automatically mean it uses more power. As demonstrated in the data, modern power management is advanced enough to make extensive use of fine-grained power-gated idle states, thus eliminating any overhead there might be from simply having more physical cores on the silicon. If there are cases (and as we've seen, there are!) which make use of more cores, then this should be seen purely as an added bonus and icing on the cake.
Apple, and recently Nvidia with their Denver architecture, both chose to keep going the route of employing large 2-core designs that are strong in their single-threaded performance but fall behind in terms of multi-threaded performance. For Apple, it can be argued that we're dealing with a very different operating system, and it is likely that iOS applications are less threaded than their Android counterparts.
On the other hand, scenarios where we'd find 3-4 high-load threads are not particularly hard to find, and actually appear to be a pretty common occurrence. For mobile, the choice seems obvious due to the power-curve implications. Excluding loads so small that it isn't worthwhile to spend the energy to bring a secondary core out of its idle state, one could generalize that if one is able to spread the load over multiple CPUs, it will always be preferable and more efficient to do so.
In the end, what we should take away from this analysis is that Android devices can make much better use of multi-threading than initially expected. There's very solid evidence that not only are 4+4 big.LITTLE designs validated, but we also find practical benefits of using 8-core "little" designs over similar single-cluster 4-core SoCs.
So once again we have a hit-piece article from DED that is just manure and makes assertions that are not congruent with reality.
Back when the iPhone 4S came out with the A5 everyone was using 2 core processors. This is when the smartphone/mobile processor wars really took off and we saw significant gains every year. The difference is the path companies took when they encountered the "fork in the road".
- Apple decided to stay with 2 cores for their processors and worked at making more powerful cores.
- Everyone else (Samsung & Qualcomm) started on a race to add more cores and crank up the clock speeds.
So here we are today and Apple has the fastest phone/tablet mobile ARM processors with the A9/A9X while still using 2 cores. Competitors have their 8 core processors that can barely keep up in very limited scenarios and get smoked in real world use.
The question is, where do they go next? Look at Intel as an example. They have so finely tuned and optimized their x86 architecture that new processors are only seeing small gains from one generation to the next. Short of some miracle technology breakthrough we won't be seeing Intel ever release anything that's 70-90% faster than last years model.
I think Apple is close to the limits of what they can extract out of a single core, with the A10 probably being the last jump in performance while still using 2 cores. Likewise Samsung and Qualcomm can't keep adding cores and they can't keep cranking the clock speeds. This is where Apple has a HUGE advantage over everyone else.
Apple was very smart to spend years designing their own cores/microarchitecture. This represents a significant investment in money & resources. Nvidia made their own custom 64-bit cores in Denver, but shortly after they dropped it and started using regular ARM cores instead. Now Qualcomm is making a custom 64-bit processor, but early benchmarks show it's still significantly behind Apple. Same with the Samsung Exynos 8890. It will take these companies a long time to get their cores as optimized as Apple's (that is, IF they can even get there).
Meanwhile, Apple, with their high performance cores has lots of options. They can go to 3 or 4 core processors. They can bump clock speeds. Or do a mix of both. Apple could keep the existing A9 core without doing a single thing to modify it, and by adding cores/playing with clock speeds can stay well ahead of Samsung and Qualcomm.
Bottom line: Apple did the hard work first and it's paying off now. Samsung & Qualcomm were lazy and took the easy way out (using ARM designs and adding cores/cranking the clock) at the beginning and now they're looking at a very high brick wall that's going to be very difficult to get over.
To me, this is the brilliance of making the whole widget as noted by Steve Jobs!
All you've shown here is that you can read but don't understand what's going on.
Of course Android can schedule across all 8 cores. Anyone who thought this wasn't the case has no idea how a modern pre-emptive multitasking operating system works. It's the job of the OS/scheduler to assign threads to cores.
You don't seem to know the difference between "scheduling" and "utilization". Perhaps you should ask your son to explain it to you, since you claim he's a coder. He'll tell you that there are virtually no Android Apps that can utilize all 8 cores (outside of a benchmark or highly specialized or optimized App).
Here's something to think about: a low-priority thread that's waiting for the user to press a key on the keyboard is spending 99% of its time idling and doing nothing. It will only perform processing when a key is actually pressed and it needs to handle the event.
This thread could be assigned to its own core, but it's not actually utilizing that core since it's basically sitting around doing nothing most of the time. So YIPPEE, one of those 8 cores has a thread assigned to it, but what's the benefit of having it sitting there doing nothing?
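The scheduled-but-idle distinction is easy to see directly: a thread blocked waiting for input consumes essentially no CPU, even though the scheduler has assigned it somewhere. A small illustration (the event names are made up for the example):

```python
import queue
import threading
import time

events: "queue.Queue[str]" = queue.Queue()
handled = []

def input_handler() -> None:
    # Blocks inside Queue.get(): the OS parks this thread, so whichever
    # core it was assigned to sits idle (or power-gated) until data arrives.
    handled.append(events.get())  # "scheduled", but ~0% utilization while waiting

t = threading.Thread(target=input_handler)
t.start()
time.sleep(0.1)      # the thread exists this whole time, doing no work at all
events.put("a")      # the simulated "key press": now it briefly runs
t.join()
print(handled)       # ['a']
```

For the 0.1 seconds before the event arrives, the thread counts toward "threads scheduled" but contributes nothing to utilization, which is the distinction being drawn above.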
Another example: You have 4 threads of a browser (the example Anand used) running on the high-performance cores all downloading parts of a web page concurrently. Core 1 finishes first because it only had to download some text. It's now sitting idle. Core 2 + 3 have a few small images and they finish next. Core 4 is loading a video, but the server is making it wait a second because of high demand. It finally finishes last.
What were the benefits of having cores 1,2 & 3 finish early and sit around doing NOTHING while they wait for core 4 to get the last piece so the page can be rendered fully? Again, all 4 cores had threads scheduled to run on them but none of the cores were actually being fully utilized.
In cases like these it would be just as easy to run all the threads on a really fast processor core and do a context switch (again, ask your son what a context switch is). Each thread gets access to the full speed of the fast core and as each one finishes they move to a low-priority state. When the final thread is completed the processor can resume its low-power state as it's no longer doing any heavy lifting.
You can ask any developer about this. It's really easy to divide the workload of your application up into multiple threads. It's very hard to divide them up in such a way that they are all doing an equal amount of work all the time, and you never have a situation where a thread finishes first and sits around idle.
I think I understand the article well enough.
What you are attempting to engage in is goal post shifting. The assertion DED makes, and which you are repeating, and which the article refutes, is that there are NO Android apps that make use of more than a single core. The article clearly refutes that. You are now apparently trying to shift the narrative to that of absolute efficiency, and competing architectures, which are different topics.
I am very surprised AI haven't been all over this one, jumping up and down with glee. Here's an early Christmas present for everyone:
Samsung to quit mobile phone business within 5 years.
Why? It's just somebody's opinion, worth about as much as any other opinion. No reason to get excited.
Bull. You know EXACTLY what DED meant and you're being intentionally obtuse. He's exaggerating for effect, of course, but his underlying premise is still correct. And that is that there are virtually no Android Apps that can fully utilize more than one core. There are likely a few that do, but the overwhelming majority of them don't.
You're the one trying to shift the goal posts of Anandtech's article (which is nothing more than a discussion of thread scheduling across multiple cores) to imply that Android Apps are taking advantage of those 8 core processors. They aren't. Not even in the slightest.
On the point of local (on device) vs. cloud, it is important to note that today (and IMO the next 5+ years), despite ever increasing mobile/wireline broadband and huge cloud resources, there is still a noticeable difference between services executed locally and those done in the cloud. More power on the device allows more to be done there (hopefully in future a good bit of Siri voice recognition as an example), which makes a better experience.
This is a good thread, interesting. I use both platforms, and Google's cloud services are a long way ahead of Apple's. I can point an Android phone at a sign written in Chinese characters, take a photo, and get a translation in a few seconds. This is insane. It won't be on-device for years.