Originally Posted by Dick Applebaum
I want to discuss your answers further -- you appear to have hardware expertise that I lack. I don't want to challenge your statements, rather to understand how they affect the computing world as I see it.
Sure, no problem.
CPU architectures have always been one of my favorite tech topics, by the way. I graduated university with a thesis on reconfigurable VLIW processor architectures, after that I worked in the EDA industry (software for IC design and production), and now I work for the world's largest supplier of optical lithography gear, so I think I can safely say I've always been pretty close to the fire ;-)
I realize that 64-bit has little or no effect on hardware performance. But it can have a significant effect on OS and app performance (paging memory, video rendering, parallel operation, etc.). Based on the work being done by the user, 64-bit and additional RAM can affect the perceived power and speed of the "hardware".
There really is only one way a 64-bit CPU would significantly affect performance (measurable or perceived), and that's when the combined memory requirements of all the tasks running on the system exceed 4 GB (the limit of a 32-bit address space) and there is no way to stuff more RAM into the machine. I think you can safely say this only becomes an issue if you are a power user, but it's an issue anyway.
That said, there are ways to address more than 4 GB of RAM on 32-bit CPUs, such as physical address extension (PAE) on x86, which widens the physical address to 36 bits. I'm pretty sure something similar could be included in future 32-bit ARM designs.
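To put numbers on both limits, here is a back-of-the-envelope sketch in plain C (nothing platform-specific assumed): a 32-bit virtual address reaches 2^32 bytes = 4 GiB, while PAE's 36-bit physical addresses reach 64 GiB -- though each individual 32-bit process still only sees its own 4 GiB window:

```c
/* Address-space arithmetic for the 4 GB limit and PAE. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t va_32bit = 1ULL << 32;  /* 32-bit virtual addresses       */
    uint64_t pa_pae   = 1ULL << 36;  /* PAE: 36-bit physical addresses */

    printf("32-bit virtual address space: %llu GiB\n",
           (unsigned long long)(va_32bit >> 30));  /* 4 GiB  */
    printf("PAE physical address space:   %llu GiB\n",
           (unsigned long long)(pa_pae >> 30));    /* 64 GiB */

    /* With PAE the machine can hold more RAM, but any single
     * 32-bit process is still limited to a 4 GiB view of it. */
    return 0;
}
```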
Some very specific tasks will benefit from 64-bit registers, by the way, but I honestly don't think the lack of 64-bit ARM designs is holding back the platform. Sure enough, when ARM goes 64-bit it will make another big step up in performance, but it won't be because of the 64-bitness. Just progress as usual.
I think I knew the answer to that, before I asked it.
However, Apple has done a lot of work to make their OS(es) and apps (including apps by 3rd-party developers):
-- largely independent/abstracted from the underlying hardware
-- to use OpenCL and GCD wherever possible
-- to exploit parallelism using any available CPU and GPU cores
-- for lack of a better phrase, "distributed processing"
This is definitely true. However, for now, Apple's OpenCL and GCD efforts are not really paying off yet. There aren't a whole lot of applications using OpenCL, and the ones that do are relatively specific (video encoding, some forms of scientific computing, etc.). GCD is mainly a tool to make it easier for developers to write code that benefits from multi-core/multi-processor setups, but it doesn't enable anything that wasn't already possible otherwise; Intel, for example, has its own Threading Building Blocks technology, which is more or less meant to solve the same problem.
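To illustrate what GCD actually buys a developer, here is a minimal sketch using the libdispatch C API; the squaring in the loop body is just a stand-in for real work:

```c
/* Minimal GCD sketch: spread N independent iterations across
 * whatever cores are available, without managing threads by hand. */
#include <dispatch/dispatch.h>
#include <stdio.h>
#include <stdlib.h>

#define N 8

int main(void)
{
    double *results = malloc(N * sizeof *results);
    dispatch_queue_t q =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    /* dispatch_apply runs the block N times in parallel;
     * GCD decides how many threads to actually use. */
    dispatch_apply(N, q, ^(size_t i) {
        results[i] = (double)(i * i);  /* stand-in for real work */
    });

    for (size_t i = 0; i < N; i++)
        printf("results[%zu] = %.0f\n", i, results[i]);

    free(results);
    return 0;
}
```

The point is exactly what I described: GCD makes this convenient, but a developer could have achieved the same result with plain threads, or with something like Intel's TBB.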
If Apple has done its job well, I believe that a power computing solution, for the near future, would be a series of "compute boxes" daisy-chained on a Thunderbolt cable along with RAID storage, peripheral docks, wireless stations, and displays.
These "compute boxes" would consist of:
-- enough SSD to run a minimal OS
-- CPUs and GPUs
-- an internal power supply
-- a fan if needed
-- small packaging like the Mac Mini or Apple TV 2
The theory is that as your compute needs grow -- just add another "compute box" to the daisy chain.
These "compute boxes" could contain whatever CPU and GPU architecture that provided the required price/performance.
Very interesting thought. I was about to write how I personally don't share the same vision, but now that I come to think of it, what you describe would definitely have some interesting applications, and we may actually see something similar in the future.
I don't think it would be something people would have at home, though; it makes much more sense in the context of the distributed computing that big-iron compute clusters are used for now. The idea of clustering computers and adding or removing nodes to scale the total computational capability of the cluster is not new, of course, but with a ridiculously fast port like optical Thunderbolt (targeted at 100 Gbps) you could imagine 'miniature' compute clusters that don't require racks of big machines and expensive network infrastructure, yet largely avoid the difficulties that compute clusters have.

In a traditional compute cluster the nodes don't share memory, so every job has to be chopped up, distributed, and the results reassembled afterwards. That makes the network infrastructure between the nodes a huge complication and bottleneck, to the extent that it rules out many interesting use cases: the communication overhead far outweighs the computational gains. Thunderbolt could greatly increase the set of viable distributed computing use cases, because it reduces that communication overhead by a very large factor.
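A quick back-of-the-envelope calculation shows just how much the interconnect matters. The 1 GiB job size below is purely an illustrative assumption, and the 100 Gbps figure is the optical Thunderbolt target mentioned above:

```c
/* Rough arithmetic behind the communication-overhead argument.
 * The 1 GiB job size is an illustrative assumption. */
#include <stdio.h>

int main(void)
{
    double job_bytes = 1.0 * (1 << 30);   /* 1 GiB of input data          */
    double gige_Bps  = 1e9 / 8.0;         /* 1 Gbps Ethernet, in bytes/s  */
    double tb_Bps    = 100e9 / 8.0;       /* 100 Gbps optical TB, bytes/s */

    printf("Ship 1 GiB over GigE:       %.2f s\n", job_bytes / gige_Bps);
    printf("Ship 1 GiB over optical TB: %.3f s\n", job_bytes / tb_Bps);
    return 0;
}
```

That's roughly 8.6 seconds versus 0.086 seconds to move the same job to a remote node: two orders of magnitude less time spent waiting on the wire before any computing can start.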
That said, stuff like this only makes sense for computing tasks that take a long time and can easily be chopped up into smaller pieces, or if you have many tasks running concurrently. By 'running concurrently' I don't mean just having 20 applications sitting on your desktop, because usually only one or two will actually be using the CPU, with the rest sitting idle most of the time. I can't really imagine a whole lot of garden-variety use cases that require massive parallel CPU power, except if you like doing video editing at home and such.
For speeding up single-, dual-, quad- or even 8-threaded applications, a CPU with multiple cores/multiple hardware threads will always be faster than a cluster of compute boxes, no matter how fast the interconnect is. Nothing beats on-die scheduling and execution of tasks with a direct data path to shared system memory. Even when I try my best to bring my quad-core i7 iMac to its knees, doing large multi-process Xcode compiles with a Linux VM running in the background while playing a video, the CPU is hardly the bottleneck. And next year Ivy Bridge will provide a comparable performance level at a TDP around 40 watts, even with the cheaper entry-level parts.
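The usual Amdahl's law arithmetic makes the same point about diminishing returns: unless a workload is almost entirely parallelizable, adding workers stops paying off very quickly. A tiny sketch:

```c
/* Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the
 * parallelizable fraction of the work and n the number of workers. */
#include <stdio.h>

static double amdahl(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    int counts[] = { 4, 8, 64 };

    /* With 90% of the work parallelizable, even 64 workers
     * top out below a 9x speedup. */
    for (int i = 0; i < 3; i++)
        printf("p = 0.90, n = %2d -> %.2fx speedup\n",
               counts[i], amdahl(0.90, counts[i]));
    return 0;
}
```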
Of course nobody knows what kind of applications we might see in the future that would map favorably onto a compute cluster, but right now my impression is that home/office use stopped demanding faster CPUs a few years ago. This is exactly why Intel should be worried about losing out on the 'low-end' and mobile side of computing: soon any ARM CPU will reach that baseline level of performance, and Intel will have a really hard time convincing people they need faster CPUs.
The only problem I have with your last paragraph concerns ARM chips being used in low-end, cheap computers and laptops.
As I understand Windows 8, in order to run legacy x86 apps the device will require an x86 CPU. This would appear to eliminate the use of ARM in low-end, cheap computers and laptops.
Further, developers might be discouraged from rewriting their x86 apps for Metro/ARM because of the disincentive of paying MS 30% for the privilege.
I have no problem with the curated Metro store or the 30%...
But I think it is a chicken/egg thing -- without a lot of Metro apps there won't be any Metro tablets (or low-end, cheap computers and laptops) -- and without the Metro tablets et al., there won't be any incentive to port x86 apps to Metro/ARM.
Personally I think tablets and possibly Chromebook-like devices (the 'low-end') will replace laptops for many tasks, with 'real' laptops/desktops marginalized for all the other, 'serious' computer tasks (the 'mid-range/high-end'). This will almost inevitably mean ARM for 'low-end' computing, x86 for everything else. So I'm not sure if the lack of backwards compatibility with x86 desktop Windows is really going to be a big deal in the future.
If I were in charge at Microsoft, I would restrict Windows 8 for ARM to Metro apps, and force all Metro apps to be universal binaries (i.e., ARM + x86 compatible). I don't see why anyone would want to run x86 desktop apps on mobile hardware.
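For what it's worth, the mechanism for this already exists: a universal ("fat") binary simply packages one compiled slice per architecture, and a single code base builds for each slice via the compiler's predefined macros (on OS X, the `lipo` tool is what glues the slices together). A minimal sketch:

```c
/* One source file, multiple architecture slices: the preprocessor
 * selects the right code path for whichever slice is being built. */
#include <stdio.h>

int main(void)
{
#if defined(__arm__)
    printf("Running the ARM slice\n");
#elif defined(__x86_64__) || defined(__i386__)
    printf("Running the x86 slice\n");
#else
    printf("Running on some other architecture\n");
#endif
    return 0;
}
```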
Finally, I don't know this, but based on past performance, I suspect it is true:
Say there is a breakthrough and a new computer architecture suddenly arrives on the scene. Apple is in a good position to migrate their OSes and apps, natively, to exploit that new platform. And through something like Rosetta, existing iOS, OS X and Windows apps could run at normal speed in emulation. Third-party iOS and OS X apps could run natively with a simple recompile.
Apple has bet the farm (and won) on this kind of revolutionary migration -- no other OS or hardware manufacturer has.
Yes, there's no denying that. Apple has proven they know how to switch architectures and are not afraid to actually do it. I would not be surprised if at some point they released a MacBook with an ARM CPU in it -- not one replacing the x86 CPU, but one added alongside it, allowing you to boot, or even switch on the fly, between ARM and x86 OS X depending on your computing needs. Imagine a MacBook Air that would do 12 to 20 hours in 'ARM mode', but still have the performance of a fast laptop in 'x86 mode', using universal binaries and/or ARM emulation for interoperability.
I'm not sure we'll see any big breakthroughs in computer architectures anytime soon though. You'd expect at least some hints pointing in this direction in the form of research papers and such, but apart from quantum computers, I'm not aware of any such thing. But who knows what could happen...