Insanely Upgradeable Mac Pro
Apple should make Insanely Upgradeable Mac Pro computers like these (Mac mini, Mac mini Tower, Mac Pro plus standalone displays for them):
HP unveils its insanely upgradeable Z-class workstations
Does 3TB of RAM, dual Xeon CPUs and dual NVIDIA Quadro P6000 GPUs work for you?
Virtual Reality, Machine Learning and Design Needs Spark Reinvention of HP Z Workstations
Most Powerful Workstation on the Planet Part of HP’s Next Generation Workstation Portfolio
HP’s MOST POWERFUL PCs
Z WORKSTATIONS
Experience what the unprecedented power of the world's most secure workstations can do for your vision.
Comments
https://www.theregister.co.uk/2017/05/16/aws_ram_cram/
Here's a VFX workstation spec used at Dreamworks (2015):
http://www.techradar.com/news/world-of-tech/inside-dreamworks-how-animated-movies-are-rendered-1127122
Dual 8-core Xeon (2012/2013 CPU), 32GB RAM, 160GB SSD, NVidia Quadro 5000 (2011 GPU, <1TFLOP). Model was an HP Z820.
Look at the statement at the end:
"DreamWorks says it has a close partnership with HP because it doesn't like to get surprised by technology. The multi-core systems are fine-tuned for production work. In the hurried pace of creating an animated film, the computer has to be a reliable, predictable tool."
When you look at their earnings, you can see how much they spend on hardware and the upgrade cycle (2-5 years):
https://www.sec.gov/Archives/edgar/data/1297401/000129740116000026/dwa-12312015x10xk.htm
They spend ~$100m/year. That includes upgrades to their 20,000 core renderfarm using HP blade servers. The following article from last year says they are upgrading to 20-core Ivy Bridge Xeon (dual 10-core), 96GB RAM, dual 480GB SSD, Nvidia Quadro K5000 (2TFLOP).
http://www.alphr.com/technology/1003629/dreamworks-the-tech-bringing-animation-to-life
The following puts the workstation count at about 1,500 units (they have 2,700 employees) and says they want to look at hardware leasing rather than ownership:
http://www.computerworlduk.com/data/how-hp-powers-dreamworks-animation-3659490/
HP says workstations aren't a big moneymaker for them; it's more about PR. This is obvious from their earnings. Workstations are $1.8b (6% of HP's revenue, equivalent to 600,000 x $3k units). Apple makes $22b from Macs. Even if Apple managed to become the biggest workstation manufacturer in the world, it would be less than 10% of their Mac revenue and less than 1% of their entire revenue.
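A quick back-of-envelope check of those numbers (all figures are the estimates quoted above, including the assumed $3k average selling price):

```python
# Market-sizing sanity check using the figures quoted in the post.
hp_workstation_revenue = 1.8e9  # USD/year
avg_unit_price = 3_000          # assumed average selling price, USD
apple_mac_revenue = 22e9        # USD/year

units_per_year = hp_workstation_revenue / avg_unit_price
share_of_mac_revenue = hp_workstation_revenue / apple_mac_revenue

print(f"~{units_per_year:,.0f} units/year")                  # ~600,000
print(f"{share_of_mac_revenue:.0%} of Apple's Mac revenue")  # 8%
```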
What these companies are asking for more than anything is trust, just like with Final Cut Pro. Their business depends on the decisions they make, so they can't depend on a company that pays very little attention to those products. They use the same hardware for years, so upgradeability clearly isn't such a big deal in itself, but it gives them options. They are opting for Nvidia GPUs, as most companies do, and if Apple only offers AMD then they can't get what they want.
This can be solved easily with some kind of additional PCIe slot. Thunderbolt can cover this, but only if the OS supports it, which Apple's didn't until recently. In 2019, PCIe 5 will allow 4x the bandwidth of current PCs, so Intel might be able to introduce a 2-4x speedup over the already fast 40Gbps TB3.
The current Mac Pro can already support up to 512GB RAM with 4 RAM slots. CPU can go up to 24/28-core. It can support up to 4TB SSD. GPU power is enough for ~12-14TFLOPs. If they went with a single GPU internally, that would free up PCIe lanes. If they had something like a mini-PCIe slot, there could be a cable like what is used for the Cubix Expander:
http://www.cubix.com/product/xpander-adapter-with-cable/
It would just come out under a notch at the back. Then you can plug whatever you want in, even 4 GPUs externally with a 1500W PSU without changing the form factor of the main unit. Or they can have a custom connector and custom PCIe containers with their own cooling.
In addition to this, without the other GPU there's room for storage, so there could be a storage connector that allows for large, lower-cost SSDs. QLC SSDs can go up to 100TB in a single 3.5" drive. Typical workstations probably wouldn't need more than 10-15TB for active projects. Samsung's 16TB SSD is $10,000 and uses a 12Gbps SAS connector.
It would still be expensive, but this is where an upgrade program helps. HP wanted leasing, but if Apple charged $1k upfront and $99/month with the option to trade in to a new model every 2 years, people would keep upgrading, wouldn't have the hassle of selling a very expensive machine on eBay and wouldn't face high upfront fees. They can offer a higher-priced Nvidia option for the people who really want it, but external expansion allows for anything.
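To put rough numbers on that kind of program (the $1k/$99 figures are from the post; a 2-year cycle is assumed):

```python
# Total outlay over one 2-year upgrade cycle of the proposed program.
upfront = 1_000      # USD, paid once per cycle
monthly = 99         # USD/month
cycle_months = 24

total_paid = upfront + monthly * cycle_months
print(f"${total_paid:,} over {cycle_months} months")  # $3,376 over 24 months
```

That lands close to a $3k sticker price plus a modest premium for the built-in trade-in, with no resale hassle.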
The enthusiast/gamer market is looking for a cheap tower around $1000-2000 with a standard i7 and the ability to put any GPU in. This doesn't work well with Thunderbolt, but a Mac Pro upgrade program would work for this market too: they don't have to get their own GPUs, they just get the next upgrade, or they can buy Apple's refurbs of the 1-2 year old models.
If they decide on internal upgradeability, there's no point in putting in 9 PCIe slots for the handful of people that will use more than 2. 4 RAM slots is enough, and a single PCIe connector of some kind, especially if it's PCIe 5, will be able to connect anything. I don't think it should be a full-size internal slot, because that just means a mess of internal cabling, power/cooling issues and holes in the casing. They could get away with a dual trashcan: you buy the base unit and, if you need to, stick another on the side with some GPU or PCIe card in it.
The Mac mini can be turned into a stick-like computer like Intel's Compute Stick once SSDs are cheap enough:
https://www.reddit.com/r/pics/comments/32q7pp/this_is_the_size_of_entire_motherboard_w_cpu_gpu/
https://www.intel.com/content/www/us/en/compute-stick/intel-compute-stick.html
People can plug it into a TV or monitor very easily and it makes coding accessible. The iMac covers the mainstream premium segment so they can make the margin on the display sale instead of that going to another company and they can improve the user experience with a nice display.
Apple's upcoming professional display will likely be no bigger than 30", I'd expect space grey, laminated glass, as slim as the edge of a current iMac, 10-bit, 5K or 8K (if Thunderbolt 4 in 2019 uses PCIe 5 then it can handle 8K/10-bit/60Hz = 60Gbps).
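The 60Gbps figure follows from the raw pixel rate (assuming uncompressed RGB at 10 bits per channel and ignoring blanking overhead):

```python
# Uncompressed bandwidth for 8K / 10-bit / 60Hz.
width, height = 7680, 4320   # 8K UHD resolution
fps = 60
bits_per_pixel = 3 * 10      # RGB, 10 bits per channel

gbps = width * height * fps * bits_per_pixel / 1e9
print(f"{gbps:.1f} Gbps")    # 59.7 Gbps, i.e. ~60Gbps
```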
The 'Think Different' dreamers, creators, etc. are the pro market... from prosumer types up to the true pros that need high-end equipment. I don't think Apple needs a box with 24 RAM slots, but they need more than their current offerings. And... I think they likely need more than what the next Mac Pro will be.
They also really need a box that is an iMac w/o the display, either big enough to have a single GPU slot or possibly using an eGPU now. It would be easy to produce and would fill a large segment between the mini and the Mac Pro for people who don't want an iMac. iMacs are fine, but unless they open them up to external signals or enough variations, some people want to set up their own display and peripheral environment.
The reason to switch the GPU is compute, and GPUs have only been used for compute because they happen to be built pretty well for it. I think that Apple could build a better computation engine from their own silicon. That would be the best thing to offer people who need a lot of raw computing power. It would be like a box of ~100 ARM cores (50-100TFLOPs, ~$5k) tuned for OpenCL and scientific computing. This could be used in place of special-purpose media decoders like RED's and could work with real-time content like Da Vinci. You'd just plug it in via Thunderbolt to any Mac system, and if you need to render an FCP X timeline to MP4, it can spool the media over to the ARM cores to encode in blocks in parallel. Processes would need to be compiled for it, but Mac rendering binaries can be compiled for it no problem.
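The block-parallel encode idea can be sketched in a few lines. This is purely illustrative: `zlib` stands in for a real video encoder, and a thread pool stands in for the hypothetical box of ARM compute cores.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def encode_block(block: bytes) -> bytes:
    # Stand-in for a per-block encode step (one GOP to MP4 in the real case);
    # zlib keeps the sketch self-contained and runnable.
    return zlib.compress(block)

def render_timeline(blocks, workers=4):
    # Carve the export into independent blocks and encode them in parallel,
    # the way blocks could be spooled out to a rack of compute cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_block, blocks))

timeline = [bytes([i]) * 100_000 for i in range(8)]  # fake media blocks
encoded = render_timeline(timeline)
print(len(encoded), "blocks encoded")  # 8 blocks encoded
```

The key property is that the blocks are independent, so throughput scales with core count until I/O becomes the bottleneck.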
The people who want expansion generally want to put in more power, so why not give them more power directly instead of more expansion where they have to buy the power components on top, which are more expensive and less powerful as each has to fit into its own power limits. Maxon/AutoDesk/Chaos/BlackMagic/Adobe won't make all of their heavy processing software work on GPUs, but they can easily make it work on a 100-core ARM cluster when it's as easy to compile for as the Mac binary. They can call it the Apple Core. The Mac Pro cylinder is the right shape for it. The front-end would be any Mac system, and it would have its own memory and its own OS. If people need more power, they just buy more Apple Cores and plug them in, scaling up to 400 cores (200-400TFLOPs). Nobody can compete with this because they can't make products like this: Intel and Nvidia would never make them as cheap; AMD might, but they'd never take the risk on building it as it has such a small market; and none of them has their own OS.
When you look at past examples of Macs in computing, e.g. https://en.wikipedia.org/wiki/System_X_(computing), that's $5.2m for 1,100 G5s in 2003 to get 20TFLOPs max. A single GPU can do that now.
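Cost per TFLOP makes the gap concrete (System X figures from the link above; the GPU price is an illustrative assumption for a ~12TFLOP card):

```python
# Dollars per TFLOP: the 2003 System X cluster vs a modern single GPU.
system_x_cost, system_x_tflops = 5.2e6, 20  # 1,100 G5s in 2003
gpu_cost, gpu_tflops = 700, 12              # assumed modern high-end GPU

print(f"${system_x_cost / system_x_tflops:,.0f}/TFLOP in 2003")  # $260,000/TFLOP
print(f"${gpu_cost / gpu_tflops:,.0f}/TFLOP now")                # $58/TFLOP
```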
Given that it's Apple's own silicon, they can approach the scientific/arts communities to find out the most intensive and widely used computing algorithms and they can make custom silicon for those tasks with APIs without the overall hardware being special purpose. Like how they put hardware noise reduction into the new iPhones, that would work for Da Vinci with temporal noise reduction.
They should be able to make something that performs like this:
The second video shows a $40k quad-CPU machine. That price shows why talking about high-end hardware for the performance alone doesn't make sense. If hardly anyone can afford it (the company wouldn't even ship it), then there's no point in building something like that, because if someone returns it, the company can't just sell it to someone else. It's also huge, and not everyone wants a machine that size. Apple builds machines to get one model to fit as many people as possible; that's one of their strengths (compare smartphone manufacturers making dozens of models).
A modular system allows for this and the modules are better external and powered separately otherwise they would either be constrained too much or the base system would be too big. That's what Thunderbolt was designed to do. It is bandwidth-constrained when dealing with things like 8K so there can be a custom high bandwidth connector for special use cases or it may not matter with PCIe5 coming.
There are only so many ways that a box can be built: it needs a power supply, a motherboard, a cooling system, one or more CPUs, one or more GPUs, some storage and memory. The base system needs 1 CPU, 1 GPU, 4 RAM slots and SSD storage. Beyond that, they can attach other modules externally, e.g. slot a GPU or custom board onto it with its own power supply, slot multiple base units together for multi-CPU systems, or attach the custom ARM cluster.
http://appleinsider.com/articles/17/04/04/all-new-mac-pro-with-modular-design-apple-branded-pro-displays-coming-in-2018
On the meaning of "modular" -- everyone basically fits that word to what they think they want or what they think Apple should be doing. That word has shut up a lot of people for now, thank god, but once the reality is introduced the response will be ... cacophonous. No matter how complete a solution it is, it will only meet the expectations of some. The rest will be disappointed, loudly.
We discussed this all before, likely in that thread Mike links to above (which I haven't looked at now), but the more I learn about it, the more I think we might be looking at another attempt at a paradigm shift -- they tried it with the cylinder and failed, but the tone of that media session was more like "we're going to take another swing at it" rather than surrender. I know I keep harping on this with no actual basis, but I still think the driver behind the Mac Pro announcement was something about Cascade Lake's access to persistent memory -- the timing of the session was just about when Apple would have gotten the details (if not prototypes) of what Intel was up to. The imprecise timeline of "2018 at best" might indicate they knew Intel's release dates for Xeon-SP 2nd generation (Cascade Lake) might slip. Whereas Xeon-W was all set to go for the iMac Pro, so they could be definite about that.
The only downer is that HP's Z6 line shows that the new Mac Pro might well just be single-socket -- Apple doesn't have to go dual-socket to take advantage of Purley. That's good from an affordability standpoint, of course -- a 2S Mac Pro would be, um, expensive.
The only other question in my mind is what will they do with the GPU. I see a CPU/Memory/System/Storage module, but I'm not sure the GPU fits into that. Maybe not. I mean, the whole point of the session was to lament how the GPU ruined their elegant solution. So it's not crazy to think the session was intended to start laying the groundwork for the idea that GPUs should be "architected" into separate modules, because "thermals."
https://www.tweaktown.com/news/59265/amds-next-gen-vega-20-uses-pcie-4-arrives-q3-2018/index.html
Single socket isn't much of a disadvantage these days. The $30-40k computers in the videos above are quad-socket. It comes down to the price point being targeted. By the time you get beyond 12-core, the chips are $3.5k each and the number of people buying above $10k is so small that it's not worth making the hardware. Single socket goes all the way to 28-core now:
https://ark.intel.com/products/120508/Intel-Xeon-Platinum-8176-Processor-38_5M-Cache-2_10-GHz
There's always the option for people to buy more than one machine to get as many CPU cores as they need. DreamWorks has 20,000 cores in their renderfarm; they just buy more machines.
Every computer needs a GPU in order to drive a display so I think the base would still have a GPU. Apple will be making another professional display. I'd expect the display inputs to be based on Thunderbolt but I think it will be Thunderbolt 4 in order to support 8K. 8K/60/10-bit = 60Gbps. TB4 will likely be 80Gbps.
Right now, GPU tech is at 50GFLOPs/Watt. By the time the next Mac Pro ships, it may be 75GFLOPs/Watt. This means a 250W GPU would be nearly 19TFLOPs. It should be over 12TFLOPs anyway (an XBox One X is 6TFLOPs).
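The arithmetic behind that estimate (the efficiency figures are the post's projections):

```python
# TFLOPs from a projected efficiency and a fixed thermal budget.
gflops_per_watt = 75   # projected GPU efficiency by ship date
tdp_watts = 250        # typical high-end GPU thermal budget

tflops = gflops_per_watt * tdp_watts / 1000
print(tflops)          # 18.75 -> "nearly 19TFLOPs"
```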
A single internal GPU like that could be in the base unit and used to manage the framebuffer and connect to the displays.
That base unit would suffice for the majority of Apple's Mac Pro customers. Since they'd only have a single internal GPU, that leaves a lot of PCIe lanes for the modules. Given that the fans are circular, having a cylindrical shape still works well but the modules can be attached to the sides to form something like this shape (internal ~8", external ~5"):
It is pretty much the GPUs (the RED decoder is also a GPU) that would go outside, as those use a lot of power. They would need their own power supply and cable, but that external case can have a large PSU (1kW) at the bottom, and the base unit can slot into the middle, which saves having a power supply for each. If it supports full-height GPUs, the outside part would be taller than the unit in the middle.
This would allow for up to 4 total GPUs (76TFLOPs). It wouldn't even need the one at the front, 3x GPUs would be enough (57TFLOPs). 4 RAM slots would allow up to 512GB RAM. XPoint persistent memory would allow cheap memory so for example, they can have 2x 128GB RAM and 2x 512GB XPoint for over 1TB of RAM.
The SSD will offer up to 4TB, but these external units would allow a PCIe SSD add-on or a multi-drive RAID hooked into the PCIe connector.
This means that when a new Mac Pro comes out, you just upgrade the base unit and keep the external part. The base unit would have an AMD card but the external unit allows for Nvidia cards.
But, I'm not sure Apple has to try and build the fastest commercial machine on the planet or anything. They just need something better and more versatile than their current offerings. While the iMac is a nice machine, it's a bit too 'all in one' for some, including me. They really have nothing else to offer prosumer-to-pro customers.
And, as I've mentioned in other threads, we have to define what Pro even means anymore. In the way I've always used it, it pertains more to a quality of the equipment, than to the professional vs consumer. A lawyer might use a Chromebook to be way more 'pro' than I'll ever be if we're talking hourly billable work. So, what I mean is a machine that can run 24x7 at max output, is durable/reliable, etc. And, then we start getting into what is needed in terms of CPU, GPU, RAM, storage, etc. to get particular 'pro' type jobs done.
Personally, I'd like a more mid-level machine that doesn't cost an arm and a leg, can handle being pushed for extended periods of time, and remains reasonably quiet. If the 'trash can' had TB3 and an updated GPU, and was reasonably priced, it would fit the bill quite well for *me*. Others need a lot more GPU or other expandability, but the 'trash can' was headed in the right direction; it was just a generation or two too early. And, I think there is enough of a market to keep a 'cheese grater' type machine as well.
Apple's blunder wasn't the 'cheese grater' or 'trash can' so much as it was not listening to their pro community and just keeping the dang hardware up to date. How hard would it be for them to just release an updated version of both with modern hardware? You'd think that would be a weekend project for the teams that work on these machines.
I don't need some new wiz-bang design so Schiller can make some quip about innovation. I just need a solid pro and prosumer machine.
Also, you can now do a heck of a lot external. But you still need some basic level of CPU/RAM/GPU/storage in the box at some minimal level for specific types of jobs. Whatever Apple comes up with, it needs to hit that target for the majority of pro needs.
Yes, GPU is the big gotcha, as different kinds of 'pros' want different GPU architectures, and there is some minimal amount of GPU power that will keep them happy. Also, since GPU tech moves rather quickly, many want to keep up, as it quite literally pays for itself. (Though, I suppose it could be argued that if Apple just kept up, most of these people would happily buy the new model rather than trying to swap GPUs.)
Yes, you need a base amount of GPU to get whatever job done locally. After that, most people probably just buy cloud computing power anymore. I used to do 3D rendering and set up a bunch of machines locally and remotely as a 'render farm.' But, people are now doing this with Amazon, Google, etc. cloud computing, where they can just quickly spin up whatever scale of computing power is needed for a particular project... on demand, only paying for what they use... no overhead!
That doesn't work for GPU as well though, if you're trying to pre-vis the scene while setting it up, or playing the latest FPS title. Maybe someday.
I bow to your argument for a basic GPU in a core cylinder that fits into an optional trilobal base for expansion, but I still wonder about it. What if the base were not optional? It might be possible to create a more elegant and efficient core cylinder without the GPU, thereby achieving a more attractive and natural overall shape. For those who want 3x GPUs, they can still go there, but most would use the two unassigned modules for storage and/or memory.
It would be about 2U tall, they'd fit about 10 units along the width and 7 depth-wise so ~70 units per 2U rack with 350W power consumption.
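The density estimate works out as follows (unit counts and the ~5W per node are the post's assumptions):

```python
# ARM nodes and power per 2U rack slot.
units_wide, units_deep = 10, 7   # assumed packing in a 2U enclosure
watts_per_unit = 5               # passively-coolable A-series class chip

units = units_wide * units_deep        # 70 nodes per 2U
total_watts = units * watts_per_unit   # 350 W
print(units, "nodes,", total_watts, "W")  # 70 nodes, 350 W
```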
Very cheap for dedicated servers. The processes would have to be compiled for ARM but that's not such a big problem and with things like system integrity protection and filesystem sandboxing in iOS, I think they could make a pretty secure server out of the box.
This has been done with Raspberry Pi boards:
https://www.raspberrypi.org/forums/viewtopic.php?t=155030
Rather than raw boards, Apple would have a product with fan and heatsink with a standard connector.
One of Apple's strengths here would be ease of configuration, but also an App Store. Desktops and mobiles have benefited from having app stores, but servers are still configured at the command line with binaries that you just hope work OK, knowing nothing about vulnerabilities or the developer.
A customer would buy/rent their dedicated Apple box (or multiple boxes), choose what size of separate storage they wanted via a hosting company and it would have a neat iOS-style configuration UI, which could be managed with an iOS device as well as a desktop with graphs showing traffic and status.
To add software, they would get it from the server app store, but the apps would behave differently from iOS apps, as some need to run all the time as services. It makes some things much easier, though, like image encoding: if someone uploads an image to a server, a Pixelmator-type app (no UI) can do the conversion using an API.
These servers would also be available for home use so if you wanted a personal email server at home or in a small office, a node could be hooked up to a Time Capsule or something. Then you'd buy an email server app from the store and install it. Something that would take hours to setup normally by someone with IT training could be done in minutes by anybody.
For web hosting, that too would be an app, so PHP/Ruby/Node or whatever would be installed in a self-contained app. Apple could have a team offering official versions of the most popular open source services; Microsoft/IBM/Oracle would offer theirs.
It would be low cost for Apple to get in this way and it's cheap for end users. A lot of the revenue would be the service revenue from the software.
It's the price that really kills a lot of high-end hardware sales. That affected the Xserve and the 17" MBP too. The Mac Pro entry price went from $2200 to $3000; Power Macs were as low as $1800.
I think if they drop one of the GPUs from the base Mac Pro, they should be able to hit $2500. I still don't think that's enough to boost sales but there are upgrade programs they can make to help with that. If they offered purchase options like $1k upfront with $99/month, businesses would be able to order a bunch of them and be able to upgrade them at any time within 2 years.
The Mac Pro cylinder works ok for the majority of professional use cases, the heavy computing users are a small number of people. For that portion, they can add an extra external unit like so:
The base Mac Pro unit can have a connector on the bottom and it can twist onto the external. The cylinders to the sides would be large enough to hold full length cards vertically but no external ports (they could restrict it to half length cards to match the height). Any cabling would have to be fed out under a notch at the back. The internal and external units would have separate power supplies but there can be a split cable to power both from a single wall socket.
The external power supply could be 600W and power two high-end GPUs for 3 total or one high-end GPU and a custom card or storage product.
This means that when they come to upgrade the base unit, they unscrew it and sell it separately and just screw the new one in place. The connection would be tight enough that the whole unit can be picked up using the base unit handle.
The external pods could be custom-made by Apple and similarly screwed into place but it's easier to just let them use 3rd party components. The separate cooling per unit keeps them quiet. It works for storage because there can be RAID cards with 4 connectors to get 4x 16TB SSDs:
They don't even need a custom external really. This can be done using a mini PCIe connector from inside the base unit and 3rd parties can build the boxes but either way, it allows Apple to update the Mac Pro infrequently without holding users back and they can just drop the prices a bit over time like they did with the 2013 model to stay competitive with PC price/performance.
They could make a Mac Pro that is, say, a 4-brick case for local use, or sell bricks standalone so customers can mix and match CPU/GPU or special units to suit. An office might have a small rack unit with a few Mac Pro bricks in it, or scale up to a full data center like Apple could do themselves.
As for upgrades, the market would take care of most of them. Apple switches their brick to the latest A-series as soon as they are making enough to cover that year's iPhone.
It's basically a taller dual cylinder. I was thinking about the internals of this having sleds like the old Mac Pro hard drives. There would be 3 or 4 put in sideways. The CPU sleds would have the RAM and an SSD blade on them. This would allow e.g. 3x CPUs and 1 GPU, or 1 CPU and 3x GPUs, and it means that when Apple gets an order, they just slot the ordered components into the unit and ship it. They could easily reconfigure returns, and damaged boards can just be pulled out. The heatsinks would go between the sleds, with some on the insides.
However, that design works just as well and is probably more practical. The CPUs aren't mentioned, but I assume one or two are on the inside of the triangle heatsinks.
It needs to be that tall to support full length off-the-shelf GPUs but Apple can make more compact, better performing and better cooled boards than these companies:
http://www.nvidia.com/object/pf_boardpartners.html
If they supported half-length cards, that would be enough for some 3rd-party options ( https://www.vortez.net/articles_pages/zotac_gtx_1080_ti_mini_review,1.html ) and they would only need to make a couple of their own boards. Having separate GPUs makes it problematic to get the video out of the Thunderbolt ports at the back, so they might need to put in a fixed internal GPU. That design allows for this, and it would work better for compute too, as both of the other GPUs can be set to computing tasks while the fixed internal one manages the displays.
I wouldn't rule out them making something similar to this. It's based on their design. I definitely don't see them going back to a box design, because it's hard to make a box shape look good, especially with curved edges, and the fans are circular. The new one looks so much better:
A double length one would still be compact. Power supply would need a boost from 450W to 1kW. TB4 will allow for 8K output so I don't think it will ship until late 2018 at the earliest but they could show it at WWDC 2018 like before.
Making a Mac Pro out of ARM cores would make it a lot less expensive but I think they'd have to go all-in with ARM Macs to avoid having software conflicts. For servers there's less of a software barrier because people don't need that many different apps. Usually it's just one or two main services (server + database) running the whole thing. Everything else: firewall, SSH, FTP, control panel can be included by the OS. With an app store (which I'd expect to be based around service pricing e.g $0.99/month) where people are accustomed to buying service apps, the software issue would be taken care of.
They could certainly have main nodes as server controllers that just use other nodes for load balancing and scaling up compute tasks. I think this would work for independent studios. They'd buy standard Macs and then racks of ARM chips. 70 quad-cores (280 cores) would cost ~$10k, more if each node had more memory. They'd compile whatever compute app (it would support OpenCL too) for the rack and then control it via the Mac systems.
BTW, does anyone know how many TB3 controllers can be implemented? It looks like maybe 4 from a bit of searching I did, but I'm in over my head in technical detail. That probably limits any kind of modular expansion in the sense of separate boxes tied together. I suppose if the 'core' unit had 4 TB3 channels, you could go external with up to 2 GPUs, storage, and I/O, all at close to (~90%?) the performance of having them internal. (There was a video speculating about this kind of modular setup from the Macsales folks.)
The main benefits are that they are cheap and can be custom made. The chip in the Apple TV is sold for $179 along with a remote (that costs $79 to replace) and 32GB storage and a power supply. Take out the extras and you get a box that is <$100. Boost the core count, up the clock speeds, make it actively cooled and it could make for a very inexpensive computing product.
Intel doesn't make their high-end processors cheap because they don't need to. They own most of that market, they'd just be throwing money away. They have a single chip that costs $13k:
https://ark.intel.com/products/120498/Intel-Xeon-Platinum-8180M-Processor-38_5M-Cache-2_50-GHz
The highest Geekbench score is the following 8-CPU (18 cores each) machine at 172,099:
http://browser.geekbench.com/geekbench3/5471006
https://ark.intel.com/products/84685/Intel-Xeon-Processor-E7-8890-v3-45M-Cache-2_50-GHz
Apple's A10X scores 9,562, and it has a GPU in there too:
http://browser.geekbench.com/v4/cpu/4262233
Intel's chips are 165W each so that machine is using 1.32kW of power to get that result. Apple's chips are passively cooled and using ~5W. The Xeon machine there is 18x faster. Even if Apple's chips were left at the same performance while actively cooled, 18 chips is only 90W of power and $1800. Intel's chips are >$7k each so 8 of them is >$56k.
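Laying that comparison out (prices and TDPs from the post; the $100/chip ARM figure is the post's estimate):

```python
# Power and cost: 8x high-end Xeon vs 18x A10X-class ARM chips.
xeon_tdp, xeon_price, xeon_count = 165, 7_000, 8   # W, USD (">$7k each"), chips
arm_tdp, arm_price, arm_count = 5, 100, 18         # W, assumed USD, chips

xeon_watts = xeon_tdp * xeon_count   # 1320 W
xeon_cost = xeon_price * xeon_count  # 56000 -> ">$56k"
arm_watts = arm_tdp * arm_count      # 90 W
arm_cost = arm_price * arm_count     # 1800

print(xeon_watts, xeon_cost)  # 1320 56000
print(arm_watts, arm_cost)    # 90 1800
```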
Obviously people can buy cheaper Xeons and do something similar, but Apple's chips rival the lower Xeons in performance, and those Xeons are 85W (vs 5W) and ~$300:
https://browser.geekbench.com/processors/2003
I think there's a solution Apple can come up with that would be compelling for some kind of server use. Not for hosts that prefer the high density chips but the colocation hosts and independent studios that want more affordable options. They also have the freedom to implement custom hardware acceleration for creative and scientific tasks.
The Xeons may be better at multi-tasking and as I say, I wouldn't expect ARM to replace high-end Intel Mac systems but they can make compute cards like Intel's Xeon Phi card. If they had custom hardware acceleration for common algorithms like noise reduction, blurring, raytracing, computational biology, people would be all over it.
They wouldn't need to hook them together with Thunderbolt. Although TB can transmit power, every high-power module would need a separate power cable if they did that, and it would be a mishmash of mismatched shapes. If they went the external module route, they only need the base unit plus an external unit, and the external can hold whatever can't go inside the main unit. There would be a fast connection for that, similar to a riser card. I don't think it will be something highly expandable, just enough to cover the actual use cases.
Apple is very practical about their designs; there's no reason to design something for a use case that is never going to happen. I don't think the end result is going to be all that far from the cylinder, because they put a lot of thought into how to get the essentials into that unit. The main flaw was not being able to get new dual GPUs to put into it. The above mockup follows a similar design and solves this problem.
It doesn't need to be that tall, because ideally the GPUs would be stuck to the heatsink and wouldn't have their own fans; that's the point of the heatsink. The heatsink on the hard drive side of the above mockup doesn't serve much purpose if no chips are attached to it. It would be best to have a fixed internal GPU on one side and the CPU on the other. If Apple turns add-on GPUs into modules that they can easily switch, then they don't need to refresh the overall unit often: they can offer new GPU modules after a year or two, upgrade them in-store or ship them out, and 3rd parties can design to the same module spec.