UrbaneLegend
About
- Banned
- Username: UrbaneLegend
- Joined
- Visits: 6
- Last Active
- Roles: member
- Points: 182
- Badges: 0
- Posts: 48
Reactions
Mac Studio with M1 Ultra review: A look at the future power of Apple Silicon
Again, I know there is no such thing as a built-in eGPU. I also expect the reader to understand there is no such thing as a built-in eGPU; I used the term as shorthand for the function I believe a discrete GPU in the future Mac Pro will have as an additional compute device.
I expect the Mac Pro will come in 2x and 4x SOC configurations with 64/128 GPU cores, but also with a PCIe discrete GPU option offering 64, 128, or 256 GPU cores. I think the discrete GPU will act as an additional compute device for heavy lifting rather than as the frontline GPU, which will be handled by the integrated GPUs on the SOCs. I used the term built-in eGPU to denote the function of a discrete GPU in the Mac Pro: it would have a similar role to what people use eGPUs for today with current Intel Macs, and, being a PCIe device, it would carry the same kind of inefficiencies that eGPUs do.
When I wrote that, I expected people to understand the nuance and not immediately get triggered and say, 'Bruv, you know a built-in eGPU ain't a fing.'
I might be wrong. Apple may be planning something completely different; perhaps they've found a way to build a discrete GPU that doesn't carry the latency inefficiencies of sitting on a PCIe bus, inefficiencies they've gone out of their way to eradicate with the M1 design philosophy. Heck, they may not bring a discrete GPU to market at all, in which case the Mac Pro will be rather disappointing as far as raw compute performance is concerned.
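To put a rough number on the PCIe penalty I keep referring to, here's a back-of-envelope sketch. The figures are assumptions, not measurements: ~32 GB/s theoretical for a PCIe 4.0 x16 link and Apple's quoted 800 GB/s for the Ultra's unified memory, with latency ignored entirely, which only flatters the PCIe case.

```c
/* Back-of-envelope: time to move a working set to a GPU.
 * Bandwidth figures are assumptions: ~32 GB/s for PCIe 4.0 x16,
 * 800 GB/s as Apple's quoted figure for M1 Ultra unified memory.
 * Latency is ignored, which only flatters the PCIe case. */
#include <stdio.h>

int main(void) {
    const double gib = 1024.0 * 1024.0 * 1024.0;
    const double working_set = 4.0 * gib;   /* 4 GiB of textures/geometry */
    const double pcie_bw    = 32.0e9;       /* bytes/s, PCIe 4.0 x16 */
    const double unified_bw = 800.0e9;      /* bytes/s, quoted unified memory */

    printf("PCIe copy:         %.1f ms\n", working_set / pcie_bw * 1000.0);
    printf("Unified access:    %.1f ms\n", working_set / unified_bw * 1000.0);
    return 0;
}
```

Roughly 134 ms versus about 5 ms to move a 4 GiB working set: exactly the kind of gap that relegates a PCIe GPU to batch compute rather than frontline interactive work.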
Mac Studio with M1 Ultra review: A look at the future power of Apple Silicon
Some of the questions as to why performance doesn't scale more linearly on the Studio Ultra may be answered by looking at the history of AMD's Threadripper.
When Threadripper was first released it was hailed for bringing huge numbers of CPU cores to the masses, and in tests like Cinebench it excelled. But testers and benchmarkers soon discovered an inherent design flaw: if a piece of software used threads on separate CPU chiplets, there was a latency penalty to be paid. IIRC AMD even shipped a 'Gaming' mode which switched off software access to all but one chiplet. While Threadripper presents itself to the system as a single CPU, at the hardware level it is of course four mini CPUs flying in close formation, sharing a data bus to keep them all in sync. Depending on which cores were communicating, this led to tangible performance degradation. Software developers soon learned to keep all their threads on one chiplet where possible. AMD has obviously done a lot of work on subsequent versions of Threadripper and Ryzen, which has reduced these latency issues to virtually nil.
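For anyone wondering what 'keep your threads on one chiplet' actually looks like in code, here's a minimal Linux-flavoured sketch using pthread affinity. The core numbering is an assumption (cores 0-7 living on one chiplet); real code would query the topology from the OS first.

```c
/* Minimal sketch: pin a worker thread to one chiplet's cores so
 * its cache traffic never crosses the inter-chiplet fabric.
 * Assumes (hypothetically) that cores 0-7 live on chiplet 0.
 * Linux-only: pthread_setaffinity_np is a GNU extension. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg) {
    (void)arg;
    /* ... latency-sensitive work that benefits from staying local ... */
    return NULL;
}

int main(void) {
    pthread_t t;
    cpu_set_t chiplet0;

    CPU_ZERO(&chiplet0);
    for (int core = 0; core < 8; core++)
        CPU_SET(core, &chiplet0);

    pthread_create(&t, NULL, worker, NULL);
    pthread_setaffinity_np(t, sizeof(chiplet0), &chiplet0);
    pthread_join(t, NULL);
    puts("worker ran pinned to chiplet 0");
    return 0;
}
```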
With the M1 architecture the situation is exacerbated, because the M1 SOCs don't just have CPU cores but GPU cores and a whole host of other goodies. The lack of linear performance with the Ultra is probably down to the inter-chip latency either directly hitting performance, or being so great that developers simply aren't using the GPU or codec cores in the second SOC. It's early days; developers will be working, as Threadripper developers did, to understand where the inefficiencies lie and work around them, limiting core-to-core communication for a given process and keeping certain threads on one SOC. I can see this being particularly necessary with GPU-based tasks, which may be heavily impacted by the SOC-to-SOC bus.
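The classic way to make this kind of latency visible is a two-thread ping-pong over a shared flag: run it with both threads on the same die, then (where the OS lets you place threads) across dies, and compare round-trip times. A rough C11 sketch, with placement left to the scheduler:

```c
/* Rough core-to-core "ping-pong" latency sketch using C11 atomics.
 * Two threads bounce a flag back and forth; each round trip costs
 * roughly two cache-line transfers between the cores involved, so
 * cross-die pairs show visibly worse numbers than same-die pairs. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ROUNDS 1000000

static _Atomic int flag = 0;

static void *pong(void *arg) {
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        while (atomic_load_explicit(&flag, memory_order_acquire) != 1) ;
        atomic_store_explicit(&flag, 0, memory_order_release);
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    struct timespec start, end;

    pthread_create(&t, NULL, pong, NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ROUNDS; i++) {
        atomic_store_explicit(&flag, 1, memory_order_release);
        while (atomic_load_explicit(&flag, memory_order_acquire) != 0) ;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    pthread_join(t, NULL);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
    printf("avg round trip: %.1f ns\n", ns / ROUNDS);
    return 0;
}
```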
Tasks like 3D rendering, whether on CPU or GPU, are less likely to be impacted and should show near-linear performance, because rendering can easily be split evenly across cores.
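That's because rendering is embarrassingly parallel: the frame splits into independent tiles and the workers never need to talk to each other mid-render. A toy sketch of that split, where render_tile() is a hypothetical stand-in for the real shading work:

```c
/* Toy sketch of why rendering scales near-linearly: the frame is
 * split into independent tiles, and workers never communicate
 * mid-render. render_tile() is a hypothetical stand-in. */
#include <pthread.h>
#include <stdio.h>

#define TILES   64
#define WORKERS 8

static void render_tile(int tile) {
    /* ... trace rays / shade pixels for this tile only ... */
    (void)tile;
}

static void *worker(void *arg) {
    int id = (int)(long)arg;
    /* Static, even split: worker id takes every WORKERS-th tile. */
    for (int tile = id; tile < TILES; tile += WORKERS)
        render_tile(tile);
    return NULL;
}

int main(void) {
    pthread_t threads[WORKERS];
    for (long i = 0; i < WORKERS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < WORKERS; i++)
        pthread_join(threads[i], NULL);
    puts("frame done");
    return 0;
}
```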
I think the Mac Pro, if it has the expected 4x SOCs, will show even worse performance scaling, just as the Threadrippers initially did. I expect the Mac Pro to have an Apple discrete GPU, but if this GPU is PCIe-based I think its function will be akin to a built-in eGPU: again, the latencies of the PCIe bus will mean developers choose the integrated GPUs for most interactive workloads like video editing, and the discrete GPU for heavy compute operations like scientific simulation and rendering.
The M1 architecture is a whole new beast, and we have to temper our expectations and not believe all the marketing blurb Apple throws at us. In ideal situations the Studio Ultra will give near-2x performance, but the problem is that much of the software we run works in ways far from ideal for M1, which is why we're seeing anything from a modest uplift with the Studio Ultra over the Studio Max to no uplift at all. It's all going to be application dependent. I've seen several benchmarks of video exports showing no improvement; in that case I expect it's not worth the latency penalty to use the second bank of codecs for the task.
If I were buying a Studio it would be the Max version; right now it offers the best value for money, as you're getting the performance you're paying for. With the Ultra you're getting some of the performance some of the time.
If I'm right, in time Apple will reduce inter-SOC communication latency just as AMD did with Threadripper, and the scaling will improve. Future versions of M(x) will scale much better and end up being better value for money.
I could be completely wrong, but if I am, explain why.
Mac Studio review roundup: Incredible speed, that not everybody needs
fastasleep said: "Optimisation is going to be an ongoing effort, rather than a task we tackle just the once, and I’m hoping the team can see some improvements land in every release. We have big ambitions."
[...]
Basically this work is hot off the press!
"We’ve a lot of work in Cycles and Metal still to do, including Intel GPU enablement, stability/bug-fixes, kernel optimisation, unified memory optimisations, out of core support expansion, algorithmic optimisations etc. While we want to work on all of these areas, logistically we need to focus our available engineering hours into a workable subset, based on what we feel we can make the most meaningful impact with."
M1 Ultra - 1132
M1 Max - 706
RTX 3090 - 5552
If you seriously think Apple is going to magically close that performance chasm with optimisations then you're kidding yourself. You buy a Mac Studio today and you get the disappointing benchmarked performance with Cycles X; that's a fact. Even if some optimisation of code were to bring a doubling of performance (which it won't), 2 x 1132 = 2264, so the Ultra would still barely beat the 6900 XT (2133) and 6800 XT (2254), and would remain way off anything but Nvidia's low-end GPUs, which are approaching end of life.
fastasleep said: bobolicious said: ...it seems complicated...
How does the max benefit metal geekbench scores from even the iMac Pro from 2017...?
While that may be a big leap for mobile does that qualify for desktop?
CPU seems to range from 50~400% (major) bump, so as Verge seems to suggest is such app specific given emulation and a steep price ?
Nvidia 3060 specs seem to impress at under $500 yet no Apple support ?
Why the drop of Nvidia support, nor any upgrade capability for a FOUR THOUSAND DOLLAR computer...?
(well I could speculate why : )
It is of course still early days...
CPU performance in Blender is also interesting:
AMD Ryzen 9 5950X 16-Core Processor - 461.74
AMD Ryzen 9 3950X 16-Core Processor - 384.53
Apple M1 Ultra - 379.69
Going by Geekbench alone, you'd be fooled into thinking the M1 was, core for core, better than everything around.
When applications can lean on the fixed-function hardware on the M1 SOC, like the codec support and ML accelerators, the M1 is a world beater, no question. When an application can't take advantage of those functions, you're left with what amounts to underwhelming performance at a premium price, which the Blender benchmarks expose. Cinebench shows this same disparity between claims and real-world performance.
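As an aside, applications can check whether that fixed-function codec hardware is actually there before committing to a code path. A tiny sketch using VideoToolbox's hardware-decode query (the codec choices here are just illustrative):

```c
/* Tiny sketch: ask VideoToolbox whether hardware decode exists for
 * a given codec before choosing a pipeline. Build on macOS with:
 *   cc check.c -framework VideoToolbox -framework CoreMedia
 * The codec choices are illustrative only. */
#include <VideoToolbox/VideoToolbox.h>
#include <stdio.h>

int main(void) {
    printf("H.264 hw decode: %s\n",
           VTIsHardwareDecodeSupported(kCMVideoCodecType_H264) ? "yes" : "no");
    printf("HEVC  hw decode: %s\n",
           VTIsHardwareDecodeSupported(kCMVideoCodecType_HEVC) ? "yes" : "no");
    return 0;
}
```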
Anyone with a late Intel-based Mac really has little to no reason to upgrade unless they have an application or task that can benefit from the M1-specific architecture, like video editing. Other than that, their Intel/AMD Macs are at least as good, if not better, than the M1 in raw horsepower.
Mac Studio review roundup: Incredible speed, that not everybody needs
lkrupp said:With all the claims of inferiority made by the ‘experts’ here it’s a wonder anyone uses Apple hardware, right? I mean, according to these ‘experts’, Apple exceeds at nothing, is always two years behind every curve, lacks pro features, and is overpriced to boot. Why would any ‘pro’ user select Apple hardware?
Speaking as someone who has been in the creative industry for over 25 years and has owned a succession of Power Macs and Mac Pros in that time: from my point of view, and I suspect that of many other 3D artists wanting to come back to the Mac, one look at the performance of the Studio's GPUs will earn it a hard pass.
Apple has one more chance, with the Mac Pro, to deliver the necessary GPU performance, maybe with their own custom GPU. If they can deliver that, they have a chance to win back the many former Mac-based 3D artists who were lost after Apple's Mac Pro releases went to rat shit in 2013 and became a laughing stock.
Professionals have different needs to fanboys who enjoy Macs vicariously through the medium of web forums. Professionals need performance, and that need exceeds their preference for an operating system, which is why so many had to reluctantly move on from their beloved Mac Pros. With Apple Silicon, Apple has sewn up the video editing market; you'd be a fool to choose anything but a Mac for video work. But 3D is one area where brute GPU performance is an absolute necessity, and based on the Studio benchmarks Apple is still way off.
Mac Studio review roundup: Incredible speed, that not everybody needs
fastasleep said: UrbaneLegend said: Incredible speed? Really...
Blender 3.1 Benchmark. Blender Benchmark is superior to Cinebench IMHO, as it tests multiple scenes of varying complexity across both CPU and GPU.
M1 Ultra - 1132
M1 Max - 706
RTX 3090 - 5552
I guess Apple wasn't using this benchmark suite for their performance graphs.