Apple Silicon 13-inch MacBook Pro nearly as fast at machine learning training as 16-inch M...

AppleInsider · December 16, 2020 2:20PM

While Apple still needs to fully optimize the M1 processor and its software for the task, a 13-inch MacBook Pro with Apple Silicon performed nearly as well at a machine learning test as the 16-inch MacBook Pro with dedicated Radeon graphics.

The M1 processor is up to 3.6x faster at ML training vs Intel

Benchmarks for the M1 processor have been impressive so far with scores rivaling even the most expensive Intel MacBook Pro configurations. These are early days yet as software continues to be optimized for the processor, so some tasks and processes will see big speed jumps as developers take advantage of the hardware.

One space that the M1 processor should excel in is machine learning (ML) processes. As with Apple's A-series chips like the A12Z Bionic, the M1 has a dedicated Neural Engine used for complex data processing and ML. Apple says the M1 Neural Engine can handle up to 11 trillion operations per second when in use.

This processor is not best in class for machine learning, however, as dedicated GPUs from companies like Nvidia boast even higher numbers for neural operations. The first generation of Macs running Apple Silicon has only the M1 processor to rely on--no additional GPU options are available.

The developers at Roboflow wanted to pit Apple's new machines against the older Intel variants. The processor transition has only just begun for Apple, so tools like TensorFlow have not yet been optimized to run for a full benchmark test.

The testers chose to use Apple's native tool called CreateML, which allows developers to train a machine learning model for object detection without writing code. The tool is available on the M1-based Macs, so the testers believe it should have been properly optimized to perform the test.

They chose to compare the 13-inch MacBook Pro with an M1 processor and 8GB of RAM to the 13-inch MacBook Pro with Intel Core i5 and 16GB of RAM which has a dedicated Intel Iris Plus Graphics 645 card. The 16-inch MacBook Pro with an Intel Core i9 processor, 64GB of memory, and a dedicated Radeon Pro 5500M was also tested.

The Roboflow team decided to run the test with a no-code object recognition task. They used the 121,444-image Microsoft COCO object detection dataset, then exported the assets using Roboflow software to convert them to Create ML format. They ran the CreateML software using a YOLOv2 object detection model over 5,000 epochs with a batch size of 32.
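To make the format conversion step concrete, here is a minimal sketch of turning COCO-style annotations into Create ML-style annotation entries. It is not Roboflow's exporter; the function name `coco_to_createml` and the sample data are hypothetical. It assumes COCO boxes are `[x_min, y_min, width, height]` and that Create ML expects the box center as `x`/`y`.

```python
def coco_to_createml(coco):
    """Convert a COCO-style dict into a list of Create ML-style entries.

    Assumption: COCO bbox = [x_min, y_min, width, height] (top-left origin),
    Create ML coordinates = box center (x, y) plus width and height.
    """
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    by_image = {
        img["id"]: {"image": img["file_name"], "annotations": []}
        for img in coco["images"]
    }
    for ann in coco["annotations"]:
        x_min, y_min, width, height = ann["bbox"]
        by_image[ann["image_id"]]["annotations"].append({
            "label": categories[ann["category_id"]],
            "coordinates": {
                "x": x_min + width / 2,   # shift from top-left corner to center
                "y": y_min + height / 2,
                "width": width,
                "height": height,
            },
        })
    return list(by_image.values())


# Hypothetical one-image dataset, for illustration only.
sample = {
    "images": [{"id": 1, "file_name": "dog.jpg"}],
    "categories": [{"id": 18, "name": "dog"}],
    "annotations": [{"image_id": 1, "category_id": 18, "bbox": [10, 20, 100, 50]}],
}
entries = coco_to_createml(sample)
```

The resulting list could then be written out with `json.dump` next to the images; the exact file layout the tool expects is not described in the article.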

The COCO dataset used is a large image database of objects that should be easily recognizable by a 4-year-old, and is used to test machine learning algorithms. YOLOv2 is a type of object detection model that uses bounding boxes to show where an object is in an image. An epoch is one cycle of a test and a batch size is the number of images run through each cycle.
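As a rough illustration of how these numbers relate, the sketch below works through the arithmetic for the reported dataset size and batch size. It shows two readings: the conventional one, where an epoch is a full pass over the dataset, and an alternative where each of the 5,000 cycles processes a single batch. Which reading matches the benchmark is an assumption; the article does not say.

```python
import math

NUM_IMAGES = 121_444   # dataset size reported in the article
BATCH_SIZE = 32        # batch size reported in the article
CYCLES = 5_000         # "epochs" as reported in the article

# Conventional reading: one epoch = one full pass over the dataset,
# so this is how many batches a single pass would take.
batches_per_pass = math.ceil(NUM_IMAGES / BATCH_SIZE)

# Alternative reading (an assumption): each cycle is one batch,
# so this is the total number of images seen across all cycles.
images_if_cycle_is_batch = CYCLES * BATCH_SIZE

print(batches_per_pass)          # 3796
print(images_if_cycle_is_batch)  # 160000
```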

Basically, the computer is shown a series of images and has to decide what is being shown based on what it has learned from the images it was shown previously. As it sees more images of a given object, it gets more accurate at identifying that object in other random images.

The results:

The M1 based MacBook took 149 minutes to finish the test with 8% GPU utilization
The MacBook running the Intel Core i5 took 542 minutes to run the test, though didn't use the Intel Iris Plus Graphics 645
The MacBook running the Intel Core i9 with Radeon Pro took 70 minutes and utilized 100% of the GPU during the test

The team notes that CreateML was able to use 100% of the discrete Radeon GPU but didn't bother using the Intel Iris at all and only 8% of the integrated M1 GPU. This affected the time and likely is due to Apple needing to further optimize the toolset for the M1 processor.

Based on this benchmark, the Apple M1 is 3.64 times as fast as the Intel Core i5. However, the M1 machine is not fully utilizing its GPU and -- so far -- underperforms the i9 with discrete graphics.
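The ratios can be checked directly from the reported training times. The short sketch below uses only the three times given in the results; the variable names are illustrative.

```python
# Training times in minutes, as reported in the results above.
minutes = {"M1": 149, "Core i5": 542, "Core i9 + Radeon Pro": 70}

# How many times faster the M1 finished than the Core i5.
m1_vs_i5 = minutes["Core i5"] / minutes["M1"]

# How many times longer the M1 took than the Core i9 with discrete graphics.
m1_vs_i9 = minutes["M1"] / minutes["Core i9 + Radeon Pro"]

print(round(m1_vs_i5, 2))  # 3.64
print(round(m1_vs_i9, 2))  # 2.13
```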

Apple is expected to continue optimizing its CreateML framework and is working with TensorFlow to properly port their toolset to M1. Future M-series processors may have even more powerful neural engines and processors as rumors already indicate a 32-core M-series chip could be in a future desktop Mac.

tipoo · December 16, 2020 2:30PM

>149 minutes
>70 minutes

2.2x slower doesn't sound "nearly as fast" as the 16. The low GPU utilization is because ML tasks are automatically dispatched to the neural engine instead.

yeldarby · December 16, 2020 3:12PM

tipoo said:

>149 minutes
>70 minutes

2.2x slower doesn't sound "nearly as fast" as the 16. The low GPU utilization is because ML tasks are automatically dispatched to the neural engine instead.

Hi, I'm one of the ones that helped with this benchmark. That's almost certainly not the case; if it was, you would expect the CPU to also be largely idle as it is with GPU-constrained training but it was pegged between 150%-250% on the M1 for the duration of training. Leads me to strongly believe there is more optimization yet to be done to adapt the CPU part of the pipeline for the new architecture and that this will get much better over time.
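For readers unfamiliar with CPU figures above 100%, here is a small sketch of how to read them. It assumes the macOS convention that 100% equals one fully busy core, and an 8-core M1; the function name is hypothetical.

```python
M1_CORES = 8  # assumption: 4 performance + 4 efficiency cores


def busy_cores(cpu_percent):
    """Translate a per-process CPU percentage into cores' worth of work."""
    return cpu_percent / 100


low, high = busy_cores(150), busy_cores(250)

# Share of the whole chip's CPU capacity in use at each end of the range.
low_share = low / M1_CORES
high_share = high / M1_CORES

print(low, high)              # 1.5 2.5
print(low_share, high_share)  # 0.1875 0.3125
```

Under those assumptions, the training job kept roughly 1.5 to 2.5 cores busy, well short of all eight.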

mjtomlin · December 16, 2020 3:16PM

tipoo said:

>149 minutes
>70 minutes

2.2x slower doesn't sound "nearly as fast" as the 16. The low GPU utilization is because ML tasks are automatically dispatched to the neural engine instead.

I believe, the ANE is optimized for running ML models, not training them. The ML accelerators would be more suited for that. And I'm pretty sure those accelerators are in the CPU as an extension of the ARMv8 ISA.

edited December 2020

tjwolf · December 16, 2020 4:15PM

What was the memory utilization during this? I know nothing of ML, but I'd think reading/analyzing images is an inherently memory-intensive task. If so, not sure what comparing an 8gb M1 Mac with a 16gb Mac w. separate GPUs reveals.

alphafox · December 16, 2020 5:02PM

So the M1 used its neural accelerator but the i5 was running in software on the CPU only? How are these results compatible at all?

yeldarby · December 16, 2020 5:15PM

tjwolf said:

What was the memory utilization during this? I know nothing of ML, but I'd think reading/analyzing images is an inherently memory-intensive task. If so, not sure what comparing an 8gb M1 Mac with a 16gb Mac w. separate GPUs reveals.

Spun up a training job to grab a screenshot of the memory usage: https://imgur.com/a/tGeiURT
There doesn't seem to be thrashing going on. The COCO dataset is under 1GB post-processing so it should comfortably fit in memory.

yeldarby · December 16, 2020 5:17PM

alphafox said:

So the M1 used its neural accelerator but the i5 was running in software on the CPU only? How are these results compatible at all?

As far as I'm aware the neural engine was not being used. This is comparing the current M1 13" MacBook Pro vs the previous generation Intel 13" MacBook Pro. Both have integrated graphics processors.

frantisek · December 16, 2020 6:52PM

yeldarby said:

alphafox said:

So the M1 used its neural accelerator but the i5 was running in software on the CPU only? How are these results compatible at all?

As far as I'm aware the neural engine was not being used. This is comparing the current M1 13" MacBook Pro vs the previous generation Intel 13" MacBook Pro. Both have integrated graphics processors.

So is TensorFlow really optimized, or just compiled for M1, with real optimization on the way?

wwishart · December 16, 2020 7:34PM

Intel Core i5 and 16GB of RAM which has a dedicated Intel Iris Plus Graphics 645 card? Are you sure it’s a dedicated card?

cloudguy · December 17, 2020 4:15AM

"To the 13-inch MacBook Pro with Intel Core i5 and 16GB of RAM which has a dedicated Intel Iris Plus Graphics 645 card"

Huh? Intel Iris Plus is an integrated GPU. Intel didn't start making discrete GPUs again until this year with the Iris Xe Max.

Also - and I have mentioned this in the past - the Intel Core i5 in the MBP is a quad core chip. Comparing it to the octacore Apple M1 chip is apples versus oranges (pun not intended). Comparisons between the octacore Intel Core i9 and especially the hexacore Intel Core i7 is what I wish to see. Sadly, all the Core i7 and Core i9 MBP devices have actual dedicated graphics cards.

What needs to be done:
1. Get a Dell XPS 13 7390. It contains the latest hexacore Intel chip but does not have a GPU (it has "Intel UHD graphics")
2. Dump the Windows 10 OS.
3. Replace Windows 10 with Ubuntu 20.04 desktop. (Yes, I know that macOS isn't Unix or Linux ... but it isn't as if we can get macOS on that hardware.)
4. Take one of the 8 GB RAM sticks out.
5. Crank the 4K laptop screen down to 1080p or 720p.
6. Run this same machine learning test. 6 x86 core chip with integrated graphics and 8 GB of RAM versus 8 core ARM chip with the same.

Now of course any tie would go to the runner - Apple - in this case. The Intel chip is on a 14nm process so it runs very hot and uses a ton of power. (Latest word on the street is that Intel's 10nm process is suffering from very low yields and TSMC doesn't have the capacity to help them out, so they are considering going to Samsung.) Also, that Intel chip costs about $400. The Apple M1 costs $100 to make tops.

But anything to get actual useful benchmarks here instead of having 8 core chips beat up on the dual core Intel i3 chip that was in the $999 MBA and $799 Mac Mini. Also, for the record, the Qualcomm Snapdragon 8cx also has similar benchmarks to the Intel Core i5 in the entry level MacBook Pro. (Its problems are with Windows 10, not the CPU. Put ChromeOS or ARM Ubuntu on a Surface Pro X and it would run fine.) Of course, the M1 runs circles around the 8CX while tap dancing on its head. But that just shows that beating the Intel "mobile" i3 and i5 chips that have 2 or 4 cores and run at 1.1 and 1.8 GHz really isn't that big a deal.

mcdave · December 17, 2020 9:43AM

cloudguy said:

Also - and I have mentioned this in the past - the Intel Core i5 in the MBP is a quad core chip. Comparing it to the octacore Apple M1 chip is apples versus oranges (pun not intended).

The M1 only has 4 performance cores with the efficiency cores running at around 20%. It would be totally fair to compare it with a hyper threading quad core x86 CPU and unfair to compare it to one with 8 performance cores.

FYI, macOS is a UNIX variant. There’s more to UNIX than Linux.

cloudguy · December 17, 2020 9:14PM

mcdave said:

cloudguy said:

Also - and I have mentioned this in the past - the Intel Core i5 in the MBP is a quad core chip. Comparing it to the octacore Apple M1 chip is apples versus oranges (pun not intended).

The M1 only has 4 performance cores with the efficiency cores running at around 20%. It would be totally fair to compare it with a hyper threading quad core x86 CPU and unfair to compare it to one with 8 performance cores.

FYI, macOS is a UNIX variant. There’s more to UNIX than Linux.

The efficiency cores ... run at 20%? No. In the M1's big/little arch, the Firestorm cores run at 3.2 GHz and the IceStorm cores run at 1.85 GHz. So yes, it is perfectly valid to compare 4 3.2 GHz and 4 1.85 GHz cores to a hexacore chip, especially since that hexacore chip is going to have to be throttled precisely because of the lack of efficiency cores (which also happens if you merely have 4 performance cores with no efficiency cores as is the case with the Intel Core i5 that also gets beaten by the big/little octacore Qualcomm Snapdragon 865). Intel states that they are going to adopt the big/little architecture with efficiency cores in 2021. Knowing them lately, that means 2023.

While macOS "starts with" FreeBSD ... it veers off into its own direction, particularly in that macOS has a hybrid kernel vs the UNIX monolithic one. And for the record, even FreeBSD technically isn't UNIX. It is UNIX-like and compatible but there are differences. It is fair to say that FreeBSD is a lot closer to UNIX than macOS is to FreeBSD. So since it is twice removed from UNIX - and is significantly different from FreeBSD - then calling macOS "a UNIX variant" is challenging. My point is that benchmarking hardware running Linux is better than benchmarking hardware using Windows. (And it is a lot easier to replace Windows 10 with Ubuntu desktop than it is FreeBSD.)

jdb8167 · December 17, 2020 11:06PM

cloudguy said:

mcdave said:

cloudguy said:

Also - and I have mentioned this in the past - the Intel Core i5 in the MBP is a quad core chip. Comparing it to the octacore Apple M1 chip is apples versus oranges (pun not intended).

The M1 only has 4 performance cores with the efficiency cores running at around 20%. It would be totally fair to compare it with a hyper threading quad core x86 CPU and unfair to compare it to one with 8 performance cores.

FYI, macOS is a UNIX variant. There’s more to UNIX than Linux.

The efficiency cores ... run at 20%? No. In the M1's big/little arch, the Firestorm cores run at 3.2 GHz and the IceStorm cores run at 1.85 GHz. So yes, it is perfectly valid to compare 4 3.2 GHz and 4 1.85 GHz cores to a hexacore chip, especially since that hexacore chip is going to have to be throttled precisely because of the lack of efficiency cores (which also happens if you merely have 4 performance cores with no efficiency cores as is the case with the Intel Core i5 that also gets beaten by the big/little octacore Qualcomm Snapdragon 865). Intel states that they are going to adopt the big/little architecture with efficiency cores in 2021. Knowing them lately, that means 2023.

While macOS "starts with" FreeBSD ... it veers off into its own direction, particularly in that macOS has a hybrid kernel vs the UNIX monolithic one. And for the record, even FreeBSD technically isn't UNIX. It is UNIX-like and compatible but there are differences. It is fair to say that FreeBSD is a lot closer to UNIX than macOS is to FreeBSD. So since it is twice removed from UNIX - and is significantly different from FreeBSD - then calling macOS "a UNIX variant" is challenging. My point is that benchmarking hardware running Linux is better than benchmarking hardware using Windows. (And it is a lot easier to replace Windows 10 with Ubuntu desktop than it is FreeBSD.)

Nevertheless, the IceStorm cores are measured to be about 20-25% of a FireStorm core. You haven't fallen into the MHz myth, have you? IceStorm cores are not just down-clocked FireStorm cores. The efficiency cores are designed for a different purpose than the performance cores; it's in the name.

Unless Apple hasn't kept up with certification, macOS/Darwin is officially a Unix. Not sure what you are trying to say, but macOS isn't Unix like, it is Unix.

nicholfd · December 17, 2020 11:37PM

cloudguy said:

What needs to be done:
1. Get a Dell XPS 13 7390. It contains the latest hexacore Intel chip but does not have a GPU (it has "Intel UHD graphics")
2. Dump the Windows 10 OS.
3. Replace Windows 10 with Ubuntu 20.04 desktop. (Yes, I know that macOS isn't Unix or Linux ... but it isn't as if we can get macOS on that hardware.)
4. Take one of the 8 GB RAM sticks out.
5. Crank the 4K laptop screen down to 1080p or 720p.
6. Run this same machine learning test. 6 x86 core chip with integrated graphics and 8 GB of RAM versus 8 core ARM chip with the same.

Actually you are wrong - macOS is certified Unix: https://www.opengroup.org/openbrand/register/ (Certified Unix).

And to quote:

For details of the certification click the product links

Apple Inc.: macOS version 11.0 Big Sur on Apple silicon-based Mac computers
Apple Inc.: macOS version 11.0 Big Sur on Intel-based Mac computers

tipoo · December 18, 2020 1:28AM

yeldarby said:

tipoo said:

>149 minutes
>70 minutes

2.2x slower doesn't sound "nearly as fast" as the 16. The low GPU utilization is because ML tasks are automatically dispatched to the neural engine instead.

Hi, I'm one of the ones that helped with this benchmark. That's almost certainly not the case; if it was, you would expect the CPU to also be largely idle as it is with GPU-constrained training but it was pegged between 150%-250% on the M1 for the duration of training. Leads me to strongly believe there is more optimization yet to be done to adapt the CPU part of the pipeline for the new architecture and that this will get much better over time.

Thank you for the explanation!

I still take issue with this article's title calling 2.1x slower nearly as fast though. But then the 8% GPU use is curious, perhaps it's only logging graphics in the percent and not compute?

duhsesame · December 20, 2020 8:54PM

cloudguy said:

mcdave said:

cloudguy said:

Also - and I have mentioned this in the past - the Intel Core i5 in the MBP is a quad core chip. Comparing it to the octacore Apple M1 chip is apples versus oranges (pun not intended).

The M1 only has 4 performance cores with the efficiency cores running at around 20%. It would be totally fair to compare it with a hyper threading quad core x86 CPU and unfair to compare it to one with 8 performance cores.

FYI, macOS is a UNIX variant. There’s more to UNIX than Linux.

The efficiency cores ... run at 20%? No. In the M1's big/little arch, the Firestorm cores run at 3.2 GHz and the IceStorm cores run at 1.85 GHz. So yes, it is perfectly valid to compare 4 3.2 GHz and 4 1.85 GHz cores to a hexacore chip, especially since that hexacore chip is going to have to be throttled precisely because of the lack of efficiency cores (which also happens if you merely have 4 performance cores with no efficiency cores as is the case with the Intel Core i5 that also gets beaten by the big/little octacore Qualcomm Snapdragon 865). Intel states that they are going to adopt the big/little architecture with efficiency cores in 2021. Knowing them lately, that means 2023.

While macOS "starts with" FreeBSD ... it veers off into its own direction, particularly in that macOS has a hybrid kernel vs the UNIX monolithic one. And for the record, even FreeBSD technically isn't UNIX. It is UNIX-like and compatible but there are differences. It is fair to say that FreeBSD is a lot closer to UNIX than macOS is to FreeBSD. So since it is twice removed from UNIX - and is significantly different from FreeBSD - then calling macOS "a UNIX variant" is challenging. My point is that benchmarking hardware running Linux is better than benchmarking hardware using Windows. (And it is a lot easier to replace Windows 10 with Ubuntu desktop than it is FreeBSD.)

As of 2020, “UNIX” isn’t just an operating system or one strict standard.

http://www.catb.org/esr/writings/taoup/html/ch01s02.html

e.g. an uncertified Java programmer is still a Java programmer, let alone a certified one.
