Apple trademarks its patented "macroscalar" code optimization technology

Posted in AAPL Investors, edited January 2014


Apple has recently filed for international trademark protection for "macroscalar," its name for a set of patented optimizations for executing code efficiently on a processor, suggesting it plans to begin commercially promoting this differentiating technology.



Apple has filed for multiple patents that reference the concept of "macroscalar processor architecture" beginning at least as early as 2004. Most of the patent filings appear to be "continuation-in-part" applications that incorporate and expand upon previous co-pending patents, which have since been granted.



The patents list Jeffry Gonion as their inventor; he has worked as a Platform Architect at Apple since the company hired him away from Geneva-based semiconductor firm STMicroelectronics in 2003.



It was STMicro that sued Apple in 2010 over the EU trademark rights for "iPad," in what was reported to be an apparent bid to win Apple's business. Apple subsequently released iPhone 4 that summer with STMicro's 3-axis digital gyroscope, and used the company's accelerometer in its 6th generation iPod nano later that same year.



Inside Apple's Macroscalar technology



At Apple, Gonion has developed the concept of a macroscalar processor architecture as a technique for making efficient use of a processor's execution pipelines by preparing and ordering instructions so that they can be executed in parallel as much as possible.



The macroscalar patent summary describes one example in which "a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel."
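


As a rough illustration of the kind of loop the patent is talking about (an editorial sketch in plain C, not code from the filing), consider an iteration whose body mixes independent work, a candidate "vector block," with a running accumulation that must execute in order, a candidate "sequence block":

```c
#include <stddef.h>

/* Hypothetical loop body: the first statement is independent across
 * iterations and could be spread over many "slices" at once, while the
 * running sum carries a dependency from one iteration to the next and
 * must be walked in order on a single slice. */
double process(const double *in, double *out, size_t n)
{
    double running_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * in[i] + 1.0;   /* independent: "vector block" work   */
        running_sum += out[i];          /* ordered:     "sequence block" work */
    }
    return running_sum;
}
```

In the arrangement the patent describes, one slice would execute the ordered portion of an iteration while the remaining slices execute the independent portion of other iterations substantially in parallel.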



Rather than being a software-level parallelism technology like Apple's Grand Central Dispatch, a library that lets code written to take advantage of it execute efficiently in parallel on existing chips, Macroscalar relates to technologies that appear to depend on custom chip hardware, including a "considerably increased" number of registers, and allows new flexibility in how code is executed on a particular processor, avoiding code optimizations tied to a "specific processor."
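


For comparison, here is a minimal sketch of what GCD parallelism looks like in source code, using the plain C dispatch API (the printed "tasks" are just placeholders); it runs on existing chips with no special hardware:

```c
#include <dispatch/dispatch.h>
#include <stdio.h>

int main(void)
{
    /* GCD is a library: the programmer writes blocks and submits them to
     * queues, and the runtime spreads them across the existing cores. */
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();

    for (int task = 0; task < 4; task++) {
        dispatch_group_async(group, queue, ^{
            printf("task %d running, possibly in parallel\n", task);
        });
    }

    /* Wait for every submitted block to finish before exiting. */
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    dispatch_release(group);
    return 0;
}
```

Macroscalar, by contrast, is described in terms of what the processor hardware itself does with the instruction stream.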









The patents describe the technology as incorporating instruction-level parallelism "generated at run-time, rather than scavenged, improving efficiency and performance while reducing power dissipation per task. The number of program registers is increased considerably, and over-specification of binary code for a specific processor is avoided, replaced by mechanisms which may ensure that software for prior versions of the processors automatically utilize additional execution resources in future versions."



The patent description further notes, "these enhancements also permit virtually substantially all inner loops to be aggregated to varying degrees, including those that cannot be unrolled by compilers, increasing IPC by maximizing utilization of multiple execution units."
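


To make that concrete, here is an editorial example, not taken from the patent, of the sort of inner loop that is awkward for a compiler to unroll or vectorize ahead of time because its trip count and exit depend on the data itself; the claim is that macroscalar hardware could still aggregate iterations of loops like this at run time:

```c
#include <stddef.h>

/* Search-style inner loop: the exit condition depends on the values being
 * read, so the compiler generally cannot know the trip count in advance
 * and cannot safely unroll or widen the loop at compile time. */
long find_first_negative(const long *values, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (values[i] < 0) {
            return (long)i;   /* data-dependent early exit */
        }
    }
    return -1;                /* no negative value found */
}
```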



Apple's expanding chip technology portfolio



The nearly decade-long macroscalar research at Apple, outlined in sophisticated detail in the patent filings, describes a variety of efforts to maximize the performance of code running on a processor while using as little power as possible, and to reduce dependence on a particular processor's design, both key design goals of the company's custom chips for iOS devices.



This indicates Apple may be gearing up to push more of its unique optimizations into the System on a Chip components it uses in iOS devices, an effort that has progressed through Apple's shift from PowerPC chips to Intel in 2005, its subsequent acquisition of PA Semi in 2008 and its purchase of Intrinsity in 2010, initiatives AppleInsider has covered frequently, and often exclusively, as events have unfolded.



Not content to use Intel's Silverthorne (since renamed Atom) mobile chip designs in its then in-development iPad, Apple instead secretly licensed design rights to next-generation graphics and video IP cores from Imagination Technologies, which it combined with ARM-designed processor cores and technologies from other companies, including noise reduction from Audience, to build the custom A4 used to power the iPad and later iPhone 4, iPod touch and Apple TV.



Last year, Apple introduced its A5 for iPad 2 and iPhone 4S, incorporating more advanced multicore ARM processor cores, a dual-core Imagination SGX543 GPU and improved sound processing technology from Audience that optimized the performance of Siri's voice recognition.



In addition to Siri, Apple's custom processor designs have helped endow its iOS devices with industry-leading graphics and battery performance, neither of which would have been possible had the company relied on Intel's Atom efforts or simply used generically available SoCs such as Nvidia's Tegra line or designs built around ARM's own Mali GPU, all of which offer relatively weak graphics performance.



By continuing to develop its own increasingly customized chips, incorporating the best technologies available and leaving off features it doesn't use, as well as optimizing its compilers and other development tools to take full advantage of the limited range of chips in its devices, Apple will avoid the problems of broadly licensed platforms like Google's Android, where devices may use a variety of different GPU cores, complicating the efforts of developers who want to create impressive 3D games and other titles.



[ View article on AppleInsider ]

Comments

  • Reply 1 of 20
    blastdoor Posts: 3,239 member
    In theory, designing one's own CPUs, compilers, and OS could have some profound advantages. It will be interesting to see if Apple can turn theory into reality.
  • Reply 2 of 20
    freerange Posts: 1,597 member
    Apple is obviously much more than just a generic electronics manufacturer, which is what the Android market has become. Android is becoming very much a commodity business driven by price, except at the very top end of the market. Even then, the lack of a standard set of components in Android devices makes it a much less developer- and user-friendly platform, which is also exacerbated by the carriers' control over updates. Android is a mess... Meanwhile Apple continues to innovate at every level, even those we don't see.
  • Reply 3 of 20
    Dan_Dilger Posts: 1,583 member
    Quote:
    Originally Posted by Blastdoor View Post


    In theory, designing one's own CPUs, compilers, and OS could have some profound advantages. It will be interesting to see if Apple can turn theory into reality.



    Hello Mr 2002, welcome to the future. It's 2012. Apple already designs its own processors, compilers and OS, and the "profound advantages" are manifestly obvious and have been for some time now.
  • Reply 4 of 20
    "and used the company's accelerator in its 6th generation iPod nano later that same year." I didn't realize the iPod had an accelerator....



    Grand Central Dispatch could be considered programmer-driven parallelism or kernel-driven parallelism. However, it has very little to do with compiler-driven parallelism. In fact, it isn't even compiled... it is linked via a shared library.



    At best, this looks like a small improvement to vector processing units. It doesn't look much different than how vector processing already works. Apple is uniquely situated to take more advantage of vector units if they desire. I sometimes get the sense that Apple is up to something a little bigger in parallelism than simply GCD. It just feels that they give parallelism too much attention in areas that, unlike GCD, do not provide an obvious benefit outside of niche software.
  • Reply 5 of 20
    afrodri Posts: 190 member
    Quote:
    Originally Posted by Blastdoor View Post


    In theory, designing one's own CPUs, compilers, and OS could have some profound advantages. It will be interesting to see if Apple can turn theory into reality.



    Apple already does this to some extent, but what I find more interesting is how they pick and choose between existing components and still differentiate themselves. For example, they use stock x86 cores in the desktop/laptop, and for the mobile devices license ARM's designs but extend them and integrate them into a SoC. For compilers they build off the GNU and LLVM projects while contributing back to both; on the OS front they clearly differentiate the most, but also build off of BSD and Mach kernels where it makes sense to.



    In these areas, Apple has generally avoided both the "not invented here" syndrome and the "license/copy everything and hope we can differentiate ourselves by being 2% cheaper" syndrome that plague the industry.
  • Reply 6 of 20
    solipsismx Posts: 19,566 member
    Quote:
    Originally Posted by Corrections View Post


    Hello Mr 2002, welcome to the future. It's 2012. Apple already designs its own processors, compilers and OS, and the "profound advantages" are manifestly obvious and have been for some time now.



    A little over the top there, don't you think? And I am unaware of what processors Apple designs. Do you mean their own SoCs and PoPs based on ARM's reference designs? I'm not sure I'd classify that as Apple designing its own processors.
  • Reply 7 of 20
    wizard69 Posts: 13,377 member
    I'm actually surprised at this article as it is well done. I suspect from past filings that Apple is up to more than just a macroscalar enhancement. For one, they seem to have patented features in the past that would optimize a processor for execution of Objective-C.



    Quote:
    Originally Posted by esummers View Post


    "and used the company's accelerator in its 6th generation iPod nano later that same year." I didn't realize the iPod had an accelerator....



    That opens up the question as to what an accelerator is. Is it any hardware outside of the CPU's ALU? Cache memory could be considered an accelerator; in fact, early in computer history it was marketed that way, but today it is standard hardware.

    Quote:



    Grand Central Dispatch could be considered programmer-driven parallelism or kernel-driven parallelism. However, it has very little to do with compiler-driven parallelism. In fact, it isn't even compiled... it is linked via a shared library.



    I'm not too sure I like that characterization. GCD is a way for programmers to leverage computing resources on a Mac. There are actually multiple ways to do that on today's computers; GCD is just one approach that maps well to certain problem sets. As to programmer-driven, well, the programmer or the design always has a big impact on the ability of code to execute in parallel, as does the problem domain.

    Quote:



    At best, this looks like a small improvement to vector processing units.



    It could be. This isn't Apple's first attack on vector processing, as AltiVec was heavily influenced by Apple. There is some mystery as to how much Apple IP is in AltiVec, but I'd have to say it is more than the zero many believe.

    Quote:

    It doesn't look much different than how vector processing already works.



    It does have that appearance. However, the smart move for Apple would be to design a processor that doesn't depend on a separate vector unit and make all code paths support vector operations, even if the ultimate vector is a bit shorter. The idea of vector units came to the industry relatively late, which has resulted in their being segregated from the main computing hardware. That might not be ideal for a low-power processor implementation.

    Quote:

    Apple is uniquely situated to take more advantage of vector units if they desire.



    Not really. Vector processing and the closely related GPU computing are only an advantage if you have a problem set that can take advantage of them. It is something people really seem to have a problem grasping: often the vector units in a CPU are doing absolutely nothing other than using up power. The same thing goes for the GPU. I cringe every time I hear somebody whine about OpenCL, as it is a very successful Apple initiative, that is, if you understand what it is and who the technology is targeted at.



    On the flip side, Apple has iOS, where they heavily leverage all of the hardware resources in the chip. Apple is certainly a leader in this use of hardware, but I just can't see the phrase "uniquely situated" applying here, as anybody can use the same types of hardware. It is more a lack of will than anything else on the competition's part.

    Quote:

    I sometimes get the sense that Apple is up to something a little bigger in parallelism than simply GCD.



    Really, they have no choice. In the mobile arena your only option to maintain performance while running on battery is to keep the clock rate down and to spread the workload around.

    Quote:

    It just feels that they give parallelism too much attention in areas that, unlike GCD, do not provide an obvious benefit outside of niche software.



    I'm not sure I buy into the above statement. Parallelism is not always simple, and then you have the reality of completely unrelated tasks running, which are so important to today's operating systems. One approach does not solve every problem on today's machines, so you need to pay attention to all techniques. For example, iOS greatly benefits from process parallelism, as do most computers, but today people hardly acknowledge how important that sort of parallelism is to offering up a responsive machine. The interesting thing here is that iOS normally runs only one user process, yet there are enough things happening in the background that a second processor makes a huge difference. In the case of this article, the technology highlighted focuses on parallelism deep inside the processor. It isn't really that esoteric, as all code has branches and missed branches have always caused processor slowdowns. The described technology is just there to execute both code paths at the same time to reduce performance losses from branches. It isn't so much doing more work as it is taking out some of the latency from bad predictions. The way I read this is that the processor wouldn't be executing more threads or processes at all, so it isn't a type of parallelism leveraged by code.



    In any event, if you follow Apple and their stream of patents, you will have noticed that the processor patents have been flowing for some time, some related to this and others focused on other concepts. It would be very nice to see the efforts here realized in hardware this year.
  • Reply 8 of 20
    shompa Posts: 343 member
    This is Apple's (and all other "closed vendors'") advantage.



    Apple is in a unique position to customize its SoC with the features it wants to use in iOS (and later OS X, when they are merged). This patent is one more piece of evidence of this brilliant strategy.



    Take the A5. One year after its release, there is no other ARM SoC that has the same performance in real-world tasks. Even the single-core A4 performed within 8% of dual-core Tegras.



    The A5 is a "huge" ARM SoC. It's 30% larger than the Tegra 2. Apple fills this extra space with hardware it can optimize its OS around. Google/Android can't do that, since not all devices have the same hardware.



    If I understand this patent right, Apple can use this hardware to dispatch code over different processors, for example the Cortex-A9 and Cortex-A15. These processors have different numbers of pipeline stages. Instead of optimizing per processor, this new hardware can do it on the fly.

    This will be more important now that ARM is moving to longer pipelines in order to clock its processors higher.



    Let's hope that ARM does not end up with a ridiculous pipeline like the later Pentium 4s, whose 30+ stage pipeline is why they were so slow per clock cycle.



    x86/Intel has to continue its brute-force approach to achieve good performance. Apple can get the same performance in real-world apps using SIMD/GPU acceleration and slower (but much cheaper) ARM SoCs.



    Regarding the other article on this site where Tim says that tablets will soon fill the MacBook Air niche: Tim does not mean that Apple won't do ARM computers. What Tim means is that in the next couple of years our phones and tablets will be our computers. The A6 will be the first ARM SoC that is fast enough for general computing.



    The majority of ordinary users who buy the iPad 3 will have an iPad with better performance than their desktop PC with Intel graphics. And this is 2012.



    Just hook your iPad up to a big screen + Bluetooth keyboard and you have a complete computer. Same with your phone later this year. This is the future of computing. Big boxes will disappear or become a niche.



    I would love Apple to release a headless iOS Mac with the A6. It's exactly the same hardware as the Apple TV but with a few more ports. Apple could sell these for 200 dollars and make a huge profit. Users would get a computer that draws 2-6 watts and can do almost everything a standard Mac can. The only programs it can't run are those that have not been optimized for ARM/SIMD, from lazy companies like Adobe and MSFT that took years and years to adopt OS X and, later, 64-bit.



    We are seeing how Apple is moving everything in house.

    SoC: PA Semi

    Graphics: PowerVR (a company Apple owns over 10% of)

    Anobit for NAND Flash controller.

    Loads of LTE patents to design their own LTE chip

    Giving Sharp over $500 million to build an LCD factory.



    Apple can almost build a complete SoC/computer. They only need a baseband and some good 2G/3G patents. Maybe buy the late Sony Ericsson, which has a great patent portfolio (the only Android vendor that isn't being sued by MSFT and doesn't have to pay a protection fee to MSFT for Android). Sony bought Sony Ericsson for just 1 billion, and Sony is losing a ton of money. Apple could buy them out and have a complete SoC.
  • Reply 9 of 20
    gwmac Posts: 1,806 member
    Macrumors posted this story earlier today.



    http://www.macrumors.com/2012/02/06/...ture-advances/



    What happened to the days when AI actually had scoops and didn't rehash and recycle stories from other sources? Kudos for the extra details and information, but still... How about some original, "you saw it here first" stories again, AI?
  • Reply 10 of 20
    shompa Posts: 343 member
    Quote:
    Originally Posted by SolipsismX View Post


    A little over the top there, don't you think? And I am unaware of what processors Apple designs. Do you mean their own SoCs and PoPs based on ARM's reference designs? I'm not sure I'd classify that as Apple designing its own processors.



    Apple puts stuff like DDR2 controllers, NOVA and the voice chip into its SoC. That is why it's 30% larger than other SoCs. So yes, Apple uses standard ARM cores, but the stuff around them is Apple-specific. You can look at any benchmark and see the huge difference between the Apple SoC and the Tegra 2, for example.



    Exactly the same ARM cores, but Apple's are much faster (thanks to the memory interface and other stuff).
  • Reply 11 of 20
    afrodri Posts: 190 member
    Quote:
    Originally Posted by wizard69 View Post


    That opens up the question as to what an accelerator is. Is it any hardware outside of the CPU's ALU? Cache memory could be considered an accelerator; in fact, early in computer history it was marketed that way, but today it is standard hardware.



    From the context of the article, I think they meant to say "accelerometer" - ST Micro makes accelerometers, and I believe supplied the ones for the iPhone.
  • Reply 12 of 20
    shompa Posts: 343 member
    Quote:
    Originally Posted by gwmac View Post


    Macrumors posted this story earlier today.



    http://www.macrumors.com/2012/02/06/...ture-advances/



    What happened to the days when AI actually had scoops and didn't rehash and recycle stories from other sources? Kudos for the extra details and information, but still... How about some original, "you saw it here first" stories again, AI?



    And they took it from Patently Apple.

    It's a small Apple rumor world.



    The thing that irritates me is when some idiotic German PC magazine reports that the iPad 3 will be released in autumn 2011 and that story spreads like wildfire through the Apple rumor sites.



    Apple rumor sites start to use each other as references.



    Then Forbes and Yahoo start to report that AppleInsider said the iPad 3 is on the way, three months after the German magazine that has zero clue reported it.



    But... these sites are about generating clicks, not producing accurate information.
  • Reply 13 of 20
    afrodri Posts: 190 member
    Quote:
    Originally Posted by esummers View Post




    At best, this looks like a small improvement to vector processing units. It doesn't look much different than how vector processing already works.



    From skimming the patent, it looks like the slices are a bit more independent than most vector processors, in that each slice has its own register file and can operate on a different thread at the same time. Sorta reminds me of the 'clustered' micro-architectures which were talked about for DSPs.
  • Reply 14 of 20
    Quote:
    Originally Posted by esummers View Post


    "and used the company's accelerator in its 6th generation iPod nano later that same year." I didn't realize the iPod had an accelerator....



    Grand Central Dispatch could be considered programmer-driven parallelism or kernel-driven parallelism. However, it has very little to do with compiler-driven parallelism. In fact, it isn't even compiled... it is linked via a shared library.



    At best, this looks like a small improvement to vector processing units. It doesn't look much different than how vector processing already works. Apple is uniquely situated to take more advantage of vector units if they desire. I sometimes get the sense that Apple is up to something a little bigger in parallelism than simply GCD. It just feels that they give parallelism too much attention in areas that, unlike GCD, do not provide an obvious benefit outside of niche software.



    Apple has a feature in Final Cut Pro X called Optical Flow.



    Optical Flow is used primarily when video is slowed down. For example, to slow a 30 FPS (frames per second) video by 50%, it would play at 15 FPS. This would give poorer video playback than the original. To slow the video and preserve the quality, Optical Flow is used to generate morphed frames in between the original frames. So the resulting video runs at 30 FPS by playing 15 generated frames interspersed among 15 original frames.



    Optical Flow lends itself to parallel processing.
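


    To make that concrete, the crudest possible stand-in for frame generation, a naive per-pixel blend of the two neighboring frames (nothing like real optical flow, which estimates motion), shows why: every output pixel is computed independently, so rows or tiles can be farmed out to separate cores.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Naive "in-between" frame: average the previous and next frames pixel
     * by pixel. Real optical flow is far smarter, but the structure is the
     * same: each output pixel is independent of every other one. */
    void make_midpoint_frame(const uint8_t *prev, const uint8_t *next,
                             uint8_t *out, size_t width, size_t height)
    {
        for (size_t y = 0; y < height; y++) {
            for (size_t x = 0; x < width; x++) {
                size_t i = y * width + x;
                out[i] = (uint8_t)(((unsigned)prev[i] + (unsigned)next[i]) / 2);
            }
        }
    }
    ```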





    OK?





    Now, consider the opposite approach...



    What if you could take a video, drop 50% of the frames, stream it, then computationally generate the dropped frames at the other end?



    Potentially, you could cut the bandwidth needed to stream the video in half... Or deliver a much higher quality stream at the same bandwidth.



    If this can be done at a low cost, it could "crack" the video marketplace!



  • Reply 15 of 20
    Dan_Dilger Posts: 1,583 member
    Quote:
    Originally Posted by gwmac View Post


    Macrumors posted this story earlier today.



    http://www.macrumors.com/2012/02/06/...ture-advances/



    What happened to the days when AI actually had scoops and didn't rehash and recycle stories from other sources? Kudos for the extra details and information, but still... How about some original, "you saw it here first" stories again, AI?



    Sometimes AI is first, such as in reporting on the Audience chip tech a day before MacRumors. Sometimes MR is first, as in your example of throwing up a summary of Patently Apple ahead of AI, along with a bunch of rather nonsensical speculation that PA wrote.



    I prefer the detail and context, and some research behind what's being reported, to at least attempt to portray things with some objective accuracy. But if you like first-post exclusives, watch for the red headlines on AI, I guess.
  • Reply 16 of 20
    blastdoor Posts: 3,239 member
    Quote:
    Originally Posted by Corrections View Post


    Hello Mr 2002, welcome to the future. It's 2012. Apple already designs its own processors, compilers and OS, and the "profound advantages" are manifestly obvious and have been for some time now.



    Up until now, Apple has come nowhere near realizing the potential benefits of controlling the OS, compiler, and CPU design. Their A4 and A5 are minimally tweaked ARM SoCs that aren't that much different from other ARM SoCs. Samsung could probably run Android on an A4 or A5 chip with very little difficulty.



    What this news story is talking about is something far more exciting. Sorry you can't grasp that.
  • Reply 17 of 20
    realistic Posts: 1,154 member
    I readily admit that I don't know jack, so I feel fortunate that AI has so many self-professed experts who know it all. Why, pray tell, are they seeing things differently? Could it be that they also don't know jack but think they do? Think about it: they can't all be right, but they all could be wrong.
  • Reply 18 of 20
    blastdoor Posts: 3,239 member
    Quote:
    Originally Posted by Realistic View Post


    I readily admit that I don't know jack, so I feel fortunate that AI has so many self-professed experts who know it all. Why, pray tell, are they seeing things differently? Could it be that they also don't know jack but think they do? Think about it: they can't all be right, but they all could be wrong.







    Abcde
  • Reply 19 of 20
    ahmlco Posts: 432 member
    Quote:
    Originally Posted by wizard69 View Post


    I'm not sure I buy into the above statement. Parallelism is not always simple, and then you have the reality of completely unrelated tasks running, which are so important to today's operating systems.



    Seems to me that there's a middle ground between simple vector parallelism and brute-force process parallelism.



    What if, say, the compiler could take a loop apart and cause each iteration to occur in parallel? Take, say, generating a list view. Each cell results in a callback to get the contents of the cell to display. What if the majority of those callbacks and cell generations could happen simultaneously?



    Or, perhaps, generating image icons for a thumbnails view?



    Some of those things could be done now, if the developer broke the code up into GCD blocks. But that requires the developer to write the code that way. What if compiler and processor technology advanced to the point where the developer just codes the basic loop, and the system does the GCD breakdown automatically, on a "macro" scale?
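


    For what it's worth, here is a minimal sketch of the manual version today, using GCD's C-level dispatch_apply to run the iterations of a thumbnail loop concurrently (make_thumbnail is a made-up placeholder for whatever the loop body would actually do):

    ```c
    #include <dispatch/dispatch.h>
    #include <stddef.h>

    /* Placeholder for the real per-item work (decode an image, scale it, ...). */
    static void make_thumbnail(size_t item_index)
    {
        (void)item_index;
    }

    void generate_thumbnails(size_t item_count)
    {
        dispatch_queue_t queue =
            dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

        /* dispatch_apply acts like a for loop whose iterations GCD may run
         * concurrently; it returns only after every iteration has finished. */
        dispatch_apply(item_count, queue, ^(size_t item_index) {
            make_thumbnail(item_index);
        });
    }
    ```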



    Could be a huge advance in taking advantage of multiple processes and cores.
  • Reply 20 of 20
    blastdoorblastdoor Posts: 3,239member
    Quote:
    Originally Posted by ahmlco View Post


    Seems to me that there's a middle ground between simple vector parallelism and brute-force process parallelism.



    What if, say, the compiler could take a loop apart and cause each iteration to occur in parallel? Take, say, generating a list view. Each cell results in a callback to get the contents of the cell to display. What if the majority of those callbacks and cell generations could happen simultaneously?



    Or, perhaps, generating image icons for a thumbnails view?



    Some of those things could be done now, if the developer broke the code up into GCD blocks. But that requires the developer to write the code that way. What if compiler and processor technology advanced to the point where the developer just codes the basic loop, and the system does the GCD breakdown automatically, on a "macro" scale?



    Could be a huge advance in taking advantage of multiple processes and cores.



    A thought occurred to me -- this macroscalar thing kind of sounds like a less extreme version of what Intel tried to do with Itanium. A big difference being that (I think) Itanium required a new compiler (and also recompiled code) every time the processor was updated. It kind of sounds like Apple is saying they want a VLIW-ish processor, but without such extreme demands on the compiler.