Apple's other open secret: the LLVM Compiler

SproutCore, profiled earlier this week, isn't the only big news to spill out of the top secret WWDC conference thanks to Apple's embrace of open source sharing. Another future technology featured by the Mac maker last week was LLVM, the Low Level Virtual Machine compiler infrastructure project.



Like SproutCore, LLVM is neither new nor secret, but both have been hiding from attention due to a thick layer of complexity that has obscured their future potential.



Looking for LLVM at WWDC



Again, the trail of breadcrumbs for LLVM starts in the public WWDC schedule. On Tuesday, the session "New Compiler Technology and Future Directions" detailed the following synopsis:



"Xcode 3.1 introduces two new compilers for Mac OS X: GCC 4.2 and LLVM-GCC. Learn how the new security and performance improvements in GCC 4.2 can help you produce better applications. Understand the innovations in LLVM-GCC, and find out how you can use it in your own testing and development. Finally, get a preview of future compiler developments."



There are a lot of unpronounceable words in all capital letters in that paragraph, LOLBBQ. Let's pull a few out and define them until the synopsis starts making sense.



Introducing GCC



The first acronym in our alphabet soup is GCC, originally the GNU C Compiler. The project was begun in the mid-1980s by Richard Stallman of the Free Software Foundation. Stallman's radical idea was to develop software that would be shared rather than sold, with the intent of delivering code that anyone could use provided that anything they contributed to it would be passed along in a form others could also use.



Stallman was working to develop a free version of AT&T's Unix, which had already become the standard operating system in academia. He started at the core: in order to develop anything in the C language, one would need a C compiler to convert that high level, portable C source code into machine language object code suited to run on a particular processor architecture.



GCC has progressed through a series of advancements over the years to become the standard compiler for GNU/Linux, BSD Unix, Mac OS X, and a variety of embedded operating systems. GCC supports a wide variety of processor architecture targets and high level language sources.



Apple ships specialized versions of GCC 4.0 and 4.2 in Leopard's Xcode 3.1 that support compiling Objective-C/C/C++ code to both PowerPC and Intel targets on the desktop, and it uses GCC 4.0 to target ARM development on the iPhone.



The Compiler



The compiler is the portion of the development toolchain that sits between writing source code and debugging and deploying the finished program. The first phase of compiling is the Front End Parser, which performs language-specific syntax and semantic analysis on the source code to create an internal representation of the program.
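

As a rough illustration (made-up variable names, and not the actual internal data structures GCC or Clang use), the parser turns a line of C like this into a tree that the later phases can reason about:

    /* A line of C source... */
    int total = price + price * tax;

    /*
     * ...is parsed into a tree-shaped internal representation,
     * conceptually something like:
     *
     *   declare total =
     *          (+)
     *          / \
     *      price (*)
     *            / \
     *        price  tax
     */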



Code is then passed through an Optimizer phase, which improves it by doing things like eliminating redundant computations and deleting dead code that doesn't need to exist in the final version.
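

A minimal sketch of both ideas in C (the transform() function and its log_value() helper are hypothetical, and a real optimizer works on the compiler's internal representation rather than rewriting source):

    /* Hypothetical helper, defined only so the example is self-contained. */
    static void log_value(int value) { (void)value; }

    /* Before optimization: x * scale is computed twice, and the
       branch below can never be taken. */
    int transform(int x, int scale) {
        int a = x * scale;
        int b = x * scale;          /* redundant: always equal to a  */
        if (0) {                    /* condition is always false...  */
            log_value(a);           /* ...so this call is dead code  */
        }
        return a + b;
    }

    /* After optimization, the compiler can emit code equivalent to: */
    int transform_optimized(int x, int scale) {
        int a = x * scale;
        return a + a;
    }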



The Code Generator phase then takes the optimized code and maps it to the output processor, resulting in assembly language code targeted at that specific processor and far removed from the original, human-friendly source.
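

For example, a trivial C function might leave the Code Generator looking roughly like this when targeting a 64-bit Intel Mac; the exact instructions depend on the target architecture, the compiler, and the optimization level:

    int add(int a, int b) {
        return a + b;
    }

    /*
     * Approximate x86-64 output (AT&T syntax), shown for illustration:
     *
     *   _add:
     *           movl    %edi, %eax   # copy the first argument into the return register
     *           addl    %esi, %eax   # add the second argument to it
     *           ret                  # return to the caller
     */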



The Assembler phase converts that assembly language code into object code: binary instructions that can be executed by a hardware processor or a software virtual machine.



The final phase is the Linker, which combines object code with any necessary library code to create the final executable.
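

The whole pipeline can also be walked by hand with the gcc driver; here is a minimal sketch using two source files (the file and program names are just examples):

    /* add.c */
    int add(int a, int b) {
        return a + b;
    }

    /* main.c */
    #include <stdio.h>

    int add(int a, int b);              /* defined in add.c */

    int main(void) {
        printf("%d\n", add(2, 3));
        return 0;
    }

    /*
     * Stepping through the phases manually:
     *
     *   gcc -S add.c               # compile to assembly        -> add.s
     *   gcc -c add.s               # assemble to object code    -> add.o
     *   gcc -c main.c              # compile and assemble       -> main.o
     *   gcc main.o add.o -o demo   # link objects + C library   -> demo
     */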



Introducing LLVM



GCC currently handles all those phases for compiling code within Xcode, Apple's Mac OS X IDE (Integrated Development Environment). However, there are some drawbacks to using GCC.



One is that it is delivered under the GPL, which means Apple can't integrate it directly into Xcode without making its IDE GPL as well. Apple prefers BSD/MIT-style open source licenses, which place no restrictions on incorporating open projects into larger proprietary products.



Another is that portions of GCC are getting long in the tooth. LLVM is a modern project that rethinks how the pieces of a compiler should work, with an emphasis on Just In Time compilation, cross-file optimization (which can link together code from different languages and optimize across file boundaries), and a modular compiler architecture of components that have few dependencies on each other while integrating well with existing compiler tools.
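

Cross-file optimization matters because a traditional compiler only ever sees one source file at a time. Here is a sketch of the kind of opportunity it opens up (the file and function names are made up for the example):

    /* units.c */
    int inches_to_points(int inches) {
        return inches * 72;
    }

    /* layout.c */
    int inches_to_points(int inches);   /* an opaque external call when the
                                           files are compiled separately */

    int page_width_points(void) {
        return inches_to_points(8);
    }

    /*
     * Compiled file by file, the call above has to remain a real function
     * call. With cross-file (link-time) optimization, the optimizer sees
     * both files at once, can inline the call, and can reduce
     * page_width_points() to simply returning the constant 576.
     */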



LLVM got its start at the University of Illinois in 2000 as a research project of Chris Lattner, and was released as version 1.0 in 2003. Lattner caught the attention of Apple after posting questions about Objective-C to the company's objc-language mailing list. Apple in turn began contributing to the LLVM project in 2005 and later hired Lattner, funding his continued work on the project.



Clang and LLVM-GCC



Last year the project released Clang, an Apple-led, standalone compiler front end for the LLVM tools aimed at providing fast compilation with low memory use, expressive diagnostics, a modular library-based architecture, and tight integration with an IDE such as Xcode, all offered under a BSD-style open source license.
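

"Expressive diagnostics" means error messages that point at the exact expression that went wrong rather than merely naming a line. A hedged sketch of the difference (the precise message text varies from one compiler version to the next):

    #include <stdio.h>

    int main(void) {
        int total = 0;
        printf("total is %s\n", total);   /* wrong format specifier for an int */
        return 0;
    }

    /*
     * A terse compiler might only report something like:
     *
     *   warning: format argument is not a pointer
     *
     * Clang-style diagnostics instead point at the offending expression
     * with a caret and name both types, roughly:
     *
     *   demo.c:5:29: warning: format specifies type 'char *' but the
     *   argument has type 'int'
     *       printf("total is %s\n", total);
     *                        ~~     ^~~~~
     */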



In addition to the pure LLVM Clang project, which uses an early, still-developing front end parser for Objective-C/C/C++, Apple also started work on integrating components of LLVM into the existing GCC, based on Lattner's LLVM/GCC Integration Proposal. That has resulted in a hybrid system that leverages the mature components of GCC, such as its front end parser, while adding the most valuable components of LLVM, including its modern code optimizers.



That project, known as LLVM-GCC, inserts the optimizer and code generator from LLVM into GCC, providing modern methods for "aggressive loop, standard scalar, and interprocedural optimizations and interprocedural analyses" missing in the standard GCC components.
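

As one concrete example of the aggressive loop optimizations being referred to, an optimizer can hoist work that never changes between iterations out of the loop entirely; a minimal sketch:

    /* Before: total / divisor is recomputed on every pass through the
       loop even though neither value changes inside it. */
    void scale_all(int *values, int count, int total, int divisor) {
        int i;
        for (i = 0; i < count; i++)
            values[i] = values[i] * (total / divisor);
    }

    /* After loop-invariant code motion, the compiler effectively produces: */
    void scale_all_hoisted(int *values, int count, int total, int divisor) {
        int factor = total / divisor;   /* computed once, before the loop */
        int i;
        for (i = 0; i < count; i++)
            values[i] = values[i] * factor;
    }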



LLVM-GCC is designed to be highly compatible with GCC so that developers can move to the new compiler and benefit from its code optimizations without making substantial changes to their workflow. Sources report that LLVM-GCC "compiles code that consistently runs 33% faster" than code output from GCC.



Apple also uses LLVM in Leopard's OpenGL stack, leveraging its concept of a common intermediate representation (IR) to emulate OpenGL hardware features on Macs whose graphics hardware lacks the silicon to run that code. The code is instead interpreted or JIT-compiled to run on the CPU.
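

Conceptually, the stack picks between the hardware path and a routine generated for, and executed on, the CPU. The following is only a sketch of that fallback idea with made-up names, not Apple's actual OpenGL code:

    typedef void (*shade_fn)(float *pixels, int count);

    static void hardware_shader(float *pixels, int count) {
        /* stand-in for handing the work to the GPU driver */
        (void)pixels; (void)count;
    }

    static void cpu_fallback_shader(float *pixels, int count) {
        /* stand-in for a routine the OpenGL stack would generate at
           runtime (this is where LLVM's JIT comes in) and run on the CPU */
        int i;
        for (i = 0; i < count; i++)
            pixels[i] = pixels[i] * 0.5f;
    }

    shade_fn choose_shader_path(int gpu_supports_feature) {
        /* Use the silicon when it exists; otherwise run the emulated
           version on the host processor. */
        return gpu_supports_feature ? hardware_shader : cpu_fallback_shader;
    }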



Apple is also using LLVM in iPhone development, as the project's modular architecture makes it easier to add support for other architectures such as ARM, now supported in LLVM 2.0 thanks to work done by Nokia's INdT.






LLVM and Apple's Multicore Future



LLVM plays into Apple's ongoing strategies for multicore and multiprocessor parallelism. CPUs are now reaching physical limits that are preventing chips from getting faster simply by driving up the gigahertz. Intel's roadmaps indicate that the company now plans to drive future performance by adding multiple cores. Apple already ships 8-core Macs on the high end, and Intel has plans to boost the number of cores per processor into the double digits.



Taking advantage of those cores is not straightforward. While the legacy spaghetti code of the classic Mac OS and Windows was made faster by a decade of CPUs with rapidly increasing clock speeds, future advances will come from producing highly efficient code that can take full advantage of multiple cores.



Existing methods of thread scheduling are tricky to keep in sync across multiple cores, resulting in inefficient use of modern hardware. With features like OpenCL and Grand Central Dispatch, Snow Leopard will be better equipped to manage parallelism across processors and push optimized code to the GPU's cores, as described in WWDC 2008: New in Mac OS X Snow Leopard. However, in order for the OS to efficiently schedule parallel tasks, the code has to be written and compiled in a way that exposes its parallelism, which is where the compiler comes in.
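

To see why that help matters, here is a minimal sketch of what parallelizing even a trivial loop looks like today with hand-rolled POSIX threads: the programmer has to split the work, guess at a core count, and create and join every thread explicitly.

    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NTHREADS 4                  /* a hard-coded guess at the core count */

    static double data[N];

    struct slice { int start; int end; };

    static void *square_slice(void *arg) {
        struct slice *s = (struct slice *)arg;
        int i;
        for (i = s->start; i < s->end; i++)
            data[i] = data[i] * data[i];
        return NULL;
    }

    int main(void) {
        pthread_t threads[NTHREADS];
        struct slice slices[NTHREADS];
        int chunk = N / NTHREADS;
        int i, t;

        for (i = 0; i < N; i++)
            data[i] = i;

        /* Split the loop into fixed chunks and hand each chunk to a thread. */
        for (t = 0; t < NTHREADS; t++) {
            slices[t].start = t * chunk;
            slices[t].end   = (t == NTHREADS - 1) ? N : (t + 1) * chunk;
            pthread_create(&threads[t], NULL, square_slice, &slices[t]);
        }

        /* Nothing below is safe to run until every thread has finished. */
        for (t = 0; t < NTHREADS; t++)
            pthread_join(threads[t], NULL);

        printf("%f\n", data[N - 1]);
        return 0;
    }

Grand Central Dispatch, as previewed for Snow Leopard, is meant to collapse that boilerplate into a single call and let the system decide at run time how many cores to put behind the loop.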



Open for Improvement



LLVM will be a key tool in prepping code for high performance scheduling. As the largest contributor to the LLVM project, Apple is working to push compiler technology ahead along with researchers in academia and industry partners, including supercomputer maker Cray. Apple is also making contributions to GCC to improve its performance and add features.



Because both projects are open source, it's easy to find hints of what the company is up to next. Enhancements to code debugging, compiler speed, the speed of output code, security features related to stopping malicious buffer overflows, and processor specific optimizations will all work together to create better quality code.



That means applications will continue to get faster and developers will have an easier time focusing on the value they can add rather than having their time consumed by outdated compiler technology.



For Apple, investing in its own advanced compiler expertise also means that it can hand-tune the software that will be running while it also optimizes the specialized processors that will be running it, such as the mobile SoCs Apple will be building with its acquisition of PA Semi, as noted in How Apple's PA Semi Acquisition Fits Into Its Chip History.



There's more information on The LLVM Compiler Infrastructure Project. Lattner also published a PDF of his presentation of The LLVM Compiler System at the 2007 Bossa Conference.


Comments

  • Reply 1 of 51
    exscape Posts: 27 member
    "The Code Generator phase then takes the optimized code and maps it to the output processor, resulting in assembly language code which is no longer human readable."



    Assembly isn't human readable? I guess that makes me a machine.
  • Reply 2 of 51
    eai Posts: 417 member
    Quote:
    Originally Posted by exscape View Post


    "The Code Generator phase then takes the optimized code and maps it to the output processor, resulting in assembly language code which is no longer human readable."



    Assembly isn't human readable? I guess that makes me a machine.



    And me too, I'd guess Well, at least someone finally told me...
  • Reply 3 of 51
    da2357 Posts: 35 member
    Quote:
    Originally Posted by exscape View Post


    Assembly isn't human readable? I guess that makes me a machine.



    Quote:
    Originally Posted by eAi View Post


    And me too, I'd guess Well, at least someone finally told me...



    Count me in... I guess we're either machines or dinosaurs!
  • Reply 4 of 51
    Even more. I have know guys that can read pages and pages of 0's and 1's and know what they are saying. But now that you mention it, the guys were sort of different. Maybe machines?
  • Reply 5 of 51
    dutch pear Posts: 588 member
    That was a nice, insightfull read



    Seeing how apple is charging ahead in optimizing their OS and other software and IDE to really get the maximum performance out of their hardware and the direction the CPU/GPU makers are going makes me feel very warm and fuzzy about apple's future



    These developments also really highlight the rationale behind Apple's strategy to engineer their products as an integrated union of hard- and software and focus on the most optimal, efficient en powerfull integration/symbiosis of both. Combined with their renewed efforts in internet connectivity using open standards and a great user experience (iPhone, sproutcore) and they really can't go anywhere but up.



    Makes you wonder when companies that focus on a single aspect will see the light.

    Microsoft, Dell, Samsung, Sony, Nokia, Google, I'm looking at you!
  • Reply 6 of 51
    merdhead Posts: 587 member
    Quote:
    Originally Posted by exscape View Post


    "The Code Generator phase then takes the optimized code and maps it to the output processor, resulting in assembly language code which is no longer human readable."



    Assembly isn't human readable? I guess that makes me a machine.



    This is another example in a long line that reminds us (but obviously not them) that AI should keep away from analysis. They really come across as not knowing what they are talking about.



    The fact they think that a compiler is involved in scheduling threads, that it 'optimises for parallelism' at the code level for multiple processors and that the OS is involved is scheduling parallel code makes them look silly.



    The article just sounds like a bunch of gobbledegook. They're taking what is a pretty mundane improvement (for people other than compiler hackers) and trying to turn it into something special. It's got nothing to do with multiple architectures or specialised versions of ARM processors (which all run basically identical instruction sets, no matter how they are applied).



    Just give it up, AI, please.
  • Reply 7 of 51
    elijahg Posts: 2,753 member
    Very interesting read, thanks
  • Reply 8 of 51
    merdhead Posts: 587 member
    Quote:
    Originally Posted by dutch pear View Post


    That was a nice, insightfull read



    Seeing how apple is charging ahead in optimizing their OS and other software and IDE to really get the maximum performance out of their hardware and the direction the CPU/GPU makers are going makes me feel very warm and fuzzy about apple's future



    These developments also really highlight the rationale behind Apple's strategy to engineer their products as an integrated union of hard- and software and focus on the most optimal, efficient en powerfull integration/symbiosis of both. Combined with their renewed efforts in internet connectivity using open standards and a great user experience (iPhone, sproutcore) and they really can't go anywhere but up.



    Makes you wonder when companies that focus on a single aspect will see the light.

    Microsoft, Dell, Samsung, Sony, Nokia, Google, I'm looking at you!



    Your post is very buzzword compliant. You should work for AI.
  • Reply 9 of 51
    ireland Posts: 17,798 member
    Interesting article Dan!
  • Reply 10 of 51
    gee4orce Posts: 165 member
    The developers I spoke to at WWDC were super excited about LLVM - if they were, so should you be.



    It makes me laugh to think that some supposedly well educated people in the IT field still can't see past Apple's eye-candy hardware and UIs, and don't realise that they are actually a serious computer-science company.



    LLVM + GCD + Open CL + <redacted> + <redacted> = a perfect Snow Leopard storm
  • Reply 11 of 51
    kim kap sol Posts: 2,987 member
    Quote:
    Originally Posted by exscape View Post


    "The Code Generator phase then takes the optimized code and maps it to the output processor, resulting in assembly language code which is no longer human readable."



    Assembly isn't human readable? I guess that makes me a machine.



    You obviously are a machine because you're incapable of actually understanding anything but machine code. What AI is saying is not that assembly is unreadable...it's saying that the optimized assembly code that is being generated by LLVM is very difficult to read or "no longer human readable" (which is somewhat of a hyperbole since some people can sit down in front of the code and spend some time understanding what it's doing).



    But like I said...you're a machine and wouldn't understand this. Good job on luring other machines though.
  • Reply 12 of 51
    booga Posts: 1,082 member
    Quote:
    Originally Posted by merdhead View Post


    This is another example in a long line that reminds us (but obviously not them) that AI should keep away from analysis. They really come across as not knowing what they are talking about.



    The fact they think that a compiler is involved in scheduling threads, that it 'optimises for parallelism' at the code level for multiple processors and that the OS is involved is scheduling parallel code makes them look silly.



    The article just sounds like a bunch of gobbledegook. They're taking what is a pretty mundane improvement (for people other than compiler hackers) and trying to turn it into something special. It's got nothing to do with multiple architectures or specialised versions of ARM processors (which all run basically identical instruction sets, no matter how they are applied).



    Just give it up, AI, please.



    I think you're being a bit harsh. LLVM's intermediate bytecode approach is pretty interesting, both in terms of linear optimization and parallel optimization. I can imagine a case where LLVM opcodes hint at parallelizable tasks that can be dynamically compiled for any number of processors. Thus the language would simply say "the elements in this loop can be executed in parallel" and the compiler and OS would take care of the rest. And when you're dynamically recompiling, you have awareness of not only the system architecture but its resource availability at compile time.



    LLVM seems to be driving toward this a lot faster than Java or CLR's intermediate bytecode. They seem intent on staying pretty close to the metal with LLVM so they can generate everything from GPU to embedded native code.



    I find it pretty exciting stuff. I'd love it if some of the LLVM slides leaked. Hopefully Apple will post them soon for those who weren't able to attend WWDC.
  • Reply 13 of 51
    dick applebaum Posts: 12,527 member
    Quote:
    Originally Posted by starwarrior View Post


    Even more. I have know guys that can read pages and pages of 0's and 1's and know what they are saying. But now that you mention it, the guys were sort of different. Maybe machines?



    We (more progressive) programmers write code in octal absolute. A programmer with any credentials would not trust these newfangled assemblers as they introduce cruft and generate un-optimized code.



    Further, the practice of using macro-instructions should be avoided at all costs-- it is a sign of laziness that will only serve to further estrange the programmer from the hardware and introduce bloat into the resulting program.
  • Reply 14 of 51
    solipsism Posts: 25,726 member
    Quote:
    Originally Posted by exscape View Post


    Assembly isn't human readable? I guess that makes me a machine.



    I've always suspected that everyone on the internet is a Bot, except me... hence my alias.





    Quote:
    Originally Posted by merdhead View Post


    Your post is very buzzword compliant. You should work for AI.





    While I don't agree with the sentiment, I did find it very humorous.
  • Reply 15 of 51
    merdhead Posts: 587 member
    Quote:
    Originally Posted by Booga View Post


    I think you're being a bit harsh. LLVM's intermediate bytecode approach is pretty interesting, both in terms of linear optimization and parallel optimization. I can imagine a case where LLVM opcodes hint at parallelizable tasks that can be dynamically compiled for any number of processors. Thus the language would simply say "the elements in this loop can be executed in parallel" and the compiler and OS would take care of the rest. And when you're dynamically recompiling, you have awareness of not only the system architecture but its resource availability at compile time.



    LLVM seems to be driving toward this a lot faster than Java or CLR's intermediate bytecode. They seem intent on staying pretty close to the metal with LLVM so they can generate everything from GPU to embedded native code.



    I find it pretty exciting stuff. I'd love it if some of the LLVM slides leaked. Hopefully Apple will post them soon for those who weren't able to attend WWDC.



    I'm not being harsh and I'm not saying its not interesting to someone.



    The important thing you mention in your post is 'language': you're going to need language support to parallelise. A compiler might take C code and improve parallel execution within one core, using multiple units or whatever, but its not going to make it run on multiple processors without library and OS support which is outside the job description of the compiler and requires code to be written to some required spec. The scope for automatically make code automatically parallel is very limited indeed, ie not really practical.
  • Reply 16 of 51
    aplnub Posts: 2,605 member
    Quote:
    Originally Posted by da2357 View Post


    Count me in... I guess we're either machines or dinosaurs!



    Cylons! Someone get those mother frackers before they ruin the entire fleet.
  • Reply 17 of 51
    zanshin Posts: 350 member
    Replicants, the whole lot of ya, I believe.



    And God bless you all for making my life so nice and personal computing experiences so rewarding. May you all be richly compensated for devoting your lives to something we mere mortals can't fathom.



  • Reply 18 of 51
    jeffdm Posts: 12,951 member
    Quote:
    Originally Posted by aplnub View Post


    Cylons! Someone get those mother frackers before they ruin the entire fleet.



    You can always tell they're the ones that need the most interspecies lovin' too.
  • Reply 19 of 51
    melgross Posts: 33,510 member
    While I agree with most of what is in the article, I would like to remind the writer that one tiny "factoid" isn't true.



    I'm getting a bit tired of writers, even some knowledgeable ones, writing that "now that individual cores aren't getting faster, cpu manufacturers are relying on multi core to move things forward".



    While it's true that when moving to dual cores, and above, IBM, Intel, and AMD moved the speeds of those cores down to allow for the heavier use of current on the chip required by the multi-core designs.



    Also, Intel particularly, needed to move back from their aggressive marketing for speed.



    But, despite writers seeming to miss it, we constantly talk about how chip speeds are moving up. And overclockers seem to get speeds that are quite a bit higher than the highest achieved by Prescott back then.



    Current Core 2 designs will get to at least 3.4 GHz, and possibly 3.6 Ghz before they are phased out. Nehalen will easily get to 3.6 GHz early, and will possibly exceed 4.0 GHz, and we will see 5 GHz some time after that in the next two years or so.



    So while the march to ever higher clock speeds has slowed, it continues.



    Other major advances show that adding more cores isn't the only method to add power. Nehalen has a host of new features to do that. Taking advantage of those new advanced features will provide most of the raw power increase we will see in processing, with the same number of cores.



    Sometimes, writers make it look as though the only advances in cpu design is in adding more cores. Not true.



    These newer compilers will be working very hard to take that advantage. If all we had to look forwards to were more cores, we would be in bad shape for the next few years.
  • Reply 20 of 51
    You mean that all of Apple's previous OS's were spaghetti code? Really? Do you double as an AP journalist as well? Both this article and their reporting are seriously fact-challenged.