Planet Clang

January 26, 2015


LLVM Weekly - #56, Jan 26th 2015

Welcome to the fifty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

I'll be talking about the lowRISC project to produce a fully open-source SoC at FOSDEM this coming weekend. Do come and see my main track talk and read my speaker interview for more background. There is of course an LLVM toolchain devroom on the Sunday.

The canonical home for this issue can be found here at

News and articles from around the web

Stephen Diehl has written an absolutely fantastic tutorial on writing an LLVM specializer for Python, guiding you through the process of creating something like Numba.

A new tool, Dwgrep (DWARF Grep) may be of interest to many LLVM Weekly readers. This blog post gives an intro to using it.

Paul Smith has a blog post on getting started with the LLVM C API.

A post on the official LLVM Blog announces that LLDB is coming to Windows. It tells a wider audience that it is now possible to debug simple programs with LLDB on Windows, and gives a rationale for investing effort into porting LLDB to Windows and adding support for the MS debug format. The post also features a todo list indicating what's next for Windows support.

A draft version 0.1 of the IA-32 psABI (processor specific application binary interface) is available. This aims to supplement the existing System V ABI with conventions relevant to newer features such as SSE1-4 and AVX. Comments are welcome.

LLVM/Clang 3.6-rc1 is now available. Get testing and filing bugs.

ELLCC 0.1.8 has been released. ELLCC is an LLVM/Clang-based cross compilation toolchain.

LLDB now has its own IRC channel. You'll want to join #lldb on

On the mailing lists

LLVM commits

  • A backend targeting the extended BPF (Berkeley Packet Filter) interpreter/JIT in the Linux kernel has been added. See this LWN article for more background. r227008.

  • The initial version of the new ORC JIT API has landed. r226940.

  • There's been a flurry of work on the new pass manager this week. One commit I will choose to pick out is the port of InstCombine to the new pass manager, which seems like a milestone of sorts. r226987.

  • LLVM learnt how to use the GHC calling convention on AArch64. r226473.

  • InstCombine will now canonicalize loads which are only ever stored to always use a legal integer type if one is available. r226781.

  • The llvm_any_ty type for intrinsics has been born. r226857.

  • llvm-objdump now understands -indirect-symbols to dump the Mach-O indirect symbol table. r226848.

Clang commits

  • Clang now supports SPIR calling conventions. r226548.

  • It's now possible to set the stack probe size on the command line. r226601.

  • Clang gained initial support for Win64 SEH IR emission. r226760.

Other project commits

  • Sun Solaris users, now is the time to celebrate. libc++ will now build on your platform of choice. r226947.

  • A minimal implementation of ARM static linking landed in lld. r226643.

  • Basic support for PPC was added to openmp. r226479.

by Alex Bradbury at January 26, 2015 02:56 PM

January 20, 2015


LLDB is Coming to Windows

We've spoken in the past about teaching Clang to fully support Windows and be compatible with MSVC.  Until now, a big missing piece in this story has been debugging the clang-generated executables.  Over the past 6 months, we've started working on making LLDB work well on Windows and support debugging both regular Windows programs and those produced by Clang.

Why not use an existing debugger such as GDB, Visual Studio's, or WinDBG?  There are a lot of factors in making this kind of decision.  For example, while GDB understands the DWARF debug information produced by Clang on Windows, it doesn't understand the Microsoft C++ ABI or debug information format.  On the other hand, neither Visual Studio nor WinDBG understand the DWARF debug information produced by Clang.  With LLDB, we can teach it to support both of these formats, making it usable with a wider range of programs.  There are also other reasons why we're really excited to work on LLDB for Windows, such as the tight integration with Clang which lets it support all of the same C++ features in its expression parser that Clang supports in your source code.  We're also looking to continue adding new functionality to the debugging experience going forward, and having an open source debugger that is part of the larger LLVM project makes this really easy.

The past few months have been spent porting LLDB's core codebase to Windows.  We've been fixing POSIX assumptions, enhancing the OS abstraction layer, and removing platform specific system calls from generic code.  Sometimes we have needed to take on significant refactorings to build abstractions where they are necessary to support platform specific differences.  We have also worked to port the test infrastructure to Windows and set up build bots to ensure things stay green.

This preliminary bootstrapping work is mostly complete, and you can use LLDB to debug simple executables generated with Clang on Windows today.  Note the use of the word "simple".  At last check, approximately 50% of LLDB's tests fail on Windows.  Our baseline, however, works today: a single-threaded 32-bit application consisting of a single executable (i.e. no shared libraries), built and linked with Clang and LLD using DWARF debug information.  We've tested all of the fundamental functionality such as:
  1. Various methods of setting breakpoints (address, source file+line, symbol name, etc)
  2. Stopping at and continuing from breakpoints
  3. Process inspection while stopped, such as stack unwinding, frame setting, memory examination, local variables, expression evaluation, stepping, etc  (one notable exception to this is that step-over doesn't yet work well in the presence of limited symbol information).
Of course, there is still more to be done.  Here are some of the areas we're planning to work on next:
  1. Fixing low hanging fruit by improving the pass-rate of the test suite.
  2. Better support for debugging multi-threaded applications.
  3. Support for debugging crash dumps.
  4. Support for debugging x64 binaries.
  5. Enabling stepping through shared libraries.
  6. Understanding PDB (for debugging system libraries, and executables generated with MSVC).  Although the exact format of PDB is undocumented, Microsoft still provides a rich API for querying PDB in the form of the DIA SDK.
  7. Adding debugging commands familiar to users of WinDBG (e.g. !handle, !peb, etc)
  8. Remote debugging
  9. Symbol server support
  10. Visual Studio integration
If you're using Clang on Windows, we would encourage you to build LLDB (it should be in the Windows LLVM installer soon) and let us know your thoughts by posting them to lldb-dev.  Make sure you file bugs against LLDB if you notice anything wrong, and we would love for you to dive into the code and help out.  If you see something wrong, dig in and try to fix it, and post your patch to lldb-commits.

by Zachary Turner at January 20, 2015 07:29 PM

January 19, 2015


LLVM Weekly - #55, Jan 19th 2015

Welcome to the fifty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

It seems to have been a very busy week in the world of LLVM, particularly with regard to discussion on the mailing list. Due to travel etc. and the volume of traffic, I haven't been able to do much summarisation of mailing list discussion, I'm afraid.

The canonical home for this issue can be found here at

News and articles from around the web

LLVM/Clang 3.6 has been branched and subsequently, 3.6 RC1 has been tagged.

LLVM/Clang 3.5.1 seems to have been quietly released.

Registration for EuroLLVM 2015, to be held at Goldsmiths College in London, UK on April 13-14th is now open.

All slides and videos from the last LLVM Developers' meeting are now live, including those from Apple employees.

On the mailing lists

LLVM commits

  • A new code diversity feature is now available. The NoopInsertion pass will add random no-ops to x86 binaries to try to make ROP attacks more difficult by increasing diversity. r225908. I highly recommend reading up on the blind ROP attack published last year. It would also be interesting to see an implementation of G-Free for producing binaries without simple gadgets. The commit was later reverted for some reason.

  • A nice summary of recent MIPS and PowerPC target developments, as well as the OCaml bindings, is now available in the form of the 3.6 release notes. r225607, r225695, r225779.

  • LLVM learned the llvm.frameallocate and llvm.framerecover intrinsics, which allow multiple functions to share a single stack allocation from one function's call frame. r225746, r225752.

  • An experimental (disabled by default) 'inductive range check elimination' pass has landed. This attempts to eliminate range checks of the form 0 <= A*I + B < Length. r226201.

  • StackMap/PatchPoint support is now available for the PowerPC target. r225808.

  • Initial support for Win64 SEH catch handlers has landed. See the commit message for current missing functionality. r225904.

  • A new utility script has been started to help update simple regression tests. It needs some work to generalise it beyond x86. r225618.

  • TargetLibraryInfo has been moved into the Analysis library. r226078.
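The inductive range check elimination item above concerns checks of the form 0 <= A*I + B < Length. As a rough illustration of why such checks can be removed, the sketch below computes the contiguous range of induction values for which the check always succeeds, then runs an unchecked loop over just that range. It is not the pass's actual algorithm (which operates on LLVM IR), it assumes A > 0, and all function names are invented for this example.

```python
def ceil_div(x, y):
    """Ceiling of x / y for y > 0, using integer arithmetic only."""
    return -((-x) // y)


def safe_range(a, b, length, n):
    """Half-open range [lo, hi) inside [0, n) where 0 <= a*i + b < length.

    Assumption for this sketch: a > 0, so the valid values of i are contiguous.
    """
    if a <= 0:
        raise ValueError("this sketch only handles a > 0")
    lo = max(0, ceil_div(-b, a))          # smallest i with a*i + b >= 0
    hi = min(n, ceil_div(length - b, a))  # smallest i with a*i + b >= length
    return lo, max(lo, hi)                # empty range if hi < lo


def sum_checked(data, a, b, n):
    """Reference loop: every access is guarded by the range check."""
    total = 0
    for i in range(n):
        idx = a * i + b
        if 0 <= idx < len(data):
            total += data[idx]
    return total


def sum_split(data, a, b, n):
    """Same result, but the loop body needs no range check."""
    lo, hi = safe_range(a, b, len(data), n)
    total = 0
    for i in range(lo, hi):
        total += data[a * i + b]
    return total


data = list(range(10))
print(safe_range(2, -3, len(data), 20))  # (2, 7)
print(sum_checked(data, 2, -3, 20) == sum_split(data, 2, -3, 20))  # True
```

Outside the computed range the check fails for every iteration, which is why skipping those iterations preserves the result in this toy version.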

Clang commits

  • The new -fno-inline-asm flag has been added to disallow all inline asm. If it exists in the input code it will be reported as an error.

  • -fsanitize-recover command line flags are again supported. r225719.

  • The integrated assembler is now used by default on 32-bit PowerPC and SPARC. r225958.

Other project commits

  • The libcxx build system learnt how to cross-compile. r226237.

  • LLD gained a nice speedup by speculatively instantiating archive file members. This shaves off a second or two for linking lld with lld. r226336.

  • LLD learnt the --as-needed flag (previously this was the default behaviour). r226274.

  • OpenMP gained an AArch64 port. r225792.

by Alex Bradbury at January 19, 2015 05:58 PM

January 13, 2015


LLVM Weekly - #54, Jan 12th 2015

Welcome to the fifty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

As you receive this week's issue, I should be on my way to California where I'll be presenting lowRISC at the RISC-V workshop in Monterey and having a few other meetings. I'm in SF Fri-Sun and somewhat free on the Saturday if anyone wants to meet and chat LLVM or lowRISC/RISC-V.

The canonical home for this issue can be found here at

News and articles from around the web

Euro LLVM 2015 will be held on April 13th-14th in London, UK. The call for papers is now open with a deadline of 16th Feb.

Talks for the LLVM devroom at FOSDEM have been announced. The LLVM devroom is on Sunday 1st Feb. Readers will be pleased to know this doesn't clash with my talk on lowRISC which is on the Saturday.

Google now use Clang for production Chrome builds on Linux. They were previously using GCC 4.6. Compared to that baseline, performance stayed roughly the same while binary size decreased by 8%. It would certainly have been interesting to compare to a more recent GCC baseline. The blog post indicates they're hopeful to use Clang in the future for building Chrome for Windows.

Philip Reames did an interesting back of the envelope calculation about the cost of maintaining LLVM. He picked out commits which seem like they could be trivially automated and guesstimated a cost based on developer time. The figure he arrives at is $1400 per month.

The next LLVM social for Cambridge, UK will be on Wed 21st Jan at 7:30pm.

On the mailing lists

LLVM commits

  • An option hoist-cheap-insts has been added to the machine loop invariant code motion pass to enable hoisting even cheap instructions (as long as register pressure is low). This is disabled by default. r225470.

  • The calculation of the unrolled loop size has been fixed. Targets may want to re-tune their default threshold. r225565, r225566.

  • DIE.h (datastructures for DWARF info entries) is now a public CodeGen header rather than being private to the AsmPrinter implementation. dsymutil will make use of it. r225208.

  • The new pass manager now has a handy utility for generating a no-op pass that forces a usually lazy analysis to be run. r225236.

  • There's been a minor change to the .ll syntax for comdats. r225302.

  • There have been some minor improvements to the emacs packages for LLVM and tablegen mode. r225356.

  • An example GCStrategy using the new statepoint infrastructure has been added. r225365, r225366.

Clang commits

  • A -Wself-move warning has been introduced. Similar to -Wself-assign, it will warn you when your code tries to move a value to itself. r225581.

  • The I, J, K, M, N, O inline assembly constraints are now checked. r225244.

Other project commits

  • The libcxx test infrastructure has been refactored into separate modules. r225532.

  • The effort to retire InputElement in lld continues. Linker script files are no longer represented as an InputElement. r225330.

  • Polly has gained a changelog in preparation for the next release. r225264.

  • Polly has also gained a TODO list for its next phase of development. r225388.

by Alex Bradbury at January 13, 2015 02:10 AM

January 11, 2015

Philip Reames

How much does it cost to maintain LLVM?

In early October of 2014, I started collecting changes that I saw fly by on llvm-commits that I thought would be straightforward to automate.  I was trying to be pretty conservative, so these tend to be pretty basic things: fixing deceptive whitespace around an if clause, removing the name of a method from its doxygen comment, removing a couple of syntactically redundant semicolons, and things of similar complexity.  These weren’t chosen because they were interesting, but precisely because they were not.

In the 66 days since I started collecting, I’ve saved 105 unique commits.  That’s a bit less than 2 per day, and only about 1.6% of the 6,500 commits made to LLVM in that time.

Let’s assume that each of those changes took an average of 15 minutes on the part of their author.  That’s not too much more than a single build and test cycle, so it seems like a reasonable estimate.  At roughly $2 per developer minute, we can guesstimate that each of these changes cost about $30.  Taken together, these 105 changes consumed about 26 hours of developer time at a cost of a bit over $3,150.

This gives us a value for straightforward code maintenance activities of roughly $1,400 per month (or roughly $50 per day).

If anything, this is an extremely low estimate.  I know several of these changes required review, and at least a couple of them broke the build and had to be reverted.  We could probably add in several more hours of developer time just for that alone.

Now this is only a small fraction of the roughly $88,000 in development time going to the project as a whole each month*, but it’s still pretty material.

* Using the same logic as above: 15 minutes per change, $2 per developer minute, 6500 changes, llvm repository only.  It goes without saying that this is a massive understatement of the actual value of the contributed work.
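The arithmetic behind these figures can be reproduced with a short sketch. The inputs are the numbers quoted in the post; the 30-day month is an assumption made here to convert the 66-day totals into monthly rates, and the variable names are illustrative only.

```python
# Figures quoted in the post.
DAYS = 66
TRIVIAL_COMMITS = 105
TOTAL_COMMITS = 6500
MINUTES_PER_CHANGE = 15
DOLLARS_PER_MINUTE = 2

# Assumption (not stated in the post): a month is taken as 30 days.
DAYS_PER_MONTH = 30


def cost(commits):
    """Estimated developer cost in dollars for the given number of commits."""
    return commits * MINUTES_PER_CHANGE * DOLLARS_PER_MINUTE


trivial_cost = cost(TRIVIAL_COMMITS)                        # total for 105 commits
trivial_hours = TRIVIAL_COMMITS * MINUTES_PER_CHANGE / 60   # developer hours
trivial_per_day = trivial_cost / DAYS
trivial_per_month = trivial_per_day * DAYS_PER_MONTH
total_per_month = cost(TOTAL_COMMITS) / DAYS * DAYS_PER_MONTH
share = TRIVIAL_COMMITS / TOTAL_COMMITS                     # fraction of all commits

print(f"trivial changes: ${trivial_cost} over {trivial_hours} hours")
print(f"per day: ${trivial_per_day:.0f}, per month: ${trivial_per_month:.0f}")
print(f"whole project per month: ${total_per_month:.0f}")
print(f"share of commits: {share:.1%}")
```

Under these assumptions the results land close to the rounded figures in the text: $3,150 in total, a little over $1,400 per month, and a little over $88,000 per month for the project as a whole.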

by reames at January 11, 2015 04:54 AM

January 06, 2015


Using clang for Chrome production builds on Linux

Chrome 38 was released early October 2014. It is the first release where the Linux binaries shipped to users are built by clang. Previously, this was done by gcc 4.6. As you can read in the announcement email, the switch happened without many issues. Performance stayed roughly the same, binary size decreased by about 8%. In this post I'd like to discuss the motivation for this switch.


There are two reasons for the switch.

1. Many Chromium developers already used clang on Linux. We've supported opting in to clang since before clang supported C++ – because of this, we have a process in place for shipping new clang binaries to all developers and bots every few weeks. Because of clang's good diagnostics (some of which we added due to bugs in Chromium we thought the compiler should catch), speed, and because of our Chromium-specific clang plugin, many Chromium developers switched to clang over the years. Making clang the default compiler removes a stumbling block for people new to the project.

2. We want to use modern C++ features in Chromium. This requires a recent toolchain – we figured we needed at least gcc 4.8. For Chrome for Android and Chrome for Chrome OS, we updated our gcc compilers to 4.8 (and then 4.9) – easy since these ports use a non-system gcc already. Chrome for Mac has been using Chromium's clang since Chrome 15 and was already in a good state. Chrome for iOS uses Xcode 5's clang, which is also new enough. Chrome for Windows uses Visual Studio 2013 Update 4. On Linux, switching to clang was the easiest way forward.

Keeping up with C++'s evolution in a large, multi-platform project

C++ had been static for many years. C++11 is the first real update to the C++ language since the original C++ standard (approved on July 27 1998). C++98 predated the founding of Google, YouTube, Facebook, Twitter, the releases of Mac OS X and Windows XP, and x86 SSE instructions. The time between the two standards saw the rise and fall of the iPod, several waves of social networks, and the smartphone explosion.

The time between C++11 and C++14 was three years, and the next major iteration of the language is speculated to be finished in 2017, three years from C++14. This is a dramatic change, and it has repercussions on how to build and ship C++ programs. It took us 3+ years to get to a state where we can use C++11 in Chromium; C++14 will hopefully take us less long. (If you're targeting fewer platforms, you'll have an easier time.)

There are two parts to C++11: New language features, and new library features. The language features just require a modern compiler at build time on build machines, the library features need a new standard library at runtime on the user's machine.

Deploying a new compiler is conceptually relatively simple. If your developers are on Ubuntu LTS releases and you make them use the newest LTS release, they get new compilers every two years – so just using the default system compiler means you're up to two years behind. There needs to be some process to relatively transparently deploy new toolchains to your developers – an "evergreen compiler". We now have this in place for Chromium – on Linux, by using clang. (We still accept patches to keep Chromium buildable with gccs >= 4.8 for people who prefer compiling locally over using precompiled binaries, and we still use gcc as the target compiler for Chrome for Android and Chrome OS.)

The library situation is slightly more tricky: On Linux and Mac OS X, programs are usually linked against the system C++ library. Chrome wants to support Mac OS X 10.6 a bit longer (our users seem to love this OS X release), and the only C++ library this ships with is libstdc++ 4.2 – which doesn't have any C++11 bits. Similarly, Ubuntu Precise only has libstdc++ 4.6. It seems that with C++ updating more often, products will have to either stop supporting older OS versions (even if they still have many users on these old versions), adopt new C++ features very slowly, or ship with a bundled C++ standard library. The latter implies that system libraries shouldn't have a C++ interface for ABI reasons – luckily, this is mostly already the case.

To make things slightly more complicated, gcc and libstdc++ expect to be updated at the same time. gcc 4.8 links to libstdc++ 4.8, so upgrading to gcc 4.8 while still linking to Precise's libstdc++ 4.6 isn't easy. clang explicitly supports building with older libstdc++ versions.

For Chromium, we opted to enable C++11 language features now, and then allow C++11 library features later once we have figured out the story there. This allows us to incrementally adopt C++11 features in Chromium, but it's not without risks: vector<int> v0{42} for example means something different with an old C++ library and a new C++ library that has a vector constructor taking an initializer_list. We disallow using uniform initialization for now because of this.

Since bundling a C++ library seems to become more common with this new C++ update cadence, it would be nice if compiler drivers helped with this. Just statically linking libstdc++ / libc++ isn't enough if you're shipping a product consisting of several executables or shared libraries – they need to dynamically link to a shared C++ library with the right rpaths, the C++ library probably needs mangled symbol names that don't conflict with the system C++ library which might be loaded into the same process due to other system libraries using it internally (for example, maybe using an inline namespace with an application-specific name), etc.

Future directions

As mentioned above, we're trying to figure out the C++ library situation. The tricky cases are Chrome for Android (which currently uses STLport) and Chrome for Mac. We're hoping to switch Chrome for Android to libc++ (while still using gcc as compiler). On Mac, we'll likely bundle libc++ with Chrome too.

We're working on making clang usable for compiling Chrome for Windows. The main motivations for this are using AddressSanitizer, providing a compiler with great diagnostics for developers, and getting our tooling infrastructure working on Windows (used for example for automated large-scale cross-OS refactoring and for building our code search index – try clicking a few class names; at the moment only code built on Linux is hyperlinked). We won't use clang as a production compiler on Windows unless it produces a chrome binary that's competitive with Visual Studio's on both binary size and performance. (From an open-source perspective, it is nice being able to use an open-source compiler to compile an open-source program.)

You can reach us at

by thakis at January 06, 2015 01:01 AM

January 05, 2015


LLVM Weekly - #53, Jan 5th 2015

Welcome to the fifty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

I'm going to be in California next week for the RISC-V workshop. I'm arriving at SFO on Monday 12th and leaving on Sunday the 18th. Do let me know if you want to meet and talk lowRISC/RISC-V or LLVM, and we'll see what we can do.

The canonical home for this issue can be found here at

News and articles from around the web

I was getting ready to break out gitstats for some analysis of the LLVM repo and I find to my delight that Phoronix has saved me the trouble and has shared some stats on activity in the LLVM repo over the past year.

Tom Stellard has made a blog post announcing some recent RadeonSI performance improvements on his LLVM development branch. This includes 60% improvement in one OpenCL benchmark and 10-25% in a range of other OpenCL tests.

Gaëtan Lehmann has written a blog post about getting started with libclang using the Python bindings.

The C++ Filesystem Technical Specification, based on the Boost.Filesystem library has been approved.

On the mailing lists

LLVM commits

  • Instruction selection for bit-permuting operations on PowerPC has been improved. r225056.

  • The scalar replacement of aggregates (SROA) pass has started to learn how to more intelligently handle split loads and stores. As explained in detail in the commit message, the old approach led to complex IR that can be difficult for the optimizer to work with. SROA is now also more aggressive in its splitting of loads. r225061, r225074.

  • InstCombine will now try to transform A-B < 0 into A < B. r225034.

  • The Hexagon (a Qualcomm DSP) backend has seen quite a lot of work recently. Interested parties are best off flicking through the commit log of lib/Target/Hexagon. r225005, r225006, etc.
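The InstCombine item above (rewriting A-B < 0 as A < B) is only safe when the subtraction cannot overflow. The sketch below models signed 32-bit wrapping arithmetic in Python to show one input where the two forms disagree, and that they agree whenever the subtraction stays in range. Tying the rewrite to a no-signed-wrap guarantee on the subtraction is my reading of the transform rather than something stated in the newsletter, and the helper names are made up for this example.

```python
INT_MIN = -(1 << 31)
INT_MAX = (1 << 31) - 1


def wrap32(x):
    """Interpret x as a signed 32-bit two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sub_lt_zero(a, b):
    """Original form: (a - b) < 0 using wrapping 32-bit subtraction."""
    return wrap32(a - b) < 0


def lt(a, b):
    """Rewritten form: a < b."""
    return a < b


def sub_overflows(a, b):
    """True if a - b does not fit in a signed 32-bit integer."""
    return not (INT_MIN <= a - b <= INT_MAX)


# With overflow the two forms disagree: INT_MIN - 1 wraps around to INT_MAX.
print(sub_lt_zero(INT_MIN, 1), lt(INT_MIN, 1))  # False True

# Without overflow they always agree on these boundary samples.
samples = [INT_MIN, INT_MIN + 1, -1, 0, 1, INT_MAX - 1, INT_MAX]
agree = all(
    sub_lt_zero(a, b) == lt(a, b)
    for a in samples
    for b in samples
    if not sub_overflows(a, b)
)
print(agree)  # True
```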

Clang commits

Other project commits

by Alex Bradbury at January 05, 2015 02:30 PM

December 29, 2014


LLVM Weekly - #52, Dec 29th 2014

Welcome to the fifty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

This issue marks the end of one full year of LLVM Weekly. It's a little shorter than usual as the frenetic pace of LLVM/Clang development has slowed over the holiday period. Surprising even to me is that we managed to make it through all 52 weeks with an issue every Monday as promised. This requires a non-trivial amount of time each week (2-3+ hours), but I am intending to keep it going into 2015. I'd like to give a big thank you to everyone who's said hi at a conference, sent in corrections or tips on content, or just sent a random thank you. It's been very helpful for motivation. I don't currently intend to change anything about the structure or content of each issue for next year, but if you have any ideas then please let me know.

I can't make it to 31C3 due to the awkward timing of the event, but do let me know if there are any LLVM/Clang related talks worth sharing. There was a talk about Code Pointer Integrity which has previously been covered in LLVM Weekly and is working towards upstreaming. The video is here. If you're interested in lowRISC and at 31C3, Bunnie is leading a discussion about it at 2pm on Monday (today).

The canonical home for this issue can be found here at

News and articles from around the web

There doesn't seem to have been any LLVM or Clang related news over the past week. Everyone seems to be busy with non-LLVM related activities over the Christmas break. If you're looking for a job though, Codeplay tell me they have two vacancies: one for a debugger engineer and another for a compiler engineer.

On the mailing lists

  • David Li has shared some early info on Google's plans for LTO. He describes the concept of 'peak optimization performance' and some of the objectives of the new design. This includes the ability to handle programs 10x or 100x the size of Firefox. We can expect more information in 2015, maybe as early as January.

  • The discussion on possible approaches to reducing the size of libLLVM has continued. Chris Bieneman has shared some more size stats. These gains come from removing unused intrinsics. Chandler Carruth has followed up with a pleasingly thought-provoking argument on a different approach: target-specific intrinsics shouldn't exist in the LLVM front or middle-end. He describes the obvious issues with this, with the most fiddly probably being instruction selection converting appropriate IR to the right target-specific functionality.

LLVM commits

  • The SROA (scalar replacement of aggregates) pass has seen some refactoring to, in the future, allow for more intelligent rewriting. r224742, r224798.

  • The masked load and store intrinsics have been documented. r224832.

  • CodeGenPrepare learned to speculate calls to llvm.cttz/ctlz (count trailing/leading zeroes) if isCheapToSpeculateCtlz/isCheapToSpeculateCttz in TargetLowering return true. r224899.

Clang commits

  • The Clang internals manual has been extended with stub sections on Parse, Sema, and CodeGen. r224894.

Other project commits

  • The libcxx LIT test-suite has seen a number of new configuration options. Even better, these are now documented. r224728.

by Alex Bradbury at December 29, 2014 10:20 AM

LLVM Weekly - #50, Dec 15th 2014

Welcome to the fiftieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

I'll be at MICRO-47 this week. If you're there do say hi, especially if you want to chat about LLVM or lowRISC/RISC-V.

The canonical home for this issue can be found here at

News and articles from around the web

The videos and slides from the 2014 LLVM dev meeting went online last week. I already linked to them then, but there's enough interesting stuff there I think I can justify linking again.

LLVM/Clang 3.5.1-rc1 has been tagged. Volunteer testers are very welcome.

Clang UPC 3.4.1 has been released. This is a Unified Parallel C compiler that can target SMP systems or Portals4.

On the mailing lists

LLVM commits

  • The LLVM Kaleidoscope tutorial has been extended with an 8th chapter, describing how to add debug information using DWARF and DIBuilder. r223671. A rendered version can be found here.

  • Extensive documentation has been added for the MergeFunctions pass. r223931.

  • A monster commit to split Metadata from the Value class hierarchy has landed. r223802.

  • InstrProf has been born. This involves the llvm.instrprof_increment intrinsic and the -instrprof pass. This moves logic from Clang's CodeGenPGO into LLVM. r223672.

  • With the addition of support for SELECT nodes, the MIPS backend now supports codegen of MIPS-II targets on the LLVM test-suite. Code generation has also been enabled for MIPS-III. r224124, r224128.

  • Work has started on an LLVM-based dsymutil tool, with the aim to replace Darwin's dsymutil (a DWARF linker). r223793.

  • LiveInterval has gained support to track the liveness of subregisters. r223877.

  • Work has started on converting moves to pushes on X86 when appropriate. r223757.

  • Print and verify passes are now added after each MachineFunctionPass by default, rather than on some arbitrarily chosen subset. r224042.

  • LLVM now requires Python 2.7. Previously 2.5 was required. r224129.

Clang commits

  • The __builtin_call_with_static_chain GNU extension has been implemented. r224167.

  • Clang's CodeGenPGO has moved to using the new LLVM -instrprof pass. r223683.

  • Clang now accepts Intel microarchitecture names as the -march argument. r223776.

Other project commits

  • libcxx gained relational operators in std::experimental::optional. r223775.

  • libcxx can now be built as a 32-bit library. r224096.

  • The lldb unwinder has learned to use unwind information from the compact-unwind section for x86-64 and i386 on Darwin. r223625.

by Alex Bradbury at December 29, 2014 10:19 AM

LLVM Weekly - #51, Dec 22nd 2014

Welcome to the fifty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

Last week as part of the lowRISC project I was involved in sharing our plans for tagged memory and 'minion' cores in the initial version. We've almost made it a full year of LLVM Weekly with no interruption of service!

The canonical home for this issue can be found here at

News and articles from around the web

3.5.1-rc2 has been tagged, time to get testing again.

Version 0.15.1 of LDC, the LLVM D Compiler has been released. The most prominent feature is probably the addition of preliminary support for MSVC on Win64.

SN Systems (part of Sony) have written a blog post describing their recently contributed ABI test suite.

Peter Wilmott has benchmarked Ruby across various GCC and Clang releases. The discussion at HN may be of interest.

On the mailing lists

LLVM commits

  • Metadata is now typeless in assembly. r224257.

  • PowerPC instruction selection for bit-permuting operations has been improved. r224318.

  • An optimisation has been added to move sign/zero extends close to loads which causes performance improvements of 2-3% on a few benchmarks on x86. r224351.

  • More overflow arithmetic intrinsics are strength reduced into regular arithmetic operations if possible. r224417.
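
As a rough illustration of the overflow strength reduction mentioned above: when the operands of a checked addition are known to be small, overflow is impossible, so an optimizer may be able to replace the checked operation with a plain add. The sketch below uses the `__builtin_sadd_overflow` builtin available in Clang and GCC. The function names are hypothetical, and whether a given compiler version performs exactly this simplification is an assumption rather than something stated in the commit summary.

```cpp
#include <cstdint>

// Checked addition of two values widened from 16 bits. Assuming int is at
// least 32 bits wide, the sum of two uint16_t values is at most 131070, so
// the overflow flag can never be set here.
bool add_checked(std::uint16_t a, std::uint16_t b, int *result) {
  return __builtin_sadd_overflow(static_cast<int>(a), static_cast<int>(b),
                                 result);
}

// The plain addition that the checked version is equivalent to.
int add_plain(std::uint16_t a, std::uint16_t b) {
  return static_cast<int>(a) + static_cast<int>(b);
}
```

Both functions compute the same sum for every input, which is what makes dropping the overflow check legal.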

Clang commits

  • Codegen for 'omp for' has started to be committed. r224233.

  • -save-temps will now emit unoptimized bitcode files. r224688.

Other project commits

  • The libcxx test suite can be run with ccache now. r224603.

  • Breakpoints can now be tagged with a name in lldb. r224392.

by Alex Bradbury ( at December 29, 2014 10:19 AM

December 17, 2014

OpenMP Runtime Project

New code release

We are excited to announce the next release of the Intel® OpenMP* Runtime Library at This release aligns with Intel® Parallel Studio XE 2015 Composer Edition Update 2.

New Features

by Johnny Peyton at December 17, 2014 07:47 PM

December 15, 2014


LLVM Weekly - #49, Dec 8th 2014

Welcome to the forty-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Most of the 2014 LLVM Developers' Meeting videos and slides are now online. Sadly, there are no videos from the talks by Apple employees yet. Hopefully they'll be appearing later.

QuarksLab has a rather nice write-up of deobfuscating an OLLVM-protected program.

The LLVM-based ELLCC has been making progress on ELK, a bare-metal POSIX-like environment.

Support for statepoints landed in LLVM this week, and Philip Reames has a blog post detailing some notes and caveats. See also the mailing list discussion linked to below about future plans for GC in LLVM.

On the mailing lists

LLVM commits

  • The statepoint infrastructure for garbage collection has landed. See the final patch in the series for documentation. r223078, r223085, r223137, r223143.

  • The LLVM assembler gained support for ARM's funky modified-immediate assembly syntax. r223113.

  • The OCaml bindings now have a CMake build system. r223071.

  • The PowerPC backend gained support for readcyclecounter on PPC32. r223161.

  • Support for 'prologue' metadata on functions has been added. This can be used for inserting arbitrary code at a function entrypoint. This was previously known as prefix data, and that term has been recycled to be used for inserting data just before the function entrypoint. r223189.

  • PowerPC gained a Power8 instruction schedule definition. r223257.

Clang commits

  • LLVM IR for vtable addresses now uses the type of the field being pointed to, to enable more optimisations. r223267.

  • New attributes have been added to specify AMDGPU register limits. This is a performance hint that can be used to attempt to limit the number of used registers. r223384.

  • Clang gained the __has_declspec_attribute preprocessor macro. r223467.

  • __has_attribute now only looks for GNU-style attributes. You should be able to use __has_cpp_attribute or __has_declspec_attribute instead. r223468.
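
The three feature-test macros mentioned above can be combined to pick an attribute spelling per compiler. This is a minimal sketch: MY_NOINLINE and add_one are hypothetical names chosen for illustration, and the fallback definitions are an assumption so that the checks also compile on compilers lacking one of the macros.

```cpp
// Fallbacks so the checks below also compile on compilers that lack these
// feature-test macros.
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#ifndef __has_cpp_attribute
#define __has_cpp_attribute(x) 0
#endif
#ifndef __has_declspec_attribute
#define __has_declspec_attribute(x) 0
#endif

// MY_NOINLINE is a hypothetical project macro, not part of Clang.
#if __has_cpp_attribute(gnu::noinline)
#define MY_NOINLINE [[gnu::noinline]]          // C++11 attribute spelling
#elif __has_attribute(noinline)
#define MY_NOINLINE __attribute__((noinline))  // GNU-style spelling
#elif __has_declspec_attribute(noinline)
#define MY_NOINLINE __declspec(noinline)       // Microsoft-style spelling
#else
#define MY_NOINLINE                            // no known spelling: no-op
#endif

MY_NOINLINE int add_one(int x) { return x + 1; }
```

Each branch asks about exactly one attribute syntax, which matches the narrower meaning of __has_attribute described in the last item.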

Other project commits

  • DataFlowSanitizer is now supported for MIPS64. r223517.

  • libcxx now supports std::random_device on (P)NaCl. r223068.

  • An effort has started in lld to reduce abstraction around InputGraph, which has been found to get in the way of new features due to excessive information hiding. r223330. The commit has been temporarily reverted due to breakage on Darwin and ELF.

  • A large chunk of necessary code for Clang module support has been added to LLDB. r223433.

  • LLDB now has documented coding conventions. r223543.

by Alex Bradbury ( at December 15, 2014 11:27 AM

December 01, 2014


LLVM Weekly - #48, Dec 1st 2014

Welcome to the forty-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

John Regehr has posted an update on the Souper superoptimizer which he and his collaborators have been working on. They have implemented a reducer for Souper optimizations that tries to reduce each optimization to something more minimal. Their current results give ~4000 distinct optimisations, of which LLVM doesn't know how to do ~1500. Of course many of these may in fact be covered by a single rule or pass. One of the next steps is to extend Souper to support the synthesis of instruction sequences. See also the discussion on the llvm mailing list.

The LLVM Blog features a summary of recent advances in loop vectorization for LLVM. This includes diagnostic remarks to get feedback on why loops which aren't vectorized are skipped, the loop pragma directive in Clang, and performance warnings when the directive can't be followed.

The LLVM Haskell Compiler (LHC) has been newly reborn along with its blog. The next steps in development are to provide better support for Haskell2010, give reusable libraries for name resolution and type checking, and to produce human-readable compiler output.

The next LLVM Social in Paris will take place on December 9th.

Intel have published a blog post detailing new X86-specific optimisations in GCC 5.0. You may also be interested in the discussion of this post on Hacker News.

On the mailing lists

LLVM commits

  • Support for -debug-ir (emitting the LLVM IR in debug data) was removed. There's no real justification or explanation in the commit message, but it's likely it was unfinished/unused/non-functional. r222945.

  • InstCombine will now canonicalize toward the value type being stored rather than the pointer type. The rationale (explained in more detail in the commit message) is that memory does not have a type, but operations and the values they produce do. r222748.

  • The documentation for !invariant.load metadata has been clarified. r222700.

  • In tablegen, neverHasSideEffects=1 is now hasSideEffects=0. r222801.

Clang commits

  • Four new ASTMatchers have been added: typedefDecl, isInMainFile, isInSystemFile, and isInFileMatchingName. r222646.

  • The documentation on MSVC compatibility has been updated to represent the current state of affairs. Clang has also gained support for rethrowing MS C++ exceptions. r222731, r222733.

Other project commits

  • Initial tests have been added for lldb-mi (the LLDB machine interface). r222750.

  • libcxxabi can now be built and tested without threads using CMake. r222702.

  • The compact-unwind-dumper tool now has complete support for x86-64 and i386 binaries. r222951.

by Alex Bradbury ( at December 01, 2014 02:57 PM

November 25, 2014


Loop Vectorization: Diagnostics and Control

Loop vectorization was first introduced in LLVM 3.2 and turned on by default in LLVM 3.3. It has been discussed previously on this blog in 2012 and 2013, as well as at FOSDEM 2014, and at Apple's WWDC 2013. The LLVM loop vectorizer combines multiple iterations of a loop to improve performance. Modern processors can exploit the independence of the interleaved instructions using advanced hardware features, such as multiple execution units and out-of-order execution, to improve performance.

Unfortunately, when loop vectorization is not possible or profitable the loop is silently skipped. This is a problem for many applications that rely on the performance vectorization provides. Recent updates to LLVM provide command line arguments to help diagnose vectorization issues and a new pragma syntax for tuning loop vectorization, interleaving, and unrolling.

New Feature: Diagnostics Remarks

Diagnostic remarks provide the user with an insight into the behavior of LLVM’s optimization passes including unrolling, interleaving, and vectorization. They are enabled using the -Rpass command line arguments. Interleaving and vectorization diagnostic remarks are produced by specifying the ‘loop-vectorize’ pass. For example, specifying ‘-Rpass=loop-vectorize’ tells us the following loop was vectorized by 4 and interleaved by 2.

void test1(int *List, int Length) {
  int i = 0;
  while(i < Length) {
    List[i] = i*2;
    i++;
  }
}

clang -O3 -Rpass=loop-vectorize -S test1.c -o /dev/null

test1.c:4:5: remark: 
vectorized loop (vectorization factor: 4, unrolling interleave factor: 2)
    while(i < Length) {

Many loops cannot be vectorized including loops with complicated control flow, unvectorizable types, and unvectorizable calls. For example, to prove it is safe to vectorize the following loop we must prove that array ‘A’ is not an alias of array ‘B’. However, the bounds of array ‘A’ cannot be identified.

void test2(int *A, int *B, int Length) {
  for (int i = 0; i < Length; i++)
    A[B[i]]++;
}

clang -O3 -Rpass-analysis=loop-vectorize -S test2.c -o /dev/null

test2.c:3:5: remark:
loop not vectorized: cannot identify array bounds
    for (int i = 0; i < Length; i++)

Control flow and other unvectorizable statements are reported by the '-Rpass-analysis' command line argument. For example, many uses of ‘break’ and ‘switch’ are not vectorizable.

C/C++ code, each followed by its -Rpass-analysis=loop-vectorize remark:
for (int i = 0; i < Length; i++) {
  if (A[i] > 10.0)
    break;
  A[i] = 0;
}

control_flow.cpp:5:9: remark: loop not vectorized: loop control flow is not understood by vectorizer
    if (A[i] > 10.0)

for (int i = 0; i < Length; i++) {
  switch(A[i]) {
  case 0: B[i] = 1; break;
  case 1: B[i] = 2; break;
  default: B[i] = 3;
  }
}

no_switch.cpp:4:5: remark: loop not vectorized: loop contains a switch statement
    switch(A[i]) {


New Feature: Loop Pragma Directive

Explicit control over the behavior of vectorization, interleaving and unrolling is necessary to fine tune the performance. For example, when compiling for size (-Os) it's a good idea to vectorize the hot loops of the application to improve performance. Vectorization, interleaving, and unrolling can be explicitly specified using the #pragma clang loop directive prior to any for, while, do-while, or C++11 range-based for loop. For example, the vectorization width and interleaving count are explicitly specified for the following loop using the loop pragma directive.

void test3(float *Vx, float *Vy, float *Ux, float *Uy, float *P, int Length) {
#pragma clang loop vectorize_width(4) interleave_count(4)
#pragma clang loop unroll(disable)
  for (int i = 0; i < Length; i++) {
    float A = Vx[i] * Ux[i];
    float B = A + Vy[i] * Uy[i];
    P[i] = B;
  }
}

clang -O3 -Rpass=loop-vectorize -S test3.c -o /dev/null

test3.c:5:5: remark:
vectorized loop (vectorization factor: 4, unrolling interleave factor: 4)
    for (int i = 0; i < Length; i++) {
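
Since #pragma clang loop is specific to Clang, code that is also built with other compilers may want to guard the directive. The following is a minimal sketch of that idea, assuming the same arithmetic as test3; the function name weighted_sum is hypothetical, and the `__clang__` guard is a choice made here, not something the post prescribes.

```cpp
// Computes P[i] = Vx[i] * Ux[i] + Vy[i] * Uy[i], the same arithmetic as
// test3. The loop hints are only emitted when compiling with Clang; other
// compilers see an ordinary loop.
void weighted_sum(const float *Vx, const float *Vy, const float *Ux,
                  const float *Uy, float *P, int Length) {
#if defined(__clang__)
#pragma clang loop vectorize_width(4) interleave_count(4)
#endif
  for (int i = 0; i < Length; i++) {
    P[i] = Vx[i] * Ux[i] + Vy[i] * Uy[i];
  }
}
```

The directive is a tuning hint only, so the function is expected to return the same values with or without it.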

Integer Constant Expressions

The options vectorize_width, interleave_count, and unroll_count take an integer constant expression, so the value can be computed at compile time, as in the example below.

template <int ArchWidth, int ExecutionUnits>
void test4(float *Vx, float *Vy, float *Ux, float *Uy, float *P, int Length) {
#pragma clang loop vectorize_width(ArchWidth)
#pragma clang loop interleave_count(ExecutionUnits * 4)
  for (int i = 0; i < Length; i++) {
    float A = Vx[i] * Ux[i];
    float B = A + Vy[i] * Uy[i];
    P[i] = B;
  }
}

void compute_test4(float *Vx, float *Vy, float *Ux, float *Uy, float *P, int Length) {
  const int arch_width = 4;
  const int exec_units = 2;
  test4<arch_width, exec_units>(Vx, Vy, Ux, Uy, P, Length);
}

clang -O3 -Rpass=loop-vectorize -S test4.cpp -o /dev/null

test4.cpp:6:5: remark:
vectorized loop (vectorization factor: 4, unrolling interleave factor: 8)
    for (int i = 0; i < Length; i++) {

Performance Warnings

Sometimes the loop transformation is not safe to perform. For example, vectorization fails due to the use of complex control flow. If vectorization is explicitly specified, a warning message is produced to alert the programmer that the directive cannot be followed. For example, the following function, which returns the index of the last positive value in the list, cannot be vectorized because the ‘last_positive_index’ variable is used outside the loop.

int test5(int *List, int Length) {
  int last_positive_index = 0;
  #pragma clang loop vectorize(enable)
  for (int i = 1; i < Length; i++) {
    if (List[i] > 0) {
      last_positive_index = i;
    }
    List[i] = 0;
  }
  return last_positive_index;
}

clang -O3 -g -S test5.c -o /dev/null

test5.c:5:9: warning:
loop not vectorized: failed explicitly specified loop vectorization
    for (int i = 1; i < Length; i++) {

The debug option ‘-g’ allows the source line to be provided with the warning.


Diagnostic remarks and the loop pragma directive are two new features that are useful for feedback-directed performance tuning. Special thanks to all of the people who contributed to the development of these features. Future work includes adding diagnostic remarks to the SLP vectorizer and an additional option for the loop pragma directive to declare the memory operations as safe to vectorize. Additional ideas for improvements are welcome.

by Tyler Nowicki ( at November 25, 2014 01:34 AM

November 24, 2014


LLVM Weekly - #47, Nov 24th 2014

Welcome to the forty-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Version 3.0 of the Capstone disassembly framework has been released. Python bindings have been updated to support Python 3, and this release also adds support for Sparc, SystemZ and XCore. It also has performance improvements.

Herb Sutter has penned a trip report of the recent ISO C++ meeting.

Emscripten has updated to use LLVM 3.4 from the PNaCl team. There's more work to be done to rebase on top of 3.5.

Woboq has written a blog post detailing C++14 features of interest to Qt programmers, though I suspect the article has a wider potential audience than that. Recent Clang of course has good support for the new C++14 features.

There is going to be an LLVM Devroom at FOSDEM 2015, and the submission deadline for presentations/talks/tutorials is on Dec 1st.

Apple's LLVM Source Tools and Program Analysis teams are looking for interns for Summer 2015.

On the mailing lists

  • If you're wondering how the process of adding OpenMP support to Clang is going, the answer is that it's still ongoing and there's hope it will be done by the 3.6 release, depending on the speed of code reviews.

  • Siva Chandra kicked off a discussion on the mailing list about how to better manage breakages caused by LLVM or Clang API changes. Siva suggests LLDB should be developed against a known-good version of LLVM/Clang that gets periodically bumped. Vince Harron says that he is looking to add a continuous build on curated versions of Clang/LLVM in addition to a continuous build on top of tree for everything. This should help improve the signal to noise ratio and make it easier for LLDB developers to tell when a breaking change is due to their addition or a change elsewhere. Reid Kleckner suggests lldb should be treated as part of the same project as Clang/LLVM and more pressure should be put on developers to fix breakages, presumably in the same way that API changes in LLVM almost always come with an associated patch to fix Clang.

  • Peter Collingbourne has proposed adding the llgo frontend to the LLVM project. Chris Lattner is in favour of this, but would like to see the GPLv3+runtime exception dependencies rewritten before being checked in. Some people in the thread expressed concern that the existing base of LLVM/Clang reviewers know C++ and may not be able to review patches in Go, though it looks like a non-zero number of existing LLVM reviewers are appropriately multilingual.

  • Brett Simmers is working on HHVM and is interested in whether there are ways to control where a BasicBlock ends up in memory, with the motivation of making the best use of the instruction cache by keeping frequently executed pieces of code closer together. There's general agreement this would be a great feature to have, but it doesn't sound like this is easily supported in LLVM right now.

LLVM commits

  • A small doc fix has the honour of being commit 222222.

  • A nice little optimisation has been committed which replaces a switch table with a mul and add if there is a linear mapping between index and output. r222121.

  • The SeparateConstOffsetFromGEP, EarlyCSE, and LICM passes have been enabled on AArch64. This has measurable gains for some SPEC benchmarks. r222331.

  • The description of the noalias attribute has been clarified. r222497.

  • MDNode is being split into two classes, GenericMDNode and MDNodeFwdDecl. r222205.

  • The LLVM CMake-based build system learned to support LLVM_USE_SANITIZER=Thread. r222258.

  • The R600 backend gained the SIFoldOperands pass which attempts to fold source operands of mov and copy instructions into their uses. r222581.
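
To make the switch-table optimisation from r222121 concrete, here is a hand-written sketch of the equivalence it relies on. The function names and the specific values are made up for illustration; the actual transformation happens on LLVM IR, not on source code.

```cpp
// A switch whose results are a linear function of the index:
// 0 -> 10, 1 -> 13, 2 -> 16, 3 -> 19 (step 3, offset 10).
int table_switch(int index) {
  switch (index) {
  case 0: return 10;
  case 1: return 13;
  case 2: return 16;
  case 3: return 19;
  default: return -1;
  }
}

// Hand-written equivalent: a range check plus one multiply and one add,
// with no lookup table.
int table_linear(int index) {
  if (index < 0 || index > 3)
    return -1;
  return index * 3 + 10;
}
```

The two functions agree for every input, so a lookup table holding 10, 13, 16, 19 carries no information beyond the step and the offset.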

Clang commits

  • Clang now distinguishes between -fpic and -fPIC. r222227.

  • The -Wuninitialized warning will now trigger when accessing an uninitialized base class in a constructor. r222503.

Other project commits

  • LLDB can now perform basic debugging operations on Windows. r222474.

  • LLDB's line editing support has been completely rewritten. r222163.

  • MemorySanitizer gained support for MIPS64. r222388.

  • A sample tool was added to lldb to extract and dump unwind information from Darwin's compact unwind section. r222127.

by Alex Bradbury ( at November 24, 2014 02:00 PM

November 18, 2014


LLVM Weekly - #46, Nov 17th 2014

Welcome to the forty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Chrome on Linux now uses Clang for production builds. Clang has of course been used on OS X Chrome for quite some time. The switch saw a reduction in binary size of ~8%, but this was vs GCC 4.6 rather than something more up-to-date.

The LLVM in HPC workshop at SC14 is taking place on Monday and the full agenda with abstracts is available online.

On the mailing lists

LLVM commits

  • Work on call lowering for MIPS FastISel has started. r221948.

  • Work has started on an assembler for the R600 backend. r221994.

  • A pass implementing forward control-flow integrity has been added. r221708.

  • A whole slew of patches that made MDNode a Value have been reverted due to a change in plan. The aim is now to separate metadata from the Value hierarchy. r221711.

  • There are two ways to inform the optimizer the result of a load is never null. Either with metadata or via assume. The latter is now canonicalized into the former. r221737.

  • vec_vsx_ld and vec_vsx_st intrinsics have been added for PowerPC. r221767.

  • PowerPC gained support for small-model PIC. r221791.

  • The intrinsic was added to make it easier to write tests for ARM ConstantIslands. r221903.

Clang commits

  • The constant trickle of OpenMP patches continues. Codegen for threadprivate variables has been added. r221663.

  • Support for __has_cpp_attribute is now present. r221991.

Other project commits

  • Breakpoint stop/resume has been implemented on Windows for LLDB. r221642.

  • The libcxx status page has been updated with the current state of C++1z support. r221601.

by Alex Bradbury ( at November 18, 2014 01:43 PM

November 10, 2014


LLVM Weekly - #45, Nov 10th 2014

Welcome to the forty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Adrian Sampson has posted a status update on his Quala project to add custom type annotations to C and C++ in Clang/LLVM.

Bruce Mitchener has posted to the Dylan blog describing how Dylan integrates with LLVM. Interestingly, Dylan doesn't link with the LLVM libraries and instead generates bitcode files directly.

The Numba project has released llvmlite, lightweight python bindings to LLVM for writing JIT compilers. This was developed based on experience using the old llvmpy bindings.

Obfuscator-LLVM has been updated to work with LLVM 3.5.

On the mailing lists

LLVM commits

  • The PBQP register allocator has had its spill costs and coalescing benefits tweaked. This apparently results in a few percent improvement on benchmarks such as EEMBC and SPEC. r221292, r221293.

  • The new SymbolRewriter pass is an IR to IR transformation allowing adjustment of symbols during compilation. It is intended to be used for symbol interpositioning in sanitizers and performance analysis tools. r221548.

  • Hexagon gained a basic ELF object emitter. r221465.

  • llvm-vtabledump gained support for the Itanium ABI. r221133.

  • LLVM's CMake build system gained the LLVM_BUILD_STATIC option. r221345.

  • The usage of Inputs/ for extra test files has been documented. r221406.

  • The MIPS backend has reached a milestone in support for the N32/N64 ABI. This commit fixes all known bugs for this ABI and the first 10000 tests generated by pass. r221534.

Clang commits

  • clang-format gained various improvements for formatting Java code. r221104, r221109, and others.

  • Support was added for C++1z nested namespace definitions, u8 character literals, and attributes on namespaces or enumerators. r221574, r221576, r221580.

Other project commits

  • LLD learned how to parse most linker scripts. Before getting too excited, do note this is parsing only; semantic actions will come in the future. r221126.

  • The common Sanitizer code gained a generic stack frame renderer. This allows the user to control the format of stack frame output. r221409, r221469.

  • The basic framework for live debugging on Windows was added to LLDB. It will detect changes such as DLL loads and unloads etc, but these need to be propagated through LLDB properly. r221207.

  • lldb-gdbserver now supports the Android target. r221570.

by Alex Bradbury ( at November 10, 2014 01:11 PM

November 03, 2014


LLVM Weekly - #44, Nov 3rd 2014

Welcome to the forty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

The 2014 LLVM Dev meeting was held last week. I couldn't make it, but it seems like there was a great selection of talks. Sadly the keynote about Swift's high-level IR was cancelled. No word yet on when we can expect slides and videos online. However, slides by Philip Reames and Sanjoy Das from their talk on implementing fully relocating garbage collection in LLVM are online.

Peter Zotov has been doing lots of work on the LLVM OCaml bindings recently, and is looking for additional help. Recently, he's closed almost all open bugs for the bindings, migrated them to ocamlfind, fixed Llvm_executionengine, and ensured pretty much the whole LLVM-C API is exposed. Tasks on the todo list include writing tests in OUnit2 format, migrating the Kaleidoscope tutorial off camlp4, and splitting up and adding OCaml bindings to this patch. More ambitiously, it would be interesting to write LLVM passes in OCaml and to represent LLVM IR as pure AST. If any of this interests you, do get in touch with Peter. He's able to review any patches, but could do with help on working through this list of new features.

The LLVM Bay Area monthly social is going to be held on 6th November.

On the mailing lists

  • Reid Kleckner has proposed dropping support for running LLVM on Windows XP. This would allow the use of system APIs only available in Vista and above. Thus far all responses have been positive, with one even suggesting raising the minimum to Windows 7.

  • Tom Stellard suggests deprecating the autoconf build system. Right now there is both an autotools based system and a CMake system, though CMake seems most used by developers for LLVM at least. Bob Wilson points out that the effort required to keep the existing makefiles working is much less than what might be needed to update the CMake build to support all use cases. However, other replies make it seem that the CMake build supports pretty much all configurations people use now. If there are people who actually enjoy fiddling with build systems (far-fetched, I know), it seems like a little effort could go a long way and allow the makefile system to be jettisoned.

  • Betul Buyukkurt has posted an RFC on indirect call target profiling. The goal is to use the collected data for optimisation. Kostya Serebryany described how it can be used to provide feedback to fuzzers and detailed properties that would be useful for this usecase.

  • Chris Matthews announces that a new Jenkins-based OSX build cluster is up and running. This includes multiple build profiles and an O3 LTO performance tracker. The Jenkins config should be committed to zorg soon.

LLVM commits

  • Support for writing sampling profiles has been committed. In the future, support to read (and maybe write) profiles in GCC's gcov format will be added, and llvm-profdata will get support to manipulate sampling profiles. r220915.

  • A comment has been added to X86AsmInstrumentation to describe how asm instrumentation works. r220670.

  • The Microsoft vectorcall calling convention has been implemented for x86 and x86-64. r220745.

  • The C (and OCaml) APIs gained functions to query and modify branches, and to obtain the values for floating point constants. There have been a whole bunch of additional commits related to the OCaml bindings, too many to pick out anything representative. r220814, r220815, r220817, r220818.

  • The loop and SLP (superword level parallelism) vectorizers are now enabled in the Gold plugin. r220886, r220887.

Clang commits

  • A refactoring of libTooling to reduce required dependencies means that clang-format's binary is now roughly half the size. r220867.

Other project commits

  • lldb has started to adopt the StringPrinter API. r220894.

  • Initial support for PowerPC/PowerPC64 on FreeBSD has been added to LLDB. r220944.

by Alex Bradbury ( at November 03, 2014 11:29 AM

October 27, 2014


LLVM Weekly - #43, Oct 27th 2014

Welcome to the forty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

This week it's the LLVM Developers' Meeting in San Jose. Check out the schedule. Unfortunately I won't be there, so I'm looking forward to the slides and videos going online.

The canonical home for this issue can be found here at

News and articles from around the web

Philip Reames has written up a detailed discussion of statepoints vs gcroot for representing call safepoints. The aim is to clearly explain how the safepoint functionality provided by the patches currently up for review differs from the current gc.root support.

The Haskell community have put together a proposal for an improved LLVM backend to GHC. They intend to ship GHC with its own local LLVM build.

CoderGears have published a blog post about using Clang to get better warnings in Visual C++ projects.

There is going to be a dedicated LLVM devroom at FOSDEM 2015. Here is the call for speakers and participation.

On the mailing lists

LLVM commits

  • The nonnull metadata has been introduced for Load instructions. r220240.

  • minnum and maxnum intrinsics have been added. r220341, r220342.

  • The Hexagon backend gained a basic disassembler. r220393.

  • PassConfig gained usingDefaultRegAlloc to tell if the default register allocator is being used. r220321.

  • An llvm-go tool has been added. It is intended to be used to build components such as the Go frontend in-tree. r220462.
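
As a small illustration of the minnum and maxnum items above: to the best of my knowledge the LLVM language reference describes these intrinsics as following the semantics of libm's fmin and fmax, so the NaN handling can be previewed from C++. The helper names below are hypothetical, and the mapping from std::fmin to llvm.minnum is stated here as an assumption.

```cpp
#include <cmath>

// Returns the smaller of two doubles using libm's fmin, which ignores a
// single quiet NaN operand instead of propagating it.
double min_ignoring_nan(double a, double b) { return std::fmin(a, b); }

// The naive comparison form. Any ordered comparison involving NaN is false,
// so the result depends on operand order when one operand is NaN.
double min_by_compare(double a, double b) { return a < b ? a : b; }
```

This difference in NaN handling is why a dedicated intrinsic is useful: a compare-and-select cannot be treated as fmin unless NaN operands are ruled out.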

Clang commits

  • C compilation now defaults to C11, matching the behaviour of GCC 5.0. r220244.

  • Clang should now be better at finding Visual Studio in non-standard setups. r220226.

  • The Windows toolchain is now known as MSVCToolChain, to allow the addition of a CrossWindowsToolChain which will use clang/libc++/lld. r220362, r220546.

Other project commits

  • libcxxabi gained support for running its tests with sanitizers. r220464.

by Alex Bradbury ( at October 27, 2014 11:02 AM

October 21, 2014

Philip Reames

Statepoints vs gcroot for representing call safepoints

In a recent discussion on LLVM commits w.r.t. the statepoint changes which are up for review, I managed to get myself confused and made a couple of inaccurate statements regarding the existing capabilities of gcroots vs the newly proposed statepoints.  This post is a (hopefully correct) summary of the similarities and differences.

For the purposes of this post, I am only talking about the semantics of the collector at a source language level call site.  The issues highlighted with gc root and safepoint poll sites in my previous post still stand, but I didn’t do a very good job (in retrospect) of distinguishing between safepoints at call sites, and additional checks + runtime calls inserted to ensure that running code checks for a safepoint request at some interval.  The points in that post apply to the latter; this one talks about the former.

From a functional correctness standpoint, gc.root and statepoint are equivalent.  They can both support relocating collectors, including those which relocate roots.  To prevent future confusion, let me review how each works.

gc.root uses explicit spill slots in the IR in the form of allocas.  Each alloca escapes (through the gcroot call itself); as a result, the compiler must assume that any readwrite call can both consume and update the values in question.  Additionally, the fact that all calls are readwrite prevents reordering of unrelated loads past the call.  gcroot relies on the fact that no SSA value relocated at a call site is used at a site reachable from the call.  Instead, a new SSA value (whose relation to the original is unknown by the compiler) is introduced by loading from the (potentially clobbered) alloca.  gcroot creates a single stack map table for the entire function.  It is the compiled code’s responsibility to ensure that all values in the allocas are either valid live pointers or null.
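
The spill-slot discipline described above can be modelled with a toy collector. This is only a sketch of the idea: every name in it is hypothetical, and it does not use or mirror LLVM's actual gcroot machinery.

```cpp
#include <cstddef>
#include <vector>

// Toy relocating collector that tracks the addresses of spill slots.
struct ToyCollector {
  std::vector<int **> roots;  // addresses of registered spill slots

  void register_root(int **slot) { roots.push_back(slot); }

  // Simulates a relocating collection at a safepoint: every rooted object is
  // copied into new_storage and its slot is overwritten with the new address.
  // Null slots are legal and are skipped.
  void collect(int *new_storage) {
    for (std::size_t i = 0; i < roots.size(); ++i) {
      if (*roots[i] == nullptr)
        continue;
      new_storage[i] = **roots[i];
      *roots[i] = &new_storage[i];
    }
  }
};

// Models compiled code around one call safepoint.
int read_across_safepoint(ToyCollector &gc, int *object, int *new_storage) {
  int *slot = object;       // spill the pointer to a stack slot (the alloca)
  gc.register_root(&slot);  // the slot's address escapes to the collector
  gc.collect(new_storage);  // the call may rewrite the slot behind our back
  int *reloaded = slot;     // reload: a fresh value, unrelated to `object`
  gc.roots.pop_back();      // the slot dies with this frame
  return *reloaded;
}
```

After the call, only `reloaded` is used. The original `object` pointer is never touched again, which is the "no SSA value used on both sides of the call" rule in miniature.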

Statepoints use most of the same techniques.  We rely on not having an SSA value used on both sides of a call, but we manage the relocation via explicit IR relocation operations, not loads and stores.  We require the call to be read/write to prevent reordering of unrelated loads.  Since the spill slots are not visible in the IR, we do not need the reasoning about escapes that gc.root does.

To explicitly state this again since I screwed this up once before, both statepoints and gc.roots can correctly represent relocation semantics in the IR.  In fact, the underlying reasoning about their correctness is rather similar.

They do differ fairly substantially in the details though.  Let’s consider a few examples.

SSA vs Memory – gcroot encodes relocations as memory operations (stores, clobbering calls, loads) where statepoint uses first class SSA values.  We believe this makes optimizations more straightforward.

Consider a simple optimization for null pointer relocation.  If the optimizer manages to establish that one of the values being relocated is null, propagating this across a statepoint is straightforward.  (For each gc.relocate, if source is null, replaceAllUsesWith null.)  Implementing this same optimization for gc.root is harder since the store and load may have been reordered from immediately around the call.  This isn’t an unsolvable problem by any means, but it would be a GVN change, not an InstCombine one.  In practice, we believe InstCombine style optimizations to be advantageous since they’re simpler to write and debug.  Arguably, they’re also more powerful given the current pipeline since they have multiple opportunities to trigger.
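The null-relocation fold described above can be sketched on a toy IR. This is only an illustration in Python, not LLVM's API: the `Inst` class, the operand layout (statepoint token first, relocated source last) and the helper names are all assumptions made up for the example.

```python
from dataclasses import dataclass, field

NULL = "null"  # hypothetical spelling of the null constant in this toy IR


@dataclass
class Inst:
    name: str                      # SSA name defined by the instruction ("" if none)
    op: str                        # "statepoint", "gc.relocate", "use", ...
    operands: list = field(default_factory=list)


def replace_all_uses_with(insts, old, new):
    """Rewrite every operand that names `old` so that it names `new`."""
    for inst in insts:
        inst.operands = [new if o == old else o for o in inst.operands]


def fold_null_relocates(insts):
    """Remove each gc.relocate whose source is null; its users see null instead."""
    kept = []
    for inst in insts:
        if inst.op == "gc.relocate" and inst.operands[-1] == NULL:
            replace_all_uses_with(insts, inst.name, NULL)
        else:
            kept.append(inst)
    return kept


program = [
    Inst("tok", "statepoint", ["%obj", NULL]),
    Inst("%obj.rel", "gc.relocate", ["tok", "%obj"]),
    Inst("%p.rel", "gc.relocate", ["tok", NULL]),
    Inst("", "use", ["%obj.rel", "%p.rel"]),
]
folded = fold_null_relocates(program)
```

The fold only has to look at one instruction at a time, which is what makes it an InstCombine-style rewrite rather than one that reasons about memory.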

Derived Pointers – gcroot can represent derived pointers, but only via convention.  There is no convention specified, so it’s up to the frontend to create its own.  Statepoints define a convention (explicitly in the relocation operation) which makes describing optimizations straightforward.

One thing we plan to do with the statepoint representation is to implement an “easily derived pointer” optimization (to run near CodeGenPrep).  On X86, it’s far cheaper to recreate a GEP base + 5 derived pointer than relocate it.  Recognizing this case is quite straightforward given the statepoint representation.
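As a rough model of why the "easily derived pointer" case is attractive, the sketch below compares relocating a derived pointer through the collector with simply recomputing it from the relocated base. It is plain Python arithmetic on made-up addresses; the function names and the `moves` table are hypothetical and stand in for whatever the collector actually does.

```python
def relocate(ptr, moves):
    """Toy collector: map an old base address to its new address."""
    return moves.get(ptr, ptr)


def relocate_derived(base, derived, moves):
    """Relocate a derived pointer by preserving its offset from the base."""
    return relocate(base, moves) + (derived - base)


def rematerialize(new_base, offset):
    """Recompute the derived pointer after the call instead of relocating it."""
    return new_base + offset


moves = {0x1000: 0x2000}          # the object at 0x1000 moved to 0x2000
base = 0x1000
derived = base + 5                # the "GEP base + 5" case

new_base = relocate(base, moves)
via_collector = relocate_derived(base, derived, moves)
via_recompute = rematerialize(new_base, 5)
```

Both routes give the same address, but the second needs only the base to be tracked across the call.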

A frontend could implement a similar optimization for gcroot at IR generation time.  You could also implement such an optimization over the load/call/store representation, but the implementation would be much more complex (analogous to the null optimization above).

To be fair, gc.root may need such an optimization less.  Since call-safepoints are inserted early, CSE has not yet run.  As a result, there may be fewer “easily derived pointers” live across a call.

Format – Statepoints use a standard format.  gc.root supports custom formats.  Either could be extended to support the other without much difficulty.

The more material difference between the two is that gc.root generates a single stack map for the entire function while statepoints generate a unique stack map per call site.  Having a single stack map imposes a slight penalty on code compiled with gc.root since dead values must explicitly be removed from the alloca (by a write of null).  In the wrong situation (say a tight loop with two calls), this could be material.
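To make the difference concrete, here is a small Python sketch, assuming a function with two call sites and a hypothetical set of roots live at each. It is a model of the bookkeeping only, not of either implementation's table format.

```python
def per_call_site_maps(live_at_call):
    """Statepoint style: one table per call site, listing only the live roots."""
    return {site: sorted(live) for site, live in live_at_call.items()}


def single_function_map(live_at_call):
    """gc.root style: one table naming every root slot used in the function."""
    slots = set()
    for live in live_at_call.values():
        slots |= live
    return sorted(slots)


def slots_that_must_be_null(live_at_call):
    """Slots in the shared table that are dead at a call and so must hold null."""
    all_slots = set(single_function_map(live_at_call))
    return {site: sorted(all_slots - live) for site, live in live_at_call.items()}


# Hypothetical liveness: "a" dies before call2, "c" is not yet live at call1.
live = {"call1": {"a", "b"}, "call2": {"b", "c"}}

print(per_call_site_maps(live))
print(single_function_map(live))
print(slots_that_must_be_null(live))
```

In a loop containing both calls, the slot for `a` would need a null store on every iteration before `call2`, which is the penalty described above.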

Lowering - Currently, both gc.root and statepoint lower to stack slots.  gc.root does this at the IR level, statepoints do so in SelectionDAG.

The design of statepoints is intended to allow pushing the explicit relocations back through the backend.  The reason this is desirable is that pointers can be left in callee saved registers over call sites.  Without substantial re-engineering, such a thing is not possible for gc.root.  The importance of this from a performance perspective is debatable.  It is my belief that the key benefit would be in a) reducing frame sizes (by not requiring spill slots), and b) avoiding spills around calls.

An advantage of gc.root is that the backend can remain largely ignorant of the gc.root mechanism.  By the point the backend encounters them, a gc.root is just another alloca.  One potential problem with the current implementation is that the escape is lost when lowering; the gcroot call is lowered to an entry into a side table and the alloca no longer escapes.  This is a source of possible bugs, but is also a straightforward fix.

As to the lowering currently implemented, it’s debatable which is better.  Statepoints optimize constants, and unify based on SDValue.  As a result, two IR level values of different types (with the same bit pattern) can end up sharing the same stackslot.  However, the statepoint lowering suffers when trying to assign stack slots.  We currently use heuristics, but you can end up with ugly shuffling of values around on the stack across basic blocks.  (There are a number of ways to improve that, but none are implemented yet.)  gc.root doesn’t suffer from this problem since stack slots are assigned by the frontend.

Since the stack spills and reloads are visible at the IR layer, gcroot gets the full ability of the optimizer to remove redundant reloads.  Statepoints only get to leverage the pieces in the backend.  In theory, this could result in materially worse spill/reload code for statepoints.  In practice, this appears not to matter much provided the same value is assigned to the same slot across both calls, but I don’t actually have much data here to say anything conclusively yet.

I haven’t tried to measure frame size for gc.root vs statepoints.  I suspect that statepoints may come out slightly ahead, but I doubt this is material.  There are also cases (see “easily derived pointers” above), where gc.root may come out ahead.

IR Level Optimization – Both gc.root and statepoints cripple optimization (by design!).  gcroot works better with inlining today, but statepoints could be easily enhanced to handle this case.  (The same work would benefit symbolic patchpoints.)

It is my belief that statepoints are easier to optimize (i.e. teach to LICM), but this is purely my guess with no real evidence.  Both suffer from the fact that calls must be marked readwrite.  Not having to reason about memory seems easier, but I’m open to other arguments here.

Community Support & Compatibility
From a practical perspective, statepoints have active users behind them.  We are interested in continuing to enhance and optimize them in the public tree.  The same support does not seem to exist for gcroot.

The implementation of statepoints is largely aligned with that of patchpoints.  The implementation of gcroot is completely separate and poorly understood by the majority of the community.

It wouldn’t be hard to write a translation pass from gcroot to statepoints or from statepoints to gcroot.  If folks are concerned about compatibility, this would be a reasonable option.  The largest challenge to transparently replacing one with the other is in generating the right output format.

To summarize, gcroot and statepoints are functionally equivalent (modulo possible bugs.)  In their current form, the two are largely comparable with each having some benefits.  Long term, we believe a statepoint representation will allow better code generation and IR level optimization of code with safepoints inserted.  We believe statepoints to be easier to optimize both at the IR level and backend.

Again, the late safepoint proposal is independent and could be done with either representation.  It’s currently implemented on statepoints, but it could be extended to gcroot without too much work.

by reames at October 21, 2014 09:42 PM

October 20, 2014

OpenMP Runtime Project

LLVM in Clang Developers Meeting

In case you missed it, you may like to know that there will be a talk on "OpenMP* Support in Clang/LLVM: Status Update and Future Directions" at the LLVM developers' meeting in a couple of weeks' time.

by Eugene Roeder (Intel) at October 20, 2014 01:13 PM


LLVM Weekly - #42, Oct 20th 2014

Welcome to the forty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

If you're local to London, you may be interested to know that I'll be talking about lowRISC at the Open Source Hardware User Group on Thursday.

The canonical home for this issue can be found here at

News and articles from around the web

ELLCC, the LLVM-based cross-compilation toolchain now has pre-built binaries for all LLVM tools.

Eli Bendersky's repository of examples for using LLVM and Clang as libraries and for building new passes isn't new, but it is incredibly useful for newcomers to LLVM/Clang and I haven't featured it before. If you want to build something using LLVM or Clang, the llvm-clang-samples repo is one of the best places to start.

On the mailing lists

LLVM commits

  • Go LLVM bindings have been committed. r219976.

  • Invoking patchpoint intrinsics is now supported. r220055.

  • LLVM gained a workaround for a Cortex-A53 erratum. r219603.

  • Basic support for ARM Cortex-A17 was added. r219606.

  • The C API has been extended with the LLVMWriteBitcodeToMemoryBuffer function. r219643.

  • NumOperands has been moved from User to Value. On 64-bit host architectures this reduces sizeof(User) and subclasses by 8. r219845.

  • The LLVMParseCommandLineOptions function was added to the C API. r219975.

Clang commits

  • Constant expressions can now be used in pragma loop hints. r219589.

  • The libclang API gained a function to retrieve the storage class of a declaration. r219809.

  • With the -fsanitize-address-field-padding flag, Clang can insert poisoned paddings between fields in C++ classes to allow AddressSanitizer to find intra-object overflow bugs. r219961.

Other project commits

  • lldb now supports a gdb-style batch mode. r219654.

by Alex Bradbury at October 20, 2014 11:31 AM

LLVM Weekly - #41, Oct 13th 2014

Welcome to the forty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

I've been in Munich for ORCONF this weekend. Slides from my talk about lowRISC are available here.

The canonical home for this issue can be found here at

News and articles from around the web

ELLCC, the LLVM/Clang-based cross development toolkit now has Windows binaries available.

IBM have posted a bounty on fixing the AddressSanitizer tests that fail on PowerPC.

GCC needs you! A large number of potential starting points for new contributors has been posted to the GCC mailing list.

On the mailing lists

LLVM commits

  • Switches with only two cases and a default are now optimised to a couple of selects. r219223.

  • llvm-symbolizer will now be used to symbolize LLVM/Clang crash dumps. r219534.

  • The calculation of loop trip counts for loops with multiple exits has been de-pessimized. r219517.

  • MIPS fast-isel learnt integer and floating point compare and conditional branches. r219518, r219530, r219556.

  • R600 gained a load/store machine optimizer pass. r219533.
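The switch-to-select change listed above (r219223) can be pictured with a small sketch. The Python below only models the shape of the rewrite; the case values and results are made up, and the real transformation operates on LLVM IR, where each conditional expression would be a `select` instruction.

```python
def with_switch(x):
    """Two cases plus a default, written as a branching switch."""
    if x == 10:
        return 100
    elif x == 20:
        return 200
    else:
        return 0


def with_selects(x):
    """The same function as straight-line code: start from the default
    and apply one select per case, with no branches between them."""
    result = 0
    result = 100 if x == 10 else result
    result = 200 if x == 20 else result
    return result


same = all(with_switch(x) == with_selects(x) for x in range(-5, 30))
```

Because the case values are distinct, the order in which the two selects are applied does not affect the result.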

Clang commits

  • The integrated assembler has been turned on by default for ppc64 and ppc64le. r219129.

  • clang-format's interpretation of special comments to disable formatting within a delimited range has been documented. r219204.

  • The integrated assembler has been turned on by default for SystemZ. r219426.

Other project commits

  • lld gained support for 'fat' mach-o archives. r219268.

  • The lldbtk example has seen some further development. r219219.

  • lldb-gdbserver can now be used for local-process Linux debugging. r219457.

  • The disassembly format for lldb can now be customized. r219544.

by Alex Bradbury at October 20, 2014 11:31 AM

October 14, 2014

Philip Reames

Statepoint changes up for review

Last week, the first set of patches for our work on garbage collection support in LLVM hit the mailing list.  The review process will probably take a few weeks, but hopefully these should have landed by the 2014 LLVM Developers Meeting at the end of this month.  At that conference, my co-worker Sanjoy and I are going to be giving a talk about our progress on statepoints, and late safepoint placement.

Here’s the full text of the review request, along with a couple of updates:

Title: [Patch] Statepoint infrastructure for garbage collection

The attached patch implements an approach to supporting garbage collection in LLVM that has been mentioned on the mailing list a number of times by now.  There’s a couple of issues that need to be addressed before submission, but I wanted to get this up to give maximal time for review.

The statepoint intrinsics are intended to enable precise root tracking through the compiler so as to support garbage collectors of all types.  Our testing to date has focused on fully relocating collectors (where pointers can change at any safepoint poll, or call site), but the infrastructure should support collectors of other styles.  The addition of the statepoint intrinsics to LLVM should have no impact on the compilation of any program which does not contain them.  There are no side tables created, no extra metadata, and no inhibited optimizations.

A statepoint works by transforming a call site (or safepoint poll site) into an explicit relocation operation.  It is the frontend’s responsibility (or eventually the safepoint insertion pass we’ve developed, but that’s not part of this patch) to ensure that any live pointer to a GC object is correctly added to the statepoint and explicitly relocated.  The relocated value is just a normal SSA value (as seen by the optimizer), so merges of relocated and unrelocated values are just normal phis.  The explicit relocation operation, the fact the statepoint is assumed to clobber all memory, and the optimizer’s standard semantics ensure that the relocations flow through IR optimizations correctly.
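The idea of a call site that hands back relocated pointers can be modelled in a few lines. The sketch below is a toy moving collector in Python, not the statepoint intrinsics themselves: `ToyHeap`, `statepoint` and the address scheme are all assumptions invented for illustration, and the collector only follows the roots it is given rather than tracing an object graph.

```python
class ToyHeap:
    """A tiny heap whose collector moves every surviving object."""

    def __init__(self):
        self.objects = {}          # address -> payload
        self.next_addr = 0x1000

    def alloc(self, payload):
        addr = self.next_addr
        self.next_addr += 0x10
        self.objects[addr] = payload
        return addr

    def collect(self, roots):
        """Move each object named by a root; return an old -> new address map."""
        forwarding = {}
        survivors = {}
        for old in roots:
            if old is None or old in forwarding:
                continue
            new = self.next_addr
            self.next_addr += 0x10
            survivors[new] = self.objects[old]
            forwarding[old] = new
        self.objects = survivors
        return forwarding


def statepoint(heap, callee, live):
    """Run `callee` at a safepoint and return (result, relocated live pointers).

    The caller must use the relocated values afterwards; the originals are stale.
    """
    result = callee()
    forwarding = heap.collect(live)
    relocated = [None if p is None else forwarding[p] for p in live]
    return result, relocated


heap = ToyHeap()
obj = heap.alloc("payload")
_, (obj_rel,) = statepoint(heap, lambda: None, [obj])
```

After the call, `obj` no longer names a valid object while `obj_rel` does, which mirrors the rule that no SSA value is used on both sides of the call.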

During the lowering process, we currently spill aggressively to stack.  This is not entirely ideal (and we have plans to do better), but it’s functional, relatively straightforward, and closely matches the implementation of the patchpoint intrinsics.  We leverage the existing StackMap section format, which is already used by the patchpoint intrinsics, to report where pointer values live.  Unlike a patchpoint, these locations are known (by the backend) to be writeable during the call.  This enables the garbage collector to transparently read and update pointer values if required.  We do optimize lowering in certain well known cases (constant pointers, a.k.a. null, being the key one.)

There are a few areas of this patch which could use improvement:

  • The patch needs to be rebased against TOT.  It’s currently based against a roughly 3 week old snapshot. (FIXED)
  • The intrinsics should probably be renamed to include an “experimental” prefix.
  • The usage of Direct and Indirect location types is currently inverted as compared to the definition used by patchpoint.  This is a simple fix. (FIXED)
  • The test coverage could be improved.  Most of the tests we’ve actually been using are built on top of the safepoint insertion mechanism (not included here) and our runtime.  We need to improve the IR level tests for optimizer semantics (i.e. not doing illegal transforms), and lowering.  There are some minimal tests in place for the lowering of simple statepoints.
  • The documentation is “in progress” (to put it kindly.)  (MUCH IMPROVED, MORE TODO)
  • Many functions are missing doxygen comments
  • There’s a hack in to force the use of RSP+Offset addressing vs RBP-Offset addressing for references in the StackMap section.  This works, shouldn’t break anyone else, but should definitely be cleaned up.  The choice of addressing preference should be up to the runtime.

When reviewing, I would greatly appreciate feedback on which issues need to be fixed before submission and those which can be addressed afterwards.  It is my plan to actively maintain and enhance this infrastructure over the next few months (and years).  It’s already been developed out of tree entirely too long (our fault!), and I’d like to move to incremental work in tree as quickly as feasible.

Planned enhancements after submission:

  • The ordering of arguments in statepoints is essentially historical cruft at this point.  I’m open to suggestions on how to make this more approachable.  Reordering arguments would (preferably) be a post commit action.
  • Support for relocatable pointers in callee saved registers over call sites.  This will require the notion of an explicit relocation pseudo op and support for it throughout the backend (particularly the register allocator.)
  • Optimizations for non-relocating collectors.  For example, the clobber semantics of the spill slots aren’t needed if the collector isn’t relocating roots.
  • Further optimizations to reduce the cost of spilling around each statepoint (when required at all).
  • Support for invokable statepoints.
  • Once this has baked in tree for a while, I plan to delete the existing gc_root code.  It is unsound, and essentially unused.

In addition to the enhancements to the infrastructure in the currently proposed patch, we’re also working on a number of follow up changes:

  • Verification passes to confirm that safepoints were inserted in a semantically valid way (i.e. no memory access of a value after it has been inserted)
  • A transformation pass to convert naive IR to include both safepoint polling sites, and statepoints on every non-leaf call.  This transformation pass can be used at initial IR creation time to simplify the frontend authors’ work, but is also designed to run on *fully optimized* IR, provided the initial IR meets certain (fairly loose) restrictions.
  • A transformation pass to convert normal loads and stores into user provided load and store barriers.
  • Further optimizations to reduce the number of safepoints required, and improve the infrastructure as a whole.

We’ve been working on these topics for a while, but the follow on patches aren’t quite as mature as what’s being proposed now.  Once these pieces stabilize a bit, we plan to upstream them as well.  For those who are curious, our work on those topics is available here:

by reames at October 14, 2014 12:10 AM

October 06, 2014


LLVM Weekly - #40, Oct 6th 2014

Welcome to the fortieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

I'll be in Munich next weekend for the OpenRISC conference where I'll be presenting on the lowRISC project to produce an open-source SoC. I'll be giving a similar talk in London at the Open Source Hardware User Group on 23rd October.

The canonical home for this issue can be found here at

News and articles from around the web

Capstone 3.0 RC1 has been released. Capstone is an open source disassembly engine, based initially on code from LLVM. This release features support for Sparc, SystemZ and XCore as well as the previously supported architectures. Among other changes, the Python bindings are now compatible with Python 3.

An interesting paper from last year came up on the mailing list. From EPFL, it proposes adding -OVERIFY to optimise programs for fast verification. The performance of symbolic execution tools is improved by reducing the number of paths to explore and the complexity of branch conditions. They managed a maximum 95x reduction in total compilation and analysis time.

The next Cambridge (UK) social will take place on Wed 8th Oct at 7.30 pm.

On the mailing lists

LLVM commits

  • The expansion of atomic loads/stores for PowerPC has been improved. r218922. The documentation on atomics has also been updated. r218937.

  • For the past few weeks, Chandler Carruth has been working on a new vector shuffle lowering implementation. There have been too many commits to summarise, but the time has come and the new codepath is now enabled by default. It claims 5-40% improvements in the right conditions (when the loop vectorizer fires in the hot path for SSE2/SSE3). r219046.

  • The Cortex-A57 scheduling model has been refined. r218627.

  • SimplifyCFG now has a configurable threshold for folding branches with common destination. Changing this threshold can be worthwhile for GPU programs where branches are expensive. r218711.

  • Basic support for the newly-announced Cortex-M7 has been added. r218747.

  • As discussed on the mailing list last week, the sqrt intrinsic will now return undef when given a negative input. r218803.

  • llvm-readobj learnt -coff-imports which will print out the COFF import table. r218891, r218915.

Clang commits

  • Support for the align_value attribute has been added, matching the behaviour of the attribute in the Intel compiler. The commit message explains why this attribute is useful in addition to aligned. r218910.

  • A rather useful diagnostic has been added. -Winconsistent-missing-override will warn if override is missing on an overridden method if that class has at least one override specified on its methods. r218925.

  • Support for MS ABI continues. thread_local is now supported for global variables. r219074.

  • Matcher and DynTypedMatcher saw some nice performance tweaking, resulting in a 14% improvement on a clang-tidy benchmark and compilation of Dynamic/Registry.cpp sped up by 17%. r218616.

  • lifetime.start and lifetime.end markers are now emitted for unnamed temporary objects. r218865.

  • The __sync_fetch_and_nand intrinsic was re-added. See the commit message for a history of its removal. r218905.

  • Clang gained its own implementation of C11 stdatomic.h. The system header will be used in preference if present. r218957.

  • Clang now understands -mthread-model to specify the thread model to use, e.g. posix, single (for bare-metal and single-threaded targets). r219027.

Other project commits

  • libcxxabi should now work with the ARM Cortex-M0. r218869.

  • lldb gained initial support for scripting stepping. This is the ability to add new stepping modes implemented by python classes. The example in the follow-on commit has a large comment at the head of the file to explain its operation. r218642, r218650.

by Alex Bradbury at October 06, 2014 12:49 PM

October 02, 2014

OpenMP Runtime Project

New code release

We are excited to announce the next release of the Intel® OpenMP* Runtime Library at This release aligns with Intel® Parallel Studio XE 2015.

New Features

  • Contribution from ScaleMP: stack padding.
  • Redesign of wait and release code; performance improvements.

Bug Fixes

by mad\egfefey at October 02, 2014 09:15 PM

September 30, 2014


LLVM Weekly - #39, Sep 29th 2014

Welcome to the thirty-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

An implementation of Common Lisp with an LLVM backend, Clasp, has been announced. There's a lot of work to be done on performance, but development is very active on Github.

A backend for the educational 'y86' instruction set architecture has been started. The source is on Github.

A new binary snapshot of the ELLCC cross compilation toolchain is now available. Pre-compiled binaries are available for ARM, MIPS, PPC, and x86. All tarballs contain header files and runtime libraries for all targets to allow you to build for any supported target.

On the mailing lists

LLVM commits

  • Segmented stacks support for the x32 ABI has been fixed. r218247.

  • Robin Morisset's work on optimisation of atomics continues. AtomicExpandPass now inserts fences itself rather than SelectionDAGBuilder. r218329.

  • LLVM's libSupport gained a type-safe alternative to llvm::format(). r218463.

  • llvm-vtabledump learned how to dump RTTI structures for the MS ABI. r218498.

Clang commits

  • The assume_aligned function attribute is now supported. r218500.

  • The thread safety analysis documentation has seen a hefty update. r218420.

  • MS compatibility is further improved with support for the __super scope specifier. r218484.

Other project commits

  • ASan in compiler-rt gained the start of a debugging API. r218538.

  • LLDB gained the beginnings of an example Tk UI. r218279.

by Alex Bradbury at September 30, 2014 08:53 AM

September 22, 2014


LLVM Weekly - #38, Sep 22nd 2014

Welcome to the thirty-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

I've been at PyConUK this past weekend so I'm afraid it's another slightly shorter than normal issue. I've been talking about Pyland, a programming game that aims to teach children programming in Python (and of course, runs on Raspberry Pi).

The canonical home for this issue can be found here at

News and articles from around the web

A paper has recently been published about Harmony. In the words of the authors "Harmony is an open source tool (built as an LLVM pass) that creates a new kind of application profile called Parallel Block Vectors, or PBVs. PBVs track dynamic program parallelism at basic block granularity to expose opportunities for improving hardware design and software performance." Their most recent paper on ParaShares describes how they find the most 'important' basic blocks in multithreaded programs.

Richard Pennington has written up some more thoughts on cross compilation configuration for Clang.

Clike is a low-level programming language with an extensible syntax based on C. It of course targets LLVM.

If you want your Emacs editor to automatically disassemble LLVM bitcode inside Emacs buffers, then autodisass-llvm-bitcode is for you.

On the mailing lists

LLVM commits

  • The LLVM MC layer can now write BigObj-style COFF object files. r217812.

  • X86AtomicExpandPass has been removed in favour of using the generic AtomicExpandPass (which now has the necessary hooks). r217928.

  • llvm-cov's internal API has been reworked. r217975.

Clang commits

  • Clang can now use 'response files' when calling other tools when the length of the command line exceeds system limits. r217792.

  • The -Wbind-to-temporary-copy warning is no longer on by default. r218008.

  • Clang's thread safety analysis gained -Wthread-safety-reference which warns when a guarded variable is passed by reference as a function argument. r218087.

Other project commits

  • libcxx gained some support for using newlib as its C library. r218144.

by Alex Bradbury at September 22, 2014 03:57 PM

September 15, 2014


LLVM Weekly - #37, Sep 15th 2014

Welcome to the thirty-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

This week's issue comes to you from sunny Tenerife. Yes, my dedication to weekly LLVM updates is so great that I'm writing it on holiday. Enjoy! I'll also note that I'm at PyCon UK next week where I'll be presenting on the results of a project we had some interns working on over the summer creating a programming game for the Raspberry Pi.

The canonical home for this issue can be found here at

News and articles from around the web

Not only does Pyston have a shiny new blog, they've also released version 0.2. Pyston is an implementation of Python using LLVM, led by Dropbox. This release supports a range of language features that weren't supported in 0.1, including support for the native C API. The plan is to focus on performance during the development cycle for 0.3.

Sylvestre Ledru has posted a report of progress in building Debian with Clang following the completion of this year's Google Summer of Code projects. Now, with Clang 3.5.0, 1261 packages fail to build with Clang. Sylvestre describes how they're attacking the problem from both sides, by submitting patches to upstream projects as well as to Clang where appropriate (e.g. to ignore some unsupported optimisation flags rather than erroring out).

On the mailing lists

  • Philip Reames has started a discussion on adding optimisation hints for 'constant' loads. A common case is where a field is initialised exactly once and then is never modified. If this invariant could be expressed, it could improve alias analysis as the AA pass would never consider that field to MayAlias with something else (Philip reports that the obvious approach of using type-based alias analysis isn't quite enough).

  • Hal Finkel has posted an RFC on attaching attributes to values. Currently, attributes such as noalias and nonnull can be attached to function parameters, but in cases such as C++11 lambdas these can be packed up into a structure and the attributes are lost. Some followup discussion focused on whether these could be represented as metadata. The problem there of course is that metadata is intended to be droppable (i.e. is semantically unimportant). I very much like the suggestion from Philip Reames that the test suite should run with a pass that forcibly drops metadata to verify it truly is safe to drop.

  • Robin Morisset has posted a proposal on implementing a fence elimination algorithm. The proposed algorithm is based on partial redundancy elimination. He's looking for feedback on the suggested implementation approach.

  • There's been a little bit of discussion on the topic of rekindling work on VMKit.

LLVM commits

  • The start of the llvm.assume infrastructure has been committed, as well as an AlignmentFromAssumptions pass. See the original RFC for a refresher on the llvm.assume intrinsic. r217342, r217344.

  • LLVM's sample profile reader has been refactored into lib/ProfileData. r217437.

  • The AMD 16H Jaguar microarchitecture now has a scheduling model. r217457.

  • The 'bigobj' COFF variant can now be read. r217496.

Clang commits

  • The __builtin_assume and __builtin_assume_aligned intrinsics have been added. r217349.

  • The thread safety TIL (Typed Intermediate Language) has seen a major update. r217556.

Other project commits

  • LLD gained support for AArch64 Mach-O. r217469.

by Alex Bradbury at September 15, 2014 02:01 PM

September 11, 2014

Sylvestre Ledru

Rebuild of Debian using Clang 3.5.0

Clang 3.5.0 has just been released. A new rebuild has been done to highlight the progress in getting Debian built with clang.

tl;dr: Great progress. We decreased from 9.5% to 5.7% of failures. Full results are available on

At the time of the rebuild with 3.4.2, we had 2040 packages failing to build with clang. With 3.5.0, this dropped to 1261 packages.
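These figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; the implied archive sizes are back-of-the-envelope estimates derived from the rounded percentages, not figures reported by the rebuild itself.

```python
failed_with_3_4_2 = 2040
failed_with_3_5_0 = 1261

# Absolute and relative improvement between the two rebuilds.
fixed = failed_with_3_4_2 - failed_with_3_5_0
relative_drop = fixed / failed_with_3_4_2

# Rough archive sizes implied by the quoted 9.5% and 5.7% failure rates.
implied_total_3_4_2 = failed_with_3_4_2 / 0.095
implied_total_3_5_0 = failed_with_3_5_0 / 0.057

print(f"{fixed} fewer failing packages ({relative_drop:.1%} relative drop)")
print(f"implied archive size: ~{implied_total_3_4_2:.0f}, then ~{implied_total_3_5_0:.0f}")
```

The two implied totals differ slightly, which is consistent with the percentages being rounded and with the archive itself changing between rebuilds.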


With Arthur Marble and Alexander Ovchinnikov, both GSoC students, we worked on various ways to decrease the number of errors.

Upstream fixes

First, the most obvious way, we fixed programming bugs/mistakes in upstream sources. Basically, we took categories of failure and fixed issues one after the other. We started with simple bugs like 'Wrong main declaration', 'non-void function should return a value' or 'Void function should not return a value'.

They are trivial to fix. We continued with harder fixes like 'Undefined reference' or 'Variable length array for a non POD (plain old data) element'.

So, besides these ones, we worked on:

In total, we reported 295 bugs with patches. 85 of them have been fixed (meaning that the Debian maintainer uploaded a new version with the fix).

In parallel, I think that the switch by FreeBSD and Mac OS X to Clang also helped to fix various issues by upstreams.

Hacking in clang

As a parallel approach, we started to implement a suggestion from Linus Torvalds and a few others. Instead of trying to fix all upstream, where we can, we tried to update clang to improve the gcc compatibility.

gcc has many flags to disable or enable optimizations. Some of them are legacy, others make no sense in clang, etc. Instead of failing in clang with an error, we created a new category of warnings (showing optimization flag '%0' is not supported) and moved all relevant flags into it. Some examples: r212805, r213365, r214906 or r214907.

We also updated clang to silence some useless arguments like -finput-charset=UTF-8 (r212110), clang being UTF-8 compliant.

Finally, we worked on the forwarding of linker flags. Clang and gcc behave very differently here: when gcc does not know an argument, it forwards the argument to the linker. Clang, in this case, rejects the argument and fails with an error. In clang, we have to explicitly declare which arguments are going to be transferred to the linker. Of course, the correct way to pass arguments to the linker is to use -Xlinker or -Wl but the Debian rebuild proved that these shortcuts are used. Two of these arguments are now forwarded:

  • -z keyword - r213198
  • -u Force symbol to be entered in the output file as an undefined symbol - r211756. This one fixed most of the haskell build failures. It fixed the most common issue that we had (701 occurrences, but this does not mean that all these packages build fine now; some haskell-based packages are failing later in the process)

New errors

Just like in other releases, new warnings are added in clang. With (bad) usage of -Werror by upstream software, this causes new build failures:

I also took the opportunity to add some further categorizations in the list of errors. Some examples:

Next steps

The Debile project being close to ready with Clément Schreiner's GSoC, we will now have an automatic and transparent way to rebuild packages using clang.


As stated, we can see a huge drop in terms of the number of failures over time:

Hopefully, with Clang getting better and better and more and more projects adopting it as the default compiler or as a base for plugin/extension development, this percentage will continue to decrease.
Having some kind of release goal with clang for Jessie+1 can now be considered as potentially reachable.

Want to help?

There are several things which can be done to help:

  • Point me to common error patterns in the Not categorized list of errors to create new categories
  • Report and fix packages
  • As an upstream, integrate clang as part of your continuous integration system
  • Hack on cqa-scanlogs, the error detection tool to detect error patterns (example: Undetected error). This tool is used also for the regular rebuilds of the archive.
  • Improve website


Thanks to David Suarez for the rebuilds of the archive, Arthur Marble and Alexander Ovchinnikov for their GSoC works and Nicolas Sévelin-Radiguet for the few fixes.

by Sylvestre at September 11, 2014 12:17 PM

September 08, 2014

OpenMP Runtime Project

New code release

We are excited to announce the next release of the Intel® OpenMP* Runtime Library at  This release aligns with Intel® Composer XE 2013 SP1 Update 4, scheduled for release in summer of 2014.

New features

by Johnny Peyton at September 08, 2014 08:17 PM


LLVM Weekly - #36, Sep 8th 2014

Welcome to the thirty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

The biggest news this week is of course undoubtedly the long-awaited release of LLVM/Clang 3.5. See the release notes for a full breakdown of what's changed.

Rhine, a Clojure-inspired Lisp with an LLVM JIT backend has been released (or at least, I wasn't aware of it before). There's plenty of discussion about it over at HN.

Intel have released a new version of their CilkPlus LLVM-based compiler. This release implements support for version 1.2 of Intel's Cilk Plus Language Extension Specification.

On the mailing lists

LLVM commits

  • LLVM gained a new alias analysis implementation, the CFL (Context-free language) alias analysis algorithm. When bootstrapping LLVM, this pass gives 7-8% NoAlias responses to queries that TBAA and BasicAA couldn't answer. r216970.

  • The old JIT has finally been removed. r216982.

  • FastISel gained the option to skip target-independent instruction selection. This is now used by AArch64, which uses target-dependent instruction selection only. r216947, r216955.

  • MCAnalysis has been removed. The code was judged to be buggy and poorly tested. r216983.

  • AArch64 gained a pass to try to remove redundant comparison operations. r217220.

  • FastISel has seen some spring cleaning. r217060.

Clang commits

  • VariantMatcher::MatcherOps was modified to reduce the amount of generated code. This reduces object size and compilation time. r217152.

  • Support for the 'w' and 'h' length modifiers in MS format strings was added. r217195, r217196.

  • A new warning is born. -Wunused-local-typedef will warn about unused local typedefs. r217298.

Other project commits

  • LLDB has gained initial support for 'type validators'. To quote the commit message, "Type Validators have the purpose of looking at a ValueObject, and making sure that there is nothing semantically wrong about the object's contents For instance, if you have a class that represents a speed, the validator might trigger if the speed value is greater than the speed of light". r217277.

  • It is now possible to build libc++ on systems without POSIX threads. r217271.

  • A target.process.memory-cache-line-size option has been added to LLDB which changes the size of lldb's internal memory cache chunks read from the remote system. r217083.

by Alex Bradbury at September 08, 2014 02:51 PM

September 01, 2014


LLVM Weekly - #35, Sep 1st 2014

Welcome to the thirty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

As I mentioned in a previous issue, I am involved in the lowRISC project to produce a fully open-source SoC. Just a quick reminder that we are hiring, and you have just over a week to get your application in.

The canonical home for this issue can be found here at

News and articles from around the web

LLVM/Clang 3.5 is inching ever closer to release. The fourth and hopefully final release candidate is available for testing.

Quarks Lab have published a preview of SCAF, a Source Code Analysis Framework built on Clang. It promises a release soon.

The VMKit project website has this week been updated to mark the project as retired. VMKit was a project to implement virtual machines such as a JVM on top of LLVM. People interested in restarting the project are encouraged to get in touch with Gaël Thomas.

AMD and Microsoft have released a C++ AMP compiler targeting version 1.2 of the specification. The C++ AMP (Accelerated Massive Parallelism) compiler is of course based on LLVM and Clang, and can be found here.

On the mailing lists

  • Manuel Klimek has provided a quick run-down of the state of his work on Clang C++ refactoring tools. He reports there are a number of standalone, single-use refactoring tools but more work needs to be done on generalising and integrating them. The plan is to push more of these tools to tools-extra (where clang-rename lives), make them integratable as a library, integrate them into libclang and then integrate them into projects like ycmd.

  • Robin Morisset has been working on optimisations for lowering of atomics and has asked for input on a fence elimination algorithm he's been thinking about. He has outlined two possible implementation routes he would like feedback on.

  • A discussion about improving llvm-objdump, kicked off by Steve King, makes an interesting read. I'm looking forward to a future with a more featureful llvm-objdump that prints symbols of branch targets by default.

  • David Blaikie has started a discussion about supporting -gmlt in LLVM/Clang. Vital to having any chance of understanding this thread is to know that gmlt refers to debug info containing 'minimal line tables', a feature that was added to GCC a while back.

  • I linked last week to the mailing list thread on removing static initializers for command line options and regrettably was unable to summarise the extensive discussion. The bad news is discussion has continued at a rapid pace, but thankfully Chandler Carruth has rather helpfully summarised the main outcomes of the discussion. It's also worth reading this thread for an idea of what the new infrastructure might look like.

LLVM commits

  • The AArch64 backend learned about v4f16 and v8f16 operations. r216555.

  • The LLVM CMake build system now includes support for building with UndefinedBehaviourSanitizer. r216701.

Clang commits

  • The -fdevirtualize and -fdevirtualize-speculatively flags are now recognised (and ignored) for compatibility with GCC. r216477.

  • Some Google Summer of Code work has started to land. In particular, the Clang static analyzer gained initial infrastructure to support synthesizing function implementations from external model files. See the commit message for full details on the intent of this feature. r216550.

  • Support was added for capturing variable length arrays in C++11 lambda expressions. r216649.

Other project commits

  • LLDB gained documentation on its internal register numbering scheme. r216372.

  • LLDB is making progress towards AArch64 support. r216736.

by Alex Bradbury at September 01, 2014 05:48 PM

August 25, 2014


LLVM Weekly - #34, Aug 25th 2014

Welcome to the thirty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

The third release candidate for LLVM/Clang 3.5 is now available. As ever, test it on your codebases and report any regressions.

Adrian Sampson has written a blog post about Quala, a tool for implementing pluggable type systems for C/C++ using Clang. The example type systems are a system allowing nullable and non-nullable pointers as well as an information flow tracking system. In the future, Adrian wants to connect type annotations to LLVM IR.

C++14 is now done. A quick look at the Clang C++14 implementation status confirms that Clang support is in pretty good shape.

Santiago Fernandez has been an intern on the .NET team at Microsoft this summer. In this MSDN Channel9 posting, Beth Massi interviews him about his work on using LLVM in the .NET native code generator.

The next Cambridge (UK) LLVM social will be held on Weds 27th August, 7.30pm.

On the mailing lists

  • There is a proposal to move the minimum supported Visual Studio version for compiling LLVM/Clang up to 2013 from 2012. LLVM/Clang 3.6 would be the first stable release with this requirement assuming there are no objections. With the introduction of C++11 features into the LLVM/Clang codebases, MSVC2012 support is troublesome due to a number of unsupported constructs. If this change would affect you negatively, now is the time to pipe up.

  • Richard Carback reports that two of his interns at Draper Laboratories have been working on resurrecting the LLVM C Backend, with source on Github. If this is to make it back into the mainstream repository, somebody will have to volunteer to maintain it which Richard has kindly done.

  • Diego Novillo has posted an update on his plans for supporting profile data from Perf in LLVM. He is now planning on keeping conversion to Perf's format out-of-tree. The current LLVM representation can be used as an exchange format, but Diego will be submitting a more compact representation for internal use.

  • Chris Bieneman has posted an RFC on removing static initializers for command line options. This would make it easier for LLVM clients like WebKit and Mesa. There is a lot of discussion about this proposal that I'm afraid I don't have time to summarise.

LLVM commits

  • X86 Haswell gained a detailed scheduling model. r215094, r215905, and more.

  • LLVM's code coverage mapping format gained extensive documentation. r215990.

  • FastISel for AArch64 saw yet more changes, this time optimisations for ADDS/SUBS emission and support for variable shifts. r216033, r216242.

  • The MIPS assembler gained support for .set arch=x. r215978.

  • The PeepholeOptimizer has been improved to take advantage of the recently added isRegSequence, isExtractSubreg, and isInsertSubreg properties. r216088, r216136, r216144.

  • A thread-model option has been added along with the 'single' option for lowering atomics on baremetal and single-threaded systems. r216182.

  • The gold plugin has been rewritten in order to fix bug 19901. r216215.

Clang commits

  • C++1y is now called C++14. r215982.

  • CGcall (code generation for function call) has been refactored. r216251.

Other project commits

  • The libcxx build and test system gained support for LLVM_USE_SANITIZER. r215872.

  • libcxxabi/libunwind now supports baremetal ARM. r216202.

by Alex Bradbury at August 25, 2014 07:28 PM

August 18, 2014


LLVM Weekly - #33, Aug 18th 2014

Welcome to the thirty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Binaries for LLVM/Clang 3.5RC2 are now available for testing. Try it on your codebases, and be sure to report any regressions.

LDC 0.14.0 has been released. LDC is an LLVM-based compiler for the D programming language. There's a mixture of new features and bug fixes, see the release notes for full details of what's changed.

Viva64, who sell the PVS-Studio static analyzer, have written up their experiences of using the Clang static analyzer on the PVS-Studio codebase. It managed to find 12 issues which the blog author considers genuine bugs.

On the mailing lists

LLVM commits

  • FastISel for AArch64 will now make use of the zero register when possible and supports more addressing modes. r215591, r215597.

  • MIPS gained support for the .ent, .end, .frame, .mask, and .fmask assembler directives. r215359.

  • ARM gained the MRS/MSR system instructions. r215700.

Clang commits

  • Documentation has been added describing how the Language options in .clang-format files work. r215443.

  • Prefetch intrinsics were added for ARM and AArch64. r215568, r215569.

  • The logic for the -include command line parameter is now properly implemented. r215433.

Other project commits

  • LLD now has initial support for ELF/AArch64. r215544.

  • UndefinedBehaviourSanitizer gained a returns-nonnull sanitizer. This verifies that functions annotated with returns_nonnull do return nonnull pointers. r215485.

  • A number of lldb tests now compile on Windows. r215562.
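
For readers unfamiliar with the attribute behind the returns-nonnull sanitizer (r215485), here is a minimal sketch, assuming a GCC-compatible compiler that understands `__attribute__((returns_nonnull))`. The function `name_or_default` is a hypothetical example and is not taken from the commit.

```cpp
#include <cassert>

// Hypothetical helper. The attribute (a GCC/Clang extension) promises callers
// that the result is never null, which the optimizer is free to rely on.
__attribute__((returns_nonnull))
const char *name_or_default(const char *name) {
    const char *result = name ? name : "unknown";
    assert(result != nullptr);  // documents the contract in debug builds
    return result;
}
```

A function that carried the attribute but returned a null pointer anyway would be exactly the kind of bug the new sanitizer is meant to report at run time.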

by Alex Bradbury at August 18, 2014 12:07 PM

August 11, 2014


LLVM Weekly - #30, Jul 28th 2014

Welcome to the thirtieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Nuno Lopes, David Menendez, Santosh Nagarakatte, and John Regehr have written about ALIVe. This is a very promising tool that aims to aid the specification and proof of peephole optimisations (such as those currently found in LLVM's InstCombine). It uses an SMT solver in order to prove optimisations correct (and if incorrect, provides a counter-example).

Source and binaries for the first LLVM/Clang 3.5 Release Candidate are now available. If you like your LLVM releases to be on-time and regression-free, do your part and test them on your codebases.

Thomas Ströder and colleagues have recently published a paper "Proving Termination and Memory Safety for Programs with Pointer Arithmetic" which creates symbolic execution graphs from LLVM IR in order to perform its analysis. The preprint is available here.

The next Cambridge (UK) LLVM Social will be on the 30th July, at 7.30 pm.

On the mailing lists

LLVM commits

  • Support for scoped noalias metadata has been added. The motivation for this is to preserve noalias function attribute information when inlining and to model block-scope C99 restrict pointers. r213864, r213948, r213949.

  • The llvm-vtabledump tool is born. This will dump vtables inside object files. Right now it only supports MS ABI, but will in the future support Itanium ABI vtables as well. r213903.

  • The llvm.assume intrinsic has been added. This can be used to provide the optimizer with a condition it may assume to be true. r213973.

  • The loop vectorizer has been extended to make use of the alias analysis infrastructure. r213486.

  • Various additions have been made to support the PowerPC ELFv2 ABI. r213489, r213490, and more.

  • The R600 backend gained an instruction shrinking pass, which will convert 64-bit instructions to 32-bit when possible. r213561.

  • The llvm.loop.vectorize.unroll metadata has been renamed to llvm.loop.interleave.count. r213588.

  • LLVM 3.5 release notes for MIPS have been committed, if you're interested in seeing a summary of work in the last development cycle. r213749.

  • The IR backward compatibility policy is now documented. r213813.

Clang commits

  • Support for #pragma unroll was added. r213574.

  • Clang learned a range of AVX-512 intrinsics. r213641.

  • Work on MS ABI support continues. r214004.

Other project commits

  • A dynamic loader for the Hexagon DSP was committed to lldb as well as an ABI description. r213565, r213566.

  • A new fast-path implementation of C++ demangling has been added to lldb. It promises significantly better performance. r213671.

by Alex Bradbury at August 11, 2014 11:16 AM

LLVM Weekly - #31, Aug 4th 2014

Welcome to the thirty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Renato Golin has posted a summary of feedback from his talk on LLVM and GCC collaboration at the GNU Tools Cauldron. This both summarises the main areas he's looking for collaboration, and the feedback that people gave at the event or afterwards.

This blog post describes how to use Obfuscator-LLVM to obfuscate Android NDK binaries.

On the mailing lists

LLVM commits

  • FastISel for AArch64 saw a number of improvements, including support for shift-immediate and arithmetic with overflow intrinsics. r214345, r214348, and more.

  • The SLPVectorizer has seen a largeish commit that implements an "improved scheduling algorithm". Sadly the commit message offers no further details. r214494.

  • TargetInstrInfo gained isAsCheapAsMove which takes a MachineInstruction and returns true if that instruction is as cheap as a move instruction. r214158.

  • LLVM libraries can now be exported as importable CMake targets, making it easier for those building LLVM-based projects. This is now documented. r214077.

  • Release notes for PowerPC changes during 3.5 development have been committed. r214403.

  • Initial work towards supporting debug locations for fragmented variables (e.g. by-value struct arguments passed in registers) has been committed. r214576.

Clang commits

  • Work on support for the MSVC ABI continues. Clang will now consider required alignment constraints on fields. r214274.

  • AddressSanitizer now passes source-level information from Clang to ASan using metadata rather than by creating global variables. r214604.

  • The PowerPC backend now supports selection of the ELFv1/ELFv2 ABI via the -mabi= option. r214074.

Other project commits

  • lld gained support for interworking between thumb and ARM code with Mach-O binaries. r214140.

  • A massive ABI testsuite (contributed by Sony) has been committed to the test-suite repo. r214126.

by Alex Bradbury at August 11, 2014 11:16 AM

LLVM Weekly - #32, Aug 11th 2014

Welcome to the thirty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

Some readers may be interested to know that lowRISC, a project to produce a fully open-source SoC started by a number of us at the University of Cambridge Computer Lab has been announced. We are hiring.

The canonical home for this issue can be found here at

News and articles from around the web

Codeplay contributed the LLDB MI (Machine Interface) frontend a while ago, and have now committed some additional features. To coincide with that, they've published a series of blog posts covering the MI driver's implementation, how to set it up from within Eclipse, and how to add support for new MI commands.

McSema, a framework for transforming x86 programs to LLVM bitcode has now been open-sourced. The talk about McSema from the ReCON conference is also now online.

Registration for the LLVM Developer's Meeting 2014 is now open. The event will take place in San Jose on October 28th-29th. You have until September 1st to submit your talk/BoF/poster/tutorial proposal.

On the mailing lists

LLVM commits

  • Initial work on the MachineCombiner pass landed. This estimates critical path length of the original instruction sequence vs a transformed (combined) instruction sequence and chooses the faster code. An example given in the commit message is choosing between add+mul vs madd on AArch64, and a followup commit implements MachineCombiner for this target. r214666, r214669.

  • A few useful helper functions were added to the LLVM C API: LLVM{IsConstantString, GetAsString, GetElementAsConstant}. r214976.

  • A whole load of AVX512 instructions were added. r214719.

  • FastISel for AArch64 now supports basic argument lowering. r214846.

  • A flag has been added to experiment with running the loop vectorizer before the SLP vectorizer. According to the commit message, eventually this should be the default. r214963.

  • The old JIT is almost dead, it has been removed (for those not paying close attention, 3.5 has already been branched so still contains the old JIT). However, the patch was then reverted, so it's in zombie status. r215111.

  • AArch64 gained a load balancing pass for the Cortex-A57, which tries to make maximum use of available resources by balancing use of even and odd FP registers. r215199.
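
The idea behind the MachineCombiner pass mentioned above (r214666) can be sketched in a few lines. This is a toy model only and is not LLVM's implementation: `Instr`, `critical_path` and `prefer_combined` are hypothetical names, and any latencies fed to it are made-up numbers rather than values from a real scheduling model.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy instruction: a latency in cycles plus the indices of earlier
// instructions in the same sequence whose results it consumes.
struct Instr {
    int latency;
    std::vector<std::size_t> deps;
};

// Length in cycles of the longest dependency chain through the sequence.
int critical_path(const std::vector<Instr> &seq) {
    std::vector<int> finish(seq.size(), 0);
    int longest = 0;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        int start = 0;
        // An instruction can start once all of its inputs are available.
        for (std::size_t d : seq[i].deps)
            start = std::max(start, finish[d]);
        finish[i] = start + seq[i].latency;
        longest = std::max(longest, finish[i]);
    }
    return longest;
}

// True when the combined sequence is strictly faster than the original.
bool prefer_combined(const std::vector<Instr> &original,
                     const std::vector<Instr> &combined) {
    return critical_path(combined) < critical_path(original);
}
```

With an assumed 3-cycle multiply feeding a 1-cycle add, the original sequence has a critical path of 4 cycles, so a single fused instruction is preferred only if its assumed latency is below 4.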

Clang commits

  • Thread safety analysis gained support for negative requirements to be specified. r214725.

  • Coverage mapping generation has been committed. The -fcoverage-mapping command line option can be used to generate coverage mapping information, which can then be combined with execution counts from instrumentation-based profiling to perform code coverage analysis. r214752.

  • A command line option was added to limit the alignment that the compiler can assume for an arbitrary pointer. r214911.

Other project commits

  • LLDB's FileSpec class learned to understand Windows paths. r215123.

  • LLDB learned a whole bunch of new commands and features for its Machine Interface. r215223.

  • OpenMP gained PowerPC64 support. r215093.

by Alex Bradbury at August 11, 2014 11:15 AM

Sylvestre Ledru

clang 3.4, 3.5 and 3.6 are now coinstallable in Debian

Clang is finally coinstallable on Debian. 3.4, 3.5 and the current trunk (snapshot) can be installed together.

So, just like gcc, the different versions can be called with clang-3.4, clang-3.5 or clang-3.6.

/usr/bin/clang, /usr/bin/clang++, /usr/bin/scan-build and /usr/bin/scan-view are now handled through the llvm-defaults package.

llvm-defaults is also now managing clang-check, clang-tblgen, c-index-test, clang-apply-replacements, clang-tidy, pp-trace and clang-query.

Changes are also available on
The next step will be to manage also llvm-defaults on to simplify the transition for people using these packages.

So, with:

# /etc/apt/sources.list
deb llvm-toolchain main
deb llvm-toolchain-3.4 main
deb llvm-toolchain-3.5 main
$ apt-get install clang-3.4 clang-3.5 clang-3.6

$ clang-3.4 --version
Debian clang version 3.4.2 (branches/release_34) (based on LLVM 3.4.2)
Target: x86_64-pc-linux-gnu
Thread model: posix

$ clang-3.5 --version
Debian clang version 3.5.0-+rc2-1~exp1 (tags/RELEASE_350/rc2) (based on LLVM 3.5.0)
Target: x86_64-pc-linux-gnu
Thread model: posix

$ clang-3.6 --version
Debian clang version 3.6.0-svn214990-1~exp1 (trunk) (based on LLVM 3.6.0)
Target: x86_64-pc-linux-gnu
Thread model: posix

by Sylvestre at August 11, 2014 05:47 AM

July 21, 2014


LLVM Weekly - #29, Jul 21st 2014

Welcome to the twenty-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

This is a special extended issue which I'm choosing to subtitle "LLVM Weekly visits the GNU Tools Cauldron". The event took place over the weekend and had a wide range of interesting talks. You can find my notes at the end of this newsletter. Talks were recorded and the videos should be made available in the next month or two.

The canonical home for this issue can be found here at

News and articles from around the web

The eighth annual LLVM Developers meeting has been announced and will take place on October 28th and 29th in San Jose, CA. It is looking for sponsors and talk/poster submissions.

A new blog post has been published on the LLVM Blog giving more details on FTL: WebKit's LLVM-based JIT.

A tentative schedule for the release of LLVM/Clang 3.5 has been posted.

Botond Ballo has posted a summary of June's C++ Standards Committee Meeting.

On the mailing lists

LLVM commits

  • A dereferenceable attribute was added. This indicates that the parameter or return pointer is dereferenceable (i.e. can be loaded from speculatively without a risk of trapping). This is subtly different to the nonnull attribute which doesn't necessarily imply dereferenceability (you might for instance have a pointer to one element past the end of an array). r213385.

  • A new subtarget hook was added to allow targets to opt-out of register coalescing. r213078, r213188.

  • A MergedLoadStoreMotion pass was added. r213396.

  • RegionInfo has been templatified so it works on MachineBasicBlocks. r213456.

  • A monster patch from Nvidia adds a whole bunch of surface/texture intrinsics to the NVPTX backend. r213256.

  • Support was added for emitting warnings if vectorization is forced and fails. r213110.

  • Improvements to FastISel continue with the implementation of the FastLowerCall hook for X86. This actually reproduces what was already being done in X86, but is refactored against the target independent call lowering. r213049.

  • The ARM dmb, dsb and isb intrinsics have been implemented for AArch64. r213247.
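
The difference between nonnull and dereferenceable noted for r213385 above also shows up at the source level. A minimal sketch, using a hypothetical `sum_range` helper: both pointers are non-null, but `last` may point one past the end of the array, so it is only ever compared and never loaded from.

```cpp
#include <cassert>

// Sums the ints in [first, last). Both pointers must be non-null, but only
// addresses strictly before `last` may be read: `last` itself can be a
// one-past-the-end pointer, which is valid to compare yet not dereferenceable.
int sum_range(const int *first, const int *last) {
    assert(first != nullptr && last != nullptr);
    int total = 0;
    for (const int *p = first; p != last; ++p)
        total += *p;  // p is always strictly before last here
    return total;
}
```

An optimizer told only that `last` is nonnull could not safely hoist a load through it, which is the gap the separate dereferenceable attribute is meant to close.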

Clang commits

  • Clang's rewrite engine is now a core feature (i.e. it can not be disabled at configure time). r213171.

  • Error recovery when the programmer mistypes :: as : was improved. r213120.

  • The AArch64 Clang CLI interface proposal for -march has been implemented. See the commit message for details. r213353.

  • OpenMP work continues with the addition of initial parsing and semantic analysis for the final, untied and other clauses, and the master directive. r213232, r213257, r213237, and more.

Other project commits

  • The 'Kalimba' platform is now supported by lldb (presumably this refers to the CSR processor). r213158.

LLVM Weekly at the GNU Tools Cauldron

For full details on the conference and details on the speakers for the talks I've summarised below see the GNU Tools Cauldron 2014 web page. Apologies for any inaccuracies, please do get in touch if you spot anything I may have noted incorrectly. LLVM followers may be particularly interested in Renato Golin's talk on collaboration between the GCC and LLVM communities.

Glibc BoF

  • 2.20 is in "slushy" freeze mode. What else is left? fmemopen, fd locking, some -Wundef work
  • Anyone planning to check in something big for 2.21?
    • Mentor Graphics planning to check in a NIOS II port. They won't be accepted until Linux kernel patches are in a kernel release.
    • A desire for AArch64 ILP32 ABI to get in. Kernel patches currently in review, compiler work is ready.
    • OpenRISC
    • NaCl (nptl)
  • Benchmarking glibc? Does anyone have a good approach? There is a preload library approach (see notes from Ondrej's talk).
  • Glibc has been built with AddressSanitizer, help needed to get it integrated into the build system. There was a comment this would be nice to get in to distributions.
  • Red Hat are working on supporting alternate libm implementations, including a low-precision and high-precision implementation. Intel are looking to add math functions that work on small vectors.

Abigail: toward ABI taming

  • Want to determine if changes to your shared library break apps for users, and users want to know whether an updated library remains compatible with their code. The bidiff tool will tell you the differences in terms of ABI given two object files as its input.
  • libabi consists of modules such as a DWARF reader, the comparison engine. Tools such as bidiff are built on this API
  • What's next for libabigail?
    • bicompat will help application authors determine whether their application A is still compatible with an updated version of a given library L by examining the undefined symbols of A that are resolved by L.
    • More amenable to automation (such as integration into build systems)
    • Support for un-instantiated templates. This would require declarations of uninstantiated templates to be represented in DWARF.
  • A first official release (though source is available at
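The check described for bicompat can be pictured as a set comparison. The sketch below is only an illustration and not libabigail's implementation: the function name `unresolved_after_update` and all symbol names are hypothetical, and a real ABI check also has to compare types and layouts, which symbol names alone cannot capture.

```python
def unresolved_after_update(app_undefined, old_lib_exports, new_lib_exports):
    """Return the symbols the application needs that the updated library drops.

    All three arguments are sets of symbol names (hypothetical inputs; a real
    tool would read them from the binaries' symbol tables).
    """
    # Only the undefined symbols that the old library resolved matter here.
    resolved_by_old = app_undefined & old_lib_exports
    # Any of those missing from the new library would break the application.
    return sorted(resolved_by_old - new_lib_exports)


app = {"lib_open", "lib_read", "printf"}
old = {"lib_open", "lib_read", "lib_close"}
new = {"lib_open", "lib_close", "lib_read_v2"}
missing = unresolved_after_update(app, old, new)
```

Here `printf` is ignored because the library never resolved it, so only symbols that the library itself stopped exporting are reported.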

Writing VMs in Java and debugging them with GDB

  • Oracle Labs have been working on various dynamic language implementations in Java (e.g. Ruby, Python, R, JS, ...).
  • FastR is a reimplementation of R in Java featuring an interpreter (Truffle) and dynamic compiler (Graal).
  • Truffle and Graal start with an AST interpreter. The first time a node is evaluated it is specialised to the type that was seen at runtime. Later the tree is compiled using partial evaluation.
  • It may be deployed on standard HotSpot (no compilation), GraalVM, or the SubstrateVM (SVM) which uses Graal to ahead-of-time compile the language implementation. Debugging the SVM is difficult as Java debugging tools are not available. The solution is to generate DWARF information in the SVM's output.
  • Truffle and Graal are open source, the SubstrateVM is not (yet?).
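The idea of a node specialising itself on first evaluation can be sketched in a few lines. This is a toy model under stated assumptions, not the Truffle API: the class `AddNode` and its method names are invented for illustration, and the partial evaluation step is not modelled at all.

```python
class AddNode:
    """Generic add node that specialises itself on first execution."""

    def __init__(self):
        self.impl = self._uninitialised

    def execute(self, left, right):
        return self.impl(left, right)

    def _uninitialised(self, left, right):
        # First evaluation: pick a specialisation from the observed types.
        if isinstance(left, int) and isinstance(right, int):
            self.impl = self._add_int
        else:
            self.impl = self._add_generic
        return self.impl(left, right)

    def _add_int(self, left, right):
        # Guard: if the speculation fails, fall back to the generic version.
        if not (isinstance(left, int) and isinstance(right, int)):
            self.impl = self._add_generic
            return self.impl(left, right)
        return left + right

    def _add_generic(self, left, right):
        return left + right


node = AddNode()
int_result = node.execute(1, 2)
state_after_ints = node.impl.__name__
float_result = node.execute(1.5, 2)
state_after_float = node.impl.__name__
```

The guard in the specialised path is what keeps the speculation safe: a type that was not seen before simply moves the node to a more general state.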

GCC and LLVM collaboration

  • Good news: license issues, personal grudges and performance are off-topic.
  • Users should be protected from whatever disagreements take place. In the future we should have more pro-active discussions on various issues as opposed to reactive discussions regarding e.g. compiler flags that have been noticed to be arbitrarily different after the fact.
  • Renato lists common projects that we may collaborate on: binutils, glibc, sanitizers. Sanitizers are a collaboration success story.
  • Can we agree on a (new?) common user interface?
  • There's a surprising amount of confusion about -march, -mtune, and -mcpu considering we're in a room of compiler developers. It sounds like there's not much support for re-engineering the set of compiler flags as the potential gain is not seen as being great enough.
  • Can we agree to standardise on attributes, C/C++ extensions, builtins, ASM, the linker API?
  • GCC docs have just been rewritten, so some criticisms about how difficult it is to dig in are no longer valid.

Machine Guided Energy Efficient Compilation

  • Initial investigations in 2012 found that compiler flags can have a meaningful effect on energy consumption. This raises the question of how to determine which flags to use.
  • MAGEEC will target both GCC and LLVM initially. It is implemented as a compiler plugin which performs feature extraction and allows the output of the machine learning algorithm to change the sequence of passes which are run. Fractional Factorial Design is used to reduce the optimisation space to explore.
  • Turning passes on/off arbitrarily can often result in internal compiler errors. Should the machine learning algorithm learn this, or should GCC better document pass requirements?
  • It would be useful to MAGEEC if the (currently internal) plugin API could be stabilized. They also currently have to use a hacked up Clang as it doesn't provide plugin hooks.
  • The project has produced a low cost energy measurement board as well as their own benchmark suite (Bristol/Embecosm Embedded Benchmark Suite, or BEEBS). BEEBS 2.0 is scheduled for release by 31st August 2014 with a much wider range of benchmarks (currently 93). Jeremy showed a rather pleasing live demo where you can run a benchmark on a microcontroller development board and immediately find the number of mJ consumed in running it.
  • The project does not yet achieve better results than GCC -O2, but this is expected to change over the coming months.
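To give a feel for how a fractional factorial design shrinks the space of flag combinations, here is a minimal sketch of a half-fraction design. The talk notes do not say which design MAGEEC uses, so the parity construction, the function name `half_fraction_design` and the flag names are all assumptions made for illustration.

```python
from itertools import product


def half_fraction_design(flags):
    """Build a 2^(k-1) half-fraction design over k on/off compiler flags.

    The last flag is set to the parity of the others, which is one common way
    to generate a half fraction. Which design MAGEEC uses is not stated.
    """
    *free, last = flags
    runs = []
    for levels in product([False, True], repeat=len(free)):
        run = dict(zip(free, levels))
        # Derived column: on when an odd number of the free flags are on.
        run[last] = sum(levels) % 2 == 1
        runs.append(run)
    return runs


# Flag names are placeholders for whichever passes are being explored.
flags = ["-fpass-a", "-fpass-b", "-fpass-c"]
design = half_fraction_design(flags)
```

With three flags this yields four runs instead of eight, and every flag is still switched on in exactly half of them.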

Just-in-time compilation using GCC

  • is an experimental branch of GCC which allows you to build GCC as a shared library and embed it in other programs in order to allow in-process code generation at runtime.
  • A dedicated API for JIT will allow better stability guarantees. It provides a high-level API designed for ease of use.
  • The API doesn't offer solutions for type inference, escape analysis, unboxing, inline caching, etc.
  • It has a C++ API which includes some cunning operator overloading to massively reduce verbosity, and a Python API.
  • David Malcolm has written Coconut, a JIT compiler for Python using It is incomplete and experimental.
  • Drawback: currently have to write out a .s to a file and invoke gcc on it.
    Some might make a cheeky comment about the benefits of architecting a compiler so it can be used as a library, but I of course wouldn't dare. The good news is the speaker is actively looking at what would be needed to use GAS and GNU ld as a library.

Introduction to new Intel SIMD ISA and its impact on GCC

  • AVX-512 offers 64 single precision or 32 double precision floating point operations per cycle. It also has 8x64-bit mask registers.
  • Rounding modes can be set on a per-instruction basis.
  • Basic support is available from GCC 4.9.x.

News from Sanitizers

  • MemorySanitizer detects use of uninitialized memory. Increases CPU usage by about 2.5x and RAM usage by 2x. Was released in LLVM in 2013. It is currently Linux/x86-64 only.
  • History growth is limited by limiting the history depth and the number of new history nodes per stack trace.
  • MSan has found hundreds of bugs across Google internal code, Chromium, LLVM, etc. It was more challenging for Chromium due to the number of system libs that had to be rebuilt.
  • AddressSanitizer annotations allow you to detect access to the regions of e.g. std::vector<> which have been allocated as part of its capacity but not yet been used (i.e. will start to be used in the next push_back). Next is to do the same for std::string and std::deque.
  • Glibc uses GNU-C instead of ANSI C which currently prevents compilation with Clang (nested functions in particular are problematic). It can however be built with ASan by GCC.
  • Evgeniy comments that the lack of standardisation between Clang and GCC for things like __has_feature(address_sanitizer) vs __SANITIZE_ADDRESS__ is irritating. This is just the sort of thing Renato was talking about yesterday of course.

glibc performance tuning

  • Use memset as an example. Look at 3 variants.
  • Writing a useful benchmark is more difficult than you might think. Simply running memset many times in a loop is not a good benchmark when using the same memory locations due to the processor's load-store forwarding. Even when fixing this, the branch predictor may perform much better than it would when memset is used in a real world scenario and lead to unrepresentative results.
  • To move beyond microbenchmarks, Ondrej has been using LD_PRELOAD to link against instrumented versions of the functions which record details about the time taken.
  • See here for memset benchmarks and here for more background.
  • strcmp was the most frequently called glibc function in Ondrej's testing (when running Firefox).

Devirtualization in GCC

  • This is a special case of indirect call removal, and although the talk is given in the context of C++ the techniques apply to other languages too. Some basic cases are handled in the front-end and even specified by the language standard.
  • It is a special case of constant propagation across aggregates, which is already done by Global Value Numbering and Interprocedural Constant Propagation. But these passes only catch a tiny number of possible cases.
  • Loss of information between the frontend and middle end can make some cases almost impossible. The intermediate language can be extended with explicit representations of base types, locations of virtual table pointers, and vtables. Also annotate polymorphic calls specifying instance and polymorphic call type and flags to denote constructors/destructors.
  • I'm not able to summarise details on the GCC devirt implementation better than the slides do. Hopefully they'll be made available online.
  • A particular challenge is to match types between different compilation units. The C++ One Definition Rule is used.
  • It can be used to strengthen unreachable function removal.
  • Feedback-directed devirtualization was extended in GCC 4.9 to work inter-module with LTO.
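The core question behind devirtualization, namely whether a virtual call can only ever reach one implementation, can be sketched with a small class hierarchy analysis. This is not GCC's algorithm; the data layout, the helper names `resolve` and `possible_targets`, and the example hierarchy are hypothetical, and the sketch assumes the whole hierarchy is visible (which is where the One Definition Rule and LTO come in).

```python
def resolve(classes, cls, method):
    """Find the class whose definition of method an object of type cls uses."""
    while cls is not None:
        parent, defined = classes[cls]
        if method in defined:
            return cls
        cls = parent
    return None


def possible_targets(classes, static_type, method):
    """Implementations reachable through a pointer of type static_type."""
    targets = set()
    for cls in classes:
        # Walk up from cls to see whether it derives from static_type.
        ancestor = cls
        while ancestor is not None and ancestor != static_type:
            ancestor = classes[ancestor][0]
        if ancestor == static_type:
            targets.add(resolve(classes, cls, method))
    return targets


# Hypothetical hierarchy: name -> (parent, methods defined in that class).
classes = {
    "Shape": (None, {"area", "name"}),
    "Circle": ("Shape", {"area"}),
    "Square": ("Shape", {"area"}),
}
name_targets = possible_targets(classes, "Shape", "name")
area_targets = possible_targets(classes, "Shape", "area")
```

A call with a single possible target, such as `name` here, can be turned into a direct call; `area` has three candidates and stays virtual unless more is known about the object.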

by Alex Bradbury at July 21, 2014 10:31 AM

July 17, 2014


FTL: WebKit’s LLVM based JIT

Over the past year, the WebKit project made tremendous progress on the ability to optimize JavaScript applications. A major part of that effort was the introduction of the Fourth Tier LLVM (FTL) JIT. The Fourth Tier JIT targets long-running JavaScript content and performs a level of optimization beyond WebKit's interpreter, baseline JIT, and high-level optimizing JIT. See the FTL Optimization Strategy section below for more on WebKit's tiered optimizations. The engineering advancements within WebKit that made the FTL possible were described by Filip Pizlo in the Surfin' Safari Blog post, Introducing the WebKit FTL JIT. On April 29, 2014, the WebKit team enabled FTL by default on trunk: r167958.

This achievement also represents a significant milestone for the LLVM community. FTL makes it clear that LLVM can be used to accelerate dynamically type checked languages in a competitive production environment. This in itself is a tremendous success story and shows the advantage of the highly modular and flexible design of LLVM. It is the first time that the LLVM infrastructure has supported self-modifying code, and the first time profile guided information has been used inside the LLVM JIT. Even though this project pioneered new territory for LLVM, it was in no way an academic exercise. To be successful, FTL must perform at least as well as non-FTL JavaScript engines in use today across a range of workloads without compromising reliability. This post describes the technical aspects of that accomplishment that relate to LLVM and future opportunities for LLVM to improve JIT compilation and the LLVM infrastructure overall.

Read on for more information.

FTL Performance

JavaScript pages are ubiquitous and users expect fast load times, which WebKit's architecture is well suited for. However, some JavaScript applications require nontrivial computation and may run for periods longer than one hundred milliseconds. These applications demand aggressive compiler optimization and code generation tuned for the target CPU. FTL brings the full gamut of compiler technology to bear on the problem.

As with any high level language, high level optimizations must come first. Grafting an optimizing compiler backend onto an immature frontend would be futile. The marriage of WebKit's JIT with LLVM's optimizer and code generation works for two key reasons:

  1. Before translating to LLVM IR, WebKit's optimizing JIT operates on an IR that clearly expresses JavaScript semantics. Through type inference and profile-driven speculation, WebKit removes as much of the JavaScript abstraction penalty as possible.
  2. LLVM IR has now adopted features for supporting speculative, profile-driven optimization and avoiding the performance penalty associated with abstractions when they cannot be removed.
As a result, WebKit can engage the FTL on any long-running JavaScript method. In areas of the code dominated by abstraction overhead, FTL-compiled code is at least competitive with that of a custom JIT designed specifically for JavaScript. In areas of the code where WebKit can remove the abstraction penalty, FTL can achieve fantastic speedups.

Asm.js is a subset of JavaScript that avoids abstraction penalties, allowing JITs to directly benefit from low-level performance optimization. Consequently, the performance advantage of FTL is likely to be quite apparent on asm.js benchmarks. But although FTL performs well on asm.js, it is in no way customized to the standard. In fact, with FTL, regular JavaScript code written in a style similar to asm.js will derive the same benefits. Furthermore, as WebKit's high-level optimizations become even more advanced, the benefits of FTL will expand to a broader set of idiomatic JavaScript code.

A convenient way to measure the impact of LLVM optimizations on JavaScript code is by running C/C++ benchmarks that have been compiled to asm.js code via emscripten. This allows us to compare native C/C++ performance with WebKit's third tier (DFG) compiler and with WebKit FTL.

Figure 1: Time to run benchmarks from LLVM test-suite.
Figure 1 shows the time taken to run a handful of benchmarks from LLVM's own test-suite. The benchmark workloads have been adjusted to run for approximately one second. In every case, FTL achieves significant improvement over WebKit's non-LLVM JIT (DFG). In some cases, the emscripten compiled JavaScript code is already approaching native C performance, but in other cases FTL code still takes about twice as long as clang compiled C code[1]. One reason for the discrepancy between clang and FTL is the call overhead required for maintaining the JavaScript runtime's additional frame information. Another reason is that LLVM loop optimizations are not yet sophisticated enough to remove bounds and overflow checks and thus have not been enabled. These benchmarks are very tight loops, so a minor inefficiency, such as an extra compare or store in the loop, can result in a significant slowdown.

[1] gcc-loops is currently an outlier because clang performance recently sped up dramatically from auto-vectorization that has not been enabled yet in FTL.

FTL Optimization Strategy

WebKit's tiered architecture provides flexibility in balancing responsiveness, profile collection, and compiler optimization. The first tier is the low-level interpreter (LLInt). The second is the baseline JIT--a straightforward translation from JavaScript to machine code. WebKit's third tier is known as the Data Flow Graph (DFG) JIT. The DFG has its own high-level IR allowing it to perform aggressive JavaScript-specific optimization based on the profile data collected in earlier tiers. When running as a third tier, the DFG quickly emits code with additional profiling hooks. It may be invoked again as a fourth tier, but this time it produces LLVM IR for traditional compiler optimization.
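The tiering described above can be pictured as a lookup from how hot a function is to the compiler that should handle it. The sketch below is purely illustrative: the thresholds and the function name `tier_for` are hypothetical, and WebKit's actual tier-up heuristics are not given in this post.

```python
# Hypothetical thresholds; WebKit's real heuristics are not given in the text.
TIERS = [
    ("LLInt", 0),
    ("Baseline JIT", 100),
    ("DFG JIT", 1000),
    ("FTL JIT", 100000),
]


def tier_for(execution_count):
    """Pick the highest tier whose threshold the execution count has reached."""
    chosen = TIERS[0][0]
    for name, threshold in TIERS:
        if execution_count >= threshold:
            chosen = name
    return chosen
```

Short-lived code never leaves the cheap tiers, while only long-running code pays for LLVM's heavier optimization.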

Figure 2. The DFG and FTL JIT optimization pipelines (from Introducing the WebKit FTL JIT).
We reuse most of the DFG phases. The new FTL pipeline is a drop-in replacement for the third-tier DFG backend. It involves additional JavaScript-aware optimizations over DFG SSA form, followed by a phase that lowers DFG IR to LLVM IR. We then invoke LLVM's optimization pipeline and LLVM's MCJIT backend to generate machine code.

The DFG JIT front end generates LLVM IR in a form that is amenable to the same optimizations traditionally performed with C code. The most notable differences are summarized in FTL-Style LLVM IR.

Figure 3. The FTL optimization pipeline after lowering to LLVM IR.
After lowering to LLVM IR, FTL applies a subset of mid-level optimizations that are currently the most important in JavaScript code. It then invokes the LLVM backend for the host architecture with full optimization. This optimizes the code for the target CPU using aggressive instruction selection, register allocation, and machine-specific optimization.

LLVM Patch Points

Patch points are the key LLVM feature that allows dynamic type checking, inline caching, and runtime safety checks without penalizing performance. In October, 2013, we submitted a proposal to amend LLVM IR with patch points to the LLVM developer list. Since then, we've successfully implemented patch points for multiple architectures and their performance impact has been validated for various use cases, including branch-to-fail safety checks, inline caches, and code invalidation points. The details of the current design are explained in the LLVM specification of stack map and patch point intrinsics.

Patch points are actually two features in one intrinsic. The first feature is the ability to identify the location of specific values at the intrinsic's final instruction address. During code emission, LLVM records that information as meta-data alongside the object code in what we call a "stack map". A stack map communicates to the runtime the location of important values. This is a slight misnomer given that locations may refer to register names. Typically, the runtime will read values out of stack map locations when it needs to reconstruct a stack frame. This commonly occurs during "deoptimization"--the process of replacing an FTL stack frame with a lower-tier frame.

The second feature of patch points is the ability of the runtime to patch the compiled code at a specific instruction address. To allow this, the intrinsic reserves a fixed amount of instruction encoding space and records the instruction address of that space along with the stack map. Because the runtime needs to know the location of values precisely at the point it patches code, the two features must be combined into one intrinsic.
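As a rough mental model of what a stack map record carries, consider the sketch below. It is not the binary format that LLVM emits: the class names, field names and the register name are assumptions chosen for illustration, and the machine state is faked with plain dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    kind: str      # "register" or "stack"
    where: object  # register name, or byte offset from the frame base


@dataclass(frozen=True)
class StackMapRecord:
    patchpoint_id: int
    instruction_offset: int  # where the reserved bytes start in the code
    locations: tuple         # one Location per recorded value


def read_live_values(record, registers, stack):
    """Fetch each recorded value from a snapshot of machine state."""
    values = []
    for loc in record.locations:
        if loc.kind == "register":
            values.append(registers[loc.where])
        else:
            values.append(stack[loc.where])
    return values


record = StackMapRecord(1, 0x40, (Location("register", "rax"), Location("stack", 16)))
values = read_live_values(record, {"rax": 7}, {16: 42})
```

During deoptimization the runtime would use values read this way to rebuild the frame expected by a lower tier.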

Patch points are viewed by LLVM passes much like unknown call sites. An important aspect of their design is the ability to specify the effective calling convention. For example, code invalidation points are almost never taken and the call site should not clobber any registers, otherwise the register allocator could be severely restricted by frequent runtime checks. An optional feature of stack maps is the ability to record the registers that are actually live in compiled code at each call site. This way the JIT can declare a call as preserving all registers to maximize compiler freedom, but at the same time the runtime can avoid unnecessary save and restore operations when the "cold" call is actually taken.

To better support inline cache optimizations, LLVM now has a special "anyregcc" calling convention. This convention allows any number of arguments to be forced into registers without pinning down the name of the register. Consequently, the compiler does not have to place arguments in particular registers or stack locations, or emit extra copies and spills around call sites, and the runtime can emit efficient patched code sequences that operate directly on registers.

The current patch point design is labeled experimental so that it may continue to evolve without preserving bitcode compatibility. LLVM should soon be ready to adopt the patch point intrinsic in its final form. However, the current design should first be extended to capture the semantics of high level language runtime checks. See Extending Patch Points.


FTL-Style LLVM IR

FTL attempts to generate LLVM IR that closely resembles what the optimizer expects to see from other typical compiler frontends. Nonetheless, lowering JavaScript semantics into LLVM operations tends to result in IR with different characteristics from statically compiled C code. This section summarizes those differences. More details and examples will be provided in a subsequent blog post.

The prevalence of patch points in the IR means that values tend to have many more uses and can be live into a large number of patch point call sites. FTL emits patch points for a few distinct situations. First, when the FTL front end (DFG) fails to eliminate type checks or bounds checks, it emits explicit compare and branch operations in the IR. The branch target lands at a patch point intrinsic followed by unreachable. This can result in much more branchy code than LLVM typically handles with C benchmarks. Fortunately, LLVM's awareness of branch probability means that the branch-to-fail idiom does not excessively hinder optimization and code generation. Heap access and polymorphic calls also use patch points, but these are emitted directly inline with the hot path. This allows the runtime to implement inline caches with specific instruction sequences that can be patched as program behavior evolves. Finally, runtime calls may act as code invalidation points. A runtime event, such as a potential change in object layout, may invalidate speculatively optimized code. In this case WebKit emits nop patch points that can be overwritten with a separate runtime call at an invalidation event. This effectively invalidates all code that follows the original runtime call.
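The inline caches mentioned above can be modelled at a high level as a per-call-site memo of the last object layout seen. In the sketch below the "patching" is just updating two fields; the class `PropertyCache`, the tuple-based shapes and the property names are hypothetical and stand in for the machine code sequence the runtime would actually rewrite.

```python
class PropertyCache:
    """A call-site cache for property loads, keyed on the object's shape."""

    def __init__(self, name):
        self.name = name
        self.cached_shape = None   # stands in for the patched fast-path check
        self.cached_slot = None
        self.misses = 0

    def load(self, obj):
        shape, slots = obj
        if shape is self.cached_shape:
            # Fast path: the check the runtime patched in still matches.
            return slots[self.cached_slot]
        # Slow path: look the property up, then repatch the fast path.
        self.misses += 1
        self.cached_shape = shape
        self.cached_slot = shape.index(self.name)
        return slots[self.cached_slot]


# A shape is a tuple of property names; objects pair a shape with slot values.
point_shape = ("x", "y")
cache = PropertyCache("y")
first = cache.load((point_shape, [1, 2]))
second = cache.load((point_shape, [3, 4]))
```

Only the first load takes the slow path; later loads from objects with the same shape go straight to the cached slot.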

Some type checks result in multiple fast paths. For example, WebKit may check a numeric value for either a floating-point or fixed point representation and emit LLVM IR for both paths. This may result in a sequence of redundant checks interleaved with control flow merges.

To support integer overflow checks, when they cannot be removed through optimization, FTL emits llvm.sadd.with.overflow intrinsics in place of normal add instructions. These intrinsics ensure that the code generator produces an optimal code sequence for the overflow checks. They are also used by other active LLVM projects and are gradually gaining support within LLVM optimization passes.
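The behaviour of llvm.sadd.with.overflow, which yields both the wrapped sum and an overflow bit, can be modelled as follows. This is a sketch of the semantics only; the Python function name and the default width of 32 bits are choices made for the example.

```python
def sadd_with_overflow(a, b, bits=32):
    """Return (wrapped_sum, overflowed) for signed two's complement addition."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    total = a + b
    overflowed = total < lo or total > hi
    # Wrap into the signed range, as the hardware add would.
    wrapped = (total - lo) % (1 << bits) + lo
    return wrapped, overflowed
```

In FTL-style IR the overflow bit feeds the branch to the failure path, so the common case costs one add plus a branch on the flag.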

LLVM heuristics are often sufficient to guess branch probability. However, FTL makes the job easier by directly emitting LLVM branch weight meta-data based on profiling. This is particularly important when partially compiling a method starting at the inner loop. Such compilations can squash nested loops so that LLVM's heuristics can no longer infer the loop depth from the CFG structure.

FTL builds an internal model of the JavaScript program's type system determined by profiling. It conveys this information to LLVM via type-based-alias-analysis (tbaa) meta-data. In FTL tbaa, each object field has a unique tag. This is a very effective approach to memory disambiguation, and much simpler than the access-path scheme that clang now uses.
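To see why per-field tags help, consider a toy redundant-load elimination that treats a store as clobbering only loads with the same tag. This is a deliberately flat simplification and not LLVM's tbaa implementation: the function name, the tuple encoding and the field names are hypothetical.

```python
def count_reusable_loads(ops):
    """Count loads that can reuse an earlier load of the same location.

    ops is a list of (kind, tag, address) tuples, kind being "load" or "store".
    A store only invalidates remembered loads that carry the same tag.
    """
    available = set()   # (tag, address) pairs whose value is already known
    reused = 0
    for kind, tag, address in ops:
        if kind == "load":
            if (tag, address) in available:
                reused += 1
            available.add((tag, address))
        else:
            # Without tags, every store would have to clear the whole set.
            available = {(t, a) for (t, a) in available if t != tag}
    return reused


different_field = [("load", "length", "p"), ("store", "value", "q"), ("load", "length", "p")]
same_field = [("load", "length", "p"), ("store", "length", "q"), ("load", "length", "p")]
```

In the first sequence the second load can reuse the first, because a store to a differently tagged field cannot have changed it.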

Another way that FTL deviates from the norm is in its use of inttoptr instructions. These are used to materialize addresses of runtime objects, including all data and code from outside the current compilation unit (currently a single method at a time). inttoptr is also used to convert an untyped JS value to a pointer. Occasionally, pointer arithmetic is performed on non-pointer types rather than using getelementptr instructions. This is primarily a convenience and has not proven to hinder optimization. FTL's use of tbaa is effective enough to obviate the need to analyze getelementptr when the base address is already an unknown object.

An important pattern that occurs in FTL's LLVM IR is the repeated use of the same large constants that are used as masks to disambiguate tagged values, or several constants that represent global addresses that tend to be at small offsets from each other. LLVM's current one-basic-block-at-a-time code generation approach resulted in redundant rematerialization of the same large constant in each basic block. The fact that FTL creates a large number of basic blocks even further exacerbated this problem. The LLVM code generator has been enhanced to avoid this expensive repeated rematerialization of such constant values.


The FTL JIT successfully leverages LLVM's existing MCJIT framework for runtime compilation. MCJIT was designed as a low-level toolkit that allows runtime compilers to be built by reusing as much of the static compiler's machinery as possible. This approach improves maintainability on the LLVM side. It integrates with the existing compiler toolchain and allows developers to test features of the runtime compiler without understanding a particular JIT client. The current API, however, does not provide a simple out-of-the-box abstraction for portable JITs. Overcoming the impedance mismatch between WebKit goals and the low-level MCJIT API required close collaboration between WebKit and LLVM engineers. As LLVM becomes more important as a JIT platform, it should provide a more complete C API to improve interoperability with JIT clients and decrease the fragility and maintenance burden within the client code base.

Bridging the gap between LLVM internals and portable JITs can be accomplished by providing more convenience wrappers around the existing MCJIT framework and adding richer C APIs for object code parsing and introspection. Ideally, a cross-platform JIT client like WebKit should not need to embed target-specific details about LLVM code generation on the client side. The JIT should be able to request LLVM to emit code for the current host process without understanding LLVM's language of target triples and CPU features. LLVM could generally provide a more obvious C API for lazily invoking runtime compilation. Along these lines, a JIT should be able to reuse the MCJIT execution engine for multiple modules without the overhead of reinitializing pass manager instances each time. An API also needs to be added for configuring the code generation pass manager. Most of the coordination between the JIT and LLVM now occurs directly through a memory manager API, which can be awkward for the JIT client. For example, WebKit looks for platform-specific section names when allocating section memory in order to locate frame meta-data and debug information. A better interface for WebKit would be a portable API that communicates object code meta-data, including frame information and stack maps. In general, the JIT codebase should not need to provide its own support for platform-specific object file formats. LLVM already has this support; it only needs to be exposed through the C API. Similarly, a JIT should be able to look up line numbers without implementing its own DWARF parser. An additional layer of functionality for general purpose debug info parsing and object code introspection would not be specific to JIT compilation and could benefit a variety of LLVM clients.

Linking WebKit with LLVM

FTL illustrates an important use case for LLVM: embedding LLVM optimization and codegen libraries cleanly within a larger application running in the same process. The ideal solution is to build a set of LLVM components as a shared library that exports only a limited C API. Several problems have made this a challenging endeavor:
  • The dynamic link time initialization overhead of the static initializers that LLVM defines is unacceptable at program launch time - especially if only parts of the library or nothing at all are used.
  • LLVM initializes global variables that require running exit-time destructors. This causes a multi-threaded parent application that attempts to exit normally to crash instead.
  • As with static initializers, weak vtables introduce an unnecessary and unacceptable dynamic link time overhead.
  • In general only a limited set of methods - the LLVM API - should be exported from the shared library.
  • LLVM usurps process-level API calls like assert, raise, and abort.
  • The resulting size of the LLVM shared library naively built from static libraries is larger than it needs to be. Build logic and conditional compilation should be added to ensure that only the passes and platform support required by the JIT client are ultimately linked into the shared library.
The issues listed above have required clever engineering tricks to circumvent. These are the sort of tricks that hinder adoption of LLVM. Therefore it would be in the best interest of the LLVM community to cooperate on improving the infrastructure for embedding LLVM.

FTL Efficiency

The LLVM optimizer and code generator are composed of generic, retargetable components designed to generate optimal code across an extremely diverse range of platforms. The compile time cost of this infrastructure is substantial and may be an order of magnitude greater than that of a custom-built JIT. Fortunately, WebKit's architecture for concurrent, tiered compilation largely sidesteps this penalty. Nonetheless, there is considerable opportunity to reengineer LLVM for use as a JIT, which will decrease FTL's CPU consumption and increase the breadth of JavaScript applications that benefit from FTL.

When running in a JIT environment, an opportunity exists for LLVM to strike a better balance between compile time and optimization strength. To this end, an alternate "compile-fast" optimization pass pipeline should be standardized so that the LLVM community can work together to maintain an ideal sequence of lighter-weight passes. Long running, iterative IR optimization passes, such as GVN, should be adapted to optionally run in fewer iterations. Hodge-podge passes like InstCombine that run many times should be optionally broken up so that some subset of functionality can run at different times: for example, canonicalize first and optimize later.

There are also considerable opportunities for improving code generation efficiency which will benefit JITs and static compilers alike. LLVM machine IR should be generated directly from LLVM IR without generating a Selection DAG, as proposed by Jakob Olesen in his Proposal for a global instruction selector. The benefit of this improvement would be considerable and widespread. More specific to high level languages, codegen passes should be tuned to handle branchy code more efficiently. For example, the register allocator can be taught to skip expensive analysis at points in the code where branches are not expected to be executed.

One overhead that will remain with the above improvements is simply the cost of bridging WebKit's DFG IR into LLVM IR. This involves lowering to SSA form and constructing LLVM instructions, which currently takes a significant amount of time relative to DFG's non-LLVM codegen path. With some scrutiny, this could likely be made more efficient.

Optimization Improvements

Without incurring significant compile time increase, LLVM optimizations can be further improved to handle prevalent idioms in JavaScript programs. One straightforward LLVM IR enhancement would be to associate type-based alias information with call sites. This would improve redundant instruction elimination across runtime calls and patch points. Another area of improvement would be better handling of branch-and-merge idioms. These are quite common in FTL produced IR and can be improved through CFG simplification, jump threading, or tail duplication. With careful pass pipeline management, loop optimizations can be enabled, such as auto-vectorization. Once LLVM is analyzing loops, bounds and overflow check elimination optimization can also be implemented. To do this well, patch points will need to be extended with new semantics.

Extending Patch Points

In settings like JavaScript and other high level languages, patch points will be used to transfer control to the runtime when speculative optimization fails in the sense that the program behaves differently than predicted. It is always safe to assume a misprediction and give control back to the runtime because the runtime always knows how to recover. Consequently, patch points could optionally be associated with a check condition and given the following semantics: the patch point code sequence must be executed whenever the condition holds, but may safely be executed at its current location under any superset of the condition. When combined with LLVM loop optimization, the conditional patch point semantics would allow powerful optimization of runtime checks. In particular, bounds and overflow checks could be safely hoisted outside loops. For example, the following simplified IR:

%a = cmp <TrapConditionA>
call @patchpoint(1, %a, <state-before-loop>)
%b = cmp <TrapConditionB>
call @patchpoint(2, %b, <state-inside-loop>)
<do something...>

Could be safely optimized into:

%c = cmp <TrapConditionC> // where C is implied by both A and B
call @patchpoint(1, %c, <state-before-loop>)
<do something...>
Note that the first patch point operand is an identifier that tells the runtime the program location of the intrinsic, allowing it to find the correct stack map record for the program state at that location. After the above optimization, not only does LLVM avoid performing repeated checks within the loop, but it also avoids maintaining additional runtime state throughout the loop body.
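The effect of hoisting a check out of a loop can be imitated in ordinary code. The sketch below is an analogy rather than LLVM IR: the exception `Deoptimize` stands in for handing control back to the runtime, and the function names and the choice of a bounds check are assumptions made for the example.

```python
class Deoptimize(Exception):
    """Stands in for handing control back to the runtime at a patch point."""


def sum_checked_in_loop(values, n):
    total = 0
    for i in range(n):
        if i >= len(values):          # per-iteration trap condition
            raise Deoptimize("patchpoint 2")
        total += values[i]
    return total


def sum_check_hoisted(values, n):
    # A single check that covers every per-iteration check. It may fire
    # earlier than strictly needed, which is safe because the runtime can
    # always recover.
    if n > len(values):
        raise Deoptimize("patchpoint 1")
    total = 0
    for i in range(n):
        total += values[i]
    return total
```

Both versions return the same result when the speculation holds, and both bail out to the runtime when it does not; the hoisted version simply bails out before doing any work in the loop.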

Generally, high level optimization requiring knowledge of language-specific semantics is best performed on a higher level IR. But in this case, extending LLVM with one aspect of high level semantics allows LLVM's loop and expression analysis to be directly leveraged and naturally extended into a new class of optimization.
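
The check hoisting described above can be mimicked in plain C++. This is only a sketch of the idea, not FTL or LLVM code: sumChecked and sumHoisted are hypothetical names, and returning false stands in for handing control back to the runtime through a patch point.

```cpp
#include <cstddef>
#include <vector>

// Per-iteration check: the bounds test runs on every trip through the loop,
// and a failure bails out from inside the loop (like patch point 2).
bool sumChecked(const std::vector<int> &v, std::size_t n, long &out) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (i >= v.size())
      return false; // stand-in for transferring control to the runtime
    sum += v[i];
  }
  out = sum;
  return true;
}

// Hoisted check: one conservative test before the loop (like patch point 1
// with the combined condition). It fires whenever any in-loop check would
// have fired, so the loop body itself needs no checks.
bool sumHoisted(const std::vector<int> &v, std::size_t n, long &out) {
  if (n > v.size())
    return false; // bail out early, before any loop state exists
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += v[i];
  out = sum;
  return true;
}
```

Both functions bail out on exactly the same inputs; the hoisted version simply does so earlier, which is what the "superset of the condition" rule permits.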


WebKit's FTL JIT already shows considerable value in improving JavaScript performance, demonstrating LLVM's remarkable success as a backend for a JavaScript JIT compiler. The FTL project highlights the value of further improving LLVM's JIT infrastructure and reveals several exciting opportunities: improved efficiency of optimization passes and codegen, optimizations targeted toward common idioms present in high level languages, enabling more aggressive standard optimizations like vectorization, and extending and formalizing patch point intrinsics. Realizing these goals will require the continued support of the LLVM community and will advance and improve the LLVM project as a whole.

by Andrew Trick at July 17, 2014 04:39 PM

July 14, 2014


LLVM Weekly - #28, Jul 14th 2014

Welcome to the twenty-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

I'll be at the GNU Tools Cauldron 2014 next weekend, being held at the University of Cambridge Computer Laboratory (which handily is also where I work). If you're there, do say hi.

The canonical home for this issue can be found here at

News and articles from around the web

An update on Clang/LLVM on Windows has been posted on the LLVM blog. Impressive progress has been made, and as I mentioned last week the MSVC compatibility page has been updated.

There is (somewhat amazingly) now a Pascal-86 frontend for LLVM. The compiler frontend is written entirely in Python. More information is available in the author's Master's thesis (via Phoronix).

On the mailing lists

LLVM commits

  • FastISel gained some infrastructure to support a target-independent call lowering hook as well as target-independent lowering for the patchpoint intrinsic. r212848, r212849.

  • DominanceFrontier has been templatified, so in theory it can now be used for MachineBasicBlocks (where previously it was only usable with BasicBlocks). r212885.

  • The quality of results for CallSite vs CallSite BasicAA queries has been improved by making use of knowledge about certain intrinsics such as memcpy and memset. r212572.

  • Work on overhauling x86 vector lowering continues. Chandler reports that with the new codepath enabled, LLVM is now at performance parity with GCC for the core C loops of the x264 code when compiling for SSE2/SSE3. r212610.

  • ASM instrumentation for AddressSanitizer is now generated entirely in MachineCode, without relying on runtime helper functions. r212455.

  • Generation of the new mips.abiflags section was added to the MIPS backend. r212519.

  • isDereferenceablePointer will now look through some bitcasts. r212686.

Clang commits

  • A new checker was added, to flag code that tests a variable for 0 after using it as a denominator (implying a potential division by zero). r212731.

  • Clang gained initial support for omp parallel for, the omp parallel sections directive, and omp task. r212453, r212516, r212804.

  • On the ARM target, LLVM's atomicrmw instructions will be used when ldrex/strex are available. r212598.

  • Support was added for mips-img-linux-gnu toolchains. r212719.
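
The pattern targeted by the new division-by-zero checker mentioned above can be illustrated as follows. This is a hand-written sketch with hypothetical function names; the exact conditions under which the analyzer reports, and the wording of its diagnostic, are not reproduced here.

```cpp
// Suspicious: 'count' is used as a denominator and only afterwards compared
// with zero, which suggests the author believed it could be zero.
int averageSuspicious(int total, int count) {
  int avg = total / count;
  if (count == 0)
    return 0;
  return avg;
}

// Safer ordering: test first, divide second.
int averageChecked(int total, int count) {
  if (count == 0)
    return 0;
  return total / count;
}
```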

Other project commits

  • ThreadSanitizer's deadlock detector is enabled by default after being battle-tested on the Chromium codebase for some time. r212533.

  • Support for Android's bionic C library has been added to libcxx. r212724.

  • LLDB's Python scripting interface should now work on Windows. r212785.

by Alex Bradbury at July 14, 2014 01:49 PM

July 09, 2014


LLVM Weekly - #25, Jun 23rd 2014

Welcome to the twenty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Facebook have released a number of clang plugins they have been using internally. This includes plugins to the clang analyzer primarily for iOS development as well as a plugin to export the Clang AST to JSON. The code is available on Github and they have started a discussion on the mailing list about moving some of this code upstream.

This week saw the release of LLVM and Clang 3.4.2. This is a bug-fix release which maintains ABI and API compatibility with 3.4.1.

Clang's C++ status page now lists C++1z feature status.

On the mailing lists

  • Rafael Espíndola has started a thread to discuss clarification on the backward compatibility promises of LLVM. He summarises what seems to be the current policy (old .bc is upgraded upon read, there is no strong guarantee on .ll compatibility). Much of the subsequent discussion is about issues such as compatibility with metadata format changes.

  • Duncan P.N. Exon Smith has posted a review of the new pass manager in its current form. He starts with a high-level overview of what Chandler Carruth's new PassManager infrastructure offers and has a list of queries and concerns. There are no responses yet, but it's worth keeping your eyes on this thread if you're interested in LLVM internals development.

  • This week has brought two separate proposals for LLVM code coverage support (neither of which have any replies at the time of writing). Christian Holler has proposed inclusion of LLCov code. This is a module pass that instruments basic blocks with calls to functions that will track coverage. The current LLCov code is available on Github. Alex L has also posted a detailed proposal on improving code coverage support for Clang and LLVM. He is looking for feedback on the approach before starting to submit patches.

LLVM commits

  • The LLVM global lock is dead, and the LLVM Programmer's Manual has been updated to reflect this. llvm_start_multithreaded and llvm_stop_multithreaded have been removed. r211277, r211287.

  • The patchset to improve MergeFunctions performance from O(NxN) to O(N x log(N)) has finally been completely merged. r211437, r211445 and more.

  • Range metadata can now be attached to call and invoke (previously it could only be attached to load). r211281.

  • ConvertUTF in the Support library was modified to find the maximal subpart of an ill-formed UTF-8 sequence. r211015.

  • LoopUnrollPass will now respect loop unrolling hints in metadata. r211076.

  • The R600 backend has been updated to make use of LDS (Local Data Share) and vectors for private memory. r211110.

  • X86FastISel continues to improve with optimisation for predicates, cmp folding, and support for 64-bit absolute relocations. r211126, r211130.

  • The SLPVectorizer (superword-level parallelism) will now recognize and vectorize non-SIMD instruction patterns like sequences of fadd,fsub or add,sub. These will be vectorized as vector shuffles if they are profitable. r211339.

  • LLVM can now generate native unwind info on Win64. r211399.

Clang commits

  • Clang's OpenMP implementation now contains initial support of the 'reduction' clause, #pragma omp for, the 'schedule' clause, the 'ordered' clause, and the 'nowait' clause. r211007, r211140, r211342, r211347, r211352.

  • MS ABI support continues with the merging of support for x86-64 RTTI. r211041.

  • The -std=c++1z flag was added to enable support for C++17 features. r211030.

  • The clang User's Manual has been expanded with documentation for profile-guided optimisation with instrumentation. r211085.

  • Emission of ARM NEON intrinsics has been totally rewritten to be easier to read and maintain as well as to provide better protection against coding errors. r211101.
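
For readers unfamiliar with the OpenMP constructs named above, here is a small sketch of source code that uses #pragma omp for together with the 'reduction' and 'schedule' clauses. The function name sumTo is hypothetical, and the sketch makes no claim about which Clang revision accepts which clause. When OpenMP is not enabled, the pragmas are ignored and the loop runs serially with the same result.

```cpp
// Sum the integers 1..n. With OpenMP enabled, loop iterations are divided
// statically among the threads of the parallel region and each thread's
// partial sum is combined by the reduction clause.
long sumTo(int n) {
  long sum = 0;
#pragma omp parallel
  {
#pragma omp for reduction(+ : sum) schedule(static)
    for (int i = 1; i <= n; ++i)
      sum += i;
  }
  return sum;
}
```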

Other project commits

  • compiler-rt now offers add, sub, and mul for IEEE quad precision floating point. r211312, r211313.

by Alex Bradbury at July 09, 2014 09:18 AM

July 08, 2014

Aaron Ballman

Member Function Ref Qualifiers

One of the lesser-known features of C++11 is the fact that you can overload your non-static member functions based on whether the implicit this object parameter is an lvalue reference or an rvalue reference by specifying a function's ref-qualifier. This feature works similarly to the way cv-qualifiers work when specifying that a method must be called on a const or volatile object, and can in fact be combined with cv-qualifiers.

To specify a ref-qualifier for a member function, you can qualify the function with either & or &&. (The ref-qualifier must come after any cv-qualifiers.) For instance, if you wanted to declare a function to be called on an rvalue reference object only, you would write:

struct S {
  void func() &&;
};

S s1;
s1.func(); // Ill-formed
S().func(); // OK

If you want to overload a function based on the rvalue-ness of the implicit object parameter, you must specify the ref-qualifier for both functions.

struct S {
  void func() &;
  void func() &&;
};

S s1;
s1.func(); // OK, calls S::func() &
S().func(); // OK, calls S::func() &&
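
Since ref-qualifiers can be combined with cv-qualifiers, the overload set can distinguish three cases at once. The following is a minimal sketch; Probe and which are hypothetical names used only to make the selected overload observable.

```cpp
#include <string>
#include <utility>

struct Probe {
  // Chosen for non-const lvalues.
  std::string which() & { return "lvalue"; }
  // Chosen for const lvalues; the cv-qualifier comes before the ref-qualifier.
  std::string which() const & { return "const lvalue"; }
  // Chosen for rvalues, including the result of std::move.
  std::string which() && { return "rvalue"; }
};
```

Calling which() on a named object, on a const object, and on Probe() or std::move(obj) selects the first, second, and third overload respectively.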

Overloading based on a ref-qualifier is useful in (somewhat rare) circumstances where your object can make use of move semantics to reduce expensive construction costs. For instance:

#include <iostream>
#include <utility>

class ExpensiveState {}; // Details unimportant

class Builder {
  ExpensiveState State;

public:
  Builder() = default;
  Builder(const Builder &O) : State(O.State) {
    std::cout << "Copy" << std::endl;
  }
  Builder(Builder &&O) : State(std::move(O.State)) {
    std::cout << "Move" << std::endl;
  }

  Builder operator()() & {
    return Builder(*this);
  }

  Builder operator()() && {
    return Builder(std::move(*this));
  }
};

int main() {
  Builder b;
  b()()()();
}


When executed, this code will output: Copy Move Move Move. The Copy is because b is an lvalue, not an rvalue, and so operator()() & will be called. However, the results of that function are an rvalue, and so the subsequent subexpressions will result in calling operator()() &&. Due to this, resources can be stolen from one invocation to the next on the last three subexpressions, reducing the performance penalties of a copy operation.

In case you are wondering why std::move(*this) is used when constructing a Builder object: the unary expression *this always results in an lvalue, which would end up calling the copy constructor instead of the move constructor. So the std::move call is required to convert the lvalue into an rvalue.
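
The same reasoning applies to data members: inside an &&-qualified member function, a member named directly is still an lvalue and has to be moved explicitly. A common use of this is an accessor pair, sketched below with a hypothetical Message class that is not part of the original example.

```cpp
#include <string>
#include <utility>

class Message {
  std::string Text;

public:
  explicit Message(std::string T) : Text(std::move(T)) {}

  // Called on lvalues: hand out a reference, no copy is made.
  const std::string &text() const & { return Text; }

  // Called on rvalues: the object is about to expire, so move the buffer out.
  // Without std::move, Text would be copied because it is an lvalue here.
  std::string text() && { return std::move(Text); }
};
```

With this pair, Message("temporary").text() moves the string out of the temporary, while calling text() on a named Message only returns a reference.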

Ref-qualifiers are not something you will likely use often. However, it is never a bad thing to understand the tools the programming language has to offer. Note: ref-qualifiers are currently supported by clang (tested with 3.4), gcc (tested with 4.9) but not MSVC 2013.

by Aaron Ballman at July 08, 2014 02:05 PM


Clang/LLVM on Windows Update

It’s time for an update on Clang’s support for building native Windows programs, compatible with Visual C++! We’ve been working hard over the last few months and have improved the toolchain in a variety of ways. All C++ features aside from debug info and exceptions should work well. This link provides more specific details. In February we reached an exciting milestone: we can self-host Clang and LLVM using clang-cl (without fallback), and both projects pass all of their tests! Additionally, both Chrome and Firefox now compile successfully with fallback! Here are some of the highlights of recent improvements:

Microsoft compatible record layout is done!  It’s been thoroughly fuzz tested and supports all Microsoft specific components such as virtual base table pointers, vtordisps, __declspec(align) and #pragma pack.  This turned out to be a major effort due to subtle interactions between various features.  For example, __declspec(align) and #pragma pack behave in an analogous manner to the gcc variants, but interact with each other in a different manner. Each version of Visual Studio changes the ABI slightly.  As of today clang-cl is layout compatible with VS2013.

Clang now supports all of the calling conventions used up to VS2012.  VS2013 added some new ones that we haven’t implemented yet.  One of the other major compatibility challenges we overcame was passing C++ objects by value on 32-bit x86.  Prior to this effort, LLVM modeled all outgoing arguments as SSA values, making it impossible to take the address of an argument to a call.  It turns out that on Windows C++ objects passed by value are constructed directly into the argument memory used for the function call.  Achieving 100% compatibility in this area required making fundamental changes to LLVM IR to allow us to compute this address.

Most recently support for run time type information (RTTI) was completed.  With RTTI support, a larger set of programs and libraries (for example ICU) compile without fallback and dynamic_cast and typeid both work.  RTTI support also brings along support for std::function.  We also recently added support for lambdas so you can enjoy all of the C++11 functional goodness!

We invite you to try it out for yourself and, as always, we encourage everyone to file bugs!

by Unknown at July 08, 2014 03:34 AM

July 07, 2014


LLVM Weekly - #27, Jul 7th 2014

Welcome to the twenty-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

An LLVM code generator has been merged into the MLton whole-program optimizing compiler for Standard ML. This was written by Brian Leibig as part of his Master's thesis, which contains more information on its performance and design.

Eli Bendersky has written a tool which converts the output of Clang's -ast-dump to HTML. See here for an example. The code is available on Github.

Clang's Microsoft Visual C++ compatibility page has been updated to reflect the status of the current SVN trunk. As can be seen from the relevant diff, record layout has been marked complete along with RTTI. Lambdas are now marked mostly complete.

On the mailing lists

LLVM commits

  • The X86 backend now expands atomics in IR instead of as MachineInstrs. Doing the expansions at the IR level results in shorter code and potentially there may be benefit from other IR passes being able to run on the expanded atomics. r212119.

  • The ARM backend learned the ISB memory barrier intrinsic. r212276.

  • The X86 backend gained support for __builtin_ia32_rdpmc which is used to read performance monitoring counters. r212049.

  • The peephole optimizer gained new code (currently disabled) to rewrite copies to avoid copies across register banks. r212100.

  • Control flow graph building code has been moved from MC to a new MCAnalysis library. r212209.

  • TableGen gained support for MSBuiltin, which allows for adding intrinsics for Microsoft compatibility. r212350.

Clang commits

  • MSVC RTTI (run-time type information) implementation has been completed. r212125.

  • The __builtin_arm_ldaex and __builtin_arm_stlex intrinsics were added. r212175.

  • Nested blocks are now supported in Microsoft inline assembly. r212389.

Other project commits

  • lldb-gdbserver support has been merged for Linux x86-64. r212069.

  • AddressSanitizer gained support for i686-linux-android. r212273.

  • libcxxabi gained a CMake build system. r212286.

  • lld now supports parsing of x86 and ARM/Thumb relocations for MachO. r212239, r212306.

by Alex Bradbury at July 07, 2014 02:35 PM

July 01, 2014

OpenMP Runtime Project

History of the OpenMP Standard

We have created a fun infographic on the history of the OpenMP standard which has been published in the Intel Parallel Universe (pdf). The folks over at liked it so much it’s currently their headline news. We now understand why “a picture is worth a thousand words”, since this took as much effort as writing 5,000!

We hope you enjoy it and find it informative.

by Terry Wilmarth (Intel) at July 01, 2014 07:35 PM

June 30, 2014


LLVM Weekly - #26, Jun 30th 2014

Welcome to the twenty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Vladimir Makarov has done his yearly comparison of GCC and LLVM, posting performance comparisons using SPECInt2000 on ARM and x86-64.

Version 0.13.0 of LDC, the LLVM-based D compiler has been released. This brings a whole host of improvements, listed in detail within the release announcement.

Some Mozilla engineers have been looking at using clang-cl (the MSVC-compatible Clang driver) to build Firefox. With the help of the fallback flag (which falls back to compiling with MSVC if Clang fails) they've managed to get a completed build. Ehsan tells us that 602 of the 7168 files, about 8%, require the MSVC fallback at the moment.

Trail of Bits have posted a preview of McSema, a framework for translating x86 binaries to LLVM bitcode. The accompanying talk took place on the 28th June, so hopefully we'll hear more about this soon. The blog post tells us that McSema will be open source and made available soon.

Bruce Mitchener has written up his experience of integrating with LLDB for Dylan.

Codeplay (based in Edinburgh) are advertising for a full time compiler engineer.

On the mailing lists

LLVM commits

  • A significant overhaul of how vector lowering is done in the x86 backend has been started. While it's under development it's off by default, though it's hoped that in time there will be measurable performance improvements on benchmarks conducive to vectorization. r211888 and more.

  • X86 FastISel will use EFLAGS directly when lowering select instructions if the condition comes from a compare. It also now supports floating-point selects among other improvements. r211543, r211544, and more.

  • ScaledNumber has been split out from BlockFrequencyInfo into the Support library. r211562.

  • The loop vectorizer now features -Rpass-missed and -Rpass-analysis reports. r211721.

  • The developer documentation has been updated to clarify that although you can use Phabricator to submit code for review, you should also ensure the relevant -commits mailing list is added as a subscriber on the review and be prepared to respond to comments there. r211731.

  • COMDATs have been added to the IR. What's a COMDAT? StackOverflow has you covered. r211920.

  • The NVPTX backend saw a whole series of commits. r211930, r211932, r211935, and more.

  • LLVM gained an abstraction for a random number generator (RNG). r211705.

Clang commits

  • A nice little diagnostic improvement has been added for when the user accidentally puts square brackets before the identifier, e.g. int [4] foo;. r211641.

  • OpenMP learned the 'section' directive (and some more, see the full commit logs). r211685, r211767.

Other project commits

  • Support for ARM EHABI unwinding was added to libunwind. r211743.

  • The lldb Machine Interface gained a number of new commands and bug fixes. r211607.

by Alex Bradbury at June 30, 2014 06:53 PM

June 23, 2014

Aaron Ballman

Binary Operator Overloading

In C++, there are two forms of binary operator overloading you can use when designing an API. The first form is to overload the operator as a member function of the class, and the second form is to overload the operator as a friend function of the class. I want to explore why you would use one form of overloading instead of the other, using a Fraction class as an example.

For the purposes of this discussion, this is part of the interface for our expository class.

class Fraction {
  // Implementation details live here.

public:
  Fraction(int Whole);
  Fraction(int Numerator, unsigned Denominator);
  Fraction(double Value);

  // Binary operator overloads live here.
};

One of the ways we can implement our binary operator overloads is as member functions of the Fraction class. I’m going to pick on the equality operator, but any of the overloaded binary operators would suffice.

  // Binary operator overloads live here.
  bool operator==(const Fraction &RHS) const;

The other way we can implement our binary operator overloads is as a friend function of the Fraction class.

  // Binary operator overloads live here.
  friend bool operator==(const Fraction &LHS, const Fraction &RHS);

Since there are two different ways to implement this, it’s reasonable to ask which way is “correct?” The answer to that question depends on your intentions as a class designer. Consider the following use case:

void f(const Fraction &F) {
  if (1.0 == F) {
    // Do something interesting
  }
}

Some coding conventions suggest that equality comparisons against a constant value put the constant on the left-hand side of the comparison (so that an accidental assignment operation by typing = instead of == would trigger a compile error), so this example is not particularly far-fetched.

If you use a member function for the operator overload, this code would not compile because there’s no way for the implicit converting constructor from double to Fraction to be called. However, by using a friend function for the operator overload, the compiler can call the converting constructor to create a Fraction object which would make the comparison viable. Because of this, I would claim that declaring the operators to be friends is the correct approach for the class design.

This exemplifies a reasonable way to decide how to implement the overloaded binary operators. If you want to allow implicit conversions for items on the left-hand side of the operator, then using friend function overloads is required. If implicit conversions are not desirable for some reason, or not possible (due to having no implicit converting constructors), then using a member function is acceptable. If you’re looking for a general rule of thumb, I would recommend always using the friend function form — it’s more likely to behave how the user would expect in all cases, instead of having curious edge cases where their usage fails. Imagine how confusing it would be for the user of a Fraction class that SomeFraction * 1 succeeds, but 1 * SomeFraction fails to compile! That being said, it ultimately boils down to a design choice that you must make as a class designer.
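
To make the difference concrete, here is a compilable sketch that puts both forms side by side. MemberFraction and FriendFraction are hypothetical, heavily simplified stand-ins for the Fraction class (they store a single double rather than a numerator and denominator). One caveat worth stating: the rejected line is rejected under C++11, C++14 and C++17; C++20 additionally considers operator== with its arguments reversed, which changes the outcome for the member form.

```cpp
// Member form: only the right-hand operand can be implicitly converted.
class MemberFraction {
  double Value;

public:
  MemberFraction(double V) : Value(V) {}
  bool operator==(const MemberFraction &RHS) const {
    return Value == RHS.Value;
  }
};

// Friend form: both operands are ordinary parameters, so either one can be
// implicitly converted from double.
class FriendFraction {
  double Value;

public:
  FriendFraction(double V) : Value(V) {}
  friend bool operator==(const FriendFraction &LHS,
                         const FriendFraction &RHS) {
    return LHS.Value == RHS.Value;
  }
};

bool memberOnRight(const MemberFraction &F) { return F == 1.0; }
// Rejected before C++20: no conversion is applied to the left-hand operand.
// bool memberOnLeft(const MemberFraction &F) { return 1.0 == F; }
bool friendOnLeft(const FriendFraction &F) { return 1.0 == F; }
```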

I would like to thank Jens Maurer for the design discussion which spawned this blog posting.

by Aaron Ballman at June 23, 2014 02:38 PM


LLVM Weekly - #24, Jun 16th 2014

Welcome to the twenty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

On the mailing lists

LLVM commits

  • A weak variant of cmpxchg has been added to the LLVM IR, as has been argued for on the mailing list. Weak cmpxchg allows failure and the operation returns {iN, i1} (in fact, for uniformity all cmpxchg instructions do this now). According to the commit message, this change will mean legacy assembly IR files will be invalid but legacy bitcode files will be upgraded during read. r210903.

  • X86 FastISel gained support for handling a bunch more intrinsics. r210709, r210720 and more. FastISel also saw some target-independent improvements r210742.

  • This week there were many updates to the MIPS backend for mips32r6/mips64r6. e.g. r210899, r210784 and many more.

  • NoSignedWrap, NoUnsignedWrap and Exact flags are now exposed to the SelectionDAG. r210467.

  • Support has been added for variable length arrays on the Windows on ARM Itanium ABI. r201489.

  • Some simple reordering of fields in Value and User saves 8 bytes of padding on 64-bit. r210501.

  • FastISel will now collect statistics on when it fails with intrinsics. r210556.

  • The MIPS backend gained support for jr.hb and jalr.hb (jump register with hazard barrier, jump and link register with hazard barrier). r210654.

  • AArch64 gained a basic schedule model for the Cortex-A57. r210705.

  • LLVM has transitioned to using std::error_code instead of llvm::error_code. r210687.
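
The weak cmpxchg mentioned above has a familiar counterpart in the C++ standard library, std::atomic::compare_exchange_weak, which is likewise allowed to fail spuriously and reports success separately from the observed value. The sketch below shows the usual retry loop; atomicMax is a hypothetical helper, and how a particular compiler lowers it to IR is not something this example asserts.

```cpp
#include <atomic>

// Atomically raise 'target' to at least 'candidate'. Returns the value that
// was observed before any update made by this call.
int atomicMax(std::atomic<int> &target, int candidate) {
  int observed = target.load();
  // compare_exchange_weak may fail even when target == observed, so it is
  // retried in a loop. On failure it stores the current value in 'observed'.
  while (observed < candidate &&
         !target.compare_exchange_weak(observed, candidate)) {
  }
  return observed;
}
```

Because the loop already retries on failure, a spurious failure costs only one extra iteration, which is why the weak form is the natural fit for this kind of code.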

Clang commits

  • The -Wdate-time preprocessor warning from GCC has been implemented. This is useful when trying to create reproducible builds. r210511.

  • Loop unroll pragma support was added. r210667.

  • Yet more progress has been made on MS ABI compatibility. e.g. r210813, r210637.

Other project commits

  • libcxx gained an implementation of string_view as proposed in N4023. r210659.

  • Some of the iOS8/OS X Yosemite specific lldb support has been merged. r210874.

by Alex Bradbury at June 23, 2014 10:33 AM

June 12, 2014

Philip Reames

IR Restrictions for Late Safepoint Placement

The late safepoint placement pass we released recently has a couple of restrictions on the IR it can handle.  I’ve described those restrictions a couple of different times now, so I figured it was time to put them up somewhere I could reference and that google might find.  A shorter version of this post will also appear in the source code shortly.

The SafepointPlacementPass will insert safepoint polls for method entry and loop backedges.  It will also transform calls to non-leaf functions to statepoints.  The former are how the application (mutator) code interacts with the garbage collector and may actually trigger object relocation.  The latter are necessary so that polls in called functions can inspect and modify frames further up the stack.

The current SafepointPlacementPass works for nearly arbitrary IR.  Fundamentally, we require that:

  • Pointer values may not be cast to integers and back.
  • Pointers to garbage collected objects must be tagged with address space #1

In addition to these fundamental limitations, we currently do not support:

  • safepoints at invokes (as opposed to calls)
  • use of indirectbr
  • aggregate types which contain pointers to GC objects
  • pointers to GC objects stored in global variables, allocas, or at constant addresses
  • constant pointers to garbage collected objects (other than null)
  • garbage collected pointers which are undefined (“undef”)
  • use of gc_root

Patches welcome for the latter class of items.  I don’t know of any fundamental reasons they couldn’t be supported.


Fundamentally, a precise garbage collector must be able to accurately identify which values are pointers to garbage collected objects.  We choose to use the distinction between pointer types and non-pointer types in the IR to establish that a particular value is a pointer and use the address space mechanism to distinguish between pointers to garbage collected and non-garbage collected objects.  We don’t require that the types of pointers be precise – in LLVM this would not be a safe assumption! – but we do require that the pointer be a pointer.

We disallow inttoptr instructions and addrspacecast instructions in an effort to ensure this distinction is upheld.  Otherwise, you could have code like the following:

Object* p = …;
int x = (int)p;
foo(); // <- becomes a safepoint, can move objects
Object* p2 = (Object*)x;

Note that while the SafepointPlacementPass will try to check for some violations of this assumption, it will not catch all cases.  At the end of the day, it is the responsibility of the frontend author to get this right.
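
The hazard can be simulated in ordinary C++ without a real collector. In the sketch below, ToyHeap is a hypothetical stand-in that tracks a single object and relocates it on request; it has nothing to do with the actual SafepointPlacementPass. The point is only that the pointer the "collector" knows about gets updated, while an integer copy of the address does not.

```cpp
#include <cstdint>
#include <memory>

struct Object {
  int Payload;
};

// Toy stand-in for a relocating collector that owns exactly one object.
class ToyHeap {
  std::unique_ptr<Object> Storage;

public:
  Object *allocate(int Payload) {
    Storage.reset(new Object{Payload});
    return Storage.get();
  }

  // "Safepoint": copy the object to a new location, update the one pointer
  // we were told about, then release the old location. Integers that hold
  // the old address are invisible here and cannot be fixed up.
  void relocate(Object *&Root) {
    std::unique_ptr<Object> Moved(new Object(*Storage));
    Root = Moved.get();
    Storage = std::move(Moved);
  }
};

// Returns true if the integer copy no longer matches the object's address
// after relocation, while the tracked pointer still reaches the payload.
bool integerCopyGoesStale() {
  ToyHeap Heap;
  Object *P = Heap.allocate(42);
  std::uintptr_t X = reinterpret_cast<std::uintptr_t>(P);
  Heap.relocate(P); // P is updated, X is not
  return P->Payload == 42 && X != reinterpret_cast<std::uintptr_t>(P);
}
```

Casting X back to an Object* after the relocation would yield a dangling pointer, which is the situation the restriction on integer round trips is meant to rule out.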


Now on to the various implementation restrictions.

  • We plan to support safepoints on InvokeInsts.  In fact, the released code already has partial support for this.  This is not a high priority for us at the moment, but should be fairly straightforward to complete if anyone is interested.
  • IndirectBr creates problems for the LoopSimplify pass which we use as a helper for identifying backedges in loops.  Our source language doesn’t have any need for indirect branches, but if anyone can identify a better way to detect backedges which doesn’t involve this restriction, we’d gladly take the patch.
  • Currently, we do not support finding pointers to garbage collected objects contained in first class aggregate types in the IR.  The extensions required to support this are fairly straightforward, but we have no need for this functionality.  Well structured patches are welcome, but since this will be a fairly invasive change, please coordinate the merge early and closely.  (Alternatively, wait until this has been merged into upstream LLVM and use the standard incremental review and commit process.)
  • Note that we have no plans to support untagged unions containing pointers.  We could support tagged pointers, but this would require either extensions to the IR, or language specific hooks exposed in the SafepointPlacementPass.  If you’re interested in this topic, please contact me directly.
  • The support for pointers to GC objects in global variables, allocas, or arbitrary constant memory locations is weak at best.  There’s some code intended to support these cases, but tests are lacking and the code is likely to be buggy.  Patches are welcome.
  • We do not support constant pointers to garbage collected objects other than null.  For a relocating garbage collector, such constant pointers wouldn’t make sense.  If you’re interested in supporting non-relocating collectors or relocating collectors with pinned objects, some extensions may be necessary.
  • We have not integrated the late safepoint placement approach with the existing gcroot mechanism.  Given this mechanism is simply broken, we do not plan to do so.  Instead, we plan to simply remove that support once late safepoint placement lands.  If you’re interested in migrating from one approach to the other, please contact me directly.  I’ve got some ideas on how to make this easy using custom transform passes, but don’t plan on investing any time in this unless requested by interesting parties.


by reames at June 12, 2014 09:47 PM

June 09, 2014


LLVM Weekly - #23, Jun 9th 2014

Welcome to the twenty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Philip Reames has announced that code for late safepoint placement is now available. This is a set of patches to LLVM from Azul Systems that aim to support precise relocating garbage collection in LLVM. Philip has a long list of questions where he is seeking feedback from the community on how to move forwards with these patches. There's not been much response so far; hopefully it will come soon, as I know there are many communities who are very interested in seeing better GC support in LLVM (e.g. Rust, OCaml).

The biggest LLVM news this week is of course the announcement of the Swift programming language from Apple. Its development was led by Chris Lattner, original author of LLVM. He has some more info about Swift on his website. There is no source release as of yet, and no indication from Apple as to whether it will remain proprietary. Either way, it's an interesting development. Chris Lattner is now on Twitter and has been passing out tidbits about the Swift implementation.

LunarG have announced the Glassy Mesa project. This project, funded by Valve, will explore increasing game performance in Mesa through improvements in the shader compiler. The current parser and optimisation layer are replaced with glslang and the LLVM-based LunarGlass. More technical details are available in the slide deck.

Sébastien Métrot has released xspray, a frontend for lldb on OS X. One of its interesting features is the inbuilt support for plotting your data.

With all the LLVM news recently, it seems search traffic for 'llvm' has skyrocketed.

On the mailing lists

LLVM commits

  • The jumptable attribute has been introduced. If you mark a function with this attribute, references to it can be rewritten with a reference to the appropriate jump-instruction-table function pointer. r210280.

  • Support was added for Windows ARM exception handling data structures, including decoding them. r209998, r210192.

  • GlobalAlias can now point to an arbitrary ConstantExpression. See the commit message for a discussion of the consequences of this. r210062.

  • The superword-level parallelism (SLP) vectorizer has been extended to support vectorization of getelementptr expressions. r210342.

  • The LLVM programmer's manual has been improved with an example of using IRBuilder. r210354.

Clang commits

  • Semantic analysis to make sure a loop is in OpenMP canonical form has been committed. r210095.

  • __builtin_operator_new and __builtin_operator_delete have been added. These permit some optimisations that would not be allowed on ::operator new, and are intended for implementing things like std::allocator. r210137.

  • New pragmas have been introduced to give optimisation hints for vectorization and interleaving. You can now use #pragma clang loop vectorize(enable) as well as vectorize(disable), vectorize_width(n), interleave(enable/disable), and interleave_count(n). r210330.

  • Support for the MSVC++ ABI continues with the addition of dynamic_cast for MS. r210377.

  • Support for global named registers has been expanded slightly to allow pointer types to be held in these variables. r210274.

  • GCC's -Wframe-larger-than=bytes diagnostic is now supported. r210293.
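As a rough illustration of the loop hint pragmas mentioned in the list above, here is a minimal sketch. The function name scale_in_place and the particular width and interleave values are my own choices for illustration, not something taken from the commit, and the pragma is only a hint that the optimizer is free to ignore. It is guarded so that other compilers skip it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiplies every element of `data` by `factor`.
void scale_in_place(std::vector<float> &data, float factor) {
#if defined(__clang__)
// Hints only: request vectorization with a width of 4 and an
// interleave count of 2 for the loop that follows.
#pragma clang loop vectorize(enable) vectorize_width(4) interleave_count(2)
#endif
  for (std::size_t i = 0; i < data.size(); ++i)
    data[i] *= factor;
}
```

The pragma applies to the loop that immediately follows it, so it needs to sit directly in front of the for statement rather than at the top of the function.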

Other project commits

  • A benchmarking-only mode has been added to the testsuite. r210251.

  • A status page for post-C++14 features in libcxx has been added. r210056.

  • An initial set of Makefiles has been committed to lld. r210177.

  • lldb gained support for inspecting enum members. r210046.

  • Polly can now be built without any GPLed software. r210176.

by Alex Bradbury at June 09, 2014 01:37 PM

LLVM Weekly - #22, Jun 2nd 2014

Welcome to the twenty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

Last week I expressed worry about GMANE not updating. I'm happy to report that it's back to normal now. Some of my readers might be interested in my account of the neat Raspberry Pi-based projects I saw at Maker Faire Bay Area.

The canonical home for this issue can be found here at

News and articles from around the web

David Given has shared his partially complete backend for the VideoCore IV VPU as used in the BCM2835 in the Raspberry Pi. It would also be interesting to see a QPU LLVM backend now it has been publicly documented.

Documentation on how TableGen's DAGISel backend works has been updated.

The LLVM Compiler Infrastructure in HPC Workshop has been announced. This is a workshop to be held in conjunction with SC14. The deadline for the call for papers is September 1st.

Tartan is a Clang analysis plugin for GLib and GNOME. To quote its homepage "The plugin works by loading gobject-introspection metadata for all functions it encounters (both functions exported by the code being compiled, if it is a library; and functions called by it). This metadata is then used to add compiler attributes to the code, such as non-NULL attributes, which the compiler can then use for static analysis and emitting extra compiler warnings."

On the mailing lists

LLVM commits

  • A LoadCombine pass was added, though it is disabled by default for now. r209791.

  • AAPCS-VFP has been taught to deal with Cortex-M4 (which only has single precision floating point). r209650.

  • InstructionCombining gained support for combining GEPs across PHI nodes. r209843.

  • Vectorization of intrinsics such as powi, cttz and ctlz is now allowed. r209873.

  • MIPS64 long branch has been optimised to be 3 instructions smaller. r209678.

Clang commits

  • OpenMP implementation continues. Parsing and Sema have been implemented for OMPAlignedClause. r209816.

  • The -Rpass-missed and -Rpass-analysis flags have been added. pass-missed is used by optimizers to inform the user when they tried to apply an optimisation but couldn't, while pass-analysis is used to report analysis results back to the user. A followup commit documents the family of flags. r209839, r209841.

  • The clang optimize pragma has now been documented. r209738.

  • There has been some API refactoring. The release and take methods were removed from ActionResult and Owned removed from Sema. r209800, r209812.
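To try the remark flags described above, something like the following small file may be a reasonable starting point. The file name, the function names, and the choice of loop-vectorize as the pass to filter on are assumptions of mine for illustration; I make no claim about exactly which remarks a given Clang version prints for these loops.

```cpp
// Assumed invocation (remarks_demo.cpp is a placeholder file name):
//   clang++ -O2 -c remarks_demo.cpp -Rpass=loop-vectorize
//     -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize
#include <cstddef>

// A plain reduction, the kind of loop a vectorizer usually likes.
int sum(const int *values, std::size_t n) {
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += values[i];
  return total;
}

// Each element depends on the previous one (a loop-carried dependence),
// which makes straightforward vectorization harder.
void prefix_sum(int *values, std::size_t n) {
  for (std::size_t i = 1; i < n; ++i)
    values[i] += values[i - 1];
}
```

Each of the three flags takes a regular expression matching pass names, so the same file can be rebuilt with a different pattern to look at other optimisations.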

Other project commits

  • ThreadSanitizer has seen a refactoring of storage of meta information for heap blocks and sync objects. r209810.

by Alex Bradbury at June 09, 2014 11:59 AM

June 04, 2014

Philip Reames

Code for late safepoint placement available

This post contains the text of an email I sent to the LLVMdev mailing list a few moments ago.  I would encourage you to direct technical questions and comments to that thread, though I will also respond to technical questions in comments posted here.

As I’ve mentioned on the mailing list a couple of times over the last few months, we’ve been working on an approach for supporting precise fully relocating garbage collection in LLVM.  I am happy to announce that we now have a version of the code available for public view and discussion.

Our goal is to eventually see this merged into the LLVM tree.  There’s a fair amount of cleanup that needs to happen before that point, but we are actively working towards that eventual goal.

Please note that there are a couple of known issues with the current version (see the README).  This is best considered a proof of concept implementation and is not yet ready for production use.  We will be addressing the remaining issues over the next few weeks and will be sharing updates as they occur.

In the meantime, I’d like to get the discussion started on how these changes will eventually land in tree.  Part of the reason for sharing the code in an early state is to be able to build a history of working in the open, and to be able to merge minor fixes into the main LLVM repository before trying to upstream the core changes.  We are aware this is a fairly major change set and are happy to work within the community process in that regard.

I’ve included a list of specific questions I know we’d like to get feedback on, but general comments or questions are also very welcome.

Open Topics:

  • How should we factor the core GC support for review?  Our current intent is to separate logically distinct pieces, and share each layer one at a time.  (e.g. first infrastructure enhancements, then intrinsics and codegen support, then verifiers, then safepoint insertion passes)  Is this the right approach?
  • How configurable does the GC support need to be for inclusion in LLVM?  Currently, we expect the frontend to mark GC pointers using address spaces.  Do we need to support alternate mechanisms?  If so, what interface should this take?
  • How should we approach removing the existing partial support for garbage collection? (gcroot)  Do we want to support both going forward?  Do we need to provide a forward migration path in bitcode?  Given the usage is generally through MCJIT, we would prefer we simply deprecate the existing gcroot support and target it for complete removal a couple of releases down the road.
  • What programmatic interface should we present at the IR level and where should it live?  We’re moving towards a CallSite like interface for statepoints, gc_relocates, and gc_results call sites.  Is this the right approach?  If so, should it live in the IR subtree, or Support?  (Note: The current code is only about 40% migrated to the new interface.)
  • To support invokable calls with safepoints, we need to make the statepoint intrinsic invokable.  This is new for intrinsics in LLVM.  Is there any reason that InvokeInst must be a subclass of CallInst? (rather than a view over either calls or invokes like CallSite)  Would changes to support invokable intrinsics be accepted upstream?  Alternate approaches are welcome.
  • Is the concept of an abstract VM state something LLVM should know about?  If so, how should it be represented?  We’re actively exploring this topic, but don’t have strong opinions on the topic yet.
  • Our statepoint shares a lot in the way of implementation and semantics with patchpoint and stackmap.  Is it better to submit new intrinsics, or try to identify a single intrinsic which could represent both?  Our current feeling is to keep them separate semantically, but share implementation where possible.

Philip (& team)

p.s. Sanjoy, one of my co-workers, will be helping to answer questions as they arise.

p.p.s. For those wondering why the current gcroot mechanism isn’t sufficient, I covered that in a previous blog post:

by reames at June 04, 2014 04:54 PM

May 26, 2014


LLVM Weekly - #21, May 26th 2014

Welcome to the 21st issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

I'm back in the UK and mostly recovered from the ensuing jetlag. I am however disturbed that all mailing lists on GMANE don't seem to have been updated for the past week and have been unable to find any explanation of what is going on online. GMANE is an important and massively useful aggregator and archiver of free software development lists and I really hope these are only temporary problems. For this issue, I have instead linked directly to the mailman archives at UIUC.

The canonical home for this issue can be found here at

News and articles from around the web

Jonathan Mah has written a Clang plugin for checking key path strings in Objective C code. The implementation is available on Github.

LWN has published an article about ThreadSanitizer v2.

This week, the merge of the AArch64 and the Apple-contributed ARM64 backends was completed. The old AArch64 was deleted and the result of merging code from AArch64 into ARM64 was renamed to AArch64.

A paper 'Static energy consumption analysis of LLVM IR programs' has been posted to

On the mailing lists

LLVM commits

  • A new attribute, 'nonnull' has been added. When applied to a parameter or return pointer this indicates it is not null, which may allow additional optimisations (at least, avoiding comparisons between that value and null). r209185, r209193.

  • The llvm.arm.undefined intrinsic has been added. This is used to generate the 0xde opcode on ARM. It takes an integer parameter, which might be used by the OS to implement custom behaviour on the trap. r209390.

  • The MIPS disassembler has seen some work. Some support has been added for MIPS64r6 and various issues fixed. r209415.

  • LLVM learned the -pass-remarks-missed and -pass-remarks-analysis command line options. -pass-remarks-missed shows diagnostics when a pass tried to apply a transformation but couldn't. -pass-remarks-analysis shows information about analysis results. r209442.

  • The documentation for the llvm.mem.parallel_loop_access metadata has been updated. r209507.

  • Old AArch64 has been removed and ARM64 renamed to AArch64. r209576, r209577.

Clang commits

  • clang-format has seen more JS support. It can now reformat ES6 arrow functions and ES6 destructuring assignments. r209112, r209113.

  • Experimental checkers for the clang static analyzer are now documented. r209131.

  • Support was added to clang for global named registers, using the LLVM intrinsics which were recently added. r209149.

  • Clang learned the no_split_stack attribute to turn off split stacks on a per-function basis. r209167.

  • Clang learned the flatten attribute. This causes calls within the function to be inlined where possible. r209217.

  • An initial version of codegen for pragma omp simd has been committed. This also adds CGLoopInfo which is a helper for marking memory instructions with llvm.mem.parallel_loop_access metadata. r209411.

  • The pragma clang optimize {on,off} has been implemented. This allows you to selectively disable optimisations on certain functions. r209510.

  • An implementation of Microsoft ABI-compatible RTTI (run-time type information) has landed. r209523.
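For readers who want to experiment with the flatten attribute and the optimize pragma from the list above, here is a small sketch. All function names are hypothetical, and the guards are there only so the file still compiles on compilers that lack these features.

```cpp
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_FLATTEN __attribute__((flatten))
#else
#define DEMO_FLATTEN // assumption: silently skip on other compilers
#endif

static int square(int x) { return x * x; }

// flatten asks the compiler to inline the calls made inside this
// function (here, both calls to square) where that is possible.
DEMO_FLATTEN int sum_of_squares(int a, int b) {
  return square(a) + square(b);
}

#if defined(__clang__)
#pragma clang optimize off
#endif
// Functions defined between "optimize off" and "optimize on" are left
// unoptimised, which can make them easier to step through in a debugger.
int debug_friendly(int a) {
  int doubled = a * 2;
  return doubled + 1;
}
#if defined(__clang__)
#pragma clang optimize on
#endif
```

Neither feature changes what the functions compute; they only influence how the compiler treats them.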

Other project commits

  • 'Chained origins' as used by MemorySanitizer has been redesigned. r209284.

by Alex Bradbury at May 26, 2014 11:32 AM

May 19, 2014


LLVM Weekly - #20, May 19th 2014

Welcome to the twentieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

This week's issue is perhaps a little less thorough than normal. I've been in San Francisco most of the week for Maker Faire this weekend, where I was at the Raspberry Pi booth with some other Foundation members. As this issue goes out, I'll be enjoying my last day in SF before heading to the airport for the long flight home and the ensuing jetlag.

The canonical home for this issue can be found here at

News and articles from around the web

The WebKit blog features an excellent and detailed article about the new Fourth Tier LLVM JIT which sheds light on the how and why.

The Neu framework has recently been announced. It is a C++11 framework, collection of programming languages and software system designed for artificial intelligence applications and technical computing in general. It makes use of the LLVM MC JIT for its NPL language as well as generating high performance neural networks.

On the mailing lists

LLVM commits

  • The inliner has been taught how to preserve musttail invariants. r208910.

  • A new C API has been added for a thread yielding callback. r208945.

  • Another patch in the series to improve MergeFunctions performance has been committed. A total ordering has now been implemented among operations. r208973, r208976.

  • The ARM load/store optimisation pass has been fixed to work with Thumb1. r208992.

  • GlobalValue has been split into GlobalValue and GlobalObject, which allows code to statically accept a Function or a GlobalVariable but not an alias. r208716.

  • Integral reciprocal was optimised to not use division. This optimisation was influenced by Souper. r208750. Another optimisation opportunity uncovered by Souper was signed icmp of -(zext V). r208809.

  • I rather like that these transforms for single bit tests were verified with Z3. r208848.

  • PowerPC gained global named register support, for r1, r2 and r13 (depending on the subtarget). r208509.

  • Documentation was added for the ARM64 BigEndian NEON implementation. r208577.

  • The constant folder is now better at looking through bitcast constant expressions. This is a first step towards fixing this poor performance of these range comprehensions. r208856.
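The integral reciprocal item above is easy to check by hand. The sketch below is my own reading of the arithmetic identity behind computing 1 / x without a division, not code taken from the commit, and the function names are made up. For unsigned x the quotient is 1 only when x is 1. For signed x the quotient is x itself when x is 1 or -1, and 0 otherwise. Division by zero is undefined, so the value produced for x == 0 does not matter.

```cpp
#include <cassert>
#include <climits>

// 1 / x for unsigned x (x != 0): the quotient is 1 only when x == 1.
unsigned reciprocal_unsigned(unsigned x) { return x == 1u ? 1u : 0u; }

// 1 / x for signed x (x != 0): x is in {-1, 0, 1} exactly when
// (unsigned)x + 1 is in {0, 1, 2}, so one unsigned compare suffices.
int reciprocal_signed(int x) {
  return (static_cast<unsigned>(x) + 1u < 3u) ? x : 0;
}

// Compares both helpers against real division on a range of inputs.
bool matches_division() {
  for (int x = -1000; x <= 1000; ++x) {
    if (x == 0)
      continue; // division by zero is undefined, so skip it
    if (1 / x != reciprocal_signed(x))
      return false;
    if (x > 0 && 1u / static_cast<unsigned>(x) !=
                     reciprocal_unsigned(static_cast<unsigned>(x)))
      return false;
  }
  return 1 / INT_MAX == reciprocal_signed(INT_MAX) &&
         1 / INT_MIN == reciprocal_signed(INT_MIN);
}
```

The addition is done on the unsigned value so that the check itself cannot overflow at INT_MAX.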

Clang commits

  • Initial support for MS ABI compliant RTTI mangling has been committed. r208661, r208668.

  • Clang will no longer copy objects with trivial, deleted copy constructors. This fixes bugs and improves ABI compatibility with GCC and MSVC. r208786. Though the Itanium C++ ABI part was reverted for now. r208836.

Other project commits

  • The LLDB Machine Interface has been committed. This is an implementation of the GDB Machine Interface, useful for implementing your own frontend to LLDB. r208972.

  • AddressSanitizer started to gain some Windows tests. r208554, r208859, r208873 and more.

  • The instrumented profiling library API was fixed to work with shared objects, and profiling is now supported for dlopened shared libraries. r208940, r209053.

by Alex Bradbury at May 19, 2014 01:23 PM

May 12, 2014


LLVM Weekly - #19, May 12th 2014

Welcome to the nineteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

I'm flying out to San Francisco tomorrow and will be there for the Bay Area Maker Faire at the weekend with some other Raspberry Pi Foundation people. If you're around, be sure to say hi.

The canonical home for this issue can be found here at

News and articles from around the web

LLVM 3.4.1 has been released. This is a bug-fix release so offers API and ABI compatibility with LLVM 3.4. Thanks to everyone who contributed to the release by suggesting or backporting patches, and for testing.

John Regehr has shared some early results and discussion on using Souper (a new superoptimizer for LLVM IR) in combination with Csmith and C-reduce in order to find missed optimisations and then produce minimal test cases. This has already resulted in a new performance bug being filed, with I'm sure many more to come.

Crange, a tool to index and cross-reference C/C++ source code built on top of Clang has been released. It aims to offer a more complete database than e.g. ctags, though the running time on a large codebase like the Linux kernel is currently very high.

llgo, the LLVM-based compiler for Go is now self-hosting.

Last week I asked for benchmarks of the new JavaScriptCore Fourth Tier LLVM JIT. Arewefastyet from Mozilla now includes such results. FTLJIT does particularly well on asm.js examples.

On the mailing lists

LLVM commits

  • A new algorithm has been implemented for tail call marking. A build of clang now ends up with 470k calls in the IR marked as tail vs 375k before. The total tail call to loop conversions remains the same though. r208017.

  • llvm::function_ref has been introduced and described in the LLVM programmers manual. It is a type-erased reference to a callable object. r208025, r208067.

  • Initial support for named register intrinsics (as previously discussed on the mailing list) has landed. Right now, only the stack pointer is supported. Other non-allocatable registers could be supported with not too much difficulty, but allocatable registers are much harder. r208104.

  • The -disable-cfi option has been removed. LLVM now requires assemblers to support CFI (call frame information) directives in order to generate stack unwinding information. r207979.

  • The superword-level parallelism (SLP) pass is now enabled by default for link time optimisation. r208013.

  • The llvm-cov documentation has been expanded. r208098.

  • The second and third patch of a series to improve MergeFunctions performance to O(n*log(n)) has been merged. r208173, r208189.

  • The standard 'x86-64' CPU used as the default architecture now uses the Sandy Bridge scheduling model in the hope this provides a reasonable default over a wide range of modern x86-64 CPUs. r208230.

  • Custom lowering for the llvm.{u|s}add.with.overflow.i32 intrinsics has been added for ARM. r208435.
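To make the idea of a type-erased reference to a callable (as in the llvm::function_ref item above) a little more concrete, here is a standalone sketch. It is not LLVM's implementation: the name function_ref_sketch and the helper functions are hypothetical, and the real class lives in LLVM's own headers. The key property is that the wrapper stores only a pointer to the callable plus a pointer to a small trampoline function, so it never owns or copies the callable.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

template <typename Fn> class function_ref_sketch;

template <typename Ret, typename... Params>
class function_ref_sketch<Ret(Params...)> {
  // Trampoline that knows the concrete callable type.
  Ret (*callback)(std::intptr_t callee, Params... params);
  // Address of the callable, with its type erased.
  std::intptr_t callee;

  template <typename Callable>
  static Ret callback_fn(std::intptr_t callee, Params... params) {
    return (*reinterpret_cast<Callable *>(callee))(
        std::forward<Params>(params)...);
  }

public:
  // Accept any callable except another function_ref_sketch, so that
  // ordinary copying still uses the implicit copy constructor.
  template <typename Callable,
            typename = typename std::enable_if<!std::is_same<
                typename std::decay<Callable>::type,
                function_ref_sketch>::value>::type>
  function_ref_sketch(Callable &&c)
      : callback(callback_fn<typename std::remove_reference<Callable>::type>),
        callee(reinterpret_cast<std::intptr_t>(&c)) {}

  Ret operator()(Params... params) const {
    return callback(callee, std::forward<Params>(params)...);
  }
};

// Hypothetical user: takes any int(int) callable without being a template.
int apply_twice(function_ref_sketch<int(int)> f, int x) { return f(f(x)); }

int demo() {
  int offset = 3;
  auto add_offset = [&offset](int v) { return v + offset; };
  return apply_twice(add_offset, 1); // (1 + 3) + 3
}
```

Because only a reference is stored, an object like this is best used as a function parameter; keeping one around after the callable it refers to has been destroyed would leave it dangling.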

Clang commits

  • MSVC ABI compatibility has again been improved. Clang now understands that the 'sret' (a structure return pointer) is passed after 'this' for MSVC. r208458.

  • Initial codegen from OpenMP's #pragma omp parallel has landed. r208077.

  • Field references to struct names and C++11 aliases are now supported from inline asm. r208053.

  • Parsing and semantic analysis has been implemented for the OpenMP proc_bind clause. r208060.

  • clang-format gained initial support for JavaScript regex literals (yes, clang-format can reformat your JavaScript!). r208281.

Other project commits

  • libcxxabi gained support for ARM zero-cost exception handling. r208466.

  • In libcxx, std::vector gained Address Sanitizer support. r208319.

  • The test suite from OpenUH has been added to the openmp repository. r208472.

by Alex Bradbury at May 12, 2014 03:04 PM

May 09, 2014


LLVM 3.4.1 Release

LLVM 3.4.1 has been released!  This is a bug-fix release that contains fixes for the AArch64, ARM, PowerPC, R600, and X86 targets as well as a number of other fixes in the core libraries.

The LLVM and Clang core libraries in this release are API and ABI compatible with LLVM 3.4, so projects that make use of the LLVM and Clang API and libraries will not need to make any changes in order to take advantage of the 3.4.1 release.

Bug-fix releases like this are very important for the project, because they help get critical fixes to users faster than the typical 6 month release cycle, and also make it easier for operating system distributors who in the past have had to track and apply bug fixes on their own.

A lot of work went into this release, and special thanks should be given to all the testers who helped to qualify the release:

Renato Golin
Sebastian Dreßler
Ben Pope
Arnaud Allard de Grandmaison
Erik Verbruggen
Hal Finkel
Nikola Smiljanic
Hans Wennborg
Sylvestre Ledru
David Fang

In addition there were a number of community members who spent time tracking down bugs and helping to resolve merge conflicts in the 3.4 branch.  This is what made this release possible, so thanks to everyone else who helped.

I would like to keep the trend of stable releases going to 3.5.x and beyond (maybe even 3.4.2 if there is enough interest), but this can only be done with the help of the community.  If you would like to help with the next stable release or even regular release, then the next time you see a proposed release schedule on the mailing list, let the release manager know you can help.  We can never have too many volunteers.

Thanks again to everyone who helped make this release possible.


by Tom at May 09, 2014 05:28 PM

May 05, 2014


LLVM Weekly - #18, May 5th 2014

Welcome to the eighteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

I'm going to be in the San Francisco area May 13th-20th with some other Raspberry Pi people. We'll be at Maker Faire Bay Area on the 17th and 18th. Let me know if there's anything else I should check out while over there.

The canonical home for this issue can be found here at

News and articles from around the web

Andrew Ruef has written a blog post about using static analysis and Clang to find the SSL heartbleed bug. The code for the checker described in the blog post is available on Github.

The FTL ('Fourth tier LLVM') Javascript JIT is now enabled in WebKit for Mac. The WebKit Wiki has more information. I haven't seen any public benchmark figures. Please do share if you have any.

Eli Bendersky has written an article about how to use libTooling to implement source to source transformations.

The next Paris LLVM Social will take place on May 5th (i.e. this evening).

The LLVM Bay Area social will take place on May 8th. Please RSVP if you are interested.

On the mailing lists

LLVM commits

  • The patch to perform common subexpression elimination for a group of getelementptrs that was discussed a couple of weeks ago has been merged. It is currently only enabled for the NVPTX backend. r207783.

  • X86 code generation has been implemented for the musttail function attribute. r207598.

  • Pass run listeners were added to the pass manager. This adds C/C++ APIs to enable fine-grained progress reporting and safe suspension points. See the commit message for more info. r207430.

  • The optimisation remark system has started to be used, with calls to emitOptimizationRemark added to the loop unroller and vectorizer. r207528, r207574.

  • The SLPVectorizer gained the ability to recognize and vectorize intrinsic math functions. r207901.

Clang commits

  • NRVO (named return value optimisation) determination was rewritten. According to the commit message, "a variable now has NRVO applied if and only if every return statement in that scope returns that variable." Also, NRVO is performed roughly 7% more often in a bootstrap clang build. r207890.

  • libclang's documentation comment API has been split into a separate header. r207392.

  • The SLPVectorizer (superword-level parallelism) is now disabled at O0, O1 and Oz. r207433. It was later re-enabled at Oz. r207858.

  • The libclang API now supports attributes 'pure', 'const', and 'noduplicate'. r207767.

  • The comment parser no longer attempts to validate HTML attributes (the previous solution was insufficient). r207712.
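The NRVO rule quoted in the list above can be illustrated with two small functions. This is only a sketch: the Tracked type and both function names are made up, and whether elision actually happens is up to the compiler, so the counter is there for experimentation rather than as a guaranteed result.

```cpp
#include <cassert>

// Counts copy and move constructions so elision can be observed.
struct Tracked {
  int value;
  static int transfers;
  explicit Tracked(int v) : value(v) {}
  Tracked(const Tracked &other) : value(other.value) { ++transfers; }
  Tracked(Tracked &&other) : value(other.value) { ++transfers; }
};
int Tracked::transfers = 0;

// Every return statement returns the same variable, so under the rule
// quoted above `result` is an NRVO candidate.
Tracked single_candidate(bool flag) {
  Tracked result(0);
  if (flag) {
    result.value = 1;
    return result;
  }
  result.value = 2;
  return result;
}

// The two returns name different variables in the same scope, so
// neither `first` nor `second` qualifies under that rule.
Tracked competing_candidates(bool flag) {
  Tracked first(1);
  Tracked second(2);
  if (flag)
    return first;
  return second;
}
```

Printing Tracked::transfers after calling each function is a quick way to see what a particular compiler and optimisation level actually do.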

Other project commits

  • R_MIPS_REL32 relocations are now supported in lld. r207494.

  • A collection of CTRL+C related issues were fixed in lldb. r207816.

by Alex Bradbury at May 05, 2014 04:01 PM

April 28, 2014


LLVM Weekly - #17, Apr 28th 2014

Welcome to the 17th issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

Last week I wondered why the GCC logo is a GNU leaping out of an egg. Thank you to everyone who wrote in to let me know it is a reference to EGCS. GCC was of course famously forked as EGCS which was later merged back in. Apparently this was pronounced by some as "eggs". Mystery solved.

The canonical home for this issue can be found here at

News and articles from around the web

GCC 4.9.0 was released last Tuesday. See here for more detailed notes on changes in this release.

Honza Hubička wrote a blog post on the history of linktime optimisation in GCC, which was followed by a post containing a benchmark comparison of LTO in GCC vs LLVM.

On Twitter, @lambdamix drew my attention to Notes on Graph Algorithms Used in Optimizing Compilers (PDF). I imagine it will be of interest to many LLVM Weekly readers.

On the mailing lists

LLVM commits

  • The 'musttail' marker which was proposed several weeks ago has been added. Unlike the 'tail' marker, musttail guarantees that tail call optimization will occur. Check the documentation added in the commit for a more detailed explanation. r207143.

  • The rewrite of BlockFrequencyInfo finally landed. A description of the advantages of the new algorithm is in the original commit message, r206548. After a series of bounces, it landed in r206766.

  • LLVM can now generate PE/COFF object files targeting 'Windows on ARM'. r207345.

  • A CallGraph strongly connected components pass manager has been added making use of the new LazyCallGraph analysis framework. This is part of the new pass manager work Chandler Carruth has been working on and is of course a work in progress. r206745.

  • The scheduler model for the Intel Silvermont microarchitecture has been replaced. The commit message claims substantial improvements on integer tests. I'm assuming RAL in this context refers to RegAllocLocal? r206957.

  • ARM64 has of course seen a large number of changes. Among those is support for feature predicates for NEON/FP/CRYPTO instructions, which allows the compiler to generate code without using those instructions. r206949. Additionally, there is now a big endian version of the ARM64 target machine. r206965.

  • getFileOffset has been dropped from LLVM's C API. Justification is in the commit message. r206750.

  • The LoopVectorize pass now keeps statistics on the number of analyzed loops and the number of vectorized loops. r206956.

  • The x86 backend gained new intrinsics for Read Time Stamp Counter. r207127.

  • Initial work on mutation support for the lazy call graph has landed. As with most of Chandler's commits, there's much more information in the commit message. r206968.

  • MCTargetOptions has been introduced, which for now only contains a single flag. SanitizeAddress enables AddressSanitizer instrumentation of inline assembly. r206971.

  • llvm-cov now supports gcov's --long-file-names option. r207035.
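The musttail item above concerns the IR-level marker, but the shape of call it applies to is easy to show in source form. The sketch below is an assumption-laden illustration: the source-level [[clang::musttail]] attribute only arrived in much later Clang releases, the macro DEMO_MUSTTAIL and the function sum_to are hypothetical names, and on compilers without the attribute the macro expands to nothing, leaving an ordinary call that the optimizer may or may not turn into a tail call.

```cpp
// Use the attribute only where the compiler reports support for it.
#if defined(__clang__) && defined(__has_cpp_attribute)
#if __has_cpp_attribute(clang::musttail)
#define DEMO_MUSTTAIL [[clang::musttail]]
#endif
#endif
#ifndef DEMO_MUSTTAIL
#define DEMO_MUSTTAIL // assumption: fall back to a plain call
#endif

// Sums 1..n using an accumulator; assumes n >= 0. The recursive call
// is the last thing the function does, and caller and callee share a
// signature, which is the shape a guaranteed tail call needs.
long sum_to(long n, long acc) {
  if (n == 0)
    return acc;
  DEMO_MUSTTAIL return sum_to(n - 1, acc + n);
}
```

The difference from the plain 'tail' marker is the guarantee: where the attribute is honoured, the call is required to reuse the caller's stack frame instead of merely being allowed to.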

Clang commits

  • Documentation for sample profiling was added. r206994.

  • Support for parsing the linear clause for the 'omp simd' directive was added. r206891.

  • Clang gained support for the -fmodules-search-all option, which searches for symbols in non-imported modules (i.e. those referenced in module maps but not imported). r206977.

Other project commits

  • AddressSanitizer gained an experimental detector for "one definition rule" violations (where two globals with the same name are defined in different modules). r207210.

by Alex Bradbury at April 28, 2014 02:50 PM