Planet Clang

April 10, 2018


EuroLLVM'18 developers' meeting program

The LLVM Foundation is excited to announce the program for the EuroLLVM'18 developers' meeting (April 16-17 in Bristol, UK)!





Student Research Competition

Lightning Talks


If you are interested in any of these talks, you should register to attend EuroLLVM'18. Tickets are limited!

More information about EuroLLVM'18 is available here

by Arnaud Allard de Grandmaison at April 10, 2018 05:03 PM

March 13, 2018


DragonFFI: FFI/JIT for the C language using Clang/LLVM


A foreign function interface is "a mechanism by which a program written in one programming language can call routines or make use of services written in another".
In the case of DragonFFI, we expose a library that allows calling C functions and using C structures from any language. Basically, we want to be able to do this, let's say in Python:
import pydffi
CU = pydffi.FFI().cdef("int puts(const char* s);");
CU.funcs.puts("hello world!")
or, in a more advanced way, for instance to use libarchive directly from Python:
import pydffi
CU = pydffi.FFI().cdef("#include <archive.h>")
a = CU.funcs.archive_read_new()
assert a
This blog post presents related works, their drawbacks, then how Clang/LLVM is used to circumvent these drawbacks, the inner working of DragonFFI and further ideas.
The code of the project is available on GitHub. Python 2/3 wheels are available for Linux/OSX x86/x64. Python 3.6 wheels are available for Windows x64. On all these platforms, just use:
$ pip install pydffi
and play with it :)

See below for more information.

Related work

libffi is the reference library that provides an FFI for the C language. cffi is a Python binding around this library that also uses PyCParser to be able to easily declare interfaces and types. Both these libraries have limitations, among them:
  • libffi does not support the Microsoft x64 ABI under Linux x64. It isn't trivial to add a new ABI (it has to be hand-written, and getting it right is hard), while a lot of effort has already been put into compilers to get these ABIs right.
  • PyCParser only supports a very limited subset of C (no includes, function attributes, ...).
Moreover, in 2014, Jordan Rose and John McCall from Apple gave a talk at the LLVM developer meeting in San José about how Clang can be used for C interoperability. This talk also shows various ABI issues, and was a source of inspiration for DragonFFI at the beginning.

Somewhat related, Sean Callanan, who worked on lldb, gave a talk in 2017 at the LLVM developer meeting in San José on how we could use parts of Clang/LLVM to implement some kind of eval() for C++. What can be learned from this talk is that debuggers like lldb must also be able to call an arbitrary C function, and use debug information among other things to do so (which we also do, see below :)).

DragonFFI is based on Clang/LLVM, and thanks to that it is able to get around these issues:
  • it uses Clang to parse header files, allowing direct usage of a C library headers without adaptation;
  • it supports as many calling conventions and function attributes as Clang/LLVM do;
  • as a bonus, Clang and LLVM allow on-the-fly compilation of C functions, without relying on the presence of a compiler on the system (you still need the headers of the system's libc though, or MSVCRT headers under Windows);
  • and this is a good way to have fun with Clang and LLVM! :)
Let's dive in!

Creating an FFI library for C

Supporting C ABIs

A C function is always compiled for a given C ABI. The C ABI isn't defined per the official C standards, and is system/architecture-dependent. Lots of things are defined by these ABIs, and it can be quite error prone to implement.

To see how ABIs can become complex, let's compile this C code:

typedef struct {
  short a;
  int b;
} A;

void print_A(A s) {
  printf("%d %d\n", s.a, s.b);
}

Compiled for Linux x64, it gives this LLVM IR:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@.str = private unnamed_addr constant [7 x i8] c"%d %d\0A\00", align 1

define void @print_A(i64) local_unnamed_addr {
  %2 = trunc i64 %0 to i32
  %3 = lshr i64 %0, 32
  %4 = trunc i64 %3 to i32
  %5 = shl i32 %2, 16
  %6 = ashr exact i32 %5, 16
  %7 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i64 0, i64 0), i32 %6, i32 %4)
  ret void
}

What happens here is called structure coercion. To optimize some function calls, some ABIs pass structure values through registers. For instance, an llvm::ArrayRef object, which is basically a structure with a pointer and a size, is passed through registers (though this optimization isn't guaranteed by any standard).
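To make the coercion concrete, here is a small Python sketch that mimics what the IR above does with the single i64 argument. It assumes the little-endian Linux x64 layout of A (a at byte 0, two bytes of padding, b at byte 4). The helper names coerce_A, to_signed and unpack_A are made up for this illustration and are not part of Clang or DragonFFI.

```python
import struct


def coerce_A(a, b):
    """View struct A { short a; int b; } as one 64-bit integer.

    Assumed layout (Linux x64, little-endian): a at byte 0, 2 padding
    bytes, b at byte 4.
    """
    raw = struct.pack("<hxxi", a, b)
    (as_i64,) = struct.unpack("<Q", raw)
    return as_i64


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def unpack_A(i64):
    """Recover (a, b) the way the IR of print_A does."""
    low = i64 & 0xFFFFFFFF           # trunc i64 -> i32
    high = (i64 >> 32) & 0xFFFFFFFF  # lshr 32, then trunc -> i32
    a = to_signed(low, 16)           # shl 16 + ashr 16: sign-extend the low 16 bits
    b = to_signed(high, 32)
    return a, b


print(unpack_A(coerce_A(-3, 100000)))
```

The caller has to build that integer exactly as the callee expects to take it apart, which is the kind of detail an FFI library must get right for every ABI it supports.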

It is important to understand that ABIs are complex things to implement and we don't want to redo this whole work by ourselves, particularly when LLVM/Clang already know how.

Finding the right type abstraction

We want to list every type that is used in a parsed C file. To achieve that goal, various pieces of information are needed, among which:
  • the function types, and their calling convention
  • for structures: field offsets and names
  • for union/enums: field names (and values)
On one hand, we have seen in the previous section that the LLVM IR is too Low Level (as in Low Level Virtual Machine) for this. On the other hand, Clang's AST is too high level. Indeed, let's print the Clang AST of the code above:
|-RecordDecl 0x5561d7f9fc20 <a.c:1:9, line:4:1> line:1:9 struct definition
| |-FieldDecl 0x5561d7ff4750 <line:2:3, col:9> col:9 referenced a 'short'
| `-FieldDecl 0x5561d7ff47b0 <line:3:3, col:7> col:7 referenced b 'int'
We can see that there is no information about the structure layout (padding, ...). There's also no information about the size of standard C types. As all of this depends on the backend used, it is not surprising that this information is not present in the AST.

The right abstraction appears to be the LLVM metadata produced by Clang to emit DWARF or PDB structures. They provide structure fields offset/name, various basic type descriptions, and function calling conventions. Exactly what we need! For the example above, this gives (at the LLVM IR level, with some inline comments):

target triple = "x86_64-pc-linux-gnu"
%struct.A = type { i16, i32 }
@.str = private unnamed_addr constant [7 x i8] c"%d %d\0A\00", align 1

define void @print_A(i64) local_unnamed_addr !dbg !7 {
  %2 = trunc i64 %0 to i32
  %3 = lshr i64 %0, 32
  %4 = trunc i64 %3 to i32
  tail call void @llvm.dbg.value(metadata i32 %4, i64 0, metadata !18, metadata !19), !dbg !20
  tail call void @llvm.dbg.declare(metadata %struct.A* undef, metadata !18, metadata !21), !dbg !20
  %5 = shl i32 %2, 16, !dbg !22
  %6 = ashr exact i32 %5, 16, !dbg !22
  %7 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([...] @.str, i64 0, i64 0), i32 %6, i32 %4), !dbg !23
  ret void, !dbg !24
}

; DISubprogram defines (in our case) a C function, with its full type
!7 = distinct !DISubprogram(name: "print_A", scope: !1, file: !1, line: 6, type: !8, [...], variables: !17)
; This defines the type of our subprogram
!8 = !DISubroutineType(types: !9)
; We have the "original" types used for print_A, with the first one being the
; return type (null => void), and the other ones the arguments (in !10)
!9 = !{null, !10}
!10 = !DIDerivedType(tag: DW_TAG_typedef, name: "A", file: !1, line: 4, baseType: !11)
; This defines our structure, with its various fields
!11 = distinct !DICompositeType(tag: DW_TAG_structure_type, file: !1, line: 1, size: 64, elements: !12)
!12 = !{!13, !15}
; We have here the size and name of the member "a". Offset is 0 (default value)
!13 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !11, file: !1, line: 2, baseType: !14, size: 16)
!14 = !DIBasicType(name: "short", size: 16, encoding: DW_ATE_signed)
; We have here the size, offset and name of the member "b"
!15 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !11, file: !1, line: 3, baseType: !16, size: 32, offset: 32)
!16 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)


DragonFFI first parses the debug information included by Clang in the LLVM IR it produces, and creates a custom type system to represent the various function types, structures, enumerations and typedefs of the parsed C file. This custom type system has been created for two reasons:
  • create a type system that gathers only the necessary information from the metadata tree (we don't need the whole debug information)
  • make the public headers of the DragonFFI library free from any LLVM headers (so that the whole LLVM headers aren't needed to use the library)
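As a rough illustration of what such a type system has to record, here is a minimal Python sketch. The class names (BasicType, Field, StructType, FunctionType) and the describe helper are hypothetical and do not reflect DragonFFI's actual classes. ctypes stands in for the compiler to compute the native layout of A, whereas DragonFFI reads it from the debug metadata.

```python
import ctypes
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BasicType:
    name: str
    size_bits: int


@dataclass(frozen=True)
class Field:
    name: str
    offset_bits: int
    type: BasicType


@dataclass(frozen=True)
class StructType:
    size_bits: int
    fields: Tuple[Field, ...]


@dataclass(frozen=True)
class FunctionType:
    ret: Optional[object]      # None stands for void
    params: Tuple[object, ...]


class _A(ctypes.Structure):
    # Same declaration as the C struct A used throughout this post.
    _fields_ = [("a", ctypes.c_short), ("b", ctypes.c_int)]


def describe(struct_cls):
    """Collect field names, offsets and sizes (in bits) of a ctypes structure."""
    fields = []
    for name, ctype in struct_cls._fields_:
        descriptor = getattr(struct_cls, name)
        basic = BasicType(ctype.__name__, ctypes.sizeof(ctype) * 8)
        fields.append(Field(name, descriptor.offset * 8, basic))
    return StructType(ctypes.sizeof(struct_cls) * 8, tuple(fields))


A = describe(_A)
print_A_type = FunctionType(ret=None, params=(A,))
print(A)
```

On Linux x64 this reports a 64-bit structure with a at offset 0 and b at offset 32, the same numbers as in the DICompositeType and DIDerivedType entries above.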
Once we've got this type system, the DragonFFI API for calling C functions is this one:

DFFI FFI([...]);
// This will declare puts as a function that returns int and takes a const
// char* as an argument. We could also create this function type by hand.
CompilationUnit CU = FFI.cdef("int puts(const char* s);", [...]);
NativeFunc F = CU.getFunction("puts");
const char* s = "hello world!";
void* Args[] = {&s};
int Ret;
F.call(&Ret, Args);

So, basically, DragonFFI is given a pointer to the returned data and an array of void*. Each void* value is a pointer to the data that must be passed to the underlying function. So the last missing piece of the puzzle is the code that takes this array of void* (and the pointer to the returned data) and calls puts, that is, a function like this:

void call_puts(void* Ret, void** Args) {
  *((int*)Ret) = puts(*((const char**) Args[0]));
}

We call these "function wrappers" (how original! :)). One advantage of this signature is that it is a generic signature, which can be used in the implementation of DragonFFI. Supposing we manage to compile at run-time this function, we can then call it trivially as in the following:

typedef void(*puts_call_ty)(void*, void**);
puts_call_ty Wrapper = /* pointer to the compiled wrapper function */;
Wrapper(Ret, Args);

Generating and compiling a function like this is something Clang/LLVM is able to do. For the record, this is also mainly what libffi does, by generating the necessary assembly by hand. We limit the number of these wrappers in DragonFFI by generating one per distinct function type. So the wrapper that would actually be generated for puts is this one:

void __dffi_wrapper_0(int32_t( __attribute__((cdecl)) *__FPtr)(char *), int32_t *__Ret, void** __Args) {
  *__Ret = (__FPtr)(*((char **)__Args[0]));
}
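The calling convention of such a wrapper can be imitated in Python with ctypes. This is only a sketch: the wrapper here is ordinary Python rather than JIT-compiled code, and my_strlen, wrapper_int_charp, WRAPPERS and call are hypothetical names. A ctypes callback plays the role of a native function of type int(const char*).

```python
import ctypes

# Stand-in for a native function of type int(const char*).
FN_TYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_char_p)


@FN_TYPE
def my_strlen(s):
    return len(s)


def wrapper_int_charp(fptr, ret_addr, args):
    """Python mirror of: *__Ret = (__FPtr)(*((char **)__Args[0]));"""
    arg0 = ctypes.cast(args[0], ctypes.POINTER(ctypes.c_char_p))[0]
    ctypes.cast(ret_addr, ctypes.POINTER(ctypes.c_int))[0] = fptr(arg0)


# One wrapper per function type, shared by every function of that type.
WRAPPERS = {("int", ("char*",)): wrapper_int_charp}


def call(fptr, func_type, ret_addr, args):
    WRAPPERS[func_type](fptr, ret_addr, args)


s = ctypes.c_char_p(b"hello world!")
args = (ctypes.c_void_p * 1)(ctypes.addressof(s))  # each entry points to the argument data
ret = ctypes.c_int(0)
call(my_strlen, ("int", ("char*",)), ctypes.addressof(ret), args)
print(ret.value)
```

Because the wrapper is keyed by function type rather than by function, any other int(const char*) function could reuse the same entry of WRAPPERS.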

For now, all the necessary wrappers are generated when the DFFI::cdef or DFFI::compile APIs are used. The only exception where they are generated on-the-fly (when calling CompilationUnit::getFunction) is for variadic arguments. One possible evolution is to let the user choose whether this should happen on-the-fly or not for every declared function.

Issues with Clang

There is one major issue with Clang that we need to hack around in order to have the DFFI::cdef functionality: unused declarations aren't emitted by Clang (even when using -g -femit-all-decls).

Here is an example, produced from the following C code:

typedef struct {
  short a;
  int b;
} A;

void print_A(A s);

$ clang -S -emit-llvm -g -femit-all-decls -o - a.c | grep print_A | wc -l
0

The produced LLVM IR does not contain a function named print_A! The hack we temporarily use parses the Clang AST and generates temporary functions that look like this:

void __dffi_force_decl_print_A(A s) { }

This forces LLVM to generate an empty function named __dffi_force_decl_print_A with the right arguments (and associated debug information).
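The text-generation half of this hack is simple enough to sketch in Python. The force_decl_stub helper below is hypothetical, and it only covers void functions like the print_A example. How DragonFFI walks the Clang AST to find the declarations, and what it emits for non-void functions, is not shown here.

```python
def force_decl_stub(name, params):
    """Return the text of an empty definition that mirrors a void declaration.

    `params` is the parameter list exactly as written in the declaration.
    """
    return "void __dffi_force_decl_%s(%s) { }" % (name, params)


# Declarations that would otherwise be dropped from the IR.
declarations = [("print_A", "A s")]
stubs = [force_decl_stub(name, params) for name, params in declarations]
print("\n".join(stubs))
```

Appending these stubs to the parsed source gives each declared function type a definition, so its debug information is emitted.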

This is why DragonFFI offers another API, DFFI::compile. This API does not force declared-only functions to be present in the LLVM IR, and will only expose functions that end up naturally in the LLVM IR after optimizations.

If someone has a better idea to handle this, please let us know!

Python bindings

Python bindings were the first ones to be written, simply because it's the "high level" language I know best. Python provides its own set of challenges, but we will save that for another blog post. These Python bindings are built using pybind11, and provide their own set of C types. Lots of examples of what can be achieved can be found here and here.

Project status

DragonFFI currently supports Linux, OSX and Windows, running on Intel 32-bit and 64-bit CPUs. Travis is used for continuous integration, and every change is validated on all these platforms before being integrated.

The project will go from alpha to beta quality once version 0.3 is out (which will bring Travis and Appveyor CI integration and support for variadic functions). The project will be considered stable once these things happen:
  • user and developer documentation exists!
  • another foreign language is supported (JS? Ruby?)
  • the DragonFFI main library API is considered stable
  • a non-negligible number of tests have been added
  • all the things in the TODO file have been done :)

Various ideas for the future

Here are various interesting ideas we have for the future. We don't know yet when they will be implemented, but we think some of them could be quite nice to have.

Parse embedded DWARF information

As the entry point of DragonFFI is DWARF information, we could imagine parsing this debug information from shared libraries that embed it (or provide it in a separate file). The main advantage is that all the information necessary for doing the FFI right is in one file; the header files are no longer required. The main drawback is that debug information tends to take a lot of space (for instance, DWARF information takes 1.8Mb for libarchive 3.3.2 compiled in release mode, for an original binary code size of 735Kb), and this brings us to the next idea.

Lightweight debug info?

The DWARF standard allows defining lots of information, and we don't need all of it in our case. We could imagine embedding only the necessary DWARF objects, that is just the necessary types to call the exported functions of a shared library. One experiment of this is available as an LLVM optimisation pass that is inserted at the end of the optimisation pipeline and parses the metadata to keep only the parts relevant to DragonFFI. More precisely, it only keeps the DWARF metadata related to exported and visible functions, with the associated types. It also keeps the debug information of global variables, even though these aren't supported yet in DragonFFI. It also does some unconventional things, like replacing every file and directory name by "_", to save space. "Fun" fact: to do this, it borrows some code from the LLVM bitcode "obfuscator" included in recent versions of Apple's clang, which is used to anonymize some information from the LLVM bitcode that is sent with tvOS/iOS applications.

Enough talking, let's see some preliminary results (on Linux x64):
  • on libarchive 3.3.2, DWARF goes from 1.8Mb to 536Kb, for an original binary code size of 735Kb
  • on zlib 1.2.11, DWARF goes from 162Kb to 61Kb, for an original binary code size of 99Kb
The instructions to reproduce this are available in the README of the LLVM pass repository.
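To put these figures in perspective, the following Python snippet computes the relative savings and how the lightweight DWARF compares to the code size. It assumes "Mb" means 1024 Kb, which the figures above do not state. The names results and summarize are only for this illustration.

```python
KB_PER_MB = 1024  # assumption: "Mb" is taken as 1024 Kb

# All sizes in Kb, taken from the preliminary results above.
results = {
    "libarchive 3.3.2": {"dwarf_before": 1.8 * KB_PER_MB, "dwarf_after": 536, "code": 735},
    "zlib 1.2.11": {"dwarf_before": 162, "dwarf_after": 61, "code": 99},
}


def summarize(r):
    """Return (percentage of DWARF removed, light DWARF size / code size)."""
    saved = 100 * (1 - r["dwarf_after"] / r["dwarf_before"])
    ratio = r["dwarf_after"] / r["code"]
    return round(saved, 1), round(ratio, 2)


for name, r in results.items():
    saved, ratio = summarize(r)
    print(f"{name}: DWARF shrinks by {saved}%, light DWARF is {ratio}x the code size")
```

Under that assumption the pass removes roughly 71% of the DWARF data for libarchive and 62% for zlib, yet what remains is still around 0.6 to 0.7 times the size of the code itself.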
We can conclude that defining this "light" DWARF format could be a nice idea. Another thing that could be done is defining a new binary format that would be more space-efficient, but there are drawbacks to going this way:
  • debug information is well supported on every platform nowadays: tools exist to parse it, embed it in or extract it from binaries, and so on
  • we already have DWARF and PDB
Nevertheless, it could still be a nice experiment to try this, measure the space saved, and see whether it is worth it!

As a final note, these two ideas would also benefit libffi, as we could process these formats and create libffi types!

JIT code from the final language (like Python) to native function code

One advantage of embedding a full working C compiler is that we could JIT the code from the final language glue to the final C function call, and thus limit the performance impact of this glue code.
Indeed, when a call is issued from Python, the following things happen:
  • arguments are converted from Python to C according to the function type
  • the function pointer and wrapper are gathered from DragonFFI
  • the final call is made
This whole process basically involves a loop over the types of the arguments of the called function, which contains a big switch case. This loop generates the array of void* values that represents the C arguments, which is then passed to the wrapper. We could JIT a specialised version of this loop for the function type, inline the already-compiled wrapper and apply classical optimisations on top of the resulting IR, and get straightforward conversion code specialized for the given function type, directly from Python to C.
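The difference between the generic loop and a specialised one can be sketched in pure Python. This is not a JIT: a closure merely stands in for the specialised code, and CTYPES_BY_NAME, generic_convert and specialize are hypothetical names. Only three argument types are handled, and type names are plain strings for brevity.

```python
import ctypes

CTYPES_BY_NAME = {"int": ctypes.c_int, "double": ctypes.c_double, "char*": ctypes.c_char_p}


def generic_convert(arg_types, values):
    """Generic path: a type dispatch runs for every argument of every call."""
    boxed = []
    for ty, value in zip(arg_types, values):
        if ty == "int":
            boxed.append(ctypes.c_int(value))
        elif ty == "double":
            boxed.append(ctypes.c_double(value))
        elif ty == "char*":
            boxed.append(ctypes.c_char_p(value))
        else:
            raise TypeError("unsupported type: " + ty)
    ptrs = (ctypes.c_void_p * len(boxed))(*[ctypes.addressof(b) for b in boxed])
    return boxed, ptrs  # boxed must stay alive while ptrs is in use


def specialize(arg_types):
    """Specialised path: the dispatch is resolved once per function type."""
    ctors = tuple(CTYPES_BY_NAME[ty] for ty in arg_types)
    array_type = ctypes.c_void_p * len(ctors)

    def convert(values):
        boxed = [ctor(value) for ctor, value in zip(ctors, values)]
        return boxed, array_type(*[ctypes.addressof(b) for b in boxed])

    return convert


convert_int_double = specialize(("int", "double"))
boxed, ptrs = convert_int_double((7, 2.5))
print(ctypes.cast(ptrs[0], ctypes.POINTER(ctypes.c_int))[0])
```

A real JIT would go further than this closure by compiling the conversion to native code and inlining the wrapper into it, as described above.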

One idea we are exploring is combining easy::jit (hello fellow Quarkslab teammates!) with LLPE to achieve this goal.

Reducing DragonFFI library size

The DragonFFI shared library embeds statically compiled versions of LLVM and Clang. The size of the final shared library is about 55Mb (stripped, under Linux x64). This is really huge compared, for instance, to the 39Kb of libffi (also stripped, Linux x64)!

Here are some ideas to try and reduce this footprint:
  • compile DragonFFI, Clang and LLVM using (Thin) LTO, with hidden visibility for both Clang and LLVM. This could have the effect of removing code from Clang/LLVM that isn't used by DragonFFI.
  • make DragonFFI more modular:
      - one core module that only has the parts of CodeGen that deal with ABIs. If the types and function prototypes are defined "by hand" (without DFFI::cdef), that's more or less the only part that is needed (with LLVM, obviously)
      - one optional module that includes the full Clang compiler (to provide the DFFI::cdef and DFFI::compile APIs)
Even with all of this, it seems to be really hard to match the 39Kb of libffi, even if we remove the cdef/compile API from DragonFFI. As always, pick the right tool for your needs :)


Writing the first working version of DragonFFI has been a fun experiment, that made me discover new parts of Clang/LLVM :) The current goal is to try and achieve a first stable version (see above), and experiment with the various cited ideas.

It's a really long road, so feel free to come to #dragonffi on FreeNode if you have any questions or suggestions, or if you want to contribute!


Thanks to Serge «sans paille» Guelton for the discussions around the Python bindings, and for helping me find the name of the project :) (one of the most difficult tasks). Thanks also to him, Fernand Lone-Sang and Kévin Szkudlapski for their review of this blog post!

by Adrien Guinet at March 13, 2018 02:45 PM

March 08, 2018


International Women's Day: Celebrating all the women in the LLVM Community!

Today is International Women's Day! To all the women in the LLVM community, thank you for all your contributions!

The LLVM Foundation values diversity within the LLVM community and the field of compilers and tools. Our Women in Compilers and Tools program began in 2015 with a birds of a feather discussion during the US LLVM Developers' Meeting and we have been expanding it over the years.

In 2017, we were a sponsor of the Grace Hopper Conference. With the help of community members Anna Zaks and David Blaikie, the LLVM Foundation had a booth at the career fair to introduce women to LLVM and encourage them to become contributors. It was very exciting to learn that many women knew of LLVM, were using it in their classes or research, using it in their career, or were interested in learning more. We hopefully encouraged more women to get involved with LLVM, compilers, and open source.

The LLVM Foundation was also a sponsor of the Programming Language Mentoring Workshop at SPLASH 2017. Our sponsorship went towards the travel costs for many women and other minorities to attend this workshop. The workshop focused on encouraging and preparing students to enter research careers in the field of programming languages, compilers, and related fields and to provide first hand perspectives on graduate school.

We hosted our first Women in Compilers & Tools reception before the 2017 US LLVM Developers' Meeting. Anna Zaks and Alice Chan participated in a panel discussion about the challenges and experiences that they have encountered in their careers and within the open source community. The event was attended by 60 members of the LLVM community.

In 2018, we look forward to another year of expanding our program. The LLVM Foundation will again sponsor the Grace Hopper Conference and we are looking for LLVM community members to help out at the career booth (more details to come). We will be having two Women in Compilers and Tools events. The first will have a reception and panel discussion before the 2018 EuroLLVM Developers' Meeting. Get your tickets here. The second will be before the 2018 US LLVM Developers' Meeting and details will be announced in the coming months.

The LLVM Foundation thanks the LLVM community and its sponsors for supporting this work. If you want to participate in the discussion or receive notifications on events, please join the Women in Compilers and Tools mailing list.

Question for the LLVM Foundation? Email us at

by Tanya Lattner at March 08, 2018 07:15 PM

March 06, 2018


Clang is now used to build Chrome for Windows

As of Chrome 64, Chrome for Windows is compiled with Clang. We now use Clang to build Chrome for all platforms it runs on: macOS, iOS, Linux, Chrome OS, Android, and Windows. Windows is the platform with the second most Chrome users after Android according to statcounter, which made this switch particularly exciting.

Clang is the first-ever open-source C++ compiler that’s ABI-compatible with Microsoft Visual C++ (MSVC) – meaning you can build some parts of your program (for example, system libraries) with the MSVC compiler (“cl.exe”), other parts with Clang, and when linked together (either by MSVC’s linker, “link.exe”, or LLD, the LLVM project’s linker – see below) the parts will form a working program.

Note that Clang is not a replacement for Visual Studio, but an addition to it. We still use Microsoft’s headers and libraries to build Chrome, we still use some SDK binaries like midl.exe and mc.exe, and many Chrome/Win developers still use the Visual Studio IDE (for both development and for debugging).

This post discusses numbers, motivation, benefits and drawbacks of using Clang instead of MSVC, how to try out Clang for Windows yourself, project history, and next steps. For more information on the technical side you can look at the slides of our 2015 LLVM conference talk, and the slides linked from there.


Numbers

This is what most people ask about first, so let’s talk about it first. We think the other sections are more interesting though.

Build time

Building Chrome locally with Clang is about 15% slower than with MSVC. (We’ve heard that Windows Defender can make Clang builds a lot slower on some machines, so if you’re seeing larger slowdowns, make sure to whitelist Clang in Windows Defender.) However, the way Clang emits debug info is more parallelizable and builds with a distributed build service (e.g. Goma) are hence faster.

Binary size

Chrome installer size gets smaller for 64-bit builds and slightly larger for 32-bit builds using Clang. The same difference shows in uncompressed code size for regular builds as well (see the tracking bug for Clang binary size for many numbers). However, compared to MSVC builds using link-time code generation (LTCG) and profile-guided optimization (PGO) Clang generates larger code in 64-bit for targets that use /O2 but smaller code for targets that use /Os. The installer size comparison suggests Clang's output compresses better.

Some raw numbers for versions 64.0.3278.2 (MSVC PGO) and 64.0.3278.0 (Clang). mini_installer.exe is Chrome’s installer that users download, containing the LZMA-compressed code. chrome_child.dll is one of the two main dlls; it contains Blink and V8, and generally has many targets that are built with /O2. chrome.dll is the other main dll, containing the browser process code, mostly built with /Os.

Configuration       mini_installer.exe   chrome.dll          chrome_child.dll   (unlabeled)
32-bit win-pgo      45.46 MB             36.47 MB            53.76 MB           1.38 MB
32-bit win-clang    45.65 MB             42.56 MB (+16.7%)   62.38 MB           1.45 MB
64-bit win-pgo      49.4 MB              53.3 MB             65.6 MB            1.6 MB
64-bit win-clang    46.27 MB             50.6 MB             72.71 MB           1.57 MB
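The relative differences can be recomputed from the raw sizes with a few lines of Python. The snippet assumes that the first three figures of each configuration are, in order, mini_installer.exe, chrome.dll and chrome_child.dll. That ordering is inferred from the surrounding description and is not stated with the numbers. The fourth figure is left out because its label is not available.

```python
# Sizes in MB: (mini_installer.exe, chrome.dll, chrome_child.dll), assumed column order.
sizes_mb = {
    "32-bit": {"win-pgo": (45.46, 36.47, 53.76), "win-clang": (45.65, 42.56, 62.38)},
    "64-bit": {"win-pgo": (49.4, 53.3, 65.6), "win-clang": (46.27, 50.6, 72.71)},
}


def delta_percent(before, after):
    """Relative size change from the MSVC PGO build to the Clang build, in percent."""
    return round(100 * (after - before) / before, 1)


deltas = {
    arch: tuple(delta_percent(b, a) for b, a in zip(v["win-pgo"], v["win-clang"]))
    for arch, v in sizes_mb.items()
}

for arch, values in deltas.items():
    print(arch, values)
```

With that ordering, the 32-bit chrome.dll delta reproduces the +16.7% quoted in the figures. The 64-bit deltas come out at about -6.3%, -5.1% and +10.8%, which matches the prose: a smaller installer, smaller /Os code and larger /O2 code.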


Performance

We conducted extensive A/B testing of performance. Performance telemetry numbers are about the same for MSVC-built and clang-built Chrome – some metrics get better, some get worse, but all of them are within 5% of each other. The official MSVC builds used LTCG and PGO, while the Clang builds currently use neither of these. This is potential for improvement that we look forward to exploring. The PGO builds took a very long time to build due to the need for collecting profiles and then building again, and as a result, the configuration was not enabled on our performance-measurement buildbots. Now that we use Clang, the perf bots again track the configuration that we ship.

Startup performance was worse in Clang-built Chrome until we started using a link-order file – a form of “PGO light”.


Stability

We A/B-tested stability as well and found no difference between the two build configurations.


Motivation

There were many motivating reasons for this project, the overarching theme being the benefits of using the same compiler across all of Chrome’s platforms, as well as the ability to change the compiler and deploy those changes to all our developers and buildbots quickly. Here’s a non-exhaustive list of examples.
  • Chrome is heavily using technology that’s based on compiler instrumentation (ASan, CFI, ClusterFuzz—uses ASan). Clang supports this instrumentation already, but we can’t add it to MSVC. We previously used after-the-fact binary instrumentation to mitigate this a bit, but having the toolchain write the right bits in the first place is cleaner and faster.
  • Clang enables us to write compiler plugins that add Chromium-specific warnings and to write tooling for large-scale refactoring. Chromium’s code search can now learn to index Windows code.
  • Chromium is open-source, so it’s nice if it’s built with an open-source toolchain.
  • Chrome runs on 6+ platforms, and most developers are only familiar with 1-3 platforms. If your patch doesn’t compile on a platform you’re unfamiliar with, due to a compiler error that you can’t reproduce on your local development machine, it’ll take you a while to fix. On the other hand, if all platforms use the same compiler, if it builds on your machine then it’s probably going to build on all platforms.
  • Using the same compiler also means that compiler-specific micro-optimizations will help on all platforms (assuming that the same -O flags are used on all platforms – not yet the case in Chrome, and only on the same ISAs – x86 vs ARM will stay different).
  • Using the same compiler enables cross-compiling – developers who feel most at home on a Linux box can now work on Windows-specific code, from their Linux box (without needing to run Wine).
  • We can continuously build Chrome trunk with Clang trunk to find compiler regressions quickly. This allows us to update Clang every week or two. Landing a major MSVC update in Chrome usually took a year or more, with several rounds of reporting internal compiler bugs and miscompiles. The issue here isn’t that MSVC is more buggy than Clang – it isn’t, all software is buggy – but that we can continuously improve Clang due to Clang being open-source.
  • C++ receives major new revisions every few years. When C++11 was released, we were still using six different compilers, and enabling C++11 was difficult. With fewer compilers, this gets much easier.
  • We can prioritize compiler features that are important to us. For example:

Of course, not all – or even most – of these reasons will apply to other projects.

Benefits and drawbacks of using Clang instead of Visual C++

Benefits of using Clang, if you want to try for your project:
  • Clang supports 64-bit inline assembly. For example, in Chrome we built libyuv (a video format conversion library) with Clang long before we built all of Chrome with it. libyuv had highly-tuned 64-bit inline assembly with performance not reachable with intrinsics, and we could just use that code on Windows.
  • If your project runs on multiple platforms, you can use one compiler everywhere. Building your project with several compilers is generally considered good for code health, but in Chrome we found that Clang’s diagnostics found most problems and we were mostly battling compiler bugs (and if another compiler has a great new diagnostic, we can add that to Clang).
  • Likewise, if your project is Windows-only, you can get a second compiler’s opinion on your code, and Clang’s warnings might find bugs.
  • You can use Address Sanitizer to find memory bugs.
  • If you don’t use LTCG and PGO, it’s possible that Clang might create faster code.
  • Clang’s diagnostics and fix-it hints.
There are also drawbacks:
  • Clang doesn’t support C++/CX or #import "foo.dll".
  • MSVC offers paid support, Clang only gives you the code and the ability to write patches yourself (although the community is very active and helpful!).
  • MSVC has better documentation.
  • Advanced debugging features such as Edit & Continue don’t work when using Clang.

How to use

If you want to give Clang for Windows a try, there are two approaches:
  1. You could use clang-cl, a compiler driver that tries to be command-line flag compatible with cl.exe (just like Clang tries to be command-line flag compatible with gcc). The Clang user manual describes how you can tell popular Windows build systems how to call clang-cl instead of cl.exe. We used this approach in Chrome to keep the Clang/Win build working alongside the MSVC build for years, with minimal maintenance cost. You can keep using link.exe, all your current compile flags, the MSVC debugger or windbg, ETW, etc. clang-cl even writes warning messages in a format that’s compatible with cl.exe so that you can click on build error messages in Visual Studio to jump to the right file and line. Everything should just work.
  2. Alternatively, if you have a cross-platform project and want to use gcc-style flags for your Windows build, you can pass a Windows triple (e.g. --target=x86_64-windows-msvc) to regular Clang, and it will produce MSVC-ABI-compatible output. Starting in Clang 7.0.0, due Fall 2018, Clang will also default to CodeView debug info with this triple.
Since Clang’s output is ABI-compatible with MSVC, you can build parts of your project with clang and other parts with MSVC. You can also pass /fallback to clang-cl to make it call cl.exe on files it can’t yet compile (this should be rare; it never happens in the Chrome build).
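As a rough illustration of the two approaches, here is a Python sketch that builds the corresponding compiler command lines, for instance for use in a build script. The helper names and file names are made up. The flags used are the cl.exe-style /c and /Fo for clang-cl, and -c, -o and --target for regular Clang.

```python
def clang_cl_command(source, obj):
    """Approach 1: cl.exe-compatible driver and flags."""
    # /c compiles without linking, /Fo names the output object file
    return ["clang-cl", "/c", source, "/Fo" + obj]


def clang_gcc_style_command(source, obj, triple="x86_64-windows-msvc"):
    """Approach 2: regular Clang with gcc-style flags and a Windows triple."""
    return ["clang", "--target=" + triple, "-c", source, "-o", obj]


print(" ".join(clang_cl_command("foo.cc", "foo.obj")))
print(" ".join(clang_gcc_style_command("foo.cc", "foo.obj")))
```

Either list can be handed to subprocess.run on a machine where LLVM is installed. Both approaches are meant to produce MSVC-ABI-compatible object files, so the resulting objects can be linked together.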

clang-cl accepts Microsoft language extensions needed to parse system headers but tries to emit -Wmicrosoft-foo warnings when it does so (warnings are ignored for system headers). You can choose to fix your code, or pass -Wno-microsoft-foo to Clang.

link.exe can produce regular PDB files from the CodeView information that Clang writes.

Project History

We switched chrome/mac and chrome/linux to Clang a while ago. But on Windows, Clang was still missing support for parsing many Microsoft language extensions, and it didn’t have any Microsoft C++ ABI-compatible codegen at all. In 2013, we spun up a team to improve Clang’s Windows support, consisting half of Chrome engineers with a compiler background and half of other toolchain people. In mid-2014, Clang could self-host on Windows. In February 2015, we had the first fallback-free build of 64-bit Chrome, in July 2015 the first fallback-free build of 32-bit Chrome (32-bit SEH was difficult). In Oct 2015, we shipped a first clang-built Chrome to the Canary channel. Since then, we’ve worked on improving the size of Clang’s output, improved Clang’s debug information (some of it behind -instcombine-lower-dbg-declare=0 for now), and A/B-tested stability and telemetry performance metrics.

We use versions of Clang that are pinned to a recent upstream revision that we update every one to three weeks, without any local patches. All our work is done in upstream LLVM.

Mid-2015, Microsoft announced that they were building on top of our work of making Clang able to parse all the Microsoft SDK headers with clang/c2, which used the Clang frontend for parsing code, but cl.exe’s codegen to generate code. Development on clang/c2 was halted again in mid-2017; it is conceivable that this was related to our improvements to MSVC-ABI-compatible Clang codegen quality. We’re thankful to Microsoft for publishing documentation on the PDB file format, answering many of our questions, fixing Clang compatibility issues in their SDKs, and for giving us publicity on their blog! Again, Clang is not a replacement for MSVC, but a complement to it.

Opera for Windows is also compiled with Clang starting in version 51.

Firefox is also looking at using clang-cl for building Firefox for Windows.

Next Steps

Just as clang-cl is a cl.exe-compatible interface for Clang, lld-link is a link.exe-compatible interface for lld, the LLVM linker. Our next step is to use lld-link as an alternative to link.exe for linking Chrome for Windows. This has many of the same advantages as clang-cl (open-source, easy to update, …). Also, using clang-cl together with lld-link allows using LLVM-bitcode-based LTO (which in turn enables using CFI) and using PE/COFF extensions to speed up linking. A prerequisite for using lld-link was its ability to write PDB files.
We’re also considering using libc++ instead of the MSVC STL – this allows us to instrument the standard library, which is again useful for CFI and Address Sanitizer.

In Closing

Thanks to the whole LLVM community for helping to create the first new production C++ compiler for Windows in over a decade, and the first-ever open-source C++ compiler that’s ABI-compatible with MSVC!

by Nico Weber at March 06, 2018 06:54 PM

February 14, 2018


LLVM accepted to 2018 Google Summer of Code!

We are excited to announce the LLVM project has been accepted to 2018 Google Summer of Code!

What is Google Summer of Code?

Google Summer of Code (GSoC) is a global program focused on introducing students to open source software development. Students work on a 3 month programming project with an open source organization during their break from university. There are several benefits to this program for both the students and LLVM:

  • Inspire students to get involved with open source, compilers and LLVM
  • Give students exposure to real-world software development while getting paid a stipend
  • Allow students to do paid work related to their academic pursuits versus getting an unrelated summer job
  • Bring new developers into the LLVM project
  • Some LLVM bugs get fixed or new features get added

Students - Apply now! 

Ok, so you can't apply right now as the official application to GSoC does not open until March 12, 2018, but you must begin discussing your project on the LLVM mailing lists well before that date. There are many open projects listed on our webpage. Once you have selected a project, you will discuss it on the appropriate mailing list.

If you have an idea for a project that is not listed, you can always propose it on the list as well and seek out a mentor.

Key Dates to Remember

We have listed a few key dates here, but always consult the official GSoC timeline to confirm.

  • March 12 (16:00 UTC) - Applications open
  • March 27 (16:00 UTC) - Deadline to file your application
  • April 23 (16:00 UTC) - Accepted student proposals are announced
  • May 14 - Coding begins

LLVM Developers - Consider being a mentor!

This program is not a success without our mentors. Thank you to all who have already volunteered! If you have never mentored a GSoC project but are curious, it is not too late to volunteer! You can either select an open project without a mentor or propose your own. Make sure to get it listed on the webpage so that students can see it as an option.

If mentoring just isn't an option for you at this time, consider helping the project out by spreading the word about GSoC.


If you have questions about the program for the organizers, please email them. Project-specific questions should be sent to the appropriate developer mailing list instead.

by Tanya Lattner at February 14, 2018 09:38 PM

January 09, 2018


Improving Link Time on Windows with clang-cl and lld

One of our goals in bringing clang and lld to Windows has always been to improve developer experience, and what is it that developers want the most?  Faster build times!  Recently, our focus has been on improving link time because it's the step that's the hardest to parallelize so we can't fall back on the time honored tradition of throwing more cores at it.

Of the various steps involved in linking, generating the debug info (which, on Windows, is a PDB file) is by far the slowest since it involves merging O(# of linker inputs) sequences of type records, most of which are duplicate anyway.  For example, if two cpp files both include <string>, then both of those object files will have hundreds of duplicate type records that need to be de-duplicated during the link step.  This means you have to compute O(M x N) hash values, even though only a small fraction of those ultimately contribute to the final PDB.

Several strategies have been invented to deal with this over the years and try to make linking faster.  Many years ago, Microsoft introduced the notion of a Type Server (enabled via /Zi compiler option in MSVC), which moves some of the work into the compiler (to take advantage of parallelism).  More recently we have been given the /DEBUG:FASTLINK linker option which attempts to solve the problem by not merging types at all in the linker.  However, each of these strategies has its own set of disadvantages, and neither can be considered perfect for all use cases.

In this blog post, we'll first go over some technical background about CodeView so that we can understand the problem, followed by a summary of existing attempts to speed up type merging.  Then, we'll describe a novel extension to the PE/COFF file format which speeds up linking by offloading part of the work required to de-duplicate types to the compiler and using a new algorithm which uniquely identifies type records even across input files, and discuss the various tradeoffs of each approach.  Finally, we'll present some benchmarks and discuss how you can try this out in clang-cl and lld today.


Consider a simple structure in C++, defined like this in a header file:

     struct Node {
       Node *Next = nullptr;
       Node *Prev = nullptr;
       int Value = 0;
     };

Since each compilation happens independently of every other compilation, the compiler cannot assume any other translation unit will ever emit the records necessary to describe this type.  As a result, to guarantee that the type makes it into the final PDB, every compiler instance that encounters this definition must emit type information for this type.  So the record will be serialized by the compiler into a series of records that looks roughly like this:

0x1004 | LF_STRUCTURE [size = 40] `Node`
         unique name: `.?AUNode@@`
         vtable: <none>
         base list: <none>
         field list: <none>
         options: forward ref | has unique name
0x1005 | LF_POINTER [size = 12]
         referent = 0x1004
         mode = pointer
         opts = None
         kind = ptr32
0x1006 | LF_FIELDLIST [size = 52]
         - LF_MEMBER
           name = `Next`
           Type = 0x1005
           Offset = 0
           attrs = public
         - LF_MEMBER
           name = `Prev`
           Type = 0x1005
           Offset = 4
           attrs = public
         - LF_MEMBER
           name = `Value`
           Type = 0x0074 (int)
           Offset = 8
           attrs = public
0x1007 | LF_STRUCTURE [size = 40] `Node`
         unique name: `.?AUNode@@`
         vtable: <none>
         base list: <none>
         field list: 0x1006
         options: has unique name
The values on the left correspond to each type's index in the type sequence and depend on what types have already been encountered. Other records can then refer to them by index; for example, referent = 0x1004 means that this record is a pointer to whatever type is at index 0x1004.

As a result of this design, another compilation unit which includes the same header file will need to emit this exact same type, with the only difference being the indices (since the other compilation may encounter other types before this one, causing the ordering to be different).

In short, type indices only make sense within the context of a single type sequence (i.e. compiland), but since the linker needs to see across all object files, it has to have some way of identifying whether a type from object file A is isomorphic to a different type from object file B, even if its type indices might be different numerically from any previously seen type. 

This algorithm, henceforth referred to as type merging, is the primary consumer of CPU cycles during linking (measured in LLD, and estimated in MSVC linker by comparing /DEBUG:FULL vs /DEBUG:FASTLINK times), and as such it is the portion of the linking process which this blog post presents a new solution to.

Existing Solutions

It’s worthwhile to discuss some of the existing attempts to reduce the cost associated with type merging so that we can compare and contrast their various pros and cons.

Type Servers (/Zi)

The /Zi compiler option was one of the first attempts to address type merging speed, and it dates back many years.  The idea behind type servers is to offload the work of de-duplication from the linking phase to the compilation phase.  Most build systems already support parallel compilation, and even if they don’t, cl.exe supports it natively via the /MP compiler switch, so there is no roadblock to anyone taking advantage of parallel compilation.

To implement type servers, each compilation process communicates via IPC with a single process (mspdbsrv.exe) whose job is to de-duplicate type records on the fly.  When a record is isomorphic to an existing record, the type server communicates back the previously saved index; when it is new, it sends back a new index.  This allows type deduplication to happen mostly in parallel, adding some overhead to each compilation (since there is contention over a global lock) in return for significantly reduced link times, since types will already have been merged.

Type servers bring with them some disadvantages though, so we enumerate them here:
  1. Type servers add significant context switching and global lock contention to the compilation phase, reducing parallelism and degrading overall system performance while a build is in process.  While some performance is reclaimed from the linker, some is sacrificed due to the use of a global system lock.  It’s still a net win, but as it is not free, it leaves open the possibility that we may be able to achieve better parallelism using a different approach.
  2. The type server process itself (mspdbsrv.exe) introduces a single point of failure.  When it crashes (we see C1033 several times per day on Chrome, for example, which seems to indicate an mspdbsrv.exe crash) it could trigger a full rebuild if the type server PDB file is left in a corrupt state.
  3. mspdbsrv is incompatible with distributed builds, which is a show-stopper for large applications that can take several hours to build on normal workstations.  Type servers operate only via local IPC.  While multi-processing works well for small applications, many large products have build farms that distribute compilations among tens or hundreds of physical machines.  Type servers are incompatible with this scenario.

Fastlink PDBs

Fastlink PDBs are a relatively recent introduction, and the approach used by this solution is to eliminate type merging entirely.  To support this, special metadata is set in the PDB file to indicate to the tool that this is a fastlink PDB, and when the tool (e.g. debugger) encounters this metadata, it will fetch all type information from the original object file, rather than from the PDB.  As before, there are several disadvantages to this approach, enumerated here:
  1. The pdbcopy utility is almost unusable with fastlink PDBs for performance reasons.
  2. Since type merging doesn’t happen, indexing of type information also doesn’t happen (since the expensive part of building an index -- the hashing -- comes for free when you were hashing the record anyway).  This leads to degradation in the debugger user experience, since waits which previously happened only at build time now happen at debug-time.
  3. Fastlink PDBs are not portable.  The PDB references the object files by path, so if you copy the PDB and object files to a different machine (or even a different path on the same machine) for archival purposes, they can no longer be debugged.  This is a deal-breaker for using it on production builds.
  4. Symbols can’t be enumerated in a Fastlink PDB.  This is most obvious if you attempt to use DIA SDK on a Fastlink PDB, where it will simply refuse to do anything at all.  This means that the only externally supported way of querying debug info for users is impossible against a Fastlink PDB.  Beyond that, however, it also means that even Microsoft’s own tools which need to enumerate symbols cannot use any standard API for doing so.  For example, WinDbg doesn’t fully support Fastlink PDBs, and many workflows are broken by the use of them, even using supported Microsoft tools.
  5. It has several serious stability issues which make it unusable on large projects  [ref].  This is probably related to point 4 above, namely the fact that every tool that wants to be able to work with a Fastlink PDB needs to use different code than the SDK that has been tested and battle-hardened through years of development.
  6. When compiling with clang-cl and linking with /debug:fastlink the compiler has to be instructed to emit additional debug information, making .obj files about 29% larger.

Clang's Solution - The COFF .debug$H section

This new approach tries to combine the ideas behind type servers and fastlink PDBs.  Like type servers, it attempts to offload the work of de-duplication to the compilation phase so that it can be done in parallel.  However, it does so using an algorithm with the property that the resulting hash can be used to identify a type record even across type streams.  Specifically, if two records have the same hash, they are the same record even if they are from different object files.  If you can take it on faith that such an algorithm exists (which will be henceforth referred to as a global hash), then the amount of work that the linker needs to perform is greatly reduced.  And the work that it does still have to do can be done much quicker.  Perhaps most importantly, it produces a byte-for-byte identical PDB to when the option is not used, meaning all of the issues surrounding Fastlink PDBs and compatibility are gone.

Previously, the linker would do something that looks roughly like this:

     HashTable<Type> HashedTypes;
     vector<Type> MergedTypes;
     for (ObjectFile &Obj : Objects) {
       for (Type &T : Obj.types()) {
         remapAllTypeIndices(MergedTypes, T);

         if (!HashedTypes.try_insert(T))
           continue;              // duplicate of a record seen earlier
         MergedTypes.push_back(T);
       }
     }

The important observations here are:
  1. remapAllTypeIndices is called unconditionally for every type in every object file.
  2. A hash of the type is computed unconditionally for every type
  3. At least one full record comparison is done for every type.  In practice it turns out to be much more, because hash buckets are computed modulo table size, so there will actually be 1 full record comparison for every probe.
Given a global hash function as described above, the algorithm can be re-written like this:
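      HashMap<SHA1, int> HashToIndex;
      vector<Type> OrderedTypes;
      for (ObjectFile &Obj : Objects) {
        auto Hashes = Obj.DebugHSectionHashes;
        for (int I=0; I < Obj.NumTypes; ++I) {
          int NextIndex = OrderedTypes.size();
          if (!HashToIndex.try_emplace(Hashes[I], NextIndex))
            continue;             // hash already known: skip the record entirely
          Type &T = Obj.types()[I];
          remapAllTypeIndices(OrderedTypes, T);
          OrderedTypes.push_back(T);
        }
      }
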
While this appears very similar, its performance characteristics are quite different.
  1. remapAllTypeIndices is only called when the record is actually new.  Which, as we discussed earlier, is a small fraction of the time over many linker inputs.
  2. A hash of the type is never computed by the linker.  It is simply there in the object file (the exception to this is mixed linker inputs, discussed below, but those are a small fraction of input files).
  3. Full record comparisons never happen.  Since we are using a strong hash function with negligible chance of false collisions, and since the hash of a record provides equality semantics across streams, the hash is as good as the record itself.
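
The three points above can be illustrated with a toy Python sketch of the hash-driven loop. It is only a model under stated assumptions: records are plain strings, the hashes are made-up labels rather than real SHA1 values, and the counter `slow_path` merely stands in for the work `remapAllTypeIndices` would do.

```python
def merge_with_global_hashes(objects):
    """Merge type records from several object files using precomputed hashes.

    `objects` is a list of (hashes, records) pairs, one per object file, where
    hashes[i] is the global hash of records[i]. Returns the merged records and
    the number of records that had to take the slow path.
    """
    hash_to_index = {}   # global hash -> index in the merged stream
    merged = []
    slow_path = 0
    for hashes, records in objects:
        for h, record in zip(hashes, records):
            if h in hash_to_index:
                continue             # duplicate: the record itself is never read
            hash_to_index[h] = len(merged)
            slow_path += 1           # stands in for remapAllTypeIndices
            merged.append(record)
    return merged, slow_path


# Two hypothetical object files that share two of their three records.
obj_a = (["h_int_ptr", "h_node", "h_foo"], ["int*", "Node", "Foo"])
obj_b = (["h_node", "h_bar", "h_int_ptr"], ["Node", "Bar", "int*"])
merged, slow_path = merge_with_global_hashes([obj_a, obj_b])
```

Of the six input records only four reach the slow path; the two duplicates are rejected by a single lookup on their hash.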

Combining all of these points, we get an algorithm that is extremely cache friendly.  Amortized over all input files, most records during type merging are cache hits (i.e. duplicate records).  With this algorithm when we get a cache hit, the only two data structures that are accessed are:
  1. An array of contiguous hash values.
  2. An array of contiguous hash buckets.
Since we never do full equality comparison (which would blow out the L1 and sometimes even L2 cache due to the average size of a type record being larger than a cache line) the algorithm here is very fast.

We’ve deferred discussion of how to create such a hash up until now, but it is actually fairly straightforward.  We use what is known as a “tree hash” or “Merkle tree”.  The idea is to pass bytes from a type record directly to the hash function up until the point we get to a type index.  Then, instead of passing the numeric value of the type index to the hash function, we pass the previously computed hash of the record that is being referenced.

Such a hash is very fast to compute in the compiler because the compiler must already hash types anyway, so the incremental cost to emit this to the .debug$H section is negligible.  For example, when a type is encountered in a translation unit, before you can add that type to the object file’s .debug$T section, it must first be verified that the type has not already been added.  And since this is happening naturally in the order in which types are encountered, all that has to be done is to save these hash values in an array indexed by type index, and subsequent hash operations will have O(1) access to all of the information needed to compute this merkle hash.
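
A minimal Python sketch of such a tree hash follows. The record encoding is invented for illustration (byte strings with `TypeRef` placeholders, not real CodeView records). Hashing built-in type indices below 0x1000 by value is also an assumption of this sketch. Only the substitution of referenced hashes for local indices comes from the description above.

```python
import hashlib

FIRST_NON_SIMPLE = 0x1000  # assumption: indices below this denote built-in types


class TypeRef:
    """A reference to another type record by index within one type stream."""
    def __init__(self, index):
        self.index = index


def global_hashes(records, first_index=FIRST_NON_SIMPLE):
    """Return one SHA-1 digest per record; records[i] has index first_index + i.

    Each record is a list of parts: raw bytes, or a TypeRef to an earlier
    record (or to a built-in type).
    """
    hashes = []
    for parts in records:
        h = hashlib.sha1()
        for part in parts:
            if isinstance(part, TypeRef):
                if part.index < first_index:
                    # Built-in type: its index means the same thing everywhere.
                    h.update(part.index.to_bytes(4, "little"))
                else:
                    # Feed the referenced record's hash, not its local index.
                    h.update(hashes[part.index - first_index])
            else:
                h.update(part)
        hashes.append(h.digest())
    return hashes


def node_records(base):
    """The four records for `struct Node`, starting at type index `base`."""
    return [
        [b"LF_STRUCTURE Node fwdref"],
        [b"LF_POINTER ", TypeRef(base)],
        [b"LF_FIELDLIST Next ", TypeRef(base + 1), b" Prev ", TypeRef(base + 1),
         b" Value ", TypeRef(0x0074)],
        [b"LF_STRUCTURE Node ", TypeRef(base + 2)],
    ]


# Stream B sees an unrelated type first, so all of Node's indices shift by one.
stream_a = node_records(0x1000)
stream_b = [[b"LF_STRUCTURE Other fwdref"]] + node_records(0x1001)
hashes_a = global_hashes(stream_a)
hashes_b = global_hashes(stream_b)
```

Although every type index in `stream_b` differs from the one in `stream_a`, the four `Node` records hash to the same values in both streams, which is the property the linker relies on.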

Mixed Input Files and Compiler/Linker Compatibility

A linker must be prepared to deal with a mixed set of input files.  For example, while a particular compiler may choose to always emit .debug$H sections, a linker must be prepared to link objects that for whatever reason do not have this section.  To handle this, the linker can examine all inputs up front and manually compute hashes for inputs with missing .debug$H sections.  In practice this proves to be a small fraction and the penalty for doing this serially is negligible, although it should be noted that in theory this can also be done as a parallel pre-processing step if some use cases show that this has non-negligible cost.

Similarly, the emission of this section in an object file has no impact on linkers which have not been taught to use it.  Since it is a purely additive (and optional) inclusion into the object file, any linker which does not understand it will continue to work exactly as it does today.

The On-Disk Format

Clang uses the following on-disk format for the .debug$H section.

     0x0     : <Section Magic>  (4 bytes)
     0x4     : <Version>        (2 bytes)
     0x6     : <Hash Algorithm> (2 bytes)
     0x8     : <Hash Value>     (N bytes)
     0x8 + N : <Hash Value>     (N bytes)

Here, “Section Magic” is an arbitrarily chosen 4-byte number whose purpose is to provide some level of certainty that what we’re seeing is a real .debug$H section, and not some section that someone created that accidentally happened to be called that.  Our current implementation uses the value 0x133C9C5 (20171205 in decimal), which represents the date of the initial prototype implementation.  But this can be any reasonable value, as long as it never changes.

“Version” is reserved for future use, so that the format of the section can theoretically change.

“Hash Algorithm” is a value that indicates what algorithm was used to generate the hashes that follow.  As such, the value of N above is also a function of what hash algorithm is used.  Currently, the only proposed value for Hash Algorithm is SHA1 = 0, which would imply N = 20 when Hash Algorithm = 0.  Should it prove useful to have truncated 8-byte SHA1 hashes, we could define SHA1_8 = 1, for example.
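
As a rough illustration of this layout, the following Python sketch splits the raw bytes of a .debug$H section into its header fields and hash array. Little-endian byte order is an assumption here (the text does not state it). The version number 0 and the hash contents in the sample are made up.

```python
import struct

DEBUG_H_MAGIC = 0x133C9C5   # value quoted in the text
HASH_SIZES = {0: 20}        # Hash Algorithm -> N; only SHA1 = 0 is described


def parse_debug_h(data):
    """Split the contents of a .debug$H section into (version, algorithm, hashes)."""
    # Assumed little-endian: 4-byte magic, 2-byte version, 2-byte algorithm.
    magic, version, algorithm = struct.unpack_from("<IHH", data, 0)
    if magic != DEBUG_H_MAGIC:
        raise ValueError("not a .debug$H section")
    if algorithm not in HASH_SIZES:
        raise ValueError(f"unknown hash algorithm {algorithm}")
    n = HASH_SIZES[algorithm]
    body = data[8:]
    if len(body) % n != 0:
        raise ValueError("truncated hash array")
    return version, algorithm, [body[i:i + n] for i in range(0, len(body), n)]


# Build a made-up section with two 20-byte hashes and read it back.
section = (struct.pack("<IHH", DEBUG_H_MAGIC, 0, 0)
           + bytes(range(20)) + bytes(range(20, 40)))
version, algorithm, hashes = parse_debug_h(section)
```

Because the hash array has no explicit count, the number of hashes follows from the section size, and entry i belongs to the i-th type record of the same object file.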

Limitations and Pitfalls

The biggest limitation of this format is that it increases object file size.  Experiments locally on fairly large projects show an average aggregate object file size increase of ~15% compared to /DEBUG:FULL (which, for clang-cl, actually makes .debug$H object files smaller than those needed to support /DEBUG:FASTLINK).

There is another, less obvious potential pitfall as well.  The worst case scenario is when no input files have a .debug$H section present, but this limitation is the same in principle even if only a subset of files have a .debug$H section.  Since the linker must agree on a single hash function for all object files, there is the question of what to do when not all object files agree on hash function, or when not all object files contain a .debug$H section.  If the code is not written carefully, you could get into a situation where, for example, no input files contain a .debug$H section so the linker decides to synthesize one on the fly for every input file.  Since SHA1 (for example) is quite slow, this could cause a huge performance penalty.

This limitation can be coded around with some care, however.  For example, tree hashes can be computed up-front in parallel as a pre-processing step.  Alternatively, a hash function could be chosen based on some heuristic estimate of what would likely lead to the fastest link (based on the percentage of inputs that had a .debug$H section, for example).  There are other possibilities as well.  The important thing is to just be aware of this potential pitfall, and if your links become very slow, you'll know that the first thing you should check is "do all my object files have .debug$H sections?"

Finally, since a hash is considered to be identical to the original record, we must consider the possibility of collisions.  That said, this does not appear to be a serious concern in practice.  A single PDB can have a theoretical maximum of 2^32 type records anyway (due to a type index being 4 bytes).  The following table shows the expected number of type records needed for a collision to exist as a function of hash size.

Hash Size (Bytes)   Average # of records needed for a collision
12                  3.53 x 10^14
16                  2.31 x 10^19
20                  1.52 x 10^24

Given that this is strictly for debug information and not generated code, it’s worth thinking about the severity of a collision.  We feel that an 8-byte hash is probably acceptable for real world use.
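
The figures in the table are consistent with the usual birthday-bound approximation, sqrt(pi/2 * 2^b) for a b-bit hash. The short Python sketch below evaluates that formula. It assumes hash values are uniformly distributed, and the function name is ours.

```python
import math


def records_for_collision(hash_bytes):
    """Expected number of uniformly random hashes drawn before the first collision."""
    return math.sqrt(math.pi / 2 * 2 ** (8 * hash_bytes))


for size in (8, 12, 16, 20):
    print(f"{size:2d} bytes: {records_for_collision(size):.2e}")
```

For an 8-byte hash the same formula gives roughly 5.4 x 10^9 records, which is still above the 2^32 (about 4.3 x 10^9) type records a single PDB can hold.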


Here we will give some benchmarks on large real world applications (specifically, Chrome and clang).  The times presented are only for the linker.  gn args for each build of chromium are specified at the end.



The numbers here indicate a reduction in link time of up to 30% by enabling /DEBUG:GHASH in lld.

It's worth mentioning that lld does not yet have support for incremental linking so we could not compare the cost of an incremental link with /DEBUG:GHASH versus MSVC.  We still expect incremental linking using MSVC under optimal conditions (e.g. change whitespace in a header file) to produce much faster links than lld is currently able to do.

There are several possible avenues for further optimization though, so we will finish up by discussing them.

Further Improvements

There are several ways to improve the times further, which have yet to be explored.

  1. Use a smaller or faster hash.  We use a 20-byte SHA1 hash.  This is not a multiple of cache line size, and in any case the probability of collision is astronomically small even in the largest PDBs, considering that the theoretical limit of a PDB is just under 2^32 possible unique types (due to the 4-byte size of a type index).  SHA1 is also notoriously slow.  It might be interesting to try, for example, a Blake2 set to output an 8 byte hash.  This should give sufficiently low probability of a collision while improving cache performance.  The on-disk format is designed with this flexibility in mind, as different hash algorithms can be specified in the header.
  2. Hashes for compilands with missing .debug$H sections can be computed in parallel before linking.  Currently when we encounter an object file without a .debug$H section, we must synthesize one in the linker.  Our prototype algorithm does this serially for each input.
  3. Symbol records from .debug$S sections can be merged in parallel.  Currently in lld, we first merge type records into the TPI stream, then we iterate symbol records and remap types in each symbol record to correspond to the new type indices.  If we merge types from all modules up front, the symbol records (with the exception of global symbols) can be merged in parallel, since they get written to independent streams.

Try it out!

If you're already using clang-cl and lld on Windows today, you can try this out.  There are two flags needed to enable this, one for the compiler and one for the linker:
  1. To enable the emission of a .debug$H section by the compiler, you will need to pass the undocumented -mllvm -emit-codeview-ghash-section flag to clang-cl  (this flag should go away in the future, once this is considered stable and good enough to be turned on by default).
  2. To tell lld to use this information, you will need to pass /DEBUG:GHASH to lld.
Note that this feature is still considered highly experimental, so we're interested in your feedback (llvm-dev@ mailing list, direct email is ok too) and bug reports.

by Zachary Turner at January 09, 2018 06:49 PM

September 21, 2017


2017 US LLVM Developers' Meeting Program

The LLVM Foundation is excited to announce the selected proposals for the 2017 US LLVM Developers' Meeting!





Lightning Talks:

Student Research Competition:


If you are interested in any of these talks, you should register to attend the 2017 US LLVM Developers' Meeting! Tickets are limited, so register now!

by Tanya Lattner at September 21, 2017 08:14 AM

September 20, 2017


Clang ♥ bash -- better auto completion is coming to bash

Compilers are complex pieces of software and have a multitude of command-line options to fine tune parameters. Clang is no exception: it has 447 command-line options. It’s nearly impossible to memorize all these options and their correct spellings, and that's where shell completion can be very handy. When you type in the first few characters of a flag and hit tab, it will autocomplete the rest for you.

However, such an autocompletion feature is not available yet, as there's no easy way to get a complete list of the options Clang supports. For example, bash doesn’t have any autocompletion support for Clang, and although some shells like zsh have a script for command-line autocompletion, they use hard-coded lists of command-line options, which are not automatically updated when a new option is added to Clang. These shells also can’t autocomplete arguments which some flags take (-std=[tab] for instance).

This is the problem we were working to solve during this year’s Google Summer of Code. We’re adding a feature to Clang so that we can implement a complete, exact command-line option completion which is highly portable for any shell. To start with, we'll provide a completion script for bash which uses this feature.

Clang now has a new command line option called --autocomplete. This flag receives the incomplete user input from the shell and then queries the internal data structures of the current Clang binary, and returns a list of possible completions. With this API, we can always get an accurate list of options and values any time, on any newer versions of Clang.

We built bash autocompletion on top of this as the first implementation. You can find its source code here. There is also a sample Qt text entry with autocompletion, which shows how to use this API from a UI application, as seen below:


You can always complete one flag at a time. So if you want to use the API, you have to select the flag that the user is currently typing. Then just pass this flag to the --autocomplete flag in the selected clang binary. So in the case below all flags that start with `-tr` are displayed with their descriptions behind them (separated from the flag with a tab character).
The API also supports completing the values of flags. If you have a flag for which value completion is supported, you can also provide an incomplete value behind the flag separated by a comma to get completion for this:
If you provide nothing after the comma, the list of all possible values for this flag is displayed.
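
A small Python sketch of a client for this API is shown here. It assumes the option is spelled `--autocomplete=<partial input>` and that each output line has the form completion, tab, description, as described above. The helper names and the sample output are made up for illustration.

```python
import subprocess


def parse_completions(output):
    """Turn --autocomplete output into a list of (completion, description) pairs."""
    results = []
    for line in output.splitlines():
        if not line:
            continue
        # Value completions may carry no description; partition copes with that.
        completion, _, description = line.partition("\t")
        results.append((completion, description))
    return results


def complete(partial, clang="clang"):
    """Ask a clang binary for completions of one partially typed flag."""
    out = subprocess.run([clang, f"--autocomplete={partial}"],
                         capture_output=True, text=True, check=True).stdout
    return parse_completions(out)


# Made-up sample in the documented shape: flag, tab, description.
sample = ("-traditional-cpp\tEnable some traditional CPP emulation\n"
          "-trigraphs\tProcess trigraph sequences\n")
pairs = parse_completions(sample)
```

With a suitable clang on the PATH, `complete("-tr")` would query the binary itself; following the comma convention described above, a call such as `complete("-std=,c++1")` would ask for values of `-std=`.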

How to get it
This feature is available for use now with LLVM/clang 5.0 and we’ll also be adding this feature to the standard bash completion package. Make sure you have the latest clang version on your machine, and source this script. If you want to make the change permanent, just source it from your .bashrc and enjoy typing your clang invocations!

by Yuka Takahashi at September 20, 2017 04:11 PM

August 23, 2017

Sylvestre Ledru

Rebuild of Debian using Clang 3.9, 4.0 and 5.0

tldr: The percentage of failure is decreasing, Clang support is improving but there is a long way to go.

The goal of this initiative is to rebuild Debian using Clang as a compiler instead of gcc. I have been doing this analysis for the last 6 years.

Recently, we rebuilt the Debian archive with Clang 3.9.1 (July 6th), 4.0.1 (July 6th) and 5.0 rc2 (August 20th).

For various reasons, we hadn't performed a rebuild since June 2016 with version 3.8. Therefore, we took the opportunity to do three over the last month.

Now, the 3.9 & 4.0 results are impacted by a build failure when building all haskell packages (the -no-pie option in Clang doesn't exist - I introduced it in clang 5.0). Fixing this issue with 5.0 removed more than 860 failures.

Also, for the same versions, a Qt compiler detection is considering that Clang is not a C++11 compiler because clang++, by default, defines __cplusplus as 199711L (-std=c++11 has to be added to define a correct __cplusplus). See for more information. Some discussions happened on the upstream mailing list about changing the default C++ dialect.
For example, with 4.0, this is causing 132 errors. With 5.0, probably thanks to a new Qt version, roughly the same number of packages are failing, but this time because gcc only triggers a warning when the "nodiscard" attribute is used incorrectly, whereas clang triggers an error.

In parallel, ignoring the haskell build failures, the numbers slightly increased since last year even if the overall percentage decreased (new packages being uploaded in the archive).

Version   Build failures   Ignoring haskell pkgs
3.8       1367 / 5.6%
3.9       2274 / 8.1%      1618 / 5.8%
4.0       2311 / 8.3%      1655 / 5.9%
5.0       1445 / 5.1%

In parallel, new warnings and errors showed up in Clang.
This is causing a new set of build failures (especially with the usage of -Werror).

A few examples:
* Starting with 4.0, clang triggers the error "ordered comparison between pointer and zero ('char *' and 'int')".
* Similarly, with this version, -Wmain gained a new warning that is triggered when a bool literal is returned from main.
* clang also introduced a new warning called -Waddress-of-packed-member, causing 5 new errors.
* With the same version, clang can trigger a new error when auto is used in a function return type.
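To make two of these diagnostics concrete, here is a short sketch of code that avoids them. The names has_value, Header and read_length are invented for illustration, and __attribute__((packed)) is a GCC/Clang extension:

```cpp
#include <cstdint>

// Clang 4.0 rejects `p > 0` for a pointer p in C++ ("ordered comparison
// between pointer and zero"). An explicit null check states the intent.
constexpr bool has_value(const char *p) {
    return p != nullptr;
}

// -Waddress-of-packed-member fires when the address of a packed member is
// taken, because the resulting pointer may be misaligned. Reading the member
// by value avoids creating such a pointer.
struct __attribute__((packed)) Header {
    std::uint8_t tag;
    std::uint32_t length;
};

constexpr std::uint32_t read_length(const Header &h) {
    return h.length;  // copies the value, no pointer to the member is formed
}
```

Packages built with -Werror are the ones most exposed, since there the new warnings also stop the build.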

Now, as a conclusion, having Debian built with clang by default is still a long shot.
First, when Clang became usable for a general audience, gcc was lagging in terms of warning and error detection. Now, gcc is in a much better position than it was, decreasing the interest in having clang replace gcc. In parallel, most of the efforts in terms of warnings and mistake detection are currently done under the clang-tidy umbrella, making them less intrusive as part of this initiative (but harder to use and to deploy).
As an example, the gcc warning -Wmisleading-indentation has been implemented as a clang-tidy checker.
Second, the very permissive license of clang has been a key factor for some operating systems to switch, such as the PS4, Mac OS X or FreeBSD. With Debian, the community is generally happy with the GPL.
Third, the performance is similar enough that it is not worth the work, except for some projects with very special needs.

Last, although it is much easier to contribute to llvm/clang than to gcc (no copyright assignment and an actual review system, for example), this isn't a big differentiator for most projects.

Of course, I will continue to run and analyze these rebuilds, as this is a great source of information for clang upstream developers to improve the compatibility with gcc and understand some impacts. However, until there is a big game changer, I will stop pursuing the goal of having Debian switch to clang instead of gcc. I will also stop my efforts on the debile project (which was aiming to rebuild packages in the background).

by Sylvestre at August 23, 2017 10:09 PM

August 18, 2017


LLVM on Windows now supports PDB Debug Info

For several years, we’ve been hard at work on making clang a world class toolchain for developing software on Windows.  We’ve written about this several times in the past, and we’ve had full ABI compatibility (minus bugs) for some time. One area that has been notoriously hard to achieve compatibility on is debug information, but over the past 2 years we’ve made significant leaps.  If you just want the TL;DR, then here you go: If you’re using clang on Windows, you can now get PDB debug information!

Background: CodeView vs. PDB
CodeView is a debug information format invented by Microsoft in the mid 1980s. For various reasons, other debuggers developed an independent format called DWARF, which eventually became standardized and is now widely supported by many compilers and programming languages.  CodeView, like DWARF, defines a set of records that describe mappings between source lines and code addresses, as well as types and symbols that your program uses.  The debugger then uses this information to let you set breakpoints by function name, display the value of a variable, etc.  But CodeView is only somewhat documented, with the most recent official documentation being at least 20 years old.  While some records still have the format documented above, others have evolved, and entirely new records have been introduced that are not documented anywhere.

It’s important to understand though that CodeView is just a collection of records.  What happens when the user says “show me the value of Foo”?  The debugger has to find the record that describes Foo.  And now things start getting complicated.  What optimizations are enabled?  What version of the compiler was used?  (These could be important if there are certain ABI incompatibilities between different versions of the compiler, or as a hint when trying to reconstruct a backtrace in heavily optimized code, or if the stack has been smashed).  There are a billion other symbols in the program, how can we find the one named Foo without doing an exhaustive O(n) search?  How can we support incremental linking so that it doesn’t take a long time to re-generate debug info when only a small amount of code has actually changed?  How can we save space by de-duplicating strings that are used repeatedly?  Enter PDB.

PDB (Program Database) is, as you might have guessed from the name, a database.  It contains CodeView but it also contains many other things that allow indexing of the CodeView records in various ways.  This allows for fast lookups of types and symbols by name or address, the philosophical equivalent of “tables” for individual input files, and various other things that are mostly invisible to you as a user but largely responsible for making the debugging experience on Windows so great.  But there’s a problem: While CodeView is at least kind-of documented, PDB is completely undocumented.  And it’s highly non-trivial.

We’re Stuck (Or Are We?)
Several years ago, we decided that the path forward was to abandon any hope of emitting CodeView and PDB, and instead focus on two things:
  1. Make clang-cl emit DWARF debug information on Windows
  2. Port LLDB to Windows and teach it about the Windows ABI, which would be significantly easier than teaching Visual Studio and/or WinDbg to be able to interpret DWARF (assuming this is even possible at all, given that everything would have to be done strictly through the Visual Studio / WinDbg extensibility model)
In fact, I even wrote another blog post about this very topic a little over 2 years ago.  So I got it to work, and I eventually got parts of LLDB working on Windows for simple debugging scenarios.

Unfortunately, it was beginning to become clear that we really needed PDB.  Our goal has always been to create as little friction as possible for developers who are embedded in the Windows ecosystem.  Tools like Windows Performance Analyzer and vTune are very powerful and standard tools in engineers’ existing repertoires.  Organizations already have infrastructure in place to archive PDB files, and collect & analyze crash dumps.  Debugging with PDB is extremely responsive given that the debugger does not have to index symbols upon startup, since the indices are built into the file format.  And last but not least, tools such as WinDbg are already great for post-mortem debugging, and frankly many (perhaps even most) Windows developers will only give up the Visual Studio debugger when it is pried from their cold dead hands.

I got some odd stares (to put it lightly) when I suggested that we just ask Microsoft if they would help us out.  But ultimately we did, and… they agreed!  This came in the form of some code uploaded to the Microsoft Github repo which we were on our own to figure out.  Although they were only able to upload a subset of their PDB code (meaning we had to do a lot of guessing and exploration, and the code didn’t compile either since half of it was missing), it filled in enough blanks that we were able to do the rest.

After about a year and a half of studying this code, hacking away, studying the code some more, hacking away some more, etc, I’m proud to say that lld (the LLVM linker) can finally emit working PDBs.  All the basics like setting breakpoints by line, or by name, or viewing variables, or searching for symbols or types, everything works (minus bugs, of course).

For those of you who are interested in digging into the internals of a PDB, we also have been developing a tool for expressly this purpose.  It’s called llvm-pdbutil and is the spiritual counterpart to Microsoft’s own cvdump utility.  It can dump the internals of a PDB, convert a PDB to yaml and vice versa, find differences between two PDBs, and much more.  Brief documentation for llvm-pdbutil is here, and a detailed description of the PDB file format internals is here, consisting of everything we’ve learned over the past 2 years (still a work in progress, as I have to divide my time between writing the documentation and actually making PDBs work).

Bring on the Bugs!
So this is where you come in.  We’ve tested simple debugging scenarios with our PDBs, but we still consider this alpha in terms of debug info quality.  We’d love for you to try it out and report issues on our bug tracker.  To get you started, download the latest snapshot of clang for Windows.  Here are two simple ways to test out this new functionality:
  1. Have clang-cl invoke lld automatically
    1. clang-cl -fuse-ld=lld -Z7 -MTd hello.cpp
  2. Invoke clang-cl and lld separately.
    1. clang-cl -c -Z7 -MTd -o hello.obj hello.cpp
    2. lld-link -debug hello.obj
We look forward to the onslaught of bug reports!

We would like to extend a very sincere and deep thanks to Microsoft for their help in getting the code uploaded to the github repository, as we would never have gotten this far without it.

And to leave you with something to get you even more excited for the future, it's worth reiterating that all of this is done without a dependency on any Windows-specific API, DLL, or library.  It's 100% portable.  Do I hear cross-compilation?

Zach Turner (on behalf of the LLVM Windows Team)

by Zachary Turner at August 18, 2017 07:55 PM

August 17, 2017


LLVM Weekly - #130, Jun 27th 2016

Welcome to the one hundred and thirtieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues and pass it on to anyone else you think may be interested. Please send any tips or feedback via @llvmweekly or @asbradbury on Twitter.
If you're reading this on the LLVM blog, then do note this is the LAST TIME it will be cross-posted there directly. There is a great effort underway to increase the content on the LLVM blog, and unfortunately LLVM Weekly has the effect of drowning out this content. As ever, you can subscribe to get it by email, or subscribe to the RSS feed.

News and articles from around the web

After recently being taken down due to excessive resource usage, the LLVM apt repositories are now back.
A detailed introduction to ThinLTO has been published on the LLVM blog. This covers the background, design, current status, and usage information for ThinLTO.
A post on Reddit gives a summary of notable language features voted into the C++17 working draft at the Oulu meeting.

On the mailing lists

LLVM commits

  • The new representation for control-flow integrity and virtual call metadata has landed. The commit message further details the problems this change addresses. r273729.
  • The llvm.type.checked.load intrinsic was added. It loads a function pointer from a virtual table pointer using type metadata. r273576.
  • As part of the work on CFL-AA, interprocedural function summaries were added. These avoid recomputation for many properties of a function. r273219, r273596.
  • MemorySSA gained new APIs for PHI creation and MemoryAccess creation. r273295.
  • Metadata attachments are now allowed for declarations. r273336.
  • A new runtimes directory was added to the LLVM tree. r273620.
  • LLVM's dynamic loader gained basic support for COFF ARM. r273682.

Clang commits

  • constexpr if support has been added to Clang. r273602.
  • clang-tidy has a new modernize-use-emplace check that will replace calls of push_back with emplace_back. r273275.
  • The CMake build system for Clang gained a ENABLE_X86_RELAX_RELOCATIONS option. r273224.

Other project commits

  • Basic support for versioned symbols was added to LLD. r273143.
  • LLD now handles both single and double dashes for all options. r273256.

by Alex Bradbury at August 17, 2017 10:48 PM

March 10, 2017


Devirtualization in LLVM and Clang

This blog post is part of a series of blog posts from students who were funded by the LLVM Foundation to attend the 2016 LLVM Developers' Meeting in San Jose, CA. Please visit the LLVM Foundation's webpage for more information on our Travel Grants program. 

This post is from Piotr Padlewski on his work that he presented at the meeting:

This blogpost will show how C++ devirtualization is performed in current (4.0) clang and LLVM and also ongoing work on -fstrict-vtable-pointers features.

Devirtualization done by the frontend

In order to transform a virtual call into a direct call, the frontend must be sure that there are no overrides of vfunction in the program or know the dynamic type of object. Compilation proceeds one translation unit at a time, so, barring LTO, there are only a few cases when the compiler may conclude that there are no overrides:

  • either the class or virtual method is marked as final
  • the class is defined in an anonymous namespace and has no deriving classes in its translation unit

The latter is more tricky for clang, which translates the source code in chunks on the fly (see: ASTProducer and ASTConsumer), so is not able to determine if there are any deriving classes later in the source. This could be dealt with in a couple of ways:
  • give up immediate generation
  • run data flow analysis in LLVM to find all the dynamic types passed to a function that has static linkage
  • hope that every use of the virtual function, which is necessarily in the same translation unit, will be inlined by LLVM -- static linkage increases the chances of inlining
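The two cases in which the frontend can rule out overrides (a final class or method, and a class in an anonymous namespace) look roughly like this in source. The class and function names are invented for illustration, and whether a given call is actually devirtualized depends on the compiler and its options:

```cpp
#include <type_traits>

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

// Case 1: the class is final, so no override of sides() can exist below it.
struct Triangle final : Shape {
    int sides() const override { return 3; }
};

// Case 2: the class lives in an anonymous namespace, so any derived class
// would have to appear later in this same translation unit.
namespace {
struct Square : Shape {
    int sides() const override { return 4; }
};
}  // namespace

// The static type is final: the frontend may emit a direct call here.
int triangle_sides(const Triangle &t) { return t.sides(); }

// Only the base type is known: this stays a virtual call in the frontend.
int any_sides(const Shape &s) { return s.sides(); }
```

For Square, the frontend cannot know at the point of a call whether a deriving class appears further down the file, which is the difficulty described above.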

Store to load propagation in LLVM

In order to devirtualize a virtual call we need:
  • value of vptr - which virtual table it points to
  • value of vtable slot - which exact virtual function it is

Because vtables are constant, the latter value is much easier to get when we have the value of vptr. The only thing we need is vtable definition, which can be achieved by using available_externally linkage.

In order to figure out the vptr value, we have to find the store to the same location that defines it. There are two analyses responsible for it:

  • MemDep (Memory Dependence Analysis) is a simple linear algorithm that, for each queried instruction, iterates through all instructions above it and stops when the first dependency is found. Because queries might be performed for each instruction, we end up with a quadratic algorithm. Of course quadratic algorithms are not welcome in compilers, so MemDep can only check a certain number of instructions.
  • Memory SSA, on the other hand, has constant complexity because of caching. To find out more, watch “Memory SSA in 5 minutes”. MemSSA is a pretty new analysis and it doesn’t have all the features MemDep has, therefore MemDep is still widely used.
The main LLVM pass that does store to load propagation is GVN - Global Value Numbering.
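The backwards scan with a cut-off that MemDep performs can be pictured with a toy model. This is only a sketch under strong assumptions: Inst, find_defining_store and scan_limit are invented names, not LLVM's API, and the real analysis also has to reason about aliasing and clobbering calls:

```cpp
// Toy model, not LLVM's API: every instruction is either a load from or a
// store to a numbered memory location.
struct Inst {
    bool is_store;
    int location;
};

// Walk upwards from the load at load_index and return the index of the
// nearest store to the same location, or -1 if none is found within
// scan_limit instructions.
constexpr int find_defining_store(const Inst *insts, int load_index,
                                  int scan_limit) {
    int scanned = 0;
    for (int i = load_index - 1; i >= 0 && scanned < scan_limit;
         --i, ++scanned) {
        if (insts[i].is_store &&
            insts[i].location == insts[load_index].location) {
            return i;
        }
    }
    return -1;
}

constexpr Inst kBlock[] = {
    {true, 1},   // 0: store to location 1
    {true, 2},   // 1: store to location 2
    {false, 2},  // 2: load from location 2
    {false, 1},  // 3: load from location 1
};
```

Each query is linear in the distance scanned, so querying every instruction is quadratic in the worst case. The scan limit bounds that cost, at the price of missing a store that lies further up, as with the load at index 3 when the limit is 2.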

Finding vptr store

In order to figure out the vptr value, we need to see the store from the constructor. To not rely on the constructor's availability or inlining, we decided to use the @llvm.assume intrinsic to indicate the value of vptr. Assume is akin to assert - an optimizer seeing a call to @llvm.assume(i1 %b) can assume that %b is true after it. We can indicate the vptr value by comparing it with the vtable and then calling @llvm.assume with the result of this comparison.

call void @_ZN1AC1Ev(%struct.A* %a) ; call ctor
 %3 = load {...} %a                  ; Load vptr
 %4 = icmp eq %3, @_ZTV1A      ; compare vptr with vtable
 call void @llvm.assume(i1 %4)

Calling multiple virtual functions

A non-inlined virtual call will clobber the vptr. In other words, optimizer will have to assume that vfunction might change the vptr in passed object. This sounds like something that never happens because vptr is “const”. The truth is that it is actually weaker than C++ const member, because it changes multiple times during construction of an object (every base type constructor or destructor must set vptrs). But vptr can't be changed during a virtual call, right? Well, what about that?

void A::foo() { // virtual
  static_assert(sizeof(A) == sizeof(Derived));
  new(this) Derived;
}

This is a call of the placement new operator - it doesn’t allocate new memory, it just creates a new object in the provided location. So, by constructing a Derived object in the place where an object of type A was living, we change the vptr to point to Derived’s vtable. Is this code even legal? The C++ Standard says yes.

However it turns out that if someone called foo 2 times (with the same object), the second call would be undefined behavior. Standard pretty much says that call or dereference of a pointer to an object whose lifetime has ended is UB, and because the standard agrees that nuking object from inside ends its lifetime, the second call is UB. Be aware that this is only because a zombie pointer is used for the second call. The pointer returned by placement new is considered alive, so performing calls on that pointer is valid. Note that we also silently used that fact with the use of assume.
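The difference between a zombie pointer and the pointer returned by placement new can be sketched as follows. Base, Derived and id_after_replacement are illustrative names, not code from the talk:

```cpp
#include <new>

struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 1; }
};

struct Derived : Base {
    int id() const override { return 2; }
};

// Reuse the same storage for two objects in turn, and only ever call through
// the pointer returned by the most recent placement new.
inline int id_after_replacement() {
    alignas(Derived) unsigned char storage[sizeof(Derived)];
    Base *first = new (storage) Base;
    first->~Base();                        // lifetime of the Base object ends
    Base *second = new (storage) Derived;  // new object, new vptr
    int result = second->id();             // valid: second points to a live object
    second->~Base();                       // virtual destructor runs ~Derived
    return result;
}
```

Calling first->id() after the second placement new would go through a pointer to an object whose lifetime has ended, which is the undefined case described above. Going through second is fine, and the function returns 2.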

(un)clobbering vptr

We need to somehow say that vptr is invariant during its lifetime. We decided to introduce a new metadata for that purpose - !invariant.group. The presence of the metadata on the load/store tells the optimizer that every load and store to the same pointer operand within the same invariant group can be assumed to load or store the same value. With -fstrict-vtable-pointers Clang decorates vtable loads with metadata corresponding to the caller pointer type.

We can enhance the load of the virtual function (second load) by decorating it with !invariant.load, which is equivalent to saying “load from this location is always the same”, which is true because vtables never change. This way we don’t rely on having the definition of the vtable.

Code like:

void g(A *a) {
  a->foo();
  a->foo();
}

Will be translated to:

define void @function(%struct.A* %a) {
 %1 = load {...} %a, !invariant.group !0
 %2 = load {...} %1, !invariant.load !1
 call void %2(%struct.A* %a)

 %3 = load {...} %a, !invariant.group !0
 %4 = load {...} %3, !invariant.load !1
 call void %4(%struct.A* %a)
 ret void
}

!0 = !{!"_ZTS1A"} ; mangled type name of A
!1 = !{}

And now by magic of GVN and MemDep:

define void @function(%struct.A* %a) {
 %1 = load {...} %a, !invariant.group !0
 %2 = load {...} %1, !invariant.load !1
 call void %2(%struct.A* %a)
 call void %2(%struct.A* %a)
 ret void
}

With this, llvm-4.0 is able to devirtualize function calls inside loops.


In order to prevent the middle-end from finding a load/store with the same !invariant.group metadata that would come from construction/destruction of a dead dynamic object, @llvm.invariant.group.barrier was introduced. It returns another pointer that aliases its argument but is considered different for the purposes of load/store !invariant.group metadata. The optimizer won’t be able to figure out that the returned pointer is the same because intrinsics don’t have a definition. A barrier must be inserted in all the places where the dynamic object changes:
  • constructors
  • destructors
  • placement new of dynamic object

Dealing with barriers

Barriers hinder some other optimizations. Some ideas how it could be fixed:

  • stripping metadata and barriers just after devirtualization. Currently it is done before codegen. The problem is that most of the devirtualization comes from GVN, which also does most of the optimizations we would miss with barriers. GVN is expensive therefore it is run only once. It also might make less sense if we are in LTO mode, because that would limit the devirtualization in the link phase. 
  • teaching important passes to look through the barrier. It might be very tricky to preserve the semantics of the barrier, but e.g. looking for the dependency of a load without !invariant.group by jumping through the barrier to find a store without !invariant.group is likely to do the trick.
  • removing invariant.barrier when its argument comes from alloca and is never used etc.
To find out more details about devirtualization, check my talk from the LLVM Dev Meeting 2016.

About author

Undergraduate student at University of Warsaw, currently working on C++ static analysis in IIIT.

by Tanya Lattner at March 10, 2017 09:23 PM

March 07, 2017


Some news about provides Debian and Ubuntu repositories for every maintained version of these distributions. LLVM, Clang, clang extra tools, compiler-rt, polly, LLDB and LLD packages are generated for the stable, stabilization and development branches.

As it seems that we have more and more users of these packages, I would like to share an update about various recent changes.

New features

First, the cool new stuff : lld is now proposed and built for i386/amd64 on all Debian and Ubuntu supported versions. The test suite is also executed and the coverage results are great.

Then, following the branching for the 4.0 release, I created new repositories to propose this release.
For example, for Debian stable, just add the following in /etc/apt/sources.list.d/llvm.list

deb llvm-toolchain-jessie-4.0 main
  deb-src llvm-toolchain-jessie main

Obviously, the trunk is now 5.0. If llvm-defaults is used, clang, lldb and other meta packages will be automatically updated to this version.
As a consequence and also because the branches are dead, 3.7 and 3.8 jobs have been disabled. Please note that both repositories are still available on and won't be removed.

Zesty: New Ubuntu
Packages for the next Ubuntu 17.04 (zesty) are also generated for 3.9, 4.0 and 5.0.

This was implemented a few months ago but not clearly communicated: libfuzzer also has its own packages, libfuzzer-X.Y-dev (example: libfuzzer-3.9-dev, libfuzzer-4.0-dev or libfuzzer-5.0-dev).

Changes in the infrastructure

In order to support the load, I started to use new blades that Google (thanks again to Nick Lewycky) sponsored for an initiative that I was running for Debian and IRILL. The 6 new blades removed all the wait time. With a new salt configuration, I automated the deployment of the slaves. In case the load increases again, we will have access to more blades.

I also took the time to fix some long ongoing issues:
  • all repositories are signed and verified that they are    
  • i386 and amd64 packages are now uploaded at once instead of being uploaded separately. This was causing checksum errors when one of the two architectures built correctly and the second failed (e.g. a failing test)
Last but not least, the code coverage results are produced in a more reliable manner.

More information about the implementation and services.

As what is shipped on is exactly the same as in Debian and Ubuntu, packaging files are stored on the Debian subversion server.

A Jenkins instance is in charge of the orchestration of the whole build infrastructure.

The trunk packages are built twice a day for every Debian and Ubuntu packages. Branches (3.9 and 4.0 currently) are rebuilt only when the - trigger job found a change.

In both cases, the Jenkins source job will check out the Debian SVN branches for their version, check out or update the LLVM/clang/etc repositories and repack everything to create the source tarballs and Debian files (dsc, etc). The completion of this job will trigger the binaries job. These jobs, thanks to the Debian Jenkins glue, will create or update Debian/Ubuntu versions.

Then builds are done the usual way through pbuilder for both i386 and amd64. All the test suites are going to be executed. If any LLVM test is failing on i386 or amd64, the whole build will fail. If both builds and the LLVM testsuite are successful, the sync job will start and rsync packages to the LLVM server to be replicated on the CDN. If one or both builds fail, a notification is sent to the administrator.

Some Debian static analysis checks (lintian) are executed on the packages to prevent some packaging errors. From time to time, some interesting issues are found.

In parallel, some binary builds have some special hooks like Coverity, code coverage or installation of more recent versions of gcc for Ubuntu precise.

Report bugs

Bugs can be reported on the bugzilla of the LLVM project in the product "Packaging" and the component "deb packages".

Common issues

Because LLVM and clang are quickly moving projects, it can be challenging to keep up with the rhythm, in particular with regard to tests. For Debian unstable or the latest version of Ubuntu, the matrix is made more complex by new versions of the basic pieces of the operating system like gcc/g++ or libstdc++.

It is also not uncommon for some tests to be ignored in the process.

How to help

Some newcomer bugs are available. As an example:
Related to all this, a Google Summer of Code 2017 under the LLVM umbrella has been proposed: Integrate libc++ and OpenMP in

Help is also needed to keep track of the new test failures and get them fixed upstream. For example, a few tests have been marked as expected to fail to avoid crashes.

by Ledru Sylvestre at March 07, 2017 06:59 PM

February 22, 2017


2016 LLVM Developers' Meeting - Experience from Johannes Doerfert, Travel Grant Recipient

This blog post is part of a series of blog posts from students who were funded by the LLVM Foundation to attend the 2016 LLVM Developers' Meeting in San Jose, CA. Please visit the LLVM Foundation's webpage for more information on our Travel Grants program.

This post is from Johannes Doerfert:
2016 was my third time attending the US LLVM developers meeting and for the third year in a row I was impressed by the quality of the talks, the organization and the diversity of attendees. The hands-on experiences that are presented, combined with innovative ideas and cutting edge research, make it a perfect venue for me as a PhD student. The honest interest in the presented topics and the lively discussions that include students, professors and industry people are two of the many things that I experienced the strongest at these developer meetings.

For the last two years I was mainly attending as a Polly developer that talked about new features and possible applications of Polly. This year however my roles were different. First, I was attending as part of the organization team of the European LLVM developers meeting 2017 [0] together with my colleagues Tina Jung and Simon Moll. In this capacity I answered questions about the venue (Saarbruecken, Germany [1,2]) and the alterations in contrast to prior meetings. Though, more importantly, I advertised the meeting to core developers that usually do not attend the European version. Second on my agenda was the BoF on a parallel extension to the LLVM-IR which I organized with Simon Moll. In this BoF, but also during the preparation discussion on the mailing list [3], we tried to collect motivating examples, requirements as well as known challenges for a parallel extension to LLVM. These insights will be used to draft a proposal that can be discussed in the community.

Finally, I attended as a 4th year PhD student who is interested in contributing his work to the LLVM project (not only Polly). As my current research required a flexible polyhedral value (and iteration space) analysis, I used the opportunity to implement one with an interface similar to scalar evolution. The feedback I received on this topic was strictly positive. I will soon post a first version of this standalone analysis and start a public discussion. Since I hope to finish my studies at some (not too distant) point in time, I seized the opportunity to inquire about potential options for the time after my PhD.

As a final note I would like to thank the LLVM Foundation for their student travel grant that allowed me to attend the meeting in the first place.


by Tanya Lattner at February 22, 2017 07:29 AM

December 14, 2016


LLVM's New Versioning Scheme

Historically, LLVM's major releases always added "0.1" to the version number, producing major versions like 3.8, 3.9, and 4.0 (expected by March 2017). With our next release though, we're changing this.  The LLVM version number will now increase by "1.0" with every major release, which means that the first major release after LLVM 4.0 will be LLVM 5.0 (expected September 2017).
We believe that this approach will provide a simpler and more standard approach to versioning.
LLVM’s version number (also shared by many of its sub-projects, such as Clang, LLD, etc.) consists of three parts: major.minor.patch. The community produces a new release every six months, with "patch" releases (also known as "dot" or "stable" releases) containing bug fixes in between.
Until now, the six-monthly releases would cause the minor component of the version to be incremented. Every five years, after minor reached 9, a more major release would occur, including some breaking changes: 2.0 introduced the bitcode format, 3.0 a type system rewrite.
During the discussions about what to call the release after 3.9, it was pointed out that since our releases are time-based rather than feature-based, the distinction between major and minor releases seems arbitrary. Further, every release is also API breaking, so by the principles of semantic versioning, we should be incrementing the major version number.
We decided that going forward, every release on the six-month cycle will be a major release. Patch releases will increment the patch component as before (producing versions like 5.0.1), and the minor component will stay at zero since no minor releases will be made.
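The old and new numbering rules can be written down as small functions. This is an illustrative sketch only; the Version struct and the function names are invented here and are not part of LLVM:

```cpp
// Illustrative model of a major.minor.patch version.
struct Version {
    int major_part;
    int minor_part;
    int patch_part;
};

constexpr bool same(Version a, Version b) {
    return a.major_part == b.major_part && a.minor_part == b.minor_part &&
           a.patch_part == b.patch_part;
}

// Old scheme: each six-monthly release bumps the minor component, and x.9 is
// followed by (x+1).0.
constexpr Version next_release_old(Version v) {
    return v.minor_part == 9 ? Version{v.major_part + 1, 0, 0}
                             : Version{v.major_part, v.minor_part + 1, 0};
}

// New scheme: every six-monthly release bumps the major component.
constexpr Version next_release_new(Version v) {
    return Version{v.major_part + 1, 0, 0};
}

// Patch releases are unchanged: only the patch component grows.
constexpr Version next_patch(Version v) {
    return Version{v.major_part, v.minor_part, v.patch_part + 1};
}
```

Under the old rule 3.8 is followed by 3.9 and then 4.0. Under the new rule 4.0 is followed by 5.0, and its first patch release is 5.0.1.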

Bitcode Compatibility

Before LLVM 4.0.0, the Developer Policy specified that bitcode produced by LLVM would be readable by the next versions up to and including the next major release. The new version of the Developer Policy instead specifies that LLVM will currently load any bitcode produced by version 3.0 or later. When developers decide to drop support for some old bitcode feature, the policy will be updated.

API Compatibility

Nothing has changed. As before, patch releases are API and ABI compatible with the main releases, and the C API is "best effort" for stability, but besides that, LLVM’s API changes between releases.

What About the Minor Version?

Since the minor version is expected to always be zero, why not drop it and just use major.patch as the version number?
Dropping the minor component from the middle of the version string would introduce ambiguity: whether to interpret x.y as major.minor or major.patch would then depend on the value of x.
The version numbers are also exposed through various APIs, such as LLVM's llvm-config.h and Clang's __clang_minor__ preprocessor macro. Removing the minor component from these APIs would break a lot of existing code.
Going forward, since the minor number will be zero and patch releases are compatible, I expect we will generally refer to versions simply by their major number and treat the rest of the version string as details (just as Chromium 55 might really be 55.0.2883.76). Future versions of LLVM and Clang can generally be referred to simply as "LLVM 4" or "Clang 5".
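Code that inspects the compiler version through the preprocessor shows why the minor component is kept. A minimal sketch, assuming a made-up helper encode_version and using the macros that Clang predefines (__clang_major__, __clang_minor__, __clang_patchlevel__):

```cpp
// Pack a major.minor.patch triple into one integer so that versions compare
// with ordinary integer operators.
constexpr long encode_version(int major_part, int minor_part, int patch_part) {
    return major_part * 10000L + minor_part * 100L + patch_part;
}

#if defined(__clang__)
// All three macros are predefined by Clang; existing code like this would
// stop compiling if the minor macro were removed.
constexpr long kCompilerVersion =
    encode_version(__clang_major__, __clang_minor__, __clang_patchlevel__);
#else
constexpr long kCompilerVersion = 0;  // not built with Clang
#endif

constexpr bool kAtLeastClang5 = kCompilerVersion >= encode_version(5, 0, 0);
```

Keeping all three components also keeps comparisons unambiguous: 3.9.1 sorts below 4.0.0 without any special case for how many parts a version has.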

by Hans Wennborg at December 14, 2016 11:38 PM

September 12, 2016


Announcing the next LLVM Foundation Board of Directors

The LLVM Foundation is pleased to announce its new Board of Directors:

Chandler Carruth
Hal Finkel
Arnaud de Grandmaison
David Kipping
Anton Korobeynikov
Tanya Lattner
Chris Lattner
John Regehr

Three new members and five continuing members were elected to the eight person board. The new board consists of individuals from corporations and from the academic and scientific communities. They also represent various geographical groups of the LLVM community. All board members are dedicated and passionate about the programs of the LLVM Foundation and growing and supporting the LLVM community.

When voting on new board members, we took into consideration all contributions (past and present) and current involvement in the LLVM community. We also tried to create a balanced board of individuals from a wide range of backgrounds and locations to provide a voice to as many groups within the LLVM community.

We want to thank everyone who applied as we had many strong applications. As the programs of the LLVM Foundation grow we will be relying on volunteers to help us reach success. Please join our mailing list to be informed of volunteer opportunities.

About the board of directors (listed alphabetically by last name):

Chandler Carruth has been an active contributor to LLVM since 2007. Over the years, he has worked on LLVM’s memory model and atomics, Clang’s C++ support, GCC-compatible driver, initial profile-aware code layout optimization pass, pass manager, IPO infrastructure, and much more. He is the current code owner of inlining and SSA formation.

In addition to his numerous technical contributions, Chandler has led Google’s LLVM efforts since 2010 and shepherded a number of new efforts that have positively and significantly impacted the LLVM project. These new efforts include things such as adding C++ modules to Clang, adding address and other sanitizers to Clang/LLVM, making Clang compatible with MSVC and available to the Windows C++ developer community, and much more.

Chandler works at Google Inc. as a technical lead for their C++ developer platform and has served on the LLVM Foundation board of directors for the last 2 years.

Hal Finkel has been an active contributor to the LLVM project since 2011. He is the code owner for the PowerPC target, alias-analysis infrastructure, loop re-roller and the basic-block vectorizer.

In addition to his numerous technical contributions, Hal has chaired the LLVM in HPC workshop, which is held in conjunction with Supercomputing (SC), for the last 3 years. This workshop provides a venue for the presentation of peer-reviewed HPC-related research in LLVM from both industry and academia. He has also been involved in organizing an LLVM-themed BoF session at SC and LLVM socials in Austin.

Hal is Lead for Compiler Technology and Programming Languages at Argonne National Laboratory’s Leadership Computing Facility.

Arnaud de Grandmaison has been hacking on LLVM projects since 2008. In addition to his open source contributions, he has worked for many years on private out-of-tree LLVM-based projects at Parrot, DiBcom, and ARM. He has also been a leader in the European LLVM community by organizing the EuroLLVM Developers’ meeting and Paris socials, and by chairing or participating in numerous program committees for the LLVM Developers’ Meetings and other LLVM related conferences.

Arnaud has attended numerous LLVM Developers’ meetings and has volunteered as a moderator or presenter. He also moderates several LLVM mailing lists. Arnaud is also very involved in community wide discussions and decisions such as re-licensing and code of conduct.

Arnaud is a Principal Engineer at ARM.

David Kipping has been involved with the LLVM project since 2010. He has been a key organizer and supporter of many LLVM community events such as the US and European LLVM Developers’ Meetings. He has served on many of the program committees for these events.

David has worked hard to advance the adoption of LLVM at Qualcomm and other companies. One such example of his efforts is the LLVM track he created at the 2011 Linux Collaboration summit. He has over 30 years of experience in open source and developer tools including working on C++ at Borland.

David has served on the board of directors for the last 2 years and has held the officer position of treasurer. The treasurer position is time-demanding: he supports the day-to-day operation of the foundation, balances the books, and generates monthly treasurer reports.

David is Director of Product Management at Qualcomm and has served on the LLVM Foundation board of directors for the last 2 years.

Anton Korobeynikov has been an active contributor to the LLVM project since 2006. Over the years, he has made numerous technical contributions to areas including Windows support, ELF features, debug info, exception handling, and backends such as ARM and x86. He was the original author of the MSP430 and original System Z backends.

In addition to his technical contributions, Anton has maintained LLVM’s participation in Google Summer of Code by managing applications, deadlines, and overall organization. He also supports the LLVM infrastructure and has been on numerous program committees for the LLVM Developers’ Meetings (both US and EuroLLVM).

Anton is currently an associate professor at the Saint Petersburg State University and has served on the LLVM Foundation board of directors for the last 2 years.

Tanya Lattner has been involved in the LLVM project for over 14 years. She began as a graduate student who wrote her master's thesis using LLVM, and continued on using and extending LLVM technologies at various jobs during her career as a compiler engineer.   

Tanya has been organizing the US LLVM Developers’ meeting since 2008 and attended every developer meeting. She was the LLVM release manager for 3 years, moderates the LLVM mailing lists, and helps administer the LLVM infrastructure servers, mailing lists, bugzilla, etc. Tanya has also been on the program committee for the US LLVM Developers’ meeting (4 years) and the EuroLLVM Developers’ Meeting (1 year).

With the support of the initial board of directors, Tanya created the LLVM Foundation, defined its charitable and education mission, and worked to get 501(c)(3) status.

Tanya is the Chief Operating Officer and has served as the President of the LLVM Foundation board for the last 2 years.

Chris Lattner is well known as the founder of the LLVM project and has a lengthy history of technical contributions to the project over the years. He drove much of the early implementation, architecture, and design of LLVM and Clang.

Chris has attended every LLVM Developers’ meeting, and presented at the majority. He helped drive the conception and incorporation of the LLVM Foundation, and has served as Secretary of the board for the last 2 years. Chris also grants commit access to the LLVM Project, moderates mailing lists, moderates and edits the LLVM blog, and drives important non-technical discussions and policy decisions related to the LLVM project.

Chris manages the Developer Tools department at Apple Inc and has served on the LLVM Foundation board of directors for the last 2 years.

John Regehr has been involved in LLVM for a number of years. As a professor of computer science at the University of Utah, his research specializes in compiler correctness and undefined behavior. He is well known within the LLVM community for the hundreds of bug reports his group has reported to LLVM/Clang.

John was a project lead for IOC, a Clang based integer overflow checker that eventually became the basis for the integer parts of UBSan. He was also the primary developer of C-Reduce which utilizes Clang as a library and is often used as a test case reducer for compiler issues.

In addition to his technical contributions, John has served on several LLVM-related program committees. He also has a widely read blog about LLVM and other compiler-related issues (Embedded in Academia).

by Tanya Lattner ( at September 12, 2016 05:10 PM

August 17, 2016

OpenMP Runtime Project

New code release

We are excited to announce the next release of the Intel® OpenMP* Runtime Library. This release aligns with Intel® Parallel Studio XE 2017 Composer Edition.

New Features:

  • OpenMP* 4.5 nonmonotonic modifier for schedule dynamic and guided support

Bug Fixes:

by mad\vishakh1 at August 17, 2016 07:06 PM

July 22, 2016

Aaron Ballman

Here are Some Hints for Prospective Users about How to Select the Best Virtual Data Room for their Business

Keeping and sharing confidential documents on the Internet is becoming an indispensable necessity of the modern business sphere. A wide range of providers offer businesses highly protected digital venues, so-called VDRs, that are meant to simplify the entire course of a transaction. Their main function is to keep virtual versions of documents safe. A virtual repository might be used not merely as storage for secret data but also as a space where numerous partners can exchange and discuss files, transactions, and deals. In this way, virtual platforms have reduced the need to complete an entire project through in-person negotiations alone. If you wish to learn more about data room services, please see this site: Ideals data room. Because VDRs are becoming more and more popular, a lot of vendors provide their services on the market. Not all the existing platforms are reliable and decent enough to rely on. To choose a reliable platform, one is expected to pay attention to various characteristics.

1. Reputation and experience of the VDR

Be attentive to the reputation of the vendor. Usually, the reputation might be assessed on the basis of the comments available in media. Both – the comments of regular room visitors and the investigation by experts – are handy when it comes to choosing a virtual data room vendor. In addition, it seems to be interesting to look through the details of the transactions which were completed with the help of the software developer and, possibly, even to meet with the deal-makers from a corporation which has already exploited services of the provider. In addition, the decent vendor will be capable of offering customized services to a VDR visitor.

2. The functions the virtual room will provide you with

When searching for a virtual room, a certain set of crucially important options must be considered. A decent room concentrates on document and access protection. The data room is expected to possess all the specific certificates (SSAE 16 and ISO 27001) and to provide encryption of the data, firewalls, a multi-factor user verification system, watermarks, etc. In addition, the room administrator is expected to possess all the rights to manage access to the virtual room in general and to chosen documents and folders. Regular audit reports make it possible to keep an eye on all the actions in the virtual repository. As well as being secure, the virtual room is expected to be convenient to use. That is why a simple interface that simplifies navigation in the VDR proves to be crucial. Various upload and search instruments also simplify and speed up the work in the virtual room. The mentioned options do not constitute an exhaustive list of the instruments which the user must look for in the virtual data room: the demands will depend on the customer’s needs.

3. Expected expenditures

Since there are various providers on the market, potential clients may choose among repositories at different prices: the cost varies significantly depending on the vendor, on the expected time needed for execution of the deal, on the particular options demanded, etc. Hence, the deal-maker is supposed to be down-to-earth and to determine how much he is willing to pay for the virtual room.

4. The benefits you and your stakeholders will have a chance to experience

A platform has to be not only cheap enough and comfortable for the owner but also meet the expectations of the owner’s present or potential partners. Sometimes it is advisable to choose the more luxurious virtual room if it has numerous features required by the other side of the deal.

5. Required set of functions

Before making a first payment for a virtual data room, a critical evaluation of the demands and expectations is supposed to be performed: not a single deal-maker wants to pay for expensive software provided with a bunch of useless tools. The client has to make sure he is paying for what he really needs and expects, not for trendy and catchy instruments which have nothing in common with information security and storage.

Keeping these simple guidelines in mind, the potential VDR user will have just a few doubts when selecting a VDR. Although the selection process may require a considerable amount of time, it is better to spend a bit more time and to try demo versions of diverse virtual repositories than to choose the very first virtual platform which was on sale at an affordable price. Remember that you are about to pay for your protection and comfort, and the virtual data room should not be the thing to save money on.

by joe at July 22, 2016 02:19 PM

June 21, 2016


ThinLTO: Scalable and Incremental LTO

ThinLTO was first introduced at EuroLLVM in 2015, with results shown from a prototype implementation within clang and LLVM. Since then, the design was reviewed through several RFCs, it has been implemented in LLVM (for gold and libLTO), and tuning is ongoing. Results already show good performance for a number of benchmarks, with compile time close to a non-LTO build.

This blog post covers the background, design, current status and usage information.

This post was written by Teresa Johnson, Mehdi Amini and David Li.

LTO Background and Motivation

LTO (Link Time Optimization) is a method for achieving better runtime performance through whole-program analysis and cross-module optimization. During the compile phase, clang will emit LLVM bitcode  instead of an object file. The linker recognizes these bitcode files and invokes LLVM during the link to generate the final objects that will constitute the executable. The LLVM implementation loads all input bitcode files and merges them together to produce a single Module. The interprocedural analyses (IPA) as well as the interprocedural optimizations (IPO) are performed serially on this monolithic Module.

What this means in practice is that LTO often requires a large amount of memory (to hold all IR at once) and is very slow. And with debug information enabled via -g, the size of the IR and the resulting memory requirements are significantly larger. Even without debug information, this is prohibitive for very large applications, or when compiling on memory-constrained machines. It also makes incremental builds less effective, as everything from the LTO step on must be re-executed when any input source changes.

ThinLTO Design

ThinLTO is a new approach that is designed to scale like a non-LTO build, while retaining most of the performance gains of full LTO.
In ThinLTO, the serial step is very thin and fast. This is because instead of loading the bitcode and merging it into a single monolithic module to perform these analyses, it utilizes compact summaries of each module for global analyses in the serial link step, as well as an index of function locations for later cross module importing. The function importing and other IPO transformations are performed later when the modules are optimized in fully parallel backends.

The key transformation enabled by ThinLTO global analyses is function importing, in which only those functions likely to be inlined are imported into each module. This minimizes the memory overhead in each ThinLTO backend, while maximizing the most impactful cross module optimization opportunities. The IPO transformations are therefore performed on each module extended with its imported functions.

The ThinLTO process is divided into 3 phases:
  1. Compile: Generate IR as with full LTO mode, but extended with module summaries 
  2. Thin Link: Thin linker plugin layer to combine summaries and perform global analyses 
  3. ThinLTO backend: Parallel backends with summary-based importing and optimizations 

By default, linkers that support ThinLTO (see below) are set up to launch the ThinLTO backends in threads. So the distinction between the second and third phases is transparent to the user.

The key enablers for this process are the summaries emitted during phase 1. These summaries are emitted using the bitcode format, but designed so that they can be separately loaded without involving an LLVMContext or any other expensive construction. Each global variable and function has an entry in the module summary. An entry contains metadata that abstracts the symbol it is describing. For example, a function is abstracted with its linkage type, the number of instructions it contains, and optional profiling information (PGO). Additionally, every reference (address taken, direct call) to another global is recorded. This information enables building a complete reference graph during the Thin Link phase, and subsequent fast analyses using the global summary information.
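To make the role of the summaries more concrete, here is a minimal Python sketch of the idea. It is not LLVM's actual summary format or importing heuristic: the `FunctionSummary` class, the `build_index` and `import_candidates` helpers, and the instruction-count threshold are all hypothetical stand-ins. It only shows how per-module entries (linkage, instruction count, references) can be combined into one index and queried without loading any function bodies.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionSummary:
    """Hypothetical stand-in for one function's entry in a module summary."""
    name: str
    linkage: str
    instruction_count: int
    refs: list = field(default_factory=list)  # globals called or address-taken


def build_index(module_summaries):
    """Combine per-module summaries into one index: name -> (module, summary)."""
    index = {}
    for module, summaries in module_summaries.items():
        for summary in summaries:
            index[summary.name] = (module, summary)
    return index


def import_candidates(module, module_summaries, index, max_instructions):
    """List small functions defined in other modules that `module` references."""
    candidates = set()
    for summary in module_summaries[module]:
        for ref in summary.refs:
            if ref not in index:
                continue  # not defined in any summarized module
            home, callee = index[ref]
            # Size is used here as a crude proxy for "likely to be inlined".
            if home != module and callee.instruction_count <= max_instructions:
                candidates.add(ref)
    return sorted(candidates)


summaries = {
    "a.o": [FunctionSummary("main", "external", 40, ["helper", "big"])],
    "b.o": [
        FunctionSummary("helper", "external", 5),
        FunctionSummary("big", "external", 500, ["helper"]),
    ],
}
index = build_index(summaries)
print(import_candidates("a.o", summaries, index, max_instructions=30))
```

With these made-up inputs, only `helper` is small enough to be picked for import into `a.o`; `big` is referenced but exceeds the assumed threshold.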

Current Status

ThinLTO is currently supported in both the gold plugin as well as in ld64 starting with Xcode 8. Additionally, support is currently being added to the lld linker. The 3.9 release of clang will have ThinLTO accessible using the -flto=thin command line option.

While tuning is still in progress, ThinLTO already performs well compared to LTO, in many cases matching the performance improvement. In a few cases ThinLTO even outperforms full LTO, most likely because the higher scalability of ThinLTO allows using a more aggressive backend optimization pipeline (similar to that of a non-LTO build).

The following results were collected for the C/C++ SPEC cpu2006 benchmarks on an 8-core 2.6GHz Intel Xeon E5-2689. Each benchmark was run in isolation three times and results are shown for the average of the three runs.

Critically, due to the scalable design of ThinLTO, this performance is achieved with a build time that stays within a non-LTO build scale. The following build times were collected on a 20 core 2.8GHz Intel Xeon CPU E5-2680 v2, running Linux and using the gold linker. The results are for an end-to-end build of clang (ninja clang) from a clean build directory, so it includes all the compile steps and links of intermediate binaries such as llvm-tblgen and clang-tblgen.

The release build shows that ThinLTO build time is very comparable to a non-LTO build. Adding -gline-tables-only adds a very small overhead, and ThinLTO is again similar to the regular non-LTO build. However, with full debug information, ThinLTO is still somewhat slower than a non-LTO build due to the additional overhead during importing. Ongoing improvements to debug metadata representation and handling are expected to continue to reduce this overhead. In all cases, full LTO is significantly slower.

On the memory consumption side, the improvements are significant. Over the last two years, FullLTO was significantly improved, as shown on the chart below, but our measurement shows that ThinLTO keeps a large advantage.

Usage Information

To utilize ThinLTO, simply add the -flto=thin option to compile and link. E.g.
    % clang -flto=thin -O2 file1.c file2.c -c
    % clang -flto=thin -O2 file1.o file2.o -o a.out

As mentioned earlier, by default the linkers will launch the ThinLTO backend threads in parallel, passing the resulting native object files back to the linker for the final native link. As such, the usage model is the same as non-LTO. Similar to regular LTO, for Linux this requires using the gold linker configured with plugins enabled or ld64 starting with Xcode 8.

Distributed Build Support

To take advantage of a distributed build system, the parallel ThinLTO backends can each be launched as a separate process. To support this, the gold plugin provides a thinlto-index-only option that causes the link to exit after creating the combined index and performing global analysis.

Additionally, in this mode:
  • Instead of using a monolithic combined index, a separate individual index file is written per backend containing the necessary portions of the combined index for recording the imports and any other global summary based optimization decisions that should be acted on in the backend. 
  • A plain text listing of the bitcode files each module will import from is optionally emitted to aid in distributed build file staging (thinlto-emit-imports-files plugin option). 

The backends can be launched by invoking clang on the bitcode and providing its index via an option. Finally, the resulting native objects are linked to generate the final binary. For example:

    % clang -flto=thin -O2 file1.c file2.c -c
    % clang -flto=thin -O2 file1.o file2.o -Wl,-plugin-opt,-thinlto-index-only
    % clang -O2 -o file1.native.o -x ir file1.o -c -fthinlto-index=./file1.o.thinlto.bc
    % clang -O2 -o file2.native.o -x ir file2.o -c -fthinlto-index=./file2.o.thinlto.bc
    % clang file1.native.o file2.native.o -o a.out
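For a build system, the same sequence can be generated programmatically. The following Python sketch only assembles the command lines shown above for an arbitrary list of source files; it does not run them. The helper name `distributed_thinlto_commands` and the file naming scheme (taken from the example: `fileN.o`, `./fileN.o.thinlto.bc`, `fileN.native.o`) are assumptions for illustration.

```python
from pathlib import PurePosixPath


def distributed_thinlto_commands(sources, output="a.out", opt="-O2"):
    """Return the compile, thin-link, backend and final-link command lines."""
    stems = [PurePosixPath(src).stem for src in sources]
    objects = [f"{stem}.o" for stem in stems]
    natives = [f"{stem}.native.o" for stem in stems]

    commands = [
        # Phase 1: compile to bitcode with module summaries.
        ["clang", "-flto=thin", opt, *sources, "-c"],
        # Phase 2: thin link only, writing one index file per module.
        ["clang", "-flto=thin", opt, *objects,
         "-Wl,-plugin-opt,-thinlto-index-only"],
    ]
    # Phase 3: one independent backend job per module (can be distributed).
    for obj, native in zip(objects, natives):
        commands.append(["clang", opt, "-o", native, "-x", "ir", obj, "-c",
                         f"-fthinlto-index=./{obj}.thinlto.bc"])
    # Final native link of the backend outputs.
    commands.append(["clang", *natives, "-o", output])
    return commands


for cmd in distributed_thinlto_commands(["file1.c", "file2.c"]):
    print(" ".join(cmd))
```

Because each backend command depends only on its own bitcode file, its index file and the modules it imports from, the jobs in phase 3 can be handed to separate machines.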

Incremental ThinLTO Support

With full LTO, only the initial compile steps can be performed incrementally. If any input has changed, the expensive serial IPA/IPO step must be redone.

With ThinLTO, the serial Thin Link step must be redone if any input has changed. However, as noted earlier, this step is small and fast and does not involve loading any module. Any particular ThinLTO backend must be redone iff:

  1. The corresponding (primary) module’s bitcode changed 
  2. The list of imports into or exports from the module changed 
  3. The bitcode for any module being imported from has changed 
  4. Any global analysis result affecting either the primary module or anything it imports has changed. 

For single machine builds, where the threads are launched by the linker, incremental builds can be achieved by caching the module after applying the global summary based optimizations such as importing, using a hash of the information listed above as the key. This caching is already supported in libLTO’s ThinLTO handling, which is used by ld64. To enable it, the link step needs to be passed an extra flag: -Wl,-cache_path_lto,/path/to/cache
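The caching idea can be sketched in a few lines of Python. This is not libLTO's actual key computation; the function `backend_cache_key` and the string encoding of its inputs are assumptions made for illustration. It shows how the four conditions listed above can be folded into a single hash, so that a backend is rerun only when its key changes.

```python
import hashlib


def backend_cache_key(module_hash, imports, exports,
                      imported_module_hashes, analysis_results):
    """Hash everything that can invalidate one ThinLTO backend's output."""
    digest = hashlib.sha256()

    def feed(label, items):
        # Label and separate each group so different groups cannot collide.
        digest.update(label.encode())
        for item in items:
            digest.update(b"\0" + item.encode())
        digest.update(b"\1")

    feed("module", [module_hash])                      # condition 1
    feed("imports", sorted(imports))                   # condition 2
    feed("exports", sorted(exports))                   # condition 2
    feed("imported", sorted(imported_module_hashes))   # condition 3
    feed("analysis", sorted(analysis_results))         # condition 4
    return digest.hexdigest()


# Hypothetical inputs: the hashes and names below are made up.
key_before = backend_cache_key("hash-a", ["helper"], ["main"],
                               ["hash-b"], ["helper:internal"])
key_after = backend_cache_key("hash-a", ["helper"], ["main"],
                              ["hash-b2"], ["helper:internal"])
print(key_before != key_after)
```

Sorting each group makes the key independent of the order in which imports or analysis results happen to be listed.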

For distributed builds, the information in items 2-4 above is all serialized into the individual index files. So the build system can compare the contents of the input bitcode files (the primary module’s bitcode and any it imports from) along with the combined index against those from an earlier build to decide if a particular ThinLTO backend must be redone. To make this process more efficient, the content of the bitcode file is hashed when emitted during the compile phase, and the result is stored in the bitcode file itself so that the cache can be queried during the Thin Link step without reading the IR.

The chart below illustrates the full build time of clang in three different situations:
  1. The full link following a clean build.
  2. The developer fixes the implementation of DenseMap::grow(). This is a widely used header in the project, which forces a rebuild of a large number of files.
  3. The developer fixes the implementation of visitCallInst() in InstCombineCalls.cpp. This is an implementation file and the incremental build should be fast.

These results illustrate how full LTO is not friendly to incremental builds, and show how ThinLTO provides an incremental link time very close to a non-LTO build.

by Teresa Johnson ( at June 21, 2016 05:24 PM

June 20, 2016


LLVM Weekly - #129, Jun 20th 2016

Welcome to the one hundred and twenty-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Last week was WWDC, which featured talks on what's new in LLVM (slides) and what's new in Swift (slides). Note that the embedded video player suggests you need Safari or the WWDC app to stream the video, but you can find a downloadable version under the "resources" tab.

On the mailing lists

LLVM commits

  • FileCheck learnt the --check-prefixes option as a shorthand for multiple --check-prefix options. r272670.

  • A local_unnamed_addr attribute was introduced. This can be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. r272709.

  • The ScalarReplAggregates pass has been removed as it has been superseded by SROA for a long time. r272737.

  • LLVM's C API gained support for string attributes. r272811.

  • Assembly parsing and lexing has seen some cleanups. r273007.

Clang commits

  • A new loop distribution pragma was added. Loop distribution is a transformation which attempts to break a loop into multiple loops with each taking part of the loop body. r272656.

  • The nodebug attribute can now be applied to local variables. r272859.

  • The validity check for MIPS CPU/ABI pairings is now performed at initialisation time and a much clearer message is printed. r272645.
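For readers unfamiliar with the loop distribution transformation mentioned in the list above, here is a minimal illustration written in Python rather than C. It says nothing about the pragma's syntax; the function names are made up. It only shows what "breaking a loop into multiple loops, each taking part of the body" means when the two statements in the body are independent.

```python
def single_loop(a, b):
    """Original form: one loop whose body does two independent things."""
    n = len(a)
    sums = [0] * n
    doubled = [0] * n
    for i in range(n):
        sums[i] = a[i] + b[i]
        doubled[i] = a[i] * 2
    return sums, doubled


def distributed_loops(a, b):
    """Distributed form: each loop takes one part of the original body."""
    n = len(a)
    sums = [0] * n
    doubled = [0] * n
    for i in range(n):
        sums[i] = a[i] + b[i]
    for i in range(n):
        doubled[i] = a[i] * 2
    return sums, doubled


a, b = [1, 2, 3], [10, 20, 30]
print(single_loop(a, b) == distributed_loops(a, b))
```

The two forms compute the same results because neither statement reads what the other writes, which is the condition under which a compiler may distribute the loop.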

Other project commits

  • A complete implementation of the C++ Filesystem TS has been checked in. r273034.

  • LLD's ARM port gained initial support for Thumb with ARMv7a. r272881.

by Alex Bradbury ( at June 20, 2016 11:23 AM

June 16, 2016


Using LNT to Track Performance

In the past year, LNT has grown a number of new features that make performance tracking and understanding the root causes of performance deltas a lot easier. In this post, I’m showing how we’re using these features.

LNT contains 2 big pieces of functionality:
  1. A server,
    a. to which you can submit correctness and performance measurement data, by sending it a json-file in the correct format,
    b. that analyzes which performance changes are significant and which ones aren't,
    c. that has a webui to show results and analyses in a number of different ways.
  2. A command line tool to run tests and benchmarks, such as LLVM’s test-suite, SPEC2000 and SPEC2006 benchmarks.
This post focuses on using the server. None of the features I’ll show are LLVM-specific, or even specific to ahead-of-time code generators, so you should be able to use LNT in the same way for all your code performance tracking needs. At the end, I’ll give pointers to the documentation needed to set up an LNT server and to construct the json file format with benchmarking and profiling data to be submitted to the server.
The features highlighted focus on tracking the performance of code, not on other aspects LNT can track and analyze.
We have 2 main use cases in tracking performance:
  • Post-commit detection of performance regressions and improvements.
  • Pre-commit analysis of the impact of a patch on performance.
I'll focus on the post-commit detection use case.

Post-commit performance tracking

Step 1. Get an overview of the "Daily Report" page

Assuming your server runs at http://yourlntserver:8000, this page is located at http://yourlntserver:8000/db_default/v4/nts/daily_report
The page gives a summary of the significant changes it found today.
An example of the kind of view you can get on that page is the following
In the above screenshot, you can see that there were performance differences on 3 different programs, bigfib, fasta and ffbench. The improvement on ffbench only shows up on a machine named “machine3”, whereas the performance regression on the other 2 programs shows up on multiple machines.

The table shows how performance evolved over the past 7 days, one column for each day. The sparkline on the right shows graphically how performance has evolved over those days. When the program was run multiple times to get multiple sample points, these show as separate dots that are vertically aligned (because they happened on the same date). The background color in the sparkline represents a hash of the program binary. If the color is the same on multiple days, the binaries were identical on those days.

Let’s look first at the ffbench program. The background color in the sparkline is the same for the last 2 days, so the binary for this program didn’t change in those 2 days. Conclusion: the reported performance variation of -8.23% is caused by noise on the machine, not due to a change in code. The vertically spread out dots also indicate that this program has been noisy consistently over the past 7 days.
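The reasoning used for ffbench can be written down as a small rule. The Python sketch below is not LNT code; the function `classify_change`, the record layout and the 1% threshold are assumptions for illustration. It captures the idea that a performance delta between two days with identical binary hashes cannot come from a code change.

```python
def classify_change(previous, current, threshold_pct=1.0):
    """Classify a day-over-day delta using the binary hash.

    `previous` and `current` are dicts with a 'binary_hash' string and a
    non-empty 'samples' list of execution times (hypothetical layout).
    """
    prev_mean = sum(previous["samples"]) / len(previous["samples"])
    curr_mean = sum(current["samples"]) / len(current["samples"])
    delta_pct = (curr_mean - prev_mean) / prev_mean * 100.0

    if abs(delta_pct) < threshold_pct:
        verdict = "no significant change"
    elif previous["binary_hash"] == current["binary_hash"]:
        verdict = "noise: binary is identical"
    else:
        verdict = "investigate: binary changed"
    return verdict, delta_pct


# Made-up samples: same binary on both days, but different timings.
yesterday = {"binary_hash": "abc123", "samples": [10.0, 10.0]}
today = {"binary_hash": "abc123", "samples": [9.0, 9.0]}
print(classify_change(yesterday, today)[0])
```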

Let’s now look at the bigfib. The background color in the sparkline has changed since its previous run, so let’s investigate further. By clicking on one of the machine names in the table, we go to a chart showing the long-term evolution of the performance of this program on that machine.

Step 2. The long-term performance evolution chart

This view shows how performance has evolved for this program since we started measuring it. When you click on one of the dots, which each represent a single execution of the program, you get a pop-up with information such as revision, date at which this was run etc.
When you click on the number after “Run:” in that pop-up, it’ll bring you to the run page.

Step 3. The Run page

The run page gives an overview of a full “Run” on a given machine. Exactly what a Run contains depends a bit on how you organize the data, but typically it consists of many programs being run a few times on 1 machine, representing the quality of the code generated by a specific revision of the compiler on one machine, for one optimization level.
This run page shows a lot of information, including performance changes seen since the previous run:
When hovering with the mouse over entries, a “Profile” button appears which, when clicked, shows profiles of both the previous run and the current run.

Step 4. The Profile page

At the top, the page gives you an overview of differences of recorded performance events between the current and previous run.
After selecting which function you want to compare, this page shows you the annotated assembly:

While it’s clear that there are differences between the disassembly, it’s often much easier to understand the differences by reconstructing the control flow graph to get a per-basic-block view of differences. By clicking on the “View:” drop-down box and selecting the assembly language you see, you can get a CFG view. I find showing absolute values rather than relative values helps to understand performance differences better, so I also chose “Absolute numbers” in the drop down box on the far right:
There is obviously a single hot basic block, and there are differences in instructions in the 2 versions. The number in the red side-bar shows that the number of cycles spent in this basic block has increased from 431M to 716M. In just a few clicks, I managed to drill down to the key codegen change that caused the performance difference!
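For comparison, here is the same difference expressed both as an absolute and as a relative number, as a short Python calculation using the cycle counts quoted above (taking "M" to mean millions).

```python
cycles_before = 431_000_000
cycles_after = 716_000_000

extra_cycles = cycles_after - cycles_before
relative_pct = extra_cycles / cycles_before * 100

# Absolute view: how many more cycles the hot block now costs.
print(f"{extra_cycles // 1_000_000}M extra cycles")
# Relative view: the same change as a percentage of the old cost.
print(f"{relative_pct:.1f}% increase")
```

The absolute figure (285M extra cycles) shows directly how much this block contributes to the overall slowdown, which a percentage on its own does not.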

We combine the above workflow with the llvmbisect tool available at to also quickly find the commit introducing the performance difference. We find that using both the above LNT workflow and the llvmbisect tool are vital to be able to act quickly on performance deltas.

Pointers on setting up your own LNT server for tracking performance

Setting up an LNT server is as simple as running the half a dozen commands documented at under "Installation" and "Viewing Results". The "Running tests" section is specific to LLVM tests, the rest is generic to performance tracking of general software.

The documentation for the json file format to submit results to the LNT server is here:
The documentation for how to also add profile information, is at

by Kristof Beyls ( at June 16, 2016 06:19 AM

June 13, 2016


LLVM Weekly - #128, June 13th 2016

Welcome to the one hundred and twenty-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

LDC, a compiler for the D programming language with an LLVM backend, has had a major release with 1.0.0. The big news with this release is that the frontend is now completely written in D. Congratulations to everyone involved in this release. See the D website for more information about the D programming language.

The minor release LLVM 3.8.1-rc1 has been tagged.

On the mailing lists

LLVM commits

  • Some of the work from the GSoC project on interprocedural register allocation has started to land. A RegUsageInfoCollector analysis was added that collects the list of clobbered registers for a MachineFunction. A new transformation pass was committed which scans the body of a function to find calls and updates the register mask with the one saved by RegUsageInfoCollector. r272403, r272414.

  • Chapter 2 of the tutorial on building a JIT with ORC has been fleshed out with a rough draft of the text. r271885.

  • The host CPU detection code for x86 has seen a large refactoring. r271921.

  • More documentation has been added about LLVM's CodeView support. r272057.

  • llvm-symbolizer will now be searched for in the same directory as the LLVM or Clang tool being executed. This increases the chance of being able to print pretty backtraces for systems where LLVM tools aren't installed in the $PATH. r272232.

Clang commits

  • Clang analyzer gained a checker for correct usage of the MPI API in C and C++. r271907.

  • Documentation was added on avoiding static initializers when using profiling. r272067, r272214.

Other project commits

  • A hardened allocator, 'scudo', was added to compiler-rt. It attempts to mitigate some common heap-based vulnerabilities. r271968.

  • Initial support for ARM has landed in LLD. This is just enough to link a hello world on ARM Linux. r271993.

  • Initial support for AddressSanitizer on Win64 was added. r271915.

by Alex Bradbury ( at June 13, 2016 12:08 PM

June 06, 2016


LLVM Weekly - #127, June 6th 2016

Welcome to the one hundred and twenty-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Graham Markall at Embecosm has been comparing the code size of RISC-V binaries produced by the GCC and LLVM ports, as well as compared to ARM. GCC is currently ahead, though it is worth noting the LLVM port has seen much less attention.

Matthias Reisinger is a Google Summer of Code student working on enabling polyhedral optimisations for the Julia programming language. He's written a blog post detailing his initial steps and immediate future plans. Hopefully we'll see more posts over the summer.

Loïc Hamot has been working on a C++ to D converter, implemented using Clang.

The MSVC team have blogged about the latest release of Clang with Microsoft CodeGen, based on Clang 3.8.

There is going to be a clang-tidy code dojo in Warsaw on Tuesday the 7th of June.

On the mailing lists

LLVM commits

  • LLVM gained support for 'SJLJ' (setjmp/longjmp) exception handling on x86 targets. r271244.

  • LLVM now requires CMake 3.4.3 to build. r271325.

  • Support was added for attaching metadata to global variables. r271348.

  • The AArch64 backend switched to use SubtargetFeatures rather than testing for specific CPUs. r271555.

Clang commits

  • The release notes have been updated to explain the current level of OpenMP support (full support of non-offloading features of OpenMP 4.5). r271263.

  • Clang's source-based code coverage has been documented. r271454.

Other project commits

  • An -fno-exceptions libc++abi library variant was defined, to match the -fno-exceptions libc++ build. r271267.

  • LLDB's compact unwind printing tool gained support for ARMv7's compact unwind format. r271744.

by Alex Bradbury ( at June 06, 2016 03:02 PM

May 30, 2016


LLVM Weekly - #126, May 30th 2016

Welcome to the one hundred and twenty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

I've been moving house this weekend, so do accept my apologies if you find this issue to be a little less thorough than usual.

News and articles from around the web

Pyston, the LLVM-based Python compiler, has released version 0.5. The main changes are a switch to reference counting and NumPy compatibility.

I don't want to become "C++ weekly", but I think this audience appreciates a fun use of C++ features. Verdigris is a header-only library that allows you to use Qt5 without the moc preprocessor.

The call for papers for the 3rd workshop on the LLVM compiler infrastructure in HPC has been published. The deadline for paper submission is September 1st. The workshop will take place on November 14th in Salt Lake City, and is held in conjunction with SC16.

On the mailing lists

  • Vivek Pandya, a GSoC student working on interprocedural register allocation, has shared a weekly status report.

  • Rafael Espíndola has proposed creating a bitcode symbol table.

  • There have been some updates on the progress of open-sourcing PGI's Fortran frontend.

  • Elena Lepilkina has proposed some enhancements to FileCheck. Some questions were raised about how useful the proposed extensions will be. Sergey Yakoushkin provided more background on how these features are used in a commercial codebase. As Elena notes, these features don't need to all be upstreamed at once (or at all), and are mostly independent.

  • Lang Hames has posted a heads-up about upcoming breaking API changes for ORC and MCJIT.

  • Sean Silva has kicked off a discussion on the state of IRPGO. You might ask what is IRPGO? This is profile-guided optimisation performed through instrumentation at the LLVM IR level, as opposed to FEPGO where instrumentation is added by the frontend (e.g. Clang), prior to lowering to IR. Sean would like to make IRPGO the default on all platforms other than Apple at the moment (who may require a longer deprecation period). A number of followup comments discuss possibilities for ensuring all platforms can move forward together, and ensuring a sensible flag exists to choose between frontend or middle-end PGO.

  • What exactly is a register pressure set? Both Quentin Colombet and Andrew Trick have answers for us.

LLVM commits

  • New optimisations covering checked arithmetic were added. r271152, r271153.

  • Advanced unrolling analysis is now enabled by default. r270478.

  • The initial version of a new chapter to the 'Kaleidoscope' tutorial has been committed. This describes how to build a JIT using ORC. r270487, r271054.

  • The data flow analysis in LLVM's stack colouring has been rewritten in order to increase the number of stack variables that can be overlapped. r270559.

  • Parts of EfficiencySanitizer are starting to land, notably instrumentation for its working set tool. r270640.

  • SelectionDAG learned how to expand multiplication for larger integer types where there isn't a standard runtime call to handle it. r270720.

  • LLVM will now report more accurate loop locations in optimisation remarks by reading the starting location from llvm.loop metadata. r270771.

  • Symbolic expressions are now supported in assembly directives, matching the behaviour of the GNU assembler. r271102.

  • Symbols used by plugins can now be auto-exported on Windows, which improves support for plugins in Windows. See the commit message for a full description. r270839.
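
The multiplication expansion mentioned above (r270720) can be pictured with a schoolbook decomposition. The following is a minimal Python sketch, not the SelectionDAG code: it assumes a machine that can only multiply 32-bit halves and builds a 64x64 to 128-bit unsigned product from four partial products. The name mul64_to_128 and the mask constants are made up for illustration.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul64_to_128(a, b):
    """Multiply two unsigned 64-bit values using only 32x32 partial products.

    Returns (high, low), the two 64-bit halves of the 128-bit product.
    """
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    # Four partial products, as in long multiplication with 32-bit digits.
    lo_lo = a_lo * b_lo
    hi_lo = a_hi * b_lo
    lo_hi = a_lo * b_hi
    hi_hi = a_hi * b_hi

    # Middle column: the upper half of lo_lo plus the lower halves of the
    # two cross terms. Anything above 32 bits carries into the high word.
    cross = (lo_lo >> 32) + (hi_lo & MASK32) + (lo_hi & MASK32)

    low = ((cross & MASK32) << 32) | (lo_lo & MASK32)
    high = hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (cross >> 32)
    return high & MASK64, low
```

Python integers are arbitrary precision, so the masks are what model the fixed-width registers here.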

Clang commits

  • Software floating point for Sparc has been exposed in Clang through -msoft-float. r270538.

  • Clang now supports the -finline-functions argument to enable inlining separately from the standard -O flags. r270609.

Other project commits

  • SectionPiece in LLD is now 8 bytes smaller on 64-bit platforms. This improves the time to link Clang with debug info by 2%. r270717.

  • LLD has replaced a use of binary search with a hash table lookup, resulting in a 4% speedup when linking Clang with debug info. r270999.

  • LLDB now supports AArch64 compact unwind tables, as used on iOS, tvOS and watchOS. r270658.
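
To illustrate the kind of change behind the LLD speedup above (r270999), here is a small Python sketch contrasting a binary search over sorted keys with a hash table lookup. It is not LLD's code; the names find_sorted, build_index and find_hashed are invented, and the swap only works when every query is an exact key.

```python
from bisect import bisect_left


def find_sorted(keys, values, key):
    """Binary search: O(log n) comparisons per lookup over sorted keys."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None


def build_index(keys, values):
    """One-off O(n) pass that builds the hash table."""
    return dict(zip(keys, values))


def find_hashed(index, key):
    """Hash lookup: expected O(1) per query once the table exists."""
    return index.get(key)
```

The hash table costs extra memory and a build pass, so it tends to pay off only when the same table is queried many times.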

by Alex Bradbury ( at May 30, 2016 07:07 PM

May 23, 2016


LLVM Weekly - #125, May 23rd 2016

Welcome to the one hundred and twenty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Stephen Kelly has written a blog post about using Clang through the cindex API to automatically generate Python bindings. He also makes use of SIP.

Krister Walfridsson has written a wonderfully clear post on C's type-based aliasing rules.

This week I discovered the Swift Weekly Brief newsletter. Its author, Jesse Squires, does a wonderful job of summarising mailing list traffic, recent commits, and discussions on swift-evolution proposals. If you have an interest in Swift development or language design in general I highly recommend it.

Are you interested in writing for the LLVM blog? Or volunteering to help recruit content authors? If so, get in touch with Tanya.

The next Cambridge LLVM Social will be held at 7.30pm on May 25th at the Cambridge Blue.

On the mailing lists

LLVM commits

  • llc will now report all errors in the input file rather than just exiting after the first. r269655.

  • The SPARC backend gained support for soft floating point. r269892.

  • Reloc::Default no longer exists. Instead, Optional<Reloc> is used. r269988.

  • An initial implementation of a "guard widening" pass has been committed. This will combine multiple guards to reduce the number of checks at runtime. r269997.
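
As a rough picture of what guard widening (r269997) does, the Python sketch below checks two adjacent element accesses first with one guard each, then with a single widened guard that implies both. This only models the idea: the function names are hypothetical, and raising IndexError stands in for whatever a failed guard does in a real runtime.

```python
def load_pair_two_guards(buf, i):
    # Guard 1 protects buf[i].
    if not 0 <= i < len(buf):
        raise IndexError(i)
    first = buf[i]
    # Guard 2 protects buf[i + 1].
    if not 0 <= i + 1 < len(buf):
        raise IndexError(i + 1)
    second = buf[i + 1]
    return first, second


def load_pair_widened(buf, i):
    # One widened guard: 0 <= i and i + 1 < len(buf) implies both of the
    # original conditions, so a single check covers both loads.
    if not (0 <= i and i + 1 < len(buf)):
        raise IndexError(i)
    return buf[i], buf[i + 1]
```

The widened version fails before the first load rather than between the two, which is acceptable here because nothing observable happens in between.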

Clang commits

  • clang-include-fixer gained a basic Vim integration. r269927.

  • The intrinsics headers now have feature guards enabled in Microsoft mode to combat the compile-time regression discussed last week due to their increased size. r269675.

  • avxintrin.h gained many new Doxygen comments. r269718.

Other project commits

  • lld now lets you specify a subset of passes to run in LTO. r269605.

  • LLDB has replaced uses of its own Mutex class with std::mutex. r269877, r270024.

by Alex Bradbury ( at May 23, 2016 11:41 AM

May 16, 2016


LLVM Weekly - #124, May 16th 2016

Welcome to the one hundred and twenty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

The main news this week is the announcement of Scala-native, an ahead-of-time compiler for Scala using LLVM. Jos Dirkens has written a getting started guide if you want to compile it and try it out. There's also more information in the slides from the announcement talk.

On the mailing lists

LLVM commits

  • The outdated guide on cross-compiling LLVM has been brought up to date. r269054.

  • The WebAssembly backend gained preliminary fast instruction selection (fast-isel) support. r269083, r269203, r269273.

  • Loop unrolling (other than in the case of explicit pragmas) is now disabled at -Os in LLVM. You may recall last week it was enabled for -Os in Clang, but with different thresholds. r269124.

  • A new cost-tracking system has been implemented for the loop unroller. r269388.

  • LLVM's Sparc backend has seen the addition of more LEON-specific features, e.g. signed and unsigned multiply-accumulate. r268908.

  • llc's -run-pass option will now work with any pass known to the pass registry. Previously it would silently do nothing if you specify indirectly added analysis passes or passes not present in the optimisation pipeline. r269003.

  • WebAssembly register stackification and coloring are now run very late in the optimisation pipeline. The commit message suggests it's useful to think of these passes as domain-specific liveness-based compression rather than a conventional optimisation. r269012.

  • When declaring globals in textual LLVM IR, you must now assign them explicitly, e.g. @0 = global i32 42. r269096.

  • The internal assembler is now enabled by default for 32-bit MIPS targets. r269560.

Clang commits

  • Clang now supports __float128. r268898.

  • Clang gained a new warning that triggers when casting away calling conventions from a function. r269116.

  • The recently developed include-fixer tool now has documentation. r269167.

Other project commits

  • compiler-rt's CMake build system can now build builtins without a full toolchain, allowing you to bootstrap a cross-compiler. r268977.

  • LLD will now sort relocations to optimise dynamic linker performance. r269066.

by Alex Bradbury ( at May 16, 2016 11:26 AM

May 09, 2016


LLVM Weekly - #123, May 9th 2016

Welcome to the one hundred and twenty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

If you're in London tomorrow you may be interested in the NMI Open Source Conference. You can register until midday today. I'll be giving a brief talk on lowRISC. While on the subject of conferences, if you are interested in diversity and inclusion in computing education, you may want to check out the CAS #include diversity conference in Manchester on the 11th June.

The canonical home for this issue can be found here at

News and articles from around the web

Fabian Giesen has written a brief article explaining why compilers exploit undefined signed overflow.

The Google Open Source blog has a short piece on the XRay function call tracing system that was proposed for upstreaming last week on the LLVM mailing list.

On the mailing lists

LLVM commits

  • LLVM's CppBackend has been removed. As the commit message says, this backend has bit-rotted to the extent that it's not useful for its original purpose and doesn't generate code that compiles. r268631.

  • The AVR backend has seen a large amount of code merged into LLVM. r268722.

  • The MIPS backend has seen some large changes to how relocations are handled. These are now represented using MipsMCExpr instead of MCSymbolRefExpr. As someone who has done quite a lot of (out-of-tree) LLVM backend work, I've always found it odd how some architectures have globally visible enum members in include/llvm/MC/MCExpr.h. r268379.

  • LLVM builds should hopefully now be deterministic by default, as LLVM_ENABLE_TIMESTAMPS is now opt-in rather than opt-out. In fact, a follow-up patch removed the option altogether. r268441, r268670.

  • The AArch64 backend learned to combine adjustments to the stack pointer for callee-save stack memory and local stack memory. r268746.

Clang commits

  • Clang now supports -malign-double for x86. This matches the default behaviour on x86-64, where i64 and f64 types are aligned to 8 bytes instead of 4. r268473.

  • Loop unrolling is no longer completely disabled for -Os. r268509.

  • Clang's release notes (reflecting the state of current trunk) have been updated to say more about the state of C++1z support. r268663.

Other project commits

  • libcxx will now build a libc++experimental.a static library to hold symbols from the experimental C++ Technical Specifications (e.g. filesystem). This library provides no ABI compatibility. r268443, r268456.

  • All usage of pthreads in libcxx has been refactored into the __threading_support header, with the intention of making it easier to retarget libcxx to platforms that don't support pthreads. r268374.

  • libcxx gained support for the polymorphic memory resources C++ TS. r268829.

by Alex Bradbury ( at May 09, 2016 08:28 AM

May 02, 2016


LLVM Weekly - #122, May 2nd 2016

Welcome to the one hundred and twenty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

GCC 6.1 has been released. Perhaps the most apparent user-visible change is that the C++ frontend now defaults to C++14.

The Rust compiler has introduced a new intermediate representation, MIR, used for optimisations prior to lowering to LLVM IR.

Tanya Lattner has written about the LLVM Foundation's plans for 2016. The LLVM Foundation has established 3 main programs: Educational Outreach, Grants and Scholarships, and Women in Compilers and Tools.

On the mailing lists

LLVM commits

  • LLVM now supports indirect call promotion based on value-profile information. This will promote indirect calls to a direct call guarded by a precondition. r267815.

  • The LLVM documentation has been extended with a CMake primer covering the basics of the CMake scripting language. r268096.

  • The PDB dumper has been refactored into a library. r267431.

  • The MinLatency attribute has been removed from SchedMachineModel. r267502.

  • CodeGenPrepare will now use branch weight metadata to decide if a select should be turned into a branch. r267572.

  • Support for llvm.loop.distribute.enable metadata was added. This indicates a loop should be split into multiple loops. r267672.

  • The SystemZ backend now supports the Swift calling convention. r267823.

  • libFuzzer's documentation has been expanded and improved. r267892.
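
Loop distribution, which the llvm.loop.distribute.enable metadata above requests, can be shown by hand. In this Python sketch (function names are invented for illustration, and both inputs are assumed to have the same length) one loop mixes a running sum, which carries a dependence between iterations, with an independent element-wise computation. The distributed form splits them into two loops that give the same results.

```python
def fused(a, b):
    n = len(a)
    running = [0] * n
    scaled = [0] * n
    for i in range(n):
        # Depends on the previous iteration's result.
        running[i] = a[i] + (running[i - 1] if i > 0 else 0)
        # Independent of every other iteration.
        scaled[i] = 2 * b[i]
    return running, scaled


def distributed(a, b):
    n = len(a)
    running = [0] * n
    scaled = [0] * n
    # Loop 1: only the statement with the loop-carried dependence.
    for i in range(n):
        running[i] = a[i] + (running[i - 1] if i > 0 else 0)
    # Loop 2: only the independent statement.
    for i in range(n):
        scaled[i] = 2 * b[i]
    return running, scaled
```

In a compiler the usual payoff is that the second loop, free of cross-iteration dependences, becomes a candidate for vectorisation.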

Clang commits

  • clang-tidy gained a new checker for redundant expressions on both sides of a binary operator. r267574.

  • A new clang-tidy check will warn for use of functions like atoi and atol that don't report conversion errors. r268100.

  • The nodebug attribute on a global or static variable will now suppress all debug info for that variable. r267746.

  • A number of OpenMP features gained codegen support, such as the map clause and target data directive. r267808, r267811.

Other project commits

  • LLD now supports an -O0 option to produce output as quickly as possible. Currently this disables section merging at the cost of a potentially much larger output. r268056.

  • The symbol table in LLD's ELF linker has been redesigned with the intent of improving memory locality. The new design produces measurable speedups for the binaries tested in the commit message. r268178.

  • LLD's linkerscript support expanded to encompass comparison operators. r267832.

  • LLD performance on large executables has been improved by skipping scanRelocs on sections that are never mapped to memory at runtime (e.g. debug sections). r267917.

by Alex Bradbury ( at May 02, 2016 03:31 PM

April 27, 2016


LLVM Foundation 2016 Announcements

With 2016 upon us, the LLVM Foundation would like to announce our plans for the year. If you are not familiar with the LLVM Foundation, we are a 501(c)(3) nonprofit that supports the LLVM Project and its community. We are best known for our LLVM Developers’ Meetings, but we are introducing several new programs this year. 

The LLVM Foundation originally grew out of the need to have a legal entity to plan and support the annual LLVM Developers’ Meeting and LLVM infrastructure. However, as the Foundation was created we saw a need for help in other areas related to the LLVM project, compilers, and tools. The LLVM Foundation has established 3 main programs: Educational Outreach, Grants & Scholarships, and Women in Compilers & Tools.

Educational Outreach 

The LLVM Foundation plans to expand its educational materials and events related to the LLVM Project and compiler technology and tools. 

First, the LLVM Foundation is excited to announce the 2016 Bay Area LLVM Developers’ Meeting will be held November 3-4 in San Jose, CA. This year will be the 10th anniversary of the developer meeting which brings together developers of LLVM, Clang, and related projects. For this year’s meeting, we are increasing our registration cap to 400 in order to allow more community members to attend.

We also are investigating how we can support or be involved in other conferences in the field of compilers and tools. This may include things such as LLVM workshops or tutorials by sponsoring presenters, or providing instructional materials. We plan to work with other conference organizers to determine how the LLVM Foundation can be helpful and develop a plan going forward.

However, we want to do more for the community and have brainstormed some ideas for the coming year. We plan to create some instructional videos for those just beginning with LLVM. These will be short 5-10 minute videos that introduce developers to the project and get them started. Documentation is always important, but we find that many are turning to videos as a way to learn. 

Grants & Scholarships

We are creating a grants and scholarships program to cover student presenter travel expenses to the LLVM Developers’ Meetings. However, we also hope to expand this program to include student presenter travel to other conferences where the student is presenting their LLVM related work. Details on this program will be published once they have been finalized. 

Women in Compilers & Tools

Grace Hopper invented the first compiler and yet women are severely underrepresented in the field of compilers and tools. At the 2015 Bay Area LLVM Developers’ Meeting, we held a BoF on this topic and brainstormed ideas about what can be done. One idea was to increase LLVM awareness at technical conferences that have strong female participation. One such conference is the Grace Hopper Conference (GHC). The LLVM Foundation has submitted a proposal to present about LLVM and how to get involved with the LLVM open source community. We hope our submission is accepted, but if not, we are exploring other ways we can increase our visibility at GHC. Many of the other ideas from this BoF are being considered and actionable plans are in progress.

In addition to these 3 programs, we will continue to support the LLVM Project’s infrastructure. The server will move to a new machine to increase performance and reliability.

We hope that you are excited about the work the LLVM Foundation will be doing in 2016. Our 2016 Plans & Budget may be viewed here. You may also contact our COO & President, Tanya Lattner ( or the LLVM Foundation Board of Directors (

by Tanya Lattner ( at April 27, 2016 03:48 PM

April 25, 2016


LLVM Weekly - #121, Apr 25th 2016

Welcome to the one hundred and twenty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Congratulations to the eight students who have been selected for LLVM projects on Google Summer of Code this year. There's about a month before they start coding. The time between now and then is the 'community bonding period', so please do make them feel welcome.

The preliminary release schedule for LLVM/Clang 3.8.1 has been published. This would have a deadline of May 25th for requesting changes to be merged and would see the final release on June 15th.

On the mailing lists

LLVM commits

  • An implementation of optimisation bisection support has landed. This helps to track down bugs by allowing optimisations to be selectively disabled at compile-time to identify the one introducing a miscompile. r267022.

  • The AArch64 and ARM thread pointer intrinsics have been merged to make a target-independent llvm.thread.pointer intrinsic. r266818.

  • The llvm.load.relative intrinsic has been added. r267233.

  • There have been more changes to DebugInfo which will require a bitcode upgrade. A script to perform this upgrade is linked in the commit message. r27296.

  • The ORC JIT API improved its support for RPC, including support for calling functions with return values. r266581.

  • The patchable-function function attribute has been introduced, indicating that the function should be easily patchable at runtime. r266715.

  • The IntrReadArgMem intrinsic property has been split into IntrReadMem and IntrArgMemOnly. r267021.

  • The MachineCombiner gained the ability to combine AArch64 fmul and fadd into an fmadd. r267328.

  • Scheduling itineraries were added for Sparc, specifically for the LEON processors. r267121.
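
The idea behind optimisation bisection (r267022) is an ordinary binary search over how many optimisations are allowed to run. The sketch below assumes a hypothetical predicate is_bad(limit) that rebuilds and tests with only the first limit optimisations enabled, and assumes failures are monotonic (once the culprit runs, the result stays bad). Neither the predicate nor the function name comes from LLVM.

```python
def first_bad_limit(is_bad, num_opts):
    """Return the smallest limit at which the build goes bad.

    Preconditions (assumed, not checked): is_bad(0) is False,
    is_bad(num_opts) is True, and is_bad is monotonic in its argument.
    """
    good, bad = 0, num_opts  # invariant: is_bad(good) False, is_bad(bad) True
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    # The optimisation numbered `bad` is the first one that breaks things.
    return bad
```

Each probe is a full rebuild and test run, so the appeal is that only about log2(num_opts) of them are needed.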

Clang commits

  • A prototype of an include fixing tool was created. The indexer remains to be written. r266870.

  • A new warning has been added, which will trigger if the compiler tries to make an implicit instantiation of a template but cannot find the template definition. r266719.

  • Initial driver flags for EfficiencySanitizer were added. r267059.

Other project commits

  • The initial EfficiencySanitizer base runtime library was added to compiler-rt. It doesn't do much of anything yet. r267060.

  • LLD learned to support the linkerscript ALIGN command. r267145.

  • LLDB can now parse EABI attributes for an ELF input. r267291.

by Alex Bradbury ( at April 25, 2016 11:20 AM

April 18, 2016


LLVM Weekly - #120, Apr 18th 2016

Welcome to the one hundred and twentieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

This week has seen not one, but two articles about LLVM and profile-guided optimisation. Dig into Johan Engelen's article about optimising D's virtual function calls with PGO, then read Geoffroy Couprie's article about PGO with Rust.

The next Cambridge (UK) social will be at 7.30pm on April 20th, at the Cambridge Blue.

Alex Denisov has written a blog post around the idea of building a mutation testing system using LLVM.

On the mailing lists

LLVM commits

  • AtomicExpandPass learned to lower various atomic operations to __atomic_* library calls. The eventual aim is to move all atomic lowering from Clang to LLVM. r266115.

  • Targets can now define an inlining threshold multiplier, to e.g. increase the likelihood of inlining on platforms where calls are very expensive. r266405.

  • The ownership between DICompileUnit and DISubprogram has been reversed. This may break tests for your out-of-tree backend, but the commit has a link to a Python script to update your testcases. r266446.

  • llvm-readobj learned to print a histogram of an input ELF file's .gnu.hash section. r265967.

  • More target-specific support for the Swift calling convention (on ARM, AArch64, and X86) has landed. Also, a callee save register is used for the swiftself parameter. r265997, r266251.

  • A new allocsize attribute has been introduced. This indicates the given function is an allocation function. r266032.

  • analyzeSiblingValues has been replaced with a new lower-complexity implementation in order to reduce compile times. r266162.

  • The AMDGPU backend gained a skeleton GlobalISel implementation. r266356.

  • Every use of getGlobalContext other than the C API has been removed. r266379.
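
For readers unfamiliar with what a .gnu.hash histogram summarises, here is a small Python sketch. It uses the GNU symbol hash (start at 5381, then multiply by 33 and add each byte, modulo 2^32) and counts how many buckets hold 0, 1, 2, ... symbols. It assumes the histogram is over bucket chain lengths, takes a plain list of names rather than parsing an ELF file, and ignores the Bloom filter. The function names are illustrative and this is not llvm-readobj's implementation.

```python
from collections import Counter


def gnu_hash(name):
    """GNU-style symbol hash, truncated to 32 bits."""
    h = 5381
    for byte in name.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def chain_length_histogram(names, nbuckets):
    """Map chain length -> number of buckets with that many symbols."""
    per_bucket = Counter(gnu_hash(name) % nbuckets for name in names)
    histogram = Counter(per_bucket.get(b, 0) for b in range(nbuckets))
    return dict(sorted(histogram.items()))
```

A histogram skewed towards long chains suggests that symbol lookups in that file walk more entries than necessary.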

Clang commits

  • Clang gained support for the GCC ifunc attribute. r265917.

  • The __unaligned type qualifier was implemented for MSVC compatibility. r266415.

  • Support for C++ core guideline Type.6: always initialize a member variable was completed in clang-tidy. r266191.

  • A new clang-tidy checker for suspicious sizeof expressions was added. r266451.

Other project commits

  • The way relocations are applied in the new ELF linker has been reworked. r266158.

  • ELF LLD now supports parallel codegen for LTO using splitCodeGen. r266484.

  • Support for Linux on SystemZ in LLDB landed. r266308.

by Alex Bradbury ( at April 18, 2016 01:05 PM

April 11, 2016


LLVM Weekly - #119, Apr 11th 2016

Welcome to the one hundred and nineteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Last week the slides from the recent EuroLLVM 2016 Developers' Meeting made it online. This week this has been followed by videos of the talks from the conference.

John Regehr has written about efficient integer overflow checking in LLVM, looking at cases where LLVM can and cannot remove unnecessary overflow checks, and how this might be improved.
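
For context, two common ways of phrasing a signed 32-bit addition overflow check look roughly like this. The sketch is in Python, whose integers never overflow, so the 32-bit limits are modelled explicitly. The function names are hypothetical and the code is not taken from the post.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def add_overflows_wide(a, b):
    """Compute the sum in a wider type, then range-check the result."""
    wide = a + b
    return wide < INT32_MIN or wide > INT32_MAX


def add_overflows_precheck(a, b):
    """Decide from the operands alone, without ever forming the sum."""
    if b > 0:
        # Only overflow past INT32_MAX is possible.
        return a > INT32_MAX - b
    # b <= 0: only overflow below INT32_MIN is possible.
    return a < INT32_MIN - b
```

Both functions expect operands already within the 32-bit range; the second form is the one that stays well defined in a language where the addition itself would be the problem.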

Version 0.13 of Pocl, the portable OpenCL implementation has been released. This release works with LLVM/Clang 3.8 and 3.7, and adds initial OpenCL 2.0 support and improved HSA support.

Serge Guelton at QuarksLab has written up a really useful guide to implementing a custom directive handler in Clang.

Microsoft's Visual C++ team are looking for feedback on Clang/C2 (Clang with Microsoft CodeGen).

On the mailing lists

  • James Molloy has posted an RFC on adding support for constant folding calls to math.h functions on long doubles. Currently these functions aren't constant-folded as the internal APFloat class doesn't implement them and long double operations aren't portable. Solutions include adding support to APFloat, linking against libMPFR to provide compile-time evaluation, or recognising when the long double format of the host and target are the same, so the host math library can be called. From the responses so far, there seems to be some push-back on adding the libMPFR dependency.

  • Sanjoy Das has an RFC on adding a patchable-prologue attribute. This would be used to indicate that the function's prologue is compiled so as to provide support for easy hot-patching.

  • Ulrich Weigand has shared a patch for supporting LLDB on Linux on SystemZ. The patchset contains many big-endian fixes, and may be of interest to others looking at porting LLDB.

LLVM commits

  • The Swift calling convention as well as support for the 'swifterror' argument has been added. r265433, r265480.

  • Work on GlobalISel continues with many commits related to the assignment of virtual registers to register banks. r265445, r265440.

  • LLVM will no longer perform inter-procedural optimisation over functions that can be "de-refined". r265762.

  • The substitutions supported by lit are now documented. r265314.

  • Unrolled loops now execute the remainder in an epilogue rather than the prologue. This should produce slightly improved code. r265388.
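
The shape of an unrolled loop with its remainder handled in an epilogue (r265388) looks roughly like the hand-written Python sketch below. The unroll factor of 4 and the function name are arbitrary choices for illustration; the compiler performs this on IR, not on source.

```python
def sum_unrolled_by_4(values):
    n = len(values)
    main_end = n - n % 4  # largest multiple of 4 not exceeding n
    total = 0
    i = 0
    # Main loop: four elements per iteration, no remainder handling inside.
    while i < main_end:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Epilogue: the 0 to 3 leftover elements run after the main loop.
    while i < n:
        total += values[i]
        i += 1
    return total
```

With a prologue instead, the leftover iterations would run first and the unrolled loop would start at an offset that depends on the trip count.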

Clang commits

  • Clang gained necessary support for the Swift calling convention. r265324.

  • New flags -fno-jump-tables and -fjump-tables can be used to disable/enable support for jump tables when lowering switch statements. r265425.

  • TargetOptions is now passed through all the TargetInfo constructors. This will allow target information to be modified based on the ABI selected. r265640.

  • A large number of intrinsics from emmintrin.h now have Doxygen docs. r265844.

Other project commits

  • clang-tidy gained a new check to flag initializers of globals that access extern objects, leading to potential order-of-initialization issues. r265774.

  • LLD's ELF linker gained new options --start-lib, --end-lib, --no-gnu-unique, --strip-debug. r265710, r265717, r265722.

by Alex Bradbury ( at April 11, 2016 01:03 PM

April 04, 2016


LLVM Weekly - #118, Apr 4th 2016

Welcome to the one hundred and eighteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Almost all slides from the recent EuroLLVM conference are now available online for your enjoyment.

Some readers may be interested in a new paper about the 'LifeJacket' tool for verifying precise floating-point optimisations in LLVM.

Christian Neumüller has written a new tool for syntax highlighting and cross-referencing C and C++ source using libclang.

On the mailing lists

LLVM commits

  • The Lanai backend has landed. r264578.

  • A new llvm.experimental.guard intrinsic has been added. As described in the accompanying documentation, along with deoptimization operand bundles this allows frontends to express guards or checks on optimistic assumptions made during compilation. r264976.

  • Support for a number of new Altivec instructions has been added. Amazingly, this includes BCD (Binary Coded Decimal) instructions. r264568.

  • The concept of MachineFunctionProperties has been introduced, with the first property being AllVRegsAllocated. This allows passes to declare that they require a particular property, in this case requiring that they be run after regalloc. r264593.

  • On X86, push will now be used in preference to mov at all optimisation levels (before it was only enabled for -Os). r264966.

  • LLVM's support library can now compute SHA1 hashes. This is used to implement a 'build-id'. r265094, r265095.

  • When metadata is only referenced in a single function, it will now be emitted just in that function block. The aim of this is to improve the potential of lazy-loading. r265226.

Clang commits

  • The Lanai backend is now supported in the Clang driver. r264655.

  • libTooling gained a handy formatAndApplyAllReplacements function. r264745.

Other project commits

  • Parts of LLD are starting to use the new Error handling. r264910, r264921, r264924, and more.

  • Infrastructure was added to LLD for generating thunks (as required on platforms like MIPS when calling PIC code from non-PIC). r265059.

by Alex Bradbury ( at April 04, 2016 11:22 AM

April 01, 2016


My Little LLVM: Undefined Behavior is Magic!

A horrible mashup between LLVM's old dragon logo and a My Little Pony inspired pegasus pony
New LLVM logo

There’s been lots of discussion online (and then quite some more) about compilers abusing undefined behavior. As a response the LLVM compiler infrastructure is rebranding and adopting a motto to make undefined behavior friendlier and less prone to corruption.

The re-branding puts to rest a long-standing issue with LLVM’s “dragon” logo actually being a wyvern with an upside-down head, a special form of undefined behavior in its own right. The logo is now clearly a pegasus pony.

Another great side-effect of this rebranding is increased security by auto-magically closing all vulnerabilities used by the hacker who goes by the pseudonym “Pinkie Pie”.

These new features are enabled with the -rainbow clang option, in honor of Rainbow Dash’s unary name.

A Few Examples

C++’s memory model specifies that data races are undefined behavior. It is well established that no sane compiler would optimize atomics, LLVM will therefore supplement the Standard’s happens-before relationship with an LLVM-specific happens-to-work relationship. On most architectures this will be implemented with micro-pause primitives such as x86’s rep rep rep nop instruction.

Shifts by bit-width or larger will now return a normally-distributed random number. This also obsoletes rand() and std::random_shuffle.

bool now obeys the rules of truthiness to avoid that annoying “but what if it’s not zero or one?” interview question. Further, incrementing a bool with ++ now does the right thing.

Atomic integer arithmetic is already specified to be two’s complement. Regular arithmetic will therefore now also be atomic. Except when volatile, but not when volatile atomic.

NaNs will now compare equal, subnormals are free to self-classify as normal / zero / other, negative zero simply won’t be a thing, IEEE-754 has been upgraded to PONY-754, floats will still round with style, and generating a signaling NaN is now guaranteed to not be quiet by being equivalent to putchar('\a'). While we’re at it none of math.h will set errno anymore. This has nothing to do with undefined behavior but seriously, errno?

Type-punning isn’t a thing anymore. We’re renaming it to type-pony-ing, but it doesn’t do anything surprising besides throw parties. AND WHO DOESN’T LIKE PARTIES‽ EVEN SECURITY PEOPLE DO! 🎉

A Word From Our Sponsors

The sanitizers—especially undefined behavior sanitizer, address sanitizer and thread sanitizer—are great tools when dealing with undefined behavior. Use them on your tests, combine them with fuzzers, try them as cupcake topping! Be warned: their runtimes aren’t designed to be secure and you shouldn’t ship them in production code!

Cutie Marks

To address the horse in the room: we’ve left the new LLVM logo’s cutie mark as implementation-defined. Different instances of the logo can use their own cutie mark to illustrate their proclivities, but must clearly document them.

by Unknown ( at April 01, 2016 07:02 AM

March 29, 2016

OpenMP Runtime Project

New code release

We are excited to announce the next release of the Intel® OpenMP* Runtime Library at This release aligns with Intel® Parallel Studio XE 2016 Composer Edition Update 3.

New Features

  • OpenMP* 4.5 schedule(simd:static) support

Bug Fixes

  • Hwloc topology discovery improved
  • Spin backoff mechanism fixed in lock code
  • Plain barrier performance improved on Intel(R) Xeon Phi


by mad\tlwilmar at March 29, 2016 06:50 PM

March 28, 2016


LLVM Weekly - #117, Mar 28th 2016

Welcome to the one hundred and seventeenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Google Summer of Code applications are now closed. Applicants and interested third-parties can look forward to finding out which projects were selected on April 22nd.

Ramkumar Ramachandra has written a blog post giving a whirlwind tour of the internals of LLVM's fast register allocator (FastRegAlloc.cpp).

Alex Denisov has blogged about the various test suites used within the LLVM project.

Version 1.13 of the TTA-based Co-design Environment (TCE) has been released. This adds support for LLVM 3.8.

On the mailing lists

LLVM commits

  • A new utility, was added to update opt or llc test cases with new FileCheck patterns. r264357.

  • Non-power-of-2 loop unroll count pragmas are now supported. r264407.

  • The NVPTX backend gained a new address space inference pass. r263916.

  • Instances of Error are now convertible to std::error_code. Conversions are also available between Expected<T> and ErrorOr<T>. r264221, r264238.

  • Hexagon gained support for run-time stack overflow checking. r264328.

Clang commits

  • Clang now supports lambda capture of *this by value. r263921.

  • The bitreverse builtins are now documented. r264203.
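
To illustrate the by-value capture of *this mentioned above, here is a minimal sketch assuming a compiler in C++17 mode (or newer). The `Greeter` type and function names are made up for the example.

```cpp
#include <functional>
#include <string>

// Hypothetical type used only for illustration.
struct Greeter {
    std::string name;

    // [*this] copies the whole object into the closure, so the returned
    // callable does not depend on the lifetime of the original Greeter.
    std::function<std::string()> make_greeting() const {
        return [*this] { return "hello, " + name; };
    }
};

std::function<std::string()> greeting_from_local() {
    Greeter g{"clang"};
    return g.make_greeting();  // g is destroyed here; the closure has its own copy
}
```

Capturing plain `this` in the same spot would leave the closure holding a dangling pointer once `g` goes out of scope.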

Other project commits

  • LLDB will fix inputted expressions with 'trivial' mistakes automatically. r264379.

  • ThreadSanitizer debugging support was added to LLDB. r264162.

  • Polly gained documentation to describe how it fits in to the LLVM pass pipeline. r264446.

  • LLDB has been updated to handle the UTF-16 APIs on Windows. r264074.

by Alex Bradbury ( at March 28, 2016 01:22 PM

March 21, 2016


LLVM Weekly - #116, Mar 21st 2016

Welcome to the one hundred and sixteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

If you're a student and would like to get paid to work on an LLVM-related project over the summer then do consider applying for Google Summer of Code with LLVM. More details about Summer of Code are available here. The deadline for applications is this Friday, March 25th at 1900 GMT. I'd also encourage you to look at lowRISC's project ideas if you have an interest in open source hardware.

Stephen Kelly has written about his new Clang-based tool for porting a C++ codebase to use almost-always-auto. As was pointed out on Twitter, Ryan Stortz from Trail of Bits has a tool that removes auto and does roughly the opposite.

Honza Hubička has written up his experiments of building LibreOffice with GCC6 and LTO. This includes a comparison to a build using LLVM and Clang.

Nick Clifton has shared an update for February and March on the GNU toolchain that may be of interest.

The developer of the Capstone disassembly framework and the Unicorn multi-architecture simulator is running a funding campaign for the Keystone multi-architecture assembler framework. Like Capstone, this will build on LLVM but also aims to go beyond it.

On the mailing lists

LLVM commits

  • A new Error support class has been added to support structured error handling. See the associated updates to the LLVM programmer's manual for more info. r263609.

  • New documentation was committed for advanced CMake build configurations. r263834.

  • Support was added for MIPS32R6 compact branches. r263444.

  • The MemCpyOptimizer will now attempt to reorder instructions in order to create an optimisable sequence. r263503.

  • llvm-readobj learnt to print sections and relocations in the GNU style. r263561.

Clang commits

  • Attributes have been added for the preserve_mostcc and preserve_allcc calling conventions. r263647.

  • clang-format will handle some cases of automatic semicolon insertion in JavaScript. r263470.

  • Clang learned to convert some Objective-C message sends to runtime calls. r263607.

Other project commits

  • AddressSanitizer is now supported on mips/mips64 Android. r263261.

  • The documentation on the LLD linker has added a few numbers to give an idea of the sort of inputs it needs to handle. e.g. Chrome with debug info contains roughly 13M relocations, 6.3M symbols, 1.8M sections and 17k files. r263466.

by Alex Bradbury ( at March 21, 2016 12:06 PM

March 14, 2016


LLVM Weekly - #115, Mar 14th 2016

Welcome to the one hundred and fifteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

We have an LLVM-related research position currently being advertised here at the University of Cambridge Computer Lab. If you'd like an informal chat about what it's like working in this group or on this project please don't hesitate to get in touch with me.

News and articles from around the web

LLVM and Clang 3.8 have now been released. Check out the LLVM and Clang release notes for a run-down of the new features.

It's GDC this week and if you're attending you may be interested that there's an LLVM meetup scheduled for Thursday.

Felix Angell has a detailed blog post introducing generating LLVM IR from Go.

On the mailing lists

LLVM commits

  • Loop invariant code motion learnt the ability to exploit the fact that a memory location is known to be thread-local. r263072.

  • A new llvm.experimental.deoptimize intrinsic has been added. r26328.

  • A ThinLTOCodeGenerator was added in order to provide a proof-of-concept implementation. r262977.

  • The Sparc backend gained support for co-processor condition branching and conditional traps. r263044.

Clang commits

  • Clang gained support for the [[nodiscard]] attribute. r262872.

  • New AST matchers were added for addrLabelExpr, atomicExpr, binaryConditionalOperator, designatedInitExpr, designatorCountIs, hasSyntacticForm, implicitValueInitExpr, labelDecl, opaqueValueExpr, parenListExpr, predefinedExpr, requiresZeroInitialization, and stmtExpr. r263027.
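
As a small sketch of what the [[nodiscard]] attribute mentioned above is for, consider a function whose return value signals success or failure. The helper names here are hypothetical, and the exact diagnostic text depends on the compiler and version.

```cpp
// Hypothetical helper, named for illustration only.
[[nodiscard]] bool try_parse_digit(char c, int& out) {
    if (c < '0' || c > '9') {
        return false;  // not a digit; 'out' is left untouched
    }
    out = c - '0';
    return true;
}

int digit_or_default(char c, int fallback) {
    int value = 0;
    // Using the result in a condition counts as consuming it.
    // Writing `try_parse_digit(c, value);` as a bare statement instead
    // is the pattern the attribute is meant to flag with a warning.
    if (try_parse_digit(c, value)) {
        return value;
    }
    return fallback;
}
```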

Other project commits

  • Error and warning messages in LLD are now more consistent. r263125.

  • Documentation on the new ELF and COFF LLD linkers has been updated. r263336.

by Alex Bradbury ( at March 14, 2016 11:56 AM

March 07, 2016


LLVM Weekly - #114, Mar 7th 2016

Welcome to the one hundred and fourteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

LLVM has been accepted as a mentoring organisation in Google Summer of Code 2016. See here for more about what that means. If you're a student who would like to get paid to work on LLVM over the summer, you should definitely consider applying. Also take a look at the full list of organisations in GSoC 2016. If you have an interest in open source hardware, in my (biased) opinion you should definitely look at lowRISC's listed project ideas.

LLVM and Clang 3.8 'final' has been tagged. A release should be imminent.

There was a big C++ committee meeting last week. You can find summaries here and here. If you were hoping for modules, concepts, UFCS, ranges, or coroutines in C++17 I'm afraid you're in for disappointment. Many new features will be available in C++ Technical Specifications though.

llvmlite 0.9.0 has been released. llvmlite is a light-weight Python binding for LLVM. If you're wondering how to get started with llvmlite, then check out this recent blog post from Ian Bertolacci on writing fibonacci in LLVM with llvmlite.

Andi McClure has written a really interesting blog post about writing software without a compiler. In this case, generating LLVM IR from LuaJIT.

On the mailing lists

LLVM commits

  • MemorySSA has gained an initial update API. r262362.

  • TableGen can now check at compile time that a scheduling model is complete. r262384.

  • New comments in PassBuilder give a description of what trade-offs are expected for each optimisation level. r262196.

  • LoopLoadElimination is now enabled by default. r262250.

  • A new patch adding infrastructure for profile-guided optimisation enhancements in the inliner has landed. r262636.

  • Experimental ValueTracking code which tried to infer more precise known bits using implied dominating conditions has been removed. Experiments didn't find it to be profitable enough, but it may still be useful to people wanting to experiment out of tree. r262646.

Clang commits

  • Clang's C API gained an option to demote fatal errors to non-fatal errors. This is likely to be useful for clients like IDEs. r262318.

  • clang-cl gained initial support for precompiled headers. r262420.

  • An -fembed-bitcode driver option has been introduced. r262282.

  • Semantic analysis for the swiftcall calling convention has landed. r262587.

  • Clang's TargetInfo will now store an actual DataLayout instance rather than a string. r262737.

Other project commits

  • LLDB can now read line tables from Microsoft's PDB debug info files. r262528.

  • The LLVM test-suite gained the ability to hash generated binaries and to skip tests if the hash didn't change since a previous run. r262307.

  • LLVM's OpenMP runtime now supports the new OpenMP 4.5 doacross loop nest and taskloop features. r262532, r262535.

by Alex Bradbury ( at March 07, 2016 12:40 PM

February 29, 2016


LLVM Weekly - #113, Feb 29th 2016

Welcome to the one hundred and thirteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

News and articles from around the web

LLVM and Clang 3.8RC3 has been tagged.

EuroLLVM 2016 is less than a month away. If you want to attend, be sure to register.

The Red Hat blog has a summary of new features in the upcoming GCC 6 release.

The Meeting C++ blog has a helpful summary of a subset of the proposals for the next C++ committee meeting.

On the mailing lists

LLVM commits

  • The Sparc backend now contains definitions for all registers and instructions defined in the Sparc v8 manual. r262133.

  • LLVM gained a basic LoopPassManager, though it currently only contains dummy passes. r261831.

  • A number of TargetInstrInfo predicates now take a reference to a MachineInstr rather than a pointer. r261605.

  • The WebAssembly backend gained redzone support for the userspace stack. r261662.

Clang commits

  • Whole-program vtable optimisation is now available in Clang using the -fwhole-program-vtables flag. r261767.

  • Clang gained __builtin_canonicalize which returns the platform-specific canonical encoding of a floating point number. r262122.

  • A hasAnyName matcher was added. r261574.

  • The pointer arithmetic checker has been improved to report fewer false positives. r261632.

Other project commits

  • The new ELF linker gained support for identical code folding (ICF). This reduces the size of an LLD binary by 3.6% and of a Clang binary by 2.7%. As described in the commit message, this is not a "safe" version of ICF as implemented in GNU gold, so will cause issues if the input relies on two distinct functions always having distinct addresses. r261912.

  • Polly's tree now contains a script that may be useful to other LLVM devs. It updates a FileCheck-based lit test by updating the CHECK: lines with the actual output of the RUN: command. r261899.

  • LLDB gained a new set of plugins to help debug Java programs, specifically Java code JIT-ed by the Android runtime. r262015.

  • The new OpenMP 4.5 affinity API is now supported in LLVM's openmp implementation. r261915.

  • The new ELF linker gained support for the -r command-line option, which produces relocatable output (partial linking). r261838.

  • The CMake/lit runner for SPEC in the LLVM test-suite can now run the C CPU2006 floating point benchmarks (but not the Fortran ones). r261816.

  • The old ELF linker has been deleted from LLD. r262158.
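
The caveat about identical code folding in the list above is easier to see with an example. The sketch below, with made-up function names, shows the kind of code that relies on two functions with identical bodies having distinct addresses. It behaves as commented under an ordinary link; if a linker folds the two functions into one without preserving address identity, the second comparison could never be reached.

```cpp
// Two functions with identical bodies: candidates for folding.
int add_one_a(int x) { return x + 1; }
int add_one_b(int x) { return x + 1; }

using Fn = int (*)(int);

// Identify a callback by comparing function addresses.
// This assumes add_one_a and add_one_b have different addresses.
char name_of(Fn f) {
    if (f == add_one_a) return 'a';
    if (f == add_one_b) return 'b';
    return '?';
}
```

Code that only calls the functions, and never compares or hashes their addresses, is unaffected by folding.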

by Alex Bradbury ( at February 29, 2016 02:58 PM

February 25, 2016

OpenMP Runtime Project

New code release

We are excited to announce the next release of the Intel® OpenMP* Runtime Library at This release aligns with Intel® Parallel Studio XE 2016 Composer Edition Update 2

New Features:

  • Hwloc* 2.0 support added for affinity interface
  • OMPT support for Windows
  • Support for untied tasks
  • OpenMP* 4.5 doacross, taskloop, and new affinity API

Bug Fixes:

by mad\jlpeyton at February 25, 2016 10:21 PM

February 22, 2016


LLVM Weekly - #112, Feb 22nd 2016

Welcome to the one hundred and twelfth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Filip Pizlo has written a fantastic article introducing the new B3 JIT compiler for WebKit's JavaScriptCore. This intends to replace LLVM as the optimising backend to their fourth-tier JIT. The article describes in detail their reasons for moving away from LLVM (mainly compile-time) and the design trade-offs made, such as in reducing memory allocations and minimising pointer-chasing in the IR. This reminds me of the trade-offs Mike Pall made in the LuaJIT 2.0 IR. Philip Reames also shared some initial thoughts on B3. I know some people have expressed disappointment about WebKit moving away from LLVM, but if you'll allow me to insert just a little bit of editorial I'd argue B3 is a very positive development for LLVM and the wider compiler community. B3 explores a different set of design trade-offs to those chosen for LLVM, and these sorts of changes are probably easiest to explore in a fresh codebase. Thanks to this write-up (and hopefully future B3/AIR documentation), we can learn from the B3 developers' experiences and consider if some of their choices will make sense for LLVM. It's also good to remember that LLVM isn't the only feasible route for code generation and optimisation, and we shouldn't treat LLVM's design choices as the one true way to do things. Impressively, B3 was developed to its current state in only 6 months of developer-time.

Version 0.17.0 of LDC, the LLVM-based compiler for the D programming language has been released. You can view a detailed changelog here.

GCC6 will feature a whole bunch of new warnings, and this blog post details many of them.

The schedule for EuroLLVM 2016 has now been posted. This will be held March 17th-18th in Barcelona.

On the mailing lists

LLVM commits

  • The PPCLoopDataPrefetch pass has been moved to Transforms/Scalar/LoopDataPrefetch in preparation for it becoming a target-agnostic pass. r261265.

  • The cmpxchg LLVM instruction now allows pointer type operands. r261281.

  • The X86 backend gained support for a new stack symbol ordering optimisation. This is primarily intended to reduce code size, and produces small but measurable improvements across some SPEC CPU 2000 benchmarks. r260917.

  • The LLVM C API has been extended to allow it to be used to manipulate the datalayout. r260936.

  • Some major work on the LazyCallGraph has been checked in. r261040.

  • The AMDGPU backend gained a basic disassembler. r261185.

  • The PostOrderFunctionAttrs pass has been ported to the new pass manager. As described in the commit message, this actually represents a major milestone. r261203.

  • The Hexagon backend gained support for thread-local storage. r261218.

Clang commits

  • A nullPointerConstant AST matcher was added. r261008.

  • Clang gained a -Wcomma warning, which will warn for most uses of the builtin comma operator. r261278.

Other project commits

  • LLD has sprouted a release notes document. r260960.

  • The LLVM test-suite's CMake build system saw a number of fixes for SPEC. r261470.

by Alex Bradbury ( at February 22, 2016 11:29 AM

February 16, 2016

Philip Reames

Quick thoughts on WebKit’s B3

I just finished reading Introducing the B3 JIT Compiler and wanted to jot down some thoughts.  Fair warning, this post is being written in a hurry.  I’m focusing on getting down my initial reactions rather than trying for a really well thought out post.  That may follow at a later time or it may not.

The first key bit is that the goal of this effort appears to be strictly compile time, not peak performance.  Understanding that makes the effort make a lot more sense.  It still seems a bit odd to me for the compile time of your *fourth tier* JIT to be that important, but given I’m no expert in JavaScript, I’ll just accept that as a given.

In that context, I find the claimed 4.7x reduction in compile time surprisingly unexciting.  There’s enough low hanging fruit in LLVM – in particular, a better representation for “check” nodes – that I would expect something of that magnitude to be possible within the existing framework they had.  Achieving a ~5x improvement in compile time with an entirely new compiler (and all of the engineering that implies) seems slightly disappointing.  Now it’s possible (heck, even likely!) that the new architecture will allow them to further drop compile time, but still…

The performance numbers quoted were uninformative at best.  From what I can gather in the write up, the JetStream benchmark is highly influenced by compile time.  While I understand the goal (it’s a useful one), it doesn’t really say anything about the peak performance of the code generated by the two compilers.  Given that, it’s really hard to tell if B3 is actually breakeven with the existing LLVM backend at peak. It would have been really nice to see the same numbers with the compile time somehow removed or adjusted for.  (b3 with a sleep to be slower?  A longer warmup period in a modified benchmark?)

Just to be clear, I’m not saying that the numbers presented are “wrong”.  Merely that they don’t answer the question I’d most like them to. 🙂

Other things that jumped out at me:

  • The points about matching the IR to the source language in an effort to reduce the number of nodes (and thus memory, and time) are spot on.  If what you’re going for is compile time above all else, using an IR which closely matches your source language is absolutely the right approach.  This same general idea (remove memory/nodes where they don’t provide enough value) is what’s motivating the removal of pointer-to-pointer bitcasts in LLVM right now.
  • The emphasis on the importance of the “check” node (i.e. early OSR exit if condition fails) matches our experience as well.  You can see this in Swift’s IR too.  This is clearly an area that LLVM needs to improve.  I think we can do a lot better within LLVM, and I’m surprised they didn’t try to push that.  In particular, the aliasing problems mentioned could have been addressed with a custom AliasAnalysis instance.  Oh well.
  • The choice to use arrays (instead of lists) gives some interesting tradeoffs.  From a compile time perspective, a modify and compact scheme is likely a win.  Interestingly, this reminds me a lot of a mark-compact garbage collector (b3’s layout) vs a standard malloc/free allocator (llvm’s layout).  Given typical lifetimes in a compiler (short!), the collector approach is likely to be the right one.  It does raise some interesting challenges though: pointer equality can no longer be used, trivial dead code elimination is no longer trivial (which complicates various other transforms), and transforms have to deal with non-canonical forms due to extra identity nodes.  It’ll be really interesting to see where b3 goes on this basis alone.
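
To make the "modify and compact" idea in the last point a bit more concrete, here is a toy sketch. It is not B3's actual data structure: the node layout, the `Nop` placeholder and the `compact` function are all assumptions invented for illustration. Nodes live in one array and refer to operands by index; a transform kills a node by overwriting it in place, and a later pass squeezes out the dead entries and renumbers the operands.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy IR: nodes refer to their operands by array index.
enum class Op { Const, Add, Nop };

struct Node {
    Op op;
    int value;        // used by Const
    std::size_t lhs;  // used by Add
    std::size_t rhs;  // used by Add
};

// Remove Nop nodes and renumber operand indices.
// Assumes operands appear before their users and that no live node
// refers to a Nop.
void compact(std::vector<Node>& nodes) {
    std::vector<std::size_t> new_index(nodes.size());
    std::size_t out = 0;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (nodes[i].op == Op::Nop) {
            continue;  // dead entry: skip it
        }
        new_index[i] = out;
        Node n = nodes[i];
        if (n.op == Op::Add) {
            n.lhs = new_index[n.lhs];  // operands were already remapped
            n.rhs = new_index[n.rhs];
        }
        nodes[out++] = n;
    }
    nodes.resize(out);
}
```

Note how a node's position changes during compaction, which is one way to see why identity by pointer or index stops being stable in such a scheme.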

by reames at February 16, 2016 12:22 AM

February 15, 2016


LLVM Weekly - #111, Feb 15th 2016

Welcome to the one hundred and eleventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

There has been a new release of the CilkPlus compiler. This includes an update to the latest LLVM and Clang trunk. CilkPlus implements the Cilk Plus language extensions for data and task parallelism in Clang.

There have been some more papers appearing from the C++ standards committee. P0225R0, or as you may prefer to call it "Why I want Concepts, and why I want them sooner rather than later" is worth a read. There have also been a few other recently published papers on iterator facades, the filesystem technical specification, and unified function call syntax.

On the mailing lists

LLVM commits

  • The WholeProgramDevirt pass has been added. This implements whole program optimization of virtual calls where the list of callees is known to be fixed. r260312.

  • The AVR backend upstreaming continues with the addition of the AVR tablegen instruction definitions. r260363.

  • There's been a bunch of other work on the new global instruction selection mechanism this week, but the commits I'd pick out are the addition of support for translating Add instructions and for lowering returns. It is currently being tested with the AArch64 backend. r260549, r260562, r260600.

  • The AArch64 backend gained support (including a scheduling model) for the Qualcomm Kryo CPU. r260686.

  • LoopUnrollAnalyzer has been abstracted out from LoopUnrollPass, and gained unit tests for its functionality. r260169.

  • llvm-config gained preliminary Windows support. r260263.

  • The details of the convergent attribute have been clarified in the language reference. The convergent attribute will now be removed on functions which provably don't converge or invoke any convergent functions. r260316, r260319.

Clang commits

  • It is now possible to perform a 3-stage Clang build using CMake. It is suggested in the commit message this may be useful for detecting non-determinism in the compiler by verifying stage2 and stage3 are identical. r260261.

  • ARMv8.2-A can be targeted using appropriate Clang options. r260533.

  • Clang's CMake build system learned the CLANG_DEFAULT_CXX_STDLIB option to set the default C++ standard library. r260662.

Other project commits

  • The new LLD ELF linker gained initial link-time optimisation support. r260726.

  • LLDB has seen some more updates for Python 3 support, though not yet enough for a clean testsuite run. r260721.

by Alex Bradbury ( at February 15, 2016 11:58 AM

February 08, 2016


LLVM Weekly - #110, Feb 8th 2016

Welcome to the one hundred and tenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Slides from the LLVM devroom at FOSDEM last weekend are now available online. Unfortunately there was an issue with the recording of the talks so videos will not be available.

JavaScriptCore's FTL JIT is moving away from using LLVM as its backend, towards B3 (Bare Bones Backend). This includes its own SSA IR, optimisations, and instruction selection backend.

Source tarballs and binaries are now available for LLVM and Clang 3.8-RC2.

The Zurich LLVM Social is coming up this Thursday, February 11th at 7pm.

Jeremy Bennett has written up a comparison of the Clang and GCC command-line flags. The headline summary is that 397 work in both GCC and LLVM, 433 are LLVM-only and 598 are GCC-only.

vim-llvmcov has been released. It is a vim plugin to show code coverage using the llvm-cov tool.

On the mailing lists

  • Mehdi Amini has posted an RFC on floating point environment and rounding mode handling in LLVM. The work started all the way back in 2014 and has a whole bunch of patches up for review. Chandler Carruth has responded with a detailed description of his concerns about the current design, and his proposed alternative seems to be getting a lot of positive feedback.

  • Morten Brodersen has recently upgraded a number of applications from the old JIT to the new MCJIT under LLVM 3.7.1 but has found significant performance regressions. Some other respondents have seen similar issues, either in compilation time or in reduced code quality in the generated code. Some of the thread participants will be providing specific examples so they can be investigated. It's possible the issue is something as simple as a different default somewhere. Benoit Belley noted they saw regressions due to their frontend's use of allocas in 3.7.

  • Lang Hames kicked off a long discussion about error handling in LLVM libraries. Lang has implemented a new scheme and is seeking feedback on it. There's a lot of discussion that unfortunately I haven't had time to summarise properly. If error handling design interests you, do get stuck in.

  • Adrian McCarthy has written up details on the recent addition of minidump support to LLDB. Minidumps are the Windows equivalent of a core file.

  • Juan Wajnerman is looking at adding support for multithreading to the Crystal language, and has a question about thread local variables. LLVM won't re-load the thread local address, which causes issues when a thread local variable is read in a coroutine running on one thread which is then suspended and continued on a different thread. This is apparently a known issue, covered by PR19177.

  • Steven Wu has posted an RFC on embedding bitcode in object files. The intent is to upstream support that already exists in Apple's fork. Understandably some of the respondents asked how this relates to the .llvmbc section that the Thin-LTO work is introducing. Steven indicates it's pretty much the same, but for Mach-O rather than ELF and that he hopes to unify them during the upstreaming.

LLVM commits

  • LLVM now has a memory SSA form. This isn't yet used by anything in-tree, but should form a very useful basis for a variety of analyses and transformations. This patch has been baking for a long time, first being submitted for initial feedback in April last year. r259595.

  • A new loop versioning loop-invariant code motion (LICM) pass was introduced. This enables more opportunities for LICM by creating a new version of the loop guarded by runtime checks to test for potential aliases that can't be determined not to exist at compile-time. r259986.

  • LazyValueInfo gained an intersect operation on lattice values, which can be used to exploit multiple sources of facts at once. The intent is to make greater use of it, but already it is able to remove a half range-check when performing jump-threading. r259461.

  • The SmallSet and SmallPtrSet templates will now error out if created with a size greater than 32. r259419.

  • The ability to emit errors from the backend for unsupported features has been refactored, so BPF, WebAssembly, and AMDGPU backends can all share the same implementation. r259498.

  • A simple pass using LoopVersioning has been added, primarily for testing. The new pass will fully disambiguate all may-aliasing memory accesses no matter how many runtime checks are required. r259610.

  • The way bitsets are used to encode type information has now been documented. r259619.

  • You can now use the flag -DLLVM_ENABLE_LTO with CMake to build LLVM with link-time optimisation. r259766.

  • TableGen's AsmOperandClass gained the IsOptional field. Setting this to 1 means the operand is optional and the AsmParser will not emit an error if the operand isn't present. r259913.

  • There is now a scheduling model for the Exynos-M1. r259958.
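
The loop versioning LICM item in the list above can be pictured at the source level. The following is only a hand-written sketch of the idea, not the pass itself: the function names are hypothetical, and the overlap test is a simplified stand-in for the runtime checks the pass would generate. One copy of the loop is guarded by a check that the invariant load cannot be clobbered, which allows the load to be hoisted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: *scale is reloaded on every iteration, because a store to
 * dst[i] might overwrite it. */
void scale_simple(int *dst, const int *src, const int *scale, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* Hand-versioned form (hypothetical, for illustration only). */
void scale_versioned(int *dst, const int *src, const int *scale, size_t n) {
    assert(dst != NULL && src != NULL && scale != NULL);
    uintptr_t lo = (uintptr_t)dst;
    uintptr_t hi = (uintptr_t)(dst + n);
    uintptr_t s = (uintptr_t)scale;

    if (s + sizeof *scale <= lo || s >= hi) {
        /* Runtime check passed: no store to dst can touch *scale, so the
         * load is hoisted out of the loop. */
        const int k = *scale;
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    } else {
        /* Possible overlap: fall back to the original, unhoisted loop. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * *scale;
    }
}
```

Both functions should produce identical results whether or not scale points into dst; the versioned one simply avoids the repeated load in the common non-aliasing case.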

Clang commits

  • Clang now has builtins for the bitreverse intrinsic. r259671.

  • The option names for profile-guided optimisations with the cc1 driver have been modified. r259811.
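
For readers unfamiliar with the bitreverse operation mentioned above, here is a portable sketch of what a 32-bit bit reversal computes. The function name is hypothetical, and this loop is only a reference definition of the operation; it does not use or reproduce Clang's builtins.

```c
#include <stdint.h>

/* Reverse the order of the 32 bits in x: bit 0 becomes bit 31, and so on. */
uint32_t bitreverse32(uint32_t x) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u); /* shift the lowest bit of x into r */
        x >>= 1;
    }
    return r;
}
```

For example, an input with only the lowest bit set yields a result with only the highest bit set.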

Other project commits

  • AddressSanitizer now supports iOS. r259451.

  • The current policy for using the new ELF LLD as a library has been documented. r259606.

  • Polly's new Sphinx documentation gained a guide on using Polly with Clang. r259767.

by Alex Bradbury ( at February 08, 2016 04:44 PM

LLVM Weekly - #109, Feb 1st 2016

Welcome to the one hundred and ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

The GNU Tools Cauldron 2016 has been announced for the 9th-11th of September 2016, in Hebden Bridge, UK.

The Sulong project has been announced. It is an LLVM IR interpreter using the Truffle framework and Graal on the JVM to support JIT compilation.

Ehsan Akhgari has posted an update on building Firefox with clang-cl. It is now possible to build a complete Firefox with Clang without using the MSVC fallback once.

I've mentioned it down below in the list of notable commits, but it's worth calling out here too: the old autoconf build-system has now been removed from LLVM. 3.8 will be the last release to include it. Time to switch to CMake if you haven't already.

John Regehr gave a talk about undefined behaviour in LLVM at the Paris LLVM meetup, and you can find the slides here.

On the mailing lists

LLVM commits

  • The autoconf build system for LLVM has been removed. r258861.

  • The WebAssembly backend gained support for unaligned loads and stores. r258779.

  • LLVM's MCAsmStreamer will now always use .p2align rather than .align, because .align's behaviour can differ between targets. r258750.

  • Intrinsic IDs are now looked up by binary search rather than the previous more complex mechanism. This improves the compile time of Function.cpp. r258774.

  • TargetSelectionDAGInfo has been renamed to SelectionDAGTargetInfo and now lives in CodeGen rather than Target. r258939.

  • A LoopSimplifyCFG pass was added to canonicalise loops before running through passes such as LoopRotate and LoopUnroll. r259256.
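
The intrinsic ID item above describes lookup by binary search. As a rough sketch of that technique, assume a table of names kept in sorted order. The table contents, the function name, and the numbering scheme below are all illustrative assumptions, not LLVM's actual data or code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table; it must stay sorted in strcmp order. */
static const char *const intrinsic_names[] = {
    "llvm.bitreverse", "llvm.bswap", "llvm.ctlz",
    "llvm.ctpop",      "llvm.memcpy", "llvm.sqrt",
};

/* Returns a 1-based ID for a known name, or 0 if the name is not found. */
unsigned lookup_intrinsic(const char *name) {
    assert(name != NULL);
    size_t lo = 0;
    size_t hi = sizeof intrinsic_names / sizeof intrinsic_names[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, intrinsic_names[mid]);
        if (cmp == 0)
            return (unsigned)mid + 1;
        if (cmp < 0)
            hi = mid;      /* target sorts before the midpoint */
        else
            lo = mid + 1;  /* target sorts after the midpoint */
    }
    return 0;
}
```

The appeal of this design is that the only generated data is a sorted array, which is cheap to compile compared with a large hand-rolled matching function.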

Clang commits

  • The clang-cl driver will now warn for unknown arguments rather than erroring, to match the behaviour of MSVC. r258720.

  • The old autoconf build system was removed from Clang. r258862.

  • The 'sancov' (SanitizerCoverage) tool gained some documentation. r259000.

Other project commits

  • libcxx gained an implementation of ostream_joiner. r259014, r259015.

  • lld gained a new error function which won't cause process exit. The hope is this can be used to provide a gradual path towards lld-as-a-library. r259069.

  • The lit runner for the LLVM test suite can now be passed --param=profile=perf which will cause each test to be run under perf record. r259051.

by Alex Bradbury ( at February 08, 2016 04:43 PM

January 25, 2016


LLVM Weekly - #108, Jan 25th 2016

Welcome to the one hundred and eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

LLVM 3.8 RC1 has been released. Now is the time to test it out with your favourite projects and report any issues.

The deadline for the EuroLLVM call for papers is today.

Version 1.6 of the Rust programming language was released last week. Rust uses LLVM for its code generation.

The LLVM Social in Paris will be held this week on Wednesday.

On the mailing lists

LLVM commits

  • llvm::SplitModule gained a new flag which can be used to cause it to attempt to split the module without globalizing local objects. r258083.

  • The WebAssembly backend will now rematerialize constants with multiple uses rather than holding them live in registers, as in most cases there is no code size saving from using registers for constants in the WebAssembly encoding. r258142.

  • Some small patches from the global instruction selection effort have started to land, such as the introduction of a generic machine opcode for ADD (G_ADD) and the all-important CMake support for building it. r258333, r258344.

  • getCacheLineSize was added to TargetTransformInfo. It's currently only used by PPCLoopDataPrefetch. r258419.

  • LoopIdiomRecognize improved in its ability to recognise memsets. r258620.

Clang commits

  • A number of new AST matchers were added. r258042, r258072, and more.

  • The LeakSanitizer documentation has been updated with a usage example. r258476.

Other project commits

  • The new ELF linker gained initial support for MIPS local GOT (global offset table) entries. r2583888.

  • The LLVM test suite now contains a ClangAnalyzer subdirectory containing tests for the static analyzer. r258336.

by Alex Bradbury ( at January 25, 2016 12:48 PM

January 18, 2016


LLVM Weekly - #107, Jan 18th 2016

Welcome to the one hundred and seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

I have a very exciting piece of non-LLVM news to share this week. On Saturday I proposed to my partner Carrie Anne, and I'm delighted to report that she said yes. You may well question if this piece of personal news has any relevance to you, and in response I'd like to highlight just how important Carrie Anne is to this weekly newsletter. For over two years now, I've given up 2-3+ hours of my time every week without fail on evenings and weekends, time we could really be spending together as a couple. Without Carrie Anne's understanding and support LLVM Weekly couldn't exist. 2016 is going to be a very exciting year.

News and articles from around the web

Registration is now open for EuroLLVM 2016. The conference will be held in Barcelona on March 17th-18th. The call for papers closes on January 25th.

Registration is open for the Clang/LLVM development sprint to be held on the weekend of Feb 6th/7th at Bloomberg's London and New York offices.

The next Cambridge LLVM social will be held on Wednesday 20th January at 7.30pm, and will be colocated with the FreeBSD social.

On the mailing lists

LLVM commits

  • The ORC JIT API now supports remote JITing over an RPC interface to a separate process. The LLI tool has been updated to use this interface. r257305, r257343.

  • The Hexagon backend gained a target-independent SSA-based data flow framework for representing data flow between physical registers and passes using this to implement register liveness analysis, dead code elimination, and copy propagation. r257447, r257480, r257485, r257490.

  • The documentation on committing code reviewed on Phabricator to trunk has been improved. r257764.

  • WebAssembly gained a prototype instruction encoder and disassembler based on a temporary binary format. r257440.

  • LLVM's MathExtras gained a SaturatingMultiplyAdd helper. r257352.

  • llvm-readobj has much-expanded support for dumping CodeView debug info. r257658.

  • The code that finds code sequences implementing bswap or bitreverse and emits the appropriate intrinsic has been rewritten. r257875.

  • The AMDGPU backend gained a new machine scheduler for the Southern Islands architecture. r257609.
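
To illustrate what a saturating multiply-add helper like the one mentioned above does, here is a minimal sketch for 32-bit unsigned values. The function name and signature are assumptions for illustration and are not copied from MathExtras: it computes x * y + a and clamps to the maximum value instead of wrapping around.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Computes x * y + a, clamping to UINT32_MAX on overflow. If overflowed is
 * non-NULL, it is set to whether clamping occurred. The 64-bit intermediate
 * cannot itself overflow for 32-bit inputs. */
uint32_t saturating_multiply_add(uint32_t x, uint32_t y, uint32_t a,
                                 bool *overflowed) {
    uint64_t wide = (uint64_t)x * y + a;
    bool saturated = wide > UINT32_MAX;
    if (overflowed != NULL)
        *overflowed = saturated;
    return saturated ? UINT32_MAX : (uint32_t)wide;
}
```

Widening to 64 bits keeps the sketch short; a generic version over arbitrary integer widths would need explicit overflow checks instead.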

Clang commits

  • A Python implementation of scan-build has been added. r257533.

  • The 'interrupt' attribute is now supported on x86. r257867.

  • Clang learned to respond to the -fsanitize-stats flag. It can currently only be used with control-flow integrity and allows statistics to be dumped. r257971.

Other project commits

  • The compiler-rt CMake buildsystem gained experimental support for tvOS and watchOS. r257544.

  • Initial support for PPC was added to the new ELF linker. r257374.

  • The CMake and Lit runners in the LLVM test-suite can now support the integer C and C++ tests from SPEC CPU2006. r257370.

by Alex Bradbury ( at January 18, 2016 01:33 PM

January 11, 2016


LLVM Weekly - #106, Jan 11th 2016

Welcome to the one hundred and sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

Many readers may be interested that last week was the 3rd RISC-V Workshop. You can find slides from the two lowRISC talks here and here. You may also want to read my liveblog of the event.

News and articles from around the web

The BSD Now podcast recently interviewed Alex Rosenberg about his work on LLVM/Clang and FreeBSD.

The folks at QuarksLab have shared a Clang hardening cheat sheet.

LLDB 3.8 will feature initial Go debugging support.

The next Paris LLVM Social will be held on January 27th and includes a talk from John Regehr.

The next Zurich LLVM Social will be taking place on January 14th.

On the mailing lists

LLVM commits

  • LLVM gained the -print-funcs option which can be used to filter IR printing to only certain functions. r256952.

  • The LLVM ADT library gained a new sum type abstraction for pointer-like types and an abstraction for embedding an integer within a pointer-like type. r257282, r257284.

  • LLVM now recognises the Samsung Exynos M1 core. r256828.

  • InstCombine learned to expose more constants when comparing getelementptrs (GEPs) by detecting when both GEPs could be expressed as GEPs with the same base pointer. r257064.

  • SelectionDAGBuilder will set NoUnsignedWrap for an inbounds getelementptr and for load/store offsets. r256890.

  • AArch64 MachineCombine will now allow fadd and fmul instructions to be reassociated. r257024.

  • Macro emission in DWARFv4 is now supported. r257060.

  • llvm-symbolizer gained the -print-source-context-lines option to print source code around the line. r257326.
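
The ADT item above mentions a sum type over pointer-like types. The general trick behind such abstractions can be sketched in C as a tagged pointer: alignment guarantees that the low bits of a pointer are zero, so they can carry a small discriminator. All names below are hypothetical, and this is not the LLVM implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Which kind of pointer is stored (hypothetical two-member sum type). */
enum ptr_kind { KIND_INT = 0, KIND_DOUBLE = 1 };

typedef struct {
    uintptr_t bits; /* pointer value with the kind packed into bit 0 */
} ptr_sum;

/* Pack a pointer and its kind; requires at least 2-byte alignment. */
ptr_sum ptr_sum_make(void *p, enum ptr_kind kind) {
    uintptr_t raw = (uintptr_t)p;
    assert((raw & 1u) == 0 && "pointer must leave bit 0 free");
    ptr_sum s = { raw | (uintptr_t)kind };
    return s;
}

/* Recover the discriminator from the low bit. */
enum ptr_kind ptr_sum_kind(ptr_sum s) {
    return (enum ptr_kind)(s.bits & 1u);
}

/* Recover the original pointer by masking the tag bit off. */
void *ptr_sum_ptr(ptr_sum s) {
    return (void *)(s.bits & ~(uintptr_t)1);
}
```

The result is a value the size of a single pointer that still records which alternative it holds.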

Clang commits

  • Clang's CMake build system can now perform a multi-stage bootstrap build with profile-guided optimisation. r256873.

  • Clang's command line frontend learned to handle a whole bunch of -fno-builtin-* arguments. r256937.

  • The new ELF LLD linker will now be used for the AMDGPU target. r257175.

Other project commits

  • The performance of string table construction in the LLD ELF linker has been improved. This improves link time of lld by 12% from 3.50 seconds to 3.08 seconds. r257017.

  • The LLD ELF linker gained support for the AMDGPU target. r257023.

by Alex Bradbury ( at January 11, 2016 01:15 PM

January 04, 2016


LLVM Weekly - #105, Jan 4th 2016

Welcome to the one hundred and fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

Happy new year! This issue marks the second anniversary of LLVM Weekly. It's rather short as the past week has been very quiet, with most LLVM developers seemingly taking a break over the holidays. My colleague Wei Song and I will be presenting about lowRISC at the 3rd RISC-V workshop on Wednesday this week. Do say hi if you're going to be there.

The canonical home for this issue can be found here at

News and articles from around the web

Sanjoy Das has written a blog post about issues with LLVM's undef value. Interestingly, he provides an example where undef can actually inhibit optimisations.

On the mailing lists

LLVM commits

  • The -align-all-loops and -align-all-functions arguments have been introduced to force function or loop alignments for testing purposes. r256571.

  • The x86 backend has added intrinsics for reading and writing to the flags register. r256685.

Clang commits

  • Various Clang classes have been converted to use the TrailingObjects helper. r256658, r256659, and more.

  • __readeflags and __writeeflags intrinsics are exposed in Clang. r256686.

Other project commits

  • In libcxx, undefined behaviour in <list> has been fixed for builtin pointer types and support added for the next ABI version. r256652.

by Alex Bradbury ( at January 04, 2016 02:07 PM

Philip Reames

A perspective on friendly C

I was talking about John Regehr’s Friendly C proposal and recent follow-on post tonight with a friend, and decided to jot down some thoughts in a sharable format.

I believe the idea of a friendly C variant is entirely feasible, but it poses an incredibly challenging design problem.  Every change considered needs to be validated against a deep knowledge of the implementation of the associated compiler, runtime environment, and the underlying hardware.

As a simple example, let’s consider trying to establish semantics for stray (i.e. out of bounds) reads and writes.  We can start by trying to define what happens for a stray read.  That’s fairly easy, we can simply return an undefined value.  We could even be a bit more restrictive and say that the value must be one which is written to that address by some part of the program.

(The vagueness in that last bit is to allow concurrent execution reordering.  However, we accidentally required atomic reads and writes since we disallowed word tearing.  Is that a good thing or not?  There’s a cost to that, but maybe it’s a cost we’re willing to pay.  Or maybe not…)

Now let’s consider how to handle stray writes.  We could simply define them to be erroneous, but that simply gets us back to undefined behavior in C/C++.  We’re trying to avoid that.  We either need to detect them, or provide a reasonable semantics.  Detecting arbitrary stray writes is a very hard problem.  We can easily handle specific categories of stray writes through techniques like implicit null checking, but detecting an arbitrary stray write requires something like a full address-sanitizer (or possibly even more expensive checks).  I doubt anyone is willing to pay 2x performance for their C code to be more friendly.  If they were, why are they writing in C?

The challenge with having defined stray writes is what does a particular read return?  Does it return the last written value to a particular address?  Or the last value written to the particular field of the given object?  With out of bounds writes, these are not necessarily the same.

It’s very tempting to have the read return the last value written to the underlying address, but that introduces a huge problem.  In particular, it breaks essentially all load-load forwarding.

int foo(int* p_int, float* p_float) {
 int a = *p_int;
 *p_float = 0.0;
 return a - *p_int;
}

In the example above, your normal C compiler could return “0” because it assumes the intervening write can’t change the value at p_int.  An implementation of a friendly C variant with the semantics we’ve proposed could not.  In practice, this is probably unacceptable from a performance perspective; memory optimization (load-load forwarding and associated optimizations) is a huge part of what a normal C/C++ compiler does.  (see: BasicAA, MDA, GVN, DSE, EarlyCSE in LLVM)
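
To make the cost concrete, here is a sketch of how the "last value written to the address" semantics behave, emulated in standard C by routing the store through memcpy so that it is allowed to overwrite any object. The function name is hypothetical, and the sketch assumes that float and int have the same size and that the float value 0.0 is represented by all-zero bytes.

```c
#include <string.h>

/* Assumption for this sketch: a float store covers exactly one int. */
_Static_assert(sizeof(float) == sizeof(int), "sketch assumes equal sizes");

/* Like foo, but the store may legally alias *p_int, so the second load
 * of *p_int cannot be forwarded from the first. */
int foo_friendly(int *p_int, void *p_float) {
    int a = *p_int;
    float zero = 0.0f;
    memcpy(p_float, &zero, sizeof zero); /* store with address semantics */
    return a - *p_int;                   /* must reload *p_int */
}
```

When both arguments point at the same int holding 5, this version returns 5 rather than 0, which is exactly the outcome that prevents the compiler from folding the subtraction.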

If we want to avoid that problem, we could try to be more subtle in our definition.  Let’s say we instead defined a read as returning either the last value written to that field (i.e. in bounds write) or underlying memory address (i.e. stray write).  We still have the problem of requiring atomic memory access, but we seem to have allowed the compiler optimization we intended.

The problem with this definition is that we’ve introduced a huge amount of complexity to our language specification and compiler.  We now have to have separate definitions of our objects and of their underlying addresses, along with all the associated implementation machinery.

Another approach would be to define a read as returning either the last value written to the field (if no stray write has occurred to that address) or an undefined value (if a stray write to that address has occurred).  Is that friendly enough?

Moreover, what happens if we improve our ability to detect stray writes?  Are we allowed to make that write suddenly fail?  Is a program which functions only because of a stray write correct?

(Before you dismiss this as ridiculous, I personally know of an emergency software release that did nothing but reintroduce a particular stray memory write in a C++ program because it happened to restore behavior that a client had been relying on for many many years.)

Hopefully, I’ve given you a hint of the complexities inherent in any friendly C proposal.  These are the same complexities involved in designing any new language.  If anything, designing a workable friendly-C proposal is harder than designing a new language.  At least with a new language you’d have the freedom to change other aspects of the language to avoid having to deal with garbage pointers entirely; in practice, that’s often the much easier approach.


by reames at January 04, 2016 04:32 AM

December 28, 2015


LLVM Weekly - #104, Dec 28th 2015

Welcome to the one hundred and fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

The schedule for the LLVM devroom at FOSDEM has been published. This will be on January 30th 2016 in Brussels at FOSDEM.

Andy Finnell spent some time over the Christmas vacation porting the LLVM Kaleidoscope tutorial to Erlang and has kindly shared the fruits of his labours.

Richard Pennington has written another blog post about ELLCC, this time about using it to cross-compile the Linux kernel for the Raspberry Pi.

Tim Jones (lecturer at the University of Cambridge Computer Laboratory) has written about the alias analysis used in the HELIX compiler. There's nothing LLVM-specific here (indeed, it was implemented using ILDJIT), but it should be of general interest to compiler developers.

On the mailing lists

LLVM commits

  • An initial implementation of an LLVMCodeView library has landed. This implements support for emitting debug info in the CodeView format. r256385.

  • lit has gained support for a per-test timeout which can be set using --timeout=. r256471.

  • All uses of edge weights in BranchProbabilityInfo have been replaced with probabilities. r256263.

  • The LLVM project documentation on patch reviews via Phabricator now has advice on choosing reviewers. r256265.

  • The gc.statepoint intrinsic's return type is now a token type rather than i32. r256443.

Clang commits

  • ASTTemplateKWAndArgsInfo and ASTTemplateArgumentListInfo have been converted to use the TrailingObjects header. This abstracts away reinterpret_cast, pointer arithmetic, and size calculations needed for the case where a class has some other objects appended to the end of it. r256359.
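
The "objects appended to the end of a class" layout that TrailingObjects manages has a close analogue in C's flexible array members. The sketch below shows only that underlying layout idea, with hypothetical names; it is not how the Clang helper is implemented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Header followed directly in memory by num_args trailing ints. */
typedef struct {
    size_t num_args;
    int args[]; /* flexible array member: storage lives past the header */
} arg_list;

/* One allocation holds both the header and its trailing elements.
 * Returns NULL on allocation failure; the caller frees with free(). */
arg_list *arg_list_create(const int *args, size_t n) {
    assert(args != NULL || n == 0);
    arg_list *list = malloc(sizeof *list + n * sizeof list->args[0]);
    if (list == NULL)
        return NULL;
    list->num_args = n;
    if (n > 0)
        memcpy(list->args, args, n * sizeof list->args[0]);
    return list;
}
```

The size calculation and pointer arithmetic are here handled by the language; a helper such as TrailingObjects exists because C++ classes have no equivalent built in.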

Other project commits

  • Development of LLD's new ELF linker is continuing, with support for new relocations on x86, x86-64, and MIPS. r256143, r256144, r256172, r256416.

by Alex Bradbury ( at December 28, 2015 03:50 PM

December 21, 2015


LLVM Weekly - #103, Dec 21st 2015

Welcome to the one hundred and third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

Regular readers will know about lowRISC, a not-for-profit project a group of us founded aiming to produce a complete open-source System-on-Chip in volume. We've just hit a new milestone with the untethering of the base SoC. If you're interested in contributing, the blog post contains a number of potential starting points.

News and articles from around the web

The 6th EuroLLVM conference will be held on March 17th-18th in Barcelona, Spain. The call for papers is now open and will remain open until January 25th 2016. EuroLLVM CFP

Chandler Carruth's keynote, "Understanding compiler optimizations" from the Meeting C++ 2015 conference is now online.

Richard Pennington has blogged about bootstrapping LLVM and Clang using pre-compiled ELLCC binaries.

Bloomberg is going to be holding a weekend Clang and LLVM hackathon in NYC and in London on February 6th and 7th. The event will be open to everyone in the community and Bloomberg will provide space, power, food, beverages, and internet access. They're looking for experienced Clang and LLVM developers to help as mentors.

On the mailing lists

LLVM commits

  • LLVM IR now supports floating point atomic loads and stores. r255737.

  • New attributes have been introduced: InaccessibleMemOnly (a function may only access memory that is not accessible by the module being compiled) and InaccessibleMemOrArgMemOnly (a function may only access memory that is either not accessible by the module being compiled or is pointed to by its pointer arguments). r255778.

  • The PowerPC backend gained support for soft float operations on ppc32. r255516.

  • The terminatepad instruction has been removed from LLVM IR. r255522.

  • IR call instructions can now take a fast-math flags marker which indicates fast-math flags may allow otherwise unsafe optimisations. r255555.

  • LLVM gained a C++11 ThreadPool in its internal library. It is intended to be used for ThinLTO. r255593.

  • The default set of passes has been adjusted. mem2reg will not be run immediately after globalopt and more scalar optimization passes have been added to the LTO pipeline. r255634.

  • The llvm-profdata tool now supports specifying a weight when merging profile data. This can be used to give more relative importance to one of multiple profile runs. r255659.

  • For CMake builds, a compile_commands.json file will now be generated which tells tools like YouCompleteMe and clang_complete how to build each source file. r255789.

  • The Hexagon VLIW packetizer saw a large update (though unfortunately the changes aren't summarised in the commit message). r255807.

  • A number of LLVM's C APIs have been deprecated: LLVMParseBitcode, LLVMParseBitcodeInContext, LLVMGetBitcodeModuleInContext and LLVMGetBitcodeModule. These have been replaced with new versions of the functions which don't record a diagnostic. r256065.

  • The AVR backend (which is being imported incrementally) gained and r256120.
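
The weighted profile merging mentioned in the llvm-profdata item above amounts, conceptually, to scaling each run's counters before summing them. Here is a minimal sketch of that arithmetic. The function name and the flat counter array are assumptions for illustration, and the sketch does not guard against counter overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds weight * run[i] into total[i] for each of the n counters. */
void merge_weighted(uint64_t *total, const uint64_t *run, size_t n,
                    uint64_t weight) {
    assert(total != NULL && run != NULL);
    for (size_t i = 0; i < n; i++)
        total[i] += run[i] * weight;
}
```

Merging one run with weight 1 and another with weight 3 makes the second run count three times as much in the combined profile.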

Clang commits

  • A new checker has been introduced to detect excess padding in classes and structs. r255545.

  • A new control-flow integrity mode was introduced: cross-DSO CFI, which allows control flow to be protected across shared objects. It is currently marked experimental. r255694.

  • Clang's CMake build system now supports generating profile data for Clang. r255740, r256069.

Other project commits

  • It is now possible to suppress reports from UndefinedBehaviorSanitizer for certain files, functions, or modules at runtime. r256018.

  • The llvm test-suite's CMake+Lit runner gained support for SPEC2000 and SPEC CPU95. r255876, r255878.

by Alex Bradbury ( at December 21, 2015 10:13 PM

December 15, 2015


LLVM Weekly - #102, Dec 14th 2015

Welcome to the one hundred and second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

Version 1.5 of the Rust programming language has been released. Rust of course uses LLVM as its backend.

George Balatsouras has written a blog post on compiling a project using autotools to LLVM bitcode.

On the mailing lists

LLVM commits

  • A new minimum spanning tree based method of instrumenting code for profile-guided optimisation was added. This guarantees the minimum number of CFG edges are instrumented. r255132.

  • MatchBSwap in InstCombine will now also detect bit reversals. r255334.

  • Sample-based profile-guided optimisation memory usage has been reduced by 10x by changing from using a DenseMap for sample records to a std::map. r255389.

  • An Instruction::getFunction method was added. It's perhaps surprising this didn't exist before. r254975.

  • FP16 vector instructions defined in ARMv8.2-A are now supported. r255010.

  • The EarlyCSE (common subexpression elimination) pass learned to perform value forwarding for unordered atomics. r255054.

  • Debug info in LLVM IR can now refer to macros. r255245.

  • LLVM's developer policy has been updated to detail the currently accepted C API stability policy and other guidelines. r255300.

  • A massive rework of funclet-oriented exception handling (needed for Windows exceptions) has landed. r255422.
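
As an illustration of the kind of code sequence the MatchBSwap item above refers to, here is a byte swap written out by hand with shifts and masks. The function name is hypothetical. Whether a given compiler version actually turns this exact form into a single byte-swap operation is not something this sketch guarantees.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value using shifts and masks. */
uint32_t bswap32_manual(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}
```

Recognising such idioms lets portable source code get the same machine instruction as a target-specific intrinsic would.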

Clang commits

  • Clang gained an option to use the new ThinLTO pipeline. r254927.

  • Hexagon will use the integrated assembler by default. r255127.

  • dllexport and dllimport attributes are now exposed through the libclang API. r255273.

Other project commits

  • ThreadSanitizer gained initial support for PPC64. r255057.

by Alex Bradbury ( at December 15, 2015 09:41 AM

December 07, 2015


LLVM Weekly - #101, Dec 7th 2015

Welcome to the one hundred and first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback to, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at

News and articles from around the web

The implementation of the Swift programming language is now open source. Rather than being a simple code dump, development will now occur out in the open with external contributions encouraged. If you haven't already, now might be a good time to watch Joseph Groff and Chris Lattner's talk on the Swift Intermediate Language.

Rui Ueyama wrote about the new LLD ELF linker on the official LLVM blog.

The Visual C++ team have released Clang with Microsoft CodeGen. This uses the Clang parser along with the code generator and optimizer from Visual C++. The majority of the Clang and LLVM changes will be contributed back upstream.

Alex Denisov wrote about using the LLVM API with Swift.

If you haven't already submitted your talk proposal for the LLVM devroom at FOSDEM, you've now got a little more time. Get your submission in by this Friday.

On the mailing lists

LLVM commits

  • llc and opt gained an option to run all passes twice. This is intended to help show up bugs that occur when using the same pass manager to compile multiple modules. r254774.

  • An initial prototype for llvm-dwp has been committed. This will eventually be a tool for building a DWARF package file out of a number of .dwo split debug files. r254355.

  • All weight-based interfaces in MachineBasicBlock have now been replaced with probability-based interfaces. r254377.

  • LLVM's STLExtras gained a range-based version of std::any_of and std::find. r254391, r254390.

  • llvm.get.dynamic.area.offset.{i32,i64} intrinsics have been added. These can be used to get the address of the most recent dynamic alloca. r254404.

  • The X86 backend gained a new pass to reduce code size by removing redundant address recalculations for LEA. r254712.

  • The WebAssembly backend now has initial support for varargs. r254799.

Clang commits

  • Design docs have been added for forward-edge CFI for indirect calls. r254464.

  • The pass_object_size attribute was added to Clang. This is intended to be used to work around cases where __builtin_object_size doesn't function. r254554.

  • Documentation was added for UndefinedBehaviorSanitizer. r254733.

Other project commits

  • LLD now supports the R_MIPS_HI16/LO16 relocations. r254461.

  • libomp can now make use of libhwloc on Unix to discover topology of the host system. r254320.

by Alex Bradbury ( at December 07, 2015 11:52 AM

November 30, 2015


New ELF Linker from the LLVM Project

We have been working hard for a few months now to rewrite the ELF support in lld, the LLVM linker. We are happy to announce that it has reached a significant milestone: it is now able to bootstrap LLVM, Clang, and itself and pass all tests on x86-64 Linux and FreeBSD with the speed expected of an LLVM project.

ELF is the standard file format for executables on Unix-like systems, such as Linux and BSDs. GNU ld and GNU gold are commonly used linkers for such systems today. In many use cases, the linker is a black box for which only speed matters. Depending on program size, linking a program takes from tens of milliseconds to more than a minute. We designed the new linker so that it runs as fast as possible. Although no serious benchmarking or optimization has been conducted yet, it is consistently observed that the new lld links the LLVM/Clang/lld executables in about half the time of GNU gold. Generated executables are roughly the same size. lld is not at feature parity with gold yet, so it is too early to make a conclusion, but we are working hard to maintain or improve lld’s speed while adding more features.

lld is command-line compatible with GNU ld so that it can be used as a drop-in replacement. This does not necessarily mean that we are implementing all the features of the GNU linkers in the same way as they did. Some features are no longer relevant for modern Unix-like systems and can be removed. Some other features can be implemented in more efficient ways than those in the traditional linkers. Writing a new linker from scratch is a rare occasion. We take advantage of this opportunity to simplify the linker while keeping compatibility with the existing linkers for normal use.

The new ELF linker is a relatively small program which currently consists of about 7000 lines of C++ code. It is based on the same design as the PE/COFF (Windows) support in lld, so the design document for the PE/COFF support is directly applicable to the ELF support.

The older ELF support still exists in the lld repository in parallel with the new one. Please be careful not to confuse the two. They are separated at the top directory and do not share code. You can run the new linker with the ld.lld command or by passing -fuse-ld=lld to Clang when linking.

We are still working on implementing remaining functionality such as improved linker script support or improved support for architectures beyond x86_64. If you are interested in the new linker, try it out for yourself.
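
As a rough sketch of how a build script might opt in to the new linker only when it is installed, the helper below prints the Clang flag mentioned above if an ld.lld executable is found on the PATH. The function name lld_flag and the file hello.c are placeholders chosen for illustration; they are not part of lld or Clang.

```shell
#!/bin/sh
# Hypothetical helper (not part of lld or Clang): print the Clang flag that
# selects lld if an ld.lld executable is on PATH, otherwise print nothing
# so that the default system linker is used.
lld_flag() {
  if command -v ld.lld >/dev/null 2>&1; then
    printf '%s\n' "-fuse-ld=lld"
  fi
}

# Example use; hello.c is a placeholder source file:
#   clang $(lld_flag) hello.c -o hello
```

Falling back silently keeps the same build command usable on machines where only GNU ld or gold is available.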

by Rui Ueyama ( at November 30, 2015 04:29 PM

LLVM Weekly - #100, Nov 30th 2015

Welcome to the one hundredth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues and pass it on to anyone else you think may be interested. Please send any tips or feedback to @llvmweekly or @asbradbury on Twitter.

Eagle-eyed readers will note we've now reached issue 100, marking 100 weeks of uninterrupted service and of course meaning there's just 28 weeks to go until an important numerical milestone.

News and articles from around the web

There is going to be an LLVM Devroom at FOSDEM next year and the call for proposals closes on December 1st. Get your submissions in!

Most slides from the recent LLVM in HPC workshop have now been posted.

Jeff Trull has posted a great blog post on fuzzing C++ code with AFL and libFuzzer.

On the mailing lists

LLVM commits

  • A number of patches related to ARMv8.2-A have landed. Public documentation doesn't seem to have been released for this architecture revision, but the patches indicate some of the new features including: persistent memory instruction and FP16 instructions. You can see the patches still in review here. r254156, r254198.

  • A series of helper functions from SelectionDAGNodes have been exposed (isNullConstant, isNullFPConstant, isAllOnesConstant, isOneConstant). These helpers can help simplify code in your target's ISelLowering. r254085.

  • The WebAssembly backend's block placement algorithm has been improved. r253876.

  • Tests generated from utils/ are now marked as autogenerated. r253917.

Clang commits

  • DataRecursiveASTVisitor has been removed, and RecursiveASTVisitor can be used in its place. This resulted in the removal of 2912 lines of code. r253948.

  • Sparc and SparcV9 default to using an external assembler again. r254199.

  • Functions with the interrupt attribute are now supported for mips32r2+. r254205.

Other project commits

  • A single DataFlowSanitizer or ThreadSanitizer-instrumented binary can now run on both 39-bit virtual address space and 42-bit virtual address space AArch64 platforms. r254151, r254197.

  • lldb gained a for generating bindings. r254022.

by Alex Bradbury ( at November 30, 2015 12:41 PM

November 23, 2015


LLVM Weekly - #99, Nov 23rd 2015

Welcome to the ninety-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues and pass it on to anyone else you think may be interested. Please send any tips or feedback to @llvmweekly or @asbradbury on Twitter.

News and articles from around the web

LLVM/Clang 3.7.1-rc2 has been tagged. As always, help testing is appreciated.

Clasp 0.4 has been released. Clasp is a new Common Lisp implementation that uses LLVM as a compiler backend and aims to offer seamless C++ interoperation.

On the mailing lists

LLVM commits

  • Initial support for value profiling landed. r253484.

  • It is now possible to use the -force-attribute command-line option for specifying a function attribute for a particular function (e.g. norecurse, noinline etc). This should be very useful for testing. r253550.

  • The WebAssembly backend gained initial prototype passes for register coloring (on its virtual registers) and register stackifying. r253217, r253465.

  • The built-in assembler now treats fatal errors as non-fatal in order to report all errors in a file rather than just the first one encountered. r253328.

  • As discussed on the mailing list last week, lane masks are now always precise. r253279.

  • Support for prelinking has been dropped. See the commit message for a full rationale. r253280.

  • llvm-lto can now be used to emit assembly rather than object code. r253622, r253624.

Clang commits

  • Clang should now be usable for CUDA compilation out of the box. r253389.

  • When giving the -mcpu/-march options to Clang targeting ARM, you can now specify +feature. r253471.

Other project commits

  • Compiler-rt gained support for value profiling. r253483.

  • The 'new ELF linker' is now the default ELF linker in lld. r253318.

  • The LLVM test suite gained support for running SPEC2000int and SPEC2006int+fp with PGO and reference inputs. r253362.

by Alex Bradbury ( at November 23, 2015 02:20 PM

November 16, 2015


LLVM Weekly - #98, Nov 16th 2015

Welcome to the ninety-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues and pass it on to anyone else you think may be interested. Please send any tips or feedback to @llvmweekly or @asbradbury on Twitter.

This week's issue comes to you from Vienna where I'm just about to head home from a short break (so apologies if it's a little later than usual and perhaps a little less detailed). I'll admit that nobody has actually written in to beg that LLVM Weekly share travel tips, but I will say that Vienna is a beautiful city that's provided lots to do over the past few days. If you're visiting, I can strongly recommend Salm Bräu for good beer and food.

News and articles from around the web

All of the LLVM Dev Meeting Videos are now up, and will stay up. This includes Chris Lattner and Joseph Groff's talk on Swift's IR. You can also find most of the slides here. The folks at Quarkslab have also posted a trip report.

The big news this week is that code derived from NVIDIA's PGI Fortran compiler is to be open-sourced and a production-grade Fortran front-end to LLVM produced. This project is a collaboration between the US NNSA (National Nuclear Security Administration), NVIDIA, and the Lawrence Livermore, Sandia, and Los Alamos national laboratories. Hal Finkel has shared a little more on the LLVM mailing list. With a source code release not due for about another year, where does this leave the existing Flang efforts? The hope is that parts of Flang will be merged with the PGI release. Douglas Miles from the PGI team has also shared a mini-FAQ.

Bjarne Stroustrup has shared a detailed trip report from the last C++ Standards Meeting.

This post over at the Include Security Blog delves into some details of support for the SafeStack buffer overflow protection in LLVM.

At the official LLVM blog, a new post gives a very useful guide on how to reduce your testcases using bugpoint and custom scripts. As the post notes, bugpoint is a very powerful tool but can be difficult to use.

On the mailing lists

LLVM commits

  • LLVM's autoconf-based build system is now officially deprecated, with the CMake build system being preferred. r252520.

  • Do you want to compile CUDA code with Clang and LLVM? There's now some handy documentation describing how to do so. See also Jingyue's talk from the recent LLVM Dev Meeting. r252660.

  • A simple MachineInstruction SSA pass for PowerPC has been added. The implementation is short and straightforward, so worth a read if you want to do some MI-level peephole optimisations for your target. r252651.

  • Basic support for AArch64's address tagging has been added. In AArch64, the top 8 bits of an address can be used to store extra metadata with these bits being masked out before going through address translation. r252573.

  • The Hexagon backend now supports assembly parsing. r252443.

  • The CMake build system gained a new LLVMExternalProjectUtils module. As an example, this is used with the LLVM test suite which can be set up to be rebuilt whenever the in-tree clang or lld change. This could also be used with compiler-rt or libcxx. r252747.

  • An 'empty token' is now defined (written as token empty) for when using tokens in LLVM IR. r252811.

  • LibFuzzer gained a new experimental search heuristic, drill. As the comment in FuzzerLoop.cpp explains, this will 1) read+shuffle+execute+minimize the corpus, 2) choose a random unit, 3) reset the coverage, 4) start fuzzing as if the chosen unit was the only element of the corpus, 5) reset the coverage again when done, 6) merge the newly created corpus into the original one. r252838.

  • A BITREVERSE SelectionDAG node and a set of llvm.bitreverse.* intrinsics have been introduced. The intention is that backends should no longer have to reimplement similar code to match instruction patterns to their own ISA's bitreverse instruction. See also the patch to the ARM backend that replaces ARMISD::RBIT with ISD::BITREVERSE. r252878, r253047.

Clang commits

  • Support for __attribute__((internal_linkage)) was added. This is much like C's static keyword, but applies to C++ class methods. r252648.

  • Clang now supports GCC's __auto_type extension, with a few minor enhancements. r252690.

Other project commits

  • libcxx gained initial support for building with musl libc. Primarily this is a new CMake option, necessary as musl doesn't provide a macro to indicate its presence. r252457.

by Alex Bradbury ( at November 16, 2015 12:18 PM

November 13, 2015


Reduce Your Testcases with Bugpoint and Custom Scripts

LLVM provides many useful command line tools to handle bitcode: opt is the most widely known and is used to run individual passes on an IR module, and llc invokes the backend to generate an assembly or object file from an IR module. Less known but very powerful is bugpoint, the automatic test case reduction tool, that should be part of every developer's toolbox.

The bugpoint tool helps to reduce an input IR file while preserving some interesting behavior, usually a compiler crash or a miscompile. Multiple strategies are involved in the reduction of the test case (shuffling instructions, modifying the control flow, etc.), but because it is oblivious to the LLVM passes and the individual backend specifics, "it may appear to do stupid things or miss obvious simplifications", as stated in the official description. The documentation gives some insight into the strategies that bugpoint can employ, but the details are beyond the scope of this post.

Read on to learn how you can use the power of bugpoint to solve some non-obvious problems.

Bugpoint Interface Considered Harmful

Bugpoint is a powerful tool to reduce your test case, but its interface can lead to frustration (as stated in the documentation: "bugpoint can be a remarkably useful tool, but it sometimes works in non-obvious ways"). One of the main issues seems to be that bugpoint is, ironically, too advanced! It operates in three modes and switches automatically among them to solve different kinds of problems: crash, miscompilation, or code generation (see the documentation for more information on these modes). However, it is not always obvious beforehand which mode will be activated and which strategy bugpoint is actually using.

I found that for most of my uses, I don't want the advanced bugpoint features that deal with pass ordering, for example, and I don't need bugpoint to detect which mode to operate in and switch automatically. For most of my usage, the `compile-custom` option is perfectly adequate: similar to `git bisect`, it allows you to provide a script to bugpoint. This script is a black box for bugpoint: it needs to accept a single argument (the bitcode file to process) and needs to return 0 if the bitcode does not exhibit the behavior you're interested in, or a non-zero value otherwise. Bugpoint will apply multiple strategies in order to reduce the test case, and will call your custom script after each transformation to check whether the behavior you're looking for is still exhibited. The invocation for bugpoint is the following:

$ ./bin/bugpoint -compile-custom -compile-command=./ -opt-command=./bin/opt my_test_case.ll

The important part is the two options -compile-custom and -compile-command, which tell bugpoint to use your own script to process the file. The other important part is the -opt-command option, which should point to the opt binary that will be used to reduce the test case. By default, bugpoint searches the path for opt and may pick up an old system version that can't process your IR properly, leading to some curious error messages:

*** Debugging code generator crash!
Checking for crash with only these blocks:  diamond .preheader .end: error: Invalid type for value
simplifycfg failed!

Given such a script, running it with your original test case this way:

$ ./ my_test_case.ll && echo "NON-INTERESTING" || echo "INTERESTING"

should display INTERESTING before you try to use it with bugpoint, or you may very well be surprised. In fact, bugpoint considers the script as a compile command. If you start with a NON-INTERESTING test case and feed it to bugpoint, it will assume that the code compiles correctly, and will try to assemble it, link it, and execute it to get a reference result. This is where bugpoint's behavior can be confusing: it automatically switches mode, leaving the user with a puzzling trace. A correct invocation should lead to a trace such as:

./bin/bugpoint  -compile-custom  -compile-command=./  -opt-command=./bin/opt slp.ll 
Read input file      : 'slp.ll'
*** All input ok
Initializing execution environment: Found command in: ./
Running the code generator to test for a crash: 
Error running tool:
  ./ bugpoint-test-program-1aa0e1d.bc
*** Debugging code generator crash!

Checking to see if we can delete global inits: <crash>

*** Able to remove all global initializers!
Checking for crash with only these blocks:    .lr.ph6.preheader .preheader .backedge  ._crit_edge.loopexit... <11 total>: <crash>
Checking for crash with only these blocks: .preheader .backedge .lr.ph6.preheader: 
Checking for crash with only these blocks: ._crit_edge: 
Checking instruction:   store i8 %16, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 15), align 1, !tbaa !2

*** Attempting to perform final cleanups: <crash>
Emitted bitcode to 'bugpoint-reduced-simplified.bc'
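
The contract described above (one argument, exit status 0 when the behavior is gone, non-zero when it is still there) can be sketched as a small skeleton. This is only an illustration: behavior_present, check and classify are names made up for this sketch, and the grep for "store" is a placeholder predicate so that it runs without LLVM. A real script would invoke opt or llc there.

```shell
#!/bin/sh
# Sketch of a -compile-custom script skeleton. behavior_present is a
# hypothetical placeholder predicate: it greps the input for "store" so the
# sketch is runnable without LLVM; in practice it would run opt or llc.
behavior_present() {
  grep -q "store" "$1"
}

# bugpoint's convention: status 0 means the behavior is gone,
# non-zero means the input is still interesting.
check() {
  if behavior_present "$1"; then
    return 1
  fi
  return 0
}

# Mirror of the manual sanity check: print a verdict instead of a status.
classify() {
  if check "$1"; then
    echo "NON-INTERESTING"
  else
    echo "INTERESTING"
  fi
}

# When invoked with a file argument, report through the exit status.
if [ "$#" -gt 0 ]; then
  check "$1"
  exit $?
fi
```

Keeping the predicate in its own function means only that one function changes from one reduction to the next, while the exit-status convention stays in one place.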

In practice the ability to write a custom script is very powerful. I will go over a few cases where I recently used bugpoint this way.

Search For a String in the Output

I recently submitted a patch for a case where the loop vectorizer didn't kick in on a quite simple test case. After fixing the underlying issue I needed to submit a test with my patch. The original IR was a few hundred lines. Since I believe it is good practice to reduce test cases as much as possible, bugpoint is often my best friend. In this case the analysis result indicates "Memory dependences are safe with run-time checks" on the output after my patch.

Having compiled `opt` with and without my patch and copied each version into `/tmp/`, I wrote this shell script:


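#!/bin/bash
# grep exits with 0 when the string is found and 1 when it is absent.
/tmp/opt.original -loop-accesses -analyze "$1" | grep "Memory dependences are safe"
res_original=$?
/tmp/opt.patched -loop-accesses -analyze "$1" | grep "Memory dependences are safe"
res_patched=$?
[[ $res_original == 1 && $res_patched == 0 ]] && exit 1
exit 0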

It first runs the bitcode supplied as argument to the script (the $1 above) through opt and uses grep to check for the presence of the expected string in the output. When grep exits, $? contains 1 if the string is not present in the output and 0 if it is. The reduced test case is valid if the original opt didn't produce the expected analysis but the new opt did.
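
The exit-status logic this script relies on can be tried in isolation. In the sketch below, printf stands in for the two opt binaries and the non-matching text is made-up sample output, so nothing from LLVM is needed. Note that the status of a pipeline is the status of its last command, here grep.

```shell
#!/bin/sh
# printf stands in for "opt -loop-accesses -analyze"; the first string is
# made-up sample output, the second is the message quoted in the post.
pattern="Memory dependences are safe"

# The status of a pipeline is the status of its last command (grep).
res_original=0
printf 'unrelated analysis output\n' | grep -q "$pattern" || res_original=$?

res_patched=0
printf 'Memory dependences are safe with run-time checks\n' | grep -q "$pattern" || res_patched=$?

# Same decision as in the script: interesting only if the string shows up
# with the patched tool but not with the original one.
if [ "$res_original" -eq 1 ] && [ "$res_patched" -eq 0 ]; then
  verdict="INTERESTING"
else
  verdict="NON-INTERESTING"
fi
echo "$verdict"
```

One consequence of looking only at grep's status is that a crash in opt is indistinguishable from "string not found", which is worth keeping in mind when reading the reduced result.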

Reduce While a Transformation Makes Effects

In another case, I patched the SLP vectorizer and I wanted to reduce the test case so that it didn't vectorize before my changes but did after:

set -e

/tmp/opt.original -slp-vectorizer -S > /tmp/original.ll $1
/tmp/opt.patched -slp-vectorizer -S > /tmp/patched.ll $1
diff /tmp/original.ll /tmp/patched.ll && exit 0
exit 1

The use of a custom script offers flexibility and lets you run arbitrarily complex logic to decide whether a reduction is valid. I have used it in the past to reduce crashes on a specific assertion while preventing the reduction from drifting to a different crash, and to reduce test cases when tracking instruction count regressions or any other metric.
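
For the "specific assertion" case, one possible approach is to accept a reduction only if the tool still fails and its output still contains the message you are chasing. The helper below is a sketch under that assumption. fails_with_message is a name invented here, and the assertion text in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper, not part of LLVM: succeeds only if the given command
# fails AND its combined stdout/stderr contains the expected message.
fails_with_message() {
  message=$1
  shift
  if output=$("$@" 2>&1); then
    return 1  # the command succeeded, so the crash is gone
  fi
  case "$output" in
    *"$message"*) return 0 ;;  # same failure as before
    *) return 1 ;;             # a different failure: reject this reduction
  esac
}

# Possible use inside a -compile-custom script (the assertion text is a
# placeholder for the one you are chasing):
#   fails_with_message "Assertion" ./bin/llc -o /dev/null "$1" && exit 1
#   exit 0
```

Matching on the message rather than on the exit status alone is what keeps bugpoint from "reducing" the test case into one that crashes for an unrelated reason.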

Just Use FileCheck

LLVM comes with a flexible pattern-matching file verifier (FileCheck) that the tests use intensively. You can annotate your original test case and write a script that reduces it for your patch. Let's take an example from the public LLVM repository with commit r252051 "[SimplifyCFG] Merge conditional stores". The associated test is test/Transforms/SimplifyCFG/merge-cond-stores.ll, and it already contains all the checks we need, so let's try to reduce it. For this purpose you'll need to process one function at a time, or bugpoint may not produce what you expect: because the check will fail for one function, bugpoint can apply any transformation to another function and the test would still be considered "interesting". Let's extract the function test_diamond_simple from the original file:

$ ./bin/llvm-extract -func=test_diamond_simple test/Transforms/SimplifyCFG/merge-cond-stores.ll -S > /tmp/my_test_case.ll

Then check out and compile opt for revisions r252050 and r252051, and copy the binaries to /tmp/opt.r252050 and /tmp/opt.r252051. The script is then based on the CHECK lines in the original test case:


#!/bin/bash

# Process the test before the patch and check with FileCheck,
# this is expected to fail.
/tmp/opt.r252050 -simplifycfg -instcombine -phi-node-folding-threshold=2 -S < "$1" | ./bin/FileCheck merge-cond-stores.ll
original=$?

# Process the test after the patch and check with FileCheck,
# this is expected to succeed.
/tmp/opt.r252051 -simplifycfg -instcombine -phi-node-folding-threshold=2 -S < "$1" | ./bin/FileCheck merge-cond-stores.ll
patched=$?

# The test is interesting if FileCheck failed before and
# succeeded after the patch.
[[ $original != 0 && $patched == 0 ]] && exit 1
exit 0

I intentionally selected a very well written test to show you both the power of bugpoint and its limitations. Look, for instance, at the function we just extracted into my_test_case.ll:

; CHECK-LABEL: @test_diamond_simple
; This should get if-converted.
; CHECK: store
; CHECK-NOT: store
; CHECK: ret
define i32 @test_diamond_simple(i32* %p, i32* %q, i32 %a, i32 %b) {
entry:
  %x1 = icmp eq i32 %a, 0
  br i1 %x1, label %no1, label %yes1

yes1:
  store i32 0, i32* %p
  br label %fallthrough

no1:
  %z1 = add i32 %a, %b
  br label %fallthrough

fallthrough:
  %z2 = phi i32 [ %z1, %no1 ], [ 0, %yes1 ]
  %x2 = icmp eq i32 %b, 0
  br i1 %x2, label %no2, label %yes2

yes2:
  store i32 1, i32* %p
  br label %end

no2:
  %z3 = sub i32 %z2, %b
  br label %end

end:
  %z4 = phi i32 [ %z3, %no2 ], [ 3, %yes2 ]
  ret i32 %z4
}

The transformation introduced in this patch makes it possible to merge the stores in the true branches yes1 and yes2:

declare void @f()

define i32 @test_diamond_simple(i32* %p, i32* %q, i32 %a, i32 %b) {
entry:
  %x1 = icmp eq i32 %a, 0
  %z1 = add i32 %a, %b
  %z2 = select i1 %x1, i32 %z1, i32 0
  %x2 = icmp eq i32 %b, 0
  %z3 = sub i32 %z2, %b
  %z4 = select i1 %x2, i32 %z3, i32 3
  %0 = or i32 %a, %b
  %1 = icmp eq i32 %0, 0
  br i1 %1, label %3, label %2

; <label>:2 ; preds = %entry
  %simplifycfg.merge = select i1 %x2, i32 %z2, i32 1
  store i32 %simplifycfg.merge, i32* %p, align 4
  br label %3

; <label>:3 ; preds = %entry, %2
  ret i32 %z4
}

The original code seems pretty minimal: the variable and block names are explicit, it is easy to follow, and you probably wouldn't think about reducing it. As an exercise, let's have a look at what bugpoint can do for us here:

define void @test_diamond_simple(i32* %p, i32 %b) {
entry:
  br i1 undef, label %fallthrough, label %yes1

yes1:                  ; preds = %entry
  store i32 0, i32* %p
  br label %fallthrough

fallthrough:           ; preds = %yes1, %entry
  %x2 = icmp eq i32 %b, 0
  br i1 %x2, label %end, label %yes2

yes2:                  ; preds = %fallthrough
  store i32 1, i32* %p
  br label %end

end:                   ; preds = %yes2, %fallthrough
  ret void
}

Bugpoint figured out that the no branches were useless for this test and removed them. The drawback is that bugpoint also has a tendency to introduce undef or unreachable here and there, which can make the test more fragile and harder to understand.  

Not There Yet: Manual Cleanup

At the end of the reduction, the test is small but probably not ready to be submitted with your patch "as is". Some cleanup is probably still needed: for instance, bugpoint won't convert invokes into calls or remove metadata, TBAA information, personality functions, etc. We also saw before that bugpoint can modify your test in unexpected ways, adding undef or unreachable. You will also probably want to rename the variables to end up with a readable test case.

Fortunately, having the script at hand is helpful in this process, since you can just manually modify your test and repeatedly run the same command:

$ ./ my_test_case.ll && echo "NON-INTERESTING" || echo "INTERESTING"

As long as the result is INTERESTING, you know the test is still valid and you can continue with your cleanup.

Keep in mind that bugpoint can do far more, but hopefully this subset will be helpful to those who are still struggling with its command line options.

Finally, I'm grateful to Manman Ren for her review of this post.

by Joker-eph ( at November 13, 2015 04:18 AM