Planet Clang

August 18, 2014

LLVM Blog

LLVM Weekly - #33, Aug 18th 2014

Welcome to the thirty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Binaries for LLVM/Clang 3.5RC2 are now available for testing. Try it on your codebases, and be sure to report any regressions.

LDC 0.14.0 has been released. LDC is an LLVM-based compiler for the D programming language. There's a mixture of new features and bug fixes; see the release notes for full details of what's changed.

Viva64, who sell the PVS-Studio static analyzer, have written up their experiences of using the Clang static analyzer on the PVS-Studio codebase. It managed to find 12 issues which the blog author considers genuine bugs.

On the mailing lists

LLVM commits

  • FastISel for AArch64 now makes use of the zero register when possible and supports more addressing modes. r215591, r215597.

  • MIPS gained support for the .ent, .end, .frame, .mask, and .fmask assembler directives. r215359.

  • ARM gained support for the MRS/MSR system instructions. r215700.

Clang commits

  • Documentation has been added describing how the Language option in .clang-format files works. r215443.

  • Prefetch intrinsics were added for ARM and AArch64. r215568, r215569.

  • The logic for the -include command line parameter is now properly implemented. r215433.

Other project commits

  • LLD now has initial support for ELF/AArch64. r215544.

  • UndefinedBehaviorSanitizer gained a returns-nonnull sanitizer. This verifies that functions annotated with returns_nonnull do return nonnull pointers. r215485.

  • A number of lldb tests now compile on Windows. r215562.

by Alex Bradbury (noreply@blogger.com) at August 18, 2014 12:07 PM

August 11, 2014

LLVM Blog

LLVM Weekly - #30, Jul 28th 2014

Welcome to the thirtieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Nuno Lopes, David Menendez, Santosh Nagarakatte, and John Regehr have written about ALIVe. This is a very promising tool that aims to aid the specification and proof of peephole optimisations (such as those currently found in LLVM's InstCombine). It uses an SMT solver in order to prove optimisations correct (and if incorrect, provides a counter-example).

Source and binaries for the first LLVM/Clang 3.5 Release Candidate are now available. If you like your LLVM releases to be on-time and regression-free, do your part and test them on your codebases.

Thomas Ströder and colleagues have recently published a paper "Proving Termination and Memory Safety for Programs with Pointer Arithmetic" which creates symbolic execution graphs from LLVM IR in order to perform its analysis. The preprint is available here.

The next Cambridge (UK) LLVM Social will be on the 30th July, at 7.30 pm.

On the mailing lists

LLVM commits

  • Support for scoped noalias metadata has been added. The motivation for this is to preserve noalias function attribute information when inlining and to model block-scope C99 restrict pointers. r213864, r213948, r213949.

  • The llvm-vtabledump tool is born. This will dump vtables inside object files. Right now it only supports MS ABI, but will in the future support Itanium ABI vtables as well. r213903.

  • The llvm.assume intrinsic has been added. This can be used to provide the optimizer with a condition it may assume to be true; a small IR sketch follows at the end of this list. r213973.

  • The loop vectorizer has been extended to make use of the alias analysis infrastructure. r213486.

  • Various additions have been made to support the PowerPC ELFv2 ABI. r213489, r213490, and more.

  • The R600 backend gained an instruction shrinking pass, which will convert 64-bit instructions to 32-bit when possible. r213561.

  • The llvm.loop.vectorize.unroll metadata has been renamed to llvm.loop.interleave.count. r213588.

  • LLVM 3.5 release notes for MIPS have been committed, if you're interested in seeing a summary of work in the last development cycle. r213749.

  • The IR backward compatibility policy is now documented. r213813.
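
Following up on the llvm.assume item above, here is a minimal sketch of how the intrinsic might be used. The function and the condition are invented for illustration; only the intrinsic itself comes from the commit.

declare void @llvm.assume(i1)

define i32 @halve_positive(i32 %x) {
entry:
  %positive = icmp sgt i32 %x, 0
  ; the optimizer may now assume %x > 0 on any path that executes this call
  call void @llvm.assume(i1 %positive)
  %halved = sdiv i32 %x, 2
  ret i32 %halved
}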

Clang commits

  • Support for #pragma unroll was added. r213574.

  • Clang learned a range of AVX-512 intrinsics. r213641.

  • Work on MS ABI support continues. r214004.

Other project commits

  • A dynamic loader for the Hexagon DSP was committed to lldb as well as an ABI description. r213565, r213566.

  • A new fast-path implementation of C++ demangling has been added to lldb. It promises significantly better performance. r213671.

by Alex Bradbury (noreply@blogger.com) at August 11, 2014 11:16 AM

LLVM Weekly - #31, Aug 4th 2014

Welcome to the thirty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Renato Golin has posted a summary of feedback from his talk on LLVM and GCC collaboration at the GNU Tools Cauldron. This both summarises the main areas he's looking for collaboration, and the feedback that people gave at the event or afterwards.

This blog post describes how to use Obfuscator-LLVM to obfuscate Android NDK binaries.

On the mailing lists

LLVM commits

  • FastISel for AArch64 saw a number of improvements, including support for shift-immediate and the arithmetic-with-overflow intrinsics. r214345, r214348, and more.

  • The SLPVectorizer has seen a largeish commit that implements an "improved scheduling algorithm". Sadly the commit message offers no further details. r214494.

  • TargetInstrInfo gained isAsCheapAsMove which takes a MachineInstruction and returns true if that instruction is as cheap as a move instruction. r214158.

  • LLVM libraries can now be exported as importable CMake targets, making it easier for those building LLVM-based projects. This is now documented. r214077.

  • Release notes for PowerPC changes during 3.5 development have been committed. r214403.

  • Initial work towards supporting debug locations for fragmented variables (e.g. by-value struct arguments passed in registers) has been committed. r214576.

Clang commits

  • Work on support for the MSVC ABI continues. Clang will now consider required alignment constraints on fields. r214274.

  • AddressSanitizer now passes source-level information from Clang to ASan using metadata rather than by creating global variables. r214604.

  • The PowerPC backend now supports selection of the ELFv1/ELFv2 ABI via the -mabi= option. r214074.

Other project commits

  • lld gained support for interworking between Thumb and ARM code in Mach-O binaries. r214140.

  • A massive ABI testsuite (contributed by Sony) has been committed to the test-suite repo. r214126.

by Alex Bradbury (noreply@blogger.com) at August 11, 2014 11:16 AM

LLVM Weekly - #32, Aug 11th 2014

Welcome to the thirty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

Some readers may be interested to know that lowRISC, a project to produce a fully open-source SoC started by a number of us at the University of Cambridge Computer Lab, has been announced. We are hiring.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Codeplay contributed the LLDB MI (Machine Interface) frontend a while ago, and have now committed some additional features. To coincide with that, they've published a series of blog posts covering the MI driver's implementation, how to set it up from within Eclipse, and how to add support for new MI commands.

McSema, a framework for transforming x86 programs to LLVM bitcode, has now been open-sourced. The talk about McSema from the ReCON conference is also now online.

Registration for the LLVM Developer's Meeting 2014 is now open. The event will take place in San Jose on October 28th-29th. You have until September 1st to submit your talk/BoF/poster/tutorial proposal.

On the mailing lists

LLVM commits

  • Initial work on the MachineCombiner pass landed. This estimates critical path length of the original instruction sequence vs a transformed (combined) instruction sequence and chooses the faster code. An example given in the commit message is choosing between add+mul vs madd on AArch64, and a followup commit implements MachineCombiner for this target. r214666, r214669.

  • A few useful helper functions were added to the LLVM C API: LLVM{IsConstantString, GetAsString, GetElementAsConstant}. r214976.

  • A whole load of AVX512 instructions were added. r214719.

  • FastISel for AArch64 now supports basic argument lowering. r214846.

  • A flag has been added to experiment with running the loop vectorizer before the SLP vectorizer. According to the commit message, eventually this should be the default. r214963.

  • The old JIT is almost dead: it has been removed (for those not paying close attention, 3.5 has already been branched and so still contains the old JIT). However, the patch was then reverted, so it's in zombie status. r215111.

  • AArch64 gained a load balancing pass for the Cortex-A57, which tries to make maximum use of available resources by balancing use of even and odd FP registers. r215199.

Clang commits

  • Thread safety analysis gained support for negative requirements to be specified. r214725.

  • Coverage mapping generation has been committed. The -fcoverage-mapping command line option can be used to generate coverage mapping information, which can then be combined with execution counts from instrumentation-based profiling to perform code coverage analysis. r214752.

  • A command line option was added to limit the alignment that the compiler can assume for an arbitrary pointer. r214911.

Other project commits

  • LLDB's FileSpec class learned to understand Windows paths. r215123.

  • LLDB learned a whole bunch of new commands and features for its Machine Interface. r215223.

  • OpenMP gained PowerPC64 support. r215093.

by Alex Bradbury (noreply@blogger.com) at August 11, 2014 11:15 AM

Sylvestre Ledru

clang 3.4, 3.5 and 3.6 are now coinstallable in Debian

Clang is finally co-installable on Debian. 3.4, 3.5 and the current trunk (snapshot) can be installed together.

So, just like gcc, the different versions can be called with clang-3.4, clang-3.5 or clang-3.6.

/usr/bin/clang, /usr/bin/clang++, /usr/bin/scan-build and /usr/bin/scan-view are now handled through the llvm-defaults package.

llvm-defaults is also now managing clang-check, clang-tblgen, c-index-test, clang-apply-replacements, clang-tidy, pp-trace and clang-query.

Changes are also available on llvm.org/apt/.
The next step will be to also manage llvm-defaults on llvm.org/apt to simplify the transition for people using these packages.

So, with:

# /etc/apt/sources.list
deb http://llvm.org/apt/unstable/ llvm-toolchain main
deb http://llvm.org/apt/unstable/ llvm-toolchain-3.4 main
deb http://llvm.org/apt/unstable/ llvm-toolchain-3.5 main
$ apt-get install clang-3.4 clang-3.5 clang-3.6

$ clang-3.4 --version
Debian clang version 3.4.2 (branches/release_34) (based on LLVM 3.4.2)
Target: x86_64-pc-linux-gnu
Thread model: posix


$ clang-3.5 --version
Debian clang version 3.5.0-+rc2-1~exp1 (tags/RELEASE_350/rc2) (based on LLVM 3.5.0)
Target: x86_64-pc-linux-gnu
Thread model: posix


$ clang-3.6 --version
Debian clang version 3.6.0-svn214990-1~exp1 (trunk) (based on LLVM 3.6.0)
Target: x86_64-pc-linux-gnu
Thread model: posix

by Sylvestre at August 11, 2014 05:47 AM

July 21, 2014

LLVM Blog

LLVM Weekly - #29, Jul 21st 2014

Welcome to the twenty-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

This is a special extended issue which I'm choosing to subtitle "LLVM Weekly visits the GNU Tools Cauldron". The event took place over the weekend and had a wide range of interesting talks. You can find my notes at the end of this newsletter. Talks were recorded and the videos should be made available in the next month or two.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The eighth annual LLVM Developers meeting has been announced and will take place on October 28th and 29th in San Jose, CA. It is looking for sponsors and talk/poster submissions.

A new blog post has been published on the LLVM Blog giving more details on FTL: WebKit's LLVM-based JIT.

A tentative schedule for the release of LLVM/Clang 3.5 has been posted.

Botond Ballo has posted a summary of June's C++ Standards Committee Meeting.

On the mailing lists

LLVM commits

  • A dereferenceable attribute was added. This indicates that the parameter or return pointer is dereferenceable (i.e. can be loaded from speculatively without a risk of trapping). This is subtly different to the nonnull attribute, which doesn't necessarily imply dereferenceability (you might for instance have a pointer to one element past the end of an array). A small IR sketch follows at the end of this list. r213385.

  • A new subtarget hook was added to allow targets to opt-out of register coalescing. r213078, r213188.

  • A MergedLoadStoreMotion pass was added. r213396.

  • RegionInfo has been templatified so that it works on MachineBasicBlocks. r213456.

  • A monster patch from Nvidia adds a whole bunch of surface/texture intrinsics to the NVPTX backend. r213256.

  • Support was added for emitting warnings if vectorization is forced and fails. r213110.

  • Improvements to FastISel continue with the implementation of the FastLowerCall hook for X86. This actually reproduces what was already being done in X86, but is refactored against the target independent call lowering. r213049.

  • The ARM dmb, dsb and isb intrinsics have been implemented for AArch64. r213247.
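
As a rough sketch of the dereferenceable attribute mentioned above (the function is hypothetical, and the load uses the textual IR syntax of the time):

; %p is guaranteed to point to at least 4 dereferenceable bytes, so the
; load may be speculated (e.g. hoisted above a branch) without trapping
define i32 @read(i32* dereferenceable(4) %p) {
entry:
  %v = load i32* %p
  ret i32 %v
}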

Clang commits

  • Clang's rewrite engine is now a core feature (i.e. it can not be disabled at configure time). r213171.

  • Error recovery when the programmer mistypes :: as : was improved. r213120.

  • The AArch64 Clang CLI interface proposal for -march has been implemented. See the commit message for details. r213353.

  • OpenMP work continues with the addition of initial parsing and semantic analysis for the final, untied and other clauses, and the master directive. r213232, r213257, r213237, and more.

Other project commits

  • The 'Kalimba' platform is now supported by lldb (presumably this refers to the CSR processor). r213158.

LLVM Weekly at the GNU Tools Cauldron

For full details on the conference and details on the speakers for the talks I've summarised below see the GNU Tools Cauldron 2014 web page. Apologies for any inaccuracies; please do get in touch if you spot anything I may have noted incorrectly. LLVM followers may be particularly interested in Renato Golin's talk on collaboration between the GCC and LLVM communities.

Glibc BoF

  • 2.20 is in "slushy" freeze mode. What else is left? fmemopen, fd locking, some -Wundef work
  • Anyone planning to check in something big for 2.21?
    • Mentor Graphics planning to check in a NIOS II port. They won't be accepted until Linux kernel patches are in a kernel release.
    • A desire for AArch64 ILP32 ABI to get in. Kernel patches currently in review, compiler work is ready.
    • OpenRISC
    • NaCl (nptl)
  • Benchmarking glibc? Does anyone have a good approach? There is a preload library approach (see notes from Ondrej's talk).
  • Glibc has been built with AddressSanitizer, help needed to get it integrated into the build system. There was a comment this would be nice to get in to distributions.
  • Red Hat are working on supporting alternate libm implementations, including a low-precision and high-precision implementation. Intel are looking to add math functions that work on small vectors.

Abigail: toward ABI taming

  • Want to determine if changes to your shared library break apps for users, and users want to know whether an updated library remains compatible with their code. The bidiff tool will tell you the differences in terms of ABI given two object files as its input.
  • libabigail consists of modules such as a DWARF reader and the comparison engine. Tools such as bidiff are built on this API.
  • What's next for libabigail?
    • bicompat will help application authors determine whether their application A is still compatible with an updated version of a given library L by examining the undefined symbols of A that are resolved by L.
    • More amenable to automation (such as integration into build systems)
    • Support for un-instantiated templates. This would require declarations of uninstantiated templates to be represented in DWARF.
  • A first official release (though source is available at https://sourceware.org/libabigail/)

Writing VMs in Java and debugging them with GDB

  • Oracle Labs have been working on various dynamic language implementations in Java (e.g. Ruby, Python, R, JS, ...).
  • FastR is a reimplementation of R in Java featuring an interpreter (Truffle) and dynamic compiler (Graal).
  • Truffle and Graal start with an AST interpreter. The first time a node is evaluated it is specialised to the type that was seen at runtime. Later the tree is compiled using partial evaluation.
  • It may be deployed on standard HotSpot (no compilation), GraalVM, or the SubstrateVM (SVM) which uses Graal to ahead-of-time compile the language implementation. Debugging the SVM is difficult as Java debugging tools are not available. The solution is to generate DWARF information in the SVM's output.
  • Truffle and Graal are open source, the SubstrateVM is not (yet?).

GCC and LLVM collaboration

  • Good news: license issues, personal grudges and performance are off-topic.
  • Users should be protected from whatever disagreements take place. In the future we should have more pro-active discussions on various issues as opposed to reactive discussions regarding e.g. compiler flags that have been noticed to be arbitrarily different after the fact.
  • Renato lists common projects that we may collaborate on: binutils, glibc, sanitizers. Sanitizers are a collaboration success story.
  • Can we agree on a (new?) common user interface?
  • There's a surprising amount of confusion about -march, -mtune, and -mcpu considering we're in a room of compiler developers. It sounds like there's not much support for re-engineering the set of compiler flags as the potential gain is not seen as being great enough.
  • Can we agree to standardise on attributes, C/C++ extensions, builtins, ASM, the linker API?
  • GCC docs have just been rewritten, so some criticisms about how difficult it is to dig in are no longer valid.

Machine Guided Energy Efficient Compilation

  • Initial investigations in 2012 found that compiler flags can have a meaningful effect on energy consumption. This raises the question of how to determine which flags to use.
  • MAGEEC will target both GCC and LLVM initially. It is implemented as a compiler plugin which performs feature extraction and allows the output of the machine learning algorithm to change the sequence of passes which are run. Fractional Factorial Design is used to reduce the optimisation space to explore.
  • Turning passes on/off arbitrarily can often result in internal compiler errors. Should the machine learning algorithm learn this, or should GCC better document pass requirements?
  • It would be useful to MAGEEC if the (currently internal) plugin API could be stabilized. They also currently have to use a hacked up Clang as it doesn't provide plugin hooks.
  • The project has produced a low cost energy measurement board as well as their own benchmark suite (Bristol/Embecosm Embedded Benchmark Suite, or BEEBS). BEEBS 2.0 is scheduled for release by 31st August 2014 with a much wider range of benchmarks (currently 93). Jeremy showed a rather pleasing live demo where you can run a benchmark on a microcontroller development board and immediately find the number of mJ consumed in running it.
  • The current state of the project has it not achieving better results than GCC O2, but this is expected to change over the coming months.

Just-in-time compilation using GCC

  • libgccjit.so is an experimental branch of GCC which allows you to build GCC as a shared library and embed it in other programs in order to allow in-process code generation at runtime.
  • A dedicated API for JIT will allow better stability guarantees. It provides a high-level API designed for ease of use.
  • The API doesn't offer solutions for type inference, escape analysis, unboxing, inline caching, etc.
  • It has a C++ API which includes some cunning operator overloading to massively reduce verbosity, and a Python API.
  • David Malcolm has written Coconut, a JIT compiler for Python using libgccjit.so. It is incomplete and experimental.
  • Drawback: currently have to write out a .s to a file and invoke gcc on it.
  • Some might make a cheeky comment about the benefits of architecting a compiler so it can be used as a library, but I of course wouldn't dare. The good news is the speaker is actively looking at what would be needed to use GAS and GNU ld as a library.

Introduction to new Intel SIMD ISA and its impact on GCC

  • AVX-512 offers 64 single precision or 32 double precision floating point operations per cycle. It also has 8x64-bit mask registers.
  • Rounding modes can be set on a per-instruction basis.
  • Basic support is available from GCC 4.9.x.

News from Sanitizers

  • MemorySanitizer detects use of uninitialized memory. It increases CPU usage by about 2.5x and RAM usage by 2x. It was released in LLVM in 2013 and is currently Linux/x86-64 only.
  • History growth is limited by limiting the history depth and the number of new history nodes per stack trace.
  • MSan has found hundreds of bugs across Google internal code, Chromium, LLVM, etc. It was more challenging for Chromium due to the number of system libs that had to be rebuilt.
  • AddressSanitizer annotations allow you to detect accesses to the regions of e.g. std::vector<> which have been allocated as part of its capacity but not yet used (i.e. will start to be used in the next push_back). Next is to do the same for std::string and std::deque.
  • Glibc uses GNU-C instead of ANSI C which currently prevents compilation with Clang (nested functions in particular are problematic). It can however be built with ASan by GCC.
  • Evgeniy comments that the lack of standardisation between Clang and GCC for things like __has_feature(address_sanitizer) vs __SANITIZE_ADDRESS__ is irritating. This is just the sort of thing Renato was talking about yesterday of course.

glibc performance tuning

  • Use memset as an example. Look at 3 variants.
  • Writing a useful benchmark is more difficult than you might think. Simply running memset many times in a loop is not a good benchmark when using the same memory locations due to the processor's load-store forwarding. Even when fixing this, the branch predictor may perform much better than it would when memset is used in a real world scenario and lead to unrepresentative results.
  • To move beyond microbenchmarks, Ondrej has been using LD_PRELOAD to link against instrumented versions of the functions which record details about the time taken.
  • See here for memset benchmarks and here for more background.
  • strcmp was the most frequently called glibc function in Ondrej's testing (when running Firefox).

Devirtualization in GCC

  • This is a special case of indirect call removal, and although the talk is given in the context of C++ the techniques apply to other languages too. Some basic cases are handled in the front-end and even specified by the language standard.
  • It is a special case of constant propagation across aggregates, which is already done by Global Value Numbering and Interprocedural Constant Propagation. But these passes only catch a tiny number of possible cases.
  • Loss of information between the frontend and middle end can make some cases almost impossible. The intermediate language can be extended with explicit representations of base types, locations of virtual table pointers, and vtables. Also annotate polymorphic calls specifying instance and polymorphic call type and flags to denote constructors/destructors.
  • I'm not able to summarise details on the GCC devirt implementation better than the slides do. Hopefully they'll be made available online.
  • A particular challenge is to match types between different compilation units. The C++ One Definition Rule is used.
  • It can be used to strengthen unreachable function removal.
  • Feedback-directed devirtualization was extended in GCC 4.9 to work inter-module with LTO.

by Alex Bradbury (noreply@blogger.com) at July 21, 2014 10:31 AM

July 17, 2014

LLVM Blog

FTL: WebKit’s LLVM based JIT

Over the past year, the WebKit project made tremendous progress on the ability to optimize JavaScript applications. A major part of that effort was the introduction of the Fourth Tier LLVM (FTL) JIT. The Fourth Tier JIT targets long-running JavaScript content and performs a level of optimization beyond WebKit's interpreter, baseline JIT, and high-level optimizing JIT. See the FTL Optimization Strategy section below for more on WebKit's tiered optimizations. The engineering advancements within WebKit that made the FTL possible were described by Filip Pizlo in the Surfin' Safari Blog post, Introducing the WebKit FTL JIT. On April 29, 2014, the WebKit team enabled FTL by default on trunk: r167958.

This achievement also represents a significant milestone for the LLVM community. FTL makes it clear that LLVM can be used to accelerate dynamically type checked languages in a competitive production environment. This in itself is a tremendous success story and shows the advantage of the highly modular and flexible design of LLVM. It is the first time that the LLVM infrastructure has supported self-modifying code, and the first time profile guided information has been used inside the LLVM JIT. Even though this project pioneered new territory for LLVM, it was in no way an academic exercise. To be successful, FTL must perform at least as well as non-FTL JavaScript engines in use today across a range of workloads without compromising reliability. This post describes the technical aspects of that accomplishment that relate to LLVM and future opportunities for LLVM to improve JIT compilation and the LLVM infrastructure overall.

Read on for more information.

FTL Performance

JavaScript pages are ubiquitous and users expect fast load times, which WebKit's architecture is well suited for. However, some JavaScript applications require nontrivial computation and may run for periods longer than one hundred milliseconds. These applications demand aggressive compiler optimization and code generation tuned for the target CPU. FTL brings the full gamut of compiler technology to bear on the problem.

As with any high level language, high level optimizations must come first. Grafting an optimizing compiler backend onto an immature frontend would be futile. The marriage of WebKit's JIT with LLVM's optimizer and code generation works for two key reasons:

  1. Before translating to LLVM IR, WebKit's optimizing JIT operates on an IR that clearly expresses JavaScript semantics. Through type inference and profile-driven speculation, WebKit removes as much of the JavaScript abstraction penalty as possible.
  2. LLVM IR has now adopted features for supporting speculative, profile-driven optimization and avoiding the performance penalty associated with abstractions when they cannot be removed.

As a result, WebKit can engage the FTL on any long-running JavaScript method. In areas of the code dominated by abstraction overhead, FTL-compiled code is at least competitive with that of a custom JIT designed specifically for JavaScript. In areas of the code where WebKit can remove the abstraction penalty, FTL can achieve fantastic speedups.

Asm.js is a subset of JavaScript that avoids abstraction penalties, allowing JITs to directly benefit from low-level performance optimization. Consequently, the performance advantage of FTL is likely to be quite apparent on asm.js benchmarks. But although FTL performs well on asm.js, it is in no way customized to the standard. In fact, with FTL, regular JavaScript code written in a style similar to asm.js will derive the same benefits. Furthermore, as WebKit's high-level optimizations become even more advanced, the benefits of FTL will expand to a broader set of idiomatic JavaScript code.

A convenient way to measure the impact of LLVM optimizations on JavaScript code is by running C/C++ benchmarks that have been compiled to asm.js code via emscripten. This allows us to compare native C/C++ performance with WebKit's third tier (DFG) compiler and with WebKit FTL.

Figure 1: Time to run benchmarks from LLVM test-suite.
Figure 1 shows the time taken to run a handful of benchmarks from LLVM's own test-suite. The benchmark workloads have been adjusted to run for approximately one second. In every case, FTL achieves significant improvement over WebKit's non-LLVM JIT (DFG). In some cases, the emscripten compiled JavaScript code is already approaching native C performance, but in other cases FTL code still takes about twice as long as clang compiled C code[1]. One reason for the discrepancy between clang and FTL is the call overhead required for maintaining the JavaScript runtime's additional frame information. Another reason is that LLVM loop optimizations are not yet sophisticated enough to remove bounds and overflow checks and thus have not been enabled. These benchmarks are very tight loops, so a minor inefficiency, such as an extra compare or store in the loop, can result in a significant slowdown.

[1] gcc-loops is currently an outlier because clang performance recently sped up dramatically from auto-vectorization that has not been enabled yet in FTL.

FTL Optimization Strategy

WebKit's tiered architecture provides flexibility in balancing responsiveness, profile collection, and compiler optimization. The first tier is the low-level interpreter (LLInt). The second is the baseline JIT--a straightforward translation from JavaScript to machine code. WebKit's third tier is known as the Data Flow Graph (DFG) JIT. The DFG has its own high-level IR allowing it to perform aggressive JavaScript-specific optimization based on the profile data collected in earlier tiers. When running as a third tier, the DFG quickly emits code with additional profiling hooks. It may be invoked again as a fourth tier, but this time it produces LLVM IR for traditional compiler optimization.

Figure 2. The DFG and FTL JIT optimization pipelines (from Introducing the WebKit FTL JIT).
We reuse most of the DFG phases. The new FTL pipeline is a drop-in replacement for the third-tier DFG backend. It involves additional JavaScript-aware optimizations over DFG SSA form, followed by a phase that lowers DFG IR to LLVM IR. We then invoke LLVM's optimization pipeline and LLVM's MCJIT backend to generate machine code.

The DFG JIT front end generates LLVM IR in a form that is amenable to the same optimizations traditionally performed with C code. The most notable differences are summarized in FTL-Style LLVM IR.

Figure 3. The FTL optimization pipeline after lowering to LLVM IR.
After lowering to LLVM IR, FTL applies a subset of mid-level optimizations that are currently the most important in JavaScript code. It then invokes the LLVM backend for the host architecture with full optimization. This optimizes the code for the target CPU using aggressive instruction selection, register allocation, and machine-specific optimization.

LLVM Patch Points

Patch points are the key LLVM feature that allows dynamic type checking, inline caching, and runtime safety checks without penalizing performance. In October 2013, we submitted a proposal to the LLVM developer list to amend LLVM IR with patch points. Since then, we've successfully implemented patch points for multiple architectures and their performance impact has been validated for various use cases, including branch-to-fail safety checks, inline caches, and code invalidation points. The details of the current design are explained in the LLVM specification of stack map and patch point intrinsics.

Patch points are actually two features in one intrinsic. The first feature is the ability to identify the location of specific values at the intrinsic's final instruction address. During code emission, LLVM records that information as meta-data alongside the object code in what we call a "stack map". A stack map communicates to the runtime the location of important values. This is a slight misnomer given that locations may refer to register names. Typically, the runtime will read values out of stack map locations when it needs to reconstruct a stack frame. This commonly occurs during "deoptimization"--the process of replacing an FTL stack frame with a lower-tier frame.

The second feature of patch points is the ability of the runtime to patch the compiled code at specific instruction address. To allow this, the intrinsic reserves a fixed amount of instruction encoding space and records the instruction address of that space along with the stack map. Because the runtime needs to know the location of values precisely at the point it patches code, the two features must be combined into one intrinsic.
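
For reference, the intrinsics are declared roughly as follows (see the stack map and patch point specification mentioned above for the authoritative signatures; the placeholder names here are just descriptive):

; records the locations of the live values passed as trailing varargs
declare void @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

; additionally reserves <numBytes> of encoding space that the runtime may patch
declare void @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>, i8* <target>, i32 <numArgs>, ...)
declare i64 @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>, i8* <target>, i32 <numArgs>, ...)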

Patch points are viewed by LLVM passes much like unknown call sites. An important aspect of their design is the ability to specify the effective calling convention. For example, code invalidation points are almost never taken and the call site should not clobber any registers, otherwise the register allocator could be severely restricted by frequent runtime checks. An optional feature of stack maps is the ability to record the registers that are actually live in compiled code at each call site. This way the JIT can declare a call as preserving all registers to maximize compiler freedom, but at the same time the runtime can avoid unnecessary save and restore operations when the "cold" call is actually taken.

To better support inline cache optimizations, LLVM now has a special "anyregcc" calling convention. This convention allows any number of arguments to be forced into registers without pinning down the name of the register. Consequently, the compiler does not have to place arguments in particular registers or stack locations, or emit extra copies and spills around call sites, and the runtime can emit efficient patched code sequences that operate directly on registers.
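
A hedged sketch of what an inline-cache patch point using this convention might look like, in the call syntax of the time (the ID, byte count, stub pointer, and argument are invented for illustration, not taken from WebKit):

; %obj may be placed in any register, and the result likewise comes back in an
; arbitrary register chosen by the register allocator
%result = call anyregcc i64 (i64, i32, i8*, i32, ...)* @llvm.experimental.patchpoint.i64(i64 42, i32 15, i8* %cacheStub, i32 1, i64 %obj)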

The current patch point design is labeled experimental so that it may continue to evolve without preserving bitcode compatibility. LLVM should soon be ready to adopt the patch point intrinsic in its final form. However, the current design should first be extended to capture the semantics of high level language runtime checks. See Extending Patchpoints.

FTL-Style LLVM IR

FTL attempts to generate LLVM IR that closely resembles what the optimizer expects to see from other typical compiler frontends. Nonetheless, lowering JavaScript semantics into LLVM operations tends to result in IR with different characteristics from statically compiled C code. This section summarizes those differences. More details and examples will be provided in a subsequent blog post.

The prevalence of patch points in the IR means that values tend to have many more uses and can be live into a large number of patch point call sites. FTL emits patch points for a few distinct situations. First, when the FTL front end (DFG) fails to eliminate type checks or bounds checks, it emits explicit compare and branch operations in the IR. The branch target lands at a patch point intrinsic followed by unreachable. This can result in much more branchy code than LLVM typically handles with C benchmarks. Fortunately, LLVM's awareness of branch probability means that the branch-to-fail idiom does not excessively hinder optimization and code generation. Heap access and polymorphic calls also use patch points, but these are emitted directly inline with the hot path. This allows the runtime to implement inline caches with specific instruction sequences that can be patched as program behavior evolves. Finally, runtime calls may act as code invalidation points. A runtime event, such as a potential change in object layout, may invalidate speculatively optimized code. In this case WebKit emits nop patch points that can be overwritten with a separate runtime call at an invalidation event. This effectively invalidates all code that follows the original runtime call.
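
A simplified sketch of the branch-to-fail idiom described above (the check, IDs, sizes, and labels are invented; the angle-bracket placeholder stands for the live state the runtime needs in order to deoptimize):

  %is_int = icmp eq i32 %tag, 0                ; speculated type check
  br i1 %is_int, label %continue, label %deopt

deopt:                                         ; rarely taken
  call void (i64, i32, i8*, i32, ...)* @llvm.experimental.patchpoint.void(i64 7, i32 8, i8* null, i32 0, <live state values...>)
  unreachable

continue:
  <do something on the fast path...>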

Some type checks result in multiple fast paths. For example, WebKit may check a numeric value for either a floating-point or fixed point representation and emit LLVM IR for both paths. This may result in a sequence of redundant checks interleaved with control flow merges.

To support integer overflow checks, when they cannot be removed through optimization, FTL emits llvm.sadd.with.overflow intrinsics in place of normal add instructions. These intrinsics ensure that the code generator produces an optimal code sequence for the overflow checks. They are also used by other active LLVM projects and are gradually gaining support within LLVM optimization passes.
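
Roughly, a checked addition lowered this way looks like the following (names and labels invented):

declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32)

  %res = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
  %sum = extractvalue { i32, i1 } %res, 0
  %ovf = extractvalue { i32, i1 } %res, 1
  br i1 %ovf, label %overflow.deopt, label %continue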

LLVM heuristics are often sufficient to guess branch probability. However FTL makes the job easier by directly emitting LLVM branch weight meta-data based on profiling. This is particularly important when partially compiling a method starting at the inner loop. Such compilations can squash nested loops so that LLVM's heuristics can no longer infer the loop depth from the CFG structure.
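
As a small sketch, using the textual metadata syntax of the time (the weights are invented):

  br i1 %cond, label %fast, label %slow, !prof !1

!1 = metadata !{metadata !"branch_weights", i32 2000, i32 1}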

FTL builds an internal model of the JavaScript program's type system determined by profiling. It conveys this information to LLVM via type-based-alias-analysis (tbaa) meta-data. In FTL tbaa, each object field has a unique tag. This is a very effective approach to memory disambiguation, and much simpler than the access-path scheme that clang now uses.

Another way that FTL deviates from the norm is in its use of inttoptr instructions. These are used to materialize addresses of runtime objects, including all data and code from outside the current compilation unit (currently a single method at a time). inttoptr is also used to convert an untyped JS value to a pointer. Occasionally, pointer arithmetic is performed on non-pointer types rather than using getelementptr instructions. This is primarily a convenience and has not proven to hinder optimization. FTL's use of tbaa is effective enough to obviate the need to analyze getelementptr when the base address is already an unknown object.

An important pattern that occurs in FTL's LLVM IR is the repeated use of the same large constants that are used as masks to disambiguate tagged values, or several constants that represent global addresses that tend to be at small offsets from each other. LLVM's current one-basic-block-at-a-time code generation approach resulted in redundant rematerialization of the same large constant in each basic block. The fact that FTL creates a large number of basic blocks further exacerbated this problem. The LLVM code generator has been enhanced to avoid this expensive repeated rematerialization of such constant values.

MCJIT and the LLVM C API

The FTL JIT successfully leverages LLVM's existing MCJIT framework for runtime compilation. MCJIT was designed as a low-level toolkit that allows runtime compilers to be built by reusing as much of the static compiler's machinery as possible. This approach improves maintainability on the LLVM side. It integrates with the existing compiler toolchain and allows developers to test features of the runtime compiler without understanding a particular JIT client. The current API, however, does not provide a simple out-of-the-box abstraction for portable JITs. Overcoming the impedance mismatch between WebKit goals and the low-level MCJIT API required close collaboration between WebKit and LLVM engineers. As LLVM becomes more important as a JIT platform, it should provide a more complete C API to improve interoperability with JIT clients and decrease the fragility and maintenance burden within the client code base.

Bridging the gap between LLVM internals and portable JITs can be accomplished by providing more convenience wrappers around the existing MCJIT framework and adding richer C APIs for object code parsing and introspection. Ideally, a cross-platform JIT client like WebKit should not need to embed target-specific details about LLVM code generation on the client side. The JIT should be able to request LLVM to emit code for the current host process without understanding LLVM's language of target triples and CPU features. LLVM could generally provide a more obvious C API for lazily invoking runtime compilation. Along these lines, a JIT should be able to reuse the MCJIT execution engine for multiple modules without the overhead of reinitializing pass manager instances each time. An API also needs to be added for configuring the code generation pass manager. Most of the coordination between the JIT and LLVM now occurs directly through a memory manager API, which can be awkward for the JIT client. For example, WebKit looks for platform-specific section names when allocating section memory in order to locate frame meta-data and debug information. A better interface for WebKit would be a portable API that communicates object code meta-data, including frame information and stack maps. In general, the JIT codebase should not need to provide its own support for platform-specific object file formats. LLVM already has this support; it only needs to be exposed through the C API. Similarly, a JIT should be able to look up line numbers without implementing its own DWARF parser. An additional layer of functionality for general purpose debug info parsing and object code introspection would not be specific to JIT compilation and could benefit a variety of LLVM clients.

Linking WebKit with LLVM

FTL illustrates an important use case for LLVM: embedding LLVM optimization and codegen libraries cleanly within a larger application running in the same process. The ideal solution is to build a set of LLVM components as a shared library that exports only a limited C API. Several problems have made this a challenging endeavor:
  • The dynamic link time initialization overhead of the static initializers that LLVM defines is unacceptable at program launch time - especially if only parts of the library or nothing at all are used.
  • LLVM initializes global variables that require running exit-time destructors. This causes a multi-threaded parent application that attempts to exit normally to crash instead.
  • As with static initializers, weak vtables introduce an unnecessary and unacceptable dynamic link time overhead.
  • In general only a limited set of methods - the LLVM API - should be exported from the shared library.
  • LLVM usurps process-level API calls like assert, raise, and abort.
  • The resulting size of the LLVM shared library naively built from static libraries is larger than it needs to be. Build logic and conditional compilation should be added to ensure that only the passes and platform support required by the JIT client are ultimately linked into the shared library.

The issues listed above have required clever engineering tricks to circumvent. These are the sort of tricks that hinder adoption of LLVM. Therefore it would be in the best interest of the LLVM community to cooperate on improving the infrastructure for embedding LLVM.

FTL Efficiency

The LLVM optimizer and code generator are composed of generic, retargetable components designed to generate optimal code across an extremely diverse range of platforms. The compile time cost of this infrastructure is substantial and may be an order of magnitude greater than that of a custom-built JIT. Fortunately, WebKit's architecture for concurrent, tiered compilation largely sidesteps this penalty. Nonetheless, there is considerable opportunity to reengineer LLVM for use as a JIT, which will decrease FTL's CPU consumption and increase the breadth of JavaScript applications that benefit from FTL.

When running in a JIT environment, an opportunity exists for LLVM to strike a better balance between compile time and optimization strength. To this end, an alternate "compile-fast" optimization pass pipeline should be standardized so that the LLVM community can work together to maintain an ideal sequence of lighter-weight passes. Long running, iterative IR optimization passes, such as GVN, should be adapted to optionally run in fewer iterations. Hodge-podge passes like InstCombine that run many times should be optionally broken up so that some subset of functionality can run at different times: for example, canonicalize first and optimize later.

There are also considerable opportunities for improving code generation efficiency which will benefit JITs and static compilers alike. LLVM machine IR should be generated directly from LLVM IR without generating a Selection DAG, as proposed by Jakob Olesen in his Proposal for a global instruction selector. The benefit of this improvement would be considerable and widespread. More specific to high level languages, codegen passes should be tuned to handle branchy code more efficiently. For example, the register allocator can be taught to skip expensive analysis at points in the code where branches are not expected to be executed.

One overhead that will remain with the above improvements is simply the cost of bridging WebKit's DFG IR into LLVM IR. This involves lowering to SSA form and constructing LLVM instructions, which currently takes a significant amount of time relative to DFG's non-LLVM codegen path. With some scrutiny, this could likely be made more efficient.

Optimization Improvements

Without incurring a significant compile time increase, LLVM optimizations can be further improved to handle prevalent idioms in JavaScript programs. One straightforward LLVM IR enhancement would be to associate type-based alias information with call sites. This would improve redundant instruction elimination across runtime calls and patch points. Another area of improvement would be better handling of branch-and-merge idioms. These are quite common in FTL-produced IR and can be improved through CFG simplification, jump threading, or tail duplication. With careful pass pipeline management, loop optimizations can be enabled, such as auto-vectorization. Once LLVM is analyzing loops, bounds and overflow check elimination can also be implemented. To do this well, patch points will need to be extended with new semantics.

Extending Patch Points

In settings like JavaScript and other high level languages, patch points will be used to transfer control to the runtime when speculative optimization fails in the sense that the program behaves differently than predicted. It is always safe to assume a misprediction and give control back to the runtime because the runtime always knows how to recover. Consequently, patch points could optionally be associated with a check condition and given the following semantics: the patch point code sequence must be executed whenever the condition holds, but may safely be executed at its current location under any superset of the condition. When combined with LLVM loop optimization, the conditional patch point semantics would allow powerful optimization of runtime checks. In particular, bounds and overflow checks could be safely hoisted outside loops. For example, the following simplified IR:


%a = cmp <TrapConditionA>
call @patchpoint(1, %a, <state-before-loop>)
Loop:
%b = cmp <TrapConditionB>
call @patchpoint(2, %b, <state-inside-loop>)
<do something...>

Could be safely optimized into:

%c = cmp <TrapConditionC> ; where C implies both A and B
call @patchpoint(1, %c, <state-before-loop>)
Loop:
<do something...>

Note that the first patch point operand is an identifier that tells the runtime the program location of the intrinsic, allowing it to find the correct stack map record for the program state at that location. After the above optimization, not only does LLVM avoid performing repeated checks within the loop, but it also avoids maintaining additional runtime state throughout the loop body.

Generally, high level optimization requiring knowledge of language-specific semantics is best performed on a higher level IR. But in this case, extending LLVM with one aspect of high level semantics allows LLVM's loop and expression analysis to be directly leveraged and naturally extended into a new class of optimization.

Conclusion

WebKit's FTL JIT already shows considerable value in improving JavaScript performance, demonstrating LLVM's remarkable success as a backend for a JavaScript JIT compiler. The FTL project highlights the value of further improving LLVM's JIT infrastructure and reveals several exciting opportunities: improved efficiency of optimization passes and codegen, optimizations targeted toward common idioms present in high level languages, enabling more aggressive standard optimizations like vectorization, and extending and formalizing patch point intrinsics. Realizing these goals will require the continued support of the LLVM community and will advance and improve the LLVM project as a whole.

by Andrew Trick (noreply@blogger.com) at July 17, 2014 04:39 PM

July 14, 2014

LLVM Blog

LLVM Weekly - #28, Jul 14th 2014

Welcome to the twenty-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'll be at the GNU Tools Cauldron 2014 next weekend, being held at the University of Cambridge Computer Laboratory (which handily is also where I work). If you're there, do say hi.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

An update on Clang/LLVM on Windows has been posted on the LLVM blog. Impressive progress has been made, and as I mentioned last week the MSVC compatibility page has been updated.

There is (somewhat amazingly) now a Pascal-86 frontend for LLVM. The compiler frontend is written entirely in Python. More information is available in the author's Master's thesis (via Phoronix).

On the mailing lists

LLVM commits

  • FastISel gained some infrastructure to support a target-independent call lowering hook as well as target-independent lowering for the patchpoint intrinsic. r212848, r212849.

  • DominanceFrontier has been templatified, so in theory it can now be used for MachineBasicBlocks (where previously it was only usable with BasicBlocks). r212885.

  • The quality of results for CallSite vs CallSite BasicAA queries has been improved by making use of knowledge about certain intrinsics such as memcpy and memset. r212572.

  • Work on overhauling x86 vector lowering continues. Chandler reports that with the new codepath enabled, LLVM is now at performance parity with GCC for the core C loops of the x264 code when compiling for SSE2/SSE3. r212610.

  • ASM instrumentation for AddressSanitizer is now generated entirely in MachineCode, without relying on runtime helper functions. r212455.

  • Generation of the new .MIPS.abiflags section was added to the MIPS backend. r212519.

  • isDereferenceablePointer will now look through some bitcasts. r212686.

Clang commits

  • A new checker was added, to flag code that tests a variable for 0 after using it as a denominator (implying a potential division by zero). r212731.

  • Clang gained initial support for omp parallel for, the omp parallel sections directive, and omp task. r212453, r212516, r212804.

  • On the ARM target, LLVM's atomicrmw instructions will be used when ldrex/strex are available. r212598.

  • Support was added for mips-img-linux-gnu toolchains. r212719.

Other project commits

  • ThreadSanitizer's deadlock detector is enabled by default after being battle-tested on the Chromium codebase for some time. r212533.

  • Support for Android's bionic C library has been added to libcxx. r212724.

  • LLDB's Python scripting interface should now work on Windows. r212785.

by Alex Bradbury (noreply@blogger.com) at July 14, 2014 01:49 PM

July 09, 2014

LLVM Blog

LLVM Weekly - #25, Jun 23rd 2014

Welcome to the twenty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Facebook have released a number of clang plugins they have been using internally. This includes plugins to the clang analyzer primarily for iOS development as well as a plugin to export the Clang AST to JSON. The code is available on Github and they have started a discussion on the mailing list about moving some of this code upstream.

This week saw the release of LLVM and Clang 3.4.2. This is a bug-fix release which maintains ABI and API compatibility with 3.4.1.

Clang's C++ status page now lists C++1z feature status.

On the mailing lists

  • Rafael Espíndola has started a thread to discuss clarification on the backward compatibility promises of LLVM. He summarises what seems to be the current policy (old .bc is upgraded upon read, there is no strong guarantee on .ll compatibility). Much of the subsequent discussion is about issues such as compatibility with metadata format changes.

  • Duncan P.N. Exon Smith has posted a review of the new pass manager in its current form. He starts with a high-level overview of what Chandler Carruth's new PassManager infrastructure offers and has a list of queries and concerns. There are no responses yet, but it's worth keeping your eyes on this thread if you're interested in LLVM internals development.

  • This week has brought two separate proposals for LLVM code coverage support (neither of which have any replies at the time of writing). Christian Holler has proposed inclusion of LLCov code. This is a module pass that instruments basic blocks with calls to functions that will track coverage. The current LLCov code is available on Github. Alex L has also posted a detailed proposal on improving code coverage support for Clang and LLVM. He is looking for feedback on the approach before starting to submit patches.

LLVM commits

  • The LLVM global lock is dead, and the LLVM Programmer's Manual has been updated to reflect this. llvm_start_multithreaded and llvm_stop_multithreaded have been removed. r211277, r211287.

  • The patchset to improve MergeFunctions performance from O(NxN) to O(N x log(N)) has finally been completely merged. r211437, r211445 and more.

  • Range metadata can now be attached to call and invoke (previously it could only be attached to load). r211281.

  • ConvertUTF in the Support library was modified to find the maximal subpart of an ill-formed UTF-8 sequence. r211015.

  • LoopUnrollPass will now respect loop unrolling hints in metadata. r211076.

  • The R600 backend has been updated to make use of LDS (Local Data Share) and vectors for private memory. r211110.

  • X86FastISel continues to improve with optimisation for predicates, cmp folding, and support for 64-bit absolute relocations. r211126, r211130.

  • The SLPVectorizer (superword-level parallelism) will now recognize and vectorize non-SIMD instruction patterns like sequences of fadd,fsub or add,sub. These will be vectorized as vector shuffles if they are profitable. r211339.

  • LLVM can now generate native unwind info on Win64. r211399.

Clang commits

  • Clang's OpenMP implementation now contains initial support of the 'reduction' clause, #pragma omp for, the 'schedule' clause, the 'ordered' clause, and the 'nowait' clause. r211007, r211140, r211342, r211347, r211352.

  • MS ABI support continues with the merging of support for x86-64 RTTI. r211041.

  • The -std=c++1z flag was added to enable support for C++17 features. r211030.

  • The clang User's Manual has been expanded with documentation for profile-guided optimisation with instrumentation. r211085.

  • Emission of ARM NEON intrinsics has been totally rewritten to be easier to read and maintain as well as to provide better protection against coding errors. r211101.

Other project commits

  • compiler-rt now offers add, sub, and mul for IEEE quad precision floating point. r211312, r211313.

by Alex Bradbury (noreply@blogger.com) at July 09, 2014 09:18 AM

July 08, 2014

Aaron Ballman

Member Function Ref Qualifiers

One of the lesser-known features of C++11 is the fact that you can overload your non-static member functions based on whether the implicit this object parameter is an lvalue reference or an rvalue reference by specifying a function's ref-qualifier. This feature works similarly to the way cv-qualifiers work when specifying that a method must be called on a const or volatile object, and can in fact be combined with cv-qualifiers.

To specify a ref-qualifier for a member function, you qualify the function with either & or &&. (The ref-qualifier must come after any cv-qualifiers.) For instance, if you wanted to declare a function that can only be called on an rvalue, you would write:

struct S {
  void func() &&;
};

S s1;
s1.func(); // Ill-formed
S().func(); // OK
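
As noted above, ref-qualifiers can be combined with cv-qualifiers, with the ref-qualifier written last. A minimal illustration (the type T here is made up for this example):

struct T {
  void func() const &;  // cv-qualifier first, then the ref-qualifier
  void func() &&;       // overloads may mix cv- and ref-qualifiers
};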

If you want to overload a function based on the rvalue-ness of the implicit object parameter, you must specify the ref-qualifier for both functions.

struct S {
  void func() &;
  void func() &&;
};

S s1;
s1.func(); // OK, calls S::func() &
S().func(); // OK, calls S::func() &&

Overloading based on a ref-qualifier is useful in (somewhat rare) circumstances where your object can make use of move semantics to reduce expensive construction costs. For instance:

#include <iostream>
#include <utility>

class ExpensiveState {}; // Details unimportant

class Builder {
  ExpensiveState State;

public:
  Builder() = default;
  Builder(const Builder &O) : State(O.State) {
    std::cout << "Copy" << std::endl;
  }
  Builder(Builder &&O) : State(std::move(O.State)) {
    std::cout << "Move" << std::endl;
  }

  Builder operator()() & {   // chosen when the object expression is an lvalue
    return Builder(*this);
  }

  Builder operator()() && {  // chosen when the object expression is an rvalue
    return Builder(std::move(*this));
  }
};

int main() {
  Builder b;

  b()()()();
}

When executed, this code will output: Copy Move Move Move. The Copy is because b is an lvalue, not an rvalue, and so operator()() & will be called. However, the result of that function is an rvalue, and so the subsequent subexpressions will call operator()() &&. Because of this, resources can be stolen from one invocation to the next in the last three subexpressions, reducing the performance penalty of a copy operation.

In case you are wondering why std::move(*this) is used when constructing the Builder object: the unary expression *this always yields an lvalue, which would end up calling the copy constructor instead of the move constructor. The std::move call is therefore required to convert the lvalue into an rvalue.

Ref-qualifiers are not something you will likely use often. However, it is never a bad thing to understand the tools the programming language has to offer. Note: ref-qualifiers are currently supported by clang (tested with 3.4) and gcc (tested with 4.9), but not by MSVC 2013.

by Aaron Ballman at July 08, 2014 02:05 PM

LLVM Blog

Clang/LLVM on Windows Update

It’s time for an update on Clang’s support for building native Windows programs, compatible with Visual C++!  We’ve been working hard over the last few months and have improved the toolchain in a variety of ways.  All C++ features aside from debug info and exceptions should work well.  This link provides more specific details.  In February we reached an exciting milestone: we can now self-host Clang and LLVM using clang-cl (without fallback), and both projects pass all of their tests!  Additionally, both Chrome and Firefox now compile successfully with fallback!  Here are some of the highlights of recent improvements:


Microsoft compatible record layout is done!  It’s been thoroughly fuzz tested and supports all Microsoft specific components such as virtual base table pointers, vtordisps, __declspec(align) and #pragma pack.  This turned out to be a major effort due to subtle interactions between various features.  For example, __declspec(align) and #pragma pack behave analogously to the gcc variants, but interact with each other differently.  Each version of Visual Studio changes the ABI slightly.  As of today clang-cl is layout compatible with VS2013.


Clang now supports all of the calling conventions used up to VS2012.  VS2013 added some new ones that we haven’t implemented yet.  One of the other major compatibility challenges we overcame was passing C++ objects by value on 32-bit x86.  Prior to this effort, LLVM modeled all outgoing arguments as SSA values, making it impossible to take the address of an argument to a call.  It turns out that on Windows C++ objects passed by value are constructed directly into the argument memory used for the function call.  Achieving 100% compatibility in this area required making fundamental changes to LLVM IR to allow us to compute this address.


Most recently support for run time type information (RTTI) was completed.  With RTTI support, a larger set of programs and libraries (for example ICU) compile without fallback and dynamic_cast and typeid both work.  RTTI support also brings along support for std::function.  We also recently added support for lambdas so you can enjoy all of the C++11 functional goodness!

We invite you to try it out for yourself and, as always, we encourage everyone to file bugs!

by Unknown (noreply@blogger.com) at July 08, 2014 03:34 AM

July 07, 2014

LLVM Blog

LLVM Weekly - #27, Jul 7th 2014

Welcome to the twenty-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

An LLVM code generator has been merged into the MLton whole-program optimizing compiler for Standard ML. This was written by Brian Leibig as part of his Master's thesis, which contains more information on its performance and design.

Eli Bendersky has written a tool which converts the output of Clang's -ast-dump to HTML. See here for an example. The code is available on Github.

Clang's Microsoft Visual C++ compatibility page has been updated to reflect the status of the current SVN trunk. As can be seen from the relevant diff, record layout has been marked complete along with RTTI. Lambdas are now marked mostly complete.

On the mailing lists

LLVM commits

  • The X86 backend now expands atomics in IR instead of as MachineInstrs. Doing the expansion at the IR level results in shorter code, and other IR passes may potentially benefit from being able to run on the expanded atomics. r212119.

  • The ARM backend learned the ISB memory barrier intrinsic. r212276.

  • The X86 backend gained support for __builtin_ia32_rdpmc which is used to read performance monitoring counters. r212049.

  • The peephole optimizer gained new code (currently disabled) to rewrite copies to avoid copies across register banks. r212100.

  • Control flow graph building code has been moved from MC to a new MCAnalysis library. r212209.

  • TableGen gained support for MSBuiltin, which allows for adding intrinsics for Microsoft compatibility. r212350.

Clang commits

  • MSVC RTTI (run-time type information) implementation has been completed. r212125.

  • The __builtin_arm_ldaex and __builtin_arm_stlex intrinsics were added. r212175.

  • Nested blocks are now supported in Microsoft inline assembly. r212389.

Other project commits

  • lldb-gdbserver support has been merged for Linux x86-64. r212069.

  • AddressSanitizer gained support for i686-linux-android. r212273.

  • libcxxabi gained a CMake build system. r212286.

  • lld now supports parsing of x86 and ARM/Thumb relocations for MachO. r212239, r212306.

by Alex Bradbury (noreply@blogger.com) at July 07, 2014 02:35 PM

July 01, 2014

OpenMP Runtime Project

History of the OpenMP Standard

We have created a fun infographic on the history of the OpenMP standard which has been published in the Intel Parallel Universe (pdf). The folks over at OpenMP.org liked it so much it’s currently their headline news. We now understand why “a picture is worth a thousand words”, since this took as much effort as writing 5,000!

We hope you enjoy it and find it informative.

by Terry Wilmarth (Intel) at July 01, 2014 07:35 PM

June 30, 2014

LLVM Blog

LLVM Weekly - #26, Jun 30th 2014

Welcome to the twenty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Vladimir Makarov has done his yearly comparison of GCC and LLVM, posting performance comparisons using SPECInt2000 on ARM and x86-64.

Version 0.13.0 of LDC, the LLVM-based D compiler has been released. This brings a whole host of improvements, listed in detail within the release announcement.

Some Mozilla engineers have been looking at using clang-cl (the MSVC-compatible Clang driver) to build Firefox. With the help of the fallback flag (which falls back to compiling with MSVC if Clang fails) they've managed to get a completed build. Ehsan tells us that 602 of the 7168 files (about 8%) require the MSVC fallback at the moment.

Trail of Bits have posted a preview of McSema, a framework for translating x86 binaries to LLVM bitcode. The accompanying talk took place on the 28th June, so hopefully we'll hear more about this soon. The blog post tells us that McSema will be open source and made available soon.

Bruce Mitchener has written up his experience of integrating with LLDB for Dylan.

Codeplay (based in Edinburgh) are advertising for a full time compiler engineer.

On the mailing lists

LLVM commits

  • A significant overhaul of how vector lowering is done in the x86 backend has been started. While it's under development it's off by default, though it's hoped that in time there will be measurable performance improvements on benchmarks conducive to vectorization. r211888 and more.

  • X86 FastISel will use EFLAGS directly when lowering select instructions if the condition comes from a compare. It also now supports floating-point selects among other improvements. r211543, r211544, and more.

  • ScaledNumber has been split out from BlockFrequencyInfo into the Support library. r211562.

  • The loop vectorizer now features -Rpass-missed and -Rpass-analysis reports. r211721.

  • The developer documentation has been updated to clarify that although you can use Phabricator to submit code for review, you should also ensure the relevant -commits mailing list is added as a subscriber on the review and be prepared to respond to comments there. r211731.

  • COMDATs have been added to the IR. What's a COMDAT? StackOverflow has you covered. r211920.

  • The NVPTX backend saw a whole series of commits. r211930, r211932, r211935, and more.

  • LLVM gained an abstraction for a random number generator (RNG). r211705.

Clang commits

  • A nice little diagnostic improvement has been added for when the user accidentally puts the brackets before the identifier, e.g. int [4] foo;. r211641.

  • OpenMP learned the 'section' directive (and some more, see the full commit logs). r211685, r211767.

Other project commits

  • Support for ARM EHABI unwinding was added to libunwind. r211743.

  • The lldb Machine Interface gained a number of new commands and bug fixes. r211607.

by Alex Bradbury (noreply@blogger.com) at June 30, 2014 06:53 PM

June 23, 2014

Aaron Ballman

Binary Operator Overloading

In C++, there are two forms of binary operator overloading you can use when designing an API. The first form is to overload the operator as a member function of the class, and the second form is to overload the operator as a friend function of the class. I want to explore why you would use one form of overloading instead of the other, using a Fraction class as an example.

For the purposes of this discussion, this is part of the interface for our expository class.

class Fraction {
  // Implementation details live here.

public:
  Fraction(int Whole);
  Fraction(int Numerator, unsigned Denominator);
  Fraction(double Value);

  // Binary operator overloads live here.
};

One of the ways we can implement our binary operator overloads is as member functions of the Fraction class. I’m going to pick on the equality operator, but any of the overloaded binary operators would suffice.

  ...
  // Binary operator overloads live here.
  bool operator==(const Fraction &RHS) const;
  ...

The other way we can implement our binary operator overloads is as a friend function of the Fraction class.

  ...
  // Binary operator overloads live here.
  friend bool operator==(const Fraction &LHS, const Fraction &RHS);

Since there are two different ways to implement this, it’s reasonable to ask which way is “correct?” The answer to that question depends on your intentions as a class designer. Consider the following use case:

void f(const Fraction &F) {
  if (1.0 == F) {
    // Do something interesting
  }
}

Some coding conventions suggest that equality comparisons against a constant value put the constant on the left-hand side of the comparison (so that an accidental assignment operation by typing = instead of == would trigger a compile error), so this example is not particularly far-fetched.

If you use a member function for the operator overload, this code would not compile because there’s no way for the implicit converting constructor from double to Fraction to be called. However, by using a friend function for the operator overload, the compiler can call the converting constructor to create a Fraction object which would make the comparison viable. Because of this, I would claim that declaring the operators to be friends is the correct approach for the class design.
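
To make that concrete, here is a minimal sketch of the friend form in action. The member layout and the equality logic are invented for illustration (they ignore normalisation and sign subtleties); the point is that both operands are ordinary parameters, so the implicit conversion can apply to either side.

class Fraction {
  int Num;
  unsigned Den;

public:
  Fraction(int Whole) : Num(Whole), Den(1) {}
  Fraction(int Numerator, unsigned Denominator) : Num(Numerator), Den(Denominator) {}
  Fraction(double Value) : Num(static_cast<int>(Value)), Den(1) {} // deliberately naive

  friend bool operator==(const Fraction &LHS, const Fraction &RHS) {
    // Cross-multiply; purely illustrative.
    return static_cast<long long>(LHS.Num) * RHS.Den ==
           static_cast<long long>(RHS.Num) * LHS.Den;
  }
};

void f(const Fraction &F) {
  if (1.0 == F) { // OK: 1.0 converts to Fraction via Fraction(double)
    // Do something interesting
  }
}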

This exemplifies a reasonable way to decide how to implement the overloaded binary operators. If you want to allow implicit conversions for items on the left-hand side of the operator, then using friend function overloads is required. If implicit conversions are not desirable for some reason, or not possible (due to having no implicit converting constructors), then using a member function is acceptable. If you’re looking for a general rule of thumb, I would recommend always using the friend function form — it’s more likely to behave how the user would expect in all cases, instead of having curious edge cases where their usage fails. Imagine how confusing it would be for the user of a Fraction class if SomeFraction * 1 succeeded, but 1 * SomeFraction failed to compile! That being said, it ultimately boils down to a design choice that you must make as a class designer.

I would like to thank Jens Maurer for the design discussion which spawned this blog posting.

by Aaron Ballman at June 23, 2014 02:38 PM

LLVM Blog

LLVM Weekly - #24, Jun 16th 2014

Welcome to the twenty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

On the mailing lists

LLVM commits

  • A weak variant of cmpxchg has been added to the LLVM IR, as has been argued for on the mailing list. Weak cmpxchg is allowed to fail spuriously, and the operation returns {iN, i1} (in fact, for uniformity all cmpxchg instructions do this now); a brief C++-level illustration follows this list. According to the commit message, this change will mean legacy assembly IR files will be invalid but legacy bitcode files will be upgraded during read. r210903.

  • X86 FastISel gained support for handling a bunch more intrinsics. r210709, r210720 and more. FastISel also saw some target-independent improvements r210742.

  • This week there were many updates to the MIPS backend for mips32r6/mips64r6. e.g. r210899, r210784 and many more.

  • NoSignedWrap, NoUnsignedWrap and Exact flags are now exposed to the SelectionDAG. r210467.

  • Support has been added for variable length arrays on the Windows on ARM Itanium ABI. r201489.

  • Some simple reordering of fields in Value and User saves 8 bytes of padding on 64-bit. r210501.

  • FastISel will now collect statistics on when it fails with intrinsics. r210556.

  • The MIPS backend gained support for jr.hb and jalr.hb (jump register with hazard barrier, jump and link register with hazard barrier). r210654.

  • AArch64 gained a basic schedule model for the Cortex-A57. r210705.

  • LLVM has transitioned to using std::error_code instead of llvm::error_code. r210687.
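
As mentioned in the cmpxchg item above, the C++11 memory model exposes the same weak/strong distinction, which may help build intuition for the IR change. This is an analogy only, not a description of the commit; the increment function is a made-up example:

#include <atomic>

// compare_exchange_weak may fail spuriously (mapping naturally onto
// LL/SC-style hardware such as ARM's ldrex/strex), so it is used in a
// retry loop; on failure it reloads the current value into Old.
int increment(std::atomic<int> &Counter) {
  int Old = Counter.load();
  while (!Counter.compare_exchange_weak(Old, Old + 1)) {
    // Old now holds the freshly observed value; just retry.
  }
  return Old + 1;
}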

Clang commits

  • The -Wdate-time preprocessor warning from GCC has been implemented. This is useful when trying to create reproducible builds; a tiny example follows this list. r210511.

  • Loop unroll pragma support was added. r210667.

  • Yet more progress has been made on MS ABI compatibility. e.g. r210813, r210637.
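
A tiny example of what the new warning catches, assuming it behaves like GCC's flag of the same name; embedding the build timestamp is exactly what makes builds unreproducible:

#include <cstdio>

void printBuildStamp() {
  // Compiling this with -Wdate-time warns about the use of __DATE__ and
  // __TIME__, since their expansion differs from build to build.
  std::printf("built on %s at %s\n", __DATE__, __TIME__);
}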

Other project commits

  • libcxx gained an implementation of string_view as proposed in N4023. r210659.

  • Some of the iOS8/OS X Yosemite specific lldb support has been merged. r210874.

by Alex Bradbury (noreply@blogger.com) at June 23, 2014 10:33 AM

June 12, 2014

Philip Reames

IR Restrictions for Late Safepoint Placement

The late safepoint placement pass we released recently has a couple of restrictions on the IR it can handle.  I’ve described those restrictions a couple of different times now, so I figured it was time to put them up somewhere I could reference and that google might find.  A shorter version of this post will also appear in the source code shortly.

The SafepointPlacementPass will insert safepoint polls for method entry and loop backedges.  It will also transform calls to non-leaf functions to statepoints.  The former are how the application (mutator) code interacts with the garbage collector and may actually trigger object relocation.  The latter are necessary so that polls in called functions can inspect and modify frames further up the stack.

The current SafepointPlacementPass works for nearly arbitrary IR.  Fundamentally, we require that:

  • Pointer values may not be cast to integers and back.
  • Pointers to garbage collected objects must be tagged with address space #1

In addition to these fundamental limitations, we currently do not support:

  • safepoints at invokes (as opposed to calls)
  • use of indirectbr
  • aggregate types which contain pointers to GC objects
  • pointers to GC objects stored in global variables, allocas, or at constant addresses
  • constant pointers to garbage collected objects (other than null)
  • garbage collected pointers which are undefined (“undef”)
  • use of gc_root

Patches are welcome for the latter class of items.  I don’t know of any fundamental reasons they couldn’t be supported.

 

Fundamentally, a precise garbage collector must be able to accurately identify which values are pointers to garbage collected objects.  We choose to use the distinction between pointer types and non-pointer types in the IR to establish that a particular value is a pointer and use the address space mechanism to distinguish between pointers to garbage collected and non-garbage collected objects.  We don’t require that the types of pointers be precise – in LLVM this would not be a safe assumption! – but we do require that the pointer be a pointer.

We disallow inttoptr and addrspacecast instructions in an effort to ensure this distinction is upheld.  Otherwise, you could have code like the following:

Object* p = …;
int x = (int)p;
foo(); // becomes a safepoint, can move objects
Object* p2 = (Object*)x;

Note that while the SafepointPlacementPass will try to check for some violations of this assumption, it will not catch all cases.  At the end of the day, it is the responsibility of the frontend author to get this right.

 

Now on to the various implementation restrictions.

  • We plan to support safepoints on InvokeInsts.  In fact, the released code already has partial support for this.  This is not a high priority for us at the moment, but should be fairly straightforward to complete if anyone is interested.
  • IndirectBr creates problems for the LoopSimplify pass which we use as a helper for identifying backedges in loops.  Our source language doesn’t have any need for indirect branches, but if anyone can identify a better way to detect backedges which doesn’t involve this restriction, we’d gladly take the patch.
  • Currently, we do not support finding pointers to garbage collected objects contained in first class aggregate types in the IR.  The extensions required to support this are fairly straightforward, but we have no need for this functionality.  Well-structured patches are welcome, but since this will be a fairly invasive change, please coordinate the merge early and closely.  (Alternatively, wait until this has been merged into upstream LLVM and use the standard incremental review and commit process.)
  • Note that we have no plans to support untagged unions containing pointers.  We could support tagged pointers, but this would require either extensions to the IR, or language specific hooks exposed in the SafepointPlacementPass.  If you’re interested in this topic, please contact me directly.
  • The support for pointers to GC objects in global variables, allocas, or arbitrary constant memory locations is weak at best.  There’s some code intended to support these cases, but tests are lacking and the code is likely to be buggy.  Patches are welcome.
  • We do not support constant pointers to garbage collected objects other than null.  For a relocating garbage collector, such constant pointers wouldn’t make sense.  If you’re interested in supporting non-relocating collectors or relocating collectors with pinned objects, some extensions may be necessary.
  • We have not integrated the late safepoint placement approach with the existing gcroot mechanism.  Given this mechanism is simply broken, we do not plan to do so.  Instead, we plan to simply remove that support once late safepoint placement lands.  If you’re interested in migrating from one approach to the other, please contact me directly.  I’ve got some ideas on how to make this easy using custom transform passes, but don’t plan on investing any time in this unless requested by interested parties.

 

by reames at June 12, 2014 09:47 PM

June 09, 2014

LLVM Blog

LLVM Weekly - #23, Jun 9th 2014

Welcome to the twenty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Philip Reames has announced that code for late safepoint placement is now available.
This is a set of patches to LLVM from Azul Systems that aim to support precise relocating garbage collection in LLVM. Philip has a long list of questions where he is seeking feedback from the community on how to move forwards with these patches. There has not been much response so far; hopefully it will come soon, as I know there are many communities who are very interested in seeing better GC support in LLVM (e.g. Rust, OCaml).

The biggest LLVM news this week is of course the announcement of the Swift programming language from Apple. Its development was led by Chris Lattner, original author of LLVM. He has some more info about Swift on his website. There is no source release as of yet, and no indication from Apple as to whether it will remain proprietary. Either way, it's an interesting development. Chris Lattner is now on Twitter and has been passing out tidbits about the Swift implementation.

LunarG have announced the Glassy Mesa project. This project, funded by Valve, will explore increasing game performance in Mesa through improvements in the shader compiler. The current parser and optimisation layer are replaced with glslang and the LLVM-based LunarGlass. More technical details are available in the slide deck.

Sébastien Métrot has released xspray, a frontend for lldb on OS X. One of its interesting features is the inbuilt support for plotting your data.

With all the LLVM news recently, it seems search traffic for 'llvm' has skyrocketed.

On the mailing lists

LLVM commits

  • The jumptable attribute has been introduced. If you mark a function with this attribute, references to it can be rewritten with a reference to the appropriate jump-instruction-table function pointer. r210280.

  • Support was added for Windows ARM exception handling data structures, including decoding them. r209998, r210192.

  • GlobalAlias can now point to an arbitrary ConstantExpression. See the commit message for a discussion of the consequences of this. r210062.

  • The superword-level parallelism (SLP) vectorizer has been extended to support vectorization of getelementptr expressions. r210342.

  • The LLVM programmer's manual has been improved with an example of using IRBuilder. r210354.

Clang commits

  • Semantic analysis to make sure a loop is in OpenMP canonical form has been committed. r210095.

  • __builtin_operator_new and __builtin_operator_delete have been added. Some optimisations are allowed on these which would not be permitted on calls to ::operator new, and they are intended for the implementation of things like std::allocator; a brief allocator sketch follows this list. r210137.

  • New pragmas have been introduced to give optimisation hints for vectorization and interleaving. You can now use #pragma clang loop vectorize(enable) as well as vectorize(disable), vectorize_width(n), interleave(enable/disable), and interleave_count(n); a short usage example follows this list. r210330.

  • Support for the MSVC++ ABI continues with the addition of dynamic_cast for MS. r210377.

  • Support for global named registers has been expanded slightly to allow pointer types to be held in these variables. r210274.

  • GCC's -Wframe-larger-than=bytes diagnostic is now supported. r210293.
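
For the __builtin_operator_new item above, here is a minimal allocator sketch. It assumes the builtins behave like direct calls to ::operator new and ::operator delete, as the commit summary suggests; this is an illustration, not the libc++ implementation:

#include <cstddef>

template <typename T>
struct SimpleAllocator {
  using value_type = T;

  SimpleAllocator() = default;
  template <typename U> SimpleAllocator(const SimpleAllocator<U> &) {}

  T *allocate(std::size_t N) {
    // Unlike a plain call to ::operator new, the optimizer is permitted to
    // remove paired allocations and deallocations made through the builtin.
    return static_cast<T *>(__builtin_operator_new(N * sizeof(T)));
  }
  void deallocate(T *P, std::size_t) { __builtin_operator_delete(P); }
};

And a short usage example of the loop hint pragmas, using only the spellings listed in the commit summary above (the function itself is invented):

void scale(float *Out, const float *In, int N) {
#pragma clang loop vectorize(enable)
#pragma clang loop interleave_count(2)
  for (int i = 0; i < N; ++i)
    Out[i] = In[i] * 2.0f;
}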

Other project commits

  • A benchmarking-only mode has been added to the test suite. r210251.

  • A status page for post-C++14 features in libcxx has been added. r210056.

  • An initial set of Makefiles has been committed to lld. r210177.

  • lldb gained support for inspecting enum members. r210046.

  • Polly can now be built without any GPLed software. r210176.

by Alex Bradbury (noreply@blogger.com) at June 09, 2014 01:37 PM

LLVM Weekly - #22, Jun 2nd 2014

Welcome to the twenty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

Last week I expressed worry about GMANE not updating. I'm happy to report that it's back to normal now. Some of my readers might be interested in my account of the neat Raspberry Pi-based projects I saw at Maker Faire Bay Area.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

David Given has shared his partially complete backend for the VideoCore IV VPU as used in the BCM2835 in the Raspberry Pi. It would also be interesting to see a QPU LLVM backend now it has been publicly documented.

Documentation on how TableGen's DAGISel backend works has been updated.

The LLVM Compiler Infrastructure in HPC Workshop has been announced. This is a workshop to be held in conjunction with SC14. The deadline for the call for papers is September 1st.

Tartan is a Clang analysis plugin for GLib and GNOME. To quote its homepage "The plugin works by loading gobject-introspection metadata for all functions it encounters (both functions exported by the code being compiled, if it is a library; and functions called by it). This metadata is then used to add compiler attributes to the code, such as non-NULL attributes, which the compiler can then use for static analysis and emitting extra compiler warnings."

On the mailing lists

LLVM commits

  • A LoadCombine pass was added, though is disabled by default for now. r209791.

  • AAPCS-VFP has been taught to deal with Cortex-M4 (which only has single precision floating point). r209650.

  • InstructionCombining gained support for combining GEPs across PHI nodes. r209843.

  • Vectorization of intrinsics such as powi, cttz and ctlz is now allowed. r209873.

  • MIPS64 long branch has been optimised to be 3 instructions smaller. r209678.

Clang commits

  • OpenMP implementation continues. Parsing and Sema have been implemented for OMPAlignedClause. r209816.

  • The -Rpass-missed and -Rpass-analysis flags have been added. pass-missed is used by optimizers to inform the user when they tried to apply an optimisation but couldn't, while pass-analysis is used to report analysis results back to the user. A followup commit documents the family of flags. r209839, r209841.

  • The clang optimize pragma has now been documented. r209738.

  • There has been some API refactoring. The release and take methods were removed from ActionResult and Owned removed from Sema. r209800, r209812.

Other project commits

  • ThreadSanitizer has seen a refactoring of storage of meta information for heap blocks and sync objects. r209810.

by Alex Bradbury (noreply@blogger.com) at June 09, 2014 11:59 AM

June 04, 2014

Philip Reames

Code for late safepoint placement available

This post contains the text of an email I sent to the LLVMdev mailing list a few moments ago.  I would encourage you to direct technical questions and comments to that thread, though I will also respond to technical questions in comments posted here.

As I’ve mentioned on the mailing list a couple of times over the last few months, we’ve been working on an approach for supporting precise fully relocating garbage collection in LLVM.  I am happy to announce that we now have a version of the code available for public view and discussion.

https://github.com/AzulSystems/llvm-late-safepoint-placement

Our goal is to eventually see this merged into the LLVM tree.  There’s a fair amount of cleanup that needs to happen before that point, but we are actively working towards that eventual goal.

Please note that there are a couple of known issues with the current version (see the README).  This is best considered a proof of concept implementation and is not yet ready for production use.  We will be addressing the remaining issues over the next few weeks and will be sharing updates as they occur.

In the meantime, I’d like to get the discussion started on how these changes will eventually land in tree.  Part of the reason for sharing the code in an early state is to be able to build a history of working in the open, and to to able to merge minor fixes into the main LLVM repository before trying to upstream the core changes.  We are aware this is a fairly major change set and are happy to work within the community process in that regard.

I’ve included a list of specific questions I know we’d like to get feedback on, but general comments or questions are also very welcome.

Open Topics:

  • How should we factor the core GC support for review?  Our current intent is to separate logically distinct pieces, and share each layer one at a time.  (e.g. first infrastructure enhancements, then intrinsics and codegen support, then verifiers, then safepoint insertion passes)  Is this the right approach?
  • How configurable does the GC support need to be for inclusion in LLVM?  Currently, we expect the frontend to mark GC pointers using address spaces.  Do we need to support alternate mechanisms?  If so, what interface should this take?
  • How should we approach removing the existing partial support for garbage collection? (gcroot)  Do we want to support both going forward?  Do we need to provide a forward migration path in bitcode?  Given the usage is generally though MCJIT, we would prefer we simply deprecate the existing gcroot support and target it for complete removal a couple of releases down the road..
  • What programmatic interface should we present at the IR level and where should it live?  We’re moving towards a CallSite like interface for statepoints, gc_relocates, and gc_results call sites.  Is this the right approach?  If so, should it live in the IR subtree, or Support?  (Note: The current code is only about 40% migrated to the new interface.)
  • To support invokable calls with safepoints, we need to make the statepoint intrinsic invokable.  This is new for intrinsics in LLVM.  Is there any reason that InvokeInst must be a subclass of CallInst? (rather than a view over either calls or invokes like CallSite)  Would changes to support invokable intrinsics be accepted upstream?  Alternate approaches are welcome.
  • Is the concept of an abstract VM state something LLVM should know about?  If so, how should it be represented?  We’re actively exploring this topic, but don’t have strong opinions on the topic yet.
  • Our statepoint shares a lot in the way of implementation and semantics with patchpoint and stackmap.  Is it better to submit new intrinsics, or try to identify a single intrinsic which could represent both?  Our current feeling is to keep them separate semantically, but share implementation where possible.

Yours,
Philip (& team)

p.s. Sanjoy, one of my co-workers,  will be helping to answer questions as they arise.

p.p.s. For those wondering why the current gcroot mechanism isn’t sufficient, I covered that in a previous blog post:
[1] http://www.philipreames.com/Blog/2014/02/21/why-not-use-gcroot/

by reames at June 04, 2014 04:54 PM

May 26, 2014

LLVM Blog

LLVM Weekly - #21, May 26th 2014

Welcome to the 21st issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'm back in the UK and mostly recovered from the ensuing jetlag. I am, however, disturbed that the mailing lists on GMANE don't seem to have been updated for the past week, and I have been unable to find any explanation of what is going on online. GMANE is an important and massively useful aggregator and archiver of free software development lists and I really hope these are only temporary problems. For this issue, I have instead linked directly to the mailman archives at UIUC.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Jonathan Mah has written a Clang plugin for checking key path strings in Objective C code. The implementation is available on Github.

LWN has published an article about ThreadSanitizer v2.

This week, the merge of the AArch64 and the Apple-contributed ARM64 backends was completed. The old AArch64 backend was deleted, and the result of merging code from AArch64 into ARM64 was renamed to AArch64.

A paper 'Static energy consumption analysis of LLVM IR programs' has been posted to arXiv.org.

On the mailing lists

LLVM commits

  • A new attribute, 'nonnull' has been added. When applied to a parameter or return pointer this indicates it is not null, which may allow additional optimisations (at least, avoiding comparisons between that value and null). r209185, r209193.

  • The llvm.arm.undefined intrinsic has been added. This is used to generate the 0xde opcode on ARM. It takes an integer parameter, which might be used by the OS to implement custom behaviour on the trap. r209390.

  • The MIPS disassembler has seen some work. Some support has been added for MIPS64r6 and various issues fixed. r209415.

  • LLVM learned the -pass-remarks-missed and -pass-remarks-analysis command line options. -pass-remarks-missed shows diagnostics when a pass tried to apply a transformation but couldn't. -pass-remarks-analysis shows information about analysis results. r209442.

  • The documentation for the llvm.mem.parallel_loop_access metadata has been updated. r209507.

  • Old AArch64 has been removed and ARM64 renamed to AArch64. r209576, r209577.

Clang commits

  • clang-format has seen more JS support. It can now reformat ES6 arrow functions and ES6 destructuring assignments. r209112, r209113.

  • Experimental checkers for the clang static analyzer are now documented. r209131.

  • Support was added to clang for global named registers, using the LLVM intrinsics which were recently added. r209149.

  • Clang learned the no_split_stack attribute to turn off split stacks on a per-function basis. r209167.

  • Clang learned the flatten attribute. This causes calls within the function to be inlined where possible. r209217.

  • An initial version of codegen for pragma omp simd has been committed. This also adds CGLoopInfo which is a helper for marking memory instructions with llvm.mem.parallel_loop_access metadata. r209411.

  • The pragma clang optimize {on,off} has been implemented. This allows you to selectively disable optimisations for certain functions; a short usage sketch follows this list. r209510.

  • An implementation of Microsoft ABI-compatible RTTI (run-time type information) has landed. r209523.
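
A short usage sketch of the optimize pragma from the list above; functions defined between 'off' and 'on' are compiled without optimisation (the function here is a made-up example):

#pragma clang optimize off
int stepThroughMe(int X) {
  // Handy when you want to single-step one function in an otherwise
  // optimised build without changing the global -O level.
  return X * 2 + 1;
}
#pragma clang optimize on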

Other project commits

  • 'Chained origins' as used by MemorySanitizer has been redesigned. r209284.

by Alex Bradbury (noreply@blogger.com) at May 26, 2014 11:32 AM

May 19, 2014

LLVM Blog

LLVM Weekly - #20, May 19th 2014

Welcome to the twentieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

This week's issue is perhaps a little less thorough than normal. I've been in San Francisco most of the week for Maker Faire this weekend, where I was at the Raspberry Pi booth with some other Foundation members. As this issue goes out, I'll be enjoying my last day in SF before heading to the airport for the long flight home and the ensuing jetlag.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The WebKit blog features an excellent and detailed article about the new Fourth Tier LLVM JIT which sheds light on the how and why.

The Neu framework has recently been announced. It is a C++11 framework, collection of programming languages and software system designed for artificial intelligence applications and technical computing in general. It makes use of the LLVM MC JIT for its NPL language as well as generating high performance neural networks.

On the mailing lists

LLVM commits

  • The inliner has been taught how to preserve musttail invariants. r208910.

  • A new C API has been added for a thread yielding callback. r208945.

  • Another patch in the series to improve MergeFunctions performance has been committed. A total ordering has now been implemented among operations. r208973, r208976.

  • The ARM load/store optimisation pass has been fixed to work with Thumb1. r208992.

  • GlobalValue has been split into GlobalValue and GlobalObject, which allows code to statically accept a Function or a GlobalVariable but not an alias. r208716.

  • Integral reciprocal was optimised to not use division. This optimisation was influenced by Souper. r208750. Another optimisation opportunity uncovered by Souper was signed icmp of -(zext V). r208809.

  • I rather like that these transforms for single bit tests were verified with Z3. r208848.

  • PowerPC gained global named register support, for r1, r2 and r13 (depending on the subtarget). r208509.

  • Documentation was added for the ARM64 BigEndian NEON implementation. r208577.

  • The constant folder is now better at looking through bitcast constant expressions. This is a first step towards fixing the poor performance of these range comprehensions. r208856.

Clang commits

  • Initial support for MS ABI compliant RTTI mangling has been committed. r208661, r208668.

  • Clang will no longer copy objects with trivial, deleted copy constructors. This fixes bugs and improves ABI compatibility with GCC and MSVC. r208786. Though the Itanium C++ ABI part was reverted for now. r208836.

Other project commits

  • The LLDB Machine Interface has been committed. This is an implementation of the GDB Machine Interface, useful for implementing your own frontend to LLDB. r208972.

  • AddressSanitizer started to gain some Windows tests. r208554, r208859, r208873 and more.

  • The instrumented profiling library API was fixed to work with shared objects, and profiling is now supported for dlopened shared libraries. r208940, r209053.

by Alex Bradbury (noreply@blogger.com) at May 19, 2014 01:23 PM

May 12, 2014

LLVM Blog

LLVM Weekly - #19, May 12th 2014

Welcome to the ninteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'm flying out to San Francisco tomorrow and will be there for the Bay Area Maker Faire at the weekend with some other Raspberry Pi Foundation people. If you're around, be sure to say hi.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

LLVM 3.4.1 has been released. This is a bug-fix release so offers API and ABI compatibility with LLVM 3.4. Thanks to everyone who contributed to the release by suggesting or backporting patches, and for testing.

John Regehr has shared some early results and discussion on using Souper (a new superoptimizer for LLVM IR) in combination with Csmith and C-reduce in order to find missed optimisations and then produce minimal test cases. This has already resulted in a new performance bug being filed, with many more, I'm sure, to come.

Crange, a tool to index and cross-reference C/C++ source code built on top of Clang has been released. It aims to offer a more complete database than e.g. ctags, though the running time on a large codebase like the Linux kernel is currently very high.

llgo, the LLVM-based compiler for Go is now self-hosting.

Last week I asked for benchmarks of the new JavascriptCore Fourth Tier LLVM JIT. Arewefastyet from Mozilla now includes such results. FTLJIT does particularly well on asm.js examples.

On the mailing lists

LLVM commits

  • A new algorithm has been implemented for tail call marking. A build of clang now ends up with 470k calls in the IR marked as tail vs 375k before. The total tail call to loop conversions remains the same though. r208017.

  • llvm::function_ref has been introduced and described in the LLVM programmer's manual. It is a type-erased reference to a callable object; a minimal sketch of the idea follows this list. r208025, r208067.

  • Initial support for named register intrinsics (as previously discussed on the mailing list) has landed. Right now, only the stack pointer is supported. Other non-allocatable registers could be supported without too much difficulty; allocatable registers are much harder. r208104.

  • The -disable-cfi option has been removed. LLVM now requires assemblers to support CFI (call frame information) directives in order to generate stack unwinding information. r207979.

  • The superword-level parallelism (SLP) pass is now enabled by default for link time optimisation. r208013.

  • The llvm-cov documentation has been expanded. r208098.

  • The second and third patches of a series to improve MergeFunctions performance to O(n*log(n)) have been merged. r208173, r208189.

  • The standard 'x86-64' CPU used as the default architecture now uses the Sandy Bridge scheduling model in the hope this provides a reasonable default over a wide range of modern x86-64 CPUs. r208230.

  • Custom lowering for the llvm.{u|s}add.with.overflow.i32 intrinsics has been added for ARM. r208435.
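
To illustrate the function_ref item above: the class is essentially a non-owning, type-erased reference to a callable. The following is a minimal sketch of that idea under that assumption, not LLVM's actual implementation:

#include <cstdint>
#include <type_traits>
#include <utility>

template <typename Fn> class FunctionRef;

template <typename Ret, typename... Params>
class FunctionRef<Ret(Params...)> {
  Ret (*Callback)(std::intptr_t Callable, Params... Ps);
  std::intptr_t Callable;

  template <typename C>
  static Ret callbackFn(std::intptr_t Callable, Params... Ps) {
    return (*reinterpret_cast<C *>(Callable))(std::forward<Params>(Ps)...);
  }

public:
  // Non-owning: the referenced callable must outlive the FunctionRef.
  template <typename C>
  FunctionRef(C &&Obj)
      : Callback(callbackFn<typename std::remove_reference<C>::type>),
        Callable(reinterpret_cast<std::intptr_t>(&Obj)) {}

  Ret operator()(Params... Ps) const {
    return Callback(Callable, std::forward<Params>(Ps)...);
  }
};

// Accepts any callable without templating the function itself and without
// the potential allocation of a std::function.
inline int applyTwice(FunctionRef<int(int)> F, int X) { return F(F(X)); }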

Clang commits

  • MSVC ABI compatibility has again been improved. Clang now understands that the 'sret' (a structure return pointer) is passed after 'this' for MSVC. r208458.

  • Initial codegen for OpenMP's #pragma omp parallel has landed; a minimal example follows this list. r208077.

  • Field references to struct names and C++11 aliases are now supported from inline asm. r208053.

  • Parsing and semantic analysis has been implemented for the OpenMP proc_bind clause. r208060.

  • clang-format gained initial support for JavaScript regex literals (yes, clang-format can reformat your JavaScript!). r208281.
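
A minimal example of the directive whose codegen landed above; it needs to be compiled with OpenMP enabled, and the function is invented for illustration:

#include <cstdio>

void hello() {
#pragma omp parallel
  {
    // Each thread in the team executes this block.
    std::printf("hello from one of the threads\n");
  }
}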

Other project commits

  • libcxxabi gained support for ARM zero-cost exception handling. r208466.

  • In libcxx, std::vector gained Address Sanitizer support. r208319.

  • The test suite from OpenUH has been added to the openmp repository. r208472.

by Alex Bradbury (noreply@blogger.com) at May 12, 2014 03:04 PM

May 09, 2014

LLVM Blog

LLVM 3.4.1 Release

LLVM 3.4.1 has been released!  This is a bug-fix release that contains fixes for the AArch64, ARM, PowerPC, R600, and X86 targets as well as a number of other fixes in the core libraries.

The LLVM and Clang core libraries in this release are API and ABI compatible with LLVM 3.4, so projects that make use of the LLVM and Clang API and libraries will not need to make any changes in order to take advantage of the 3.4.1 release.

Bug-fix releases like this are very important for the project, because they help get critical fixes to users faster than the typical 6 month release cycle, and also make it easier for operating system distributors who in the past have had to track and apply bug fixes on their own.

A lot of work went into this release, and special thanks should be given to all the testers who helped to qualify the release:

Renato Golin
Sebastian Dreßler
Ben Pope
Arnaud Allard de Grandmaison
Erik Verbruggen
Hal Finkel
Nikola Smiljanic
Hans Wennborg
Sylvestre Ledru
David Fang

In addition, there were a number of community members who spent time tracking down bugs and helping to resolve merge conflicts in the 3.4 branch.  This is what made this release possible, so thanks to everyone else who helped.

I would like to keep the trend of stable releases going to 3.5.x and beyond (maybe even 3.4.2 if there is enough interest), but this can only be done with the help of the community.  If you would like to help with the next stable release or even a regular release, then the next time you see a proposed release schedule on the mailing list, let the release manager know you can help.  We can never have too many volunteers.

Thanks again to everyone who helped make this release possible.

-Tom





by Tom (noreply@blogger.com) at May 09, 2014 05:28 PM

May 05, 2014

LLVM Blog

LLVM Weekly - #18, May 5th 2014

Welcome to the eighteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'm going to be in the San Francisco area May 13th-20th with some other Raspberry Pi people. We'll be at Maker Faire Bay Area on the 17th and 18th. Let me know if there's anything else I should check out while over there.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Andrew Ruef has written a blog post about using static analysis and Clang to find the SSL heartbleed bug. The code for the checker described in the blog post is available on Github.

The FTL ('Fourth tier LLVM') Javascript JIT is now enabled in WebKit for Mac. The WebKit Wiki has more information. I haven't seen any public benchmark figures. Please do share if you have any.

Eli Bendersky has written an article about how to use libTooling to implement source to source transformations.

The next Paris LLVM Social will take place on May 5th (i.e. this evening).

The LLVM Bay Area social will take place on May 8th. Please RSVP if you are interested.

On the mailing lists

LLVM commits

  • The patch to perform common subexpression elimination for a group of getelementptrs that was discussed a couple of weeks ago has been merged. It is currently only enabled for the NVPTX backend. r207783.

  • X86 code generation has been implemented for the musttail function attribute. r207598.

  • Pass run listeners were added to the pass manager. This adds C/C++ APIs to enable fine-grained progress reporting and safe suspension points. See the commit message for more info. r207430.

  • The optimisation remark system has started to be used, with calls to emitOptimizationRemark added to the loop unroller and vectorizer. r207528, r207574.

  • The SLPVectorizer gained the ability to recognize and vectorize intrinsic math functions. r207901.

Clang commits

  • NRVO (named return value optimisation) determination was rewritten. According to the commit message, "a variable now has NRVO applied if and only if every return statement in that scope returns that variable." Also, NRVO is performed roughly 7% more often in a bootstrap clang build; a short illustration follows this list. r207890.

  • libclang's documentation comment API has been split into a separate header. r207392.

  • The SLPVectorizer (superword-level parallelism) is now disabled at O0, O1 and Oz. r207433. It was later re-enabled at Oz. r207858.

  • The libclang API now supports attributes 'pure', 'const', and 'noduplicate'. r207767.

  • The comment parser no longer attempts to validate HTML attributes (the previous solution was insufficient). r207712.
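
An illustration of the rule quoted in the NRVO item above (the functions are made up): in the first function every return statement returns the same variable, so NRVO can construct the result directly in the caller's storage; in the second, two different variables are returned, so it cannot.

#include <string>

std::string allReturnsAgree(bool Flag) {
  std::string S = "hello";
  if (Flag)
    return S;
  S += " world";
  return S;   // every return in this scope returns S -> NRVO applies
}

std::string returnsDisagree(bool Flag) {
  std::string A = "a";
  std::string B = "b";
  if (Flag)
    return A; // different variables are returned -> no NRVO
  return B;
}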

Other project commits

  • R_MIPS_REL32 relocations are now supported in lld. r207494.

  • A collection of CTRL+C related issues were fixed in lldb. r207816.

by Alex Bradbury (noreply@blogger.com) at May 05, 2014 04:01 PM

April 28, 2014

LLVM Blog

LLVM Weekly - #17, Apr 28th 2014

Welcome to the 17th issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

Last week I wondered why the GCC logo is a GNU leaping out of an egg. Thank you to everyone who wrote in to let me know it is a reference to EGCS. GCC was of course famously forked as EGCS which was later merged back in. Apparently this was pronounced by some as "eggs". Mystery solved.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

GCC 4.9.0 was released last Tuesday. See here for more detailed notes on changes in this release.

Honza Hubička wrote a blog post on the history of linktime optimisation in GCC, which was followed by a post containing a benchmark comparison of LTO in GCC vs LLVM.

On Twitter, @lambdamix drew my attention to Notes on Graph Algorithms Used in Optimizing Compilers (PDF). I imagine it will be of interest to many LLVM Weekly readers.

On the mailing lists

LLVM commits

  • The 'musttail' marker which was proposed several weeks ago has been added. Unlike the 'tail' marker, musttail guarantees that tail call optimization will occur. Check the documentation added in the commit for a more detailed explanation. r207143.

  • The rewrite of BlockFrequencyInfo finally landed. A description of the advantages of the new algorithm is in the original commit message, r206548. After a series of bounces, it landed in r206766.

  • LLVM can now generate PE/COFF object files targeting 'Windows on ARM'. r207345.

  • A CallGraph strongly connected components pass manager has been added making use of the new LazyCallGraph analysis framework. This is part of the new pass manager work Chandler Carruth has been working on and is of course a work in progress. r206745.

  • The scheduler model for the Intel Silvermont microarchitecture has been replaced. The commit message claims substantial improvements on integer tests. I'm assuming RAL in this context refers to RegAllocLocal? r206957.

  • ARM64 has of course seen a large number of changes. Among those, support for feature predicates for NEON/FP/CRYPTO instructions. This allows the compiler to generate code without using those instructions. r206949. Additionally, there is now a big endian version of the ARM64 target machine. r206965.

  • getFileOffset has been dropped from LLVM's C API. Justification is in the commit message. r206750.

  • The LoopVectorize pass now keeps statistics on the number of analyzed loops and the number of vectorized loops. r206956.

  • The x86 backend gained new intrinsics for Read Time Stamp Counter. r207127.

  • Initial work on mutation support for the lazy call graph has landed. As with most of Chandler's commits, there's much more information in the commit message. r206968.

  • MCTargetOptions has been introduced, which for now only contains a single flag. SanitizeAddress enables AddressSanitizer instrumentation of inline assembly. r206971.

  • llvm-cov now supports gcov's --long-file-names option. r207035.

Clang commits

  • Documentation for sample profiling was added. r206994.

  • Support for parsing the linear clause for the 'omp simd' directive was added. r206891.

  • Clang gained support for the -fmodules-search-all option, which searches for symbols in non-imported modules (i.e. those referenced in module maps but not imported). r206977.

Other project commits

  • AddressSanitizer gained an experimental detector for "one definition rule" violations (where two globals with the same name are defined in different modules). r207210.

by Alex Bradbury (noreply@blogger.com) at April 28, 2014 02:50 PM

April 24, 2014

LLVM Blog

LLVM Weekly - #16, Apr 21st 2014

Welcome to the 16th issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

Apologies that last week's LLVM Weekly went out twice via email. Mailgun have the useful ability to schedule an email for the future but, when this is done incorrectly, no way to cancel it via the API. Possibly there is no way for them to cancel it at all; I can't know, as my support ticket on the issue was never answered.

Seeing as it's Easter, does anybody know why GCC has a GNU breaking out of an egg as a logo?

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Emscripten's new backend, which is implemented as an LLVM backend, has now been merged to Emscripten's master branch. This should result in a noticeable speedup in compile times.

Phoronix have published a small set of benchmarks comparing GCC 4.9RC1 and Clang 3.5 HEAD.

Diego Novillo has announced AutoFDO, a tool which will convert profile data generated with Linux Perf to a format compatible with LLVM's sample-based profiler.

The Polly project have minutes from another phone call, this time focusing on delinearization.

On the mailing lists

LLVM commits

  • LLVM's internal BumpPtrAllocator has been switched to using a vector of pointers to slabs rather than a singly linked list, and the underlying allocator is now a template parameter (typical usage is sketched after this list). r206147, r206149. The allocator can now also pass the size to the deallocation function, which improves performance with some libraries (e.g. tcmalloc). r206265.

  • Support for building persistent strongly connected components has been added to the LazyCallGraph. There are detailed comments on the reasoning of this approach and some details on implementation in the commit message. r206581.

  • Constant hoisting has been enabled on PowerPC. r206141.

  • PseudoSourceValue is no longer a subclass of Value. r206255.

  • A DebugInfoVerifier has been implemented. r206300.

  • MIPS gained initial support for the IEEE 754-2008 NaN encoding. r206396.

  • OnDiskHashTable has been moved from Clang to LLVM. r206438.

  • ARM's IR-based atomics pass has been moved from Target to CodeGen, which allows it to be used by ARM64. r206485, r206490.

  • Module verification is now off by default in release builds for the JIT, but this can be overridden. r206561.

  • The Cortex-A53 machine model description has been ported from AArch64 to ARM64. r206652.
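
As a reminder of what the BumpPtrAllocator item above is talking about, here is a minimal usage sketch; the function is invented for illustration and is not taken from the patches themselves.

    #include "llvm/Support/Allocator.h"

    // Minimal sketch of typical BumpPtrAllocator usage (illustrative only).
    void allocateSomeThings() {
      llvm::BumpPtrAllocator Alloc;         // slab-based bump allocation
      void *Raw = Alloc.Allocate(128, 16);  // 128 bytes, 16-byte aligned
      int *Ints = Alloc.Allocate<int>(32);  // templated helper for an array
      (void)Raw; (void)Ints;
      Alloc.Reset();                        // reclaim all allocations in one go
    }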

Clang commits

  • There is now a new hash algorithm for calculating the function hash for instruction profiling, rewritten to help ensure the hash changes when control flow does. r206397.

  • The thread safety analysis SSA pass has been rewritten. r206338.

  • Support for big endian ARM64 was added to Targets.cpp. r206390. It is also now possible to disable NEON and crypto support for ARM64. r206394.

Other project commits

  • LLD now supports --defsym=<symbol>=<symbol>, as supported by GNU LD. r206417.

by Alex Bradbury (noreply@blogger.com) at April 24, 2014 12:17 PM

April 14, 2014

LLVM Blog

LLVM Weekly - #15, Apr 14th 2014

Welcome to the 15th issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Videos are not yet ready, but most slides from last week's EuroLLVM meeting are now up.

ARM have announced the release of version 6 of the ARM compiler, which is now built on LLVM and Clang.

Philip Reames has written an update on his work on late safepoint placement, which is useful for implementing efficient precise garbage collection on LLVM. The bad news is their initial plan did not survive contact with the enemy, though they're hard at work on fixing it and a new update can be expected in good time.

As reported by Phoronix, a number of patches towards the goal of compiling the Linux kernel with clang have been merged by Linus.

The first release candidate of GCC 4.9 has been released. New features in the 4.9 branch are documented here.

Polly had a meeting via phone call, and notes of that meeting are available. The part of most general interest is probably discussion around the potential of merging Polly into the LLVM mainline.

On the mailing lists

LLVM commits

  • The NVPTX backend gained preliminary intrinsics and codegen support for textures and surfaces. r205907.

  • Support for optimisation report diagnostics was added. This starts to implement the idea documented and discussed previously. In the future it will be possible to get a report of the major optimization decisions taken by compiler transformations. r205774, r205775.

  • The merge of AArch64 and ARM64 continues. Named immediate operand mapping logic and enums have been copied from AArch64 to ARM64. r205866. The ARM64 backend has seen a large series of smaller commits as well.

  • Constant hoisting is now enabled for the ARM64 backend. r205791.

  • Previously, optimisation logic in CodeGenPrepare that tried to merge address computation into the memory operation itself (when supported by the platform's addressing modes) would do so by adding integer operations and using ptrtoint and inttoptr. This caused issues when trying to use alias analysis during CodeGen. There is now opt-in support for doing this using GetElementPtr. r206092.

  • The debug info compression support introduced two weeks ago was reverted, and replaced with a new implementation that compresses the whole section rather than a fragment. r205989, r205990.

  • The segmented stack switch has been moved to a function attribute and the old -segmented-stacks command line flag removed. r205997.

Clang commits

  • A major refactoring of the thread safety analysis has been started. r205728, r205745, and more.

  • libclang gained a clang_CXXMethod_isConst method. r205714.

  • As part of the ongoing project to support the MSVC++ ABI, support for #pragma section and related pragmas was added (a rough illustration follows this list). r205810.

  • New command line options were added to support big or little endian for ARM and AArch64. r205966, r205967.
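
For those unfamiliar with the MSVC pragma mentioned above, the snippet below is a rough illustration of the kind of source it enables; it uses the documented MSVC-style syntax and needs Microsoft extensions enabled (e.g. clang-cl), and is not taken from the commit.

    // Illustrative only: MSVC-style section pragmas now understood by Clang.
    #pragma section(".mydata", read, write)              // declare a named section
    __declspec(allocate(".mydata")) int g_counter = 0;   // place a global in it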

Other project commits

  • The openmp project gained the offload directory, which contains code needed to support OpenMP 4.0 target directives. r205909.

by Alex Bradbury (noreply@blogger.com) at April 14, 2014 02:54 PM

April 10, 2014

Philip Reames

Late Safepoint Placement: An Update

A couple of weeks ago, I promised further detail on the late safepoint placement approach.  Since that hasn’t developed – yet – I wanted to give a small update.

All along, we’ve had two designs in mind for representing safepoints.  One was “clearly the right one” for long term usage, but was a bit more complicated to implement.  The other was “good enough” for the moment – we thought – and allowed us to prototype the design quickly.

Not too long after my last post, “good enough” stopped being good enough.  :)

Over the last few weeks, we’ve been rearchitecting our prototype and exploring all the unexpected corner cases.  Nothing too major to date, but I wanted to hold off on describing things in detail until we had some actual hands-on experience.  Once things settle out, I’ll take the time to write it up and share it.

So, in other words, please be patient for a bit longer. :)

by reames at April 10, 2014 04:12 PM

April 07, 2014

LLVM Blog

LLVM Weekly - #14, Apr 7th 2014

Welcome to the fourteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

There seems to have been a flood of LLVM-related news this week, hopefully I've managed to collect it all. If you're in London next week, you might be interested in attending my introductory LLVM talk on Wednesday. Abstract is here.

EuroLLVM is of course taking place on Monday and Tuesday of this week. Sadly I won't be in attendance. If anyone is blogging the event, please do send me links.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The LLVM-related news that has made the biggest splash this week is surely the announcement of Pyston, a JIT for Python targeting LLVM. More technical details are available on the Github repo. For many this immediately conjures up memories of the Unladen Swallow project, started by Google engineers with the same aim of JITting Python with LLVM. That project was eventually unsuccessful, but it's unfair to the authors of Pyston to assume it will have the same fate. It's unclear how much developer time Dropbox are contributing to Pyston. They clearly have a lot of work to do, though it's no secret that Apple are also looking to target LLVM from JavaScript which means they're not the only developers working in this area. Kevin Modzelewski shared some more info on the LLVM mailing list which details some of the LLVM work they've implemented so far (including some initial escape analysis for GCed memory).

An independent, non-profit LLVM Foundation is to be formed. As a vendor-neutral organisation it will represent the community interest and aims to be set up by the end of the year. The initial board of directors will be Vikram Adve, Chandler Carruth, Doug Gregor, David Kipping, Anton Korobeynikov, Chris Lattner, Tanya Lattner, and Alex Rosenberg.

Rust 0.10 has been released. See also the discussion on Hacker News and Reddit. Rust is a systems programming language from Mozilla which uses LLVM as its code generator backend.

The Dagger LLVM-based decompilation framework has released its source as well as publishing a series of five articles documenting its implementation approach and the next steps or 'TODOs'.

An LLVM backend for the Accelerate Array Language has been released. It compiles Accelerate code to LLVM IR and can target multicore CPUs as well as NVIDIA GPUs.

The PDF slides for a recent talk about the LLVM-based MalDiv diversifying compiler have been published. Such a tool effectively defeats signature-based matching of malware.

On the mailing lists

LLVM commits

  • MipsAsmParser and MipsOperand were rewritten. The improvements are documented in the commit message. r205292.

  • The ARM backend gained support for segmented stacks. r205430.

  • Windows on ARM is now supported in the MachineCode (MC) layer. r205459.

  • TargetLowering gained a hook to control when BUILD_VECTOR might be expanded using shuffles. r205230. Targets might choose to use ExpandBVWithShuffles which was added in a later commit. r205243.

  • X86TargetTransformInfo gained getUnrollingPreferences, which is used by the generic loop unroller. This helps to optimise use of the micro-op caches on X86. This produced 7.5%-15% speedups in the TSVC benchmark suite. r205348.

  • ARM gained a nice little optimisation pass that removes duplicated DMB instructions. r205409.

  • Atomic ldrex/strex loops are now expanded in IR rather than at MachineInstr emission time. This cleans up code, but should also make future optimisations easier. r205525.

Clang commits

  • The clang static analyzer gained double-unlock detection in PthreadLockChecker, as well as a check for using locks after they are destroyed. r205274, r205275.

  • The OpenMP 'copyin' clause was implemented. r205164.

  • The 'optnone' attribute was added, which suppresses most optimisations on a function (illustrated after this list). r205255.

  • The heuristics for choosing methods to suggest as corrections were improved, to ignore methods that obviously won't work. r205653.

  • The 'BitwiseConstraintManager' idea was added to the open projects page. r205666.
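
A brief illustration of the new attribute follows; the function is invented for the example, and noinline is added alongside optnone since the two are conventionally paired.

    // Illustrative only: ask the compiler to skip most optimisations for this
    // one function, e.g. to keep it debuggable inside an optimised build.
    __attribute__((optnone, noinline))
    int checksum(const int *data, int n) {
      int sum = 0;
      for (int i = 0; i < n; ++i)
        sum += data[i];
      return sum;
    }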

Other project commits

  • AddressSanitizer can now be used as a shared library on Linux. r205308.

  • compiler-rt gained support for IEEE754 quad precision comparison functions. r205312.

  • lld now supports .gnu.linkonce sections. r205280.

by Alex Bradbury (noreply@blogger.com) at April 07, 2014 01:47 PM

April 06, 2014

LLVM Blog

LLVM Weekly - #13, Mar 31st 2014

Welcome to the thirteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

Thanks in no small part to a mention on the Raspberry Pi blog, Learning Python with Raspberry Pi by myself and Ben Everard is at the time of writing #1 in the Programming books section on Amazon UK. Also, keep your eyes on the X-Dev London meetup page as I'm expecting to give an LLVM-related talk there on the 9th April, though it's not listed yet and is subject to change.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

It's only a week to go until EuroLLVM 2014, which will be held in Edinburgh on the 7th and 8th of April. Tragically I'm not going to be there as I'm trying to focus on getting my PhD finished, but the schedule looks fantastic.

The Linux Collaboration Summit featured an update on progress of the LLVMLinux project to build the Linux kernel using LLVM/Clang (slides). As of right now, there are approximately 48 kernel patches still working their way upstream for the project.

John Regehr has written an interesting blog post on the subject of using Z3 to prove some things about LLVM optimisations.

Facebook have released the Warp C and C++ preprocessor, written in D. It claims to benchmark much faster than GCC's preprocessor resulting in faster build times, though a quick comparison with Clang didn't show it in a favourable light speed-wise.

Meeting C++ have published a helpful summary of what might make its way into C++17 or C++1y.

On the mailing lists

  • Apple are contributing their 64-bit ARM backend upstream. Initially, this will co-exist with the current AArch64 backend (the Apple implementation is called ARM64), and over time the backends will be merged.

  • Tom Stellard has announced a tentative release schedule for LLVM and Clang 3.4.1 and is searching for volunteers to test, as well as nominations for patches that should be included. The proposed schedule is: Mar 26 - April 9, identify and backport additional bug fixes to the 3.4 branch; April 9 - April 18, testing phase; April 18, 3.4.1 release.

  • Frank Winter started a discussion on how to specify the alignment of a pointer in LLVM IR, which yields some interesting responses.

  • Renato Golin kicked off a discussion about supporting named registers in LLVM/Clang. This is a GNU extension not currently supported. There seemed to be some agreement that this is worth supporting, which resulted in a follow-on thread on how to implement support for named registers.

  • A query from Geoffrey Irving about how to safely make use of the floating point rounding mode resulted in an interesting discussion about how changing rounding modes could be supported, for example with the introduction of an fp_rounding_sensitive annotation (see the sketch below).
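
To make the rounding-mode discussion above concrete, here is a small, hedged sketch of the kind of code involved; the point is that, without some annotation, the optimiser is free to assume the default rounding mode.

    #include <cfenv>
    #include <cstdio>

    // Illustrative only: the compiler may constant-fold 1.0/3.0 at build time,
    // silently ignoring the runtime rounding mode -- exactly the problem the
    // thread is about.
    int main() {
      std::fesetround(FE_UPWARD);
      double third = 1.0 / 3.0;
      std::fesetround(FE_TONEAREST);
      std::printf("%.20f\n", third);
    }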

LLVM commits

  • The ARM big-endian targets armeb and thumbeb have been added. r205007.

  • Apple's ARM64 backend has been merged, and will for a time live side-by-side with the existing AArch64 backend (see 'on the mailing lists' for more details). r205090.

  • The @llvm.clear_cache builtin has been born (a usage sketch follows this list). r204802, r204806.

  • Windows target triple spellings have been canonicalised. See the commit for full details, but in short i686-pc-win32 is now i686-pc-windows-msvc, i686-pc-mingw32 is now i686-pc-windows-gnu and i686-pc-cygwin is now i686-pc-windows-cygnus. r204977.

  • The first step towards little-endian code generation for PowerPC has been committed. This initial patch allows the PowerPC backend to produce little-endian ELF objects. r204634.

  • Another LLVM optimisation pass has been fixed to be address space aware, and will no longer perform an addrspacecast. r204733.

  • It is now disallowed for an alias to point to a weak alias. r204934.

  • CloneFunctions will now clone all attributes, including the calling convention. r204866.

  • DebugInfo gained support for compressed debug info sections. r204958.
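
For context on the @llvm.clear_cache item above, the sketch below uses the corresponding __builtin___clear_cache builtin from C++; the function and its buffer handling are invented for illustration.

    #include <cstddef>
    #include <cstring>

    // Illustrative only: after writing machine code into a buffer (as a JIT
    // does), the instruction cache must be flushed on architectures such as
    // ARM or MIPS before the new code can safely be executed.
    void publishCode(char *buf, const char *code, std::size_t len) {
      std::memcpy(buf, code, len);
      __builtin___clear_cache(buf, buf + len);  // lowers to @llvm.clear_cache
    }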

Clang commits

  • The static analyzer is now aware of M_ZERO and __GFP_ZERO flags for kernel mallocs. r204832.

  • Clang learned how to de-duplicate strings the MSVC way. r204675.

  • Capability attributes can be declared on a typedef declaration as well as a structure declaration. r204657.

  • module.private.modulemap and module_private.map are now documented. r205030.

  • Clang's CodeGen module now allows different RTTI emission strategies. This was added for ARM64. r205101.

Other project commits

  • ThreadSanitizer has new benchmarks for synchronization handling. r204608.

  • Initial infrastructure for IEEE quad precision was added to compiler-rt. r204999.

  • LLD gained the --allow-multiple-definition and --defsym options. r205015, r205029.

  • In LLDB, JITed functions can now have debug info and be debugged with debug and source info. r204682.

  • ThreadSanitizer vector clock operations have been optimized and are now O(1) for several important use cases. r204656.

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:30 PM

LLVM Weekly - #12, Mar 24th 2014

Welcome to the twelfth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

On the mailing lists

LLVM commits

  • The MIPS64r2-based Octeon CPU has been added. r204337.

  • The ProfileData library, discussed last week, was committed. r204482, r204489.

  • The constant hoisting pass saw some refactoring and improvements. r204431, r204433, r204435, r204537.

  • The ARM integrated assembler learned how to handle the .thumb_set directive. r204059.

  • Assembler directives were added to create version-min load commands for iOS or Mac OSX. e.g. .ios_version_min 5,2,0. r204190.

  • It is now possible to specify the 'noduplicate' attribute for intrinsics. r204200.

  • The TableGen backends documentation was fleshed out a bit. r204479.

  • Scheduling annotations have been added to NEON AArch64 instructions. r204505.

Clang commits

  • Counters used in instrumentation-based profiling are now represented in a static array. This is the first commit of a larger project to reduce runtime overhead (initialization in particular) for instrumentation-based profiling. r204080. Other commits for instrumentation-based profiling include r204186, r204379, r204390. There's a matching set of commits in compiler-rt.

  • The deprecated -faddress-sanitizer, -fthread-sanitizer, and -fcatch-undefined-behavior flags were removed. Users should use -fsanitize= instead. r204330.

  • Support for parsing the OpenMP safelen clause (for 'omp simd') was committed. r204428.

Other project commits

  • Support was added to MemorySanitizer for 'advanced origin tracking', which records all locations where an uninitialized value is stored to memory rather than just the creation point (see the sketch after this list). r204152.

  • The lldb backtrace view has been changed to a process view where you can expand the process, its threads, and see all frames under each thread. r204251.

  • In compiler-rt, Google have re-licensed the Android ucontext implementation under the standard dual license of compiler-rt. r204128.
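
A hedged sketch of the sort of program where origin tracking helps follows; the flags in the comment are MemorySanitizer's usual documented ones rather than anything introduced by this commit, and the program itself is invented.

    // Illustrative only. Build with something like:
    //   clang++ -O0 -g -fsanitize=memory -fsanitize-memory-track-origins=2 origins.cpp
    // With origin tracking, the report shows not just the use of the
    // uninitialized value but the stores that propagated it along the way.
    int main() {
      int uninit;          // never initialized
      int copy = uninit;   // the poisoned value is stored into 'copy'
      int arr[1];
      arr[0] = copy;       // ...and again into arr[0]
      if (arr[0])          // use of an uninitialized value is reported here
        return 1;
      return 0;
    }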

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:30 PM

LLVM Weekly - #11, Mar 17th 2014

Welcome to the eleventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

It seems an extra comma slipped in to my bio in Learning Python with Raspberry Pi (US) meaning rather than being described as a "compiler hacker, ...", I am a "compiler, hacker, Linux geek, and Free Software enthusiast". It's therefore official, I am a compiler. Presumably this makes me uniquely suited to writing LLVM Weekly.

Previously I've only linked to internship opportunities rather than job ads. I'd be interested in how readers feel about linking to job ads looking for someone with LLVM experience? Do let me know via email or Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

On the mailing lists

LLVM commits

  • I don't believe I made mention of this last week, but it's been decided that virtual methods that override their base class should be marked with the override keyword (and that the virtual keyword would then be considered redundant). r203433, r203442, and others.

  • Support for NaCl on MIPS was developed further, with the addition of sandboxing for loads, stores, stack pointer changes, and function calls. r203484, r203606.

  • As discussed in an RFC last week, cmpxchg now has a second ordering operand which describes the required semantics in case no exchange takes place. r203559.

  • An optimisation was added so that switch-to-lookup-table conversion can be done by adding a bitmask check. An example is given in the commit message, and a small illustrative switch follows this list. r203694.

  • The sample LLVM project has been removed. It has bitrotted over time and doesn't include CMake support at a time that LLVM is moving away from autoconf/automake. r203729.

  • The PowerPC backend learned basic support for the VSX instruction set extensions. r203768.

  • A patchset improving MergeFunctions time complexity from O(N*N) to O(N*log(N)) was merged. r203788.

  • MachineRegisterInfo has been undergoing some major refactoring in order to allow the use of C++11 range-based for loops. r203865.

  • The linker_private and linker_private_weak linkage types were removed. r203866.
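
To illustrate the switch-to-lookup-table item above, here is an invented example of the shape of switch involved; whether the transformation actually fires depends on the target and on SimplifyCFG's heuristics.

    // Illustrative only: a switch over a small range with holes. SimplifyCFG
    // can turn this into a constant table, using a bitmask test to decide
    // whether a given value is covered or should take the default path.
    const char *dayCode(int day) {
      switch (day) {
      case 0: return "Mon";
      case 1: return "Tue";
      case 2: return "Wed";
      case 4: return "Fri";   // note the hole at 3
      case 6: return "Sun";   // and at 5
      default: return "???";
      }
    }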

Clang commits

  • Clang will now produce a warning when an invalid ordering is passed to one of the atomic builtins. r203561, r203564.

  • In the world of profile guided optimisation (PGO), PGO counters are now scaled down to 32 bits when necessary instead of just truncated. r203592.

  • The static analyzer gained support for detecting when pointers to const but uninitialized memory are passed. r203822.

  • The -Wunreachable-code diagnostic has been broken up into different diagnostic groups to provide access to unreachable code checks for cases where the default heuristics of -Wunreachable-code aren't enough. r203994.

Other project commits

  • lld now has a todo list containing a listing of missing GNU ld command line options. r203491.

  • lldb saw some reworking of how the ShouldStopHere mechanism works. This allows a mode where stepping out of a frame into a frame with no debug information will continue stepping until it arrives at a frame that does have debug information. r203747.

  • The Polly build system has been updated so the Makefile builds a single monolithic LLVMPolly.so. r203952.

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:30 PM

LLVM Weekly - #10, Mar 10th 2014

Welcome to the tenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

This week the book I authored in collaboration with Ben Everard, Learning Python with Raspberry Pi (Amazon US) is officially released.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

John Regehr has written a blog post detailing some ideas for implementing a superoptimizer for LLVM. There's some good stuff in the comments too.

Version 2.1 of the Capstone disassembly framework has been released. Library size is about 40% smaller, memory usage is down 40% and there are performance improvements as well.

Facebook have released Chisel, a collection of LLDB commands to assist debugging iOS apps.

A fork of vim-lldb, the LLDB plugin for Vim included in the lldb repo has appeared. Changes so far have been relatively minor.

On the mailing lists

  • Probably one of the most interesting discussions on the mailing list this week came from Mark Seaborn's questions about upstreaming PNaCl's IR simplification passes. Both PNaCl and the new Emscripten fastcomp backend make use of a series of out-of-tree IR-to-IR passes that simplify the IR by lowering complex features to simpler ones. The suggestion is to upstream these so that everyone can benefit. Chandler Carruth raises concerns that these passes might bitrot without any in-tree users, suggesting that the PNaCl and Emscripten communities could do more to contribute to upstream LLVM and that such a track record would help ease that concern. There are a number of people chiming in to say they would find the functionality useful.

  • Diego Novillo asks whether people would be interested in an optimization report facility. This would give useful information about the decisions taken by various optimisers, which might be useful when investigating why code doesn't perform as expected and in finding missed optimisations. Unsurprisingly, everyone thinks this would be a good feature. Diego promises to write some more detailed, concrete proposals in the coming days now it's clear people like the general idea.

  • Kev Kitchens asks about cross-compiling compiler-rt. Vadim Chugunov helpfully points to his work on Rust's fork of compiler-rt to support this. It's also pointed out in the thread that support for unwinding actually lives in libc++abi rather than in a separate libunwind where you might expect to find it.

  • There's a pretty long discussion on naming for reverse iterators and other issues which should make entertaining reading for anyone who enjoys bikeshedding (not that there's anything wrong with it, there's some good discussion of API naming principles there).

  • Tim Northover has shared an RFC (with initial patch) on adding a second ordering operand to cmpxchg which indicates the failure ordering.

LLVM commits

  • A new implementation of the PBQP (partitioned boolean quadratic programming) based register allocator landed. r202735. The original commit (r202551) message details the changes, including massively reduced memory consumption. Average memory reduction is claimed to be 400x. The tagline is "PBQP: No longer feasting upon every last byte of your RAM".

  • AArch64 gained a machine description for Cortex-A53, which involved giving all non-NEON instructions scheduling annotations. r203125 (this was subsequently reverted).

  • The SPARC backend gained support for the VIS SIMD instruction set extensions. r202660.

  • A whole bunch of classes moved around, with the intention that LLVM's support library will work when building with C++ modules. r202814 and many others. InstIterator, GetElementPtrTypeIterator, CallSite, PatternMatch, ValueHandle, ValueMap, CFG, ConstantFolder, NoFolder, PredIteratorCache, ConstantRange, PassNameParser, and LeakDetector moved from Support to IR.

  • The PROLOG_LABEL TargetOpcode was replaced with CFI_INSTRUCTION which is intended to have simpler semantics and be less convoluted to use. See the commit message for more details. r203204.

  • Uses of OwningPtr<T> were replaced with std::unique_ptr<T> (the idiom is illustrated after this list). r203083.

  • The inalloca grammar was cleaned up. r203376.
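
For readers not yet used to C++11, the sketch below shows the std::unique_ptr idiom that replaces OwningPtr; the types and functions are invented for illustration rather than taken from the commit.

    #include <memory>
    #include <string>

    // Illustrative only: std::unique_ptr expresses single ownership, and
    // transferring that ownership is spelled out explicitly with std::move.
    std::unique_ptr<std::string> makeName() {
      return std::unique_ptr<std::string>(new std::string("llvm"));
    }

    void consume(std::unique_ptr<std::string> name) { (void)name; }

    void example() {
      std::unique_ptr<std::string> n = makeName();
      consume(std::move(n));   // 'n' no longer owns the string from here on
    }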

Clang commits

  • The PGO (profile-guided optimisation) code got some minor refactoring in preparation for handling non-C/C++ code, and initial support for Objective-C blocks. r203073, r203157.

  • Module dependences are now included in the dependency files created by -MD/-MMD etc. r203208.

  • The -Wunreachable-code option no longer warns about dead code guarded by a configuration value. r202912.

  • The MSVC mangling scheme is not used for statics or types which aren't visible. r202951.

Other project commits

  • LLDB now supports JIT debugging on Linux using the GDB JIT interface. r202956.

  • Polly started emitting llvm.loop metadata for parallel loops. r202854.

  • In compiler-rt, assembler implementations of __sync_fetch_and_* for ARM were committed. r202812.

  • The level of Windows support in LLD has been documented. r203017.

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:30 PM

LLVM Weekly - #9, Mar 3rd 2014

Welcome to the ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

As well as growing another year older last week, I've also started publicising the book I authored with Ben Everard, Learning Python with Raspberry Pi (Amazon US) which should ship soon in paperback or is available right now for Kindle. Hopefully it should be available soon in DRM-free digital formats on oreilly.com. I will be putting more of my Raspberry Pi exploits and tutorials on muxup.com, so if that interests you follow @muxup.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The list of mentoring organisations for Google Summer of Code 2014 has been released. LLVM is one of them, so any budding compiler engineers who qualify may want to check out the ideas page. Other organisations I spotted advertising relevant project ideas are the Linux Foundation, X.org and of course GCC.

At the end of last week, Broadcom made a major step forward in announcing the release of full register level documentation for the VideoCore IV graphics engine as well as full graphics driver source. The device most well-known for featuring VideoCore IV is the Raspberry Pi. The released documentation opens the door to producing something similar to the GPU-accelerated FFT library support that was recently released. Some readers of LLVM Weekly may of course be interested in using this information to produce an LLVM backend. Hopefully the following pointers will help. There are lots of resources linked to at the homepage of the VideoCore IV reverse engineering project. I'd draw particular attention to the QPU reverse engineering effort which contains good information despite the reverse engineering part of the work being made unnecessary by the Broadcom release. You may want to check out the raspi-internals mailing list and #raspberrypi-internals on Freenode. It's also worth looking at the commented disassembly of the VideoCore FFT code and Herman Hermitage's work in progress QPU tutorial.

Code for Fracture, an architecture-independent decompiler to LLVM IR has been released.

Olivier Goffart has written about his proof of concept reimplementation of Qt's moc using libclang. It's actually from last year, but it's new to me.

Alex Denisov has written a guide to writing a clang plugin. He gives an example of a minimal plugin that complains about lowercased Objective C class names.

Coursera are re-running their compilers course on March the 17th. See Dirkjan Ochtman's impressions of the course from the previous run.

The Qualcomm LLVM team are advertising for an intern.

On the mailing lists

LLVM commits

  • LLVM grew a big-endian AArch64 target. r202024. Some might consider it a step back, but apparently there's a decent number of people interested in big-endian on AArch64. There's an interesting presentation from ARM about running a virtualised BE guest on a LE host.

  • The flipping of the C++11 switch has allowed a number of simplifications to start to make their way in to the LLVM codebase. For instance, turning simple functors into lambdas (illustrated after this list). Like or loathe C++11 lambda syntax, they're certainly less verbose. r202588. OwningPtr<T> gained support for being converted to and from std::unique_ptr<T>, which lays the ground for LLVM moving to using std::unique_ptr in the future. r202609.

  • The coding standards document was updated to reflect the C++11 features that can now be used in the LLVM/Clang codebase and to provide guidance on their use. r202497, r202620.

  • The loop vectorizer is now included in the LTO optimisation pipeline by default. r202051.

  • DataLayout has been converted to be a plain object rather than a pass. A DataLayoutPass which holds a DataLayout has been introduced. r202168.

  • The PowerPC backend learned to track condition register bits, which produced measurable speedups (10-35%) for the POWER7 benchmark suite. r202451.

  • X86 SSE-related instructions gained a scheduling model. Sadly there is no indication whether this makes any measurable difference to common benchmarks. r202065.

  • The scalar replacement of aggregates pass (SROA) got a number of refactorings and bug fixes from Chandler Carruth, including some bug fixes for handling pointers from address spaces other than the default. r202092, r202247, and more.

  • An experimental implementation of an invalid-pointer-pair detector was added as part of AddressSanitizer. This attempts to identify when two unrelated pointers are compared or subtracted. r202389.

  • Shed a tear, for libtool has been removed from the LLVM build system. The commit says it was only being used to find the shared library extension and nm. The diffstat of 93 insertions and 35277 deletions speaks for itself. r202524.
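
As an illustration of the functor-to-lambda simplification mentioned above, here is a generic C++ comparison (invented for the example, not code from the commit):

    #include <algorithm>
    #include <string>
    #include <vector>

    // Before C++11: a named one-off comparator functor.
    struct ShorterThan {
      bool operator()(const std::string &a, const std::string &b) const {
        return a.size() < b.size();
      }
    };
    void sortOld(std::vector<std::string> &v) {
      std::sort(v.begin(), v.end(), ShorterThan());
    }

    // With C++11: the same comparison written as a lambda at the point of use.
    void sortNew(std::vector<std::string> &v) {
      std::sort(v.begin(), v.end(),
                [](const std::string &a, const std::string &b) {
                  return a.size() < b.size();
                });
    }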

Clang commits

  • The initial changes needed for omp simd directive support were landed (a sketch of the directive follows this list). r202360.

  • The -Wabsolute-value warning was committed, which will warn for several cases of misuse of absolute value functions. It will warn when using e.g. an int absolute value function on a float, or when using it on a type of the wrong size (e.g. using abs rather than llabs on a long long), or when taking the absolute value of an unsigned value. r202211.

  • An API was added to libclang to create a buffer with a JSON virtual file overlay description. r202105.

  • The driver option -ivfsoverlay was added, which reads the description of a virtual filesystem from a file and overlays it over the real file system. r202176.

  • CFG edges have been reworked to encode potentially unreachable edges. This involved adding the AdjacentBlock class, which encodes whether the block is reachable or not. r202325.

  • The 'remark' diagnostic type was added. This provides additional information to the user (e.g. information from the vectorizer about loops that have been vectorized). r202475.
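
For anyone who hasn't seen the directive referred to in the omp simd item above, here is a minimal sketch of what it looks like in source; the function is invented for illustration, and Clang's support is still being built up.

    // Illustrative only: 'omp simd' asks the compiler to vectorize the loop.
    void scale(float *a, float s, int n) {
    #pragma omp simd
      for (int i = 0; i < n; ++i)
        a[i] *= s;
    }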

Other project commits

  • The compiler-rt subproject now has a CODE_OWNERS.txt to indicate who is primarily responsible for each part of the project. r202377.

  • A standalone deadlock detector was added to ThreadSanitizer. r202505.

  • The OpenMP runtime has been ported to FreeBSD. r202478.

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:30 PM

LLVM Weekly - #8, Feb 24th 2014

Welcome to the eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

I'll be at the Raspberry Jamboree at the end of the week, so if you're going as well be sure to say hi.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

  • As a followup to the proposal that Philip Reames posted last week, where he described plans for contributing precise garbage collection support to LLVM, he has written a blog post about why the llvm.gcroot intrinsic is insufficient for this purpose. A followup post is promised describing the approach they've decided to take. The article is written so as to be accessible to those who may not be familiar with GC implementation. He also raises some interesting issues with gcroot, even when using it for a non-relocating collector.

  • Some may remember the Dagger project for decompilation of programs to LLVM IR from EuroLLVM 2013 (slides). The promised source code release didn't happen, but the developers have posted an update detailing what they've been up to. There have been a lot of design changes, and some of the work has been submitted upstream as patches to LLVM MC. "At this point we don't really have a schedule; whenever we feel a patch is ready to go, we submit it to the community. The goal being, once we're done, our work becomes a full part of LLVM, where we and all contributors can continue to advance it!"

  • Tamás Szelei has written up a useful guide to implementing a code generator with libclang and Python.

  • The Emscripten project is getting ready to use its 'fastcomp' LLVM backend by default. Previously they had a series of passes written in Javascript to convert from LLVM IR to Javascript, but this is now implemented as a C++ LLVM backend. See their wiki for more info.

  • Agner has updated his popular optimisation manuals to include test results for AMD Steamroller processors, as well as adding some more AVX-512 information.

  • The DWARF Debugging Information Format Committee are welcoming comments, suggestions or proposals for changes to DWARF until March 31st. Although DWARF Version 5 is 'nearing completion', it seems that no drafts have been published so you'll have to base your comments on DWARF 4. Do drop me a note if you know otherwise.

  • There have been several updates to http://llvm.org/apt/. The upcoming Ubuntu 14.04 is now a supported distribution, and additionally both the stable and development versions of LLVM/Clang are built and can be installed side-by-side.

On the mailing lists

  • Renato Golin reminds us that although work is underway to update all buildbots to support C++11, the switch to use -std=c++11 has not yet been flipped, so you'll have to hold off on using C++11 features in LLVM/Clang patches for just a little longer.

  • Saleem Abulrasool points to an issue running recent Clang on the Linux kernel related to the integrated assembler. As regular readers will know, the behaviour was recently changed so that for backends which have an integrated assembler, any inline assembly will be validated by it during compilation, even when compiling with -S (i.e. outputting assembly). The problem is that the Linux kernel is purposely including invalid assembly in some cases when outputting assembler files. Early responses are in favour of keeping the current behaviour; people who are doing weird and wacky things can just use the -no-integrated-as switch. An illustrative example follows this list.

  • In response to the earlier discussion about unwind behaviour in LLVM/Clang, Logan Chien has posted a detailed description of the problems he sees.

  • Kevin Qin writes asking about adding register allocation constraints. Often the mailing list threads which get highlighted in LLVM Weekly are about particularly hairy problems that don't currently have a good solution. I'm happy to see this problem has a simple solution though, as Tim Northover and Quentin Colombet point out the @earlyclobber constraint can be used to ensure the output register allocated is different to the input registers.

  • While working on changes to CodeGenPrepare, Quentin Colombet noted his patch would introduce a dependency from libLLVMScalarOpts (where CodeGenPrepare currently lives) to libLLVMCodeGen. He writes to the list asking for views on how to solve this problem. The forming consensus seems to be that it should just be moved to CodeGen. Potentially, any IR pass that depends directly on TargetLowering should be moved also. The move of CodeGenPrepare to lib/CodeGen has now been committed.

  • Per Viberg is soliciting comments on his design draft for improving the detection of uninitialized arguments.
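
To make the inline-assembly discussion above concrete, here is an invented x86 example; the -S and -no-integrated-as flags are the ones mentioned in the thread, and the function itself is purely illustrative.

    // Illustrative only (assumes an x86 target). Inline assembly like this is
    // now parsed by the integrated assembler even when only emitting assembly:
    //   clang++ -S example.cpp                    # inline asm is validated
    //   clang++ -S -no-integrated-as example.cpp  # previous behaviour
    unsigned addChecked(unsigned a, unsigned b) {
      unsigned out = a;
      __asm__("addl %1, %0" : "+r"(out) : "r"(b));
      return out;
    }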

LLVM commits

  • The llvm-profdata tool was introduced. This tool will merge profile data generated by PGO instrumentation in Clang, though may later pick up more functionality. r201535.

  • In a long overdue cleanup, various member variables were renamed from TD to DL to match the renaming of TargetData to DataLayout. r201581, r201827, r201833. Additionally, DebugLoc variables which were named DL have now been renamed to DbgLoc so as not to be confused with DataLayout. r201606.

  • MCAsmParser now supports required parameters in macros, increasing GNU assembler compatibility. r201630.

  • A new TargetLowering hook, isVectorShiftByScalarCheap, was added to indicate whether it's significantly cheaper to shift a vector by a scalar rather than by another vector with different values for different lanes. This is used by the new OptimizeShuffleVectorInst in CodeGenPrepare, which tries to sink shufflevector instructions down to the basic block in which they're used so CodeGen can determine if the right-hand side of a shift is really a scalar. r201655.

  • Private linkage is now properly supported for MachO. r201700.

  • getNameWithPrefix and getSymbol were moved from TargetLowering to TargetMachine, which removes the dependency from Target to CodeGen. r201711.

Clang commits

  • The PGO instrumentation will now compute counts in a separate AST traversal. The reasons for and advantages of this change are documented in detail in the commit message. r201528.

  • Some initial work was committed on documenting available attributes in Clang. Attribute authors are encouraged to submit missing documentation (the method of documentation is described in the addition to the InternalManual.rst). r201515.

  • The IdenticalExprChecker has been extended to check the two branches of an if as well as logical and bitwise expressions. For those not familiar, this checker tries to warn about the unintended use of identical expressions. r201701, r201702.

  • CGRecordLayoutBuilder has been completely rewritten to remove cruft, simplify the implementation, and to work in one pass. r201907.

  • The CastSizeChecker was taught how to correctly deal with flexible array members. r201583.

  • A number of thread-safety attributes have been renamed (with their old names silently deprecated): e.g. lockable is now capability and exclusive_locks_required is now requires_capability (illustrated after this list). r201585. Additionally, the documentation was updated and greatly expanded. r201598.

  • Initial virtual file system support discussed previously on the mailing list has landed. r201618, r201635.

  • The vcvtX intrinsics were added for v8 ARM as opposed to only being recognised when targeting AArch64. r201661.

  • The hard-float ARM EABI (often known as gnueabihf) is now supported for FreeBSD. r201662.

  • Clang will now provide max_align_t in C11 and C++11 modes. Note the complaint in the commit message though that max_align_t as defined is not 'good' or 'useful'. r201729.

  • Again, there were a number of commits related to increasing compatibility with the MS ABI. None of them immediately leaped out at me as worth highlighting individually, so I recommend you have a flick through last week's commits if you're particularly interested.
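
The renamed thread-safety attributes look like this in use; the example is invented, is checked with -Wthread-safety, and the attribute names follow Clang's thread safety analysis documentation.

    // Illustrative only: 'capability' replaces 'lockable',
    // 'requires_capability' replaces 'exclusive_locks_required', and so on.
    struct __attribute__((capability("mutex"))) Mutex {
      void lock()   __attribute__((acquire_capability()));
      void unlock() __attribute__((release_capability()));
    };

    Mutex mu;
    int counter __attribute__((guarded_by(mu)));

    // Callers must already hold 'mu'; -Wthread-safety checks this statically.
    void increment() __attribute__((requires_capability(mu)));
    void increment() { ++counter; }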

Other project commits

  • Rudimentary support for a standalone compiler-rt build system was added, which will allow the compiler-rt libraries to be built separately from LLVM/Clang. r201647, r201656.

  • Assembly functions for AddressSanitizer on x86/amd64 were added. r201650.

  • LLDB gained a hardware watchpoint implementation for FreeBSD. r201706.

  • Polly gained support for polyhedral dead code elimination. r201817.

  • A patch was added to lldb to provide initial support for the Hexagon DSP. r201665.

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:30 PM

LLVM Weekly - #7, Feb 17th 2014

Welcome to the seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

As always, apologies if I didn't pick up your favourite mailing list discussion or commits. Do drop me a line if you think I'm missing something interesting. There haven't been that many external news stories or blog posts (that I've found) in the last week, but it's been a particularly busy week on the mailing lists with a whole bunch of interesting discussions or RFCs.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The ELLCC Embedded Compiler Collection now has precompiled builds available for ARM, i386, Mips, PowerPC and x86-64. Handily, these are all statically linked. The aim of ELLCC is to provide an easy to use multi-target cross compilation environment for embedded systems, and is of course based on LLVM and clang.

The next Waterloo Region LLVM Social has been announced for Feb 20th.

The next Paris LLVM Social will take place on Feb 24th, hosted by Mozilla.

On the mailing lists

LLVM commits

  • AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC (MachineCode) support. The MCAsmInfo::UseIntegratedAS field was added. When true, the integrated assembler will parse inline assembly even when emitting assembly output. r201237.

  • The RTDyld API learned to precompute the amount of memory required for all sections in a module, and gained reserveAllocationSpace to reserve that amount of space up front. r201259.

  • The LTO API gained lto_module_create_from_memory_with_path, which is mainly useful for emitting warning messages from the linker. r201114.

  • ScalarEvolution will now analyze the trip count of loops where there is a switch guarding the exit. r201159.

  • The notes on Phabricator now include a suggestion to provide a link to the Differential revision (the code review) on commits that have been reviewed in this way. Please do this! r201160.

  • The AArch64 backend now recognises Cortex-A53 and Cortex-A57. r201305.

  • The ARM integrated assembler gained partial support for the .arch_extension directive. r201471.

Clang commits

  • There have been yet more updates to Clang's Microsoft ABI compatibility. Clang now understands Microsoft's pointers_to_members pragma, r201105, as well as the vtordisp pragma, r201274. In other MS ABI developments, the -vm{b,g,s,m,v} flags are now supported too. r201175.

  • The command line option -fstack-protector-strong was added. r201120. See also r200601 for info on the sspstrong function attribute.

  • Frontend support for the OpenMP if clause was committed. r201297.

  • You can use the --rtlib=compiler-rt flag to attempt to link against compiler-rt rather than libgcc. r201307.

  • The -Wignored-pragmas diagnostic was added, that will warn in any case where a pragma would have a side effect but is ignored. r201102.

  • The name of the checker producing each diagnostic message is now available through getCheckName(). This can be used to enable/disable a specific checker. r201186.

  • Clang now understands the -fbuild-session-timestamp= and -fmodules-validate-once-per-build-session flags which allows you to make it verify source files for a module only once during a build. r201224, r201225.

Other project commits

  • The sanitizer projects in compiler-rt gained the beginnings of a deadlock detector. r201302, r201407.

  • The original compiler-rt functions (i.e. those that act as a libgcc replacement) now live in the lib/builtins directory. r201393.

  • In lldb, the user can now specify trap-handler-names, a list of functions which should be treated as trap handlers. r201386.

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:29 PM

LLVM Weekly - #6, Feb 10th 2014

Welcome to the sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter. I've been keeping the @llvmweekly Twitter account updated throughout the week, so follow that if you want more frequent news updates.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Alexi Starovoitov has published an LLVM backend targeting an extended version of the Linux Kernel's BPF. An example of the sort of program that might be compiled and run via BPF can be found here.

There is now under a week to go to submit proposals for presentations, tutorials, posters, etc for the upcoming EuroLLVM 2014. Get writing!

LWN's coverage of the recent discussion about LLVM and its licensing on the GCC mailing list is now available to non-subscribers.

Renato Golin posted to the GCC mailing list suggesting there be more collaboration where possible on issues such as standardisation of command line interfaces, language extensions, or just general technical discussion. I know a mailing list that GCC developers who want to keep abreast of LLVM/Clang developments should subscribe to...

Phoronix has published a benchmark comparing GCC 4.8.2, a GCC 4.9 snapshot and Clang 3.4 on an Intel Core i5-4670 system.

On the mailing lists

LLVM commits

  • The x86 backend was slightly simplified by moving some matching for x86 bit manipulation instructions from X86ISelLowering.cpp to X86InstrInfo.td. I mention this commit mainly as it's a useful reference for those of you working on LLVM backend code. r200824.

  • The register allocator gained a new 'last chance recoloring mechanism'. Sadly the commit message doesn't include any data of how this improves register allocation for a given codebase. r200883.

  • The old SmallPtrSetImpl was renamed to SmallPtrSetImplBase, and a new SmallPtrSetImpl was introduced. This new SmallPtrSetImpl doesn't require a specific set size to be specified in its templated parameter. r200688.

  • A whole bunch of code was added to CodeGenPrepare which attempts to move sign extensions away from loads in order to increase the chance that the address computation can be folded in to the load on architectures like x86 with complex addressing modes. r200947.

  • strchr(p, 0) is now simplified to p + strlen(p) (see the snippet after this list). r200736.

  • Information on handling minor ('dot') releases was added to the HowToReleaseLLVM documentation. r200772.

  • The MIPS assembler learned to understand %hi(sym1 - sym2) and %lo(sym1 - sym2) expressions. r200783.

  • Mips gained a NaCl target. r200855.

  • LLVM now assumes the assembler supports the .loc directive for specifying debug line numbers. r200862.

  • The inliner was modified to consider the cold attribute on a function when deciding whether to inline. r200886. A later commit set the inlinecold-threshold to the same as the inline-threshold so that current inlining behaviour is maintained for now. r200898.

  • Initial implementation for a lazy call graph analysis pass (for use with the upcoming new pass manager) was committed. r200903.

  • The allowsUnalignedMemoryAccess function in TargetLowering now takes an address space argument. This was added for architectures like the R600 where different address spaces have different alignment requirements. r200887.
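
The strchr simplification above is easiest to see in source form; this invented function shows what the optimiser effectively rewrites.

    #include <cstring>

    // Illustrative only: strchr(p, 0) points at the terminating NUL, which is
    // exactly p + strlen(p), so the call can be rewritten accordingly.
    const char *endOfString(const char *p) {
      return std::strchr(p, 0);   // treated as p + std::strlen(p)
    }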

Clang commits

  • More support for MS ABI-compatible mangling was added. r200857.

  • The behaviour suggested by the C++ Defect Report 329 was implemented. r200673.

  • The ARM target gained support for crypto intrinsics defined in arm_neon.h. r200708.

  • The forRangeStmt AST matcher gained a handy hasLoopVariable sub-matcher. r200850.

  • The -verify-pch CC1 option is now supported. r200884.

  • The -fhidden-weak-vtables CC1 option has been removed. r201011.

  • LLVM's new diagnostic system is now wired into clang's diagnostic system. r200931.

Other project commits

  • The address sanitizer gained two functions that would allow implementation of C++ garbage collection to work with its fake stack. r200908.

  • In lldb, the Mac OS X SystemRuntime plugin now uses the libBacktraceRecording library. r200822.

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:29 PM

LLVM Weekly - #5, Feb 3rd 2014

Welcome to the fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter. I've been keeping the @llvmweekly Twitter account updated throughout the week, so follow that if you want more frequent news updates.

I'm afraid my summary of mailing list activities is much less thorough than usual, as I've been rather busy this weekend both moving house and suffering from a cold. Do ping me if you think I've missed anything important.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

This weekend there was an LLVM devroom at FOSDEM 2014. Slides have already been posted for some of the talks. Hopefully videos will follow.

Pocl (Portable Computing Language) 0.9 has been released. Pocl aims to be an efficient MIT-licensed implementation of the OpenCL 1.2 standard.

Mike Ash has published a useful introduction to libclang.

Ever wanted to use LLVM from within Rust? This blog post will tell you how.

Phoronix has published a benchmark of Clang 3.4 vs GCC 4.9.0 20140126 on AMD Kaveri.

On the mailing lists

LLVM commits

  • The ARM exception handling ABI (EHABI) is now enabled by default. r200388.

  • TargetLowering gained a hook which targets can implement to indicate whether a load of a constant should be converted to just the constant. r200271.

  • Line table debug info is now supported for COFF files when targeting win32. r200340.

  • LLVM now has the beginnings of a line editor library, initially to be used by clang-query but possibly consumed by LLDB as well in the future. r200595.

  • The R600 backend learned intrinsics for S_SENDMSG and BUFFER_LOAD_DWORD* instructions. r200195, r200196.

  • The loop vectorizer gained a number of flags to help experiment with changing thresholds. It now also only unrolls by powers of 2. r200212, r200213.

  • The loop vectorizer now supports conditional stores by scalarizing them (they are put behind an if); an example of the pattern follows this list. This improves performance on the SPEC libquantum benchmark by 4.15%. r200270.

  • MCSubtargetInfo is now explicitly passed to the EmitInstruction, EmitInstTo*, EncodeInstruction and other functions in the MC module. r200345 and others.

  • llvm-readobj learned to decode ARM attributes. r200450.

  • Speculative execution of llvm.{sqrt,fma,fmuladd} is now allowed. r200501.
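
An invented example of the conditional-store pattern that the vectorizer can now handle follows (the 4.15% figure above refers to SPEC libquantum, not to this snippet):

    // Illustrative only: the store happens on only some iterations, so the
    // vectorizer scalarizes it behind a per-lane branch instead of bailing
    // out on the whole loop.
    void clampNegatives(float *a, int n) {
      for (int i = 0; i < n; ++i)
        if (a[i] < 0.0f)
          a[i] = 0.0f;
    }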

Clang commits

  • Position Independent Code (PIC) is now turned on by default for Android targets. r200290.

  • The Parser::completeExpression function was introduced, which returns a list of completions for a given expression and completion position. r200497.

  • The default CPU for 32-bit and 64-bit MIPS targets is now mips32r2 and mips64r2 respectively. r200222.

  • The ARM and AArch64 backends saw some refactoring to share NEON intrinsics. r200524 and others.

Other project commits

  • Compiler-rt gained a cache invalidation implementation for AArch64. r200317.

  • Compiler-rt now features an optimised implementation of __clzdi2 and __clzsi2 for ARM. r200394.

  • Compiler-rt's CMake files will now compile the library for ARM. Give it a go and see what breaks. r200546.

  • The iohandler LLDB branch was merged in. The commit log describes the benefits. r200263.

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:29 PM

LLVM Weekly - #4, Jan 27th 2014

Welcome to the fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. This marks the end of the first month of operation, here's to many more! LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter. I've been keeping the @llvmweekly Twitter account updated throughout the week, so follow that if you want more frequent news updates.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

The biggest compiler-related news this week has been the discussions on the GCC mailing list. Things kicked off with Eric S. Raymond's post suggesting that technical progress in GCC is being held back by concerns about reusing parts of GCC in a way that bypasses the copyleft license. Ian Lance Taylor responded to point out that GCC now has a plugin system, albeit with an unstable interface, which mostly put a stop to that line of discussion. However a later post to the mailing list from Richard Stallman has proved very controversial by claiming that "The existence of LLVM is a terrible setback for our community precisely because it is not copylefted and can be used as the basis for nonfree compilers". There's plenty of discussion of these comments around the web at LWN, Hacker News, Reddit, Slashdot etc. Although many of us may have a preference for non-copyleft ('permissive') free software licenses, RMS has consistently and over a long period of time argued that copyleft licenses ultimately do a better job of spreading free software and preserving its freedom. As such, it's not clear to me why this mailing list post has come as a surprise to many. I'm personally surprised he didn't bring up the fact that the BSD-style license used by LLVM contains no explicit patent grant (though LLVM does have a patent policy to help protect its users).

Rapidly moving away from controversial topics, an exciting milestone for the LLVM project was hit this week. The 200000th commit has been applied. Takumi Nakamura was lucky enough to be the one to author that commit.

The Khronos group has released the SPIR 1.2 specification. SPIR is a standardised intermediate representation meant for use with OpenCL, and is based on LLVM 3.2 IR. With the release, the Khronos Group have open sourced a modified Clang 3.2 which can generate SPIR from OpenCL C programs as well as a module verifier.

Joaquín M López Muñoz has published a benchmark comparing hash table performance on Clang. He compares GCC's libstdc++-v3 to the LLVM project's libc++.

The Cambridge (UK) LLVM socials are starting up again, with the next one on the 29th Jan at 7.30pm. Sadly I can't make it, hopefully the next one!

On the mailing lists

LLVM commits

  • LoopSimplify is no longer a LoopPass, instead it is both a utility function and a FunctionPass. The motivation was to be able to compute function analysis passes after running LoopSimplify, but the change has a bunch of other advantages described in detail in the commit message. r199884. Additionally, the LCSSA (loop-closed SSA) pass was made a utility with a function pass and the LoopVectorizer became a FunctionPass. r200067, r200074.

  • The Constant Hoisting Pass was born. r200022.

  • InstCombine learned how to deal with vectors for most fmul/fdiv/add/sub/mul/div combines. r199598, r199602.

  • Type-based alias analysis has, for the time being, been disabled when using alias analysis in CodeGen due to two shortcomings described in the commit message. r200093.

  • LTO gained new methods which allow the user to parse metadata nodes, extract linker options, and extract dependent libraries from a bitcode module. r199759.

  • The Sparc backend now supports the inline assembly constraint 'I'. r199781.

  • The x86 backend allows segment and address-size overrides for movs/lods/outs, fixing bug 9385. r199803 and more.

  • llvm-ar no longer opens or fstats a file twice. r199815.

  • When compiling a function with the minsize attribute, the ARM backend will now use literal pools even for normal i32 immediates. r199891.

  • There was a fair bit of activity on the R600 backend. I haven't had the time to properly summarise that activity or pick out the most important commits, so I recommend those interested take a look through the commit logs.

  • JIT is now supported for Sparc64. r199977.

  • llvm-readobj gained support for the PE32+ format (used for Windows 64-bit executables). r200117.

Clang commits

  • Registry::getCompletions was implemented. This returns a list of valid completions for a given context. r199950.

  • Clang gained basic support for the attribute returns_nonnull. r199626, r199790.

  • getResultType on function and method declarations was renamed to getReturnType which is a semantically more accurate name. r200082. Similarly, getResultLoc was renamed to getReturnLoc. r200105.

  • All applicable accessors in FunctionProtoType have been renamed from *argument* to *parameter*. r199686.

  • Clang was taught to look in its installation libdir for libraries such as libc++ when the installation is within the system root. r199769.

  • A module.map file is now required to load a module. r199852.

Other project commits

  • lldb learned the 'step-avoid-libraries' setting, which allows a user to list libraries to avoid. r199943.

  • In compiler-rt, support was added for intercepting and sanitizing arguments passed to printf functions in AddressSanitizer and ThreadSanitizer. r199729.

  • A fix was committed to ThreadSanitizer to prevent deadlocking after a fork. r199993.

  • Dragonegg can now be built with CMake. r199994.

  • Compiler-rt gained support in its udiv/umod implementations for ARMv4, which lacks bx and clz. The changes also resulted in a 30%+ performance improvement on the Raspberry Pi (ARM1176) and 5-10% on a Cortex-A9. r200001.

  • In AddressSanitizer on Android, all AddressSanitizer output is duplicated to the system log. r199887.

  • lld gained support for emitting a PE32+ file header. r200128.

  • lldb now supports Haswell on x86-64. r199854.

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:29 PM

LLVM Weekly - #3, Jan 20th 2014

Welcome to the third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Eli Bendersky has penned some thoughts on LLVM vs. libjit. Eli describes libjit as being more limited, yet easier to understand and to get going with due to its focus. He also makes interesting claims such as "to be honest, I don't think it's possible to create a really fast JIT within the framework of LLVM, because of its modularity. The faster the JIT, the more you’ll have to deviate from the framework of LLVM". As well as the comments directly on the blog post, there is some good discussion over at Reddit.

Version 2.0-RC1 of the Capstone disassembly framework has been released. Capstone is built using code from LLVM. The new release features reduced memory usage, faster Python bindings, and support for PowerPC among other changes.

Planet Clang has been announced. It is a news feed following blog posts from Clang and LLVM committers and contributors. The blog roll is fairly short right now, but you're welcome to submit your RSS feed via the email address in the announcement post.

The PDF of an upcoming paper to be presented at CGO next month has been released. WatchdogLite: Hardware-Accelerated Compiler-Based Pointer Checking proposes instruction set extensions to accelerate pointer checking functions and achieves a performance overhead of 29% in return for memory safety. The compiler extends (and is compared to) SoftBound + CETS.

On the mailing lists

  • David Woodhouse has posted a detailed update on the status of 16-bit x86 in LLVM. David has successfully built the 16-bit startup code of the Linux kernel and invites people to start testing it on real code.

  • Tom Stellard opens a discussion on stable LLVM 3.4.x releases. A number of people volunteer their assistance and there seems to be general agreement that any 3.4.1 release would include bug-fixes only with no ABI changes.

  • Diego Novillo is looking to boost the performance of the SPEC benchmark libquantum using profile info and loop unrolling. Sean Silva did us all a great service by asking for clarification on what a "runtime unroller" means in this context. The answer is that the trip count (the number of times the loop is executed) is not known at compile time. The thread is worth a read if you're interested in loop unrolling or vectorization.

  • Aaron Ballman has stepped up as code owner for the attribute subsystem with unanimous approval.

  • Skye Wanderman-Milne was looking for help on loop unrolling a single function using the C++ API. Simply adding the LoopUnrollPass to a FunctionPassManager had no effect, but after some advice from the mailing list Skye did respond to confirm that the set of ScalarReplAggregates, LoopRotate, and LoopUnroll passes did have the desired effect.

  • Tobias Grosser asks why LLVM's LNT (used for performance tracking) defaults to aggregating results by taking the minimum rather than an average. Replies quickly hone in on the real problem at hand, which is that results are 'noisy', potentially due to other processes on the machine, and are also quantised to certain values because the timer is relatively coarse-grained compared to the benchmarks' execution times.

  • This week's unsolved question is from Keith Walker, who's noticed that on ARM, the function prologue generated in GCC and LLVM ends up with the frame register pointing to a different address. The LLVM prologue results in the frame pointer pointing to just after the pushed r11 register (the saved frame pointer) while on GCC the frame pointer points to just after the pushed link register. The difference makes it difficult to produce a generic stack walker.

LLVM commits

  • The MCJIT remote execution protocol was heavily refactored and, it is hoped, fixed on ARM, where it was previously non-functional. There are still some random failures on ARM though; see bug 18507. r199261

  • The threshold for converting a switch to a lookup table was lowered from 4 cases to 3. Experimentally, Hans Wennborg found no speedup from converting two-case switches, but converting three-case switches did produce one (see the sketch after this list). When building Clang, this results in 480 additional switches being transformed and an 8KB smaller binary. r199294

  • Support for the preserve_mostcc and preserve_allcc calling conventions was introduced and implemented for x86-64. These are intended to be used by a future version of the Objective-C runtime in order to reduce the overhead of runtime calls. r199508

  • The configure script now checks for a sufficiently modern host compiler (Clang 3.1 or GCC 4.7) r199182

  • More work on the new PassManager driver. Bitcode can now be written using the new PM and more preparation/cleanup work has been performed. r199078, r199095, r199104

  • Dominators.h and Verifier.h moved from the Analysis directory to the IR directory. r199082

  • The DAGCombiner learned to reassociate (i.e. change the order of) vector operations r199135

  • dllexport and dllimport are no longer represented as linkage types r199218

  • Parsing of the .symver directive in ARM assembly was fixed r199339
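
As a sketch of the switch-to-lookup-table threshold change mentioned above (an illustrative example, not taken from the commit itself), a dense three-case switch like this is now eligible for conversion:

// Roughly lowered to: return (unsigned)x < 3 ? table[x] : 0;
int classify(int x) {
  switch (x) {
  case 0: return 10;
  case 1: return 42;
  case 2: return 7;
  default: return 0;
  }
}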

Clang commits

  • The MS ABI is now used for Win32 targets by default r199131

  • The MicrosoftMode language option was renamed to MSVCCompat and its role clarified (see the commit message for a description of MicrosoftExt vs MSVCCompat). r199209

  • The -cxx-abi command-line flag was killed and is instead inferred depending on the target. r199250

  • The analyzer learned that shifting a constant value by its bit width is undefined. r199405

  • The nonnull attribute can now be applied to parameters directly. r199467

  • Support for AArch64 on NetBSD was added to the compiler driver. r199124

Other project commits

  • AddressSanitizer in compiler-rt gained the ability to start in 'deactivated' mode. It can later be activated when __asan_init is called in an instrumented library. r199377

  • A number of patches were committed to lld for better MIPS support. r199231 and many more.

  • lldb now includes the Linux distribution in the vendor portion of the host triple, e.g. x86_64-ubuntu-linux-gnu. r199510

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:29 PM

LLVM Weekly - #2, Jan 13th 2014

Welcome to the second issue of LLVM Weekly (and the first to appear on the LLVM blog). LLVM Weekly is a newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. Thank you to everyone who has given positive feedback or suggestions, I'm planning in future editions to integrate some sort of statistics of git activity and contributions as well as activity on Bugzilla. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

The canonical home for this issue can be found here at llvmweekly.org.

News and articles from around the web

Stephen Diehl has published a Haskell adaptation of the 'Kaleidoscope' tutorials on how to implement a JIT-compiled language using LLVM. This was received very positively and prompted some discussion on HN.

Registration for EuroLLVM is now open. There's still time to submit your talk, poster or workshop proposals too.

Rust 0.9 has been released. Rust is a systems programming language targeting LLVM, with a particular emphasis on type safety, memory safety, and concurrency. One of the core Rust developers at Mozilla, Niko Matsakis, also gave a talk at linux.conf.au introducing Rust.

Coverity 7 now includes Clang compiler support.

On the mailing lists

LLVM commits

  • The Sparc ASM parser has seen further development, learning to parse branch instructions, conditional moves, and floating point instructions and more. It also gained an initial disassembler implementation. r198738, r199033, r198591

  • SimplifyCFG now understands that when building a lookup table for a switch with no holes, it doesn't need a default result. According to the commit message, this saves 164KB off the clang binary in a bootstrap build. r199025

  • The new pass manager saw further developments. r198998 and many others (prefixed with [PM] in the commit logs)

  • The SampleProfile pass saw further development. Profile samples are propagated through the CFG, heuristically converting instruction samples to branch weights. Work was done to extend and simplify the format of sample profile input files. r198972, r198973

  • The LoopVectorizer can use versioning for strided memory access. r198950

  • In DataLayout, 'w' is now used rather than 'c' for win32 name mangling. r198938

  • The LLVM developer policy was updated to more clearly spell out how to submit patches to the project. r198901

  • WriteAsOperand has been killed in favour of the printAsOperand method. r198836

  • The x86 backend learned some more AVX-512 intrinsics. r198745

Clang commits

  • Support for the enable_if attribute was added. This can be placed on function declarations to control which overload is selected based on the values of the function's arguments. See the commitdiff (particularly the docs change) for more info, and the sketch after this list. r198996

  • More work on MS VC++ ABI compatibility was committed. r198907, r198975, r198977, r198900 and more.

  • __has_attribute now understands target-specific attributes. r198897

  • The clang plugin infrastructure is now covered by at least some tests. r198747

  • The documentation on clang internals was updated with more up to date information on adding new attributes. r198705

  • An initial instrumentation-based profile guided optimisation implementation landed. r198640
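
As a rough illustration of the enable_if attribute mentioned above (the names here are hypothetical; the authoritative syntax is in the commit's documentation change), the attribute's condition may refer to the function's parameters and decides whether that overload is viable:

// The first overload is only selected when the compiler can prove the
// condition about the argument; otherwise resolution falls through to the second.
int read_buffer(const char *buf, int n)
    __attribute__((enable_if(n >= 0, "chosen when 'n' is known to be non-negative")));
int read_buffer(const char *buf, long n);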

by Alex Bradbury (noreply@blogger.com) at April 06, 2014 05:28 PM

April 03, 2014

LLVM Blog

The LLVM Foundation


The LLVM umbrella project has grown over the years into a vibrant community made up of many sub-projects, with hundreds of contributors.  The results of this project are used by millions of people every day.  Today, I'm happy to announce that we are taking the next big step, and forming a new, independent non-profit to represent the community interest.  "The LLVM Foundation" will allow us to:

 - Solve infrastructure problems.
 - Address financial challenges around the developer meetings and infrastructure expenses.
 - Deliver improved transparency into operational and policy matters.
 - Provide a company-neutral organization to help ensure success well into the future.

We are working towards a goal of having this organization functioning by the end of the year, possibly sooner.

In terms of structure, we expect the foundation to be lean: an oversight board of volunteers and a dedicated administrator.  The expectations of this administrator will be to oversee long-overdue upgrades to our infrastructure (such as the web site design, bug database, mailing lists, etc), organize the US LLVM Developer Meeting, and drive the formation of the foundation itself.  We do not expect significant change to our developer policies (i.e. licensing & copyright).  Our system of domain-expert "code owners" will continue to drive the technical direction of their areas.  As an LLVM contributor, the biggest changes that you can expect to see are things operating more smoothly and infrastructure issues getting the attention they deserve.

Tanya Lattner has been spearheading the formation of the Foundation and has graciously agreed to take on the administrator role.  Tanya has been actively involved in LLVM since 2003.  Many of you know Tanya from her LLVM community involvement such as organizing the US LLVM Developer Meetings and management of the website and other infrastructure.  She implemented Swing Modulo Scheduling in LLVM for her Masters thesis at UIUC, served as the release manager from LLVM 1.7 through 2.7, and in a commercial setting she implemented LLVM based optimization tools and contributed to an LLVM-based OpenCL implementation.

We have also been quietly speaking with members of the community, working to select a diverse board of directors that accurately reflects the community’s interests.  We looked to active members of the community, representing both academic and commercial interests, people from organizations that contribute to the developer meeting and testing infrastructure, and technical leads from some of the prominent LLVM sub-projects.  We sought diversity in the board, while trying to keep it small enough to be nimble.  The initial board of directors will be:

Vikram Adve
Chandler Carruth
Doug Gregor
David Kipping
Anton Korobeynikov
Chris Lattner
Tanya Lattner
Alex Rosenberg

I expect that there will be many questions and comments.  The foundation is intended to represent the interests of the community, so please send questions to the LLVMdev mailing list.  More information will be coming over the next few months as things progress.

-Chris

by Chris Lattner (noreply@blogger.com) at April 03, 2014 04:09 PM

March 21, 2014

Sylvestre Ledru

Rebuild of Debian using Clang 3.4

Using the AWS donation, David Suarez and I have been able to rebuild the whole archive with Clang 3.4.
The rebuild was done on January 10th but, with my new job, I did not find the time to publish the results until now.
Release after release, the results are getting better and better.

Currently, 2193 packages in the archive are failing to build from source.
That is roughly the same number of build failures as in the previous rebuild with Clang 3.3.
However, this is good news for two reasons:
* the number of packages in Debian has increased (18854 at the time of the 3.3 release, 21204 for 3.4)
* Clang 3.4 performs more checks and detects more errors.

I have also started to patch Clang to bring its behaviour closer to gcc's. For example, incorrect usage of -O (values greater than 6), which used to be an error, is now treated as a regular warning.

However, a critical bug emerged during this release. When using -D_FORTIFY_SOURCE=2 (which is the case for many Debian packages), the binaries produced freeze (an infinite loop in the generated assembly). This has been reported upstream as bug 16821 and concerns about 150 packages (including Firefox, gcc, LLVM, etc.). Hopefully this will be fixed in 3.5 (a fix in 3.4.1 is unlikely).

As for the new build failures, Clang now triggers an error (or a warning promoted to an error by -Werror) on:

* Wrong usage of the default argument (should be done in the definition)
16 occurrences

* Usage of C++11 feature without the appropriate argument
7 occurrences

* Unused static const declaration
5 occurrences

* Recursive template instantiation exceeded
4 occurrences

* Definition of a builtin function
3 occurrences

* Read-only variable is not assignable
2 occurrences

By the way, I have proposed a Google Summer of Code project to speed up work on supporting a Debian built with Clang. As a requirement, I asked students to fix some bugs, and they have already done a great job.

by Sylvestre at March 21, 2014 09:51 PM

March 18, 2014

Philip Reames

Late Safepoint Placement Overview

In a previous post, I sketched out some of the problems with the existing gcroot mechanism for garbage collection in LLVM.  This post is going to layout the general approach of what we’ve started referring to as “late safepoint placement.”  This will be both fairly high level and fairly short.  Details will follow in future articles.

The general approach we’ve taken is to partition LLVM’s optimization and code generation process into two distinct phases.  Between the two phases, we rewrite the IR to contain explicit safepoints – constructed in a way which conservatively encodes their relocation semantics.

The first phase runs before safepoints are inserted.  When safepoints are inserted, we require that a couple key invariants have been upheld.  We construct the initial IR such that these invariants hold.  Each pass in the first phase must preserve them.  Somewhat surprisingly, most existing optimization passes seem to preserve them without modification. The key ones are:

  • pointers must remain pointers
  • pointers into the garbage collected heap must be distinguishable from pointers outside the garbage collected heap
  • a base pointer must be available (at runtime) for every derived pointer

Together, these give us all the information we need to insert safepoints.  There will be a future article which will focus on this in more depth.
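
As a purely illustrative sketch of the base/derived pointer invariant (the type and heap layout below are assumptions, not taken from the authors' system): a derived pointer is an interior pointer computed from a base object pointer, and it must be re-derivable from the base after the collector relocates the object.

struct Obj { long header; int data[16]; };

void example(Obj *base) {           // 'base' points at an object in the GC'd heap
  int *derived = &base->data[4];    // derived pointer = base + fixed offset
  // ... a safepoint here may relocate *base ...
  // The collector must be able to rewrite 'derived' as
  //   new_base + (derived - old_base),
  // which is only possible if the base pointer for 'derived' is known at runtime.
}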

The actual insertion of the safepoints is a fairly complex set of IR transform passes.  The objective of these passes is to represent the inserted safepoints in a way that it would be illegal – using LLVM’s own semantics – to transform the IR in a way which subverts the desired safepoint semantics.  We plan on contributing these transform passes to LLVM community.  There will also be a future post in this series which discusses some of the steps involved and the algorithms used.

Once this transformation is complete, we can run the resulting IR through any remaining optimization passes and backend code generation without concern for correctness.  Nor do we need to extend the entire process to preserve the invariants mentioned above.  (Since the SelectionDAG completely throws away the distinction between pointers and integers, that’s pretty much a hard requirement for a practical system.)  As a result, the second phase consists of any optimization and code generation steps which were not placed into phase 1.

Note: The bit we’re skipping for the moment is how to construct the IR for a safepoint and how to propagate that through all of code generation.  That does require some additions to LLVM and will be a separate article in the near future.  For the moment, you’ll have to just accept it is possible.

What makes this approach very powerful is that the boundary between the two phases is adjustable.  We can trade implementation effort directly for generated code quality by pushing the boundary further back into optimization and (someday) code generation.

A naive implementation could use an empty first phase and insert safepoints before running any optimization passes.  This would be analogous to the “early safepoint placement” scheme I mentioned in the previous post.  On the other extreme, you could pull all of optimization and code generation into the first phase, thus getting a classic garbage collection aware compiler.  At the moment, this is somewhat impractical since we can’t preserve the required invariants that late in the compilation process, but it highlights an interesting direction for future work.

At the moment, we’ve chosen to place the safepoint insertion step immediately after the target independent optimization passes and right before we begin lowering towards the specific machine model.  (i.e. after high level optimization such as constant propagation, gvn, loop optimizations, etc.., but immediately before CodeGenPrepare)  We think that we’ve managed to push the required invariants this far. Though to be fair, we haven’t yet had serious burn in on the prototype; we may find an insurmountable bug and have to pull this slightly earlier.

One advantage of this approach which can’t be understated is the flexibility it allows.  Combined with LLVM’s existing pass scheduling mechanism, we can place *any* problematic pass after safepoint insertion.  This both gives us a means to work around bugs in the short term, and also gives us a means to work incrementally towards a fully GC aware compiler.

Note: The flexibility in pass scheduling does come at some cost.  Moving a pass out of its expected order may reduce optimization effectiveness.  On one hand, the moved pass may not be as effective once safepoints are inserted.  On the other, pass ordering is a well known problem in compilers and moving a pass may decrease the effectiveness of other passes (even those not moved).

I believe that “late safepoint placement” is a viable path towards high performance fully relocating garbage collection in LLVM.  We implemented enough of this to be reasonably confident it actually works.  Over the next few weeks, I will be devoting more of my time to describing our approach publicly and preparing changes for upstream contribution.  Check back here over the next few weeks for updates.

Aside: It is not clear that we will ever reach the goal of what I’ve termed a “fully GC-aware compiler” above.  Pushing safepoint insertion further back in the process would require substantial changes to large pieces of the backend infrastructure.  It’s not even clear that doing so would be sufficiently profitable to justify the effort.  We believe that the current placement of safepoint insertion will be adequate from the perspective of code quality.  There’s room for improvement, but it may not be worth the engineering investment or maintenance costs.

by reames at March 18, 2014 05:11 PM

March 15, 2014

Sylvestre Ledru

scan-build on the llvm toolchain runs nightly

Just a small blog post to let LLVM developers know that the automatic scan-build reports on LLVM+Clang+LLDB+compiler-rt are now generated using the LLVM nightly packages.

That brings a few advantages:
* New checks quickly available for the LLVM developer community
* Quick feedback for scan-build developers on the whole code base
* Automatic testing of the packages generated

by Sylvestre at March 15, 2014 10:51 AM

February 22, 2014

Sylvestre Ledru

Some updates on llvm.org/apt/

I made some changes on http://llvm.org/apt/ for the last 2 months.

  • Added trusty, Ubuntu 14.04, as a new supported distribution (on the request of Michael Larabel, Phoronix)
  • Both the stable and the development versions are now supported. Currently that means the release_34 branch and trunk are built, so, for example, clang-3.4 and clang-3.5 can both be installed.
    release_34 is only rebuilt when a new commit lands in that branch; trunk is built twice a day.
  • Added a new package, llvm-{3.4,3.5}-tools, which contains various tools for building software/packages on top of LLVM. Contributed by Martin Nowack in the context of Klee.
  • Since a C++11 compiler is now mandatory, I had to force the use of a backported gcc/g++ 4.8 (thanks Doko).
    This is the case for Ubuntu Precise (12.04), Quantal (12.10), and Raring (13.04).
    The catch is that this triggers a dependency on libstdc++ 4.8, which makes the following PPA mandatory:
    deb http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu $DISTRIBUTION main

    For now, because there is no backport of gcc 4.8, I am not providing support for Debian stable (wheezy).

by Sylvestre at February 22, 2014 09:51 AM

February 21, 2014

Philip Reames

Why not use gcroot?

In a couple of recent threads on llvmdev the question of what’s wrong with the existing garbage collection support in llvm has come up.  I’ve partially answered this in a couple of different places, but this post series is an attempt to group everything together into one consistent answer.

This post will focus on what I believe to be the fundamental issues with the gcroot mechanism used to support garbage collection in LLVM.  In a follow-on post, I will discuss a few of the current implementation limits and how they might be addressed.  The second post is mostly for completeness’ sake; as I hope you’ll gather from the first post, I believe pursuing the gcroot approach to be non-viable long term.

To give a bit of context, I’ve been looking into how to efficiently support GC in LLVM for several months.  We investigated the gcroot approach, but for the reasons described here have decided to pursue an alternate approach.  If this works out in practice, I’ll write a followup post sometime in the next few weeks.

To be clear, I do not mean this post as an attack on the authors or users of the current GC mechanism.  While I believe it can and should be enhanced, we likely would have not seriously considered LLVM if some form of GC support was not already present.  Having any support for GC at all says a lot about the openness of the LLVM community towards non-C family languages and we were thrilled to see that.  Thank you.

Outline for this post:

  • What you need to know about GC
  • How gcroot works
  • gcroot can lose objects (for any GC)
  • gcroot is broken for GCs which relocate roots

GC Background

This section briefly introduces the key concepts of GC needed to explain my points.  I’m gearing this article towards folks who know compilers, but might not necessarily know the intricacies of GC very well.  Feel free to skip to the next section if desired.

A garbage collector is a tool for automatically managing memory.  At its most fundamental, a garbage collector recycles memory for an object when that object is no longer reachable through the object graph.  If the program provably can’t touch the memory again, it’s assumed to be safe to recycle.

Liveness of an object is generally established through reachability.  You start with a set of known ‘root’ values and recursively mark any object reachable from them.  Any object not reached during this traversal is assumed to be dead.  (For more of the basic terminology, you may find this previous post useful.)
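
As a minimal illustration of reachability-based marking (a toy sketch, not any particular collector's implementation):

#include <vector>

// Everything reachable from the roots gets marked; anything left unmarked
// after the traversal is garbage and can be reclaimed by the sweep phase.
struct Obj { bool marked = false; std::vector<Obj*> fields; };

void mark(Obj *o) {
  if (!o || o->marked) return;
  o->marked = true;
  for (Obj *f : o->fields) mark(f);
}

void collect(const std::vector<Obj*> &roots) {
  for (Obj *r : roots) mark(r);
  // sweep phase (not shown): reclaim every object whose 'marked' flag is still false
}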

A relocating collector is one which moves objects around at runtime.  A collector doesn’t necessarily need to move objects to collect garbage, but relocating collectors have proven to be some of the most robust designs.  Explaining exactly why is beyond the scope of this post, but as a starting point, consider how a heap fragments over time and how this affects a non-relocating collector’s ability to reclaim free space.

Relocating collectors need to move objects reachable from running thread state around at runtime.  To do this, they need a point in the code where they can identify all the object pointers in the thread’s execution context, whether on the stack or in registers.  These points are called “safepoints”.

Let’s restrict ourselves to single threaded programs for a moment so that we don’t have to do anything fancy to get atomicity.  In particular, let’s ignore the intricacies of concurrent collection.  There’s no need to get into read or write barriers for the purpose of this post.

A collector doesn’t necessarily have to move all objects in a particular collection cycle.  Many real world collectors don’t.  One common subset not to move is the set of objects directly reachable from the thread state (stack + registers) of currently executing threads.  As we’ll see in a bit, relocating objects directly reachable from executing code requires a fair amount of compiler work.

A collector which chooses to not relocate objects reachable from thread state can also choose to be somewhat “conservative” about identifying such objects.  So long as every object so directly reachable doesn’t move, it doesn’t really matter (from a correctness perspective) if a few objects are falsely identified as being “roots” (i.e. reachable from thread state) or if some non-pointer value on the stack is treated like a possible root.

A collector which does relocate all objects can be called a fully relocating precise collector.  The “precise” bit comes from the fact that a collector which relocates everything doesn’t really have the option to be “conservative”.  Consider what might happen if you updated some random integer in the program which just happened to “look like” a pointer to an object.  Ouch.

From a practical perspective, the company I work for has a fully relocating precise collector.  As such, this is where my primary interest lies.

How does gcroot work?

So how does the existing gcroot mechanism work?  At a conceptual level, it’s actually fairly straightforward.

The frontend is responsible for identifying all pointers in code being compiled which should be treated as roots.  This is done by creating a slot on the stack (via an alloca) and tagging that slot with a “gcroot” intrinsic.

The frontend also provides a GCStrategy which is responsible for deciding where safepoints are.  This information is used to record the address of a safepoint and the slot indices (i.e. stack offsets) into a side data structure during compilation.  This is done fairly late in the compilation process; in particular, it’s long after IR level optimization passes have run.  During optimization, there are no safepoints explicitly represented in the IR.

We’ll skip most of the implementation details for the moment and return to those later on.  They’re not really relevant for the first post.  What is important to remember is that the gcroot mechanism only records the stack slots.  Any copy of that pointer value is not recorded.

Note: The documentation for gcroot can be found here.   The wording is somewhat confusing since it makes it sound like relocation of pointers tagged as gcroot is fully supported.  As we’ll explore in a moment, it’s not.

gcroot can miss roots (even for non-relocating collectors)

For gcroot to work, we need to preserve a non-trivial invariant.  One copy of a live pointer value must always be present in a location known to be a gcroot.  There can be other copies, but one copy must be in the gcroot location.  (This is true for any reachability based collector, not simply relocating ones.)

Consider the following bit of code from a made up language which happens to look like C++, but with garbage collection:

int* factory() {
  int* p = new int(0);
  return p;
}

For the sake of discussion, let’s assume we want to place a safepoint on method return.  (i.e. between the two lines of code)

This will get translated into LLVM which looks like this:

define i32* @factory() gc "gc name here" {
  ; This is the gc root setup
  %p = alloca i32*, align 8
  %tmp = bitcast i32** %p to i8**
  call void @llvm.gcroot(i8** %tmp, i8* null)
  store i32* null, i32** %p, align 8
  ; This is the allocation and initialization
  %1 = call noalias i8* @_Znwm(i64 4) #3
  %2 = bitcast i8* %1 to i32*
  store i32 0, i32* %2, align 4
  store i32* %2, i32** %p, align 8
  ; This is where we want the safepoint
  ; This is the return (including the required reload)
  %3 = load i32** %p, align 8
  ret i32* %3
}

Now so far everything looks fine.  The return value is reloaded from the known root location, and a safepoint inserted before the return would perform as expected.

The problem here is that compiler optimization passes are applied to this IR before the safepoint is inserted.  Running this IR through "opt -O3" we get the following:

define i32* @factory() gc "gc name here" {
  ; Still the gcroot setup
  %p = alloca i8*, align 8
  call void @llvm.gcroot(i8** %p, i8* null)
  ; allocation, init, and return
  ; there is no store to the gcroot slot!
  %1 = call noalias i8* @_Znwm(i64 4) #2
  %2 = bitcast i8* %1 to i32*
  store i32 0, i32* %2, align 4
  ret i32* %2
}

When we try to insert the safepoint this time, we’ve got a serious problem.  All of the writes to the gcroot location have been optimized away.  There is no place in this function where we can insert a safepoint and capture the value of the pointer returned.  To put it another way, we have a live pointer which is untracked by the GC.  This is blatantly incorrect for any GC.

Now, what went wrong?  Essentially, LLVM was able to – correctly given this IR semantics – eliminate the load from p right before the return.  The value of *p is easily inferred from the store immediately preceding it.  Given that, all of the stores to p become trivially dead.  Now to be clear, this is exactly what the optimizer is supposed to do with this IR.

Aside: One fix for the problems I’m describing would be to insert safepoints before any compiler optimization.  Assuming you constructed your safepoint correctly – use and update of every gcroot alloca slot, read/write all memory with volatile semantics, etc.. – this would be adequate.

From everything I can gather from the current implementation and documentation, this is not the intended usage model of gcroot.  Adopting what I’ll call an “early safepoint insertion” model would be a modification to the existing scheme and suffers from one key problem of its own.  The inserted safepoints would absolutely cripple the optimizer.  Consider something simple like this bit of code:

int sum = 0;
for i in 0...n:
  sum += 1;

You’d expect the optimizer to convert this reduction into a single assignment of the form “sum = n”.  Without safepoints, that’s exactly what would happen.  If you inserted a safepoint on the backedge of the loop like so:

int sum = 0;
for i in 0...n:
  sum += 1;
  (all objects..) = safepoint(all objects...) clobber_all

The existing optimizer would no longer recognize the reduction.  Given this severe impact on practical performance, I consider early safepoint solution a non-solution from a performance standpoint.

You could improve the behavior of the naive early safepoint scheme, but not without adjusting every single optimization pass one by one.  Even then, I strongly suspect you’d worsen overall optimization in the end.  And suffer a long bug tail as well.

The core issue here is that LLVM doesn’t know that stores and loads to gcroot locations are ‘special’.  For the invariant to be upheld, LLVM would need to know that another actor can observe the values in a gcroot location.  These additional semantics are not actually baked into the IR.  As a result, the optimizer is quite capable of completely invalidating the entire scheme gcroot relies on.

Now, we’ve shown this for one particular case.  I haven’t actually checked, but I’m fairly sure this could be easily extended to any case where a pointer value escapes from a local function.  A few such cases could be:

  • Storing into a global variable
  • Storing into a object field
  • Throwing an exception

The reason this doesn’t show up everywhere is that common code patterns inhibit the optimizations we’ve discussed.  If there had been a call – which can have unknown memory effects – anywhere after the final store, LLVM would not have been able to eliminate that key store.  That’s pure luck though, not something which can be relied upon to continue being true.

For example, what happens if LLVM decides to inline that call?  Now the previous unknown side effects become known.  If LLVM decides the inlined code doesn’t write to the gcroot alloca, it can once again eliminate the store.

There are a few ways I can think of to ‘fix’ this:

  1. You could mark every one of the stores to ‘p’ as volatile (see the sketch after this list).  Given LLVM’s semantics for volatile access, this would be sufficient to preserve the invariant which gcroot relies on.  However, it’s also going to greatly inhibit optimization of the IR.  From a performance perspective, this is not really a viable solution.
  2. You could extend LLVM to treat stores to gcroot locations differently.  (This would essentially be a limited form of volatile semantics.)  However, this would require a very non-trivial bit of analysis to determine the base object for every store.  In fact, doing so exactly is clearly impossible.  (e.g. (cond ? &root : nonroot) = pointer();)  As a result, your analysis would have to be somewhat conservative (i.e. classify things which aren’t gc roots as gc roots occasionally).  Beyond the major engineering effort required to make this work and modify all the places in LLVM which deal with stores, I suspect this would also have detrimental effect on optimization.  You could restrict that to functions marked as GC, but still.  I really really doubt you could get this accepted into the mainline of LLVM development.
  3. Place a call to an ‘extern’ function – one not known to the optimizer – at every site you might possibly want to insert a safepoint later.  This would have the effect of blocking the particular optimization we discussed.  You’d have to manually remove these fake calls when inserting safepoints, and you’d suffer most of the cost of early safepoint insertion.
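
Here is roughly what option 1 amounts to at the source level (illustrative only; allocate_obj is a hypothetical GC allocator, not part of the scheme itself): making the root slot volatile forces every store to it, and every load from it, to survive optimization.

struct Obj;
Obj *allocate_obj();                 // hypothetical GC allocator

Obj *factory() {
  Obj *volatile root = nullptr;      // the gcroot "home" slot, now volatile
  root = allocate_obj();             // this store can no longer be eliminated...
  // a safepoint inserted here would still find the pointer in 'root'
  return root;                       // ...and this reload must come from the slot
}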

Worth explicitly noting is that the two options practically available to the frontend author – first and third – are likely to have severe performance impact.  You might as well just do early safepoint insertion and might actually be better off doing so.

I also want to emphasize that this problem is only one symptom of a broader issue.  The gcroot scheme is fundamentally problematic since it relies on special semantics which are not modelled in the IR – i.e. that stores to gcroot locations can be observed by the garbage collector.  This is one case where it breaks; there may be – and likely are – others as well.

gcroot is broken for GCs which relocate roots

The existing gcroot mechanism is not sufficient to support the relocation of objects directly reachable from executing thread state.

To implement a precise safepoint, you need to know the location of every pointer – in particular, every copy of every pointer value – which is currently in use.  Updating one copy of a pointer, but not another leads to very very nasty bugs.

Returning to our previous example, let’s take a variant of the optimized code where we assume that LLVM is not able to remove the stores (to avoid our previous issue), but did forward the value of the load.  We’d be left with:

define i32* @factory() gc "gc name here" {
  ; This is the gc root setup
  %p = alloca i32*, align 8
  %tmp = bitcast i32** %p to i8**
  call void @llvm.gcroot(i8** %tmp, i8* null)
  ; This is the allocation and initialization
  %1 = call noalias i8* @_Znwm(i64 4) #3
  %2 = bitcast i8* %1 to i32*
  store i32 0, i32* %2, align 4
  store i32* %2, i32** %p, align 8
  ; This is where we want the safepoint
  ; This is the return (with the load value forwarded)
  ret i32* %2
}

The core issue with the current gcroot mechanism is that it only tracks values on the stack.  In the above code, "%2" (the forwarded value of the load) is not visible to the GC at the safepoint.  More generally, any temporary virtual register copy of a pointer inserted by the compiler is not visible to the GC.  This value could easily be held in a register across the safepoint – which in gcroot exists only as a label and has no special semantics – and be used again afterwards.  Since the GC may have moved the object '%2' pointed at, this is blatantly incorrect.

Now, all of the ‘solutions’ I described above could be applied here as well (except for loads not stores).  As I said above, I believe these to be non-viable from a performance perspective.

Conclusion & Notes

I’ve highlighted two particular symptoms of what I believe to be a fundamental design issue with gcroot.  I believe we need a better mechanism which does not rely on a known set of ‘home’ locations for every root.  As I mentioned, using an early safepoint insertion model would solve some of the correctness concerns, but comes with a serious performance problem.

In the next few days, I will post a separate post with a list of a few issues with the current implementation.  I consider these less interesting than the fundamental issues discussed here, but if you were to pursue a partial solution such as marking all reads and writes to gcroot locations as volatile, they could be important to address.

Notes

  • I used a recent snapshot of LLVM to create these examples.  Specifically, I used “clang version 3.5 (trunk 198264)”.
  • My thanks to Sanjoy, Michael, and Kris for feedback on an early version of this post.  As always, credit goes to those who help find mistakes and any remaining blame for uncaught mistakes goes to the author (me) for making them.

by reames at February 21, 2014 05:42 PM

February 15, 2014

Philip Reames

RFC: GEP as canonical form for pointer addressing

This post is a copy of a proposal I sent to llvmdev yesterday.  I’m posting it here for broader dissemination. 

I would like to propose that we designate GEPs as the canonical form for pointer addressing in LLVM IR before CodeGenPrepare.

Corollaries

  1. It is legal for an optimizer to convert inttoptr+arithmetic+inttoptr sequences to GEPs, but not vice versa.
  2. Input IR which does not contain inttoptr instructions will never contain inttoptr instructions (before CodeGenPrepare.)

Aside: From follow-up discussion on the thread, corollary 1 might have been broader than I’d originally intended. Depending on where discussion goes, it may be removed.

I’ve spoken with Nick Lewycky & Owen Anderson offline at the last social.  On first reflection, both were okay with the proposal, but I’d like broader buy-in and discussion.  Nick & Owen, if I’ve accidentally misrepresented our discussion or you’ve had second thoughts since, please speak up.

Background & Motivation

We want to support fully precise relocating garbage collection(1) in LLVM.  To do so, we have written a pass which inserts safepoints and read and write barriers as appropriate.  This pass needs to be able to reliably(2) identify pointer vs non-pointer values.  It’s advantageous to run this pass as late as practical in the optimization pipeline, but we can schedule it before lowering begins (i.e. before CodeGenPrepare).

We control the initial IR which is generated and can ensure that it does not contain any inttoptr instructions.  We’re looking to have a guarantee(*) that a random LLVM optimization pass will not decide to replace GEPs with a sequence of ptrtoint, int arithmetic, and inttoptr which are hard for us to reason about.

* “guarantee” isn’t really the right word here.  I’m really just looking to make sure that the community is comfortable with GEPs as canonical form.  If some pass decides to insert inttoptr instructions into otherwise clean IR, I want some assurance a patch fixing that would stand a good chance of being accepted.  I’m happy to do any cleanup required.
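
To make the contrast concrete, here is a source-level analogy (mine, not part of the original proposal): the first function compiles to a GEP, while the second typically lowers to the ptrtoint/arithmetic/inttoptr sequence described above that is hard for such a pass to reason about.

#include <cstdint>

int *index_with_gep(int *p, long i) {
  return &p[i];                                      // becomes a getelementptr
}

int *index_with_int_arith(int *p, long i) {
  std::uintptr_t addr =
      reinterpret_cast<std::uintptr_t>(p) + i * sizeof(int);  // ptrtoint + arithmetic
  return reinterpret_cast<int *>(addr);                       // inttoptr
}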

In addition to my own use case, here’s a few others which might come up:

  • Backends for targets which support different operations on pointers vs integers.  Examples would be some of the older mainframe architectures.  (There’d be a lot more work needed to support this.)
  • Various security related applications (e.g. CFI w.r.t. function pointers)

I don’t really want to get into these applications in detail, mostly because I’m not particularly knowledgeable on those topics.  I’d appreciate any other applications anyone wants to throw out, but lets try to keep from derailing the discussion.  (As I did to Nick’s original thread on DataLayout. :))

Notes:

  1. We’re not using the existing gc.root implementation strategy.  I plan on explaining why in a lot more detail once we’re closer to having a complete implementation that we can upstream.  That should be coming relatively shortly.  (i.e. months, not weeks, not years)
  2. As Nick pointed out in a separate thread, other types of typecasts can obscure pointer vs integer classifications.  (i.e. casting the base type of a pointer we then load through could load a field of the “wrong” type.)  I plan on responding to his point separately, but let’s leave that out of this discussion for the moment.  Having GEPs as canonical form is a step forward by itself, even if I decide to propose something further down the road.

by reames at February 15, 2014 10:22 PM

Tweaking LLVM to exploit Assume(x)

This post started off as a comment over on Embedded in Academia.  After posting it there, I realized it might be interesting to other folks.  I think this may be the most I’ve ever said publicly about this particular side project.

A while back, I took a look at improving LLVM’s ability to handle assumes. I was interested in a particular use case: pre and post conditions, in particular class invariants.

These have a fairly standard form of:

assume(x);
..do something
assert(x);

I found at the time that LLVM would happily throw away the assume block, but would not touch the assert block. The reason was that unreachability information was being exploited too early in the compiler and then was being thrown away before it reached later passes.

As a complete utter hack, I created a wrapper around __builtin_unreachable, optimized the code once, desugared the wrapper, and then optimized again. This gave the desired result for many cases.
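
The desugared form is the usual pattern built on __builtin_unreachable (a sketch; the exact wrapper in the linked repository may differ): once the branch to __builtin_unreachable is visible, the optimizer may assume the condition holds on the fall-through path.

#define ASSUME(x) do { if (!(x)) __builtin_unreachable(); } while (0)

int clamp_example(int v) {
  ASSUME(v >= 0);          // the optimizer may now treat v as non-negative
  return v < 0 ? -1 : v;   // ideally folds to just 'return v;'
}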

In effect, my hack was a messy solution to a pass ordering problem. By running the optimizer the first time, the two conditions are commoned (CSE) and the unreachable information could remove both dead code paths.

You can find the code for this here: https://github.com/preames/llvm-assume-hack.

Keep in mind that all of this was done a while ago and might not properly reflect the current state of LLVM.

Another interesting hack might be writing a pass which unconditionally removes values which only flow into assume checks. This would address (in a very hacky form) the issue of function calls remaining in the final IR. (I haven’t implemented this one, but doing so would be fairly quick.)

Aside: To be clear, as described this would also be completely unsound. Consider:

bool success = modify_filesystem();
assume(success);

The pass I described would happily remove the call to modify_filesystem which is probably not the intent. You could play some tricks to avoid this using either frontend support or unsound assumptions about optimization, but that would be quite a bit more complicated.

I’d be really curious to see what Regehr’s results looked like with one or both of those hacks in place.  Given I don’t know exactly what benchmarks he ran, it might be hard to duplicate, but who knows I might give it a shot.  After all, I don’t have any plans for Monday… :)

by reames at February 15, 2014 07:58 PM

OpenMP Runtime Project

Image resize and composition

Hi all,

I have an app that captures 6 x HD television feeds in real time via 6 separate threads. The second part of the app requires that all of the 6 HD buffers get resized (1 into 1280x720 and 5 into 640x360). The third part of the app is that, once the resizing is complete, the 6 resized images are composited together to recreate one full HD image (1920x1080), which is then output back to TV.

The problem is that the final output is not stable and seems to drop frames in some of the sub-windows but not all of them. I am assuming that this is most likely a timing issue compounded by the WaitForMultipleObjects construct that I am using.

I am assuming from what I have read about TBB that there may be a more productive way of streamlining this application using TBB, but I am not sure where I should begin.

Any suggestions greatly appreciated.

Warren Brennan

 

 

February 15, 2014 11:13 AM

Good candidate for concurrent hash map?

I have a dynamically growing 3D grid of pointers to objects (containers of 3D points, amongst other meta data). Each cell's pointer to container can be accessed by a 3D address I,J,K into the grid.

Essentially the 3D grid represents 3D space and points are added to n grid cells dependent on their spatial extents (all cells are of the same size). As the space is further explored points fall into new potential cells that do not yet exist in the grid, so the grid is expanded and new cell container objects are created to hold the new points.

I am currently trying to use read/write mutexes (and other mechanisms) to provide concurrent cell object addition, read-only access, and write access, so that multiple threads can (concurrently) read from multiple cells, a thread can get write access to a cell (with all other threads blocking when trying to access that cell), and other threads can be adding new cells on the fly. Btw, the number of cells grows from 1 to potentially 100s of cells.

Would this be a good candidate for using a concurrent_hash_map, where the key is an IJK address and the value is a pointer to one of my cell container objects? Can the map provide the kind of manipulation of the grid I described? I'm wondering if only 100s of cells is OK (I was reading that 1000s are typical), if adding to the map can be handled safely, how to effectively hash IJKs, etc. Kinda vague questions, I know. Or would another container be more appropriate?
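
Roughly what I have in mind is something like this (just a sketch with made-up types, so I may well be misusing the API):

#include <tbb/concurrent_hash_map.h>
#include <cstddef>

struct IJK { int i, j, k; };
struct Cell;                           // my container of 3D points (details omitted)

struct IJKHashCompare {
    static std::size_t hash(const IJK &a) {
        return ((std::size_t)a.i * 73856093u) ^ ((std::size_t)a.j * 19349663u) ^ ((std::size_t)a.k * 83492791u);
    }
    static bool equal(const IJK &a, const IJK &b) {
        return a.i == b.i && a.j == b.j && a.k == b.k;
    }
};

typedef tbb::concurrent_hash_map<IJK, Cell*, IJKHashCompare> Grid;

// Write access: an accessor holds a per-element write lock, e.g.
//   Grid::accessor acc;
//   if (grid.insert(acc, key)) acc->second = /* new cell */;
// Read-only access: a const_accessor takes a shared (reader) lock, e.g.
//   Grid::const_accessor cacc;
//   if (grid.find(cacc, key)) /* read through cacc->second */;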

I am a newbie to TBB and any advice would be appreciated.

-Ryan

February 15, 2014 10:30 AM

February 14, 2014

OpenMP Runtime Project

boost::thread pool vs tbb parallel_for

 

Hi guys. I'm testing Intel TBB and I would appreciate any comments.

"To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner."

So why does a boost::thread pool give higher performance than Intel TBB?

Intel TBB is a task-oriented model that knows the hardware's features, so it should know a better way to do this. So... I don't understand why Intel TBB has lower performance than the boost thread pool.

PS: I have a Core i7 running Windows Pro, and in my Intel TBB test it creates 4 threads to execute the tasks.

Thank you very much for your time.

class Engine
{
public:
    Engine() : m_v( Engine::Empty ) {}
    Engine( const Engine& eng ) : m_v( eng.m_v ){}
    Engine( std::vector< std::string >& v ) : m_v( v ){}

    void operator()( tbb::blocked_range< size_t >& r ) const

    { // parallel_for
        std::vector< std::string >& v = m_v;

        for( size_t iIndex = r.begin(); iIndex != r.end(); ++iIndex ) 
            Verify( v[ iIndex ] );
    }

    void Verify( std::string& str ) const
    {

       ...

    }

    std::vector< std::string >& m_v;

    void Start()
    {
        boost::thread_group grp;

        for( int iIndex = 0; iIndex < 10; iIndex++ ) //creating 10 threads...
        {
            grp.create_thread( boost::bind( &Engine::WorkThread, this, iIndex ) );
        }

        grp.join_all();

    }

    void WorkThread( int iIdx ) // Each thread take a range from vector...thread 0 handle m_v[0]...m_v[99], thread 1 handle m_v[100]...m_v[199], ....
    {
        int iStart = ( iIdx * 100 );
        int iEnd = iStart + 99 + 1;

        for( int iIndex = iStart; iIndex < iEnd; iIndex++ )
            Verify( m_v[ iIndex ] );

    }

...

    void ParallelApply( std::vector< std::string >& v ) // low performance(Intel TBB)
    {
        DWORD dwStart = GetTickCount();

        tbb::parallel_for( tbb::blocked_range< size_t >( 0, v.size() ), Engine( v ) ); // the end of a blocked_range is exclusive, so v.size() (not v.size() - 1) covers the whole vector

        DWORD dwEnd = GetTickCount();

        std::cout << "(" << dwEnd - dwStart << ")" << "Elapsed" << std::endl; // ~1000 milliseconds
    }

    void ThreadLevelApply( std::vector< std::string >& v ) // high performance(boost::threading pool)
    {
        DWORD dwStart = GetTickCount();

        Engine eng( v );
        eng.Start();

        DWORD dwEnd = GetTickCount();

        std::cout << "(" << dwEnd - dwStart << ")" << "Elapsed" << std::endl; // ~500 milliseconds
    }


February 14, 2014 01:05 AM

February 13, 2014

OpenMP Runtime Project

OS X library install_name, current_version and compatibility_version

The dynamic libraries that tbb builds on OS X are missing the install_name, current_version and compatibility_version. These should be specified at build time. (Where you already use the -dynamiclib flag, add the -install_name, -current_version and -compatibility_version flags with the appropriate values.) The install_name should be the absolute path where the library will be found after installation. For example, if libtbb.dylib will ultimately be installed at /usr/local/lib/libtbb.dylib, then at build time its install_name should be set to /usr/local/lib/libtbb.dylib. This means there will need to be a way (variable?) for the user invoking the build system to inform it what the final install prefix will be.

February 13, 2014 11:38 AM

Intel(R) TBB 4.2 update 3 is released and available for download

Changes (w.r.t. Intel TBB 4.2 Update 2):

  • Added support for Microsoft* Visual Studio* 2013.
  • Improved Microsoft* PPL-compatible form of parallel_for for better support of auto-vectorization.
  • Added a new example for cancellation and reset in the flow graph: Kohonen self-organizing map (examples/graph/som).
  • Various improvements in source code, tests, and makefiles.

Bugs fixed:

  • Added dynamic replacement of _aligned_msize(), which was previously missed.
  • Fixed task_group::run_and_wait() to throw invalid_multiple_scheduling exception if the specified task handle is already scheduled.

Open-source contributions integrated:

  • A fix for ARM* processors by Steve Capper.
  • Improvements in std::swap calls by Robert Maynard.

You can download Intel TBB 4.2 update 3 from commercial and open source sites.

February 13, 2014 07:33 AM

January 17, 2014

Sylvestre Ledru

Debian & LLVM events

Being a bit hyperactive, I have been involved in the organization of two events. I am the main organizer with Alexandre Delanoë of the Mini Debconf 2014 in Paris, January 18 & 19th.

The (great) programme is available here:
https://france.debian.net/events/minidebconf2014/
Saturday morning's presentations will be aimed at the general public; the beginning of Saturday afternoon will be used by the Debian France association to vote on its new statutes (1901-law association and Debian Trusted Organization).
Sunday will be more focused on Debian itself.
During the weekend, I will be talking about the Debile project and the finances of Debian France, and taking part in the round table on compiler selection for Debian.
Registration (mandatory) should be done on the wiki or on meetup.com.

In parallel, Tobias Grosser and I organized the LLVM devroom track at FOSDEM (Brussels) on February 2nd (Sunday).
The schedule is a mix of talks from core developers, third-party projects using LLVM / Clang, and academic users.
https://fosdem.org/2014/schedule/track/llvm/
I will be talking about how to become an LLVM contributor.

Both events should be recorded.

by Sylvestre at January 17, 2014 12:46 PM

January 15, 2014

Sylvestre Ledru

Some thoughts concerning LLVM & Clang and their evolutions: Release of LLVM & Clang 3.4

We just released LLVM & Clang 3.4 (already available in Debian Jessie and Ubuntu Trusty). The new Clang release introduces more warning and error detection. The point of this blog post is not to detail the changes (LLVM / Clang) but rather to take a step back and look at the LLVM and Clang toolchains, their usage, and their dissemination.

For the last couple of years, I have been presenting our work on the Debian side to make Clang part of the Debian infrastructure. I had the chance to speak at many events (Debconf 12 & 13, FOSDEM 2013, Linux Plumbers 2013, Distro recipes, etc). I even had Linus Torvalds attend one of these talks (and propose a new approach). I received plenty of feedback from the Debian/Ubuntu communities, but also from other communities.
More and more projects are using Clang as part of their workflows, in terms of development, production, and QA. For example, some run continuous integration using Clang instead of gcc; others run scan-build (Wireshark, or LLVM itself) or AddressSanitizer (asan) (Firefox, for example), etc.
More and more actors are moving from gcc to Clang for their development (Sony with the PS4, Chrome on Mac OS X, Apple, the Linux kernel with the LLVMLinux project, FreeBSD, etc).
I have stopped counting the number of times people have answered "Yeah, I am now using Clang" when I said I am involved upstream.

In parallel, more and more projects are using LLVM as a backend or computation engine and, in the meantime, gaining more and more traction (not because they are based on LLVM but simply because they are great projects). Mesa (llvmpipe), Julia, Rust, Emscripten, and Native Client (NaCL) are excellent examples of this. The last two are excellent proof that LLVM-based technologies have an impact which goes far beyond geek tools and will be used on a daily basis by millions of users.

I have also seen an interesting and growing number of new projects using libclang and libTooling. My feeling is that the long-standing lack of plugin capabilities in gcc and, now, its complexity have limited innovation in terms of compilation capabilities, analysis, and fun tools. The features and the quality of these libraries are enabling new developments, and I am sure we will see more and more excellent tools based on them during 2014.
For example, the LLVM toolchain ships clang-modernize and clang-format, and some cool projects like include-what-you-use, DXR, OCLint or creduce are emerging.

Thanks to this healthy competition, gcc is also improving (ASAN, JIT, better warnings, etc) and remains as relevant as before. No doubt 2014 will be a great year for compilers.

by Sylvestre at January 15, 2014 12:12 PM

January 06, 2014

Alp Toker

Clang 3.4 C++ compiler is out!

The LLVM Clang 3.4 release brings full coverage of the provisional C++1y / C++14 standard, lots of improved diagnostics and static analyzer enhancements to catch more bugs with fewer false positives.

There’s also plenty of new functionality in the developer SDK, including new tooling facilities for refactoring and rewriting source code.

One area we’ve focused on in particular is Windows and MSVC drop-in compatibility. Although 3.4 is not all the way there yet, there’s a lot to see already. This is a particularly exciting prospect for Windows developers who want to stay on the edge and for Unix developers who want seamless portability to the Microsoft Windows platform.

It’s been an amazing ride to 3.4 — this release incorporates significant parts of our own work at Nuanti and it’s only the beginning of what I’m certain is going to be a game changer in the way we write software applications in coming years.

I look forward to discussing how we’re using the compiler to build our own future web platform at Nuanti in short order.

Last but not least, if you care about C++ and Clang, be sure to catch my talk at FOSDEM 2014: Clang: Re-inventing the Compiler

Onward to 3.5!

by alp at January 06, 2014 04:38 PM

December 23, 2013

OpenMP Runtime Project

New code release

We are excited to announce the next release of the Intel® OpenMP* Runtime Library at  openmprtl.org.  This release aligns with Intel® Composer XE 2013 SP1 Update 2, scheduled for release in early 2014.

New features

by Terry Wilmarth (Intel) at December 23, 2013 05:30 PM

December 21, 2013

Philip Reames

Accurate stack traces, compiler optimization, and LLVM/Clang

This post is a summary of the current status of LLVM and Clang with regard to their ability to preserve stack traces in the face of optimization. Such preservation is useful in the implementations of some programming languages, but in this post I’m mostly going to discuss it in the context of debugging.

Just to note, this post does not talk about preserving values for the debugger at all. I know LLVM currently has issues in this area – which are being fixed at a fairly decent rate – but that is not the subject of this post.

Why is stack trace preservation desirable?

When debugging a runtime fault, the stack trace – obtained either through gdb’s backtrace command, a core file, or a debugging routine such as dbgutils’s ‘print_backtrace()’ routine – is often one of the key tools available. A stack trace provides a listing of the caller functions and the line number associated with each call to the callee function. In the final frame, the line number will often provide a strong hint as to the immediate cause of the crash – if not always the root cause.

When a stack trace is inaccurate, it can hide valuable information about the cause of the problem. In some cases, it can even actively mislead.

Wait, you mean my stack trace might be wrong?

In short, yes. And it’s actually fairly common.

Unfortunately, commonly applied compiler optimizations have a tendency to render the stack trace reported by common tools inaccurate. Below, you can find a sample of the ones that have been uncovered in LLVM to date. You can also find code fragments and the resulting output from gdb at various optimization levels, in this accompanying code.

If you’re not interested in the details, you can also skip the next section.

inline path

void debug_trap();        // external helpers, defined elsewhere
void extern_function();

void a() {
  debug_trap();
  extern_function();
}
void b() {
  a();
  extern_function();
}
void c() {
  a();
  extern_function();
}

__attribute__((noinline))
void inlining_path() {
  b();
  c();
  extern_function();
}

Simple inlining – probably the most important compiler optimization after simple constant propagation – can destroy your call trace. If either “b” or “c” were inlined into “inlining_path” without recording some extra metadata, we’d lose the accurate stack trace. We would not be able to tell whether the call to “debug_trap” originally came via the path “a”, “b”, “inlining_path” or the path “a”, “c”, “inlining_path”. We might be able to determine two of the elements in the path (if “a” itself wasn’t inlined), but we’d definitely lose the middle element in the path.

Thankfully, modern versions of Clang and GCC get this one right, at least if you pass “-g” to request debug information. In non-“-g” builds, you will likely see this happen.

self recursive tail call optimization

int factorial(int n, int accum = 1) {
  if( n < 1 ) return accum;
  else return factorial(n-1, n*accum);
}

The above is a classic factorial function, rewritten slightly to pass the accumulated value down through the recursive calls. This formulation makes it obvious that nothing happens in the caller frame after the call to the recursive case. Given this, a commonly applied compiler optimization is to simply replace the caller frame (n factorial) with the callee frame (n-1 factorial) after rearranging all the arguments into the appropriate locations for the calling convention. In particular, the caller’s frame is not preserved. This converts an algorithm which is O(n) in stack space to one which is O(1).

Unfortunately, it also has the effect of completely removing all mention of the intermediate frames from the stack trace. This isn’t such a big issue in this case – with a hint it’s pretty clear what happened – but for more complicated self-recursive functions it can be downright confusing.

Currently, Clang gets this wrong at every optimization level beyond “-O0”, including at “-O1”, which is frequently considered a “fast debug” build.

sibling call optimization

A generalization of the self-recursive TCO mentioned above is that a call to _any_ function found in a tail position can use the frame replacement trick. This is critical for the transformation of mutually recursive functions into iteration, and also reduces stack usage in surprisingly common cases.
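
To make that concrete, here is a small mutually recursive example of my own (not from the original post), using the same debug_trap()/extern_function() helpers as the earlier examples:

void debug_trap();
void extern_function();
void is_odd(unsigned n);

void is_even(unsigned n) {
  if (n == 0) { debug_trap(); extern_function(); return; }
  is_odd(n - 1);   // sibling (tail) call: the is_even frame can be replaced by is_odd's
}

void is_odd(unsigned n) {
  if (n == 0) { debug_trap(); extern_function(); return; }
  is_even(n - 1);  // sibling (tail) call: the is_odd frame can be replaced by is_even's
}

If sibling call optimization fires, the trace at the trap can show only the final frame, apparently called directly from whatever invoked the pair; the alternating chain of calls that actually led there is gone.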

However, precisely because a tail call is relatively common, this is probably the case of stack trace destruction most often seen in practice. And since the stack trace might contain no mention of the caller function at all – not even another invocation – it can have the appearance of creating manufactured calls to seemingly random functions.

One of the most confusing examples of this I’ve seen involved a virtual function which was devirtualized in a tail call position. The virtual function called a base class implementation of the same function as a helper routine. The stack trace had the appearance of completely skipping the subclass’s definition of the virtual method.

As before, Clang gets this wrong at all levels beyond “-O0”.

Aside: If you wondered why I have the “extern_function();” calls all over the place in the example code, this is why. Without this, writing useful test cases for other issues is rather frustrating.

Aside: I found it surprising that neither tail call optimization nor sibling TCO respected the “noinline” attribute. While technically this is correct – it isn’t inlining them – it seems dangerous given the common usage of “noinline” as a “don’t touch calls to this, compiler!” annotation.

basic block commoning (hoisting or sinking)

When the compiler can recognize that two instructions perform the same action, it will attempt to execute the action once and reuse the result. When two possible execution paths diverge due to a condition, or join into a single successor node, the compiler will often attempt to pull duplicated code out of the branches into a single copy preceding (following) the split (join).

As an example, the compiler can convert:

if(b) { a = 5; } else { a = 5; }

into simply:

a = 5;

Now this particular case isn’t problematic, but consider the following:

if(b) { a = compute_a(); } else { a = compute_a(); }

Is it safe to remove the “redundant” code? Both Clang and GCC will rewrite this fragment analogously to the above. Unfortunately, we’ve now lost line information for the stack trace. Unless the value of “b” happens to be retained for debugging (unlikely), we have no clue which branch was taken.

Now the particular example above is fairly harmless. For one thing, we didn’t change the frames in play as shown by the debugger. (See the companion code for that.) Unfortunately, you can easily construct examples – by combining inlining, constant folding, macros, and other complexities of real code – where this has the effect of completely obscuring large sections of control flow.

Aside: In the time between when I originally wrote this post and when I got around to publishing it, Clang’s results on this class of examples improved. Originally, I had an example which showed an inaccurate stack trace at “-O3”; I can no longer reproduce that. Yay!

Help! How can I get back my sane stack traces?

Unfortunately, there is no easy answer here. As recently as Dec 2013, Clang had bugs related to this topic. (I would be shocked if GCC did not as well.)

Part of the problem is that to preserve stack traces, potentially non-trivial optimization possibilities have to be ignored. Generally, no optimizer author is keen to make that choice.

The consensus in the Clang/LLVM community appears to be that code compiled at “-O1” is intended for so-called “fast debug” builds. As a result, several of the optimizations mentioned above are disabled at that level. This improves the quality of the stack traces greatly, at some performance cost.

Note: I could not find any “official” documentation of this consensus. However, it has arisen in a number of separate conversations on the llvmdev mailing list and appears to be widely accepted.

One possibility – which has been strongly rejected by key LLVM contributors, and for good cause – would have been to enable different optimizations under the combination of “-On” and “-g”. While initially this may seem promising, it breaks a key tenet of debug information. The ability to take a currently broken program, recompile it with debug information, and still have a good chance of reproducing the original problem is a key design objective. In practice, having different sets of compiler options enabled between the debug and non-debug builds would unacceptably increase the odds that a problem could not be reproduced.

Unfortunately, there is currently no way to opt out of stack-trace-breaking optimizations at higher optimization levels. It’s likely this will change moving forward, since there are a number of interested parties – the sanitizer guys and myself among them – who need to present accurate stack traces. Right now, this is being done with sanitizer-specific function attributes, but it’s possible this mechanism could be generalized in the future: either as a special-purpose function attribute or a command-line flag for Clang & opt (i.e. something like “-fpreserve-stack-traces”). As always when it comes to open source, patches welcome!

Dreaming Big

So, what might an ideal solution look like for LLVM? I’ve already hinted at some elements, but let’s spell it all out.

Note: I stole some of these ideas from others’ proposals. I don’t take credit for the good ideas and do deserve blame for the bad ones. Feel free to point out flaws in the comments, on llvmdev, or by direct email.

Here are the use cases I currently know of:

To satisfy most of these needs, I’d suggest the following:

First, a new function attribute (bikeshed: “preserve-stack-traces”) would need to be added. All of the optimizations currently conditioned on sanitizer-specific attributes would be made dependent on the absence of this attribute.

Second, an option to Clang (and opt, llc, etc.) would enable the addition of this attribute to every function (bikeshed: “-fpreserve-stack-traces”). Additionally, C++11 attributes would be added for both the positive and negative forms of the attribute (i.e. [[__fake::preserve-stack-traces]] and [[__fake::no-preserve-stack-traces]]).
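
Purely as an illustration (mine, not a real Clang feature), the source-level markers might end up looking something like this, with the hyphens of the bikeshed names turned into underscores since hyphens are not valid in C++ attribute names:

// Hypothetical spellings mirroring the bikeshed placeholders above.
[[__fake::preserve_stack_traces]]
void parse_request(const char* buf);    // keep traces through this function accurate

[[__fake::no_preserve_stack_traces]]
void blend_pixels(float* dst, int n);   // hot leaf: allow TCO/commoning even under -fpreserve-stack-traces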

Third, an analysis pass could be written to selectively remove the attribute from functions which provably do not require stack trace information. In general this would be very, very hard; it basically reduces to proving that a given function can never crash. On the other hand, implementing such a pass to preserve less strict, language-specific semantics would be fairly straightforward (i.e. does this function potentially call an interesting function?). My guess is that this is where the major utility lies.

Out of these parts, the only real wildcard is the third. The first and second are clearly implementable and would likely be useful on their own. If the third part proved successful, there is hope for fully optimized builds, with the optimization restrictions limited to only those functions which might require stack trace information.

by reames at December 21, 2013 05:57 AM