Planet Clang

December 29, 2023

Aaron Ballman

Musings on the C charter

Within the C committee, there’s been a lot of talk about what the C charter means and whether it should be updated or not. My personal musing on the topic is that the C charter needs to be updated in order for C to remain relevant long-term. Whether C should remain relevant long-term is an exercise left to the reader. Note, these thoughts may change over time as I gather more feedback from more places.

Existing code is important, existing implementations are not.
A standard is a treaty between implementor and programmer.
Migration of an existing code base is an issue.
Minimize incompatibilities with C90.

I think we could combine these principles into one bullet instead of four because they’re all effectively about code migration to newer standard versions.

Each new release of C should provide a simple upgrade path from the previous release of C so that users can gradually transition between adjacent versions of C, but incompatibilities between releases are acceptable. In other words, it’s fine to deprecate old functionality even without replacement, or remove previously deprecated functionality (again, even without replacement), but this should only be done with significant justification given the costs to users of C.

Changing the standard a code base compiles against is a new agreement between the programmer and the implementer; it is akin to changing your code (as is changing your compiler), and there should be no expectation that everything remains identical to how it was before. Older functionality will be deprecated, newer functionality will be added, bugs will be fixed, etc. New standard == new treaty.

I think we should be careful about blind adherence to this principle. It’s obviously true that there’s a lot of economic value tied up in existing C code bases and that rewriting that code is both expensive and error-prone. However, C has undergone multiple revisions in its long history. A good idea in 1978 is not necessarily even acceptable practice in 2024. I would love to see this principle updated to set a time limit, along the lines of: existing code that has been maintained to not use features marked deprecated, obsolescent, or removed in the past ten years is important; unmaintained code and existing implementations are not. If you cannot update your code to stop relying on deprecated functionality, your code is not actually economically important — people spend time and money maintaining things that are economically important.

C code can be portable.
C code can be non-portable.

I think both of these principles are still reasonable today. While commodity hardware has trended towards more homogeneity, there are still plenty of heterogeneous hardware solutions out there (DSPs, FPGAs, etc) and domains like AI, ML, cryptography, etc are pushing the boundaries for new hardware solutions. We should continue to aim to be as portable as possible while not constraining ourselves to the lowest common denominator of hardware.

Avoid quiet changes.

I still strongly believe in this principle. It directly impacts transitions between standard versions — quiet changes make it much harder for users to upgrade to the latest standard.

Keep the spirit of C.

I do not think this principle has much value because it is too subjective, though I understand the desire behind it. I think people want C to remain familiar between releases instead of being reinvented. For example, I think users do not want to see a Perl 5/Perl 6- or Python 2/Python 3-style bifurcation of the language and community. However, I think the principle that existing code is important is sufficient to cover that need.

Most especially, I think concepts like “trust the programmer”, “don’t prevent the programmer from doing what needs to be done”, “make it fast”, etc are problematic given the increased need for improved security in systems-level programming languages.

I would prefer to see this bullet removed because it’s not an actionable principle and I think other bullets cover the main thrust of what it was trying to say.

Support international programming.

I still strongly believe in this principle, though I worry that WG14 lacks enough expertise to really help validate designs in this space. I appreciate that WG21’s SG16 members have been willing to provide advice, and that the Unicode consortium folks are actively engaging with programming language committees as well. But there’s more to internationalization than simply text (for example, units of measure, date/time functionality, and input methods are also part of internationalization). I think the goal is excellent, but hopefully we have enough humility as a committee to recognize when we could use outside advice.

Codify existing practice to address evident deficiencies.
Unlike for C99, the consensus at the London meeting was that there should be no invention, without exception.

I think these bullets are saying the same thing and we only need to say this once if we wish to say it at all. However, I do not think the committee agrees with the idea of no invention without exception and we do everyone a disservice by stating it as a principle. For starters, we’ve shown we’re perfectly comfortable with invention during the C11 and C23 cycles. Invention is necessary if we want to unify divergent existing practice in the wild into something standardized. “No new invention” is why we still have no solution for a “pointer + size” type or a “string” type. There are a lot of different ways to solve those kinds of problems and until the committee picks one and says “This is The Way”, we’ll continue to see more ad hoc solutions.

That said, I think there are really important points hiding in these two bullets. For example, having a principled way to separate “need” from “want” is critical to avoid spending too much committee time on work that does not benefit enough users. What makes something a good feature to standardize vs not? How integrated should a feature be (for example, does adding a new arithmetic type to the language also require it be supported in I/O operations in the standard library, etc)? When two implementations support the same notional feature with slightly differing semantics, should the committee use undefined behavior to resolve the conflict so no users have to change code, or should the committee push for well-defined behavior despite knowing it will break some users? I think we want to have answers to these sorts of questions, but they’re less about charter and more about feature design guidance. Perhaps we want a second document to help authors understand what the committee needs in a proposal and why?

Minimize incompatibilities with C++

C++ inherits the C standard library almost directly from C, but the core C++ language is specified entirely independently of the C language specification. Despite the languages being distinct in terms of specification, every production C++ compiler also claims to be a conforming C compiler. Further, C++ standard library implementations eventually need to use C interfaces to communicate with the host operating system. So I think it is in the best interests of WG21, users of C++, and C++ implementers for WG14 to minimize incompatibilities with C++.

Based on my experience as the former C and C++ compatibility study group chair and from being on both of these committees for about a decade, I think WG14 spends a significantly larger percentage of committee time on C and C++ compatibility than WG21 does. WG14 has a charter that mandates we minimize these incompatibilities (WG21 has no such mandate as they do not have a charter). The WG14 committee spends a considerable amount of meeting time deliberating how changes to C will impact C++ and our convener regularly reminds us of our obligations to minimize incompatibilities where possible. Comparatively, WG21 spends relatively little committee time on C compatibility and WG21 leadership does not proactively attempt to steer the committee to minimize incompatibilities. Given the difference in size between our committees (WG21 is approximately 15x larger than WG14), this is not a tenable situation. The committee with the least resources is spending the most time on compatibility despite reaping the least benefits.

I believe compatibility with C++ should not be a chartered requirement for features brought into C that are present in C++. Without closer cooperation between WG21 and WG14 (such as a shared base document between the two languages), I do not think WG14, C implementers, or C users get sufficient benefit for the amount of WG14 committee effort spent on compatibility with C++. Unfortunately, an inability for the committees to effectively collaborate hurts the people least represented on either committee: users working in this shared space. Therefore, I do believe compatibility with C++ should continue to be a goal as a design principle, especially for features that are likely to be used in header files shared between a mixed C and C++ code base.

Maintain conceptual simplicity.

I still believe in this principle, though it’s a bit too ambiguous for my tastes. What one committee member finds to be simple, another might find to be too complex. For example, is a feature for overloading operators maintaining conceptual simplicity? I suspect asking five committee members that exact question will net about seven different answers.

Trust the programmer, as a goal, is outdated in respect to the security and safety programming communities.

I don’t think this goes far enough. I think “trust the programmer” has led to decades of security vulnerabilities, to the point that major government agencies are now recommending against using C (and C++). I would prefer we take a stronger stance along the lines of: do not trust the programmer. Design facilities such that their misuse becomes almost impossible outside of pathological code. Prefer “defined” behavior (well-defined, implementation-defined, or unspecified behaviors) to undefined behavior where possible. When misuse is possible, require strong error detection mechanisms which can prevent undefined behavior. I think it’s perfectly reasonable to expect “trust me, I know what I’m doing” users to have to step outside of the language to accomplish their goals and use facilities like inline assembly or implementation extensions.

Application Programming Interfaces (APIs) should be self-documenting when possible.

I agree with the idea, but I don’t agree with the way it is presented regarding preference for VLA interfaces. That’s reasonable guidance for new APIs unrelated to any existing APIs, but it is not defensible when adding new APIs that are related to existing ones. e.g., adding memset_explicit should not (and thankfully did not) rearrange parameter order to fit this guidance, because it would have been very hard for people to migrate existing uses of memset to memset_explicit without introducing memory-related errors. I would prefer we made the guidance more general and less about VLAs.

What’s Missing?

To me, the biggest thing we’re missing is a plan. ISO standards committees are “obligated” to consider all proposals from committee members, and so the approach the committee takes is to take whatever papers arrive and try to give the author feedback so that they can make progress. That said, our obligations do not include a mandate on when to schedule discussion of papers — we could elect to consider papers that aren’t in the committee’s plan only after considering all other in-scope work. But the committee has never gone to those lengths and so we release standards with a hodge podge of unrelated features and bug fixes. I would love to see the committee charter require that we come up with a roadmap of what we want to accomplish in a given release cycle and then stick to that road map. I don’t mind considering occasional out-of-scope work during the cycle, but that should not come at the expense of in-scope work. I think having a roadmap will also help the relationship between the committee, implementers, and users because it gives everyone more time to plan for what’s coming in the next few years.

The #1 thing I want to see on that roadmap is a goal to address the memory safety issues that plague C. We don’t have to fix everything at once, but we do need to start making inroads into solving the problem.

Closing Thoughts

Regardless of whether others on the committee agree with my current thinking, I think it’s a great thing that the committee is reflecting on our charter. I think C is at an inflection point in many ways and we should spend time carefully considering what we want the future of the language to look like. I don’t think C or C++ will be irrelevant any time soon, but I do worry they’re heading that direction and will continue to do so without course correction. No programming language is “too big to fail”, not even something as venerable and ubiquitous as C.

by Aaron Ballman at December 29, 2023 02:52 PM

October 01, 2023

Aaron Ballman

What’s New in C in 2023?

The year is 2023 and you’re asking “What’s new in C?” Surely the answer is “absolutely nothing”, right?

Wrong!

C23 will be released in the coming months (likely in early 2024), and this is a whirlwind tour of what changes were made for the latest revision of one of the world’s most popular programming languages.

I originally prepared this talk for NDC Tech Town in Kongsberg, but I was unable to give the talk because a hurricane + poor airline services interrupted my travel plans. Oh well, their loss is your gain! I hope you enjoy the talk.

Errata

  • The slide about underlying enumeration types did not make it clear that the choice of the underlying type for the enumeration type is implementation-defined, so the given types in the comments might be incorrect depending on your implementation.
  • When mentioning that signed overflow is still implementation-defined, I misspoke. I was talking about this situation involving unsigned -> signed conversion:
unsigned int i = UINT_MAX;
signed int j = i; // Implementation-defined behavior

However, signed *overflow* is still undefined behavior, as it always was. e.g.,

signed int i = INT_MAX + 1; // Undefined behavior
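
Because signed overflow is undefined in both C and C++, a common defensive pattern is to test the operands before performing the addition. The following is a minimal sketch in C++; checked_add is a hypothetical helper introduced here for illustration, not something from the talk or from a standard library.

```cpp
#include <cassert>
#include <climits>
#include <optional>

// Hypothetical helper: returns a + b, or std::nullopt when the
// mathematical sum does not fit in int. The range checks only use
// subtractions that cannot themselves overflow.
inline std::optional<int> checked_add(int a, int b) {
  if (b > 0 && a > INT_MAX - b) return std::nullopt;  // would exceed INT_MAX
  if (b < 0 && a < INT_MIN - b) return std::nullopt;  // would go below INT_MIN
  return a + b;
}

// Usage: the caller has to handle the "no value" case explicitly.
inline void checked_add_demo() {
  assert(checked_add(1, 2).value() == 3);
  assert(!checked_add(INT_MAX, 1).has_value());
}
```

The point of the sketch is only that the overflow is detected before the undefined operation would happen, rather than inspected afterwards.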

by Aaron Ballman at October 01, 2023 06:11 PM

August 27, 2020

Aaron Ballman

Don’t use the [[likely]] or [[unlikely]] attributes

C++20 introduced the likelihood attributes [[likely]] and [[unlikely]] as a way for a programmer to give an optimization hint to their implementation that a given code path is more or less likely to be taken. On its face, this seems like a great set of attributes because you can give hints to the optimizer in a way that is hopefully understood by all implementations and will result in faster performance. What’s not to love?

The attribute is specified to appertain to arbitrary statements or labels with the recommended practice “to optimize for the case where paths of execution including it are arbitrarily more likely|unlikely than any alternative path of execution that does not include such an attribute on a statement or label.” Pop quiz, what does this code do?

if (something) {
  [[likely]];
  [[unlikely]];
  foo(something);
}

Sorry, but the answer key for this quiz is currently unavailable. However, one rule you should follow about how to use these attributes is: never allow both attributes to appear in the same path of execution. Lest you think, “but who would write such bad code?”, consider this reasonable-looking-but-probably-very-unfortunate code:

#define MY_COOL_ASSERT(x) [[unlikely]] assert(x)

if (something) {
  [[likely]];
  MY_COOL_ASSERT(something > 0);
  foo(something);
}

Despite the name, these attributes do not mark whether the statement itself is likely, only whether the path leading to the statement is likely. This brings us to our second rule: only mark the dominating statement or label of the flow control path you want to optimize for. This will often mean you only mark the compound statement after a flow control statement, as in:

if (foo) [[likely]] {
  do_something(foo);
}

while (bar) [[unlikely]] {
  ;
}

switch (baz) {
[[likely]] case 0: whatever(); break;
[[unlikely]] case 1: something_else(); break;
default: break;
}

Speaking of code that looks reasonable when you apply the attribute to the dominating statement of flow control, what does this code do?

if (foo) [[likely]] { // A
  baz();
} else if (bar) [[likely]] { // B
  quux();
} else [[unlikely]] { // C
  bing();
}

It marks the true branch at A as being likely and says nothing about the false (else) branch (making it unlikely by default). It then marks the true (if) branch at B as being likely and has a redundant (but harmless) [[unlikely]] attribute at C. What it does not do is mark that A and B are equally likely and C is unlikely; it will optimize the path for A over B rather than treat them as equals. The issue is that the attribute is not written on the dominating statement of flow control, and the code should be written as:

if (foo) [[likely]] { // A
  baz();
} else [[likely]] if (bar) [[likely]] { // B
  quux();
} else [[unlikely]] { // C
  bing();
}

However, because of the duplicate likely branches at A and B (on the else), it’s not clear what the implementation will do with the construct from reading the code (not to mention that it’s super ugly and unintuitive code). Because of this, the initial rule should be augmented to be: Never allow multiple likelihood attributes to appear in the same path of execution. This sort of confusion comes up in other places as well. Pop quiz, which cases are likely and which cases are unlikely in the following?

switch (foo) {
case 0:
[[likely]] case 1:
[[unlikely]] case 2: bar(); break;
[[likely]] case 3:
default: baz(); break;
}

Sorry, I still can’t find the answer key. Given that case 1 looks to be likely, but it falls through to case 2 which looks to be unlikely, it’s hard to say what should happen here. Further, it’s hard to say whether the default case is likely given that case 3 is likely. The only unambiguous cases are that case 0 says nothing about whether it is or isn’t likely and case 3 is likely. Unfortunately, the wording from the standard leaves a bit to be desired when considering switch statements because it says “A path of execution includes a label if and only if it contains a jump to that label.” A switch statement contains a path of execution which can jump to any of its labels, so when you couple this recommended practice with the earlier one about applying to arbitrary statements, you have to work to answer whether this code path is likely, unlikely, or something else:

if (foo) { // is this branch likely or unlikely?
  switch (*foo) {
  [[likely]] case 0: bar(); break;
  [[unlikely]] case 1: baz(); break;
  [[likely]] default: quux(); break;
  }
} else {
  ...
}

Now, a sensible person would look at this and say “aha, those attributes shouldn’t impact the if statement because they’re within a different control flow statement with its own substatements.” Well, unfortunately the standard doesn’t say anything about how recursively these attributes should be treated. For instance, one would certainly hope that an implementation allowing attributes on arbitrary statements would do something reasonable with this:

if (foo) { // Is this path likely?
  {
    [[likely]];
    SomeRAIIObject Obj;
    Obj.whatever(foo);
  }
} else {
  ...
}

The standard also doesn’t say what happens when you follow my rule to only mark the dominating statement or label and that leads to a conflict like this (with thanks to Arthur O’Dwyer for the example):

if (ch == ' ') [[likely]] {
  goto whitespace;  // A
} else if (ch == '\n' || ch == '\t') [[unlikely]] {
  goto whitespace;  // B
} else {
  foo();
}
[[likely]] whitespace: bar();  // C

The [[likely]] attribute at C says the paths through both A and B are likely, despite the path through B being marked as unlikely. Which attributes, if any, are ignored? Who knows? In all of these circumstances, the standard says nothing, so implementations will likely come up with different answers to different situations. This brings me to the next rule: assume no two implementations will behave the same way when optimizing using these attributes.

So given all of these odd issues with the attributes, why would you want to use them? In my mind, there are only two use cases for the likelihood attributes. Either you have an implementation which does not support profile-guided optimizations (which will generally do a far better job of predicting branch weights for optimization than a programmer ever could) or you need to optimize a code path in a strange way where you cannot use PGO. The first question is the easier one to address: can you point to a C++20 implementation that doesn’t support profile-guided optimizations? I can’t find one. Maybe these implementations really do exist, but the major vendors all support the concept, so this isn’t a very compelling argument for adding the attributes to your own code unless you’re in that situation. That is why my rule is: prefer profile-guided optimization over likelihood attributes. It is more suited to the purpose of optimizing flow control and is likely to result in better performing code.

The second situation is more interesting to talk about because it seems off-the-wall until you understand it. Sometimes you want to optimize the failure path that almost never gets hit rather than the common paths that do. Consider writing some safety-critical piece of code to control an elevator where you need the failure path to meet some real-time obligations in order to stop the elevator from dashing its occupants to death. In that situation, your optimization needs can’t be met by PGO and the likelihood attributes could be very useful. Consider this use case which came up during the standards discussions about the feature:

try {
  foo();
} catch (...) [[likely]] {
  dont_kill_people();
}

This is an attempt to convince the optimizer to optimize the catch statement control flow path, but it has three problems that may not be obvious from looking at the code. The first problem is a small one: the attribute is misnamed for its use in this case, which makes the code far harder to read than it needs to be. The second problem is that the C++ grammar doesn’t allow you to write the attribute at that position! You’d have to put the [[likely]] attribute inside of the catch block’s compound statement. The final problem is: implementations typically have no idea how to optimize the failure path for C++ exceptions. So these attributes failed to address the intended need in this circumstance, which is another rule: not all flow control paths can be optimized. Exception handling, setjmp/longjmp, and the branches in a ?: operator are all examples of flow control where the likelihood attributes either cannot be written or may look like they’ll do something useful, but likely won’t (pun totally intended).
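
For reference, here is a sketch of the placement the grammar does accept, with the attribute on a statement inside the handler's compound statement. The bodies of foo and dont_kill_people are hypothetical stand-ins so that the fragment is self-contained, and nothing here implies that an optimizer will act on the hint.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for the functions named in the text.
inline void foo() { throw std::runtime_error("sensor fault"); }
inline bool dont_kill_people() { return true; }

// Returns true when the failure path ran.
inline bool run_with_failsafe() {
  try {
    foo();
    return false;
  } catch (...) {
    [[likely]];  // valid placement: the attribute appertains to a null statement
    return dont_kill_people();
  }
}

inline void failsafe_demo() { assert(run_with_failsafe()); }
```

This compiles as C++20, but as noted above it remains a hint that implementations typically cannot use for exception paths.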

Let’s review the rules we’ve got so far:

0) Never allow multiple likelihood attributes to appear in the same path of execution.
1) Only mark the dominating statement or label of the flow control path you want to optimize for.
2) Assume no two implementations will behave the same way for optimization behaviors with these attributes.
3) Prefer profile-guided optimization over likelihood attributes.
4) Not all flow control paths can be optimized.
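
As an illustration of rules 0 and 1, the earlier A/B/C example could be restructured so that a single attribute sits on the substatement that dominates both expected outcomes. This is only a sketch: the handler bodies and the dispatch function are hypothetical, and per rule 2 there is no guarantee about what any given implementation does with the hint.

```cpp
#include <cassert>

// Hypothetical handlers standing in for baz(), quux(), and bing().
inline int baz() { return 1; }
inline int quux() { return 2; }
inline int bing() { return 3; }

inline int dispatch(bool foo, bool bar) {
  // One attribute, written on the substatement that dominates both
  // expected outcomes; no path of execution sees two attributes.
  if (foo || bar) [[likely]] {
    return foo ? baz() : quux();
  }
  return bing();  // left unannotated; it is simply the other path
}

inline void dispatch_demo() {
  assert(dispatch(true, false) == 1);
  assert(dispatch(false, true) == 2);
  assert(dispatch(false, false) == 3);
}
```

The cost of this shape is that foo is tested twice, which is the kind of trade-off that makes rule 3 attractive.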

These attributes are starting to look a bit more like some other code constructs we’ve seen in the past: the register keyword as an optimization hint to put things in registers and the inline keyword as an optimization hint to inline function bodies into the call site. Using register or inline for these purposes is often strongly discouraged because experience has shown that optimizer implementations eventually improved to the point where they were consistently better at optimizing than the user trying to give their own hints. However, at least the register and inline keywords have other semantic impact (like not being able to take the address of a register variable in C). The likelihood attributes have no semantic impact beyond their optimization hints. Given how hard it is to use these attributes properly (especially if the code is being compiled by multiple implementations), how good profile-guided optimization is by comparison, and that there is no semantic impact from the attribute, my recommendation is to never use the likelihood attributes. They’re just not worth it.

by Aaron Ballman at August 27, 2020 02:57 PM

June 02, 2020

Sylvestre Ledru

Debian rebuild with clang 10 + some patches

Because of the lock-down in France and thanks to Lucas, I have been able to make some progress rebuilding Debian with clang instead of gcc.

TLDR

Instead of patching clang itself, I used a different approach this time: patching Debian tools or implementing some workaround to mitigate an issue.
The percentage of packages failing dropped from 4.5% to 3.6% (1400 packages to 1110, out of a total of 31014).

I focused on two classes of issues:

Qmake

As I have no intention to merge the patch upstream, I used a very dirty workaround. I overwrote the g++ qmake file by clang's:
https://salsa.debian.org/lucas/collab-qa-tools/-/blob/master/modes/clang10#L44-47

This reduced the number of these failures to 0, making some packages build flawlessly (for example: qtcreator, chessx, fwbuilder, etc).

However, some packages now fail later in the build, which increases the number of failures in other categories such as link errors. For example, qtads fails because of an ordered comparison between pointer and zero, and oscar fails on a -Werror,-Wdeprecated-copy error.

Breaking the build later also highlighted some new classes of issues which didn't occur with clang < 10.
For example, warnings related to C++ range loops or implicit int-to-float conversions (I fixed a bunch of them in Firefox).

Symbol differences

Historically, symbol management for C++ in Debian has been a pain. Russ Allbery wrote a blog post in 2012 explaining the situation. AFAIK, it hasn't changed much.
Once more, I took the dirty approach: if there are new or missing symbols, don't fail the build.
The rationale is the following: packages in the Debian archive are supposed to build without any issue. If there are new or missing symbols, it is probably clang generating a different library, but this library is very likely working as expected (and usable by a program compiled with g++ or clang). It is purely a different approach taken by the compiler developers.

In order to mitigate this issue, before the build starts, I am modifying dpkg-gensymbols to transform the error into a warning.
So, the typical Debian errors "some new symbols appeared in the symbols file" or "some symbols or patterns disappeared in the symbols file" will NOT fail the build.

Unsurprisingly, all but one package (libktorrent) build.

Even though I am pessimistic, I reported a bug against dpkg-dev to evaluate whether dpkg-gensymbols could be improved so that it does not fail in these cases.

Next steps

The next offender is Imake.tmpl:2243:10: fatal error: ' X11 .rules' file not found with more than a hundred occurrences, reported upstream quite some time ago.

Then, the big issues are going to be much harder to fix as they are real issues/warnings (with -Werror) in the code of the packages. Examples: -Wc++11-narrowing and -Wreserved-user-defined-literal... The list is long.
I will probably work on that when llvm/clang 11 are in RC phase.
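
To give an idea of what the source-level fixes look like, here is a hedged sketch of the two warning classes named above. The macro, struct, and function names are hypothetical and not taken from any Debian package; the "Before" forms appear only in comments.

```cpp
#include <cassert>
#include <string>

#define APP_VERSION "1.0"  // hypothetical macro, for illustration only

// -Wreserved-user-defined-literal
// Before: return "v"APP_VERSION;   (the identifier is lexed as a literal suffix)
// After:  a space between the string literal and the macro name.
inline std::string banner() { return "v" APP_VERSION; }

struct Pixel { unsigned char r, g, b; };

// -Wc++11-narrowing
// Before: return Pixel{r, g, b};   (int narrowed to unsigned char inside braces)
// After:  explicit casts state the intent.
inline Pixel make_pixel(int r, int g, int b) {
  return Pixel{static_cast<unsigned char>(r),
               static_cast<unsigned char>(g),
               static_cast<unsigned char>(b)};
}

inline void fixes_demo() {
  assert(banner() == "v1.0");
  assert(make_pixel(255, 0, 16).b == 16);
}
```

Fixes of this kind are mechanical, but they have to be made in each package's own code, which is why they cannot be worked around on the tooling side.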

For maintainers & upstream

Maintainer of Debian/Ubuntu packages? I am providing a list of failing packages per maintainer: https://clang.debian.net/maintainers.php
For upstream, it is also easy to test with clang. Usually, apt install clang && CC=clang CXX=clang++ <build step> is good enough.

Conclusion

With these two changes, I have been able to fix about 290 packages. I think I will be able to get the failure count down a bit more, but we will soon reach a plateau as many warnings/issues will have to be fixed in the C/C++ code itself.

by sylvestre at June 02, 2020 11:45 PM

April 22, 2020

LLVM Blog

The New Clang _ExtInt Feature Provides Exact Bitwidth Integer Types

Author: Erich Keane, Compiler Frontend Engineer, Intel Corporation

Earlier this month I finally committed a patch to implement the extended-integer type class, _ExtInt, after nearly two and a half years of design and implementation. These types allow developers to use custom width integers, such as a 13-bit signed integer. This patch is currently designed to track N2472, a proposal being actively considered by the ISO WG14 C Language Committee. We feel that these types are going to be extremely useful to many downstream users of Clang, and provide a language interface for LLVM's extremely powerful integer type class.

Motivation

LLVM-IR has the ability to represent integers with a bitwidth from 1 all the way to 16,777,215 ((1<<24)-1); however, the C language is limited to just a few power-of-two sizes. Historically, these types have been sufficient for nearly all programming architectures, since power-of-two representation of integers is convenient and practical.

Recently, Field-Programmable Gate Array (FPGA) tooling, called High Level Synthesis Compilers (HLS), has become practical and powerful enough to use a general purpose programming language for their generation. These tools take C or C++ code and produce a transistor layout to be used by the FPGA. However, once programmers gained experience in these tools, it was discovered that the standard C integer types are incredibly wasteful for two main reasons.

First, a vast majority of the time programmers are not using the full width of their integer types. It is rare for someone to use all 16, 32, or 64 bits of their integer representation. On traditional CPUs this isn't much of a problem as the hardware is already in place, so having bits never set comes at zero cost. On the other hand, on FPGAs logic gates are an incredibly valuable resource, and HLS compilers should not be required to waste bits on large power of two integers when they only need a small subset of that! While the optimizer passes are capable of removing some of these widths, a vast majority of this hardware needs to be emitted.

Second, the C language requires that operations on integers smaller than int are promoted to operations on the 'int' type. This further complicates hardware generation, as promotions to int are expensive and tend to stick with the operation for an entire statement at a time. These promotions typically have semantic meaning, so simply omitting them isn't possible without changing the meaning of the source code. Even worse, the proliferation of auto in user code has resulted in the larger integer size being quite viral throughout a program.
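
The promotion rule and its interaction with auto can be seen in standard C++ without any extension. The sketch below uses hypothetical function names and only illustrates the language rule, not how an HLS compiler lowers the code.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Both operands are promoted to int before the addition, so the
// expression has type int rather than std::uint8_t.
static_assert(std::is_same_v<
    decltype(std::uint8_t{} + std::uint8_t{}), int>);

// auto keeps the promoted type, so the wider type spreads to callers.
inline auto widened_sum(std::uint8_t a, std::uint8_t b) {
  auto sum = a + b;  // deduced as int
  return sum;
}

// Keeping the result at 8 bits requires an explicit conversion,
// which also changes the value when the sum exceeds 255.
inline std::uint8_t narrow_sum(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a + b);
}

inline void promotion_demo() {
  assert(widened_sum(200, 100) == 300);
  assert(narrow_sum(200, 100) == 44);  // 300 modulo 256
}
```

The two functions compute different values for the same inputs, which is why the promotion cannot simply be dropped by the compiler.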

The result is massively larger FPGA/HLS programs than the programmer needed, and likely much larger than they intended. Worse, there was no way for the programmer to express their intent in the cases where they do not need the full width of a standard integer type.

Using the _ExtInt Language Feature

The patch as accepted and committed into LLVM solves most of the above problems by providing the _ExtInt class of types. These types translate directly into the corresponding LLVM-IR integer types. The _ExtInt keyword is a type-specifier (like int) that accepts a required integral constant expression parameter representing the number of bits to be used. More succinctly: _ExtInt(7) is a signed integer type using 7 bits. Because it is a type-specifier, it can also be combined with signed and unsigned to change the signedness (and overflow behavior!) of the values. So "unsigned _ExtInt(9) foo;" declares a variable foo that is an unsigned integer type taking up 9 bits and represented as an i9 in LLVM-IR.

The _ExtInt types as implemented do not participate in any implicit conversions or integer promotions, so all math done on them happens at the appropriate bit-width. The WG14 paper proposes integer promotion to the largest of the types (that is, adding an _ExtInt(5) and an _ExtInt(6) would result in an _ExtInt(6)); however, the implementation does not permit that and _ExtInt(5) + _ExtInt(6) would result in a compiler error. This was done so that in the event that WG14 changes the design of the paper, we will be able to implement it without breaking existing programs. In the meantime, this can be worked around with explicit casts: (_ExtInt(6))AnExtInt5 + AnExtInt6 or static_cast<_ExtInt(6)>(AnExtInt5) + AnExtInt6.

Additionally, for C++, clang supports making the bitwidth parameter a dependent expression, so that the following is legal:
template<size_t WidthA, size_t WidthB>
_ExtInt(WidthA + WidthB) lossless_mul(_ExtInt(WidthA) a, _ExtInt(WidthB) b) {
  // Widen both operands first so the multiplication cannot overflow.
  return static_cast<_ExtInt(WidthA + WidthB)>(a)
       * static_cast<_ExtInt(WidthA + WidthB)>(b);
}

We anticipate that this ability and these types will result in some extremely useful pieces of code, including novel uses of 256 bit, 512 bit, or larger integers, plus uses of 8 and 16 bit integers for those who can't afford promotions. For example, one can now trivially implement an extended integer type struct that does all operations provably losslessly, that is, adding two 6 bit values would result in a 7 bit value.
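To illustrate the "provably lossless" idea without depending on compiler support for _ExtInt, here is a portable C++17 model. ModelInt is a hypothetical stand-in, not part of the patch: it tracks the bit width in the type and stores the value in an ordinary int64_t, and it does not check that a value actually fits its declared width. A real version would presumably hold an _ExtInt(Width) member instead.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for an extended integer: the bit width lives in the
// type, while the value is stored in a plain int64_t so this compiles anywhere.
template <unsigned Width>
struct ModelInt {
  static_assert(Width >= 1 && Width <= 62,
                "model only covers widths that fit in int64_t");
  std::int64_t value;
};

// Adding two signed values needs one more bit than the wider operand.
template <unsigned A, unsigned B>
constexpr ModelInt<(A > B ? A : B) + 1> operator+(ModelInt<A> a, ModelInt<B> b) {
  return {a.value + b.value};
}

// Multiplying needs the sum of both widths, mirroring lossless_mul.
template <unsigned A, unsigned B>
constexpr ModelInt<A + B> operator*(ModelInt<A> a, ModelInt<B> b) {
  return {a.value * b.value};
}
```

With this model, ModelInt<6>{31} + ModelInt<6>{31} has type ModelInt<7>, matching the 6 bit plus 6 bit example in the text.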

In order to be consistent with the C Language, expressions that include a standard type will still follow integral promotion and conversion rules. All types smaller than int will be promoted, and the operation will then happen at the largest type. This can be surprising in the case where you add a short and an _ExtInt(15), where the result will be int. However, this ends up being the most consistent with the C language specification.

Additionally, when it comes to conversions, these types 'lose' to the C standard types of the same size or greater. So, an int added to an _ExtInt(32) will result in an int. However, adding an int and an _ExtInt(33) will result in the latter. This is necessary to preserve C integer semantics.

History

As mentioned earlier, this feature has been a long time coming! In fact, this is likely the fourth implementation that was done along the way in order to get to this point. Additionally, this is far from over; we very much hope that, upon acceptance of this by the WG14 Standards Committee, additional extensions and features will become available.

I was approached to implement this feature in the Fall of 2017 by my company's FPGA group, which had the problems mentioned above. They had attempted a solution that used some clever parsing to make these look like templates, and implemented them extensively throughout the compiler. As I was concerned about the flexibility and usability of these types in the type and template system, we opted to implement these as a type-attribute under the controversially named Arbitrary Precision Int (spelled __ap_int). This spelling was heavily influenced by the vector-types implementations in GCC and Clang.

We then were able to wrap a set of typedefs (or dependent __ap_int types) in a structure that provided exactly the C and C++ interface we wished to expose. As this was a then proprietary implementation, it was kept in our downstream implementation, where it received extensive testing and usage.

Roughly a year later (and a little more than a year ago from today!) I was authorized to contribute our implementation to the open source LLVM community! I decided to significantly refactor the implementation in order to better fit into the Clang type system, and uploaded it for review. This (now third!) implementation of this feature was proposed via RFC and code review at the same time.

While the usefulness was immediately acknowledged, it was rejected by the Clang code owner for two reasons: first, the spelling was considered unpalatable, and second, it was a pure extension without standardization. This began the nearly year-long effort to come up with a standards proposal that would better define and describe the feature as well as come up with a spelling that was more in line with the standard language.

Thanks to the invaluable feedback and input from Richard Smith, my coworkers Melanie Blower and Tommy Hoffner and I were able to propose the spelling _ExtInt for standardization. Additionally, the feature was re-implemented at the beginning of this year and eventually accepted and committed!

The standardization paper (N2472) was presented at this Spring's WG14 ISO C Language Committee Meeting where it received near unanimous support. We expect to have an updated version of the paper with wording ready for the next WG14 meeting, where we hope it will receive sufficient support to be accepted into the language.

Future Extensions

While the feature as committed in Clang is incredibly useful, it can be taken further. There are a handful of future extensions that we wish to implement once guidance from WG14 has been given on their direction and implementation.

First, we believe the special integer promotion/conversion rules, which omit automatic promotion to int and instead provide operations at the largest type, are both incredibly useful and powerful. While we have received positive encouragement from WG14, we hope that the wording paper we provide will clarify the mechanism and definition in a way that supports all common uses.

Secondly, we would like to choose a printf/scanf specifier that permits specifying the type for the C language. This was the topic of the WG14 discussion, and also received strong encouragement. We intend to come up with a good representation, then implement this in major implementations.

Finally, numerous people have suggested implementing a way of spelling literals of this type. This is important for two reasons: First, it allows using literals without casts in expressions in a way that doesn't run afoul of promotion rules. Second, it provides a way of spelling integer literals larger than UINTMAX_MAX, which can be useful for initializing the larger versions of these types. While the spelling is undecided, we intend something like: 1234X would result in an integer literal with the value 1234 represented in an _ExtInt(11), which is the smallest type capable of storing this value.

However, without the integer promotion/conversion rules above, this feature isn't nearly as useful. Additionally, we'd like to be consistent with whatever the C language committee chooses. As soon as we receive positive guidance on the spelling and syntax of this type, we look forward to providing an implementation.

Conclusion

In closing, we encourage you to try using this and provide feedback to me, my proposal co-authors, and the C committee itself! We feel this is a really useful feature and would love to get as much user experience as possible. Feel free to contact me and my co-authors with any questions or concerns!

-Erich Keane, Intel Corporation

by Erich Keane (noreply@blogger.com) at April 22, 2020 01:04 PM

March 22, 2020

Sylvestre Ledru

Some clang rebuild results (8.0.1, 9.0.1 & 10rc2)

As part of the LLVM release cycle, I am continuing rebuilding the Debian archive with clang instead of gcc to evaluate potential regressions.

Processed results are available on the website: https://clang.debian.net/status.php - Now includes some fancy graphs to show the evolution
Raw logs are published on github: https://github.com/opencollab/clang.debian.net/tree/master/logs

Since my last blog post on the subject (August 2017), Clang is more and more present in the tech ecosystem. It is now the compiler used to build Firefox and Chrome upstream binaries on all the supported architectures/operating systems. More architectures are supported, it has a new linker (lld), a new hybrid IR (MLIR), a lot of checkers in clang-tidy, cross-language linking with Rust, etc.


Results

Now, about Debian results, we rebuilt using 8.0.1, 9.0.1 and 10.0rc2. Results are pretty similar to what we had with previous versions: between 4 and 5% of packages are failing when gcc is replaced by clang.

Even if most software still uses gcc as its compiler, we can see that clang has a positive effect on code quality. With many different kinds of errors and warnings found by clang over the years, we noticed a steady decline in the number of errors. For example, the number of incorrect C/C++ main declarations has been decreasing year after year.

Errors found

The biggest offender is still the qmake change, which doesn't allow the workaround we use (replacing /usr/bin/gcc with /usr/bin/clang) - about 250 errors. Most of these packages would probably compile fine with clang. More on the Qt bug tracker. The workaround proposed in the bug isn't applicable for us as we use a drop-in replacement of the compiler.

The second error is still some differences in symbol generation. Unlike gcc, it seems that clang doesn't generate some symbols (or adds some). As a bunch of Debian packages are checking the list of symbols in the library (for ABI management), the build fails on purpose. For example, with libcec, the symbol _ZN10P8PLATFORM14CConditionImplD1Ev@Base 3.1.0 isn't generated anymore. I am not expecting this to be a big deal: the generated libraries probably work most of the time. More on C++ symbol management in Debian.
I reported this bug upstream a while back: https://bugs.llvm.org/show_bug.cgi?id=30441

Current status

As previously said in a blog post, I don't think there is a strong incentive to move away from gcc for most of the Linux distributions. The big reason for BSD was the license (even if the move to the Apache 2 license wasn't received positively by some of them).
While the LLVM/clang ecosystem clearly won the tooling battle, as a C/C++ compiler, gcc is still an excellent compiler which supports more architectures and more languages.
In terms of new warnings and checks, as the clang community moved its efforts to clang-tidy (which requires more complex tooling), gcc provides a better experience out of the box (for example, see the Firefox meta bug to build with -Werror with the default warnings using gcc 9, gcc 10 and clang trunk).

Next steps

I see some potential next steps to decrease the number of failures:

  • Work around the Qt/Qmake issue
  • Fix the objective-c header include issues (echo "#include <objc/objc.h>" > foo.m && clang -c foo.m is currently failing)
  • Identify why clang generates more/fewer symbols than gcc in the library and try to fix that
  • Rebuild the archive with clang-7 - it seems that I have some data problems

Many thanks to Lucas Nussbaum for the rebuilds.

by sylvestre at March 22, 2020 11:31 PM

November 07, 2019

LLVM Blog

Deterministic builds with clang and lld

Deterministic builds can lower continuous integration costs and give you more confidence in your build and test process. This post outlines what it means for a build to be deterministic, the advantages of deterministic builds, and how to achieve them using LLVM tools.

What is a deterministic build, and its advantages

A build is called deterministic or reproducible if running it twice produces exactly the same build outputs.

There are several degrees of build determinism that are increasingly useful but increasingly difficult to achieve:

  1. Basic determinism: Doing a full build of the same source code in the same directory on the same machine produces exactly the same output every time, in the sense that a content hash of the final build artifacts and of all intermediate files does not change.
    • Once you have this, if all your builders are configured the same way (OS version, toolchain, build path, checkout path, …), they can share build artifacts, for example by using distcc.
    • This also allows local caching of test suite results keyed by a hash of test binary and test input files.
    • Illustrative example: ./build src out ; mv out out.old ; ./build src out ; diff -r out out.old
  2. Incremental basic determinism: Like basic determinism, but the output binaries also don’t change in partial rebuilds. In build systems that track file modification times to decide when to rebuild, this means for example that updating the modification time on a C++ source file (without doing any actual changes) and rebuilding will produce the same output as a full build.
    • This allows having build bots that don’t do full builds each time, while still allowing caching of compile artifacts and test results.
    • Illustrative example: ./build src out ; cp -r out out.old ; touch src/foo.c ; ./build src out ; diff -r out out.old
  3. Local determinism: Like incremental basic determinism, but builds are also independent of the name of the build directory. Builds of the same source code on the same machine produce exactly the same output every time, independent of the location of the source checkout directory or the build directory.
    • This allows machines to have several build directories at different locations but still share compile and test caches.
    • Illustrative example: cp -r src src2 ; ./build src out ; ./build src2 out2 ; diff -r out out2
  4. Universal determinism: Like 3, but builds are also independent of the machine the build runs on. Everybody that checks out the project at a given revision into any directory and builds it following the build instructions ends up with exactly the same bits in the build output.
    • Since exact local OS and locally installed packages no longer matter, this allows devs to share compile and test caches with bots, without having to use difficult-to-setup containers.
    • It also allows easy verification of builds done by others to make sure output binaries haven’t been tampered with.
    • Illustrative example: ./build src out ; ssh remote ./build src out && scp remote:out out2 ; diff -r out out2

Plan of attack

To make sure that a deterministic build stays deterministic, you should set up a builder that verifies that your build is deterministic. Even if your build isn’t deterministic yet, you can set up a bot that verifies that some parts of your build are deterministic and then expand the checks over time.

For example, you could have a bot that does a full build in a fixed build directory, then moves the build artifacts out of the way, and does another full build. Once your compiles have basic determinism, add a step that checks that object files in the two build directories are the same. You could even add incremental checking for specific subdirectories or build targets while you work towards full basic determinism.

Once your links are deterministic, check that binaries are identical as well. Once all your build steps are deterministic, compare all files in the two build directories.

Once your build has incremental determinism, do an incremental build for the first build and a full build for the second build. Once your build has local determinism, do the two builds at different build paths.
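The comparison step of such a bot could look roughly like the following C++17 sketch, which plays the role of the `diff -r out out.old` commands from the illustrative examples. The helper names (read_bytes, list_files, differing_files) are invented for this example, and it compares raw file bytes rather than content hashes for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Reads a whole file as raw bytes.
inline std::string read_bytes(const fs::path& file) {
  std::ifstream in(file, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Lists regular files below root as sorted relative paths, so the report
// itself does not depend on directory iteration order.
inline std::vector<std::string> list_files(const fs::path& root) {
  std::vector<std::string> files;
  for (const auto& entry : fs::recursive_directory_iterator(root)) {
    if (entry.is_regular_file())
      files.push_back(entry.path().lexically_relative(root).generic_string());
  }
  std::sort(files.begin(), files.end());
  return files;
}

// Returns the relative paths that exist on only one side or whose bytes differ.
inline std::vector<std::string> differing_files(const fs::path& first,
                                                const fs::path& second) {
  assert(fs::is_directory(first) && fs::is_directory(second));
  const std::vector<std::string> a = list_files(first);
  const std::vector<std::string> b = list_files(second);
  std::vector<std::string> all;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                 std::back_inserter(all));

  std::vector<std::string> differing;
  for (const std::string& rel : all) {
    const bool in_both = fs::is_regular_file(first / rel) &&
                         fs::is_regular_file(second / rel);
    if (!in_both || read_bytes(first / rel) != read_bytes(second / rel))
      differing.push_back(rel);
  }
  return differing;
}
```

A bot would fail the build whenever differing_files returns a non-empty list. While only compiles are deterministic, the list could be filtered to object files before checking.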

Getting to basic determinism

Basic determinism needs tools (compiler, linker, etc) that are deterministic. Tools internally must not output things in hash table order, multi-threaded programs must not write output in the order threads finish, etc. All of LLVM’s tools have deterministic outputs when run with the right flags but not necessarily by default.
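As an illustration of the hash table point, a tool that keeps its data in an unordered container can copy the entries out and sort them before writing. This is a generic C++17 sketch with a made-up function name (emit_symbols) and output format; it is not taken from any LLVM tool.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Serializes a table in sorted key order. Iterating the unordered_map
// directly would emit entries in an order that can vary between standard
// library implementations or with insertion history.
inline std::string emit_symbols(
    const std::unordered_map<std::string, int>& table) {
  std::vector<std::pair<std::string, int>> entries(table.begin(), table.end());
  std::sort(entries.begin(), entries.end());
  std::string out;
  for (const auto& [name, address] : entries)
    out += name + "=" + std::to_string(address) + "\n";
  return out;
}
```

The same idea applies to multi-threaded tools: collect per-thread results first, then write them in an order that is derived from the inputs.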

The C standard defines the predefined macros __TIME__ and __DATE__ that expand to the time a source file is compiled. Several compilers, including clang, also define the non-standard __TIMESTAMP__. This is inherently nondeterministic. You should not use these macros, and you can use -Wdate-time to make the compiler emit a warning when they are used.

If they are used in third-party code you don’t control, you can use -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= to make them expand to nothing.

When targeting Windows, clang and clang-cl by default also embed the current time in a timestamp field in the output .obj file, because Microsoft’s link.exe in /incremental mode silently mislinks files if that field isn’t set correctly. If you don’t use link.exe’s /incremental flag, or if you link with lld-link, you should pass /Brepro to clang-cl to make it not write the current timestamp into its output.

Both link.exe and lld-link also write the current timestamp into output .dll or .exe files. To make them instead write a hash of the binary into this field, you can pass /Brepro to the linker as well. However, some tools, such as Windows 7’s app compatibility database, try to interpret that field as an actual timestamp and can get confused if it’s set to a hash of the binary. For this case, lld-link also offers a /timestamp: flag that you can give an explicit timestamp that’s written into the output. You could use this, for example, to write the time of the commit the code is built at instead of the current time to make it deterministic. (But see the footnote on embedding commit hashes below.)

Visual Studio’s assemblers ml.exe and ml64.exe also insist on writing the current time into their output. In situations like this, where you can’t easily fix the tool to write the right output in the first place, you need to write wrappers that fix up the file after the fact. As an example, ml.py is the wrapper the Chromium project uses to make ml’s output deterministic.

macOS’s libtool and ld64 also insist on writing timestamps into their outputs. You can set the environment variable ZERO_AR_DATE to 1 in a wrapper to make their output deterministic, but that confuses lldb of older Xcode versions.

Gcc sometimes uses random numbers in certain symbol mangling situations. Clang does not do this, so there’s no need to pass -frandom-seed to clang.

It’s a good idea to make your build independent of environment variables as much as possible, so that accidental local changes in the environment don’t affect the build output. You should pass /X to clang-cl to make it ignore %INCLUDE% and explicitly pass system include directories via the -imsvc switch instead. Likewise, very new lld-link versions (LLVM 10 and newer, at the time of this writing still unreleased) understand the /lldignoreenv flag, which makes lld-link ignore the %LIB% environment variable; explicitly pass system library directories via /libpath:.

Footnote on embedding git hashes into the binary
It might be tempting to embed the git commit hash or svn revision that a binary was built at into the binary’s --version output, or use the revision as a cache key to invalidate on-disk caches when the version changes.

This doesn’t affect your build’s determinism, but it does affect the hit rate if you’re using deterministic builds to cache test run results. If your binary embeds the current commit, it is guaranteed to change on every single commit, and you won’t be able to cache test results across commits. Even commits that just fix typos in comments, add non-code documentation, or that only affect code used by some but not all of your binaries will change every binary.

For cache invalidation, consider using something finer-grained, such as only the latest commit of the directory containing the cache handling code, or the hash of all source files containing the cache handling code.

For --version output, if your build is fully deterministic, the hash of the binary itself (and its dynamic library dependencies) can serve as a stable version identifier. You can keep a map of binary hash to all commit hashes that produce that binary somewhere.

Windows only: For the same reason, just using the timestamp of the latest commit as a /timestamp: might not be the best option. Rounding the timestamp of the latest commit to 6h (or similar) granularity is a possible approach for not having the timestamp change the binary on every commit, while still keeping the timestamp close to reality. For production builds, the symbol server key for binaries is a (executable size, timestamp) pair, so here having fairly granular timestamps is important to not map binaries from consecutive commits to the same symbol server key. Depending on how often you push production binaries to your symbol servers, you might want to use the timestamp of the latest commit as /timestamp: for official builds, or you might want to round to finer granularity than you do on dev builds.
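The rounding mentioned above is plain integer arithmetic. Here is a small C++ sketch, with a made-up function name and assuming the commit time is available as a non-negative Unix timestamp in seconds; how the result is fed into /timestamp: is left to the build system.

```cpp
#include <cstdint>

// Six hours in seconds, the example granularity from the text.
constexpr std::int64_t kSixHours = 6 * 60 * 60;

// Rounds a non-negative Unix timestamp down to a multiple of `granularity`
// seconds, so all commits inside one window share the same timestamp.
constexpr std::int64_t round_down_timestamp(
    std::int64_t unix_seconds, std::int64_t granularity = kSixHours) {
  return unix_seconds - (unix_seconds % granularity);
}
```

Dev builds might use the default six-hour window, while official builds could pass a smaller granularity (or the exact commit time) so that consecutive production binaries do not collide on the symbol server key.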

Getting to incremental determinism

Having deterministic incremental builds mostly requires having correct incremental builds, meaning that if a file is changed and the build reruns, everything that uses this file needs to be rebuilt.

This is very build system dependent, so this post can’t say much about it.

In general, every build step needs to correctly declare all the inputs it depends on.

Some tools, such as Visual Studio’s link.exe in /incremental mode, by design write a different output every time. Don’t use inherently incrementally non-deterministic tools like that if you care about build determinism.

The build should not depend on environment variables, since build systems usually don’t model dependencies on environment variables.

Getting to local determinism

Making build outputs independent of the names of the checkout or build directory means that build outputs must not contain absolute paths, or relative paths that contain the name of either directory.

A possible way to arrange for that is to put all build directories into the checkout directory. For example, if your code is at path/to/src, then you could have “out” in your .gitignore and build directories at path/to/src/out/debug, path/to/src/out/release, and so on. The relative path from each build artifact to the source is then “../../” followed by the path of the source file in the source directory, which is identical for each build directory.
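To check that this layout really yields the same relative path from every build directory, the path arithmetic can be reproduced with std::filesystem. This is a C++17 sketch; the function name and the example file base/file.cc are made up, and the computation is purely lexical, so nothing needs to exist on disk.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Computes how a build step running in build_dir would name source_file.
// Purely lexical: nothing is looked up on disk.
inline std::string path_seen_from_build_dir(const fs::path& source_file,
                                            const fs::path& build_dir) {
  // lexically_relative needs both paths to be of the same kind.
  assert(source_file.is_absolute() == build_dir.is_absolute());
  return source_file.lexically_relative(build_dir).generic_string();
}
```

For path/to/src/base/file.cc, both path/to/src/out/debug and path/to/src/out/release give "../../base/file.cc", and so does the same layout under a differently named checkout directory.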

The C standard defines the predefined macro __FILE__ that expands to the name of the current source file. Clang expands this to an absolute path if it is invoked with an absolute path (`clang -c /absolute/path/to/my/file.cc`), and to a relative path if it is invoked with a relative path (`clang ../../path/to/my/file.cc`). To make your build locally deterministic, pass relative paths to your .cc files to clang.
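One way to notice when a build step accidentally passes an absolute path is a compile-time check on __FILE__. The sketch below is only an idea, not something clang provides: looks_absolute and the ENFORCE_RELATIVE_FILE macro are hypothetical names, and the heuristic only recognizes POSIX-style and Windows-style prefixes.

```cpp
// Returns true for paths that look absolute: POSIX-style ("/..."),
// or Windows-style ("\..." or a drive prefix such as "C:...").
constexpr bool looks_absolute(const char* path) {
  return path[0] == '/' || path[0] == '\\' ||
         (path[0] != '\0' && path[1] == ':');
}

// Opt-in guard: define the (hypothetical) ENFORCE_RELATIVE_FILE macro in the
// build to make compilation fail when the source was passed as an absolute path.
#ifdef ENFORCE_RELATIVE_FILE
static_assert(!looks_absolute(__FILE__),
              "pass a relative source path to keep the build path-independent");
#endif
```

Placed in a commonly included header and enabled on a verification bot, such a guard would flag the offending translation unit at compile time instead of through a later binary diff.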

By default, clang will internally use absolute paths to refer to compiler-internal headers. Pass -no-canonical-prefixes to make clang use relative paths for these internal files.

Passing relative paths to clang makes clang expand __FILE__ to a relative path, but paths in debug information are still absolute by default. Pass -fdebug-compilation-dir . to make paths in debug information relative to the build directory. (Before LLVM 9, this is an internal clang flag that must be used as `-Xclang -fdebug-compilation-dir -Xclang .`) When using clang’s integrated assembler (the default), -Wa,-fdebug-compilation-dir,. will do the same for object files created from assembly input. (For ml.exe / ml64.exe, see the script linked to from the “Basic determinism” section above.)

Using this means that debuggers won’t automatically find the source code belonging to your binary. At the moment, there’s no way to tell debuggers to resolve relative paths relative to the location of the binary (DWARF proposal, gdb patch). See the end of this section for how to configure common debuggers to work correctly.

There are a few flags that try to make compilers produce relative paths in outputs even if the filename passed to the compiler is absolute (-fdebug-prefix-map, -ffile-prefix-map, -fmacro-prefix-map). Do not use these flags.
  • They work by adding lhs=rhs replacement patterns, and the lhs must be an absolute path to remove the absolute path from the output. That means that while they make the compile output path-independent, they make the compile command itself path-dependent, which hinders distributed compile caching. With -grecord-gcc-switches or -frecord-gcc-switches the compile command is embedded in debug info or even the object file itself, so in that case the flags even break local determinism. (Both -grecord-gcc-switches and -frecord-gcc-switches default to false in clang.)
  • They don’t affect the paths in dwo files when using fission; passing relative paths to the compiler is the only way to make these paths relative.
On Windows, it’s very unusual to have PDBs with relative paths. You can pass /pdbsourcepath:X:\fake\prefix to lld-link to make it resolve all relative paths in object files against a fixed absolute path to make sure your final PDBs contain absolute paths. Since the absolute path is against a fixed prefix, this doesn’t impair determinism. With this, both binaries and PDBs created by clang-cl and lld-link will be fully deterministic and build path independent.

Also on Windows, the linker by default puts the absolute path to the generated PDB file in the output binary. Pass /pdbaltpath:%_PDB% when you pass /debug to make the linker write a relative path to the generated PDB file instead. If you have custom build steps that extract PDB names from binaries, you have to make sure these scripts work with relative paths. Microsoft’s tools (debuggers, ETW) work fine with this set in most situations, and you can add a symbol search path in the cases where they don’t (when the binaries are copied before being run).

Getting debuggers to work well with locally deterministic builds
At the moment, no debugger offers an option to resolve relative paths in debug info against the directory the debugged binary is in.

Some debuggers (gdb, lldb) do try to resolve relative paths against the cwd, so a simple way to make debugging work is to cd into your build directory before debugging.

If you don’t want to require devs to cd into the build directory for debugging to work, you have to do debugger-specific configuration tweaks.

To make sure devs don’t miss this, you could have your custom init script set an env var and query if it’s set early during your test binary startup, and exit with a message like “Add `source /path/to/your/project/gdbinit` to your ~/.gdbinit” if the environment variable isn’t set.

gdb
`dir path/to/build/dir` tells gdb what directory to resolve relative paths against.

`show debug-file-directory` prints the list of directories gdb looks in for dwo files. Query that, append `:path/to/build/dir`, and call `set debug-file-directory` to add your build dir to that search path.

For an example, see Chromium’s gdbinit (which also does a few other unrelated things).

lldb
`settings set target.source-map ../.. /absolute/path/to/build/dir` can map the “../..” prefix that all .cc files will refer to when using the setup described above with an absolute path. This requires Xcode 10.3 or newer; the lldb shipping with Xcode 10.1 has problems with this setup.

For an example, see Chromium’s lldbinit.

Visual Studio’s debugger and windbg
If you use the setup described above,  /PDBSourcePath:X:\fake\prefix will combine with the “..\..\my\file.cc” relative paths to make your code appear at “X:\my\file.cc”. To make Windows debuggers find them, you have two options:
  1. Run `subst X: C:\src\real\root` in cmd.exe before launching the debuggers to create a virtual drive that maps X: to the actual source location. Both windbg and Visual Studio will load code over X: this way.
  2. Add “C:\src\real\root” to each debugger’s source search path.
    • Windbg: Run `.srcpath+ C:\src\real\root`. You can also set this via the _NT_SOURCE_PATH  environment variable, or via  File->Source File Path (Ctrl+P). Or pass `-srcpath C:\src\real\root` when launching windbg from the command line.
    • Visual Studio: The IDE has a “Debug Source Files” property. Add C:\src\real\root to “Directories containing source code” under Project->Properties (Alt+F7)->Common Properties->Debug Source Files.
Alternatively, you could pass the absolute path to the actual build directory to /PDBSourcePath: instead of something like “X:\fake\prefix”. That way, all PDBs have “correct” absolute paths in them, while your compile steps are still path-independent and can share a cache across machines. However, since executables contain a reference to the PDB hash, none of your binaries will be path-independent. This setup doesn’t require any debugger configuration, but it doesn’t allow your builds to be locally deterministic.

Getting to universal determinism

By now, your build output is deterministic as long as everyone uses the same compiler and linker binaries, and as long as everyone uses the same version of the SDK and system libraries.

Making your build independent of that requires making sure that everyone automatically uses the same compiler, linker, and SDK.

This might seem like a lot of work, but in addition to build determinism this work also gives you cross builds (where you can e.g. build the Linux version of your product on a Windows host).

It also versions the compiler, linker, and SDK used within your code, which means you will be able to update all your bots and devs to new versions automatically (and if an update causes issues, it’s easy to revert it).

You need to store the currently used compiler, linker, and SDK versions in a file in your source control repository. Then, from some kind of hook that runs after pulling the newest version of the source, download the compiler, linker, and SDK of the right version from some kind of cloud storage service.
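
A minimal sketch of such a hook might look like the following. Everything named here is an assumption made for illustration: the pinned versions live in a JSON file, a second "stamp" file records what was last installed, and `download` is a callback you would supply to fetch and unpack a package from your own storage service.

```python
import json
from pathlib import Path


def packages_to_update(pinned_file: Path, stamp_file: Path) -> dict:
    """Return {package: version} for each pinned package that is not installed yet."""
    pinned = json.loads(pinned_file.read_text())
    installed = json.loads(stamp_file.read_text()) if stamp_file.exists() else {}
    return {name: ver for name, ver in pinned.items() if installed.get(name) != ver}


def sync_toolchain(pinned_file: Path, stamp_file: Path, download) -> list:
    """Download every stale package, then record the pinned state in the stamp file.

    `download(name, version)` is a hypothetical callback; if it raises, the
    stamp file is left untouched so the next run retries.
    """
    stale = packages_to_update(pinned_file, stamp_file)
    for name, version in sorted(stale.items()):
        download(name, version)
    stamp_file.write_text(pinned_file.read_text())
    return sorted(stale)
```

Because the pinned file is checked in, reverting a bad toolchain update is an ordinary source control revert, and the next hook run downloads the previous version again.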

You then need to modify your build files to pass --sysroot (Linux), -isysroot (macOS), or -imsvc (Windows) so that builds use these hermetic SDKs. The SDKs need to be somewhere below your source root to not regress build directory name invariance.
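
One way a build-file generator could produce these flags is sketched below. The function name, the OS labels, and the directory names are assumptions for illustration. Paths are given relative to the source root and rewritten relative to the build directory, so no absolute path ends up on the command line.

```python
import posixpath


def hermetic_sdk_flags(target_os: str, sdk_dirs: list, build_dir: str) -> list:
    """Return compiler flags pointing at a checked-in SDK.

    sdk_dirs and build_dir are relative to the source root and use forward
    slashes. Linux and macOS take one sysroot; for Windows each entry is an
    include directory passed via -imsvc.
    """
    rel = [posixpath.relpath(d, build_dir) for d in sdk_dirs]
    if target_os == "linux":
        return [f"--sysroot={rel[0]}"]
    if target_os == "mac":
        return ["-isysroot", rel[0]]
    if target_os == "win":
        return [f"-imsvc{d}" for d in rel]
    raise ValueError(f"unknown target OS: {target_os}")


print(hermetic_sdk_flags("linux", ["third_party/sysroot"], "out/release"))
```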

You also want to make sure your build doesn’t depend on environment variables, as already mentioned in the “Getting to incremental determinism” section, since environments on different machines can be very different and difficult to control.

Build steps also shouldn’t embed the hostname of the current machine, the logged-in user name, or similar host-specific information in the build output.
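
A rough way to catch such leaks is to scan the build output for the strings in question. The sketch below is only an illustration with a hypothetical function name. It looks for plain byte strings, so it would miss UTF-16 encoded text, and very short user names can produce false positives.

```python
import getpass
import socket
from pathlib import Path


def find_host_leaks(output_dir: Path, needles=None) -> list:
    """Return (file name, needle) pairs for build outputs containing host-specific strings.

    By default the needles are the current hostname and user name.
    """
    if needles is None:
        needles = [socket.gethostname(), getpass.getuser()]
    encoded = [n.encode() for n in needles if n]
    leaks = []
    for path in sorted(output_dir.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            leaks.extend((path.name, n.decode()) for n in encoded if n in data)
    return leaks
```

Running a check like this on a bot after each build makes it easier to notice a newly added build step that records where it ran.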

Summary

This post explained what deterministic builds are, how build determinism spans a spectrum (local, fixed-build-dir-path-only to fully host-OS-independent) instead of just being binary, and how you can use LLVM’s tools to make your build deterministic. It also touched on techniques you can use to make your test caches more effective.

Thanks to Elly Fong-Jones for helping edit and structure this post, and to Adrian McCarthy, Bob Haarman, Bruce Dawson, Dirk Pranke, Fumitoshi Ukai, Hans Wennborg, Kai Naschinski, Reid Kleckner, Rui Ueyama, and Takuto Ikuta for reading drafts and suggesting improvements.

by Nico Weber (noreply@blogger.com) at November 07, 2019 08:34 PM

September 19, 2019

LLVM Blog

Closing the gap: cross-language LTO between Rust and C/C++

Link time optimization (LTO) is LLVM's way of implementing whole-program optimization. Cross-language LTO is a new feature in the Rust compiler that enables LLVM's link time optimization to be performed across a mixed C/C++/Rust codebase. It is also a feature that beautifully combines two respective strengths of the Rust programming language and the LLVM compiler platform:
  • Rust, with its lack of a language runtime and its low-level reach, has an almost unique ability to seamlessly integrate with an existing C/C++ codebase, and
  • LLVM, as a language agnostic foundation, provides a common ground where the source language a particular piece of code was written in does not matter anymore.
So, what does cross-language LTO do? There are two answers to that:
  • From a technical perspective it allows for codebases to be optimized without regard for implementation language boundaries, making it possible for important optimizations, such as function inlining, to be performed across individual compilation units even if, for example, one of the compilation units is written in Rust while the other is written in C++.
  • From a psychological perspective, which arguably is just as important, it helps to alleviate the nagging feeling of inefficiency that many performance conscious developers might have when working on a piece of software that jumps back and forth a lot between functions implemented in different source languages.
Because Firefox is a large, performance sensitive codebase with substantial parts written in Rust, cross-language LTO has been a long-time favorite wish list item among Firefox developers. As a consequence, we at Mozilla's Low Level Tools team took it upon ourselves to implement it in the Rust compiler.

To explain how cross-language LTO works it is useful to take a step back and review how traditional compilation and "regular" link time optimization work in the LLVM world.


Background - A bird's eye view of the LLVM compilation pipeline

Clang and the Rust compiler both follow a similar compilation workflow which, to some degree, is prescribed by LLVM:
  1. The compiler front-end generates an LLVM bitcode module (.bc) for each compilation unit. In C and C++ each source file will result in a single compilation unit. In Rust each crate is translated into at least one compilation unit.

    .c --clang--> .bc

    .c --clang--> .bc


    .rs --+
          |
    .rs --+--rustc--> .bc
          |
    .rs --+

  2. In the next step, LLVM's optimization pipeline will optimize each LLVM module in isolation:

    .c --clang--> .bc --LLVM--> .bc (opt)

    .c --clang--> .bc --LLVM--> .bc (opt)


    .rs --+
          |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt)
          |
    .rs --+

  3. LLVM then lowers each module into machine code so that we get one object file per module:

    .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o

    .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o


    .rs --+
          |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o
          |
    .rs --+

  4. Finally, the linker will take the set of object files and link them together into a binary:

    .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o ------+
                                                             |
    .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o ------+
                                                             |
                                                             +--ld--> bin
    .rs --+                                                  |
          |                                                  |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o --+
          |
    .rs --+

This is the regular compilation workflow if no kind of LTO is involved. As you can see, each compilation unit is optimized in isolation. The optimizer does not know the definition of functions inside of other compilation units and thus cannot inline them or make other kinds of decisions based on what they actually do. To enable inlining and optimizations to happen across compilation unit boundaries, LLVM supports link time optimization.


Link time optimization in LLVM

The basic principle behind LTO is that some of LLVM's optimization passes are pushed back to the linking stage. Why the linking stage? Because that is the point in the pipeline where the entire program (i.e. the whole set of compilation units) is available at once and thus optimizations across compilation unit boundaries become possible. Performing LLVM work at the linking stage is facilitated via a plugin to the linker.

Here is how LTO is concretely implemented:
  • the compiler translates each compilation unit into LLVM bitcode (i.e. it skips lowering to machine code),
     
  • the linker, via the LLVM linker plugin, knows how to read LLVM bitcode modules like regular object files, and
     
  • the linker, again via the LLVM linker plugin, merges all bitcode modules it encounters and then runs LLVM optimization passes before doing the actual linking.
With these capabilities in place a new compilation workflow with LTO enabled for C++ code looks like this:

.c --clang--> .bc --LLVM--> .bc (opt) ------------------+ - - +
                                                        |     |
.c --clang--> .bc --LLVM--> .bc (opt) ------------------+ - - +
                                                        |     |
                                                        +-ld+LLVM--> bin
.rs --+                                                 |
      |                                                 |
.rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o -+
      |
.rs --+

As you can see our Rust code is still compiled to a regular object file. Therefore, the Rust code is opaque to the optimization taking place at link time. Yet, looking at the diagram it seems like that shouldn't be too hard to change, right?


Cross-language link time optimization

Implementing cross-language LTO is conceptually simple because the feature is built on the shoulders of giants. Since the Rust compiler uses LLVM all the important building blocks are readily available. The final diagram looks very much as you would expect, with rustc emitting optimized LLVM bitcode and the LLVM linker plugin incorporating that into the LTO process with the rest of the modules:

.c --clang--> .bc --LLVM--> .bc (opt) ---------+
                                               |
.c --clang--> .bc --LLVM--> .bc (opt) ---------+
                                               |
                                               +-ld+LLVM--> bin
.rs --+                                        |
      |                                        |
.rs --+--rustc--> .bc --LLVM--> .bc (opt) -----+
      |
.rs --+

Nonetheless, achieving a production-ready implementation still turned out to be a significant time investment. After figuring out how everything fits together, the main challenge was to get the Rust compiler to produce LLVM bitcode that was compatible with both the bitcode that Clang produces and with what the linker plugin would accept. Some of the issues we ran into were:
  • The Rust compiler and Clang are both based on LLVM but they might be using different versions of LLVM. This was further complicated by the fact that Rust's LLVM version often does not match a specific LLVM release, but can be an arbitrary revision from LLVM's repository. We learned that all LLVM versions involved really have to be a close match in order for things to work out. The Rust compiler's documentation now offers a compatibility table for the various versions of Rust and Clang.
     
  • The Rust compiler by default performs a special form of LTO, called ThinLTO, on all compilation units of the same crate before passing them on to the linker. We quickly learned, however, that the LLVM linker plugin crashes with a segmentation fault when trying to perform another round of ThinLTO on a module that had already gone through the process. No problem, we thought and instructed the Rust compiler to disable its own ThinLTO pass when compiling for the cross-language case and indeed everything was fine -- until the segmentation faults mysteriously returned a few weeks later even though ThinLTO was still disabled.

    We noticed that the problem only occurred in a specific, presumably innocent setting: again two passes of LTO needed to happen, this time the first was a regular LTO pass within rustc and the output of that would then be fed into ThinLTO within the linker plugin. This setup, although computationally expensive, was desirable because it produced faster code and allowed for better dead-code elimination on the Rust side. And in theory it should have worked just fine. Yet somehow rustc produced symbol names that had apparently gone through ThinLTO's mangling even though we checked time and again that ThinLTO was disabled for Rust. We were beginning to seriously question our understanding of LLVM's inner workings as the problem persisted while we slowly ran out of ideas on how to debug this further.

    You can picture the proverbial lightbulb appearing over our heads when we figured out that Rust's pre-compiled standard library would still have ThinLTO enabled, no matter the compiler settings we were using for our tests. The standard library, including its LLVM bitcode representation, is compiled as part of Rust's binary distribution so it is always compiled with the settings from Rust's build servers. Our local full LTO pass within rustc would then pull this troublesome bitcode into the output module which in turn would make the linker plugin crash again. Since then ThinLTO is turned off for libstd by default.
     
  • After the above fixes, we succeeded in compiling the entirety of Firefox with cross-language LTO enabled. Unfortunately, we discovered that no actual cross-language optimizations were happening. Both Clang and rustc were producing LLVM bitcode and LLD produced functioning Firefox binaries, but when looking at the machine code, not even trivial functions were being inlined across language boundaries. After days of debugging (and unfortunately without being aware of LLVM's optimization remarks at the time) it turned out that Clang was emitting a target-cpu attribute on all functions while rustc didn't, which made LLVM reject inlining opportunities.

    In order to prevent the feature from silently regressing for similar reasons in the future we put quite a bit of effort into extending the Rust compiler's testing framework and CI. It is now able to compile and run a compatible version of Clang and uses that to perform end-to-end tests of cross-language LTO, making sure that small functions will indeed get inlined across language boundaries.
This list could still go on for a while, with each additional target platform holding new surprises to be dealt with. We had to progress carefully by putting in regression tests at every step in order to keep the many moving parts in check. At this point, however, we feel confident in the underlying implementation, with Firefox providing a large, complex, multi-platform test case where things have been working well for several months now.
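
To make the target-cpu issue from the list above more concrete, here is a deliberately simplified model of an attribute check that gates inlining. It is a toy, not LLVM's actual logic: the real rules are target-specific and consider more than one attribute. The function name and the attribute dictionaries are assumptions for illustration.

```python
def inline_compatible(caller_attrs: dict, callee_attrs: dict) -> bool:
    """Toy model: allow inlining only if both functions agree on target-cpu.

    A missing attribute is treated as an empty string, so a function without
    target-cpu does not match one that has it.
    """
    return caller_attrs.get("target-cpu", "") == callee_attrs.get("target-cpu", "")


clang_fn = {"target-cpu": "x86-64"}      # Clang annotated every function
rustc_fn_before = {}                     # rustc emitted no target-cpu
rustc_fn_after = {"target-cpu": "x86-64"}

print(inline_compatible(clang_fn, rustc_fn_before))
print(inline_compatible(clang_fn, rustc_fn_after))
```

In this model the first call is rejected and the second is accepted, which mirrors the symptom described above: the binaries worked, but nothing was inlined across the language boundary until the attributes matched.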


Using cross-language LTO: a minimal example

The exact build tool invocations differ depending on whether it is rustc or Clang performing the final linking step, and whether Rust code is compiled via Cargo or via rustc directly. Rust's compiler documentation describes the various cases. The simplest of them, where rustc directly produces a static library and Clang does the linking, looks as follows:

# Compile the Rust static library, called "xyz"
rustc --crate-type=staticlib -O -C linker-plugin-lto -o libxyz.a lib.rs

# Compile the C code with "-flto"
clang -flto -c -O2 main.c

# Link everything
clang -flto -O2 main.o -L . -lxyz

The -C linker-plugin-lto option instructs the Rust compiler to emit LLVM bitcode which then can be used for both "full" and "thin" LTO. Getting things set up for the first time can be quite cumbersome because, as already mentioned, all compilers and the linker involved must be compatible versions. In theory, most major linkers will work; in practice LLD seems to be the most reliable one on Linux, with Gold in second place and the BFD linker needing to be at least version 2.32. On Windows and macOS the only linkers properly tested are LLD and ld64 respectively. For ld64 Firefox uses a patched version because the LLVM bitcode that rustc produces likes to trigger a pre-existing issue this linker has with ThinLTO.


Conclusion

Cross-language LTO has been enabled for Firefox release builds on Windows, macOS, and Linux for several months at this point and we at Mozilla's Low Level Tools team are pleased with how it turned out. While we still need to work on making the initial setup of the feature easier, it already enabled removing duplicated logic from Rust components in Firefox because now code can simply call into the equivalent C++ implementation and rely on those calls to be inlined. Having cross-language LTO in place and continuously tested will definitely lower the psychological bar for implementing new components in Rust, even if they are tightly integrated with existing C++ code.

Cross-language LTO has been available in the Rust compiler since version 1.34 and works together with Clang 8. Feel free to give it a try and report any problems in the Rust bug tracker.


Acknowledgments

I'd like to thank my Low Level Tools team colleagues David Major, Eric Rahm, and Nathan Froyd for their invaluable help and encouragement, and I'd like to thank Alex Crichton for his tireless reviews on the Rust side.

by Michael Woerister (noreply@blogger.com) at September 19, 2019 12:15 PM

September 04, 2019

LLVM Blog

Announcing the program for the 2019 LLVM Developers' Meeting - Bay Area

Announcing the program for the 2019 LLVM Developers' Meeting in San Jose, CA! This program is the largest we have ever had and has over 11 tutorials, 29 technical talks, 24 lightning talks, 2 panels, 3 birds of a feather, 14 posters, and 4 SRC talks. Be sure to register to attend this event and hear some of these great talks.

Keynotes
Technical Talks
Tutorials
Student Research Competition
Panels
Birds of a Feather
Lightning Talks
Posters


by Tanya Lattner (noreply@blogger.com) at September 04, 2019 09:18 PM

August 01, 2019

LLVM Blog

The LLVM Project is Moving to GitHub

After several years of discussion and planning, the LLVM project is getting ready to complete the migration of its source code from SVN to GitHub!  At last year’s developer meeting, many interested community members convened at a series of round tables to lay out a plan to completely migrate LLVM source code from SVN to GitHub by the 2019 U.S. Developer’s Meeting.  We have made great progress over the last nine months and are on track to complete the migration on October 21, 2019.

As part of the migration to GitHub we are maintaining the ‘monorepo’ layout which currently exists in SVN.  This means that there will be a single git repository with one top-level directory for each LLVM sub-project.  This will be a change for those of you who are already using git and accessing the code via the official sub-project git mirrors (e.g. https://git.llvm.org/git/llvm.git) where each sub-project has its own repository.

One of the first questions people ask when they hear about the GitHub plans is: Will the project start using GitHub pull requests and issues?  And the answer to that for now is: no. The current transition plan focuses on migrating only the source code. We will continue to use Phabricator for code reviews, and bugzilla for issue tracking after the migration is complete.  We have not ruled out using pull requests and issues at some point in the future, but these are discussions we still need to have as a community.

The most important takeaway from this post, though, is that if you consume the LLVM source code in any way, you need to take action now to migrate your workflows.  If you manage any continuous integration or other systems that need read-only access to the LLVM source code, you should begin pulling from the official GitHub repository instead of SVN or the current sub-project mirrors.  If you are a developer that needs to commit code, please use the git-llvm script for committing changes.

We have created a status page, if you want to track the current progress of the migration.  We will be posting updates to this page as we get closer to the completion date.  If you run into issues of any kind with GitHub you can file a bug in bugzilla and mark it as a blocker of the github tracking bug.

This entire process has been a large community effort.  Many many people have put in time discussing, planning, and implementing all the steps required to make this happen.  Thank you to everyone who has been involved and let’s keep working to make this migration a success.

Blog post by Tom Stellard.

by Tanya Lattner (noreply@blogger.com) at August 01, 2019 11:17 PM

May 24, 2019

LLVM Blog

LLVM and Google Season of Docs

The LLVM Project is pleased to announce that we have been selected to participate in Google’s Season of Docs!

Our project idea list may be found here:

From now until May 29th, technical writers are encouraged to review the proposed project ideas and to ask any questions on our gsdocs@llvm.org mailing list. Other documentation ideas are allowed, but we cannot guarantee that a mentor will be found for the project. You are encouraged to discuss new ideas on the mailing list prior to submitting your technical writer application, in order to start the process of finding a mentor.

When submitting your application for an LLVM documentation project, please consider the following:

  • Include Prior Experience: Do you have prior technical writing experience? We want to see this! Consider including links to prior documentation or attachments of documentation you have written. If you can’t include a link to the actual documentation, please describe in detail what you wrote, who the audience was, and any other important information that can help us gauge your prior experience. Please also include any experience with Sphinx or other documentation generation tools.
  • Take your time writing the proposal: We will be looking closely at your application to see how well it is written. Take the time to proofread and know who your audience is.
  • Propose your plan for our documentation project: We have given a rough idea of what changes or topics we envision for the documentation, but this is just a start. We expect you to take the idea and expand or modify it as you see fit. Review our existing documentation and see how it would complement or replace other pieces. Optionally include an overview or document design or layout plan in your application.
  • Become familiar with our project: We don’t expect you to become a compiler expert, but we do expect you to read up on our project to learn a bit about LLVM.

We look forward to working with some fabulous technical writers and improving our documentation. Again, please email gsdocs@llvm.org with your questions.

by Tanya Lattner (noreply@blogger.com) at May 24, 2019 06:09 PM

March 18, 2019

LLVM Blog

EuroLLVM'19 developers' meeting program

The LLVM Foundation is excited to announce the program for the EuroLLVM'19 developers' meeting (April 8-9 in Brussels, Belgium)!

Keynote
Technical talks
Tutorials
Student Research Competition
Lightning talks
BoFs
Posters
If you are interested in any of these talks, you should register to attend EuroLLVM'19. Tickets are limited!

More information about the EuroLLVM'19 is available here

by Anonymous (noreply@blogger.com) at March 18, 2019 03:28 PM

March 15, 2019

LLVM Blog

LLVM Numerics Blog

Keywords: Numerics, Clang, LLVM-IR, 2019 LLVM Developers' Meeting, LLVMDevMtg.

The goal of this blog post is to start a discussion about numerics in LLVM – where we are, recent work and things that remain to be done.  There will be an informal discussion on numerics at the 2019 EuroLLVM conference next month. One purpose of this blog post is to refresh everyone's memory on where we are on the topic of numerics to restart the discussion.

In the last year or two there has been a push to allow fine-grained decisions on which optimizations are legitimate for any given piece of IR.  In earlier days there were two main modes of operation: fast-math and precise-math.  When operating under the rules of precise-math, defined by IEEE-754, a significant number of potential optimizations on sequences of arithmetic instructions are not allowed because they could lead to violations of the standard.  

For example: 

The Reassociation optimization pass is generally not allowed under precise code generation as it can change the order of operations altering the creation of NaN and Inf values propagated at the expression level as well as altering precision.  
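
A small worked example shows why. The sketch below uses ordinary Python floats, which are IEEE-754 doubles, and compares two groupings of the same sum. The specific values are chosen for illustration only.

```python
import math

# Precision: at 1e16 adjacent doubles are 2 apart, so adding 1.0 can be lost.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c    # the large terms cancel first, then 1.0 is added
right = a + (b + c)   # 1.0 is absorbed into -1e16 before the cancellation
print(left, right)

# Inf creation: the intermediate sum overflows in one grouping only.
x = 1e308
overflow_first = (x + x) - x   # x + x overflows to inf
cancel_first = x + (x - x)     # x - x is 0.0, so nothing overflows
print(overflow_first, cancel_first)

# NaN creation: inf - inf is NaN, while the reassociated form is exactly 0.0.
nan_result = (x + x) - (x + x)
zero_result = (x - x) + (x - x)
print(math.isnan(nan_result), zero_result)
```

The expressions in each pair are equal in real arithmetic, but they disagree in floating point, so a compiler may only swap one for the other when it has been told that such differences are acceptable.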

Precise code generation is often overly restrictive, so an alternative fast-math mode is commonly used where all possible optimizations are allowed, acknowledging that this impacts the precision of results and possibly IEEE compliant behavior as well.  In LLVM, this can be enabled by setting the unsafe-math flag at the module level, or by passing -funsafe-math-optimizations to clang, which then sets flags on the IR it generates.  Within this context the compiler often generates shorter sequences of instructions to compute results, and depending on the context this may be acceptable.  Fast-math is often used in computations where loss of precision is acceptable.  For example, when computing the color of a pixel, even relatively low precision is likely to far exceed the perception abilities of the eye, making shorter instruction sequences an attractive trade-off.  In long-running simulations of physical events, however, loss of precision can mean that the simulation drifts from reality, making the trade-off unacceptable.

Several years ago LLVM IR instructions gained the ability of being annotated with flags that can drive optimizations with more granularity than an all-or-nothing decision at the module level.  The IR flags in question are: 

nnan, ninf, nsz, arcp, contract, afn, reassoc, nsw, nuw, exact.  

Their exact meaning is described in the LLVM Language Reference Manual.  When all the flags are enabled, we get the current fast-math behavior.  When these flags are disabled, we get precise math behavior.  There are also several options available between these two models that may be attractive to some applications.  In the past year several members of the LLVM community worked on making IR optimization passes aware of these flags.  When the unsafe-math module flag is not set, these optimization passes work by examining individual flags, allowing fine-grained selection of the optimizations that can be enabled on specific instruction sequences.  This allows vendors/implementors to mix fast and precise computations in the same module, aggressively optimizing some instruction sequences but not others.
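
The idea of a pass consulting per-instruction flags can be illustrated with a toy model. The sketch below is not LLVM code and makes no claim about how any LLVM pass is written: the `FAdd` class and `combine_constants` function are invented for illustration. It rewrites (x + c1) + c2 into x + (c1 + c2) only when both additions carry the reassoc flag, and it keeps only the flags common to both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FAdd:
    """Toy floating-point add: operands are an FAdd, a variable name, or a float."""
    lhs: object
    rhs: object
    flags: frozenset = frozenset()


def combine_constants(node: FAdd) -> FAdd:
    """Fold (x + c1) + c2 into x + (c1 + c2) if both adds permit reassociation."""
    inner = node.lhs
    if (isinstance(inner, FAdd)
            and isinstance(inner.rhs, float)
            and isinstance(node.rhs, float)
            and "reassoc" in node.flags
            and "reassoc" in inner.flags):
        # The new instruction keeps only the flags both originals agreed on.
        return FAdd(inner.lhs, inner.rhs + node.rhs, node.flags & inner.flags)
    return node


fast = frozenset({"reassoc"})
relaxed = FAdd(FAdd("x", 1.5, fast), 2.5, fast)
precise = FAdd(FAdd("x", 1.5), 2.5)
mixed = FAdd(FAdd("x", 1.5, fast), 2.5)

print(combine_constants(relaxed))
print(combine_constants(precise) is precise)
print(combine_constants(mixed) is mixed)
```

Only the fully flagged expression is rewritten. The precise and mixed expressions are returned untouched, which is how fast and precise computations can coexist in one module.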

We now have good coverage of IR passes in the LLVM codebase, in particular in the following areas:
* Intrinsic and libcall management
* Instruction Combining and Simplification
* Instruction definition
* SDNode definition
* GlobalISel Combining and code generation
* Selection DAG code generation
* DAG Combining
* Machine Instruction definition
* IR Builders (SDNode, Instruction, MachineInstr)
* CSE tracking
* Reassociation
* Bitcode

There are still some areas that need to be reworked for modularity, including vendor specific back-end passes.  

The following are some of the contributions mentioned above from the last 2 years of open source development:

https://reviews.llvm.org/D45781 : MachineInst support mapping SDNode fast math flags for support in Back End code generation 
https://reviews.llvm.org/D46322 : [SelectionDAG] propagate 'afn' and 'reassoc' from IR fast-math-flags
https://reviews.llvm.org/D45710 : Fast Math Flag mapping into SDNode
https://reviews.llvm.org/D46854 : [DAG] propagate FMF for all FPMathOperators
https://reviews.llvm.org/D48180 : updating isNegatibleForFree and GetNegatedExpression with fmf for fadd
https://reviews.llvm.org/D48057 : easing the constraint for isNegatibleForFree and GetNegatedExpression
https://reviews.llvm.org/D47954 : Utilize new SDNode flag functionality to expand current support for fdiv
https://reviews.llvm.org/D47918 : Utilize new SDNode flag functionality to expand current support for fma
https://reviews.llvm.org/D47909 : Utilize new SDNode flag functionality to expand current support for fadd
https://reviews.llvm.org/D47910 : Utilize new SDNode flag functionality to expand current support for fsub
https://reviews.llvm.org/D47911 : Utilize new SDNode flag functionality to expand current support for fmul
https://reviews.llvm.org/D48289 : refactor of visitFADD for AllowNewConst cases
https://reviews.llvm.org/D47388 : propagate fast math flags via IR on fma and sub expressions
https://reviews.llvm.org/D47389 : guard fneg with fmf sub flags
https://reviews.llvm.org/D47026 : fold FP binops with undef operands to NaN
https://reviews.llvm.org/D47749 : guard fsqrt with fmf sub flags
https://reviews.llvm.org/D46447 : Mapping SDNode flags to MachineInstr flags
https://reviews.llvm.org/D50195 : extend folding fsub/fadd to fneg for FMF
https://reviews.llvm.org/rL339197 : [NFC] adding tests for Y - (X + Y) --> -X
https://reviews.llvm.org/D50417 : [InstCombine] fold fneg into constant operand of fmul/fdiv
https://reviews.llvm.org/rL339357 : extend folding fsub/fadd to fneg for FMF
https://reviews.llvm.org/D50996 : extend binop folds for selects to include true and false binops flag intersection
https://reviews.llvm.org/rL339938 : add a missed case for binary op FMF propagation under select folds
https://reviews.llvm.org/D51145 : Guard FMF context by excluding some FP operators from FPMathOperator
https://reviews.llvm.org/rL341138 : adding initial intersect test for Node to Instruction association
https://reviews.llvm.org/rL341565 : in preparation for adding nsw, nuw and exact as flags to MI
https://reviews.llvm.org/D51738 : add IR flags to MI
https://reviews.llvm.org/D52006 : Copy utilities updated and added for MI flags
https://reviews.llvm.org/rL342598 : add new flags to a DebugInfo lit test
https://reviews.llvm.org/D53874 : [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative
https://reviews.llvm.org/D55668 : Add FMF management to common fp intrinsics in GlobalIsel
https://reviews.llvm.org/rL352396 : [NFC] TLI query with default(on) behavior wrt DAG combines for fmin/fmax target…
https://reviews.llvm.org/rL316753 : Fold fma (fneg x), K, y -> fma x, -K, y
https://reviews.llvm.org/D57630 : Move IR flag handling directly into builder calls for cases translated from Instructions in GlobalIsel
https://reviews.llvm.org/rL332756 : adding baseline fp fold tests for unsafe on and off
https://reviews.llvm.org/rL334035 : NFC: adding baseline fneg case for fmf
https://reviews.llvm.org/rL325832 : [InstrTypes] add frem and fneg with FMF creators
https://reviews.llvm.org/D41342 : [InstCombine] Missed optimization in math expression: simplify calls exp functions
https://reviews.llvm.org/D52087 : [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.
https://reviews.llvm.org/D52075 : [InstCombine] Support (sub (sext x), (sext y)) --> (sext (sub x, y)) and (sub (zext x), (zext y)) --> (zext (sub x, y))
https://reviews.llvm.org/rL338059 : [InstCombine] fold udiv with common factor from muls with nuw
Commit: e0ab896a84be9e7beb59874b30f3ac51ba14d025 : [InstCombine] allow more fmul folds with 'reassoc'
Commit: 3e5c120fbac7bdd4b0ff0a3252344ce66d5633f9 : [InstCombine] distribute fmul over fadd/fsub
https://reviews.llvm.org/D37427 : [InstCombine] canonicalize fcmp ord/uno with constants to null constant
https://reviews.llvm.org/D40130 : [InstSimplify] fold and/or of fcmp ord/uno when operand is known nnan
https://reviews.llvm.org/D40150 : [LibCallSimplifier] fix pow(x, 0.5) -> sqrt() transforms
https://reviews.llvm.org/D39642 : [ValueTracking] readnone is a requirement for converting sqrt to llvm.sqrt; nnan is not
https://reviews.llvm.org/D39304 : [IR] redefine 'reassoc' fast-math-flag and add 'trans' fast-math-flag
https://reviews.llvm.org/D41333 : [ValueTracking] ignore FP signed-zero when detecting a casted-to-integer fmin/fmax pattern
https://reviews.llvm.org/D5584 : Optimize square root squared (PR21126)
https://reviews.llvm.org/D42385 : [InstSimplify] (X * Y) / Y --> X for relaxed floating-point ops
https://reviews.llvm.org/D43160 : [InstSimplify] allow exp/log simplifications with only 'reassoc' FMF
https://reviews.llvm.org/D43398 : [InstCombine] allow fdiv folds with less than fully 'fast' ops
https://reviews.llvm.org/D44308 : [ConstantFold] fp_binop AnyConstant, undef --> NaN
https://reviews.llvm.org/D43765 : [InstSimplify] loosen FMF for sqrt(X) * sqrt(X) --> X
https://reviews.llvm.org/D44521 : [InstSimplify] fp_binop X, NaN --> NaN
https://reviews.llvm.org/D47202 : [CodeGen] use nsw negation for abs
https://reviews.llvm.org/D48085 : [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
https://reviews.llvm.org/D48401 : [InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)
https://reviews.llvm.org/D39669 : DAG: Preserve nuw when reassociating adds
https://reviews.llvm.org/D39417 : InstCombine: Preserve nuw when reassociating nuw ops
https://reviews.llvm.org/D51753 : [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)
https://reviews.llvm.org/D51630 : [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))
https://reviews.llvm.org/D53650 : [FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes
https://reviews.llvm.org/D54001 : [ValueTracking] determine sign of 0.0 from select when matching min/max FP
https://reviews.llvm.org/D51942 : [InstCombine] Fold (C/x)>0 into x>0 if possible
https://llvm.org/svn/llvm-project/llvm/trunk@348016 : [SelectionDAG] fold FP binops with 2 undef operands to undef
http://llvm.org/viewvc/llvm-project?view=revision&revision=346242 : propagate fast-math-flags when folding fcmp+fpext, part 2
http://llvm.org/viewvc/llvm-project?view=revision&revision=346240 : propagate fast-math-flags when folding fcmp+fpext
http://llvm.org/viewvc/llvm-project?view=revision&revision=346238 : [InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2
http://llvm.org/viewvc/llvm-project?view=revision&revision=346169 : [InstSimplify] fold select (fcmp X, Y), X, Y
http://llvm.org/viewvc/llvm-project?view=revision&revision=346234 : propagate fast-math-flags when folding fcmp+fneg
http://llvm.org/viewvc/llvm-project?view=revision&revision=346147 : [InstCombine] canonicalize -0.0 to +0.0 in fcmp
http://llvm.org/viewvc/llvm-project?view=revision&revision=346143 : [InstCombine] loosen FP 0.0 constraint for fcmp+select substitution
http://llvm.org/viewvc/llvm-project?view=revision&revision=345734 : [InstCombine] refactor fabs+fcmp fold; NFC
http://llvm.org/viewvc/llvm-project?view=revision&revision=345728 : [InstSimplify] fold 'fcmp nnan ult X, 0.0' when X is not negative
http://llvm.org/viewvc/llvm-project?view=revision&revision=345727 : [InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC


While multiple people have been working on finer-grained control over fast-math optimizations and other relaxed numerics modes, there has also been considerable progress on support for more constrained numerics models, in particular on adding and enabling constrained floating-point intrinsics that capture FENV_ACCESS ON and similar semantic models.

These experimental constrained intrinsics prohibit certain transforms that are not safe if the default floating-point environment is not in effect. Historically, LLVM has in practice basically “split the difference” with regard to such transforms; they haven’t been explicitly disallowed, as LLVM doesn’t model the floating-point environment, but they have been disabled when they caused trouble for tests or software projects. The absence of a formal model for licensing these transforms constrains our ability to enable them. Bringing language and backend support for constrained intrinsics across the finish line will allow us to include transforms that we disable as a matter of practicality today, and allow us to give developers an easy escape valve (in the form of FENV_ACCESS ON and similar language controls) when they need more precise control, rather than an ad-hoc set of flags to pass to the driver.
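
To make the hazard concrete, here is a small C++ sketch (not taken from LLVM) of code whose result depends on the dynamic rounding mode. The names quotient_with_rounding and check_rounding_mode_matters are made up for illustration, and the sketch assumes a target that supports FE_DOWNWARD and FE_UPWARD. The volatile qualifiers are only there to stop the compiler from folding the division at compile time, which is exactly the kind of transform that stops being safe once the floating-point environment can change.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: compute 1.0 / 3.0 under the given rounding mode and
// restore the previous mode afterwards.
double quotient_with_rounding(int mode) {
    // volatile forces the operands to be loaded and the result to be stored
    // at run time, so the division cannot be folded into a constant or moved
    // across the fesetround() calls.
    volatile double num = 1.0;
    volatile double den = 3.0;
    volatile double out = 0.0;

    const int saved = std::fegetround();
    std::fesetround(mode);
    out = num / den;
    std::fesetround(saved);
    return out;
}

// Hypothetical self-check: 1/3 is not exactly representable, so rounding
// toward -infinity and toward +infinity must give different results.
void check_rounding_mode_matters() {
    const double down = quotient_with_rounding(FE_DOWNWARD);
    const double up = quotient_with_rounding(FE_UPWARD);
    assert(down < up);
}
```

A compiler that assumed the default environment and folded 1.0 / 3.0 to a single constant would return the same value for both calls, which is the behavior the constrained intrinsics are meant to rule out.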

We should discuss these new intrinsics to make sure that they can capture the right models for all the languages that LLVM supports.


Here are some possible discussion items:

  • Should specialization be applied at the call level for edges in a call graph where the caller has special context to extend into the callee with respect to flags?
  • Should the inliner apply something similar to calls that meet inlining criteria?
  • What other part(s) of the compiler could make use of IR flags that are currently not covered?
  • What work needs to be done regarding code debt in the current areas of implementation?

by Michael Berg (noreply@blogger.com) at March 15, 2019 10:47 PM

March 07, 2019

LLVM Blog

FOSDEM 2019 LLVM developer room report


As well as at the LLVM developer meetings, the LLVM community is also present at a number of other events. One of those is FOSDEM, which has had a dedicated LLVM track since 2014.
Earlier this February, the LLVM dev room was back for the 6th time.

FOSDEM is one of the largest open source conferences, attracting over 8000 developers attending over 30 parallel tracks, occupying almost all space of the ULB university campus in Brussels.

In comparison to the LLVM developer meetings, this dev room offers more of an opportunity to meet up with developers from a very wide range of open source projects.

As in previous years, the LLVM dev room program consisted of presentations with a varied target audience, ranging from LLVM developers to LLVM users, including people not yet using LLVM but interested in discovering what can be done with it. 
On the day itself, the room was completely packed for most presentations, often with people waiting outside to be able to enter for the next presentation.
Slides and videos of the presentations are available via the links below


Finally, I want to express my gratitude to the LLVM Foundation, which sponsored travel expenses for a few presenters who couldn't otherwise have made it to the conference.

by Kristof Beyls (noreply@blogger.com) at March 07, 2019 10:33 AM

November 14, 2018

LLVM Blog

30% faster Windows builds with clang-cl and the new /Zc:dllexportInlines- flag

Background

In the course of adding Microsoft Visual C++ (MSVC) compatible Windows support to Clang, we worked hard to make sure the dllexport and dllimport declspecs are handled the same way by Clang as by MSVC.

dllexport and dllimport are used to specify what functions and variables should be externally accessible ("exported") from the currently compiled Dynamic-Link Library (DLL), or should be accessed ("imported") from another DLL. In the class declaration below, S::foo() will be exported when building a DLL:


struct __declspec(dllexport) S {
  void foo() {}
};

and code using that DLL would typically see a declaration like this:


struct __declspec(dllimport) S {
  void foo() {}
};

to indicate that the function is defined in and should be accessed from another DLL.

Often the same declaration is used along with a preprocessor macro to flip between dllexport and dllimport, depending on whether a DLL is being built or consumed.
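
As a rough sketch of that pattern, a shared header might look like the following. The macro names MYLIB_API and MYLIB_BUILDING_DLL are hypothetical (the build of the DLL itself would define MYLIB_BUILDING_DLL, for example on the command line), and foo() returns a value here only so that the example has something observable.

```cpp
#include <cassert>

// MYLIB_BUILDING_DLL is a hypothetical macro defined only while compiling
// the DLL itself; consumers of the DLL leave it undefined.
#if defined(_WIN32)
#  if defined(MYLIB_BUILDING_DLL)
#    define MYLIB_API __declspec(dllexport)  // building the DLL: export
#  else
#    define MYLIB_API __declspec(dllimport)  // using the DLL: import
#  endif
#else
#  define MYLIB_API  // the declspecs do not apply outside Windows toolchains
#endif

// One declaration serves both the library and its users.
struct MYLIB_API S {
    int foo() { return 42; }
};
```

With this arrangement the class is written once, and only the build configuration decides whether S is exported or imported.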

The basic idea of dllexport and dllimport is simple, but the semantics get more complicated as they interact with more facets of the C++ language: templates, inheritance, different kinds of instantiation, redeclarations with different declspecs, and so on. Sometimes the semantics are surprising, but by now we think clang-cl gets most of them right. And as the old maxim goes, once you know the rules well, you can start tactfully breaking them.

One issue with dllexport is that for inline functions such as S::foo() above, the compiler must emit the definition even if it's not used in the translation unit. That's because the DLL must export it, and the compiler cannot know if any other translation unit will provide a definition.

This is very inefficient. A dllexported class with inline members in a header file will cause definitions of those members to be emitted in every translation unit that includes the header, directly or indirectly. And as we know, C++ source files often end up including a lot of headers. This behaviour is also different from non-Windows systems, where inline function definitions are not emitted unless they're used, even in shared objects and dynamic libraries.

/Zc:dllexportInlines-

To address this problem, clang-cl recently gained a new command-line flag, /Zc:dllexportInlines- (MSVC uses the /Zc: prefix for language conformance options). The basic idea is simple: since the definition of an inline function is available along with its declaration, it's not necessary to import or export it from a DLL — the inline definition can be used directly. The effect of the flag is to not apply class-level dllexport/dllimport declspecs to inline member functions. In the two examples above, it means S::foo() would not be dllexported or dllimported, even though the S class is declared as such.

This is very similar to the -fvisibility-inlines-hidden Clang and GCC flag used on non-Windows. For C++ projects with many inline functions, it can significantly reduce the set of exported functions, and thereby the symbol table and file size of the shared object or dynamic library, as well as program load time.

On Windows however, the main benefit is not having to emit the unused inline function definitions. This means the compiler has to do much less work, and reduces object file size which in turn reduces the work for the linker. For Chrome, we saw 30 % faster full builds, 30 % shorter link times for blink_core.dll, and 40 % smaller total .obj file size.

The reduction in .obj file size, combined with the enormous reduction in .lib files allowed by previously switching linkers to lld-link which uses thin archives, means that a typical Chrome build directory is now 60 % smaller than it would have been just a year ago.

(Some of the same benefit can be had without this flag if the dllexport inline function comes from a pre-compiled header (PCH) file. In that case, the definition will be emitted in the object file when building the PCH, and so is not emitted elsewhere unless it's used.)

Compatibility

Using /Zc:dllexportInlines- is "half ABI incompatible". If it's used to build a DLL, inline members will no longer be exported, so any code using the DLL must use the same flag to not dllimport those members. However, the reverse scenario generally works: a DLL compiled without the flag (such as a system DLL built with MSVC) can be referenced from code that uses the flag, meaning that the referencing code will use the inline definitions instead of importing them from the DLL.

Like -fvisibility-inlines-hidden, /Zc:dllexportInlines- breaks the C++ language guarantee that (even an inline) function has a unique address within the program. When using these flags, an inline function will have a different address when used inside the library and outside.

Also, these flags can lead to link errors when inline functions, which would normally be dllimported, refer to internal symbols of a DLL:


void internal();

struct __declspec(dllimport) S {
  void foo() { internal(); }
};

Normally, references to S::foo() would use the definition in the DLL, which also contains the definition of internal(), but when using /Zc:dllexportInlines-, the inline definition of S::foo() is used directly, resulting in a link error since no definition of internal() can be found.

Even worse, if there is an inline definition of internal() containing a static local variable, the program will now refer to a different instance of that variable than in the DLL:


inline int internal() { static int x; return x++; }

struct __declspec(dllimport) S {
  int foo() { return internal(); }
};

This could lead to very subtle bugs. However, since Chrome already uses -fvisibility-inlines-hidden, which has the same potential problem, we believe this is not a common issue.

Summary

/Zc:dllexportInlines- is like -fvisibility-inlines-hidden for DLLs and significantly reduces build times. We're excited that using Clang on Windows allows us to benefit from new features like this.

More information

For more information, see the User's Manual for /Zc:dllexportInlines-.

The flag was added in Clang r346069, which will be part of the Clang 8 release expected in March 2019. It's also available in the Windows Snapshot Build.

Acknowledgements

/Zc:dllexportInlines- was implemented by Takuto Ikuta based on a prototype by Nico Weber.

by Hans Wennborg (noreply@blogger.com) at November 14, 2018 12:49 PM

September 26, 2018

LLVM Blog

Announcing the program for the 2018 LLVM Developers' Meeting Bay Area

The LLVM Foundation is excited to announce the program for the 2018 LLVM Developers' Meeting in San Jose, CA on October 17 & 18.
As a reminder, ticket prices for the event will increase on September 17th. Purchase your tickets today!
Technical Talks
Tutorials
Birds of a Feather
Lightning Talks
Posters

by Tanya Lattner (noreply@blogger.com) at September 26, 2018 02:50 AM

September 25, 2018

LLVM Blog

Integration of libc++ and OpenMP packages into llvm-toolchain

A bit more than a year ago, we gave an update about recent changes in apt.llvm.org. Since then, we have noticed a significant increase in the usage of the service. Just last month, we saw more than 16.5TB of data being transferred from our CDN.
Thanks to Google Summer of Code 2018, and after a number of requests, we decided to focus our energy on bringing great new projects from the LLVM ecosystem into apt.llvm.org.

Starting from version 7, libc++, libc++abi and OpenMP packages are available in the llvm-toolchain packages. This means that, just like clang, lld or lldb, the libc++, libc++abi and OpenMP packages are also built, tested and shipped on https://apt.llvm.org/.

The integration focuses on preserving the current usage of these libraries. The newly merged packages have adopted the llvm-toolchain versioning:

libc++ packages
  • libc++1-7
  • libc++-7-dev
libc++abi packages
  • libc++abi1-7
  • libc++abi-7-dev
OpenMP packages
  • libomp5-7
  • libomp-7-dev
  • libomp-7-doc
These packages are built twice a day for trunk. For version 7, they are rebuilt only when new changes land in the SVN branches.
    Integration of libc++* packages

    Both libc++ and libc++abi packages are built at the same time using the clang built during the process. The existing libc++ and libc++abi packages present in Debian and Ubuntu repositories will not be affected (they will be removed at some point). Newly integrated libc++* packages are not co-installable with them.

    Symlinks have been provided from the original locations to keep the library usage the same.

    Example:  /usr/lib/x86_64-linux-gnu/libc++.so.1.0 -> /usr/lib/llvm-7/lib/libc++.so.1.0

    Using libc++ remains super easy:
    $ clang++-7 -std=c++11 -stdlib=libc++ foo.cpp
    $ ldd ./a.out|grep libc++
      libc++.so.1 => /usr/lib/x86_64-linux-gnu/libc++.so.1 (0x00007f62a1a90000)
      libc++abi.so.1 => /usr/lib/x86_64-linux-gnu/libc++abi.so.1 (0x00007f62a1a59000)
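
As a hedged illustration of what foo.cpp could contain, the sketch below reports which C++ standard library a translation unit was compiled against. The function name standard_library_name is made up for this example; _LIBCPP_VERSION and __GLIBCXX__ are the version macros that libc++ and libstdc++ define once any of their headers has been included.

```cpp
#include <cassert>
#include <cstddef>  // any standard header pulls in the library's config macros
#include <string>

// Hypothetical helper: name the C++ standard library in use at compile time.
std::string standard_library_name() {
#if defined(_LIBCPP_VERSION)
    return "libc++";      // built with -stdlib=libc++
#elif defined(__GLIBCXX__)
    return "libstdc++";   // the usual default on Debian and Ubuntu
#else
    return "unknown";
#endif
}
```

Printing the result from a program built with -stdlib=libc++ is a quick way to confirm that the flag took effect, alongside the ldd check above.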

    In order to test new developments in libc++, we are also building the experimental features.
    For example, the following command will work out of the box:

    $ clang++-7 -std=c++17 -stdlib=libc++ foo.cpp -lc++experimental -lc++fs

    Integration of OpenMP packages

    While OpenMP packages have been present in the Debian and Ubuntu archives for a while, only a single version of the package was available.

    For now, the newly integrated packages create a symlink from /usr/lib/libomp.so.5 to /usr/lib/llvm-7/lib/libomp.so.5, keeping the current usage the same and making them non-co-installable.

    It can be used with clang through the -fopenmp flag:
    $ clang -fopenmp foo.c

    The dependency packages providing the default libc++* and OpenMP packages are also integrated in llvm-defaults. This means that the following command will install all these new packages at the current version:
    $ apt-get install libc++-dev libc++abi-dev libomp-dev

    LLVM 7 => 8 transition

    In parallel with the libc++ and OpenMP work, https://apt.llvm.org/ has been updated to reflect the branching of 7 from the trunk branches.
    Therefore, the platform currently provides:

    Stable: 6.0
    Qualification: 7
    Development: 8


    Please note that, from version 7, the packages and libraries are called 7 (and not 7.0).
    For the rationale and implementation, see https://reviews.llvm.org/D41869 & https://reviews.llvm.org/D41808.

    Stable packages of LLVM toolchain are already officially available in Debian Buster and in Ubuntu Cosmic.

    Cosmic support

    In order to make sure that the LLVM toolchain does not have too many regressions with this new version, we also support the next Ubuntu version, 18.10, aka Cosmic.

    A Note on coinstallability

    We tried to make them coinstallable, but in the resulting packages we had no control over the libraries used at runtime. This could lead to many unforeseen issues. With this in mind, we decided to keep them conflicting with other versions.

    Future work
    • Code coverage build fails for newly integrated packages
    • Move to a two-phase build to generate the clang binary using clang

    Sources of the project are available on the gitlab instance of Debian: https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/tree/7


    Reshabh Sharma & Sylvestre Ledru




    by Reshabh Sharma (noreply@blogger.com) at September 25, 2018 03:29 PM

    September 18, 2018

    LLVM Blog

    Announcing the new LLVM Foundation Board of Directors

    The LLVM Foundation is pleased to announce its new Board of Directors:


    Chandler Carruth
    Mike Edwards (Treasurer)
    Hal Finkel
    Arnaud de Grandmaison
    Anton Korobeynikov
    Tanya Lattner (President)
    Chris Lattner
    John Regehr (Secretary)
    Tom Stellard

    Two new members and seven continuing members were elected to the nine-person board.

    We want to thank David Kipping for his 2 terms on the board. David has been actively involved with the LLVM Developer Meetings and was the treasurer for the past 4 years. The treasurer is a time-demanding position: the treasurer supports the day-to-day operation of the foundation, balances the books, and generates monthly treasurer reports.

    We also want to thank all the applicants to the board. When voting on new board members, we took into consideration all contributions (past and present) and current involvement in the LLVM community. We also tried to create a balanced board of individuals from a wide range of backgrounds and locations to provide a voice to as many groups within the LLVM community as possible. Given these criteria and the strong applicants, we increased the board from 8 members to 9.

    About the board of directors (listed alphabetically by last name):


    Chandler Carruth:

    Chandler Carruth has been an active contributor to LLVM since 2007. Over the years, he has worked on LLVM’s memory model and atomics, Clang’s C++ support, GCC-compatible driver, initial profile-aware code layout optimization pass, pass manager, IPO infrastructure, and much more. He is the current code owner of inlining and SSA formation.

    In addition to his numerous technical contributions, Chandler has led Google’s LLVM efforts since 2010 and shepherded a number of new efforts that have positively and significantly impacted the LLVM project. These new efforts include things such as adding C++ modules to Clang, adding address and other sanitizers to Clang/LLVM, making Clang compatible with MSVC and available to the Windows C++ developer community, and much more.

    Chandler works at Google Inc. as a technical lead for their C++ developer platform and has served on the LLVM Foundation board of directors for the last 4 years.

    Mike Edwards:

    Mike Edwards is a relative newcomer to the LLVM community, beginning his involvement just a few years ago while working for Sony Playstation. Finding the LLVM community to be an incredibly amazing and welcoming group of people, Mike knew he had to find a way to contribute. Mike’s previous work in DevOps led him to get involved in helping to work on the llvm.org infrastructure. Last year, with the help of the Board and several community members, Mike was able to get the llvm.org infrastructure moved onto a modern compute platform at Amazon Web Services. Mike is one of the maintainers of our llvm.org infrastructure.

    Mike is currently working as a Software Engineer at Apple, Inc. working on the Continuous Integration and Quality Engineering efforts for LLVM and Clang development.

    Hal Finkel:

    Hal Finkel has been an active contributor to the LLVM project since 2011. He is the code owner for the PowerPC target, the alias-analysis infrastructure, and other components.

    In addition to his numerous technical contributions, Hal has chaired the LLVM in HPC workshop, which is held in conjunction with Super Computing (SC), for the last five years. This workshop provides a venue for the presentation of peer-reviewed HPC-related research on LLVM from both industry and academia. He has also been involved in organizing an LLVM-themed BoF session at SC and LLVM socials in Austin.

    Hal is Lead for Compiler Technology and Programming Languages at Argonne National Laboratory’s Leadership Computing Facility.

    Arnaud de Grandmaison:

    Arnaud de Grandmaison has been hacking on LLVM projects since 2008. In addition to his open source contributions, he has worked for many years on private out-of-tree LLVM-based projects at Parrot, DiBcom, and Arm. He has also been a leader in the European LLVM community, organizing the EuroLLVM Developers’ meeting and Paris socials, and chairing or participating in numerous program committees for the LLVM Developers’ Meetings and other LLVM-related conferences.

    Arnaud has attended numerous LLVM Developers’ meetings and volunteered as moderator or presented as well. He also moderates several LLVM mailing lists. Arnaud is also very involved in community wide discussions and decisions such as re-licensing and code of conduct.

    Arnaud is a Senior Principal Engineer at Arm.

    Anton Korobeynikov:

    Anton Korobeynikov has been an active contributor to the LLVM project since 2006. Over the years, he has made numerous technical contributions to areas including Windows support, ELF features, debug info, exception handling, and backends such as ARM and x86. He was the original author of the MSP430 backend and of the original System Z backend.

    In addition to his technical contributions, Anton has maintained LLVM’s participation in Google Summer of Code by managing applications, deadlines, and overall organization. He also supports the LLVM infrastructure and has been on numerous program committees for the LLVM Developers’ Meetings (both US and EuroLLVM).

    Anton is currently an associate professor at the Saint Petersburg State University and has served on the LLVM Foundation board of directors for the last 4 years.

    Tanya Lattner:

    Tanya Lattner has been involved in the LLVM project for over 14 years. She began as a graduate student who wrote her master's thesis using LLVM, and continued on using and extending LLVM technologies at various jobs during her career as a compiler engineer.

    Tanya has been organizing the US LLVM Developers’ meeting since 2008 and attended every developer meeting. She was the LLVM release manager for 3 years, moderates the LLVM mailing lists, and helps administer the LLVM infrastructure servers, mailing lists, bugzilla, etc. Tanya has also been on the program committee for the US LLVM Developers’ meeting (4+ years) and the EuroLLVM Developers’ Meeting.

    With the support of the initial board of directors, Tanya created the LLVM Foundation, defined its charitable and education mission, and worked to get 501(c)(3) status.

    Tanya is the Chief Operating Officer and has served as the President of the LLVM Foundation board for the last 4 years.

    Chris Lattner:

    Chris Lattner is well known as the founder of the LLVM project and has a lengthy history of technical contributions to the project over the years. He drove much of the early implementation, architecture, and design of LLVM and Clang.

    Chris has attended every LLVM Developers’ meeting, and presented at many of them. He helped drive the conception and incorporation of the LLVM Foundation, and has served as its secretary. Chris also grants commit access to the LLVM Project, moderates mailing lists, moderates and edits the LLVM blog, and drives important non-technical discussions and policy decisions related to the LLVM project.

    Chris manages a team building machine learning infrastructure at Google and has served on the LLVM Foundation board of directors for the last 4 years.

    John Regehr:

    John Regehr has been involved in LLVM for a number of years. As a professor of computer science at the University of Utah, his research specializes in compiler correctness and undefined behavior. He is well known within the LLVM community for the hundreds of bug reports his group has reported to LLVM/Clang.

    John was a project lead for IOC, a Clang based integer overflow checker that eventually became the basis for the integer parts of UBSan. He was also the primary developer of C-Reduce which utilizes Clang as a library and is often used as a test case reducer for compiler issues.

    In addition to his technical contributions, John has served on several LLVM-related program committees. He also has a widely read blog about LLVM and other compiler-related issues (Embedded in Academia).

    Tom Stellard:

    Tom Stellard has been contributing to the LLVM project since 2012. He was the original author of the AMDGPU backend and was also an active contributor to libclc. He has been the LLVM project’s stable release manager since 2014.

    Tom is currently a Software Engineer at Red Hat and is the technical lead for emerging toolchains including Clang/LLVM. He also maintains the LLVM packages for the Fedora project.

    by Tanya Lattner (noreply@blogger.com) at September 18, 2018 04:00 PM

    August 24, 2018

    LLVM Blog

    2018 LLVM Foundation's Women in Compilers and Tools Workshop


    The LLVM Foundation is excited to announce our first half day Women in Compilers and Tools Workshop held the day before the 2018 LLVM Developers’ Meeting - Bay Area. The workshop will be held at the Fairmont Hotel on October 16th from 1:00-6:30PM and includes a cocktail reception.

    This event aims to connect women in the field of compilers and tools and provide them with ideas and techniques to overcome barriers or enhance their careers. It is also open to anyone (not just women) who is interested in increasing diversity within the LLVM community, their workplace or university.

    Registration for the event will open on Monday, August 27th at 9:00AM PDT. Attendance is limited to 100 attendees and tickets will be priced at $50 (students $25). Please see the EventBrite registration page for details.

    The workshop will consist of 3 topics described below:

    1. Inner Critic: How to Deal with Your Imposter Syndrome
    Presented by Women Catalysts

    You're smart. People really like you. And yet, you can't shake the feeling that maybe you don't really deserve your success. Or that someone else can do what you do better...and what if your boss can see it too? You are not alone: it's called the Imposter Syndrome. Believe it or not, the most confident and successful people often fear that they are actually inadequate. The great Maya Angelou once said, "I have written 11 books, but each time I think, 'Uh-oh, they're going to find out now. I've run a game on everybody, and they're going to find me out.'" But it doesn't have to be that way. In this workshop, you'll learn to identify the voice of your Imposter Syndrome and develop strategies for dealing with your inner critics.

    2. Present! A Techie's Guide to Public Speaking
    Presented by Karen Catlin

    To grow your career, you know what you need to do: improve your public speaking skills.

    Public speaking provides the visibility and professional credibility that helps you score the next big opportunity. But even more important is the fact that it transforms the way you communicate. Improved confidence and the ability to convey messages clearly will impact your relationships with your managers, coworkers, customers, industry peers, and even potential new hires.

    In this presentation, Karen Catlin will cover the importance of speaking at conferences and events, along with strategies to get started. She'll share some favorite tips from the book she co-authored with Poornima Vijayashanker, "Present! A Techie's Guide to Public Speaking." And she'll tell some embarrassing stories that are just too good to keep to herself.

    About Karen: After spending 25 years building software products, Karen Catlin is now an advocate for women in the tech industry. She’s a leadership coach, a keynote and TEDx speaker, and co-author of "Present! A Techie’s Guide to Public Speaking."

    Formerly, Karen was a vice president of engineering at Macromedia and Adobe.

    Karen holds a computer science degree from Brown University and serves as an advisor to Brown's Computer Science Diversity Initiative. She’s also on the Advisory Boards for The Women’s CLUB of Silicon Valley and WEST (Women Entering & Staying in Technology).

    3. Update on Women in Compilers & Tools Program
    Presented by Tanya Lattner
    Over the past year we have hosted panels and BoFs on women in compilers and tools. We now need to take many of the items discussed during the events and put them into action. We will discuss some key areas and potentially break into smaller groups to determine action plans and steps to move forward.

    FAQ:

    Do I need to attend the LLVM Developers’ Meeting to attend this event?
    This is an independent event which is open to anyone.  

    Is this a women only event?
    Anyone who values diversity within the field of compilers and tools is welcome to attend.  These topics can relate to anyone, not just women, and our mission is to improve inclusion and diversity in general.

    Is there a financial hardship discount?

    We have discounted the tickets for all attendees but please reach out to the organizer and we will decide on a case by case basis.

    by Tanya Lattner (noreply@blogger.com) at August 24, 2018 05:41 AM

    April 10, 2018

    LLVM Blog

    EuroLLVM'18 developers' meeting program

    The LLVM Foundation is excited to announce the program for the EuroLLVM'18 developers' meeting (April 16 - 17 in Bristol/UK)!

    Keynotes

    Tutorials

    Talks

    BoFs

    Student Research Competition

    Lightning Talks

    Posters

    If you are interested in any of these talks, you should register to attend EuroLLVM'18. Tickets are limited!

    More information about the EuroLLVM'18 is available here

    by Anonymous (noreply@blogger.com) at April 10, 2018 05:03 PM

    March 13, 2018

    LLVM Blog

    DragonFFI: FFI/JIT for the C language using Clang/LLVM


    Introduction

    A foreign function interface is "a mechanism by which a program written in one programming language can call routines or make use of services written in another".
    In the case of DragonFFI, we expose a library that allows calling C functions and using C structures from any language. Basically, we want to be able to do this, let's say in Python:
    import pydffi
    CU = pydffi.FFI().cdef("int puts(const char* s);")
    CU.funcs.puts("hello world!")
    or, in a more advanced way, for instance to use libarchive directly from Python:
    import pydffi
    pydffi.dlopen("/path/to/libarchive.so")
    CU = pydffi.FFI().cdef("#include <archive.h>")
    a = CU.funcs.archive_read_new()
    assert a
    ...
    This blog post presents related work and its drawbacks, how Clang/LLVM is used to circumvent these drawbacks, the inner workings of DragonFFI, and further ideas.
    The code of the project is available on GitHub: https://github.com/aguinet/dragonffi. Python 2/3 wheels are available for Linux/OSX x86/x64. Python 3.6 wheels are available for Windows x64. On all these architectures, just use:
    $ pip install pydffi
    and play with it :)

    See below for more information.

    Related work

    libffi is the reference library that provides an FFI for the C language. cffi is a Python binding around this library that also uses PyCParser to be able to easily declare interfaces and types. Both these libraries have limitations, among them:
    • libffi does not support the Microsoft x64 ABI under Linux x64. It isn't that trivial to add a new ABI (hand-written ABI, getting the ABI right, ...), while a lot of effort has already been put into compilers to get these ABIs right.
    • PyCParser only supports a very limited subset of C (no includes, function attributes, ...).
    Moreover, in 2014, Jordan Rose and John McCall from Apple gave a talk at the LLVM developer meeting in San José about how Clang can be used for C interoperability. This talk also shows various ABI issues, and was a source of inspiration for DragonFFI at the beginning.

    Somewhat related, Sean Callanan, who worked on lldb, gave a talk in 2017 at the LLVM developer meeting in San José on how we could use parts of Clang/LLVM to implement some kind of eval() for C++. What can be learned from this talk is that debuggers like lldb must also be able to call an arbitrary C function, and use debug information among other things to solve it (which we also do, see below :)).

    DragonFFI is based on Clang/LLVM, and thanks to that it is able to get around these issues:
    • it uses Clang to parse header files, allowing direct usage of a C library headers without adaptation;
    • it supports as many calling conventions and function attributes as Clang/LLVM do;
    • as a bonus, Clang and LLVM allow on-the-fly compilation of C functions, without relying on the presence of a compiler on the system (you still need the headers of the system's libc though, or MSVCRT headers under Windows);
    • and this is a good way to have fun with Clang and LLVM! :)
    Let's dive in!

    Creating an FFI library for C

    Supporting C ABIs

    A C function is always compiled for a given C ABI. The C ABI isn't defined by the official C standards, and is system/architecture-dependent. Lots of things are defined by these ABIs, and they can be quite error-prone to implement.

    To see how ABIs can become complex, let's compile this C code:

    typedef struct {
      short a;
      int b;
    } A;

    void print_A(A s) {
      printf("%d %d\n", s.a, s.b);
    }

    Compiled for Linux x64, it gives this LLVM IR:

    target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-pc-linux-gnu"

    @.str = private unnamed_addr constant [7 x i8] c"%d %d\0A\00", align 1

    define void @print_A(i64) local_unnamed_addr {
      %2 = trunc i64 %0 to i32
      %3 = lshr i64 %0, 32
      %4 = trunc i64 %3 to i32
      %5 = shl i32 %2, 16
      %6 = ashr exact i32 %5, 16
      %7 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i64 0, i64 0), i32 %6, i32 %4)
      ret void
    }

    What happens here is what is called structure coercion. To optimize some function calls, some ABIs pass structure values through registers. For instance, an llvm::ArrayRef object, which is basically a structure with a pointer and a size (see https://github.com/llvm-mirror/llvm/blob/release_60/include/llvm/ADT/ArrayRef.h#L51), is passed through registers (though this optimization isn't guaranteed by any standard).
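    To make the coercion concrete, here is a small Python model of what the IR above does with its single i64 argument. The helper names (to_signed, pack_A, unpack_A) are made up for this illustration, and the padding bits between a and b are set to zero on the caller side, which is an assumption: the callee ignores them either way.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pack_A(a, b):
    """Caller side: struct A { short a; int b; } squeezed into one 64-bit value.

    a occupies bits 0..15, bits 16..31 are padding, b occupies bits 32..63.
    """
    return (a & 0xFFFF) | ((b & 0xFFFFFFFF) << 32)


def unpack_A(reg):
    """Callee side, following the IR instruction by instruction."""
    low32 = reg & 0xFFFFFFFF              # %2 = trunc i64 %0 to i32
    high = reg >> 32                      # %3 = lshr i64 %0, 32
    b = to_signed(high, 32)               # %4 = trunc i64 %3 to i32
    shifted = (low32 << 16) & 0xFFFFFFFF  # %5 = shl i32 %2, 16
    a = to_signed(shifted, 32) >> 16      # %6 = ashr exact i32 %5, 16
    return a, b


if __name__ == "__main__":
    reg = pack_A(-2, 7)
    print(hex(reg), unpack_A(reg))
```

    The shl/ashr pair is simply a sign extension of the low 16 bits, which is why whatever sits in the padding bits never reaches printf.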

    It is important to understand that ABIs are complex things to implement and we don't want to redo this whole work by ourselves, particularly when LLVM/Clang already know how.

    Finding the right type abstraction

    We want to list every type that is used in a parsed C file. To achieve that goal, various pieces of information are needed, among which:
    • the function types, and their calling convention
    • for structures: field offsets and names
    • for union/enums: field names (and values)
    On one hand, we have seen in the previous section that the LLVM IR is too Low Level (as in Low Level Virtual Machine) for this. On the other hand, Clang's AST is too high level. Indeed, let's print the Clang AST of the code above:
    [...]
    |-RecordDecl 0x5561d7f9fc20 <a.c:1:9, line:4:1> line:1:9 struct definition
    | |-FieldDecl 0x5561d7ff4750 <line:2:3, col:9> col:9 referenced a 'short'
    | `-FieldDecl 0x5561d7ff47b0 <line:3:3, col:7> col:7 referenced b 'int'
    We can see that there is no information about the structure layout (padding, ...). There's also no information about the size of standard C types. As all of this depends on the backend used, it is not surprising that this information is not present in the AST.

    The right abstraction appears to be the LLVM metadata produced by Clang to emit DWARF or PDB structures. They provide structure field offsets and names, various basic type descriptions, and function calling conventions. Exactly what we need! For the example above, this gives (at the LLVM IR level, with some inline comments):

    target triple = "x86_64-pc-linux-gnu"
    %struct.A = type { i16, i32 }
    @.str = private unnamed_addr constant [7 x i8] c"%d %d\0A\00", align 1

    define void @print_A(i64) local_unnamed_addr !dbg !7 {
      %2 = trunc i64 %0 to i32
      %3 = lshr i64 %0, 32
      %4 = trunc i64 %3 to i32
      tail call void @llvm.dbg.value(metadata i32 %4, i64 0, metadata !18, metadata !19), !dbg !20
      tail call void @llvm.dbg.declare(metadata %struct.A* undef, metadata !18, metadata !21), !dbg !20
      %5 = shl i32 %2, 16, !dbg !22
      %6 = ashr exact i32 %5, 16, !dbg !22
      %7 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([...] @.str, i64 0, i64 0), i32 %6, i32 %4), !dbg !23
      ret void, !dbg !24
    }

    [...]
    ; DISubprogram defines (in our case) a C function, with its full type
    !7 = distinct !DISubprogram(name: "print_A", scope: !1, file: !1, line: 6, type: !8, [...], variables: !17)
    ; This defines the type of our subprogram
    !8 = !DISubroutineType(types: !9)
    ; We have the "original" types used for print_A, with the first one being the
    ; return type (null => void), and the other ones the arguments (in !10)
    !9 = !{null, !10}
    !10 = !DIDerivedType(tag: DW_TAG_typedef, name: "A", file: !1, line: 4, baseType: !11)
    ; This defines our structure, with its various fields
    !11 = distinct !DICompositeType(tag: DW_TAG_structure_type, file: !1, line: 1, size: 64, elements: !12)
    !12 = !{!13, !15}
    ; We have here the size and name of the member "a". Offset is 0 (default value)
    !13 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !11, file: !1, line: 2, baseType: !14, size: 16)
    !14 = !DIBasicType(name: "short", size: 16, encoding: DW_ATE_signed)
    ; We have here the size, offset and name of the member "b"
    !15 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !11, file: !1, line: 3, baseType: !16, size: 32, offset: 32)
    !16 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    [...]
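    As a quick cross-check of the sizes and offsets in this metadata, Python's standard ctypes module can report the host ABI's layout for the same structure. This is not how DragonFFI obtains the information (it reads the metadata above); layout_in_bits is a helper invented for this example, and the expected values assume a typical platform where short is 16 bits and int is 32 bits.

```python
import ctypes


class A(ctypes.Structure):
    # Same fields as the C struct: short a; int b;
    _fields_ = [("a", ctypes.c_short), ("b", ctypes.c_int)]


def layout_in_bits(struct_type):
    """Return (total size, {field: (offset, size)}), in bits like the DWARF metadata."""
    fields = {}
    for name, _ctype in struct_type._fields_:
        descriptor = getattr(struct_type, name)
        fields[name] = (descriptor.offset * 8, descriptor.size * 8)
    return ctypes.sizeof(struct_type) * 8, fields


if __name__ == "__main__":
    print(layout_in_bits(A))
```

    On such a platform the result matches the metadata: a total size of 64 bits, a at offset 0 with size 16, and b at offset 32 with size 32.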

    Internals

    DragonFFI first parses the debug information included by Clang in the LLVM IR it produces, and creates a custom type system to represent the various function types, structures, enumerations and typedefs of the parsed C file. This custom type system has been created for two reasons:
    • create a type system that gathers only the necessary information from the metadata tree (we don't need the whole debug information)
    • make the public headers of the DragonFFI library free from any LLVM headers (so that the whole LLVM headers aren't needed to use the library)
    Once we've got this type system, the DragonFFI API for calling C functions is this one:

    DFFI FFI([...]);
    // This will declare puts as a function that returns int and takes a const
    // char* as an argument. We could also create this function type by hand.
    CompilationUnit CU = FFI.cdef("int puts(const char* s);", [...]);
    NativeFunc F = CU.getFunction("puts");
    const char* s = "hello world!";
    void* Args[] = {&s};
    int Ret;
    F.call(&Ret, Args);

    So, basically, a pointer to the returned data and an array of void* are given to DragonFFI. Each void* value is a pointer to the data that must be passed to the underlying function. The last missing piece of the puzzle is the code that takes this array of void* (and the pointer to the returned data) and calls puts, that is, a function like this:

    void call_puts(void* Ret, void** Args) {
      // Args[0] points to the argument (a const char*), so dereference it
      *((int*)Ret) = puts(*((const char**) Args[0]));
    }

    We call these "function wrappers" (how original! :)). One advantage of this signature is that it is generic, so it can be used in the implementation of DragonFFI. Supposing we manage to compile this function at run time, we can then call it trivially as in the following:

    typedef void(*puts_call_ty)(void*, void**);
    puts_call_ty Wrapper = /* pointer to the compiled wrapper function */;
    Wrapper(&Ret, Args);

    Generating and compiling a function like this is something Clang/LLVM is able to do. For the record, this is also what libffi mainly does, by generating the necessary assembly by hand. We reduce the number of these wrappers in DragonFFI by generating one per distinct function type. So, the wrapper that would actually be generated for puts is this one:

    void __dffi_wrapper_0(int32_t( __attribute__((cdecl)) *__FPtr)(char *), int32_t *__Ret, void** __Args) {
      *__Ret = (__FPtr)(*((char **)__Args[0]));
    }
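    The idea of sharing one wrapper per function type can be modelled in a few lines of Python. This is only a toy: WrapperCache and call are hypothetical names, Python callables stand in for function pointers, and a one-element list stands in for the __Ret pointer. It is not DragonFFI's implementation.

```python
class WrapperCache:
    """Toy model: one generic wrapper per distinct function type."""

    def __init__(self):
        self._wrappers = {}

    def get(self, ret_type, arg_types):
        key = (ret_type, tuple(arg_types))
        if key not in self._wrappers:
            self._wrappers[key] = self._generate(len(arg_types))
        return self._wrappers[key]

    def __len__(self):
        return len(self._wrappers)

    @staticmethod
    def _generate(nargs):
        def wrapper(fptr, ret, args):
            # ret stands in for the `__Ret` pointer, args for `void** __Args`.
            ret[0] = fptr(*args[:nargs])
        return wrapper


def call(cache, func, ret_type, arg_types, args):
    """Call func through the wrapper associated with its function type."""
    ret = [None]
    cache.get(ret_type, arg_types)(func, ret, args)
    return ret[0]


if __name__ == "__main__":
    cache = WrapperCache()
    # Two different functions with the same "int(const char*)" shape share a wrapper.
    print(call(cache, len, "int", ["const char*"], ["hello world!"]))
    print(call(cache, lambda s: s.count("l"), "int", ["const char*"], ["hello world!"]))
    print(len(cache))
```

    Because the wrapper receives the function pointer as a parameter, as __dffi_wrapper_0 does, every function with the same type can reuse it.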

    For now, all the necessary wrappers are generated when the DFFI::cdef or DFFI::compile APIs are used. The only exception, where they are generated on-the-fly (when calling CompilationUnit::getFunction), is for variadic arguments. One possible evolution is to let the user choose whether they want this to happen on-the-fly or not for every declared function.

    Issues with Clang

    There is one major issue with Clang that we need to hack around in order to have the DFFI::cdef functionality: unused declarations aren't emitted by Clang (even when using -g -femit-all-decls).

    Here is an example, produced from the following C code:

    typedef struct {
      short a;
      int b;
    } A;

    void print_A(A s);

    $ clang -S -emit-llvm -g -femit-all-decls -o - a.c |grep print_A |wc -l
    0

    The produced LLVM IR does not contain a function named print_A! The hack we temporarily use parses the Clang AST and generates temporary functions that look like this:

    void __dffi_force_decl_print_A(A s) { }

    This forces LLVM to generate an empty function named __dffi_force_decl_print_A with the right arguments (and associated debug information).

    This is why DragonFFI offers another API, DFFI::compile. This API does not force declared-only functions to be present in the LLVM IR, and will only expose functions that end up naturally in the LLVM IR after optimizations.

    If someone has a better idea to handle this, please let us know!

    Python bindings

    Python bindings were the first ones to have been written, simply because it's the "high level" language I know best. Python provides its own set of challenges, but we will save that for another blog post. These Python bindings are built using pybind11, and provide their own set of C types. Lots of examples of what can be achieved can be found here and here.

    Project status

    DragonFFI currently supports Linux, OSX and Windows, running on 32- and 64-bit Intel CPUs. Travis is used for continuous integration, and every change is validated on all these platforms before being integrated.

    The project will go from alpha to beta quality when version 0.3 is out (which will bring Travis and Appveyor CI integration and support for variadic functions). The project will be considered stable once these things happen:
    • user and developer documentation exists!
    • another foreign language is supported (JS? Ruby?)
    • the DragonFFI main library API is considered stable
    • a non-negligible number of tests has been added
    • all the things in the TODO file have been done :)

    Various ideas for the future

    Here are various interesting ideas we have for the future. We don't know yet when they will be implemented, but we think some of them could be quite nice to have.

    Parse embedded DWARF information

    As the entry point of DragonFFI is DWARF information, we could imagine parsing this debug information from shared libraries that embed it (or provide it in a separate file). The main advantage is that all the information needed to do the FFI right is in one file; the header files are no longer required. The main drawback is that debug information tends to take a lot of space (for instance, DWARF information takes 1.8Mb for libarchive 3.3.2 compiled in release mode, for an original binary code size of 735Kb), and this brings us to the next idea.

    Lightweight debug info?

    The DWARF standard can describe a lot of information, and we don't need all of it in our case. We could imagine embedding only the necessary DWARF objects, that is, just the types needed to call the exported functions of a shared library. One experiment along these lines is available here: https://github.com/aguinet/llvm-lightdwarf. This is an LLVM optimisation pass that is inserted at the end of the optimisation pipeline and parses the metadata to keep only the parts relevant to DragonFFI. More precisely, it only keeps the DWARF metadata related to exported and visible functions, with the associated types. It also keeps debug information for global variables, even though these aren't supported yet in DragonFFI. It also does some unconventional things, like replacing every file and directory name by "_", to save space. "Fun" fact: to do this, it borrows some code from the LLVM bitcode "obfuscator" included in recent versions of Apple's clang, which is used to anonymize some information in the LLVM bitcode that is sent with tvOS/iOS applications (see http://lists.llvm.org/pipermail/llvm-dev/2016-February/095588.html for more information).

    Enough talking, let's see some preliminary results (on Linux x64):
    • on libarchive 3.3.2, DWARF goes from 1.8Mb to 536Kb, for an original binary code size of 735Kb
    • on zlib 1.2.11, DWARF goes from 162Kb to 61Kb, for an original binary code size of 99Kb
    The instructions to reproduce this are available in the README of the LLVM pass repository.
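    To put these figures in perspective, the short script below computes how much DWARF is removed and how the remaining debug information compares with the code size. The numbers are the ones listed above; the only assumption is that "Mb" means 1024 Kb, which the measurements do not specify. The name summarize is made up for this example.

```python
KB_PER_MB = 1024  # assumption: "Mb" is taken as 1024 Kb

# (DWARF before, DWARF after, binary code size), all in Kb, from the list above.
RESULTS = {
    "libarchive 3.3.2": (1.8 * KB_PER_MB, 536, 735),
    "zlib 1.2.11": (162, 61, 99),
}


def summarize(before, after, code):
    """Return (percent of DWARF removed, DWARF/code ratio before, ratio after)."""
    removed = 100 * (before - after) / before
    return removed, before / code, after / code


if __name__ == "__main__":
    for name, sizes in RESULTS.items():
        removed, ratio_before, ratio_after = summarize(*sizes)
        print(f"{name}: {removed:.1f}% removed, "
              f"DWARF/code {ratio_before:.2f} -> {ratio_after:.2f}")
```

    In both cases the full DWARF data is larger than the code itself, while the "light" version ends up smaller than the code: roughly 70% of the debug information is removed for libarchive and a bit over 60% for zlib.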
    We can conclude that defining this "light" DWARF format could be a nice idea. Another option would be to define a new, more space-efficient binary format, but there are drawbacks to going this way:
    • debug information is well supported on every platform nowadays: tools exist to parse it, embed it in or extract it from binaries, and so on
    • we already have DWARF and PDB: https://xkcd.com/927/
    Nevertheless, it could still be a nice experiment to try this, measure the space saved and see whether it is worth it!

    As a final note, these two ideas would also benefit libffi, as we could process these formats and create libffi types!

    JIT code from the final language (like Python) to native function code

    One advantage of embedding a full working C compiler is that we could JIT the code from the final language glue to the final C function call, and thus limit the performance impact of this glue code.
    Indeed, when a call is issued from Python, the following things happen:
    • arguments are converted from Python to C according to the function type
    • the function pointer and wrapper are gathered from DragonFFI
    • the final call is made
    This whole process basically involves a loop over the types of the arguments of the called function, which contains a big switch case. This loop generates the array of void* values that represents the C arguments, which is then passed to the wrapper. We could JIT a specialised version of this loop for the function type, inline the already-compiled wrapper, apply classical optimisations on top of the resulting IR, and get straightforward conversion code specialised for the given function type, directly from Python to C.
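    The difference between the generic loop and a specialised one can be illustrated with a plain-Python analogy. Nothing is JIT-compiled here: the per-type dispatch is simply resolved once per function type instead of on every call. The type names and the CONVERTERS table are invented for this sketch and do not reflect pydffi's internals.

```python
import struct

# Per-type converters from a Python value to the bytes of the C value.
CONVERTERS = {
    "int": lambda v: struct.pack("i", v),
    "double": lambda v: struct.pack("d", v),
    "char*": lambda v: v.encode() + b"\0",
}


def convert_generic(arg_types, values):
    """Generic path: dispatch on the type of every argument, on every call."""
    return [CONVERTERS[ty](v) for ty, v in zip(arg_types, values)]


def specialise(arg_types):
    """Resolve the dispatch once and return a converter for this function type."""
    converters = tuple(CONVERTERS[ty] for ty in arg_types)

    def convert(values):
        return [conv(v) for conv, v in zip(converters, values)]

    return convert


if __name__ == "__main__":
    convert_puts_args = specialise(["char*"])
    print(convert_puts_args(["hello world!"]))
```

    A real JIT would go further than this closure: it could unroll the loop, inline the converters and the wrapper, and let LLVM optimise the result as a single function.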

    One idea we are exploring is combining easy::jit (hello fellow Quarkslab teammates!) with LLPE to achieve this goal.

    Reducing DragonFFI library size

    The DragonFFI shared library embeds statically compiled versions of LLVM and Clang. The size of the final shared library is about 55Mb (stripped, under Linux x64). This is really huge compared, for instance, to the 39Kb of libffi (also stripped, Linux x64)!

    Here are some ideas to try and reduce this footprint:
    • compile DragonFFI, Clang and LLVM using (Thin) LTO, with visibility hidden for both Clang and LLVM. This could have the effect of removing code from Clang/LLVM that isn't used by DragonFFI.
    • make DragonFFI more modular:
      - one core module that only has the parts from CodeGen that deal with ABIs. If the types and function prototypes are defined "by hand" (without DFFI::cdef), that's more or less the only part that is needed (with LLVM obviously)
      - one optional module that includes the full Clang compiler (to provide the DFFI::cdef and DFFI::compile APIs)
    Even with all of this, it seems to be really hard to match the 39Kb of libffi, even if we remove the cdef/compile API from DragonFFI. As always, pick the right tool for your needs :)

    Conclusion

    Writing the first working version of DragonFFI has been a fun experiment, that made me discover new parts of Clang/LLVM :) The current goal is to try and achieve a first stable version (see above), and experiment with the various cited ideas.

    It's a really long road, so feel free to join #dragonffi on FreeNode if you have any questions or suggestions, or if you want to contribute!

    Acknowledgments

    Thanks to Serge «sans paille» Guelton for the discussions around the Python bindings, and for helping me find the name of the project :) (one of the most difficult tasks). Thanks also to him, Fernand Lone-Sang and Kévin Szkudlapski for their review of this blog post!

    by Adrien Guinet (noreply@blogger.com) at March 13, 2018 02:45 PM

    March 08, 2018

    LLVM Blog

    International Women's Day: Celebrating all the women in the LLVM Community!


    Today is International Women's Day! To all the women in the LLVM community, thank you for all your contributions!

    The LLVM Foundation values diversity within the LLVM community and the field of compilers and tools. Our Women in Compilers and Tools program began in 2015 with a birds of a feather discussion during the US LLVM Developers' Meeting and we have been expanding it over the years.

    In 2017, we were a sponsor of the Grace Hopper Conference. With the help of community members Anna Zaks and David Blaikie, the LLVM Foundation had a booth at the career fair to introduce women to LLVM and encourage them to become contributors. It was very exciting to learn that many women knew of LLVM, were using it in their classes or research, using it in their career, or were interested in learning more. We hopefully encouraged more women to get involved with LLVM, compilers, and open source.

    The LLVM Foundation was also a sponsor of the Programming Language Mentoring Workshop at SPLASH 2017. Our sponsorship went towards the travel costs for many women and other minorities to attend this workshop. The workshop focused on encouraging and preparing students to enter research careers in the field of programming languages, compilers, and related fields and to provide first hand perspectives on graduate school.

    We hosted our first Women in Compilers & Tools reception before the 2017 US LLVM Developers' Meeting. Anna Zaks and Alice Chan participated in a panel discussion about the challenges and experiences that they have encountered in their careers and within the open source community. The event was attended by 60 members of the LLVM community.

    In 2018, we look forward to another year of expanding our program. The LLVM Foundation will again sponsor the Grace Hopper Conference and we are looking for LLVM community members to help out at the career booth (more details to come). We will be having two Women in Compilers and Tools events. The first will have a reception and panel discussion before the 2018 EuroLLVM Developers' Meeting. Get your tickets here. The second will be before the 2018 US LLVM Developers' Meeting and details will be announced in the coming months.

    The LLVM Foundation thanks the LLVM community and its sponsors for supporting this work. If you want to participate in the discussion or receive notifications on events, please join the Women in Compilers and Tools mailing list.

    Questions for the LLVM Foundation? Email us at llvm-foundation@lists.llvm.org.



    by Anonymous (noreply@blogger.com) at March 08, 2018 07:15 PM

    March 06, 2018

    LLVM Blog

    Clang is now used to build Chrome for Windows

    As of Chrome 64, Chrome for Windows is compiled with Clang. We now use Clang to build Chrome for all platforms it runs on: macOS, iOS, Linux, Chrome OS, Android, and Windows. Windows is the platform with the second most Chrome users after Android according to statcounter, which made this switch particularly exciting.

    Clang is the first-ever open-source C++ compiler that’s ABI-compatible with Microsoft Visual C++ (MSVC) – meaning you can build some parts of your program (for example, system libraries) with the MSVC compiler (“cl.exe”), other parts with Clang, and when linked together (either by MSVC’s linker, “link.exe”, or LLD, the LLVM project’s linker – see below) the parts will form a working program.

    Note that Clang is not a replacement for Visual Studio, but an addition to it. We still use Microsoft’s headers and libraries to build Chrome, we still use some SDK binaries like midl.exe and mc.exe, and many Chrome/Win developers still use the Visual Studio IDE (for both development and for debugging).

    This post discusses numbers, motivation, benefits and drawbacks of using Clang instead of MSVC, how to try out Clang for Windows yourself, project history, and next steps. For more information on the technical side you can look at the slides of our 2015 LLVM conference talk, and the slides linked from there.

    Numbers

    This is what most people ask about first, so let’s talk about it first. We think the other sections are more interesting though.

    Build time

    Building Chrome locally with Clang is about 15% slower than with MSVC. (We’ve heard that Windows Defender can make Clang builds a lot slower on some machines, so if you’re seeing larger slowdowns, make sure to whitelist Clang in Windows Defender.) However, the way Clang emits debug info is more parallelizable and builds with a distributed build service (e.g. Goma) are hence faster.

    Binary size

    Chrome installer size gets smaller for 64-bit builds and slightly larger for 32-bit builds using Clang. The same difference shows in uncompressed code size for regular builds as well (see the tracking bug for Clang binary size for many numbers). However, compared to MSVC builds using link-time code generation (LTCG) and profile-guided optimization (PGO) Clang generates larger code in 64-bit for targets that use /O2 but smaller code for targets that use /Os. The installer size comparison suggests Clang's output compresses better.

    Some raw numbers for versions 64.0.3278.2 (MSVC PGO) and 64.0.3278.0 (Clang). mini_installer.exe is Chrome’s installer that users download, containing the LZMA-compressed code. chrome_child.dll is one of the two main dlls; it contains Blink and V8, and generally has many targets that are built with /O2. chrome.dll is the other main dll, containing the browser process code, mostly built with /Os.



    Configuration     mini_installer.exe  chrome.dll         chrome_child.dll   chrome.exe
    32-bit win-pgo    45.46 MB            36.47 MB           53.76 MB           1.38 MB
    32-bit win-clang  45.65 MB (+0.04%)   42.56 MB (+16.7%)  62.38 MB (+16%)    1.45 MB (+5.1%)
    64-bit win-pgo    49.4 MB             53.3 MB            65.6 MB            1.6 MB
    64-bit win-clang  46.27 MB (-6.33%)   50.6 MB (-5.1%)    72.71 MB (+10.8%)  1.57 MB (-1.2%)

    Performance

    We conducted extensive A/B testing of performance. Performance telemetry numbers are about the same for MSVC-built and clang-built Chrome – some metrics get better, some get worse, but all of them are within 5% of each other. The official MSVC builds used LTCG and PGO, while the Clang builds currently use neither of these. This is potential for improvement that we look forward to exploring. The PGO builds took a very long time to build due to the need for collecting profiles and then building again, and as a result, the configuration was not enabled on our performance-measurement buildbots. Now that we use Clang, the perf bots again track the configuration that we ship.

    Startup performance was worse in Clang-built Chrome until we started using a link-order file – a form of “PGO light”.

    Stability

    We A/B-tested stability as well and found no difference between the two build configurations.

    Motivation

    There were many motivating reasons for this project, the overarching theme being the benefits of using the same compiler across all of Chrome’s platforms, as well as the ability to change the compiler and deploy those changes to all our developers and buildbots quickly. Here’s a non-exhaustive list of examples.
    • Chrome is heavily using technology that’s based on compiler instrumentation (ASan, CFI, and ClusterFuzz, which uses ASan). Clang supports this instrumentation already, but we can’t add it to MSVC. We previously used after-the-fact binary instrumentation to mitigate this a bit, but having the toolchain write the right bits in the first place is cleaner and faster.
    • Clang enables us to write compiler plugins that add Chromium-specific warnings and to write tooling for large-scale refactoring. Chromium’s code search can now learn to index Windows code.
    • Chromium is open-source, so it’s nice if it’s built with an open-source toolchain.
    • Chrome runs on 6+ platforms, and most developers are only familiar with 1-3 platforms. If your patch doesn’t compile on a platform you’re unfamiliar with, due to a compiler error that you can’t reproduce on your local development machine, it’ll take you a while to fix. On the other hand, if all platforms use the same compiler, if it builds on your machine then it’s probably going to build on all platforms.
    • Using the same compiler also means that compiler-specific micro-optimizations will help on all platforms (assuming that the same -O flags are used on all platforms – not yet the case in Chrome, and only on the same ISAs – x86 vs ARM will stay different).
    • Using the same compiler enables cross-compiling – developers who feel most at home on a Linux box can now work on Windows-specific code, from their Linux box (without needing to run Wine).
    • We can continuously build Chrome trunk with Clang trunk to find compiler regressions quickly. This allows us to update Clang every week or two. Landing a major MSVC update in Chrome usually took a year or more, with several rounds of reporting internal compiler bugs and miscompiles. The issue here isn’t that MSVC is more buggy than Clang – it isn’t, all software is buggy – but that we can continuously improve Clang due to Clang being open-source.
    • C++ receives major new revisions every few years. When C++11 was released, we were still using six different compilers, and enabling C++11 was difficult. With fewer compilers, this gets much easier.
    • We can prioritize compiler features that are important to us. For example:

    Of course, not all – or even most – of these reasons will apply to other projects.

    Benefits and drawbacks of using Clang instead of Visual C++

    Benefits of using Clang, if you want to try it for your project:
    • Clang supports 64-bit inline assembly. For example, in Chrome we built libyuv (a video format conversion library) with Clang long before we built all of Chrome with it. libyuv had highly-tuned 64-bit inline assembly with performance not reachable with intrinsics, and we could just use that code on Windows.
    • If your project runs on multiple platforms, you can use one compiler everywhere. Building your project with several compilers is generally considered good for code health, but in Chrome we found that Clang’s diagnostics found most problems and we were mostly battling compiler bugs (and if another compiler has a great new diagnostic, we can add that to Clang).
    • Likewise, if your project is Windows-only, you can get a second compiler’s opinion on your code, and Clang’s warnings might find bugs.
    • You can use Address Sanitizer to find memory bugs.
    • If you don’t use LTCG and PGO, it’s possible that Clang might create faster code.
    • Clang’s diagnostics and fix-it hints.
    There are also drawbacks:
    • Clang doesn’t support C++/CX or #import "foo.dll".
    • MSVC offers paid support, Clang only gives you the code and the ability to write patches yourself (although the community is very active and helpful!).
    • MSVC has better documentation.
    • Advanced debugging features such as Edit & Continue don’t work when using Clang.

    How to use

    If you want to give Clang for Windows a try, there are two approaches:
    1. You could use clang-cl, a compiler driver that tries to be command-line flag compatible with cl.exe (just like Clang tries to be command-line flag compatible with gcc). The Clang user manual describes how you can tell popular Windows build systems how to call clang-cl instead of cl.exe. We used this approach in Chrome to keep the Clang/Win build working alongside the MSVC build for years, with minimal maintenance cost. You can keep using link.exe, all your current compile flags, the MSVC debugger or windbg, ETW, etc. clang-cl even writes warning messages in a format that’s compatible with cl.exe so that you can click on build error messages in Visual Studio to jump to the right file and line. Everything should just work.
    2. Alternatively, if you have a cross-platform project and want to use gcc-style flags for your Windows build, you can pass a Windows triple (e.g. --target=x86_64-windows-msvc) to regular Clang, and it will produce MSVC-ABI-compatible output. Starting in Clang 7.0.0, due Fall 2018, Clang will also default to CodeView debug info with this triple.
    Since Clang’s output is ABI-compatible with MSVC, you can build parts of your project with clang and other parts with MSVC. You can also pass /fallback to clang-cl to make it call cl.exe on files it can’t yet compile (this should be rare; it never happens in the Chrome build).

    clang-cl accepts Microsoft language extensions needed to parse system headers but tries to emit -Wmicrosoft-foo warnings when it does so (warnings are ignored for system headers). You can choose to fix your code, or pass -Wno-microsoft-foo to Clang.

    link.exe can produce regular PDB files from the CodeView information that Clang writes.

    Project History

    We switched chrome/mac and chrome/linux to Clang a while ago. But on Windows, Clang was still missing support for parsing many Microsoft language extensions, and it didn’t have any Microsoft C++ ABI-compatible codegen at all. In 2013, we spun up a team to improve Clang’s Windows support, consisting half of Chrome engineers with a compiler background and half of other toolchain people. In mid-2014, Clang could self-host on Windows. In February 2015, we had the first fallback-free build of 64-bit Chrome, in July 2015 the first fallback-free build of 32-bit Chrome (32-bit SEH was difficult). In Oct 2015, we shipped a first clang-built Chrome to the Canary channel. Since then, we’ve worked on improving the size of Clang’s output, improved Clang’s debug information (some of it behind -instcombine-lower-dbg-declare=0 for now), and A/B-tested stability and telemetry performance metrics.

    We use versions of Clang that are pinned to a recent upstream revision that we update every one to three weeks, without any local patches. All our work is done in upstream LLVM.

    Mid-2015, Microsoft announced that they were building on top of our work of making Clang able to parse all the Microsoft SDK headers with clang/c2, which used the Clang frontend for parsing code, but cl.exe’s codegen to generate code. Development on clang/c2 was halted again in mid-2017; it is conceivable that this was related to our improvements to MSVC-ABI-compatible Clang codegen quality. We’re thankful to Microsoft for publishing documentation on the PDB file format, answering many of our questions, fixing Clang compatibility issues in their SDKs, and for giving us publicity on their blog! Again, Clang is not a replacement for MSVC, but a complement to it.

    Opera for Windows is also compiled with Clang starting in version 51.

    Firefox is also looking at using clang-cl for building Firefox for Windows.

    Next Steps

    Just as clang-cl is a cl.exe-compatible interface for Clang, lld-link is a link.exe-compatible interface for lld, the LLVM linker. Our next step is to use lld-link as an alternative to link.exe for linking Chrome for Windows. This has many of the same advantages as clang-cl (open-source, easy to update, …). Also, using clang-cl together with lld-link allows using LLVM-bitcode-based LTO (which in turn enables using CFI) and using PE/COFF extensions to speed up linking. A prerequisite for using lld-link was its ability to write PDB files.
    We’re also considering using libc++ instead of the MSVC STL – this allows us to instrument the standard library, which is again useful for CFI and Address Sanitizer.

    In Closing

    Thanks to the whole LLVM community for helping to create the first new production C++ compiler for Windows in over a decade, and the first-ever open-source C++ compiler that’s ABI-compatible with MSVC!

    by Nico Weber (noreply@blogger.com) at March 06, 2018 06:54 PM

    February 14, 2018

    LLVM Blog

    LLVM accepted to 2018 Google Summer of Code!

    We are excited to announce the LLVM project has been accepted to 2018 Google Summer of Code!

    What is Google Summer of Code?

    Google Summer of Code (GSoC) is a global program focused on introducing students to open source software development. Students work on a 3 month programming project with an open source organization during their break from university. There are several benefits to this program for both the students and LLVM:

    • Inspire students to get involved with open source, compilers and LLVM
    • Give students exposure to real-world software development while getting paid a stipend
    • Allow students to do paid work related to their academic pursuits versus getting an unrelated summer job
    • Bring new developers into the LLVM project
    • Some LLVM bugs get fixed or new features get added

    Students - Apply now! 

    Ok, so you can't apply right now as the official application to GSoC does not open until March 12, 2018, but you must begin discussing your project on the LLVM mailing lists well before that date. There are many open projects listed on our webpage. Once you have selected a project, you will discuss it on the appropriate mailing list.

    If you have an idea for a project that is not listed, you can always propose it on the list as well and seek out a mentor.

    Key Dates to Remember

    We have listed a few key dates here, but always consult the official GSoC timeline to confirm.

    • March 12 (16:00 UTC) - Applications open
    • March 27 (16:00 UTC) - Deadline to file your application
    • April 23 (16:00 UTC) - Accepted student proposals are announced
    • May 14 - Coding begins


    LLVM Developers - Consider being a mentor!

    This program is not a success without our mentors. Thank you to all who have already volunteered! If you have never mentored a GSoC project but are curious, it is not too late to volunteer! You can either select an open project without a mentor or propose your own. Make sure to get it listed on the webpage so that students can see it as an option.

    If mentoring just isn't an option for you at this time, consider helping the project out by spreading the word about GSoC.

    Questions?

    If you have questions about the program for the organizers, please email gsoc@lists.llvm.org. Project specific questions should be sent to the appropriate developer mailing list instead.

    by Anonymous (noreply@blogger.com) at February 14, 2018 09:38 PM

    January 09, 2018

    LLVM Blog

    Improving Link Time on Windows with clang-cl and lld

    One of our goals in bringing clang and lld to Windows has always been to improve developer experience, and what is it that developers want the most?  Faster build times!  Recently, our focus has been on improving link time because it's the step that's the hardest to parallelize so we can't fall back on the time honored tradition of throwing more cores at it.

    Of the various steps involved in linking, generating the debug info (which, on Windows, is a PDB file) is by far the slowest since it involves merging O(# of linker inputs) sequences of type records, most of which are duplicate anyway.  For example, if two cpp files both include <string>, then both of those object files will have hundreds of duplicate type records that need to be de-duplicated during the link step.  This means you have to compute O(M x N) hash values (for M linker inputs with roughly N type records each), even though only a small fraction of those ultimately contribute to the final PDB.

    Several strategies have been invented to deal with this over the years and try to make linking faster.  Many years ago, Microsoft introduced the notion of a Type Server (enabled via /Zi compiler option in MSVC), which moves some of the work into the compiler (to take advantage of parallelism).  More recently we have been given the /DEBUG:FASTLINK linker option which attempts to solve the problem by not merging types at all in the linker.  However, each of these strategies has its own set of disadvantages, and neither can be considered perfect for all use cases.

    In this blog post, we'll first go over some technical background about CodeView so that we can understand the problem, followed by a summary of existing attempts to speed up type merging.  Then, we'll describe a novel extension to the PE/COFF file format which speeds up linking by offloading part of the work required to de-duplicate types to the compiler and using a new algorithm which uniquely identifies type records even across input files, and discuss the various tradeoffs of each approach.  Finally, we'll present some benchmarks and discuss how you can try this out in clang-cl and lld today.


    Background

    Consider a simple structure in C++, defined like this in a header file:

         struct Node {
           Node *Next = nullptr;
           Node *Prev = nullptr;
           int Value = 0;
         };

    Since each compilation happens independently of every other compilation, the compiler cannot assume any other translation unit will ever emit the records necessary to describe this type.  As a result, to guarantee that the type makes it into the final PDB, every compiler instance that encounters this definition must emit type information for this type.  So the record will be serialized by the compiler into a series of records that looks roughly like this:

    0x1004 | LF_STRUCTURE [size = 40] `Node`
             unique name: `.?AUNode@@`
             vtable: <none>
             base list: <none>
             field list: <none>
             options: forward ref | has unique name
    0x1005 | LF_POINTER [size = 12]
             referent = 0x1004
             mode = pointer
             opts = None
             kind = ptr32
    0x1006 | LF_FIELDLIST [size = 52]
             - LF_MEMBER
               name = `Next`
               Type = 0x1005
               Offset = 0
               attrs = public
             - LF_MEMBER
               name = `Prev`
               Type = 0x1005
               Offset = 4
               attrs = public
             - LF_MEMBER
               name = `Value`
               Type = 0x0074 (int)
               Offset = 8
               attrs = public
    0x1007 | LF_STRUCTURE [size = 40] `Node`
             unique name: `.?AUNode@@`
             vtable: <none>
             base list: <none>
             field list: 0x1006
             options: has unique name
    The values on the left correspond to the type's index in the type sequence and depend on what types have already been encountered.  Other types can then refer to them: for example, referent = 0x1004 means that this record is a pointer to whatever the type at index 0x1004 was.

    As a result of this design, another compilation unit which includes the same header file will need to emit this exact same type, with the only difference being the indices (since the other compilation may encounter other types before this one, causing the ordering to be different).

    In short, type indices only make sense within the context of a single type sequence (i.e. compiland), but since the linker needs to see across all object files, it has to have some way of identifying whether a type from object file A is isomorphic to a different type from object file B, even if its type indices might be different numerically from any previously seen type. 

    This algorithm, henceforth referred to as type merging, is the primary consumer of CPU cycles during linking (measured in LLD, and estimated in MSVC linker by comparing /DEBUG:FULL vs /DEBUG:FASTLINK times), and as such it is the portion of the linking process which this blog post presents a new solution to.

    Existing Solutions

    It’s worthwhile to discuss some of the existing attempts to reduce the cost associated with type merging so that we can compare and contrast their various pros and cons.

    Type Servers (/Zi)


    The /Zi compiler option was one of the first attempts to address type merging speed, and it dates back many years.  The idea behind type servers is to offload the work of de-duplication from the linking phase to the compilation phase.  Most build systems already support parallel compilation, and even if they don’t, cl.exe supports it natively via the /MP compiler switch, so there is no roadblock to anyone taking advantage of parallel compilation.



    To implement type servers, each compilation process communicates via IPC with a single process (mspdbsrv.exe) whose job is to de-duplicate type records on the fly.  When a record is isomorphic to an existing record, the type server communicates back the previously saved index; when it is new, it sends back a new index.  This allows type de-duplication to happen mostly in parallel, adding some overhead to each compilation (since there is contention over a global lock) in return for significantly reduced link times, since types will already have been merged.
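    The index hand-out described above can be illustrated with a small in-process sketch.  This is not how mspdbsrv.exe is implemented: the TypeServer class and internRecord function are hypothetical names, records are treated as opaque byte strings, a std::mutex stands in for the IPC channel and global lock, and the starting index of 0x1000 is an assumption taken from the indices shown in the dump earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a type server: one shared table behind one lock.
class TypeServer {
public:
  // Returns the index of an already-known record, or assigns the next free
  // index to a record that has not been seen before.
  uint32_t internRecord(const std::string &SerializedRecord) {
    assert(!SerializedRecord.empty() && "records are never empty");
    std::lock_guard<std::mutex> Guard(Lock); // every caller contends here
    auto [It, Inserted] = Indices.try_emplace(SerializedRecord, NextIndex);
    if (Inserted)
      ++NextIndex;
    return It->second;
  }

  std::size_t uniqueRecordCount() const {
    std::lock_guard<std::mutex> Guard(Lock);
    return Indices.size();
  }

private:
  mutable std::mutex Lock;
  std::unordered_map<std::string, uint32_t> Indices;
  uint32_t NextIndex = 0x1000; // assumed first index for non-simple types
};
```

    Every call takes the same lock, which is the contention the text refers to: the table stays consistent, but compilations serialize on it.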


    Type servers bring with them some disadvantages though, so we enumerate them here:
    1. Type servers add significant context switching and global lock contention to the compilation phase, reducing parallelism and degrading overall system performance while a build is in process.  While some performance is reclaimed from the linker, some is sacrificed due to the use of a global system lock.  It’s still a net win, but as it is not free, it leaves open the possibility that we may be able to achieve better parallelism using a different approach.
    2. The type server process itself (mspdbsrv.exe) introduces a single point of failure.  When it crashes (we see C1033 several times per day on Chrome, for example, which seems to indicate an mspdbsrv.exe crash) it could trigger a full rebuild if the type server PDB file is left in a corrupt state.
    3. mspdbsrv is incompatible with distributed builds, which is a show-stopper for large applications that can take several hours to build on normal workstations.  Type servers operate only via local IPC.  While multi-processing works well for small applications, many large products have build farms that distribute compilations among tens or hundreds of physical machines.  Type servers are incompatible with this scenario.

    Fastlink PDBs

    Fastlink PDBs are a relatively recent introduction, and the approach used by this solution is to eliminate type merging entirely.  To support this, special metadata is set in the PDB file to indicate to the tool that this is a fastlink PDB, and when the tool (e.g. debugger) encounters this metadata, it will fetch all type information from the original object file, rather than from the PDB.  As before, there are several disadvantages to this approach, enumerated here:
    1. The pdbcopy utility is almost unusable with fastlink PDBs for performance reasons.
    2. Since type merging doesn’t happen, indexing of type information also doesn’t happen (since the expensive part of building an index -- the hashing -- comes for free when you were hashing the record anyway).  This leads to degradation in the debugger user experience, since waits which previously happened only at build time now happen at debug-time.
    3. Fastlink PDBs are not portable.  The PDB references the object files by path, so if you copy the PDB and object files to a different machine (or even different path on the same machine) for archival purposes, they can no longer be debugged.  This is a deal-breaker for using it on production builds.
    4. Symbols can’t be enumerated in a Fastlink PDB.  This is most obvious if you attempt to use DIA SDK on a Fastlink PDB, where it will simply refuse to do anything at all.  This means that the only externally supported way of querying debug info for users is impossible against a Fastlink PDB.  Beyond that, however, it also means that even Microsoft’s own tools which need to enumerate symbols cannot use any standard API for doing so.  For example, WinDbg doesn’t fully support Fastlink PDBs, and many workflows are broken by the use of them, even using supported Microsoft tools.
    5. It has several serious stability issues which make it unusable on large projects  [ref].  This is probably related to point 4 above, namely the fact that every tool that wants to be able to work with a Fastlink PDB needs to use different code than the SDK that has been tested and battle-hardened through years of development.
    6. When compiling with clang-cl and linking with /debug:fastlink the compiler has to be instructed to emit additional debug information, making .obj files about 29% larger.

    Clang's Solution - The COFF .debug$H section

    This new approach tries to combine the ideas behind type servers and fastlink PDBs.  Like type servers, it attempts to offload the work of de-duplication to the compilation phase so that it can be done in parallel.  However, it does so using an algorithm with the property that the resulting hash can be used to identify a type record even across type streams.  Specifically, if two records have the same hash, they are the same record even if they are from different object files.  If you can take it on faith that such an algorithm exists (which will be henceforth referred to as a global hash), then the amount of work that the linker needs to perform is greatly reduced.  And the work that it does still have to do can be done much quicker.  Perhaps most importantly, it produces a byte-for-byte identical PDB to when the option is not used, meaning all of the issues surrounding Fastlink PDBs and compatibility are gone.

    Previously, the linker would do something that looks roughly like this:

         HashTable<Type> HashedTypes;
         vector<Type> MergedTypes;
         for (ObjectFile &Obj : Objects) {
           for (Type &T : Obj.types()) {
             remapAllTypeIndices(MergedTypes, T);

             if (!HashedTypes.try_insert(T))
               continue;
             MergedTypes.push_back(T);
           }
         }
    The important observations here are:
    1. remapAllTypeIndices is called unconditionally for every type in every object file.
    2. A hash of the type is computed unconditionally for every type.
    3. At least one full record comparison is done for every type.  In practice it turns out to be much more, because hash buckets are computed modulo table size, so there will actually be 1 full record comparison for every probe.
    Given a global hash function as described above, the algorithm can be re-written like this:
          HashMap<SHA1, int> HashToIndex;
          vector<Type> OrderedTypes;
          for (ObjectFile &Obj : Objects) {
            auto Hashes = Obj.DebugHSectionHashes;
            for (int I = 0; I < Obj.NumTypes; ++I) {
              int NextIndex = OrderedTypes.size();
              if (!HashToIndex.try_emplace(Hashes[I], NextIndex))
                continue;
              Type &T = Obj.types()[I];  // touch the record only when it is new
              remapAllTypeIndices(T);
              OrderedTypes.push_back(T);
            }
          }

    While this appears very similar, its performance characteristics are quite different.
    1. remapAllTypeIndices is only called when the record is actually new, which, as we discussed earlier, is a small fraction of the time over many linker inputs.
    2. A hash of the type is never computed by the linker.  It is simply there in the object file (the exception to this is mixed linker inputs, discussed below, but those are a small fraction of input files).
    3. Full record comparisons never happen.  Since we are using a strong hash function with negligible chance of false collisions, and since the hash of a record provides equality semantics across streams, the hash is as good as the record itself.

    Combining all of these points, we get an algorithm that is extremely cache friendly.  Amortized over all input files, most records during type merging are cache hits (i.e. duplicate records).  With this algorithm when we get a cache hit, the only two data structures that are accessed are:
    1. An array of contiguous hash values.
    2. An array of contiguous hash buckets.
    Since we never do full equality comparison (which would blow out the L1 and sometimes even L2 cache due to the average size of a type record being larger than a cache line) the algorithm here is very fast.
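    For readers who want something they can compile, here is a minimal sketch of the hash-driven loop above.  It is not lld's code: ObjectFile, GlobalHash and mergeByGlobalHash are hypothetical names, the hashes are assumed to have been read from .debug$H already, and a std::map stands in for the hash table.  The point it illustrates is that a duplicate is resolved by looking at its hash alone.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using GlobalHash = std::array<uint8_t, 20>; // sized like a SHA1 digest

// Hypothetical view of one input: the hashes stored in its .debug$H section,
// one per type record, in type-index order.
struct ObjectFile {
  std::vector<GlobalHash> Hashes;
};

// Returns, per object file, a map from local record position to merged index.
// UniqueCount receives the number of distinct records across all inputs.
std::vector<std::vector<uint32_t>>
mergeByGlobalHash(const std::vector<ObjectFile> &Objects,
                  std::size_t &UniqueCount) {
  std::map<GlobalHash, uint32_t> HashToIndex;
  std::vector<std::vector<uint32_t>> IndexMaps;
  for (const ObjectFile &Obj : Objects) {
    std::vector<uint32_t> Map;
    Map.reserve(Obj.Hashes.size());
    for (const GlobalHash &H : Obj.Hashes) {
      uint32_t Next = static_cast<uint32_t>(HashToIndex.size());
      // A duplicate costs one lookup; the record bytes are never read.
      auto It = HashToIndex.try_emplace(H, Next).first;
      Map.push_back(It->second);
    }
    IndexMaps.push_back(std::move(Map));
  }
  assert(IndexMaps.size() == Objects.size());
  UniqueCount = HashToIndex.size();
  return IndexMaps;
}
```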

    We’ve deferred discussion of how to create such a hash up until now, but it is actually fairly straightforward.  We use what is known as a “tree hash” or “Merkle tree”.  The idea is to pass bytes from a type record directly to the hash function up until the point we get to a type index.  Then, instead of passing the numeric value of the type index to the hash function, we pass the previously computed hash of the record that is being referenced.

    Such a hash is very fast to compute in the compiler because the compiler must already hash types anyway, so the incremental cost to emit this to the .debug$H section is negligible.  For example, when a type is encountered in a translation unit, before you can add that type to the object file’s .debug$T section, it must first be verified that the type has not already been added.  And since this is happening naturally in the order in which types are encountered, all that has to be done is to save these hash values in an array indexed by type index, and subsequent hash operations will have O(1) access to all of the information needed to compute this merkle hash.
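    The tree hash can be sketched in a few lines.  The following is an illustration under simplifying assumptions, not Clang's implementation: TypeRecord and computeGlobalHashes are hypothetical names, a record is modeled as payload bytes plus a list of referenced indices (real CodeView records interleave the two), indices are plain positions in the vector, and 64-bit FNV-1a is used purely as a compact stand-in for SHA1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified type record: opaque payload plus the local indices it refers to.
struct TypeRecord {
  std::string Payload;
  std::vector<uint32_t> ReferencedIndices; // must point at earlier records
};

// FNV-1a, used here only as a small stand-in for SHA1.
inline uint64_t fnv1a(uint64_t Hash, const void *Data, std::size_t Size) {
  const auto *Bytes = static_cast<const unsigned char *>(Data);
  for (std::size_t I = 0; I < Size; ++I) {
    Hash ^= Bytes[I];
    Hash *= 1099511628211ULL; // FNV prime
  }
  return Hash;
}

// Computes one hash per record in index order.  A reference to another record
// contributes that record's hash rather than its numeric index, so the result
// does not depend on where the record sits in a particular type stream.
std::vector<uint64_t>
computeGlobalHashes(const std::vector<TypeRecord> &Records) {
  std::vector<uint64_t> Hashes;
  Hashes.reserve(Records.size());
  for (const TypeRecord &R : Records) {
    uint64_t H = 14695981039346656037ULL; // FNV offset basis
    H = fnv1a(H, R.Payload.data(), R.Payload.size());
    for (uint32_t Ref : R.ReferencedIndices) {
      assert(Ref < Hashes.size() && "references must point backwards");
      uint64_t RefHash = Hashes[Ref]; // O(1) lookup of the earlier hash
      H = fnv1a(H, &RefHash, sizeof(RefHash));
    }
    Hashes.push_back(H);
  }
  return Hashes;
}
```

    Two translation units that emit the same pointer-to-Node record at different indices end up with the same hash for it, because the hash of the referenced Node record is what gets mixed in.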
      

    Mixed Input Files and Compiler/Linker Compatibility

    A linker must be prepared to deal with a mixed set of input files.  For example, while a particular compiler may choose to always emit .debug$H sections, a linker must be prepared to link objects that for whatever reason do not have this section.  To handle this, the linker can examine all inputs up front and manually compute hashes for inputs with missing .debug$H sections.  In practice this proves to be a small fraction and the penalty for doing this serially is negligible, although it should be noted that in theory this can also be done as a parallel pre-processing step if some use cases show that this has non-negligible cost.

    Similarly, the emission of this section in an object file has no impact on linkers which have not been taught to use it.  Since it is a purely additive (and optional) inclusion into the object file, any linker which does not understand it will continue to work exactly as it does today.

    The On-Disk Format

    Clang uses the following on-disk format for the .debug$H section.

         0x0     : <Section Magic>  (4 bytes)
         0x4     : <Version>        (2 bytes)
         0x6     : <Hash Algorithm> (2 bytes)
         0x8     : <Hash Value>     (N bytes)
         0x8 + N : <Hash Value>     (N bytes)
                        …

    Here, “Section Magic” is an arbitrarily chosen 4-byte number whose purpose is to provide some level of certainty that what we’re seeing is a real .debug$H section, and not some section that someone created that accidentally happened to be called that.   Our current implementation uses the value 0x133C9C5, which represents the date of the initial prototype implementation.  But this can be any reasonable value here, as long as it never changes.

    “Version” is reserved for future use, so that the format of the section can theoretically change.

    “Hash Algorithm” is a value that indicates what algorithm was used to generate the hashes that follow.  As such, the value of N above is also a function of what hash algorithm is used.  Currently, the only proposed value for Hash Algorithm is SHA1 = 0, which would imply N = 20 when Hash Algorithm = 0.  Should it prove useful to have truncated 8-byte SHA1 hashes, we could define SHA1_8 = 1, for example.
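    To make the layout concrete, here is a sketch of how a reader might validate the header and work out how many hashes follow.  It is an illustration rather than lld's parser: parseDebugH and DebugHInfo are hypothetical names, little-endian byte order is an assumption (the text above does not state it), and only the SHA1 = 0 algorithm described above is accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint32_t DebugHMagic = 0x133C9C5; // value quoted in the text
constexpr uint16_t HashAlgorithmSHA1 = 0;   // implies N = 20

struct DebugHInfo {
  uint16_t Version = 0;
  uint16_t HashAlgorithm = 0;
  std::size_t HashSize = 0;
  std::size_t NumHashes = 0;
};

// Little-endian readers (byte order is an assumption of this sketch).
static uint16_t readLE16(const uint8_t *P) {
  assert(P != nullptr);
  return static_cast<uint16_t>(P[0] | (P[1] << 8));
}
static uint32_t readLE32(const uint8_t *P) {
  return static_cast<uint32_t>(readLE16(P)) |
         (static_cast<uint32_t>(readLE16(P + 2)) << 16);
}

// Checks the 8-byte header and derives the hash count from the section size.
std::optional<DebugHInfo> parseDebugH(const std::vector<uint8_t> &Section) {
  constexpr std::size_t HeaderSize = 8;
  if (Section.size() < HeaderSize)
    return std::nullopt;
  if (readLE32(Section.data()) != DebugHMagic)
    return std::nullopt; // not a real .debug$H section
  DebugHInfo Info;
  Info.Version = readLE16(Section.data() + 4);
  Info.HashAlgorithm = readLE16(Section.data() + 6);
  if (Info.HashAlgorithm != HashAlgorithmSHA1)
    return std::nullopt; // only SHA1 = 0 is described
  Info.HashSize = 20;
  const std::size_t PayloadSize = Section.size() - HeaderSize;
  if (PayloadSize % Info.HashSize != 0)
    return std::nullopt; // truncated hash array
  Info.NumHashes = PayloadSize / Info.HashSize;
  return Info;
}
```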

    Limitations and Pitfalls

    The biggest limitation of this format is that it increases object file size.  Experiments locally on fairly large projects show an average aggregate object file size increase of ~15% compared to /DEBUG:FULL (which, for clang-cl, actually makes .debug$H object files smaller than those needed to support /DEBUG:FASTLINK).

    There is another, less obvious potential pitfall as well.  The worst case scenario is when no input files have a .debug$H section present, but this limitation is the same in principle even if only a subset of files have a .debug$H section.  Since the linker must agree on a single hash function for all object files, there is the question of what to do when not all object files agree on hash function, or when not all object files contain a .debug$H section.  If the code is not written carefully, you could get into a situation where, for example, no input files contain a .debug$H section so the linker decides to synthesize one on the fly for every input file.  Since SHA1 (for example) is quite slow, this could cause a huge performance penalty.

    This limitation can be coded around with some care, however.  For example, tree hashes can be computed up-front in parallel as a pre-processing step.  Alternatively, a hash function could be chosen based on some heuristic estimate of what would likely lead to the fastest link (based on the percentage of inputs that had a .debug$H section, for example).  There are other possibilities as well.  The important thing is to just be aware of this potential pitfall, and if your links become very slow, you'll know that the first thing you should check is "do all my object files have .debug$H sections?"

    Finally, since a hash is considered to be identical to the original record, we must consider the possibility of collisions.  That said, this does not appear to be a serious concern in practice.  A single PDB can have a theoretical maximum of 2^32 type records anyway (due to a type index being 4 bytes).  The following table shows the expected number of type records needed for a collision to exist as a function of hash size.

    Hash Size (Bytes)    Average # of records needed for a collision
    4                    82,137
    6                    21,027,121
    8                    5,382,943,231
    12                   3.53 x 10^14
    16                   2.31 x 10^19
    20                   1.52 x 10^24

    Given that this is strictly for debug information and not generated code, it’s worth thinking about the severity of a collision.  We feel that an 8-byte hash is probably acceptable for real world use.
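
    The figures in the table are consistent with the usual birthday approximation, under which the expected number of uniformly random B-byte hashes drawn before the first collision is about sqrt(pi / 2 * 2^(8B)).  That this is the formula behind the table is an inference from the numbers, and the function name below is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expected number of uniformly random hashes of HashBytes bytes that can be
// drawn before the first collision: sqrt(pi / 2 * 2^(8 * HashBytes)).
double expectedRecordsForCollision(int HashBytes) {
  assert(HashBytes > 0);
  const double Pi = std::acos(-1.0);
  // sqrt(2^(8 * B)) simplifies to 2^(4 * B).
  return std::sqrt(Pi / 2.0) * std::exp2(4.0 * HashBytes);
}
```

    For an 8-byte hash this gives roughly 5.38 x 10^9 records, which is already above the 2^32 ceiling on type records in a single PDB.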

    Benchmarks

    Here we will give some benchmarks on large real world applications (specifically, Chrome and clang).  The times presented are only for the linker.  gn args for each build of chromium are specified at the end.


                                  Target
    Toolchain   Mode              blink_core.dll   content.dll   chrome.dll   clang.exe
    MSVC        /DEBUG:FULL       553.11s          205.45s       507.17s      62.45s
    MSVC        /DEBUG:FASTLINK   116.77s          56.05s        67.80s       29.37s
    lld-link    /DEBUG:FULL       121.17s          42.10s        42.31s       24.14s
    lld-link    /DEBUG:GHASH      88.71s           33.30s        34.76s       17.99s


    The numbers here indicate a reduction in link time of up to 30% by enabling /DEBUG:GHASH in lld.
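
    For reference, the per-target reduction can be recomputed from the two lld-link rows.  The helper below is a trivial sketch with a hypothetical name.

```cpp
#include <cassert>

// Relative reduction in link time, in percent.
double reductionPercent(double BeforeSeconds, double AfterSeconds) {
  assert(BeforeSeconds > 0.0);
  return (BeforeSeconds - AfterSeconds) / BeforeSeconds * 100.0;
}
```

    Applied to /DEBUG:FULL versus /DEBUG:GHASH, this gives roughly 27% for blink_core.dll, 21% for content.dll, 18% for chrome.dll and 25% for clang.exe.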

    It's worth mentioning that lld does not yet have support for incremental linking so we could not compare the cost of an incremental link with /DEBUG:GHASH versus MSVC.  We still expect incremental linking using MSVC under optimal conditions (e.g. change whitespace in a header file) to produce much faster links than lld is currently able to do.

    There are several possible avenues for further optimization though, so we will finish up by discussing them.

    Further Improvements

    There are several ways to improve the times further, which have yet to be explored.

    1. Use a smaller or faster hash.  We use a 20-byte SHA1 hash.  This is not a multiple of cache line size, and in any case the probability of collision is astronomically small even in the largest PDBs, considering that the theoretical limit of a PDB is just under 2^32 possible unique types (due to the 4-byte size of a type index).  SHA1 is also notoriously slow.  It might be interesting to try, for example, a Blake2 set to output an 8 byte hash.  This should give sufficiently low probability of a collision while improving cache performance.  The on-disk format is designed with this flexibility in mind, as different hash algorithms can be specified in the header.
    2. Hashes for compilands with missing .debug$H sections can be computed in parallel before linking.  Currently when we encounter an object file without a .debug$H section, we must synthesize one in the linker.  Our prototype algorithm does this serially for each input.
    3. Symbol records from .debug$S sections can be merged in parallel.  Currently in lld, we first merge type records into the TPI stream, then we iterate symbol records and remap types in each symbol record to correspond to the new type indices.  If we merge types from all modules up front, the symbol records (with the exception of global symbols) can be merged in parallel since they get written to independent streams.

    Try it out!

    If you're already using clang-cl and lld on Windows today, you can try this out.  There are two flags needed to enable this, one for the compiler and one for the linker:
    1. To enable the emission of a .debug$H section by the compiler, you will need to pass the undocumented -mllvm -emit-codeview-ghash-section flag to clang-cl  (this flag should go away in the future, once this is considered stable and good enough to be turned on by default).
    2. To tell lld to use this information, you will need to pass the /DEBUG:GHASH flag to lld.
    Note that this feature is still considered highly experimental, so we're interested in your feedback (llvm-dev@ mailing list, direct email is ok too) and bug reports (bugs.llvm.org).  

    by Unknown (noreply@blogger.com) at January 09, 2018 06:49 PM

    September 21, 2017

    LLVM Blog

    2017 US LLVM Developers' Meeting Program

    The LLVM Foundation is excited to announce the selected proposals for the 2017 US LLVM Developers' Meeting!

    The program includes keynotes, talks, BoFs, tutorials, lightning talks, a student research competition, and posters.

    If you are interested in any of these talks, you should register to attend the 2017 US LLVM Developers' Meeting! Tickets are limited, so register now!

    by Anonymous (noreply@blogger.com) at September 21, 2017 08:14 AM

    September 20, 2017

    LLVM Blog

    Clang ♥ bash -- better auto completion is coming to bash



    Compilers are complex pieces of software and have a multitude of command-line options to fine tune parameters. Clang is no exception: it has 447 command-line options. It’s nearly impossible to memorize all these options and their correct spellings, and that's where shell completion can be very handy. When you type in the first few characters of a flag and hit tab, it will autocomplete the rest for you.

    Background
    However, such an autocompletion feature is not available yet, as there's no easy way to get a complete list of the options Clang supports. For example, bash doesn’t have any autocompletion support for Clang, and although some shells like zsh have a script for command-line autocompletion, they use hard-coded lists of command-line options, which are not automatically updated when a new option is added to Clang. These shells also can’t autocomplete arguments which some flags take (-std=[tab] for instance).


    This is the problem we were working to solve during this year’s Google Summer of Code. We’re adding a feature to Clang so that we can implement a complete, exact command-line option completion which is highly portable for any shell. To start with, we'll provide a completion script for bash which uses this feature.

    Implementation
    Clang now has a new command line option called --autocomplete. This flag receives the incomplete user input from the shell and then queries the internal data structures of the current Clang binary, and returns a list of possible completions. With this API, we can always get an accurate list of options and values any time, on any newer versions of Clang.

    We built an autocompletion using this in bash for the first implementation. You can find its source code here. Also, here is the sample for Qt text entry autocompletion to give an example of how to use this API from a UI application.

    You can always complete one flag at a time. So if you want to use the API, you have to select the flag that the user is currently typing, then pass it to the --autocomplete flag of the selected clang binary. For example, given `-tr`, all flags that start with `-tr` are displayed with their descriptions behind them (separated from the flag by a tab character).
    The API also supports completing the values of flags. If you have a flag for which value completion is supported, you can also provide an incomplete value behind the flag, separated by a comma, to get completions for it.
    If you provide nothing after the comma, the list of all possible values for this flag is displayed.
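
    The matching rules just described can be sketched as follows. This is only an illustration of the input and output shape, not Clang's implementation: the Option struct, the complete function and the table contents a caller supplies (including any description strings) are hypothetical, whereas the real feature queries Clang's own option tables.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical option table entry.
struct Option {
  std::string Flag;
  std::string Description;
  std::vector<std::string> Values; // empty if no value completion is known
};

static bool startsWith(const std::string &Text, const std::string &Prefix) {
  return Text.compare(0, Prefix.size(), Prefix) == 0;
}

// Input is either "<flag prefix>" or "<flag>,<value prefix>".
std::vector<std::string> complete(const std::vector<Option> &Table,
                                  const std::string &Input) {
  assert(!Input.empty() && "nothing to complete");
  std::vector<std::string> Results;
  const std::size_t Comma = Input.find(',');
  if (Comma == std::string::npos) {
    // Flag completion: flag and description separated by a tab.
    for (const Option &Opt : Table)
      if (startsWith(Opt.Flag, Input))
        Results.push_back(Opt.Flag + "\t" + Opt.Description);
    return Results;
  }
  // Value completion: everything after the comma is the value prefix.
  const std::string Flag = Input.substr(0, Comma);
  const std::string ValuePrefix = Input.substr(Comma + 1);
  for (const Option &Opt : Table)
    if (Opt.Flag == Flag)
      for (const std::string &Value : Opt.Values)
        if (startsWith(Value, ValuePrefix))
          Results.push_back(Value);
  return Results;
}
```

    An empty value prefix matches every value, which mirrors the "nothing after the comma" case.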

    How to get it
    This feature is available for use now with LLVM/clang 5.0 and we’ll also be adding this feature to the standard bash completion package. Make sure you have the latest clang version on your machine, and source this script. If you want to make the change permanent, just source it from your .bashrc and enjoy typing your clang invocations!

    by Anonymous (noreply@blogger.com) at September 20, 2017 04:11 PM

    August 24, 2017

    Sylvestre Ledru

    Rebuild of Debian using Clang 3.9, 4.0 and 5.0

    tldr: The percentage of failure is decreasing, Clang support is improving but there is a long way to go.

    The goal of this initiative is to rebuild Debian using Clang as a compiler instead of gcc. I have been doing this analysis for the last 6 years.

    Recently, we rebuilt the Debian archive with Clang 3.9.1 (July 6th), 4.0.1 (July 6th) and 5.0 rc2 (August 20th).

    For various reasons, we hadn't performed a rebuild since June 2016 with version 3.8. Therefore, we took the opportunity to do three over the last month.

    Now, the 3.9 & 4.0 results are impacted by a build failure when building all haskell packages (the -no-pie option didn't exist in Clang - I introduced it in clang 5.0). Fixing this issue with 5.0 removed more than 860 failures.

    Also, for the same versions, a Qt compiler detection is considering that Clang is not a C++11 compiler because clang++, by default, defines __cplusplus as 199711L (-std=c++11 has to be added to define a correct __cplusplus). See https://bugreports.qt.io/browse/QTBUG-62535 for more information. Some discussions happened on the upstream mailing list about changing the default C++ dialect.
    For example, with 4.0, this is causing 132 errors. With 5.0, probably thanks to a new Qt version, roughly the same number of packages are failing, but for a different reason: gcc only triggers a warning when the "nodiscard" attribute is used incorrectly, whereas clang triggers an error.
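
    To illustrate the kind of check involved, here is a sketch of a dialect test based on __cplusplus. It is not Qt's actual detection code (see the bug report above for that); the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Maps a __cplusplus value to a dialect name.
std::string dialectName(long Value) {
  assert(Value > 0);
  if (Value >= 201703L)
    return "C++17 or later";
  if (Value >= 201402L)
    return "C++14";
  if (Value >= 201103L)
    return "C++11";
  return "C++98/03"; // 199711L lands here
}

// A configure-style check: true only when the compiler advertises C++11.
bool compilerClaimsCxx11() { return __cplusplus >= 201103L; }
```

    A compiler that reports 199711L fails such a check even if it can compile C++11 code once -std=c++11 is passed.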

    In parallel, ignoring the haskell build failures, the numbers slightly increased since last year even if the overall percentage decreased (new packages being uploaded in the archive).

    Version   Build failures   Ignoring haskell pkgs
    3.8       1367 / 5.6%
    3.9       2274 / 8.1%      1618 / 5.8%
    4.0       2311 / 8.3%      1655 / 5.9%
    5.0       1445 / 5.1%

    In parallel, new warnings and errors showed up in Clang.
    This is causing a new set of build failures (especially with the usage of -Werror).

    A few examples:
    * Starting with 4.0, clang triggers an error for an ordered comparison between pointer and zero ('char *' and 'int').
    * Similarly, with this version, -Wmain triggers a new warning when a bool literal is returned from main.
    * clang also introduced a new warning called -Waddress-of-packed-member causing 5 new errors.
    * With the same version, clang can trigger a new error when auto is used in function return type.

    Now, as a conclusion, having Debian built with clang by default is still a long shot.
    First, when Clang became usable for a general audience, gcc was lagging in terms of warning and error detection. Now, gcc is in a much better position than it was, which decreases the interest in having clang replace gcc. In parallel, most of the effort on warnings and mistake detection currently happens under the clang-tidy umbrella, making these checks less intrusive as part of this initiative (but harder to use and to deploy).
    As an example, the gcc warning -Wmisleading-indentation has been implemented as a clang-tidy check.
    Second, the very permissive license of clang has been a key factor for some operating systems to switch, such as the PS4, Mac OS X or FreeBSD. With Debian, the community is generally happy with the GPL.
    Third, performance is similar enough that the switch is not worth the work, except for some projects with very special needs.

    Last, although it is much easier to contribute to llvm/clang than to gcc (no copyright assignment and an actual review system, for example), this isn't a big differentiator for most projects.

    Of course, I will continue to run and analyze these rebuilds, as they are a great source of information for clang upstream developers to improve compatibility with gcc and understand some impacts. However, until there is a big game changer, I will stop pursuing the goal of having Debian switch to clang instead of gcc. I will also stop work on the debile project (which aimed to rebuild packages in the background).

    by sylvestre at August 24, 2017 12:09 AM

    August 18, 2017

    LLVM Blog

    LLVM on Windows now supports PDB Debug Info

    For several years, we’ve been hard at work on making clang a world class toolchain for developing software on Windows.  We’ve written about this several times in the past, and we’ve had full ABI compatibility (minus bugs) for some time. One area where compatibility has been notoriously hard to achieve is debug information, but over the past 2 years we’ve made significant leaps.  If you just want the TL;DR, then here you go: If you’re using clang on Windows, you can now get PDB debug information!


    Background: CodeView vs. PDB
    CodeView is a debug information format invented by Microsoft in the mid 1980s. For various reasons, other debuggers developed an independent format called DWARF, which eventually became standardized and is now widely supported by many compilers and programming languages.  CodeView, like DWARF, defines a set of records that describe mappings between source lines and code addresses, as well as types and symbols that your program uses.  The debugger then uses this information to let you set breakpoints by function name, display the value of a variable, etc.  But CodeView is only somewhat documented, with the most recent official documentation being at least 20 years old.  While some records still have the format documented above, others have evolved, and entirely new records have been introduced that are not documented anywhere.


    It’s important to understand though that CodeView is just a collection of records.  What happens when the user says “show me the value of Foo”?  The debugger has to find the record that describes Foo.  And now things start getting complicated.  What optimizations are enabled?  What version of the compiler was used?  (These could be important if there are certain ABI incompatibilities between different versions of the compiler, or as a hint when trying to reconstruct a backtrace in heavily optimized code, or if the stack has been smashed).  There are a billion other symbols in the program, how can we find the one named Foo without doing an exhaustive O(n) search?  How can we support incremental linking so that it doesn’t take a long time to re-generate debug info when only a small amount of code has actually changed?  How can we save space by de-duplicating strings that are used repeatedly?  Enter PDB.


    PDB (Program Database) is, as you might have guessed from the name, a database.  It contains CodeView but it also contains many other things that allow indexing of the CodeView records in various ways.  This allows for fast lookups of types and symbols by name or address, the philosophical equivalent of “tables” for individual input files, and various other things that are mostly invisible to you as a user but largely responsible for making the debugging experience on Windows so great.  But there’s a problem: While CodeView is at least kind-of documented, PDB is completely undocumented.  And it’s highly non-trivial.


    We’re Stuck (Or Are We?)
    Several years ago, we decided that the path forward was to abandon any hope of emitting CodeView and PDB, and instead focus on two things:
    1. Make clang-cl emit DWARF debug information on Windows
    2. Port LLDB to Windows and teach it about the Windows ABI, which would be significantly easier than teaching Visual Studio and/or WinDbg to be able to interpret DWARF (assuming this is even possible at all, given that everything would have to be done strictly through the Visual Studio / WinDbg extensibility model)
    In fact, I even wrote another blog post about this very topic a little over 2 years ago.  So I got it to work, and I eventually got parts of LLDB working on Windows for simple debugging scenarios.


    Unfortunately, it was beginning to become clear that we really needed PDB.  Our goal has always been to create as little friction as possible for developers who are embedded in the Windows ecosystem.  Tools like Windows Performance Analyzer and vTune are very powerful and standard tools in engineers’ existing repertoires.  Organizations already have infrastructure in place to archive PDB files, and collect & analyze crash dumps.  Debugging with PDB is extremely responsive given that the debugger does not have to index symbols upon startup, since the indices are built into the file format.  And last but not least, tools such as WinDbg are already great for post-mortem debugging, and frankly many (perhaps even most) Windows developers will only give up the Visual Studio debugger when it is pried from their cold dead hands.


    I got some odd stares (to put it lightly) when I suggested that we just ask Microsoft if they would help us out.  But ultimately we did, and… they agreed!  This came in the form of some code uploaded to the Microsoft Github repo which we were on our own to figure out.  Although they were only able to upload a subset of their PDB code (meaning we had to do a lot of guessing and exploration, and the code didn’t compile either since half of it was missing), it filled in enough blanks that we were able to do the rest.


    After about a year and a half of studying this code, hacking away, studying the code some more, hacking away some more, etc, I’m proud to say that lld (the LLVM linker) can finally emit working PDBs.  All the basics like setting breakpoints by line, or by name, or viewing variables, or searching for symbols or types, everything works (minus bugs, of course).


    For those of you who are interested in digging into the internals of a PDB, we also have been developing a tool for expressly this purpose.  It’s called llvm-pdbutil and is the spiritual counterpart to Microsoft’s own cvdump utility.  It can dump the internals of a PDB, convert a PDB to yaml and vice versa, find differences between two PDBs, and much more.  Brief documentation for llvm-pdbutil is here, and a detailed description of the PDB file format internals is here, consisting of everything we’ve learned over the past 2 years (still a work in progress, as I have to divide my time between writing the documentation and actually making PDBs work).


    Bring on the Bugs!
    So this is where you come in.  We’ve tested simple debugging scenarios with our PDBs, but we still consider this alpha in terms of debug info quality.  We’d love for you to try it out and report issues on our bug tracker.  To get you started, download the latest snapshot of clang for Windows.  Here are two simple ways to test out this new functionality:
    1. Have clang-cl invoke lld automatically
      1. clang-cl -fuse-ld=lld -Z7 -MTd hello.cpp
    2. Invoke clang-cl and lld separately.
      1. clang-cl -c -Z7 -MTd -o hello.obj hello.cpp
      2. lld-link -debug hello.obj
    We look forward to the onslaught of bug reports!


    We would like to extend a very sincere and deep thanks to Microsoft for their help in getting the code uploaded to the github repository, as we would never have gotten this far without it.


    And to leave you with something to get you even more excited for the future, it's worth reiterating that all of this is done without a dependency on any Windows-specific API, DLL, or library.  It's 100% portable.  Do I hear cross-compilation?

    Zach Turner (on behalf of the LLVM Windows Team)

    by Unknown (noreply@blogger.com) at August 18, 2017 07:55 PM

    August 17, 2017

    LLVM Blog

    LLVM Weekly - #130, Jun 27th 2016

    Welcome to the one hundred and thirtieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.
    If you're reading this on blog.llvm.org then do note this is the LAST TIME it will be cross-posted there directly. There is a great effort underway to increase the content on the LLVM blog, and unfortunately LLVM Weekly has the effect of drowning out this content. As ever, you can head to http://llvmweekly.org, subscribe to get it by email, or subscribe to the RSS feed.
    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    After recently being taken down due to excessive resource usage, the LLVM apt repositories are now back.
    A detailed introduction to ThinLTO has been published on the LLVM blog. This covers the background, design, current status, and usage information for ThinLTO.
    A post on Reddit gives a summary of notable language features voted into the C++17 working draft at the Oulu meeting.

    On the mailing lists

    LLVM commits

    • The new representation for control-flow integrity and virtual call metadata has landed. The commit message further details the problems this change addresses. r273729.
    • The llvm.type.checked.load intrinsic was added. It loads a function pointer from a virtual table pointer using type metadata. r273576.
    • As part of the work on CFL-AA, interprocedural function summaries were added. These avoid recomputation for many properties of a function. r273219, r273596.
    • MemorySSA gained new APIs for PHI creation and MemoryAccess creation. r273295.
    • Metadata attachments are now allowed for declarations. r273336.
    • A new runtimes directory was added to the LLVM tree. r273620.
    • LLVM's dynamic loader gained basic support for COFF ARM. r273682.

    Clang commits

    • constexpr if support has been added to Clang. r273602.
    • clang-tidy has a new modernize-use-emplace check that will replace calls of push_back with emplace_back. r273275.
    • The CMake build system for Clang gained an ENABLE_X86_RELAX_RELOCATIONS option. r273224.

    Other project commits

    • Basic support for versioned symbols was added to LLD. r273143.
    • LLD now handles both single and double dashes for all options. r273256.

    by Unknown (noreply@blogger.com) at August 17, 2017 10:48 PM

    March 10, 2017

    LLVM Blog

    Devirtualization in LLVM and Clang

    This blog post is part of a series of blog posts from students who were funded by the LLVM Foundation to attend the 2016 LLVM Developers' Meeting in San Jose, CA. Please visit the LLVM Foundation's webpage for more information on our Travel Grants program. 

    This post is from Piotr Padlewski on his work that he presented at the meeting:

    This blog post shows how C++ devirtualization is performed in current (4.0) clang and LLVM, and describes ongoing work on the -fstrict-vtable-pointers feature.

    Devirtualization done by the frontend


    In order to transform a virtual call into a direct call, the frontend must be sure that there are no overrides of the virtual function in the program, or it must know the dynamic type of the object. Compilation proceeds one translation unit at a time, so, barring LTO, there are only a few cases when the compiler may conclude that there are no overrides:

    • either the class or virtual method is marked as final
    • the class is defined in an anonymous namespace and has no deriving classes in its translation unit

    The latter is more tricky for clang, which translates the source code in chunks on the fly (see: ASTProducer and ASTConsumer), so is not able to determine if there are any deriving classes later in the source. This could be dealt with in a couple of ways:
    • give up immediate generation
    • run data flow analysis in LLVM to find all the dynamic types passed to the function, which has static linkage
    • hope that every use of the virtual function, which is necessarily in the same translation unit, will be inlined by LLVM -- static linkage increases the chances of inlining

    Store to load propagation in LLVM

    In order to devirtualize a virtual call we need:
    • the value of the vptr - which virtual table it points to
    • the value of the vtable slot - which exact virtual function it is

    Because vtables are constant, the latter value is much easier to get when we have the value of the vptr. The only thing we need is the vtable definition, which can be made available by using available_externally linkage.

    In order to figure out the vptr value, we have to find the store to the same location that defines it. There are two analyses responsible for this:

    • MemDep (Memory Dependence Analysis) is a simple linear algorithm that, for each queried instruction, iterates through all the instructions above it and stops when the first dependency is found. Because queries might be performed for each instruction, we end up with a quadratic algorithm. Of course quadratic algorithms are not welcome in compilers, so MemDep only checks a limited number of instructions.
    • Memory SSA on the other hand has constant complexity because of caching. To find out more, watch “Memory SSA in 5 minutes” (https://www.youtube.com/watch?v=bdxWmryoHak). MemSSA is a pretty new analysis and it doesn’t have all the features MemDep has, therefore MemDep is still widely used.
    The main LLVM pass that does store-to-load propagation is GVN (Global Value Numbering).



    Finding vptr store

    In order to figure out the vptr value, we need to see the store from the constructor. To avoid relying on the constructor's availability or inlining, we decided to use the @llvm.assume intrinsic to indicate the value of the vptr. Assume is akin to assert: an optimizer that sees a call to @llvm.assume(i1 %b) can assume that %b is true after it. We can indicate the vptr value by comparing it with the vtable and then calling @llvm.assume with the result of this comparison.

     call void @_ZN1AC1Ev(%struct.A* %a) ; call ctor
     %3 = load {...} %a                  ; load vptr
     %4 = icmp eq %3, @_ZTV1A            ; compare vptr with vtable
     call void @llvm.assume(i1 %4)


    Calling multiple virtual functions

    A non-inlined virtual call will clobber the vptr. In other words, the optimizer will have to assume that the virtual function might change the vptr in the passed object. This sounds like something that never happens because the vptr is “const”. The truth is that it is actually weaker than a C++ const member, because it changes multiple times during the construction of an object (every base type constructor or destructor must set vptrs). But the vptr can't be changed during a virtual call, right? Well, what about this?

    void A::foo() { // virtual
      static_assert(sizeof(A) == sizeof(Derived));
      new(this) Derived;
    }

    This is a call of the placement new operator - it doesn’t allocate new memory, it just creates a new object in the provided location. So, by constructing a Derived object in the place where an object of type A was living, we change the vptr to point to Derived’s vtable. Is this code even legal? The C++ Standard says yes.

    However, it turns out that if someone called foo 2 times (with the same object), the second call would be undefined behavior. The standard pretty much says that calling through or dereferencing a pointer to an object whose lifetime has ended is UB, and because the standard agrees that nuking an object from inside ends its lifetime, the second call is UB. Be aware that this is only because a zombie pointer is used for the second call. The pointer returned by placement new is considered alive, so performing calls on that pointer is valid. Note that we also silently relied on that fact with the use of assume.

    (un)clobbering vptr

    We need to somehow say that the vptr is invariant during its lifetime. We decided to introduce new metadata for that purpose - !invariant.group. The presence of the invariant.group metadata on a load/store tells the optimizer that every load and store to the same pointer operand within the same invariant group can be assumed to load or store the same value. With -fstrict-vtable-pointers, Clang decorates vtable loads with invariant.group metadata corresponding to the caller pointer type.

    We can enhance the load of the virtual function (the second load) by decorating it with !invariant.load, which is equivalent to saying “a load from this location always yields the same value”, which is true because vtables never change. This way we don’t rely on having the definition of the vtable.

    A call like:

    void g(A *a) {
      a->foo();
      a->foo();
    }

    will be translated to:

    define void @function(%struct.A* %a) {
     %1 = load {...} %a, !invariant.group !0
     %2 = load {...} %1, !invariant.load !1
     call void %2(%struct.A* %a)

     %3 = load {...} %a, !invariant.group !0
     %4 = load {...} %3, !invariant.load !1
     call void %4(%struct.A* %a)
     ret void
    }

    !0 = !{!"_ZTS1A"} ; mangled type name of A
    !1 = !{}

    And now, by the magic of GVN and MemDep:

    define void @function(%struct.A* %a) {
     %1 = load {...} %a, !invariant.group !0
     %2 = load {...} %1, !invariant.load !1
     call void %2(%struct.A* %a)
     call void %2(%struct.A* %a)
     ret void
    }

    With this, llvm-4.0 is able to devirtualize function calls inside loops.

    Barriers

    In order to prevent the middle-end from finding a load/store with the same !invariant.group metadata that would come from the construction/destruction of a dead dynamic object, @llvm.invariant.group.barrier was introduced. It returns another pointer that aliases its argument but is considered different for the purposes of load/store invariant.group metadata. The optimizer won’t be able to figure out that the returned pointer is the same because intrinsics don’t have a definition. A barrier must be inserted in all the places where the dynamic type of the object changes:
    • constructors
    • destructors
    • placement new of dynamic object

    Dealing with barriers

    Barriers hinder some other optimizations. Some ideas for how this could be fixed:

    • stripping invariant.group metadata and barriers just after devirtualization. Currently it is done before codegen. The problem is that most of the devirtualization comes from GVN, which also does most of the optimizations we would miss with barriers. GVN is expensive, therefore it is run only once. It also might make less sense if we are in LTO mode, because that would limit the devirtualization in the link phase.
    • teaching important passes to look through the barrier. Preserving the semantics of the barrier might be very tricky, but e.g. looking for the dependency of a load without invariant.group by jumping through the barrier to find a store without invariant.group is likely to do the trick.
    • removing invariant.barrier when its argument comes from alloca and is never used etc.
    To find out more details about devirtualization, check my talk (http://llvm.org/devmtg/2016-11/#talk6) from the LLVM Dev Meeting 2016.

    About author

    Undergraduate student at University of Warsaw, currently working on C++ static analysis in IIIT.

    by Anonymous (noreply@blogger.com) at March 10, 2017 09:23 PM

    March 07, 2017

    LLVM Blog

    Some news about apt.llvm.org

    apt.llvm.org provides Debian and Ubuntu repositories for every maintained version of these distributions. LLVM, Clang, clang extra tools, compiler-rt, polly, LLDB and LLD packages are generated for the stable, stabilization and development branches.

    As it seems that we have more and more users of these packages, I would like to share an update about various recent changes.

    New features

    LLD
    First, the cool new stuff: lld is now offered and built for i386/amd64 on all supported Debian and Ubuntu versions. The test suite is also executed and the coverage results are great.

    4.0
    Then, following the branching for the 4.0 release, I created new repositories to offer this release.
    For example, for Debian stable, just add the following in /etc/apt/sources.list.d/llvm.list

    deb http://apt.llvm.org/jessie/ llvm-toolchain-jessie-4.0 main
    deb-src http://apt.llvm.org/jessie/ llvm-toolchain-jessie-4.0 main

    llvm-defaults
    Obviously, the trunk is now 5.0. If llvm-defaults is used, clang, lldb and other meta packages will be automatically updated to this version.
    As a consequence and also because the branches are dead, 3.7 and 3.8 jobs have been disabled. Please note that both repositories are still available on apt.llvm.org and won't be removed.

    Zesty: New Ubuntu
    Packages for the next Ubuntu 17.04 (zesty) are also generated for 3.9, 4.0 and 5.0.

    libfuzzer
    This was implemented a few months ago but not clearly communicated: libfuzzer also has its own packages, named libfuzzer-X.Y-dev (example: libfuzzer-3.9-dev, libfuzzer-4.0-dev or libfuzzer-5.0-dev).


    Changes in the infrastructure


    In order to support the load, I started to use new blades that Google (thanks again to Nick Lewycky) sponsored for an initiative that I was running for Debian and IRILL. The 6 new blades removed all the wait time. With a new salt configuration, I automated the deployment of the slaves. In case the load increases again, we will have access to more blades.

    I also took the time to fix some long ongoing issues:
    • all repositories are signed, and we verify that they are
    • i386 and amd64 packages are now uploaded at once instead of being uploaded separately. Separate uploads caused checksum errors when one of the two architectures built correctly and the other failed (ex: a failing test)
    Last but not least, the code coverage results are produced in a more reliable manner.


    More information about the implementation and services.

    As what is shipped on apt.llvm.org is exactly the same as in Debian and Ubuntu, packaging files are stored on the Debian subversion server.

    A Jenkins instance is in charge of the orchestration of the whole build infrastructure.

    The trunk packages are built twice a day for all Debian and Ubuntu packages. Branches (3.9 and 4.0 currently) are rebuilt only when the - trigger job finds a change.

    In both cases, the Jenkins source job will check out the Debian SVN branches for their version, check out/update the LLVM/clang/etc repositories and repack everything to create the source tarballs and Debian files (dsc, etc). The completion of this job will trigger the binaries job to start. These jobs, thanks to Debian Jenkins glue, will create or update Debian/Ubuntu versions.

    Then builds are done the usual way through pbuilder for both i386 and amd64. All the test suites are going to be executed. If any LLVM test is failing on i386 or amd64, the whole build will fail. If both builds and the LLVM testsuite are successful, the sync job will start and rsync packages to the LLVM server to be replicated on the CDN. If one or both builds fail, a notification is sent to the administrator.

    Some Debian static analysis checks (lintian) are run on the packages to prevent some packaging errors. From time to time, some interesting issues are found.

    In parallel, some binary builds have some special hooks like Coverity, code coverage or installation of more recent versions of gcc for Ubuntu precise.

    Report bugs

    Bugs can be reported on the bugzilla of the LLVM project in the product "Packaging" and the component "deb packages".
      

    Common issues

    Because LLVM and clang are fast-moving projects, keeping up with them when packaging can be challenging in some cases, in particular with regard to tests. For Debian unstable or the latest version of Ubuntu, the matrix is further complicated by new versions of basic pieces of the operating system like gcc/g++ or libstdc++.

    It is also not uncommon for some tests to be ignored in the process.

    How to help


    Some newcomer bugs are available. As an example:
    Related to all this, a Google Summer of Code 2017 project under the LLVM umbrella has been proposed: Integrate libc++ and OpenMP in apt.llvm.org

    Help is also needed to keep track of the new test failures and get them fixed upstream. For example, a few tests have been marked as expected to fail to avoid crashes.

    by Ledru Sylvestre (noreply@blogger.com) at March 07, 2017 06:59 PM

    February 22, 2017

    LLVM Blog

    2016 LLVM Developers' Meeting - Experience from Johannes Doerfert, Travel Grant Recipient

    This blog post is part of a series of blog posts from students who were funded by the LLVM Foundation to attend the 2016 LLVM Developers' Meeting in San Jose, CA. Please visit the LLVM Foundation's webpage for more information on our Travel Grants program.

    This post is from Johannes Doerfert:
    2016 was my third time attending the US LLVM developers meeting and for the third year in a row I was impressed by the quality of the talks, the organization and the diversity of attendees. The hands-on experiences that are presented, combined with innovative ideas and cutting-edge research, make it a perfect venue for me as a PhD student. The honest interest in the presented topics and the lively discussions that include students, professors and industry people are two of the many things that I experienced the strongest at these developer meetings.

    For the last two years I was mainly attending as a Polly developer that talked about new features and possible applications of Polly. This year however my roles were different. First, I was attending as part of the organization team of the European LLVM developers meeting 2017 [0] together with my colleagues Tina Jung and Simon Moll. In this capacity I answered questions about the venue (Saarbruecken, Germany [1,2]) and the alterations in contrast to prior meetings. Though, more importantly, I advertised the meeting to core developers that usually do not attend the European version. Second on my agenda was the BoF on a parallel extension to the LLVM-IR which I organized with Simon Moll. In this BoF, but also during the preparation discussion on the mailing list [3], we tried to collect motivating examples, requirements as well as known challenges for a parallel extension to LLVM. These insights will be used to draft a proposal that can be discussed in the community.

    Finally, I attended as a 4th year PhD student who is interested in contributing his work to the LLVM project (not only Polly). As my current research required a flexible polyhedral value (and iteration space) analysis, I used the opportunity to implement one with an interface similar to scalar evolution. The feedback I received on this topic was strictly positive. I will soon post a first version of this standalone analysis and start a public discussion. Since I hope to finish my studies at some (not too distant) point in time, I seized the opportunity to inquire about potential options for the time after my PhD.

    As a final note I would like to thank the LLVM Foundation for their student travel grant that allowed me to attend the meeting in the first place.


    [0] http://llvm.org/devmtg/2017-03/
    [1] http://sic.saarland/
    [2] https://en.wikipedia.org/wiki/Saarbr%C3%BCcken
    [3] http://lists.llvm.org/pipermail/llvm-dev/2016-October/106051.html

    by Anonymous (noreply@blogger.com) at February 22, 2017 07:29 AM

    December 14, 2016

    LLVM Blog

    LLVM's New Versioning Scheme

    Historically, LLVM's major releases always added "0.1" to the version number, producing major versions like 3.8, 3.9, and 4.0 (expected by March 2017). With our next release though, we're changing this.  The LLVM version number will now increase by "1.0" with every major release, which means that the first major release after LLVM 4.0 will be LLVM 5.0 (expected September 2017).
    We believe that this change will provide a simpler and more standard approach to versioning.
    LLVM’s version number (also shared by many of its sub-projects, such as Clang, LLD, etc.) consists of three parts: major.minor.patch. The community produces a new release every six months, with "patch" releases (also known as "dot" or "stable" releases) containing bug fixes in between.
    Until now, the six-monthly releases would cause the minor component of the version to be incremented. Every five years, after minor reached 9, a more major release would occur, including some breaking changes: 2.0 introduced the bitcode format, 3.0 a type system rewrite.
    During the discussions about what to call the release after 3.9, it was pointed out that since our releases are time-based rather than feature-based, the distinction between major and minor releases seems arbitrary. Further, every release is also API breaking, so by the principles of semantic versioning, we should be incrementing the major version number.
    We decided that going forward, every release on the six-month cycle will be a major release. Patch releases will increment the patch component as before (producing versions like 5.0.1), and the minor component will stay at zero since no minor releases will be made.
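
    As a rough illustration of the difference between the two schemes, the sketch below models a version as major.minor.patch and computes the next six-monthly release under each scheme. The names Version, next_release_old, next_release_new and next_patch are hypothetical helpers made up for this example; they are not part of LLVM.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A major.minor.patch version number (illustrative only)."""
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def next_release_old(v: Version) -> Version:
    """Old scheme: a six-monthly release bumps minor; after x.9 comes (x+1).0."""
    if v.minor >= 9:
        return Version(v.major + 1, 0, 0)
    return Version(v.major, v.minor + 1, 0)


def next_release_new(v: Version) -> Version:
    """New scheme: every six-monthly release bumps major; minor stays at zero."""
    return Version(v.major + 1, 0, 0)


def next_patch(v: Version) -> Version:
    """Patch releases bump only the patch component in both schemes."""
    return Version(v.major, v.minor, v.patch + 1)
```

    Under these assumptions, next_release_old(Version(3, 9, 0)) yields 4.0.0, next_release_new(Version(4, 0, 0)) yields 5.0.0, and next_patch(Version(5, 0, 0)) yields 5.0.1, matching the versions mentioned above.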

    Bitcode Compatibility

    Before LLVM 4.0.0, the Developer Policy specified that bitcode produced by LLVM would be readable by the next versions up to and including the next major release. The new version of the Developer Policy instead specifies that LLVM will currently load any bitcode produced by version 3.0 or later. When developers decide to drop support for some old bitcode feature, the policy will be updated.

    API Compatibility

    Nothing has changed. As before, patch releases are API and ABI compatible with the main releases, and the C API is "best effort" for stability, but besides that, LLVM’s API changes between releases.

    What About the Minor Version?

    Since the minor version is expected to always be zero, why not drop it and just use major.patch as the version number?
    Dropping the minor component from the middle of the version string would introduce ambiguity: whether to interpret x.y as major.minor or major.patch would then depend on the value of x.
    The version numbers are also exposed through various APIs, such as LLVM's llvm-config.h and Clang's __clang_minor__ preprocessor macro. Removing the minor component from these APIs would break a lot of existing code.
    Going forward, since the minor number will be zero and patch releases are compatible, I expect we will generally refer to versions simply by their major number and treat the rest of the version string as details (just as Chromium 55 might really be 55.0.2883.76). Future versions of LLVM and Clang can generally be referred to simply as "LLVM 4" or "Clang 5".

    by Hans Wennborg (noreply@blogger.com) at December 14, 2016 11:38 PM

    September 12, 2016

    LLVM Blog

    Announcing the next LLVM Foundation Board of Directors

    The LLVM Foundation is pleased to announce its new Board of Directors:


    Chandler Carruth
    Hal Finkel
    Arnaud de Grandmaison
    David Kipping
    Anton Korobeynikov
    Tanya Lattner
    Chris Lattner
    John Regehr


    Three new members and five continuing members were elected to the eight person board. The new board consists of individuals from corporations and from the academic and scientific communities. They also represent various geographical groups of the LLVM community. All board members are dedicated and passionate about the programs of the LLVM Foundation and growing and supporting the LLVM community.


    When voting on new board members, we took into consideration all contributions (past and present) and current involvement in the LLVM community. We also tried to create a balanced board of individuals from a wide range of backgrounds and locations to provide a voice to as many groups within the LLVM community as possible.


    We want to thank everyone who applied as we had many strong applications. As the programs of the LLVM Foundation grow we will be relying on volunteers to help us reach success. Please join our mailing list to be informed of volunteer opportunities.


    About the board of directors (listed alphabetically by last name):


    Chandler Carruth has been an active contributor to LLVM since 2007. Over the years, he has worked on LLVM’s memory model and atomics, Clang’s C++ support, GCC-compatible driver, initial profile-aware code layout optimization pass, pass manager, IPO infrastructure, and much more. He is the current code owner of inlining and SSA formation.


    In addition to his numerous technical contributions, Chandler has led Google’s LLVM efforts since 2010 and shepherded a number of new efforts that have positively and significantly impacted the LLVM project. These new efforts include things such as adding C++ modules to Clang, adding address and other sanitizers to Clang/LLVM, making Clang compatible with MSVC and available to the Windows C++ developer community, and much more.


    Chandler works at Google Inc. as a technical lead for their C++ developer platform and has served on the LLVM Foundation board of directors for the last 2 years.


    Hal Finkel has been an active contributor to the LLVM project since 2011. He is the code owner for the PowerPC target, alias-analysis infrastructure, loop re-roller and the basic-block vectorizer.


    In addition to his numerous technical contributions, Hal has chaired the LLVM in HPC workshop, which is held in conjunction with Super Computing (SC), for the last 3 years. This workshop provides a venue for the presentation of peer-reviewed HPC-related research in LLVM from both industry and academia. He has also been involved in organizing an LLVM-themed BoF session at SC and LLVM socials in Austin.


    Hal is Lead for Compiler Technology and Programming Languages at Argonne National Laboratory’s Leadership Computing Facility.


    Arnaud de Grandmaison has been hacking on LLVM projects since 2008. In addition to his open source contributions, he has worked for many years on private out-of-tree LLVM-based projects at Parrot, DiBcom, and ARM. He has also been a leader in the European LLVM community by organizing the EuroLLVM Developers’ meeting and Paris socials, and by chairing or participating in numerous program committees for the LLVM Developers’ Meetings and other LLVM related conferences.


    Arnaud has attended numerous LLVM Developers’ meetings, and has volunteered as a moderator or presented at them as well. He also moderates several LLVM mailing lists. Arnaud is also very involved in community-wide discussions and decisions such as re-licensing and the code of conduct.


    Arnaud is a Principal Engineer at ARM.


    David Kipping has been involved with the LLVM project since 2010. He has been a key organizer and supporter of many LLVM community events such as the US and European LLVM Developers’ Meetings. He has served on many of the program committees for these events.


    David has worked hard to advance the adoption of LLVM at Qualcomm and other companies. One such example of his efforts is the LLVM track he created at the 2011 Linux Collaboration summit. He has over 30 years of experience in open source and developer tools, including working on C++ at Borland.


    David has served on the board of directors for the last 2 years and has held the officer position of treasurer. The treasurer is a time-demanding position in that he supports the day-to-day operation of the foundation, balances the books, and generates monthly treasurer reports.


    David is Director of Product Management at Qualcomm and has served on the LLVM Foundation board of directors for the last 2 years.


    Anton Korobeynikov has been an active contributor to the LLVM project since 2006. Over the years, he has made numerous technical contributions to areas including Windows support, ELF features, debug info, exception handling, and backends such as ARM and x86. He was the original author of the MSP430 and original System Z backends.


    In addition to his technical contributions, Anton has maintained LLVM’s participation in Google Summer of Code by managing applications, deadlines, and overall organization. He also supports the LLVM infrastructure and has been on numerous program committees for the LLVM Developers’ Meetings (both US and EuroLLVM).


    Anton is currently an associate professor at the Saint Petersburg State University and has served on the LLVM Foundation board of directors for the last 2 years.


    Tanya Lattner has been involved in the LLVM project for over 14 years. She began as a graduate student who wrote her master's thesis using LLVM, and continued on using and extending LLVM technologies at various jobs during her career as a compiler engineer.   


    Tanya has been organizing the US LLVM Developers’ meeting since 2008 and has attended every developer meeting. She was the LLVM release manager for 3 years, moderates the LLVM mailing lists, and helps administer the LLVM infrastructure servers, mailing lists, bugzilla, etc. Tanya has also been on the program committee for the US LLVM Developers’ meeting (4 years) and the EuroLLVM Developers’ Meeting (1 year).


    With the support of the initial board of directors, Tanya created the LLVM Foundation, defined its charitable and education mission, and worked to get 501(c)(3) status.


    Tanya is the Chief Operating Officer and has served as the President of the LLVM Foundation board for the last 2 years.


    Chris Lattner is well known as the founder of the LLVM project and has a lengthy history of technical contributions to the project over the years. He drove much of the early implementation, architecture, and design of LLVM and Clang.


    Chris has attended every LLVM Developers’ meeting, and presented at the majority. He helped drive the conception and incorporation of the LLVM Foundation, and has served as Secretary of the board for the last 2 years. Chris also grants commit access to the LLVM Project, moderates mailing lists, moderates and edits the LLVM blog, and drives important non-technical discussions and policy decisions related to the LLVM project.


    Chris manages the Developer Tools department at Apple Inc and has served on the LLVM Foundation board of directors for the last 2 years.



    John Regehr has been involved in LLVM for a number of years. As a professor of computer science at the University of Utah, his research specializes in compiler correctness and undefined behavior. He is well known within the LLVM community for the hundreds of bug reports his group has filed against LLVM/Clang.


    John was a project lead for IOC, a Clang based integer overflow checker that eventually became the basis for the integer parts of UBSan. He was also the primary developer of C-Reduce which utilizes Clang as a library and is often used as a test case reducer for compiler issues.

    In addition to his technical contributions, John has served on several LLVM-related program committees. He also has a widely read blog about LLVM and other compiler-related issues (Embedded in Academia).

    by Unknown (noreply@blogger.com) at September 12, 2016 05:10 PM

    August 17, 2016

    OpenMP Runtime Project

    New code release

    We are excited to announce the next release of the Intel® OpenMP* Runtime Library at openmprtl.org. This release aligns with Intel® Parallel Studio XE 2017 Composer Edition.

    New Features:

    • OpenMP* 4.5 nonmonotonic modifier for schedule dynamic and guided support

    Bug Fixes:

    by mad\vishakh1 at August 17, 2016 07:06 PM

    June 21, 2016

    LLVM Blog

    ThinLTO: Scalable and Incremental LTO

    ThinLTO was first introduced at EuroLLVM in 2015, with results shown from a prototype implementation within clang and LLVM. Since then, the design was reviewed through several RFCs, it has been implemented in LLVM (for gold and libLTO), and tuning is ongoing. Results already show good performance for a number of benchmarks, with compile time close to a non-LTO build.

    This blog post covers the background, design, current status and usage information.

    This post was written by Teresa Johnson, Mehdi Amini and David Li.

    LTO Background and Motivation

    LTO (Link Time Optimization) is a method for achieving better runtime performance through whole-program analysis and cross-module optimization. During the compile phase, clang will emit LLVM bitcode  instead of an object file. The linker recognizes these bitcode files and invokes LLVM during the link to generate the final objects that will constitute the executable. The LLVM implementation loads all input bitcode files and merges them together to produce a single Module. The interprocedural analyses (IPA) as well as the interprocedural optimizations (IPO) are performed serially on this monolithic Module.



    What this means in practice is that LTO often requires a large amount of memory (to hold all IR at once) and is very slow. And with debug information enabled via -g, the size of the IR and the resulting memory requirements are significantly larger. Even without debug information, this is prohibitive for very large applications, or when compiling on memory-constrained machines. It also makes incremental builds less effective, as everything from the LTO step on must be re-executed when any input source changes.

    ThinLTO Design

    ThinLTO is a new approach that is designed to scale like a non-LTO build, while retaining most of the performance achievement of full LTO.
    In ThinLTO, the serial step is very thin and fast. This is because instead of loading the bitcode and merging it into a single monolithic module to perform these analyses, it utilizes compact summaries of each module for global analyses in the serial link step, as well as an index of function locations for later cross module importing. The function importing and other IPO transformations are performed later when the modules are optimized in fully parallel backends.

    The key transformation enabled by ThinLTO global analyses is function importing, in which only those functions likely to be inlined are imported into each module. This minimizes the memory overhead in each ThinLTO backend, while maximizing the most impactful cross module optimization opportunities. The IPO transformations are therefore performed on each module extended with its imported functions.

    The ThinLTO process is divided into 3 phases:
    1. Compile: Generate IR as with full LTO mode, but extended with module summaries 
    2. Thin Link: Thin linker plugin layer to combine summaries and perform global analyses 
    3. ThinLTO backend: Parallel backends with summary-based importing and optimizations 

    By default, linkers that support ThinLTO (see below) are set up to launch the ThinLTO backends in threads. So the distinction between the second and third phases is transparent to the user.

    The key enablers for this process are the summaries emitted during phase 1. These summaries are emitted using the bitcode format, but designed so that they can be separately loaded without involving an LLVMContext or any other expensive construction. Each global variable and function has an entry in the module summary. An entry contains metadata that abstracts the symbol it is describing. For example, a function is abstracted with its linkage type, the number of instructions it contains, and optional profiling information (PGO). Additionally, every reference (address taken, direct call) to another global is recorded. This information enables building a complete reference graph during the Thin Link phase, and subsequent fast analyses using the global summary information.
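
    To make the idea of per-symbol summaries and the resulting reference graph more concrete, here is a minimal sketch. It is not LLVM's actual summary format: the GlobalSummary class, its field names, and the helper functions are hypothetical, and symbol names are assumed to be unique across modules for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class GlobalSummary:
    """Hypothetical summary entry for one function or global variable."""
    name: str
    linkage: str                 # e.g. "external" or "internal"
    inst_count: int = 0          # number of instructions; 0 for variables
    refs: List[str] = field(default_factory=list)  # globals it references


def build_reference_graph(
    modules: Dict[str, List[GlobalSummary]]
) -> Dict[str, Set[str]]:
    """Merge the summaries of all modules into one symbol -> references map."""
    graph: Dict[str, Set[str]] = {}
    for summaries in modules.values():
        for summary in summaries:
            graph.setdefault(summary.name, set()).update(summary.refs)
            for ref in summary.refs:
                graph.setdefault(ref, set())  # make every target a node too
    return graph


def reachable(graph: Dict[str, Set[str]], roots: Iterable[str]) -> Set[str]:
    """Example global analysis: all symbols reachable from the given roots."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(graph.get(name, ()))
    return seen
```

    Note that both functions work purely on the small summary records, so nothing resembling the full IR of a module has to be loaded to answer a whole-program question such as reachability.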

    Current Status

    ThinLTO is currently supported in both the gold plugin as well as in ld64 starting with Xcode 8. Additionally, support is currently being added to the lld linker. The 3.9 release of clang will have ThinLTO accessible using the -flto=thin command line option.

    While tuning is still in progress, ThinLTO already performs well compared to LTO, in many cases matching the performance improvement. In a few cases ThinLTO even outperforms full LTO, most likely because the higher scalability of ThinLTO allows using a more aggressive backend optimization pipeline (similar to that of a non-LTO build).

    The following results were collected for the C/C++ SPEC cpu2006 benchmarks on an 8-core 2.6GHz Intel Xeon E5-2689. Each benchmark was run in isolation three times and results are shown for the average of the three runs.



    Critically, due to the scalable design of ThinLTO, this performance is achieved with a build time that stays within a non-LTO build scale. The following build times were collected on a 20 core 2.8GHz Intel Xeon CPU E5-2680 v2, running Linux and using the gold linker. The results are for an end-to-end build of clang (ninja clang) from a clean build directory, so it includes all the compile steps and links of intermediate binaries such as llvm-tblgen and clang-tblgen.

    The release build shows that ThinLTO build time is very comparable to a non-LTO build. Adding -gline-tables-only adds a very small overhead, and ThinLTO is again similar to the regular non-LTO build. However, with full debug information, ThinLTO is still somewhat slower than a non-LTO build due to the additional overhead during importing. Ongoing improvements to debug metadata representation and handling are expected to continue to reduce this overhead. In all cases, full LTO is significantly slower.

    On the memory consumption side, the improvements are significant. Over the last two years, FullLTO was significantly improved, as shown on the chart below, but our measurement shows that ThinLTO keeps a large advantage.

    Usage Information

    To utilize ThinLTO, simply add the -flto=thin option to compile and link. E.g.
        % clang -flto=thin -O2 file1.c file2.c -c
        % clang -flto=thin -O2 file1.o file2.o -o a.out

    As mentioned earlier, by default the linkers will launch the ThinLTO backend threads in parallel, passing the resulting native object files back to the linker for the final native link. As such, the usage model is the same as non-LTO. Similar to regular LTO, this requires using the gold linker configured with plugins enabled on Linux, or ld64 starting with Xcode 8.

    Distributed Build Support

    To take advantage of a distributed build system, the parallel ThinLTO backends can each be launched as a separate process. To support this, the gold plugin provides a thinlto-index-only option that causes the link to exit after creating the combined index and performing global analysis.

    Additionally, in this mode:
    • Instead of using a monolithic combined index, a separate individual index file is written per backend containing the necessary portions of the combined index for recording the imports and any other global summary based optimization decisions that should be acted on in the backend. 
    • A plain text listing of the bitcode files each module will import from is optionally emitted to aid in distributed build file staging (thinlto-emit-imports-files plugin option). 

    The backends can be launched by invoking clang on the bitcode and providing its index via an option. Finally, the resulting native objects are linked to generate the final binary. For example:

        % clang -flto=thin -O2 file1.c file2.c -c
        % clang -flto=thin -O2 file1.o file2.o -Wl,-plugin-opt,-thinlto-index-only
        % clang -O2 -o file1.native.o -x ir file1.o -c -fthinlto-index=./file1.o.thinlto.bc
        % clang -O2 -o file2.native.o -x ir file2.o -c -fthinlto-index=./file2.o.thinlto.bc
        % clang file1.native.o file2.native.o -o a.out
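
    For a build system that needs to produce these steps for an arbitrary list of sources, the command sequence above can be generated mechanically. The following is only a sketch: the function name distributed_thinlto_commands is made up for this example, the flags and file naming are copied from the commands shown above, and a real build system would schedule the per-module backend commands on separate machines rather than returning strings.

```python
import os
from typing import List


def distributed_thinlto_commands(sources: List[str], output: str = "a.out") -> List[str]:
    """Return the command lines for a distributed ThinLTO build, in order."""
    stems = [os.path.splitext(src)[0] for src in sources]
    objects = [stem + ".o" for stem in stems]
    natives = [stem + ".native.o" for stem in stems]

    commands = [
        # 1. Compile: emit bitcode with module summaries.
        "clang -flto=thin -O2 " + " ".join(sources) + " -c",
        # 2. Thin Link: write the individual index files, then exit.
        "clang -flto=thin -O2 " + " ".join(objects)
        + " -Wl,-plugin-opt,-thinlto-index-only",
    ]
    # 3. ThinLTO backends: one independent process per module.
    for stem in stems:
        commands.append(
            f"clang -O2 -o {stem}.native.o -x ir {stem}.o -c "
            f"-fthinlto-index=./{stem}.o.thinlto.bc"
        )
    # 4. Final native link of the backend outputs.
    commands.append("clang " + " ".join(natives) + " -o " + output)
    return commands
```

    Calling distributed_thinlto_commands(["file1.c", "file2.c"]) reproduces the five commands listed above.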

    Incremental ThinLTO Support

    With full LTO, only the initial compile steps can be performed incrementally. If any input has changed, the expensive serial IPA/IPO step must be redone.

    With ThinLTO, the serial Thin Link step must be redone if any input has changed. However, as noted earlier, this step is small and fast, and does not involve loading any module. Any particular ThinLTO backend must be redone if and only if:

    1. The corresponding (primary) module’s bitcode changed 
    2. The list of imports into or exports from the module changed 
    3. The bitcode for any module being imported from has changed 
    4. Any global analysis result affecting either the primary module or anything it imports has changed. 

    For single machine builds, where the threads are launched by the linker, incremental builds can be achieved by caching the module after applying the global summary based optimizations such as importing, using a hash of the information listed above as the key. This caching is already supported in libLTO’s ThinLTO handling, which is used by ld64. To enable it, the link step needs to be passed an extra flag: -Wl,-cache_path_lto,/path/to/cache
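
    The caching idea can be sketched as follows: everything from the four conditions above is fed into one hash, and the result is used as the cache key for a backend's output. This is an illustration under assumptions, not libLTO's actual key computation; the function name backend_cache_key, its parameters, and the choice of SHA-256 are all made up for this example.

```python
import hashlib
from typing import Iterable


def backend_cache_key(
    module_hash: str,
    imports: Iterable[str],
    exports: Iterable[str],
    imported_module_hashes: Iterable[str],
    analysis_results: Iterable[str],
) -> str:
    """Combine everything that can invalidate one ThinLTO backend into a key."""
    digest = hashlib.sha256()

    def feed(label: str, items: Iterable[str]) -> None:
        # Label each group and length-prefix each item to keep group and
        # item boundaries distinct in the hashed byte stream.
        digest.update(label.encode("utf-8"))
        for item in sorted(items):  # sorted, so listing order does not matter
            data = item.encode("utf-8")
            digest.update(len(data).to_bytes(8, "little"))
            digest.update(data)

    feed("module", [module_hash])                     # condition 1
    feed("imports", imports)                          # condition 2
    feed("exports", exports)                          # condition 2
    feed("imported-modules", imported_module_hashes)  # condition 3
    feed("analysis", analysis_results)                # condition 4
    return digest.hexdigest()
```

    If the key computed for a module matches an entry already in the cache directory, the cached result can be reused; any change to one of the inputs produces a different key and therefore a cache miss.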

    For distributed builds, the information in items 2-4 above is all serialized into the individual index files. So the build system can compare the contents of the input bitcode files (the primary module’s bitcode and any it imports from) along with the combined index against those from an earlier build to decide if a particular ThinLTO backend must be redone. To make this process more efficient, the content of the bitcode file is hashed when emitted during the compile phase, and the result is stored in the bitcode file itself so that the cache can be queried during the Thin Link step without reading the IR.

    The chart below illustrates the full build time of clang in three different situations:
    1. The full link following a clean build.
    2. The developer fixes the implementation of DenseMap::grow(). This is a widely used header in the project, which forces a rebuild of a large number of files.
    3. The developer fixes the implementation of visitCallInst() in InstCombineCalls.cpp. This is an implementation file, and an incremental build should be fast.

    These results illustrate how full LTO is not friendly to incremental builds, and show how ThinLTO provides an incremental link time very close to that of a non-LTO build.

    by Unknown (noreply@blogger.com) at June 21, 2016 05:24 PM

    June 20, 2016

    LLVM Blog

    LLVM Weekly - #129, Jun 20th 2016

    Welcome to the one hundred and twenty-ninth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    Last week was WWDC, which featured talks on what's new in LLVM (slides) and what's new in Swift (slides). Note that the embedded video player suggests you need Safari or the WWDC app to stream the video, but you can find a downloadable version under the "resources" tab.

    On the mailing lists

    LLVM commits

    • FileCheck learnt the --check-prefixes option as a shorthand for multiple --check-prefix options. r272670.

    • A local_unnamed_addr attribute was introduced. This can be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. r272709.

    • The ScalarReplAggregates pass has been removed as it has been superseded by SROA for a long time. r272737.

    • LLVM's C API gained support for string attributes. r272811.

    • Assembly parsing and lexing have seen some cleanups. r273007.

    Clang commits

    • A new loop distribution pragma was added. Loop distribution is a transformation which attempts to break a loop into multiple loops with each taking part of the loop body. r272656.

    • The nodebug attribute can now be applied to local variables. r272859.

    • The validity check for MIPS CPU/ABI pairings is now performed at initialisation time and a much clearer message is printed. r272645.

    Other project commits

    • A complete implementation of the C++ Filesystem TS has been checked in. r273034.

    • LLD's ARM port gained initial support for Thumb with ARMv7a. r272881.

    by Unknown (noreply@blogger.com) at June 20, 2016 11:23 AM

    June 16, 2016

    LLVM Blog

    Using LNT to Track Performance


    In the past year, LNT has grown a number of new features that make performance tracking and understanding the root causes of performance deltas a lot easier. In this post, I’m showing how we’re using these features.

    LNT contains 2 big pieces of functionality:
    1. A server,
      a. to which you can submit correctness and performance measurement data, by sending it a json-file in the correct format,
      b. that analyzes which performance changes are significant and which ones aren't,
      c. that has a webui to show results and analyses in a number of different ways.
    2. A command line tool to run tests and benchmarks, such as LLVM’s test-suite, SPEC2000 and SPEC2006 benchmarks.
    This post focuses on using the server. None of the features I’ll show are LLVM-specific, or even specific to ahead-of-time code generators, so you should be able to use LNT in the same way for all your code performance tracking needs. At the end, I’ll give pointers to the documentation needed to set up an LNT server and how to construct the json file format with benchmarking and profiling data to be submitted to the server.
    The features highlighted focus on tracking the performance of code, not on other aspects LNT can track and analyze.
    We have 2 main use cases in tracking performance:
    • Post-commit detection of performance regressions and improvements.
    • Pre-commit analysis of the impact of a patch on performance.
    I'll focus on the post-commit detection use case.

    Post-commit performance tracking

    Step 1. Get an overview of the "Daily Report" page

    Assuming your server runs at http://yourlntserver:8000, this page is located at http://yourlntserver:8000/db_default/v4/nts/daily_report
    The page gives a summary of the significant changes it found today.
    An example of the kind of view you can get on that page is the following
    In the above screenshot, you can see that there were performance differences on 3 different programs, bigfib, fasta and ffbench. The improvement on ffbench only shows up on a machine named “machine3”, whereas the performance regression on the other 2 programs shows up on multiple machines.

    The table shows how performance evolved over the past 7 days, one column for each day. The sparkline on the right shows graphically how performance has evolved over those days. When the program was run multiple times to get multiple sample points, these show as separate dots that are vertically aligned (because they happened on the same date). The background color in the sparkline represents a hash of the program binary. If the color is the same on multiple days, the binaries were identical on those days.

    Let’s look first at the ffbench program. The background color in the sparkline is the same for the last 2 days, so the binary for this program didn’t change in those 2 days. Conclusion: the reported performance variation of -8.23% is caused by noise on the machine, not due to a change in code. The vertically spread out dots also indicate that this program has been noisy consistently over the past 7 days.
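
    The reasoning used here (identical binary means the delta cannot come from a code change) is easy to express in code. The sketch below is not part of LNT; the function names and the idea of passing in a binary hash plus a list of timing samples per day are assumptions made for illustration.

```python
from statistics import mean
from typing import List


def percent_change(previous: List[float], current: List[float]) -> float:
    """Relative change of the mean sample value between two days, in percent."""
    base = mean(previous)
    return (mean(current) - base) / base * 100.0


def classify_change(previous_binary_hash: str, current_binary_hash: str) -> str:
    """Identical binaries cannot differ because of code; the delta is noise."""
    if previous_binary_hash == current_binary_hash:
        return "noise"
    return "possible code change"
```

    For example, two days with the same binary hash are classified as "noise" no matter how large percent_change is, whereas a changed hash only says that the difference is worth investigating further.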

    Let’s now look at bigfib. The background color in the sparkline has changed since its previous run, so let’s investigate further. By clicking on one of the machine names in the table, we go to a chart showing the long-term evolution of the performance of this program on that machine.

    Step 2. The long-term performance evolution chart

    This view shows how performance has evolved for this program since we started measuring it. When you click on one of the dots, which each represent a single execution of the program, you get a pop-up with information such as revision, date at which this was run etc.
    When you click on the number after “Run:” in that pop-up, it’ll bring you to the run page.

    Step 3. The Run page

    The run page gives an overview of a full “Run” on a given machine. Exactly what a Run contains depends a bit on how you organize the data, but typically it consists of many programs being run a few times on 1 machine, representing the quality of the code generated by a specific revision of the compiler on one machine, for one optimization level.
    This run page shows a lot of information, including performance changes seen since the previous run:
    When hovering with the mouse over entries, a “Profile” button appears that, when clicked, shows profiles of both the previous run and the current run.

    Step 4. The Profile page

    At the top, the page gives you an overview of differences of recorded performance events between the current and previous run.
    After selecting which function you want to compare, this page shows you the annotated assembly:


    While it’s clear that there are differences between the two disassemblies, it’s often much easier to understand the differences by reconstructing the control flow graph to get a per-basic-block view of differences. By clicking on the “View:” drop-down box and selecting the assembly language you see, you can get a CFG view. I find showing absolute values rather than relative values helps to understand performance differences better, so I also chose “Absolute numbers” in the drop-down box on the far right:
    There is obviously a single hot basic block, and there are differences in instructions in the 2 versions. The number in the red side-bar shows that the number of cycles spent in this basic block has increased from 431M to 716M. In just a few clicks, I managed to drill down to the key codegen change that caused the performance difference!

    We combine the above workflow with the llvmbisect tool available at http://llvm.org/viewvc/llvm-project/zorg/trunk/llvmbisect/ to also quickly find the commit introducing the performance difference. We find that using both the above LNT workflow and the llvmbisect tool is vital to being able to act quickly on performance deltas.

    Pointers on setting up your own LNT server for tracking performance

    Setting up an LNT server is as simple as running the half a dozen commands documented at http://lnt.llvm.org/quickstart.html under "Installation" and "Viewing Results". The "Running tests" section is specific to LLVM tests; the rest is generic to performance tracking of general software.

    The documentation for the JSON file format used to submit results to the LNT server is here: http://lnt.llvm.org/importing_data.html.
    The documentation for how to also add profile information is at http://lnt.llvm.org/profiles.html.


    by Kristof Beyls (noreply@blogger.com) at June 16, 2016 06:19 AM

    June 13, 2016

    LLVM Blog

    LLVM Weekly - #128, June 13th 2016

    Welcome to the one hundred and twenty-eighth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    LDC, a compiler for the D programming language with an LLVM backend, has had a major release with 1.0.0. The big news with this release is that the frontend is now completely written in D. Congratulations to everyone involved in this release. See the D website for more information about the D programming language.

    The minor release LLVM 3.8.1-rc1 has been tagged.

    On the mailing lists

    LLVM commits

    • Some of the work from the GSoC project on interprocedural register allocation has started to land. A RegUsageInfoCollector analysis was added that collects the list of clobbered registers for a MachineFunction. A new transformation pass was committed which scans the body of a function to find calls and updates the register mask with the one saved by RegUsageInfoCollector. r272403, r272414.

    • Chapter 2 of the tutorial on building a JIT with ORC has been fleshed out with a rough draft of the text. r271885.

    • The host CPU detection code for x86 has seen a large refactoring. r271921.

    • More documentation has been added about LLVM's CodeView support. r272057.

    • llvm-symbolizer will now be searched for in the same directory as the LLVM or Clang tool being executed. This increases the chance of being able to print pretty backtraces for systems where LLVM tools aren't installed in the $PATH. r272232.

    Clang commits

    • Clang analyzer gained a checker for correct usage of the MPI API in C and C++. r271907.

    • Documentation was added on avoiding static initializers when using profiling. r272067, r272214.

    Other project commits

    • A hardened allocator, 'scudo', was added to compiler-rt. It attempts to mitigate some common heap-based vulnerabilities. r271968.

    • Initial support for ARM has landed in LLD. This is just enough to link a hello world on ARM Linux. r271993.

    • Initial support for AddressSanitizer on Win64 was added. r271915.

    by Alex Bradbury (noreply@blogger.com) at June 13, 2016 12:08 PM

    June 06, 2016

    LLVM Blog

    LLVM Weekly - #127, June 6th 2016

    Welcome to the one hundred and twenty-seventh issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    Graham Markall at Embecosm has been comparing the code size of RISC-V binaries produced by the GCC and LLVM ports, and comparing both against ARM. GCC is currently ahead, though it is worth noting the LLVM port has seen much less attention.

    Matthias Reisinger is a Google Summer of Code student working on enabling polyhedral optimisations for the Julia programming language. He's written a blog post detailing his initial steps and immediate future plans. Hopefully we'll see more posts over the summer.

    Loïc Hamot has been working on a C++ to D converter, implemented using Clang.

    The MSVC team have blogged about the latest release of Clang with Microsoft CodeGen, based on Clang 3.8.

    There is going to be a clang-tidy code dojo in Warsaw on Tuesday the 7th of June.

    On the mailing lists

    LLVM commits

    • LLVM gained support for 'SJLJ' (setjmp/longjmp) exception handling on x86 targets. r271244.

    • LLVM now requires CMake 3.4.3 to build. r271325.

    • Support was added for attaching metadata to global variables. r271348.

    • The AArch64 backend switched to use SubtargetFeatures rather than testing for specific CPUs. r271555.

    Clang commits

    • The release notes have been updated to explain the current level of OpenMP support (full support of non-offloading features of OpenMP 4.5). r271263.

    • Clang's source-based code coverage has been documented. r271454.

    Other project commits

    • An -fno-exceptions libc++abi library variant was defined, to match the -fno-exceptions libc++ build. r271267.

    • LLDB's compact unwind printing tool gained support for ARMv7's compact unwind format. r271744.

    by Alex Bradbury (noreply@blogger.com) at June 06, 2016 03:02 PM

    May 30, 2016

    LLVM Blog

    LLVM Weekly - #126, May 30th 2016

    Welcome to the one hundred and twenty-sixth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    I've been moving house this weekend, so do accept my apologies if you find this issue to be a little less thorough than usual.

    News and articles from around the web

    Pyston, the LLVM-based Python compiler, has released version 0.5. The main changes are a switch to reference counting and NumPy compatibility.

    I don't want to become "C++ weekly", but I think this audience appreciates a fun use of C++ features. Verdigris is a header-only library that allows you to use Qt5 without the moc preprocessor.

    The call for papers for the 3rd workshop on the LLVM compiler infrastructure in HPC has been published. The deadline for paper submission is September 1st. The workshop will take place on November 14th in Salt Lake City, and is held in conjunction with SC16.

    On the mailing lists

    • Vivek Pandya, a GSoC student working on interprocedural register allocation, has shared a weekly status report.

    • Rafael Espíndola has proposed creating a bitcode symbol table.

    • There have been some updates on the progress of open-sourcing PGI's Fortran frontend.

    • Elena Lepilkina has proposed some enhancements to FileCheck. Some questions were raised about how useful the proposed extensions will be. Sergey Yakoushkin provided more background on how these features are used in a commercial codebase. As Elena notes, these features don't need to all be upstreamed at once (or at all), and are mostly independent.

    • Lang Hames has posted a heads-up about upcoming breaking API changes for ORC and MCJIT.

    • Sean Silva has kicked off a discussion on the state of IRPGO. You might ask, what is IRPGO? This is profile-guided optimisation performed through instrumentation at the LLVM IR level, as opposed to FEPGO where instrumentation is added by the frontend (e.g. Clang), prior to lowering to IR. Sean would like to make IRPGO the default on all platforms other than Apple at the moment (who may require a longer deprecation period). A number of followup comments discuss possibilities for ensuring all platforms can move forward together, and ensuring a sensible flag exists to choose between frontend or middle-end PGO.

    • What exactly is a register pressure set? Both Quentin Colombet and Andrew Trick have answers for us.

    LLVM commits

    • New optimisations covering checked arithmetic were added. r271152, r271153.

    • Advanced unrolling analysis is now enabled by default. r270478.

    • The initial version of a new chapter to the 'Kaleidoscope' tutorial has been committed. This describes how to build a JIT using ORC. r270487, r271054.

    • LLVM's stack colouring data flow analysis has been rewritten in order to increase the number of stack variables that can be overlapped. r270559.

    • Parts of EfficiencySanitizer are starting to land, notably instrumentation for its working set tool. r270640.

    • SelectionDAG learned how to expand multiplication for larger integer types where there isn't a standard runtime call to handle it. r270720.

    • LLVM will now report more accurate loop locations in optimisation remarks by reading the starting location from llvm.loop metadata. r270771.

    • Symbolic expressions are now supported in assembly directives, matching the behaviour of the GNU assembler. r271102.

    • Symbols used by plugins can now be auto-exported on Windows, which improves support for plugins in Windows. See the commit message for a full description. r270839.

    Clang commits

    • Software floating point for Sparc has been exposed in Clang through -msoft-float. r270538.

    • Clang now supports the -finline-functions argument to enable inlining separately from the standard -O flags. r270609.

    Other project commits

    • SectionPiece in LLD is now 8 bytes smaller on 64-bit platforms. This improves the time to link Clang with debug info by 2%. r270717.

    • LLD has replaced a use of binary search with a hash table lookup, resulting in a 4% speedup when linking Clang with debug info. r270999.

    • LLDB now supports AArch64 compact unwind tables, as used on iOS, tvOS and watchOS. r270658.

    by Alex Bradbury (noreply@blogger.com) at May 30, 2016 07:07 PM

    May 23, 2016

    LLVM Blog

    LLVM Weekly - #125, May 23rd 2016

    Welcome to the one hundred and twenty-fifth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    Stephen Kelly has written a blog post about using Clang through the cindex API to automatically generate Python bindings. He also makes use of SIP.

    Krister Walfridsson has written a wonderfully clear post on C's type-based aliasing rules.

    This week I discovered the Swift Weekly Brief newsletter. Its author, Jesse Squires, does a wonderful job of summarising mailing list traffic, recent commits, and discussions on swift-evolution proposals. If you have an interest in Swift development or language design in general I highly recommend it.

    Are you interested in writing for the LLVM blog? Or volunteering to help recruit content authors? If so, get in touch with Tanya.

    The next Cambridge LLVM Social will be held at 7.30pm on May 25th at the Cambridge Blue.

    On the mailing lists

    LLVM commits

    • llc will now report all errors in the input file rather than just exiting after the first. r269655.

    • The SPARC backend gained support for soft floating point. r269892.

    • Reloc::Default no longer exists. Instead, Optional<Reloc> is used. r269988.

    • An initial implementation of a "guard widening" pass has been committed. This will combine multiple guards to reduce the number of checks at runtime. r269997.
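
    To make the guard widening item above a little more concrete, here is a conceptual sketch in Python rather than LLVM IR. The guard helper, the GuardFailed exception, and the counter are all hypothetical stand-ins, and the sketch assumes it is acceptable for the widened guard to fail earlier than the second original guard would have.

```python
class GuardFailed(Exception):
    """Raised when a guard condition does not hold (stands in for deoptimisation)."""


guard_stats = {"executed": 0}  # how many guard checks ran


def guard(condition):
    guard_stats["executed"] += 1
    if not condition:
        raise GuardFailed()


def sum_pair_original(values, i):
    # Two separate guards, one before each access.
    guard(0 <= i < len(values))
    first = values[i]
    guard(0 <= i + 1 < len(values))
    second = values[i + 1]
    return first + second


def sum_pair_widened(values, i):
    # One guard whose condition implies both of the original conditions.
    guard(0 <= i and i + 1 < len(values))
    return values[i] + values[i + 1]


data = [10, 20, 30]

guard_stats["executed"] = 0
result_original = sum_pair_original(data, 1)
checks_original = guard_stats["executed"]

guard_stats["executed"] = 0
result_widened = sum_pair_widened(data, 1)
checks_widened = guard_stats["executed"]

print(result_original, checks_original)
print(result_widened, checks_widened)
```

    Both versions compute the same sum, but the widened version executes one check instead of two.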

    Clang commits

    • clang-include-fixer gained a basic Vim integration. r269927.

    • The intrinsics headers now have feature guards enabled in Microsoft mode to combat the compile-time regression discussed last week due to their increased size. r269675.

    • avxintrin.h gained many new Doxygen comments. r269718.

    Other project commits

    • lld now lets you specify a subset of passes to run in LTO. r269605.

    • LLDB has replaced uses of its own Mutex class with std::mutex. r269877, r270024.

    by Alex Bradbury (noreply@blogger.com) at May 23, 2016 11:41 AM

    May 16, 2016

    LLVM Blog

    LLVM Weekly - #124, May 16th 2016

    Welcome to the one hundred and twenty-fourth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    The main news this week is the announcement of Scala-native, an ahead-of-time compiler for Scala using LLVM. Jos Dirkens has written a getting started guide if you want to compile it and try it out. There's also more information in the slides from the announcement talk.

    On the mailing lists

    LLVM commits

    • The outdated guide on cross-compiling LLVM has been brought up to date. r269054.

    • The WebAssembly backend gained preliminary fast instruction selection (fast-isel) support. r269083, r269203, r269273.

    • Loop unrolling (other than in the case of explicit pragmas) is now disabled at -Os in LLVM. You may recall last week it was enabled for -Os in Clang, but with different thresholds. r269124.

    • A new cost-tracking system has been implemented for the loop unroller. r269388.

    • LLVM's Sparc backend has seen the addition of more LEON-specific features, e.g. signed and unsigned multiply-accumulate. r268908.

    • llc's -run-pass option will now work with any pass known to the pass registry. Previously it would silently do nothing if you specify indirectly added analysis passes or passes not present in the optimisation pipeline. r269003.

    • WebAssembly register stackification and coloring are now run very late in the optimisation pipeline. The commit message suggests it's useful to think of these passes as domain-specific liveness-based compression rather than a conventional optimisation. r269012.

    • When declaring globals in textual LLVM IR, you must now assign them with e.g. @0 = global i32 42. r269096.

    • The internal assembler is now enabled by default for 32-bit MIPS targets. r269560.

    Clang commits

    • Clang now supports __float128. r268898.

    • Clang gained a new warning that triggers when casting away calling conventions from a function. r269116.

    • The recently developed include-fixer tool now has documentation. r269167.

    Other project commits

    • compiler-rt's CMake build system can now build builtins without a full toolchain, allowing you to bootstrap a cross-compiler. r268977.

    • LLD will now sort relocations to optimise dynamic linker performance. r269066.

    by Alex Bradbury (noreply@blogger.com) at May 16, 2016 11:26 AM

    May 09, 2016

    LLVM Blog

    LLVM Weekly - #123, May 9th 2016

    Welcome to the one hundred and twenty-third issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    If you're in London tomorrow you may be interested in the NMI Open Source Conference. You can register until midday today. I'll be giving a brief talk on lowRISC. While on the subject of conferences, if you are interested in diversity and inclusion in computing education, you may want to check out the CAS #include diversity conference in Manchester on the 11th June.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    Fabian Giesen has written a brief article explaining why compilers exploit undefined signed overflow.

    The Google Open Source blog has a short piece on the XRay function call tracing system that was proposed for upstreaming last week on the LLVM mailing list.

    On the mailing lists

    LLVM commits

    • LLVM's CppBackend has been removed. As the commit message says, this backend has bit-rotted to the extent that it's not useful for its original purpose and doesn't generate code that compiles. r268631.

    • The AVR backend has seen a large amount of code merged in to LLVM. r268722.

    • The MIPS backend has seen some large changes to how relocations are handled. These are now represented using MipsMCExpr instead of MCSymbolRefExpr. As someone who has done quite a lot of (out-of-tree) LLVM backend work, I've always found it odd how some architectures have globally visible enum members in include/llvm/MC/MCExpr.h. r268379.

    • LLVM builds should hopefully now be deterministic by default, as LLVM_ENABLE_TIMESTAMPS is now opt-in rather than opt-out. In fact, a follow-up patch removed the option altogether. r268441, r268670.

    • The AArch64 backend learned to combine adjustments to the stack pointer for callee-save stack memory and local stack memory. r268746.

    Clang commits

    • Clang now supports -malign-double for x86. This matches the default behaviour on x86-64, where i64 and f64 types are aligned to 8 bytes instead of 4. r268473.

    • Loop unrolling is no longer completely disabled for -Os. r268509.

    • Clang's release notes (reflecting the state of current trunk) have been updated to say more about the state of C++1z support. r268663.

    Other project commits

    • libcxx will now build a libc++experimental.a static library to hold symbols from the experimental C++ Technical Specifications (e.g. filesystem). This library provides no ABI compatibility. r268443, r268456.

    • All usage of pthreads in libcxx has been refactored in to the __threading_support header, with the intention of making it easier to retarget libcxx to platforms that don't support pthreads. r268374.

    • libcxx gained support for the polymorphic memory resources C++ TS. r268829.

    by Alex Bradbury (noreply@blogger.com) at May 09, 2016 08:28 AM

    May 02, 2016

    LLVM Blog

    LLVM Weekly - #122, May 2nd 2016

    Welcome to the one hundred and twenty-second issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    GCC 6.1 has been released. Perhaps the most apparent user-visible change is that the C++ frontend now defaults to C++14.

    The Rust compiler has introduced a new intermediate representation, MIR, used for optimisations prior to lowering to LLVM IR.

    Tanya Lattner has written about the LLVM Foundation's plans for 2016. The LLVM Foundation has established 3 main programs: Educational Outreach, Grants and Scholarships, and Women in Compilers and Tools.

    On the mailing lists

    LLVM commits

    • LLVM now supports indirect call promotion based on value-profile information. This will promote indirect calls to a direct call guarded by a precondition. r267815.

    • The LLVM documentation has been extended with a CMake primer covering the basics of the CMake scripting language. r268096.

    • The PDB dumper has been refactored into a library. r267431.

    • The MinLatency attribute has been removed from SchedMachineModel. r267502.

    • CodeGenPrepare will now use branch weight metadata to decide if a select should be turned into a branch. r267572.

    • Support for llvm.loop.distribute.enable metadata was added. This indicates a loop should be split in to multiple loops. r267672.

    • The SystemZ backend now supports the Swift calling convention. r267823.

    • libFuzzer's documentation has been expanded and improved. r267892.
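
    The indirect call promotion item at the top of this list (r267815) describes turning an indirect call into a direct call guarded by a precondition. The shape of that transformation can be sketched as follows. Python has no real distinction between direct and indirect calls, so this only mirrors the structure of the transformed code; the functions and the profile counts are hypothetical.

```python
from collections import Counter


def square(x):
    return x * x


def negate(x):
    return -x


# Hypothetical value profile: how often each callee was the call target.
profile = Counter({square: 980, negate: 20})
hot_target, hot_count = profile.most_common(1)[0]


def call_indirect(callee, x):
    # Before promotion: every call goes through the callee value.
    return callee(x)


def call_promoted(callee, x):
    # After promotion: test for the hot target, then call it by name.
    if callee is square:
        return square(x)  # direct call, now visible to later optimisations
    return callee(x)  # fallback indirect call for every other target


print(hot_target.__name__, hot_count)
print(call_promoted(square, 3), call_promoted(negate, 3))
```

    The precondition keeps the behaviour identical for every callee, while the common case no longer depends on a value that is only known at run time.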

    Clang commits

    • clang-tidy gained a new checker for redundant expressions on both sides of a binary operator. r267574.

    • A new clang-tidy check will warn for use of functions like atoi and atol that don't report conversion errors. r268100.

    • The nodebug attribute on a global or static variable will now suppress all debug info for that variable. r267746.

    • A number of OpenMP features gained codegen support, such as the map clause and target data directive. r267808, r267811.

    Other project commits

    • LLD now supports an -O0 option to produce output as quickly as possible. Currently this disables section merging at the cost of a potentially much larger output. r268056.

    • The symbol table in LLD's ELF linker has been redesigned with the intent of improving memory locality. The new design produces measurable speedups for the binaries tested in the commit message. r268178.

    • LLD's linkerscript support expanded to encompass comparison operators. r267832.

    • LLD performance on large executables has been improved by skipping scanRelocs on sections that are never mapped to memory at runtime (e.g. debug sections). r267917.

    by Alex Bradbury (noreply@blogger.com) at May 02, 2016 03:31 PM

    April 27, 2016

    LLVM Blog

    LLVM Foundation 2016 Announcements

    With 2016 upon us, the LLVM Foundation would like to announce our plans for the year. If you are not familiar with the LLVM Foundation, we are a 501(c)(3) nonprofit that supports the LLVM Project and its community. We are best known for our LLVM Developers’ Meetings, but we are introducing several new programs this year. 

    The LLVM Foundation originally grew out of the need to have a legal entity to plan and support the annual LLVM Developers’ Meeting and LLVM infrastructure. However, as the Foundation was created we saw a need for help in other areas related to the LLVM project, compilers, and tools. The LLVM Foundation has established 3 main programs: Educational Outreach, Grants & Scholarships, and Women in Compilers & Tools.

    Educational Outreach 

    The LLVM Foundation plans to expand its educational materials and events related to the LLVM Project and compiler technology and tools. 

    First, the LLVM Foundation is excited to announce the 2016 Bay Area LLVM Developers’ Meeting will be held November 3-4 in San Jose, CA. This year will be the 10th anniversary of the developer meeting which brings together developers of LLVM, Clang, and related projects. For this year’s meeting, we are increasing our registration cap to 400 in order to allow more community members to attend.

    We are also investigating how we can support or be involved in other conferences in the field of compilers and tools. This may include sponsoring presenters for LLVM workshops or tutorials, or providing instructional materials. We plan to work with other conference organizers to determine how the LLVM Foundation can be helpful and develop a plan going forward.

    However, we want to do more for the community and have brainstormed some ideas for the coming year. We plan to create some instructional videos for those just beginning with LLVM. These will be short 5-10 minute videos that introduce developers to the project and get them started. Documentation is always important, but we find that many are turning to videos as a way to learn. 


    Grants & Scholarships

    We are creating a grants and scholarships program to cover student presenter travel expenses to the LLVM Developers’ Meetings. However, we also hope to expand this program to include student presenter travel to other conferences where the student is presenting their LLVM related work. Details on this program will be published once they have been finalized. 

    Women in Compilers & Tools

    Grace Hopper invented the first compiler and yet women are severely underrepresented in the field of compilers and tools. At the 2015 Bay Area LLVM Developers’ Meeting, we held a BoF on this topic and brainstormed ideas about what can be done. One idea was to increase LLVM awareness at technical conferences that have strong female participation. One such conference is the Grace Hopper Conference (GHC). The LLVM Foundation has submitted a proposal to present about LLVM and how to get involved with the LLVM open source community. We hope our submission is accepted, but if not, we are exploring other ways we can increase our visibility at GHC. Many of the other ideas from this BoF are being considered and actionable plans are in progress.

    In addition to these 3 programs, we will continue to support the LLVM Project’s infrastructure. The llvm.org server will move to a new machine to increase performance and reliability.


    We hope that you are excited about the work the LLVM Foundation will be doing in 2016. Our 2016 Plans & Budget may be viewed here. You may also contact our COO & President, Tanya Lattner (tanyalattner@llvm.org) or the LLVM Foundation Board of Directors (board@llvm.org).

    by Tanya Lattner (noreply@blogger.com) at April 27, 2016 03:48 PM

    April 25, 2016

    LLVM Blog

    LLVM Weekly - #121, Apr 25th 2016

    Welcome to the one hundred and twenty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    Congratulations to the eight students who have been selected for LLVM projects on Google Summer of Code this year. There's about a month before they start coding. The time between now and then is the 'community bonding period', so please do make them feel welcome.

    The preliminary release schedule for LLVM/Clang 3.8.1 has been published. This would have a deadline of May 25th for requesting changes to be merged and would see the final release on June 15th.

    On the mailing lists

    LLVM commits

    • An implementation of optimisation bisection support has landed. This helps to track down bugs by allowing optimisations to be selectively disabled at compile-time to identify the one introducing a miscompile. r267022.

    • The AArch64 and ARM thread pointer intrinsics have been merged to make a target-independent llvm.thread.pointer intrinsic. r266818.

    • The llvm.load.relative intrinsic has been added. r267233.

    • There have been more changes to DebugInfo which will require a bitcode upgrade. A script to perform this upgrade is linked in the commit message. r27296.

    • The ORC JIT API improved its support for RPC, including support for calling functions with return values. r266581.

    • The patchable-function function attribute has been introduced, indicating that the function should be easily patchable at runtime. r266715.

    • The IntrReadArgMem intrinsic property has been split in to IntrReadMem and IntrArgMemOnly. r267021.

    • The MachineCombiner gained the ability to combine AArch64 fmul and fadd in to an fmadd. r267328.

    • Scheduling itineraries were added for Sparc, specifically for the LEON processors. r267121.

    Clang commits

    • A prototype of an include fixing tool was created. The indexer remains to be written. r266870.

    • A new warning has been added, which will trigger if the compiler tries to make an implicit instantiation of a template but cannot find the template definition. r266719.

    • Initial driver flags for EfficiencySanitizer were added. r267059.

    Other project commits

    • The initial EfficiencySanitizer base runtime library was added to compiler-rt. It doesn't do much of anything yet. r267060.

    • LLD learned to support the linkerscript ALIGN command. r267145.

    • LLDB can now parse EABI attributes for an ELF input. r267291.

    by Alex Bradbury (noreply@blogger.com) at April 25, 2016 11:20 AM

    April 18, 2016

    LLVM Blog

    LLVM Weekly - #120, Apr 18th 2016

    Welcome to the one hundred and twentieth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    This week has seen not one, but two articles about LLVM and profile-guided optimisation. Dig in to Johan Engelen's article about optimising D's virtual function calls with PGO, then read Geoffroy Couprie's article about PGO with Rust.

    The next Cambridge (UK) social will be at 7.30pm on April 20th, at the Cambridge Blue.

    Alex Denisov has written a blog post around the idea of building a mutation testing system using LLVM.

    On the mailing lists

    LLVM commits

    • AtomicExpandPass learned to lower various atomic operations to __atomic_* library calls. The eventual aim is to move all atomic lowering from Clang to LLVM. r266115.

    • Targets can now define an inlining threshold multiplier, to e.g. increase the likelihood of inlining on platforms where calls are very expensive. r266405.

    • The ownership between DICompileUnit and DISubprogram has been reversed. This may break tests for your out-of-tree backend, but the commit has a link to a Python script to update your testcases. r266446.

    • llvm-readobj learned to print a histogram of an input ELF file's .gnu.hash section. r265967.

    • More target-specific support for the Swift calling convention (on ARM, AArch64, and X86) has landed. Also, a callee save register is used for the swiftself parameter. r265997, r266251.

    • A new allocsize attribute has been introduced. This indicates the given function is an allocation function. r266032.

    • analyzeSiblingValues has been replaced with a new lower-complexity implementation in order to reduce compile times. r266162.

    • The AMDGPU backend gained a skeleton GlobalISel implementation. r266356.

    • Every use of getGlobalContext other than the C API has been removed. r266379.

    Clang commits

    • Clang gained support for the GCC ifunc attribute. r265917.

    • The __unaligned type qualifier was implemented for MSVC compatibility. r266415.

    • Support for C++ core guideline Type.6: always initialize a member variable was completed in clang-tidy. r266191.

    • A new clang-tidy checker for suspicious sizeof expressions was added. r266451.

    Other project commits

    • The way relocations are applied in the new ELF linker has been reworked. r266158.

    • ELF LLD now supports parallel codegen for LTO using splitCodeGen. r266484.

    • Support for Linux on SystemZ in LLDB landed. r266308.

    by Alex Bradbury (noreply@blogger.com) at April 18, 2016 01:05 PM

    April 11, 2016

    LLVM Blog

    LLVM Weekly - #119, Apr 11th 2016

    Welcome to the one hundred and nineteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    Last week the slides from the recent EuroLLVM 2016 Developers' Meeting made it online. This week this has been followed by videos of the talks from the conference.

    John Regehr has written about efficient integer overflow checking in LLVM, looking at cases where LLVM can and cannot remove unnecessary overflow checks, and how this might be improved.
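
    To make the idea of a removable overflow check concrete, here is a minimal sketch (not taken from the article). It assumes a GCC- or Clang-compatible compiler, since __builtin_add_overflow is a compiler builtin rather than standard C++, and C++17 for std::optional. The function names checked_add and add_low_halves are hypothetical.

```cpp
#include <cstdint>
#include <optional>

// Checked addition: returns std::nullopt if a + b does not fit in int32_t.
// __builtin_add_overflow is a GCC/Clang builtin, not standard C++.
std::optional<std::int32_t> checked_add(std::int32_t a, std::int32_t b) {
    std::int32_t result;
    if (__builtin_add_overflow(a, b, &result))
        return std::nullopt;
    return result;
}

// Both operands are masked to 16 bits, so the sum is at most 2 * 65535 and
// cannot overflow. An optimiser that tracks value ranges could in principle
// drop the overflow branch here; whether it does is up to the compiler.
std::optional<std::int32_t> add_low_halves(std::int32_t a, std::int32_t b) {
    return checked_add(a & 0xFFFF, b & 0xFFFF);
}
```

    Comparing the generated code for the two functions (for example at -O2) is one way to see whether the check in the second one was eliminated.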

    Version 0.13 of Pocl, the portable OpenCL implementation, has been released. This release works with LLVM/Clang 3.8 and 3.7, and adds initial OpenCL 2.0 support and improved HSA support.

    Serge Guelton at QuarksLab has written up a really useful guide to implementing a custom directive handler in Clang.

    Microsoft's Visual C++ team are looking for feedback on Clang/C2 (Clang with Microsoft CodeGen).

    On the mailing lists

    • James Molloy has posted an RFC on adding support for constant folding calls to math.h functions on long doubles. Currently these functions aren't constant-folded as the internal APFloat class doesn't implement them and long double operations aren't portable. Solutions include adding support to APFloat, linking against libMPFR to provide compile-time evaluation, or recognising when the long double format of the host and target are the same, so the host math library can be called. From the responses so far, there seems to be some push-back on adding the libMPFR dependency.

    • Sanjoy Das has an RFC on adding a patchable-prologue attribute. This would be used to indicate that the function's prologue is compiled so as to provide support for easy hot-patching.

    • Ulrich Weigand has shared a patch for supporting LLDB on Linux on SystemZ. The patchset contains many big-endian fixes, and may be of interest to others looking at porting LLDB.

    LLVM commits

    • The Swift calling convention as well as support for the 'swifterror' argument has been added. r265433, r265480.

    • Work on GlobalISel continues with many commits related to the assignment of virtual registers to register banks. r265445, r265440.

    • LLVM will no longer perform inter-procedural optimisation over functions that can be "de-refined". r265762.

    • The substitutions supported by lit are now documented. r265314.

    • Unrolled loops now execute the remainder in an epilogue rather than the prologue. This should produce slightly improved code. r265388.
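
    The prologue-versus-epilogue distinction in the unrolling change above can be pictured at source level. The following is only a hand-written illustration of the resulting loop shape, assuming an unroll factor of four; LLVM performs the transformation on IR, and the function name sum_unrolled_epilogue is hypothetical.

```cpp
#include <cstddef>

// Sum n ints with the main loop manually unrolled by four.
long sum_unrolled_epilogue(const int *data, std::size_t n) {
    long total = 0;
    std::size_t i = 0;
    // Main unrolled body: covers the largest multiple of 4 not exceeding n.
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    // Epilogue: the remaining n % 4 iterations run after the main loop,
    // rather than before it as they would with a prologue.
    for (; i < n; ++i)
        total += data[i];
    return total;
}
```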

    Clang commits

    • Clang gained necessary support for the Swift calling convention. r265324.

    • New flags -fno-jump-tables and -fjump-tables can be used to disable/enable support for jump tables when lowering switch statements. r265425.

    • TargetOptions is now passed through all the TargetInfo constructors. This will allow target information to be modified based on the ABI selected. r265640.

    • A large number of intrinsics from emmintrin.h now have Doxygen docs. r265844.

    Other project commits

    • clang-tidy gained a new check to flag initializers of globals that access extern objects, leading to potential order-of-initialization issues. r265774.

    • LLD's ELF linker gained new options --start-lib, --end-lib, --no-gnu-unique, --strip-debug. r265710, r265717, r265722.

    by Alex Bradbury (noreply@blogger.com) at April 11, 2016 01:03 PM

    April 04, 2016

    LLVM Blog

    LLVM Weekly - #118, Apr 4th 2016

    Welcome to the one hundred and eighteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    Almost all slides from the recent EuroLLVM conference are now available online for your enjoyment.

    Some readers may be interested in a new paper about the 'LifeJacket' tool for verifying precise floating-point optimisations in LLVM.

    Christian Neumüller has written a new tool for syntax highlighting and cross-referencing C and C++ source using libclang.

    On the mailing lists

    LLVM commits

    • The Lanai backend has landed. r264578.

    • A new llvm.experimental.guard intrinsic has been added. As described in the accompanying documentation, along with deoptimization operand bundles this allows frontends to express guards or checks on optimistic assumptions made during compilation. r264976.

    • Support for a number of new Altivec instructions has been added. Amazingly, this includes BCD (Binary Coded Decimal) instructions. r264568.

    • The concept of MachineFunctionProperties has been introduced, with the first property being AllVRegsAllocated. This allows passes to declare that they require a particular property, in this case requiring that they be run after regalloc. r264593.

    • On X86, push will now be used in preference to mov at all optimisation levels (before it was only enabled for -Os). r264966.

    • LLVM's support library can now compute SHA1 hashes. This is used to implement a 'build-id'. r265094, r265095.

    • When metadata is only referenced in a single function, it will now be emitted just in that function block. The aim of this is to improve the potential of lazy-loading. r265226.

    Clang commits

    • The Lanai backend is now supported in the Clang driver. r264655.

    • libTooling gained a handy formatAndApplyAllReplacements function. r264745.

    Other project commits

    • Parts of LLD are starting to use the new Error handling. r264910, r264921, r264924, and more.

    • Infrastructure was added to LLD for generating thunks (as required on platforms like MIPS when calling PIC code from non-PIC). r265059.

    by Alex Bradbury (noreply@blogger.com) at April 04, 2016 11:22 AM

    April 01, 2016

    LLVM Blog

    My Little LLVM: Undefined Behavior is Magic!

    A horrible mashup between LLVM's old dragon logo and a My Little Pony inspired pegasus pony
    New LLVM logo

    There’s been lots of discussion online (and then quite some more) about compilers abusing undefined behavior. As a response the LLVM compiler infrastructure is rebranding and adopting a motto to make undefined behavior friendlier and less prone to corruption.


    The re-branding puts to rest a long-standing issue with LLVM’s “dragon” logo actually being a wyvern with an upside-down head, a special form of undefined behavior in its own right. The logo is now clearly a pegasus pony.


    Another great side-effect of this rebranding is increased security by auto-magically closing all vulnerabilities used by the hacker who goes by the pseudonym “Pinkie Pie”.


    These new features are enabled with the -rainbow clang option, in honor of Rainbow Dash’s unary name.


    A Few Examples


    C++’s memory model specifies that data races are undefined behavior. It is well established that no sane compiler would optimize atomics; LLVM will therefore supplement the Standard’s happens-before relationship with an LLVM-specific happens-to-work relationship. On most architectures this will be implemented with micro-pause primitives such as x86’s rep rep rep nop instruction.


    Shifts by bit-width or larger will now return a normally-distributed random number. This also obsoletes rand() and std::random_shuffle.


    bool now obeys the rules of truthiness to avoid that annoying “but what if it’s not zero or one?” interview question. Further, incrementing a bool with ++ now does the right thing.


    Atomic integer arithmetic is already specified to be two’s complement. Regular arithmetic will therefore now also be atomic. Except when volatile, but not when volatile atomic.


    NaNs will now compare equal, subnormals are free to self-classify as normal / zero / other, negative zero simply won’t be a thing, IEEE-754 has been upgraded to PONY-754, floats will still round with style, and generating a signaling NaN is now guaranteed to not be quiet by being equivalent to putchar('\a'). While we’re at it none of math.h will set errno anymore. This has nothing to do with undefined behavior but seriously, errno?


    Type-punning isn’t a thing anymore. We’re renaming it to type-pony-ing, but it doesn’t do anything surprising besides throw parties. AND WHO DOESN’T LIKE PARTIES‽ EVEN SECURITY PEOPLE DO! 🎉


    A Word From Our Sponsors



    The sanitizers—especially undefined behavior sanitizer, address sanitizer and thread sanitizer—are great tools when dealing with undefined behavior. Use them on your tests, combine them with fuzzers, try them as cupcake topping! Be warned: their runtimes aren’t designed to be secure and you shouldn’t ship them in production code!


    Cutie Marks


    To address the horse in the room: we’ve left the new LLVM logo’s cutie mark as implementation-defined. Different instances of the logo can use their own cutie mark to illustrate their proclivities, but must clearly document them.



    by Unknown (noreply@blogger.com) at April 01, 2016 07:02 AM

    March 29, 2016

    OpenMP Runtime Project

    New code release

    We are excited to announce the next release of the Intel® OpenMP* Runtime Library at openmprtl.org. This release aligns with Intel® Parallel Studio XE 2016 Composer Edition Update 3.

    New Features

    • OpenMP* 4.5 schedule(simd:static) support

    Bug Fixes

    • Hwloc topology discovery improved
    • Spin backoff mechanism fixed in lock code
    • Plain barrier performance improved on Intel(R) Xeon Phi

    Contributions

    by mad\tlwilmar at March 29, 2016 06:50 PM

    March 28, 2016

    LLVM Blog

    LLVM Weekly - #117, Mar 28th 2016

    Welcome to the one hundred and seventeenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    Google Summer of Code applications are now closed. Applicants and interested third-parties can look forward to finding out which projects were selected on April 22nd.

    Ramkumar Ramachandra has written a blog post giving a whirlwind tour of the internals of LLVM's fast register allocator (FastRegAlloc.cpp).

    Alex Denisov has blogged about the various test suites used within the LLVM project.

    Version 1.13 of the TTA-based Co-design Environment (TCE) has been released. This adds support for LLVM 3.8.

    On the mailing lists

    LLVM commits

    • A new utility, update_test_checks.py was added to update opt or llc test cases with new FileCheck patterns. r264357.

    • Non-power-of-2 loop unroll count pragmas are now supported. r264407.

    • The NVPTX backend gained a new address space inference pass. r263916.

    • Instances of Error are now convertible to std::error_code. Conversions are also available between Expected<T> and ErrorOr<T>. r264221, r264238.

    • Hexagon gained support for run-time stack overflow checking. r264328.

    Clang commits

    • Clang now supports lambda capture of *this by value. r263921.

    • The bitreverse builtins are now documented. r264203.
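
    As a small illustration of the lambda capture of *this by value mentioned in the Clang commits above, the sketch below contrasts it with the older pointer capture. It assumes a compiler in C++17 mode; the Counter type and its member names are hypothetical.

```cpp
#include <functional>

struct Counter {
    int value = 0;

    // [*this] copies the whole object into the closure, so the closure
    // keeps the value the object had when the lambda was created.
    std::function<int()> snapshot() const {
        return [*this] { return value; };
    }

    // [this] captures only the pointer, so the closure sees later changes
    // and must not outlive the object.
    std::function<int()> live() const {
        return [this] { return value; };
    }
};
```

    Capturing by value is mainly useful when the closure may run after the original object has gone away, for instance when it is handed to another thread.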

    Other project commits

    • LLDB will fix inputted expressions with 'trivial' mistakes automatically. r264379.

    • ThreadSanitizer debugging support was added to LLDB. r264162.

    • Polly gained documentation to describe how it fits into the LLVM pass pipeline. r264446.

    • LLDB has been updated to handle the UTF-16 APIs on Windows. r264074.

    by Alex Bradbury (noreply@blogger.com) at March 28, 2016 01:22 PM

    March 21, 2016

    LLVM Blog

    LLVM Weekly - #116, Mar 21st 2016

    Welcome to the one hundred and sixteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    If you're a student and would like to get paid to work on an LLVM-related project over the summer then do consider applying for Google Summer of Code with LLVM. More details about Summer of Code are available here. The deadline for applications is this Friday, March 25th at 1900 GMT. I'd also encourage you to look at lowRISC's project ideas if you have an interest in open source hardware.

    Stephen Kelly has written about his new Clang-based tool for porting a C++ codebase to use almost-always-auto. As was pointed out on Twitter, Ryan Stortz from Trail of Bits has a tool that removes auto and does roughly the opposite.

    Honza Hubička has written up his experiments of building LibreOffice with GCC6 and LTO. This includes a comparison to a build using LLVM and Clang.

    Nick Clifton has shared an update for February and March on the GNU toolchain that may be of interest.

    The developer of the Capstone disassembly framework and the Unicorn multi-architecture simulator is running a funding campaign for the Keystone multi-architecture assembler framework. Like Capstone, this will build on LLVM but also aims to go beyond it.

    On the mailing lists

    LLVM commits

    • A new Error support class has been added to support structured error handling. See the associated updates to the LLVM programmer's manual for more info. r263609.

    • New documentation was committed for advanced CMake build configurations. r263834.

    • Support was added for MIPS32R6 compact branches. r263444.

    • The MemCpyOptimizer will now attempt to reorder instructions in order to create an optimisable sequence. r263503.

    • llvm-readobj learnt to print sections and relocations in the GNU style. r263561.

    Clang commits

    • Attributes have been added for the preserve_mostcc and preserve_allcc calling conventions. r263647.

    • clang-format will handle some cases of automatic semicolon insertion in JavaScript. r263470.

    • Clang learned to convert some Objective-C message sends to runtime calls. r263607.

    Other project commits

    • AddressSanitizer is now supported on mips/mips64 Android. r263261.

    • The documentation on the LLD linker has added a few numbers to give an idea of the sort of inputs it needs to handle. For example, Chrome with debug info contains roughly 13M relocations, 6.3M symbols, 1.8M sections and 17k files. r263466.

    by Alex Bradbury (noreply@blogger.com) at March 21, 2016 12:06 PM

    March 14, 2016

    LLVM Blog

    LLVM Weekly - #115, Mar 14th 2016

    Welcome to the one hundred and fifteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    We have an LLVM-related research position currently being advertised here at the University of Cambridge Computer Lab. If you'd like an informal chat about what it's like working in this group or on this project please don't hesitate to get in touch with me.

    News and articles from around the web

    LLVM and Clang 3.8 have now been released. Check out the LLVM and Clang release notes for a run-down of the new features.

    It's GDC this week, and if you're attending you may be interested to know that there's an LLVM meetup scheduled for Thursday.

    Felix Angell has a detailed blog post introducing generating LLVM IR from Go.

    On the mailing lists

    LLVM commits

    • Loop invariant code motion learnt the ability to exploit the fact that a memory location is known to be thread-local. r263072.

    • A new llvm.experimental.deoptimize intrinsic has been added. r26328.

    • A ThinLTOCodeGenerator was added in order to provide a proof-of-concept implementation. r262977.

    • The Sparc backend gained support for co-processor condition branching and conditional traps. r263044.

    Clang commits

    • Clang gained support for the [[nodiscard]] attribute. r262872.

    • New AST matchers were added for addrLabelExpr, atomicExpr, binaryConditionalOperator, designatedInitExpr, designatorCountIs, hasSyntacticForm, implicitValueInitExpr, labelDecl, opaqueValueExpr, parenListExpr, predefinedExpr, requiresZeroInitialization, and stmtExpr. r263027.
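
    For readers who have not met the [[nodiscard]] attribute mentioned in the Clang commits above, here is a minimal sketch of its intended use. It assumes a compiler in C++17 mode; the function parse_port is hypothetical and exists only for illustration.

```cpp
#include <string>

// Parses a decimal TCP port number. The attribute asks the compiler to
// warn when a caller ignores the returned success flag.
[[nodiscard]] bool parse_port(const std::string &text, int &port) {
    if (text.empty() || text.size() > 5)
        return false;
    int value = 0;
    for (char ch : text) {
        if (ch < '0' || ch > '9')
            return false;
        value = value * 10 + (ch - '0');
    }
    if (value > 65535)
        return false;
    port = value;
    return true;
}
```

    Writing parse_port(text, port); as a statement on its own should then draw a warning, while an explicit cast of the result to void signals that discarding it is deliberate.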

    Other project commits

    • Error and warning messages in LLD are now more consistent. r263125.

    • Documentation on the new ELF and COFF LLD linkers has been updated. r263336.

    by Alex Bradbury (noreply@blogger.com) at March 14, 2016 11:56 AM

    March 07, 2016

    LLVM Blog

    LLVM Weekly - #114, Mar 7th 2016

    Welcome to the one hundred and fourteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    The canonical home for this issue can be found here at llvmweekly.org.

    News and articles from around the web

    LLVM has been accepted as a mentoring organisation in Google Summer of Code 2016. See here for more about what that means. If you're a student who would like to get paid to work on LLVM over the summer, you should definitely consider applying. Also take a look at the full list of organisations in GSoC 2016. If you have an interest in open source hardware, in my (biased) opinion you should definitely look at lowRISC's listed project ideas.

    LLVM and Clang 3.8 'final' has been tagged. A release should be imminent.

    There was a big C++ committee meeting last week. You can find summaries here and here. If you were hoping for modules, concepts, UFCS, ranges, or coroutines in C++17 I'm afraid you're in for disappointment. Many new features will be available in C++ Technical Specifications though.

    llvmlite 0.9.0 has been released. llvmlite is a light-weight Python binding for LLVM. If you're wondering how to get started with llvmlite, then check out this recent blog post from Ian Bertolacci on writing fibonacci in LLVM with llvmlite.

    Andi McClure has written a really interesting blog post about writing software without a compiler. In this case, generating LLVM IR from LuaJIT.

    On the mailing lists

    LLVM commits

    • MemorySSA has gained an initial update API. r262362.

    • TableGen can now check at compile time that a scheduling model is complete. r262384.

    • New comments in PassBuilder give a description of what trade-offs are expected for each optimisation level. r262196.

    • LoopLoadElimination is now enabled by default. r262250.

    • A new patch adding infrastructure for profile-guided optimisation enhancements in the inliner has landed. r262636.

    • Experimental ValueTracking code which tried to infer more precise known bits using implied dominating conditions has been removed. Experiments didn't find it to be profitable enough, but it may still be useful to people wanting to experiment out of tree. r262646.

    Clang commits

    • Clang's C API gained an option to demote fatal errors to non-fatal errors. This is likely to be useful for clients like IDEs. r262318.

    • clang-cl gained initial support for precompiled headers. r262420.

    • An -fembed-bitcode driver option has been introduced. r262282.

    • Semantic analysis for the swiftcall calling convention has landed. r262587.

    • Clang's TargetInfo will now store an actual DataLayout instance rather than a string. r262737.

    Other project commits

    • LLDB can now read line tables from Microsoft's PDB debug info files. r262528.

    • The LLVM test-suite gained the ability to hash generated binaries and to skip tests if the hash didn't change since a previous run. r262307.

    • LLVM's OpenMP runtime now supports the new OpenMP 4.5 doacross loop nest and taskloop features. r262532, r262535.

    by Alex Bradbury (noreply@blogger.com) at March 07, 2016 12:40 PM

    February 29, 2016

    LLVM Blog

    LLVM Weekly - #113, Feb 29th 2016

    Welcome to the one hundred and thirteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at http://llvmweekly.org and pass it on to anyone else you think may be interested. Please send any tips or feedback to asb@asbradbury.org, or @llvmweekly or @asbradbury on Twitter.

    News and articles from around the web

    LLVM and Clang 3.8RC3 has been tagged.

    EuroLLVM 2016 is less than a month away. If you want to attend, be sure to register.

    The Red Hat blog has a summary of new features in the upcoming GCC 6 release.

    The Meeting C++ blog has a helpful summary of a subset of the proposals for the next C++ committee meeting.

    On the mailing lists

    LLVM commits

    • The Sparc backend now contains definitions for all registers and instructions defined in the Sparc v8 manual. r262133.

    • LLVM gained a basic LoopPassManager, though it currently only contains dummy passes. r261831.

    • A number of TargetInstrInfo predicates now take a reference to a MachineInstr rather than a pointer. r261605.

    • The WebAssembly backend gained redzone support for the userspace stack. r261662.

    Clang commits

    • Whole-program vtable optimisation is now available in Clang using the -fwhole-program-vtables flag. r261767.

    • Clang gained __builtin_canonicalize which returns the platform-specific canonical encoding of a floating point number. r262122.

    • A hasAnyName matcher was added. r261574.

    • The pointer arithmetic checker has been improved to report fewer false positives. r261632.

    Other project commits

    • The new ELF linker gained support for identical code folding (ICF). This reduces the size of an LLD binary by 3.6% and of a Clang binary by 2.7%. As described in the commit message, this is not a "safe" version of ICF as implemented in GNU gold, so will cause issues if the input relies on two distinct functions always having distinct addresses. r261912.

    • Polly's tree now contains an update_check.py script that may be useful to other LLVM devs. It updates a FileCheck-based lit test by updating the CHECK: lines with the actual output of the RUN: command. r261899.

    • LLDB gained a new set of plugins to help debug Java programs, specifically Java code JIT-ed by the Android runtime. r262015.

    • The new OpenMP 4.5 affinity API is now supported in LLVM's openmp implementation. r261915.

    • The new ELF linker gained support for the -r command-line option, which produces relocatable output (partial linking). r261838.

    • The CMake/lit runner for SPEC in the LLVM test-suite can now run the C CPU2006 floating point benchmarks (but not the Fortran ones). r261816.

    • The old ELF linker has been deleted from LLD. r262158.
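
    The caveat about identical code folding in the list above is easier to see with an example of code that relies on distinct functions having distinct addresses. This is a minimal sketch with hypothetical names, not code from LLD or its tests.

```cpp
// Two distinct functions whose bodies are identical, making them
// candidates for folding into a single copy.
int add_one_a(int x) { return x + 1; }
int add_one_b(int x) { return x + 1; }

using Fn = int (*)(int);

// Uses a function's address as its identity. The language guarantees that
// distinct functions compare unequal, but if a linker folds both bodies
// to one address, this returns true for add_one_a and add_one_b too.
bool same_function(Fn lhs, Fn rhs) { return lhs == rhs; }
```

    Code that keys a lookup table or a callback registry on function pointers is the sort of input that could misbehave under such folding.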

    by Alex Bradbury (noreply@blogger.com) at February 29, 2016 02:58 PM