Benefits of Well-Designed Projects: GCC vs Clang

GCC (the GNU Compiler Collection) and Clang are two of the most prominent C and C++ compilers in software development. Each has its own design philosophy and architecture, catering to different needs and preferences. This article explores the fundamental design differences between GCC and Clang and highlights how these differences affect their functionality, performance, and usability.

Architectural Design Differences

Clang: Clang’s design is highly modular. Together with LLVM, it is built as a set of well-defined libraries (front end, middle end, and back end) that can be used independently or together. This modularity makes Clang easy to extend and maintain. In addition, libclang provides a stable C interface to the Clang libraries, which facilitates the development of tools and IDE integrations.

GCC: Historically, GCC was designed as a monolithic compiler with tightly coupled components. While it has become more modular over time, it is still less modular than Clang. GCC supports plugins, but its architecture makes it harder to extend than Clang’s more flexible design.

Let’s take a deeper look at Clang’s design to understand why its architecture makes it easy to add new features and plugins.

Clang Design:

Like many other compilers, Clang follows a classic three-phase design:

  • The front end: parses the source code, checks it for errors, and builds a language-specific Abstract Syntax Tree (AST) to represent the input code.
  • The optimizer: transforms the code so that it runs faster. In Clang/LLVM it does not operate on the AST itself but on an intermediate representation (LLVM IR) generated from it.
  • The back end: generates the final machine code for a specific target.

What is the difference between Clang and other compilers?

The most important difference in Clang’s design is that it is built on LLVM. The central idea behind LLVM is the LLVM Intermediate Representation (IR), which plays a role similar to Java bytecode.
LLVM IR is designed to host mid-level analyses and transformations that you find in the optimizer section of a compiler. It was designed with many specific goals in mind, including supporting lightweight runtime optimizations, cross-function/interprocedural optimizations, whole program analysis, and aggressive restructuring transformations, etc. The most important aspect of it, though, is that it is itself defined as a first class language with well-defined semantics.

With this design, a large part of the compiler can be reused to build other compilers: for example, you can replace only the front end to support another language, or only the back end to support another target.
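To make the reuse argument concrete, here is a minimal C++ sketch of the idea. It is a toy model, not LLVM code: the `IR` type (a list of integers to be summed), the `FrontEnd` interface, the two hypothetical source languages, and the stack-machine back end are all invented for illustration. Both front ends lower to the same IR, so the optimizer and the back end are shared unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a shared IR: a list of integer operands
// whose sum is the program's result.
using IR = std::vector<long>;

// Each source language gets its own front end that lowers to the same IR.
struct FrontEnd {
    virtual ~FrontEnd() = default;
    virtual IR lower(const std::string& src) const = 0;
};

// Hypothetical language A: infix sums such as "1 + 2 + 3".
struct InfixFrontEnd : FrontEnd {
    IR lower(const std::string& src) const override {
        IR ir;
        std::istringstream in(src);
        std::string term;
        while (std::getline(in, term, '+'))
            ir.push_back(std::stol(term));  // throws std::invalid_argument on a bad term
        return ir;
    }
};

// Hypothetical language B: prefix sums such as "(+ 1 2 3)".
struct PrefixFrontEnd : FrontEnd {
    IR lower(const std::string& src) const override {
        std::string s = src;
        for (char& c : s)
            if (c == '(' || c == ')') c = ' ';  // drop the parentheses
        std::istringstream in(s);
        std::string op;
        in >> op;  // skip the leading "+" operator
        IR ir;
        long value;
        while (in >> value) ir.push_back(value);
        return ir;
    }
};

// Shared optimizer: constant-folds the whole sum into a single operand.
IR optimize(const IR& ir) {
    return IR{std::accumulate(ir.begin(), ir.end(), 0L)};
}

// Shared back end for a hypothetical stack machine.
std::string emitStackCode(const IR& ir) {
    std::string out;
    for (long v : ir) out += "push " + std::to_string(v) + "\n";
    for (std::size_t i = 1; i < ir.size(); ++i) out += "add\n";
    return out;
}

// Only the front end changes from one language to the next.
std::string compile(const FrontEnd& fe, const std::string& src) {
    return emitStackCode(optimize(fe.lower(src)));
}
```

Supporting a third language would mean writing one more `FrontEnd` subclass; `optimize` and `emitStackCode` stay untouched, which is the property the LLVM IR design aims for at a much larger scale.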

I- Front End:

Clang is designed to be modular, and each compilation phase is handled by a specific module. Here are some of the projects involved in the front end phase:

Like any front end, Clang needs a lexer, a parser, and semantic analysis. The Clang front end can be invoked directly by passing the -cc1 argument, and it provides several features, such as AST generation:

clang -cc1 -ast-dump test.c

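This command line is handled by the cc1_main function. Here’s the sequence of some interesting methods it executes: cc1_main calls ExecuteCompilerInvocation, which creates the front end action requested on the command line and passes it to CompilerInstance::ExecuteAction.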

The CompilerInstance::ExecuteAction method takes a parameter of type FrontendAction, which specifies the front end action to execute. FrontendAction is an abstract class: to implement a concrete front end action, we inherit from it.

Let’s find all the front end actions implemented by Clang using CQLinq (the code query language of CppDepend), by searching for all classes that inherit, directly or indirectly, from FrontendAction:

from t in Types
let depth0 = t.DepthOfDeriveFrom("clang.FrontendAction")
where depth0  >= 0 orderby depth0
select new { t, depth0 }

Many front end actions are available. For example, ASTDumpAction dumps the AST without generating a final executable. Almost all of the front end actions inherit from ASTFrontendAction, which means that they work with the generated AST.

What’s interesting about this design is that we can easily plug in a custom front end action: we simply implement a new FrontendAction subclass.
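The following C++ sketch models this plug-in pattern with the standard library only; it does not use Clang’s headers. `ToyFrontendAction`, `executeAction`, `runWithFlag`, and the flag table are hypothetical names. The shape mirrors how CompilerInstance::ExecuteAction receives an abstract action: the driver picks a concrete action by flag, and adding a custom action means adding one subclass and one table entry.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Toy model of the FrontendAction pattern (hypothetical names, not Clang's API).
struct ToyFrontendAction {
    virtual ~ToyFrontendAction() = default;
    // Runs the action on some source text and returns a textual result.
    virtual std::string execute(const std::string& src) = 0;
};

// Built-in action, loosely analogous to ASTDumpAction.
struct ToyDumpAction : ToyFrontendAction {
    std::string execute(const std::string& src) override { return "dump:" + src; }
};

// A custom action written by a tool author: counts ';' characters as statements.
struct CountStatementsAction : ToyFrontendAction {
    std::string execute(const std::string& src) override {
        std::size_t n = 0;
        for (char c : src)
            if (c == ';') ++n;
        return std::to_string(n) + " statements";
    }
};

// Maps a command-line flag to a factory for the action it selects.
using ActionFactory = std::function<std::unique_ptr<ToyFrontendAction>()>;

std::map<std::string, ActionFactory>& actionTable() {
    static std::map<std::string, ActionFactory> table = {
        {"-ast-dump", [] { return std::make_unique<ToyDumpAction>(); }},
    };
    return table;
}

// Stand-in for CompilerInstance::ExecuteAction: runs whichever action it receives.
std::string executeAction(ToyFrontendAction& action, const std::string& src) {
    return action.execute(src);
}

// Stand-in for the driver: selects the action from the flag, then executes it.
std::string runWithFlag(const std::string& flag, const std::string& src) {
    auto it = actionTable().find(flag);
    if (it == actionTable().end()) return "unknown flag " + flag;
    std::unique_ptr<ToyFrontendAction> action = it->second();
    return executeAction(*action, src);
}
```

Registering the custom action is a one-line addition: `actionTable()["-count-stmts"] = [] { return std::make_unique<CountStatementsAction>(); };`. With real Clang, tools usually get the same effect by subclassing `clang::ASTFrontendAction` and running it through LibTooling, or through a `PluginASTAction`-based plugin.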

How can we process the AST?

Each ASTFrontendAction creates one or more ASTConsumer instances. ASTConsumer is an abstract class, so we have to implement our own AST consumer for our specific needs.
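In Clang, a concrete `ASTFrontendAction` overrides `CreateASTConsumer` to return a `std::unique_ptr<ASTConsumer>`, and the consumer usually overrides `HandleTranslationUnit(ASTContext &)`. Because those classes need Clang’s headers and libraries, the sketch below models the same shape with a hypothetical toy AST (`ToyDecl`, `ToyTranslationUnit`) using only the standard library. The class and method names are illustrative assumptions, not Clang’s, and parsing is omitted.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy AST: a translation unit is a list of named declarations.
struct ToyDecl {
    enum Kind { Function, Variable } kind;
    std::string name;
};
struct ToyTranslationUnit {
    std::vector<ToyDecl> decls;
};

// Analogue of clang::ASTConsumer: receives the AST once it is built.
struct ToyASTConsumer {
    virtual ~ToyASTConsumer() = default;
    virtual void handleTranslationUnit(const ToyTranslationUnit& tu) = 0;
};

// Analogue of clang::ASTFrontendAction: creates a consumer and hands it the AST.
// Parsing is omitted here; run() receives an already built AST.
struct ToyASTFrontendAction {
    virtual ~ToyASTFrontendAction() = default;
    virtual std::unique_ptr<ToyASTConsumer> createASTConsumer() = 0;

    void run(const ToyTranslationUnit& parsed) {
        std::unique_ptr<ToyASTConsumer> consumer = createASTConsumer();
        consumer->handleTranslationUnit(parsed);
    }
};

// Our own consumer: collects the names of all function declarations.
struct FunctionNameCollector : ToyASTConsumer {
    std::vector<std::string>& out;
    explicit FunctionNameCollector(std::vector<std::string>& sink) : out(sink) {}

    void handleTranslationUnit(const ToyTranslationUnit& tu) override {
        for (const ToyDecl& d : tu.decls)
            if (d.kind == ToyDecl::Function) out.push_back(d.name);
    }
};

// Our own action: all it has to decide is which consumer receives the AST.
struct ListFunctionsAction : ToyASTFrontendAction {
    std::vector<std::string> names;

    std::unique_ptr<ToyASTConsumer> createASTConsumer() override {
        return std::make_unique<FunctionNameCollector>(names);
    }
};
```

The division of labor is the point: the action never walks the AST itself, and the consumer never knows how the AST was produced.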

The FrontendAction invokes the AST consumer as shown in the following dependency graph.

Let’s search for all classes that derive directly from ASTConsumer using CQLinq:

from t in Types
let depth0 = t.DepthOfDeriveFrom("clang.ASTConsumer")
where depth0  == 1
select new { t, depth0 }

CodeGenerator is one example of an AST consumer. As mentioned earlier, the power of LLVM comes from working with IR, and to generate that IR, Clang has to walk the AST. CodeGenerator is the ASTConsumer subclass responsible for generating the IR, and interestingly, this work is isolated in a separate project named ClangCodeGen.
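As a rough illustration of what generating IR from the AST means, the following hypothetical C++ sketch lowers a small expression tree into linear three-address instructions written in an LLVM-like text syntax (`%t0 = mul i32 %x, 2`). It is a toy, not Clang’s CodeGen, which builds in-memory LLVM instructions through LLVM’s IRBuilder rather than strings. The `Expr` type, the `lit`/`param`/`bin` helpers, `ToyCodeGen`, and the output format are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST: integer literals, named parameters, and binary add/mul.
struct Expr {
    enum Kind { Literal, Param, Add, Mul };
    Kind kind = Literal;
    int value = 0;                   // used when kind == Literal
    std::string name;                // used when kind == Param
    std::unique_ptr<Expr> lhs, rhs;  // used when kind == Add or Mul
};

std::unique_ptr<Expr> lit(int v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Literal;
    e->value = v;
    return e;
}

std::unique_ptr<Expr> param(const std::string& n) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Param;
    e->name = n;
    return e;
}

std::unique_ptr<Expr> bin(Expr::Kind k, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = k;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Toy code generator: walks the tree bottom-up and appends one
// three-address instruction per operator; every value is treated as i32.
class ToyCodeGen {
public:
    std::vector<std::string> code;

    // Returns the operand (a literal or a %name) holding the value of e.
    std::string lower(const Expr& e) {
        switch (e.kind) {
        case Expr::Literal:
            return std::to_string(e.value);
        case Expr::Param:
            return "%" + e.name;
        case Expr::Add:
        case Expr::Mul: {
            std::string a = lower(*e.lhs);
            std::string b = lower(*e.rhs);
            std::string t = "%t" + std::to_string(next_++);
            std::string op = (e.kind == Expr::Add) ? "add" : "mul";
            code.push_back(t + " = " + op + " i32 " + a + ", " + b);
            return t;
        }
        }
        return "";  // not reached for the kinds above
    }

private:
    int next_ = 0;  // counter for fresh temporary names
};
```

For the tree of `x * 2 + 3`, `lower` emits `%t0 = mul i32 %x, 2` followed by `%t1 = add i32 %t0, 3` and returns `%t1`: the nested tree becomes a flat instruction list that later phases can optimize without knowing anything about C or C++ syntax.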

Here are some of the classes involved in LLVM IR generation:

II- Optimizer

To explain this phase, I can’t do better than Chris Lattner, the creator of LLVM, in this post:

“To give some intuition for how optimizations work, it is useful to walk through some examples. There are lots of different kinds of compiler optimizations, so it is hard to provide a recipe for how to solve an arbitrary problem. That said, most optimizations follow a simple three-part structure:

  • Look for a pattern to be transformed.
  • Verify that the transformation is safe/correct for the matched instance.
  • Do the transformation, updating the code.

The optimizer reads LLVM IR in, chews on it a bit, then emits LLVM IR, which hopefully will execute faster. In LLVM (as in many other compilers) the optimizer is organized as a pipeline of distinct optimization passes each of which is run on the input and has a chance to do something. Common examples of passes are the inliner (which substitutes the body of a function into call sites), expression reassociation, loop invariant code motion, etc. Depending on the optimization level, different passes are run: for example at -O0 (no optimization) the Clang compiler runs no passes, at -O3 it runs a series of 67 passes in its optimizer (as of LLVM 2.8).”
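The three-part structure and the pass pipeline described in the quote above can be modeled in a few dozen lines of C++. This is a hypothetical sketch, not LLVM’s pass manager: the toy IR (`Instr` records), the two passes (`ConstantFoldPass`, `DeadCodeElimPass`), and the `buildPipeline` mapping from optimization level to passes are all assumptions. The folding pass looks for a pattern, checks that the rewrite is safe (here: the result must fit in 32 bits), then rewrites the instruction; as in the quote, level 0 runs no passes.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <limits>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: "dst = op a, b", where each operand is a value name
// (such as "%a") or an integer literal. "const" stores literal a in dst;
// "ret" returns operand a.
struct Instr {
    std::string op, dst, a, b;
};
using Function = std::vector<Instr>;

// Literals are assumed to fit in long long.
bool isLiteral(const std::string& s) {
    std::size_t i = (!s.empty() && s[0] == '-') ? 1 : 0;
    if (i >= s.size()) return false;
    for (; i < s.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    return true;
}

bool fitsI32(long long v) {
    return v >= std::numeric_limits<std::int32_t>::min() &&
           v <= std::numeric_limits<std::int32_t>::max();
}

struct Pass {
    virtual ~Pass() = default;
    virtual bool run(Function& f) = 0;  // returns true if f was changed
};

struct ConstantFoldPass : Pass {
    bool run(Function& f) override {
        std::map<std::string, std::string> known;  // value name -> literal
        bool changed = false;
        for (Instr& in : f) {
            // Propagate constants found earlier into this instruction's operands.
            for (std::string* opnd : {&in.a, &in.b}) {
                auto it = known.find(*opnd);
                if (it != known.end()) { *opnd = it->second; changed = true; }
            }
            if (in.op == "const") { known[in.dst] = in.a; continue; }
            // 1. Pattern: an add or mul whose operands are both literals.
            if ((in.op != "add" && in.op != "mul") || !isLiteral(in.a) || !isLiteral(in.b)) continue;
            long long x = std::stoll(in.a), y = std::stoll(in.b);
            // 2. Safety: fold only if operands and result fit in 32 bits.
            if (!fitsI32(x) || !fitsI32(y)) continue;
            long long r = (in.op == "add") ? x + y : x * y;
            if (!fitsI32(r)) continue;
            // 3. Transformation: replace the instruction by a constant.
            in = Instr{"const", in.dst, std::to_string(r), ""};
            known[in.dst] = in.a;
            changed = true;
        }
        return changed;
    }
};

// Removes instructions whose result is never used ("ret" is always kept),
// repeating until nothing more can be removed.
struct DeadCodeElimPass : Pass {
    bool run(Function& f) override {
        bool changed = false;
        for (bool again = true; again;) {
            std::set<std::string> used;
            for (const Instr& in : f) { used.insert(in.a); used.insert(in.b); }
            auto dead = [&used](const Instr& in) {
                return in.op != "ret" && used.count(in.dst) == 0;
            };
            auto newEnd = std::remove_if(f.begin(), f.end(), dead);
            again = newEnd != f.end();
            changed = changed || again;
            f.erase(newEnd, f.end());
        }
        return changed;
    }
};

// The optimization level decides which passes run; level 0 runs none.
std::vector<std::unique_ptr<Pass>> buildPipeline(int optLevel) {
    std::vector<std::unique_ptr<Pass>> passes;
    if (optLevel >= 1) {
        passes.push_back(std::make_unique<ConstantFoldPass>());
        passes.push_back(std::make_unique<DeadCodeElimPass>());
    }
    return passes;
}

void runPipeline(Function& f, int optLevel) {
    for (const auto& pass : buildPipeline(optLevel)) pass->run(f);
}
```

Each pass only sees the IR, never the AST or the target, which is what lets LLVM order, add, or drop passes per optimization level.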

Let’s discover the passes defined in LLVMCore by searching for classes that inherit from the llvm::Pass class:

from t in Types
let depth0 = t.DepthOfDeriveFrom("llvm.Pass")
where t.ParentProject.Name=="LLVMCore" && depth0  >= 0 orderby depth0
select new { t, depth0 }

Of course, many other passes exist in the other LLVM modules.

III- Back End

Like the other phases, the back end is very modular in LLVM. It is responsible for generating the output for a specific target; let’s take as an example LLVMX86Target, the module that generates code for the x86 target.

Here’s the graph showing all the modules involved in generating binaries for the x86 target.

Many modules are involved in this phase, each with its own specific responsibility. This enforces cohesion and encourages clean APIs and separation of concerns, which makes the code easier for developers to understand, since they only have to understand small pieces of the big picture.
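This separation can be sketched as a target registry: each target lives in its own module and registers an emitter, and the generic driver only looks targets up by name. In LLVM, `TargetRegistry` plays this role. The C++ below is a hypothetical, standard-library-only model of the idea, with invented target names (`toy-x86`, `toy-stack`), an invented integer-list IR, and invented output syntax.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical target-independent IR: a sequence of integers to add.
using IR = std::vector<int>;
using Emitter = std::function<std::string(const IR&)>;

// Generic part: knows nothing about any particular target.
class ToyTargetRegistry {
public:
    static ToyTargetRegistry& instance() {
        static ToyTargetRegistry registry;
        return registry;
    }
    void add(const std::string& name, Emitter e) { emitters_[name] = std::move(e); }
    std::string emit(const std::string& target, const IR& ir) const {
        auto it = emitters_.find(target);
        if (it == emitters_.end()) throw std::invalid_argument("unknown target: " + target);
        return it->second(ir);
    }

private:
    std::map<std::string, Emitter> emitters_;
};

// "Module" for a hypothetical x86-flavored target.
std::string emitToyX86(const IR& ir) {
    std::string out = "xor eax, eax\n";
    for (int v : ir) out += "add eax, " + std::to_string(v) + "\n";
    return out + "ret\n";
}

// "Module" for a hypothetical stack machine.
std::string emitToyStack(const IR& ir) {
    std::string out = "push 0\n";
    for (int v : ir) out += "push " + std::to_string(v) + "\nadd\n";
    return out;
}

// Each target module registers itself; the driver never sees their internals.
void registerToyTargets() {
    ToyTargetRegistry::instance().add("toy-x86", emitToyX86);
    ToyTargetRegistry::instance().add("toy-stack", emitToyStack);
}
```

Adding a target means writing one more emitter and one `add` call; the registry and everything upstream of it stay unchanged.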

Conclusion

The LLVM/Clang duo is not just a C/C++ compiler; it is also an infrastructure for building tools, and its behavior is easy to extend. Many tools ship out of the box with the LLVM/Clang source code, and many others can be found on the web.

If you need a C/C++ parser to build a tool, Clang is a very good candidate.