{"id":321,"date":"2018-02-15T12:57:48","date_gmt":"2018-02-15T12:57:48","guid":{"rendered":"http:\/\/cppdepend.com\/blog\/?p=321"},"modified":"2018-02-15T12:57:48","modified_gmt":"2018-02-15T12:57:48","slug":"quick-overview-of-how-clang-works-internally","status":"publish","type":"post","link":"https:\/\/cppdepend.com\/blog\/quick-overview-of-how-clang-works-internally\/","title":{"rendered":"Quick overview of how Clang works internally"},"content":{"rendered":"<p>Clang has proven to be a mature compiler for C and C++, on par with GCC and the Microsoft compilers, but what makes it special is that it\u2019s not just a compiler. It\u2019s also an infrastructure for building tools, thanks to its library-based architecture, which makes its features easier to reuse and integrate into other projects.<br \/>\n<!--more--><\/p>\n<p><strong>Clang Design:<\/strong><\/p>\n<p>Like many other compilers, Clang follows a three-phase design:<\/p>\n<ul>\n<li>The front end, which parses the source code, checks it for errors, and builds a language-specific Abstract Syntax Tree (AST) to represent the input code.<\/li>\n<li>The optimizer, which transforms the code produced by the front end to improve it. In Clang\u2019s case it operates on the LLVM IR generated from the AST rather than on the AST itself.<\/li>\n<li>The back end, which generates the final code to be executed by the machine and therefore depends on the target.<\/li>\n<\/ul>\n<p><a href=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/latfig1.gif\"><img decoding=\"async\" class=\"aligncenter size-full\" title=\"latfig1\" src=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/latfig1.gif?w=595\" alt=\"\" \/><\/a><\/p>\n<p><strong>What\u2019s the difference between Clang and other compilers?<\/strong><\/p>\n<p>The most important difference in its design is that Clang is built on LLVM. The idea behind LLVM is to use the LLVM Intermediate Representation (IR), which plays a role similar to bytecode in Java.<br \/>\nLLVM IR is designed to host mid-level analyses and transformations that you 
find in the optimizer section of a compiler. It was designed with many specific goals in mind, including supporting lightweight runtime optimizations, cross-function\/interprocedural optimizations, whole program analysis, and aggressive restructuring transformations, etc. The most important aspect of it, though, is that it is itself defined as a first class language with well-defined semantics.<\/p>\n<p>With this design, a large part of the compiler can be reused to build other compilers; for example, you can replace just the front end to support another language.<\/p>\n<p><a href=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/latfig2.gif\"><img decoding=\"async\" class=\"aligncenter size-full \" title=\"latfig2\" src=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/latfig2.gif?w=595\" alt=\"\" \/><\/a><\/p>\n<p><strong>I- Front End:<\/strong><\/p>\n<p>Clang is designed to be modular, and each compilation phase is handled by a specific module. Here are some of the projects involved in the front end phase:<\/p>\n<p><a href=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm1.png\"><img decoding=\"async\" class=\"aligncenter size-full\" title=\"llvm1\" src=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm1.png?w=595\" alt=\"\" \/><\/a><\/p>\n<p>Like any front end, it needs a lexer, a parser, and semantic analysis. 
The Clang front end can be invoked directly by passing the -cc1 argument. It offers several features, such as dumping the AST:<\/p>\n<blockquote><p>clang -cc1 -ast-dump test.c<\/p><\/blockquote>\n<p>This command line is handled by the cc1_main function. Here\u2019s the sequence of some of the interesting methods it executes:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/clang111.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/clang111.png\" alt=\"clang11\" width=\"574\" height=\"292\" \/><\/a><\/p>\n<p>The ExecuteAction method takes a parameter of type FrontendAction, which specifies the front end action to execute. FrontendAction is abstract, so we need to inherit from it to implement a concrete front end action.<\/p>\n<p>Let\u2019s discover all the front end actions implemented by Clang using <a href=\"http:\/\/cppdepend.com\/cqlinq.aspx\">CQLinq<\/a>, by searching for all classes that inherit directly or indirectly from it.<\/p>\n<pre>from t in Types\r\nlet depth0 = t.DepthOfDeriveFrom(\"clang.FrontendAction\")\r\nwhere depth0 &gt;= 0 orderby depth0\r\nselect new { t, depth0 }\r\n<\/pre>\n<p><a href=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm2.png\"><img decoding=\"async\" class=\"aligncenter size-full\" title=\"llvm2\" src=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm2.png?w=595\" alt=\"\" \/><\/a><\/p>\n<p>Many front end actions are available; for example, ASTDumpAction dumps the AST without creating the final executable. 
Almost all the front end actions inherit from ASTFrontendAction, which means that they work with the generated AST.<\/p>\n<p>What\u2019s interesting about this design is that we can easily plug in our own custom FrontendAction; we just have to implement a new one.<\/p>\n<p><strong>How can we process the AST?<\/strong><\/p>\n<p>Each ASTFrontendAction creates one or more ASTConsumer objects. ASTConsumer is an abstract class, so we have to implement our own AST consumer for our specific needs.<\/p>\n<p>The FrontendAction invokes the AST consumer as shown in the following graph.<\/p>\n<p><a href=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm5.png\"><img decoding=\"async\" class=\"aligncenter\" title=\"llvm5\" src=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm5.png\" alt=\"\" \/><\/a><\/p>\n<p>Let\u2019s search for all the classes that inherit directly from ASTConsumer using <a href=\"http:\/\/cppdepend.com\/cqlinq.aspx\">CQLinq<\/a>:<\/p>\n<pre>from t in Types\r\nlet depth0 = t.DepthOfDeriveFrom(\"clang.ASTConsumer\")\r\nwhere depth0 == 1\r\nselect new { t, depth0 }\r\n<\/pre>\n<p><a href=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm41.png\"><img decoding=\"async\" class=\"aligncenter size-full\" title=\"llvm4\" src=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm41.png?w=595\" alt=\"\" \/><\/a><\/p>\n<p><strong>CodeGenerator is an example of an AST consumer<\/strong><\/p>\n<p>As mentioned earlier, the power of LLVM lies in working with IR, and to generate it we need to traverse the AST. 
CodeGenerator is the class inheriting from ASTConsumer that is responsible for generating the IR, and what\u2019s interesting is that this processing is isolated in a separate project named ClangCodeGen.<\/p>\n<p>Here are some of the classes involved in LLVM IR generation:<\/p>\n<p><a href=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm6.png\"><img decoding=\"async\" class=\"aligncenter\" title=\"llvm6\" src=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm6.png\" alt=\"\" \/><\/a><\/p>\n<p><strong>II- Optimizer<\/strong><\/p>\n<p>I can\u2019t explain this phase better than Chris Lattner, the creator of LLVM, does in this <a href=\"http:\/\/www.drdobbs.com\/architecture-and-design\/the-design-of-llvm\/240001128?pgno=1\">post<\/a>:<\/p>\n<p><em>\u201cTo give some intuition for how optimizations work, it is useful to walk through some examples. There are lots of different kinds of compiler optimizations, so it is hard to provide a recipe for how to solve an arbitrary problem. That said, most optimizations follow a simple three-part structure:<\/em><\/p>\n<ul>\n<li>Look for a pattern to be transformed.<\/li>\n<li>Verify that the transformation is safe\/correct for the matched instance.<\/li>\n<li>Do the transformation, updating the code.<\/li>\n<\/ul>\n<p><em>The optimizer reads LLVM IR in, chews on it a bit, then emits LLVM IR, which hopefully will execute faster. In LLVM (as in many other compilers) the optimizer is organized as a pipeline of distinct optimization passes each of which is run on the input and has a chance to do something. Common examples of passes are the inliner (which substitutes the body of a function into call sites), expression reassociation, loop invariant code motion, etc. 
Depending on the optimization level, different passes are run: for example at -O0 (no optimization) the Clang compiler runs no passes, at -O3 it runs a series of 67 passes in its optimizer (as of LLVM 2.8).\u201d<br \/>\n<\/em><\/p>\n<p>Let\u2019s discover the passes in LLVMCore by searching for classes that inherit from the llvm.Pass class:<\/p>\n<pre>from t in Types\r\nlet depth0 = t.DepthOfDeriveFrom(\"llvm.Pass\")\r\nwhere t.ParentProject.Name == \"LLVMCore\" &amp;&amp; depth0 &gt;= 0 orderby depth0\r\nselect new { t, depth0 }\r\n<\/pre>\n<p><a href=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm7.png\"><img decoding=\"async\" class=\"aligncenter size-full\" title=\"llvm7\" src=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm7.png?w=595\" alt=\"\" \/><\/a><\/p>\n<p>Of course, many other passes exist in the other LLVM modules.<\/p>\n<p><strong>III- Back End<\/strong><\/p>\n<p>The back end is responsible for generating the output for a specific target. Like the other phases, it is very modular; take for example LLVMX86Target, the module that generates code for the x86 target.<\/p>\n<p>Here\u2019s the graph showing all the modules involved in generating binaries for the x86 target.<\/p>\n<p><a href=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm8.png\"><img decoding=\"async\" class=\"aligncenter size-full\" title=\"llvm8\" src=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/llvm8.png?w=595\" alt=\"\" \/><\/a><\/p>\n<p>Many modules are involved in this phase, and each one has its own specific responsibility, which enforces cohesion and encourages clean APIs and separation of concerns. This makes the code easier for developers to understand, since they only have to understand small pieces of the big picture.<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>The LLVM\/Clang duo is not just a C\/C++ compiler; it\u2019s also an infrastructure for building tools, and its behavior is easy to extend. 
Many tools are included out of the box in the LLVM\/Clang source tree, and many others can be found on the web.<\/p>\n<p>If you need a C\/C++ parser to build a tool, Clang is a very good candidate.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Clang has proven to be a mature compiler for C and C++, on par with GCC and the Microsoft compilers, but what makes it special is that it\u2019s not just a compiler. It\u2019s also an infrastructure for building tools, thanks to its library-based architecture, which makes its features easier &hellip; <a href=\"https:\/\/cppdepend.com\/blog\/quick-overview-of-how-clang-works-internally\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Quick overview of how Clang works internally&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-321","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/posts\/321","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/comments?post=321"}],"version-history":[{"count":8,"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/posts\/321\/revisions"}],"predecessor-version":[{"id":379,"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/posts\/321\/revisions\/379"}],"wp:attachment":[{"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/media?parent=321"}],"wp
:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/categories?post=321"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/tags?post=321"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}