{"id":92,"date":"2017-06-14T23:46:59","date_gmt":"2017-06-14T23:46:59","guid":{"rendered":"http:\/\/cppdepend.com\/wordpress\/?p=92"},"modified":"2017-06-26T22:46:35","modified_gmt":"2017-06-26T22:46:35","slug":"lessons-to-learn-from-the-clangllvm-codebase","status":"publish","type":"post","link":"https:\/\/cppdepend.com\/blog\/lessons-to-learn-from-the-clangllvm-codebase\/","title":{"rendered":"Lessons to learn from the CLang\/LLVM codebase"},"content":{"rendered":"<p>Clang has proven itself a mature compiler for C and C++, on a par with GCC and the Microsoft compilers, but what makes it special is that it\u2019s not just a compiler. It\u2019s also an infrastructure for building tools, thanks to its\u00a0library-based architecture, which makes its functionality easier to reuse and integrate into other projects.<!--more--><span id=\"more-679\"><\/span><\/p>\n<p><strong>Clang Design:<\/strong><\/p>\n<p>Like many other compilers, Clang has a three-phase design:<\/p>\n<ul>\n<li>The front end parses source code, checks it for errors, and builds a language-specific Abstract Syntax Tree (AST) to represent the input code.<\/li>\n<li>The optimizer applies optimizations to the intermediate representation that the front end generates from the AST.<\/li>\n<li>The back end generates the final code to be executed by the machine; it depends on the target.<\/li>\n<\/ul>\n<p><a href=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/latfig1.gif\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-637\" title=\"latfig1\" src=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/latfig1.gif?w=595\" alt=\"\" \/><\/a><\/p>\n<p><strong>What is the difference between Clang and the other compilers?<\/strong><\/p>\n<p>The most important difference in its design is that Clang is based on LLVM. The idea behind LLVM is to use the LLVM Intermediate Representation (IR), which plays a role similar to bytecode in Java.<br \/>\nLLVM IR is designed to host mid-level 
analyses and transformations that you find in the optimizer section of a compiler. It was designed with many specific goals in mind, including supporting lightweight runtime optimizations, cross-function\/interprocedural optimizations, whole program analysis, and aggressive restructuring transformations, etc. The most important aspect of it, though, is that it is itself defined as a first class language with well-defined semantics.<\/p>\n<p>With this design we can reuse a large part of the compiler to create other compilers; for example, you can replace just the front end to support other languages.<\/p>\n<p><a href=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/latfig2.gif\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-640\" title=\"latfig2\" src=\"http:\/\/cppdepend.files.wordpress.com\/2012\/10\/latfig2.gif?w=595\" alt=\"\" \/><\/a><\/p>\n<p>It\u2019s very interesting to go inside this powerful compiler and discover how it\u2019s designed and implemented. C++ developers could learn many\u00a0good practices from its code base.<\/p>\n<p>Let\u2019s X-ray its source code using\u00a0<a href=\"http:\/\/www.cppdepend.com\/\">CppDepend<\/a>\u00a0and\u00a0<a href=\"http:\/\/cppdepend.com\/cqlinq.aspx\">CQLinq\u00a0<\/a>to explore some design and implementation choices of its development team.<span id=\"more-1069\"><\/span><\/p>\n<h2>1- Modularity:<\/h2>\n<p><em>1-1 Modularity using Libraries<\/em><\/p>\n<p>A major design concept for clang is its use of a library-based architecture. In this design, various parts of the front-end can be cleanly divided into separate libraries which can then be mixed up for different needs and uses. 
In addition, the library-based approach encourages good interfaces and makes it easier for new developers to get involved (because they only need to understand small pieces of the big picture).<\/p>\n<p>The DSM (Dependency Structure Matrix) is a compact way to represent and navigate across dependencies between components. A non-empty DSM cell contains a number, which represents the strength of the coupling represented by the cell. The coupling strength can be expressed in terms of the number of members\/methods\/fields\/types or namespaces involved in the coupling. The DSM can also show us the dependency cycles between the libraries.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM20.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2160\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM20.png\" alt=\"LLVM20\" width=\"963\" height=\"541\" \/><\/a><\/p>\n<p>This\u00a0dependency graph shows\u00a0the libraries used directly by clang:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2140\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm1.png\" alt=\"llvm1\" width=\"1022\" height=\"273\" \/><\/a><\/p>\n<p>As we can see, there are three dependency cycles between the libraries: clangBasic\/clangFrontEnd, clangBasic\/clangDriver and\u00a0clangBasic\/clangLex. 
It\u2019s recommended to remove any dependency cycle between the libraries to make the code more readable and maintainable.<\/p>\n<p>It\u2019s normal that clangFrontend uses clangBasic. However, why does clangBasic use the clangFrontend library?<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2146\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm2.png\" alt=\"llvm2\" width=\"426\" height=\"522\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Only one enum field is the origin of this dependency cycle; the code could be refactored and the\u00a0dependency easily removed.<\/p>\n<p><em>1-2 Modularity using namespaces<\/em><\/p>\n<p>In C++, namespaces are also used to modularize the code base, and in LLVM\/clang they are used\u00a0for three\u00a0main reasons:<\/p>\n<ul>\n<li>Many\u00a0namespaces contain only enums, as shown by the following CQLinq query, which lists the ones\u00a0containing only enums:<\/li>\n<\/ul>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM7.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2150\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM7.png\" alt=\"LLVM7\" width=\"505\" height=\"478\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>In a large project, there is no guarantee that two distinct enums won\u2019t declare enumerators with the same name. This issue was resolved in C++11 with\u00a0<strong>enum class<\/strong>,\u00a0which implicitly scopes the enum values within the enum\u2019s name. The code could be refactored in the near future to use C++11 enum classes.<\/p>\n<ul>\n<li>Anonymous namespace: A namespace with no name avoids the need for global static variables. The contents of an \u201canonymous\u201d namespace are only accessible within the file in which it is defined. 
Here\u2019s the list of all anonymous namespaces used:<\/li>\n<\/ul>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2148\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm3.png\" alt=\"llvm3\" width=\"445\" height=\"493\" \/><\/a><\/p>\n<ul>\n<li>Modularize the code base: Let\u2019s search for all the non-anonymous\u00a0ones:<\/li>\n<\/ul>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2149\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM5.png\" alt=\"LLVM5\" width=\"444\" height=\"422\" \/><\/a><\/p>\n<p>Namespaces are a good way to modularize an application. LLVM\/clang\u00a0defines more than 500 namespaces to enforce its modularity, which makes the code more readable and maintainable.<\/p>\n<h2>2- Paradigm used:<\/h2>\n<p>C++ is not just an object-oriented language. As Bjarne Stroustrup points out, \u201cC++ is a multi-paradigmed language.\u201d It supports many different styles of programs, or paradigms, and object-oriented programming is only one of these. 
Some of the others are procedural programming and generic programming.<\/p>\n<p><b>2-1 Procedural Paradigm<\/b><\/p>\n<p><em>2-1-1 Global functions<\/em><\/p>\n<p>Let\u2019s search for all global functions\u00a0defined in the LLVM\/Clang source code:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM8.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2151\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM8.png\" alt=\"LLVM8\" width=\"415\" height=\"515\" \/><\/a><\/p>\n<div><\/div>\n<p>We can classify these functions in three categories:<\/p>\n<p>1 \u2013 Utility\u00a0functions: For example, many\u00a0functions handle conversion from one type to another.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM9.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2152\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM9.png\" alt=\"LLVM9\" width=\"415\" height=\"346\" \/><\/a><\/p>\n<p>2 \u2013 Operators: Many operators are defined, as shown by the result of this CQLinq query:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM10.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2153\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM10.png\" alt=\"LLVM10\" width=\"416\" height=\"489\" \/><\/a><\/p>\n<p>Almost all kinds of operators are implemented in the llvm\/clang\u00a0source code.<\/p>\n<p>3 \u2013 Functions related to the compiler\u00a0logic: Many global functions implementing parts of the compiler\u2019s processing\u00a0are defined.<\/p>\n<p>These kinds of functions\u00a0could perhaps be grouped by category\u00a0as static methods in classes. 
Or they could be grouped in namespaces.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM11.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2154\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM11.png\" alt=\"LLVM11\" width=\"423\" height=\"465\" \/><\/a><\/p>\n<p><em>2-1-2 Static global functions:<\/em><\/p>\n<p>It\u2019s a best practice\u00a0to declare a global function as static unless you have a specific need to call it from another source file.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM12.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2155\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM12.png\" alt=\"LLVM12\" width=\"422\" height=\"492\" \/><\/a><\/p>\n<p>Almost all the\u00a0global functions are declared as static.<\/p>\n<p>2-1-3 Global functions that are candidates to be static<\/p>\n<p>Global functions that are not exported, not defined in an anonymous namespace and not used by any method outside the file where they are defined are good candidates to be made static.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM15.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2156\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM15.png\" alt=\"LLVM15\" width=\"499\" height=\"520\" \/><\/a><\/p>\n<p>As we can observe, only very few\u00a0functions are candidates to be made static.<\/p>\n<p><strong>2-2 Object Oriented paradigm<\/strong><\/p>\n<p><em>2-2-1 Inheritance<\/em><\/p>\n<p>In object-oriented programming (OOP), inheritance is a way to establish an is-a relationship between objects. It is often mistaken for a way to reuse existing code, which is not a good practice because inheritance for implementation reuse leads to tight coupling. 
Code reuse is better achieved through composition (composition over inheritance). Let\u2019s search for all classes having at least one base class:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM16.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2157\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM16.png\" alt=\"LLVM16\" width=\"490\" height=\"508\" \/><\/a><\/p>\n<p>And to have a better idea of the classes concerned by this query, we can use the Metric View.<\/p>\n<p>In the Metric View, the code base is represented through a Treemap. Treemapping is a method for displaying tree-structured data by using nested rectangles. The tree structure used in a CppDepend treemap is the usual code hierarchy:<\/p>\n<ul>\n<li>Projects contain namespaces.<\/li>\n<li>Namespaces contain types.<\/li>\n<li>Types contain methods and fields.<\/li>\n<\/ul>\n<p>The treemap view provides a useful way to represent the result of a CQLinq request: the blue rectangles represent\u00a0this result, so we can visually see the types concerned by the request.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM17.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2158\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM17.png\" alt=\"LLVM17\" width=\"1054\" height=\"501\" \/><\/a><\/p>\n<p>As we can observe, inheritance is widely used in the llvm\/clang\u00a0source code.<\/p>\n<p><em>Multiple Inheritance<\/em>: Let\u2019s search for classes inheriting from more than one concrete class.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm21.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2163\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm21.png\" alt=\"llvm21\" width=\"396\" height=\"534\" \/><\/a><\/p>\n<p>Multiple inheritance is not 
widely used; fewer than 1%\u00a0of the classes inherit from more than one class.<\/p>\n<p><em>2-2-2 Virtual methods<\/em><\/p>\n<p>Let\u2019s search for all virtual methods defined in the source code:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm22.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2164\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm22.png\" alt=\"llvm22\" width=\"373\" height=\"489\" \/><\/a><\/p>\n<p>Many methods are virtual, and some of them are pure virtual:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm23.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2165\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm23.png\" alt=\"llvm23\" width=\"418\" height=\"499\" \/><\/a><\/p>\n<p>The OOP paradigm is widely used in the llvm\/clang\u00a0source code. What about the generic programming paradigm?<\/p>\n<p><b>2-3 Generic Programming<\/b><\/p>\n<p>C++ provides unique abilities to express the ideas of Generic Programming through templates. Templates provide a form of parametric polymorphism that allows the expression of generic algorithms and data structures. 
The instantiation mechanism of C++ templates ensures that when a generic algorithm or data structure is used, a fully-optimized and specialized version will be created and tailored for that particular use, allowing generic algorithms to be as efficient as their non-generic counterparts.<\/p>\n<p><em>\u00a02-3-1 Generic types:<\/em><\/p>\n<p>Let\u2019s search for all generic types defined in the source code:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm25.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2166\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm25.png\" alt=\"llvm25\" width=\"417\" height=\"522\" \/><\/a><\/p>\n<p>Many\u00a0types are defined as generic. Let\u2019s search for generic methods:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm27.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2167\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm27.png\" alt=\"llvm27\" width=\"417\" height=\"529\" \/><\/a><\/p>\n<p>Less than 1% of\u00a0methods are generic.<\/p>\n<p>To sum up, the llvm\/clang\u00a0source code mixes the three paradigms.<\/p>\n<h2>3- PODs to define the data model<\/h2>\n<p>In object-oriented programming,\u00a0plain old data (POD)\u00a0is a data structure that is represented only as a passive collection of field values (instance variables), without using object-oriented features. 
In computer science, this is known as a passive data structure.<\/p>\n<p>Let\u2019s search for the POD types in the\u00a0source code:<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm28.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2168\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm28.png\" alt=\"llvm28\" width=\"419\" height=\"524\" \/><\/a><\/p>\n<p>More than 1500\u00a0types are defined as POD types; many of them are used to define the compiler\u00a0data model.<\/p>\n<h2>4- Gang Of Four design patterns<\/h2>\n<p>Design Patterns are a software engineering concept describing recurring solutions to common problems in software design. The Gang of Four patterns are the most popular ones. Let\u2019s discover some of them used in the llvm\/clang\u00a0source code.<\/p>\n<p>4-1 Factory<\/p>\n<p>Using a factory is a good way to isolate instantiation logic and improve cohesion.\u00a0Here is the list of factories defined in the source code:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm291.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2172\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm291.png\" alt=\"llvm29\" width=\"413\" height=\"548\" \/><\/a><\/p>\n<p>And here\u2019s the list of the abstract ones:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm301.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2171\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm301.png\" alt=\"llvm30\" width=\"412\" height=\"319\" \/><\/a><\/p>\n<p><em>4-3 Observer<\/em><\/p>\n<p>The observer pattern is a software design pattern in which an object maintains a list of its dependents, called observers, and notifies them automatically of any state changes, usually by calling one of their methods.<\/p>\n<p>Only one 
observer is defined in the source code:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm31.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2173\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm31.png\" alt=\"llvm31\" width=\"416\" height=\"328\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><em>4-4\u00a0Visitor<\/em><\/p>\n<p>The visitor pattern is the recommended pattern when we need to traverse a structure and apply specific processing to each node of this structure.<\/p>\n<p>In the llvm\/clang source code, the visitor pattern is widely used:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm32.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2174\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm32.png\" alt=\"llvm32\" width=\"413\" height=\"445\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<h2>5- Coupling and Cohesion<\/h2>\n<p>5-1 Coupling<\/p>\n<p>Low coupling is desirable because a change in one area of an application will require fewer changes throughout the entire application. In the long run, this could save a lot of the time, effort, and cost associated with modifying and adding new features to an application.<\/p>\n<p>Low coupling can be achieved by using abstract classes or by using generic types and methods.<\/p>\n<p>Let\u2019s search for all abstract classes defined in the\u00a0source code:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm40.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2177\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm40.png\" alt=\"llvm40\" width=\"412\" height=\"423\" \/><\/a><\/p>\n<p>More than 280\u00a0types are declared as abstract. 
However, low coupling is also\u00a0enforced by using generic types and generic methods.<\/p>\n<p><strong>5-2 Cohesion<\/strong><\/p>\n<p>The single responsibility principle states that a class should not have more than one reason to change. Such a class is said to be cohesive. A high LCOM value generally pinpoints a poorly cohesive class. There are several LCOM metrics. The LCOM takes its values in the range [0-1]. The LCOM HS (HS stands for Henderson-Sellers) takes its values in the range [0-2]. Here is how to compute the LCOM metrics:<\/p>\n<blockquote><p>LCOM = 1 \u2013 sum(MF)\/(M*F)<br \/>\nLCOM HS = (M \u2013 sum(MF)\/F)\/(M-1)<\/p><\/blockquote>\n<p>Where:<\/p>\n<ul>\n<li>M is the number of methods in the class (both static and instance methods are counted, as well as constructors, property getters\/setters, and event add\/remove methods).<\/li>\n<li>F is the number of instance fields in the class.<\/li>\n<li>MF is the number of methods of the class accessing a particular instance field.<\/li>\n<li>Sum(MF) is the sum of MF over all instance fields of the class.<\/li>\n<\/ul>\n<p>The underlying idea behind these formulas can be stated as follows: a class is utterly cohesive if all its\u00a0methods use all its instance fields, which means that sum(MF)=M*F and then LCOM = 0 and LCOM HS = 0.<\/p>\n<p>An LCOM HS value higher than 1 should be considered alarming.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm42.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2178\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm42.png\" alt=\"llvm42\" width=\"416\" height=\"540\" \/><\/a><\/p>\n<p>235 classes are concerned; some of them\u00a0could perhaps be refactored to improve their cohesion.<\/p>\n<h2>6- Immutability, Purity and side effect<\/h2>\n<p><em>6-1 Immutable types<\/em><\/p>\n<p>Basically, an object is 
immutable if its state doesn\u2019t change once the object has been created. Consequently, a class is immutable if its instances are immutable.<\/p>\n<p>There is one important argument in favor of using immutable objects: it dramatically simplifies concurrent programming. Think about it: why is writing proper multithreaded programs a hard task? Because it is hard to synchronize threads\u2019 access to resources (objects or other OS resources). Why is it hard to synchronize these accesses? Because it is hard to guarantee that there won\u2019t be race conditions between the multiple write accesses and read accesses done by multiple threads on multiple objects. What if there are no more write accesses? In other words, what if the state of the objects accessed by threads doesn\u2019t change? There is no more need for synchronization!<\/p>\n<p>Another benefit of immutable classes is that they can never violate LSP (Liskov Substitution Principle). Here\u2019s a definition of LSP quoted from its wiki page:<\/p>\n<blockquote><p>Liskov\u2019s notion of a behavioral subtype defines a notion of substitutability for mutable objects; that is, if S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program (e.g., correctness).<\/p><\/blockquote>\n<p>Here\u2019s the list of immutable types defined in the source code:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm43.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2179\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm43.png\" alt=\"llvm43\" width=\"415\" height=\"523\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>6-2 Purity and side effects<\/p>\n<p>The primary benefit of immutable types comes from the fact that they\u00a0eliminate side-effects. 
I\u00a0couldn\u2019t\u00a0say it better than\u00a0<a href=\"http:\/\/blogs.msdn.com\/wesdyer\/archive\/2007\/03\/01\/immutability-purity-and-referential-transparency.aspx\" target=\"_blank\">Wes Dyer<\/a>\u00a0so I quote him:<\/p>\n<p><em>We\u00a0all know that generally it is not a good idea to use global variables.\u00a0\u00a0This is basically the extreme of exposing side-effects\u00a0(the global scope). Many of the programmers who don\u2019t use global variables\u00a0don\u2019t realize that the same principles apply to fields, properties, parameters,\u00a0and variables on a more limited scale: don\u2019t mutate them unless you have a good\u00a0reason.(\u2026)<\/em><\/p>\n<p><em>One\u00a0way to increase the reliability of a unit is to eliminate the side-effects.\u00a0This makes composing and integrating units together much easier and more\u00a0robust.\u00a0\u00a0Since they are side-effect free,\u00a0they always work the same no matter the environment.\u00a0\u00a0This is called referential transparency.<\/em><\/p>\n<p>Writing your functions\/methods without side effects \u2013 so they\u2019re\u00a0<em>pure functions, i.e. they do not mutate the object<\/em>\u00a0&#8211; makes it easier to reason about the correctness of your program.<\/p>\n<p>Here\u2019s the list of all methods without\u00a0side-effects:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm45.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2180\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm45.png\" alt=\"llvm45\" width=\"415\" height=\"529\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>More than 100 000 methods are pure.<\/p>\n<h2>7- Implementation quality<\/h2>\n<p>7-1 Too big methods<\/p>\n<p>Methods with many lines of code are not easy to maintain and understand. 
Let\u2019s search for methods with more than 60 lines.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM50.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2181\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/LLVM50.png\" alt=\"LLVM50\" width=\"413\" height=\"526\" \/><\/a><\/p>\n<p>The llvm\/clang\u00a0source code contains more than 100 000 methods, so less than 2% could be considered too big.<\/p>\n<p>7-2 Methods with many parameters<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm51.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2182\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm51.png\" alt=\"llvm51\" width=\"415\" height=\"539\" \/><\/a><\/p>\n<p>Few methods have more than 8 parameters.<\/p>\n<p>7-3 Methods with many local variables<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm55.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2183\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm55.png\" alt=\"llvm55\" width=\"413\" height=\"520\" \/><\/a><\/p>\n<p>Less than 1% of methods have many local variables.<\/p>\n<p>7-4 Methods too complex<\/p>\n<p>Many metrics exist to detect complex functions; the number of lines of code (NBLinesOfCode), the number of parameters and the number of local variables are the basic ones.<\/p>\n<p>There are other interesting metrics to detect complex functions:<\/p>\n<ul>\n<li>Cyclomatic complexity is a popular procedural software metric equal to the number of decisions that can be taken in a procedure.<\/li>\n<li>Nesting Depth\u00a0is a metric defined on methods that is equal to the maximum depth\u00a0of the most nested scope in a method body.<\/li>\n<li>Max Nested Loop\u00a0equals the maximum level of loop nesting in a function.<\/li>\n<\/ul>\n<p>The max value tolerated for these metrics depends more on the team 
choices; there are no standard values.<\/p>\n<p>Let\u2019s search for methods that could be considered as complex in the code base.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm56.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2184\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm56.png\" alt=\"llvm56\" width=\"411\" height=\"510\" \/><\/a><\/p>\n<p>Only 1.5% of methods are\u00a0candidates to be refactored to reduce their complexity.<\/p>\n<h4>7-5 Halstead\u00a0complexity<\/h4>\n<h4><a href=\"http:\/\/en.wikipedia.org\/wiki\/Halstead_complexity_measures\">Halstead complexity<\/a>\u00a0measures are software metrics introduced by Maurice Howard Halstead in 1977. Halstead made the observation that metrics of the software should reflect the implementation or expression of algorithms in different languages, but be independent of their execution on a specific platform. These metrics are therefore computed statically from the code.<\/h4>\n<p>Many metrics were introduced by Halstead. Let\u2019s take as an example the TimeToImplement one, which represents the time required to program a method, in seconds.<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm60.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2185\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm60.png\" alt=\"llvm60\" width=\"437\" height=\"523\" \/><\/a><\/p>\n<p>2690\u00a0methods require more than one hour to be implemented.<\/p>\n<h2>8- RTTI<\/h2>\n<p>RTTI refers to the ability of the system to report on the dynamic type of an object and to provide information about that type at runtime (as opposed to at compile time). However,\u00a0RTTI has become\u00a0controversial within the C++ community. 
\u00a0Many C++ developers choose not to use this mechanism.<\/p>\n<p>What about the llvm\/clang\u00a0development team?<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm62.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2186\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm62.png\" alt=\"llvm62\" width=\"425\" height=\"314\" \/><\/a><\/p>\n<p>No method uses the dynamic_cast keyword; the llvm\/clang\u00a0team chose not to use the RTTI mechanism.<\/p>\n<h2>9-\u00a0Exceptions<\/h2>\n<p>Exception handling is another controversial C++ feature. Many well-known open source C++ projects do not use it.<\/p>\n<p>Let\u2019s check whether any exception is thrown in the\u00a0source code.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm64.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2187\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm64.png\" alt=\"llvm64\" width=\"424\" height=\"286\" \/><\/a><\/p>\n<p>As with RTTI, the exception mechanism is not used.<\/p>\n<h2>10- Some statistics<\/h2>\n<p>10-1 Most popular types<\/p>\n<p>It\u2019s interesting to know the most used types in a project; indeed, these types must be well designed, implemented and tested, and any change to them could impact the whole project.<\/p>\n<p>We can find them using the\u00a0<em>TypesUsingMe<\/em>\u00a0metric:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm72.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2189\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm72.png\" alt=\"llvm72\" width=\"485\" height=\"514\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>However, there\u2019s another interesting metric to search for popular types: TypeRank.<\/p>\n<p>TypeRank values are computed by applying the Google PageRank algorithm on the graph of types\u2019 dependencies. 
A homothety of center 0.15 is applied so that the average TypeRank is 1.<\/p>\n<p>Types with high TypeRank should be more carefully tested because bugs in such types will likely be more catastrophic.<\/p>\n<p>Here are the most popular types according to the TypeRank metric:<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm70.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2188\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm70.png\" alt=\"llvm70\" width=\"429\" height=\"527\" \/><\/a><\/p>\n<p>10-2 Most popular methods<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm74.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2190\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm74.png\" alt=\"llvm74\" width=\"481\" height=\"505\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>10-3 Methods calling many other methods<\/p>\n<p>It\u2019s interesting to know\u00a0which methods call many other ones, as this could reveal a design problem in these methods.\u00a0In some cases, refactoring is needed to make them more readable and maintainable.<\/p>\n<p><a href=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm73.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2191\" src=\"http:\/\/www.codergears.com\/Blog\/wp-content\/uploads\/llvm73.png\" alt=\"llvm73\" width=\"485\" height=\"525\" \/><\/a><\/p>\n<h2>Summary<\/h2>\n<p>The LLVM\/Clang codebase is very well designed and implemented and, as with any other project, some refactoring could be done to improve it. In this post we discovered some minor possible changes to make in the source code. 
Don\u2019t hesitate to explore its source code to improve your C++ skills.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It\u2019s proven that Clang is a mature compiler For C and C++ as GCC and Microsoft compilers, but what makes it special is the fact that it\u2019s not just a compiler. It\u2019s also an infrastructure to build tools. Thanks to its\u00a0library based architecture which makes the reuse and integration of functionality provided more flexible\u00a0and easier &hellip; <a href=\"https:\/\/cppdepend.com\/blog\/lessons-to-learn-from-the-clangllvm-codebase\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Lessons to learn from the CLang\/LLVM codebase&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[25,3,26],"class_list":["post-92","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-clang","tag-cplusplus","tag-llvm"],"_links":{"self":[{"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/posts\/92","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/comments?post=92"}],"version-history":[{"count":3,"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/posts\/92\/revisions"}],"predecessor-version":[{"id":95,"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/posts\/92\/revisions\/95"}],"wp:attachment":[{"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/media?parent=92"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cppdepend.com\/blog\/wp-j
son\/wp\/v2\/categories?post=92"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cppdepend.com\/blog\/wp-json\/wp\/v2\/tags?post=92"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}