A picture is worth a thousand words: Visualize your C/C++ Projects

A picture is worth a thousand words” is an English idiom. It refers to the notion that a complex idea can be conveyed with just a single still image or that an image of a subject conveys its meaning or essence more effectively than a description does.

This idiom could also be applied in software programming. Indeed you can easilly understand a mini project when exploring its source code. However a big project become complex and not easy to understand.  In such cases it’s better to visualize the source code using graphs and diagrams to assist the developers understanding the source code.

Let’s discover some useful diagrams  provided by CppDepend to understand the code base.

I-Treemap diagram:

Treemapping is a method for displaying tree-structured data by using nested rectangles. The tree structure used in CppDepend treemap is the usual code hierarchy:

  • C/C++ projects contain namespaces,
  • Namespaces contain types,
  • Types contains methods and fields.

With treemap rectangles represent code elements. The option level determines the kind of code element represented by unit rectangles. The option level can take the 5 values: project, namespace, type, method and field. The two screenshots below shows the same code base represented through type level on left, and namespace level on right. 



The option Code Metric of the treemap determines the size of rectangles. For example if level is set to type and metric is set tonumber of Lines of Code, each unit rectangle represents a type, and the size of a unit rectangle is proportional to the number of Lines of Code of the corresponding type. 

If a CQLinq query is currently edited, the set of code elements matched by the query is shown on the treemap as a set of blue rectangles. In the screenshot below, a CQLinq query matches the 200 largest methods. Here the treemap level is method and the metric is number of lines of code. As expected, we can see that blue rectangles represent the 200 largest unit rectangles of the treemap. 

The screenshot below also shows that the currently pointed code element (here the project XML) is shown as a red rectangles on the treemap. 



Common Scenarios

By choosing an appropriate combination of metric and level values, the Metric View helps see patterns that would be hard to spot with other ways.

Too Big, Too Complex

CppDepend provides many code metrics to spot too big and too complex code. Method with too many parameters, too many variables, too many lines of code, too high cyclomatic complexity etc… should be banished. The same way types with too many lines of code should be banished too.

On the screenshot below, the treemap level is set to type and the metric is number of lines of code. Large rectangles represent large types of the code base. Hovering a rectangle with the mouse indicates the type metric values, here the type number of lines of code. Clearly, code treemaping helps not only spotting too large and too complex code elements, but also, it helps comparing their respective size and complexity.

Flaws Localization

In the screenshot below we are asking with a CQLinq query, for methods with too many parameters and too many variables. Blue rectangles show matched methods on the treemap. Here the interesting point is that some matched methods seem to be grouped. Because treemap is a hierarchical view, if a set of code elements is grouped, it means that the code elements belong to the same parent. And indeed, here the 12 methods grouped belong to the same type.

Spotting the parent types of these 12 methods would have been hard without treemaping. Now the code quality reviewer can focus his attention on this region that seems to contain more flaws than other part of the code base.

Top-Down Code Exploration

The CppDepend treemap metric view supports zoom-in and zoom-out. This makes easy to zoom on a particular project, namespace or class. This can be especially useful to explore and compare the volume of components in terms of lines of code.

By no mean productivity should be measured in terms of lines of code. However counting lines of code has been proven to be a useful metric to accurately estimate software, given a certain context of an organization. The code base features are somehow partitioned into assemblies, namespaces and types. With treemaping, these artifacts become rectangles that sit side-by-side. Rectangles areas, hence features weight, can be compared visually. Being able to periodically explore code size through treemaping is a unique way to build an accurate sense of feature weight and feature cost.

Do the experiment to visualize treemap on a code base you know well, and you’ll be surprised to discover unexpected feature size.

Code Structure Observations

Choosing to use the Metric View with a metric different than volume (lines of code, number of parameters for methods…) or complexity, can reveal interesting observations.

The ranking metric is a code metric that measure the popularity of a type or a method in a code base. Using treemaping with the ranking metric gives a clear view of where popular types are declared. This quickly gives information about what is important in the code base.

The screenshot below shows types of the Microsoft DLR (Dynamic language Runtime) code base treemaped with the ranking metric. The treemap indicates that CallSite and Expression are popular concepts of the DLR, and this is indeed the case.

The same kind of interesting Code Structure Observations can be made with code metrics such as Afferent/Efferent coupling or Level.

 II- Dependency Graph

CppDepend offers a wide range of facilities to help user exploring an Existing Code Architecture using the dependency grapj. here are some most popular Code Exploration scenarios:

Call Graph

CppDepend can generate any call graph you might need with a two steps procedure.

  • First: Ask for direct and indirect callers/callees of a type, a field, a method, a namespace or a project. The effect is that the following CQLinq query is generated to match all callers or callees asked.
  • Notice that, in the CQLinq query result, the metric DepthOfIsUsing/DepthOfIsUsedBy shows depth of usage (1 means direct, 2 means using a direct user etc…). The CQLinq query can easily be modified to only match indirect callers/callees with a certain condition on depth of usage.Notice also that callers/callees asked are not necessarily of the same kind of the concerned code element. For example here we ask for methods that are using directly or indirectly a type.
  • Second: Once the CQLinq query matches the set of callers/callees that the user wishes, the set of matches result can be exported to the Dependency Graph. This has for effect to show the call graph wished.

Class Inheritance Graph

To display a Class of Inheritance Graph, the same two steps procedure shown in the precedent section (on generating a Call Graph) must be applied.

  • First: Generate a CQLinq query asking for the set of classes that inherits from a particular class (or that implement a particular interface). Here, the following CQLinq query is generated:
  • Second: Export the result of the CQLinq query to the Dependency Graph to show the inheritance graph wished.

Coupling Graph

It might be needed to know which code elements exactly are involved in a particular dependency. Especially when one needs to anticipate the impact of a structural change. In the screenshoot below, the CppDepend Info panel describes a coupling between 2 projects.

From pointing a cell in the dependency matrix, it says that X types of an project A are using Y types of an project B. Notice that you can change the option Weight on Cell to # methods# members or # namespaces, if you need to know the coupling with something else than types.

Just left clicking the matrix cell shows the coupling graph below.

A coupling graph can also be generated from an edge in the dependency graph. Here, you can adjust the option Edge Thickness to something else than # type.

Path Graph

If you wish to dig into a path or a dependency cycle between 2 code elements, the first thing to do is to show the dependency matrix with the option Weight on Cells: Direct & indirect depth of use.

Matrix blue and green cells will represent paths while black cells will represent dependency cycles. For example, here, the Info panel tells us that there is a path of minimal length 7 between the 2 types involved.

Just left clicking the cell shows the path graph below.

All Paths Graph

In certain situations, you’ll need to know about all paths from a code element A to a code element B. For example, here, the Info panel tells us that there is a path of minimal length 2 between the 2 types involved.

Finally exporting to the graph the 12 types matched by the CQLinq query, shows all paths from A to B.

Cycle Graph

As we explained in the previous section, to deal with dependency cycle graphs, the first thing to do is to show the dependency matrix with the option Weight on Cells: Direct & indirect depth of use. Black cells then represent cycles.

For example, here, the Info panel tells us that there is a dependency cycle of minimal length 5 between the 2 types involved.

Just left clicking the cell shows the cycle graph below.

We’d like to warn that obtaining a clean ’rounded’ dependency cycle as the one shown above, is actually more an exceptional situation than a rule.

Often, exhibiting a cycle will end up in a not ’rounded’ graph as the one shown below. In this example, the minimal length of a cycle between the 2 types involved (in yellow) is 12. Count the number of edges crossed from one yellow type to the other one, and you’ll get 12. You’ll see that some edges will be counted more than once.

III – Dependency Structure Matrix 

Large Graph visualized with Dependency Structure Matrix

Here, we’d like to underline the fact that when the dependency Graph becomes unreadable, it is worth switching to the dependency Matrix.
Both dependency Graph and dependency Matrix co-exist because:

  • Dependency Graph is intuitive but becomes unreadable as soon as there are too many edges between nodes.
  • Dependency Matrix requires time to be understood, but once mastered, you’ll see that the Dependency Matrix is much more efficient than the Dependency Graph to explore an existing architecture. More information on the Dependency Matrix readability can be found in Identify Code Structure Patterns at a Glance

To illustrate the point, find below the same dependencies between 77 namespaces shown through Dependency Graph and Dependency Matrix.

The DSM (Dependency Structure Matrix) is a compact way to represent and navigate across dependencies between components. For most engineers, talking of dependencies means talking about something that looks like that:

DSM is used to represent the same information than a graph.

  • Matrix headers’ elements represent graph boxes
  • Matrix non-empty cells correspond to graph arrows.

As a consequence, in the snapshot below, the coupling from Net to Foundation is represented by a non empty cell in the matrix and by an arrow in the graph.

Why using two different ways, graph and DSM, to represent the same information? Because there is a trade-off:

  • Graph is more intuitive but can be totally not understandable when the numbers of nodes and edges grow (a few dozens boxes can be enough to produce a graph too complex)
  • DSM is less intuitive but can be very efficient to represent large and complex graph. We say that DSM scalescompare to graph.

Once one understood DSM principles, typically one prefers DSM over graph to represent dependencies. This is mainly because DSM offers the possibility to spot structural patterns at a glance. This is explained in the second half of the current document.

CppDepend offers Context-Sensitive Help to educate the user about what he sees on DSM. CppDepend’s DSM relies on a simple 3 coloring scheme for DSM cell: Blue, Green and Black. When hovering a row or a column with the mouse, the Context-Sensitive Help explains the meaning of this coloring scheme:

A non-empty DSM Cell contain a number. This number represent the strengths of the coupling represented by the cell. The coupling strength can be expressed in terms of number of members/methods/fields/types or namespaces involved in the coupling, depending on the actual value of the option Weight on Cells. In addition to the Context-Sensitive Help, the DSM offers as well a Info Panel that explains coupling with a plain-english description:

CppDepend’s DSM comes with numerous options to try:

  • It has numerous facilities to dig into dependency exploration (a parent column/row can be opened, cells can be expanded…)
  • It can deal with squared symmetric DSM and rectangular non-symmetric DSM
  • Horizontal and Vertical headers can be bound, to constantly have a squared symmetric matrix
  • It comes with the option Indirect usage, where cell shows direct and indirect usage
  • The vertical header can contains tier code elements

It is advised to experience all these features by yourself, by analyzing dependencies into your code base.

Identify Code Structure Patterns on Matrix

As explained in the introduction, DSM comes with the particularity to offer easy identification of popular Code Structure Patterns. Let’s present most common scenarios:

Layered Code

One pattern that is made obvious by a DSM is layered structure (i.e acyclic structure). When the matrix is triangular, with all blue cells in the lower-left triangle and all green cells in the upper-right triangle, then it shows that the structure is perfectly layered. In other words, the structure doesn’t contain any dependency cycle.

On the right part of the snapshot, the same layered structure is represented with a graph. All arrows have the same left to right direction. The problem with graph, is that the graph layout doesn’t scale. Here, we can barely see the big picture of the structure. If the number of boxes would be multiplied by 2, the graph would be completely un-readable. On the other side, the DSM representation wouldn’t be affected; we say that DSM scales better than graph.

Side note: Interestingly enough, most of graph layout algorithms rely on the fact that a graph is acyclic. To compute layout of a graph with cycles, these algorithms temporarily discard some dependencies to deal with a layered graph, and then append the discarded dependencies at the last step of the computation. 

Dependency Cycle

If a structure contains a cycle, the cycle is displayed by a red square on the DSM. We can see that inside the red square, green and blue cells are mixed across the diagonal. There are also some black cells that represent mutual direct usage (i.e A is using B and B is using A).

The CppDepend’s DSM comes with the unique option Indirect Dependency. An indirect dependency between A and B means that A is using something, that is using something, that is using something … that is using B. Below is shown the same DSM with a cycle but in indirect mode. We can see that the red square is filled up with only black cells. It just means that given any element A and B in the cycle, A and B are indirectly and mutually dependent.

Here is the same structure represented with a graph. The red arrow shows that several elements are mutually dependent. But the graph is not of any help to highlight all elements involved in the parent cycle.

Notice that in CppDepend, we provided a button to highlight cycles in the DSM (if any). If the structure is layered, then this button has for effect to triangularize the matrix and to keep non-empty cells as closed as possible to the diagonal.

High Cohesion – Low Coupling

The idea of high-cohesion (inside a component) / low-coupling (between components) is popular nowadays. But if one cannot measure and visualize dependencies, it is hard to get a concrete evaluation of cohesion and coupling. DSM is good at showing high cohesion. In the DSM below, an obvious squared aggregate around the diagonal is displayed. It means that elements involved in the square have a high cohesion: they are strongly dependent on each other although. Moreover, we can see that they are layered since there is no cycle. They are certainly candidate to be grouped into a parent artifact (such as a namespace or an assembly).

On the other hand, the fact that most cells around the square are empty advocate for low-coupling between elements of the square and other elements.

In the DSM below, we can see 2 components with high cohesion (upper and lower square) and a pretty low coupling between them.

While refactoring, having such an indicator can be pretty useful to know if there are opportunities to split coarse components into several more fine-grained components.

Too Many Responsibilities

The Single Responsibility Principle (SRP) is getting popular amongst software architects community nowadays. The principle states that: a class shouldn’t have more than one reason to change. Another way to interpret the SRP is that a class shouldn’t use too many different other types. If we extend the idea at other level (assemblies, namespaces and method), certainly, if a code element is using dozens of other different code elements (at same level), it has too many responsibilities. Often the term God class or God component is used to qualify such piece of code.

DSM can help pinpoint code elements with too many responsibilities. Such code element is represented by columns with many blue cells and by rows with many green cells. The DSM below exposes this phenomenon.

Popular Code Elements

A popular code element is used by many other code elements. Popular code elements are unavoidable (think of theString class for example) but a popular code element is not a flaw. It just means that in every code base, there are some central concepts represented with popular classes

A popular code element is represented by columns with many green cells and by rows with many blue cells. The DSM below highlights a popular code element.

Something to notice is that when one is keeping its code structure perfectly layered, popular components are naturally kept at low-level. Indeed, a popular component cannot de-facto use many things, because popular component are low-level, they cannot use something at a higher level. This would create a dependency from low-level to high-level and this would break the acyclic property of the structure.

Mutually Dependent

You can see the coupling between 2 components by right clicking a non-empty cell, and select the menu Open this dependency.

If the opened cell was black as in the snapshot above (i.e if A and B are mutually dependent) then the resulting rectangular matrix will contains both green and blue cells (and eventually black cells as well) as in the snapshot below.

In this situation, you’ll often notice a deficit of green or blue cells (3 blue cells for 1 green cell here). It is because even if 2 code elements are mutually dependent, there often exists a natural level order between them. For example, consider the System.Threading namespaces and the System.String class. They are mutually dependent; they both rely on each other. But the matrix shows that Threading is much more dependent on String than the opposite (there are much more blue cells than green cells). This confirms the intuition that Threading is upper level than String.

Learn basic “C” coding rules from open source projects

Every project has its own style guide: a set of conventions about how to write code for that project. Some managers choose a basic coding rules, others prefer very advanced ones and for many projects there is no coding rules, and each developer uses his own style.

It is much easier to understand a large codebase when all the source code is in a consistent style.

Many resources exist talking about the better coding rules to adopt, we can learn good coding rules from:

  • Reading a book or a magazine.
  • Web sites.
  • From a collegue.
  • Doing a training.

Another more interesting approach is to study a known and mature open source project to discover how developers implement their source code. In the “C” World, the Linux kernel could be a good candidate.

For the beginner or even the intermediate C developers, the Linux kernel is maybe not easy to go inside, however the goal is not necessarily to contribute to its source code but to explore how it’s implemented.

Let’ take as example a function implementation from the Linux source code

linux11

The code looks very clean, indeed the function

  • Has only few lines of code.
  • The signature is well defined.
  • It’s well commented.
  • It’s well indented.
  • The variable names are very clear.

The same  function could be implemented by another developer like this

linux12

The coding style has a big impact in the source code readability, investing some hours to train developers, and doing periodically a code review is always good to make the code easy to maintain and evolve.

Let’s go inside the linux kernel source code using CppDepend and discover some basic coding rules adopted by their developers.

Modularity

Modularity is a software design technique that increases the extent to which software is composed from separate parts , you can manage and maintain modular code easily.

For procedural language like C where no logical artifacts like namespace, component or class does not exist, we can modularize by using directories and files.

Here are some possible scenarios :

  • Put all the souce files in one directory
  • Isolate files related to a module or a sub module  into a specific directory.

In case of the Linux kernel, directories and sub directories are used to modularize the kernel source code.

linux15

Encapsulation

Encapsulation  is the hiding of functions and data which are internal to an implementation.  In C, encapsulation is performed by using the key word static . These entities are called file-scope functions and variables.

Let’s search for all static functions by executing the following CQLinq query

linux17

We can use the Metric view to have a good idea how many functions are concerned. In the Metric View, the code base is represented through a Treemap. Treemapping is a method for displaying tree-structured data by using nested rectangles. The tree structure used in a CppDepend treemap is the usual code hierarchy:

  • Projects contains directories.
  • Directories contains files.
  • Files contains struects, functions and variables.

The treemap view provides a useful way to represent the result of a CQLinq request, so we can visually see the types concerned by the request.

linux2

As we can observe many functions are declared as static.

Let’s search now for the static fields:

linux3

The same remark as functions, many variables are declared as static.

In the Linux kernel source code the encapsulation is used whenever the functions and variables must be private to the file scope.

Use structs to store your data model

In C programing the functions uses variables to acheive their treatments, theses variables could be:

  • Static variables.
  • Global variables.
  • Local variables
  • Variables from structs.

Each project has it’s data model which could be used by many source files, using global variables is a solution but not the good one, using structs to group data is more recommended.

Let’s search for global variables with a primitive type:

linux4

only very few variables are concerned, and maybe we can group some of them into structs, like (elfcorehdr_addr and elfcorehdr_size) or (pm_freezing and pm_nosig_freezing).

Let function be short and sweet

Here’s from the linux coding style web page, an advice about the length of functions:

Functions should be short and sweet, and do just one thing.  They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
as we all know), and do one thing and do that well.

The maximum length of a function is inversely proportional to the
complexity and indentation level of that function.  So, if you have a
conceptually simple function that is just one long (but simple)
case-statement, where you have to do lots of small things for a lot of
different cases, it's OK to have a longer function.

 Let’s search for functions where the number of lines of code is more than 30

linux14

Only few methods has more than 30 lines of code.

Function Number of parameters

Functions where NbParameters > 8 might be painful to call  and might degrade performance. Another alternative is to provide  a structure dedicated to handle arguments passing.

linux7

only 2 methods has more than 8 parameters.

Number of local variables

Methods where NbVariables is higher than 8 are hard to understand and maintain. Methods where NbVariables is higher than 15 are extremely complex and should be split in smaller methods (except if they are automatically generated by a tool).

linux9

only 5 functions has more than 15 local variables.

Avoid defining complex functions

Many metrics exist to detect complex functions, NBLinesOfCode,Number of parameters and number of local variables are the basic ones.

There are other interesting metrics to detect complex functions:

  • Cyclomatic complexity is a popular procedural software metric equal to the number of decisions that can be taken in a procedure.
  • Nesting Depth is a metric defined on methods that is relative to the maximum depth of the more nested scope in a method body.
  • Max Nested loop is equals the maximum level of loop nesting in a function.

The max value tolerated for these metrics depends more on the team choices, there’s no standard values.

Let’s search for functions candidate to be refactored:

linux8

only very few functions could be considered as complex.

Naming convention

There’s no standard for the naming convention, each project managers could choose what they think it’s better, however what’s very important is to respect the chosen convention to have an homegenous naming.

For example in case of Linux, the structs must began with a lower case, and we can check if it’s true for the whole kernel source code, let’s execute the following query:

linux5

 

Only 4 structs began with “_” instead of a lower case letter.

 Indentation

The indentation is very useful to make the code easy to read, here’s from the linux coding style web page  the motivations behind the indentation:

Rationale: The whole idea behind indentation is to clearly define where
a block of control starts and ends.  Especially when you've been looking
at your screen for 20 straight hours, you'll find it a lot easier to see
how the indentation works if you have large indentations.

Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen.  The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.

 Conclusion

Exploring some known open source projects is always good to elevate your programming skills, no need to download and build the project, you can just discover the code from GitHub for example.

 

Learn from Folly source code the new C++11 features.

Two years ago Facebook released their C++ library named Folly , it’s a large collection of reusable C++ library components that internally at Facebook are used extensively.

But many mature C++ open source libraries exist, why introduce another one ? Here’s the motivations from their website behind  its utility:

Folly (acronymed loosely after Facebook Open Source Library) is a library of C++11 components designed with practicality and efficiency in mind. It complements (as opposed to competing against) offerings such as Boost and of course std. In fact, we embark on defining our own component only when something we need is either not available, or does not meet the needed performance profile.

Continue reading “Learn from Folly source code the new C++11 features.”

Learn from the Doxygen source code how to optimize the memory usage.

When the processes running on your machine attempt to allocate more memory than your system has available, the kernel begins to swap memory pages to and from the disk. This is done in order to free up sufficient physical memory to meet the RAM allocation requirements of the requestor.

Excessive use of swapping is called thrashing and is undesirable because it lowers overall system performance, mainly because hard drives are far slower than RAM.
Continue reading “Learn from the Doxygen source code how to optimize the memory usage.”

Inside the Unreal Engine 4.5 source code

The Unreal Engine is a game engine developed by Epic Games, first showcased in the 1998 first-person shooter game Unreal. Although primarily developed for first-person shooters, it has been successfully used in a variety of other genres, including stealthMMORPGs, and other RPGs.

Its code is written in C++ and  it’s used by many game developers today. Its source code is available from GitHub and it’s free for students. Many amazing games were developed using this engine, it permits to produce very realistic rendering like this one. Continue reading “Inside the Unreal Engine 4.5 source code”

Doom3 is the proof that “keep it simple” works.

If you search on the web for the best C++ source code. The Doom3 source code is mentioned many times, with testimonials  like this one.

I spent a bit of time going through the Doom3 source code. It’s probably the cleanest and nicest looking code I’ve ever seen.

Doom 3 is a video game developed by id Software and published by Activision.The game was a  commercial success for id Software; with more than 3.5 million copies of the game were sold. Continue reading “Doom3 is the proof that “keep it simple” works.”

Make the most of the C/C++ static analysis tools

Static code analysis is the process of detecting flaws in software’s source code.  The static analysis tools are useful to detect common coding mistakes; here are some benefits from using them:

  • Make the code source more readable and maintainable.
  • Prevent unexpected behavior in execution.
  • Optimize the execution.
  • Make the code more secure.

Many C/C++ static analysis tools exist right there, each one focus on a specific area and has its advantages, we can enumerate:

  • CppCheck 
  • Clang Analyzer
  • Visual C++ Analyzer
  • VERA++
  • Goanna
  • Viva64
  • PCLint

Continue reading “Make the most of the C/C++ static analysis tools”

SQLite: The art of keep it simple.

16 years after its first checkin, SQLite is the most widely deployed database engine in the world. such open source project is a good candidate to learn how to make your code easy to understand and to maintain.

Let’s discover some facts about the SQLite code base, for that let’s begin with the following code snippet: Continue reading “SQLite: The art of keep it simple.”