The refactoring overhead of C++ mechanisms compared with C

A few years ago, Linus Torvalds criticized C++ for encouraging:

inefficient abstracted programming models where two years down the road you notice that some abstraction wasn’t very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app.

Many C++ developers do not appreciate this opinion. However, we must admit that each language facility comes with a price, and it is better to know that price than to assume that our favorite programming language is perfect.

Keep in mind that no language, technology or OS is perfect, but knowing its limitations and drawbacks helps us use it well.

For this purpose we will analyze the Git source code and discover some design facts. Git is a distributed revision control and source code management (SCM) system with an emphasis on speed. Git was initially designed and developed by Linus Torvalds for Linux kernel development; it has since been adopted by many other projects.

Let’s compare the refactoring cost of some C++ OOP mechanisms with that of their C counterparts, as found in the Git codebase.

Modularity: Namespace vs Directory

Modularity is a software design technique that increases the extent to which software is composed of separate parts. Modular code is easier to manage and maintain.

We can modularize a project with two approaches:

  • Physically: by using directories and files. This modularity is provided by the operating system and can be applied to any language.
  • Logically: by using namespaces, components and classes. This technique depends on the language’s capabilities.

When we develop in C, the code is structured by using directories to isolate the modules. Here is the dependency graph between some of Git’s directories.

In C++, however, we can use namespaces to modularize the code. Namespaces are provided by the language, so the modules in the previous graph could be expressed as namespaces instead of directories.

Pros and cons of the C++ approach:

Easy to understand: the logical approach is better here because the modularity is expressed by language constructs, so just by reading the code we can tell which module a code element belongs to.

Managing changes: a good design generally needs many iterations. With the physical approach, the impact of a design change is usually much more limited than with the logical one: we only need to move a function or variable from one file to another, or a file from one directory to another. In C++, the same change can touch a lot of code, because the logical modularity is written into the code itself and the code has to be modified.
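
To make this trade-off concrete, here is a minimal sketch. The namespaces git::refs and git::remote and all three functions are hypothetical names chosen for illustration; they are not taken from the Git source.

```cpp
#include <cassert>
#include <string>

// Hypothetical modules: these names are illustrative, not Git's real layout.
namespace git {
namespace refs {
    // The owning module is part of the qualified name: git::refs::full_name.
    inline std::string full_name(const std::string& branch) {
        assert(!branch.empty());  // precondition: a branch needs a name
        return "refs/heads/" + branch;
    }
}  // namespace refs

namespace remote {
    inline std::string tracking_ref(const std::string& remote_name,
                                    const std::string& branch) {
        return "refs/remotes/" + remote_name + "/" + branch;
    }

    // This call site names the refs module explicitly. If full_name moved to
    // another namespace, this line (and every similar one) would need editing.
    inline std::string local_ref(const std::string& branch) {
        return refs::full_name(branch);
    }
}  // namespace remote
}  // namespace git
```

Reading `refs::full_name` tells us immediately which module the function belongs to, which is the readability gain. The refactoring cost is the mirror image: the module name is repeated at each call site, whereas a C function moved between files keeps the same unqualified name.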

Encapsulation: Class vs File

In C++, encapsulation is the process of combining data and functions into a single unit called a class. With encapsulation, code outside the class cannot access private data directly; the data is accessible only through the class’s member functions.

In C we can also achieve encapsulation, again with a physical approach like the one described in the modularity section: a file containing functions and the data they use plays the role of a class, and we can limit the accessibility of functions and variables with the “static” keyword.

Git uses this technique to hide functions and variables. To see this, let’s search for static functions:

from m in Methods where m.IsStatic select m

The treemap gives a good idea of which code elements are matched by a CQLinq query; the blue rectangles represent the result.


Almost all functions are declared static, so they are visible only in the translation unit where they are declared. The same remark applies to variables:

from f in Fields where f.IsStatic select f


Easy to understand: using the C++ encapsulation mechanism improves the understanding and visibility of the code.

Managing changes: if we have to change where a variable or function is encapsulated, it can be very easy in C, but in C++ it can impact a lot of code.
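
The two styles of encapsulation can be sketched side by side. This is only an illustration: hit_count, next_hit, record_hit and HitCounter are hypothetical names, not code from Git.

```cpp
#include <cassert>

// File-level encapsulation (C style): `static` gives these names internal
// linkage, so they are invisible outside this translation unit.
static int hit_count = 0;
static int next_hit() { return ++hit_count; }

// Only this function would be declared in the (hypothetical) header file.
int record_hit() { return next_hit(); }

// Class-level encapsulation (C++ style): the data is private and reachable
// only through member functions, which can also enforce invariants.
class HitCounter {
public:
    int record() { return ++count_; }
    int count() const { return count_; }
    void reset(int value) {
        assert(value >= 0);  // invariant: the counter is never negative
        count_ = value;
    }
private:
    int count_ = 0;
};
```

In the file-based version, hit_count can be moved to another file together with its functions and no caller of record_hit changes. In the class-based version, the boundary is visible in the code, but moving count_ elsewhere means changing the class and possibly every user of it.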

Polymorphism vs Selection idiom

Polymorphism means that the same code, operation or object behaves differently in different contexts.

This technique is widely used in C++ projects, but what about C?

In procedural languages, the selection technique is used instead, with keywords such as “switch”, “if” or even “goto”, but this technique tends to increase the cyclomatic complexity of the code.

Let’s search for complex functions in the Git source code.


Even though Git is well developed, many of its functions could be considered complex. This is due to heavy use of control-flow statements such as “if”, “switch” or “goto”. With C++, however, we can use polymorphism to reduce the complexity of the code.

Easy to understand: polymorphism isolates a specific behavior in its own class, which improves the visibility and the cohesion of the code.

Managing changes: adding another behavior with polymorphism can mean adding another class, whereas with the selection idiom you only add another case to the switch statement.
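
Here is a small sketch of both idioms for the same task. The ObjectKind enum, the Object class hierarchy and both describe functions are hypothetical and are not how Git is implemented.

```cpp
#include <cassert>
#include <string>

// Hypothetical object kinds, used only for illustration.
enum class ObjectKind { Blob, Tree, Commit };

// Selection idiom (C style): every new kind adds a branch here, which raises
// the cyclomatic complexity of this one function.
std::string describe_by_switch(ObjectKind kind) {
    switch (kind) {
        case ObjectKind::Blob:   return "blob";
        case ObjectKind::Tree:   return "tree";
        case ObjectKind::Commit: return "commit";
    }
    assert(false && "unhandled ObjectKind");
    return "unknown";
}

// Polymorphism (C++ style): each behavior lives in its own class, and a new
// kind means a new class rather than a new branch.
class Object {
public:
    virtual ~Object() = default;
    virtual std::string describe() const = 0;
};

class Blob : public Object {
public:
    std::string describe() const override { return "blob"; }
};

class Tree : public Object {
public:
    std::string describe() const override { return "tree"; }
};

class Commit : public Object {
public:
    std::string describe() const override { return "commit"; }
};

// The caller contains no branching: the virtual call selects the behavior.
std::string describe_by_dispatch(const Object& object) {
    return object.describe();
}
```

Adding a fourth kind costs one enumerator and one case in the first version, and a whole new class in the second. In exchange, describe_by_dispatch never grows, while describe_by_switch gains a branch each time.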

Inheritance vs Composition

Git mainly uses structs to define the data manipulated by its functions. Let’s search for all the structs used:

from t in Types where t.IsStructure select t


What’s interesting is that almost all data is isolated inside structs. To verify this, we can search for all non-const public variables that are of a primitive type and not inside a struct:

from f in Fields where f.IsPublic && f.IsPrimitiveType
&& !f.IsStatic && !f.IsConst
select f

Only a few variables match, which is a good point for Git’s design.

So what about extending a struct? In C we can use composition, as in the case of the “remote” struct, which many other structs reference.

In C++, however, we can also use inheritance to extend structs; for example, the known_remote struct could inherit from remote.

Easy to understand: inheritance can improve the understanding of the data, but we have to be careful to use it only for an “is a” relationship.

Managing changes: inheritance implies high coupling, so any change to a base type can impact a lot of code.
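
The difference can be sketched with simplified stand-ins for the structs mentioned above. The fields name, url and reachable, the two known_remote variants and the label function are assumptions made for this example and do not reflect Git’s actual definitions.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for a `remote` struct; the fields are hypothetical.
struct remote {
    std::string name;
    std::string url;
};

// Composition (available in C and C++): the extended struct "has a" remote.
struct known_remote_composed {
    remote base;
    bool reachable;
};

// Inheritance (C++ only): the extended struct "is a" remote, so it can be
// passed wherever a remote is expected, but it is tied to remote's layout.
struct known_remote_derived : remote {
    bool reachable = false;
};

// A function written against the base struct.
std::string label(const remote& r) {
    assert(!r.name.empty());  // precondition: a remote needs a name
    return r.name + " <" + r.url + ">";
}
```

With composition the caller writes `label(k.base)` and the relationship is explicit; with inheritance the caller writes `label(k)` directly. The second form reads better, but every user of known_remote_derived now depends on the members of remote as well.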

Conclusion:

C++ offers better possibilities for writing beautiful, well-structured code, but this comes with a price: any change or refactoring can be difficult.

But refactoring requires understanding the existing code before making changes. C programs are more difficult to understand, but easier to change.

How can we limit the impact of changes in C++?

A good way to limit the impact of changes is to use design patterns, and especially the principles of loose coupling and high cohesion, to confine changes to a specific place. Irrlicht, as explained in the previous post, is a good example of loose coupling.
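
As one possible illustration of loose coupling, the sketch below makes a client depend on a small abstract interface rather than on a concrete class. Logger, MemoryLogger and Importer are hypothetical names; this is not taken from Irrlicht or Git.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical abstract interface: the only thing the client knows about.
class Logger {
public:
    virtual ~Logger() = default;
    virtual void log(const std::string& message) = 0;
};

// One concrete implementation. It can be changed or replaced without
// touching any code that depends only on Logger.
class MemoryLogger : public Logger {
public:
    void log(const std::string& message) override { lines_.push_back(message); }
    const std::vector<std::string>& lines() const { return lines_; }
private:
    std::vector<std::string> lines_;
};

// The client is coupled to the interface, not to MemoryLogger.
class Importer {
public:
    explicit Importer(Logger& logger) : logger_(logger) {}
    void run(int files) {
        assert(files >= 0);  // precondition: a file count cannot be negative
        logger_.log("imported " + std::to_string(files) + " files");
    }
private:
    Logger& logger_;
};
```

If MemoryLogger is later rewritten or swapped for another implementation, Importer does not change, so the refactoring stays confined to one place.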