Hacking on Clang to demystify the temporary objects

It is sometimes necessary for the C++ compiler to create temporary objects. They are used during:

  • Reference initialization.
  • Evaluation of expressions including standard type conversions.
  • Argument passing.
  • Function returns.
  • Evaluation of the throw expression.

For non-trivial classes, the creation and destruction of temporary objects can be expensive in terms of processing time and memory usage. In this case, you should minimize their introduction. The C++ compiler does eliminate some temporary objects, but it cannot eliminate all of them.

It’s not always easy to detect where the temporary objects are introduced, as Herb Sutter explains in his sample. The compiler has this information, but is it possible to get the places where the temporary objects are introduced from the compiler? And if not, is it easy to modify the compiler so that it reports them?

In our case we will use Clang. It is very flexible and provides many ways to customize its behavior. Indeed, a major design concept of Clang is its library-based architecture. In this design, the various parts of the front end are cleanly divided into separate libraries, which can then be mixed and matched for different needs and uses. In addition, the library-based approach encourages good interfaces and makes it easier for new developers to get involved (because they only need to understand small pieces of the big picture).

The Clang compiler has three phases:

  • The front end, which parses the source code, checks it for errors, and builds a language-specific Abstract Syntax Tree (AST) to represent the input code.
  • The optimizer, which optimizes the intermediate representation (LLVM IR) that the front end generates from the AST.
  • The back end, which generates the final code to be executed by the machine. It depends on the target.

In our case we will focus on the front end phase. The goal is to get the Abstract Syntax Tree (AST) for some source code and check whether it reports any useful data about the temporary objects.

Let’s explore the AST of this minimal source code:

[Image ast2: the minimal source code, test.cpp]

To generate the AST we can execute the clang front end parser using the -cc1 switch.

clang -cc1 -ast-dump test.cpp

And here’s the AST generated for the GetTest function:

[Image ast1: AST dump of the GetTest function]

In this AST, two pieces of information are related to the temporary objects: nrvo and elidable.

Named Return Value Optimization (NRVO) is a compiler optimization technique that eliminates the temporary object created to hold a function’s return value. NRVO removes the redundant copy constructor and destructor calls for a stack-based return value, and thus improves overall performance. For more details, you can refer to its wiki page.

Copy elision refers to a compiler optimization technique that eliminates unnecessary copying of objects. For more details, you can refer to its wiki page.

It’s interesting that Clang tells us where NRVO is applied, but it does not explicitly report where the temporary objects are created. Let’s go inside the AST dumper source code and try to report them.

How does ASTDumper work?

The compiler parses a program and represents the parsed program as an abstract syntax tree (AST). The AST has many different kinds of nodes, such as Assignment, Variable Reference, and Arithmetic Expression nodes. After generating the AST, Clang invokes front end actions that traverse it and process it. ASTDumpAction is one of them: it dumps the AST to the console.

ASTDumper is declared like this:

  class ASTDumper
      : public ConstDeclVisitor<ASTDumper>, public ConstStmtVisitor<ASTDumper>,
        public ConstCommentVisitor<ASTDumper>

The visitor pattern is the recommended pattern when we need to traverse a structure and apply specific processing to each node of this structure.

Here are some of the methods invoked when the AST is traversed. Each one is related to a specific kind of AST node.

void VisitNamespaceDecl(const NamespaceDecl *D);
void VisitUsingDirectiveDecl(const UsingDirectiveDecl *D);
void VisitNamespaceAliasDecl(const NamespaceAliasDecl *D);
void VisitTypeAliasDecl(const TypeAliasDecl *D);
void VisitTypeAliasTemplateDecl(const TypeAliasTemplateDecl *D);
void VisitCXXRecordDecl(const CXXRecordDecl *D);
void VisitStaticAssertDecl(const StaticAssertDecl *D);

Hacking the Clang AST dumper

Our goal is to report the temporary objects created, so we have to track object creation and identify which AST node represents the construction of an object.

In Clang, VisitCXXConstructExpr is invoked when a C++ object must be created. Here’s its implementation:

[Image ast5: the original implementation of VisitCXXConstructExpr]

As we can see, there’s no test for whether the created object is temporary. The good news is that CXXConstructExpr provides the method isTemporaryObject, which determines whether the result of this expression is a temporary object of the given class type.

Let’s change the implementation to add another condition:

[Image ast6: the modified implementation of VisitCXXConstructExpr]

Here’s the new AST printed after the modification:

[Image ast3: AST dump after the modification]

After adding a few lines of code, the ASTDumper now explicitly reports where the temporary objects are created.

Conclusion

The LLVM/Clang duo is not just a compiler but a powerful infrastructure for developing your own C/C++/Objective-C tools. It’s not difficult to understand how it works, and it’s easy to customize as you like. Don’t hesitate to download the Clang source code, make some modifications and rebuild it; it will especially help students learn how compilers work.