Detecting obfuscated names in a C/C++ project

How many times have you come across code like this?

[Image: code sample with obfuscated names]
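As a hypothetical illustration of the kind of code meant here (not taken from a real project), the first function below uses only single-letter names, and the second does the same work with descriptive ones. All function and parameter names are invented for this example.

```cpp
// Hypothetical example of obfuscated names: nothing tells the reader what
// f, a, b, c, d or e stand for.
double f(double a, double b, int c) {
    double d = a;
    for (int e = 0; e < c; ++e) {
        d += d * b;
    }
    return d;
}

// The same logic with descriptive names (also hypothetical).
double compoundBalance(double principal, double ratePerPeriod, int periods) {
    double balance = principal;
    for (int period = 0; period < periods; ++period) {
        balance += balance * ratePerPeriod;
    }
    return balance;
}
```

Both functions compute the same value. Only the second one can be read without reverse-engineering the loop.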

Maybe in some cases it’s not a big issue to have such code. But if developers make a habit of it, it will cost the company a lot: each newcomer who needs to debug or add a new feature will spend a lot of time understanding the existing codebase.

How can we detect these obfuscated names in order to refactor them?

1- Using clang-query and the AST Matchers

Clang’s LibASTMatchers is a powerful library to match nodes of the AST and execute code that uses the matched nodes. Combined with LibTooling, LibASTMatchers helps to write code-to-code transformation tools or query tools.

The clang-query tool is based on LibASTMatchers and provides an easy way to query the AST nodes of a specific source file.

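For example, if we want to detect the functions whose name contains only one character, we can execute this AST matcher:

functionDecl(matchesName("::[a-zA-Z]$"))

Note that matchesName applies the regular expression to the fully qualified name, which clang prefixes with "::", so the pattern anchors on "::" rather than on "^".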
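To get a feel for which names a single-letter pattern accepts, here is a minimal sketch that mimics the check outside of clang. It assumes that matchesName searches the regex within the fully qualified name prefixed with "::", and it uses std::regex in place of clang's own regex engine. For a pattern this simple the two should behave the same. The helper name hasSingleLetterName is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns true when the last component of a qualified name (for example
// "ns::g") is a single ASCII letter. The name is prefixed with "::" first,
// mirroring the form the matcher is assumed to see.
bool hasSingleLetterName(const std::string& qualifiedName) {
    static const std::regex pattern("::[a-zA-Z]$");
    const std::string fullName = "::" + qualifiedName;
    return std::regex_search(fullName, pattern);
}
```

With this helper, "f" and "ns::g" are flagged, while "foo" and "a::foo" are not. Under the same assumption, a pattern that puts "^" directly before the letter can never match, because every full name starts with "::".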

AST Matchers are a powerful way to query the codebase. Here’s a quick definition from the AST Matchers documentation:

AST matchers are predicates on nodes in the AST. Matchers are created by calling creator functions that allow building up a tree of matchers, where inner matchers are used to make the match more specific.

For example, to create a matcher that matches all class or union declarations in the AST of a translation unit, you can call recordDecl(). To narrow the match down, for example to find all class or union declarations with the name “Foo”, insert a hasName matcher: the call recordDecl(hasName("Foo")) returns a matcher that matches classes or unions that are named “Foo”, in any namespace. By default, matchers that accept multiple inner matchers use an implicit allOf(). This allows further narrowing down the match, for example to match all classes that are derived from “Bar”: recordDecl(hasName("Foo"), isDerivedFrom("Bar")).
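To make the quoted matchers concrete, here is a small hypothetical translation unit, with comments on which declarations each matcher would be expected to select. The class names come from the quoted documentation, except Baz and the namespace other, which are assumptions added for contrast.

```cpp
#include <type_traits>

class Bar {};

// Named Foo and derived from Bar: expected to be matched by both
// recordDecl(hasName("Foo")) and
// recordDecl(hasName("Foo"), isDerivedFrom("Bar")).
class Foo : public Bar {};

namespace other {
// Named Foo, in another namespace, not derived from Bar: expected to be
// matched by recordDecl(hasName("Foo")) only.
class Foo {};
}

// Derived from Bar but not named Foo: expected to be matched by neither.
class Baz : public Bar {};

// Compile-time checks of the inheritance relations described above.
static_assert(std::is_base_of<Bar, Foo>::value, "Foo derives from Bar");
static_assert(!std::is_base_of<Bar, other::Foo>::value,
              "other::Foo does not derive from Bar");
```

Running each matcher against a file like this in clang-query is a quick way to see how adding an inner matcher narrows the result set.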

2- Using CppDepend and CQLinq

CppDepend comes with CQLinq, a code query language to query your codebase.

CQLinq defines a few predefined domains to query on, including Types, Methods, Fields, Namespaces, and Projects.

These domains enumerate not only all code elements of the queried codebase, but also all third-party code elements it uses (for example, the type string and all the methods and fields of string that the codebase uses).

The syntax is as simple as:

from m in Methods where m.NbLinesOfCode > 30 select m

To detect the functions whose names contain only one character, we can execute the following CQLinq query:

[Image: the CQLinq query]

Conclusion:

Do not underestimate the importance of symbol names: they help make the codebase easy to understand and maintain. It is well worth automating this check in your build process to detect badly named symbols and improve the quality of your codebase.