Exploring SQLite Codebase: Improve C++ Skills

16 years after its first checkin, SQLite is the most widely deployed database engine in the world. An open source project such as this is a good candidate for learning how to make your code easy to understand and to maintain.

Let’s discover some facts about the SQLite code base, beginning with the following code snippet:

[Image: a function from the SQLite source code]

Here are some remarks concerning this function:

  • The function is declared as static.
  • The function returns an error code.
  • The function has few parameters.
  • The function exits as early as possible.
  • Assertions are used to check some conditions.
  • No global variables are used.
  • The variable naming is easy to understand.
  • The function is short.
  • No extra comments in the body.
  • The function body is well indented.
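The traits listed above can be sketched in a few lines of C. This is not code from SQLite: the DemoBuf type, the DEMO_* error codes, and demoBufAppend are hypothetical names, only loosely modeled on the style described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes, loosely modeled on an "OK or error" convention. */
#define DEMO_OK     0
#define DEMO_ERROR  1
#define DEMO_FULL   2

/* Hypothetical fixed-capacity buffer; not an SQLite type. */
typedef struct DemoBuf {
  char *aData;    /* Storage owned by the caller */
  size_t nUsed;   /* Bytes currently in use */
  size_t nAlloc;  /* Total capacity in bytes */
} DemoBuf;

/* Append one byte to pBuf. Returns DEMO_OK on success or an error code. */
static int demoBufAppend(DemoBuf *pBuf, char c){
  assert( pBuf!=NULL );                 /* Programming error if violated */
  assert( pBuf->nUsed<=pBuf->nAlloc );  /* Internal invariant */

  if( pBuf->aData==NULL ) return DEMO_ERROR;         /* Early exit */
  if( pBuf->nUsed==pBuf->nAlloc ) return DEMO_FULL;  /* Early exit */

  pBuf->aData[pBuf->nUsed++] = c;
  return DEMO_OK;
}
```

The function is static, takes two parameters, touches no global state, and reports every outcome through its return value.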

If we navigate through the SQLite source code, we notice how consistent the implementation is. The same best practice rules are applied to each function.

Here are some best practices to learn from the SQLite codebase:

Use structs to store your data model

In C, functions operate on variables to do their work. These variables can be:

  • Static variables.
  • Global variables.
  • Local variables.
  • Variables from structs.

Each project has its own data model, which can be used by many source files. Global variables are one way to share it, but not a good one; grouping the data into structs is the recommended approach.
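As a minimal sketch of that idea, the state below lives in a struct that callers pass around explicitly, so two independent instances can coexist. DemoCache and its functions are hypothetical names chosen for illustration, not SQLite types.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical data model for a one-entry key/value cache. */
typedef struct DemoCache {
  char zKey[16];   /* Last key stored, NUL-terminated */
  int iValue;      /* Value associated with zKey */
  int nHit;        /* Number of successful lookups */
} DemoCache;

/* Store a key/value pair. All state lives in *p, not in globals. */
void demoCachePut(DemoCache *p, const char *zKey, int iValue){
  assert( p!=NULL && zKey!=NULL );
  strncpy(p->zKey, zKey, sizeof(p->zKey)-1);
  p->zKey[sizeof(p->zKey)-1] = '\0';   /* Always terminate */
  p->iValue = iValue;
}

/* Look up zKey. Returns 1 and writes *piOut on a hit, 0 on a miss. */
int demoCacheGet(DemoCache *p, const char *zKey, int *piOut){
  assert( p!=NULL && zKey!=NULL && piOut!=NULL );
  if( strcmp(p->zKey, zKey)!=0 ) return 0;
  *piOut = p->iValue;
  p->nHit++;
  return 1;
}
```

With global variables, a second cache would require a second set of globals; with the struct, it only requires a second DemoCache value.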

Let’s use CQLinq and CppDepend to search for the structs that are defined:

[Screenshot: CQLinq query result listing the structs defined in SQLite]

Many structs are used to specify the data model.

Keep functions short and sweet

Here’s some advice about the length of functions from the Linux kernel coding style page:

Functions should be short and sweet, and do just one thing.  They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
as we all know), and do one thing and do that well.

The maximum length of a function is inversely proportional to the
complexity and indentation level of that function.  So, if you have a
conceptually simple function that is just one long (but simple)
case-statement, where you have to do lots of small things for a lot of
different cases, it's OK to have a longer function.

Let’s search for functions with fewer than 30 lines of code:

[Screenshot: CQLinq query result for functions with fewer than 30 lines of code]

More than 90% of functions have fewer than 30 lines of code.

Encapsulation

Encapsulation is the hiding of functions and data that are internal to an implementation. In C, encapsulation is achieved with the static keyword: a function or variable declared static at file level is visible only inside its own source file. These entities are called file-scope functions and variables.
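A short sketch of file-scope encapsulation follows. The names are hypothetical; the point is only that the static variable and the static helper cannot be referenced from another source file, while the two non-static functions form the public interface.

```c
#include <assert.h>

/* File-scope state: invisible to other translation units. */
static int nextId = 1;

/* File-scope helper: can only be called from this source file. */
static int demoIsValidId(int id){
  return id>0 && id<nextId;
}

/* Public entry point: hand out the next identifier. */
int demoAllocId(void){
  assert( nextId>0 );   /* Invariant of the hidden counter */
  return nextId++;
}

/* Public entry point: returns 1 if id was handed out earlier, else 0. */
int demoCheckId(int id){
  return demoIsValidId(id);
}
```

Because nextId is hidden, no other file can put the counter into an inconsistent state.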

Let’s search for all static functions by executing the following CQLinq query:

[Screenshot: CQLinq query result listing the static functions]

As we can observe, many functions are declared as static.

Number of function parameters

Functions where NbParameters > 8 might be painful to call and might degrade performance. An alternative is to provide a structure dedicated to passing the arguments.
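The structure-based alternative might look like the sketch below. DemoRect and demoOuterArea are hypothetical, and the struct is kept small for brevity; the same pattern applies when a function would otherwise take more than eight arguments.

```c
#include <assert.h>

/* Hypothetical argument bundle replacing a long parameter list. */
typedef struct DemoRect {
  int x, y;           /* Top-left corner */
  int width, height;  /* Inner size in pixels */
  int border;         /* Border thickness on each side */
} DemoRect;

/* Area of the rectangle including its border. Takes one pointer
** instead of five separate integer arguments. */
static int demoOuterArea(const DemoRect *p){
  assert( p!=0 );
  assert( p->width>=0 && p->height>=0 && p->border>=0 );
  return (p->width + 2*p->border) * (p->height + 2*p->border);
}
```

Adding a field later does not change the signature of any function that takes the struct, which is one reason this pattern scales better than a growing argument list.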

[Screenshot: CQLinq query result for functions with more than 8 parameters]

Only a few functions have more than 8 parameters.

Number of local variables

Functions where NbVariables is higher than 8 are hard to understand and maintain. Functions where NbVariables is higher than 15 are extremely complex and should be split into smaller functions (unless they are automatically generated by a tool).

[Screenshot: CQLinq query result for functions with more than 15 local variables]

Only a few functions have more than 15 local variables.

Avoid defining complex functions

Many metrics exist to detect complex functions. NbLinesOfCode, the number of parameters, and the number of local variables are the basic ones.

There are other interesting metrics to detect complex functions:

  • Cyclomatic complexity is a popular procedural software metric based on the number of decisions that can be taken in a procedure.
  • Nesting depth is the maximum depth of the most deeply nested scope in a function body.
  • Max nested loop is the maximum level of loop nesting in a function.

The maximum value tolerated for these metrics is mostly a team decision, as there are no standard values.
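To make these metrics concrete, here is a small hypothetical function with its decision points counted by hand in the comments. Tools differ in exactly what they count, so the numbers are an illustration under the stated convention rather than what any particular analyzer would report.

```c
#include <assert.h>

/* Count the positive even numbers in a[0..n-1].
**
** Decision points, ignoring the assert: the for loop condition (1),
** the if statement (2), and the && operator (3). Many tools report
** cyclomatic complexity as decisions + 1, which gives 4 here.
** Maximum nesting depth is 2 (the if inside the for).
** Maximum loop nesting is 1. */
static int demoCountPositiveEven(const int *a, int n){
  int i;
  int nFound = 0;
  assert( a!=0 || n==0 );
  for(i=0; i<n; i++){
    if( a[i]>0 && a[i]%2==0 ){
      nFound++;
    }
  }
  return nFound;
}
```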

Let’s search for possible functions to be refactored:

[Screenshot: CQLinq query result for functions that are candidates for refactoring]

Only very few functions could be considered complex.

Be Const Correct

C provides the const keyword to mark data that must not be modified. Applied to a pointer parameter, it tells both the compiler and the caller that the function only reads the object it is given. Using const in all the right places is called “const correctness.” It’s hard at first, but using const really tightens up your coding style.
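A minimal sketch of const-correct signatures, using hypothetical helper names: the input string is declared const char *, the output buffer is a plain char *, and the compiler rejects any accidental write through the input pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of zIn, but never more than nMax. The const
** qualifier promises callers that the string is only read. */
static size_t demoBoundedLen(const char *zIn, size_t nMax){
  size_t n = 0;
  assert( zIn!=NULL );
  while( n<nMax && zIn[n]!='\0' ) n++;
  return n;
}

/* Copy zIn into zOut in upper case (ASCII letters only), truncating to
** fit nOut bytes including the terminator. zOut is writable, zIn is not. */
static void demoUpperCopy(char *zOut, size_t nOut, const char *zIn){
  size_t i, n;
  assert( zOut!=NULL && nOut>0 );
  n = demoBoundedLen(zIn, nOut-1);
  for(i=0; i<n; i++){
    char c = zIn[i];
    zOut[i] = (c>='a' && c<='z') ? (char)(c - 'a' + 'A') : c;
  }
  zOut[n] = '\0';
}
```

A string literal can be passed as zIn without a cast, and a line such as zIn[0] = 'x' inside either function would fail to compile.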

Let’s search for functions having at least one const parameter:

[Screenshot: CQLinq query result for functions with at least one const parameter]

Function coupling

Functions that call many other functions are very difficult to understand and maintain. It’s recommended to minimize the efferent coupling of your functions.

In SQLite, very few functions have a high efferent coupling:

[Screenshot: CQLinq query result for functions with high efferent coupling]

If you can exit a function early, you should.

Early exits out of a function, especially through guard clauses at the top of the function, are preferred since they simplify the logic further down.

In the SQLite source code, this practice is applied in almost all functions.
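The difference is easiest to see side by side. Both hypothetical functions below compute the same result and return 0 on success or 1 on failure; only the control flow differs.

```c
#include <assert.h>

/* Nested style: the real work is buried three levels deep. */
static int demoDivideNested(const int *pNum, const int *pDen, int *pOut){
  int rc = 1;   /* Assume failure until proven otherwise */
  assert( pOut!=0 );
  if( pNum ){
    if( pDen ){
      if( *pDen!=0 ){
        *pOut = *pNum / *pDen;
        rc = 0;
      }
    }
  }
  return rc;
}

/* Guard-clause style: each precondition exits immediately, so the
** main computation sits at the top indentation level. */
static int demoDivideGuarded(const int *pNum, const int *pDen, int *pOut){
  assert( pOut!=0 );
  if( pNum==0 ) return 1;
  if( pDen==0 ) return 1;
  if( *pDen==0 ) return 1;

  *pOut = *pNum / *pDen;
  return 0;
}
```

In the guarded version, a reader who reaches the division already knows that every precondition holds, without tracking which braces are still open.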

Conclusion

Exploring well-known open source projects is always a good way to improve your programming skills. There is no need to download and build the project; you can simply browse the code on GitHub.