Chapter 4: Header Files and Project Structure

This chapter focuses less on new functionality and more on how our projects are structured, so that they stay clean, organized, and easy to edit. While this may not seem important, good project structure becomes key once your code grows in complexity.

Imagine you're working on a large database program that uses many functions. For this tutorial, I'll represent the file that contains the function main as main.c. If we put every function we needed directly into main.c, it would get very cluttered, making it really hard to understand what's going on! Additionally, if we wanted to use those functions elsewhere, we'd have to copy the code, making for very large files.

Instead, let's put some of those other functions in a separate file called src.c (to represent all of the source code that our main.c uses). This way, we can implement the functions we need in this file, with the program actually calling them from main.c (in real-life projects, you may have hundreds of source files!). In addition, if any other code needs those functions, it can get them from src.c without having to worry about accidentally copying over a main function that it didn't want.

Now comes the tricky part: how does main.c access the functions in src.c? To understand this, we must first understand what a function declaration is. Previously, we declared and defined functions in one step with the function definition. Function declarations allow us to declare a function before actually defining it. They have the syntax below; notice how it looks just like the first line of our function definition, with the only difference being a semicolon to indicate the end of the statement (this is also sometimes called the function prototype). Note that in order to call a function, code only needs to see the function declaration, not the definition:

rtype function_name( ptype0 pname0, ptype1 pname1, ... );

We will then create a header file named src.h, which will have a function declaration for each function defined in src.c that we want outside code to know about. The .h suffix is used to indicate that this is a header file, and only contains declarations (not definitions) of functions (as well as possibly other things that outside code might want, such as relevant constants). An example of what this might look like can be seen below:

src.c

// Definition of averager function
int avg( int src1, int src2 )
{
	return ( ( src1 + src2 ) / 2 );
}

src.h

// Declaration of averager function
int avg( int src1, int src2 );

Lastly, to use these functions in main.c, we will use the #include directive. This is a preprocessor directive (think of it as an instruction that is carried out just before the code is compiled) that blindly copies the entire contents of whatever we include into the spot where we put the directive (src.c also includes src.h, although this is not shown above). Using our example above, main.c might then look something like this:

main.c

// Include averager function
#include "src.h"
 
int main( void )
{
  int source1 = 4;
  int source2 = 22;

  int average = avg( source1, source2 );
   
  return 0;
}

We can see how this can help separate our code into more manageable segments and organize our project. For example, one source file could hold functions for calculating what time it is, another could hold functions for feeding your pets, and we could use them both in main.c to feed your pets at a specific time.

Some of you may wonder "Hey, why don't we just include the function definitions in main.c, instead of only the declarations and having to include the definitions at a later time?" One conceptual reason is to have a clean definition of the interface of the functions - how outside code is supposed to access and use the code inside. If we only had the definitions, a programmer who wanted to use the functions inside of src.c would have to comb through all of the implementation details just to find out how to call them - yuck! Instead, with src.h, we have all the information we need to know how to use the functions inside of src.c without ever having to look inside of it. This is an example of abstraction - purposefully hiding details that you don't need to know at a given step, which is critical for large projects. If you tried to keep every single implementation of a function in your head while writing code, you might just go insane from the complexity. Instead, the interface allows us to know what we need to without having to worry about the implementation (this also allows us to really easily use the code of others without needing to know how they did it, enabling collaboration).

There is also a technical reason for it; it has to do with how our code is compiled. I'll give a simple overview here:

All of our source files get compiled (with any header files they include) into object files (src.c becomes src.o, for example), which in turn get linked together to form our final executable. Notice how if we make a change to src.c, as long as the interface in src.h stays the same, we only need to recompile src.c into src.o and link again. However, if other files included the definitions from src.c instead of the interface, not only would the object files get really large (due to all the copy-pasting of the definitions), but the linker would also complain about finding the same function defined in several object files, and if there was a bug in src.c, we'd have to recompile every object file that included it, not just one - super tedious!

Lastly, you might notice that as a project grows, the same header can end up included more than once in a single source file (for example, main.c might include src.h directly and also include another header that itself includes src.h). Repeating a function declaration is harmless, but headers often contain other things, such as type definitions, that the compiler rejects if it sees them twice in the same file. To avoid this, we use include guards. These are preprocessor directives that surround the contents of a header to make sure that they are only ever included once in each source file. They take the form of preprocessor conditionals - we can modify our src.h from above to show this (this would be the final version that we use in our code):

src.h

#ifndef SRC
#define SRC

// Declaration of averager function
int avg( int src1, int src2 );

#endif // SRC

We can see that the conditionals rely on a name (here, it is SRC - by convention, it is the name of the header file and usually the path to the main directory, although spaces and slashes must be replaced with underscores - see the practice for a more concrete example). The first time the preprocessor comes across this while processing a source file, it will see that SRC is yet to be defined, so it will define it and include the declarations. Any subsequent time it encounters this code in the same source file (through other includes), SRC will have already been defined, so it skips everything up to the #endif, leaving only one copy of the declarations in that file. Each source file is processed separately, so every source file that includes src.h still gets its own copy.

printf and C standard libraries

Alright - we now know enough to also talk about printf, a very common function in C for printing to the command line (where you run the code). Notice how in our past code when we've used printf, we've used the include directive to include stdio.h - this is where the printf function is declared so that we are able to use it! (The name stands for Standard I/O, and the header declares functions like printf that relate to input to and output from the program.)

printf takes in one primary input, some text surrounded by double quotes (this is known as a string, which we'll cover more later)! It will then print that string to the terminal you are running the code from.

In addition to just text, printf strings can contain two other things. The first is escape sequences. These are sequences of characters beginning with a backslash (\). The backslash escapes the characters - makes them mean something other than what they normally do. The most common of these is \n, which is used to insert a new line in the output. If you look all the way back at our Hello, World program, you'll see this character at the end of our output - this is how the terminal knew to start the next input on the line after the output! (You should now be able to fully understand this code, by the way.) If you're curious about more escape sequences, there are many resources.

The other acceptable inputs are format specifiers. These are similar to escape sequences in that they escape characters, but begin with the percent sign (%) instead of a backslash. Instead of giving the characters that follow a different meaning, they allow us to print out values from our code. Each type has a format specifier (possibly more than one, which can specify different formats to print the value, as the name suggests), and the format specifier indicates how to print that data (otherwise, printf has no clue how to interpret just 1's and 0's). For the type int, we most commonly use the format specifier %d, which stands for decimal - it prints the value as a decimal number, as you usually expect to see it. To tell printf which values to replace the format specifiers with, we include them as additional inputs to the printf function, in the order that they appear in the text:

int test1 = 3;
int test2 = 9;

printf("%d is less than %d", test1, test2);

Note that if you want to print the actual characters \ or %, you can escape them as well, giving them the meaning of a text character instead of an escape character! Both of these are escaped by themselves; you would type \\ or %%, respectively.

The last thing to cover is how our final executable knows the implementation of printf. This is where standard libraries come in. Each programming language usually has some standard libraries - code that is given to you as part of the programming language that may be useful to use. stdio is part of C's Standard Library, and declares all of the I/O functions; the already-compiled implementations are linked into our executable for us when it is built. By including different header files, you can use various functions that the C Standard Library contains (without needing to know how they're implemented!). You'll notice that the include directive looked a little bit different for standard libraries than for our own header files - we surround our own header files with double quotes when including them, and standard library files with angle brackets (<>), solely to give hints to the compiler about where they'd be located (part of our code or the library code) when it searches for the header files. If you're curious, there are many more header files that declare many more functions, the most common being stdlib.h (such as for generating random ints with rand(), which returns a pseudo-random int from 0 to RAND_MAX, a constant also defined in there). In addition, many (if not all) embedded devices also come with their own header files for functions that use the hardware, such as controlling GPIOs or sending messages.

Great - now that we know how we can use functions in different files, let's put it in action!

Practice

If you haven't already, clone the practice repository by opening a terminal in your desired directory and typing

git clone git@github.com:cornell-c2s2/c_cxx_training.git

Once inside this directory, we'll switch over to the code for this chapter by checking out the appropriate branch:

git checkout chpt4

Now that we have some files that contain main and some that don't, we'll need to distinguish between them. Moving forward, all of our files that have main will be in app as normal, but all of our other source files will be located in a new directory named src. Looking into src, you'll see two source/header pairs: gcd and madd. gcd contains a function for calculating the greatest common divisor of two ints, and madd contains a function to multiply two ints and add a third. For gcd, you only need to implement the function, but for madd, you must create both the function and the declaration in madd.h from scratch.

You need not do anything in app for this one - it contains a file named gcd_madd.c that has a main function to demonstrate the other functions, as well as show how we can include those functions. Once you've implemented the other code, you can build this program by going back up to the main directory and typing

make gcd_madd

Our built file should now be in the build directory (along with all of our object files). We can now run our code with

build/gcd_madd # MacOS/Linux
build\gcd_madd # Windows

You should see some expected outputs. The GCD of 18 and 24 is 6, the GCD of 27 and 51 is 3, and (27 * 4) + 51 is 159. Hopefully you got this - great job! We can now move on to considering how these types are actually implemented in hardware <insert mind blown emoji>
