Great - we've covered most of the basic functionality of C! However, we have one last thing to cover, which deals with how we allocate memory, and that provides us more flexibility in our code.

The Stack (briefly)

Before we move on, we must first introduce the notion of the stack. The stack is a place in memory where all of our data from our program live - not the instructions for the code, but the actual data created. For example, say I had the following line in main:

int test1 = 7;

When executing this, we would place the value 7 on the stack, and our code would know that the value in that space in memory corresponds to the value for test1. If we were to re-assign test1:

test1 = 5;

our code would simply replace the value in that space in memory (7) with 5.

We also consider each function to have a space on the stack just for its data, called a stack frame. This helps define our notion of scope - as soon as a function returns, all of the data that was created as part of its execution (in its stack frame) is taken off of the stack, or deallocated (we'll think of the return value as being placed on a spot on the stack that was pre-allocated when the function was called, and therefore isn't deallocated when the function returns). This means that once a function returns, we can't access any of the data it created during its execution.

Introducing malloc

To serve as motivation, let's imagine that we wanted to create an abstraction for an arrays, and have a single function create them for us. In theory, we would have a function that takes in the size of the array we want, create the array, and then return it. However, we'd run into a couple issues if we tried to do this with what we know now:

  • As soon as we return, the array we created would be deallocated, and we couldn't access it anymore from outside the function
  • We can't currently create arrays of variable size - the size has to be a constant known at compile-time

We can solve both of these problems at once using dynamic memory allocation. This type of allocation allocates memory on the heap, a region of memory separate from the stack. Since there are no notions of stack frames here, heap data isn't deallocated when a function returns. This allows us to create space in memory in one function, have that function return, and still be able to access that memory.

We allocate memory on the heap with a new function called malloc (memory allocate), defined in the C Standard Library header file stdlib.hmalloc takes in one parameter, the number of bytes to allocate, and returns a pointer to the allocated space. The parameter's type is size_t, a new type we haven't discussed, but is essentially a new version of an unsigned integer used to represent sizes. The pointer that malloc returns is of type void*, which indicates not that it points to nothing, but rather that it doesn't yet know what type it points to (as malloc doesn't know the type of data that you want to put there, only the amount). We can cast this to the pointer type we want, but C will also do this type conversion for us implicitly.

However, as you might've noticed, our types can vary in size, and it's often a pain to remember exactly how large each type is. Because of this, we often use the sizeof operator, which takes in a type and gives the amount of space that the type takes up in type size_t (sizeof looks like a function, but we can't pass types into functions, and it's included in the base C language instead of in a header file, so we consider it an operator instead).

Putting all of this together, the following code demonstrates how we might allocate space for an int on the heap:

int* int_prt = malloc( sizeof( int ) );

int_ptr now serves as a normal pointer to an int, except the int it points to is no longer on the stack...it's on the heap!

Note: There are a couple of variants of malloc, such as calloc (allocate memory and clear by setting to all 0's) and realloc (reallocate memory that was previously allocated in a different size), but we will stick to malloc for this training

This also shows us how we could go about allocating arrays of variable size! Suppose we wanted to allocate an array of ints of size arr_size (an int that stores the size we want). We could then allocate the space for that array on the heap like so:

int* arr_ptr = malloc( ( size_t )arr_size * sizeof( int ) );

Here, you can see we're allocating an amount of memory equal to the size of an int times arr_size, therefore giving us all the space we need (we explicitly also case arr_size to type size_t - many C build systems will do this for you, but ours throws an error just to make sure you do it explicitly and are aware of the type). Now, arr_ptr acts like a normal array type/pointer to an array, and we can use it as normal to store ints in an array.

The importance of free

Alright, how does malloc work? In the underlying code, malloc makes a call to the operating system, which then handles the memory allocation and tells malloc where the memory is that it's allocated, which malloc then returns to us in the form of a pointer. Once the OS has marked the memory as being used by malloc, it makes sense that this should also mean that no other program gets to use it while we have it. This brings up an important question, however; when does the OS know we're done with the memory?

If we just used malloc...never. We would continuously have the memory allocated, and it could never be used for something else. In 99% of operating systems, the OS would take care of deallocating the memory once a program finishes; however, it still isn't a good practice to rely on it. In addition, say you had a long-running program, such as a web browser. If we didn't properly deallocate the memory, then it's likely that the memory allocations would build up to the point where there was no more memory left to use. Note that in this case (no more memory left, or if memory allocation failed in general), malloc will return NULL, so it's important to check what malloc returned (NULL or not) before using the pointer.

To deallocate memory, we use the free function, also defined in stdlib.hfree takes in a pointer as an argument, and deallocates the memory that the pointer was associated with (with no return value). When using dynamic memory allocation, the general pattern is to get a pointer for the memory using malloc, do some stuff with the memory, and once you're done, deallocate that memory by passing the same pointer to free. (Note that passing NULL to free is defined to not do anything)

double* test_ptr = malloc( sizeof( double ) );

// Do some stuff with our allocated memory

// Once we're done, free the memory
free( test_ptr );

In good programs, for every malloc called, a programmer should also be able to identify the corresponding free that deallocates the memory. If you can't, then it's likely you've screwed up somewhere (smile). Allocating memory and never deallocating it is known as a memory leak, and should always be thought about and avoided. While you may not see immediate repercussions due to the help of the OS (see above), it is never good practice. For long-running programs, this is especially critical (as you may run out of memory like described above), but still remains an ever-present issue (We talked about how browsers are susceptible to this - look up "<your browser> memory leak", and you will most likely get results)

Note: It is very common to get lots of SEGFAULTs and memory leaks when starting out - don't be discouraged! These are very hard to debug, but some useful tools are GDB and Valgrind, if you wish to explore them. Your fellow programmers are always here to help(smile)

Stack Overflow

Most of us have heard of "stack overflow" from the website, but don't really know what it means (smile). However, the notion of our stack and heap give us some intuition behind this. The stack and the heap, while considered separate, still live in the same memory space. We try to keep them separate, so we start them at opposite ends; the stack starts at the top of our memory space and grows down, whereas the heap starts at the bottom and grows up (by custom). However, you can imagine that if we allocate too much memory, the stack and the heap might collide (AHHHH!!). For the heap, this isn't as bad of a problem; malloc will simply return NULL, and the program has to figure out how to deal with that. However, it's very difficult for the stack to deal with this. We're allocating more stack space any time we create a function or new data, so it would be nigh impossible for the program to continue. This causes a stack overflow error - the stack has gone beyond the bounds that it was supposed to (different than the data overflow issues we discussed earlier). This usually just causes a program crash, but also stresses the importance of freeing memory - if we never free memory, another issue that we could run into is that the amount of allocated memory grows so large that it runs into the stack and causes stack overflow. Therefore, we should make sure to free memory not just so that other programs can use it, but so our own program doesn't crash.

Great - you now know all about the wonders of programming in C, congrats! Instead of a practice, I wanted to include a larger programming exercise (call it a project, if you will) to hopefully help tie in everything you've learned.

Project - Doubly-Linked Lists

In this project, you'll be implementing a doubly-linked list. As opposed to a singly-linked list from earlier, where nodes only contain pointers to the next node, nodes in doubly-linked lists have pointers to the next node and previous node. Storing data in nodes allows us to cleanly control the amount of data we store, without having to free and reallocate space every time we want more or less:

In addition to the nodes that hold the data of the list, we often have a data structure to represent the entire list (in this project, DLinkedList). This would hold some overall data about the list (such as its size), as well as pointers to the nodes at the front and end of the list. Those nodes' prev_node and next_node, respectively, are often set to NULL to allow us to know when we've hit the end of the list. Below is a complete diagram of a doubly-linked list with 3 elements:

While we don't have to dynamically allocate our DLinkedList structure, we'll need to have dynamic allocation for Nodes, as we don't know how many we'll need! In addition, for data structures, we often use NULL to represent when something isn't there. To ensure we can do this, as well as to make sure that we free all of the Nodes we allocate, we often include construct and destruct functions for our DLinkedListconstruct is always called before we start using it, and destruct is called once we're done using it

  • construct: Initialize the members. This includes setting size to 0, and both start_node and end_node to NULL
  • destruct: Free all of the Nodes that have been allocated as part of the doubly-linked list. After that, set the size equal to 0 and start_node and end_node to NULL (while the last part doesn't have to happen, it's often useful). The goal is that any DLinkedList that's been destructed can then immediately be constructed and used again, as well as that any memory that was allocated is freed by destruct.

Now, onto the actual project! If you haven't already, clone the practice repository by opening a terminal in your desired directory and typing

git clone git@github.com:cornell-c2s2/c_cxx_training.git

Once inside this directory, we'll switch over to the code for this chapter by checking out the appropriate branch:

git checkout chpt9

Since this is a project, not just a practice, there's a bit more code (smile), and not a lot given to you, so you can take pride in the overall solution. Let's go over some of the files:

  • src/dlinked_list.h: Declarations of doubly-linked list functions
  • src/dlinked_list.c: Implementation of doubly-linked list functions
  • src/c2s2_utils.h: Declaration of memory helper functions
  • src/c2s2_utils.c: Implementation of memory helper functions
  • app/test_base.c: Test code for base functions
  • app/test_moderate.c: Test code for moderate functions
  • app/test_advanced.c: Test code for advanced functions

(Note that testing of higher-up functions will rely on the lower-ones being implemented)

For this, provided for you are also c2s2_malloc and c2s2_free, defined in c2s2_utils.h. These function exactly as malloc and free do (you can use them in the exact same way), but also keep track of how much memory you've allocated. This allows us to run c2s2_memcheck at the end of every test, which will throw an error if there's any memory that you've allocated but haven't deallocated. Make sure to use c2s2_malloc and c2s2_free so we can check for memory leaks at the end of tests!

There are also functions declared for you in dlinked_list.h, with comments to guide you in dlinked_list.c. These are divided into three categories, depending on how difficult they may be to implement. Notice how, to some extent, they hide your implementation from the caller - this is on purpose, and is meant to give you flexibility if you want to experiment and try different implementations without changing the tests.

In the app directory, you'll notice that we've started to take a more rigorous approach towards testing. Each test is now a separate function, all of which are called from main. We've also started to use the assert function (defined in the C Standard Library header file assert.h, which does nothing if you pass in a non-zero statement, but throws an error if you give it 0 (making it useful for testing).

While this may seem like a lot, it is set up so that you can take an incremental approach. The best workflow would look something like:

  • Implement the DLinkedList and Node data structures in dlinked_list.h (use the diagram above for help!). You will need to implement the Node structure first so that we can use the Node* type for our DLinkedList struct
  • Implement the base functions in dlinked_list.c (remembering to use c2s2_malloc and c2s2_free)
    • Once you've done that, you can run the base tests by going to the main directory, running make test_base to build the tests, and running the generated test_base executable in the build directory as normal
    • Take some time to add your own tests!
  • Implement the moderate functions in dlinked_list.c (remembering to use c2s2_malloc and c2s2_free)
    • Once you've done that, you can run the moderate tests by going to the main directory, running make test_moderate to build the tests, and running the generated test_moderate executable in the build directory as normal
    • Take some time to add your own tests!
  • Implement the advanced functions in dlinked_list.c
    • Once you've done that, you can run the advanced tests by going to the main directory, running make test_advanced to build the tests, and running the generated test_advanced executable in the build directory as normal
    • Take some time to add your own tests!
  • To finally build all of the executables, you can also go to the main directory and run make all

When adding your own tests, some things you might consider are:

  • Any untested functionality
  • Random inputs
  • Doing a test more than once (with random inputs)

This is certainly a BIG step up in complexity, but you have everything you need to succeed. Feel free to reach out to any of the other programmers or the Software subteam lead on the team for help. If you get really stuck, I made a separate GitHub branch with my solutions, in case you want to get a hint. After you've done this, you should have all the experience you need to be a great C programmer (not to mention having developed a data structure that very well may come in handy later) - amazing job completing the C portion of the training!

  • No labels