Section 2: Compiling and Running C Programs

This discussion section serves as a gentle introduction to the basics of compiling and running C programs on the ecelinux machines. The starter code for this tutorial is provided here:

1. Logging Into ecelinux with VS Code

As we learned in the last discussion section, we will be using the ecelinux servers for all of the programming assignments. In the last discussion section we used a terminal emulator to log into the ecelinux servers.

In this discussion section, we will use VS Code to log into the ecelinux servers. VS Code provides a nice GUI for navigating the directory hierarchy on ecelinux, syntax highlighting for C/C++ programs, the ability to open many files across multiple tabs, and an integrated terminal for running shell commands.

Note that if you have already installed VS Code on your laptop, then you should feel free to use your laptop for this discussion section. However, if you have not already installed VS Code on your laptop and verified that it works, then please use the workstations in 225 Upson or find a partner. We do not have time to help you set up VS Code on your own laptop in this discussion section.

For a full tutorial on using VS Code with ecelinux, follow this tutorial.

2. Before You Begin

The GitHub repo provides a quick checklist you should follow to download and set up the starter code:

  • Be able to connect to ECELinux with VS Code (see tutorial)
  • Set up ssh with GitHub (we did this in last week's discussion section)
  • Clone this repo to ecelinux: git clone git@github.com:cornell-ece2400/ece2400-sec02-2025.git ece2400-sec02
  • OPTIONAL BUT RECOMMENDED: Install clangd extension on the remote host
  • Open the repo with VS Code on ecelinux, by clicking File > Open Folder. Choose ece2400-sec02

3. Compiling and Running a Single-File C Program

We will begin by writing a single-file C program to calculate the average of two integers similar to what we have studied in lecture. We have provided you with a template in the avg-main.c file. Edit the avg-main.c file to include an appropriate implementation of the avg function.

#include <stdio.h>

int avg(int x, int y) {
  int sum = x + y;
  return sum / 2;
}

int main() {
  int a = 10;
  int b = 20;
  int c = avg(a, b);
  printf("average of %d and %d is %d\n", a, b, c);
  return 0;
}

We use a compiler to translate the C source code into an executable binary (i.e., the actual sequence of bits) that the machine can understand. In this discussion we will be using the GNU C compiler (gcc), but you could use clang too. Let's go ahead and give this a try:

$ pwd # Should be ece2400-sec02
$ gcc -Wall -o avg-main src/avg-main.c

The gcc command takes as input the C source file to compile, and the -o command line option specifies the output executable binary (i.e., the file with the machine instructions). We also use the -Wall command line option to enable a broad set of common warnings. After running the gcc command you should see a new avg-main file in the directory. We can execute this binary by simply calling it like any other Linux command.

However, notice the warning message we received:

src/avg-main.c: In function ‘avg’:
src/avg-main.c:5:1: warning: control reaches end of non-void function [-Wreturn-type]
    5 | }

We need to fill in the rest of the code. Open the file with the command code src/avg-main.c and fill in the avg function. Then, we can check that it works:

$ pwd # Should be ece2400-sec02
$ gcc -Wall -o avg-main src/avg-main.c
$ ./avg-main
average of 10 and 20 is 15

Recall that a single dot (.) always refers to the current working directory. Essentially we are telling Linux that we want to run the executable binary named avg-main which is located in the current working directory.
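The inputs 10 and 20 have an even sum, so the run above does not show what happens when the sum is odd. As a small sketch (not part of the starter code), the avg function below is the same one from avg-main.c, while avg_exact is a hypothetical variant added here only for comparison. In C, dividing one int by another truncates toward zero, so the fractional part of the average is discarded.

```c
// Same integer average as in avg-main.c: int / int truncates toward zero,
// so any fractional part of the result is dropped.
int avg(int x, int y) {
  int sum = x + y;
  return sum / 2;
}

// Hypothetical variant (an assumption, not in the starter code): dividing by
// the double literal 2.0 converts sum to double, so the fraction is kept.
double avg_exact(int x, int y) {
  int sum = x + y;
  return sum / 2.0;
}
```

Calling avg(10, 15) gives 12, while avg_exact(10, 15) gives 12.5. Printing the second result with printf requires the %f format specifier instead of %d.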

It can be tedious to have to carefully enter the correct commands on the command line every time we want to compile a C source file into an executable binary. In the next discussion section, we will explore using a build framework to automate the process of building our C programs.

Before we move on, let's examine the machine instructions using the objdump command.

$ objdump -dC avg-main | less

The objdump command takes an executable binary and shows you the machine instructions in a human readable format. We are piping it through less so we can scroll through the output. Try to find how many machine instructions are used to implement the avg function. Does it seem like the compiler generated optimized code or unoptimized code? You can exit less by pressing the q key.

Let's recompile our program with optimizations. We can do this by specifying -O3 (optimization level 3).

$ gcc -Wall -O3 -o avg-main src/avg-main.c
$ objdump -dC avg-main | less

Now how many machine instructions are used to implement the avg function?

4. Compiling and Running a Multi-File C Program


C programs are almost never contained in a single source file. They require many files which must be individually compiled and then linked together. Linking is the process of merging together different binary files each with its own set of machine instructions. To illustrate this process we will experiment with a function to square a given integer. Our project will include three files:

  • include/square.h: header file with function prototype for square function
  • src/square.c: source file with function definition for square function
  • src/square-adhoc.c: adhoc test of square function which contains main

We will compile the square.c and square-adhoc.c files into their own object files and then link these object files into a complete executable binary.

An object file is like a chunk of machine instructions. We cannot execute an object file directly, because it contains only part of the entire program. However, linking the object files together creates an executable binary.

Start by inspecting the header file include/square.h. Header files are the key to multi-file C programs. The square-adhoc.c source file needs to call the square function, but the square function is in a different source file. When we compile the square-adhoc.c source file, how will the compiler know that the square function exists to ensure the programmer is not accidentally calling an undefined function? How will the compiler know what parameters the square function takes, so it can perform type checking?

The square-adhoc.c source file cannot directly include square.c since that would result in the same function being compiled twice into two different object files (which would cause a linker error). What we need to do is have a way to tell square-adhoc.c the square function prototype (i.e., the interface of the function including its name, parameter list, and return type) but not the square function definition. We do this with a function declaration. A function definition specifies both the function prototype (interface) and the implementation at the same time, while a function declaration just specifies the function prototype without the implementation.

A header file contains all of the function declarations but no function definitions. All of the function definitions are placed in a source file that goes along with the header file. If we want to call a function that is defined in a different source file, then we simply use the #include directive to include the appropriate header file. The linker will take care of making sure the machine instructions corresponding to every function definition are linked together into the executable binary.
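The difference between a declaration and a definition can be sketched in a single file. The names cube and sum_of_cubes are hypothetical and are not part of the starter code. The declaration at the top plays the role that a header file plays in a multi-file project: it is enough for the compiler to type check the call inside sum_of_cubes, even though the definition of cube only appears later.

```c
// Function declaration: the prototype alone (name, parameter list, return
// type). This is what a header file would provide.
int cube(int x);

// The compiler can type check these calls using only the declaration above;
// it does not need to have seen the implementation of cube yet.
int sum_of_cubes(int a, int b) {
  return cube(a) + cube(b);
}

// Function definition: the prototype plus the implementation. In a
// multi-file project this would live in its own source file.
int cube(int x) {
  return x * x * x;
}
```

If the declaration at the top were removed, the compiler would reach the calls to cube before knowing anything about that function, which is the situation a header file is meant to prevent.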

We have provided you with the square.h file, which has the following contents:

int square( int x );

We have provided you with a template for the square.c file. Edit the src/square.c file to include an appropriate implementation of the square function.

#include "square.h"

int square(int x) {
  return x * x;
}

Notice how our square.c file includes the corresponding square.h file. This is a best practice that follows the course coding conventions. Finally, take a look at the provided square-adhoc.c file:

#include "square.h"
#include <stdio.h>

int main() {
  int a = 10;
  int b = square(a);
  printf("square of %d is %d\n", a, b);
  return 0;
}

Go ahead and fill in the implementation for square(). Then, let's compile square.c and square-adhoc.c into their corresponding object files:

$ pwd # Should be ece2400-sec02
$ gcc -Iinclude -Wall -c -o square.o src/square.c
$ gcc -Iinclude -Wall -c -o square-adhoc.o src/square-adhoc.c

We use the -c command line option to indicate that gcc should create an object file as opposed to a complete executable binary. An object file is just a chunk of machine instructions. Again, we cannot actually execute an object file; we need to link multiple object files together to create a complete executable binary. We usually use the .o filename extension to indicate that these files are object files.

Let's link these two object files together to create a complete executable binary that we can actually run:

$ gcc -Wall -o square-adhoc square.o square-adhoc.o

Notice that the complete executable binary contains all of the machine instructions for both the square and main functions along with a bunch of additional system-level code (e.g., for the printf function). Let's go ahead and run the executable binary.

$ ./square-adhoc

This of course raises a question. gcc can also compile a multi-file project in a single step if we list all of the source files on one command line. So why did we learn how to (1) compile each file individually into an object file and (2) link these object files together? For small projects with just two or three files, there is no need to use object files. However, in a project with thousands of files, listing every file on a single command line forces each build to recompile everything, so every recompilation takes the same long time (potentially hours), no matter how small the change.

Using object files enables modular compilation. In modular compilation, we only need to recompile those source files that have changed. We can simply reuse the previously compiled object files for those source files that have not changed. Modular compilation can drastically reduce recompile times so that they are proportional to how many changes you have made to the source files (e.g., less than a second for a small change). One challenge with modular compilation is that it drastically increases the build complexity. In the next discussion section, we will explore using a build framework to automate the process of modular compilation for complex C programs.
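To make the idea of recompiling only what has changed more concrete, here is a minimal sketch of the kind of check a timestamp-based build tool such as make performs. The function names is_stale and needs_rebuild are hypothetical and are not part of the starter code or of any real build tool. The sketch assumes a POSIX system such as ecelinux, where the stat function reports a file's last modification time.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

// Decision rule: an object file must be rebuilt if it does not exist yet or
// if its source file was modified more recently than the object file.
bool is_stale(bool obj_exists, time_t src_mtime, time_t obj_mtime) {
  if (!obj_exists)
    return true;
  return difftime(src_mtime, obj_mtime) > 0;
}

// Hypothetical helper: looks up both modification times with stat() and
// applies the rule above.
bool needs_rebuild(const char* src_path, const char* obj_path) {
  struct stat src_info;
  struct stat obj_info;

  // If the source cannot be inspected, request a rebuild so that the
  // compiler reports the actual problem.
  if (stat(src_path, &src_info) != 0)
    return true;

  bool obj_exists = (stat(obj_path, &obj_info) == 0);
  time_t obj_mtime = obj_exists ? obj_info.st_mtime : 0;
  return is_stale(obj_exists, src_info.st_mtime, obj_mtime);
}
```

With this rule, editing only src/square.c would mark square.o as stale while leaving square-adhoc.o untouched, so only one of the two compile commands would need to run again before relinking.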

5. Automation with CMake

CMake is a tool that automates the process we just experimented with. It also helps make your build system portable across operating systems. For example, Windows does not come with GNU Make, so projects there often use Visual Studio solutions instead, and CMake can generate those as well as build files for GNU Make or the Ninja build system.

In short, CMake is a higher-level scripting tool to manage how your C code is built. Writing your own CMake scripts (CMakeLists.txt) can be difficult, and CMake itself is not well documented. Nonetheless, CMake makes the build process automated and very portable when done correctly.

If you want more information, investigate the CMakeLists.txt file. In this tutorial, we will only show you how to run our CMake project, not how to create new ones. With that said, let's rebuild the project using CMake:

$ pwd # Should be ece2400-sec02
$ mkdir -p build
$ cd build
$ cmake ..

If that worked, you should see Build files have been written to: /home/netid/ece2400-sec02/build. We have not actually built the project yet. CMake simply establishes a database of how to build our object files, and how they are interdependent. For example, let's take a look at compile_commands.json:

$ cat compile_commands.json
{
  "directory": "/home/netid/ece2400-sec02/build/src",
  "command": "/usr/bin/cc  -I/home/netid/ece2400-sec02/include  -o CMakeFiles/sec02-lib.dir/square.c.o -c /home/netid/ece2400-sec02/src/square.c",
  "file": "/home/netid/ece2400-sec02/src/square.c"
},
{
  "directory": "/home/netid/ece2400-sec02/build/src",
  "command": "/usr/bin/cc  -I/home/netid/ece2400-sec02/include  -o CMakeFiles/square-adhoc.dir/square-adhoc.c.o -c /home/netid/ece2400-sec02/src/square-adhoc.c",
  "file": "/home/netid/ece2400-sec02/src/square-adhoc.c"
},
{
  "directory": "/home/netid/ece2400-sec02/build/src",
  "command": "/usr/bin/cc  -I/home/netid/ece2400-sec02/include  -o CMakeFiles/avg-main.dir/avg-main.c.o -c /home/netid/ece2400-sec02/src/avg-main.c",
  "file": "/home/netid/ece2400-sec02/src/avg-main.c"
}

It may not seem impressive that CMake just generates the same compile commands that we ran by hand. However, it would be infeasible to manage these commands for thousands of object files every time we make a change (like changing the optimization level). To execute all the build commands, just run make in the build directory:

$ pwd # Should be ece2400-sec02/build
$ make

To reiterate, cmake generates the files needed to run make (GNU Make). If it worked, you should see the executable programs ece2400-sec02/build/src/square-adhoc and ece2400-sec02/build/src/avg-main. Try running them, and make sure they work the same. Notice that ece2400-sec02/build/src is a different directory than ece2400-sec02/src. The former holds our executables, and the latter is where the source code is located. CMake by default makes the build directory mirror the source file tree.

6. To-Do On Your Own

6.1. Try clangd

One benefit of using CMake is that it makes using clangd very easy, because clangd can read the compile_commands.json file that CMake generated. Simply install the clangd extension on the remote host in VS Code. Then your code editor should provide helpful auto-corrections as you write your code. clangd is especially useful when managing a very large codebase, so you may want to read up on its capabilities: https://clangd.llvm.org/.

6.2. Add a new source file

If you have time, create a new source file named avg3-main.c in the ${HOME}/ece2400-sec02/src directory that contains an avg3 function. This function should calculate the average of three values instead of just two. Modify the main function to properly call your updated function. Finally, add your binary to src/CMakeLists.txt. Compile your new program with the same build script and run it to verify it calculates the average correctly.