Preface
This e-book is a humble attempt to describe the C language while actively learning it. I enjoy writing code and technical documentation, and I decided to release this guide under the MIT licence. It is not intended to be comprehensive or complete: it only contains what I’ve learned and follows my very personal style.
It will be regularly updated and improved until completion, and kept as accessible as possible.
To understand every aspect of C, many tools will be used, and all examples will refer to CLI commands. I will be using Neovim as my text editor on a Linux machine. The output of commands and examples may differ from machine to machine, but the concepts will hopefully remain valid.
I strongly believe that the best way to develop software is by using the CLI and lightweight text editors such as neovim or vim. Whenever possible, I will avoid searching for documentation in a browser and will read it with man directly in the terminal instead. This keeps friction low and avoids having to leave the home row of my keyboard.
Introduction
C was created at Bell Labs by Dennis Ritchie while he and Ken Thompson were working on Unix, following the idea that a good operating system should be written in a high-level compiled language. After abandoning a first attempt at creating a Fortran compiler, Thompson created a new, smaller language named B. B was later used on the PDP-11, but its lack of types made it a poor fit for that machine, and it was not enough to port Unix away from assembly. C was created with the set of features that B was missing and was a much better fit for the Unix system.
C was a better language mainly because of its multiple distinct types:
- pointers;
- integers;
- floating-point numbers (float).
In that sense, C can be seen as B with types, where every type can still be imagined as an integer: in very simple terms, pointers are integers, and so are structures. In fact, a structure can be pictured as a set of integers: the offset of each field in memory and the values of those fields. This simplicity is arguably the strength of the language: new developers can pick it up easily, it can be layered to build powerful abstractions and, with those, it lets you reason about complex topics and algorithms in simple terms.
Anatomy of a C program
Execute a C program
C is a compiled language: you cannot directly execute a source file containing the main function. It must first be compiled into an executable program.
You can use cc to compile a C program. cc is a Unix command that lets you easily invoke the system's C compiler. You can use cc --version to check which compiler it uses:
cc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-3)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Let’s consider the following simple C program contained in a file named—for instance—hello_world.c (how original):
#include <stdio.h>

int main(void) {
    printf("Hello World\n");
    return 0;
}
You can use cc hello_world.c to compile it into an executable program.
By default, the compiler generates an executable binary file named a.out, which runs your program when executed with ./a.out. You can use file ./a.out to check information about the generated file:
./a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, \
interpreter /lib64/ld-linux-x86-64.so.2, \
BuildID[sha1]=d589730d718a032a35f848fe8d280063a6cee18c, \
for GNU/Linux 3.2.0, not stripped
If you want to check the raw content of the generated binary file, you can use hexdump -C a.out.
You can also generate assembly code with cc -S hello_world.c, if the compiler supports this feature. With GCC, the output is written to hello_world.s:
    .file   "hello_world.c"
    .text
    .section    .rodata
.LC0:
    .string "Hello World"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $.LC0, %edi
    call    puts
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
    .section    .note.GNU-stack,"",@progbits
Each compiler lets you tweak many aspects of compilation. For instance, cc -O2 hello_world.c tells the compiler to optimize the generated executable. An optimized executable takes longer to generate; while that hardly matters for small programs, optimization can produce a much better version of the program once its complexity is high enough.
With the GCC compiler, you can see in the assembly above that our simple program calls the puts function, even though no optimization flag was given. This depends on the compiler: with other compilers, the line printf("Hello World\n"); may be compiled as a call to printf instead, and enabling optimizations with the -O2 flag can make them use puts. Note that puts and printf are C library functions, not system calls; puts can be faster because it writes the string as-is, while printf must first parse its format string. This is a simple, yet meaningful, example, although in such a small program it makes no difference in terms of execution speed. The compiler is very good at improving written programs if given enough time. While developing, though, a short compilation time is often preferred.
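You can check the documentation of the standard C library from the terminal using man (many commands also support the --help flag). For example, you can use man 3 puts or man 3 printf to read the documentation of both functions. The 3 selects section 3 of the manual, which covers library functions: without it, man printf may show the printf shell command from section 1 instead, while system calls are documented in section 2.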
Include other source code
The very first line of our simple program, #include <stdio.h>, is a preprocessor directive. It tells the compiler that a file needs to be included in the program: before compilation, the preprocessor takes the content of that file and pastes it at the location of the directive.
In this case, <stdio.h> declares the prototype of the printf function, which tells the compiler its return type and parameter types so that the call can be compiled correctly. To prove this point, you can remove the first line and replace it with int printf(const char *restrict format, ...);, which is the prototype of the function we want to call:
int printf(const char *restrict format, ...);

int main(void) {
    printf("Hello World\n");
    return 0;
}
#include can also be used to include other C files. In fact, you can move a single line into a different file, here file.c, and then compile a program that includes that file exactly where the line should appear:
#include <stdio.h>

int main(void) {
// file.c contains a single line: printf("Hello World\n");
#include "file.c"
    return 0;
}
The generated assembly or machine code will be equivalent.
Functions
This very simple program has a single function named main. A function always has a return type, an optional list of parameters, and a body. The signature of main specifies int as its return type, which means that the function must return an integer value.
Parameters are declared inside the parentheses that follow the function name, and each of them has a specific type too. It is also possible to define a function that takes no parameters. This can be explicit, using void as main does, or implicit, by leaving the parentheses empty: int main() {}. Be aware that, before C23, empty parentheses do not provide a prototype, so the compiler does not check the arguments passed in calls; void is the safer choice.
Functions can call other functions too!
#include <stdio.h>

int sum(int a, int b) {
    return a + b;
}

int main(void) {
    printf("Hello World %d\n", sum(10, 20));
    return 0;
}
The function main is special: it is the only function that is called automatically when the program starts. Other functions must be called explicitly. This means that a valid C program must define the main function.
Variables
Function parameters are variables that exist only while the function executes. Other variables are not tied to a single function call: they can also have a meaning in the caller's context or even in the whole program.
Scope
Variables can have different scopes. In the previous example, the function int sum(int a, int b) has two parameters, a and b, with local scope. Local variables are valid only within the function that defines them and have no meaning in other functions.
To understand this concept, let’s consider the following program:
#include <stdio.h>

int sum(int a, int b) {
    return a + b;
}

int main(void) {
    int a = 10;
    int c = 20;
    printf("Hello World %d\n", sum(a, c));
    return 0;
}
This is a valid C program, equivalent to the previous one, and, as you can see, a variable named a exists twice: once in main and once as a parameter of sum. This is possible because each variable is local to its own function and ceases to exist once that function returns.
The function main has a return type, but since it is called automatically, the only one that can be interested in its value is the caller: the environment that executed the program. When the program is run from a shell, the value returned by main becomes its exit status, which you can print with ./a.out; echo $?. This is quite useful because, in Unix shells, an exit status of 0 means success and counts as true.
Variables can also have global scope. A global variable is declared outside of any function, is visible to every function that follows its declaration, and is initialized only once, before the program starts.
#include <stdio.h>

int x = 0;

void incr(void) {
    x = x + 1;
    printf("%d\n", x);
}

int main(void) {
    incr();
    incr();
    return 0;
}
In this case, the value of x is incremented by one each time incr is called, so the program prints 1 and then 2. Local variables can also retain their values across multiple function calls if they are declared static, as in static int x = 0;.
It’s important to highlight that, in C, arguments are passed by value: whenever a function is called, each of its parameters receives a copy of the argument's value, so the function cannot modify the caller's local variables. To modify a caller's local variables from within a function, it is necessary to use pointers, which will be described extensively in a dedicated chapter.
Type
We have seen variables of type int, but there are multiple primitive types that can represent different kinds of data. For simplicity, a subset of common primitive types is shown in the following table; refer to the C standard to explore all the existing types.
Type | Common size (bits) | Description |
---|---|---|
int | 32 | Signed integer numbers |
float | 32 | Floating-point numbers |
double | 64 | Double-precision floating-point numbers |
char | 8 | Characters |
short | 16 | Short signed integer numbers |
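None of the reported sizes is guaranteed by the C specification, which only sets minimum ranges (for instance, int must be at least 16 bits wide); actual sizes depend on the architecture and the compiler.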
In many cases, values of smaller types are automatically promoted to a larger type. For instance, when char or short values are passed to printf, they are promoted to int (and float values to double), so a single %d conversion works for all of these integer types.
short s = 400;
// `s` is automatically promoted to int.
printf("%d\n", s);
This happens with functions such as printf, which accept a variable number of arguments (variadic functions), but also while evaluating expressions, when needed.
char c = 127;
// Before evaluation, `c` is promoted to int.
int i = c + 1;
printf("%d\n", i); // prints 128
Since the size of types varies with the architecture, C provides the sizeof operator (not a function), which yields the size in bytes of a variable or of a type, for instance sizeof(var) or sizeof(int).
Variables can represent a single value or a collection of values. To define a variable that stores multiple values of the same type, C provides arrays.
int array[5] = {1, 2, 3, 4, 5};
printf("%d\n", array[0]); // prints 1
An array stores multiple values in a contiguous block of memory. To access the value at a specific position, you use its index, which goes from 0 up to n-1 for an array of n elements; accessing an index outside this range is undefined behavior.
A string is an array of characters terminated by a null character and, since strings are very common, there is a simpler way to initialize them:
char phrase[] = "Hello World";
printf("%s\n", phrase);
It is not mandatory to specify the size of the array, as the compiler computes it from the initializer. You can also evaluate the size of a string array with sizeof, for instance sizeof(phrase), which is evaluated at compile time.
You will notice that the size returned by sizeof is greater than the number of characters in the string: when the size is inferred from the literal, sizeof(phrase) is 12, while "Hello World" has 11 characters. Strings always require the null terminator, \0, which tells the program where the string ends, and initializing a character array with a string in double quotes adds the terminator automatically.
Code blocks
Code blocks are sequences of statements delimited by braces. Each function body is a code block, and blocks can be nested inside one another. Variables declared in a block are local to that block and occupy their own memory, separate from variables with the same name in enclosing blocks.
#include <stdio.h>

int main(void) {
    int i = 5;
    {
        int i = 3;
        // (4 bytes) stored at 0x7fff7cae44b8
        printf("(%zu bytes) stored at %p\n", sizeof(i), (void *)&i);
    }
    // (4 bytes) stored at 0x7fff7cae44bc
    printf("(%zu bytes) stored at %p\n", sizeof(i), (void *)&i);
    return 0;
}
The expression &i yields the memory address where i is stored (the %p conversion expects it as a void *); pointers are covered in the following chapters.
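On my machine, this simple program shows that the two variables, despite having the same name, are stored in two consecutive memory blocks that differ by exactly 4 bytes (from 0x[…]8 to 0x[…]c). The exact addresses, and whether the variables end up next to each other, depend on the compiler and on the machine.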
Conditional code blocks
Often, the linear execution of a program needs to branch in a different direction depending on a specific condition. Conditional code blocks are blocks of code executed only if a specific condition is satisfied.
The keyword if introduces a conditional block, followed by the condition, in parentheses, that must be true for the block to be executed:
#include <stdio.h>

int main(void) {
    int i = 5;
    if (i > 3) {
        printf("Value %d is greater than 3\n", i);
    } else {
        printf("Value %d is not greater than 3\n", i);
    }
    return 0;
}
A conditional block can optionally be followed by an else block, or by one or more else if blocks, to build up logic based on multiple different conditions.
When a conditional code block consists of a single statement, the braces are optional: if (i > 0) printf("%d\n", i);.
Loops
To be Turing-complete, a language must have some way to repeat computations, such as loops. C has several ways to repeat the execution of a code block: for, while, and do-while. Loops make the program jump back to the start of a code block, repeating it as long as a condition is met. The same result can also be achieved with the keyword goto.
The keyword goto interrupts the normal flow of execution and continues from a specified label in the same function.
#include <stdio.h>

int main(void) {
    int i = 0;
again:
    printf("%d", i);
    i++;
    if (i < 10) goto again;
    return 0;
}
In the example, when the condition is met, the goto statement makes the program jump back to the line under the specified label.
The same logic to repeat a code block or a set of instructions can also be written with a while loop, which repeats its block as long as the condition is true.
#include <stdio.h>

int main(void) {
    int i = 0;
    while (i < 10) {
        printf("%d", i);
        i++;
    }
    return 0;
}
Recursion
Another way to execute a specific code block multiple times is recursion: a function is recursive when it calls itself. In the following example, count is a recursive function.
#include <stdio.h>

void count(int start, int end) {
    if (start > end) return; // base case: stop the recursion
    printf("%d\n", start);
    count(start + 1, end);   // recursive call with the next value
}

int main(void) {
    count(0, 9);
    return 0;
}
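Calling the same function multiple times is equivalent to having multiple code blocks and, as seen above, each code block stores its variables at different memory addresses. This means that recursion, by creating a new copy of the same variables for every call, generally uses more memory than a simple while loop, unless the compiler turns the recursive call into a loop (tail-call optimization), which GCC may do at higher optimization levels.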