More Information on C Language

This document gives more in-depth information on the C language, while completely ignoring library functions, which are not actually part of the language. It is an appendix to A-C-X-en.html and is meant for people who already program in some other language.

Pointers and Arrays

We must always keep in mind that C was designed to be close to the processor and its structures. It should thus come as no surprise that the language authors chose to represent an array with the address of its first element. The concepts of array and pointer are thus interchangeable, and you can apply an index to a pointer, using square brackets as if it were an array, without any error or warning. The name of an array, however, differs from a pointer in that you can't assign a new value to it: an array name represents a constant address, which has been chosen at compilation time.

You can add and subtract pointers and integers: the result of adding an integer n to a pointer is a pointer to the element n positions further on in the array. In other words, the number is not the number of bytes to add to the original address, but the number of elements. The scale factor is applied by the compiler according to the type of pointer you do arithmetic on. So you can increment or decrement a pointer, as well as take the difference (but not the sum) of two pointers of the same type; the result is an integer number. If you do arithmetic on generic pointers (pointers to void), the gcc compiler uses 1 byte as scale factor, but the operation is not allowed according to the standard.

The "null pointer" is zero and is not a valid pointer; usually functions that return a pointer use the null pointer as an error marker. The macro NULL is usually "0", or sometimes "(void *)0", and zero can be assigned to any type of pointer.

The most important operators to use pointers are "*" (read as "pointed by") and "&" ("the address of").

Examples:

int i, v[10], *p; /* an integer, an array, a pointer:
                    "each element of v, which is 10 items long, is an integer"
                    "what is pointed to by p is an integer" */
p = v;         /* p takes the value of v, i.e. the address of its first item */
p = &v[0];     /* same as above */
p = &v[4];     /* p takes the address of the fifth element of v */
p = v+4;       /* same as above */
p++;           /* p is incremented */
i = p-v;       /* i takes 5, as a result of the previous two lines */

In the following loop, the integer i takes the sum of all elements of the array v up to the first zero-valued one. Note that there must be a zero-valued array element, or the loop will run past the end of the array. The language doesn't offer any check on pointers and array indices.

i=0;
for (p = v; *p; p++)
    i += *p;
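
Since the language performs no bounds check, a variant of the same loop can take the number of items explicitly instead of relying on a zero terminator. This is only a minimal sketch: the function name sum_items and its arguments are hypothetical.

```c
#include <stddef.h>

/* Sum the first n items of v, walking the array with a pointer. */
int sum_items(const int *v, size_t n)
{
    const int *p;
    int sum = 0;

    for (p = v; p < v + n; p++) /* v + n points one past the last item */
        sum += *p;
    return sum;
}
```

The pointer v + n is only compared, never dereferenced, so the loop never reads outside the array.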

Assignment between pointers of different types is an error, and so is any arithmetic between pointers of different types. You can nonetheless convert a pointer of one type into a pointer of another type, as well as pointers into integers and back. These conversions don't generate any machine code, as the values are all integers within the CPU; but they are sometimes needed to ensure type consistency of the source code.

A void pointer can always be assigned to any type of pointer, and any type of pointer can be assigned to a void pointer. This is allowed because the void pointer is used to handle generic memory addresses, a very common task in operating systems and system libraries.
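
As an illustration of generic memory handling, the following sketch swaps two memory areas whatever their type. The function name swap_items is hypothetical; it relies only on the assignment rule for void pointers described above.

```c
#include <stddef.h>

/* Swap two memory areas of the same size, one byte at a time. */
void swap_items(void *a, void *b, size_t size)
{
    unsigned char *pa = a, *pb = b; /* no cast needed from void * */
    unsigned char tmp;

    while (size--) {
        tmp = *pa;
        *pa++ = *pb;
        *pb++ = tmp;
    }
}
```

A caller would pass the addresses and the size, for example swap_items(&x, &y, sizeof(x)) for two variables of the same type.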

The sizeof operator, applied to a type or a variable name or an expression, returns the size in bytes of the named object. This operation is performed at compile time, according to the type being passed to sizeof. If a pointer p is incremented, its numeric value (memory address, in bytes) is incremented by sizeof(*p).

Example:

int i, v[10], *p;         /* the same variables used earlier */
i = sizeof(int);          /* usually 4, but may be 8 or 2 */
i = sizeof(i);            /* same as above */
i = sizeof(v[0]);         /* same as above */
i = sizeof(*p);           /* same as above */
i = sizeof(p);            /* usually 4 (pointer size), or 8 */
i = sizeof(v);            /* 40, or 80, or 20 */
i = sizeof(v)/sizeof(*v); /* 10: the number of elements in the array */
#define ARRAY_SIZE(x) (sizeof(x)/sizeof(*(x))) /* a useful macro */
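
The relation between sizeof and pointer increments can be checked directly. The sketch below is only illustrative: the names table, table_items and int_stride are hypothetical.

```c
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))

static int table[] = {3, 1, 4, 1, 5, 9};

/* Number of items in table, computed at compile time. */
size_t table_items(void)
{
    return ARRAY_SIZE(table);
}

/* Distance in bytes between an int pointer and its successor. */
size_t int_stride(void)
{
    int v[2];

    return (size_t)((char *)(v + 1) - (char *)v);
}
```

Here table_items() returns 6 and int_stride() returns sizeof(int), whatever that is on the machine in use.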

Strings

C strings are arrays of characters, each with a zero byte as terminator. Their double-quote representation is just a simplified notation to represent an array. Each time a string in double quotes appears in the program text, the compiler saves the string in the data segment of the program and represents it with the address of the first character. A character within single quotes is an integer number, i.e. the ASCII code of the character itself.

Examples of declarations of strings and pointers:

char s[] = "test";  /* an array of 5 bytes, including the terminator */
char s[] = {'t', 'e', 's', 't', 0}; /* the same, more baroquely */
char c, *t; /* a character and a pointer to character */
c = *s;     /* c takes 't' */
t = s+2;    /* t represents the string "st" */
s[0] = 'p'; /* the string s is now "pest" */

char *name = "arthur"; /* a pointer to an initialized area, 7 bytes long */
char surname[] = "smith"; /* a 6 byte area, whose address is "surname" */
name++;                /*  now name is "rthur" */
surname++;             /*  error: surname is a constant address */

The following is a possible implementation of strlen, the function that returns the length of a string:

int strlen(char *s)
{
    char *t = s;
    for (; *t; t++)
        ;
    return t - s;
}
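
The same pointer idiom can copy a string. This is a sketch with a hypothetical name, copy_string, not the library's strcpy; it assumes the destination area is large enough, as the language doesn't check.

```c
/* Copy the string at src, terminator included, to dst; return dst. */
char *copy_string(char *dst, const char *src)
{
    char *d = dst;

    while ((*d++ = *src++) != 0) /* the assignment value ends the loop */
        ;
    return dst;
}
```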

Function pointers

Like an array, a function is represented by an address, the address of its code. Every time you use a function name in a program, you are actually using the pointer to the function. The most common use of function pointers is applying the parentheses operator, even if you don't usually think about this as using a function pointer. A function pointer can also be assigned to other pointers, for example within data structures that define the methods that act on objects, or can be passed as an argument to other functions, like the library function qsort, which implements the "quick sort" algorithm on an array. The compiler checks at compile time that pointer types in assignments are compatible, i.e. that the functions receive the same arguments. Example:

#include <string.h> /* get strcmp declaration */
#include <stdlib.h> /* get qsort declaration */

char *strings[100]; /* array of 100 char pointers */

/* qsort passes pointers to two array items, i.e. pointers to char pointers */
int compare(const void *a, const void *b)
{
    return strcmp(*(char * const *)a, *(char * const *)b);
}

strcmp(strings[0], strings[1]); /* comparison of two strings */

/* let's call qsort telling it to use compare as compare function */
qsort(strings, 100, sizeof(char *), compare);

strncmp(strings[0], strings[1], 5); /* only compare the first 5 chars */

/* the following is an error, as strncmp receives three arguments */
qsort(strings, 100, sizeof(char *), strncmp);
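
Function pointers stored in a data structure can be sketched as follows. All names here (struct shape, rect_area, tri_area, shape_area) are hypothetical and only meant to illustrate a "method" field.

```c
/* An object that carries a pointer to the function acting on it. */
struct shape {
    int width, height;
    int (*area)(const struct shape *s); /* function pointer field */
};

int rect_area(const struct shape *s)
{
    return s->width * s->height;
}

int tri_area(const struct shape *s)
{
    return s->width * s->height / 2;
}

/* Call the method through the pointer, whatever function it refers to. */
int shape_area(const struct shape *s)
{
    return s->area(s);
}
```

The caller of shape_area doesn't need to know which function is stored in the structure: the choice is made when the structure is filled.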

Memory allocation

The language doesn't offer primitives for memory management (like new, constructors or destructors), and has no garbage collection.

Programs use three types of memory: static, dynamic, and automatic. A variable or data structure is static when it is declared at compilation time; the linker assigns an immutable address to it. A dynamic structure is allocated at program run-time, for example by calling malloc, and access to its data happens through a pointer. A so-called "automatic" variable is allocated on the stack and disappears when the program leaves the code block that declared it.

A static variable is initialized to zero, unless the program declares a constant value to load into the variable. Initialized variables are saved to disk and live in the "data segment" of the program and of the executable ELF file. Uninitialized variables live in the "bss segment" of the program, which is a memory area that is allocated and zeroed before the program starts running. The executable file on disk doesn't include a copy of the bss area but only a declaration.

A dynamic variable is stored in memory that is requested from the system at run-time. After allocation, you can't make any assumption about the content of such memory space: it may be zero-filled, but it may also contain information from data that was previously allocated and then freed. In any case, each malloc requires a corresponding free; without it the program will leak memory, and the size of the process will slowly and continuously increase during execution. While all the memory allocated by a user-space program is released at program termination, kernel memory that is allocated and not freed is lost, and such areas can only be recovered by rebooting the system.

An automatic variable is a local variable in a function or in a code block. It lives on the stack and is not initialized, unless the programmer does it explicitly. If it is initialized, the compiler will output the code that is needed to do the initialization. Memory associated with automatic variables, being on the stack, can't be used after the program leaves the block where they are defined.

Examples:

int i;              /* initialized to zero, lives in bss segment */
int v[4] = {2,1,};  /* initialized to {2,1,0,0}, data segment */
int j = f(3, i);    /* error: the value is not known at compile time */

int *f(int x, int y)
{
    int z;                            /* automatic, unknown value */
    int a=0, b=1, c=2;                /* run-time initialization */
    int *p = malloc(4 * sizeof(int)); /* another run-time operation */
    int *q, *r = &z;              /* two pointers: one points to z */

    *q = y;          /* error: q points to undefined place */
    *r = y;          /* correct: r points to z so this assigns z */
    if (x) return p; /* correct: p has been allocated and remains valid */
    else return &z;  /* error: z can't be used out of the function */
}
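
A typical allocation helper checks the result of malloc and fills the area before use, since its content is undefined. This is a minimal sketch; the name make_filled is hypothetical.

```c
#include <stdlib.h>

/* Allocate an array of n integers, all set to value; NULL on failure. */
int *make_filled(size_t n, int value)
{
    int *v = malloc(n * sizeof(*v)); /* content is undefined here */
    size_t i;

    if (!v)
        return NULL; /* pass the error back to the caller */
    for (i = 0; i < n; i++)
        v[i] = value;
    return v; /* the caller must eventually call free(v) */
}
```

The returned pointer remains valid after the function returns, unlike the address of an automatic variable.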

All the operators

A table of operators is found in operator.tbl, which lists priority and associativity. The file can be printed on an A4 or A5 page.

The operands of every operator are always other expressions, with two exceptions. This section explains how to use each operator, in the same order as operator.tbl.

() function call. The operator takes a single operand, a function name or a function pointer, but you must list the arguments inside the parentheses, each argument being an expression.
```
    extern int (*rd_data)(void *buffer, int count);
    void *p = malloc(1024);
    int result = rd_data(p,1024);
```
[] array element. The operator takes two operands, one before the brackets and the other between the two brackets: usually the first operand is a pointer and the second is an integer number, but in practice the operation is commutative. If v is a pointer to integer or an array, the following instructions are all equivalent:
```
    v[3] = 0; *(v+3) = 0; 3[v] = 0;
```
. structure element. The operands are a structure (i.e. an expression whose value is a structure, as all operands in general can be complex expressions) and the name of a field in that structure -- this is one of the cases where the operand is not an expression.
```
    #include <sys/stat.h> /* st_mode is an int field in struct stat */
    struct stat st, *stptr, stvec[10]; int i;
    i = st.st_mode;
    i = (*stptr).st_mode;
    i = stvec[5].st_mode;
```

-> structure element from pointer. The operands are a pointer to structure and the name of a field. This is the most common way to access structure fields.

    #include <sys/stat.h> /* st_mode is an int field in struct stat */
    struct stat st, *stptr, stvec[10]; int i;
    extern struct stat *getstatptr(int i); /* fictional function */
    i = stptr->st_mode;
    i = (stvec+5)->st_mode;
    i = (&st)->st_mode; /* & in parens: it has lower priority */
    i = getstatptr(5)->st_mode;

! logical negation. It negates the operand at its right: if the operand is 0, the result is one, otherwise the result is zero.
```
    p = malloc(128); if (!p) { /* error management */ }
    return !!i; /* returns 0 if i is 0, 1 otherwise */
```

~ one's complement. The operator negates all bits of the operand at its right.

    i = ~0;  /* 0xffffffff if i is 32 bits, 0xff if it is a char, etc */
    int page_mask = ~(PAGE_SIZE-1); /* 0xfffff000 if the page is 4k, i.e. 0x1000 */

- unary negation. It negates the expression at its right.

    i = -j;
    return -EINVAL; /* EINVAL is a positive integer error code */

++ -- increment and decrement. The operators have a single operand; if the operator is after the operand (e.g. i++) then the increment or decrement is performed after using the value of the operand; if the operator is before the operand, the operand is incremented or decremented before using the value. The operand must be a value that can be assigned to (see the discussion about "lvalue", later in this file, where "=" is discussed).
```
   int stack[10], sp=0; /* stack pointer that points to the first empty cell */
   stack[sp++] = datum; /* inserting ("push") */
   datum = stack[--sp]; /* extracting ("pop") */

   i=10; while (--i) { /* i loops from 9 to 1 */ }
   i=10; while (i--) { /* i loops from 9 to 0 */ }
```

& address retrieval. The operator returns the address of the operand at its right.

    #include <sys/stat.h>
    struct stat stbuf;
    stat("/bin/sh", &stbuf); /* the function writes into stbuf */
    
    char s[32];
    sscanf(s, "%i", &i); /* sscanf parses the string and writes to i */

* pointer dereference. The operator dereferences the pointer at its right, returning the pointed-to value.
```
    int v[32], *p, sum = 0;
    for (p = v; p < v + 32; p++) /* v + 32 is one past the last item */
        sum = sum + *p;
```
(type) type change (cast). The syntax "(type)expression" converts the expression into the given type. The operation often generates no code, for example if you convert from unsigned to signed, or between pointers to different types.
```
    /* mmap returns an address, and -1 is used in case of error */
    addr = mmap( /* arguments */ );
    if (addr == (void *)-1) { /* error management */ }
```
sizeof size in bytes. The sizeof operator is evaluated at compile time, and the result is a constant integer in the generated code. The operator returns the size of the type of the data item at its right. It's common practice to place the operand in parentheses and think about sizeof as a function, even though syntactically such parentheses are arithmetic (to alter priority of operations), as sizeof doesn't require parentheses when its operand is an expression; they are mandatory only around a type name.
```
    struct buf *buffers = malloc(10 * sizeof(struct buf));
    if (sizeof(int) == 2) { /* code for the 16-bit machines */ }
    else if (sizeof(int) == 8) { /* code for 64-bit */ }
    else { /* code for 32-bit processors */ }
```
* / multiplication and division. Normal arithmetic multiplication and division. The asterisk in this use case is not ambiguous with pointer dereferencing, because multiplication takes two operands and can't be performed on pointers. Integer multiplication doesn't handle any overflow, and integer division discards the remainder.
```
     fahr = cels * 9 / 5 + 32; /* integer temperature conversion */
     cels = (fahr - 32) / 9 * 5; /* wrong as the remainder is discarded. Use "* 5 / 9" */
     int nsec = sec * 1000000000; /* 32-bit overflow if sec < -2 or sec > 2 */
```

% remainder of integer division. When applied to two integer operands, it returns the remainder of the division.

    void print_time(int s)
    {
        int h, m;
        m = s / 60; s = s % 60;
        h = m / 60; m = m % 60;
        printf("%i:%02i:%02i\n", h, m, s);
    }

+ - addition and subtraction. One of the two operands may be a pointer, in which case the result is a pointer of the same type. As with multiplication, no overflow is checked.
```
    int a=200, b=300; unsigned int c;
    c = a-b; /* a positive number: 2 to the 32nd power - 100 */
```

<< >> bit shift. The result is the left operand shifted left or right by the number of bits represented by the right operand.

    /* conversion from rgb888 to rgb565 -- */
    unsigned char rgb[3]; unsigned short pixel;
    pixel = ((rgb[0]>>3) << 11) + ((rgb[1]>>2) <<5) + (rgb[2]>>3);
    /* be careful about saving the 16-bit pixel on big-endian/little-endian */

< <= > >= comparison. The result of the operation is an integer: 1 if comparison is true, 0 if comparison is false.

    int out_of_size = i>100 || i<50;
    i = i * 1000 / 254; /* convert centimeters to tenths of an inch */
    if (out_of_size) { /* error management */ }

== != comparison. Like above, the result is 0 or 1.
```
    int zero = i==0;
```
& bitwise AND between integers. Each bit of the result is 1 only if both corresponding bits are 1 in the operands.
```
    int low_byte = val & 0xff;
    int high_byte = (val >> 8) & 0xff;
```

^ bitwise XOR between integers. Each bit of the result is 1 if the corresponding bits in the operands are different.

    while ( /* condition */ ) {
        /* calculation */
        led = led ^ 1; /* blink least significant bit */
    }

| bitwise integer OR. Each bit in the result is 1 if at least one of the operands' bits is 1.
```
    flags = flags | FLAG_BUSY;
    /* ... */
    flags = flags & ~FLAG_BUSY;
```
&& logical AND. The result is 1 if both operands are true (not zero), otherwise the result is zero. The second operand is evaluated only if the first operand is true; if the left one is false, the result is already known, so the right operand is not evaluated.
```
    /* call the print method only if no involved pointer is null */
    if (strptr && strptr->methods && strptr->methods->print)
        strptr->methods->print(strptr);
```
|| logical OR. The result is 1 if at least one of the operands is not zero, otherwise the result is zero. If the left operand is not zero, the result is already known, so the right operand is not evaluated. The priority of OR is lower than AND because OR is similar to an addition, while AND is similar to a multiplication.
```
    if (v[i]  || fill_item(&v[i]) || set_default(&v[i]))
        /* work on v[i] */ ;
```
?: conditional expression. The operator receives three operands: if the expression before the question mark is true, the second expression (before the colon) is evaluated as result of the operator, otherwise the third expression is evaluated. The type of the third expression must be compatible with the type of the second expression.
```
    printf("%i byte%s in %i file%s", bytes, bytes==1 ? "" : "s",
           files, files==1 ? "" : "s");
```
= assignment. The assignment is an expression whose result is the value that has been assigned. The left operand must be a variable or a data structure or an equivalent expression. Such an operand is called "lvalue", for "left value". Any compiler message referring to an lvalue is about an assignment error.
```
    a = b = c = 0; /* a = (b = (c = 0)) */
    if (i = 0) /* syntactically valid, it means if(0) */ ;
    stat_array[12]->st_mode = 0; /* good */
    "name" = s; /* invalid lvalue: you can't assign to an array */
    3 = i; /* invalid lvalue, more apparent than above */
```

*= /= %= += -= <<= >>= &= ^= |= assignment. All assignments of the type "<expr1> op= <expr2>" are short forms for "<expr1> = <expr1> op <expr2>".

    m = s / 60; s %= 60; /* seconds to minutes and seconds */
    flags |= FLAG_BUSY; /* raise a bit */
    flags &= ~FLAG_BUSY; /* lower a bit */

, comma. The comma operator evaluates the expression at the left ignoring the result and evaluates the expression at the right returning the value as result. It is used mainly in while loops and to make for loops with two or more indices.
```
    while(next_number(&i), i) { /* while the new i is not zero */ }
    for (p = v, i = 0; i<32; p++, i++) { /* p scans the 32-long array */ }
```
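
Several of the operators above are commonly combined to handle single bits in a word of flags. The helpers below are a hypothetical sketch (the names are not from any library); bit number 0 is the least significant one.

```c
/* Return flags with bit n raised. */
unsigned set_bit(unsigned flags, int n)
{
    return flags | (1u << n);
}

/* Return flags with bit n lowered. */
unsigned clear_bit(unsigned flags, int n)
{
    return flags & ~(1u << n);
}

/* Return flags with bit n inverted. */
unsigned toggle_bit(unsigned flags, int n)
{
    return flags ^ (1u << n);
}

/* Return 1 if bit n is raised in flags, 0 otherwise. */
int test_bit(unsigned flags, int n)
{
    return (flags >> n) & 1u;
}
```

The parentheses around the shifts are there for readability and to avoid relying on operator priority.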

The switch control construct

The switch control construct is used to choose between several different behaviours according to an integer expression, keeping in mind that a character in single quotes is an integer number. The syntax is different from that of other constructs, as the braces are mandatory. Moreover, it uses as many as three keywords: switch, case and default.

The complete syntax is as follows:

switch ( integer-expression ) {
    case constant-expression :
        [ instruction ... ]
        [ break ; ]
    case constant-expression :
        [ instruction ... ]
        [ break ; ]
    [ default: ]
        [ instruction ... ]
        [ break ; ]
}

The expressions in each case must be constant expressions, i.e. they must be integer and their value must be known at compile time. After each case, instructions are optional, to allow grouping the same code under several cases.

Putting break at the end of each case is optional, to allow instructions associated with a case to continue with the instructions of the next case; when you deliberately omit break you should always add a comment about it, or it will look like an error to people reading your code.

The default branch is optional; if it exists, it is used when no case expression matches the integer expression. Default is usually the last branch, but can appear in any position.

Example: extremely inefficient conversion from hex to decimal, one char at a time. Note how c is modified after being used to select the correct case; this shouldn't surprise you, as the expression used to select the case is evaluated only once, at the beginning.

int value;
int nextchar(int c)
{
    switch(c) {
        case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
            c = c - 'a' + 10 + '0';
            /* fall through */
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            value = value * 16 + c - '0'; /* each hex digit is worth 16 times the next */
            break;
        case 'p':
            printf("%i\n", value); value = 0;
            break;
        default:
            return -1; /* error */
    }
    return 0;
}

Usually switch is used to select between different commands, for example in the implementation of the ioctl() system call, or in the parsing of command line arguments. Using two or more case clauses for the same code block is uncommon, and equally uncommon is the need to fall through from one case clause into the next.
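
The command-selection use can be sketched as follows. The command codes and the function name run_command are hypothetical, chosen only for illustration.

```c
/* Hypothetical command codes */
#define CMD_RESET 0
#define CMD_INC   1
#define CMD_DEC   2

/* Apply a command to the counter; return 0 on success, -1 on error. */
int run_command(int cmd, int *counter)
{
    switch (cmd) {
    case CMD_RESET:
        *counter = 0;
        break;
    case CMD_INC:
        (*counter)++;
        break;
    case CMD_DEC:
        (*counter)--;
        break;
    default:
        return -1; /* unknown command */
    }
    return 0;
}
```

Each case ends with break, and default reports the error to the caller instead of handling it locally.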

Data structures

A data structure (struct) can include other structures or pointers to other structures. While pointers can cyclically refer to one structure from another, structure inclusion can't be recursive, as the included structure is contained in the including one in its entirety.

If a structure includes a pointer to another structure, such other structure must have been declared in advance (even without defining the field list), because the compiler reads source code only once. After declaration, without a definition, you can't instantiate a data structure because the compiler doesn't know its size, but you can instantiate a pointer to it, as all pointers have the same size.

Example:

struct father;

struct child {
    struct father *father;
    /* ... */
};

struct father {
    struct child *child;
    /* ... */
};

A structure declaration without a field list also makes it possible to have opaque structures in a library; the technique is used for data that is private to the library. If a structure includes a pointer to another structure of the same type, you do not need to declare it in advance, because when the compiler reads the field list it has already seen the structure name.

struct dpriv;
struct datum {
    struct dpriv *priv; /* users of "datum" don't know the contents of "dpriv" */
    /* ... */
    struct datum *next; /* pointer to another struct, to build a list */
};
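
A structure that points to another one of the same type is the usual building block for a list. The following sketch, with hypothetical names (struct item, list_length), walks such a list until it finds the null pointer.

```c
#include <stddef.h>

struct item {
    int value;
    struct item *next; /* NULL marks the end of the list */
};

/* Count the items by following the "next" pointers. */
int list_length(const struct item *head)
{
    int n = 0;

    for (; head; head = head->next)
        n++;
    return n;
}
```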

Data and code scope

The language has a single flat name space for variables and functions. A variable can't have the same name as a function, because in the linker a name is associated to a single address, be it code or data.

Unlike global variables, local ("automatic") variables are only visible in the block where they are declared. Such a block can be a function or a compound statement enclosed in braces, either the body of a control statement (if, for, and so on) or a standalone compound statement. Variables that are local to a block are allocated on the stack; functions, on the other hand, can't be defined locally with a limited scope. If a variable defined within a block has the same name as another one, global or local (to an outer block), within the inner block the name refers to the inner variable. As said, function arguments can be used like variables that are local to the function itself.

The static keyword is a qualifier for code and data: it is used to change the default scope rules. If a global symbol (function or variable) is declared static, it isn't visible outside of the source file where it is defined, because its name is not exported to the linker. A local variable, if defined static, is allocated in the global data space, but without exporting its name; in this way you can have a persistent data space within the block where it is defined. Example:

int i; /* global */
static int j; /* global, but only visible in this file */
static int invert(int i) /* the function can only be called in this file */
{
      int j; /* allocated on the stack */
      j = -i; /* two local variables, where i is the function argument */
      return j;
}
int count(void) /* count is globally defined in the program */
{
      static int i;  /* local but persistent across calls, initially 0 */
      return ++i; /* increment the counter and return its value */
}
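
The rule about inner names hiding outer ones can be seen in this small sketch, where the same name is used at three levels. The names level and shadow_demo are hypothetical; such reuse is shown only to illustrate the rule, and is best avoided in real code.

```c
int level = 1; /* global */

int shadow_demo(void)
{
    int seen_outer = level; /* still the global: 1 */
    int level = 2;          /* hides the global from here on */
    int seen_inner;

    {
        int level = 3;      /* hides the function-level variable */
        seen_inner = level;
    }
    /* here level is again the function-level variable */
    return seen_outer * 100 + level * 10 + seen_inner;
}
```

The function returns 123, and the global variable is left untouched.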

Important gcc options

The gcc compiler, like every implementation of cc, is controlled by command line options. Input files are processed according to their name: if they end in .c they are compiled, if they end in .S they are passed to the assembler, and if they end in .o they are just passed to the linker. Its most important options are the following. Below, file refers to a generic filename, not to the same file in all examples:


gcc options -o file : this overrides
the default name for the output file, using the one specified instead.

gcc -c file : "compile only". The output
is an object file, instead of a complete executable. The default output
name is derived from the input name, but usually -o is
passed explicitly.

gcc -E file : preprocess only. By default
the preprocessed output is written to stdout, unless you
specify a file with -o.

gcc -Dsymbolname : defines a preprocessor macro,
giving the symbol the value 1, meant for #ifdef blocks.

gcc -Dsymbolname=value : defines a preprocessor
macro, assigning the value to the symbol.

gcc -Idirectory : tells the compiler to look
in the directory for include files, before looking in default directories.

gcc -Ldirectory : same as above, but for library
files.

gcc -lname : link with the specified library,
in addition to the default ones.



Example:

gcc -DDEBUG jpegdemo.c -I/usr/local/include -L/usr/local/lib \
    -ljpeg -o jpegdemo
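
On the source side, a symbol defined with -D is usually tested with #ifdef. The following is only a sketch: the TRACE macro and the function twice are hypothetical and are not taken from jpegdemo.c.

```c
#include <stdio.h>

#ifdef DEBUG
#  define TRACE(msg) fprintf(stderr, "debug: %s\n", msg)
#else
#  define TRACE(msg) ((void)0) /* compiled out without -DDEBUG */
#endif

int twice(int x)
{
    TRACE("twice called");
    return 2 * x;
}
```

When the file is compiled with -DDEBUG the message goes to stderr; otherwise the macro expands to an empty expression and no code is generated for it.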

Programming style

Please be consistent in your program layout: always indent blocks in the
same way, whatever your preferred indentation style is. The most common
style is the Kernighan and Ritchie one (open brace at end of line,
closed brace alone on a line).  Your personal preference is not very
important, but consistency in your files is.

The TAB character is 8 spaces, whatever your indenting level is
(2, 4, 8 spaces). Please check your editor's configuration, whose
default may be wrong.

Functions should be short and understandable. If a function gets too
complex you should split blocks that are conceptually separate into
separate functions.

Use data structures as much as possible, for better readability
and maintainability. Define constructors and destructors for your
objects, using dynamic allocation instead of global variables.

Always check errors: every function you call may fail, so the calling
code should check return values and behave in a reasonable way -- which
often means passing the error back to the caller.

Don't call exit from within a function if an error
happens, leaving that decision to the main program.

Add good comments to your code; avoid exceedingly "smart" constructs,
but if you do use them please explain why you made the specific implementation
choice.

Always make clear your license terms in the source file; without
any such terms the "all rights reserved" applies by default. Even when
this is your intention, you should make that clear to avoid possible
doubts about it.

Avoid user interaction if not really needed. If needed, please read
stdin with fgets and then sscanf;
never use scanf directly as it may bite you. Write to
stdout by complete lines, with a trailing
'\n'. Avoid unneeded output ("silence is golden") and unneeded
empty lines.
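
The fgets and sscanf advice, together with the rule about passing errors back to the caller, can be sketched as follows. The function names parse_int and read_int and the 80-byte line buffer are assumptions made for this example.

```c
#include <stdio.h>

/* Parse an integer from a text line: 0 on success, -1 on invalid input. */
int parse_int(const char *line, int *result)
{
    if (sscanf(line, "%i", result) != 1)
        return -1; /* pass the error back to the caller */
    return 0;
}

/* Read one line from f and parse it: -1 on end of file or invalid input. */
int read_int(FILE *f, int *result)
{
    char line[80];

    if (!fgets(line, sizeof(line), f))
        return -1;
    return parse_int(line, result);
}
```

A main program would call read_int(stdin, &value) and decide by itself what to do when the function returns -1.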

What's missing

Constructs that have not been covered in these two documents, as they are
rarely used, are:


enum: defining symbolic names for constants without
resorting to the preprocessor.

typedef: defining new types starting from existing ones.

union: a special type of structure, very useful in special
situations, but somewhat tricky.

volatile, const, inline:
qualifiers that affect how the compiler optimizes code.

goto and labels: this is a dreaded construct but there are
specific situations where it is useful.

register: an obsolete directive for optimization.
Avoid it and be wary of those who promote it.

bit fields: data structures may include fields whose dimension is
expressed in bits (e.g.: 1 bit, 6 bits, 18 bits).  The feature is almost
unused and has serious portability problems.

All gcc extensions, like the use of assembly code within
C source files and a zillion other useful but exotic things. If you need them,
they are well documented in the compiler manual.


    
    Alessandro Rubini


Last modified: Sep 2010