This document provides more in-depth information on the C language, while completely ignoring library functions, which are not actually part of the language. It is an appendix to A-C-X-en.html and is meant to be useful for people who already program in some other language.
We must always keep in mind that C has been designed to be near the processor and its structures. It shouldn't thus be a surprising choice, made by the language authors, to represent an array with the address of its first element. The concepts of array and pointer are thus largely interchangeable: you can apply an index to a pointer, using square brackets as if it were an array, without any error or warning. The name of an array, however, differs from a pointer in that you can't assign a new value to it: an array name represents a constant address, which has been chosen at compilation time.
You can add and subtract pointers and integers: the result of adding an integer n to a pointer is a pointer to the element n positions ahead in the array. In other words, the number is not the number of bytes to add to the original address, but the number of elements. The scale factor is applied by the compiler according to the type of the pointer you do arithmetic on. So you can increment or decrement a pointer, as well as take the difference (but not the sum) of two pointers of the same type; the result is an integer number. If you do arithmetic on generic pointers (pointers to void), the gcc compiler uses 1 byte as scale factor, but the operation is not allowed according to the standard.
The "null pointer" is zero and is not a valid pointer; functions that return a pointer usually use the null pointer as an error marker. The macro NULL is usually "0", or sometimes "(void *)0", and zero can be assigned to any type of pointer.
The most important operators for working with pointers are "*" (read as "pointed by") and "&" ("the address of").
Examples:
int i, v[10], *p; /* an integer, an array, a pointer:
                     "the element of v, which is 10 items long, is an integer"
                     "what is pointed to by p is an integer" */
p = v;        /* p takes the value of v, i.e. the address of its first item */
p = &v[0];    /* same as above */
p = &v[4];    /* p takes the address of the fifth element of v */
p = v+4;      /* same as above */
p++;          /* p is incremented */
i = p-v;      /* i takes 5, as a result of the previous two lines */
In the following loop, the integer i takes the sum of all elements of the array v up to the first zero-valued one. Note that there must be a zero-valued array element, or the loop will run past the end of the array. The language doesn't offer any check on pointers and array indices.
i=0; for (p = v; *p; p++) i += *p;
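The scale factor described above can be made visible by measuring the same distance both in elements and in bytes. This is only a minimal sketch: the function names elem_distance and byte_distance are made up for the illustration.

```c
#include <stddef.h>

/* distance in elements: the compiler divides by sizeof(int) */
ptrdiff_t elem_distance(const int *from, const int *to)
{
    return to - from;
}

/* distance in bytes: pointers to char have a scale factor of 1 */
ptrdiff_t byte_distance(const int *from, const int *to)
{
    return (const char *)to - (const char *)from;
}
```

With an array v of integers, elem_distance(v, v + 4) is 4 while byte_distance(v, v + 4) is 4 * sizeof(int).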
Assignment between pointers of different types is an error, and so is any arithmetic between pointers of different types. You can nonetheless convert a pointer of one type into a pointer of another type, as well as pointers into integers and back. These conversions don't generate any machine code, as the values are all integers within the CPU; but they are sometimes needed to ensure type consistency of the source code.
A void pointer can always be assigned to any type of pointer, and any type of pointer can be assigned to a void pointer. This is allowed because the void pointer is used to handle generic memory addresses, a very common task in operating systems and system libraries.
The sizeof operator, applied to a type or a variable name or an expression, returns the size in bytes of the named object. This operation is performed at compile time, according to the type being passed to sizeof. If a pointer p is incremented, its numeric value (memory address, in bytes) is incremented by sizeof(*p).
Example:
int i, v[10], *p;           /* the same variables used earlier */
i = sizeof(int);            /* usually 4, but may be 8 or 2 */
i = sizeof(i);              /* same as above */
i = sizeof(v[0]);           /* same as above */
i = sizeof(*p);             /* same as above */
i = sizeof(p);              /* usually 4 (pointer size), or 8 */
i = sizeof(v);              /* 40, or 80, or 20 */
i = sizeof(v)/sizeof(*v);   /* 10: the number of elements in the array */
#define ARRAY_SIZE(x) (sizeof(x)/sizeof(*x))   /* a useful macro */
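A consequence worth sketching: since an array is passed to a function as the address of its first element, sizeof can't recover the number of elements inside the called function, and the count has to be passed explicitly. The function name sum_items is an assumption made for this example.

```c
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))

/* inside the function v is a plain pointer, so sizeof(v) would be the
   pointer size: the number of elements must be passed explicitly */
int sum_items(const int *v, size_t n)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

A caller that still sees the array declaration can write sum_items(v, ARRAY_SIZE(v)).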
C strings are arrays of characters, terminated by a zero byte. Their double-quote representation is just a simplified notation to represent an array. Each time a string in double quotes appears in the program text, the compiler saves the string in the data segment of the program and represents it with the address of its first character. A character within single quotes is an integer number, i.e. the ASCII code of the character itself.
Examples of declarations of strings and pointers:
char s[] = "test";                    /* an array of 5 bytes, including the terminator */
char s[] = {'t', 'e', 's', 't', 0};   /* the same, more baroquely */
char c, *t;               /* a character and a pointer to character */
c = *s;                   /* c takes 't' */
t = s+2;                  /* t represents the string "st" */
s[0] = 'p';               /* the string s is now "pest" */

char *name = "arthur";    /* a pointer to an initialized area, 7 bytes long */
char surname[] = "smith"; /* a 6 byte area, whose address is "surname" */
name++;                   /* now name is "rthur" */
surname++;                /* error: surname is a constant address */
strlen, the function that returns the length of a string:
int strlen(char *s)
{
    char *t = s;
    for (; *t; t++)
        ;
    return t - s;
}
Like an array, a function is represented by an address, the address of its code. Every time you use a function name in a program, you are actually using the pointer to the function. The most common use of function pointers is applying the parentheses operator, even if you don't usually think about this as using a function pointer. A function pointer can also be assigned to other pointers, for example within data structures that define the methods that act on objects, or it can be passed as an argument to other functions, like the library function qsort, which implements the "quick sort" algorithm on an array. The compiler checks at compile time that pointer types in assignments are compatible, i.e. that the functions receive the same arguments.
Example:
#include <string.h>   /* get strcmp declaration */
#include <stdlib.h>   /* get qsort declaration */

char *strings[100];   /* array of 100 char pointers */

/* qsort passes pointers to the array elements (here, pointers to char *),
   so strcmp can't be used directly: it needs a small wrapper */
int compare(const void *a, const void *b)
{
    return strcmp(*(char * const *)a, *(char * const *)b);
}

strcmp(strings[0], strings[1]);       /* comparison of two strings */
/* let's call qsort telling it to use compare as compare function */
qsort(strings, 100, sizeof(char *), compare);

strncmp(strings[0], strings[1], 5);   /* only compare the first 5 chars */
/* the following is an error, as strncmp receives three arguments */
qsort(strings, 100, sizeof(char *), strncmp);
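The other use mentioned above, function pointers stored inside data structures, can be sketched as a small table that associates a name with a function. All names here (add, sub, struct operation, ops, apply) are hypothetical and only serve the illustration.

```c
/* two functions with the same arguments and return type */
int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

/* a data structure that associates a name with a function pointer */
struct operation {
    char name;
    int (*fn)(int a, int b);
};

struct operation ops[] = {
    { '+', add },
    { '-', sub },
};

/* look for the named operation and call it; returns 0 if not found */
int apply(char name, int a, int b)
{
    unsigned i;

    for (i = 0; i < sizeof(ops) / sizeof(*ops); i++)
        if (ops[i].name == name)
            return ops[i].fn(a, b);   /* parentheses applied to a pointer */
    return 0;
}
```

Adding a new operation only requires a new line in the table, as long as the function has the same arguments and return type.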
The language doesn't offer primitives for memory management (like new, constructors or destructors), and has no garbage collection.
Programs use three types of memory: static, dynamic, and automatic. A variable or data structure is static when it is declared at compilation time; the linker assigns an immutable address to it. A dynamic structure is allocated at program run-time, for example by calling malloc, and access to its data happens through a pointer. A so-called "automatic" variable is allocated on the stack and disappears when the program leaves the code block that declared it.
A static variable is initialized to zero, unless the program declares a constant value to load into the variable. Initialized variables are saved to disk and live in the "data segment" of the program and of the executable ELF file. Uninitialized variables live in the "bss segment" of the program, which is a memory area that is allocated and zeroed before the program starts running. The executable file on disk doesn't include a copy of the BSS area but only a declaration.
A dynamic variable is stored in memory that is requested from the system at run-time. After allocation, you can't make any assumption about the content of such memory space: it may be zero-filled but it may also contain information from data that was previously allocated and then freed. In any case, each malloc requires a corresponding free; without it the program will leak memory, and the size of the process will slowly and continuously increase during execution. While all the memory allocated by a user-space program is released at program termination, kernel memory that is allocated and not freed is lost, and such areas can only be recovered by rebooting the system.
An automatic variable is a local variable in a function or in a code block. It lives on the stack and is not initialized, unless the programmer does it explicitly. If it is initialized, the compiler will output the code that is needed to do the initialization. Memory associated with automatic variables, being on the stack, can't be used after the program leaves the block where they are defined.
Examples:
int i;               /* initialized to zero, lives in bss segment */
int v[4] = {2,1,};   /* initialized to {2,1,0,0}, data segment */
int j = f(3, i);     /* error: the value is not known at compile time */

int *f(int x, int y)
{
    int z;                  /* automatic, unknown value */
    int a=0, b=1, c=2;      /* run-time initialization */
    int *p = malloc(4 * sizeof(int));   /* another run-time operation */
    int *q, *r = &z;        /* two pointers: one points to z */

    *q = y;                 /* error: q points to undefined place */
    *r = y;                 /* correct: r points to z so this assigns z */
    if (x)
        return p;           /* correct: p has been allocated and remains valid */
    else
        return &z;          /* error: z can't be used out of the function */
}
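As a sketch of the rules for dynamic memory (unknown initial content, one free for each malloc), the following helper allocates an array and zeroes it explicitly. The name new_zeroed is made up for the example; the library function calloc offers a similar service.

```c
#include <stdlib.h>

/* allocate an array of n integers and zero it explicitly, as malloc
   makes no promise about the content; returns NULL on failure */
int *new_zeroed(size_t n)
{
    int *p = malloc(n * sizeof(*p));
    size_t i;

    if (!p)
        return NULL;
    for (i = 0; i < n; i++)
        p[i] = 0;
    return p;   /* the caller owns the memory and must free() it */
}
```

Unlike the address of an automatic variable, the returned pointer remains valid after the function returns, until the caller passes it to free.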
A table of operators is found in operator.tbl, which lists priority and associativity. The file can be printed in an A4 or A5 page.
The operands of every operator are always other expressions, with two exceptions. This section explains how to use each operator, in the same order as operator.tbl.
()
function call. The operator takes
a single operand, a function name or a function pointer, but you
must list the arguments inside the parentheses, each argument being
an expression.
extern int (*rd_data)(void *buffer, int count);
void *p = malloc(1024);
int result = rd_data(p,1024);
[]
array element. The operator
takes two operands, one before the brackets and the other between
the two brackets: usually the first operand is a pointer and the second
is an integer number, but in practice the operation is commutative.
If v is a pointer to integer or an array, the following instructions are all equivalent:
v[3] = 0;
*(v+3) = 0;
3[v] = 0;
.
structure element. The operands
are a structure (i.e. an expression whose value is a structure, as
all operands in general can be complex expressions) and the name
of a field in that structure -- this is one of the cases where the
operand is not an expression.
#include <sys/stat.h>   /* st_mode is an int field in struct stat */
struct stat st, *stptr, stvec[10];
int i;
i = st.st_mode;
i = (*stptr).st_mode;
i = stvec[5].st_mode;
->
structure element from pointer.
The operands are a pointer to structure and the name of a field. This
is the most common way to access structure fields.
#include <sys/stat.h>   /* st_mode is an int field in struct stat */
struct stat st, *stptr, stvec[10];
int i;
extern struct stat *getstatptr(int i);   /* fictional function */
i = stptr->st_mode;
i = (stvec+5)->st_mode;
i = (&st)->st_mode;     /* & in parens: it has lower priority */
getstatptr(5)->st_mode;
!
logical negation. It negates the
operand at its right: if the operand is 0, the result is one, otherwise
the result is zero.
p = malloc(128);
if (!p) {
    /* error management */
}
return !!i;   /* returns 0 if i is 0, 1 otherwise */
~
one's complement. The operator negates
all bits of the operand at its right.
i = ~0;   /* 0xffffffff if i is 32 bits, 0xff if it is a char, etc */
int page_mask = ~(PAGE_SIZE-1);   /* 0xfffff000 if the page is 4k, 0x1000 */
-
unary negation. It negates the
expression at its right.
i = -j; return -EINVAL; /* EINVAL is a positive integer error code */
++ --
increment and decrement.
The operators have a single operand; if the operator is after the operand (e.g. i++) then the increment or decrement is performed after using the value of the operand; if the operator is before the operand, the operand is incremented or decremented before using the value. The operand must be a value that can be assigned to (see the discussion about "lvalue", later in this file, where "=" is discussed).
int stack[10], sp=0;   /* stack pointer that points to the first empty cell */
stack[sp++] = datum;   /* inserting ("push") */
datum = stack[--sp];   /* extracting ("pop") */

i=10;
while (--i) {
    /* i loops from 9 to 1 */
}
i=10;
while (i--) {
    /* i loops from 9 to 0 */
}
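Since the language checks no array index, a real stack built on the push and pop idiom above needs explicit bounds checks. The following is a minimal sketch, where the names push, pop and STACK_SIZE and the use of -1 as an error value are assumptions of this example.

```c
#define STACK_SIZE 10

static int stack[STACK_SIZE];
static int sp;   /* index of the first empty cell, initially 0 */

/* returns 0 on success, -1 if the stack is full */
int push(int datum)
{
    if (sp >= STACK_SIZE)
        return -1;
    stack[sp++] = datum;    /* use sp, then increment it */
    return 0;
}

/* returns 0 on success, -1 if the stack is empty */
int pop(int *datum)
{
    if (sp <= 0)
        return -1;
    *datum = stack[--sp];   /* decrement sp, then use it */
    return 0;
}
```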
&
address retrieval. The operator
returns the address of the operand at its right.
#include <sys/stat.h>
struct stat stbuf;
stat("/bin/sh", &stbuf);   /* the function writes into stbuf */

char s[32];
sscanf(s, "%i", &i);       /* sscanf parses the string and writes to i */
*
pointer dereference. The operator
references the pointer at its right, returning the pointed-to value.
int v[32], *p;
for (p = v; p < v + 32; p++)   /* scan all 32 elements, staying in bounds */
    sum = sum + *p;
(type)
type change (cast).
The syntax "(type)expression" converts the expression into the given type. The operation often generates no code, for example if you convert from unsigned to signed, or between pointers to different types.
/* mmap returns an address, and -1 is used in case of error */
addr = mmap( /* arguments */ );
if (addr == (void *)-1) {
    /* error management */
}
sizeof
size in bytes. The sizeof operator is evaluated at compile time, and the result is a constant integer in the generated code. The operator returns the size of the type of the data item at its right. It's common practice to place the operand in parentheses and think about sizeof as a function, even though syntactically such parentheses are arithmetic (to alter priority of operations), as sizeof doesn't require parentheses when applied to an expression; they are only mandatory around a type name.
struct buf *buffers = malloc(10 * sizeof(struct buf));

if (sizeof(int) == 2) {
    /* code for the 16-bit machines */
} else if (sizeof(int) == 8) {
    /* code for 64-bit */
} else {
    /* code for 32-bit processors */
}
* /
multiplication and division.
Normal arithmetic multiplication and division. The asterisk in
this use case is not ambiguous with pointer dereferencing,
because multiplication takes two operands and can't be performed on pointers.
Integer multiplication doesn't handle any overflow, and integer
division discards the remainder.
fahr = cels * 9 / 5;   /* integer temperature conversion */
cels = fahr / 9 * 5;   /* wrong as the remainder is discarded. Use "* 5 / 9" */
int nsec = sec * 1000000000;   /* overflow if sec < -2 or sec > 2, with 32-bit int */
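One common way to avoid the overflow in the last line is to force the multiplication to be performed on a wider type. A sketch, assuming int is 32 bits wide and using the made-up name sec_to_nsec:

```c
/* long long is at least 64 bits wide, so the product can't overflow
   for any value of a 32-bit sec */
long long sec_to_nsec(int sec)
{
    return sec * 1000000000LL;   /* the LL suffix forces 64-bit arithmetic */
}
```

The suffix matters: without it both operands are int, and the overflow happens before the result is converted to the wider return type.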
%
remainder of integer division. When applied
to two integer operands, it returns the remainder of the division.
void print_time(int s)
{
    int h, m;
    m = s / 60; s = s % 60;
    h = m / 60; m = m % 60;
    printf("%i:%02i:%02i\n", h, m, s);
}
+ -
sum and subtraction. One of
the two operands may be a pointer, in which case the result
is a pointer of the same type. Like for multiplication, no
overflow is checked.
int a=200, b=300;
unsigned int c;
c = a-b;   /* a positive number: 2 to the 32nd power minus 100, with 32-bit integers */
<< >>
bit shift.
The result is the left operand shifted left or right by the number
of bits represented by the right operand.
/* conversion from rgb888 to rgb565 -- */
unsigned char rgb[3];
unsigned short pixel;
pixel = ((rgb[0]>>3) << 11) + ((rgb[1]>>2) <<5) + (rgb[2]>>3);
/* be careful about saving the 16-bit pixel on big-endian/little-endian */
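The same conversion can be wrapped in a function, together with the inverse extraction of one component, which combines a shift with a bit mask. The function names are hypothetical and the sketch assumes the usual rgb565 layout (5 bits red, 6 bits green, 5 bits blue).

```c
/* pack three 8-bit components into a 16-bit rgb565 pixel */
unsigned short rgb888_to_rgb565(unsigned char r, unsigned char g,
                                unsigned char b)
{
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3);
}

/* extract the 6-bit green component back from the pixel */
unsigned green_of_rgb565(unsigned short pixel)
{
    return (pixel >> 5) & 0x3f;
}
```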
< <= > >=
comparison. The result
of the operation is an integer: 1 if comparison is true, 0 if comparison
is false.
int out_of_size = i>100 || i<50;
i = i * 1000 / 254;   /* convert centimeters to tenths of an inch */
if (out_of_size) {
    /* error management */
}
== !=
comparison. Like above,
the result is 0 or 1.
int zero = i==0;
&
bitwise AND between integers. Each
bit of the result is 1 only if both corresponding bits are 1 in the
operands.
int low_byte = val & 0xff; int high_byte = val & 0xff00;
^
bitwise XOR between integers. Each bit
of the result is 1 if the corresponding bits in the operands are different.
while ( /* condition */ ) {
    /* calculation */
    led = led ^ 1;   /* blink least significant bit */
}
|
bitwise integer OR. Each
bit in the result is 1 if at least one of the operands' bits is 1.
flags = flags | FLAG_BUSY; /* ... */ flags = flags & ~FLAG_BUSY;
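The three bitwise idioms for flags (raise with OR, lower with AND and one's complement, test with AND) can be sketched as small helpers. The flag values and the function names are assumptions made for this example.

```c
#define FLAG_BUSY  0x01   /* hypothetical bit values */
#define FLAG_ERROR 0x02

/* raise a bit */
unsigned set_flag(unsigned flags, unsigned flag)
{
    return flags | flag;
}

/* lower a bit */
unsigned clear_flag(unsigned flags, unsigned flag)
{
    return flags & ~flag;
}

/* 1 if the bit is raised, 0 otherwise */
int test_flag(unsigned flags, unsigned flag)
{
    return (flags & flag) != 0;
}
```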
&&
logical AND. The result
is 1 if both operands are true (not zero), otherwise the result
is zero. The second operand is evaluated only if the
first operand is true; if the left one is false, the result is
already known, so the right operand is not evaluated.
/* call the print method only if no involved pointer is null */
if (strptr && strptr->methods && strptr->methods->print)
    strptr->methods->print(strptr);
||
logical OR. The result is 1 if at least one of the operands is not zero, otherwise the result is zero. If the left operand is not zero, the result is already known, so the right operand is not evaluated. The priority of OR is lower than AND because OR is similar to an addition, while AND is similar to a multiplication.
if (v[i] || fill_item(&v[i]) || set_default(&v[i])) /* work on v[i] */ ;
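The evaluation rules of the two logical operators can be checked with a helper that counts how many times it is called. This is just a sketch; calls, count_and_return and right_evaluations are made-up names.

```c
static int calls;   /* how many times the helper has been evaluated */

static int count_and_return(int value)
{
    calls++;
    return value;
}

/* returns the number of times a right operand was evaluated */
int right_evaluations(int left_and, int left_or)
{
    calls = 0;
    (void)(left_and && count_and_return(1));   /* evaluated if left_and != 0 */
    (void)(left_or  || count_and_return(1));   /* evaluated if left_or == 0 */
    return calls;
}
```

The right operand of && runs only when the left one is true, and the right operand of || runs only when the left one is false.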
?:
conditional expression. The
operator receives three operands: if the expression before the
question mark is true, the second expression (before the colon) is
evaluated as result of the operator, otherwise the third expression is
evaluated. The type of the third expression must be compatible
with the type of the second expression.
printf("%i byte%s in %i file%s", bytes, bytes==1 ? "" : "s", files, files==1 ? "" : "s");
=
assignment. The assignment is
an expression whose result is the value that has been assigned.
The left operand must be a variable or a data structure or an equivalent
expression. Such operand is called "lvalue", for "left value". Any
compiler message referring to an lvalue is about an assignment error.
a = b = c = 0;   /* a = (b = (c = 0)) */
if (i = 0)       /* syntactically valid, it means if(0) */
    ;
stat_array[12]->st_mode = 0;   /* good */
"nome" = s;      /* invalid lvalue: you can't assign to an array */
3 = i;           /* invalid lvalue, more apparent than above */
*= /= %= += -= <<= >>= &= ^= |=
assignment. All assignments of the type
"<expr1> op= <expr2>" are short forms for
"<expr1> = <expr1> op <expr2>".
m = s / 60; s %= 60;   /* seconds to minutes and seconds */
flags |= FLAG_BUSY;    /* raise a bit */
flags &= ~FLAG_BUSY;   /* lower a bit */
,
comma. The comma operator evaluates
the expression at the left ignoring the result and evaluates the
expression at the right returning the value as result.
It is used mainly in while loops and to make for loops with two or more indices.
while(next_number(&i), i) {
    /* while the new i is not zero */
}
for (p = v, i = 0; i<32; p++, i++) {
    /* p scans the 32-long array */
}
The control construct switch is used to choose between several different behaviours according to an integer expression, keeping in mind that a character between single quotes is an integer number. The syntax is different from that of other constructs, as the braces are mandatory. Moreover it uses as many as three keywords: switch, case and default.
The complete syntax is as follows:
switch ( integer-expression ) {
    case constant-expression :
        [ instruction ... ]
        [ break ; ]
    case constant-expression :
        [ instruction ... ]
        [ break ; ]
    [ default: ]
        [ instruction ... ]
        [ break ; ]
}
The expressions in each case must be constant expressions, i.e. they must be integer and their value must be known at compile time. After each case, instructions are optional, to allow grouping the same code under several cases.
Putting break at the end of each case is optional, to allow instructions associated with a case to continue with the instructions of the next case; when you willingly avoid break you should always add a comment about it, or it will look like an error to people reading your code.
The default branch is optional; if it exists, it is used when no case expression matches the integer expression. Default is usually the last branch, but can appear in any position.
Example: extremely inefficient conversion from hex to decimal, one char at a time. Note how c is being modified after being used to select the correct case; this shouldn't surprise you as the expression used to select the case is evaluated once only, at the beginning.
int value;

int nextchar(int c)
{
    switch(c) {
        case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
            c = c - 'a' + 10 + '0';
            /* fall through */
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            value = value * 16 + c - '0';   /* 16 is the base of hex digits */
            break;
        case 'p':
            printf("%i\n", value);
            value = 0;
            break;
        default:
            return -1;   /* error */
    }
    return 0;
}
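A variant without global state can be sketched by splitting the job in two: a switch that maps one character to its value, and a loop that accumulates the digits. The names hex_digit and hex_to_long are made up here, and the code assumes, like the example above, that the letters a-f have contiguous codes (true for ASCII).

```c
/* return the value of a hexadecimal digit, or -1 if c is not one */
int hex_digit(int c)
{
    switch (c) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        return c - '0';
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
        return c - 'a' + 10;
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
        return c - 'A' + 10;
    default:
        return -1;
    }
}

/* convert a string of hex digits; returns -1 on a bad character */
long hex_to_long(const char *s)
{
    long value = 0;

    for (; *s; s++) {
        int digit = hex_digit(*s);

        if (digit < 0)
            return -1;
        value = value * 16 + digit;
    }
    return value;
}
```

Here each group of case labels ends with return, so no break is needed and nothing falls through.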
Usually switch is used to select between different commands, for example in the implementation of the ioctl() system call, or in the parsing of command line arguments. Using two or more case clauses for the same code block is uncommon, and equally uncommon is the need to fall through into a case clause from the previous one.
A data structure (struct) can include other structures or pointers to other structures. While pointers can cyclically refer to a structure from another, structure inclusion can't be recursive as the included structure is contained in the including one in its entirety.
If a structure includes a pointer to another structure, such other structure must have been declared in advance (even without defining the field list), because the compiler reads source code only once. After declaration, without a definition, you can't instantiate a data structure because the compiler doesn't know its size, but you can instantiate a pointer to it, as all pointers have the same size.
Example:
struct father;
struct child {
    struct father *father;
    /* ... */
};
struct father {
    struct child *child;
    /* ... */
};
A structure declaration without a field list also makes it possible to have opaque structures in a library. The technique is used for data which is private to the library. If a structure includes a pointer to another structure of the same type you do not need to declare it in advance, because when the compiler reads the field list it has already seen the structure name.
struct dpriv;
struct datum {
    struct dpriv *priv;   /* users of "datum" ignore contents of "dpriv" */
    /* ... */
    struct datum *next;   /* pointer to another struct, to build a list */
};
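How such a self-referencing structure is used to build a list can be sketched as follows. The value field and the functions list_push and list_length are assumptions made for the example, and the private pointer is left out for brevity.

```c
#include <stddef.h>

struct datum {
    int value;            /* payload, made up for this example */
    struct datum *next;   /* the structure refers to its own type */
};

/* insert item at the head of the list and return the new head */
struct datum *list_push(struct datum *head, struct datum *item)
{
    item->next = head;
    return item;
}

/* count the items by following next until the null pointer */
int list_length(const struct datum *head)
{
    int n = 0;

    for (; head; head = head->next)
        n++;
    return n;
}
```

An empty list is simply a null pointer, so the same null pointer that marks the end of the list doubles as its initial value.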
The language has a single flat name space for variables and functions. A variable can't have the same name as a function, because in the linker a name is associated to a single address, be it code or data.
Unlike global variables, local ("automatic") variables are only visible in the block where they are declared. Such a block can be a function or a composite instruction enclosed in braces, either the body of a control statement (if, for, and so on) or a standalone composite instruction. Variables which are local to a block are allocated on the stack. You can't, on the other hand, define local functions with a limited scope. If a variable defined within a block has the same name as another one, global or local (to an outer block), within the inner block the name refers to the inner variable. As said, function arguments can be used like variables that are local to the function itself.
The static keyword is a qualifier for code and data: it is used to change the default scope rules. If a global symbol (function or variable) is declared static, it isn't visible outside of the source file where it is defined, because its name is not exported to the linker. A local variable, if defined static, is allocated in the global data space, but without exporting its name; in this way you can have a persistent data space within the block where it is defined.
Example:
int i;          /* global */
static int j;   /* global, but only visible in this file */

static int invert(int i)   /* the function can only be called in this file */
{
    int j;      /* allocated on the stack */
    j = -i;     /* two local variables, where i is the function argument */
    return j;
}

int count(void)   /* count is globally defined in the program */
{
    static int i; /* local but persistent across calls, initially 0 */
    return ++i;   /* increment the counter and return its value */
}
The gcc compiler, like every implementation of cc, is passed command line options. Input files are processed according to their name: if they end in .c they are compiled, if they end in .S they are passed to the assembler and if they end in .o they are just passed to the linker.
Its most important options are the following ones. Below, file refers to a generic filename, and not the same file in all examples:
gcc options -o file: this overrides the default name for the output file, using the one specified instead.
gcc -c file: «compile only». The output is an object file, instead of a complete executable. The default output name is derived from the input name, but usually -o is passed explicitly.
gcc -E file: preprocess only. By default the preprocessed file is written to stdout, if you don't specify it with -o.
gcc -Dsymbolname: defines a preprocessor macro, assigning the value 1 to the symbol, meant for #ifdef blocks.
gcc -Dsymbolname=value: defines a preprocessor macro, assigning the value to the symbol.
gcc -Idirectory: tells the compiler to look in the directory for include files, before looking in default directories.
gcc -Ldirectory: same as above, but for library files.
gcc -lname: link with the specified library, in addition to the default ones.
Example:
gcc -DDEBUG jpegdemo.c -I/usr/local/include -L/usr/local/lib -ljpeg -o jpegdemo
Programming style
Please be consistent in your program layout: always indent blocks in the same way, whatever your preferred indentation style is. The most common style is the Kernighan and Ritchie one (open brace at end of line, closed brace alone in a line). Your personal preference is not very important, but consistency in your files is.
The TAB character is 8 spaces, whatever your indenting level is (2, 4, 8 spaces). Please check your editor's configuration, whose default may be wrong.
Functions should be short and understandable. If a function gets too complex you should split blocks that are conceptually separate into separate functions.
Use data structures as much as possible, for better readability and maintainability. Define creators and destructors for your objects, using dynamic allocation instead of global variables.
Always check errors: every function you call may fail, the calling code should check return values and behave in a reasonable way -- which often means passing the error back to the caller.
Don't call exit from within a function if an error happens, leaving that decision to the main program.
Add good comments to your code; avoid exceedingly "smart" constructs, but if you use them please explain why you made the specific implementation choice.
Always make clear your license terms in the source file; without any such terms the "all rights reserved" applies by default. Even when this is your intention, you should make that clear to avoid possible doubts about it.
Avoid user interaction if not really needed. If needed, please read stdin with fgets and then sscanf, never use scanf directly as it may bite you; write to stdout by complete lines, with a trailing '\n'. Avoid unneeded output ("silence is golden") and unneeded empty lines.
What's missing
Constructs that have not been covered in these two documents, as they are rarely used, are:
enum: defining symbolic names for constants without resorting to the preprocessor.
typedef: defining new types starting from existing ones.
union: a special type of structure, very useful in special situations, but somewhat tricky.
volatile, const, inline: qualifiers to help with code optimization.
goto and labels: this is a dreaded construct but there are specific situations where it is useful.
register: an obsolete directive for optimization. Avoid it and be careful about who promotes it.
gcc extensions, like use of assembly code within C source files and a zillion other useful but exotic things. If you need them, they are well documented in the compiler manual.
Alessandro Rubini
Last modified: Sep 2010