Tuesday, December 28, 2010

simple C example 2

Continuing from last time (here), we put all the previous code (minus the function main) into a separate file print_bits.c and then put the declarations of our two functions in a header file: print_bits.h:

void print_byte(unsigned char);
void print_word(const char *);
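The previous post's code isn't repeated here, so what follows is only a hypothetical sketch of what print_bits.c might contain. It assumes print_byte prints eight binary digits followed by the decimal value, which matches the sample output below ("00010101 21"). It also assumes print_word prints each character of a string on its own line, and that format may differ from the original. The helper byte_to_bits is an invented name. The real file would also #include "print_bits.h"; that line is left out so the sketch compiles on its own.

```c
#include <stdio.h>

/* Hypothetical helper: fill out[0..7] with the bits of b, most
   significant bit first, and NUL-terminate it (out holds 9 chars). */
void byte_to_bits(unsigned char b, char out[9]) {
    int bit;
    for (bit = 7; bit >= 0; bit--) {
        out[7 - bit] = ((b >> bit) & 1) ? '1' : '0';
    }
    out[8] = '\0';
}

/* Print one byte as binary digits, a space, then its decimal value,
   e.g. "00010101 21". No newline, so the caller can append more. */
void print_byte(unsigned char b) {
    char bits[9];
    byte_to_bits(b, bits);
    printf("%s %u", bits, (unsigned) b);
}

/* Assumed format: one line per character, showing the character,
   then its bits and decimal value. */
void print_word(const char *s) {
    for (; *s != '\0'; s++) {
        printf("%c ", *s);
        print_byte((unsigned char) *s);
        printf("\n");
    }
}
```

Taking an unsigned char keeps values such as 148 from printing as negative numbers on systems where plain char is signed.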

We modify main, adding code to read the command-line arguments that the operating system passes in through argc and argv, and put it in a separate file, bits.c:

#include <stdio.h>
#include "print_bits.h"
#include "cast.h"

int main(int argc, char *argv[]) {
    /* string literals are already NUL-terminated; no explicit \0 needed */
    const char *greeting = "Hello world!";
    printf("greeting: %s\n\n", greeting);
    char c;
    c = 'a';
    printf("%c\n", c);
    print_word(greeting);
    int i;
    printf("\n");
    /* argv[0] is the program name, so the user's arguments start at 1 */
    if (argc > 1) {
        for (i = 1; i < argc; i++) {
            print_word(argv[i]);
        }
        printf("\n");
    }
    cast_int();
    return 0;
}

We need to #include <stdio.h> to use printf, but we don't need the other C library headers here; they are placed in print_bits.c. There is a second set of new files (cast.c and cast.h) that I'll explain below.

Our project isn't very complicated, but it can still benefit from the make tool. We put a file called Makefile in our working directory. Ours contains:


bits: bits.o print_bits.o cast.o
	gcc -o bits bits.o print_bits.o cast.o

bits.o: bits.c print_bits.h cast.h
	gcc -c bits.c

print_bits.o: print_bits.c
	gcc -c print_bits.c

cast.o: cast.c
	gcc -c cast.c

.PHONY: clean
clean:
	rm -f bits.o print_bits.o cast.o bits


As explained in the manual for make and in Norm Matloff's introduction (here), each rule has the form:

target ... : prerequisites ...
recipe
...


When invoked with no arguments, make builds the first target in the Makefile, here the executable bits, by linking three object files; each of these in turn depends on the source files listed after its colon. The -c option tells gcc to compile only and skip the linking step. The clean target deletes the object files and the executable (run it with make clean). One odd requirement is that every recipe line must begin with a tab character, not spaces.

The point is that make calls the compiler only when a target is missing or older than one of its prerequisites, judging by their last-modification timestamps. So in a large project, only the files that have changed get recompiled in each debug cycle.
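The timestamp rule can be illustrated with a small C sketch built on the POSIX stat() call. This is not how make is actually implemented, just the same comparison under simplifying assumptions. needs_rebuild is a hypothetical name. It compares whole-second modification times, while real make implementations may use finer-grained timestamps.

```c
#include <sys/stat.h>

/* Hypothetical helper mirroring make's basic rule for one prerequisite:
   return 1 if target is missing or older than prereq, 0 if it is up to
   date, and -1 if prereq itself cannot be examined. */
int needs_rebuild(const char *target, const char *prereq) {
    struct stat t, p;
    if (stat(prereq, &p) != 0) {
        return -1;          /* missing prerequisite: make would report an error */
    }
    if (stat(target, &t) != 0) {
        return 1;           /* target has never been built */
    }
    return p.st_mtime > t.st_mtime;   /* prerequisite changed since last build */
}
```

For example, needs_rebuild("cast.o", "cast.c") returns 1 after cast.c is edited and 0 once gcc -c cast.c has rewritten cast.o.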

[UPDATE: As pointed out in the comments, an earlier version of this post had a problem because I forgot the proper syntax for our own header files, which is #include "print_bits.h" (with quotes) rather than the angle-bracket form used for system headers such as #include <stdio.h>. The form of the #include determines the search path. See here. My makeshift solution :) was to tell the compiler to search the current directory using the -I. flag. With the quoted form, gcc looks in the directory of the including file first, so -I. is no longer strictly needed.]

The last part of this little project explores casting. We have two int variables j and k, and a long (int) variable m, declared and assigned in turn (in cast.c). In order to examine what these look like in memory, we do this:

char *c = (char *) &j;

which assigns the address of j to a char pointer c. We tell the compiler that yes, we really want to do this, using a cast (char *). We examine the surrounding bytes as follows:

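int i;
int j = 10;
int k = 660;
long m = 21;
char *c = (char *) &j;
/* c holds the address of j; %p expects a void pointer */
printf("address of c: %p\n\n", (void *) c);
/* peek at the 16 bytes below j and j's own 4 bytes; reading outside j
   is undefined behavior, so the layout seen is compiler-specific */
for (i = -16; i < 4; i++) {
    print_byte(c[i]);
    printf(" %p\n", (void *) &c[i]);
}
/* sizeof yields a size_t, which is printed with %zu */
printf(" int: %zu\n", sizeof(int));
printf(" long: %zu\n", sizeof(long));
printf("size_t: %zu\n", sizeof(size_t));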
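Indexing c[-16] reads memory that does not belong to j, so the neighbors that show up there depend entirely on the compiler. If you only want j's own bytes, the same cast idea can stay inside the object. In this sketch, int_bytes is a hypothetical helper. It uses unsigned char rather than char so that a byte such as 148 is never read back as a negative number.

```c
#include <stddef.h>

/* Hypothetical helper: copy the sizeof(int) bytes of v into out, in the
   order they sit in memory, using the same pointer cast as (char *) &j. */
void int_bytes(int v, unsigned char out[sizeof(int)]) {
    const unsigned char *p = (const unsigned char *) &v;
    size_t i;
    for (i = 0; i < sizeof v; i++) {
        out[i] = p[i];   /* stays within v, so no undefined behavior */
    }
}
```

On a little-endian machine, int_bytes(660, b) should leave 148 in b[0] and 2 in b[1].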

Building with make and running the program gives:

$ make
gcc -c bits.c -I.
cc bits.o print_bits.o cast.o -o bits
$ ./bits

address of c: 0x7fff5fbff9d8

00010101 21 0x7fff5fbff9c8
00000000 0 0x7fff5fbff9c9
00000000 0 0x7fff5fbff9ca
00000000 0 0x7fff5fbff9cb
00000000 0 0x7fff5fbff9cc
00000000 0 0x7fff5fbff9cd
00000000 0 0x7fff5fbff9ce
00000000 0 0x7fff5fbff9cf
00000000 0 0x7fff5fbff9d0
00000000 0 0x7fff5fbff9d1
00000000 0 0x7fff5fbff9d2
00000000 0 0x7fff5fbff9d3
10010100 148 0x7fff5fbff9d4
00000010 2 0x7fff5fbff9d5
00000000 0 0x7fff5fbff9d6
00000000 0 0x7fff5fbff9d7
00001010 10 0x7fff5fbff9d8
00000000 0 0x7fff5fbff9d9
00000000 0 0x7fff5fbff9da
00000000 0 0x7fff5fbff9db
int: 4
long: 8
size_t: 8

We can see a number of things from this example. The char pointer c holds the address of j (&j); dereferencing it gives decimal 10 in the first byte, followed by three zero bytes. An int on our system is 4 bytes, even though this is a 64-bit executable:

$ file bits
bits: Mach-O 64-bit executable x86_64

The Intel Core 2 Duo stores multi-byte integers in "little-endian" order: the least significant byte sits at the lowest address, so it's the first byte of j that holds the value 00001010. Four bytes before that is the first byte of the int k, even though k was declared after j (k = 148 + 2*256 = 660). The long int m, whose size is 8 bytes, starts 12 bytes before k rather than 8. I'm not sure why this position was chosen, but the likely reason is alignment: placed directly below k, m would start at 0x7fff5fbff9cc, which is not a multiple of 8, whereas 0x7fff5fbff9c8 is. As for size_t, the unsigned integer type that sizeof itself yields:

sizeof(size_t)

is 8 bytes.
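Both observations can be checked with a little arithmetic in C. from_le_bytes rebuilds a value from bytes stored least significant first, so the dumped bytes of k (148, 2, 0, 0) give back 660. align_down rounds an address down to a multiple of 8, which is one plausible model for where the compiler put m. The alignment rule is a hypothesis consistent with the dump, not something the output proves. Both function names are hypothetical.

```c
#include <stdint.h>

/* Rebuild an unsigned 32-bit value from four bytes stored little-endian,
   i.e. with the least significant byte at the lowest address. */
uint32_t from_le_bytes(const unsigned char b[4]) {
    return (uint32_t) b[0]
         | ((uint32_t) b[1] << 8)
         | ((uint32_t) b[2] << 16)
         | ((uint32_t) b[3] << 24);
}

/* Round addr down to a multiple of align (align must be a power of two),
   as a compiler might when placing an 8-byte long on the stack. */
uintptr_t align_down(uintptr_t addr, uintptr_t align) {
    return addr & ~(align - 1);
}
```

With k starting at 0x7fff5fbff9d4, an 8-byte m placed directly below would start at 0x7fff5fbff9cc. align_down(0x7fff5fbff9cc, 8) gives 0x7fff5fbff9c8, which is exactly where the 21 stored in m appears in the dump.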

And finally, if we ask gcc for a 32-bit executable with the -m32 flag:

gcc bits.c print_bits.c cast.c -I. -m32 -o bits

the last part will print

   int:  4
long: 4
size_t: 4

Now long is 4 bytes (32 bits), and so is size_t; int is unchanged at 4 bytes.
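Because long changes size between the two builds, code that needs an exact width can use the fixed-width types from <stdint.h>, which are the same size under -m32 and in a 64-bit build. This is a small sketch. _Static_assert requires C11 (gcc also accepts it as an extension in older modes), and print_widths is a hypothetical name.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Compile-time checks: these hold for both -m32 and 64-bit builds. */
_Static_assert(sizeof(int32_t) == 4, "int32_t is exactly 32 bits");
_Static_assert(sizeof(int64_t) == 8, "int64_t is exactly 64 bits");

void print_widths(void) {
    int64_t big = INT64_C(1) << 40;   /* would not fit in a 32-bit long */
    printf("int32_t: %zu\n", sizeof(int32_t));
    printf("int64_t: %zu\n", sizeof(int64_t));
    printf("big: %" PRId64 "\n", big);   /* PRId64 is the portable format for int64_t */
}
```

Built either way, print_widths should report 4 and 8 bytes and print 1099511627776 for big.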

Zipped files on Dropbox (here).

1 comment:

Michael Anderson said...

It's usual to use #include <xyz.h> for system includes and #include "xyz.h" for your own includes. Compilers handle them slightly differently (mostly error reporting and search paths).