Main

January 28, 2004 (Wednesday)

Runtime Stack

The runtime stack is an implicit stack that is maintained automatically during program execution; we've alluded to this stack in previous lectures. The basic idea behind the stack is that everytime a function invocation takes place, a new stack frame (sometimes called an activation record) is pushed on the stack. Generally, the following information is pushed on the runtime stack during each function invocation (though different compilers may do things differently for efficiency and performance reasons):

Global variables

Some variables may be pushed on the runtime stack shortly after the program starts but before the main() function is actually called. Global variables (both static and non-static) are stored on the stack when the program starts. They are popped off the stack only after the program finishes. The lifetime of these variables extends over the lifetime of the entire program. static global variables have the same storage attributes as non-static global variables, but their linkage attributes are internal rather than external (i.e. functions in only one source module can access them. We'll be talking more about linkage attributes when we discuss multifile compilation in the next lecture). Global variables are initialized to zero.

Local static variables

Local static variables have storage (lifetime) attributes similar to global variables but with much more limited scope. Local static variables (like non-static local variables) can only be accessed inside the functions in which they are defined. static local variables are initialized when the function containing the variable is first executed. They do not get re-initialized each time the function containing them is executed. By default, static local variables are initialized to zero.

Like global variables, space is automatically allocated for local static variables at the very beginning of program execution. (i.e. a local static variable is not pushed onto the stack on each invocation of the function in which it resides). As a result, static local variables retain their values across subsequent invocations of the functions in which they reside. Recursive functions can use local static variables to reduce the amount of information that has to be pushed onto the stack on each invocation. This is particularly true if the value of the variable is constant over each invocation.

Local static variables cannot be used if the function is to be used in a multithreaded program. Some classic C library functions use static local variables (e.g. strtok()) and as a result would have to be made re-entrant if they are used in a multithreaded context.

Example

Consider the following (contrived) example:

#include <stdio.h>

int global = 1;

int func(int formal)
{
	static int	static_local;
	int		local = 0;

	++static_local;
	++local;
	return local + global + formal + static_local;
}

int
main()
{
	int	actual = 2;
	int	ret_value;
	
	ret_value = func(actual);
	++global;
	++actual;
	ret_value = func(actual);

	return 0;
}
stack.c

Initially, the global variable and static local variable are pushed on the stack and initialized (either explicitly by the programmer or implicitly, to 0, by the compiler). Note that space is made for static_local even though the function it resides in hasn't even been called yet. Also note that this variable is implicitly initialized to zero (unlike regular local variables, which must be initialized explicitly).

static_local (0)
global (1)

When the main() function is called, space is allocated on the stack for main()'s return value (an int) and main()'s local variables (actual and ret_value) are pushed on the stack. Note that actual is initialized to 2, but ret_value is uninitialized.

ret_value (?)
actual (2)
int (?)
static_local (0)
global (1)

Now, when we call func() for the first time, we make room on the stack for its return value and push the return address on the stack (in this case, the return address would be the address of the instruction that assigns the result of the function to the ret_value variable). Then we copy and push the function's formal parameter on the stack. Finally the function's local variable is pushed on the stack:

local (0)
formal (2)
return address
int (?)
ret_value (?)
actual (2)
int (?)
static_local (0)
global (1)

After performing the two preincrements, the runtime stack will have the following values:

local (1)
formal (2)
return address
int (?)
ret_value (?)
actual (2)
int (?)
static_local (1)
global (1)

After we do the additions:

  local + global + formal + static_local
=   1   +   1    +   2    +       1
=   5

the return is performed, causing the stack frame to be popped and control to jump to the return address on the stack, giving us the resulting stack:

int (5)
ret_value (?)
actual (2)
int (?)
static_local (1)
global (1)

The assignment of ret_value to the return value of the function is then performed by popping the return value off the stack. The global and actual variables are then incremented, giving us the stack:

ret_value (5)
actual (3)
int (?)
static_local (1)
global (2)

We then call the function again, pushing the appropriate values and addresses on the stack as before.

local (0)
formal (3)
return address
int (?)
ret_value (5)
actual (3)
int (?)
static_local (1)
global (2)

We increment local and static_local, as before. Note that static_local had retained its value from the last invocation of the function, so its value now gets updated to 2. This gives us the following runtime stack contents:

local (1)
formal (3)
return address
int (?)
ret_value (5)
actual (3)
int (?)
static_local (2)
global (2)

After we do the additions:

  local + global + formal + static_local
=   1   +   2    +   3    +       2
=   8

the return is performed, causing the stack frame to be popped and control to jump to the return address on the stack, giving us the resulting stack:

int (8)
ret_value (5)
actual (2)
int (?)
static_local (2)
global (2)

Again, ret_value is then assigned the return value of the function by popping the stack:

ret_value (8)
actual (2)
int (?)
static_local (2)
global (2)

main() then finishes by returning zero which is eventually returned to the operating system as the remainder of the runtime stack is destroyed.

Note that there are some simplifications made in the above example, but the general idea should be clear.

Dangling pointers

A dangling pointer is a pointer that points to a memory location that was once valid but is now bogus. For example, consider the following invalid C program:

int	*p;

void func1() 
{
	int	a = 0;
	p = &a;
}

void func2() 
{
	int	b = 1;
}

main()
{
	func1();
	printf ("%d\n", *p);
	func2();
	printf ("%d\n", *p);
}
bad.c

The p pointer pointed to a location on the runtime stack that is no longer valid when func1() ends. When func2() executes, p (may) just happen to point to the part of the runtime stack now occupied by func2()'s local variable b. The end result is that the two printf() calls may display different values. For example, running the above program on a Linux machine produces:

0
1

The memory area pointed to by p was on the runtime stack during the execution of func1() -- it pointed to the memory area occupied by func1()'s local variable a. When func1() finished, the runtime stack was popped, meaning that p no longer points an allocated area of memory. When func2 is then invoked, the runtime stack allocates room for the local variable of func2(), namely, b. In this particular example, p now points to the memory location occupied by b. Therefore, when we initialize b, we are also changing the value to which p points. Hence the change in output observed above.

Parsing Simple Strings

strtok() (K&R § B3)

C provides some rudimentary parsing functions which can be used to extract information from strings (i.e. character arrays). One function for parsing strings is called strtok() which is roughly analogous to Java's StringTokenizer class. Its use is demonstrated by the program below:

#include	<stdio.h>
#include	<string.h>

#define MAX_BUF 80

int
main()
{
	char	*st;
	char	 buffer[MAX_BUF + 1];
	const	 char *delim = ",\n:";

	if (fgets(buffer, sizeof(buffer), stdin) == NULL)
		return 1;

	if ((st = strtok(buffer, delim)) == NULL)
		return 1;
		
	do {
		puts(st);
	} while ((st = strtok(NULL, delim)) != NULL);

	return 0;
}
strtok.c

We read in a string from standard input using fgets() and store the characters in buffer. We then call strtok() for the first time, passing in the string we want to tokenize and a string of delimiters that delimit the substrings that we are trying to extract. The first call to strtok() will return a pointer to the first character of the first token in the string.

We then enter a do...while loop which displays the string and re-calls strtok(), this time with a first parameter of NULL. This signals strtok() to use the string that we supplied to the original invocation of strtok() and to find the next token in the string that is delimited by the supplied array of delimiter characters. When all the tokens have been extracted, strtok() will return NULL and the loop will terminate.

Because strtok() uses its own static array to store the string, the code is not suitable for multithreaded programs. Also, note that strtok() actually modifies the characters within the array passed to it (it places nul bytes in the strings as it extracts tokens). For these two reasons, use of the strtok() function is not encouraged, but can still be useful if you are aware of its limitations.

scanf() (K&R § 7.4)

A more succinct way of parsing strings is to use the scanf() function, which we saw earlier. Below, we are using this function to parse and store a date followed by a colon followed by a series of words:

#include	<stdio.h>

#define MAX_BUF 80

int
main()
{
	int	year, month, day;
	char	entry[MAX_BUF + 1];

	if (scanf("%d-%d-%d: %[^\n]", &year, &month, &day, entry) != 4) {
		fprintf(stderr, "Invalid input\n"); 
		return 1;
	}
	printf("Year: %d\nMonth: %d\nDay: %d\nEvent: %s\n",
		year, month, day, entry);

	return 0;
}
scanf.c

The format string denotes how the input is formatted. As expected, %d means that an integer is expected on the input. The %[^\n] conversion specifier means read and store all characters upto, but not including, the first newline. If we had used the %s specifier, then the scanf() function call would just input a single word (i.e. all characters upto, but not including the first whitespace). Note that the above code has a potential buffer overflow problem if more than 80 characters are entered on standard input.

The scanf() function will return the number of conversion specifiers it was able to apply. In the above example, if all four conversion specifiers were used during the input, then scanf() will return 4. In the event that all four values were extracted from the string, the above program will display the values.

There is also a file version of scanf() called fscanf() which takes a FILE * pointer as its first argument. This can be used to parse input coming from a file other than standard input (e.g. a file you opened using fopen()).


Last modified: January 29, 2004 02:04:53 NST (Thursday)