
January 14, 2004 (Wednesday)

More Format Specifiers (K&R § 7.2)

We've already seen the %c and %d format conversion specifiers that can be used with printf(). There are many other variations, as listed on page 154 of K&R. For example, consider the following code:

#include	<stdio.h>

int
main()
{
	char            ch     = '\n';
	unsigned int 	bignum = 4294242914U;

	printf("%d\n", ch);       /* Displays: 10         */
	printf("%x\n", ch);       /* Displays: a          */
	printf("<%02X>\n", ch);   /* Displays: <0A>       */
	printf("%d\n", bignum);   /* Displays: -724382    */
	printf("%u\n", bignum);   /* Displays: 4294242914 */

	return 0;
}
formats.c

In the first printf(), we are simply displaying the numeric ASCII value of the newline character (\n), which is 10. The second printf() uses the %x conversion specifier, which displays the same value in (lower-case) hexadecimal. This results in a being displayed because a is the hexadecimal equivalent of 10. The third printf() uses the %02X specifier. This also displays the value in hexadecimal, but with an upper-case A because of the upper-case X in the conversion specifier. The 2 requests a minimum field width of 2, in which the number is right justified, and the 0 flag pads the field on the left with zeros instead of spaces.

The fourth printf() displays its numeric argument (bignum) as a signed integer. However, the literal constant that we assigned to bignum is too large to be represented as a signed int (assuming the 32-bit int found on most current systems). As a result, when we try to display the number as a signed integer, the bit pattern is typically reinterpreted and the value -724382 (that is, 4294242914 - 4294967296) is displayed. To correct this problem, we must display the number using the %u conversion specifier, as in the fifth printf() call, which treats its corresponding argument as an unsigned integer.

As an aside, note that the literal constant used in the initialization of bignum ends in U. This suffix gives the constant an unsigned type, which tells the compiler that we are aware that this number is too large to be a signed integer. Without the U, the compiler would typically generate a warning.

Enumerated types (K&R § 2.3)

To motivate the concept of enumerated types, assume that we are writing a program which monitors the status of some sort of data acquisition device. The device can be in several states, including OK and FAIL. If the device's buffer is full or empty it could also be in states FULL or EMPTY, for example. If we want to associate each state with an integer, we can set the following #define macros in our program:

#define OK    0
#define FAIL  1
#define FULL  2
#define EMPTY 3

Needless to say, this is quite tedious and error prone, especially if we have a lot of states. If we wanted to insert a state between OK and FAIL, we would have to change all the subsequent #defines. Instead of doing this, we can simply introduce an enumerated type as demonstrated by the following program:

#include <stdio.h>

enum status { ST_OK, ST_FAIL, ST_FULL, ST_EMPTY };

int
main()
{
	enum status st = ST_OK;
	
	/* ... */
	switch (st)   {
		case ST_OK:
			break;
		case ST_FAIL:
			fprintf(stderr, "Error\n");
			break;
		case ST_FULL:
			/* ... */
			break;
		case ST_EMPTY:
			/* ... */
			break;
		default:
			printf("Hello");
			break;
	}
	return 0;
}
en.c

Each constant of the status enumerated type will be initialized for us (starting from zero). We can then add/insert/delete as many new states as we like and the compiler will keep track of the values for us. Note that it is common to prefix each symbolic constant of the enumerated type with a common string. In the above example, we use ST_. This creates a sort of artificial namespace which may lessen the conflicts between constants from different enumerated types.

"Strings" (i.e. character arrays) (K&R § 1.9, 5.5)

C does not have a first-class string data type. Instead, a string is represented as an array of characters that is terminated with a nul byte, '\0' (ASCII 0). Consider the following example:

#include	<stdio.h>

int
main()
{
	char	string1[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
	char	string2[] = "World.";

	printf("%s, ", string1);
	puts(string2);

	string2[3] = 'k';
	string2[4] = '\0';
	/* sizeof yields a size_t, so cast it to match %d */
	printf("Array containing \"%s\" has %d bytes\n",
			string2, (int) sizeof(string2));
	return 0;
}
chararray1.c

string1 is initialized by specifying each individual character in much the same way that the integer array was initialized earlier. Note that the nul byte is explicitly specified at the end of the string. Needless to say, it is very tedious to have to specify the characters of the string this way, so C allows you to initialize an array using a string literal (in this case "World."). When this is done, the compiler creates an array of sufficient size to hold each of the characters as well as the nul byte. At run time, the characters of the string literal, including its implicit nul byte at the end, are copied into the array.

If we had explicitly specified a dimension for the array that was too large (e.g. char string2[20] = "World."), then the unused space would be filled with '\0' (nul bytes). However, if the dimension we specified was not large enough to hold the nul byte, then the resulting character array will not have the nul byte at the end. For example, if we initialize an array as

char str[5] = "hello";

then the nul byte will not be stored and unpredictable results will occur when you try to display the string.

The printf() function call uses the %s conversion specifier in its format string to display string1 followed by a comma and a space. This specifier requires that the corresponding argument in printf()'s argument list be a pointer to a character. As we will see later, using string1 satisfies this requirement. The %s specifier will display the sequence of characters starting at the specified location until it encounters '\0'.

The puts() function (which is also declared in the stdio.h header file) is then called using string2. The puts() function simply puts the supplied string on the display followed by a newline. It is simpler to use and quicker to execute than printf(), so it should be used when all you wish to do is simply display a collection of characters (that must be terminated with a nul byte) with no formatting. A variant of puts() is fputs() which takes a string argument and a stream argument. The stream argument indicates to which stream the string is to be displayed. Two output streams which are available to C programs are stdout (standard output -- which we've already seen) and stderr (standard error). To display a string (typically a diagnostic message) to stderr, for example, you would write:

fputs("array index out of bounds!\n", stderr);

By default, both standard output and standard error are sent to your display. In subsequent lectures, we'll see how to separate output sent to stdout from output sent to stderr by using redirection on the command line. Note that fputs("A string", stdout) is almost equivalent to puts("A string"), since both strings are sent to standard output. The latter case is almost always used when writing to standard output because it is shorter to type. Another difference is that fputs() does not automatically add the newline character to the output like puts() does, so you have to add it yourself as the above example shows.

We can change the contents of the array of characters as we would any other array. For example, when the line string2[3] = 'k'; is executed, the fourth character of the string2 array is changed from an 'l' to a 'k'. We can also shorten the string by writing a nul byte earlier in the array. For example, string2[4] = '\0'; shortens the string to just "Work".

Finally, the above program displays string2 delimited by quotation marks and a count of the number of bytes (characters) that the string2 array can hold (including the nul byte). Note that we can display a double-quote character by escaping it with a backslash inside printf()'s format string.

The output of the program is:

Hello, World.
Array containing "Work" has 7 bytes

Note that making the string shorter does not actually change the size of the array that contains it.

Ultimately, when dealing with strings, there are a couple of very important points to remember:

  1. Ensure that all strings are terminated with a nul byte.
  2. Always make sure that any array to which a string is copied has enough room for the characters of the string as well as the terminating nul byte.

In some cases when you forget to add the trailing nul byte or forget to ensure there is enough space in your character array to accommodate it, your program may still appear to be working fine. Unfortunately, problems may not actually arise until much later. It is for this reason that nul byte issues can be very problematic to resolve.

Standard string functions: strcpy(), strcat(), strcmp() and strlen() (K&R § 2.8, 5.3, 5.5)

The C standard library provides several functions for handling strings. These functions are all declared in string.h, so any source file that calls them should have #include <string.h>.

strcpy(dst,src) Copies the string src to dst (including the nul byte).
strcat(dst,src) Appends src to the end of dst, overwriting dst's original nul byte. The nul byte from src ends the concatenated string.
strcmp(str1,str2) Compares the two strings character by character. Returns an integer less than 0 if str1 sorts before str2 (by character code, so in ASCII upper-case letters precede lower-case letters), an integer greater than 0 if str1 sorts after str2, and 0 if the strings are equal.
strlen(str) Returns the length of the string (this length does not include the nul byte).

The following program demonstrates their usage:

#include	<stdio.h>
#include	<string.h>

#define	MAX_LEN 10
#define	ALPHA_LEN 26

int
main()
{
	char	strings[][MAX_LEN] = {	"abcdefghi",
					"jklmnop",
					"",
					"qrstu",
					"vwxyz" };
	char	alpha[ALPHA_LEN + 1]; /* "+ 1" is for the nul byte */
	int	i;

	strcpy(alpha, "");

	for (i = 0; i < (int) (sizeof(strings) / sizeof(strings[0])); i++)
		strcat(alpha, strings[i]);

	/* strlen() returns a size_t, so cast it to match %d */
	printf("\"%s\" has length %d\n", alpha, (int) strlen(alpha));

	if (strcmp(alpha, "abcdefghijklmnopqrstuvwxyz") == 0)
		puts("The resulting string forms the alphabet");

	return 0;
}
chararray2.c

This code creates a two-dimensional array (strings) to hold a collection of strings and a one-dimensional array (alpha) to hold the result of concatenating all the strings in the two-dimensional array. Note that we add one to the size of alpha's array. This is to explicitly accommodate the nul byte.

Internally, the two-dimensional array looks as follows:

              Columns
              [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
Rows
strings[0]     a   b   c   d   e   f   g   h   i  \0
strings[1]     j   k   l   m   n   o   p  \0  \0  \0
strings[2]    \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
strings[3]     q   r   s   t   u  \0  \0  \0  \0  \0
strings[4]     v   w   x   y   z  \0  \0  \0  \0  \0

Note that there is a lot of wasted space here as nearly half of all of the array's contents are nul bytes. We'll see a more efficient way of storing an array of strings when we discuss pointers. Note also that all the strings have at least one nul byte at the end. Indeed, the string denoted by strings[2] consists of all nul bytes. This is perfectly valid: strings[2] is essentially an empty string (i.e. a string of length 0).

Because local arrays that are not initialized have indeterminate contents, the program copies an empty string into alpha using strcpy(). We must ensure that alpha holds a valid nul-terminated string because we are concatenating to it later on. Using strcpy() to copy an empty string isn't particularly efficient. Instead, we could have simply initialized alpha to "" when we defined it. We could also have written alpha[0] = '\0' instead of strcpy(alpha, ""); both have the same effect.

The for loop then executes once for each row of the array, i.e. once for each string. Because sizeof(strings) is 50 and sizeof(strings[0]) is 10 (note: strings[0] is a one-dimensional array), the loop will execute five times. Each time through the loop, the next string from the strings array is concatenated onto the end of alpha. Note that the code does not explicitly check whether or not there is enough room in the destination array for the additional characters. If the concatenated string overflows its array bounds, the behaviour of the program is undefined.

When the looping is completed, printf() is used to display the resulting string and its length (using strlen()). Note that the length of the string returned by strlen() does not include the trailing nul byte.

Finally, using strcmp() we compare alpha with a string literal representing the alphabet. If they are identical (i.e. strcmp() returns 0), then we display a simple message indicating so.

Note that these string functions could seriously misbehave if either of the string arguments is not nul terminated or (in the case of strcpy() and strcat()) if there is not enough room in the destination array for the result.


Last modified: January 31, 2004 09:56:03 NST (Saturday)