We've already seen the %c and %d conversion specifiers that can be used
in printf()'s format string. There are many other variations, as given
on page 154 of K&R. For example, consider the following code:
#include <stdio.h>

int
main()
{
    char ch = '\n';
    unsigned int bignum = 4294242914U;

    printf("%d\n", ch);        /* Displays: 10 */
    printf("%x\n", ch);        /* Displays: a */
    printf("<%02X>\n", ch);    /* Displays: <0A> */
    printf("%d\n", bignum);    /* Displays: -724382 */
    printf("%u\n", bignum);    /* Displays: 4294242914 */
    return 0;
}
formats.c
In the first printf(), we are simply displaying the numeric ASCII value
of the newline character (\n), which is 10. The second printf() uses the
%x conversion specifier, which takes the numeric ASCII value of the
newline character and displays it in (lower-case) hexadecimal. This
results in the digit a being displayed because that's the hexadecimal
equivalent of 10. The third printf() uses the %02X specifier. This also
displays the character in its hexadecimal form, but it uses an uppercase
A instead because of the uppercase X in the conversion specifier. The
hexadecimal number is also displayed right-justified inside a field
width of 2, and any leading spaces are replaced with zeros.
The fourth printf() displays its numeric argument (bignum) as a signed
integer. However, the literal constant that we assigned to bignum is too
large to be represented as a signed number (assuming a 32-bit int). As a
result, when we try to display the number as a signed integer, the value
-724382 is displayed. To correct this problem, we must display the
number using the %u conversion specifier, as in the fifth printf() call,
which treats its corresponding argument as being an unsigned integer.
As an aside, note that the literal constant used in the initialization
of bignum ends in U. This suffix gives the constant an unsigned type and
tells the compiler that we are aware that this number is too large to be
a signed integer. Without the U, the compiler may generate a warning.
To motivate the concept of enumerated types, assume that we are writing
a program which monitors the status of some sort of data acquisition
device. The device can be in several states, including OK and FAIL. If
the device's buffer is full or empty it could also be in states FULL or
EMPTY, for example.
If we want to associate each state with an integer, we can set the
following #define macros in our program:
#define OK    0
#define FAIL  1
#define FULL  2
#define EMPTY 3
Needless to say, this is quite tedious and error prone, especially
if we have a lot of states. If we wanted to insert a state between
OK and FAIL, we would have to change all the subsequent #defines.
Instead of doing this, we can simply introduce an enumerated type as
demonstrated by the following program:
#include <stdio.h>

enum status { ST_OK, ST_FAIL, ST_FULL, ST_EMPTY };

int
main()
{
    enum status st = ST_OK;

    /* ... */
    switch (st) {
        case ST_OK:
            break;
        case ST_FAIL:
            fprintf(stderr, "Error");
            break;
        case ST_FULL:
            /* ... */
            break;
        case ST_EMPTY:
            /* ... */
            break;
        default:
            printf("Hello");
            break;
    }
    return 0;
}
en.c
Each constant of the status enumerated type will be initialized for us
(starting from zero). We can then add/insert/delete as many new states
as we like and the compiler will keep track of the values for us. Note
that it is common to prefix each symbolic constant of the enumerated
type with a common string. In the above example, we use ST_. This
creates a sort of artificial namespace which may lessen the conflicts
between constants from different enumerated types.
C does not have a first-class string data type. Instead, a string is
represented as an array of characters that is terminated with a nul
byte, '\0' (ASCII 0). Consider the following example:
#include <stdio.h>

int
main()
{
    char string1[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
    char string2[] = "World.";

    printf("%s, ", string1);
    puts(string2);
    string2[3] = 'k';
    string2[4] = '\0';
    /* sizeof yields a size_t, so cast it to match %d */
    printf("Array containing \"%s\" has %d bytes\n",
           string2, (int)sizeof(string2));
    return 0;
}
chararray1.c
string1 is initialized by specifying each individual character in much
the same way that the integer array was initialized earlier. Note that
the nul byte is explicitly specified at the end of the string. Needless
to say, it is very tedious to have to specify the characters of the
string this way, so C allows you to initialize an array using a string
literal (in this case "World."). When this is done, the compiler creates
an array of sufficient size to hold each of the characters as well as
the nul byte. During runtime, the characters of the string literal,
including its implicit nul byte at the end, are copied into the array.
If we had explicitly specified a dimension for the array that was too
large (e.g. char string2[20] = "World."), then the unused space would be
filled with '\0' (nul bytes). However, if the dimension we specified was
not large enough to hold the nul byte, then the resulting character
array will not have the nul byte at the end. For example, if we
initialize an array as char str[5] = "hello", then the nul byte will not
be stored and unpredictable results will occur when you try to display
the string.
The printf() function call uses the %s conversion specifier in its
format string to display string1 followed by a comma and a space. This
specifier requires that the corresponding argument in printf()'s
argument list be a pointer to a character. As we will see later, using
string1 satisfies this requirement. The %s specifier will display the
sequence of characters starting at the specified location until it
encounters '\0'.
The puts() function (which is also declared in the stdio.h header file)
is then called using string2. The puts() function simply puts the
supplied string on the display followed by a newline. It is simpler to
use and quicker to execute than printf(), so it should be used when all
you wish to do is simply display a collection of characters (that must
be terminated with a nul byte) with no formatting.
A variant of puts() is fputs(), which takes a string argument and a
stream argument. The stream argument indicates to which stream the
string is to be written. Two output streams which are available to C
programs are stdout (standard output -- which we've already seen) and
stderr (standard error). To display a string (typically a diagnostic
message) to stderr, for example, you would write:
    fputs("array index out of bounds!\n", stderr);
By default, both standard output and standard error are sent to your
display. In subsequent lectures, we'll see how to separate output sent
to stdout from output sent to stderr by using redirection on the command
line. Note that fputs("A string", stdout) is almost equivalent to
puts("A string"), since both strings are sent to standard output. The
latter form is almost always used when writing to standard output
because it is shorter to type. The one difference is that fputs() does
not automatically add the newline character to the output like puts()
does, so you have to add it yourself as the above example shows.
We can change the contents of the array of characters as we would any
other array. For example, when the line string2[3] = 'k' is executed,
the fourth character of the string2 array is changed from an 'l' to a
'k'.
We can also shorten the string by writing a nul byte earlier in the
array. For example, string2[4] = '\0' shortens the string to just
"Work".
Finally, the above program displays string2 delimited by quotation marks
and a count of the number of bytes (characters) that the string2 array
can hold (including the nul byte). Note that we can display a
double-quote character by escaping it with a backslash inside printf()'s
format string.
The output of the program is:
Hello, World.
Array containing "Work" has 7 bytes
Note that making the string shorter does not actually change the size of the array that contains it.
Ultimately, when dealing with strings, there are a couple of very important points to remember:
In some cases when you forget to add the trailing nul byte or forget to ensure there is enough space in your character array to accommodate it, your program may still appear to be working fine. Unfortunately, problems may not actually arise until much later. It is for this reason that nul byte issues can be very problematic to resolve.
strcpy(), strcat(), strcmp() and strlen() (K&R § 2.8, 5.3, 5.5)
The C standard library provides several functions for handling strings.
These functions are all declared in string.h, so any source file that
calls them should have #include <string.h>.
strcpy(dst,src) | Copies string src to dst (including the nul byte). |
strcat(dst,src) | Concatenates src to the end of dst. The nul byte from src is placed at the end of the concatenated string. |
strcmp(str1,str2) | Compares the characters of the two strings. If the first one is alphabetically less than the second, then return an integer which is less than 0. If the first one is greater than the second, then return an integer that is greater than zero. Otherwise, if they are equal, then return 0. |
strlen(str) | Return the length of the string (this length does not include the nul byte) |
The following program demonstrates their usage:
#include <stdio.h>
#include <string.h>

#define MAX_LEN 10
#define ALPHA_LEN 26

int
main()
{
    char strings[][MAX_LEN] = { "abcdefghi",
                                "jklmnop",
                                "",
                                "qrstu",
                                "vwxyz" };
    char alpha[ALPHA_LEN + 1]; /* "+ 1" is for the nul byte */
    int i;

    strcpy(alpha, "");
    /* sizeof yields a size_t; cast so the comparison with i is signed */
    for (i = 0; i < (int)(sizeof(strings)/sizeof(strings[0])); i++)
        strcat(alpha, strings[i]);
    /* strlen() returns a size_t, so cast it to match %d */
    printf("\"%s\" has length %d\n", alpha, (int)strlen(alpha));
    if (strcmp(alpha, "abcdefghijklmnopqrstuvwxyz") == 0)
        puts("The resulting string forms the alphabet");
    return 0;
}
chararray2.c
This code creates a two-dimensional array (strings) to hold a collection
of strings and a one-dimensional array (alpha) to hold the result of
concatenating all the strings in the two-dimensional array. Note that we
add one to the size of alpha's array. This is to explicitly accommodate
the nul byte.
Internally, the two-dimensional array looks as follows:
Rows \ Columns | [0] | [1] | [2] | [3] | [4] | [5] | [6] | [7] | [8] | [9] |
strings[0]     | a   | b   | c   | d   | e   | f   | g   | h   | i   | \0  |
strings[1]     | j   | k   | l   | m   | n   | o   | p   | \0  | \0  | \0  |
strings[2]     | \0  | \0  | \0  | \0  | \0  | \0  | \0  | \0  | \0  | \0  |
strings[3]     | q   | r   | s   | t   | u   | \0  | \0  | \0  | \0  | \0  |
strings[4]     | v   | w   | x   | y   | z   | \0  | \0  | \0  | \0  | \0  |
Note that there is a lot of wasted space here as nearly half of all of
the array's contents are nul bytes. We'll see a more efficient way of
storing an array of strings when we discuss pointers. Note also that
all the strings have at least one nul byte at the end. Indeed, the
string denoted by strings[2] consists of all nul bytes. This is
perfectly valid: strings[2] is essentially an empty string (i.e. a
string of length 0).
Because local arrays that are not initialized have undefined contents,
the program copies an empty string into alpha using strcpy(). We must
ensure that alpha is a valid nul-terminated array because we are
concatenating to it later on. Using strcpy() to copy an empty string
isn't particularly efficient. Instead, we could have simply initialized
alpha to "" when we defined it (we could also have said alpha[0] = '\0'
instead of saying strcpy(alpha, "") -- they both have the same effect).
The for loop then executes once for each row of the array, i.e. once for
each string. Because sizeof(strings) is 50 and sizeof(strings[0]) is 10
(note: strings[0] is a one-dimensional array), the loop will execute
five times. Each time through the loop, the next string from the strings
array is concatenated onto the end of alpha. Note that the code does not
explicitly check whether or not there is enough room in the destination
string for the additional characters. If the concatenated string
overflows its array bounds, the program could exhibit undefined
behaviour.
When the looping is completed, printf() is used to display the resulting
string and its length (using strlen()). Note that the length of the
string returned by strlen() does not include the trailing nul byte.
Finally, using strcmp() we compare alpha with a string literal
representing the alphabet. If they are identical (i.e. strcmp() returns
0), then we display a simple message indicating so.
Note that these string functions could seriously misbehave if either
of the string arguments is not nul terminated or (in the case of
strcpy() and strcat()) if there is not enough room in the destination
array for the result.
Last modified: January 31, 2004 09:56:03 NST (Saturday)