C does not have a first-class string data type.
Instead, a string is represented as an array of characters that is
terminated with a nul byte, '\0' (ASCII 0).
Consider the following example:
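    #include <stdio.h>

    int main(void)
    {
        /* Initialized character by character; the nul byte is explicit. */
        char string1[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
        /* Initialized from a string literal; the nul byte is implicit. */
        char string2[] = "World.";

        printf("%s, ", string1);
        puts(string2);

        string2[3] = 'k';     /* "World." becomes "Workd." */
        string2[4] = '\0';    /* the string now ends after "Work" */

        /* sizeof yields a size_t, which is printed with %zu (C99). */
        printf("Array containing \"%s\" has %zu bytes\n",
               string2, sizeof(string2));
        return 0;
    }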
string1 is initialized by specifying each individual character, in
much the same way that the integer array was initialized earlier.
Note that the nul byte is explicitly specified at the end of the
string. It is tedious to specify the characters of a string this way,
so C allows you to initialize an array using a string literal (in this
case "World."). When this is done, the compiler creates an array of
sufficient size to hold each of the characters as well as the nul byte.
At run time, the characters of the string literal, including its
implicit nul byte at the end, are copied into the array.
If we had explicitly specified a dimension for the array that was
larger than necessary (e.g. char string2[20] = "World."), then the
unused space would be filled with '\0'.
The printf() function call uses the %s conversion specifier in its
format string to display string1 followed by a comma and a space.
This specifier requires that the corresponding argument in printf()'s
argument list be a pointer to a character. As we will see later, using
string1 satisfies this requirement. The %s specifier will display the
sequence of characters starting at the specified location until it
encounters '\0'.
The puts() function (which is also declared in the stdio.h header
file) is then called using string2. The puts() function simply writes
the supplied string to the display followed by a newline. It is simpler
to use than printf() and typically quicker to execute, so it should be
used when all you wish to do is display a collection of characters
(that must be terminated with a nul byte) with no formatting.
We can change the contents of the array of characters as we would any
other array. For example, when the line string2[3] = 'k' is executed,
the fourth character of the string2 array is changed from an 'l' to
a 'k'.
We can also shorten the string by writing a nul byte earlier in the
array. For example, string2[4] = '\0' shortens the string to just
"Work".
Finally, the above program displays string2 delimited by quotation
marks and a count of the number of bytes (characters) that the string2
array can hold (including the nul byte). Note that we can display a
double-quote character by escaping it with a backslash inside
printf()'s format string.
The output of the program is:
    Hello, World.
    Array containing "Work" has 7 bytes
Note that making the string shorter does not actually change the size of the array that contains it.
Ultimately, when dealing with strings, there are a couple of very important points to remember: every string must end with a nul byte, and the character array that holds it must have enough space for that nul byte.
In some cases when you forget to add the trailing nul byte or forget to ensure there is enough space in your character array to accommodate it, your program may still appear to work. Unfortunately, problems may not surface until much later, which is why nul byte issues can be very difficult to track down.
strcpy(), strcat(), strcmp() and strlen() (K&R § 2.8, 5.3, 5.5)
The C standard library provides several functions for handling
strings. These functions are all declared in string.h, so any source
file that calls them should have #include <string.h>.
strcpy(dst, src) | Copies the string src to dst (including the nul byte). |
strcat(dst, src) | Concatenates src to the end of dst. The nul byte from src is placed at the end of the concatenated string. |
strcmp(str1, str2) | Compares the two strings character by character. Returns an integer less than 0 if the first string sorts before the second, an integer greater than 0 if it sorts after the second, and 0 if the two strings are equal. The ordering follows the character codes, so it matches alphabetical order only when both strings use the same letter case. |
strlen(str) | Returns the length of the string (this length does not include the nul byte). |
The following program demonstrates their usage:
    #include <stdio.h>
    #include <string.h>

    #define MAX_LEN   10
    #define ALPHA_LEN 26

    int main(void)
    {
        char strings[][MAX_LEN] = { "abcdefghi", "jklmnop", "", "qrstu", "vwxyz" };
        char alpha[ALPHA_LEN + 1];    /* "+ 1" is for the nul byte */
        size_t i;                     /* same type as the result of sizeof */

        strcpy(alpha, "");            /* start with an empty string */
        for (i = 0; i < sizeof(strings)/sizeof(strings[0]); i++) {
            strcat(alpha, strings[i]);
        }

        /* strlen() returns a size_t, which is printed with %zu (C99). */
        printf("\"%s\" has length %zu\n", alpha, strlen(alpha));
        if (strcmp(alpha, "abcdefghijklmnopqrstuvwxyz") == 0) {
            puts("The resulting string forms the alphabet");
        }
        return 0;
    }
This code creates a two-dimensional array (strings) to hold a
collection of strings and a one-dimensional array (alpha) to hold the
result of concatenating all the strings in the two-dimensional array.
Note that we add one to the size of alpha's array. This is to
explicitly accommodate the nul byte.
Internally, the two-dimensional array looks as follows:
Row \ Col  | [0] | [1] | [2] | [3] | [4] | [5] | [6] | [7] | [8] | [9] |
strings[0] | a   | b   | c   | d   | e   | f   | g   | h   | i   | \0  |
strings[1] | j   | k   | l   | m   | n   | o   | p   | \0  | \0  | \0  |
strings[2] | \0  | \0  | \0  | \0  | \0  | \0  | \0  | \0  | \0  | \0  |
strings[3] | q   | r   | s   | t   | u   | \0  | \0  | \0  | \0  | \0  |
strings[4] | v   | w   | x   | y   | z   | \0  | \0  | \0  | \0  | \0  |
Note that there is a lot of wasted space here, as nearly half of the
array's contents are nul bytes. We'll see a more efficient way of
storing an array of strings when we discuss pointers. Note also that
all the strings have at least one nul byte at the end. Indeed, the
string denoted by strings[2] consists of all nul bytes. This is
perfectly valid: strings[2] is essentially an empty string (i.e. a
string of length 0).
The program copies an empty string into alpha using strcpy() because
local arrays that are not initialized have indeterminate contents. We
must ensure alpha is an empty string because we are concatenating to
it later on. Using strcpy() to copy an empty string isn't particularly
efficient. Instead, we could have simply initialized alpha to "" when
we defined it. We could also have said alpha[0] = '\0' instead of
strcpy(alpha, ""); both have the same effect.
The for loop then executes once for each row of the array, i.e. once
for each string. Because sizeof(strings) is 50 and sizeof(strings[0])
(strings[0] is a one-dimensional array) is 10, the loop will execute
five times.
Each time through the loop, the next string from the strings array is
concatenated onto the end of alpha. Note that the code does not
explicitly check whether or not there is enough room in the destination
string for the additional characters. If the concatenated string
overflows its array bounds, the behaviour of the program is undefined.
When the looping is completed, printf() is used to display the
resulting string and its length (using strlen()). Note that the length
of the string returned by strlen() does not include the trailing nul
byte.
Finally, using strcmp() we compare alpha with a string literal
representing the alphabet. If they are identical (i.e. strcmp()
returns 0), then we display a simple message indicating so.
Note that these string functions could seriously misbehave if either
of the string arguments is not nul-terminated or (in the case of
strcpy() and strcat()) if there is not enough room in the destination
string for the result.
Last modified: Wed Jan 15 18:46:33 2003