January 16 (Friday) January 21 (Wednesday)
We can pass strings (character arrays terminated with nul bytes) as parameters in much the same way that we pass an array of integers. The address of the first character in the array is passed to the function and any changes that we make to the characters of the string passed into the function will be made to the original string.
The program below defines a function, alternate_caps(), that accepts a string parameter and modifies the string so that the case of the letters comprising the string alternates between uppercase and lowercase (characters at even indices become uppercase, characters at odd indices become lowercase, and non-letters are left alone). The function returns the number of case alterations made to the string. The display() function is passed the numeric index of the string within the string array defined in the main() function, the string itself and the number of changes. The purpose of the display() function is to show these parameters using printf().
#include <stdio.h>
#include <ctype.h>
int alternate_caps(char str[]);
void display(int num, char str[], int changes);
#define MAX_COL 40
int
main(void)
{
    char strings[][MAX_COL] = {
        "This is an array of CHARACTERS.",
        "tHiS Is aNoThEr aRrAy oF ChArAcTeRs.",
        "thIs sTrInG ReQuIrEs oNe cHaNgE"
    };
    size_t i;
    /* sizeof(strings)/sizeof(strings[0]) is the number of rows (3) */
    for (i = 0; i < sizeof(strings)/sizeof(strings[0]); i++) {
        int changes = alternate_caps(strings[i]);
        display((int) i, strings[i], changes);
    }
    return 0;
}
int
alternate_caps(char str[])
{
    int i = 0;
    int changes = 0;
    while (str[i] != '\0') {
        /* The ctype.h functions expect a value representable as an
           unsigned char (or EOF), hence the cast. */
        unsigned char c = (unsigned char) str[i];
        if (i % 2 == 0) {
            /* Make characters located at even indices
               upper case (if they aren't already) */
            if (islower(c)) {
                str[i] = (char) toupper(c);
                changes++;
            }
        }
        else {
            /* Make characters located at odd indices
               lower case (if they aren't already) */
            if (isupper(c)) {
                str[i] = (char) tolower(c);
                changes++;
            }
        }
        ++i;
    }
    return changes;
}
/* Print the row index, the modified string and the number of changes */
void
display(int num, char str[], int changes)
{
    printf("string[%d] is \"%s\" (%d change%s)\n",
           num, str, changes, (changes == 1) ? "" : "s");
}
function2.c
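In addition to the isupper()/islower() macros, which we saw in the previous lecture, the above code also uses the toupper() and tolower() ctype.h macros. These two macros accept a character argument and return the uppercase and lowercase version of the same letter, respectively (if possible). If the conversion is not possible, then the original character is returned. (Strictly speaking, the C standard defines all four as functions, which an implementation may also provide as macros. Their argument must be representable as an unsigned char or be EOF, so it is safest to cast a plain char to unsigned char before passing it.)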
Note the conditional in the while loop of the alternate_caps() function:
while (str[i] != '\0') { ... }
This basically says: while the character at position i is not the nul byte (i.e. '\0'), execute the loop body. This provides one way to iterate over all the characters in the array supplied to the function. The i index variable is updated at the bottom of the loop. This conditional can be abbreviated by dropping the explicit test against the nul byte. It could also have been implemented with a for loop, as we will see later.
The output from the preceding program is:
string[0] is "ThIs iS An aRrAy oF ChArAcTeRs." (11 changes)
string[1] is "ThIs iS AnOtHeR ArRaY Of cHaRaCtErS." (30 changes)
string[2] is "ThIs sTrInG ReQuIrEs oNe cHaNgE" (1 change)
Note that there is no 's' on the word "change" when alternate_caps() makes only one change to the string: in that case the conditional expression (changes == 1) ? "" : "s" in display() supplies an empty string.
As mentioned above, when we pass an array, what we are actually passing is a pointer to the array's first element. This implies that there is a very close relationship between arrays and pointers.
From K&R, Chapter 5:
A pointer is a variable that contains the address of a variable.
A pointer is defined by using * before the pointer variable name in its definition. Before a pointer can be used, it must be set to the address of a variable of the appropriate type. We can get the address of the variable by preceding the variable's name with the address-of operator: &. When this is done, we can access and/or change the value of the variable by dereferencing the pointer variable. A pointer variable can be dereferenced by placing a * before it.
Consider the following (somewhat contrived) example:
#include <stdio.h>
int
main(void)
{
    int i = 0, j = 2;
    int *ptr = &i;
    /* *ptr is an alias for i */
    printf("Before i = %d\t*ptr = %d\n", i, *ptr);
    *ptr = 1;
    printf("After i = %d\t*ptr = %d\n", i, *ptr);
    ptr = &j;
    /* Now *ptr is an alias for j */
    printf("Before j = %d\t*ptr = %d\n", j, *ptr);
    j = *ptr * 2;
    printf("After j = %d\t*ptr = %d\n", j, *ptr);
    return 0;
}
pointer1.c
Initially, ptr is set to the address of i. This means that the contents of i can be accessed and changed by dereferencing ptr. In other words, *ptr can be thought of as an alias for i. In the second group of statements, ptr is set to the address of j. Now *ptr becomes synonymous with j.
ptr is of type pointer to integer, whereas *ptr is of type integer.
Note that the * symbol in the following two lines serves different purposes:
int *ptr = &i;
*ptr = 1;
In the first line, the * is being used to define the variable ptr as a pointer to an integer. In the second line, the * operator is being used to dereference the pointer ptr in order to access the contents of the memory location addressed by the pointer.
Pointers are commonly used to efficiently pass arguments to functions, to simulate pass by reference semantics and to store the addresses of blocks of memory allocated dynamically (more on this in later lectures). There is also a very close relationship between pointers and arrays.
In nearly all cases, an array variable degrades into a pointer to the array's first element. Because of this, we can assign an array to a pointer (of the appropriate type). When we do so, what's actually happening is that the pointer is assigned the address of the array's first element. (Incidentally, one important exception to the "arrays degrade into pointers to their first element" rule is when an array is used as the operand of the sizeof operator, which then yields the size of the whole array. Another is when an array is the operand of the address-of operator &.)
For example, the following code demonstrates three ways of iterating over the characters of a string. The first uses the traditional array index variable, the second iterates over the string using a for loop that increments a pointer, and the third uses a while loop and an incrementing pointer variable. Note that all three loops stop when we reach the nul byte at the end of the string.
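#include <stdio.h>
int
main(void)
{
    char a[] = "Hello";
    char *p;
    int i;
    /* 1: array indexing */
    for (i = 0; a[i] != '\0'; i++)
        printf("a[%d] = %c\n", i, a[i]);
    /* 2: for loop with an incrementing pointer */
    for (p = a; *p != '\0'; p++)
        printf("*p = %c\n", *p);
    /* 3: while loop with an incrementing pointer */
    p = a;
    while (*p != '\0')
        putchar(*p++);
    putchar('\n');  /* end the final line of output */
    return 0;
}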
pointer2.c
In all three cases we can get rid of the != '\0' test, because in C, any zero value (including the nul byte) represents false. This means that the two for loops could have been written as:
for (i = 0; a[i]; i++)
and
for (p = a; *p; p++)
The while
loop can be written as:
while (*p)
By incrementing the pointer in the last two loops, we can move through elements of the array in a sequential fashion and output the characters by dereferencing the pointer.
In the while loop, we use the notation:
*p++
This is interpreted as *(p++) (K&R p.53). The expression p++ increments p and yields the previous value of p. This previous value of p is then dereferenced by the * operator, and the character located at this address is displayed on standard output by the putchar() function, which is part of the stdio library. This idiom occurs quite frequently in C, so you should get used to it.
Note that the actual number of bytes added to the pointer variable during each increment depends upon the type of the variable to which it points. For example, pointers to characters are incremented by one byte (because sizeof(char) == 1). Incrementing a pointer to an integer would increase the value of the pointer by four bytes on a machine where integers are four bytes long.
Fortunately, the compiler takes care of incrementing pointers by the appropriate amount, so as a programmer, you generally don't have to worry about how much is being added to the pointer.
Because an array degrades to a pointer to its first element, the following relationships hold true after we assign an array a to a pointer p of appropriate type:
p == &a[0]
p + 1 == &a[1]
p + 2 == &a[2]
...
p + i == &a[i]
If we dereference both sides of the above equations we also get the following equivalences:
*p == a[0]
*(p + 1) == a[1]
*(p + 2) == a[2]
...
*(p + i) == a[i]
These equivalences allow us to use some shortcuts when setting elements of an array. For example, if we wish to set the first element of an array, we can say
*a = value
instead of
a[0] = value
We can also subtract pointers (K&R § 5.4), as demonstrated by the following code:
#include <stdio.h>
int
main(void)
{
    char string[] = "This Is A String Of Characters";
    char srch;
    const char start = 'a';
    const char end = 'z';
    for (srch = start; srch != end + 1; srch++) {
        char *p;
        /* Advance p until it reaches srch or the terminating nul byte */
        for (p = string; *p != srch && *p; p++)
            ;
        if (*p == srch)
            /* p - string has type ptrdiff_t, printed with %td */
            printf("Found '%c' at index position %td.\n",
                   srch, p - string);
        else
            printf("No '%c' in string.\n", srch);
    }
    return 0;
}
pointer3.c
This program introduces the concept of a constant (K&R § 2.3). The constants start and end are set to the beginning and end points of a range of characters (in this case, the range of lowercase letters). By defining these as constants, we are assuring the C compiler that we will not be changing their values; any attempt to assign to them is rejected at compile time.
We then create a loop which iterates over all the characters delimited by these constants (the two characters at either extreme, a and z, are included in this iteration). Inside it, we execute a nested for loop that searches for each of these characters in the candidate array denoted by string.
If we find the search character, then we output the position (starting
from zero) of the array at which the character was found. Otherwise,
we indicate that the character was not found.
In order to calculate the position, we subtract the address of the first element of the string (given by the variable string itself) from the current value of p (which represents the address at which the character was found). The result of this subtraction is the offset, counted in elements rather than bytes, at which the character was found in string. Its type is ptrdiff_t, a signed integer type defined in <stddef.h>.
Also of interest is the fact that the nested for loop has an empty body (the body is denoted by the null statement, which simply consists of a semicolon). In this case, because all we are doing is searching for a particular character in a string, there is no need to define a body for the loop. The loop will stop searching when it either finds the character or reaches the end of the string. Looping constructs with empty bodies are not uncommon in C.
The program outputs each lowercase letter of the alphabet as well as the index position at which it first occurs in the search string, specified by the character array string:
Found 'a' at index position 22.
No 'b' in string.
Found 'c' at index position 25.
No 'd' in string.
Found 'e' at index position 27.
Found 'f' at index position 18.
Found 'g' at index position 15.
Found 'h' at index position 1.
Found 'i' at index position 2.
No 'j' in string.
No 'k' in string.
No 'l' in string.
No 'm' in string.
Found 'n' at index position 14.
No 'o' in string.
No 'p' in string.
No 'q' in string.
Found 'r' at index position 12.
Found 's' at index position 3.
Found 't' at index position 11.
No 'u' in string.
No 'v' in string.
No 'w' in string.
No 'x' in string.
No 'y' in string.
No 'z' in string.
Incidentally, a standard library function named strchr() (declared in <string.h>) performs a similar function to the inner for loop in the above program.
Despite this relationship between arrays and pointers, it is important to remember that arrays and pointers are not the same thing. For example, for an array arr and pointer ptr, the statements arr++ and arr = ptr are invalid (the compiler rejects them, because an array cannot be assigned to). However, ptr++ and ptr = arr are valid.
Another notable difference between arrays and pointers occurs when we initialize each with a string literal (K&R § 5.5):
#include <stdio.h>
int
main(void)
{
    char a[] = "This is an array";
    char *p = "This is a pointer";
    /* sizeof yields a size_t, which printf() prints with %zu */
    printf("sizeof a = %zu\n", sizeof a);   /* whole array: 17 bytes */
    printf("sizeof p = %zu\n", sizeof p);   /* just the pointer */
    *a = 't';                 /* Valid */
    puts(a);
    /* *p = 't'; */           /* Invalid: undefined behaviour, often a runtime crash */
    puts(p);
    /* a = "New array"; */    /* Invalid: compiler error */
    p = "New pointer";        /* Valid */
    puts(p);
    return 0;
}
pointer4.c
Note that when we assign a string literal to a pointer, the address of the first character of the string literal is assigned to the pointer. This string literal, for all intents and purposes, is read only, i.e. you cannot change any of the characters in the string literal via the pointer p. Attempting to do so is undefined behaviour and may cause your program to crash.
The situation is very different when we initialize an array with a string literal. All the characters in the string literal (and the implicit nul byte at the end) are copied into the array, and the program is free to modify this copy as it sees fit (just as long as it does not access character positions outside the array). For example, the following code fragment is okay:
char a[] = "This is an array";
char *p = a;
*p = 't'; /* Valid */
However, we cannot assign a new string literal to an array after the array has been defined. This is not the case for pointers, which can be assigned a new string literal.
We can also define an array of pointers to characters as shown below (K&R § 5.6, 5.8):
#include <stdio.h>
int
main(void)
{
    char *strings[] = {
        "This is",
        "an array",
        "of pointers",
        "to strings",
        NULL        /* marks the end of the array */
    };
    char **str = strings;
    while (*str)
        puts(*str++);
    return 0;
}
pointer5.c
Note that this is quite different from the 2-dimensional array of characters described in an earlier lecture. There are no superfluous nul bytes stored by this structure: each string occupies only as many bytes as it needs (plus the space for its pointer), which can make it more efficient, memory-wise. The array merely contains pointers to characters stored somewhere in memory. Indeed, the strings may not even be adjacent to each other in memory. However, none of the strings stored this way can be modified unless we copy them (either to an array or via dynamic memory allocation, which we will see later).
strings[0] --> | T | h | i | s |   | i | s | \0 |
strings[1] --> | a | n |   | a | r | r | a | y | \0 |
strings[2] --> | o | f |   | p | o | i | n | t | e | r | s | \0 |
strings[3] --> | t | o |   | s | t | r | i | n | g | s | \0 |
strings[4] = NULL
Another important difference is that we are no longer dealing with a rectangular block of memory. In the 2-dimensional array, we could legally access and set any character in the range strings[0 .. MAX_ROW-1][0 .. MAX_COL-1] (the byte at that location may be a nul byte, but we are permitted to change it if we want to). In the above "array of character pointers" program, however, the string "an array" ends with its nul byte at strings[1][8], so accessing strings[1][9], for example, reads past the end of the string and yields an unpredictable value (strictly speaking, undefined behaviour). Attempting to write to strings[1][9] will likewise produce undefined behaviour.
The NULL pointer stored at the end of the array of pointers is analogous to null in Java. It represents a pointer to nothing and is commonly used to mark the end of an array of pointers (as we have done above) or the end of a linked list. It can also be used as an initial value for a pointer or as a return value from a function to indicate failure (or maybe even success, depending on how it's used) of the function. Do not confuse the nul byte and the NULL pointer. The former is a character, whereas the latter is a pointer. You should NEVER dereference a NULL pointer. Bad things will happen.
By the way, don't let the double pointer in the definition of the str variable (char **str = strings) throw you off. If you think of the type char* as being an atomic type (like int, for example), then the definition of str becomes a bit easier to understand. For example, if we replace all char*'s in the above program with just char, replace the initializer list with characters instead of strings and change the puts() to a putchar(), then we get the following program, which is analogous to the preceding one except that it deals with char's instead of char*'s.
#include <stdio.h>
int
main(void)
{
    char chars[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
    char *ch = chars;
    while (*ch)
        putchar(*ch++);
    putchar('\n');  /* end the output line */
    return 0;
}
pointer6.c
Last modified: January 31, 2004 09:56:37 NST (Saturday)