Friday, January 31, 2003

Input/Output (K&R Chapter 7)

We have already seen three file streams that are available to every C program: stdin, stdout and stderr. These streams are open for input/output by default, and we do not have to worry about closing them when we are finished. As we have seen, they are typically passed as arguments to functions, e.g.

fgets(buffer, sizeof(buffer), stdin);

fprintf(stderr, "Error!");

We can also open files of our own for reading and/or writing by using the fopen() and fclose() functions, which are part of the standard library. We can use these functions to write programs that read and write files (provided we have appropriate file system permissions). For example, consider the following program, which is a simple version of the Unix command cat. It takes the file name specified on the command line and writes the contents of that file to standard output.


#include	<stdio.h>
#include	<stdlib.h>

int
main(int argc, char **argv)
{
	FILE	*fp;
	int	 c;

	if (argc != 2) {
		fprintf(stderr, "Must specify a file name.\n");
		exit(1);
	}

	if ((fp = fopen(argv[1], "r")) == NULL) {
		fprintf(stderr, "Unable to open specified file.\n");
		exit(1);
	}
	while ((c = fgetc(fp)) != EOF)
		putchar(c);
	fclose(fp);
	return 0;
}


Command-line arguments (K&R §5.10)

One of the first things you should notice about this program is the fact that there are now formal parameters in main()'s parameter list:

main(int argc, char **argv)

These parameters give the program a way to access command line arguments when the program is invoked. (They are analogous to the String args[] parameter in Java's main() method.) The first parameter, argc, is the number of command line arguments (including the name of the program itself). The second parameter, argv, points to an array of character strings that hold the name of the program being executed and the command line arguments. (You may also see argv declared as char *argv[]. As a function parameter, this is equivalent to char **argv.)

For example, if we compile the program, and run it as follows:

$ ./a.out cat.c

argc will be set to 2 and argv will point to an array containing pointers to the strings holding the program name and the command line arguments. Therefore, in the invocation above, argv[0] will contain the string "./a.out" and argv[1] will be the string "cat.c".

The following short program iterates over all the command line arguments (including the program name) and displays them. (K&R p.115 gives a couple of similar examples.)


#include	<stdio.h>

int
main(int argc, char **argv)
{
	int	i;

	for (i = 0; i < argc; i++) 
		printf("argv[%d] is %s\n", i, argv[i]);
	return 0;
}


Note that for the program invocation:

./a.out < input.txt

./a.out has no command line arguments (apart from itself). The redirection symbol and the input.txt filename are not treated as command line arguments. Instead, the shell interprets the redirection symbols < and > and ties stdin and stdout, respectively, to the specified files. It then drops the redirection symbols and the file names from the command line before the program is actually executed.

Returning to our original cat.c program, we first ensure that the user specified a filename as a command line argument. If the user did, then argc will be 2 (one for the command/program name itself and the second one for the actual command line argument). If the user did not, then we fprintf() an error message and terminate.

fopen() and fclose()

If a file name was supplied on the command line, then we open the file using the fopen() function. This function takes two arguments: the name of the file to open (in this case, argv[1]) and the mode, which is passed in as a string. Because we are opening the file for read access only, we specify the mode as "r". If the fopen() call succeeds, then it returns a FILE pointer, which we use to actually access the file contents. If it fails, it returns NULL.

In order to access the contents of the file, we use the fgetc() function, which takes a FILE pointer and returns an integer representing the next character in the input file. fgetc() returns an integer type and not a character type because the return value may be EOF (end of file), which is typically -1 and must be distinguishable from every valid character; the default char type may not be able to represent negative numbers.
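
The point about the return type can be illustrated with a small sketch. The helper name count_chars() is hypothetical (it is not part of the standard library or of the program above). Because c is declared as an int, a byte with the value 0xFF in the file is not mistaken for EOF; with a char variable, the loop could stop early where char is signed, or never stop where char is unsigned.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters remaining in an open stream. */
long
count_chars(FILE *fp)
{
	int	c;	/* int, not char, so that EOF stays distinguishable */
	long	n = 0;

	while ((c = fgetc(fp)) != EOF)
		n++;
	return n;
}
```

A caller would open a file with fopen(), pass the resulting FILE pointer to count_chars(), and then fclose() the file as usual.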

Another popular character input function (which also returns an integer) is getchar(), which returns the next character from standard input (stdin). getchar() behaves like fgetc(stdin) and is sometimes #defined as a macro.

We continue to read each character from the input file, displaying it using putchar(), until fgetc() returns EOF (which is a special return value, not a character stored in the file), at which point we close the file using fclose().

Opening files for writing

Consider the following program that takes a file name on the command line (say, filename) and creates another file named filename.rev in which the lines are written in reverse order and the characters of each line are reversed too.

For example if the file hello contained the text

Hello
World

Then running the program:

$ ./a.out hello

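Would create the file hello.rev with the following contents (the output begins with a blank line because the newline ending each input line is reversed along with the rest of that line):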


dlroW
olleH


#include	<stdio.h>
#include	<string.h>
#include	<stdlib.h>
#include	<errno.h>

#define	BUFFER_LEN	(80+1+1)

const char *extension  = ".rev";

typedef struct node {
	char		buffer[BUFFER_LEN];
	struct node	*next;
} t_node;

int
main(int argc, char **argv)
{
	FILE	*fp;
	char	 buffer[BUFFER_LEN];
	t_node	*list = NULL;
	char	*outfile;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s filename\n", argv[0]);
		exit(1);
	}

	if ((fp = fopen(argv[1], "r")) == NULL) {
		fprintf(stderr, "Unable to open input file \"%s\": %s\n",
			argv[1], strerror(errno));
		exit(1);
	}

	while (fgets(buffer, sizeof(buffer), fp) != NULL) {
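		t_node *cur = (t_node *) malloc(sizeof(t_node));
		if (cur == NULL) {	/* malloc() can fail, just like fopen() */
			fprintf(stderr, "No memory to store input line\n");
			exit(1);
		}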
		strcpy(cur->buffer, buffer);
		cur->next = list;
		list = cur;
	}
	fclose(fp);	/* Close our input file stream */

	/* Don't forget '+ 1' for the nul byte */
	outfile = (char *) malloc (strlen(argv[1]) + strlen(extension) + 1);
	if (outfile == NULL) {
		fprintf(stderr, "No memory to store output file name\n");
		exit(1);
	}

	strcpy(outfile, argv[1]);
	strcat(outfile, extension);

	if ((fp = fopen(outfile, "w")) == NULL) {
		fprintf(stderr, "Unable to open output file \"%s\": %s\n",
			outfile, strerror(errno));
		exit(1);
	}

	while (list) {
		int i;
		t_node *del = list;
		for (i = strlen(list->buffer) - 1; i >= 0; --i)
			fputc(list->buffer[i], fp);
		list = list->next;
		free(del);
	}

	fclose(fp);	/* Close our output file stream */

	free(outfile);	/* Free up the memory that we
			 * allocated for the output file name
			 */

	return 0;
}


The first part of this program works similarly to the earlier one, except that we now read the input a line at a time using fgets() (note the fp argument to fgets()) and store each line in a linked list. Because each new node is inserted at the head of the list, traversing the list later visits the lines in reverse order. After we finish reading the file, we fclose() it, then dynamically allocate a new character buffer to hold the name of the new file we are about to create.

We then open this new file for write access by specifying a mode of "w". Note that opening a file for write access will cause a file with the same name (and appropriate write permissions) to be overwritten without warning. We then proceed to iterate over our linked list of line buffers and write the characters in each buffer backwards. Note that we free each node after we are finished with it. We then close our output file and deallocate the memory that we allocated for the output file name.
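
The overwriting behaviour of "w" can be seen in a minimal sketch. The helper name write_text() is hypothetical and is used only for illustration. Calling it twice with the same path leaves only the text from the second call in the file, since opening with "w" truncates the existing contents.

```c
#include <stdio.h>

/* Hypothetical helper: replaces the contents of path with text.
 * Returns 0 on success and -1 on failure. */
int
write_text(const char *path, const char *text)
{
	FILE	*fp;

	/* "w" creates the file, or truncates it if it already exists */
	if ((fp = fopen(path, "w")) == NULL)
		return -1;
	if (fputs(text, fp) == EOF) {
		fclose(fp);
		return -1;
	}
	return (fclose(fp) == 0) ? 0 : -1;
}
```

A program that must not destroy existing data would need to check for the file before opening it in this mode, or open it with a different mode.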

strerror() and errno

We use a couple of new concepts in the preceding program, namely strerror() and errno. errno is declared in errno.h, and strerror() is declared in string.h. Many system and library functions (e.g. fopen()), when they fail, set a global variable called errno. This variable is set to an integer value which represents a reason why the function failed. The function strerror() takes this integer value and converts it into a human readable string. So, in our code above, if the call to fopen() fails, we output an error message to the stderr stream that indicates the name of the file and a readable string indicating why the fopen() call failed.
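
The pattern can be wrapped up in a small sketch. The helper name open_or_report() is hypothetical. It assumes a Unix-like system on which fopen() sets errno when it fails, and it copies errno into a local variable first because later library calls may change it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: like fopen(), but reports failures on stderr. */
FILE *
open_or_report(const char *path, const char *mode)
{
	FILE	*fp;
	int	 saved;

	errno = 0;
	if ((fp = fopen(path, mode)) == NULL) {
		saved = errno;	/* copy errno before any other call changes it */
		fprintf(stderr, "Unable to open \"%s\": %s\n",
			path, strerror(saved));
		errno = saved;	/* leave the reason available to the caller */
	}
	return fp;
}
```

The caller still has to test the result against NULL; the helper only takes care of printing the reason.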

As an aside, the calls to fclose() in the above program can also fail. Unlike fopen(), which returns NULL upon failure, fclose() returns 0 on success and EOF upon failure. Strictly speaking, we should be checking the return value of fclose() too. Upon failure, fclose() will typically also set errno.
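
A check of this kind might look like the following sketch. The helper name close_checked() is hypothetical, and the message format simply mirrors the one used for fopen() above. It assumes that fclose() sets errno on failure, which holds on Unix-like systems.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: closes fp and reports a failure on stderr.
 * Returns 0 on success and -1 on failure. */
int
close_checked(FILE *fp, const char *name)
{
	if (fclose(fp) == EOF) {
		fprintf(stderr, "Unable to close \"%s\": %s\n",
			name, strerror(errno));
		return -1;
	}
	return 0;
}
```

Checking matters most for output files, because buffered data is written out when the stream is closed, so a full disk may only be reported at that point.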

To see the various error messages that can be displayed, consider the following invocations of the above program.

$ ./a.out nofile
Unable to open input file "nofile": No such file or directory
$ chmod 000 revfile.c
$ ./a.out revfile.c
Unable to open input file "revfile.c": Permission denied
$ chmod 600 revfile.c
$ ./a.out revfile.c
$ cat revfile.c.rev

}
;0 nruter	

/* 			
eman elif tuptuo eht rof detacolla * 			
ew taht yromem eht pu eerF */	;)eliftuo(eerf	

/* maerts elif tuptuo ruo esolC */	;)pf(esolcf	
... etc. ...

Note that if we wanted to write each line without the characters reversed, we could have replaced the for loop and the fputc() call with either:

fprintf(fp, "%s", list->buffer);

or better yet,

fputs(list->buffer, fp);

Note that the position of the FILE * parameter (i.e. fp) in the parameter list isn't consistent: fprintf() takes it first, whereas fputs(), fputc() and fgets() take it last.
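
The differing positions can be seen side by side in a short sketch. The helper name number_lines() is hypothetical. It assumes that each line fits in the buffer; a longer line would be read, and therefore numbered, in pieces.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out, prefixing each line with its
 * number. Returns the number of lines (or pieces of lines) copied. */
int
number_lines(FILE *in, FILE *out)
{
	char	buffer[80+1+1];
	int	n = 0;

	while (fgets(buffer, sizeof(buffer), in) != NULL) { /* stream last */
		fprintf(out, "%d: ", ++n);	/* stream is first */
		fputs(buffer, out);		/* stream is last */
	}
	return n;
}
```

Swapping the arguments by mistake usually produces a compiler warning, because a FILE * and a char * are different pointer types.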

Miscellaneous

Other file modes are also available to the programmer ("a" for append is another popular mode). There are also ways to seek to an arbitrary offset within a file (fseek()). Arbitrary data types (including structures and arrays) can also be read from and written to files using fread() and fwrite(). Note that files written using fwrite() on one architecture may not be readable by fread() on a different architecture, because byte order, type sizes and structure padding can differ. These two functions should not be used on raw data types if the resulting files need to be portable.
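
As a rough sketch of how fwrite(), fseek() and fread() fit together, the following helpers store fixed-size records and read one back by its position. The struct point type and the names save_points() and load_point() are hypothetical. The sketch assumes the stream was opened in a binary update mode (for example "wb+") and that the file is read back on the same kind of machine that wrote it.

```c
#include <stdio.h>

/* Hypothetical record type used only for this illustration. */
struct point {
	int	x;
	int	y;
};

/* Writes n records in this machine's native binary layout.
 * Returns 0 on success and -1 on failure. */
int
save_points(FILE *fp, const struct point *pts, size_t n)
{
	return (fwrite(pts, sizeof(*pts), n, fp) == n) ? 0 : -1;
}

/* Seeks to record number index (counting from 0) and reads it.
 * Returns 0 on success and -1 on failure. */
int
load_point(FILE *fp, long index, struct point *out)
{
	if (fseek(fp, index * (long) sizeof(*out), SEEK_SET) != 0)
		return -1;
	return (fread(out, sizeof(*out), 1, fp) == 1) ? 0 : -1;
}
```

Because every record has the same size, the offset of record number index is simply index multiplied by sizeof(struct point), which is what makes the fseek() call possible.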

Last modified: Fri Jan 31 15:49:55 2003