January 26, 2004 (Monday)

Input/Output (K&R Chapter 7)

We've already seen one way of doing string input using scanf("%s", buffer), where buffer is an array of characters. Unfortunately, this has the problem of potentially overflowing the buffer array if the user enters too many characters:

#include	<stdio.h>

#define BUF_LEN 4

int
main()
{
	int	i = 1;
	char	buffer[BUF_LEN];

	printf("Input some characters: ");
	/* Inputting four or more characters causes problems */
	scanf("%s", buffer);
	printf("%s %d\n", buffer, i);

	return 0;
}
scanf.c

What we need is a way to input a limited number of characters so that we can ensure that the buffer size is not exceeded. fgets() provides this functionality.

The fgets() function (K&R § 7.7)

The fgets() function reads input one line at a time. It takes three arguments: a pointer to the first character of an allocated buffer (either an array or dynamically allocated memory), the size of that buffer (say, maxline), and a FILE * naming the input stream (for example, stdin, which represents standard input). fgets() reads the next line, including the newline, from the specified input stream and stores it in the buffer. However, it never stores more than maxline-1 characters, which makes it possible to read lines from a file without fear of overflowing the buffer. If a line is longer than that, fgets() stores only the first maxline-1 characters; the remaining characters on the line (including the newline) are returned by subsequent calls. If a line has been fully read, the last character stored in the buffer will be a \n (unless, of course, the input ended without a newline).

Note that fgets() will automatically put a nul byte in the buffer after the last character it reads. fgets() returns NULL in the event of error or end-of-file.

The following program will read input from standard input and echo each line (enclosed in double quotes) to the screen:

#include	<stdio.h>
#include	<string.h>

#define BUFFER_LEN	(80+1+1)	/* 80 regular characters
					 * +1 for the newline
					 * +1 for nul byte
					 */

int
main()
{
	char	buffer[BUFFER_LEN];

	while (fgets(buffer, sizeof(buffer), stdin) != NULL)  {
		/* Get rid of the newline, if there is one.  strcspn()
		 * also copes with an empty string, where strlen()-1
		 * would index outside the buffer. */
		buffer[strcspn(buffer, "\n")] = '\0';
		printf("Read string: \"%s\"\n", buffer);
	}
	return 0;
}
fgets.c

Creating Files for Input/Output

We have already seen three file streams that are available to every C program: stdin, stdout and stderr. These streams are open by default, and we do not have to close them when we are finished. As we have seen, they are typically passed as arguments to functions, e.g.

fputs(str, stdout);

fprintf(stderr, "Error!");

We can also open files of our own for reading and/or writing by using the fopen() and fclose() functions, which are part of the C standard library. We can use these functions to write programs that read and write files (provided we have appropriate file system permissions). For example, consider the following program, which is a simple version of the Unix command cat. This program takes the file name specified on the command line and writes the contents of that file to standard output.

#include	<stdio.h>
#include	<stdlib.h>

int
main(int argc, char **argv)
{
	FILE	*fp;
	int	 c;

	if (argc != 2) {
		fputs("Must specify a file name.\n", stderr);
		exit(1);
	}

	if ((fp = fopen(argv[1], "r")) == NULL) {
		fprintf(stderr, "Unable to open \"%s\" file.\n", argv[1]);
		exit(1);
	}
	while ((c = fgetc(fp)) != EOF)
		putchar(c);
	fclose(fp);
	return 0;
}
cat.c

Command-line arguments (K&R §5.10)

One of the first things you should notice about this program is the fact that there are now formal parameters in main()'s parameter list:

main(int argc, char **argv)

These parameters give the program access to its command line arguments when it is invoked. (They are analogous to the String args[] parameter of Java's main() method.) The first parameter, argc, is the number of command line arguments (including the name of the program itself). The second parameter, argv, is a pointer to an array of character strings holding the name of the program being executed followed by the command line arguments. (You may also see argv declared as char *argv[]. As a function parameter, this is synonymous with char **argv.)

For example, if we compile the program with make cat, and run it as follows:

$ ./cat cat.c

argc will be set to 2 and argv will point to an array containing pointers to the strings holding the program name and the command line arguments. Therefore, in the invocation above, argv[0] will be the string "./cat" and argv[1] will be the string "cat.c".

The following short program iterates over all the command line arguments (including the program name) and displays them. (K&R p.115 gives another couple of similar examples.)

#include	<stdio.h>

int
main(int argc, char **argv)
{
	int	i;

	for (i = 0; i < argc; i++) 
		printf("argv[%d] is %s\n", i, argv[i]);
	return 0;
}
cmdarg.c

Note that for the program invocation:

./progname < input.txt > output.txt 2> error.txt

./progname has no command line arguments (apart from itself). Therefore, argc will be set to one and not two. The redirection symbols and the filenames are not treated as command line arguments. Instead, the redirection symbols <, > and 2> cause the shell to tie stdin, stdout and stderr, respectively, to the specified file names. The redirection symbols and the file names are dropped from the command line before the program is actually executed.

Returning to our original cat.c program, we first ensure that the user specified a filename as a command line argument. If the user did, then argc will be 2 (one for the command/program name itself and the second one for the actual command line argument). If the user did not, then we fputs() an error and terminate.

fopen() and fclose()

If a file name was supplied on the command line, then we open the file using the fopen() function. This function takes two arguments: the name of the file to open (in this case, argv[1]) and the mode, which is passed in as a string. Because we are opening the file for read access only, we specify the mode as "r". If the fopen() call succeeds, then it returns a FILE pointer, which we use to access the file contents.

In order to access the contents of the file, we use the fgetc() function, which takes a FILE pointer and returns an integer representing the next character in the input file. fgetc() returns an int rather than a char because the return value must be able to hold every possible character value plus the distinct value EOF (end of file), which is negative (typically -1). A char has no room for that extra value, and the default char type may not be able to represent negative numbers at all. The getchar() function, which we saw earlier, is roughly equivalent to fgetc(stdin).

We continue to read each character from the input file, displaying it using putchar(), until fgetc() returns EOF, at which point we close the file using fclose(). Note that EOF is a special return value, not a character stored in the file.

Opening files for writing

Consider the following program that takes a file name on the command line, say, filename. It then creates another file named filename.rev in which the lines are written in reverse order and the characters of each line are reversed too.

For example if the file hello contained the text

Hello
World

Then running the program:

$ ./revfile hello

Would create the file hello.rev, with the contents:


dlroW
olleH

#include	<stdio.h>
#include	<string.h>
#include	<stdlib.h>
#include	<errno.h>

#define	BUFFER_LEN	(80+1+1)

const char *extension  = ".rev";

typedef struct node {
	char		buffer[BUFFER_LEN];
	struct node	*next;
} t_node;

int
main(int argc, char **argv)
{
	FILE	*fp;
	char	 buffer[BUFFER_LEN];
	t_node	*list = NULL;
	char	*outfile;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s filename\n", argv[0]);
		exit(1);
	}

	if ((fp = fopen(argv[1], "r")) == NULL) {
		fprintf(stderr, "Unable to open input file \"%s\": %s\n",
			argv[1], strerror(errno));
		exit(1);
	}

	while (fgets(buffer, sizeof(buffer), fp) != NULL) {
		t_node *cur = (t_node *) malloc(sizeof(t_node));
		if (cur == NULL) {
			fputs("No memory to store input line\n", stderr);
			exit(1);
		}
		strcpy(cur->buffer, buffer);
		cur->next = list;
		list = cur;
	}
	fclose(fp);	/* Close our input file stream */

	/* Don't forget '+ 1' for the nul byte */
	outfile = (char *) malloc (strlen(argv[1]) + strlen(extension) + 1);
	if (outfile == NULL) {
		fputs("No memory to store output file name\n", stderr);
		exit(1);
	}

	strcpy(outfile, argv[1]);
	strcat(outfile, extension);

	if ((fp = fopen(outfile, "w")) == NULL) {
		fprintf(stderr, "Unable to open output file \"%s\": %s\n",
			outfile, strerror(errno));
		exit(1);
	}

	while (list) {
		int i;
		t_node *del = list;
		for (i = strlen(list->buffer) - 1; i >= 0; --i)
			fputc(list->buffer[i], fp);
		list = list->next;
		free(del);
	}

	fclose(fp);	/* Close our output file stream */

	free(outfile);	/* Free up the memory that we
			 * allocated for the output file name
			 */

	return 0;
}
revfile.c

The first part of this program works similarly to the earlier one except we now read the input a line at a time using fgets() (note the fp argument to fgets()) and store each line in a linked list. When we later traverse this list, the lines will be displayed in reverse order. After we finish reading the file, we fclose() it, then dynamically allocate a new character buffer to hold the name of the new file we are about to create.

We then open this new file for write access by specifying a mode of "w". Note that opening a file for write access will cause an existing file with the same name (provided we have write permission for it) to be overwritten without warning. We then proceed to iterate over our linked list of line buffers and write the characters in each buffer backwards. Note that we free each node after we are finished with it. We then close our output file and deallocate the memory that we allocated for the output file name.

strerror() and errno

We use a couple of new concepts in the preceding program, namely strerror(), which is declared in string.h, and errno, which is declared in errno.h. Many system functions (e.g. fopen()), when they fail, set a global variable called errno. This variable is set to an integer value which represents the reason why the function failed. The function strerror() takes this integer value and converts it into a human readable string. So, in our code above, if the call to fopen() fails, we output an error message to the stderr stream that indicates the name of the file and a readable error string indicating why the fopen() call failed.

As an aside, the calls to fclose() in the above program can also fail, in which case fclose() returns EOF (a non-zero value). Strictly speaking, we should be checking the return value of fclose() too. Upon failure, fclose() will also set errno.

To see the various error messages that can be displayed, consider the following invocations of the above program.

$ ./revfile nofile
Unable to open input file "nofile": No such file or directory
$ chmod 000 revfile.c
$ ./revfile revfile.c
Unable to open input file "revfile.c": Permission denied
$ chmod 600 revfile.c
$ ./revfile revfile.c
$ cat revfile.c.rev

}
;0 nruter	

/* 			
eman elif tuptuo eht rof detacolla * 			
ew taht yromem eht pu eerF */	;)eliftuo(eerf	

/* maerts elif tuptuo ruo esolC */	;)pf(esolcf	
... etc. ...

Note that if we wanted to write each line without the characters reversed, we could have replaced the for loop and the fputc() call with either:

fprintf(fp, "%s", list->buffer);

or better yet,

fputs(list->buffer, fp);

Note that the position of the FILE * parameter (i.e. fp) is not consistent across these functions: fprintf() takes it as the first argument, whereas fputs(), fputc() and fgets() take it as the last.

Miscellaneous

Other file modes are also available to the programmer ("a" for append is another popular mode). It is also possible to seek to an arbitrary offset within a file using fseek(). Arbitrary data types (including structures and arrays) can be read from and written to files using fread() and fwrite(). Note that these two functions transfer the in-memory representation of the data, so files written using fwrite() on one computer architecture may not be readable by fread() on a different one (byte order, type sizes and structure padding can all differ). They should not be used for such data if portability is desired.


Last modified: January 26, 2004 14:08:26 NST (Monday)