If you want to learn input and output in C, start by looking at the stdio.h include file. As you might guess from the name, that file defines all the standard ("std") input and output ("io") functions.

The first stdio.h function that most people learn is the printf function to print formatted output. Or the puts function to print a simple string. Those are great functions to print information to the user, but if you want to do more than that, you'll need to explore other functions.
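For example, a minimal program that uses both of those functions might look like this:

#include <stdio.h>

int
main(void)
{
    /* printf prints formatted output, such as numbers inside a string */
    printf("2 + 2 = %d\n", 2 + 2);

    /* puts prints a plain string and adds a newline for you */
    puts("That is all stdio.h needs for simple output.");

    return 0;
}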
You can learn about some of these functions by writing a replica of a common Linux command. The cp command copies one file to another. If you look at the cp man page, you'll see that cp supports a broad set of command-line parameters and options. But in the simplest case, cp copies one file to another:

cp infile outfile

You can write your own version of this cp command in C by using only a few basic functions to read and write files.
Reading and writing one character at a time
You can easily do input and output using the fgetc and fputc functions. These read and write data one character at a time. The usage is defined in stdio.h and is quite straightforward: fgetc reads (gets) a single character from a file, and fputc puts a single character into a file.
int fgetc(FILE *stream);
int fputc(int c, FILE *stream);
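For example, here is a minimal sketch that uses fgetc and fputc to copy standard input to standard output, one character at a time:

#include <stdio.h>

int
main(void)
{
    int ch;

    /* read from standard input until end of file, echoing each character */
    while ((ch = fgetc(stdin)) != EOF) {
        fputc(ch, stdout);
    }

    return 0;
}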
Writing the cp command requires accessing files. In C, you open a file using the fopen function, which takes two arguments: the name of the file and the mode you want to use. The mode is usually r to read from a file or w to write to a file. The mode supports other options too, but for this tutorial, just focus on reading and writing.
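As a minimal sketch, assuming a file named notes.txt exists in the current directory (the filename is only an example), opening a file for reading and closing it again looks like this:

#include <stdio.h>

int
main(void)
{
    FILE *fp;

    /* open the example file notes.txt for reading; fopen returns NULL on failure */
    fp = fopen("notes.txt", "r");
    if (fp == NULL) {
        fprintf(stderr, "Cannot open file for reading: notes.txt\n");
        return 1;
    }

    /* ... read from the file here ... */

    fclose(fp);

    return 0;
}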
Copying one file to another then becomes a matter of opening the source and destination files, then reading one character at a time from the first file and writing that character to the second file. The fgetc function returns either the single character read from the input file or the end of file (EOF) marker when the file is done. Once you've read EOF, you've finished copying and you can close both files. That code looks like this:
do {
    ch = fgetc(infile);
    if (ch != EOF) {
        fputc(ch, outfile);
    }
} while (ch != EOF);
You can write your own cp program with this loop to read and write one character at a time by using the fgetc and fputc functions. The cp.c source code looks like this:
#include <stdio.h>

int
main(int argc, char **argv)
{
    FILE *infile;
    FILE *outfile;
    int ch;

    /* parse the command line */
    /* usage: cp infile outfile */

    if (argc != 3) {
        fprintf(stderr, "Incorrect usage\n");
        fprintf(stderr, "Usage: cp infile outfile\n");
        return 1;
    }

    /* open the input file */

    infile = fopen(argv[1], "r");
    if (infile == NULL) {
        fprintf(stderr, "Cannot open file for reading: %s\n", argv[1]);
        return 2;
    }

    /* open the output file */

    outfile = fopen(argv[2], "w");
    if (outfile == NULL) {
        fprintf(stderr, "Cannot open file for writing: %s\n", argv[2]);
        fclose(infile);
        return 3;
    }

    /* copy one file to the other */
    /* use fgetc and fputc */

    do {
        ch = fgetc(infile);
        if (ch != EOF) {
            fputc(ch, outfile);
        }
    } while (ch != EOF);

    /* done */

    fclose(infile);
    fclose(outfile);

    return 0;
}
And you can compile that cp.c file into a full executable using the GNU Compiler Collection (GCC):
$ gcc -Wall -o cp cp.c
The -o cp option tells the compiler to save the compiled program into the cp program file. The -Wall option tells the compiler to turn on all warnings. If you don't see any warnings, that means the compiler didn't find any obvious problems in the code.
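After compiling, you can run your version the same way you would run the system command. The names infile and outfile here are just placeholders for the file you want to copy and the name of the copy:

$ ./cp infile outfile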
Reading and writing blocks of data
Programming your own cp command by reading and writing data one character at a time does the job, but it's not very fast. You might not notice when copying "everyday" files like documents and text files, but you'll really notice the difference when copying large files or when copying files over a network. Working on one character at a time requires significant overhead.

A better way to write this cp command is by reading a chunk of the input into memory (called a buffer), then writing that collection of data to the second file. This is much faster because the program can read more of the data at one time, which requires fewer "reads" from the file.
You can read a file into a variable by using the fread function. This function takes several arguments: the array or memory buffer to read data into (ptr), the size of the smallest thing you want to read (size), how many of those things you want to read (nmemb), and the file to read from (stream):
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
The different options provide quite a bit of flexibility for more advanced file input and output, such as reading and writing files with a certain data structure. But in the simple case of reading data from one file and writing data to another file, you can use a buffer that is an array of characters.
And you can write the buffer to another file using the fwrite function. This uses a similar set of options to the fread function: the array or memory buffer to read data from, the size of the smallest thing you need to write, how many of those things you need to write, and the file to write to.
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
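As a brief aside on that flexibility, here is a minimal sketch that writes an array of structs to a file with fwrite and reads them back with fread. The record struct and the records.dat filename are only examples, and error checking is omitted to keep the sketch short:

#include <stdio.h>

/* a hypothetical record type, just for illustration */
struct record {
    int id;
    float value;
};

int
main(void)
{
    struct record out[2] = { { 1, 3.14f }, { 2, 2.72f } };
    struct record in[2];
    FILE *fp;

    /* write two records as one block of binary data */
    fp = fopen("records.dat", "w");
    fwrite(out, sizeof(struct record), 2, fp);
    fclose(fp);

    /* read the same two records back into memory */
    fp = fopen("records.dat", "r");
    fread(in, sizeof(struct record), 2, fp);
    fclose(fp);

    printf("record %d has value %f\n", in[1].id, in[1].value);

    return 0;
}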
In the case where the program reads a file into a buffer, then writes that buffer to another file, the array (ptr) can be an array of a fixed size. For example, you can use a char array called buffer that is 200 characters long.
With that assumption, you need to change the loop in your cp program to read data from a file into a buffer, then write that buffer to another file:
while (!feof(infile)) {
    buffer_length = fread(buffer, sizeof(char), 200, infile);
    fwrite(buffer, sizeof(char), buffer_length, outfile);
}
Here's the full source code to your updated cp program, which now uses a buffer to read and write data:
#include <stdio.h>

int
main(int argc, char **argv)
{
    FILE *infile;
    FILE *outfile;
    char buffer[200];
    size_t buffer_length;

    /* parse the command line */
    /* usage: cp infile outfile */

    if (argc != 3) {
        fprintf(stderr, "Incorrect usage\n");
        fprintf(stderr, "Usage: cp infile outfile\n");
        return 1;
    }

    /* open the input file */

    infile = fopen(argv[1], "r");
    if (infile == NULL) {
        fprintf(stderr, "Cannot open file for reading: %s\n", argv[1]);
        return 2;
    }

    /* open the output file */

    outfile = fopen(argv[2], "w");
    if (outfile == NULL) {
        fprintf(stderr, "Cannot open file for writing: %s\n", argv[2]);
        fclose(infile);
        return 3;
    }

    /* copy one file to the other */
    /* use fread and fwrite */

    while (!feof(infile)) {
        buffer_length = fread(buffer, sizeof(char), 200, infile);
        fwrite(buffer, sizeof(char), buffer_length, outfile);
    }

    /* done */

    fclose(infile);
    fclose(outfile);

    return 0;
}
Since you want to compare this program to the first version, save this source code as cp2.c. You can compile the updated program using GCC:
$ gcc -Wall -o cp2 cp2.c
As before, the -o cp2 option tells the compiler to save the compiled program into the cp2 program file, and the -Wall option tells the compiler to turn on all warnings. If you don't see any warnings, that means the compiler didn't find any obvious problems in the code.
Yes, it really is faster
Reading and writing data using buffers is the better way to write this version of the cp program. Because it reads chunks of a file into memory at once, the program doesn't need to read data as often. You might not notice a difference between the two methods on smaller files, but you'll really see the difference if you need to copy something much larger or if you're copying data over slower media, like a network connection.
I ran a runtime comparison using the Linux time command. This command runs another program, then tells you how long that program took to complete. For my test, I copied a 628MB CD-ROM image file I had on my system.

I first copied the image file using the standard Linux cp command to see how long that takes. Running the Linux cp command first also loaded the file into Linux's built-in file cache, eliminating the possibility that the cache would give one of my test programs a false performance boost over the other. The test with Linux cp took much less than one second to run:
$ time cp FD13LIVE.iso tmpfile
real 0m0.040s
user 0m0.001s
sys 0m0.003s
Copying the same file using my own version of the cp command took significantly longer. Reading and writing one character at a time took almost five seconds to copy the file:
$ time ./cp FD13LIVE.iso tmpfile
real 0m4.823s
user 0m4.100s
sys 0m0.571s
Reading data from an input file into a buffer and then writing that buffer to an output file is much faster. Copying the file using this method took less than a second:
$ time ./cp2 FD13LIVE.iso tmpfile
real 0m0.944s
user 0m0.224s
sys 0m0.608s
My demonstration cp program used a buffer that was only 200 characters. I'm sure the program would run much faster if I read more of the file into memory at once. But for this comparison, you can already see the huge difference in performance, even with a small, 200-character buffer.
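For example, one simple experiment is to use BUFSIZ, a buffer size that stdio.h already defines for its own buffered I/O (at least 256 bytes, and usually several kilobytes). This sketch shows only the declarations and copy loop you would change in cp2.c; the rest of the program stays the same:

/* declare a larger buffer using BUFSIZ from stdio.h */
char buffer[BUFSIZ];
size_t buffer_length;

/* the copy loop now reads up to BUFSIZ characters at a time */
while (!feof(infile)) {
    buffer_length = fread(buffer, sizeof(char), BUFSIZ, infile);
    fwrite(buffer, sizeof(char), buffer_length, outfile);
}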