Physics and Astronomy

Appendix: Binary IO and handling bad input

The occasional appendices and optional examples in this module are for advanced material that you will not need for this module. They are intended for enthusiastic students who are interested in going further in programming for its own sake.

Handling bad input

This is an advanced topic and can be omitted if desired.

So far we have checked numbers typed in at the keyboard by enclosing the call to scanf() inside an infinite loop: we check the values typed in, print an error message if they are incorrect, and break out of the loop if they are OK.

But you may already have met the situation where you typed a non-numeric character by mistake, say 'q' instead of '1'. The problem is that scanf() stops at the first character that doesn't match what it expects, the 'q', and leaves it unread. The loop does not break, so scanf() is called again and does exactly the same thing: it stops at the 'q'. And so on forever. A better alternative is to check the return value of scanf(), which is the number of items it successfully read, and to quit the program if it failed to read the correct number of items.

Quitting might be thought an unhelpful response, but it raises the question: what should the program do in this situation?

One possible approach

The general idea here is to print a helpful message to the screen and skip the rest of the line. This only makes sense if we are reading from the keyboard; there is no point in doing it if we are reading from a file.

Even here there are a few subtleties, for example: what happens if we reach the end of the input? This could be because stdin is coming from a file rather than the keyboard, or because the terminal window has closed. Or the input may be coming over the network and the connection might break. Thankfully we can tell when this happens, as scanf() then returns the special value EOF, which is negative (usually -1). Note the distinction: when there is data but of the wrong sort, such as a letter for a format of %d, scanf() returns the number of items it did manage to read, which is zero if the very first item fails; it returns EOF only when the input ends before anything could be read.

With that in mind we can write an error-handling function, which we shall call skipline(), whose job is to read in and discard the rest of the line so that the user can try again:

#include <stdio.h>
#include <stdlib.h>

void skipline(void);

int main(void) {
  int x, y;

  while (1 == 1) {
    printf("Please enter two integers > 0\n");

    if (scanf("%d %d", &x, &y) != 2)
      skipline();
    else if ( x <= 0 || y <= 0 )
      printf("Only integers greater than zero are allowed\n");
    else
      break;

    printf("\n\tPlease try again.\n\n");
  }
    
  printf("Read: %d %d\n", x, y);
  return 0;
}

//
// Read and discard the rest of the line from stdin, printing it so
// the user knows what's going on. 
//
void skipline(void) {
  int i;
  
  printf("\nSkipping unexpected input: ");

  while ((i = getc(stdin)) != '\n') {
    if ( i == EOF ) {
      printf("End of standard input\n");
      exit(1);
    }    
    putc(i, stdout);
  }

  putc('\n', stdout);
}

It's important to notice that we have put all of the nastiness of checking for the end of file inside the skipline() function; we have not left any of it for the calling function to handle.

Dealing with errors is always tedious and should be separated from the main logic of the code as far as possible.

Binary input and output of numerical data

Sometimes we require a program to be able to save data to a file but we know the file will only ever be read by another program. For example, long-running numerical simulations often periodically save their state so they can be stopped and restarted from their last saved state.

Taking a two-dimensional matrix as an example, the obvious way to save it is to use a double loop and fprintf():

for (int m = 0; m < M; ++m)
  for (int n = 0; n < N; ++n)
    fprintf(datafile, "%g\n", x[m][n]);

and later read it in with:

for (int m = 0; m < M; ++m)
  for (int n = 0; n < N; ++n)
    fscanf(datafile, "%lg", &y[m][n]);

However, this approach has several drawbacks:

  • The program has to take the binary data (typically eight bytes per double), translate it to human-readable form and then translate it back again.
  • We lose precision: "%g" prints only six significant figures.
  • The file is usually bigger.

We can improve the precision, at the cost of a larger file size, by writing using a format such as "%.10g", which means "use ten significant figures". But the data we read in still won't be exactly the same as what we wrote out, and that's a problem.

Binary input and output

C gives us the ability to write the raw bytes to a file, in this case (assuming a true 2-D array) with a single function call:

 fwrite(x, sizeof x[0][0], N*M, datafile);

This means: write N*M chunks of data each of the size of x[0][0] from x into datafile.

The corresponding "read" call is:

fread(y, sizeof y[0][0], N*M, datafile);

Now y is identical to x as we have just copied the bytes to and from the disk.

The complete program

/*
 * Unformatted store and read of a matrix.
 */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 100
#define M 200

int main(void) {
  double x[M][N], y[M][N], error = 0.0;
  FILE *datafile;

  for (int m = 0; m < M; ++m)
    for (int n = 0; n < N; ++n) 
      x[m][n] = sin(0.01 * n + 0.0153 * m);

  // Write the data ("b" = binary mode, see the MS Windows note below)
  if ((datafile = fopen("Bindat.dat", "wb")) == NULL ) {
    printf("Cannot create file\n");
    exit(1);
  }

  if (fwrite(x, sizeof x[0][0], N*M, datafile) != N*M) {
    printf("Error writing file\n");
    exit(1);
  }
  fclose(datafile);

  // Now read it back in
  if ((datafile = fopen("Bindat.dat", "rb")) == NULL ) {
    printf("Cannot read file\n");
    exit(1);
  }

  if (fread(y, sizeof y[0][0], N*M, datafile) != N*M) {
    printf("Error reading file\n");
    exit(1);
  }
  fclose(datafile);

  // Now compare
  for (int m = 0; m < M; ++m)
    for (int n = 0; n < N; ++n) 
      error += fabs(y[m][n] - x[m][n]);

  printf("Total error is: %g\n", error);
  return 0;
}

The error is zero (precisely) as we have read back exactly the same bytes as we wrote.

Two gotchas

Partial and/or pseudo-arrays

The "all in one go" approach above writes the whole array, but in some circumstances we may only be using part of it. For example we may be going from x[0][0] to x[m-1][n-1] where m<=M and n<=N. If n<N the elements in use are not contiguous in memory (each row is followed by N-n unused elements), so the solution is to write each row separately using a loop:

for (int j = 0; j < m; ++j)
  fwrite(x[j], sizeof x[j][0], n, datafile);

Alternatively, suppose the x[m][n] were not part of a true array but a dynamically-allocated pseudo-array created using malloc():

double **x;

x = xmalloc(m * sizeof *x);
for (int j = 0; j < m; ++j)
  x[j] = xmalloc(n * sizeof *x[j]);

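Then x is a (dynamically allocated) array of pointers, not a 2-D array of doubles. Each row is a separate block of memory, so a single fwrite(x, ...) would write the pointers themselves rather than the numbers. The above code, writing each row separately, works in this case as well.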

MS Windows

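For horrible historical reasons MS Windows distinguishes between text and binary files: in text mode each '\n' byte is written out as the two bytes "\r\n" and translated back again on input, which corrupts binary data. So open binary files with "wb" and "rb". The "b" is part of standard C and has no effect on Unix-like systems such as Linux and macOS, so it is safe to use everywhere:

fopen("Bindat.dat", "wb") // needed on MS Windows, harmless elsewhere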
                                                                                                                                                                                                                                                                       
