The scanf() family of functions we have used so far are intelligent functions for situations where we know what to expect (for example, an integer) and want to interpret the input accordingly.

They are extremely convenient for reading in numbers as they skip over white space, including new lines. The user can leave any number of spaces between inputs, or even just put one per line. This is referred to as formatted input as the function has to know the format of the data it is expecting (integer, floating-point number, text string without spaces, etc.)

After it has read the expected characters, the system leaves itself positioned at the next character after the last one it has used, which is nearly always a space or a new-line, '\n'. We saw this problem with the "%c" format to scanf(), but we will remind ourselves of it here:

An example

Consider a file or keyboard input that starts:

1 2 
Mary had a little lamb
...

As far as our program is concerned, it is as if it were a giant character string starting:

"1 2\nMary had a little lamb\n..."

If the code now executes the statement:


  fscanf(infile, "%d %d", &j, &k);

The values 1 and 2 are read into j and k respectively, with fscanf() conveniently skipping over unwanted spaces and new-line characters. Having read in everything up to and including "2", the imaginary character string now contains:

"\nMary had a little lamb\n..."

You will notice the remaining "string" starts with a new-line character, '\n'. This will be very important later on.

As long as we continue to use fscanf() to read in integers and other numbers, everything will be fine as all the unwanted spaces just get skipped over.

After a call to scanf() or fscanf(), the system is left looking at the next unread character which is usually a space or a new line.
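
This behaviour can be checked directly. The sketch below assumes a hypothetical helper, peek_after_two_ints(), which is not part of the lecture code: it reads two integers with fscanf() and then uses getc() and ungetc() (both described in the next section) to look at the very next unread character without consuming it.

```c
#include <stdio.h>

// Hypothetical helper: read two integers from 'in', then return the
// next unread character (or EOF) while leaving it in the stream.
// Also returns EOF if the two integers could not be read.
int peek_after_two_ints(FILE *in, int *j, int *k) {
  if (fscanf(in, "%d %d", j, k) != 2)
    return EOF;

  // fscanf() stopped immediately after the last digit it used
  int next = getc(in);
  if (next != EOF)
    ungetc(next, in);   // put it back for whoever reads next
  return next;
}
```

For the input shown above, peek_after_two_ints(infile, &j, &k) sets j to 1 and k to 2 and returns '\n'.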

Reading in text without interpreting it

`getc()` reads in a single character

The getc() function reads in a single character from a file or the keyboard. It requires a single argument which is a FILE* variable just like fscanf(). A few points worth noting:

The newline at the end of the line counts as a character (see example)
Use stdin to read from standard input
getc() returns an int, not a char, which allows it to return the special value EOF when there is no more input. EOF is a negative integer, so it lies outside the range 0-255 used for ordinary characters.
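
The points above can be seen in a short loop. This is a minimal sketch, assuming a hypothetical helper count_chars() that reads from any FILE * (pass stdin for the keyboard); the result of getc() is held in an int so that EOF can be told apart from every real character.

```c
#include <stdio.h>

// Hypothetical helper: count the characters remaining in 'in' and,
// via 'lines', how many of them are new-lines.
long count_chars(FILE *in, long *lines) {
  long nchars = 0;
  int c;               // int, not char, so it can hold EOF

  *lines = 0;
  while ((c = getc(in)) != EOF) {
    ++nchars;
    if (c == '\n')
      ++*lines;        // the new-line counts as a character too
  }
  return nchars;
}
```

For the two-line input "ab\ncd\n" this returns 6 and sets *lines to 2.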

`ungetc()` puts back a single character

Perhaps surprisingly, we are allowed to push just one character back into an input stream! This makes it available for the next call to getc(), etc.

The following example reads in a character, pushes it back and reads the character in for a second time. Note that with all unformatted input and output functions the FILE argument comes last.


#include <stdio.h>

int main() {
  int mychar;

  while(1) {
    // Read a character from stdin
    mychar = getc(stdin); 
    if (mychar == EOF) 
      return 0;

    printf("I read: \"%c\"\n", mychar);
    // Put it back
    ungetc(mychar, stdin);

    // Read it again!
    mychar = getc(stdin); 
    printf("Again I read: \"%c\"\n\n", mychar);
  }

}


`putc()` outputs a single character

For example:

  putc('w', stdout);
  putc('o', stdout);
  putc('r', stdout);
  putc('l', stdout);
  putc('d', stdout);
  putc('\n', stdout);
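
getc() and putc() combine naturally. The following sketch assumes a hypothetical helper copy_stream(), not taken from the lecture, that copies its input to its output one character at a time:

```c
#include <stdio.h>

// Hypothetical helper: copy everything from 'in' to 'out', one
// character at a time. Returns the number of characters copied.
long copy_stream(FILE *in, FILE *out) {
  long n = 0;
  int c;

  while ((c = getc(in)) != EOF) {
    if (putc(c, out) == EOF)
      break;           // write error: give up
    ++n;
  }
  return n;
}
```

Calling copy_stream(stdin, stdout) simply echoes whatever is typed until the end of the input.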

`fgets()` reads a line of text

Sometimes we just want to read in a whole line of text into a character array, referred to as unformatted input. We don't expect it to have any special form such as a number. The most common reason is to be able to input text containing spaces, for example people's names or free-form text for a notebook application.

We shall first look at the mechanics of reading in the text and then deal with the subtleties of combining formatted and unformatted input.

The fgets() function reads a line from a file without interpreting it. That is, fgets() reads the unread input up to and including the next new-line character. The "file" can be the keyboard if stdin (standard input) is used. It has the form:


  fgets(buffer, maxbytes, infile);

Here infile is a FILE *. It can be obtained in the usual way using fopen(), or we could use the predefined value stdin if we want to read from standard input.

buffer is a character array at least maxbytes long. Like snprintf(), fgets() is "well behaved" and always puts a zero, '\0', at the end of the text even if the input line is too long, thus always leaving a valid zero-terminated string.

Notice that for unformatted input the FILE is the last argument, unlike in fscanf() where it is the first.

Thus fgets() reads in at most maxbytes - 1 bytes of actual input, or up to the next new-line character, whichever comes first. It returns NULL if the read failed completely, for example if we have reached the end of the input file.

fgets(buffer, maxbytes, infile); reads at most maxbytes - 1 characters from infile into the character array buffer, stopping after the end of the line.

Removing the new-line character

If space permits, fgets() includes the new-line character, sent when the user presses the "Return" or "Enter" key. When present, it is always the final character of the string. If we don't want this character, we just need to replace it with '\0', thus shortening the string by one.

The following snippet calls fgets() to read a line from standard input and then checks to see if the last character of a string stored inside a character array is '\n'. If so it replaces it with '\0', thus shortening the string by one.

If this were a program we were writing for other people to use, we would need to consider what to do if the final character were not a new line, as it probably indicates that the line was too long for our buffer.


  if (fgets(line, N, stdin) != NULL) {
    // Chop off final '\n', taking care never to index before the start
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
      line[len - 1] = '\0';
  }
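
One way to notice an over-long line is to have the chopping code report whether it actually found a new-line. A minimal sketch, assuming a hypothetical helper chop_newline() that is not part of the lecture code:

```c
#include <string.h>

// Hypothetical helper: remove a trailing '\n' from 'line' if there is one.
// Returns 1 if a new-line was removed and 0 otherwise; 0 suggests that
// the line did not fit in the buffer (or that the input ended without
// a final new-line).
int chop_newline(char *line) {
  size_t len = strlen(line);

  if (len > 0 && line[len - 1] == '\n') {
    line[len - 1] = '\0';
    return 1;
  }
  return 0;
}
```

What to do when it returns 0 (warn the user, read and discard the rest of the line, or enlarge the buffer) depends on the application.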

With this we can look at a very short program that uses fgets() to read a line of text from the keyboard and print it out again:
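
A minimal sketch of such a program's core, assuming a buffer size N of 100 (an arbitrary choice) and a hypothetical helper echo_line() so that the same code works for a file or the keyboard:

```c
#include <stdio.h>
#include <string.h>

#define N 100   // assumed buffer size

// Read one line from 'in', chop the new-line and print the text to 'out'
// between > and < so that leading or trailing spaces are visible.
// Returns 1 if a line was read, 0 at the end of the input.
int echo_line(FILE *in, FILE *out) {
  char line[N];

  if (fgets(line, N, in) == NULL)
    return 0;

  size_t len = strlen(line);
  if (len > 0 && line[len - 1] == '\n')
    line[len - 1] = '\0';

  fprintf(out, "You typed: >%s<\n", line);
  return 1;
}
```

Calling echo_line(stdin, stdout) from main() gives the keyboard version.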

Mixing formatted and unformatted input

In this section we shall be dealing with situations where it is fairly easy to describe what we want to happen but actually making it happen involves some irritating and potentially confusing details. In situations such as these, our instinctive reaction should be to put the potentially confusing code into a function where we can write it once and forget about it.

Whether we have a file of data or are reading from the keyboard, we are always free to mix formatted and unformatted input.

A common situation is to use formatted input to get options from a menu, or to read in data values, and to then need to read in a complete line of text.

The first attempt often looks like this:


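//
// Flawed attempt to read in an integer followed by some text.
//
#include <stdio.h>
#include <string.h>

#define N 100  // Buffer size (value chosen for illustration)
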
int main() {
  char line[N];
  int value;

  printf("Please enter the integer value\n");
  scanf("%d", &value);

  printf("Now please type in the text string, spaces are allowed!\n");

  if (fgets(line, N, stdin) != NULL) {
    // Chop off final '\n', taking care never to index before the start
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
      line[len - 1] = '\0';
  }

  printf("The value is %d, the text is >%s<\n", value, line);
  return 0;
}


The "conversation" goes like this:

Please enter the integer value
12
Now please type in the text string, spaces are allowed!
The value is 12, the text is ><

The user is given no chance to type in a line of text, instead fgets() just seems to read a completely blank line. What's happening?

The answer is that, as discussed at the start of the lecture, when we typed in "12" we actually typed three characters: '1', '2' and the new-line character, '\n'. As in the previous example, the system reads in the two characters '1' and '2' that form the integer 12 and leaves itself positioned at the very next character, which is the new line '\n'. Thus the "next line" is completely empty!

We can illustrate this by typing at the keyboard not just "12<return>" (three keystrokes) but "12 abc<return>" (seven keystrokes including the space between "12" and "abc"). The final line of output now looks like this:

The value is 12, the text is > abc<

A call to fscanf() followed by a call to fgets() will almost certainly result in a blank line.

There are various bad solutions at this point (some people's first reaction is just to read in one more character in the hope that nobody will ever type 12<space><return>), but the most common situation is that we require the input line to be non-blank. In this case it's easy to write a loop that carries on reading a line from the file, or keyboard, until it finds a line that contains a non-space character.

If we are reading from stdin it looks like this:


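#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Read lines from stdin until one contains a non-space character.
// Returns 1 on success, 0 if the input ran out first.
int readoneline(char line[], int maxbytes) {
  while ( 1 ) {
    size_t len, i;

    if ( fgets(line, maxbytes, stdin) == NULL ) {
      return 0;  // Out of data
    }

    // We don't want the new-line character so chop it
    len = strlen(line);
    if ( len > 0 && line[len - 1] == '\n')
      line[len - 1] = '\0';

    // Look for a non-blank character.
    for (i = 0; line[i]; ++i)
      if ( isspace((unsigned char) line[i]) == 0) {
        return 1;
      }
  }
}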


This is quite a useful function.

One solution is to have a function to read in a non-blank line.
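
To show how this repairs the flawed attempt, here is a sketch that reads an integer and then a line of text. It assumes two hypothetical helpers: read_nonblank_line(), a variant of readoneline() that takes a FILE * so it also works on files, and read_value_and_text(), which combines the formatted and unformatted steps.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Like readoneline() above, but reading from any FILE *
// (an assumption made here so the sketch also works on files).
int read_nonblank_line(FILE *in, char line[], int maxbytes) {
  while (fgets(line, maxbytes, in) != NULL) {
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
      line[len - 1] = '\0';

    for (size_t i = 0; line[i]; ++i)
      if (!isspace((unsigned char) line[i]))
        return 1;          // found a non-blank line
  }
  return 0;                // out of data
}

// Hypothetical helper: formatted input (an integer) followed by
// unformatted input (a line of text). Returns 1 if both were read.
int read_value_and_text(FILE *in, int *value, char line[], int maxbytes) {
  if (fscanf(in, "%d", value) != 1)
    return 0;
  return read_nonblank_line(in, line, maxbytes);
}
```

With the input "12<return>" followed by a line of text, the empty remainder of the first line is now skipped and the text is read as intended.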

Optional further study

The appendix uses this to come up with an answer to the question "how do I deal with a user entering completely invalid data such as a letter when we have asked for a number?".
The appendix also covers raw output of binary numeric data to a binary file that is not readable by humans but is more compact and avoids rounding errors.

Command-line arguments

When running a program from the command line it's possible to specify command-line arguments:

myprog hello world

Here myprog is the program name, the first argument is the word hello and the second is world.

Accessing the command-line arguments

Command-line arguments are accessed by declaring main() as:

int main(int argc, char **argv) {
  ...

Or equivalently:

int main(int argc, char *argv[]) {
  ...

That is, argc (argument count) is an integer, and argv (argument vector) is an array (vector) of character strings.

argc is the number of character strings in argv and is one more than the number of command-line arguments.

argv[0] is the program name.

This is usually the name of the file the program is stored in and is not under the control of the programmer.

argv[i] (i > 0) are the program arguments.


#include <stdio.h>
int main(int argc,  char *argv[]) {

  if (argc > 0)
    printf("Welcome to \"%s\"\n", argv[0]);

  for (int i = 1; i < argc; ++i) 
    printf("Argument %d: \"%s\"\n", i, argv[i]);

  return 0;
}


Converting arguments to numbers

Command-line arguments are always presented as character strings even if they are valid numbers. There are a number of ways to convert them to numbers; the easiest is probably to use sscanf():


#include <stdio.h>

int main(int argc,  char *argv[]) {

  if (argc > 0)
    printf("Welcome to \"%s\"\n", argv[0]);

  for (int i = 1; i < argc; ++i) {
    float val;
    printf("Argument %d: \"%s\"\n", i, argv[i]);
    if ( sscanf(argv[i], "%g", &val) > 0 )
      printf("\tthe value is %g\n", val);
  }

  return 0;
}
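
One limitation of the sscanf() approach is that an argument such as "12abc" is accepted as the number 12, because the conversion simply stops at the first character it cannot use. A stricter alternative is the standard library function strtod(), which reports where it stopped. The sketch below assumes a hypothetical helper parse_double() that is not part of the lecture code:

```c
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper: convert the whole of 's' to a double.
// Returns 1 on success, 0 if 's' contains no number, has trailing
// junk (such as "12abc") or is out of range.
int parse_double(const char *s, double *out) {
  char *end;

  errno = 0;
  double val = strtod(s, &end);

  if (end == s)          // nothing was converted
    return 0;
  if (*end != '\0')      // something was left over after the number
    return 0;
  if (errno == ERANGE)   // value out of range for a double
    return 0;

  *out = val;
  return 1;
}
```

Inside the loop over the arguments we could then write if (parse_double(argv[i], &val)) with val declared as a double.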


Named constants and enumerations

Suppose we need to solve quadratic equations, and let's assume we have defined a structure to represent them. Their roots can be one of three types: two real roots, one repeated real root or two complex roots. It might be useful for our structure to be able to store the solutions and what type they are. Obviously we can do the latter by adding a new member to the structure and setting it to '1' for one root, '2' for two real roots and '3' for two complex roots but then we need to remember what '1', '2' and '3' mean. It's much better to give a name to these constants.

C provides two ways of naming constants. We've met one already, #define, but C provides another way specifically designed for our situation, enumerations:


enum eqnstatus { EQN_UNSOLVED, EQN_ONEROOT, EQN_REALROOTS,
  EQN_COMPLEX_ROOTS };

Now anywhere in our program we could use the named constants EQN_UNSOLVED, EQN_ONEROOT, EQN_REALROOTS, EQN_COMPLEX_ROOTS to mean zero, one, two or three respectively:


enum eqnstatus { EQN_UNSOLVED, EQN_ONEROOT, EQN_REALROOTS,
  EQN_COMPLEX_ROOTS };

int main() {
  enum eqnstatus eqn_status = EQN_UNSOLVED;
  // More  code here ...

  return 0;
}

Enumerations give names to integers and are used to list (enumerate) different, mutually-exclusive choices.
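
As an illustration for the quadratic-equation case, the sketch below assumes a hypothetical function classify_roots() that returns one of the named constants according to the sign of the discriminant b*b - 4*a*c:

```c
enum eqnstatus { EQN_UNSOLVED, EQN_ONEROOT, EQN_REALROOTS,
  EQN_COMPLEX_ROOTS };

// Hypothetical helper: classify the roots of a*x*x + b*x + c = 0
// (a is assumed to be non-zero) from the sign of the discriminant.
enum eqnstatus classify_roots(double a, double b, double c) {
  double disc = b * b - 4.0 * a * c;

  if (disc > 0.0)
    return EQN_REALROOTS;
  if (disc == 0.0)
    return EQN_ONEROOT;
  return EQN_COMPLEX_ROOTS;
}
```

The calling code can now compare the result with EQN_ONEROOT and so on rather than with a bare 1, 2 or 3. The exact comparison with zero is kept for simplicity; with floating-point coefficients a real program might prefer a small tolerance.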

Enumerations are integers, and printing them out with %d just prints their integer values, but debuggers usually understand them. They can be combined with typedefs as in the example below, which also illustrates that enumerations and structures don't actually have to have a tag name of their own if we don't want them to:


#include <stdio.h>

typedef enum { VANILLA, CHOCOLATE, STRAWBERRY } Flavour;

typedef struct {
  Flavour flavour;
  float fat;
  float sugar;
  float calories;
} Icecream;

int main() {
  Icecream icecream;

  icecream.flavour = CHOCOLATE;
  printf("%d\n", icecream.flavour); // Prints: 1

  return 0;
}

The convention is for the individual values to have names that are either all upper-case or have just the first letter capitalised (Vanilla, Chocolate, etc.).

Non-default values

It's possible to write:

enum something { TYPE1, TYPE2=76, TYPE3, TYPE4 };

In which case TYPE1 has the value 0, TYPE2 has the value 76, TYPE3 has the value 77 and so on, but it is very unusual to do so.

One class, several types

It's worth noting that we have created a single structure definition with an integer "type" variable rather than a list of identical structure types:


// Don't do this! 
typedef struct {
  float fat;
  float sugar;
  float calories;
} Vanilla_Icecream;

typedef struct {
  float fat;
  float sugar;
  float calories;
} Chocolate_Icecream;

typedef struct {
  float fat;
  float sugar;
  float calories;
} Strawberry_Icecream;

An occasionally-useful trick

We may want to have an array of Icecream structures, one for each flavour. To help with this we can use the useful trick of adding a dummy enumeration value at the end of the list:


typedef enum { VANILLA, CHOCOLATE, STRAWBERRY, NUM_FLAVOURS } Flavour;

typedef struct {
  Flavour flavour;
  float fat;
  float sugar;
  float calories;
} Icecream;

Icecream icecreams[NUM_FLAVOURS];

Now NUM_FLAVOURS has the value "3" and as we add new flavours then as long as we are careful to keep NUM_FLAVOURS as the last in the list its value will always be correct:


typedef enum { VANILLA, CHOCOLATE, STRAWBERRY, TOFFEE, NUM_FLAVOURS } Flavour;
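
The dummy value is also a convenient loop bound and size for any other per-flavour table. A minimal sketch using the original three flavours; the flavour_names table and the helpers init_icecreams() and flavour_name() are assumptions made for illustration:

```c
typedef enum { VANILLA, CHOCOLATE, STRAWBERRY, NUM_FLAVOURS } Flavour;

typedef struct {
  Flavour flavour;
  float fat;
  float sugar;
  float calories;
} Icecream;

// One name per flavour, indexed by the enumeration value
static const char *flavour_names[NUM_FLAVOURS] = {
  "vanilla", "chocolate", "strawberry"
};

// Hypothetical helper: give each element the flavour matching its index
void init_icecreams(Icecream icecreams[NUM_FLAVOURS]) {
  for (int i = 0; i < NUM_FLAVOURS; ++i) {
    icecreams[i].flavour = (Flavour) i;
    icecreams[i].fat = icecreams[i].sugar = icecreams[i].calories = 0.0f;
  }
}

// Name of a flavour, or "unknown" for an out-of-range value
const char *flavour_name(Flavour f) {
  if ((int) f < 0 || (int) f >= NUM_FLAVOURS)
    return "unknown";
  return flavour_names[f];
}
```

If a new flavour is added before NUM_FLAVOURS, the array and the loop grow automatically; only the table of names needs a new entry.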

Adding new types or options

Enumerations provide a simple and neat way of adding new types or options, for example:

// Memristor is new in v2.0
typedef enum { Resistor, Capacitor, Inductor, Memristor } Component;

Enumerations and the `switch()` statement

Enumerations are integers and they have a natural affinity with the switch() statement:

switch (person->gender) {
case Female:
  ...
}

Most modern compilers can warn us if a switch() statement with an enumeration as its argument doesn't handle one of the possible cases, which can be very helpful: in the above example, when we added Memristor we would be warned of any switch() statements that don't handle it. Of course this can't be done with an if() .. else if() statement, which is another reason to use switch().
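
A sketch of a switch() over the Component enumeration, assuming a hypothetical helper component_name(). There is deliberately no default: label, since a default would hide a missing case from the compiler's check (with GCC, for example, the check is part of -Wall).

```c
typedef enum { Resistor, Capacitor, Inductor, Memristor } Component;

// Hypothetical helper: a printable name for each component type.
const char *component_name(Component c) {
  switch (c) {
  case Resistor:
    return "resistor";
  case Capacitor:
    return "capacitor";
  case Inductor:
    return "inductor";
  case Memristor:
    return "memristor";
  }
  return "unknown";  // only reached for an out-of-range value
}
```

Deleting the Memristor case and recompiling with warnings enabled is a quick way to see the diagnostic.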

When to use enumerations

Enumerations are used for one thing only: when a variable is used to "enumerate" different, mutually-exclusive possibilities. For general named constants use #define.

Summary of advantages over `#define`

A simple all-in-one way of adding more options later on.
Work well with switch(), most compilers can check for missing cases.
Are understood by most debuggers so a gender will be shown as "Female", "Male", etc rather than 1 or 2.

Sorting, and pointers to functions

Suppose we wish to sort an array of integers. We can easily write a double for() loop:


#include <stdio.h>
#include <stdlib.h>
#define N 8

int main() {
  int x[N] = { 8, 6, 3, 5, 7, 1, 4, 2}, i, j, tmp;

  for (i = 0; i < N; ++i) {
    for (j = i + 1; j < N; ++j) {
      if (x[j] < x[i]) { // swap them
        tmp = x[i];
        x[i] = x[j];
        x[j] = tmp;
      }
    }
  }

  for (i = 0; i < N; ++i) 
    printf("%d\n", x[i]);

  return 0;
}


This takes N*(N-1)/2 comparisons (and possible swaps) which is fine when N equals 8 but a problem when N equals ten thousand or one million.

Using a better algorithm

There are better sorting algorithms that take approximately N*log₂(N) comparisons which is clearly much quicker for large N. But how do we use them? In principle we could find the algorithm and program it ourselves but it would be much easier if we could have a collection of "quick" library functions written by somebody else which we could call:

int x[N];
// Setup x
qsort_ints(x, N);

float x[N];
// Setup x
qsort_floats(x, N);

double x[N];
// Setup x
qsort_doubles(x, N);

But what happens if we wish to sort an array of structures? C's answer to this is to have a generalised function "qsort()" which can sort an array of anything. Like malloc(), etc. qsort() requires <stdlib.h>.

qsort() can sort arrays of anything

qsort() can sort an array of anything.

Example: sorting an array of ints

Suppose we wish to use qsort() to sort an array of ints:

  int x[N];

The qsort() function needs to know four things. The first two are fairly obvious, the second two slightly less so:

The first argument to qsort() is the name of the array to be sorted.

The second argument to qsort() is the number of elements to be sorted.

How does `qsort()` find the second element of the array?

qsort() needs to be able to find not only x[0] but also x[1], x[2], etc., and therefore it needs to know how many bytes to move along the array per array element. For most "normal" functions this would be done by knowing the type of the array, but qsort() is specifically designed not to need to know the type of the array to be sorted, which means it does not know the size of each element. So we have to tell it explicitly.

Fortunately this is quite simple: if we are sorting the array x then the size of each element is just sizeof *x.

The third argument to qsort() is the size of each element: sizeof *first_argument

Comparing two array elements

Finally, given two array elements qsort() needs to know which should come first. Since qsort() has no idea what type of array it is sorting we have to provide a comparison function to tell it.

The fourth and final argument to qsort() is the name of the comparison function.

The comparison function

As mentioned above, since qsort() has no way of knowing what it is sorting we have to tell it. Specifically we have to give it a function which will take two array elements as its argument and return an integer value that is (strictly) negative, zero or (strictly) positive depending on whether the first argument is before, the same as or after the second in the sorting order. Thus the function returns type int.

Passing the array elements to the comparison function

If qsort() doesn't know the type of the things it is sorting, how can it pass two of them to our comparison function? The answer is that it knows the addresses of the two elements to be compared, so it passes two (void *) pointers to the comparison function and lets us deal with the rest. To rule out the possibility that our comparison function might try to change the two elements we use the const qualifier, meaning that the prototype for our function to compare two ints is:

int compared(const void *p1, const void *p2);

where we know that in this case p1 and p2 point to ints but qsort() doesn't.

Our complete call to qsort() is now:

 qsort(x, N, sizeof *x, compared);

The most interesting thing here is that we have passed the name of our function to qsort(). Just like with arrays the name of a function is the address of the function and qsort() will now be able to call that function whenever it needs to compare two array elements.

Writing the comparison function

This isn't difficult, but we do need to be able to extract our two ints from the two void * pointers. Since we know that p1 points to an int we may be tempted to write:

int compared(const void *p1, const void *p2) {
  int x1 = *p1;  // Wrong
  int x2 = *p2;  // Wrong

This looks fine to us as we read from left to right and we see "int x1" before we see *p1, and so it's "obvious" that p1 should be treated as a pointer to an int. But the compiler looks at the right-hand side first and so has no idea what type p1 points to. We can fix this either in two stages:


int compared(const void *p1, const void *p2) {
  const int *xp1 =  p1, *xp2 =  p2;
  int x1 = *xp1, x2 = *xp2;

Or just one:


  int x1 = *(const int *) p1, x2 = *(const int *) p2;

Expressions of the form (typename) sub-expression
explicitly convert the sub-expression to that type.

A complete example for sorting an array of `int`s


#include <stdio.h>
#include <stdlib.h>
#define N 8

int compared(const void *p1, const void *p2) {
  static int called;
  int x1 = *(const int *) p1, x2 = *(const int *) p2;

  ++called;
  if ( x1 < x2 )
    return -1;
  return  x1 > x2;
}

int main() {
  int x[N] = { 8, 6, 3, 5, 7, 1, 4, 2};

  qsort(x, N, sizeof *x, compared);
  for (int i = 0; i < N; ++i) 
    printf("%d\n", x[i]);

  return 0;
}
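
The same recipe answers the earlier question about sorting an array of structures: only the element size and the comparison function change. A minimal sketch, assuming a simplified Icecream structure and hypothetical helpers compare_calories() and sort_by_calories():

```c
#include <stdlib.h>

typedef struct {
  const char *name;
  float calories;
} Icecream;   // simplified version of the structure used earlier

// Compare two Icecream structures by calories, lowest first
int compare_calories(const void *p1, const void *p2) {
  const Icecream *a = p1, *b = p2;

  if (a->calories < b->calories)
    return -1;
  return a->calories > b->calories;
}

// Hypothetical helper: sort 'n' ice creams into order of calories
void sort_by_calories(Icecream *icecreams, size_t n) {
  qsort(icecreams, n, sizeof *icecreams, compare_calories);
}
```

Notice that qsort() moves whole structures around, so each name stays with its calorie value.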


Other uses for pointers to functions

qsort() uses two common tricks for the situation where we wish to use a pre-written library function which itself needs to run some of our own code:

Passing the address of our own function so the library routine can call it.
Using void * pointers to pass information in such a way that the library function does not need to know anything about it.

Other applications include:

Numerical integration.
Minimising the value of a function, such as energy or cost, or maximising a function such as entropy.
Windowing systems where we want to say "if the user clicks here call this function".
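
As a sketch of the first application, the function below integrates any function of one variable using the trapezium rule. The names integrate() and square() are assumptions made for illustration; the important part is the first parameter, which is a pointer to a function taking and returning a double.

```c
// Integrate f(x) from a to b using the trapezium rule with n strips.
// 'f' is a pointer to any function taking and returning a double.
double integrate(double (*f)(double), double a, double b, int n) {
  double h = (b - a) / n;
  double sum = 0.5 * (f(a) + f(b));

  for (int i = 1; i < n; ++i)
    sum += f(a + i * h);

  return sum * h;
}

// An example integrand: f(x) = x * x
double square(double x) {
  return x * x;
}
```

Just as with qsort(), we pass the name of our function: integrate(square, 0.0, 1.0, 1000) gives a value close to 1/3.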

Bitwise operators

Bitwise operators operate on the individual bits of a variable not its value.

It's unlikely you will have a lot of use for bitwise operators, but if you do, here they are:

| Operator | Description |
|---|---|
| & | Bitwise and (both bits are one) |
| \| | Bitwise or (either or both bits are one) |
| ^ | Bitwise exclusive or (just one bit is one) |
| << | Left shift (multiplies by a power of 2) |
| >> | Right shift (divides by a power of 2) |
| ~ | One's complement |

NB the One's complement operator takes a single argument, the others two.

Examples

We take the example of an unsigned char (one byte, i.e. 8 bits). Curiously, standard C before C23 has no way to write numbers in binary (C23 and some compilers accept a 0b prefix); the nearest portable form is hexadecimal, but for clarity we shall show the calculation in binary.

| Expression | Result |
|---|---|
| `00110011 & 11110000` | `00110000` |
| `00110011 \| 11110000` | `11110011` |
| `00110011 ^ 11110000` | `11000011` |
| `10110011 << 2` | `11001100` |
| `11110000 >> 3` | `00011110` |
| `~11110000` | `00001111` |
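
These results can be checked in C by writing the operands in hexadecimal (00110011 is 0x33, 11110000 is 0xF0 and 10110011 is 0xB3). The sketch below assumes an 8-bit unsigned char and a hypothetical helper to_binary() for displaying the answer:

```c
// Hypothetical helper: write the 8 bits of 'value' into 'out' as a
// string of '0' and '1' characters, most significant bit first.
// 'out' must have room for 8 characters plus the terminating '\0'.
void to_binary(unsigned char value, char out[9]) {
  for (int bit = 7; bit >= 0; --bit)
    *out++ = ((value >> bit) & 1) ? '1' : '0';
  *out = '\0';
}
```

For example, to_binary((unsigned char) (0xB3 << 2), s) stores "11001100" in s. The cast matters: C promotes the operands to int before shifting or complementing, so the result has to be cut back to 8 bits to match the table.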

Binary flags

The most common use of bitwise operators is to define flags, i.e. options that are either on or off. Typically there's a header file with various constants, each of which is an exact power of 2 (i.e. has only one bit set):


#define OPTION1 0x01  //  1 binary 00000001 
#define OPTION2 0x02  //  2 binary 00000010 
#define OPTION3 0x04  //  4 binary 00000100 
#define OPTION4 0x08  //  8 binary 00001000 
#define OPTION5 0x10  // 16 binary 00010000 
#define OPTION6 0x20  // 32 binary 00100000

The code sets, unsets and tests options as follows:


#include <stdio.h>

#define OPTION1 0x01  //  1 binary 00000001 
#define OPTION2 0x02  //  2 binary 00000010 
#define OPTION3 0x04  //  4 binary 00000100 
#define OPTION4 0x08  //  8 binary 00001000 
#define OPTION5 0x10  // 16 binary 00010000 
#define OPTION6 0x20  // 32 binary 00100000 

int main(void) {
  unsigned int flags = 0;

  flags |= OPTION3; // Set OPTION3
  flags |= OPTION4; // Set OPTION4

  flags &= ~OPTION4; // Unset OPTION4 
  
  if ( (flags & OPTION3) )
    printf("Option 3 is set\n");

  return 0;
}


The setting and testing of flags is fairly clear, the unsetting of OPTION4 is a little more complicated: OPTION4, like all options, has just one bit set so ~OPTION4 has every bit except that one set. So flags &= ~OPTION4 has the following effect:

The bit that was one in OPTION4 is zero in ~OPTION4, so the bitwise and for that bit must always be zero.
The bits that were zero in OPTION4 are one in ~OPTION4, so the bitwise and for each of those bits will be one if and only if the corresponding bit in flags was one. That is to say, they are not changed.

Summary


Enumerations give names to integers and are used to list (enumerate) different, mutually-exclusive choices.

For study after the lecture

Appendix: binary I/O & handling bad input


Scientific programming in C: wrap-up

Recap: formatted input