Home › Our Teaching › Resources › C programming › libraries.html

Sorting with qsort()

Using libraries

Comments and questions to John Rowe.

The base language provides the fundamental of programming, but effective programing requires the use of libraries of functions. This is because to do something well is a lot harder than a rough and ready job and people are often willing to share what they have done, sometimes for free!

Libraries typically come in three catagories:

The standard library - available everywhere.
System libraries - which change from system to system. Most of these come ito two sub-categories: MS Windows, and everything else.
Third-party libraries. many of these are free.

The standard library

Here we give two examples, both from <stdlib.h>: pseudo-random numbers and sorting.

Pseudo-random numbers with `rand()`

Warning: if you are wanting to do "Monte Carlo"-type calculations you will need a more specialist library

The rand() function produces pseudo-random integers, suitable for dice rolls etc, in the range 0 and RAND_MAX (the latter also being defined inside <stdlib.h> and is only guaranteed to "be at least 32767" but is typically the maximum size of an int). The number is generated from a deterministic algorithm.

POSIX (a standard) and C99 both quote the following infamous example whose exact origins are lost in the mists of time:

Please don't use this!

static unsigned long next = 1;

/* RAND_MAX assumed to be 32767 */
int rand(void) {
  next = next * 1103515245 + 12345;
    return((unsigned)(next/65536) % 32768);
}

Converting the output the right range

The simplest way to convert the output of rand() to a number in the range 1 to n is:

dieroll = 1 + (rand() % n);

The "example" implementation above has poor "randomness" in its lower-order bits and it is commonly aserted that following is more reliable:

dieroll = 1 + (int) (n * (rand()/(RAND_MAX + 1.0)));

but the reality is that if your pseudo-random number generator is poor nothing will fix it. (Modern implementations are quite good, although not good enough for Monte Carlo simulations.) However it does fix a minor problem: if RAND_MAX+1 is not an exact multiple of n then values in the range 1 to RAND_MAX%n will be sightly favoured.

Seeding the generator

As the generator is algorithmic it follows that rand() always generates the same pseudo-random sequence. The srand() function allows us to initialise ("seed") the generator to avoid this. It takes an unsigned int. The traditional code is:

srand(time(NULL));

time(NULL) returns the number of seconds since Jan 1 1970 and requires time.h. There are two warnings only one of which is important.

The important one: call `srand()` just once

Since srand() resets the genrator to a known state it follows that it is called only once per run of the program. The following disasterous code continually resets the generator and so produces the same "random" number for a whole second:

int diceroll(void) {
  srand(time(NULL));  // Wrong!
  return 1 + (int) (6 * (rand()/(RAND_MAX + 1.0)));

The correct code is:

int main() {
  srand(time(NULL));  // Never call srand() again
  ...
}

int diceroll(void) {
  return 1 + (int) (6 * (rand()/(RAND_MAX + 1.0)));
}

The other one: the Y2106 problem

On modern systems time(NULL) returns a signed 64-bit integer. (Older systems used a signed 32-bit integer, meaning that dates after 19 January 2038 could not be represented). Some time in February 2106 this will exceed the range of an unsigned 32-bit integer. If you think your code may be running after that you may wish to try:

  time_t seed = time(NULL);
  if ( sizeof(unsigned int) <= 4 )
    seed &= 0xffffffff;
  srand((unsigned int) seed);

Sorting with `qsort()`

As well as malloc, etc. <stdlib.h> includes a function, qsort(), to sort array elements. These can be arrays of any type (float, int, etc.) or arrays of structures.

Example: sorting an array of doubles

Suppose we wish to useqsort() to sort an array of doubles:

  double x[N];

The qsort() function needs to know four things. The first two are fairly obvious, the second two slightly less so:

The (address of) the array to be sorted, x.
The number of elements in the array (N).
The size of each array element. This is because qsort() needs to not only be able to find x[0] but also x[1], x[2], etc. and therefore it needs to know how many bytes to move alog the array per array element. Fortunately this is quite simple, if we are sorting the array x then the size of each element is just sizeof *x.
How to compare two array elements, x[j] and x[k]. This is slightly more subtle than it looks.

The comparison function

Since qsort() has no way of knowing what it is sorting we have to tell it. Specifically we have to give it a function which will take two array elements as its argument and return an integer value that is (strictly) negative, zero or (strictly) positive depending on whether the first argument is before, the same as or after the second in the sorting order. Thus the function returns type int.

Passing the array elements to the comparison function

If qsort() doesn't know the type of the things it is sorting how can it pass two of them to our comparison function? The answer is that it knows the address of the two elements to be sorted so it passes two (void *) pointers to the comparison function and lets us deal with the rest. It should be noted that passing pointers raises the possibility that our comparison function might try to change the two elements, this is dealt with using the const qualifier meaning that the prototype for our function to compare two doubles is:

int compared(const void *p1, const void *p2);

where we know that in this case p1 and p2 point to doubles but qsort() doesn't.

Our complete call to qsort() is now:

 qsort(x, N, sizeof *x, compared);

The most interesting thing here is that we have passed the name of our function to qsort(). Just like with arrays the name of a function is a pointer to the function and qsort() will now be able to call that function whenever it needs to compare two array elements.

Writing the comparison function

This isn't difficult, but we do need to be able to extract our two doubles from the two void * pointers. Since we know that p1 ponts to a double we may be tempted to write;

int compared(const void *p1, const void *p2) {
  double x1 = *p1;  // Wrong

(and similarly for p2). This looks fine to us as we read from left to right and we see "double x1" before we see *p1 and so it's obvious that *p1 should be interpreted as if a pointer to a double. But the compiler looks at the right-hand side first and so has no idea what type of pointer p1 is. We can fix this either in two stages:

int compared(const void *p1, const void *p2) {
  double *px1 = p1;
  double x1 = *px1;

Or just one:

int compared(const void *p1, const void *p2) {
  double x1 = *(double *) p1, x2 = *(double *)p2;

The complete example for sorting an array of `double`s

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 22
int compared(const void *p1, const void *p2);

int main() {
  double x[N];

  for (int i = 0; i < N; ++i) 
    x[i] = sin(2 * 3.14159265358979 *i/(1.0 +N));

  qsort(x, N, sizeof *x, compared);
  for (int i = 0; i < N; ++i) 
    printf("%g\n", x[i]);
}

int compared(const void *p1, const void *p2) {
  double x1 = *(double *) p1, x2 = *(double *) p2;

  if ( x1 > x2)
    return 1;
  else if ( x1 < x2 )
    return -1;
  else
    return 0;
}

Complete example: sorting an array of structures:

A common requirement is to sort an array of structures.

For example, let's imagine we have a list of foods and we want them sorted by calories, with foods with identical calories listed alphabetically. Our program might look like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAMELEN 256
typedef struct food {
  char name[NAMELEN];
  float calories;
} Food;

// Sorting function
int calories_or_alphabetical(const void *a, const void *b);

int main(void) {
  Food *foods = NULL;
  int howmany, todo;

  do {
    printf("How many foods (>= 1)?\n");
    scanf("%d", &howmany);
  } while (howmany < 1);

  if ((foods = malloc(howmany * sizeof *foods)) == NULL) {
    fprintf(stderr, "Sorry you're too hungry for this Mac!\n");
    exit(-1);
  }

  for (todo = howmany -1; todo >= 0; --todo) {
    int readin;

    printf("Name and calories?\n");
    readin = scanf("%s %f", foods[todo].name, &foods[todo].calories);
    if (readin != 2) {
      fprintf(stderr, "Sorry, I couldn't understand that\n"
        "Please make sure the food name has no spaces.\n");
      ++todo;
    }
  }

  qsort(foods, howmany, sizeof *foods, calories_or_alphabetical);

  for (todo = howmany -1; todo >= 0; --todo) 
    printf("%s %f\n", foods[todo].name, foods[todo].calories);
  return 0;
}

/*
 * Return positive or negative or zero according to which food
 * should come first in the sorted list
 */
int calories_or_alphabetical(const void *a, const void *b) {
  const Food *fooda = a, *foodb = b;

  if ( fooda->calories != foodb->calories)
    return foodb->calories - fooda->calories;
  return strcmp(foodb->name, fooda->name);
}

The comparison function

Notice that the very first thing the function does is to turn the two void * pointers into something it can use (which must be the same type as the array qsort() was called with of course). Then it checks to see if one food has more calories than the other. If they have the same calories it checks for which comes first in the alphabet. Note that our function calories_or_alphabetical can return any positive or negative number it likes. All that counts is whether the value is positive, negative or zero.

The results

INPUT:

How many foods (>= 1)?
5
Name and calories?
cream 300
Name and calories?
bigmac 300
Name and calories?
kentucky 500
Name and calories?
yoghurt 100
Name and calories?
apple 100

OUTPUT:

apple 100.000000
yoghurt 100.000000
bigmac 300.000000
cream 300.000000
kentucky 500.000000

Debugging the comparison function

The sort function always introduces another layer of pointing. In the above example, we have an array of structures but each argument to calories_or_alphabetical is a pointer to a structure (Food *). Had we had an array of pointers to structures each argument would be a pointer to a pointer (Food **) and the start of calories_or_alphabetical would have been:

int calories_or_alphabetical(const void *a, const void *b) {
  const Food *fooda = * (Food **) a, *foodb = * (Food **) b;
  fprintf(stderr, "%s %s\n", fooda->name, foodb->name);

Notice we have added a diagnostic to check we've got my levels of pointers right. You would be very wise to do the same!

Wider application

Sorting is important in its own right, it has also introduced two new concepts: passing a function name as an argument to another function so that second function can call it and using a void * pointer to pass a pointer to a structure of our own devising through a library function to another function of ours. These two techniques are important when dealing with libraries as by their nature they know nothign about either our functions or our datastructures.

OS-dependent libraries

All operating systems come with libraries of functions for accessing system resources. Just about every operating system except MS Windows supports the POSIX standard referred to earlier and published by The Open Group.

Example: pausing ("sleeping")

We sometimes wish our program to pause for a specific length of time. This is referred to as "sleeping" and POSIX provides two functions, both inside unistd.h:

sleep(n) sleeps for n seconds
usleep(un) sleeps for un microseconds

MS Windows provides its own version:

Sleep(mn) sleeps for mn milliseconds

If we need our program to work on oth (iei to be "cross-platform") we could have this in our header file:

#ifdef MS_WINDOWS
#define msleep Sleep
#else
#include <unistd.h>
#define msleep(n) usleep(1000*(n))
#endif

Now we can write:

  msleep(500);

and the program will pause for half a second.

Command-line arguments

So far we have not talked about giving command-line arguments to our C program as they are not particulary useful on the Mac. However, the C standard actually gives two possibilities for the main() function:

int main(void) {
  ... 
}

and

int main(int argc,  char *argv[]) {
  ... 
}

In the latter case, if argc is greater than zero then argv[0] gives the program name and if argc is greater than one then argv[1] ... argv[argc-1] are the command-line arguments. They are all fixed strings.

The following simple program prints our its name and the command-line arguments (if any):

#include <stdio.h>
int main(int argc,  char *argv[]) {

  if (argc >= 0)
    printf("Welcome to \"%s\"\n", argv[0]);

  for (int i = 1; i < argc; ++i) 
    printf("Argument %d: \"%s\"\n", i, argv[i]);

  return 0;
}

Step through this code

The `getopt()` library function

On non-Windows systems the convention is that "on or off" flags are indicated by a dash followed by a single letter:

./myprog -a -b

which may be amalgamated if preferred:

./myprog -ab

For options with arguments, the argument follows the option. For example we may have a "-e" option which specifies an error file; if it is not specified error output goes to stderr. If we wanted to set all three of these we would have a large number of ways to do it including:

./myprog -ab -e somefile
./myprog -a -b -e somefile
./myprog -ae somefile -b

etc. We could, of course, write our own code to parse these options but unistd.h provides a function getopt() to do it for us. We give it argc, argv and a character string listing the allowed options with options requiring an argument followed by a colon ":". So in our case the option string would be "abe:". The getopt() function returns the next option, or -1 if there are no more, and in the case of a ":" option sets the external char* pointer optarg to point to it.

So in our example:

#include <stdio.h>
#include <unistd.h>

#define OPTION_A 0x01
#define OPTION_B 0x02

unsigned int flags;

int main (int argc, char *argv[]) {
  int o;
  FILE *errfile = stderr;

  // Process command-line arguments
  while((o = getopt(argc, argv, "abe:")) != -1) {
    switch(o) {
  
    case 'a':
      flags |= OPTION_A;
      break;
      
    case 'b':
      flags |= OPTION_B;
      break;
      
    case 'e':
      if ((errfile = fopen(optarg, "w")) == NULL ) {
  fprintf(stderr, "cannot open file \"%s\"\n", optarg);
  return 1;
      }
    }
  }

  // Do some work ...
  
  return 0;
}

getopt() has a number of other abilities as well.

Third-party lbraries

There are a good number of (mainly free) numerical libraries available, including:

Basic Linear Algebra Subprograms (blas) for fast vector and matrix operations.
Linear Algebra package (lapack) for higher-level operations such as eigenvectors.
The FFTW Fast Fourier Transform library

NB: blas libraries vary greatly in speed. The best choices tend to cost money, but ATLAS is a good free one.

Validate Link-check

Privacy & cookies

Using libraries

The standard library

Pseudo-random numbers with rand()

Please don't use this!

Converting the output the right range

Seeding the generator

The important one: call srand() just once

The other one: the Y2106 problem

Sorting with qsort()

Example: sorting an array of doubles

The comparison function

Passing the array elements to the comparison function

Writing the comparison function

The complete example for sorting an array of doubles

Complete example: sorting an array of structures:

The comparison function

The results

Debugging the comparison function

Wider application

OS-dependent libraries

Example: pausing ("sleeping")

Command-line arguments

The getopt() library function

Third-party lbraries

Pseudo-random numbers with `rand()`

The important one: call `srand()` just once

Sorting with `qsort()`

The complete example for sorting an array of `double`s

The `getopt()` library function