Physics and Astronomy

Home › Our Teaching › Resources › C programming › recurse.html

Example 2: geneology, printing people's descendants

Example 4: counting a person's descendants

Summary

Recursion: Functions that call themselves

Comments and questions to John Rowe.

To iterate is human, to recurse divine.
Computer Science saying

God is subtle but he is not malicious.
Einstein

Recursion is an extremely simple concept: a function simply calls itself. Of course, a function can't always call itself, that would create an infinite loop, so all recursive functions have some sort of test before they call themselves.

void myfun(some_args) {

 /* Do some stuff */
  if ( some_test)
     myfun(some_other_args);
}

A variation on the above is to do the test inside the function:

void myfun(some_args) {
  if (I_didnt_need_to_be_called)
    return;

 /* Do some stuff */
    myfun(some_other_args);
}

Where it works this is neater (as we do the test only once).

Try googling the word "recursion"

Recursion can lead to subtle errors: either something gets treated twice or not at all, or our recursion goes into an infinite loop. The key is to understand the "test" inside the recursive function that decides whether or not it calls itself again.

When functions call themselves each instance has its own local variables, unless they are static.

Example 1: a factorial function

The simplest example of recursion is the following somewhat pretentious factorial function:

long factorial(int i) {
  if ( i > 0 )
    return  i * factorial(i - 1);
  
  return 1;
}

Step through this code

This illustrates the concepts of recursion: a function that calls itself together with a test to make sure the sequence of function calls stops at some point, and a local variable "i" which local to each invocation of factorial(): when factorial() calls itself each instance has its own copy of i.

It also illustrates one possible pitfall: it's possible to use recursion just to show off. The above example would have been better written as a simple loop.

Error checking and wrappers

We could add some simple argument error checking as follows:

long factorial(int i) {
  if ( i > 0 )
    return  i * factorial(i - 1);
  else if (i < 0 ) {
    fprintf(stderr, "Factorial must be positive\n");
    return -1;
  }

  return 1;
}

Another option would be to make the publicly-callable function a wrapper around the real function:

// The actual factorial function
long factorial_real(int i) {
  if ( i <= 1 )
    return 1;
  return  i * factorial_real(i - 1);
}

// Publically-callable front-end to the 
// factorial function with error checking
long factorial(int i) {
  if (i < 0 ) { // Error check
    fprintf(stderr, "Factorial must be positive\n");
    return -1;
  }
 return factorial_real(i);
}

Step through this code

This is slightly over the top in this case but is a useful illustration.

Functions only callable from other functions within the same file

Recall that if we declare the following two variables outside of any function:

int globalvar;
static int thisfilevar;

then globalvar is accessible from any function in any source file but thisfilevar is only available to functions inside the same source file.

In the above code factorial_real() is clearly only meant to be called by factorial(). Whilst there's no exact way to acheive this with C we can restrict the function to be only callable by functions inside the same source file by again using the static qualifier for both the prototype and the actual function definition:

static long factorial_real(int i);
...
static long factorial_real(int i) {

When recursion goes wrong

The biggest problem is the internal test or tests: either the recursion never stops or some possibilities get missed out.

Imagine we had made a mistake in our previous rather over-the-top way of calculating a factorial:

long factorial(int i) {

  if ( 1 > 0 ) // Bug!
    return  i * factorial(i - 1);
  return 1;
}

You can see what it's meant to do but we've written '1' instead of 'i' so it will go on for ever. The first stage of debugging is just to print out its arguments:

long factorial(int i) {

  fprintf(stderr, "i: %d\n", i);
  if ( 1 > 0 ) // Bug!
    return  i * factorial(i - 1);
  return 1;
}

We can get even more details by adding an extra debugging argument to tell us the depth:

long factorial(int i, int depth) {
  int d;

  for(d = 0; d <= depth; ++d)
    fprintf(stderr, "  ");
  fprintf(stderr, "i: %d (depth %d)\n", i, depth);
  if ( depth > 12) {
    fprintf(stderr, "Too deep!\n");
    abort();
  }

  if ( 1 > 0 ) // Bug!
    return  i * factorial(i - 1, depth + 1);
  return 1;
}

Here we have done several things:

We have added an extra 'depth' argument which we increment by one every time.
We have indented the output according to the depth. This isn't necessary but it is nice.
Finally, and rather extremely, we have called abort() if factorial() goes too deep. This is only useful when we are running under the debugger as we can look at the "layers" of recursive call.
Reminder: a neater way to fo this is to use assert(), defined inside assert.h:
```
assert(depth <= 12);
```

Using a wrapper

It may be very inconvenient to add an extra argument to our routine because it is called elsewhere, there are header files, etc. In that case we just create a simple wrapper:

long factorial(int i) {
  return  factorial_debug(i, 0);
}

Be prepared to add some debugging to your recursive routine and also be prpared to add a "depth" parameter to the "real" recursive routine, perhaps with a simple publicly-callable wrapper routine.

Example 2: geneology, printing people's descendants

Consider a simple structure to represent a person and their children:

/* legal_parent defaults to bio_parent if the former is NULL */
typedef enum { bio_father, bio_mother, legal_father, legal_mother,
         parent_max } Parent_type;


typedef enum { male, female, unknown_gender } Gender;

typedef struct person {
  char *surname;
  char **forenames;
  struct person **children;
  int n_children;     /* Number of children */
  int childbuf_size;  /* Size of children buffer */
  struct person *parents[parent_max];
  Gender gender;
} Person;

The children

The structure members:

  struct person **children;
  int n_children;     /* Number of children */
  int childbuf_size;  /* Size of children buffer */

allow us to store a dynamically-sized array of that person's children.

Trees

The above is an example of a tree (but see note below). Trees are quite common in programming because the concept they represesent "things that can have other things to an unlimited depth" is quite common in real life. Recursion is an essential tool when dealing with them.

Printing a person's details

We can then define the following function to print out the details of one person:

void printperson(Person *p) {
  int j;

  printf("%s,", p->surname);
  for(j = 0; p->forenames[j] != NULL; ++j) 
    printf(" %s", p->forenames[j]);
  printf(" (%s), %d children\n", gender_name[p->gender], p->n_children);
}

To print a person and their descendants

If we wished to print out a person and all of their descendants the algorithm would look like this:

Print their name and other details.
Foreach child:
- Print this person and their descendants

(notice that in this case the test to call itself is an implicit one: the list of children might be of zero length.)

The function and its public wrapper look like this:

static void printdescendants_1(Person *p, int depth);

//
// Print a person and their descendants
void printdescendants(Person *p) {
  printdescendants_1(p, 0);
}
 
//
// Print a person and their descendants
static void printdescendants_1(Person *p, int depth) {
  if (p == NULL)
    return;
  for (int sp = 0; sp < depth; ++sp)
    fputs("  ", stdout);

  printperson(p);

  for (int child = 0; child < p->n_children; ++child)
    printdescendants_1(p->children[child], depth + 1);
}

There are a couple of things to notice before the main part of the function. The first is that we check that the Person argument is not NULL. This is pure paranoia and hence is extremely wise.

Second we print out some spaces before the name to indicate the depth.

Then it's simple: print this person's details and call the whole function for each of this person's children, increasing the depth parameter by one. (There is an interesting issue here: since each person has two parents, if the original person has several generations of descendants it's likely that at some point two of them will have a child together and we will end up printing the same child twice. For the time being we shall assume this is what is required but we shall examine the issue later.)

Example 3: flood fill

A common siuation, with a number of variations, is when we have a rectangular grid of integer values, typically representing an image, and we wish to identify "islands", i.e. sets of neighbouring pixels with the same value.

In the following example we wish to identify and number "islands" in an empty "sea" of zeros:

The initial description

#include <stdio.h>

/*
 * Demonstrate a flood fill, using it to count and label non-empty
 * contiguous regions in an image.  
 *
 * An image point that is empty has the value
 * zero, if it is non-empty but not yet assigned to a region
 * it has the value minus one. Pixels that have been assigned
 * to regions have the values one, two, three, etc. according
 * to the region they belong to.
 */

#define NX 10000
#define NY 10000

The flood fill function

/*
 * Simple flood filler: set all contiguous spaces with value "-1"
 * to the value "area_id".
 */
void floodfill(int image[NX][NY], int x, int y, int area_id) {
  if (x < 0 || x >= NX || y < 0 || y >= NY || image[x][y] != -1)
    return;

  image[x][y] = area_id;

  for (int ix = x-1; ix <= x+1; ++ix)
    for (int iy = y-1; iy <= y+1; ++iy)
      floodfill(image, ix, iy, area_id);
}

Notes

The first thing we do is to see if the function needs to do anything, if not we immediately return.
We check the co-ordinates are in range before checking the pixel (image) value at that point.
Then the first thing we do is to mark the pixel as filled.
Then we try each of its neighbours. (We also try the original point but that doesn't matter as it immediately returns.)

Had we reversed the last two steps then the function would have been forever calling itself. It would have done this even if we had put in an if() statement to avoid refilling itself:

  for (ix = x-1; ix <= x+1; ++ix)
    for (iy = y-1; iy <= y+1; ++iy)
      if ( ix != x || iy != y)
        floodfill(image, ix, iy, area_id);

  image[x][y] = area_id;  // Too late!!

The outer function

All we need to do now is to go over each point in the image and call floodfill() for each point that needs it:

/* 
 * Count and label contiguous areas in an image.
 * On input: 0 means empty space, -1 means non-empty
 *
 * On exit the contiguous areas have the values 1, 2, 3 etc
 * and the number of areas is returned.
 */
int count_and_label(int image[NX][NY]) {
  int area_id = 0;

  for (int x = 0; x < NX; ++x)
    for (int y = 0; y < NY; ++y)
      if (image[x][y] == -1)
  floodfill(image, x, y, ++area_id);
    
  return area_id;
}

This will fill the first island/region with the value "1", the second with the value "2", etc. and return the total number of regions found.

Example 4: counting a person's descendants

In the above flood fill example, the act of labelling the point 1,2 3 etc. automatically marked the point as having been processed and enabled us to not do it twice. Furthermore, each point is only ever going to be handled once in the program.

Sometimes this may not be true: we may need to explicitly mark something as having been handled, and we may need to process something more than once.

To illustrate this, let's consider a variation of the above geneology example where we wish to count a person's descendants.

As mentioned above, there is a subtle problem here: since each person has two parents, if the original person has several generations of descendants it's likely that at some point two of them will have a child together and we will end up counting the same child twice. Further more we may wish to count several people's descendants and of course any given person may need to be included in several people's counts (but only once each time!).

The answer is to add a "flag" to the person structure to record whether that person has been dealt with.

To use the flag we will need to use binary, bitwise operators

Reminder: bitwise operators and flags

Bitwise operators operate on the individual bits of a variable, rather than the arithmetic value of the whole variable as the arithmetic operators do.

The operators

Operator	Description
&	Bitwise and (both bits are one)
\|	Bitwise or (either or both bits are one)
^	Bitwise exclusive or (just one bit is one)
<<	Left shift (muliplies by power of 2)
>>	Right shift (divides by power of 2)
~	One's complement

NB the One's complement operator takes a single argument, the others two.

Examples

We take the example of an unsigned char (one byte, ie 8 bits). Although there is no way to write numbers as binary in C; the nearest is hexadecimal, but for clarity we shall show the calculation in binary.

Expression	Calculation
`00110011 & 11110000`	`00110011 11110000 -------- 00110000`
`00110011 \| 11110000`	`00110011 11110000 -------- 11110011`
`00110011 ^ 11110000`	`00110011 11110000 -------- 11000011`
`10110011 << 2`	`10110011 -------- 11001100`
`11110000 >> 3`	`11110000 -------- 00011110`
`~11110000`	`11110000 -------- 00001111`

Binary flags

As mentioned above, the most common use of bitwise operators is to define flags, ie options that are either on or off. Typically there's a header file with various constants each of which is an exact power of 2 (ie has only one bit set):

#define OPTION1 0x01  /*  1 binary 00000001 */
#define OPTION2 0x02  /*  2 binary 00000010 */
#define OPTION3 0x04  /*  4 binary 00000100 */
#define OPTION4 0x08  /*  8 binary 00001000 */
#define OPTION5 0x10  /* 16 binary 00010000 */
#define OPTION6 0x20  /* 32 binary 00100000 */

because these are powers of two they are unsuitable for enumerations.

The code sets, unsets and tests options as follows:

unsigned int flags;
 
int main(void) {
 
  flags |= OPTION3; /* Set OPTION3 */
  flags &= ~OPTION4; /* Unset OPTION4 */
  
  if ( flags & OPTION3)
    printf("Option 3 is set\n");
   
}

The setting and testing of flags is fairly clear, the unsetting of OPTION4 is a little more complicated: OPTION4, like all options, has just one bit set so ~OPTION4 has every bit except that one set. So flags &= ~OPTION4 has the following effect:

The bit that was one in OPTION4 is zero ~OPTION4 so the binary and for that bit must always be zero.
The bit that was zero in OPTION4 is one ~OPTION4 so the binary and for that bit will be one if and only if the corresponding bit in flags was one. That is to say, it is not changed.

Example in binary
00111011 & ~00001000 ~00001000 = 11110111

00111011 11110111 -------- 00110011

Using binary flags in our example

Returnin to our example, here's a typical include file defining a structure with some binary flags:

typedef enum { male, female, unknown_gender } Gender;

/* legal_parent defaults to bio_parent if the former is NULL */
typedef enum { bio_father, bio_mother, legal_father, legal_mother,
         parent_max } Parent_type;

typedef struct person {
  char *surname;
  char **forenames;
  struct person **children;
  int n_children;     /* Number of children */
  int childbuf_size;  /* Size of children buffer */
  struct person *parents[parent_max];
  Gender gender;
  unsigned int flags;
} Person;

// For flags
#define ISDEAD 0x01
#define DONE   0x01000000

int countdescendants(Person *p);

We modify our printing function so that it fist checks to see if the person has already been handled and if so immediately returns. It then larks the person as being handled. Since we assume we had to do some debugging we will add the ever-useful "depth" parameter and a (commented out) assert():

//
// Count descendants, assuming "DONE" flag has already
// been cleared
static int countdescendants_1(Person *p, int depth) {
  int children;

  if (p == NULL || p->flags & DONE )
    return 0;

  //  assert(depth < 100);
  p->flags |= DONE;
  children = p->n_children;

  for (int child = 0; child < p->n_children; ++child)
    children += countdescendants_1(p->children[child], depth + 1);

  return children;
}

Cleaning the list

As it stands the code above will work just once because the second time around every person will have already been marked as already done. therefore before printing we must first go down the descendant tree clearing that bit:

//
// Clear the "DONE" flag from a person and their descendants
static void markasclean(Person *p) {
  if (p == NULL)
    return;
  p->flags &= ~DONE;

  for (int child = 0; child < p->n_children; ++child)
    markasclean(p->children[child]);
}

Putting it all together

#include <assert.h>
#include <stdio.h>
#include "peopledefs.h"

static int countdescendants_1(Person *p, int depth);
static void markasclean(Person *p);

//
// Count descendants, assuming "DONE" flag has already
// been cleared
static int countdescendants_1(Person *p, int depth) {
  int children;

  if (p == NULL || p->flags & DONE )
    return 0;

  //  assert(depth < 100);
  p->flags |= DONE;
  children = p->n_children;

  for (int child = 0; child < p->n_children; ++child)
    children += countdescendants_1(p->children[child], depth + 1);

  return children;
}


//
// Clear the "DONE" flag from a person and their descendants
static void markasclean(Person *p) {
  if (p == NULL)
    return;
  p->flags &= ~DONE;

  for (int child = 0; child < p->n_children; ++child)
    markasclean(p->children[child]);
}


//
// (Finally, the "public" function)
// Count a person's descendants
int countdescendants(Person *p) {
  markasclean(p);
  return  countdescendants_1(p, 0);
}

Validate Link-check

Privacy & cookies