Physics and Astronomy

Bugs and debugging

Work in stages

In this course, and in the exercises, we have emphasised the importance of writing a little code at a time and checking it thoroughly before moving on to the next piece.

Work in stages, writing a little code at a time and checking it thoroughly before moving on to the next piece.

Just because we hope our code is bug-free doesn't mean that it really is!

In practice, however, it's very easy to degenerate to doing the minimum amount necessary for us to be able to tell ourselves that it's working properly, rather than doing everything we can to find any bugs. This is usually due to a combination of lack of time, laziness and wishful thinking.

The programs that take a lot of time to get working are the ones where we have let a bug slip through and have to find it later, not the ones where we have spent time checking for bugs as we go.

Shake the scorpions from your shoes (or worse!)

[Picture of a scorpion. Image: Wikipedia]

In countries where scorpions and poisonous spiders are common, we are advised to carefully shake out shoes and other clothes in the morning before putting them on, particularly when camping or living out in the country.

Warm up discussion

  1. If you were living in fairly rough accommodation in an area with large numbers of poisonous spiders and scorpions, how careful would you be shaking out your shoes, underwear etc. before putting them on in the morning? Why?
  2. If somebody brought you such an item to put on but were rather vague about whether they had checked for scorpions ("Yeah, well, I think so..."), how would you feel about putting that item on?

Remember

It may be unpleasant to shake out a piece of clothing we were about to put on next to our skin and to see a scorpion fall out, but it's nothing compared to what would have happened had we not bothered!

Find the bugs before they bite you where it hurts

In the case of physical poisonous bugs, the psychology described above is fairly clear. But for logical bugs in our code the psychology tends to be reversed: we tend not to check carefully for bugs, in the hope that if we don't look for them somehow they won't be there. Nobody would make that mistake when checking for a poisonous spider in their underwear, so don't make it when checking for bugs in your code!

Debug thoroughly as you go

It's not the scorpion we shake out of our underwear that we need to worry about, but the one we don't shake out of our underwear.

It's not the bug we find as we go along that takes the time, but the one we don't find as we go along.

Put in checks

Put checks into your program

Check every class of possibility

It's seldom possible to check every possible combination of input.

Let's imagine we are programming a generalised, NxN naughts and crosses program, for example a 5x5 game rather than the usual 3x3. This has around 25 factorial possible games and we can't check them all.

The quadratic equation a*x*x + b*x + c = 0 has an infinite number of possible coefficients.

It's important to consider every possible class of cases and to test at least one of every class.

For the example of a quadratic equation we have the following possibilities:

  • Two real roots.
  • Two complex roots.
  • One repeated root.
  • Any or all of a, b and c equal to zero. (a equals zero is particularly important of course.)

We should make sure we test them all.

Check every class of possibility.

Check that tests fail as well as pass

If we are checking, say, that a number is divisible by four we must first check that the condition triggers when the number is divisible by four and second that it does not trigger when the number is not divisible by four.

Put in checks

If we are dealing with a triangle, check that the angles add up to pi radians, to within a margin of error. (But see below.)

Check that your checks do actually catch the errors.

Mistakes to avoid

Sadly, these are all things I have seen (and probably done!).

1. Printing out the result without checking it's correct

If we write a factorial function and print out 11 factorial as a test, we will need some evidence that it's correct - nobody can work out the value of 11 factorial in their head.

Check that the result is actually correct, not just that it's a number!

2. Always using the same test data

It's easy to get into the habit of typing a certain set of numbers into the terminal until that particular input data gets handled properly and then moving on. It is even easier when the test data is in a file.

Try not to always use the same test data.

3. Checking only special cases

For example, typing in only integers when testing code that reads a floating-point number. Or, when solving the quadratic equation a*x*x + b*x + c = 0, always making b or c equal to zero, or a equal to one. There are a number of errors that won't get caught in this case, for example forgetting the initial minus sign in "-b + sqrt(b*b - 4.0*a*c)".

4. Failing to check special cases

We should make sure that our program can handle a equals zero for example.

Check the special cases but not just the special cases.

Conclusion

Using a piece of code which we've only vaguely debugged is like putting on underwear in the Australian outback which we've only vaguely checked for poisonous spiders or scorpions: it's going to come back and bite us somewhere very painful indeed!

Finding bugs

Despite our best intentions, bugs do occur. When they do there are two main priorities, which often operate in parallel:

1. Narrow down where the bug is

Successively partition the problem

If we were playing a game where we choose a (whole) number between one and a hundred and the other person has to guess it, a small child might ask "Is it one? Is it two?...". A more numerate person would start "Is it less than fifty?", thus going from an "order N" problem to an "order log(N)" problem.

Similarly, if an incorrect value or algorithm has two or three components, try to work out which of these is wrong, rather than starting at the first sub-component of the first component, then the second sub-component and so on.

Narrow down where the bug is, preferably by successive partitioning

Also be aware that it can be the code that puts the components together that's in error, or even the debugging code itself.

Check the error isn't in your debugging code.

2. Compare the two stories

It's important to realise that every piece of code has two stories:

  1. What we want it to do
  2. What we have told it to do.

When debugging a short piece of code we basically go down comparing these two stories and seeing where they differ.

Do not make the unconscious mistake of assuming that what you want your code to do has any effect on what it will do!

Step through the code comparing what you want the program to do with what you have told it to do.

Example: a function calling other functions

Here's a familiar situation. The following function always returns zero whenever r is greater than two:


long ncr(int n, int r) {
  return npr(n,r)*(1/factorial(r));
}

 

We can easily see what it's meant to do. If we can't see the problem by inspection we move on to:

3. Get more information by printing stuff out

The 80/20 rule is more properly known as the Pareto principle.

Bearing in mind the 80/20 rule, there are two debugging tools that will form the majority of what you do. The first is for programs that run but produce the wrong answer. It's very simple: Print stuff out.

Get more information by printing stuff out.

The first thing is to put in some diagnostics:


long ncr(int n, int r) {
  long fact, nprval, result;

  fact = factorial(r);
  nprval = npr(n,r);
  /* Use the values we print so the diagnostics match the calculation */
  result = nprval*(1/fact);
  printf("FACT: n %d r %d fact %ld npr %ld result %ld\n",
   n, r, fact, nprval, result);
  return result;
}

 

Print out information that tells you the progress of the program.

(Another choice would be to have factorial() and npr() print out their arguments and results; we shall illustrate this below.) The program produces:

FACT: n 5 r 3 fact 2 npr 60 result 0

This probably isn't what we expect: at first sight both npr and factorial seem to be doing their job but the result is zero! Careful thought about the rules of integer arithmetic makes us realise that (1/2) is zero, not 0.5, so we fix our code, remove the diagnostics and move on (a little too soon, as we shall discover):


long ncr(int n, int r) {
  return npr(n,r)/factorial(r);
}

 

Print out information that tells you whether the program is correct so far (and check it!)

Example continued: a bad function containing a bad loop

A little later on we realise that although ncr() is no longer giving zero, it's not giving the right answer either! Our code includes the following factorial function:


long factorial(int n) {
  long res = 1;

  for (int i = 2; i < n; ++i) 
    res *= i;
  return res;
}

The first thing would be to see if both it and the npr() function were behaving properly. This time we put all the diagnostics inside factorial() and npr() so that each function prints out its arguments and what it returns:


long factorial(int n) {
  long res = 1;

  printf("FACTORIAL: n is %d\n", n);
  for (int i = 2; i < n; ++i) 
    res *= i;

  printf("FACTORIAL returning %ld\n", res);
  return res;
}

One point to note here is that it's unwise just to print out a number without saying what it is (printf("%d\n", n);), as we will end up with a string of anonymous numbers.

Print out what the number, etc,  is, not just its value, and make sure the message identifies where it comes from in the program

We give our program data which will call this function with a sensible value (say 6) and find it returns 120, not 720. That's probably enough to find the problem. If we do wish to debug a loop we typically want to print out two things:

  1. The variables controlling the loop
  2. What the body of the loop is doing

We therefore write:


long factorial(int n) {
  long res = 1;

  printf("FACTORIAL: n is %d\n", n);
  for (int i = 2; i < n; ++i) {
    res *= i;
    printf("FACTORIAL: i %d res %ld\n", i, res);
  }
  printf("FACTORIAL %d returning %ld\n", n, res);

  return res;
}

And see:

FACTORIAL: n is 6
FACTORIAL: i 2 res 2
FACTORIAL: i 3 res 6
FACTORIAL: i 4 res 24
FACTORIAL: i 5 res 120
FACTORIAL 6 returning 120

Looking at the numbers we see we have calculated 5 factorial not 6 factorial. We then look at the loop condition and see that we have instinctively written i < n (perhaps because we are so used to looping over array indices) and fix it to read:


long factorial(int n) {
  long res = 1;

  for (int i = 2; i <= n; ++i) 
    res *= i;
  return res;
}

Dealing with infinity and Nan

Sometimes we try to print out a number and get Nan instead! This is short for Not a Number and happens when we do something invalid such as sqrt(-1). Nans propagate through calculations:

  x = sqrt(-1);
  y = x + 1;

This code causes both x and y to be Nan. A similar problem occurs when we divide a non-zero floating-point number by zero, which gives inf. (NB Windows often uses #IND or another three-letter abbreviation instead.)

A macro is a more advanced use of #define which is allowed to have arguments.

The header file math.h also defines the macro isfinite(), which returns zero if its argument is infinity or Not a Number and a non-zero value otherwise. For example, suppose we have an array and for some reason we think that some of its values are Nan or infinity. We might write the following loop:

  for(i = 0; i < N; ++i)
    if (isfinite(x[i]) == 0)
      printf("x[%d] is %f\n", i, x[i]);

This will print out i followed by inf, Nan, etc. for any bad array elements. If we were to use the debugger, below, we would have a choice of a few other things to do.

See also the section on assert() below.

isfinite(x) checks that x is not infinity or Not A Number.

Mini-exercise: Generate Nan and inf, and try isfinite()

Aim: Dealing with Nan and inf

  1. Open up the on-line compiler in a new window.
  2. Be sure to save this mini-exercise by giving it a suitable name.
  3. Declare three doubles and read in the value of one of them using scanf().
  4. Set one of the other variables to the square-root of this value (sqrt(var1)) and the other to the reciprocal (1.0/var1). Don't forget math.h!
  5. Print out these three values.
  6. Build & run. Check the output is correct (for an input value > 0).
  7. Now see what happens when you enter a negative value and then zero. You should see Nan (or possibly -Nan) and inf respectively.
  8. Use two if() statements and isfinite() (one for each of the two calculated variables in turn) to print out warnings such as "Invalid square root" and "Invalid reciprocal". (There is no need to change the printf() statement that prints out the three values at the end.)

4. Check our assumptions

Recently I had a problem where a program sometimes couldn't find a file. I printed out just the filename but not the directory name because I "knew" what directory it was looking in. But when I printed out the full file path ("directory/filename") I found it was looking in the wrong directory.

The executive summary of this section is that we should not assume that we know what the problem is. After all, if we knew where the problem was it wouldn't exist!

Check your assumptions

Some things we "know" to be true aren't

When a program is running properly there are all sorts of things we "know". And sometimes instead of printing out the actual thing we want to know the value of we print out something we "know" is the same as the thing we want.

It may not always be the last thing we changed

Most of the time the bug will be in the bit of code we are working on. But if we can't find it there we need to look at the code we had previously assumed was bug-free. This is an unpleasant occurrence (there's lots of code to look through!), but then again we've been saying this for the whole lecture so it should not be a surprise by now.

Interface errors: inconsistent assumptions

The worst bugs can be where we make one assumption when writing a function and another when using it - referred to as interface errors. These are hard to find because each part of the program seems to be correct. Get rid of those ambiguities and where that's not possible say which one you've chosen in the comment at the top of the function.

Remember the differences tend to be subtle: there are two different "heat capacities" for example. Or a function may assume a certain piece of set-up work has already been done before we have called it but when we call it we may assume that the function itself does it for us. (In this latter case, wherever possible the function should check it has been done, and in general functions that do a complete job are better than ones that only do part of it.)

Interface errors, where we make one assumption when writing a function and another, slightly different one, when using it are particularly hard to spot.

Check the diagnostics

We mentioned this above, but sometimes the code is actually correct and it's our diagnostic that's wrong.

Using the debugger

NB: the on-line compiler does not allow interactive use of the debugger as described here but does give some diagnostic output.

When programs crash

The printf() technique is not very helpful when a program crashes with a segmentation fault or other memory error. All we get is a message saying it has crashed, with no idea where or why.

I have seen bugs so bad that the debugging information itself has been destroyed but these cases are rare.

Fortunately the debugger can tell us:

  • Where the program was when it crashed
  • What values the variables had at the time.

For the debugger to be useful we must have enabled the appropriate options when we compiled the program.

The debugger can tell us where the program was when it crashed and what values the variables had at the time.

Mini-exercise: Show some debugger output in the on-line compiler.

Aim: To show a little of how the debugger can help us.

  1. Open up the on-line compiler in a new window.
  2. Either find a saved program where main() called a function or write a very small program that does this. The function does not have to do anything.
  3. Inside the sub-function declare a variable and put in a call to scanf() to read in its value. However, instead of &varname put &varname + 123456789(!!). This compiles, but we are asking scanf() to write to an arbitrary memory location, so the program will usually crash.
  4. Build & run, and type in an input value for the call to scanf().
  5. Under the sub-heading "The program crashed" you should see some debugger output that tells you:
    • The place where the program crashed, including the name of the function.
    • The values of the local variables.
    • The place where this function was called from and the values of that function's local variables (and so on).

This is a brief taste of what the full debugger can tell us when our program crashes.

Invoking the debugger

Showing variable values ("watches")

By default the debugger will show us where a program was when it crashed but not the variable values. This is easily fixed, and needs to be done just once, from the Debug -> Debugging windows -> Watches menu item:

[Screenshot: enabling the "Watches" window.]

Viewing the Call stack

  • If the "Call stack" entry is not checked in the above picture select that too. This is the window that shows what functions were active at the time.

Enable the Call stack and Watches windows.

Moving windows

Note: Code::Blocks has quite a sophisticated windowing system and free-floating windows can be dragged into the "main" display. The images below will show the "watches" window having been dragged to the left of the source-code window.

Starting the program under the debugger

This can be done either from the Debug -> Start/Continue menu or just by pressing F8:

[Screenshot: running under the debugger.]

What the debugger does for us


By far the most valuable use of the debugger is for telling us where our code crashed and the values of the variables.

[Screenshot: the debugger display after a crash, showing the source code, Watches and Call stack windows.]

The purpose of the debugger is to provide us with a window like the one above. The above example is for a program that has crashed due to trying to modify a read-only string. It has three main windows (NB: if either or both of the "Watches" and "Call stack" windows do not appear, enable them from the "Debug -> Debugging windows" menu).

filename.c: source code

This shows the statement where the program crashed.

Watches: variables and their values

The main items of interest to us are the arguments, local variables and global variables.

Call stack: function list

This shows us which function we are in, badfun(), the function that called it, and so on up to main(). In our case there are just two functions.

The function names are double-click-able: if we double-click on the word main inside the call-stack window the other windows will show us the state of main() not badfun(). The ability to go up and down the function list, examining the variables in each function can be extremely useful.

Moving up the call stack

In the following picture we have double-clicked on main() inside the call-stack window. This could be because we wanted to know where badfun() was called from or because we wanted to view an array declared inside main() whose address was passed to badfun().

[Screenshot: the debugger display after double-clicking on main() in the Call stack window.]

Use the Call stack and Watches windows to move up and down the list of active functions and view their variables.

Viewing arrays

Arrays are only truly defined in the function that declared them (sub-functions are only passed their addresses) so the debugger may only show them when viewing their original function.

The assert() macro

The assert() macro, which is defined inside assert.h, has a simple and brutal function: if the expression in parentheses is false (zero) the program is killed there and then via the abort signal. For example, in the above case where we thought that one or more of the elements of an array were infinity or Not a Number, we could have written:

  for(i = 0; i < N; ++i)
    assert(isfinite(x[i]));

assert(expression) kills the program if expression is false (zero). It is only really useful when running under the debugger.

For obvious reasons, assert() is only really useful if we are running under the debugger. For equally obvious reasons we don't want to leave it enabled when we release the program to the people who are going to use it. Therefore assert() can be turned off by defining NDEBUG before including assert.h:

/* Turn off assert */
#define NDEBUG
#include <assert.h>

In this case the pre-processor replaces each assert() with an expression that does nothing.

Mini-exercise: assert()

Aim: To practise using assert()

  1. Retrieve your previous Nan and isfinite() mini-exercise.
  2. #include <assert.h> at the start of your program.
  3. Replace your two if() statements with statements of the type assert(isfinite(varname)). That is: remove the whole compound if() statements including the printf()s they control and replace them with assert().
  4. Build & run. Does it do what you expect?
  5. assert() can contain any logical expression. To demonstrate this, put a suitable assert() statement before the calculations of the square root and the reciprocal to make the program crash before these calculations. (NB: if you are using Code::Blocks with maximum error checking for this mini-exercise you may need to use fabs() in one of the checks.)
  6. Build & run. Check the output is correct. Check both that it does catch invalid values and that it does not falsely complain about valid values.
  7. Disable assert(): before the #include <assert.h> put #define NDEBUG.
  8. Build & run: the program should now print out Nan and inf.

Summary


Find the bugs before they bite you where it hurts

Put in checks

Finding bugs
