Show me your [functions] and conceal your [data structures] and I shall continue to be mystified,
show me your [data structures] and I won't usually need your [functions]; they'll be obvious.
Brooks,

Structures do for variables what functions do for code

Representation is the essence of programming.
Brooks

We have already discussed the fact that nearly every program is so large that it is completely impossible to hold in all in our head at the same time.

Functions organise code into self contained units, enabling us to think at a higher level without having to keep track of the details.
Structures do exactly the same for variables.

Structures allow us to organise our data, letting us think at a higher level without having to keep track of the details

Composite variables

We have already encountered C's two built-in composite variables: complex numbers and arrays. The latter allow us to think in terms of high-level mathematical concepts such as vectors and matrices and allow us to pass a single memory address to a function to allow it to access and modify any member of the array. However arrays are limited to composite objects of the same type (all ints, all doubles, etc.) C structures remove these limitations.

C structures allow us to define our own composite data types where several "sub-variables" (called "members") can be combined into one composite variable which can be treated as a single unit, similar to the individual elements of an array.

Since structures have members with different types it would be inconvenient to refer to them by number, instead they are referred to by name using a dot (.) rather than square brackets.

An array has elements of the same type referred as array[number], a structure has members of different types referred to as structure.name.

Defining a new composite variable (structure) type

Example: a nuclear problem

Imagine a simple problem involving radioactive nuclides. If each nuclide has a name (stored as an array of characters) and a a half life (stored as a double) then to store the data for several nuclides we will need a two-dimensional array of characters and a one-dimensional array of doubles.

This isn't very convenient, but C doesn't define a composite variable type consisting of a string and a double and even if it did, it's likely that a little later on we would want to add more data such as mass and atomic number.

It would be great if C allowed us to define our own variable types, with names such as "Nuclide" or "Isotope", and luckily it does. The following code defines a composite variable type called a "Nuclide":

The structure name (nuclide) is optional but if it is present it's normal to give the structure and the typedef similar names. In our case the typedef name is the same as the structure name but is capitalised

#define MAXLEN  32
typedef struct nuclide { 
  double halflife; 
  char name[MAXLEN];
} Nuclide;

The above code defines a new data type called "Nuclide" consisting of a double called halflife and a character array called name. This definition does not create any Nuclide variables. Rather it puts the variable type "Nuclide" on exactly the same basis as the built-in types int, float etc: we can now create (declare) some Nuclides if we need to.

The structure definition does not create any structures it just says what the structure means so we can later create some.

It's also worth noticing that the Nuclide structure contains an array: structures can contain arrays and we shall see below that we can have arrays of structures.

Optional aside

The above combines two C features: structures, which actually do the work, and typedefs which give user-friendly names to things which already exist.

Structures

The following is the bare structure definition (technically referred to as a "specifier") which defines a new variable type, "struct nuclide":

#define MAXLEN  32
struct nuclide { 
  double halflife; 
  char name[MAXLEN];
} ;

We can now create variables of type "struct nuclide" just like ints, floats, etc.

`typedef`

Since "struct nuclide" is not the most user-friendly name for a new variable type, we can make things a little neater by using C's "typedef" mechanism to give our structure a neater name:

typedef struct nuclide Nuclide;

Combining the two

For convenience we may combine the two as above:

typedef struct nuclide {
  double halflife;
  char name[MAXLEN];
} Nuclide;

Now "Nuclide" and "struct nuclide" are synonyms; they mean the same thing.It is a common convention to give the synonym the same name as the structure type but with an initial capital letter.

Define a structure to represent a cuboid.
The first part of our first use of structures.
A solid cuboid can be considered to have three dimensions, a density and a mass. This can be represented by an double array of size three and two individual doubles.
Create a new on-line program in a new window, with a suitable title and opening comment.
Above main() create a structure definition for a cuboid with the above quantities. Give the structure and its members sensible names but do not at this point declare any actual structures inside main().
Build & run. At this point your program does not do anything it didn't do before, we are just checking your code compiles.

Declaring structures

If we have the above code just once at the top of our file then Nuclide joins the list of variables we can now declare, just like ints, floats, etc, and we can access the individual structures members as varname.member:

#include <stdio.h>
#include <string.h>

#define MAXLEN  32
typedef struct nuclide { 
  double halflife; 
  char name[MAXLEN];
} Nuclide;

int main() {
  Nuclide nuc;

  nuc.halflife = 3.2;
  strncpy(nuc.name, "Mystuff", MAXLEN);
  nuc.name[MAXLEN-1] = '\0';

  // Now do something...
  return 0;
}

Step through this code

Once we have defined what a structure means we can then create some just like any other variable type.

In the above example we have assigned a value to nuc.mass and used nuc.name as an argument to strncpy(). We could have also read in the value of nuc.halflife with scanf():

  printf("Please enter the half-life.\n");
  scanf("%lg", &nuc.halflife);

Anything we can do with an ordinary variable or an array we can do when it's part of a structure

We can also have arrays of structures, although these are quite uncommon:

#include <stdio.h>
int main() {
  Nuclide nucs[MAXNUCLIDES];

  for (int n = 0; n < MAXNUCLIDES; ++n)
    scanf("%s %lg", nucs[n].name, &nucs[n].halflife);
  
  ...
  return 0;
}

Step through this code

Declare a structure to represent a cuboid
To practice structures.
Edit your previous Mini-exercise that defined a structure to represent a cuboid.
Declare an actual cuboid inside of main(). Read in the value of the density into the structure using scanf() and print that value out to the screen as a check.
Build & run. Check the output is correct.
Now read in the three dimensions, calculate the mass and print out all five numbers (the three dimensions, the density and the mass).
Build & run. Check the output is correct..

Another example: an ellipse

The following defines a structure which we might use to represent an ellipse:

typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
  float area;
} Ellipse;

We will use this example in the rest of the notes.

Initialising structures to all zeros

It's possible to initialise structures in much the same way as arrays, by putting the values inside { }, so in theory we can write:

    Ellipse ellie = { { 0.7, 45.2 }, /* centre */
        { 2.3, 8.4 },  /* axes */
        1.87  /* orientation */ };

However this is horrifically unclear and will fail completely if we add another member at the start of the structure. The only real use of this is to initialise a structure to all zeros using the rule that missing initialisers are set to zero:

    Ellipse zero = { 0 }; // All zeros

When to use structures and what to put in them

In general:

Any type of "thing" in the problem we are thinking about whose properties cannot be represented by an existing variable type or array should normally have its own structure type.

The structure should tell us everything we need to know about that thing.

As a general rule we should ask ourselves: "If I had two ellipses (or nuclides etc.) what would I need two of and what would I still just need one of?" Things we would need two of should be part of the structure.

For example, if we had two ellipses they would each have their own centre and axes but would share the same value of PI.

Structures are "proper variables"

Unlike arrays where the "value" of the name of an array is the address of its first element, structures are "proper variables" that can be copied (although this is quite rare):

  mystruct = yourstruct;

This has a serious consequence when passing structures to functions as we will see in the next example.

Passing structures to functions

Given that structures are proper variables we may pass copies of them to functions. This rather simple example shows a function whose job is to calculate and print out the area of an ellipse. (In practice this function is so simple it probably isn't worth making a separate function.) Just for fun it also makes a doomed attempt to move the ellipse.

#include <stdio.h>
#define  PI 3.14159265358979

typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
} Ellipse;

void print_area(Ellipse el) {
  float area;

  area = PI * el.axes[0] * el.axes[1];
  printf("The area is %f\n", area);
  el.centre[0] = 123.456; // Move the ellipse
  el.centre[1] = -78.9;
}

int main() {
  Ellipse ellie;

  printf("Centre? (x,y)\n");
  scanf("%f %f", &ellie.centre[0], &ellie.centre[1]);
 
  // NB: in a real program we should check
  // the axes are actually > 0 
  printf("Length of axes ( > 0 )?\n");
  scanf("%f %f", &ellie.axes[0], &ellie.axes[1]);

  printf("Orientation to the vertical?\n");
  scanf("%f", &ellie.orientation);

  print_area(ellie);

  return 0;
}

Step through this code

Step through the above "Key example".
To see that the structure in the called function is a copy of the original structure
Step through the above "Key example" in a new window.
Fast-forward to the call to print_area() (second line from the bottom) either by clicking on the » button to the right of the call or pressing the "Fast forward to the interesting bit" button near the top. (If you find this unclear just reload the page and manually step forward to this point in the program.)
Now step forward and note how the function receives a copy of the original ellipse.
Notive how when print_area() is moving the ellipse it is moving the local copy, not the original. Any changes made to the local copy are lost when the function returns, the original is unaffected.

This example illustrates why using structures without pointers is only of limited use: the function print_area()receives a copy of the original structure, not the structure itself, so if we change the copy inside the function the original is unchanged. For this reason structures are almost always used with pointers.

Changing the values of the members of an ellipse structure

There are several reasons we may wish to allow a function to modify an ellipse, for example:

We may wish to move the ellipse. (Or resize it, rotate it, or ...)
Some properties of the ellipse, such as the area, may be so useful we may wish to add a member of the structure to store it.

Reminder: pointers

We have already encountered this problem in a previous lecture where we wanted a function to be able to manipulate an array declared inside another function. Let's remind ourselves of what we did:

Pointers

Specific requirement:

To enable a function to access an array declared inside another function.

Solution:

A variable called a pointer whose value is the address of another variable (in this case the address of the first element of the array)

Wider application:

To enable a function to access an object declared (or otherwise created) inside another function
To enable a piece of code to act on one set of data and later on another set.

And, as in this case, usually both.

The solution when we want a function to be able to manipulate a structure declared inside another function is the same: pass the address of the structure to the function and have a pointer as a parameter to the function to receive the address. Thus every function has a pointer pointing to the same original ellipse structure and is making changes to that structure.

Pointers to structures

Structures are most useful when combined with pointers.

These work just as we would expect. Suppose, as suggested above, that we don't just want our function to print out the area of the ellipse but to store it for later. To do this we need to add an "area" member to our ellipse structure:

typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
  float area;
} Ellipse;

The enlarged structure now takes twenty four bytes. A particular Ellipse, say called "ellie", might be stored starting at byte 800 in which case its members would be laid out as follows:

               ----------------------- ------------------------------------------------
Byte number:  | 800 - 803 | 804 - 807 | 808 - 811 | 812 - 815 | 816 - 819 | 820 - 823 |
               ----------------------- ------------------------------------------------
Used for:     | centre[0] | centre[1] |  axes[0]  |  axes[1]  |orientation|   area    |
               ------------------------------------------------------------------------
              | <-------------------------- ellie ----------------------------------> |

A statement such as:

  ellie.orientation = 11.4;

would mean that the computer would go to the start of ellie (800), go along 16 bytes to byte 816 and write the value 11.4 into the four bytes 816 - 819.

Alternatively, we could declare a pointer to an Ellipse, assign it the address of ellie (800) and use that pointer, as in the following rather contrived example:

#define PI 3.14159265358979

int main() {
  Ellipse ellie, *ep; // ellie is a structure, ep just a pointer

  ep = &ellie;
  (*ep).orientation = PI/4;
  // More code here...

The notation (*ep).orientation is rather awkward so pointers to structures are used so often they have their own notation:

  ep->orientation = PI/4;

The fact that they have their own notation should tell us how important pointers to structures are!

If p is a pointer to a structure then p->member accesses a member of that structure.

Our code is now exactly the same as before but with structure. replaced by pointer->, as in this short example:

  Ellipse ellie, *ep; // ellie is a structure, ep just a pointer

  ep = &ellie;
  ep->orientation = PI/4;

Step through this code

Dynamically allocating structures

Although the previous code is entirely legal it's a bit of a bodge. Rather than declaring a structure and then having a pointer whose value is the address of that structure it's far more common to dynamically allocate the stucture:

  Ellipse *e = NULL;

  e = malloc( sizeof *e);
  if ( e == NULL ) {
    fprintf(stderr, "Out of memory\n");
    exit(99);
  }

  e->orientation = PI/4;

Step through this code

Structures are usually dynamically allocated via pointers rather than declared as variables.

Pointer to a structure
The first stage in using pointers to functions.
Take your previous code and declare a pointer to a cuboid structure and dynamically allocate it some memory in the same way as above.
Remove the old declaration of the cuboid structure so that you are left with the declaration of the pointer to a structure and the dynamic memory allocation.
Now replace all the occurances of structurename. with pointername->
Build & run. Check the output is correct.

Passing pointers-to-structures to functions

The old function to print the area of an ellipse had the prototype:

// Print the area of an elipse to the screen
void print_area(Ellipse el);

The new function, which we have renamed calculate_area() now saves the value of the area in the area structure member rather than just printing it. That is it accepts a pointer and modifies the original ellipse structure. It has the prototype:

// Calculate the area of an ellipse and save it in the structure
void calculate_area(Ellipse *el);

The function itself is extremely short (partly because we have put the "move ellipse" code into a separate function where it belongs):

// Calculate the area of an ellipse and save it in the structure
void calculate_area(Ellipse *e) {
  e->area = PI * e->axes[0] * e->axes[1];
}

Passing pointers-to-structures to functions allows the function to access and change the members of the structure.

Passing pointers between functions

In reality the calculate_area() function above is still a little simple to be worth having as a separate function. However wWe may wish to define a new function to read in the values of the ellipse, which we might imaginitively call "read_ellipse()", and another to resize an ellipse by mutliplying each dmension by a constant scaling factor which we might call resize_ellipse(). In this case it would be quite sensible for both read_ellipse() and resize_ellipse() call our new calculate_area() function to update the area. They would do this by passing the pointer they receive to our new calculate_area(). As an example the resize_ellipse() function might look like this:

// Resize an ellipse by the same scaling factor in each dimension.
void resize_ellipse(Ellipse *el, float scale) {
  el->axes[0] *= scale;
  el->axes[1] *= scale;
  calculate_area(el);
}

This illustrates an extremely common practice: functions that receive pointers often pass those pointers on to other functions. The obvious analogy is with a phone number:

If I were to tell you "my phone number is 0314 159 265, ring me any time you have a question about programming", not only could you ring me yourself but could pass that number onto your class-mates who could also ring me.

Similarly, if a function has a pointer to something it can pass the value of that pointer to another function which can then access the original object in memory.

The complete code

The following code has three top-level functions and one utility function:

A function to read in the values of the ellipse structure from the keyboard,
a function to resize an ellipse,
a function to move an ellipse and
a utility function to calculate and save the area of the ellipse.

#include <stdio.h>
#include <stdlib.h>

#define  PI 3.14159265358979
typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
  float area;
} Ellipse;

void calculate_area(Ellipse *el);
void read_ellipse(Ellipse *el);
void resize_ellipse(Ellipse *el, float scale);
void move_ellipse(Ellipse *el, float dx[2]);

int main() {
  float moveby[2], resize;
  Ellipse *e = NULL;

  e = malloc( sizeof *e);
  if ( e == NULL ) {
    fprintf(stderr, "Out of memory\n");
    exit(99);
  }
  
  printf("Welcome to the ellipse program\n");
  read_ellipse(e);
  
  printf("The area of the ellipse is: %f\n", e->area);
  
  printf("Amount to move the ellipse?\n");
  scanf("%g %g", &moveby[0], &moveby[1]);
  
  move_ellipse(e, moveby);
  
  printf("Amount to resize the ellipse?\n");
  scanf("%g", &resize);
  
  resize_ellipse(e, resize);
  
  return 0; 
}

void read_ellipse(Ellipse *el) {

  printf("Centre? (x,y)\n");
  scanf("%f %f", &el->centre[0], &el->centre[1]);

  printf("Length of axes ( > 0 )?\n");
  scanf("%f %f", &el->axes[0], &el->axes[1]);

  printf("Orientation to the vertical?\n");
  scanf("%f", &el->orientation);

  calculate_area(el); // Pass the pointer to another function
}

// Resize an ellipse by the same scaling factor in each dimension.
void resize_ellipse(Ellipse *el, float scale) {
  el->axes[0] *= scale;
  el->axes[1] *= scale;
  calculate_area(el);
}

// Calculate the area of an ellipse and save it in the structure
void calculate_area(Ellipse *e) {
  e->area = PI * e->axes[0] * e->axes[1];
}

void move_ellipse(Ellipse *el, float dx[2]) {
  el->centre[0] += dx[0];
  el->centre[1] += dx[1];
}

Step through this code

Looking at the above example from the bottom upwards, the first thing we see is that move_ellipse() and calculate_area() are passed a pointer to the original ellipse and so are able to change the values of its position and area respectively.

Then resize_ellipse() is also passed a pointer to the original ellipse. As well as accessing the ellipse itself, it also passes this pointer on to calculate_area() to update the area.

Similarly, for read_ellipse() we pass just one pointer, to the ellipse, rather than three or five individual pointers to ellipse.orientation, ellipse.centre[0], etc. and it again passes the value of the pointer it was given on to the function calculate_area(). As mentioned above this is extremely common.

Functions that receive a pointer to a structure often pass that pointer onto other functions.

Step through the above "Key example".
To see a small but typical code in action.
Step through the above "Key example".
Step through until the call to read_ellipse()
When read_ellipse() is active practice minimising main() by clicking the button next to the function name.
- You may then need to manually un-minimise it when read_ellipse() returns.
Note how at various points there are up to three functions with pointers pointing to the same allocated ellipse (just as there are a number of people with your phone number stored inside their mobile phones).

Pass a pointer to a structure to a function.
To practice an extremely common programming task.
Above main() create a function with a name suitable for a function that will print out all the details about a cuboid. It should accept a pointer to a cuboid but for now the body of the function should be left empty.
Now cut-and-paste the code that prints out the cuboid details from main() to the new function.
Build & run. Check the output is correct..
Now do the same for the code that reads in the structure values such as dimensions, etc.
Build & run. Check the output is correct..

More than one type of structure

The ellipse example has just one type of structure, but the generalisation is straightforward.

Example: a projectile

Imagine a function to calculate the position and velocity of a projectile thrown into the air. Its position and velocity are known at time t=0 and we need calculate its position and velocity at time t + dt. Its prototype would look something like:

void move(float x[NDIMS], float v[NDIMS], float mass,
                     float drag_coeff, float ywind[NDIMS],
                     float viscosity, float dt);

This is a very simple problem but the function has seven arguments. Worse, they are all floats or arrays of floats so it would be very easy to get two in the wrong order and the compiler would not notice. There are one hundred and forty four different legal ways of ordering these arguments (and nearly seven thousand if we allow for the chance of putting the same one in twice), but only one of these is the right one!

If we take a look at the arguments we see they split into three groups: x, v, mass and drag_coeff are properties of the projectile, ywind and viscosity are properties of the air and time is a physical quantity in its own right. This suggests we want two structures, one for the projectile and one for the air, and to leave time as it is.

The following code declares what are in effect two new types of variables. Again, this code does not actually create any structures, it just tells the compiler what we mean by "Projectile" and 'Air'.

#define NDIMS 3
typedef struct projectile {
  float x[NDIMS];
  float v[NDIMS];
  float mass;
  float drag_coeff;
} Projectile; 

typedef struct air {
  float ywind[NDIMS];
  float viscosity;
} Air;

We have gone from seven numbers to three things: projectile, air and time.

The function is now declared as:

void move(Projectile *proj, Air *thisair, float dt);

Not only do we now only have three variables rather than seven, all three are of a different type so if we were to call the function with two of its arguments in the wrong order the compiler would notice and tell us.

Thinking at a higher level

Almost without noticing it, we've made bit of a mental leap. We started by thinking about how we could reduce the number of arrays needed to represent our nuclides or reduce and organise the (floating point) arguments to a function. But we have quickly reached the stage where we have stopped talking about integers, floating-point numbers and strings and have started talking about nuclides, ellipses and projectiles.

This is the point about structures: we identify the types of "things" we are dealing with and typically we define a type of structure to represent that type.

Structures allow us to think at a higher level, in terms of the "things" we are dealing with rather than individual data values.

Structures allow extensibility

Another thing we did without noticing it was that we added an "area" member to our ellipse structure. All w had to do was to type in the extra member and recompile. Similarly we mentioned that our nuclide information may need to be extended to include its mass and atomic number.

Our simple projectile example only has one dimension, y. But if our program is successful somebody is bound to ask us to extend it to three dimensions in which case y, vy and ywind will be replaced by arrays. If we used the "separate variables" approach we would have to go through each of our functions changing the number (and type) of arguments.

With the "structure" approach, we just add some more members to the structure definition, change the part of the code responsible for calculating the acceleration and recompile.

It's extremely common for a program to start off quite simply but for more features and properties to be added later, so the question of extensibility is hugely important.

Structures can easily be extended to include new members.

Structures help make our functions more modular

When we added the area to our functions did not need to know about it. Indeed, resize_ellipse() and read_ellipse() jut passed the pointer to calculate_area() to calculate it: they did not even need to know the area member existed.

In our projectile example we might need to deal with the fact that real projectiles spin in the air and have texture. Even if we assume a spherical shape, that's several more variables. And somebody is sure to want to throw non-spherical objects. In this case our height_velocity function is going to split into two parts:

An numerical algorithm for moving the projectile.
A function for calculating the force on the object (and hence its acceleration), taking into account shape, spin, etc.

But if we use structures, the function definition remains unchanged: our acceleration function could become very complicated but our function call remains unchanged. Thus,the rest of our code doesn't need to know about it.

Functions don't even need to know of the existence of structure members that don't concern them.

Reference

Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious.
Brooks, F. R. Jr, The Mythical Man-Month (1975).

Summary

The text of each key point is a link to the place in the web page.

For study after the lecture

Validate Link-check

Privacy & cookies