Physics and Astronomy
Structures organise our data

Show me your [functions] and conceal your [data structures] and I shall continue to be mystified, show me your [data structures] and I won't usually need your [functions]; they'll be obvious. Brooks

Structures do for variables what functions do for code

Representation is the essence of programming. Brooks

We have already discussed the fact that nearly every program is so large that it is completely impossible to hold it all in our heads at the same time.
Structures allow us to organise our data, letting us think at a higher level without having to keep track of the details.

Composite variables

We have already encountered C's two built-in composite variables: complex numbers and arrays. The latter allow us to think in terms of high-level mathematical concepts such as vectors and matrices, and allow us to pass a single memory address to a function so that it can access and modify any member of the array. However, arrays are limited to composite objects of the same type (all ints, all doubles, etc.).

C structures remove these limitations. They allow us to define our own composite data types in which several "sub-variables" (called "members"), similar to the individual elements of an array, are combined into one composite variable which can be treated as a single unit.

Since structures have members with different types it would be inconvenient to refer to them by number; instead they are referred to by name using a dot (.) rather than square brackets.

An array has elements of the same type referred to as array[number]; a structure has members of different types referred to as structure.name.

Defining a new composite variable (structure) type

Example: a nuclear problem

Imagine a simple problem involving radioactive nuclides. If each nuclide has a name (stored as an array of characters) and a half-life (stored as a double) then to store the data for several nuclides we will need a two-dimensional array of characters and a one-dimensional array of doubles. This isn't very convenient, but C doesn't define a composite variable type consisting of a string and a double, and even if it did, it's likely that a little later on we would want to add more data such as mass and atomic number.

It would be great if C allowed us to define our own variable types, with names such as "Nuclide" or "Isotope", and luckily it does.
The following code defines a composite variable type called a "Nuclide". The structure name (nuclide) is optional, but if it is present it's normal to give the structure and the typedef similar names. In our case the typedef name is the same as the structure name but is capitalised:

#define MAXLEN 32
typedef struct nuclide {
double halflife;
char name[MAXLEN];
} Nuclide;
The above code defines a new data type called "Nuclide" consisting of a double called halflife and a character array called name. This definition does not create any Nuclide variables. Rather, it puts the variable type "Nuclide" on exactly the same basis as the built-in types int, float, etc.: we can now create (declare) some Nuclides if we need to.

The structure definition does not create any structures; it just says what the structure means so we can later create some.

It's also worth noticing that the Nuclide structure contains an array: structures can contain arrays, and we shall see below that we can have arrays of structures.
Optional aside

The above combines two C features: structures, which actually do the work, and typedefs, which give user-friendly names to things which already exist.

Structures

The following is the bare structure definition (technically referred to as a "specifier") which defines a new variable type, "struct nuclide":

#define MAXLEN 32
struct nuclide {
double halflife;
char name[MAXLEN];
} ;
We can now create variables of type "struct nuclide" just like ints, floats, etc.

typedef

Since "struct nuclide" is not the most user-friendly name for a new variable type, we can make things a little tidier by using C's "typedef" mechanism to give our structure a neater name:

typedef struct nuclide Nuclide;

Combining the two

For convenience we may combine the two as above:

typedef struct nuclide {
  double halflife;
  char name[MAXLEN];
} Nuclide;

Now "Nuclide" and "struct nuclide" are synonyms; they mean the same thing. It is a common convention to give the synonym the same name as the structure type but with an initial capital letter.
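As a minimal sketch of what "synonyms" means in practice, the function below builds a value declared as "struct nuclide" and hands it back as a "Nuclide" with no conversion of any kind. The function name make_nuclide() is hypothetical, chosen only for this illustration, and the type definition is repeated so that the fragment stands alone.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 32

typedef struct nuclide {
  double halflife;
  char name[MAXLEN];
} Nuclide;

// Hypothetical helper: fill in a "struct nuclide" and return it as a "Nuclide".
// No cast is needed because both names refer to exactly the same type.
Nuclide make_nuclide(const char *name, double halflife) {
  assert(name != NULL);

  struct nuclide tmp;            // declared using the "struct" spelling
  tmp.halflife = halflife;
  strncpy(tmp.name, name, MAXLEN);
  tmp.name[MAXLEN - 1] = '\0';   // strncpy() may not terminate the string

  Nuclide result = tmp;          // declared using the typedef spelling
  return result;
}
```

A caller may store the result in a variable declared with either spelling, for example struct nuclide n = make_nuclide("Mystuff", 3.2);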
Declaring structures

If we have the above code just once at the top of our file then Nuclide joins the list of variable types we can now declare, just like ints, floats, etc., and we can access the individual structure members as varname.member:

#include <stdio.h>
#include <string.h>

#define MAXLEN 32

typedef struct nuclide {
  double halflife;
  char name[MAXLEN];
} Nuclide;

int main() {
  Nuclide nuc;

  nuc.halflife = 3.2;
  strncpy(nuc.name, "Mystuff", MAXLEN);
  nuc.name[MAXLEN-1] = '\0';
  // Now do something...
  return 0;
}

Once we have defined what a structure means we can then create some just like any other variable type.

In the above example we have assigned a value to nuc.halflife and used nuc.name as an argument to strncpy(). We could have also read in the value of nuc.halflife with scanf():

printf("Please enter the half-life.\n");
scanf("%lg", &nuc.halflife);

Anything we can do with an ordinary variable or an array we can do when it's part of a structure.

We can also have arrays of structures, although these are quite uncommon:

#include <stdio.h>

// Nuclide is defined as above; MAXNUCLIDES is assumed to be #defined elsewhere
int main() {
  Nuclide nucs[MAXNUCLIDES];

  // %31s leaves room for the terminating '\0' (MAXLEN is 32)
  for (int n = 0; n < MAXNUCLIDES; ++n)
    scanf("%31s %lg", nucs[n].name, &nucs[n].halflife);
  ...
  return 0;
}
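As an illustration of combining the two kinds of access, array[number] and structure.name, here is a small sketch that searches an array of nuclides for the one with the longest half-life. The function name index_of_longest_lived() is hypothetical and the Nuclide type is repeated so that the fragment stands alone.

```c
#include <assert.h>
#include <stddef.h>

#define MAXLEN 32

typedef struct nuclide {
  double halflife;
  char name[MAXLEN];
} Nuclide;

// Return the index of the nuclide with the longest half-life.
// nucs[n] selects one structure; .halflife then selects a member of it.
size_t index_of_longest_lived(const Nuclide nucs[], size_t count) {
  assert(count > 0);  // an empty array has no longest-lived entry

  size_t best = 0;
  for (size_t n = 1; n < count; ++n) {
    if (nucs[n].halflife > nucs[best].halflife)
      best = n;
  }
  return best;
}
```

The function returns an index rather than a structure, so the caller can go on to use nucs[index].name or any other member.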
Another example: an ellipse

The following defines a structure which we might use to represent an ellipse:

typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
  float area;
} Ellipse;

We will use this example in the rest of the notes.

Initialising structures to all zeros

It's possible to initialise structures in much the same way as arrays, by putting the values inside { }, so in theory we can write:

Ellipse ellie = {
  { 0.7, 45.2 }, /* centre */
  { 2.3, 8.4 },  /* axes */
  1.87           /* orientation */
};

However this is horrifically unclear and will fail completely if we add another member at the start of the structure. The only real use of this is to initialise a structure to all zeros using the rule that missing initialisers are set to zero:
Ellipse zero = { 0 }; // All zeros
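The "missing initialisers are set to zero" rule can be checked with a short sketch. The helper names is_all_zero() and partly_initialised() are hypothetical, and the value 0.7 is simply borrowed from the initialiser shown earlier.

```c
typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
  float area;
} Ellipse;

// Hypothetical helper: returns 1 if every member of the ellipse is zero.
int is_all_zero(Ellipse e) {
  return e.centre[0] == 0 && e.centre[1] == 0 &&
         e.axes[0] == 0 && e.axes[1] == 0 &&
         e.orientation == 0 && e.area == 0;
}

// Only the first value is supplied; every remaining member is set to zero.
Ellipse partly_initialised(void) {
  Ellipse e = { { 0.7f } };  // centre[0] is 0.7, everything else is zero
  return e;
}
```

Calling is_all_zero() on a structure declared as Ellipse zero = { 0 }; returns 1, while the partly initialised ellipse has a non-zero centre[0] and zeros everywhere else.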
When to use structures and what to put in them

In general: Any type of "thing" in the problem we are thinking about whose properties cannot be represented by an existing variable type or array should normally have its own structure type. The structure should tell us everything we need to know about that thing.

As a general rule we should ask ourselves: "If I had two ellipses (or nuclides etc.) what would I need two of and what would I still just need one of?" Things we would need two of should be part of the structure. For example, if we had two ellipses they would each have their own centre and axes but would share the same value of PI.

Structures are "proper variables"

Unlike arrays, where the "value" of the name of an array is the address of its first element, structures are "proper variables" that can be copied (although this is quite rare):

mystruct = yourstruct;

This has a serious consequence when passing structures to functions, as we will see in the next example.

Passing structures to functions

Given that structures are proper variables we may pass copies of them to functions. This rather simple example shows a function whose job is to calculate and print out the area of an ellipse. (In practice this function is so simple it probably isn't worth making a separate function.) Just for fun it also makes a doomed attempt to move the ellipse.

#include <stdio.h>

#define PI 3.14159265358979

typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
} Ellipse;

void print_area(Ellipse el) {
  float area;

  area = PI * el.axes[0] * el.axes[1];
  printf("The area is %f\n", area);

  el.centre[0] = 123.456; // Move the ellipse (only the local copy changes)
  el.centre[1] = -78.9;
}

int main() {
  Ellipse ellie;

  printf("Centre? (x,y)\n");
  scanf("%f %f", &ellie.centre[0], &ellie.centre[1]);

  // NB: in a real program we should check
  // the axes are actually > 0
  printf("Length of axes ( > 0 )?\n");
  scanf("%f %f", &ellie.axes[0], &ellie.axes[1]);

  printf("Orientation to the vertical?\n");
  scanf("%f", &ellie.orientation);

  print_area(ellie);
  return 0;
}
This example illustrates why using structures without pointers is only of limited use: the function print_area() receives a copy of the original structure, not the structure itself, so if we change the copy inside the function the original is unchanged.

For this reason structures are almost always used with pointers.

Changing the values of the members of an ellipse structure

There are several reasons we may wish to allow a function to modify an ellipse, for example to move it or to store its calculated area for later use.
Reminder: pointers

We have already encountered this problem in a previous lecture where we wanted a function to be able to manipulate an array declared inside another function. Let's remind ourselves of what we did:
Pointers
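As a brief reminder of the array case, here is a minimal sketch in which a function changes an array that belongs to its caller. The function name double_all() is hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

// The parameter receives the address of the caller's first element,
// so writing to values[i] changes the caller's array, not a copy.
void double_all(double values[], size_t count) {
  assert(values != NULL);

  for (size_t i = 0; i < count; ++i)
    values[i] *= 2.0;
}
```

After a call such as double_all(data, 3) the three elements of the caller's own array data have been doubled, because the function was given the array's address rather than a copy of its contents.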
The solution when we want a function to be able to manipulate a structure declared inside another function is the same: pass the address of the structure to the function and have a pointer as a parameter to the function to receive the address. Thus every function has a pointer pointing to the same original ellipse structure and is making changes to that structure.
Pointers to structures

Structures are most useful when combined with pointers. These work just as we would expect. Suppose, as suggested above, that we don't just want our function to print out the area of the ellipse but to store it for later. To do this we need to add an "area" member to our ellipse structure:

typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
  float area;
} Ellipse;

The enlarged structure now takes twenty-four bytes (six floats, assuming four bytes each). A particular Ellipse, say called "ellie", might be stored starting at byte 800, in which case its members would be laid out as follows:

Byte number: | 800 - 803   | 804 - 807   | 808 - 811   | 812 - 815   | 816 - 819   | 820 - 823   |
Used for:    | centre[0]   | centre[1]   | axes[0]     | axes[1]     | orientation | area        |
             | <------------------------------------ ellie ------------------------------------> |

A statement such as:

ellie.orientation = 11.4;

would mean that the computer would go to the start of ellie (800), go along 16 bytes to byte 816 and write the value 11.4 into the four bytes 816 - 819.

Alternatively, we could declare a pointer to an Ellipse, assign it the address of ellie (800) and use that pointer, as in the following rather contrived example:

#define PI 3.14159265358979

int main() {
  Ellipse ellie, *ep; // ellie is a structure, ep just a pointer

  ep = &ellie;
  (*ep).orientation = PI/4;
  // More code here...

The notation (*ep).orientation is rather awkward, and pointers to structures are used so often that they have their own notation:

ep->orientation = PI/4;

The fact that they have their own notation should tell us how important pointers to structures are!

If p is a pointer to a structure then p->member accesses a member of that structure.

Our code is now exactly the same as before but with structure. replaced by pointer->, as in this short example:

Ellipse ellie, *ep; // ellie is a structure, ep just a pointer
ep = &ellie;
ep->orientation = PI/4;
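Returning to the byte layout described earlier in this section, the standard offsetof macro from <stddef.h> reports how far a member lies from the start of its structure. The sketch below is only an illustration: the function names orientation_offset() and area_offset() are hypothetical, and the values 16 and 20 depend on the assumption made in these notes that a float occupies four bytes and that the compiler inserts no padding between the members.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
  float area;
} Ellipse;

// Distance in bytes from the start of an Ellipse to its orientation member.
size_t orientation_offset(void) {
  return offsetof(Ellipse, orientation);
}

// The same idea done by hand: the distance in bytes between the address
// held in ep and the address of the member ep->area.
size_t area_offset(const Ellipse *ep) {
  assert(ep != NULL);
  return (size_t)((const char *)&ep->area - (const char *)ep);
}
```

On a machine matching the layout above, orientation_offset() gives 16 and area_offset(&ellie) gives 20, which are the distances from byte 800 to bytes 816 and 820.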
Dynamically allocating structures

Although the previous code is entirely legal it's a bit of a bodge. Rather than declaring a structure and then having a pointer whose value is the address of that structure, it's far more common to dynamically allocate the structure:

// malloc() and exit() need <stdlib.h>, fprintf() needs <stdio.h>
Ellipse *e = NULL;

e = malloc(sizeof *e);
if ( e == NULL ) {
  fprintf(stderr, "Out of memory\n");
  exit(99);
}
e->orientation = PI/4;

Structures are usually dynamically allocated via pointers rather than declared as variables.
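One way the allocation might be packaged, sketched here under stated assumptions, is as a small helper function. The name new_ellipse() is hypothetical, and returning NULL to the caller instead of calling exit() is a design choice made for this illustration rather than something the notes prescribe. The fragment in the notes does not show it, but memory obtained from malloc() is normally handed back with the standard free() function once it is no longer needed.

```c
#include <stdlib.h>

#define PI 3.14159265358979

typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
  float area;
} Ellipse;

// Hypothetical helper: allocate one Ellipse, set its orientation and
// set every other member to zero. Returns NULL if no memory is available.
Ellipse *new_ellipse(float orientation) {
  Ellipse *e = malloc(sizeof *e);

  if (e == NULL)
    return NULL;  // leave it to the caller to decide how to report this

  e->centre[0] = e->centre[1] = 0;
  e->axes[0] = e->axes[1] = 0;
  e->area = 0;
  e->orientation = orientation;
  return e;
}
```

A caller would check the returned pointer against NULL, use it with the -> notation, and finally call free() on it.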
Passing pointers-to-structures to functions

The old function to print the area of an ellipse had the prototype:

// Print the area of an ellipse to the screen
void print_area(Ellipse el);
The new function, which we have renamed calculate_area(), now saves the value of the area in the area structure member rather than just printing it. That is, it accepts a pointer and modifies the original ellipse structure. It has the prototype:

// Calculate the area of an ellipse and save it in the structure
void calculate_area(Ellipse *el);
The function itself is extremely short (partly because we have put the "move ellipse" code into a separate function where it belongs):

// Calculate the area of an ellipse and save it in the structure
void calculate_area(Ellipse *e) {
e->area = PI * e->axes[0] * e->axes[1];
}
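To see the difference between the two approaches side by side, here is a minimal sketch with one function that receives a copy and one that receives a pointer. The function name area_by_value() is hypothetical; calculate_area() is the function defined above, repeated so that the fragment stands alone.

```c
#define PI 3.14159265358979

typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
  float area;
} Ellipse;

// Receives a copy: the assignment changes only the local copy,
// which is thrown away when the function returns.
void area_by_value(Ellipse el) {
  el.area = PI * el.axes[0] * el.axes[1];
}

// Receives the address of the original, so the original is updated.
void calculate_area(Ellipse *e) {
  e->area = PI * e->axes[0] * e->axes[1];
}
```

Calling area_by_value(ellie) leaves ellie.area exactly as it was, whereas calculate_area(&ellie) stores the new area in ellie itself.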
Passing pointers-to-structures to functions allows the function to access and change the members of the structure.

Passing pointers between functions

In reality the calculate_area() function above is still a little simple to be worth having as a separate function. However, we may wish to define a new function to read in the values of the ellipse, which we might imaginatively call read_ellipse(), and another to resize an ellipse by multiplying each dimension by a constant scaling factor, which we might call resize_ellipse(). In this case it would be quite sensible for both read_ellipse() and resize_ellipse() to call our new calculate_area() function to update the area. They would do this by passing the pointer they receive to calculate_area(). As an example the resize_ellipse() function might look like this:

// Resize an ellipse by the same scaling factor in each dimension.
void resize_ellipse(Ellipse *el, float scale) {
el->axes[0] *= scale;
el->axes[1] *= scale;
calculate_area(el);
}
This illustrates an extremely common practice: functions that receive pointers often pass those pointers on to other functions. The obvious analogy is with a phone number: if I were to tell you "my phone number is 0314 159 265, ring me any time you have a question about programming", not only could you ring me yourself but you could also pass that number on to your class-mates, who could also ring me.
Similarly, if a function has a pointer to something it can pass the value of that pointer to another function which can then access the original object in memory.

The complete code

The following code has three top-level functions and one utility function:
#include <stdio.h>
#include <stdlib.h>

#define PI 3.14159265358979

typedef struct ellipse {
  float centre[2];
  float axes[2];
  float orientation;
  float area;
} Ellipse;

void calculate_area(Ellipse *el);
void read_ellipse(Ellipse *el);
void resize_ellipse(Ellipse *el, float scale);
void move_ellipse(Ellipse *el, float dx[2]);

int main() {
  float moveby[2], resize;
  Ellipse *e = NULL;

  e = malloc(sizeof *e);
  if ( e == NULL ) {
    fprintf(stderr, "Out of memory\n");
    exit(99);
  }

  printf("Welcome to the ellipse program\n");
  read_ellipse(e);
  printf("The area of the ellipse is: %f\n", e->area);

  printf("Amount to move the ellipse?\n");
  scanf("%g %g", &moveby[0], &moveby[1]);
  move_ellipse(e, moveby);

  printf("Amount to resize the ellipse?\n");
  scanf("%g", &resize);
  resize_ellipse(e, resize);

  free(e); // Release the memory obtained from malloc()
  return 0;
}

void read_ellipse(Ellipse *el) {
  printf("Centre? (x,y)\n");
  scanf("%f %f", &el->centre[0], &el->centre[1]);

  printf("Length of axes ( > 0 )?\n");
  scanf("%f %f", &el->axes[0], &el->axes[1]);

  printf("Orientation to the vertical?\n");
  scanf("%f", &el->orientation);

  calculate_area(el); // Pass the pointer to another function
}

// Resize an ellipse by the same scaling factor in each dimension.
void resize_ellipse(Ellipse *el, float scale) {
  el->axes[0] *= scale;
  el->axes[1] *= scale;
  calculate_area(el);
}

// Calculate the area of an ellipse and save it in the structure
void calculate_area(Ellipse *e) {
  e->area = PI * e->axes[0] * e->axes[1];
}

// Move the centre of an ellipse by the displacement dx
void move_ellipse(Ellipse *el, float dx[2]) {
  el->centre[0] += dx[0];
  el->centre[1] += dx[1];
}

Looking at the above example from the bottom upwards, the first thing we see is that move_ellipse() and calculate_area() are passed a pointer to the original ellipse and so are able to change the values of its position and area respectively.

Then resize_ellipse() is also passed a pointer to the original ellipse. As well as accessing the ellipse itself, it also passes this pointer on to calculate_area() to update the area.
Similarly, for read_ellipse() we pass just one pointer, to the ellipse, rather than three or five individual pointers to ellipse.orientation, ellipse.centre[0], etc., and it again passes the value of the pointer it was given on to the function calculate_area(). As mentioned above this is extremely common.

Functions that receive a pointer to a structure often pass that pointer on to other functions.
More than one type of structure

The ellipse example has just one type of structure, but the generalisation is straightforward.

Example: a projectile

Imagine a function to calculate the position and velocity of a projectile thrown into the air. Its position and velocity are known at time t=0 and we need to calculate its position and velocity at time t + dt. Its prototype would look something like:

void move(float x[NDIMS], float v[NDIMS], float mass, float drag_coeff,
          float ywind[NDIMS], float viscosity, float dt);

This is a very simple problem but the function has seven arguments. Worse, they are all floats or arrays of floats so it would be very easy to get two in the wrong order and the compiler would not notice. There are one hundred and forty-four different legal ways of ordering these arguments (3! = 6 orderings of the three arrays times 4! = 24 orderings of the four floats), and nearly seven thousand (3^3 × 4^4 = 6912) if we allow for the chance of putting the same one in twice, but only one of these is the right one!

If we take a look at the arguments we see they split into three groups: x, v, mass and drag_coeff are properties of the projectile, ywind and viscosity are properties of the air, and time is a physical quantity in its own right. This suggests we want two structures, one for the projectile and one for the air, and to leave time as it is.

The following code declares what are in effect two new types of variables. Again, this code does not actually create any structures, it just tells the compiler what we mean by "Projectile" and "Air".

#define NDIMS 3
typedef struct projectile {
float x[NDIMS];
float v[NDIMS];
float mass;
float drag_coeff;
} Projectile;
typedef struct air {
float ywind[NDIMS];
float viscosity;
} Air;
We have gone from seven numbers to three things: projectile, air and time. The function is now declared as:

void move(Projectile *proj, Air *thisair, float dt);

Not only do we now have only three arguments rather than seven, all three are of a different type so if we were to call the function with two of its arguments in the wrong order the compiler would notice and tell us.

Thinking at a higher level

Almost without noticing it, we've made a bit of a mental leap. We started by thinking about how we could reduce the number of arrays needed to represent our nuclides or reduce and organise the (floating point) arguments to a function. But we have quickly reached the stage where we have stopped talking about integers, floating-point numbers and strings and have started talking about nuclides, ellipses and projectiles. This is the point about structures: we identify the types of "things" we are dealing with and typically we define a type of structure to represent that type.

Structures allow us to think at a higher level, in terms of the "things" we are dealing with rather than individual data values.

Structures allow extensibility

Another thing we did without noticing it was that we added an "area" member to our ellipse structure. All we had to do was to type in the extra member and recompile. Similarly we mentioned that our nuclide information may need to be extended to include its mass and atomic number.

Our simple projectile example only has one dimension, y. But if our program is successful somebody is bound to ask us to extend it to three dimensions in which case y, vy and ywind will be replaced by arrays. If we used the "separate variables" approach we would have to go through each of our functions changing the number (and type) of arguments. With the "structure" approach, we just add some more members to the structure definition, change the part of the code responsible for calculating the acceleration and recompile.
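To make the last point concrete, here is a minimal sketch of how move() might be organised so that the physics lives in one place. Everything about the physical model is an assumption made purely for illustration, since the notes do not specify it: gravity of 9.81 acting downwards along axis 1, a drag force proportional to viscosity, drag_coeff and the velocity relative to the wind, and a simple explicit Euler time step. The helper name acceleration() is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NDIMS 3
#define GRAVITY 9.81f  // assumed value, acting downwards along axis 1

typedef struct projectile {
  float x[NDIMS];
  float v[NDIMS];
  float mass;
  float drag_coeff;
} Projectile;

typedef struct air {
  float ywind[NDIMS];
  float viscosity;
} Air;

// Assumed model, for illustration only: linear drag plus gravity.
// This is the one function that would change if the physics were extended.
void acceleration(const Projectile *proj, const Air *thisair, float acc[NDIMS]) {
  for (int i = 0; i < NDIMS; ++i) {
    float relative_v = proj->v[i] - thisair->ywind[i];
    acc[i] = -thisair->viscosity * proj->drag_coeff * relative_v / proj->mass;
  }
  acc[1] -= GRAVITY;
}

// Advance the projectile by one time step dt (explicit Euler step).
void move(Projectile *proj, Air *thisair, float dt) {
  assert(proj != NULL && thisair != NULL);

  float acc[NDIMS];
  acceleration(proj, thisair, acc);  // the pointers are simply passed on

  for (int i = 0; i < NDIMS; ++i) {
    proj->x[i] += proj->v[i] * dt;
    proj->v[i] += acc[i] * dt;
  }
}
```

If a new member were later added to Projectile or Air, the prototype of move() and every call to it would stay exactly as they are; only acceleration() would need to know about the new member.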
It's extremely common for a program to start off quite simply but for more features and properties to be added later, so the question of extensibility is hugely important.

Structures can easily be extended to include new members.

Structures help make our functions more modular

When we added the area member to our ellipse structure, most of our functions did not need to know about it. Indeed, resize_ellipse() and read_ellipse() just passed the pointer to calculate_area() to calculate it: they did not even need to know the area member existed.

In our projectile example we might need to deal with the fact that real projectiles spin in the air and have texture. Even if we assume a spherical shape, that's several more variables. And somebody is sure to want to throw non-spherical objects. In this case our height_velocity function is going to split into two parts:
Functions don't even need to know of the existence of structure members that don't concern them.

Reference

Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious.

Brooks, F. R. Jr, The Mythical Man-Month (1975).
Summary

Structures do for variables what functions do for code
Defining a new composite variable (structure) type
When to use structures and what to put in them
Pointers to structures
Thinking at a higher level