Keeping track of data
Comments and questions to John Rowe.
The structures we declared in the last last lecture have the
disadvantage that we need to know in advance how many we will
need and give them all names. But for many applications we want
to be able to create and destroy them as the program progresses,
for example if a user opens a new tab in a web browser. How do
we create and keep track of these structures?
Reminder: allocating a structure
In practice structures are nearly always
dynamically allocated via pointers rather than declared as
variables.
We saw in a previous lecture that we can use malloc()
to dynamically create simple structures. The mechanics work
just as we would expect:
struct thingy *p = NULL;
p = xmalloc(sizeof *p);
Having a function to create and initialise a structure
It's a very good idea to define a function
whose job is to allocate the space for a structure and to do
any initialisation as that way we have everything together in
one place. The general form of such a function to allocate and
initialise a new Thing with a double member
called something is something like:
typedef struct thing {
double something;
} Thing;
Thing * newThing() {
Thing *p = NULL;
p = xmalloc(sizeof *p);
printf("Please enter the value of something\n");
scanf("%lg", &p->something);
return p;
}
Step through this code
Two points are worth mentioning here:
1. Do the initialisation in the simplest place
In the above example we have just put a comment where we
would initialise the rest of our new Thing. We have the choice of just
putting the code directly in the function or of having a
separate function to setup the Thing. If we choose to have a
separate function then our newThing() function
becomes extremely simple:
Thing *newThing(void) {
Thing *p;
p = xmalloc(sizeof *p);
setupThing(p);
return p;
}
Our setupThing() function might look a bit like:
void setupThing(Thing *p) {
printf("Please enter the value of something\n");
scanf("%lg", &p->something);
}
As always we just go for the simplest and clearest option in
any given circumstance:
- Whenever we are faced with a choice, our first question
should always be: "which choice
will be the clearest and give me the least chance of
making a mistake?"
2. Choose how to initialise the structure
Although for simplicity we often illustrate
reading in the structure members from the keyboard, there are
a number of other options. For example, it might be better to
initialise all the members to zero which we can do using the
following neat trick:
void zeroThing(Thing *t) {
Thing zero = {0};
*t = zero;
}
This works because:
- Structures can be inialised on declaration just like
arrays.
- Missing initialers are treated as being zero. (NB: as
always this is only true if the structure is initialised,
uninitialised structures are random.)
- Structures are first-class variables so we can assign a
whole structure the value of another whole structure.
We could then decide to call zeroThing() instead
of setupThing():
Thing *newThing(void) {
Thing *p;
p = xmalloc(sizeof *p);
zeroThing(p);
return p;
}
or we may choose to do it all inside of
newThing():
Thing *newThing(void) {
Thing zero = {0}, *p;
p = xmalloc(sizeof *p);
*p = zero;
return p;
}
The function call
Whichever choice we make, the newThing()
function is now essentially a drop-in replacement for malloc()
except that it knows the number of bytes required and does any
initialisation. A typical call looks like this:
Thing *t = NULL;
t = newThing().
It's a very good idea to define a function whose
job is to allocate the space for a structure and to do any
initialisation.
Refactoring
A common situation (and one we will encounter
in the following mini-exercise) is when we have a chunk of
code that is part of a larger function but we decide it would
be better if it became its own separate function. This could
be because:
- We need to use it more than once.
- It, or the surrounding code, have become sufficiently
complex that it is clearer to have it separately.
- It should have been a function to start off with!
This is an example of refactoring;
the practice of tidyng our code in a way that doesn't change
what it does but makes it easier to maintain and extend in the
future. It's good practice to continually
do this in small chunks rather than let our code turn into a
mess and do a large "spring-clean".
Notice how "Cut-and-Paste"
(into a function)
is better than Copy-and-Paste.
A typical way to do this is to cut and
paste the code into its own small function. For example,
imagine we had decided not have a separate
"newThing() function because we only needed to
allocate a new Thing in one place. A little later we extend
the program and need to alocate a new Thing in another place
as well. The wrong way to do this would be to copy and paste
the allocation code into another part of the program. The correct
way would be to put the code inside a small function and to call
that function twice.
Often, although not in this case, the new function will
require some arguments from the calling function. In our case
the shell function is simply:
Thing *newThing(void) {
Thing *p;
return p;
}
- newCuboid function
- To practice creating a function whose job is to allocate
and initialise a new structure and to illustrate simple
refactoring.
- Open up the on-line compiler in a new window
- Click on "Saved programs" and open your "cuboid"
mini-exercise.
- Look carefully at the newThing() stub function
above. Now above main() create a new shell function that is
a version of this adjusted for the name of your
cuboid object. Use the same variable name as in your main()
function.
- Move all of the code that reads in the values of the
cuboid into your new function. However, keep the
printf() statement that prints out the
cuboid inside main().
- Put a call to your "newCuboid" function inside main().
(See the "function call"
example above.)
- Build & run. Check the output is correct..
- So far all we have done is to refactor our
code: it does exactly the same as before but we shall see
later that it is easier to extend for when we want
multiple cuboids..
Example: a slightly more complicated allocation.
Some structures may require more than one call to malloc() to
fully allocate them. Consider a structure which itself has a
dynamically-allocated string:
Notice how str is a pointer not an
array. In order to use it we must make it point at something.
#include <stdlib.h>
typedef struct wotsit {
float val;
char *str;
int length;
} Wotsit;
To obtain a new wotsit structure we will need to first
allocate a structure and then allocate the memory for the
character string. We show a short but complete example as it's
worth stepping through it to watch the two calls to malloc()
.
#include <stdio.h>
#include <stdlib.h>
typedef struct wotsit {
float val;
char *str;
int length;
} Wotsit;
Wotsit *newWotsit(int len) {
Wotsit *w;
w = xmalloc(sizeof *w);
w->length = len;
w->val = 0;
w->str = xmalloc(w->length);
return w;
}
int main() {
Wotsit *william;
william = newWotsit(20);
snprintf(william->str, william->length,
"Hello, world\n");
return 0;
}
Step through this code
Structures which themselves contain dynamically
allocated members will need more than one call to malloc().
Memory that is no longer required must be released using free() otherwise our program
will gradually use more and more memory until the computer runs
out (this is called a memory leak). However, memory is
automatically freed when the program finishes.
free()d memory must never be referenced or freed again.
Thus the absolute rule is that we can't use memory after
we've freed it and we can free it only once. This typically
means that calls to free() are done in the reverse
order to calls to malloc(). To reuse the above
example we might define a function freewotsit():
void freewotsit(Wotsit *w) {
free(w->str);
free(w);
}
The above version is correct. Had we written:
free(w);
free(w->str); // Wrong!!
It would have been very, very wrong.
Step
through a complete example of the correct code.
Keeping links consistent
If a person changes ther phone number
without telling their friends then a lot of people's contacts
lists will be incorrect and trying to dial their friend will
result in "number unobtainable" or a wrong number.
When we dynamically-allocate structures we will often have
several pointers pointing to them. Each of these will need to
be updated (often by setting them to NULL). So the challenge
is to keep links consistent when we free memory, whether this
is done via free() or realloc().
Failure to do this is a notorious source
of bugs.
When freeing data we must take care of any
pointers pointing to it.
The reason why structures are normally allocated dynamically
instead if just declaring them is that we want to be able to
create and destroy them as the program progresses, for example
if a user opens a new tab in a web browser. How do we keep
track of these structures?
Allocating memory is the easy bit and in reality is just the
first of three tasks:
- Getting more memory.
- Keeping track of the memory we have
obtained.
- Releasing it when we have finished with
it, and handling any pointers that point to it..
We have already covered 1 (malloc()) and 3
(free());
this section is about 2.
Reminder: forgetting the results of previous calls to
malloc()
As mentioned in the previous lecture on dynamic allocation of
arrays, we can call malloc() as often as we like to
give ourselves more and more structures but there's a problem:
once we have lost the value of a pointer to memory allocated
by malloc() there is no way to find it out again so
how do we remember where all our structures are?
The following code illustrates the problem: it allocates
memory for n structures but promptly forgets where
the first n-1 are stored:
for (int i = 0; i < n; ++i)
p = malloc(sizeof *p);
Step
through this code
The first n-1 structures are thus inaccessible: we
have allocated the memory but forgotten where it was. It's
like writing down someone's phone number and losing the piece
of paper we wrote it on - they have a perfectly good phone but
we can't ring it because we've lost the number.
One solution
We can see one possible solution by looking at the
step-though of the previous code where we have a set of
allocated chunks of memory but have forgotten the address of
all but one.
Could each allocation remember the address of the previous
one?
However, each allocated chunk of memory has an inviting space
at the end and the obvious thought is "could we use the last
part of each chunk to store the address of the previous
chunk?"
One solution to the problem of remembering the
addresses of an unknown number of chunks of allocated memory
is for each allocated chunk to remember the address of the
previous one.
Lists of people's favourite names or favourite colours
Suppose we have a large group of people
sitting in order and we want to be able to recite all their
names, favourite colours, etc. without writing them down. In
principle we could do this by:
- Remembering the name of the first person
- Getting each person to remember the name of the person
after them
- Having the last person know that there is nobody after
them
Then somebody could recite the names of the whole class, or
some other piece of information such as their favourite
colour.
Before thinking of how we might create such a list of people,
let's imagine how we would use such a list to recite
everybody's favourite colour.
Reciting the list of the class's names
To recite the list we can just say:
- Go to the first person on the list
- Ask them the name of the next person on the list.
- Go to that person.
- Go to step 1.
However, this has a problem: we have failed to handle part 3
of the illustration: telling the last person there is nobody
after them.
[Class exercise.]
Adding a person to the list
If we were to implement this with people
then the basic algorithm for adding somebody to the start of
the list would be:
The improved traversal function
Now that we have made the crucial step of identifying when the
list is at an end, our improved traversal algorithm is:
- Make the first person on the list
the "current person".
- If that person is "nobody" then quit.
- Ask them the name of the next person on the list
and make him or her the new current person.
- Go to step 1.
This leads us to the idea of making a list
of pointers. To make a list we need to add a new element to
our structure: a pointer to the next one. The pointer is NULL
if there is no next one, ie this element is the last one in
the list:
typedef struct particle {
float mass;
float x[NDIM];
struct particle *next;
} Particle;
The structure definition contains an extra
member which is a pointer (usually called "next")
to the next structure in the list.
We have to
use struct particle inside the
structure definition as at this stage the Particle
alias has not been defined.
Printing the value of the next member
Although it's not something we need to do very often (and
it's never needed for a normally functioning program), we
might want to print the value of a structure's
next member out as a diagnostic. It
makes sense to print out the address of the structure
itself as well.
The format to print out the value of
a pointer is %p and it will usually print it
out in hexadecimal (base 16) starting with 0x.
The following prints out the value of a pointer
to a structure and the address of its next member.
As stated above, and as we shall see below,
it can be a useful diagnostic:
printf("p: %p p->next: %p\n", p, p->next);
The output will typically look somehting like:
p: 0xff8b3454 p->next: 0xff8b3514
where "0x" indicates a hexadecial number. A NULL pointer may
be printed out as 0x00000000 or with text such
as (nil), NULL, etc.
- Add a next member and print it out.
- A little practice with pointers.
- Add a next member to your structure.
- Inside your newcuboid function set the
next member to NULL.
- Back inside main(), immediately above your
printf() statement that prints out the cuboid
values add a printf() statement like the one
above to print out the value of your cuboid pointer and
its next member.
- Build & run and check that the next member is
NULL.
Just as in the human case we need something
to hold the identity (ie the address) of the first item of the
list. Typically this is a pointer called first or firstThing,
etc.
The process of creating a linked list of
people converts directly into the process of creating linked
lists of structures. We show then here side by side:
Human list |
C program |
Initialisation: |
Set the first person to "nobody". |
Thing *firstThing
= NULL; |
For each new person or structure: |
1. Find a new person. |
1. Thing *new =
malloc(sizeof *new) |
2. Tell the new person their next person is the
current first person. |
2. new->next = firstThing; |
3. The new person is now the new first person on the
list. |
3. firstThing = new; |
Steps 1-3 are easy to add to our newparticle()
function, we just need to:
- Pass the value of the first particle pointer as an
argument
- Assign this to p->next (where p is the pointer to the
newly allocated particle)
We have highlighted these two steps in the following newparticle
function which allocates and initialises a new
structure, and adds it on to the beginning of an
existing linked list:
#include <stdlib.h>
#include <stdio.h>
#define NDIM 2
typedef struct particle {
float x[NDIM];
float mass;
struct particle *next;
} Particle;
struct particle * newparticle(Particle *first) {
Particle *p;
p = xmalloc(sizeof *p);
p->next = first;
printf("Please enter the mass and x and y values.\n");
scanf("%f %f %f", &p->mass, &p->x[0], &p->x[1]);
return p;
}
int main() {
int i, n;
Particle *first = NULL;
printf("How many structures would you like?\n");
scanf("%d", &n);
for (i = 0; i < n; ++i)
first = newparticle(first);
return 0;
}
Step through this code
The call to newparticle()
At first sight the call to newparticle()
looks very strange:
first = newparticle(first);
The interpretation of this is "pass the current value of first
to newparticle() and use the value it returns as the
new value of first".
Notice also:
- We initialised the first element to NULL to show that the
list started off empty: this is vital.
- We put all the checking that malloc() didn't
fail into a utility function called xmalloc() that
we can now use just like malloc() but without the
tedious checking.
It is essential to initialise the first member of
the list to NULL.
- Covert your newcuboid function to create a linked list
- To practice creating a list
- Inside main() make sure that the pointer to your
cuboid structure is initialised to NULL. (This is now going
to be the first member of the list of cuboids)
- Edit your newcuboid function so that it accepts a
pointer to a cuboid structure, in the same way
that newparticle() above accepts a pointer to
a Particle.
- Change the statement inside your newcuboid function
that sets the value of p->next to NULL
so that it now sets p->next to the value
of the pointer it received as a parameter.
- Change the call to your newcuboid function to
pass it the value of the current cuboid structure
(which should be NULL at this point).
- Build & run. Check the output is correct.. At this stage your next member
should still be NULL as there is only one item
in the list (ie there is no second item).
- At this stage main() should contain the following
three statements in successsion:
- A call to your newcuboid function.
- A diagnostic that prints out the values of your pointer
and its next member.
- A statement that prints out the cuboid values such as mass etc.
Now enclose all three of these statements inside a simple
for(){ } loop that executes twice.
- Build & run. Check the output is correct.. For the second diagnostic "%p" printout
the value of next should no longer be NULL
but should be the address of the first structure, something like:
p: 0xff8b3454 p->next: (nil)
p: 0xff8b3514 p->next: 0xff8b3454
Notice how the second p->next is the value of the first p
Again, we can directly convert the process of traversing a
linked list of humans into traversing a linked list of
structures. We will use a temporary pointer p to refer to
the structure under consideration at any one time.:
Human list |
C program |
Initialisation: |
Make the first person on the list
the "current person".
| Thing *p = firstThing |
Traversal: |
1. If that person is "nobody" then quit
| 1. if ( p == NULL) quit the loop
|
2. Ask them the name of the next person on the list
and make him or her the new current person..
| 2. p = p->next |
3. Go back to step 1
| 3. Go back to step 1
|
Using a for() loop
The loop above for traversing a linked list of structures
fits well into a for() loop:
Initialisation
| Thing *p = firstThing
|
Test
| p != NULL
|
Increment
| p = p->next |
The following for loop loops over every element of the
list in turn printing out their mass and co-ordinates. The
pointer p is initially equal to first and
then becomes first->next and so
on.
for (p = first; p != NULL; p = p->next)
printf("%f (%f, %f)\n", p->mass, p->x[0], p->x[1[);
A complete example for study later
#include <stdlib.h>
#include <stdio.h>
#define NDIM 2
typedef struct particle {
float x[NDIM];
float mass;
struct particle *next;
} Particle;
struct particle * newparticle(Particle *first) {
Particle *p;
p = xmalloc(sizeof *p);
p->next = first;
printf("Please enter the mass and x and y values.\n");
scanf("%f %f %f", &p->mass, &p->x[0], &p->x[1]);
return p;
}
int main() {
int i, n;
Particle *first = NULL;
printf("How many structures would you like?\n");
scanf("%d", &n);
for (i = 0; i < n; ++i)
first = newparticle(first);
for (Particle *p = first; p != NULL; p = p->next)
printf("Mass: %f position: (%g, %g)\n", p->mass, p->x[0], p->x[1]);
return 0;
}
Step through this code
To go over every element of a linked list use:
for(p = first; p != NULL; p = p->next)
- Traverse your linked list of cuboids
- To practice traversing a list and to check
the creation of the list.
- Comment out the diagnostic "%p" statement
inside main().
- After you have setup the linked list of cuboids,
create a for() loop like the one above to traverse the
list printing out each cuboid. (You can just resuse the
printf() statement you already have.)
- Build & run. Check the output is correct.. Your second loop should print out the same
cuboids as the first one but in the reverse order.
- Now before the fixed loop that creates two cuboids,
ask the user how many they want and modify the loop to
create that many. (You will need to declare a variable to
hold the number of cuboids required.)
- Build & run. Check the output is correct..
Preserving the order of the list
Our current newparticle() function creates the list
in reverses order. This is not a problem if this is just a way
of keeping track of the items, but it may be that the order is
in fact significant. If we wish to preserve the order we can
tweak the algorithm slightly:
- Allocate a new structure and set its next
member to NULL
- If the list is empty just return the address of the new
structure.
Otherwise:
- Find the last member of the current list.
- Set its next member to point to the new
structure.
- Return the address of the first member of the list.
Note that in this case the start of the list changes only if
the list was initially empty.
The code for finding the last element of the list is similar
to the code for going over the whole list except that instead
of stopping when The code looks like this:
struct particle *newparticle(Particle *first) {
Particle *p, *last = first;
p = xmalloc(sizeof *p);
printf("Please enter the mass and x and y values.\n");
scanf("%f %f %f", &p->mass, &p->x[0], &p->x[1]);
p->next = NULL;
if ( first == NULL )
return p;
while(last->next != NULL)
last = last->next;
last->next = p;
return first;
}
Step through a
complete example
This introduces an extra complication: as well as freeing
the structure itself we also have to remove it from the list.
To remove the first member of a list we simply set the new
first member to old_first->next and free old_first.
For every member except the first we find its predecessor and
change its next pointer to the next of the
one we want to free:
Initial list
- A -> B -> C -> D -> NULL
After removing 'B'
- A -> C -> D -> NULL
free(B)
After removing 'A'
- C -> D -> NULL
free(A)
If we are removing the first element of the list
then the first element of the list will be the former second
member (or NULL)
Otherwise we must traverse the list looking for
the to-be-deleted member's predecessor
Two things to look out for
It would be good to check for two other possibilities:
- The case when the function is asked to remove "nobody"
(or NULL in pointer-speak). This may seem strange ("if there
is nothng to do why would they bother to call it?") but
remember two of our core principles for functions:
The purpose of a function is to make life easier for the person calling it.
And its consequence:
Decisions about what a function should do, or whether the function should exist at all, are
usually answered by asking the question "what makes it easier
for the person who calls this function?"
And it is easier to write:
freesomething(pointer);
Than:
if (pointer != NULL)
freesomething(pointer);
And it is also safer for our function to do the check
than for it to crash.
- The possibility of a mistake, ie asking to remove
something that is not actually a member of the list. We will
discover this when we go down the list looking for the
predecessor of our "to go" member: if we get to the end
without finding our predecessor we have made a mistake.
- A variation of the above is that the list is actually
empty.
It's perfectly OK to ask the function to remove
"nothing" and the function must handle this.
Functions should try to check for common errors
in calling them.
Removing a member of a list
Stripped-down version
Let's first look at a version that does not do any of the
error checking:
struct particle * freeparticle(Particle *togo, Particle *first) {
Particle *parent;
if (togo == first) first = first->next;
else { for (parent = first; parent->next != togo; parent = parent->next) {
}
parent->next = togo->next;
}
free(togo);
return first;
}
The empty for() loop looks a little unusual, it is
because the mere act of finding the end of the loop is actualy
the task we want to acheive.
Full version
Here we incorporate the "nothing to do" and error checks:
struct particle * freeparticle(Particle *togo, Particle *first) {
Particle *parent;
if (togo == NULL) return first;
else if ( first == NULL ) {
fprintf(stderr, "Trying to remove something from an empty list!\n");
return NULL;
}
if (togo == first) first = first->next;
else { for (parent = first; parent->next != togo; parent = parent->next) {
if (parent->next == NULL) {
fprintf(stderr, "%p isn't in the list!\n", togo);
return first;
}
}
parent->next = togo->next;
}
free(togo);
return first;
}
Step through
a complete example of this code
Forgetting to free() the data
As always, forgetting to free() the data leads to
"orphaned" blocks of allocated memory.
Step
through a version of this code that forgets to free the
memory.
Forgetting to free() the data leads to
"orphaned" blocks of allocated memory.
The text of each key point is a link to the place in the web page.