Keeping track of data
Comments and questions to John Rowe.
The structures we declared in the last last lecture have the
disadvantage that we need to know in advance how many we will
need and give them all names. But for many applications we want
to be able to create and destroy them as the program progresses,
for example if a user opens a new tab in a web browser. How do
we create and keep track of these structures?
Reminder: allocating a structure
We saw in a previous lecture that
we can use malloc()
to dynamically create structures for us:
In practice structures are nearly always
dynamically allocated via pointers rather than declared as
variables.
The mechanics work just as we would expect:
struct thingy *p = NULL;
p = xmalloc(sizeof *p);
In practice however it's a very good idea to define a function
whose job is to allocate the space for a structure and to do any
initialisation.
Example: a slightly more complicated allocation.
Some structures may require more than one call to malloc() to
fully allocate them. For example, consider a structure which
itself has a dynamically-allocated string:
Notice how str is a pointer not an
array. In order to use it we must make it point at something.
#include <stdlib.h>
typedef struct wotsit {
float val;
char *str;
int length;
} Wotsit;
Here we will need to first allocate a structure and then allocate
the memory for the character string. We show a short but complete
example as it's worth stepping through it to watch the two calls
to
malloc() .
#include <stdio.h>
#include <stdlib.h>
typedef struct wotsit {
float val;
char *str;
int length;
} Wotsit;
Wotsit *newwotsit(int len) {
Wotsit *w;
w = xmalloc(sizeof *w);
w->length = len;
w->val = 0;
w->str = xmalloc(w->length);
return w;
}
int main() {
Wotsit *william;
william = newwotsit(20);
snprintf(william->str, william->length,
"Hello, world\n");
return 0;
}
Step through this code
It's a very good idea to define a function whose
job is to allocate the space for a structure and to do any
initialisation.
Memory that is no longer required must be released using free() otherwise our program
will gradually use more and more memory until the computer runs
out (this is called a memory leak). However, memory is
automatically freed when the program finishes.
free()d memory must never be referenced or freed again.
Thus the absolute rule is that we can't use memory after we've
freed it and we can free it only once. This typically means that
calls to free() are done in the reverse order to
calls to malloc(). To reuse the above example we might
define a function freewotsit():
void freewotsit(Wotsit *w) {
free(w->str);
free(w);
}
The above version is correct. Had we written:
free(w);
free(w->str); // Wrong!!
It would have been very, very wrong.
Step
through a complete example of the correct code.
Keeping links consistent
If a person changes ther phone number without
telling their friends then a lot of people's contacts lists will be
incorrect and trying to dial their friend will result in "number
unobtainable" or a wrong number.
In practice there are at least two technical measures in place
to reduce this problem for telephones:
- Each phone has a removeable SIM card to which the number is
assigned.
- If you change networks your old network is legally
required to transfer your old number.
Of course, none of these work for land-lines when you move house, but
it is a sign of just how inconvenient changing a phone number can be.
When we dynamically-allocate structures we will often have several
pointers pointing to them. Each of these will need to be updated
(often by setting them to NULL).
So the challenge is to keep links consistent when we free memory,
whether this is done via free() or realloc().
Failure to do this is a notorious source of bugs.
When freeing data we must take care of any pointers
pointing to it.
The reason why structures are normally
allocated dynamically instead if just declaring them is that we
want to be able to create and destroy them as the program
progresses, for example if a user opens a new tab in a web
browser. How do we keep track of these structures?
Allocating memory is the easy bit and in reality is just the
first of three tasks:
- Getting more
memory.
- Keeping track of
the memory we have obtained.
- Releasing it when we have finished with it,
and handling any pointers that point to it..
We have already covered 1 (malloc) and 3 (free());
this section is about 2.
As mentioned in a previous
lecture, we can call malloc as often as we like to give ourselves
more and more structures but there's a problem: once we have lost
the value of a pointer to memory allocated by malloc
there is no way to find it out again so how do we remember where
all our structures are?
The following code illustrates the problem: it allocates
memory for n structures but promptly forgets where the
first n-1 are stored:
for (int i = 0; i < n; ++i)
p = malloc(sizeof *p);
Step through this
code
The first n-1 structures are thus inaccessible: we have
allocated the memory but forgotten where it was. It's like writing
down someone's phone number and losing the piece of paper we wrote
it on - they have a perfectly good phone but we can't ring it
because we've lost the number.
One solution
We can see one possible solution by
looking at the step-though of the previous code where we
have a set of allocated chunks of memory but have forgotten the
address of all but one.
Could each allocation remember the addres of the previous one?
However, each allocated chunk of
memory has an inviting space at the end and the obvious thought is
"could we use the last part of each chunk to store the address of
the previous chunk?"
One solution to the problem of remembering the
addresses of an unknown number of chunks of allocated memory is
for each allocated chunk to remember the address of the previous
one.
Lists of people's favourite names or favourite colours
Suppose we have a large group of people
sitting in order and we want to be able to recite all their
names, favourite colours, etc. without writing them down.
In principle we could do this
by:
- Remembering the name of the first person
- Getting each person to remember the name of the person
after them
- Having the last person know that there is nobody after
them
Then somebody could recite the names of the whole class,
or some other piece of information such as their favourite colour.
Before thinking of how we might create such a list of people,
let's imagine how we would use such a list to recite everybody's
favourite colour.
Reciting the list of the class's names
To recite the list we can just say:
- Go to the first person on the list
- Ask them the name of the next person on the list.
- Go to that person.
- Go to step 1.
However, this has a problem: we have failed to handle part
3 of the illustration: telling the last person there is nobody after them.
[Class exercise.]
Adding a person to the list
If we were to implement this with people
then the basic algorithm for adding somebody to the start of the list
would be:
- Find a new person.
- Tell the new person their next person is the current first person.
- The new person is now the new first person on the list.
This leads us to the idea of
making a list
of pointers. To make a list we need to add a new element to our
structure: a pointer to the next one. The pointer is NULL if there
is no next one, ie this element is the last one in the list:
typedef struct particle {
float mass;
float x[NDIM];
struct particle *next;
} Particle;
The structure definition contains an extra member
which is a pointer (usually called "next") to the next
structure in the list.
In the following code the function
newparticle allocates
and initialises a
new structure, and adds it on
to the beginning of an existing linked list:
#include <stdlib.h>
#include <stdio.h>
#define NDIM 2
typedef struct particle {
float x[NDIM];
float mass;
struct particle *next;
} Particle;
struct particle * newparticle(Particle *first) {
Particle *p;
p = malloc(sizeof *p); printf("Please enter the mass and x and y values.\n");
scanf("%f %f %f", &p->mass, &p->x[0], &p->x[1]);
p->next = first;
return p;
}
int main() {
int i, n;
Particle *first = NULL;
printf("How many structures would you like?\n");
scanf("%d", &n);
for (i = 0; i < n; ++i)
first = newparticle(first);
return 0;
}
Step through this code
Notice:
- We initialised the first element to NULL to show that the
list started off empty: this is vital.
- We put all the checking that malloc() didn't fail
into a utility function called xmalloc() that we can
now use just like malloc() but without the tedious
checking.
It is essential to initialise the first member of
the list to NULL.
The following
for loop loops over every element of the
list in turn printing out their mass and co-ordinates. The pointer
p is initially equal to
first and then becomes
first->next and so on.
for (p = first; p != NULL; p = p->next)
printf("%f (%f, %f)\n", p->mass, p->x[0], p->x[1[);
A complete example for study later
#include <stdlib.h>
#include <stdio.h>
#define NDIM 2
typedef struct particle {
float x[NDIM];
float mass;
struct particle *next;
} Particle;
struct particle * newparticle(Particle *first) {
Particle *p;
p = malloc(sizeof *p); printf("Please enter the mass and x and y values.\n");
scanf("%f %f %f", &p->mass, &p->x[0], &p->x[1]);
p->next = first;
return p;
}
int main() {
int i, n;
Particle *first = NULL;
printf("How many structures would you like?\n");
scanf("%d", &n);
for (i = 0; i < n; ++i)
first = newparticle(first);
for (Particle *p = first; p != NULL; p = p->next)
printf("Mass: %f position: (%g, %g)\n", p->mass, p->x[0], p->x[1]);
return 0;
}
Step through this code
To go over every element of a linked list use: for(p
= first; p != NULL; p = p->next)
Preserving the order of the list
Our current newparticle() function
creates the list in reverses order. This is not a problem
if this is just a way of keeping track of the items, but it may be that
the order is in fact significant.
If we wish to preserve the order
we can tweak the algorithm slightly:
- Allocate a new structure and set its next
member to NULL
- If the list is empty just return the address of the new structure.
Otherwise:
- Find the last member of the current list.
- Set its next
member to point to the new structure.
- Return the address of the first member of the list.
Note that in this case the start of the list changes only
if the list was initially empty.
The code for finding the last element of the list is similar
to the code for going over the whole list except that instead
of stopping when
The code looks like this:
struct particle *newparticle(Particle *first) {
Particle *p, *last = first;
p = xmalloc(sizeof *p);
printf("Please enter the mass and x and y values.\n");
scanf("%f %f %f", &p->mass, &p->x[0], &p->x[1]);
p->next = NULL;
if ( first == NULL )
return p;
while(last->next != NULL)
last = last->next;
last->next = p;
return first;
}
Step
through a complete example
This introduces an extra complication: as well as freeing the
structure itself we also have to remove it from the list.
To remove the first member of a list we simply set the new
first member to old_first->next and free old_first.
For every member except the first we find its predecessor and
change its next pointer to the next of the
one we want to free:
Initial list
- A -> B -> C -> D -> NULL
After removing 'B'
- A -> C -> D -> NULL
free(B)
After removing 'A'
- C -> D -> NULL
free(A)
If we are removing the first element of the list
then the first element of the list will be the former second member
(or NULL)
Otherwise we must traverse the list looking for the to-be-deleted
member's predecessor
Two things to look out for
It would be good to check for two
other possibilities:
- The case when the function is asked to remove "nobody" (or NULL
in pointer-speak). This may seem strange ("if there is nothng to do
why would they bother to call it?") but remember two of our core
principles for functions:
The purpose of a function is to make life easier for the person calling it.
And its consequence:
Decisions about what a function should do, or whether the function should exist at all, are
usually answered by asking the question "what makes it
easier for the person who calls this function?"
And it is easier to write:
freesomething(pointer);
Than:
if (pointer != NULL)
freesomething(pointer);
And it is also safer for our function to do the check than for it
to crash.
- The possibility of a mistake, ie asking to remove something
that is not actually a member of the list. We will discover this when
we go down the list looking for the predecessor of our "to go" member:
if we get to the end without finding our predecessor we have made a
mistake.
- A variation of the above is that the list is actually empty.
It's perfectly OK to ask the function to remove "nothing"
and the function must handle this.
Functions should try to check for common errors in calling them.
Removing a member of a list
Stripped-down version
Let's first look at a version that does not do any of the error checking:
struct particle * freeparticle(Particle *togo, Particle *first) {
Particle *parent;
if (togo == first) first = first->next;
else { for (parent = first; parent->next != togo; parent = parent->next) {
}
parent->next = togo->next;
}
free(togo);
return first;
}
The empty for() loop looks a little unusual, it is because
the mere act of finding the end of the loop is actualy the task we
want to acheive.
Full version
Here we incorporate the "nothing to do" and error checks:
struct particle * freeparticle(Particle *togo, Particle *first) {
Particle *parent;
if (togo == NULL) return first;
else if ( first == NULL ) {
fprintf(stderr, "Trying to remove something from an empty list!\n");
return NULL;
}
if (togo == first) first = first->next;
else { for (parent = first; parent->next != togo; parent = parent->next) {
if (parent->next == NULL) {
fprintf(stderr, "%p isn't in the list!\n", togo);
return first;
}
}
parent->next = togo->next;
}
return first;
}
Step through
a complete example of this code
Forgetting to free() the data
As always, forgetting to free() the dataleads
to "orphaned" blocks of allocated memory.
Step through
a version of this code that forgets to free the memory.
Forgetting to free() the dataleads
to "orphaned" blocks of allocated memory.
The text of each key point is a link to the place in the web page.