Physics and Astronomy |
Dealing with text

Introduction

So far we have been dealing with character strings (or just "strings" for short) enclosed inside double quotes ("") without really saying what they are. As well as the fixed strings we have been using, C allows us to deal with individual characters and with modifiable (or writeable) strings via arrays of characters. C also provides a range of standard routines for inspecting and modifying characters and strings.

This lecture deals only with strings using the English character set. Should you wish to write a multi-lingual program you will need to investigate Unicode and in particular UTF-8.

Single characters

C denotes single characters using single quotes, for example:

    char mychar = 'a';

C denotes single characters using single quotes.

A char is a one-byte integer

Since the data in a computer's memory consist of zeros and ones, not letters, C stores characters by assigning a one-byte integer value to each character. When a character value is stored in memory it is this one-byte integer that is written. When library routines such as printf() read this one-byte integer from memory and are instructed to interpret it as a character, they translate the integer value back to the appropriate printable character. Therefore, technically a char is a one-byte integer.

Even though internally C treats characters as a type of integer one byte long, we should never rely on any particular character having any particular integer value.

A char is a one-byte integer.

The format for a single character is %c

Single characters have the "%c" format specifier to printf() and scanf(). %c is unusual in that on input it does not skip white space by default. To read in a single character skipping white space, put a space before the %c. Note the space in " %c":

    char mychar;
    printf("Please enter a single character, spaces will be ignored\n");
    scanf(" %c", &mychar); // Note blank space before %c

The format for a single character is %c and to read in a single character skipping white space put a space before the %c (you nearly always want to do this).

Avoiding a problem

If we start a program and type "7<return>8<return>" then instinctively we think we have entered two characters ('7' and '8'). But actually we have entered four characters: "7\n8\n". When reading in a number scanf() will always skip white space (such as the newline in this input), but when reading in a character we get to choose whether we want the '\n' or to skip over it. We nearly always want the latter. But in the following code the second call to scanf() specifically asks to read in a character without skipping white space:

    scanf("%d", &intvar);
    scanf("%c", &charvar); // Doesn't skip spaces

So intvar will have the value 7 as expected but charvar will have the value '\n'. The following code skips spaces and charvar will have the value '8' as expected:

    scanf("%d", &intvar);
    scanf(" %c", &charvar); // Skips spaces
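The difference between "%c" and " %c" can be checked without typing anything at the keyboard. The following is a minimal sketch, not part of the original notes: it uses sscanf() (which reads from a string rather than the keyboard) and a hypothetical helper called char_after_int() to read a number followed by a character from the text "7\n8\n".

```c
#include <stdio.h>

/*
 * Hypothetical helper: read an integer and then one character from
 * the text in input. If skip_space is non-zero the format has a space
 * before %c, so white space after the number is skipped.
 * Returns the character read, or '\0' if the text could not be parsed.
 */
char char_after_int(const char *input, int skip_space)
{
    int number;
    char c = '\0';
    int nread;

    if (skip_space)
        nread = sscanf(input, "%d %c", &number, &c);  /* skips white space */
    else
        nread = sscanf(input, "%d%c", &number, &c);   /* takes next char   */

    return (nread == 2) ? c : '\0';
}
```

With the input "7\n8\n", calling char_after_int(input, 0) gives the newline that follows the 7, whereas char_after_int(input, 1) gives the character '8'.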
Don't rely on ASCII
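As a small illustration of not relying on particular character values, the sketch below converts a digit character to the number it represents by subtracting the character '0' rather than a hard-coded integer. The function name digit_value() is an assumption made for this example. The C standard guarantees that the characters '0' to '9' have consecutive values, so this works whatever the character set.

```c
#include <ctype.h>

/*
 * Hypothetical helper: convert a digit character to its numeric value
 * without assuming any particular character set.
 * Returns -1 if c is not a decimal digit.
 */
int digit_value(char c)
{
    if (!isdigit((unsigned char)c))
        return -1;
    return c - '0';   /* '0'..'9' are always consecutive in C */
}
```

Writing c - '0' says what we mean; writing c - 48 would only be correct on systems where '0' happens to be stored as 48.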
Useful <ctype.h> functions

Function | Description | Example (True) | Example (False) |
---|---|---|---|
isalpha() | Alphabetic (letter: does not include '_') | isalpha('A') | isalpha('7') isalpha('_') |
isupper() | Upper case letter | isupper('A') | isupper('a') isupper('?') |
islower() | Lower case letter | islower('a') | islower('A') islower('?') |
isdigit() | Decimal digit | isdigit('4') | isdigit('B') |
isalnum() | Letter or decimal digit | isalnum('m') isalnum('9') | isalnum('?') |
isspace() | White space (space, tab, newline) | isspace(' ') isspace('\t') | isspace('G') |
ispunct() | Punctuation | ispunct('?') | ispunct('w') |
The following function looks at a character string to count the number of letters, digits, punctuation characters and spaces.
    #include <stdio.h>
    #include <ctype.h>

    //
    // Count the number of letters, digits, punctuation
    // characters and spaces in a well-known phrase
    //
    void countem(const char string[]) {
        int alphas = 0, digits = 0, spaces = 0, punct = 0, others = 0;
        int i;

        for(i = 0; string[i]; ++i) {
            if (isalpha(string[i]))
                ++alphas;
            else if (isdigit(string[i]))
                ++digits;
            else if (isspace(string[i]))
                ++spaces;
            else if (ispunct(string[i]))
                ++punct;
            else
                ++others;
        }

        printf("The string: %s\n"
               "contains %d letters, %d digits, %d spaces,\n"
               "%d punctuation characters and %d other characters\n",
               string, alphas, digits, spaces, punct, others);
    }

    int main() {
        char hello[] = "Hello, world";
        countem(hello);
        return 0;
    }
You will note that the character string is declared as "const char" as the string itself is not modified. Here we have passed a (modifiable) character array but we could have just as easily passed a fixed string.
The output is:
    The string: Hello, world
    contains 10 letters, 0 digits, 1 spaces,
    1 punctuation characters and 0 other characters
The functions toupper() and tolower(), also declared in ctype.h, take a single character as an argument and return it converted to upper or lower case. Usefully, if the argument is not a letter, or is already the correct case, they just return the argument unchanged.
The following function translates its argument string to upper case.
    #include <stdio.h>
    #include <ctype.h>

    //
    // Convert a string to UPPER CASE
    //
    void shout(char string[]) {
        int i;

        for(i = 0; string[i] != '\0'; ++i) {
            string[i] = toupper(string[i]);
        }
    }

    int main() {
        char helloworld[16] = "Hello, world";

        printf("The original string is: %s\n", helloworld);
        shout(helloworld);
        printf("The upper-case string is: %s\n", helloworld);
        return 0;
    }
Notice that we have dropped the "const" qualifier from the argument to shout() and we have had to copy our favourite phrase into a (modifiable) character array. Calling shout() with a fixed string would be an attempt to modify read-only data, which on most systems leads to a segmentation fault.
The output is:
    The original string is: Hello, world
    The upper-case string is: HELLO, WORLD
Notice how the 'H' and ',' are unchanged.
#include <ctype.h> has various character tests as well as toupper() and tolower().
The file <string.h> provides a number of functions for dealing with strings, not individual characters.
Useful <string.h> functions

Function | Description |
---|---|
size_t strlen(string) | String length, e.g. strlen("abc") == 3 |
char *strncpy(dest, source, nbytes) | String copy (see example and warning) |
char *strncat(dest, source, nbytes) | Concatenate two strings |
int strcmp(string1, string2) | Compare two strings |
char *strchr(string, char) | Find char inside string, returns NULL if not found. |
char *strstr(string1, string2) | Find string2 inside string1, returns NULL if not found. |
Note that strlen() returns the length of the string, not the size of the array it is stored in. For example the following code will print the value three, not twelve (strlen() returns a size_t, for which the printf() format is %zu):

    char string[12] = "abc";
    printf("The length of \"%s\" is %zu\n", string, strlen(string));
Note too how we have used the backslash to put a double-quote character inside a quoted string.
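The gap between the array size and the string length is the room we have left for adding more text. As a sketch (the helper name space_left() is an assumption for this example, not a library function), it can be calculated like this:

```c
#include <string.h>

/*
 * Hypothetical helper: how many more characters can be added to the
 * string stored in buf[size] while still leaving room for the '\0'?
 */
size_t space_left(const char *buf, size_t size)
{
    size_t used = strlen(buf) + 1;   /* +1 for the closing '\0' */

    return (used < size) ? size - used : 0;
}
```

For a 12-byte array containing "abc", sizeof gives 12, strlen() gives 3 and space_left() gives 8: three bytes are used by the letters and one by the terminating zero.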
strlen(string) returns the length of the string
The strncpy(destination, source, n) function copies at most n bytes of the source string to the destination. The syntax is supposed to be reminiscent of the not-allowed destination = source with a sanity check. It returns the address of the destination string which is occasionally useful if we want to use this as the argument to another function, but not often.
strncpy() has a subtle flaw: although it correctly refuses to write past the end of the destination array (good), if the length of the source is n or more then the final character written to the destination array is not zero, it is the nth byte of the source. This means that the destination does not have a terminating zero and hence anything that tries to treat it as a C string will run off the end.
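The missing terminator can be seen directly. The sketch below is an illustration only, with hypothetical function names: it copies a string into a four-byte buffer with strncpy() and then uses memchr() to check whether any of those four bytes is zero.

```c
#include <string.h>

/* Returns 1 if buf[0..size-1] contains a terminating '\0', else 0 */
int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/*
 * Copy src into a 4-byte buffer using strncpy() and report whether
 * the result is a properly terminated string.
 */
int strncpy_terminates(const char *src)
{
    char buf[4];

    strncpy(buf, src, sizeof buf);   /* always writes exactly 4 bytes */
    return is_terminated(buf, sizeof buf);
}
```

A three-letter source such as "abc" fits with its terminating zero, so the result is terminated. A source of four or more letters fills all four bytes with letters and leaves no zero at all.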
One solution is to make the destination array one byte "too big" and initialise it all to zeros. That way the final byte of the destination is always zero, as in this example:
    //
    // Safer way to use strncpy() to handle overflows
    //
    #include <stdio.h>
    #include <string.h>

    #define N 8

    int main() {
        char buffer[N+1] = ""; // One byte longer and initialised to all zeros

        strncpy(buffer, "Hello, world", N);
        printf("The truncated string is: %s\n", buffer);
        return 0;
    }
It's worth noting that we have used the fact that when we initialise an array with fewer elements than it has room for, the rest are all set to zero.
Of course, another way to solve the strncpy() problem is to always call strncpy() with argument N-1 on a destination of N bytes whose final byte is already zero, or to write a "wrapper" function called, say, mystrncpy(), that always writes a zero to the final byte:

    #include <string.h>

    char *mystrncpy(char *dest, const char *src, size_t n) {
        if (n == 0)             // No room for anything, not even the '\0'
            return dest;
        strncpy(dest, src, n);
        dest[n-1] = '\0';       // Guarantee the result is terminated
        return dest;
    }

The type "size_t" here means something like "an unsigned integer type large enough to hold the length of a really long string".
strncpy() rather unsafely copies a string
The strncat() function catenates (joins) the contents of the second argument onto the end of the first, the first character of the second argument overwriting the zero at the end of the first. strncat() does not have the same flaw as strncpy(): it always writes a terminating zero after the bytes it appends. It should be remembered, however, that the final argument is the maximum number of bytes to be copied (appended or catenated), not the maximum final length of the resultant string, and that the terminating zero is written in addition to those bytes, so we must leave one byte free for it. Try this (assuming N is large enough for the initial string):

    char dest[N] = "And I think to myself, ";
    ...
    strncat(dest, "Hello, world", N - strlen(dest) - 1);
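Because strncat() writes a terminating zero after the bytes it appends, the number of bytes we can safely ask it to append is the array size, minus the current length, minus one. The following sketch wraps that calculation in a function; the name append() and its return convention are assumptions made for this example.

```c
#include <string.h>

/*
 * Hypothetical helper: append src to the string already in dest[size].
 * Assumes dest already holds a terminated string shorter than size.
 * Returns 1 if all of src fitted, 0 if it had to be truncated.
 */
int append(char *dest, size_t size, const char *src)
{
    size_t used = strlen(dest);
    size_t room = size - used - 1;   /* keep one byte for the '\0' */

    strncat(dest, src, room);        /* writes at most room bytes + '\0' */
    return strlen(src) <= room;
}
```

With a 10-byte array holding "Hello, " there is room for two more characters, so appending "world" leaves "Hello, wo" and the function returns 0 to report the truncation.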
Both strncpy() and strncat() have versions without the 'n' and the final maximum-length argument, strcpy() and strcat(). These are best avoided.
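Both strncpy() and strncat() return their first argument, so for example we may use one as the argument to another. This is safe only if N/2 is larger than the length of the first string, so that strncpy() writes the terminating zero that strncat() relies on:

    char text[N];
    strncat(strncpy(text, "Hello,", N/2), " world", N/2);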
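The strcmp(str1, str2) function returns the value zero if str1 and str2 are the same, a (strictly) negative integer if str1 comes before str2 in the alphabet and a (strictly) positive integer if str1 comes after str2 in the alphabet. Strictly, the comparison is made on the underlying character values, so "Hello" and "hello" are different and the ordering of upper-case relative to lower-case letters depends on the character set. The classic use is to check for equality by seeing if strcmp() returns zero.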
As always with arrays we cannot test for string equality using str1 == str2 as this tests to see if the addresses of the two strings are the same, i.e. they both refer to the same character array, rather than two character arrays which contain the same string.
The function strstr(str1, str2) tells us if str1 contains the string str2. (The return value of strstr() can also tell us where in str1 str2 appears.)
Here is an example:
    #include <stdio.h>
    #include <string.h>

    /*
     * Simple demo of strcmp() and strstr()
     */
    int main() {
        char str1[] = "Hello, world", str2[] = "world";

        if (strcmp(str1, str2) == 0)
            printf("The strings \"%s\" and \"%s\" are the same\n", str1, str2);
        else
            printf("The strings \"%s\" and \"%s\" are different\n", str1, str2);

        if (strstr(str1, str2) != NULL)
            printf("String: \"%s\" DOES contain the string \"%s\"\n", str1, str2);
        else
            printf("String: \"%s\" does NOT contain the string \"%s\"\n", str1, str2);

        return 0;
    }
The output is:
    The strings "Hello, world" and "world" are different
    String: "Hello, world" DOES contain the string "world"
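The pointer returned by strstr() points at the place inside the first string where the second string starts, so subtracting the start of the first string from it gives the position of the match. A minimal sketch, using a hypothetical function called find_index():

```c
#include <string.h>

/*
 * Hypothetical helper: return the index in text where word first
 * appears, or -1 if it does not appear at all.
 */
long find_index(const char *text, const char *word)
{
    const char *where = strstr(text, word);

    if (where == NULL)
        return -1;
    return (long)(where - text);   /* pointer difference = offset */
}
```

For "Hello, world" and "world" this gives 7 (counting from zero), and for "World" with a capital letter it gives -1 as the search is case-sensitive.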
#include <string.h> has various string functions such as strlen() and strncpy().
Sometimes we wish to use a very long character string that wraps over the side of the page. C solves this for us by the rule that if two fixed strings (not characters!) are separated only by white space they are joined together into one. Since such strings tend to have new-lines in them, the new-line is a convenient place to break them, although it is not compulsory. For example:

    printf("Menu\n\n"
           "1. Hot dog\n"
           "2. Burger\n"
           "3. Cheeseburger\n"
           "4. Veggie-burger\n"
           "5. Double espresso\n"
           "6. Coffee with milk and stuff\n\n");

The above seven lines are interpreted as one huge string. It should be noted that C does not insert spaces when joining strings together: if we want spaces we must put them inside the individual strings ourselves. Also, there are no commas between the strings. If there were, they would be treated as seven separate strings, not one large one.
Strings separated by white spaces (no commas!) are joined together to make one large string.
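The joining happens when the program is compiled, so the result is indistinguishable from a single string written in one piece. The sketch below (the names greeting and joined_is_same() are assumptions for this example) checks this with strcmp():

```c
#include <string.h>

/* Two adjacent strings: the compiler joins them into "Hello, world" */
const char greeting[] = "Hello, "
                        "world";

/* Returns 1 if the joined string equals the same text written in one piece */
int joined_is_same(void)
{
    return strcmp(greeting, "Hello, world") == 0;
}
```

Note that the space after the comma had to be typed inside the first string; nothing is inserted at the join.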
The snprintf() function acts like printf() except that it takes two additional arguments before the format: the name of a character array for the output to go into and the maximum number of bytes to be written to it, including the terminating zero (which is usually just the size of the array). It "prints" the output into the character array rather than to the screen. It is declared inside stdio.h, just like printf().
There is also a function sprintf() (no 'n') that omits the protection of the maximum length. We do not recommend you use it.
The following example demonstrates snprintf() protecting us against trying to write some text into a buffer that is not large enough.
    #include <stdio.h>
    #define N 8

    /*
     * Demonstrate snprintf protecting against a buffer overflow.
     * We have "accidentally" made the buffer too short for the text.
     */
    int main() {
        char buffer[N];
        int i;

        snprintf(buffer, N, "Hello, world\n");

        /* Print out the individual bytes for information */
        for(i = 0; i < N -1; ++i)
            printf("Byte %d: %d\t'%c'\n", i, buffer[i], buffer[i]);
        printf("Final byte: %d\n", buffer[N-1]);
        printf("%s", buffer);
        return 0;
    }
The output is:
    Byte 0: 72      'H'
    Byte 1: 101     'e'
    Byte 2: 108     'l'
    Byte 3: 108     'l'
    Byte 4: 111     'o'
    Byte 5: 44      ','
    Byte 6: 32      ' '
    Final byte: 0
    Hello, 
Notice that unlike strncpy(), snprintf() does the right thing with the final zero byte.
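snprintf() also tells us when it has had to cut the text short: its return value is the number of characters the complete text would have needed, not counting the terminating zero. If that number is greater than or equal to the size of the array, the output was truncated. A minimal sketch, with a hypothetical function greet():

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a greeting into buf[size].
 * Returns 1 if the whole text fitted, 0 if it was truncated
 * (or if snprintf() reported an error).
 */
int greet(char *buf, size_t size, const char *name)
{
    int wanted = snprintf(buf, size, "Hello, %s", name);

    /* wanted excludes the closing '\0', so it must be strictly less */
    return wanted >= 0 && (size_t)wanted < size;
}
```

Calling greet() with a 32-byte array and the name "world" stores the full text and returns 1. With an 8-byte array only "Hello, " is stored and the function returns 0.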
Most of the time we want to read in data from the keyboard or from a file stored on the computer. Occasionally, however, we may have a text string that contains the data we want to "read in".
The sscanf() function is exactly like fscanf() except that the first argument is a string or character array which is used as the source of the data, rather than an external file. It can be thought of as the opposite to snprintf() and is also defined inside stdio.h.
    char mystring[] = "12 34";
    int j, k;
    sscanf(mystring, "%d %d", &j, &k);
The variables j and k take their values from the character array mystring[] and hence have their values set to 12 and 34 respectively. There is no input from any file or the keyboard and mystring[] is not altered in any way.
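Like scanf(), sscanf() returns the number of items it successfully read, which lets us check that the text really did contain what we expected. The following sketch wraps this check in a hypothetical function parse_pair():

```c
#include <stdio.h>

/*
 * Hypothetical helper: read two integers from text into *a and *b.
 * Returns 1 on success, 0 if text did not contain two integers.
 */
int parse_pair(const char *text, int *a, int *b)
{
    return sscanf(text, "%d %d", a, b) == 2;
}
```

The text "12 34" is accepted, whereas "12 apples" is rejected because only one integer could be read.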
snprintf() and sscanf() "print" to and read from character arrays in the same way as printf() and scanf().
This is a fairly advanced topic.
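We've seen two problems with our simple reading of a string into a character array: it's hard to know how big to make the character array and scanf("%s") does not handle spaces. We will deal with the first of these now and the second in a later lecture. The general method is:

1. Read the text into a temporary character array that is much larger than we expect to need.
2. Find the actual length of the text using strlen().
3. Allocate a new character array of exactly that length, plus one byte for the closing '\0'.
4. Copy the text from the temporary array into the newly allocated one.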
Since this is a little complicated we shall make this into a function which we can write once, forget about and call whenever we need it.
A good function is either considerably more complicated to implement than to describe or saves the same code being repeated more than once in the program.
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    // Read in a long word, allocate the resulting character array
    // Later we will encounter a more useful version of this function
    #define BUFLEN 1024

    char *readaword(void) {
        char input[BUFLEN], *output = NULL;
        size_t len;

        // The width 1023 (BUFLEN - 1) stops scanf() overflowing input[]
        if ( scanf("%1023s", input) != 1 ) {
            fprintf(stderr, "Out of input!!\n");
            exit(99);
        }

        // Now allocate the final string and copy the input to it
        len = 1 + strlen(input); // +1 for closing '\0'
        output = malloc(len);    // NB: sizeof *output == 1
        if ( output == NULL) {
            fprintf(stderr,"Out of memory!\n");
            exit(98);
        }

        strncpy(output, input, len);
        return output;
    }

    int main() {
        char *word;

        printf("Please enter a long word\n");
        word = readaword();
        printf("Wow, %s\n\tis a long word!\n", word);
        free(word);              // Give back the memory readaword() allocated
        return 0;
    }
Given that a character array can be thought of as a writeable string it immediately follows that we can have an array of them:
    // Demonstrate an array of writeable strings
    #include <stdio.h>

    #define STRMAX 64
    #define NPL 2 // Number of players

    int main() {
        char names[NPL][STRMAX] = { 0 };

        for (int p = 0; p < NPL; ++p) {
            printf("Player %d, please enter your forename\n", p + 1);
            scanf("%63s", names[p]);  // Width 63 (STRMAX - 1) prevents overflow
            printf("Thanks %s.\n", names[p]);
        }

        // Do stuff here...

        return 0;
    }
We can have arrays of character arrays, just like any other array type.
For situations where we do not know the length of the text (such as people's names) a better approach is to have an array of pointers and to dynamically allocate each character array:
    // Uses readaword() from the earlier example
    int main() {
        char *name[2] = {NULL, NULL};

        for (int i = 0; i < 2; ++i) {
            printf("Player %d please enter your first name\n", i+1);
            name[i] = readaword();
        }

        printf("Welcome %s and %s.\n", name[0], name[1]);
        return 0;
    }
An even better way is to dynamically allocate the pointers too. We can also do a trick a bit like the zero at the end of a text string: we can make the array of pointers one too big and add a NULL pointer at the end. We can then pass the pointer to the names to a function and it can work out for itself where the end of the array is:
    void welcome(char **name) {
        for (int i = 0; name[i] != NULL; ++i)
            printf("Welcome %s.\n", name[i]);
    }

    int main() {
        char **name = NULL;
        int n;

        printf("How many names are there?\n");
        scanf("%d", &n);

        // xmalloc() is not defined in this section: it is assumed to be
        // a malloc() wrapper that exits if the allocation fails
        name = xmalloc((1+n) * sizeof *name);
        name[n] = NULL;          // Marks the end of the array

        for (int i = 0; i < n; ++i) {
            printf("Player %d please enter your first name\n", i+1);
            name[i] = readaword();
        }

        welcome(name);
        return 0;
    }
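The NULL pointer at the end is useful to any function that is handed the array, not just one that prints it. As a sketch (the function names count_names() and free_names() are assumptions for this example), here is how we might count the names and, when we have finished with them, free every string followed by the array of pointers itself. It assumes that both the strings and the array were allocated with malloc() or a wrapper around it.

```c
#include <stdlib.h>

/* Count the entries in a NULL-terminated array of strings */
size_t count_names(char **name)
{
    size_t n = 0;

    while (name[n] != NULL)
        ++n;
    return n;
}

/* Free every string, then the array of pointers itself */
void free_names(char **name)
{
    for (size_t i = 0; name[i] != NULL; ++i)
        free(name[i]);
    free(name);
}
```

Neither function needs to be told how many names there are, in the same way that strlen() does not need to be told how long a string is.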