~~TOC~~
Corning Community College
Data Structures
Simple File Access using fprintf()/fscanf()
The objective is to look at how to access files (reading, writing, appending) in order to load or store data from our programs.
Files are one of the fundamental information units we deal with on the computer. Under UNIX/Linux systems, there are three types of files: regular files, directories, and special files.
Although there are ways to access directory and special files, in this document we will only be looking at accessing regular files. Composed of bytes of data, regular files are treated as one of two types by most programming languages: text files and binary files.
A text file consists of ASCII characters, and is meant to be accessed by text viewing facilities (text editors, for instance). All source code you've written would be located in regular text files.
A binary file consists of non-ASCII, binary data. The data in this file is stored according to some format (as determined by the application). It could contain instructions, raw data, image data, audio data, etc.
This document will be addressing the accessing of text files.
With files, there are three modes of access: reading, writing, and appending.
When you access the file, you indicate the type of access you desire. Data is accessed at the location of the file pointer, which is basically an offset from the start of the file.
When you read data, it is retrieved from the file, and the file pointer is then advanced past it (to the next “field” or “record” of data). When writing/appending, a similar operation takes place: data is stored in the file at the offset of the file pointer, and afterwards the file pointer is located after that data.
Our first program will involve reading a list of numbers from a file and displaying them to STDOUT (standard output).
Our data file is as follows:
5
66
2
234
33
45
12
31
9
As you can see, the format is pretty straightforward: a single number per line.
The formatting IS quite important. Your program is written to expect data in a certain format. If the file deviated from it (two numbers on a line instead of one, or a word where a number should be), the program could misread the data.
The program to access and display this data is as follows:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *fPtr;
    int value = 0;
    char filename[] = "datafile";

    /* fopen() returns NULL if the file could not be opened */
    if ((fPtr = fopen(filename, "r")) != NULL)
    {
        /* fscanf() returns 1 for each number it successfully reads;
           the loop ends at end of file or at anything that is not a number */
        while (fscanf(fPtr, "%d", &value) == 1)
        {
            printf("Value retrieved from file: %d\n", value);
        }
        printf("Retrieval COMPLETE!\n");
        fclose(fPtr);
        value = 0;
    }
    else
    {
        printf("ERROR! Problem opening `%s'.\n", filename);
        value = 1;
    }

    return(value);
}
Of note here are the following:
Usual procedure to compile:
lab46:~/src$ gcc -o file1 file1.c
lab46:~/src$
If we have datafile located in the same directory as our executable, running the program will appear as follows:
lab46:~/src$ ./file1
Value retrieved from file: 5
Value retrieved from file: 66
Value retrieved from file: 2
Value retrieved from file: 234
Value retrieved from file: 33
Value retrieved from file: 45
Value retrieved from file: 12
Value retrieved from file: 31
Value retrieved from file: 9
Retrieval COMPLETE!
lab46:~/src$
If datafile is missing (or we rename it), resulting execution will take a different path:
lab46:~/src$ mv datafile datafile.bak
lab46:~/src$ ./file1
ERROR! Problem opening `datafile'.
lab46:~/src$
As you'll see in the code, we wrapped an if statement around our call to fopen(), so if there was an error opening the file, fPtr would be set to NULL, and the else would be executed. Ah, robustness of code.
Our second program will involve reading a list of users (each with a corresponding ID number) from a file and displaying them to STDOUT (standard output).
Our data file is as follows:
bob 32367
jimmy 12784
sue 3832
ian 24383
amy 29233
betsy 34211
gil 17443
Once again, note the formatting… there is a space between the user and number.
The program to access and display the information is as follows:
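#include <stdio.h>
#include <stdlib.h>

#define USERLEN 8

int main()
{
    FILE *fPtr;
    short int value = 0;
    char *user;
    char filename[] = "data";

    /* room for USERLEN characters plus the null terminator */
    user = (char *) malloc (sizeof(char)*(USERLEN+1));
    if (user == NULL)
    {
        printf("ERROR! Out of memory.\n");
        return(1);
    }

    if ((fPtr = fopen(filename, "r")) != NULL)
    {
        /* %8s stops after USERLEN characters so user cannot overflow;
           fscanf() returns 2 when both fields were read */
        while (fscanf(fPtr, "%8s %hd", user, &value) == 2)
        {
            printf("%hd retrieved from file for user %s\n", value, user);
        }
        printf("Retrieval COMPLETE!\n");
        fclose(fPtr);
        value = 0;
    }
    else
    {
        printf("ERROR! Problem opening `%s'.\n", filename);
        value = 1;
    }

    /* release the buffer whether or not the file could be opened */
    free(user);

    return(value);
}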
The UserIDs are short integers, so instead of the usual %d, we specify %hd (check the manual for other type alterations).
Usual procedure to compile:
lab46:~/src$ gcc -o file2 file2.c
lab46:~/src$
If we have data located in the same directory as our executable, running the program will appear as follows:
lab46:~/src$ ./file2
32367 retrieved from file for user bob
12784 retrieved from file for user jimmy
3832 retrieved from file for user sue
24383 retrieved from file for user ian
29233 retrieved from file for user amy
-31325 retrieved from file for user betsy
17443 retrieved from file for user gil
Retrieval COMPLETE!
lab46:~/src$
Everything appears as it should, but if you take a close look at the line for betsy, you'll notice her UserID is displayed as -31325 instead of the 34211 that is in the file (this is unrelated to file access; it is merely a quiz of your programming knowledge).
Why is this?
Answer to the question above:
The 34211 becomes a -31325 because we are dealing with signed short integers, and the range for a 16-bit signed short is: -32768 to +32767. 34211 exceeds the upper limit of this data type, so it “rolls over” into the negative range: 34211 - 65536 = -31325.
The first two examples dealt with reading data from a file. Now we will look at writing data.
Source code follows:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *fPtr;
    char name[20];
    int age;

    printf("Please enter your name (max 19 characters): ");
    /* note: a plain %s does not enforce the 19 character limit itself */
    scanf("%s", name);

    printf("Hello, %s, what is your age? ", name);
    scanf("%d", &age);

    printf("Storing results to output file . . . ");

    /* "a" opens for appending, creating the file if it does not exist */
    if ((fPtr = fopen("output.file", "a")) != NULL)
    {
        fprintf(fPtr, "%s %d\n", name, age);
        printf("done.\n");
        fclose(fPtr);
    }
    else
    {
        fprintf(stderr, "ERROR writing output file.\n");
        exit(1);
    }

    return(0);
}
Notable changes from the other examples include:
Again, nothing out of the ordinary with respect to compilation:
lab46:~/src$ gcc -o file3 file3.c
lab46:~/src$
Go ahead and run the program:
lab46:~/src$ ./file3
Please enter your name (max 19 characters): bob
Hello, bob, what is your age? 46
Storing results to output file . . . done.
lab46:~/src$
And check the output file (conveniently named output.file, as specified in file3.c):
lab46:~/src$ cat output.file
bob 46
lab46:~/src$
Run the program again and check the output file. You will see what happens when we open the file in append mode versus write mode. If you want to see what write mode will do, go into file3.c and change the mode in the fopen() call from "a" to "w", recompile, and execute to witness the difference in behavior.
Why do you suppose we had the program prompt the user for a 19 character maximum name, when we had declared name to be an array of 20 characters?
A “string” (aka array of characters) is terminated with a null terminator, “\0”. This character marks where the string's data ends, much like the end of line character, “\n”, marks where a line ends. If we put in a 20 character name, there would be no room left in the array for the null terminator, so it would be written past the end of the array, and undesirable side effects could occur.
In our code, when we used scanf() to obtain the name, we did not need an & before name. Why is this?
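When our array, name, is passed to a function, it is automatically converted to a pointer to its first character (type “char *”), which is exactly what %s expects. If we were to write &name instead, we would be passing a pointer to the whole array (type “char (*)[20]”), and a compiler with warnings enabled (gcc -Wall, for instance) would complain that the argument does not match the “char *” that %s expects.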