~~TOC~~

\\ \\ \\
Corning Community College \\
Data Structures \\ \\
Simple File Access using fprintf()/fscanf()

=====Objective=====
To look at how to access files (reading, writing, appending) in order to load or store data from our programs.

=====Background=====
Files are one of the fundamental units of information we deal with on the computer. Under UNIX/Linux systems, there are three types of files:

  * regular
  * directory
  * special

Although there are ways to access **directory** and **special** files, in this document we will only be looking at accessing **regular** files.

Regular files are composed of bytes of data, and most programming languages treat them as one of two types:

  * text
  * binary

A **text** file consists of ASCII characters and is meant to be accessed by text viewing facilities (text editors, for instance). All source code you've written would be located in regular **text** files.

A **binary** file consists of non-ASCII, binary data. The data in this file is stored according to some format (as determined by the application); it could contain instructions, raw data, image data, audio data, etc.

This document will be addressing the accessing of **text** files.

With files, there are three modes of access:

  * read - retrieve data from the file (file pointer starts at beginning of file)
  * write - store data to the file (file pointer starts at beginning of file; any existing contents are discarded)
  * append - store data to the file (file pointer starts at the end of file)

When you access the file, you indicate the type of access you desire.

Data is accessed at the location of the file pointer, which is an offset into the file. When you read data, it is retrieved from the file, and the file pointer is then advanced appropriately (to the next "field" or "record" of data).

When writing or appending, a similar operation takes place: data is stored in the file at the offset of the file pointer, and afterwards the file pointer is located just after that data.
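The difference between the **write** and **append** modes can be tried out with a small sketch. It relies on the mode strings "w" and "a" that **fopen()** accepts; the helper names **store_line()** and **count_lines()** are made up for this illustration and are not part of the C library.

```c
#include <stdio.h>

/* Hypothetical helper: open `path` in the given mode ("w" or "a"),
 * store one line of text, and close the file again.
 * Returns 0 on success, -1 if the file could not be opened. */
int store_line(const char *path, const char *mode, const char *text)
{
    FILE *fPtr = fopen(path, mode);

    if (fPtr == NULL)
        return -1;

    fprintf(fPtr, "%s\n", text);
    fclose(fPtr);
    return 0;
}

/* Hypothetical helper: open `path` for reading and count its newlines.
 * Returns the count, or -1 if the file could not be opened. */
int count_lines(const char *path)
{
    FILE *fPtr = fopen(path, "r");
    int ch;
    int lines = 0;

    if (fPtr == NULL)
        return -1;

    /* fgetc() advances the file pointer one byte per call */
    while ((ch = fgetc(fPtr)) != EOF)
        if (ch == '\n')
            lines++;

    fclose(fPtr);
    return lines;
}
```

Calling **store_line()** twice with "w" on the same file should leave a single line behind, since each open in write mode starts the file over; following that with one call using "a" should bring the count reported by **count_lines()** to two.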
=====Program 1: Reading a list of numbers=====
Our first program will involve reading a list of numbers from a file and displaying them to **STDOUT** (standard output).

====data file====
Our data file is as follows:

<code>
5
66
2
234
33
45
12
31
9
</code>

As you can see, the format is pretty straightforward: a single number per line.

The formatting **IS** quite important. Your program is written to expect data in a certain format. With **%d**, any whitespace between numbers is skipped, so two numbers on one line would still be read as two values; an entry that is not a number, however, cannot be converted and would lead to problems.

====the program====
The program to access and display this data is as follows:

<code c>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *fPtr;
    int value = 0;
    char filename[] = "datafile";

    if ((fPtr = fopen(filename, "r")) != NULL)
    {
        /* fscanf() returns the number of items converted: 1 on success,
           0 on a non-numeric entry, EOF at the end of the file */
        while (fscanf(fPtr, "%d", &value) == 1)
        {
            printf("Value retrieved from file: %d\n", value);
        }
        printf("Retrieval COMPLETE!\n");
        fclose(fPtr);
        value = 0;
    }
    else
    {
        printf("ERROR! Problem opening `%s'.\n", filename);
        value = 1;
    }

    return(value);
}
</code>

Of note here are the following:

  * FILE: this indicates a variable data type which will be a file pointer. This is the point of access for our file.
  * fopen: this function (in the C library) handles the opening of a file. It returns NULL if there is an error.
  * fscanf: a variant of the **scanf()** function, it accepts a file pointer argument and reads from a file instead of STDIN.
  * EOF: the value **fscanf()** returns when it reaches the end of the file before converting anything. The loop tests for a return value of 1 (one number converted), so it ends at EOF and also at the first entry that is not a number.
  * fclose: this will close the file we've been accessing (important: always remember to close your files when done!)

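What **fscanf()** hands back can be seen in a small experiment. The sketch below wraps the reading loop in a helper (the name **load_numbers()** is made up for this illustration) that stores the values in an array instead of printing them, and stops at end of file or at the first entry that is not a number.

```c
#include <stdio.h>

/* Hypothetical helper: read up to `max` integers from `path` into `out`.
 * Returns how many were stored, or -1 if the file cannot be opened.
 * The loop ends at end of file (fscanf() returns EOF) and also at the
 * first entry that is not a number (fscanf() returns 0). */
int load_numbers(const char *path, int out[], int max)
{
    FILE *fPtr = fopen(path, "r");
    int count = 0;

    if (fPtr == NULL)
        return -1;

    /* a return value of 1 means exactly one %d was converted */
    while (count < max && fscanf(fPtr, "%d", &out[count]) == 1)
        count++;

    fclose(fPtr);
    return count;
}
```

Given a file containing **5**, then **66 2** on one line, then **abc**, then **9**, this helper should report three values (5, 66 and 2): the two numbers sharing a line are read without trouble, while the non-numeric entry stops the loop before the 9 is reached.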
====compiling====
Usual procedure to compile:

<code>
lab46:~/src$ gcc -o file1 file1.c
lab46:~/src$
</code>

====executing====
If we have **datafile** located in the same directory as our executable, running the program will appear as follows:

<code>
lab46:~/src$ ./file1
Value retrieved from file: 5
Value retrieved from file: 66
Value retrieved from file: 2
Value retrieved from file: 234
Value retrieved from file: 33
Value retrieved from file: 45
Value retrieved from file: 12
Value retrieved from file: 31
Value retrieved from file: 9
Retrieval COMPLETE!
lab46:~/src$
</code>

If **datafile** is missing (or we rename it), the resulting execution will take a different path:

<code>
lab46:~/src$ mv datafile datafile.bak
lab46:~/src$ ./file1
ERROR! Problem opening `datafile'.
lab46:~/src$
</code>

As you'll see in the code, we wrapped an **if** statement around our call to **fopen()**, so if there is an error opening the file, **fPtr** is set to **NULL** and the **else** is executed. Ah, robustness of code.

=====Program 2: Reading a list of users and numbers=====
Our second program will involve reading a list of users (each user has a corresponding id number) from a file and displaying them to **STDOUT** (standard output).

====data file====
Our data file is as follows:

<code>
bob 32367
jimmy 12784
sue 3832
ian 24383
amy 29233
betsy 34211
gil 17443
</code>

Once again, note the formatting: there is a space between the user and the number.

====the program====
The program to access and display the information is as follows:

<code c>
#include <stdio.h>
#include <stdlib.h>

#define USERLEN 8

int main()
{
    FILE *fPtr;
    short int value = 0;
    char *user;
    char filename[] = "data";

    /* room for USERLEN characters plus the null terminator */
    user = (char *) malloc(sizeof(char) * (USERLEN + 1));
    if (user == NULL)
    {
        fprintf(stderr, "ERROR! Out of memory.\n");
        return(1);
    }

    if ((fPtr = fopen(filename, "r")) != NULL)
    {
        /* %8s matches USERLEN, so a long name cannot overflow user;
           a return value of 2 means both fields were converted */
        while (fscanf(fPtr, "%8s %hd", user, &value) == 2)
        {
            printf("%hd retrieved from file for user %s\n", value, user);
        }
        printf("Retrieval COMPLETE!\n");
        fclose(fPtr);
        value = 0;
    }
    else
    {
        printf("ERROR! Problem opening `%s'.\n", filename);
        value = 1;
    }

    free(user);
    return(value);
}
</code>

The UserIDs are **short integers**, so instead of the usual **%d**, we specify **%hd** (see the **scanf** manual page for the other length modifiers).

====compiling====
Usual procedure to compile:

<code>
lab46:~/src$ gcc -o file2 file2.c
lab46:~/src$
</code>

====executing====
If we have **data** located in the same directory as our executable, running the program will appear as follows:

<code>
lab46:~/src$ ./file2
32367 retrieved from file for user bob
12784 retrieved from file for user jimmy
3832 retrieved from file for user sue
24383 retrieved from file for user ian
29233 retrieved from file for user amy
-31325 retrieved from file for user betsy
17443 retrieved from file for user gil
Retrieval COMPLETE!
lab46:~/src$
</code>

Everything appears as it should, but if you take a close look at the line for **betsy**, you'll notice her **UserID** is displayed as **-31325** instead of the **34211** that is in the file (this is unrelated to file access; it is merely a quizzing of programming knowledge).\\ \\
**Why is this?**

----
Answer to the question above:

The **34211** becomes **-31325** because we are dealing with **signed short integers**, and the range for such a data type (with the usual 16-bit short) is **-32768** to **+32767**. **34211** exceeds the upper limit of this data type, so it "rolls over" into the negative range: 34211 - 65536 = -31325.

=====Program 3: Writing data to a file=====
The first two examples dealt with reading data from a file. Now we will look at writing data.

====program====
Source code follows:

<code c>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *fPtr;
    char name[20];
    int age;

    printf("Please enter your name (max 19 characters): ");
    scanf("%19s", name);    /* width 19 leaves room for the '\0' */

    printf("Hello, %s, what is your age? ", name);
    scanf("%d", &age);

    printf("Storing results to output file . . . ");
    if ((fPtr = fopen("output.file", "a")) != NULL)
    {
        fprintf(fPtr, "%s %d\n", name, age);
        printf("done.\n");
        fclose(fPtr);
    }
    else
    {
        fprintf(stderr, "ERROR writing output file.\n");
        exit(1);
    }

    return(0);
}
</code>

Notable changes from the other examples include:

  * fprintf: works like **printf()**, but outputs to the specified file pointer.
  * "a" option in **fopen()**: "r" means open for read, "w" is open for write, and "a" is open for append.

====compiling====
Again, nothing out of the ordinary with respect to compilation:

<code>
lab46:~/src$ gcc -o file3 file3.c
lab46:~/src$
</code>

====execution====
Go ahead and run the program:

<code>
lab46:~/src$ ./file3
Please enter your name (max 19 characters): bob
Hello, bob, what is your age? 46
Storing results to output file . . . done.
lab46:~/src$
</code>

And check the output file (conveniently named **output.file**, as specified in **file3.c**):

<code>
lab46:~/src$ cat output.file
bob 46
lab46:~/src$
</code>

Run the program again and check the output file. You will see what happens when we open the file in **append** mode versus **write** mode.

If you want to see what **write** mode will do, go into **file3.c** and change the mode in the **fopen()** call, recompile, and execute to witness the difference in behavior.

====questions====
Why do you suppose we had the program prompt the user for a **19** character maximum name, when we had declared **name** to be an array of **20** characters?

----
A "string" (aka **array of characters**) is terminated with a **null terminator**, "\0". This symbol, much like the end of line character, "\n", is used to signify when there is no more data. If we put in a 20 character name, there would be no room left in the array for the null terminator, and undesirable side effects could occur.

----
In our code, when we used **scanf()** to obtain the name, we did **not** need an **&** before **name**. Why is this?

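----
Our array name, **name**, already acts as a pointer to its first character (type "char *") when passed to a function, which is exactly what **%s** expects. If we were to pass **&name** instead, we would be handing over a pointer to the whole array (type "char (*)[20]"), and the compiler may complain that this does not match the "char *" it expected.

----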
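Both answers above can be checked with a short sketch. To avoid needing keyboard input it uses **sscanf()**, which reads from a string rather than STDIN, as a stand-in for the **scanf()** calls in **file3.c**; the helper name **parse_record()** and the **NAMELEN** constant are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

#define NAMELEN 19   /* longest name we accept, not counting the '\0' */

/* Hypothetical helper: parse "name age" from `line`.
 * Stores the number of characters kept for the name in *kept and
 * returns how many fields sscanf() converted. */
int parse_record(const char *line, size_t *kept)
{
    char name[NAMELEN + 1] = "";   /* 19 characters plus the terminator */
    int  age = 0;
    int  fields;

    /* `name` acts as a char * here (the same address as &name[0]), so
     * no & is needed; `age` is a plain int, so we pass its address.
     * The width in %19s keeps sscanf() from writing past the array. */
    fields = sscanf(line, "%19s %d", name, &age);

    *kept = strlen(name);
    return fields;
}
```

For the input "bob 46" both fields should convert and three characters are kept. For a 26-letter name, only the first 19 characters are stored; the leftover letters then fail to convert as an age, so only one field is reported.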