Corning Community College


Data Structures



Simple File Access using fprintf()/fscanf()

Objective

To look at how to access files (reading, writing, appending) in order to load or store data from our programs.

Background

Files are one of the fundamental information units we deal with on the computer. Under UNIX/Linux systems, there are three types of files:

  • regular
  • directory
  • special

Although there are ways to access directory and special files, this document looks only at regular files. A regular file is a sequence of bytes, and most programming languages treat it as one of two types:

  • text
  • binary

A text file consists of ASCII characters and is meant to be accessed by text viewing facilities (text editors, for instance). All of the source code you've written is stored in regular text files.

A binary file consists of non-ASCII, binary data. The data is stored according to some format (as determined by the application); it could contain instructions, raw data, image data, audio data, etc.

This document will be addressing the accessing of text files.

With files, there are three modes of access:

  • read - retrieve data from the file (file pointer starts at beginning of file)
  • write - store data to the file (any existing contents are discarded; file pointer starts at beginning of file)
  • append - store data to the file (file pointer starts at the end of file)

When you access the file, you indicate the type of access you desire. Data is accessed at the location of the file pointer, which is basically accessing the file at some offset.

When you read data, it is retrieved from the file, and the file pointer is then advanced past it (to the next “field” or “record” of data). Writing and appending work similarly: data is stored in the file at the offset of the file pointer, and afterwards the file pointer is located just after that data.

Program 1: Reading a list of numbers

Our first program will involve reading a list of numbers from a file and displaying them to STDOUT (standard output).

data file

Our data file is as follows:

datafile
5
66
2
234
33
45
12
31
9

As you can see, the format is pretty straightforward; single number per line.

The formatting IS quite important. Your program is written to expect data in a certain format; if we slipped and put two numbers on one line, or a word where a number belongs, it could lead to some problems.

the program

The program to access and display this data is as follows:

file1.c
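#include <stdio.h>
#include <stdlib.h>
 
int main(void)
{
    FILE *fPtr;
    int value = 0;
    int status;                       /* result of each fscanf() call */
    char filename[] = "datafile";
 
    if ((fPtr = fopen(filename, "r")) != NULL)
    {
        /* fscanf() returns the number of items converted, or EOF */
        while ((status = fscanf(fPtr, "%d", &value)) != EOF)
        {
            if (status != 1)          /* not a number: stop, don't loop forever */
            {
                fprintf(stderr, "ERROR! Bad data in `%s'.\n", filename);
                fclose(fPtr);
                return(1);
            }
            printf("Value retrieved from file: %d\n", value);
        }
        printf("Retrieval COMPLETE!\n");
        fclose(fPtr);
        value = 0;
    }
    else
    {
        printf("ERROR! Problem opening `%s'.\n", filename);
        value = 1;
    }
    return(value);
}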

Of note here are the following:

  • FILE: the data type the C library uses to represent an open file. We declare a pointer to it (FILE *), and that file pointer is our point of access to the file.
  • fopen: this function (in the C library) handles the opening of a file. It returns NULL if there is an error.
  • fscanf: a variant of the scanf() function; it accepts a file pointer argument and reads from that file instead of STDIN. It returns the number of items it converted.
  • EOF: the value fscanf() returns when it runs out of input before converting anything, which is how we detect the end of the file
  • fclose: this will close the file we've been accessing (important: always remember to close your files when done!)

compiling

Usual procedure to compile:

lab46:~/src$ gcc -o file1 file1.c
lab46:~/src$ 

executing

If we have datafile located in the same directory as our executable, running the program will appear as follows:

lab46:~/src$ ./file1
Value retrieved from file: 5
Value retrieved from file: 66
Value retrieved from file: 2
Value retrieved from file: 234
Value retrieved from file: 33
Value retrieved from file: 45
Value retrieved from file: 12
Value retrieved from file: 31
Value retrieved from file: 9
Retrieval COMPLETE!
lab46:~/src$ 

If datafile is missing (or we rename it), resulting execution will take a different path:

lab46:~/src$ mv datafile datafile.bak
lab46:~/src$ ./file1
ERROR! Problem opening `datafile'.
lab46:~/src$ 

As you'll see in the code, we wrapped an if statement around our call to fopen(), so if there was an error opening the file, fPtr would be set to NULL, and the else would be executed. Ah, robustness of code.

Program 2: Reading a list of users and numbers

Our second program will involve reading a list of users (and each user has a corresponding id number) from a file and displaying them to STDOUT (standard output).

data file

Our data file is as follows:

data
bob 32367
jimmy 12784
sue 3832
ian 24383
amy 29233
betsy 34211
gil 17443

Once again, note the formatting… there is a space between the user and number.

the program

The program to access and display the information is as follows:

file2.c
#include <stdio.h>
#include <stdlib.h>
 
#define USERLEN 8
 
int main(void)
{
    FILE *fPtr;
    short int value = 0;
    char *user;
    char filename[] = "data";
 
    user = (char *) malloc (sizeof(char)*(USERLEN+1));
    if (user == NULL)
    {
        fprintf(stderr, "ERROR! Out of memory.\n");
        return(1);
    }
 
    if ((fPtr = fopen(filename, "r")) != NULL)
    {
        /* %8s stores at most USERLEN characters plus the terminator;
           keep reading only while both fields convert */
        while(fscanf(fPtr, "%8s %hd", user, &value) == 2)
        {
            printf("%hd retrieved from file for user %s\n", value, user);
        }
        printf("Retrieval COMPLETE!\n");
        fclose(fPtr);
        value = 0;
    }
    else
    {
        printf("ERROR! Problem opening `%s'.\n", filename);
        value = 1;
    }
    free(user);
    return(value);
}

The UserIDs are short integers, so instead of the usual %d, we specify %hd (check the manual for other type alterations).

compiling

Usual procedure to compile:

lab46:~/src$ gcc -o file2 file2.c
lab46:~/src$ 

executing

If we have data located in the same directory as our executable, running the program will appear as follows:

lab46:~/src$ ./file2
32367 retrieved from file for user bob
12784 retrieved from file for user jimmy
3832 retrieved from file for user sue
24383 retrieved from file for user ian
29233 retrieved from file for user amy
-31325 retrieved from file for user betsy
17443 retrieved from file for user gil
Retrieval COMPLETE!
lab46:~/src$ 

Everything appears as it should, but if you take a close look at the line for betsy, you'll notice her UserID is displayed as -31325 instead of the 34211 that is in the file. (This is unrelated to file access; it is merely a quiz of your programming knowledge.)

Why is this?


Answer to the question above:

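The 34211 becomes -31325 because we are dealing with signed short integers, which on this system are 16 bits wide with a range of -32768 to +32767. 34211 exceeds the upper limit of this data type, so it “rolls over” into the negative range: 34211 - 65536 = -31325. (Strictly speaking, the C standard does not define what fscanf() stores when a value does not fit in the variable; wrapping around is simply what this system does.)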

Program 3: Writing data to a file

The first two examples dealt with reading data from a file. Now we will look at writing data.

program

Source code follows:

file3.c
#include <stdio.h>
#include <stdlib.h>
 
int main(void)
{
    FILE *fPtr;
    char name[20];
    int age;
 
    printf("Please enter your name (max 19 characters): ");
    /* %19s leaves room for the null terminator in name[20] */
    if (scanf("%19s", name) != 1)
    {
        fprintf(stderr, "ERROR reading name.\n");
        exit(1);
    }
    printf("Hello, %s, what is your age? ", name);
    if (scanf("%d", &age) != 1)
    {
        fprintf(stderr, "ERROR reading age.\n");
        exit(1);
    }
    printf("Storing results to output file . . . ");
    if ((fPtr = fopen("output.file", "a")) != NULL)
    {
        fprintf(fPtr, "%s %d\n", name, age);
        printf("done.\n");
        fclose(fPtr);
    }
    else
    {
        fprintf(stderr, "ERROR writing output file.\n");
        exit(1);
    }
 
    return(0);
}

Notable changes from the other examples include:

  • fprintf: works like printf(), but outputs to specified file pointer.
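  • “a” option in fopen(): “r” means open for read, “w” is open for write (creating the file, or discarding its contents if it already exists), and “a” is open for append (creating the file if it does not exist yet).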

compiling

Again, nothing out of the ordinary with respect to compilation:

lab46:~/src$ gcc -o file3 file3.c
lab46:~/src$ 

execution

Go ahead and run the program:

lab46:~/src$ ./file3
Please enter your name (max 19 characters): bob
Hello, bob, what is your age? 46
Storing results to output file . . . done.
lab46:~/src$ 

And check the output file (conveniently named output.file, as specified in file3.c):

lab46:~/src$ cat output.file
bob 46
lab46:~/src$ 

Run the program again and check the output file. You will see what happens when we open the file in append mode versus write mode. If you want to see what write mode will do, go into file3.c and change the mode in the fopen() call, recompile, and execute to witness the difference in behavior.

questions

Why do you suppose we had the program prompt the user for a 19 character maximum name, when we had declared name to be an array of 20 characters?


A “string” (aka array of characters) is terminated with a null terminator, “\0”. This symbol, much like the end of line character, “\n”, is used to signify when there is no more data. If a 20 character name were stored, there would be no room left in the array for the null terminator; it would be written past the end of the array, and undesirable side effects could occur.


In our code, when we used scanf() to obtain the name, we did not need an & before name. Why is this?


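In most expressions an array name such as name is automatically converted to a pointer to its first element (a “char *”), which is exactly what %s expects. Writing &name instead yields a pointer to the whole array (type “char (*)[20]”); it holds the same address, but the type no longer matches what %s expects, and the compiler may warn about it.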


haas/fall2010/data/using_files.txt · Last modified: 2010/09/19 17:35 by 127.0.0.1