Corning Community College

CSCS1320 C/C++ Programming

~~TOC~~

======Project: STRUCTURES and FILE ACCESS (sfa0)======

=====Objective=====
To begin the exploration of structures, and to explore file access functionality.

=====References=====
Some notable references:

  * Chapter 6 ("Structures") in [[http://publications.gbdirect.co.uk/c_book/chapter6/|The C Book]] for additional information on structures (and derived types).
  * Chapter 9 ("Libraries"):
    * Chapter 9.10 ("Input and Output") in [[http://publications.gbdirect.co.uk/c_book/chapter9/input_and_output.html|The C Book]] for information on I/O functionality.
    * Chapter 9.11 ("Formatted I/O") in [[http://publications.gbdirect.co.uk/c_book/chapter9/formatted_io.html|The C Book]] for information on using fprintf()/fscanf().

=====Background=====
This project will deal with two independent concepts:

  * structures (and derived types)
  * file access

Please note, the two are not related, although as with many things, they may often be used together (file access is **extremely** useful, and will often find pairings with all our covered topics: variables, selection statements, loops, pointers, arrays, and now structures too). I've held off on really covering it until we've gotten more of the basics down, so that you can better understand the power it offers to your programming toolkit.

But first things first, the structure.

====structures====
In some respects, there are 2 classifications of variables:

  * scalar: it is (and can only be) precisely one thing. Things like **char**, **int**, and **float** fit this classification. They can only interact with their defined type, and they can only hold one such value of that type.
  * composite: composed of potentially many things.
We actually have two main composite types in C:

  * arrays: composed of 1 or more of the same type (like 5 **int**s all packed together)
  * structures: composed of 1 or more of any type (like a **char**, 2 **int**s, an array of **float**, pointers, even additional structures)

Both **arrays** and **structures** need to be given substance: they are nothing on their own. The same goes for **pointers**, to which arrays are closely related (in most expressions an array name is treated as a pointer to its first element, although an array is not itself a pointer, and a pointer is not an array). In that respect they are sort of like **adjectives** (descriptor attributes): they describe a property of a **noun** (or thing where attributes can be applied).

For example:

  * The green frog ate the cricket.

The adjective "green" DESCRIBES an attribute of the frog. "green" by itself does not make sense in that context:

  * The green ate the cricket.

See what I mean? The green WHAT??

Same goes with arrays and structures. They both enhance things, but cannot be singular entities by themselves:

  * an array of **int** allows us to have 1 or more consecutive integers:
    * int my_array[4];
  * but we cannot just have an "array":
    * my_array[4];

... an array of WHAT? The compiler will yell.

Same thing with structures, only structures (or the **struct** keyword) let us pack in any combination of types (including arrays and structs).

Just for clarification, you can also have "arrays of arrays" (argv, the second main() function argument, is commonly used like one) and "arrays of structs". But an array specifically focuses on duplicating ONE THING, whatever that thing is... a struct encapsulates a collection of potentially disparate things.

====struct definition syntax====
Declaring a struct may appear similar to declaring a function, in that there are braces and things inside those braces.
For instance:

  struct stuff
  {
      int value;
      char code;
      short int range[999];
      char *name;
  };

Here we have a struct (specifically, a "stuff struct"; that is the type, as structs are nothing on their own) that contains 4 entities, of varying types:

  * a scalar int
  * a scalar char
  * an array (composite) of short int
  * a pointer to char (to be used as a string, or array of char)

Note one important distinguishing syntactical detail of structures: you **MUST** terminate the definition with a semi-colon (it is a declaration, and like variable declarations, it is terminated with a semi-colon).

====declaring a structure====
To declare a struct, we pretty much do the same as when declaring a variable.

In our case, if we have a "stuff struct" as defined above, we'd need to **declare** an instance of it in order to make use of it. Let's make a "stuff struct" variable by the name of **thing**:

  struct stuff thing;

Bam! Please take note of how precisely like any other variable this declaration is... in our case, **thing** is of the "type" //struct stuff//.

====accessing structure members====
In C (and by extension, C++), there are two structure access operators, depending on whether we are working with the structure itself or with a pointer to it (yes, we can even have pointers to structs).

===non-pointer structure access===
If we are dealing with a non-pointer struct, we use the '.' (dot) operator to access the structure member (so, using our **thing** variable declared above to assign information to its members):

  thing.value = 37;
  thing.code = 'C';
  thing.range[59] = 1337;

With arrays and pointers (especially if used as something like a string), we'll probably be routinely combining them with loops, as we cannot access ALL elements in ONE statement.
===pointered structure access===
If the struct we are dealing with has been declared as a pointer (which is the common approach in regular usage, especially as used and interacted with in the C library functions):

  struct stuff *something;

There are actually two ways of accessing it: using the pointer dereferencing operator, or using the structure pointer operator. Both arrive at the same ends (the structure pointer operator is a shortcut to make our code look nicer). In either case, the pointer must first be pointing at an actual struct (for example, by assigning it the address of **thing**) before its members can be accessed.

==pointer dereferencing (note the parentheses)==

  (*something).value = 64;

==structure pointer access==

  something->value = 64;

The structure pointer operator just makes the code look cleaner, so it is the recommended way of accessing elements (when the structured variable has been declared as a pointer, that is).

====Files====
While information is unique down to bits, and the computer accesses data in units of bytes, information is often packaged and made available to us in containers known as **files**.

Like a structure, there is no stipulation on what sort of data goes in a generic file; although like an array, a file can be accessed as a sequence of bytes, from a starting index or offset.

C and other production-oriented programming languages provide file I/O functionality, which can greatly increase the usability of the programs we write. From reading in external data sets to loading/storing pertinent data in specific patterns (i.e., formats), file access, like pointers and memory access, contributes to the power and influence of C (and its derivatives).

====Point of access: pointer or descriptor====
C programs have two common ways of accessing a file, and a group of library functions that goes with each access method:

  * FILE pointers (**FILE ***)
  * file descriptors (**int**)

FILE pointers are just that: a pointer to a special type of struct, which provides an interface to the file's contents.
File descriptors are the lower-level of the two: instead of a struct, the interface is built around a unique numeric value (somewhat like a "take a number" approach to service), and the functions that use them (such as open(), read(), and write()) are provided by the operating system rather than by the standard C library.

For the purposes of this project and this course, we will be focusing on file access via the use of FILE pointers. Just be aware that there are corresponding functions that make use of the file descriptor concept.

====Methods of file access====
To interact with a file, we must do so in accordance with a fixed set of actions (which are woven into the various C library file functions), some common ones of which are:

  * create
  * open
  * read
  * write
  * append
  * execute
  * remove
  * close

We'll be specifically focusing on opening, reading, writing, and closing files for this project.

A point of distinction on "write" vs. "append": when you open a file for writing, you start from its beginning, and any existing content is discarded; when you open for appending, you start at its end, adding to (appending) existing content.

====Declaring a FILE pointer====
To access a file, we must first have an instance of the file access interface to interact with. As indicated above, this comes in the form of a FILE pointer, which for us will take the form of a variable:

  FILE *fp = NULL;

There's nothing magical about the name; you may find "fp" being a common variable name used for file pointers (fp = file pointer), but in the case of multiple file access, even seeing names like **in**, **out**, **inp**, **outp** is not uncommon. Again, the idea is to make your variable names descriptive enough so as to be a form of documentation in and of themselves.

====Opening a file====
To gain access to a file, we must formally OPEN it. The C library provides us with the **fopen()** function, which takes 2 arguments:

  - PATH and name of the file (location)
  - means of access (mode)

If no extensive path information is given, the program knows to look in the current working directory.
For predictability, any program seeing wider usage (and so possibly being run from different directories) should be referencing an absolute path to reduce potential access complications.

As for modes, there are 3 main modes we will be focusing on (there are other combinations, but they are generally utilized in more advanced usage; just stick with these for now):

  * read: "r"
  * write: "w"
  * append: "a"

Both the location and mode parameters are strings (arrays of char, ending with a NUL terminator, '\0').

If we wanted to open the file "output.txt" for writing, we would say:

  fopen("output.txt", "w");

**fopen()** returns a FILE pointer, so in order to make use of it, we need to connect its return value with our FILE pointer, as follows:

  fp = fopen("output.txt", "w");

If there was a problem opening the file by the requested mode, **fopen()** returns **NULL** (a great thing to check for to determine whether or not we were successful).

====Writing to a file====
At this point, we can write (or output) to this "output.txt" file, via our **fp** variable.

Output can be done with a familiar function we've been using all along: **fprintf()**, the first argument of which is a **FILE pointer**.

This would write "hello, world!\n" to our output file:

  fprintf(fp, "hello, world!\n");

This is why I've been having us use **fprintf()** all along (instead of the **printf()** shortcut), so that the interface would already be familiar to you, and for you to conceptually see that outputting to the screen and outputting to a file are indistinguishable (because they are the same thing to the operating system: EVERYTHING is a file).
====Reading from a file====
If we had instead opened our file for reading, we could read from it the same way we obtain keyboard input: **fscanf()**

For example, this will read an unsigned short integer from the file pointed to by our FILE pointer into the variable **value**, which must be declared as an **unsigned short int** (and this works ONLY if we opened the file for reading):

  fscanf(fp, "%hu", &value);

====When the data ends: End of File checking====
When you are reading from a file and have exhausted its contents, the next attempt to read will fail: the reading function reports this through its return value (**fscanf()** returns the special value **EOF**), and a status indicator is set in our FILE pointered struct to indicate that the end of file has been reached.

A good function to use is **feof()**, which takes a FILE pointer to check, and returns a nonzero value if the end of file has been reached (great for using in selection statements or loops as a combined check and termination combo!). Keep in mind that the indicator is only set once a read has actually run into the end of the file, so perform the check after attempting a read, not before.

====Responsible file access====
Doing any digging you will see that it is entirely within our ability to open files for reading AND writing; I would caution you against this, because there are issues of data corruption and/or data loss at stake if we're not careful. For now, just focus on ONE action, the current intended action. If you need to switch back and forth, **close** the file and open it in the new desired mode. That will maintain the integrity of your data, especially when first learning (as we are now).

In time, when the complexity/demands of your programs call for it, you can start to dabble and experiment with such functionality.

====closing a file====
Just as you are the highly responsible and respectful individual when returning rented VHS tapes to the rental store, or loaned cassettes or 8-tracks to the library (WHAT?), you are kind enough to first put the media in a state where it is no longer connected to your processing environment (you ejected it).
In the case of file access in C, that means remembering to **CLOSE** the file when we are done with it, and that can be done with the **fclose()** function:

  fclose(fp);

Having a file open allocates additional resources. Forgetting to close the file when done keeps those resources in use. Granted, they will be deallocated on program exit (and our programs are all quick to execute at this point), but it is a good habit to perform proper file management by closing files when we are done with them, so when our programs are more complex, that won't be a bug that needs tracking down.

=====Program=====
For this project, mixing together all the skills we've previously learned and just learned is in order.

In **/var/public/cprog/sfa0/** is a file called **datafile.db**, which contains several records of the following format:

  NAME     # 1S 1T 2S 2T 3S 3T ... #S #T

Where:

  * the first 8 characters correspond to an ALL UPPERCASE name of an individual (name doesn't have to occupy all 8 characters, but will not exceed 8 characters)
  * the # refers to the number of score pairs associated with that individual (will not exceed 127)
  * then a set of number pairs (labeled S and T, for Score and Total), that are associated with that individual (individual numbers will not exceed 65535)

It will be your task to write a program that:

  * opens that file:
    * via its absolute path
    * for reading
  * loads the data for each line into a custom struct you've made that contains the following elements:
    * place to store the person's name
      * store the person's name with a leading Uppercase, but all other characters represented in lowercase; for example: MOANA becomes Moana
    * array to store the person's scores
    * array to store the corresponding totals for each score
    * array to store the average of each score:total pair
    * element to store the tally of all the scores
    * element to store the tally of all the totals
    * element to store the average of the averages
    * element to store the average of the tallied scores:tallied totals
  * opens the local file: **sfa0.out** for **writing**
  * stores the processed results you have in memory (in your structs), in the following format:

  Name:#:scoreTally:scoreTotal:avgofAverages:averageofTallies:#s,#t;...;2s,2t;1s,1t

Of particular note:

  * the name of the individual is written in its converted form: an Uppercase lead-in letter followed by all lowercase
  * category fields are separated by colons ':'
  * averages should be truncated 2 places after the decimal point
    * if rounding occurs, so be it; if not, don't worry about it
  * the written out score pairs are done so in reverse order (last to first, although score still precedes total)
    * the score is separated from the total by a comma ','
    * the field separators in the score pairs field are semi-colons ';'

For example, if the source data was:

  KRIS     2 13 17 9 18

The corresponding line written out to **sfa0.out** would be:

  Kris:2:22:35:63.24:62.86:9,18;13,17

Additional constraints:

  * use FILE pointers and FILE pointer-oriented C library functions (fopen(), fprintf(), fscanf(), fclose())
  * close all open files when done
  * you must have and use 2 FILE pointers

=====Review of Compiling/Executing=====
Just to review the compilation/execution process for working with your source code, if we had a file, **hello.c**, that we wished to compile to a binary called **hello**, we'd first want to compile the code, as follows:

  lab46:~/src/cprog$ gcc -Wall --std=c99 -o hello hello.c
  lab46:~/src/cprog$ 

Assuming there are no syntax errors or warnings, and everything compiled correctly, you should just get your prompt back. In the event of problems, the compiler will be sure to tell you about them.

Conceptually, the arrangement is as follows:

  gcc -Wall --std=c99 -o BINARY_FILE SOURCE_FILE

The BINARY_FILE comes **immediately after** the **-o**, **NOT** the SOURCE_FILE (the SOURCE_FILE must never **immediately** follow a **-o**). The SOURCE_FILE can instead precede the **-o**, and such is perfectly valid (especially if you find that way more intuitive).
The **-Wall** (enable most of the compiler's commonly useful warnings) and **--std=c99** (switch compiler to use the **C99** standard of the C language) are options given to the compiler.

To execute your binary, we need to specify a path to it, so we use **./**, which basically references the current directory:

  lab46:~/src/cprog$ ./hello
  Hello, World!
  lab46:~/src/cprog$ 

=====Submission=====
To successfully complete this project, the following criteria must be met:

  * Code must compile cleanly (no warnings or errors)
    * Use the **-Wall** and **--std=c99** flags when compiling.
  * Output must be correct, and resemble the form given in the sample output above.
  * Code must be nicely and consistently indented (you may use the **indent** tool)
  * Code must utilize the algorithm presented above
  * Code must establish and utilize the functions described above
  * Code must be commented (and those comments relevant)
  * Track/version the source code in a repository
  * Submit a copy of your source code to me using the **submit** tool.

To submit this program to me using the **submit** tool, run the following command at your lab46 prompt:

  $ submit cprog sfa0 sfa0.c sfa0.out
  Submitting cprog project "sfa0":
      -> sfa0.c(OK)
      -> sfa0.out(OK)
  
  SUCCESSFULLY SUBMITTED

You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or locational mismatches.