Corning Community College
CSCS1730 UNIX/Linux Fundamentals
======Project: Archive and Data Manipulation (adm0)======
=====Errata=====
This section will document any updates applied to the project since original release:
* __revision #__: (DATESTRING)
=====Objective=====
To begin putting your skills to work accomplishing tasks and solving problems on the system.
=====Prerequisites=====
To successfully accomplish/perform this project, the listed resources/experiences need to be consulted/achieved:
* ability to read the manual pages and use the information therein
* ability to copy, move, and list files
* ability to navigate around the filesystem
=====Toolbox=====
It would be especially useful to review the manual pages or any documentation on the following resources:
* **cp**(**1**)
* **mv**(**1**)
* **ls**(**1**)
* **mkdir**(**1**)
* **tar**(**1**)
* **xz**(**1**)
* **gzip**(**1**)
* **bzip2**(**1**)
* **zip**(**1**)
* **tac**(**1**)
* **rev**(**1**)
* **cat**(**1**)
* **file**(**1**)
* **uudecode**(**1**)
* **md5sum**(**1**)
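Two of the less familiar tools in this list, **rev** and **tac**, are easy to get a feel for with a throwaway file. The following is only a small sketch: **demo.txt** and the other file names are made up for illustration and have nothing to do with the project data.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# demo.txt is a hypothetical three-line file.
printf 'first line\nsecond line\nthird line\n' > demo.txt

# rev reverses the characters within each line.
rev demo.txt > reversed_chars.txt

# tac reverses the order of the lines (it is cat spelled backwards).
tac demo.txt > reversed_lines.txt

cat reversed_chars.txt
cat reversed_lines.txt

# Applying the same tool a second time undoes the transformation.
rev reversed_chars.txt | cmp -s - demo.txt && echo "rev undone"
tac reversed_lines.txt | cmp -s - demo.txt && echo "tac undone"
```

Because each tool is its own inverse, running it again on transformed data gets you back to where you started.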
=====Background=====
When we talk about archives, there are commonly two separate actions taking place. Sometimes they are intertwined; at other times they are discrete steps.
They are:
* archiving / extracting
* compression / decompression
Archives are merely a manifestation of a common computing concept: a container.
Containers encapsulate things: in this case, files. The fact that UNIX tries to treat everything as a file makes this ability all the more useful.
Compression, on the other hand, is an action performed on a single file. Using various algorithms, we accomplish a sort of "more in less": we take the data present and cram it into a smaller box (file), with the aim of taking up less storage on the filesystem (which also makes copying easier).
Many compression algorithms exist. They commonly fall into two categories:
* [[http://en.wikipedia.org/wiki/Lossless_data_compression|lossless]] - no data is lost as a part of the compression process
* [[http://en.wikipedia.org/wiki/Lossy_data_compression|lossy]] - data judged unnecessary is discarded as part of the compression process, so the original cannot be reconstructed exactly
Wikipedia has categories identifying various algorithms implemented for both [[http://en.wikipedia.org/wiki/Category:Lossless_compression_algorithms|lossless]] and [[http://en.wikipedia.org/wiki/Category:Lossy_compression_algorithms|lossy]] compression algorithms.
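The lossless property is easy to check for yourself. As a minimal sketch (the file **sample.txt** is a hypothetical file generated just for this demonstration), we can checksum a file, compress and decompress it with **gzip**, and checksum it again:

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a repetitive text file; repetition compresses well.
i=0
while [ "$i" -lt 200 ]; do
    echo "the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

before=$(md5sum < sample.txt)
size_before=$(wc -c < sample.txt)

gzip sample.txt                        # replaces sample.txt with sample.txt.gz
size_packed=$(wc -c < sample.txt.gz)
gunzip sample.txt.gz                   # restores sample.txt

after=$(md5sum < sample.txt)

echo "original: $size_before bytes, compressed: $size_packed bytes"
[ "$before" = "$after" ] && echo "round trip preserved every byte"
```

The checksums match after the round trip, which is exactly what "no data is lost" means in practice.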
Where confusion may set in is when a tool combines the actions of archival AND compression. But if you think about it, even in such cases, we always end up with one file, and that file is compressed (unless we have a concatenation of separately compressed files into a single file).
Archives are useful in that they let us pack items together. If something needs 100 files, making a copy of it, or copying/installing it onto another system, would be more complex if we had to deal with each of those files individually. Archives simplify the problem by providing all those files within a single file (lessening opportunities for error). So, archives make our lives easier.
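To make the distinction between the two actions concrete, here is a small sketch that performs them first as discrete steps and then intertwined. The file names are hypothetical, and the **-z** option is assumed to be available in your **tar** (it is in GNU tar; check your manual page).

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three hypothetical files standing in for a larger collection.
echo "alpha" > one.txt
echo "beta"  > two.txt
echo "gamma" > three.txt

# Discrete steps: archive first (many files become one), then compress.
tar -cf bundle.tar one.txt two.txt three.txt
gzip bundle.tar                        # produces bundle.tar.gz

# Intertwined: -z asks tar to run the data through gzip itself.
tar -czf combined.tar.gz one.txt two.txt three.txt

# Either way, the result is one compressed file holding three members.
tar -tzf bundle.tar.gz
tar -tzf combined.tar.gz
```

In both cases the container (**tar**) and the compression (**gzip**) are still separate concepts; the second form just hides the seam.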
=====Procedure=====
In the UNIX Public Directory you will find a **spring2018/unix/adm0/** subdirectory.
There you will find directories named for each user in the class this semester. Locate yours and go into it. There you should find four files:
* adm0a.zip
* adm0b.tar.gz
* adm0c.tar.bz2
* adm0d.tar.xz
You'll want to make a copy of these files to some project-specific working directory in your home directory (**~/src/unix/projects/adm0/**, perhaps?)
Essentially, I want you to do the following:
- Figure out the format of the archive, and read up on the available tools for manipulating it
- Extract the contents of the archive and study it (contents will extract to the current working directory, so you WILL want to be in a custom project directory)
- Analyze the files extracted from the archive. Each file will ultimately be readable plain text (in English), but some may be encoded or compressed or otherwise manipulated and will need further processing to get to the final readable state.
* Once in their readable states, name the files **a**, **b**, **c**, **d**, **e**, **f**, **g**, **h**, in order of their file sizes (in bytes), from least to greatest.
- Place these single-lettered files in a new **tar** archive called **adm0e.tar** (add the files from the current directory; do not embed any directory information in the archive).
- Compress it (max compression) with **gzip**; it should now be called **adm0e.tar.gz**
* you are going to submit this archive
- In addition to the created archive, you will also submit a text file named **adm0steps**, which will contain the step-by-step command lines you used to copy, extract, manipulate, and rename the files, and to create and compress the new archive **adm0e.tar.gz**
* The file should JUST contain the exact commands you used, in order from start to finish. If you'd like to add any additional commentary, prefix it with a # sign.
* Commands should be left justified, one command-line per line (lines can wrap).
* With the exception of referencing data in the UNIX public directory (which should use absolute paths), all manipulations within your home directory should use relative paths.
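The last few steps of the procedure (ordering by size, renaming, archiving without directory information, and compressing at the maximum level) can be sketched on some made-up data. Everything named here (**notes.txt**, **memo.txt**, **draft.txt**, **demo.tar**) is hypothetical and only three files are used; this is an illustration of the technique, not the project's actual data or the one right way to do it.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for already-readable files of different sizes.
printf 'aaaaaaaaaaaa\n' > notes.txt    # 13 bytes
printf 'bb\n'           > memo.txt     # 3 bytes
printf 'cccccc\n'       > draft.txt    # 7 bytes

# Show sizes in bytes, smallest first, to decide on the naming order.
wc -c notes.txt memo.txt draft.txt | sort -n

# Rename by ascending size (only three files here, so a, b, c).
mv memo.txt  a
mv draft.txt b
mv notes.txt c

# Name the members directly from the current directory, so the
# archive stores bare file names with no directory prefix.
tar -cf demo.tar a b c

# -9 selects gzip's highest compression level; demo.tar becomes demo.tar.gz.
gzip -9 demo.tar

# List the members to confirm there are no leading directories.
tar -tzf demo.tar.gz
```

Listing the archive before submitting is a cheap way to catch an embedded path such as **projects/adm0/a**, which would appear in the listing exactly as stored.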
=====Verification=====
One of the tests I will perform for compliance will involve comparing your final **a** through **h** files against the expected results, to see whether they conform to project specifications.
I will make use of a checksum to verify exactness.
You will need to run this from your adm0 project directory, where your individual a-h files are located.
You can check your project by typing in the following at the prompt:
lab46:~/src/unix/adm0$ pchk unix adm0
If all aligns, you will see this:
==================================================
= UNIX adm0 project output validation tool =
==================================================
adm0 checksum is: 9d24739da4078bdd36edd9fcdeef3c1b
your checksum is: 9d24739da4078bdd36edd9fcdeef3c1b
==================================================
verification: SUCCESS!
==================================================
If something is off, your checksum will not match the adm0 checksum, and verification will instead say "**MISMATCH**", as follows (note that a mismatched checksum can be anything, and will likely not be what is seen in this example):
==================================================
= UNIX adm0 project output validation tool =
==================================================
adm0 checksum is: 9d24739da4078bdd36edd9fcdeef3c1b
your checksum is: 810f36299b37f46896c3624920cdefbb
==================================================
verification: MISMATCH
==================================================
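Exactly how **pchk** computes its checksum is not described here, but the general idea can be illustrated with **md5sum** from the Toolbox. In this sketch (all file names are hypothetical), identical content always yields the same checksum, while a single changed character yields a completely different one:

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'hello world\n' > expected.txt
printf 'hello world\n' > same.txt
printf 'hello World\n' > different.txt   # one character changed

# Keep only the hash field of md5sum's output.
sum_expected=$(md5sum < expected.txt | cut -d' ' -f1)
sum_same=$(md5sum < same.txt | cut -d' ' -f1)
sum_different=$(md5sum < different.txt | cut -d' ' -f1)

echo "expected:  $sum_expected"
echo "same:      $sum_same"
echo "different: $sum_different"

[ "$sum_expected" = "$sum_same" ] && echo "identical content: checksums match"
[ "$sum_expected" != "$sum_different" ] && echo "one changed byte: checksums differ"
```

This is why a mismatch tells you that something differs, but not what: a stray blank line or a wrongly ordered file is enough to change the result.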
=====Submission=====
To successfully complete this project, the following criteria must be met:
* Submit a copy of your archive (**adm0e.tar.gz**) and your **adm0steps** file to me using the **submit** tool.
To submit this project to me using the **submit** tool, run the following command at your lab46 prompt:
$ submit unix adm0 adm0e.tar.gz adm0steps
Submitting unix project "adm0":
-> adm0e.tar.gz(OK)
-> adm0steps(OK)
SUCCESSFULLY SUBMITTED
You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or locational mismatches.
I'll be evaluating the project based on the following criteria:
52:adm0:final tally of results (52/52)
*:adm0:archive submitted [4/4]
*:adm0:archive has correct name of adm0e.tar.gz [4/4]
*:adm0:archive is max compressed with gzip [4/4]
*:adm0:archive is a tar archive [4/4]
*:adm0:archive extracts into current directory [4/4]
*:adm0:archive contains 8 English readable files [4/4]
*:adm0:archived files are named a-h [4/4]
*:adm0:archived files named in order of size [4/4]
*:adm0:instructions submitted in text file [4/4]
*:adm0:instructions in file named adm0steps [4/4]
*:adm0:adm0steps contains list of instructions for accomplishing task [4/4]
*:adm0:adm0steps instructions are accurate and correct [4/4]
*:adm0:adm0steps any extra information after hash mark [4/4]