\\ Corning Community College \\ UNIX/Linux Fundamentals \\ \\ Case Study 0x2: Archive Handling \\ \\

~~TOC~~

====Objective====

To become familiar with archives, their purpose, and how to create and extract them.

====History====

UNIX has been around since the beginning of time (or at any rate since the start of the UNIX Epoch: 12:00:00am UTC on January 1, 1970), and magnetic tapes were in widespread use during that era. So in common UNIX tradition, we have a utility on the system that allows us to create archives. The UNIX tar utility is short for "tape archive", and allows us to combine a set of files together as one long string of data for easy storage or transportation.

As time went by, people realized that compressing the data would make better use of our resources: in essence, we could fit more in the same amount of space (or less). There are many forms of compression in existence. For this course, we will rely on one of the most popular (but not necessarily best compressing): GNU zip, available to us in the gzip and gunzip utilities.

====Background====

Archives have been around forever. They provide an easy way to keep a bunch of files in one place to send to a backup device or to another computer. The advantages of backups are tremendous.

In the early days, magnetic tapes were used to back up critical information (in fact, for the most part they are still the de facto large-volume backup medium). Tapes are a linear storage medium; that is, there is a beginning and an end. The tape head (which can read/write information) can move (or "seek") between the starting and ending points of the tape at a fixed speed. A representation is as follows:

{{wiki:tape.png|Illustration of a linear data tape}}

On this "tape", there is a fixed number "n" of cells that can each store a block of data.
In our example, of the files shown (F1, F2, F3, and Fn), F1 takes up 3 blocks on the tape (cells 0, 1, and 2). F2 takes up a single block, as does F3. For a tape where the head is positioned at cell 0, if we wanted to extract file F3, we would have to seek past F1 (all 3 blocks) and F2 before we get to the beginning of file F3. And what about Fn, the last file on the tape? We would have to seek through the entire tape until we get to the end.

How would this affect access time for the files? How does this compare to a hard disk or RAM, which is more of a random access medium (or at least less restricted by its linearity)?

The other property of the tape archive is that we now have all our files combined into one long (linear) stream of data. Archives in general have this property: the archive carries some sort of index or per-file header information that identifies where each file lies within the stream, and the file contents are all lumped together.

In addition to their obvious benefit of backing up data, archives are also useful for organizing your files into a single location. For example, developers of large software projects (such as the Linux kernel or the Apache project) do not program everything into a single source file. Not only would this be extremely impractical, but it would undoubtedly be tedious to read through. Instead, developers create lots of small files, all of which make up the whole project.

Now, let's say you want to download a version of one of these software projects. The Linux kernel is probably composed of tens of thousands of source files. The last thing you would want to do is download 20,000 files just to be able to compile your own version. To get around this, archives are used to collect everything up into a single file. Now all you have to do is download that single archive, then extract it to obtain all the individual files. Quite efficient.

====Archives====

As it turns out, many archive formats have appeared over the years.
They vary in how the data is encoded, and some even integrate a compression algorithm, which leaves many different ways for different computers to archive. Although not a definitive guide, some popular archive formats per platform are:

^ Operating System ^ Popular Archive Format ^
| UNIX+ | tar |
| DOS/Windows | zip |
| MacOS classic++ | binhex |
| MacOS X | dmg |
| AtariDOS | arc |
| Debian GNU/Linux+++ | deb |
| RedHat/SuSE/Fedora | rpm |

+ UNIX archives are typically also compressed with **gzip**, **bzip2**, or perhaps even **compress**. However, many UNIX vendors also provide some sort of package management system to handle system-specific archives.

++ MacOS classic archives are typically also compressed with **StuffIt**.

+++ While Debian GNU/Linux (and any other Linux distribution for that matter) is for all intents and purposes a UNIX clone, its common archives, or //packages//, are in a custom format for use with its particular package management system.

It has been said "the great thing about standards is that there are so many of them". This is true even in archive formats. While there may be several different archive formats in existence, there is often a justifiable reason for having them. Oftentimes, an advancement in technique or a new and improved compression algorithm is discovered. New systems often try to adopt newer technologies, not only to distinguish themselves from their predecessors, but to offer genuine improvements to the users of that particular system.

====Exercise====

Time to put your skills to the test.

^ 1. ^|From the **cs2/** subdirectory of the UNIX public directory (**/var/public/unix/cs2**):|
| ^ a.|Copy the **archive1.tar.gz** and **archive2.zip** files to your home directory.|
|:::^ b.|How did you do this?|
^ 2. ^|Using your book or the man pages:|
| ^ a.|Determine how to extract both archives. They will both extract into the same directory: **archives/**|
|:::^ b.|Record the commands and incantations used to extract both archives. Where did you find this information?|
|:::^ c.|Go into the **archives/** directory and rename abc.txt to: **def.text**|
|:::^ d.|Return to the parent of the **archives/** directory (most likely the base of your home directory).|
^ 3. ^|Using the available resources:|
| ^ a.|Create a tar archive of the **archives/** directory and contents.|
|:::^ b.|Name the archive: **arc.tar**|
|:::^ c.|How did you do this?|
^ 4. ^|Using **gzip**:|
| ^ a.|Compress the tar archive.|
|:::^ b.|Be sure to use //maximum// compression.|
|:::^ c.|The resulting file should be: **arc.tar.gz**|
|:::^ d.|Attach the resulting file to an e-mail to be submitted to your instructor.|

Being familiar with archiving can help in organizing your own data, as well as packaging data to share with others.

====Concepts====

DOS and Windows systems use the ZIP archive format. However, creating a ZIP archive involves only one step, as opposed to tar'ing and then gzip'ing files on a UNIX system.

^ 5. ^|Thinking on this:|
| ^ a.|Does ZIP actually work in a fundamentally different way than tar + gzip? Explain.|
|:::^ b.|Do compressed archives make more sense now that you can see the process behind them?|

Better understanding the concepts behind the tools we use has many advantages. Not only can we make better use of the tools for the tasks at hand, but it can also help us to creatively solve other problems.

====Submission====

In addition to the responses to the various questions, be sure to submit the archive file you have created in this assignment.

You can use **alpine** to easily attach files to e-mail. When you are composing a message, there is an Attchmnt: line at the top where you can, by hitting **CTRL-T**, select the file that will be the attachment.

^ 6. ^|Additionally, please do the following:|
| ^ a.|Run **md5sum** on your compressed archive.|
|:::^ b.|What is (copy and paste) the output of this?|

====Conclusions====

All questions in this assignment require an action or response. Please organize your responses into an easily readable format and submit the final results to your instructor.

Your assignment is expected to be performed and submitted in a clear and organized fashion; messy or unorganized assignments may have points deducted. Be sure to adhere to the submission policy.

When complete, electronically submit your assignment by filling out the following form:
http://lab46.corning-cc.edu/haas/content/unix/submit.php?cs2
As always, the class mailing list is available for assistance, but not answers.
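
As an optional aside on the linear tape model from the Background section, the seek cost can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names ''TAPE_LAYOUT'' and ''blocks_to_seek'' are hypothetical, the layout simply mirrors the F1/F2/F3 example (Fn is left out because its position is not given), and nothing here reflects how **tar** itself is implemented.

```python
# Hypothetical tape layout mirroring the Background example:
# each entry is (file name, number of blocks it occupies).
# F1 spans 3 blocks (cells 0, 1, 2); F2 and F3 take one block each.
TAPE_LAYOUT = [("F1", 3), ("F2", 1), ("F3", 1)]


def blocks_to_seek(layout, target):
    """Return how many blocks the head passes, starting at cell 0,
    before reaching the first block of `target`."""
    position = 0
    for name, size in layout:
        if name == target:
            return position
        # Linear medium: every earlier block must be passed over.
        position += size
    raise KeyError(f"{target} is not on this tape")


if __name__ == "__main__":
    for name, _ in TAPE_LAYOUT:
        print(name, "begins after seeking past", blocks_to_seek(TAPE_LAYOUT, name), "blocks")
```

The count grows with everything stored ahead of the file you want, which is the contrast with a random access medium that the Background questions ask you to think about.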