Project: ARCHIVE HANDLING

Objective

To begin putting your skills to work accomplishing tasks and solving problems on the system.

Prerequisites

To successfully complete this project, you should first have the following:

ability to read the manual pages and use the information therein
ability to copy, move, list, remove, and/or link files
ability to navigate around the filesystem

Background

When we talk about archives, there are commonly two separate actions taking place. Sometimes they are intertwined; other times they are discrete steps.

They are:

archiving / extracting
compression / decompression

Archives are merely a manifestation of a common computing concept: a container.

Containers encapsulate things; in this case, files. The fact that UNIX tries to make everything a file makes this kind of container all the more useful.

Compression, on the other hand, is an action performed on a single file. Using various algorithms, we accomplish a sort of “more in less”: we take the data present and cram it into a smaller box (file), with the aim of taking up less storage on the filesystem (which also makes copying easier).

There are many compression algorithms in existence, and they commonly fall into two categories:

lossless - no data is lost as a part of the compression process
lossy - unnecessary data is discarded as part of the compression process

Wikipedia has categories identifying various algorithms implemented for both lossless and lossy compression.
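To make "lossless" concrete, here is a small sketch of a round trip through gzip (a lossless compressor). The file names are made up for illustration and are not part of the project.

```shell
#!/bin/sh
# Work in a scratch directory so nothing of yours is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical sample data: repetitive text compresses well.
awk 'BEGIN { for (i = 0; i < 2000; i++) print "the quick brown fox" }' > original.txt

# Compress into a new file, then decompress that into a third file.
gzip -c original.txt > packed.gz
gzip -dc packed.gz > restored.txt

# cmp exits 0 only when the two files are byte-for-byte identical.
if cmp -s original.txt restored.txt; then
    echo "round trip preserved every byte"
fi

# Show the sizes in bytes for comparison.
wc -c original.txt packed.gz
```

A lossy format could not pass the cmp check, since what comes back out is only an approximation of what went in.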

Where confusion may set in is when a tool combines the actions of archival AND compression. But if you think about it, even in such cases we always end up with one file, and that file is compressed (unless we have a concatenation of separately compressed files into a single file).

Archives are useful in that they let us pack items together. If something needs 100 files, making a copy of it, or copying/installing it onto another system, would be more complex if we had to deal with each of those files individually. Archives simplify the problem by providing all those files within a single file (lessening opportunities for error). So, archives make our lives easier.
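As a rough illustration of the two actions as discrete steps, the sketch below archives a few throwaway files with tar and only then compresses the result with gzip. The directory and file names are hypothetical.

```shell
#!/bin/sh
# Work in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical demo files with repetitive content.
mkdir demo
for i in 1 2 3; do
    awk -v n="$i" 'BEGIN { for (j = 0; j < 2000; j++) print "sample line in file", n }' > "demo/file$i.txt"
done

# Step 1: archiving. Three files go into one container; nothing is compressed yet.
tar -cf demo.tar demo

# Step 2: compression. gzip acts on the single tar file.
gzip -c demo.tar > demo.tar.gz

# Reverse the steps to list the members: decompress, then read the archive.
gzip -dc demo.tar.gz | tar -tf -

tar_size=$(wc -c < demo.tar | tr -d ' ')
gz_size=$(wc -c < demo.tar.gz | tr -d ' ')
echo "tar: $tar_size bytes, tar.gz: $gz_size bytes"
```

The tar file is no smaller than its contents (it is just a container), while the gzip step is where the size drops.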

Procedure

In the UNIX Public Directory you will find a projects/archive_handling subdirectory.

There you will find two existing archives:

archive1.zip
archive2.tar.bz2

You'll probably want to make a copy of these to some working directory in your home directory.

Essentially, I want you to do the following:

Figure out the format of each archive, and read up on the available tools for manipulating them
Extract the contents of the two archives and study them (make sure you keep track of what is in which archive)
Analyze the archive contents and find any corrupt or empty files. They are not needed.
Arrange the remaining content by the following criteria:
- name the files smallest, small, big, biggest, and tack on the appropriate extension
- the smallest file will be whichever file represents the smallest/lightest of the four things (not file size but contextual content of what the file is describing)… similarly as appropriate with small, big, and biggest (the largest/heaviest of the four things)
Create a new archive, called myarchive.tar containing only these size-themed files.
- do NOT store any paths in the archive, just put the files at base level
Compress myarchive.tar with gzip at the second-highest compression level (on the best, not fastest, end of the spectrum) to create the appropriately named myarchive.tar.gz
- also use the -n argument to aid in the verification step below
Submit myarchive.tar.gz using the submit tool.
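The packing steps above can be sketched on stand-in files. This is not the project solution: the files below are placeholders created on the spot, the archive is called sample.tar rather than myarchive.tar, and reading "second highest" as gzip level 8 is an assumption based on gzip's levels running from -1 (fastest) to -9 (best).

```shell
#!/bin/sh
# Work in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1

# Stand-in content buried in a nested directory; the real files
# would come from the two project archives.
mkdir -p extracted/some/nested/dir
for name in smallest small big biggest; do
    echo "placeholder for $name" > "extracted/some/nested/dir/$name.txt"
done

# -C changes into the directory before adding files, so the members
# are stored as bare names with no path components.
tar -cf sample.tar -C extracted/some/nested/dir \
    smallest.txt small.txt big.txt biggest.txt

# The listing should show four names and no directories.
tar -tf sample.tar

# Assumption: level 8 is "second highest". -n leaves the original
# name and timestamp out of the gzip header.
gzip -8 -n sample.tar

# gzip replaces sample.tar with sample.tar.gz.
ls -l sample.tar.gz

# file(1), where installed, identifies a format from its contents.
if command -v file >/dev/null 2>&1; then
    file sample.tar.gz
fi
```

Listing the archive before compressing it is a cheap way to catch stray paths while they are still easy to fix.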

Reflection

Be sure to provide any commentary on your opus regarding realizations had and discoveries made during your pursuit of this project.

Why do you suppose tar works the way it does?
What might be some benefits of separating archival and compression functionality?

Submission

To successfully complete this project, the following criteria must be met:

Submit a copy of your archive to me using the submit tool.

To submit your archive to me using the submit tool, run the following command at your lab46 prompt:

$ submit unix archives myarchive.tar.gz 
Submitting unix project "archives":
    -> myarchive.tar.gz(OK)

SUCCESSFULLY SUBMITTED

You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or locational mismatches.

Verify

UPDATE: Turns out I forgot to consider the resulting timestamp of the final archive, so this verification process will ONLY work if you add the -n argument to gzip when you are doing the compression step (note that you may be providing gzip with other arguments; you still want those too). With -n, the original file name and timestamp are not saved in the gzip header, and the timestamp is what was causing the different MD5 sums. With this change, individual verification is now possible.

A quick way to check and see if you did everything right is to compute the MD5 hash of the archive and see if it matches mine:
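If you want to see the timestamp effect for yourself, the following sketch compresses the same bytes twice, changing only the file's modification time in between. The file names are invented for the experiment, and the behaviour described assumes a gzip that saves the timestamp by default, as GNU gzip does.

```shell
#!/bin/sh
# Work in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1

# Stand-in file; any content works for this experiment.
printf 'same bytes every time\n' > payload

# First modification time: compress with and without -n.
touch -t 202001010000 payload
gzip -c payload > stamped_a.gz
gzip -c -n payload > bare_a.gz

# Change only the modification time, then compress again.
touch -t 202101010000 payload
gzip -c payload > stamped_b.gz
gzip -c -n payload > bare_b.gz

# The -n pair should hash identically; the other pair usually will
# not, since the saved timestamp differs.
md5sum stamped_a.gz stamped_b.gz bare_a.gz bare_b.gz
```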

lab46:~$ md5sum myarchive.tar.gz
cc247da7f4d4c4d29e8cc4dda2b03f10  myarchive.tar.gz
lab46:~$

That is MY md5sum of the archive, in accordance with the project requirements. If your archive's MD5 sum is identical to mine, then you can rest assured you did it right.

It also seems many have interpreted the file naming criteria differently than I had intended (for example, by file size rather than the conceptual size of what the images contain). If you did this, you will not get the above checksum; you will likely get this one:

lab46:~$ md5sum myarchive.tar.gz
a1eac6601e20f028ff158651359ec890  myarchive.tar.gz
lab46:~$

Ultimately, I will accept either.
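Since either sum is accepted, a small helper can do the comparison for you. This is only a sketch: check_md5 is a hypothetical function name, not something provided on lab46, and it assumes md5sum is available and that myarchive.tar.gz sits in the current directory.

```shell
#!/bin/sh
# check_md5 FILE HASH...: succeed if FILE's MD5 sum equals any listed HASH.
# (Hypothetical helper, not part of any lab46 tool.)
check_md5() {
    target=$1
    shift
    # md5sum prints "<hash>  <name>"; keep only the hash.
    actual=$(md5sum "$target" | cut -d' ' -f1)
    for want in "$@"; do
        if [ "$actual" = "$want" ]; then
            echo "match: $actual"
            return 0
        fi
    done
    echo "no match: $actual"
    return 1
}

# The two sums published above.
intended=cc247da7f4d4c4d29e8cc4dda2b03f10
by_file_size=a1eac6601e20f028ff158651359ec890

# Only run the check when the archive is actually present.
if [ -f myarchive.tar.gz ]; then
    check_md5 myarchive.tar.gz "$intended" "$by_file_size"
fi
```

A "no match" result points back at the usual suspects: a missing -n, a different compression level, stored paths, or different file names inside the archive.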
