\\
Corning Community College
\\
UNIX/Linux Fundamentals
\\
\\
Case Study 0xA: Data Manipulation
\\
\\
~~TOC~~
=====Objective=====
To explore some aspects of data manipulation and data security on the system.
=====Reading=====
Please reference the following manual pages:
* **dd**(**1**)
* **md5sum**(**1**)
* **diff**(**1**)
* **bvi**(**1**)
* **hexedit**(**1**)
* **file**(**1**)
=====Background=====
The **dd**(**1**) utility, often read as //data dump//, is a tool that specializes in taking data from a source file and depositing it in a destination file. Combined with its various options, it gives us finer-grained access to data than is convenient with the standard data manipulation tools (**cp**(**1**), **split**(**1**), **cat**(**1**)).
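As a rough illustration of the kind of fine-grained access described above, the sketch below pulls a handful of bytes out of the middle of a small sample file. The file names **alphabet.txt** and **slice.txt** are made up for this example, and it assumes a **dd**(**1**) that accepts the **bs=**, **skip=**, and **count=** operands (see the manual page for what each one means on your system). Here **if=** names the input file and **of=** names the output file.

```shell
# Work in a scratch directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 26-byte sample file (hypothetical name), no trailing newline.
printf 'abcdefghijklmnopqrstuvwxyz' > alphabet.txt

# bs=1     treat the data as 1-byte blocks
# skip=10  pass over the first 10 input blocks
# count=5  copy only the next 5 blocks
# 2>/dev/null hides the "records in/out" summary dd writes to stderr.
dd if=alphabet.txt of=slice.txt bs=1 skip=10 count=5 2>/dev/null

# Show what landed in the output file (echo adds the missing newline).
cat slice.txt; echo
```

With this sample data, **slice.txt** ends up holding the five letters **klmno**. A block size of 1 keeps the arithmetic simple but is slow on large files; treat this as a starting point for experimentation rather than the one right way to do it.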
====Copying====
To illustrate the basic nature of **dd**(**1**), we will perform a file copy. Typically, **dd**(**1**) is given two arguments: the source of the data, and the destination of the data.
When given just a source and a destination, **dd**(**1**) will happily copy the source data, from start to finish, to the destination location (filling it up from beginning to end). The end result should be identical to the source.
For example:
lab46:~$ dd if=/usr/bin/uptime of=howlong
9+1 records in
9+1 records out
4912 bytes (4.9 kB) copied, 0.0496519 s, 98.9 kB/s
lab46:~$
Here, **if=** specifies the source (input file) of our data, and **of=** specifies the destination (output file) for the data.
Doing some comparisons:
lab46:~$ ls -l /usr/bin/uptime howlong
-rwxr-xr-x 1 root root 4912 May 4 2010 /usr/bin/uptime
-rw-r--r-- 1 user lab46 4912 Nov 13 14:57 howlong
lab46:~$
====Investigating====
^ 1. ^|Answer me the following:|
| ^ a.|What is different about these two files?|
|:::^ b.|What is similar?|
|:::^ c.|If **dd**(**1**) copies (or duplicates) data, why do you suppose these differences exist?|
|:::^ d.|What is the output of **file**(**1**) when you run it on both of these files?|
|:::^ e.|When you execute each file, is the output the same or different?|
|:::^ f.|Were any prerequisite steps needed to get either file to run? What were they?|
People have wanted assurance that data is consistent since long before computers were readily available. Being able to verify that two pieces of data are authentic, minimizing the chance of some hidden alteration or forgery, is an important capability to possess.
====Comparisons====
Many approaches exist, but two common ways of comparing files are:
* **diff**(**1**): compares two files line by line, indicating differences (useful for text files)
* **md5sum**(**1**): computes an MD5 hash of a file's contents, creating a data fingerprint (two different files are extremely unlikely to share one by accident)
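To get a feel for how the two tools above behave before pointing them at real files, here is a minimal sketch using three throwaway text files. The file names and their contents are invented for this example; it assumes **md5sum**(**1**) and **awk**(**1**) are available, as they are on a typical Linux system.

```shell
# Work in a scratch directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three small sample files (hypothetical names): an original, an exact
# copy, and a version with one line changed.
printf 'one\ntwo\nthree\n' > first.txt
cp first.txt same.txt
printf 'one\n2\nthree\n' > changed.txt

# diff prints nothing and exits with status 0 when the files match.
diff first.txt same.txt && echo "first.txt and same.txt match"

# When the files differ, diff prints the differing lines and exits with
# status 1; "|| true" keeps that from stopping a script run with "set -e".
diff first.txt changed.txt || true

# md5sum prints the hash followed by the file name; awk keeps the hash.
hash_first=$(md5sum first.txt | awk '{print $1}')
hash_same=$(md5sum same.txt | awk '{print $1}')
hash_changed=$(md5sum changed.txt | awk '{print $1}')

[ "$hash_first" = "$hash_same" ] && echo "hashes of first.txt and same.txt match"
[ "$hash_first" != "$hash_changed" ] && echo "hash of changed.txt differs"
```

Notice that **diff**(**1**) tells you where the files differ, while **md5sum**(**1**) only tells you whether they differ. Keep that distinction in mind as you work through the questions below.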
^ 2. ^|Answer me the following:|
| ^ a.|Are **/usr/bin/uptime** and **howlong** text files or binary files? What is your proof?|
|:::^ b.|Using **diff**(**1**), verify whether or not these files are identical. Show me the results.|
|:::^ c.|Using **md5sum**(**1**), verify whether or not these files are identical. Show me the results.|
|:::^ d.|Using **md5sum**(**1**), compare the MD5 hash of one of these files against that of **/bin/cp**. Is there a difference?|
|:::^ e.|How could an MD5 hash be useful with regard to data integrity and security?|
|:::^ f.|In what situations could **diff**(**1**) be a useful tool for comparing differences?|
=====Exercise=====
^ 3. ^|Do the following:|
| ^ a.|Using **dd**(**1**), create an 8kB file called "test.file" filled entirely with zeros.|
|:::^ b.|How did you do this?|
|:::^ c.|How could you verify you were successful?|
|:::^ d.|If you ran **echo "more information" >> test.file**, what would happen?|
|:::^ e.|Can you find this information in **test.file**? Where is it? (Think in terms of file offsets.)|
|:::^ f.|If you wanted to retrieve the information you just added using **dd**(**1**), how would you do it?|
__Hint:__ When it comes to viewing the contents of non-text files, the tools we regularly use will likely not be of much help. Explore **bvi**(**1**) and **hexedit**(**1**).
In the **data/** subdirectory of the UNIX Public Directory is a file called **data.file**.
Please copy this to your home directory to work on the following question.
^ 4. ^|Applying your skills to analyze **data.file**, do the following:|
| ^ a.|How large (in bytes) is this file?|
|:::^ b.|What information appears to be in the first 3kB of the file?|
|:::^ c.|Does this information remain constant throughout the file? Are there ranges where it differs? What are they?|
|:::^ d.|How would you extract the data at one of these ranges and place it into its own file? Extract the data at each identified range, giving each range a unique file.|
|:::^ e.|How many such ranges of data are there in this file?|
|:::^ f.|Run **file**(**1**) on each file that hosts extracted data. What is each type of file?|
|:::^ g.|Based on the output of **file**(**1**), take the appropriate action on each extracted file to unlock its functionality or contents. Show me what you did.|
=====Conclusions=====
This assignment has activities which you should tend to: document and summarize the knowledge learned on your Opus.
As always, the class mailing list and class IRC channel are available for assistance, but not answers.