Corning Community College

CSCS1730 UNIX/Linux Fundamentals

Lab 0xB: Data Manipulation

Objective

To explore some aspects of data manipulation and data security on the system.

Reading

Please reference the following manual pages:

  • dd(1)
  • md5sum(1)
  • diff(1)
  • bvi(1)
  • hexedit(1)
  • file(1)

Background

The dd(1) utility, whose name is often expanded as "data dump", specializes in reading data from a source file and writing it to a destination file. Its various options give us finer-grained control over which data is copied, and how, than is convenient with the standard data manipulation tools (cp(1), split(1), cat(1)).
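
As a small illustration of that finer-grained access, the sketch below pulls a few bytes out of the middle of a file. The scratch directory and the file name sample.txt are made up for this example; bs=, skip=, and count= are standard dd(1) operands, and with no of= operand dd(1) writes to standard output.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
sample="$workdir/sample.txt"          # hypothetical scratch file
printf 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' > "$sample"

# bs=1 makes every record one byte long, so skip= and count= are
# measured in bytes: skip the first 10 bytes, then copy the next 5.
piece=$(dd if="$sample" bs=1 skip=10 count=5 2>/dev/null)
echo "$piece"                         # prints KLMNO

rm -rf "$workdir"
```

A one-byte block size is easy to reason about but slow on large files; larger blocks work the same way as long as the offsets are expressed in multiples of the block size.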

Copying

To illustrate the basic nature of dd(1), we will perform a file copy. Typically, dd(1) is given two arguments: the source of the data, and the destination of the data.

When given just a source and a destination, dd(1) copies the source data from start to finish into the destination, filling it from beginning to end. The resulting data should be identical to the source.

For example:

lab46:~$ dd if=/usr/bin/uptime of=howlong
9+1 records in
9+1 records out
4912 bytes (4.9 kB) copied, 0.0496519 s, 98.9 kB/s
lab46:~$ 

Here, if= specifies the source (input file) of our data, and of= specifies the destination (output file) for the data.
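
To see where the "9+1 records" in the transcript come from, here is a rough sketch that repeats the copy on a scratch file of the same size (4912 bytes). The directory and file names are hypothetical, and the arithmetic assumes the default dd(1) block size of 512 bytes, which applies when no bs= operand is given.

```shell
# Work in a throwaway directory; source.bin and copy.bin are made-up names.
workdir=$(mktemp -d)
src="$workdir/source.bin"
dst="$workdir/copy.bin"

# Build a 4912-byte source file (4912 space characters).
printf '%4912s' '' > "$src"

# Plain copy: if= names the input file, of= names the output file.
dd if="$src" of="$dst" 2>/dev/null

# Split the size into full 512-byte records plus an optional partial one.
size=$(wc -c < "$dst")
size=$((size))                        # drop any padding wc may emit
full=$((size / 512))
if [ $((size % 512)) -gt 0 ]; then partial=1; else partial=0; fi
echo "$size bytes = ${full}+${partial} records of 512 bytes"

# cmp -s is silent and exits 0 only when the files match byte for byte.
if cmp -s "$src" "$dst"; then same=yes; else same=no; fi
echo "identical: $same"

rm -rf "$workdir"
```

Since 4912 = 9 × 512 + 304, the copy consists of nine full records and one partial record, which is what dd(1) reports as 9+1.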

Doing some comparisons:

lab46:~$ ls -l /usr/bin/uptime howlong
-rwxr-xr-x 1 root  root 4912 May  4  2010 /usr/bin/uptime
-rw-r--r-- 1 user lab46 4912 Nov 13 14:57 howlong
lab46:~$ 

Investigating

1. Answer me the following:
a. What is different about these two files?
b. What is similar?
c. If dd(1) copies (or duplicates) data, why do you suppose these differences exist?
d. What is the output of file(1) when you run it on both of these files?
e. When you execute each file, is the output the same or different?
f. Any prerequisite steps needed to get either file to run? What were they?

The desire for consistent data long predates readily available computers. Being able to verify that two copies of a piece of data are the same, minimizing the chance of some hidden alteration or forgery, is an important capability to possess.

Comparisons

Although many ways exist, there are two common ways of comparing two files:

  • diff(1): compares two files line by line, indicating differences (useful for text files)
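  • md5sum(1): computes an MD5 hash of a file's contents, creating a compact data fingerprint (identical contents always give the same hash; different contents almost always give a different one, although deliberately constructed MD5 collisions exist)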
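
Both tools can be tried out safely on scratch files first. The sketch below is only an illustration: the directory and the names a.txt, b.txt, and c.txt are invented, and the files are small text files rather than the binaries used in this lab.

```shell
# Work in a throwaway directory with three small, made-up text files.
workdir=$(mktemp -d)
a="$workdir/a.txt"; b="$workdir/b.txt"; c="$workdir/c.txt"
printf 'alpha\nbeta\ngamma\n' > "$a"
printf 'alpha\nbeta\ngamma\n' > "$b"      # same content as a.txt
printf 'alpha\nBETA\ngamma\n' > "$c"      # one line changed

# diff exits 0 when the files match and 1 when they differ.
diff "$a" "$b" > /dev/null && ab=same || ab=different
diff "$a" "$c" > /dev/null && ac=same || ac=different
echo "a vs b: $ab, a vs c: $ac"

# Show the line-by-line report for the pair that differs.
diff "$a" "$c" || true

# md5sum prints the hash followed by a file name; keep only the hash.
sum_a=$(md5sum < "$a" | cut -d ' ' -f 1)
sum_b=$(md5sum < "$b" | cut -d ' ' -f 1)
sum_c=$(md5sum < "$c" | cut -d ' ' -f 1)
echo "a: $sum_a"
echo "b: $sum_b"
echo "c: $sum_c"

rm -rf "$workdir"
```

Matching hashes for a.txt and b.txt, and a different hash for c.txt, tell the same story as the diff(1) exit statuses without printing the changed lines.
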

2. Answer me the following:
a. Are /usr/bin/uptime and howlong text files or binary files? What is your proof?
b. Using diff(1), verify whether or not these files are identical. Show me the results.
c. Using md5sum(1), verify whether or not these files are identical. Show me the results.
d. Using md5sum(1), compare the MD5 hash of one of these files against that of /bin/cp. Is there a difference?
e. How could an MD5 hash be useful with regard to data integrity and security?
f. In what situations could diff(1) be a useful tool for comparing differences?

Exercise

3. Do the following:
a. Using dd(1), create an 8kB file called "test.file" filled entirely with zeros.
b. How did you do this?
c. How could you verify you were successful?
d. If you ran echo "more information" >> test.file, what would happen?
e. Can you find this information in test.file? Where is it (think in terms of file offsets)?
f. If you wanted to retrieve the information you just added using dd(1), how would you do it?

Hint: When on the subject of viewing the contents of non-text files, the typical tools we regularly use likely will not be of much help. Explore bvi(1) and hexedit(1).
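
If an interactive editor is more than you need, od(1) can print a non-interactive dump of the same bytes. This is only a sketch of one alternative to the tools named in the hint; the scratch directory and the file name blob.bin are hypothetical.

```shell
# Work in a throwaway directory; blob.bin is a made-up scratch file.
workdir=$(mktemp -d)
blob="$workdir/blob.bin"
printf 'AB\000\000CD\n' > "$blob"     # two NUL bytes between the letters

# -A d prints file offsets in decimal; -c shows printable characters
# as themselves and other bytes as escapes such as \0 and \n.
od -A d -c "$blob"

# -A n suppresses the offsets; -t x1 prints each byte as two hex digits.
hex=$(od -A n -t x1 "$blob" | tr -d ' \n')
echo "$hex"                           # prints 4142000043440a

rm -rf "$workdir"
```

The decimal offsets in the first column of the -c dump are the same byte positions that dd(1) counts when its block size is one byte.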

In the data/ subdirectory of the UNIX Public Directory is a file called data.file.

Please copy this to your home directory to work on the following question.

4. Applying your skills to analyze data.file, do the following:
a. How large (in bytes) is this file?
b. What information predominantly appears to be in the first 3kB of the file?
c. Does this information remain constant throughout the file? Are there ranges where it differs? What are they?
d. How would you extract the data at one of these ranges and place it into its own file? Extract the data at each identified range.
e. How many such ranges of data are there in this file?
f. Run file(1) on each file that hosts extracted data. What is each type of file?
g. Based on the output of file(1), act on each extracted file accordingly to unlock its functionality or data. Show me what you did.

Conclusions

This assignment has activities which you should tend to: document and summarize the knowledge you have learned on your Opus.

As always, the class mailing list and class IRC channel are available for assistance, but not answers.

haas/spring2020/unix/labs/labb.txt · Last modified: 2014/04/15 05:18 by 127.0.0.1