Corning Community College
CSCS1730 UNIX/Linux Fundamentals
======Project: BINARY DATA PROCESSING (bdp0)======
=====Errata=====
Typos and bug fixes:
* (DATESTAMP)
=====Objective=====
Use your UNIX skills and tools at hand to enable you to solve a problem in the realm of raw data management and data recovery.
=====Background=====
As a side job to help you through school, you've become employed at a local microblogging and meme archival firm as their head UNIX IT lead. Your run-of-the-mill tasks include setting up single-purpose web pages and web-browsable images to aid the researchers in tracking the evolution of memes.
Everything was going fine, until one day a researcher, with a freshly obtained meme from a multi-seeded bittorrent transfer, experienced a hard drive failure.
Preservation of this meme is downright critical to on-going research, and with seconds to spare before the system locks up, you manage to do a memory dump of the region of RAM containing the downloaded meme data, and transfer it to another system before it becomes unresponsive.
The last thing you see on the screen before the system locks up is a hex address of the table of contents and its octal length:
* address (in hex): **0x1ced3**
* length (in octal): **130**
Hard drive replaced and OS reinstalling on the researcher's computer, your task is now of equal importance: pick out the file fragments from the raw memory dump, and assemble them all into one file, meeting specifications laid out by the researchers and chief meme archivist.
The air is thick with anticipation.
This is the moment you've been working towards your whole life.
You pause and do a quick tai chi exercise to calm the mind and gather some inner energy. Eyes closed. Deep breath in. Deep breath out. Your eyes snap open and shine with a fierceness and determination that would make any obfuscated data quiver.
It is go time.
=====Obtain the file=====
This week's project is located in the **udr0/** subdirectory of the UNIX Public Directory, in a file called: **memdump.ram**
There is a companion file called **dectohex.c**, which may be of some value, directly or indirectly.
Make a copy of these into your home directory somewhere and set to work.
**NOTE:** Hopefully it has been standard practice to locate project files in their own unique subdirectory, such as under **src/unix/**, where you can then add/commit/push the results to your repository (you ARE regularly putting stuff in your repository, aren't you?)
=====Process=====
The file you seek has been broken up into separate parts, each potentially encoded or encapsulated in some way.
To make matters more interesting, the file fragments are located in a raw memory dump, which you'll have to perform some minor data recovery techniques on to get them out and further massage them.
There is a table of contents index located within this memory dump... it is of the following format:
-toc-filename:offset,length;filename2:offset,length;...;-toc-
To make things more interesting, the **offset** is stored as a hexadecimal value.
The **length** is recorded in octal. It represents the total number of bytes contained in the entry (including the start).
Be mindful of the base.
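One way to keep the bases straight is to convert every value to decimal before handing it to another tool. The following is only a minimal sketch, using the two values reported in the Background section; the variable names are illustrative, and the shell's own arithmetic is used here, with equivalent **bc** invocations shown as comments:

```shell
# POSIX shell arithmetic understands C-style constants:
# a leading 0x means base 16, a leading 0 means base 8.
toc_offset=$((0x1ced3))   # hex address of the table of contents
toc_length=$((0130))      # octal length of the table of contents

echo "offset: $toc_offset bytes"
echo "length: $toc_length bytes"

# The same conversions with bc (bc wants uppercase hex digits):
#   echo "ibase=16; 1CED3" | bc
#   echo "ibase=8; 130" | bc
```

Whichever tool you pick, be sure to record the command and its result in your list of instructions.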
Luckily, you know where to get the table of contents from memory. From there, you can reconstruct the means to access the remaining file fragments.
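Once the table of contents is in hand, it can be split into entries with ordinary text tools. The sketch below runs on a made-up table (the names, offsets, and lengths are hypothetical, not the real data), and it assumes offsets appear as bare hex digits with no **0x** prefix; **parse_toc** is just an illustrative helper name:

```shell
# Hypothetical table of contents, invented for illustration only;
# the real one has to be extracted from memdump.ram.
toc='-toc-alpha:1f40,52;beta:2a00,310;-toc-'

# Strip the -toc- markers, put one entry per line, then split each
# entry on ':' and ',' and convert both numbers to decimal.
parse_toc() {
    printf '%s\n' "$1" |
        sed -e 's/^-toc-//' -e 's/;*-toc-$//' |
        tr ';' '\n' |
        while IFS=':,' read -r name offset length; do
            [ -n "$name" ] || continue
            # offset is hex (prefix 0x), length is octal (prefix 0)
            echo "$name skip=$((0x$offset)) count=$((0$length))"
        done
}

parse_toc "$toc"
```

The output is labeled **skip** and **count** because, with a block size of 1, those are the values **dd** would need for each fragment.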
=====Useful tools=====
You may want to become familiar with the manual pages of the following tools (in addition to tools you've already encountered):
* **dd**(1)
* **bc**(1)
* **netpbm**(1)
* **pnmscale**(1)
Additionally, looking through any companion files provided in this project may offer you some unique value.
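As a rough illustration of how the netpbm tools chain together, the sketch below writes a tiny image by hand, doubles its size with **pnmscale**, and converts the result to PNG. It assumes the image you end up with is in one of the PNM formats, and it uses **pnmtopng**, which belongs to the netpbm suite but is not named on this page; the file names are placeholders:

```shell
work=$(mktemp -d)

# A 2x2 grayscale image in plain PGM format, written by hand.
printf 'P2\n2 2\n255\n0 255\n255 0\n' > "$work/sample.pgm"

# Scale by a factor of 2 and convert to PNG, if netpbm is installed.
if command -v pnmscale >/dev/null && command -v pnmtopng >/dev/null; then
    pnmscale 2 "$work/sample.pgm" | pnmtopng > "$work/sample.png"
    ls -l "$work/sample.png"
else
    echo "netpbm tools not found; skipping the scale and convert step"
fi
```

Check the manual pages for the options that control the exact output dimensions.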
====Forbidden tools====
For the purposes of this project, you are not permitted to use any hex editing tools (hexdump, bvi, etc.) to assist you in producing the solution. Any steps relying in any part on the use of these tools will see credit lost. The aim of this project is to test your low-level data manipulation and calculation skills.
You are also forbidden from using any external-to-the-system translation tools (like hex/dec/oct calculators on the web). You must perform any and all calculations using tools on lab46 (and explain your usage of them in your list of instructions!).
You can certainly use other tools to help you in better determining your steps to solution.
====quick dd primer====
Those with little patience and low observation skills are often quick to label **dd** a difficult or weird tool. While it is true that **dd** is no **ls**, it is a powerful tool, quite useful in its particular domain.
**dd** is referred to as a **d**ata **d**ump or **d**ata **d**uplicator... namely, its task is to copy information, and to do so very well.
In some respects, it is like **cp**, only vastly more capable, since it gives you control over the individual pieces of a file: **cp** works in units of whole files and simply copies the entire thing, whereas **dd** sees a file as a sequence of bytes.
In other respects, it is like **cat** (then again, one can also perform file copies with **cat**)... in that it reads input and produces output.
Unlike **cp** and **cat**, **dd** specializes in byte-level operations. Both **cat** and **cp** are limited to operating on the entire file as a basic operation. **dd** goes a bit deeper.
===Example 0: dd as cp===
If we had a file, **/etc/motd**, and we wanted to make a copy of it in our current working directory under the name **thing**, we could do this:
lab46:~/src/unix/udr0$ dd if=/etc/motd of=thing
1+1 records in
1+1 records out
859 bytes (859 B) copied, 0.000149696 s, 5.7 MB/s
lab46:~/src/unix/udr0$
As previously stated, **dd** specializes in byte level operations. So it is a far more articulate file copy.
We see two options being used with **dd**:
* if= this specifies what the input file will be
* of= this specifies what the output file will be
And with this, **dd** will read from **/etc/motd** and output to **thing** (in the current directory).
As it is, **dd** was able to operate on the file as one chunk (similar to how **cp** or **cat** would work), but we can go deeper.
===Example 1: Fine-Grained cp with dd===
For example, to specifically copy the file byte-by-byte:
lab46:~/src/unix/udr0$ dd if=/etc/motd of=thing bs=1
859+0 records in
859+0 records out
859 bytes (859 B) copied, 0.00400408 s, 215 kB/s
lab46:~/src/unix/udr0$
Notice the number of records has changed to match the file size (there are 859 bytes in the file, so a byte-by-byte copy results in 859 records). Also note that the transfer speed went down: 859 one-byte transfers are a lot more expensive than one larger transfer.
That new option, **bs**, allows for the setting of the block size. In this case, we're setting the block size of that **dd** transaction to 1 byte from its default.
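To see how the block size drives the record counts, the following sketch builds an 859-byte scratch file (a stand-in for **/etc/motd**, filled with zero bytes) and copies it to **/dev/null** twice. It assumes **dd** is using its usual default block size of 512 bytes; the variable names are illustrative:

```shell
work=$(mktemp -d)
# Build an 859-byte scratch file to stand in for /etc/motd.
dd if=/dev/zero of="$work/sample" bs=859 count=1 2>/dev/null

# Default block size (512 bytes): one full block plus one partial block.
default_bs=$(LC_ALL=C dd if="$work/sample" of=/dev/null 2>&1 | grep 'records in')
# Block size of 1: every byte is its own full block.
one_byte=$(LC_ALL=C dd if="$work/sample" of=/dev/null bs=1 2>&1 | grep 'records in')

echo "default: $default_bs"
echo "bs=1:    $one_byte"
```

The number before the **+** counts full blocks and the number after it counts partial blocks, which is why 859 bytes at 512 bytes per block shows up as 1+1.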
===Example 2: dd as cat===
To simulate **cat** using **dd**, we merely instruct it where to send its data:
lab46:~/src/unix/udr0$ dd if=/etc/motd of=/dev/tty
##############################################################################
## __ _ _ _ __
## | | __ _| |__ / | |_/ / LAIR Public Shell Machine
## | |__/ _` | '_ \\_ _/ _ \
## |_____\__,_|_.__/ |_|\___/ Lab46 is the CCC Computer & Information
## --------------------------- Science public shell box for student course-
## c o r n i n g - c c . e d u work, projects, and skills exploration.
##
## PLEASE USE THE SYSTEM, LAIR, AND RELATED RESOURCES RESPONSIBLY!
##
## LAB46 RESOURCES:
## website: http://lab46.corning-cc.edu/
## help form: http://lab46.corning-cc.edu/help_request
## help contact: haas@corning-cc.edu or wedge@lab46.corning-cc.edu
##
## USAGE INFORMATION:
## basic usage: type 'usage' at the prompt
## check mail: type 'alpine' at the prompt; broken? type 'fixmail'
##
1+0 records in
1+0 records out
859 bytes (859 B) copied, 0.000158424 s, 5.4 MB/s
lab46:~/src/unix/udr0$
Due to "everything being a file", displaying to STDOUT is merely specifying a file that corresponds to your terminal screen.
And if you wanted to redirect it to a file? It's STDOUT, so your I/O redirectors will work as expected.
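A small sketch of that redirection, using a scratch file rather than **/etc/motd**: when **of=** is left off, **dd** writes the data to STDOUT and its statistics to STDERR, so each can be redirected on its own. The file names are placeholders:

```shell
work=$(mktemp -d)
printf 'hello from dd\n' > "$work/source"

# With no of=, dd writes the data to STDOUT; its statistics go to STDERR,
# so the two can be redirected independently.
dd if="$work/source" 2>/dev/null > "$work/copy"

cmp "$work/source" "$work/copy" && echo "copy matches source"
```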
There are many additional options in **dd**, so it is highly recommended you read through the manual page and experiment.
===Example 3: Grabbing the last 200 bytes===
Let's say we wanted to only grab the last 200 bytes of that 859 byte file.
**cp** and **cat** cannot easily do this on their own (perhaps with the help of other tools), but it falls squarely within the capabilities of **dd** (it does what it does extremely well).
It turns out there is a **skip** option to **dd** that lets it skip ahead some number of blocks in the input before it starts processing. We want to use that to grab the last 200 bytes in the file.
First, we need to figure out how much to skip... knowing the file is 859 bytes, and desiring only the last 200 bytes, we use a little math:
859
-200
===
659
So, let's give it a shot:
lab46:~/src/unix/udr0$ dd if=/etc/motd of=/dev/tty skip=659
dd: ‘/etc/motd’: cannot skip to specified offset
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000252174 s, 0.0 kB/s
lab46:~/src/unix/udr0$
Wut rohs! Seems there is some problem.
This is what tends to throw beginners for a loop with **dd**: they forget that **dd** does not assume a block size of 1 byte, but something larger (512 bytes by default, which is why the first example reported 1+1 records: one full block plus one partial block). We just told **dd** to skip 659 blocks into the file, which is far beyond the end of an 859-byte file, so the request is nonsensical.
So let us fix it, by specifying a block size of 1:
lab46:~/src/unix/udr0$ dd if=/etc/motd of=/dev/tty bs=1 skip=659
s@corning-cc.edu or wedge@lab46.corning-cc.edu
##
## USAGE INFORMATION:
## basic usage: type 'usage' at the prompt
## check mail: type 'alpine' at the prompt; broken? type 'fixmail'
##
200+0 records in
200+0 records out
200 bytes (200 B) copied, 0.00165582 s, 121 kB/s
lab46:~/src/unix/udr0$
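The same idea extends to pulling a slice out of the middle of a file. The sketch below works on a 26-byte scratch file; it computes the **skip** value from the file size instead of by hand, and it uses the **count** option (not shown in the examples above) to stop after a given number of blocks. File and variable names are illustrative:

```shell
work=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxyz' > "$work/alphabet"

# Last 5 bytes: compute the skip from the file size instead of by hand.
size=$(wc -c < "$work/alphabet")
tail_part=$(dd if="$work/alphabet" bs=1 skip=$((size - 5)) 2>/dev/null)

# A slice from the middle: skip 10 bytes, then copy only 4 (count=4).
middle_part=$(dd if="$work/alphabet" bs=1 skip=10 count=4 2>/dev/null)

echo "tail:   $tail_part"
echo "middle: $middle_part"
```

With **bs=1**, both **skip** and **count** are measured in bytes, which makes them line up directly with an offset and a length.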
=====Submission=====
Successful completion will result in the following criteria being met:
* Resulting image has been scaled approximately 2x to a resolution of 414x418
* Image has been converted to **PNG** format and named **meme0531.png**
* Image has been placed in your Lab46 webspace, in a **unix/udr0/** directory which is searchable by the web server (world search); image is world readable.
* No superfluous permissions should be present for group/other. User obviously needs adequate permissions for you to manipulate it.
* All parent directories need to also be world searchable in order to function
* Setting all permissions could result in your home directory being accessed by third parties. ONLY set the minimum required permissions.
* Aside from user permission, group should have no permissions set.
* ONLY the indicated permission for world should be set for impacted files.
* Be sure you can view said image in a web browser.
* When all is said and done, you will submit 2 files and 1 URL:
* **info.txt**, which contains:
* line 1: the full URL to view your file in a web browser
* line 2: the phrase encountered when viewing this image
* lines 3-: the command lines you used to undertake this project (you can exclude initial copying and end submission commands). Be sure to mention offsets/lengths/sizes of things.
* **meme0531.png**, which should conform to the resolution and format specifications above, and be correctly reassembled.
* The working URL to the **meme0531.png** file hosted in your lab46 webspace.
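The permission requirements above can be sketched as follows. This runs against a scratch directory, since the actual location of your webspace is not stated here; the **webroot** variable and the empty placeholder image are assumptions made purely for illustration:

```shell
# Scratch directory standing in for the web space; the real location on
# lab46 is not stated here, so this path is purely hypothetical.
webroot=$(mktemp -d)
mkdir -p "$webroot/unix/udr0"
: > "$webroot/unix/udr0/meme0531.png"   # empty placeholder for the image

# Directories: full access for user, search (x) only for world.
chmod 701 "$webroot" "$webroot/unix" "$webroot/unix/udr0"
# Image: read/write for user, read only for world, nothing for group.
chmod 604 "$webroot/unix/udr0/meme0531.png"

ls -ld "$webroot/unix/udr0" "$webroot/unix/udr0/meme0531.png"
```

On the real system, remember that every parent directory up the path needs the same world search bit, and nothing more.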
====Submit====
Please submit as follows:
lab46:~/src/unix/udr0$ submit unix udr0 info.txt meme0531.png http://lab46.corning-cc.edu/~USERNAME/unix/udr0/meme0531.png
Submitting unix project "udr0":
-> info.txt(OK)
-> meme0531.png(OK)
-> http://lab46.corning-cc.edu/~USERNAME/unix/udr0/meme0531.png(OK)
SUCCESSFULLY SUBMITTED
lab46:~/src/unix/udr0$