
UNIX/Linux Fundamentals Journal

JANUARY 23, 2014

Hard to believe we are already approaching the end of our first week. In some ways, I feel comfortable with command-line commands, but at the same time UNIX seems mystical and foreign. I am sure this is because it is new. With practice and time, I am confident we can all become proficient at using the wide array of tools offered by this awesome Operating System.

JANUARY 25, 2014

JANUARY 28, 2014

A period (.) always refers to your present directory, while (..) refers to the parent directory. Change to the (.) directory. Did it change? No, because (.) refers to your current working directory. Change to the (/) root directory and then change to (..). Did your directory change? No, because we are already at the root of the directory system.

Next we learned about three types of files in UNIX: regular files, directories, and special files. Regular files are text files and executable files, while directories are merely files that point to other files. Special files consist of devices, pipes, and sockets. Next we used the ls and ls -l commands to look at files within the following directories: /var/log (regular directory); /dev (character device directory); / (root directory); /etc/init.d (regular directory).

Next we looked at permission modes: read, write, and execute/search. The types include the user (u) who owns the file, the group (g) that owns the file, and other (o), everyone else on the system. Just as a special note, you execute files and search directories. An example of permissions might be -rwxr-xr-x. The first rwx refers to the read, write, and execute permissions given to the owner. The next triplet, r-x, gives the permissions for the group, and the final r-x is for other, allowing them to read and execute/search the directory.

Next we were instructed to search the path /usr/bin and look at the permissions for the vim utility. What user owns this? The answer appeared to be “l”. What can the user do with the contents of the directory? The user had the rwx permissions, so he/she could read, write, and execute/search this directory. It turns out that the group and other categories can also read, write, and execute.

Next we learned about the chmod command and octal and symbolic permissions. The read permission in octal is “4” and in symbolic notation “r”, while the write permission is “2” octal and “w” symbolic (this permission allows you to save, create, modify, and even delete). Finally, the execute/search permission, “1” octal and “x” symbolic, allows the user to execute a file or search through the contents of a directory.

Next, we learned to create directories using the mkdir command. We then used the chmod command to modify the permissions, giving the owner full read, write, and execute permissions, no permissions for the group, and search-only permission for others, as defined above. The actual mode given with chmod was 701 octal. Next, we explored the “bin” directories and discovered a number of utilities with various permissions. Among the various tools, I saw the mkdir utility, which enables the user to create a new directory into which he or she can place new files. On another note, the following diagram shows my home directory within the UNIX Directory Structure.
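As a quick sketch of the octal and symbolic notation described above (the directory name here is just an illustration):

```shell
# Owner gets read+write+execute (4+2+1 = 7), group gets nothing (0),
# and others get execute/search only (1) -- the 701 mode from the lab.
mkdir -p mydir
chmod 701 mydir
ls -ld mydir      # the first column should read drwx-----x
# The same change expressed symbolically instead of in octal:
chmod u=rwx,g=,o=x mydir
```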

February 6, 2014

This week's reading included “Harley Hahn's Guide to UNIX and Linux,” chapter 21 (Displaying Files) and chapter 22 (The vi Text Editor). Mr. Hahn has a unique way of presenting his material in an entertaining and informative way. In this week's log, I will endeavor to capture some of the learning I have gained through the reading. To begin with, we learned a little bit about the history of terminals and how data was manipulated in early computers. Since early computers were not very complex by today's standards, they were primarily used to create and store data in a text-based format. Therefore, text editors were an essential tool for working with computers. We then learned about the cat, head, and tail utilities, which give us the ability to look at a page of data. As part of our reading, we were instructed to change into the /etc directory. Next, we were instructed to cat out the contents of the /etc/motd and /etc/hosts files.
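A quick sketch of those three utilities on a throwaway file (the file name is made up for illustration):

```shell
# Create a small sample file, then view it three different ways.
printf 'one\ntwo\nthree\nfour\nfive\n' > sample.txt
cat sample.txt          # prints the whole file
head -n 2 sample.txt    # prints the first two lines: one, two
tail -n 2 sample.txt    # prints the last two lines: four, five
```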

February 8, 2014

In this week's OPUS assignment, we are learning about a new and nifty utility called “file”. The next set of exercises will be used to introduce us to this tool.

Practice

Procedure

Try your hand at the following activity, where things are not necessarily as they should be:

Since the file ends in .txt, you might try opening it in a text editor (or simply using the cat(1) utility).

Does it appear to be a text file? No, it does not. It is difficult to determine the file type from the content, but it is definitely not a text file.

Use the file(1) utility to determine the actual file type. From the output of the file utility, it appears that this file is a WAV file, Leaf_In_The_Wind, but I can't find a clear message on how to proceed. I also found information saying it was a compressed file, and I tried unzipping it, but it is still not in a readable text format. For the moment, I'm kind of stuck, but I will continue to search for an answer.

As is the case in many investigations, just observing how things behave can lead to recognition of an object's true state, or the recognition of a pattern, which can be used to solve the task at hand. Finally, after talking with Dominic and seeing a post by Matt in screen -r, I realized that I had to change the suffix .txt to .gz, and from there I was able to figure out the rest. While this was a challenging exercise, I think I gained a lot from this project.

While I have been spending most of my time trying to figure out the Puzzlebox mystery, I did read the information in this week's assigned reading, “Text Processing.” I've gotten pretty well acquainted with the cat utility. As instructed, I used cat to look at /etc/motd (the Message of the Day). Pretty neat to look at the content of a text file. I haven't used head and tail as much, but these are pretty neat utilities as well.

Next, we looked at the wc utility. How do you get wc(1) to display just the line count? You do this by using the -l option on the command line. Using wc -l, I was able to find 27 lines in the /etc/passwd file.

To print the first 16 lines of passwd using head, I entered the following string: head -n 16 passwd. Using the tail utility, I entered: tail -n 8 passwd and was presented with the last 8 lines of this file.

As food for thought, I spent some time thinking about the use of the tail -f utility. I think it would be a great tool for a group to watch what's being done by individual members participating in a project together. Even with coding, two pairs of eyes are better than one. Errors can be caught before a program is compiled.

Just a quick note: I learned from this week's reading that we have been using the text editor Pico, since we have been using Pine as our email client. I wonder how often we use a tool like our cell phone without realizing that we're using a Linux OS. The other text editor I'm quite sure we've all been introduced to is nano.

This week we've been spending time learning about the vi text editor, which is modal (command and insert modes). This enables the user to enter text in insert mode and execute commands (such as moving up and down a page) while in command mode. While it may take some time to get used to the vi interface, I have no doubt that it is a very powerful editor in the hands of an experienced UNIX user.

FEBRUARY 18, 2014

Even though we had a four-day weekend, there has been plenty of reading and work to accomplish in our quest to understand the UNIX/Linux operating system. This week's lab focused on learning more about UNIX shell operations and commands.

First, we revisited the topic of wildcards/symbols and how they can be used effectively to manipulate data. We were instructed to create a shell/ subdirectory and then use cd to move into this newly created item. What is my current working directory? – shell. We then were instructed to use the touch command to create the following files.

Show how you did this: /SHELL$ touch file1 file2 file3 file4 file1234 filea fileZZZ

Next, we were instructed to run ls on our directory with an argument of file*. What happened? All of the files just created were displayed on the monitor.

Following this, we were instructed to run ls on our directory with an argument of file?. The results of this command displayed the following output: file1 file2 file3 file4 filea. This is because the argument specified calls for exactly one character following the name file.

Next we ran the ls command with the argument of file[23], which produced the output file2 and file3. Why? Because the bracket expression matches a single character, either 2 or 3, immediately after file.

For our next exercise, we used ls with the argument of file[24a]*, which produced the following output: file2 file4 and filea. Why? Because the argument was looking through all files in the shell subdirectory for names matching file, followed by 2, 4, or a, followed by anything.

Next we moved on to learn more about I/O redirection. Returning to our shell subdirectory, we were instructed to use the cat utility to display the content of /etc/motd and redirect its STDOUT into file1, which we created earlier. After using less on the file, we learned that the contents (the MOTD) had truly been copied into file1.

Following this, we used echo to append the string ”-This is text-” to file1. What, if anything, happened? The message above was appended after the MOTD in file1. Next, we continued our exploration by using redirection (>) of “More text…” to file1. What happened and why? The result was that the MOTD was overwritten by “More text…”, because > truncates the file before writing. Now we were instructed to enter the following command at the prompt: ls file*|grep “file1”>file2. This command listed the names of all the file* files, the pipe fed that list to grep, which kept only the line matching “file1”, and that output was redirected into file2.
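A minimal sketch of the difference between > and >>, and of the ls-through-grep pipeline, using throwaway file names:

```shell
echo "first"  > redir_demo   # > creates (or overwrites) the file
echo "second" >> redir_demo  # >> appends instead of overwriting
cat redir_demo               # shows both lines
# A pipeline: ls prints names, grep keeps only matching lines,
# and > saves grep's output into a new file.
ls redir_demo* | grep "demo" > redir_out
```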

Now we completed a number of exercises to facilitate our learning regarding STDOUT and STDERR. At the prompt, we entered ls file555, and an error message was displayed on the screen saying “ls: cannot access file555”. Following this, we used the command ls file555 > /dev/null. After running it, we learned that /dev/null acts like a black hole, where data goes in but never returns. Next we did an experiment with cat <file2, which redirected the content of file2 to cat's STDIN and displayed it on our monitors.
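One detail worth noting: the error message still appears with > because > only redirects STDOUT, while error messages travel on a separate stream, STDERR (file descriptor 2). A sketch (the || true just keeps the sketch from aborting, since ls fails each time):

```shell
ls no_such_file  > /dev/null        || true  # error still prints: only STDOUT was redirected
ls no_such_file 2> /dev/null        || true  # the error (STDERR) is discarded too
ls no_such_file  > /dev/null 2>&1   || true  # discard both streams at once
```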

Next, we started to take a closer look at pagers (less, more, and pg). As instructed, we wrote a command line that would get a long listing of the entire /bin directory and use a pipe to redirect it to a pager. I chose less because less is more! The command I wrote was ls -l /bin | less, which indeed opened the listing so I could read the files page by page.

Finally, we did some work with quotes in shell commands. At the prompt, we were instructed to enter echo $PATH, which displayed the PATH variable's list of bin directories (/home/bkathan/bin:/bin:/usr/local/bin:…). Next, we entered echo “$PATH” and got identical results. It is not unusual in Linux to have multiple ways of completing the same task.
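Single quotes, however, behave differently from double quotes; a quick sketch of all three cases:

```shell
echo $PATH     # unquoted: the shell expands the variable
echo "$PATH"   # double quotes: expansion still happens -- identical output
echo '$PATH'   # single quotes: no expansion; prints the literal text $PATH
```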

From our reading this week, I was especially interested in the unique calculator available to the operator of a UNIX system. Once you learn the associated commands, it makes it very easy to convert between binary, octal, decimal, and even hexadecimal. This can be a nifty little utility for those who deal with these types of conversions on a regular basis.

February 24, 2014

This week's reading started out by talking about simple shell scripts. Through the reading, I learned that shell scripting is among the highest of the high-level programming languages. It is a language that is more or less “human readable.” This can help to facilitate quick and easy solutions to a problem or task. On the flip side, performance and/or efficiency can be compromised.

A simple script

We were introduced to a simple and literal set of commands for the computer to execute. They are as follows:

ls ~ 
df  
who 

These commands were placed inside a file, script1.sh (the .sh is a traditional convention for identifying a shell script). After this, I used the chmod command to change the permissions on the file to make it executable.

I now entered the following command ./script1.sh and the script executed as designed. While I have little to no scripting experience, I see how this could be a powerful tool to automate certain tasks.

As part of exercise 2, I tried writing a script that would prompt the user to enter the year of their birth. As soon as the individual entered their year of birth, they are immediately greeted with a message stating the current year is 2014. However, I had not been successful in using expr to subtract variables.

I finally figured out how to get the code to work properly. My final code is entered below:

echo "Please enter your year of birth!"
read year1  
echo "Please enter the current year!"
read year2
diff=$(expr $year2 - $year1)
echo "Your age is:"
echo $diff
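As an aside, the same subtraction can be done without expr at all, using the shell's built-in arithmetic expansion; a sketch with hypothetical inputs in place of the read statements:

```shell
year1=1990                  # hypothetical input; the script above used `read` instead
year2=2014
diff=$(( year2 - year1 ))   # $(( )) is the shell's built-in integer arithmetic
echo "Your age is: $diff"
```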

March 3, 2014

1. Do the following:
List your current processes with ps(1).
a. How did you do this? (Provide the output)
b. If you were logged onto the system multiple times, would your invocation of ps(1) also show any processes being run in your other shells?
c. How would you instruct ps(1) to give you this information?

To complete (a.), How did you do this? I used ps u at the command prompt. Below is the resulting output:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
bkathan   9000  0.0  0.1  13656  1964 pts/44   SNs  Jan21   0:00 /bin/bash
bkathan   9009  0.0  0.3  42660  5244 pts/44   SN+  Jan21   8:14 irssi
bkathan  10149  0.0  0.1  13628  1996 pts/41   SNs  12:43   0:00 -bash
bkathan  11236  0.0  0.0   8588   992 pts/41   RN+  14:10   0:00 ps u


To answer part (b.), I logged in on a second prompt and discovered that the new session did show up on the monitor when I ran ps. In part (c.), we're asked: how would you instruct ps(1) to give you this information? I would use the ps command with the -e option.
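A quick sketch of the difference between those invocations (the exact columns vary by system):

```shell
ps                 # only processes attached to the current terminal
ps -u "$(whoami)"  # every process owned by you, across all your login sessions
ps -e              # every process on the entire system
```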

2. Do the following: a. Using ps(1), list all processes on the system in user-oriented format. Tell me how you did it. b. What PID is the process inetd (inet daemon)? Include the entire matching line from ps. c. Run top(1), and check it out for a while. Any observations? What are the most active processes? (q to quit)

After checking the man page for ps(1), I believe the answer to Question 2 (a.) (using ps(1), list all processes on the system in user-oriented format) is to use ps with the u option, as in ps aux. Part (b.) asks: What PID is the process inetd (inet daemon)? The response is: root 1058 0.0 0.0 8328 492 ? Ss Jan17 0:00 /usr/sbin/inetd — so the PID is 1058.

3. Using the tools available to you, determine:
a.  What type of file is this? How did you determine this?
b.  Follow the instructions inside on how to compile it to create an executable.
c.  Did it work? How do you run it?
d.  What could you do to make your $PATH see this executable file? Explain.

First, in response to question (a.), I discovered that the file count.c is ASCII C program text. To determine this, I used the file command. I followed the instructions in the file, “gcc -o count count.c”, but I'm not sure that I did it correctly. I used chmod to set the file's permissions to allow execution. At the command prompt, I entered ./count, and it did produce data on the monitor. However, it didn't seem to have any understandable meaning, so I'm not sure that I got it to work correctly.

I'm not sure how to make the executable visible in my $PATH, but I will ask this question on Tuesday during our next class. It might be as simple as prefixing the file name with ./, but I'm not certain. As instructed, I ran time ./count and got output similar to before, which makes me believe it did work previously.

real    0m24.325s
user    0m0.012s
sys     0m0.060s

Now let us compare the benefits of using the & to background processes:
4. Do the following:
a.  Run: time ./count > output and wait for it to finish. What is in “output”? How long did it take?
b.  Do the same, but this time add an & to the end. Anything change? Get a process listing with top, does it show up? What is its CPU utilization?

In response to (a.), the output file contained “The number is 33554432.000000”, and the run took about 24 seconds. For (b.), adding the ampersand moved the process to the background, but it appeared to produce the same output. One significant thing is that top showed it using 100% of the CPU. The program terminated on its own once it was complete.

Command Grouping per process

Finally, the parentheses come into play. Enclosing an entire command line in parentheses (excluding the &, as seen on page 132 in Learning the UNIX Operating System, 5th Edition) will cause the shell to treat that sequence of commands as one unit, and give ALL commands involved just one PID!
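A sketch of the grouping idea (with a shorter sleep and a throwaway file, just for illustration):

```shell
# The parentheses run both commands in one subshell, so the whole
# sequence is backgrounded together under a single PID.
(sleep 1; echo "group finished" > group_demo.txt) &
echo "background job PID: $!"   # $! holds the PID of the last background job
wait                            # block until the background job completes
cat group_demo.txt
```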

Devise a command-line to do the following:
 
using sleep(1)
delay for 8 seconds before running ls(1)
when ls(1) is run, it should list all the files in /var
the output of the ls(1) should be redirected into a file, in your home directory, called: multitask.lab.txt
make sure this whole task runs in the background
Confused? Be sure to ask questions.  

Pretty neat stuff! While I have a great deal to learn, I know I couldn't have completed these tasks when we began this class. While it didn't take a long time to complete this task, I realize how beneficial it could be to have extremely long tasks run in the background. Below is the output of the ls command redirected to multitask.lab2.txt.

5. Devise a command-line to implement task 1:
a.  Show me the command-line incantation you came up with, and explain your reasoning.
lab46:~$ sleep 8; ls -l /var > multitask.lab2.txt &

total 68
drwxr-xr-x   2 root  root   4096 Feb  3 06:25 backups
drwxr-xr-x  11 root  root   4096 Apr 25  2011 cache
drwxr-xr-x   4 root  root   4096 Aug 30  2010 games
drwxr-xr-x  43 root  root   4096 Mar  9 17:45 lib
drwxrwsr-x   2 root  staff  4096 Jun  1  2010 local
drwxrwxrwt   2 root  root   4096 Mar  9 06:25 lock
drwxr-xr-x   6 root  root   4096 Mar  9 06:25 log
drwxrwsr-x 251 root  mail  12288 Feb  9 16:56 mail
drwxr-xr-x   2 root  root   4096 Jun 14  2010 opt
drwxr-xr-x  14 wedge lab46  4096 Jan 16 23:59 public
drwxr-xr-x   9 root  root   4096 Feb 13 02:27 run
drwxr-xr-x   4 root  root   4096 Jun 23  2010 spool
drwxr-x---   7 wedge lab46  4096 Jan 21 04:18 submit
drwxrwxrwt   3 root  root   4096 Mar  4 22:44 tmp
6.  Do the following:
a.   Copy the “link.sh” script from the devel/ subdirectory of the UNIX Public Directory.
b.   View this script.
c.   See that long “ld” line? That's what the compiler is doing for you.
d.   Go ahead and run this script, following the instructions in it, to link together your final
     executable.
 
Now, you should hopefully have an executable.

As instructed, I copied “link.sh” from the devel/ subdirectory of the UNIX Public Directory to my ~bkathan/devel subdirectory. I opened each file and followed the instructions included, and was able to compile each file and make them executable. Pretty neat stuff!

7. Do the following:
a.  At the “lab46:~/devel$” prompt type: file hello
b.  What type of file is it?
c.  At the “lab46:~/devel$” prompt type: file helloC
d.  Does the output match that of the previous?
e.  Go ahead and execute your new binary. Does it run? Show me what you typed and what happens.

First, in response to question (b.), I discovered that running file on hello results in an error. However, helloC is an ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped. As directed in (e.), I executed the program and received the following response: “Hello, World!”

Makefiles

A very popular tool used in program development is make. This tool comes in very handy when dealing with multiple source files that need compiling (and determining whether or not you need to recompile a particular object file).

It works by allowing the programmer to set up dependencies between the source and object files with a series of rules pertinent to the particular project. These rules are often placed in a file called Makefile.

Every account on Lab46 is equipped with a customized Makefile in the src/ subdirectory to the home directory.
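A minimal sketch of what such a Makefile might contain (hypothetical file names; note that the command lines under each rule must begin with a tab character):

```make
# Rebuild the hello binary only when hello.c is newer than it.
hello: hello.c
	gcc -o hello hello.c

clean:
	rm -f hello
```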

8.  Do the following:
a.   Copy helloC.c into your src/ subdirectory. How did you do it?
b.   Do a directory listing. Do you see at least Makefile and helloC.c?
c.   View the contents of Makefile
d.   Compile helloC.c with the Makefile. How did you do this?

Following the instructions in (a.), I copied the file helloC.c to the src/ subdirectory using the following command: cp helloC.c ~/src. As instructed in steps (b.) through (d.), I used the make helloC command to compile the file. Immediately, the following line showed up in my terminal window: gcc -lm helloC.c -o helloC. I ran the helloC program and received the message “Hello, World.”

9.  Do the following:
a. In /var/public/unix/devel there is a subdirectory called “multifile/”. Copy this directory (and its contents) to your own devel/ in your home directory. How did you do this?
b. View the various files in this directory, try and trace the flow of logic between them.
c. Read through the Makefile, and determine how to build this code.
d. How did you do this?

Following the instructions in (a.), I first copied the subdirectory called “multifile/” to my own devel/ subdirectory using cp -r (recursive) multifile. After reading the Makefile, I had to try several methods to compile the code. The command that finally worked, in response to (d.), was: make -o display.c.

Code Efficiency: comparing file sizes

An interesting benchmark that can be conducted is to create programs that perform identical operations, and to compare the resulting file sizes and execution times of the executables.

10. Looking back on the original helloC, helloCPP, and helloASM binaries, do the following:
a.   What size are each of the executables?
b.   What observations can you make regarding differences in file size or execution speed?
c.   View/compile helloJAVA.java and run the result (java helloJAVA). What is its size?
d.   With Java being a higher level language (as C++ is, when compared to C and assembly), what do you think about the resulting compiled file? Is there perhaps more here than meets the eye?

The answer regarding the question in (a.) is helloC = 6546, helloCPP = 8369, and helloASM = 504. While I haven't figured out how to track the execution speed of each file, I would guess (b. through d.) that the higher-level the language, the slower the execution speed, since it is further from machine language. However, higher-level languages are much easier for the novice to work with and edit, even if they compile and execute more slowly than their counterparts.

Procedure

The grep(1) utility is extremely useful in the area of text-searching, and Regular Expressions. We will be calling upon the capability of this tool quite often, so let us take a look at it:

1. Using the grep(1) utility in the /etc/passwd file, perform the following searches:

a.  Grep for the substring 'System' (note capitalization). What did you type on the command-line?
b.  What does this search do?

In response to (a.), I typed the command: less passwd | grep 'System'. This produced the following result: gnats:x:41:41:Gnats Bug Reporting System (admin):/var/lib/gnats/:/bin/sh. In response to (b.), this command searched the file “passwd”, found the line containing the string 'System', and printed it as shown above.

As you can see, grep(1) can be used to search for literal text strings, but it can also be used to search based upon a pattern:

2. Using the grep(1) utility in the /etc/passwd file, perform the following search:

a.  Grep for the pattern '^[b-d][aeiou]'. What did you type on the command-line?
b.  What does this search do?
c.  How is this more powerful than just searching for a literal string? 

At the command prompt, in response to (a.), I entered: less /etc/passwd | grep '^[b-d][aeiou]'. Below is the output of this command:

daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

In response to (b.), “What does this search do?”, it searches /etc/passwd for lines that begin with a letter from b through d, followed by any of the vowels “aeiou”. This is much more powerful because it allows you to design specific searches to find various patterns.

3. Using the grep(1) utility in the /etc/passwd file, perform the following search:

a. Search for all the lines starting with any of your initials (first or last). Be sure to include command used, and matching lines.]
b. Search for all the lines starting with r, followed by any lowercase vowel, and ending with an h. How did you do it? What were your results?

In response to (a.), I tried searching for my first initial (B) in the /etc/passwd file and found “backup:x:34:34:backup:/var/backups:/bin/sh”. Next I searched for “r” as instructed in (b.) and found “root:x:0:0:root:/root:/bin/bash”. The command I used was grep '^r[aeiou]+h$', but I couldn't seem to get it to work as designed. When I left off the +h$, I got the response I wanted. However, when I added the ”+h$”, I didn't get any response. Hopefully, I can get some help with this attempt next Tuesday during class.
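A likely explanation for the stuck attempt above: plain grep uses basic regular expressions, where + is just a literal character, not a repetition operator. Two hedged sketches that should express “starts with r, then a lowercase vowel, ends with h”:

```shell
# .* allows anything between the vowel and the final h:
grep '^r[aeiou].*h$' /etc/passwd
# The + repetition operator needs extended regexes (grep -E):
grep -E '^ro+t' /etc/passwd     # r, one or more o's, then t: matches root
```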

In the regex/ subdirectory of the UNIX Public Directory you will find a file called regex.html, which is a copy of lab #0, with some changes. Looking through this file, you will see several HTML tags. Having to make changes to this file could result in massive changes, so why worry about doing it by hand? Let Regular Expressions help!

4. Do the following (be sure to show the substitution command used):

a. Oops! I made a typo! All the <center> tags are spelled British style as <centre>. Go ahead and correct this for all occurrences in the entire file.
b. The closing center tags are currently </CENTRE>, so go change them to </center>. Be sure to properly handle the /.
c. This file uses the old <b>-style boldness tags. We want to be fairly modern and use <strong> instead. So go ahead and get that all set.
d. Go ahead and make the appropriate changes to all the </b> tags to their corresponding </strong> counterparts.
e. No need to provide the updated file, just show me the substitution commands given in the first four parts.

To complete (a.) above, I entered the following command in vi: :%s/centre/center/g. This changed all of the lowercase “centre” to “center”. I then repeated this process, replacing “CENTRE” with “CENTER”. As expected, this changed all occurrences of the British spelling CENTRE to the American version CENTER.

To complete steps (b.) - (e.) I used the following commands.

:%s/centre/center/g
:%s/CENTRE/CENTER/g
:%s/<b>/<strong>/g
:%s/<\/b>/<\/strong>/g

Imagine if you had a massive file in need of changes? Would you want to spend hours doing it all by hand? Or construct a simple RegEx pattern and have the computer do the work for you? THAT is the power of Regular Expressions.

RegEx makes editing a massive file much easier and makes users much more efficient with their time. In a very competitive job market, these skills make an individual very desirable, enabling him or her to do more in less time.

5. Change into the /usr/share/dict directory and locate the 'words' file.

a. Do you see it? It is a symbolic link. Chase it down to its destination, show me what it is, and how you found it.
b. View this file… how does the file appear to be made up?
c. How many entries are in this file? Show me how you accomplished this.

In response to (a.) above, I found the following information, which I believe points to its destination: words.pre-dictionaries-common -> american-english. To find this information, I entered:

ls -l | grep words

The file is made up of a single column with a repeating pattern, for example [A, A's], [AOL, AOL's]… down the entire list of words in the dictionary. In response to how many entries are in the “words” file, I entered less words | wc -l and got the response 98569.
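Returning to part (a.), another way to chase a symbolic link to its final destination is readlink(1); a sketch using throwaway files:

```shell
echo "hello" > real_file.txt
ln -sf real_file.txt first_link   # make a link to the file
ln -sf first_link second_link     # ...and a link to the link
ls -l second_link                 # shows: second_link -> first_link
readlink -f second_link           # follows the whole chain to real_file.txt
```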


Using this dictionary, I'd like for you to perform some searches, aided by Regular Expressions you construct. Be sure to show your pattern, as well as provide a count of how many words match your pattern.

6. Construct RegEx according to the following criteria and show me what you typed, and show me how many words match your pattern:

a. All words exactly 5 characters in length
grep '^.....$' words | wc -l  == 6685
b. All words starting with any of your initials
grep '^b' words | wc -l == 4723
grep '^m' words | wc -l == 4283
grep '^k' words | wc -l == 608
c. All words starting with your first initial, having your middle initial occur somewhere after the first, and end with your last initial.
d. All words that start and end with lowercase vowels.
grep '^[aeiou]' words | wc -l == 14732
e. All words that start with any of your initials, immediately followed by any lowercase vowel, and ending with the letters 'e', 's', or 't'
grep '^b.[est]$' words | wc -l == 7 
f. All words that do not start with any of your initials.
g. All words at least 3 characters in length, and do not start with “th“
h. All 3 letter words that end in 'e'
i. All words that contain the substring “bob” but do not end with the letter 'b'
j. Only the words that start with the substring “blue”.
k. All the words that contain no vowels (consider 'Y' in all cases a vowel).
l. All the words that do not begin with a vowel, that can have anything for the second character, only 'a', 'b', 'c', or 'd' for the third character, and end with a vowel. 
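Hedged sketches for a few of the remaining items (shown against a tiny stand-in dictionary, since the counts depend on the words file in use):

```shell
# A tiny stand-in for the dictionary, just so the patterns can be tried:
printf 'ace\nbat\nmat\nkit\nbobcat\nbob\n' > words
grep -c '^..e$' words              # (h) 3-letter words ending in 'e'
grep -c '^[^bmk]' words            # (f) words not starting with b, m, or k
grep 'bob' words | grep -vc 'b$'   # (i) contain "bob" but do not end in 'b'
```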

March 25, 2014

The first part of this week's reading focuses on the use of filtering to make stored data meaningful. Data is used by decision makers to make better and more informed decisions but to be worthwhile, it must be current and relevant. The first exercise is to use the wc -l filter on a file called sample.db. The results of this query showed that there are 18 lines in this file.

1. Perform the following searches on the database:

a. Find all the students that are a Freshman
b. Same as above but in alphabetical order
c. Any duplicate entries? Remove any duplicates.
d. Using the wc(1) utility, how many matches did you get?
Be sure to give me the command-line incantations you came up with, and any observations you made.

In (a.) above, we used the cat utility in conjunction with | grep Freshman to list all Freshman students. Next, we added sort to list all Freshmen alphabetically, as described in part (b.). After the list was generated, it became apparent that one of the Freshmen was listed twice in the database, as mentioned in part (c.). Finally, in part (d.), we were instructed to use the wc(1) utility on sample.db, which produced the following results: 17 lines, 44 words, and 849 characters.

cat sample.db | grep Freshman
cat sample.db | grep Freshman | sort
vi sample.db --insert-- deleted duplicate entry
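As an aside, the duplicate could also have been filtered out on the fly rather than edited away in vi; a sketch on stand-in data (the real lab used sample.db):

```shell
# Stand-in lines with a deliberate duplicate:
printf 'Bob Freshman\nAnn Freshman\nBob Freshman\n' > sample_t.db
grep Freshman sample_t.db | sort | uniq   # uniq drops adjacent duplicate lines
grep Freshman sample_t.db | sort -u       # the same result in one step
```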

Filter for Manipulation

As we've progressed, I've learned to do some simple searches on our database. In addition, I was successful in filtering the output to get desired values. However, I know I'm not done yet. Not only can we filter the text, but I hope to learn how to manipulate it into a given form or defined output.

The cut(1) utility lets us literally cut columns from the output.

It relies on a thing called a field-separator, which will be used as a logical separator of the data.

Using the “-d” argument to cut, we can specify the field separator in our data. The “-f” option will parse the text in fields based on the established field separator.

So, looking at the following text:

hello there:this:is:a:bunch of:text.

Looking at this example, we can see that ”:” would make for an excellent field separator.

With ”:” as the field separator, the logical structure of the above text is as follows:

Field 1: hello there
Field 2: this
Field 3: is
Field 4: a
Field 5: bunch of
Field 6: text.

We can test these properties out by using cut(1) on the command-line:

lab46:~$ echo "hello there:this:is:a:bunch of:text." | cut -d":" -f#

Where # is a specific field or range of fields. (ie -f2 or -f2,4 or -f1-3)

The output of the above line reads as follows:
 
lab46:~$ echo "hello there:this:is:a:bunch of:text." | cut -d ":" -f1
hello there
 
Pretty neat!

2. Let's play with the cut(1) utility:

a. What would the following command-line display: echo "hello there:this:is:a:bunch of:text." | cut -d":" -f3
b. If you wanted to get “hello there text.” to display to the screen, what manipulation to the text would you have to do?
c. Did your general attempt work? Is there extra information?
 
If you found that extra information showed up when you tried to do that last part- taking a closer look will show why:
 
If you tell cut(1) to display any fields that aren't immediately next to one another, it will insert the field separator to indicate the separation.
 
So how do you keep this functionality while still getting the exact data you seek? Well, nobody said we could only apply one filter to text.
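The separator-reinsertion behavior described above is easy to see in action with the same example text:

```shell
# Fields 1 and 6 are not adjacent, so cut(1) re-inserts the ":" between them.
echo "hello there:this:is:a:bunch of:text." | cut -d":" -f1,6
# Output: hello there:text.
```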

The answer to (a.), “What would the following command-line display: echo "hello there:this:is:a:bunch of:text." | cut -d":" -f3”, is the word is. This command filters the text down to the third field, which the word is occupies.

The answer to (b.), “If you wanted to get “hello there text.” to display to the screen, what manipulation to the text would you have to do?” I would change the text to read “hello there this is a bunch of text.” followed by a pipe and the command: cut -d" " -f1-2,8.

echo "hello there this is a bunch of text." | cut -d" " -f1-2,8
The output reads, "hello there text."

The answer to (c.), “Did your general attempt work? Is there extra information?”, is yes - it did work and no there is no extra information.

April 2, 2014

The Stream Editor - sed

Remember back when we played with vi/vim? Remember that useful search and replace command:

:%s/regex/replacement/g

That was quite useful. And luckily, we've got that same ability on the command line. Introducing “sed(1)”, the stream editor.

sed provides some of the features we've come to enjoy in vi, and is for all intents and purposes a non-interactive editor. Especially useful is its ability to edit data streams (that is, STDOUT, including output generated from our command lines).

Perhaps the most immediately useful command found in sed will be its search and replace, which is pretty much just like the vi/vim variant:

sed -e 's/regex/replacement/g'

However, if you look closely, you will see that we did not include any sort of file to operate on. While we can, one of the other common uses of sed is to pop it in a command-line with everything else, stuck together with the all-powerful pipe (|).

For example, to solve the above problem with the field separator:

echo "hello there:this:is:a:bunch of:text." | cut -d":" -f1,6 | sed -e 's/:/ /g'

We used sed to replace any occurrence of the ”:” with a single space.

3. Answer me the following:

a. Does the above command-line fix the problem from #2c?
b. If you wanted to change all “t”'s to uppercase “T”'s in addition to that, what would you do?
c. If you wanted to replace all the period symbols in the text with asterisks, how would you do it?
d. What does the resulting output look like?

First, I want to answer question (a.), “Does the above command-line fix the problem from #2c?” The answer is yes it most certainly does. Although my original solution accomplished the same thing, sed provides me with yet another tool.

To answer (b.), “If you wanted to change all “t”'s to uppercase “T”'s in addition to that, what would you do?” I would append to the previous command: sed -e 's/t/T/g'. Below is the result:

echo "hello there:this:is:a:bunch of:text." | cut -d":" -f1,6 | sed -e 's/:/ /g' |
sed -e 's/t/T/g'
 
The result was:  hello There TexT.

I then tried replacing all periods (.) with asterisks (*), but the result I got was just a string of asterisks. The syntax I tried is listed below:

echo "hello there:this:is:a:bunch of:text." | cut -d":" -f1,6 | sed -e 's/:/ /g' | sed -e 's/t/T/g' | sed -e 's/./*/g'

I looked in Harley Hahn's textbook but couldn't figure out what I was doing wrong. Perhaps someone can point me in the right direction.
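If I had to guess at the cause: in a sed regular expression, the period is a metacharacter that matches any single character, so 's/./*/g' replaces every character with an asterisk. Escaping it with a backslash should match only literal periods; a sketch:

```shell
# "\." matches a literal period instead of any character.
echo "hello There TexT." | sed -e 's/\./*/g'
# Output: hello There TexT*
```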

In response to (d.) above, “What does the resulting output look like?”, I will display the results below.

The syntax of the command I tried was:
 
echo "hello there:this:is:a:bunch of:text." | cut -d":" -f1,6 | sed -e 's/:/ /g' |
sed -e 's/t/T/g' | sed -e 's/./*/g'
 
But the outcome was not what I expected.  Below is the output:  
*****************

From head(1) to tail(1)

Two other utilities you may want to become acquainted with are the head(1) and tail(1) utilities.

head(1) will allow you to print the first n lines of a file. So if you needed to print, say, the first 12 lines of a file, head(1) will be a good bet.

For example, to display the first 12 lines of our sample database:

lab46:~$ head -12 sample.db

And, of course, we can add it onto an existing command line using the pipe. In this example, the first two results of all the *ology Majors:

lab46:~$ cat sample.db | grep “ology” | head -2

See where we're going with this? We can use these utilities to put together massively powerful command-line incantations to create all sorts of interesting filters.

tail(1) works from the opposite end, starting at the end of the file and working backwards towards the beginning. So if you wanted to display the last 8 lines of a file, for example, tail(1) is the tool to reach for. tail(1) also has the nifty ability to continually monitor a file and update its output should the source file change. This is useful for monitoring log files that are continually updated.
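A couple of quick sketches of both behaviors (the log path in the comment is just illustrative):

```shell
# Last 8 lines of a 20-line stream (prints 13 through 20):
seq 1 20 | tail -8

# Follow a growing log file until interrupted (illustrative path):
# tail -f /var/log/syslog
```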

Translating characters with tr
ASCII file line endings

An important thing to be aware of is how the various systems terminate their lines. Check the following table:

System   Line Ending Character(s)
DOS      Carriage Return, Line Feed (CRLF)
Mac      Carriage Return (CR)
UNIX     Line Feed (LF)

So what does this mean to you? Well, if you have a file that was formatted with Mac-style line endings, and you're trying to read that file on a UNIX system, you may notice that everything appears as a single line at the top of the screen. This is because the Mac uses just Carriage Return to terminate its lines, and UNIX uses just Line Feeds… so the two are drastically incompatible for standard text display reasons.

For example, let's say we have a UNIX file we wish to convert to DOS format. We would need to convert every terminating Line Feed to a Carriage Return & Line Feed combination (and take note that the Carriage Return needs to come first and then the Line Feed). We would do something that looks like this:

lab46:~$ tr "\n" "\r\n" < file.unix > file.dos

To interpret this:

\n is the special escape sequence that we're all familiar with. In C, you can use it to issue an end-of-line character. So in UNIX, this represents a Line Feed (LF).

\r is the special escape sequence that corresponds to a Carriage Return (CR).

The first argument is the original sequence. The second is what we would like to replace it with. (in this case, replace every LF with a CRLF combination).

Then, using UNIX I/O redirection operations, file.unix is redirected as input to tr(1), and file.dos is created and will contain the output.
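One caveat worth noting: tr(1) translates strictly character for character, so a one-to-two mapping like LF to CRLF is not something it can truly express (extra characters in the second set are ignored). With GNU sed, which interprets \r in the replacement, appending a carriage return to every line is a more reliable sketch:

```shell
# Append a carriage return before each line feed (UNIX -> DOS).
printf 'one\ntwo\n' | sed -e 's/$/\r/' > /tmp/file.dos
```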

In the filters/ subdirectory of the UNIX Public Directory you will find some text files in DOS, Mac, and UNIX format.

4. Let's do some tr(1) conversions:

a. Convert file.mac to UNIX format. Show me how you did this, as well as any interesting messages you find inside.
b. Convert readme.unix to DOS format. Same deal as above.
c. Convert dos.txt to Mac format. Show me the command-line used.

In response to (a.), I entered the syntax – tr "\r" "\n" < file.mac > file.unix – and got the response listed below.

Syntax -- tr "\r" "\n" < file.mac > file.unix
 
Q: What's the difference between a dead dog in the road and a dead lawyer in the road?
 
A: There are skid marks in front of the dog.  file.unix (END)
 
Too bad my good friend Dan Pozner, a lawyer in Ithaca, isn't still alive.  He would have gotten a good laugh from this one.

In (b.) above, we were instructed to “Convert readme.unix to DOS format. Same deal as above.” I used the following syntax – tr "\n" "\r\n" < readme.unix > readme.dos

As instructed in(b.), I entered the command -- tr "\n" "\r\n" <readme.unix > readme.dos
 
Immediately below is the output:
 
Sheriff Chameleotoptor sighed with an air of weary sadness, and then^Mturned to Doppelgutt and said 'The Senator must really have been on a^Mbender this time -- he left a party in Cleveland, Ohio, at 11:30 last^Mnight, and they found his car this morning in the smokestack of a British^Maircraft carrier in the Formosa Straits.'^M                -- Grand Panjandrum's Special Award, 1985 Bulwer-Lytton^M                                   bad fiction contest.
readme.dos (END)

Finally, as instructed in (c.), I translated dos.txt to Mac format (dos.mac). I entered the following syntax: tr "\r\n" "\r" < dos.txt > dos.mac

Command:
 
tr "\r\n" "\r" < dos.txt > dos.mac
 
Q: How do you shoot a blue elephant?  A: With a blue-elephant gun.
Q: How do you shoot a pink elephant?  A: Twist its trunk until it turns blue, then shoot it with a blue-elephant gun.
 
I deleted some of the content to make it more "human readable" but this is the basic content of the file.
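Since each DOS line already ends in a carriage return, another way to look at the DOS-to-Mac conversion is that only the line feeds need to go, and tr(1)'s -d (delete) option handles that cleanly. A sketch with inline sample data:

```shell
# Build a tiny two-line DOS-format file, then delete the LFs,
# leaving CR-terminated (Mac-format) lines.
printf 'a\r\nb\r\n' > /tmp/dos.txt
tr -d '\n' < /tmp/dos.txt > /tmp/dos.mac
```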

Procedure

Looking back on our database (sample.db in the filters/subdirectory of the UNIX Public Directory), let's do some more operations on it:

5. Develop, explain, and show me the command-lines for the following:

a. How many unique students are there in the database?
b. How many unique majors are there in the database?
c. How many unique “favorite candies” in the database? (remove any trailing asterisks from the output)

In (a.) above, we are asked, “How many unique Students are there in the database?” I used the following string at the command prompt: uniq -u sample.db | wc -l.

a. How many unique Students are there in the database?
 
The line of code used:
 
uniq -u sample.db | wc -l
 
The output: 17
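One thing I learned about uniq(1) along the way: it only collapses adjacent duplicate lines, so on an unsorted file it can miss repeats. Sorting first (or using sort -u) is the safer pattern; a sketch with stand-in names:

```shell
# "bob" appears twice but not adjacently, so plain uniq would miss it;
# sort -u groups and deduplicates in one step.  Counts 2 unique names.
printf 'bob\nann\nbob\n' | sort -u | wc -l
```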

Next, (b.) asks, “How many unique majors are there in the database?” I used the following syntax to get the answer – cut -d":" -f3 sample.db | sort -u | wc -l

b. "How many unique majors are there in the database?"
 
cut -d":" -f3 sample.db | sort -u | wc -l
 
Output: 13

In (c.) we are asked, “How many unique “favorite candies” in the database?” The response will be shown below:

c. "How many unique “favorite candies” in the database?"
 
syntax -- cut -d":" -f5 sample.db | sort -u
 
Bubblegum
Gobstoppers
Ju-Ju Fish
Junior Mints
Lollipops
Mars Bar
Necco Wafers
Rock Candy
Snickers
Tic-Tacs
Warheads
Whoppers
Zero Bar
favorite candy
unknown
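The question also asked to remove any trailing asterisks from the output; a sed step can strip those before deduplicating. A sketch on stand-in candy values (field 5 is assumed, as above):

```shell
# "\**$" matches any run of asterisks at end of line and deletes it,
# so starred and unstarred entries collapse to one.
printf 'Snickers*\nSnickers\nWhoppers\n' | sed -e 's/\**$//' | sort -u
```

Against the real file this would be cut -d":" -f5 sample.db | sed -e 's/\**$//' | sort -u.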

6. Using the pelopwar.txt file from the grep/ subdirectory of the UNIX Public Directory, construct filters to do the following:

a. Show me the first 22 lines of this file. How did you do this?
b. Show me the last 4 lines of this file. How did you do this?
c. Show me lines 32-48 of this file. How did you do this? (HINT: the last 16 lines of the first 48)
d. Of the last 12 lines in this file, show me the first 4. How did you do this?

Being familiar with the commands and utilities available to you on the system greatly increases your ability to construct effective filters, and ultimately solve problems in a more efficient and creative manner.
a. Show me the first 22 lines of the file.  How did you do this?
 
Syntax: lab46:~/projects/filters$ head -n22 pelopwar.txt
 
Provided by The Internet Classics Archive.
See bottom for copyright. Available online at
    http://classics.mit.edu//Thucydides/pelopwar.html
 
The History of the Peloponnesian War
By Thucydides
 
 
Translated by Richard Crawley
 
----------------------------------------------------------------------
 
THE FIRST BOOK
 
Chapter I
 
The State of Greece from the earliest Times to the Commencement of
the Peloponnesian War
 
Thucydides, an Athenian, wrote the history of the war between the
Peloponnesians and the Athenians, beginning at the moment that it
broke out, and believing that it would be a great war and more worthy
b. Show me the last 4 lines of this file. How did you do this?
 
Syntax:  lab46:~/projects/filters$ tail -n4 pelopwar.txt
 
our children; so improbable is it that the Athenian spirit will be
the slave of their land, or Athenian experience be cowed by war.
 
"Not that I would bid you be so unfeeling as to suffer them
c. Show me lines 32-48 of this file. How did you do this? (HINT: the last 16 lines of the first 48)
 
Syntax:  lab46:~/projects/filters$ awk 'FNR>=32 && FNR<=48' pelopwar.txt
 
yet the evidences which an inquiry carried as far back as was practicable
leads me to trust, all point to the conclusion that there was nothing
on a great scale, either in war or in other matters.
 
For instance, it is evident that the country now called Hellas had
in ancient times no settled population; on the contrary, migrations
were of frequent occurrence, the several tribes readily abandoning
their homes under the pressure of superior numbers. Without commerce,
without freedom of communication either by land or sea, cultivating
no more of their territory than the exigencies of life required, destitute
of capital, never planting their land (for they could not tell when
an invader might not come and take it all away, and when he did come
they had no walls to stop him), thinking that the necessities of daily
sustenance could be supplied at one place as well as another, they
cared little for shifting their habitation, and consequently neither
built large cities nor attained to any other form of greatness. The
richest soils were always most subject to this change of masters;
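The awk approach worked, but the hint was pointing at a head/tail combination; counting inclusively, lines 32 through 48 is 17 lines, so (a sketch, checked on a numbered stream):

```shell
# head -48 keeps lines 1-48; tail -17 then keeps the last 17 of those,
# i.e. lines 32-48 inclusive.
seq 1 60 | head -48 | tail -17
```

Against the real file this would be head -48 pelopwar.txt | tail -17.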
d. Of the last 12 lines in this file, show me the first 4. How did you do this?
 
lab46:~/projects/filters$ awk 'FNR>=1593 && FNR<=1597' pelopwar.txt
 
and can import what they want by sea. Again, if we are to attempt
an insurrection of their allies, these will have to be supported with
a fleet, most of them being islanders. What then is to be our war?
For unless we can either beat them at sea, or deprive them of the
revenues which feed their navy, we shall meet with little but disaster.
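Hard-coding the line numbers in awk works, but it breaks if the file ever changes length; chaining tail and head expresses the question directly. A sketch on a numbered stream:

```shell
# Of the last 12 lines, show the first 4.
seq 1 100 | tail -12 | head -4
```

For the real file this would be tail -12 pelopwar.txt | head -4.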

Being familiar with the commands and utilities we have available to us on the system greatly increases our ability to construct effective filters, and ultimately solve problems in a more efficient and creative manner. While solving many of these problems exceeded the amount of time available, it was definitely worth the effort.

Investigating

1.  Answer me the following:
a. What is different about these two files?
b. What is similar?
c. If dd(1) copies (or duplicates) data, why do you suppose these differences exist?
d. What is the output of file(1) when you run it on both of these files?
e. When you execute each file, is the output the same or different?
f. Any prerequisite steps needed to get either file to run? What were they?

Consistency of data has been a desire of computer users since long before computers were readily available. Being able to verify the authenticity of two pieces of data, minimizing the chance of some hidden alteration or forgery, is an important capability to possess.

When I looked at these two files in an effort to answer (a.) above, “What is different about these two files?”, the immediate thing that jumps out at me is the change in permissions.

-rwxr-xr-x 1 root    root  4912 Feb 16  2012 /usr/bin/uptime
-rw-r--r-- 1 bkathan lab46 4912 Apr 17 22:58 howlong
 
The change appears to be in the right to execute the file.  My impression is that the permissions changed because dd(1) wrote "howlong" as a brand-new file under my account, so it picked up my ownership and my default (umask) permissions instead of inheriting the original file's mode.

Regarding (b.) above, what is similar is the read permission on both files, and the size (4912 bytes) is identical. The name was changed, along with the owner and some of the file permissions.

In (c.) we are asked, “If dd(1) copies (or duplicates) data, why do you suppose these differences exist?” While I'm not 100% sure of my answer, I believe it is because the copy is a new file created under my account rather than by the original file's creator, so it received my ownership and default permissions.

Regarding (d.), “What is the output of file(1) when you run it on both of these files?” file(1) reports both as ELF 64-bit LSB executables, dynamically linked; the full output is shown further below.

To answer (e.), “When you execute each file, is the output the same or different?” The output I received was “11:25:21 up 9 days, 6:28, 5 users, load average: 0.00, 0.00, 0.00” and “11:28:40 up 9 days, 6:31, 5 users, load average: 0.00, 0.00, 0.00.” The only real difference is the timestamp; since they weren't run at the same moment, they produce slightly different output. Both report the system up 9 days.

Finally, to answer (f.), “Any prerequisite steps needed to get either file to run? What were they?” The answer I get is yes! The copied file needs to have its permissions altered to allow for execution.

chmod 755 howlong
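Tying this back to the octal permissions from January: 755 is 7 (4+2+1, rwx) for the owner and 5 (4+1, r-x) for group and other. A sketch on a scratch file (the path is illustrative):

```shell
# Create a scratch file and give it rwxr-xr-x (755) permissions.
touch /tmp/howlong.demo
chmod 755 /tmp/howlong.demo
ls -l /tmp/howlong.demo   # mode column reads -rwxr-xr-x
```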

Although many ways exist, there are two common ways of comparing two files:

diff(1): compares two files line by line, indicating differences (useful for text files)
md5sum(1): computes an MD5 hash of a file's contents, creating a unique data fingerprint
 
2. Answer me the following:
a. Are /usr/bin/uptime and howlong text files or binary files? What is your proof?
b. Using diff(1), verify whether or not these files are identical. Show me the results.
c. Using md5sum(1), verify whether or not these files are identical. Show me the results.
d. Using md5sum(1), compare the MD5 hash of one of these files against /bin/cp, is there a difference?
e. How could an MD5 hash be useful with regards to data integrity and security?
f. In what situations could diff(1) be a useful tool for comparing differences?

The question asked in (a.) above, “Are /usr/bin/uptime and howlong text files or binary files? What is your proof?”, is answered below. They are both ELF 64-bit LSB executables, dynamically linked to shared libraries. My proof comes from running file(1) on each of them.

Next, in (b.) we are instructed to, “Using diff(1), verify whether or not these files are identical. Show me the results.” I posted the result below from using the command diff -q /usr/bin/uptime howlong. From the lack of output, I believe that both files have identical content.

In (c.) above, we are instructed, “Using md5sum(1), verify whether or not these files are identical. Show me the results.” I've posted the output below, but the answer, after using md5sum, again appears to be that both files are identical.

In response to (d.), “Using md5sum(1), compare the MD5 hash of one of these files against /bin/cp, is there a difference?” Yes, this time there is a significant difference, which I will post below.

Following this, (e.) above asks, “How could an MD5 hash be useful with regards to data integrity and security?” To provide a personal example, I recently downloaded the “Ultimate Boot CD” image from their website. To verify that my download matched the original, I ran md5sum on the downloaded image and compared the result. This verified that I had an exact copy and it was therefore safe to execute.

Finally, (f.) above asks, “In what situations could diff(1) be a useful tool for comparing differences?” First, it would be valuable for verifying that files sent from one network to another have not been manipulated (altered) by a “man-in-the-middle.” Additionally, if you have multiple files with the same name, it would be helpful to have a mechanism for finding the differences between them.

lab46:/$ file /usr/bin/uptime
/usr/bin/uptime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
 
lab46:~$ file howlong
howlong: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
lab46:~$ diff -q howlong /usr/bin/uptime
lab46:~$

Since there is no output to this command, I would surmise that there are no differences between these files.
lab46:~$ md5sum /usr/bin/uptime howlong
da2388cb2b29d954c22db441bb16188c  /usr/bin/uptime
da2388cb2b29d954c22db441bb16188c  howlong
 
The matching hashes again confirm that these files are identical.
lab46:~$ md5sum /bin/cp howlong
85e58085c89b09fceb6e3602356c07a9  /bin/cp
da2388cb2b29d954c22db441bb16188c  howlong
 
As the casual observer can see, there is a significant difference between the hash outputs.  Obviously, the contents of the two files differ.

Exercise

3. Do the following:
a. Using dd(1), create a 8kB file called “test.file” filled entirely with zeros.
b. How did you do this?
c. How could you verify you were successful?
d. If you ran echo "more information" >> test.file, what would happen?
e. Can you find this information in test.file? Where is it (think in terms of file offsets)
f. If you wanted to retrieve the information you just added using dd(1), how would you do it?

In (a.) we are told, “Using dd(1), create a 8kB file called “test.file” filled entirely with zeros.” In an effort to complete this task, I entered the command below. I was successful in creating a file of exactly 8192 bytes, though I'm not sure I used the intended input file.

lab46:~/public_html$ dd if=text.zero of=test.file bs=1 count=8192
8192+0 records in
8192+0 records out
8192 bytes (8.2 kB) copied, 0.0596932 s, 137 kB/s
 
I'm not sure the code is correct but I think it is close.
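For what it's worth, the usual source of zero bytes is the /dev/zero device, which avoids needing a pre-made text.zero input file. A sketch:

```shell
# 8 blocks of 1024 zero bytes = exactly 8192 bytes (8 KiB).
# dd reports this as "8.2 kB" because it uses decimal (1000-byte) units.
dd if=/dev/zero of=/tmp/test.file bs=1024 count=8 2>/dev/null
wc -c /tmp/test.file
```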

Part (b.) asks, “How did you do this?” I used the command dd if=text.zero of=test.file bs=1 count=8192. Although dd reports the result as “8.2 kB”, that is just its decimal (1000-byte) units; 8192 bytes is exactly 8 KiB, so the size is right on target. For part (c.), “How could you verify you were successful?”, the only way I know is to verify the output against what was expected: a file of 8192 bytes. As requested in part (d.), “If you ran echo "more information" >> test.file, what would happen?”

As instructed in part (d.), tried running the string:
lab46:~/public_html$ echo "more information" >> test.file
lab46:~/public_html$ less test.file
 
The string is appended to the end of the file; paging to the end with less shows the added text sitting after the run of zeros.

The next question, in part (e.), asks, “Can you find this information in test.file? Where is it (think in terms of file offsets)?” After entering less test.file, I paged to the end of the file, where the appended text appears. A file offset is a byte position counted from the start of the file, so the added information begins at offset 8192, immediately after the zeros. Part (f.) asked, “If you wanted to retrieve the information you just added using dd(1), how would you do it?” Searching the dd(1) man page, the skip= operand looks like the key, since it skips a given number of input blocks before copying.
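A sketch of the retrieval asked for in (f.), using dd's skip= operand to jump past the zeros to offset 8192 (built on a throwaway file so it runs standalone):

```shell
# Rebuild the test file: 8192 zero bytes plus an appended line.
dd if=/dev/zero of=/tmp/test.file bs=1024 count=8 2>/dev/null
echo "more information" >> /tmp/test.file

# bs=1 skip=8192 starts copying at byte offset 8192, printing only
# the appended text.
dd if=/tmp/test.file bs=1 skip=8192 2>/dev/null
```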

In the data/ subdirectory of the UNIX Public Directory is a file called data.file
 
Please copy this to your home directory to work on the following question.
4. Applying your skills to analyze data.file, do the following:
a. How large (in bytes) is this file?
b. What information predominantly appears to be in the first 3kB of the file?
c. Does this information remain constant throughout the file? Are there ranges where it differs? What are they?
d. How would you extract the data at one of these ranges and place it into unique files? Extract the data at each identified range.
e. How many such ranges of data are there in this file?
f. Run file(1) on each file that hosts extracted data. What is each type of file?
g. Based on the output of file(1), react accordingly to the data to unlock its functionality/data. Show me what you did.

In (a.) above, to the question, “How large (in bytes) is this file?”, my answer would have to be 8186 bytes. I will post the results below.

lab46:~/projects/datamanipulation$ ls -l
total 8
-rw-r----- 1 bkathan lab46 8186 Apr 20 16:31 data.file
 
It clearly looks to me that the size of this file is 8186 bytes (8.186 kB in decimal units).

Part (b.) above asks, “What information predominantly appears to be in the first 3kB of the file?” It would appear from viewing the file in hexedit that the first 3kB contains all zeros (0). I believe this may be some type of file header. In part (c.) we're asked, “Does this information remain constant throughout the file? Are there ranges where it differs? What are they?” The clear answer to this question is no! The file appears to be comprised of multiple sections. The first comes at marker 0x1000 and appears to mark the beginning of an ELF file. The same ELF file appears to end at 0x13A5. After converting these hex numbers to decimal, I obtained a starting offset of 4096 and an ending offset of 5029, a span of 933 bytes. Using the command dd ibs=1 obs=1 if=data.file of=data.res skip=4096 count=933, I was able to successfully extract this file and its content.
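The hex-to-decimal conversions and the count= arithmetic can be double-checked right in the shell:

```shell
# $((0x...)) converts hex to decimal; count is end minus start.
echo $((0x1000))           # start offset: 4096
echo $((0x13A5))           # end offset: 5029
echo $((0x13A5 - 0x1000))  # count for dd: 933
```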

The next marker was 0x19A7 to 0x19BE (6567 to 6590 in decimal). After converting these to decimal, I was able to extract the file, which contained the message “The Magic Number is: 42”. To accomplish this, I used the same command line as above with the corrected offsets.

The next marker came at 0x1BBF and continued a little past 0x1BF6. The file contained the message “The secret word is: Monkey”. While it was a very difficult exercise, I have learned a great deal through the process of unravelling this puzzle.

The next section, part (d.), asks, “How would you extract the data at one of these ranges and place it into unique files? Extract the data at each identified range.” The command that worked for me was:

dd if=data.file of=data.res skip=7103 count=100 
 
I used the above command to extract a gzip file.  I then had to use gunzip to decompress it, which produced a file containing the line:  "The Secret Word is: MonkeySnake."

In part (e.) we're asked, “How many such ranges of data are there in this file?” It would appear that there are 3 ranges of data, not including the header. Each had to be extracted and then checked with file(1) to see what type of data it contained.

Now in part (f.), we are instructed to “Run file(1) on each file that hosts extracted data. What is each type of file?” Each of the files contained a different type of data. After running file(1) on each, I was able to determine how to read each file.

Finally, part (g.) instructs, “Based on the output of file(1), react accordingly to the data to unlock its functionality/data. Show me what you did.” For the ASCII files, I was able to use vi or nano to read the content. One was a gzip file, so I had to use gunzip to extract it before I could read the content. This was a great exercise which provided a unique learning experience.