This week’s case study focused on data manipulation. The two main utilities that I was introduced to through this activity were the data dump utility (dd) and a binary editor (bvi). The dd command can be used to copy the contents of one file into another. It is especially interesting because it lets the user specify which blocks of data to copy, essentially picking and choosing which bytes of a file should be moved. The binary editor is similar to the vi editor, only it operates on binary data instead of text. When a file is viewed through a binary editor, each byte is displayed as a two-digit hexadecimal value.
The most interesting aspect of this activity for me was extracting other files from a larger one. To do this, I was provided with a file that I viewed with the bvi utility. The first 3 KB of the file, and most of the rest of it, were filled entirely with zeros. However, there were three ranges that contained other information. The data dump utility could be used to extract the information in these ranges. To confirm that I had extracted the correct data, I used the file command to check that each extracted file was recognized as an actual file type. Once extracted, these three ranges were revealed to be an executable file, a text file, and a gzip compressed file, all of which contained messages. I found this lab very interesting because it demonstrated how each bit of data contained in a file can be moved around or edited.
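The extraction step can be sketched roughly as follows. This is only an illustration under stated assumptions: the file names, the 4096-byte offset, and the message are hypothetical stand-ins, not the values from the lab file.

```shell
# Sketch: carve a byte range out of a larger file with dd.
# File names, offset and message are hypothetical, chosen only for this demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a stand-in container: 4096 zero bytes, a 15-byte message, 1024 more zeros.
dd if=/dev/zero of=container bs=4096 count=1 2>/dev/null
printf 'hidden message\n' >> container
dd if=/dev/zero bs=1024 count=1 2>/dev/null >> container

# bs=1 makes skip and count byte-accurate: skip 4096 bytes, then copy 15.
dd if=container of=extracted bs=1 skip=4096 count=15 2>/dev/null

extracted_text=$(cat extracted)
extracted_size=$(( $(wc -c < extracted) ))
echo "$extracted_text ($extracted_size bytes)"

# file reports what kind of data the extracted range contains, if it is installed.
command -v file >/dev/null && file extracted
```

Using bs=1 is slow on large files, but it keeps the arithmetic simple because skip and count are then plain byte counts.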
This lab focuses on the use of filters. These filters were applied to a text file containing a database of students, with various pieces of information about each one. Filters can be applied to this file through the use of pipes in order to sort through the data and display relevant information. Many of the filtering techniques used in this lab have already been explored in some capacity. The grep utility is used to search through the database entries based on some criteria. The sed utility is also used to edit the output and change what information is displayed.
The cut utility is introduced in this activity, and in many ways it is better suited than sed for manipulating the output in this circumstance. The cut utility allows the user to specify a delimiter character that separates the fields of data and then specify which fields should be kept in the output. Another new utility is tr, which translates each character in one set to the corresponding character in another set (or deletes it); for simple replacements it behaves much like sed’s substitution function, although it works on individual characters rather than whole strings. The head and tail programs are used to display only the first or last several lines of output, respectively.
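As a rough illustration of how these filters chain together through pipes, the sketch below recreates a few of the records shown elsewhere in this journal in a temporary file. The particular pipelines are examples I chose, not the exact commands from the lab.

```shell
# Recreate a few records in the colon-separated format used by sample.db.
db=$(mktemp)
cat > "$db" <<'EOF'
name:sid:major:year:favorite candy*
Jim Smith:105743:Economics:Sophomore:Lollipops*
Adelle Wilson:594893:Sociology:Junior:Ju-Ju Fish*
Sarah Billings:938389:Accounting:Freshman:Tic-Tacs*
Eric Vincent:1001119:Biology:Freshman:Lollipops*
EOF

# grep selects the matching records; cut keeps only field 1 (the name).
freshmen=$(grep ':Freshman:' "$db" | cut -d: -f1)

# tail skips the header line; cut keeps the major; tr maps lowercase to
# uppercase one character at a time; head keeps only the first two lines.
majors=$(tail -n +2 "$db" | cut -d: -f3 | tr 'a-z' 'A-Z' | head -n 2)

# tr -d deletes every occurrence of a character, here the trailing '*'.
candies=$(grep 'Lollipops' "$db" | cut -d: -f1,5 | tr -d '*')

printf '%s\n' "$freshmen" "$majors" "$candies"
rm -f "$db"
```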
This case study dealt with the concept of groups and the security features of Unix. This topic mostly deals with file permissions, which specify what actions different groups of users are allowed to perform on a file. The actions that a user can be allowed to perform (or prevented from performing) on a file are read, write, and either execute or search, depending on the file type. These permissions are set separately for the file’s owner, the security group associated with the file, and everyone else. These permissions can be represented symbolically, as they are in a long directory listing, or as an octal value, and they can be changed with the chmod utility.
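A minimal sketch of the two notations, assuming a scratch file with the hypothetical name notes.txt:

```shell
workdir=$(mktemp -d)
f="$workdir/notes.txt"      # hypothetical file name
touch "$f"

# Octal form: 6 = rw- for the owner, 4 = r-- for the group, 0 = --- for others.
chmod 640 "$f"
perms_octal=$(ls -l "$f" | cut -c1-10)

# Symbolic form: add execute for the owner, remove read from the group.
chmod u+x,g-r "$f"
perms_symbolic=$(ls -l "$f" | cut -c1-10)

echo "$perms_octal"
echo "$perms_symbolic"
rm -rf "$workdir"
```

The octal form sets all nine bits at once, while the symbolic form changes only the bits that are named and leaves the rest alone.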
This activity also demonstrated how to determine user and group ID numbers. Each user has a unique ID number (with the root user being 0) that Unix uses to identify them, and similarly each group is also identified by a number. The concept of a umask is also introduced, which is used to specify the permissions that are given to a file when it is created. A umask is defined by three octal digits (one for each type of user) that are applied to the default permissions to specify which permissions should be removed. This case study was useful for demonstrating how permissions can be manipulated and how they affect access to different files on a system.
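The ID lookup and the umask arithmetic can be sketched as below. The value 022 is just an example mask, and the arithmetic assumes the usual defaults of 666 for new regular files and 777 for new directories.

```shell
# id prints the numeric IDs the system uses for the current user and group.
uid=$(id -u)
gid=$(id -g)
echo "uid=$uid gid=$gid"

# A umask removes bits from the default mode.
# Octal 666 AND NOT 022 leaves 644; octal 777 AND NOT 022 leaves 755.
file_mode=$(printf '%03o' $(( 0666 & ~0022 )))
dir_mode=$(printf '%03o' $(( 0777 & ~0022 )))
echo "file=$file_mode dir=$dir_mode"

# Check the arithmetic against a real file. The command substitution runs in
# a subshell, so the umask of the calling shell is left untouched.
actual=$(
  umask 022
  d=$(mktemp -d)
  touch "$d/new"
  ls -l "$d/new" | cut -c1-10
  rm -rf "$d"
)
echo "$actual"
```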
A shell script is an executable text file that contains a list of commands and operations to be performed by a command line interpreter.
#!/bin/bash
echo "Please enter your birth year"
read birth
let year=`date +%Y`
let age=$year-$birth
echo $age
lab46:~$ ./age.sh
Please enter your birth year
1986
26
Filtering can be applied to a set of data in order to exclude unnecessary information.
This example shows how a filter can be applied to display only the first 3 lines of a text file.
lab46:~$ head -3 sample.db
name:sid:major:year:favorite candy*
Jim Smith:105743:Economics:Sophomore:Lollipops*
Adelle Wilson:594893:Sociology:Junior:Ju-Ju Fish*
The term regular file distinguishes ordinary files from the other types of files, called special files. Regular files are marked with a “-” for the file flag in a long directory listing. Types of regular files include text files, binary data files, and executable files.
A long listing displaying a regular file.
lab46:~$ ls -l file
-rw-r----- 1 rhensen lab46 90 May 8 21:46 file
A directory is a file type that is able to contain other files. A directory is marked with a “d” for the file flag in a long directory listing. Directories are the most common type of special files found in Unix.
A long listing of a directory file.
lab46:~$ ls -ld bin
drwxr-xr-x 2 rhensen lab46 18 Mar 9 22:49 bin
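The file flag can also be checked from a script. The sketch below creates a scratch file and directory (the names simply mirror the listings above) and compares the flag from ls with the shell's built-in file tests.

```shell
workdir=$(mktemp -d)
touch "$workdir/file"     # a regular file
mkdir "$workdir/bin"      # a directory

# The first character of a long listing is the file type flag.
file_flag=$(ls -ld "$workdir/file" | cut -c1)
dir_flag=$(ls -ld "$workdir/bin" | cut -c1)

# test -f and test -d ask the same question without parsing ls output.
if [ -f "$workdir/file" ]; then file_kind="regular"; else file_kind="other"; fi
if [ -d "$workdir/bin" ]; then dir_kind="directory"; else dir_kind="other"; fi

echo "$file_flag $file_kind"
echo "$dir_flag $dir_kind"
rm -rf "$workdir"
```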
Permissions specify how users are able to access a file. Permissions include being able to read, write to, and search or execute a file depending on the type of file it is. Permissions are defined separately for the owner of the file, the security group associated with the file, and everyone else. These are displayed symbolically as a list of three sets of rwx bits.
A long directory listing displays the symbolic permissions for each type of user. The first set applies to the owner of the file, the second to the group, and the third to the world. The owner of the file and the group associated with it are displayed afterwards.
drwxr-xr-x 2 rhensen lab46 4096 Apr  7 02:10 courselist
-rw-r--r-- 1 rhensen lab46  666 Feb  9 21:50 courses.html
-rwxrwxrwx 1 rhensen lab46  794 May  3 13:03 cs0xd.sh
-rw-r----- 1 rhensen lab46 8186 Apr 19 16:59 data.file
The umask is a value that can be specified by the user to set the permissions for new files that are created. A umask specifies which permissions should not be given when a new file is created.
This shows that when the umask is set to 000 (which leaves the default permissions unchanged), a regular file is created with the permission 666. When a umask of 022 is set, a newly created regular file has a permission of 644.
lab46:~$ umask 000
lab46:~$ touch test1
lab46:~$ ls -l test1
-rw-rw-rw- 1 rhensen lab46 0 May 9 23:17 test1
lab46:~$ umask 022
lab46:~$ touch test2
lab46:~$ ls -l test2
-rw-r--r-- 1 rhensen lab46 0 May 9 23:18 test2
The data dump program (dd) can be used to transfer data from one file into another file. The user has a great deal of control when specifying which bytes of data are to be copied over and where to place them in the output file.
An example of dd being used to copy the entire contents of a file called “pattern” into a file called “test1”.
lab46:~$ dd if=pattern of=test1
0+1 records in
0+1 records out
30 bytes (30 B) copied, 0.0437978 s, 0.7 kB/s
A binary editor is similar to a text editor, only it deals with binary data instead of text. A binary editor, such as bvi, allows a user to view and edit each byte of a file.
This is what a screen in bvi looks like. The leftmost column is the offset of the first byte on each line, written in hexadecimal. The far right is the ASCII equivalent of the binary data (although this is typically meaningless when not dealing with text files). The middle shows the values of the bytes written in hexadecimal.
00000000  6E 61 6D 65 3A 73 69 64 3A 6D 61 6A 6F 72 3A 79  name:sid:major:y
00000010  65 61 72 3A 66 61 76 6F 72 69 74 65 20 63 61 6E  ear:favorite can
00000020  64 79 2A 0A 4A 69 6D 20 53 6D 69 74 68 3A 31 30  dy*.Jim Smith:10
00000030  35 37 34 33 3A 45 63 6F 6E 6F 6D 69 63 73 3A 53  5743:Economics:S
00000040  6F 70 68 6F 6D 6F 72 65 3A 4C 6F 6C 6C 69 70 6F  ophomore:Lollipo
00000050  70 73 2A 0A 41 64 65 6C 6C 65 20 57 69 6C 73 6F  ps*.Adelle Wilso
00000060  6E 3A 35 39 34 38 39 33 3A 53 6F 63 69 6F 6C 6F  n:594893:Sociolo
00000070  67 79 3A 4A 75 6E 69 6F 72 3A 4A 75 2D 4A 75 20  gy:Junior:Ju-Ju 
00000080  46 69 73 68 2A 0A 53 61 72 61 68 20 42 69 6C 6C  Fish*.Sarah Bill
00000090  69 6E 67 73 3A 39 33 38 33 38 39 3A 41 63 63 6F  ings:938389:Acco
000000A0  75 6E 74 69 6E 67 3A 46 72 65 73 68 6D 61 6E 3A  unting:Freshman:
000000B0  54 69 63 2D 54 61 63 73 2A 0A 45 72 69 63 20 56  Tic-Tacs*.Eric V
000000C0  69 6E 63 65 6E 74 3A 31 30 30 31 31 31 39 3A 42  incent:1001119:B
000000D0  69 6F 6C 6F 67 79 3A 46 72 65 73 68 6D 61 6E 3A  iology:Freshman:
000000E0  4C 6F 6C 6C 69 70 6F 70 73 2A 0A 4C 69 6E 75 73  Lollipops*.Linus
000000F0  20 54 6F 72 76 61 6C 64 73 3A 34 34 33 32 30 30   Torvalds:443200
00000100  31 3A 43 6F 6D 70 75 74 65 72 20 53 63 69 65 6E  1:Computer Scien
00000110  63 65 3A 53 65 6E 69 6F 72 3A 53 6E 69 63 6B 65  ce:Senior:Snicke
00000120  72 73 2A 0A 41 6C 61 6E 20 43 6F 78 3A 34 30 30  rs*.Alan Cox:400
00000130  34 39 33 30 30 3A 43 6F 6D 70 75 74 65 72 20 53  49300:Computer S
00000140  63 69 65 6E 63 65 3A 53 65 6E 69 6F 72 3A 57 68  cience:Senior:Wh
00000150  6F 70 70 65 72 73 2A 0A 41 6C 61 6E 20 54 75 72  oppers*.Alan Tur
00000160  69 6E 67 3A 34 30 30 33 30 33 33 33 3A 43 6F 6D  ing:40030333:Com
"sample.db" 898 bytes    00000000 \156 0x6E 110 'n'
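For a quick, non-interactive look at the same kind of information, the standard od utility can print a hexadecimal dump. This is a swapped-in tool for illustration, not part of bvi, and the input here is just the first few bytes of the listing above fed in with printf.

```shell
# -A x prints offsets in hexadecimal and -t x1 prints each byte as a
# two-digit hexadecimal value.
printf 'name:sid:major:y' | od -A x -t x1

# With -A n the offset column is dropped, which makes the bytes easy to capture.
hex=$(printf 'name:' | od -A n -t x1 | tr -d ' \n')
echo "$hex"
```

The captured bytes correspond to the 6E 61 6D 65 3A at the start of the bvi screen, although od prints the hexadecimal digits in lowercase.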
To become familiar with more complex Unix concepts.
My objective for this portion of the semester was to become more familiar with some of the more complex concepts in Unix, such as regular expressions and scripting.
To determine my progress in this area I will look at my comprehension of how to use the different symbols of regular expressions. It is also useful to be able to think in terms of regular expressions and use them to obtain desired results.
I will be able to measure my progress by looking at how I am doing at solving exercises that involve the use of regular expressions and writing scripts.
I believe that in this portion of the semester I have become more proficient in using regular expressions effectively. When the concept was first introduced to me it reminded me of wildcards, which are also used to find matches to patterns. I felt that wildcards were fairly easy to use and straightforward. Understanding how to use them has made regular expressions easier; however, regular expressions are much more complex and much more can be done with them. I feel that after working through the various labs in this portion of the semester I have a better understanding of how regular expressions are used. I still sometimes forget that different tools handle regular expressions differently, such as grep not recognizing extended regular expression metacharacters unless it is run as egrep or with the -E option. I still find regular expressions challenging, especially when implementing them in a script to perform complex functions, but I feel that I have a good understanding of the concept and can use them effectively.
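The difference between basic and extended regular expressions in grep can be seen in a small sketch like this one. The input lines are just the four class years, used as sample data.

```shell
years=$(printf 'Freshman\nSophomore\nJunior\nSenior\n')

# Basic regular expressions: '|' is an ordinary character, so nothing matches.
basic=$(printf '%s\n' "$years" | grep -c 'Freshman|Senior')

# Extended regular expressions (-E): '|' means alternation, so two lines match.
extended=$(printf '%s\n' "$years" | grep -E -c 'Freshman|Senior')

# '+' is likewise literal in a basic expression and a repetition operator with -E.
plus_basic=$(printf '%s\n' "$years" | grep -c 'o+')
plus_extended=$(printf '%s\n' "$years" | grep -E -c 'o+')

echo "basic=$basic extended=$extended plus_basic=$plus_basic plus_extended=$plus_extended"
```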
Is it possible to use the dd command to combine text files?
The manual page for dd was used to formulate this experiment.
The hypothesis that I would like to test is that the dd command can be used to copy the contents of several text files into another file in order to combine them. I believe that this can be done, although I am testing it because it is possible that text files contain some file header information that cannot be viewed. If such information precedes a text file, I believe this test will not work. However, my understanding of text files is that they only contain text, with no extraneous formatting information.
For this experiment I will create two text files and copy both into the same destination file. To prevent the second dd command from overwriting the output of the first, I will use the seek option to place the second set of data after the first.
Performing the experiment yielded the following results:
lab46:~$ cat small1
the answer to life, the universe, and everything
lab46:~$ cat small2
42
lab46:~$ dd if=small1 of=big
0+1 records in
0+1 records out
49 bytes (49 B) copied, 0.0419779 s, 1.2 kB/s
lab46:~$ ls -l big
-rw-r--r-- 1 rhensen lab46 49 May 9 21:53 big
lab46:~$ dd if=small2 of=big seek=49
0+1 records in
0+1 records out
3 bytes (3 B) copied, 0.0154834 s, 0.2 kB/s
lab46:~$ cat big
the answer to life, the universe, and everything
42
The results of this experiment show that the combined file is still readable after the contents of both text files have been copied into it. I was unsure whether the contents of each file would appear on separate lines or as a single line; the output shows each file's contents on its own line, which makes sense because the 49 bytes copied from small1 include the newline character at the end of its line.
This test shows that it is possible to merge text files using this method. Because both lines are readable, it also shows that text files carry no file header information that interferes with the text.
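One detail worth keeping in mind when repeating this test: dd counts seek in output blocks, which are 512 bytes unless bs or obs is given, so seek=49 without a block size moves 49 blocks (25,088 bytes) into the output rather than 49 bytes, and that may leave a gap of zero bytes that cat does not show on a terminal. The sketch below is one possible byte-accurate variant, not the method used above. It assumes bs=1 so that seek counts single bytes, and it adds conv=notrunc so the data already in the output file is kept.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'the answer to life, the universe, and everything\n' > small1
printf '42\n' > small2

# Copy the first file, then measure it so the offset is not hard-coded.
dd if=small1 of=big 2>/dev/null
offset=$(( $(wc -c < small1) ))

# bs=1 makes seek count single bytes; conv=notrunc keeps the data already in big.
dd if=small2 of=big bs=1 seek="$offset" conv=notrunc 2>/dev/null

big_size=$(( $(wc -c < big) ))
cat big
echo "offset=$offset size=$big_size"
```

With this variant the combined file is exactly as large as the two inputs together, which is an easy way to check that nothing extra was written between them.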