

Corning Community College


UNIX/Linux Fundamentals



Lab 0x3: Text Processing


Objective

To explore the text manipulation abilities of UNIX, and to introduce the vi editor.

Reading

In “Harley Hahn's Guide to UNIX and Linux”, please read:

  • Chapter 21 (“Displaying Files”, pages 521-558).
  • Chapter 22 (“The vi Text Editor”, pages 559-626).

Background

Computers have many uses, but perhaps one of the most widely recognized is their ability to store and process text. Word processors, e-mail clients, instant messaging, and text editors all come to mind. We can type, store, and later go back and manipulate text, all without even hurting a tree!

UNIX first became practically useful as a text processing system. The patent department at Bell Labs needed to prepare documents, so early text editing and formatting tools were developed to let the department's staff take advantage of this powerful system.

We should all know what text is: ASCII or Unicode characters stored in a document, text file, or field of some sort. We can commonly save and load this text, and other operations are usually available as well.

However, before we even get to the text editing environment, we should look at how UNIX deals with input and output from the user's main peripheral devices (keyboard, monitor).

UNIX uses what are known as data streams: channels through which information flows, connected to various components in the system. When you hit a key on the keyboard, it gets translated into a signal, ultimately an ASCII character, and is received by the program expecting input. The same holds for the display of information on your monitor.

There are three major data streams with which we should become very familiar (you'll encounter them elsewhere, including in C, C++, and Java): Standard Input, Standard Output, and Standard Error.

Standard Input (abbreviated STDIN) refers to the information you type at your keyboard.

Standard Output (abbreviated STDOUT) is sent to your terminal display.

Then we have Standard Error (abbreviated STDERR), which is also sent to your terminal display by default, but is used for diagnostic and/or error messages.

Ok, so if BOTH STDOUT and STDERR go to the same place, why even have STDERR in the first place? As we'll discover in the coming weeks, UNIX provides the ability to redirect and filter text, so we can cleanly and efficiently separate these two data streams.
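As a small preview of that separation, the sketch below sends the two streams of a single command to different files. The scratch directory and file names are made up for illustration, and the redirection operators (“>” and “2>”) are covered properly later in the course, so treat this as a taste rather than a full explanation.

```shell
# Hypothetical scratch directory so the example leaves the system untouched.
workdir=$(mktemp -d)

# ls is asked about one path that exists (/etc) and one that does not.
# The listing goes to STDOUT, the complaint goes to STDERR, and each
# stream is redirected to its own file. "|| true" ignores ls's error status.
ls -d /etc "$workdir/no-such-file" > "$workdir/out.txt" 2> "$workdir/err.txt" || true

echo "STDOUT captured:"
cat "$workdir/out.txt"

echo "STDERR captured:"
cat "$workdir/err.txt"
```

Without the redirections, both the listing and the error message would appear mixed together on your terminal.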

Fun with cat

Just as the ls(1) utility can be used to view the contents of directories, there exist similar capabilities to view the contents of files.

With ls(1), there were multiple ways of listing files. Regular view, long listing… we can even sort them in columns, among other things. So how do we do something similar with the contents of files? Behold, the cat(1), head(1), and tail(1) utilities.

You may have already heard references to the ubiquitous UNIX cat(1) utility in the course. As for what it does, we can consult its manual page:

cat - concatenate files and print on the standard output

Or, in other words, cat can be used for displaying text files on the screen. So, if you ever have a text file you wish to view the contents of, you don't have to use a text editor- cat will do nicely. (In fact, you can coax cat into functioning as an (albeit bare) text editor.)

In a way, utilities like cat(1), head(1), and tail(1) hold true to the UNIX philosophy of doing one thing and doing it well. Text editors usually have multiple features- we can edit, manipulate, etc. the text. These utilities, on the other hand, primarily allow us to output text to STDOUT.

The cat(1) utility is actually more of a Swiss Army knife of text manipulation, allowing us to use it for entering text as well as outputting text. It is a small, quick, and simple utility to use when poking around text files.
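A minimal sketch of these uses follows. The scratch files a.txt and b.txt are hypothetical, and the example relies on output redirection (“>”) and a pipe (“|”), both of which are explained later in the course.

```shell
# Hypothetical scratch files to experiment on.
workdir=$(mktemp -d)
printf 'first file\n' > "$workdir/a.txt"
printf 'second file\n' > "$workdir/b.txt"

# One file: cat simply displays it.
cat "$workdir/a.txt"

# Two files: cat concatenates them in the order given.
combined=$(cat "$workdir/a.txt" "$workdir/b.txt")
echo "$combined"

# No file arguments: cat copies STDIN to STDOUT. Here the input is
# piped in rather than typed at the keyboard.
echoed=$(printf 'typed text\n' | cat)
echo "$echoed"
```

The last case is the "input mode" behavior: with nothing to read from a file, cat reads whatever arrives on STDIN instead.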

1. Using the cat(1) utility:
a. Go into the /etc directory.
b. “cat” out /etc/motd, /etc/hosts, and other files.

NOTE: If you invoke cat(1) with no arguments, it waits for input from STDIN. If this happens to you, simply hit CTRL-C to get out of it.

Other text processing commands

In addition to cat(1), we also have the head(1) and tail(1) utilities that allow us to display the first n or last n lines of a text file (respectively).

The /etc/passwd file contains an entry for every user on the system. It is often an extremely long file, containing hundreds (if not thousands) of entries. Let us use this file to play with some of these utilities.

First, let's get an idea of how big the file is. The wc(1) utility can be used for counting how many lines (as well as characters and words) are present in a text file.

2. Using the wc(1) utility on /etc/passwd:
a. How do you get wc(1) to display just the line count?
b. How many lines are in the file?

So, if we were to cat this file to the screen, it would take several screenfuls of data to get through everything. However, if we were only interested in the first few lines of the file, we could make things work to our advantage.

3. Using the head(1) utility on /etc/passwd:
a. Display the first 16 lines of the file.
b. How did you do this?

Alternatively, if we were interested in the last few lines of the file, the tail(1) utility would be at our service.

4. Using the tail(1) utility on /etc/passwd:
a. Display the last 8 lines of the file.
b. How did you do this?

The tail(1) utility also has an interesting capability, enabled when we use the “-f” argument. When we do this, tail(1) will not exit, but instead keep monitoring the end of the file, outputting new data as it is appended.
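The sketch below imitates this behavior within a single session, assuming a throwaway file created with mktemp (the names logfile and logfile.seen are made up for the example). It runs “tail -f” in the background, appends a line, and then shows what tail reported.

```shell
# Hypothetical scratch file standing in for the file being watched.
logfile=$(mktemp)
echo "initial data" > "$logfile"

# Follow the file in the background, saving whatever tail prints.
tail -f "$logfile" > "$logfile.seen" &
tail_pid=$!

# Give tail a moment to start, then append a new line to the file.
sleep 1
echo "stuff" >> "$logfile"

# Give tail a moment to notice the new line, then stop it.
sleep 2
kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true

# Show everything tail reported: the existing line plus the appended one.
cat "$logfile.seen"
```

In normal interactive use you would leave “tail -f” running in the foreground and stop it with CTRL-C when finished.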

For this next question, you will need to work with a partner (or log in twice). Be sure to identify your partner in the lab submission.

5. Do the following:
a. Change to the /tmp directory.
b. One group member should create a file with a unique name, as agreed on by both people.
c. Fill the file with some initial data.
d. One group member should “tail -f” this file.
e. The other member should append text to the file by running: echo "stuff" >> /tmp/groupfile a couple of times (substituting your file's name for groupfile).
f. Be sure to switch roles so both group members see what is going on.

Be sure to describe what seems to be happening when you do this. The “>>” will be explained in more detail in coming weeks.

6. Food for thought:
a. Of what use is the “-f” feature of tail(1)?
b. Is there anywhere this could be put to good use?

Text Editor

You should already be familiar with the PICO text editor (it is the one used when composing messages in PINE). This is a really simplistic text editor, displaying available commands along the bottom of your screen.

PICO can be picked up easily, but really offers very little in terms of flexibility. As such, it is too easy for the likes of this course. We want to learn bigger, more sophisticated tools, and that is exactly what we shall do.

When dealing with the processing of text in an editor, there is a set of operations that can be performed: creation, navigation, modification, and file operations (there could undoubtedly be many more).

Moded vs. Unmoded Editors

The concept of modes comes around when categorizing particular functions. A mode would represent a particular set of operations that could be performed, if in that mode. Popularly, two common modes are insert and command.

Insert mode would be responsible for the actual entry of text. All keys would be available for that function, and would have no other meaning except for the data to be stored in the file.

Command mode, alternatively, is responsible for operations to be performed on that entered text: navigation, file manipulation, and of course toggling insert mode.

Unmoded/Modeless Editors

Almost everyone has had experience with unmoded editors. Unmoded editors combine both the insert and command mode functionality into one. Most keys can be used to enter text; however, some have been reserved for special operations (most commonly in combination with the CTRL key).

Page 70 in Learning the UNIX Operating System, 5th Edition covers the popular nano(1) unmoded text editor. For the purposes of this lab and class, we will not focus on nano or many other unmoded editors. The other books also give some in-depth coverage of nano (or its alter-ego: pico).

Moded Editors

UNIX saw the creation of one of the few moded editors in existence: vi.

vi has two distinct modes: command and insert.

When in command mode, you cannot enter text. Similarly, commands cannot be entered while in insert mode. The advantage of having modes is that you have the full availability of keys- while in command mode each key on the keyboard can have a unique function. Conversely, when dealing with modeless editors (i.e., nearly everything else out there), the keys on the keyboard are used for entering text, and special escape sequences are used to designate commands.

Although we make mention of vi, we will actually be learning a variant known as vim (VI iMproved), which includes considerable additional functionality not present in traditional vi. For those planning to do System Administration in the future, it would likely be a very good idea to get familiar with what functionality constitutes traditional vi, and what is available in vim.

The vi editor

Inserting Commands

To enter insert mode, one of the following commands can be used:

command   description
i         insert before cursor
o         insert line below
O         insert line above
a         insert after cursor

To exit insert mode and return to command mode, hit the escape (ESC) key once.

File Manipulation Commands

When in command mode, you may use the following to alter the current file:

command   description
:wq       save and exit
:w        save the file
:q!       quit without saving
ZZ        quit and save only if changed

When in command mode, the following commands are used to navigate:

command               description
h (or left arrow)     move cursor left
j (or down arrow)     move cursor down
k (or up arrow)       move cursor up
l (or right arrow)    move cursor right

vi vs. ex

There exists a distinction between the vi and ex editors. For the purposes of this course, vi is a full screen visual editor that uses ex. When in command mode you can use either. The difference is how the commands are issued:

  • ex commands are always preceded with a colon (:)
  • vi commands do not use a colon

There are many more commands in the book. Some more will be covered in this lab and later in the course. Get familiar with the basics.

Procedure

Check out the following VIM Tutorial, and once done, use your newfound vim knowledge to create your own customized ~/.signature and ~/.plan files.

Conclusions

All questions in this assignment require an action or response. Please organize your responses into an easily readable format and submit the final results to your instructor.

Your assignment is expected to be performed and submitted in a clear and organized fashion- messy or unorganized assignments may have points deducted. Be sure to adhere to the submission policy.

The successful results of the following actions will be considered for evaluation:

  • your responses to questions submitted at the following form:

http://lab46.corning-cc.edu/haas/content/unix/submit.php?lab3

  • the response from the form (received via e-mail) saved as lab3.txt to your ~/src/unix/ directory
  • addition/commit of ~/src/unix/lab3.txt into your repository (CS 0x0 sets you up to do this).

As always, the class mailing list and class IRC channel are available for assistance, but not answers.

haas/spring2011/unix/labs/lab3.txt · Last modified: 2011/02/12 01:08 by 127.0.0.1