Corning Community College

CSCS1730 UNIX/Linux Fundamentals

Lab 0x3: Text Processing

~~TOC~~

=====Objective=====
To explore the text manipulation abilities of UNIX, and to become acquainted with the vi editor.

=====Reading=====
In "Harley Hahn's Guide to UNIX and Linux", please read:

  * Chapter 21 ("Displaying Files", pages 521-558).
  * Chapter 22 ("The vi Text Editor", pages 559-626).

=====Background=====
Computers have many uses, but perhaps one of the most widely recognized is their ability to store and process text. Word processors, e-mail clients, instant messaging, and text editors all come to mind. We can type, store, and later go back and manipulate, all without even hurting a tree!

UNIX actually first became practically useful serving typing duty. The patent department at Bell Labs needed the ability to prepare and format text, so UNIX and its early text editing tools were put to work, allowing the department's staff to take advantage of this powerful system.

We should all know what text is: ASCII or Unicode characters transcribed into a document or a text file/field of some sort. We can commonly perform operations such as saving and loading this text, and other operations are typically available as well.

However, before we even get to the text editing environment, we should look at how UNIX deals with input and output from the user's main peripheral devices (keyboard, monitor).

UNIX uses what are known as data streams: channels through which information flows, and which are connected to various components in the system. When you hit a key on the keyboard, it gets translated into a signal, ultimately an ASCII character, and is received by the program expecting input. The same goes for the display of information on your monitor.

There are 3 major data streams with which we should become very familiar (you'll encounter them elsewhere, including in C, C++, and Java): Standard Input, Standard Output, and Standard Error.
Standard Input (abbreviated STDIN) refers to the information you type at your keyboard. Standard Output (abbreviated STDOUT) is sent to your terminal display. Then we have Standard Error (abbreviated STDERR), which is also sent to your terminal display by default, but is used for diagnostic and/or error messages.

OK, so if BOTH STDOUT and STDERR go to the same place, why even have STDERR in the first place? As we'll discover in the coming weeks, UNIX provides the ability to redirect and filter text, and keeping the two streams distinct lets us cleanly and efficiently separate ordinary output from diagnostics.

=====Fun with cat=====
Just as the **ls**(**1**) utility can be used to view the contents of directories, there exist similar capabilities to view the contents of files.

With **ls**(**1**), there were multiple ways of listing files: regular view, long listing, and we can even sort them in columns, among other things.

So how do we do something similar with the contents of files? Behold, the **cat**(**1**), **head**(**1**), and **tail**(**1**) utilities.

You may have heard reference to the ubiquitous UNIX **cat**(**1**) utility already in the course. As for what it does, we can consult its manual page:

**cat** - concatenate files and print on the standard output

Or, in other words, cat can be used for displaying text files on the screen. So, if you ever have a text file whose contents you wish to view, you don't have to use a text editor; cat will do nicely. (In fact, you can manipulate cat into functioning as an (albeit bare) text editor.)

In a way, utilities like **cat**(**1**), **head**(**1**), and **tail**(**1**) hold true to the UNIX philosophy of doing one thing and doing it well. Text editors usually have multiple features: we can edit, manipulate, and otherwise alter the text. These utilities, on the other hand, primarily allow us to output text to STDOUT.
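
To make the manual page's description a little more concrete, here is a minimal sketch of cat displaying one file and concatenating two. The file names and the temporary directory are hypothetical, chosen only for illustration, and the redirection symbols (**>** and **2>**) are used here ahead of their proper introduction later in the course.

```shell
# Work in a throwaway directory so nothing on the system is touched.
workdir=$(mktemp -d)

# Create two small text files (hypothetical names, for illustration only).
printf 'first line\nsecond line\n' > "$workdir/one.txt"
printf 'third line\n' > "$workdir/two.txt"

# Display a single file on STDOUT.
cat "$workdir/one.txt"

# Concatenate both files, in the order given, and capture what arrives on STDOUT.
combined=$(cat "$workdir/one.txt" "$workdir/two.txt")
printf '%s\n' "$combined"

# A file that does not exist produces a diagnostic on STDERR, not STDOUT.
# Here STDERR is discarded, so the captured STDOUT is empty.
missing=$(cat "$workdir/no-such-file" 2>/dev/null) || true

# Clean up the throwaway directory.
rm -rf "$workdir"
```

The last step hints at why STDERR exists as a separate stream: the error message about the missing file never mixes with the captured output.
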
The **cat**(**1**) utility is actually more of a Swiss Army knife of text manipulation, allowing us to use it for entering text as well as outputting text. It is a small, quick, and simple utility to use when poking around text files.

^  1.  ^|Using the **cat**(**1**) utility:|
|  ^  a.|Go into the **/etc** directory.|
|:::^  b.|"cat" out **/etc/motd**, **/etc/hosts**, and other files|

__NOTE:__ If you invoke **cat**(**1**) with no arguments, it goes into an input mode, reading from STDIN. If this happens to you, simply hit **CTRL-C** to get out of it (**CTRL-D** on an empty line, which signals end of input, also works).

=====Other text processing commands=====
In addition to **cat**(**1**), we also have the **head**(**1**) and **tail**(**1**) utilities, which allow us to display the first n or last n lines of a text file (respectively).

The **/etc/passwd** file contains an entry for every user on the system. It is often an extremely long file, containing hundreds (if not thousands) of entries. Let us use this file to play with some of these utilities.

First, let's get an idea of how big the file is. The **wc**(**1**) utility can be used for counting how many lines (as well as characters and words) are present in a text file.

^  2.  ^|Using the **wc**(**1**) utility on **/etc/passwd**:|
|  ^  a.|How do you get **wc**(**1**) to display just the line count?|
|:::^  b.|How many lines are in the file?|

So, if we were to **cat** this file to the screen, it would take several screenfuls of data to get through everything. However, if we were only interested in the first few lines of the file, we could make things work to our advantage.

^  3.  ^|Using the **head**(**1**) utility on **/etc/passwd**:|
|  ^  a.|Display the first 16 lines of the file.|
|:::^  b.|How did you do this?|

Alternatively, if we were interested in the last few lines of the file, the **tail**(**1**) utility would be at our service.

^  4.  ^|Using the **tail**(**1**) utility on **/etc/passwd**:|
|  ^  a.|Display the last 8 lines of the file.|
|:::^  b.|How did you do this?|

The **tail**(**1**) utility also has an interesting capability, enabled when we use the "**-f**" argument. When we do this, **tail**(**1**) will not exit, but will instead keep monitoring the end of the file, outputting new data as it is appended.

For this next question, you will need to work with a partner (or log in twice). Be sure to identify your partner in the lab submission.

^  5.  ^|Do the following:|
|  ^  a.|Change to the **/tmp** directory.|
|:::^  b.|One member of the group should create a file with a unique name, as agreed on by both people.|
|:::^  c.|Fill the file with some initial data.|
|:::^  d.|One group member should "tail -f" this file.|
|:::^  e.|The other member should append text to the file by doing: **echo "stuff" >> /tmp/groupfile** a couple of times.|
|:::^  f.|Be sure to switch roles so both group members see what is going on.|

Be sure to describe what seems to be happening when you do this. The "**>>**" will be explained in more detail in the coming weeks.

^  6.  ^|Food for thought:|
|  ^  a.|What use would the "**-f**" feature of **tail**(**1**) be?|
|:::^  b.|Is there any place where this could be put to good use?|

=====Text Editor=====
You should already be familiar with the PICO text editor (it is the one used when composing messages in PINE). This is a really simplistic text editor, displaying available commands along the bottom of your screen.

PICO can be picked up easily, but it really offers very little in terms of flexibility. As such, it is too simple for the likes of this course. We want to learn bigger, more sophisticated tools, and that is exactly what we shall do.

When dealing with the processing of text in an editor, there is a set of operations that can be performed: creation, navigation, modification, and file operations (there could undoubtedly be many more).

====Moded vs. Unmoded Editors====
The concept of modes comes about when categorizing particular functions. A mode represents a particular set of operations that can be performed while in that mode.

The two most common modes are insert and command.

Insert mode is responsible for the actual entry of text. All keys are available for that function, and have no meaning other than as data to be stored in the file.

Command mode, alternatively, is responsible for operations to be performed on that entered text. This includes operations such as navigation, file manipulation, and of course switching into insert mode.

====Unmoded/Modeless Editors====
Most everyone has had experience with unmoded editors. Unmoded editors combine both the insert and command mode functionality into one. Most keys can be used to enter text; however, some have been reserved for special operations (most commonly in combination with the CTRL key).

Page 70 in __Learning the UNIX Operating System, 5th Edition__ covers the popular **nano**(**1**) unmoded text editor. For the purposes of this lab and class, we will not focus on nano or the many other unmoded editors. The other books also give some in-depth coverage of nano (or its alter ego, pico).

====Moded Editors====
UNIX saw the creation of one of the few moded editors in existence: **vi**.

vi has two distinct modes: **command** and **insert**.

When in command mode, you cannot enter text. Similarly, commands cannot be entered while in insert mode.

The advantage of having modes is that you have the full availability of keys: while in command mode, each key on the keyboard can have a unique function. Conversely, when dealing with modeless editors (i.e. nearly everything else out there), the keys on the keyboard are used for entering text, and special escape sequences are used to designate commands.
Although we make mention of **vi**, we will actually be learning a variant known as **vim** (Vi IMproved), which includes considerable additional functionality not present in traditional vi. For those planning to do System Administration in the future, it would likely be a very good idea to get familiar with what functionality constitutes traditional vi, and what is available only in vim.

====The vi editor====

===Inserting Commands===
To enter //insert mode//, one of the following commands can be used:

^ command ^ description |
| i | insert before cursor |
| o | insert line below |
| O | insert line above |
| a | insert after cursor |

To exit //insert mode// and return to //command mode//, hit the escape (**ESC**) key once.

===File Manipulation Commands===
When in //command mode//, you may use the following to save or leave the current file:

^ command ^ description |
| :wq | save and exit |
| :w | save the file |
| :q! | quit without saving |
| ZZ | save only if changed, then quit |

===Navigation Commands===
When in //command mode//, the following commands are used to navigate:

^ command ^ description |
| h (or left arrow) | move cursor left |
| j (or down arrow) | move cursor down |
| k (or up arrow) | move cursor up |
| l (or right arrow) | move cursor right |

===vi vs. ex===
There exists a distinction between the **vi** and **ex** editors. For the purposes of this course, vi is a full-screen visual editor that uses ex.

When in //command mode// you can use either. The difference is in how the commands are issued:

  * ex commands are always preceded by a colon (:)
  * vi commands do not use a colon

There are many more commands in the book. Some more will be covered in this lab and later in the course. Get familiar with the basics.

=====Procedure=====
Check out the following [[http://www.linuxconfig.org/Vim_Tutorial|VIM Tutorial]], and once done, use your newfound vim knowledge to create your own customized **~/.signature** and **~/.plan** files.
======Conclusions======
This assignment has activities which you should tend to: document/summarize the knowledge learned on your Opus.

As always, the class mailing list and class IRC channel are available for assistance, but not answers.