Another challenge appears on the horizon of UNIXy possibility, and one that some of you can potentially put to near-immediate use in some upcoming academic activities.
Some of you may have noticed that Summer and Fall 2010 course registrations are now upon us, and take it from me, the sooner you get your courses taken care of, the easier it will be. With likely continued record-high enrollments, the chances of the classes you want filling up are high, and the speeds at which such things may take place have been known to surprise people.
With that said, this week's quest is going to focus on playing with CCC's course offering data, and gleaning useful information from it.
To start, you will need a copy of the CCC course data for Fall 2010. To save time and frustration, I have gone ahead and downloaded it, and that can be found in a gzipped html file called fall2010-20100315.html.gz located in the courselist/ subdirectory of the UNIX Public Directory. Obtain a copy of this file and place it in a good working location within your home directory.
Notice the size of this file, then uncompress it and check the size again. What is the ratio of raw to compressed data?
Take a look at the data inside this file (use your favorite modal text editor, or cat it through a pipe to less). Study the data and look for any patterns within it.
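If you want to rehearse the size comparison before touching the real file, here is a minimal sketch. It builds a throwaway stand-in file (sample.html is my invention for illustration, not the course data), compresses it, and works out the ratio. The numbers it prints say nothing about the real file.

```shell
# Work in a scratch directory so nothing in the home directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a stand-in file: 2000 repetitive HTML-like lines (NOT the real course data).
i=0
while [ "$i" -lt 2000 ]; do
    echo "<tr><td>row $i</td><td>placeholder</td></tr>"
    i=$((i + 1))
done > sample.html

# -c writes the compressed data to stdout, so the original file is kept.
gzip -c sample.html > sample.html.gz

# Sizes in bytes; tr strips the padding some versions of wc add.
raw=$(wc -c < sample.html | tr -d ' ')
packed=$(wc -c < sample.html.gz | tr -d ' ')

# The shell only does integer math, so hand the division to awk.
ratio=$(awk -v r="$raw" -v p="$packed" 'BEGIN { printf "%.1f", r / p }')
echo "raw=$raw bytes, compressed=$packed bytes, ratio=${ratio}:1"
```

On the real file, ls -l before and after gunzip gives you the two sizes, and gzip -l on the .gz file reports both sizes in one shot.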
Specifically, I want you to be able to identify:
Once you think you have a handle on how the file is arranged (be sure to jot down your observations; they would make great journal content as you explore this problem), I'd like you to use your UNIX command-line skills to obtain the following information:
To assist you, I would recommend exploring and becoming more familiar with some of the following commands (in addition to your working toolset):
Now, with sed and grep we approach another big topic area: Regular Expressions.
This is an area we will be spending some time and attention on, but it may behoove you to start reading up on them and asking questions and playing with them.
Some of your books have information on Regular Expressions, as do some manual pages (the grep manual page has an informative section on “REGULAR EXPRESSIONS”; search for it in all caps like that, no quotes needed).
Basically, a regular expression is a pattern that is applied to text, much as wildcards are applied to file names. Both grep and sed understand regular expressions, and through using them one can obtain some amazing capabilities.
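To make that a little more concrete, here is a small sketch of both tools applying a pattern to text. The four sample lines are made up for illustration and have nothing to do with the course file.

```shell
# Invented sample text, one item per line.
sample='apple pie
banana bread
apple tart
cherry pie'

# grep prints only the lines the pattern matches: here, lines that start with "apple".
starts=$(printf '%s\n' "$sample" | grep '^apple')
printf '%s\n' "$starts"

# sed rewrites the text the pattern matches: here, a "pie" at the end of a line becomes "cake".
edited=$(printf '%s\n' "$sample" | sed 's/pie$/cake/')
printf '%s\n' "$edited"
```

Note the single quotes around each pattern. They keep the shell from treating characters such as $ and * as its own, so grep and sed receive the pattern untouched.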
Quickly, a basic table of Regular Expressions:
Symbol | Description |
---|---|
^ | Match Beginning of Line |
$ | Match Ending of Line |
. | Match Any Single Character |
* | Match 0 or More of the Previous |
[ ] | Match One of Any of the Enclosed Characters (Character Class) |
[^ ] | Do NOT Match One of Any of the Enclosed Characters (Inverted Character Class) |
There are more, but for now let's focus on these.
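Each symbol in the table can be tried out on a handful of words. This is only a sketch: the word list is invented, and match is a throwaway helper function I defined for the example, not a standard command.

```shell
# Invented word list, one word per line.
words='cat
cot
ct
coat
scatter
cut
cab'

# Hypothetical helper: print the words that match the pattern given as $1.
match() { printf '%s\n' "$words" | grep "$1"; }

match '^c'       # begins with c:                  cat cot ct coat cut cab
match 't$'       # ends with t:                    cat cot ct coat cut
match 'c.t'      # c, any one character, then t:   cat cot scatter cut
match 'co*t'     # c, zero or more o, then t:      cot ct
match 'c[au]t'   # c, then a or u, then t:         cat scatter cut
match 'c[^a]t'   # c, one character that is not a, then t:   cot cut
```

Notice that grep matches anywhere in the line unless you pin the pattern down with the beginning or ending symbols, which is why scatter shows up for some of these.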
Try to use Regular Expressions with grep and sed to assist you in finding the information you seek. This can help you obtain the information I request above, and it opens the door to more exciting and more powerful capabilities we will soon be exploring.
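As one possible way to combine the two tools, here is a sketch that uses grep to pick out lines of interest and sed to strip the markup around the part you care about. The markup below is invented for the example; do not assume the course file is laid out this way, since working that out is part of the exercise.

```shell
# Invented HTML-like lines; the real file's layout is for you to discover.
page='<tr><td>Widgets 101</td><td>OPEN</td></tr>
<tr><td>Widgets 102</td><td>FULL</td></tr>
<p>Unrelated paragraph</p>
<tr><td>Gadgets 201</td><td>OPEN</td></tr>'

# 1. grep keeps only the rows marked OPEN.
# 2. The first sed removes the leading tags at the beginning of the line.
# 3. The second sed removes everything from the first closing tag to the end of the line.
open_titles=$(printf '%s\n' "$page" |
    grep '<td>OPEN</td>' |
    sed 's/^<tr><td>//' |
    sed 's/<\/td>.*$//')

printf '%s\n' "$open_titles"
```

Building the pipeline one stage at a time, and looking at the output after each stage, tends to be far easier than writing the whole thing in one go.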
Additionally, try your hand at the following:
Once again, ASKING QUESTIONS will be greatly beneficial. This isn't a problem you are likely to finish in 20 minutes, so you'll want to gradually poke away at it throughout the week.
Good luck.