
Purpose

The purpose here is to mine the Spring 2012 class schedule, which is in HTML format, and extract information about specific classes.

Necessities

  • Knowledge of Regular Expressions
  • Knowledge of Shell Scripting

Process

  • I will extract the relevant data from the HTML schedule, save it to a file, and then search that file with a shell script.

Things

  • To get the data, pull the "ddtitle" header rows out of the schedule and reformat each one as "Title: Course-Section:CRN":
    grep "ddtitle" spring2012-20111103.html | sed -e 's/^<TH CLASS="ddtitle".*crn_in=.....">//g' -e 's/<\/A.*$//g' -e 's/^\(.*\) - \([0-9][0-9][0-9][0-9][0-9]\) - \(.*\) - \([0-9][0-9][0-9]\)$/\1: \3-\4:\2/g'
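  • Shell script to look up a class in the saved data file (combooutput1):
#!/bin/bash

# Ask which class to look up (any text that appears in the data file)
echo -n "please enter a class: "
read -r class

# Print every matching line plus the 5 lines after it; quoting "$class"
# keeps a multi-word search such as "CSCS 1240" together as one pattern
grep -A5 -- "$class" combooutput1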
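The extraction step and the lookup can be tried without the real schedule. This is a minimal sketch under stated assumptions: the two HTML rows are made-up samples in the layout the sed expressions expect ("Title - CRN - Course - Section", with the CRN in a crn_in= link), each ddtitle row is assumed to sit on a single line (as the ^<TH anchor implies), and sample.html and classes.txt are hypothetical stand-ins for spring2012-20111103.html and the saved data file. The third substitution does the real work: its four \(...\) groups capture the title, CRN, course, and section, and \1: \3-\4:\2 reorders them.

```shell
#!/bin/bash
# Hypothetical demo: the sample rows and file names are illustrative,
# not taken from the real spring2012-20111103.html.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Two made-up "ddtitle" rows plus one unrelated row that grep should skip
cat > "$workdir/sample.html" <<'EOF'
<TH CLASS="ddtitle"><A HREF="detail?crn_in=20345">Computer Science I - 20345 - CSCS 1240 - 001</A></TH>
<TD CLASS="dddefault">not a title row</TD>
<TH CLASS="ddtitle"><A HREF="detail?crn_in=20411">Data Structures - 20411 - CSCS 2320 - 002</A></TH>
EOF

# 1) drop everything up to the link text, 2) drop the closing </A> and
# anything after it, 3) reorder "Title - CRN - Course - Section"
grep "ddtitle" "$workdir/sample.html" |
  sed -e 's/^<TH CLASS="ddtitle".*crn_in=.....">//g' \
      -e 's/<\/A.*$//g' \
      -e 's/^\(.*\) - \([0-9][0-9][0-9][0-9][0-9]\) - \(.*\) - \([0-9][0-9][0-9]\)$/\1: \3-\4:\2/g' \
  > "$workdir/classes.txt"

cat "$workdir/classes.txt"
# Computer Science I: CSCS 1240-001:20345
# Data Structures: CSCS 2320-002:20411

# Look up one class; the quoted pattern keeps "CSCS 2320" as one argument
class="CSCS 2320"
grep -- "$class" "$workdir/classes.txt"
# Data Structures: CSCS 2320-002:20411
```

The lookup here uses plain grep so the output stays short; the script's -A5 would additionally print up to five lines following each match.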

Attributes

  • Files and directories
  • Commands
  • The UNIX shell
  • Regular Expressions
  • Filters
  • Scripting
  • The UNIX Development Environment

Final Thoughts

  • This was relatively easy once I was working with only the necessary data.
user/vcordes1/portfolio/cla.txt · Last modified: 2011/12/15 12:33 by vcordes1