user:ccaccia:portfolio:project1 [2011/12/14 22:25] – [Reflection] ccaccia | user:ccaccia:portfolio:project1 [2011/12/15 15:37] (current) – [Project: DATA MINING] ccaccia | ||
---|---|---|---|
+ | ======Project: | ||
+ | A project for UNIX/Linux Fundamentals by Christopher M. Caccia during the FALL Semester 2011. | ||
+ | |||
+ | This project was begun on November 17th, 2011 and took 3 hours to complete. | ||
+ | |||
+ | =====Objectives===== | ||
+ | |||
+ | The purpose of this project was to take a text file containing business department names, addresses, employee names, phone numbers, and e-mail addresses, and to extract that data for output in a specific format. | ||
+ | =====Prerequisites===== | ||
+ | In order to successfully accomplish/ | ||
+ | |||
+ | * ls command | ||
+ | * grep command | ||
+ | * sed command | ||
+ | * cat command | ||
+ | * Regular Expressions | ||
+ | |||
+ | =====Background===== | ||
+ | |||
+ | This project was attempted because a real-world business was consolidating many text files in many different formats into one file with one specific format. | ||
+ | |||
+ | This process is called data mining, and to most computer users it can seem like magic. | ||
+ | |||
+ | =====Scope===== | ||
+ | |||
+ | I will take a text document, extract only specific data from it, and output the extracted data to a new file. This new file will be used to build a new database of business names, employee names, and their respective e-mail addresses for an anonymous company in a real-world application. | ||
+ | |||
+ | Using text processing commands and regular expressions, I will organize the data in the original file so that specific parts of it can be extracted and used. | ||
+ | =====Attributes===== | ||
+ | |||
+ | * __Files and directories__: | ||
+ | * __Commands__: | ||
+ | * __Text processing__: | ||
+ | * __The UNIX development environment__: | ||
+ | * __Regular Expressions__: | ||
+ | * __Groups__: I output the final text document to a directory accessible to group users. | ||
+ | * __Multitasking__: | ||
+ | * __Filters__: | ||
+ | |||
+ | |||
+ | =====Procedure===== | ||
+ | |||
+ | I started by viewing the original text file using the command " | ||
+ | |||
+ | I noticed that line breaks seemed to separate the clusters of data, and each block of information was also on its own line. | ||
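The layout observation above can be checked from the command line. The following is a minimal sketch, not the original session: the file name `sample.txt`, its contents, and the five-lines-per-record layout are all hypothetical stand-ins, since the real input file is not shown here.

```shell
# Hypothetical sample data; the real file's name and contents are assumptions.
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
Accounting
12 Main St
Jane Doe
555-0100
jdoe@example.com

Shipping
34 Dock Rd
John Roe
555-0101
jroe@example.com
EOF

# Blank lines separate the records, so counting them confirms the layout.
blank_lines=$(grep -c '^$' "$workdir/sample.txt")

# Lines containing an @ are the e-mail fields, one per record.
email_lines=$(grep -c '@' "$workdir/sample.txt")

echo "blank separators: $blank_lines, e-mail lines: $email_lines"
```

With two records there is one blank separator and two e-mail lines, which is a quick way to confirm that every record has the field you expect before extracting anything.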
+ | |||
+ | Using regular expressions, | ||
+ | |||
+ | Using regular expressions, I was able to cut out only the fields I needed and then reorder them so that each record displayed the business name, employee name, and e-mail address. | ||
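One way to picture the extract-and-reorder step is the sketch below. It is an illustration under stated assumptions, not the pipeline that was actually run: the sample data, the `|` delimiter, and the field order (department, address, name, phone, e-mail) are hypothetical, and awk's paragraph mode is swapped in to join each record onto one line before sed selects the fields.

```shell
# Hypothetical sample data in an assumed five-line-per-record layout.
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
Accounting
12 Main St
Jane Doe
555-0100
jdoe@example.com

Shipping
34 Dock Rd
John Roe
555-0101
jroe@example.com
EOF

# Step 1: awk paragraph mode (RS="") treats each blank-line-separated block
#         as one record; $1 = $1 rebuilds it with "|" between the lines.
# Step 2: sed captures fields 1, 3, and 5 and drops the address and phone.
result=$(
  awk 'BEGIN { RS = ""; FS = "\n"; OFS = "|" } { $1 = $1; print }' "$workdir/sample.txt" |
  sed -E 's/^([^|]*)\|[^|]*\|([^|]*)\|[^|]*\|([^|]*)$/\1, \2, \3/'
)

printf '%s\n' "$result"
```

For the sample data this prints `Accounting, Jane Doe, jdoe@example.com` followed by `Shipping, John Roe, jroe@example.com`. Joining each record onto a single line first keeps the sed expression simple, because sed works on one line at a time by default.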
+ | |||
+ | After the extracted data was organized the way I needed, I copied the final output to the /tmp directory, where the data could be merged with the other files. | ||
+ | |||
+ | =====Execution===== | ||
+ | Upon completion of the project, if there is an applicable collection of created code, place a copy of your finished code within < | ||
+ | |||
+ | <cli> | ||
+ | lab46:~$ cat " | ||
+ | > | sed ' | ||
+ | > | sed ' | ||
+ | > | sed ' | ||
+ | > sed ' | ||
+ | lab46:~$ cp " | ||
+ | lab46:~$ | ||
+ | </ | ||
+ | |||
+ | |||
+ | =====Reflection===== | ||
+ | |||
+ | This was an interesting project for me personally. | ||
+ | =====References===== | ||
+ | In performing this project, the following resources were referenced: | ||
+ | |||
+ | * http:// | ||
+ | * Manual pages for: grep, cut, sed, regex | ||
+ | |||
+ | * http:// | ||
+ | * Google was also helpful in understanding how regular expressions are implemented | ||
+ | |||
+ | |||