user:cforman:portfolio:project5 [2011/12/16 02:09] – [Reflection] cforman | user:cforman:portfolio:project5 [2011/12/16 02:09] (current) – [References] cforman | ||
---|---|---|---|
+ | ======Project 5: Helping a Friend====== | ||
+ | A project for Unix/Linux by Corey Forman during the Fall 2011 semester. | ||
+ | |||
+ | This project was begun on 11/17/11 and finished the same day. | ||
+ | |||
+ | =====Objectives===== | ||
+ | The objective is to " | ||
+ | |||
+ | =====Prerequisites===== | ||
+ | In order to successfully accomplish/ | ||
+ | |||
+ | * competent Command Line skills | ||
+ | * basic text editing | ||
+ | * competent RegEx use | ||
+ | * pattern recognition doesn' | ||
+ | |||
+ | |||
+ | =====Background===== | ||
+ | |||
+ | The purpose of this project was to assist a friend in data-mining a large amount of information. I attempted to format the data into the layout he needed to turn in to his boss. | ||
+ | =====Scope===== | ||
+ | I am going to use RegEx commands to grab and format the necessary data. I will also manually edit some of the data. The focus of the project is the RegEx command used to get the data into the right format. | ||
+ | =====Attributes===== | ||
+ | State and justify the attributes you'd like to receive upon successful approval and completion of this project. | ||
+ | |||
+ | * filter: we are grepping out data that is not needed. | ||
+ | * regular expressions: | ||
+ | * text processing: we are manipulating text with RegExs | ||
+ | * files and directories: we are working with files to get data out of them. | ||
+ | * security: we had to cp the file to our own directory because it was under someone else's ownership, meaning we could not edit the data in place. | ||
+ | * command line: we use RegExs on the command line to manipulate data. | ||
+ | |||
+ | =====Procedure===== | ||
+ | I copied the file that I needed to data-mine. | ||
+ | Next, I formatted the data into a layout that I could easily edit with RegExs. | ||
+ | I then tried out various RegEx commands until I got the data I wanted. | ||
+ | I then saved that data to a file so it could be transferred back to the tmp file. | ||
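The steps above can be sketched in shell. This is only an illustration under assumptions: the scratch directory and the file names `source.txt`, `mine.txt`, and `result.txt` are hypothetical stand-ins, and the `sed` expression is a simple placeholder rather than the one used in this project.

```shell
#!/bin/sh
# Hypothetical stand-ins: a scratch directory plays the role of the shared
# location, and source.txt plays the role of the friend's file.
work=$(mktemp -d)
printf 'Arcturus België\nEric Schneider\nemail: info@arcturus.be\n' > "$work/source.txt"

# 1. Copy the file so that the working copy is ours to edit.
cp "$work/source.txt" "$work/mine.txt"

# 2-3. Try an expression against the copy and inspect what comes out.
#      Placeholder expression: keep only the address from "email:" lines.
sed -n 's/^email: *//p' "$work/mine.txt"

# 4. Once the output looks right, save it to a file for handing back.
sed -n 's/^email: *//p' "$work/mine.txt" > "$work/result.txt"
```

Working on a copy keeps the original untouched while different expressions are tried, and redirecting to a new file only happens once the printed output looks right.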
+ | |||
+ | =====Execution===== | ||
+ | <cli> | ||
+ | lab46: | ||
+ | 1275799069694.jpg | ||
+ | 250px-P2_glados.jpg | ||
+ | Downloads | ||
+ | InstNLP2.txt | ||
+ | InstNLP2Edited.txt | ||
+ | Maildir | ||
+ | RageFaceBlackSS.png | ||
+ | archive | ||
+ | archive1.tar.gz | ||
+ | archive2.zip | ||
+ | lab46:~$ | ||
+ | ~/ | ||
+ | Hello, World! | ||
+ | lab46: | ||
+ | |||
+ | The file I was working with, after some text editing. The file name is InstNLP2.txt | ||
+ | |||
+ | Arcturus België | ||
+ | Eric Schneider | ||
+ | email: info@arcturus.be | ||
+ | |||
+ | Heart Systems n.v. - International Training Institute for Communication and NLP | ||
+ | Paul Liekens | ||
+ | email: Paul.Liekens@hookon.be | ||
+ | |||
+ | InMind | ||
+ | Peter Wrycza and Jan Ardui | ||
+ | email: pwrycza@indosat.net.id | ||
+ | |||
+ | Institut Ressources | ||
+ | Alain Moenaert | ||
+ | email: alain.moenaert@infoboard.be | ||
+ | |||
+ | BrainNet | ||
+ | Dr. Helosio Rodrigues, MD | ||
+ | email: brainet@unisys.com.br | ||
+ | |||
+ | Centro de Aprendizado Linguistico | ||
+ | Wilma Steagall de Tomasso | ||
+ | email: silveira@dialdata.com.br | ||
+ | |||
+ | Conexao Evolving Center of NLP | ||
+ | Getulio Barnasque | ||
+ | email: conexao@pro.via-rs.com.br | ||
+ | |||
+ | The RegEx used to manipulate this data: | ||
+ | cat InstNLP2.txt | sed ' | ||
+ | ' | ||
+ | |||
+ | The results were as follows: | ||
+ | " | ||
+ | " | ||
+ | " | ||
+ | " | ||
+ | |||
+ | This data can then be imported into Excel, recognized as data, and turned into a spreadsheet. | ||
+ | </cli> | ||
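The `sed` command and its output are cut off in the listing above, so the following is not the expression used in this project. It is a minimal sketch of one way to get a similar result, assuming every record is exactly three lines (organization, contact, `email:` line) separated by blank lines, and assuming the goal is one quoted, comma-separated row per record. The helper name `records_to_csv` and the temporary sample file are hypothetical.

```shell
#!/bin/sh
# records_to_csv is a hypothetical helper name, not from the original project.
# Assumes each record is exactly three lines: organization, contact, email.
records_to_csv() {
    # First sed: drop the blank separator lines.
    # Second sed: N;N pulls the next two lines into the pattern space,
    # the embedded newlines become "," separators, the "email: " label
    # is stripped, and the whole row is wrapped in double quotes.
    sed '/^$/d' "$1" |
    sed 'N;N;s/\n/","/g;s/","email: */","/;s/^/"/;s/$/"/'
}

# Two records copied from the listing above, written to a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Arcturus België
Eric Schneider
email: info@arcturus.be

BrainNet
Dr. Helosio Rodrigues, MD
email: brainet@unisys.com.br
EOF

records_to_csv "$sample"
```

Quoting every field matters for entries such as `Dr. Helosio Rodrigues, MD`, where the comma inside the name would otherwise be read as a column break by a spreadsheet.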
+ | |||
+ | =====Reflection===== | ||
+ | Comments/ | ||
+ | |||
+ | Data mining can be a useful skill when applying for a job, because most industries revolve around data today. Being able to data-mine can set you apart from the rest of the techies. | ||
+ | =====References===== | ||
+ | In performing this project, the following resources were referenced: | ||
+ | |||
+ | * None; in-class information only. | ||