======Project 5: Helping a Friend======
A project for Unix/Linux by Corey Forman during Fall 2011.

This project was begun on 11/17/11 and was finished the same day.

=====Objectives=====
The objective is to "data-mine" six folders of information and retrieve names, emails, and companies from the data. Since this is a large job, it will be divided into smaller portions, but every portion follows the same data pattern in the end result.

=====Prerequisites=====
In order to successfully accomplish/perform this project, the listed resources/experiences need to be consulted/achieved:

  * competent command-line skills
  * basic text editing
  * competent RegEx use
  * pattern recognition doesn't hurt either

=====Background=====
The purpose of this project was to help a friend data-mine a large amount of information. I attempted to format the data into the version he needed to turn in to his boss.

=====Scope=====
I am going to use RegEx commands to grab and format the necessary data, and I will also edit some of the data manually. The focus of the project is the RegEx command used to get the data into the right format.

=====Attributes=====
State and justify the attributes you'd like to receive upon successful approval and completion of this project.

  * filter: we grep out data that is not needed.
  * regular expressions: we filter the data using regular expressions.
  * text processing: we manipulate text with RegExs.
  * files and directories: we work with files to get data out of them.
  * security: we had to cp the data into our own directory because it was owned by someone else, so we could not edit it where it was.
  * command line: we use RegExs on the command line to manipulate data.

=====Procedure=====
I copied the file that needed to be data-mined into my own directory. Next, I reformatted the data into a layout that I could edit easily with RegExs. I then tried out various RegEx commands until I got the data I wanted.
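The Procedure boils down to two moves: reshape the data so each record sits on one line, then apply RegEx to that line. Below is a minimal sketch of that workflow, assuming bash (or another POSIX shell) with the standard sed and tr. It wraps the same stages as the pipeline in the Execution section in two functions, join_records and to_csv (hypothetical names, not from the project), and runs them on a small made-up sample so each RegEx change can be checked before it touches the real file.

```shell
#!/usr/bin/env bash
# Minimal sketch of the Procedure's "reshape first, then RegEx" workflow.
# The function names and the sample records are hypothetical; the sed/tr
# stages are the ones from the Execution section's pipeline.

# join_records: records on stdin are separated by blank lines; print each
# record on a single line with its fields separated by '$'.
join_records() {
  sed 's/^$/^/' |   # mark each blank line (record break) with ^
    tr '\n' '$' |   # join all lines, leaving $ where each line break was
    tr '^' '\n'     # turn each record break back into a newline
}

# to_csv: the remaining stages, producing "email","name","company" lines.
# Assumes every record is a company line, a name line, then an email line.
to_csv() {
  join_records |
    sed 's/-----------/unknown/g' |                          # dashes: assumed placeholder for a missing field
    sed 's/^\$\(.*\)\$\(.*\)\$\(.*\)\$$/"\3","\2","\1"/' |  # reorder the 3 fields
    sed 's/email: //'                                        # drop the label
}

# Try the functions on a tiny sample before running them on the real file.
# The leading \n mimics a blank line at the top of the input, which the
# reorder pattern needs so that the first record also starts with '$'.
printf '\nAcme Corp\nJane Doe\nemail: jane@example.com\n\n-----------\nJohn Roe\nemail: john@example.com\n' | to_csv
```

On that sample it prints an empty line (left over from the leading blank line) followed by "jane@example.com","Jane Doe","Acme Corp" and "john@example.com","John Roe","unknown". The real input lines appear to end in a space, which is why each quoted field in the results carries a trailing space; an extra sed 's/ *\$/$/g' stage placed right after join_records would remove it. That stage is an addition; it is not part of the original pipeline.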
Once the output was right, I saved it to a file so it could be transferred back to the tmp file.

=====Execution=====
My home directory on lab46:

<code>
lab46:~$ ls
1275799069694.jpg    archivecompilationfile  data                                         mystery                   testdir
250px-P2_glados.jpg  archives                emvideo-youtube-nd2rBWbvDbA_3.jpg            nom-nom-nom-babies.jpg    testdir.tar
Downloads            archives.tar.bz2        error.log                                    public_html               testdir2
InstNLP2.txt         archives.zip            fiddlesticks.jpg                             puzzlebox                 testfile
InstNLP2Edited.txt   bin                     funny-pictures-taco-cat-is-a-palindrome.jpg  shaco.jpg                 tmp
Maildir              cake                    goonies-musical.jpg                          shellscripting            trollin
RageFaceBlackSS.png  closet                  irc                                          spring2012-20111103.html  trolling-400x345.jpg
archive              corningcourses          linktestfile                                 src                       veigar.jpg
archive1.tar.gz      corningcoursesorg       minecraft-creeper-comic-600x694.png          src.orig                  wicked-witch.jpg
archive2.zip         courses                 motd                                         tempfile                  words
lab46:~$
</code>

The file I was working with, after some text editing, is InstNLP2.txt:

<code>
Arcturus België
Eric Schneider
email: info@arcturus.be

Heart Systems n.v. - International Training Institute for Communication and NLP
Paul Liekens
email: Paul.Liekens@hookon.be

InMind
Peter Wrycza and Jan Ardui
email: pwrycza@indosat.net.id

Institut Ressources
Alain Moenaert
email: alain.moenaert@infoboard.be

BrainNet
Dr. Helosio Rodrigues, MD
email: brainet@unisys.com.br

Centro de Aprendizado Linguistico
Wilma Steagall de Tomasso
email: silveira@dialdata.com.br

Conexao Evolving Center of NLP
Getulio Barnasque
email: conexao@pro.via-rs.com.br
</code>

The RegEx pipeline used to manipulate this data:

<code bash>
cat InstNLP2.txt | sed 's/^$/^/g' | tr '\n' '$' | tr '^' '\n' | sed 's/-----------/unknown/g' | sed 's/^\$\(.*\)\$\(.*\)\$\(.*\)\$$/"\3","\2","\1"/g' | sed 's/email: //g' > InstNLP2Edited.txt
</code>

Reading it left to right:

  * The first sed turns each blank line (the break between two records) into a ^.
  * The first tr joins the whole file into a single line, leaving a $ wherever a line break was.
  * The second tr turns each ^ back into a line break, so every record sits on its own line in the form $company$name$email: address$.
  * The second sed replaces a row of dashes (presumably a placeholder for a missing field) with "unknown".
  * The third sed captures the three $-separated fields and writes them back in reverse order as quoted, comma-separated values: email, name, company.
  * The last sed strips the "email: " label.

The results were as follows:

<code>
"info@arcturus.be ","Eric Schneider ","Arcturus België "
"Paul.Liekens@hookon.be ","Paul Liekens ","Heart Systems n.v. - International Training Institute for Communication and NLP "
"pwrycza@indosat.net.id ","Peter Wrycza and Jan Ardui ","InMind "
"alain.moenaert@infoboard.be ","Alain Moenaert ","Institut Ressources "
</code>

This data can then be imported into Excel, which recognizes it as comma-separated values, and turned into a spreadsheet.

=====Reflection=====
Comments/thoughts generated through performing the project, observations made, analysis rendered, conclusions wrought. What did you learn from doing this project?

Data mining can be a useful skill when applying for a job, because most industries revolve around data today. Being able to data-mine can set you apart from the rest of the techies.

=====References=====
In performing this project, the following resources were referenced:

  * none; in-class information only