haas:fall2014:unix:projects:dataproc

Differences

This shows you the differences between two versions of the page.

haas:fall2014:unix:projects:dataproc [2014/03/23 21:29] – external edit 127.0.0.1
haas:fall2014:unix:projects:dataproc [2014/09/29 22:28] (current) – [Task 0: Post/respond to a question] wedge
Line 9: Line 9:
  
 =====Objective=====
-To apply your growing and versatile skills on the command-line by massaging data through the deployment of innovative command-line incantations and slick scripts.
+To apply your growing and versatile skills on the command-line by massaging data through the deployment of innovative command-line incantations.
 =====Background=====
 Oftentimes, we will find ourselves encountering data in a slightly one-off format, not quite meeting some requirement we need for further processing.
Line 19: Line 18:
  
 =====Task 0: Post/respond to a question=====
-  * Because the class mailing list has been rather quiet of late, and we've got a break coming up, I would like each person to post at least 1 focused question regarding this project to the class mailing list.
-    * Please do not give away any answers to the actions requested by this project in doing so.
-    * Be sure to identify which "task" or aspect of the project you are asking about
-  * Respond to at least 1 question, not by giving an explicit answer, but by asking further questions, or giving a pointer to a resource that may contain additional information (e.g. see **cut(1)** manual page)
+  * To ensure adequate out-of-class communications, I'd like for you to make use of the class mailing list
+    * I would like each person to post at least 1 focused question regarding this project to the class mailing list.
+      * This also helps to make sure everyone has subscribed to the list (as you should have the first week)
+      * Please do not give away any answers to the actions requested by this project in doing so.
+      * Be sure to identify which "task" or aspect of the project you are asking about
+    * Respond to at least 1 question, not by giving an explicit answer, but by asking further questions, or giving a pointer to a resource that may contain additional information (e.g. see the **cut(1)** manual page)
     * To get credit, your response can**not** be to one of your own questions.
   * Put a URL to the mailing list post of your question asked in a file called: **task0.question**
Line 134: Line 135:
  
 =====Task 5: Find and count the duplicates=====
-  * Ignoring the index values in the left-most column, determine which numerical codes occur more than once by concocting a command-line incantation or script that appropriately filters and processes the output.
+  * Ignoring the index values in the left-most column, determine which numerical codes occur more than once by concocting a command-line incantation that appropriately filters and processes the output.
   * Also display a count of the total number of lines in the output, along with the total number of lines with valid numeric values (ignore "blank" lines and lines with error codes). Finally, display the total count of lines that have duplicates.
-  * Put your resulting command-line(s) or script in a file called **task5.sh**
-  * Put the output (result) of your command-line(s) or script in a file called **task5.out**
+    * Omit all the lines that occurred only once (i.e., have no duplicates); it will make your data set immediately more reasonable.
+  * Put your resulting command-line(s) in a file called **task5.sh**
+  * Put the output (result) of your command-line(s) in a file called **task5.out**
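
The general shape of a filter-and-count pipeline like the one Task 5 describes can be sketched as follows. This is only an illustration under assumptions, not the project's data or its expected answer: the `sample` function, its eight lines, the whitespace-separated two-column layout, and the `ERR` marker standing in for an error code are all made up here. "Lines that have duplicates" is read as the number of valid lines whose code appears more than once, which is one possible interpretation.

```shell
#!/bin/sh
# Hypothetical sample: index in column 1, code in column 2.
# Line 3 has no code ("blank") and ERR stands in for an error code.
sample() {
    printf '%s\n' '1 4021' '2 1337' '3' '4 4021' '5 ERR' '6 1337' '7 4021' '8 9000'
}

# Drop the index column and keep only purely numeric codes.
valid_codes() {
    sample | awk '$2 ~ /^[0-9]+$/ { print $2 }'
}

# Total lines, then lines carrying a valid numeric code.
total=$(sample | wc -l | tr -d ' ')
valid=$(valid_codes | wc -l | tr -d ' ')

# sort groups identical codes; uniq -c prefixes each with its count;
# awk keeps only codes seen more than once and prints "code count".
dups=$(valid_codes | sort -n | uniq -c | awk '$1 > 1 { print $2, $1 }')

# Lines whose code is duplicated: sum the counts of the duplicated codes.
dup_lines=$(valid_codes | sort -n | uniq -c | awk '$1 > 1 { n += $1 } END { print n + 0 }')

echo "$dups"
echo "total lines: $total"
echo "valid lines: $valid"
echo "lines with duplicates: $dup_lines"
```

The key design point is that **uniq(1)** only compares adjacent lines, so the codes have to pass through **sort(1)** first. With real data, `sample` would be replaced by whatever command produces the output being processed.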
  
 For example, let's say we had the following output:
haas/fall2014/unix/projects/dataproc.1395610148.txt.gz · Last modified: 2014/09/29 22:22 (external edit)