Differences

This shows you the differences between two versions of the page.

--- haas:spring2014:unix:projects:dataproc [2014/03/23 19:24] – [Submission] wedge
+++ haas:spring2014:unix:projects:dataproc [2014/03/23 21:29] (current) – [Submission] wedge
@@ Line 12: / Line 12: @@
 =====Background=====
-To most of us, computers are a frequently used interactive tool for accomplishing work. With our recent explorations into the realm of shell scripting, concepts of automation are starting to enter our periphery.
+Oftentimes, we will find ourselves encountering data in a slightly one-off format, one that does not quite meet some requirement we need for further processing.
-With automation, comes the need to do things outside that interactive environment, and run at a designated time or in reaction to a particular event.
+Luckily, the UNIX environment provides many facilities for filtering and manipulating data so that we can "reformat" it to meet expectations.
-We will be scheduling tasks with respect to time in this project, to get better acquainted with the functionality and capabilities of non-interactive yet automated tasks.
+This activity has you dabbling in one such scenario: a program that generates "raw" data (simulating output from a scientific/industrial instrument). This "raw" data needs to be sanitized and reformatted (perhaps so it can be further analyzed by other tools downstream).
-=====cron=====
+=====Task 0: Post/respond to a question=====
-From the wikipedia article on [[wp>Cron]]:
+  * Because the class mailing list has been rather quiet of late, and we've got a break coming up, I would like each person to post at least 1 focused question regarding this project to the class mailing list.
+    * Please do not give away any answers to the actions requested by this project in doing so.
+    * Be sure to identify which "task" or aspect of the project you are asking about
+  * Respond to at least 1 question, not by giving an explicit answer, but by asking further questions or giving a pointer to a resource that may contain additional information (e.g. see the **cut(1)** manual page)
+    * To get credit, your response can**not** be to one of your own questions.
+  * Put a URL to the mailing list post of your question asked in a file called: **task0.question**
+    * See http://lab46.corning-cc.edu/mailman/listinfo/unix to access the archives
+  * Put a URL to the mailing list post of your response in a file called: **task0.response**
+    * See http://lab46.corning-cc.edu/mailman/listinfo/unix to access the archives
+    * A question may receive multiple answers.
-"Cron is a time-based job scheduler in Unix-like computer operating systems. The name cron comes from the word "chronos", Greek for "time". Cron enables users to schedule jobs (commands or shell scripts) to run periodically at certain times or dates. It is commonly used to automate system maintenance or administration, though its general-purpose nature means that it can be used for other purposes, such as connecting to the Internet and downloading email."
+=====Task 1: Obtain source code=====
+On Lab46, in the **/var/public/unix/projects/dataproc/** directory, there is a file called **info.c**.
-Be sure to check the manual page for **cron**(**8**), and the corresponding manual pages for **crontab**(**1**) and **crontab**(**5**). When you are familiar with where pertinent information can be found regarding cron, proceed to the question below.
+  * Copy this into your home directory. How did you do it?
+  * Write down the command-line used in a file called **task1.txt**
-=====Task 1: Obtain irc bot=====
+=====Task 2: Study the file contents=====
-There exist a number of irc bots, consisting of varying features and complexities. To facilitate your task, I would recommend using Phenny (or a clone), which tends to be simpler to deploy than its more configurable counterparts.
-So, via the proper means:
+Determine:
+  * How to properly compile the file (so that it will run without displaying an error)?
+  * How to properly execute the resulting program (to generate 8 lines of output)?
+  * When you figure out the answers to both of these, put your responses in a file called **task2.txt**
-  * Download a recent release of Phenny (check out **wget**, it is a nifty tool)
+A copy of the code follows:
-    * There's an official Phenny
-    * There's also a variant known as phenny_osu
-  * If it is obtained in archive form, extract it somewhere within your home directory (perhaps under **src/**)
-  * Take a look at the files obtained, start reading any documentation
-=====Task 2: Configure irc bot=====
+<code c 1>
-An irc bot, being a network-aware piece of software, needs sufficient configuration in order to operate properly. While it is up to you to derive a working configuration, you'll want to keep in mind the following information:
+/*
+ * info.c - program to generate information stream for processing.
+ *
+ *          In order to run, this program must be named according
+ *          to the value stored in the name[] array. Do not change
+ *          the code or values in this source code, but match the
+ *          executable name as appropriate.
+ *
+ *          By default, no data is generated. In order to alter
+ *          this behavior, provide a whole number as the first
+ *          argument on the command-line, and that many lines of
+ *          output will be generated (to STDOUT by default).
+ *
+ * To compile: gcc -o PROGRAM_NAME info.c
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
-  * irc handles are limited to 9 characters max (and they may not like starting with numbers or having spaces in them)
+int main(int argc, char **argv)
-  * the irc server you want to connect to is: **irc.offbyone.lan** (port 6667 if it matters)
+{
-  * you may want to initially configure your bot to join a secluded channel so you can test it. For the project, it will ultimately need to join: **#botchan**
+	int index, max, x, y, i;
-  * you should configure both yourself and me (username **wedge**) as administrators for the bot.
+	char name[] = { 0x64, 0160, (114-63), (064+03), 0x00 };
+	char file[(strlen(name)+1)];
-Verify you can successfully start the bot and that it connects to the intended server and channel. You may want to run it in a sub-console in your screen session, so that you can keep an eye on any messages it generates.
+	x = strlen(*(argv+0));
+	y = strlen(name);
-=====Task 3: Enhance the bot with modules/plugins=====
+	for (i = 0; i <= y; i++)
-In addition to core usability, I'd like you to enable additional functionality through the use of modules. A few modules come with the stock Phenny software distribution, and there appear to be a few third-party sources, such as:
+	{
+		file[i] = *(*(argv+0)+(x-y)+i);
+	}
-  * phenny-games
+	if (strcasecmp(file, name) != 0)
-  * oblique
+	{
+		fprintf(stderr, "ERROR: filename is incorrect!\n");
+		fprintf(stderr, "       must match name[] string\n");
+		exit(1);
+	}
-Install and enable modules for your bot, and verify some form of functionality.
+	if (argc >= 2)
+	{
+		max = atoi(*(argv+1));
+	}
+	else
+	{
+		max = 0;
+	}
-**NOTE:** Due to changes in the service it uses, the Phenny **weather** module is beyond broken. Trying to use it will result in an error being displayed. If you are skilled with Python and can craft a solution, that can certainly count toward completing this task of the project.
+	if (argc >= 3)
+	{
+		srand(atoi(*(argv+2)));
+	}
+	else
+	{
+		srand(1730);
+	}
-=====Task 4: Script to check bot status=====
+	for (index = 1; index <= max; index++)
-Using tools and concepts we learned previously, especially:
+	{
+		x = rand() % 849 + 50;
+		y = rand() % 1899 + 100;
-  * ps
+		if (((x % 3) == 0) && ((y % 4) > 2))
-  * grep
+			fprintf(stdout, "%d\tblank\n", index);
-  * pgrep
+		else if (((x % 7) < 4) && ((y % 5) > 3))
+			fprintf(stdout, "%d\terror %d\n", index, ((x % 20) + 1));
+		else
+			fprintf(stdout, "%d\t%.3d-%.3d\n", index, x, y);
+	}
-Write a script that checks for a currently running Phenny instance (run by you). If no instance is found, launch a new instance. If an instance IS running, do nothing.
+	return(0);
+}
+</code>
-Be sure to make use of **absolute paths**.
+NOTE: Copying/pasting this code into a file to do the project will not earn you credit for task 1. You MUST copy the file from the specified location.
+=====Task 3: Execute your program=====
-=====Task 5: Script to kill running bot=====
+Once you have things working:
-Using tools and concepts we learned previously, especially:
-  * ps
+  * Run the program and have it generate 1024 lines of output
-  * grep
+  * Write down the command-line used in a file called **task3.txt**
-  * pgrep
-  * kill
-  * pkill
-Create a script that will check for a currently running Phenny instance (run by you). If an instance is running, kill it.
+=====Task 4: Store your output=====
-If no instance is running, do nothing.
+  * Save your program's output (the 1024 lines) to a file called **task4.txt**
-=====Task 6: Create a cron job=====
+=====Task 5: Find and count the duplicates=====
-Reading up on cron and figuring out how to add an entry to your user's personal crontab, add a job that runs your check script (Task 4 script) every 10 minutes.
+  * Ignoring the index values in the left-most column, determine which numerical codes occur more than once by concocting a command-line incantation or script that appropriately filters and processes the output.
+  * Also display a count of the total number of lines in the output, along with the total number of lines with valid numeric values (ignore "blank" lines and lines with error codes). Finally, display the total count of lines that have duplicates.
+  * Put your resulting command-line(s) or script in a file called **task5.sh**
+  * Put the output (result) of your command-line(s) or script in a file called **task5.out**
-You can verify successful cron job deployment by ensuring your bot is not running, waiting for the next 10 minute marker, and seeing if it starts.
+For example, let's say we had the following output:
-=====Task 7: Create an at job=====
+<cli>
-While cron is used for long-term scheduled jobs, **at** is useful for short term ones.
+	671-477
+	error 4
+	742-703
+	671-477
+	blank
+	516-336
+	671-477
+	742-703
+	546-031
+	089-322
+	442-1220
+	blank
+</cli>
-Read up on **at**, and deploy an **at** job that will run your **botkill** script (task 5) a couple minutes before a 10 minute marker.
+As a result of running your solution, the following output should be produced:
-Use this to test both your script and correct usages of **at**, as well as the **cron**-related activities.
+<cli>
+-477 occurs 3 times
+-703 occurs 2 times
+Out of 12 lines (9 with numeric values), there were a total of 5 lines with duplicate values
+</cli>
+=====Task 6: Find and display the max duplicates=====
+From your filtered output in the previous task, write some logic that:
+  * Removes the "blank" lines and error codes from your original output
+  * Collapses any duplicates (keeping just 1 value for each duplicate set)
+  * Sorts the resulting numeric data according to the value to the left of the dash.
+  * Re-indexes the data to create a new, more refined, data file. Have a single tab separate the index value from the data value on each line.
+  * Put your logic in a file called **task6.sh**
+  * Put your output in a file called **task6.out**
 =====Submission=====
@@ Line 96: / Line 179: @@
   * All criteria indicated above
-  * To signal completion, submit an archive containing the following files:
+  * To signal completion, submit an archive containing all the files generated in each task above.
-    * botcheck script (call it **botcheck.sh**)
+    * Task 0: **task0.question** and **task0.response**
-    * botkill script  (call it **botkill.sh**)
+    * Task 1: **task1.txt**
-    * a text file called (**at.txt**) containing syntax used to create/deploy **at** job
+    * Task 2: **task2.txt**
-    * a listing of your crontab entries (**crontab.txt**), showing the correctly scheduled deployment of your script
+    * Task 3: **task3.txt**
-    * a copy of your configured bot's config file (**config.py**)
+    * Task 4: **task4.txt**
-    * a text file containing the URL you downloaded your bot, along with any URLs you obtained modules/plugins used (**botdata.txt**)
+    * Task 5: **task5.sh** and **task5.out**
-    * Put all these files in a **tar** archive
+    * Task 6: **task6.sh** and **task6.out**
-    * Compress it with max compression using **bzip2**
+    * Put all these files in a **tar** archive called **dataproc.tar**
-    * Name the archive **ircbot.tar.bz2**
+    * Compress it with max compression using **gzip**
-  * After you've recorded the crontab entry with your botcheck script, remove the entry from your crontab (so we don't end up with a bunch of phantom bots nobody cares about).
+    * The resulting archive should be named: **dataproc.tar.gz**
-  * When you're done with this project, kill your bot. Do not leave it running.
-**NOTE:** If this activity fascinates you and you'd like to keep playing with your bot, do so using a different config/handle, so it will not be confused with your project submission. If you want to schedule it, do so using a differently named script at a different time interval.
 To submit this project to me using the **submit** tool, run the following command at your lab46 prompt:
 <cli>
-$ submit unix dataproc ircbot.tar.bz2
+$ submit unix dataproc dataproc.tar.gz
 Submitting unix project "dataproc":
-    -> ircbot.tar.bz2(OK)
+    -> dataproc.tar.gz(OK)
 SUCCESSFULLY SUBMITTED

Lab46 Wiki