This is an old revision of the document!


Corning Community College

CSCS1730 UNIX/Linux Fundamentals

Project: DATA PROCESSING

Objective

To apply your growing and versatile skills on the command-line by massaging data through the deployment of innovative command-line incantations and slick scripts.

Background

To most of us, computers are a frequently used interactive tool for accomplishing work. With our recent explorations into the realm of shell scripting, concepts of automation are starting to enter our periphery.

With automation comes the need to do things outside that interactive environment, running them at a designated time or in reaction to a particular event.

We will be scheduling tasks with respect to time in this project, to get better acquainted with the functionality and capabilities of non-interactive yet automated tasks.

cron

From the Wikipedia article on Cron:

“Cron is a time-based job scheduler in Unix-like computer operating systems. The name cron comes from the word “chronos”, Greek for “time”. Cron enables users to schedule jobs (commands or shell scripts) to run periodically at certain times or dates. It is commonly used to automate system maintenance or administration, though its general-purpose nature means that it can be used for other purposes, such as connecting to the Internet and downloading email.”

Be sure to check the manual page for cron(8), and the corresponding manual pages for crontab(1) and crontab(5). When you are familiar with where pertinent information can be found regarding cron, proceed to the question below.

Task 1: Obtain source code

On Lab46, in the /var/public/unix/projects/dataproc/ directory, is a file called info.c.

  • Copy this into your home directory. How did you do it?
  • Write down the command-line used in a file called task1.txt

Task 2: Study the file contents

Determine:

  • How to properly compile the file (so that it will run without displaying an error)?
  • How to properly execute the resulting program (to generate 8 lines of output)?
  • When you figure out the answers to both of these, put your responses in a file called task2.txt

A copy of the code follows:

/*
 * info.c - program to generate information stream for processing.
 *
 *          In order to run, this program must be named according
 *          to the value stored in the name[] array. Do not change
 *          the code or values in this source code, but match the
 *          executable name as appropriate.
 *
 *          By default, no data is generated. In order to alter
 *          this behavior, provide a whole number as the first
 *          argument on the command-line, and that many lines of
 *          output will be generated (to STDOUT by default).
 *
 * To compile: gcc -o PROGRAM_NAME info.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
int main(int argc, char **argv)
{
	int index, max, x, y, i;
	char name[] = { 0x64, 0160, (114-63), (064+03), 0x00 };
	char file[(strlen(name)+1)]; 
 
	x = strlen(*(argv+0));
	y = strlen(name);
 
	for (i = 0; i <= y; i++)
	{
		file[i] = *(*(argv+0)+(x-y)+i);
	}
 
	if (strcasecmp(file, name) != 0)
	{
		fprintf(stderr, "ERROR: filename is incorrect!\n");
		fprintf(stderr, "       must match name[] string\n");
		exit(1);
	}
 
	if (argc >= 2)
	{
		max = atoi(*(argv+1));
	}
	else
	{
		max = 0;
	}
 
	if (argc >= 3)
	{
		srand(atoi(*(argv+2)));
	}
	else
	{
		srand(1730);
	}
 
	for (index = 1; index <= max; index++)
	{
		x = rand() % 849 + 50;
		y = rand() % 1899 + 100;
 
		if (((x % 3) == 0) && ((y % 4) > 2))
			fprintf(stdout, "%d\tblank\n", index);
		else if (((x % 7) < 4) && ((y % 5) > 3))
			fprintf(stdout, "%d\terror %d\n", index, ((x % 20) + 1));
		else
			fprintf(stdout, "%d\t%.3d-%.3d\n", index, x, y);
	}
 
	return(0);
}
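The name[] array above mixes hexadecimal, octal, and arithmetic expressions, all of which are just integer character codes. As a minimal sketch of how such values map to characters, the shell can do the same conversion. The helper name to_char and the sample values are hypothetical; they are deliberately not the values used in info.c.

```shell
# to_char VALUE
# Print the character whose code is VALUE. VALUE may be written in
# decimal (65), hexadecimal (0x41), or octal (0101), like a C constant.
to_char() {
    # $(( )) evaluates the constant; printf '%03o' turns it into the
    # three octal digits that a printf backslash escape expects.
    printf "\\$(printf '%03o' "$(($1))")"
}

# Four different spellings of the same hypothetical value, 65:
for value in 0x41 0101 $((130 - 65)) $((0100 + 01)); do
    printf '%s is code %d, character ' "$value" "$((value))"
    to_char "$value"
    printf '\n'
done
```

Working through name[] the same way, one element at a time, gives the name the executable has to match.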

Task 3: Execute your program

Once you have things working:

  • Run the program and have it generate 1024 lines of output
  • Write down the command-line used in a file called task3.txt

Task 4: Store your output

  • Save your program's output (the 1024 lines) to a file called task4.txt
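
Saving output comes down to redirecting STDOUT and then confirming the line count. The sketch below assumes nothing about the real program: generate is a hypothetical stand-in that merely prints numbered lines, and the output goes to a scratch file rather than task4.txt.

```shell
# Hypothetical stand-in for the compiled program: prints N numbered lines.
generate() {
    awk -v max="$1" 'BEGIN { for (i = 1; i <= max; i++) printf "%d\tsample\n", i }'
}

outfile=$(mktemp)                       # scratch file standing in for task4.txt
generate 1024 > "$outfile"              # > sends STDOUT to the file
lines=$(wc -l < "$outfile" | tr -d ' ') # count what was actually saved
echo "saved $lines lines"
rm -f "$outfile"
```

With the real executable, the same redirection applies; only the command on the left of the > changes.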

Task 5: Find and count the duplicates

  • Ignoring the index values in the left-most column, determine which numerical codes occur more than once by concocting a command-line incantation or script that appropriately filters and processes the output.
  • Also display a count of the total number of lines in the output, along with the total number of lines with valid numeric values (ignore “blank” lines and lines with error codes). Finally, display the total count of lines that have duplicates.
  • Put your resulting command-line(s) or script in a file called task5.txt
  • Put the output (result) of your command-line(s) or script in a file called task5.out

For example, let's say we had the following output:

1	671-477
2	error 4
3	742-703
4	671-477
5	blank
6	516-336
7	671-477
8	742-703
9	546-031
10	089-322
11	442-1220
12	blank

As a result of running your solution, the following output should be produced:

671-477 occurs 3 times
742-703 occurs 2 times
Out of 12 lines (9 with numeric values), there were a total of 5 lines with duplicate values
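
One possible approach, offered as a sketch rather than the expected solution, is a single awk pass over the tab-separated data. The function name count_duplicates is made up for illustration, and the sample input is the twelve example lines above.

```shell
# count_duplicates [FILE...]
# Reads tab-separated "index<TAB>value" lines from FILE (or STDIN).
count_duplicates() {
    awk -F'\t' '
        { total++ }                              # every line counts toward the total
        $2 ~ /^[0-9]+-[0-9]+$/ {                 # numeric codes only; skips blank/error
            numeric++
            if (++seen[$2] == 1) order[++n] = $2 # remember first-seen order
        }
        END {
            for (i = 1; i <= n; i++) {
                code = order[i]
                if (seen[code] > 1) {
                    printf "%s occurs %d times\n", code, seen[code]
                    dup += seen[code]
                }
            }
            printf "Out of %d lines (%d with numeric values), there were a total of %d lines with duplicate values\n", total, numeric, dup
        }
    ' "$@"
}

# The example data from above; printf repeats its format for each pair.
printf '%s\t%s\n' 1 671-477 2 'error 4' 3 742-703 4 671-477 5 blank \
    6 516-336 7 671-477 8 742-703 9 546-031 10 089-322 11 442-1220 12 blank |
    count_duplicates
```

A pipeline built from cut, grep, sort, and uniq -c can reach the same result; the awk version simply keeps all three counts in one place.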

Task 6: Find and display the max duplicates

From your filtered output in the previous task, write some logic that:

  • Removes the “blank” lines and error codes from your original output
  • Collapses any duplicates (have just 1 value for each duplicate set)
  • Sorts the resulting numeric data according to the value to the left of the dash.
  • Re-indexes the data to create a new, more refined, data file. Have a single tab separate the index value from the data value on each line.
  • Put your logic in a file called task6.txt
  • Put your output in a file called task6.out
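
The four processing steps map naturally onto a pipeline, one stage per step. This is a sketch under the assumption that the input is the tab-separated format shown earlier; the function name refine is hypothetical.

```shell
# refine [FILE...]
# Filter, de-duplicate, sort, and re-index "index<TAB>value" lines.
refine() {
    cut -f2 "$@" |                          # drop the old index column
        grep -E '^[0-9]+-[0-9]+$' |         # keep numeric codes; drops blank/error
        sort -u |                           # collapse duplicates (whole-line compare)
        sort -t- -k1,1n |                   # order by the number left of the dash
        awk '{ printf "%d\t%s\n", NR, $0 }' # new index, single tab, value
}

printf '%s\t%s\n' 1 671-477 2 'error 4' 3 742-703 4 671-477 5 blank \
    6 516-336 7 671-477 8 742-703 9 546-031 10 089-322 11 442-1220 12 blank |
    refine
```

The two sort stages are kept separate on purpose: combining -u with the -k1,1n key would treat values that share a left-hand number as duplicates, even when their right-hand sides differ.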

Task 2: Configure irc bot

An IRC bot, being a network-aware piece of software, needs sufficient configuration in order to operate properly. While it is up to you to derive a working configuration, you'll want to keep in mind the following information:

  • IRC handles are limited to 9 characters max (and they may not be allowed to start with numbers or contain spaces)
  • the IRC server you want to connect to is: irc.offbyone.lan (port 6667 if it matters)
  • you may want to initially configure your bot to join a secluded channel so you can test it. For the project, it will ultimately need to join: #botchan
  • you should configure both yourself and me (username wedge) as administrators for the bot.

Verify you can successfully start the bot and that it connects to the intended server and channel. You may want to run it in a sub-console in your screen session, so that you can keep an eye on any messages it generates.

Task 3: Enhance the bot with modules/plugins

In addition to core usability, I'd like you to enable additional functionality through the use of modules. A few modules come with the stock Phenny software distribution, and there appear to be a few third-party sources, such as:

  • phenny-games
  • oblique

Install and enable modules for your bot, and verify some form of functionality.

NOTE: Due to changes in the service it uses, the Phenny weather module is beyond broken. Trying to use it will result in an error being displayed. If you are skilled with Python and can craft a solution, that can certainly count toward completing this task of the project.

Task 4: Script to check bot status

Using tools and concepts we learned previously, especially:

  • ps
  • grep
  • pgrep

Write a script that checks for a currently running Phenny instance (run by you). If no instance is found, launch a new instance. If an instance IS running, do nothing.

Be sure to make use of absolute paths.
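
As a rough sketch of the check-then-launch logic, the function below uses pgrep restricted to your own processes. The function name botcheck, the pattern, and the path in the commented call are all placeholders, and the echo lines exist only to make the outcome visible; a finished script would stay silent when the bot is already running.

```shell
# botcheck PATTERN COMMAND
# Launch COMMAND unless one of your own processes already matches PATTERN.
botcheck() {
    pattern=$1
    command_path=$2
    # -u limits the search to your user ID; -f matches the full command line.
    if pgrep -u "$(id -u)" -f "$pattern" >/dev/null 2>&1; then
        echo "already running"
    else
        "$command_path" >/dev/null 2>&1 &   # start in the background
        echo "started"
    fi
}

# Hypothetical call; substitute the absolute path of your own bot:
# botcheck phenny /home/USERNAME/phenny/phenny
```

Because -f matches the whole command line, a check script whose own path contains the pattern can match itself, which is one more reason to choose the pattern and the absolute paths carefully.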

Task 5: Script to kill running bot

Using tools and concepts we learned previously, especially:

  • ps
  • grep
  • pgrep
  • kill
  • pkill

Create a script that will check for a currently running Phenny instance (run by you). If an instance is running, kill it.

If no instance is running, do nothing.
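
The kill side can be sketched the same way, pairing pgrep for the test with pkill for the action. As before, botkill and the pattern in the commented call are illustrative names, and the echo lines are only there to show which branch ran.

```shell
# botkill PATTERN
# Terminate your own processes whose command line matches PATTERN.
botkill() {
    pattern=$1
    if pgrep -u "$(id -u)" -f "$pattern" >/dev/null 2>&1; then
        # pkill sends SIGTERM by default, which lets the bot exit cleanly.
        pkill -u "$(id -u)" -f "$pattern"
        echo "killed"
    else
        echo "not running"
    fi
}

# Hypothetical call:
# botkill phenny
```

Restricting both commands with -u keeps the script from ever touching another user's bot.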

Task 6: Create a cron job

After reading up on cron and figuring out how to add an entry to your personal crontab, add a job that runs your check script (the Task 4 script) every 10 minutes.

You can verify successful cron job deployment by making sure your bot is not running, waiting for the next 10-minute marker, and seeing whether it starts.
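
For orientation, a crontab entry is five time fields followed by the command to run. The snippet below only builds and prints such a line; it does not install anything. The script path is a hypothetical placeholder.

```shell
# Hypothetical absolute path to the check script; substitute your own.
script=/home/USERNAME/bin/botcheck.sh

# Fields: minute hour day-of-month month day-of-week, then the command.
# */10 in the minute field means every 10th minute: 0, 10, 20, 30, 40, 50.
entry="*/10 * * * * $script"
printf '%s\n' "$entry"

# To install it, run `crontab -e` and add the printed line,
# then confirm with `crontab -l`.
```

The step syntax (*/10) is described in crontab(5); listing the minutes explicitly as 0,10,20,30,40,50 is an equivalent spelling.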

Task 7: Create an at job

While cron is used for long-term scheduled jobs, at is useful for short-term ones.

Read up on at, and deploy an at job that will run your botkill script (task 5) a couple of minutes before a 10-minute marker.

Use this to test both your script and correct usage of at, as well as the cron-related activities.
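
Picking the time by hand works fine, but the arithmetic can also be sketched in the shell. The helper below (two_before_marker, a made-up name) prints the HH:MM that falls two minutes before the next 10-minute marker; the at command itself is left commented out, and its script path is a placeholder.

```shell
# two_before_marker HOUR MINUTE
# Print the HH:MM two minutes before the next 10-minute marker.
two_before_marker() {
    hour=${1#0};   hour=${hour:-0}      # strip a leading zero so 08 is not octal
    minute=${2#0}; minute=${minute:-0}
    now=$((hour * 60 + minute))         # minutes since midnight
    target=$(( (now / 10 + 1) * 10 - 2 ))
    # If that moment is not in the future, aim for the following marker.
    if [ "$target" -le "$now" ]; then
        target=$((target + 10))
    fi
    target=$((target % 1440))           # wrap past midnight
    printf '%02d:%02d\n' $((target / 60)) $((target % 60))
}

when=$(two_before_marker "$(date +%H)" "$(date +%M)")
echo "would schedule botkill.sh for $when"

# Hypothetical absolute path; uncomment to actually queue the job:
# echo "/home/USERNAME/bin/botkill.sh" | at "$when"
# atq    # lists your pending at jobs
```

For example, calling two_before_marker 14 03 prints 14:08, which leaves two minutes for the bot to be killed before cron's 14:10 run restarts it.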

Submission

To successfully complete this project, the following criteria must be met:

  • All criteria indicated above
  • To signal completion, submit an archive containing the following files:
    • botcheck script (call it botcheck.sh)
    • botkill script (call it botkill.sh)
    • a text file (at.txt) containing the syntax used to create/deploy the at job
    • a listing of your crontab entries (crontab.txt), showing the correctly scheduled deployment of your script
    • a copy of your configured bot's config file (config.py)
    • a text file (botdata.txt) containing the URL you downloaded your bot from, along with the URLs of any modules/plugins you used
    • Put all these files in a tar archive
    • Compress it with max compression using bzip2
    • Name the archive ircbot.tar.bz2
  • After you've recorded the crontab entry with your botcheck script, remove the entry from your crontab (so we don't end up with a bunch of phantom bots nobody cares about).
  • When you're done with this project, kill your bot. Do not leave it running.

NOTE: If this activity fascinates you and you'd like to keep playing with your bot, do so using a different config/handle, so it will not be confused with your project submission. If you want to schedule it, do so using a differently named script at a different time interval.

To submit this project to me using the submit tool, run the following command at your lab46 prompt:

$ submit unix dataproc ircbot.tar.bz2
Submitting unix project "dataproc":
    -> ircbot.tar.bz2(OK)

SUCCESSFULLY SUBMITTED

You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or locational mismatches.

haas/spring2014/unix/projects/dataproc.1395608783.txt.gz · Last modified: 2014/03/23 21:06 by wedge