Corning Community College

CSCS1730 UNIX/Linux Fundamentals

Project: DATA PROCESSING

Objective

To apply your growing and versatile skills on the command-line by massaging data through the deployment of innovative command-line incantations and slick scripts.

Background

Oftentimes, we encounter data in a slightly off format: close to what we need, but not quite meeting some requirement for further processing.

Luckily, the UNIX environment provides many facilities for filtering and manipulating data so that we can “reformat” it to meet expectations.

This activity has you dabbling in one such scenario: a program that generates “raw” data (simulated from a scientific/industrial instrument). This “raw” data needs to be sanitized and reformatted (perhaps to be analyzed further by other tools downstream).

Task 0: Post/respond to a question

Task 1: Obtain source code

On Lab46, the /var/public/unix/projects/dataproc/ directory contains a file called info.c.

Task 2: Study the file contents

Determine:

A copy of the code follows:

/*
 * info.c - program to generate information stream for processing.
 *
 *          In order to run, this program must be named according
 *          to the value stored in the name[] array. Do not change
 *          the code or values in this source code, but match the
 *          executable name as appropriate.
 *
 *          By default, no data is generated. In order to alter
 *          this behavior, provide a whole number as the first
 *          argument on the command-line, and that many lines of
 *          output will be generated (to STDOUT by default).
 *
 * To compile: gcc -o PROGRAM_NAME info.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
int main(int argc, char **argv)
{
	int index, max, x, y, i;
	char name[] = { 0x64, 0160, (114-63), (064+03), 0x00 };
	char file[(strlen(name)+1)]; 
 
	x = strlen(*(argv+0));
	y = strlen(name);
 
	for (i = 0; i <= y; i++)
	{
		file[i] = *(*(argv+0)+(x-y)+i);
	}
 
	if (strcasecmp(file, name) != 0)
	{
		fprintf(stderr, "ERROR: filename is incorrect!\n");
		fprintf(stderr, "       must match name[] string\n");
		exit(1);
	}
 
	if (argc >= 2)
	{
		max = atoi(*(argv+1));
	}
	else
	{
		max = 0;
	}
 
	if (argc >= 3)
	{
		srand(atoi(*(argv+2)));
	}
	else
	{
		srand(1730);
	}
 
	for (index = 1; index <= max; index++)
	{
		x = rand() % 849 + 50;
		y = rand() % 1899 + 100;
 
		if (((x % 3) == 0) && ((y % 4) > 2))
			fprintf(stdout, "%d\tblank\n", index);
		else if (((x % 7) < 4) && ((y % 5) > 3))
			fprintf(stdout, "%d\terror %d\n", index, ((x % 20) + 1));
		else
			fprintf(stdout, "%d\t%.3d-%.3d\n", index, x, y);
	}
 
	return(0);
}

NOTE: Copying/pasting this code into a file to do the project will not earn you credit for task 1. You MUST copy the file from the specified location.
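
The name[] initializer in info.c mixes hexadecimal, octal, and decimal arithmetic to spell out character codes. As a rough sketch of how you might inspect values written that way, the hypothetical helper below (codes_to_text is not part of the project) turns C-style integer constants into characters using printf. The sample values are made up for illustration and are NOT the ones from info.c.

```shell
# Hypothetical helper: print the character for each C-style integer constant.
# The sample values used below are NOT the ones found in info.c.
codes_to_text() {
    for code in "$@"; do
        # The inner printf accepts decimal, 0x-prefixed hex, and 0-prefixed
        # octal constants and emits three octal digits; the outer printf then
        # interprets "\ddd" as an octal escape and prints that character.
        printf "\\$(printf '%03o' "$code")"
    done
    printf '\n'
}

# 0x48 is 72 ('H'), 0151 is 105 ('i'), and 100 + 5 - 72 is 33 ('!')
codes_to_text 0x48 0151 $((100 + 5 - 72))   # prints: Hi!
```

Working one value out by hand first, and then checking it this way, is a good test of whether you are reading each base correctly.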

Task 3: Execute your program

Once you have things working:

Task 4: Store your output

Task 5: Find and count the duplicates

For example, let's say we had the following output:

1	671-477
2	error 4
3	742-703
4	671-477
5	blank
6	516-336
7	671-477
8	742-703
9	546-031
10	089-322
11	442-1220
12	blank

As a result of running your solution, the following output should be produced:

671-477 occurs 3 times
742-703 occurs 2 times
Out of 12 lines (9 with numeric values), there were a total of 5 lines with duplicate values
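
One possible way to arrive at that report is sketched below using awk. This is only one approach among many (a sort and uniq pipeline would also work), and the function name count_duplicates is a hypothetical choice. It assumes the data arrives on STDIN, that fields are separated by whitespace, and that a "numeric value" is any second field shaped like digits-digits.

```shell
# Hypothetical helper: read generated lines on STDIN, report duplicated values.
count_duplicates() {
    awk '
        # Count every input line, whatever it contains.
        { total++ }

        # A numeric line has a second field shaped like digits-digits;
        # "blank" and "error N" lines do not match and are skipped here.
        $2 ~ /^[0-9]+-[0-9]+$/ {
            numeric++
            if (!($2 in count))
                order[++n] = $2      # remember values in first-seen order
            count[$2]++
        }

        END {
            # Report each value seen more than once, and total those lines.
            for (i = 1; i <= n; i++) {
                v = order[i]
                if (count[v] > 1) {
                    printf "%s occurs %d times\n", v, count[v]
                    dup += count[v]
                }
            }
            printf "Out of %d lines (%d with numeric values), ", total, numeric
            printf "there were a total of %d lines with duplicate values\n", dup
        }
    '
}
```

With the function defined in your shell, you could pipe the generator into it, for example ./PROGRAM_NAME 12 | count_duplicates, where PROGRAM_NAME is the executable name you determined earlier.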

Task 6: Find and display the max duplicates

From your filtered output in the previous task, write some logic that:

Submission

To successfully complete this project, the following criteria must be met:

To submit this project to me using the submit tool, run the following command at your Lab46 prompt:

$ submit unix dataproc dataproc.tar.gz
Submitting unix project "dataproc":
    -> dataproc.tar.gz(OK)

SUCCESSFULLY SUBMITTED

You should get a confirmation like the one above if all went according to plan. If not, check for typos and/or that you are in the directory containing your archive.
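
The archive named in the submit command has to exist in your current directory before you submit it. A minimal sketch of building and checking it with tar follows. The helper name make_archive and the file names in the usage note are hypothetical, so include whichever files the project actually requires.

```shell
# Hypothetical helper: bundle the given files into dataproc.tar.gz,
# then list the archive so you can confirm what will be submitted.
make_archive() {
    # -c create, -z compress with gzip, -f name of the archive file
    tar -czf dataproc.tar.gz "$@" || return 1
    # -t lists the contents of the archive without extracting it
    tar -tzf dataproc.tar.gz
}
```

For example, make_archive info.c task5.sh would create dataproc.tar.gz containing those two (assumed) files and print their names.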