Corning Community College
CSCS1730 UNIX/Linux Fundamentals
To apply your growing and versatile skills on the command-line by massaging data through the deployment of innovative command-line incantations and slick scripts.
Oftentimes we will encounter data in a slightly off format, one that does not quite meet some requirement we have for further processing.
Luckily, the UNIX environment provides many facilities for filtering and manipulating data so that we can “reformat” it to meet expectations.
This activity has you dabbling in one such scenario: a program that generates “raw” data (simulated from a scientific/industrial instrument). This “raw” data needs to be sanitized and reformatted (perhaps to be analyzed further by other tools downstream).
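As a rough illustration of what "sanitizing" such a stream can look like, here is a minimal sketch. The sample lines and the function names `raw_sample` and `only_readings` are made up for this example, and it assumes each line is an index, a tab, and then either a reading, the word "blank", or an error code. It is one possible approach, not the required solution.

```shell
# Made-up sample of "raw" lines: an index, a tab, then either a reading,
# the word "blank", or an error code. The real data may differ.
raw_sample() {
    printf '1\t120-455\n2\tblank\n3\terror 7\n4\t301-998\n'
}

# Keep only lines whose second field looks like a reading (digits-digits)
# and print that field without the leading index.
only_readings() {
    awk -F'\t' '$2 ~ /^[0-9]+-[0-9]+$/ { print $2 }'
}

raw_sample | only_readings
```

The same filtering could be built from `grep` and `cut`; `awk` is used here only because it handles the field split and the pattern test in one step.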
On Lab46, in the /var/public/unix/projects/dataproc/ directory, is a file called info.c.
Determine:
A copy of the code follows:
/*
 * info.c - program to generate information stream for processing.
 *
 * In order to run, this program must be named according
 * to the value stored in the name[] array. Do not change
 * the code or values in this source code, but match the
 * executable name as appropriate.
 *
 * By default, no data is generated. In order to alter
 * this behavior, provide a whole number as the first
 * argument on the command-line, and that many lines of
 * output will be generated (to STDOUT by default).
 *
 * To compile: gcc -o PROGRAM_NAME info.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int index, max, x, y, i;
    char name[] = { 0x64, 0160, (114-63), (064+03), 0x00 };
    char file[(strlen(name)+1)];

    x = strlen(*(argv+0));
    y = strlen(name);

    for (i = 0; i <= y; i++)
    {
        file[i] = *(*(argv+0)+(x-y)+i);
    }

    if (strcasecmp(file, name) != 0)
    {
        fprintf(stderr, "ERROR: filename is incorrect!\n");
        fprintf(stderr, " must match name[] string\n");
        exit(1);
    }

    if (argc >= 2)
    {
        max = atoi(*(argv+1));
    }
    else
    {
        max = 0;
    }

    if (argc >= 3)
    {
        srand(atoi(*(argv+2)));
    }
    else
    {
        srand(1730);
    }

    for (index = 1; index <= max; index++)
    {
        x = rand() % 849 + 50;
        y = rand() % 1899 + 100;

        if (((x % 3) == 0) && ((y % 4) > 2))
            fprintf(stdout, "%d\tblank\n", index);
        else if (((x % 7) < 4) && ((y % 5) > 3))
            fprintf(stdout, "%d\terror %d\n", index, ((x % 20) + 1));
        else
            fprintf(stdout, "%d\t%.3d-%.3d\n", index, x, y);
    }

    return(0);
}
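The name[] array is written with a mix of hexadecimal, octal, and decimal expressions. As a hedged sketch of how such C-style constants can be turned into characters from the shell, the helper below (a hypothetical function named `to_char`, not part of the project) evaluates its argument with shell arithmetic and prints the matching character. The values used are illustrative only and are deliberately not the ones in name[].

```shell
# Print the character for a numeric constant written in C notation
# (decimal, 0-prefixed octal, or 0x-prefixed hexadecimal).
to_char() {
    # $(( ... )) evaluates the constant or expression to a decimal number;
    # the inner printf renders that number as three octal digits, which
    # the outer printf then interprets as an octal escape sequence.
    printf "\\$(printf '%03o' "$(( $1 ))")"
}

# Illustrative values only (these are NOT the values in name[]).
to_char 0x48; to_char 0151; to_char '(100-67)'; echo
```

An ASCII table (for example, `man ascii` where that manual page is installed) is another way to look up the same information by hand.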
NOTE: Copying/pasting this code into a file to do the project will not earn you credit for task 1. You MUST copy the file from the specified location.
Once you have things working:
For example, let's say we had the following output:
1	671-477
2	error 4
3	742-703
4	671-477
5	blank
6	516-336
7	671-477
8	742-703
9	546-031
10	089-322
11	442-1220
12	blank
As a result of running your solution, the following output should be produced:
671-477 occurs 3 times
742-703 occurs 2 times
Out of 12 lines (9 with numeric values), there were a total of 5 lines with duplicate values
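One way to arrive at a report like this is sketched below. It is only an illustration under stated assumptions: the input is tab-separated (index, then value), a "numeric value" is taken to mean digits-dash-digits, and duplicates are listed in order of first appearance. The function names `sample` and `report` are invented for this example, and `sample` simply replays the twelve example lines instead of running the real program.

```shell
# Replay the twelve example lines (index, tab, value).
# printf reuses its format string for each pair of arguments.
sample() {
    printf '%s\t%s\n' \
        1 671-477 2 'error 4' 3 742-703 4 671-477 \
        5 blank 6 516-336 7 671-477 8 742-703 \
        9 546-031 10 089-322 11 442-1220 12 blank
}

# Count how often each numeric value occurs, then summarize.
report() {
    awk -F'\t' '
        $2 ~ /^[0-9]+-[0-9]+$/ {
            numeric++
            if (!($2 in count)) order[++n] = $2   # remember first-seen order
            count[$2]++
        }
        END {
            for (i = 1; i <= n; i++) {
                v = order[i]
                if (count[v] > 1) {
                    printf "%s occurs %d times\n", v, count[v]
                    dup += count[v]               # lines involved in duplication
                }
            }
            printf "Out of %d lines (%d with numeric values), ", NR, numeric
            printf "there were a total of %d lines with duplicate values\n", dup
        }'
}

sample | report
```

A pipeline built from `cut`, `sort`, and `uniq -c` would be an equally reasonable route; the single `awk` program is used here so that the per-value counts and the totals come from one pass over the data.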
From your filtered output in the previous task, write some logic that:
To successfully complete this project, the following criteria must be met:
To submit this project to me using the submit tool, run the following command at your lab46 prompt:
$ submit unix dataproc dataproc.tar.gz
Submitting unix project "dataproc":
    -> dataproc.tar.gz(OK)

SUCCESSFULLY SUBMITTED
You should get a confirmation indicating successful submission if all went according to plan. If not, check for typos and/or location mismatches.