Lab46 Wiki

discrete fall2022 eoce 0x3

Objective

To explore the cnv0/cnv1 programs from a different perspective: using only ONE loop to drive central processing.

Background

In mathematics, a prime number is a value that is only evenly divisible by 1 and itself; it has just that one pair of factors, no others. Numbers that have additional factors are classified as composite numbers.

The number 6 is a composite number, as in addition to 1 and 6, it also has the factors of 2 and 3.

The number 17, however, is a prime number, as no numbers other than 1 and 17 can be evenly divided into it.

Calculating the primality of a number

As of yet, there is no quick and direct way of determining the primality of a given number. Instead, we must perform a series of tests to determine if it fails primality (typically by proving it is composite).

This process incurs a considerable amount of processing overhead on the task, so much so that increasingly large values take ever-expanding amounts of time. Often, approaches to prime number calculation involve various algorithms, which offer various benefits (less time) and drawbacks (more complex code).

Your task for this project is to implement a prime number program using the straightforward, unoptimized brute-force algorithm, which determines the primality of a number in a “trial by division” approach.

Main algorithm: brute force

The brute force approach is the simplest to implement (although at some cost in the form of time).

As we will be looking to do some time/performance analysis and comparisons, it is often good to have a baseline. This program will be it.

To perform the process of computing the primality of a number, we simply attempt to evenly divide all the values between 2 and one less than the number in question. If any one of them divides evenly, the number is NOT prime, but instead composite.

Checking the remainder of a division indicates whether or not a division was clean (having 0 remainder indicates such a state).

For example, the number 11:

11 % 2 = 1 (2 is not a factor of 11)
11 % 3 = 2 (3 is not a factor of 11)
11 % 4 = 3 (4 is not a factor of 11)
11 % 5 = 1 (5 is not a factor of 11)
11 % 6 = 5 (6 is not a factor of 11)
11 % 7 = 4 (7 is not a factor of 11)
11 % 8 = 3 (8 is not a factor of 11)
11 % 9 = 2 (9 is not a factor of 11)
11 % 10 = 1 (10 is not a factor of 11)

Because none of the values 2-10 evenly divided into 11, we can say it passed the test: 11 is a prime number

On the other hand, take 119:

119 % 2 = 1 (2 is not a factor of 119)
119 % 3 = 2 (3 is not a factor of 119)
119 % 4 = 3 (4 is not a factor of 119)
119 % 5 = 4 (5 is not a factor of 119)
119 % 6 = 5 (6 is not a factor of 119)
119 % 7 = 0 (7 is a factor of 119)
119 % 8 = 7
119 % 9 = 2
119 % 10 = 9
119 % 11 = 9
119 % 12 = 11
119 % 13 = 2
...

Because, during our range of testing every value from 2-118, we find that 7 evenly divides into 119, it failed the test: 119 is not prime, but is instead a composite number.

Please NOTE: Even once a number is identified as composite, your program MUST CONTINUE evaluating the remainder of the values (up to 119-1, or 118). It might seem pointless (and it is for a production program), but I want you to see the performance implications this creates.
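The baseline check described above can be sketched as follows. This is only an illustration for a single number: the function name is_prime_brute and its divisions counter are assumptions made for this example, not part of the project specification.

```c
/*
 * Hypothetical helper (name and signature are assumptions for this sketch).
 * Brute-force trial division with NO early exit: every factor from 2 to
 * number-1 is tested, even after a factor has been found.
 * Returns 1 if number is prime, 0 otherwise; the count of divisions
 * performed is stored in *divisions.
 */
int is_prime_brute(int number, int *divisions)
{
    int factor = 2;
    int prime  = (number >= 2);   /* values below 2 are not prime */

    *divisions = 0;
    while (factor < number)
    {
        if (number % factor == 0)
        {
            prime = 0;            /* composite, but keep on testing */
        }
        *divisions = *divisions + 1;
        factor     = factor + 1;
    }

    return prime;
}
```

For 11 this performs 9 divisions (factors 2 through 10) and reports prime; for 119 it performs 117 divisions (factors 2 through 118) and reports composite, even though the answer was known at 7.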

algorithm

Some things to keep in mind on your implementation:

you will want to have exactly ONE loop in the central processing for this program.
you will need to flatten the conditions that previously took place in the nested loop
be mindful of what the underlying conditions are
you know the starting value and the terminating condition, so you have a clear starting and ending point to work with.
I do NOT want to see ambiguous, one-letter variables used in your implementation(s). Please use meaningful variable names.
Some good examples of variable names would be:
  number: the number being tested
  factor: the value being divided into number to test for primality
  step: the rate by which some variable is changing
  qty: the count of the current tally of primes
  max: the maximum count we seek
  start: a value we are starting at
  lower: a lower bound
  upper: an upper bound
  see how much more readable and meaningful these are, especially as compared to a, i, n, x? You may even find it helps with debugging and understanding your code better.
let the loop drive the overall process. Identify prime/composite status separate from loop terminating conditions.
and remember, the baseline brute force algorithm may well identify a value as composite, but won't terminate the loop.
your timing should start before the loop (just AFTER argument processing), and terminate immediately following the terminating output newline outside the loops.
you may NOT split qty and range functionality into two separate code blocks (ie have two sets of two loops). Only the one set as indicated.
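One possible shape for "flattening" the nested loops is sketched below. It is a rough illustration under stated assumptions, not the required solution: the function name first_primes and the primes array are made up for this example, lower is assumed to be at least 2, only the quantity condition is handled (no upper bound, no optimizations), and primes are stored rather than printed. Whether its conditionals satisfy the coding restrictions of the project is left for you to judge.

```c
/*
 * Hypothetical sketch: store the first max primes (starting the search at
 * lower, assumed >= 2) into primes[] using exactly ONE loop.
 * Returns the quantity of primes stored.
 */
int first_primes(int lower, int max, int *primes)
{
    int number    = lower;  /* the number being tested             */
    int factor    = 2;      /* the value being divided into number */
    int composite = 0;      /* set once any factor divides evenly  */
    int qty       = 0;      /* current tally of primes             */

    while (qty < max)
    {
        if (factor < number)
        {
            /* still testing the current number: baseline never stops early */
            if (number % factor == 0)
            {
                composite = 1;
            }
            factor = factor + 1;
        }
        else
        {
            /* all factors tested: record the result, move to next number */
            if (composite == 0)
            {
                primes[qty] = number;
                qty         = qty + 1;
            }
            number    = number + 1;
            factor    = 2;
            composite = 0;
        }
    }

    return qty;
}
```

The single loop advances either factor or number on each iteration, so the prime/composite status stays separate from the loop's terminating condition (qty reaching max).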

prime algorithm optimizations

To give us a decent appreciation of the subtleties of algorithm development in a theme of programs, I have identified the following optimizations that we will be implementing.

The optimizations we will be implementing in this section include:

break on composite - once a tested number is proven to be composite, there is no need to continue processing: break out of the factor check logic and proceed to the next number
mapping factors of 6 - it turns out that, aside from the initial primes of 2 and 3, all prime numbers fall +1 or -1 off a multiple of six (there is an algorithm for this: 6a+/-1). This optimization will utilize this property, only testing numbers +/-1 off of multiples of 6 (how might this impact overall processing?)
odds-only checking - aside from 2, all other prime numbers are odd. Therefore, there is zero need to perform a composite check on an even number, allowing us to focus exclusively on odd values (luckily, they seem to occur in a predictable pattern).
sqrt() trick - mathematically it has been shown that if a number has any evenly divisible factors, at least one half of that factor pair will occur by the square root point of the number being tested.
sqrt()-less square root approximation - sqrt(), a function in the math library, does an industrial strength square root calculation. We don't need that, merely a whole integer value corresponding to the approximate square root. Here we will implement our own logic to approximate square root, hopefully with a considerable performance impact.
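To make the last two items concrete, here is a minimal sketch of a whole-integer square root computed without sqrt(), and of a factor check that stops at that bound. Both function names are assumptions for this example, and the counting approach shown is just one simple way to approximate the root; it is not necessarily the logic intended for the project.

```c
/*
 * Hypothetical helper: largest whole integer whose square does not
 * exceed number (no call to sqrt() from the math library).
 */
int approx_sqrt(int number)
{
    int root = 0;

    /* widen to long long so the square cannot overflow an int */
    while ((long long) (root + 1) * (root + 1) <= number)
    {
        root = root + 1;
    }

    return root;
}

/*
 * Hypothetical helper: trial division that only tests factors up to the
 * approximate square root. Returns 1 if number is prime, 0 otherwise.
 */
int is_prime_to_root(int number)
{
    int limit  = approx_sqrt(number);
    int factor = 2;
    int prime  = (number >= 2);

    while (factor <= limit)
    {
        if (number % factor == 0)
        {
            prime = 0;
        }
        factor = factor + 1;
    }

    return prime;
}
```

For 119 the bound is 10, so only factors 2 through 10 are tested instead of 2 through 118, and 7 is still found. Recomputing the root from zero for every number is wasteful; since the root only grows as number grows, a real implementation could likely carry it forward between numbers.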

Unless specified in the encoded name, your algorithm should only implement the algorithm and optimization(s) specified.

Some of these optimizations can co-exist easily (break + map, odd + sqrt()), others are partially compatible (map + odd can coexist in a certain form), while others are mutually exclusive (sqrt() and approximated square root conflict). So there are definitely a few combinations that are possible using this scheme.
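As an illustration of the mapping optimization, the 6a+/-1 property can be expressed as a simple remainder test. The function name below is an assumption for this sketch. Note that the test is only a filter: 2 and 3 must be handled separately, and values such as 25 pass the filter while still being composite, so the factor check is still needed.

```c
/*
 * Hypothetical helper: returns 1 if number sits at +1 or -1 off a
 * multiple of six (that is, number % 6 is 1 or 5), 0 otherwise.
 */
int near_multiple_of_six(int number)
{
    int remainder = number % 6;

    return (remainder == 1 || remainder == 5);
}
```

Only two out of every six consecutive values pass this test, which hints at how much candidate checking the mapping can skip.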

Program Specifications

Your program should otherwise work like the discrete/cnv1 program did:

obtain parameters from the command-line (see command-line arguments section below).
check to make sure the user indeed supplied enough parameters, and exit with an error message if not.
argv[1]: maximum quantity of primes to calculate (your program should run until it discovers that many primes).
  this value should be an integer value, greater than or equal to 0
  if argv[1] is 0, disable the quantity check, and rely only on provided lower and upper bounds (thru argv[4] would be required in this case).
argv[2]: N-ary specification; for this section, we are going to use 1
argv[3]: conditionally optional lower bound (starting value). Most of the time, this will probably be 2, but should be a positive integer greater than or equal to 2. This defines where the program will start its prime quantity check from.
  if omitted, assume a lower bound of 2.
  if you desire to specify an upper bound (argv[4]), you obviously MUST provide the lower bound argument under this scheme.
argv[4]: conditionally optional upper bound (ending value). If provided, this is the ending value you'd like to check to.
  If doing a quantity run (argv[1] is NOT 0), you don't need this.
  If doing a quantity run AND you specify an upper bound, whichever condition is achieved first dictates program termination. That is, upper bound could override quantity (if it is achieved before quantity), and quantity can override the upper bound (if it is achieved before reaching the specified upper bound).
argv[5]: specification of optimizations. Like in discrete/cnv1, the values are bitwise-specified in a single numeric value, broken out as follows:
  0 - no optimizations (naive, brute force approach)
  1 - break on composite
  2 - odds-only checking
  4 - map
  8 - sqrt
  16 - approximate square root
for each argument: you should do a basic check to ensure the user complied with this specification, and exit with a unique error message (displayed to STDERR) otherwise:
  for insufficient quantity of arguments, display: PROGRAM_NAME: insufficient number of arguments!
  for invalid argv[1], display: PROGRAM_NAME: invalid quantity!
  for invalid argv[2], display: PROGRAM_NAME: invalid value!
  for invalid argv[3], display: PROGRAM_NAME: invalid lower bound!
    if argv[3] is not needed, ignore (no error displayed nor forced exit, as it is acceptable defined behavior).
  for invalid argv[4], display: PROGRAM_NAME: invalid upper bound!
    if argv[4] is not needed, ignore (no error displayed nor forced exit, as it is acceptable defined behavior).
  In these error messages, PROGRAM_NAME is the name of the program being run; this can be accessed as a string stored in argv[0].
perform ONLY the optimization(s) specified on the command-line. We are producing multiple data points for a broader performance comparison.
  please take note on differences in run-time, contemplating the impact the algorithm and optimization(s) have on performance (timing, specifically).
immediately after argument processing: start your stopwatch (see timing section below).
perform the correct algorithm and optimization(s) against the command-line input(s) given.
  your program is to have no fewer and no more than 1 loop in this prime processing section.
display identified primes (space-separated) to a file pointer called stdout
stop your stopwatch immediately following your prime processing loop (and terminating newline display to stdout). Calculate the time that has transpired (ending time minus starting time).
output the processing run-time to the file pointer called stderr
your output MUST conform to the example output in the execution section below. This is also a test to see how well you can implement to specifications. Basically:
  as primes are being displayed, they are space-separated (first prime hugs the left margin), and when all said and done, a newline is issued.
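
Since the optimization values for argv[5] are bitwise-specified, each one can be tested independently with a bitwise AND. The sketch below shows one way to do that; the OPT_ macro names and the has_optimization function are assumptions for this example, while the numeric values come from the list above.

```c
/* Hypothetical names for the bit values listed in the specification */
#define OPT_BREAK   1   /* break on composite      */
#define OPT_ODDS    2   /* odds-only checking      */
#define OPT_MAP     4   /* map (6a+/-1)            */
#define OPT_SQRT    8   /* sqrt() trick            */
#define OPT_APPROX 16   /* approximate square root */

/*
 * Hypothetical helper: returns 1 if the given flag is set in the
 * optimizations value (such as the converted argv[5]), 0 otherwise.
 */
int has_optimization(int optimizations, int flag)
{
    return ((optimizations & flag) != 0);
}
```

For instance, a value of 13 (8 + 4 + 1) enables sqrt, map, and break on composite, while a value of 10 (8 + 2) enables sqrt and odds-only checking.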

Coding Restrictions

Since a lot of our explorations this semester were algorithmic in nature, we made a lot of use of restricting various approaches we could take in the implementation of our solutions.

As such, the following restrictions are in place for your implementations:

no global variables
no infinite loops
no if() shunts (have good conditions!)
  what is the difference between an if() shunt and a valid conditional statement? if() shunts are a messy way of avoiding iterations; valid if() statements denote conditionally optional processing that occurs every iteration.
one return statement (max) per function
exit() calls are limited to command-line or resource allocation error processing
avoid redundant sections of code (merge logic or break out functions)

Remember, the focus should be on writing elegant code, and NOT on brute forcing some solution. Show me that you've learned something this semester: write clean, well commented, consistently indented code.

Makefile operations

Makefiles provide a build automation system for our programs, instructing the computer on how to compile files, so we don't have to constantly type compiler command-lines ourselves. I've also integrated some other useful, value-added features that will help you with overall administration of the project.

Basic operation of the Makefile is invoked by running the command “make” by itself. The default action is to compile everything in the project directory.

Some of the available commands include:

make: compile everything
  any warnings or errors generated by the compiler will go into a file called errors in the base directory of pnc0; you can cat it to view the information.
make debug: compile everything with debug support
  any warnings or errors generated by the compiler will be displayed to the screen as the programs compile.
make clean: remove all binaries
make save: make a backup of your current work

Command-Line Arguments

To automate our comparisons, we will be making use of command-line arguments in our programs.

header files

We don't need any extra header files to use command-line arguments, but we will need an additional header file to use the atoi(3) function, which we'll use to quickly turn the command-line parameter into an integer. That header file is stdlib.h, so be sure to include it with the others:

#include <stdio.h>
#include <stdlib.h>

setting up main()

To accept (or rather, to gain access to) arguments given to your program at runtime, we need to specify two parameters to the main() function. While the names don't matter, the types do. I like the traditional argc and argv names, although it is also common to see them abbreviated as ac and av.

Please declare your main() function as follows:

int main (int argc, char **argv)

There are two very important variables involved here (the types are what actually matter; the names are, well, variable, and you may see other references call them “ac” and “av”):

int argc: the count (an integer) of tokens given on the command line (program name + arguments)
char **argv: an array of strings (technically an array of pointers to char) that contains “strings” of the various tokens provided on the command-line.

The arguments are accessible via the argv array, in the order they were specified:

argv[0]: program invocation (path + program name)
argv[1]: our maximum quantity of primes
argv[2]: should be provided as a 1 for this project
argv[3]: conditionally optional; represents lower bound
argv[4]: conditionally optional; represents upper bound
argv[5]: conditionally optional; represents optimizations, if any

Additionally, let's not forget the argc variable, an integer, which contains a count of arguments (argc == argument count). If we provided argv[0] through argv[4], argc would contain a 5.

example

For example, if we were to execute the program as follows:

lab46:~/src/discrete/eoce/0x3$ ./cnv3 128 1 2 2048 13

We'd have:

argv[0]: “./cnv3”
argv[1]: “128” (note, NOT integer 128, but a string)
argv[2]: “1”
argv[3]: “2”
argv[4]: “2048”
argv[5]: “13”

and let's not forget:

argc: 6 (there are 6 things, argv indexes 0, 1, 2, 3, 4, and 5)

With the conditionally optional arguments as part of the program spec, for a valid execution of the program, argc could be a value anywhere from 3 to 6.

Simple argument checks

While there are a number of checks we should perform, one of the first should be a check to see if the minimal number of arguments has been provided:

    // if  less than 3 arguments (program_name + quantity + argv[2] == 3)
    // have been provided
    //
    if (argc < 3)
    {
        fprintf(stderr, "%s: insufficient number of arguments!\n", argv[0]);
        exit(1);
    }

Since argv[3] (lower bound) and argv[4] (upper bound) are conditionally optional, it wouldn't make sense to check for them in the overall count. But we can and do still want to strategically utilize argc to determine if an argv[3] or argv[4] is present.
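As a rough illustration of using argc to detect an optional argument, the sketch below obtains the lower bound, falling back to 2 when argv[3] was not given. The function name get_lower_bound and the use of -1 as an "invalid" marker are assumptions for this example; in the actual program an invalid value would lead to the error message and exit described in the specification.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: returns the lower bound from argv[3] if present,
 * 2 if argv[3] was omitted, or -1 if the supplied value is below 2.
 */
int get_lower_bound(int argc, char **argv)
{
    int lower = 2;                 /* default when argv[3] is omitted */

    if (argc > 3)                  /* argv[3] exists only if argc >= 4 */
    {
        lower = atoi(argv[3]);
        if (lower < 2)
        {
            lower = -1;            /* caller should report the error */
        }
    }

    return lower;
}
```

The same pattern (comparing argc against the index plus one) applies to argv[4] and argv[5].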

Grab and convert max

Finally, we need to put the argument representing the maximum quantity into a variable.

I'd recommend declaring a variable of type int.

We will use the atoi(3) function to quickly convert the command-line arguments into int values:

    max  = atoi (argv[1]);

And now we can proceed with the rest of our prime implementation.
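One caveat worth knowing: atoi(3) returns 0 when the text is not a number at all, so it cannot tell "0" apart from "abc". If stricter checking is wanted for the "invalid quantity" test, strtol(3) can be swapped in. The sketch below is an alternative to the atoi(3) call above, not a requirement of the project, and the function name parse_quantity is an assumption for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: converts text to a non-negative int using strtol(3).
 * Returns 1 and stores the result in *value on success; returns 0 (leaving
 * *value untouched) if text is empty, has trailing junk, is negative, or
 * does not fit in an int.
 */
int parse_quantity(const char *text, int *value)
{
    char *end    = NULL;
    long  parsed = 0;
    int   valid  = 0;

    errno  = 0;
    parsed = strtol(text, &end, 10);

    if (end != text && *end == '\0' && errno == 0 &&
        parsed >= 0 && parsed <= INT_MAX)
    {
        *value = (int) parsed;
        valid  = 1;
    }

    return valid;
}
```

A call such as parse_quantity(argv[1], &max) would then accept "128" and "0" but reject "abc", "12x", and "-3".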

Timing

Oftentimes, when checking the efficiency of a solution, a good measurement (especially for comparison) is to time how long the processing takes.

In order to do that in our prime number programs, we are going to use C library functions that obtain the current time, and use it as a stopwatch: we'll grab the time just before starting processing, and then once more when done. The total time will then be the difference between the two (end_time - start_time).

We are going to use the gettimeofday(2) function to aid us in this, and to use it, we'll need to do the following:

header file

In order to use the gettimeofday(2) function in our program, we'll need to include the sys/time.h header file, so be sure to add it in with the existing ones:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

timeval variables

gettimeofday(2) uses a struct timeval data type, of which we'll need to declare two variables in our programs (one for storing the starting time, and the other for the ending time).

Please declare these with your other variables, up at the top of main() (but still WITHIN main(); you do not need to declare global variables).

    struct timeval time_start; // starting time
    struct timeval time_end;   // ending time

Obtaining the time

To use gettimeofday(2), we merely place it at the point in our code we wish to take the time.

For our prime number programs, you'll want to grab the start time AFTER you've declared variables and processed arguments, but JUST BEFORE starting the driving loop doing the processing.

That call will look something like this:

    gettimeofday(&time_start, 0);

The ending time should be taken immediately after all processing (and prime number output) is completed, and right before we display the timing information to STDERR:

    gettimeofday(&time_end, 0);

Displaying the runtime

Once we have the starting and ending times, we can display this to the stderr file pointer. You'll want this line:

    fprintf(stderr, "%8.4lf\n",
            time_end.tv_sec - time_start.tv_sec + ((time_end.tv_usec - time_start.tv_usec) / 1000000.0));

For clarity's sake, that format specifier is “%8.4lf”, where the “lf” is “long float” (that is, a double); it is NOT a number 'one' but a lowercase letter 'ell'.

And with that, we can compute an approximate run-time of our programs. The timing won't necessarily be accurate down to that level of precision, but it will be informative enough for our purposes.
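If you would like to see the arithmetic of that fprintf() line on its own, it can be isolated as below. The function name elapsed_seconds is an assumption for this sketch; the calculation is the same ending-minus-starting difference, with the microsecond part scaled to seconds.

```c
#include <sys/time.h>

/*
 * Hypothetical helper: seconds elapsed between two gettimeofday(2)
 * readings. The microsecond difference may be negative (when the end
 * reading has a smaller tv_usec), which the addition handles correctly.
 */
double elapsed_seconds(struct timeval time_start, struct timeval time_end)
{
    double seconds      = (double) (time_end.tv_sec  - time_start.tv_sec);
    double microseconds = (double) (time_end.tv_usec - time_start.tv_usec);

    return seconds + (microseconds / 1000000.0);
}
```

For example, a start of 1 second + 900000 microseconds and an end of 3 seconds + 100000 microseconds gives 2 + (-0.8), or about 1.2 seconds.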

Execution

specified quantity

Your program output should be as follows (given the specified quantity):

lab46:~/src/discrete/eoce/0x3$ ./cnv3 24 1 # same as 24 1 0 0 0
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 
  0.0001
lab46:~/src/discrete/eoce/0x3$

The execution of the programs is short and simple: grab the parameters, do the processing, produce the output, and then terminate.

invalid lower bound

Here's an example that should generate an error upon running (based on project specifications):

lab46:~/src/discrete/eoce/0x3$ ./cnv3 32 1 0
./cnv3: invalid lower bound!
lab46:~/src/discrete/eoce/0x3$

In this case, the program logic should have detected an invalid condition and bailed out before prime computations even began. No timing data is displayed, because exiting should occur even prior to that.

upper bound overriding quantity

As indicated above, there is potential interplay between active quantity and upper bound values. Here is an example where upper bound overrides quantity, resulting in an early termination (i.e. upper bound is hit before quantity):

lab46:~/src/discrete/eoce/0x3$ ./cnv3 128 1 7 23 10
7 11 13 17 19 23
  0.0001
lab46:~/src/discrete/eoce/0x3$

Also for fun, I set the lower bound to 7, so you'll see computation starts at 7 (vs. the usual 2).

In general

Analyze the times you see… do they make sense, especially when comparing the algorithm used and the quantity being processed? These are related to some very important core Computer Science considerations we need to be increasingly mindful of as we design our programs and implement our solutions. Algorithmic complexity and algorithmic efficiency will be common themes in all we do.

Submission

To successfully complete this project, the following criteria must be met:

Code must compile cleanly (no warnings or errors)
Output must be correct, and match that given in sample output above.
Code must be nicely and consistently indented
Code must utilize the algorithm and optimizations presented above.
Code must be commented
Output Formatting (including spacing) of program must conform to the provided output (see above).

Track/version the source code in your lab46 SEMESTER repository
Submit a copy of your source code to me using the submit tool.
