discrete fall2022 eoce 0x3 ======================== =====Objective===== To explore the cnv0/cnv1 programs from a different perspective: using only ONE loop to drive central processing. =====Background===== In mathematics, a **prime** number is a value that is only evenly divisible by 1 and itself; it has just that one pair of factors, no others. Numbers that have divisibility/factors are classified as **composite** numbers. The number **6** is a **composite** number, as in addition to 1 and 6, it also has the factors of 2 and 3. The number **17**, however, is a **prime** number, as no numbers other than 1 and 17 can be evenly divided into it. =====Calculating the primality of a number===== As of yet, there is no quick and direct way of determining the primality of a given number. Instead, we must perform a series of tests to determine if it fails primality (typically by proving it is composite). This process incurs a considerable amount of processing overhead on the task, so much so that increasingly large values take ever-expanding amounts of time. Often, approaches to prime number calculation involve various algorithms, which offer various benefits (less time) and drawback (more complex code). Your task for this project is to implement a prime number program using the straightforward, unoptimized brute-force algorithm, which determines the primality of a number in a "trial by division" approach. =====Main algorithm: brute force===== The brute force approach is the simplest to implement (although at some cost in the form of time). As we will be looking to do some time/performance analysis and comparisons, it is often good to have a baseline. This program will be it. To perform the process of computing the primality of a number, we simply attempt to evenly divide all the values between 2 and one less than the number in question. If any one of them divides evenly, the number is **NOT** prime, but instead composite. Checking the **remainder** of a division indicates whether or not a division was clean (having 0 remainder indicates such a state). For example, the number 11: 11 % 2 = 1 (2 is not a factor of 11) 11 % 3 = 2 (3 is not a factor of 11) 11 % 4 = 3 (4 is not a factor of 11) 11 % 5 = 1 (5 is not a factor of 11) 11 % 6 = 5 (6 is not a factor of 11) 11 % 7 = 4 (7 is not a factor of 11) 11 % 8 = 3 (8 is not a factor of 11) 11 % 9 = 2 (9 is not a factor of 11) 11 % 10 = 1 (10 is not a factor of 11) Because none of the values 2-10 evenly divided into 11, we can say it passed the test: **11 is a prime number** On the other hand, take 119: 119 % 2 = 1 (2 is not a factor of 119) 119 % 3 = 2 (3 is not a factor of 119) 119 % 4 = 3 (4 is not a factor of 119) 119 % 5 = 4 (5 is not a factor of 119) 119 % 6 = 5 (6 is not a factor of 119) 119 % 7 = 0 (7 is a factor of 119) 119 % 8 = 7 119 % 9 = 2 119 % 10 = 9 119 % 11 = 9 119 % 12 = 11 119 % 13 = 2 ... Because, during our range of testing every value from 2-118, we find that 7 evenly divides into 119, it failed the test: 119 is **not** prime, but is instead a composite number. Please NOTE: Even once a number is identified as composite, your program MUST **CONTINUE** evaluating the remainder of the values (up to 119-1, or 118). It might seem pointless (and it is for a production program), but I want you to see the performance implications this creates. ====algorithm==== Some things to keep in mind on your implementation: * you will want to have exactly ONE loop in the central processing for this program. * you will need to flatten the conditions that previously took place in the nested loop * be mindful of what the underlying conditions are * you know the starting value and the terminating condition, so you have a clear starting and ending point to work with. * I do **NOT** want to see ambiguous, one-letter variables used in your implementation(s). Please use //meaningful// variable names. * Some good examples of variable names would be: * **number**: the number being tested * **factor**: the value being divided into number to test for primality * **step**: the rate by which some variable is changing * **qty**: the count of the current tally of primes * **max**: the maximum count we seek * **start**: a value we are starting at * **lower**: a lower bound * **upper**: an upper bound * see how much more readable and meaningful these are, especially as compared to **a**, **i**, **n**, **x**? You may even find it helps with debugging and understanding your code better. * let the loop drive the overall process. Identify prime/composite status separate from loop terminating conditions. * and remember, the baseline brute force algorithm may well identify a value as composite, but won't terminate the loop. * your timing should start before the loop (just AFTER argument processing), and terminate immediately following the terminating output newline outside the loops. * you may **NOT** split **qty** and **range** functionality into two separate code blocks (ie have two sets of two loops). Only the one set as indicated. =====prime algorithm optimizations===== To give us a decent appreciation of the subtleties of algorithm development in a theme of programs, I have identified the following optimizations that we will be implementing. The optimizations we will be implementing in this section include: * **break on composite** - once a tested number is proven to be composite, there is no need to continue processing: break out of the factor check logic and proceed to the next number * **mapping factors of 6** - it turns out that, aside from the initial primes of **2** and **3**, that **all** prime numbers fall to a +1 or -1 off a factor of six (there is an algorithm for this: **6a+/-1**). This optimization will utilize this property, only testing numbers +/-1 off of factors of 6 (how might this impact overall processing?) * **odds-only checking** - aside from **2**, **all** other prime numbers are odd. Therefore, there is zero need to perform a composite check on an even number, allowing us to focus exclusively on odd values (luckily, they seem to occur in a predictable pattern). * **sqrt() trick** - mathematically it has been shown that if a number has any evenly divisible factors, at least one half of that factor pair will occur by the square root point of the number being tested. * **sqrt()-less square root approximation** - **sqrt()**, a function in the math library, does an industrial strength square root calculation. We don't need that, merely a whole integer value corresponding to the approximate square root. Here we will implement our own logic to approximate square root, hopefully with a considerable performance impact. Unless specified in the encoded name, your algorithm should only implement the algorithm and optimization(s) specified. Some of these optimizations can co-exist easily (break + map, odd + sqrt()), others are partially compatible (map + odd can coexist in a certain form), while others are mutually exclusive (sqrt() and approximated square root conflict). So there are definitely a few combinations that are possible using this scheme. ====Program Specifications==== Your program should otherwise work like the discrete/cnv1 program did: * obtain parameters from the command-line (see **command-line arguments** section below). * check to make sure the user indeed supplied enough parameters, and exit with an error message if not. * argv[1]: maximum quantity of primes to calculate (your program should run until it discovers **that** many primes). * this value should be an integer value, greater than or equal to 0 * if argv[1] is 0, disable the quantity check, and rely only on provided lower and upper bounds (thru argv[4] would be required in this case). * argv[2]: N-ary specification; for this section, we are going to use **1** * argv[3]: **conditionally optional** lower bound (starting value). Most of the time, this will probably be **2**, but should be a positive integer greater than or equal to 2. This defines where the program will start its prime quantity check from. * if omitted, assume a lower bound of **2**. * if you desire to specify an upper bound (argv[4]), you obviously MUST provide the lower bound argument under this scheme. * argv[4]: **conditionally optional** upper bound (ending value). If provided, this is the ending value you'd like to check to. * If doing a quantity run (argv[1] is NOT 0), you don't need this. * If doing a quantity run AND you specify an upper bound, whichever condition is achieved first dictates program termination. That is, upper bound could override quantity (if it is achieved before quantity), and quantity can override the upper bound (if it is achieved before reaching the specified upper bound). * argv[5]: specification of optimizations. Like in discrete/cnv1, the values are bitwise-specified in a single numeric value, broken out as follows: 0 - no optimizations (naive, brute force approach) 1 - break on composite 2 - odds-only checking 4 - map 8 - sqrt 16 - approximate square root * for each argument: you should do a basic check to ensure the user complied with this specification, and exit with a unique error message (displayed to STDERR) otherwise: * for insufficient quantity of arguments, display: **PROGRAM_NAME: insufficient number of arguments!** * for invalid argv[1], display: **PROGRAM_NAME: invalid quantity!** * for invalid argv[2], display: **PROGRAM_NAME: invalid value!** * invalid argv[3], display: **PROGRAM_NAME: invalid lower bound!** * if argv[3] is not needed, ignore (no error displayed not forced exit, as it is acceptable defined behavior). * invalid argv[4], display: **PROGRAM_NAME: invalid upper bound!** * if argv[4] is not needed, ignore (no error displayed nor forced exit, as it is acceptable defined behavior). * In these error messages, **PROGRAM_NAME** is the name of the program being run; this can be accessed as a string stored in **argv[0]**. * perform ONLY the optimization(s) specified on the command-line. We are producing multiple data points for a broader performance comparison. * please take note on differences in run-time, contemplating the impact the algorithm and optimization(s) have on performance (timing, specifically). * immediately after argument processing: start your stopwatch (see **timing** section below). * perform the correct algorithm and optimization(s) against the command-line input(s) given. * your program is to have no fewer and no more than 1 loop in this prime processing section. * display identified primes (space-separated) to a file pointer called **stdout** * stop your stopwatch immediately following your prime processing loops (and terminating newline display to **stdout**). Calculate the time that has transpired (ending time minus starting time). * output the processing run-time to the file pointer called **stderr** * your output **MUST** conform to the example output in the **execution** section below. This is also a test to see how well you can implement to specifications. Basically: * as primes are being displayed, they are space-separated (first prime hugs the left margin), and when all said and done, a newline is issued. ====Coding Restrictions==== Since a lot of our explorations this semester were algorithmic in nature, we made a lot of use of restricting various approaches we could take in the implementation of our solutions. As such, the following restrictions are in place for your implementations: * no global variables * no infinite loops * no if() shunts (have good conditions!) * what is the difference between an if() shunt and a valid conditional statement? if() shunts are a messy way of avoiding iterations; valid if() statements denote conditionally optional processing that occurs every iteration. * one return statement (max) per function * exit() calls are limited to command-line or resource allocation error processing * avoid redundant sections of code (merge logic or break out functions) Remember, the focus should be on writing elegant code, and NOT on brute forcing some solution. Show me that you've learned something this semester- write clean, well commented, consistently indented code. =====Makefile operations===== Makefiles provide a build automation system for our programs, instructing the computer on how to compile files, so we don't have to constantly type compiler command-lines ourselves. I've also integration some other useful, value-added features that will help you with overall administration of the project. Basic operation of the Makefile is invoked by running the command "**make**" by itself. The default action is to compile everything in the project directory. A description of some available commands include: * **make**: compile everything * any **warnings** or **errors** generated by the compiler will go into a file in the base directory of pnc0 in a file called **errors**; you can **cat** it to view the information. * **make debug**: compile everything with debug support * any **warnings** or **errors** generated by the compiler will be displayed to the screen as the programs compile. * **make clean**: remove all binaries * **make save**: make a backup of your current work =====Command-Line Arguments===== To automate our comparisons, we will be making use of command-line arguments in our programs. ====header files==== We don't need any extra header files to use command-line arguments, but we will need an additional header file to use the **atoi(3)** function, which we'll use to quickly turn the command-line parameter into an integer, and that header file is **stdlib.h**, so be sure to include it with the others: #include #include ====setting up main()==== To accept (or rather, to gain access) to arguments given to your program at runtime, we need to specify two parameters to the main() function. While the names don't matter, the types do.. I like the traditional **argc** and **argv** names, although it is also common to see them abbreviated as **ac** and **av**. Please declare your main() function as follows: int main (int argc, char **argv) There are two very important variables involved here (the types are actually what are important, the names given to the variables are actually quite, variable; you may see other references refer to them as things like "ac" and "av"): * int argc: the count (an integer) of tokens given on the command line (program name + arguments) * char **argv: an array of strings (technically an array of an array of char) that contains "strings" of the various tokens provided on the command-line. The arguments are accessible via the argv array, in the order they were specified: * argv[0]: program invocation (path + program name) * argv[1]: our maximum / upper bound * argv[2]: should be provided as a 1 for this project * argv[3]: conditionally optional; represents lower bound * argv[4]: conditionally optional; represents upper bound * argv[5]: conditionally optional; represents optimizations, if any Additionally, let's not forget the **argc** variable, an integer, which contains a count of arguments (argc == argument count). If we provided argv[0] through argv[4], argc would contain a 5. ===example=== For example, if we were to execute the as follows: lab46:~/src/discrete/eoce/0x3$ ./cnv3 128 1 2 2048 13 We'd have: * argv[0]: "./cnv3" * argv[1]: "128" (note, NOT integer 128, but a string) * argv[2]: "1" * argv[3]: "2" * argv[4]: "2048" * argv[5]: "13" and let's not forget: * argc: 6 (there are 6 things, argv indexes 0, 1, 2, 3, 4, and 5) With the conditionally optional arguments as part of the program spec, for a valid execution of the program, argc could be a value anywhere from 3 to 6. ====Simple argument checks==== While there are a number of checks we should perform, one of the first should be a check to see if the minimal number of arguments has been provided: // if less than 3 arguments (program_name + quantity + argv[2] == 3) // have been provided // if (argc < 3) { fprintf(stderr, "%s: insufficient number of arguments!\n", argv[0]); exit(1); } Since argv[3] (lower bound) and argv[4] (upper bound) are conditionally optional, it wouldn't make sense to check for them in the overall count. But we can and do still want to stategically utilize **argc** to determine if an argv[3] or argv[4] is present. ====Grab and convert max==== Finally, we need to put the argument representing the maximum quantity into a variable. I'd recommend declaring a variable of type **int**. We will use the **atoi(3)** function to quickly convert the command-line arguments into **int** values: max = atoi (argv[1]); And now we can proceed with the rest of our prime implementation. =====Timing===== Often times, when checking the efficiency of a solution, a good measurement (especially for comparison), is to time how long the processing takes. In order to do that in our prime number programs, we are going to use C library functions that obtain the current time, and use it as a stopwatch: we'll grab the time just before starting processing, and then once more when done. The total time will then be the difference between the two (end_time - start_time). We are going to use the **gettimeofday(2)** function to aid us in this, and to use it, we'll need to do the following: ====header file==== In order to use the **gettimeofday(2)** function in our program, we'll need to include the **sys/time.h** header file, so be sure to add it in with the existing ones: #include #include #include ====timeval variables==== **gettimeofday(2)** uses a **struct timeval** data type, of which we'll need to declare two variables in our programs (one for storing the starting time, and the other for the ending time). Please declare these with your other variables, up at the top of main() (but still WITHIN main()-- you do not need to declare global variables). struct timeval time_start; // starting time struct timeval time_end; // ending time ====Obtaining the time==== To use **gettimeofday(2)**, we merely place it at the point in our code we wish to take the time. For our prime number programs, you'll want to grab the start time **AFTER** you've declared variables and processed arguments, but **JUST BEFORE** starting the driving loop doing the processing. That call will look something like this: gettimeofday(&time_start, 0); The ending time should be taken immediately after all processing (and prime number output) is completed, and right before we display the timing information to STDERR: gettimeofday(&time_end, 0); ====Displaying the runtime==== Once we have the starting and ending times, we can display this to the **stderr** file pointer. You'll want this line: fprintf(stderr, "%8.4lf\n", time_end.tv_sec-time_start.tv_sec+((time_end.tv_usec-time_start.tv_usec)/1000000.0)); For clarity sake, that format specifier is "%8.4lf", where the "lf" is "long float", that is **NOT** a number 'one' but a lowercase letter 'ell'. And with that, we can compute an approximate run-time of our programs. The timing won't necessarily be accurate down to that level of precision, but it will be informative enough for our purposes. =====Execution===== ====specified quantity==== Your program output should be as follows (given the specified quantity): lab46:~/src/discrete/eoce/0x3$ ./cnv3 24 1 # same as 24 1 0 0 0 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 0.0001 lab46:~/src/discrete/eoce/0x3$ The execution of the programs is short and simple- grab the parameters, do the processing, produce the output, and then terminate. ====invalid lower bound==== Here's an example that should generate an error upon running (based on project specifications): lab46:~/src/discrete/eoce/0x3$ ./cnv3 32 1 0 ./cnv3: invalid lower bound lab46:~/src/discrete/eoce/0x3$ In this case, the program logic should have detected an invalid condition and bailed out before prime computations even began. No timing data is displayed, because exiting should occur even prior to that. ====upper bound overriding quantity==== As indicated above, there is potential interplay with an active quantity and upper bound values. Here is an example where upper bound overrides quantity, resulting in an early termination (ie upper bound is hit before quantity): lab46:~/src/discrete/eoce/0x3$ ./cnv3 128 1 7 23 10 7 11 13 17 19 23 0.0001 lab46:~/src/discrete/eoce/0x3$ Also for fun, I set the lower bound to 7, so you'll see computation starts at 7 (vs. the usual 2). ====In general==== Analyze the times you see... do they make sense, especially when comparing the algorithm used and the quantity being processed? These are related to some very important core Computer Science considerations we need to be increasingly mindful of as we design our programs and implement our solutions. Algorithmic complexity and algorithmic efficiency will be common themes in all we do. =====Submission===== To successfully complete this project, the following criteria must be met: * Code must compile cleanly (no warnings or errors) * Output must be correct, and match that given in sample output above. * Code must be nicely and consistently indented * Code must utilize the algorithm and optimizations presented above. * Code must be commented * Output Formatting (including spacing) of program must conform to the provided output (see above). * Track/version the source code in your lab46 SEMESTER repository * Submit a copy of your source code to me using the **submit** tool.