haas:spring2016:cprog:projects:pnc0 [2016/02/22 17:29] – created wedge | haas:spring2016:cprog:projects:pnc0 [2016/02/27 13:13] (current) – [Program] wedge | ||
---|---|---|---|
Line 17: | Line 17: | ||
* ability to iterate sections of code (for/while statement) | * ability to iterate sections of code (for/while statement) | ||
+ | =====Algorithmic Complexity===== | ||
+ | A concept in Computer Science curriculum is the notion of computational/ | ||
+ | |||
+ | Basically, a solution to a problem exists on a spectrum of efficiency, typically a trade-off between time and space: when optimizing for time, the code size tends to grow. | ||
+ | |||
+ | Additionally, | ||
+ | |||
+ | This project will endeavor to introduce you to the notion that the algorithms and constructs you use in coding your solution can and do make a difference to the overall runtime of your code. | ||
=====Background===== | =====Background===== | ||
In mathematics, | In mathematics, | ||
Line 36: | Line 44: | ||
====brute force==== | ====brute force==== | ||
+ | The brute force approach is the simplest to implement (and likely also the worst-performing). We will use it as our baseline (it is nice to have something to compare against). | ||
+ | To perform it, we simply attempt to evenly divide the number in question by every value between 1 and itself (that is, 2 through the number minus 1). If any one of them divides evenly, the number is **NOT** prime, but instead a composite value. | ||
+ | Checking the remainder of a division indicates whether or not a division was clean (having 0 remainder indicates such a state). | ||
+ | For example, the number 11: | ||
+ | |||
+ | < | ||
+ | 11 % 2 = 1 (2 is not a factor of 11) | ||
+ | 11 % 3 = 2 (3 is not a factor of 11) | ||
+ | 11 % 4 = 3 (4 is not a factor of 11) | ||
+ | 11 % 5 = 1 (5 is not a factor of 11) | ||
+ | 11 % 6 = 5 (6 is not a factor of 11) | ||
+ | 11 % 7 = 4 (7 is not a factor of 11) | ||
+ | 11 % 8 = 3 (8 is not a factor of 11) | ||
+ | 11 % 9 = 2 (9 is not a factor of 11) | ||
+ | 11 % 10 = 1 (10 is not a factor of 11) | ||
+ | </ | ||
+ | |||
+ | Because none of the values 2-10 evenly divided into 11, we can say it passed the test: **11 is a prime number** | ||
+ | |||
+ | On the other hand, take 119: | ||
+ | |||
+ | < | ||
+ | 119 % 2 = 1 (2 is not a factor of 119) | ||
+ | 119 % 3 = 2 (3 is not a factor of 119) | ||
+ | 119 % 4 = 3 (4 is not a factor of 119) | ||
+ | 119 % 5 = 4 (5 is not a factor of 119) | ||
+ | 119 % 6 = 5 (6 is not a factor of 119) | ||
+ | 119 % 7 = 0 (7 is a factor of 119) | ||
+ | </ | ||
+ | |||
+ | Because 7 evenly divided into 119, it failed the test: 119 is **not** a prime, but instead a composite number. | ||
+ | |||
+ | There is no further need to check the remaining values, as once we have proven the non-primality of a number, the state is set: it is composite. So be sure to use a **break** statement to terminate the computation loop (will also be a nice boost to runtime). | ||
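As a rough illustration of the brute force steps above, here is a minimal sketch in C. The function name `is_prime_brute` and its 1/0 return convention are assumptions made for this example; they are not part of the project specification.

```c
/* Hypothetical helper: returns 1 if n is prime, 0 otherwise.
 * Brute force: try every value from 2 up to n - 1. */
int is_prime_brute(int n)
{
    int prime = 1;          /* assume prime until a factor turns up */
    int i;

    if (n < 2)
        return 0;           /* 0 and 1 are not prime */

    for (i = 2; i < n; i++)
    {
        if (n % i == 0)     /* zero remainder: i is a factor of n */
        {
            prime = 0;
            break;          /* composite; no need to check the rest */
        }
    }

    return prime;
}
```

Calling `is_prime_brute(11)` walks through 2-10 without finding a factor, while `is_prime_brute(119)` stops as soon as it reaches 7.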
+ | |||
+ | ====square root==== | ||
+ | An optimization to the computation of prime numbers is the square root trick. Any factor larger than the square root of a number must be paired with a factor smaller than the square root, so if we've processed every value up to (and including) the square root of the number we're testing, and none has proven to divide it evenly, we can conclude primality and bail out. | ||
+ | |||
+ | The C library has a **sqrt()** function available through including the **math.h** header file, and linking against the math library at compile time (add **-lm** to your gcc line). | ||
+ | |||
+ | To use **sqrt()**, we pass in the value we wish to obtain the square root of, and assign the result to an **int**: | ||
+ | |||
+ | <code c> | ||
+ | int x = 25; | ||
+ | int y = 0; | ||
+ | |||
+ | y = sqrt(x); | ||
+ | |||
+ | // y should be 5 as a result | ||
+ | </ | ||
+ | |||
+ | For instance, the number 37 (using the square root optimization), | ||
+ | |||
+ | < | ||
+ | 37 % 2 = 1 (2 is not a factor of 37) | ||
+ | 37 % 3 = 1 (3 is not a factor of 37) | ||
+ | 37 % 4 = 1 (4 is not a factor of 37) | ||
+ | 37 % 5 = 2 (5 is not a factor of 37) | ||
+ | 37 % 6 = 1 (6 is not a factor of 37) | ||
+ | </ | ||
+ | |||
+ | Because none of these values evenly divides, we can give 37 a pass: **it is a prime** | ||
+ | |||
+ | This will dramatically improve the runtime, and offers a nice comparison against our brute force baseline. | ||
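The square root bound can be sketched as follows. Note that this is a variation, not the approach described above: instead of calling **sqrt()**, it expresses the same bound with integer division (`i <= n / i`), so it needs neither **math.h** nor **-lm**. That substitution, along with the name `is_prime_sqrt`, is an assumption made here for illustration.

```c
/* Hypothetical helper: returns 1 if n is prime, 0 otherwise.
 * Only tests candidate factors up to the square root of n.
 * "i <= n / i" is the same bound as "i <= sqrt(n)", written
 * with integers to avoid floating point and overflow of i * i. */
int is_prime_sqrt(int n)
{
    int i;

    if (n < 2)
        return 0;               /* 0 and 1 are not prime */

    for (i = 2; i <= n / i; i++)
    {
        if (n % i == 0)
            return 0;           /* found a factor: composite */
    }

    return 1;                   /* no factor up to the square root */
}
```

For 37 this tests exactly the values 2 through 6, matching the worked example above.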
+ | |||
+ | ====further optimization==== | ||
+ | There are many other methods, approaches, and tweaks that can be employed to further improve runtime (while maintaining accuracy-- all your solutions must match: the same prime numbers should be identified no matter which program is run). | ||
+ | |||
+ | So I'd like you to explore other optimizations that can be made, be it using other prime number algorithms, further refining existing ones, or playing off patterns in numbers. | ||
+ | |||
+ | One assumption I will allow you to make for your optimized solution is that the single-digit primes (2, 3, 5, 7) can be assumed prime, and just printed out if having them be calculated would otherwise break your algorithm (might be helpful to some people; I certainly found it useful in some of my solutions). | ||
+ | |||
+ | ===some optimization ideas=== | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | |||
+ | Of particular note: the sieve algorithms take advantage of increased storage space, whereas others (like brute force) are predominantly time-based. The sieve is also more detailed... even if you don't decide to implement a sieve, take a look and compare the algorithm to what you've done to see the differences in approaches. | ||
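To make the time-versus-space contrast concrete, here is a minimal sketch of a Sieve of Eratosthenes that counts the primes up to a limit. The function name `count_primes_sieve`, the one-byte-per-value flag array, and the -1 error return are all assumptions for this example; it is one possible shape, not a required design.

```c
#include <stdlib.h>

/* Hypothetical helper: counts the primes in 2..max using a sieve.
 * Spends max + 1 bytes of storage to avoid repeated division.
 * Returns -1 if the flag array cannot be allocated. */
int count_primes_sieve(int max)
{
    unsigned char *composite;   /* composite[v] == 1 means v is not prime */
    int count = 0;
    int i, j;

    if (max < 2)
        return 0;

    composite = calloc((size_t)max + 1, 1);
    if (composite == NULL)
        return -1;

    /* Cross off multiples of each prime, starting at its square. */
    for (i = 2; i <= max / i; i++)
    {
        if (!composite[i])
        {
            for (j = i * i; j <= max; j += i)
                composite[j] = 1;
        }
    }

    /* Whatever was never crossed off is prime. */
    for (i = 2; i <= max; i++)
    {
        if (!composite[i])
            count++;
    }

    free(composite);
    return count;
}
```

Unlike the division-based checks, no value is ever tested on its own: the primes simply fall out of what remains unmarked. The sketch assumes a `max` well below the limit of `int`, since `j += i` is not guarded against overflow.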
=====Program===== | =====Program===== | ||
- | It is your task to write the program that will use the above method to compute the requested one-, two-, or three-digit value against | + | It is your task to write 3 separate prime number calculating programs: |
+ | |||
+ | - **primebrute.c**: | ||
+ | - **primesqrt.c**: | ||
+ | | ||
Your program should: | Your program should: | ||
- | * obtain | + | * obtain |
- | * input should | + | * argv[1]: maximum value to calculate to (your program |
- | * determine from the input if it is a one-, two-, or three-digit number | + | * argv[2]: visibility. If a **1** is provided, print out the prime numbers in a space separated list; if a **0** is provided, run silent: only display the runtime information. |
- | * perform the correct | + | * these values should be positive integer values; you can make the assumption that the user will always do the right thing. |
- | * propagate any carries | + | * start your stopwatch (see **timing** section below): |
- | * output the final value | + | * perform the algorithm against the value |
- | * you can display each digit individually, or combine them into one variable | + | * if enabled, display the prime numbers found in the range |
+ | * output the processing run-time to STDERR (do this always). | ||
+ | * your output **MUST** conform to the example output in the **execution** section below. This is also a test to see how well you can implement to specifications. Basically: | ||
+ | * if primes are being displayed, they are space-separated (first prime hugs the left margin), and when all is said and done, a newline is issued. | ||
+ | * the timing information will be displayed in accordance with code I will provide | ||
+ | ====Other considerations==== | ||
+ | All your programs MUST perform the calculations to determine primality- you may not always be printing it out (depending on argv[2]), but work must be done to ensure the value is identified as a prime/ | ||
+ | |||
+ | For example: | ||
+ | |||
+ | < | ||
+ | if (show == 1) | ||
+ | { | ||
+ | work to determine if it is prime | ||
+ | if prime | ||
+ | print number | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | will actually skip the core processing, and you’ll see some amazing runtimes as a result. They may be amazing, but they’re not real, because you’re not actually doing anything. | ||
+ | |||
+ | What you want instead: | ||
+ | |||
+ | < | ||
+ | work to determine if it is prime | ||
+ | if (show == 1) | ||
+ | { | ||
+ | if prime | ||
+ | print number | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | There are many ways to express the above, through compound if statements and other arrangements, | ||
+ | |||
+ | That said, you may skip the full work run for a value when a simple pretest (even value, multiple of 3, etc.) already shows it is not prime; such a pretest is actually considered part of the core “work”, so it is more than okay (and encouraged in the primeopt). | ||
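The "work first, then decide whether to print" arrangement can be sketched in C as below. The names `process_range` and `is_prime`, the `FILE *` parameter, and the choice to emit the trailing newline only when primes are shown are assumptions made for this illustration; check the **execution** section for the exact output your programs must produce.

```c
#include <stdio.h>

/* Hypothetical primality check (brute force) used by the sketch. */
static int is_prime(int n)
{
    int i;

    if (n < 2)
        return 0;

    for (i = 2; i < n; i++)
    {
        if (n % i == 0)
            return 0;
    }

    return 1;
}

/* Hypothetical driver: tests every value in 2..max and returns how
 * many primes were found. The primality work is done for every
 * value no matter what show is; show only controls the printing. */
int process_range(int max, int show, FILE *out)
{
    int count = 0;
    int n;

    for (n = 2; n <= max; n++)
    {
        int prime = is_prime(n);    /* always do the work */

        if (prime)
        {
            if (show == 1)
            {
                if (count > 0)
                    fputc(' ', out);    /* separator; first prime hugs the margin */
                fprintf(out, "%d", n);
            }
            count++;
        }
    }

    if (show == 1)
        fputc('\n', out);           /* assumed: newline only when showing */

    return count;
}
```

With `show` set to 0 the loop still performs every primality test, so the measured runtime reflects real work.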
+ | =====Command-Line Arguments===== | ||
+ | To automate our comparisons, | ||
+ | |||
+ | ====header files==== | ||
+ | We don't need any extra header files to use command-line arguments, but we will need an additional header file to use the **atoi(3)** function, which we'll use to quickly turn the command-line parameter into an integer, and that header file is **stdlib.h**, | ||
+ | |||
+ | <code c> | ||
+ | #include < | ||
+ | #include < | ||
+ | </ | ||
+ | |||
+ | ====setting up main()==== | ||
+ | To accept (or rather, to gain access to) arguments given to your program at runtime, we need to specify two parameters to the main() function. While the names don't matter, the types do. I like the traditional **argc** and **argv** names, although it is also common to see them abbreviated as **ac** and **av**. | ||
+ | |||
+ | Please declare your main() function as follows: | ||
+ | |||
+ | <code c> | ||
+ | int main(int argc, char **argv) | ||
+ | </ | ||
+ | |||
+ | The arguments are accessible via the argv array, in the order they were specified: | ||
+ | |||
+ | * argv[0]: program invocation (path + program name) | ||
+ | * argv[1]: our maximum / upper bound | ||
+ | * argv[2]: visibility (1 to show primes, 0 to be silent) | ||
+ | |||
+ | There are ways to do flexible argument parsing, and even to have dashed options as we have on various commands. But such things are beyond the scope of our current endeavors, so we will stick to this basic functionality for now. | ||
+ | |||
+ | ====Simple argument checks==== | ||
+ | Although I'm not going to require extensive argument checking for this project, here's how we would check to see if the minimal number of arguments has been provided: | ||
+ | |||
+ | <code c> | ||
+ | if (argc < 3) // if fewer than 3 arguments have been provided | ||
+ | { | ||
+ | fprintf(stderr, | ||
+ | exit(1); | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | If you're wondering, "why 3? I thought we only had 2.", C includes the program' | ||
+ | |||
+ | ====Grab and convert max and visibility==== | ||
+ | Finally, we need to put the arguments representing the maximum value and visibility settings into variables. | ||
+ | |||
+ | I'd recommend declaring two variables of type **int**. | ||
+ | |||
+ | We will use the **atoi(3)** function to quickly convert the command-line arguments into **int** values: | ||
+ | |||
+ | <code c> | ||
+ | max = atoi(argv[1]); | ||
+ | show = atoi(argv[2]); | ||
+ | </ | ||
+ | |||
+ | And now we can proceed with the rest of our prime implementation. | ||
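Putting the argument steps together, a minimal sketch might look like the following. The helper name `parse_args`, its return convention (0 on success, 1 on failure), and the wording of the usage message are hypothetical; the project only asks that the values end up in two **int** variables.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: checks the argument count, then converts
 * argv[1] (maximum value) and argv[2] (visibility) with atoi().
 * Returns 0 on success, 1 if too few arguments were given. */
int parse_args(int argc, char **argv, int *max, int *show)
{
    if (argc < 3)   /* program name + 2 arguments = 3 */
    {
        /* Usage text is an assumption, not the course's wording. */
        fprintf(stderr, "usage: %s MAX SHOW\n",
                (argc > 0) ? argv[0] : "prime");
        return 1;
    }

    *max  = atoi(argv[1]);  /* upper bound to calculate to */
    *show = atoi(argv[2]);  /* 1: print primes, 0: run silent */

    return 0;
}
```

From main(), this could be called as `if (parse_args(argc, argv, &max, &show) != 0) exit(1);`. Keep in mind that **atoi(3)** does no error reporting, which is acceptable here only because we are allowed to assume well-formed input.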
+ | |||
+ | =====Timing===== | ||
+ | Often times, when checking the efficiency of a solution, a good measurement (especially for comparison), | ||
+ | |||
+ | In order to do that in our prime number programs, we are going to use C library functions that obtain the current time, and use it as a stopwatch: we'll grab the time just before starting processing, and then once more when done. The total time will then be the difference between the two (end_time - start_time). | ||
+ | |||
+ | We are going to use the **gettimeofday(2)** function to aid us in this, and to use it, we'll need to do the following: | ||
+ | |||
+ | ====header file==== | ||
+ | In order to use the **gettimeofday(2)** function in our program, we'll need to include the **sys/ | ||
+ | |||
+ | <code c> | ||
+ | #include < | ||
+ | #include < | ||
+ | #include < | ||
+ | </ | ||
+ | |||
+ | ====timeval variables==== | ||
+ | **gettimeofday(2)** uses a **struct timeval** data type, of which we'll need to declare two variables in our programs (one for storing the starting time, and the other for the ending time). | ||
+ | |||
+ | Please declare these with your other variables, up at the top of main() (but still WITHIN main()-- you do not need to declare global variables). | ||
+ | |||
+ | <code c> | ||
+ | struct timeval time_start; // starting time | ||
+ | struct timeval time_end;   // ending time | ||
+ | </ | ||
+ | |||
+ | ====Obtaining the time==== | ||
+ | To use **gettimeofday(2)**, | ||
+ | |||
+ | For our prime number programs, you'll want to grab the start time **AFTER** you've declared variables and processed arguments, but **JUST BEFORE** starting the driving loop doing the processing. | ||
+ | |||
+ | That call will look something like this: | ||
+ | |||
+ | <code c> | ||
+ | gettimeofday(& | ||
+ | </ | ||
+ | |||
+ | The ending time should be taken immediately after all processing (and prime number output) is completed, and right before we display the timing information to STDERR: | ||
+ | |||
+ | <code c> | ||
+ | gettimeofday(& | ||
+ | </ | ||
+ | |||
+ | ====Displaying the runtime==== | ||
+ | Once we have the starting and ending times, we can display the elapsed time to STDERR. You'll want this line: | ||
+ | |||
+ | <code c> | ||
+ | fprintf(stderr, | ||
+ | </ | ||
+ | |||
+ | For clarity's sake, that format specifier is " | ||
+ | |||
+ | And with that, we can compute an approximate run-time of our programs. The timing won't necessarily be accurate down to that level of precision, but it will be informative enough for our purposes. | ||
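The stopwatch arithmetic (end_time - start_time) can be sketched as a pair of small helpers. The names `elapsed_seconds` and `report_runtime` are hypothetical, and the `"%.6f"` format is an assumption chosen only because the example output shows six decimal places; use the exact line provided for the project in your actual programs.

```c
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical helper: difference between two timeval readings,
 * in seconds. tv_sec holds whole seconds and tv_usec holds
 * microseconds, so the microsecond part is scaled by 1,000,000. */
double elapsed_seconds(const struct timeval *start,
                       const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) / 1000000.0;
}

/* Hypothetical helper: prints the elapsed time on its own line.
 * The "%.6f" format is an assumption, not the course's specifier. */
void report_runtime(FILE *out,
                    const struct timeval *start,
                    const struct timeval *end)
{
    fprintf(out, "%.6f\n", elapsed_seconds(start, end));
}
```

In a program, the two readings would typically come from `gettimeofday(&time_start, NULL)` just before the processing loop and `gettimeofday(&time_end, NULL)` just after it (passing NULL for the timezone argument is assumed here), followed by `report_runtime(stderr, &time_start, &time_end)`.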
=====Execution===== | =====Execution===== | ||
Several operating behaviors are shown as examples. | Several operating behaviors are shown as examples. | ||
- | A two digit value: | + | Brute force showing primes: |
<cli> | <cli> | ||
- | lab46: | + | lab46: |
- | Enter value: 32 | + | 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 |
- | 32 x 11 = 352 | + | 0.000088 |
- | lab46: | + | lab46: |
</ | </ | ||
- | Next, a one digit value: | + | Brute force not showing primes: |
<cli> | <cli> | ||
- | lab46: | + | lab46: |
- | Enter value: 7 | + | |
- | 7 x 11 = 77 | + | lab46: |
- | lab46: | + | |
</ | </ | ||
- | Finally, three digit value: | + | Similarly, for the square root version (showing primes): |
<cli> | <cli> | ||
- | lab46: | + | lab46: |
- | Enter value: 567 | + | 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 |
- | 567 x 11 = 6237 | + | 0.000089 |
- | lab46: | + | lab46: |
</ | </ | ||
- | The execution of the programs is short and simple- | + | And, without showing primes: |
+ | |||
+ | < | ||
+ | lab46: | ||
+ | 0.000006 | ||
+ | lab46: | ||
+ | </ | ||
+ | |||
+ | Don't be alarmed by the visible square root actually seeming to take MORE time; we have to consider the range as well: 90 is barely anything, and there is overhead incurred from the **sqrt()** function call. The real savings will start to be seen once we get into the thousands (and beyond). | ||
+ | |||
+ | And that's another neat thing with algorithm comparison: a " | ||
+ | |||
+ | The same goes for your optimized solution (same parameters). | ||
+ | |||
+ | The execution of the programs is short and simple- | ||
+ | |||
+ | =====Check Results===== | ||
+ | If you'd like to compare your implementations, | ||
+ | |||
+ | In order to work, you **MUST** be in the directory where your **primebrute**, | ||
+ | |||
+ | For instance (running on my implementations): | ||
+ | |||
+ | < | ||
+ | lab46: | ||
+ | ============================================ | ||
+ | | ||
+ | ============================================ | ||
+ | | ||
+ | 16 0.000002 | ||
+ | 32 0.000003 | ||
+ | 64 0.000005 | ||
+ | | ||
+ | | ||
+ | | ||
+ | 1024 0.000540 | ||
+ | 2048 0.001761 | ||
+ | 4096 0.006115 | ||
+ | 8192 0.021259 | ||
+ | | ||
+ | | ||
+ | | ||
+ | 131072 | ||
+ | 262144 | ||
+ | 524288 | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | ============================================ | ||
+ | | ||
+ | ============================================ | ||
+ | lab46: | ||
+ | </ | ||
+ | |||
+ | For evaluation, each test is run 4 times, and the resulting time is averaged. During development, | ||
+ | |||
+ | If the runtime of a particular prime variant exceeds an upper threshold (likely to be set at 2 seconds), it will be omitted from further tests, and a series of dashes will instead appear in the output. | ||
+ | |||
+ | If you don't feel like waiting, simply hit **CTRL-c** and the script will terminate. | ||
+ | |||
+ | In the example output above, my **primeopt** is playing with an implementation of the **6a+/-1** algorithm. | ||
+ | |||
+ | I also include a validation check- to ensure your prime programs are actually producing the correct list of prime numbers. If the check is successful, you will see " | ||
+ | If you'd like to experiment with other variations, the script also recognizes prime variants of the following names: | ||
+ | * primeopt0 (for an additional optimization) | ||
+ | * primeopt1 (and another) | ||
+ | * primeopt2 (if you'd like another entry for another optimization) | ||
+ | * primeopt3 (for yet another optimization) | ||
+ | * primeopt4 (and one more; hey, I want you to have nice things) | ||
=====Bonus Points===== | =====Bonus Points===== | ||
There will be an additional bonus point opportunity with this project, based on processing run-time of your optimized solution. | There will be an additional bonus point opportunity with this project, based on processing run-time of your optimized solution. |