projects
projects
Corning Community College
CSCS1320 C/C++ Programming
~~TOC~~
To apply your skills in algorithmic optimization through the implementation of improved prime number calculating programs.
A concept in Computer Science curriculum is the notion of computational/algorithmic complexity.
Basically, a solution to a problem exists on a spectrum of efficiency (typically constrained by time vs. space). Once you've hashed out an implementation:
Let it be clear: using modern computers bound by fourth dimensional constraints, we cannot write solutions without some foothold in both space and time (any program has statements in it that accomplish the task at hand (space occupied), and that same program takes time to run (even if it happens rather quickly)). What we can do is influence where on the space-time spectrum our implementation falls, and if done strategically for the particular constraint at hand, can aid us in achieving desired improvements in program efficiency.
In the previous projects, we focused on algorithms that were more constrained by time- taking progressively more time the larger the data set to be processed. These time-constrained algorithms also happen to be the type of algorithm you're most familiar with (or at least, were indoctrinated into thinking with…) your third grade self, on the other hand, potentially would have found more familiarity with the space-constrained methods this project will explore.
So in contrast to time-constrained approaches we have space-constrained approaches, which will utilize more space (ie allocating memory) in the solving of the program to the benefit of requiring less time.
We can never be entirely time- or space-free… some element of the other will always be present. But by embracing the advantages of an approach, we can really make an impact in the desired area.
What we will be looking at in this project is a type of prime number calculating algorithm known as a sieve. It will make strategic use of space to aid us in the solving of the problem instead of exclusively relying on time.
Of note, the “mainstream” prime number computing efforts typically make use of sieves.
Following will be the optimized algorithms I'd like you to implement for comparison with all the others.
For this project, please assume the following:
But, the conceptual level we'll be pursuing will be the space-based equivalent of primebrute/primebrk, that is, no fancy optimizations for odd numbers, nor the sqrt() trick, just straight up processing of the values that need to be processed.
Your next program, and first sieve, will be the Sieve of Eratosthenes. Perhaps among the best and likely longest-known sieves, its origins date from antiquity.
This sieve, instead of calculating to determine the eligibility of a prime, works in a manner of marking off patterns of values that cannot be prime (so, it is composite-focused in approach vs. prime-focused).
In order for it to work, we must store all the values we're processing so we can obtain what is left when done– what remains are the prime values.
Here is the wikipedia page: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
Here is an animated image of this sieve in action (from wikipedia):
This YouTube video may also be helpful:
A basic outline of the algorithm (again, from the wikipedia page):
To find all the prime numbers less than or equal to a given integer n by Eratosthenes' method:
When the algorithm terminates, the numbers remaining in the list that are not marked are the primes below n.
This is a space-constrained algorithm, therefore we will need a chunk of space to store these values. Think about what a lot of this looks like with respect to how you know how to organize data.
Taking the primesoe codebase, enhance it to do odds-only processing, ideally considering overall storage used.
Taking the primesoe codebase, enhance it to utilize the sqrt() (or sqrt()-less square root) trick.
Compared to the brute/brk algorithms from the previous projects, the performance differences between using sqrt() and approximating the square root should be minimal. You are free to utilize either approach in your implementation for this project.
NOTE: this variant is still to process all values (odd and even).
NOTE: if you are curious in comparing actual performance between sqrt() and approximated square root, primerun will recognize the following program names:
Taking the primesoe codebase, enhance it to do odds-only processing AND utilize the sqrt() (or sqrt()-less square root) trick.
Compared to the brute/brk algorithms from the previous projects, the performance differences between using sqrt() and approximating the square root should be minimal. You are free to utilize either approach in your implementation for this project.
NOTE: if you are curious in comparing actual performance between sqrt() and approximated square root for this odds-processing variant, primerun will recognize the following program names:
It is your task to write four Sieve of Eratosthenes-oriented prime number calculating programs:
Your programs should:
For those familiar with the grabit tool on lab46, I have made some skeleton files and a custom Makefile available for this project.
To “grab” it:
lab46:~/src/cprog$ grabit cprog pnc2 make: Entering directory '/var/public/SEMESTER/cprog/pnc2' ‘/var/public/SEMESTER/cprog/pnc2/Makefile’ -> ‘/home/USERNAME/src/cprog/pnc2/Makefile’ ‘/var/public/SEMESTER/cprog/pnc2/primesoe.c’ -> ‘/home/USERNAME/src/cprog/pnc2/primesoe.c’ ‘/var/public/SEMESTER/cprog/pnc2/primesoeodd.c’ -> ‘/home/USERNAME/src/cprog/pnc2/primesoeodd.c’ ‘/var/public/SEMESTER/cprog/pnc2/primesoesrt.c’ -> ‘/home/USERNAME/src/cprog/pnc2/primesoesrt.c’ ‘/var/public/SEMESTER/cprog/pnc2/primesoesrtodd.c’ -> ‘/home/USERNAME/src/cprog/pnc2/primesoesrtodd.c’ make: Leaving directory '/var/public/SEMESTER/cprog/pnc2' lab46:~/src/cprog$ cd pnc2 lab46:~/src/cprog/pnc2$ ls Makefile primesoe.c primesoeodd.c primesoesrt.c primesoesrtodd.c lab46:~/src/cprog/pnc2$
Furthermore, if your pnc2/ project directory is next to a pnc1/ and pnc0/ directory, each containing those project's specific prime variants, you can symlink them into the current project directory with a make link:
lab46:~/src/cprog/pnc2$ make link ‘./primebrute.c’ -> ‘../pnc0/primebrute.c’ ‘./primebrk.c’ -> ‘../pnc0/primebrk.c’ ‘./primebrkodd.c’ -> ‘../pnc1/primebrkodd.c’ ‘./primebrksrt.c’ -> ‘../pnc1/primebrksrt.c’ ‘./primebrkoddsrt.c’ -> ‘../pnc1/primebrkoddsrt.c’ ‘./primebrksrtopt.c’ -> ‘../pnc1/primebrksrtopt.c’ ‘./primebrkoddsrtopt.c’ -> ‘../pnc1/primebrkoddsrtopt.c’ lab46:~/src/cprog/pnc2$
And, of course, your basic compile and clean-up operations:
Just another “nice thing” we deserve.
NOTE: You do NOT want to do this on a populated pnc2/ project directory– it will overwrite files. Only do this on an empty directory.
To automate our comparisons, we will be making use of command-line arguments in our programs. As we have yet to really get into arrays, I will provide you same code that you can use that will allow you to utilize them for the purposes of this project.
We don't need any extra header files to use command-line arguments, but we will need an additional header file to use the atoi(3) function, which we'll use to quickly turn the command-line parameter into an integer, and that header file is stdlib.h, so be sure to include it with the others:
#include <stdio.h> #include <stdlib.h>
To accept (or rather, to gain access) to arguments given to your program at runtime, we need to specify two parameters to the main() function. While the names don't matter, the types do.. I like the traditional argc and argv names, although it is also common to see them abbreviated as ac and av.
Please declare your main() function as follows:
int main(int argc, char **argv)
The arguments are accessible via the argv array, in the order they were specified:
Although I'm not going to require extensive argument parsing or checking for this project, we should check to see if the minimal number of arguments has been provided:
if (argc < 2) // if less than 2 arguments have been provided { fprintf(stderr, "Not enough arguments!\n"); exit(1); }
Finally, we need to put the argument representing the maximum value into a variable.
I'd recommend declaring a variable of type int.
We will use the atoi(3) function to quickly convert the command-line arguments into int values:
max = atoi(argv[1]);
And now we can proceed with the rest of our prime implementation.
Being new to arrays, you may not know it yet, but there is an upper bound to how big you can declare an array at compile time.
That is, using the conventional array declaration:
data_type arrayname[size];
It is likely dependent on some combination of the execution environment (32-, 64-bit?) and the compiler, but in practice I've found things generally get unhappy if we try to request array sizes at some point beyond 1-4million elements (which makes sense, most processes in modern operating systems allocate a default stack size of 4-8MiB… if a short int is 2 bytes, how many consecutive short ints before you hit 4MiB?)
As we'll be needing array sizes considerably larger than this for the project (will we? If I could drop a hint: yes, we will), how do we get around this?
The answer: runtime memory allocation
We can allocate memory at run-time, which gives us the benefit of populating variables and performing equations to derive more specific values. This is also a rather powerful programming technique/ability to deploy in more complex programs, so keep it handy in your toolkits.
To do so, we ignore the fake facade that is the array, and treat it like it actually exists to the computer: a pointer.
So when you declare your array, instead of using the square bracket notation, do it as a pointer instead:
datatype *array_name = NULL;
THEN, later on in your program when it comes time to allocate the array, we use a memory allocation function, of which for our current purposes one of the following 2 could suit our needs:
So, if we wanted a 32768 element integer array, to show examples allocating with each function (you'd obviously only use one, whichever best fit your usage scenario):
int *array_name = NULL; // using malloc() array_name = (int *) malloc (sizeof(int) * 32768); // using calloc() array_name = (int *) calloc (32768, sizeof(int));
What else is going on here? What's with the parenthesis and the integer pointer?
That's a typecast, and is a way of telling the compiler to contort data IN THAT INSTANCE to a certain type (because it is a different type, other than what we want).
As it turns out, malloc() and calloc() both return raw memory, which in C is manifested as type void *; if our array is an integer pointer, we should be nice and cast it (apply the rules of an integer pointer to the raw memory, allowing the assignment to take place without the compiler whining– some compilers, depending on settings in place, may whine more than others).
With this, we can get around the compile time array limits, because by allocating memory at runtime, we're getting additional memory given to us, vs. trying to fit within the default initial allotment.
You should then be able to access it using regular array notation, or my preference: pointer arithmetic.
// C array syntax for accessing 3rd element of array: array_name[2] // C pointer arithmetic syntax for accessing 3rd element of array: *(array_name+2)
The nature of utilizing an array for the sieves, as you will likely discover quickly, is that the sheer quantity of elements needed increments drastically as the desired quantity goes up.
For example, if we were seeking the first 32 primes:
lab46:~/src/cprog/pnc2$ ./primebrk 32 1 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131
… we would see that the 32nd prime is 131. For the Sieve of Eratosthenes to successfully work in this case, we'd need an array size of at least 132 (if we do the standard mapping of array element to actual number, where element 0 would correspond with the number 0, and element 131 would correspond with number 131; 0-131 = 132 elements).
But that number isn't immediately known (indeed, the nature of the distribution of prime numbers is the subject of considerable mathematical exploration, with many various proofs demonstrating different aspects of their distribution– don't believe me? Just google “distribution of primes” and read up for yourself). And short of some impressive looking and very calculus-requiring integral, there doesn't (at least from some of my searches) seem to be a decent algebraic expression we can use to come close.
And without this knowledge of the largest prime in the desired quantity, our sieve implementation (with the requirements as I have specified) is sort of at a standstill.
So how do we get around this? I propose a combination of accepting knowledge from a more aware source (me), and a little algebraic ingenuity. Let's hack ourselves an array size equation! (Do I hear a “heck yeah!” ? “HECK YEAH!”)
First, you can actually produce a lot of this data, from your pnc1 programs (technically pnc0 as well, but you'll be waiting a whole heck of a lot longer).
Let's take one of the purportedly best-running programs from pnc1 (primebrksrtoptodd) and contort it into giving us the information that we need.
With a little bit of UNIX shell magic (again, I'm not expecting you to know this, so I'm just giving it to you, but from the abstract/conceptual level, you should be able to recognize things here like loops and output statements, and with some informed knowledge of shell syntax, see variables as well), we have the following:
# clear the screen clear # loop for a powers of 2 progression 32-4194304 (inclusive) for((qty=32; qty<=4194304; qty*=2)); do # display the quantity lead-in information printf "[%7s] %7sth prime: " "${qty}" "${qty}" # obtain the last/largest prime in the specified quantity value=$(./primebrksrtoptodd ${qty} 1 2>/dev/null | tr ' ' '\n' | tail -2 | head -1) # display it, along with calculating its approximate multiplier printf "%8s (x%s)\n" "${value}" $(echo "(${value}/${qty})+1" | bc -q) done
We start with qty of 32, as that is the default starting point of primerun (so why not; makes sense to work with what we've got). But there's still an unknown: the MAXIMUM quantity we'll need to deal with, especially in relation to primerun (focus on primerun execution, not to infinity). This is less available to you, short of just guessing some possible maximum.
I'm going to drop a hint: you'll very likely not be going beyond a quantity of 4194304 (at least with a primerun cut-off limit of 2 seconds), so use THAT as your upper bound of quantity, and that is why I use that number in the shell script given above.
You don't have to worry about running said shell script, as the output is right here:
[ 32] 32th prime: 131 (x5) [ 64] 64th prime: 311 (x5) [ 128] 128th prime: 719 (x6) [ 256] 256th prime: 1619 (x7) [ 512] 512th prime: 3671 (x8) [ 1024] 1024th prime: 8161 (x8) [ 2048] 2048th prime: 17863 (x9) [ 4096] 4096th prime: 38873 (x10) [ 8192] 8192th prime: 84017 (x11) [ 16384] 16384th prime: 180503 (x12) [ 32768] 32768th prime: 386093 (x12) [ 65536] 65536th prime: 821641 (x13) [ 131072] 131072th prime: 1742537 (x14) [ 262144] 262144th prime: 3681131 (x15) [ 524288] 524288th prime: 7754077 (x15) [1048576] 1048576th prime: 16290047 (x16) [2097152] 2097152th prime: 34136029 (x17) [4194304] 4194304th prime: 71378569 (x18)
What we have is the following:
So from the data given above, we can see that a rough approximation of space needed for 32 primes would be 32*5 or 160 elements (131 falls nicely into 160).
For 1024 we have a multiplier of 8 (1024*8 = 8192, which 8161 safely fits into).
Okay, so we've got some raw numbers, we know the array size needed for the smallest quantity (32*5), we also know the array size needed for the likely largest quantity (4194304*18). So how do we handle the intermediaries?
As you can see, there is no a clean progression at play here, so no simple patterns we can play off of. I offer a few suggestions (and for the purposes of this project, any are acceptable):
For example, we see that since our max multiplier is 18, what if we did some factoring of 18? Say: 6, 12, 18?
We could rig up something like this:
if (qty <= 128) array_size = qty * 6; else if (qty <= 32768) array_size = qty * 12; else array_size = qty * 18;
In general, though, this is just a mild annoyance in our quest to implement the sieve of Eratosthenes, so if you just hard code an 18, while certainly dirty, will get the job done for our current purposes (I will most certainly NOT hold it against you).
Make sense? Hopefully. And moreso, if you've been able to follow this, it is also indicative of your general level of comprehension: if you understand the overall problem, the specific issue, and can interact conceptually, that makes you a better problem solver/Computer Scientist/programmer.
And please, ASK QUESTIONS if something doesn't make sense.
Often times, when checking the efficiency of a solution, a good measurement (especially for comparison), is to time how long the processing takes.
In order to do that in our prime number programs, we are going to use C library functions that obtain the current time, and use it as a stopwatch: we'll grab the time just before starting processing, and then once more when done. The total time will then be the difference between the two (end_time - start_time).
We are going to use the gettimeofday(2) function to aid us in this, and to use it, we'll need to do the following:
In order to use the gettimeofday(2) function in our program, we'll need to include the sys/time.h header file, so be sure to add it in with the existing ones:
#include <stdio.h> #include <stdlib.h> #include <sys/time.h>
gettimeofday(2) uses a struct timeval data type, of which we'll need to declare two variables in our programs (one for storing the starting time, and the other for the ending time).
Please declare these with your other variables, up at the top of main() (but still WITHIN main()– you do not need to declare global variables).
struct timeval time_start; // starting time struct timeval time_end; // ending time
To use gettimeofday(2), we merely place it at the point in our code we wish to take the time.
For our prime number programs, you'll want to grab the start time AFTER you've declared variables and processed arguments, but JUST BEFORE starting the driving loop doing the processing.
That call will look something like this:
gettimeofday(&time_start, 0);
The ending time should be taken immediately after all processing (and prime number output) is completed, and right before we display the timing information to STDERR:
gettimeofday(&time_end, 0);
Once we having the starting and ending times, we can display this to STDERR. You'll want this line:
fprintf(stderr, "%10.6lf\n", time_end.tv_sec - time_start.tv_sec + ((time_end.tv_usec - time_start.tv_usec) / 1000000.0));
For clarity sake, that format specifier is “%10.6lf”, where the “lf” is “long float”, that is NOT a number one but a lowercase letter 'ell'.
And with that, we can compute an approximate run-time of our programs. The timing won't necessarily be accurate down to that level of precision, but it will be informative enough for our purposes.
Your program output should be as follows (given the specified range):
lab46:~/src/cprog/pnc2$ ./primesoe 32 1 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 0.000088 lab46:~/src/cprog/pnc2$
The execution of the programs is short and simple- grab the parameters, do the processing, produce the output, and then terminate.
Based on specifications, it should also be able to do the following (override quantity with lower/upper bounds):
lab46:~/src/cprog/pnc2$ ./primesoe 48 1 4 18 5 7 11 13 17 0.000080 lab46:~/src/cprog/pnc2$
And (shift the lower bound, but still count out 32 primes):
lab46:~/src/cprog/pnc2$ ./primesoe 32 1 16 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 0.000087 lab46:~/src/cprog/pnc2$
And of course the same for the 3 variants, and the same error message reporting if invalid values are given.
You may notice a change with the sieves as compared to the other algorithms you've implemented with respect to performance- there will likely be a lower bound of performance, ie you have to exceed a certain threshold before the algorithm truly enters its power band, and then it may truly be a step-up in terms of performance.
To verify your program's correctness and view your implementation's performance, primerun will recognize the indicated program names listed above.
I'll hold off from posting actual primerun output; I want you to see the results without any expectations. Do know that the arrangement of programs (from left to right) is in order of worst to best performance based on my implementations.
If/when the runtime of a particular prime variant exceeds an upper threshold (likely to be set at 2 seconds), it will be omitted from further tests, and a series of dashes will instead appear in the output.
If you don't feel like waiting, simply hit CTRL-c and the script will terminate.
If the check is successful, you will see “OK” displayed beneath in the appropriate column; if unsuccessful, you will see “MISMATCH”.
To successfully complete this project, the following criteria must be met:
To submit this program to me using the submit tool, run the following command at your lab46 prompt:
$ submit cprog pnc2 primesoe.c primesoeodd.c primesoesrt.c primesoesrtodd.c Submitting cprog project "pnc2": -> primesoe.c(OK) -> primesoeodd.c(OK) -> primesoesrt.c(OK) -> primesoesrtodd.c(OK) SUCCESSFULLY SUBMITTED
You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and or locational mismatches.
What I will be looking for:
78:pnc2:final tally of results (78/0) *:pnc2:submit all programs correctly perform argument checking [2/2] *:pnc2:primesoe.c no negative compiler messages [2/2] *:pnc2:primesoe.c implements only specified algorithm [8/8] *:pnc2:primesoe.c adequate indentation and comments [3/3] *:pnc2:primesoe.c output conforms to specifications [3/3] *:pnc2:primesoe.c primerun runtime tests succeed [3/3] *:pnc2:primesoeodd.c no negative compiler messages [2/2] *:pnc2:primesoeodd.c implements only specified algorithm [8/8] *:pnc2:primesoeodd.c adequate indentation and comments [3/3] *:pnc2:primesoeodd.c output conforms to specifications [3/3] *:pnc2:primesoeodd.c primerun runtime tests succeed [3/3] *:pnc2:primesoesrt.c no negative compiler messages [2/2] *:pnc2:primesoesrt.c implements only specified algorithm [8/8] *:pnc2:primesoesrt.c adequate indentation and comments [3/3] *:pnc2:primesoesrt.c output conforms to specifications [3/3] *:pnc2:primesoesrt.c primerun runtime tests succeed [3/3] *:pnc2:primesoesrtodd.c no negative compiler messages [2/2] *:pnc2:primesoesrtodd.c implements only specified algorithm [8/8] *:pnc2:primesoesrtodd.c adequate indentation and comments [3/3] *:pnc2:primesoesrtodd.c output conforms to specifications [3/3] *:pnc2:primesoesrtodd.c primerun runtime tests succeed [3/3]