Corning Community College

CSCS2330 Discrete Structures

Project: ALGORITHMS - PRIME NUMBER CALCULATION (pnc1)

Errata

With any increasingly complex piece of code or environment, we must find effective means of organizing various processes or communications. This “Errata” section and related updates are one such instance of that; intended as a means of disseminating information on changes to the project, you can be informed what modifications have taken place, along with any unique actions you need to take to compensate.

Any typos, bugs, or other updates/changes to the project will be documented here.

Revision List

  • revision #: <description> (DATESTRING)

Some changes may involve updates being made available to the project, in which case you'll be prompted with such notification and can run the available updating commands to synchronize your copy of the project with the changes.

Objective

To continue our exploration of algorithms and optimizations applied to them, through further work on the prime number computation programs started in pnc0.

Background

In mathematics, a prime number is a whole number greater than 1 that is evenly divisible only by 1 and itself; it has just that one pair of factors, no others. Numbers greater than 1 that have additional factors are classified as composite numbers.

The number 6 is a composite number, as in addition to 1 and 6, it also has the factors of 2 and 3.

The number 17, however, is a prime number, as no numbers other than 1 and 17 can be evenly divided into it.

Calculating the primality of a number

As of yet, there is no quick and direct way of determining the primality of a given number. Instead, we must perform a series of tests to determine if it fails primality (typically by proving it is composite).

This process incurs a considerable amount of processing overhead, so much so that increasingly large values take ever-expanding amounts of time. Approaches to prime number calculation therefore often involve different algorithms, each offering benefits (less time) and drawbacks (more complex code).

Your task for this project is to implement a prime number program using the straightforward, unoptimized brute-force algorithm, which determines the primality of a number in a “trial by division” approach.

Main algorithm: brute force (primereg)

The brute force approach is the simplest to implement (although at some cost).

As we will be looking to do some time/performance analysis and comparisons, it is often good to have a baseline. This program will be it.

To perform the process of computing the primality of a number, we simply attempt to evenly divide all the values between 2 and one less than the number in question. If any one of them divides evenly, the number is NOT prime, but instead composite.

Checking the remainder of a division indicates whether or not a division was clean (having 0 remainder indicates such a state).

For example, the number 11:

11 % 2 = 1 (2 is not a factor of 11)
11 % 3 = 2 (3 is not a factor of 11)
11 % 4 = 3 (4 is not a factor of 11)
11 % 5 = 1 (5 is not a factor of 11)
11 % 6 = 5 (6 is not a factor of 11)
11 % 7 = 4 (7 is not a factor of 11)
11 % 8 = 3 (8 is not a factor of 11)
11 % 9 = 2 (9 is not a factor of 11)
11 % 10 = 1 (10 is not a factor of 11)

Because none of the values 2-10 evenly divided into 11, we can say it passed the test: 11 is a prime number.

On the other hand, take 119:

119 % 2 = 1 (2 is not a factor of 119)
119 % 3 = 2 (3 is not a factor of 119)
119 % 4 = 3 (4 is not a factor of 119)
119 % 5 = 4 (5 is not a factor of 119)
119 % 6 = 5 (6 is not a factor of 119)
119 % 7 = 0 (7 is a factor of 119)
119 % 8 = 7
119 % 9 = 2
119 % 10 = 9
119 % 11 = 9
119 % 12 = 11
119 % 13 = 2
...

Because, during our range of testing every value from 2-118, we find that 7 evenly divides into 119, it failed the test: 119 is not prime, but is instead a composite number.

Please NOTE: Even once a number is identified as composite, your primereg MUST CONTINUE evaluating the remainder of the values (up to 119-1, or 118). It might seem pointless (and it is for a production program), but I want you to see the performance implications this creates.
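The trial-by-division test described above can be sketched for a single number as follows. This is only an illustrative sketch, not the required program structure: the function name is_prime_baseline is hypothetical, and the result is recorded in a flag so that the loop keeps running after a factor is found.

```c
/* Hypothetical helper (not part of the project skeleton):
 * returns 1 if number is prime, 0 otherwise.
 * Baseline behavior: every factor from 2 to number-1 is tested,
 * even after a factor has already been found (no early exit). */
int is_prime_baseline(int number)
{
    int factor = 2;
    int prime  = (number >= 2);   /* assume prime until proven composite */

    while (factor < number)
    {
        if ((number % factor) == 0)
        {
            prime = 0;            /* record the result, but keep looping */
        }
        factor = factor + 1;
    }

    return (prime);
}
```

For 119 this performs all 117 divisions (factors 2 through 118), even though the outcome is settled once 7 is tested.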

algorithm

Some things to keep in mind on your implementation:

  • you will want to use loops (no less than 2, no more than 2) for this program.
    • a nested loop makes the most sense:
      • an outer loop that drives the progression of each sequential number to be tested
      • an inner loop that tests that current number to see if it has any factors
  • you know the starting value and the terminating condition, so you have a clear starting and ending point to work with.
  • I want you to use two DIFFERENT kinds of loops in your programs. If you use a for() loop in your outer loop, I want you to use a while() or do-while() loop in your inner loop (or whatever combination you end up with).
  • I do NOT want to see ambiguous, one-letter variables used in your implementation(s). Please use meaningful variable names.
    • Some good examples of variable names would be:
      • number: the number being tested
      • factor: the value being divided into number to test for primality
      • step: the rate by which some variable is changing
      • qty: the count of the current tally of primes
      • max: the maximum count we seek
      • start: a value we are starting at
      • lower: a lower bound
      • upper: an upper bound
      • see how much more readable and meaningful these are, especially as compared to a, i, n, x? You may even find it helps with debugging and understanding your code better.
  • let the loops drive the overall process. Identify prime/composite status separate from loop terminating conditions.
    • and remember, the baseline brute force algorithm (primereg) may well identify a value as composite, but won't terminate the loop.
  • your timing should start before the loop (just AFTER argument processing), and terminate immediately following the terminating output newline outside the loops.
  • you may NOT split qty and range functionality into two separate code blocks (ie have two sets of two loops). Only the one set as indicated.
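One possible shape for the single set of nested loops is sketched below. It is a sketch under stated assumptions rather than a reference solution: the function name show_primes is hypothetical, and treating an upper value of 0 as "no upper bound" is a convention invented for this example. Note how prime/composite status lives in its own variable, separate from the loop conditions, and how quantity and range are handled by the same outer loop.

```c
#include <stdio.h>

/* Hypothetical driver: displays primes space-separated on stdout and
 * returns how many were found.
 *   max   == 0 disables the quantity check
 *   upper == 0 disables the upper bound (assumption for this sketch)
 * The caller must enable at least one of the two limits. */
int show_primes(int max, int lower, int upper)
{
    int number = 0;   /* the number being tested      */
    int factor = 0;   /* value divided into number    */
    int prime  = 0;   /* 1 while number looks prime   */
    int qty    = 0;   /* tally of primes found so far */

    for (number = lower;
         ((max == 0) || (qty < max)) && ((upper == 0) || (number <= upper));
         number = number + 1)
    {
        prime  = 1;
        factor = 2;
        while (factor < number)       /* baseline: no early exit */
        {
            if ((number % factor) == 0)
            {
                prime = 0;
            }
            factor = factor + 1;
        }

        if (prime == 1)
        {
            fprintf(stdout, "%d ", number);
            qty = qty + 1;
        }
    }
    fprintf(stdout, "\n");

    return (qty);
}
```

Calling show_primes(10, 2, 10) stops at the upper bound after four primes, while show_primes(5, 2, 0) stops once the fifth prime has been displayed.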

prime algorithm optimizations

To give us a decent appreciation of the subtleties of algorithm development in a theme of programs, I have identified the following optimizations that we will be implementing.

For simplicity, I have encoded this information in the file name (and therefore resulting executable/binary) that will correspond to the indicated algorithm+optimizations.

To break it down, all prime programs will be of the form:

  • primeALG[O…]
    • where each and every program starts with “prime”
    • is immediately followed by a 3-letter (lowercase) abbreviation of the algorithm to be implemented (reg, for instance)
    • and then is followed by 0 or more layered attributes describing the particular optimization that is applied (again, if any: zero or more).

The optimizations we will be implementing in this project (and their naming values) include:

  • break on composite (b) - once a tested number is proven composite, there is no need to continue processing: break out of the factor loop and proceed to the next number
  • mapping factors of 6 (m) - it turns out that, aside from the initial primes of 2 and 3, all prime numbers fall +1 or -1 off a multiple of six (the general form is 6a+/-1). This optimization will utilize this property, only testing numbers +/-1 off of multiples of 6 (how might this impact overall processing?)
  • odds-only checking (o) - aside from 2, all other prime numbers are odd. Therefore, there is zero need to perform a composite check on an even number, allowing us to focus exclusively on odd values (luckily, they seem to occur in a predictable pattern).
  • sqrt() trick (s) - mathematically it has been shown that if a number has a factor pair other than 1 and itself, at least one member of that pair is less than or equal to the square root of the number being tested, so there is no need to test factors beyond that point.
  • sqrt()-less square root approximation (a) - sqrt(), a function in the math library, does an industrial strength square root calculation. We don't need that, merely a whole integer value corresponding to the approximate square root. Here we will implement our own logic to approximate square root, hopefully with a considerable performance impact.

Your program should implement only the algorithm and optimization(s) encoded in its name; anything not specified in the name should be left out.

That is, if your program to implement is primerego, that means you are ONLY to implement the brute force algorithm and odds-only checking. NO break on composite, NO sqrt() trick, etc. We are establishing separate data points for analytical comparison.

Some of these optimizations can co-exist easily (break + map, odd + sqrt()), others are partially compatible (map + odd can coexist in a certain form), while others are mutually exclusive (sqrt() and approximated square root conflict). So there are definitely a few combinations that are possible using this scheme.

Here are the variants you'll be implementing for this project:

break on composite (primeregb)

This optimization to primereg will make but one algorithmic change, and that takes place at the moment of identifying a number as composite. So, in our 119 example above, once we discovered that 7 was a factor (119 % 7 = 0), we would stop testing 119 right there.

There is no further need to check the remaining values, as once we have proven the non-primality of a number, the state is set: it is composite. So be sure to use a break statement to terminate the computation loop (how does this impact overall performance???).

Make no other optimizations- this first project is to set up some important base line values that we can use for algorithmic comparison later on.
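To make the effect of the break visible, here is a small sketch of the factor loop that also counts how many divisions were performed. The function name is_prime_break and the tests counter are hypothetical additions for illustration; your own program does not need them.

```c
/* Hypothetical helper: returns 1 if number is prime, 0 otherwise,
 * and stores the number of divisions performed in *tests. */
int is_prime_break(int number, int *tests)
{
    int factor = 0;
    int prime  = (number >= 2);

    *tests = 0;
    for (factor = 2; factor < number; factor = factor + 1)
    {
        *tests = *tests + 1;
        if ((number % factor) == 0)
        {
            prime = 0;
            /* justified break: once a factor is found the number is
             * composite, and no later division can change that. */
            break;
        }
    }

    return (prime);
}
```

For 119, the loop stops after 6 divisions (factors 2 through 7) instead of the 117 that the baseline performs; for the prime 11 nothing is saved, since all 9 divisions are still needed.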

mapping factors of 6 (primeregm)

This optimization will check only the numbers that fall on either side of a multiple of 6 (that is, numbers of the form 6a-1 and 6a+1) for primality.

NOTE: If applicable, just display the initial “2” and “3” as hardcoded values.
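One way to visit only the 6a-1 and 6a+1 candidates is to start at 5 and alternate the step between 2 and 4, as in the sketch below. This is just one possible traversal, assuming a simple range run from 2 up to upper; the function name show_primes_mapped is hypothetical.

```c
#include <stdio.h>

/* Hypothetical sketch: displays the primes from 2 through upper, testing
 * only candidates on either side of a multiple of 6, and returns the count. */
int show_primes_mapped(int upper)
{
    int number = 5;   /* first candidate: 6*1 - 1        */
    int step   = 2;   /* alternates 2, 4, 2, 4, ...      */
    int factor = 0;
    int prime  = 0;
    int qty    = 0;

    if (upper >= 2)   /* 2 and 3 are displayed as hardcoded values */
    {
        fprintf(stdout, "2 ");
        qty = qty + 1;
    }
    if (upper >= 3)
    {
        fprintf(stdout, "3 ");
        qty = qty + 1;
    }

    while (number <= upper)
    {
        prime = 1;
        for (factor = 2; factor < number; factor = factor + 1)
        {
            if ((number % factor) == 0)
            {
                prime = 0;
            }
        }
        if (prime == 1)
        {
            fprintf(stdout, "%d ", number);
            qty = qty + 1;
        }
        number = number + step;   /* 5, 7, 11, 13, 17, 19, 23, 25, ... */
        step   = 6 - step;
    }
    fprintf(stdout, "\n");

    return (qty);
}
```

Only two of every six numbers are tested this way, so the outer loop does roughly a third of the work of the baseline traversal.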

odds-only checking (primerego)

This optimization will check only the odd numbers for primality, skipping the evens entirely.

NOTE: If applicable, just display the initial “2” as a hardcoded value.
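A sketch of an odds-only traversal over a range follows. The function name show_primes_odds is hypothetical, and the handling of an even lower bound (bumping it to the next odd value) is one possible choice, not something the project mandates.

```c
#include <stdio.h>

/* Hypothetical sketch: displays the primes from lower through upper,
 * testing only odd candidates, and returns the count. */
int show_primes_odds(int lower, int upper)
{
    int number = 0;
    int factor = 0;
    int prime  = 0;
    int qty    = 0;

    if ((lower <= 2) && (upper >= 2))   /* 2 is the only even prime */
    {
        fprintf(stdout, "2 ");
        qty = qty + 1;
    }

    number = (lower < 3) ? 3 : lower;   /* first candidate is at least 3  */
    if ((number % 2) == 0)
    {
        number = number + 1;            /* even start: move to next odd   */
    }

    for (; number <= upper; number = number + 2)
    {
        prime  = 1;
        factor = 2;
        while (factor < number)
        {
            if ((number % factor) == 0)
            {
                prime = 0;
            }
            factor = factor + 1;
        }
        if (prime == 1)
        {
            fprintf(stdout, "%d ", number);
            qty = qty + 1;
        }
    }
    fprintf(stdout, "\n");

    return (qty);
}
```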

sqrt() trick (primeregs)

This optimization employs the square root trick utilizing the C library's sqrt() function.

sqrt()-less square root approximation (primerega)

This optimization employs the approximated square root trick (NOT utilizing an existing square root function, but using simpler logic you implement to approximate the square root point).

Further explanation

An optimization to the previous process, which used sqrt(), this variation will do the exact same thing, but without using the sqrt() function. It will approximate the square root.

We know that a whole-number square root occurs when a number is the square of a whole number (36 is 6 squared, for example). But if we consider only the whole-number part of the square root, we start seeing runs of values that share the same whole-number square root:

lab46:~$ count=0; for ((i=2; i<152; i++)); do printf "[%3d] %2d " "${i}" `echo "sqrt($i)" | bc -q`; let count=count+1; if [ "${count}" -eq 10 ]; then echo; count=0; fi; done; echo
[  2]  1 [  3]  1 [  4]  2 [  5]  2 [  6]  2 [  7]  2 [  8]  2 [  9]  3 [ 10]  3 [ 11]  3
[ 12]  3 [ 13]  3 [ 14]  3 [ 15]  3 [ 16]  4 [ 17]  4 [ 18]  4 [ 19]  4 [ 20]  4 [ 21]  4
[ 22]  4 [ 23]  4 [ 24]  4 [ 25]  5 [ 26]  5 [ 27]  5 [ 28]  5 [ 29]  5 [ 30]  5 [ 31]  5
[ 32]  5 [ 33]  5 [ 34]  5 [ 35]  5 [ 36]  6 [ 37]  6 [ 38]  6 [ 39]  6 [ 40]  6 [ 41]  6
[ 42]  6 [ 43]  6 [ 44]  6 [ 45]  6 [ 46]  6 [ 47]  6 [ 48]  6 [ 49]  7 [ 50]  7 [ 51]  7
[ 52]  7 [ 53]  7 [ 54]  7 [ 55]  7 [ 56]  7 [ 57]  7 [ 58]  7 [ 59]  7 [ 60]  7 [ 61]  7
[ 62]  7 [ 63]  7 [ 64]  8 [ 65]  8 [ 66]  8 [ 67]  8 [ 68]  8 [ 69]  8 [ 70]  8 [ 71]  8
[ 72]  8 [ 73]  8 [ 74]  8 [ 75]  8 [ 76]  8 [ 77]  8 [ 78]  8 [ 79]  8 [ 80]  8 [ 81]  9
[ 82]  9 [ 83]  9 [ 84]  9 [ 85]  9 [ 86]  9 [ 87]  9 [ 88]  9 [ 89]  9 [ 90]  9 [ 91]  9
[ 92]  9 [ 93]  9 [ 94]  9 [ 95]  9 [ 96]  9 [ 97]  9 [ 98]  9 [ 99]  9 [100] 10 [101] 10
[102] 10 [103] 10 [104] 10 [105] 10 [106] 10 [107] 10 [108] 10 [109] 10 [110] 10 [111] 10
[112] 10 [113] 10 [114] 10 [115] 10 [116] 10 [117] 10 [118] 10 [119] 10 [120] 10 [121] 11
[122] 11 [123] 11 [124] 11 [125] 11 [126] 11 [127] 11 [128] 11 [129] 11 [130] 11 [131] 11
[132] 11 [133] 11 [134] 11 [135] 11 [136] 11 [137] 11 [138] 11 [139] 11 [140] 11 [141] 11
[142] 11 [143] 11 [144] 12 [145] 12 [146] 12 [147] 12 [148] 12 [149] 12 [150] 12 [151] 12

Or, if perhaps we instead order by square root value:

lab46:~$ oldsqrt=$(echo "sqrt(2)" | bc -q); for ((i=2; i<49; i++)); do newsqrt=$(echo "sqrt($i)" | bc -q); if [ "${newsqrt}" -ne "${oldsqrt}" ]; then echo; fi; printf "[%3d] %2d " "${i}" "${newsqrt}"; oldsqrt="${newsqrt}"; done; echo
[  2]  1 [  3]  1
[  4]  2 [  5]  2 [  6]  2 [  7]  2 [  8]  2
[  9]  3 [ 10]  3 [ 11]  3 [ 12]  3 [ 13]  3 [ 14]  3 [ 15]  3
[ 16]  4 [ 17]  4 [ 18]  4 [ 19]  4 [ 20]  4 [ 21]  4 [ 22]  4 [ 23]  4 [ 24]  4
[ 25]  5 [ 26]  5 [ 27]  5 [ 28]  5 [ 29]  5 [ 30]  5 [ 31]  5 [ 32]  5 [ 33]  5 [ 34]  5 [ 35]  5
[ 36]  6 [ 37]  6 [ 38]  6 [ 39]  6 [ 40]  6 [ 41]  6 [ 42]  6 [ 43]  6 [ 44]  6 [ 45]  6 [ 46]  6 [ 47]  6 [ 48]  6

We see that the square root of 36 is 6, but so is the square root of 37, 38, 39… etc. up until we hit 49 (where the whole number square root increments to 7).

Therefore, if we were checking whether 42 is prime, we'd only have to check factors up to 6.

We don't need a sqrt() function to tell us this, we can determine the approximate square root point ourselves- by squaring the current factor being tested, and so long as it hasn't exceeded the value we're checking, we know to continue.

There are some important lessons at play here:

  • approximation can be powerful
  • approximation can result in a simpler algorithm, improving runtime
    • sqrt() is more complex than you may be aware, not to mention it is in a function. By avoiding that function call, we eliminate some overhead, and that can make a difference in runtime performance.

Depending on how you implement this and the original sqrt() algorithms, this version may have a noticeable performance difference. If, on the other hand, you were really optimal in both implementations, the performance difference may be narrower (if not negligible).
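The "square the factor" idea can be sketched as a loop condition like the one below. The function name is_prime_approx and the tests counter are hypothetical, added only so the number of divisions can be observed; the sketch also assumes number is small enough that factor * factor does not overflow an int.

```c
/* Hypothetical helper: returns 1 if number is prime, 0 otherwise,
 * and stores the number of divisions performed in *tests.
 * No sqrt() call: we continue only while factor squared has not
 * exceeded the number being tested. */
int is_prime_approx(int number, int *tests)
{
    int factor = 2;
    int prime  = (number >= 2);

    *tests = 0;
    while ((factor * factor) <= number)
    {
        *tests = *tests + 1;
        if ((number % factor) == 0)
        {
            prime = 0;   /* no break here: only the approximation is applied */
        }
        factor = factor + 1;
    }

    return (prime);
}
```

For 42 this tests the factors 2 through 6 (5 divisions), matching the table above, and for 119 it tests 2 through 10 (9 divisions) rather than 2 through 118.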

primeregbm

To get a taste for combining optimizations, you'll also implement a variant that incorporates both the break AND the map optimizations.

NOTE: If applicable, just display the initial “2” and “3” as hardcoded values.

primeregbo

To get a taste for combining optimizations, you'll also implement a variant that incorporates both the break AND the odds-only checking optimizations.

NOTE: If applicable, just display the initial “2” as a hardcoded value.

primeregbs

To get a taste for combining optimizations, you'll also implement a variant that incorporates both the break AND the sqrt() optimizations.

primeregba

To get a taste for combining optimizations, you'll also implement a variant that incorporates both the break AND the approximated square root optimizations.

Programs

It is your task to write the following prime number variants:

  • the remainder of the viable double optimization combinations:
    • primeregmo.c: map + odd traversal optimizations
    • primeregms.c: map traversal + sqrt() trick
    • primeregma.c: map traversal + approximated square root trick
    • primeregos.c: odd traversal + sqrt() trick
    • primeregoa.c: odd traversal + approximated square root trick
  • all of the viable triple optimization combinations:
    • primeregbmo.c: break + map + odd traversal
    • primeregbms.c: break + map + sqrt() trick
    • primeregbma.c: break + map + approximated square root trick
    • primeregbos.c: break + odd + sqrt() trick
    • primeregboa.c: break + odd + approximated square root trick
    • primeregmos.c: map + odd traversal + sqrt() trick
    • primeregmoa.c: map + odd traversal + approximated square root trick
  • all of the viable quadruple optimization combinations:
    • primeregbmos.c: break + map + odd + sqrt() trick
    • primeregbmoa.c: break + map + odd + approximated square root trick

Program Specifications

Your program should:

  • obtain 2-4 parameters from the command-line (see command-line arguments section below).
    • check to make sure the user indeed supplied enough parameters, and exit with an error message if not.
    • argv[1]: maximum quantity of primes to calculate (your program should run until it discovers that many primes).
      • this value should be an integer value, greater than or equal to 0.
        • if argv[1] is 0, disable the quantity check, and rely on provided lower and upper bounds (up to argv[4] would be required in this case).
    • argv[2]: reserved for future compatibility; for now, require and expect it to be 1.
    • argv[3]: conditionally optional lower bound (starting value). Most of the time, this will probably be 2, but should be a positive integer greater than or equal to 2. This defines where your program will start its prime quantity check from.
      • if omitted, assume a lower bound of 2.
      • if you desire to specify an upper bound (argv[4]), you obviously MUST provide the lower bound argument under this scheme.
    • argv[4]: conditionally optional upper bound (ending value). If provided, this is the ending value you'd like to check to.
      • If doing a quantity run (argv[1] is NOT 0), this value isn't necessary.
      • If doing a quantity run AND you specify an upper bound, whichever condition is achieved first dictates program termination. That is, upper bound could override quantity (if it is achieved before quantity), and quantity can override the upper bound (if it is achieved before reaching the specified upper bound).
    • for each argument: you should do a basic check to ensure the user complied with this specification, and exit with a unique error message (displayed to STDERR) otherwise:
      • for insufficient quantity of arguments, display: PROGRAM_NAME: insufficient number of arguments!
      • for invalid argv[1], display: PROGRAM_NAME: invalid quantity!
      • for invalid argv[2], display: PROGRAM_NAME: invalid value!
      • for invalid argv[3], display: PROGRAM_NAME: invalid lower bound!
        • if argv[3] is not needed, ignore (no error displayed nor forced exit, as it is acceptable defined behavior).
      • for invalid argv[4], display: PROGRAM_NAME: invalid upper bound!
        • if argv[4] is not needed, ignore (no error displayed nor forced exit, as it is acceptable defined behavior).
      • In these error messages, PROGRAM_NAME is the name of the program being run; this can be accessed as a string stored in argv[0].
  • implement ONLY the algorithm and optimization(s) specified in the program name. We are producing multiple data points for a broader performance comparison.
  • please take note on differences in run-time, contemplating the impact the algorithm and optimization(s) have on performance (timing, specifically).
  • immediately after argument processing: start your stopwatch (see timing section below).
  • perform the correct algorithm and optimization(s) against the command-line input(s) given.
    • each program is to have no fewer and no more than 2 loops in this prime processing section.
    • in each program, you are not allowed to use a given loop type (for(), while(), do-while()) more than once!
  • display identified primes (space-separated) to a file pointer called stdout
  • stop your stopwatch immediately following your prime processing loops (and terminating newline display to stdout). Calculate the time that has transpired (ending time minus starting time).
  • output the processing run-time to the file pointer called stderr
  • your output MUST conform to the example output in the execution section below. This is also a test to see how well you can implement to specifications. Basically:
    • as primes are being displayed, they are space-separated (first prime hugs the left margin), and when all said and done, a newline is issued.
    • the timing information will be displayed in accordance with code I will provide below (see the Timing section).

Implementation Restrictions

As our goal is not only to explore the more subtle concepts of computing but to promote different methods of thinking (and arriving at solutions seemingly in different ways), one of the themes I have been harping on is the stricter adherence to the structured programming philosophy. It isn't just good enough to be able to crank out a solution if you remain blind to the many nuances of the tools we are using, so we will at times be going out of our way to emphasize focus on certain areas that may see less exposure (or avoidance due to it being less familiar).

As such, the following implementation restrictions are also in place:

  • use any break or continue, or other flow redirection statements sparingly. I am not forbidding their use, but I also don't want this to turn into a lazy solution free-for-all. I am letting you use them, but with justification.
    • justification implies some thoughtful why/how style comments explaining how a particular use of one of these statements is effective and efficient (not: “I couldn't think of any other way to do it”).
  • absolutely NO infinite loops (while(1) or the like).
  • no forced redirection of the flow of the process (no seeking to the end of the file to grab a max size only to zip back somewhere else: deal with the data in as you are naturally encountering it; no telling; no “ungetting” data back into the file).
  • All “arrays” must be declared and referenced using ONLY pointer notation, NO square brackets.
  • NO logic shunts (ie having an if statement nested inside a loop to bypass an undesirable iteration)- this should be handled by the loop condition!
  • at most, only one return() statement per function. Error terminations should use exit()
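As a reminder of what "pointer notation only" looks like in practice, here is a small sketch that walks a block of integers without any square brackets. The function name sum_values is hypothetical and unrelated to the prime programs themselves; the same pattern applies to argv, where *(argv + 1) refers to the same string as argv[1].

```c
/* Hypothetical helper: totals the first count integers stored at values,
 * using only pointer arithmetic and dereferencing (no square brackets). */
int sum_values(int *values, int count)
{
    int index = 0;
    int total = 0;

    while (index < count)
    {
        total = total + *(values + index);   /* same element as values[index] */
        index = index + 1;
    }

    return (total);
}
```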

Write clean, well-indented, well-commented, effective code… show me that you have learned something from your programming experience.

Grabit Integration

For those familiar with the grabit tool on lab46, I have made some skeleton files and a custom Makefile available for this project.

To “grab” it:

lab46:~/src/discrete$ grabit discrete pnc1
make: Entering directory '/var/public/SEMESTER/discrete/pnc1'
Commencing copy process for SEMESTER discrete project pnc1:
 -> Creating project pnc1 directory tree           ... OK
 -> Copying pnc1 project files                     ... OK
 -> Synchronizing pnc1 project revision level      ... OK
 -> Establishing sane file permissions for pnc1    ... OK

*** Copy COMPLETE! You may now go to the '/home/USER/src/discrete/pnc1' directory ***

make: Leaving directory '/var/public/SEMESTER/discrete/pnc1'

NOTE: You do NOT want to do this on a populated pnc1 project directory– it will overwrite files.

And, of course, your basic compile and clean-up operations via the Makefile.

Makefile operations

Makefiles provide a build automation system for our programs, instructing the computer on how to compile files, so we don't have to constantly type compiler command-lines ourselves. I've also integrated some other useful, value-added features that will help you with overall administration of the project.

Basic operation of the Makefile is invoked by running the command “make” by itself. The default action is to compile everything in the project directory.

Additional options are available, and they are provided as an argument to the make command. You can see the available options by running “make help”:

lab46:~/src/discrete/pnc1$ make help
******************[ Discrete Structures pnc1 Project ]******************
** make                     - build everything                        **
** make showerrors          - display compiler warnings/errors        **
**                                                                    **
** make debug               - build everything with debug symbols     **
** make checkqty            - runtime evaluation for qty              **
** make checkrange          - runtime evaluation for range            **
**                                                                    **
** make save                - create a backup archive                 **
** make submit              - submit assignment (based on dirname)    **
**                                                                    **
** make update              - check for and apply updates             **
** make reupdate            - re-apply last revision                  **
** make reupdate-all        - re-apply all revisions                  **
**                                                                    **
** make clean               - clean; remove all objects/compiled code **
** make help                - this information                        **
************************************************************************

Descriptions of some of the available commands:

  • make: compile everything
    • any warnings or errors generated by the compiler will go into a file called errors in the base directory of the project; you can cat it to view the information.
  • make debug: compile everything with debug support
    • any warnings or errors generated by the compiler will be displayed to the screen as the programs compile.
  • make clean: remove all binaries
  • make save: make a backup of your current work
  • make submit: archive and submit your project

The various “check” options do a runtime performance grid, allowing you to compare timings between your implementations.

Just another “nice thing” we deserve.

Command-Line Arguments

To automate our comparisons, we will be making use of command-line arguments in our programs.

header files

We don't need any extra header files to use command-line arguments, but we will need an additional header file to use the atoi(3) function, which we'll use to quickly turn the command-line parameter into an integer, and that header file is stdlib.h, so be sure to include it with the others:

#include <stdio.h>
#include <stdlib.h>

setting up main()

To gain access to arguments given to your program at runtime, we need to specify two parameters to the main() function. While the names don't matter, the types do. I like the traditional argc and argv names, although it is also common to see them abbreviated as ac and av.

Please declare your main() function as follows:

int main(int argc, char **argv)

There are two very important variables involved here (as noted, it is the types that matter; the names themselves are, well, variable):

  • int argc: the count (an integer) of tokens given on the command line (program name + arguments)
  • char **argv: an array of strings (technically a pointer to pointers to char) that contains “strings” of the various tokens provided on the command-line.

The arguments are accessible via the argv array, in the order they were specified:

  • argv[0]: program invocation (path + program name)
  • argv[1]: our maximum quantity of primes to calculate
  • argv[2]: reserved value, should still be provided and be a 1 for this project
  • argv[3]: conditionally optional; represents lower bound
  • argv[4]: conditionally optional; represents upper bound

Additionally, let's not forget the argc variable, an integer, which contains a count of arguments (argc == argument count). If we provided argv[0] through argv[4], argc would contain a 5.

example

For example, if we were to execute the primereg program:

lab46:~/src/discrete/pnc1$ ./primereg 128 1 2 2048

We'd have:

  • argv[0]: “./primereg”
  • argv[1]: “128” (note, NOT the scalar integer 128, but a string)
  • argv[2]: “1”
  • argv[3]: “2”
  • argv[4]: “2048”

and let's not forget:

  • argc: 5 (there are 5 things, argv indexes 0, 1, 2, 3, and 4)

With the conditionally optional arguments as part of the program spec, for a valid execution of the program, argc could be a value anywhere from 3 to 5.

Simple argument checks

While there are a number of checks we should perform, one of the first should be a check to see if the minimal number of arguments has been provided:

    if (argc < 3)  // if less than 3 arguments (program_name + quantity + argv[2] == 3) have been provided
    {
        fprintf(stderr, "%s: insufficient number of arguments!\n", argv[0]);
        exit(1);
    }

Since argv[3] (lower bound) and argv[4] (upper bound) are conditionally optional, it wouldn't make sense to check for them in the overall count. But we can and do still want to strategically utilize argc to determine if an argv[3] or argv[4] is present.
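A sketch of using argc to decide whether the optional bounds are present might look like this. The helper names get_lower and get_upper are hypothetical, and returning 0 to mean "no upper bound given" is an assumption made for this example only; validation of the values is left out here.

```c
#include <stdlib.h>

/* Hypothetical helper: lower bound from argv[3] if present, else 2. */
int get_lower(int argc, char **argv)
{
    int lower = 2;                   /* default when argv[3] is omitted */

    if (argc >= 4)                   /* argv[3] exists only if argc >= 4 */
    {
        lower = atoi(*(argv + 3));
    }

    return (lower);
}

/* Hypothetical helper: upper bound from argv[4] if present, else 0
 * (0 standing for "no upper bound" in this sketch). */
int get_upper(int argc, char **argv)
{
    int upper = 0;

    if (argc >= 5)                   /* argv[4] exists only if argc >= 5 */
    {
        upper = atoi(*(argv + 4));
    }

    return (upper);
}
```

With the invocation ./primereg 128 1 2 2048, argc is 5, so both helpers read their argument; with ./primereg 128 1, argc is 3 and the defaults are used.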

Grab and convert max

Finally, we need to put the argument representing the maximum quantity into a variable.

I'd recommend declaring a variable of type int.

We will use the atoi(3) function to quickly convert the command-line arguments into int values:

    max  = atoi (argv[1]);

And now we can proceed with the rest of our prime implementation.
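One caveat with atoi(3) is that it returns 0 for text that isn't a number at all, which makes "abc" indistinguishable from a legitimate 0. If you want a stricter check for the "invalid quantity" case, one option is sketched below; note that it swaps in strtol(3) instead of atoi(3), and the function name valid_quantity is hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *max when text
 * is a whole number >= 0 that fits in an int; returns 0 otherwise. */
int valid_quantity(const char *text, int *max)
{
    char *end   = NULL;
    long  value = strtol(text, &end, 10);   /* end points past the digits */
    int   valid = 0;

    /* valid only if at least one digit was read, nothing trails the
     * digits, and the value is in the range 0..INT_MAX */
    if ((end != text) && (*end == '\0') && (value >= 0) && (value <= INT_MAX))
    {
        *max  = (int) value;
        valid = 1;
    }

    return (valid);
}
```

When this returns 0, the program would display the "PROGRAM_NAME: invalid quantity!" message to stderr (with argv[0] as the program name) and exit.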

Timing

Oftentimes, when checking the efficiency of a solution, a good measurement (especially for comparison) is to time how long the processing takes.

In order to do that in our prime number programs, we are going to use C library functions that obtain the current time, and use it as a stopwatch: we'll grab the time just before starting processing, and then once more when done. The total time will then be the difference between the two (end_time - start_time).

We are going to use the gettimeofday(2) function to aid us in this, and to use it, we'll need to do the following:

header file

In order to use the gettimeofday(2) function in our program, we'll need to include the sys/time.h header file, so be sure to add it in with the existing ones:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

timeval variables

gettimeofday(2) uses a struct timeval data type, of which we'll need to declare two variables in our programs (one for storing the starting time, and the other for the ending time).

Please declare these with your other variables, up at the top of main() (but still WITHIN main()– you do not need to declare global variables).

    struct timeval time_start; // starting time
    struct timeval time_end;   // ending time

Obtaining the time

To use gettimeofday(2), we merely place it at the point in our code we wish to take the time.

For our prime number programs, you'll want to grab the start time AFTER you've declared variables and processed arguments, but JUST BEFORE starting the driving loop doing the processing.

That call will look something like this:

    gettimeofday(&time_start, 0);

The ending time should be taken immediately after all processing (and prime number output) is completed, and right before we display the timing information to STDERR:

    gettimeofday(&time_end, 0);

Displaying the runtime

Once we have the starting and ending times, we can display this to the stderr file pointer. You'll want this line:

fprintf(stderr, "%8.4lf\n",
time_end.tv_sec-time_start.tv_sec+((time_end.tv_usec-time_start.tv_usec)/1000000.0));

For clarity's sake, that format specifier is “%8.4lf”, where the “lf” is often read as “long float” (in printf() it formats a double); that is NOT a number 'one' but a lowercase letter 'ell'.

And with that, we can compute an approximate run-time of our programs. The timing won't necessarily be accurate down to that level of precision, but it will be informative enough for our purposes.
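If it helps to see the arithmetic on its own, the same calculation can be written as a small function. This is only an illustrative restatement of the fprintf() expression above; the function name elapsed_seconds is hypothetical and not part of the provided code.

```c
#include <sys/time.h>

/* Hypothetical helper: seconds elapsed between two gettimeofday(2)
 * readings, combining whole seconds with the microsecond remainder. */
double elapsed_seconds(struct timeval time_start, struct timeval time_end)
{
    double seconds = (double) (time_end.tv_sec  - time_start.tv_sec);
    double micros  = (double) (time_end.tv_usec - time_start.tv_usec);

    /* micros may be negative (for example 100000 - 900000); adding it
     * to the whole-second difference still gives the correct total */
    return (seconds + (micros / 1000000.0));
}
```

For instance, a start of 10 s + 900000 us and an end of 12 s + 100000 us works out to 2 - 0.8 = 1.2 seconds.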

Loops

A loop is basically instructing the computer to repeat a section, or block, of code a given number of times (it can be based on a fixed value– repeat this 4 times, or be based on a conditional value– keep repeating as long as (or while) this value is not 4).

Loops enable us to simplify our code– allowing us to write a one-size-fits-all algorithm (provided the algorithm itself can appropriately scale!), where the computer merely repeats the instructions we gave. We only have to write them once, but the computer can do that task any number of times.

Loops can be initially difficult to comprehend because unlike other programmatic actions, they are not single-state in nature– loops are multi-state. What this means is that in order to correctly “see” or visualize a loop, you must analyze what is going on with EACH iteration or cycle, watching the values/algorithm/process slowly march from its initial state to its resultant state. Think of it as climbing a set of stairs… yes, we can describe that action succinctly as “climbing a set of stairs”, but there are multiple “steps” (heh, heh) involved: we place our foot, adjust our balance– left foot, right foot, from one step, to the next, to the next, allowing us to progress from the bottom step to the top step… that process of scaling a stairway is the same as iterating through a loop– but what is important as we implement is what needs to happen each step along the way.

With that said, it is important to be able to focus on the process of the individual steps being taken. What is involved in taking a step? What constitutes a basic unit of stairway traversal? If that unit can be easily repeated for the next and the next (and in fact, the rest of the) steps, we've described the core process of the loop, or what will be iterated a given number of times.

In C and C-syntax influenced languages (C++, Java, PHP, among others), we typically have 3 types of loops:

  • for loop (automatic counter loop, stepping loop; top-driven) - when we know exactly how many times we wish something to run; we know where we want to start, where we want to end, and exactly how to progress from start to end (step value)
  • while loop (top-driven conditional loop) - when we want to repeat a process, but the exact number of iterations is either not known, not important, or variable in nature. While loops can run 0 or more times.
  • do-while loop (bottom-driven conditional loop) - similar to the while loop, only we do the check for loop termination at the bottom of the loop, meaning it runs 1 or more times (a do-while loop is guaranteed to run at least once).

for() loops

A for() loop is the most syntactically unique of the loops, so care must be taken to use the proper syntax.

With any loop, we need (at least one) looping variable, which the loop will use to analyze whether or not we've met our looping destination, or to perform another iteration.

A for loop typically also has a defined starting point, a “keep-looping-while” condition, and a stepping equation.

Here's a sample for() loop, in C, which will display the squares of each number, starting at 0, and stepping one at a time, for 8 total iterations:

int i = 0;
 
for (i = 0; i < 8; i++)
{
    fprintf(stdout, "loop #%d ... %d\n", (i+1), (i*i));
}

The output of this code, with the help of our loop, should be:

loop #1 ... 0
loop #2 ... 1
loop #3 ... 4
loop #4 ... 9
loop #5 ... 16
loop #6 ... 25
loop #7 ... 36
loop #8 ... 49

Note how we can use our looping variable (i) within mathematical expressions to drive a process along… loops can be of enormous help in this way.

And again, we shouldn't look at this as one step– we need to see there are 8 discrete, distinct steps happening here (when i is 0, when i is 1, when i is 2, … up until (and including) when i is 7).

The loop exits once i reaches a value of 8, because our loop determinant condition states that as long as i is less than 8, we continue to loop. Once i becomes 8, our looping condition is no longer satisfied, and the loop will no longer iterate.

The stepping (that third) field is a mathematical expression indicating how we wish for i to progress from its starting state (of being equal to 0) to the point where the loop's iterating condition fails (i no longer being less than 8).

i++ is a shortcut we can use in C; the longhand (and likely more familiar) equivalent is: i = i + 1
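The stepping field is not limited to adding 1. As a small sketch (the function name show_odds and its bounds are made up for illustration), here is a for() loop that starts at 3 and steps by 2, so it only ever visits odd values:

```c
#include <stdio.h>

// Display every odd value from 3 up to and including limit, and
// return how many iterations ran. show_odds is a hypothetical name.
int show_odds(int limit)
{
    int i     = 0;
    int count = 0;

    for (i = 3; i <= limit; i = i + 2)   // start at 3, step by 2
    {
        fprintf(stdout, "%d ", i);
        count = count + 1;
    }
    fprintf(stdout, "\n");

    return (count);
}
```

Calling show_odds(15) visits 3, 5, 7, 9, 11, 13, and 15, for 7 iterations; calling show_odds(2) runs the body 0 times, since 3 <= 2 is false from the start.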

while() loops

A while() loop isn't as specific about starting and stepping values, really only caring about what condition needs to be met in order to exit the loop (keep looping while this condition is true).

In actuality, anything we use a for loop for can be expressed as a while loop– we merely have to ensure we provide the necessary loop variables and progressions within the loop.

That same loop above, expressed as a while loop, could look like:

int i = 0;
 
while (i < 8)
{
    fprintf(stdout, "loop #%d ... %d\n", (i+1), (i*i));
    i = i + 1;   // I could have used "i++;" here
}

The output of this code should be identical, even though we used a different loop to accomplish the task (try them both out and confirm!)

while() loops, like for() loops, will run 0 or more times; if the conditions enabling the loop to occur are not initially met, they will not run… if met, they will continue to iterate until their looping conditions are no longer met.

It is possible to introduce a certain kind of logical error into your programs using loops– what is known as an “infinite loop”; this is basically where you erroneously provide incorrect conditions to the particular loop used, allowing it to start running, but never arriving at its conclusion, thereby iterating forever.

Another common logical error that loops will allow us to encounter is the “off by one” error, where the conditions we pose to the loop are incorrect, and the loop runs one iteration more or fewer than we had intended. Again, proper debugging of our code will resolve this situation.
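To make the “off by one” error concrete, here is a small sketch (the function names are hypothetical) counting how many times two nearly identical loops iterate. The only difference is < versus <= in the condition:

```c
// Count the iterations of a loop that continues while i < limit.
int count_less_than(int limit)
{
    int i     = 0;
    int count = 0;

    for (i = 0; i < limit; i++)
    {
        count = count + 1;
    }

    return (count);
}

// Same loop, but the condition is i <= limit: one extra iteration.
int count_less_equal(int limit)
{
    int i     = 0;
    int count = 0;

    for (i = 0; i <= limit; i++)
    {
        count = count + 1;
    }

    return (count);
}

// An infinite loop, by contrast, never reaches its exit condition.
// For example, forgetting the step in a while loop:
//
//     int i = 0;
//     while (i < 8)
//     {
//         fprintf(stdout, "still here\n");   // i never changes!
//     }
```

With a limit of 8, the first function reports 8 iterations and the second reports 9.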

do-while loops

The third commonly recognized looping structure in C, the do-while loop, is nearly identical to the while() (and therefore also the for()) loop; it differs only in where it checks the looping condition: where for() and while() are “top-driven” loops (ie the test for loop continuance occurs at the top of the loop, before running the code in the loop body), the do-while is a “bottom-driven” loop (ie the test for loop continuance occurs at the bottom of the loop).

The placement of this test determines the minimal number of times a loop can run.

In the case of the for()/while() loops, because the test is at the top, if the looping conditions are not met, the loop may not run at all. It is for this reason that these loops can run “0 or more times”.

For the do-while loop, because the test occurs at the bottom, the body of the loop (one full iteration) is run before the test is encountered. So even if the conditions for looping are not met, a do-while will run “1 or more times”.

That may seem like a minor, and possibly annoying, difference, but in nuanced algorithm design, such distinctions can drastically change the layout of your code, potentially being the difference between beautifully elegant-looking solutions and those which appear slightly more hackish. They can BOTH be used to solve the same problems; it is merely the nature of how we choose to express the solution that should make one preferable over the other in any given moment.

I encourage you to intentionally try your hand at taking your completed programs and implementing other versions that utilize the other types of loops you haven't utilized. This way, you can get more familiar with how to structure your solutions and express them. You will find you tend to think in a certain way (from experience, we seem to get in the habit of thinking “top-driven”, and as we're unsure, we tend to exert far more of a need to control the situation, so we tend to want to use for loops for everything– but practicing the others will free your mind to craft more elegant and efficient solutions; but only if you take the time to play and explore these possibilities).

So, expressing that same program in the form of a do-while loop (note the changes from the while):

int i = 0;
 
do
{
    fprintf(stdout, "loop #%d ... %d\n", (i+1), (i*i));
    i = i + 1;  // again, we could just as easily use "i++;" here
} while(i < 8);

In this case, the 0 or more vs. 1 or more minimal iterations wasn't important; the difference is purely syntactical.

With the do-while loop, we start the loop with a do statement.

Also, the do-while is the only one of our loops which NEEDS a terminating semi-colon (;) after its closing condition. Please take note of this.
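To see a case where the “0 or more” vs. “1 or more” distinction does matter, consider this sketch (function names are hypothetical) where the starting value is passed in. If we start at 8, the condition i < 8 is false before either loop begins:

```c
// Count iterations of a top-driven loop beginning at start.
int while_iterations(int start)
{
    int i     = start;
    int count = 0;

    while (i < 8)          // tested BEFORE the body runs
    {
        count = count + 1;
        i     = i + 1;
    }

    return (count);
}

// Count iterations of a bottom-driven loop beginning at start.
int do_while_iterations(int start)
{
    int i     = start;
    int count = 0;

    do
    {
        count = count + 1;
        i     = i + 1;
    } while (i < 8);       // tested AFTER the body runs

    return (count);
}
```

Starting from 0, both functions report 8 iterations. Starting from 8, the while version reports 0, while the do-while version reports 1, since its body runs once before the test is ever reached.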

Execution

specified quantity

Your program output should be as follows (given the specified quantity):

lab46:~/src/discrete/pnc1$ ./primereg 24 1
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 
  0.0001
lab46:~/src/discrete/pnc1$ 

The execution of the programs is short and simple- grab the parameters, do the processing, produce the output, and then terminate.

invalid lower bound

Here's an example that should generate an error upon running (based on project specifications):

lab46:~/src/discrete/pnc1$ ./primerego 32 1 0
./primerego: invalid lower bound
lab46:~/src/discrete/pnc1$ 

In this case, the program logic should have detected an invalid condition and bailed out before prime computations even began. No timing data is displayed, because exiting should occur even prior to that.
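A check of this sort might be sketched as follows. The function name check_lower_bound is hypothetical, and the rule that any lower bound below 2 is invalid is an ASSUMPTION made for illustration (consult the project specifications for the actual rule); only the form of the error message is taken from the sample above:

```c
#include <stdio.h>

// Returns 0 when lower is acceptable; otherwise reports the problem
// on stderr and returns 1.
// ASSUMPTION: any lower bound below 2 is treated as invalid here.
int check_lower_bound(const char *progname, long lower)
{
    if (lower < 2)
    {
        fprintf(stderr, "%s: invalid lower bound\n", progname);
        return (1);
    }

    return (0);
}
```

In main(), such a check would be called with argv[0] as the program name, before the starting time is taken, with the program exiting right away on a non-zero result.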

upper bound overriding quantity

As indicated above, there is potential interplay with an active quantity and upper bound values. Here is an example where upper bound overrides quantity, resulting in an early termination (ie upper bound is hit before quantity):

lab46:~/src/discrete/pnc1$ ./primeregs 128 1 7 23
7 11 13 17 19 23
  0.0001
lab46:~/src/discrete/pnc1$ 

Also for fun, I set the lower bound to 7, so you'll see computation starts at 7 (vs. the usual 2).
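One way to picture the interplay is a single loop whose condition checks both limits. The following is only a sketch: show_primes and is_prime are hypothetical names, the brute-force check is a stand-in for whichever algorithm a given variant requires, and treating a value of 0 as “no limit of that kind” is an assumption for illustration rather than something stated here.

```c
#include <stdio.h>

// Brute-force primality check, only here to keep the sketch self-contained.
static int is_prime(long n)
{
    long d = 0;

    if (n < 2)
    {
        return (0);
    }

    for (d = 2; d < n; d++)
    {
        if ((n % d) == 0)
        {
            return (0);    // found a factor: composite
        }
    }

    return (1);
}

// Display primes starting at lower, stopping at whichever limit is
// hit first, and return how many primes were displayed.
// ASSUMPTION: quantity or upper of 0 means "no limit of that kind";
// at least one of the two must be non-zero or the loop never ends.
long show_primes(long quantity, long lower, long upper)
{
    long number = lower;
    long found  = 0;

    while (((quantity == 0) || (found  <  quantity)) &&
           ((upper    == 0) || (number <= upper)))
    {
        if (is_prime(number))
        {
            fprintf(stdout, "%ld ", number);
            found = found + 1;
        }
        number = number + 1;
    }
    fprintf(stdout, "\n");

    return (found);
}
```

With a quantity of 128, a lower bound of 7, and an upper bound of 23, the upper bound half of the condition fails first, and only 6 primes (7 through 23) are displayed.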

Check Results

If you'd like to compare your implementations, I rigged up two Makefile checking rules, “make checkqty” and “make checkrange”, which you can run to get a nice side-by-side runtime comparison of your implementations.

In order for these to work, you MUST be in the directory where your pnc1 binaries reside, and the binaries must be named as such (which occurs if you ran make to compile them).

check qty

For instance (running on my implementation of the pnc1 programs, some output omitted to keep the surprise alive):

lab46:~/src/discrete/pnc1$ make checkqty
=========================================================================================
      qty     reg    regm    rego    regb   regbm   regbo    regs    rega   regbs   regba
=========================================================================================
       32  0.0002  0.0001  0.0001  0.0002  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
       64  0.0006  0.0003  0.0002  0.0002  0.0002  0.0001  0.0001  0.0002  0.0002  0.0001
      128  0.0028  0.0010  0.0008  0.0006  0.0006  0.0003  0.0004  0.0003  0.0002  0.0002
      256  0.0123  0.0041  0.0031  0.0020  0.0019  0.0010  0.0009  0.0008  0.0004  0.0003
      512  0.0574  0.0188  0.0144  0.0077  0.0077  0.0040  0.0025  0.0026  0.0008  0.0007
     1024  0.2690  0.0880  0.0665  0.0320  0.0312  0.0161  0.0077  0.0080  0.0019  0.0016
...
   262144  ------  ------  ------  ------  ------  ------  ------  ------  ------  ------
=========================================================================================

check range

Or check range runtimes:

lab46:~/src/discrete/pnc1$ make checkrange
=========================================================================================
    range     reg    regm    rego    regb   regbm   regbo    regs    rega   regbs   regba
=========================================================================================
       32  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
       64  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
      128  0.0002  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
      256  0.0004  0.0002  0.0002  0.0001  0.0002  0.0002  0.0002  0.0002  0.0001  0.0001
      512  0.0015  0.0006  0.0005  0.0003  0.0003  0.0002  0.0002  0.0002  0.0002  0.0002
     1024  0.0053  0.0018  0.0014  0.0009  0.0010  0.0005  0.0005  0.0005  0.0002  0.0002
     2048  0.0191  0.0063  0.0049  0.0028  0.0027  0.0015  0.0011  0.0011  0.0004  0.0003
     4096  0.0709  0.0232  0.0177  0.0094  0.0091  0.0048  0.0029  0.0030  0.0008  0.0008
     8192  0.2712  0.0887  0.0672  0.0322  0.0315  0.0163  0.0078  0.0077  0.0019  0.0016
...
  2097152  ------  ------  ------  ------  ------  ------  ------  ------  ------  ------
=========================================================================================

If the runtime of a particular prime variant exceeds an upper runtime threshold (likely to be set at 1 second), it will be omitted from further tests, and a series of dashes will instead appear in the output.

If you don't feel like waiting, simply hit CTRL-c (maybe a couple of times) and the script will terminate.

Verification

You will want to verify your program output's validity to ensure maximum correctness.

In the data/ directory you will find a primelist.gz file which contains the first 295947 primes for your verification needs.

In general

Analyze the times you see… do they make sense, especially when comparing the algorithm used and the quantity being processed? These are related to some very important core Computer Science considerations we need to be increasingly mindful of as we design our programs and implement our solutions. Algorithmic complexity and algorithmic efficiency will be common themes in all we do.
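As a rough illustration of why the times differ so much, the following sketch counts how many trial divisions are performed on a single value, first checking every candidate factor below n, and then stopping at the square root (expressed here as d * d <= n, which avoids calling sqrt()). The function names are hypothetical, and these are not any of the required implementations, merely a way to count the work being done:

```c
// Count trial divisions when checking every candidate from 2 to n-1.
// Sets *composite to 1 if a factor was found, 0 otherwise.
long divisions_naive(long n, int *composite)
{
    long d     = 0;
    long count = 0;

    *composite = 0;
    for (d = 2; d < n; d++)
    {
        if ((n % d) == 0)
        {
            *composite = 1;      // no break: keep checking regardless
        }
        count = count + 1;
    }

    return (count);
}

// Count trial divisions when stopping at the square root of n.
long divisions_sqrt(long n, int *composite)
{
    long d     = 0;
    long count = 0;

    *composite = 0;
    for (d = 2; (d * d) <= n; d++)
    {
        if ((n % d) == 0)
        {
            *composite = 1;
        }
        count = count + 1;
    }

    return (count);
}
```

For the prime 97, the first approach performs 95 divisions and the second only 8, and both reach the same conclusion. Multiply that difference across thousands of values and the gaps in the tables above start to make sense.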

Submission

To successfully complete this project, the following criteria must be met:

  • Code must compile cleanly (no warnings or errors)
  • Output must be correct, and match the form given in the sample output above.
  • Code must be nicely and consistently indented (you may use the indent tool)
  • Code must utilize the algorithm(s) presented above:
    • primeregmo.c: map + odd traversal optimizations
    • primeregms.c: map traversal + sqrt() trick
    • primeregma.c: map traversal + approximated square root trick
    • primeregos.c: odd traversal + sqrt() trick
    • primeregoa.c: odd traversal + approximated square root trick
    • primeregbmo.c: break + map + odd traversal
    • primeregbms.c: break + map + sqrt() trick
    • primeregbma.c: break + map + approximated square root trick
    • primeregbos.c: break + odd + sqrt() trick
    • primeregboa.c: break + odd + approximated square root trick
    • primeregmos.c: map + odd traversal + sqrt() trick
    • primeregmoa.c: map + odd traversal + approximated square root trick
    • primeregbmos.c: break + map + odd + sqrt() trick
    • primeregbmoa.c: break + map + odd + approximated square root trick
  • Code must be commented
    • have a properly filled-out comment banner at the top
      • be sure to include any compiling instructions
    • have at least 20% of your program consist of //-style descriptive comments
  • Output Formatting (including spacing) of program must conform to the provided output (see above).
  • Track/version the source code in a repository
  • Submit a copy of your source code to me using the submit tool.

To submit this program to me using the submit tool, run the following command at your lab46 prompt:

lab46:~/src/discrete/pnc1$ make submit
Delinking ...
removed ‘primerega.c’
removed ‘primeregba.c’
removed ‘primeregb.c’
removed ‘primeregbm.c’
removed ‘primeregbo.c’
removed ‘primeregbs.c’
removed ‘primereg.c’
removed ‘primeregm.c’
removed ‘primerego.c’
removed ‘primeregs.c’
removed ‘primeregbma’
removed ‘primeregbmoa’
removed ‘primeregbmo’
removed ‘primeregbmos’
removed ‘primeregbms’
removed ‘primeregboa’
removed ‘primeregbos’
removed ‘primeregma’
removed ‘primeregmoa’
removed ‘primeregmo’
removed ‘primeregmos’
removed ‘primeregms’
removed ‘primeregoa’
removed ‘primeregos’
removed ‘errors’

Project backup process commencing

Taking snapshot of current project (pnc1)      ... OK
Compressing snapshot of pnc1 project archive   ... OK
Setting secure permissions on pnc1 archive     ... OK

Project backup process complete

Submitting discrete project "pnc1":
    -> ../pnc1-20180917-16.tar.gz(OK)

SUCCESSFULLY SUBMITTED

You should get that final “SUCCESSFULLY SUBMITTED” with no error messages occurring. If not, check for typos and/or locational mismatches.

Evaluation Criteria

Grand total points:

182:pnc1:final tally of results (182/182)

What I will be looking for (for each file):

*:pnc1:primeALGO.c compiles cleanly, no compiler messages [1/1]
*:pnc1:primeALGO.c implements only specified algorithm [2/2]
*:pnc1:primeALGO.c consistent indentation throughout code [1/1]
*:pnc1:primeALGO.c relevant comments throughout code [1/1]
*:pnc1:primeALGO.c code conforms to project specifications [2/2]
*:pnc1:primeALGO.c runtime output conforms to specifications [4/4]
*:pnc1:primeALGO.c make checkqty test times within reason [1/1]
*:pnc1:primeALGO.c make checkrange test times within reason [1/1]

As the optimizations improve upon others, some evaluations will be based upon differences between a baseline (in some cases, primereg) and the optimization.

Additionally:

  • Solutions not abiding by the spirit of the project will be subject to a 25% overall deduction
  • Solutions not utilizing descriptive why and how comments will be subject to a 25% overall deduction
  • Solutions not utilizing indentation to promote scope and clarity will be subject to a 25% overall deduction
  • Solutions not organized and easy to read (assume a terminal at least 90 characters wide, 40 characters tall) are subject to a 25% overall deduction
haas/fall2020/discrete/projects/pnc1.txt · Last modified: 2020/10/11 09:27 by wedge