discrete fall2022 eoce 0x3
========================

=====Objective=====
To explore  the cnv0/cnv1  programs from  a different  perspective: using
only ONE loop to drive central processing.


=====Background=====
In  mathematics, a  **prime**  number  is a  value  that  is only  evenly
divisible  by  1 and  itself;  it  has just  that  one  pair of  factors,
no  others.  Numbers that  have  divisibility/factors  are classified  as
**composite** numbers.

The number **6** is a **composite** number, as in addition to 1 and 6, it
also has the factors of 2 and 3.

The number  **17**, however, is a  **prime** number, as no  numbers other
than 1 and 17 can be evenly divided into it.

=====Calculating the primality of a number=====
As of yet, there is no quick  and direct way of determining the primality
of  a  given number.  Instead,  we  must perform  a  series  of tests  to
determine if it fails primality (typically by proving it is composite).

This process incurs  a considerable amount of processing  overhead on the
task,  so much  so  that increasingly  large  values take  ever-expanding
amounts of  time. Often, approaches  to prime number  calculation involve
various algorithms, which offer various benefits (less time) and drawback
(more complex code).

Your task for  this project is to implement a  prime number program using
the straightforward, unoptimized  brute-force algorithm, which determines
the primality of a number in a "trial by division" approach.

=====Main algorithm: brute force=====
The brute force  approach is the simplest to implement  (although at some
cost in the form of time).

As  we  will  be  looking   to  do  some  time/performance  analysis  and
comparisons, it  is often good to  have a baseline. This  program will be
it.

To perform the process of computing  the primality of a number, we simply
attempt to evenly divide  all the values between 2 and  one less than the
number in  question. If  any one  of them divides  evenly, the  number is
**NOT** prime, but instead composite.

Checking  the **remainder**  of a  division  indicates whether  or not  a
division was clean (having 0 remainder indicates such a state).

For example, the number 11:

<code>
11 % 2 = 1 (2 is not a factor of 11)
11 % 3 = 2 (3 is not a factor of 11)
11 % 4 = 3 (4 is not a factor of 11)
11 % 5 = 1 (5 is not a factor of 11)
11 % 6 = 5 (6 is not a factor of 11)
11 % 7 = 4 (7 is not a factor of 11)
11 % 8 = 3 (8 is not a factor of 11)
11 % 9 = 2 (9 is not a factor of 11)
11 % 10 = 1 (10 is not a factor of 11)
</code>

Because none  of the values  2-10 evenly divided into  11, we can  say it
passed the test: **11 is a prime number**

On the other hand, take 119:

<code>
119 % 2 = 1 (2 is not a factor of 119)
119 % 3 = 2 (3 is not a factor of 119)
119 % 4 = 3 (4 is not a factor of 119)
119 % 5 = 4 (5 is not a factor of 119)
119 % 6 = 5 (6 is not a factor of 119)
119 % 7 = 0 (7 is a factor of 119)
119 % 8 = 7
119 % 9 = 2
119 % 10 = 9
119 % 11 = 9
119 % 12 = 11
119 % 13 = 2
...
</code>

Because, during our range of testing every value from 2-118, we find that
7 evenly divides into 119, it failed  the test: 119 is **not** prime, but
is instead a composite number.

Please NOTE: Even once a number  is identified as composite, your program
MUST **CONTINUE** evaluating the remainder of the values (up to 119-1, or
118). It might seem pointless (and it is for a production program), but I
want you to see the performance implications this creates.

====algorithm====
Some things to keep in mind on your implementation:

  * you will want to have exactly ONE  loop in the central processing for
    this program.

    * you will need to flatten  the conditions that previously took place
      in the nested loop

    * be mindful of what the underlying conditions are

  * you  know the starting  value and  the terminating condition,  so you
    have a clear starting and ending point to work with.

  * I do **NOT** want to see ambiguous, one-letter variables used in your
    implementation(s). Please use //meaningful// variable names.

    * Some good examples of variable names would be:

      * **number**: the number being tested

      * **factor**: the  value being  divided  into number  to  test  for
        primality

      * **step**: the rate by which some variable is changing

      * **qty**: the count of the current tally of primes

      * **max**: the maximum count we seek

      * **start**: a value we are starting at

      * **lower**: a lower bound

      * **upper**: an upper bound

      * see how  much more readable and meaningful  these are, especially
        as compared  to **a**, **i**, **n**, **x**?  You may even find it
        helps with debugging and understanding your code better.

  * let  the  loop drive  the overall  process. Identify  prime/composite
    status separate from loop terminating conditions.

    * and remember,  the baseline brute force algorithm may well identify
      a value as composite, but won't terminate the loop.

  * your  timing  should  start  before  the  loop (just  AFTER  argument
    processing),  and  terminate  immediately  following  the terminating
    output newline outside the loops.

  * you  may **NOT** split  **qty** and **range** functionality  into two
    separate  code  blocks (ie have two sets  of two loops). Only the one
    set as indicated.

=====prime algorithm optimizations=====
To  give  us  a  decent  appreciation  of  the  subtleties  of  algorithm
development  in a  theme of  programs,  I have  identified the  following
optimizations that we will be implementing.

The  optimizations we  will be  implementing in  this section include:

  * **break  on   composite**  - once  a  tested  number  is proven to be
    composite, there is no need to continue processing: break out  of the
    factor check logic and proceed to the next number

  * **mapping factors of 6** - it  turns out that, aside from the initial
    primes of **2** and **3**, that **all** prime numbers fall to a +1 or
    -1 off a factor of six  (there is an algorithm for this: **6a+/-1**).
    This optimization  will utilize  this property, only  testing numbers
    +/-1 off of factors of 6 (how might this impact overall processing?)

  * **odds-only  checking**  -  aside  from **2**,  **all**  other  prime
    numbers are odd. Therefore, there is zero need to perform a composite
    check on  an even  number, allowing  us to  focus exclusively  on odd
    values (luckily, they seem to occur in a predictable pattern).

  * **sqrt() trick** - mathematically it has  been shown that if a number
    has any  evenly divisible factors, at  least one half of  that factor
    pair will occur by the square root point of the number being tested.

  * **sqrt()-less square  root approximation**  - **sqrt()**,  a function
    in  the  math  library,  does  an  industrial  strength  square  root
    calculation.  We  don't  need  that, merely  a  whole  integer  value
    corresponding to the approximate square  root. Here we will implement
    our  own  logic   to  approximate  square  root,   hopefully  with  a
    considerable performance impact.

Unless  specified  in  the  encoded  name,  your  algorithm  should  only
implement the algorithm and optimization(s) specified.

Some  of these  optimizations can  co-exist easily  (break +  map, odd  +
sqrt()),  others are  partially  compatible  (map +  odd  can coexist  in
a  certain  form),  while  others  are  mutually  exclusive  (sqrt()  and
approximated  square  root  conflict).  So there  are  definitely  a  few
combinations that are possible using this scheme.

====Program Specifications====
Your program should otherwise work like the discrete/cnv1 program did:

  * obtain   parameters  from   the   command-line  (see   **command-line
    arguments** section below).

    * check to make sure the  user indeed supplied enough parameters, and
      exit with an error message if not.

    * argv[1]: maximum  quantity  of  primes to  calculate (your  program
      should run until it discovers **that** many primes).

      * this value should be an integer value, greater than or equal to 0

        * if argv[1] is  0, disable the quantity check, and  rely only on
          provided lower and upper bounds (thru argv[4] would be required
          in this case).

    * argv[2]: N-ary specification; for this section, we are going to use
      **1**

    * argv[3]:  **conditionally optional** lower bound  (starting value).
      Most of the time, this  will  probably be  **2**, but  should be  a
      positive integer greater than or equal to 2. This defines where the
      program will start its prime quantity check from.

      * if omitted, assume a lower bound of **2**.

      * if you desire  to specify an upper bound (argv[4]), you obviously
        MUST provide the lower bound argument under this scheme.

    * argv[4]: **conditionally optional**  upper bound (ending value). If
      provided, this is the ending value you'd like to check to.

      * If doing a quantity run (argv[1] is NOT 0), you don't need  this.

      * If doing a quantity run AND you specify an upper bound, whichever
        condition is  achieved first  dictates program  termination. That
        is, upper bound could override quantity (if it is achieved before
        quantity), and  quantity can override  the upper bound (if  it is
        achieved before reaching the specified upper bound).

    * argv[5]: specification of optimizations. Like in discrete/cnv1, the
      values are bitwise-specified in a single numeric value,  broken out
      as follows:

        0  - no optimizations (naive, brute force approach)
        1  - break on composite
        2  - odds-only checking
        4  - map
        8  - sqrt
        16 - approximate square root

    * for each argument:  you should do a basic check  to ensure the user
      complied  with this  specification, and  exit with  a unique  error
      message (displayed to STDERR) otherwise:

      * for insufficient quantity  of arguments, display: **PROGRAM_NAME:
        insufficient number of arguments!**

      * for invalid argv[1], display: **PROGRAM_NAME: invalid quantity!**

      * for invalid argv[2], display: **PROGRAM_NAME: invalid value!**

      * invalid  argv[3], display: **PROGRAM_NAME: invalid lower bound!**

        * if argv[3] is not needed, ignore (no error displayed not forced
        exit, as it is acceptable defined behavior).

      * invalid  argv[4], display: **PROGRAM_NAME: invalid upper bound!**

        * if argv[4] is not needed, ignore (no error displayed nor forced
          exit, as it is acceptable defined behavior).

        * In these  error messages, **PROGRAM_NAME**  is the name  of the
          program being run;  this can be accessed as a  string stored in
          **argv[0]**.

  * perform  ONLY  the  optimization(s) specified  on  the  command-line.
    We  are producing  multiple  data points  for  a broader  performance
    comparison.

  * please take note on differences in run-time, contemplating the impact
    the  algorithm  and  optimization(s)  have  on  performance  (timing,
    specifically).

  * immediately  after  argument  processing: start  your stopwatch  (see
    **timing** section below).

  * perform   the  correct  algorithm  and  optimization(s)  against  the
    command-line input(s) given.

    * your program is  to have no fewer and no more than  1 loop  in this
      prime processing section.

  * display identified primes (space-separated)  to a file pointer called
    **stdout**

  * stop your stopwatch immediately following your prime processing loops
    (and terminating newline display to  **stdout**). Calculate  the time
    that has transpired (ending time minus starting time).

  * output the processing run-time to the file pointer called **stderr**

  * your  output   **MUST**  conform  to   the  example  output   in  the
    **execution** section below. This is also  a test to see how well you
    can implement to specifications. Basically:

    * as  primes are  being  displayed, they  are space-separated  (first
      prime hugs the left margin), and  when all said and done, a newline
      is issued.

====Coding Restrictions====
Since a lot of our explorations this semester were algorithmic in nature,
we made a lot  of use of restricting various approaches  we could take in
the implementation of our solutions.

As   such,   the  following   restrictions   are   in  place   for   your
implementations:

  * no global variables
  * no infinite loops
  * no if() shunts (have good conditions!)
    * what  is  the  difference  between   an  if()  shunt  and  a  valid
      conditional  statement? if()  shunts are  a messy  way of  avoiding
      iterations;  valid if()  statements  denote conditionally  optional
      processing that occurs every iteration.
  * one return statement (max) per function
  * exit() calls are limited to command-line or resource allocation error
    processing
  * avoid redundant sections of code (merge logic or break out functions)

Remember, the focus  should be on writing elegant code,  and NOT on brute
forcing  some  solution.  Show  me that  you've  learned  something  this
semester- write clean, well commented, consistently indented code.

=====Makefile operations=====
Makefiles provide a build automation system for our programs, instructing
the computer  on how  to compile  files, so we  don't have  to constantly
type  compiler  command-lines  ourselves.   I've  also  integration  some
other  useful,  value-added features  that  will  help you  with  overall
administration of the project.

Basic  operation  of the  Makefile  is  invoked  by running  the  command
"**make**" by itself. The default action  is to compile everything in the
project directory.

A description of some available commands include:

  * **make**: compile everything

    * any  **warnings** or **errors**  generated by the compiler  will go
      into a file  in  the  base  directory of  pnc0  in  a  file  called
      **errors**; you can **cat** it to view the information.

  * **make debug**: compile everything with debug support

    * any  **warnings** or **errors**  generated by the compiler  will be
      displayed to the screen as the programs compile.

  * **make clean**: remove all binaries

  * **make save**: make a backup of your current work

=====Command-Line Arguments=====
To  automate our  comparisons,  we  will be  making  use of  command-line
arguments in our programs.

====header files====
We don't need  any extra header files to use  command-line arguments, but
we will need  an additional header file to use  the **atoi(3)** function,
which  we'll use  to  quickly  turn the  command-line  parameter into  an
integer, and that  header file is **stdlib.h**, so be  sure to include it
with the others:

<code c>
#include <stdio.h>
#include <stdlib.h>
</code>

====setting up main()====
To accept (or rather, to gain  access) to arguments given to your program
at runtime,  we need to  specify two  parameters to the  main() function.
While  the names  don't matter,  the types  do.. I  like the  traditional
**argc**  and **argv**  names, although  it is  also common  to see  them
abbreviated as **ac** and **av**.

Please declare your main() function as follows:

<code c>
int main (int argc, char **argv)
</code>

There  are two  very important  variables  involved here  (the types  are
actually  what  are important,  the  names  given  to the  variables  are
actually quite, variable;  you may see other references refer  to them as
things like "ac" and "av"):

  * int argc: the count (an integer)  of tokens given on the command line
    (program name + arguments)

  * <nowiki>char **argv</nowiki>:  an array  of strings  (technically  an
    array  of an  array of  char) that contains "strings" of the  various
    tokens provided on the command-line.

The arguments are  accessible via the argv array, in  the order they were
specified:

  * argv[0]: program invocation (path + program name)
  * argv[1]: our maximum / upper bound
  * argv[2]: should be provided as a 1 for this project
  * argv[3]: conditionally optional; represents lower bound
  * argv[4]: conditionally optional; represents upper bound
  * argv[5]: conditionally optional; represents optimizations, if any

Additionally, let's not  forget the **argc** variable,  an integer, which
contains a  count of arguments (argc  == argument count). If  we provided
argv[0] through argv[4], argc would contain a 5.

===example===
For example, if we were to execute the as follows:

<cli>
lab46:~/src/discrete/eoce/0x3$ ./cnv3 128 1 2 2048 13
</cli>

We'd have:

  * <nowiki>argv[0]</nowiki>: "./cnv3" 
  * <nowiki>argv[1]</nowiki>: "128" (note, NOT integer 128, but a string) 
  * <nowiki>argv[2]</nowiki>: "1"
  * <nowiki>argv[3]</nowiki>: "2" 
  * <nowiki>argv[4]</nowiki>: "2048" 
  * <nowiki>argv[5]</nowiki>: "13" 

and let's not forget:

  * argc: 6   (there are 6 things, argv indexes 0, 1, 2, 3, 4, and 5)

With the  conditionally optional arguments  as part of the  program spec,
for a valid execution of the program, argc could be a value anywhere from
3 to 6.

====Simple argument checks====
While there are  a number of checks  we should perform, one  of the first
should be  a check  to see if  the minimal number  of arguments  has been
provided:

<code c>

    // if  less than 3 arguments (program_name + quantity + argv[2] == 3)
    // have been provided
    //
    if (argc < 3)
    {
        fprintf(stderr, "%s: insufficient number of arguments!\n", argv[0]);
        exit(1);
    }
</code>

Since argv[3] (lower  bound) and argv[4] (upper  bound) are conditionally
optional, it wouldn't make sense to  check for them in the overall count.
But  we  can and  do  still  want  to  stategically utilize  **argc**  to
determine if an argv[3] or argv[4] is present.

====Grab and convert max====
Finally, we  need to put  the argument representing the  maximum quantity
into a variable.

I'd recommend declaring a variable of type **int**.

We will use the **atoi(3)**  function to quickly convert the command-line
arguments into **int** values:

<code c>
    max  = atoi (argv[1]);
</code>

And now we can proceed with the rest of our prime implementation.

=====Timing=====
Often  times,  when  checking  the  efficiency  of  a  solution,  a  good
measurement  (especially  for  comparison),  is  to  time  how  long  the
processing takes.

In order to do  that in our prime number programs, we are  going to use C
library  functions  that  obtain  the  current time,  and  use  it  as  a
stopwatch: we'll grab the time  just before starting processing, and then
once more when  done. The total time will then  be the difference between
the two (end_time - start_time).

We are going  to use the **gettimeofday(2)** function to  aid us in this,
and to use it, we'll need to do the following:

====header file====
In order  to use the  **gettimeofday(2)** function in our  program, we'll
need to include the  **sys/time.h** header file, so be sure  to add it in
with the existing ones:

<code c>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
</code>

====timeval variables====
**gettimeofday(2)** uses a  **struct timeval** data type,  of which we'll
need  to declare  two  variables in  our programs  (one  for storing  the
starting time, and the other for the ending time).

Please declare these  with your other variables, up at  the top of main()
(but still WITHIN main()-- you do not need to declare global variables).

<code c>
    struct timeval time_start; // starting time
    struct timeval time_end;   // ending time
</code>

====Obtaining the time====
To use **gettimeofday(2)**,  we merely place it at the  point in our code
we wish to take the time.

For  our prime  number  programs,  you'll want  to  grab  the start  time
**AFTER** you've  declared variables and processed  arguments, but **JUST
BEFORE** starting the driving loop doing the processing.

That call will look something like this:

<code c>
    gettimeofday(&time_start, 0);
</code>

The ending  time should  be taken immediately  after all  processing (and
prime number output) is completed, and right before we display the timing
information to STDERR:

<code c>
    gettimeofday(&time_end, 0);
</code>

====Displaying the runtime====
Once we have  the starting and ending  times, we can display  this to the
**stderr** file pointer. You'll want this line:

<code c>
fprintf(stderr, "%8.4lf\n",
time_end.tv_sec-time_start.tv_sec+((time_end.tv_usec-time_start.tv_usec)/1000000.0));
</code>

For clarity  sake, that format specifier  is "%8.4lf", where the  "lf" is
"long  float", that  is **NOT**  a number  'one' but  a lowercase  letter
'ell'.

And with  that, we can compute  an approximate run-time of  our programs.
The timing won't necessarily be accurate down to that level of precision,
but it will be informative enough for our purposes.

=====Execution=====

====specified quantity====
Your program output should be as follows (given the specified quantity):

<cli>
lab46:~/src/discrete/eoce/0x3$ ./cnv3 24 1 # same as 24 1 0 0 0
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 
  0.0001
lab46:~/src/discrete/eoce/0x3$ 
</cli>

The execution of  the programs is short and simple-  grab the parameters,
do the processing, produce the output, and then terminate.

====invalid lower bound====
Here's an  example that should generate  an error upon running  (based on
project specifications):

<cli>
lab46:~/src/discrete/eoce/0x3$ ./cnv3 32 1 0
./cnv3: invalid lower bound
lab46:~/src/discrete/eoce/0x3$ 
</cli>

In this case, the program logic should have detected an invalid condition
and bailed  out before prime computations  even began. No timing  data is
displayed, because exiting should occur even prior to that.

====upper bound overriding quantity====
As indicated above, there is  potential interplay with an active quantity
and upper  bound values. Here is  an example where upper  bound overrides
quantity, resulting in an early termination (ie upper bound is hit before
quantity):

<cli>
lab46:~/src/discrete/eoce/0x3$ ./cnv3 128 1 7 23 10
7 11 13 17 19 23
  0.0001
lab46:~/src/discrete/eoce/0x3$ 
</cli>

Also for  fun, I  set the  lower bound  to 7,  so you'll  see computation
starts at 7 (vs. the usual 2).

====In general====
Analyze  the  times  you  see...  do they  make  sense,  especially  when
comparing the algorithm used and  the quantity being processed? These are
related  to  some very  important  core  Computer Science  considerations
we  need  to  be  increasingly  mindful of  as  we  design  our  programs
and  implement  our  solutions. Algorithmic  complexity  and  algorithmic
efficiency will be common themes in all we do.

=====Submission=====
To successfully  complete this  project, the  following criteria  must be
met:

  * Code must compile cleanly (no warnings or errors)
  * Output must be correct, and match that given in sample output above.
  * Code must be nicely and consistently indented
  * Code must utilize the algorithm and optimizations presented above.
  * Code must be commented
  * Output Formatting (including spacing)  of program must conform to the
    provided output (see above).

  * Track/version the source code in your lab46 SEMESTER repository
  * Submit a copy of your source code to me using the **submit** tool.