
Corning Community College

CSCS2650 Computer Organization

PROJECT: Prime Number Computation (PNC0)

OBJECTIVE

Start exploring algorithm/implementation comparison and optimization with respect to various approaches of computing prime numbers.

TASK

Implement two separate, independent programs, one in Vircon32 C and the other in Vircon32 assembly, that each:

  • perform a brute force/“trial-by-division” process on a range of values, 2-N
    • the values for N are some sufficient quantity still small enough to fit within an integer
    • the values for N will have some relationship (powers of 2, powers of 10/magnitudes) that ideally can be computed via some loop/equation (ie 1024, 2048, 4096, 8192, 16384, etc.)
    • the values for N should be large enough that the upper values take a noticeable amount of time to compute (long enough to give a meaningful measurement, but not to exceed 16 seconds)
    • for each value of N:
      • display that N/upper bound
      • tally: display the number of primes identified (2-N)
      • display the amount of time taken to do the total computation for that value of N, out to 3 decimal places
  • display each N value and result in an arrangement on the screen that can be clearly identified and read by the viewer
  • timing should go out, as reasonable, to a few decimal places, and should be consistent across all attempts.
  • timing is on the computational process only, not the display of results.
  • create a graph (using some external tool) that plots the performance of the C and assembly implementations working on identical workloads of this brute force algorithm, according to the various N's and the time each took. Share the graph of your results on the class discord and on the project documentation page.
    • a line graph is the suggested best candidate
  • the assembly version is to be done entirely by hand, and make zero use of C API functions. Just the usual in/out stuff we've been doing.
  • this will not be an interactive program: it starts up, does its thing, outputs its results, then halts.
  • this brute force implementation is meant as our baseline. As such, it should not contain any optimizations or attempted improvements. As we progress through pnc1 and pnc2, this base implementation should be the least efficient. This is important, to allow us to realize the impact of various improvements we will be making in those upcoming projects.

REFERENCE

The following are reference screenshots of what your implementations should approximate.

PNC0

C implementation

EDIT

You will want to go here to edit and fill in the various sections of the document:

PNCX

algorithm: brute force / trial-by-division

variant: naive

The naive implementation is our baseline: implement with no awareness of potential tweaks, improvements, or optimizations. This should be the worst performing when compared to any optimization.

START TIMEKEEPING
NUMBER: FROM 2 THROUGH UPPERBOUND:
    ISPRIME <- YES
    FACTOR: FROM 2 THROUGH NUMBER-1:
        SHOULD FACTOR DIVIDE EVENLY INTO NUMBER:
            ISPRIME <- NO
    PROCEED TO NEXT FACTOR
    SHOULD ISPRIME STILL BE YES:
        INCREMENT OUR PRIME TALLY
PROCEED TO NEXT NUMBER
STOP TIMEKEEPING
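The pseudocode above can be sketched in C roughly as follows. This is a minimal illustration in plain C for a host compiler, not Vircon32-specific code; the function name count_primes_naive is an assumption made for the example, and timekeeping and display are left out.

```c
#include <stdbool.h>

/* Hypothetical helper: counts the primes in 2..upper_bound by naive
   trial division. Every factor from 2 to number-1 is tried, even after
   a divisor has already been found, so this stays a true baseline. */
int count_primes_naive(int upper_bound)
{
    int tally = 0;

    for (int number = 2; number <= upper_bound; number++)
    {
        bool is_prime = true;

        for (int factor = 2; factor <= number - 1; factor++)
        {
            if (number % factor == 0)
            {
                is_prime = false;   /* no break here, by design */
            }
        }

        if (is_prime)
        {
            tally++;
        }
    }

    return tally;
}
```

Calling count_primes_naive(100) should give 25, which is a handy sanity check before moving on to the larger values of N.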

Future Proofing

From the previous dap projects, we learned a lot about stacks and creating subroutines. With that newfound knowledge, we can create a few subroutines to help us with this and later pnc projects. Taking a step back and looking at the output, we see that we are displaying strings, integers, and floats. So that is three subroutines that we can create now and reuse later. We can also create a subroutine for the brute force logic of finding prime numbers. I personally started by creating these as separate files, but found later on that having them in one big file is just easier to work with.

For the print subroutine, we can actually work from two old programs. From any of the dap code that we wrote, grab the entry and exit logic of a subroutine (the pushing and popping of registers), along with the code that gets our X, Y, and “Thing” from the stack. Now for the brains of the subroutine: what actually happens inside it was written in one of our very first classes, when we created a little program that printed Hello World in ASM. We can use that!

Do note that we cannot simply copy and paste all of this and have it suddenly work; some adjustments need to be made, but by this point you are 90% of the way there.

Printing Seconds

When printing the seconds that the brute force takes, there are generally two approaches you may want to consider: using ints and integer arithmetic, or using floats with some fancy math. But before that, how do you get the seconds? It's relatively easy in assembly: all you have to do is record the frame count before and after the run. One example of this is something like below.

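  ; startFrames, endFrames, and seconds are assumed to be memory locations, hence the brackets
  in R8, TIM_FrameCounter
  mov [startFrames], R8    ; save the frame count before the run
 
  call _brute_force
 
  in R8, TIM_FrameCounter
  mov [endFrames], R8      ; save the frame count after the run
 
  ; Later in the code
  mov R8, [startFrames]
  mov R9, [endFrames]
  isub R9, R8       ; subtract the start frames from the end frames
  cif R9            ; convert the elapsed frame count to a float
  fdiv R9, 60.0     ; divide by 60 (frames per second), as a float literal, to get seconds
  mov [seconds], R9 ; store the seconds for use later in the code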

int: integer

Float

There is more than one way to print the runtime with three decimal places using only integers. One way: subtract the initial frame count (before the brute force) from the final frame count (after the brute force) to get the total frames elapsed, multiply that by 1000, then divide by 60 (since there are 60 frames per second). The result is the time in milliseconds. Using your print/itoa subroutine, you can print the whole seconds (milliseconds divided by 1000), then a decimal point, then the remaining three digits (milliseconds modulo 1000, padded with leading zeros). There might be a simpler way that makes more sense to you (such as using floats), but this is what I did.
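To make the integer arithmetic concrete, here is a small sketch in plain host C (not Vircon32 C, which has no snprintf). The names frames_to_millis and format_seconds are hypothetical, and the 60 frames per second figure is the one stated above.

```c
#include <stdio.h>

/* Hypothetical helper: elapsed frames converted to whole milliseconds,
   assuming 60 frames per second. */
int frames_to_millis(int start_frames, int end_frames)
{
    return (end_frames - start_frames) * 1000 / 60;
}

/* Hypothetical helper: writes the time as "seconds.mmm" into buffer.
   The fractional part is zero-padded to three digits, so 1050 ms
   comes out as "1.050" rather than "1.50". */
void format_seconds(int millis, char *buffer, size_t size)
{
    snprintf(buffer, size, "%d.%03d", millis / 1000, millis % 1000);
}
```

The zero padding is the detail most easily missed when doing the same thing by hand with an itoa subroutine in assembly: the digits after the decimal point need leading zeros whenever the remainder is below 100.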

variant: break on composite (BOC)

Break on Composite is one of the first, and simplest, improvements that can be made to a prime number generator.

In the Brute Force algorithm, every number between 2 and the current number is checked to see if it is a factor. Even after a factor is found, it keeps going through the rest of the possible factors.

With “Break on Composite”, once a factor is found the number is known to be composite, so there is no need to check further and we can break out of the inner loop.
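As a rough sketch of the idea in plain C (the function name count_primes_boc is an assumption for illustration), the only change from the naive version is the break inside the inner loop:

```c
#include <stdbool.h>

/* Hypothetical helper: trial division over 2..upper_bound, leaving the
   inner loop as soon as the first factor is found. */
int count_primes_boc(int upper_bound)
{
    int tally = 0;

    for (int number = 2; number <= upper_bound; number++)
    {
        bool is_prime = true;

        for (int factor = 2; factor <= number - 1; factor++)
        {
            if (number % factor == 0)
            {
                is_prime = false;
                break;              /* composite: no need to keep checking */
            }
        }

        if (is_prime)
        {
            tally++;
        }
    }

    return tally;
}
```

Composite numbers now cost only as many iterations as it takes to reach their smallest factor, while primes still run the inner loop to completion.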

variant: odds-only processing

Considering only odd numbers is another simple way to speed up prime number generation. Since even numbers (except 2) are never prime, there is no need to test them at all. Count 2 as a prime up front, then start at 3 and step by 2 so that only odd numbers are generated and tested for primality. This cuts the set of candidates in half, which means fewer iterations and a faster algorithm.
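One possible shape for this in plain C is sketched below. The name count_primes_odds is hypothetical, and stepping the factor by 2 as well is a choice made for this example: an odd number cannot have an even factor, so the even factors can be skipped too.

```c
#include <stdbool.h>

/* Hypothetical helper: odds-only trial division over 2..upper_bound.
   2 is counted up front; every other candidate is odd. */
int count_primes_odds(int upper_bound)
{
    if (upper_bound < 2)
    {
        return 0;
    }

    int tally = 1;                  /* accounts for 2, the only even prime */

    for (int number = 3; number <= upper_bound; number += 2)
    {
        bool is_prime = true;

        /* odd numbers have only odd factors, so step the factor by 2 too */
        for (int factor = 3; factor <= number - 1; factor += 2)
        {
            if (number % factor == 0)
            {
                is_prime = false;
            }
        }

        if (is_prime)
        {
            tally++;
        }
    }

    return tally;
}
```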

variant: sqrt point

By recognizing that factors of a number occur in pairs, one of which is less than or equal to the square root of the number, we can limit our search space. This method significantly reduces the number of iterations required to test primality, as we only need to examine potential factors up to the square root of the number in question. Consequently, computational resources are conserved, resulting in a faster and more streamlined algorithm.
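A minimal sketch of the square root cutoff in plain C follows. The name count_primes_sqrt is an assumption, and the loop condition compares factor * factor against the number instead of calling a square root routine, which is one common way to get the same bound using only integers.

```c
#include <stdbool.h>

/* Hypothetical helper: trial division over 2..upper_bound, testing
   factors only while factor * factor <= number. */
int count_primes_sqrt(int upper_bound)
{
    int tally = 0;

    for (int number = 2; number <= upper_bound; number++)
    {
        bool is_prime = true;

        /* any factor above sqrt(number) pairs with one below it,
           so the smaller partner would already have been found */
        for (int factor = 2; factor * factor <= number; factor++)
        {
            if (number % factor == 0)
            {
                is_prime = false;
            }
        }

        if (is_prime)
        {
            tally++;
        }
    }

    return tally;
}
```

For 97, for example, the inner loop tests only the factors 2 through 9 instead of 2 through 96.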

How to Sqrt

While sqrt might seem easy at first, it is somewhat involved if you don't fully understand what is going on. To best understand how to sqrt, let's look at the function generated in the .asm you get from compiling your C program.

__function_sqrt:
  push BP
  mov BP, SP
  push R1
  mov R0, [BP+2]
  mov R1, 0.5
  pow R0, R1
  pop R1

As we can see, sqrt actually makes use of the pow instruction. Also note that the 0.5 is not fed directly into pow: pow only takes registers, so you'll need a temp register to hold the exponent. At this point, you are probably thinking that this is super easy and not hard to figure out, but there is a catch.

When actually implementing this in your program, you'll need to make use of cif and cfi. Let's take a look at the following example:

_useSqrt:
    ; Square rooting number
    mov R12, R9 ; R12 is a temp register in this case
    cif R12     ; We then convert that int into a float 
    mov R1, 0.5 ; R1 is also temp register and holds the power 
    pow R12, R1 ; Now calling pow with TWO floats
    cfi R12     ; converting back to int

So while this is similar to the generated function, pow operates on floats: we need cif to convert our integer to a float before calling pow, and cfi to convert the result back to an integer afterwards.

variant: break+odds

Combining the Break method with the Odds optimization further enhances the efficiency of the prime number generation algorithm. The Break method ensures that after finding a factor, the algorithm breaks out of the loop, thus eliminating unnecessary iterations and conserving computational resources.

By considering only odd numbers and incorporating the Break method, the Break+Odds optimization significantly reduces the computational load. It starts with 3 as the first odd prime number and subsequently generates and tests odd numbers for primality. With the Break method in place, the algorithm terminates the inner loop as soon as a factor is found, leading to a faster and more efficient process of identifying prime numbers.

The implementation follows the structure of the brute force algorithm but with the additional optimization of skipping even numbers and breaking out of the loop upon finding a factor. This combined approach drastically improves the performance of the prime number generation process.

variant: break+sqrt

Integrating the Break method with the Square Root optimization further refines the prime number generation algorithm. By breaking out of the loop after finding a factor and limiting the search space to the square root of the current number, unnecessary iterations are avoided, resulting in a more streamlined and efficient algorithm.

The Break+Square Root optimization leverages the approximate square root to determine the upper limit of the inner loop while also incorporating the Break method to terminate the loop early upon finding a factor. This combined approach reduces both the computational complexity and runtime of the algorithm.

The implementation follows a similar structure to the brute force algorithm, with the addition of both the Square Root and Break optimizations. By iterating up to the square root of the current number and breaking out of the loop upon finding a factor, the algorithm achieves significant performance improvements compared to the naive approach.
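Written as a standalone predicate rather than a counting loop, the combination might look like the sketch below. This is plain C with a hypothetical function name (is_prime_break_sqrt); a counting loop would simply call it for each number from 2 through the upper bound.

```c
#include <stdbool.h>

/* Hypothetical helper: true if number is prime. Tests factors only up
   to the square root and stops at the first factor found. */
bool is_prime_break_sqrt(int number)
{
    if (number < 2)
    {
        return false;
    }

    bool is_prime = true;

    for (int factor = 2; factor * factor <= number; factor++)
    {
        if (number % factor == 0)
        {
            is_prime = false;
            break;                  /* first factor found: stop early */
        }
    }

    return is_prime;
}
```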

variant: break+odds+sqrt

Break+odds+sqrt is a culmination of all that you have done up to this point, so let's list out the requirements over the normal brute force.

  1. Use sqrt/approximate square root to limit how far up the inner for loop goes.
  2. Start at 3 for both for loops, and increment both loop counters by 2.
  3. As soon as a factor is found (the value is no longer a candidate prime), break out of the inner loop.

This is what your C implementation should look like. This piece of code utilizes the approximate square root, which you can see by the j*j. Since both loops skip the even numbers, the final increment after the loops accounts for 2.

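  bool isprime=true;
  int primecounter=0;
  for(int i=3; i<=n; i+=2){          // odd candidates only; n is the upper bound
      isprime=true;
      for(int j=3; j*j<=i; j+=2){    // odd factors only, up to sqrt(i)
          if(i%j==0){
              isprime=false;
              break;                 // composite: stop checking
          }
      }
      if(isprime){
          primecounter++;
      }
  }
  primecounter++;                    // account for 2, the only even prime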

ALGORITHM: sieve of eratosthenes

variant: baseline soe

The sieve of Eratosthenes is one of the best algorithms for finding prime numbers. You may have noticed that up to this point the naive trial-division code we have written has a complexity of roughly O(n^2). The soe takes the next step and goes to O(n log(log(n))).

Here is how the Sieve of Eratosthenes works:

First, you start with 2, and count up to your upper bound. For this example, let's say it is 40:

    2  3  4  5  6  7  8  9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40

Then, you go through the list and remove multiples of 2. After that, you go to the next remaining number, which you now know is prime. Then, you remove multiples of that number, and so on.

To continue from above, 2 is a prime number, so you leave it alone, and remove any multiples of 2:

    2  3     5     7     9  
11    13    15    17    19  
21    23    25    27    29  
31    33    35    37    39  

Then, you go to the next number: 3. Now you know 3 is a prime number, so you can remove multiples of 3:

    2  3     5     7        
11    13          17    19  
      23    25          29  
31          35    37        

You go through the entire list, and when you get to the end, you are only left with prime numbers:

    2  3     5     7        
11    13          17    19  
      23                29  
31                37        

START TIMEKEEPING
NUMBER: FROM 2 THROUGH UPPERBOUND:
    SHOULD THE NUMBER SLOT BE TRUE:
        VALUE AT NUMBER IS PRIME, INCREMENT TALLY
        MULTIPLE: FROM NUMBER+NUMBER THROUGH UPPERBOUND:
            VALUE AT MULTIPLE IS NOT PRIME
            MULTIPLE IS MULTIPLE PLUS NUMBER
        PROCEED TO NEXT MULTIPLE
    INCREMENT NUMBER
PROCEED TO NEXT NUMBER
STOP TIMEKEEPING
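The baseline sieve can be sketched in plain C along these lines. This is an illustration for a host compiler: the function name count_primes_soe, the heap-allocated array, and the -1 return on allocation failure are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: counts primes in 2..upper_bound with a baseline
   sieve of Eratosthenes. Returns -1 if the array cannot be allocated. */
int count_primes_soe(int upper_bound)
{
    if (upper_bound < 2)
    {
        return 0;
    }

    /* is_prime[n] stays true until n is crossed off as a multiple */
    bool *is_prime = malloc((size_t)(upper_bound + 1) * sizeof *is_prime);
    if (is_prime == NULL)
    {
        return -1;
    }

    for (int n = 0; n <= upper_bound; n++)
    {
        is_prime[n] = true;
    }

    int tally = 0;

    for (int number = 2; number <= upper_bound; number++)
    {
        if (is_prime[number])
        {
            tally++;                /* never crossed off, so it is prime */

            for (int multiple = number + number; multiple <= upper_bound; multiple += number)
            {
                is_prime[multiple] = false;
            }
        }
    }

    free(is_prime);
    return tally;
}
```

With an upper bound of 40, as in the walkthrough above, this returns 12.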

variant: sieve of eratosthenes with sqrt trick (soes)

START TIMEKEEPING
NUMBER: FROM 2 WHILE NUMBER*NUMBER<=UPPERBOUND:
    SHOULD THE NUMBER SLOT BE TRUE:
        MULTIPLE: FROM NUMBER*NUMBER THROUGH UPPERBOUND:
            VALUE AT MULTIPLE IS NOT PRIME
            MULTIPLE IS MULTIPLE PLUS NUMBER
        PROCEED TO NEXT MULTIPLE
    INCREMENT NUMBER
PROCEED TO NEXT NUMBER
NUMBER: FROM 2 THROUGH UPPERBOUND:
    SHOULD THE NUMBER SLOT BE TRUE:
        VALUE AT NUMBER IS PRIME, INCREMENT TALLY
PROCEED TO NEXT NUMBER
STOP TIMEKEEPING
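A sketch of the sqrt variant in plain C is shown below. The function name count_primes_soes is hypothetical. Because the crossing-off loop stops at the square root of the upper bound, this sketch counts the surviving entries in a separate pass over the whole array so that the primes above the square root are included in the tally.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: sieve of Eratosthenes with the sqrt trick.
   Returns -1 if the array cannot be allocated. */
int count_primes_soes(int upper_bound)
{
    if (upper_bound < 2)
    {
        return 0;
    }

    bool *is_prime = malloc((size_t)(upper_bound + 1) * sizeof *is_prime);
    if (is_prime == NULL)
    {
        return -1;
    }

    for (int n = 0; n <= upper_bound; n++)
    {
        is_prime[n] = true;
    }

    /* crossing off: only numbers up to sqrt(upper_bound) are needed, and
       each starts at its own square (smaller multiples are already gone) */
    for (int number = 2; number * number <= upper_bound; number++)
    {
        if (is_prime[number])
        {
            for (int multiple = number * number; multiple <= upper_bound; multiple += number)
            {
                is_prime[multiple] = false;
            }
        }
    }

    /* tally: everything still marked true is prime */
    int tally = 0;
    for (int number = 2; number <= upper_bound; number++)
    {
        if (is_prime[number])
        {
            tally++;
        }
    }

    free(is_prime);
    return tally;
}
```

Note the <= in the outer loop condition: with an upper bound of 49, a strict < would skip 7 and leave 49 marked as prime.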

timing

A potential example using the dokuwiki dataplot plugin for graphing data:

One easy way to get consistent timing is to have two subroutines to call: one to record the start time and one to record the end time. startCycle/endCycle and startFrames/endFrames are each a separate memory address.

  _time_start:
        push BP
        mov BP, SP
        push R0
        in R0, TIM_CycleCounter
        mov [startCycle], R0    ; memory store, hence the brackets
        in R0, TIM_FrameCounter
        mov [startFrames], R0
        pop R0
        mov SP, BP
        pop BP
        ret
 
  _time_end:
        push BP
        mov BP, SP
        push R0
        in R0, TIM_CycleCounter
        mov [endCycle], R0      ; memory store, hence the brackets
        in R0, TIM_FrameCounter
        mov [endFrames], R0
        pop R0
        mov SP, BP
        pop BP
        ret

Array in ASM

While the Sieve of Eratosthenes might be easier to implement compared to pnc1, it does require knowing how to create an array in assembly, which can be challenging if it has been a while since you messed around with them. First we will take a look at the array that we want to create in C, and then convert it into asm.

Here is the set up of the array in C:

    int primeAr[8193]; // Creating array (indices 0 through 8192)
 
    // Filling primeAr with numbers from 2 to upperBound
    for (int pop=2; pop <= upperBound; pop++)
    {
        primeAr[pop]=pop;
    }

Let's look at what is happening here. We have created an array called primeAr, and we have a loop that starts at 2 and goes through upperBound, which could be 1024, 2048, and so on. Inside the loop, we index into the array at position pop and assign it the value pop. This populates our array with 2,3,4,5,…upperBound, which will be important for the logic later down the line.

Now let's look at that in asm:

    mov R8, primeAr ; Address of the start of the array goes into R8
    iadd R8, 2      ; Advance R8 by 2, to the slot for index 2
    mov R9, 2       ; starting point/number (pop in C)
 
_bLoop:
    ; Loop condition (R2 is assumed to hold upperBound)
    mov R13, R9
    ile R13, R2
    jf R13, _bLoopEnd
 
    mov [R8], R9 ; R9 stored into the array at the current index
    iadd R8, 1   ; Increment memory address
    iadd R9, 1   ; Increment loop counter
 
    jmp _bLoop ; Back to start of loop
 
_bLoopEnd:

As you can see, the logic is very similar. Just knowing how to write it might be the hard part, but after a couple of look-overs and some experimenting it will click.

wedge pnc1 runtimes

Wolfgang pncX runtimes

Josiah pnc runtimes

Raw times for my code:

N     C (s)     Asm (s)
1024  0.48333   0.333
2048  1.95      1.399
4096  7.83333   5.599
8192  31.3166   22.366

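pnc0, pnc1, pnc2 (graphs not reproduced here)

jmerri10 runtimes

pnc0, pnc1 (graphs not reproduced here)

jhimmel2 runtimes

PNC0, PNC1 (graphs not reproduced here)

dmorey2 runtimes

pnc0, pnc1, pnc2 (graphs not reproduced here)

rspringe runtimes

pnc0, pnc1, pnc2 (graphs not reproduced here)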

SUBMISSION

To be successful in this project, the following criteria (or their equivalent) must be met:

  • Project must be submitted on time, by the deadline.
    • Late submissions will lose 33% credit per day, with the submission window closing on the 3rd day following the deadline.
  • Executed programs must display in a manner similar to provided output
    • output formatted, where applicable, must match that of project requirements
  • Processing must be correct based on input given and output requested
  • Output, if applicable, must be correct based on values input
  • Code must be nicely and consistently indented
  • Code must be consistently written, to strive for readability from having a consistent style throughout
  • Code must be commented
    • Any “to be implemented” comments MUST be removed
      • these “to be implemented” comments, if still present at evaluation time, will result in points being deducted.
    • Sufficient comments explaining the point of provided logic MUST be present
  • No global variables (without instructor approval), no goto statements, no calling of main()!
  • Track/version the source code in your lab46 semester repository
  • Submit a copy of your source code to me using the submit tool by the deadline.

Submit Tool Usage

Let's say you have completed work on the project, and are ready to submit, you would do the following:

lab46:~/src/SEMESTER/DESIG/PROJECT$ submit DESIG PROJECT file1 file2 file3 ... fileN

You should get some sort of confirmation indicating successful submission if all went according to plan. If not, check for typos and/or locational mismatches.

RUBRIC

I'll be evaluating the project based on the following criteria:

260:pnc0:final tally of results (260/260)
*:pnc0:submitted C and assembly implementations [26/26]
*:pnc0:each implementation builds cleanly [26/26]
*:pnc0:output conforms to specifications [52/52]
*:pnc0:processing is correct, and to specifications [52/52]
*:pnc0:no optimizations or improvements on the process [26/26]
*:pnc0:graph produced from timing data produced [26/26]
*:pnc0:graph posted to discord and documentation page [26/26]
*:pnc0:timing data is taken out to 3 decimal places [26/26]

Pertaining to the collaborative authoring of project documentation

  • each class member is to participate in the contribution of relevant information and formatting of the documentation
    • minimal member contributions consist of:
      • near the class average edits (a value of at least four productive edits)
      • near the class average content change (a value of at least 1024 bytes (absolute value of data content change))
      • no zero-sum commits (adding in one commit then later removing in its entirety for the sake of satisfying edit requirements)
    • adding and formatting data in an organized fashion, aiming to create an informative and readable document that anyone in the class can reference
    • content contributions will be factored into a documentation coefficient, a value multiplied against your actual project submission to influence the end result:
      • no contributions, coefficient is 0.50
      • less than minimum contributions is 0.75
      • met minimum contribution threshold is 1.00

Additionally

  • Solutions not abiding by the spirit of the project will be subject to a 50% overall deduction
  • Solutions not utilizing descriptive why and how comments will be subject to a 25% overall deduction
  • Solutions not utilizing indentation to promote scope and clarity or otherwise maintaining consistency in code style and presentation will be subject to a 25% overall deduction
  • Solutions not organized and easy to read (assume a terminal at least 90 characters wide, 40 characters tall) are subject to a 25% overall deduction
haas/spring2024/comporg/projects/pnc0.txt · Last modified: 2024/04/10 21:09 by 127.0.0.1