Table of Contents

Project: PNC4

Things

timing comparisons between your primereg/b and primeregba

how to do primeregba:

This optimization employs the approximated square root trick (NOT utilizing an existing square root function, but using simpler logic you implement to approximate the square root point).

Further explanation An optimization to the previous process, which used sqrt(), this variation will do the exact same thing, but without using the sqrt() function. It will approximate the square root.

We know that a square root (especially a whole numbered square root), is when we have whole number factors that are squared. But in addition, only considering the whole number aspect of the square root, we start seeing series of values with the same whole square root value:

lab46:~$ count=0; for ((i=2; i<152; i++)); do printf "[%3d] %2d " "${i}" `echo "sqrt($i)" | bc -q`; let count=count+1; if [ "${count}" -eq 10 ]; then echo; count=0; fi; done; echo
[  2]  1 [  3]  1 [  4]  2 [  5]  2 [  6]  2 [  7]  2 [  8]  2 [  9]  3 [ 10]  3 [ 11]  3
[ 12]  3 [ 13]  3 [ 14]  3 [ 15]  3 [ 16]  4 [ 17]  4 [ 18]  4 [ 19]  4 [ 20]  4 [ 21]  4
[ 22]  4 [ 23]  4 [ 24]  4 [ 25]  5 [ 26]  5 [ 27]  5 [ 28]  5 [ 29]  5 [ 30]  5 [ 31]  5
[ 32]  5 [ 33]  5 [ 34]  5 [ 35]  5 [ 36]  6 [ 37]  6 [ 38]  6 [ 39]  6 [ 40]  6 [ 41]  6
[ 42]  6 [ 43]  6 [ 44]  6 [ 45]  6 [ 46]  6 [ 47]  6 [ 48]  6 [ 49]  7 [ 50]  7 [ 51]  7
[ 52]  7 [ 53]  7 [ 54]  7 [ 55]  7 [ 56]  7 [ 57]  7 [ 58]  7 [ 59]  7 [ 60]  7 [ 61]  7
[ 62]  7 [ 63]  7 [ 64]  8 [ 65]  8 [ 66]  8 [ 67]  8 [ 68]  8 [ 69]  8 [ 70]  8 [ 71]  8
[ 72]  8 [ 73]  8 [ 74]  8 [ 75]  8 [ 76]  8 [ 77]  8 [ 78]  8 [ 79]  8 [ 80]  8 [ 81]  9
[ 82]  9 [ 83]  9 [ 84]  9 [ 85]  9 [ 86]  9 [ 87]  9 [ 88]  9 [ 89]  9 [ 90]  9 [ 91]  9
[ 92]  9 [ 93]  9 [ 94]  9 [ 95]  9 [ 96]  9 [ 97]  9 [ 98]  9 [ 99]  9 [100] 10 [101] 10
[102] 10 [103] 10 [104] 10 [105] 10 [106] 10 [107] 10 [108] 10 [109] 10 [110] 10 [111] 10
[112] 10 [113] 10 [114] 10 [115] 10 [116] 10 [117] 10 [118] 10 [119] 10 [120] 10 [121] 11
[122] 11 [123] 11 [124] 11 [125] 11 [126] 11 [127] 11 [128] 11 [129] 11 [130] 11 [131] 11
[132] 11 [133] 11 [134] 11 [135] 11 [136] 11 [137] 11 [138] 11 [139] 11 [140] 11 [141] 11
[142] 11 [143] 11 [144] 12 [145] 12 [146] 12 [147] 12 [148] 12 [149] 12 [150] 12 [151] 12

Or, if perhaps we instead order by square root value:

lab46:~$ oldsqrt=$(echo "sqrt(2)" | bc -q); for ((i=2; i<49; i++)); do newsqrt=$(echo "sqrt($i)" | bc -q); if [ "${newsqrt}" -ne "${oldsqrt}" ]; then echo; fi; printf "[%3d] %2d " "${i}" "${newsqrt}"; oldsqrt="${newsqrt}"; done; echo
[  2]  1 [  3]  1
[  4]  2 [  5]  2 [  6]  2 [  7]  2 [  8]  2
[  9]  3 [ 10]  3 [ 11]  3 [ 12]  3 [ 13]  3 [ 14]  3 [ 15]  3
[ 16]  4 [ 17]  4 [ 18]  4 [ 19]  4 [ 20]  4 [ 21]  4 [ 22]  4 [ 23]  4 [ 24]  4
[ 25]  5 [ 26]  5 [ 27]  5 [ 28]  5 [ 29]  5 [ 30]  5 [ 31]  5 [ 32]  5 [ 33]  5 [ 34]  5 [ 35]  5
[ 36]  6 [ 37]  6 [ 38]  6 [ 39]  6 [ 40]  6 [ 41]  6 [ 42]  6 [ 43]  6 [ 44]  6 [ 45]  6 [ 46]  6 [ 47]  6 [ 48]  6

We see that the square root of 36 is 6, but so is the square root of 37, 38, 39… etc. up until we hit 49 (where the whole number square root increments to 7).

Therefore, if we were checking 42 to be prime, we'd only have to check up to 6.

We don't need a sqrt() function to tell us this, we can determine the approximate square root point ourselves- by squaring the current factor being tested, and so long as it hasn't exceeded the value we're checking, we know to continue.

There are some important lessons at play here:

approximation can be powerful approximation can result in a simpler algorithm, improving runtime sqrt() is more complex than you may be aware, not to mention it is in a function. By avoiding that function call, we eliminate some overhead, and that can make a difference in runtime performance. Depending on how you implement this and the original sqrt() algorithms, this version may have a noticeable performance difference. If, on the other hand, you were really optimal in both implementations, the performance difference may be narrower (if negligible).

 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16    
     ^         ^                   /\    
     |         |                   ||    
    2^2       3^2                  4^2   

Project specification

Wrapper Script:

#!/bin/bash
 
starting=`gettimeofday`
 
${0}asm ${1} 2>/dev/null
 
end=`gettimeofday`
 
total=`bc <<< "scale=5; ${end} - ${starting}"`
printf "%.3lf\n" "${total}"
 
exit 0

Kris Beykirch

Here are my pncrun results:

==========pncrun:[reg]=========
    qty                base                  ba                base                  ba
                        gcc                 gcc                nasm                nasm
===============================
    128               0.001               0.000               0.004               0.001
    256               0.005               0.000               0.016               0.002
    512               0.026               0.000               0.076               0.002
   1024               0.127               0.001               0.368               0.004
   2048               0.605               0.002               -----               0.008
   4096               -----               0.005               -----               0.017
   8192               -----               0.012               -----               0.041
  16384               -----               0.031               -----               0.106
  32768               -----               0.082               -----               0.275
  65536               -----               0.218               -----               0.727
 131072               -----               0.590               -----               -----
 262144               -----               -----               -----               -----
===============================

As we can see, the assembly version tends to perform approximately 2-4 times worse than what the compiler does for my C code. The break on composite and square root approximation algorithmic improvements still heavily outweigh the disadvantage of my assembly program, with both of my ba variants performing much better than either of the standard primereg variants.

* Andrei Bratkovski *
My pnc results:

==========pncrun:[reg]=========
    qty                 base                   bs                 base                 base
                         gcc                  gcc                 nasm                nasmba
===============================
    128                0.001                0.000                0.004                0.002
    256                0.006                0.000                0.020                0.005
    512                0.050                0.001                0.095                0.024
   1024                0.142                0.003                0.367                0.071
   2048                0.672                0.008                -----                0.301
   4096                -----                0.021                -----                1.299
   8192                -----                0.061                -----                -----
  16384                -----                0.177                -----                -----
  32768                -----                0.510                -----                -----
  65536                -----                -----                -----                -----
===============================


Based on these results we can see that assembly is better left to the compiler. My C programs were more than 2x faster than my assembly versions. It is clear that while our assembly programs take up much less space than our C programs do, they are much clunkier because we were doing all of the memory allocation, moving, and variable declarations ourselves. With compilers being so optimized, it is doubtful that we can come up with a better version than what gcc turns our C code in to.

mchon

lab46:~/src/comporg/spring2018/week8/pnc4$ ./pncrun
==========pncrun:[reg]=========
      qty             base               ba             base               bs
                      NASM             NASM              gcc              gcc
===============================
    128            0.004            0.001            0.001            0.000
    256            0.016            0.002            0.006            0.000
    512            0.074            0.002            0.028            0.000
   1024            0.359            0.004            0.135            0.001
   2048            -----            0.009            0.654            0.003
   4096            -----            0.020            -----            0.009
   8192            -----            -----            -----            0.024
  16384            -----            -----            -----            0.066
  32768            -----            -----            -----            0.183
  65536            -----            -----            -----            0.510
 131072            -----            -----            -----            -----

Comparing the baseline primereg programs, it can be concluded that the primereg compiled by gcc ran significantly faster than my assembly compile program. Comparing my optimized programs seem to result in the same results. Both program compiled by gcc ran about 2-4 times faster than my assembly program.

bstrong2

=======================================================  
           gcc        nasm        gcc        nasm
        Primereg    Primereg  Primeregba  Primeregba
Qty   : 
=======================================================
128   :  0.0043      0.008        0.0011     0.007
256   :  0.0080      0.014        0.0020     0.008
512   :  0.0300      0.047        0.0042     0.013
1024  :  0.1339      0.208        0.0050     0.035
2048  :  0.6130      0.980        0.0074     0.114
4096  :  ------      ------       0.0132     0.487
8192  :  ------      ------       0.0353     2.223
16384 :  ------      ------       0.0605     -----
32768 :  ------      ------       0.1445     -----
65536 :  ------      ------       0.3669     -----
131072:  ------      ------       0.9850     -----

Alright a little late to the party getting these values up sorry about that guys :P I totally forgot. Anyway as you can see from the results nasm is much slower from C (which we've determined before) So with that the big thing here is comparing primereg nasm with primeregba nasm… To which we see pretty big improvements… A little over doubling the Qty.