======Project: PNC4======
====Things====
timing comparisons between your primereg/b and primeregba
===how to do primeregba:===
This optimization employs the approximated square root trick (**NOT** utilizing an existing square root function, but using simpler logic you implement to approximate the square root point).
Further explanation
An optimization to the previous process, which used sqrt(), this variation will do the exact same thing, but without using the sqrt() function. It will approximate the square root.
We know that a square root (especially a whole numbered square root), is when we have whole number factors that are squared. But in addition, only considering the whole number aspect of the square root, we start seeing series of values with the same whole square root value:
lab46:~$ count=0; for ((i=2; i<152; i++)); do printf "[%3d] %2d " "${i}" `echo "sqrt($i)" | bc -q`; let count=count+1; if [ "${count}" -eq 10 ]; then echo; count=0; fi; done; echo
[ 2] 1 [ 3] 1 [ 4] 2 [ 5] 2 [ 6] 2 [ 7] 2 [ 8] 2 [ 9] 3 [ 10] 3 [ 11] 3
[ 12] 3 [ 13] 3 [ 14] 3 [ 15] 3 [ 16] 4 [ 17] 4 [ 18] 4 [ 19] 4 [ 20] 4 [ 21] 4
[ 22] 4 [ 23] 4 [ 24] 4 [ 25] 5 [ 26] 5 [ 27] 5 [ 28] 5 [ 29] 5 [ 30] 5 [ 31] 5
[ 32] 5 [ 33] 5 [ 34] 5 [ 35] 5 [ 36] 6 [ 37] 6 [ 38] 6 [ 39] 6 [ 40] 6 [ 41] 6
[ 42] 6 [ 43] 6 [ 44] 6 [ 45] 6 [ 46] 6 [ 47] 6 [ 48] 6 [ 49] 7 [ 50] 7 [ 51] 7
[ 52] 7 [ 53] 7 [ 54] 7 [ 55] 7 [ 56] 7 [ 57] 7 [ 58] 7 [ 59] 7 [ 60] 7 [ 61] 7
[ 62] 7 [ 63] 7 [ 64] 8 [ 65] 8 [ 66] 8 [ 67] 8 [ 68] 8 [ 69] 8 [ 70] 8 [ 71] 8
[ 72] 8 [ 73] 8 [ 74] 8 [ 75] 8 [ 76] 8 [ 77] 8 [ 78] 8 [ 79] 8 [ 80] 8 [ 81] 9
[ 82] 9 [ 83] 9 [ 84] 9 [ 85] 9 [ 86] 9 [ 87] 9 [ 88] 9 [ 89] 9 [ 90] 9 [ 91] 9
[ 92] 9 [ 93] 9 [ 94] 9 [ 95] 9 [ 96] 9 [ 97] 9 [ 98] 9 [ 99] 9 [100] 10 [101] 10
[102] 10 [103] 10 [104] 10 [105] 10 [106] 10 [107] 10 [108] 10 [109] 10 [110] 10 [111] 10
[112] 10 [113] 10 [114] 10 [115] 10 [116] 10 [117] 10 [118] 10 [119] 10 [120] 10 [121] 11
[122] 11 [123] 11 [124] 11 [125] 11 [126] 11 [127] 11 [128] 11 [129] 11 [130] 11 [131] 11
[132] 11 [133] 11 [134] 11 [135] 11 [136] 11 [137] 11 [138] 11 [139] 11 [140] 11 [141] 11
[142] 11 [143] 11 [144] 12 [145] 12 [146] 12 [147] 12 [148] 12 [149] 12 [150] 12 [151] 12
Or, if perhaps we instead order by square root value:
lab46:~$ oldsqrt=$(echo "sqrt(2)" | bc -q); for ((i=2; i<49; i++)); do newsqrt=$(echo "sqrt($i)" | bc -q); if [ "${newsqrt}" -ne "${oldsqrt}" ]; then echo; fi; printf "[%3d] %2d " "${i}" "${newsqrt}"; oldsqrt="${newsqrt}"; done; echo
[ 2] 1 [ 3] 1
[ 4] 2 [ 5] 2 [ 6] 2 [ 7] 2 [ 8] 2
[ 9] 3 [ 10] 3 [ 11] 3 [ 12] 3 [ 13] 3 [ 14] 3 [ 15] 3
[ 16] 4 [ 17] 4 [ 18] 4 [ 19] 4 [ 20] 4 [ 21] 4 [ 22] 4 [ 23] 4 [ 24] 4
[ 25] 5 [ 26] 5 [ 27] 5 [ 28] 5 [ 29] 5 [ 30] 5 [ 31] 5 [ 32] 5 [ 33] 5 [ 34] 5 [ 35] 5
[ 36] 6 [ 37] 6 [ 38] 6 [ 39] 6 [ 40] 6 [ 41] 6 [ 42] 6 [ 43] 6 [ 44] 6 [ 45] 6 [ 46] 6 [ 47] 6 [ 48] 6
We see that the square root of 36 is 6, but so is the square root of 37, 38, 39… etc. up until we hit 49 (where the whole number square root increments to 7).
Therefore, if we were checking 42 to be prime, we'd only have to check up to 6.
We don't need a sqrt() function to tell us this, we can determine the approximate square root point ourselves- by squaring the current factor being tested, and so long as it hasn't exceeded the value we're checking, we know to continue.
There are some important lessons at play here:
approximation can be powerful
approximation can result in a simpler algorithm, improving runtime
sqrt() is more complex than you may be aware, not to mention it is in a function. By avoiding that function call, we eliminate some overhead, and that can make a difference in runtime performance.
Depending on how you implement this and the original sqrt() algorithms, this version may have a noticeable performance difference. If, on the other hand, you were really optimal in both implementations, the performance difference may be narrower (if negligible).
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
^ ^ /\
| | ||
2^2 3^2 4^2
====Project specification====
* Doing a c version primeregba if not already done
* Adapting ASM to primeregba
* primes output to STDERR, time to STDOUT
====Wrapper Script:====
#!/bin/bash
starting=`gettimeofday`
${0}asm ${1} 2>/dev/null
end=`gettimeofday`
total=`bc <<< "scale=5; ${end} - ${starting}"`
printf "%.3lf\n" "${total}"
exit 0
====Kris Beykirch====
Here are my pncrun results:
==========pncrun:[reg]=========
qty base ba base ba
gcc gcc nasm nasm
===============================
128 0.001 0.000 0.004 0.001
256 0.005 0.000 0.016 0.002
512 0.026 0.000 0.076 0.002
1024 0.127 0.001 0.368 0.004
2048 0.605 0.002 ----- 0.008
4096 ----- 0.005 ----- 0.017
8192 ----- 0.012 ----- 0.041
16384 ----- 0.031 ----- 0.106
32768 ----- 0.082 ----- 0.275
65536 ----- 0.218 ----- 0.727
131072 ----- 0.590 ----- -----
262144 ----- ----- ----- -----
===============================
As we can see, the assembly version tends to perform approximately 2-4 times worse than what the compiler does for my C code. The break on composite and square root approximation algorithmic improvements still heavily outweigh the disadvantage of my assembly program, with both of my ba variants performing much better than either of the standard primereg variants. \\
*** Andrei Bratkovski *** \\
My pnc results: \\
==========pncrun:[reg]=========
qty base bs base base
gcc gcc nasm nasmba
===============================
128 0.001 0.000 0.004 0.002
256 0.006 0.000 0.020 0.005
512 0.050 0.001 0.095 0.024
1024 0.142 0.003 0.367 0.071
2048 0.672 0.008 ----- 0.301
4096 ----- 0.021 ----- 1.299
8192 ----- 0.061 ----- -----
16384 ----- 0.177 ----- -----
32768 ----- 0.510 ----- -----
65536 ----- ----- ----- -----
===============================
\\
Based on these results we can see that assembly is better left to the compiler. My C programs were more than 2x faster than my assembly versions. It is clear that while our assembly programs take up much less space than our C programs do, they are much clunkier because we were doing all of the memory allocation, moving, and variable declarations ourselves. With compilers being so optimized, it is doubtful that we can come up with a better version than what gcc turns our C code in to.
====mchon====
lab46:~/src/comporg/spring2018/week8/pnc4$ ./pncrun
==========pncrun:[reg]=========
qty base ba base bs
NASM NASM gcc gcc
===============================
128 0.004 0.001 0.001 0.000
256 0.016 0.002 0.006 0.000
512 0.074 0.002 0.028 0.000
1024 0.359 0.004 0.135 0.001
2048 ----- 0.009 0.654 0.003
4096 ----- 0.020 ----- 0.009
8192 ----- ----- ----- 0.024
16384 ----- ----- ----- 0.066
32768 ----- ----- ----- 0.183
65536 ----- ----- ----- 0.510
131072 ----- ----- ----- -----
Comparing the baseline primereg programs, it can be concluded that the primereg compiled by gcc ran significantly faster than my assembly compile program. Comparing my optimized programs seem to result in the same results. Both program compiled by gcc ran about 2-4 times faster than my assembly program.
====bstrong2====
=======================================================
gcc nasm gcc nasm
Primereg Primereg Primeregba Primeregba
Qty :
=======================================================
128 : 0.0043 0.008 0.0011 0.007
256 : 0.0080 0.014 0.0020 0.008
512 : 0.0300 0.047 0.0042 0.013
1024 : 0.1339 0.208 0.0050 0.035
2048 : 0.6130 0.980 0.0074 0.114
4096 : ------ ------ 0.0132 0.487
8192 : ------ ------ 0.0353 2.223
16384 : ------ ------ 0.0605 -----
32768 : ------ ------ 0.1445 -----
65536 : ------ ------ 0.3669 -----
131072: ------ ------ 0.9850 -----
Alright a little late to the party getting these values up sorry about that guys :P I totally forgot. Anyway as you can see from the results nasm is much slower from C (which we've determined before) So with that the big thing here is comparing primereg nasm with primeregba nasm... To which we see pretty big improvements... A little over doubling the Qty.