timing comparisons between your primereg/b and primeregba
This optimization employs the approximated square root trick (NOT utilizing an existing square root function, but using simpler logic you implement to approximate the square root point).
Further explanation An optimization to the previous process, which used sqrt(), this variation will do the exact same thing, but without using the sqrt() function. It will approximate the square root.
We know that a square root (especially a whole numbered square root), is when we have whole number factors that are squared. But in addition, only considering the whole number aspect of the square root, we start seeing series of values with the same whole square root value:
lab46:~$ count=0; for ((i=2; i<152; i++)); do printf "[%3d] %2d " "${i}" `echo "sqrt($i)" | bc -q`; let count=count+1; if [ "${count}" -eq 10 ]; then echo; count=0; fi; done; echo [ 2] 1 [ 3] 1 [ 4] 2 [ 5] 2 [ 6] 2 [ 7] 2 [ 8] 2 [ 9] 3 [ 10] 3 [ 11] 3 [ 12] 3 [ 13] 3 [ 14] 3 [ 15] 3 [ 16] 4 [ 17] 4 [ 18] 4 [ 19] 4 [ 20] 4 [ 21] 4 [ 22] 4 [ 23] 4 [ 24] 4 [ 25] 5 [ 26] 5 [ 27] 5 [ 28] 5 [ 29] 5 [ 30] 5 [ 31] 5 [ 32] 5 [ 33] 5 [ 34] 5 [ 35] 5 [ 36] 6 [ 37] 6 [ 38] 6 [ 39] 6 [ 40] 6 [ 41] 6 [ 42] 6 [ 43] 6 [ 44] 6 [ 45] 6 [ 46] 6 [ 47] 6 [ 48] 6 [ 49] 7 [ 50] 7 [ 51] 7 [ 52] 7 [ 53] 7 [ 54] 7 [ 55] 7 [ 56] 7 [ 57] 7 [ 58] 7 [ 59] 7 [ 60] 7 [ 61] 7 [ 62] 7 [ 63] 7 [ 64] 8 [ 65] 8 [ 66] 8 [ 67] 8 [ 68] 8 [ 69] 8 [ 70] 8 [ 71] 8 [ 72] 8 [ 73] 8 [ 74] 8 [ 75] 8 [ 76] 8 [ 77] 8 [ 78] 8 [ 79] 8 [ 80] 8 [ 81] 9 [ 82] 9 [ 83] 9 [ 84] 9 [ 85] 9 [ 86] 9 [ 87] 9 [ 88] 9 [ 89] 9 [ 90] 9 [ 91] 9 [ 92] 9 [ 93] 9 [ 94] 9 [ 95] 9 [ 96] 9 [ 97] 9 [ 98] 9 [ 99] 9 [100] 10 [101] 10 [102] 10 [103] 10 [104] 10 [105] 10 [106] 10 [107] 10 [108] 10 [109] 10 [110] 10 [111] 10 [112] 10 [113] 10 [114] 10 [115] 10 [116] 10 [117] 10 [118] 10 [119] 10 [120] 10 [121] 11 [122] 11 [123] 11 [124] 11 [125] 11 [126] 11 [127] 11 [128] 11 [129] 11 [130] 11 [131] 11 [132] 11 [133] 11 [134] 11 [135] 11 [136] 11 [137] 11 [138] 11 [139] 11 [140] 11 [141] 11 [142] 11 [143] 11 [144] 12 [145] 12 [146] 12 [147] 12 [148] 12 [149] 12 [150] 12 [151] 12
Or, if perhaps we instead order by square root value:
lab46:~$ oldsqrt=$(echo "sqrt(2)" | bc -q); for ((i=2; i<49; i++)); do newsqrt=$(echo "sqrt($i)" | bc -q); if [ "${newsqrt}" -ne "${oldsqrt}" ]; then echo; fi; printf "[%3d] %2d " "${i}" "${newsqrt}"; oldsqrt="${newsqrt}"; done; echo [ 2] 1 [ 3] 1 [ 4] 2 [ 5] 2 [ 6] 2 [ 7] 2 [ 8] 2 [ 9] 3 [ 10] 3 [ 11] 3 [ 12] 3 [ 13] 3 [ 14] 3 [ 15] 3 [ 16] 4 [ 17] 4 [ 18] 4 [ 19] 4 [ 20] 4 [ 21] 4 [ 22] 4 [ 23] 4 [ 24] 4 [ 25] 5 [ 26] 5 [ 27] 5 [ 28] 5 [ 29] 5 [ 30] 5 [ 31] 5 [ 32] 5 [ 33] 5 [ 34] 5 [ 35] 5 [ 36] 6 [ 37] 6 [ 38] 6 [ 39] 6 [ 40] 6 [ 41] 6 [ 42] 6 [ 43] 6 [ 44] 6 [ 45] 6 [ 46] 6 [ 47] 6 [ 48] 6
We see that the square root of 36 is 6, but so is the square root of 37, 38, 39… etc. up until we hit 49 (where the whole number square root increments to 7).
Therefore, if we were checking 42 to be prime, we'd only have to check up to 6.
We don't need a sqrt() function to tell us this, we can determine the approximate square root point ourselves- by squaring the current factor being tested, and so long as it hasn't exceeded the value we're checking, we know to continue.
There are some important lessons at play here:
approximation can be powerful approximation can result in a simpler algorithm, improving runtime sqrt() is more complex than you may be aware, not to mention it is in a function. By avoiding that function call, we eliminate some overhead, and that can make a difference in runtime performance. Depending on how you implement this and the original sqrt() algorithms, this version may have a noticeable performance difference. If, on the other hand, you were really optimal in both implementations, the performance difference may be narrower (if negligible).
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ^ ^ /\ | | || 2^2 3^2 4^2
#!/bin/bash starting=`gettimeofday` ${0}asm ${1} 2>/dev/null end=`gettimeofday` total=`bc <<< "scale=5; ${end} - ${starting}"` printf "%.3lf\n" "${total}" exit 0
Here are my pncrun results:
==========pncrun:[reg]========= qty base ba base ba gcc gcc nasm nasm =============================== 128 0.001 0.000 0.004 0.001 256 0.005 0.000 0.016 0.002 512 0.026 0.000 0.076 0.002 1024 0.127 0.001 0.368 0.004 2048 0.605 0.002 ----- 0.008 4096 ----- 0.005 ----- 0.017 8192 ----- 0.012 ----- 0.041 16384 ----- 0.031 ----- 0.106 32768 ----- 0.082 ----- 0.275 65536 ----- 0.218 ----- 0.727 131072 ----- 0.590 ----- ----- 262144 ----- ----- ----- ----- ===============================
As we can see, the assembly version tends to perform approximately 2-4 times worse than what the compiler does for my C code. The break on composite and square root approximation algorithmic improvements still heavily outweigh the disadvantage of my assembly program, with both of my ba variants performing much better than either of the standard primereg variants.
* Andrei Bratkovski *
My pnc results:
==========pncrun:[reg]========= qty base bs base base gcc gcc nasm nasmba =============================== 128 0.001 0.000 0.004 0.002 256 0.006 0.000 0.020 0.005 512 0.050 0.001 0.095 0.024 1024 0.142 0.003 0.367 0.071 2048 0.672 0.008 ----- 0.301 4096 ----- 0.021 ----- 1.299 8192 ----- 0.061 ----- ----- 16384 ----- 0.177 ----- ----- 32768 ----- 0.510 ----- ----- 65536 ----- ----- ----- ----- ===============================
Based on these results we can see that assembly is better left to the compiler. My C programs were more than 2x faster than my assembly versions. It is clear that while our assembly programs take up much less space than our C programs do, they are much clunkier because we were doing all of the memory allocation, moving, and variable declarations ourselves. With compilers being so optimized, it is doubtful that we can come up with a better version than what gcc turns our C code in to.
lab46:~/src/comporg/spring2018/week8/pnc4$ ./pncrun ==========pncrun:[reg]========= qty base ba base bs NASM NASM gcc gcc =============================== 128 0.004 0.001 0.001 0.000 256 0.016 0.002 0.006 0.000 512 0.074 0.002 0.028 0.000 1024 0.359 0.004 0.135 0.001 2048 ----- 0.009 0.654 0.003 4096 ----- 0.020 ----- 0.009 8192 ----- ----- ----- 0.024 16384 ----- ----- ----- 0.066 32768 ----- ----- ----- 0.183 65536 ----- ----- ----- 0.510 131072 ----- ----- ----- -----
Comparing the baseline primereg programs, it can be concluded that the primereg compiled by gcc ran significantly faster than my assembly compile program. Comparing my optimized programs seem to result in the same results. Both program compiled by gcc ran about 2-4 times faster than my assembly program.
======================================================= gcc nasm gcc nasm Primereg Primereg Primeregba Primeregba Qty : ======================================================= 128 : 0.0043 0.008 0.0011 0.007 256 : 0.0080 0.014 0.0020 0.008 512 : 0.0300 0.047 0.0042 0.013 1024 : 0.1339 0.208 0.0050 0.035 2048 : 0.6130 0.980 0.0074 0.114 4096 : ------ ------ 0.0132 0.487 8192 : ------ ------ 0.0353 2.223 16384 : ------ ------ 0.0605 ----- 32768 : ------ ------ 0.1445 ----- 65536 : ------ ------ 0.3669 ----- 131072: ------ ------ 0.9850 -----
Alright a little late to the party getting these values up sorry about that guys :P I totally forgot. Anyway as you can see from the results nasm is much slower from C (which we've determined before) So with that the big thing here is comparing primereg nasm with primeregba nasm… To which we see pretty big improvements… A little over doubling the Qty.