Renegade SBC benchmarks

Overview

As introduced in another document, a Renegade 4GB SBC was obtained, set up, and made operational for the purpose of running benchmarks (specifically in comparison to the Raspberry Pi Model 3B).

This document will cover the benchmarking phase of the endeavor.

Background

Benchmarking is the process of measuring things for comparison, often with an eye toward performance, efficiency, power utilization, or whatever the desired metric may be.

At the same time, benchmarking is often flawed: it is a careful simulation of comparative activities, and the tests performed aren't necessarily indicative of real-world use.

It is important to keep that in mind: just because something has posted certain benchmark values does not mean it will always conform to those values. Estimated gas mileage on cars is a benchmark too, typically measured for "city driving" (flat terrain with occasional stops) and "highway" (flat terrain with no stops). Take that same car over extensive hills (perhaps your real-world use), and you will find actual performance differs greatly from the advertised benchmarks.

Still, benchmarking is an excellent practice for sharpening your skills in observation, experimentation, consideration, analysis, and even visualization, since it forces you to isolate enough aspects of a thing to make a meaningful comparison.

And in this case, it will likely be beneficial in giving us an impression of the differences in hardware.

Also, when analyzing results, perspective is important. There are two common framings we encounter with many technology benchmarks:

  • lower is better
  • higher is better

But which one applies DEPENDS on what is being measured. If you're measuring how long it takes to DO a task, that will be in seconds, and better performance means accomplishing the task in less time (lower is better). On the other hand, a task can be accomplished faster if more data can be processed at a time; that is measured not in seconds but in a unit of data (like megabytes), and therefore a higher measured value is better.

It is important to understand what is being measured, and in what units, to properly gauge the impact of the endeavor (otherwise you're just babbling memorized numbers and are clueless about their meaning).

Hypothesis

It is my belief, based on reading the technical documentation on the Renegade SBC and what I know of the Raspberry Pi, that:

  • the raspberry pi is intentionally hobbled: it optimizes for price
  • the renegade specifically doesn't cut corners in various areas, leading to higher price and also better performance

Some things we know:

  • the renegade board has 4x the memory, and faster memory than the pi.
  • the renegade board has better network (gig-E), and other I/O facilities that indicate a better I/O infrastructure than the pi. On just I/O alone, the renegade should show marked improvements in its areas of strength.
  • The CPUs are similar, with the renegade being clocked slightly higher. Again, interconnects with the rest of the system will likely prove beneficial for the renegade sbc.
    • The renegade has crypto and video encoding enhancements (hardware accelerated), which when active and supported should offer clear improvements to those workloads.

Some things I discovered:

  • software support on the renegade is still in development. The stabilized custom kernel is a slightly older 4.4.91 (compared to the 4.9 kernel on Raspbian). Lack of proper software (driver) support could give the raspberry pi an edge in some scenarios. As such, I hope to test the renegade board on this baseline 4.4 kernel, and also on kernels (like 4.14 and 4.15) which have incorporated better support for the devices it possesses.
  • 4.4 support for its gigabit ethernet is beyond shoddy. I've had to cap it at 100Mb just to get reliable operation (which would put it at least on par, in some aspects, with the pi3). BUT, poor support could also be a significant handicap (reliable operation does not imply optimal operation).

Experiment

Some variables I intend to test:

  • CPU performance
    • should the renegade's interconnect be that much better, that should result in markedly better performance. If the two are on par, we'll likely see only a small edge here, due to its ~200MHz clock speed advantage over the CPU in the pi3b.
  • network performance
    • 1Gb vs. 100Mb is just a surface detail; there are many other improvements under the hood that go into a networking device, including aspects of I/O and interconnect.
    • as stated above, in some scenarios the renegade is notably hobbled, lacking proper driver support for its networking hardware.
  • storage and I/O
    • USB3 and better designed interconnects/available bandwidth should demonstrate the renegade as the superior machine as far as hauling data around
    • furthermore, it has a higher performance storage interconnect, an eMMC slot, which I also intend to test (also good to compare its eMMC performance to its SD card performance)

Those are the biggies for now. Other opportunities will likely crop up as we go along.

Test 1: Cryptographic Processing

The renegade has hardware extensions to handle some cryptographic operations. I am not sure if support for them has yet been implemented, but I found a cryptographic tool that has benchmarking capability, so it seemed a good place to start.
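
As a quick sanity check before trusting any acceleration claims, one can ask the kernel what it knows. This is purely a sketch, not part of the benchmark proper: whether these flags and drivers show up depends on the board, the kernel version, and whether a 64-bit kernel is running.

# on arm64, the Features line advertises the crypto extensions (aes, pmull, sha1, sha2) when present
grep -m1 Features /proc/cpuinfo

# list the cipher/hash driver implementations the running kernel currently provides;
# hardware-backed drivers only appear here once their modules are loaded
grep '^driver' /proc/crypto | sort -u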

The tool: cryptsetup

The cryptsetup tool:

Description: disk encryption support - startup scripts
 Cryptsetup provides an interface for configuring encryption on block devices
 (such as /home or swap partitions), using the Linux kernel device mapper
 target dm-crypt.

While we're not looking to do actual disk encryption operations (although, truth be told, that would be another good angle to benchmark), the tool itself has a benchmark option, which is what we'll be using.

Results: renegade (with baseline 4.4 kernel)

Running cryptsetup benchmark on the renegade board yields the following results:

# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       186446 iterations per second for 256-bit key
PBKDF2-sha256     361577 iterations per second for 256-bit key
PBKDF2-sha512     137680 iterations per second for 256-bit key
PBKDF2-ripemd160  128754 iterations per second for 256-bit key
PBKDF2-whirlpool   18886 iterations per second for 256-bit key
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b   285.3 MiB/s   342.0 MiB/s
 serpent-cbc   128b           N/A           N/A
 twofish-cbc   128b           N/A           N/A
     aes-cbc   256b   248.6 MiB/s   314.3 MiB/s
 serpent-cbc   256b           N/A           N/A
 twofish-cbc   256b           N/A           N/A
     aes-xts   256b   304.1 MiB/s   310.7 MiB/s
 serpent-xts   256b           N/A           N/A
 twofish-xts   256b           N/A           N/A
     aes-xts   512b   288.2 MiB/s   292.4 MiB/s
 serpent-xts   512b           N/A           N/A
 twofish-xts   512b           N/A           N/A

Notice some of the common units of measurement:

  • iterations per second
  • MiB/s

along with different encryption algorithms, and using those algorithms for encryption vs. decryption.

Let us see how the raspberry pi 3 did:

Results: pi3b

Running cryptsetup benchmark on the pi3b board yields the following results:

# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       114975 iterations per second for 256-bit key
PBKDF2-sha256     159843 iterations per second for 256-bit key
PBKDF2-sha512     114975 iterations per second for 256-bit key
PBKDF2-ripemd160  104025 iterations per second for 256-bit key
PBKDF2-whirlpool   23239 iterations per second for 256-bit key
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b    26.2 MiB/s    29.4 MiB/s
 serpent-cbc   128b           N/A           N/A
 twofish-cbc   128b           N/A           N/A
     aes-cbc   256b    21.5 MiB/s    22.7 MiB/s
 serpent-cbc   256b           N/A           N/A
 twofish-cbc   256b           N/A           N/A
     aes-xts   256b    28.1 MiB/s    28.2 MiB/s
 serpent-xts   256b           N/A           N/A
 twofish-xts   256b           N/A           N/A
     aes-xts   512b    22.8 MiB/s    22.2 MiB/s
 serpent-xts   512b           N/A           N/A
 twofish-xts   512b           N/A           N/A

Results: lab46

More for fun, I also ran a number of these benchmarks on lab46, if only for subjective comparison (most of us are used to the performance/feel of an Intel CPU, and ARM is still very much catching up, as we will likely see).

Here are the numbers for lab46:

# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       914987 iterations per second for 256-bit key
PBKDF2-sha256    1054905 iterations per second for 256-bit key
PBKDF2-sha512     910222 iterations per second for 256-bit key
PBKDF2-ripemd160  679129 iterations per second for 256-bit key
PBKDF2-whirlpool  522199 iterations per second for 256-bit key
Required kernel crypto interface not available.

Interestingly, lab46 does not have a crypto kernel module loaded, so we won't get the disk I/O numbers. Still, the numbers we did get are telling.
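
If one wanted to chase that down, a check along these lines would be a starting point (a sketch; the exact module names involved depend on how cryptsetup was built and how that kernel was configured, so treat them as assumptions):

# see which crypto-related modules are currently loaded
lsmod | grep -E 'dm_crypt|algif|af_alg'

# attempt to load the userspace crypto API / dm-crypt pieces
# (requires root and a kernel built with these as modules)
modprobe -a af_alg algif_skcipher dm_crypt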

Results comparison

Let's place some of these values side-by-side:

(all values in iterations per second)

machine          PBKDF2-sha1   PBKDF2-sha256   PBKDF2-sha512   PBKDF2-ripemd160   PBKDF2-whirlpool
renegade (4.4)        186446          361577          137680             128754              18886
pi3b (4.9)            114975          159843          114975             104025              23239
lab46                 914987         1054905          910222             679129             522199

As largely expected, the renegade board is markedly more powerful on a number of these algorithms… on PBKDF2-sha1 it enjoys a hefty improvement over the pi, and with PBKDF2-sha256 the edge is larger still. As the algorithms get more complex (I assume that is how they are ranked), the lead between the two boards shrinks.

Then, curiously, the "whirlpool" test sees the pi3b with a slight edge. Looking up the whirlpool algorithm, it appears to be a different class of algorithm, which would explain why it is so much more demanding on both machines than the rest.

When considering encryption, there's also the factor of usability (you don't want to make it TOO EASY for attackers, yet you also don't want to hugely inconvenience the user with overbearing processing requirements). Something like PBKDF2-sha512 or PBKDF2-ripemd160 would be a prime consideration for production on these systems (as they are neither the worst nor the best performing).

And we can see that lab46 trounces the two ARM boards in performance. Even on the more intensive whirlpool algorithm (the 'worst' performing on all three), it is better than the others by roughly a factor of 20.

The other results are showing data throughput:

(values in MiB/s; E = encryption, D = decryption)

machine          aes-cbc 128b (E / D)   aes-cbc 256b (E / D)   aes-xts 256b (E / D)   aes-xts 512b (E / D)
renegade (4.4)        285.3 / 342.0          248.6 / 314.3          304.1 / 310.7          288.2 / 292.4
pi3b (4.9)             26.2 /  29.4           21.5 /  22.7           28.1 /  28.2           22.8 /  22.2

And here there is no contest: the renegade soundly bests the pi3b by at least a factor of 10. For encrypted disk operations, the more data a system can process per second, the better the perceived performance (i.e. interactivity wouldn't appear to suffer as much), and we would definitely feel the difference on the sluggish pi3b if we tried to do encrypted disk operations.

Analysis

Even though the renegade's kernel may not yet be optimized for its hardware cryptographic functions, there are clear performance advantages almost across the board in favor of the renegade. And in the lone area where the pi3b currently excels, that algorithm is far less practical than the others (and even then, we're only looking at a roughly 19% improvement in whirlpool by the pi3b).

It will be interesting to see how the renegade performs once proper support is implemented for its hardware resources.

Test 2: Network Performance

Less CPU-specific, measuring network performance speaks to underlying I/O (or, as we will also see, how important it is to have proper support for the hardware so we can adequately measure it).

As stated, in the renegade's 4.4 kernel, the network hardware is NOT fully supported; I actually had to force the speed down to 100Mb so that I wouldn't encounter instabilities. I've also read that improved driver support has been merged as of the 4.14/4.15 kernels, and I've got a 4.14 kernel to potentially test things with (once I successfully boot it, I will be going back and adding additional entries for it).
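
For reference, forcing the link down to 100Mb is typically done with something along these lines (a sketch: eth0 is an assumed interface name, and whether a forced setting sticks is itself driver-dependent):

# disable autonegotiation and pin the link at 100Mb full duplex
ethtool -s eth0 speed 100 duplex full autoneg off

# confirm what the link is actually running at
ethtool eth0 | grep -E 'Speed|Duplex|Auto-negotiation'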

The tool: iperf

iperf is a tool to perform network throughput tests, an ideal way to test how well things may be working (and what they are capable of). While not indicative of everyday use and performance, it gives us an appreciation of its performance range.

To use this, we need to set up a client and a server.

The iperf server

On the tested machine, I started the server as follows:

# iperf -s -i 1

The iperf client

On the tested machine, I started the client as follows:

# iperf -c IP.AD.RE.SS -i 1
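
Since, per the above, both ends run on the machine being tested, a complete run looks something like this (a sketch; 127.0.0.1 stands in for the machine's own address used above):

# start the server in the background, reporting every second
iperf -s -i 1 &

# run the client against the local server (default 10 second test)
iperf -c 127.0.0.1 -i 1

# stop the background server when finished
kill %1

With client and server on the same host, the traffic never leaves the machine, so the numbers largely reflect the TCP/IP stack and memory bandwidth rather than the physical NIC; that is why the figures can exceed the nominal link speed.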

Results: renegade (with 4.4 kernel)

First up, the renegade board:

------------------------------------------------------------
Client connecting to IP.AD.RE.SS, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  3] local IP.AD.RE.SS port 37268 connected with IP.AD.RE.SS port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   290 MBytes  2.43 Gbits/sec
[  3]  1.0- 2.0 sec   294 MBytes  2.46 Gbits/sec
[  3]  2.0- 3.0 sec   293 MBytes  2.46 Gbits/sec
[  3]  3.0- 4.0 sec   293 MBytes  2.46 Gbits/sec
[  3]  4.0- 5.0 sec   294 MBytes  2.46 Gbits/sec
[  3]  5.0- 6.0 sec   294 MBytes  2.46 Gbits/sec
[  3]  6.0- 7.0 sec   294 MBytes  2.47 Gbits/sec
[  3]  7.0- 8.0 sec   294 MBytes  2.47 Gbits/sec
[  3]  8.0- 9.0 sec   294 MBytes  2.47 Gbits/sec
[  3]  9.0-10.0 sec   295 MBytes  2.47 Gbits/sec
[  3]  0.0-10.0 sec  2.87 GBytes  2.46 Gbits/sec

Results: pi3b

Next up, the raspberry pi:

------------------------------------------------------------
Client connecting to IP.AD.RE.SS, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  3] local IP.AD.RE.SS port 37268 connected with IP.AD.RE.SS port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   700 MBytes  5.87 Gbits/sec
[  3]  1.0- 2.0 sec   720 MBytes  6.04 Gbits/sec
[  3]  2.0- 3.0 sec   508 MBytes  4.26 Gbits/sec
[  3]  3.0- 4.0 sec   504 MBytes  4.22 Gbits/sec
[  3]  4.0- 5.0 sec   517 MBytes  4.34 Gbits/sec
[  3]  5.0- 6.0 sec   511 MBytes  4.29 Gbits/sec
[  3]  6.0- 7.0 sec   516 MBytes  4.33 Gbits/sec
[  3]  7.0- 8.0 sec   516 MBytes  4.33 Gbits/sec
[  3]  8.0- 9.0 sec   522 MBytes  4.38 Gbits/sec
[  3]  9.0-10.0 sec   523 MBytes  4.39 Gbits/sec
[  3]  0.0-10.0 sec  5.41 GBytes  4.64 Gbits/sec

Results: lab46

And, to show the scale of different hardware categories:

# iperf -c 10.80.2.46 -i 1
------------------------------------------------------------
Client connecting to 10.80.2.46, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  3] local 10.80.2.46 port 59176 connected with 10.80.2.46 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  6.88 GBytes  59.1 Gbits/sec
[  3]  1.0- 2.0 sec  6.77 GBytes  58.2 Gbits/sec
[  3]  2.0- 3.0 sec  6.64 GBytes  57.0 Gbits/sec
[  3]  3.0- 4.0 sec  6.78 GBytes  58.2 Gbits/sec
[  3]  4.0- 5.0 sec  6.71 GBytes  57.6 Gbits/sec
[  3]  5.0- 6.0 sec  6.79 GBytes  58.3 Gbits/sec
[  3]  6.0- 7.0 sec  6.63 GBytes  56.9 Gbits/sec
[  3]  7.0- 8.0 sec  6.72 GBytes  57.7 Gbits/sec
[  3]  8.0- 9.0 sec  6.52 GBytes  56.0 Gbits/sec
[  3]  9.0-10.0 sec  6.78 GBytes  58.2 Gbits/sec
[  3]  0.0-10.0 sec  67.2 GBytes  57.7 Gbits/sec

This also raises the question: how much is minimally needed for a useful computing experience? Lab46 represents a likely obscene excess, especially considering our outbound internet connections. Who here has a 55Gb internet connection? I barely have a 15Mb connection at home, and our campus connection to the LAIR is only 20Mb.

Still, recognizing the capacity for bandwidth also describes the general state of the machine. The more data it can haul, the more it takes to saturate it; as long as that capacity stays consistently above our actual usage, our experience is a pleasant one (which would make even the raspberry pi an ideal platform for our networking endeavors).

Test 3: sysbench cpu benchmarking (primes)

I discovered the sysbench benchmarking tool, which has a suite of different tests to perform.

We will start with the cpu test, which involves a prime number calculation (of all things! Many of us have considerable experience and familiarity with such things).

Another factor that comes into play with CPUs (especially modern CPUs) is cores/execution units/threads.

This tool lets me specify the number of threads, so I have done so for the values of 1, 2, 4, 8, 16. I expect we will see a “sweet spot” of performance, where adding on more threads will not improve performance, and at some point will start to diminish performance.

The specific test is invoked by:

# sysbench --test=cpu --num-threads=# run

And produces output of the form:

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: #

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          X.YYYYs
    total number of events:              10000
    total time taken by event execution: X.YYYY
    per-request statistics:
         min:                                  X.YYms
         avg:                                  X.YYms
         max:                                  X.YYms
         approx.  95 percentile:               X.YYms

Threads fairness:
    events (avg/stddev):           XXXX.0000/0.00
    execution time (avg/stddev):   X.YYYY/0.00

I am specifically going to be comparing the “total time:” values for the different thread values, across the different test environments.

I simply redirected all output for each run into a text file named sysbench.cpu.out.# (where # is the number of threads).
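
One way to script those runs so the file names line up with what follows (a sketch; the zero-padding just keeps the files sorted nicely):

# run the cpu test at each thread count, capturing output per run
# ($((10#$t)) interprets the zero-padded count as base 10)
for t in 01 02 04 08 16; do
    sysbench --test=cpu --num-threads=$((10#$t)) run > sysbench.cpu.out.$t
done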

Results: renegade

Our results:

# grep -H 'total time:' sysbench.cpu.out.*
sysbench.cpu.out.01:    total time:                          147.0570s
sysbench.cpu.out.02:    total time:                          73.8700s
sysbench.cpu.out.04:    total time:                          36.9689s
sysbench.cpu.out.08:    total time:                          37.0156s
sysbench.cpu.out.16:    total time:                          36.8539s

Results: pi3b

# grep -H 'total time:' sysbench.cpu.out.*
sysbench.cpu.out.01:    total time:                          139.0519s
sysbench.cpu.out.02:    total time:                          69.8045s
sysbench.cpu.out.04:    total time:                          34.8948s
sysbench.cpu.out.08:    total time:                          34.8903s
sysbench.cpu.out.16:    total time:                          34.8745s

Results: lab46

# grep -H 'total time:' sysbench.cpu.out.*
sysbench.cpu.out.01:    total time:                          12.0414s
sysbench.cpu.out.02:    total time:                          6.0233s
sysbench.cpu.out.04:    total time:                          3.3988s
sysbench.cpu.out.08:    total time:                          3.3668s
sysbench.cpu.out.16:    total time:                          3.3630s

Results: comparison

Placing values together in a table (in seconds, lower is better):

system            1 thread   2 threads   4 threads   8 threads   16 threads
renegade (4.4)    147.0570     73.8700     36.9689     37.0156      36.8539
pi3b              139.0519     69.8045     34.8948     34.8903      34.8745
lab46              12.0414      6.0233      3.3988      3.3668       3.3630

Analysis of results

It should once again be quite obvious how much of a novelty these ARM boards are when it comes to general computing. Lab46 is roughly a factor of 10-12 better, across the board.

Comparing the two ARM boards, they are consistently on par, though I was surprised to see the pi edging out the renegade by a few seconds at each thread count. Considering the renegade is clocked faster, this is curious, and I'll feel more satisfied when I see results with an optimized kernel. Note that I'm not expecting the renegade to significantly outperform the pi, but you would expect a slight improvement (perhaps ahead by as much as it currently lags behind). Again, this operation is largely CPU-bound, and we're looking at very similar CPUs.

Also, in a more proper setting, we would have run these tests a number of times, and taken an average of the times. This is because other things could have been happening on the system at any given time, and an average would help factor out some of those rough spots (like the renegade board on 8 threads; running that test a second time yielded a value of 36.8052s, which puts it somewhat more in-line with the subtle trend we are seeing in the values).

Test 4: sysbench fileio

Next up is a more I/O specific benchmark: file I/O.

sysbench provides six different top-level fileio tests:

  • seqrd - sequential read
  • seqwr - sequential write
  • seqrewr - sequential rewrite (what is a 'rewrite'?)
  • rndrd - random read
  • rndwr - random write
  • rndrw - random read/write

This touches on two important aspects of file access:

  • sequential access: useful when accessing some large file
  • random access: useful for accessing small files / modeling multiuser interactive usage (you never know what request will come in, when)

As well as the two common means of file access:

  • reading: retrieving data that already exists
  • writing: storing / modifying data in files

We see read/write specs plastered all over drives as marketing sell-points, but again, that's just a benchmark in and of itself… we're pulling back the covers a little bit and adding a bit more detail.

sysbench also lets us specify the number of threads, which can also play a role in fileio. We will be collecting results with thread counts of 1, 2, 4, 8, 16, 32, and 64 (powers of 2 are common examples, as a lot of processing resources come in powers of 2).

Also, there are some additional angles we will be focusing on:

  • primary storage (RAM): we can test memory access this way
  • secondary storage (SD card/SSD): where the system data is typically stored

Then, with systems like the renegade, not only do we have the factor of non-optimized vs. optimized kernel, potentially offering up significant differences in performance, but also the presence of the eMMC, in addition to the SD card. So: a lot of different environments to grab these metrics in.

Again, for now we're generally doing this just to get a sampling of what the renegade is capable of, and pulling in other systems as a means of comparison (the pi3b to show improvement over, and a system like lab46 to show that no matter how much we improve on the pi3b, we're still in a rather niche computing envelope… but at the same time, where do personal computing needs fall between the renegade and a system like lab46?).
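
For reference, each cell in the tables below comes from an invocation along these lines (a sketch following sysbench 0.4's prepare/run/cleanup pattern; the 512MiB total size matches what is used later, and any additional flags used in the actual runs are not recorded here):

# create the 512MiB of test files the fileio test operates on
sysbench --test=fileio --file-total-size=512M prepare

# run one variation (here: sequential read) at a given thread count
sysbench --test=fileio --file-total-size=512M --file-test-mode=seqrd --num-threads=4 run

# remove the test files afterward
sysbench --test=fileio --file-total-size=512M cleanup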

Results: lab46 RAMdisk

All of the fileio values in the tables that follow are total test times in seconds (lower is better).

variation 1 2 4 8 16 32 64
seqrd 0.0872 0.0704 0.0484 0.0402 0.0407 0.0408 0.0412
seqwr 0.1754 0.1676 0.1803 0.1821 0.1815 0.1824 0.1822
seqrewr 0.0894 0.0675 0.0749 0.0751 0.0749 0.0752 0.0761
rndrd 0.0287 0.0221 0.0135 0.0135 0.0142 0.0142 0.0148
rndwr 0.0356 0.0214 0.0139 0.0165 0.0159 0.0170 0.0146
rndrw 0.0365 0.0258 0.0191 0.0188 0.0183 0.0192 0.0198

Results: lab46 /home (w/quota)

I was expecting a performance hit compared to the RAMdisk, but was surprised at just how much I had to scale back. For sanity and care of the machine, I opted not to go beyond thread counts of 8… loads were otherwise easily into the double digits.

variation 1 2 4 8
seqrd 0.0907 0.0678 0.0439 0.0394
seqwr 7.6899 7.3817 7.3391 7.1649
seqrewr 7.0684 6.9147 6.8467 6.8211
rndrd 0.0300 0.0237 0.0149 0.0150
rndwr 14.1868 9.7177 6.6635 5.5644
rndrw 7.2057 5.9529 4.1049 3.3158

One thought that came to mind is that the quota system may have been introducing some performance overhead. So as another variable to test, I did the same thing with quota disabled (actually on the lab46 backup system).

Results: lab46 /home (no quota)

variation 1 2 4 8
seqrd 0.0858 0.0686 0.0400 0.0397
seqwr 6.9705 6.6353 6.4988 6.4122
seqrewr 6.4171 6.2243 6.1765 6.1118
rndrd 0.0293 0.0218 0.0139 0.0140
rndwr 14.2984 9.5623 6.7292 5.3937
rndrw 7.1826 5.8526 4.1655 3.1605

In the end we see that quota's presence has virtually no impact on performance of this benchmark. Good to know.

Results: renegade (un-optimized kernel, SD card)

variation 1 2 4 8
seqrd 0.5670 0.2949 0.2766 0.2849
seqwr 33.2838 25.6750 25.6683 25.6675
seqrewr 24.8401 24.5240 24.5503 24.5604
rndrd 0.1991 0.1290 0.1068 0.1021
rndwr 14.6522 14.2006 23.5479 14.4117
rndrw 6.5032 6.4170 6.3617 6.3976

Results: renegade (un-optimized kernel, eMMC module)

variation 1 2 4 8
seqrd 0.5775 0.2921 0.2701 0.3061
seqwr 16.8084 16.1169 16.0469 16.1425
seqrewr 13.4658 13.3482 13.3417 13.3636
rndrd 0.2010 0.1266 0.1012 0.1006
rndwr 12.6298 10.3616 10.1078 9.9995
rndrw 6.4106 5.4288 5.2387 5.0702

Results: renegade (un-optimized kernel, RAMdisk)

To set up a RAMdisk, I did the following:

# mkdir /mnt/ramdisk
# mount -t tmpfs -o size=1024M tmpfs /mnt/ramdisk

I'm only making use of 512MiB of test data for sysbench, so this could have been a lot closer to 512MiB than 1024MiB… I'll certainly be doing that on the pi3b, since it has far less RAM available.
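
On the pi3b that would look something like the following (a sketch; the 640M figure is an assumption, just leaving a little headroom above the 512MiB of test data):

# a smaller tmpfs, sized for the 512MiB of test files plus some headroom
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=640M tmpfs /mnt/ramdisk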

What will be interesting to see is how well the renegade performs here. It uses DDR4, as does lab46, so we may see a surprising surge in performance in this category.

So, here are the values when the manipulated files are on a RAMdisk:

variation 1 2 4 8 16 32 64
seqrd 0.5977 0.3448 0.2768 0.2777 0.2801 0.2828 0.2863
seqwr 1.1285 0.9816 0.9913 0.9990 1.0005 1.0016 1.0071
seqrewr 0.5525 0.4466 0.4397 0.4449 0.4469 0.4499 0.4574
rndrd 0.2230 0.1344 0.0997 0.1049 0.1236 0.1280 0.1215
rndwr 0.2417 0.1718 0.1432 0.1487 0.1544 0.1481 0.1512
rndrw 0.2783 0.1820 0.1402 0.1479 0.1438 0.1656 0.2043

Results: pi3b (SD card)

I suspect here is where we will see the pi start to fall flat… I/O is definitely NOT its strong suit. The benchmarks ran for an eternity on the pi, compared to even the renegade:

variation 1 2 4 8
seqrd 0.5583 0.3772 0.3610 0.3564
seqwr 47.8903 45.8135 41.3874 42.8373
seqrewr 44.8747 46.3611 48.1207 48.7688
rndrd 0.2211 0.1523 0.1396 0.1517
rndwr 71.6424 67.9128 68.6338 68.7887
rndrw 37.7170 32.1677 36.1581 36.2798

Results: pi3b (RAMdisk)

Created with the same recipe I used on the renegade board.

variation 1 2 4 8 16 32 64
seqrd 0.5695 0.3644 0.3385 0.3403 0.3404 0.3411 0.3328
seqwr 1.0735 0.9783 0.9816 0.9845 0.9821 0.9801 0.9822
seqrewr 0.6841 0.5896 0.5895 0.5906 0.5903 0.5925 0.5932
rndrd 0.1914 0.1202 0.1140 0.1182 0.1113 0.1212 0.1154
rndwr 0.2477 0.1568 0.1458 0.1514 0.1576 0.1603 0.1590
rndrw 0.2275 0.1461 0.1319 0.1428 0.1393 0.1377 0.1463

Curiously, the pi3b held up, turning in performance on par with the renegade. This is surprising at face value, considering the pi uses DDR3. Maybe some level of support still needs to be implemented on the renegade to unlock its memory technology's benefits (or the task at hand simply doesn't stress that advantage, and the pi3's memory isn't as encumbered as its other storage I/O).

Comparison: all systems on sequential read

Again, since these are times (in seconds), lower is better.

host/attr/variation            1        2        4        8       16       32       64
renegade/unoptRAM/seqrd   0.5977   0.3448   0.2768   0.2777   0.2801   0.2828   0.2863
pi3b/RAM/seqrd            0.5695   0.3644   0.3385   0.3403   0.3404   0.3411   0.3328
lab46/RAM/seqrd           0.0872   0.0704   0.0484   0.0402   0.0407   0.0408   0.0412
renegade/unoptSD/seqrd    0.5670   0.2949   0.2766   0.2849       NA       NA       NA
renegade/unopteMMC/seqrd  0.5775   0.2921   0.2701   0.3061       NA       NA       NA
pi3b/SD/seqrd             0.5583   0.3772   0.3610   0.3564       NA       NA       NA
lab46/SSDquota/seqrd      0.0907   0.0678   0.0439   0.0394       NA       NA       NA
lab46/SSDnoquota/seqrd    0.0858   0.0686   0.0400   0.0397       NA       NA       NA
