Radxa Cubie A7Z #100

@bbllmaster

Basic information

  • Board URL (official): https://docs.radxa.com/cubie/a7z
  • Board purchased from: TODO
  • Board purchase date: TODO
  • Board specs (as tested): TODO
  • Board price (as tested): TODO

Linux/system information

# output of `neofetch`
       _,met$$$$$gg.          radxa@radxa-cubie-a7z
    ,g$$$$$$$$$$$$$$$P.       ---------------------
  ,g$$P"     """Y$$.".        OS: Debian GNU/Linux 11 (bullseye) aarch64
 ,$$P'              `$$$.     Host: sun60iw2
',$$P       ,ggs.     `$$b:   Kernel: 5.15.147-12-a733
`d$$'     ,$P"'   .    $$$    Uptime: 1 hour, 30 mins
 $$P      d$'     ,    $$P    Packages: 1816 (dpkg)
 $$:      $$.   -    ,d$$'    Shell: bash 5.1.4
 $$;      Y$b._   _,d$P'      Resolution: 3456x1080
 Y$$.    `.`"Y$$$$P"'         Theme: Adwaita [GTK3]
 `$$b      "-.__              Icons: Adwaita [GTK3]
  `Y$$                        Terminal: /dev/pts/1
   `Y$$.                      CPU: (8) @ 1.794GHz
     `$$b.                    Memory: 1230MiB / 7934MiB
       `Y$$b.
          `"Y$b._
              `"""
# output of `uname -a`
Linux radxa-cubie-a7z 5.15.147-12-a733 #12 SMP PREEMPT Tue Dec 9 07:24:08 UTC 2025 aarch64 GNU/Linux

System topology

Image

Benchmark results

CPU

Power

  • Idle power draw (at wall): TODO W
    With WiFi: 1.6 W
  • Maximum simulated power draw (stress-ng --matrix 0): 5.6 W
  • During Geekbench multicore benchmark: 4.3 W
  • During top500 HPL benchmark: TODO W
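To put the load readings in context, the extra draw over idle can be computed directly. The sketch below assumes the 1.6 W "With WiFi" reading is the idle baseline (the template's idle field is still TODO); `power_delta` is a hypothetical helper name, not part of any benchmark tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: difference between a load reading and an idle
# reading, both in watts, printed with one decimal place.
power_delta() {
  LC_ALL=C awk -v load="$1" -v idle="$2" 'BEGIN { printf "%.1f\n", load - idle }'
}

idle=1.6   # ASSUMPTION: the "With WiFi" reading is used as the idle baseline

power_delta 5.6 "$idle"   # stress-ng --matrix 0
power_delta 4.3 "$idle"   # Geekbench multicore
```

Under that assumption, stress-ng adds about 4.0 W over idle and the Geekbench multicore run about 2.7 W.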

Disk

Kingston SD Canvas Go! Plus (64 GB)

Benchmark                    Result
iozone 4K random read        11.06 MB/s
iozone 4K random write       11.10 MB/s
iozone 1M random read        78.74 MB/s
iozone 1M random write       46.45 MB/s
iozone 1M sequential read    78.78 MB/s
iozone 1M sequential write   60.03 MB/s

wget https://raw.githubusercontent.com/geerlingguy/pi-cluster/master/benchmarks/disk-benchmark.sh
chmod +x disk-benchmark.sh
sudo MOUNT_PATH=/ TEST_SIZE=1g ./disk-benchmark.sh

Run benchmark on any attached storage device (e.g. eMMC, microSD, NVMe, SATA) and add results under an additional heading.
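The 4K random figures can also be read as approximate IOPS, which is often easier to compare across storage devices. This is a rough sketch only: `mbps_to_iops` is a hypothetical helper, and whether "MB" here means 1,000,000 bytes or 1,048,576 bytes is not stated in the results, so the divisor is a parameter with an assumed default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert throughput (MB/s) at a given block size
# (bytes) into approximate I/O operations per second.
# ASSUMPTION: 1 MB = 1000000 bytes unless a third argument is given;
# pass 1048576 if the figures are actually MiB/s.
mbps_to_iops() {
  LC_ALL=C awk -v mbps="$1" -v bs="$2" -v mb="${3:-1000000}" \
    'BEGIN { printf "%d\n", mbps * mb / bs }'
}

mbps_to_iops 11.06 4096   # 4K random read
mbps_to_iops 11.10 4096   # 4K random write
```

Either unit interpretation puts the card at roughly 2,700 to 2,850 IOPS for 4K random access.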

Network

iperf3 results:

  • iperf3 -c $SERVER_IP: 231 Mbps
  • iperf3 -c $SERVER_IP --reverse: 297 Mbps
  • iperf3 -c $SERVER_IP --bidir: 234 Mbps up, 196 Mbps down

(Be sure to test all interfaces, noting any that are non-functional.)

GPU

glmark2

glmark2-es2 / glmark2-es2-wayland results:

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      Imagination Technologies
    GL_RENDERER:    PowerVR B-Series BXM-4-64
    GL_VERSION:     OpenGL ES 3.2 build 24.2@6603887
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 457 FrameTime: 2.189 ms
[build] use-vbo=true: FPS: 514 FrameTime: 1.947 ms
[texture] texture-filter=nearest: FPS: 564 FrameTime: 1.774 ms
[texture] texture-filter=linear: FPS: 539 FrameTime: 1.856 ms
[texture] texture-filter=mipmap: FPS: 565 FrameTime: 1.772 ms
[shading] shading=gouraud: FPS: 432 FrameTime: 2.319 ms
[shading] shading=blinn-phong-inf: FPS: 436 FrameTime: 2.298 ms
[shading] shading=phong: FPS: 392 FrameTime: 2.556 ms
[shading] shading=cel: FPS: 381 FrameTime: 2.630 ms
[bump] bump-render=high-poly: FPS: 260 FrameTime: 3.854 ms
[bump] bump-render=normals: FPS: 588 FrameTime: 1.703 ms
[bump] bump-render=height: FPS: 566 FrameTime: 1.768 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 386 FrameTime: 2.592 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 207 FrameTime: 4.836 ms
[pulsar] light=false:quads=5:texture=false: FPS: 659 FrameTime: 1.519 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 181 FrameTime: 5.528 ms
[desktop] effect=shadow:windows=4: FPS: 461 FrameTime: 2.172 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 161 FrameTime: 6.214 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 162 FrameTime: 6.192 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 262 FrameTime: 3.825 ms
[ideas] speed=duration: FPS: 568 FrameTime: 1.762 ms
[jellyfish] <default>: FPS: 275 FrameTime: 3.641 ms
[terrain] <default>: FPS: 28 FrameTime: 37.021 ms
[shadow] <default>: FPS: 301 FrameTime: 3.327 ms
[refract] <default>: FPS: 53 FrameTime: 18.995 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 652 FrameTime: 1.535 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 448 FrameTime: 2.234 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 621 FrameTime: 1.611 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 518 FrameTime: 1.931 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 391 FrameTime: 2.562 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 530 FrameTime: 1.888 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 537 FrameTime: 1.865 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 510 FrameTime: 1.961 ms
=======================================================
                                  glmark2 Score: 411 
=======================================================
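The mean of the 33 per-scene FPS values listed above is roughly 412, close to the reported score of 411, which suggests (the output itself does not say so) that the score is approximately the average FPS across scenes. The sketch below recomputes that mean from saved output; `glmark2_mean_fps` and the file name `glmark2.log` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read glmark2 output on stdin and print the number of
# scene lines found and the mean of their FPS values.
glmark2_mean_fps() {
  LC_ALL=C awk '
    # Scene lines look like:
    #   [build] use-vbo=false: FPS: 457 FrameTime: 2.189 ms
    /^\[/ && / FPS: [0-9]+/ {
      for (i = 1; i < NF; i++)
        if ($i == "FPS:") { sum += $(i + 1); n++; break }
    }
    END { if (n > 0) printf "%d scenes, mean FPS %.1f\n", n, sum / n }
  '
}

# Usage (glmark2.log is an assumed file name):
#   glmark2-es2 | tee glmark2.log
#   glmark2_mean_fps < glmark2.log
```

The small gap between the recomputed mean and the printed score is presumably down to rounding of the displayed per-scene FPS values.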


vkmark

vkmark results:

1. Install vkmark with `sudo apt install -y vkmark`
2. Run `vkmark` (with `DISPLAY=:0` prepended if running over SSH)
3. Replace this block of text with the results.

Note: vkmark needs to be compiled from source on Debian 12 and earlier.

GravityMark

GravityMark results:

1. Download the latest version of GravityMark: https://gravitymark.tellusim.com
2. Run `chmod +x [downloaded_filename].run`
3. Run `sudo ./[downloaded_filename].run` and press `y` to accept the terms.
4. Open the link it prints, and run the Benchmark defaults, changing to 720p resolution and 50,000 asteroids.

Note: These benchmarks require an active display on the device. Not all devices may be able to run glmark2-es2, so in that case, make a note and move on!

AI / LLM Inference

Basic AI inference results:

# Install ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download some models
ollama pull llama3.2:3b \
  && ollama pull llama3.1:8b \
  && ollama pull llama2:13b

# Download the benchmarking script
git clone https://github.com/geerlingguy/ai-benchmarks.git
cd ai-benchmarks

# Run benchmark on multiple models
declare -a models=("llama3.2:3b" "llama3.1:8b" "llama2:13b")
for i in "${models[@]}"; do ./obench.sh -m "$i" -c 3 --markdown; done

Note that Ollama will run on the CPU if no valid GPU / drivers are present. Be sure to note whether Ollama runs on the CPU, GPU, or a dedicated NPU.

Memory

tinymembench results:

tinymembench v0.4.10 (simple benchmark for memory throughput and latency)
==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   1930.4 MB/s (1.0%)
 C copy backwards (32 byte blocks)                    :   1927.6 MB/s (1.5%)
 C copy backwards (64 byte blocks)                    :   1938.7 MB/s (0.7%)
 C copy                                               :   3735.1 MB/s (0.5%)
 C copy prefetched (32 bytes step)                    :   1760.6 MB/s (1.0%)
 C copy prefetched (64 bytes step)                    :   3687.9 MB/s (1.0%)
 C 2-pass copy                                        :   2119.8 MB/s (0.3%)
 C 2-pass copy prefetched (32 bytes step)             :   1361.4 MB/s (0.8%)
 C 2-pass copy prefetched (64 bytes step)             :   5372.3 MB/s (34.5%)
 C fill                                               :   8459.6 MB/s (0.5%)
 C fill (shuffle within 16 byte blocks)               :   8457.8 MB/s (0.5%)
 C fill (shuffle within 32 byte blocks)               :   8463.3 MB/s (0.7%)
 C fill (shuffle within 64 byte blocks)               :   8063.6 MB/s (0.8%)
 NEON 64x2 COPY                                       :   5274.0 MB/s (0.9%)
 NEON 64x2x4 COPY                                     :   5277.1 MB/s (0.6%)
 NEON 64x1x4_x2 COPY                                  :   1853.4 MB/s (0.4%)
 NEON 64x2 COPY prefetch x2                           :   4529.7 MB/s (0.7%)
 NEON 64x2x4 COPY prefetch x1                         :   4817.2 MB/s (0.5%)
 NEON 64x2 COPY prefetch x1                           :   4551.9 MB/s (0.4%)
 NEON 64x2x4 COPY prefetch x1                         :   4822.6 MB/s (0.5%)
 ---
 standard memcpy                                      :   5283.0 MB/s (0.5%)
 standard memset                                      :   8449.4 MB/s (0.7%)
 ---
 NEON LDP/STP copy                                    :   5271.6 MB/s (0.9%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :   5230.8 MB/s (0.5%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :   5246.1 MB/s (0.7%)
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   5237.8 MB/s (0.7%)
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   5249.1 MB/s (0.5%)
 NEON LD1/ST1 copy                                    :   5274.1 MB/s (0.5%)
 NEON STP fill                                        :   8472.4 MB/s (0.5%)
 NEON STNP fill                                       :   8462.2 MB/s (1.0%)
 ARM LDP/STP copy                                     :   5274.9 MB/s (0.5%)
 ARM STP fill                                         :   8461.0 MB/s (0.6%)
 ARM STNP fill                                        :   8453.6 MB/s (0.7%)

==========================================================================
== Framebuffer read tests.                                              ==
==                                                                      ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled.       ==
== Writes to such framebuffers are quite fast, but reads are much       ==
== slower and very sensitive to the alignment and the selection of      ==
== CPU instructions which are used for accessing memory.                ==
==                                                                      ==
== Many x86 systems allocate the framebuffer in the GPU memory,         ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
== PCI-E is asymmetric and handles reads a lot worse than writes.       ==
==                                                                      ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer    ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
== performance improvement. For example, the xf86-video-fbturbo DDX     ==
== uses this trick.                                                     ==
==========================================================================

 NEON LDP/STP copy (from framebuffer)                 :   1224.1 MB/s (0.7%)
 NEON LDP/STP 2-pass copy (from framebuffer)          :   1085.2 MB/s (0.3%)
 NEON LD1/ST1 copy (from framebuffer)                 :   1230.4 MB/s (0.6%)
 NEON LD1/ST1 2-pass copy (from framebuffer)          :   1097.2 MB/s (0.3%)
 ARM LDP/STP copy (from framebuffer)                  :   1167.8 MB/s (0.5%)
 ARM LDP/STP 2-pass copy (from framebuffer)           :   1082.3 MB/s (0.4%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    0.0 ns          /     0.0 ns
    131072 :    1.3 ns          /     1.8 ns
    262144 :    2.6 ns          /     3.3 ns
    524288 :   14.5 ns          /    20.4 ns
   1048576 :   21.6 ns          /    26.8 ns
   2097152 :   82.3 ns          /   122.3 ns
   4194304 :  125.1 ns          /   166.4 ns
   8388608 :  149.5 ns          /   183.6 ns
  16777216 :  162.3 ns          /   193.3 ns
  33554432 :  171.5 ns          /   199.7 ns
  67108864 :  179.4 ns          /   207.1 ns
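One way to read the latency table is to find the buffer size at which accesses start to look DRAM-bound. The sketch below prints the first row whose single random read latency exceeds a threshold; `first_block_over` is a hypothetical helper, and the 100 ns threshold in the usage comment is an arbitrary assumption rather than anything defined by tinymembench.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read tinymembench latency rows on stdin and print the
# first block size (in KiB) whose single random read latency exceeds the
# given threshold in nanoseconds.
first_block_over() {
  LC_ALL=C awk -v limit="$1" '
    # Rows look like:
    #   4194304 :  125.1 ns          /   166.4 ns
    $1 ~ /^[0-9]+$/ && $2 == ":" && $4 == "ns" {
      if ($3 + 0 > limit + 0) {
        printf "%d KiB: %s ns\n", $1 / 1024, $3
        exit
      }
    }
  '
}

# Usage (tinymembench.log is an assumed file name; 100 is an assumed threshold):
#   first_block_over 100 < tinymembench.log
```

Applied to the table above with a 100 ns threshold, the first matching row is the 4194304-byte (4096 KiB) buffer at 125.1 ns.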

Core to Core Memory Latency

TODO: If this is a new CPU/SoC, run c2clat to generate a core to core memory access latency graph, and paste it here. For instructions, see: https://gist.github.com/geerlingguy/842974c0e49c201c28f4be54a05cc89c

sbc-bench results

Run sbc-bench and paste a link to the results here:

wget https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh
sudo /bin/bash ./sbc-bench.sh -r

Phoronix Test Suite

Results from pi-general-benchmark.sh:

  • pts/encode-mp3: TODO sec
  • pts/x264 1080p: TODO fps
  • pts/x264 4K: TODO fps
  • pts/phpbench: TODO
  • pts/build-linux-kernel (defconfig): TODO sec
    Kingston SD Canvas Go! Plus
