GMKtec K10

Benchmarks against an HP ProLiant DL380 Gen9

Green:

  • Processor: Intel Core i9-13900HK @ 5.20GHz (14 Cores / 20 Threads)
  • Motherboard: GMKtec (NucBox K10 0.12 BIOS)
  • Chipset: Intel Alder Lake PCH
  • Memory: 2 x 32 GB DDR5-5200MT/s
  • Disk: 1000GB CT1000E100SSD8
  • Graphics: Intel Raptor Lake-P [Iris Xe] (1500MHz)
  • Audio: Conexant SN6140
  • Network: Realtek RTL8125 2.5GbE + Intel Raptor Lake PCH CNVi WiFi
  • OS: Rocky Linux 10.0
  • Kernel: 6.12.0-55.22.1.el10_0.x86_64 (x86_64)
  • Compiler: GCC 14.2.1 20250110
  • File-System: xfs
  • Physical dimensions: 9.8 x 10.3 x 4.2 cm
  • Weight: 2.2kg
  • Power draw (idle): 18W
  • Power draw (max): 120W

Sol:

  • Processor: 2 x Intel Xeon E5-2637 v3 @ 3.70GHz (8 Cores / 16 Threads)
  • Motherboard: HP ProLiant DL380 Gen9 (P89 BIOS)
  • Chipset: Intel Xeon E7 v3/Xeon
  • Memory: 16 x 16 GB DDR4-2133MT/s 752369-081
  • Disk: 2400GB LOGICAL VOLUME (8x600GB 10K SAS RAID10 Storage Controller HP Smart Array P440ar)
  • Graphics: Matrox MGA G200EH
  • Network: 4 x Broadcom NetXtreme BCM5719 PCIe
  • OS: Debian 12
  • Kernel: 6.8.12-9-pve (x86_64)
  • Compiler: GCC 12.2.0
  • File-System: ext4
  • Physical dimensions: 8.73 x 44.55 x 67.94 cm
  • Weight: 23.6 kg
  • Power draw (idle): 112W
  • Power draw (max): 500W

CPU Benchmarks

Sol has: 2 x Intel Xeon E5-2637 v3 @ 3.70GHz (8 Cores / 16 Threads)

Green has: Intel Core i9-13900HK @ 5.20GHz (14 Cores / 20 Threads)

Results on openbenchmarking.org:

Test Green Sol Difference
x265 4K 5.91 fps 5.55 fps ~1.1x faster
x265 1080p 20.98 fps 15.57 fps ~1.3x faster
7-Zip Compression 90,807 MIPS 48,654 MIPS ~1.9x faster
7-Zip Decompression 57,362 MIPS 42,452 MIPS ~1.4x faster
Kernel Compile time 122.9 s 227.1 s ~1.8x faster
OpenSSL SHA256 12.9 GB/s 2.3 GB/s ~5.6x faster
OpenSSL SHA512 4.33 GB/s 2.43 GB/s ~1.8x faster
RSA4096 Sign 2095 ops/s 1299 ops/s ~1.6x faster
RSA4096 Verify 132,978 ops/s 87,888 ops/s ~1.5x faster
AES-128-GCM 65.8 GB/s 28.4 GB/s ~2.3x faster
AES-256-GCM 57.3 GB/s 21.9 GB/s ~2.6x faster
ChaCha20 26.4 GB/s 27.0 GB/s ~1.0x (equal)
ChaCha20-Poly1305 14.9 GB/s 16.9 GB/s ~1.1x slower (Xeon wins!)
Redis GET (50 conn) 4.33M req/s 2.15M req/s ~2.0x more
Redis GET (500 conn) 2.66M req/s 1.96M req/s ~1.4x more
Redis GET (1000 conn) 2.32M req/s 2.18M req/s ~1.1x more
Redis SET (50 conn) 2.79M req/s 1.69M req/s ~1.6x more
Redis SET (500 conn) 2.24M req/s 1.66M req/s ~1.3x more
Redis SET (1000 conn) 2.31M req/s 1.69M req/s ~1.4x more
Redis LPOP (50 conn) 4.43M req/s 2.42M req/s ~1.8x more
Redis LPOP (500 conn) 3.14M req/s 2.46M req/s ~1.3x more
Redis LPOP (1000 conn) 2.72M req/s 1.55M req/s ~1.8x more
Redis LPUSH (50 conn) 2.75M req/s 1.12M req/s ~2.5x more
Redis LPUSH (500 conn) 1.79M req/s 1.54M req/s ~1.2x more
Redis LPUSH (1000 conn) 1.79M req/s 1.49M req/s ~1.2x more
Redis SADD (50 conn) 3.07M req/s 1.93M req/s ~1.6x more
Redis SADD (500 conn) 2.57M req/s 1.85M req/s ~1.4x more
Redis SADD (1000 conn) 2.48M req/s 1.97M req/s ~1.3x more
MariaDB (1 client) 2685 QPS 207 QPS ~13.0x more
MariaDB (32 clients) 2055 QPS 103 QPS ~20.0x more
MariaDB (64 clients) 1621 QPS 76 QPS ~21.0x more
MariaDB (128 clients) 820 QPS 44 QPS ~18.6x more
MariaDB (256 clients) 397 QPS 7 QPS ~57x more
MariaDB (512 clients) 184 QPS 3 QPS ~61x more
MariaDB (1024-8192 clients) ~84-83 QPS ~3 QPS ~28x more

CPU Benchmarks with only P-cores

Since the i9-13900HK features two kinds of cores, performance (P) cores and efficiency (E) cores (the latter derived from Intel's Atom line, hence cpu_atom in sysfs), it provided an interesting opportunity to test performance with the E-cores switched off.

Check for E-cores: cat /sys/devices/cpu_atom/cpus

Run task on specific cores (0-11 are P-cores in this case): taskset -c 0-11 <COMMAND>
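Putting the two together, a minimal sketch of how a run can be kept on the P-cores only (the CPU ranges are the ones reported on this machine; the openssl invocation is just an illustrative workload, not the PTS profile):

    cat /sys/devices/cpu_atom/cpus      # E-core threads, here 12-19
    cat /sys/devices/cpu_core/cpus      # P-core threads, here 0-11
    taskset -c 0-11 openssl speed -evp aes-256-gcm -multi 12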

Results on openbenchmarking.org: https://openbenchmarking.org/result/2509142-NE-CPUGREENN25

Test All cores Only P-cores Difference
x265 4K 5.91 fps 4.24 fps ~1.39x faster
x265 1080p 20.98 fps 17.51 fps ~1.20x faster
7-Zip Compression 90,807 MIPS 64,768 MIPS ~1.40x faster
7-Zip Decompression 57,362 MIPS 37,612 MIPS ~1.53x faster
Linux Kernel Compile 122.90 s 178.45 s ~1.45x faster
OpenSSL SHA256 12.90 GB/s 7.45 GB/s ~1.73x faster
OpenSSL SHA512 4.33 GB/s 2.66 GB/s ~1.63x faster
OpenSSL RSA4096 Sign 2,095 /s 1,625 /s ~1.29x faster
OpenSSL RSA4096 Verify 132,978 /s 104,828 /s ~1.27x faster
OpenSSL ChaCha20 26.41 GB/s 18.92 GB/s ~1.40x faster
OpenSSL AES-128-GCM 65.83 GB/s 38.84 GB/s ~1.70x faster
OpenSSL AES-256-GCM 57.26 GB/s 33.47 GB/s ~1.71x faster
OpenSSL ChaCha20-Poly1305 14.91 GB/s 11.60 GB/s ~1.28x faster
Redis GET (50 conns) 4,334,402 1,038,531 ~4.17x faster
Redis SET (50 conns) 2,789,408 1,037,535 ~2.69x faster
Redis GET (500 conns) 2,664,632 1,756,982 ~1.52x faster
Redis LPOP (50 conns) 4,426,627 796,807 ~5.55x faster
Redis SET (500 conns) 2,238,991 1,425,186 ~1.57x faster
Redis LPOP (1000 conns) 2,720,310 1,467,308 ~1.85x faster
Redis LPUSH (50 conns) 2,748,150 688,325 ~3.99x faster
Redis SADD (500 conns) 2,567,202 1,891,882 ~1.36x faster

Multi-threaded tasks benefit significantly from the E-cores. Decompression sees a bigger relative improvement, likely due to better multi-thread scaling, and using all cores cuts the kernel compile time by roughly 30% (178.5 s down to 122.9 s).
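As a rough way to reproduce the compile comparison outside the Phoronix Test Suite (the defconfig target and -j values below are assumptions, not the exact PTS profile):

    make defconfig
    time make -j20                       # all P+E threads
    make clean
    time taskset -c 0-11 make -j12       # P-core threads only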

Single-threaded or lightly threaded tasks benefit less but still gain some improvement when E-cores assist background threads.

Redis throughput improves dramatically with E-cores enabled, especially at lower concurrency where single-thread performance matters less, and total parallelism dominates.

For mixed workloads (databases, Redis, compression, video encoding), enabling all cores is needed for maximum performance. Disabling E-cores mainly limits total throughput for parallel tasks.

CPU Benchmarks with only one CPU on dual socket

Since Sol has two CPU sockets, it was interesting to run the benchmarks on a single CPU as well: this gives a more equal core/thread count between the two systems and shows how much the second socket (and NUMA effects) contributes to the results.

Run task only on first CPU: numactl --cpubind=0 --membind=0 <COMMAND>
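A quick way to check the topology first and then pin a whole run to one socket; the 7z benchmark at the end is just an example workload, not the PTS profile:

    numactl --hardware                       # list NUMA nodes with their CPUs and memory
    numactl --cpubind=0 --membind=0 7z b     # run the built-in 7-Zip benchmark on node 0 only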

Results on openbenchmarking.org: https://openbenchmarking.org/result/2509146-NE-CPUSOLSIN78

Test Dual CPU Single CPU socket Difference
x265 4K (Bosphorus) 5.55 fps 6.16 fps ~0.90x (dual slower)
x265 1080p (Bosphorus) 15.57 fps 15.84 fps ~0.98x (similar)
7-Zip Compression 48654 MIPS 25680 MIPS ~1.90x faster
7-Zip Decompression 42452 MIPS 21362 MIPS ~1.99x faster
Linux Kernel Build (defconfig) 227.11 s (lower=better) 441.74 s ~1.94x faster
OpenSSL SHA256 2.30 GB/s 1.15 GB/s ~2.00x faster
OpenSSL SHA512 2.43 GB/s 1.22 GB/s ~1.99x faster
OpenSSL RSA4096 sign 1299.7 ops/s 651.8 ops/s ~1.99x faster
OpenSSL RSA4096 verify 87888 ops/s 43995 ops/s ~2.00x faster
OpenSSL ChaCha20 27.0 GB/s 13.5 GB/s ~2.00x faster
OpenSSL AES-128-GCM 28.4 GB/s 14.2 GB/s ~2.00x faster
OpenSSL AES-256-GCM 21.9 GB/s 11.0 GB/s ~2.00x faster
OpenSSL ChaCha20-Poly1305 16.9 GB/s 8.48 GB/s ~2.00x faster
Redis GET (50 conn) 2,145,363 req/s 1,496,068 req/s ~1.43x faster
Redis SET (50 conn) 1,688,719 req/s 993,505 req/s ~1.70x faster
Redis LPOP (50 conn) 2,417,681 req/s 1,492,706 req/s ~1.62x faster
Redis SADD (50 conn) 1,926,472 req/s 1,170,568 req/s ~1.65x faster
Redis LPUSH (50 conn) 1,116,586 req/s 898,764 req/s ~1.24x faster
Redis GET (500 conn) 1,963,410 req/s 1,466,446 req/s ~1.34x faster
Redis SET (500 conn) 1,659,920 req/s 1,081,548 req/s ~1.54x faster
Redis LPOP (500 conn) 2,459,230 req/s 1,660,237 req/s ~1.48x faster
Redis SADD (500 conn) 1,849,049 req/s 1,323,014 req/s ~1.40x faster
Redis LPUSH (500 conn) 1,538,552 req/s 964,594 req/s ~1.60x faster
Redis GET (1000 conn) 2,184,990 req/s 1,437,471 req/s ~1.52x faster
Redis SET (1000 conn) 1,688,344 req/s 1,087,756 req/s ~1.55x faster
Redis LPOP (1000 conn) 1,552,459 req/s 1,493,120 req/s ~1.04x faster
Redis SADD (1000 conn) 1,972,121 req/s 1,279,570 req/s ~1.54x faster
Redis LPUSH (1000 conn) 1,491,898 req/s 951,028 req/s ~1.57x faster

Compute-heavy workloads (7-Zip, OpenSSL, kernel build) scale almost perfectly (~2x) with both sockets.

x265 scaling is poor: the single socket actually edges out dual-socket in 4K and is nearly identical in 1080p.
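One way to verify that a pinned run did not silently allocate memory on the other node is numastat; the process name below is only illustrative:

    numastat -p "$(pidof redis-server)"      # per-node memory breakdown for one process
    numastat                                 # system-wide numa_hit / numa_miss counters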

CPU Bench Summary

Test Winner Difference
Video Encoding (x265) Green ~1.1-1.3x faster
Compression (7-Zip) Green ~1.4-1.9x faster
Kernel Compilation Green ~1.8x faster
Cryptography (SHA, RSA, AES) Green ~1.5-5.6x faster
Cryptography (ChaCha20/Poly1305) Sol ~1.1x faster
Redis (in-memory DB) Green ~1.2-2.5x more req/s
MariaDB (SQL DB) Green ~13-61x more QPS

Green's modern architecture gives it far better IPC, and it wins even heavily parallel workloads such as 7-Zip compression, although only 6 of its 14 cores are full P-cores against Sol's 8 Xeon cores.

The cryptography gains are likely due to the SHA extensions and VAES on the Raptor Lake cores (absent on the Haswell-era Xeons) plus much higher AES-NI throughput per core; that also explains why SHA256 jumps ~5.6x while SHA512, which the SHA extensions do not cover, gains only ~1.8x. Surprisingly, ChaCha20 performance is equal or even slightly better on Sol, indicating that Green's crypto acceleration favors AES, while ChaCha20 has no dedicated instructions on either CPU and runs on plain SIMD, where the two Xeons' combined throughput keeps pace. RSA4096 and other public-key operations scale less dramatically (~1.5-1.6x), reflecting their compute-bound nature with little dependency on memory bandwidth.
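The cipher comparison is easy to sanity-check outside PTS with openssl speed; the -multi values here are assumptions and should match each machine's thread count (20 for Green, 16 for Sol):

    openssl speed -evp sha256 -multi 20
    openssl speed -evp aes-256-gcm -multi 20
    openssl speed -evp chacha20-poly1305 -multi 20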

Green consistently outperforms Sol on the Redis benchmarks. However, at higher connection counts the delta shrinks slightly, indicating that the memory and I/O subsystems become the bottleneck at scale.
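A sketch of how numbers in this ballpark can be approximated with the stock redis-benchmark tool; the request count, pipelining and connection counts below are assumptions, not the exact PTS configuration:

    redis-benchmark -t get,set,lpush,lpop,sadd -n 1000000 -c 50 -P 16 -q
    redis-benchmark -t get,set -n 1000000 -c 500 -P 16 -q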

The massive MariaDB difference is not down to the CPU alone: NVMe versus RAID SAS latency matters too, since the CPU cannot shine when disk latency dominates.
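The Phoronix MariaDB test is driven by mysqlslap, so a manual run in the same spirit might look like the following; the auto-generated schema and query counts are assumptions, not the PTS defaults:

    mysqlslap --auto-generate-sql --number-of-queries=10000 \
              --concurrency=1,32,64,128 --iterations=3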

Disk Benchmarks

Sol has: 2400GB LOGICAL VOLUME (8x600GB 10K SAS RAID10 Storage Controller HP Smart Array P440ar) on ext4

Green has: NVMe 1000GB CT1000E100SSD8 on xfs
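For a quick standalone cross-check outside PTS, a fio job roughly matching the sequential 2MB read case could look like this; the file size, runtime, queue depth and I/O engine are assumptions, not the exact PTS job file:

    fio --name=seqread --rw=read --bs=2m --size=4g --direct=1 \
        --ioengine=libaio --iodepth=16 --runtime=60 --time_based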

Results on openbenchmarking.org:

Test Green Sol Difference
SQLite 1 thread 9.747 s 231.45 s ~24x faster
SQLite 2 threads 19.32 s 416.77 s ~22x faster
SQLite 4 threads 18.64 s 556.95 s ~30x faster
SQLite 8 threads 32.27 s 719.51 s ~22x faster
SQLite 16 threads 48.89 s 948.19 s ~19x faster
FIO Random Write 2MB 1.187 GB/s 0.477 GB/s ~2.5x faster
FIO Random Write 4KB 1.176 GB/s 0.489 GB/s ~2.4x faster
FIO Sequential Read 2MB 1.955 GB/s 0.884 GB/s ~2.2x faster
FIO Sequential Read 4KB 1.942 GB/s 0.870 GB/s ~2.2x faster
FIO Sequential Write 2MB 1.523 GB/s 0.737 GB/s ~2.1x faster
FIO Sequential Write 4KB 1.501 GB/s 0.761 GB/s ~2x faster
FS-Mark 1000 Files 829.7 files/s 43.7 files/s ~19x faster
FS-Mark 4000 Files, 32 Dirs 805.8 files/s 44.8 files/s ~18x faster
Dbench 1 client 398.87 MB/s 24.40 MB/s ~16x faster
Dbench 12 clients 2047.02 MB/s 127.63 MB/s ~16x faster
IOR 2MB 2608.21 MB/s 199.94 MB/s ~13x faster
IOR 4MB 2897.53 MB/s 236.61 MB/s ~12x faster
IOR 8MB 3085.04 MB/s 308.99 MB/s ~10x faster
IOR 16MB 3168.04 MB/s 346.90 MB/s ~9x faster
IOR 32MB 2879.00 MB/s 407.15 MB/s ~7x faster
IOR 64MB 2875.59 MB/s 465.83 MB/s ~6x faster
IOR 256MB 3049.90 MB/s 459.17 MB/s ~6.6x faster
IOR 512MB 2986.88 MB/s 409.78 MB/s ~7.3x faster
IOR 1024MB 3391.72 MB/s 382.75 MB/s ~8.9x faster

Disk Benchmark Summary

Test Winner Difference
SQLite (1-16 threads) Green ~19-30x faster
FIO Random Write 2MB / 4KB Green ~2-2.5x faster
FIO Sequential Read 2MB / 4KB Green ~2.2x faster
FIO Sequential Write 2MB / 4KB Green ~2x faster
FS-Mark small files Green ~18-19x faster
Dbench (1/12 clients) Green ~16x faster
IOR small blocks (2-8MB) Green ~10-13x faster
IOR medium blocks (16-64MB) Green ~6-9x faster
IOR large blocks (256-1024MB) Green ~6-9x faster

SQLite single-threaded performance shows a ~24x advantage for Green, and even with multiple threads Green dominates; NVMe latency versus SAS explains most of this.

FIO sequential and random throughput is 2-2.5x higher on Green. NVMe scales better thanks to deep queue depths and modern controller efficiency.

IOR shows Green achieving 6-13x higher throughput, with the delta shrinking for very large I/O blocks (32-1024MB). This suggests that RAID10 overhead on Sol becomes less significant for very large sequential transfers, but NVMe still dominates.

FS-Mark and Dbench show Green is ~16-19x faster. This is the combined effect of NVMe's low latency, XFS efficiency, and the absence of the mechanical seek delays that the 10K SAS disks suffer.

NVMe SSDs provide massive latency and throughput advantages for small and large workloads alike. RAID10 SAS arrays are fine for sustained large-block transfers, but fall behind in low-latency, metadata-heavy scenarios.
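Since most of that gap is access latency rather than bandwidth, it can be seen directly with ioping; the target path below is just a placeholder for whatever volume holds the database:

    ioping -c 10 /var/lib/mysql      # per-request latency on the database volume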

RAM Benchmarks

Sol has: 16 x 16 GB DDR4-2133MT/s 752369-081

Green has: 2 x 32 GB DDR5-5200MT/s
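As a quick cross-check of raw copy bandwidth outside PTS, mbw can be used; the iteration count and array size below are arbitrary:

    mbw -n 10 1024      # 10 iterations over a 1024 MiB array (memcpy/dumb/mcblock tests)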

Results on openbenchmarking.org:

  • https://openbenchmarking.org/result/2509141-NE-RAMSOL48015
  • https://openbenchmarking.org/result/2509145-NE-RAMGREEN277
Test Green Sol Difference
RAMspeed SMP Add (Integer) 49170.75 MB/s 25245.05 MB/s ~1.9x faster
RAMspeed SMP Copy (Integer) 46294.82 MB/s 24593.03 MB/s ~1.9x faster
RAMspeed SMP Scale (Integer) 44536.78 MB/s 22317.15 MB/s ~2x faster
RAMspeed SMP Triad (Integer) 47968.84 MB/s 24709.48 MB/s ~1.9x faster
RAMspeed SMP Average (Integer) 44173.83 MB/s 24336.00 MB/s ~1.8x faster
RAMspeed SMP Add (Floating Point) 45365.78 MB/s 26331.63 MB/s ~1.7x faster
RAMspeed SMP Copy (Floating Point) 43004.43 MB/s 25118.17 MB/s ~1.7x faster
RAMspeed SMP Scale (Floating Point) 42716.91 MB/s 22584.29 MB/s ~1.9x faster
RAMspeed SMP Triad (Floating Point) 44670.74 MB/s 26515.81 MB/s ~1.7x faster
RAMspeed SMP Average (Floating Point) 43906.89 MB/s 25257.18 MB/s ~1.7x faster
Stream Copy 64740.8 MB/s 85814.2 MB/s ~1.3x slower
Stream Scale 54172.5 MB/s 65886.2 MB/s ~1.2x slower
Stream Triad 56556.2 MB/s 71293.8 MB/s ~1.3x slower
Stream Add 56446.8 MB/s 71080.0 MB/s ~1.3x slower
Tinymembench Memcpy 28540.4 MB/s 9522.9 MB/s ~3x faster
Tinymembench Memset 56744.2 MB/s 5950.6 MB/s ~9.5x faster
MBW Memory Copy 26202.42 MiB/s 11409.51 MiB/s ~2.3x faster
MBW Memory Copy Fixed Block 14090.10 MiB/s 4169.46 MiB/s ~3.4x faster
t-test1 Threads 1 18.05 s 32.38 s ~1.8x faster
t-test1 Threads 2 5.814 s 16.51 s ~2.8x faster
CacheBench Read Cache 20550.62 MB/s 9333.97 MB/s ~2.2x faster
CacheBench Write Cache 268593.70 MB/s 41921.13 MB/s ~6.4x faster

RAM Benchmark Summary

Test Winner Difference
RAMspeed SMP (Integer) Green ~1.8-2x faster
RAMspeed SMP (Floating Point) Green ~1.7x faster
Stream Sol ~1.2-1.3x faster
Tinymembench Memcpy Green ~3x faster
Tinymembench Memset Green ~9.5x faster
MBW Memory Copy Green ~2.3x faster
MBW Memory Copy Fixed Block Green ~3.4x faster
t-test1 Threads 1 Green ~1.8x faster
t-test1 Threads 2 Green ~2.8x faster
CacheBench Read Cache Green ~2.2x faster
CacheBench Write Cache Green ~6.4x faster

RAMspeed SMP results favor Green thanks to DDR5's higher frequency and higher per-channel bandwidth.

STREAM favors Sol, likely because of its 4-channel-per-socket DDR4 configuration (16 DIMMs across 2 CPUs) providing higher aggregate bandwidth despite slower individual DIMMs. It shows that a many-channel DDR4 setup can still beat dual-channel DDR5 for large sequential streams, but only for continuous bulk operations.

Green shows an enormous advantage in cache writes, reflecting modern CPU caches with higher associativity and faster L2/L3.

Green's memcpy/memset results show the low-latency benefit for small-block memory operations, and the threaded tests (t-test1) show its superior per-thread latency and IPC, particularly in multi-threaded small-memory scenarios.

Overall, Green benefits from modern DDR5 and low-latency caches for small-block and per-thread memory operations, whereas Sol excels in sustained high-bandwidth workloads thanks to its many-channel DDR4 configuration.

Overall notes

Green's smaller core count is offset by high IPC, modern caches, and DDR5 bandwidth. Many benchmarks (MariaDB, Redis, crypto) show that per-core performance is more important than total threads, especially for latency-sensitive workloads.

MariaDB and SQLite show extreme differences, not just due to CPU but also NVMe vs RAID SAS latency.

Fast RAM also helps hide storage latency: small-block I/O (SQLite, Dbench) benefits from Green's faster memory and cache hierarchy. Sol's many-channel memory helps with sustained sequential FIO reads, but cannot compensate for the mechanical seek times that dominate small-block workloads.

Enterprise Xeons still shine in extreme multi-threaded sequential memory or I/O scenarios, but their architecture is tuned for consistency and parallel throughput, not latency-sensitive operations.