OpenMathLib
OpenBLAS

Improve performance on edges of GEMM for RISC-V

#5674
Comparing ChipKerchner:fasterRVVEdges (daa3215) with develop (66cc9f0)
Overall change: 0%
Untouched benchmarks: 62

Benchmarks

62 total
All benchmarks listed here are defined in benchmark/pybench/benchmarks/bench_blas.py.

Benchmark             Change   Times (as listed)
test_dgemv[100-c]     0%       150.2 µs / 149.7 µs
test_dot[100]         0%       22.6 µs / 22.5 µs
test_gesv[100-d]      0%       395 µs / 394.3 µs
test_gesv[100-c]      0%       696.2 µs / 695.5 µs
test_gesv[100-s]      0%       257.4 µs / 257.1 µs
test_gesdd[mn0-d]     0%       122.3 µs / 122.3 µs
test_syev[50-s]       0%       1.3 ms / 1.3 ms
test_syev[50-d]       0%       1.4 ms / 1.4 ms
test_dgemv[1000-s]    0%       7 ms / 7 ms
test_syrk[1000-s]     0%       65.4 ms / 65.4 ms
test_gemm[1000-c]     0%       426 ms / 426 ms
test_syrk[1000-z]     0%       476.4 ms / 476.4 ms
test_gesdd[mn1-d]     0%       94 ms / 94 ms
test_syrk[100-z]      0%       856.8 µs / 856.8 µs
test_gesv[1000-s]     0%       52.6 ms / 52.6 ms
test_gemm[1000-s]     0%       117.4 ms / 117.4 ms
test_gesv[1000-c]     0%       188.6 ms / 188.6 ms
test_gemm[1000-z]     0%       875.6 ms / 875.6 ms
test_gemm[100-z]      0%       1.2 ms / 1.2 ms
test_dgemv[1000-z]    0%       26.3 ms / 26.3 ms
test_gemm[1000-d]     0%       239.4 ms / 239.4 ms
test_gesdd[mn1-s]     0%       65.4 ms / 65.4 ms
test_syrk[1000-d]     0%       130.3 ms / 130.4 ms
test_syrk[1000-c]     0%       227.5 ms / 227.6 ms
test_gemm[100-c]      0%       659.9 µs / 659.9 µs
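
Each benchmark above reports two timings and a rounded change of 0%. As a rough illustration of why small timing differences still display as 0%, the sketch below parses a pair of timings and computes a relative difference. It assumes the first value is the base (develop) measurement and the second is the head measurement, and it uses a plain relative difference; the report does not state either point, so CodSpeed's actual formula may differ. The helper names `parse_time` and `relative_change` are hypothetical.

```python
# Hypothetical helpers for illustration; not part of CodSpeed or OpenBLAS.
UNITS = {"ns": 1e-9, "µs": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_time(text: str) -> float:
    """Convert a string such as '150.2 µs' to seconds."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def relative_change(base: str, head: str) -> float:
    """Percentage by which head is faster (+) or slower (-) than base.

    Assumption: a simple (base - head) / base ratio.
    """
    b, h = parse_time(base), parse_time(head)
    return (b - h) / b * 100.0


# test_dgemv[100-c], assuming the order base, head
change = relative_change("150.2 µs", "149.7 µs")
print(f"{change:+.2f}%")  # about a third of a percent, displayed as 0%
```

Under these assumptions, the largest gap in the list (test_dgemv[100-c]) is well under half a percent, which is consistent with every gauge showing 0%.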

Commits

Base: develop (66cc9f0)

Change   Commit    Age           Author         Message
-0.02%   376d3a1   20 days ago   ChipKerchner   Fast performing edges for FP32 GEMM of RVV.
-0.07%   6d6af1d   20 days ago   ChipKerchner   Add bool types for C.
-0.45%   9c16449   19 days ago   ChipKerchner   Add K-unrolling to M = 8. Other small changes.
+0.1%    fda433f   19 days ago   ChipKerchner   Unroll K for N less than or equal to 4.
-0.01%   eb9bbcc   18 days ago   ChipKerchner   Common unroll code.
+0.02%   b0ee407   18 days ago   ChipKerchner   Preserve K.
+0.11%   010f24f   16 days ago   ChipKerchner   Better K.
-0.02%   f927b94   16 days ago   ChipKerchner   Global optimizations.
+0.31%   79d9fe3   15 days ago   ChipKerchner   Use mf2 instead of m1.
+0.02%   477dd40   15 days ago   ChipKerchner   Simplier loops.
0%       d832ee5   14 days ago   ChipKerchner   More global optimzation and clean up.
+0.02%   1e48686   13 days ago   ChipKerchner   Merge remote-tracking branch 'origin/develop' into fasterRVVEdges
-0.02%   a8a00bb   13 days ago   ChipKerchner   Avoid greater than 4 segment load and store penalties by using 2. Fix mf2 length.
+0.02%   1bb72b2   12 days ago   ChipKerchner   Only initialize unused variables to prevent GCC warnings.
+0.01%   ebf4cd1   10 days ago   ChipKerchner   Fix typo.
+0.03%   8fc0004   8 days ago    ChipKerchner   Fix another typo.
-0.08%   d69be17   2 days ago    ChipKerchner   Convert 2X LMUL1 instructions to 1X LMUL2. Improved FP64 GEMM edges - up to more than 3X faster.
-0.01%   daa3215   1 day ago     ChipKerchner   Remove shadow variable.