ChipKerchner:fasterRVVEdges - Branch - OpenMathLib/OpenBLAS

Improve performance on edges of GEMM for RISC-V

#5674

Comparing ChipKerchner:fasterRVVEdges (daa3215) with develop (66cc9f0)

Untouched: 62

Benchmarks

62 total

All benchmarks listed here are in benchmark/pybench/benchmarks/bench_blas.py. Each row shows the two measured times in the order they appear on the page.

Benchmark             Times
test_dgemv[100-c]     150.2 µs    149.7 µs
test_dot[100]         22.6 µs     22.5 µs
test_gesv[100-d]      395 µs      394.3 µs
test_gesv[100-c]      696.2 µs    695.5 µs
test_gesv[100-s]      257.4 µs    257.1 µs
test_gesdd[mn0-d]     122.3 µs    122.3 µs
test_syev[50-s]       1.3 ms      1.3 ms
test_syev[50-d]       1.4 ms      1.4 ms
test_dgemv[1000-s]    7 ms        7 ms
test_syrk[1000-s]     65.4 ms     65.4 ms
test_gemm[1000-c]     426 ms      426 ms
test_syrk[1000-z]     476.4 ms    476.4 ms
test_gesdd[mn1-d]     94 ms       94 ms
test_syrk[100-z]      856.8 µs    856.8 µs
test_gesv[1000-s]     52.6 ms     52.6 ms
test_gemm[1000-s]     117.4 ms    117.4 ms
test_gesv[1000-c]     188.6 ms    188.6 ms
test_gemm[1000-z]     875.6 ms    875.6 ms
test_gemm[100-z]      1.2 ms      1.2 ms
test_dgemv[1000-z]    26.3 ms     26.3 ms
test_gemm[1000-d]     239.4 ms    239.4 ms
test_gesdd[mn1-s]     65.4 ms     65.4 ms
test_syrk[1000-d]     130.3 ms    130.4 ms
test_syrk[1000-c]     227.5 ms    227.6 ms
test_gemm[100-c]      659.9 µs    659.9 µs

Commits


Base

develop

66cc9f0

-0.02%

Fast performing edges for FP32 GEMM of RVV.

376d3a1

20 days ago

by ChipKerchner

-0.07%

Add bool types for C.

6d6af1d

20 days ago

by ChipKerchner

-0.45%

Add K-unrolling to M = 8. Other small changes.

9c16449

19 days ago

by ChipKerchner

+0.1%

Unroll K for N less than or equal to 4.

fda433f

19 days ago

by ChipKerchner

-0.01%

Common unroll code.

eb9bbcc

18 days ago

by ChipKerchner

+0.02%

Preserve K.

b0ee407

18 days ago

by ChipKerchner

+0.11%

Better K.

010f24f

16 days ago

by ChipKerchner

-0.02%

Global optimizations.

f927b94

16 days ago

by ChipKerchner

+0.31%

Use mf2 instead of m1.

79d9fe3

15 days ago

by ChipKerchner

+0.02%

Simpler loops.

477dd40

15 days ago

by ChipKerchner

More global optimization and clean up.

d832ee5

14 days ago

by ChipKerchner

+0.02%

Merge remote-tracking branch 'origin/develop' into fasterRVVEdges

1e48686

13 days ago

by ChipKerchner

-0.02%

Avoid greater than 4 segment load and store penalties by using 2. Fix mf2 length.

a8a00bb

13 days ago

by ChipKerchner

+0.02%

Only initialize unused variables to prevent GCC warnings.

1bb72b2

12 days ago

by ChipKerchner

+0.01%

Fix typo.

ebf4cd1

10 days ago

by ChipKerchner

+0.03%

Fix another typo.

8fc0004

8 days ago

by ChipKerchner

-0.08%

Convert 2X LMUL1 instructions to 1X LMUL2. Improved FP64 GEMM edges - up to more than 3X faster.

d69be17

2 days ago

by ChipKerchner

-0.01%

Remove shadow variable.

daa3215

1 day ago

by ChipKerchner
