igzip/riscv64: Optimize adler32_rvv for VLEN=128 #390

Closed
leiwen2025 wants to merge 1 commit into intel:master from leiwen2025:optimize_vlen128_new

@leiwen2025
Contributor

This PR optimizes the adler32_rvv implementation for VLEN=128.

The optimization was benchmarked on the SG2044 platform, where throughput rose from 7541.43 MB/s to 8486.24 MB/s (about 12.5% higher):

SG2044:
        new: adler32_warm: runtime =    3062392 usecs, bandwidth 25988 MB in 3.0624 sec = 8486.24 MB/s
        old: adler32_warm: runtime =    3062471 usecs, bandwidth 23095 MB in 3.0625 sec = 7541.43 MB/s
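
For readers unfamiliar with the checksum being optimized, here is a minimal Python sketch of Adler-32 and of the block-wise form that vectorized implementations generally build on. It is not the code from this PR: the function names and the block size of 32 are assumptions chosen for illustration, and the result is checked against Python's standard `zlib.adler32`.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16, as defined for Adler-32


def adler32_scalar(data: bytes, adler: int = 1) -> int:
    """Byte-at-a-time reference: a sums the bytes, b sums the running a."""
    a = adler & 0xFFFF
    b = (adler >> 16) & 0xFFFF
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a


def adler32_blocked(data: bytes, adler: int = 1, block: int = 32) -> int:
    """Block-wise form (hypothetical helper, block size is an assumption).

    For a block of n bytes d[0..n-1]:
        a' = a + sum(d[i])
        b' = b + n*a + sum((n - i) * d[i])
    Both sums are independent per byte, which is what makes them
    suitable for vector instructions.
    """
    a = adler & 0xFFFF
    b = (adler >> 16) & 0xFFFF
    for start in range(0, len(data), block):
        chunk = data[start:start + block]
        n = len(chunk)
        byte_sum = sum(chunk)
        weighted_sum = sum((n - i) * d for i, d in enumerate(chunk))
        b = (b + n * a + weighted_sum) % MOD_ADLER
        a = (a + byte_sum) % MOD_ADLER
    return (b << 16) | a


sample = b"Wikipedia"
assert adler32_blocked(sample) == adler32_scalar(sample) == zlib.adler32(sample)
print(hex(adler32_blocked(sample)))  # 0x11e60398
```

The per-block sums grow beyond 8 bits quickly, which is why vector versions typically accumulate into wider elements before reducing modulo 65521.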

@pablodelara
Contributor

@leiwen2025 could you update the release notes to say that Adler32 has been optimized for RISC-V?

@leiwen2025
Contributor Author

> @leiwen2025 could you update the release notes to say that Adler32 has been optimized for RISC-V?

Done. I've updated the release notes.

@pablodelara
Contributor

@sunyuechi could you review this PR? Thanks!


vsetvli zero, t0, e32, m8, ta, ma
vmv.v.i v8, 0
vmv.v.i v24, 0
@sunyuechi
Contributor

sunyuechi commented Feb 8, 2026

v8 unused

Contributor Author

Done.

vsetvli zero, t0, e16, m4, ta, ma
vsetvli zero, t0, e16, m2, ta, ma
vwaddu.wv v24, v24, v16
vwaddu.wv v24, v24, v18
Contributor

Aren't the two lines above just vwaddu.vv v24, v16, v18? vwaddu.vv also doesn't require zeroing the destination register first.

Contributor Author

vwaddu.vv v24, v16, v18 does not implement the required logic: it overwrites v24 with v16 + v18, whereas the accumulation needs v24 += v16 + v18.
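
The difference between the two instruction forms can be modeled in a few lines of Python. This is only a sketch of the element-wise semantics, assuming SEW=16 as in the snippet above; the helper names and sample values are made up for illustration, and masking and tail handling are ignored.

```python
SEW = 16                              # narrow element width in bits
NARROW_MASK = (1 << SEW) - 1
WIDE_MASK = (1 << (2 * SEW)) - 1      # destination elements are 2*SEW wide


def vwaddu_vv(vs2, vs1):
    """Model of vwaddu.vv vd, vs2, vs1: vd[i] = zext(vs2[i]) + zext(vs1[i]).

    The previous contents of vd play no part in the result.
    """
    return [((x & NARROW_MASK) + (y & NARROW_MASK)) & WIDE_MASK
            for x, y in zip(vs2, vs1)]


def vwaddu_wv(vs2_wide, vs1):
    """Model of vwaddu.wv vd, vs2, vs1: vd[i] = vs2[i] (wide) + zext(vs1[i]).

    With vd == vs2 this is an in-place accumulation.
    """
    return [((w & WIDE_MASK) + (y & NARROW_MASK)) & WIDE_MASK
            for w, y in zip(vs2_wide, vs1)]


# Hypothetical register contents: v24 already holds running 32-bit sums.
v24 = [100, 200, 300, 400]
v16 = [1, 2, 3, 4]
v18 = [10, 20, 30, 40]

accumulated = vwaddu_wv(vwaddu_wv(v24, v16), v18)  # v24 += v16; v24 += v18
overwritten = vwaddu_vv(v16, v18)                  # v24 = v16 + v18

print(accumulated)  # [111, 222, 333, 444]
print(overwritten)  # [11, 22, 33, 44]
```

The running sums in v24 survive only in the first case, which is the behavior the author describes.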

Contributor

Okay, I misunderstood.

li t0, 32
bltu a2, t0, tail_bytes

vsetvli zero, t0, e32, m8, ta, ma
Contributor

m4

Contributor Author

Done.
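
The thread does not spell out the reasoning behind the m4 suggestion. As background only, the maximum number of elements a vsetvli configuration can cover is VLMAX = LMUL * VLEN / SEW, which the following sketch evaluates for VLEN=128. The helper name is hypothetical and only integer LMUL values are modeled.

```python
def vlmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """VLMAX = LMUL * VLEN / SEW for an integer LMUL (1, 2, 4 or 8)."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("this sketch only models integer LMUL")
    return lmul * vlen_bits // sew_bits


VLEN = 128  # the vector register width this PR targets

# Configurations that appear in the snippets quoted in this thread.
for sew, lmul in [(32, 8), (32, 4), (16, 4), (16, 2)]:
    print(f"e{sew}, m{lmul}: VLMAX = {vlmax(VLEN, sew, lmul)}")
```

At VLEN=128, e32 with m8 covers 32 elements and occupies eight vector registers per group, while e32 with m4 covers 16 elements in four registers.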

@sunyuechi
Contributor

sunyuechi commented Feb 9, 2026

LGTM (please squash commits).

Signed-off-by: WenLei <lei.wen2@zte.com.cn>
@leiwen2025 force-pushed the optimize_vlen128_new branch from 3ff06c7 to 9dd9b26 on February 9, 2026 08:18
@pablodelara
Contributor

This is merged now, thanks.

@pablodelara closed this on Feb 9, 2026