arm64 kernels: add accelerated crc32 routines #806

Open
wants to merge 1 commit into
base: master

@kjbracey2 kjbracey2 commented Jan 12, 2022

Incorporate changes from Linux 4.20/4.21 to accelerate the kernel's crc32_le and __crc32c_le helpers.

Incorporates:

9784d82db ("make core crc32() routines weak so they can be overridden")
7481cddf2 ("arm64/lib: add accelerated crc32 routines")
efdb25efc ("arm64/lib: improve CRC32 performance for deep pipelines")
ff98e20ef ("lib/crc32.c: mark crc32_le_base/__crc32c_le_base aliases as __pure")

This omits the upstream runtime selection, which uses machinery that differs significantly in Linux 4.1. We assume the CRC instructions are always available.
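
For readers unfamiliar with the override mechanism, here is a minimal user-space sketch of the idea behind the "make core crc32() routines weak" change: the generic routine is defined as a weak symbol, so an architecture file that provides a strong definition of the same name wins at link time. The `_sketch` names and the bit-at-a-time loop are illustrative assumptions, not the kernel's code (the kernel's generic path is table-driven), and `__attribute__((weak))` assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected (LSB-first) polynomials: CRC-32 (IEEE 802.3) and CRC-32C (Castagnoli). */
#define CRC32_POLY_LE_SKETCH  0xEDB88320u
#define CRC32C_POLY_LE_SKETCH 0x82F63B78u

/*
 * Bit-at-a-time reference update. Like the kernel helpers, it applies no
 * pre- or post-inversion; the caller supplies the seed and the final XOR.
 */
static uint32_t crc_le_bitwise(uint32_t crc, const unsigned char *p,
                               size_t len, uint32_t poly)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
    }
    return crc;
}

/*
 * Generic fallbacks, marked weak. A separate object file that defines
 * crc32_le_sketch() without the weak attribute (for example, one built
 * around the arm64 CRC32 instructions) replaces this one at link time.
 */
__attribute__((weak))
uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
    return crc_le_bitwise(crc, p, len, CRC32_POLY_LE_SKETCH);
}

__attribute__((weak))
uint32_t crc32c_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
    return crc_le_bitwise(crc, p, len, CRC32C_POLY_LE_SKETCH);
}
```

Seeding with `0xFFFFFFFF` and inverting the result reproduces the standard check values for the input `"123456789"`, which is a convenient way to confirm that an accelerated replacement is bit-compatible with the generic routine.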

kjbracey2 commented Jan 13, 2022

I'm also preparing a patch to accelerate crc32_be. With them all done, we can get rid of about 32K of code and tables for the slice-by-8 software solution, which should more than pay for the size of enabling the arm-ce crypto.

This version is also an earlier, less-pipelined version of the upstream code. Commit logs suggested that it would be slightly faster on A53 than the latest version, but it seems that may not be the case. I might update it after more tests.

Speed-up is nearly 8x for the LE ops, and over 5x for the BE ops.

kjbracey2 commented

Updated to the Linux 4.21 version. It is about 35% faster on my RT-AX88U, despite the upstream changelog suggesting it was slightly slower on A53.

Original test time: 170 µs
4.20 version time: 29 µs
4.21 version time: 21 µs

jonathanmassehsj commented

I tested the changes; they run faster on my RT-AX88U.

@RMerl RMerl force-pushed the master branch 2 times, most recently from b4d0ac1 to 42dc10f Compare March 23, 2022 19:20

dr-m commented Jul 15, 2022

Is there a particular reason why the code is not making use of carry-less multiplication (using the pmull instruction)? On my RT-AC86U, /proc/cpuinfo does advertise that feature. Could some tricks from MariaDB/server#1652 be adopted? Obviously, we would want compile-time detection instead of runtime detection here.

Note: I am not too familiar with ARMv8 implementations or router SoCs. It might be that pmull is not supported by some ARMv8 SoCs that this code base is targeting.
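
To make the compile-time detection idea concrete, here is a rough user-space sketch using the ACLE feature macros. It assumes a GCC or Clang toolchain: `__ARM_FEATURE_CRC32` is defined when the target enables the CRC32 extension (for example `-march=armv8-a+crc`), and `__crc32b` comes from `<arm_acle.h>`. The `_SKETCH`/`_sketch` names are hypothetical, no PMULL folding code is included, and the kernel itself uses assembly rather than these intrinsics.

```c
#include <stddef.h>
#include <stdint.h>

/* Compile-time selection: the choice is fixed when the file is built. */
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>               /* ACLE intrinsics such as __crc32b */
#define CRC32_HAVE_HW_SKETCH 1
#else
#define CRC32_HAVE_HW_SKETCH 0
#endif

/*
 * Assumption: newer toolchains signal 64-bit PMULL via __ARM_FEATURE_AES,
 * older ones via __ARM_FEATURE_CRYPTO. A carry-less-multiply folding path
 * would be guarded by this macro; none is implemented in this sketch.
 */
#if defined(__ARM_FEATURE_AES) || defined(__ARM_FEATURE_CRYPTO)
#define CRC32_HAVE_PMULL_SKETCH 1
#else
#define CRC32_HAVE_PMULL_SKETCH 0
#endif

/* Same contract as crc32_le: caller supplies the seed, no final inversion. */
uint32_t crc32_le_select_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
#if CRC32_HAVE_HW_SKETCH
    /* One CRC32B instruction per byte; real code would use wider steps. */
    while (len--)
        crc = __crc32b(crc, *p++);
#else
    /* Portable bit-at-a-time fallback with the reflected CRC-32 polynomial. */
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
    }
#endif
    return crc;
}
```

Both branches should give identical results, so the same check value test works whichever path the compiler selects.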
