Vectorize the CRC64 implementation #85221
Tagging subscribers to this area: @dotnet/area-system-io
/cc @tannergooding
adamsitnik left a comment
LGTM, very impressive improvements, @brantburnett!
Vector128<ulong> x6 = CarrylessMultiplyLower(x2, x0);
Vector128<ulong> x7 = CarrylessMultiplyLower(x3, x0);
Vector128<ulong> x8 = CarrylessMultiplyLower(x4, x0);
x5 = VectorHelper.CarrylessMultiplyLower(x1, x0);
It was a good idea to move these methods to another type and reuse them. 👍
To avoid having to add the type name everywhere these methods are used, you could add a using static directive at the top of the file:
using static System.IO.Hashing.VectorHelper;
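For readers unfamiliar with the directive, here is a minimal, self-contained sketch of what using static buys you. The namespace `Demo`, the class `BitHelper`, and the method `CarrylessMultiplyLow` are hypothetical names invented for this illustration; they are not the PR's `VectorHelper`, and the software carry-less multiply only stands in for the hardware intrinsic.

```csharp
using System;
using static Demo.BitHelper; // hypothetical helper type, imported so its static members can be called unqualified

namespace Demo
{
    internal static class BitHelper
    {
        // Software carry-less (GF(2) polynomial) multiply, keeping only the low 64 bits.
        // Illustration only: the PR relies on hardware instructions instead.
        public static ulong CarrylessMultiplyLow(ulong a, ulong b)
        {
            ulong result = 0;
            for (int i = 0; i < 64; i++)
            {
                if (((b >> i) & 1) != 0)
                {
                    result ^= a << i; // XOR replaces addition, so no carries propagate
                }
            }
            return result;
        }
    }

    internal static class Caller
    {
        // Thanks to "using static", no "BitHelper." prefix is needed here.
        public static ulong Fold(ulong x, ulong k) => CarrylessMultiplyLow(x, k);
    }
}
```

Without the directive, the call in `Caller.Fold` would have to be written as `BitHelper.CarrylessMultiplyLow(x, k)`, which is the repetition the comment above is trying to avoid.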
// Work with a reference to where we're at in the ReadOnlySpan and a local length
// to avoid extraneous range checks.
ref byte srcRef = ref MemoryMarshal.GetReference(source);
Personally, I would prefer to store a reference to ulong rather than byte and update an index rather than the reference in every loop iteration. However, a similar pattern was used in #83321 and approved by people more knowledgeable in this area, so I won't suggest it.
- ref byte srcRef = ref MemoryMarshal.GetReference(source);
+ ref ulong srcRef = ref Unsafe.As<byte, ulong>(ref MemoryMarshal.GetReference(source));
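To make the two styles concrete, the sketch below walks a span both ways and XORs the 8-byte words together. It is only an illustration under stated assumptions: the class `RefWalkDemo` and both method names are hypothetical, and the XOR accumulator stands in for the real CRC folding logic, which is not reproduced here.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

internal static class RefWalkDemo
{
    // Style in the PR snippet: hold a ref byte and advance the reference itself.
    public static ulong XorWordsByteRef(ReadOnlySpan<byte> source)
    {
        ref byte srcRef = ref MemoryMarshal.GetReference(source);
        int length = source.Length;
        ulong acc = 0;

        while (length >= sizeof(ulong))
        {
            acc ^= Unsafe.ReadUnaligned<ulong>(ref srcRef);
            srcRef = ref Unsafe.Add(ref srcRef, sizeof(ulong)); // move the reference forward 8 bytes
            length -= sizeof(ulong);
        }

        return acc;
    }

    // Style floated in the review: hold a ref ulong and advance an index instead.
    public static ulong XorWordsUlongIndex(ReadOnlySpan<byte> source)
    {
        ref ulong srcRef = ref Unsafe.As<byte, ulong>(ref MemoryMarshal.GetReference(source));
        int count = source.Length / sizeof(ulong);
        ulong acc = 0;

        for (int i = 0; i < count; i++)
        {
            // The span is not guaranteed to be 8-byte aligned, so read through an unaligned load.
            ref byte word = ref Unsafe.As<ulong, byte>(ref Unsafe.Add(ref srcRef, i));
            acc ^= Unsafe.ReadUnaligned<ulong>(ref word);
        }

        return acc;
    }
}
```

Both methods visit the same words and ignore any trailing bytes that do not fill a whole ulong, so they return the same value for the same input; the difference is purely in how the position is tracked.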
This significantly improves performance for System.IO.Hashing.Crc64 for cases where the source span is 16 bytes or larger on Intel x86/x64 and modern ARM architectures. The vectorization change only applies to .NET 7 and later targets of System.IO.Hashing because it uses some Vector128 APIs added in .NET 7.
This is a continuation of work done in #83321 which added vectorization to CRC32.
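As a rough illustration of the conditions described above (a .NET 7 or later target, hardware support, and at least 16 bytes of input), a guard along the following lines could decide whether to take the vectorized path. This is a sketch, not the PR's code: the class `Crc64DispatchSketch` and the method `CanUseVectorPath` are hypothetical names, and the exact set of instruction-set checks is an assumption.

```csharp
using System;
#if NET7_0_OR_GREATER
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using ArmAes = System.Runtime.Intrinsics.Arm.Aes;
#endif

internal static class Crc64DispatchSketch
{
    // Returns true when a Vector128-based path could plausibly be used.
    // Assumption: the vector path needs a carry-less multiply instruction
    // (PCLMULQDQ on x86/x64, polynomial multiply from the AES extension on ARM).
    public static bool CanUseVectorPath(ReadOnlySpan<byte> source)
    {
#if NET7_0_OR_GREATER
        bool hasCarrylessMultiply = Pclmulqdq.IsSupported || ArmAes.IsSupported;

        return Vector128.IsHardwareAccelerated
            && hasCarrylessMultiply
            && source.Length >= Vector128<byte>.Count; // 16 bytes
#else
        // Older targets lack the Vector128 APIs added in .NET 7, so stay scalar.
        return false;
#endif
    }
}
```

Inputs shorter than 16 bytes always fall back to the scalar path under this guard, which matches the "16 bytes or larger" threshold stated in the description.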
BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22621.1631)
Intel Core i7-10850H CPU 2.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=8.0.100-preview.3.23178.7
[Host] : .NET 8.0.0 (8.0.23.17408), X64 RyuJIT AVX2
Job-FPBBMO : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Job-FTHZKV : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1
BenchmarkDotNet=v0.13.2.2052-nightly, OS=ubuntu 22.04
AWS m6g.xlarge Graviton2
.NET SDK=8.0.100-preview.3.23178.7
[Host] : .NET 8.0.0 (8.0.23.17408), Arm64 RyuJIT AdvSIMD
Job-OYJLBY : .NET 8.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
Job-GKZVCN : .NET 8.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1