@DiamonDinoia
Contributor

On my machine, with gcc-10, I get the following errors:

/mnt/home/mbarbone/repos/xsimd/include/xsimd/types/../arch/./xsimd_avx512f.hpp:2740:59: error: '_mm512_cvtsi512_si32' was not declared in this scope; did you mean '_mm512_castsi512_si128'?
 2740 |                 return static_cast<T>(_mm512_cvtsi512_si32(self) & 0xFF);
      |                                       ~~~~~~~~~~~~~~~~~~~~^~~~~~
      |                                       _mm512_castsi512_si128
/mnt/home/mbarbone/repos/xsimd/include/xsimd/types/../arch/./xsimd_avx512f.hpp:2744:59: error: '_mm512_cvtsi512_si32' was not declared in this scope; did you mean '_mm512_castsi512_si128'?
 2744 |                 return static_cast<T>(_mm512_cvtsi512_si32(self) & 0xFFFF);
      |                                       ~~~~~~~~~~~~~~~~~~~~^~~~~~
      |                                       _mm512_castsi512_si128
/mnt/home/mbarbone/repos/xsimd/include/xsimd/types/../arch/./xsimd_avx512f.hpp:2748:59: error: '_mm512_cvtsi512_si32' was not declared in this scope; did you mean '_mm512_castsi512_si128'?
 2748 |                 return static_cast<T>(_mm512_cvtsi512_si32(self));
      |                                       ~~~~~~~~~~~~~~~~~~~~^~~~~~
      |                                       _mm512_castsi512_si128

This fixes it for me.
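For reference, here is a minimal sketch of the kind of replacement the final commit message describes: reading element 0 through `_mm512_castsi512_si128` plus `_mm_cvtsi128_si32` instead of `_mm512_cvtsi512_si32`. It assumes an x86-64 target and GCC or Clang. The helper names `first_si32`, `broadcast_then_read` and `have_avx512f` are hypothetical and are not part of xsimd. The AVX-512 code is enabled per function with `__attribute__((target("avx512f")))` and is only meant to be called after a runtime CPU check, so the file builds without `-mavx512f`.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

// Read the low 32 bits of element 0 of a 512-bit integer vector without
// _mm512_cvtsi512_si32, which GCC 10 does not declare.
__attribute__((target("avx512f")))
static inline int first_si32(__m512i v) noexcept
{
    // Reinterpret the low 128 bits (no data movement), then use the SSE2
    // movd intrinsic to read element 0.
    return _mm_cvtsi128_si32(_mm512_castsi512_si128(v));
}

// Broadcast x to all sixteen 32-bit elements, then read element 0 back.
__attribute__((target("avx512f")))
static int broadcast_then_read(int x)
{
    return first_si32(_mm512_set1_epi32(x));
}

// Runtime check: only call the AVX-512 helpers when this returns true.
static bool have_avx512f() { return __builtin_cpu_supports("avx512f"); }
#else
// Non-x86 builds: scalar stand-in so the sketch still compiles.
static int broadcast_then_read(int x) { return x; }
static bool have_avx512f() { return false; }
#endif
```

In the quoted xsimd code, the 8- and 16-bit cases then mask this value with `& 0xFF` and `& 0xFFFF`.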

@serge-sans-paille
Contributor

Thanks. It would be great to have a CI reproducer first. Could you have a look, or do you want me to handle it?

@DiamonDinoia
Contributor Author

@DiamonDinoia
Contributor Author

I could cherry-pick the commit here, in case you want to merge them in one go.

@serge-sans-paille
Contributor

> I could cherry pick the commit here in case you want to merge them in one go

Yes, please do: first the CI commit, then the fix :-)

@DiamonDinoia
Contributor Author

The fact that it breaks is a good thing for me: I was chasing bugs in finufft with gcc-10, and this might be the cause (flatironinstitute/finufft#780).

@DiamonDinoia
Contributor Author

DiamonDinoia commented Jan 8, 2026

diff --git a/test/test_shuffle.cpp b/test/test_shuffle.cpp
index b082109..f500072 100644
--- a/test/test_shuffle.cpp
+++ b/test/test_shuffle.cpp
@@ -672,10 +672,8 @@ struct shuffle_test
             }
         };
 
-        std::array<value_type, size> ref_lo;
-        for (size_t i = 0; i < size; ++i)
-            ref_lo[i] = (i & 1) ? rhs[i / 2] : lhs[i / 2];
-        B b_ref_lo = B::load_unaligned(ref_lo.data());
+        // Use zip_lo as a stable reference for the expected interleave.
+        B b_ref_lo = xsimd::zip_lo(b_lhs, b_rhs);
 
         INFO("zip_lo");
         B b_res_lo = xsimd::shuffle(b_lhs, b_rhs, xsimd::make_batch_constant<mask_type, zip_lo_generator, arch_type>());
@@ -689,12 +687,8 @@ struct shuffle_test
             }
         };
 
-        std::array<value_type, size> ref_hi;
-        for (size_t i = 0; i < size; ++i)
-        {
-            ref_hi[i] = (i & 1) ? rhs[size / 2 + i / 2] : lhs[size / 2 + i / 2];
-        }
-        B b_ref_hi = B::load_unaligned(ref_hi.data());
+        // Use zip_hi as a stable reference for the expected interleave.
+        B b_ref_hi = xsimd::zip_hi(b_lhs, b_rhs);

The issue seems to be gcc-10's auto-vectorization, not xsimd's explicit vectorization.
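The reference loops removed in the diff encode the interleave that `zip_lo`/`zip_hi` are expected to produce. A scalar sketch of that expectation, using `std::array` instead of xsimd batches (the names `zip_lo_ref` and `zip_hi_ref` are hypothetical), may help when checking by hand which values a miscompiled build gets wrong:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Expected zip_lo result: interleave the low halves of lhs and rhs,
// i.e. out = {lhs[0], rhs[0], lhs[1], rhs[1], ...}.
template <typename T, std::size_t N>
std::array<T, N> zip_lo_ref(const std::array<T, N>& lhs, const std::array<T, N>& rhs)
{
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = (i & 1) ? rhs[i / 2] : lhs[i / 2];
    return out;
}

// Expected zip_hi result: the same interleave applied to the high halves,
// i.e. out = {lhs[N/2], rhs[N/2], lhs[N/2 + 1], rhs[N/2 + 1], ...}.
template <typename T, std::size_t N>
std::array<T, N> zip_hi_ref(const std::array<T, N>& lhs, const std::array<T, N>& rhs)
{
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = (i & 1) ? rhs[N / 2 + i / 2] : lhs[N / 2 + i / 2];
    return out;
}
```

For `lhs = {0, 1, 2, 3}` and `rhs = {10, 11, 12, 13}`, these give `{0, 10, 1, 11}` and `{2, 12, 3, 13}`.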

I should add that this code also causes issues: https://marco.godbolt.org/z/bjo6o8TfG

@serge-sans-paille
Contributor

If that's a GCC issue specific to gcc-10, we could add a dedicated workaround, but is that worth the effort?

@DiamonDinoia
Contributor Author

DiamonDinoia commented Jan 9, 2026

gcc-10 has a problem with AVX-512, but there is also an issue that I introduced when changing the AVX-512 swizzle: flatironinstitute/finufft#780 (comment)

I should fix that. As for gcc-10, if this does not pass CI, I am not sure what is best. I just need xsimd to work with gcc-10, since it is not that old a compiler; otherwise I will need a #error in finufft when finufft is compiled with AVX-512.

}
};

#if defined(__GNUC__) && (__GNUC__ == 10) && XSIMD_WITH_AVX512F
Contributor

Clang also defines __GNUC__ (probably not with __GNUC__ == 10), but adding && !defined(__clang__) would be safer.
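A minimal sketch of the safer guard suggested here. `XSIMD_WITH_AVX512F` is replaced by the compiler's own `__AVX512F__` macro so the snippet builds without xsimd, and the names `GCC10_AVX512_WORKAROUND`, `compiled_with_clang` and `gcc10_avx512_workaround` are hypothetical:

```cpp
#include <cassert>

// Clang defines __GNUC__ too (for GCC compatibility), so exclude it
// explicitly before testing the GCC major version.
// __AVX512F__ stands in for xsimd's XSIMD_WITH_AVX512F in this sketch.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ == 10) && defined(__AVX512F__)
#define GCC10_AVX512_WORKAROUND 1
#else
#define GCC10_AVX512_WORKAROUND 0
#endif

#if defined(__clang__)
constexpr bool compiled_with_clang = true;
#else
constexpr bool compiled_with_clang = false;
#endif

// True only for genuine GCC 10 builds with AVX-512F enabled.
constexpr bool gcc10_avx512_workaround = GCC10_AVX512_WORKAROUND != 0;
```

With the explicit `!defined(__clang__)` term, the workaround can never be selected by Clang, whatever value of `__GNUC__` it reports.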

template <typename ElemT, typename U, U... Vs>
XSIMD_INLINE constexpr bool is_cross_lane() noexcept
{
static_assert(std::is_integral<U>::value, "swizzle mask values must be integral");
Contributor

You could probably also static_assert on sizeof...(Vs) to be on the safe side.

static_assert(sizeof...(Vs) >= 1, "Need at least one lane");
constexpr std::size_t N = sizeof...(Vs);
constexpr std::size_t lane_elems = 16 / sizeof(ElemT);
return cross_impl128<0, N, lane_elems, U, Vs...>::value;
Contributor

Now that we're on C++14, which has better constexpr support, you could probably write this in a more procedural style, using a temporary array and a for loop to perform the check.
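A sketch of what the procedural C++14 version could look like. It assumes that "cross-lane" means some output element `i` reads its source element `Vs[i]` from a different 128-bit lane; the name suggests this, but the snippet above (which delegates to `cross_impl128`) does not confirm it. `is_cross_lane_bytes` stands in for the internal helper with an explicit lane size mentioned in the commit message, and both functions live in a hypothetical `sketch` namespace rather than in xsimd:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace sketch
{
    // Hypothetical internal helper with an explicit lane size in bytes.
    // Returns true if some output element i takes its source index Vs[i]
    // from a different lane than i itself.
    template <typename ElemT, std::size_t LaneBytes, typename U, U... Vs>
    constexpr bool is_cross_lane_bytes() noexcept
    {
        static_assert(std::is_integral<U>::value, "swizzle mask values must be integral");
        static_assert(sizeof...(Vs) >= 1, "Need at least one lane");
        static_assert(LaneBytes >= sizeof(ElemT) && LaneBytes % sizeof(ElemT) == 0,
                      "lane size must be a whole number of elements");
        constexpr std::size_t lane_elems = LaneBytes / sizeof(ElemT);
        // C++14 constexpr allows a local array and a mutable loop counter.
        const U mask[] = { Vs... };
        for (std::size_t i = 0; i < sizeof...(Vs); ++i)
        {
            // A negative mask value converts to a huge index and is
            // therefore reported as cross-lane.
            if (static_cast<std::size_t>(mask[i]) / lane_elems != i / lane_elems)
                return true;
        }
        return false;
    }

    // Public entry point: 128-bit (16-byte) lanes, as used by SSE/AVX/AVX-512.
    template <typename ElemT, typename U, U... Vs>
    constexpr bool is_cross_lane() noexcept
    {
        return is_cross_lane_bytes<ElemT, 16, U, Vs...>();
    }
}

// Evaluated at compile time: an in-lane swap versus swapping 128-bit halves.
static_assert(!sketch::is_cross_lane<std::uint32_t, std::uint32_t, 1, 0, 3, 2>(), "in-lane swap");
static_assert(sketch::is_cross_lane<std::uint32_t, std::uint32_t, 4, 5, 6, 7, 0, 1, 2, 3>(), "crosses lanes");
```

Compared with a recursive template such as `cross_impl128`, the loop keeps the whole check in one function body and makes the lane size an ordinary value, which is what lets the explicit-lane-size helper and the 128-bit entry point share code.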

@DiamonDinoia force-pushed the fix-gcc-10 branch 2 times, most recently from e9699a8 to 346301a, on February 3, 2026 03:28
Replace _mm512_cvtsi512_si32 with _mm_cvtsi128_si32(_mm512_castsi512_si128(self))
to fix compilation issues with GCC 10. The _mm512_cvtsi512_si32 intrinsic is not
available in older compiler versions.
Public API now checks 128-bit (16-byte) lanes, the standard for
SSE/AVX/AVX512. Internal helper available for explicit lane sizes.
Uses C++14 constexpr procedural style.
Add compiler-specific workaround for GCC 10 with AVX-512F in shuffle
tests. Use zip_lo/zip_hi as stable reference for the expected interleave
pattern instead of manually constructing the reference arrays.