diff mbox series

[FFmpeg-devel,1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions

Message ID 20230526080315.83424-1-martin@martin.st
State New
Headers show
Series [FFmpeg-devel,1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Martin Storsjö May 26, 2023, 8:03 a.m. UTC
These are available since ARMv8.4-a and ARMv8.6-a respectively,
but can also be available optionally since ARMv8.2-a.

Check if these are available for use unconditionally (e.g. if compiling
with -march=armv8.6-a), or if they can be enabled with specific
assembler directives.

Use ".arch_extension <ext>" for enabling a specific extension in
assembly; the same can also be achieved with ".arch armv8.2-a+<ext>",
but with .arch_extension is easier to combine multiple separate
features.

Enabling these extensions requires setting a base architecture level
of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary;
if the base level is high enough (which might unlock other extensions
without .arch_extension), we don't want to lower it.

Only add .arch/.arch_extension if needed, e.g. current clang fails
to recognize the dotprod and i8mm features in .arch_extension, but
can successfully assemble these instructions if part of the baseline
set with -march.
---
 configure               | 77 ++++++++++++++++++++++++++++++++++++++++-
 libavutil/aarch64/asm.S | 13 +++++++
 2 files changed, 89 insertions(+), 1 deletion(-)

Comments

Rémi Denis-Courmont May 27, 2023, 8:44 a.m. UTC | #1
Le perjantaina 26. toukokuuta 2023, 11.03.12 EEST Martin Storsjö a écrit :
> These are available since ARMv8.4-a and ARMv8.6-a respectively,
> but can also be available optionally since ARMv8.2-a.
> 
> Check if these are available for use unconditionally (e.g. if compiling
> with -march=armv8.6-a), or if they can be enabled with specific
> assembler directives.
> 
> Use ".arch_extension <ext>" for enabling a specific extension in
> assembly; the same can also be achieved with ".arch armv8.2-a+<ext>",
> but with .arch_extension is easier to combine multiple separate
> features.
> 
> Enabling these extensions requires setting a base architecture level
> of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary;
> if the base level is high enough (which might unlock other extensions
> without .arch_extension), we don't want to lower it.

I don't follow how that would actually happen, TBH. Even if the default target 
version is, say, 8.5, the assembler won't magically start emitting 8.5 
instructions.

Someone would have to write assembler code that would fail to build under a 
toolchain with a lower target version. That sounds like a bug that should be 
spotted and fixed, rather than papered over. Conversely the logic here seems 
unnecessarily, if not counter-productively, intricate.

> Only add .arch/.arch_extension if needed, e.g. current clang fails
> to recognize the dotprod and i8mm features in .arch_extension, but
> can successfully assemble these instructions if part of the baseline
> set with -march.

IME, Clang is utterly useless for assembling. That has become one of my pet 
peeves with Rust inline assembler, which is other much nicer than C inline 
assembler, whence you can't just work around it with `-no-integrated-as`. But 
I digress.

If the problem is to avoid `.arch_extension`, then I don't really see why you 
can't just use `.arch` with plus, and simplify a lot.
Martin Storsjö May 27, 2023, 9:34 p.m. UTC | #2
On Sat, 27 May 2023, Rémi Denis-Courmont wrote:

> Le perjantaina 26. toukokuuta 2023, 11.03.12 EEST Martin Storsjö a écrit :
>> These are available since ARMv8.4-a and ARMv8.6-a respectively,
>> but can also be available optionally since ARMv8.2-a.
>> 
>> Check if these are available for use unconditionally (e.g. if compiling
>> with -march=armv8.6-a), or if they can be enabled with specific
>> assembler directives.
>> 
>> Use ".arch_extension <ext>" for enabling a specific extension in
>> assembly; the same can also be achieved with ".arch armv8.2-a+<ext>",
>> but with .arch_extension is easier to combine multiple separate
>> features.
>> 
>> Enabling these extensions requires setting a base architecture level
>> of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary;
>> if the base level is high enough (which might unlock other extensions
>> without .arch_extension), we don't want to lower it.
>
> I don't follow how that would actually happen, TBH. Even if the default target 
> version is, say, 8.5, the assembler won't magically start emitting 8.5 
> instructions.
>
> Someone would have to write assembler code that would fail to build under a 
> toolchain with a lower target version. That sounds like a bug that should be 
> spotted and fixed, rather than papered over.

I don't see how anything here suggests papering over such an issue?

I'm not sure exactly which parts of the message you refer to here, but 
I'll elaborate on the point about why we only should set .arch if we 
really need to.


Consider a build configuration with -march=armv8.4-a. We test that the 
dotprod extension is available and usable without adding any directives - 
so we won't add any directives for that. We also test that the assembler 
does support i8mm, with ".arch armv8.2-a" plus ".arch_extension i8mm".

But if we do add ".arch armv8.2-a" and ".arch_extension i8mm", then we 
break the dotprod extension. If we only add ".arch_extension i8mm" without 
the .arch directive, we get what we want to though.

> If the problem is to avoid `.arch_extension`, then I don't really see 
> why you can't just use `.arch` with plus, and simplify a lot.

Well Clang doesn't quite support that currently either. For 
".arch_extension dotprod" it errors out since it doesn't recognize the 
dotprod feature in that directive. It does accept ".arch 
armv8.2-a+dotprod" but it doesn't actually unlock using the dotprod 
extension in the assembly despite that. (I'll look into fixing this in 
upstream LLVM afterwards.)

As Clang/LLVM has these limitations/issues currently, one main design 
criterion here is that we shouldn't add any extra .arch/.arch_extension 
directives unless we need and can (and gain some instruction support from 
it).


Taking it back to the drawing board: So for enabling e.g. i8mm, we could 
either do
     .arch armv8.6-a
or
     .arch armv8.2-a+dotprod
or
     .arch armv8.2
     .arch_extension dotprod


Out of these, I initially preferred doing the third approach.

There's no functional difference between the second and third one, except 
the single-line form is more messy to handle, as we can have various 
combinations of what actually is supported. And with the single-line .arch 
form, we can't just add e.g. i8mm on top of a -march= setting that already 
supports dotprod, without respecifying what the toolchain itself defaults 
to.


The documentation for .arch_extension hints at it being possible to 
disable support for extensions with it too, but that doesn't seem to be 
the case in practice. If it was, we could add macros to only enable 
specifically the extensions we want around those functions that should use 
them and nothing more. But I guess if that's not actually supported we 
can't do that.


I guess the alternative would be to just try to set .arch 
<highest-supported-that-we-care-about>. I was worried that support for 
e.g. armv8.6-a appeared later in toolchains than support for the 
individual extension i8mm, but at least from a quick browse in binutils 
history, they seem to have been added at the same time, so there's 
probably no such drawback.

Or what's the situation with e.g. SVE2 - was ".arch_extension sve2" 
supported significantly earlier than ".arch armv9-a"? It looks like 
binutils learnt about sve2 in 2019, but about armv9-a in 2021? OTOH that's 
probably not too much of a real issue either.

If we'd do that, it does simplify the configure logic a fair bit and 
reduces the number of configure variables we need by a lot. It does enable 
a few more instruction set extensions than what we need though, but that's 
probably not a real issue.

// Martin
Rémi Denis-Courmont May 28, 2023, 5:50 a.m. UTC | #3
Le sunnuntaina 28. toukokuuta 2023, 0.34.15 EEST Martin Storsjö a écrit :
> > Someone would have to write assembler code that would fail to build under
> > a toolchain with a lower target version. That sounds like a bug that
> > should be spotted and fixed, rather than papered over.
> 
> I don't see how anything here suggests papering over such an issue?

For instance, say the toolchain, or the FFmpeg build flags, sets ARMv8.6 as 
target architecture. It makes perfect sense for the C compiler to emit non-
hint ARMv8.3 PAuth instructions, which would crash on an ARMv8.0-8.2 
processor. But it makes zero sense for the assembler to change its behaviour. 
So all existing ASIMD assembler files written for ARMv8.0-A, should still 
generate ARMv8.0-A code.

But now somebody can accidentally write an ARMv8.1-8.6 instruction into their 
ARMv8.0 assembler code, and it will not trigger a build error (to them). There 
are no clear reasons to trying to avoid *lowering* the target version for 
*assembler* (as opposed to C).

If a file needs ARMv8.2, it should just ARMv8.2 as its target, whether the 
overall build targets ARMv8.0, ARMv8.2 or ARMv8.6. The C code around it should 
ensure that the requirements are met. There are no benefits to preserving a 
higher version; it just makes the configure checks needlessly intricate.

> if we do add ".arch armv8.2-a" and ".arch_extension i8mm", then we
> break the dotprod extension. If we only add ".arch_extension i8mm" without
> the .arch directive, we get what we want to though.

Yes, but you can also just set `.arch armv8.4-a`, which is ostensibly The Way 
to enable DotProd on a craptastic assembler. If even that doesn't work, the 
assembler presumably doesn't support DotProd at all (or it's some obscure 
unstable development version that was snapshot between adding DotProd and 
adding complete ARMv8.4, which we really should not care about).

Likewise, for I8MM, you can just do `.arch armv8.6-a`.

So what if the target version was 8.7 or 9.2? Well, in an assembler file, we 
don't really care, as noted above.

> > If the problem is to avoid `.arch_extension`, then I don't really see
> > why you can't just use `.arch` with plus, and simplify a lot.
> 
> Well Clang doesn't quite support that currently either. For
> ".arch_extension dotprod" it errors out since it doesn't recognize the
> dotprod feature in that directive. It does accept ".arch
> armv8.2-a+dotprod" but it doesn't actually unlock using the dotprod
> extension in the assembly despite that. (I'll look into fixing this in
> upstream LLVM afterwards.)

I don't know if `.arch_extension` is specified by Arm or just a GCCism. But if 
you accept DotProd in `.arch` in arch and then don't enable it, then that's 
clearly a bug. But then again, that's moot if you can just do `.arch armv8.4-
a` instead.

> Taking it back to the drawing board: So for enabling e.g. i8mm, we could
> either do
>      .arch armv8.6-a
> or
>      .arch armv8.2-a+dotprod
> or
>      .arch armv8.2
>      .arch_extension dotprod
> 
> 
> Out of these, I initially preferred doing the third approach.

I agree that that's the cleanest option. But that's not the point here. The 
point is that what this patch is a hell of a lot more involved than doing just 
that, and it seems unnecessarily intricate for the purpose of enabling 
DotProd.

> There's no functional difference between the second and third one, except
> the single-line form is more messy to handle, as we can have various
> combinations of what actually is supported.

Yes but a pile of #if's is even more messy to handle.

> And with the single-line .arch
> form, we can't just add e.g. i8mm on top of a -march= setting that already
> supports dotprod, without respecifying what the toolchain itself defaults
> to.
> 
> 
> The documentation for .arch_extension hints at it being possible to
> disable support for extensions with it too, but that doesn't seem to be
> the case in practice. If it was, we could add macros to only enable
> specifically the extensions we want around those functions that should use
> them and nothing more. But I guess if that's not actually supported we
> can't do that.

GCC supports it with RV64 (`.option arch` plus/minus). I haven't tried it on 
AArch64. Of course, LLVM does not support it, even though the patch has been 
out there for a long time and even though it is part of the ABI spec rather 
than made up by GNU/binutils. Thanks to that, RVV is unusable on LLVM (Linux 
kernel specifically requires GNU/as due to that).

> I guess the alternative would be to just try to set .arch
> <highest-supported-that-we-care-about>. I was worried that support for
> e.g. armv8.6-a appeared later in toolchains than support for the
> individual extension i8mm, but at least from a quick browse in binutils
> history, they seem to have been added at the same time, so there's
> probably no such drawback.
> 
> Or what's the situation with e.g. SVE2 - was ".arch_extension sve2"
> supported significantly earlier than ".arch armv9-a"?

I have not tested SVE on LLVM. AFAIK, SVE and SVE2 are optional from 8.2 and 
9.0 onward respectively, and not mandatory in any version, so if your 
toolchain supports neither .arch with plus sign, nor .arch_extension, it is 
game over.

> If we'd do that, it does simplify the configure logic a fair bit and
> reduces the number of configure variables we need by a lot. It does enable
> a few more instruction set extensions than what we need though, but that's
> probably not a real issue.

Yes.
Martin Storsjö May 30, 2023, 12:25 p.m. UTC | #4
On Sun, 28 May 2023, Rémi Denis-Courmont wrote:

> Le sunnuntaina 28. toukokuuta 2023, 0.34.15 EEST Martin Storsjö a écrit :
>
>> I guess the alternative would be to just try to set .arch
>> <highest-supported-that-we-care-about>. I was worried that support for
>> e.g. armv8.6-a appeared later in toolchains than support for the
>> individual extension i8mm, but at least from a quick browse in binutils
>> history, they seem to have been added at the same time, so there's
>> probably no such drawback.
>> 
>> Or what's the situation with e.g. SVE2 - was ".arch_extension sve2"
>> supported significantly earlier than ".arch armv9-a"?
>
> I have not tested SVE on LLVM. AFAIK, SVE and SVE2 are optional from 8.2 and 
> 9.0 onward respectively, and not mandatory in any version, so if your 
> toolchain supports neither .arch with plus sign, nor .arch_extension, it is 
> game over.

I didn't meant specifically whether LLVM supports it here, just in general 
wrt binutils and how to enable the feature.

FWIW it seems like SVE2 is a mandatory part of 9.0 - assembling SVE2 
instructions can be done with ".arch armv9-a". But there are about 2 years 
worth of deployed binutils based toolchains that do recognize ".arch 
armv8.2-a; .arch_extension sve2" but don't recognize ".arch armv9-a".

So for the generic mechanism for enabling cpu features, I'd prefer to keep 
the mechanism using primarily .arch_extension (with .arch set as high as 
necessary) rather than relying solely on .arch <version> without any extra 
+<feature>.

>> If we'd do that, it does simplify the configure logic a fair bit and
>> reduces the number of configure variables we need by a lot. It does enable
>> a few more instruction set extensions than what we need though, but that's
>> probably not a real issue.
>
> Yes.

I made an attempt at simplifying the logic in configure and asm.S 
somewhat, while still primarily using .arch_extension, and while making 
sure we still can get the features assembled with current Clang with a 
high enough -march= setting. (Runtime enabled features are out of scope 
for Clang for now as we don't want to try to pass individual higher 
-march= options to the individual assembly files.)

// Martin
Rémi Denis-Courmont May 31, 2023, 4:29 p.m. UTC | #5
Le tiistaina 30. toukokuuta 2023, 15.25.25 EEST Martin Storsjö a écrit :
> On Sun, 28 May 2023, Rémi Denis-Courmont wrote:
> > Le sunnuntaina 28. toukokuuta 2023, 0.34.15 EEST Martin Storsjö a écrit :
> >> I guess the alternative would be to just try to set .arch
> >> <highest-supported-that-we-care-about>. I was worried that support for
> >> e.g. armv8.6-a appeared later in toolchains than support for the
> >> individual extension i8mm, but at least from a quick browse in binutils
> >> history, they seem to have been added at the same time, so there's
> >> probably no such drawback.
> >> 
> >> Or what's the situation with e.g. SVE2 - was ".arch_extension sve2"
> >> supported significantly earlier than ".arch armv9-a"?
> > 
> > I have not tested SVE on LLVM. AFAIK, SVE and SVE2 are optional from 8.2
> > and 9.0 onward respectively, and not mandatory in any version, so if your
> > toolchain supports neither .arch with plus sign, nor .arch_extension, it
> > is game over.
> 
> I didn't meant specifically whether LLVM supports it here, just in general
> wrt binutils and how to enable the feature.
> 
> FWIW it seems like SVE2 is a mandatory part of 9.0

Yes and no. SVE requires ARMv8.2, and SVE2 requires ARMv9.0, but neither are 
ever required by any existing ARM version. DDI0487 is abundantly clear that 
SVE2 is OPTIONAL: "[The Scalable Vector Extension version 2 (SVE2)] feature is 
supported in AArch64 state only. This feature is OPTIONAL in an Armv9.0 
implementation,"

However it adds that "standard Armv9-A software platforms support FEAT_SVE2."

> - assembling SVE2 instructions can be done with ".arch armv9-a".

Presumably binutils targets "standard Armv9-A software platforms" by default, 
as opposed to just any "Armv9.0 implementation". I guess that means that any 
Cortex, Neoverse or other proper ARM design will include SVE2, but big-ass 
architecture licensees such as APPL and NVDA are not required to include SVE2 
in their own designs.

> But there are about 2 years
> worth of deployed binutils based toolchains that do recognize ".arch
> armv8.2-a; .arch_extension sve2" but don't recognize ".arch armv9-a".

This is not entirely surprising, since SVE is much older than ARMv9, even if 
SVE2 requires ARMv9.
Martin Storsjö June 1, 2023, 11:04 a.m. UTC | #6
On Sun, 28 May 2023, Martin Storsjö wrote:

> The documentation for .arch_extension hints at it being possible to disable 
> support for extensions with it too, but that doesn't seem to be the case in 
> practice. If it was, we could add macros to only enable specifically the 
> extensions we want around those functions that should use them and nothing 
> more. But I guess if that's not actually supported we can't do that.

Actually, yes, this does work, it just uses a different syntax than I 
expected. To disable the extension <ext>, one can write the directive 
".arch_extension no<ext>".

Is the updated, less complex version of the configure patch more 
tolerable?

// Martin
diff mbox series

Patch

diff --git a/configure b/configure
index 87f7afc2e1..3c7473efb2 100755
--- a/configure
+++ b/configure
@@ -454,6 +454,8 @@  Optimization options (experts only):
   --disable-armv6t2        disable armv6t2 optimizations
   --disable-vfp            disable VFP optimizations
   --disable-neon           disable NEON optimizations
+  --disable-dotprod        disable DOTPROD optimizations
+  --disable-i8mm           disable I8MM optimizations
   --disable-inline-asm     disable use of inline assembly
   --disable-x86asm         disable use of standalone x86 assembly
   --disable-mipsdsp        disable MIPS DSP ASE R1 optimizations
@@ -1154,6 +1156,41 @@  check_insn(){
     check_as ${1}_external "$2"
 }
 
+check_archext_insn(){
+    log check_archext_insn "$@"
+    feature="$1"
+    base_arch="$2"
+    archext="$3"
+    instr="$4"
+    # Check if the assembly is accepted unconditionally in either inline or
+    # external assembly.
+    check_inline_asm ${feature}_inline "\"$instr\""
+    check_as ${feature}_external "$instr"
+
+    enabled_any ${feature}_inline ${feature}_external || disable ${feature}
+
+    if disabled ${feature}_external; then
+        # If not accepted unconditionally, check if we can assemble it
+        # with a suitable .arch_extension directive.
+        test_as <<EOF && enable ${feature} as_archext_${archext}_directive
+.arch_extension $archext
+$instr
+EOF
+        if disabled ${feature}; then
+            # If the base arch level is too low, .arch_extension can require setting
+            # a higher arch level with .arch too. Only do this if strictly needed;
+            # if the base level is e.g. arvm8.4-a and some features are available
+            # without any .arch_extension, we don't want to set ".arch armv8.2-a"
+            # for some other .arch_extension.
+            test_as <<EOF && enable ${feature} as_archext_${archext}_directive as_archext_${archext}_needs_arch
+.arch $base_arch
+.arch_extension $archext
+$instr
+EOF
+        fi
+    fi
+}
+
 check_x86asm(){
     log check_x86asm "$@"
     name=$1
@@ -2059,6 +2096,8 @@  ARCH_EXT_LIST_ARM="
     armv6
     armv6t2
     armv8
+    dotprod
+    i8mm
     neon
     vfp
     vfpv3
@@ -2322,6 +2361,10 @@  SYSTEM_LIBRARIES="
 
 TOOLCHAIN_FEATURES="
     as_arch_directive
+    as_archext_dotprod_directive
+    as_archext_dotprod_needs_arch
+    as_archext_i8mm_directive
+    as_archext_i8mm_needs_arch
     as_dn_directive
     as_fpu_directive
     as_func
@@ -2622,6 +2665,8 @@  intrinsics_neon_deps="neon"
 vfp_deps_any="aarch64 arm"
 vfpv3_deps="vfp"
 setend_deps="arm"
+dotprod_deps="aarch64 neon"
+i8mm_deps="aarch64 neon"
 
 map 'eval ${v}_inline_deps=inline_asm' $ARCH_EXT_LIST_ARM
 
@@ -5979,12 +6024,26 @@  check_inline_asm inline_asm_labels '"1:\n"'
 check_inline_asm inline_asm_nonlocal_labels '"Label:\n"'
 
 if enabled aarch64; then
+    check_as as_arch_directive ".arch armv8.2-a"
+
     enabled armv8 && check_insn armv8 'prfm   pldl1strm, [x0]'
     # internal assembler in clang 3.3 does not support this instruction
     enabled neon && check_insn neon 'ext   v0.8B, v0.8B, v1.8B, #1'
     enabled vfp  && check_insn vfp  'fmadd d0,    d0,    d1,    d2'
 
-    map 'enabled_any ${v}_external ${v}_inline || disable $v' $ARCH_EXT_LIST_ARM
+    archext_list="dotprod i8mm"
+    enabled dotprod && check_archext_insn dotprod armv8.2-a dotprod 'udot v0.4s, v0.16b, v0.16b'
+    enabled i8mm    && check_archext_insn i8mm    armv8.2-a i8mm    'usdot v0.4s, v0.16b, v0.16b'
+
+    # Disable the main feature (e.g. HAVE_NEON) if neither inline nor external
+    # assembly support the feature out of the box. Skip this for the features
+    # checked with check_archext_insn above; they are checked separately whether
+    # they can be built out of the box or enabled with an .arch_extension
+    # flag.
+    for v in $ARCH_EXT_LIST_ARM; do
+        is_in $v $archext_list && continue
+        enabled_any ${v}_external ${v}_inline || disable $v
+    done
 
 elif enabled alpha; then
 
@@ -6013,6 +6072,12 @@  EOF
         warn "Compiler does not indicate floating-point ABI, guessing $fpabi."
     fi
 
+    # Test for various instruction sets, testing support both in inline and
+    # external assembly. This sets the ${v}_inline or ${v}_external flags
+    # if the instruction can be used unconditionally in either inline or
+    # external assembly. This means that if the ${v}_external feature is set,
+    # that feature can be used unconditionally in various support macros
+    # anywhere in external assembly, in any function.
     enabled armv5te && check_insn armv5te 'qadd r0, r0, r0'
     enabled armv6   && check_insn armv6   'sadd16 r0, r0, r0'
     enabled armv6t2 && check_insn armv6t2 'movt r0, #0'
@@ -6021,6 +6086,14 @@  EOF
     enabled vfpv3   && check_insn vfpv3   'vmov.f32 s0, #1.0'
     enabled setend  && check_insn setend  'setend be'
 
+    # If neither inline nor external assembly can use the feature by default,
+    # disable the main unsuffixed feature (e.g. HAVE_NEON).
+    #
+    # For targets that support runtime CPU feature detection, don't disable
+    # the main feature flag - there we assume that all supported toolchains
+    # can assemble code for all instruction set features (e.g. NEON) with
+    # suitable assembly flags (such as ".fpu neon"); we don't check
+    # specifically that they really do.
     [ $target_os = linux ] || [ $target_os = android ] ||
         map 'enabled_any ${v}_external ${v}_inline || disable $v' \
             $ARCH_EXT_LIST_ARM
@@ -7601,6 +7674,8 @@  fi
 if enabled aarch64; then
     echo "NEON enabled              ${neon-no}"
     echo "VFP enabled               ${vfp-no}"
+    echo "DOTPROD enabled           ${dotprod-no}"
+    echo "I8MM enabled              ${i8mm-no}"
 fi
 if enabled arm; then
     echo "ARMv5TE enabled           ${armv5te-no}"
diff --git a/libavutil/aarch64/asm.S b/libavutil/aarch64/asm.S
index a7782415d7..7cf907f93c 100644
--- a/libavutil/aarch64/asm.S
+++ b/libavutil/aarch64/asm.S
@@ -36,6 +36,19 @@ 
 #   define __has_feature(x) 0
 #endif
 
+#if HAVE_AS_ARCH_DIRECTIVE
+#if HAVE_AS_ARCHEXT_DOTPROD_NEEDS_ARCH || HAVE_AS_ARCHEXT_I8MM_NEEDS_ARCH
+        .arch           armv8.2-a
+#endif
+#endif
+
+#if HAVE_AS_ARCHEXT_DOTPROD_DIRECTIVE
+        .arch_extension dotprod
+#endif
+#if HAVE_AS_ARCHEXT_I8MM_DIRECTIVE
+        .arch_extension i8mm
+#endif
+
 
 /* Support macros for
  *   - Armv8.3-A Pointer Authentication and