Message ID | 20230526080315.83424-1-martin@martin.st |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel,1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions | expand |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | success | Make fate finished |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
Le perjantaina 26. toukokuuta 2023, 11.03.12 EEST Martin Storsjö a écrit : > These are available since ARMv8.4-a and ARMv8.6-a respectively, > but can also be available optionally since ARMv8.2-a. > > Check if these are available for use unconditionally (e.g. if compiling > with -march=armv8.6-a), or if they can be enabled with specific > assembler directives. > > Use ".arch_extension <ext>" for enabling a specific extension in > assembly; the same can also be achieved with ".arch armv8.2-a+<ext>", > but with .arch_extension is easier to combine multiple separate > features. > > Enabling these extensions requires setting a base architecture level > of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary; > if the base level is high enough (which might unlock other extensions > without .arch_extension), we don't want to lower it. I don't follow how that would actually happen, TBH. Even if the default target version is, say, 8.5, the assembler won't magically start emitting 8.5 instructions. Someone would have to write assembler code that would fail to build under a toolchain with a lower target version. That sounds like a bug that should be spotted and fixed, rather than papered over. Conversely the logic here seems unnecessarily, if not counter-productively, intricate. > Only add .arch/.arch_extension if needed, e.g. current clang fails > to recognize the dotprod and i8mm features in .arch_extension, but > can successfully assemble these instructions if part of the baseline > set with -march. IME, Clang is utterly useless for assembling. That has become one of my pet peeves with Rust inline assembler, which is other much nicer than C inline assembler, whence you can't just work around it with `-no-integrated-as`. But I digress. If the problem is to avoid `.arch_extension`, then I don't really see why you can't just use `.arch` with plus, and simplify a lot.
On Sat, 27 May 2023, Rémi Denis-Courmont wrote: > Le perjantaina 26. toukokuuta 2023, 11.03.12 EEST Martin Storsjö a écrit : >> These are available since ARMv8.4-a and ARMv8.6-a respectively, >> but can also be available optionally since ARMv8.2-a. >> >> Check if these are available for use unconditionally (e.g. if compiling >> with -march=armv8.6-a), or if they can be enabled with specific >> assembler directives. >> >> Use ".arch_extension <ext>" for enabling a specific extension in >> assembly; the same can also be achieved with ".arch armv8.2-a+<ext>", >> but with .arch_extension is easier to combine multiple separate >> features. >> >> Enabling these extensions requires setting a base architecture level >> of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary; >> if the base level is high enough (which might unlock other extensions >> without .arch_extension), we don't want to lower it. > > I don't follow how that would actually happen, TBH. Even if the default target > version is, say, 8.5, the assembler won't magically start emitting 8.5 > instructions. > > Someone would have to write assembler code that would fail to build under a > toolchain with a lower target version. That sounds like a bug that should be > spotted and fixed, rather than papered over. I don't see how anything here suggests papering over such an issue? I'm not sure exactly which parts of the message you refer to here, but I'll elaborate on the point about why we only should set .arch if we really need to. Consider a build configuration with -march=armv8.4-a. We test that the dotprod extension is available and usable without adding any directives - so we won't add any directives for that. We also test that the assembler does support i8mm, with ".arch armv8.2-a" plus ".arch_extension i8mm". But if we do add ".arch armv8.2-a" and ".arch_extension i8mm", then we break the dotprod extension. If we only add ".arch_extension i8mm" without the .arch directive, we get what we want to though. > If the problem is to avoid `.arch_extension`, then I don't really see > why you can't just use `.arch` with plus, and simplify a lot. Well Clang doesn't quite support that currently either. For ".arch_extension dotprod" it errors out since it doesn't recognize the dotprod feature in that directive. It does accept ".arch armv8.2-a+dotprod" but it doesn't actually unlock using the dotprod extension in the assembly despite that. (I'll look into fixing this in upstream LLVM afterwards.) As Clang/LLVM has these limitations/issues currently, one main design criterion here is that we shouldn't add any extra .arch/.arch_extension directives unless we need and can (and gain some instruction support from it). Taking it back to the drawing board: So for enabling e.g. i8mm, we could either do .arch armv8.6-a or .arch armv8.2-a+dotprod or .arch armv8.2 .arch_extension dotprod Out of these, I initially preferred doing the third approach. There's no functional difference between the second and third one, except the single-line form is more messy to handle, as we can have various combinations of what actually is supported. And with the single-line .arch form, we can't just add e.g. i8mm on top of a -march= setting that already supports dotprod, without respecifying what the toolchain itself defaults to. The documentation for .arch_extension hints at it being possible to disable support for extensions with it too, but that doesn't seem to be the case in practice. If it was, we could add macros to only enable specifically the extensions we want around those functions that should use them and nothing more. But I guess if that's not actually supported we can't do that. I guess the alternative would be to just try to set .arch <highest-supported-that-we-care-about>. I was worried that support for e.g. armv8.6-a appeared later in toolchains than support for the individual extension i8mm, but at least from a quick browse in binutils history, they seem to have been added at the same time, so there's probably no such drawback. Or what's the situation with e.g. SVE2 - was ".arch_extension sve2" supported significantly earlier than ".arch armv9-a"? It looks like binutils learnt about sve2 in 2019, but about armv9-a in 2021? OTOH that's probably not too much of a real issue either. If we'd do that, it does simplify the configure logic a fair bit and reduces the number of configure variables we need by a lot. It does enable a few more instruction set extensions than what we need though, but that's probably not a real issue. // Martin
Le sunnuntaina 28. toukokuuta 2023, 0.34.15 EEST Martin Storsjö a écrit : > > Someone would have to write assembler code that would fail to build under > > a toolchain with a lower target version. That sounds like a bug that > > should be spotted and fixed, rather than papered over. > > I don't see how anything here suggests papering over such an issue? For instance, say the toolchain, or the FFmpeg build flags, sets ARMv8.6 as target architecture. It makes perfect sense for the C compiler to emit non- hint ARMv8.3 PAuth instructions, which would crash on an ARMv8.0-8.2 processor. But it makes zero sense for the assembler to change its behaviour. So all existing ASIMD assembler files written for ARMv8.0-A, should still generate ARMv8.0-A code. But now somebody can accidentally write an ARMv8.1-8.6 instruction into their ARMv8.0 assembler code, and it will not trigger a build error (to them). There are no clear reasons to trying to avoid *lowering* the target version for *assembler* (as opposed to C). If a file needs ARMv8.2, it should just ARMv8.2 as its target, whether the overall build targets ARMv8.0, ARMv8.2 or ARMv8.6. The C code around it should ensure that the requirements are met. There are no benefits to preserving a higher version; it just makes the configure checks needlessly intricate. > if we do add ".arch armv8.2-a" and ".arch_extension i8mm", then we > break the dotprod extension. If we only add ".arch_extension i8mm" without > the .arch directive, we get what we want to though. Yes, but you can also just set `.arch armv8.4-a`, which is ostensibly The Way to enable DotProd on a craptastic assembler. If even that doesn't work, the assembler presumably doesn't support DotProd at all (or it's some obscure unstable development version that was snapshot between adding DotProd and adding complete ARMv8.4, which we really should not care about). Likewise, for I8MM, you can just do `.arch armv8.6-a`. So what if the target version was 8.7 or 9.2? Well, in an assembler file, we don't really care, as noted above. > > If the problem is to avoid `.arch_extension`, then I don't really see > > why you can't just use `.arch` with plus, and simplify a lot. > > Well Clang doesn't quite support that currently either. For > ".arch_extension dotprod" it errors out since it doesn't recognize the > dotprod feature in that directive. It does accept ".arch > armv8.2-a+dotprod" but it doesn't actually unlock using the dotprod > extension in the assembly despite that. (I'll look into fixing this in > upstream LLVM afterwards.) I don't know if `.arch_extension` is specified by Arm or just a GCCism. But if you accept DotProd in `.arch` in arch and then don't enable it, then that's clearly a bug. But then again, that's moot if you can just do `.arch armv8.4- a` instead. > Taking it back to the drawing board: So for enabling e.g. i8mm, we could > either do > .arch armv8.6-a > or > .arch armv8.2-a+dotprod > or > .arch armv8.2 > .arch_extension dotprod > > > Out of these, I initially preferred doing the third approach. I agree that that's the cleanest option. But that's not the point here. The point is that what this patch is a hell of a lot more involved than doing just that, and it seems unnecessarily intricate for the purpose of enabling DotProd. > There's no functional difference between the second and third one, except > the single-line form is more messy to handle, as we can have various > combinations of what actually is supported. Yes but a pile of #if's is even more messy to handle. > And with the single-line .arch > form, we can't just add e.g. i8mm on top of a -march= setting that already > supports dotprod, without respecifying what the toolchain itself defaults > to. > > > The documentation for .arch_extension hints at it being possible to > disable support for extensions with it too, but that doesn't seem to be > the case in practice. If it was, we could add macros to only enable > specifically the extensions we want around those functions that should use > them and nothing more. But I guess if that's not actually supported we > can't do that. GCC supports it with RV64 (`.option arch` plus/minus). I haven't tried it on AArch64. Of course, LLVM does not support it, even though the patch has been out there for a long time and even though it is part of the ABI spec rather than made up by GNU/binutils. Thanks to that, RVV is unusable on LLVM (Linux kernel specifically requires GNU/as due to that). > I guess the alternative would be to just try to set .arch > <highest-supported-that-we-care-about>. I was worried that support for > e.g. armv8.6-a appeared later in toolchains than support for the > individual extension i8mm, but at least from a quick browse in binutils > history, they seem to have been added at the same time, so there's > probably no such drawback. > > Or what's the situation with e.g. SVE2 - was ".arch_extension sve2" > supported significantly earlier than ".arch armv9-a"? I have not tested SVE on LLVM. AFAIK, SVE and SVE2 are optional from 8.2 and 9.0 onward respectively, and not mandatory in any version, so if your toolchain supports neither .arch with plus sign, nor .arch_extension, it is game over. > If we'd do that, it does simplify the configure logic a fair bit and > reduces the number of configure variables we need by a lot. It does enable > a few more instruction set extensions than what we need though, but that's > probably not a real issue. Yes.
On Sun, 28 May 2023, Rémi Denis-Courmont wrote: > Le sunnuntaina 28. toukokuuta 2023, 0.34.15 EEST Martin Storsjö a écrit : > >> I guess the alternative would be to just try to set .arch >> <highest-supported-that-we-care-about>. I was worried that support for >> e.g. armv8.6-a appeared later in toolchains than support for the >> individual extension i8mm, but at least from a quick browse in binutils >> history, they seem to have been added at the same time, so there's >> probably no such drawback. >> >> Or what's the situation with e.g. SVE2 - was ".arch_extension sve2" >> supported significantly earlier than ".arch armv9-a"? > > I have not tested SVE on LLVM. AFAIK, SVE and SVE2 are optional from 8.2 and > 9.0 onward respectively, and not mandatory in any version, so if your > toolchain supports neither .arch with plus sign, nor .arch_extension, it is > game over. I didn't meant specifically whether LLVM supports it here, just in general wrt binutils and how to enable the feature. FWIW it seems like SVE2 is a mandatory part of 9.0 - assembling SVE2 instructions can be done with ".arch armv9-a". But there are about 2 years worth of deployed binutils based toolchains that do recognize ".arch armv8.2-a; .arch_extension sve2" but don't recognize ".arch armv9-a". So for the generic mechanism for enabling cpu features, I'd prefer to keep the mechanism using primarily .arch_extension (with .arch set as high as necessary) rather than relying solely on .arch <version> without any extra +<feature>. >> If we'd do that, it does simplify the configure logic a fair bit and >> reduces the number of configure variables we need by a lot. It does enable >> a few more instruction set extensions than what we need though, but that's >> probably not a real issue. > > Yes. I made an attempt at simplifying the logic in configure and asm.S somewhat, while still primarily using .arch_extension, and while making sure we still can get the features assembled with current Clang with a high enough -march= setting. (Runtime enabled features are out of scope for Clang for now as we don't want to try to pass individual higher -march= options to the individual assembly files.) // Martin
Le tiistaina 30. toukokuuta 2023, 15.25.25 EEST Martin Storsjö a écrit : > On Sun, 28 May 2023, Rémi Denis-Courmont wrote: > > Le sunnuntaina 28. toukokuuta 2023, 0.34.15 EEST Martin Storsjö a écrit : > >> I guess the alternative would be to just try to set .arch > >> <highest-supported-that-we-care-about>. I was worried that support for > >> e.g. armv8.6-a appeared later in toolchains than support for the > >> individual extension i8mm, but at least from a quick browse in binutils > >> history, they seem to have been added at the same time, so there's > >> probably no such drawback. > >> > >> Or what's the situation with e.g. SVE2 - was ".arch_extension sve2" > >> supported significantly earlier than ".arch armv9-a"? > > > > I have not tested SVE on LLVM. AFAIK, SVE and SVE2 are optional from 8.2 > > and 9.0 onward respectively, and not mandatory in any version, so if your > > toolchain supports neither .arch with plus sign, nor .arch_extension, it > > is game over. > > I didn't meant specifically whether LLVM supports it here, just in general > wrt binutils and how to enable the feature. > > FWIW it seems like SVE2 is a mandatory part of 9.0 Yes and no. SVE requires ARMv8.2, and SVE2 requires ARMv9.0, but neither are ever required by any existing ARM version. DDI0487 is abundantly clear that SVE2 is OPTIONAL: "[The Scalable Vector Extension version 2 (SVE2)] feature is supported in AArch64 state only. This feature is OPTIONAL in an Armv9.0 implementation," However it adds that "standard Armv9-A software platforms support FEAT_SVE2." > - assembling SVE2 instructions can be done with ".arch armv9-a". Presumably binutils targets "standard Armv9-A software platforms" by default, as opposed to just any "Armv9.0 implementation". I guess that means that any Cortex, Neoverse or other proper ARM design will include SVE2, but big-ass architecture licensees such as APPL and NVDA are not required to include SVE2 in their own designs. > But there are about 2 years > worth of deployed binutils based toolchains that do recognize ".arch > armv8.2-a; .arch_extension sve2" but don't recognize ".arch armv9-a". This is not entirely surprising, since SVE is much older than ARMv9, even if SVE2 requires ARMv9.
On Sun, 28 May 2023, Martin Storsjö wrote: > The documentation for .arch_extension hints at it being possible to disable > support for extensions with it too, but that doesn't seem to be the case in > practice. If it was, we could add macros to only enable specifically the > extensions we want around those functions that should use them and nothing > more. But I guess if that's not actually supported we can't do that. Actually, yes, this does work, it just uses a different syntax than I expected. To disable the extension <ext>, one can write the directive ".arch_extension no<ext>". Is the updated, less complex version of the configure patch more tolerable? // Martin
diff --git a/configure b/configure index 87f7afc2e1..3c7473efb2 100755 --- a/configure +++ b/configure @@ -454,6 +454,8 @@ Optimization options (experts only): --disable-armv6t2 disable armv6t2 optimizations --disable-vfp disable VFP optimizations --disable-neon disable NEON optimizations + --disable-dotprod disable DOTPROD optimizations + --disable-i8mm disable I8MM optimizations --disable-inline-asm disable use of inline assembly --disable-x86asm disable use of standalone x86 assembly --disable-mipsdsp disable MIPS DSP ASE R1 optimizations @@ -1154,6 +1156,41 @@ check_insn(){ check_as ${1}_external "$2" } +check_archext_insn(){ + log check_archext_insn "$@" + feature="$1" + base_arch="$2" + archext="$3" + instr="$4" + # Check if the assembly is accepted unconditionally in either inline or + # external assembly. + check_inline_asm ${feature}_inline "\"$instr\"" + check_as ${feature}_external "$instr" + + enabled_any ${feature}_inline ${feature}_external || disable ${feature} + + if disabled ${feature}_external; then + # If not accepted unconditionally, check if we can assemble it + # with a suitable .arch_extension directive. + test_as <<EOF && enable ${feature} as_archext_${archext}_directive +.arch_extension $archext +$instr +EOF + if disabled ${feature}; then + # If the base arch level is too low, .arch_extension can require setting + # a higher arch level with .arch too. Only do this if strictly needed; + # if the base level is e.g. arvm8.4-a and some features are available + # without any .arch_extension, we don't want to set ".arch armv8.2-a" + # for some other .arch_extension. + test_as <<EOF && enable ${feature} as_archext_${archext}_directive as_archext_${archext}_needs_arch +.arch $base_arch +.arch_extension $archext +$instr +EOF + fi + fi +} + check_x86asm(){ log check_x86asm "$@" name=$1 @@ -2059,6 +2096,8 @@ ARCH_EXT_LIST_ARM=" armv6 armv6t2 armv8 + dotprod + i8mm neon vfp vfpv3 @@ -2322,6 +2361,10 @@ SYSTEM_LIBRARIES=" TOOLCHAIN_FEATURES=" as_arch_directive + as_archext_dotprod_directive + as_archext_dotprod_needs_arch + as_archext_i8mm_directive + as_archext_i8mm_needs_arch as_dn_directive as_fpu_directive as_func @@ -2622,6 +2665,8 @@ intrinsics_neon_deps="neon" vfp_deps_any="aarch64 arm" vfpv3_deps="vfp" setend_deps="arm" +dotprod_deps="aarch64 neon" +i8mm_deps="aarch64 neon" map 'eval ${v}_inline_deps=inline_asm' $ARCH_EXT_LIST_ARM @@ -5979,12 +6024,26 @@ check_inline_asm inline_asm_labels '"1:\n"' check_inline_asm inline_asm_nonlocal_labels '"Label:\n"' if enabled aarch64; then + check_as as_arch_directive ".arch armv8.2-a" + enabled armv8 && check_insn armv8 'prfm pldl1strm, [x0]' # internal assembler in clang 3.3 does not support this instruction enabled neon && check_insn neon 'ext v0.8B, v0.8B, v1.8B, #1' enabled vfp && check_insn vfp 'fmadd d0, d0, d1, d2' - map 'enabled_any ${v}_external ${v}_inline || disable $v' $ARCH_EXT_LIST_ARM + archext_list="dotprod i8mm" + enabled dotprod && check_archext_insn dotprod armv8.2-a dotprod 'udot v0.4s, v0.16b, v0.16b' + enabled i8mm && check_archext_insn i8mm armv8.2-a i8mm 'usdot v0.4s, v0.16b, v0.16b' + + # Disable the main feature (e.g. HAVE_NEON) if neither inline nor external + # assembly support the feature out of the box. Skip this for the features + # checked with check_archext_insn above; they are checked separately whether + # they can be built out of the box or enabled with an .arch_extension + # flag. + for v in $ARCH_EXT_LIST_ARM; do + is_in $v $archext_list && continue + enabled_any ${v}_external ${v}_inline || disable $v + done elif enabled alpha; then @@ -6013,6 +6072,12 @@ EOF warn "Compiler does not indicate floating-point ABI, guessing $fpabi." fi + # Test for various instruction sets, testing support both in inline and + # external assembly. This sets the ${v}_inline or ${v}_external flags + # if the instruction can be used unconditionally in either inline or + # external assembly. This means that if the ${v}_external feature is set, + # that feature can be used unconditionally in various support macros + # anywhere in external assembly, in any function. enabled armv5te && check_insn armv5te 'qadd r0, r0, r0' enabled armv6 && check_insn armv6 'sadd16 r0, r0, r0' enabled armv6t2 && check_insn armv6t2 'movt r0, #0' @@ -6021,6 +6086,14 @@ EOF enabled vfpv3 && check_insn vfpv3 'vmov.f32 s0, #1.0' enabled setend && check_insn setend 'setend be' + # If neither inline nor external assembly can use the feature by default, + # disable the main unsuffixed feature (e.g. HAVE_NEON). + # + # For targets that support runtime CPU feature detection, don't disable + # the main feature flag - there we assume that all supported toolchains + # can assemble code for all instruction set features (e.g. NEON) with + # suitable assembly flags (such as ".fpu neon"); we don't check + # specifically that they really do. [ $target_os = linux ] || [ $target_os = android ] || map 'enabled_any ${v}_external ${v}_inline || disable $v' \ $ARCH_EXT_LIST_ARM @@ -7601,6 +7674,8 @@ fi if enabled aarch64; then echo "NEON enabled ${neon-no}" echo "VFP enabled ${vfp-no}" + echo "DOTPROD enabled ${dotprod-no}" + echo "I8MM enabled ${i8mm-no}" fi if enabled arm; then echo "ARMv5TE enabled ${armv5te-no}" diff --git a/libavutil/aarch64/asm.S b/libavutil/aarch64/asm.S index a7782415d7..7cf907f93c 100644 --- a/libavutil/aarch64/asm.S +++ b/libavutil/aarch64/asm.S @@ -36,6 +36,19 @@ # define __has_feature(x) 0 #endif +#if HAVE_AS_ARCH_DIRECTIVE +#if HAVE_AS_ARCHEXT_DOTPROD_NEEDS_ARCH || HAVE_AS_ARCHEXT_I8MM_NEEDS_ARCH + .arch armv8.2-a +#endif +#endif + +#if HAVE_AS_ARCHEXT_DOTPROD_DIRECTIVE + .arch_extension dotprod +#endif +#if HAVE_AS_ARCHEXT_I8MM_DIRECTIVE + .arch_extension i8mm +#endif + /* Support macros for * - Armv8.3-A Pointer Authentication and