diff mbox series

[FFmpeg-devel] fate.sh: Allow overriding what targets to make for running the tests

Message ID 20231127123118.3622784-1-martin@martin.st
State Accepted
Commit d0e215fffcb4cf0fb9876bf5722dacdee71580be
Headers show
Series [FFmpeg-devel] fate.sh: Allow overriding what targets to make for running the tests | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Martin Storsjö Nov. 27, 2023, 12:31 p.m. UTC
This can be useful if doing testing of uncommon CPU extensions by
running tests with QEMU (by configuring with e.g.
"target_exec=qemu-aarch64"), by only running the checkasm tests,
to get a reasonable test coverage without excessive test runtime.

For such a config, setting fate_targets="fate-checkasm fate-cpu"
can be a good tradeoff.
---
 doc/fate_config.sh.template | 2 ++
 tests/fate.sh               | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

Comments

Rémi Denis-Courmont Nov. 27, 2023, 3:46 p.m. UTC | #1
Le maanantaina 27. marraskuuta 2023, 14.31.18 EET Martin Storsjö a écrit :
> This can be useful if doing testing of uncommon CPU extensions by
> running tests with QEMU (by configuring with e.g.
> "target_exec=qemu-aarch64"), by only running the checkasm tests,
> to get a reasonable test coverage without excessive test runtime.

For the purpose of testing future or bleeding-edge CPU extensions on emulator, 
you would normally want to be able to actually filter those in. That is more of 
a matter of patching checkasm than FATE.

Considering the poor coverage of checkasm, I fear that this just gives the 
wrong impression, not to say a false sense of security. It feels misleading to 
encourage or support that paradigm into FATE, in light of that poor coverage. 
Afterall, if it's just about running checkasm, anybody can just run
`make tests/checkasm/checkasm && tests/checkasm/checkasm`.

Either way, this feels like a case of cart before horse.

Also FWIW, RV broke due to misaligned accesses and illegal vector types that 
QEMU tolerated. That is rather an argument against QEMU than against this MR 
but still.
Martin Storsjö Nov. 27, 2023, 9:55 p.m. UTC | #2
On Mon, 27 Nov 2023, Rémi Denis-Courmont wrote:

> Le maanantaina 27. marraskuuta 2023, 14.31.18 EET Martin Storsjö a écrit :
>> This can be useful if doing testing of uncommon CPU extensions by
>> running tests with QEMU (by configuring with e.g.
>> "target_exec=qemu-aarch64"), by only running the checkasm tests,
>> to get a reasonable test coverage without excessive test runtime.
>
> For the purpose of testing future or bleeding-edge CPU extensions on 
> emulator, you would normally want to be able to actually filter those 
> in. That is more of a matter of patching checkasm than FATE.

Sorry, can you elaborate on what you mean with "filter those in" here?

> Considering the poor coverage of checkasm, I fear that this just gives 
> the wrong impression, not to say a false sense of security. It feels 
> misleading to encourage or support that paradigm into FATE, in light of 
> that poor coverage. Afterall, if it's just about running checkasm, 
> anybody can just run `make tests/checkasm/checkasm && 
> tests/checkasm/checkasm`.

Yes, anybody can run that - but having those results posted continuously 
somewhere where other can see them can be valuable as well.

Anyway, the concrete case I'm considering, is that we've got AArch64 code 
merged, that uses the I8MM extensions. We don't have any FATE 
configuration that continuously test that. Whenever there are patches, I 
do spin up a cloud instance that supports this extension and test the 
patches there, but inbetween that we're pretty much blind.

While checkasm's coverage isn't fantastic, for this particular case I'm 
not merging any AArch64 code for new extensions unless that code is 
covered by checkasm.

The other AArch64 feature that we do have code for, which also is 
untested, is the assembly support for branch protection and pointer 
authentication. Also this is testable pretty easily with QEMU. It's of 
course more interesting to run the full fate suite, but if we're not 
looking for bugs in the compiler but only for bugs in our assembly, then 
checkasm should cover most of it.

Yes there's potential for QEMU bugs hiding real issues, but I'd rather 
have a regular run of QEMU+checkasm than not have it tested at all. And 
I'm volunteering HW+time for testing these cases with QEMU for whatever 
checkasm covers, but I'm not volunteering it for full fate runs with QEMU.

And sure, I can just run such configs privately, as I already do run a 
bunch of various regular builds for projects I care about - but as we do 
have FATE with the public status page, hooking it up to be reported there 
would feel like added value for everybody.

// Martin
Alexander Strasser Nov. 27, 2023, 10:10 p.m. UTC | #3
Hi Martin,

patch looks technically good to me.

On 2023-11-27 17:46 +0200, Rémi Denis-Courmont wrote:
> Le maanantaina 27. marraskuuta 2023, 14.31.18 EET Martin Storsjö a écrit :
> > This can be useful if doing testing of uncommon CPU extensions by
> > running tests with QEMU (by configuring with e.g.
> > "target_exec=qemu-aarch64"), by only running the checkasm tests,
> > to get a reasonable test coverage without excessive test runtime.
>
> For the purpose of testing future or bleeding-edge CPU extensions on emulator,
> you would normally want to be able to actually filter those in. That is more of
> a matter of patching checkasm than FATE.
>
> Considering the poor coverage of checkasm, I fear that this just gives the
> wrong impression, not to say a false sense of security. It feels misleading to
> encourage or support that paradigm into FATE, in light of that poor coverage.
> Afterall, if it's just about running checkasm, anybody can just run
> `make tests/checkasm/checkasm && tests/checkasm/checkasm`.

Agree, the practice should not be encouraged over cases where it is
necessary to be practical, but having fate clients run the tests and
submit the results seems still worthwhile to me.

We shouldn't get a false sense of security and it can be a problem.
Improving checkasm and qemu and maybe getting actual bleeding edge
hardware into fate could help with that. Not saying it is easy or
effortless...

Still having tests run and some failures detected seems better
than nothing to me. It would be only problematic if the false
positives or negatives weigh out the cases where it is helpful.

@Remi: Please take no offense. This does not look like a black
and white issue to me , thus I wanted to voice what I thought
when initially saw the patch and your response.


Best regards,
  Alexander


> Either way, this feels like a case of cart before horse.
>
> Also FWIW, RV broke due to misaligned accesses and illegal vector types that
> QEMU tolerated. That is rather an argument against QEMU than against this MR
> but still.
Michael Niedermayer Nov. 27, 2023, 11:22 p.m. UTC | #4
On Mon, Nov 27, 2023 at 05:46:40PM +0200, Rémi Denis-Courmont wrote:
[...]
> Also FWIW, RV broke due to misaligned accesses and illegal vector types that
> QEMU tolerated. That is rather an argument against QEMU than against this MR 
> but still.

has someone reported this to qemu ?
(seems like a bug)

thx

[...]
Rémi Denis-Courmont Nov. 28, 2023, 7:27 a.m. UTC | #5
Le 28 novembre 2023 01:22:14 GMT+02:00, Michael Niedermayer <michael@niedermayer.cc> a écrit :
>On Mon, Nov 27, 2023 at 05:46:40PM +0200, Rémi Denis-Courmont wrote:
>[...]
>> Also FWIW, RV broke due to misaligned accesses and illegal vector types that
>> QEMU tolerated. That is rather an argument against QEMU than against this MR 
>> but still.
>
>has someone reported this to qemu ?
>(seems like a bug)

It's not a bug. The specification leaves those cases *undefined*. QEMU supports them because they can, and adding sanity checks would just slow stuff down.

Also generally QEMU TCG policy seems to be maximize perf and compatibility, not formal correctness.

>
>thx
>
>[...]
Michael Niedermayer Nov. 28, 2023, 2:21 p.m. UTC | #6
On Tue, Nov 28, 2023 at 09:27:08AM +0200, Rémi Denis-Courmont wrote:
> 
> 
> Le 28 novembre 2023 01:22:14 GMT+02:00, Michael Niedermayer <michael@niedermayer.cc> a écrit :
> >On Mon, Nov 27, 2023 at 05:46:40PM +0200, Rémi Denis-Courmont wrote:
> >[...]
> >> Also FWIW, RV broke due to misaligned accesses and illegal vector types that
> >> QEMU tolerated. That is rather an argument against QEMU than against this MR 
> >> but still.
> >
> >has someone reported this to qemu ?
> >(seems like a bug)
> 
> It's not a bug. The specification leaves those cases *undefined*. QEMU supports them because they can, and adding sanity checks would just slow stuff down.
> 
> Also generally QEMU TCG policy seems to be maximize perf and compatibility, not formal correctness.

I think i read somewhere that recent qemu supposedly checks alignment on arm
more completely. But i couldnt quickly find a official statement about that

But either way, qemu could emit such code optionally when it is used for
testing. Which is one of the things people use qemu for.
So IMHO it would make sense for qemu to detect cases that are undefined
even if for no other reason than to emulate the hw more exactly.
If this is not done, qemu can be detected and code could refuse or
fail to run

thx


[...]
Martin Storsjö Nov. 30, 2023, 11:05 a.m. UTC | #7
On Mon, 27 Nov 2023, Martin Storsjö wrote:

> Anyway, the concrete case I'm considering, is that we've got AArch64 code 
> merged, that uses the I8MM extensions. We don't have any FATE configuration 
> that continuously test that. Whenever there are patches, I do spin up a cloud 
> instance that supports this extension and test the patches there, but 
> inbetween that we're pretty much blind.
>
> While checkasm's coverage isn't fantastic, for this particular case I'm not 
> merging any AArch64 code for new extensions unless that code is covered by 
> checkasm.
>
> The other AArch64 feature that we do have code for, which also is untested, 
> is the assembly support for branch protection and pointer authentication. 
> Also this is testable pretty easily with QEMU. It's of course more 
> interesting to run the full fate suite, but if we're not looking for bugs in 
> the compiler but only for bugs in our assembly, then checkasm should cover 
> most of it.

So if you don't mind, I'd like to go ahead and push this, and set up those 
instances so that we do get some continuous testing of the corners of our 
aarch64 assembly that we otherwise only test occasionally.

// Martin
Rémi Denis-Courmont Nov. 30, 2023, 2:23 p.m. UTC | #8
Le 27 novembre 2023 23:55:18 GMT+02:00, "Martin Storsjö" <martin@martin.st> a écrit :
>On Mon, 27 Nov 2023, Rémi Denis-Courmont wrote:
>
>> Le maanantaina 27. marraskuuta 2023, 14.31.18 EET Martin Storsjö a écrit :
>>> This can be useful if doing testing of uncommon CPU extensions by
>>> running tests with QEMU (by configuring with e.g.
>>> "target_exec=qemu-aarch64"), by only running the checkasm tests,
>>> to get a reasonable test coverage without excessive test runtime.
>> 
>> For the purpose of testing future or bleeding-edge CPU extensions on emulator, you would normally want to be able to actually filter those in. That is more of a matter of patching checkasm than FATE.
>
>Sorry, can you elaborate on what you mean with "filter those in" here?

You're running all checkasm tests, not just those that require the emulator.

But what's potentially much worse is that you're triggering a whole build, or it's not entirely clear from the description how you'd reuse an existing build.

For Armv8, that's just bad. For RV, that's terrible, as we need to run the same checkasm with different emulator configuration (different $QEMU_CPU in the case of QEMU): one per vector length. Armv9 will potentially have the same problem if FFmpeg grows SVE(2) support.

>
>> Considering the poor coverage of checkasm, I fear that this just gives the wrong impression, not to say a false sense of security. It feels misleading to encourage or support that paradigm into FATE, in light of that poor coverage. Afterall, if it's just about running checkasm, anybody can just run `make tests/checkasm/checkasm && tests/checkasm/checkasm`.
>
>Yes, anybody can run that - but having those results posted continuously somewhere where other can see them can be valuable as well.
>
>Anyway, the concrete case I'm considering, is that we've got AArch64 code merged, that uses the I8MM extensions. We don't have any FATE configuration that continuously test that. Whenever there are patches, I do spin up a cloud instance that supports this extension and test the patches there, but inbetween that we're pretty much blind.
>
>While checkasm's coverage isn't fantastic, for this particular case I'm not merging any AArch64 code for new extensions unless that code is covered by checkasm.
>
>The other AArch64 feature that we do have code for, which also is untested, is the assembly support for branch protection and pointer authentication. Also this is testable pretty easily with QEMU. It's of course more interesting to run the full fate suite, but if we're not looking for bugs in the compiler but only for bugs in our assembly, then checkasm should cover most of it.
>
>Yes there's potential for QEMU bugs hiding real issues, but I'd rather have a regular run of QEMU+checkasm than not have it tested at all. And I'm volunteering HW+time for testing these cases with QEMU for whatever checkasm covers, but I'm not volunteering it for full fate runs with QEMU.
>
>And sure, I can just run such configs privately, as I already do run a bunch of various regular builds for projects I care about - but as we do have FATE with the public status page, hooking it up to be reported there would feel like added value for everybody.
>
>// Martin
>_______________________________________________
>ffmpeg-devel mailing list
>ffmpeg-devel@ffmpeg.org
>https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>To unsubscribe, visit link above, or email
>ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Martin Storsjö Nov. 30, 2023, 3:34 p.m. UTC | #9
On Thu, 30 Nov 2023, Rémi Denis-Courmont wrote:

> Le 27 novembre 2023 23:55:18 GMT+02:00, "Martin Storsjö" <martin@martin.st> a écrit :
>> On Mon, 27 Nov 2023, Rémi Denis-Courmont wrote:
>>
>>> Le maanantaina 27. marraskuuta 2023, 14.31.18 EET Martin Storsjö a écrit :
>>>> This can be useful if doing testing of uncommon CPU extensions by
>>>> running tests with QEMU (by configuring with e.g.
>>>> "target_exec=qemu-aarch64"), by only running the checkasm tests,
>>>> to get a reasonable test coverage without excessive test runtime.
>>> 
>>> For the purpose of testing future or bleeding-edge CPU extensions on emulator, you would normally want to be able to actually filter those in. That is more of a matter of patching checkasm than FATE.
>>
>> Sorry, can you elaborate on what you mean with "filter those in" here?
>
> You're running all checkasm tests, not just those that require the 
> emulator.
>
> But what's potentially much worse is that you're triggering a whole 
> build, or it's not entirely clear from the description how you'd reuse 
> an existing build.

Yeah, I wouldn't reuse an existing build here. For the setup I have in 
mind, one build doesn't take too horribly long (either on an old desktop 
x86 machine, or a moderate aarch64 server) - so it's not ideal but not a 
dealbreaker anyway (while running all of fate with qemu takes one 
magnitude longer).

For the other setup I intended to test, to test AArch64 PAC and BTI, I 
would do a separate build with -mbranch-protection=standard anyway.

> For Armv8, that's just bad. For RV, that's terrible, as we need to run 
> the same checkasm with different emulator configuration (different 
> $QEMU_CPU in the case of QEMU): one per vector length. Armv9 will 
> potentially have the same problem if FFmpeg grows SVE(2) support.

Yes, for SVE I would ideally like to test all vector lengths (I did such a 
setup for x264 recently, when someone was proposing some SVE codepaths). I 
don't have a neat idea for how to integrate that into FATE, and this patch 
doesn't buy us that indeed.

But running tests with the default QEMU settings would at least test with 
a larger vector length than the usual, so it would provide at least some 
coverage. Not exhaustive, but at least something.

So the setup I have in mind wouldn't cover all those cases, but it would 
at least fix some current gaps in testing coverage. I guess it's a case of 
whether we should let perfect be the enemy of good; adding this lets us 
easily add a fair bit of more coverage, in particular of the (new) 
handwritten asm. And it shouldn't get in the way of doing other better 
solutions at a later point.

// Martin
Rémi Denis-Courmont Nov. 30, 2023, 3:52 p.m. UTC | #10
Le tiistaina 28. marraskuuta 2023, 16.21.55 EET Michael Niedermayer a écrit :
> On Tue, Nov 28, 2023 at 09:27:08AM +0200, Rémi Denis-Courmont wrote:
> > Le 28 novembre 2023 01:22:14 GMT+02:00, Michael Niedermayer 
<michael@niedermayer.cc> a écrit :
> > >On Mon, Nov 27, 2023 at 05:46:40PM +0200, Rémi Denis-Courmont wrote:
> > >[...]
> > >
> > >> Also FWIW, RV broke due to misaligned accesses and illegal vector types
> > >> that QEMU tolerated. That is rather an argument against QEMU than
> > >> against this MR but still.
> > >
> > >has someone reported this to qemu ?
> > >(seems like a bug)
> > 
> > It's not a bug. The specification leaves those cases *undefined*. QEMU
> > supports them because they can, and adding sanity checks would just slow
> > stuff down.
> > 
> > Also generally QEMU TCG policy seems to be maximize perf and
> > compatibility, not formal correctness.
> I think i read somewhere that recent qemu supposedly checks alignment on arm
> more completely. But i couldnt quickly find a official statement about that

As of 8.2.0-rc2, it most definitely does not:

----8<----
static inline void gen_check_sp_alignment(DisasContext *s)
{
    /* The AArch64 architecture mandates that (if enabled via PSTATE
     * or SCTLR bits) there is a check that SP is 16-aligned on every
     * SP-relative load or store (with an exception generated if it is not).
     * In line with general QEMU practice regarding misaligned accesses,
     * we omit these checks for the sake of guest program performance.
     * This function is provided as a hook so we can more easily add these
     * checks in future (possibly as a "favour catching guest program bugs
     * over speed" user selectable option).
     */ 
}                                  
---->8----

And this is an actual violation of the specification. In the RISC-V case, QEMU 
is not even violating the specification, just making a different choice than the 
only one currently commercially available hardware implementation.

> But either way, qemu could emit such code optionally when it is used for
> testing. Which is one of the things people use qemu for.

That would be very true for system mode "soft-MMU" QEMU, but much more 
questionable for user mode. In any case, I don't make their policies.

> So IMHO it would make sense for qemu to detect cases that are undefined
> even if for no other reason than to emulate the hw more exactly.

I would agree that optional flags would be sensible. But TBH, we don't even yet 
know how the IPs from other vendors than Alibaba/T-Head will behave.

> If this is not done, qemu can be detected and code could refuse or
> fail to run
Rémi Denis-Courmont Nov. 30, 2023, 4:03 p.m. UTC | #11
Le torstaina 30. marraskuuta 2023, 17.34.31 EET Martin Storsjö a écrit :
> Yeah, I wouldn't reuse an existing build here. For the setup I have in
> mind, one build doesn't take too horribly long (either on an old desktop
> x86 machine, or a moderate aarch64 server) - so it's not ideal but not a
> dealbreaker anyway (while running all of fate with qemu takes one
> magnitude longer).

Well it's pretty much a deal breaker for Armv9 and RV. I can understand 
wanting to build on a comfy x86 server, but doing different builds just to 
change QEMU CPU flags is IMO inept.

Sure, we could just build once and run several times checkasm with a separate 
script, as I already pointed out. But then this patch is completely 
unnecessary.

> For the other setup I intended to test, to test AArch64 PAC and BTI, I
> would do a separate build with -mbranch-protection=standard anyway.

That does not make much sense to me. PAC and BTI should be enabled by default 
in compatibility mode (for ARMv8.0-8.2 builds) or noncompatibility mode (for 
ARMv8.3+ builds). The resulting code should be tested with and without PAC and 
with and without BTI.

Separate builds only might make sense if you want to do something more fancy 
with PAC, requiring the non-HINT instructions, but then that is beyond 
"standard" branch protection. For BTI, there are no reasons whatsoever to make 
separate builds; it's a literal waste of time and energy.
Martin Storsjö Nov. 30, 2023, 4:28 p.m. UTC | #12
On Thu, 30 Nov 2023, Rémi Denis-Courmont wrote:

> Le torstaina 30. marraskuuta 2023, 17.34.31 EET Martin Storsjö a écrit :
>> Yeah, I wouldn't reuse an existing build here. For the setup I have in
>> mind, one build doesn't take too horribly long (either on an old desktop
>> x86 machine, or a moderate aarch64 server) - so it's not ideal but not a
>> dealbreaker anyway (while running all of fate with qemu takes one
>> magnitude longer).
>
> Well it's pretty much a deal breaker for Armv9 and RV. I can understand 
> wanting to build on a comfy x86 server, but doing different builds just to 
> change QEMU CPU flags is IMO inept.

Yes. But for doing one single run with QEMU, I don't mind.

Again, for SVE, I'd rather have testing with 1 config (the default, which 
is longer vectors than one usually encounters in HW) rather than none at 
all. It won't catch every theoretical issue but practically would catch 
many things at least.

Are you volunteering to write FATE integration to run checkasm multiple 
times with different QEMU settings, so I can wait for that instead of 
having much improved public test coverage right now?

> Sure, we could just build once and run several times checkasm with a separate 
> script, as I already pointed out. But then this patch is completely 
> unnecessary.

Indeed, that's trivial to do for a private testing setup.

>> For the other setup I intended to test, to test AArch64 PAC and BTI, I
>> would do a separate build with -mbranch-protection=standard anyway.
>
> That does not make much sense to me. PAC and BTI should be enabled by default 
> in compatibility mode (for ARMv8.0-8.2 builds) or noncompatibility mode (for 
> ARMv8.3+ builds).

Maybe it should - but it currently isn't.

And in order to actually test BTI, one has to link with a sysroot that 
also was built with BTI enabled - I currently use a sysroot extracted from 
fedora for that. (And my tests for it use -Wl,-z,force-bti.)

// Martin
Rémi Denis-Courmont Nov. 30, 2023, 5:37 p.m. UTC | #13
Le torstaina 30. marraskuuta 2023, 18.28.39 EET Martin Storsjö a écrit :
> On Thu, 30 Nov 2023, Rémi Denis-Courmont wrote:
> > Le torstaina 30. marraskuuta 2023, 17.34.31 EET Martin Storsjö a écrit :
> >> Yeah, I wouldn't reuse an existing build here. For the setup I have in
> >> mind, one build doesn't take too horribly long (either on an old desktop
> >> x86 machine, or a moderate aarch64 server) - so it's not ideal but not a
> >> dealbreaker anyway (while running all of fate with qemu takes one
> >> magnitude longer).
> > 
> > Well it's pretty much a deal breaker for Armv9 and RV. I can understand
> > wanting to build on a comfy x86 server, but doing different builds just to
> > change QEMU CPU flags is IMO inept.
> 
> Yes. But for doing one single run with QEMU, I don't mind.

You can already test it properly as things stand, and reporting is trivial, 
just not to the FATE website. The question is whether this is worth adding to 
FATE. In other words, is publishing on the FATE website worth making the tests 
coverage and/or the build time worse? not to mention confusing the existing 
website users with weirdly incomplete test results.

> Again, for SVE, I'd rather have testing with 1 config (the default, which
> is longer vectors than one usually encounters in HW) rather than none at
> all. It won't catch every theoretical issue but practically would catch
> many things at least.

I find that statement very misleading. This is not a question of testing 1 
config vs 0. It's a question of testing 1 configuration vs all of them(*), and 
reporting that one vs reporting all of them elsewhere than FATE.ffmpeg.org. 
Until/unless somebody does the missing integration.

(*) at least those that QEMU supports

> Are you volunteering to write FATE integration to run checkasm multiple 
> times with different QEMU settings, so I can wait for that instead of 
> having much improved public test coverage right now?

Of course I will not volunteer, given that the RISE project already has an 
outstanding RfP which will likely require this done professionally:
https://hubs.la/Q029hwpS0
(That does not mean that I would have volunteered otherwise, just that the 
question is moot as far as I am concerned and for the time being.)

> > Sure, we could just build once and run several times checkasm with a
> > separate script, as I already pointed out. But then this patch is
> > completely unnecessary.
> 
> Indeed, that's trivial to do for a private testing setup.
> 
> >> For the other setup I intended to test, to test AArch64 PAC and BTI, I
> >> would do a separate build with -mbranch-protection=standard anyway.
> > 
> > That does not make much sense to me. PAC and BTI should be enabled by
> > default in compatibility mode (for ARMv8.0-8.2 builds) or
> > noncompatibility mode (for ARMv8.3+ builds).
> 
> Maybe it should - but it currently isn't.

That's really up to whoever set up the AArch64 builds to fix their build flags 
TBH (I believe that the assembler is already sorted). And at least for PAuth, 
that should be sufficient, as support from the C runtime is not required.

> And in order to actually test BTI, one has to link with a sysroot that
> also was built with BTI enabled - I currently use a sysroot extracted from
> fedora for that. (And my tests for it use -Wl,-z,force-bti.)

I can readily believe how much of a PITA that would be to set up. I can also 
believe that glibc won't allow masking the guarded page bit in mmap()/
mprotect().

That does not mean you need different builds to test each of the 4 possible 
combinations (or 3 if you ignore the case of BTI without PAC, which does not 
exist in real hardware). Once you have that build, you can test it with 
whichever QEMU CPU settings. Surely Fedora, of all distros, is not going to 
treat Armv8.5-BTI as a distinct arch from AArch64 whilst Arm made sure it was 
both backward-compatible and runtime-tunable.
Martin Storsjö Nov. 30, 2023, 9:13 p.m. UTC | #14
On Thu, 30 Nov 2023, Rémi Denis-Courmont wrote:

> You can already test it properly as things stand, and reporting is trivial, 
> just not to the FATE website. The question is whether this is worth adding to 
> FATE.

More public test coverage is better than less, isn't it?

> In other words, is publishing on the FATE website worth making the tests 
> coverage and/or the build time worse?

By making the test coverage worse, you mean if I'd be doing the full 
testing of many combinations already, and I'd stop doing that in order to 
do this lesser testing instead? If I'd be doing it (I currently don't) I 
guess that would be my concern, not others?

And whether I want to spend my cpu cycles on testing for a public FATE 
config, even if it only tests a limited amount, in addition what tests I 
(hypothetically) run privately, wasting more CPU on building ffmpeg, 
that's also my concern and not others?

> not to mention confusing the existing website users with weirdly 
> incomplete test results.

FWIW, there are a bunch of otherwise weird, limited configs as well, e.g. 
http://fate.ffmpeg.org/history.cgi?slot=x86_64-archlinux-gcc-disableavcodec. 
Although there, the reason for the limited number of tests is clearly 
visible.

>> Again, for SVE, I'd rather have testing with 1 config (the default, which
>> is longer vectors than one usually encounters in HW) rather than none at
>> all. It won't catch every theoretical issue but practically would catch
>> many things at least.
>
> I find that statement very misleading. This is not a question of testing 1 
> config vs 0. It's a question of testing 1 configuration vs all of them(*), and 
> reporting that one vs reporting all of them elsewhere than FATE.ffmpeg.org. 
> Until/unless somebody does the missing integration.

Currently I test 0 of these configurations. I would like to test 1 such 
config, and publish those results on the FATE website. I don't currently 
test any form of "all configs". And if I wanted to make a private setup 
for testing "all configs", I really don't see how it would be mutually 
exclusive with the publicly posted test results from the one config?

>> And in order to actually test BTI, one has to link with a sysroot that
>> also was built with BTI enabled - I currently use a sysroot extracted from
>> fedora for that. (And my tests for it use -Wl,-z,force-bti.)
>
> I can readily believe how much of a PITA that would be to set up. I can also 
> believe that glibc won't allow masking the guarded page bit in mmap()/
> mprotect().
>
> That does not mean you need different builds to test each of the 4 possible 
> combinations (or 3 if you ignore the case of BTI without PAC, which does not 
> exist in real hardware). Once you have that build, you can test it with 
> whichever QEMU CPU settings.

I didn't mean to imply that one would have to do separate builds for all 
of those. I currently don't do any testing with builds with 
-mbranch-protection=standard (and with different sysroots), but I was 
considering adding one such build, with the fedora sysroot - and only test 
one single configuration with it (with QEMU's defaults of all features 
enabled).


So, to spell out your objection in simpler terms. You are firmly against 
anybody posting test results on FATE that only include checkasm but not 
the rest of the tests, because you consider that this can be 
misleading/confusing to people reading the test results - is that right?

Or would such a setup be acceptable to you, if someone would implement a 
way of running the tests (either the full set or only a subset such as 
chckasm) multiple times with different QEMU configurations, with the same 
build of ffmpeg, within the same FATE run?

// Martin
Rémi Denis-Courmont Dec. 1, 2023, 7:36 a.m. UTC | #15
Le 30 novembre 2023 23:13:59 GMT+02:00, "Martin Storsjö" <martin@martin.st> a écrit :
>On Thu, 30 Nov 2023, Rémi Denis-Courmont wrote:
>
>> You can already test it properly as things stand, and reporting is trivial, just not to the FATE website. The question is whether this is worth adding to FATE.
>
>More public test coverage is better than less, isn't it?

That's a false dichotomy.

>> In other words, is publishing on the FATE website worth making the tests coverage and/or the build time worse?
>
>By making the test coverage worse, you mean if I'd be doing the full testing of many combinations already, and I'd stop doing that in order to do this lesser testing instead? If I'd be doing it (I currently don't) I guess that would be my concern, not others?

No. The point is that this is adding a small hack that works for one specific case for a short while (testing Armv8 IMM8 and DP), but is known not to be sufficient anyway (for SVE, PAuth, RVV, etc).

In the end, it's all about not adding inadequate interfaces and supporting/publishing bad solutions. It's certainly not as bad as if it were a public C API, but that doesn't make it good. Normally "insufficient" interfaces don't get merged for a variety of reasons.

>>> Again, for SVE, I'd rather have testing with 1 config (the default, which
>>> is longer vectors than one usually encounters in HW) rather than none at
>>> all. It won't catch every theoretical issue but practically would catch
>>> many things at least.
>> 
>> I find that statement very misleading. This is not a question of testing 1 config vs 0. It's a question of testing 1 configuration vs all of them(*), and reporting that one vs reporting all of them elsewhere than FATE.ffmpeg.org. Until/unless somebody does the missing integration.
>
>Currently I test 0 of these configurations. I would like to test 1 such config, and publish those results on the FATE website. I don't currently test any form of "all configs". And if I wanted to make a private setup for testing "all configs", I really don't see how it would be mutually exclusive with the publicly posted test results from the one config?
>
>>> And in order to actually test BTI, one has to link with a sysroot that
>>> also was built with BTI enabled - I currently use a sysroot extracted from
>>> fedora for that. (And my tests for it use -Wl,-z,force-bti.)
>> 
>> I can readily believe how much of a PITA that would be to set up. I can also believe that glibc won't allow masking the guarded page bit in mmap()/
>> mprotect().
>> 
>> That does not mean you need different builds to test each of the 4 possible combinations (or 3 if you ignore the case of BTI without PAC, which does not exist in real hardware). Once you have that build, you can test it with whichever QEMU CPU settings.
>
>I didn't mean to imply that one would have to do separate builds for all of those. I currently don't do any testing with builds with -mbranch-protection=standard (and with different sysroots), but I was considering adding one such build, with the fedora sysroot - and only test one single configuration with it (with QEMU's defaults of all features enabled).
>
>
>So, to spell out your objection in simpler terms. You are firmly against anybody posting test results on FATE that only include checkasm but not the rest of the tests, because you consider that this can be misleading/confusing to people reading the test results - is that right?
>
>Or would such a setup be acceptable to you, if someone would implement a way of running the tests (either the full set or only a subset such as chckasm) multiple times with different QEMU configurations, with the same build of ffmpeg, within the same FATE run?
>
>// Martin
>_______________________________________________
>ffmpeg-devel mailing list
>ffmpeg-devel@ffmpeg.org
>https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>To unsubscribe, visit link above, or email
>ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Martin Storsjö Dec. 1, 2023, 7:55 a.m. UTC | #16
On Fri, 1 Dec 2023, Rémi Denis-Courmont wrote:

> Le 30 novembre 2023 23:13:59 GMT+02:00, "Martin Storsjö" <martin@martin.st> a écrit :
>> On Thu, 30 Nov 2023, Rémi Denis-Courmont wrote:
>
>>> In other words, is publishing on the FATE website worth making the 
>>> tests coverage and/or the build time worse?
>>
>> By making the test coverage worse, you mean if I'd be doing the full 
>> testing of many combinations already, and I'd stop doing that in order 
>> to do this lesser testing instead? If I'd be doing it (I currently 
>> don't) I guess that would be my concern, not others?
>
> No. The point is that this is adding a small hack that works for one 
> specific case for a short while (testing Armv8 IMM8 and DP), but is 
> known not to be sufficient anyway (for SVE, PAuth, RVV, etc).

I'll reiterate the question from the bottom of the mail, that you didn't 
respond to.

Would you be ok with a setup, where a FATE instance optionally can run a 
subset of tests instead of the full suite, but run them multiple times 
with e.g. different QEMU settings? That would allow repeating checkasm for 
all the interesting cases - and if one really wanted to spend a lot of CPU 
time on it, also could run the full FATE suite in all those 
configurations.

// Martin
Rémi Denis-Courmont Dec. 1, 2023, 12:06 p.m. UTC | #17
Le 1 décembre 2023 09:55:15 GMT+02:00, "Martin Storsjö" <martin@martin.st> a écrit :
>On Fri, 1 Dec 2023, Rémi Denis-Courmont wrote:
>
>> Le 30 novembre 2023 23:13:59 GMT+02:00, "Martin Storsjö" <martin@martin.st> a écrit :
>>> On Thu, 30 Nov 2023, Rémi Denis-Courmont wrote:
>> 
>>>> In other words, is publishing on the FATE website worth making the tests coverage and/or the build time worse?
>>> 
>>> By making the test coverage worse, you mean if I'd be doing the full testing of many combinations already, and I'd stop doing that in order to do this lesser testing instead? If I'd be doing it (I currently don't) I guess that would be my concern, not others?
>> 
>> No. The point is that this is adding a small hack that works for one specific case for a short while (testing Armv8 IMM8 and DP), but is known not to be sufficient anyway (for SVE, PAuth, RVV, etc).
>
>I'll reiterate the question from the bottom of the mail, that you didn't respond to.
>
>Would you be ok with a setup, where a FATE instance optionally can run a subset of tests instead of the full suite, but run them multiple times with e.g. different QEMU settings? That would allow repeating checkasm for all the interesting cases - and if one really wanted to spend a lot of CPU time on it, also could run the full FATE suite in all those configurations.

Being able to run tests under a different runner/wrapper or the same runner with different settings, would be a lot more viable, indeed IMO

>
>// Martin
>_______________________________________________
>ffmpeg-devel mailing list
>ffmpeg-devel@ffmpeg.org
>https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>To unsubscribe, visit link above, or email
>ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
diff mbox series

Patch

diff --git a/doc/fate_config.sh.template b/doc/fate_config.sh.template
index ab1bda45e4..06bb79a832 100644
--- a/doc/fate_config.sh.template
+++ b/doc/fate_config.sh.template
@@ -31,3 +31,5 @@  makeopts=       # extra options passed to 'make'
                 # defaulting to makeopts above if this is not set
 #tar=           # command to create a tar archive from its arguments on stdout,
                 # defaults to 'tar c'
+#fate_targets=  # targets to make when running fate; defaults to "fate",
+                # can be set to run a subset of tests, e.g. "fate-checkasm".
diff --git a/tests/fate.sh b/tests/fate.sh
index 072e471256..d07a1cf90c 100755
--- a/tests/fate.sh
+++ b/tests/fate.sh
@@ -101,7 +101,7 @@  compile_extra()(
 fate()(
     test "$build_only" = "yes" && return
     cd ${build} || return
-    ${make} ${makeopts_fate-${makeopts}} -k fate
+    ${make} ${makeopts_fate-${makeopts}} -k ${fate_targets}
 )
 
 clean(){
@@ -132,6 +132,7 @@  cd ${workdir}       || die "cd ${workdir} failed"
 src=${workdir}/src
 : ${build:=${workdir}/build}
 : ${inst:=${workdir}/install}
+: ${fate_targets:=fate}
 
 test -d "$src" && update || checkout || die "Error fetching source"