diff mbox series

[FFmpeg-devel,03/10] checkasm: Add idctdsp add/put-pixels-clamped tests

Message ID 20220325185257.513933-4-bavison@riscosopen.org
State New
Headers show
Series avcodec/vc1: Arm optimisations | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Ben Avison March 25, 2022, 6:52 p.m. UTC
Disable ff_add_pixels_clamped_arm, which was found to fail the test. As this
is normally only used for Arms prior to Armv6 (ARM11) it seems quite unlikely
that anyone is still using this, so I haven't put in the effort to debug it.

Signed-off-by: Ben Avison <bavison@riscosopen.org>
---
 libavcodec/arm/idctdsp_init_arm.c |  2 +
 tests/checkasm/Makefile           |  1 +
 tests/checkasm/checkasm.c         |  3 ++
 tests/checkasm/checkasm.h         |  1 +
 tests/checkasm/idctdsp.c          | 85 +++++++++++++++++++++++++++++++
 tests/fate/checkasm.mak           |  1 +
 6 files changed, 93 insertions(+)
 create mode 100644 tests/checkasm/idctdsp.c

Comments

Martin Storsjö March 29, 2022, 1:13 p.m. UTC | #1
On Fri, 25 Mar 2022, Ben Avison wrote:

> Disable ff_add_pixels_clamped_arm, which was found to fail the test. As this
> is normally only used for Arms prior to Armv6 (ARM11) it seems quite unlikely
> that anyone is still using this, so I haven't put in the effort to debug it.

I had a look at this function, and I see that the overflow checks are 
using

         tst             r6,  #0x100

to see whether the addition overflowed (either above or below). However, 
if block[] was e.g. 0x200, it's possible to overflow without setting this 
bit at all.

If it would be the case that the valid range of block[] values would be 
e.g. [-255,255], then this kind of overflow checking would work though. 
(As there exists assembly for armv6, then this function probably hasn't 
been used much in modern times, so this doesn't say much about what values 
actually are used here.)

Secondly, the clamping seems to be done with

         movne           r6,  r5,  lsr #24

However that should use asr, not lsr, I think, to get proper clamping in 
both ends?


Thirdly - the added test also occasionally fails for the other existing 
functions (armv6, neon) and the newly added aarch64 neon version. If you 
have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition 
will overflow, as there's no operation that both widens and clamps at the 
same time.

I think this is reason to limit the range of src[] at least somewhat in 
the test, since I don't think the full 16 bit signed range actually is 
relevant here.

// Martin
Martin Storsjö March 29, 2022, 7:56 p.m. UTC | #2
On Tue, 29 Mar 2022, Martin Storsjö wrote:

> On Fri, 25 Mar 2022, Ben Avison wrote:
>
>> Disable ff_add_pixels_clamped_arm, which was found to fail the test. As 
>> this
>> is normally only used for Arms prior to Armv6 (ARM11) it seems quite 
>> unlikely
>> that anyone is still using this, so I haven't put in the effort to debug 
>> it.
>
> I had a look at this function, and I see that the overflow checks are using
>
>        tst             r6,  #0x100
>
> to see whether the addition overflowed (either above or below). However, if 
> block[] was e.g. 0x200, it's possible to overflow without setting this bit at 
> all.
>
> If it would be the case that the valid range of block[] values would be e.g. 
> [-255,255], then this kind of overflow checking would work though. (As there 
> exists assembly for armv6, then this function probably hasn't been used much 
> in modern times, so this doesn't say much about what values actually are used 
> here.)
>
> Secondly, the clamping seems to be done with
>
>        movne           r6,  r5,  lsr #24
>
> However that should use asr, not lsr, I think, to get proper clamping in both 
> ends?

On second thought, no, lsr #24 should be correct here. But "tst r6, 
#0x100" probably is the main issue, given the range of input values set by 
the current test. No idea what the actual value range is, for the decoders 
that use this function though.

// Martin
Ben Avison March 29, 2022, 8:22 p.m. UTC | #3
On 29/03/2022 14:13, Martin Storsjö wrote:
> On Fri, 25 Mar 2022, Ben Avison wrote:
> 
>> Disable ff_add_pixels_clamped_arm, which was found to fail the test. 
> 
> I had a look at this function, and I see that the overflow checks are using
> 
>          tst             r6,  #0x100
> 
> to see whether the addition overflowed (either above or below). However, 
> if block[] was e.g. 0x200, it's possible to overflow without setting 
> this bit at all.

Yes, thinking about it, that test is only valid if the signed 16-bit 
value from block[] lies in the range -0x100..+0x100 inclusive, otherwise 
there exists at least one unsigned 8-bit value which should have clamped 
but won't.

> Secondly, the clamping seems to be done with
> 
>          movne           r6,  r5,  lsr #24
> 
> However that should use asr, not lsr, I think, to get proper clamping in 
> both ends?

r5 is the NOTted version, so all that's doing is selecting 0x000000FF if 
there was positive overflow, and 0x00000000 if there was negative 
overflow. Given that bit 8 and above need to be zero to facilitate 
repacking the 8-bit samples, that's the right thing to do.

> Thirdly - the added test also occasionally fails for the other existing 
> functions (armv6, neon) and the newly added aarch64 neon version. If you 
> have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition 
> will overflow, as there's no operation that both widens and clamps at 
> the same time.

So it does. I obviously just didn't hit those cases in my test runs!

I can't easily test all codecs that use this function, but I just tried 
instrumenting the VC-1 case and it doesn't appear to actually use this 
particular function, so I'm none the wiser!

Should I just limit the 16-bit values to +/-0x100 and re-enable the 
armv4 fast path then?

Ben
Martin Storsjö March 29, 2022, 8:30 p.m. UTC | #4
On Tue, 29 Mar 2022, Ben Avison wrote:

>> Thirdly - the added test also occasionally fails for the other existing 
>> functions (armv6, neon) and the newly added aarch64 neon version. If you 
>> have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition 
>> will overflow, as there's no operation that both widens and clamps at 
>> the same time.
>
> So it does. I obviously just didn't hit those cases in my test runs!
>
> I can't easily test all codecs that use this function, but I just tried 
> instrumenting the VC-1 case and it doesn't appear to actually use this 
> particular function, so I'm none the wiser!
>
> Should I just limit the 16-bit values to +/-0x100 and re-enable the 
> armv4 fast path then?

Yes, I think that'd be the safest path forward. Worst case, the test would 
be slightly too narrow and could miss some valid case - but that's at 
least better than having the test give false positives for perfectly 
correct assembly, that would work just fine for actual decoder use.

// Martin
diff mbox series

Patch

diff --git a/libavcodec/arm/idctdsp_init_arm.c b/libavcodec/arm/idctdsp_init_arm.c
index ebc90e4b49..8c8f7daf06 100644
--- a/libavcodec/arm/idctdsp_init_arm.c
+++ b/libavcodec/arm/idctdsp_init_arm.c
@@ -83,7 +83,9 @@  av_cold void ff_idctdsp_init_arm(IDCTDSPContext *c, AVCodecContext *avctx,
         }
     }
 
+#if 0 // FIXME: this implementation fails checkasm test
     c->add_pixels_clamped = ff_add_pixels_clamped_arm;
+#endif
 
     if (have_armv5te(cpu_flags))
         ff_idctdsp_init_armv5te(c, avctx, high_bit_depth);
diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 7133a6ee66..f6b1008855 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -9,6 +9,7 @@  AVCODECOBJS-$(CONFIG_G722DSP)           += g722dsp.o
 AVCODECOBJS-$(CONFIG_H264DSP)           += h264dsp.o
 AVCODECOBJS-$(CONFIG_H264PRED)          += h264pred.o
 AVCODECOBJS-$(CONFIG_H264QPEL)          += h264qpel.o
+AVCODECOBJS-$(CONFIG_IDCTDSP)           += idctdsp.o
 AVCODECOBJS-$(CONFIG_LLVIDDSP)          += llviddsp.o
 AVCODECOBJS-$(CONFIG_LLVIDENCDSP)       += llviddspenc.o
 AVCODECOBJS-$(CONFIG_VC1DSP)            += vc1dsp.o
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index c2efd81b6d..57134f96ea 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -123,6 +123,9 @@  static const struct {
     #if CONFIG_HUFFYUV_DECODER
         { "huffyuvdsp", checkasm_check_huffyuvdsp },
     #endif
+    #if CONFIG_IDCTDSP
+        { "idctdsp", checkasm_check_idctdsp },
+    #endif
     #if CONFIG_JPEG2000_DECODER
         { "jpeg2000dsp", checkasm_check_jpeg2000dsp },
     #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 52ab18a5b1..a86db140e3 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -64,6 +64,7 @@  void checkasm_check_hevc_idct(void);
 void checkasm_check_hevc_pel(void);
 void checkasm_check_hevc_sao(void);
 void checkasm_check_huffyuvdsp(void);
+void checkasm_check_idctdsp(void);
 void checkasm_check_jpeg2000dsp(void);
 void checkasm_check_llviddsp(void);
 void checkasm_check_llviddspenc(void);
diff --git a/tests/checkasm/idctdsp.c b/tests/checkasm/idctdsp.c
new file mode 100644
index 0000000000..d94728b672
--- /dev/null
+++ b/tests/checkasm/idctdsp.c
@@ -0,0 +1,85 @@ 
+/*
+ * Copyright (c) 2022 Ben Avison
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+
+#include "checkasm.h"
+
+#include "libavcodec/idctdsp.h"
+
+#include "libavutil/common.h"
+#include "libavutil/internal.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem_internal.h"
+
+#define RANDOMIZE_BUFFER16(name, size)        \
+    do {                                      \
+        int i;                                \
+        for (i = 0; i < size; ++i) {          \
+            uint16_t r = rnd();               \
+            AV_WN16A(name##0 + i, r);         \
+            AV_WN16A(name##1 + i, r);         \
+        }                                     \
+    } while (0)
+
+#define RANDOMIZE_BUFFER8(name, size)         \
+    do {                                      \
+        int i;                                \
+        for (i = 0; i < size; ++i) {          \
+            uint8_t r = rnd();                \
+            name##0[i] = r;                   \
+            name##1[i] = r;                   \
+        }                                     \
+    } while (0)
+
+#define CHECK_ADD_PUT_CLAMPED(func)                                                             \
+    do {                                                                                        \
+        if (check_func(h.func, "idctdsp." #func)) {                                             \
+            declare_func_emms(AV_CPU_FLAG_MMX, void, const int16_t *, uint8_t *, ptrdiff_t);    \
+            RANDOMIZE_BUFFER16(src, 64);                                                        \
+            RANDOMIZE_BUFFER8(dst, 10 * 24);                                                    \
+            call_ref(src0, dst0 + 24 + 8, 24);                                                  \
+            call_new(src1, dst1 + 24 + 8, 24);                                                  \
+            if (memcmp(dst0, dst1, 10 * 24))                                                    \
+                fail();                                                                         \
+            bench_new(src1, dst1 + 24 + 8, 24);                                                 \
+        }                                                                                       \
+    } while (0)
+
+void checkasm_check_idctdsp(void)
+{
+    /* Source buffers are only as big as needed, since any over-read won't affect results */
+    LOCAL_ALIGNED_16(int16_t, src0, [64]);
+    LOCAL_ALIGNED_16(int16_t, src1, [64]);
+    /* Destination buffers have borders of one row above/below and 8 columns left/right to catch overflows */
+    LOCAL_ALIGNED_8(uint8_t, dst0, [10 * 24]);
+    LOCAL_ALIGNED_8(uint8_t, dst1, [10 * 24]);
+
+    AVCodecContext avctx = { 0 };
+    IDCTDSPContext h;
+
+    ff_idctdsp_init(&h, &avctx);
+
+    CHECK_ADD_PUT_CLAMPED(add_pixels_clamped);
+    CHECK_ADD_PUT_CLAMPED(put_pixels_clamped);
+    CHECK_ADD_PUT_CLAMPED(put_signed_pixels_clamped);
+
+    report("idctdsp");
+}
diff --git a/tests/fate/checkasm.mak b/tests/fate/checkasm.mak
index 99e6bb13c4..c6273db183 100644
--- a/tests/fate/checkasm.mak
+++ b/tests/fate/checkasm.mak
@@ -19,6 +19,7 @@  FATE_CHECKASM = fate-checkasm-aacpsdsp                                  \
                 fate-checkasm-hevc_pel                                  \
                 fate-checkasm-hevc_sao                                  \
                 fate-checkasm-huffyuvdsp                                \
+                fate-checkasm-idctdsp                                   \
                 fate-checkasm-jpeg2000dsp                               \
                 fate-checkasm-llviddsp                                  \
                 fate-checkasm-llviddspenc                               \