[FFmpeg-devel,v3,2/5] avcodec/mips: Refine get_cabac_inline_mips.

Message ID 1617108715-24232-3-git-send-email-yinshiyou-hf@loongson.cn
State Superseded
Series [FFmpeg-devel,v3,1/5] avcodec/mips: Restore the initialization sequence of MSA and MMI in ff_h264chroma_init_mips.

Checks

Context                  Check    Description
andriy/x86_make          success  Make finished
andriy/x86_make_fate     success  Make fate finished
andriy/PPC64_make        success  Make finished
andriy/PPC64_make_fate   success  Make fate finished

Commit Message

Shiyou Yin March 30, 2021, 12:51 p.m. UTC
1. Refined function get_cabac_inline_mips.
2. Optimize function get_cabac_bypass and get_cabac_bypass_sign.

Speed of decoding h264: 4.89x ==> 5.05x(tested on 3A4000).
---
 libavcodec/mips/cabac.h | 131 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 102 insertions(+), 29 deletions(-)

Comments

Michael Niedermayer March 31, 2021, 2:39 p.m. UTC | #1
On Tue, Mar 30, 2021 at 08:51:52PM +0800, Shiyou Yin wrote:
> 1. Refined function get_cabac_inline_mips.
> 2. Optimize function get_cabac_bypass and get_cabac_bypass_sign.
> 
> Speed of decoding h264: 4.89x ==> 5.05x(tested on 3A4000).
> ---
>  libavcodec/mips/cabac.h | 131 +++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 102 insertions(+), 29 deletions(-)

This breaks fate with qemu mips

--- ffmpeg/tests/ref/fate/hevc-cabac-tudepth	2021-03-26 18:34:55.142789579 +0100
+++ tests/data/fate/hevc-cabac-tudepth	2021-03-31 16:36:50.613173111 +0200
@@ -3,4 +3,4 @@
 #codec_id 0: rawvideo
 #dimensions 0: 64x64
 #sar 0: 0/1
-0,          0,          0,        1,    12288, 0x0127a0d9
+0,          0,          0,        1,    12288, 0xa330b3bd
Test hevc-cabac-tudepth failed. Look at tests/data/fate/hevc-cabac-tudepth.err for details.
ffmpeg/tests/Makefile:255: recipe for target 'fate-hevc-cabac-tudepth' failed
make: *** [fate-hevc-cabac-tudepth] Error 1

[...]
Shiyou Yin April 12, 2021, 3:59 p.m. UTC | #2
> On March 31, 2021, at 10:39 PM, Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> On Tue, Mar 30, 2021 at 08:51:52PM +0800, Shiyou Yin wrote:
>> 1. Refined function get_cabac_inline_mips.
>> 2. Optimize function get_cabac_bypass and get_cabac_bypass_sign.
>> 
>> Speed of decoding h264: 4.89x ==> 5.05x(tested on 3A4000).
>> ---
>> libavcodec/mips/cabac.h | 131 +++++++++++++++++++++++++++++++++++++-----------
>> 1 file changed, 102 insertions(+), 29 deletions(-)
> 
> This breaks fate with qemu mips
> 
> --- ffmpeg/tests/ref/fate/hevc-cabac-tudepth	2021-03-26 18:34:55.142789579 +0100
> +++ tests/data/fate/hevc-cabac-tudepth	2021-03-31 16:36:50.613173111 +0200
> @@ -3,4 +3,4 @@
> #codec_id 0: rawvideo
> #dimensions 0: 64x64
> #sar 0: 0/1
> -0,          0,          0,        1,    12288, 0x0127a0d9
> +0,          0,          0,        1,    12288, 0xa330b3bd
> Test hevc-cabac-tudepth failed. Look at tests/data/fate/hevc-cabac-tudepth.err for details.
> ffmpeg/tests/Makefile:255: recipe for target 'fate-hevc-cabac-tudepth' failed
> make: *** [fate-hevc-cabac-tudepth] Error 1
> 

This bug is caused by using 'lhu' to load two bytes of data in a big-endian environment. It has been fixed in v4.
Please help to merge that version.
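
To illustrate the failure mode, here is a minimal C model of a 16-bit native load followed by an unconditional byte swap, which is what the `lhu` + `wsbh` pair in the v3 refill sequence does. The helper names and the `big_endian_host` parameter are hypothetical and exist only to simulate both byte orders on any machine; this sketch does not show how v4 fixes the issue.

```c
#include <stdint.h>

/* Value a native 16-bit load (like MIPS `lhu`) would produce for the bytes
 * p[0], p[1] on a host of the given byte order (simulated, hypothetical). */
static uint16_t load16_native(const uint8_t *p, int big_endian_host)
{
    return big_endian_host ? (uint16_t)((p[0] << 8) | p[1])
                           : (uint16_t)((p[1] << 8) | p[0]);
}

/* Swap the two bytes of a halfword (the effect of `wsbh` on the low 16 bits). */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Native load followed by an unconditional swap, as in the v3 refill. */
static uint16_t load16_then_swap(const uint8_t *p, int big_endian_host)
{
    return swap16(load16_native(p, big_endian_host));
}

/* Byte-wise read with the first byte as the most significant one.
 * The result does not depend on the host byte order. */
static uint16_t load16_be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For the bytes 0x12, 0x34, the byte-wise read and the little-endian load-then-swap both give 0x1234, while the same load-then-swap on a big-endian host gives 0x3412, so the decoder would be fed swapped bitstream bytes.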

BTW, I found another failing case, 'fate-sub2video_time_limited', when testing origin/master
with the cross compiler mips-linux-gnu-gcc-8 on debian10-x64 and running FATE with qemu-mips.
I will try to analyze it later.

My configuration: --samples=../../fate-suite/ --target-exec='/usr/bin/qemu-mips -cpu 74Kf -L /usr/mips-linux-gnu/' --cross-prefix=/usr/mips-linux-gnu/bin/ --cc=mips-linux-gnu-gcc-8 --arch=mips --target-os=linux --optflags='-O3 -g -static' --extra-ldflags='-static' --enable-cross-compile --enable-static --enable-gpl --disable-pthreads --disable-iconv --disable-mipsfpu

TEST    sub2video_time_limited
--- src/tests/ref/fate/sub2video_time_limited	2021-04-10 11:53:37.661350105 +0800
+++ tests/data/fate/sub2video_time_limited	2021-04-12 23:18:29.355527385 +0800
@@ -4,5 +4,5 @@
 #dimensions 0: 1920x1080
 #sar 0: 0/1
 0,          2,          2,        1,  8294400, 0x00000000
-0,          2,          2,        1,  8294400, 0xa87c518f
-0,         10,         10,        1,  8294400, 0xa87c518f
+0,          2,          2,        1,  8294400, 0xea5a518f
+0,         10,         10,        1,  8294400, 0xea5a518f
Test sub2video_time_limited failed. Look at tests/data/fate/sub2video_time_limited.err for details.
make: *** [src/tests/Makefile:256:fate-sub2video_time_limited] Error 1
Shiyou Yin April 28, 2021, 9:57 a.m. UTC | #3
> On April 12, 2021, at 11:59 PM, 殷时友 <yinshiyou-hf@loongson.cn> wrote:
> 
> 
>> On March 31, 2021, at 10:39 PM, Michael Niedermayer <michael@niedermayer.cc> wrote:
>> 
>> On Tue, Mar 30, 2021 at 08:51:52PM +0800, Shiyou Yin wrote:
>>> 1. Refined function get_cabac_inline_mips.
>>> 2. Optimize function get_cabac_bypass and get_cabac_bypass_sign.
>>> 
>>> Speed of decoding h264: 4.89x ==> 5.05x(tested on 3A4000).
>>> ---
>>> libavcodec/mips/cabac.h | 131 +++++++++++++++++++++++++++++++++++++-----------
>>> 1 file changed, 102 insertions(+), 29 deletions(-)
>> 
>> This breaks fate with qemu mips
>> 
>> --- ffmpeg/tests/ref/fate/hevc-cabac-tudepth	2021-03-26 18:34:55.142789579 +0100
>> +++ tests/data/fate/hevc-cabac-tudepth	2021-03-31 16:36:50.613173111 +0200
>> @@ -3,4 +3,4 @@
>> #codec_id 0: rawvideo
>> #dimensions 0: 64x64
>> #sar 0: 0/1
>> -0,          0,          0,        1,    12288, 0x0127a0d9
>> +0,          0,          0,        1,    12288, 0xa330b3bd
>> Test hevc-cabac-tudepth failed. Look at tests/data/fate/hevc-cabac-tudepth.err for details.
>> ffmpeg/tests/Makefile:255: recipe for target 'fate-hevc-cabac-tudepth' failed
>> make: *** [fate-hevc-cabac-tudepth] Error 1
>> 
> 
> This bug is caused by using 'lhu' to load two bytes of data in a big-endian environment. It has been fixed in v4.
> Please help to merge that version.
> 
> BTW, I found another failing case, 'fate-sub2video_time_limited', when testing origin/master
> with the cross compiler mips-linux-gnu-gcc-8 on debian10-x64 and running FATE with qemu-mips.
> I will try to analyze it later.
> 
> My configuration: --samples=../../fate-suite/ --target-exec='/usr/bin/qemu-mips -cpu 74Kf -L /usr/mips-linux-gnu/' --cross-prefix=/usr/mips-linux-gnu/bin/ --cc=mips-linux-gnu-gcc-8 --arch=mips --target-os=linux --optflags='-O3 -g -static' --extra-ldflags='-static' --enable-cross-compile --enable-static --enable-gpl --disable-pthreads --disable-iconv --disable-mipsfpu
> 
> TEST    sub2video_time_limited
> --- src/tests/ref/fate/sub2video_time_limited	2021-04-10 11:53:37.661350105 +0800
> +++ tests/data/fate/sub2video_time_limited	2021-04-12 23:18:29.355527385 +0800
> @@ -4,5 +4,5 @@
> #dimensions 0: 1920x1080
> #sar 0: 0/1
> 0,          2,          2,        1,  8294400, 0x00000000
> -0,          2,          2,        1,  8294400, 0xa87c518f
> -0,         10,         10,        1,  8294400, 0xa87c518f
> +0,          2,          2,        1,  8294400, 0xea5a518f
> +0,         10,         10,        1,  8294400, 0xea5a518f
> Test sub2video_time_limited failed. Look at tests/data/fate/sub2video_time_limited.err for details.
> make: *** [src/tests/Makefile:256:fate-sub2video_time_limited] Error 1

Hello Michael,
Did fate-sub2video_time_limited pass on your MIPS QEMU before? I suspect it is a QEMU problem.
I built the latest master branch without my patch set and with all asm code disabled, then ran the FATE test under qemu-mips as packaged in the Debian 10 repository.

../ffmpeg/configure --samples=../../fate-suite/ --target-exec='/usr/bin/qemu-mips -cpu 74Kf -L /usr/mips-linux-gnu/' --cross-prefix=/usr/mips-linux-gnu/bin/ --cc=mips-linux-gnu-gcc-8 --arch=mips --target-os=linux --optflags='-O3 -g -static' --extra-ldflags=-static --enable-cross-compile --enable-static --enable-gpl --disable-mipsfpu --disable-iconv --disable-pthreads  --disable-asm --cpu=74kf
make fate-sub2video_time_limited

Patch

diff --git a/libavcodec/mips/cabac.h b/libavcodec/mips/cabac.h
index 3d09e93..0ee7594 100644
--- a/libavcodec/mips/cabac.h
+++ b/libavcodec/mips/cabac.h
@@ -2,7 +2,8 @@ 
  * Loongson SIMD optimized h264chroma
  *
  * Copyright (c) 2018 Loongson Technology Corporation Limited
- * Copyright (c) 2018 Shiyou Yin <yinshiyou-hf@loongson.cn>
+ * Contributed by Shiyou Yin <yinshiyou-hf@loongson.cn>
+ *                Gu Xiwei(guxiwei-hf@loongson.cn)
  *
  * This file is part of FFmpeg.
  *
@@ -25,18 +26,18 @@ 
 #define AVCODEC_MIPS_CABAC_H
 
 #include "libavcodec/cabac.h"
-#include "libavutil/mips/asmdefs.h"
+#include "libavutil/mips/mmiutils.h"
 #include "config.h"
 
 #define get_cabac_inline get_cabac_inline_mips
 static av_always_inline int get_cabac_inline_mips(CABACContext *c,
-                                             uint8_t * const state){
+                                                  uint8_t * const state){
     mips_reg tmp0, tmp1, tmp2, bit;
 
     __asm__ volatile (
         "lbu          %[bit],        0(%[state])                   \n\t"
         "and          %[tmp0],       %[c_range],     0xC0          \n\t"
-        PTR_ADDU     "%[tmp0],       %[tmp0],        %[tmp0]       \n\t"
+        PTR_SLL      "%[tmp0],       %[tmp0],        0x01          \n\t"
         PTR_ADDU     "%[tmp0],       %[tmp0],        %[tables]     \n\t"
         PTR_ADDU     "%[tmp0],       %[tmp0],        %[bit]        \n\t"
         /* tmp1: RangeLPS */
@@ -44,18 +45,11 @@  static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 
         PTR_SUBU     "%[c_range],    %[c_range],     %[tmp1]       \n\t"
         PTR_SLL      "%[tmp0],       %[c_range],     0x11          \n\t"
-        PTR_SUBU     "%[tmp0],       %[tmp0],        %[c_low]      \n\t"
-
-        /* tmp2: lps_mask */
-        PTR_SRA      "%[tmp2],       %[tmp0],        0x1F          \n\t"
-        /* If tmp0 < 0, lps_mask ==  0xffffffff*/
-        /* If tmp0 >= 0, lps_mask ==  0x00000000*/
+        "slt          %[tmp2],       %[tmp0],        %[c_low]      \n\t"
         "beqz         %[tmp2],       1f                            \n\t"
-        PTR_SLL      "%[tmp0],       %[c_range],     0x11          \n\t"
+        "move         %[c_range],    %[tmp1]                       \n\t"
+        "not          %[bit],        %[bit]                        \n\t"
         PTR_SUBU     "%[c_low],      %[c_low],       %[tmp0]       \n\t"
-        PTR_SUBU     "%[tmp0],       %[tmp1],        %[c_range]    \n\t"
-        PTR_ADDU     "%[c_range],    %[c_range],     %[tmp0]       \n\t"
-        "xor          %[bit],        %[bit],         %[tmp2]       \n\t"
 
         "1:                                                        \n\t"
         /* tmp1: *state */
@@ -70,23 +64,18 @@  static av_always_inline int get_cabac_inline_mips(CABACContext *c,
         PTR_SLL      "%[c_range],    %[c_range],     %[tmp2]       \n\t"
         PTR_SLL      "%[c_low],      %[c_low],       %[tmp2]       \n\t"
 
-        "and          %[tmp0],       %[c_low],       %[cabac_mask] \n\t"
-        "bnez         %[tmp0],       1f                            \n\t"
-        PTR_ADDIU    "%[tmp0],       %[c_low],       -0x01         \n\t"
+        "and          %[tmp1],       %[c_low],       %[cabac_mask] \n\t"
+        "bnez         %[tmp1],       1f                            \n\t"
+        PTR_ADDIU    "%[tmp0],       %[c_low],       -0X01         \n\t"
         "xor          %[tmp0],       %[c_low],       %[tmp0]       \n\t"
         PTR_SRA      "%[tmp0],       %[tmp0],        0x0f          \n\t"
         PTR_ADDU     "%[tmp0],       %[tmp0],        %[tables]     \n\t"
+        /* tmp2: ff_h264_norm_shift[x >> (CABAC_BITS - 1)] */
         "lbu          %[tmp2],       %[norm_off](%[tmp0])          \n\t"
-#if CABAC_BITS == 16
-        "lbu          %[tmp0],       0(%[c_bytestream])            \n\t"
-        "lbu          %[tmp1],       1(%[c_bytestream])            \n\t"
-        PTR_SLL      "%[tmp0],       %[tmp0],        0x09          \n\t"
-        PTR_SLL      "%[tmp1],       %[tmp1],        0x01          \n\t"
-        PTR_ADDU     "%[tmp0],       %[tmp0],        %[tmp1]       \n\t"
-#else
-        "lbu          %[tmp0],       0(%[c_bytestream])            \n\t"
+
+        "lhu          %[tmp0],       0(%[c_bytestream])            \n\t"
+        "wsbh         %[tmp0],       %[tmp0]                       \n\t"
         PTR_SLL      "%[tmp0],       %[tmp0],        0x01          \n\t"
-#endif
         PTR_SUBU     "%[tmp0],       %[tmp0],        %[cabac_mask] \n\t"
 
         "li           %[tmp1],       0x07                          \n\t"
@@ -94,10 +83,13 @@  static av_always_inline int get_cabac_inline_mips(CABACContext *c,
         PTR_SLL      "%[tmp0],       %[tmp0],        %[tmp1]       \n\t"
         PTR_ADDU     "%[c_low],      %[c_low],       %[tmp0]       \n\t"
 
-#if !UNCHECKED_BITSTREAM_READER
-        "bge          %[c_bytestream], %[c_bytestream_end], 1f     \n\t"
+#if UNCHECKED_BITSTREAM_READER
+        PTR_ADDIU    "%[c_bytestream], %[c_bytestream],     0x02                 \n\t"
+#else
+        "slt          %[tmp0],         %[c_bytestream],     %[c_bytestream_end]  \n\t"
+        PTR_ADDIU    "%[tmp2],         %[c_bytestream],     0x02                 \n\t"
+        "movn         %[c_bytestream], %[tmp2],             %[tmp0]              \n\t"
 #endif
-        PTR_ADDIU    "%[c_bytestream], %[c_bytestream],     0X02   \n\t"
         "1:                                                        \n\t"
     : [bit]"=&r"(bit), [tmp0]"=&r"(tmp0), [tmp1]"=&r"(tmp1), [tmp2]"=&r"(tmp2),
       [c_range]"+&r"(c->range), [c_low]"+&r"(c->low),
@@ -116,4 +108,85 @@  static av_always_inline int get_cabac_inline_mips(CABACContext *c,
     return bit;
 }
 
+#define get_cabac_bypass get_cabac_bypass_mips
+static av_always_inline int get_cabac_bypass_mips(CABACContext *c)
+{
+    mips_reg tmp0, tmp1;
+    int res = 0;
+    __asm__ volatile(
+        PTR_SLL    "%[c_low],        %[c_low],        0x01                \n\t"
+        "and        %[tmp0],         %[c_low],        %[cabac_mask]       \n\t"
+        "bnez       %[tmp0],         1f                                   \n\t"
+        "lhu        %[tmp1],         0(%[c_bytestream])                   \n\t"
+        "wsbh       %[tmp1],         %[tmp1]                              \n\t"
+        PTR_SLL    "%[tmp1],         %[tmp1],         0x01                \n\t"
+        PTR_SUBU   "%[tmp1],         %[tmp1],         %[cabac_mask]       \n\t"
+        PTR_ADDU   "%[c_low],        %[c_low],        %[tmp1]             \n\t"
+#if UNCHECKED_BITSTREAM_READER
+        PTR_ADDIU  "%[c_bytestream], %[c_bytestream], 0x02                \n\t"
+#else
+        "slt        %[tmp0],         %[c_bytestream], %[c_bytestream_end] \n\t"
+        PTR_ADDIU  "%[tmp1],         %[c_bytestream], 0x02                \n\t"
+        "movn       %[c_bytestream], %[tmp1],         %[tmp0]             \n\t"
+#endif
+        "1:                                                               \n\t"
+        PTR_SLL    "%[tmp1],         %[c_range],      0x11                \n\t"
+        "slt        %[tmp0],         %[c_low],        %[tmp1]             \n\t"
+        PTR_SUBU   "%[tmp1],         %[c_low],        %[tmp1]             \n\t"
+        "movz       %[res],          %[one],          %[tmp0]             \n\t"
+        "movz       %[c_low],        %[tmp1],         %[tmp0]             \n\t"
+        : [tmp0]"=&r"(tmp0), [tmp1]"=&r"(tmp1), [res]"+&r"(res),
+          [c_range]"+&r"(c->range), [c_low]"+&r"(c->low),
+          [c_bytestream]"+&r"(c->bytestream)
+        : [cabac_mask]"r"(CABAC_MASK),
+#if !UNCHECKED_BITSTREAM_READER
+          [c_bytestream_end]"r"(c->bytestream_end),
+#endif
+          [one]"r"(0x01)
+        : "memory"
+    );
+    return res;
+}
+
+#define get_cabac_bypass_sign get_cabac_bypass_sign_mips
+static av_always_inline int get_cabac_bypass_sign_mips(CABACContext *c, int val)
+{
+    mips_reg tmp0, tmp1;
+    int res = val;
+    __asm__ volatile(
+        PTR_SLL    "%[c_low],        %[c_low],        0x01                \n\t"
+        "and        %[tmp0],         %[c_low],        %[cabac_mask]       \n\t"
+        "bnez       %[tmp0],         1f                                   \n\t"
+        "lhu        %[tmp1],         0(%[c_bytestream])                   \n\t"
+        "wsbh       %[tmp1],         %[tmp1]                              \n\t"
+        PTR_SLL    "%[tmp1],         %[tmp1],         0x01                \n\t"
+        PTR_SUBU   "%[tmp1],         %[tmp1],         %[cabac_mask]       \n\t"
+        PTR_ADDU   "%[c_low],        %[c_low],        %[tmp1]             \n\t"
+#if UNCHECKED_BITSTREAM_READER
+        PTR_ADDIU  "%[c_bytestream], %[c_bytestream], 0x02                \n\t"
+#else
+        "slt        %[tmp0],         %[c_bytestream], %[c_bytestream_end] \n\t"
+        PTR_ADDIU  "%[tmp1],         %[c_bytestream], 0x02                \n\t"
+        "movn       %[c_bytestream], %[tmp1],         %[tmp0]             \n\t"
+#endif
+        "1:                                                               \n\t"
+        PTR_SLL    "%[tmp1],         %[c_range],      0x11                \n\t"
+        "slt        %[tmp0],         %[c_low],        %[tmp1]             \n\t"
+        PTR_SUBU   "%[tmp1],         %[c_low],        %[tmp1]             \n\t"
+        "movz       %[c_low],        %[tmp1],         %[tmp0]             \n\t"
+        PTR_SUBU   "%[tmp1],         %[zero],         %[res]              \n\t"
+        "movn       %[res],          %[tmp1],         %[tmp0]             \n\t"
+        : [tmp0]"=&r"(tmp0), [tmp1]"=&r"(tmp1), [res]"+&r"(res),
+          [c_range]"+&r"(c->range), [c_low]"+&r"(c->low),
+          [c_bytestream]"+&r"(c->bytestream)
+        : [cabac_mask]"r"(CABAC_MASK),
+#if !UNCHECKED_BITSTREAM_READER
+          [c_bytestream_end]"r"(c->bytestream_end),
+#endif
+          [zero]"r"(0x0)
+        : "memory"
+    );
+
+    return res;
+}
 #endif /* AVCODEC_MIPS_CABAC_H */
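
For readers who want to check the two new bypass routines against plain C, the following is a rough portable model of the logic the assembly appears to implement. It assumes CABAC_BITS is 16 (which the two-byte refill in the patch implies) and the checked bitstream reader. `MiniCabac` and the `mini_*` functions are hypothetical stand-ins written for this sketch, not FFmpeg API.

```c
#include <stdint.h>

#define CABAC_BITS 16
#define CABAC_MASK ((1 << CABAC_BITS) - 1)

/* Hypothetical, reduced stand-in for the fields of CABACContext used here. */
typedef struct MiniCabac {
    int low;
    int range;
    const uint8_t *bytestream;
    const uint8_t *bytestream_end;
} MiniCabac;

/* Pull two more bytes into low. The bytes are read one at a time, so the
 * first byte is always the most significant, whatever the host byte order. */
static void mini_refill(MiniCabac *c)
{
    c->low += (c->bytestream[0] << 9) + (c->bytestream[1] << 1);
    c->low -= CABAC_MASK;
    /* Checked reader: advance only while inside the buffer. The patch
     * expresses this without a branch, using slt + movn. */
    if (c->bytestream < c->bytestream_end)
        c->bytestream += CABAC_BITS / 8;
}

/* Decode one bypass bin: returns 0 or 1. */
static int mini_get_bypass(MiniCabac *c)
{
    int scaled_range;

    c->low += c->low;                 /* same as shifting low left by 1 */
    if (!(c->low & CABAC_MASK))
        mini_refill(c);

    scaled_range = c->range << (CABAC_BITS + 1);
    if (c->low < scaled_range)
        return 0;
    c->low -= scaled_range;
    return 1;
}

/* Decode one bypass bin and apply it as a sign: returns val or -val. */
static int mini_get_bypass_sign(MiniCabac *c, int val)
{
    int scaled_range;

    c->low += c->low;
    if (!(c->low & CABAC_MASK))
        mini_refill(c);

    scaled_range = c->range << (CABAC_BITS + 1);
    if (c->low < scaled_range)
        return -val;                  /* the movn path in the assembly */
    c->low -= scaled_range;
    return val;
}
```

Comparing the output of such a model with the assembly on both little-endian and big-endian targets is one way to catch a byte-order problem like the one FATE reported for hevc-cabac-tudepth.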