[FFmpeg-devel,3/3] avcodec/utvideodec: use cached bitstream reader

Submitted by Paul B Mahol on Aug. 23, 2018, 1:08 p.m.

Details

Message ID 20180823130847.22374-3-onemda@gmail.com
State New

Commit Message

Paul B Mahol Aug. 23, 2018, 1:08 p.m.
From 100x real-time decoding to 138x real-time decoding for 320x240 video.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
---
 libavcodec/utvideodec.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Carl Eugen Hoyos Aug. 23, 2018, 1:17 p.m.
2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
> From 100x real-time decoding to 138x real-time decoding for 320x240 video.

On which hardware did you test?

Carl Eugen

Paul B Mahol Aug. 23, 2018, 1:21 p.m.
On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>> From 100x real-time decoding to 138x real-time decoding for 320x240 video.
>
> On which hardware did you test?

That is highly confidential info.

Carl Eugen Hoyos Aug. 23, 2018, 1:23 p.m.
2018-08-23 15:21 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>> video.
>>
>> On which hardware did you test?
>
> That is highly confidential info.

In that case this patch is not ok.

Carl Eugen

Paul B Mahol Aug. 23, 2018, 1:24 p.m.
On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2018-08-23 15:21 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>> video.
>>>
>>> On which hardware did you test?
>>
>> That is highly confidential info.
>
> In that case this patch is not ok.

Ugh, can you explain why?

Nicolas George Aug. 23, 2018, 1:25 p.m.
Paul B Mahol (2018-08-23):
> Ugh, can you explain why?

Need provable claims.

Paul B Mahol Aug. 23, 2018, 1:26 p.m.
On 8/23/18, Nicolas George <george@nsup.org> wrote:
> Paul B Mahol (2018-08-23):
>> Ugh, can you explain why?
>
> Need provable claims.

Try it yourself; that is the only way you will prove it to yourself.

Nicolas George Aug. 23, 2018, 1:27 p.m.
Paul B Mahol (2018-08-23):
> Try it yourself

Your patch. Still not ok.

James Almer Aug. 23, 2018, 1:30 p.m.
On 8/23/2018 10:24 AM, Paul B Mahol wrote:
> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> 2018-08-23 15:21 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>>> video.
>>>>
>>>> On which hardware did you test?
>>>
>>> That is highly confidential info.
>>
>> In that case this patch is not ok.
> 
> Ugh, can you explain why?

Christ, just say what CPU you used already.

Carl Eugen Hoyos Aug. 23, 2018, 1:30 p.m.
2018-08-23 15:25 GMT+02:00, Nicolas George <george@nsup.org>:
> Paul B Mahol (2018-08-23):
>> Ugh, can you explain why?
>
> Need provable claims.

What I meant was just that I would prefer to only test on other
platforms...

Not that it wouldn't make sense to add the tested platform
to the commit message though.

Carl Eugen

Paul B Mahol Aug. 23, 2018, 1:35 p.m.
On 8/23/18, James Almer <jamrial@gmail.com> wrote:
> On 8/23/2018 10:24 AM, Paul B Mahol wrote:
>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>> 2018-08-23 15:21 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>>>> video.
>>>>>
>>>>> On which hardware did you test?
>>>>
>>>> That is highly confidential info.
>>>
>>> In that case this patch is not ok.
>>
>> Ugh, can you explain why?
>
> Christ, just say what CPU you used already.

When you insist!

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               61
Model name:          Intel(R) Core(TM) i3-5005U CPU @ 2.00GHz
Stepping:            4
CPU MHz:             1895.954
CPU max MHz:         1900,0000
CPU min MHz:         500,0000
BogoMIPS:            3990.53
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            3072K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts
rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid
sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm
abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi
flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms
invpcid rdseed adx smap intel_pt xsaveopt dtherm arat pln pts

Carl Eugen Hoyos Aug. 23, 2018, 1:35 p.m.
2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
> From 100x real-time decoding to 138x real-time decoding for 320x240 video.

On x86_64 I get an even better improvement, on x86_32
decoding gets slower by approximately 10%.
Without the patch, decoding is faster on x86_32 than
x86_64 here...

Testfile produced with:
$ ffmpeg -f lavfi -i testsrc=s=320x240 -vcodec utvideo -vframes 10000 out.avi

Same question I thought of when this patch was originally dropped:
Is there any problem with using the new reader on some targets
but not others?

Carl Eugen

Carl Eugen Hoyos Aug. 23, 2018, 1:37 p.m.
2018-08-23 15:35 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
> On 8/23/18, James Almer <jamrial@gmail.com> wrote:
>> On 8/23/2018 10:24 AM, Paul B Mahol wrote:
>>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>>> 2018-08-23 15:21 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>>>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>>>>> video.
>>>>>>
>>>>>> On which hardware did you test?
>>>>>
>>>>> That is highly confidential info.
>>>>
>>>> In that case this patch is not ok.
>>>
>>> Ugh, can you explain why?
>>
>> Christ, just say what CPU you used already.
>
> When you insist!
>
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit

Sorry if you disagree but this is not a useful answer:
The important question is (actually was, now that I
confirmed what was claimed earlier) if you compiled
for 32 or 64 bit.

Yes, my original question wasn't much better: Sorry
about it!

Carl Eugen

Reto Kromer Aug. 23, 2018, 1:40 p.m.
Paul B Mahol wrote:

>Byte Order:          Little Endian

I will check the mixed endian on my PDP-11 ;-)

Carl Eugen Hoyos Aug. 23, 2018, 1:58 p.m.
2018-08-23 15:35 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>> From 100x real-time decoding to 138x real-time decoding for 320x240
>> video.
>
> On x86_64 I get an even better improvement, on x86_32
> decoding gets slower by approximately 10%.
> Without the patch, decoding is faster on x86_32 than
> x86_64 here...

Using vanilla gcc-6.4.

clang 3.4:
5% slower with patch on x86_32
>40% faster with patch on x86_64
x86_32 again faster than x86_64 without patch.

Carl Eugen

Paul B Mahol Aug. 23, 2018, 2 p.m.
On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2018-08-23 15:35 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>> video.
>>
>> On x86_64 I get an even better improvement, on x86_32
>> decoding gets slower by approximately 10%.
>> Without the patch, decoding is faster on x86_32 than
>> x86_64 here...
>
> Using vanilla gcc-6.4.
>
> clang 3.4:
> 5% slower with patch on x86_32
>>40% faster with patch on x86_64
> x86_32 again faster than x86_64 without patch.

x86_32 is going to be less and less used.

Carl Eugen Hoyos Aug. 23, 2018, 2:07 p.m.
2018-08-23 16:00 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> 2018-08-23 15:35 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>> video.
>>>
>>> On x86_64 I get an even better improvement, on x86_32
>>> decoding gets slower by approximately 10%.
>>> Without the patch, decoding is faster on x86_32 than
>>> x86_64 here...
>>
>> Using vanilla gcc-6.4.
>>
>> clang 3.4:
>> 5% slower with patch on x86_32
>>>40% faster with patch on x86_64
>> x86_32 again faster than x86_64 without patch.
>
> x86_32 is going to be less and less used.

Is there a reason why we cannot use
the change in the x86_64 case (and others that we
show to be faster) but not for x86_32?

Carl Eugen

Paul B Mahol Aug. 23, 2018, 2:12 p.m.
On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2018-08-23 16:00 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>> 2018-08-23 15:35 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
>>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>>> video.
>>>>
>>>> On x86_64 I get an even better improvement, on x86_32
>>>> decoding gets slower by approximately 10%.
>>>> Without the patch, decoding is faster on x86_32 than
>>>> x86_64 here...
>>>
>>> Using vanilla gcc-6.4.
>>>
>>> clang 3.4:
>>> 5% slower with patch on x86_32
>>>>40% faster with patch on x86_64
>>> x86_32 again faster than x86_64 without patch.
>>
>> x86_32 is going to be less and less used.
>
> Is there a reason why we cannot use
> the change in the x86_64 case (and others that we
> show to be faster) but not for x86_32?

The define could be enabled only for x86_64.
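
One way to express that restriction, as a minimal sketch and not the final patch: it assumes the build's config.h defines ARCH_X86_64 as 0 or 1 (the fallback below derives it from compiler macros only so the fragment compiles on its own), and that get_bits.h selects the cached reader with #ifdef, as the empty define in the patch suggests. The helper uses_cached_reader() is hypothetical and only makes the compile-time choice observable.

```c
/* Assumption: inside the FFmpeg tree, config.h already provides
 * ARCH_X86_64 as 0 or 1. This fallback only lets the sketch build
 * standalone. */
#ifndef ARCH_X86_64
#  if defined(__x86_64__) || defined(_M_X64)
#    define ARCH_X86_64 1
#  else
#    define ARCH_X86_64 0
#  endif
#endif

/* Turn the cached reader on only for the target where it measured faster. */
#if ARCH_X86_64
#  define CACHED_BITSTREAM_READER
#endif
#define UNCHECKED_BITSTREAM_READER 1

/* Hypothetical helper (not part of FFmpeg): returns 1 when the cached
 * reader was selected at compile time, 0 otherwise. */
static int uses_cached_reader(void)
{
#ifdef CACHED_BITSTREAM_READER
    return 1;
#else
    return 0;
#endif
}
```

Other targets that benchmark faster could be added to the same #if condition.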

Carl Eugen Hoyos Aug. 23, 2018, 9:20 p.m.
2018-08-23 15:58 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
> 2018-08-23 15:35 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>> video.
>>
>> On x86_64 I get an even better improvement, on x86_32
>> decoding gets slower by approximately 10%.
>> Without the patch, decoding is faster on x86_32 than
>> x86_64 here...
>
> Using vanilla gcc-6.4.
>
> clang 3.4:
> 5% slower with patch on x86_32
>>40% faster with patch on x86_64
> x86_32 again faster than x86_64 without patch.

On Linux ppc, gcc 7.2.1
64bit: 580 -> 740 fps
32bit: ~730fps, both with and without the patch

On Linux aarch64, gcc 4.8
358 -> 492 fps

Not sure if I can find a reliable arm 32bit target.
(Only mobile phones)

Carl Eugen
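
To compare these fps figures with the percentages quoted earlier in the thread, the relative change is (after - before) / before. A small sketch; speedup_percent() and print_reported_speedups() are hypothetical helper names, and the inputs are only the numbers reported above. It works out to roughly 28% for 64-bit ppc and roughly 37% for aarch64.

```c
#include <stdio.h>

/* Hypothetical helper: relative change in percent between two throughput
 * figures (fps or "x real-time"); a positive result means faster. */
static double speedup_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Prints the relative change for the figures reported in this thread. */
static void print_reported_speedups(void)
{
    printf("original claim (100x -> 138x): %.1f%%\n",
           speedup_percent(100.0, 138.0));
    printf("ppc 64bit, gcc 7.2.1:          %.1f%%\n",
           speedup_percent(580.0, 740.0));
    printf("aarch64, gcc 4.8:              %.1f%%\n",
           speedup_percent(358.0, 492.0));
}
```

Calling print_reported_speedups() from a main() prints the three percentages side by side.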

Patch

diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c
index 82cb038ccd..99b37aa0f3 100644
--- a/libavcodec/utvideodec.c
+++ b/libavcodec/utvideodec.c
@@ -27,6 +27,7 @@ 
 #include <inttypes.h>
 #include <stdlib.h>
 
+#define CACHED_BITSTREAM_READER
 #define UNCHECKED_BITSTREAM_READER 1
 
 #include "libavutil/intreadwrite.h"