Message ID | 20180823130847.22374-3-onemda@gmail.com |
---|---|
State | New |
2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
> From 100x real-time decoding to 138x real-time decoding for 320x240 video.
On which hardware did you test?
Carl Eugen
On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>> From 100x real-time decoding to 138x real-time decoding for 320x240 video.
>
> On which hardware did you test?

That is highly confidential info.
2018-08-23 15:21 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>> video.
>>
>> On which hardware did you test?
>
> That is highly confidential info.

In that case this patch is not ok.

Carl Eugen
On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2018-08-23 15:21 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>> video.
>>>
>>> On which hardware did you test?
>>
>> That is highly confidential info.
>
> In that case this patch is not ok.

Ugh, can you explain why?
Paul B Mahol (2018-08-23):
> Ugh, can you explain why?
Need provable claims.
On 8/23/18, Nicolas George <george@nsup.org> wrote:
> Paul B Mahol (2018-08-23):
>> Ugh, can you explain why?
>
> Need provable claims.

Try it yourself; that is the only way you will prove it to yourself.
Paul B Mahol (2018-08-23):
> Try it yourself
Your patch. Still not ok.
On 8/23/2018 10:24 AM, Paul B Mahol wrote:
> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> 2018-08-23 15:21 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>>> video.
>>>>
>>>> On which hardware did you test?
>>>
>>> That is highly confidential info.
>>
>> In that case this patch is not ok.
>
> Ugh, can you explain why?

Christ, just say what CPU you used already.
2018-08-23 15:25 GMT+02:00, Nicolas George <george@nsup.org>:
> Paul B Mahol (2018-08-23):
>> Ugh, can you explain why?
>
> Need provable claims.

What I meant was just that I would prefer to only test on
other platforms...
Not that it wouldn't make sense to add the tested platform to
the commit message though.

Carl Eugen
On 8/23/18, James Almer <jamrial@gmail.com> wrote:
> On 8/23/2018 10:24 AM, Paul B Mahol wrote:
>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>> 2018-08-23 15:21 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>>>> video.
>>>>>
>>>>> On which hardware did you test?
>>>>
>>>> That is highly confidential info.
>>>
>>> In that case this patch is not ok.
>>
>> Ugh, can you explain why?
>
> Christ, just say what CPU you used already.

When you insist!

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               61
Model name:          Intel(R) Core(TM) i3-5005U CPU @ 2.00GHz
Stepping:            4
CPU MHz:             1895.954
CPU max MHz:         1900,0000
CPU min MHz:         500,0000
BogoMIPS:            3990.53
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            3072K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap intel_pt xsaveopt dtherm arat pln pts
2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
> From 100x real-time decoding to 138x real-time decoding for 320x240 video.
On x86_64 I get an even better improvement, on x86_32
decoding gets slower by approximately 10%.
Without the patch, decoding is faster on x86_32 than
x86_64 here...
Testfile produced with:
$ ffmpeg -f lavfi -i testsrc=s=320x240 -vcodec utvideo -vframes 10000 out.avi
Same question I thought of when this patch was originally dropped:
Is there any problem with using the new reader on some targets
but not others?
Carl Eugen
2018-08-23 15:35 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
> On 8/23/18, James Almer <jamrial@gmail.com> wrote:
>> On 8/23/2018 10:24 AM, Paul B Mahol wrote:
>>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>>> 2018-08-23 15:21 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>>>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>>>>> video.
>>>>>>
>>>>>> On which hardware did you test?
>>>>>
>>>>> That is highly confidential info.
>>>>
>>>> In that case this patch is not ok.
>>>
>>> Ugh, can you explain why?
>>
>> Christ, just say what CPU you used already.
>
> When you insist!
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit

Sorry if you disagree but this is not a useful answer:
The important question is (actually was, now that I
confirmed what was claimed earlier) if you compiled
for 32 or 64 bit.

Yes, my original question wasn't much better:
Sorry about it!

Carl Eugen
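The distinction drawn above, between the CPU's capabilities and the width the binary was actually compiled for, can be checked from inside a program. The following is only a minimal sketch: `build_pointer_bits` is a hypothetical helper name, not anything from FFmpeg, and pointer width is used as a rough proxy for "32 or 64 bit build".

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: width in bits of a data pointer in this build.
 * On common x86 toolchains this is 32 for a 32-bit build and 64 for a
 * 64-bit build, regardless of what the CPU itself supports. */
static unsigned build_pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

Printing this value next to a benchmark result would show, for example, whether a binary built with `gcc -m32` on an x86_64 host was the one being measured.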
Paul B Mahol wrote:
>Byte Order: Little Endian
I will check the mixed endian on my PDP-11 ;-)
2018-08-23 15:35 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>> From 100x real-time decoding to 138x real-time decoding for 320x240
>> video.
>
> On x86_64 I get an even better improvement, on x86_32
> decoding gets slower by approximately 10%.
> Without the patch, decoding is faster on x86_32 than
> x86_64 here...

Using vanilla gcc-6.4.

clang 3.4:
5% slower with patch on x86_32
>40% faster with patch on x86_64
x86_32 again faster than x86_64 without patch.

Carl Eugen
On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2018-08-23 15:35 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>> video.
>>
>> On x86_64 I get an even better improvement, on x86_32
>> decoding gets slower by approximately 10%.
>> Without the patch, decoding is faster on x86_32 than
>> x86_64 here...
>
> Using vanilla gcc-6.4.
>
> clang 3.4:
> 5% slower with patch on x86_32
> >40% faster with patch on x86_64
> x86_32 again faster than x86_64 without patch.

x86_32 is going to be less and less used.
2018-08-23 16:00 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> 2018-08-23 15:35 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>> video.
>>>
>>> On x86_64 I get an even better improvement, on x86_32
>>> decoding gets slower by approximately 10%.
>>> Without the patch, decoding is faster on x86_32 than
>>> x86_64 here...
>>
>> Using vanilla gcc-6.4.
>>
>> clang 3.4:
>> 5% slower with patch on x86_32
>> >40% faster with patch on x86_64
>> x86_32 again faster than x86_64 without patch.
>
> x86_32 is going to be less and less used.

Is there a reason why we cannot use
the change in the x86_64 case (and others that we
show to be faster) but not for x86_32?

Carl Eugen
On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2018-08-23 16:00 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>> On 8/23/18, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>> 2018-08-23 15:35 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
>>>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>>>> video.
>>>>
>>>> On x86_64 I get an even better improvement, on x86_32
>>>> decoding gets slower by approximately 10%.
>>>> Without the patch, decoding is faster on x86_32 than
>>>> x86_64 here...
>>>
>>> Using vanilla gcc-6.4.
>>>
>>> clang 3.4:
>>> 5% slower with patch on x86_32
>>> >40% faster with patch on x86_64
>>> x86_32 again faster than x86_64 without patch.
>>
>> x86_32 is going to be less and less used.
>
> Is there a reason why we cannot use
> the change in the x86_64 case (and others that we
> show to be faster) but not for x86_32?

The define could be only enabled for x86_64.
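The suggestion to enable the define only for x86_64 could look roughly like the sketch below. This is an illustration under assumptions, not the change that was committed: it tests the compiler-predefined macros `__x86_64__` (GCC, clang) and `_M_X64` (MSVC) so that it compiles on its own, whereas a real FFmpeg change would presumably test the build system's own architecture macro from `config.h`. `reader_variant` is a hypothetical helper that exists only to make the selection observable.

```c
/* Sketch only: opt in to the cached reader on x86_64 builds and keep
 * the default reader everywhere else (including x86_32). */
#if defined(__x86_64__) || defined(_M_X64)
#define CACHED_BITSTREAM_READER
#endif
#define UNCHECKED_BITSTREAM_READER 1

/* Hypothetical helper reporting which variant was selected at build time. */
static const char *reader_variant(void)
{
#ifdef CACHED_BITSTREAM_READER
    return "cached";
#else
    return "default";
#endif
}
```

The same pattern extends to the other targets measured in this thread (64-bit ppc, aarch64) by adding their macros to the condition once they are shown to be faster.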
2018-08-23 15:58 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
> 2018-08-23 15:35 GMT+02:00, Carl Eugen Hoyos <ceffmpeg@gmail.com>:
>> 2018-08-23 15:08 GMT+02:00, Paul B Mahol <onemda@gmail.com>:
>>> From 100x real-time decoding to 138x real-time decoding for 320x240
>>> video.
>>
>> On x86_64 I get an even better improvement, on x86_32
>> decoding gets slower by approximately 10%.
>> Without the patch, decoding is faster on x86_32 than
>> x86_64 here...
>
> Using vanilla gcc-6.4.
>
> clang 3.4:
> 5% slower with patch on x86_32
> >40% faster with patch on x86_64
> x86_32 again faster than x86_64 without patch.

On Linux ppc, gcc 7.2.1
64bit: 580 -> 740 fps
32bit: ~730fps, both with and without the patch

On Linux aarch64, gcc 4.8
358 -> 492 fps

Not sure if I can find a reliable arm 32bit target.
(Only mobile phones)

Carl Eugen
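To put the before/after figures quoted in this thread on one scale, the relative change can be computed as (after / before - 1) * 100. The helper below is a small sketch; `speedup_percent` is a hypothetical name, and the inputs in the comment are the numbers reported above.

```c
/* Hypothetical helper: relative change in percent between two
 * throughput figures (fps or "x real-time"); positive means faster. */
static double speedup_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}

/* Figures reported in the thread:
 *   speedup_percent(100.0, 138.0)  -> 38.0        (original claim)
 *   speedup_percent(580.0, 740.0)  -> about 27.6  (ppc 64bit, gcc 7.2.1)
 *   speedup_percent(358.0, 492.0)  -> about 37.4  (aarch64, gcc 4.8)
 */
```

A negative result would correspond to the slowdowns reported for x86_32.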
diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c
index 82cb038ccd..99b37aa0f3 100644
--- a/libavcodec/utvideodec.c
+++ b/libavcodec/utvideodec.c
@@ -27,6 +27,7 @@
 #include <inttypes.h>
 #include <stdlib.h>
 
+#define CACHED_BITSTREAM_READER
 #define UNCHECKED_BITSTREAM_READER 1
 
 #include "libavutil/intreadwrite.h"
From 100x real-time decoding to 138x real-time decoding for 320x240 video.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
---
 libavcodec/utvideodec.c | 1 +
 1 file changed, 1 insertion(+)
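For readers unfamiliar with the idea behind the define, the following toy reader illustrates what "cached" means in general: several bytes are loaded into a wide integer at once and later reads are served from it with shifts. This is a sketch under assumptions and is not FFmpeg's implementation; `ToyBitReader`, `toy_init`, `toy_refill` and `toy_read` are hypothetical names. It keeps a 64-bit cache, which is one plausible reason such a design might favour 64-bit targets, though nothing in the thread confirms that explanation.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy MSB-first bit reader with a 64-bit cache (illustrative only). */
typedef struct {
    const uint8_t *buf;   /* input bytes */
    size_t size, pos;     /* buffer length and next byte to load */
    uint64_t cache;       /* unread bits, left-aligned */
    unsigned bits_left;   /* number of valid bits in cache */
} ToyBitReader;

static void toy_init(ToyBitReader *r, const uint8_t *buf, size_t size)
{
    r->buf = buf;
    r->size = size;
    r->pos = 0;
    r->cache = 0;
    r->bits_left = 0;
}

/* Top up the cache one byte at a time while at least 8 bits are free. */
static void toy_refill(ToyBitReader *r)
{
    while (r->bits_left <= 56 && r->pos < r->size) {
        r->cache |= (uint64_t)r->buf[r->pos++] << (56 - r->bits_left);
        r->bits_left += 8;
    }
}

/* Read n bits (1 <= n <= 32); missing bits at end of input read as 0. */
static uint32_t toy_read(ToyBitReader *r, unsigned n)
{
    uint32_t v;
    if (r->bits_left < n)
        toy_refill(r);
    v = (uint32_t)(r->cache >> (64 - n));
    r->cache <<= n;
    r->bits_left = r->bits_left > n ? r->bits_left - n : 0;
    return v;
}
```

Most calls to `toy_read` touch only the cache, and memory is accessed only when the cache runs low.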