
[FFmpeg-devel,2/2] avcodec/put_bits: Make bit buffers 64-bit

Message ID 20200717194120.8803-2-steinar+ffmpeg@gunderson.no
State Accepted
Series [FFmpeg-devel,1/2] avcodec/put_bits: Parametrize bit buffer type

Checks

Context           Check    Description
andriy/default    pending
andriy/make_warn  warning  New warnings during build
andriy/make       success  Make finished
andriy/make_fate  fail     Make fate failed

Commit Message

Steinar H. Gunderson July 17, 2020, 7:41 p.m. UTC
Change BitBuf to uint64_t on all supported 64-bit platforms.
This means the buffer needs to be flushed less often, which is a
significant speed win. 32-bit platforms are unchanged, and the
output bitstream is identical.

All API constraints are kept in place; e.g., you still cannot put_bits()
more than 31 bits at a time. This ensures that codecs cannot accidentally
become 64-bit-only or similar.

Benchmarking on transcoding to various formats shows consistently
positive results:

  dnxhd                 25.60 fps ->  26.26 fps ( +2.6%)
  dvvideo               24.88 fps ->  25.17 fps ( +1.2%)
  ffv1                  14.32 fps ->  14.58 fps ( +1.8%)
  huffyuv               58.75 fps ->  63.27 fps ( +7.7%)
  jpegls                 6.22 fps ->   6.34 fps ( +1.8%)
  mjpeg                 48.65 fps ->  49.01 fps ( +0.7%)
  mpeg1video            76.41 fps ->  77.01 fps ( +0.8%)
  mpeg2video            75.99 fps ->  77.43 fps ( +1.9%)
  mpeg4                 80.66 fps ->  81.37 fps ( +0.9%)
  prores                12.35 fps ->  12.88 fps ( +4.3%)
  prores_ks             16.20 fps ->  16.80 fps ( +3.7%)
  rv20                  62.80 fps ->  62.99 fps ( +0.3%)
  utvideo               68.41 fps ->  76.32 fps (+11.6%)

Note that these numbers include video decoding and all other encoding
work, such as DCTs. If you isolate the actual bit-writing routines, the
speedup is likely to be much larger.

Benchmark details: Transcoding the first 30 seconds of Big Buck Bunny
in 1080p, Haswell 2.1 GHz, GCC 8.3, quantizer generally locked to
5.0. (Exceptions: DNxHD needs a fixed bitrate, and JPEG-LS is so slow
that I only took the first 10 seconds, not 30.) Each codec was run
ten times, single-threaded; the top and bottom two results were
discarded to remove outliers, and the arithmetic mean of the remaining
six is reported.
---
 libavcodec/put_bits.h | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Paul B Mahol July 17, 2020, 7:48 p.m. UTC | #1
Missing magicyuv benchmark.

Steinar H. Gunderson July 17, 2020, 8:03 p.m. UTC | #2
On Fri, Jul 17, 2020 at 09:48:42PM +0200, Paul B Mahol wrote:
> Missing magicyuv benchmark.

I didn't intend to do every single codec, but sure:
 
magicyuv              57.10 fps ->  63.29 fps (+10.8%)

/* Steinar */
Michael Niedermayer July 18, 2020, 9:53 a.m. UTC | #3
On Fri, Jul 17, 2020 at 09:41:20PM +0200, Steinar H. Gunderson wrote:
> +#if ARCH_AARCH64 || ARCH_IA64 || ARCH_MIPS64 || ARCH_SPARC64 || ARCH_X86_64

this needs a #include "config.h" or something equivalent

thx

[...]
Steinar H. Gunderson July 18, 2020, 11:16 a.m. UTC | #4
On Sat, Jul 18, 2020 at 11:53:44AM +0200, Michael Niedermayer wrote:
>> +#if ARCH_AARCH64 || ARCH_IA64 || ARCH_MIPS64 || ARCH_SPARC64 || ARCH_X86_64
> this needs a #include "config.h" or something equivalent

Sounds right, will fix.

/* Steinar */
Carl Eugen Hoyos July 18, 2020, 11:53 a.m. UTC | #5
On Fri, 17 July 2020 at 21:41, Steinar H. Gunderson
<steinar+ffmpeg@gunderson.no> wrote:

> +#if ARCH_AARCH64 || ARCH_IA64 || ARCH_MIPS64 || ARCH_SPARC64 || ARCH_X86_64

I suggest to only do this for the platforms that you actually tested.

> +typedef uint64_t BitBuf;
> +#define AV_WBBUF AV_WB64
> +#define AV_WLBUF AV_WL64

Carl Eugen
Steinar H. Gunderson July 18, 2020, 2:51 p.m. UTC | #6
On Sat, Jul 18, 2020 at 01:53:36PM +0200, Carl Eugen Hoyos wrote:
>> +#if ARCH_AARCH64 || ARCH_IA64 || ARCH_MIPS64 || ARCH_SPARC64 || ARCH_X86_64
> I suggest to only do this for the platforms that you actually tested.

OK. If so, that's x86-64 only.

/* Steinar */

Patch

diff --git a/libavcodec/put_bits.h b/libavcodec/put_bits.h
index c6a8f3ac14..d09c998991 100644
--- a/libavcodec/put_bits.h
+++ b/libavcodec/put_bits.h
@@ -32,9 +32,15 @@ 
 #include "libavutil/intreadwrite.h"
 #include "libavutil/avassert.h"
 
+#if ARCH_AARCH64 || ARCH_IA64 || ARCH_MIPS64 || ARCH_SPARC64 || ARCH_X86_64
+typedef uint64_t BitBuf;
+#define AV_WBBUF AV_WB64
+#define AV_WLBUF AV_WL64
+#else
 typedef uint32_t BitBuf;
 #define AV_WBBUF AV_WB32
 #define AV_WLBUF AV_WL32
+#endif
 
 static const int BUF_BITS = 8 * sizeof(BitBuf);