diff mbox

[FFmpeg-devel,1/2] avutil: Add MSB packed YUV444P10 format

Message ID 20170201225419.32405-2-philipl@overt.org
State New
Headers show

Commit Message

Philip Langdale Feb. 1, 2017, 10:54 p.m. UTC
nvenc supports a YUV444P10 format with the same bit layout as P010,
with the data bits at the MSB end.

Unfortunately, the current YUV444P10 format we have defined puts
the data bits at the LSB end.

This mismatch led to us introducing a fudge in nvenc where we
mapped their 444P10 format to the ffmpeg 444P16 format. This
ensured that the data ends up in the right place if you are
starting with 10bit content, but it leads to other problems.

Specifically:

* >10bit content will be preferrentially converted to 444P16
  to pass to nvenc to 'preserve' the additional bits. However, they
  aren't actually preserved, but are discarded, so we're worse off
  because no dithering took place in swscale.
* On top of that, the input data is almost certainly 4:2:0 and so
  the conversion to 4:4:4 is pointless, while also leading to an
  output file that's incompatible with many playback scenarios
  (no hardware decoder supports 4:4:4 content).

So, for the sake of accuracy, introduce an explicit pixfmt for this
situation. There's no conversion support because you'll basically
never have a use for it. If someone ever finds one, they can write
the swscale code.

In the mean time, common 12 bit content (YUV420P12 or P016) will be
correctly converted to P010 for nvenc.

Signed-off-by: Philip Langdale <philipl@overt.org>
---
 libavutil/pixdesc.c | 24 ++++++++++++++++++++++++
 libavutil/pixfmt.h  |  5 +++++
 libavutil/version.h |  2 +-
 3 files changed, 30 insertions(+), 1 deletion(-)

Comments

Hendrik Leppkes Feb. 2, 2017, 9:48 a.m. UTC | #1
On Wed, Feb 1, 2017 at 11:54 PM, Philip Langdale <philipl@overt.org> wrote:
> nvenc supports a YUV444P10 format with the same bit layout as P010,
> with the data bits at the MSB end.
>
> Unfortunately, the current YUV444P10 format we have defined puts
> the data bits at the LSB end.
>
> This mismatch led to us introducing a fudge in nvenc where we
> mapped their 444P10 format to the ffmpeg 444P16 format. This
> ensured that the data ends up in the right place if you are
> starting with 10bit content, but it leads to other problems.
>
> Specifically:
>
> * >10bit content will be preferrentially converted to 444P16
>   to pass to nvenc to 'preserve' the additional bits. However, they
>   aren't actually preserved, but are discarded, so we're worse off
>   because no dithering took place in swscale.
> * On top of that, the input data is almost certainly 4:2:0 and so
>   the conversion to 4:4:4 is pointless, while also leading to an
>   output file that's incompatible with many playback scenarios
>   (no hardware decoder supports 4:4:4 content).
>
> So, for the sake of accuracy, introduce an explicit pixfmt for this
> situation. There's no conversion support because you'll basically
> never have a use for it. If someone ever finds one, they can write
> the swscale code.
>
> In the mean time, common 12 bit content (YUV420P12 or P016) will be
> correctly converted to P010 for nvenc.

What does this change have to do with 4:2:0 content?

- Hendrik
Timo Rothenpieler Feb. 2, 2017, 10:26 a.m. UTC | #2
>> In the mean time, common 12 bit content (YUV420P12 or P016) will be
>> correctly converted to P010 for nvenc.
> 
> What does this change have to do with 4:2:0 content?

Apparently swscale decides that any kind of 12 or 16 bit content, even
if 4:2:0, whould be converted to YUV444P16 instead of downsampling it to
P010.
Michael Niedermayer Feb. 2, 2017, 10:48 a.m. UTC | #3
On Thu, Feb 02, 2017 at 11:26:26AM +0100, Timo Rothenpieler wrote:
> >> In the mean time, common 12 bit content (YUV420P12 or P016) will be
> >> correctly converted to P010 for nvenc.
> > 
> > What does this change have to do with 4:2:0 content?
> 
> Apparently swscale decides that any kind of 12 or 16 bit content, even
> if 4:2:0, whould be converted to YUV444P16 instead of downsampling it to
> P010.

The code decididing pix fmts is in libavutil
av_find_best_pix_fmt_of_2() and code using that (if i guess correctly
where the choice is made)
so its not swscale that decides what to use

[...]
Michael Niedermayer Feb. 2, 2017, 11:08 a.m. UTC | #4
On Thu, Feb 02, 2017 at 11:48:49AM +0100, Michael Niedermayer wrote:
> On Thu, Feb 02, 2017 at 11:26:26AM +0100, Timo Rothenpieler wrote:
> > >> In the mean time, common 12 bit content (YUV420P12 or P016) will be
> > >> correctly converted to P010 for nvenc.
> > > 
> > > What does this change have to do with 4:2:0 content?
> > 
> > Apparently swscale decides that any kind of 12 or 16 bit content, even
> > if 4:2:0, whould be converted to YUV444P16 instead of downsampling it to
> > P010.
> 
> The code decididing pix fmts is in libavutil
> av_find_best_pix_fmt_of_2() and code using that (if i guess correctly
> where the choice is made)

maybe i worded this unclear, iam not sure thats the only path pixel
format decissions are made in but this should be the main one

If the function makes a mistake instead of teh formats available to
it being restricted by something then fixing it could be an easy
solution

[...]
Philip Langdale Feb. 2, 2017, 4:14 p.m. UTC | #5
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, 2 Feb 2017 12:08:59 +0100
Michael Niedermayer <michaelni@gmx.at> wrote:

> On Thu, Feb 02, 2017 at 11:48:49AM +0100, Michael Niedermayer wrote:

> > On Thu, Feb 02, 2017 at 11:26:26AM +0100, Timo Rothenpieler wrote:  

> > > >> In the mean time, common 12 bit content (YUV420P12 or P016)

> > > >> will be correctly converted to P010 for nvenc.  

> > > > 

> > > > What does this change have to do with 4:2:0 content?  

> > > 

> > > Apparently swscale decides that any kind of 12 or 16 bit content,

> > > even if 4:2:0, whould be converted to YUV444P16 instead of

> > > downsampling it to P010.  

> > 

> > The code decididing pix fmts is in libavutil

> > av_find_best_pix_fmt_of_2() and code using that (if i guess

> > correctly where the choice is made)  

> 

> maybe i worded this unclear, iam not sure thats the only path pixel

> format decissions are made in but this should be the main one

> 

> If the function makes a mistake instead of teh formats available to

> it being restricted by something then fixing it could be an easy

> solution

> 

> [...]

> 

> 


There's nothing wrong with the pixfmt selection logic. The problem is
that we are pretending nvenc supports a particular format when it does
not.

I will try and provide a more streamlined explanation.

* nvenc supports 8 and 10 bit encoding
* It supports two 10 bit input formats: P010 and (YUV444P10 with MSB
  packing)
* ffmpeg supports P010 and (YUV444P10 with LSB packing)
* Mapping the two YUV444P10 formats to each other results in a mess
  because the bits are interpreted differently on each side. This is no
  surprise.
* It was observed that 10bit data in YUV444P16, which is then passed to
  nvenc saying it is YUV444P10 will produce correct results because the
  bits end up in the right place. This is what the code does today.
* However, that is only true for 10bit content.
* If you try and provide 12bit (or higher) content to nvenc, the logic
  that tries to find the best pixfmt will, rightly, prefer YUV444P16 to
  P010 because it fully preserves the bit depth.
* However, remember that nvenc doesn't support >10bit content. It will
  discard/ignore the extra bits.
* Instead, we would much rather that P010 is selected and swscale used
  to dither from 12bits to 10bits, and then this data goes to nvenc.

To avoid this, you have two options

* Do not map the nvenc YUV444P10 to anything: ffmpeg doesn't support
  this format, so forget about it. This is easy.
* Map YUV444P10 to something that accurately represents what it is,
  which is what my change here does (it's not the only way. It's *a*
  way). Then the pixfmt selection code can do the right thing because
  it has accurate data to work with.

If people don't like the new pixfmt, I'm happy to just remove the
mapping. But it's really not ok to keep what we have today which is
just going to cause incorrect handling of >10bit content.

- --phil
-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAliTWsgACgkQ+NaxlGp1aC4OmwCdGRqRBtktmrN2y3PiIj7HJVgV
yyQAn1lYX06uakA8ZDE/cZSpe4zmH2VU
=DZ7X
-----END PGP SIGNATURE-----
wm4 Feb. 2, 2017, 4:26 p.m. UTC | #6
On Thu, 2 Feb 2017 08:14:00 -0800
Philip Langdale <philipl@overt.org> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Thu, 2 Feb 2017 12:08:59 +0100
> Michael Niedermayer <michaelni@gmx.at> wrote:
> 
> > On Thu, Feb 02, 2017 at 11:48:49AM +0100, Michael Niedermayer wrote:  
> > > On Thu, Feb 02, 2017 at 11:26:26AM +0100, Timo Rothenpieler wrote:    
> > > > >> In the mean time, common 12 bit content (YUV420P12 or P016)
> > > > >> will be correctly converted to P010 for nvenc.    
> > > > > 
> > > > > What does this change have to do with 4:2:0 content?    
> > > > 
> > > > Apparently swscale decides that any kind of 12 or 16 bit content,
> > > > even if 4:2:0, whould be converted to YUV444P16 instead of
> > > > downsampling it to P010.    
> > > 
> > > The code decididing pix fmts is in libavutil
> > > av_find_best_pix_fmt_of_2() and code using that (if i guess
> > > correctly where the choice is made)    
> > 
> > maybe i worded this unclear, iam not sure thats the only path pixel
> > format decissions are made in but this should be the main one
> > 
> > If the function makes a mistake instead of teh formats available to
> > it being restricted by something then fixing it could be an easy
> > solution
> > 
> > [...]
> > 
> >   
> 
> There's nothing wrong with the pixfmt selection logic. The problem is
> that we are pretending nvenc supports a particular format when it does
> not.
> 
> I will try and provide a more streamlined explanation.
> 
> * nvenc supports 8 and 10 bit encoding
> * It supports two 10 bit input formats: P010 and (YUV444P10 with MSB
>   packing)
> * ffmpeg supports P010 and (YUV444P10 with LSB packing)
> * Mapping the two YUV444P10 formats to each other results in a mess
>   because the bits are interpreted differently on each side. This is no
>   surprise.
> * It was observed that 10bit data in YUV444P16, which is then passed to
>   nvenc saying it is YUV444P10 will produce correct results because the
>   bits end up in the right place. This is what the code does today.
> * However, that is only true for 10bit content.
> * If you try and provide 12bit (or higher) content to nvenc, the logic
>   that tries to find the best pixfmt will, rightly, prefer YUV444P16 to
>   P010 because it fully preserves the bit depth.
> * However, remember that nvenc doesn't support >10bit content. It will
>   discard/ignore the extra bits.
> * Instead, we would much rather that P010 is selected and swscale used
>   to dither from 12bits to 10bits, and then this data goes to nvenc.
> 
> To avoid this, you have two options
> 
> * Do not map the nvenc YUV444P10 to anything: ffmpeg doesn't support
>   this format, so forget about it. This is easy.
> * Map YUV444P10 to something that accurately represents what it is,
>   which is what my change here does (it's not the only way. It's *a*
>   way). Then the pixfmt selection code can do the right thing because
>   it has accurate data to work with.
> 
> If people don't like the new pixfmt, I'm happy to just remove the
> mapping. But it's really not ok to keep what we have today which is
> just going to cause incorrect handling of >10bit content.

Does supporting the nvenc 4:4:4 format even have any advantage? Does it
support encoding something non-4:2:0? If not, then the first option is
probably preferable for now.
Michael Niedermayer Feb. 2, 2017, 5:21 p.m. UTC | #7
On Thu, Feb 02, 2017 at 08:14:00AM -0800, Philip Langdale wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Thu, 2 Feb 2017 12:08:59 +0100
> Michael Niedermayer <michaelni@gmx.at> wrote:
> 
> > On Thu, Feb 02, 2017 at 11:48:49AM +0100, Michael Niedermayer wrote:
> > > On Thu, Feb 02, 2017 at 11:26:26AM +0100, Timo Rothenpieler wrote:  
> > > > >> In the mean time, common 12 bit content (YUV420P12 or P016)
> > > > >> will be correctly converted to P010 for nvenc.  
> > > > > 
> > > > > What does this change have to do with 4:2:0 content?  
> > > > 
> > > > Apparently swscale decides that any kind of 12 or 16 bit content,
> > > > even if 4:2:0, whould be converted to YUV444P16 instead of
> > > > downsampling it to P010.  
> > > 
> > > The code decididing pix fmts is in libavutil
> > > av_find_best_pix_fmt_of_2() and code using that (if i guess
> > > correctly where the choice is made)  
> > 
> > maybe i worded this unclear, iam not sure thats the only path pixel
> > format decissions are made in but this should be the main one
> > 
> > If the function makes a mistake instead of teh formats available to
> > it being restricted by something then fixing it could be an easy
> > solution
> > 
> > [...]
> > 
> > 
> 
> There's nothing wrong with the pixfmt selection logic. The problem is
> that we are pretending nvenc supports a particular format when it does
> not.
> 
> I will try and provide a more streamlined explanation.
> 
> * nvenc supports 8 and 10 bit encoding
> * It supports two 10 bit input formats: P010 and (YUV444P10 with MSB
>   packing)
> * ffmpeg supports P010 and (YUV444P10 with LSB packing)
> * Mapping the two YUV444P10 formats to each other results in a mess
>   because the bits are interpreted differently on each side. This is no
>   surprise.
> * It was observed that 10bit data in YUV444P16, which is then passed to
>   nvenc saying it is YUV444P10 will produce correct results because the
>   bits end up in the right place. This is what the code does today.
> * However, that is only true for 10bit content.
> * If you try and provide 12bit (or higher) content to nvenc, the logic
>   that tries to find the best pixfmt will, rightly, prefer YUV444P16 to
>   P010 because it fully preserves the bit depth.
> * However, remember that nvenc doesn't support >10bit content. It will
>   discard/ignore the extra bits.
> * Instead, we would much rather that P010 is selected and swscale used
>   to dither from 12bits to 10bits, and then this data goes to nvenc.
> 
> To avoid this, you have two options
> 
> * Do not map the nvenc YUV444P10 to anything: ffmpeg doesn't support
>   this format, so forget about it. This is easy.
> * Map YUV444P10 to something that accurately represents what it is,
>   which is what my change here does (it's not the only way. It's *a*
>   way). Then the pixfmt selection code can do the right thing because
>   it has accurate data to work with.
> 

> If people don't like the new pixfmt, I'm happy to just remove the
> mapping. But it's really not ok to keep what we have today which is
> just going to cause incorrect handling of >10bit content.

my comment wasnt intended to imply an oppinion on the patch.
I just wanted to correct the pointer to the code deciding the pix fmt

[...]
Philip Langdale Feb. 2, 2017, 6:11 p.m. UTC | #8
On 2017-02-02 08:26, wm4 wrote:
> 
> Does supporting the nvenc 4:4:4 format even have any advantage? Does it
> support encoding something non-4:2:0? If not, then the first option is
> probably preferable for now.

With sufficiently new hardware it does. It will output h.264 or hevc
with 4:4:4 by default if given a 4:4:4 input format.

I noted that as an additional disadvantage of the current logic. It'll 
produce
4:4:4 output for >10bit 4:2:0 input, which is unnecessarily incompatible
with the wider world.

--phil
diff mbox

Patch

diff --git a/libavutil/pixdesc.c b/libavutil/pixdesc.c
index 3b9c45d..fe9b59d 100644
--- a/libavutil/pixdesc.c
+++ b/libavutil/pixdesc.c
@@ -1565,6 +1565,30 @@  static const AVPixFmtDescriptor av_pix_fmt_descriptors[AV_PIX_FMT_NB] = {
         },
         .flags = AV_PIX_FMT_FLAG_BE | AV_PIX_FMT_FLAG_PLANAR,
     },
+    [AV_PIX_FMT_YUV444P10MSBLE] = {
+        .name = "yuv444p10msble",
+        .nb_components = 3,
+        .log2_chroma_w = 0,
+        .log2_chroma_h = 0,
+        .comp = {
+            { 0, 2, 0, 6, 10, 1, 9, 1 },        /* Y */
+            { 1, 2, 0, 6, 10, 1, 9, 1 },        /* U */
+            { 2, 2, 0, 6, 10, 1, 9, 1 },        /* V */
+        },
+        .flags = AV_PIX_FMT_FLAG_PLANAR,
+    },
+    [AV_PIX_FMT_YUV444P10MSBBE] = {
+        .name = "yuv444p10msbbe",
+        .nb_components = 3,
+        .log2_chroma_w = 0,
+        .log2_chroma_h = 0,
+        .comp = {
+            { 0, 2, 0, 6, 10, 1, 9, 1 },        /* Y */
+            { 1, 2, 0, 6, 10, 1, 9, 1 },        /* U */
+            { 2, 2, 0, 6, 10, 1, 9, 1 },        /* V */
+        },
+        .flags = AV_PIX_FMT_FLAG_BE | AV_PIX_FMT_FLAG_PLANAR,
+    },
     [AV_PIX_FMT_YUV444P9LE] = {
         .name = "yuv444p9le",
         .nb_components = 3,
diff --git a/libavutil/pixfmt.h b/libavutil/pixfmt.h
index dfb1b11..46bf9c0 100644
--- a/libavutil/pixfmt.h
+++ b/libavutil/pixfmt.h
@@ -314,6 +314,9 @@  enum AVPixelFormat {
     AV_PIX_FMT_P016LE, ///< like NV12, with 16bpp per component, little-endian
     AV_PIX_FMT_P016BE, ///< like NV12, with 16bpp per component, big-endian
 
+    AV_PIX_FMT_YUV444P10MSBLE, /// like YUV444P10 but with data in the high bits, zeros in the low bits, little-endian
+    AV_PIX_FMT_YUV444P10MSBBE, /// like YUV444P10 but with data in the high bits, zeros in the low bits, big-endian
+
     AV_PIX_FMT_NB         ///< number of pixel formats, DO NOT USE THIS if you want to link with shared libav* because the number of formats might differ between versions
 };
 
@@ -394,6 +397,8 @@  enum AVPixelFormat {
 #define AV_PIX_FMT_P010       AV_PIX_FMT_NE(P010BE,  P010LE)
 #define AV_PIX_FMT_P016       AV_PIX_FMT_NE(P016BE,  P016LE)
 
+#define AV_PIX_FMT_YUV444P10MSB AV_PIX_FMT_NE(YUV444P10MSBBE, YUV444P10MSBLE)
+
 /**
   * Chromaticity coordinates of the source primaries.
   */
diff --git a/libavutil/version.h b/libavutil/version.h
index 8866064..a8b00bf 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -79,7 +79,7 @@ 
  */
 
 #define LIBAVUTIL_VERSION_MAJOR  55
-#define LIBAVUTIL_VERSION_MINOR  46
+#define LIBAVUTIL_VERSION_MINOR  47
 #define LIBAVUTIL_VERSION_MICRO 100
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \