Message ID | 20170201225419.32405-2-philipl@overt.org |
---|---|
State | New |
Headers | show |
On Wed, Feb 1, 2017 at 11:54 PM, Philip Langdale <philipl@overt.org> wrote: > nvenc supports a YUV444P10 format with the same bit layout as P010, > with the data bits at the MSB end. > > Unfortunately, the current YUV444P10 format we have defined puts > the data bits at the LSB end. > > This mismatch led to us introducing a fudge in nvenc where we > mapped their 444P10 format to the ffmpeg 444P16 format. This > ensured that the data ends up in the right place if you are > starting with 10bit content, but it leads to other problems. > > Specifically: > > * >10bit content will be preferrentially converted to 444P16 > to pass to nvenc to 'preserve' the additional bits. However, they > aren't actually preserved, but are discarded, so we're worse off > because no dithering took place in swscale. > * On top of that, the input data is almost certainly 4:2:0 and so > the conversion to 4:4:4 is pointless, while also leading to an > output file that's incompatible with many playback scenarios > (no hardware decoder supports 4:4:4 content). > > So, for the sake of accuracy, introduce an explicit pixfmt for this > situation. There's no conversion support because you'll basically > never have a use for it. If someone ever finds one, they can write > the swscale code. > > In the mean time, common 12 bit content (YUV420P12 or P016) will be > correctly converted to P010 for nvenc. What does this change have to do with 4:2:0 content? - Hendrik
>> In the mean time, common 12 bit content (YUV420P12 or P016) will be >> correctly converted to P010 for nvenc. > > What does this change have to do with 4:2:0 content? Apparently swscale decides that any kind of 12 or 16 bit content, even if 4:2:0, whould be converted to YUV444P16 instead of downsampling it to P010.
On Thu, Feb 02, 2017 at 11:26:26AM +0100, Timo Rothenpieler wrote: > >> In the mean time, common 12 bit content (YUV420P12 or P016) will be > >> correctly converted to P010 for nvenc. > > > > What does this change have to do with 4:2:0 content? > > Apparently swscale decides that any kind of 12 or 16 bit content, even > if 4:2:0, whould be converted to YUV444P16 instead of downsampling it to > P010. The code decididing pix fmts is in libavutil av_find_best_pix_fmt_of_2() and code using that (if i guess correctly where the choice is made) so its not swscale that decides what to use [...]
On Thu, Feb 02, 2017 at 11:48:49AM +0100, Michael Niedermayer wrote: > On Thu, Feb 02, 2017 at 11:26:26AM +0100, Timo Rothenpieler wrote: > > >> In the mean time, common 12 bit content (YUV420P12 or P016) will be > > >> correctly converted to P010 for nvenc. > > > > > > What does this change have to do with 4:2:0 content? > > > > Apparently swscale decides that any kind of 12 or 16 bit content, even > > if 4:2:0, whould be converted to YUV444P16 instead of downsampling it to > > P010. > > The code decididing pix fmts is in libavutil > av_find_best_pix_fmt_of_2() and code using that (if i guess correctly > where the choice is made) maybe i worded this unclear, iam not sure thats the only path pixel format decissions are made in but this should be the main one If the function makes a mistake instead of teh formats available to it being restricted by something then fixing it could be an easy solution [...]
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 2 Feb 2017 12:08:59 +0100 Michael Niedermayer <michaelni@gmx.at> wrote: > On Thu, Feb 02, 2017 at 11:48:49AM +0100, Michael Niedermayer wrote: > > On Thu, Feb 02, 2017 at 11:26:26AM +0100, Timo Rothenpieler wrote: > > > >> In the mean time, common 12 bit content (YUV420P12 or P016) > > > >> will be correctly converted to P010 for nvenc. > > > > > > > > What does this change have to do with 4:2:0 content? > > > > > > Apparently swscale decides that any kind of 12 or 16 bit content, > > > even if 4:2:0, whould be converted to YUV444P16 instead of > > > downsampling it to P010. > > > > The code decididing pix fmts is in libavutil > > av_find_best_pix_fmt_of_2() and code using that (if i guess > > correctly where the choice is made) > > maybe i worded this unclear, iam not sure thats the only path pixel > format decissions are made in but this should be the main one > > If the function makes a mistake instead of teh formats available to > it being restricted by something then fixing it could be an easy > solution > > [...] > > There's nothing wrong with the pixfmt selection logic. The problem is that we are pretending nvenc supports a particular format when it does not. I will try and provide a more streamlined explanation. * nvenc supports 8 and 10 bit encoding * It supports two 10 bit input formats: P010 and (YUV444P10 with MSB packing) * ffmpeg supports P010 and (YUV444P10 with LSB packing) * Mapping the two YUV444P10 formats to each other results in a mess because the bits are interpreted differently on each side. This is no surprise. * It was observed that 10bit data in YUV444P16, which is then passed to nvenc saying it is YUV444P10 will produce correct results because the bits end up in the right place. This is what the code does today. * However, that is only true for 10bit content. * If you try and provide 12bit (or higher) content to nvenc, the logic that tries to find the best pixfmt will, rightly, prefer YUV444P16 to P010 because it fully preserves the bit depth. * However, remember that nvenc doesn't support >10bit content. It will discard/ignore the extra bits. * Instead, we would much rather that P010 is selected and swscale used to dither from 12bits to 10bits, and then this data goes to nvenc. To avoid this, you have two options * Do not map the nvenc YUV444P10 to anything: ffmpeg doesn't support this format, so forget about it. This is easy. * Map YUV444P10 to something that accurately represents what it is, which is what my change here does (it's not the only way. It's *a* way). Then the pixfmt selection code can do the right thing because it has accurate data to work with. If people don't like the new pixfmt, I'm happy to just remove the mapping. But it's really not ok to keep what we have today which is just going to cause incorrect handling of >10bit content. - --phil -----BEGIN PGP SIGNATURE----- iEYEARECAAYFAliTWsgACgkQ+NaxlGp1aC4OmwCdGRqRBtktmrN2y3PiIj7HJVgV yyQAn1lYX06uakA8ZDE/cZSpe4zmH2VU =DZ7X -----END PGP SIGNATURE-----
On Thu, 2 Feb 2017 08:14:00 -0800 Philip Langdale <philipl@overt.org> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Thu, 2 Feb 2017 12:08:59 +0100 > Michael Niedermayer <michaelni@gmx.at> wrote: > > > On Thu, Feb 02, 2017 at 11:48:49AM +0100, Michael Niedermayer wrote: > > > On Thu, Feb 02, 2017 at 11:26:26AM +0100, Timo Rothenpieler wrote: > > > > >> In the mean time, common 12 bit content (YUV420P12 or P016) > > > > >> will be correctly converted to P010 for nvenc. > > > > > > > > > > What does this change have to do with 4:2:0 content? > > > > > > > > Apparently swscale decides that any kind of 12 or 16 bit content, > > > > even if 4:2:0, whould be converted to YUV444P16 instead of > > > > downsampling it to P010. > > > > > > The code decididing pix fmts is in libavutil > > > av_find_best_pix_fmt_of_2() and code using that (if i guess > > > correctly where the choice is made) > > > > maybe i worded this unclear, iam not sure thats the only path pixel > > format decissions are made in but this should be the main one > > > > If the function makes a mistake instead of teh formats available to > > it being restricted by something then fixing it could be an easy > > solution > > > > [...] > > > > > > There's nothing wrong with the pixfmt selection logic. The problem is > that we are pretending nvenc supports a particular format when it does > not. > > I will try and provide a more streamlined explanation. > > * nvenc supports 8 and 10 bit encoding > * It supports two 10 bit input formats: P010 and (YUV444P10 with MSB > packing) > * ffmpeg supports P010 and (YUV444P10 with LSB packing) > * Mapping the two YUV444P10 formats to each other results in a mess > because the bits are interpreted differently on each side. This is no > surprise. > * It was observed that 10bit data in YUV444P16, which is then passed to > nvenc saying it is YUV444P10 will produce correct results because the > bits end up in the right place. This is what the code does today. > * However, that is only true for 10bit content. > * If you try and provide 12bit (or higher) content to nvenc, the logic > that tries to find the best pixfmt will, rightly, prefer YUV444P16 to > P010 because it fully preserves the bit depth. > * However, remember that nvenc doesn't support >10bit content. It will > discard/ignore the extra bits. > * Instead, we would much rather that P010 is selected and swscale used > to dither from 12bits to 10bits, and then this data goes to nvenc. > > To avoid this, you have two options > > * Do not map the nvenc YUV444P10 to anything: ffmpeg doesn't support > this format, so forget about it. This is easy. > * Map YUV444P10 to something that accurately represents what it is, > which is what my change here does (it's not the only way. It's *a* > way). Then the pixfmt selection code can do the right thing because > it has accurate data to work with. > > If people don't like the new pixfmt, I'm happy to just remove the > mapping. But it's really not ok to keep what we have today which is > just going to cause incorrect handling of >10bit content. Does supporting the nvenc 4:4:4 format even have any advantage? Does it support encoding something non-4:2:0? If not, then the first option is probably preferable for now.
On Thu, Feb 02, 2017 at 08:14:00AM -0800, Philip Langdale wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Thu, 2 Feb 2017 12:08:59 +0100 > Michael Niedermayer <michaelni@gmx.at> wrote: > > > On Thu, Feb 02, 2017 at 11:48:49AM +0100, Michael Niedermayer wrote: > > > On Thu, Feb 02, 2017 at 11:26:26AM +0100, Timo Rothenpieler wrote: > > > > >> In the mean time, common 12 bit content (YUV420P12 or P016) > > > > >> will be correctly converted to P010 for nvenc. > > > > > > > > > > What does this change have to do with 4:2:0 content? > > > > > > > > Apparently swscale decides that any kind of 12 or 16 bit content, > > > > even if 4:2:0, whould be converted to YUV444P16 instead of > > > > downsampling it to P010. > > > > > > The code decididing pix fmts is in libavutil > > > av_find_best_pix_fmt_of_2() and code using that (if i guess > > > correctly where the choice is made) > > > > maybe i worded this unclear, iam not sure thats the only path pixel > > format decissions are made in but this should be the main one > > > > If the function makes a mistake instead of teh formats available to > > it being restricted by something then fixing it could be an easy > > solution > > > > [...] > > > > > > There's nothing wrong with the pixfmt selection logic. The problem is > that we are pretending nvenc supports a particular format when it does > not. > > I will try and provide a more streamlined explanation. > > * nvenc supports 8 and 10 bit encoding > * It supports two 10 bit input formats: P010 and (YUV444P10 with MSB > packing) > * ffmpeg supports P010 and (YUV444P10 with LSB packing) > * Mapping the two YUV444P10 formats to each other results in a mess > because the bits are interpreted differently on each side. This is no > surprise. > * It was observed that 10bit data in YUV444P16, which is then passed to > nvenc saying it is YUV444P10 will produce correct results because the > bits end up in the right place. This is what the code does today. > * However, that is only true for 10bit content. > * If you try and provide 12bit (or higher) content to nvenc, the logic > that tries to find the best pixfmt will, rightly, prefer YUV444P16 to > P010 because it fully preserves the bit depth. > * However, remember that nvenc doesn't support >10bit content. It will > discard/ignore the extra bits. > * Instead, we would much rather that P010 is selected and swscale used > to dither from 12bits to 10bits, and then this data goes to nvenc. > > To avoid this, you have two options > > * Do not map the nvenc YUV444P10 to anything: ffmpeg doesn't support > this format, so forget about it. This is easy. > * Map YUV444P10 to something that accurately represents what it is, > which is what my change here does (it's not the only way. It's *a* > way). Then the pixfmt selection code can do the right thing because > it has accurate data to work with. > > If people don't like the new pixfmt, I'm happy to just remove the > mapping. But it's really not ok to keep what we have today which is > just going to cause incorrect handling of >10bit content. my comment wasnt intended to imply an oppinion on the patch. I just wanted to correct the pointer to the code deciding the pix fmt [...]
On 2017-02-02 08:26, wm4 wrote: > > Does supporting the nvenc 4:4:4 format even have any advantage? Does it > support encoding something non-4:2:0? If not, then the first option is > probably preferable for now. With sufficiently new hardware it does. It will output h.264 or hevc with 4:4:4 by default if given a 4:4:4 input format. I noted that as an additional disadvantage of the current logic. It'll produce 4:4:4 output for >10bit 4:2:0 input, which is unnecessarily incompatible with the wider world. --phil
diff --git a/libavutil/pixdesc.c b/libavutil/pixdesc.c index 3b9c45d..fe9b59d 100644 --- a/libavutil/pixdesc.c +++ b/libavutil/pixdesc.c @@ -1565,6 +1565,30 @@ static const AVPixFmtDescriptor av_pix_fmt_descriptors[AV_PIX_FMT_NB] = { }, .flags = AV_PIX_FMT_FLAG_BE | AV_PIX_FMT_FLAG_PLANAR, }, + [AV_PIX_FMT_YUV444P10MSBLE] = { + .name = "yuv444p10msble", + .nb_components = 3, + .log2_chroma_w = 0, + .log2_chroma_h = 0, + .comp = { + { 0, 2, 0, 6, 10, 1, 9, 1 }, /* Y */ + { 1, 2, 0, 6, 10, 1, 9, 1 }, /* U */ + { 2, 2, 0, 6, 10, 1, 9, 1 }, /* V */ + }, + .flags = AV_PIX_FMT_FLAG_PLANAR, + }, + [AV_PIX_FMT_YUV444P10MSBBE] = { + .name = "yuv444p10msbbe", + .nb_components = 3, + .log2_chroma_w = 0, + .log2_chroma_h = 0, + .comp = { + { 0, 2, 0, 6, 10, 1, 9, 1 }, /* Y */ + { 1, 2, 0, 6, 10, 1, 9, 1 }, /* U */ + { 2, 2, 0, 6, 10, 1, 9, 1 }, /* V */ + }, + .flags = AV_PIX_FMT_FLAG_BE | AV_PIX_FMT_FLAG_PLANAR, + }, [AV_PIX_FMT_YUV444P9LE] = { .name = "yuv444p9le", .nb_components = 3, diff --git a/libavutil/pixfmt.h b/libavutil/pixfmt.h index dfb1b11..46bf9c0 100644 --- a/libavutil/pixfmt.h +++ b/libavutil/pixfmt.h @@ -314,6 +314,9 @@ enum AVPixelFormat { AV_PIX_FMT_P016LE, ///< like NV12, with 16bpp per component, little-endian AV_PIX_FMT_P016BE, ///< like NV12, with 16bpp per component, big-endian + AV_PIX_FMT_YUV444P10MSBLE, /// like YUV444P10 but with data in the high bits, zeros in the low bits, little-endian + AV_PIX_FMT_YUV444P10MSBBE, /// like YUV444P10 but with data in the high bits, zeros in the low bits, big-endian + AV_PIX_FMT_NB ///< number of pixel formats, DO NOT USE THIS if you want to link with shared libav* because the number of formats might differ between versions }; @@ -394,6 +397,8 @@ enum AVPixelFormat { #define AV_PIX_FMT_P010 AV_PIX_FMT_NE(P010BE, P010LE) #define AV_PIX_FMT_P016 AV_PIX_FMT_NE(P016BE, P016LE) +#define AV_PIX_FMT_YUV444P10MSB AV_PIX_FMT_NE(YUV444P10MSBBE, YUV444P10MSBLE) + /** * Chromaticity coordinates of the source primaries. */ diff --git a/libavutil/version.h b/libavutil/version.h index 8866064..a8b00bf 100644 --- a/libavutil/version.h +++ b/libavutil/version.h @@ -79,7 +79,7 @@ */ #define LIBAVUTIL_VERSION_MAJOR 55 -#define LIBAVUTIL_VERSION_MINOR 46 +#define LIBAVUTIL_VERSION_MINOR 47 #define LIBAVUTIL_VERSION_MICRO 100 #define LIBAVUTIL_VERSION_INT AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
nvenc supports a YUV444P10 format with the same bit layout as P010, with the data bits at the MSB end. Unfortunately, the current YUV444P10 format we have defined puts the data bits at the LSB end. This mismatch led to us introducing a fudge in nvenc where we mapped their 444P10 format to the ffmpeg 444P16 format. This ensured that the data ends up in the right place if you are starting with 10bit content, but it leads to other problems. Specifically: * >10bit content will be preferrentially converted to 444P16 to pass to nvenc to 'preserve' the additional bits. However, they aren't actually preserved, but are discarded, so we're worse off because no dithering took place in swscale. * On top of that, the input data is almost certainly 4:2:0 and so the conversion to 4:4:4 is pointless, while also leading to an output file that's incompatible with many playback scenarios (no hardware decoder supports 4:4:4 content). So, for the sake of accuracy, introduce an explicit pixfmt for this situation. There's no conversion support because you'll basically never have a use for it. If someone ever finds one, they can write the swscale code. In the mean time, common 12 bit content (YUV420P12 or P016) will be correctly converted to P010 for nvenc. Signed-off-by: Philip Langdale <philipl@overt.org> --- libavutil/pixdesc.c | 24 ++++++++++++++++++++++++ libavutil/pixfmt.h | 5 +++++ libavutil/version.h | 2 +- 3 files changed, 30 insertions(+), 1 deletion(-)