
[FFmpeg-devel] Nvidia NVENC 10-bit HEVC encoding and rate control lookahead support

Message ID 26E0E9A5-8B82-4D90-A97C-50E62FF69AB6@mac.com
State Accepted

Commit Message

Oliver Collyer Aug. 23, 2016, 5:10 p.m. UTC
Hi all

Attached is a patch for the above.

10-bit HEVC encoding is a new feature of the latest Pascal Nvidia GPUs, released in the past few months; I’ve added support for the yuv420p10le and yuv444p10le pixel formats.

Rate control lookahead is available on pre-Pascal models too, but only with the latest SDK and drivers.
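The sketch below is a rough C model of the lookahead part of the patch: `nvenc_setup_rate_control` and the new `rc-lookahead` option in the diff. The struct and `LOOKAHEAD_MAX_DEPTH` are hypothetical stand-ins for the `enableLookahead`/`lookaheadDepth` fields and the `FFMIN(ctx->rc_lookahead, 32)` clamp. The sketch leaves out the capability check against `NV_ENC_CAPS_SUPPORT_LOOKAHEAD`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two rate-control fields the patch sets
 * (enableLookahead, lookaheadDepth); this is not the real SDK struct. */
typedef struct {
    bool enable_lookahead;
    int  lookahead_depth;
} rc_params_sketch;

/* Assumed cap, mirroring FFMIN(ctx->rc_lookahead, 32) in the patch. */
#define LOOKAHEAD_MAX_DEPTH 32

/* rc_lookahead follows the patch's "rc-lookahead" option: -1 (default)
 * or 0 leaves the fields untouched; positive values enable lookahead
 * with the depth clamped to LOOKAHEAD_MAX_DEPTH. */
static void setup_lookahead(rc_params_sketch *rc, int rc_lookahead)
{
    assert(rc_lookahead >= -1);
    if (rc_lookahead > 0) {
        rc->enable_lookahead = true;
        rc->lookahead_depth  = rc_lookahead < LOOKAHEAD_MAX_DEPTH
                             ? rc_lookahead : LOOKAHEAD_MAX_DEPTH;
    }
}
```

With the option's default of -1, lookahead stays at whatever the preset configured. Any positive value enables it, with the depth capped at 32.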

As part of this I’ve bumped the required SDK version to the latest, which is 7.

Feedback welcome. This is only my second patch; I seem to average about one a year :)

Regards

Oliver

---
configure               |   4 +-
libavcodec/nvenc.c      | 120 ++++++++++++++++++++++++++++++++++++++++++++++--
libavcodec/nvenc.h      |   6 +++
libavcodec/nvenc_hevc.c |   6 ++-
4 files changed, 129 insertions(+), 7 deletions(-)

Comments

Carl Eugen Hoyos Aug. 23, 2016, 6:21 p.m. UTC | #1
Hi!

2016-08-23 19:10 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
> +    AV_PIX_FMT_YUV420P10LE,

I know this is theoretical but the Nvidia header seems to indicate
native endianness to me, so this should be AV_PIX_FMT_YUV420P10
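FFmpeg's `AV_PIX_FMT_YUV420P10` is the native-endian alias: it expands via `AV_PIX_FMT_NE` to either the BE or the LE variant. `AV_PIX_FMT_YUV420P10LE`, by contrast, is always little-endian. Suppose, as Carl reads the header, that the encoder consumes samples as native 16-bit words. Then the two formats only coincide on little-endian hosts. The small self-contained sketch below illustrates this; all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Reads a 16-bit sample the way a native-endian consumer would
 * (assumed here to be how the encoder reads its input buffer). */
static uint16_t read_native_sample(const uint8_t *bytes)
{
    uint16_t v;
    memcpy(&v, bytes, sizeof(v));
    return v;
}

/* Reads a sample stored explicitly as little-endian (the "LE" in
 * yuv420p10le), independent of host byte order. */
static uint16_t read_le_sample(const uint8_t *bytes)
{
    return (uint16_t)(bytes[0] | (bytes[1] << 8));
}
```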

> +    AV_PIX_FMT_YUV444P10LE

But after reading the rest of the patch:
Shouldn't this be AV_PIX_FMT_YUV444P16?

And instead of YUV420P10, shouldn't you use P010LE?

In any case, please split the rate control patch from the 10bit patch.

Carl Eugen

who wonders now how the Microsoft headers define the ten bit
yuv420 semi-planar format...
Oliver Collyer Aug. 23, 2016, 8:22 p.m. UTC | #2
Hi Carl

Thanks for looking at my patch.

> On 23 Aug 2016, at 21:21, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 
> Hi!
> 
> 2016-08-23 19:10 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>> +    AV_PIX_FMT_YUV420P10LE,
> 
> I know this is theoretical but the Nvidia header seems to indicate
> native endianness to me, so this should be AV_PIX_FMT_YUV420P10
> 
>> +    AV_PIX_FMT_YUV444P10LE
> 
> But after reading the rest of the patch:
> Shouldn't this be AV_PIX_FMT_YUV444P16?
> 

How so - the Nvidia doc is stating that the encoder is taking 10 bits per component, not 16?

> And instead of YUV420P10, shouldn't you use P010LE?
> 

Yes, I agree with this.

> In any case, please split the rate control patch from the 10bit patch.
> 

Ok, I will do that.

Oliver

> Carl Eugen
> 
> who wonders now how the Microsoft headers define the ten bit
> yuv420 semi-planar format...
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Carl Eugen Hoyos Aug. 24, 2016, 5:31 a.m. UTC | #3
Hi!

2016-08-23 22:22 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>> On 23 Aug 2016, at 21:21, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> 2016-08-23 19:10 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>>> +    AV_PIX_FMT_YUV420P10LE,
>>
>> I know this is theoretical but the Nvidia header seems to indicate
>> native endianness to me, so this should be AV_PIX_FMT_YUV420P10
>>
>>> +    AV_PIX_FMT_YUV444P10LE
>>
>> But after reading the rest of the patch:
>> Shouldn't this be AV_PIX_FMT_YUV444P16?
>
> How so - the Nvidia doc is stating that the encoder is taking 10 bits per
> component, not 16?

AV_PIX_FMT_YUV444P16 should not need any conversion and therefore
be measurably faster, the least significant bits are ignored.
Please test and report back.
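The sketch below shows why the 16-bit container needs no conversion. It assumes, as described above, that the encoder uses only the 10 most significant bits of each 16-bit sample. A yuv444p10 sample keeps its value in the low 10 bits, so it must first be shifted left by 6; `copy_single_10bit_plane` in the patch does exactly this. A yuv444p16 plane can be copied as-is. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-sample step of the patch's copy_single_10bit_plane: move a 10-bit
 * value stored in the low bits (yuv444p10) into the top 10 bits. */
static uint16_t p10_to_msb_aligned(uint16_t sample10)
{
    assert(sample10 <= 0x3FF);
    return (uint16_t)(sample10 << 6);
}

/* What an encoder that ignores the 6 least significant bits effectively
 * sees for a 16-bit (yuv444p16) sample: its top 10 bits. */
static uint16_t effective_10bit_value(uint16_t sample16)
{
    return (uint16_t)(sample16 >> 6);
}
```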

Carl Eugen
Oliver Collyer Aug. 24, 2016, 6:43 a.m. UTC | #4
> On 24 Aug 2016, at 08:31, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 
> Hi!
> 
> 2016-08-23 22:22 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>>> On 23 Aug 2016, at 21:21, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>> 2016-08-23 19:10 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>>>> +    AV_PIX_FMT_YUV420P10LE,
>>> 
>>> I know this is theoretical but the Nvidia header seems to indicate
>>> native endianness to me, so this should be AV_PIX_FMT_YUV420P10
>>> 
>>>> +    AV_PIX_FMT_YUV444P10LE
>>> 
>>> But after reading the rest of the patch:
>>> Shouldn't this be AV_PIX_FMT_YUV444P16?
>> 
>> How so - the Nvidia doc is stating that the encoder is taking 10 bits per
>> component, not 16?
> 
> AV_PIX_FMT_YUV444P16 should not need any conversion and therefore
> be measurably faster, the least significant bits are ignored.
> Please test and report back.
> 

Yes, I can confirm AV_PIX_FMT_YUV444P16 works fine so I can now ditch the conversion and do a straight plane copy.

Thanks

Oliver

> Carl Eugen
Oliver Collyer Aug. 24, 2016, 7:41 a.m. UTC | #5
> On 23 Aug 2016, at 21:21, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 
> Hi!
> 
> 2016-08-23 19:10 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>> +    AV_PIX_FMT_YUV420P10LE,
> 
> I know this is theoretical but the Nvidia header seems to indicate
> native endianness to me, so this should be AV_PIX_FMT_YUV420P10
> 
>> +    AV_PIX_FMT_YUV444P10LE
> 
> But after reading the rest of the patch:
> Shouldn't this be AV_PIX_FMT_YUV444P16?
> 
> And instead of YUV420P10, shouldn't you use P010LE?
> 

So I’ve tried with P010 but ran into a problem in that this pixel format is only supported as an input format.

In my test I’m reading a yuv420p file and then specifying -pix_fmt P010 but this is giving an error message saying the conversion is impossible. ffmpeg -pix_fmts confirms it is only valid as an input format.

Of course, if the source is P010 then presumably there is no problem.

What should I do? Maybe support both P010 so that if someone has a source in this format it can be encoded natively but also support YUV420P10 with my conversion/shifting routine?

Or should I just support P010 and then consider it a limitation of FFmpeg that it cannot convert a different format to this one?
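The conversion/shifting routine mentioned here does two things. It interleaves the U and V planes into one semi-planar plane, and it moves each 10-bit sample into the top bits. Below is a simplified sketch of the chroma step; the luma plane only needs the shift. For clarity, strides are counted in 16-bit samples rather than bytes. The function name is hypothetical, and this is not the patch's exact code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleave the U and V planes of a yuv420p10 frame
 * into one P010-style UV plane, shifting each 10-bit sample into the top
 * bits. chroma_w/chroma_h are the subsampled dimensions (width/2,
 * height/2); all strides are in uint16_t elements, not bytes. */
static void yuv420p10_chroma_to_p010(uint16_t *dst_uv, ptrdiff_t dst_stride,
                                     const uint16_t *src_u, ptrdiff_t u_stride,
                                     const uint16_t *src_v, ptrdiff_t v_stride,
                                     int chroma_w, int chroma_h)
{
    assert(dst_stride >= 2 * chroma_w);
    assert(u_stride >= chroma_w && v_stride >= chroma_w);
    for (int y = 0; y < chroma_h; y++) {
        for (int x = 0; x < chroma_w; x++) {
            dst_uv[2 * x]     = (uint16_t)(src_u[x] << 6); /* U */
            dst_uv[2 * x + 1] = (uint16_t)(src_v[x] << 6); /* V */
        }
        dst_uv += dst_stride;
        src_u  += u_stride;
        src_v  += v_stride;
    }
}
```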

Regards

Oliver

> In any case, please split the rate control patch from the 10bit patch.
> 
> Carl Eugen
> 
> who wonders now how the Microsoft headers define the ten bit
> yuv420 semi-planar format...
Carl Eugen Hoyos Aug. 24, 2016, 7:50 a.m. UTC | #6
Hi!

2016-08-24 9:41 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>
>> On 23 Aug 2016, at 21:21, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>
>> 2016-08-23 19:10 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>>> +    AV_PIX_FMT_YUV420P10LE,
>>
>> I know this is theoretical but the Nvidia header seems to indicate
>> native endianness to me, so this should be AV_PIX_FMT_YUV420P10
>>
>>> +    AV_PIX_FMT_YUV444P10LE
>>
>> But after reading the rest of the patch:
>> Shouldn't this be AV_PIX_FMT_YUV444P16?

(Thanks for testing this!)

>> And instead of YUV420P10, shouldn't you use P010LE?
>>
>
> So I’ve tried with P010 but ran into a problem in that this pixel format is
> only supported as an input format.
>
> In my test I’m reading a yuv420p file and then specifying -pix_fmt P010
> but this is giving an error message saying the conversion is impossible.
> ffmpeg -pix_fmts confirms it is only valid as an input format.

Sorry for not realizing this originally!

> Of course, if the source is P010 then presumably there is no problem.

> What should I do? Maybe support both P010 so that if someone has a
> source in this format it can be encoded natively but also support
> YUV420P10 with my conversion/shifting routine?
>
> Or should I just support P010 and then consider it a limitation of FFmpeg
> that it cannot convert a different format to this one?

Imo, both are ok (and the first obviously makes more sense) but wait for
others to comment.

The ideal solution is of course if you port your conversion routine to
libswscale (but this will need a little effort I guess and should imo not
block your patch).

Carl Eugen
Carl Eugen Hoyos Aug. 24, 2016, 7:56 a.m. UTC | #7
Hi!

2016-08-24 8:43 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
> Yes, I can confirm AV_PIX_FMT_YUV444P16 works fine so I can
> now ditch the conversion and do a straight plane copy.

I am curious: If you feed the encoder with
NV_ENC_BUFFER_FORMAT_YUV444_10BIT can you still
select 8bit encoding? Is this technically possible or not?

Carl Eugen
Oliver Collyer Aug. 24, 2016, 8 a.m. UTC | #8
I’m not sure what would happen - currently the nvenc.c code enforces 10-bit encoding when it gets a 10-bit input pixel format.

Whether the underlying engine allows it, or what would happen I don’t know.
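The enforcement described here comes from `nvenc_setup_hevc_config` in the diff, and it can be modelled in a few lines. This sketch uses hypothetical enums in place of the FFmpeg pixel formats and NVENC profile GUIDs. The requested profile is kept for 8-bit input, and Main 10 is forced for 10-bit input. The chroma format IDC and the bit depth are derived from the input format.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the pixel formats and profiles involved. */
enum pix_fmt_sketch { FMT_YUV420P, FMT_NV12, FMT_YUV444P,
                      FMT_YUV420P10, FMT_YUV444P10 };
enum hevc_profile_sketch { PROFILE_MAIN, PROFILE_MAIN10 };

typedef struct {
    enum hevc_profile_sketch profile;
    int chroma_format_idc;       /* 1 = 4:2:0, 3 = 4:4:4 */
    int pixel_bit_depth_minus8;  /* 0 = 8-bit, 2 = 10-bit */
} hevc_config_sketch;

static bool is_10bit(enum pix_fmt_sketch f)
{
    return f == FMT_YUV420P10 || f == FMT_YUV444P10;
}

/* Mirrors the patch's logic: honour the requested profile unless the
 * input is 10-bit, in which case Main 10 is forced. */
static hevc_config_sketch setup_hevc(enum pix_fmt_sketch fmt,
                                     enum hevc_profile_sketch requested)
{
    hevc_config_sketch c;
    c.profile = is_10bit(fmt) ? PROFILE_MAIN10 : requested;
    c.chroma_format_idc = (fmt == FMT_YUV444P || fmt == FMT_YUV444P10) ? 3 : 1;
    c.pixel_bit_depth_minus8 = is_10bit(fmt) ? 2 : 0;
    return c;
}
```

In this model, an 8-bit input with `main10` requested changes only the profile. The bit depth stays at 8 (`pixel_bit_depth_minus8 == 0`).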

> On 24 Aug 2016, at 10:56, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 
> Hi!
> 
> 2016-08-24 8:43 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>> Yes, I can confirm AV_PIX_FMT_YUV444P16 works fine so I can
>> now ditch the conversion and do a straight plane copy.
> 
> I am curious: If you feed the encoder with
> NV_ENC_BUFFER_FORMAT_YUV444_10BIT can you still
> select 8bit encoding? Is this technically possible or not?
> 
> Carl Eugen
Oliver Collyer Aug. 24, 2016, 8:01 a.m. UTC | #9
> On 24 Aug 2016, at 10:50, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 
> Hi!
> 
> 2016-08-24 9:41 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>> 
>>> On 23 Aug 2016, at 21:21, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>> 
>>> 2016-08-23 19:10 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>>>> +    AV_PIX_FMT_YUV420P10LE,
>>> 
>>> I know this is theoretical but the Nvidia header seems to indicate
>>> native endianness to me, so this should be AV_PIX_FMT_YUV420P10
>>> 
>>>> +    AV_PIX_FMT_YUV444P10LE
>>> 
>>> But after reading the rest of the patch:
>>> Shouldn't this be AV_PIX_FMT_YUV444P16?
> 
> (Thanks for testing this!)
> 
>>> And instead of YUV420P10, shouldn't you use P010LE?
>>> 
>> 
>> So I’ve tried with P010 but ran into a problem in that this pixel format is
>> only supported as an input format.
>> 
>> In my test I’m reading a yuv420p file and then specifying -pix_fmt P010
>> but this is giving an error message saying the conversion is impossible.
>> ffmpeg -pix_fmts confirms it is only valid as an input format.
> 
> Sorry for not realizing this originally!
> 
>> Of course, if the source is P010 then presumably there is no problem.
> 
>> What should I do? Maybe support both P010 so that if someone has a
>> source in this format it can be encoded natively but also support
>> YUV420P10 with my conversion/shifting routine?
>> 
>> Or should I just support P010 and then consider it a limitation of FFmpeg
>> that it cannot convert a different format to this one?
> 
> Imo, both are ok (and the first obviously makes more sense) but wait for
> others to comment.
> 
> The ideal solution is of course if you port your conversion routine to
> libswscale (but this will need a little effort I guess and should imo not
> block your patch).
> 

Ok, I’ll wait for more feedback.

There is a certain logic in supporting both since the patch then neatly adds 10-bit support for all the existing 8-bit formats it currently supports:

i.e. currently it supports...

YUV420P
NV12
YUV444P

…which would become

YUV420P
YUV420P10
NV12
P010
YUV444P
YUV444P16

This appeals to my sense of symmetry :)
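The proposed list pairs each existing 8-bit format with a 10-bit counterpart. The sketch below shows how each might map to an NVENC input surface format, modelled on the `nvenc_alloc_surface` switch in the diff. The enums are hypothetical stand-ins for `AVPixelFormat` and `NV_ENC_BUFFER_FORMAT_*`. The P010 and YUV444P16 cases follow this discussion rather than the patch as posted.

```c
#include <assert.h>

/* Hypothetical stand-ins for the FFmpeg and NVENC enums. */
enum fmt_sketch { FMT_YUV420P, FMT_YUV420P10, FMT_NV12, FMT_P010,
                  FMT_YUV444P, FMT_YUV444P16 };
enum nv_buf_sketch { BUF_YV12_PL, BUF_NV12_PL, BUF_YUV444_PL,
                     BUF_YUV420_10BIT, BUF_YUV444_10BIT, BUF_INVALID };

static enum nv_buf_sketch surface_format(enum fmt_sketch f)
{
    switch (f) {
    case FMT_YUV420P:   return BUF_YV12_PL;
    case FMT_NV12:      return BUF_NV12_PL;
    case FMT_YUV444P:   return BUF_YUV444_PL;
    case FMT_YUV420P10: /* same surface, but needs shift + U/V interleave */
    case FMT_P010:      return BUF_YUV420_10BIT;
    case FMT_YUV444P16: return BUF_YUV444_10BIT; /* straight plane copy */
    }
    return BUF_INVALID;
}
```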

Regards

Oliver

> Carl Eugen
Oliver Collyer Aug. 24, 2016, 8:21 a.m. UTC | #10
> In any case, please split the rate control patch from the 10bit patch.
> 

Just double-checking this - both changes require a bump of the minimum NVENC version to 7. Do you still want them as separate patches or does this tie them together? If they are to be separate patches then obviously one of them will need to be applied first, so there is a dependency between them.

> Carl Eugen
> 
> who wonders now how the Microsoft headers define the ten bit
> yuv420 semi-planar format...
Timo Rothenpieler Aug. 24, 2016, 9:04 a.m. UTC | #11
Am 24.08.2016 um 10:21 schrieb Oliver Collyer:
>> In any case, please split the rate control patch from the 10bit patch.
>>
> 
> Just double-checking this - both changes require a bump of the minimum NVENC version to 7. Do you still want them as separate patches or does this tie them together? If they are to be separate patches then obviously one of them will need to be applied first, so there is a dependency between them.

Just bump it with the first patch.
Also remember to bump lavc micro version.
Oliver Collyer Aug. 24, 2016, 10:30 a.m. UTC | #12
Ok thanks, Timo.

So I’ve split this into two patches and revised as per the discussions and they are attached here.

The only thing to be decided is whether my conversion code to enable YUV420P10 support should be included in this or not.

It’s in the attached patch but I’m happy to remove it if necessary.

Regards

Oliver
> On 24 Aug 2016, at 12:04, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> 
> Am 24.08.2016 um 10:21 schrieb Oliver Collyer:
>>> In any case, please split the rate control patch from the 10bit patch.
>>> 
>> 
>> Just double-checking this - both changes require a bump of the minimum NVENC version to 7. Do you still want them as separate patches or does this tie them together? If they are to be separate patches then obviously one of them will need to be applied first, so there is a dependency between them.
> 
> Just bump it with the first patch.
> Also remember to bump lavc micro version.
Carl Eugen Hoyos Aug. 24, 2016, 5:57 p.m. UTC | #13
Hi!

2016-08-24 10:00 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
> I’m not sure what would happen - currently the nvenc.c code enforces
> 10-bit encoding when it gets a 10-bit input pixel format.

Yes, my question was about what happens if you remove the enforcement /
force an 8bit profile.

And of course if it is possible to encode 10bit with 8bit input?

Carl Eugen
Oliver Collyer Aug. 24, 2016, 9:24 p.m. UTC | #14
> On 24 Aug 2016, at 20:57, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 
> Hi!
> 
> 2016-08-24 10:00 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:
>> I’m not sure what would happen - currently the nvenc.c code enforces
>> 10-bit encoding when it gets a 10-bit input pixel format.
> 
> Yes, my question was about what happens if you remove the enforcement /
> force an 8bit profile.
> 

Ok, I’ve been doing some testing.

I have a single test file which is in yuv420p pixel format and has file size 42,730KB.

I then ran different combinations of -pix_fmt and -profile:v while setting the encoder to a constant global quality of 21.

1. pix_fmt = yuv420p and profile is main => output file size 79,769KB
2. pix_fmt = yuv420p and profile is main10 => output file size 79,769KB

Although the file sizes of 1. and 2. are identical, the md5s were not. VLC reports both as “Planar YUV 420”, ffprobe reports 1. as having Main profile and 2. as having Main 10 profile. I suspect this internal labelling is what accounts for the md5 difference.

3. pix_fmt = yuv444p16 and profile is main => output file size 72,279KB
4. pix_fmt = yuv444p16 and profile is main10 => output file size 72,279KB

Both file sizes and md5s of 3. and 4. are identical. VLC reports both as “Planar YUV 444 10-bit LE”. ffprobe reports both as having Rext profile (why not Main 10, I wonder?)

In summary (and this ties in with the Nvidia documentation, which details the steps for 10-bit encoding):

a) to get 10-bit encoding you need to set up a 10-bit input buffer.
b) if you force an 8-bit profile but give the encoder a 10-bit input buffer, it treats it as a 10-bit profile.
c) you can get some nice savings by converting an 8-bit input to 10 bits and encoding in 10 bits, although this comes at the expense of the CPU time spent on the 8-bit to 10-bit conversion of the input pixel format.
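For scale, the relative size difference between runs 1/2 and runs 3/4 works out as below. Runs 3 and 4 also change the chroma format from 4:2:0 to 4:4:4 (yuv444p16). The difference therefore cannot be attributed to bit depth alone.

```latex
\frac{79{,}769 - 72{,}279}{79{,}769} = \frac{7{,}490}{79{,}769} \approx 0.094
\quad (\text{about } 9.4\%\ \text{smaller})
```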

Regards

Oliver

> And of course if it is possible to encode 10bit with 8bit input?
> 
> Carl Eugen
Carl Eugen Hoyos Aug. 25, 2016, 5:49 a.m. UTC | #15
2016-08-24 23:24 GMT+02:00 Oliver Collyer <ovcollyer@mac.com>:

> In summary (and this ties in with the Nvidia documentation, which
> details the steps for 10-bit encoding):
>
> a) to get 10-bit encoding you need to set up a 10-bit input buffer.

Thank you for the tests!

No more comments from me, I cannot test.

Carl Eugen
Timo Rothenpieler Aug. 25, 2016, 10:48 a.m. UTC | #16
Am 24.08.2016 um 12:30 schrieb Oliver Collyer:
> Ok thanks, Timo.
> 
> So I’ve split this into two patches and revised as per the discussions and they are attached here.
> 
> The only thing to be decided is whether my conversion code to enable YUV420P10 support should be included in this or not.
> 
> It’s in the attached patch but I’m happy to remove it if necessary.

I'm not a fan of format-conversion code in nvenc. That's the job of swscale.
If a needed conversion is missing/performs poorly, it should be fixed in
sws instead.

> Regards
> 
> Oliver
> 

Unfortunately I'm still on my old GTX760, so I can't test all the
hevc/10bit stuff.
The patch looks Ok though and should generally be fine to merge minus
the format-conversion.

Might have to get myself an intermediary GTX1060 to upgrade my old PC
once again.
Oliver Collyer Aug. 25, 2016, 5:56 p.m. UTC | #17
Hi Timo

Thank you for the clarification.

Attached are what should be the final versions of these patches then, with the support for YUV420P10 (and related conversion code) now dropped.

Regards

Oliver
> On 25 Aug 2016, at 13:48, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> 
> Am 24.08.2016 um 12:30 schrieb Oliver Collyer:
>> Ok thanks, Timo.
>> 
>> So I’ve split this into two patches and revised as per the discussions and they are attached here.
>> 
>> The only thing to be decided is whether my conversion code to enable YUV420P10 support should be included in this or not.
>> 
>> It’s in the attached patch but I’m happy to remove it if necessary.
> 
> I'm not a fan of format-conversion code in nvenc. That's the job of swscale.
> If a needed conversion is missing/performs poorly, it should be fixed in
> sws instead.
> 
>> Regards
>> 
>> Oliver
>> 
> 
> Unfortunately I'm still on my old GTX760, so I can't test all the
> hevc/10bit stuff.
> The patch looks Ok though and should generally be fine to merge minus
> the format-conversion.
> 
> Might have to get myself an intermediary GTX1060 to upgrade my old PC
> once again.
Timo Rothenpieler Aug. 27, 2016, 11:49 a.m. UTC | #18
On 8/25/2016 7:56 PM, Oliver Collyer wrote:
> Hi Timo
> 
> Thank you for the clarification.
> 
> Attached are what should be the final versions of these patches then, with the support for YUV420P10 (and related conversion code) now dropped.

While testing these patches, I noticed that you now have to go through a
lengthy registration and confirmation process (read: I wasn't able to get
the Version 7 Header/SDK yet, waiting for manual approval of my Video
SDK registration).

I definitely hope the nvEncodeApi header is still MIT licensed,
otherwise it would force me to reject these patches, or re-introduce the
non-free flag for nvenc.

Either way this is a horrible situation, as bumping the SDK requirement
to version 7 forces every user to go through the same registration process.
I'll push for another attempt of including the header in ffmpeg once I
get it. Provided it is still MIT licensed.

Until that is somehow sorted, I'll hold off on merging these patches.
Oliver Collyer Aug. 27, 2016, 12:15 p.m. UTC | #19
Hi Timo

Well the copyright message at the top of nvEncodeAPI.h in the 7.0 SDK is identical to that for 6.0 so it looks ok in that respect.

I agree it’s an inconvenience to have to register and wait for approval (although mine came immediately and automatically).

Regards

Oliver

> On 27 Aug 2016, at 14:49, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> 
> On 8/25/2016 7:56 PM, Oliver Collyer wrote:
>> Hi Timo
>> 
>> Thankyou for the clarification.
>> 
>> Attached are what should be the final versions of these patches then, with the support for YUV420P10 (and related conversion code) now dropped.
> 
> While testing these patches, I noticed that you now have to go through a
> lengthy registration and confirmation process (read: I wasn't able to get
> the Version 7 Header/SDK yet, waiting for manual approval of my Video
> SDK registration).
> 
> I definitely hope the nvEncodeApi header is still MIT licensed,
> otherwise it would force me to reject these patches, or re-introduce the
> non-free flag for nvenc.
> 
> Either way this is a horrible situation, as bumping the SDK requirement
> to version 7 forces every user to go through the same registration process.
> I'll push for another attempt of including the header in ffmpeg once I
> get it. Provided it is still MIT licensed.
> 
> Until that is somehow sorted, I'll hold off on merging these patches.
compn Aug. 28, 2016, 2:12 a.m. UTC | #20
On Sat, 27 Aug 2016 13:49:25 +0200
Timo Rothenpieler <timo@rothenpieler.org> wrote:

>I'll push for another attempt of including the header in ffmpeg

including nvidia header in linux kernel was a no-go?

-compn
Timo Rothenpieler Aug. 29, 2016, 8:32 a.m. UTC | #21
> Hi all
> 
> Attached is a patch for the above.
> 
> 10-bit HEVC encoding is a new feature of the latest Pascal Nvidia GPUs, released in the past few months; I’ve added support for the yuv420p10le and yuv444p10le pixel formats.
> 
> Rate control lookahead is available on pre-Pascal models too, but only with the latest SDK and drivers.
> 
> As part of this I’ve bumped the required SDK version to the latest, which is 7.
> 
> Feedback welcome. This is only my second patch; I seem to average about one a year :)
> 
> Regards
> 
> Oliver

pushed with minimal changes adjusting for the changes in configure and
adding the lookahead parameter to h264 as well.

Patch

diff --git a/configure b/configure
index 9b92426..46ff144 100755
--- a/configure
+++ b/configure
@@ -5774,8 +5774,8 @@  enabled mmal && check_func_headers interface/mmal/mmal.h "MMAL_PARAMETER_VIDEO_M

enabled netcdf            && require_pkg_config netcdf netcdf.h nc_inq_libvers
enabled nvenc             && { check_header nvEncodeAPI.h || die "ERROR: nvEncodeAPI.h not found."; } &&
-                             { check_cpp_condition nvEncodeAPI.h "NVENCAPI_MAJOR_VERSION >= 6" ||
-                               die "ERROR: NVENC API version 5 or older is not supported"; } &&
+                             { check_cpp_condition nvEncodeAPI.h "NVENCAPI_MAJOR_VERSION >= 7" ||
+                               die "ERROR: NVENC API version 6 or older is not supported"; } &&
                             { [ $target_os != cygwin ] || die "ERROR: NVENC is not supported on Cygwin currently."; }
enabled openal            && { { for al_libs in "${OPENAL_LIBS}" "-lopenal" "-lOpenAL32"; do
                               check_lib 'AL/al.h' alGetError "${al_libs}" && break; done } ||
diff --git a/libavcodec/nvenc.c b/libavcodec/nvenc.c
index 984dd3b..685dd7d 100644
--- a/libavcodec/nvenc.c
+++ b/libavcodec/nvenc.c
@@ -75,8 +75,10 @@ 

const enum AVPixelFormat ff_nvenc_pix_fmts[] = {
    AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_YUV420P10LE,
    AV_PIX_FMT_NV12,
    AV_PIX_FMT_YUV444P,
+    AV_PIX_FMT_YUV444P10LE,
#if CONFIG_CUDA
    AV_PIX_FMT_CUDA,
#endif
@@ -314,6 +316,18 @@  static int nvenc_check_capabilities(AVCodecContext *avctx)
        return AVERROR(ENOSYS);
    }

+    ret = nvenc_check_cap(avctx, NV_ENC_CAPS_SUPPORT_10BIT_ENCODE);
+    if ((ctx->data_pix_fmt == AV_PIX_FMT_YUV420P10LE || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE) && ret <= 0) {
+        av_log(avctx, AV_LOG_VERBOSE, "10 bit encode not supported\n");
+        return AVERROR(ENOSYS);
+    }
+
+    ret = nvenc_check_cap(avctx, NV_ENC_CAPS_SUPPORT_LOOKAHEAD);
+    if (ctx->rc_lookahead > 0 && ret <= 0) {
+        av_log(avctx, AV_LOG_VERBOSE, "RC lookahead not supported\n");
+        return AVERROR(ENOSYS);
+    }
+
    return 0;
}

@@ -673,6 +687,11 @@  static av_cold void nvenc_setup_rate_control(AVCodecContext *avctx)
    } else if (ctx->encode_config.rcParams.averageBitRate > 0) {
        ctx->encode_config.rcParams.vbvBufferSize = 2 * ctx->encode_config.rcParams.averageBitRate;
    }
+
+    if (ctx->rc_lookahead > 0) {
+        ctx->encode_config.rcParams.enableLookahead = 1;
+        ctx->encode_config.rcParams.lookaheadDepth = FFMIN(ctx->rc_lookahead, 32);
+    }
}

static av_cold int nvenc_setup_h264_config(AVCodecContext *avctx)
@@ -800,9 +819,26 @@  static av_cold int nvenc_setup_hevc_config(AVCodecContext *avctx)
        hevc->outputPictureTimingSEI   = 1;
    }

-    /* No other profile is supported in the current SDK version 5 */
-    cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN_GUID;
-    avctx->profile = FF_PROFILE_HEVC_MAIN;
+    switch(ctx->profile) {
+    case NV_ENC_HEVC_PROFILE_MAIN:
+        cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN_GUID;
+        avctx->profile = FF_PROFILE_HEVC_MAIN;
+        break;
+    case NV_ENC_HEVC_PROFILE_MAIN_10:
+        cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN10_GUID;
+        avctx->profile = FF_PROFILE_HEVC_MAIN_10;
+        break;
+    }
+
+    // force setting profile as main10 if input is AV_PIX_FMT_YUVXXXP10LE
+    if (ctx->data_pix_fmt == AV_PIX_FMT_YUV420P10LE || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE) {
+        cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN10_GUID;
+        avctx->profile = FF_PROFILE_HEVC_MAIN_10;
+    }
+
+    hevc->chromaFormatIDC = ctx->data_pix_fmt == AV_PIX_FMT_YUV444P || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE ? 3 : 1;
+
+    hevc->pixelBitDepthMinus8 = ctx->data_pix_fmt == AV_PIX_FMT_YUV420P10LE || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE ? 2 : 0;

    hevc->level = ctx->level;

@@ -954,6 +990,10 @@  static av_cold int nvenc_alloc_surface(AVCodecContext *avctx, int idx)
        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YV12_PL;
        break;

+    case AV_PIX_FMT_YUV420P10LE:
+        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YUV420_10BIT;
+        break;
+
    case AV_PIX_FMT_NV12:
        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_NV12_PL;
        break;
@@ -962,6 +1002,10 @@  static av_cold int nvenc_alloc_surface(AVCodecContext *avctx, int idx)
        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YUV444_PL;
        break;

+    case AV_PIX_FMT_YUV444P10LE:
+        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YUV444_10BIT;
+        break;
+
    default:
        av_log(avctx, AV_LOG_FATAL, "Invalid input pixel format\n");
        return AVERROR(EINVAL);
@@ -1206,6 +1250,49 @@  static NvencSurface *get_free_frame(NvencContext *ctx)
    return NULL;
}

+static void copy_single_10bit_plane(uint8_t *dst, int dst_linesize,
+                                    const uint8_t *src, int src_linesize,
+                                    int width, int height)
+{
+    if (!dst || !src)
+        return;
+    av_assert0(abs(src_linesize) >= width << 1);
+    av_assert0(abs(dst_linesize) >= width << 1);
+    for (;height > 0; height--) {
+        uint16_t* tdst = (uint16_t*)dst;
+        uint16_t* tsrc = (uint16_t*)src;
+        for (int w = width; w > 0; w--) {
+            *tdst++ = *tsrc++ << 6;
+        }
+        dst += dst_linesize;
+        src += src_linesize;
+    }
+}
+
+static void interleave_10bit_planes(uint8_t *dst, int dst_linesize,
+                                    const uint8_t *src1, int src1_linesize,
+                                    const uint8_t *src2, int src2_linesize,
+                                    int width, int height)
+{
+    if (!dst || !src1 || !src2)
+        return;
+    av_assert0(abs(src1_linesize) >= width);
+    av_assert0(abs(src2_linesize) >= width);
+    av_assert0(abs(dst_linesize) >= width << 1);
+    for (;height > 0; height--) {
+        uint16_t* tdst = (uint16_t*)dst;
+        uint16_t* tsrc1 = (uint16_t*)src1;
+        uint16_t* tsrc2 = (uint16_t*)src2;
+        for (int w = width; w > 0; w-=2) {
+            *tdst++ = *tsrc1++ << 6;
+            *tdst++ = *tsrc2++ << 6;
+        }
+        dst += dst_linesize;
+        src1 += src1_linesize;
+        src2 += src2_linesize;
+    }
+}
+
static int nvenc_copy_frame(AVCodecContext *avctx, NvencSurface *inSurf,
            NV_ENC_LOCK_INPUT_BUFFER *lockBufferParams, const AVFrame *frame)
{
@@ -1228,6 +1315,17 @@  static int nvenc_copy_frame(AVCodecContext *avctx, NvencSurface *inSurf,
        av_image_copy_plane(buf, lockBufferParams->pitch >> 1,
            frame->data[1], frame->linesize[1],
            avctx->width >> 1, avctx->height >> 1);
+    } else if (frame->format == AV_PIX_FMT_YUV420P10LE) {
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[0], frame->linesize[0],
+            avctx->width, avctx->height);
+
+        buf += off;
+
+        interleave_10bit_planes(buf, lockBufferParams->pitch,
+            frame->data[1], frame->linesize[1],
+            frame->data[2], frame->linesize[2],
+            avctx->width, avctx->height >> 1);
    } else if (frame->format == AV_PIX_FMT_NV12) {
        av_image_copy_plane(buf, lockBufferParams->pitch,
            frame->data[0], frame->linesize[0],
@@ -1254,6 +1352,22 @@  static int nvenc_copy_frame(AVCodecContext *avctx, NvencSurface *inSurf,
        av_image_copy_plane(buf, lockBufferParams->pitch,
            frame->data[2], frame->linesize[2],
            avctx->width, avctx->height);
+    } else if (frame->format == AV_PIX_FMT_YUV444P10LE) {
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[0], frame->linesize[0],
+            avctx->width, avctx->height);
+
+        buf += off;
+
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[1], frame->linesize[1],
+            avctx->width, avctx->height);
+
+        buf += off;
+
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[2], frame->linesize[2],
+            avctx->width, avctx->height);
    } else {
        av_log(avctx, AV_LOG_FATAL, "Invalid pixel format!\n");
        return AVERROR(EINVAL);
diff --git a/libavcodec/nvenc.h b/libavcodec/nvenc.h
index 961cbc7..9366a26 100644
--- a/libavcodec/nvenc.h
+++ b/libavcodec/nvenc.h
@@ -117,6 +117,11 @@  enum {
};

enum {
+    NV_ENC_HEVC_PROFILE_MAIN,
+    NV_ENC_HEVC_PROFILE_MAIN_10,
+};
+
+enum {
    NVENC_LOWLATENCY = 1,
    NVENC_LOSSLESS   = 2,
    NVENC_ONE_PASS   = 4,
@@ -174,6 +179,7 @@  typedef struct NvencContext
    int device;
    int flags;
    int async_depth;
+    int rc_lookahead;
} NvencContext;

int ff_nvenc_encode_init(AVCodecContext *avctx);
diff --git a/libavcodec/nvenc_hevc.c b/libavcodec/nvenc_hevc.c
index 1ce7c89..04e351a 100644
--- a/libavcodec/nvenc_hevc.c
+++ b/libavcodec/nvenc_hevc.c
@@ -39,8 +39,9 @@  static const AVOption options[] = {
    { "llhp",       "low latency hp",                     0,                   AV_OPT_TYPE_CONST,  { .i64 = PRESET_LOW_LATENCY_HP }, 0, 0, VE, "preset" },
    { "lossless",   "lossless",                           0,                   AV_OPT_TYPE_CONST,  { .i64 = PRESET_LOSSLESS_DEFAULT }, 0, 0, VE, "preset" },
    { "losslesshp", "lossless hp",                        0,                   AV_OPT_TYPE_CONST,  { .i64 = PRESET_LOSSLESS_HP }, 0, 0, VE, "preset" },
-    { "profile", "Set the encoding profile",             OFFSET(profile),      AV_OPT_TYPE_INT,    { .i64 = FF_PROFILE_HEVC_MAIN }, FF_PROFILE_HEVC_MAIN, FF_PROFILE_HEVC_MAIN, VE, "profile" },
-    { "main",    "",                                     0,                    AV_OPT_TYPE_CONST,  { .i64 = FF_PROFILE_HEVC_MAIN }, 0, 0, VE, "profile" },
+    { "profile", "Set the encoding profile",             OFFSET(profile),      AV_OPT_TYPE_INT,    { .i64 = NV_ENC_HEVC_PROFILE_MAIN }, NV_ENC_HEVC_PROFILE_MAIN, NV_ENC_HEVC_PROFILE_MAIN_10, VE, "profile" },
+    { "main",    "",                                     0,                    AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_HEVC_PROFILE_MAIN }, 0, 0, VE, "profile" },
+    { "main10",  "",                                     0,                    AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_HEVC_PROFILE_MAIN_10 }, 0, 0, VE, "profile" },
    { "level",   "Set the encoding level restriction",   OFFSET(level),        AV_OPT_TYPE_INT,    { .i64 = NV_ENC_LEVEL_AUTOSELECT }, NV_ENC_LEVEL_AUTOSELECT, NV_ENC_LEVEL_HEVC_62, VE, "level" },
    { "auto",    "",                                     0,                    AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_LEVEL_AUTOSELECT },  0, 0, VE,  "level" },
    { "1",       "",                                     0,                    AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_LEVEL_HEVC_1 },  0, 0, VE,  "level" },
@@ -73,6 +74,7 @@  static const AVOption options[] = {
    { "ll_2pass_quality", "Multi-pass optimized for image quality (only for low-latency presets)",       0, AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_PARAMS_RC_2_PASS_QUALITY },       0, 0, VE, "rc" },
    { "ll_2pass_size",    "Multi-pass optimized for constant frame size (only for low-latency presets)", 0, AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_PARAMS_RC_2_PASS_FRAMESIZE_CAP }, 0, 0, VE, "rc" },
    { "vbr_2pass",        "Multi-pass variable bitrate mode",                                            0, AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_PARAMS_RC_2_PASS_VBR },           0, 0, VE, "rc" },
+    { "rc-lookahead",  "Number of frames to look ahead for rate-control", OFFSET(rc_lookahead), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, VE },
    { "surfaces", "Number of concurrent surfaces",        OFFSET(nb_surfaces), AV_OPT_TYPE_INT,    { .i64 = 32 },                   0, INT_MAX, VE },
    { "cbr", "Use cbr encoding mode", OFFSET(cbr), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
    { "2pass", "Use 2pass encoding mode", OFFSET(twopass), AV_OPT_TYPE_BOOL, { .i64 = -1 }, -1, 1, VE },