diff mbox

[FFmpeg-devel] avutil/hwcontext_cuda: fix YUV420P cuda_get_buffer

Message ID 20180514212950.15764-1-cus@passwd.hu
State New
Headers show

Commit Message

Marton Balint May 14, 2018, 9:29 p.m. UTC
Regression since ece068a771ac3f725e854c681ecbef08e792addc.

Signed-off-by: Marton Balint <cus@passwd.hu>
---
 libavutil/hwcontext_cuda.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Timo Rothenpieler May 15, 2018, 9:08 a.m. UTC | #1
On 14.05.2018 23:29, Marton Balint wrote:
> Regression since ece068a771ac3f725e854c681ecbef08e792addc.
> 
> Signed-off-by: Marton Balint <cus@passwd.hu>
> ---
>  libavutil/hwcontext_cuda.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
> index cb5d15c9d9..e16d0a2b4b 100644
> --- a/libavutil/hwcontext_cuda.c
> +++ b/libavutil/hwcontext_cuda.c
> @@ -159,10 +159,13 @@ static int cuda_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
>          return res;
>  
>      // YUV420P is a special case.
> -    // Nvenc expects the U/V planes in swapped order from how ffmpeg expects them.
> +    // Nvenc expects the U/V planes in swapped order from how ffmpeg expects them, also chroma is half-aligned
>      if (ctx->sw_format == AV_PIX_FMT_YUV420P) {
> -        FFSWAP(uint8_t*, frame->data[1], frame->data[2]);
> -        FFSWAP(int, frame->linesize[1], frame->linesize[2]);
> +        int aligned_width = FFALIGN(ctx->width, CUDA_FRAME_ALIGNMENT);
> +        frame->data[2]     = frame->data[0] + aligned_width * ctx->height;
> +        frame->data[1]     = frame->data[2] + aligned_width * ctx->height / 4;
> +        frame->linesize[1] = aligned_width / 2;
> +        frame->linesize[2] = aligned_width / 2;
>      }

Are you sure it isn't already that way? I tested specially YUV420P and
could not make out any issues.

And if this is really necessary, please don't use the old aligned_width
term, it's incredibly confusing and made the old code a mess.
linesize[0] should be identical to it if I'm not mistaken.

So something like

frame->linesize[1] = frame->linesize[2] = frame->linesize[0] / 2;
frame->data[2]     = frame->data[1];
frame->data[1]     = frame->data[2] + frame->linesize[2] * ctx->height / 2;
Marton Balint May 15, 2018, 11:37 a.m. UTC | #2
On Tue, 15 May 2018, Timo Rothenpieler wrote:

> On 14.05.2018 23:29, Marton Balint wrote:
>> Regression since ece068a771ac3f725e854c681ecbef08e792addc.
>> 
>> Signed-off-by: Marton Balint <cus@passwd.hu>
>> ---
>>  libavutil/hwcontext_cuda.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>> 
>> diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
>> index cb5d15c9d9..e16d0a2b4b 100644
>> --- a/libavutil/hwcontext_cuda.c
>> +++ b/libavutil/hwcontext_cuda.c
>> @@ -159,10 +159,13 @@ static int cuda_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
>>          return res;
>>
>>      // YUV420P is a special case.
>> -    // Nvenc expects the U/V planes in swapped order from how ffmpeg expects them.
>> +    // Nvenc expects the U/V planes in swapped order from how ffmpeg expects them, also chroma is half-aligned
>>      if (ctx->sw_format == AV_PIX_FMT_YUV420P) {
>> -        FFSWAP(uint8_t*, frame->data[1], frame->data[2]);
>> -        FFSWAP(int, frame->linesize[1], frame->linesize[2]);
>> +        int aligned_width = FFALIGN(ctx->width, CUDA_FRAME_ALIGNMENT);
>> +        frame->data[2]     = frame->data[0] + aligned_width * ctx->height;
>> +        frame->data[1]     = frame->data[2] + aligned_width * ctx->height / 4;
>> +        frame->linesize[1] = aligned_width / 2;
>> +        frame->linesize[2] = aligned_width / 2;
>>      }
>
> Are you sure it isn't already that way? I tested specially YUV420P and
> could not make out any issues.

For 4K content linesize[0] is 3840, but linesize[1] becomes 2048 instead 
of 1920.

>
> And if this is really necessary, please don't use the old aligned_width
> term, it's incredibly confusing and made the old code a mess.
> linesize[0] should be identical to it if I'm not mistaken.
>
> So something like
>
> frame->linesize[1] = frame->linesize[2] = frame->linesize[0] / 2;
> frame->data[2]     = frame->data[1];
> frame->data[1]     = frame->data[2] + frame->linesize[2] * ctx->height / 2;

Ok, will use this.

Regards,
Marton
Timo Rothenpieler May 15, 2018, 12:04 p.m. UTC | #3
On 15.05.2018 13:37, Marton Balint wrote:
>> Are you sure it isn't already that way? I tested specially YUV420P and
>> could not make out any issues.
> 
> For 4K content linesize[0] is 3840, but linesize[1] becomes 2048 instead
> of 1920.

That's odd, 2048 seems more correct since it's a multiple of the
recommended 2048. But I guess the whole yuv420p in nvenc is some weird
hack on the driver side.

>>
>> So something like
>>
>> frame->linesize[1] = frame->linesize[2] = frame->linesize[0] / 2;
>> frame->data[2]     = frame->data[1];
>> frame->data[1]     = frame->data[2] + frame->linesize[2] * ctx->height
>> / 2;
> 
> Ok, will use this.

Feel free to push if it works. Don't have access to an nvidia machine
right now, so can't properly test.
Marton Balint May 15, 2018, 9:07 p.m. UTC | #4
On Tue, 15 May 2018, Timo Rothenpieler wrote:

> On 15.05.2018 13:37, Marton Balint wrote:
>>> Are you sure it isn't already that way? I tested specially YUV420P and
>>> could not make out any issues.
>>
>> For 4K content linesize[0] is 3840, but linesize[1] becomes 2048 instead
>> of 1920.
>
> That's odd, 2048 seems more correct since it's a multiple of the
> recommended 2048. But I guess the whole yuv420p in nvenc is some weird
> hack on the driver side.
>
>>>
>>> So something like
>>>
>>> frame->linesize[1] = frame->linesize[2] = frame->linesize[0] / 2;
>>> frame->data[2]     = frame->data[1];
>>> frame->data[1]     = frame->data[2] + frame->linesize[2] * ctx->height
>>> / 2;
>>
>> Ok, will use this.
>
> Feel free to push if it works. Don't have access to an nvidia machine
> right now, so can't properly test.

Ok, pushed.

Thanks,
Marton
diff mbox

Patch

diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
index cb5d15c9d9..e16d0a2b4b 100644
--- a/libavutil/hwcontext_cuda.c
+++ b/libavutil/hwcontext_cuda.c
@@ -159,10 +159,13 @@  static int cuda_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
         return res;
 
     // YUV420P is a special case.
-    // Nvenc expects the U/V planes in swapped order from how ffmpeg expects them.
+    // Nvenc expects the U/V planes in swapped order from how ffmpeg expects them, also chroma is half-aligned
     if (ctx->sw_format == AV_PIX_FMT_YUV420P) {
-        FFSWAP(uint8_t*, frame->data[1], frame->data[2]);
-        FFSWAP(int, frame->linesize[1], frame->linesize[2]);
+        int aligned_width = FFALIGN(ctx->width, CUDA_FRAME_ALIGNMENT);
+        frame->data[2]     = frame->data[0] + aligned_width * ctx->height;
+        frame->data[1]     = frame->data[2] + aligned_width * ctx->height / 4;
+        frame->linesize[1] = aligned_width / 2;
+        frame->linesize[2] = aligned_width / 2;
     }
 
     frame->format = AV_PIX_FMT_CUDA;