[FFmpeg-devel] avcodec/cuvid: Add support for P010 as an output surface format

Submitted by Philip Langdale on Nov. 20, 2016, 7:44 p.m.

Details

Message ID 20161120194422.20746-1-philipl@overt.org
State New
Headers show

Commit Message

Philip Langdale Nov. 20, 2016, 7:44 p.m.
The nvidia 375.xx driver introduces support for P016 output surfaces,
for 10bit and 12bit HEVC content (it's also the first driver to support
hardware decoding of 12bit content).

Technically, we don't support P016, but in practice I don't think we
zero-out the extra bits in P010 so it can be used to carry the data.

This change introduces cuvid decoder support for P010 output for
output to hardware and system memory surfaces. For simplicity, it
does not maintain the previous ability to output NV12 for > 8 bit
input video - the user will need to update their driver to decode
such videos.

After this change, both cuvid and nvenc support P010, but the
ffmpeg_cuvid transcoding logic will need more work to connect the
two together. Similarly, the scale_npp filter still only works with
8bit surfaces.

Signed-off-by: Philip Langdale <philipl@overt.org>
---
 compat/cuda/cuviddec.h     |  3 ++-
 libavcodec/cuvid.c         | 57 ++++++++++++++++++++++++++++++++++------------
 libavutil/hwcontext_cuda.c | 11 ++++++++-
 3 files changed, 55 insertions(+), 16 deletions(-)

Comments

Timo Rothenpieler Nov. 20, 2016, 8:02 p.m.
I don't really like outputting P016 as P010.
I'd prefer to add support for P016 to ffmpeg and swscale, which
shouldn't be too hard as most P010 code can be reused.
Carl Eugen Hoyos Nov. 20, 2016, 8:06 p.m.
2016-11-20 21:02 GMT+01:00 Timo Rothenpieler <timo@rothenpieler.org>:
> I don't really like outputting P016 as P010.
> I'd prefer to add support for P016 to ffmpeg and swscale, which
> shouldn't be too hard as most P010 code can be reused.

Or change the existing P010 code to actually handle P016...
(And do a rename on the next version bump)

Carl Eugen
Hendrik Leppkes Nov. 20, 2016, 11:24 p.m.
On Sun, Nov 20, 2016 at 9:06 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2016-11-20 21:02 GMT+01:00 Timo Rothenpieler <timo@rothenpieler.org>:
>> I don't really like outputting P016 as P010.
>> I'd prefer to add support for P016 to ffmpeg and swscale, which
>> shouldn't be too hard as most P010 code can be reused.
>
> Or change the existing P010 code to actually handle P016...
> (And do a rename on the next version bump)
>

These are two distinct formats and should be kept as such. Hardware
cares for the difference, so for that alone we need different pixfmts
for the hwcontext implementation.

- Hendrik
Carl Eugen Hoyos Nov. 21, 2016, 12:10 a.m.
2016-11-21 0:24 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>:
> On Sun, Nov 20, 2016 at 9:06 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> 2016-11-20 21:02 GMT+01:00 Timo Rothenpieler <timo@rothenpieler.org>:
>>> I don't really like outputting P016 as P010.
>>> I'd prefer to add support for P016 to ffmpeg and swscale, which
>>> shouldn't be too hard as most P010 code can be reused.
>>
>> Or change the existing P010 code to actually handle P016...
>> (And do a rename on the next version bump)
>>
>
> These are two distinct formats and should be kept as such.

> Hardware cares for the difference

Please explain how this is possible.

Carl Eugen
Hendrik Leppkes Nov. 21, 2016, 12:31 a.m.
On Mon, Nov 21, 2016 at 1:10 AM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2016-11-21 0:24 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>:
>> On Sun, Nov 20, 2016 at 9:06 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>> 2016-11-20 21:02 GMT+01:00 Timo Rothenpieler <timo@rothenpieler.org>:
>>>> I don't really like outputting P016 as P010.
>>>> I'd prefer to add support for P016 to ffmpeg and swscale, which
>>>> shouldn't be too hard as most P010 code can be reused.
>>>
>>> Or change the existing P010 code to actually handle P016...
>>> (And do a rename on the next version bump)
>>>
>>
>> These are two distinct formats and should be kept as such.
>
>> Hardware cares for the difference
>
> Please explain how this is possible.

I have no idea what your problem with the P01X formats is. You already
made a fuzz about adding P010 and now you are making one again.
Whats wrong with these?

Various hardware APIs make a distinct difference in handling P010 and
P016 for 10-bit and >10-bit pixels respectively, and if we want to
properly support these APIs fully, we need to be able to map these
types to pixel formats on our end - which is already done for HEVC
10-bit decoding with P010, and now follows with 12-bit decoding using
P016. Both are a common industry standard format for 4:2:0
high-bitdepth surfaces on the GPU.

They do technically share the same memory layout with a varying degree
of zeros at the end, however as the hardware and its APIs makes a
clear distinction between the two, so do we. Otherwise allocating
hardware surfaces in the appropriate format is going to be rather
cumbersome or flat out broken. The avutil hwcontext implementation
needs a clear mapping between pixfmts and surface formats.

- Hendrik

Patch hide | download patch | download mbox

diff --git a/compat/cuda/cuviddec.h b/compat/cuda/cuviddec.h
index f9257ea..357289b 100644
--- a/compat/cuda/cuviddec.h
+++ b/compat/cuda/cuviddec.h
@@ -87,7 +87,8 @@  typedef enum cudaVideoCodec_enum {
  * Video Surface Formats Enums
  */
 typedef enum cudaVideoSurfaceFormat_enum {
-    cudaVideoSurfaceFormat_NV12=0       /**< NV12 (currently the only supported output format)  */
+    cudaVideoSurfaceFormat_NV12=0,      /**< NV12  */
+    cudaVideoSurfaceFormat_P016=1       /**< P016  */
 } cudaVideoSurfaceFormat;
 
 /*!
diff --git a/libavcodec/cuvid.c b/libavcodec/cuvid.c
index eafce0a..a6fb302 100644
--- a/libavcodec/cuvid.c
+++ b/libavcodec/cuvid.c
@@ -26,6 +26,7 @@ 
 #include "libavutil/fifo.h"
 #include "libavutil/log.h"
 #include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
 
 #include "avcodec.h"
 #include "internal.h"
@@ -99,11 +100,35 @@  static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
     CuvidContext *ctx = avctx->priv_data;
     AVHWFramesContext *hwframe_ctx = (AVHWFramesContext*)ctx->hwframe->data;
     CUVIDDECODECREATEINFO cuinfo;
+    int surface_fmt;
+
+    enum AVPixelFormat pix_fmts_nv12[3] = { AV_PIX_FMT_CUDA,
+                                            AV_PIX_FMT_NV12,
+                                            AV_PIX_FMT_NONE };
+
+    enum AVPixelFormat pix_fmts_p010[3] = { AV_PIX_FMT_CUDA,
+                                            AV_PIX_FMT_P010,
+                                            AV_PIX_FMT_NONE };
 
     av_log(avctx, AV_LOG_TRACE, "pfnSequenceCallback, progressive_sequence=%d\n", format->progressive_sequence);
 
     ctx->internal_error = 0;
 
+    surface_fmt = ff_get_format(avctx, format->bit_depth_luma_minus8 > 0 ?
+                                pix_fmts_p010 : pix_fmts_nv12);
+    if (surface_fmt < 0) {
+        av_log(avctx, AV_LOG_ERROR, "ff_get_format failed: %d\n", surface_fmt);
+        ctx->internal_error = AVERROR(EINVAL);
+        return 0;
+    }
+
+    av_log(avctx, AV_LOG_VERBOSE, "Formats: Original: %s | HW: %s | SW: %s\n",
+           av_get_pix_fmt_name(avctx->pix_fmt),
+           av_get_pix_fmt_name(surface_fmt),
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    avctx->pix_fmt = surface_fmt;
+
     avctx->width = format->display_area.right;
     avctx->height = format->display_area.bottom;
 
@@ -152,7 +177,7 @@  static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
             hwframe_ctx->width < avctx->width ||
             hwframe_ctx->height < avctx->height ||
             hwframe_ctx->format != AV_PIX_FMT_CUDA ||
-            hwframe_ctx->sw_format != AV_PIX_FMT_NV12)) {
+            hwframe_ctx->sw_format != avctx->sw_pix_fmt)) {
         av_log(avctx, AV_LOG_ERROR, "AVHWFramesContext is already initialized with incompatible parameters\n");
         ctx->internal_error = AVERROR(EINVAL);
         return 0;
@@ -173,7 +198,19 @@  static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
 
     cuinfo.CodecType = ctx->codec_type = format->codec;
     cuinfo.ChromaFormat = format->chroma_format;
-    cuinfo.OutputFormat = cudaVideoSurfaceFormat_NV12;
+
+    switch (avctx->sw_pix_fmt) {
+    case AV_PIX_FMT_NV12:
+        cuinfo.OutputFormat = cudaVideoSurfaceFormat_NV12;
+        break;
+    case AV_PIX_FMT_P010:
+        cuinfo.OutputFormat = cudaVideoSurfaceFormat_P016;
+        break;
+    default:
+        av_log(avctx, AV_LOG_ERROR, "Output formats other than NV12 or P010 are not supported\n");
+        ctx->internal_error = AVERROR(EINVAL);
+        return 0;
+    }
 
     cuinfo.ulWidth = avctx->coded_width;
     cuinfo.ulHeight = avctx->coded_height;
@@ -205,7 +242,7 @@  static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
 
     if (!hwframe_ctx->pool) {
         hwframe_ctx->format = AV_PIX_FMT_CUDA;
-        hwframe_ctx->sw_format = AV_PIX_FMT_NV12;
+        hwframe_ctx->sw_format = avctx->sw_pix_fmt;
         hwframe_ctx->width = avctx->width;
         hwframe_ctx->height = avctx->height;
 
@@ -413,7 +450,8 @@  static int cuvid_output_frame(AVCodecContext *avctx, AVFrame *frame)
 
                 offset += avctx->coded_height;
             }
-        } else if (avctx->pix_fmt == AV_PIX_FMT_NV12) {
+        } else if (avctx->pix_fmt == AV_PIX_FMT_NV12 ||
+                   avctx->pix_fmt == AV_PIX_FMT_P010) {
             AVFrame *tmp_frame = av_frame_alloc();
             if (!tmp_frame) {
                 av_log(avctx, AV_LOG_ERROR, "av_frame_alloc failed\n");
@@ -606,16 +644,6 @@  static av_cold int cuvid_decode_init(AVCodecContext *avctx)
     const AVBitStreamFilter *bsf;
     int ret = 0;
 
-    enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_CUDA,
-                                       AV_PIX_FMT_NV12,
-                                       AV_PIX_FMT_NONE };
-
-    ret = ff_get_format(avctx, pix_fmts);
-    if (ret < 0) {
-        av_log(avctx, AV_LOG_ERROR, "ff_get_format failed: %d\n", ret);
-        return ret;
-    }
-
     ctx->frame_queue = av_fifo_alloc(MAX_FRAME_COUNT * sizeof(CuvidParsedFrame));
     if (!ctx->frame_queue) {
         ret = AVERROR(ENOMEM);
@@ -883,6 +911,7 @@  static const AVOption options[] = {
         .capabilities   = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_AVOID_PROBING, \
         .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_CUDA, \
                                                         AV_PIX_FMT_NV12, \
+                                                        AV_PIX_FMT_P010, \
                                                         AV_PIX_FMT_NONE }, \
     };
 
diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
index e1dcab0..add4adb 100644
--- a/libavutil/hwcontext_cuda.c
+++ b/libavutil/hwcontext_cuda.c
@@ -35,6 +35,7 @@  static const enum AVPixelFormat supported_formats[] = {
     AV_PIX_FMT_NV12,
     AV_PIX_FMT_YUV420P,
     AV_PIX_FMT_YUV444P,
+    AV_PIX_FMT_P010,
 };
 
 static void cuda_buffer_free(void *opaque, uint8_t *data)
@@ -109,6 +110,7 @@  static int cuda_frames_init(AVHWFramesContext *ctx)
             size = aligned_width * ctx->height * 3 / 2;
             break;
         case AV_PIX_FMT_YUV444P:
+        case AV_PIX_FMT_P010:
             size = aligned_width * ctx->height * 3;
             break;
         }
@@ -123,7 +125,13 @@  static int cuda_frames_init(AVHWFramesContext *ctx)
 
 static int cuda_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
 {
-    int aligned_width = FFALIGN(ctx->width, CUDA_FRAME_ALIGNMENT);
+    int aligned_width;
+    int width_in_bytes = ctx->width;
+
+    if (ctx->sw_format == AV_PIX_FMT_P010) {
+       width_in_bytes *= 2;
+    }
+    aligned_width = FFALIGN(width_in_bytes, CUDA_FRAME_ALIGNMENT);
 
     frame->buf[0] = av_buffer_pool_get(ctx->pool);
     if (!frame->buf[0])
@@ -131,6 +139,7 @@  static int cuda_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
 
     switch (ctx->sw_format) {
     case AV_PIX_FMT_NV12:
+    case AV_PIX_FMT_P010:
         frame->data[0]     = frame->buf[0]->data;
         frame->data[1]     = frame->data[0] + aligned_width * ctx->height;
         frame->linesize[0] = aligned_width;