From patchwork Wed Mar 20 23:18:22 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?B?67Cx7KSA7Iud?= <js100@linecorp.com>
X-Patchwork-Id: 12371
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by ffaux.localdomain (Postfix) with ESMTP id A1C5244849E
	for <patchwork@ffaux-bg.ffmpeg.org>;
	Thu, 21 Mar 2019 01:18:32 +0200 (EET)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 764ED68AB64;
	Thu, 21 Mar 2019 01:18:32 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from cvsmtppost001.nmdf.navercorp.com
	(cvsmtppost001.nmdf.navercorp.com [125.209.246.151])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 879BF68AAED
	for <ffmpeg-devel@ffmpeg.org>; Thu, 21 Mar 2019 01:18:25 +0200 (EET)
X-Naver-CIP: 10.70.13.1
Received: from cvsendbo001.nmdf ([10.112.251.49])
	by cvsmtppost001.nmdf.navercorp.com with ESMTP id
	qCqLQsOGQaCMV8THzJW6ig
	for <ffmpeg-devel@ffmpeg.org>; Wed, 20 Mar 2019 23:18:22 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linecorp.com;
	s=s20171222; t=1553123902;
	bh=wr49Zh4Ika/OXQnjwZP2DZ9h2RPbCLNKuHj8KLMqJpg=;
	h=Message-ID:Date:From:To:Subject;
	b=tMCrU6F92B67VNi6XIoOW90oLrMkaGps4slYuuRJvT1OE911RAKEhbpND8jWU8x7z
	R4E3EIUcpM8yVMELH+1c+UlIR13uvVRX+ibdXGEntfvxjqMGA8TuOp0rW//OmogkK3
	7N+joi0j8/7tTKIcGIkg/mngos9lpC1PWyxjcPQmAUwRCOofKe9kVHYKQScHaZBgPD
	1ukCwynaOuSNg68ULb40eMmbEUvrSfWWdI0PTukXLJt/AasbCTJ7AcwIMqwVSf/A2x
	7GoDnDgAs5fJIpdX+kd3YqZpTG5ycxHBm/nuKi9QjvfyDTE5MxRWJntnoMu+MLtQQW
	0f3xRieyu7xzA==
X-Session-ID: Pp82-btrTCOh+6Ls462yUg
MIME-Version: 1.0
Message-ID: <df48eff3f27f65766effa2a6b6b6d9@cweb03.nmdf.nhnsystem.com>
Date: Thu, 21 Mar 2019 08:18:22 +0900
From: =?utf-8?B?67Cx7KSA7Iud?=<js100@linecorp.com>
Importance: normal
To: <ffmpeg-devel@ffmpeg.org>
X-Originating-IP: 10.70.13.1
Subject: [FFmpeg-devel] =?utf-8?q?=5BPATCH=5D_Use_packet_DTS_to_correct_fr?=
	=?utf-8?q?ame_PTS_for_PTS_missing_video_in__nvidia_cuvid_decoder?=
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <http://ffmpeg.org/mailman/options/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <http://ffmpeg.org/pipermail/ffmpeg-devel/>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <http://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

- Current Status
In the cuvid decoder wrapper, frame PTS is resolved from the input packet PTS.
When the packet PTS is missing, the cuvid decoder produces monotonically increasing timestamps at a fixed interval.
This causes no problem when every frame in the video has the same duration, as in fixed frame rate video.

- Problem
For variable frame rate video with missing PTS, however, monotonically increasing timestamps at a fixed interval do not reflect the actual frame positions in the time domain.
The test sample is an AVI without PTS info that has missing frames, which makes it variable frame rate video. When the cuvid decoder processes this video, the varying frame durations are ignored because PTS is missing. This leads directly to A/V sync problems and a lot of frame duplication at the end of the video.

- Solution
To correct each frame's duration, the packet DTS is passed through cuvidParseVideoData() when the packet PTS is missing, so that output frame durations can be resolved from it.
Since the passed packet DTS is not an actual PTS, the value resolved through CuvidParsedFrame is stored in frame->best_effort_timestamp instead of frame->pts, as in other decoder wrappers.

Signed-off-by: JoonsikBaek <js100@linecorp.com>
---
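Note (not part of the commit message): the timestamp selection described
under "Solution" can be sketched in isolation as below. This is a minimal
sketch under stated assumptions, not the code in the patch: SKETCH_NOPTS,
SketchRational, sketch_rescale(), sketch_pick_timestamp() and
sketch_to_cuvid_clock() are hypothetical names that stand in for
AV_NOPTS_VALUE, AVRational and av_rescale_q() so the example builds without
libavutil. The rescale helper truncates and does no overflow handling,
which differs from av_rescale_q().

```c
#include <stdint.h>

/* Hypothetical stand-in for AV_NOPTS_VALUE (illustration only). */
#define SKETCH_NOPTS INT64_MIN

/* Hypothetical simplified rational, standing in for AVRational. */
typedef struct SketchRational {
    int num;
    int den;
} SketchRational;

/* Truncating rescale of a from time base bq to time base cq.
 * Assumption: no rounding and no overflow protection, unlike av_rescale_q(). */
static int64_t sketch_rescale(int64_t a, SketchRational bq, SketchRational cq)
{
    return a * bq.num * cq.den / ((int64_t)bq.den * cq.num);
}

/* Prefer the packet PTS; fall back to the packet DTS when PTS is missing.
 * *from_dts is set to 1 only when the DTS fallback was taken. */
static int64_t sketch_pick_timestamp(int64_t pts, int64_t dts, int *from_dts)
{
    *from_dts = 0;
    if (pts != SKETCH_NOPTS)
        return pts;
    if (dts != SKETCH_NOPTS) {
        *from_dts = 1;
        return dts;
    }
    return SKETCH_NOPTS;
}

/* Convert a timestamp to the 10 MHz clock used on the parser side.
 * With an unset time base the value is passed through unchanged. */
static int64_t sketch_to_cuvid_clock(int64_t ts, SketchRational tb)
{
    SketchRational cuvid_tb = { 1, 10000000 };

    if (tb.num && tb.den)
        return sketch_rescale(ts, tb, cuvid_tb);
    return ts;
}
```

With a 1/25 time base, a packet that carries only DTS = 3 would be handed to
the parser as 1200000 in the 10 MHz clock, and the fallback flag tells the
output side to treat the resulting value as a best-effort timestamp rather
than a real PTS.
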
 libavcodec/cuviddec.c | 36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
index 291bb93..c9f0ff1 100644
--- a/libavcodec/cuviddec.c
+++ b/libavcodec/cuviddec.c
@@ -81,6 +81,8 @@ typedef struct CuvidContext
     int internal_error;
     int decoder_flushing;
 
+    int use_best_effort_pts_from_dts;
+
     int *key_frame;
 
     cudaVideoCodec codec_type;
@@ -384,6 +386,7 @@ static int cuvid_decode_packet(AVCodecContext *avctx, const AVPacket *avpkt)
     AVPacket filter_packet = { 0 };
     AVPacket filtered_packet = { 0 };
     int ret = 0, eret = 0, is_flush = ctx->decoder_flushing;
+    int64_t timestamp = AV_NOPTS_VALUE;
 
     av_log(avctx, AV_LOG_TRACE, "cuvid_decode_packet\n");
 
@@ -426,11 +429,18 @@ static int cuvid_decode_packet(AVCodecContext *avctx, const AVPacket *avpkt)
         cupkt.payload = avpkt->data;
 
         if (avpkt->pts != AV_NOPTS_VALUE) {
+            timestamp = avpkt->pts;
+        } else if (avpkt->dts != AV_NOPTS_VALUE) {
+            ctx->use_best_effort_pts_from_dts = 1;
+            timestamp = avpkt->dts;
+        }
+
+        if (timestamp != AV_NOPTS_VALUE) {
             cupkt.flags = CUVID_PKT_TIMESTAMP;
             if (avctx->pkt_timebase.num && avctx->pkt_timebase.den)
-                cupkt.timestamp = av_rescale_q(avpkt->pts, avctx->pkt_timebase, (AVRational){1, 10000000});
+                cupkt.timestamp = av_rescale_q(timestamp, avctx->pkt_timebase, (AVRational){1, 10000000});
             else
-                cupkt.timestamp = avpkt->pts;
+                cupkt.timestamp = timestamp;
         }
     } else {
         cupkt.flags = CUVID_PKT_ENDOFSTREAM;
@@ -506,6 +516,7 @@ static int cuvid_output_frame(AVCodecContext *avctx, AVFrame *frame)
         unsigned int pitch = 0;
         int offset = 0;
         int i;
+        int64_t timestamp;
 
         av_fifo_generic_read(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
 
@@ -610,22 +621,29 @@ static int cuvid_output_frame(AVCodecContext *avctx, AVFrame *frame)
         frame->key_frame = ctx->key_frame[parsed_frame.dispinfo.picture_index];
         frame->width = avctx->width;
         frame->height = avctx->height;
+
         if (avctx->pkt_timebase.num && avctx->pkt_timebase.den)
-            frame->pts = av_rescale_q(parsed_frame.dispinfo.timestamp, (AVRational){1, 10000000}, avctx->pkt_timebase);
+            timestamp = av_rescale_q(parsed_frame.dispinfo.timestamp, (AVRational){1, 10000000}, avctx->pkt_timebase);
         else
-            frame->pts = parsed_frame.dispinfo.timestamp;
+            timestamp = parsed_frame.dispinfo.timestamp;
 
         if (parsed_frame.second_field) {
             if (ctx->prev_pts == INT64_MIN) {
-                ctx->prev_pts = frame->pts;
-                frame->pts += (avctx->pkt_timebase.den * avctx->framerate.den) / (avctx->pkt_timebase.num * avctx->framerate.num);
+                ctx->prev_pts = timestamp;
+                timestamp += (avctx->pkt_timebase.den * avctx->framerate.den) / (avctx->pkt_timebase.num * avctx->framerate.num);
             } else {
-                int pts_diff = (frame->pts - ctx->prev_pts) / 2;
-                ctx->prev_pts = frame->pts;
-                frame->pts += pts_diff;
+                int pts_diff = (timestamp - ctx->prev_pts) / 2;
+                ctx->prev_pts = timestamp;
+                timestamp += pts_diff;
             }
         }
 
+        if (ctx->use_best_effort_pts_from_dts) {
+            frame->best_effort_timestamp = timestamp;
+        } else {
+            frame->pts = timestamp;
+        }
+
         /* CUVIDs opaque reordering breaks the internal pkt logic.
          * So set pkt_pts and clear all the other pkt_ fields.
          */