From patchwork Thu Sep 22 18:33:37 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sasi Inguva <isasi-at-google.com@ffmpeg.org>
X-Patchwork-Id: 681
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 10.103.140.66 with SMTP id o63csp159578vsd;
	Thu, 22 Sep 2016 11:34:25 -0700 (PDT)
X-Received: by 10.194.191.228 with SMTP id hb4mr3720961wjc.213.1474569265102;
	Thu, 22 Sep 2016 11:34:25 -0700 (PDT)
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
	by mx.google.com with ESMTP id
	i206si2700870wma.125.2016.09.22.11.34.23;
	Thu, 22 Sep 2016 11:34:25 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
	dkim=neutral (body hash did not verify) header.i=@google.com;
	spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 79ECA689D2A;
	Thu, 22 Sep 2016 21:34:05 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-pa0-f52.google.com (mail-pa0-f52.google.com
	[209.85.220.52])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 12603689CEB
	for <ffmpeg-devel@ffmpeg.org>; Thu, 22 Sep 2016 21:33:59 +0300 (EEST)
Received: by mail-pa0-f52.google.com with SMTP id qn7so14281041pac.3
	for <ffmpeg-devel@ffmpeg.org>; Thu, 22 Sep 2016 11:34:15 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
	s=20120113;
	h=from:to:cc:subject:date:message-id:in-reply-to:references;
	bh=MicTsBYhsY4Bj2h/S3oFSAgtd23qWv09QkrsGD7NECE=;
	b=SZpIIq1iKAEy2iA/XyZb+GWqYfMRA4tq+cVILXakmVcW52KM+j5eRUPjNKJlOoAVs8
	LSbkO1545ZGqpxHBfZlBIIeLthhWbBLkM/sh6miT+FjQ3DwVcFeFXdoW7Xq4uZL1g++v
	qyxSHiFqdDAZOdpfaA0jGbfh+FZxd/H/EjKoAcf74drTPmPlpPdQd+tUsRptEFjNaJA/
	rkvbh6kVWFn+uknsAp40D8RmDX9sCULbHVYM5cQ6huGOlT0VFwU2BheI6ZL7UagEdrxR
	yRpVQJ0ifyDAnxeQGbxbrjqOxS59DWYPM/XQA5NuIVMzsrl0bk+EC51D3xMs18t9/kst
	PFFQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20130820;
	h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
	:references;
	bh=MicTsBYhsY4Bj2h/S3oFSAgtd23qWv09QkrsGD7NECE=;
	b=djMeHpYOOGC91cTvNmQP7YJzKiIiW2tppPIReQrw/PQT04h1w4IvNBjKWA7xJ7TY60
	znxTeLjrej0EHchtPyhZAJGPZXpgHfOvJHVA7tPuP/WOltu8FyjohEic3aPBYOC6LczI
	/Oc3Jo/hdYz2TAYvozCsmGNxTq22wo7ZN1ScAuDa08XcxbZJl1Fg/50wSwlmx52ji9RD
	WwwJtKa/0lLV9xRWzboyjd5osi/BWPnCDUNL9jgm3xI3Lu5AM1xiOFxsim/nMzCbNehs
	QcfzX4tvRS0rMYXjNnF7W6L7XBSW0PSL0GgmCco/5vtsVQRwHysmEFFJabSzA7BDJJV5
	YYVQ==
X-Gm-Message-State: 
 AE9vXwPSycXChVH1q77mgP72gfjKx+rGirK5SStjfOdwzDbbv9ayQ0FK4HO7AMWsBpMyv0/M
X-Received: by 10.66.119.136 with SMTP id ku8mr5656843pab.6.1474569253397;
	Thu, 22 Sep 2016 11:34:13 -0700 (PDT)
Received: from isasi.mtv.corp.google.com ([172.27.82.89])
	by smtp.gmail.com with ESMTPSA id
	m82sm5728837pfk.64.2016.09.22.11.34.12
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
	Thu, 22 Sep 2016 11:34:12 -0700 (PDT)
From: Sasi Inguva <isasi-at-google.com@ffmpeg.org>
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 22 Sep 2016 11:33:37 -0700
Message-Id: <1474569217-14473-1-git-send-email-isasi@google.com>
X-Mailer: git-send-email 2.8.0.rc3.226.g39d4020
In-Reply-To: <20160922144923.187aa93a@debian>
References: <20160922144923.187aa93a@debian>
Subject: [FFmpeg-devel] [PATCH] lavf/mov.c: Make audio timestamps strictly
	monotonically increasing inside an edit list. Fixes gapless
	decoding.
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <http://ffmpeg.org/mailman/options/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <http://ffmpeg.org/pipermail/ffmpeg-devel/>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <http://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
Cc: Sasi Inguva <isasi@google.com>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Signed-off-by: Sasi Inguva <isasi@google.com>
---
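Note for reviewers (not part of the commit message): the back-fill done by
fix_index_entry_timestamps() can be sketched in isolation as below. This is a
minimal standalone illustration, not the patch code. IndexEntry,
backfill_timestamps() and backfill_demo() are hypothetical names standing in
for AVIndexEntry and the helper added to mov.c, and the two discarded
1024-sample packets are made-up input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for AVIndexEntry; only the timestamp matters here. */
typedef struct {
    int64_t timestamp;
} IndexEntry;

/*
 * Walk backwards from end_index and give each of the last `count` entries
 * the timestamp of its successor minus its own duration.
 * durations[k] is the duration of the k-th discarded packet, oldest first.
 */
void backfill_timestamps(IndexEntry *entries, int nb_entries,
                         int end_index, int64_t end_ts,
                         const int64_t *durations, int count)
{
    /* Unlike the sketch's real counterpart, also check the lower bound. */
    assert(count >= 0 && end_index - count >= 0 && end_index <= nb_entries);
    for (int i = 0; i < count; i++) {
        end_ts -= durations[count - 1 - i];
        entries[end_index - 1 - i].timestamp = end_ts;
    }
}

/* Two discarded 1024-sample packets ahead of an edit that starts at ts 0. */
void backfill_demo(int64_t out[2])
{
    IndexEntry entries[2] = { { 0 }, { 0 } };
    const int64_t durations[2] = { 1024, 1024 };

    backfill_timestamps(entries, 2, 2, 0, durations, 2);
    out[0] = entries[0].timestamp;
    out[1] = entries[1].timestamp;
}
```

With an edit that starts at timestamp 0, the two discarded packets end up at
-2048 and -1024, so timestamps keep increasing across the edit boundary
instead of repeating 0.
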
 libavcodec/utils.c                           | 16 +++---
 libavformat/mov.c                            | 81 ++++++++++++++++++++++++----
 tests/ref/fate/gaplessenc-itunes-to-ipod-aac |  2 +-
 tests/ref/fate/gaplessenc-pcm-to-mov-aac     |  2 +-
 4 files changed, 79 insertions(+), 22 deletions(-)

diff --git a/libavcodec/utils.c b/libavcodec/utils.c
index b0345b6..6323156 100644
--- a/libavcodec/utils.c
+++ b/libavcodec/utils.c
@@ -2320,7 +2320,6 @@ int attribute_align_arg avcodec_decode_audio4(AVCodecContext *avctx,
         uint32_t discard_padding = 0;
         uint8_t skip_reason = 0;
         uint8_t discard_reason = 0;
-        int demuxer_skip_samples = 0;
         // copy to ensure we do not change avpkt
         AVPacket tmp = *avpkt;
         int did_split = av_packet_split_side_data(&tmp);
@@ -2328,7 +2327,6 @@ int attribute_align_arg avcodec_decode_audio4(AVCodecContext *avctx,
         if (ret < 0)
             goto fail;
 
-        demuxer_skip_samples = avctx->internal->skip_samples;
         avctx->internal->pkt = &tmp;
         if (HAVE_THREADS && avctx->active_thread_type & FF_THREAD_FRAME)
             ret = ff_thread_decode_frame(avctx, frame, got_frame_ptr, &tmp);
@@ -2353,13 +2351,6 @@ int attribute_align_arg avcodec_decode_audio4(AVCodecContext *avctx,
                 frame->sample_rate = avctx->sample_rate;
         }
 
-
-        if (frame->flags & AV_FRAME_FLAG_DISCARD) {
-            // If using discard frame flag, ignore skip_samples set by the decoder.
-            avctx->internal->skip_samples = demuxer_skip_samples;
-            *got_frame_ptr = 0;
-        }
-
         side= av_packet_get_side_data(avctx->internal->pkt, AV_PKT_DATA_SKIP_SAMPLES, &side_size);
         if(side && side_size>=10) {
             avctx->internal->skip_samples = AV_RL32(side);
@@ -2369,6 +2360,13 @@ int attribute_align_arg avcodec_decode_audio4(AVCodecContext *avctx,
             skip_reason = AV_RL8(side + 8);
             discard_reason = AV_RL8(side + 9);
         }
+
+        if ((frame->flags & AV_FRAME_FLAG_DISCARD) && *got_frame_ptr &&
+            !(avctx->flags2 & AV_CODEC_FLAG2_SKIP_MANUAL)) {
+            avctx->internal->skip_samples -= frame->nb_samples;
+            *got_frame_ptr = 0;
+        }
+
         if (avctx->internal->skip_samples > 0 && *got_frame_ptr &&
             !(avctx->flags2 & AV_CODEC_FLAG2_SKIP_MANUAL)) {
             if(frame->nb_samples <= avctx->internal->skip_samples){
diff --git a/libavformat/mov.c b/libavformat/mov.c
index b84d9c0..bb86780 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -2856,6 +2856,21 @@ static int64_t add_index_entry(AVStream *st, int64_t pos, int64_t timestamp,
 }
 
 /**
+ * Rewrite the timestamps of the index entries in [end_index - frame_duration_buffer_size, end_index),
+ * walking backwards from end_ts and subtracting each entry's duration from frame_duration_buffer in turn.
+ */
+static void fix_index_entry_timestamps(AVStream* st, int end_index, int64_t end_ts,
+                                       int64_t* frame_duration_buffer,
+                                       int frame_duration_buffer_size) {
+    int i = 0;
+    av_assert0(end_index >= 0 && end_index <= st->nb_index_entries);
+    for (i = 0; i < frame_duration_buffer_size; i++) {
+        end_ts -= frame_duration_buffer[frame_duration_buffer_size - 1 - i];
+        st->index_entries[end_index - 1 - i].timestamp = end_ts;
+    }
+}
+
+/**
  * Append a new ctts entry to ctts_data.
  * Returns the new ctts_count if successful, else returns -1.
  */
@@ -2919,7 +2934,10 @@ static void mov_fix_index(MOVContext *mov, AVStream *st)
     int64_t edit_list_media_time_dts = 0;
     int64_t edit_list_start_encountered = 0;
     int64_t search_timestamp = 0;
-
+    int64_t* frame_duration_buffer = NULL;
+    int num_discarded_begin = 0;
+    int first_non_zero_audio_edit = -1;
+    int packet_skip_samples = 0;
 
     if (!msc->elst_data || msc->elst_count <= 0) {
         return;
@@ -2955,6 +2973,7 @@ static void mov_fix_index(MOVContext *mov, AVStream *st)
         edit_list_index++;
         edit_list_dts_counter = edit_list_dts_entry_end;
         edit_list_dts_entry_end += edit_list_duration;
+        num_discarded_begin = 0;
         if (edit_list_media_time == -1) {
             continue;
         }
@@ -2962,7 +2981,14 @@ static void mov_fix_index(MOVContext *mov, AVStream *st)
         // If we encounter a non-negative edit list reset the skip_samples/start_pad fields and set them
         // according to the edit list below.
         if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
-            st->skip_samples = msc->start_pad = 0;
+            if (first_non_zero_audio_edit < 0) {
+                first_non_zero_audio_edit = 1;
+            } else {
+                first_non_zero_audio_edit = 0;
+            }
+
+            if (first_non_zero_audio_edit > 0)
+                st->skip_samples = msc->start_pad = 0;
         }
 
         //find closest previous key frame
@@ -3041,24 +3067,57 @@ static void mov_fix_index(MOVContext *mov, AVStream *st)
             }
 
             if (curr_cts < edit_list_media_time || curr_cts >= (edit_list_duration + edit_list_media_time)) {
-                if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO && curr_cts < edit_list_media_time &&
-                    curr_cts + frame_duration > edit_list_media_time &&
-                    st->skip_samples == 0 && msc->start_pad == 0) {
-                    st->skip_samples = msc->start_pad = edit_list_media_time - curr_cts;
-
-                    // Shift the index entry timestamp by skip_samples to be correct.
-                    edit_list_dts_counter -= st->skip_samples;
+                if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO && st->codecpar->codec_id != AV_CODEC_ID_VORBIS &&
+                    curr_cts < edit_list_media_time && curr_cts + frame_duration > edit_list_media_time &&
+                    first_non_zero_audio_edit > 0) {
+                    packet_skip_samples = edit_list_media_time - curr_cts;
+                    st->skip_samples += packet_skip_samples;
+
+                    // Shift the index entry timestamp by packet_skip_samples to be correct.
+                    edit_list_dts_counter -= packet_skip_samples;
                     if (edit_list_start_encountered == 0)  {
-                      edit_list_start_encountered = 1;
+                        edit_list_start_encountered = 1;
+                        // Make timestamps strictly monotonically increasing for audio, by rewriting timestamps for
+                        // discarded packets.
+                        if (frame_duration_buffer) {
+                            fix_index_entry_timestamps(st, st->nb_index_entries, edit_list_dts_counter,
+                                                       frame_duration_buffer, num_discarded_begin);
+                            av_freep(&frame_duration_buffer);
+                        }
                     }
 
-                    av_log(mov->fc, AV_LOG_DEBUG, "skip %d audio samples from curr_cts: %"PRId64"\n", st->skip_samples, curr_cts);
+                    av_log(mov->fc, AV_LOG_DEBUG, "skip %d audio samples from curr_cts: %"PRId64"\n", packet_skip_samples, curr_cts);
                 } else {
                     flags |= AVINDEX_DISCARD_FRAME;
                     av_log(mov->fc, AV_LOG_DEBUG, "drop a frame at curr_cts: %"PRId64" @ %"PRId64"\n", curr_cts, index);
+
+                    if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO && edit_list_start_encountered == 0) {
+                        num_discarded_begin++;
+                        if (av_reallocp(&frame_duration_buffer,
+                                        num_discarded_begin * sizeof(int64_t)) < 0) {
+                            // av_reallocp() frees the old buffer and NULLs the pointer on failure.
+                            av_log(mov->fc, AV_LOG_ERROR, "Cannot reallocate frame duration buffer\n");
+                            break;
+                        }
+                        frame_duration_buffer[num_discarded_begin - 1] = frame_duration;
+
+                        // Increment skip_samples for the first non-zero audio edit list
+                        if (first_non_zero_audio_edit > 0 && st->codecpar->codec_id != AV_CODEC_ID_VORBIS) {
+                            st->skip_samples += frame_duration;
+                            msc->start_pad = st->skip_samples;
+                        }
+                    }
                 }
             } else if (edit_list_start_encountered == 0) {
                 edit_list_start_encountered = 1;
+                // Make timestamps strictly monotonically increasing for audio, by rewriting timestamps for
+                // discarded packets.
+                if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO && frame_duration_buffer) {
+                    fix_index_entry_timestamps(st, st->nb_index_entries, edit_list_dts_counter,
+                                               frame_duration_buffer, num_discarded_begin);
+                    av_freep(&frame_duration_buffer);
+                }
+
             }
 
             if (add_index_entry(st, current->pos, edit_list_dts_counter, current->size,
diff --git a/tests/ref/fate/gaplessenc-itunes-to-ipod-aac b/tests/ref/fate/gaplessenc-itunes-to-ipod-aac
index 043c085..789681f 100644
--- a/tests/ref/fate/gaplessenc-itunes-to-ipod-aac
+++ b/tests/ref/fate/gaplessenc-itunes-to-ipod-aac
@@ -7,7 +7,7 @@ duration_ts=103326
 start_time=0.000000
 duration=2.367000
 [/FORMAT]
-packet|pts=0|dts=0|duration=N/A
+packet|pts=-1024|dts=-1024|duration=1024
 packet|pts=0|dts=0|duration=1024
 packet|pts=1024|dts=1024|duration=1024
 packet|pts=2048|dts=2048|duration=1024
diff --git a/tests/ref/fate/gaplessenc-pcm-to-mov-aac b/tests/ref/fate/gaplessenc-pcm-to-mov-aac
index 8b7e3f6..8702611 100644
--- a/tests/ref/fate/gaplessenc-pcm-to-mov-aac
+++ b/tests/ref/fate/gaplessenc-pcm-to-mov-aac
@@ -7,7 +7,7 @@ duration_ts=529200
 start_time=0.000000
 duration=12.024000
 [/FORMAT]
-packet|pts=0|dts=0|duration=N/A
+packet|pts=-1024|dts=-1024|duration=1024
 packet|pts=0|dts=0|duration=1024
 packet|pts=1024|dts=1024|duration=1024
 packet|pts=2048|dts=2048|duration=1024