diff mbox

[FFmpeg-devel] libavcodec/videotoolbox: fix decoding of h264 streams with minor SPS changes

Message ID 20171115211506.78836-1-ffmpeg@tmm1.net
State Accepted
Commit 259dc4e01381c0d01fb1dbb8509c5087d621a0d7
Headers show

Commit Message

Aman Gupta Nov. 15, 2017, 9:15 p.m. UTC
From: Aman Gupta <aman@tmm1.net>

Previously the codec kept an entire copy of the SPS, and restarted the VT decoder
session whenever it changed. This fixed decoding errors in [1], as
described in 9519983c. On further inspection, that sample features an SPS change
from High/4.0 to High/3.2 while moving from one scene to another.

Yesterday I received [2], which contains minor SPS changes where the
profile and level do not change. These occur frequently and are not associated with
scene changes. After 9519983c, the VT decoder session is recreated unnecessarily when
these are encountered causing visual glitches.

This commit simplifies the state kept in the VTContext to include just the first three
bytes of the SPS, containing the profile and level details. This is populated initially
when the VT decoder session is created, and used to detect changes and force a restart.

This means minor SPS changes are fed directly into the existing decoder, whereas
profile/level changes force the decoder session to be recreated with the new parameters.

After this commit, both samples [1] and [2] playback as expected.

[1] https://s3.amazonaws.com/tmm1/videotoolbox/spschange.ts
[2] https://s3.amazonaws.com/tmm1/videotoolbox/spschange2.ts

Signed-off-by: Aman Gupta <aman@tmm1.net>
---
 libavcodec/videotoolbox.c | 15 ++++++++-------
 libavcodec/vt_internal.h  |  4 +---
 2 files changed, 9 insertions(+), 10 deletions(-)

Comments

Hendrik Leppkes Nov. 15, 2017, 9:57 p.m. UTC | #1
On Wed, Nov 15, 2017 at 10:15 PM, Aman Gupta <ffmpeg@tmm1.net> wrote:
> From: Aman Gupta <aman@tmm1.net>
>
> Previously the codec kept an entire copy of the SPS, and restarted the VT decoder
> session whenever it changed. This fixed decoding errors in [1], as
> described in 9519983c. On further inspection, that sample features an SPS change
> from High/4.0 to High/3.2 while moving from one scene to another.
>
> Yesterday I received [2], which contains minor SPS changes where the
> profile and level do not change. These occur frequently and are not associated with
> scene changes. After 9519983c, the VT decoder session is recreated unnecessarily when
> these are encountered causing visual glitches.
>
> This commit simplifies the state kept in the VTContext to include just the first three
> bytes of the SPS, containing the profile and level details. This is populated initially
> when the VT decoder session is created, and used to detect changes and force a restart.
>
> This means minor SPS changes are fed directly into the existing decoder, whereas
> profile/level changes force the decoder session to be recreated with the new parameters.
>

The profile and level are not exactly the only things that can change
to force a decoder to be re-created.
How about the frame dimensions, within the same level?

- Hendrik
Aman Gupta Nov. 16, 2017, 1:39 a.m. UTC | #2
On Wed, Nov 15, 2017 at 1:57 PM, Hendrik Leppkes <h.leppkes@gmail.com>
wrote:

> On Wed, Nov 15, 2017 at 10:15 PM, Aman Gupta <ffmpeg@tmm1.net> wrote:
> > From: Aman Gupta <aman@tmm1.net>
> >
> > Previously the codec kept an entire copy of the SPS, and restarted the
> VT decoder
> > session whenever it changed. This fixed decoding errors in [1], as
> > described in 9519983c. On further inspection, that sample features an
> SPS change
> > from High/4.0 to High/3.2 while moving from one scene to another.
> >
> > Yesterday I received [2], which contains minor SPS changes where the
> > profile and level do not change. These occur frequently and are not
> associated with
> > scene changes. After 9519983c, the VT decoder session is recreated
> unnecessarily when
> > these are encountered causing visual glitches.
> >
> > This commit simplifies the state kept in the VTContext to include just
> the first three
> > bytes of the SPS, containing the profile and level details. This is
> populated initially
> > when the VT decoder session is created, and used to detect changes and
> force a restart.
> >
> > This means minor SPS changes are fed directly into the existing decoder,
> whereas
> > profile/level changes force the decoder session to be recreated with the
> new parameters.
> >
>
> The profile and level are not exactly the only things that can change
> to force a decoder to be re-created.
> How about the frame dimensions, within the same level?
>

If you know somewhere I can find such a sample, I can try it with
VideoToolbox to see if it requires the session to be recreated or not.

Aman


>
> - Hendrik
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
Aman Gupta Nov. 17, 2017, 10:44 p.m. UTC | #3
On Wed, Nov 15, 2017 at 1:57 PM, Hendrik Leppkes <h.leppkes@gmail.com>
wrote:

> On Wed, Nov 15, 2017 at 10:15 PM, Aman Gupta <ffmpeg@tmm1.net> wrote:
> > From: Aman Gupta <aman@tmm1.net>
> >
> > Previously the codec kept an entire copy of the SPS, and restarted the
> VT decoder
> > session whenever it changed. This fixed decoding errors in [1], as
> > described in 9519983c. On further inspection, that sample features an
> SPS change
> > from High/4.0 to High/3.2 while moving from one scene to another.
> >
> > Yesterday I received [2], which contains minor SPS changes where the
> > profile and level do not change. These occur frequently and are not
> associated with
> > scene changes. After 9519983c, the VT decoder session is recreated
> unnecessarily when
> > these are encountered causing visual glitches.
> >
> > This commit simplifies the state kept in the VTContext to include just
> the first three
> > bytes of the SPS, containing the profile and level details. This is
> populated initially
> > when the VT decoder session is created, and used to detect changes and
> force a restart.
> >
> > This means minor SPS changes are fed directly into the existing decoder,
> whereas
> > profile/level changes force the decoder session to be recreated with the
> new parameters.
> >
>
> The profile and level are not exactly the only things that can change
> to force a decoder to be re-created.
> How about the frame dimensions, within the same level?
>

I compared the different SPS present in the spschange2.ts sample, and here
is a diff:

 ======= SPS =======
  profile_idc : 100
  constraint_set0_flag : 0
  constraint_set1_flag : 0
  constraint_set2_flag : 0
  constraint_set3_flag : 0
  constraint_set4_flag : 0
  constraint_set5_flag : 0
  reserved_zero_2bits : 0
  level_idc : 40
  seq_parameter_set_id : 0
  chroma_format_idc : 1
  residual_colour_transform_flag : 0
  bit_depth_luma_minus8 : 0
  bit_depth_chroma_minus8 : 0
  qpprime_y_zero_transform_bypass_flag : 0
  seq_scaling_matrix_present_flag : 1
  log2_max_frame_num_minus4 : 0
- pic_order_cnt_type : 1
+ pic_order_cnt_type : 0
-   log2_max_pic_order_cnt_lsb_minus4 : 0
+   log2_max_pic_order_cnt_lsb_minus4 : 3
    delta_pic_order_always_zero_flag : 0
    offset_for_non_ref_pic : 0
-   offset_for_top_to_bottom_field : 7
+   offset_for_top_to_bottom_field : 0
-   num_ref_frames_in_pic_order_cnt_cycle : 7
+   num_ref_frames_in_pic_order_cnt_cycle : 0
- num_ref_frames : 3
+ num_ref_frames : 0
- gaps_in_frame_num_value_allowed_flag : 0
+ gaps_in_frame_num_value_allowed_flag : 1
- pic_width_in_mbs_minus1 : 13
+ pic_width_in_mbs_minus1 : 81
- pic_height_in_map_units_minus1 : 2
+ pic_height_in_map_units_minus1 : 38
  frame_mbs_only_flag : 1
  mb_adaptive_frame_field_flag : 0
  direct_8x8_inference_flag : 0
  frame_cropping_flag : 0
    frame_crop_left_offset : 0
    frame_crop_right_offset : 0
    frame_crop_top_offset : 0
    frame_crop_bottom_offset : 0
- vui_parameters_present_flag : 1
+ vui_parameters_present_flag : 0

Interestingly, the pic_height/pic_width do in fact change already in that
sample. But the correct thing to do, as far as decoding with VideoToolbox,
is to keep the same decompression session instance and pass the new SPS
NALU into the decoder along with the image slices.

Aman


>
> - Hendrik
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
Hendrik Leppkes Nov. 17, 2017, 11:13 p.m. UTC | #4
On Fri, Nov 17, 2017 at 11:44 PM, Aman Gupta <ffmpeg@tmm1.net> wrote:
> On Wed, Nov 15, 2017 at 1:57 PM, Hendrik Leppkes <h.leppkes@gmail.com>
> wrote:
>
>> On Wed, Nov 15, 2017 at 10:15 PM, Aman Gupta <ffmpeg@tmm1.net> wrote:
>> > From: Aman Gupta <aman@tmm1.net>
>> >
>> > Previously the codec kept an entire copy of the SPS, and restarted the
>> VT decoder
>> > session whenever it changed. This fixed decoding errors in [1], as
>> > described in 9519983c. On further inspection, that sample features an
>> SPS change
>> > from High/4.0 to High/3.2 while moving from one scene to another.
>> >
>> > Yesterday I received [2], which contains minor SPS changes where the
>> > profile and level do not change. These occur frequently and are not
>> associated with
>> > scene changes. After 9519983c, the VT decoder session is recreated
>> unnecessarily when
>> > these are encountered causing visual glitches.
>> >
>> > This commit simplifies the state kept in the VTContext to include just
>> the first three
>> > bytes of the SPS, containing the profile and level details. This is
>> populated initially
>> > when the VT decoder session is created, and used to detect changes and
>> force a restart.
>> >
>> > This means minor SPS changes are fed directly into the existing decoder,
>> whereas
>> > profile/level changes force the decoder session to be recreated with the
>> new parameters.
>> >
>>
>> The profile and level are not exactly the only things that can change
>> to force a decoder to be re-created.
>> How about the frame dimensions, within the same level?
>>
>
> I compared the different SPS present in the spschange2.ts sample, and here
> is a diff:
>
>  ======= SPS =======
>   profile_idc : 100
>   constraint_set0_flag : 0
>   constraint_set1_flag : 0
>   constraint_set2_flag : 0
>   constraint_set3_flag : 0
>   constraint_set4_flag : 0
>   constraint_set5_flag : 0
>   reserved_zero_2bits : 0
>   level_idc : 40
>   seq_parameter_set_id : 0
>   chroma_format_idc : 1
>   residual_colour_transform_flag : 0
>   bit_depth_luma_minus8 : 0
>   bit_depth_chroma_minus8 : 0
>   qpprime_y_zero_transform_bypass_flag : 0
>   seq_scaling_matrix_present_flag : 1
>   log2_max_frame_num_minus4 : 0
> - pic_order_cnt_type : 1
> + pic_order_cnt_type : 0
> -   log2_max_pic_order_cnt_lsb_minus4 : 0
> +   log2_max_pic_order_cnt_lsb_minus4 : 3
>     delta_pic_order_always_zero_flag : 0
>     offset_for_non_ref_pic : 0
> -   offset_for_top_to_bottom_field : 7
> +   offset_for_top_to_bottom_field : 0
> -   num_ref_frames_in_pic_order_cnt_cycle : 7
> +   num_ref_frames_in_pic_order_cnt_cycle : 0
> - num_ref_frames : 3
> + num_ref_frames : 0
> - gaps_in_frame_num_value_allowed_flag : 0
> + gaps_in_frame_num_value_allowed_flag : 1
> - pic_width_in_mbs_minus1 : 13
> + pic_width_in_mbs_minus1 : 81
> - pic_height_in_map_units_minus1 : 2
> + pic_height_in_map_units_minus1 : 38
>   frame_mbs_only_flag : 1
>   mb_adaptive_frame_field_flag : 0
>   direct_8x8_inference_flag : 0
>   frame_cropping_flag : 0
>     frame_crop_left_offset : 0
>     frame_crop_right_offset : 0
>     frame_crop_top_offset : 0
>     frame_crop_bottom_offset : 0
> - vui_parameters_present_flag : 1
> + vui_parameters_present_flag : 0
>
> Interestingly, the pic_height/pic_width do in fact change already in that
> sample. But the correct thing to do, as far as decoding with VideoToolbox,
> is to keep the same decompression session instance and pass the new SPS
> NALU into the decoder along with the image slices.
>

Does it actually properly output images of the new size in that case?

All I'm saying is that profile and level may not exactly be the
parameters that actually require a re-creation. Profile maybe, but
level unlikely. And there might as well be others.
Is this not documented?

- Hendrik
Aman Gupta Nov. 17, 2017, 11:32 p.m. UTC | #5
On Fri, Nov 17, 2017 at 3:14 PM Hendrik Leppkes <h.leppkes@gmail.com> wrote:

> On Fri, Nov 17, 2017 at 11:44 PM, Aman Gupta <ffmpeg@tmm1.net> wrote:
> > On Wed, Nov 15, 2017 at 1:57 PM, Hendrik Leppkes <h.leppkes@gmail.com>
> > wrote:
> >
> >> On Wed, Nov 15, 2017 at 10:15 PM, Aman Gupta <ffmpeg@tmm1.net> wrote:
> >> > From: Aman Gupta <aman@tmm1.net>
> >> >
> >> > Previously the codec kept an entire copy of the SPS, and restarted the
> >> VT decoder
> >> > session whenever it changed. This fixed decoding errors in [1], as
> >> > described in 9519983c. On further inspection, that sample features an
> >> SPS change
> >> > from High/4.0 to High/3.2 while moving from one scene to another.
> >> >
> >> > Yesterday I received [2], which contains minor SPS changes where the
> >> > profile and level do not change. These occur frequently and are not
> >> associated with
> >> > scene changes. After 9519983c, the VT decoder session is recreated
> >> unnecessarily when
> >> > these are encountered causing visual glitches.
> >> >
> >> > This commit simplifies the state kept in the VTContext to include just
> >> the first three
> >> > bytes of the SPS, containing the profile and level details. This is
> >> populated initially
> >> > when the VT decoder session is created, and used to detect changes and
> >> force a restart.
> >> >
> >> > This means minor SPS changes are fed directly into the existing
> decoder,
> >> whereas
> >> > profile/level changes force the decoder session to be recreated with
> the
> >> new parameters.
> >> >
> >>
> >> The profile and level are not exactly the only things that can change
> >> to force a decoder to be re-created.
> >> How about the frame dimensions, within the same level?
> >>
> >
> > I compared the different SPS present in the spschange2.ts sample, and
> here
> > is a diff:
> >
> >  ======= SPS =======
> >   profile_idc : 100
> >   constraint_set0_flag : 0
> >   constraint_set1_flag : 0
> >   constraint_set2_flag : 0
> >   constraint_set3_flag : 0
> >   constraint_set4_flag : 0
> >   constraint_set5_flag : 0
> >   reserved_zero_2bits : 0
> >   level_idc : 40
> >   seq_parameter_set_id : 0
> >   chroma_format_idc : 1
> >   residual_colour_transform_flag : 0
> >   bit_depth_luma_minus8 : 0
> >   bit_depth_chroma_minus8 : 0
> >   qpprime_y_zero_transform_bypass_flag : 0
> >   seq_scaling_matrix_present_flag : 1
> >   log2_max_frame_num_minus4 : 0
> > - pic_order_cnt_type : 1
> > + pic_order_cnt_type : 0
> > -   log2_max_pic_order_cnt_lsb_minus4 : 0
> > +   log2_max_pic_order_cnt_lsb_minus4 : 3
> >     delta_pic_order_always_zero_flag : 0
> >     offset_for_non_ref_pic : 0
> > -   offset_for_top_to_bottom_field : 7
> > +   offset_for_top_to_bottom_field : 0
> > -   num_ref_frames_in_pic_order_cnt_cycle : 7
> > +   num_ref_frames_in_pic_order_cnt_cycle : 0
> > - num_ref_frames : 3
> > + num_ref_frames : 0
> > - gaps_in_frame_num_value_allowed_flag : 0
> > + gaps_in_frame_num_value_allowed_flag : 1
> > - pic_width_in_mbs_minus1 : 13
> > + pic_width_in_mbs_minus1 : 81
> > - pic_height_in_map_units_minus1 : 2
> > + pic_height_in_map_units_minus1 : 38
> >   frame_mbs_only_flag : 1
> >   mb_adaptive_frame_field_flag : 0
> >   direct_8x8_inference_flag : 0
> >   frame_cropping_flag : 0
> >     frame_crop_left_offset : 0
> >     frame_crop_right_offset : 0
> >     frame_crop_top_offset : 0
> >     frame_crop_bottom_offset : 0
> > - vui_parameters_present_flag : 1
> > + vui_parameters_present_flag : 0
> >
> > Interestingly, the pic_height/pic_width do in fact change already in that
> > sample. But the correct thing to do, as far as decoding with
> VideoToolbox,
> > is to keep the same decompression session instance and pass the new SPS
> > NALU into the decoder along with the image slices.
> >
>
> Does it actually properly output images of the new size in that case?
>
> All I'm saying is that profile and level may not exactly be the
> parameters that actually require a re-creation. Profile maybe, but
> level unlikely. And there might as well be others.


> Is this not documented?


I understand what you're saying, but there is no documentation about this
from Apple so I can only deduce what is required empirically.

I already confirmed that level changes require a decoder restart, as seen
in the first sample (which goes from level 40 to 32). Without a restart the
VT decoder stalls and fails to produce any new frames.

Aman


>
> - Hendrik
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
diff mbox

Patch

diff --git a/libavcodec/videotoolbox.c b/libavcodec/videotoolbox.c
index 9eeada30ba..d29607363c 100644
--- a/libavcodec/videotoolbox.c
+++ b/libavcodec/videotoolbox.c
@@ -90,6 +90,7 @@  int ff_videotoolbox_alloc_frame(AVCodecContext *avctx, AVFrame *frame)
 
 CFDataRef ff_videotoolbox_avcc_extradata_create(AVCodecContext *avctx)
 {
+    VTContext *vtctx = avctx->internal->hwaccel_priv_data;
     H264Context *h = avctx->priv_data;
     CFDataRef data = NULL;
     uint8_t *p;
@@ -116,6 +117,10 @@  CFDataRef ff_videotoolbox_avcc_extradata_create(AVCodecContext *avctx)
     p += 3 + h->ps.pps->data_size;
     av_assert0(p - vt_extradata == vt_extradata_size);
 
+    // save sps header (profile/level) used to create decoder session,
+    // so we can detect changes and recreate it.
+    memcpy(vtctx->sps, h->ps.sps->data + 1, 3);
+
     data = CFDataCreate(kCFAllocatorDefault, vt_extradata, vt_extradata_size);
     av_free(vt_extradata);
     return data;
@@ -320,16 +325,13 @@  static int videotoolbox_h264_decode_params(AVCodecContext *avctx,
     VTContext *vtctx = avctx->internal->hwaccel_priv_data;
 
     if (type == H264_NAL_SPS) {
-        if (!vtctx->sps || vtctx->sps_len != size || memcmp(buffer, vtctx->sps, size) != 0) {
-            vtctx->sps = av_fast_realloc(vtctx->sps, &vtctx->sps_capa, size);
-            if (vtctx->sps)
-                memcpy(vtctx->sps, buffer, size);
+        if (size > 4 && memcmp(vtctx->sps, buffer + 1, 3) != 0) {
             vtctx->reconfig_needed = true;
-            vtctx->sps_len = size;
+            memcpy(vtctx->sps, buffer + 1, 3);
         }
     }
 
-    // pass-through new PPS to the decoder
+    // pass-through SPS/PPS changes to the decoder
     return ff_videotoolbox_h264_decode_slice(avctx, buffer, size);
 }
 
@@ -365,7 +367,6 @@  int ff_videotoolbox_uninit(AVCodecContext *avctx)
     VTContext *vtctx = avctx->internal->hwaccel_priv_data;
     if (vtctx) {
         av_freep(&vtctx->bitstream);
-        av_freep(&vtctx->sps);
         if (vtctx->frame)
             CVPixelBufferRelease(vtctx->frame);
     }
diff --git a/libavcodec/vt_internal.h b/libavcodec/vt_internal.h
index fc27dad9f9..929fe423fc 100644
--- a/libavcodec/vt_internal.h
+++ b/libavcodec/vt_internal.h
@@ -40,9 +40,7 @@  typedef struct VTContext {
     struct AVVideotoolboxContext *vt_ctx;
 
     // Current H264 parameters (used to trigger decoder restart on SPS changes).
-    uint8_t                     *sps;
-    uint32_t                    sps_len;
-    unsigned int                sps_capa;
+    uint8_t                     sps[3];
     bool                        reconfig_needed;
 } VTContext;