[FFmpeg-devel,v2,2/2] ffmpeg: add option -isync

Message ID	20220625082951.11897-2-ffmpeg@gyani.pro
State	New
Headers	From: Gyan Doshi <ffmpeg@gyani.pro> To: ffmpeg-devel@ffmpeg.org Date: Sat, 25 Jun 2022 13:59:51 +0530 Message-Id: <20220625082951.11897-2-ffmpeg@gyani.pro> In-Reply-To: <20220625082951.11897-1-ffmpeg@gyani.pro> Subject: [FFmpeg-devel] [PATCH v2 2/2] ffmpeg: add option -isync
Series	[FFmpeg-devel,v2,1/2] avformat: add AVFormatContext.first_pkt_wallclock; [FFmpeg-devel,v2,2/2] ffmpeg: add option -isync

Context	Check	Description
yinshiyou/make_loongarch64	success	Make finished
yinshiyou/make_fate_loongarch64	success	Make fate finished
andriy/make_x86	success	Make finished
andriy/make_fate_x86	success	Make fate finished
andriy/make_armv7_RPi4	success	Make finished
andriy/make_fate_armv7_RPi4	success	Make fate finished

Gyan Doshi June 25, 2022, 8:29 a.m. UTC

This is a per-file input option that adjusts an input's timestamps
with reference to another input, so that emitted packet timestamps
account for the difference between the start times of the two inputs.

A typical use case is syncing two or more live inputs, such as those from
capture devices. The source timestamps of both the target and the reference
input should derive from the same clock.

If either input lacks starting timestamps, the wallclock time at which each
input's first packet was received is used instead.
---
 doc/ffmpeg.texi      | 16 ++++++++++++
 fftools/ffmpeg.h     |  2 ++
 fftools/ffmpeg_opt.c | 59 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+)
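As a rough illustration of the behavior described above, the following C sketch mirrors the start-time fallback and offset arithmetic of apply_sync_offsets() in the patch. It is a simplified model, not the patch itself: the InputTimes struct, the NOPTS_VALUE stand-in for AV_NOPTS_VALUE and the sync_adjustment() helper are hypothetical, and all values are assumed to be in microseconds.

```c
#include <stdint.h>

/* Stand-in for libavutil's AV_NOPTS_VALUE, which has the same value as INT64_MIN. */
#define NOPTS_VALUE INT64_MIN

/* Hypothetical, simplified view of the per-input values the patch reads
 * (from AVFormatContext and InputFile). All values are in microseconds. */
typedef struct {
    int64_t start_time_realtime; /* realtime start reported by the demuxer, or NOPTS_VALUE */
    int64_t start_time;          /* first timestamp of the input, or NOPTS_VALUE */
    int64_t first_pkt_wallclock; /* wallclock time when the first packet arrived */
    int64_t seek_start;          /* -ss value, or NOPTS_VALUE if not given */
    int64_t input_ts_offset;     /* -itsoffset value */
} InputTimes;

/* Returns the amount added to the target's ts_offset. Uses the same fallback
 * order as the patch: realtime start, then start_time, then the first-packet
 * wallclock (reported through *used_wallclock). */
static int64_t sync_adjustment(const InputTimes *self, const InputTimes *ref,
                               int copy_ts, int *used_wallclock)
{
    int64_t self_start, ref_start;

    *used_wallclock = 0;
    if (self->start_time_realtime != NOPTS_VALUE &&
        ref->start_time_realtime  != NOPTS_VALUE) {
        self_start = self->start_time_realtime;
        ref_start  = ref->start_time_realtime;
    } else if (self->start_time != NOPTS_VALUE && ref->start_time != NOPTS_VALUE) {
        self_start = self->start_time;
        ref_start  = ref->start_time;
    } else {
        self_start = self->first_pkt_wallclock;
        ref_start  = ref->first_pkt_wallclock;
        *used_wallclock = 1;
    }

    /* -ss seek points only contribute when timestamps are normalized (no -copyts). */
    int64_t self_seek = self->seek_start == NOPTS_VALUE ? 0 : self->seek_start;
    int64_t ref_seek  = ref->seek_start  == NOPTS_VALUE ? 0 : ref->seek_start;

    return (self_start - ref_start) + !copy_ts * (self_seek - ref_seek)
           + ref->input_ts_offset;
}
```

With no -ss or -itsoffset and a target whose start_time is 2.5 s later than the reference's, sync_adjustment() returns 2500000, i.e. the target's timestamps are shifted 2.5 s later.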

Gyan Doshi June 27, 2022, 1:25 p.m. UTC | #1

Ping for the set.

On 2022-06-25 01:59 pm, Gyan Doshi wrote:
> diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
> index d943f4d6f5..8fc85d3a15 100644
> --- a/doc/ffmpeg.texi
> +++ b/doc/ffmpeg.texi
> @@ -518,6 +518,22 @@ see @ref{time duration syntax,,the Time duration section in the ffmpeg-utils(1)
>   Like the @code{-ss} option but relative to the "end of file". That is negative
>   values are earlier in the file, 0 is at EOF.
>   
> +@item -isync @var{input_index} (@emph{input})
> +Assign an input as a sync source.
> +
> +This will take the difference between the start times of the target and referenced inputs and
> +offset the timestamps of the target file by that difference. The source timestamps of the two
> +inputs should derive from the same clock source for expected results. If @code{copyts} is set
> +then @code{start_at_zero} must also be set. If at least one of the inputs has no starting
> +timestamp then the wallclock time at time of reception of the inputs is used as a best-effort
> +sync basis.
> +
> +Acceptable values are those that refer to a valid ffmpeg input index. If the sync reference is
> +the target index itself or @var{-1}, then no adjustment is made to target timestamps. A sync
> +reference may not itself be synced to any other input.
> +
> +Default value is @var{-1}.
> +
>   @item -itsoffset @var{offset} (@emph{input})
>   Set the input time offset.
>   
> diff --git a/fftools/ffmpeg.h b/fftools/ffmpeg.h
> index 69a368b8d1..dc74de6684 100644
> --- a/fftools/ffmpeg.h
> +++ b/fftools/ffmpeg.h
> @@ -118,6 +118,7 @@ typedef struct OptionsContext {
>       float readrate;
>       int accurate_seek;
>       int thread_queue_size;
> +    int input_sync_ref;
>   
>       SpecifierOpt *ts_scale;
>       int        nb_ts_scale;
> @@ -410,6 +411,7 @@ typedef struct InputFile {
>                                at the moment when looping happens */
>       AVRational time_base; /* time base of the duration */
>       int64_t input_ts_offset;
> +    int input_sync_ref;
>   
>       int64_t ts_offset;
>       int64_t last_ts;
> diff --git a/fftools/ffmpeg_opt.c b/fftools/ffmpeg_opt.c
> index e08455478f..de858efbe9 100644
> --- a/fftools/ffmpeg_opt.c
> +++ b/fftools/ffmpeg_opt.c
> @@ -235,6 +235,7 @@ static void init_options(OptionsContext *o)
>       o->chapters_input_file = INT_MAX;
>       o->accurate_seek  = 1;
>       o->thread_queue_size = -1;
> +    o->input_sync_ref = -1;
>   }
>   
>   static int show_hwaccels(void *optctx, const char *opt, const char *arg)
> @@ -287,6 +288,58 @@ static int parse_and_set_vsync(const char *arg, int *vsync_var, int file_idx, in
>       return 0;
>   }
>   
> +static int apply_sync_offsets(void)
> +{
> +    for (int i = 0; i < nb_input_files; i++) {
> +        InputFile *ref, *self = input_files[i];
> +        int64_t adjustment;
> +        int64_t self_start_time, ref_start_time, self_seek_start, ref_seek_start;
> +        int sync_fpw = 0;
> +
> +        if (self->input_sync_ref == -1 || self->input_sync_ref == i) continue;
> +        if (self->input_sync_ref >= nb_input_files || self->input_sync_ref < -1) {
> +            av_log(NULL, AV_LOG_FATAL, "-isync for input %d references non-existent input %d.\n", i, self->input_sync_ref);
> +            exit_program(1);
> +        }
> +
> +        if (copy_ts && !start_at_zero) {
> +            av_log(NULL, AV_LOG_FATAL, "Use of -isync requires that start_at_zero be set if copyts is set.\n");
> +            exit_program(1);
> +        }
> +
> +        ref = input_files[self->input_sync_ref];
> +        if (ref->input_sync_ref != -1 && ref->input_sync_ref != self->input_sync_ref) {
> +            av_log(NULL, AV_LOG_ERROR, "-isync for input %d references a resynced input %d. Sync not set.\n", i, self->input_sync_ref);
> +            continue;
> +        }
> +
> +        if (self->ctx->start_time_realtime != AV_NOPTS_VALUE && ref->ctx->start_time_realtime != AV_NOPTS_VALUE) {
> +            self_start_time = self->ctx->start_time_realtime;
> +            ref_start_time  =  ref->ctx->start_time_realtime;
> +        } else if (self->ctx->start_time != AV_NOPTS_VALUE && ref->ctx->start_time != AV_NOPTS_VALUE) {
> +            self_start_time = self->ctx->start_time;
> +            ref_start_time  =  ref->ctx->start_time;
> +        } else {
> +            self_start_time = self->ctx->first_pkt_wallclock;
> +            ref_start_time  =  ref->ctx->first_pkt_wallclock;
> +            sync_fpw = 1;
> +        }
> +
> +        self_seek_start = self->start_time == AV_NOPTS_VALUE ? 0 : self->start_time;
> +        ref_seek_start  =  ref->start_time == AV_NOPTS_VALUE ? 0 :  ref->start_time;
> +
> +        adjustment = (self_start_time - ref_start_time) + !copy_ts*(self_seek_start - ref_seek_start) + ref->input_ts_offset;
> +
> +        self->ts_offset += adjustment;
> +
> +        av_log(NULL, AV_LOG_INFO, "Adjusted ts offset for Input #%d by %"PRId64" us to sync with Input #%d", i, adjustment, self->input_sync_ref);
> +        if (sync_fpw) av_log(NULL, AV_LOG_INFO, " using reception wallclock time. Sync may not be obtained");
> +        av_log(NULL, AV_LOG_INFO, ".\n");
> +    }
> +
> +    return 0;
> +}
> +
>   static int opt_filter_threads(void *optctx, const char *opt, const char *arg)
>   {
>       av_free(filter_nbthreads);
> @@ -1305,6 +1358,7 @@ static int open_input_file(OptionsContext *o, const char *filename)
>       f->ist_index  = nb_input_streams - ic->nb_streams;
>       f->start_time = o->start_time;
>       f->recording_time = o->recording_time;
> +    f->input_sync_ref = o->input_sync_ref;
>       f->input_ts_offset = o->input_ts_offset;
>       f->ts_offset  = o->input_ts_offset - (copy_ts ? (start_at_zero && ic->start_time != AV_NOPTS_VALUE ? ic->start_time : 0) : timestamp);
>       f->nb_streams = ic->nb_streams;
> @@ -3489,6 +3543,8 @@ int ffmpeg_parse_options(int argc, char **argv)
>           goto fail;
>       }
>   
> +    apply_sync_offsets();
> +
>       /* create the complex filtergraphs */
>       ret = init_complex_filters();
>       if (ret < 0) {
> @@ -3603,6 +3659,9 @@ const OptionDef options[] = {
>       { "accurate_seek",  OPT_BOOL | OPT_OFFSET | OPT_EXPERT |
>                           OPT_INPUT,                                   { .off = OFFSET(accurate_seek) },
>           "enable/disable accurate seeking with -ss" },
> +    { "isync",          HAS_ARG | OPT_INT | OPT_OFFSET |
> +                        OPT_EXPERT | OPT_INPUT,                      { .off = OFFSET(input_sync_ref) },
> +        "Indicate the input index for sync reference", "sync ref" },
>       { "itsoffset",      HAS_ARG | OPT_TIME | OPT_OFFSET |
>                           OPT_EXPERT | OPT_INPUT,                      { .off = OFFSET(input_ts_offset) },
>           "set the input ts offset", "time_off" },
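The documented rules for valid -isync values (an index of -1 or the input itself is a no-op, an out-of-range index is fatal, and a reference may not itself be synced to a third input) can be sketched as a standalone check. The IsyncCheck enum and check_isync() are hypothetical names; the patch performs these tests inline in apply_sync_offsets() and reports them via av_log() and exit_program().

```c
/* Possible outcomes of validating one input's -isync value (hypothetical). */
typedef enum {
    ISYNC_NONE,      /* -1 or self-reference: timestamps left untouched */
    ISYNC_APPLY,     /* valid reference: an offset will be applied */
    ISYNC_BAD_INDEX, /* reference outside the list of inputs: fatal in the patch */
    ISYNC_CHAINED    /* reference is itself synced to another input: skipped */
} IsyncCheck;

/* sync_refs[i] holds the -isync value given for input i (-1 when unset). */
static IsyncCheck check_isync(const int *sync_refs, int nb_inputs, int i)
{
    int ref = sync_refs[i];

    if (ref == -1 || ref == i)
        return ISYNC_NONE;
    if (ref < -1 || ref >= nb_inputs)
        return ISYNC_BAD_INDEX;
    /* The reference may be unsynced (-1) or point at itself, but not at a third input. */
    if (sync_refs[ref] != -1 && sync_refs[ref] != ref)
        return ISYNC_CHAINED;
    return ISYNC_APPLY;
}
```

For sync values {-1, 0, 1, 7} across four inputs, input 0 is untouched, input 1 is synced to input 0, input 2 is rejected because input 1 is itself synced, and input 3 references a non-existent input.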

Gyan Doshi July 1, 2022, 4:16 a.m. UTC | #2

On 2022-06-27 06:55 pm, Gyan Doshi wrote:
> Ping for the set.

Plan to push on Monday.

Regards,
Gyan

Anton Khirnov July 1, 2022, 10:03 a.m. UTC | #3

Quoting Gyan Doshi (2022-06-25 10:29:51)
> This is a per-file input option that adjusts an input's timestamps
> with reference to another input, so that emitted packet timestamps
> account for the difference between the start times of the two inputs.
> 
> Typical use case is to sync two or more live inputs such as from capture
> devices. Both the target and reference input source timestamps should be
> based on the same clock source.

If both streams are using the same clock, then why is any extra
synchronization needed?

Gyan Doshi July 1, 2022, 11:03 a.m. UTC | #4

On 2022-07-01 03:33 pm, Anton Khirnov wrote:
> If both streams are using the same clock, then why is any extra
> synchronization needed?

Because ffmpeg.c normalizes timestamps by default. We can keep the original
timestamps using -copyts, but these inputs are usually preprocessed using
single-input filters, which won't have access to the reference inputs, and
merge filters such as amix don't sync by timestamp.

What this option does is allow keeping zero-based timestamps while offsetting
the target input relative to the reference. It is then possible to filter and
merge in sync.

Regards,
Gyan
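A simplified model of the point made in the reply above: by default each input is rebased to start at zero, so packets captured at the same instant on two inputs that started at different times get different output timestamps, and -isync adds the start-time difference back to the target. The output_ts() helper and the 100 s / 102.5 s start times are illustrative assumptions; the real code folds all of this into InputFile.ts_offset.

```c
#include <stdint.h>

/* Simplified model of default timestamp handling (no -copyts, no -ss, no
 * -itsoffset): each input is rebased to start at zero, and -isync adds an
 * extra offset to the target. All values are in microseconds. */
static int64_t output_ts(int64_t pkt_ts, int64_t input_start, int64_t isync_adjustment)
{
    return pkt_ts - input_start + isync_adjustment;
}
```

For a packet captured at 103 s on both inputs, output_ts(103000000, 100000000, 0) gives 3 s for the reference and output_ts(103000000, 102500000, 0) gives 0.5 s for the target; adding the 2.5 s adjustment brings the target to 3 s as well.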

Anton Khirnov July 2, 2022, 8:42 a.m. UTC | #5

Quoting Gyan Doshi (2022-07-01 13:03:04)
> Because ffmpeg.c normalizes timestamps by default. We can keep 
> timestamps using -copyts, but these inputs are usually preprocessed 
> using single-input filters which won't have access to the reference 
> inputs,

No idea what you mean by "reference inputs" here.

> or the merge filters like e.g. amix don't sync by timestamp.

amix does seem to look at timestamps.

Gyan Doshi July 2, 2022, 9:51 a.m. UTC | #6

On 2022-07-02 02:12 pm, Anton Khirnov wrote:
> Quoting Gyan Doshi (2022-07-01 13:03:04)
>> Because ffmpeg.c normalizes timestamps by default. We can keep
>> timestamps using -copyts, but these inputs are usually preprocessed
>> using single-input filters which won't have access to the reference
>> inputs,
> No idea what you mean by "reference inputs" here.

The reference input is the one the target is being synced against, e.g.
in a karaoke session, the music track from a DAW would be the reference and
the user's voice via mic the target.

>> or the merge filters like e.g. amix don't sync by timestamp.
> amix does seem to look at timestamps.

amix does not *sync* by timestamp. If one input starts at 4 and the
other at 7, the second isn't aligned by timestamp.

Regards,
Gyan
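To make "sync by timestamp" concrete for the 4-versus-7 example: a mixer that aligned its inputs by timestamp would have to delay the later input by the start-time difference, for instance by inserting that much silence ahead of it. The helper below is a hypothetical illustration, not code from amix; in FFmpeg code the rescaling would normally be done with libavutil's av_rescale_q(), which also handles rounding and overflow.

```c
#include <stdint.h>

/* Hypothetical helper: number of silent samples to insert before the later
 * of two inputs so that both line up. Start times are expressed in a time
 * base of tb_num/tb_den seconds. */
static int64_t silence_samples(int64_t start_a, int64_t start_b,
                               int tb_num, int tb_den, int sample_rate)
{
    int64_t delta = start_b > start_a ? start_b - start_a : start_a - start_b;
    /* delta * tb_num / tb_den seconds, converted to samples */
    return delta * tb_num * sample_rate / tb_den;
}
```

With start times 4 and 7 in a 1/1 time base at 48000 Hz, silence_samples(4, 7, 1, 1, 48000) returns 144000, i.e. 3 s of silence.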

Gyan Doshi July 4, 2022, 3:47 a.m. UTC | #7

On 2022-07-02 03:21 pm, Gyan Doshi wrote:
>
>
> On 2022-07-02 02:12 pm, Anton Khirnov wrote:
>> Quoting Gyan Doshi (2022-07-01 13:03:04)
>>> or the merge filters like e.g. amix don't sync by timestamp.
>> amix does seem to look at timestamps.
>
> amix does not *sync* by timestamp. If one input starts at 4 and the 
> other at 7, the 2nd isn't aligned by timestamp.

If there are any further comments or objections, let me know.

I plan to push the set tonight. The first patch isn't crucial; I can adapt
the second to be pushed without it, but ideally both go in.

Regards,
Gyan

Anton Khirnov July 4, 2022, 6:21 a.m. UTC | #8

Quoting Gyan Doshi (2022-07-02 11:51:53)
> 
> 
> On 2022-07-02 02:12 pm, Anton Khirnov wrote:
> > Quoting Gyan Doshi (2022-07-01 13:03:04)
> >> or the merge filters like e.g. amix don't sync by timestamp.
> > amix does seem to look at timestamps.
> 
> amix does not *sync* by timestamp. If one input starts at 4 and the 
> other at 7, the 2nd isn't aligned by timestamp.

So maybe it should?

My concern generally with this patchset is that it seems like you're
changing things where it's easier to do rather than where it's correct.

Anton Khirnov July 4, 2022, 6:29 a.m. UTC | #9

Quoting Gyan Doshi (2022-07-04 05:47:31)
> If any further comments or objections, let me know.
> 
> Plan to push set tonight.

Could you please stop constantly threatening to push things when I don't
reply for a day? The traffic on the ML is insane, sometimes I'd like to
work on my own code, and maybe even have a life outside of ffmpeg.

You want to add two new public interfaces, which means we as a project
commit to maintaining them indefinitely. Changing or removing such
things once they are in is a long and arduous process, so they should
not be rushed.

Gyan Doshi July 4, 2022, 8:20 a.m. UTC | #10

On 2022-07-04 11:51 am, Anton Khirnov wrote:
> Quoting Gyan Doshi (2022-07-02 11:51:53)
>>
>> On 2022-07-02 02:12 pm, Anton Khirnov wrote:
>>> Quoting Gyan Doshi (2022-07-01 13:03:04)
>>>> or the merge filters like e.g. amix don't sync by timestamp.
>>> amix does seem to look at timestamps.
>> amix does not *sync* by timestamp. If one input starts at 4 and the
>> other at 7, the 2nd isn't aligned by timestamp.
> So maybe it should?
>
> My concern generally with this patchset is that it seems like you're
> changing things where it's easier to do rather than where it's correct.

There are many multi-input filters which may be used; amix is just one
example.

The basic 'deficiency' here is that filters operate upon frames and, for
the most part, look only at single frames, even though frames are part of
streams. These streams may have companion streams (which may be part of
programs) within a single input, and these inputs may have companion
inputs. Anything in this tree may be relevant as a reference for a
particular operation, e.g. we have a bespoke filter, scale2ref, so that we
can look at another stream's frames. But we don't have pad2ref, crop2ref,
etc. So the absolutely correct thing to do would be to supply a global
context to processing modules like filtergraphs, maybe an array of dicts,
containing attributes of all inputs like starting timestamps, resolution,
string metadata, etc. That would obviate the need for these bespoke fields
and even filters.

But that's a much larger design undertaking and I'm just addressing one 
specific practical need here. This patch is currently being used 
successfully by commercial users in a private build. Many users have 
posted to ffmpeg-users and popular forums over the years asking for 
something that achieves this.

Actually, this functionality sounds like it sort of existed earlier in
the form of map sync (e.g. -map 1:a,0:a:1). Although the assignment
syntax still remains (and doesn't warn or error out), it's a no-op now,
since the application code was removed in 2012 by Michael, who said he
based it on an idea from one of your commits, presumably in Libav.

Regards,
Gyan
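The "global context" idea above could look roughly like the table below: one record per input carrying its start time, resolution and string metadata, which a filter could query instead of needing a bespoke filter such as scale2ref. Every type and function here (Attr, InputAttrs, find_input(), find_meta(), start_delta()) is hypothetical; nothing like it exists in libavfilter, and a real implementation might use libavutil's AVDictionary for the metadata.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key/value pair standing in for a metadata dictionary. */
typedef struct {
    const char *key;
    const char *value;
} Attr;

/* Hypothetical per-input record in a graph-wide attribute table. */
typedef struct {
    int input_index;
    int64_t start_time_us; /* starting timestamp, microseconds */
    int width, height;     /* resolution of the main video stream, if any */
    const Attr *metadata;
    int nb_metadata;
} InputAttrs;

/* Look up the record for a given input index; NULL if absent. */
static const InputAttrs *find_input(const InputAttrs *table, int n, int index)
{
    for (int i = 0; i < n; i++)
        if (table[i].input_index == index)
            return &table[i];
    return NULL;
}

/* Look up a metadata value by key; NULL if absent. */
static const char *find_meta(const InputAttrs *in, const char *key)
{
    for (int i = 0; i < in->nb_metadata; i++)
        if (strcmp(in->metadata[i].key, key) == 0)
            return in->metadata[i].value;
    return NULL;
}

/* Example query a filter might make: start-time difference between two
 * inputs, or 0 if either input is unknown. */
static int64_t start_delta(const InputAttrs *table, int n, int self, int ref)
{
    const InputAttrs *a = find_input(table, n, self);
    const InputAttrs *b = find_input(table, n, ref);
    return (a && b) ? a->start_time_us - b->start_time_us : 0;
}
```

A sync-aware filter on input 1 could then call start_delta(table, n, 1, 0) to learn how far input 1 started after input 0, without ffmpeg.c having to bake that into timestamps.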

Anton Khirnov July 5, 2022, 4:15 p.m. UTC | #11

Quoting Gyan Doshi (2022-07-04 10:20:22)
> There are many multi-input filters which may be used; amix is just one
> example.
> 
> The basic 'deficiency' here is that filters operate upon frames and, for
> the most part, only look at single frames, even though frames are part of
> streams. These streams may have companion streams (which may be part of
> programs) within a single input, and these inputs may have companion
> inputs. Anything in this tree may be relevant as a reference for a
> particular operation; e.g. we have a bespoke filter, scale2ref, so that
> we can look at another stream's frames, but we don't have pad2ref,
> crop2ref, etc. So the absolutely correct thing to do would be to supply
> a global context to processing modules like filtergraphs, maybe an array
> of dicts containing attributes of all inputs such as starting timestamps,
> resolution, string metadata, etc. That would obviate the need for these
> bespoke fields and even filters.

I don't see how the second paragraph relates to the first one. scale,
pad, or crop are not multi-input filters, so why are you comparing them
to amix? I don't think there are so many multi-input filters in lavfi,
and the issue should be solvable using the same code for all of them.

Since both relevant streams are visible to the filter, no global context
of any kind should be needed.
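To make the alternative Anton describes concrete: a multi-input audio filter already sees the first timestamp of each of its inputs, so it could align them itself. The C sketch below is hypothetical and is not amix's actual code. TimeBase, rescale_ts() and compute_leading_silence() are illustrative names, and rescale_ts() is a simplified stand-in for libavutil's av_rescale_q() without its overflow handling. It rescales each input's first timestamp to the output sample rate and reports how much leading silence each later-starting input would need.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rational time base (numerator/denominator), playing the
 * role of AVRational without depending on libavutil. */
typedef struct {
    int num;
    int den;
} TimeBase;

/* Rescale ts from time base 'from' to time base 'to', rounding half away
 * from zero. Simplified stand-in for av_rescale_q(): no overflow checks. */
static int64_t rescale_ts(int64_t ts, TimeBase from, TimeBase to)
{
    assert(from.den > 0 && to.num > 0);
    int64_t num = ts * from.num * to.den;
    int64_t den = (int64_t)from.den * to.num;
    if (num >= 0)
        return (num + den / 2) / den;
    return -((-num + den / 2) / den);
}

/* Rescale each input's first timestamp to the output sample rate, find the
 * earliest start, and store in pad_samples[i] how many leading silent
 * samples input i needs so that all inputs line up on that common origin. */
static void compute_leading_silence(const int64_t *first_pts,
                                    const TimeBase *tb, int nb_inputs,
                                    int sample_rate, int64_t *pad_samples)
{
    const TimeBase out_tb = { 1, sample_rate };
    int64_t earliest = INT64_MAX;

    for (int i = 0; i < nb_inputs; i++) {
        pad_samples[i] = rescale_ts(first_pts[i], tb[i], out_tb);
        if (pad_samples[i] < earliest)
            earliest = pad_samples[i];
    }
    for (int i = 0; i < nb_inputs; i++)
        pad_samples[i] -= earliest;
}
```

With the example from the thread, inputs starting at 4 and 7 (time base 1/1) mixed at 48 kHz give pad_samples of 0 and 144000, i.e. 3 s of silence ahead of the second input. Nothing in the helper is specific to one filter, which fits Anton's suggestion that the same code could serve all multi-input filters.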

> 
> But that's a much larger design undertaking and I'm just addressing one 
> specific practical need here. This patch is currently being used 
> successfully by commercial users in a private build. Many users have 
> posted to ffmpeg-users and popular forums over the years asking for 
> something that achieves this.
> 
> Actually, this functionality sounds like it sort of existed earlier in
> the form of map sync (e.g. -map 1:a,0:a:1). Although the assignment
> syntax still remains (and doesn't warn or error out), it's a no-op now
> since the application code was removed in 2012 by Michael, who said he
> based it on an idea from one of your commits, presumably in Libav.

So why are you not restoring that functionality and adding a new option
instead?

Gyan Doshi July 5, 2022, 5:10 p.m. UTC | #12

On 2022-07-05 09:45 pm, Anton Khirnov wrote:
> Quoting Gyan Doshi (2022-07-04 10:20:22)
>>
>> On 2022-07-04 11:51 am, Anton Khirnov wrote:
>>> Quoting Gyan Doshi (2022-07-02 11:51:53)
>>>> On 2022-07-02 02:12 pm, Anton Khirnov wrote:
>>>>> Quoting Gyan Doshi (2022-07-01 13:03:04)
>>>>>> On 2022-07-01 03:33 pm, Anton Khirnov wrote:
>>>>>>> Quoting Gyan Doshi (2022-06-25 10:29:51)
>>>>>>>> This is a per-file input option that adjusts an input's timestamps
>>>>>>>> with reference to another input, so that emitted packet timestamps
>>>>>>>> account for the difference between the start times of the two inputs.
>>>>>>>>
>>>>>>>> Typical use case is to sync two or more live inputs such as from capture
>>>>>>>> devices. Both the target and reference input source timestamps should be
>>>>>>>> based on the same clock source.
>>>>>>> If both streams are using the same clock, then why is any extra
>>>>>>> synchronization needed?
>>>>>> Because ffmpeg.c normalizes timestamps by default. We can keep
>>>>>> timestamps using -copyts, but these inputs are usually preprocessed
>>>>>> using single-input filters which won't have access to the reference
>>>>>> inputs,
>>>>> No idea what you mean by "reference inputs" here.
>>>> The reference input is the one the target is being synced against. e.g.
>>>> in a karaoke session -  the music track from a DAW would be ref and the
>>>> user's voice via mic is the target.
>>>>
>>>>>> or the merge filters like e.g. amix don't sync by timestamp.
>>>>> amix does seem to look at timestamps.
>>>> amix does not *sync* by timestamp. If one input starts at 4 and the
>>>> other at 7, the 2nd isn't aligned by timestamp.
>>> So maybe it should?
>>>
>>> My concern generally with this patchset is that it seems like you're
>>> changing things where it's easier to do rather than where it's correct.
>> There are many multi-input filters which may be used; amix is just one
>> example.
>>
>> The basic 'deficiency' here is that filters operate upon frames and, for
>> the most part, only look at single frames, even though frames are part of
>> streams. These streams may have companion streams (which may be part of
>> programs) within a single input, and these inputs may have companion
>> inputs. Anything in this tree may be relevant as a reference for a
>> particular operation; e.g. we have a bespoke filter, scale2ref, so that
>> we can look at another stream's frames, but we don't have pad2ref,
>> crop2ref, etc. So the absolutely correct thing to do would be to supply
>> a global context to processing modules like filtergraphs, maybe an array
>> of dicts containing attributes of all inputs such as starting timestamps,
>> resolution, string metadata, etc. That would obviate the need for these
>> bespoke fields and even filters.
> I don't see how the second paragraph relates to the first one. scale,
> pad, or crop are not multi-input filters, so why are you comparing them

scale is a single-input filter but scale2ref is a multi-input filter
which is needed solely because there is currently no means to convey
info about other streams to a single-input filter.
Similarly, we would need crop2ref, pad2ref, etc. to achieve the same
attribute transfer. If we had a global context, these counterpart
filters wouldn't be necessary.


> to amix? I don't think there are so many multi-input filters in lavfi,
> and the issue should be solvable using the same code for all of them.

Because information about other streams isn't useful only at the point
of multi-input filter use. One of the streams may need to be resampled to
a specific rate or sample format based on some user logic instead of
letting amix choose one. That's where a global context would help.

>> Actually, this functionality sounds like it sort of existed earlier in
>> the form of map sync (e.g. -map 1:a,0:a:1). Although the assignment
>> syntax still remains (and doesn't warn or error out), it's a no-op now
>> since the application code was removed in 2012 by Michael, who said he
>> based it on an idea from one of your commits, presumably in Libav.
> So why are you not restoring that functionality and adding a new option
> instead?

I said 'sort of'. That adjustment was implemented in do_video_out() and
do_audio_out(), so it won't help with filtering or streamcopying.
This option adjusts timestamps just after demuxing, so it doesn't have
those limitations.

Regards,
Gyan

Anton Khirnov July 5, 2022, 5:24 p.m. UTC | #13

Quoting Gyan Doshi (2022-07-05 19:10:33)
> 
> 
> On 2022-07-05 09:45 pm, Anton Khirnov wrote:
> > I don't see how the second paragraph relates to the first one. scale,
> > pad, or crop are not multi-input filters, so why are you comparing them
> 
> scale is a single-input filter but scale2ref is a multi-input filter
> which is needed solely because there is currently no means to convey
> info about other streams to a single-input filter.
> Similarly, we would need crop2ref, pad2ref, etc. to achieve the same
> attribute transfer. If we had a global context, these counterpart
> filters wouldn't be necessary.

In my experience, global *anything* is almost always a sign of bad
design and only leads to pain and suffering. The proper solution in this
case would be making the filtergraph construction API more flexible.
Then the code that actually has all the necessary information (i.e.
ffmpeg.c or other library caller) would set the filter parameters
however you want. Then none of these whatever2ref hacks would be needed.
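A minimal sketch of the caller-driven approach Anton suggests, assuming the caller (e.g. ffmpeg.c) knows the reference stream's dimensions by the time it builds the graph: instead of inserting scale2ref, it writes the values straight into an ordinary single-input filter description. RefVideoInfo and build_filter_from_ref() are hypothetical names; only the filter names scale, pad and crop, and the fact that each takes width and height as its first two options, come from FFmpeg.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of a reference video stream's properties, as known
 * to the caller once that input has been opened and probed. */
typedef struct {
    int width;
    int height;
} RefVideoInfo;

/* Write "<filter>=<width>:<height>" into buf, e.g. "scale=1280:720".
 * scale, pad and crop all take width and height as their first two
 * options, so one helper covers what scale2ref (or the nonexistent
 * pad2ref/crop2ref) would be used for. Returns 0 on success, -1 on an
 * empty filter name or a buffer that is too small. */
static int build_filter_from_ref(const char *filter, const RefVideoInfo *ref,
                                 char *buf, size_t size)
{
    assert(filter && ref && buf);
    if (strlen(filter) == 0)
        return -1;
    int n = snprintf(buf, size, "%s=%d:%d", filter, ref->width, ref->height);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For a 1280x720 reference this yields "scale=1280:720" or "pad=1280:720". It only covers parameters known when the graph is configured; it does not address Gyan's objection in the next reply that some reference data is needed while filters are running.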

Gyan Doshi July 6, 2022, 4:16 a.m. UTC | #14

On 2022-07-05 10:54 pm, Anton Khirnov wrote:
> Quoting Gyan Doshi (2022-07-05 19:10:33)
>>
>> On 2022-07-05 09:45 pm, Anton Khirnov wrote:
>>> I don't see how the second paragraph relates to the first one. scale,
>>> pad, or crop are not multi-input filters, so why are you comparing them
>> scale is a single-input filter but scale2ref is a multi-input filter
>> which is needed solely because there is currently no means to convey
>> info about other streams to a single-input filter.
>> Similarly, we would need crop2ref, pad2ref, etc. to achieve the same
>> attribute transfer. If we had a global context, these counterpart
>> filters wouldn't be necessary.
> In my experience, global *anything* is almost always a sign of bad
> design and only leads to pain and suffering. The proper solution in this
> case would be making the filtergraph construction API more flexible.
> Then the code that actually has all the necessary information (i.e.
> ffmpeg.c or other library caller) would set the filter parameters
> however you want. Then none of these whatever2ref hacks would be needed.

Some of the context data will be used by filters at runtime. So a more
flexible API could help during init but not afterwards. The context has
to be accessible for the whole lifetime of the filters.

About this patch: the user can already add a custom timestamp offset to
an input (-itsoffset), but it has to be a fixed constant specified in
advance. This patch allows the user to set one relative to another input,
which can only be done in ffmpeg.c after all inputs have been opened.
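-itsoffset takes a fixed value given on the command line, while the offset described here is relative and can only be computed once both inputs are open. A hypothetical sketch, assuming both start times are in microseconds (the unit AVFormatContext.start_time uses) and that the fallback compares the wallclock time at which each input's first packet arrived, as the patch description states. Patch 1/2 of the series adds AVFormatContext.first_pkt_wallclock for this, but its exact semantics are not shown in this thread, so InputInfo, NO_TS and relative_ts_offset() are illustrative stand-ins rather than the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "no timestamp" sentinel, in the role of AV_NOPTS_VALUE. */
#define NO_TS INT64_MIN

/* Hypothetical per-input summary available once every input is opened. */
typedef struct {
    int64_t start_time_us;      /* first timestamp in microseconds, or NO_TS */
    int64_t first_wallclock_us; /* wallclock when the first packet arrived */
} InputInfo;

/* Offset (microseconds) to add to the target input's normalized timestamps
 * so they keep their distance from the reference input's start. If either
 * input lacks a start time, fall back to the reception wallclock times. */
static int64_t relative_ts_offset(const InputInfo *target, const InputInfo *ref)
{
    assert(target && ref);
    if (target->start_time_us != NO_TS && ref->start_time_us != NO_TS)
        return target->start_time_us - ref->start_time_us;
    return target->first_wallclock_us - ref->first_wallclock_us;
}
```

For a reference starting at 4 s and a target at 7 s the offset is 3000000 us. Since ffmpeg.c by default normalizes each input to start at zero, adding this offset to the target's normalized timestamps preserves the 3 s gap relative to the reference.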

Regards,
Gyan
