Message ID | 20240221073915.26055-1-allancady@yahoo.com |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel] libavutil/timestamp.h: Fix loss of precision in timestamps for silencedetect on long files | expand |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | success | Make fate finished |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
On Tue, 20 Feb 2024, Allan Cady via ffmpeg-devel wrote: > When the silencedetect audio filter is run against long files, the > output timestamps gradually lose precision as the scan proceeds further > into the file. This is because the output format specifier ("%.6g" in > libavutil/timestamp.h) limits the total field width to six significant > digits. As the offset into the file increases, digits drop off the end, > until eventually, for offsets greater than 100000 seconds (about 28 > hours), fractions of a second disappear altogether, and the timestamps > are logged as whole seconds. > > This patch changes the format to "%.6f" for silencedetect, which will > give microsecond precision for all timestamps regardless of offset. > > libavutil/timestamp.h exposes a macro, av_ts2timestr, as the public > interface. This macro was used by silencedetect.c, as well as other > source files. In order to fix the issue for silencedetect without > affecting other files and tests, I have added a new macro, > av_ts2timestr_fixed_precision, which uses the new format specifier. > The original av_ts_make_time_string remains, with the original > behavior. I'd rather just to fix av_ts_make_string to not limit the number of significant digits. Something like: 1) Print the number in decimal notation with at most 6 fractional digits. 2) Use less fractional digits if the first format would not fit into AV_TS_MAX_STRING_SIZE. 3) Use scientific notation if the second format would not fit into AV_TS_MAX_STRINT_SIZE. Regards, Marton
I had a similar thought, as all timestamps would have the same issue. This is my first contribution here, and I don't know the code very well, so I was being cautious. I'm open to expanding the scope, but I'm sure I would need some help doing it right and not breaking things. For starters, I'm curious why there are two functions & macros: av_ts2str/av_ts_make_string (which used "%" format specifier) av_ts2timestr/av_ts_make_time_string (which used "%6g") Do you know the rationale for that? I see that only av_ts2timestr is used in silencedetect.c. And are you suggesting we should fold those two functions into one? I did notice something in the output from silencedetect. After I made my change, I see the output is now: frame:92404 pts:53224175 pts_time:2413.79 lavfi.silence_start=2411.120272 frame:92411 pts:53228207 pts_time:2413.98 lavfi.silence_end=2413.992744 lavfi.silence_duration=2.872472 I see that the pts_time values still have the original formatting. I don't know what pts_time is, or where those lines are coming from. Seems like maybe those should have fixed precision as well. Guidance for a noob please? Thanks. (P.S. Can you tell me, when I reply to the list (as opposed to patch submission using git send-email), how should I address the email? Obviously it should go to ffmpeg-devel@ffmpeg.org, but should I include you as a recipient, or as a cc:, or only to the list? or is there some other way it gets directed to you? Any other guidance on how to format email? Thanks.) On Wednesday, February 21, 2024 at 12:25:23 PM PST, Marton Balint <cus@passwd.hu> wrote: On Tue, 20 Feb 2024, Allan Cady via ffmpeg-devel wrote: > When the silencedetect audio filter is run against long files, the > output timestamps gradually lose precision as the scan proceeds further > into the file. This is because the output format specifier ("%.6g" in > libavutil/timestamp.h) limits the total field width to six significant > digits. As the offset into the file increases, digits drop off the end, > until eventually, for offsets greater than 100000 seconds (about 28 > hours), fractions of a second disappear altogether, and the timestamps > are logged as whole seconds. > > This patch changes the format to "%.6f" for silencedetect, which will > give microsecond precision for all timestamps regardless of offset. > > libavutil/timestamp.h exposes a macro, av_ts2timestr, as the public > interface. This macro was used by silencedetect.c, as well as other > source files. In order to fix the issue for silencedetect without > affecting other files and tests, I have added a new macro, > av_ts2timestr_fixed_precision, which uses the new format specifier. > The original av_ts_make_time_string remains, with the original > behavior. I'd rather just to fix av_ts_make_string to not limit the number of significant digits. Something like: 1) Print the number in decimal notation with at most 6 fractional digits. 2) Use less fractional digits if the first format would not fit into AV_TS_MAX_STRING_SIZE. 3) Use scientific notation if the second format would not fit into AV_TS_MAX_STRINT_SIZE. Regards, Marton
On Thu, 22 Feb 2024, Allan Cady via ffmpeg-devel wrote: > I had a similar thought, as all timestamps would have the same issue. > > This is my first contribution here, and I don't know the code very well, > so I was being cautious. I'm open to expanding the scope, but I'm sure I > would need some help doing it right and not breaking things. > > For starters, I'm curious why there are two functions & macros: > > av_ts2str/av_ts_make_string (which used "%" format specifier) That takes a 64-bit integer timestamp and is actually using "%"PRId64 because that is the correct (portable) format string for an int64_t variable. > av_ts2timestr/av_ts_make_time_string (which used "%6g") That takes an integer timestamp and a rational time base. Float timestamps (in seconds) is calculated by multiplying the two, that is what is printed. > > Do you know the rationale for that? I see that only av_ts2timestr is used in silencedetect.c. > > And are you suggesting we should fold those two functions into one? No, they have different purpose. The first prints out a timestamps which can be in any time base. The second prints out a timestamp in seconds. > > I did notice something in the output from silencedetect. After I made my > change, I see the output is now: > > > frame:92404 pts:53224175 pts_time:2413.79 > lavfi.silence_start=2411.120272 > frame:92411 pts:53228207 pts_time:2413.98 > lavfi.silence_end=2413.992744 > lavfi.silence_duration=2.872472 > > > I see that the pts_time values still have the original formatting. I > don't know what pts_time is, or where those lines are coming from. Seems > like maybe those should have fixed precision as well. Well, that is likely using the same function, but you only fixed silencedetect, right? > > Guidance for a noob please? Thanks. > > (P.S. Can you tell me, when I reply to the list (as opposed to patch > submission using git send-email), how should I address the email? > Obviously it should go to ffmpeg-devel@ffmpeg.org, but should I include > you as a recipient, or as a cc:, or only to the list? or is there some > other way it gets directed to you? Any other guidance on how to format > email? Thanks.) I don't think there is a rule, I have seen it happen both with or without CC. You don't need to CC me though, as I am a regular on the list, but others may have other preference. Regards, Marton PS: Please avoid top positing in you replies.
On Thursday, February 22, 2024 at 01:16:19 AM PST, Marton Balint <cus@passwd.hu> wrote: >> For starters, I'm curious why there are two functions & macros:>>>> av_ts2str/av_ts_make_string (which used "%" format specifier)> > That takes a 64-bit integer timestamp and is actually using "%"PRId64 > because that is the correct (portable) format string for an int64_t > variable.> >> av_ts2timestr/av_ts_make_time_string (which used "%6g")> > That takes an integer timestamp and a rational time base. Float timestamps > (in seconds) is calculated by multiplying the two, that is what is > printed.> >>>> Do you know the rationale for that? I see that only av_ts2timestr is used in silencedetect.c.>>>> And are you suggesting we should fold those two functions into one?> > No, they have different purpose. The first prints out a timestamps which > can be in any time base. The second prints out a timestamp in seconds. Understood, I think I'm up to speed now. av_ts2str prints a 64-bit integer timestamp;av_ts2timestr prints a floating point timestamp in seconds with the timebase applied. In your previous email, you said: > I'd rather just to fix av_ts_make_string to not limit the number of > significant digits. Something like:>> 1) Print the number in decimal notation with at most 6 fractional digits. > 2) Use less fractional digits if the first format would not fit into > AV_TS_MAX_STRING_SIZE.> 3) Use scientific notation if the second format would not fit into > AV_TS_MAX_STRINT_SIZE. I think you probably meant to say "I'd rather just to fixav_ts_make_time_string" (not av_ts_make_string)?Since it's av_ts_make_time_string() that's formatting floating point. So it makes sense to me to make the change to av_ts_make_time_string()for all timestamps, as you suggest. As for how specifically to format them, I'm fine with whatever you think is best, and I'm happy to work on this, but theimplementation has me a bit stumped for the moment, and I may need somehelp with it. My C language skills are rusty to say the least. It also occurs to me to wonder, would this warrant a formal problem ticket? I haven't looked into how that works for ffmpeg. Thanks for your help.
[Apologies for the awful mess in the previous email. Trying again with straight text.] On Thursday, February 22, 2024 at 01:16:19 AM PST, Marton Balint <cus@passwd.hu> wrote: >> For starters, I'm curious why there are two functions & macros: >> >> av_ts2str/av_ts_make_string (which used "%" format specifier) > > That takes a 64-bit integer timestamp and is actually using "%"PRId64 > because that is the correct (portable) format string for an int64_t > variable. > >> av_ts2timestr/av_ts_make_time_string (which used "%6g") > > That takes an integer timestamp and a rational time base. Float timestamps > (in seconds) is calculated by multiplying the two, that is what is > printed. > >> >> Do you know the rationale for that? I see that only av_ts2timestr is used in silencedetect.c. >> >> And are you suggesting we should fold those two functions into one? > > No, they have different purpose. The first prints out a timestamps which > can be in any time base. The second prints out a timestamp in seconds. Understood, I think I'm up to speed now. av_ts2str prints a 64-bit integer timestamp; av_ts2timestr prints a floating point timestamp in seconds with the timebase applied. In your previous email, you said: > I'd rather just to fix av_ts_make_string to not limit the number of > significant digits. Something like: > > 1) Print the number in decimal notation with at most 6 fractional digits. > 2) Use less fractional digits if the first format would not fit into > AV_TS_MAX_STRING_SIZE. > 3) Use scientific notation if the second format would not fit into > AV_TS_MAX_STRINT_SIZE. I think you probably meant to say "I'd rather just to fix av_ts_make_time_string" (not av_ts_make_string)? Since it's av_ts_make_time_string() that's formatting floating point. So it makes sense to me to make the change to av_ts_make_time_string() for all timestamps, as you suggest. As for how specifically to format them, I'm fine with whatever you think is best, and I'm happy to work on this, but the implementation has me a bit stumped for the moment, and I may need some help with it. My C language skills are rusty to say the least. It also occurs to me to wonder, would this warrant a formal problem ticket? I haven't looked into how that works for ffmpeg. Thanks for your help.
On Fri, 23 Feb 2024, Allan Cady via ffmpeg-devel wrote: > [Apologies for the awful mess in the previous email. Trying again with straight text.] > > On Thursday, February 22, 2024 at 01:16:19 AM PST, Marton Balint <cus@passwd.hu> wrote: > >>> For starters, I'm curious why there are two functions & macros: >>> >>> av_ts2str/av_ts_make_string (which used "%" format specifier) >> >> That takes a 64-bit integer timestamp and is actually using "%"PRId64 >> because that is the correct (portable) format string for an int64_t >> variable. >> >>> av_ts2timestr/av_ts_make_time_string (which used "%6g") >> >> That takes an integer timestamp and a rational time base. Float timestamps >> (in seconds) is calculated by multiplying the two, that is what is >> printed. >> >>> >>> Do you know the rationale for that? I see that only av_ts2timestr is used in silencedetect.c. >>> >>> And are you suggesting we should fold those two functions into one? >> >> No, they have different purpose. The first prints out a timestamps which >> can be in any time base. The second prints out a timestamp in seconds. > > > Understood, I think I'm up to speed now. av_ts2str prints a 64-bit integer timestamp; > av_ts2timestr prints a floating point timestamp in seconds with the timebase applied. > > > In your previous email, you said: > > >> I'd rather just to fix av_ts_make_string to not limit the number of >> significant digits. Something like: >> >> 1) Print the number in decimal notation with at most 6 fractional digits. >> 2) Use less fractional digits if the first format would not fit into >> AV_TS_MAX_STRING_SIZE. >> 3) Use scientific notation if the second format would not fit into >> AV_TS_MAX_STRINT_SIZE. > > > I think you probably meant to say "I'd rather just to fix > av_ts_make_time_string" (not av_ts_make_string)? > Since it's av_ts_make_time_string() that's formatting floating point. Yes, indeed. > > So it makes sense to me to make the change to av_ts_make_time_string() > for all timestamps, as you suggest. As for how specifically to format them, > I'm fine with whatever you think is best, and I'm happy to work on this, but the > implementation has me a bit stumped for the moment, and I may need some > help with it. My C language skills are rusty to say the least. The simplest way is to try snprintf() with 6 fractional digits, and check the return value to see how long the string would become. Based on this you can either accept snprintf result and truncate trailing zeros and dots, or try snprintf() with less fractional digits and truncate, or fall back to scientific format. > > It also occurs to me to wonder, would this warrant a formal problem ticket? > I haven't looked into how that works for ffmpeg. Opening a ticket for the problem is not required, but if your patch fixes an existing ticket, then you should mention the ticket number in the commit message. Regards, Marton
Greetings Martin et al, I've been trying to resubmit this patch based on your earlier suggestions. Most of the tools in the toolchain for this are new to me, so it's been awkward going. I did finally get the email sent with the new patch, but for some reason I haven't been able to figure out yet, it's going to the wrong thread. Here's the link to the message with the patch. https://ffmpeg.org/pipermail/ffmpeg-devel/2024-March/323170.html Looking forward to comments and to hopefully getting this approved and into the codebase. Allan On Friday, February 23, 2024 at 02:47:27 PM PST, Marton Balint <cus@passwd.hu> wrote: On Fri, 23 Feb 2024, Allan Cady via ffmpeg-devel wrote: > [Apologies for the awful mess in the previous email. Trying again with straight text.] > > On Thursday, February 22, 2024 at 01:16:19 AM PST, Marton Balint <cus@passwd.hu> wrote: > >>> For starters, I'm curious why there are two functions & macros: >>> >>> av_ts2str/av_ts_make_string (which used "%" format specifier) >> >> That takes a 64-bit integer timestamp and is actually using "%"PRId64 >> because that is the correct (portable) format string for an int64_t >> variable. >> >>> av_ts2timestr/av_ts_make_time_string (which used "%6g") >> >> That takes an integer timestamp and a rational time base. Float timestamps >> (in seconds) is calculated by multiplying the two, that is what is >> printed. >> >>> >>> Do you know the rationale for that? I see that only av_ts2timestr is used in silencedetect.c. >>> >>> And are you suggesting we should fold those two functions into one? >> >> No, they have different purpose. The first prints out a timestamps which >> can be in any time base. The second prints out a timestamp in seconds. > > > Understood, I think I'm up to speed now. av_ts2str prints a 64-bit integer timestamp; > av_ts2timestr prints a floating point timestamp in seconds with the timebase applied. > > > In your previous email, you said: > > >> I'd rather just to fix av_ts_make_string to not limit the number of >> significant digits. Something like: >> >> 1) Print the number in decimal notation with at most 6 fractional digits. >> 2) Use less fractional digits if the first format would not fit into >> AV_TS_MAX_STRING_SIZE. >> 3) Use scientific notation if the second format would not fit into >> AV_TS_MAX_STRINT_SIZE. > > > I think you probably meant to say "I'd rather just to fix > av_ts_make_time_string" (not av_ts_make_string)? > Since it's av_ts_make_time_string() that's formatting floating point. Yes, indeed. > > So it makes sense to me to make the change to av_ts_make_time_string() > for all timestamps, as you suggest. As for how specifically to format them, > I'm fine with whatever you think is best, and I'm happy to work on this, but the > implementation has me a bit stumped for the moment, and I may need some > help with it. My C language skills are rusty to say the least. The simplest way is to try snprintf() with 6 fractional digits, and check the return value to see how long the string would become. Based on this you can either accept snprintf result and truncate trailing zeros and dots, or try snprintf() with less fractional digits and truncate, or fall back to scientific format. > > It also occurs to me to wonder, would this warrant a formal problem ticket? > I haven't looked into how that works for ffmpeg. Opening a ticket for the problem is not required, but if your patch fixes an existing ticket, then you should mention the ticket number in the commit message. Regards, Marton
diff --git a/libavfilter/af_silencedetect.c b/libavfilter/af_silencedetect.c index 845c65bfed..f1a8096540 100644 --- a/libavfilter/af_silencedetect.c +++ b/libavfilter/af_silencedetect.c @@ -86,11 +86,11 @@ static av_always_inline void update(SilenceDetectContext *s, AVFrame *insamples, s->start[channel] = insamples->pts + av_rescale_q(current_sample / s->channels + 1 - nb_samples_notify * s->independent_channels / s->channels, (AVRational){ 1, s->last_sample_rate }, time_base); set_meta(insamples, s->mono ? channel + 1 : 0, "silence_start", - av_ts2timestr(s->start[channel], &time_base)); + av_ts2timestr_fixed_precision(s->start[channel], &time_base)); if (s->mono) av_log(s, AV_LOG_INFO, "channel: %d | ", channel); av_log(s, AV_LOG_INFO, "silence_start: %s\n", - av_ts2timestr(s->start[channel], &time_base)); + av_ts2timestr_fixed_precision(s->start[channel], &time_base)); } } } else { @@ -101,15 +101,15 @@ static av_always_inline void update(SilenceDetectContext *s, AVFrame *insamples, int64_t duration_ts = end_pts - s->start[channel]; if (insamples) { set_meta(insamples, s->mono ? channel + 1 : 0, "silence_end", - av_ts2timestr(end_pts, &time_base)); + av_ts2timestr_fixed_precision(end_pts, &time_base)); set_meta(insamples, s->mono ? channel + 1 : 0, "silence_duration", - av_ts2timestr(duration_ts, &time_base)); + av_ts2timestr_fixed_precision(duration_ts, &time_base)); } if (s->mono) av_log(s, AV_LOG_INFO, "channel: %d | ", channel); av_log(s, AV_LOG_INFO, "silence_end: %s | silence_duration: %s\n", - av_ts2timestr(end_pts, &time_base), - av_ts2timestr(duration_ts, &time_base)); + av_ts2timestr_fixed_precision(end_pts, &time_base), + av_ts2timestr_fixed_precision(duration_ts, &time_base)); } s->nb_null_samples[channel] = 0; s->start[channel] = INT64_MIN; diff --git a/libavutil/timestamp.h b/libavutil/timestamp.h index 9ae64da8a1..b483b5e12d 100644 --- a/libavutil/timestamp.h +++ b/libavutil/timestamp.h @@ -31,6 +31,8 @@ #endif #define AV_TS_MAX_STRING_SIZE 32 +#define AV_TS_FMT_DEFAULT "%.6g" +#define AV_TS_FMT_FIXED_PRECISION_6 "%.6f" /** * Fill the provided buffer with a string containing a timestamp @@ -53,9 +55,14 @@ static inline char *av_ts_make_string(char *buf, int64_t ts) */ #define av_ts2str(ts) av_ts_make_string((char[AV_TS_MAX_STRING_SIZE]){0}, ts) +#if FF_API_AV_MAKETIMESTRING /** + * This function is probably deprecated. It was originally called by + * av_ts_make_time_string defined below, which now uses av_ts_make_time_string_format. + * instead. Nothing external references this function directly. + * * Fill the provided buffer with a string containing a timestamp time - * representation. + * representation, using the default format AV_TS_FMT_DEFAULT. * * @param buf a buffer with size in bytes of at least AV_TS_MAX_STRING_SIZE * @param ts the timestamp to represent @@ -65,14 +72,33 @@ static inline char *av_ts_make_string(char *buf, int64_t ts) static inline char *av_ts_make_time_string(char *buf, int64_t ts, AVRational *tb) { if (ts == AV_NOPTS_VALUE) snprintf(buf, AV_TS_MAX_STRING_SIZE, "NOPTS"); - else snprintf(buf, AV_TS_MAX_STRING_SIZE, "%.6g", av_q2d(*tb) * ts); + else snprintf(buf, AV_TS_MAX_STRING_SIZE, AV_TS_FMT_DEFAULT, av_q2d(*tb) * ts); return buf; } +#endif /** - * Convenience macro, the return value should be used only directly in + * Fill the provided buffer with a string containing a timestamp time + * representation. + * + * @param buf a buffer with size in bytes of at least AV_TS_MAX_STRING_SIZE + * @param ts the timestamp to represent + * @param tb the timebase of the timestamp + * @param format format string for timestamp output, e.g. AV_TS_FMT_DEFAULT. + * @return the buffer in input + */ +static inline char *av_ts_make_time_string_format(char *buf, int64_t ts, AVRational *tb, const char *format) +{ + if (ts == AV_NOPTS_VALUE) snprintf(buf, AV_TS_MAX_STRING_SIZE, "NOPTS"); + else snprintf(buf, AV_TS_MAX_STRING_SIZE, format, av_q2d(*tb) * ts); + return buf; +} + +/** + * Convenience macros, the return values should be used only directly in * function arguments but never stand-alone. */ -#define av_ts2timestr(ts, tb) av_ts_make_time_string((char[AV_TS_MAX_STRING_SIZE]){0}, ts, tb) +#define av_ts2timestr(ts, tb) av_ts_make_time_string_format((char[AV_TS_MAX_STRING_SIZE]){0}, ts, tb, AV_TS_FMT_DEFAULT) +#define av_ts2timestr_fixed_precision(ts, tb) av_ts_make_time_string_format((char[AV_TS_MAX_STRING_SIZE]){0}, ts, tb, AV_TS_FMT_FIXED_PRECISION_6) #endif /* AVUTIL_TIMESTAMP_H */ diff --git a/libavutil/version.h b/libavutil/version.h index 9f45af93df..a2daeda772 100644 --- a/libavutil/version.h +++ b/libavutil/version.h @@ -119,6 +119,7 @@ #define FF_API_FRAME_KEY (LIBAVUTIL_VERSION_MAJOR < 59) #define FF_API_PALETTE_HAS_CHANGED (LIBAVUTIL_VERSION_MAJOR < 59) #define FF_API_VULKAN_CONTIGUOUS_MEMORY (LIBAVUTIL_VERSION_MAJOR < 59) +#define FF_API_AV_MAKETIMESTRING (LIBAVUTIL_VERSION_MAJOR < 59) /** * @} diff --git a/tests/ref/fate/filter-metadata-silencedetect b/tests/ref/fate/filter-metadata-silencedetect index bc53fea047..e66ffe5fdd 100644 --- a/tests/ref/fate/filter-metadata-silencedetect +++ b/tests/ref/fate/filter-metadata-silencedetect @@ -1,5 +1,5 @@ pts=0|tag:lavfi.silence_duration=0.523107|tag:lavfi.silence_end=0.690023|tag:lavfi.silence_start=0.736417 -pts=46080|tag:lavfi.silence_start=1.27626|tag:lavfi.silence_end=1.80751|tag:lavfi.silence_duration=0.531247 +pts=46080|tag:lavfi.silence_start=1.276259|tag:lavfi.silence_end=1.807506|tag:lavfi.silence_duration=0.531247 pts=92160 pts=138240 pts=184320
When the silencedetect audio filter is run against long files, the output timestamps gradually lose precision as the scan proceeds further into the file. This is because the output format specifier ("%.6g" in libavutil/timestamp.h) limits the total field width to six significant digits. As the offset into the file increases, digits drop off the end, until eventually, for offsets greater than 100000 seconds (about 28 hours), fractions of a second disappear altogether, and the timestamps are logged as whole seconds. This patch changes the format to "%.6f" for silencedetect, which will give microsecond precision for all timestamps regardless of offset. libavutil/timestamp.h exposes a macro, av_ts2timestr, as the public interface. This macro was used by silencedetect.c, as well as other source files. In order to fix the issue for silencedetect without affecting other files and tests, I have added a new macro, av_ts2timestr_fixed_precision, which uses the new format specifier. The original av_ts_make_time_string remains, with the original behavior. timestamp.h also exposes a function, av_ts_make_time_string, which was called only from av_ts2timestr. After this patch, both of the macros now use a new function, av_ts_make_time_string_format, which takes a format specifier as an argument, so the function av_ts_make_time_string is no longer used. I've left it in place, but flagged it for deprecation with FF_API_AV_MAKETIMESTRING. The test reference file filter-metadata-silencedetect has been updated to match the new functionality. Signed-off-by: Allan Cady <allancady@yahoo.com> --- libavfilter/af_silencedetect.c | 12 +++---- libavutil/timestamp.h | 34 +++++++++++++++++--- libavutil/version.h | 1 + tests/ref/fate/filter-metadata-silencedetect | 2 +- 4 files changed, 38 insertions(+), 11 deletions(-)