Message ID | 20231016091402.7972-1-lucenticus@gmail.com |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel] avcodec/amfenc: Fix for windows imprecise sleep | expand |
Context | Check | Description |
---|---|---|
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
On 16/10/2023 10:13, Evgeny Pavlov wrote: > This commit reduces the sleep time on Windows to improve AMF encoding > performance on low resolution input videos. > This fix is for Windows only, because sleep() function isn't > very accurate on Windows OS. > > Fix for issue #10622 > > Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com> > --- > libavcodec/amfenc.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c > index 061859f85c..0c95465d6e 100644 > --- a/libavcodec/amfenc.c > +++ b/libavcodec/amfenc.c > @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt) > if (query_output_data_flag == 0) { > if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) { > block_and_wait = 1; > +#ifdef _WIN32 > + av_usleep(0); //Sleep() is not precise on Windows OS. > +#else > av_usleep(1000); > +#endif > } > } > } while (block_and_wait); Wasting lots of power by spinning on a CPU core does not seem like a good answer to this problem. (I mean, presumably that is why Windows isn't honouring your request for a short sleep, because it wants timers to have larger gaps to avoid wasting power.) Why is there a sleep here at all, anyway? An API for hardware encoding should be providing a way for the caller to wait for an outstanding operation to complete. Thanks, - Mark
> 在 2023年10月17日,上午5:24,Mark Thompson <sw@jkqxz.net> 写道: > > On 16/10/2023 10:13, Evgeny Pavlov wrote: >> This commit reduces the sleep time on Windows to improve AMF encoding >> performance on low resolution input videos. >> This fix is for Windows only, because sleep() function isn't >> very accurate on Windows OS. >> Fix for issue #10622 >> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com> >> --- >> libavcodec/amfenc.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c >> index 061859f85c..0c95465d6e 100644 >> --- a/libavcodec/amfenc.c >> +++ b/libavcodec/amfenc.c >> @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt) >> if (query_output_data_flag == 0) { >> if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) { >> block_and_wait = 1; >> +#ifdef _WIN32 >> + av_usleep(0); //Sleep() is not precise on Windows OS. >> +#else >> av_usleep(1000); >> +#endif >> } >> } >> } while (block_and_wait); > > Wasting lots of power by spinning on a CPU core does not seem like a good answer to this problem. (I mean, presumably that is why Windows isn't honouring your request for a short sleep, because it wants timers to have larger gaps to avoid wasting power.) If av_usleep is implemented via Sleep like current case, sleep 0 means yield current thread, so it’s not busy wait in normal case (but it can be busy wait). av_usleep(500) may looks better and do the same job by depending 500/1000 = 0. I agree use sleep without real async is like a bug. > > Why is there a sleep here at all, anyway? An API for hardware encoding should be providing a way for the caller to wait for an outstanding operation to complete. > > Thanks, > > - Mark > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email
The reason for using av_usleep() here is that AMF API doesn’t provide an API for explicit wait. There are two modes to get output from encoder: 1. Polling with some sleep to avoid CPU thrashing – currently used in FFmpeg 2. Set timeout parameter on AMF encoder and QueryOutput call will block till output is available or the timeout happens. #2 is the preferable way but it is designed more to be used with a separate polling thread. With a single-thread approach in FFmpeg, the use of timeout can block input submission making things slower. This is even more pronounced when B-frames are enabled and several inputs are needed to produce the first output. The condition of this sleep is in special events (primarily when amf input queue is full), not the core loop part. During the experiments the cpu increasing is about 2-4% or so, not a burst. For low resolution encoding, these changes bring significant performance improvement (about 15%). It will not bring improvement for high resolution such as 4K. Thanks, Evgeny вт, 17 окт. 2023 г. в 03:26, Zhao Zhili <quinkblack@foxmail.com>: > > > 在 2023年10月17日,上午5:24,Mark Thompson <sw@jkqxz.net> 写道: > > > > On 16/10/2023 10:13, Evgeny Pavlov wrote: > >> This commit reduces the sleep time on Windows to improve AMF encoding > >> performance on low resolution input videos. > >> This fix is for Windows only, because sleep() function isn't > >> very accurate on Windows OS. > >> Fix for issue #10622 > >> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com> > >> --- > >> libavcodec/amfenc.c | 4 ++++ > >> 1 file changed, 4 insertions(+) > >> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c > >> index 061859f85c..0c95465d6e 100644 > >> --- a/libavcodec/amfenc.c > >> +++ b/libavcodec/amfenc.c > >> @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx, > AVPacket *avpkt) > >> if (query_output_data_flag == 0) { > >> if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain > || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= > ctx->hwsurfaces_in_queue_max)) { > >> block_and_wait = 1; > >> +#ifdef _WIN32 > >> + av_usleep(0); //Sleep() is not precise on Windows OS. > >> +#else > >> av_usleep(1000); > >> +#endif > >> } > >> } > >> } while (block_and_wait); > > > > Wasting lots of power by spinning on a CPU core does not seem like a > good answer to this problem. (I mean, presumably that is why Windows isn't > honouring your request for a short sleep, because it wants timers to have > larger gaps to avoid wasting power.) > > If av_usleep is implemented via Sleep like current case, sleep 0 means > yield current thread, so it’s not busy wait in normal case (but it can be > busy wait). > > av_usleep(500) may looks better and do the same job by depending 500/1000 > = 0. > > I agree use sleep without real async is like a bug. > > > > > Why is there a sleep here at all, anyway? An API for hardware encoding > should be providing a way for the caller to wait for an outstanding > operation to complete. > > > > Thanks, > > > > - Mark > > _______________________________________________ > > ffmpeg-devel mailing list > > ffmpeg-devel@ffmpeg.org > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > > > To unsubscribe, visit link above, or email > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". >
On Tue, 17 Oct 2023 at 19:34, Evgeny Pavlov <lucenticus@gmail.com> wrote: > > The reason for using av_usleep() here is that AMF API doesn’t provide an > API for explicit wait. There are two modes to get output from encoder: > > 1. Polling with some sleep to avoid CPU thrashing – currently used in FFmpeg > > 2. Set timeout parameter on AMF encoder and QueryOutput call will block > till output is available or the timeout happens. > > #2 is the preferable way but it is designed more to be used with a separate > polling thread. With a single-thread approach in FFmpeg, the use of timeout > can block input submission making things slower. This is even more > pronounced when B-frames are enabled and several inputs are needed to produce > the first output. > > The condition of this sleep is in special events (primarily when amf input > queue is full), not the core loop part. During the experiments the cpu > increasing is about 2-4% or so, not a burst. > > For low resolution encoding, these changes bring significant performance > improvement (about 15%). It will not bring improvement for high resolution > such as 4K. > > > Thanks, > > Evgeny > > вт, 17 окт. 2023 г. в 03:26, Zhao Zhili <quinkblack@foxmail.com>: > > > > > > 在 2023年10月17日,上午5:24,Mark Thompson <sw@jkqxz.net> 写道: > > > > > > On 16/10/2023 10:13, Evgeny Pavlov wrote: > > >> This commit reduces the sleep time on Windows to improve AMF encoding > > >> performance on low resolution input videos. > > >> This fix is for Windows only, because sleep() function isn't > > >> very accurate on Windows OS. > > >> Fix for issue #10622 > > >> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com> > > >> --- > > >> libavcodec/amfenc.c | 4 ++++ > > >> 1 file changed, 4 insertions(+) > > >> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c > > >> index 061859f85c..0c95465d6e 100644 > > >> --- a/libavcodec/amfenc.c > > >> +++ b/libavcodec/amfenc.c > > >> @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx, > > AVPacket *avpkt) > > >> if (query_output_data_flag == 0) { > > >> if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain > > || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= > > ctx->hwsurfaces_in_queue_max)) { > > >> block_and_wait = 1; > > >> +#ifdef _WIN32 > > >> + av_usleep(0); //Sleep() is not precise on Windows OS. > > >> +#else > > >> av_usleep(1000); > > >> +#endif > > >> } > > >> } > > >> } while (block_and_wait); > > > > > > Wasting lots of power by spinning on a CPU core does not seem like a > > good answer to this problem. (I mean, presumably that is why Windows isn't > > honouring your request for a short sleep, because it wants timers to have > > larger gaps to avoid wasting power.) > > > > If av_usleep is implemented via Sleep like current case, sleep 0 means > > yield current thread, so it’s not busy wait in normal case (but it can be > > busy wait). > > > > av_usleep(500) may looks better and do the same job by depending 500/1000 > > = 0. > > > > I agree use sleep without real async is like a bug. > > > > > > > > Why is there a sleep here at all, anyway? An API for hardware encoding > > should be providing a way for the caller to wait for an outstanding > > operation to complete. > > > > > > Thanks, > > > > > > - Mark > > > _______________________________________________ > > > ffmpeg-devel mailing list > > > ffmpeg-devel@ffmpeg.org > > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > > > > > To unsubscribe, visit link above, or email > > > > _______________________________________________ > > ffmpeg-devel mailing list > > ffmpeg-devel@ffmpeg.org > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > > > To unsubscribe, visit link above, or email > > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". > > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". Please don't top-post. I'll bottom-post now and no one will know how to read this email. If you need more precise sleep on Windows, your application should use timeBeginPeriod/timeEndPeriod API, see https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod This sleep shouldn't be there to begin with and removing it only for Windows, seems like a hacky workaround. Sleep on Windows is accurate, when you request a timer resolution appropriate for your application. You probably don't do that, and have unexpectedly long sleeps, but it is not because they are "not accurate", it is because you don't ask for it. Side note, with `Sleep()` you can request only 1 ms sleep, but with with waitable timers https://learn.microsoft.com/en-us/windows/win32/sync/waitable-timer-objects you can go down to 0.5 ms, which seems currently be the lowest interval that Windows kernel will wake anything up in practice. - Kacper
On Tue, Oct 17, 2023 at 9:45 PM Kacper Michajlow <kasper93@gmail.com> wrote: > On Tue, 17 Oct 2023 at 19:34, Evgeny Pavlov <lucenticus@gmail.com> wrote: > > > > The reason for using av_usleep() here is that AMF API doesn’t provide an > > API for explicit wait. There are two modes to get output from encoder: > > > > 1. Polling with some sleep to avoid CPU thrashing – currently used in > FFmpeg > > > > 2. Set timeout parameter on AMF encoder and QueryOutput call will block > > till output is available or the timeout happens. > > > > #2 is the preferable way but it is designed more to be used with a > separate > > polling thread. With a single-thread approach in FFmpeg, the use of > timeout > > can block input submission making things slower. This is even more > > pronounced when B-frames are enabled and several inputs are needed to > produce > > the first output. > > > > The condition of this sleep is in special events (primarily when amf > input > > queue is full), not the core loop part. During the experiments the cpu > > increasing is about 2-4% or so, not a burst. > > > > For low resolution encoding, these changes bring significant performance > > improvement (about 15%). It will not bring improvement for high > resolution > > such as 4K. > > > > > > Thanks, > > > > Evgeny > > > > вт, 17 окт. 2023 г. в 03:26, Zhao Zhili <quinkblack@foxmail.com>: > > > > > > > > > 在 2023年10月17日,上午5:24,Mark Thompson <sw@jkqxz.net> 写道: > > > > > > > > On 16/10/2023 10:13, Evgeny Pavlov wrote: > > > >> This commit reduces the sleep time on Windows to improve AMF > encoding > > > >> performance on low resolution input videos. > > > >> This fix is for Windows only, because sleep() function isn't > > > >> very accurate on Windows OS. > > > >> Fix for issue #10622 > > > >> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com> > > > >> --- > > > >> libavcodec/amfenc.c | 4 ++++ > > > >> 1 file changed, 4 insertions(+) > > > >> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c > > > >> index 061859f85c..0c95465d6e 100644 > > > >> --- a/libavcodec/amfenc.c > > > >> +++ b/libavcodec/amfenc.c > > > >> @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext > *avctx, > > > AVPacket *avpkt) > > > >> if (query_output_data_flag == 0) { > > > >> if (res_resubmit == AMF_INPUT_FULL || > ctx->delayed_drain > > > || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= > > > ctx->hwsurfaces_in_queue_max)) { > > > >> block_and_wait = 1; > > > >> +#ifdef _WIN32 > > > >> + av_usleep(0); //Sleep() is not precise on Windows > OS. > > > >> +#else > > > >> av_usleep(1000); > > > >> +#endif > > > >> } > > > >> } > > > >> } while (block_and_wait); > > > > > > > > Wasting lots of power by spinning on a CPU core does not seem like a > > > good answer to this problem. (I mean, presumably that is why Windows > isn't > > > honouring your request for a short sleep, because it wants timers to > have > > > larger gaps to avoid wasting power.) > > > > > > If av_usleep is implemented via Sleep like current case, sleep 0 means > > > yield current thread, so it’s not busy wait in normal case (but it can > be > > > busy wait). > > > > > > av_usleep(500) may looks better and do the same job by depending > 500/1000 > > > = 0. > > > > > > I agree use sleep without real async is like a bug. > > > > > > > > > > > Why is there a sleep here at all, anyway? An API for hardware > encoding > > > should be providing a way for the caller to wait for an outstanding > > > operation to complete. > > > > > > > > Thanks, > > > > > > > > - Mark > > > > _______________________________________________ > > > > ffmpeg-devel mailing list > > > > ffmpeg-devel@ffmpeg.org > > > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > > > > > > > To unsubscribe, visit link above, or email > > > > > > _______________________________________________ > > > ffmpeg-devel mailing list > > > ffmpeg-devel@ffmpeg.org > > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > > > > > To unsubscribe, visit link above, or email > > > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". > > > > > _______________________________________________ > > ffmpeg-devel mailing list > > ffmpeg-devel@ffmpeg.org > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > > > To unsubscribe, visit link above, or email > > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". > > Please don't top-post. I'll bottom-post now and no one will know how > to read this email. > > If you need more precise sleep on Windows, your application should use > timeBeginPeriod/timeEndPeriod API, see > > https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod > > This sleep shouldn't be there to begin with and removing it only for > Windows, seems like a hacky workaround. > > Sleep on Windows is accurate, when you request a timer resolution > appropriate for your application. You probably don't do that, and have > unexpectedly long sleeps, but it is not because they are "not > accurate", it is because you don't ask for it. > > Side note, with `Sleep()` you can request only 1 ms sleep, but with > with waitable timers > https://learn.microsoft.com/en-us/windows/win32/sync/waitable-timer-objects > you can go down to 0.5 ms, which seems currently be the lowest > interval that Windows kernel will wake anything up in practice. > > - Kacper > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". > I can use the code similar from AMF with timeBeginPeriod/timeEndPeriod API(please ignore commented code) AMF/amf/public/common/Windows/ThreadWindows.cpp at master · GPUOpen-LibrariesAndSDKs/AMF · GitHub <https://github.com/GPUOpen-LibrariesAndSDKs/AMF/blob/master/amf/public/common/Windows/ThreadWindows.cpp#L303> But in my opinion, an alternative suggestion from Zhao Zhili to use av_usleep(500) will be more suitable for ffmpeg, because I found similar code for QSV components in ffmpeg.
diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c index 061859f85c..0c95465d6e 100644 --- a/libavcodec/amfenc.c +++ b/libavcodec/amfenc.c @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt) if (query_output_data_flag == 0) { if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) { block_and_wait = 1; +#ifdef _WIN32 + av_usleep(0); //Sleep() is not precise on Windows OS. +#else av_usleep(1000); +#endif } } } while (block_and_wait);
This commit reduces the sleep time on Windows to improve AMF encoding performance on low resolution input videos. This fix is for Windows only, because sleep() function isn't very accurate on Windows OS. Fix for issue #10622 Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com> --- libavcodec/amfenc.c | 4 ++++ 1 file changed, 4 insertions(+)