diff mbox series

[FFmpeg-devel] avcodec/amfenc: Fix for windows imprecise sleep

Message ID 20231016091402.7972-1-lucenticus@gmail.com
State New
Headers show
Series [FFmpeg-devel] avcodec/amfenc: Fix for windows imprecise sleep | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Evgeny Pavlov Oct. 16, 2023, 9:13 a.m. UTC
This commit reduces the sleep time on Windows to improve AMF encoding
performance on low resolution input videos.
This fix is for Windows only, because sleep() function isn't
very accurate on Windows OS.

Fix for issue #10622

Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
---
 libavcodec/amfenc.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Mark Thompson Oct. 16, 2023, 9:24 p.m. UTC | #1
On 16/10/2023 10:13, Evgeny Pavlov wrote:
> This commit reduces the sleep time on Windows to improve AMF encoding
> performance on low resolution input videos.
> This fix is for Windows only, because sleep() function isn't
> very accurate on Windows OS.
> 
> Fix for issue #10622
> 
> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
> ---
>   libavcodec/amfenc.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
> index 061859f85c..0c95465d6e 100644
> --- a/libavcodec/amfenc.c
> +++ b/libavcodec/amfenc.c
> @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
>           if (query_output_data_flag == 0) {
>               if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) {
>                   block_and_wait = 1;
> +#ifdef _WIN32
> +                av_usleep(0); //Sleep() is not precise on Windows OS.
> +#else
>                   av_usleep(1000);
> +#endif
>               }
>           }
>       } while (block_and_wait);

Wasting lots of power by spinning on a CPU core does not seem like a good answer to this problem.  (I mean, presumably that is why Windows isn't honouring your request for a short sleep, because it wants timers to have larger gaps to avoid wasting power.)

Why is there a sleep here at all, anyway?  An API for hardware encoding should be providing a way for the caller to wait for an outstanding operation to complete.

Thanks,

- Mark
Zhao Zhili Oct. 17, 2023, 1:25 a.m. UTC | #2
> 在 2023年10月17日,上午5:24,Mark Thompson <sw@jkqxz.net> 写道:
> 
> On 16/10/2023 10:13, Evgeny Pavlov wrote:
>> This commit reduces the sleep time on Windows to improve AMF encoding
>> performance on low resolution input videos.
>> This fix is for Windows only, because sleep() function isn't
>> very accurate on Windows OS.
>> Fix for issue #10622
>> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
>> ---
>>  libavcodec/amfenc.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
>> index 061859f85c..0c95465d6e 100644
>> --- a/libavcodec/amfenc.c
>> +++ b/libavcodec/amfenc.c
>> @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
>>          if (query_output_data_flag == 0) {
>>              if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) {
>>                  block_and_wait = 1;
>> +#ifdef _WIN32
>> +                av_usleep(0); //Sleep() is not precise on Windows OS.
>> +#else
>>                  av_usleep(1000);
>> +#endif
>>              }
>>          }
>>      } while (block_and_wait);
> 
> Wasting lots of power by spinning on a CPU core does not seem like a good answer to this problem.  (I mean, presumably that is why Windows isn't honouring your request for a short sleep, because it wants timers to have larger gaps to avoid wasting power.)

If av_usleep is implemented via Sleep like current case, sleep 0 means yield current thread, so it’s not busy wait in normal case (but it can be busy wait).

av_usleep(500) may looks better and do the same job by depending 500/1000 = 0.

I agree use sleep without real async is like a bug.

> 
> Why is there a sleep here at all, anyway?  An API for hardware encoding should be providing a way for the caller to wait for an outstanding operation to complete.
> 
> Thanks,
> 
> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
Evgeny Pavlov Oct. 17, 2023, 5:11 p.m. UTC | #3
The reason for using av_usleep() here is that AMF API doesn’t provide an
API for explicit wait. There are two modes to get output from encoder:

1. Polling with some sleep to avoid CPU thrashing – currently used in FFmpeg

2. Set timeout parameter on AMF encoder and QueryOutput call will block
till output is available or the timeout happens.

#2 is the preferable way but it is designed more to be used with a separate
polling thread. With a single-thread approach in FFmpeg, the use of timeout
can block input submission making things slower.  This is even more
pronounced when B-frames are enabled and several inputs are needed to produce
the first output.

The condition of this sleep is in special events (primarily when amf input
queue is full), not the core loop part. During the experiments the cpu
increasing is about 2-4% or so, not a burst.

For low resolution encoding,  these changes bring significant performance
improvement (about 15%). It will not bring improvement for high resolution
such as 4K.


Thanks,

Evgeny

вт, 17 окт. 2023 г. в 03:26, Zhao Zhili <quinkblack@foxmail.com>:

>
> > 在 2023年10月17日,上午5:24,Mark Thompson <sw@jkqxz.net> 写道:
> >
> > On 16/10/2023 10:13, Evgeny Pavlov wrote:
> >> This commit reduces the sleep time on Windows to improve AMF encoding
> >> performance on low resolution input videos.
> >> This fix is for Windows only, because sleep() function isn't
> >> very accurate on Windows OS.
> >> Fix for issue #10622
> >> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
> >> ---
> >>  libavcodec/amfenc.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
> >> index 061859f85c..0c95465d6e 100644
> >> --- a/libavcodec/amfenc.c
> >> +++ b/libavcodec/amfenc.c
> >> @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx,
> AVPacket *avpkt)
> >>          if (query_output_data_flag == 0) {
> >>              if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain
> || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >=
> ctx->hwsurfaces_in_queue_max)) {
> >>                  block_and_wait = 1;
> >> +#ifdef _WIN32
> >> +                av_usleep(0); //Sleep() is not precise on Windows OS.
> >> +#else
> >>                  av_usleep(1000);
> >> +#endif
> >>              }
> >>          }
> >>      } while (block_and_wait);
> >
> > Wasting lots of power by spinning on a CPU core does not seem like a
> good answer to this problem.  (I mean, presumably that is why Windows isn't
> honouring your request for a short sleep, because it wants timers to have
> larger gaps to avoid wasting power.)
>
> If av_usleep is implemented via Sleep like current case, sleep 0 means
> yield current thread, so it’s not busy wait in normal case (but it can be
> busy wait).
>
> av_usleep(500) may looks better and do the same job by depending 500/1000
> = 0.
>
> I agree use sleep without real async is like a bug.
>
> >
> > Why is there a sleep here at all, anyway?  An API for hardware encoding
> should be providing a way for the caller to wait for an outstanding
> operation to complete.
> >
> > Thanks,
> >
> > - Mark
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel@ffmpeg.org
> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > To unsubscribe, visit link above, or email
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Kacper Michajłow Oct. 17, 2023, 7:45 p.m. UTC | #4
On Tue, 17 Oct 2023 at 19:34, Evgeny Pavlov <lucenticus@gmail.com> wrote:
>
> The reason for using av_usleep() here is that AMF API doesn’t provide an
> API for explicit wait. There are two modes to get output from encoder:
>
> 1. Polling with some sleep to avoid CPU thrashing – currently used in FFmpeg
>
> 2. Set timeout parameter on AMF encoder and QueryOutput call will block
> till output is available or the timeout happens.
>
> #2 is the preferable way but it is designed more to be used with a separate
> polling thread. With a single-thread approach in FFmpeg, the use of timeout
> can block input submission making things slower.  This is even more
> pronounced when B-frames are enabled and several inputs are needed to produce
> the first output.
>
> The condition of this sleep is in special events (primarily when amf input
> queue is full), not the core loop part. During the experiments the cpu
> increasing is about 2-4% or so, not a burst.
>
> For low resolution encoding,  these changes bring significant performance
> improvement (about 15%). It will not bring improvement for high resolution
> such as 4K.
>
>
> Thanks,
>
> Evgeny
>
> вт, 17 окт. 2023 г. в 03:26, Zhao Zhili <quinkblack@foxmail.com>:
>
> >
> > > 在 2023年10月17日,上午5:24,Mark Thompson <sw@jkqxz.net> 写道:
> > >
> > > On 16/10/2023 10:13, Evgeny Pavlov wrote:
> > >> This commit reduces the sleep time on Windows to improve AMF encoding
> > >> performance on low resolution input videos.
> > >> This fix is for Windows only, because sleep() function isn't
> > >> very accurate on Windows OS.
> > >> Fix for issue #10622
> > >> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
> > >> ---
> > >>  libavcodec/amfenc.c | 4 ++++
> > >>  1 file changed, 4 insertions(+)
> > >> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
> > >> index 061859f85c..0c95465d6e 100644
> > >> --- a/libavcodec/amfenc.c
> > >> +++ b/libavcodec/amfenc.c
> > >> @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx,
> > AVPacket *avpkt)
> > >>          if (query_output_data_flag == 0) {
> > >>              if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain
> > || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >=
> > ctx->hwsurfaces_in_queue_max)) {
> > >>                  block_and_wait = 1;
> > >> +#ifdef _WIN32
> > >> +                av_usleep(0); //Sleep() is not precise on Windows OS.
> > >> +#else
> > >>                  av_usleep(1000);
> > >> +#endif
> > >>              }
> > >>          }
> > >>      } while (block_and_wait);
> > >
> > > Wasting lots of power by spinning on a CPU core does not seem like a
> > good answer to this problem.  (I mean, presumably that is why Windows isn't
> > honouring your request for a short sleep, because it wants timers to have
> > larger gaps to avoid wasting power.)
> >
> > If av_usleep is implemented via Sleep like current case, sleep 0 means
> > yield current thread, so it’s not busy wait in normal case (but it can be
> > busy wait).
> >
> > av_usleep(500) may looks better and do the same job by depending 500/1000
> > = 0.
> >
> > I agree use sleep without real async is like a bug.
> >
> > >
> > > Why is there a sleep here at all, anyway?  An API for hardware encoding
> > should be providing a way for the caller to wait for an outstanding
> > operation to complete.
> > >
> > > Thanks,
> > >
> > > - Mark
> > > _______________________________________________
> > > ffmpeg-devel mailing list
> > > ffmpeg-devel@ffmpeg.org
> > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> > >
> > > To unsubscribe, visit link above, or email
> >
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel@ffmpeg.org
> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > To unsubscribe, visit link above, or email
> > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
> >
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

Please don't top-post. I'll bottom-post now and no one will know how
to read this email.

If you need more precise sleep on Windows, your application should use
timeBeginPeriod/timeEndPeriod API, see
https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod

This sleep shouldn't be there to begin with and removing it only for
Windows, seems like a hacky workaround.

Sleep on Windows is accurate, when you request a timer resolution
appropriate for your application. You probably don't do that, and have
unexpectedly long sleeps, but it is not because they are "not
accurate", it is because you don't ask for it.

Side note, with `Sleep()` you can request only 1 ms sleep, but with
with waitable timers
https://learn.microsoft.com/en-us/windows/win32/sync/waitable-timer-objects
you can go down to 0.5 ms, which seems currently be the lowest
interval that Windows kernel will wake anything up in practice.

- Kacper
Evgeny Pavlov Oct. 18, 2023, 10:32 a.m. UTC | #5
On Tue, Oct 17, 2023 at 9:45 PM Kacper Michajlow <kasper93@gmail.com> wrote:

> On Tue, 17 Oct 2023 at 19:34, Evgeny Pavlov <lucenticus@gmail.com> wrote:
> >
> > The reason for using av_usleep() here is that AMF API doesn’t provide an
> > API for explicit wait. There are two modes to get output from encoder:
> >
> > 1. Polling with some sleep to avoid CPU thrashing – currently used in
> FFmpeg
> >
> > 2. Set timeout parameter on AMF encoder and QueryOutput call will block
> > till output is available or the timeout happens.
> >
> > #2 is the preferable way but it is designed more to be used with a
> separate
> > polling thread. With a single-thread approach in FFmpeg, the use of
> timeout
> > can block input submission making things slower.  This is even more
> > pronounced when B-frames are enabled and several inputs are needed to
> produce
> > the first output.
> >
> > The condition of this sleep is in special events (primarily when amf
> input
> > queue is full), not the core loop part. During the experiments the cpu
> > increasing is about 2-4% or so, not a burst.
> >
> > For low resolution encoding,  these changes bring significant performance
> > improvement (about 15%). It will not bring improvement for high
> resolution
> > such as 4K.
> >
> >
> > Thanks,
> >
> > Evgeny
> >
> > вт, 17 окт. 2023 г. в 03:26, Zhao Zhili <quinkblack@foxmail.com>:
> >
> > >
> > > > 在 2023年10月17日,上午5:24,Mark Thompson <sw@jkqxz.net> 写道:
> > > >
> > > > On 16/10/2023 10:13, Evgeny Pavlov wrote:
> > > >> This commit reduces the sleep time on Windows to improve AMF
> encoding
> > > >> performance on low resolution input videos.
> > > >> This fix is for Windows only, because sleep() function isn't
> > > >> very accurate on Windows OS.
> > > >> Fix for issue #10622
> > > >> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
> > > >> ---
> > > >>  libavcodec/amfenc.c | 4 ++++
> > > >>  1 file changed, 4 insertions(+)
> > > >> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
> > > >> index 061859f85c..0c95465d6e 100644
> > > >> --- a/libavcodec/amfenc.c
> > > >> +++ b/libavcodec/amfenc.c
> > > >> @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext
> *avctx,
> > > AVPacket *avpkt)
> > > >>          if (query_output_data_flag == 0) {
> > > >>              if (res_resubmit == AMF_INPUT_FULL ||
> ctx->delayed_drain
> > > || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >=
> > > ctx->hwsurfaces_in_queue_max)) {
> > > >>                  block_and_wait = 1;
> > > >> +#ifdef _WIN32
> > > >> +                av_usleep(0); //Sleep() is not precise on Windows
> OS.
> > > >> +#else
> > > >>                  av_usleep(1000);
> > > >> +#endif
> > > >>              }
> > > >>          }
> > > >>      } while (block_and_wait);
> > > >
> > > > Wasting lots of power by spinning on a CPU core does not seem like a
> > > good answer to this problem.  (I mean, presumably that is why Windows
> isn't
> > > honouring your request for a short sleep, because it wants timers to
> have
> > > larger gaps to avoid wasting power.)
> > >
> > > If av_usleep is implemented via Sleep like current case, sleep 0 means
> > > yield current thread, so it’s not busy wait in normal case (but it can
> be
> > > busy wait).
> > >
> > > av_usleep(500) may looks better and do the same job by depending
> 500/1000
> > > = 0.
> > >
> > > I agree use sleep without real async is like a bug.
> > >
> > > >
> > > > Why is there a sleep here at all, anyway?  An API for hardware
> encoding
> > > should be providing a way for the caller to wait for an outstanding
> > > operation to complete.
> > > >
> > > > Thanks,
> > > >
> > > > - Mark
> > > > _______________________________________________
> > > > ffmpeg-devel mailing list
> > > > ffmpeg-devel@ffmpeg.org
> > > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> > > >
> > > > To unsubscribe, visit link above, or email
> > >
> > > _______________________________________________
> > > ffmpeg-devel mailing list
> > > ffmpeg-devel@ffmpeg.org
> > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> > >
> > > To unsubscribe, visit link above, or email
> > > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
> > >
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel@ffmpeg.org
> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > To unsubscribe, visit link above, or email
> > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
> Please don't top-post. I'll bottom-post now and no one will know how
> to read this email.
>
> If you need more precise sleep on Windows, your application should use
> timeBeginPeriod/timeEndPeriod API, see
>
> https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
>
> This sleep shouldn't be there to begin with and removing it only for
> Windows, seems like a hacky workaround.
>
> Sleep on Windows is accurate, when you request a timer resolution
> appropriate for your application. You probably don't do that, and have
> unexpectedly long sleeps, but it is not because they are "not
> accurate", it is because you don't ask for it.
>
> Side note, with `Sleep()` you can request only 1 ms sleep, but with
> with waitable timers
> https://learn.microsoft.com/en-us/windows/win32/sync/waitable-timer-objects
> you can go down to 0.5 ms, which seems currently be the lowest
> interval that Windows kernel will wake anything up in practice.
>
> - Kacper
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>

I can use the code similar from AMF with timeBeginPeriod/timeEndPeriod
API(please ignore commented code)
AMF/amf/public/common/Windows/ThreadWindows.cpp
at master · GPUOpen-LibrariesAndSDKs/AMF · GitHub
<https://github.com/GPUOpen-LibrariesAndSDKs/AMF/blob/master/amf/public/common/Windows/ThreadWindows.cpp#L303>
But in my opinion, an alternative suggestion from Zhao Zhili to use
av_usleep(500) will be more suitable for ffmpeg,
because I found similar code for QSV components in ffmpeg.
diff mbox series

Patch

diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
index 061859f85c..0c95465d6e 100644
--- a/libavcodec/amfenc.c
+++ b/libavcodec/amfenc.c
@@ -770,7 +770,11 @@  int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
         if (query_output_data_flag == 0) {
             if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) {
                 block_and_wait = 1;
+#ifdef _WIN32
+                av_usleep(0); //Sleep() is not precise on Windows OS.
+#else
                 av_usleep(1000);
+#endif
             }
         }
     } while (block_and_wait);