diff mbox

[FFmpeg-devel] pthread_frame: make accesses to debug field be protected by owner lock.

Message ID 1491500921-18604-1-git-send-email-rsbultje@gmail.com
State Accepted
Headers show

Commit Message

Ronald S. Bultje April 6, 2017, 5:48 p.m. UTC
The av_log() is done outside the lock, but this way the accesses to the
field (reads and writes) are always protected by a mutex. The av_log()
is not run inside the lock context because it may involve user callbacks
and doing that in performance-sensitive code is probably not a good idea.

This should fix occasional tsan warnings when running fate-h264, like:

WARNING: ThreadSanitizer: data race (pid=10916)
  Write of size 4 at 0x7d64000174fc by main thread (mutexes: write M2313):
    #0 update_context_from_user src/libavcodec/pthread_frame.c:335 (ffmpeg+0x000000df7b06)
[..]
  Previous read of size 4 at 0x7d64000174fc by thread T1 (mutexes: write M2311):
    #0 ff_thread_await_progress src/libavcodec/pthread_frame.c:592 (ffmpeg+0x000000df8b3e)
---
 libavcodec/pthread_frame.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

Comments

Kieran Kunhya April 6, 2017, 6:07 p.m. UTC | #1
On Thu, 6 Apr 2017 at 18:48 Ronald S. Bultje <rsbultje@gmail.com> wrote:

> The av_log() is done outside the lock, but this way the accesses to the
> field (reads and writes) are always protected by a mutex. The av_log()
> is not run inside the lock context because it may involve user callbacks
> and doing that in performance-sensitive code is probably not a good idea.
>
> This should fix occasional tsan warnings when running fate-h264, like:
>

Looks fine.

Kieran
wm4 April 7, 2017, 6:37 a.m. UTC | #2
On Thu,  6 Apr 2017 13:48:41 -0400
"Ronald S. Bultje" <rsbultje@gmail.com> wrote:

> The av_log() is done outside the lock, but this way the accesses to the
> field (reads and writes) are always protected by a mutex. The av_log()
> is not run inside the lock context because it may involve user callbacks
> and doing that in performance-sensitive code is probably not a good idea.
> 
> This should fix occasional tsan warnings when running fate-h264, like:
> 
> WARNING: ThreadSanitizer: data race (pid=10916)
>   Write of size 4 at 0x7d64000174fc by main thread (mutexes: write M2313):
>     #0 update_context_from_user src/libavcodec/pthread_frame.c:335 (ffmpeg+0x000000df7b06)
> [..]
>   Previous read of size 4 at 0x7d64000174fc by thread T1 (mutexes: write M2311):
>     #0 ff_thread_await_progress src/libavcodec/pthread_frame.c:592 (ffmpeg+0x000000df8b3e)
> ---
>  libavcodec/pthread_frame.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
> index c246c2f..8857bfc 100644
> --- a/libavcodec/pthread_frame.c
> +++ b/libavcodec/pthread_frame.c
> @@ -559,6 +559,7 @@ void ff_thread_report_progress(ThreadFrame *f, int n, int field)
>  {
>      PerThreadContext *p;
>      atomic_int *progress = f->progress ? (atomic_int*)f->progress->data : NULL;
> +    int debug_mv;
>  
>      if (!progress ||
>          atomic_load_explicit(&progress[field], memory_order_relaxed) >= n)
> @@ -566,22 +567,24 @@ void ff_thread_report_progress(ThreadFrame *f, int n, int field)
>  
>      p = f->owner[field]->internal->thread_ctx;
>  
> -    if (f->owner[field]->debug&FF_DEBUG_THREADS)
> -        av_log(f->owner[field], AV_LOG_DEBUG,
> -               "%p finished %d field %d\n", progress, n, field);
> -
>      pthread_mutex_lock(&p->progress_mutex);
> +    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
>  
>      atomic_store_explicit(&progress[field], n, memory_order_release);
>  
>      pthread_cond_broadcast(&p->progress_cond);
>      pthread_mutex_unlock(&p->progress_mutex);
> +
> +    if (debug_mv)
> +        av_log(f->owner[field], AV_LOG_DEBUG,
> +               "%p finished %d field %d\n", progress, n, field);
>  }
>  
>  void ff_thread_await_progress(ThreadFrame *f, int n, int field)
>  {
>      PerThreadContext *p;
>      atomic_int *progress = f->progress ? (atomic_int*)f->progress->data : NULL;
> +    int debug_mv;
>  
>      if (!progress ||
>          atomic_load_explicit(&progress[field], memory_order_acquire) >= n)
> @@ -589,14 +592,15 @@ void ff_thread_await_progress(ThreadFrame *f, int n, int field)
>  
>      p = f->owner[field]->internal->thread_ctx;
>  
> -    if (f->owner[field]->debug&FF_DEBUG_THREADS)
> -        av_log(f->owner[field], AV_LOG_DEBUG,
> -               "thread awaiting %d field %d from %p\n", n, field, progress);
> -
>      pthread_mutex_lock(&p->progress_mutex);
> +    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
>      while (atomic_load_explicit(&progress[field], memory_order_relaxed) < n)
>          pthread_cond_wait(&p->progress_cond, &p->progress_mutex);
>      pthread_mutex_unlock(&p->progress_mutex);
> +
> +    if (debug_mv)
> +        av_log(f->owner[field], AV_LOG_DEBUG,
> +               "thread awaited %d field %d from %p\n", n, field, progress);
>  }
>  
>  void ff_thread_finish_setup(AVCodecContext *avctx) {

I'm not sure what "debug_mv" stands for, and I think this is a bit
overkill. It's run only when FF_DEBUG_THREADS is set (who even sets
that, and why does access to it need to be synchronized?), so I see no
big offense in calling av_log() inside of it.

But other than that LGTM.
Ronald S. Bultje April 7, 2017, 12:30 p.m. UTC | #3
Hi,

On Fri, Apr 7, 2017 at 2:37 AM, wm4 <nfxjfg@googlemail.com> wrote:

> On Thu,  6 Apr 2017 13:48:41 -0400
> "Ronald S. Bultje" <rsbultje@gmail.com> wrote:
>
> > The av_log() is done outside the lock, but this way the accesses to the
> > field (reads and writes) are always protected by a mutex. The av_log()
> > is not run inside the lock context because it may involve user callbacks
> > and doing that in performance-sensitive code is probably not a good idea.
> >
> > This should fix occasional tsan warnings when running fate-h264, like:
> >
> > WARNING: ThreadSanitizer: data race (pid=10916)
> >   Write of size 4 at 0x7d64000174fc by main thread (mutexes: write
> M2313):
> >     #0 update_context_from_user src/libavcodec/pthread_frame.c:335
> (ffmpeg+0x000000df7b06)
> > [..]
> >   Previous read of size 4 at 0x7d64000174fc by thread T1 (mutexes: write
> M2311):
> >     #0 ff_thread_await_progress src/libavcodec/pthread_frame.c:592
> (ffmpeg+0x000000df8b3e)
> > ---
> >  libavcodec/pthread_frame.c | 20 ++++++++++++--------
> >  1 file changed, 12 insertions(+), 8 deletions(-)
> >
> > diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
> > index c246c2f..8857bfc 100644
> > --- a/libavcodec/pthread_frame.c
> > +++ b/libavcodec/pthread_frame.c
> > @@ -559,6 +559,7 @@ void ff_thread_report_progress(ThreadFrame *f, int
> n, int field)
> >  {
> >      PerThreadContext *p;
> >      atomic_int *progress = f->progress ? (atomic_int*)f->progress->data
> : NULL;
> > +    int debug_mv;
> >
> >      if (!progress ||
> >          atomic_load_explicit(&progress[field], memory_order_relaxed)
> >= n)
> > @@ -566,22 +567,24 @@ void ff_thread_report_progress(ThreadFrame *f,
> int n, int field)
> >
> >      p = f->owner[field]->internal->thread_ctx;
> >
> > -    if (f->owner[field]->debug&FF_DEBUG_THREADS)
> > -        av_log(f->owner[field], AV_LOG_DEBUG,
> > -               "%p finished %d field %d\n", progress, n, field);
> > -
> >      pthread_mutex_lock(&p->progress_mutex);
> > +    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
> >
> >      atomic_store_explicit(&progress[field], n, memory_order_release);
> >
> >      pthread_cond_broadcast(&p->progress_cond);
> >      pthread_mutex_unlock(&p->progress_mutex);
> > +
> > +    if (debug_mv)
> > +        av_log(f->owner[field], AV_LOG_DEBUG,
> > +               "%p finished %d field %d\n", progress, n, field);
> >  }
> >
> >  void ff_thread_await_progress(ThreadFrame *f, int n, int field)
> >  {
> >      PerThreadContext *p;
> >      atomic_int *progress = f->progress ? (atomic_int*)f->progress->data
> : NULL;
> > +    int debug_mv;
> >
> >      if (!progress ||
> >          atomic_load_explicit(&progress[field], memory_order_acquire)
> >= n)
> > @@ -589,14 +592,15 @@ void ff_thread_await_progress(ThreadFrame *f, int
> n, int field)
> >
> >      p = f->owner[field]->internal->thread_ctx;
> >
> > -    if (f->owner[field]->debug&FF_DEBUG_THREADS)
> > -        av_log(f->owner[field], AV_LOG_DEBUG,
> > -               "thread awaiting %d field %d from %p\n", n, field,
> progress);
> > -
> >      pthread_mutex_lock(&p->progress_mutex);
> > +    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
> >      while (atomic_load_explicit(&progress[field],
> memory_order_relaxed) < n)
> >          pthread_cond_wait(&p->progress_cond, &p->progress_mutex);
> >      pthread_mutex_unlock(&p->progress_mutex);
> > +
> > +    if (debug_mv)
> > +        av_log(f->owner[field], AV_LOG_DEBUG,
> > +               "thread awaited %d field %d from %p\n", n, field,
> progress);
> >  }
> >
> >  void ff_thread_finish_setup(AVCodecContext *avctx) {
>
> I'm not sure what "debug_mv" stands for, and I think this is a bit
> overkill. It's run only when FF_DEBUG_THREADS is set (who even sets
> that,


I actually occasionally use it.

and why does access to it need to be synchronized?),


Now that's a really good question.

Let me give my explanation (which is probably similar to Clement's) and
then we can see if others agree with it.

So first of all: is there a race condition here? Really, no. Sort of. A
race condition is (for me) non-static behaviour in output in response to
otherwise identical input. So imagine the following code fragment:

avctx = alloc(threads=2); open(avctx);
for (i=0;i<5;i++)
    decode(avctx, frame);
close(avctx);

Is there a race condition (with my definition)? No. The code will always
give the same return values. Now, let's look at this code fragment:

avctx = alloc(threads=2); open(avctx);
for (i=0;i<5;i++) {
    decode(avctx, frame);
    if (i == 2) { avctx->debug |= THREADS; av_log_set_level(DEBUG); }
}
close(avctx);

Here, a week ago, you got a race condition because user threads would be
synchronized with the next worker thread unlocked, thus allowing
debug&THREADS to be updated mid-frame in an active thread. This was fixed
in 1269cd5.

Is this a big deal? No, honestly. But it's sort-of a race. If you wanted to
debug threading, at least it know always produces the same output.

This patch does not solve a race of that kind. It just removes a tsan
warning about a protected write and an unprotected read in an unrelated
thread that is waiting for the "owner" thread. So why would we do that? My
answer is: primarily to make tsan less noisy. It found "fake" (like this)
and "real" (some having real implications) races, and we will never find
the real races unless we silence the fake ones also. As long as I can do
that without affecting performance and code readability, I'm happy to do
so. If it affects readability or performance, I will obviously leave it
as-is.

so I see no big offense in calling av_log() inside of it.


There's that also... I have no strong opinion on that.

Ronald
wm4 April 7, 2017, 1:36 p.m. UTC | #4
On Fri, 7 Apr 2017 08:30:00 -0400
"Ronald S. Bultje" <rsbultje@gmail.com> wrote:

> Hi,
> 
> On Fri, Apr 7, 2017 at 2:37 AM, wm4 <nfxjfg@googlemail.com> wrote:
> 
> > On Thu,  6 Apr 2017 13:48:41 -0400
> > "Ronald S. Bultje" <rsbultje@gmail.com> wrote:
> >  
> > > The av_log() is done outside the lock, but this way the accesses to the
> > > field (reads and writes) are always protected by a mutex. The av_log()
> > > is not run inside the lock context because it may involve user callbacks
> > > and doing that in performance-sensitive code is probably not a good idea.
> > >
> > > This should fix occasional tsan warnings when running fate-h264, like:
> > >
> > > WARNING: ThreadSanitizer: data race (pid=10916)
> > >   Write of size 4 at 0x7d64000174fc by main thread (mutexes: write  
> > M2313):  
> > >     #0 update_context_from_user src/libavcodec/pthread_frame.c:335  
> > (ffmpeg+0x000000df7b06)  
> > > [..]
> > >   Previous read of size 4 at 0x7d64000174fc by thread T1 (mutexes: write  
> > M2311):  
> > >     #0 ff_thread_await_progress src/libavcodec/pthread_frame.c:592  
> > (ffmpeg+0x000000df8b3e)  
> > > ---
> > >  libavcodec/pthread_frame.c | 20 ++++++++++++--------
> > >  1 file changed, 12 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
> > > index c246c2f..8857bfc 100644
> > > --- a/libavcodec/pthread_frame.c
> > > +++ b/libavcodec/pthread_frame.c
> > > @@ -559,6 +559,7 @@ void ff_thread_report_progress(ThreadFrame *f, int  
> > n, int field)  
> > >  {
> > >      PerThreadContext *p;
> > >      atomic_int *progress = f->progress ? (atomic_int*)f->progress->data  
> > : NULL;  
> > > +    int debug_mv;
> > >
> > >      if (!progress ||
> > >          atomic_load_explicit(&progress[field], memory_order_relaxed)
> > >= n)
> > > @@ -566,22 +567,24 @@ void ff_thread_report_progress(ThreadFrame *f,  
> > int n, int field)  
> > >
> > >      p = f->owner[field]->internal->thread_ctx;
> > >
> > > -    if (f->owner[field]->debug&FF_DEBUG_THREADS)
> > > -        av_log(f->owner[field], AV_LOG_DEBUG,
> > > -               "%p finished %d field %d\n", progress, n, field);
> > > -
> > >      pthread_mutex_lock(&p->progress_mutex);
> > > +    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
> > >
> > >      atomic_store_explicit(&progress[field], n, memory_order_release);
> > >
> > >      pthread_cond_broadcast(&p->progress_cond);
> > >      pthread_mutex_unlock(&p->progress_mutex);
> > > +
> > > +    if (debug_mv)
> > > +        av_log(f->owner[field], AV_LOG_DEBUG,
> > > +               "%p finished %d field %d\n", progress, n, field);
> > >  }
> > >
> > >  void ff_thread_await_progress(ThreadFrame *f, int n, int field)
> > >  {
> > >      PerThreadContext *p;
> > >      atomic_int *progress = f->progress ? (atomic_int*)f->progress->data  
> > : NULL;  
> > > +    int debug_mv;
> > >
> > >      if (!progress ||
> > >          atomic_load_explicit(&progress[field], memory_order_acquire)
> > >= n)
> > > @@ -589,14 +592,15 @@ void ff_thread_await_progress(ThreadFrame *f, int  
> > n, int field)  
> > >
> > >      p = f->owner[field]->internal->thread_ctx;
> > >
> > > -    if (f->owner[field]->debug&FF_DEBUG_THREADS)
> > > -        av_log(f->owner[field], AV_LOG_DEBUG,
> > > -               "thread awaiting %d field %d from %p\n", n, field,  
> > progress);  
> > > -
> > >      pthread_mutex_lock(&p->progress_mutex);
> > > +    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
> > >      while (atomic_load_explicit(&progress[field],  
> > memory_order_relaxed) < n)  
> > >          pthread_cond_wait(&p->progress_cond, &p->progress_mutex);
> > >      pthread_mutex_unlock(&p->progress_mutex);
> > > +
> > > +    if (debug_mv)
> > > +        av_log(f->owner[field], AV_LOG_DEBUG,
> > > +               "thread awaited %d field %d from %p\n", n, field,  
> > progress);  
> > >  }
> > >
> > >  void ff_thread_finish_setup(AVCodecContext *avctx) {  
> >
> > I'm not sure what "debug_mv" stands for, and I think this is a bit
> > overkill. It's run only when FF_DEBUG_THREADS is set (who even sets
> > that,  
> 
> 
> I actually occasionally use it.
> 
> and why does access to it need to be synchronized?),
> 
> 
> Now that's a really good question.
> 
> Let me give my explanation (which is probably similar to Clement's) and
> then we can see if others agree with it.
> 
> So first of all: is there a race condition here? Really, no. Sort of. A
> race condition is (for me) non-static behaviour in output in response to
> otherwise identical input. So imagine the following code fragment:
> 
> avctx = alloc(threads=2); open(avctx);
> for (i=0;i<5;i++)
>     decode(avctx, frame);
> close(avctx);
> 
> Is there a race condition (with my definition)? No. The code will always
> give the same return values. Now, let's look at this code fragment:
> 
> avctx = alloc(threads=2); open(avctx);
> for (i=0;i<5;i++) {
>     decode(avctx, frame);
>     if (i == 2) { avctx->debug |= THREADS; av_log_set_level(DEBUG); }
> }
> close(avctx);
> 
> Here, a week ago, you got a race condition because user threads would be
> synchronized with the next worker thread unlocked, thus allowing
> debug&THREADS to be updated mid-frame in an active thread. This was fixed
> in 1269cd5.
> 
> Is this a big deal? No, honestly. But it's sort-of a race. If you wanted to
> debug threading, at least it know always produces the same output.
> 
> This patch does not solve a race of that kind. It just removes a tsan
> warning about a protected write and an unprotected read in an unrelated
> thread that is waiting for the "owner" thread. So why would we do that? My
> answer is: primarily to make tsan less noisy. It found "fake" (like this)
> and "real" (some having real implications) races, and we will never find
> the real races unless we silence the fake ones also. As long as I can do
> that without affecting performance and code readability, I'm happy to do
> so. If it affects readability or performance, I will obviously leave it
> as-is.
> 
> so I see no big offense in calling av_log() inside of it.
> 
> 
> There's that also... I have no strong opinion on that.

In that case, maybe better to move the av_log into the block to reduce
the amount of code needed for it. But I have no strong opinion either,
just the ever-present feeling that debug logs should not take up too
much space in the source code.
Wan-Teh Chang July 5, 2017, 9:30 p.m. UTC | #5
Hi Ronald,

A variant of this patch is committed as
2e664b9c1e73c80aab91070c1eb7676f04bdd12d:

http://ffmpeg.org/pipermail/ffmpeg-cvslog/2017-April/106577.html

I believe it does not fix the tsan warning.

On Thu, Apr 6, 2017 at 10:48 AM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> ..., but this way the accesses to the
> field (reads and writes) are always protected by a mutex.

The patch protects the reads of the |debug| field with
p->progress_mutex, where |p| points to a PerThreadContext. But the
write of the |debug| field in update_context_from_user() is not
protected by progress_mutex of any PerThreadContext.

The tsan warning this patch tried to fix is:

> WARNING: ThreadSanitizer: data race (pid=10916)
>   Write of size 4 at 0x7d64000174fc by main thread (mutexes: write M2313):
>     #0 update_context_from_user src/libavcodec/pthread_frame.c:335 (ffmpeg+0x000000df7b06)
> [..]
>   Previous read of size 4 at 0x7d64000174fc by thread T1 (mutexes: write M2311):
>     #0 ff_thread_await_progress src/libavcodec/pthread_frame.c:592 (ffmpeg+0x000000df8b3e)

The code in libavcodec/pthread_frame.c related to the write of the
|debug| field shown in the tsan warning is the following:

 326 static int update_context_from_user(AVCodecContext *dst,
AVCodecContext *sr     c)
 327 {
 ,,,
 333
 334     dst->opaque   = src->opaque;
 335     dst->debug    = src->debug;
 336     dst->debug_mv = src->debug_mv;

...

 383 static int submit_packet(PerThreadContext *p, AVCodecContext *user_avctx,
 384                          AVPacket *avpkt)
 385 {
 ...
 393
 394     pthread_mutex_lock(&p->mutex);
 395
 396     ret = update_context_from_user(p->avctx, user_avctx);

...

 472 int ff_thread_decode_frame(AVCodecContext *avctx,
 473                            AVFrame *picture, int *got_picture_ptr,
 474                            AVPacket *avpkt)
 475 {
 ...
 480
 481     /* release the async lock, permitting blocked hwaccel threads to
 482      * go forward while we are in this function */
 483     async_unlock(fctx);
 484
 485     /*
 486      * Submit a packet to the next decoding thread.
 487      */
 488
 489     p = &fctx->threads[fctx->next_decoding];
 490     err = submit_packet(p, avctx, avpkt);

The code snippets show the write of dst->debug is only protected by
p->mutex (acquired at line 394) for some |p|. progress_mutex (of any
AVCodecContext) is not involved at all.

Wan-Teh Chang
Ronald S. Bultje July 5, 2017, 9:49 p.m. UTC | #6
Hi Wan-Teh,

On Wed, Jul 5, 2017 at 5:30 PM, Wan-Teh Chang <wtc@google.com> wrote:

> Hi Ronald,
>
> A variant of this patch is committed as
> 2e664b9c1e73c80aab91070c1eb7676f04bdd12d:
>
> http://ffmpeg.org/pipermail/ffmpeg-cvslog/2017-April/106577.html
>
> I believe it does not fix the tsan warning.


I had noticed that too.

Unfortunately, I have spent a significant amount of (unpaid, personal) time
on fixing most of these tsan warnings already (with help of ubitux -
thanks!!), and given the incredibly mega-low impact of this particular
outstanding issue, I'm currently not actively working on fixing this. I may
get back to this issue later but I have some other things that I'd rather
work on and that I believe are more important.

Ronald
Wan-Teh Chang July 5, 2017, 10:12 p.m. UTC | #7
Hi Ronald,

Thank you for the quick reply!

On Wed, Jul 5, 2017 at 2:49 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> Hi Wan-Teh,
>
> On Wed, Jul 5, 2017 at 5:30 PM, Wan-Teh Chang <wtc@google.com> wrote:
>>
>> Hi Ronald,
>>
>> A variant of this patch is committed as
>> 2e664b9c1e73c80aab91070c1eb7676f04bdd12d:
>>
>> http://ffmpeg.org/pipermail/ffmpeg-cvslog/2017-April/106577.html
>>
>> I believe it does not fix the tsan warning.
>
> I had noticed that too.
>
> Unfortunately, I have spent a significant amount of (unpaid, personal) time
> on fixing most of these tsan warnings already (with help of ubitux -
> thanks!!), and given the incredibly mega-low impact of this particular
> outstanding issue, I'm currently not actively working on fixing this. I may
> get back to this issue later but I have some other things that I'd rather
> work on and that I believe are more important.

Thank you for all the tsan warning fixes. In the meantime, it would be
good to revert 2e664b9c1e73c80aab91070c1eb7676f04bdd12d to avoid
confusion.

Wan-Teh Chang
Ronald S. Bultje July 5, 2017, 10:31 p.m. UTC | #8
Hi Wan-Teh,

On Wed, Jul 5, 2017 at 6:12 PM, Wan-Teh Chang <wtc@google.com> wrote:

> Hi Ronald,
>
> Thank you for the quick reply!
>
> On Wed, Jul 5, 2017 at 2:49 PM, Ronald S. Bultje <rsbultje@gmail.com>
> wrote:
> > Hi Wan-Teh,
> >
> > On Wed, Jul 5, 2017 at 5:30 PM, Wan-Teh Chang <wtc@google.com> wrote:
> >>
> >> Hi Ronald,
> >>
> >> A variant of this patch is committed as
> >> 2e664b9c1e73c80aab91070c1eb7676f04bdd12d:
> >>
> >> http://ffmpeg.org/pipermail/ffmpeg-cvslog/2017-April/106577.html
> >>
> >> I believe it does not fix the tsan warning.
> >
> > I had noticed that too.
> >
> > Unfortunately, I have spent a significant amount of (unpaid, personal)
> time
> > on fixing most of these tsan warnings already (with help of ubitux -
> > thanks!!), and given the incredibly mega-low impact of this particular
> > outstanding issue, I'm currently not actively working on fixing this. I
> may
> > get back to this issue later but I have some other things that I'd rather
> > work on and that I believe are more important.
>
> Thank you for all the tsan warning fixes. In the meantime, it would be
> good to revert 2e664b9c1e73c80aab91070c1eb7676f04bdd12d to avoid
> confusion.


Why? I believe it fixes a subset of the issue.

Ronald
Wan-Teh Chang July 6, 2017, 12:08 a.m. UTC | #9
Hi Ronald,

On Wed, Jul 5, 2017 at 3:31 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> Hi Wan-Teh,
>
> On Wed, Jul 5, 2017 at 6:12 PM, Wan-Teh Chang <wtc@google.com> wrote:
>>
>> Thank you for all the tsan warning fixes. In the meantime, it would be
>> good to revert 2e664b9c1e73c80aab91070c1eb7676f04bdd12d to avoid
>> confusion.
>
>
> Why? I believe it fixes a subset of the issue.

Could you explain why it fixes a subset of the issue?

The data race that tsan warned about was between a write and a read:

WARNING: ThreadSanitizer: data race (pid=10916)
  Write of size 4 at 0x7d64000174fc by main thread (mutexes: write M2313):
    #0 update_context_from_user src/libavcodec/pthread_frame.c:335
(ffmpeg+0x000000df7b06)
[..]
  Previous read of size 4 at 0x7d64000174fc by thread T1 (mutexes: write M2311):
    #0 ff_thread_await_progress src/libavcodec/pthread_frame.c:592
(ffmpeg+0x000000df8b3e)

But the write is not protected by the mutex that
2e664b9c1e73c80aab91070c1eb7676f04bdd12d uses to protect the reads.

I can't reproduce the tsan warning by running fate-h264 (with and
without 2e664b9c1e73c80aab91070c1eb7676f04bdd12d), but I can still
reproduce the tsan warning by running a test program in my project
with a recent (one or two weeks old) snapshot of ffmpeg:

WARNING: ThreadSanitizer: data race (pid=86821)
  Read of size 4 at 0x7b640003b8fc by thread T19 (mutexes: write
M525086672791228232, write M524523722837806536):
    #0 ff_thread_await_progress
[...]/libavcodec/pthread_frame.c:591:26
(5ab42bb1a6f4b068d7863dabe9b2bacc+0xe749a1)
    #1 await_references [...]/libavcodec/h264_mb.c
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x9530c4)
    #2 hl_motion_420_simple_8 [...]/libavcodec/h264_mc_template.c:80:9
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x95106b)
    #3 hl_decode_mb_simple_8 [...]/libavcodec/h264_mb_template.c:180
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x95106b)
    #4 ff_h264_hl_decode_mb [...]/libavcodec/h264_mb.c:816:9
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x940953)
    #5 decode_slice [...]/libavcodec/h264_slice.c:2572:17
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x982ce7)
    #6 ff_h264_execute_decode_slices
[...]/libavcodec/h264_slice.c:2747:15
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x981fb9)
    #7 decode_nal_units [...]/libavcodec/h264dec.c:718:27
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x99257b)
    #8 h264_decode_frame [...]/libavcodec/h264dec.c:1008
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x99257b)
    #9 frame_worker_thread [...]/libavcodec/pthread_frame.c:199:21
(5ab42bb1a6f4b068d7863dabe9b2bacc+0xe75b4c)

  Previous write of size 4 at 0x7b640003b8fc by main thread (mutexes:
write M524242247861095840):
    #0 update_context_from_user
[...]/libavcodec/pthread_frame.c:335:19
(5ab42bb1a6f4b068d7863dabe9b2bacc+0xe73859)
    #1 submit_packet [...]/libavcodec/pthread_frame.c:396
(5ab42bb1a6f4b068d7863dabe9b2bacc+0xe73859)
    #2 ff_thread_decode_frame [...]/libavcodec/pthread_frame.c:490
(5ab42bb1a6f4b068d7863dabe9b2bacc+0xe73859)
    #3 decode_simple_internal [...]/libavcodec/decode.c:415:15
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d49b9)
    #4 decode_simple_receive_frame [...]/libavcodec/decode.c:620
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d49b9)
    #5 decode_receive_frame_internal [...]/libavcodec/decode.c:638
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d49b9)
    #6 avcodec_send_packet [...]/libavcodec/decode.c:678:15
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d3ea7)
    #7 compat_decode [...]/libavcodec/decode.c:853:15
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d627d)
    #8 avcodec_decode_video2 [...]/libavcodec/decode.c:914:12
(5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d617e)
    ...

The tsan warning shows the read and the write are still protected by
different mutexes. The read is now protected by two mutexes because of
2e664b9c1e73c80aab91070c1eb7676f04bdd12d.

I hope this convinces you that
2e664b9c1e73c80aab91070c1eb7676f04bdd12d is not working as intended
and should be reverted to avoid confusion.

Thanks,
Wan-Teh Chang
Ronald S. Bultje July 6, 2017, 2:24 a.m. UTC | #10
Hi Wan-Teh,

On Wed, Jul 5, 2017 at 8:08 PM, Wan-Teh Chang <wtc@google.com> wrote:

> Hi Ronald,
>
> On Wed, Jul 5, 2017 at 3:31 PM, Ronald S. Bultje <rsbultje@gmail.com>
> wrote:
> > Hi Wan-Teh,
> >
> > On Wed, Jul 5, 2017 at 6:12 PM, Wan-Teh Chang <wtc@google.com> wrote:
> >>
> >> Thank you for all the tsan warning fixes. In the meantime, it would be
> >> good to revert 2e664b9c1e73c80aab91070c1eb7676f04bdd12d to avoid
> >> confusion.
> >
> >
> > Why? I believe it fixes a subset of the issue.
>
> Could you explain why it fixes a subset of the issue?


From what I remember, I was (before this patch) semi-reliably able to
reproduce the issue locally, and it went away after. I've observed the same
thing in the relevant fate station, where before the patch, this warning
occurred semi-reliably in 3-4 tests per run, and after the patch it
sometimes occurs in 1 test per run. I understand this doesn't satisfy you
but I currently don't want to dig through the code to figure out why I
thought this was a good idea, I have other things that take priority.
Reverting the patch will require me to dig through this code also, and I
really don't see the gain.

Some more thoughts:
- I don't think we want to grab a second lock for something trivial like
this (e.g. grabbing progress_mutex when copying the debug field in
update_context_from_user);
- making the debug flags field atomic will be difficult because it's public
API, and I don't think we're ready to expose C11 types in our public API;
- I suppose you could make a private (inside PerThreadContext, which allows
it to be atomic) copy of debug intended for cross-threading purposes and
use that here.

Ronald
Wan-Teh Chang July 7, 2017, 9:17 p.m. UTC | #11
Hi Ronald,

On Wed, Jul 5, 2017 at 7:24 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> Hi Wan-Teh,
>
> On Wed, Jul 5, 2017 at 8:08 PM, Wan-Teh Chang <wtc@google.com> wrote:
>>
>> Hi Ronald,
>>
>> On Wed, Jul 5, 2017 at 3:31 PM, Ronald S. Bultje <rsbultje@gmail.com>
>> wrote:
>> > Hi Wan-Teh,
>> >
>> > On Wed, Jul 5, 2017 at 6:12 PM, Wan-Teh Chang <wtc@google.com> wrote:
>> >>
>> >> Thank you for all the tsan warning fixes. In the meantime, it would be
>> >> good to revert 2e664b9c1e73c80aab91070c1eb7676f04bdd12d to avoid
>> >> confusion.
>> >
>> >
>> > Why? I believe it fixes a subset of the issue.
>>
>> Could you explain why it fixes a subset of the issue?
>
> From what I remember, I was (before this patch) semi-reliably able to
> reproduce the issue locally, and it went away after. I've observed the same
> thing in the relevant fate station, where before the patch, this warning
> occurred semi-reliably in 3-4 tests per run, and after the patch it
> sometimes occurs in 1 test per run. I understand this doesn't satisfy you
> but I currently don't want to dig through the code to figure out why I
> thought this was a good idea, I have other things that take priority.
> Reverting the patch will require me to dig through this code also, and I
> really don't see the gain.

In addition to avoiding confusion, reverting the patch will move the
av_log() statements outside the lock. Since I don't use those two
av_log() statements, I won't insist on reverting the patch.

> Some more thoughts:
> - I don't think we want to grab a second lock for something trivial like
> this (e.g. grabbing progress_mutex when copying the debug field in
> update_context_from_user);
> - making the debug flags field atomic will be difficult because it's public
> API, and I don't think we're ready to expose C11 types in our public API;
> - I suppose you could make a private (inside PerThreadContext, which allows
> it to be atomic) copy of debug intended for cross-threading purposes and use
> that here.

After studying this problem for two more days, I implemented a variant
of your last suggestion:

http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213384.html

Let's continue the discussion in that email thread.

Wan-Teh Chang
diff mbox

Patch

diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
index c246c2f..8857bfc 100644
--- a/libavcodec/pthread_frame.c
+++ b/libavcodec/pthread_frame.c
@@ -559,6 +559,7 @@  void ff_thread_report_progress(ThreadFrame *f, int n, int field)
 {
     PerThreadContext *p;
     atomic_int *progress = f->progress ? (atomic_int*)f->progress->data : NULL;
+    int debug_mv;
 
     if (!progress ||
         atomic_load_explicit(&progress[field], memory_order_relaxed) >= n)
@@ -566,22 +567,24 @@  void ff_thread_report_progress(ThreadFrame *f, int n, int field)
 
     p = f->owner[field]->internal->thread_ctx;
 
-    if (f->owner[field]->debug&FF_DEBUG_THREADS)
-        av_log(f->owner[field], AV_LOG_DEBUG,
-               "%p finished %d field %d\n", progress, n, field);
-
     pthread_mutex_lock(&p->progress_mutex);
+    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
 
     atomic_store_explicit(&progress[field], n, memory_order_release);
 
     pthread_cond_broadcast(&p->progress_cond);
     pthread_mutex_unlock(&p->progress_mutex);
+
+    if (debug_mv)
+        av_log(f->owner[field], AV_LOG_DEBUG,
+               "%p finished %d field %d\n", progress, n, field);
 }
 
 void ff_thread_await_progress(ThreadFrame *f, int n, int field)
 {
     PerThreadContext *p;
     atomic_int *progress = f->progress ? (atomic_int*)f->progress->data : NULL;
+    int debug_mv;
 
     if (!progress ||
         atomic_load_explicit(&progress[field], memory_order_acquire) >= n)
@@ -589,14 +592,15 @@  void ff_thread_await_progress(ThreadFrame *f, int n, int field)
 
     p = f->owner[field]->internal->thread_ctx;
 
-    if (f->owner[field]->debug&FF_DEBUG_THREADS)
-        av_log(f->owner[field], AV_LOG_DEBUG,
-               "thread awaiting %d field %d from %p\n", n, field, progress);
-
     pthread_mutex_lock(&p->progress_mutex);
+    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
     while (atomic_load_explicit(&progress[field], memory_order_relaxed) < n)
         pthread_cond_wait(&p->progress_cond, &p->progress_mutex);
     pthread_mutex_unlock(&p->progress_mutex);
+
+    if (debug_mv)
+        av_log(f->owner[field], AV_LOG_DEBUG,
+               "thread awaited %d field %d from %p\n", n, field, progress);
 }
 
 void ff_thread_finish_setup(AVCodecContext *avctx) {