Message ID: 1491500921-18604-1-git-send-email-rsbultje@gmail.com
State: Accepted
On Thu, 6 Apr 2017 at 18:48 Ronald S. Bultje <rsbultje@gmail.com> wrote:

> The av_log() is done outside the lock, but this way the accesses to the
> field (reads and writes) are always protected by a mutex. The av_log()
> is not run inside the lock context because it may involve user callbacks
> and doing that in performance-sensitive code is probably not a good idea.
>
> This should fix occasional tsan warnings when running fate-h264, like:

Looks fine.

Kieran
On Thu, 6 Apr 2017 13:48:41 -0400 "Ronald S. Bultje" <rsbultje@gmail.com> wrote:

> The av_log() is done outside the lock, but this way the accesses to the
> field (reads and writes) are always protected by a mutex. The av_log()
> is not run inside the lock context because it may involve user callbacks
> and doing that in performance-sensitive code is probably not a good idea.
>
> This should fix occasional tsan warnings when running fate-h264, like:
>
> WARNING: ThreadSanitizer: data race (pid=10916)
>   Write of size 4 at 0x7d64000174fc by main thread (mutexes: write M2313):
>     #0 update_context_from_user src/libavcodec/pthread_frame.c:335 (ffmpeg+0x000000df7b06)
>   [..]
>   Previous read of size 4 at 0x7d64000174fc by thread T1 (mutexes: write M2311):
>     #0 ff_thread_await_progress src/libavcodec/pthread_frame.c:592 (ffmpeg+0x000000df8b3e)
> ---
>  libavcodec/pthread_frame.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
> index c246c2f..8857bfc 100644
> --- a/libavcodec/pthread_frame.c
> +++ b/libavcodec/pthread_frame.c
> @@ -559,6 +559,7 @@ void ff_thread_report_progress(ThreadFrame *f, int n, int field)
>  {
>      PerThreadContext *p;
>      atomic_int *progress = f->progress ? (atomic_int*)f->progress->data : NULL;
> +    int debug_mv;
>
>      if (!progress ||
>          atomic_load_explicit(&progress[field], memory_order_relaxed) >= n)
> @@ -566,22 +567,24 @@ void ff_thread_report_progress(ThreadFrame *f, int n, int field)
>
>      p = f->owner[field]->internal->thread_ctx;
>
> -    if (f->owner[field]->debug&FF_DEBUG_THREADS)
> -        av_log(f->owner[field], AV_LOG_DEBUG,
> -               "%p finished %d field %d\n", progress, n, field);
> -
>      pthread_mutex_lock(&p->progress_mutex);
> +    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
>
>      atomic_store_explicit(&progress[field], n, memory_order_release);
>
>      pthread_cond_broadcast(&p->progress_cond);
>      pthread_mutex_unlock(&p->progress_mutex);
> +
> +    if (debug_mv)
> +        av_log(f->owner[field], AV_LOG_DEBUG,
> +               "%p finished %d field %d\n", progress, n, field);
>  }
>
>  void ff_thread_await_progress(ThreadFrame *f, int n, int field)
>  {
>      PerThreadContext *p;
>      atomic_int *progress = f->progress ? (atomic_int*)f->progress->data : NULL;
> +    int debug_mv;
>
>      if (!progress ||
>          atomic_load_explicit(&progress[field], memory_order_acquire) >= n)
> @@ -589,14 +592,15 @@ void ff_thread_await_progress(ThreadFrame *f, int n, int field)
>
>      p = f->owner[field]->internal->thread_ctx;
>
> -    if (f->owner[field]->debug&FF_DEBUG_THREADS)
> -        av_log(f->owner[field], AV_LOG_DEBUG,
> -               "thread awaiting %d field %d from %p\n", n, field, progress);
> -
>      pthread_mutex_lock(&p->progress_mutex);
> +    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
>      while (atomic_load_explicit(&progress[field], memory_order_relaxed) < n)
>          pthread_cond_wait(&p->progress_cond, &p->progress_mutex);
>      pthread_mutex_unlock(&p->progress_mutex);
> +
> +    if (debug_mv)
> +        av_log(f->owner[field], AV_LOG_DEBUG,
> +               "thread awaited %d field %d from %p\n", n, field, progress);
>  }
>
>  void ff_thread_finish_setup(AVCodecContext *avctx) {

I'm not sure what "debug_mv" stands for, and I think this is a bit
overkill. It's run only when FF_DEBUG_THREADS is set (who even sets
that, and why does access to it need to be synchronized?), so I see no
big offense in calling av_log() inside of it. But other than that LGTM.
Hi,

On Fri, Apr 7, 2017 at 2:37 AM, wm4 <nfxjfg@googlemail.com> wrote:
> On Thu, 6 Apr 2017 13:48:41 -0400
> "Ronald S. Bultje" <rsbultje@gmail.com> wrote:
>
> > The av_log() is done outside the lock, but this way the accesses to the
> > field (reads and writes) are always protected by a mutex. [...]
>
> [...]
>
> I'm not sure what "debug_mv" stands for, and I think this is a bit
> overkill. It's run only when FF_DEBUG_THREADS is set (who even sets
> that,

I actually occasionally use it.

> and why does access to it need to be synchronized?),

Now that's a really good question.

Let me give my explanation (which is probably similar to Clement's) and
then we can see if others agree with it.

So first of all: is there a race condition here? Really, no. Sort of. A
race condition is (for me) non-static behaviour in output in response to
otherwise identical input. So imagine the following code fragment:

    avctx = alloc(threads=2); open(avctx);
    for (i=0;i<5;i++)
        decode(avctx, frame);
    close(avctx);

Is there a race condition (with my definition)? No. The code will always
give the same return values.

Now, let's look at this code fragment:

    avctx = alloc(threads=2); open(avctx);
    for (i=0;i<5;i++) {
        decode(avctx, frame);
        if (i == 2) { avctx->debug |= THREADS; av_log_set_level(DEBUG); }
    }
    close(avctx);

Here, a week ago, you got a race condition because user threads would be
synchronized with the next worker thread unlocked, thus allowing
debug&THREADS to be updated mid-frame in an active thread. This was fixed
in 1269cd5.

Is this a big deal? No, honestly. But it's sort-of a race. If you wanted
to debug threading, at least you know it always produces the same output.

This patch does not solve a race of that kind. It just removes a tsan
warning about a protected write and an unprotected read in an unrelated
thread that is waiting for the "owner" thread. So why would we do that?
My answer is: primarily to make tsan less noisy. It found "fake" (like
this) and "real" (some having real implications) races, and we will
never find the real races unless we silence the fake ones also. As long
as I can do that without affecting performance and code readability, I'm
happy to do so. If it affects readability or performance, I will
obviously leave it as-is.

> so I see no big offense in calling av_log() inside of it.

There's that also... I have no strong opinion on that.

Ronald
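The shape of the patch — snapshot the shared flag into a local while the mutex is held, then log only after unlocking — can be sketched generically. This is an illustration only: the struct and function names below are invented for the example and are not FFmpeg's; the function returns whether it logged so the behaviour is easy to check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Illustrative stand-in for the per-thread context; not FFmpeg's layout. */
typedef struct {
    pthread_mutex_t lock;
    int flags;              /* shared field, written by another thread under lock */
} Shared;

/* Returns 1 if it logged, 0 otherwise. */
static int report_progress(Shared *s)
{
    int want_log;

    pthread_mutex_lock(&s->lock);
    want_log = s->flags & 1;      /* snapshot taken while the lock is held */
    /* ... update protected progress state, broadcast the condvar ... */
    pthread_mutex_unlock(&s->lock);

    if (want_log) {               /* lock already released: user log callbacks
                                   * can run without blocking other threads */
        printf("progress reported\n");
        return 1;
    }
    return 0;
}
```

The point of the local copy is that every access to the shared field happens inside a critical section, while the (potentially slow, user-callback-invoking) logging stays outside it.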
On Fri, 7 Apr 2017 08:30:00 -0400 "Ronald S. Bultje" <rsbultje@gmail.com> wrote:

> [...]
>
> This patch does not solve a race of that kind. It just removes a tsan
> warning about a protected write and an unprotected read in an unrelated
> thread that is waiting for the "owner" thread. So why would we do that?
> My answer is: primarily to make tsan less noisy. [...]
>
> > so I see no big offense in calling av_log() inside of it.
>
> There's that also... I have no strong opinion on that.

In that case, maybe better to move the av_log into the block to reduce
the amount of code needed for it. But I have no strong opinion either,
just the ever-present feeling that debug logs should not take up too
much space in the source code.
Hi Ronald,

A variant of this patch is committed as 2e664b9c1e73c80aab91070c1eb7676f04bdd12d:

http://ffmpeg.org/pipermail/ffmpeg-cvslog/2017-April/106577.html

I believe it does not fix the tsan warning.

On Thu, Apr 6, 2017 at 10:48 AM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> ..., but this way the accesses to the
> field (reads and writes) are always protected by a mutex.

The patch protects the reads of the |debug| field with
p->progress_mutex, where |p| points to a PerThreadContext. But the write
of the |debug| field in update_context_from_user() is not protected by
progress_mutex of any PerThreadContext.

The tsan warning this patch tried to fix is:

> WARNING: ThreadSanitizer: data race (pid=10916)
>   Write of size 4 at 0x7d64000174fc by main thread (mutexes: write M2313):
>     #0 update_context_from_user src/libavcodec/pthread_frame.c:335 (ffmpeg+0x000000df7b06)
>   [..]
>   Previous read of size 4 at 0x7d64000174fc by thread T1 (mutexes: write M2311):
>     #0 ff_thread_await_progress src/libavcodec/pthread_frame.c:592 (ffmpeg+0x000000df8b3e)

The code in libavcodec/pthread_frame.c related to the write of the
|debug| field shown in the tsan warning is the following:

326 static int update_context_from_user(AVCodecContext *dst, AVCodecContext *src)
327 {
...
333
334     dst->opaque   = src->opaque;
335     dst->debug    = src->debug;
336     dst->debug_mv = src->debug_mv;
...
383 static int submit_packet(PerThreadContext *p, AVCodecContext *user_avctx,
384                          AVPacket *avpkt)
385 {
...
393
394     pthread_mutex_lock(&p->mutex);
395
396     ret = update_context_from_user(p->avctx, user_avctx);
...
472 int ff_thread_decode_frame(AVCodecContext *avctx,
473                            AVFrame *picture, int *got_picture_ptr,
474                            AVPacket *avpkt)
475 {
...
480
481     /* release the async lock, permitting blocked hwaccel threads to
482      * go forward while we are in this function */
483     async_unlock(fctx);
484
485     /*
486      * Submit a packet to the next decoding thread.
487      */
488
489     p = &fctx->threads[fctx->next_decoding];
490     err = submit_packet(p, avctx, avpkt);

The code snippets show the write of dst->debug is only protected by
p->mutex (acquired at line 394) for some |p|. progress_mutex (of any
AVCodecContext) is not involved at all.

Wan-Teh Chang
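Wan-Teh's analysis reduces to a small pattern: both threads do take a lock, but not the *same* lock, so no happens-before edge orders the write against the read and the data race remains. A minimal sketch of that shape (struct and names are illustrative, not FFmpeg's actual layout):

```c
#include <assert.h>
#include <pthread.h>

/* Illustrative context: the writer takes one mutex, the reader another.
 * This mirrors the shape of the warning (p->mutex vs. p->progress_mutex)
 * but is not FFmpeg's real structure. */
typedef struct {
    pthread_mutex_t mutex;           /* what the writer locks */
    pthread_mutex_t progress_mutex;  /* what the reader locks */
    int debug;                       /* the racing field */
} Ctx;

static void user_thread_write(Ctx *p, int v)
{
    pthread_mutex_lock(&p->mutex);
    p->debug = v;                    /* write under p->mutex only */
    pthread_mutex_unlock(&p->mutex);
}

static int worker_thread_read(Ctx *p)
{
    int v;

    pthread_mutex_lock(&p->progress_mutex);
    v = p->debug;                    /* read under a DIFFERENT mutex:
                                      * still a data race when the two
                                      * functions run concurrently */
    pthread_mutex_unlock(&p->progress_mutex);
    return v;
}
```

Run concurrently under ThreadSanitizer, these two functions would still be reported, even though every access happens inside a critical section: disjoint mutexes only serialize their own holders against each other.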
Hi Wan-Teh,

On Wed, Jul 5, 2017 at 5:30 PM, Wan-Teh Chang <wtc@google.com> wrote:
> Hi Ronald,
>
> A variant of this patch is committed as
> 2e664b9c1e73c80aab91070c1eb7676f04bdd12d:
>
> http://ffmpeg.org/pipermail/ffmpeg-cvslog/2017-April/106577.html
>
> I believe it does not fix the tsan warning.

I had noticed that too.

Unfortunately, I have spent a significant amount of (unpaid, personal)
time on fixing most of these tsan warnings already (with help of ubitux -
thanks!!), and given the incredibly mega-low impact of this particular
outstanding issue, I'm currently not actively working on fixing this. I
may get back to this issue later but I have some other things that I'd
rather work on and that I believe are more important.

Ronald
Hi Ronald,

Thank you for the quick reply!

On Wed, Jul 5, 2017 at 2:49 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> On Wed, Jul 5, 2017 at 5:30 PM, Wan-Teh Chang <wtc@google.com> wrote:
>> A variant of this patch is committed as
>> 2e664b9c1e73c80aab91070c1eb7676f04bdd12d:
>>
>> http://ffmpeg.org/pipermail/ffmpeg-cvslog/2017-April/106577.html
>>
>> I believe it does not fix the tsan warning.
>
> I had noticed that too.
>
> Unfortunately, I have spent a significant amount of (unpaid, personal)
> time on fixing most of these tsan warnings already (with help of ubitux -
> thanks!!), and given the incredibly mega-low impact of this particular
> outstanding issue, I'm currently not actively working on fixing this. I
> may get back to this issue later but I have some other things that I'd
> rather work on and that I believe are more important.

Thank you for all the tsan warning fixes. In the meantime, it would be
good to revert 2e664b9c1e73c80aab91070c1eb7676f04bdd12d to avoid
confusion.

Wan-Teh Chang
Hi Wan-Teh,

On Wed, Jul 5, 2017 at 6:12 PM, Wan-Teh Chang <wtc@google.com> wrote:
> [...]
>
> Thank you for all the tsan warning fixes. In the meantime, it would be
> good to revert 2e664b9c1e73c80aab91070c1eb7676f04bdd12d to avoid
> confusion.

Why? I believe it fixes a subset of the issue.

Ronald
Hi Ronald,

On Wed, Jul 5, 2017 at 3:31 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> On Wed, Jul 5, 2017 at 6:12 PM, Wan-Teh Chang <wtc@google.com> wrote:
>>
>> Thank you for all the tsan warning fixes. In the meantime, it would be
>> good to revert 2e664b9c1e73c80aab91070c1eb7676f04bdd12d to avoid
>> confusion.
>
> Why? I believe it fixes a subset of the issue.

Could you explain why it fixes a subset of the issue?

The data race that tsan warned about was between a write and a read:

WARNING: ThreadSanitizer: data race (pid=10916)
  Write of size 4 at 0x7d64000174fc by main thread (mutexes: write M2313):
    #0 update_context_from_user src/libavcodec/pthread_frame.c:335 (ffmpeg+0x000000df7b06)
  [..]
  Previous read of size 4 at 0x7d64000174fc by thread T1 (mutexes: write M2311):
    #0 ff_thread_await_progress src/libavcodec/pthread_frame.c:592 (ffmpeg+0x000000df8b3e)

But the write is not protected by the mutex that
2e664b9c1e73c80aab91070c1eb7676f04bdd12d uses to protect the reads.

I can't reproduce the tsan warning by running fate-h264 (with and
without 2e664b9c1e73c80aab91070c1eb7676f04bdd12d), but I can still
reproduce the tsan warning by running a test program in my project with
a recent (one or two weeks old) snapshot of ffmpeg:

WARNING: ThreadSanitizer: data race (pid=86821)
  Read of size 4 at 0x7b640003b8fc by thread T19 (mutexes: write M525086672791228232, write M524523722837806536):
    #0 ff_thread_await_progress [...]/libavcodec/pthread_frame.c:591:26 (5ab42bb1a6f4b068d7863dabe9b2bacc+0xe749a1)
    #1 await_references [...]/libavcodec/h264_mb.c (5ab42bb1a6f4b068d7863dabe9b2bacc+0x9530c4)
    #2 hl_motion_420_simple_8 [...]/libavcodec/h264_mc_template.c:80:9 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x95106b)
    #3 hl_decode_mb_simple_8 [...]/libavcodec/h264_mb_template.c:180 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x95106b)
    #4 ff_h264_hl_decode_mb [...]/libavcodec/h264_mb.c:816:9 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x940953)
    #5 decode_slice [...]/libavcodec/h264_slice.c:2572:17 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x982ce7)
    #6 ff_h264_execute_decode_slices [...]/libavcodec/h264_slice.c:2747:15 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x981fb9)
    #7 decode_nal_units [...]/libavcodec/h264dec.c:718:27 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x99257b)
    #8 h264_decode_frame [...]/libavcodec/h264dec.c:1008 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x99257b)
    #9 frame_worker_thread [...]/libavcodec/pthread_frame.c:199:21 (5ab42bb1a6f4b068d7863dabe9b2bacc+0xe75b4c)

  Previous write of size 4 at 0x7b640003b8fc by main thread (mutexes: write M524242247861095840):
    #0 update_context_from_user [...]/libavcodec/pthread_frame.c:335:19 (5ab42bb1a6f4b068d7863dabe9b2bacc+0xe73859)
    #1 submit_packet [...]/libavcodec/pthread_frame.c:396 (5ab42bb1a6f4b068d7863dabe9b2bacc+0xe73859)
    #2 ff_thread_decode_frame [...]/libavcodec/pthread_frame.c:490 (5ab42bb1a6f4b068d7863dabe9b2bacc+0xe73859)
    #3 decode_simple_internal [...]/libavcodec/decode.c:415:15 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d49b9)
    #4 decode_simple_receive_frame [...]/libavcodec/decode.c:620 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d49b9)
    #5 decode_receive_frame_internal [...]/libavcodec/decode.c:638 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d49b9)
    #6 avcodec_send_packet [...]/libavcodec/decode.c:678:15 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d3ea7)
    #7 compat_decode [...]/libavcodec/decode.c:853:15 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d627d)
    #8 avcodec_decode_video2 [...]/libavcodec/decode.c:914:12 (5ab42bb1a6f4b068d7863dabe9b2bacc+0x7d617e)
  ...

The tsan warning shows the read and the write are still protected by
different mutexes. The read is now protected by two mutexes because of
2e664b9c1e73c80aab91070c1eb7676f04bdd12d.

I hope this convinces you that 2e664b9c1e73c80aab91070c1eb7676f04bdd12d
is not working as intended and should be reverted to avoid confusion.

Thanks,
Wan-Teh Chang
Hi Wan-Teh,

On Wed, Jul 5, 2017 at 8:08 PM, Wan-Teh Chang <wtc@google.com> wrote:
> [...]
>
> Could you explain why it fixes a subset of the issue?

From what I remember, I was (before this patch) semi-reliably able to
reproduce the issue locally, and it went away after. I've observed the
same thing in the relevant fate station, where before the patch, this
warning occurred semi-reliably in 3-4 tests per run, and after the patch
it sometimes occurs in 1 test per run. I understand this doesn't satisfy
you but I currently don't want to dig through the code to figure out why
I thought this was a good idea, I have other things that take priority.
Reverting the patch will require me to dig through this code also, and I
really don't see the gain.

Some more thoughts:
- I don't think we want to grab a second lock for something trivial like
  this (e.g. grabbing progress_mutex when copying the debug field in
  update_context_from_user);
- making the debug flags field atomic will be difficult because it's
  public API, and I don't think we're ready to expose C11 types in our
  public API;
- I suppose you could make a private (inside PerThreadContext, which
  allows it to be atomic) copy of debug intended for cross-threading
  purposes and use that here.

Ronald
Hi Ronald,

On Wed, Jul 5, 2017 at 7:24 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> [...] I understand this doesn't satisfy you but I currently don't want
> to dig through the code to figure out why I thought this was a good
> idea, I have other things that take priority. Reverting the patch will
> require me to dig through this code also, and I really don't see the
> gain.

In addition to avoiding confusion, reverting the patch will move the
av_log() statements outside the lock. Since I don't use those two
av_log() statements, I won't insist on reverting the patch.

> Some more thoughts:
> - I don't think we want to grab a second lock for something trivial like
>   this (e.g. grabbing progress_mutex when copying the debug field in
>   update_context_from_user);
> - making the debug flags field atomic will be difficult because it's
>   public API, and I don't think we're ready to expose C11 types in our
>   public API;
> - I suppose you could make a private (inside PerThreadContext, which
>   allows it to be atomic) copy of debug intended for cross-threading
>   purposes and use that here.

After studying this problem for two more days, I implemented a variant
of your last suggestion:

http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213384.html

Let's continue the discussion in that email thread.

Wan-Teh Chang
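The "private atomic copy of debug" idea could look roughly like the sketch below. All names here, including the flag value, are stand-ins invented for the example; the actual implementation is whatever landed in the patch linked above. The user thread publishes the flags (it already holds p->mutex at that point, but the atomic makes the mutex irrelevant for this field), and worker threads read them lock-free.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEBUG_THREADS_FLAG 0x10000  /* stand-in for FF_DEBUG_THREADS */

/* Illustrative per-thread context holding an atomic shadow of the
 * user-visible avctx->debug field; not FFmpeg's real struct. */
typedef struct {
    atomic_int debug_shadow;
} PerThreadContextSketch;

/* User thread (e.g. from update_context_from_user): mirror the flags. */
static void publish_debug(PerThreadContextSketch *p, int user_debug)
{
    atomic_store_explicit(&p->debug_shadow, user_debug, memory_order_relaxed);
}

/* Worker thread: race-free check, no mutex needed. Relaxed ordering is
 * enough because nothing else is published through this flag. */
static int debug_threads_enabled(PerThreadContextSketch *p)
{
    return atomic_load_explicit(&p->debug_shadow, memory_order_relaxed)
           & DEBUG_THREADS_FLAG;
}
```

This sidesteps both objections above: no second lock is taken on the hot path, and the public AVCodecContext field stays a plain int; only the private shadow is a C11 atomic.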
diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
index c246c2f..8857bfc 100644
--- a/libavcodec/pthread_frame.c
+++ b/libavcodec/pthread_frame.c
@@ -559,6 +559,7 @@ void ff_thread_report_progress(ThreadFrame *f, int n, int field)
 {
     PerThreadContext *p;
     atomic_int *progress = f->progress ? (atomic_int*)f->progress->data : NULL;
+    int debug_mv;
 
     if (!progress ||
         atomic_load_explicit(&progress[field], memory_order_relaxed) >= n)
@@ -566,22 +567,24 @@ void ff_thread_report_progress(ThreadFrame *f, int n, int field)
 
     p = f->owner[field]->internal->thread_ctx;
 
-    if (f->owner[field]->debug&FF_DEBUG_THREADS)
-        av_log(f->owner[field], AV_LOG_DEBUG,
-               "%p finished %d field %d\n", progress, n, field);
-
     pthread_mutex_lock(&p->progress_mutex);
+    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
 
     atomic_store_explicit(&progress[field], n, memory_order_release);
 
     pthread_cond_broadcast(&p->progress_cond);
     pthread_mutex_unlock(&p->progress_mutex);
+
+    if (debug_mv)
+        av_log(f->owner[field], AV_LOG_DEBUG,
+               "%p finished %d field %d\n", progress, n, field);
 }
 
 void ff_thread_await_progress(ThreadFrame *f, int n, int field)
 {
     PerThreadContext *p;
     atomic_int *progress = f->progress ? (atomic_int*)f->progress->data : NULL;
+    int debug_mv;
 
     if (!progress ||
         atomic_load_explicit(&progress[field], memory_order_acquire) >= n)
@@ -589,14 +592,15 @@ void ff_thread_await_progress(ThreadFrame *f, int n, int field)
 
     p = f->owner[field]->internal->thread_ctx;
 
-    if (f->owner[field]->debug&FF_DEBUG_THREADS)
-        av_log(f->owner[field], AV_LOG_DEBUG,
-               "thread awaiting %d field %d from %p\n", n, field, progress);
-
     pthread_mutex_lock(&p->progress_mutex);
+    debug_mv = f->owner[field]->debug&FF_DEBUG_THREADS;
     while (atomic_load_explicit(&progress[field], memory_order_relaxed) < n)
         pthread_cond_wait(&p->progress_cond, &p->progress_mutex);
     pthread_mutex_unlock(&p->progress_mutex);
+
+    if (debug_mv)
+        av_log(f->owner[field], AV_LOG_DEBUG,
+               "thread awaited %d field %d from %p\n", n, field, progress);
 }
 
 void ff_thread_finish_setup(AVCodecContext *avctx) {