Message ID | 1490970043-57594-1-git-send-email-rsbultje@gmail.com |
---|---|
State | New |
Headers | show |
On Fri, Mar 31, 2017 at 10:20:43AM -0400, Ronald S. Bultje wrote: > Fixes tsan warnings in fate-utvideoenc. Example warning from > utvideoenc_rgb_left: > > WARNING: ThreadSanitizer: data race (pid=19929) > Read of size 8 at 0x7d840001cf18 by main thread: > #0 ff_thread_video_encode_frame src/libavcodec/frame_thread_encoder.c:276 (ffmpeg+0x00000136a39d) > #1 avcodec_encode_video2 src/libavcodec/utils.c:1986 (ffmpeg+0x000000f34a18) > [..] > Previous write of size 8 at 0x7d840001cf18 by thread T14 (mutexes: write M1348): > #0 worker src/libavcodec/frame_thread_encoder.c:100 (ffmpeg+0x000001369964) > > We prevent that by using the same mechanism as frame-mt decoding, and > assuming that we're encoding N packets in parallel (where N=n_threads), > and thus the first N calls to encode_video() won't produce a packet. The whole idea behind frame_thread_encoder was that the number of threads and the delay could be changed at runtime ive rechecked the original commit and it said "Compared to the decoder side, this code is able to change both the delay and the number of threads seamlessly during encoding. Also any idle thread can pick up tasks, the strict round robin in order limit is gone too." Making it like frame-mt decoding doesnt sound good to me. Though the frame_thread_encoder was/is limited by the encoding API about the patch itself, the changes done to indexing seem not to change anything, it makes it possible for the variables to overflow though. IIUC the only change your patch does is to remove the outdata check from the quoted warning its a while ago that i worked n this code but isnt this just "missing" a finished_task_mutex lock over the access ? [...]
Hi, On Sat, Apr 1, 2017 at 6:15 AM, Michael Niedermayer <michael@niedermayer.cc> wrote: [.] > > about the patch itself, the changes done to indexing seem not to > change anything, it makes it possible for the variables to overflow > though. > That's intentional, they're both unsigned and the one is guaranteed to be larger than the other so the difference after subtract would still be guaranteed to be <= n_threads. So although the indices overflow, the subtraction will always be correct. > IIUC the only change your patch does is to remove the outdata check > from the quoted warning > Yes, because if you change the delay to be fixed (and equal to n_threads), you no longer need the check. If the difference between indices is smaller than n_threads, you don't return a packet. If it's equal, you do return a packet. its a while ago that i worked n this code but isnt this just "missing" > a finished_task_mutex lock over the access ? I don't know, it depends on what you're trying to accomplish. If you want a fixed-frame delay, you don't need the access and so no mutex is needed. I haven't thought much about what you need to get a variable-frame delay. To me, a fixed-frame delay for intra-only codecs (which is what this API currently accomplishes in practice) is exactly what you would expect, variable-frame delay implies you're not actually using all threads. But maybe I misunderstand. Ronald
On Sat, Apr 01, 2017 at 08:50:37AM -0400, Ronald S. Bultje wrote: > Hi, > > On Sat, Apr 1, 2017 at 6:15 AM, Michael Niedermayer <michael@niedermayer.cc> > wrote: > [.] > > > > about the patch itself, the changes done to indexing seem not to > > change anything, it makes it possible for the variables to overflow > > though. > > > > That's intentional, they're both unsigned and the one is guaranteed to be > larger than the other so the difference after subtract would still be > guaranteed to be <= n_threads. So although the indices overflow, the > subtraction will always be correct. > > > > IIUC the only change your patch does is to remove the outdata check > > from the quoted warning > > > > Yes, because if you change the delay to be fixed (and equal to n_threads), > you no longer need the check. If the difference between indices is smaller > than n_threads, you don't return a packet. If it's equal, you do return a > packet. > > its a while ago that i worked n this code but isnt this just "missing" > > a finished_task_mutex lock over the access ? > > > I don't know, it depends on what you're trying to accomplish. If you want a > fixed-frame delay, you don't need the access and so no mutex is needed. I > haven't thought much about what you need to get a variable-frame delay. To > me, a fixed-frame delay for intra-only codecs (which is what this API > currently accomplishes in practice) is exactly what you would expect, > variable-frame delay implies you're not actually using all threads. But > maybe I misunderstand. Lower delay is better, so is using fewer threads if theres no need for more it means less delay for any form of real time commuication less memory used, less L2 cache used and consequently better use of the CPU. If decoding with 1 thread is fast enough for the rate at which packets become available using 12 threads will result in more cpu cycles per frame, more memory used, more watts, a higher energy bill, ... [...]
Hi, On Sat, Apr 1, 2017 at 5:44 PM, Michael Niedermayer <michael@niedermayer.cc> wrote: > On Sat, Apr 01, 2017 at 08:50:37AM -0400, Ronald S. Bultje wrote: > > Hi, > > > > On Sat, Apr 1, 2017 at 6:15 AM, Michael Niedermayer > <michael@niedermayer.cc> > > wrote: > > [.] > > > > > > about the patch itself, the changes done to indexing seem not to > > > change anything, it makes it possible for the variables to overflow > > > though. > > > > > > > That's intentional, they're both unsigned and the one is guaranteed to be > > larger than the other so the difference after subtract would still be > > guaranteed to be <= n_threads. So although the indices overflow, the > > subtraction will always be correct. > > > > > > > IIUC the only change your patch does is to remove the outdata check > > > from the quoted warning > > > > > > > Yes, because if you change the delay to be fixed (and equal to > n_threads), > > you no longer need the check. If the difference between indices is > smaller > > than n_threads, you don't return a packet. If it's equal, you do return a > > packet. > > > > > its a while ago that i worked n this code but isnt this just "missing" > > > a finished_task_mutex lock over the access ? > > > > > > I don't know, it depends on what you're trying to accomplish. If you > want a > > fixed-frame delay, you don't need the access and so no mutex is needed. I > > haven't thought much about what you need to get a variable-frame delay. > To > > me, a fixed-frame delay for intra-only codecs (which is what this API > > currently accomplishes in practice) is exactly what you would expect, > > variable-frame delay implies you're not actually using all threads. But > > maybe I misunderstand. > > Lower delay is better, so is using fewer threads if theres no need for > more > it means less delay for any form of real time commuication > less memory used, less L2 cache used and consequently better use of > the CPU. > If decoding with 1 thread is fast enough for the rate at which packets > become available using 12 threads will result in more cpu cycles per > frame, more memory used, more watts, a higher energy bill, ... We already do automatic thread detection, and it's a user-settable option also. Can we please review the patch? Ronald
On Sat, Apr 01, 2017 at 06:34:50PM -0400, Ronald S. Bultje wrote: > Hi, > > On Sat, Apr 1, 2017 at 5:44 PM, Michael Niedermayer <michael@niedermayer.cc> > wrote: > > > On Sat, Apr 01, 2017 at 08:50:37AM -0400, Ronald S. Bultje wrote: > > > Hi, > > > > > > On Sat, Apr 1, 2017 at 6:15 AM, Michael Niedermayer > > <michael@niedermayer.cc> > > > wrote: > > > [.] > > > > > > > > about the patch itself, the changes done to indexing seem not to > > > > change anything, it makes it possible for the variables to overflow > > > > though. > > > > > > > > > > That's intentional, they're both unsigned and the one is guaranteed to be > > > larger than the other so the difference after subtract would still be > > > guaranteed to be <= n_threads. So although the indices overflow, the > > > subtraction will always be correct. > > > > > > > > > > IIUC the only change your patch does is to remove the outdata check > > > > from the quoted warning > > > > > > > > > > Yes, because if you change the delay to be fixed (and equal to > > n_threads), > > > you no longer need the check. If the difference between indices is > > smaller > > > than n_threads, you don't return a packet. If it's equal, you do return a > > > packet. > > > > > > > > its a while ago that i worked n this code but isnt this just "missing" > > > > a finished_task_mutex lock over the access ? > > > > > > > > > I don't know, it depends on what you're trying to accomplish. If you > > want a > > > fixed-frame delay, you don't need the access and so no mutex is needed. I > > > haven't thought much about what you need to get a variable-frame delay. > > To > > > me, a fixed-frame delay for intra-only codecs (which is what this API > > > currently accomplishes in practice) is exactly what you would expect, > > > variable-frame delay implies you're not actually using all threads. But > > > maybe I misunderstand. > > > > Lower delay is better, so is using fewer threads if theres no need for > > more > > it means less delay for any form of real time commuication > > less memory used, less L2 cache used and consequently better use of > > the CPU. > > If decoding with 1 thread is fast enough for the rate at which packets > > become available using 12 threads will result in more cpu cycles per > > frame, more memory used, more watts, a higher energy bill, ... > > > We already do automatic thread detection, and it's a user-settable option > also. automatic thread detection detects the maximum that might be usefull on the hardware, not the optimal Allowing the user to set the number of threads means the user can set the number but the user does not know the best number for each file, the number also changes at runtime. But even if it didnt change and the user knew the best it would be very inconvenient for the user to set it by hand Its better if it just uses the optimal automatically > > Can we please review the patch? This comment feels offensive I reviewed your patch, or at least tried to. "... the changes done to indexing seem not to change anything, ..." If the changes are not needed, please remove them from the patch or if they are neeed please explain why having an index count to UINT_MAX is confusing, i prefer it to stay within 0..BUFFER_SIZE. '...isnt this just "missing" a finished_task_mutex lock over the access ?' If it just is missing a lock, adding this lock is the fix i would prefer. Of course you can as well use an atomic access or somethig else that keeps tsan from complaining. Supporting changing delay and number of threads at runtime was a big factor in the design of the frame thread encoder. Its something i cared about, so i am not to positive towards patches removing that. [...]
Hi, On Sat, Apr 1, 2017 at 8:19 PM, Michael Niedermayer <michael@niedermayer.cc> wrote: > Supporting changing delay and number of threads at runtime was a > big factor in the design of the frame thread encoder. Its something > i cared about, so i am not to positive towards patches removing that. It introduced a bug. I fixed it. Please accept the fix or fix it yourself differently. Don't advocate keeping a bug in our tree. Ronald
On Sun, Apr 02, 2017 at 07:30:00AM -0400, Ronald S. Bultje wrote: > Hi, > > On Sat, Apr 1, 2017 at 8:19 PM, Michael Niedermayer <michael@niedermayer.cc> > wrote: > > > Supporting changing delay and number of threads at runtime was a > > big factor in the design of the frame thread encoder. Its something > > i cared about, so i am not to positive towards patches removing that. > > > It introduced a bug. I fixed it. Please accept the fix or fix it yourself > differently. Don't advocate keeping a bug in our tree. I would but iam not able to reproduce this warning with gcc-6 (Ubuntu 6.2.0-3ubuntu11~12.04) 6.2.0 20160901 please try to move the finished_task_mutex lock up so it covers the part of the check that needs it [...]
diff --git a/libavcodec/frame_thread_encoder.c b/libavcodec/frame_thread_encoder.c index 27ae356..8db9290 100644 --- a/libavcodec/frame_thread_encoder.c +++ b/libavcodec/frame_thread_encoder.c @@ -251,6 +251,7 @@ int ff_thread_video_encode_frame(AVCodecContext *avctx, AVPacket *pkt, const AVF ThreadContext *c = avctx->internal->frame_thread_encoder; Task task; int ret; + unsigned idx; av_assert1(!*got_packet_ptr); @@ -264,32 +265,33 @@ int ff_thread_video_encode_frame(AVCodecContext *avctx, AVPacket *pkt, const AVF return ret; } - task.index = c->task_index; + task.index = c->task_index % BUFFER_SIZE; task.indata = (void*)new; pthread_mutex_lock(&c->task_fifo_mutex); av_fifo_generic_write(c->task_fifo, &task, sizeof(task), NULL); pthread_cond_signal(&c->task_fifo_cond); pthread_mutex_unlock(&c->task_fifo_mutex); - c->task_index = (c->task_index+1) % BUFFER_SIZE; + c->task_index++; - if(!c->finished_tasks[c->finished_task_index].outdata && (c->task_index - c->finished_task_index) % BUFFER_SIZE <= avctx->thread_count) + if (c->task_index - c->finished_task_index < avctx->thread_count) return 0; } if(c->task_index == c->finished_task_index) return 0; + idx = c->finished_task_index % BUFFER_SIZE; pthread_mutex_lock(&c->finished_task_mutex); - while (!c->finished_tasks[c->finished_task_index].outdata) { + while (!c->finished_tasks[idx].outdata) { pthread_cond_wait(&c->finished_task_cond, &c->finished_task_mutex); } - task = c->finished_tasks[c->finished_task_index]; + task = c->finished_tasks[idx]; *pkt = *(AVPacket*)(task.outdata); if(pkt->data) *got_packet_ptr = 1; - av_freep(&c->finished_tasks[c->finished_task_index].outdata); - c->finished_task_index = (c->finished_task_index+1) % BUFFER_SIZE; + av_freep(&c->finished_tasks[idx].outdata); + c->finished_task_index++; pthread_mutex_unlock(&c->finished_task_mutex); return task.return_code;