From patchwork Wed Apr 5 20:20:04 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ronald S. Bultje" X-Patchwork-Id: 3310 Delivered-To: ffmpegpatchwork@gmail.com Received: by 10.103.44.195 with SMTP id s186csp360349vss; Wed, 5 Apr 2017 13:21:29 -0700 (PDT) X-Received: by 10.223.141.140 with SMTP id o12mr5051829wrb.69.1491423689696; Wed, 05 Apr 2017 13:21:29 -0700 (PDT) Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id f18si9217331wra.120.2017.04.05.13.21.29; Wed, 05 Apr 2017 13:21:29 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 9497C681869; Wed, 5 Apr 2017 23:21:24 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-qt0-f175.google.com (mail-qt0-f175.google.com [209.85.216.175]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 0E6D76808A2 for ; Wed, 5 Apr 2017 23:21:18 +0300 (EEST) Received: by mail-qt0-f175.google.com with SMTP id n21so20776284qta.1 for ; Wed, 05 Apr 2017 13:21:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=3H2a0S9IUg2alVa6yQ1fTjmWDtrNyuVmyrMtJ/Y8sPk=; b=XOZAPSeHZ4lW0wms+eF/2h/t0/FAB4IhHJ35TjZxMIuPbAMhALHjwy/O9Mrx/v05qv b3X3x106xIul/F4D8sySYKJLSY6GrvfzXiYMVVoobZmL+Uj4g3HiVW4vp4ovD+f4e4as VvUxzAYv3ltt41Z8DguWvfG2YVWD+wR6wafxG8R4+8Lhg9U+s/jWn+CyukOdxlNgt+bs CJV3FxQQouHR1ClHwHdIT2M611JtpF9/j1tKJxZDfMiiXseVsL/Ck5nH7BTfpWuJOrvK 1GU6EK3CibEzCSYkvb6JGPBNb//Phka+i5qMht9y/dpU99VYutwma+pRzcifexkavlGB OdmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=3H2a0S9IUg2alVa6yQ1fTjmWDtrNyuVmyrMtJ/Y8sPk=; b=YYBhXEkCU1ssG2rc56hxwoRElC5CvQRoTAwJM+85H3HXM1NcK6sjP9j3HYv1Dq6kSv pINKhrUykWJKqozhBpqunxM7jNW/Az8tr1u4+6M1VCARl9kBn0QY+kqQJi2B5sxlSjHZ SrBWX/l4FaYYgnmKSQV6GMKkD2YNvx3xhKiPoz3dUTsSc4KuIijlYrfUYnzYcXCUiOof OXqECC7b2GhzlyTEPn/r14hUl0c+yuCubBmmvsTyjMy2TN0+zJvw9k9gQa6U1vIBfMGM HVBjcUV/RrJ8Avx2qzx5+AXfTyEUwHCr25fRbYdePh2O5/cdbOKalw40qp+GYWI8tet0 a2aw== X-Gm-Message-State: AFeK/H1K357iXSCA5d/MEhaKUGx/yC3MIabDZkALC2wEugSAYLJosDZYrmxSZN5BnbjohA== X-Received: by 10.200.46.35 with SMTP id r32mr30720220qta.56.1491423679097; Wed, 05 Apr 2017 13:21:19 -0700 (PDT) Received: from localhost.localdomain ([65.206.95.146]) by smtp.gmail.com with ESMTPSA id m12sm14739653qtf.25.2017.04.05.13.21.18 (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 05 Apr 2017 13:21:18 -0700 (PDT) From: "Ronald S. Bultje" To: ffmpeg-devel@ffmpeg.org Date: Wed, 5 Apr 2017 16:20:04 -0400 Message-Id: <1491423604-66388-2-git-send-email-rsbultje@gmail.com> X-Mailer: git-send-email 2.8.1 In-Reply-To: <1491423604-66388-1-git-send-email-rsbultje@gmail.com> References: <1491423604-66388-1-git-send-email-rsbultje@gmail.com> Subject: [FFmpeg-devel] [PATCH 2/2] vp8: make mv_min/max thread-local if using partition threading. X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: "Ronald S. Bultje" MIME-Version: 1.0 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Fixes tsan warnings like this in fate-vp8-test-vector-007: WARNING: ThreadSanitizer: data race (pid=65909) Write of size 4 at 0x7d8c0000e088 by thread T1: #0 vp8_decode_mb_row_sliced vp8.c:2519 (ffmpeg:x86_64+0x100995ede) [..] Previous write of size 4 at 0x7d8c0000e088 by thread T2: #0 vp8_decode_mb_row_sliced vp8.c:2519 (ffmpeg:x86_64+0x100995ede) --- libavcodec/vp8.c | 53 ++++++++++++++++++++++++++++------------------------- libavcodec/vp8.h | 19 ++++++++++++------- 2 files changed, 40 insertions(+), 32 deletions(-) diff --git a/libavcodec/vp8.c b/libavcodec/vp8.c index 9bc1d95..fe7aa23 100644 --- a/libavcodec/vp8.c +++ b/libavcodec/vp8.c @@ -772,7 +772,7 @@ static int vp8_decode_frame_header(VP8Context *s, const uint8_t *buf, int buf_si } static av_always_inline -void clamp_mv(VP8Context *s, VP56mv *dst, const VP56mv *src) +void clamp_mv(VP8mvbounds *s, VP56mv *dst, const VP56mv *src) { dst->x = av_clip(src->x, av_clip(s->mv_min.x, INT16_MIN, INT16_MAX), av_clip(s->mv_max.x, INT16_MIN, INT16_MAX)); @@ -1031,7 +1031,7 @@ void vp7_decode_mvs(VP8Context *s, VP8Macroblock *mb, } static av_always_inline -void vp8_decode_mvs(VP8Context *s, VP8Macroblock *mb, +void vp8_decode_mvs(VP8Context *s, VP8mvbounds *mv_bounds, VP8Macroblock *mb, int mb_x, int mb_y, int layout) { VP8Macroblock *mb_edge[3] = { 0 /* top */, @@ -1102,7 +1102,7 @@ void vp8_decode_mvs(VP8Context *s, VP8Macroblock *mb, if (vp56_rac_get_prob_branchy(c, vp8_mode_contexts[cnt[CNT_NEAREST]][1])) { if (vp56_rac_get_prob_branchy(c, vp8_mode_contexts[cnt[CNT_NEAR]][2])) { /* Choose the best mv out of 0,0 and the nearest mv */ - clamp_mv(s, &mb->mv, &near_mv[CNT_ZERO + (cnt[CNT_NEAREST] >= cnt[CNT_ZERO])]); + clamp_mv(mv_bounds, &mb->mv, &near_mv[CNT_ZERO + (cnt[CNT_NEAREST] >= cnt[CNT_ZERO])]); cnt[CNT_SPLITMV] = ((mb_edge[VP8_EDGE_LEFT]->mode == VP8_MVMODE_SPLIT) + (mb_edge[VP8_EDGE_TOP]->mode == VP8_MVMODE_SPLIT)) * 2 + (mb_edge[VP8_EDGE_TOPLEFT]->mode == VP8_MVMODE_SPLIT); @@ -1116,11 +1116,11 @@ void vp8_decode_mvs(VP8Context *s, VP8Macroblock *mb, mb->bmv[0] = mb->mv; } } else { - clamp_mv(s, &mb->mv, &near_mv[CNT_NEAR]); + clamp_mv(mv_bounds, &mb->mv, &near_mv[CNT_NEAR]); mb->bmv[0] = mb->mv; } } else { - clamp_mv(s, &mb->mv, &near_mv[CNT_NEAREST]); + clamp_mv(mv_bounds, &mb->mv, &near_mv[CNT_NEAREST]); mb->bmv[0] = mb->mv; } } else { @@ -1166,7 +1166,8 @@ void decode_intra4x4_modes(VP8Context *s, VP56RangeCoder *c, VP8Macroblock *mb, } static av_always_inline -void decode_mb_mode(VP8Context *s, VP8Macroblock *mb, int mb_x, int mb_y, +void decode_mb_mode(VP8Context *s, VP8mvbounds *mv_bounds, + VP8Macroblock *mb, int mb_x, int mb_y, uint8_t *segment, uint8_t *ref, int layout, int is_vp7) { VP56RangeCoder *c = &s->c; @@ -1230,7 +1231,7 @@ void decode_mb_mode(VP8Context *s, VP8Macroblock *mb, int mb_x, int mb_y, if (is_vp7) vp7_decode_mvs(s, mb, mb_x, mb_y, layout); else - vp8_decode_mvs(s, mb, mb_x, mb_y, layout); + vp8_decode_mvs(s, mv_bounds, mb, mb_x, mb_y, layout); } else { // intra MB, 16.1 mb->mode = vp8_rac_get_tree(c, vp8_pred16x16_tree_inter, s->prob->pred16x16); @@ -2205,8 +2206,8 @@ void vp78_decode_mv_mb_modes(AVCodecContext *avctx, VP8Frame *curframe, VP8Context *s = avctx->priv_data; int mb_x, mb_y; - s->mv_min.y = -MARGIN; - s->mv_max.y = ((s->mb_height - 1) << 6) + MARGIN; + s->mv_bounds.mv_min.y = -MARGIN; + s->mv_bounds.mv_max.y = ((s->mb_height - 1) << 6) + MARGIN; for (mb_y = 0; mb_y < s->mb_height; mb_y++) { VP8Macroblock *mb = s->macroblocks_base + ((s->mb_width + 1) * (mb_y + 1) + 1); @@ -2214,20 +2215,20 @@ void vp78_decode_mv_mb_modes(AVCodecContext *avctx, VP8Frame *curframe, AV_WN32A(s->intra4x4_pred_mode_left, DC_PRED * 0x01010101); - s->mv_min.x = -MARGIN; - s->mv_max.x = ((s->mb_width - 1) << 6) + MARGIN; + s->mv_bounds.mv_min.x = -MARGIN; + s->mv_bounds.mv_max.x = ((s->mb_width - 1) << 6) + MARGIN; for (mb_x = 0; mb_x < s->mb_width; mb_x++, mb_xy++, mb++) { if (mb_y == 0) AV_WN32A((mb - s->mb_width - 1)->intra4x4_pred_mode_top, DC_PRED * 0x01010101); - decode_mb_mode(s, mb, mb_x, mb_y, curframe->seg_map->data + mb_xy, + decode_mb_mode(s, &s->mv_bounds, mb, mb_x, mb_y, curframe->seg_map->data + mb_xy, prev_frame && prev_frame->seg_map ? prev_frame->seg_map->data + mb_xy : NULL, 1, is_vp7); - s->mv_min.x -= 64; - s->mv_max.x -= 64; + s->mv_bounds.mv_min.x -= 64; + s->mv_bounds.mv_max.x -= 64; } - s->mv_min.y -= 64; - s->mv_max.y -= 64; + s->mv_bounds.mv_min.y -= 64; + s->mv_bounds.mv_max.y -= 64; } } @@ -2325,8 +2326,8 @@ static av_always_inline int decode_mb_row_no_filter(AVCodecContext *avctx, void if (!is_vp7 || mb_y == 0) memset(td->left_nnz, 0, sizeof(td->left_nnz)); - s->mv_min.x = -MARGIN; - s->mv_max.x = ((s->mb_width - 1) << 6) + MARGIN; + td->mv_bounds.mv_min.x = -MARGIN; + td->mv_bounds.mv_max.x = ((s->mb_width - 1) << 6) + MARGIN; for (mb_x = 0; mb_x < s->mb_width; mb_x++, mb_xy++, mb++) { if (c->end <= c->buffer && c->bits >= 0) @@ -2350,7 +2351,7 @@ static av_always_inline int decode_mb_row_no_filter(AVCodecContext *avctx, void dst[2] - dst[1], 2); if (!s->mb_layout) - decode_mb_mode(s, mb, mb_x, mb_y, curframe->seg_map->data + mb_xy, + decode_mb_mode(s, &td->mv_bounds, mb, mb_x, mb_y, curframe->seg_map->data + mb_xy, prev_frame && prev_frame->seg_map ? prev_frame->seg_map->data + mb_xy : NULL, 0, is_vp7); @@ -2397,8 +2398,8 @@ static av_always_inline int decode_mb_row_no_filter(AVCodecContext *avctx, void dst[0] += 16; dst[1] += 8; dst[2] += 8; - s->mv_min.x -= 64; - s->mv_max.x -= 64; + td->mv_bounds.mv_min.x -= 64; + td->mv_bounds.mv_max.x -= 64; if (mb_x == s->mb_width + 1) { update_pos(td, mb_y, s->mb_width + 3); @@ -2504,6 +2505,8 @@ int vp78_decode_mb_row_sliced(AVCodecContext *avctx, void *tdata, int jobnr, int ret; td->thread_nr = threadnr; + td->mv_bounds.mv_min.y = -MARGIN - 64 * threadnr; + td->mv_bounds.mv_max.y = ((s->mb_height - 1) << 6) + MARGIN - 64 * threadnr; for (mb_y = jobnr; mb_y < s->mb_height; mb_y += num_jobs) { atomic_store(&td->thread_mb_pos, mb_y << 16); ret = s->decode_mb_row_no_filter(avctx, tdata, jobnr, threadnr); @@ -2515,8 +2518,8 @@ int vp78_decode_mb_row_sliced(AVCodecContext *avctx, void *tdata, int jobnr, s->filter_mb_row(avctx, tdata, jobnr, threadnr); update_pos(td, mb_y, INT_MAX & 0xFFFF); - s->mv_min.y -= 64; - s->mv_max.y -= 64; + td->mv_bounds.mv_min.y -= 64 * num_jobs; + td->mv_bounds.mv_max.y -= 64 * num_jobs; if (avctx->active_thread_type == FF_THREAD_FRAME) ff_thread_report_progress(&curframe->tf, mb_y, 0); @@ -2662,8 +2665,8 @@ int vp78_decode_frame(AVCodecContext *avctx, void *data, int *got_frame, s->num_jobs = num_jobs; s->curframe = curframe; s->prev_frame = prev_frame; - s->mv_min.y = -MARGIN; - s->mv_max.y = ((s->mb_height - 1) << 6) + MARGIN; + s->mv_bounds.mv_min.y = -MARGIN; + s->mv_bounds.mv_max.y = ((s->mb_height - 1) << 6) + MARGIN; for (i = 0; i < MAX_THREADS; i++) { VP8ThreadData *td = &s->thread_data[i]; atomic_init(&td->thread_mb_pos, 0); diff --git a/libavcodec/vp8.h b/libavcodec/vp8.h index d7e7680..8263997 100644 --- a/libavcodec/vp8.h +++ b/libavcodec/vp8.h @@ -93,6 +93,16 @@ typedef struct VP8Macroblock { VP56mv bmv[16]; } VP8Macroblock; +typedef struct VP8intmv { + int x; + int y; +} VP8intmv; + +typedef struct VP8mvbounds { + VP8intmv mv_min; + VP8intmv mv_max; +} VP8mvbounds; + typedef struct VP8ThreadData { DECLARE_ALIGNED(16, int16_t, block)[6][4][16]; DECLARE_ALIGNED(16, int16_t, block_dc)[16]; @@ -122,6 +132,7 @@ typedef struct VP8ThreadData { #define EDGE_EMU_LINESIZE 32 DECLARE_ALIGNED(16, uint8_t, edge_emu_buffer)[21 * EDGE_EMU_LINESIZE]; VP8FilterStrength *filter_strength; + VP8mvbounds mv_bounds; } VP8ThreadData; typedef struct VP8Frame { @@ -129,11 +140,6 @@ typedef struct VP8Frame { AVBufferRef *seg_map; } VP8Frame; -typedef struct VP8intmv { - int x; - int y; -} VP8intmv; - #define MAX_THREADS 8 typedef struct VP8Context { VP8ThreadData *thread_data; @@ -152,8 +158,7 @@ typedef struct VP8Context { uint8_t deblock_filter; uint8_t mbskip_enabled; uint8_t profile; - VP8intmv mv_min; - VP8intmv mv_max; + VP8mvbounds mv_bounds; int8_t sign_bias[4]; ///< one state [0, 1] per ref frame type int ref_count[3];