From patchwork Mon Jul 1 17:08:04 2024
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 50258
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 1 Jul 2024 20:08:04 +0300
Message-ID: <20240701170807.107018-1-remi@remlab.net>
X-Mailer: git-send-email 2.45.2
List-Id: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [RFC] [PATCH 1/4] lavc/h264_loopfilter: expose tc0_table (for checkasm)

---
 libavcodec/h264_loopfilter.c | 50 ++++++++++++++++++------------------
 libavcodec/h264dsp.h         |  2 ++
 2 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/libavcodec/h264_loopfilter.c b/libavcodec/h264_loopfilter.c
index c164a289b7..9481882dd0 100644
--- a/libavcodec/h264_loopfilter.c
+++ b/libavcodec/h264_loopfilter.c
@@ -66,7 +66,7 @@ static const uint8_t beta_table[52*3] = {
     18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
 };
 
-static const uint8_t tc0_table[52*3][4] = {
+const int8_t ff_h264_tc0_table[52*3][4] = {
     {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 },
     {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 },
     {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 },
@@ -109,10 +109,10 @@ static av_always_inline void filter_mb_edgev(uint8_t *pix, int stride,
 
     if( bS[0] < 4 || !intra ) {
         int8_t tc[4];
-        tc[0] = tc0_table[index_a][bS[0]];
-        tc[1] = tc0_table[index_a][bS[1]];
-        tc[2] = tc0_table[index_a][bS[2]];
-        tc[3] = tc0_table[index_a][bS[3]];
+        tc[0] = ff_h264_tc0_table[index_a][bS[0]];
+        tc[1] = ff_h264_tc0_table[index_a][bS[1]];
+        tc[2] = ff_h264_tc0_table[index_a][bS[2]];
+        tc[3] = ff_h264_tc0_table[index_a][bS[3]];
         h->h264dsp.h264_h_loop_filter_luma(pix, stride, alpha, beta, tc);
     } else {
         h->h264dsp.h264_h_loop_filter_luma_intra(pix, stride, alpha, beta);
@@ -131,10 +131,10 @@ static av_always_inline void filter_mb_edgecv(uint8_t *pix, int stride,
 
     if( bS[0] < 4 || !intra ) {
         int8_t tc[4];
-        tc[0] = tc0_table[index_a][bS[0]]+1;
-        tc[1] = tc0_table[index_a][bS[1]]+1;
-        tc[2] = tc0_table[index_a][bS[2]]+1;
-        tc[3] = tc0_table[index_a][bS[3]]+1;
+        tc[0] = ff_h264_tc0_table[index_a][bS[0]]+1;
+        tc[1] = ff_h264_tc0_table[index_a][bS[1]]+1;
+        tc[2] = ff_h264_tc0_table[index_a][bS[2]]+1;
+        tc[3] = ff_h264_tc0_table[index_a][bS[3]]+1;
         h->h264dsp.h264_h_loop_filter_chroma(pix, stride, alpha, beta, tc);
     } else {
         h->h264dsp.h264_h_loop_filter_chroma_intra(pix, stride, alpha, beta);
@@ -154,10 +154,10 @@ static av_always_inline void filter_mb_mbaff_edgev(const H264Context *h, uint8_t
 
     if( bS[0] < 4 || !intra ) {
         int8_t tc[4];
-        tc[0] = tc0_table[index_a][bS[0*bsi]];
-        tc[1] = tc0_table[index_a][bS[1*bsi]];
-        tc[2] = tc0_table[index_a][bS[2*bsi]];
-        tc[3] = tc0_table[index_a][bS[3*bsi]];
+        tc[0] = ff_h264_tc0_table[index_a][bS[0*bsi]];
+        tc[1] = ff_h264_tc0_table[index_a][bS[1*bsi]];
+        tc[2] = ff_h264_tc0_table[index_a][bS[2*bsi]];
+        tc[3] = ff_h264_tc0_table[index_a][bS[3*bsi]];
         h->h264dsp.h264_h_loop_filter_luma_mbaff(pix, stride, alpha, beta, tc);
     } else {
         h->h264dsp.h264_h_loop_filter_luma_mbaff_intra(pix, stride, alpha, beta);
@@ -177,10 +177,10 @@ static av_always_inline void filter_mb_mbaff_edgecv(const H264Context *h,
 
     if( bS[0] < 4 || !intra ) {
         int8_t tc[4];
-        tc[0] = tc0_table[index_a][bS[0*bsi]] + 1;
-        tc[1] = tc0_table[index_a][bS[1*bsi]] + 1;
-        tc[2] = tc0_table[index_a][bS[2*bsi]] + 1;
-        tc[3] = tc0_table[index_a][bS[3*bsi]] + 1;
+        tc[0] = ff_h264_tc0_table[index_a][bS[0*bsi]] + 1;
+        tc[1] = ff_h264_tc0_table[index_a][bS[1*bsi]] + 1;
+        tc[2] = ff_h264_tc0_table[index_a][bS[2*bsi]] + 1;
+        tc[3] = ff_h264_tc0_table[index_a][bS[3*bsi]] + 1;
         h->h264dsp.h264_h_loop_filter_chroma_mbaff(pix, stride, alpha, beta, tc);
     } else {
         h->h264dsp.h264_h_loop_filter_chroma_mbaff_intra(pix, stride, alpha, beta);
@@ -199,10 +199,10 @@ static av_always_inline void filter_mb_edgeh(uint8_t *pix, int stride,
 
     if( bS[0] < 4 || !intra ) {
         int8_t tc[4];
-        tc[0] = tc0_table[index_a][bS[0]];
-        tc[1] = tc0_table[index_a][bS[1]];
-        tc[2] = tc0_table[index_a][bS[2]];
-        tc[3] = tc0_table[index_a][bS[3]];
+        tc[0] = ff_h264_tc0_table[index_a][bS[0]];
+        tc[1] = ff_h264_tc0_table[index_a][bS[1]];
+        tc[2] = ff_h264_tc0_table[index_a][bS[2]];
+        tc[3] = ff_h264_tc0_table[index_a][bS[3]];
         h->h264dsp.h264_v_loop_filter_luma(pix, stride, alpha, beta, tc);
     } else {
         h->h264dsp.h264_v_loop_filter_luma_intra(pix, stride, alpha, beta);
@@ -221,10 +221,10 @@ static av_always_inline void filter_mb_edgech(uint8_t *pix, int stride,
 
     if( bS[0] < 4 || !intra ) {
         int8_t tc[4];
-        tc[0] = tc0_table[index_a][bS[0]]+1;
-        tc[1] = tc0_table[index_a][bS[1]]+1;
-        tc[2] = tc0_table[index_a][bS[2]]+1;
-        tc[3] = tc0_table[index_a][bS[3]]+1;
+        tc[0] = ff_h264_tc0_table[index_a][bS[0]]+1;
+        tc[1] = ff_h264_tc0_table[index_a][bS[1]]+1;
+        tc[2] = ff_h264_tc0_table[index_a][bS[2]]+1;
+        tc[3] = ff_h264_tc0_table[index_a][bS[3]]+1;
         h->h264dsp.h264_v_loop_filter_chroma(pix, stride, alpha, beta, tc);
     } else {
         h->h264dsp.h264_v_loop_filter_chroma_intra(pix, stride, alpha, beta);
diff --git a/libavcodec/h264dsp.h b/libavcodec/h264dsp.h
index 4a9cb1568d..13371c59ea 100644
--- a/libavcodec/h264dsp.h
+++ b/libavcodec/h264dsp.h
@@ -117,6 +117,8 @@ typedef struct H264DSPContext {
     int (*startcode_find_candidate)(const uint8_t *buf, int size);
 } H264DSPContext;
 
+extern const int8_t ff_h264_tc0_table[][4];
+
 void ff_h264dsp_init(H264DSPContext *c, const int bit_depth,
                      const int chroma_format_idc);
 void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,

From patchwork Mon Jul 1 17:08:05 2024
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 50259
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 1 Jul 2024 20:08:05 +0300
Message-ID: <20240701170807.107018-2-remi@remlab.net>
In-Reply-To: <20240701170807.107018-1-remi@remlab.net>
References: <20240701170807.107018-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/4] lavc/h264_loopfilter: align TC and bS tables

---
 libavcodec/h264_loopfilter.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/h264_loopfilter.c b/libavcodec/h264_loopfilter.c
index 9481882dd0..96f572c1d2 100644
--- a/libavcodec/h264_loopfilter.c
+++ b/libavcodec/h264_loopfilter.c
@@ -66,7 +66,7 @@ static const uint8_t beta_table[52*3] = {
     18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
 };
 
-const int8_t ff_h264_tc0_table[52*3][4] = {
+const DECLARE_ALIGNED_4(int8_t, ff_h264_tc0_table)[52*3][4] = {
     {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 },
     {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 },
     {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 }, {-1, 0, 0, 0 },
@@ -266,8 +266,8 @@ static av_always_inline void h264_filter_mb_fast_internal(const H264Context *h,
     qpc1 = (qpc + qpc1 + 1) >> 1;
 
     if( IS_INTRA(mb_type) ) {
-        static const int16_t bS4[4] = {4,4,4,4};
-        static const int16_t bS3[4] = {3,3,3,3};
+        static const DECLARE_ALIGNED_8(int16_t, bS4)[4] = {4,4,4,4};
+        static const DECLARE_ALIGNED_8(int16_t, bS3)[4] = {3,3,3,3};
         const int16_t *bSH = FIELD_PICTURE(h) ? bS3 : bS4;
         if(left_type)
             filter_mb_edgev( &img_y[4*0<

X-Patchwork-Id: 50260
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 1 Jul 2024 20:08:06 +0300
Message-ID: <20240701170807.107018-3-remi@remlab.net>
In-Reply-To: <20240701170807.107018-1-remi@remlab.net>
References: <20240701170807.107018-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 3/4] WIP: lavc/h264dsp: take over looking up TC values

This moves the look-up of TC values from bS from the generic C loop
filter code to the DSP functions. This (potentially) eliminates a
round-trip to the stack for the looked-up values.

This is work in progress: eight functions need to be updated, and this
patch only updates one of them (h264_h_loop_filter_luma). Updating the
platform-specific optimisations and checkasm is left for a future
version.
---
 libavcodec/h264_loopfilter.c  | 8 ++------
 libavcodec/h264dsp.h          | 4 +++-
 libavcodec/h264dsp_template.c | 9 +++++++--
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/libavcodec/h264_loopfilter.c b/libavcodec/h264_loopfilter.c
index 96f572c1d2..8fca08811c 100644
--- a/libavcodec/h264_loopfilter.c
+++ b/libavcodec/h264_loopfilter.c
@@ -108,12 +108,8 @@ static av_always_inline void filter_mb_edgev(uint8_t *pix, int stride,
     if (alpha ==0 || beta == 0) return;
 
     if( bS[0] < 4 || !intra ) {
-        int8_t tc[4];
-        tc[0] = ff_h264_tc0_table[index_a][bS[0]];
-        tc[1] = ff_h264_tc0_table[index_a][bS[1]];
-        tc[2] = ff_h264_tc0_table[index_a][bS[2]];
-        tc[3] = ff_h264_tc0_table[index_a][bS[3]];
-        h->h264dsp.h264_h_loop_filter_luma(pix, stride, alpha, beta, tc);
+        const int8_t *tc = ff_h264_tc0_table[index_a];
+        h->h264dsp.h264_h_loop_filter_luma(pix, stride, alpha, beta, tc, bS);
     } else {
         h->h264dsp.h264_h_loop_filter_luma_intra(pix, stride, alpha, beta);
     }
diff --git a/libavcodec/h264dsp.h b/libavcodec/h264dsp.h
index 13371c59ea..f37ff5414c 100644
--- a/libavcodec/h264dsp.h
+++ b/libavcodec/h264dsp.h
@@ -48,7 +48,9 @@ typedef struct H264DSPContext {
     void (*h264_v_loop_filter_luma)(uint8_t *pix /*align 16*/, ptrdiff_t stride,
                                     int alpha, int beta, int8_t *tc0);
     void (*h264_h_loop_filter_luma)(uint8_t *pix /*align 4 */, ptrdiff_t stride,
-                                    int alpha, int beta, int8_t *tc0);
+                                    int alpha, int beta,
+                                    const int8_t tc0[4] /*align 4*/,
+                                    const int16_t bs[4] /*align 8*/);
     void (*h264_h_loop_filter_luma_mbaff)(uint8_t *pix /*align 16*/, ptrdiff_t stride,
                                           int alpha, int beta, int8_t *tc0);
     /* v/h_loop_filter_luma_intra: align 16 */
diff --git a/libavcodec/h264dsp_template.c b/libavcodec/h264dsp_template.c
index fe23a2cff1..4d4e34cf81 100644
--- a/libavcodec/h264dsp_template.c
+++ b/libavcodec/h264dsp_template.c
@@ -153,9 +153,14 @@ static void FUNCC(h264_v_loop_filter_luma)(uint8_t *pix, ptrdiff_t stride, int a
 {
     FUNCC(h264_loop_filter_luma)(pix, stride, sizeof(pixel), 4, alpha, beta, tc0);
 }
-static void FUNCC(h264_h_loop_filter_luma)(uint8_t *pix, ptrdiff_t stride, int alpha, int beta, int8_t *tc0)
+static void FUNCC(h264_h_loop_filter_luma)(uint8_t *pix, ptrdiff_t stride, int alpha, int beta, const int8_t tc0[4], const int16_t bS[4])
 {
-    FUNCC(h264_loop_filter_luma)(pix, sizeof(pixel), stride, 4, alpha, beta, tc0);
+    int8_t tc[4];
+
+    for (size_t i = 0; i < 4; i++)
+        tc[i] = tc0[bS[i]];
+
+    FUNCC(h264_loop_filter_luma)(pix, sizeof(pixel), stride, 4, alpha, beta, tc);
 }
 static void FUNCC(h264_h_loop_filter_luma_mbaff)(uint8_t *pix, ptrdiff_t stride, int alpha, int beta, int8_t *tc0)
 {

From patchwork Mon Jul 1 17:08:07 2024
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 50261
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 1 Jul 2024 20:08:07 +0300
Message-ID: <20240701170807.107018-4-remi@remlab.net>
In-Reply-To: <20240701170807.107018-1-remi@remlab.net>
References: <20240701170807.107018-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 4/4] lavc/h264dsp: update R-V V intra luma loop filter

Note that the performance reported by checkasm is slightly worse. This
is expected, since the assembly code is now doing more work.
---
 libavcodec/riscv/h264dsp_init.c | 3 ++-
 libavcodec/riscv/h264dsp_rvv.S  | 6 ++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/libavcodec/riscv/h264dsp_init.c b/libavcodec/riscv/h264dsp_init.c
index ab412a9924..9650cae66b 100644
--- a/libavcodec/riscv/h264dsp_init.c
+++ b/libavcodec/riscv/h264dsp_init.c
@@ -30,7 +30,8 @@
 void ff_h264_v_loop_filter_luma_8_rvv(uint8_t *pix, ptrdiff_t stride,
                                       int alpha, int beta, int8_t *tc0);
 void ff_h264_h_loop_filter_luma_8_rvv(uint8_t *pix, ptrdiff_t stride,
-                                      int alpha, int beta, int8_t *tc0);
+                                      int alpha, int beta, const int8_t *tc0,
+                                      const int16_t *bS);
 void ff_h264_h_loop_filter_luma_mbaff_8_rvv(uint8_t *pix, ptrdiff_t stride,
                                             int alpha, int beta, int8_t *tc0);
 
diff --git a/libavcodec/riscv/h264dsp_rvv.S b/libavcodec/riscv/h264dsp_rvv.S
index 96a8a0a8a3..6bc5406ba3 100644
--- a/libavcodec/riscv/h264dsp_rvv.S
+++ b/libavcodec/riscv/h264dsp_rvv.S
@@ -126,9 +126,11 @@ func ff_h264_v_loop_filter_luma_8_rvv, zve32x
 endfunc
 
 func ff_h264_h_loop_filter_luma_8_rvv, zve32x
-        vsetivli    zero, 4, e32, m1, ta, ma
-        vle8.v      v4, (a4)
+        vsetivli    zero, 4, e8, mf4, ta, ma
+        vle16.v     v8, (a5)
         li          t0, 0x01010101
+        vluxei16.v  v4, (a4), v8
+        vsetivli    zero, 4, e32, m1, ta, ma
         vzext.vf4   v6, v4
         addi        a0, a0, -3
         vmul.vx     v6, v6, t0
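
For readers following the series, the contract change that patches 3/4 and 4/4 make can be sketched in plain C. This is only an illustration under stated assumptions, not code from the tree: `example_tc0_row`, `caller_side_lookup` and `dsp_side_lookup` are hypothetical names, and the row values are made up rather than copied from ff_h264_tc0_table. It shows that resolving tc[i] = tc0[bS[i]] in the caller (old contract) and inside the DSP function (new contract) gives the same four values; the indexed vector load in the RVV version performs the same gather.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one row of ff_h264_tc0_table, i.e.
 * table[index_a]. The values are illustrative only. */
static const int8_t example_tc0_row[4] = { -1, 1, 2, 3 };

/* Old contract: the generic loop filter code resolves tc[] itself and
 * hands the resolved values to the DSP function. */
static void caller_side_lookup(int8_t tc[4], const int8_t row[4],
                               const int16_t bS[4])
{
    tc[0] = row[bS[0]];
    tc[1] = row[bS[1]];
    tc[2] = row[bS[2]];
    tc[3] = row[bS[3]];
}

/* New contract: the DSP function receives the table row and bS[] and
 * performs the same look-up itself. */
static void dsp_side_lookup(int8_t tc[4], const int8_t tc0[4],
                            const int16_t bS[4])
{
    for (size_t i = 0; i < 4; i++) {
        /* A row has four entries, so each index must be 0..3. */
        assert(bS[i] >= 0 && bS[i] < 4);
        tc[i] = tc0[bS[i]];
    }
}
```

Calling both helpers with the same row and the same bS array fills both outputs identically, which is the property a checkasm update for the new signature would need to preserve.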