From patchwork Sun Sep 22 03:39:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51690
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Cc: Zhao Zhili
Date: Sun, 22 Sep 2024 11:39:14 +0800
X-OQ-MSGID: <20240922033914.22270-1-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
Subject: [FFmpeg-devel] [PATCH v4] avcodec/vvc: Don't use large array on stack
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

From: Zhao Zhili

tmp_array in dmvr_hv takes 33024 bytes on stack, which can be
dangerous.
---
v4:
1. Add DMVR_FILTER2 macro
2. Process first line out of loop to remove condition check.

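
Note on the 33024-byte figure: it can be reproduced from the old declaration of tmp_array. The sketch below assumes MAX_PB_SIZE is 128 and BILINEAR_EXTRA is 1. Neither value appears in this patch; they are assumptions chosen because they match the quoted size. The helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: not shown in this patch, chosen because they
 * reproduce the 33024-byte figure from the commit message. */
#define MAX_PB_SIZE    128
#define BILINEAR_EXTRA 1

/* Bytes used by the old full-block temporary (hypothetical helper). */
static size_t old_tmp_bytes(void)
{
    return sizeof(int16_t[(MAX_PB_SIZE + BILINEAR_EXTRA) * MAX_PB_SIZE]);
}

/* Bytes used by the new two-row temporary (hypothetical helper). */
static size_t new_tmp_bytes(void)
{
    return sizeof(int16_t[MAX_PB_SIZE * 2]);
}
```

Under those assumptions the old buffer is (128 + 1) * 128 * 2 = 33024 bytes and the new one is 128 * 2 * 2 = 512 bytes.
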
 libavcodec/vvc/inter_template.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
index c073a73e76..aee4994c17 100644
--- a/libavcodec/vvc/inter_template.c
+++ b/libavcodec/vvc/inter_template.c
@@ -472,6 +472,9 @@ static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, const i
     (filter[0] * src[x] +                                                   \
     filter[1] * src[x + stride])
 
+#define DMVR_FILTER2(filter, src0, src1)                                    \
+    (filter[0] * src0 + filter[1] * src1)
+
 //8.5.3.2.2 Luma sample bilinear interpolation process
 static void FUNC(dmvr)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
     const int height, const intptr_t mx, const intptr_t my, const int width)
@@ -541,31 +544,31 @@ static void FUNC(dmvr_v)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src
 static void FUNC(dmvr_hv)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
     const int height, const intptr_t mx, const intptr_t my, const int width)
 {
-    int16_t tmp_array[(MAX_PB_SIZE + BILINEAR_EXTRA) * MAX_PB_SIZE];
-    int16_t *tmp = tmp_array;
+    int16_t tmp_array[MAX_PB_SIZE * 2];
+    int16_t *tmp0 = tmp_array;
+    int16_t *tmp1 = tmp_array + MAX_PB_SIZE;
     const pixel *src = (const pixel*)_src;
     const ptrdiff_t src_stride = _src_stride / sizeof(pixel);
-    const int8_t *filter = ff_vvc_inter_luma_dmvr_filters[mx];
+    const int8_t *filter_x = ff_vvc_inter_luma_dmvr_filters[mx];
+    const int8_t *filter_y = ff_vvc_inter_luma_dmvr_filters[my];
     const int shift1 = BIT_DEPTH - 6;
     const int offset1 = 1 << (shift1 - 1);
     const int shift2 = 4;
     const int offset2 = 1 << (shift2 - 1);
 
     src -= BILINEAR_EXTRA_BEFORE * src_stride;
-    for (int y = 0; y < height + BILINEAR_EXTRA; y++) {
-        for (int x = 0; x < width; x++)
-            tmp[x] = (DMVR_FILTER(src, 1) + offset1) >> shift1;
-        src += src_stride;
-        tmp += MAX_PB_SIZE;
-    }
+    for (int x = 0; x < width; x++)
+        tmp0[x] = (DMVR_FILTER2(filter_x, src[x], src[x + 1]) + offset1) >> shift1;
+    src += src_stride;
 
-    tmp = tmp_array + BILINEAR_EXTRA_BEFORE * MAX_PB_SIZE;
-    filter = ff_vvc_inter_luma_dmvr_filters[my];
-    for (int y = 0; y < height; y++) {
-        for (int x = 0; x < width; x++)
-            dst[x] = (DMVR_FILTER(tmp, MAX_PB_SIZE) + offset2) >> shift2;
-        tmp += MAX_PB_SIZE;
+    for (int y = 1; y < height + BILINEAR_EXTRA; y++) {
+        for (int x = 0; x < width; x++) {
+            tmp1[x] = (DMVR_FILTER2(filter_x, src[x], src[x + 1]) + offset1) >> shift1;
+            dst[x] = (DMVR_FILTER2(filter_y, tmp0[x], tmp1[x]) + offset2) >> shift2;
+        }
+        src += src_stride;
         dst += MAX_PB_SIZE;
+        FFSWAP(int16_t *, tmp0, tmp1);
     }
 }
 
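
Illustration only, not part of the patch: the change relies on the vertical two-tap filter needing just the previous and the current intermediate row, so a two-row rolling buffer should give the same output as buffering the whole block. The standalone sketch below checks that on a small block. Everything in it is an assumption made for the example: the block size (W, H), 10-bit samples (so shift1 = 4), the filter taps, and all function names. It does not use the FFmpeg macros or tables.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define W 8                       /* hypothetical block width */
#define H 4                       /* hypothetical block height */
#define SRC_STRIDE (W + 1)        /* one extra column for the x + 1 tap */
#define SHIFT1 4                  /* BIT_DEPTH - 6, assuming 10-bit samples */
#define SHIFT2 4
#define OFFSET1 (1 << (SHIFT1 - 1))
#define OFFSET2 (1 << (SHIFT2 - 1))

/* Horizontal two-tap pass over one row. */
static void hfilter_row(int16_t *out, const uint16_t *src, const int8_t *fx)
{
    for (int x = 0; x < W; x++)
        out[x] = (fx[0] * src[x] + fx[1] * src[x + 1] + OFFSET1) >> SHIFT1;
}

/* Reference: buffer all H + 1 intermediate rows, then filter vertically. */
static void hv_full(int16_t *dst, const uint16_t *src,
                    const int8_t *fx, const int8_t *fy)
{
    int16_t tmp[(H + 1) * W];

    for (int y = 0; y < H + 1; y++)
        hfilter_row(tmp + y * W, src + y * SRC_STRIDE, fx);
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            dst[y * W + x] = (fy[0] * tmp[y * W + x] +
                              fy[1] * tmp[(y + 1) * W + x] + OFFSET2) >> SHIFT2;
}

/* Rolling variant: keep only the previous and current intermediate rows. */
static void hv_rolling(int16_t *dst, const uint16_t *src,
                       const int8_t *fx, const int8_t *fy)
{
    int16_t rows[2 * W];
    int16_t *prev = rows, *cur = rows + W;

    hfilter_row(prev, src, fx);                 /* first row, outside the loop */
    for (int y = 1; y < H + 1; y++) {
        hfilter_row(cur, src + y * SRC_STRIDE, fx);
        for (int x = 0; x < W; x++)
            dst[(y - 1) * W + x] = (fy[0] * prev[x] + fy[1] * cur[x] + OFFSET2) >> SHIFT2;
        int16_t *t = prev; prev = cur; cur = t; /* swap roles of the two rows */
    }
}

/* Returns 1 if both variants agree on a deterministic test pattern. */
static int variants_match(void)
{
    uint16_t src[(H + 1) * SRC_STRIDE];
    int16_t a[H * W], b[H * W];
    const int8_t fx[2] = { 11, 5 }, fy[2] = { 3, 13 }; /* hypothetical taps summing to 16 */

    for (int y = 0; y < H + 1; y++)
        for (int x = 0; x < SRC_STRIDE; x++)
            src[y * SRC_STRIDE + x] = (uint16_t)((y * 37 + x * 101) & 1023);
    hv_full(a, src, fx, fy);
    hv_rolling(b, src, fx, fy);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling variants_match() compares the two outputs element by element. The rolling version computes each intermediate row exactly once, as the reference does, so only the temporary storage differs.
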