From patchwork Thu Sep 28 21:08:48 2023
X-Patchwork-Submitter: Niklas Haas
X-Patchwork-Id: 44002
From: Niklas Haas
To: ffmpeg-devel@ffmpeg.org
Cc: Niklas Haas
Date: Thu, 28 Sep 2023 23:08:48 +0200
Message-ID: <20230928210848.95565-1-ffmpeg@haasn.xyz>
Subject: [FFmpeg-devel] [PATCH] lavc/h274: transpose IDCT

From: Niklas Haas

This is mathematically equivalent to what we were doing before, but
gives subtly different results due to rounding (rows first vs columns
first). Doing it this way makes our film grain database generation match
the reference implementation and now produces bit-exact outputs in my
testing.

Rename the transposed variables to be a bit less confusing.
---
 libavcodec/h274.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/libavcodec/h274.c b/libavcodec/h274.c
index a5caf09564d..5709200322e 100644
--- a/libavcodec/h274.c
+++ b/libavcodec/h274.c
@@ -59,13 +59,13 @@ static void init_slice_c(int8_t out[64][64], uint8_t h, uint8_t v,
     //
     // Note: To make the subsequent matrix multiplication cache friendlier, we
     // store each *column* of the starting image in a *row* of `out`
-    for (int y = 0; y <= freq_v; y++) {
-        for (int x = 0; x <= freq_h; x += 4) {
+    for (int l = 0; l <= freq_v; l++) {
+        for (int k = 0; k <= freq_h; k += 4) {
             uint16_t offset = seed % 2048;
-            out[x + 0][y] = Gaussian_LUT[offset + 0];
-            out[x + 1][y] = Gaussian_LUT[offset + 1];
-            out[x + 2][y] = Gaussian_LUT[offset + 2];
-            out[x + 3][y] = Gaussian_LUT[offset + 3];
+            out[l][k + 0] = Gaussian_LUT[offset + 0];
+            out[l][k + 1] = Gaussian_LUT[offset + 1];
+            out[l][k + 2] = Gaussian_LUT[offset + 2];
+            out[l][k + 3] = Gaussian_LUT[offset + 3];
             prng_shift(&seed);
         }
     }
@@ -74,9 +74,9 @@ static void init_slice_c(int8_t out[64][64], uint8_t h, uint8_t v,
 
     // 64x64 inverse integer transform
     for (int y = 0; y < 64; y++) {
-        for (int x = 0; x <= freq_h; x++) {
+        for (int x = 0; x <= freq_v; x++) {
             int32_t sum = 0;
-            for (int p = 0; p <= freq_v; p++)
+            for (int p = 0; p <= freq_h; p++)
                 sum += R64T[y][p] * out[x][p];
             tmp[y][x] = (sum + 128) >> 8;
         }
@@ -85,8 +85,8 @@ static void init_slice_c(int8_t out[64][64], uint8_t h, uint8_t v,
     for (int y = 0; y < 64; y++) {
         for (int x = 0; x < 64; x++) {
             int32_t sum = 0;
-            for (int p = 0; p <= freq_h; p++)
-                sum += tmp[y][p] * R64T[x][p]; // R64T^T = R64
+            for (int p = 0; p <= freq_v; p++)
+                sum += tmp[x][p] * R64T[y][p]; // R64T^T = R64
             // Renormalize and clip to [-127, 127]
             out[y][x] = av_clip((sum + 128) >> 8, -127, 127);
         }
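
The commit message says the two pass orders are mathematically equivalent but differ after rounding. A minimal standalone sketch of that effect follows. It is not FFmpeg code: the 2x2 matrix `A_DEMO`, the input `X_DEMO`, and the helpers `round_shift8`, `left_first`, `right_first` and `orders_differ` are all hypothetical names chosen for illustration. The only thing borrowed from the patch is the `(sum + 128) >> 8` renormalization applied after each pass.

```c
#include <stdint.h>
#include <string.h>

#define N 2

/* Hypothetical 2x2 stand-in for the transform matrix (not R64T). */
static const int32_t A_DEMO[N][N] = { { 128,  128 },
                                      { 128, -128 } };
/* Hypothetical input block. */
static const int32_t X_DEMO[N][N] = { { 1, 0 },
                                      { 0, 1 } };

/* Same renormalization as in the patch: add half, shift right by 8. */
static int32_t round_shift8(int32_t v)
{
    return (v + 128) >> 8;
}

/* out = round(round(A * X) * A^T): multiply from the left first. */
static void left_first(const int32_t A[N][N], const int32_t X[N][N],
                       int32_t out[N][N])
{
    int32_t tmp[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int32_t sum = 0;
            for (int p = 0; p < N; p++)
                sum += A[i][p] * X[p][j];
            tmp[i][j] = round_shift8(sum);
        }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int32_t sum = 0;
            for (int p = 0; p < N; p++)
                sum += tmp[i][p] * A[j][p]; /* A[j][p] == (A^T)[p][j] */
            out[i][j] = round_shift8(sum);
        }
}

/* out = round(A * round(X * A^T)): multiply from the right first. */
static void right_first(const int32_t A[N][N], const int32_t X[N][N],
                        int32_t out[N][N])
{
    int32_t tmp[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int32_t sum = 0;
            for (int p = 0; p < N; p++)
                sum += X[i][p] * A[j][p];
            tmp[i][j] = round_shift8(sum);
        }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int32_t sum = 0;
            for (int p = 0; p < N; p++)
                sum += A[i][p] * tmp[p][j];
            out[i][j] = round_shift8(sum);
        }
}

/* Returns 1 if the two pass orders disagree on the demo input, else 0. */
static int orders_differ(void)
{
    int32_t a[N][N], b[N][N];
    left_first(A_DEMO, X_DEMO, a);
    right_first(A_DEMO, X_DEMO, b);
    return memcmp(a, b, sizeof(a)) != 0;
}
```

Without the intermediate rounding both orders compute A * X * A^T and agree exactly. With it, the demo input produces two different 2x2 results, which is the kind of mismatch that prevents bit-exact output when the pass order differs from the reference.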