X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 43275
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 20 Aug 2023 15:10:20 +0000
Message-Id: <20230820151022.2204421-5-jc@kynesim.co.uk>
In-Reply-To: <20230820151022.2204421-1-jc@kynesim.co.uk>
References: <20230820151022.2204421-1-jc@kynesim.co.uk>
Subject: [FFmpeg-devel] [PATCH v1 4/6] swscale: RGB24->YUV allow odd widths & improve C rounding

Allow odd widths for conversion; it costs very little and simplifies
setup slightly. x86 asm will fall back to the C code if width is odd.

Round to nearest rather than just down. This reduces the Y error
reported by tests/swscale from 3 to 1. x86 asm doesn't mirror the C, so
exact correspondence isn't an issue there.

Signed-off-by: John Cox
---
 libswscale/rgb2rgb_template.c     | 42 ++++++++++++++++++-------------
 libswscale/swscale_unscaled.c     |  5 ++--
 libswscale/x86/rgb2rgb_template.c |  5 ++++
 3 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/libswscale/rgb2rgb_template.c b/libswscale/rgb2rgb_template.c
index e57bfa6545..5503e58a29 100644
--- a/libswscale/rgb2rgb_template.c
+++ b/libswscale/rgb2rgb_template.c
@@ -656,6 +656,8 @@ static void rgb24toyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
     int32_t rv = rgb2yuv[x[6]], gv = rgb2yuv[x[7]], bv = rgb2yuv[x[8]];
     int y;
     const int chromWidth = width >> 1;
+    const int32_t ky = ((16 << 1) + 1) << (RGB2YUV_SHIFT - 1);
+    const int32_t kc = ((128 << 1) + 1) << (RGB2YUV_SHIFT - 1);

     for (y = 0; y < height; y += 2) {
         int i;
@@ -664,9 +666,9 @@ static void rgb24toyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
             unsigned int g = src[6 * i + 1];
             unsigned int r = src[6 * i + 2];

-            unsigned int Y = ((ry * r + gy * g + by * b) >> RGB2YUV_SHIFT) + 16;
-            unsigned int V = ((rv * r + gv * g + bv * b) >> RGB2YUV_SHIFT) + 128;
-            unsigned int U = ((ru * r + gu * g + bu * b) >> RGB2YUV_SHIFT) + 128;
+            unsigned int Y = (ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT;
+            unsigned int V = (rv * r + gv * g + bv * b + kc) >> RGB2YUV_SHIFT;
+            unsigned int U = (ru * r + gu * g + bu * b + kc) >> RGB2YUV_SHIFT;

             udst[i] = U;
             vdst[i] = V;
@@ -676,30 +678,36 @@ static void rgb24toyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
             g = src[6 * i + 4];
             r = src[6 * i + 5];

-            Y = ((ry * r + gy * g + by * b) >> RGB2YUV_SHIFT) + 16;
+            Y = ((ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT);
             ydst[2 * i + 1] = Y;
         }
-        ydst += lumStride;
-        src += srcStride;
-
-        if (y+1 == height)
-            break;
-
-        for (i = 0; i < chromWidth; i++) {
+        if ((width & 1) != 0) {
             unsigned int b = src[6 * i + 0];
             unsigned int g = src[6 * i + 1];
             unsigned int r = src[6 * i + 2];

-            unsigned int Y = ((ry * r + gy * g + by * b) >> RGB2YUV_SHIFT) + 16;
+            unsigned int Y = (ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT;
+            unsigned int V = (rv * r + gv * g + bv * b + kc) >> RGB2YUV_SHIFT;
+            unsigned int U = (ru * r + gu * g + bu * b + kc) >> RGB2YUV_SHIFT;

+            udst[i] = U;
+            vdst[i] = V;
             ydst[2 * i] = Y;
+        }
+        ydst += lumStride;
+        src += srcStride;

-            b = src[6 * i + 3];
-            g = src[6 * i + 4];
-            r = src[6 * i + 5];
+        if (y+1 == height)
+            break;

-            Y = ((ry * r + gy * g + by * b) >> RGB2YUV_SHIFT) + 16;
-            ydst[2 * i + 1] = Y;
+        for (i = 0; i < width; i++) {
+            unsigned int b = src[3 * i + 0];
+            unsigned int g = src[3 * i + 1];
+            unsigned int r = src[3 * i + 2];
+
+            unsigned int Y = (ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT;
+
+            ydst[i] = Y;
         }
         udst += chromStride;
         vdst += chromStride;
diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
index 751bdcb2e4..e10f967755 100644
--- a/libswscale/swscale_unscaled.c
+++ b/libswscale/swscale_unscaled.c
@@ -1994,7 +1994,6 @@ void ff_get_unscaled_swscale(SwsContext *c)
     const enum AVPixelFormat dstFormat = c->dstFormat;
     const int flags = c->flags;
     const int dstH = c->dstH;
-    const int dstW = c->dstW;
     int needsDither;

     needsDither = isAnyRGB(dstFormat) &&
@@ -2052,12 +2051,12 @@ void ff_get_unscaled_swscale(SwsContext *c)
     /* bgr24toYV12 */
     if (srcFormat == AV_PIX_FMT_BGR24 &&
         (dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P) &&
-        !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)) && !(dstW&1))
+        !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)))
         c->convert_unscaled = bgr24ToYv12Wrapper;
     /* rgb24toYV12 */
     if (srcFormat == AV_PIX_FMT_RGB24 &&
         (dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P) &&
-        !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)) && !(dstW&1))
+        !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)))
         c->convert_unscaled = rgb24ToYv12Wrapper;

     /* RGB/BGR -> RGB/BGR (no dither needed forms) */
diff --git a/libswscale/x86/rgb2rgb_template.c b/libswscale/x86/rgb2rgb_template.c
index dc2b4e205a..f90527aa08 100644
--- a/libswscale/x86/rgb2rgb_template.c
+++ b/libswscale/x86/rgb2rgb_template.c
@@ -1555,6 +1555,11 @@ static inline void RENAME(bgr24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_
     int y;
     const x86_reg chromWidth= width>>1;

+    if ((width & 1) != 0) {
+        ff_bgr24toyv12_c(src, ydst, udst, vdst, width, height, lumStride, chromStride, srcStride, rgb2yuv);
+        return;
+    }
+
     if (height > 2) {
         ff_bgr24toyv12_c(src, ydst, udst, vdst, width, 2, lumStride, chromStride, srcStride, rgb2yuv);
         src += 2*srcStride;