From patchwork Mon Aug  8 18:23:58 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Timo Rothenpieler <timo@rothenpieler.org>
X-Patchwork-Id: 37192
From: Timo Rothenpieler <timo@rothenpieler.org>
To: ffmpeg-devel@ffmpeg.org
Date: Mon,  8 Aug 2022 20:23:58 +0200
Message-Id: <20220808182358.24264-1-timo@rothenpieler.org>
X-Mailer: git-send-email 2.34.1
Subject: [FFmpeg-devel] [PATCH] swscale/input: add rgbaf16 input support
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Timo Rothenpieler <timo@rothenpieler.org>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
X-TUID: BOdb7J0ip4ax

This is by no means perfect: at least ddagrab returns scRGB data, where
HDR values lie outside the 0.0f to 1.0f range and get clipped here.
The primary purpose is to make it possible to work with the format at all.

_Float16 support has been available on arm/aarch64 for a while, and gcc
12 enabled it on x86 as long as SSE2 is supported.

If the target arch supports f16c, gcc takes advantage of it and emits
fairly efficient assembly. This is the case on x86-64-v3 or higher.
Without f16c, gcc emulates the conversions in software using SSE2
instructions.
---

I am by no means certain this is the correct way to implement this.
I tested it with ddagrab output in that format, and the result looks
like what I'd expect.
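
For reference, the per-sample math in the new functions boils down to
roughly the following sketch. It decodes the half-float bits by hand so
that it builds without _Float16 support. The demo_* helper names are
made up for illustration and are not part of the patch, and the sketch
rounds half up and maps NaN to 0, where the patch relies on
lrintf(av_clipf(...)).

```c
#include <stdint.h>
#include <string.h>

/* Illustrative helper (not in the patch): decode IEEE 754 binary16 bits
 * into a float without relying on compiler _Float16 support. */
static float demo_half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    uint32_t bits;
    float out;

    if (exp == 0) {
        /* Zero and subnormals: value is mant * 2^-24. */
        float mag = (float)mant * (1.0f / 16777216.0f);
        return sign ? -mag : mag;
    }
    if (exp == 31)
        bits = sign | 0x7F800000u | (mant << 13);      /* Inf / NaN */
    else
        bits = sign | ((exp + 112) << 23) | (mant << 13); /* rebias 15 -> 127 */

    memcpy(&out, &bits, sizeof(out));
    return out;
}

/* Illustrative helper (not in the patch): scale a sample to 16 bits and
 * clip it, so anything outside 0.0f..1.0f (e.g. scRGB HDR values)
 * saturates. Rounds half up, unlike lrintf in the patch. */
static uint16_t demo_sample_to_u16(float v)
{
    float scaled = 65535.0f * v;

    if (!(scaled > 0.0f))       /* negatives, zero and NaN */
        return 0;
    if (scaled >= 65535.0f)     /* 1.0f and everything brighter */
        return 65535;
    return (uint16_t)(scaled + 0.5f);
}
```

With these helpers, a half-float 2.0 (bits 0x4000) and a half-float 1.0
(bits 0x3C00) both end up as 65535, which is exactly the HDR information
loss mentioned in the commit message.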

The order of arguments in particular is a bit of a mystery. I'd have
expected them to follow the order of the planes, so that for packed
formats only the first one would matter.
But a number of other packed formats leave the first src unused in
their UV functions, so I followed along, and it ended up working fine.
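
A toy model of that convention, as I understand it: the luma reader
takes its data from the first source pointer, the chroma reader from
the second and third. This only works if the caller points all of them
at the same packed buffer, which is an assumption here (the
av_assert1(src1==src2) in the patch suggests as much). All demo_* names
are hypothetical, and 8-bit RGBA stands in for the real format.

```c
#include <stdint.h>

/* Hypothetical luma reader: uses the first source pointer only. */
static void demo_packed_luma(uint8_t *dst, const uint8_t *src,
                             const uint8_t *unused0, const uint8_t *unused1,
                             int width)
{
    (void)unused0;
    (void)unused1;
    for (int i = 0; i < width; i++)
        dst[i] = src[i * 4 + 1];    /* green as a stand-in for luma */
}

/* Hypothetical chroma reader: ignores the first pointer and reads from
 * the second and third, mirroring the ToUV functions in the patch. */
static void demo_packed_chroma(uint8_t *dstU, uint8_t *dstV,
                               const uint8_t *unused0,
                               const uint8_t *src1, const uint8_t *src2,
                               int width)
{
    (void)unused0;
    for (int i = 0; i < width; i++) {
        dstU[i] = src1[i * 4 + 2];  /* blue as a stand-in for U */
        dstV[i] = src2[i * 4 + 0];  /* red as a stand-in for V */
    }
}

/* Hypothetical caller: for a packed format, every plane pointer is
 * assumed to alias the single packed buffer. */
static void demo_convert_row(const uint8_t *packed, int width,
                             uint8_t *y, uint8_t *u, uint8_t *v)
{
    const uint8_t *src[3] = { packed, packed, packed };

    demo_packed_luma(y, src[0], src[1], src[2], width);
    demo_packed_chroma(u, v, src[0], src[1], src[2], width);
}
```

If the caller aliases the pointers like this, it does not matter which
of them a packed-format reader picks, which would explain why following
the existing functions worked.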

 configure            |  2 +
 libswscale/input.c   | 95 ++++++++++++++++++++++++++++++++++++++++++++
 libswscale/utils.c   |  3 ++
 libswscale/version.h |  2 +-
 4 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 6761d0cb32..d989498bba 100755
--- a/configure
+++ b/configure
@@ -2143,6 +2143,7 @@ ARCH_FEATURES="
     fast_64bit
     fast_clz
     fast_cmov
+    float16
     local_aligned
     simd_align_16
     simd_align_32
@@ -6228,6 +6229,7 @@ check_builtin MemoryBarrier windows.h "MemoryBarrier()"
 check_builtin sync_val_compare_and_swap "" "int *ptr; int oldval, newval; __sync_val_compare_and_swap(ptr, oldval, newval)"
 check_builtin gmtime_r time.h "time_t *time; struct tm *tm; gmtime_r(time, tm)"
 check_builtin localtime_r time.h "time_t *time; struct tm *tm; localtime_r(time, tm)"
+check_builtin float16 "" "_Float16 f16var"
 
 case "$custom_allocator" in
     jemalloc)
diff --git a/libswscale/input.c b/libswscale/input.c
index 68abc4d62c..0b5bd952e8 100644
--- a/libswscale/input.c
+++ b/libswscale/input.c
@@ -1111,6 +1111,89 @@ static void grayf32##endian_name##ToY16_c(uint8_t *dst, const uint8_t *src,
 rgbf32_planar_funcs_endian(le, 0)
 rgbf32_planar_funcs_endian(be, 1)
 
+static void rgbaf16ToUV_half_c(uint8_t *_dstU, uint8_t *_dstV,
+                               const uint8_t *unused0, const uint8_t *src1, const uint8_t *src2,
+                               int width, uint32_t *_rgb2yuv)
+{
+#if HAVE_FLOAT16
+    const _Float16 *src = (const _Float16*)src1;
+    uint16_t *dstU = (uint16_t*)_dstU;
+    uint16_t *dstV = (uint16_t*)_dstV;
+    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
+    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
+    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
+    int i;
+    av_assert1(src1==src2);
+    for (i = 0; i < width; i++) {
+        int r = (lrintf(av_clipf(65535.0f * src[i*8+0], 0.0f, 65535.0f)) +
+                 lrintf(av_clipf(65535.0f * src[i*8+4], 0.0f, 65535.0f))) >> 1;
+        int g = (lrintf(av_clipf(65535.0f * src[i*8+1], 0.0f, 65535.0f)) +
+                 lrintf(av_clipf(65535.0f * src[i*8+5], 0.0f, 65535.0f))) >> 1;
+        int b = (lrintf(av_clipf(65535.0f * src[i*8+2], 0.0f, 65535.0f)) +
+                 lrintf(av_clipf(65535.0f * src[i*8+6], 0.0f, 65535.0f))) >> 1;
+
+        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
+        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
+    }
+#endif
+}
+
+static void rgbaf16ToUV_c(uint8_t *_dstU, uint8_t *_dstV,
+                          const uint8_t *unused0, const uint8_t *src1, const uint8_t *src2,
+                          int width, uint32_t *_rgb2yuv)
+{
+#if HAVE_FLOAT16
+    const _Float16 *src = (const _Float16*)src1;
+    uint16_t *dstU = (uint16_t*)_dstU;
+    uint16_t *dstV = (uint16_t*)_dstV;
+    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
+    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
+    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
+    int i;
+    av_assert1(src1==src2);
+    for (i = 0; i < width; i++) {
+        int r = lrintf(av_clipf(65535.0f * src[i*4+0], 0.0f, 65535.0f));
+        int g = lrintf(av_clipf(65535.0f * src[i*4+1], 0.0f, 65535.0f));
+        int b = lrintf(av_clipf(65535.0f * src[i*4+2], 0.0f, 65535.0f));
+
+        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
+        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
+    }
+#endif
+}
+
+static void rgbaf16ToY_c(uint8_t *_dst, const uint8_t *_src, const uint8_t *unused0, const uint8_t *unused1,
+                         int width, uint32_t *_rgb2yuv)
+{
+#if HAVE_FLOAT16
+    const _Float16 *src = (const _Float16*)_src;
+    uint16_t *dst = (uint16_t*)_dst;
+    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
+    int32_t ry = rgb2yuv[RY_IDX], gy = rgb2yuv[GY_IDX], by = rgb2yuv[BY_IDX];
+    int i;
+    for (i = 0; i < width; i++) {
+        int r = lrintf(av_clipf(65535.0f * src[i*4+0], 0.0f, 65535.0f));
+        int g = lrintf(av_clipf(65535.0f * src[i*4+1], 0.0f, 65535.0f));
+        int b = lrintf(av_clipf(65535.0f * src[i*4+2], 0.0f, 65535.0f));
+
+        dst[i] = (ry*r + gy*g + by*b + (0x2001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
+    }
+#endif
+}
+
+static void rgbaf16ToA_c(uint8_t *_dst, const uint8_t *_src, const uint8_t *unused0, const uint8_t *unused1,
+                         int width, uint32_t *unused2)
+{
+#if HAVE_FLOAT16
+    const _Float16 *src = (const _Float16*)_src;
+    uint16_t *dst = (uint16_t*)_dst;
+    int i;
+    for (i=0; i<width; i++) {
+        dst[i] = lrintf(av_clipf(65535.0f * src[i*4+3], 0.0f, 65535.0f));
+    }
+#endif
+}
+
 av_cold void ff_sws_init_input_funcs(SwsContext *c)
 {
     enum AVPixelFormat srcFormat = c->srcFormat;
@@ -1375,6 +1458,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
         case AV_PIX_FMT_X2BGR10LE:
             c->chrToYV12 = bgr30leToUV_half_c;
             break;
+        case AV_PIX_FMT_RGBAF16:
+            c->chrToYV12 = rgbaf16ToUV_half_c;
+            break;
         }
     } else {
         switch (srcFormat) {
@@ -1462,6 +1548,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
         case AV_PIX_FMT_X2BGR10LE:
             c->chrToYV12 = bgr30leToUV_c;
             break;
+        case AV_PIX_FMT_RGBAF16:
+            c->chrToYV12 = rgbaf16ToUV_c;
+            break;
         }
     }
 
@@ -1750,6 +1839,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
     case AV_PIX_FMT_X2BGR10LE:
         c->lumToYV12 = bgr30leToY_c;
         break;
+    case AV_PIX_FMT_RGBAF16:
+        c->lumToYV12 = rgbaf16ToY_c;
+        break;
     }
     if (c->needAlpha) {
         if (is16BPS(srcFormat) || isNBPS(srcFormat)) {
@@ -1769,6 +1861,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
         case AV_PIX_FMT_ARGB:
             c->alpToYV12 = abgrToA_c;
             break;
+        case AV_PIX_FMT_RGBAF16:
+            c->alpToYV12 = rgbaf16ToA_c;
+            break;
         case AV_PIX_FMT_YA8:
             c->alpToYV12 = uyvyToY_c;
             break;
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 34503e57f4..c5c22017ff 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -259,6 +259,9 @@ static const FormatEntry format_entries[] = {
     [AV_PIX_FMT_P416LE]      = { 1, 1 },
     [AV_PIX_FMT_NV16]        = { 1, 1 },
     [AV_PIX_FMT_VUYA]        = { 1, 1 },
+#if HAVE_FLOAT16
+    [AV_PIX_FMT_RGBAF16]     = { 1, 0 },
+#endif
 };
 
 int ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos,
diff --git a/libswscale/version.h b/libswscale/version.h
index 3193562d18..d8694bb5c0 100644
--- a/libswscale/version.h
+++ b/libswscale/version.h
@@ -29,7 +29,7 @@
 #include "version_major.h"
 
 #define LIBSWSCALE_VERSION_MINOR   8
-#define LIBSWSCALE_VERSION_MICRO 102
+#define LIBSWSCALE_VERSION_MICRO 103
 
 #define LIBSWSCALE_VERSION_INT  AV_VERSION_INT(LIBSWSCALE_VERSION_MAJOR, \
                                                LIBSWSCALE_VERSION_MINOR, \