From patchwork Fri Sep 16 14:52:22 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 37956
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Cc: Andreas Rheinhardt
Date: Fri, 16 Sep 2022 16:52:22 +0200
Subject: [FFmpeg-devel] [PATCH v2 1/2] swscale/input: Avoid calls to av_pix_fmt_desc_get()

Up until now, libswscale/input.c used a macro to read an input pixel
which called av_pix_fmt_desc_get() (via isBE()) to find out whether the
input pixel format is BE or LE, despite this being known at compile time
(there are templates per pixfmt). Even worse, these calls are made in
a loop, so that e.g. there are six calls to av_pix_fmt_desc_get() for
every pair of UV pixels processed in rgb64ToUV_half_c_template().

This commit modifies these macros so that the endianness check is
evaluated at compile time instead of via isBE() at runtime. This saved
9743B of .text for me (GCC 11.2, -O3).
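To illustrate the mechanism in isolation, here is a minimal, self-contained
C sketch (not part of this patch). Only the IS_BE() token pasting is taken
from the diff; READ_BE16()/READ_LE16(), sum16_template() and SUM16_FUNCS()
are hypothetical stand-ins for AV_RB16()/AV_RL16(), the *_c_template()
functions and the RGB64FUNCS() wrappers. Each generated wrapper passes a
literal 0 or 1 as is_be, so after inlining the compiler can drop the untaken
branch. Whether it actually does so depends on inlining, which FFmpeg forces
via av_always_inline and which this sketch merely requests with static inline.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for libavutil's AV_RB16()/AV_RL16(): read a 16-bit
 * value stored big- or little-endian, independent of host byte order. */
#define READ_BE16(p) ((uint16_t)(((const uint8_t *)(p))[0] << 8 | ((const uint8_t *)(p))[1]))
#define READ_LE16(p) ((uint16_t)(((const uint8_t *)(p))[1] << 8 | ((const uint8_t *)(p))[0]))

/* Same token-pasting trick as the patch: IS_BE(BE) -> IS_BE_BE -> 1 and
 * IS_BE(LE) -> IS_BE_LE -> 0, i.e. an integer constant expression instead
 * of the result of a runtime descriptor lookup. */
#define IS_BE_LE 0
#define IS_BE_BE 1
#define IS_BE(ENDIAN_IDENTIFIER) IS_BE_ ## ENDIAN_IDENTIFIER

/* Proves at compile time that IS_BE() needs no runtime work (C11). */
_Static_assert(IS_BE(BE) == 1 && IS_BE(LE) == 0, "IS_BE() must be constant");

/* Template: is_be is an ordinary parameter, but every caller passes a
 * constant, so once inlined the ternary can be folded away. */
static inline uint32_t sum16_template(const uint8_t *src, int n, int is_be)
{
    uint32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += is_be ? READ_BE16(src + 2 * i) : READ_LE16(src + 2 * i);
    return sum;
}

/* Hypothetical per-endianness wrappers, mirroring RGB64FUNCS(): the suffix
 * is pasted into the function name and also turned into is_be. */
#define SUM16_FUNCS(endianness) \
static uint32_t sum16 ## endianness(const uint8_t *src, int n) \
{ \
    return sum16_template(src, n, IS_BE(endianness)); \
}

SUM16_FUNCS(LE) /* defines sum16LE() */
SUM16_FUNCS(BE) /* defines sum16BE() */
```

E.g. for the bytes 01 02 03 04, sum16LE(buf, 2) returns 0x0201 + 0x0403
and sum16BE(buf, 2) returns 0x0102 + 0x0304.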
For a simple RGBA64LE->YUV420P transformation like

ffmpeg -f lavfi -i haldclutsrc,format=rgba64le -pix_fmt yuv420p \
    -threads 1 -t 1:00 -f null -

the amount of decicycles spent in rgb64LEToUV_half_c (which is created
via the template mentioned above) decreases from 19751 to 5341; for
RGBA64BE the number went down from 11945 to 5393. For shared builds
(where the call to av_pix_fmt_desc_get() is indirect) the old numbers
were 15230 for RGBA64BE and 27502 for RGBA64LE, whereas the numbers
with this patch are indistinguishable from those of a static build.

Also make the macros that are touched conform to the usual convention
of using uppercase names while at it.

Signed-off-by: Andreas Rheinhardt
---
 libswscale/input.c | 122 +++++++++++++++++++++++++--------------------
 1 file changed, 68 insertions(+), 54 deletions(-)

diff --git a/libswscale/input.c b/libswscale/input.c
index 88e318e664..7ff7bfaa01 100644
--- a/libswscale/input.c
+++ b/libswscale/input.c
@@ -28,14 +28,21 @@
 #include "config.h"
 #include "swscale_internal.h"
 
-#define input_pixel(pos) (isBE(origin) ? AV_RB16(pos) : AV_RL16(pos))
+#define input_pixel(pos) (is_be ? AV_RB16(pos) : AV_RL16(pos))
+
+#define IS_BE_LE 0
+#define IS_BE_BE 1
+#define IS_BE_ 0
+/* ENDIAN_IDENTIFIER needs to be "BE", "LE" or "". The latter is intended
+ * for single-byte cases where the concept of endianness does not apply. */
+#define IS_BE(ENDIAN_IDENTIFIER) IS_BE_ ## ENDIAN_IDENTIFIER
 
 #define r ((origin == AV_PIX_FMT_BGR48BE || origin == AV_PIX_FMT_BGR48LE || origin == AV_PIX_FMT_BGRA64BE || origin == AV_PIX_FMT_BGRA64LE) ? b_r : r_b)
 #define b ((origin == AV_PIX_FMT_BGR48BE || origin == AV_PIX_FMT_BGR48LE || origin == AV_PIX_FMT_BGRA64BE || origin == AV_PIX_FMT_BGRA64LE) ? r_b : b_r)
 
 static av_always_inline void
 rgb64ToY_c_template(uint16_t *dst, const uint16_t *src, int width,
-                    enum AVPixelFormat origin, int32_t *rgb2yuv)
+                    enum AVPixelFormat origin, int32_t *rgb2yuv, int is_be)
 {
     int32_t ry = rgb2yuv[RY_IDX], gy = rgb2yuv[GY_IDX], by = rgb2yuv[BY_IDX];
     int i;
@@ -51,7 +58,7 @@ rgb64ToY_c_template(uint16_t *dst, const uint16_t *src, int width,
 static av_always_inline void
 rgb64ToUV_c_template(uint16_t *dstU, uint16_t *dstV,
                      const uint16_t *src1, const uint16_t *src2,
-                     int width, enum AVPixelFormat origin, int32_t *rgb2yuv)
+                     int width, enum AVPixelFormat origin, int32_t *rgb2yuv, int is_be)
 {
     int i;
     int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
@@ -70,7 +77,7 @@ rgb64ToUV_c_template(uint16_t *dstU, uint16_t *dstV,
 static av_always_inline void
 rgb64ToUV_half_c_template(uint16_t *dstU, uint16_t *dstV,
                           const uint16_t *src1, const uint16_t *src2,
-                          int width, enum AVPixelFormat origin, int32_t *rgb2yuv)
+                          int width, enum AVPixelFormat origin, int32_t *rgb2yuv, int is_be)
 {
     int i;
     int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
@@ -86,13 +93,13 @@ rgb64ToUV_half_c_template(uint16_t *dstU, uint16_t *dstV,
     }
 }
 
-#define rgb64funcs(pattern, BE_LE, origin) \
+#define RGB64FUNCS_EXT(pattern, BE_LE, origin, is_be) \
 static void pattern ## 64 ## BE_LE ## ToY_c(uint8_t *_dst, const uint8_t *_src, const uint8_t *unused0, const uint8_t *unused1,\
                     int width, uint32_t *rgb2yuv, void *opq) \
 { \
     const uint16_t *src = (const uint16_t *) _src; \
     uint16_t *dst = (uint16_t *) _dst; \
-    rgb64ToY_c_template(dst, src, width, origin, rgb2yuv); \
+    rgb64ToY_c_template(dst, src, width, origin, rgb2yuv, is_be); \
 } \
 \
 static void pattern ## 64 ## BE_LE ## ToUV_c(uint8_t *_dstU, uint8_t *_dstV, \
@@ -102,7 +109,7 @@ static void pattern ## 64 ## BE_LE ## ToUV_c(uint8_t *_dstU, uint8_t *_dstV, \
     const uint16_t *src1 = (const uint16_t *) _src1, \
                    *src2 = (const uint16_t *) _src2; \
     uint16_t *dstU = (uint16_t *) _dstU, *dstV = (uint16_t *) _dstV; \
-    rgb64ToUV_c_template(dstU, dstV, src1, src2, width, origin, rgb2yuv); \
+    rgb64ToUV_c_template(dstU, dstV, src1, src2, width, origin, rgb2yuv, is_be); \
 } \
 \
 static void pattern ## 64 ## BE_LE ## ToUV_half_c(uint8_t *_dstU, uint8_t *_dstV, \
@@ -112,18 +119,20 @@ static void pattern ## 64 ## BE_LE ## ToUV_half_c(uint8_t *_dstU, uint8_t *_dstV
     const uint16_t *src1 = (const uint16_t *) _src1, \
                    *src2 = (const uint16_t *) _src2; \
     uint16_t *dstU = (uint16_t *) _dstU, *dstV = (uint16_t *) _dstV; \
-    rgb64ToUV_half_c_template(dstU, dstV, src1, src2, width, origin, rgb2yuv); \
+    rgb64ToUV_half_c_template(dstU, dstV, src1, src2, width, origin, rgb2yuv, is_be); \
 }
+#define RGB64FUNCS(pattern, endianness, base_fmt) \
+    RGB64FUNCS_EXT(pattern, endianness, base_fmt ## endianness, IS_BE(endianness))
 
-rgb64funcs(rgb, LE, AV_PIX_FMT_RGBA64LE)
-rgb64funcs(rgb, BE, AV_PIX_FMT_RGBA64BE)
-rgb64funcs(bgr, LE, AV_PIX_FMT_BGRA64LE)
-rgb64funcs(bgr, BE, AV_PIX_FMT_BGRA64BE)
+RGB64FUNCS(rgb, LE, AV_PIX_FMT_RGBA64)
+RGB64FUNCS(rgb, BE, AV_PIX_FMT_RGBA64)
+RGB64FUNCS(bgr, LE, AV_PIX_FMT_BGRA64)
+RGB64FUNCS(bgr, BE, AV_PIX_FMT_BGRA64)
 
 static av_always_inline void rgb48ToY_c_template(uint16_t *dst,
                                                  const uint16_t *src, int width,
                                                  enum AVPixelFormat origin,
-                                                 int32_t *rgb2yuv)
+                                                 int32_t *rgb2yuv, int is_be)
 {
     int32_t ry = rgb2yuv[RY_IDX], gy = rgb2yuv[GY_IDX], by = rgb2yuv[BY_IDX];
     int i;
@@ -142,7 +151,7 @@ static av_always_inline void rgb48ToUV_c_template(uint16_t *dstU,
                                                   const uint16_t *src2,
                                                   int width,
                                                   enum AVPixelFormat origin,
-                                                  int32_t *rgb2yuv)
+                                                  int32_t *rgb2yuv, int is_be)
 {
     int i;
     int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
@@ -164,7 +173,7 @@ static av_always_inline void rgb48ToUV_half_c_template(uint16_t *dstU,
                                                        const uint16_t *src2,
                                                        int width,
                                                        enum AVPixelFormat origin,
-                                                       int32_t *rgb2yuv)
+                                                       int32_t *rgb2yuv, int is_be)
 {
     int i;
     int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
@@ -187,7 +196,7 @@ static av_always_inline void rgb48ToUV_half_c_template(uint16_t *dstU,
 #undef b
 #undef input_pixel
 
-#define rgb48funcs(pattern, BE_LE, origin) \
+#define RGB48FUNCS_EXT(pattern, BE_LE, origin, is_be) \
 static void pattern ## 48 ## BE_LE ## ToY_c(uint8_t *_dst, \
                                             const uint8_t *_src, \
                                             const uint8_t *unused0, const uint8_t *unused1,\
@@ -197,7 +206,7 @@ static void pattern ## 48 ## BE_LE ## ToY_c(uint8_t *_dst, \
 { \
     const uint16_t *src = (const uint16_t *)_src; \
     uint16_t *dst = (uint16_t *)_dst; \
-    rgb48ToY_c_template(dst, src, width, origin, rgb2yuv); \
+    rgb48ToY_c_template(dst, src, width, origin, rgb2yuv, is_be); \
 } \
 \
 static void pattern ## 48 ## BE_LE ## ToUV_c(uint8_t *_dstU, \
@@ -213,7 +222,7 @@ static void pattern ## 48 ## BE_LE ## ToUV_c(uint8_t *_dstU, \
                     *src2 = (const uint16_t *)_src2; \
     uint16_t *dstU = (uint16_t *)_dstU, \
              *dstV = (uint16_t *)_dstV; \
-    rgb48ToUV_c_template(dstU, dstV, src1, src2, width, origin, rgb2yuv); \
+    rgb48ToUV_c_template(dstU, dstV, src1, src2, width, origin, rgb2yuv, is_be); \
 } \
 \
 static void pattern ## 48 ## BE_LE ## ToUV_half_c(uint8_t *_dstU, \
@@ -229,13 +238,15 @@ static void pattern ## 48 ## BE_LE ## ToUV_half_c(uint8_t *_dstU, \
                     *src2 = (const uint16_t *)_src2; \
     uint16_t *dstU = (uint16_t *)_dstU, \
              *dstV = (uint16_t *)_dstV; \
-    rgb48ToUV_half_c_template(dstU, dstV, src1, src2, width, origin, rgb2yuv); \
+    rgb48ToUV_half_c_template(dstU, dstV, src1, src2, width, origin, rgb2yuv, is_be); \
 }
+#define RGB48FUNCS(pattern, endianness, base_fmt) \
+    RGB48FUNCS_EXT(pattern, endianness, base_fmt ## endianness, IS_BE(endianness))
 
-rgb48funcs(rgb, LE, AV_PIX_FMT_RGB48LE)
-rgb48funcs(rgb, BE, AV_PIX_FMT_RGB48BE)
-rgb48funcs(bgr, LE, AV_PIX_FMT_BGR48LE)
-rgb48funcs(bgr, BE, AV_PIX_FMT_BGR48BE)
+RGB48FUNCS(rgb, LE, AV_PIX_FMT_RGB48)
+RGB48FUNCS(rgb, BE, AV_PIX_FMT_RGB48)
+RGB48FUNCS(bgr, LE, AV_PIX_FMT_BGR48)
+RGB48FUNCS(bgr, BE, AV_PIX_FMT_BGR48)
 
 #define input_pixel(i) ((origin == AV_PIX_FMT_RGBA || \
                         origin == AV_PIX_FMT_BGRA || \
@@ -245,7 +256,7 @@ rgb48funcs(bgr, BE, AV_PIX_FMT_BGR48BE)
                         : ((origin == AV_PIX_FMT_X2RGB10LE || \
                             origin == AV_PIX_FMT_X2BGR10LE) \
                            ? AV_RL32(&src[(i) * 4]) \
-                           : (isBE(origin) ? AV_RB16(&src[(i) * 2]) \
+                           : (is_be ? AV_RB16(&src[(i) * 2]) \
                               : AV_RL16(&src[(i) * 2]))))
 
 static av_always_inline void rgb16_32ToY_c_template(int16_t *dst,
@@ -257,7 +268,7 @@ static av_always_inline void rgb16_32ToY_c_template(int16_t *dst,
                                                     int maskr, int maskg,
                                                     int maskb, int rsh,
                                                     int gsh, int bsh, int S,
-                                                    int32_t *rgb2yuv)
+                                                    int32_t *rgb2yuv, int is_be)
 {
     const int ry = rgb2yuv[RY_IDX]<

X-Patchwork-Id: 37958
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Cc: Andreas Rheinhardt
Date: Fri, 16 Sep 2022 16:55:39 +0200
Subject: [FFmpeg-devel] [PATCH v2 2/2] swscale/output: Don't call av_pix_fmt_desc_get() in a loop

Up until now, libswscale/output.c used a macro to write an output pixel
which called av_pix_fmt_desc_get() (via isBE()) to find out whether the
output pixel format is BE or LE, despite this being known at compile
time (there are templates per pixfmt). Even worse, these calls are made
in a loop, so that e.g. there are eight calls to av_pix_fmt_desc_get()
for every pixel processed in yuv2rgba64_X_c_template() for 64-bit RGB
formats.

This commit modifies these macros so that the endianness check is
evaluated at compile time instead of via isBE() at runtime. This saved
41184B of .text for me (GCC 11.2, -O3).

Of course, it also improved performance. E.g. for

ffmpeg_g -f lavfi -i testsrc2,format=yuva420p -pix_fmt rgba64le \
    -threads 1 -t 1:00 -f null -

(which uses yuv2rgba64le_X_c, a wrapper around the
yuv2rgba64_X_c_template() mentioned above), performance improved from
95589 to 41387 decicycles for one call to yuv2packedX; for the BE
variant the numbers went down from 76087 to 43024 decicycles.

Signed-off-by: Andreas Rheinhardt
---
 libswscale/output.c | 100 +++++++++++++++++++++++++-------------------
 1 file changed, 58 insertions(+), 42 deletions(-)

diff --git a/libswscale/output.c b/libswscale/output.c
index 2f599698e9..0e1c1225a0 100644
--- a/libswscale/output.c
+++ b/libswscale/output.c
@@ -133,6 +133,11 @@ DECLARE_ALIGNED(8, const uint8_t, ff_dither_8x8_220)[][8] = {
 };
 #endif
 
+#define IS_BE_LE 0
+#define IS_BE_BE 1
+/* ENDIAN_IDENTIFIER needs to be "BE" or "LE". */
+#define IS_BE(ENDIAN_IDENTIFIER) IS_BE_ ## ENDIAN_IDENTIFIER
+
 #define output_pixel(pos, val, bias, signedness) \
     if (big_endian) { \
         AV_WB16(pos, bias + av_clip_ ## signedness ## 16(val >> shift)); \
@@ -935,7 +940,7 @@ YUV2PACKEDWRAPPER(yuv2, 422, uyvy422, AV_PIX_FMT_UYVY422)
 #define R_B ((target == AV_PIX_FMT_RGB48LE || target == AV_PIX_FMT_RGB48BE || target == AV_PIX_FMT_RGBA64LE || target == AV_PIX_FMT_RGBA64BE) ? R : B)
 #define B_R ((target == AV_PIX_FMT_RGB48LE || target == AV_PIX_FMT_RGB48BE || target == AV_PIX_FMT_RGBA64LE || target == AV_PIX_FMT_RGBA64BE) ? B : R)
 #define output_pixel(pos, val) \
-    if (isBE(target)) { \
+    if (is_be) { \
         AV_WB16(pos, val); \
     } else { \
         AV_WL16(pos, val); \
@@ -947,7 +952,8 @@ yuv2ya16_X_c_template(SwsContext *c, const int16_t *lumFilter,
                       const int16_t *chrFilter, const int32_t **unused_chrUSrc,
                       const int32_t **unused_chrVSrc, int unused_chrFilterSize,
                       const int32_t **alpSrc, uint16_t *dest, int dstW,
-                      int y, enum AVPixelFormat target, int unused_hasAlpha, int unused_eightbytes)
+                      int y, enum AVPixelFormat target,
+                      int unused_hasAlpha, int unused_eightbytes, int is_be)
 {
     int hasAlpha = !!alpSrc;
     int i;
@@ -984,7 +990,8 @@ yuv2ya16_2_c_template(SwsContext *c, const int32_t *buf[2],
                       const int32_t *unused_ubuf[2], const int32_t *unused_vbuf[2],
                       const int32_t *abuf[2], uint16_t *dest, int dstW,
                       int yalpha, int unused_uvalpha, int y,
-                      enum AVPixelFormat target, int unused_hasAlpha, int unused_eightbytes)
+                      enum AVPixelFormat target, int unused_hasAlpha,
+                      int unused_eightbytes, int is_be)
 {
     int hasAlpha = abuf && abuf[0] && abuf[1];
     const int32_t *buf0 = buf[0], *buf1 = buf[1],
@@ -1015,7 +1022,8 @@ static av_always_inline void
 yuv2ya16_1_c_template(SwsContext *c, const int32_t *buf0,
                       const int32_t *unused_ubuf[2], const int32_t *unused_vbuf[2],
                       const int32_t *abuf0, uint16_t *dest, int dstW,
-                      int unused_uvalpha, int y, enum AVPixelFormat target, int unused_hasAlpha, int unused_eightbytes)
+                      int unused_uvalpha, int y, enum AVPixelFormat target,
+                      int unused_hasAlpha, int unused_eightbytes, int is_be)
 {
     int hasAlpha = !!abuf0;
     int i;
@@ -1043,7 +1051,8 @@ yuv2rgba64_X_c_template(SwsContext *c, const int16_t *lumFilter,
                         const int16_t *chrFilter, const int32_t **chrUSrc,
                         const int32_t **chrVSrc, int chrFilterSize,
                         const int32_t **alpSrc, uint16_t *dest, int dstW,
-                        int y, enum AVPixelFormat target, int hasAlpha, int eightbytes)
+                        int y, enum AVPixelFormat target, int hasAlpha, int eightbytes,
+                        int is_be)
 {
     int i;
     int A1 = 0xffff<<14, A2 = 0xffff<<14;
@@ -1124,7 +1133,8 @@ yuv2rgba64_2_c_template(SwsContext *c, const int32_t *buf[2],
                         const int32_t *ubuf[2], const int32_t *vbuf[2],
                         const int32_t *abuf[2], uint16_t *dest, int dstW,
                         int yalpha, int uvalpha, int y,
-                        enum AVPixelFormat target, int hasAlpha, int eightbytes)
+                        enum AVPixelFormat target, int hasAlpha, int eightbytes,
+                        int is_be)
 {
     const int32_t *buf0 = buf[0], *buf1 = buf[1],
                   *ubuf0 = ubuf[0], *ubuf1 = ubuf[1],
@@ -1188,7 +1198,8 @@ static av_always_inline void
 yuv2rgba64_1_c_template(SwsContext *c, const int32_t *buf0,
                         const int32_t *ubuf[2], const int32_t *vbuf[2],
                         const int32_t *abuf0, uint16_t *dest, int dstW,
-                        int uvalpha, int y, enum AVPixelFormat target, int hasAlpha, int eightbytes)
+                        int uvalpha, int y, enum AVPixelFormat target,
+                        int hasAlpha, int eightbytes, int is_be)
 {
     const int32_t *ubuf0 = ubuf[0], *vbuf0 = vbuf[0];
     int i;
@@ -1293,7 +1304,8 @@ yuv2rgba64_full_X_c_template(SwsContext *c, const int16_t *lumFilter,
                              const int16_t *chrFilter, const int32_t **chrUSrc,
                              const int32_t **chrVSrc, int chrFilterSize,
                              const int32_t **alpSrc, uint16_t *dest, int dstW,
-                             int y, enum AVPixelFormat target, int hasAlpha, int eightbytes)
+                             int y, enum AVPixelFormat target, int hasAlpha,
+                             int eightbytes, int is_be)
 {
     int i;
     int A = 0xffff<<14;
@@ -1356,7 +1368,8 @@ yuv2rgba64_full_2_c_template(SwsContext *c, const int32_t *buf[2],
                              const int32_t *ubuf[2], const int32_t *vbuf[2],
                              const int32_t *abuf[2], uint16_t *dest, int dstW,
                              int yalpha, int uvalpha, int y,
-                             enum AVPixelFormat target, int hasAlpha, int eightbytes)
+                             enum AVPixelFormat target, int hasAlpha, int eightbytes,
+                             int is_be)
 {
     const int32_t *buf0 = buf[0], *buf1 = buf[1],
                   *ubuf0 = ubuf[0], *ubuf1 = ubuf[1],
@@ -1407,7 +1420,8 @@ static av_always_inline void
 yuv2rgba64_full_1_c_template(SwsContext *c, const int32_t *buf0,
                              const int32_t *ubuf[2], const int32_t *vbuf[2],
                              const int32_t *abuf0, uint16_t *dest, int dstW,
-                             int uvalpha, int y, enum AVPixelFormat target, int hasAlpha, int eightbytes)
+                             int uvalpha, int y, enum AVPixelFormat target,
+                             int hasAlpha, int eightbytes, int is_be)
 {
     const int32_t *ubuf0 = ubuf[0], *vbuf0 = vbuf[0];
     int i;
@@ -1484,7 +1498,7 @@ yuv2rgba64_full_1_c_template(SwsContext *c, const int32_t *buf0,
 #undef r_b
 #undef b_r
 
-#define YUV2PACKED16WRAPPER(name, base, ext, fmt, hasAlpha, eightbytes) \
+#define YUV2PACKED16WRAPPER_EXT(name, base, ext, fmt, is_be, hasAlpha, eightbytes) \
 static void name ## ext ## _X_c(SwsContext *c, const int16_t *lumFilter, \
                         const int16_t **_lumSrc, int lumFilterSize, \
                         const int16_t *chrFilter, const int16_t **_chrUSrc, \
@@ -1499,7 +1513,7 @@ static void name ## ext ## _X_c(SwsContext *c, const int16_t *lumFilter, \
     uint16_t *dest = (uint16_t *) _dest; \
     name ## base ## _X_c_template(c, lumFilter, lumSrc, lumFilterSize, \
                           chrFilter, chrUSrc, chrVSrc, chrFilterSize, \
-                          alpSrc, dest, dstW, y, fmt, hasAlpha, eightbytes); \
+                          alpSrc, dest, dstW, y, fmt, hasAlpha, eightbytes, is_be); \
 } \
 \
 static void name ## ext ## _2_c(SwsContext *c, const int16_t *_buf[2], \
@@ -1513,7 +1527,7 @@ static void name ## ext ## _2_c(SwsContext *c, const int16_t *_buf[2], \
                   **abuf = (const int32_t **) _abuf; \
     uint16_t *dest = (uint16_t *) _dest; \
     name ## base ## _2_c_template(c, buf, ubuf, vbuf, abuf, \
-                          dest, dstW, yalpha, uvalpha, y, fmt, hasAlpha, eightbytes); \
+                          dest, dstW, yalpha, uvalpha, y, fmt, hasAlpha, eightbytes, is_be); \
 } \
 \
 static void name ## ext ## _1_c(SwsContext *c, const int16_t *_buf0, \
@@ -1527,36 +1541,38 @@ static void name ## ext ## _1_c(SwsContext *c, const int16_t *_buf0, \
                   *abuf0 = (const int32_t *) _abuf0; \
     uint16_t *dest = (uint16_t *) _dest; \
     name ## base ## _1_c_template(c, buf0, ubuf, vbuf, abuf0, dest, \
-                          dstW, uvalpha, y, fmt, hasAlpha, eightbytes); \
+                          dstW, uvalpha, y, fmt, hasAlpha, eightbytes, is_be); \
 }
-
-YUV2PACKED16WRAPPER(yuv2, rgba64, rgb48be, AV_PIX_FMT_RGB48BE, 0, 0)
-YUV2PACKED16WRAPPER(yuv2, rgba64, rgb48le, AV_PIX_FMT_RGB48LE, 0, 0)
-YUV2PACKED16WRAPPER(yuv2, rgba64, bgr48be, AV_PIX_FMT_BGR48BE, 0, 0)
-YUV2PACKED16WRAPPER(yuv2, rgba64, bgr48le, AV_PIX_FMT_BGR48LE, 0, 0)
-YUV2PACKED16WRAPPER(yuv2, rgba64, rgba64be, AV_PIX_FMT_RGBA64BE, 1, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64, rgba64le, AV_PIX_FMT_RGBA64LE, 1, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64, rgbx64be, AV_PIX_FMT_RGBA64BE, 0, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64, rgbx64le, AV_PIX_FMT_RGBA64LE, 0, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64, bgra64be, AV_PIX_FMT_BGRA64BE, 1, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64, bgra64le, AV_PIX_FMT_BGRA64LE, 1, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64, bgrx64be, AV_PIX_FMT_BGRA64BE, 0, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64, bgrx64le, AV_PIX_FMT_BGRA64LE, 0, 1)
-YUV2PACKED16WRAPPER(yuv2, ya16, ya16be, AV_PIX_FMT_YA16BE, 1, 0)
-YUV2PACKED16WRAPPER(yuv2, ya16, ya16le, AV_PIX_FMT_YA16LE, 1, 0)
-
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgb48be_full, AV_PIX_FMT_RGB48BE, 0, 0)
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgb48le_full, AV_PIX_FMT_RGB48LE, 0, 0)
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgr48be_full, AV_PIX_FMT_BGR48BE, 0, 0)
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgr48le_full, AV_PIX_FMT_BGR48LE, 0, 0)
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgba64be_full, AV_PIX_FMT_RGBA64BE, 1, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgba64le_full, AV_PIX_FMT_RGBA64LE, 1, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgbx64be_full, AV_PIX_FMT_RGBA64BE, 0, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgbx64le_full, AV_PIX_FMT_RGBA64LE, 0, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgra64be_full, AV_PIX_FMT_BGRA64BE, 1, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgra64le_full, AV_PIX_FMT_BGRA64LE, 1, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgrx64be_full, AV_PIX_FMT_BGRA64BE, 0, 1)
-YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgrx64le_full, AV_PIX_FMT_BGRA64LE, 0, 1)
+#define YUV2PACKED16WRAPPER(name, base, ext, base_fmt, endianness, hasAlpha, eightbytes) \
+    YUV2PACKED16WRAPPER_EXT(name, base, ext, base_fmt ## endianness, IS_BE(endianness), hasAlpha, eightbytes)
+
+YUV2PACKED16WRAPPER(yuv2, rgba64, rgb48be, AV_PIX_FMT_RGB48, BE, 0, 0)
+YUV2PACKED16WRAPPER(yuv2, rgba64, rgb48le, AV_PIX_FMT_RGB48, LE, 0, 0)
+YUV2PACKED16WRAPPER(yuv2, rgba64, bgr48be, AV_PIX_FMT_BGR48, BE, 0, 0)
+YUV2PACKED16WRAPPER(yuv2, rgba64, bgr48le, AV_PIX_FMT_BGR48, LE, 0, 0)
+YUV2PACKED16WRAPPER(yuv2, rgba64, rgba64be, AV_PIX_FMT_RGBA64, BE, 1, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64, rgba64le, AV_PIX_FMT_RGBA64, LE, 1, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64, rgbx64be, AV_PIX_FMT_RGBA64, BE, 0, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64, rgbx64le, AV_PIX_FMT_RGBA64, LE, 0, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64, bgra64be, AV_PIX_FMT_BGRA64, BE, 1, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64, bgra64le, AV_PIX_FMT_BGRA64, LE, 1, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64, bgrx64be, AV_PIX_FMT_BGRA64, BE, 0, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64, bgrx64le, AV_PIX_FMT_BGRA64, LE, 0, 1)
+YUV2PACKED16WRAPPER(yuv2, ya16, ya16be, AV_PIX_FMT_YA16, BE, 1, 0)
+YUV2PACKED16WRAPPER(yuv2, ya16, ya16le, AV_PIX_FMT_YA16, LE, 1, 0)
+
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgb48be_full, AV_PIX_FMT_RGB48, BE, 0, 0)
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgb48le_full, AV_PIX_FMT_RGB48, LE, 0, 0)
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgr48be_full, AV_PIX_FMT_BGR48, BE, 0, 0)
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgr48le_full, AV_PIX_FMT_BGR48, LE, 0, 0)
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgba64be_full, AV_PIX_FMT_RGBA64, BE, 1, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgba64le_full, AV_PIX_FMT_RGBA64, LE, 1, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgbx64be_full, AV_PIX_FMT_RGBA64, BE, 0, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgbx64le_full, AV_PIX_FMT_RGBA64, LE, 0, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgra64be_full, AV_PIX_FMT_BGRA64, BE, 1, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgra64le_full, AV_PIX_FMT_BGRA64, LE, 1, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgrx64be_full, AV_PIX_FMT_BGRA64, BE, 0, 1)
+YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgrx64le_full, AV_PIX_FMT_BGRA64, LE, 0, 1)
 
 /*
  * Write out 2 RGB pixels in the target pixel format. This function takes a