From patchwork Wed Jun 5 21:37:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 49611
Delivered-To: ffmpegpatchwork2@gmail.com
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
 by mx.google.com with ESMTP id a640c23a62f3a-a68c5b407acsi509022166b.66.2024.06.05.14.53.14;
 Wed, 05 Jun 2024 14:53:40 -0700 (PDT)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 5 Jun 2024 23:37:09 +0200
X-Mailer: git-send-email 2.40.1
X-Microsoft-Original-Message-ID: <20240605213712.2074050-1-andreas.rheinhardt@outlook.com>
Subject: [FFmpeg-devel] [PATCH 1/4] swscale/x86/rgb2rgb_template: Remove unnecessary SFENCE
X-BeenThere:
 ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Andreas Rheinhardt
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

The ff_nv12ToUV_* functions don't use non-temporal stores at all.

Signed-off-by: Andreas Rheinhardt
---
 libswscale/x86/rgb2rgb_template.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/libswscale/x86/rgb2rgb_template.c b/libswscale/x86/rgb2rgb_template.c
index edbacea784..e4e884827c 100644
--- a/libswscale/x86/rgb2rgb_template.c
+++ b/libswscale/x86/rgb2rgb_template.c
@@ -1837,10 +1837,6 @@ static void RENAME(deinterleaveBytes)(const uint8_t *src, uint8_t *dst1, uint8_t
         dst1 += dst1Stride;
         dst2 += dst2Stride;
     }
-    __asm__(
-            SFENCE" \n\t"
-            ::: "memory"
-            );
 }
 #endif /* COMPILE_TEMPLATE_SSE2 && HAVE_X86ASM */
 #endif /* !COMPILE_TEMPLATE_AVX || HAVE_AVX_EXTERNAL */

From patchwork Wed Jun 5 21:38:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 49608
Delivered-To: ffmpegpatchwork2@gmail.com
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org.
 [79.124.17.100])
 by mx.google.com with ESMTP id a640c23a62f3a-a6a6ed0fb07si96014666b.361.2024.06.05.14.39.23;
 Wed, 05 Jun 2024 14:39:24 -0700 (PDT)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 5 Jun 2024 23:38:59 +0200
X-Mailer: git-send-email 2.40.1
X-Microsoft-Original-Message-ID: <20240605213901.2074109-1-andreas.rheinhardt@outlook.com>
Subject: [FFmpeg-devel] [PATCH 2/4] swscale/x86/rgb2rgb: Don't unnecessarily check for inline ASM
X-BeenThere:
 ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Andreas Rheinhardt
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

The SSE2 and AVX versions of deinterleaveBytes are external ASM.
Move them out of the inline ASM template.

Signed-off-by: Andreas Rheinhardt
---
 libswscale/x86/rgb2rgb.c          | 48 +++++++++++++++++++++++--------
 libswscale/x86/rgb2rgb_template.c | 30 -------------------
 2 files changed, 36 insertions(+), 42 deletions(-)

diff --git a/libswscale/x86/rgb2rgb.c b/libswscale/x86/rgb2rgb.c
index b325e5dbd5..be6f5abc95 100644
--- a/libswscale/x86/rgb2rgb.c
+++ b/libswscale/x86/rgb2rgb.c
@@ -100,13 +100,6 @@ DECLARE_ALIGNED(8, extern const uint64_t, ff_bgr2UVOffset);
 #define RENAME(a) a ## _sse2
 #include "rgb2rgb_template.c"
 
-//AVX versions
-#undef RENAME
-#undef COMPILE_TEMPLATE_AVX
-#define COMPILE_TEMPLATE_AVX 1
-#define RENAME(a) a ## _avx
-#include "rgb2rgb_template.c"
-
 /*
  RGB15->RGB16 original by Strepto/Astral
  ported to gcc & bugfixed : A'rpi
@@ -138,6 +131,33 @@ void ff_uyvytoyuv422_avx(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
                          int lumStride, int chromStride, int srcStride);
 #endif
 
+#define DEINTERLEAVE_BYTES(cpuext) \
+void ff_nv12ToUV_ ## cpuext(uint8_t *dstU, uint8_t *dstV, \
+                            const uint8_t *unused, \
+                            const uint8_t *src1, \
+                            const uint8_t *src2, \
+                            int w, \
+                            uint32_t *unused2, \
+                            void *opq); \
+static void deinterleave_bytes_ ## cpuext(const uint8_t *src, uint8_t *dst1, uint8_t *dst2, \
+                                          int width, int height, int srcStride, \
+                                          int dst1Stride, int dst2Stride) \
+{ \
+    for (int h = 0; h < height; h++) { \
+        ff_nv12ToUV_ ## cpuext(dst1, dst2, NULL, src, NULL, width, NULL, NULL); \
+        src += srcStride; \
+        dst1 += dst1Stride; \
+        dst2 += dst2Stride; \
+    } \
+}
+
+#if HAVE_SSE2_EXTERNAL
+DEINTERLEAVE_BYTES(sse2)
+#endif
+#if HAVE_AVX_EXTERNAL
+DEINTERLEAVE_BYTES(avx)
+#endif
+
 av_cold void rgb2rgb_init_x86(void)
 {
     int cpu_flags = av_get_cpu_flags();
@@ -147,18 +167,19 @@ av_cold void rgb2rgb_init_x86(void)
         rgb2rgb_init_mmxext();
     if (INLINE_SSE2(cpu_flags))
         rgb2rgb_init_sse2();
-    if (INLINE_AVX(cpu_flags))
-        rgb2rgb_init_avx();
 #endif /* HAVE_INLINE_ASM */
 
     if (EXTERNAL_MMXEXT(cpu_flags)) {
         shuffle_bytes_2103 = ff_shuffle_bytes_2103_mmxext;
     }
+#if HAVE_SSE2_EXTERNAL
     if (EXTERNAL_SSE2(cpu_flags)) {
 #if ARCH_X86_64
         uyvytoyuv422 = ff_uyvytoyuv422_sse2;
 #endif
+        deinterleaveBytes = deinterleave_bytes_sse2;
     }
+#endif
     if (EXTERNAL_SSSE3(cpu_flags)) {
         shuffle_bytes_0321 = ff_shuffle_bytes_0321_ssse3;
         shuffle_bytes_2103 = ff_shuffle_bytes_2103_ssse3;
@@ -166,16 +187,19 @@ av_cold void rgb2rgb_init_x86(void)
         shuffle_bytes_3012 = ff_shuffle_bytes_3012_ssse3;
         shuffle_bytes_3210 = ff_shuffle_bytes_3210_ssse3;
     }
+#if HAVE_AVX_EXTERNAL
+    if (EXTERNAL_AVX(cpu_flags)) {
+        deinterleaveBytes = deinterleave_bytes_avx;
 #if ARCH_X86_64
+        uyvytoyuv422 = ff_uyvytoyuv422_avx;
+    }
     if (EXTERNAL_AVX2_FAST(cpu_flags)) {
         shuffle_bytes_0321 = ff_shuffle_bytes_0321_avx2;
         shuffle_bytes_2103 = ff_shuffle_bytes_2103_avx2;
         shuffle_bytes_1230 = ff_shuffle_bytes_1230_avx2;
         shuffle_bytes_3012 = ff_shuffle_bytes_3012_avx2;
         shuffle_bytes_3210 = ff_shuffle_bytes_3210_avx2;
-    }
-    if (EXTERNAL_AVX(cpu_flags)) {
-        uyvytoyuv422 = ff_uyvytoyuv422_avx;
+#endif
     }
 #endif
 }
diff --git a/libswscale/x86/rgb2rgb_template.c b/libswscale/x86/rgb2rgb_template.c
index e4e884827c..5c73fa4e16 100644
--- a/libswscale/x86/rgb2rgb_template.c
+++ b/libswscale/x86/rgb2rgb_template.c
@@ -1816,31 +1816,6 @@ static void RENAME(interleaveBytes)(const uint8_t *src1, const uint8_t *src2, ui
 }
 #endif /* !COMPILE_TEMPLATE_AVX && COMPILE_TEMPLATE_SSE2 */
 
-#if !COMPILE_TEMPLATE_AVX || HAVE_AVX_EXTERNAL
-#if COMPILE_TEMPLATE_SSE2 && HAVE_X86ASM
-void RENAME(ff_nv12ToUV)(uint8_t *dstU, uint8_t *dstV,
-                         const uint8_t *unused,
-                         const uint8_t *src1,
-                         const uint8_t *src2,
-                         int w,
-                         uint32_t *unused2,
-                         void *opq);
-static void RENAME(deinterleaveBytes)(const uint8_t *src, uint8_t *dst1, uint8_t *dst2,
-                                      int width, int height, int srcStride,
-                                      int dst1Stride, int dst2Stride)
-{
-    int h;
-
-    for (h = 0; h < height; h++) {
-        RENAME(ff_nv12ToUV)(dst1, dst2, NULL, src, NULL, width, NULL, NULL);
-        src += srcStride;
-        dst1 += dst1Stride;
-        dst2 += dst2Stride;
-    }
-}
-#endif /* COMPILE_TEMPLATE_SSE2 && HAVE_X86ASM */
-#endif /* !COMPILE_TEMPLATE_AVX || HAVE_AVX_EXTERNAL */
-
 #if !COMPILE_TEMPLATE_SSE2
 static inline void RENAME(vu9_to_vu12)(const uint8_t *src1, const uint8_t *src2,
                                        uint8_t *dst1, uint8_t *dst2,
@@ -2441,9 +2416,4 @@ static av_cold void RENAME(rgb2rgb_init)(void)
 #if !COMPILE_TEMPLATE_AVX && COMPILE_TEMPLATE_SSE2
     interleaveBytes = RENAME(interleaveBytes);
 #endif /* !COMPILE_TEMPLATE_AVX && COMPILE_TEMPLATE_SSE2 */
-#if !COMPILE_TEMPLATE_AVX || HAVE_AVX_EXTERNAL
-#if COMPILE_TEMPLATE_SSE2 && HAVE_X86ASM
-    deinterleaveBytes = RENAME(deinterleaveBytes);
-#endif
-#endif
 }

From patchwork Wed Jun 5 21:39:00 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 49609
Delivered-To: ffmpegpatchwork2@gmail.com
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org.
 [79.124.17.100])
 by mx.google.com with ESMTP id 4fb4d7f45d1cf-57a31b80e3esi6493646a12.102.2024.06.05.14.39.34;
 Wed, 05 Jun 2024 14:39:44 -0700 (PDT)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 5 Jun 2024 23:39:00 +0200
X-Mailer: git-send-email 2.40.1
X-Microsoft-Original-Message-ID: <20240605213901.2074109-2-andreas.rheinhardt@outlook.com>
Subject: [FFmpeg-devel] [PATCH 3/4] swscale/x86/rgb2rgb_template: Remove unused uyvytoyv12
X-BeenThere:
 ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Andreas Rheinhardt
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

Signed-off-by: Andreas Rheinhardt
---
 libswscale/x86/rgb2rgb_template.c | 104 ------------------------------
 1 file changed, 104 deletions(-)

diff --git a/libswscale/x86/rgb2rgb_template.c b/libswscale/x86/rgb2rgb_template.c
index 5c73fa4e16..d1403d08e6 100644
--- a/libswscale/x86/rgb2rgb_template.c
+++ b/libswscale/x86/rgb2rgb_template.c
@@ -1432,110 +1432,6 @@ static inline void RENAME(planar2x)(const uint8_t *src, uint8_t *dst, int srcWid
             :::"memory");
 }
 
-/**
- * Height should be a multiple of 2 and width should be a multiple of 16.
- * (If this is a problem for anyone then tell me, and I will fix it.)
- * Chrominance data is only taken from every second line, others are ignored.
- * FIXME: Write HQ version.
- */
-static inline void RENAME(uyvytoyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
-                                      int width, int height,
-                                      int lumStride, int chromStride, int srcStride)
-{
-    int y;
-    const x86_reg chromWidth= width>>1;
-    for (y=0; y
X-Patchwork-Id: 49610
Delivered-To: ffmpegpatchwork2@gmail.com
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org.
[79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a68e8bb2875si436293566b.741.2024.06.05.14.39.48; Wed, 05 Jun 2024 14:39:49 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@outlook.com header.s=selector1 header.b=QpPIvq4Q; arc=fail (body hash mismatch); spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=outlook.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C76A168D6C1; Thu, 6 Jun 2024 00:39:44 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from EUR04-VI1-obe.outbound.protection.outlook.com (mail-vi1eur04olkn2101.outbound.protection.outlook.com [40.92.75.101]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 588BE68D6C3 for ; Thu, 6 Jun 2024 00:39:37 +0300 (EEST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Yb0e8plej7DHEx+wwmP/+qW0YbafSgiYvEwPc1YkjBZNi+C/OI0TKVvT1Rw7AFqhC+Sv/onXxL9d9pAmE5CWTi7C644gc9QmaQxpLDoOIgAqJlHZcvWeUEqNqql8d7ZCoXbYayo+mhTV+SddPBY5zFw5yPLTWjTnwl08iogL//1u+ZYR1OLordOpcJ9AIOwMXBe3HA7DAEIQ27Mq0cXviQFmOzdvU4qsWkiKjT9kqX3xDaUx4UideC5gWFMpCHYxqswhSHoEwSLM/yxOmfKmGn0Z9uy78GMGs3FXUrHh/pMz5pq6oPYCcy5j76r4vdRTvXOv0ghzSmcK5eZAZPO6hQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=XTKyLKWr7A6q6C7ForLkOPMdbp7xy4aFxcWvKSqdlHY=; 
b=ZU9m6SJo3ifb6aWu478CJr0f0sQwHuQfZjWe808ZvTsmIraAwevDAHBNElaWTzpe2we1rFF2Ge8wM0HAKCeV6X/ZKTfRxfMKgYZYC7Nrff3DkDP7oiOEKH7mpQExkhM2s/QiQ938HJ1P/oNnpNWqKk+DfyovMRkOPO7JCq/PvckQ4Kf0k3EtvmrezYOxWkIYL8fMIL9cQjdekGJbAJpMli2Jy/TgZ3ruMlqAasXe6Qa7o9hJWqJaPRonvhzbb55kvMwQtsYWgaMofC/d5OuTYef+WwsYq0WGthRsxDdImi2IkpllMKZy3uAPXbq9IFh/NZevXpOL/Sf6GQx4J5rLfw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=outlook.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=XTKyLKWr7A6q6C7ForLkOPMdbp7xy4aFxcWvKSqdlHY=; b=QpPIvq4QJvyh6y1E2iJU0/zf2a9AwCJ1yWhCy9N1kJQ1alYB/lrSGyUgarUCrAXEVgmburEgtaCvn6Pf0Whj1Jfd1BwLwZzYZI+T5PxWQD8xKbpaIJsX7nw0bSY3eirSfiidRZee3PcRHjYceNO3xgYG2j6HLzLuA/ptuprlyVjqx1C6LdPOzq7L7hHSb89e9RS7xzRPwXamwrHkXkvepAiP7IuJw23Y+ltsTJfC88THc7S4WomfeZm4ABNO2ydXSaxicAuny6acKv4H1FC5C6c5YJhMJUB5EsUK1JrtGll/21vwC9EVKjSvNlNuE/Nyq6zSCkeIfnEZNmlYfNDWaQ== Received: from GV1P250MB0737.EURP250.PROD.OUTLOOK.COM (2603:10a6:150:8e::17) by AM8P250MB0357.EURP250.PROD.OUTLOOK.COM (2603:10a6:20b:328::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7633.27; Wed, 5 Jun 2024 21:39:29 +0000 Received: from GV1P250MB0737.EURP250.PROD.OUTLOOK.COM ([fe80::d6a1:e3af:a5f1:b614]) by GV1P250MB0737.EURP250.PROD.OUTLOOK.COM ([fe80::d6a1:e3af:a5f1:b614%4]) with mapi id 15.20.7633.021; Wed, 5 Jun 2024 21:39:29 +0000 From: Andreas Rheinhardt To: ffmpeg-devel@ffmpeg.org Date: Wed, 5 Jun 2024 23:39:01 +0200 Message-ID: X-Mailer: git-send-email 2.40.1 In-Reply-To: References: X-TMN: [cvmH6TekttNojz71o8G+FdLXyvFFyGAC] X-ClientProxiedBy: ZR2P278CA0042.CHEP278.PROD.OUTLOOK.COM (2603:10a6:910:47::13) To GV1P250MB0737.EURP250.PROD.OUTLOOK.COM (2603:10a6:150:8e::17) X-Microsoft-Original-Message-ID: <20240605213901.2074109-3-andreas.rheinhardt@outlook.com> MIME-Version: 1.0 
Subject: [FFmpeg-devel] [PATCH 4/4] swscale/x86/rgb2rgb: Detemplatize X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Andreas Rheinhardt Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: q3DQys8lS9XR Every function in rgb2rgb_template.c is only compiled exactly once; there is no overlap at all between the MMXEXT and the SSE2 functions, so detemplatize it. Signed-off-by: Andreas Rheinhardt --- libswscale/x86/rgb2rgb.c | 2272 +++++++++++++++++++++++++++- libswscale/x86/rgb2rgb_template.c | 2315 ----------------------------- 2 files changed, 2253 insertions(+), 2334 deletions(-) delete mode 100644 libswscale/x86/rgb2rgb_template.c diff --git a/libswscale/x86/rgb2rgb.c b/libswscale/x86/rgb2rgb.c index be6f5abc95..2bfab2cf16 100644 --- a/libswscale/x86/rgb2rgb.c +++ b/libswscale/x86/rgb2rgb.c @@ -37,6 +37,7 @@ #include "libswscale/swscale_internal.h" #if HAVE_INLINE_ASM +#include "libavutil/x86/asm.h" DECLARE_ASM_CONST(8, uint64_t, mmx_ff) = 0x00000000000000FFULL; DECLARE_ASM_CONST(8, uint64_t, mmx_null) = 0x0000000000000000ULL; @@ -83,22 +84,2255 @@ DECLARE_ALIGNED(8, extern const uint64_t, ff_bgr2UVOffset); #define RV ((int)( 0.439*(1<>1)&0x7FE07FE0) | (x&0x001F001F); + s+=4; + d+=4; + } + if (s < end) { + register uint16_t x= *((const uint16_t*)s); + *((uint16_t *)d) = ((x>>1)&0x7FE0) | (x&0x001F); + } +} + +static inline void rgb32to16_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint8_t *s = src; + const uint8_t *end; + const uint8_t *mm_end; + uint16_t *d = (uint16_t *)dst; + end = s + src_size; + mm_end = end - 15; + __asm__ volatile( + "movq %3, %%mm5 \n\t" + "movq %4, %%mm6 \n\t" + "movq %5, %%mm7 \n\t" + "jmp 2f \n\t" + ".p2align 4 \n\t" + "1: \n\t" + PREFETCH" 32(%1) \n\t" + "movd (%1), %%mm0 \n\t" + "movd 4(%1), %%mm3 \n\t" + "punpckldq 8(%1), %%mm0 \n\t" + "punpckldq 12(%1), %%mm3 \n\t" + "movq %%mm0, %%mm1 
\n\t" + "movq %%mm3, %%mm4 \n\t" + "pand %%mm6, %%mm0 \n\t" + "pand %%mm6, %%mm3 \n\t" + "pmaddwd %%mm7, %%mm0 \n\t" + "pmaddwd %%mm7, %%mm3 \n\t" + "pand %%mm5, %%mm1 \n\t" + "pand %%mm5, %%mm4 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm4, %%mm3 \n\t" + "psrld $5, %%mm0 \n\t" + "pslld $11, %%mm3 \n\t" + "por %%mm3, %%mm0 \n\t" + MOVNTQ" %%mm0, (%0) \n\t" + "add $16, %1 \n\t" + "add $8, %0 \n\t" + "2: \n\t" + "cmp %2, %1 \n\t" + " jb 1b \n\t" + : "+r" (d), "+r"(s) + : "r" (mm_end), "m" (mask3216g), "m" (mask3216br), "m" (mul3216) + ); + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + register int rgb = *(const uint32_t*)s; s += 4; + *d++ = ((rgb&0xFF)>>3) + ((rgb&0xFC00)>>5) + ((rgb&0xF80000)>>8); + } +} + +static inline void rgb32tobgr16_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint8_t *s = src; + const uint8_t *end; + const uint8_t *mm_end; + uint16_t *d = (uint16_t *)dst; + end = s + src_size; + __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); + __asm__ volatile( + "movq %0, %%mm7 \n\t" + "movq %1, %%mm6 \n\t" + ::"m"(red_16mask),"m"(green_16mask)); + mm_end = end - 15; + while (s < mm_end) { + __asm__ volatile( + PREFETCH" 32(%1) \n\t" + "movd (%1), %%mm0 \n\t" + "movd 4(%1), %%mm3 \n\t" + "punpckldq 8(%1), %%mm0 \n\t" + "punpckldq 12(%1), %%mm3 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm0, %%mm2 \n\t" + "movq %%mm3, %%mm4 \n\t" + "movq %%mm3, %%mm5 \n\t" + "psllq $8, %%mm0 \n\t" + "psllq $8, %%mm3 \n\t" + "pand %%mm7, %%mm0 \n\t" + "pand %%mm7, %%mm3 \n\t" + "psrlq $5, %%mm1 \n\t" + "psrlq $5, %%mm4 \n\t" + "pand %%mm6, %%mm1 \n\t" + "pand %%mm6, %%mm4 \n\t" + "psrlq $19, %%mm2 \n\t" + "psrlq $19, %%mm5 \n\t" + "pand %2, %%mm2 \n\t" + "pand %2, %%mm5 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm4, %%mm3 \n\t" + "por %%mm2, %%mm0 \n\t" + "por %%mm5, %%mm3 \n\t" + "psllq $16, %%mm3 \n\t" + "por %%mm3, %%mm0 \n\t" + MOVNTQ" %%mm0, (%0) \n\t" + :: "r"(d),"r"(s),"m"(blue_16mask):"memory"); 
+ d += 4; + s += 16; + } + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + register int rgb = *(const uint32_t*)s; s += 4; + *d++ = ((rgb&0xF8)<<8) + ((rgb&0xFC00)>>5) + ((rgb&0xF80000)>>19); + } +} + +static inline void rgb32to15_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint8_t *s = src; + const uint8_t *end; + const uint8_t *mm_end; + uint16_t *d = (uint16_t *)dst; + end = s + src_size; + mm_end = end - 15; + __asm__ volatile( + "movq %3, %%mm5 \n\t" + "movq %4, %%mm6 \n\t" + "movq %5, %%mm7 \n\t" + "jmp 2f \n\t" + ".p2align 4 \n\t" + "1: \n\t" + PREFETCH" 32(%1) \n\t" + "movd (%1), %%mm0 \n\t" + "movd 4(%1), %%mm3 \n\t" + "punpckldq 8(%1), %%mm0 \n\t" + "punpckldq 12(%1), %%mm3 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm3, %%mm4 \n\t" + "pand %%mm6, %%mm0 \n\t" + "pand %%mm6, %%mm3 \n\t" + "pmaddwd %%mm7, %%mm0 \n\t" + "pmaddwd %%mm7, %%mm3 \n\t" + "pand %%mm5, %%mm1 \n\t" + "pand %%mm5, %%mm4 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm4, %%mm3 \n\t" + "psrld $6, %%mm0 \n\t" + "pslld $10, %%mm3 \n\t" + "por %%mm3, %%mm0 \n\t" + MOVNTQ" %%mm0, (%0) \n\t" + "add $16, %1 \n\t" + "add $8, %0 \n\t" + "2: \n\t" + "cmp %2, %1 \n\t" + " jb 1b \n\t" + : "+r" (d), "+r"(s) + : "r" (mm_end), "m" (mask3215g), "m" (mask3216br), "m" (mul3215) + ); + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + register int rgb = *(const uint32_t*)s; s += 4; + *d++ = ((rgb&0xFF)>>3) + ((rgb&0xF800)>>6) + ((rgb&0xF80000)>>9); + } +} + +static inline void rgb32tobgr15_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint8_t *s = src; + const uint8_t *end; + const uint8_t *mm_end; + uint16_t *d = (uint16_t *)dst; + end = s + src_size; + __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); + __asm__ volatile( + "movq %0, %%mm7 \n\t" + "movq %1, %%mm6 \n\t" + ::"m"(red_15mask),"m"(green_15mask)); + mm_end = end - 15; + while (s < mm_end) { + __asm__ volatile( 
+ PREFETCH" 32(%1) \n\t" + "movd (%1), %%mm0 \n\t" + "movd 4(%1), %%mm3 \n\t" + "punpckldq 8(%1), %%mm0 \n\t" + "punpckldq 12(%1), %%mm3 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm0, %%mm2 \n\t" + "movq %%mm3, %%mm4 \n\t" + "movq %%mm3, %%mm5 \n\t" + "psllq $7, %%mm0 \n\t" + "psllq $7, %%mm3 \n\t" + "pand %%mm7, %%mm0 \n\t" + "pand %%mm7, %%mm3 \n\t" + "psrlq $6, %%mm1 \n\t" + "psrlq $6, %%mm4 \n\t" + "pand %%mm6, %%mm1 \n\t" + "pand %%mm6, %%mm4 \n\t" + "psrlq $19, %%mm2 \n\t" + "psrlq $19, %%mm5 \n\t" + "pand %2, %%mm2 \n\t" + "pand %2, %%mm5 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm4, %%mm3 \n\t" + "por %%mm2, %%mm0 \n\t" + "por %%mm5, %%mm3 \n\t" + "psllq $16, %%mm3 \n\t" + "por %%mm3, %%mm0 \n\t" + MOVNTQ" %%mm0, (%0) \n\t" + ::"r"(d),"r"(s),"m"(blue_15mask):"memory"); + d += 4; + s += 16; + } + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + register int rgb = *(const uint32_t*)s; s += 4; + *d++ = ((rgb&0xF8)<<7) + ((rgb&0xF800)>>6) + ((rgb&0xF80000)>>19); + } +} + +static inline void rgb24tobgr16_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint8_t *s = src; + const uint8_t *end; + const uint8_t *mm_end; + uint16_t *d = (uint16_t *)dst; + end = s + src_size; + __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); + __asm__ volatile( + "movq %0, %%mm7 \n\t" + "movq %1, %%mm6 \n\t" + ::"m"(red_16mask),"m"(green_16mask)); + mm_end = end - 11; + while (s < mm_end) { + __asm__ volatile( + PREFETCH" 32(%1) \n\t" + "movd (%1), %%mm0 \n\t" + "movd 3(%1), %%mm3 \n\t" + "punpckldq 6(%1), %%mm0 \n\t" + "punpckldq 9(%1), %%mm3 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm0, %%mm2 \n\t" + "movq %%mm3, %%mm4 \n\t" + "movq %%mm3, %%mm5 \n\t" + "psrlq $3, %%mm0 \n\t" + "psrlq $3, %%mm3 \n\t" + "pand %2, %%mm0 \n\t" + "pand %2, %%mm3 \n\t" + "psrlq $5, %%mm1 \n\t" + "psrlq $5, %%mm4 \n\t" + "pand %%mm6, %%mm1 \n\t" + "pand %%mm6, %%mm4 \n\t" + "psrlq $8, %%mm2 \n\t" + "psrlq $8, %%mm5 \n\t" + "pand 
%%mm7, %%mm2 \n\t" + "pand %%mm7, %%mm5 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm4, %%mm3 \n\t" + "por %%mm2, %%mm0 \n\t" + "por %%mm5, %%mm3 \n\t" + "psllq $16, %%mm3 \n\t" + "por %%mm3, %%mm0 \n\t" + MOVNTQ" %%mm0, (%0) \n\t" + ::"r"(d),"r"(s),"m"(blue_16mask):"memory"); + d += 4; + s += 12; + } + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + const int b = *s++; + const int g = *s++; + const int r = *s++; + *d++ = (b>>3) | ((g&0xFC)<<3) | ((r&0xF8)<<8); + } +} + +static inline void rgb24to16_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint8_t *s = src; + const uint8_t *end; + const uint8_t *mm_end; + uint16_t *d = (uint16_t *)dst; + end = s + src_size; + __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); + __asm__ volatile( + "movq %0, %%mm7 \n\t" + "movq %1, %%mm6 \n\t" + ::"m"(red_16mask),"m"(green_16mask)); + mm_end = end - 15; + while (s < mm_end) { + __asm__ volatile( + PREFETCH" 32(%1) \n\t" + "movd (%1), %%mm0 \n\t" + "movd 3(%1), %%mm3 \n\t" + "punpckldq 6(%1), %%mm0 \n\t" + "punpckldq 9(%1), %%mm3 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm0, %%mm2 \n\t" + "movq %%mm3, %%mm4 \n\t" + "movq %%mm3, %%mm5 \n\t" + "psllq $8, %%mm0 \n\t" + "psllq $8, %%mm3 \n\t" + "pand %%mm7, %%mm0 \n\t" + "pand %%mm7, %%mm3 \n\t" + "psrlq $5, %%mm1 \n\t" + "psrlq $5, %%mm4 \n\t" + "pand %%mm6, %%mm1 \n\t" + "pand %%mm6, %%mm4 \n\t" + "psrlq $19, %%mm2 \n\t" + "psrlq $19, %%mm5 \n\t" + "pand %2, %%mm2 \n\t" + "pand %2, %%mm5 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm4, %%mm3 \n\t" + "por %%mm2, %%mm0 \n\t" + "por %%mm5, %%mm3 \n\t" + "psllq $16, %%mm3 \n\t" + "por %%mm3, %%mm0 \n\t" + MOVNTQ" %%mm0, (%0) \n\t" + ::"r"(d),"r"(s),"m"(blue_16mask):"memory"); + d += 4; + s += 12; + } + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + const int r = *s++; + const int g = *s++; + const int b = *s++; + *d++ = (b>>3) | ((g&0xFC)<<3) | ((r&0xF8)<<8); + } +} + 
+static inline void rgb24tobgr15_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint8_t *s = src; + const uint8_t *end; + const uint8_t *mm_end; + uint16_t *d = (uint16_t *)dst; + end = s + src_size; + __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); + __asm__ volatile( + "movq %0, %%mm7 \n\t" + "movq %1, %%mm6 \n\t" + ::"m"(red_15mask),"m"(green_15mask)); + mm_end = end - 11; + while (s < mm_end) { + __asm__ volatile( + PREFETCH" 32(%1) \n\t" + "movd (%1), %%mm0 \n\t" + "movd 3(%1), %%mm3 \n\t" + "punpckldq 6(%1), %%mm0 \n\t" + "punpckldq 9(%1), %%mm3 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm0, %%mm2 \n\t" + "movq %%mm3, %%mm4 \n\t" + "movq %%mm3, %%mm5 \n\t" + "psrlq $3, %%mm0 \n\t" + "psrlq $3, %%mm3 \n\t" + "pand %2, %%mm0 \n\t" + "pand %2, %%mm3 \n\t" + "psrlq $6, %%mm1 \n\t" + "psrlq $6, %%mm4 \n\t" + "pand %%mm6, %%mm1 \n\t" + "pand %%mm6, %%mm4 \n\t" + "psrlq $9, %%mm2 \n\t" + "psrlq $9, %%mm5 \n\t" + "pand %%mm7, %%mm2 \n\t" + "pand %%mm7, %%mm5 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm4, %%mm3 \n\t" + "por %%mm2, %%mm0 \n\t" + "por %%mm5, %%mm3 \n\t" + "psllq $16, %%mm3 \n\t" + "por %%mm3, %%mm0 \n\t" + MOVNTQ" %%mm0, (%0) \n\t" + ::"r"(d),"r"(s),"m"(blue_15mask):"memory"); + d += 4; + s += 12; + } + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + const int b = *s++; + const int g = *s++; + const int r = *s++; + *d++ = (b>>3) | ((g&0xF8)<<2) | ((r&0xF8)<<7); + } +} + +static inline void rgb24to15_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint8_t *s = src; + const uint8_t *end; + const uint8_t *mm_end; + uint16_t *d = (uint16_t *)dst; + end = s + src_size; + __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); + __asm__ volatile( + "movq %0, %%mm7 \n\t" + "movq %1, %%mm6 \n\t" + ::"m"(red_15mask),"m"(green_15mask)); + mm_end = end - 15; + while (s < mm_end) { + __asm__ volatile( + PREFETCH" 32(%1) \n\t" + "movd (%1), %%mm0 \n\t" + "movd 3(%1), %%mm3 \n\t" + 
"punpckldq 6(%1), %%mm0 \n\t" + "punpckldq 9(%1), %%mm3 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm0, %%mm2 \n\t" + "movq %%mm3, %%mm4 \n\t" + "movq %%mm3, %%mm5 \n\t" + "psllq $7, %%mm0 \n\t" + "psllq $7, %%mm3 \n\t" + "pand %%mm7, %%mm0 \n\t" + "pand %%mm7, %%mm3 \n\t" + "psrlq $6, %%mm1 \n\t" + "psrlq $6, %%mm4 \n\t" + "pand %%mm6, %%mm1 \n\t" + "pand %%mm6, %%mm4 \n\t" + "psrlq $19, %%mm2 \n\t" + "psrlq $19, %%mm5 \n\t" + "pand %2, %%mm2 \n\t" + "pand %2, %%mm5 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm4, %%mm3 \n\t" + "por %%mm2, %%mm0 \n\t" + "por %%mm5, %%mm3 \n\t" + "psllq $16, %%mm3 \n\t" + "por %%mm3, %%mm0 \n\t" + MOVNTQ" %%mm0, (%0) \n\t" + ::"r"(d),"r"(s),"m"(blue_15mask):"memory"); + d += 4; + s += 12; + } + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + const int r = *s++; + const int g = *s++; + const int b = *s++; + *d++ = (b>>3) | ((g&0xF8)<<2) | ((r&0xF8)<<7); + } +} + +static inline void rgb15tobgr24_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint16_t *end; + const uint16_t *mm_end; + uint8_t *d = dst; + const uint16_t *s = (const uint16_t*)src; + end = s + src_size/2; + __asm__ volatile(PREFETCH" %0"::"m"(*s):"memory"); + mm_end = end - 7; + while (s < mm_end) { + __asm__ volatile( + PREFETCH" 32(%1) \n\t" + "movq (%1), %%mm0 \n\t" + "movq (%1), %%mm1 \n\t" + "movq (%1), %%mm2 \n\t" + "pand %2, %%mm0 \n\t" + "pand %3, %%mm1 \n\t" + "pand %4, %%mm2 \n\t" + "psllq $5, %%mm0 \n\t" + "pmulhw "MANGLE(mul15_mid)", %%mm0 \n\t" + "pmulhw "MANGLE(mul15_mid)", %%mm1 \n\t" + "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" + "movq %%mm0, %%mm3 \n\t" + "movq %%mm1, %%mm4 \n\t" + "movq %%mm2, %%mm5 \n\t" + "punpcklwd %5, %%mm0 \n\t" + "punpcklwd %5, %%mm1 \n\t" + "punpcklwd %5, %%mm2 \n\t" + "punpckhwd %5, %%mm3 \n\t" + "punpckhwd %5, %%mm4 \n\t" + "punpckhwd %5, %%mm5 \n\t" + "psllq $8, %%mm1 \n\t" + "psllq $16, %%mm2 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm2, %%mm0 \n\t" + "psllq $8, 
%%mm4 \n\t" + "psllq $16, %%mm5 \n\t" + "por %%mm4, %%mm3 \n\t" + "por %%mm5, %%mm3 \n\t" + + "movq %%mm0, %%mm6 \n\t" + "movq %%mm3, %%mm7 \n\t" + + "movq 8(%1), %%mm0 \n\t" + "movq 8(%1), %%mm1 \n\t" + "movq 8(%1), %%mm2 \n\t" + "pand %2, %%mm0 \n\t" + "pand %3, %%mm1 \n\t" + "pand %4, %%mm2 \n\t" + "psllq $5, %%mm0 \n\t" + "pmulhw "MANGLE(mul15_mid)", %%mm0 \n\t" + "pmulhw "MANGLE(mul15_mid)", %%mm1 \n\t" + "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" + "movq %%mm0, %%mm3 \n\t" + "movq %%mm1, %%mm4 \n\t" + "movq %%mm2, %%mm5 \n\t" + "punpcklwd %5, %%mm0 \n\t" + "punpcklwd %5, %%mm1 \n\t" + "punpcklwd %5, %%mm2 \n\t" + "punpckhwd %5, %%mm3 \n\t" + "punpckhwd %5, %%mm4 \n\t" + "punpckhwd %5, %%mm5 \n\t" + "psllq $8, %%mm1 \n\t" + "psllq $16, %%mm2 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm2, %%mm0 \n\t" + "psllq $8, %%mm4 \n\t" + "psllq $16, %%mm5 \n\t" + "por %%mm4, %%mm3 \n\t" + "por %%mm5, %%mm3 \n\t" + + :"=m"(*d) + :"r"(s),"m"(mask15b),"m"(mask15g),"m"(mask15r), "m"(mmx_null) + NAMED_CONSTRAINTS_ADD(mul15_mid,mul15_hi) + :"memory"); + /* borrowed 32 to 24 */ + __asm__ volatile( + "movq %%mm0, %%mm4 \n\t" + "movq %%mm3, %%mm5 \n\t" + "movq %%mm6, %%mm0 \n\t" + "movq %%mm7, %%mm1 \n\t" + + "movq %%mm4, %%mm6 \n\t" + "movq %%mm5, %%mm7 \n\t" + "movq %%mm0, %%mm2 \n\t" + "movq %%mm1, %%mm3 \n\t" + + STORE_BGR24_MMX + + :: "r"(d), "m"(*s) + NAMED_CONSTRAINTS_ADD(mask24l,mask24h) + :"memory"); + d += 24; + s += 8; + } + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + register uint16_t bgr; + bgr = *s++; + *d++ = ((bgr&0x1F)<<3) | ((bgr&0x1F)>>2); + *d++ = ((bgr&0x3E0)>>2) | ((bgr&0x3E0)>>7); + *d++ = ((bgr&0x7C00)>>7) | ((bgr&0x7C00)>>12); + } +} + +static inline void rgb16tobgr24_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint16_t *end; + const uint16_t *mm_end; + uint8_t *d = (uint8_t *)dst; + const uint16_t *s = (const uint16_t *)src; + end = s + src_size/2; + __asm__ volatile(PREFETCH" 
%0"::"m"(*s):"memory"); + mm_end = end - 7; + while (s < mm_end) { + __asm__ volatile( + PREFETCH" 32(%1) \n\t" + "movq (%1), %%mm0 \n\t" + "movq (%1), %%mm1 \n\t" + "movq (%1), %%mm2 \n\t" + "pand %2, %%mm0 \n\t" + "pand %3, %%mm1 \n\t" + "pand %4, %%mm2 \n\t" + "psllq $5, %%mm0 \n\t" + "psrlq $1, %%mm2 \n\t" + "pmulhw "MANGLE(mul15_mid)", %%mm0 \n\t" + "pmulhw "MANGLE(mul16_mid)", %%mm1 \n\t" + "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" + "movq %%mm0, %%mm3 \n\t" + "movq %%mm1, %%mm4 \n\t" + "movq %%mm2, %%mm5 \n\t" + "punpcklwd %5, %%mm0 \n\t" + "punpcklwd %5, %%mm1 \n\t" + "punpcklwd %5, %%mm2 \n\t" + "punpckhwd %5, %%mm3 \n\t" + "punpckhwd %5, %%mm4 \n\t" + "punpckhwd %5, %%mm5 \n\t" + "psllq $8, %%mm1 \n\t" + "psllq $16, %%mm2 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm2, %%mm0 \n\t" + "psllq $8, %%mm4 \n\t" + "psllq $16, %%mm5 \n\t" + "por %%mm4, %%mm3 \n\t" + "por %%mm5, %%mm3 \n\t" + + "movq %%mm0, %%mm6 \n\t" + "movq %%mm3, %%mm7 \n\t" + + "movq 8(%1), %%mm0 \n\t" + "movq 8(%1), %%mm1 \n\t" + "movq 8(%1), %%mm2 \n\t" + "pand %2, %%mm0 \n\t" + "pand %3, %%mm1 \n\t" + "pand %4, %%mm2 \n\t" + "psllq $5, %%mm0 \n\t" + "psrlq $1, %%mm2 \n\t" + "pmulhw "MANGLE(mul15_mid)", %%mm0 \n\t" + "pmulhw "MANGLE(mul16_mid)", %%mm1 \n\t" + "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" + "movq %%mm0, %%mm3 \n\t" + "movq %%mm1, %%mm4 \n\t" + "movq %%mm2, %%mm5 \n\t" + "punpcklwd %5, %%mm0 \n\t" + "punpcklwd %5, %%mm1 \n\t" + "punpcklwd %5, %%mm2 \n\t" + "punpckhwd %5, %%mm3 \n\t" + "punpckhwd %5, %%mm4 \n\t" + "punpckhwd %5, %%mm5 \n\t" + "psllq $8, %%mm1 \n\t" + "psllq $16, %%mm2 \n\t" + "por %%mm1, %%mm0 \n\t" + "por %%mm2, %%mm0 \n\t" + "psllq $8, %%mm4 \n\t" + "psllq $16, %%mm5 \n\t" + "por %%mm4, %%mm3 \n\t" + "por %%mm5, %%mm3 \n\t" + :"=m"(*d) + :"r"(s),"m"(mask16b),"m"(mask16g),"m"(mask16r),"m"(mmx_null) + NAMED_CONSTRAINTS_ADD(mul15_mid,mul16_mid,mul15_hi) + :"memory"); + /* borrowed 32 to 24 */ + __asm__ volatile( + "movq %%mm0, %%mm4 \n\t" + "movq %%mm3, %%mm5 \n\t" + 
"movq %%mm6, %%mm0 \n\t" + "movq %%mm7, %%mm1 \n\t" + + "movq %%mm4, %%mm6 \n\t" + "movq %%mm5, %%mm7 \n\t" + "movq %%mm0, %%mm2 \n\t" + "movq %%mm1, %%mm3 \n\t" + + STORE_BGR24_MMX + + :: "r"(d), "m"(*s) + NAMED_CONSTRAINTS_ADD(mask24l,mask24h) + :"memory"); + d += 24; + s += 8; + } + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + register uint16_t bgr; + bgr = *s++; + *d++ = ((bgr&0x1F)<<3) | ((bgr&0x1F)>>2); + *d++ = ((bgr&0x7E0)>>3) | ((bgr&0x7E0)>>9); + *d++ = ((bgr&0xF800)>>8) | ((bgr&0xF800)>>13); + } +} + +/* + * mm0 = 00 B3 00 B2 00 B1 00 B0 + * mm1 = 00 G3 00 G2 00 G1 00 G0 + * mm2 = 00 R3 00 R2 00 R1 00 R0 + * mm6 = FF FF FF FF FF FF FF FF + * mm7 = 00 00 00 00 00 00 00 00 + */ +#define PACK_RGB32 \ + "packuswb %%mm7, %%mm0 \n\t" /* 00 00 00 00 B3 B2 B1 B0 */ \ + "packuswb %%mm7, %%mm1 \n\t" /* 00 00 00 00 G3 G2 G1 G0 */ \ + "packuswb %%mm7, %%mm2 \n\t" /* 00 00 00 00 R3 R2 R1 R0 */ \ + "punpcklbw %%mm1, %%mm0 \n\t" /* G3 B3 G2 B2 G1 B1 G0 B0 */ \ + "punpcklbw %%mm6, %%mm2 \n\t" /* FF R3 FF R2 FF R1 FF R0 */ \ + "movq %%mm0, %%mm3 \n\t" \ + "punpcklwd %%mm2, %%mm0 \n\t" /* FF R1 G1 B1 FF R0 G0 B0 */ \ + "punpckhwd %%mm2, %%mm3 \n\t" /* FF R3 G3 B3 FF R2 G2 B2 */ \ + MOVNTQ" %%mm0, (%0) \n\t" \ + MOVNTQ" %%mm3, 8(%0) \n\t" \ + +static inline void rgb15to32_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint16_t *end; + const uint16_t *mm_end; + uint8_t *d = dst; + const uint16_t *s = (const uint16_t *)src; + end = s + src_size/2; + __asm__ volatile(PREFETCH" %0"::"m"(*s):"memory"); + __asm__ volatile("pxor %%mm7,%%mm7 \n\t":::"memory"); + __asm__ volatile("pcmpeqd %%mm6,%%mm6 \n\t":::"memory"); + mm_end = end - 3; + while (s < mm_end) { + __asm__ volatile( + PREFETCH" 32(%1) \n\t" + "movq (%1), %%mm0 \n\t" + "movq (%1), %%mm1 \n\t" + "movq (%1), %%mm2 \n\t" + "pand %2, %%mm0 \n\t" + "pand %3, %%mm1 \n\t" + "pand %4, %%mm2 \n\t" + "psllq $5, %%mm0 \n\t" + "pmulhw %5, %%mm0 \n\t" + "pmulhw 
%5, %%mm1 \n\t" + "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" + PACK_RGB32 + ::"r"(d),"r"(s),"m"(mask15b),"m"(mask15g),"m"(mask15r) ,"m"(mul15_mid) + NAMED_CONSTRAINTS_ADD(mul15_hi) + :"memory"); + d += 16; + s += 4; + } + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + register uint16_t bgr; + bgr = *s++; + *d++ = ((bgr&0x1F)<<3) | ((bgr&0x1F)>>2); + *d++ = ((bgr&0x3E0)>>2) | ((bgr&0x3E0)>>7); + *d++ = ((bgr&0x7C00)>>7) | ((bgr&0x7C00)>>12); + *d++ = 255; + } +} + +static inline void rgb16to32_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + const uint16_t *end; + const uint16_t *mm_end; + uint8_t *d = dst; + const uint16_t *s = (const uint16_t*)src; + end = s + src_size/2; + __asm__ volatile(PREFETCH" %0"::"m"(*s):"memory"); + __asm__ volatile("pxor %%mm7,%%mm7 \n\t":::"memory"); + __asm__ volatile("pcmpeqd %%mm6,%%mm6 \n\t":::"memory"); + mm_end = end - 3; + while (s < mm_end) { + __asm__ volatile( + PREFETCH" 32(%1) \n\t" + "movq (%1), %%mm0 \n\t" + "movq (%1), %%mm1 \n\t" + "movq (%1), %%mm2 \n\t" + "pand %2, %%mm0 \n\t" + "pand %3, %%mm1 \n\t" + "pand %4, %%mm2 \n\t" + "psllq $5, %%mm0 \n\t" + "psrlq $1, %%mm2 \n\t" + "pmulhw %5, %%mm0 \n\t" + "pmulhw "MANGLE(mul16_mid)", %%mm1 \n\t" + "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" + PACK_RGB32 + ::"r"(d),"r"(s),"m"(mask16b),"m"(mask16g),"m"(mask16r),"m"(mul15_mid) + NAMED_CONSTRAINTS_ADD(mul16_mid,mul15_hi) + :"memory"); + d += 16; + s += 4; + } + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + while (s < end) { + register uint16_t bgr; + bgr = *s++; + *d++ = ((bgr&0x1F)<<3) | ((bgr&0x1F)>>2); + *d++ = ((bgr&0x7E0)>>3) | ((bgr&0x7E0)>>9); + *d++ = ((bgr&0xF800)>>8) | ((bgr&0xF800)>>13); + *d++ = 255; + } +} + +static inline void rgb24tobgr24_mmxext(const uint8_t *src, uint8_t *dst, int src_size) +{ + x86_reg mmx_size= 23 - src_size; + __asm__ volatile ( + "test %%"FF_REG_a", %%"FF_REG_a" \n\t" + "jns 2f \n\t" + "movq 
"MANGLE(mask24r)", %%mm5 \n\t" + "movq "MANGLE(mask24g)", %%mm6 \n\t" + "movq "MANGLE(mask24b)", %%mm7 \n\t" + ".p2align 4 \n\t" + "1: \n\t" + PREFETCH" 32(%1, %%"FF_REG_a") \n\t" + "movq (%1, %%"FF_REG_a"), %%mm0 \n\t" // BGR BGR BG + "movq (%1, %%"FF_REG_a"), %%mm1 \n\t" // BGR BGR BG + "movq 2(%1, %%"FF_REG_a"), %%mm2 \n\t" // R BGR BGR B + "psllq $16, %%mm0 \n\t" // 00 BGR BGR + "pand %%mm5, %%mm0 \n\t" + "pand %%mm6, %%mm1 \n\t" + "pand %%mm7, %%mm2 \n\t" + "por %%mm0, %%mm1 \n\t" + "por %%mm2, %%mm1 \n\t" + "movq 6(%1, %%"FF_REG_a"), %%mm0 \n\t" // BGR BGR BG + MOVNTQ" %%mm1,(%2, %%"FF_REG_a") \n\t" // RGB RGB RG + "movq 8(%1, %%"FF_REG_a"), %%mm1 \n\t" // R BGR BGR B + "movq 10(%1, %%"FF_REG_a"), %%mm2 \n\t" // GR BGR BGR + "pand %%mm7, %%mm0 \n\t" + "pand %%mm5, %%mm1 \n\t" + "pand %%mm6, %%mm2 \n\t" + "por %%mm0, %%mm1 \n\t" + "por %%mm2, %%mm1 \n\t" + "movq 14(%1, %%"FF_REG_a"), %%mm0 \n\t" // R BGR BGR B + MOVNTQ" %%mm1, 8(%2, %%"FF_REG_a")\n\t" // B RGB RGB R + "movq 16(%1, %%"FF_REG_a"), %%mm1 \n\t" // GR BGR BGR + "movq 18(%1, %%"FF_REG_a"), %%mm2 \n\t" // BGR BGR BG + "pand %%mm6, %%mm0 \n\t" + "pand %%mm7, %%mm1 \n\t" + "pand %%mm5, %%mm2 \n\t" + "por %%mm0, %%mm1 \n\t" + "por %%mm2, %%mm1 \n\t" + MOVNTQ" %%mm1, 16(%2, %%"FF_REG_a") \n\t" + "add $24, %%"FF_REG_a" \n\t" + " js 1b \n\t" + "2: \n\t" + : "+a" (mmx_size) + : "r" (src-mmx_size), "r"(dst-mmx_size) + NAMED_CONSTRAINTS_ADD(mask24r,mask24g,mask24b) + ); + + __asm__ volatile(SFENCE:::"memory"); + __asm__ volatile(EMMS:::"memory"); + + if (mmx_size==23) return; //finished, was multiple of 8 + + src+= src_size; + dst+= src_size; + src_size= 23-mmx_size; + src-= src_size; + dst-= src_size; + for (unsigned i = 0; i < src_size; i +=3) { + register uint8_t x; + x = src[i + 2]; + dst[i + 1] = src[i + 1]; + dst[i + 2] = src[i + 0]; + dst[i + 0] = x; + } +} + +static inline void yuvPlanartoyuy2_mmxext(const uint8_t *ysrc, const uint8_t *usrc, const uint8_t *vsrc, uint8_t *dst, + int width, int height, 
+ int lumStride, int chromStride, int dstStride, int vertLumPerChroma) +{ + const x86_reg chromWidth= width>>1; + for (int y = 0; y < height; y++) { + //FIXME handle 2 lines at once (fewer prefetches, reuse some chroma, but very likely memory-limited anyway) + __asm__ volatile( + "xor %%"FF_REG_a", %%"FF_REG_a" \n\t" + ".p2align 4 \n\t" + "1: \n\t" + PREFETCH" 32(%1, %%"FF_REG_a", 2) \n\t" + PREFETCH" 32(%2, %%"FF_REG_a") \n\t" + PREFETCH" 32(%3, %%"FF_REG_a") \n\t" + "movq (%2, %%"FF_REG_a"), %%mm0 \n\t" // U(0) + "movq %%mm0, %%mm2 \n\t" // U(0) + "movq (%3, %%"FF_REG_a"), %%mm1 \n\t" // V(0) + "punpcklbw %%mm1, %%mm0 \n\t" // UVUV UVUV(0) + "punpckhbw %%mm1, %%mm2 \n\t" // UVUV UVUV(8) + + "movq (%1, %%"FF_REG_a",2), %%mm3 \n\t" // Y(0) + "movq 8(%1, %%"FF_REG_a",2), %%mm5 \n\t" // Y(8) + "movq %%mm3, %%mm4 \n\t" // Y(0) + "movq %%mm5, %%mm6 \n\t" // Y(8) + "punpcklbw %%mm0, %%mm3 \n\t" // YUYV YUYV(0) + "punpckhbw %%mm0, %%mm4 \n\t" // YUYV YUYV(4) + "punpcklbw %%mm2, %%mm5 \n\t" // YUYV YUYV(8) + "punpckhbw %%mm2, %%mm6 \n\t" // YUYV YUYV(12) + + MOVNTQ" %%mm3, (%0, %%"FF_REG_a", 4) \n\t" + MOVNTQ" %%mm4, 8(%0, %%"FF_REG_a", 4) \n\t" + MOVNTQ" %%mm5, 16(%0, %%"FF_REG_a", 4) \n\t" + MOVNTQ" %%mm6, 24(%0, %%"FF_REG_a", 4) \n\t" + + "add $8, %%"FF_REG_a" \n\t" + "cmp %4, %%"FF_REG_a" \n\t" + " jb 1b \n\t" + ::"r"(dst), "r"(ysrc), "r"(usrc), "r"(vsrc), "g" (chromWidth) + : "%"FF_REG_a + ); + if ((y&(vertLumPerChroma-1)) == vertLumPerChroma-1) { + usrc += chromStride; + vsrc += chromStride; + } + ysrc += lumStride; + dst += dstStride; + } + __asm__(EMMS" \n\t" + SFENCE" \n\t" + :::"memory"); +} + +/** + * Height should be a multiple of 2 and width should be a multiple of 16. + * (If this is a problem for anyone then tell me, and I will fix it.) 
+ */ +static inline void yv12toyuy2_mmxext(const uint8_t *ysrc, const uint8_t *usrc, const uint8_t *vsrc, uint8_t *dst, + int width, int height, + int lumStride, int chromStride, int dstStride) +{ + //FIXME interpolate chroma + yuvPlanartoyuy2_mmxext(ysrc, usrc, vsrc, dst, width, height, lumStride, chromStride, dstStride, 2); +} + +static inline void yuvPlanartouyvy_mmxext(const uint8_t *ysrc, const uint8_t *usrc, const uint8_t *vsrc, uint8_t *dst, + int width, int height, + int lumStride, int chromStride, int dstStride, int vertLumPerChroma) +{ + const x86_reg chromWidth= width>>1; + for (int y = 0; y < height; y++) { + //FIXME handle 2 lines at once (fewer prefetches, reuse some chroma, but very likely memory-limited anyway) + __asm__ volatile( + "xor %%"FF_REG_a", %%"FF_REG_a" \n\t" + ".p2align 4 \n\t" + "1: \n\t" + PREFETCH" 32(%1, %%"FF_REG_a", 2) \n\t" + PREFETCH" 32(%2, %%"FF_REG_a") \n\t" + PREFETCH" 32(%3, %%"FF_REG_a") \n\t" + "movq (%2, %%"FF_REG_a"), %%mm0 \n\t" // U(0) + "movq %%mm0, %%mm2 \n\t" // U(0) + "movq (%3, %%"FF_REG_a"), %%mm1 \n\t" // V(0) + "punpcklbw %%mm1, %%mm0 \n\t" // UVUV UVUV(0) + "punpckhbw %%mm1, %%mm2 \n\t" // UVUV UVUV(8) + + "movq (%1, %%"FF_REG_a",2), %%mm3 \n\t" // Y(0) + "movq 8(%1, %%"FF_REG_a",2), %%mm5 \n\t" // Y(8) + "movq %%mm0, %%mm4 \n\t" // Y(0) + "movq %%mm2, %%mm6 \n\t" // Y(8) + "punpcklbw %%mm3, %%mm0 \n\t" // YUYV YUYV(0) + "punpckhbw %%mm3, %%mm4 \n\t" // YUYV YUYV(4) + "punpcklbw %%mm5, %%mm2 \n\t" // YUYV YUYV(8) + "punpckhbw %%mm5, %%mm6 \n\t" // YUYV YUYV(12) + + MOVNTQ" %%mm0, (%0, %%"FF_REG_a", 4) \n\t" + MOVNTQ" %%mm4, 8(%0, %%"FF_REG_a", 4) \n\t" + MOVNTQ" %%mm2, 16(%0, %%"FF_REG_a", 4) \n\t" + MOVNTQ" %%mm6, 24(%0, %%"FF_REG_a", 4) \n\t" + + "add $8, %%"FF_REG_a" \n\t" + "cmp %4, %%"FF_REG_a" \n\t" + " jb 1b \n\t" + ::"r"(dst), "r"(ysrc), "r"(usrc), "r"(vsrc), "g" (chromWidth) + : "%"FF_REG_a + ); + if ((y&(vertLumPerChroma-1)) == vertLumPerChroma-1) { + usrc += chromStride; + vsrc += chromStride; + } + 
ysrc += lumStride; + dst += dstStride; + } + __asm__(EMMS" \n\t" + SFENCE" \n\t" + :::"memory"); +} + +/** + * Height should be a multiple of 2 and width should be a multiple of 16 + * (If this is a problem for anyone then tell me, and I will fix it.) + */ +static inline void yv12touyvy_mmxext(const uint8_t *ysrc, const uint8_t *usrc, const uint8_t *vsrc, uint8_t *dst, + int width, int height, + int lumStride, int chromStride, int dstStride) +{ + //FIXME interpolate chroma + yuvPlanartouyvy_mmxext(ysrc, usrc, vsrc, dst, width, height, lumStride, chromStride, dstStride, 2); +} + +/** + * Width should be a multiple of 16. + */ +static inline void yuv422ptouyvy_mmxext(const uint8_t *ysrc, const uint8_t *usrc, const uint8_t *vsrc, uint8_t *dst, + int width, int height, + int lumStride, int chromStride, int dstStride) +{ + yuvPlanartouyvy_mmxext(ysrc, usrc, vsrc, dst, width, height, lumStride, chromStride, dstStride, 1); +} + +/** + * Width should be a multiple of 16. + */ +static inline void yuv422ptoyuy2_mmxext(const uint8_t *ysrc, const uint8_t *usrc, const uint8_t *vsrc, uint8_t *dst, + int width, int height, + int lumStride, int chromStride, int dstStride) +{ + yuvPlanartoyuy2_mmxext(ysrc, usrc, vsrc, dst, width, height, lumStride, chromStride, dstStride, 1); +} + +/** + * Height should be a multiple of 2 and width should be a multiple of 16. + * (If this is a problem for anyone then tell me, and I will fix it.) + */ +static inline void yuy2toyv12_mmxext(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst, + int width, int height, + int lumStride, int chromStride, int srcStride) +{ + const x86_reg chromWidth= width>>1; + for (int y = 0; y < height; y += 2) { + __asm__ volatile( + "xor %%"FF_REG_a", %%"FF_REG_a"\n\t" + "pcmpeqw %%mm7, %%mm7 \n\t" + "psrlw $8, %%mm7 \n\t" // FF,00,FF,00... 
+ ".p2align 4 \n\t" + "1: \n\t" + PREFETCH" 64(%0, %%"FF_REG_a", 4) \n\t" + "movq (%0, %%"FF_REG_a", 4), %%mm0 \n\t" // YUYV YUYV(0) + "movq 8(%0, %%"FF_REG_a", 4), %%mm1 \n\t" // YUYV YUYV(4) + "movq %%mm0, %%mm2 \n\t" // YUYV YUYV(0) + "movq %%mm1, %%mm3 \n\t" // YUYV YUYV(4) + "psrlw $8, %%mm0 \n\t" // U0V0 U0V0(0) + "psrlw $8, %%mm1 \n\t" // U0V0 U0V0(4) + "pand %%mm7, %%mm2 \n\t" // Y0Y0 Y0Y0(0) + "pand %%mm7, %%mm3 \n\t" // Y0Y0 Y0Y0(4) + "packuswb %%mm1, %%mm0 \n\t" // UVUV UVUV(0) + "packuswb %%mm3, %%mm2 \n\t" // YYYY YYYY(0) + + MOVNTQ" %%mm2, (%1, %%"FF_REG_a", 2) \n\t" + + "movq 16(%0, %%"FF_REG_a", 4), %%mm1 \n\t" // YUYV YUYV(8) + "movq 24(%0, %%"FF_REG_a", 4), %%mm2 \n\t" // YUYV YUYV(12) + "movq %%mm1, %%mm3 \n\t" // YUYV YUYV(8) + "movq %%mm2, %%mm4 \n\t" // YUYV YUYV(12) + "psrlw $8, %%mm1 \n\t" // U0V0 U0V0(8) + "psrlw $8, %%mm2 \n\t" // U0V0 U0V0(12) + "pand %%mm7, %%mm3 \n\t" // Y0Y0 Y0Y0(8) + "pand %%mm7, %%mm4 \n\t" // Y0Y0 Y0Y0(12) + "packuswb %%mm2, %%mm1 \n\t" // UVUV UVUV(8) + "packuswb %%mm4, %%mm3 \n\t" // YYYY YYYY(8) + + MOVNTQ" %%mm3, 8(%1, %%"FF_REG_a", 2) \n\t" + + "movq %%mm0, %%mm2 \n\t" // UVUV UVUV(0) + "movq %%mm1, %%mm3 \n\t" // UVUV UVUV(8) + "psrlw $8, %%mm0 \n\t" // V0V0 V0V0(0) + "psrlw $8, %%mm1 \n\t" // V0V0 V0V0(8) + "pand %%mm7, %%mm2 \n\t" // U0U0 U0U0(0) + "pand %%mm7, %%mm3 \n\t" // U0U0 U0U0(8) + "packuswb %%mm1, %%mm0 \n\t" // VVVV VVVV(0) + "packuswb %%mm3, %%mm2 \n\t" // UUUU UUUU(0) + + MOVNTQ" %%mm0, (%3, %%"FF_REG_a") \n\t" + MOVNTQ" %%mm2, (%2, %%"FF_REG_a") \n\t" + + "add $8, %%"FF_REG_a" \n\t" + "cmp %4, %%"FF_REG_a" \n\t" + " jb 1b \n\t" + ::"r"(src), "r"(ydst), "r"(udst), "r"(vdst), "g" (chromWidth) + : "memory", "%"FF_REG_a + ); + + ydst += lumStride; + src += srcStride; + + __asm__ volatile( + "xor %%"FF_REG_a", %%"FF_REG_a"\n\t" + ".p2align 4 \n\t" + "1: \n\t" + PREFETCH" 64(%0, %%"FF_REG_a", 4) \n\t" + "movq (%0, %%"FF_REG_a", 4), %%mm0 \n\t" // YUYV YUYV(0) + "movq 8(%0, %%"FF_REG_a", 4), %%mm1 
\n\t" // YUYV YUYV(4) + "movq 16(%0, %%"FF_REG_a", 4), %%mm2 \n\t" // YUYV YUYV(8) + "movq 24(%0, %%"FF_REG_a", 4), %%mm3 \n\t" // YUYV YUYV(12) + "pand %%mm7, %%mm0 \n\t" // Y0Y0 Y0Y0(0) + "pand %%mm7, %%mm1 \n\t" // Y0Y0 Y0Y0(4) + "pand %%mm7, %%mm2 \n\t" // Y0Y0 Y0Y0(8) + "pand %%mm7, %%mm3 \n\t" // Y0Y0 Y0Y0(12) + "packuswb %%mm1, %%mm0 \n\t" // YYYY YYYY(0) + "packuswb %%mm3, %%mm2 \n\t" // YYYY YYYY(8) + + MOVNTQ" %%mm0, (%1, %%"FF_REG_a", 2) \n\t" + MOVNTQ" %%mm2, 8(%1, %%"FF_REG_a", 2) \n\t" + + "add $8, %%"FF_REG_a"\n\t" + "cmp %4, %%"FF_REG_a"\n\t" + " jb 1b \n\t" + + ::"r"(src), "r"(ydst), "r"(udst), "r"(vdst), "g" (chromWidth) + : "memory", "%"FF_REG_a + ); + udst += chromStride; + vdst += chromStride; + ydst += lumStride; + src += srcStride; + } + __asm__ volatile(EMMS" \n\t" + SFENCE" \n\t" + :::"memory"); +} + +static inline void planar2x_mmxext(const uint8_t *src, uint8_t *dst, int srcWidth, int srcHeight, int srcStride, int dstStride) +{ + dst[0]= src[0]; + + // first line + for (int x = 0; x < srcWidth - 1; x++) { + dst[2*x+1]= (3*src[x] + src[x+1])>>2; + dst[2*x+2]= ( src[x] + 3*src[x+1])>>2; + } + dst[2*srcWidth-1]= src[srcWidth-1]; + + dst+= dstStride; + + for (int y = 1; y < srcHeight; y++) { + x86_reg mmxSize= srcWidth&~15; + + if (mmxSize) { + __asm__ volatile( + "mov %4, %%"FF_REG_a" \n\t" + "movq "MANGLE(mmx_ff)", %%mm0 \n\t" + "movq (%0, %%"FF_REG_a"), %%mm4 \n\t" + "movq %%mm4, %%mm2 \n\t" + "psllq $8, %%mm4 \n\t" + "pand %%mm0, %%mm2 \n\t" + "por %%mm2, %%mm4 \n\t" + "movq (%1, %%"FF_REG_a"), %%mm5 \n\t" + "movq %%mm5, %%mm3 \n\t" + "psllq $8, %%mm5 \n\t" + "pand %%mm0, %%mm3 \n\t" + "por %%mm3, %%mm5 \n\t" + "1: \n\t" + "movq (%0, %%"FF_REG_a"), %%mm0 \n\t" + "movq (%1, %%"FF_REG_a"), %%mm1 \n\t" + "movq 1(%0, %%"FF_REG_a"), %%mm2 \n\t" + "movq 1(%1, %%"FF_REG_a"), %%mm3 \n\t" + PAVGB" %%mm0, %%mm5 \n\t" + PAVGB" %%mm0, %%mm3 \n\t" + PAVGB" %%mm0, %%mm5 \n\t" + PAVGB" %%mm0, %%mm3 \n\t" + PAVGB" %%mm1, %%mm4 \n\t" + PAVGB" %%mm1, %%mm2 
\n\t" + PAVGB" %%mm1, %%mm4 \n\t" + PAVGB" %%mm1, %%mm2 \n\t" + "movq %%mm5, %%mm7 \n\t" + "movq %%mm4, %%mm6 \n\t" + "punpcklbw %%mm3, %%mm5 \n\t" + "punpckhbw %%mm3, %%mm7 \n\t" + "punpcklbw %%mm2, %%mm4 \n\t" + "punpckhbw %%mm2, %%mm6 \n\t" + MOVNTQ" %%mm5, (%2, %%"FF_REG_a", 2) \n\t" + MOVNTQ" %%mm7, 8(%2, %%"FF_REG_a", 2) \n\t" + MOVNTQ" %%mm4, (%3, %%"FF_REG_a", 2) \n\t" + MOVNTQ" %%mm6, 8(%3, %%"FF_REG_a", 2) \n\t" + "add $8, %%"FF_REG_a" \n\t" + "movq -1(%0, %%"FF_REG_a"), %%mm4 \n\t" + "movq -1(%1, %%"FF_REG_a"), %%mm5 \n\t" + " js 1b \n\t" + :: "r" (src + mmxSize ), "r" (src + srcStride + mmxSize ), + "r" (dst + mmxSize*2), "r" (dst + dstStride + mmxSize*2), + "g" (-mmxSize) + NAMED_CONSTRAINTS_ADD(mmx_ff) + : "%"FF_REG_a + ); + } else { + mmxSize = 1; + dst[0] = (src[0] * 3 + src[srcStride]) >> 2; + dst[dstStride] = (src[0] + 3 * src[srcStride]) >> 2; + } + + for (int x = mmxSize - 1; x < srcWidth - 1; x++) { + dst[2*x +1]= (3*src[x+0] + src[x+srcStride+1])>>2; + dst[2*x+dstStride+2]= ( src[x+0] + 3*src[x+srcStride+1])>>2; + dst[2*x+dstStride+1]= ( src[x+1] + 3*src[x+srcStride ])>>2; + dst[2*x +2]= (3*src[x+1] + src[x+srcStride ])>>2; + } + dst[srcWidth*2 -1 ]= (3*src[srcWidth-1] + src[srcWidth-1 + srcStride])>>2; + dst[srcWidth*2 -1 + dstStride]= ( src[srcWidth-1] + 3*src[srcWidth-1 + srcStride])>>2; + + dst+=dstStride*2; + src+=srcStride; + } + + // last line + dst[0]= src[0]; + + for (int x = 0; x < srcWidth - 1; x++) { + dst[2*x+1]= (3*src[x] + src[x+1])>>2; + dst[2*x+2]= ( src[x] + 3*src[x+1])>>2; + } + dst[2*srcWidth-1]= src[srcWidth-1]; + + __asm__ volatile(EMMS" \n\t" + SFENCE" \n\t" + :::"memory"); +} + +/** + * Height should be a multiple of 2 and width should be a multiple of 2. + * (If this is a problem for anyone then tell me, and I will fix it.) + * Chrominance data is only taken from every second line, + * others are ignored in the C version. + * FIXME: Write HQ version. 
+ */ +#if HAVE_7REGS +static inline void rgb24toyv12_mmxext(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst, + int width, int height, + int lumStride, int chromStride, int srcStride, + int32_t *rgb2yuv) +{ +#define BGR2Y_IDX "16*4+16*32" +#define BGR2U_IDX "16*4+16*33" +#define BGR2V_IDX "16*4+16*34" + int y; + const x86_reg chromWidth= width>>1; + + if (height > 2) { + ff_rgb24toyv12_c(src, ydst, udst, vdst, width, 2, lumStride, chromStride, srcStride, rgb2yuv); + src += 2*srcStride; + ydst += 2*lumStride; + udst += chromStride; + vdst += chromStride; + height -= 2; + } + + for (y = 0; y < height - 2; y += 2) { + for (int i = 0; i < 2; i++) { + __asm__ volatile( + "mov %2, %%"FF_REG_a"\n\t" + "movq "BGR2Y_IDX"(%3), %%mm6 \n\t" + "movq "MANGLE(ff_w1111)", %%mm5 \n\t" + "pxor %%mm7, %%mm7 \n\t" + "lea (%%"FF_REG_a", %%"FF_REG_a", 2), %%"FF_REG_d" \n\t" + ".p2align 4 \n\t" + "1: \n\t" + PREFETCH" 64(%0, %%"FF_REG_d") \n\t" + "movd (%0, %%"FF_REG_d"), %%mm0 \n\t" + "movd 3(%0, %%"FF_REG_d"), %%mm1 \n\t" + "punpcklbw %%mm7, %%mm0 \n\t" + "punpcklbw %%mm7, %%mm1 \n\t" + "movd 6(%0, %%"FF_REG_d"), %%mm2 \n\t" + "movd 9(%0, %%"FF_REG_d"), %%mm3 \n\t" + "punpcklbw %%mm7, %%mm2 \n\t" + "punpcklbw %%mm7, %%mm3 \n\t" + "pmaddwd %%mm6, %%mm0 \n\t" + "pmaddwd %%mm6, %%mm1 \n\t" + "pmaddwd %%mm6, %%mm2 \n\t" + "pmaddwd %%mm6, %%mm3 \n\t" + "psrad $8, %%mm0 \n\t" + "psrad $8, %%mm1 \n\t" + "psrad $8, %%mm2 \n\t" + "psrad $8, %%mm3 \n\t" + "packssdw %%mm1, %%mm0 \n\t" + "packssdw %%mm3, %%mm2 \n\t" + "pmaddwd %%mm5, %%mm0 \n\t" + "pmaddwd %%mm5, %%mm2 \n\t" + "packssdw %%mm2, %%mm0 \n\t" + "psraw $7, %%mm0 \n\t" + + "movd 12(%0, %%"FF_REG_d"), %%mm4 \n\t" + "movd 15(%0, %%"FF_REG_d"), %%mm1 \n\t" + "punpcklbw %%mm7, %%mm4 \n\t" + "punpcklbw %%mm7, %%mm1 \n\t" + "movd 18(%0, %%"FF_REG_d"), %%mm2 \n\t" + "movd 21(%0, %%"FF_REG_d"), %%mm3 \n\t" + "punpcklbw %%mm7, %%mm2 \n\t" + "punpcklbw %%mm7, %%mm3 \n\t" + "pmaddwd %%mm6, %%mm4 \n\t" + "pmaddwd %%mm6, %%mm1 \n\t" + 
"pmaddwd %%mm6, %%mm2 \n\t" + "pmaddwd %%mm6, %%mm3 \n\t" + "psrad $8, %%mm4 \n\t" + "psrad $8, %%mm1 \n\t" + "psrad $8, %%mm2 \n\t" + "psrad $8, %%mm3 \n\t" + "packssdw %%mm1, %%mm4 \n\t" + "packssdw %%mm3, %%mm2 \n\t" + "pmaddwd %%mm5, %%mm4 \n\t" + "pmaddwd %%mm5, %%mm2 \n\t" + "add $24, %%"FF_REG_d"\n\t" + "packssdw %%mm2, %%mm4 \n\t" + "psraw $7, %%mm4 \n\t" + + "packuswb %%mm4, %%mm0 \n\t" + "paddusb "MANGLE(ff_bgr2YOffset)", %%mm0 \n\t" + + MOVNTQ" %%mm0, (%1, %%"FF_REG_a") \n\t" + "add $8, %%"FF_REG_a" \n\t" + " js 1b \n\t" + : : "r" (src+width*3), "r" (ydst+width), "g" ((x86_reg)-width), "r"(rgb2yuv) + NAMED_CONSTRAINTS_ADD(ff_w1111,ff_bgr2YOffset) + : "%"FF_REG_a, "%"FF_REG_d + ); + ydst += lumStride; + src += srcStride; + } + src -= srcStride*2; + __asm__ volatile( + "mov %4, %%"FF_REG_a"\n\t" + "movq "MANGLE(ff_w1111)", %%mm5 \n\t" + "movq "BGR2U_IDX"(%5), %%mm6 \n\t" + "pxor %%mm7, %%mm7 \n\t" + "lea (%%"FF_REG_a", %%"FF_REG_a", 2), %%"FF_REG_d" \n\t" + "add %%"FF_REG_d", %%"FF_REG_d"\n\t" + ".p2align 4 \n\t" + "1: \n\t" + PREFETCH" 64(%0, %%"FF_REG_d") \n\t" + PREFETCH" 64(%1, %%"FF_REG_d") \n\t" + "movq (%0, %%"FF_REG_d"), %%mm0 \n\t" + "movq (%1, %%"FF_REG_d"), %%mm1 \n\t" + "movq 6(%0, %%"FF_REG_d"), %%mm2 \n\t" + "movq 6(%1, %%"FF_REG_d"), %%mm3 \n\t" + PAVGB" %%mm1, %%mm0 \n\t" + PAVGB" %%mm3, %%mm2 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm2, %%mm3 \n\t" + "psrlq $24, %%mm0 \n\t" + "psrlq $24, %%mm2 \n\t" + PAVGB" %%mm1, %%mm0 \n\t" + PAVGB" %%mm3, %%mm2 \n\t" + "punpcklbw %%mm7, %%mm0 \n\t" + "punpcklbw %%mm7, %%mm2 \n\t" + "movq "BGR2V_IDX"(%5), %%mm1 \n\t" + "movq "BGR2V_IDX"(%5), %%mm3 \n\t" + + "pmaddwd %%mm0, %%mm1 \n\t" + "pmaddwd %%mm2, %%mm3 \n\t" + "pmaddwd %%mm6, %%mm0 \n\t" + "pmaddwd %%mm6, %%mm2 \n\t" + "psrad $8, %%mm0 \n\t" + "psrad $8, %%mm1 \n\t" + "psrad $8, %%mm2 \n\t" + "psrad $8, %%mm3 \n\t" + "packssdw %%mm2, %%mm0 \n\t" + "packssdw %%mm3, %%mm1 \n\t" + "pmaddwd %%mm5, %%mm0 \n\t" + "pmaddwd %%mm5, %%mm1 \n\t" + 
"packssdw %%mm1, %%mm0 \n\t" // V1 V0 U1 U0 + "psraw $7, %%mm0 \n\t" + + "movq 12(%0, %%"FF_REG_d"), %%mm4 \n\t" + "movq 12(%1, %%"FF_REG_d"), %%mm1 \n\t" + "movq 18(%0, %%"FF_REG_d"), %%mm2 \n\t" + "movq 18(%1, %%"FF_REG_d"), %%mm3 \n\t" + PAVGB" %%mm1, %%mm4 \n\t" + PAVGB" %%mm3, %%mm2 \n\t" + "movq %%mm4, %%mm1 \n\t" + "movq %%mm2, %%mm3 \n\t" + "psrlq $24, %%mm4 \n\t" + "psrlq $24, %%mm2 \n\t" + PAVGB" %%mm1, %%mm4 \n\t" + PAVGB" %%mm3, %%mm2 \n\t" + "punpcklbw %%mm7, %%mm4 \n\t" + "punpcklbw %%mm7, %%mm2 \n\t" + "movq "BGR2V_IDX"(%5), %%mm1 \n\t" + "movq "BGR2V_IDX"(%5), %%mm3 \n\t" + + "pmaddwd %%mm4, %%mm1 \n\t" + "pmaddwd %%mm2, %%mm3 \n\t" + "pmaddwd %%mm6, %%mm4 \n\t" + "pmaddwd %%mm6, %%mm2 \n\t" + "psrad $8, %%mm4 \n\t" + "psrad $8, %%mm1 \n\t" + "psrad $8, %%mm2 \n\t" + "psrad $8, %%mm3 \n\t" + "packssdw %%mm2, %%mm4 \n\t" + "packssdw %%mm3, %%mm1 \n\t" + "pmaddwd %%mm5, %%mm4 \n\t" + "pmaddwd %%mm5, %%mm1 \n\t" + "add $24, %%"FF_REG_d"\n\t" + "packssdw %%mm1, %%mm4 \n\t" // V3 V2 U3 U2 + "psraw $7, %%mm4 \n\t" + + "movq %%mm0, %%mm1 \n\t" + "punpckldq %%mm4, %%mm0 \n\t" + "punpckhdq %%mm4, %%mm1 \n\t" + "packsswb %%mm1, %%mm0 \n\t" + "paddb "MANGLE(ff_bgr2UVOffset)", %%mm0 \n\t" + "movd %%mm0, (%2, %%"FF_REG_a") \n\t" + "punpckhdq %%mm0, %%mm0 \n\t" + "movd %%mm0, (%3, %%"FF_REG_a") \n\t" + "add $4, %%"FF_REG_a" \n\t" + " js 1b \n\t" + : : "r" (src+chromWidth*6), "r" (src+srcStride+chromWidth*6), "r" (udst+chromWidth), "r" (vdst+chromWidth), "g" (-chromWidth), "r"(rgb2yuv) + NAMED_CONSTRAINTS_ADD(ff_w1111,ff_bgr2UVOffset) + : "%"FF_REG_a, "%"FF_REG_d + ); + + udst += chromStride; + vdst += chromStride; + src += srcStride*2; + } + + __asm__ volatile(EMMS" \n\t" + SFENCE" \n\t" + :::"memory"); + + ff_rgb24toyv12_c(src, ydst, udst, vdst, width, height-y, lumStride, chromStride, srcStride, rgb2yuv); +} +#endif /* HAVE_7REGS */ + +static inline void vu9_to_vu12_mmxext(const uint8_t *src1, const uint8_t *src2, + uint8_t *dst1, uint8_t *dst2, + int width, 
int height, + int srcStride1, int srcStride2, + int dstStride1, int dstStride2) +{ + int w,h; + w=width/2; h=height/2; + __asm__ volatile( + PREFETCH" %0 \n\t" + PREFETCH" %1 \n\t" + ::"m"(*(src1+srcStride1)),"m"(*(src2+srcStride2)):"memory"); + for (x86_reg y = 0; y < h; y++) { + const uint8_t* s1=src1+srcStride1*(y>>1); + uint8_t* d=dst1+dstStride1*y; + x86_reg x = 0; + for (;x>1); + uint8_t* d=dst2+dstStride2*y; + x86_reg x = 0; + for (;x>2); + const uint8_t* vp=src3+srcStride3*(y>>2); + uint8_t* d=dst+dstStride*y; + x86_reg x = 0; + for (;x>1; + dst1[count]= (src0[4*count+2]+src1[4*count+2])>>1; + count++; + } +} + +static void extract_odd2_mmxext(const uint8_t *src, uint8_t *dst0, uint8_t *dst1, x86_reg count) +{ + dst0+= count; + dst1+= count; + src += 4*count; + count= - count; + if(count <= -8) { + count += 7; + __asm__ volatile( + "pcmpeqw %%mm7, %%mm7 \n\t" + "psrlw $8, %%mm7 \n\t" + "1: \n\t" + "movq -28(%1, %0, 4), %%mm0 \n\t" + "movq -20(%1, %0, 4), %%mm1 \n\t" + "movq -12(%1, %0, 4), %%mm2 \n\t" + "movq -4(%1, %0, 4), %%mm3 \n\t" + "psrlw $8, %%mm0 \n\t" + "psrlw $8, %%mm1 \n\t" + "psrlw $8, %%mm2 \n\t" + "psrlw $8, %%mm3 \n\t" + "packuswb %%mm1, %%mm0 \n\t" + "packuswb %%mm3, %%mm2 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm2, %%mm3 \n\t" + "psrlw $8, %%mm0 \n\t" + "psrlw $8, %%mm2 \n\t" + "pand %%mm7, %%mm1 \n\t" + "pand %%mm7, %%mm3 \n\t" + "packuswb %%mm2, %%mm0 \n\t" + "packuswb %%mm3, %%mm1 \n\t" + MOVNTQ" %%mm0,- 7(%3, %0) \n\t" + MOVNTQ" %%mm1,- 7(%2, %0) \n\t" + "add $8, %0 \n\t" + " js 1b \n\t" + : "+r"(count) + : "r"(src), "r"(dst0), "r"(dst1) + ); + count -= 7; + } + src++; + while(count<0) { + dst0[count]= src[4*count+0]; + dst1[count]= src[4*count+2]; + count++; + } +} + +static void extract_odd2avg_mmxext(const uint8_t *src0, const uint8_t *src1, uint8_t *dst0, uint8_t *dst1, x86_reg count) +{ + dst0 += count; + dst1 += count; + src0 += 4*count; + src1 += 4*count; + count= - count; +#ifdef PAVGB + if(count <= -8) { + count += 7; + 
__asm__ volatile( + "pcmpeqw %%mm7, %%mm7 \n\t" + "psrlw $8, %%mm7 \n\t" + "1: \n\t" + "movq -28(%1, %0, 4), %%mm0 \n\t" + "movq -20(%1, %0, 4), %%mm1 \n\t" + "movq -12(%1, %0, 4), %%mm2 \n\t" + "movq -4(%1, %0, 4), %%mm3 \n\t" + PAVGB" -28(%2, %0, 4), %%mm0 \n\t" + PAVGB" -20(%2, %0, 4), %%mm1 \n\t" + PAVGB" -12(%2, %0, 4), %%mm2 \n\t" + PAVGB" - 4(%2, %0, 4), %%mm3 \n\t" + "psrlw $8, %%mm0 \n\t" + "psrlw $8, %%mm1 \n\t" + "psrlw $8, %%mm2 \n\t" + "psrlw $8, %%mm3 \n\t" + "packuswb %%mm1, %%mm0 \n\t" + "packuswb %%mm3, %%mm2 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm2, %%mm3 \n\t" + "psrlw $8, %%mm0 \n\t" + "psrlw $8, %%mm2 \n\t" + "pand %%mm7, %%mm1 \n\t" + "pand %%mm7, %%mm3 \n\t" + "packuswb %%mm2, %%mm0 \n\t" + "packuswb %%mm3, %%mm1 \n\t" + MOVNTQ" %%mm0,- 7(%4, %0) \n\t" + MOVNTQ" %%mm1,- 7(%3, %0) \n\t" + "add $8, %0 \n\t" + " js 1b \n\t" + : "+r"(count) + : "r"(src0), "r"(src1), "r"(dst0), "r"(dst1) + ); + count -= 7; + } +#endif + src0++; + src1++; + while(count<0) { + dst0[count]= (src0[4*count+0]+src1[4*count+0])>>1; + dst1[count]= (src0[4*count+2]+src1[4*count+2])>>1; + count++; + } +} + +static void yuyvtoyuv420_mmxext(uint8_t *ydst, uint8_t *udst, uint8_t *vdst, const uint8_t *src, + int width, int height, + int lumStride, int chromStride, int srcStride) +{ + const int chromWidth = AV_CEIL_RSHIFT(width, 1); + + for (int y = 0; y < height; y++) { + extract_even_mmxext(src, ydst, width); + if(y&1) { + extract_odd2avg_mmxext(src-srcStride, src, udst, vdst, chromWidth); + udst+= chromStride; + vdst+= chromStride; + } + + src += srcStride; + ydst+= lumStride; + } + __asm__( + EMMS" \n\t" + SFENCE" \n\t" + ::: "memory" + ); +} + +static void yuyvtoyuv422_mmxext(uint8_t *ydst, uint8_t *udst, uint8_t *vdst, const uint8_t *src, + int width, int height, + int lumStride, int chromStride, int srcStride) +{ + const int chromWidth = AV_CEIL_RSHIFT(width, 1); + + for (int y = 0; y < height; y++) { + extract_even_mmxext(src, ydst, width); + 
extract_odd2_mmxext(src, udst, vdst, chromWidth); + + src += srcStride; + ydst+= lumStride; + udst+= chromStride; + vdst+= chromStride; + } + __asm__( + EMMS" \n\t" + SFENCE" \n\t" + ::: "memory" + ); +} + +static void uyvytoyuv420_mmxext(uint8_t *ydst, uint8_t *udst, uint8_t *vdst, const uint8_t *src, + int width, int height, + int lumStride, int chromStride, int srcStride) +{ + const int chromWidth = AV_CEIL_RSHIFT(width, 1); + + for (int y = 0; y < height; y++) { + extract_odd_mmxext(src, ydst, width); + if(y&1) { + extract_even2avg_mmxext(src-srcStride, src, udst, vdst, chromWidth); + udst+= chromStride; + vdst+= chromStride; + } + + src += srcStride; + ydst+= lumStride; + } + __asm__( + EMMS" \n\t" + SFENCE" \n\t" + ::: "memory" + ); +} + +#if ARCH_X86_32 +static void uyvytoyuv422_mmxext(uint8_t *ydst, uint8_t *udst, uint8_t *vdst, const uint8_t *src, + int width, int height, + int lumStride, int chromStride, int srcStride) +{ + const int chromWidth = AV_CEIL_RSHIFT(width, 1); + + for (int y = 0; y < height; y++) { + extract_odd_mmxext(src, ydst, width); + extract_even2_mmxext(src, udst, vdst, chromWidth); + + src += srcStride; + ydst+= lumStride; + udst+= chromStride; + vdst+= chromStride; + } + __asm__( + EMMS" \n\t" + SFENCE" \n\t" + ::: "memory" + ); +} +#endif /* ARCH_X86_32 */ + +static av_cold void rgb2rgb_init_mmxext(void) +{ + rgb15to16 = rgb15to16_mmxext; + rgb15tobgr24 = rgb15tobgr24_mmxext; + rgb15to32 = rgb15to32_mmxext; + rgb16tobgr24 = rgb16tobgr24_mmxext; + rgb16to32 = rgb16to32_mmxext; + rgb16to15 = rgb16to15_mmxext; + rgb24tobgr16 = rgb24tobgr16_mmxext; + rgb24tobgr15 = rgb24tobgr15_mmxext; + rgb24tobgr32 = rgb24tobgr32_mmxext; + rgb32to16 = rgb32to16_mmxext; + rgb32to15 = rgb32to15_mmxext; + rgb32tobgr24 = rgb32tobgr24_mmxext; + rgb24to15 = rgb24to15_mmxext; + rgb24to16 = rgb24to16_mmxext; + rgb24tobgr24 = rgb24tobgr24_mmxext; + rgb32tobgr16 = rgb32tobgr16_mmxext; + rgb32tobgr15 = rgb32tobgr15_mmxext; + yv12toyuy2 = yv12toyuy2_mmxext; + 
yv12touyvy = yv12touyvy_mmxext; + yuv422ptoyuy2 = yuv422ptoyuy2_mmxext; + yuv422ptouyvy = yuv422ptouyvy_mmxext; + yuy2toyv12 = yuy2toyv12_mmxext; + vu9_to_vu12 = vu9_to_vu12_mmxext; + yvu9_to_yuy2 = yvu9_to_yuy2_mmxext; +#if ARCH_X86_32 + uyvytoyuv422 = uyvytoyuv422_mmxext; +#endif + yuyvtoyuv422 = yuyvtoyuv422_mmxext; + + planar2x = planar2x_mmxext; +#if HAVE_7REGS + ff_rgb24toyv12 = rgb24toyv12_mmxext; +#endif /* HAVE_7REGS */ + + yuyvtoyuv420 = yuyvtoyuv420_mmxext; + uyvytoyuv420 = uyvytoyuv420_mmxext; +} //SSE2 versions -#undef RENAME -#undef COMPILE_TEMPLATE_SSE2 -#define COMPILE_TEMPLATE_SSE2 1 -#define RENAME(a) a ## _sse2 -#include "rgb2rgb_template.c" +static void interleave_bytes_sse2(const uint8_t *src1, const uint8_t *src2, uint8_t *dest, + int width, int height, int src1Stride, + int src2Stride, int dstStride) +{ + for (int h = 0; h < height; h++) { + if (width >= 16) { + if (!((((intptr_t)src1) | ((intptr_t)src2) | ((intptr_t)dest))&15)) { + __asm__( + "xor %%"FF_REG_a", %%"FF_REG_a" \n\t" + "1: \n\t" + PREFETCH" 64(%1, %%"FF_REG_a") \n\t" + PREFETCH" 64(%2, %%"FF_REG_a") \n\t" + "movdqa (%1, %%"FF_REG_a"), %%xmm0 \n\t" + "movdqa (%1, %%"FF_REG_a"), %%xmm1 \n\t" + "movdqa (%2, %%"FF_REG_a"), %%xmm2 \n\t" + "punpcklbw %%xmm2, %%xmm0 \n\t" + "punpckhbw %%xmm2, %%xmm1 \n\t" + "movntdq %%xmm0, (%0, %%"FF_REG_a", 2) \n\t" + "movntdq %%xmm1, 16(%0, %%"FF_REG_a", 2) \n\t" + "add $16, %%"FF_REG_a" \n\t" + "cmp %3, %%"FF_REG_a" \n\t" + " jb 1b \n\t" + ::"r"(dest), "r"(src1), "r"(src2), "r" ((x86_reg)width-15) + : "memory", XMM_CLOBBERS("xmm0", "xmm1", "xmm2",) "%"FF_REG_a + ); + } else + __asm__( + "xor %%"FF_REG_a", %%"FF_REG_a" \n\t" + "1: \n\t" + PREFETCH" 64(%1, %%"FF_REG_a") \n\t" + PREFETCH" 64(%2, %%"FF_REG_a") \n\t" + "movq (%1, %%"FF_REG_a"), %%mm0 \n\t" + "movq 8(%1, %%"FF_REG_a"), %%mm2 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm2, %%mm3 \n\t" + "movq (%2, %%"FF_REG_a"), %%mm4 \n\t" + "movq 8(%2, %%"FF_REG_a"), %%mm5 \n\t" + "punpcklbw %%mm4, 
%%mm0 \n\t" + "punpckhbw %%mm4, %%mm1 \n\t" + "punpcklbw %%mm5, %%mm2 \n\t" + "punpckhbw %%mm5, %%mm3 \n\t" + MOVNTQ" %%mm0, (%0, %%"FF_REG_a", 2) \n\t" + MOVNTQ" %%mm1, 8(%0, %%"FF_REG_a", 2) \n\t" + MOVNTQ" %%mm2, 16(%0, %%"FF_REG_a", 2) \n\t" + MOVNTQ" %%mm3, 24(%0, %%"FF_REG_a", 2) \n\t" + "add $16, %%"FF_REG_a" \n\t" + "cmp %3, %%"FF_REG_a" \n\t" + " jb 1b \n\t" + ::"r"(dest), "r"(src1), "r"(src2), "r" ((x86_reg)width-15) + : "memory", "%"FF_REG_a + ); + + } + for (int w = (width & (~15)); w < width; w++) { + dest[2*w+0] = src1[w]; + dest[2*w+1] = src2[w]; + } + dest += dstStride; + src1 += src1Stride; + src2 += src2Stride; + } + __asm__( + EMMS" \n\t" + SFENCE" \n\t" + ::: "memory" + ); +} /* RGB15->RGB16 original by Strepto/Astral @@ -133,12 +2367,12 @@ void ff_uyvytoyuv422_avx(uint8_t *ydst, uint8_t *udst, uint8_t *vdst, #define DEINTERLEAVE_BYTES(cpuext) \ void ff_nv12ToUV_ ## cpuext(uint8_t *dstU, uint8_t *dstV, \ - const uint8_t *unused, \ - const uint8_t *src1, \ - const uint8_t *src2, \ - int w, \ - uint32_t *unused2, \ - void *opq); \ + const uint8_t *unused, \ + const uint8_t *src1, \ + const uint8_t *src2, \ + int w, \ + uint32_t *unused2, \ + void *opq); \ static void deinterleave_bytes_ ## cpuext(const uint8_t *src, uint8_t *dst1, uint8_t *dst2, \ int width, int height, int srcStride, \ int dst1Stride, int dst2Stride) \ @@ -166,7 +2400,7 @@ av_cold void rgb2rgb_init_x86(void) if (INLINE_MMXEXT(cpu_flags)) rgb2rgb_init_mmxext(); if (INLINE_SSE2(cpu_flags)) - rgb2rgb_init_sse2(); + interleaveBytes = interleave_bytes_sse2; #endif /* HAVE_INLINE_ASM */ if (EXTERNAL_MMXEXT(cpu_flags)) { diff --git a/libswscale/x86/rgb2rgb_template.c b/libswscale/x86/rgb2rgb_template.c deleted file mode 100644 index d1403d08e6..0000000000 --- a/libswscale/x86/rgb2rgb_template.c +++ /dev/null @@ -1,2315 +0,0 @@ -/* - * software RGB to RGB converter - * pluralize by software PAL8 to RGB converter - * software YUV to YUV converter - * software YUV to RGB converter - * 
Written by Nick Kurshev. - * palette & YUV & runtime CPU stuff by Michael (michaelni@gmx.at) - * lot of big-endian byte order fixes by Alex Beregszaszi - * - * This file is part of FFmpeg. - * - * FFmpeg is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free Software Foundation; either - * version 2.1 of the License, or (at your option) any later version. - * - * FFmpeg is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * Lesser General Public License for more details. - * - * You should have received a copy of the GNU Lesser General Public - * License along with FFmpeg; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA - */ - -#include -#include - -#include "libavutil/attributes.h" -#include "libavutil/x86/asm.h" - -#undef PREFETCH -#undef MOVNTQ -#undef EMMS -#undef SFENCE -#undef PAVGB - -#define PREFETCH "prefetchnta" -#define PAVGB "pavgb" -#define MOVNTQ "movntq" -#define SFENCE "sfence" - -#define EMMS "emms" - -#if !COMPILE_TEMPLATE_SSE2 - -static inline void RENAME(rgb24tobgr32)(const uint8_t *src, uint8_t *dst, int src_size) -{ - uint8_t *dest = dst; - const uint8_t *s = src; - const uint8_t *end; - const uint8_t *mm_end; - end = s + src_size; - __asm__ volatile(PREFETCH" %0"::"m"(*s):"memory"); - mm_end = end - 23; - __asm__ volatile("movq %0, %%mm7"::"m"(mask32a):"memory"); - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movd (%1), %%mm0 \n\t" - "punpckldq 3(%1), %%mm0 \n\t" - "movd 6(%1), %%mm1 \n\t" - "punpckldq 9(%1), %%mm1 \n\t" - "movd 12(%1), %%mm2 \n\t" - "punpckldq 15(%1), %%mm2 \n\t" - "movd 18(%1), %%mm3 \n\t" - "punpckldq 21(%1), %%mm3 \n\t" - "por %%mm7, %%mm0 \n\t" - "por %%mm7, %%mm1 \n\t" - "por %%mm7, %%mm2 \n\t" - "por 
%%mm7, %%mm3 \n\t" - MOVNTQ" %%mm0, (%0) \n\t" - MOVNTQ" %%mm1, 8(%0) \n\t" - MOVNTQ" %%mm2, 16(%0) \n\t" - MOVNTQ" %%mm3, 24(%0)" - :: "r"(dest), "r"(s) - :"memory"); - dest += 32; - s += 24; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - *dest++ = *s++; - *dest++ = *s++; - *dest++ = *s++; - *dest++ = 255; - } -} - -#define STORE_BGR24_MMX \ - "psrlq $8, %%mm2 \n\t" \ - "psrlq $8, %%mm3 \n\t" \ - "psrlq $8, %%mm6 \n\t" \ - "psrlq $8, %%mm7 \n\t" \ - "pand "MANGLE(mask24l)", %%mm0\n\t" \ - "pand "MANGLE(mask24l)", %%mm1\n\t" \ - "pand "MANGLE(mask24l)", %%mm4\n\t" \ - "pand "MANGLE(mask24l)", %%mm5\n\t" \ - "pand "MANGLE(mask24h)", %%mm2\n\t" \ - "pand "MANGLE(mask24h)", %%mm3\n\t" \ - "pand "MANGLE(mask24h)", %%mm6\n\t" \ - "pand "MANGLE(mask24h)", %%mm7\n\t" \ - "por %%mm2, %%mm0 \n\t" \ - "por %%mm3, %%mm1 \n\t" \ - "por %%mm6, %%mm4 \n\t" \ - "por %%mm7, %%mm5 \n\t" \ - \ - "movq %%mm1, %%mm2 \n\t" \ - "movq %%mm4, %%mm3 \n\t" \ - "psllq $48, %%mm2 \n\t" \ - "psllq $32, %%mm3 \n\t" \ - "por %%mm2, %%mm0 \n\t" \ - "psrlq $16, %%mm1 \n\t" \ - "psrlq $32, %%mm4 \n\t" \ - "psllq $16, %%mm5 \n\t" \ - "por %%mm3, %%mm1 \n\t" \ - "por %%mm5, %%mm4 \n\t" \ - \ - MOVNTQ" %%mm0, (%0) \n\t" \ - MOVNTQ" %%mm1, 8(%0) \n\t" \ - MOVNTQ" %%mm4, 16(%0)" - - -static inline void RENAME(rgb32tobgr24)(const uint8_t *src, uint8_t *dst, int src_size) -{ - uint8_t *dest = dst; - const uint8_t *s = src; - const uint8_t *end; - const uint8_t *mm_end; - end = s + src_size; - __asm__ volatile(PREFETCH" %0"::"m"(*s):"memory"); - mm_end = end - 31; - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movq (%1), %%mm0 \n\t" - "movq 8(%1), %%mm1 \n\t" - "movq 16(%1), %%mm4 \n\t" - "movq 24(%1), %%mm5 \n\t" - "movq %%mm0, %%mm2 \n\t" - "movq %%mm1, %%mm3 \n\t" - "movq %%mm4, %%mm6 \n\t" - "movq %%mm5, %%mm7 \n\t" - STORE_BGR24_MMX - :: "r"(dest), "r"(s) - NAMED_CONSTRAINTS_ADD(mask24l,mask24h) - :"memory"); - dest += 24; 
- s += 32; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - *dest++ = *s++; - *dest++ = *s++; - *dest++ = *s++; - s++; - } -} - -/* - original by Strepto/Astral - ported to gcc & bugfixed: A'rpi - MMXEXT, 3DNOW optimization by Nick Kurshev - 32-bit C version, and and&add trick by Michael Niedermayer -*/ -static inline void RENAME(rgb15to16)(const uint8_t *src, uint8_t *dst, int src_size) -{ - register const uint8_t* s=src; - register uint8_t* d=dst; - register const uint8_t *end; - const uint8_t *mm_end; - end = s + src_size; - __asm__ volatile(PREFETCH" %0"::"m"(*s)); - __asm__ volatile("movq %0, %%mm4"::"m"(mask15s)); - mm_end = end - 15; - while (s>1)&0x7FE07FE0) | (x&0x001F001F); - s+=4; - d+=4; - } - if (s < end) { - register uint16_t x= *((const uint16_t*)s); - *((uint16_t *)d) = ((x>>1)&0x7FE0) | (x&0x001F); - } -} - -static inline void RENAME(rgb32to16)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint8_t *s = src; - const uint8_t *end; - const uint8_t *mm_end; - uint16_t *d = (uint16_t *)dst; - end = s + src_size; - mm_end = end - 15; - __asm__ volatile( - "movq %3, %%mm5 \n\t" - "movq %4, %%mm6 \n\t" - "movq %5, %%mm7 \n\t" - "jmp 2f \n\t" - ".p2align 4 \n\t" - "1: \n\t" - PREFETCH" 32(%1) \n\t" - "movd (%1), %%mm0 \n\t" - "movd 4(%1), %%mm3 \n\t" - "punpckldq 8(%1), %%mm0 \n\t" - "punpckldq 12(%1), %%mm3 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm3, %%mm4 \n\t" - "pand %%mm6, %%mm0 \n\t" - "pand %%mm6, %%mm3 \n\t" - "pmaddwd %%mm7, %%mm0 \n\t" - "pmaddwd %%mm7, %%mm3 \n\t" - "pand %%mm5, %%mm1 \n\t" - "pand %%mm5, %%mm4 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm4, %%mm3 \n\t" - "psrld $5, %%mm0 \n\t" - "pslld $11, %%mm3 \n\t" - "por %%mm3, %%mm0 \n\t" - MOVNTQ" %%mm0, (%0) \n\t" - "add $16, %1 \n\t" - "add $8, %0 \n\t" - "2: \n\t" - "cmp %2, %1 \n\t" - " jb 1b \n\t" - : "+r" (d), "+r"(s) - : "r" (mm_end), "m" (mask3216g), "m" (mask3216br), "m" (mul3216) - ); - __asm__ 
volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - register int rgb = *(const uint32_t*)s; s += 4; - *d++ = ((rgb&0xFF)>>3) + ((rgb&0xFC00)>>5) + ((rgb&0xF80000)>>8); - } -} - -static inline void RENAME(rgb32tobgr16)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint8_t *s = src; - const uint8_t *end; - const uint8_t *mm_end; - uint16_t *d = (uint16_t *)dst; - end = s + src_size; - __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); - __asm__ volatile( - "movq %0, %%mm7 \n\t" - "movq %1, %%mm6 \n\t" - ::"m"(red_16mask),"m"(green_16mask)); - mm_end = end - 15; - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movd (%1), %%mm0 \n\t" - "movd 4(%1), %%mm3 \n\t" - "punpckldq 8(%1), %%mm0 \n\t" - "punpckldq 12(%1), %%mm3 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm0, %%mm2 \n\t" - "movq %%mm3, %%mm4 \n\t" - "movq %%mm3, %%mm5 \n\t" - "psllq $8, %%mm0 \n\t" - "psllq $8, %%mm3 \n\t" - "pand %%mm7, %%mm0 \n\t" - "pand %%mm7, %%mm3 \n\t" - "psrlq $5, %%mm1 \n\t" - "psrlq $5, %%mm4 \n\t" - "pand %%mm6, %%mm1 \n\t" - "pand %%mm6, %%mm4 \n\t" - "psrlq $19, %%mm2 \n\t" - "psrlq $19, %%mm5 \n\t" - "pand %2, %%mm2 \n\t" - "pand %2, %%mm5 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm4, %%mm3 \n\t" - "por %%mm2, %%mm0 \n\t" - "por %%mm5, %%mm3 \n\t" - "psllq $16, %%mm3 \n\t" - "por %%mm3, %%mm0 \n\t" - MOVNTQ" %%mm0, (%0) \n\t" - :: "r"(d),"r"(s),"m"(blue_16mask):"memory"); - d += 4; - s += 16; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - register int rgb = *(const uint32_t*)s; s += 4; - *d++ = ((rgb&0xF8)<<8) + ((rgb&0xFC00)>>5) + ((rgb&0xF80000)>>19); - } -} - -static inline void RENAME(rgb32to15)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint8_t *s = src; - const uint8_t *end; - const uint8_t *mm_end; - uint16_t *d = (uint16_t *)dst; - end = s + src_size; - mm_end = end - 15; - __asm__ volatile( - "movq %3, %%mm5 \n\t" - "movq %4, %%mm6 
\n\t" - "movq %5, %%mm7 \n\t" - "jmp 2f \n\t" - ".p2align 4 \n\t" - "1: \n\t" - PREFETCH" 32(%1) \n\t" - "movd (%1), %%mm0 \n\t" - "movd 4(%1), %%mm3 \n\t" - "punpckldq 8(%1), %%mm0 \n\t" - "punpckldq 12(%1), %%mm3 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm3, %%mm4 \n\t" - "pand %%mm6, %%mm0 \n\t" - "pand %%mm6, %%mm3 \n\t" - "pmaddwd %%mm7, %%mm0 \n\t" - "pmaddwd %%mm7, %%mm3 \n\t" - "pand %%mm5, %%mm1 \n\t" - "pand %%mm5, %%mm4 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm4, %%mm3 \n\t" - "psrld $6, %%mm0 \n\t" - "pslld $10, %%mm3 \n\t" - "por %%mm3, %%mm0 \n\t" - MOVNTQ" %%mm0, (%0) \n\t" - "add $16, %1 \n\t" - "add $8, %0 \n\t" - "2: \n\t" - "cmp %2, %1 \n\t" - " jb 1b \n\t" - : "+r" (d), "+r"(s) - : "r" (mm_end), "m" (mask3215g), "m" (mask3216br), "m" (mul3215) - ); - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - register int rgb = *(const uint32_t*)s; s += 4; - *d++ = ((rgb&0xFF)>>3) + ((rgb&0xF800)>>6) + ((rgb&0xF80000)>>9); - } -} - -static inline void RENAME(rgb32tobgr15)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint8_t *s = src; - const uint8_t *end; - const uint8_t *mm_end; - uint16_t *d = (uint16_t *)dst; - end = s + src_size; - __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); - __asm__ volatile( - "movq %0, %%mm7 \n\t" - "movq %1, %%mm6 \n\t" - ::"m"(red_15mask),"m"(green_15mask)); - mm_end = end - 15; - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movd (%1), %%mm0 \n\t" - "movd 4(%1), %%mm3 \n\t" - "punpckldq 8(%1), %%mm0 \n\t" - "punpckldq 12(%1), %%mm3 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm0, %%mm2 \n\t" - "movq %%mm3, %%mm4 \n\t" - "movq %%mm3, %%mm5 \n\t" - "psllq $7, %%mm0 \n\t" - "psllq $7, %%mm3 \n\t" - "pand %%mm7, %%mm0 \n\t" - "pand %%mm7, %%mm3 \n\t" - "psrlq $6, %%mm1 \n\t" - "psrlq $6, %%mm4 \n\t" - "pand %%mm6, %%mm1 \n\t" - "pand %%mm6, %%mm4 \n\t" - "psrlq $19, %%mm2 \n\t" - "psrlq $19, %%mm5 \n\t" - "pand %2, %%mm2 \n\t" - "pand %2, 
%%mm5 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm4, %%mm3 \n\t" - "por %%mm2, %%mm0 \n\t" - "por %%mm5, %%mm3 \n\t" - "psllq $16, %%mm3 \n\t" - "por %%mm3, %%mm0 \n\t" - MOVNTQ" %%mm0, (%0) \n\t" - ::"r"(d),"r"(s),"m"(blue_15mask):"memory"); - d += 4; - s += 16; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - register int rgb = *(const uint32_t*)s; s += 4; - *d++ = ((rgb&0xF8)<<7) + ((rgb&0xF800)>>6) + ((rgb&0xF80000)>>19); - } -} - -static inline void RENAME(rgb24tobgr16)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint8_t *s = src; - const uint8_t *end; - const uint8_t *mm_end; - uint16_t *d = (uint16_t *)dst; - end = s + src_size; - __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); - __asm__ volatile( - "movq %0, %%mm7 \n\t" - "movq %1, %%mm6 \n\t" - ::"m"(red_16mask),"m"(green_16mask)); - mm_end = end - 11; - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movd (%1), %%mm0 \n\t" - "movd 3(%1), %%mm3 \n\t" - "punpckldq 6(%1), %%mm0 \n\t" - "punpckldq 9(%1), %%mm3 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm0, %%mm2 \n\t" - "movq %%mm3, %%mm4 \n\t" - "movq %%mm3, %%mm5 \n\t" - "psrlq $3, %%mm0 \n\t" - "psrlq $3, %%mm3 \n\t" - "pand %2, %%mm0 \n\t" - "pand %2, %%mm3 \n\t" - "psrlq $5, %%mm1 \n\t" - "psrlq $5, %%mm4 \n\t" - "pand %%mm6, %%mm1 \n\t" - "pand %%mm6, %%mm4 \n\t" - "psrlq $8, %%mm2 \n\t" - "psrlq $8, %%mm5 \n\t" - "pand %%mm7, %%mm2 \n\t" - "pand %%mm7, %%mm5 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm4, %%mm3 \n\t" - "por %%mm2, %%mm0 \n\t" - "por %%mm5, %%mm3 \n\t" - "psllq $16, %%mm3 \n\t" - "por %%mm3, %%mm0 \n\t" - MOVNTQ" %%mm0, (%0) \n\t" - ::"r"(d),"r"(s),"m"(blue_16mask):"memory"); - d += 4; - s += 12; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - const int b = *s++; - const int g = *s++; - const int r = *s++; - *d++ = (b>>3) | ((g&0xFC)<<3) | ((r&0xF8)<<8); - } -} - -static inline void 
RENAME(rgb24to16)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint8_t *s = src; - const uint8_t *end; - const uint8_t *mm_end; - uint16_t *d = (uint16_t *)dst; - end = s + src_size; - __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); - __asm__ volatile( - "movq %0, %%mm7 \n\t" - "movq %1, %%mm6 \n\t" - ::"m"(red_16mask),"m"(green_16mask)); - mm_end = end - 15; - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movd (%1), %%mm0 \n\t" - "movd 3(%1), %%mm3 \n\t" - "punpckldq 6(%1), %%mm0 \n\t" - "punpckldq 9(%1), %%mm3 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm0, %%mm2 \n\t" - "movq %%mm3, %%mm4 \n\t" - "movq %%mm3, %%mm5 \n\t" - "psllq $8, %%mm0 \n\t" - "psllq $8, %%mm3 \n\t" - "pand %%mm7, %%mm0 \n\t" - "pand %%mm7, %%mm3 \n\t" - "psrlq $5, %%mm1 \n\t" - "psrlq $5, %%mm4 \n\t" - "pand %%mm6, %%mm1 \n\t" - "pand %%mm6, %%mm4 \n\t" - "psrlq $19, %%mm2 \n\t" - "psrlq $19, %%mm5 \n\t" - "pand %2, %%mm2 \n\t" - "pand %2, %%mm5 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm4, %%mm3 \n\t" - "por %%mm2, %%mm0 \n\t" - "por %%mm5, %%mm3 \n\t" - "psllq $16, %%mm3 \n\t" - "por %%mm3, %%mm0 \n\t" - MOVNTQ" %%mm0, (%0) \n\t" - ::"r"(d),"r"(s),"m"(blue_16mask):"memory"); - d += 4; - s += 12; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - const int r = *s++; - const int g = *s++; - const int b = *s++; - *d++ = (b>>3) | ((g&0xFC)<<3) | ((r&0xF8)<<8); - } -} - -static inline void RENAME(rgb24tobgr15)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint8_t *s = src; - const uint8_t *end; - const uint8_t *mm_end; - uint16_t *d = (uint16_t *)dst; - end = s + src_size; - __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); - __asm__ volatile( - "movq %0, %%mm7 \n\t" - "movq %1, %%mm6 \n\t" - ::"m"(red_15mask),"m"(green_15mask)); - mm_end = end - 11; - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movd (%1), %%mm0 \n\t" - "movd 3(%1), %%mm3 \n\t" - "punpckldq 
6(%1), %%mm0 \n\t" - "punpckldq 9(%1), %%mm3 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm0, %%mm2 \n\t" - "movq %%mm3, %%mm4 \n\t" - "movq %%mm3, %%mm5 \n\t" - "psrlq $3, %%mm0 \n\t" - "psrlq $3, %%mm3 \n\t" - "pand %2, %%mm0 \n\t" - "pand %2, %%mm3 \n\t" - "psrlq $6, %%mm1 \n\t" - "psrlq $6, %%mm4 \n\t" - "pand %%mm6, %%mm1 \n\t" - "pand %%mm6, %%mm4 \n\t" - "psrlq $9, %%mm2 \n\t" - "psrlq $9, %%mm5 \n\t" - "pand %%mm7, %%mm2 \n\t" - "pand %%mm7, %%mm5 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm4, %%mm3 \n\t" - "por %%mm2, %%mm0 \n\t" - "por %%mm5, %%mm3 \n\t" - "psllq $16, %%mm3 \n\t" - "por %%mm3, %%mm0 \n\t" - MOVNTQ" %%mm0, (%0) \n\t" - ::"r"(d),"r"(s),"m"(blue_15mask):"memory"); - d += 4; - s += 12; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - const int b = *s++; - const int g = *s++; - const int r = *s++; - *d++ = (b>>3) | ((g&0xF8)<<2) | ((r&0xF8)<<7); - } -} - -static inline void RENAME(rgb24to15)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint8_t *s = src; - const uint8_t *end; - const uint8_t *mm_end; - uint16_t *d = (uint16_t *)dst; - end = s + src_size; - __asm__ volatile(PREFETCH" %0"::"m"(*src):"memory"); - __asm__ volatile( - "movq %0, %%mm7 \n\t" - "movq %1, %%mm6 \n\t" - ::"m"(red_15mask),"m"(green_15mask)); - mm_end = end - 15; - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movd (%1), %%mm0 \n\t" - "movd 3(%1), %%mm3 \n\t" - "punpckldq 6(%1), %%mm0 \n\t" - "punpckldq 9(%1), %%mm3 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm0, %%mm2 \n\t" - "movq %%mm3, %%mm4 \n\t" - "movq %%mm3, %%mm5 \n\t" - "psllq $7, %%mm0 \n\t" - "psllq $7, %%mm3 \n\t" - "pand %%mm7, %%mm0 \n\t" - "pand %%mm7, %%mm3 \n\t" - "psrlq $6, %%mm1 \n\t" - "psrlq $6, %%mm4 \n\t" - "pand %%mm6, %%mm1 \n\t" - "pand %%mm6, %%mm4 \n\t" - "psrlq $19, %%mm2 \n\t" - "psrlq $19, %%mm5 \n\t" - "pand %2, %%mm2 \n\t" - "pand %2, %%mm5 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm4, %%mm3 \n\t" - 
"por %%mm2, %%mm0 \n\t" - "por %%mm5, %%mm3 \n\t" - "psllq $16, %%mm3 \n\t" - "por %%mm3, %%mm0 \n\t" - MOVNTQ" %%mm0, (%0) \n\t" - ::"r"(d),"r"(s),"m"(blue_15mask):"memory"); - d += 4; - s += 12; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - const int r = *s++; - const int g = *s++; - const int b = *s++; - *d++ = (b>>3) | ((g&0xF8)<<2) | ((r&0xF8)<<7); - } -} - -static inline void RENAME(rgb15tobgr24)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint16_t *end; - const uint16_t *mm_end; - uint8_t *d = dst; - const uint16_t *s = (const uint16_t*)src; - end = s + src_size/2; - __asm__ volatile(PREFETCH" %0"::"m"(*s):"memory"); - mm_end = end - 7; - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movq (%1), %%mm0 \n\t" - "movq (%1), %%mm1 \n\t" - "movq (%1), %%mm2 \n\t" - "pand %2, %%mm0 \n\t" - "pand %3, %%mm1 \n\t" - "pand %4, %%mm2 \n\t" - "psllq $5, %%mm0 \n\t" - "pmulhw "MANGLE(mul15_mid)", %%mm0 \n\t" - "pmulhw "MANGLE(mul15_mid)", %%mm1 \n\t" - "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" - "movq %%mm0, %%mm3 \n\t" - "movq %%mm1, %%mm4 \n\t" - "movq %%mm2, %%mm5 \n\t" - "punpcklwd %5, %%mm0 \n\t" - "punpcklwd %5, %%mm1 \n\t" - "punpcklwd %5, %%mm2 \n\t" - "punpckhwd %5, %%mm3 \n\t" - "punpckhwd %5, %%mm4 \n\t" - "punpckhwd %5, %%mm5 \n\t" - "psllq $8, %%mm1 \n\t" - "psllq $16, %%mm2 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm2, %%mm0 \n\t" - "psllq $8, %%mm4 \n\t" - "psllq $16, %%mm5 \n\t" - "por %%mm4, %%mm3 \n\t" - "por %%mm5, %%mm3 \n\t" - - "movq %%mm0, %%mm6 \n\t" - "movq %%mm3, %%mm7 \n\t" - - "movq 8(%1), %%mm0 \n\t" - "movq 8(%1), %%mm1 \n\t" - "movq 8(%1), %%mm2 \n\t" - "pand %2, %%mm0 \n\t" - "pand %3, %%mm1 \n\t" - "pand %4, %%mm2 \n\t" - "psllq $5, %%mm0 \n\t" - "pmulhw "MANGLE(mul15_mid)", %%mm0 \n\t" - "pmulhw "MANGLE(mul15_mid)", %%mm1 \n\t" - "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" - "movq %%mm0, %%mm3 \n\t" - "movq %%mm1, %%mm4 \n\t" - "movq %%mm2, %%mm5 \n\t" 
- "punpcklwd %5, %%mm0 \n\t" - "punpcklwd %5, %%mm1 \n\t" - "punpcklwd %5, %%mm2 \n\t" - "punpckhwd %5, %%mm3 \n\t" - "punpckhwd %5, %%mm4 \n\t" - "punpckhwd %5, %%mm5 \n\t" - "psllq $8, %%mm1 \n\t" - "psllq $16, %%mm2 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm2, %%mm0 \n\t" - "psllq $8, %%mm4 \n\t" - "psllq $16, %%mm5 \n\t" - "por %%mm4, %%mm3 \n\t" - "por %%mm5, %%mm3 \n\t" - - :"=m"(*d) - :"r"(s),"m"(mask15b),"m"(mask15g),"m"(mask15r), "m"(mmx_null) - NAMED_CONSTRAINTS_ADD(mul15_mid,mul15_hi) - :"memory"); - /* borrowed 32 to 24 */ - __asm__ volatile( - "movq %%mm0, %%mm4 \n\t" - "movq %%mm3, %%mm5 \n\t" - "movq %%mm6, %%mm0 \n\t" - "movq %%mm7, %%mm1 \n\t" - - "movq %%mm4, %%mm6 \n\t" - "movq %%mm5, %%mm7 \n\t" - "movq %%mm0, %%mm2 \n\t" - "movq %%mm1, %%mm3 \n\t" - - STORE_BGR24_MMX - - :: "r"(d), "m"(*s) - NAMED_CONSTRAINTS_ADD(mask24l,mask24h) - :"memory"); - d += 24; - s += 8; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - register uint16_t bgr; - bgr = *s++; - *d++ = ((bgr&0x1F)<<3) | ((bgr&0x1F)>>2); - *d++ = ((bgr&0x3E0)>>2) | ((bgr&0x3E0)>>7); - *d++ = ((bgr&0x7C00)>>7) | ((bgr&0x7C00)>>12); - } -} - -static inline void RENAME(rgb16tobgr24)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint16_t *end; - const uint16_t *mm_end; - uint8_t *d = (uint8_t *)dst; - const uint16_t *s = (const uint16_t *)src; - end = s + src_size/2; - __asm__ volatile(PREFETCH" %0"::"m"(*s):"memory"); - mm_end = end - 7; - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movq (%1), %%mm0 \n\t" - "movq (%1), %%mm1 \n\t" - "movq (%1), %%mm2 \n\t" - "pand %2, %%mm0 \n\t" - "pand %3, %%mm1 \n\t" - "pand %4, %%mm2 \n\t" - "psllq $5, %%mm0 \n\t" - "psrlq $1, %%mm2 \n\t" - "pmulhw "MANGLE(mul15_mid)", %%mm0 \n\t" - "pmulhw "MANGLE(mul16_mid)", %%mm1 \n\t" - "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" - "movq %%mm0, %%mm3 \n\t" - "movq %%mm1, %%mm4 \n\t" - "movq %%mm2, %%mm5 \n\t" - "punpcklwd %5, %%mm0 
\n\t" - "punpcklwd %5, %%mm1 \n\t" - "punpcklwd %5, %%mm2 \n\t" - "punpckhwd %5, %%mm3 \n\t" - "punpckhwd %5, %%mm4 \n\t" - "punpckhwd %5, %%mm5 \n\t" - "psllq $8, %%mm1 \n\t" - "psllq $16, %%mm2 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm2, %%mm0 \n\t" - "psllq $8, %%mm4 \n\t" - "psllq $16, %%mm5 \n\t" - "por %%mm4, %%mm3 \n\t" - "por %%mm5, %%mm3 \n\t" - - "movq %%mm0, %%mm6 \n\t" - "movq %%mm3, %%mm7 \n\t" - - "movq 8(%1), %%mm0 \n\t" - "movq 8(%1), %%mm1 \n\t" - "movq 8(%1), %%mm2 \n\t" - "pand %2, %%mm0 \n\t" - "pand %3, %%mm1 \n\t" - "pand %4, %%mm2 \n\t" - "psllq $5, %%mm0 \n\t" - "psrlq $1, %%mm2 \n\t" - "pmulhw "MANGLE(mul15_mid)", %%mm0 \n\t" - "pmulhw "MANGLE(mul16_mid)", %%mm1 \n\t" - "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" - "movq %%mm0, %%mm3 \n\t" - "movq %%mm1, %%mm4 \n\t" - "movq %%mm2, %%mm5 \n\t" - "punpcklwd %5, %%mm0 \n\t" - "punpcklwd %5, %%mm1 \n\t" - "punpcklwd %5, %%mm2 \n\t" - "punpckhwd %5, %%mm3 \n\t" - "punpckhwd %5, %%mm4 \n\t" - "punpckhwd %5, %%mm5 \n\t" - "psllq $8, %%mm1 \n\t" - "psllq $16, %%mm2 \n\t" - "por %%mm1, %%mm0 \n\t" - "por %%mm2, %%mm0 \n\t" - "psllq $8, %%mm4 \n\t" - "psllq $16, %%mm5 \n\t" - "por %%mm4, %%mm3 \n\t" - "por %%mm5, %%mm3 \n\t" - :"=m"(*d) - :"r"(s),"m"(mask16b),"m"(mask16g),"m"(mask16r),"m"(mmx_null) - NAMED_CONSTRAINTS_ADD(mul15_mid,mul16_mid,mul15_hi) - :"memory"); - /* borrowed 32 to 24 */ - __asm__ volatile( - "movq %%mm0, %%mm4 \n\t" - "movq %%mm3, %%mm5 \n\t" - "movq %%mm6, %%mm0 \n\t" - "movq %%mm7, %%mm1 \n\t" - - "movq %%mm4, %%mm6 \n\t" - "movq %%mm5, %%mm7 \n\t" - "movq %%mm0, %%mm2 \n\t" - "movq %%mm1, %%mm3 \n\t" - - STORE_BGR24_MMX - - :: "r"(d), "m"(*s) - NAMED_CONSTRAINTS_ADD(mask24l,mask24h) - :"memory"); - d += 24; - s += 8; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - register uint16_t bgr; - bgr = *s++; - *d++ = ((bgr&0x1F)<<3) | ((bgr&0x1F)>>2); - *d++ = ((bgr&0x7E0)>>3) | ((bgr&0x7E0)>>9); - *d++ = ((bgr&0xF800)>>8) | 
((bgr&0xF800)>>13); - } -} - -/* - * mm0 = 00 B3 00 B2 00 B1 00 B0 - * mm1 = 00 G3 00 G2 00 G1 00 G0 - * mm2 = 00 R3 00 R2 00 R1 00 R0 - * mm6 = FF FF FF FF FF FF FF FF - * mm7 = 00 00 00 00 00 00 00 00 - */ -#define PACK_RGB32 \ - "packuswb %%mm7, %%mm0 \n\t" /* 00 00 00 00 B3 B2 B1 B0 */ \ - "packuswb %%mm7, %%mm1 \n\t" /* 00 00 00 00 G3 G2 G1 G0 */ \ - "packuswb %%mm7, %%mm2 \n\t" /* 00 00 00 00 R3 R2 R1 R0 */ \ - "punpcklbw %%mm1, %%mm0 \n\t" /* G3 B3 G2 B2 G1 B1 G0 B0 */ \ - "punpcklbw %%mm6, %%mm2 \n\t" /* FF R3 FF R2 FF R1 FF R0 */ \ - "movq %%mm0, %%mm3 \n\t" \ - "punpcklwd %%mm2, %%mm0 \n\t" /* FF R1 G1 B1 FF R0 G0 B0 */ \ - "punpckhwd %%mm2, %%mm3 \n\t" /* FF R3 G3 B3 FF R2 G2 B2 */ \ - MOVNTQ" %%mm0, (%0) \n\t" \ - MOVNTQ" %%mm3, 8(%0) \n\t" \ - -static inline void RENAME(rgb15to32)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint16_t *end; - const uint16_t *mm_end; - uint8_t *d = dst; - const uint16_t *s = (const uint16_t *)src; - end = s + src_size/2; - __asm__ volatile(PREFETCH" %0"::"m"(*s):"memory"); - __asm__ volatile("pxor %%mm7,%%mm7 \n\t":::"memory"); - __asm__ volatile("pcmpeqd %%mm6,%%mm6 \n\t":::"memory"); - mm_end = end - 3; - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movq (%1), %%mm0 \n\t" - "movq (%1), %%mm1 \n\t" - "movq (%1), %%mm2 \n\t" - "pand %2, %%mm0 \n\t" - "pand %3, %%mm1 \n\t" - "pand %4, %%mm2 \n\t" - "psllq $5, %%mm0 \n\t" - "pmulhw %5, %%mm0 \n\t" - "pmulhw %5, %%mm1 \n\t" - "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" - PACK_RGB32 - ::"r"(d),"r"(s),"m"(mask15b),"m"(mask15g),"m"(mask15r) ,"m"(mul15_mid) - NAMED_CONSTRAINTS_ADD(mul15_hi) - :"memory"); - d += 16; - s += 4; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - register uint16_t bgr; - bgr = *s++; - *d++ = ((bgr&0x1F)<<3) | ((bgr&0x1F)>>2); - *d++ = ((bgr&0x3E0)>>2) | ((bgr&0x3E0)>>7); - *d++ = ((bgr&0x7C00)>>7) | ((bgr&0x7C00)>>12); - *d++ = 255; - } -} - -static inline void 
RENAME(rgb16to32)(const uint8_t *src, uint8_t *dst, int src_size) -{ - const uint16_t *end; - const uint16_t *mm_end; - uint8_t *d = dst; - const uint16_t *s = (const uint16_t*)src; - end = s + src_size/2; - __asm__ volatile(PREFETCH" %0"::"m"(*s):"memory"); - __asm__ volatile("pxor %%mm7,%%mm7 \n\t":::"memory"); - __asm__ volatile("pcmpeqd %%mm6,%%mm6 \n\t":::"memory"); - mm_end = end - 3; - while (s < mm_end) { - __asm__ volatile( - PREFETCH" 32(%1) \n\t" - "movq (%1), %%mm0 \n\t" - "movq (%1), %%mm1 \n\t" - "movq (%1), %%mm2 \n\t" - "pand %2, %%mm0 \n\t" - "pand %3, %%mm1 \n\t" - "pand %4, %%mm2 \n\t" - "psllq $5, %%mm0 \n\t" - "psrlq $1, %%mm2 \n\t" - "pmulhw %5, %%mm0 \n\t" - "pmulhw "MANGLE(mul16_mid)", %%mm1 \n\t" - "pmulhw "MANGLE(mul15_hi)", %%mm2 \n\t" - PACK_RGB32 - ::"r"(d),"r"(s),"m"(mask16b),"m"(mask16g),"m"(mask16r),"m"(mul15_mid) - NAMED_CONSTRAINTS_ADD(mul16_mid,mul15_hi) - :"memory"); - d += 16; - s += 4; - } - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - while (s < end) { - register uint16_t bgr; - bgr = *s++; - *d++ = ((bgr&0x1F)<<3) | ((bgr&0x1F)>>2); - *d++ = ((bgr&0x7E0)>>3) | ((bgr&0x7E0)>>9); - *d++ = ((bgr&0xF800)>>8) | ((bgr&0xF800)>>13); - *d++ = 255; - } -} - -static inline void RENAME(rgb24tobgr24)(const uint8_t *src, uint8_t *dst, int src_size) -{ - unsigned i; - x86_reg mmx_size= 23 - src_size; - __asm__ volatile ( - "test %%"FF_REG_a", %%"FF_REG_a" \n\t" - "jns 2f \n\t" - "movq "MANGLE(mask24r)", %%mm5 \n\t" - "movq "MANGLE(mask24g)", %%mm6 \n\t" - "movq "MANGLE(mask24b)", %%mm7 \n\t" - ".p2align 4 \n\t" - "1: \n\t" - PREFETCH" 32(%1, %%"FF_REG_a") \n\t" - "movq (%1, %%"FF_REG_a"), %%mm0 \n\t" // BGR BGR BG - "movq (%1, %%"FF_REG_a"), %%mm1 \n\t" // BGR BGR BG - "movq 2(%1, %%"FF_REG_a"), %%mm2 \n\t" // R BGR BGR B - "psllq $16, %%mm0 \n\t" // 00 BGR BGR - "pand %%mm5, %%mm0 \n\t" - "pand %%mm6, %%mm1 \n\t" - "pand %%mm7, %%mm2 \n\t" - "por %%mm0, %%mm1 \n\t" - "por %%mm2, %%mm1 \n\t" - "movq 6(%1, 
%%"FF_REG_a"), %%mm0 \n\t" // BGR BGR BG - MOVNTQ" %%mm1,(%2, %%"FF_REG_a") \n\t" // RGB RGB RG - "movq 8(%1, %%"FF_REG_a"), %%mm1 \n\t" // R BGR BGR B - "movq 10(%1, %%"FF_REG_a"), %%mm2 \n\t" // GR BGR BGR - "pand %%mm7, %%mm0 \n\t" - "pand %%mm5, %%mm1 \n\t" - "pand %%mm6, %%mm2 \n\t" - "por %%mm0, %%mm1 \n\t" - "por %%mm2, %%mm1 \n\t" - "movq 14(%1, %%"FF_REG_a"), %%mm0 \n\t" // R BGR BGR B - MOVNTQ" %%mm1, 8(%2, %%"FF_REG_a")\n\t" // B RGB RGB R - "movq 16(%1, %%"FF_REG_a"), %%mm1 \n\t" // GR BGR BGR - "movq 18(%1, %%"FF_REG_a"), %%mm2 \n\t" // BGR BGR BG - "pand %%mm6, %%mm0 \n\t" - "pand %%mm7, %%mm1 \n\t" - "pand %%mm5, %%mm2 \n\t" - "por %%mm0, %%mm1 \n\t" - "por %%mm2, %%mm1 \n\t" - MOVNTQ" %%mm1, 16(%2, %%"FF_REG_a") \n\t" - "add $24, %%"FF_REG_a" \n\t" - " js 1b \n\t" - "2: \n\t" - : "+a" (mmx_size) - : "r" (src-mmx_size), "r"(dst-mmx_size) - NAMED_CONSTRAINTS_ADD(mask24r,mask24g,mask24b) - ); - - __asm__ volatile(SFENCE:::"memory"); - __asm__ volatile(EMMS:::"memory"); - - if (mmx_size==23) return; //finished, was multiple of 8 - - src+= src_size; - dst+= src_size; - src_size= 23-mmx_size; - src-= src_size; - dst-= src_size; - for (i=0; i>1; - for (y=0; y>1; - for (y=0; y>1; - for (y=0; y>2; - dst[2*x+2]= ( src[x] + 3*src[x+1])>>2; - } - dst[2*srcWidth-1]= src[srcWidth-1]; - - dst+= dstStride; - - for (y=1; y> 2; - dst[dstStride] = (src[0] + 3 * src[srcStride]) >> 2; - } - - for (x=mmxSize-1; x>2; - dst[2*x+dstStride+2]= ( src[x+0] + 3*src[x+srcStride+1])>>2; - dst[2*x+dstStride+1]= ( src[x+1] + 3*src[x+srcStride ])>>2; - dst[2*x +2]= (3*src[x+1] + src[x+srcStride ])>>2; - } - dst[srcWidth*2 -1 ]= (3*src[srcWidth-1] + src[srcWidth-1 + srcStride])>>2; - dst[srcWidth*2 -1 + dstStride]= ( src[srcWidth-1] + 3*src[srcWidth-1 + srcStride])>>2; - - dst+=dstStride*2; - src+=srcStride; - } - - // last line - dst[0]= src[0]; - - for (x=0; x>2; - dst[2*x+2]= ( src[x] + 3*src[x+1])>>2; - } - dst[2*srcWidth-1]= src[srcWidth-1]; - - __asm__ volatile(EMMS" \n\t" - 
SFENCE" \n\t" - :::"memory"); -} - -/** - * Height should be a multiple of 2 and width should be a multiple of 2. - * (If this is a problem for anyone then tell me, and I will fix it.) - * Chrominance data is only taken from every second line, - * others are ignored in the C version. - * FIXME: Write HQ version. - */ -#if HAVE_7REGS -static inline void RENAME(rgb24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst, - int width, int height, - int lumStride, int chromStride, int srcStride, - int32_t *rgb2yuv) -{ -#define BGR2Y_IDX "16*4+16*32" -#define BGR2U_IDX "16*4+16*33" -#define BGR2V_IDX "16*4+16*34" - int y; - const x86_reg chromWidth= width>>1; - - if (height > 2) { - ff_rgb24toyv12_c(src, ydst, udst, vdst, width, 2, lumStride, chromStride, srcStride, rgb2yuv); - src += 2*srcStride; - ydst += 2*lumStride; - udst += chromStride; - vdst += chromStride; - height -= 2; - } - - for (y=0; y= 16) { - if (!((((intptr_t)src1) | ((intptr_t)src2) | ((intptr_t)dest))&15)) { - __asm__( - "xor %%"FF_REG_a", %%"FF_REG_a" \n\t" - "1: \n\t" - PREFETCH" 64(%1, %%"FF_REG_a") \n\t" - PREFETCH" 64(%2, %%"FF_REG_a") \n\t" - "movdqa (%1, %%"FF_REG_a"), %%xmm0 \n\t" - "movdqa (%1, %%"FF_REG_a"), %%xmm1 \n\t" - "movdqa (%2, %%"FF_REG_a"), %%xmm2 \n\t" - "punpcklbw %%xmm2, %%xmm0 \n\t" - "punpckhbw %%xmm2, %%xmm1 \n\t" - "movntdq %%xmm0, (%0, %%"FF_REG_a", 2) \n\t" - "movntdq %%xmm1, 16(%0, %%"FF_REG_a", 2) \n\t" - "add $16, %%"FF_REG_a" \n\t" - "cmp %3, %%"FF_REG_a" \n\t" - " jb 1b \n\t" - ::"r"(dest), "r"(src1), "r"(src2), "r" ((x86_reg)width-15) - : "memory", XMM_CLOBBERS("xmm0", "xmm1", "xmm2",) "%"FF_REG_a - ); - } else - __asm__( - "xor %%"FF_REG_a", %%"FF_REG_a" \n\t" - "1: \n\t" - PREFETCH" 64(%1, %%"FF_REG_a") \n\t" - PREFETCH" 64(%2, %%"FF_REG_a") \n\t" - "movq (%1, %%"FF_REG_a"), %%mm0 \n\t" - "movq 8(%1, %%"FF_REG_a"), %%mm2 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm2, %%mm3 \n\t" - "movq (%2, %%"FF_REG_a"), %%mm4 \n\t" - "movq 8(%2, %%"FF_REG_a"), 
%%mm5 \n\t" - "punpcklbw %%mm4, %%mm0 \n\t" - "punpckhbw %%mm4, %%mm1 \n\t" - "punpcklbw %%mm5, %%mm2 \n\t" - "punpckhbw %%mm5, %%mm3 \n\t" - MOVNTQ" %%mm0, (%0, %%"FF_REG_a", 2) \n\t" - MOVNTQ" %%mm1, 8(%0, %%"FF_REG_a", 2) \n\t" - MOVNTQ" %%mm2, 16(%0, %%"FF_REG_a", 2) \n\t" - MOVNTQ" %%mm3, 24(%0, %%"FF_REG_a", 2) \n\t" - "add $16, %%"FF_REG_a" \n\t" - "cmp %3, %%"FF_REG_a" \n\t" - " jb 1b \n\t" - ::"r"(dest), "r"(src1), "r"(src2), "r" ((x86_reg)width-15) - : "memory", "%"FF_REG_a - ); - - } - for (w= (width&(~15)); w < width; w++) { - dest[2*w+0] = src1[w]; - dest[2*w+1] = src2[w]; - } - dest += dstStride; - src1 += src1Stride; - src2 += src2Stride; - } - __asm__( - EMMS" \n\t" - SFENCE" \n\t" - ::: "memory" - ); -} -#endif /* !COMPILE_TEMPLATE_AVX && COMPILE_TEMPLATE_SSE2 */ - -#if !COMPILE_TEMPLATE_SSE2 -static inline void RENAME(vu9_to_vu12)(const uint8_t *src1, const uint8_t *src2, - uint8_t *dst1, uint8_t *dst2, - int width, int height, - int srcStride1, int srcStride2, - int dstStride1, int dstStride2) -{ - x86_reg x, y; - int w,h; - w=width/2; h=height/2; - __asm__ volatile( - PREFETCH" %0 \n\t" - PREFETCH" %1 \n\t" - ::"m"(*(src1+srcStride1)),"m"(*(src2+srcStride2)):"memory"); - for (y=0;y>1); - uint8_t* d=dst1+dstStride1*y; - x=0; - for (;x>1); - uint8_t* d=dst2+dstStride2*y; - x=0; - for (;x>2); - const uint8_t* vp=src3+srcStride3*(y>>2); - uint8_t* d=dst+dstStride*y; - x=0; - for (;x>1; - dst1[count]= (src0[4*count+2]+src1[4*count+2])>>1; - count++; - } -} - -static void RENAME(extract_odd2)(const uint8_t *src, uint8_t *dst0, uint8_t *dst1, x86_reg count) -{ - dst0+= count; - dst1+= count; - src += 4*count; - count= - count; - if(count <= -8) { - count += 7; - __asm__ volatile( - "pcmpeqw %%mm7, %%mm7 \n\t" - "psrlw $8, %%mm7 \n\t" - "1: \n\t" - "movq -28(%1, %0, 4), %%mm0 \n\t" - "movq -20(%1, %0, 4), %%mm1 \n\t" - "movq -12(%1, %0, 4), %%mm2 \n\t" - "movq -4(%1, %0, 4), %%mm3 \n\t" - "psrlw $8, %%mm0 \n\t" - "psrlw $8, %%mm1 \n\t" - "psrlw $8, 
%%mm2 \n\t" - "psrlw $8, %%mm3 \n\t" - "packuswb %%mm1, %%mm0 \n\t" - "packuswb %%mm3, %%mm2 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm2, %%mm3 \n\t" - "psrlw $8, %%mm0 \n\t" - "psrlw $8, %%mm2 \n\t" - "pand %%mm7, %%mm1 \n\t" - "pand %%mm7, %%mm3 \n\t" - "packuswb %%mm2, %%mm0 \n\t" - "packuswb %%mm3, %%mm1 \n\t" - MOVNTQ" %%mm0,- 7(%3, %0) \n\t" - MOVNTQ" %%mm1,- 7(%2, %0) \n\t" - "add $8, %0 \n\t" - " js 1b \n\t" - : "+r"(count) - : "r"(src), "r"(dst0), "r"(dst1) - ); - count -= 7; - } - src++; - while(count<0) { - dst0[count]= src[4*count+0]; - dst1[count]= src[4*count+2]; - count++; - } -} - -static void RENAME(extract_odd2avg)(const uint8_t *src0, const uint8_t *src1, uint8_t *dst0, uint8_t *dst1, x86_reg count) -{ - dst0 += count; - dst1 += count; - src0 += 4*count; - src1 += 4*count; - count= - count; -#ifdef PAVGB - if(count <= -8) { - count += 7; - __asm__ volatile( - "pcmpeqw %%mm7, %%mm7 \n\t" - "psrlw $8, %%mm7 \n\t" - "1: \n\t" - "movq -28(%1, %0, 4), %%mm0 \n\t" - "movq -20(%1, %0, 4), %%mm1 \n\t" - "movq -12(%1, %0, 4), %%mm2 \n\t" - "movq -4(%1, %0, 4), %%mm3 \n\t" - PAVGB" -28(%2, %0, 4), %%mm0 \n\t" - PAVGB" -20(%2, %0, 4), %%mm1 \n\t" - PAVGB" -12(%2, %0, 4), %%mm2 \n\t" - PAVGB" - 4(%2, %0, 4), %%mm3 \n\t" - "psrlw $8, %%mm0 \n\t" - "psrlw $8, %%mm1 \n\t" - "psrlw $8, %%mm2 \n\t" - "psrlw $8, %%mm3 \n\t" - "packuswb %%mm1, %%mm0 \n\t" - "packuswb %%mm3, %%mm2 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm2, %%mm3 \n\t" - "psrlw $8, %%mm0 \n\t" - "psrlw $8, %%mm2 \n\t" - "pand %%mm7, %%mm1 \n\t" - "pand %%mm7, %%mm3 \n\t" - "packuswb %%mm2, %%mm0 \n\t" - "packuswb %%mm3, %%mm1 \n\t" - MOVNTQ" %%mm0,- 7(%4, %0) \n\t" - MOVNTQ" %%mm1,- 7(%3, %0) \n\t" - "add $8, %0 \n\t" - " js 1b \n\t" - : "+r"(count) - : "r"(src0), "r"(src1), "r"(dst0), "r"(dst1) - ); - count -= 7; - } -#endif - src0++; - src1++; - while(count<0) { - dst0[count]= (src0[4*count+0]+src1[4*count+0])>>1; - dst1[count]= (src0[4*count+2]+src1[4*count+2])>>1; - count++; - } -} - 
-static void RENAME(yuyvtoyuv420)(uint8_t *ydst, uint8_t *udst, uint8_t *vdst, const uint8_t *src, - int width, int height, - int lumStride, int chromStride, int srcStride) -{ - int y; - const int chromWidth = AV_CEIL_RSHIFT(width, 1); - - for (y=0; y