From patchwork Fri Apr 26 12:21:42 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: hu heng
X-Patchwork-Id: 48268
Delivered-To: ffmpegpatchwork2@gmail.com
Received: by 2002:a05:6a21:1509:b0:1a9:af23:56c1 with SMTP id nq9csp330180pzb;
 Fri, 26 Apr 2024 05:22:02 -0700 (PDT)
X-Forwarded-Encrypted: i=2;
 AJvYcCWrzihnbP9Bgg62gfUqQ+2HAeMNUMtjJFi/QGhbdcSe7q5ZR3NaVO8RcDc3L0JKjuBCic3tNsNl8qXDp95OP9y8H3RgXQT6wCDEkA==
X-Google-Smtp-Source: 
 AGHT+IFqc8lfvZqrS2mrpM4d90Phy2M26iKLVqi6J+54RrFkagJbmRoRxt0Le/TpUHZ3EIwnHIms
X-Received: by 2002:ac2:44c5:0:b0:51c:b73f:950 with SMTP id
 d5-20020ac244c5000000b0051cb73f0950mr1192820lfm.43.1714134122257;
 Fri, 26 Apr 2024 05:22:02 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1714134122; cv=none;
 d=google.com; s=arc-20160816;
 b=BolkiI6Q6jWKGS26LBf6ZeI1Et1tqmfhmzIt2m+lWjehKN+eqUYiJ7vdopoo4yA7QP
 fpDvAjehgTmK+qKHQDVcVbNkEN945crO/HqkSigoiFw6esO8LKLWhdgGbdcm0qMW5Lwi
 vF+tjk01lt++dYgU1HAJGOtur3vqzCErpm/i4w3vlDl2XhCwEvARH8taCBX/G+TES0cR
 0OIsLAhif8vUkwOKfNnmzu0h14iafCutAU+1qADKRGeCrjHyKElDKb14MZGpaqHAY3WB
 ModCWBKhDnEZKfXLtD9sOQxgw3MasoZ4v0F3Kh9TpCxBJck9ZEihfE4P8Okc9brsSz0M
 JfhQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
 h=sender:errors-to:content-transfer-encoding:cc:reply-to
 :list-subscribe:list-help:list-post:list-archive:list-unsubscribe
 :list-id:precedence:subject:mime-version:message-id:date:to:from
 :dkim-signature:delivered-to;
 bh=0dmOu4UOCR3mmpB55EgLF16kMMugAGCRoVf+wMQWm3k=;
 fh=9Y2B2cxSPitxn6emCoBc2YcX/wxVcW8k+Amhx477zP8=;
 b=tnEALoea6FZgcyS3rcBzJyHlvYN2J3FYR2mYnnA6oTAPL8AvG3Dww5FlnZg6LNKoEs
 gLKWcdPzgUrTQpCiwLFyY9TcZ/SBdCaxCI8N0ZfzQgPmohzPm+9te+VxZZ4k04niuz2F
 Svi/hMJNv3UcvZknNzS+dPLK5NYO6dAhTDg1hBfGCcaEzIcTXkDs+faE4zU0A4tNEkUW
 jSCawn+ZWRppbM0waM+VeTRT1Kf+53UzSPp8OzuFcvVXKfgBeZAOtiO4VS/9a1MUnVL1
 Y0zHZGah4PqJ5fApCRZjrOmBYCEr6csVSrqzPIRGaDlsoGq9p2bZnBJfk0RffD6Tlt1Y
 75Ww==;
 dara=google.com
ARC-Authentication-Results: i=1; mx.google.com;
 dkim=neutral (body hash did not verify) header.i=@gmail.com
 header.s=20230601 header.b="m/sUNtdp";
 spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates
 79.124.17.100 as permitted sender)
 smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
 dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
 by mx.google.com with ESMTP id
 r15-20020a50aacf000000b00571b913721asi11551457edc.316.2024.04.26.05.22.01;
 Fri, 26 Apr 2024 05:22:02 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
 designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
 dkim=neutral (body hash did not verify) header.i=@gmail.com
 header.s=20230601 header.b="m/sUNtdp";
 spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates
 79.124.17.100 as permitted sender)
 smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
 dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Received: from [127.0.1.1] (localhost [127.0.0.1])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 8033368D41A;
 Fri, 26 Apr 2024 15:21:58 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-pf1-f175.google.com (mail-pf1-f175.google.com
 [209.85.210.175])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 5688768CFCE
 for ; Fri, 26 Apr 2024 15:21:51 +0300 (EEST)
Received: by mail-pf1-f175.google.com with SMTP id
 d2e1a72fcca58-6ee12766586so1614765b3a.0
 for ; Fri, 26 Apr 2024 05:21:51 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20230601; t=1714134109; x=1714738909; darn=ffmpeg.org;
 h=content-transfer-encoding:mime-version:message-id:date:subject:cc
 :to:from:from:to:cc:subject:date:message-id:reply-to;
 bh=GLNMZ+pxr85pwq3l7ko7gn5f4pXMC91gac64ugsznbQ=;
 b=m/sUNtdplYeVuI2gt8rYTOTXvgLbk12QP0vtsVe96RV6I5KrVNsJuG7PCgF877we32
 Q2eP8tdOfd3zQ6r2GDRLLnXJSWxLbG6q4HvwKDicO6iYb0Au8eX1txlRvlhDhPOFPQTM
 EppRyI0MdukwvaXKqJNWp5b0Jcbd3vOyqrDXWWdGcyMNRn+0nOokqUEgY2EaaP2VGUM9
 lHQhcpUNZw0rSlrWDc1mR9AjjhezrsH2mfFMdX8ssw9iB3Q3pfN9agY6k89TzclT8Wm+
 kfY+GcIgIMxzKSzy5HMlYKNeEkd3LhEn4ecZToYxVteRH8I3oglZ23jBoSky2IiwQqRP
 UrtA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1714134109; x=1714738909;
 h=content-transfer-encoding:mime-version:message-id:date:subject:cc
 :to:from:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=GLNMZ+pxr85pwq3l7ko7gn5f4pXMC91gac64ugsznbQ=;
 b=vGDDQrAnmEEy/odKYzFVdpIsfQHPe8E55KN14T+9hNdvKwF0UTEx1A7RtMa+DJfw8L
 3MgwazzCeK49GVoRgpyqHcrMm7aP0h23PrPNiiEpgoqntbGJIlkFbk/XAOCc46kNvSN7
 8y12oWiEV6YQBq6WQuHAjXiAo/VDDNlB5pnDqvB0U/2DGD769B36ZrizPWolTcpHWi6n
 hpBgDGdpSU1F1SVmzQJx0eLjJjJiuLJHBzfgQU/OoXZ+ltSI+frNcn4WQgeGf2WSsbc1
 nKSSNG7x4hPw8LFDebAMHTtkFNkLSoRJKKWJrfv56KduD0l8HYSyKQZu923cRtX8LtCE
 boHg==
X-Gm-Message-State: AOJu0Yyytgv0rdlPi1WSZEgLPK7YFJ3hf9xH5Piw7i7vj2eZhIqC/KNu
 3wlUktaOoTL2hRMkV2geF+W4JtHPR1Ypxxw95meoYOHCa2EeewXmIKAB/tA7
X-Received: by 2002:a05:6a00:3a14:b0:6ec:fa34:34ab with SMTP id
 fj20-20020a056a003a1400b006ecfa3434abmr3510080pfb.9.1714134109321;
 Fri, 26 Apr 2024 05:21:49 -0700 (PDT)
Received: from C02G649AMD6R.bytedance.net ([203.208.167.149])
 by smtp.gmail.com with ESMTPSA id
 bw20-20020a056a02049400b005f8004f613asm11090612pgb.39.2024.04.26.05.21.47
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Fri, 26 Apr 2024 05:21:49 -0700 (PDT)
From: heng.hu.1989@gmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 26 Apr 2024 20:21:42 +0800
Message-Id: <20240426122142.60282-1-heng.hu.1989@gmail.com>
X-Mailer: git-send-email 2.39.2 (Apple Git-143)
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v1] scale: Bring back the old yuv2yuvX,
 use it when disable-x86asm.
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: huheng
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: eXA0+hm31aH4

From: huheng

Rename the old inline yuv2yuvX to yuv2yuv_X to avoid a name conflict
with the standalone asm functions.

When FFmpeg is configured with --disable-x86asm, scaling produces
blurred video. In that configuration INLINE_MMXEXT is 1 and
use_mmx_vfilter is 1, but c->yuv2planeX still points to the C
implementation, which does not match the MMX vertical filter setup.
The problem has been present since version 4.4.

Fix it by using the inline yuv2yuv_X_mmxext when use_mmx_vfilter is
set and SWS_ACCURATE_RND is not, so that yuv2planeX stays consistent
with use_mmx_vfilter.

To reproduce the issue:
1. ./configure --disable-x86asm --enable-gpl --enable-libx264
2. ./ffmpeg -i input.mp4 -vf "scale=1280x720" -c:v libx264 output.mp4

The resulting output.mp4 is abnormal (blurred).

Signed-off-by: huheng
---
 libswscale/x86/swscale.c          |  6 +++-
 libswscale/x86/swscale_template.c | 53 +++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index ff16398988..1bb9d1d51a 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -452,8 +452,12 @@ av_cold void ff_sws_init_swscale_x86(SwsContext *c)
     int cpu_flags = av_get_cpu_flags();
 
 #if HAVE_MMXEXT_INLINE
-    if (INLINE_MMXEXT(cpu_flags))
+    if (INLINE_MMXEXT(cpu_flags)) {
         sws_init_swscale_mmxext(c);
+        if (c->use_mmx_vfilter && !(c->flags & SWS_ACCURATE_RND)) {
+            c->yuv2planeX = yuv2yuv_X_mmxext;
+        }
+    }
 #endif
     if(c->use_mmx_vfilter && !(c->flags & SWS_ACCURATE_RND)) {
 #if HAVE_MMXEXT_EXTERNAL
diff --git a/libswscale/x86/swscale_template.c b/libswscale/x86/swscale_template.c
index 6190fcb4fe..1b8794480d 100644
--- a/libswscale/x86/swscale_template.c
+++ b/libswscale/x86/swscale_template.c
@@ -33,6 +33,59 @@
 #define MOVNTQ2 "movntq "
 #define MOVNTQ(a,b) REAL_MOVNTQ(a,b)
 
+static void RENAME(yuv2yuv_X)(const int16_t *filter, int filterSize,
+                              const int16_t **src, uint8_t *dest, int dstW,
+                              const uint8_t *dither, int offset)
+{
+    filterSize--;
+    __asm__ volatile(
+        "movd %0, %%mm1\n\t"
+        "punpcklwd %%mm1, %%mm1\n\t"
+        "punpckldq %%mm1, %%mm1\n\t"
+        "psllw $3, %%mm1\n\t"
+        "paddw %%mm1, %%mm3\n\t"
+        "paddw %%mm1, %%mm4\n\t"
+        "psraw $4, %%mm3\n\t"
+        "psraw $4, %%mm4\n\t"
+        ::"m"(filterSize)
+    );
+
+    __asm__ volatile(\
+        "movq %%mm3, %%mm6\n\t"
+        "movq %%mm4, %%mm7\n\t"
+        "movl %3, %%ecx\n\t"
+        "mov %0, %%"FF_REG_d" \n\t"\
+        "mov (%%"FF_REG_d"), %%"FF_REG_S" \n\t"\
+        ".p2align 4 \n\t" /* FIXME Unroll? */\
+        "1: \n\t"\
+        "movq 8(%%"FF_REG_d"), %%mm0 \n\t" /* filterCoeff */\
+        "movq (%%"FF_REG_S", %%"FF_REG_c", 2), %%mm2 \n\t" /* srcData */\
+        "movq 8(%%"FF_REG_S", %%"FF_REG_c", 2), %%mm5 \n\t" /* srcData */\
+        "add $16, %%"FF_REG_d" \n\t"\
+        "mov (%%"FF_REG_d"), %%"FF_REG_S" \n\t"\
+        "test %%"FF_REG_S", %%"FF_REG_S" \n\t"\
+        "pmulhw %%mm0, %%mm2 \n\t"\
+        "pmulhw %%mm0, %%mm5 \n\t"\
+        "paddw %%mm2, %%mm3 \n\t"\
+        "paddw %%mm5, %%mm4 \n\t"\
+        " jnz 1b \n\t"\
+        "psraw $3, %%mm3 \n\t"\
+        "psraw $3, %%mm4 \n\t"\
+        "packuswb %%mm4, %%mm3 \n\t"
+        MOVNTQ2 " %%mm3, (%1, %%"FF_REG_c")\n\t"
+        "add $8, %%"FF_REG_c" \n\t"\
+        "cmp %2, %%"FF_REG_c" \n\t"\
+        "movq %%mm6, %%mm3\n\t"
+        "movq %%mm7, %%mm4\n\t"
+        "mov %0, %%"FF_REG_d" \n\t"\
+        "mov (%%"FF_REG_d"), %%"FF_REG_S" \n\t"\
+        "jb 1b \n\t"\
+        :: "g" (filter),
+           "r" (dest-offset), "g" ((x86_reg)(dstW+offset)), "m" (offset)
+        : "%"FF_REG_d, "%"FF_REG_S, "%"FF_REG_c
+    );
+}
+
 #define YSCALEYUV2PACKEDX_UV \
     __asm__ volatile(\
         "xor %%"FF_REG_a", %%"FF_REG_a" \n\t"\