From patchwork Mon Mar 14 21:06:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marton Balint
X-Patchwork-Id: 34735
From: Marton Balint
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 14 Mar 2022 22:06:00 +0100
Message-Id: <20220314210603.23870-1-cus@passwd.hu>
X-Mailer: git-send-email 2.31.1
Subject: [FFmpeg-devel] [PATCH 1/4] avfilter/x86/vf_blend: use unaligned movs for output
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Marton Balint

Fixes crashes with:
ffmpeg -f lavfi -i allyuv=d=1 -vf tblend=difference128,pad=5000:ih:1 -f null x

Signed-off-by: Marton Balint
---
 libavfilter/x86/vf_blend.asm | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/libavfilter/x86/vf_blend.asm b/libavfilter/x86/vf_blend.asm
index 766e5b7bc1..277b100e4d 100644
--- a/libavfilter/x86/vf_blend.asm
+++ b/libavfilter/x86/vf_blend.asm
@@ -75,7 +75,7 @@ BLEND_INIT %1, 2, %3
     movu m0, [topq + xq]
     movu m1, [bottomq + xq]
     p%2 m0, m1
-    mova [dstq + xq], m0
+    movu [dstq + xq], m0
     add xq, mmsize
     jl .loop
 BLEND_END
@@ -108,7 +108,7 @@ BLEND_INIT %1, 6, %4
 
     packus%3%2 m0, m1
 
-    mova [dstq + xq], m0
+    movu [dstq + xq], m0
     add xq, mmsize
     jl .loop
 BLEND_END
@@ -148,7 +148,7 @@ BLEND_INIT multiply, 6
     MULTIPLY m1, m3, m5
 
     packuswb m0, m1
-    mova [dstq + xq], m0
+    movu [dstq + xq], m0
     add xq, mmsize
     jl .loop
 BLEND_END
@@ -175,7 +175,7 @@ BLEND_INIT screen, 7
     SCREEN m1, m3, m5, m6
 
     packuswb m0, m1
-    mova [dstq + xq], m0
+    movu [dstq + xq], m0
     add xq, mmsize
     jl .loop
 BLEND_END
@@ -196,7 +196,7 @@ BLEND_INIT %1, 3, %3
     pxor m1, m2
     pavg%2 m0, m1
     pxor m0, m2
-    mova [dstq + xq], m0
+    movu [dstq + xq], m0
     add xq, mmsize
     jl .loop
 BLEND_END
@@ -230,7 +230,7 @@ BLEND_INIT %1, 6, %4
 
     packus%3%2 m0, m1
 
-    mova [dstq + xq], m0
+    movu [dstq + xq], m0
     add xq, mmsize
     jl .loop
 BLEND_END
@@ -251,7 +251,7 @@ BLEND_INIT hardmix, 5
     pxor m0, m3
     pcmpgtb m1, m0
     pxor m1, m2
-    mova [dstq + xq], m1
+    movu [dstq + xq], m1
     add xq, mmsize
     jl .loop
 BLEND_END
@@ -304,7 +304,7 @@ BLEND_INIT %1, 4, %3
     mova m2, m3
     psubus%2 m2, m1
     paddus%2 m2, m0
-    mova [dstq + xq], m2
+    movu [dstq + xq], m2
     add xq, mmsize
     jl .loop
 BLEND_END
@@ -333,7 +333,7 @@ BLEND_INIT %1, 5, %4
     ABS2 m0, m3, m1, m4
 %endif
     packus%3%2 m0, m3
-    mova [dstq + xq], m0
+    movu [dstq + xq], m0
     add xq, mmsize
     jl .loop
 BLEND_END
@@ -369,7 +369,7 @@ BLEND_INIT %1, 8, %4
     ABS2 m3, m7, m1, m6
 %endif
     packus%3%2 m3, m7
-    mova [dstq + xq], m3
+    movu [dstq + xq], m3
     add xq, mmsize
     jl .loop
 BLEND_END
@@ -406,7 +406,7 @@ BLEND_INIT %1, 8, %4
     psub%3 m0, m4, m3
     psub%3 m1, m4, m7
     packus%3%2 m0, m1
-    mova [dstq + xq], m0
+    movu [dstq + xq], m0
     add xq, mmsize
     jl .loop
 BLEND_END
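
Note for readers less familiar with the aligned/unaligned distinction: an aligned store (mova, i.e. movdqa and friends) faults when its memory operand is not on a vector-size boundary, while movu accepts any address. The crash in the command above presumably comes from pad=5000:ih:1 handing the blend filter an output pointer shifted by one byte, although the patch itself does not spell that out. The following is a minimal C sketch of the same idea, not the FFmpeg code: blend_addition and is_vec_aligned are hypothetical names, the saturating add stands in for one simple blend mode, and a scalar fallback is assumed for non-SSE2 builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define VEC 16 /* bytes per vector; plays the role of mmsize for SSE2 */

/* Hypothetical helper: returns 1 when p lies on a 16-byte boundary,
 * which is what an aligned store (mova / movdqa) requires. */
static int is_vec_aligned(const void *p)
{
    return ((uintptr_t)p % VEC) == 0;
}

/* Hypothetical stand-in for one simple blend mode: per-byte saturating
 * addition of top and bottom into dst. width must be a multiple of VEC.
 * Loads and the store are all unaligned, so dst may start at any byte. */
static void blend_addition(const uint8_t *top, const uint8_t *bottom,
                           uint8_t *dst, size_t width)
{
    assert(width % VEC == 0);
    for (size_t x = 0; x < width; x += VEC) {
#if defined(__SSE2__)
        __m128i a = _mm_loadu_si128((const __m128i *)(top + x));
        __m128i b = _mm_loadu_si128((const __m128i *)(bottom + x));
        /* Unaligned store: the counterpart of movu [dstq + xq], m0. */
        _mm_storeu_si128((__m128i *)(dst + x), _mm_adds_epu8(a, b));
#else
        /* Scalar fallback (an assumption, for builds without SSE2). */
        for (size_t i = 0; i < VEC; i++) {
            unsigned s = (unsigned)top[x + i] + bottom[x + i];
            dst[x + i] = s > 255 ? 255 : (uint8_t)s;
        }
#endif
    }
}
```

Calling blend_addition with dst pointing one byte past a 16-byte boundary works as written. Swapping _mm_storeu_si128 for the aligned _mm_store_si128 would typically fault on that same pointer, which mirrors the mova behaviour the patch removes.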