From patchwork Wed Nov 2 20:34:23 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Cadhalpun
X-Patchwork-Id: 1264
From: Andreas Cadhalpun
To: FFmpeg development discussions and patches, libav development
Message-ID: <968a0227-86bc-eb2d-ce72-3b7858bb2375@googlemail.com>
Date: Wed, 2 Nov 2016 21:34:23 +0100
Subject: [FFmpeg-devel] [PATCH] ppc: pixblockdsp: do unaligned block accesses correctly again

This was broken by the following Libav commit:
4c387c7 ppc: dsputil: do unaligned block accesses correctly

The following tests fail due to this:
fate-checkasm
fate-vsynth1-dnxhd-2k-hr-hq
fate-vsynth1-dnxhd-edge1-hr
fate-vsynth1-dnxhd-edge2-hr
fate-vsynth1-dnxhd-edge3-hr
fate-vsynth1-dnxhd-hr-sq-mov
fate-vsynth1-dnxhd-hr-hq-mov
fate-vsynth2-dnxhd-2k-hr-hq
fate-vsynth2-dnxhd-edge1-hr
fate-vsynth2-dnxhd-edge2-hr
fate-vsynth2-dnxhd-edge3-hr
fate-vsynth2-dnxhd-hr-sq-mov
fate-vsynth2-dnxhd-hr-hq-mov
fate-vsynth3-dnxhd-2k-hr-hq
fate-vsynth3-dnxhd-edge1-hr
fate-vsynth3-dnxhd-edge2-hr
fate-vsynth3-dnxhd-edge3-hr
fate-vsynth3-dnxhd-hr-sq-mov
fate-vsynth3-dnxhd-hr-hq-mov

Fixes trac ticket #5508.

Signed-off-by: Andreas Cadhalpun
---
Tested with qemu on ppc32be and ppc64be.
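
For reviewers without PPC hardware, the following is a minimal scalar sketch of why the permute vector presumably has to be recomputed for every row. It is not part of the patch and does not use AltiVec. The helper names (model_lvsl, model_load, count_bad_rows) are hypothetical, buffer offsets stand in for addresses, and it assumes the usual semantics that vec_ld ignores the low four address bits while vec_lvsl encodes them as a shift.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model only; offset 0 of the buffer is assumed to be 16-byte
 * aligned. All function names here are hypothetical. */

/* vec_lvsl(0, p): the shift encoded in the permute vector is p & 15. */
static unsigned model_lvsl(size_t off)
{
    return (unsigned)(off & 15);
}

/* vec_ld(0, p), vec_ld(15, p) and vec_perm(l, r, perm) combined:
 * both loads are aligned, then 16 bytes are selected at `shift`. */
static void model_load(const uint8_t *buf, size_t off, unsigned shift,
                       uint8_t out[16])
{
    uint8_t both[32];
    memcpy(both,      buf + (off & ~(size_t)15),        16);
    memcpy(both + 16, buf + ((off + 15) & ~(size_t)15), 16);
    memcpy(out, both + shift, 16);
}

/* Read 8 rows of 8 pixels and count the rows that come out wrong.
 * hoisted != 0 computes the shift once before the loop (the broken
 * variant); hoisted == 0 recomputes it for every row (the fix).
 * Caller must keep start + 7 * stride <= 225 to stay inside buf. */
static int count_bad_rows(size_t start, size_t stride, int hoisted)
{
    uint8_t buf[256], out[16];
    unsigned fixed_shift = model_lvsl(start);
    int bad = 0;

    for (size_t i = 0; i < sizeof(buf); i++)
        buf[i] = (uint8_t)i;

    for (int row = 0; row < 8; row++) {
        size_t off = start + (size_t)row * stride;
        unsigned shift = hoisted ? fixed_shift : model_lvsl(off);
        model_load(buf, off, shift, out);
        bad += memcmp(out, buf + off, 8) != 0;
    }
    return bad;
}
```

In this model, count_bad_rows(3, 24, 1) reports mismatching rows because a stride of 24 changes the alignment from row to row, while count_bad_rows(3, 24, 0) reports none. With a stride that is a multiple of 16, such as count_bad_rows(3, 16, 1), the hoisted variant also reads correctly, which would explain why only some tests fail.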
---
 libavcodec/ppc/pixblockdsp.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/libavcodec/ppc/pixblockdsp.c b/libavcodec/ppc/pixblockdsp.c
index 84aa562..f3a5050 100644
--- a/libavcodec/ppc/pixblockdsp.c
+++ b/libavcodec/ppc/pixblockdsp.c
@@ -67,10 +67,10 @@ static void get_pixels_altivec(int16_t *restrict block, const uint8_t *pixels,
                                ptrdiff_t line_size)
 {
     int i;
-    vec_u8 perm = vec_lvsl(0, pixels);
     const vec_u8 zero = (const vec_u8)vec_splat_u8(0);
 
     for (i = 0; i < 8; i++) {
+        vec_u8 perm = vec_lvsl(0, pixels);
         /* Read potentially unaligned pixels.
          * We're reading 16 pixels, and actually only want 8,
          * but we simply ignore the extras. */
@@ -157,8 +157,7 @@ static void diff_pixels_altivec(int16_t *restrict block, const uint8_t *s1,
                                 const uint8_t *s2, int stride)
 {
     int i;
-    vec_u8 perm1 = vec_lvsl(0, s1);
-    vec_u8 perm2 = vec_lvsl(0, s2);
+    vec_u8 perm;
     const vec_u8 zero = (const vec_u8)vec_splat_u8(0);
     vec_s16 shorts1, shorts2;
 
@@ -166,17 +165,19 @@ static void diff_pixels_altivec(int16_t *restrict block, const uint8_t *s1,
         /* Read potentially unaligned pixels.
          * We're reading 16 pixels, and actually only want 8,
          * but we simply ignore the extras. */
+        perm = vec_lvsl(0, s1);
         vec_u8 pixl = vec_ld(0, s1);
         vec_u8 pixr = vec_ld(15, s1);
-        vec_u8 bytes = vec_perm(pixl, pixr, perm1);
+        vec_u8 bytes = vec_perm(pixl, pixr, perm);
 
         // Convert the bytes into shorts.
         shorts1 = (vec_s16)vec_mergeh(zero, bytes);
 
         // Do the same for the second block of pixels.
+        perm = vec_lvsl(0, s2);
         pixl = vec_ld(0, s2);
         pixr = vec_ld(15, s2);
-        bytes = vec_perm(pixl, pixr, perm2);
+        bytes = vec_perm(pixl, pixr, perm);
 
         // Convert the bytes into shorts.
         shorts2 = (vec_s16)vec_mergeh(zero, bytes);
@@ -197,17 +198,19 @@ static void diff_pixels_altivec(int16_t *restrict block, const uint8_t *s1,
         /* Read potentially unaligned pixels.
          * We're reading 16 pixels, and actually only want 8,
          * but we simply ignore the extras. */
+        perm = vec_lvsl(0, s1);
         pixl = vec_ld(0, s1);
         pixr = vec_ld(15, s1);
-        bytes = vec_perm(pixl, pixr, perm1);
+        bytes = vec_perm(pixl, pixr, perm);
 
         // Convert the bytes into shorts.
         shorts1 = (vec_s16)vec_mergeh(zero, bytes);
 
         // Do the same for the second block of pixels.
+        perm = vec_lvsl(0, s2);
         pixl = vec_ld(0, s2);
         pixr = vec_ld(15, s2);
-        bytes = vec_perm(pixl, pixr, perm2);
+        bytes = vec_perm(pixl, pixr, perm);
 
         // Convert the bytes into shorts.
         shorts2 = (vec_s16)vec_mergeh(zero, bytes);