From patchwork Fri Apr 27 14:47:13 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jerome Borsboom
X-Patchwork-Id: 8668
To: ffmpeg-devel@ffmpeg.org
From: Jerome Borsboom
Message-ID:
 <150b1c38-70a1-a796-5ff7-9e07d54d96f2@carpalis.nl>
Date: Fri, 27 Apr 2018 16:47:13 +0200
Subject: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

The assembly-optimized half-pel interpolation in some cases rounds the
interpolated value when no rounding is requested. The result is an
off-by-one error when the pixel value that is being corrected is zero
and the other pixel value is odd.

Signed-off-by: Jerome Borsboom
---
In the put_no_rnd_pixels functions, the psubusb instruction subtracts one
from each unsigned byte to compensate for the rounding that the PAVGB
instruction performs. psubusb, however, saturates when the result does
not fit in the operand type, i.e. an unsigned byte. In this particular
case, this means that when the value of a pixel is 0, psubusb returns 0
instead of -1, because -1 does not fit in an unsigned byte and is
saturated to 0. The interpolated value is then not corrected for the
rounding that PAVGB performs, and the result is off by one.

The patch instead computes the truncating average as
PAVGB(a, b) - ((a ^ b) & 1): the lowest bit of a ^ b is set exactly when
a + b is odd, which is the only case in which PAVGB rounds up, and in
that case PAVGB(a, b) is at least 1, so the byte subtraction cannot wrap.

The corrections below solved the issue for me, but I do not have a lot
of experience in optimizing assembly, so a careful check of the
solution's correctness is advisable. Furthermore, I have not checked the
rest of the assembly; there may be more cases where the psubusb
instruction does not provide the desired result. A review by the
owner/maintainer of the assembly code might be appropriate.
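To quantify the error, the per-byte behaviour can be modelled in plain C.
This is a minimal sketch written for illustration only: the function names
are hypothetical (not FFmpeg API), PAVGB is modelled as (a + b + 1) >> 1,
psubusb as unsigned subtraction saturating at 0, and the register holding
the correction constant (m6 in the asm) is assumed to contain 0x01 in
every byte, as the description of psubusb above implies.

```c
#include <stdint.h>

/* Reference result for put_no_rnd: truncating average floor((a + b) / 2). */
static uint8_t avg_no_rnd_ref(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}

/* Model of PAVGB on one byte: rounding average (a + b + 1) >> 1. */
static uint8_t pavgb(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Model of PSUBUSB on one byte: unsigned subtraction that saturates at 0. */
static uint8_t psubusb(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Old sequence: psubusb a, 1 followed by PAVGB a, b.
 * Wrong when a == 0, because 0 - 1 saturates to 0 instead of -1. */
static uint8_t avg_no_rnd_old(uint8_t a, uint8_t b)
{
    return pavgb(psubusb(a, 1), b);
}

/* Patched sequence: t = (a ^ b) & 1; PAVGB a, b; psubb a, t. */
static uint8_t avg_no_rnd_new(uint8_t a, uint8_t b)
{
    uint8_t t = (uint8_t)((a ^ b) & 1);  /* 1 exactly when a + b is odd */
    /* psubb wraps modulo 256, but pavgb(a, b) >= 1 whenever t == 1 */
    return (uint8_t)(pavgb(a, b) - t);
}

/* Count byte pairs (a, b) for which f disagrees with the reference. */
static int count_mismatches(uint8_t (*f)(uint8_t, uint8_t))
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (f((uint8_t)a, (uint8_t)b) != avg_no_rnd_ref((uint8_t)a, (uint8_t)b))
                bad++;
    return bad;
}
```

Under this model, count_mismatches(avg_no_rnd_old) returns 128 (exactly
the pairs with a == 0 and b odd, e.g. a = 0, b = 1 gives 1 instead of 0),
while count_mismatches(avg_no_rnd_new) returns 0.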
 libavcodec/x86/hpeldsp.asm | 38 ++++++++++++++++++++++++++++++++------
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm
index ce5d7a4e28..bae2ba9880 100644
--- a/libavcodec/x86/hpeldsp.asm
+++ b/libavcodec/x86/hpeldsp.asm
@@ -145,10 +145,16 @@ cglobal put_no_rnd_pixels8_x2, 4,5
     mova m1, [r1+1]
     mova m3, [r1+r2+1]
     add r1, r4
-    psubusb m0, m6
-    psubusb m2, m6
+    mova m4, m0
+    pxor m4, m1
+    pand m4, m6
     PAVGB m0, m1
+    psubb m0, m4
+    mova m4, m2
+    pxor m4, m3
+    pand m4, m6
     PAVGB m2, m3
+    psubb m2, m4
     mova [r0], m0
     mova [r0+r2], m2
     mova m0, [r1]
@@ -157,10 +163,16 @@ cglobal put_no_rnd_pixels8_x2, 4,5
     mova m3, [r1+r2+1]
     add r0, r4
     add r1, r4
-    psubusb m0, m6
-    psubusb m2, m6
+    mova m4, m0
+    pxor m4, m1
+    pand m4, m6
     PAVGB m0, m1
+    psubb m0, m4
+    mova m4, m2
+    pxor m4, m3
+    pand m4, m6
     PAVGB m2, m3
+    psubb m2, m4
     mova [r0], m0
     mova [r0+r2], m2
     add r0, r4
@@ -227,18 +239,32 @@ cglobal put_no_rnd_pixels8_y2, 4,5
     mova m1, [r1+r2]
     mova m2, [r1+r4]
     add r1, r4
-    psubusb m1, m6
+    mova m3, m0
+    pxor m3, m1
+    pand m3, m6
     PAVGB m0, m1
+    psubb m0, m3
+    mova m3, m1
+    pxor m3, m2
+    pand m3, m6
     PAVGB m1, m2
+    psubb m1, m3
     mova [r0+r2], m0
     mova [r0+r4], m1
     mova m1, [r1+r2]
     mova m0, [r1+r4]
     add r0, r4
     add r1, r4
-    psubusb m1, m6
+    mova m3, m2
+    pxor m3, m1
+    pand m3, m6
     PAVGB m2, m1
+    psubb m2, m3
+    mova m3, m1
+    pxor m3, m0
+    pand m3, m6
     PAVGB m1, m0
+    psubb m1, m3
     mova [r0+r2], m2
     mova [r0+r4], m1
     add r0, r4
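For readers more used to C intrinsics than to x86inc assembly, the
corrected per-register sequence in the hunks above corresponds roughly to
the SSE2 sketch below. It is an illustration only, not FFmpeg code: the
function name is hypothetical, it needs an x86 target with SSE2, and it
works on 16 bytes per register, whereas the patched 8-pixel assembly works
on 8.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86/x86-64 only */
#include <stdint.h>

/*
 * Truncating average of 16 byte pairs, floor((a + b) / 2), built from the
 * same per-byte operations as the patched assembly:
 *   t = a ^ b        (mova + pxor)
 *   t = t & 0x01     (pand with the 0x01 constant, m6 in the asm)
 *   r = pavgb(a, b)  (PAVGB, rounds up)
 *   r = r - t        (psubb, undoes the round-up when a + b is odd)
 */
static void avg_no_rnd_16(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    const __m128i ones = _mm_set1_epi8(1);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i t  = _mm_and_si128(_mm_xor_si128(va, vb), ones);
    __m128i r  = _mm_sub_epi8(_mm_avg_epu8(va, vb), t);
    _mm_storeu_si128((__m128i *)dst, r);
}
```

The four vector operations are all per-byte, so the lane width does not
change the result: each output byte equals (a[i] + b[i]) >> 1, including
the a[i] == 0 cases that the saturating psubusb handled incorrectly.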