From patchwork Fri Jun 9 11:41:05 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ivan Kalvachev
X-Patchwork-Id: 3883
In-Reply-To: <20170609100848.GA4759@nb4>
References: <20170609100848.GA4759@nb4>
From: Ivan Kalvachev
Date: Fri, 9 Jun 2017 14:41:05 +0300
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [WIP][PATCH] Opus Piramid Vector Quantization Search in x86 SIMD asm
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

On 6/9/17, Michael Niedermayer wrote:
> On Fri, Jun 09, 2017 at 01:36:07AM +0300, Ivan Kalvachev wrote:
>>  opus_pvq.c              |    9
>>  opus_pvq.h              |    5
>>  x86/Makefile            |    1
>>  x86/opus_dsp_init.c     |   47 +++
>>  x86/opus_pvq_search.asm |  597
>> ++++++++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 657 insertions(+), 2 deletions(-)
>> 3b9648bea3f01dad2cf159382f0ffc2d992c84b2
>> 0001-SIMD-opus-pvq_search-implementation.patch
>> From 06dc798c302e90aa5b45bec5d8fbcd64ba4af076 Mon Sep 17 00:00:00 2001
>> From: Ivan Kalvachev
>> Date: Thu, 8 Jun 2017 22:24:33 +0300
>> Subject: [PATCH 1/3] SIMD opus pvq_search implementation.
>
> seems this breaks build with mingw64, didnt investigate but it
> fails with these errors:
>
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> collect2: error: ld returned 1 exit status
> collect2: error: ld returned 1 exit status
> make: *** [ffmpeg_g.exe] Error 1
> make: *** Waiting for unfinished jobs....
> make: *** [ffprobe_g.exe] Error 1

const_*_edge is used in only one place in the code.
Would you check whether this patch fixes the issue?

===
I expected nasm/yasm to pre-calculate the address expression as one
value, indexed relative to the section start.
Instead, it seems that each entry is represented by
its own symbol address plus an offset from it.
Since the offset is negative, it uses all 64 bits,
and it makes a difference when it is truncated to 32 bits.

The same issue could happen with clang tools.

--- a/libavcodec/x86/opus_pvq_search.asm
+++ b/libavcodec/x86/opus_pvq_search.asm
@@ -419,7 +419,7 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
     add             Nq,   r4q           ; Nq = align(Nq, mmsize)
     sub             rsp,  Nq            ; allocate tmpX[Nq]

-    movups          m3,   [const_align_abs_edge-mmsize+r4q]     ; this is the bit mask for the padded read at the end of the input
+    movups          m3,   [const_align_abs_mask+32-mmsize+r4q]  ; this is the bit mask for the padded read at the end of the input

     lea             r4q,  [Nq-mmsize]   ; Nq is rounded up (aligned up) to mmsize, so r4q can't become negative here, unless N=0.
     movups          m2,   [inXq + r4q]
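
To illustrate the truncation argument above, here is a minimal C sketch of why a negative offset does not survive a 32-bit, zero-extended field such as the one R_X86_64_32 describes. The helper names (fits_r_x86_64_32, truncate_offset) and the addresses in demo() are hypothetical and chosen only for illustration; they are not taken from the linker, the build log, or the FFmpeg tree, and this is a model of the check, not what ld actually runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the overflow check for an R_X86_64_32 relocation:
 * the 32-bit field is zero-extended when the address is formed, so the
 * full 64-bit value (symbol + addend) must be unchanged by truncation. */
static inline bool fits_r_x86_64_32(uint64_t symbol, int64_t addend)
{
    uint64_t value = symbol + (uint64_t)addend;   /* wraps modulo 2^64 */
    return value == (uint64_t)(uint32_t)value;
}

/* What remains of a 64-bit offset once only its low 32 bits are stored. */
static inline uint64_t truncate_offset(int64_t offset)
{
    return (uint64_t)(uint32_t)offset;            /* upper sign bits dropped */
}

/* Usage sketch; all addresses are made-up illustrative values. */
static inline void demo(void)
{
    /* -16 as a 64-bit value is 0xFFFFFFFFFFFFFFF0; truncated it is 0xFFFFFFF0. */
    assert(truncate_offset(-16) == 0xFFFFFFF0u);
    assert(truncate_offset(-16) != (uint64_t)(int64_t)-16);

    /* A low symbol address with a small negative addend still fits. */
    assert(fits_r_x86_64_32(0x00400000u, -16));

    /* A symbol placed above 4 GiB cannot be expressed in 32 bits. */
    assert(!fits_r_x86_64_32(0x140001000ULL, -16));
}
```

Under this model, the result depends on where the symbol ends up after linking, which is one reason the same object code can link on one target and fail on another.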