From patchwork Fri Jun 9 12:07:55 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ivan Kalvachev
X-Patchwork-Id: 3886
References: <20170609100848.GA4759@nb4>
From: Ivan Kalvachev
Date: Fri, 9 Jun 2017 15:07:55 +0300
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [WIP][PATCH] Opus Piramid Vector Quantization Search in x86 SIMD asm
List-Id: FFmpeg development discussions and patches

On 6/9/17, Ivan Kalvachev wrote:
> On 6/9/17, Michael Niedermayer wrote:
>> On Fri, Jun 09, 2017 at 01:36:07AM +0300, Ivan Kalvachev wrote:
>>> opus_pvq.c | 9
>>> opus_pvq.h | 5
>>> x86/Makefile | 1
>>> x86/opus_dsp_init.c | 47 +++
>>> x86/opus_pvq_search.asm | 597
>>> ++++++++++++++++++++++++++++++++++++++++++++++++
>>> 5 files changed, 657 insertions(+), 2 deletions(-)
>>> 3b9648bea3f01dad2cf159382f0ffc2d992c84b2
>>> 0001-SIMD-opus-pvq_search-implementation.patch
>>> From 06dc798c302e90aa5b45bec5d8fbcd64ba4af076 Mon Sep 17 00:00:00 2001
>>> From: Ivan Kalvachev
>>> Date: Thu, 8 Jun 2017 22:24:33 +0300
>>> Subject: [PATCH 1/3] SIMD opus pvq_search implementation.
>>
>> seems this breaks build with mingw64, didnt investigate but it
>> fails with these errors:
>>
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> collect2: error: ld returned 1 exit status
>> collect2: error: ld returned 1 exit status
>> make: *** [ffmpeg_g.exe] Error 1
>> make: *** Waiting for unfinished jobs....
>> make: *** [ffprobe_g.exe] Error 1
>
>
> const_*_edge is used on only one place is the code.
> Would you check if this patch fixes the issue.
>

Sorry, the patch was not tested and the variable name was not correct.
This one should be fine...
I hope

--- a/libavcodec/x86/opus_pvq_search.asm
+++ b/libavcodec/x86/opus_pvq_search.asm
@@ -419,7 +419,7 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
     add   Nq, r4q           ; Nq = align(Nq, mmsize)
     sub   rsp, Nq           ; allocate tmpX[Nq]

-    movups m3, [const_align_abs_edge-mmsize+r4q]     ; this is the bit mask for the padded read at the end of the input
+    movups m3, [const_float_abs_mask+32-mmsize+r4q]  ; this is the bit mask for the padded read at the end of the input

     lea   r4q, [Nq-mmsize]  ; Nq is rounded up (aligned up) to mmsize, so r4q can't become negative here, unless N=0.
     movups m2, [inXq + r4q]
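
For readers unfamiliar with the trick in the hunk above, here is a rough C sketch of how an unaligned load from a constant table can produce the "bit mask for the padded read". It is only an illustration, not the code from the patch. It assumes the table holds one vector of 0x7fffffff abs masks immediately followed by one vector of zeros (inferred from the `+32` offset, not confirmed here), that r4q holds the padding size at that point, and that a vector is 4 floats (SSE). The names `LANES`, `mask_table`, `pad_count` and `abs_last_vector` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4u /* floats per vector; stands in for mmsize/4 with SSE (assumption) */

/* Hypothetical table layout: LANES abs masks followed by LANES zeros. */
static const uint32_t mask_table[2 * LANES] = {
    0x7fffffffu, 0x7fffffffu, 0x7fffffffu, 0x7fffffffu,
    0u, 0u, 0u, 0u
};

/* Number of padding elements needed to round n up to a multiple of LANES. */
static unsigned pad_count(unsigned n)
{
    return (0u - n) & (LANES - 1u);
}

/*
 * Process the last (padded) vector of the input.
 * Valid lanes get their sign bit cleared (fabs); padding lanes become +0.0f.
 * Reading the table at offset `pad` yields (LANES - pad) abs masks followed
 * by `pad` zeros, which is the sliding-window idea behind the unaligned load.
 * `last` must point at LANES readable floats.
 */
static void abs_last_vector(float out[LANES], const float *last, unsigned pad)
{
    assert(pad < LANES);
    for (unsigned i = 0; i < LANES; i++) {
        uint32_t bits;
        memcpy(&bits, &last[i], sizeof bits);  /* reinterpret float as bits */
        bits &= mask_table[pad + i];           /* abs mask or zero mask */
        memcpy(&out[i], &bits, sizeof bits);
    }
}
```

With N = 6 the padding is 2, so the last vector covers elements 4..7 of a buffer rounded up to 8 floats. The two valid lanes come out as absolute values and the two junk lanes are forced to zero, whatever they contained.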