From patchwork Sun Nov 26 22:51:07 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 6373
From: James Darnley
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 26 Nov 2017 23:51:07 +0100
Message-Id: <20171126225111.5108-5-james.darnley@gmail.com>
X-Mailer: git-send-email 2.15.0
In-Reply-To: <20171126225111.5108-1-james.darnley@gmail.com>
References: <20171126225111.5108-1-james.darnley@gmail.com>
Subject: [FFmpeg-devel] [PATCH 4/8] avcodec/flac: partially unroll loop in flac_enc_lpc_32
List-Id: FFmpeg development discussions and patches

Now does 6 samples per iteration, up from 2.

From 1.6 to 2.1 times faster again.  2.5 to 3.9 times faster overall.
Runtime is reduced by a further 4 to 17%.  Reduced by 9 to 65% overall.

Same conditions as previously.
---
 libavcodec/x86/flac_dsp_gpl.asm | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
index 618306eb5f..4d212ed212 100644
--- a/libavcodec/x86/flac_dsp_gpl.asm
+++ b/libavcodec/x86/flac_dsp_gpl.asm
@@ -152,13 +152,13 @@ RET
 %macro FUNCTION_BODY_32 0
 %if ARCH_X86_64
-    cglobal flac_enc_lpc_32, 5, 7, 4, mmsize, res, smp, len, order, coefs
+    cglobal flac_enc_lpc_32, 5, 7, 8, mmsize, res, smp, len, order, coefs
     DECLARE_REG_TMP 5, 6
     %define length r2d
     movsxd orderq, orderd
 %else
-    cglobal flac_enc_lpc_32, 5, 6, 4, mmsize, res, smp, len, order, coefs
+    cglobal flac_enc_lpc_32, 5, 6, 8, mmsize, res, smp, len, order, coefs
     DECLARE_REG_TMP 2, 5
     %define length r2mp
 %endif
@@ -190,6 +190,8 @@ mova [rsp], m4 ; save sign extend mask
 .looplen:
     pxor m0, m0
+    pxor m4, m4
+    pxor m6, m6
     mov posj, orderq
     xor negj, negj
@@ -197,23 +199,41 @@ mova [rsp], m4 ; save sign extend mask
     movd m2, [coefsq+posj*4] ; c = coefs[j]
     SPLATD m2
     pmovzxdq m1, [smpq+negj*4-4] ; s = smp[i-j-1]
+    pmovzxdq m5, [smpq+negj*4-4+mmsize/2]
+    pmovzxdq m7, [smpq+negj*4-4+mmsize]
     pmuldq m1, m2
+    pmuldq m5, m2
+    pmuldq m7, m2
     paddq m0, m1 ; p += c * s
+    paddq m4, m5
+    paddq m6, m7
     dec negj
     inc posj
     jnz .looporder
     HACK_PSRAQ m0, m3, [rsp], m2 ; p >>= shift
+    HACK_PSRAQ m4, m3, [rsp], m2
+    HACK_PSRAQ m6, m3, [rsp], m2
     CLIPQ m0, [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
+    CLIPQ m4, [pq_int_min], [pq_int_max], m2
+    CLIPQ m6, [pq_int_min], [pq_int_max], m2
     pshufd m0, m0, q0020 ; pack into first 2 dwords
+    pshufd m4, m4, q0020
+    pshufd m6, m6, q0020
     movh m1, [smpq]
+    movh m5, [smpq+mmsize/2]
+    movh m7, [smpq+mmsize]
     psubd m1, m0 ; smp[i] - p
+    psubd m5, m4
+    psubd m7, m6
     movh [resq], m1 ; res[i] = smp[i] - (p >> shift)
+    movh [resq+mmsize/2], m5
+    movh [resq+mmsize], m7
-    add resq, mmsize/2
-    add smpq, mmsize/2
-    sub length, mmsize/8
+    add resq, (3*mmsize)/2
+    add smpq, (3*mmsize)/2
+    sub length, (3*mmsize)/8
     jg .looplen
     RET
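
For readers who do not follow the assembly, the arithmetic described by the comments in the diff (c = coefs[j], s = smp[i-j-1], p += c * s, clip(p >> shift), res[i] = smp[i] - p) can be modelled in scalar C. The following is only a rough sketch and is not code from the patch or from FFmpeg. The names lpc_residual_scalar, lpc_residual_unrolled6 and clip_int32 are hypothetical, the explicit shift parameter is an assumption (it does not appear among the named arguments in the cglobal line), and requiring len to be a multiple of 6 is a simplification made for this sketch. It assumes mmsize = 16, so that each of the three accumulator registers (m0, m4, m6) carries two 64-bit sums, which matches the "6 samples per iteration" in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit value to the int32_t range (the role CLIPQ plays in the diff). */
static int64_t clip_int32(int64_t v)
{
    if (v < INT32_MIN) return INT32_MIN;
    if (v > INT32_MAX) return INT32_MAX;
    return v;
}

/* Subtract with 32-bit wraparound, mirroring what psubd does on dwords. */
static int32_t sub_wrap32(int32_t a, int64_t clipped)
{
    return (int32_t)((uint32_t)a - (uint32_t)clipped);
}

/*
 * Hypothetical reference: one sample per outer iteration.
 * smp points at the first sample to predict; smp[-order..-1] must be
 * valid history. The right shift of a negative value is assumed to be
 * arithmetic, as it is on mainstream compilers.
 */
static void lpc_residual_scalar(int32_t *res, const int32_t *smp, int len,
                                int order, const int32_t *coefs, int shift)
{
    for (int i = 0; i < len; i++) {
        int64_t p = 0;
        for (int j = 0; j < order; j++)
            p += (int64_t)coefs[j] * smp[i - j - 1];   /* p += c * s */
        res[i] = sub_wrap32(smp[i], clip_int32(p >> shift));
    }
}

/*
 * Hypothetical unrolled model: six accumulators per outer iteration, so
 * each coefficient is loaded once and reused for six samples.
 * Assumption for this sketch only: len is a multiple of 6.
 */
static void lpc_residual_unrolled6(int32_t *res, const int32_t *smp, int len,
                                   int order, const int32_t *coefs, int shift)
{
    assert(len % 6 == 0);
    for (int i = 0; i < len; i += 6) {
        int64_t p[6] = { 0 };
        for (int j = 0; j < order; j++) {
            const int64_t c = coefs[j];                /* one load, six uses */
            for (int k = 0; k < 6; k++)
                p[k] += c * smp[i + k - j - 1];
        }
        for (int k = 0; k < 6; k++)
            res[i + k] = sub_wrap32(smp[i + k], clip_int32(p[k] >> shift));
    }
}
```

Both functions should produce identical residuals. The unrolled form only changes how often each coefficient is fetched and how many independent accumulation chains are in flight, which is presumably where the reported gain comes from.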