From patchwork Sun May 12 16:06:52 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 48817
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 12 May 2024 13:06:52 -0300
Message-ID: <20240512160657.2733-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240511194656.1576-1-jamrial@gmail.com>
References: <20240511194656.1576-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 3/8] x86/flacdsp: add a SSE4 version of lpc16
List-Id: FFmpeg development discussions and patches

flac_lpc_16_13_c: 2841.3
flac_lpc_16_13_sse4: 2151.8
flac_lpc_16_16_c: 3382.8
flac_lpc_16_16_sse4: 2228.3
flac_lpc_16_29_c: 5800.3
flac_lpc_16_29_sse4: 3727.3
flac_lpc_16_32_c: 5972.8
flac_lpc_16_32_sse4: 4052.3

Signed-off-by: James Almer
---
 libavcodec/x86/flacdsp.asm    | 13 +++++++------
 libavcodec/x86/flacdsp_init.c |  3 +++
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/libavcodec/x86/flacdsp.asm b/libavcodec/x86/flacdsp.asm
index 4b2fd65435..f38eb7db76 100644
--- a/libavcodec/x86/flacdsp.asm
+++ b/libavcodec/x86/flacdsp.asm
@@ -38,9 +38,9 @@ SECTION .text
 %endif
 %endmacro
 
-%macro LPC_32 1
+%macro LPC_32 3
 INIT_XMM %1
-cglobal flac_lpc_32, 5,6,5, decoded, coeffs, pred_order, qlevel, len, j
+cglobal flac_lpc_%2, 5,6,5, decoded, coeffs, pred_order, qlevel, len, j
     sub    lend, pred_orderd
     jle .ret
     movsxdifnidn pred_orderq, pred_orderd
@@ -67,14 +67,14 @@ ALIGN 16
     jl .loop_order
 .end_order:
     PMACSDQL m2, m0, m1, m2, m0
-    psrlq  m2, m4
+    %3     m2, m4
     movd   m0, [decodedq]
     paddd  m0, m2
     movd   [decodedq], m0
     sub    lend, 2
     jl .ret
     PMACSDQL m3, m1, m0, m3, m1
-    psrlq  m3, m4
+    %3     m3, m4
     movd   m1, [decodedq+4]
     paddd  m1, m3
     movd   [decodedq+4], m1
@@ -83,10 +83,11 @@ ALIGN 16
     RET
 %endmacro
 
+LPC_32 sse4, 16, psrad
+LPC_32 sse4, 32, psrlq
 %if HAVE_XOP_EXTERNAL
-LPC_32 xop
+LPC_32 xop, 32, psrlq
 %endif
-LPC_32 sse4
 
 ;----------------------------------------------------------------------------------
 ;void ff_flac_decorrelate_[lrm]s_16_sse2(uint8_t **out, int32_t **in, int channels,
diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c
index 87daed7005..dee4bf88fc 100644
--- a/libavcodec/x86/flacdsp_init.c
+++ b/libavcodec/x86/flacdsp_init.c
@@ -23,6 +23,8 @@
 #include "libavutil/x86/cpu.h"
 #include "config.h"
 
+void ff_flac_lpc_16_sse4(int32_t *samples, const int coeffs[32], int order,
+                         int qlevel, int len);
 void ff_flac_lpc_32_sse4(int32_t *samples, const int coeffs[32], int order,
                          int qlevel, int len);
 void ff_flac_lpc_32_xop(int32_t *samples, const int coeffs[32], int order,
@@ -93,6 +95,7 @@ av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int
         }
     }
     if (EXTERNAL_SSE4(cpu_flags)) {
+        c->lpc16 = ff_flac_lpc_16_sse4;
         c->lpc32 = ff_flac_lpc_32_sse4;
     }
     if (EXTERNAL_AVX(cpu_flags)) {
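
Reading aid, not part of the patch: the only per-variant difference the macro now takes is the final shift (psrad for the 16 variant, psrlq for the 32 variant). The scalar sketch below is one way to picture that, assuming the 16 variant behaves like a 32-bit accumulator with an arithmetic shift and the 32 variant like a 64-bit accumulator whose shifted result is truncated to 32 bits. The function names and the coefficient indexing are hypothetical and chosen for illustration; this is not FFmpeg's C reference code.

```c
#include <stdint.h>

/* Hypothetical model of the "16" variant: the sum wraps at 32 bits and is
 * shifted arithmetically. Assumes the compiler implements >> on negative
 * signed values as an arithmetic shift (true for GCC and Clang). */
static void model_lpc_acc32(int32_t *decoded, const int *coeffs,
                            int order, int qlevel, int len)
{
    for (int i = order; i < len; i++) {
        uint32_t sum = 0; /* unsigned so that overflow wraps, like a 32-bit lane */
        for (int j = 0; j < order; j++)
            sum += (uint32_t)coeffs[j] * (uint32_t)decoded[i - order + j];
        decoded[i] += (int32_t)sum >> qlevel;
    }
}

/* Hypothetical model of the "32" variant: the sum is kept in 64 bits and
 * only the low 32 bits of the shifted value are added back. */
static void model_lpc_acc64(int32_t *decoded, const int *coeffs,
                            int order, int qlevel, int len)
{
    for (int i = order; i < len; i++) {
        int64_t sum = 0;
        for (int j = 0; j < order; j++)
            sum += (int64_t)coeffs[j] * decoded[i - order + j];
        decoded[i] += (int32_t)(sum >> qlevel);
    }
}
```

With small inputs, where no sum exceeds 32 bits, both models restore the same samples; they can only diverge once a prediction sum no longer fits in 32 bits.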