From patchwork Sun May 12 18:53:36 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 48826
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 12 May 2024 15:53:36 -0300
Message-ID: <20240512185336.60155-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240512160657.2733-6-jamrial@gmail.com>
References: <20240512160657.2733-6-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 8/8 v2] x86/flacdsp: add an SSE4 version of wasted33
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: d8otAPIy7scd

flac_wasted_33_c: 214.1
flac_wasted_33_sse4: 103.2

Signed-off-by: James Almer
---
Removed the AVX2 one as the lane crossing in pmovsxdq removed pretty much all
speed up for processing twice the amount of data.

 libavcodec/x86/flacdsp.asm    | 25 +++++++++++++++++++++++++
 libavcodec/x86/flacdsp_init.c |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/libavcodec/x86/flacdsp.asm b/libavcodec/x86/flacdsp.asm
index 21b2439bc0..15fcec4f08 100644
--- a/libavcodec/x86/flacdsp.asm
+++ b/libavcodec/x86/flacdsp.asm
@@ -113,6 +113,31 @@ ALIGN 16
     jl .loop
     RET
 
+INIT_XMM sse4
+cglobal flac_wasted_33, 4,4,5, decoded, residuals, wasted, len
+    shl         lend, 2
+    lea     decodedq, [decodedq+lenq*2]
+    add   residualsq, lenq
+    neg         lenq
+    movd          m4, wastedd
+ALIGN 16
+.loop:
+    pmovsxdq      m0, [residualsq+lenq+mmsize*0]
+    pmovsxdq      m1, [residualsq+lenq+mmsize/2]
+    pmovsxdq      m2, [residualsq+lenq+mmsize*1]
+    pmovsxdq      m3, [residualsq+lenq+mmsize*1+mmsize/2]
+    psllq         m0, m4
+    psllq         m1, m4
+    psllq         m2, m4
+    psllq         m3, m4
+    mova  [decodedq+lenq*2+mmsize*0], m0
+    mova  [decodedq+lenq*2+mmsize*1], m1
+    mova  [decodedq+lenq*2+mmsize*2], m2
+    mova  [decodedq+lenq*2+mmsize*3], m3
+    add         lenq, mmsize * 2
+    jl .loop
+    RET
+
 ;----------------------------------------------------------------------------------
 ;void ff_flac_decorrelate_[lrm]s_16_sse2(uint8_t **out, int32_t **in, int channels,
 ;                                   int len, int shift);
diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c
index 67aa118760..fa993d3466 100644
--- a/libavcodec/x86/flacdsp_init.c
+++ b/libavcodec/x86/flacdsp_init.c
@@ -31,6 +31,7 @@ void ff_flac_lpc_32_xop(int32_t *samples, const int coeffs[32], int order,
                         int qlevel, int len);
 
 void ff_flac_wasted_32_sse2(int32_t *decoded, int wasted, int len);
+void ff_flac_wasted_33_sse4(int64_t *decoded, const int32_t *residual, int wasted, int len);
 
 #define DECORRELATE_FUNCS(fmt, opt)                                                      \
 void ff_flac_decorrelate_ls_##fmt##_##opt(uint8_t **out, int32_t **in, int channels,     \
@@ -100,6 +101,7 @@ av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int
     if (EXTERNAL_SSE4(cpu_flags)) {
         c->lpc16 = ff_flac_lpc_16_sse4;
         c->lpc32 = ff_flac_lpc_32_sse4;
+        c->wasted33 = ff_flac_wasted_33_sse4;
     }
     if (EXTERNAL_AVX(cpu_flags)) {
         if (fmt == AV_SAMPLE_FMT_S16) {
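
For review purposes, here is a rough scalar model of what the SSE4 loop above computes per sample: pmovsxdq sign-extends each 32-bit residual to 64 bits, and psllq shifts it left by the wasted-bits count. The name wasted33_model is hypothetical and this is only a sketch, not the flac_wasted_33_c reference from the tree. It also does not model the loop's constraints: the asm consumes 8 samples per iteration and stores with mova, so it appears to assume that len is padded to a multiple of 8 and that decoded is 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the SSE4 loop; not FFmpeg's C reference. */
static void wasted33_model(int64_t *decoded, const int32_t *residual,
                           int wasted, int len)
{
    /* A 64-bit shift count must stay below 64 to be well defined in C. */
    assert(wasted >= 0 && wasted < 64);

    for (int i = 0; i < len; i++) {
        /* pmovsxdq: sign-extend the 32-bit residual to 64 bits. */
        int64_t wide = residual[i];

        /* psllq: logical left shift. Shifting as uint64_t avoids the
         * undefined behaviour of left-shifting a negative signed value. */
        decoded[i] = (int64_t)((uint64_t)wide << wasted);
    }
}
```

With wasted = 3, a residual of -1 becomes -8 and INT32_MIN becomes -2^34, which is the case that no longer fits in 32 bits and motivates the 64-bit output.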