From patchwork Thu May 23 12:27:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 49176
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 23 May 2024 09:27:13 -0300
Message-ID: <20240523122716.2158-2-jamrial@gmail.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240523122716.2158-1-jamrial@gmail.com>
References: <20240523122716.2158-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/5] x86/vvc_sad: optimize vvc_sad_16

Signed-off-by: James Almer
---
 libavcodec/x86/vvc/vvc_sad.asm | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm
index a20818530f..829dbce489 100644
--- a/libavcodec/x86/vvc/vvc_sad.asm
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -96,7 +96,7 @@ cglobal vvc_sad_8, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, ro
     movd              eax, xm0
     RET
 
-cglobal vvc_sad_16, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
+cglobal vvc_sad_16, 6, 8, 5, src1, src2, dx, dy, block_w, block_h, off1, off2
     movsxdifnidn    dxq, dxd
     movsxdifnidn    dyq, dyd
 
@@ -121,26 +121,27 @@ cglobal vvc_sad_16, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, r
     pxor               m3, m3
     vpbroadcastd       m4, [pw_1]
 
-    sar          block_wd, 4
+    shl          block_wd, 1
+    add             src1q, block_wq
+    add             src2q, block_wq
+    neg          block_wq
+
+DEFINE_ARGS src1, src2, dx, dy, block_w, block_h, row_idx
     .loop_height:
-        mov         off1q, src1q
-        mov         off2q, src2q
-        mov      row_idxd, block_wd
+        mov      row_idxq, block_wq
 
         .loop_width:
-            movu              m0, [src1q]
-            movu              m1, [src2q]
+            movu              m0, [src1q+row_idxq]
+            movu              m1, [src2q+row_idxq]
             MIN_MAX_SAD       m1, m0, m2
             pmaddwd           m1, m4
             paddd             m3, m1
 
-            add            src1q, 32
-            add            src2q, 32
-            dec         row_idxd
-            jg        .loop_width
+            add         row_idxq, mmsize
+            jl        .loop_width
 
-        lea         src1q, [off1q + ROWS * MAX_PB_SIZE * 2]
-        lea         src2q, [off2q + ROWS * MAX_PB_SIZE * 2]
+        add         src1q, ROWS * MAX_PB_SIZE * 2
+        add         src2q, ROWS * MAX_PB_SIZE * 2
 
         sub      block_hd, 2
         jg        .loop_height
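
For readers who do not follow x86inc syntax, the loop change in this patch can be sketched in scalar C. This is an illustration only, not the code from the patch: the function names sad_counter and sad_negidx are hypothetical, the values of ROWS and MAX_PB_SIZE are assumptions, and the sketch handles one 16-bit sample per step where the asm handles mmsize bytes. The first function mirrors the old shape (walk the pointers, count down, restore the row start from saved copies); the second mirrors the new shape (point past the end of the row and index with a negative offset that counts up to zero).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values for this sketch; the patch itself only shows the names. */
#define ROWS        2
#define MAX_PB_SIZE 128

/* Note: both functions step the pointers past the last row they read,
 * so the buffers must extend at least that far. */

/* Old loop shape: advance both pointers across the row, count the remaining
 * iterations down, then rebuild the next row start from saved copies
 * (the role of off1/off2 in the asm). */
int sad_counter(const int16_t *src1, const int16_t *src2,
                int block_w, int block_h)
{
    int sad = 0;

    assert(block_w > 0 && block_h > 0);
    for (; block_h > 0; block_h -= 2) {
        const int16_t *row1 = src1, *row2 = src2;

        for (int n = block_w; n > 0; n--)
            sad += abs(*src1++ - *src2++);

        src1 = row1 + ROWS * MAX_PB_SIZE;
        src2 = row2 + ROWS * MAX_PB_SIZE;
    }
    return sad;
}

/* New loop shape: bias both pointers to one past the end of the row once,
 * then index with a negative offset that runs up to zero. The pointers are
 * never modified inside the inner loop, so no saved copies are needed. */
int sad_negidx(const int16_t *src1, const int16_t *src2,
               int block_w, int block_h)
{
    int sad = 0;

    assert(block_w > 0 && block_h > 0);
    src1 += block_w;
    src2 += block_w;
    for (; block_h > 0; block_h -= 2) {
        for (ptrdiff_t i = -(ptrdiff_t)block_w; i < 0; i++)
            sad += abs(src1[i] - src2[i]);

        src1 += ROWS * MAX_PB_SIZE;
        src2 += ROWS * MAX_PB_SIZE;
    }
    return sad;
}
```

Both functions visit every second row and should return the same sum for the same input. In the asm, the negative index lets a single `add row_idxq, mmsize` both advance the offset and set the flags that `jl` tests, and it removes the need for off1/off2 as saved row pointers, which is consistent with the register count in the cglobal line dropping from 9 to 8.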