From patchwork Wed May 25 01:21:19 2022
X-Patchwork-Submitter: "Swinney, Jonathan"
X-Patchwork-Id: 34731
From: "Swinney, Jonathan"
To: "ffmpeg-devel@ffmpeg.org"
Cc: Martin Storsjö, "Pop, Sebastian"
Date: Wed, 25 May 2022 01:21:19 +0000
Message-ID: <7a6e659930f74fa3827c967e5858d9d4@EX13D07UWB004.ant.amazon.com>
Subject: [FFmpeg-devel] [PATCH v2 0/2] checkasm: added additional dstW tests for hscale

This is a resubmission of changes to the hscale function for aarch64.
I added a test as a separate patch so that it is easier to get consistent before-and-after performance data.

After Martin submitted the improvement to the final section, which adds up the results, the additional performance gains from my changes to the filterSize == 8 case were marginal, so I took them out of this patch to show only the work with a clear improvement. I may submit changes to the other function in the future.

I also removed my changes to vertical scaling from the patch series because there are some problems with the existing checkasm test for yuv2planeX on aarch64. Martin, do you know why the reference function used for testing in tests/checkasm/sw_scale.c is different from the one in libswscale/output.c? I haven't figured out how to reconcile these differences, so I will resubmit that change once I do.

Thanks again for your reviews!

Jonathan Swinney