From patchwork Thu Aug 19 21:31:01 2021
X-Patchwork-Submitter: Mikhail Nitenko
X-Patchwork-Id: 29623
From: Mikhail Nitenko
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 20 Aug 2021 00:31:01 +0300
Message-Id: <20210819213102.603690-1-mnitenko@gmail.com>
X-Mailer: git-send-email 2.32.0
Subject: [FFmpeg-devel] [PATCH 1/2] lavc/aarch64: move transpose_4x4S and transpose_8x8S to neon.S

transpose_4x4S and transpose_8x8S were declared in vp9itxfm_16bpp_neon,
however these macros are not unique to vp9 and could be used elsewhere.

Signed-off-by: Mikhail Nitenko
---
 libavcodec/aarch64/neon.S                | 49 ++++++++++++++++++++++++
 libavcodec/aarch64/vp9itxfm_16bpp_neon.S | 49 ------------------------
 2 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/libavcodec/aarch64/neon.S b/libavcodec/aarch64/neon.S
index 1ad32c359d..4186186185 100644
--- a/libavcodec/aarch64/neon.S
+++ b/libavcodec/aarch64/neon.S
@@ -160,3 +160,52 @@
         trn2 \r7\().2D, \r9\().2D, \r7\().2D

 .endm
+
+.macro transpose_4x4S r0, r1, r2, r3, r4, r5, r6, r7
+        trn1 \r4\().4s, \r0\().4s, \r1\().4s
+        trn2 \r5\().4s, \r0\().4s, \r1\().4s
+        trn1 \r6\().4s, \r2\().4s, \r3\().4s
+        trn2 \r7\().4s, \r2\().4s, \r3\().4s
+        trn1 \r0\().2d, \r4\().2d, \r6\().2d
+        trn2 \r2\().2d, \r4\().2d, \r6\().2d
+        trn1 \r1\().2d, \r5\().2d, \r7\().2d
+        trn2 \r3\().2d, \r5\().2d, \r7\().2d
+.endm
+
+// Transpose a 8x8 matrix of 32 bit elements, where each row is spread out
+// over two registers.
+.macro transpose_8x8S r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r13, r14, r15, t0, t1, t2, t3
+        transpose_4x4S \r0, \r2, \r4, \r6, \t0, \t1, \t2, \t3
+        transpose_4x4S \r9, \r11, \r13, \r15, \t0, \t1, \t2, \t3
+
+        // Do 4x4 transposes of r1,r3,r5,r7 and r8,r10,r12,r14
+        // while swapping the two 4x4 matrices between each other
+
+        // First step of the 4x4 transpose of r1-r7, into t0-t3
+        trn1 \t0\().4s, \r1\().4s, \r3\().4s
+        trn2 \t1\().4s, \r1\().4s, \r3\().4s
+        trn1 \t2\().4s, \r5\().4s, \r7\().4s
+        trn2 \t3\().4s, \r5\().4s, \r7\().4s
+
+        // First step of the 4x4 transpose of r8-r12, into r1-r7
+        trn1 \r1\().4s, \r8\().4s, \r10\().4s
+        trn2 \r3\().4s, \r8\().4s, \r10\().4s
+        trn1 \r5\().4s, \r12\().4s, \r14\().4s
+        trn2 \r7\().4s, \r12\().4s, \r14\().4s
+
+        // Second step of the 4x4 transpose of r1-r7 (now in t0-r3), into r8-r12
+        trn1 \r8\().2d, \t0\().2d, \t2\().2d
+        trn2 \r12\().2d, \t0\().2d, \t2\().2d
+        trn1 \r10\().2d, \t1\().2d, \t3\().2d
+        trn2 \r14\().2d, \t1\().2d, \t3\().2d
+
+        // Second step of the 4x4 transpose of r8-r12 (now in r1-r7), in place as far as possible
+        trn1 \t0\().2d, \r1\().2d, \r5\().2d
+        trn2 \r5\().2d, \r1\().2d, \r5\().2d
+        trn1 \t1\().2d, \r3\().2d, \r7\().2d
+        trn2 \r7\().2d, \r3\().2d, \r7\().2d
+
+        // Move the outputs of trn1 back in place
+        mov \r1\().16b, \t0\().16b
+        mov \r3\().16b, \t1\().16b
+.endm
\ No newline at end of file
diff --git a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S b/libavcodec/aarch64/vp9itxfm_16bpp_neon.S
index 68296d9c40..a165ab3271 100644
--- a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S
+++ b/libavcodec/aarch64/vp9itxfm_16bpp_neon.S
@@ -41,55 +41,6 @@ const iadst16_coeffs, align=4
         .short 14811, 7005, 13160, 9760, 5520, 15426, 2404, 16207
 endconst

-.macro transpose_4x4s r0, r1, r2, r3, r4, r5, r6, r7
-        trn1 \r4\().4s, \r0\().4s, \r1\().4s
-        trn2 \r5\().4s, \r0\().4s, \r1\().4s
-        trn1 \r6\().4s, \r2\().4s, \r3\().4s
-        trn2 \r7\().4s, \r2\().4s, \r3\().4s
-        trn1 \r0\().2d, \r4\().2d, \r6\().2d
-        trn2 \r2\().2d, \r4\().2d, \r6\().2d
-        trn1 \r1\().2d, \r5\().2d, \r7\().2d
-        trn2 \r3\().2d, \r5\().2d, \r7\().2d
-.endm
-
-// Transpose a 8x8 matrix of 32 bit elements, where each row is spread out
-// over two registers.
-.macro transpose_8x8s r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r13, r14, r15, t0, t1, t2, t3
-        transpose_4x4s \r0, \r2, \r4, \r6, \t0, \t1, \t2, \t3
-        transpose_4x4s \r9, \r11, \r13, \r15, \t0, \t1, \t2, \t3
-
-        // Do 4x4 transposes of r1,r3,r5,r7 and r8,r10,r12,r14
-        // while swapping the two 4x4 matrices between each other
-
-        // First step of the 4x4 transpose of r1-r7, into t0-t3
-        trn1 \t0\().4s, \r1\().4s, \r3\().4s
-        trn2 \t1\().4s, \r1\().4s, \r3\().4s
-        trn1 \t2\().4s, \r5\().4s, \r7\().4s
-        trn2 \t3\().4s, \r5\().4s, \r7\().4s
-
-        // First step of the 4x4 transpose of r8-r12, into r1-r7
-        trn1 \r1\().4s, \r8\().4s, \r10\().4s
-        trn2 \r3\().4s, \r8\().4s, \r10\().4s
-        trn1 \r5\().4s, \r12\().4s, \r14\().4s
-        trn2 \r7\().4s, \r12\().4s, \r14\().4s
-
-        // Second step of the 4x4 transpose of r1-r7 (now in t0-r3), into r8-r12
-        trn1 \r8\().2d, \t0\().2d, \t2\().2d
-        trn2 \r12\().2d, \t0\().2d, \t2\().2d
-        trn1 \r10\().2d, \t1\().2d, \t3\().2d
-        trn2 \r14\().2d, \t1\().2d, \t3\().2d
-
-        // Second step of the 4x4 transpose of r8-r12 (now in r1-r7), in place as far as possible
-        trn1 \t0\().2d, \r1\().2d, \r5\().2d
-        trn2 \r5\().2d, \r1\().2d, \r5\().2d
-        trn1 \t1\().2d, \r3\().2d, \r7\().2d
-        trn2 \r7\().2d, \r3\().2d, \r7\().2d
-
-        // Move the outputs of trn1 back in place
-        mov \r1\().16b, \t0\().16b
-        mov \r3\().16b, \t1\().16b
-.endm
-
 // out1 = ((in1 + in2) * d0[0] + (1 << 13)) >> 14
 // out2 = ((in1 - in2) * d0[0] + (1 << 13)) >> 14
 // in/out are .4s registers; this can do with 4 temp registers, but is