From patchwork Wed Mar 11 12:04:32 2020
X-Patchwork-Submitter: Carl Eugen Hoyos
X-Patchwork-Id: 18125
From: Carl Eugen Hoyos
Date: Wed, 11 Mar 2020 13:04:32 +0100
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] lavc/aarch64: Move non-neon vp9 copy functions out of neon source file

Hi!

The attached patch fixes part of ticket #8565 (compilation with --disable-neon is broken on aarch64).

Please comment,
Carl Eugen

From d96c8d26802978077d5d32b7aa2b535eca99cfea Mon Sep 17 00:00:00 2001
From: Carl Eugen Hoyos
Date: Wed, 11 Mar 2020 13:01:02 +0100
Subject: [PATCH] lavc/aarch64: Move non-neon vp9 copy functions out of neon source file.

ff_vp9_copy32_aarch64, ff_vp9_copy64_aarch64 and ff_vp9_copy128_aarch64
use only general-purpose registers, but they were defined in
vp9mc_neon.S and vp9mc_16bpp_neon.S, which are not assembled when NEON
is disabled. Move them to a new file that is always built for the VP9
decoder.

Fixes part of ticket #8565.
---
 libavcodec/aarch64/Makefile           |  1 +
 libavcodec/aarch64/vp9mc_16bpp_neon.S | 25 ---------
 libavcodec/aarch64/vp9mc_aarch64.S    | 81 +++++++++++++++++++++++++++
 libavcodec/aarch64/vp9mc_neon.S       | 30 ----------
 4 files changed, 82 insertions(+), 55 deletions(-)
 create mode 100644 libavcodec/aarch64/vp9mc_aarch64.S

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 00f93bf59f..90e7210ee0 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -21,6 +21,7 @@ OBJS-$(CONFIG_VC1DSP) += aarch64/vc1dsp_init_aarch64.o
 OBJS-$(CONFIG_VORBIS_DECODER) += aarch64/vorbisdsp_init.o
 OBJS-$(CONFIG_VP9_DECODER) += aarch64/vp9dsp_init_10bpp_aarch64.o \
                               aarch64/vp9dsp_init_12bpp_aarch64.o \
+                              aarch64/vp9mc_aarch64.o \
                               aarch64/vp9dsp_init_aarch64.o
 
 # ARMv8 optimizations
diff --git a/libavcodec/aarch64/vp9mc_16bpp_neon.S b/libavcodec/aarch64/vp9mc_16bpp_neon.S
index cac6428709..53b372c262 100644
--- a/libavcodec/aarch64/vp9mc_16bpp_neon.S
+++ b/libavcodec/aarch64/vp9mc_16bpp_neon.S
@@ -25,31 +25,6 @@
 //                            const uint8_t *ref, ptrdiff_t ref_stride,
 //                            int h, int mx, int my);
 
-function ff_vp9_copy128_aarch64, export=1
-1:
-        ldp             x5, x6, [x2]
-        ldp             x7, x8, [x2, #16]
-        stp             x5, x6, [x0]
-        ldp             x9, x10, [x2, #32]
-        stp             x7, x8, [x0, #16]
-        subs            w4, w4, #1
-        ldp             x11, x12, [x2, #48]
-        stp             x9, x10, [x0, #32]
-        stp             x11, x12, [x0, #48]
-        ldp             x5, x6, [x2, #64]
-        ldp             x7, x8, [x2, #80]
-        stp             x5, x6, [x0, #64]
-        ldp             x9, x10, [x2, #96]
-        stp             x7, x8, [x0, #80]
-        ldp             x11, x12, [x2, #112]
-        stp             x9, x10, [x0, #96]
-        stp             x11, x12, [x0, #112]
-        add             x2, x2, x3
-        add             x0, x0, x1
-        b.ne            1b
-        ret
-endfunc
-
 function ff_vp9_avg64_16_neon, export=1
         mov             x5, x0
         sub             x1, x1, #64
diff --git a/libavcodec/aarch64/vp9mc_aarch64.S b/libavcodec/aarch64/vp9mc_aarch64.S
new file mode 100644
index 0000000000..f17a8cf04a
--- /dev/null
+++ b/libavcodec/aarch64/vp9mc_aarch64.S
@@ -0,0 +1,81 @@
+/*
+ * Copyright (c) 2016 Google Inc.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+// All public functions in this file have the following signature:
+// typedef void (*vp9_mc_func)(uint8_t *dst, ptrdiff_t dst_stride,
+//                            const uint8_t *ref, ptrdiff_t ref_stride,
+//                            int h, int mx, int my);
+
+function ff_vp9_copy128_aarch64, export=1
+1:
+        ldp             x5, x6, [x2]
+        ldp             x7, x8, [x2, #16]
+        stp             x5, x6, [x0]
+        ldp             x9, x10, [x2, #32]
+        stp             x7, x8, [x0, #16]
+        subs            w4, w4, #1
+        ldp             x11, x12, [x2, #48]
+        stp             x9, x10, [x0, #32]
+        stp             x11, x12, [x0, #48]
+        ldp             x5, x6, [x2, #64]
+        ldp             x7, x8, [x2, #80]
+        stp             x5, x6, [x0, #64]
+        ldp             x9, x10, [x2, #96]
+        stp             x7, x8, [x0, #80]
+        ldp             x11, x12, [x2, #112]
+        stp             x9, x10, [x0, #96]
+        stp             x11, x12, [x0, #112]
+        add             x2, x2, x3
+        add             x0, x0, x1
+        b.ne            1b
+        ret
+endfunc
+
+function ff_vp9_copy64_aarch64, export=1
+1:
+        ldp             x5, x6, [x2]
+        ldp             x7, x8, [x2, #16]
+        stp             x5, x6, [x0]
+        ldp             x9, x10, [x2, #32]
+        stp             x7, x8, [x0, #16]
+        subs            w4, w4, #1
+        ldp             x11, x12, [x2, #48]
+        stp             x9, x10, [x0, #32]
+        stp             x11, x12, [x0, #48]
+        add             x2, x2, x3
+        add             x0, x0, x1
+        b.ne            1b
+        ret
+endfunc
+
+function ff_vp9_copy32_aarch64, export=1
+1:
+        ldp             x5, x6, [x2]
+        ldp             x7, x8, [x2, #16]
+        stp             x5, x6, [x0]
+        subs            w4, w4, #1
+        stp             x7, x8, [x0, #16]
+        add             x2, x2, x3
+        add             x0, x0, x1
+        b.ne            1b
+        ret
+endfunc
diff --git a/libavcodec/aarch64/vp9mc_neon.S b/libavcodec/aarch64/vp9mc_neon.S
index f67624ca04..abf2bae9db 100644
--- a/libavcodec/aarch64/vp9mc_neon.S
+++ b/libavcodec/aarch64/vp9mc_neon.S
@@ -25,23 +25,6 @@
 //                            const uint8_t *ref, ptrdiff_t ref_stride,
 //                            int h, int mx, int my);
 
-function ff_vp9_copy64_aarch64, export=1
-1:
-        ldp             x5, x6, [x2]
-        ldp             x7, x8, [x2, #16]
-        stp             x5, x6, [x0]
-        ldp             x9, x10, [x2, #32]
-        stp             x7, x8, [x0, #16]
-        subs            w4, w4, #1
-        ldp             x11, x12, [x2, #48]
-        stp             x9, x10, [x0, #32]
-        stp             x11, x12, [x0, #48]
-        add             x2, x2, x3
-        add             x0, x0, x1
-        b.ne            1b
-        ret
-endfunc
-
 function ff_vp9_avg64_neon, export=1
         mov             x5, x0
 1:
@@ -64,19 +47,6 @@ function ff_vp9_avg64_neon, export=1
         ret
 endfunc
 
-function ff_vp9_copy32_aarch64, export=1
-1:
-        ldp             x5, x6, [x2]
-        ldp             x7, x8, [x2, #16]
-        stp             x5, x6, [x0]
-        subs            w4, w4, #1
-        stp             x7, x8, [x0, #16]
-        add             x2, x2, x3
-        add             x0, x0, x1
-        b.ne            1b
-        ret
-endfunc
-
 function ff_vp9_avg32_neon, export=1
 1:
         ld1             {v2.16b, v3.16b}, [x2], x3
-- 
2.24.1