From patchwork Fri Dec 8 22:14:13 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 6618
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 8 Dec 2017 19:14:13 -0300
Message-Id: <20171208221413.2492-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.15.0
In-Reply-To: <20171208191300.GN4636@nb4>
References: <20171208191300.GN4636@nb4>
Subject: [FFmpeg-devel] [PATCH] arm/hevc_idct: fix compilation on Android

Compilation error "out of range" fixed for armeabi-v7a.

Compilation failed when building libvlc.aar for ARMv7 Android on an
Ubuntu 16.04 host. The error message is "Offset out of range".

The cause is that the assembler LDR directives in the function
"ff_hevc_transform_luma_4x4_neon_8" need local storage within range
(<1k), but no such storage was provided.
Based on a patch by Ihor Bobalo
Suggested-by: wbs
Signed-off-by: James Almer
---
This solution prevents increasing the diff of functions shared with libav
and thus makes future merges easier.

Untested.

 libavcodec/arm/hevcdsp_idct_neon.S | 119 +++++++++++++++++++------------------
 1 file changed, 60 insertions(+), 59 deletions(-)

diff --git a/libavcodec/arm/hevcdsp_idct_neon.S b/libavcodec/arm/hevcdsp_idct_neon.S
index 139029a256..75795e6a6a 100644
--- a/libavcodec/arm/hevcdsp_idct_neon.S
+++ b/libavcodec/arm/hevcdsp_idct_neon.S
@@ -229,65 +229,6 @@ function ff_hevc_add_residual_32x32_10_neon, export=1
         bx lr
 endfunc
 
-/* uses registers q2 - q9 for temp values */
-/* TODO: reorder */
-.macro tr4_luma_shift r0, r1, r2, r3, shift
-        vaddl.s16   q5, \r0, \r2      // c0 = src0 + src2
-        vaddl.s16   q2, \r2, \r3      // c1 = src2 + src3
-        vsubl.s16   q4, \r0, \r3      // c2 = src0 - src3
-        vmull.s16   q6, \r1, d0[0]    // c3 = 74 * src1
-
-        vaddl.s16   q7, \r0, \r3      // src0 + src3
-        vsubw.s16   q7, q7, \r2       // src0 - src2 + src3
-        vmul.s32    q7, q7, d0[0]     // dst2 = 74 * (src0 - src2 + src3)
-
-        vmul.s32    q8, q5, d0[1]     // 29 * c0
-        vmul.s32    q9, q2, d1[0]     // 55 * c1
-        vadd.s32    q8, q9            // 29 * c0 + 55 * c1
-        vadd.s32    q8, q6            // dst0 = 29 * c0 + 55 * c1 + c3
-
-        vmul.s32    q2, q2, d0[1]     // 29 * c1
-        vmul.s32    q9, q4, d1[0]     // 55 * c2
-        vsub.s32    q9, q2            // 55 * c2 - 29 * c1
-        vadd.s32    q9, q6            // dst1 = 55 * c2 - 29 * c1 + c3
-
-        vmul.s32    q5, q5, d1[0]     // 55 * c0
-        vmul.s32    q4, q4, d0[1]     // 29 * c2
-        vadd.s32    q5, q4            // 55 * c0 + 29 * c2
-        vsub.s32    q5, q6            // dst3 = 55 * c0 + 29 * c2 - c3
-
-        vqrshrn.s32 \r0, q8, \shift
-        vqrshrn.s32 \r1, q9, \shift
-        vqrshrn.s32 \r2, q7, \shift
-        vqrshrn.s32 \r3, q5, \shift
-.endm
-
-function ff_hevc_transform_luma_4x4_neon_8, export=1
-        vpush       {d8-d15}
-        vld1.16     {q14, q15}, [r0]  // coeffs
-        ldr         r3, =0x4a         // 74
-        vmov.32     d0[0], r3
-        ldr         r3, =0x1d         // 29
-        vmov.32     d0[1], r3
-        ldr         r3, =0x37         // 55
-        vmov.32     d1[0], r3
-
-        tr4_luma_shift d28, d29, d30, d31, #7
-
-        vtrn.16     d28, d29
-        vtrn.16     d30, d31
-        vtrn.32     q14, q15
-
-        tr4_luma_shift d28, d29, d30, d31, #12
-
-        vtrn.16     d28, d29
-        vtrn.16     d30, d31
-        vtrn.32     q14, q15
-        vst1.16     {q14, q15}, [r0]
-        vpop        {d8-d15}
-        bx lr
-endfunc
-
 .macro idct_4x4_dc bitdepth
 function ff_hevc_idct_4x4_dc_\bitdepth\()_neon, export=1
         ldrsh r1, [r0]
@@ -1040,3 +981,63 @@ idct_32x32 8
 idct_32x32_dc 8
 idct_32x32 10
 idct_32x32_dc 10
+
+/* uses registers q2 - q9 for temp values */
+/* TODO: reorder */
+.macro tr4_luma_shift r0, r1, r2, r3, shift
+        vaddl.s16   q5, \r0, \r2      // c0 = src0 + src2
+        vaddl.s16   q2, \r2, \r3      // c1 = src2 + src3
+        vsubl.s16   q4, \r0, \r3      // c2 = src0 - src3
+        vmull.s16   q6, \r1, d0[0]    // c3 = 74 * src1
+
+        vaddl.s16   q7, \r0, \r3      // src0 + src3
+        vsubw.s16   q7, q7, \r2       // src0 - src2 + src3
+        vmul.s32    q7, q7, d0[0]     // dst2 = 74 * (src0 - src2 + src3)
+
+        vmul.s32    q8, q5, d0[1]     // 29 * c0
+        vmul.s32    q9, q2, d1[0]     // 55 * c1
+        vadd.s32    q8, q9            // 29 * c0 + 55 * c1
+        vadd.s32    q8, q6            // dst0 = 29 * c0 + 55 * c1 + c3
+
+        vmul.s32    q2, q2, d0[1]     // 29 * c1
+        vmul.s32    q9, q4, d1[0]     // 55 * c2
+        vsub.s32    q9, q2            // 55 * c2 - 29 * c1
+        vadd.s32    q9, q6            // dst1 = 55 * c2 - 29 * c1 + c3
+
+        vmul.s32    q5, q5, d1[0]     // 55 * c0
+        vmul.s32    q4, q4, d0[1]     // 29 * c2
+        vadd.s32    q5, q4            // 55 * c0 + 29 * c2
+        vsub.s32    q5, q6            // dst3 = 55 * c0 + 29 * c2 - c3
+
+        vqrshrn.s32 \r0, q8, \shift
+        vqrshrn.s32 \r1, q9, \shift
+        vqrshrn.s32 \r2, q7, \shift
+        vqrshrn.s32 \r3, q5, \shift
+.endm
+
+.ltorg
+function ff_hevc_transform_luma_4x4_neon_8, export=1
+        vpush       {d8-d15}
+        vld1.16     {q14, q15}, [r0]  // coeffs
+        ldr         r3, =0x4a         // 74
+        vmov.32     d0[0], r3
+        ldr         r3, =0x1d         // 29
+        vmov.32     d0[1], r3
+        ldr         r3, =0x37         // 55
+        vmov.32     d1[0], r3
+
+        tr4_luma_shift d28, d29, d30, d31, #7
+
+        vtrn.16     d28, d29
+        vtrn.16     d30, d31
+        vtrn.32     q14, q15
+
+        tr4_luma_shift d28, d29, d30, d31, #12
+
+        vtrn.16     d28, d29
+        vtrn.16     d30, d31
+        vtrn.32     q14, q15
+        vst1.16     {q14, q15}, [r0]
+        vpop        {d8-d15}
+        bx lr
+endfunc