[FFmpeg-devel] arm/hevc_idct: fix compilation on Android

Message ID	20171208221413.2492-1-jamrial@gmail.com
State	Accepted
Commit	36de24d5b7d67ab323ed41c7dc06fa0345404227

Headers

Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	client-ip=79.124.17.100; 
From: James Almer <jamrial@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Fri,  8 Dec 2017 19:14:13 -0300
Message-Id: <20171208221413.2492-1-jamrial@gmail.com>
In-Reply-To: <20171208191300.GN4636@nb4>
References: <20171208191300.GN4636@nb4>
Subject: [FFmpeg-devel] [PATCH] arm/hevc_idct: fix compilation on Android
Precedence: list
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Commit Message

James Almer Dec. 8, 2017, 10:14 p.m. UTC

Fix the "out of range" compilation error for armeabi-v7a. Compilation failed
when building libvlc.aar for ARMv7 Android on an Ubuntu 16.04 host, with the
error message "Offset out of range". The cause is that the assembler LDR
directives in function "ff_hevc_transform_luma_4x4_neon_8" need local storage
within a range of <1k, but no such storage is provided.

Based on a patch by Ihor Bobalo <bob@eleks.com>

Suggested-by: wbs
Signed-off-by: James Almer <jamrial@gmail.com>
---
This solution prevents increasing the diff of functions shared with
libav and thus makes future merges easier.

Untested.

 libavcodec/arm/hevcdsp_idct_neon.S | 119 +++++++++++++++++++------------------
 1 file changed, 60 insertions(+), 59 deletions(-)

Comments

Jan Ekström Dec. 9, 2017, 12:52 p.m. UTC | #1

On Sat, Dec 9, 2017 at 12:14 AM, James Almer <jamrial@gmail.com> wrote:
> Compilation error "out of range" fixed for armeabi-v7a. Compilation failed
> trying to build libvlc.aar for ARM7 android on ubuntu 16.04 host. Error
> messages is "Offset out of range". The reason of the error is assembler LDR
> directives in function "ff_hevc_transform_luma_4x4_neon_8" need local storage
> in range <1k, but no such storage provided.
>
> Based on a patch by Ihor Bobalo <bob@eleks.com>
>
> Suggested-by: wbs
> Signed-off-by: James Almer <jamrial@gmail.com>
>

Can confirm that this fixes compilation, clang+gas-preprocessor works. Thanks.

Jan

Jan Ekström Dec. 9, 2017, 1:12 p.m. UTC | #2

On Sat, Dec 9, 2017 at 2:52 PM, Jan Ekström <jeebjp@gmail.com> wrote:
> On Sat, Dec 9, 2017 at 12:14 AM, James Almer <jamrial@gmail.com> wrote:
>> Compilation error "out of range" fixed for armeabi-v7a. Compilation failed
>> trying to build libvlc.aar for ARM7 android on ubuntu 16.04 host. Error
>> messages is "Offset out of range". The reason of the error is assembler LDR
>> directives in function "ff_hevc_transform_luma_4x4_neon_8" need local storage
>> in range <1k, but no such storage provided.
>>
>> Based on a patch by Ihor Bobalo <bob@eleks.com>
>>
>> Suggested-by: wbs
>> Signed-off-by: James Almer <jamrial@gmail.com>
>>
>
> Can confirm that this fixes compilation, clang+gas-preprocessor works. Thanks.
>
> Jan

Additionally tested with armv7 GCC 4.9 as well, so nothing seems to be
broken on that side, either. This is not surprising, as this only
moves a function and a macro - as well as adds a .ltorg which causes
the literal pool to be dumped. This helps as the offsets are getting
too large otherwise for the SIMD used in the following function.

Given the limited change in this, I'd say LGTM and will push tonight
unless someone objects.

Jan
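
For readers unfamiliar with the mechanism discussed above, here is a minimal
sketch of how .ltorg interacts with an "ldr rX, =value" load in GNU as syntax.
It is not taken from the patch: the label load_const, the constant 0x12345678
and the 8192-byte gap are hypothetical and chosen only for illustration. The
constant is deliberately one that cannot be encoded as a mov immediate, so the
assembler has to fetch it from a literal pool.

```
        .syntax unified
        .arch   armv7-a
        .arm
        .text

        .global load_const              @ hypothetical example function
        .type   load_const, %function
load_const:
        ldr     r0, =0x12345678         @ pseudo-instruction: PC-relative load from the literal pool
        bx      lr
        .ltorg                          @ emit the pending literal pool here, right after the code
        .size   load_const, . - load_const

        .space  8192                    @ stands in for a large amount of following code
```

With the .ltorg in place, the pool sits a few bytes after the load. Without it,
the assembler would normally place the pool at the end of the section, past the
8192-byte gap, which is beyond the roughly 4 KB reach of a PC-relative ldr in
ARM state, so the file would be expected to fail with an out-of-range style
error similar to the one reported in this thread.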

Jan Ekström Dec. 9, 2017, 7:51 p.m. UTC | #3

On Sat, Dec 9, 2017 at 12:14 AM, James Almer <jamrial@gmail.com> wrote:
> Compilation error "out of range" fixed for armeabi-v7a. Compilation failed
> trying to build libvlc.aar for ARM7 android on ubuntu 16.04 host. Error
> messages is "Offset out of range". The reason of the error is assembler LDR
> directives in function "ff_hevc_transform_luma_4x4_neon_8" need local storage
> in range <1k, but no such storage provided.
>
> Based on a patch by Ihor Bobalo <bob@eleks.com>
>
> Suggested-by: wbs
> Signed-off-by: James Almer <jamrial@gmail.com>
> ---

Pushed. Thanks for posting the patch.

Jan

diff --git a/libavcodec/arm/hevcdsp_idct_neon.S b/libavcodec/arm/hevcdsp_idct_neon.S
index 139029a256..75795e6a6a 100644
--- a/libavcodec/arm/hevcdsp_idct_neon.S
+++ b/libavcodec/arm/hevcdsp_idct_neon.S
@@ -229,65 +229,6 @@  function ff_hevc_add_residual_32x32_10_neon, export=1
         bx              lr
 endfunc
 
-/* uses registers q2 - q9 for temp values */
-/* TODO: reorder */
-.macro tr4_luma_shift r0, r1, r2, r3, shift
-        vaddl.s16   q5, \r0, \r2    // c0 = src0 + src2
-        vaddl.s16   q2, \r2, \r3    // c1 = src2 + src3
-        vsubl.s16   q4, \r0, \r3    // c2 = src0 - src3
-        vmull.s16   q6, \r1, d0[0]  // c3 = 74 * src1
-
-        vaddl.s16   q7, \r0, \r3    // src0 + src3
-        vsubw.s16   q7, q7, \r2     // src0 - src2 + src3
-        vmul.s32    q7, q7, d0[0]   // dst2 = 74 * (src0 - src2 + src3)
-
-        vmul.s32    q8, q5, d0[1]   // 29 * c0
-        vmul.s32    q9, q2, d1[0]   // 55 * c1
-        vadd.s32    q8, q9          // 29 * c0 + 55 * c1
-        vadd.s32    q8, q6          // dst0 = 29 * c0 + 55 * c1 + c3
-
-        vmul.s32    q2, q2, d0[1]   // 29 * c1
-        vmul.s32    q9, q4, d1[0]   // 55 * c2
-        vsub.s32    q9, q2          // 55 * c2 - 29 * c1
-        vadd.s32    q9, q6          // dst1 = 55 * c2 - 29 * c1 + c3
-
-        vmul.s32    q5, q5, d1[0]   // 55 * c0
-        vmul.s32    q4, q4, d0[1]   // 29 * c2
-        vadd.s32    q5, q4          // 55 * c0 + 29 * c2
-        vsub.s32    q5, q6          // dst3 = 55 * c0 + 29 * c2 - c3
-
-        vqrshrn.s32   \r0, q8, \shift
-        vqrshrn.s32   \r1, q9, \shift
-        vqrshrn.s32   \r2, q7, \shift
-        vqrshrn.s32   \r3, q5, \shift
-.endm
-
-function ff_hevc_transform_luma_4x4_neon_8, export=1
-        vpush       {d8-d15}
-        vld1.16     {q14, q15}, [r0]  // coeffs
-        ldr         r3, =0x4a  // 74
-        vmov.32     d0[0], r3
-        ldr         r3, =0x1d  // 29
-        vmov.32     d0[1], r3
-        ldr         r3, =0x37  // 55
-        vmov.32     d1[0], r3
-
-        tr4_luma_shift d28, d29, d30, d31, #7
-
-        vtrn.16     d28, d29
-        vtrn.16     d30, d31
-        vtrn.32     q14, q15
-
-        tr4_luma_shift d28, d29, d30, d31, #12
-
-        vtrn.16     d28, d29
-        vtrn.16     d30, d31
-        vtrn.32     q14, q15
-        vst1.16     {q14, q15}, [r0]
-        vpop        {d8-d15}
-        bx lr
-endfunc
-
 .macro idct_4x4_dc bitdepth
 function ff_hevc_idct_4x4_dc_\bitdepth\()_neon, export=1
         ldrsh           r1, [r0]
@@ -1040,3 +981,63 @@  idct_32x32 8
 idct_32x32_dc 8
 idct_32x32 10
 idct_32x32_dc 10
+
+/* uses registers q2 - q9 for temp values */
+/* TODO: reorder */
+.macro tr4_luma_shift r0, r1, r2, r3, shift
+        vaddl.s16   q5, \r0, \r2    // c0 = src0 + src2
+        vaddl.s16   q2, \r2, \r3    // c1 = src2 + src3
+        vsubl.s16   q4, \r0, \r3    // c2 = src0 - src3
+        vmull.s16   q6, \r1, d0[0]  // c3 = 74 * src1
+
+        vaddl.s16   q7, \r0, \r3    // src0 + src3
+        vsubw.s16   q7, q7, \r2     // src0 - src2 + src3
+        vmul.s32    q7, q7, d0[0]   // dst2 = 74 * (src0 - src2 + src3)
+
+        vmul.s32    q8, q5, d0[1]   // 29 * c0
+        vmul.s32    q9, q2, d1[0]   // 55 * c1
+        vadd.s32    q8, q9          // 29 * c0 + 55 * c1
+        vadd.s32    q8, q6          // dst0 = 29 * c0 + 55 * c1 + c3
+
+        vmul.s32    q2, q2, d0[1]   // 29 * c1
+        vmul.s32    q9, q4, d1[0]   // 55 * c2
+        vsub.s32    q9, q2          // 55 * c2 - 29 * c1
+        vadd.s32    q9, q6          // dst1 = 55 * c2 - 29 * c1 + c3
+
+        vmul.s32    q5, q5, d1[0]   // 55 * c0
+        vmul.s32    q4, q4, d0[1]   // 29 * c2
+        vadd.s32    q5, q4          // 55 * c0 + 29 * c2
+        vsub.s32    q5, q6          // dst3 = 55 * c0 + 29 * c2 - c3
+
+        vqrshrn.s32   \r0, q8, \shift
+        vqrshrn.s32   \r1, q9, \shift
+        vqrshrn.s32   \r2, q7, \shift
+        vqrshrn.s32   \r3, q5, \shift
+.endm
+
+.ltorg
+function ff_hevc_transform_luma_4x4_neon_8, export=1
+        vpush       {d8-d15}
+        vld1.16     {q14, q15}, [r0]  // coeffs
+        ldr         r3, =0x4a  // 74
+        vmov.32     d0[0], r3
+        ldr         r3, =0x1d  // 29
+        vmov.32     d0[1], r3
+        ldr         r3, =0x37  // 55
+        vmov.32     d1[0], r3
+
+        tr4_luma_shift d28, d29, d30, d31, #7
+
+        vtrn.16     d28, d29
+        vtrn.16     d30, d31
+        vtrn.32     q14, q15
+
+        tr4_luma_shift d28, d29, d30, d31, #12
+
+        vtrn.16     d28, d29
+        vtrn.16     d30, d31
+        vtrn.32     q14, q15
+        vst1.16     {q14, q15}, [r0]
+        vpop        {d8-d15}
+        bx lr
+endfunc
