[FFmpeg-devel] arm32/neon: Avoid using bge/beq for function calls

Message ID CACKH++YDMtFrb7eMf__D=yXkVjHDZ_P6aC1qawCqUX6XMaJO6Q@mail.gmail.com
State New

Checks

Context                          Check    Description
yinshiyou/configure_loongarch64  warning  Failed to apply patch
andriy/make_x86                  success  Make finished
andriy/make_fate_x86             success  Make fate finished

Commit Message

Rui Ueyama Jan. 7, 2023, 3:54 a.m. UTC
It looks like compiler-generated code always uses `b`, `bl` or `blx`
instructions for function calls. These instructions have a 24-bit
immediate and therefore can jump anywhere between PC +- 16 MiB.

This hand-written assembly code instead uses `bge` and `beq` for
interprocedural jumps. Since these instructions have only a 19-bit
immediate (we have fewer bits because of the condition code), they can jump only
within PC +- 512 KiB. This sometimes causes a "relocation R_ARM_THM_JUMP19
out of range" error when linked with the mold linker. This error can
easily be avoided by using `b` instead of `bge` or `beq`.

Signed-off-by: Rui Ueyama <rui314@gmail.com>
---
 libswresample/arm/audio_convert_neon.S | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--
2.34.1
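
As a rough check of the ranges quoted in the commit message, the sketch below computes how far a branch can reach from the width of its signed immediate, assuming Thumb-2 encodings in which the immediate counts 2-byte units. The helper names are hypothetical and not part of FFmpeg or mold. Note that the Arm architecture manual gives the conditional B<c>.W encoding a 20-bit immediate (about PC +- 1 MiB), so the 19-bit figure, which matches the relocation name, is the more conservative bound.

```python
KIB = 1024
MIB = 1024 * KIB


def branch_reach_bytes(imm_bits, unit_bytes=2):
    """Backward reach in bytes of a signed, imm_bits-wide branch immediate.

    Assumption: the immediate is a two's-complement count of unit_bytes-sized
    units (2 bytes for Thumb-2), so the most negative value is
    -2**(imm_bits - 1) units.
    """
    return (1 << (imm_bits - 1)) * unit_bytes


def fits(offset, imm_bits, unit_bytes=2):
    """True if a byte offset is encodable in the given immediate width."""
    reach = branch_reach_bytes(imm_bits, unit_bytes)
    aligned = offset % unit_bytes == 0
    # The forward limit is one unit short of the backward limit.
    return aligned and -reach <= offset <= reach - unit_bytes


if __name__ == "__main__":
    cases = (
        ("b/bl/blx, 24-bit immediate", 24),
        ("bge/beq as stated in the message, 19-bit", 19),
        ("conditional B<c>.W per the Arm manual, 20-bit", 20),
    )
    for name, bits in cases:
        print(f"{name}: PC +- {branch_reach_bytes(bits) // KIB} KiB")
```

With these assumptions, a target 600 KiB away is encodable with the 24-bit form but not with the 19-bit bound, which is the kind of overflow a linker would report as an out-of-range relocation.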

Comments

Martin Storsjö Jan. 9, 2023, 4:01 p.m. UTC | #1
Hi Rui,

Long time no see!

On Sat, 7 Jan 2023, Rui Ueyama wrote:

> It looks like compiler-generated code always uses `b`, `bl` or `blx`
> instructions for function calls. These instructions have a 24-bit
> immediate and therefore can jump anywhere between PC +- 16 MiB.
>
> This hand-written assembly code instead uses `bge` and `beq` for
> interprocedural jumps. Since these instructions have only a 19-bit
> immediate (we have fewer bits because of the condition code), they can jump only
> within PC +- 512 KiB. This sometimes causes a "relocation R_ARM_THM_JUMP19
> out of range" error when linked with the mold linker. This error can
> easily be avoided by using `b` instead of `bge` or `beq`.

Can you add a bit more explanation about what happens in mold in this case 
and context about the setup - I don't quite understand how this can happen 
(even if the code admittedly is a bit unusual)?

Since .L_swri_oldapi_conv_flt_to_s16_neon and
.L_swri_oldapi_conv_fltp_to_s16_2ch_neon are local symbols, they don't get
emitted by the assembler, and the branch instructions are encoded with
fixed offsets and no relocations. And even if there were a relocation,
the destination is within the same text section chunk in the object file,
so it shouldn't be possible for it to be out of range.

The only way for this to be out of range is if the destination is
treated as a global and routed via the PLT?

What am I missing here?

// Martin
Martin Storsjö Jan. 9, 2023, 9:48 p.m. UTC | #2
On Mon, 9 Jan 2023, Martin Storsjö wrote:

> Hi Rui,
>
> Long time no see!
>
> On Sat, 7 Jan 2023, Rui Ueyama wrote:
>
>> It looks like compiler-generated code always uses `b`, `bl` or `blx`
>> instructions for function calls. These instructions have a 24-bit
>> immediate and therefore can jump anywhere between PC +- 16 MiB.
>> 
>> This hand-written assembly code instead uses `bge` and `beq` for
>> interprocedural jumps. Since these instructions have only a 19-bit
>> immediate (we have fewer bits because of the condition code), they can jump only
>> within PC +- 512 KiB. This sometimes causes a "relocation R_ARM_THM_JUMP19
>> out of range" error when linked with the mold linker. This error can
>> easily be avoided by using `b` instead of `bge` or `beq`.
>
> Can you add a bit more explanation about what happens in mold in this case 
> and context about the setup - I don't quite understand how this can happen 
> (even if the code admittedly is a bit unusual)?
>
> Since .L_swri_oldapi_conv_flt_to_s16_neon and
> .L_swri_oldapi_conv_fltp_to_s16_2ch_neon are local symbols, they don't get
> emitted by the assembler, and the branch instructions are encoded with fixed
> offsets and no relocations. And even if there were a relocation, the
> destination is within the same text section chunk in the object file, so it
> shouldn't be possible for it to be out of range.
>
> The only way for this to be out of range is if the destination is
> treated as a global and routed via the PLT?
>
> What am I missing here?

In particular, it seems like the commits 
b22db4f465c9adb2cf1489e04f7b65ef6bb55b8b and 
e84212b78e00df17799e01be1e153a073eb8f689 were introduced to fix exactly 
this issue - by converting references from using the external global 
symbols into local labels instead.

// Martin
Rui Ueyama Jan. 14, 2023, 4:08 a.m. UTC | #3
Hey Martin,

It's nice to see you on this mailing list!

Sorry for sending this as a reply to the wrong email; I didn't receive
your mail, so I couldn't reply to it directly.

> On Sat, 7 Jan 2023, Rui Ueyama wrote:
>
> > It looks like compiler-generated code always uses `b`, `bl` or `blx`
> > instructions for function calls. These instructions have a 24-bit
> > immediate and therefore can jump anywhere between PC +- 16 MiB.
> >
> > This hand-written assembly code instead uses `bge` and `beq` for
> > interprocedural jumps. Since these instructions have only a 19-bit
> > immediate (we have fewer bits because of the condition code), they can jump only
> > within PC +- 512 KiB. This sometimes causes a "relocation R_ARM_THM_JUMP19
> > out of range" error when linked with the mold linker. This error can
> > easily be avoided by using `b` instead of `bge` or `beq`.
>
> Can you add a bit more explanation about what happens in mold in this case
> and context about the setup - I don't quite understand how this can happen
> (even if the code admittedly is a bit unusual)?
>
> Since .L_swri_oldapi_conv_flt_to_s16_neon and
> .L_swri_oldapi_conv_fltp_to_s16_2ch_neon are local symbols, they don't get
> emitted by the assembler, and the branch instructions are encoded with
> fixed offsets and no relocations. And even if there were a relocation,
> the destination is within the same text section chunk in the object file,
> so it shouldn't be possible for it to be out of range.
>
> The only way for this to be out of range is if the destination is
> treated as a global and routed via the PLT?

There was confusion on our side. FFmpeg used to contain two files
named audio_convert_neon.S:

 libswresample/arm/audio_convert_neon.S
 libavresample/arm/audio_convert_neon.S

The latter had the problem I described in my previous mail, but that
file has since been removed, so there is no problem with the existing
code. I'll retract the patch I sent. Sorry for the confusion.

Rui
Martin Storsjö Jan. 14, 2023, 10:30 p.m. UTC | #4
Hi Rui,

On Sat, 14 Jan 2023, Rui Ueyama wrote:

>> On Sat, 7 Jan 2023, Rui Ueyama wrote:
>>
>>> It looks like compiler-generated code always uses `b`, `bl` or `blx`
>>> instructions for function calls. These instructions have a 24-bit
>>> immediate and therefore can jump anywhere between PC +- 16 MiB.
>>>
>>> This hand-written assembly code instead uses `bge` and `beq` for
>>> interprocedural jumps. Since these instructions have only a 19-bit
>>> immediate (we have fewer bits because of the condition code), they can jump only
>>> within PC +- 512 KiB. This sometimes causes a "relocation R_ARM_THM_JUMP19
>>> out of range" error when linked with the mold linker. This error can
>>> easily be avoided by using `b` instead of `bge` or `beq`.
>>
>> Can you add a bit more explanation about what happens in mold in this case
>> and context about the setup - I don't quite understand how this can happen
>> (even if the code admittedly is a bit unusual)?
>>
>> Since .L_swri_oldapi_conv_flt_to_s16_neon and
>> .L_swri_oldapi_conv_fltp_to_s16_2ch_neon are local symbols, they don't get
>> emitted by the assembler, and the branch instructions are encoded with
>> fixed offsets and no relocations. And even if there were a relocation,
>> the destination is within the same text section chunk in the object file,
>> so it shouldn't be possible for it to be out of range.
>>
>> The only way for this to be out of range is if the destination is
>> treated as a global and routed via the PLT?
>
> There was confusion on our side. FFmpeg used to contain two files
> named audio_convert_neon.S:
>
> libswresample/arm/audio_convert_neon.S
> libavresample/arm/audio_convert_neon.S
>
> The latter had the problem I described in my previous mail, but that
> file has since been removed, so there is no problem with the existing
> code. I'll retract the patch I sent. Sorry for the confusion.

Ah, I see - that explains it!

Ok, good then, that there's no issue with that code pattern in assembly - 
otherwise there could be a whole lot of issues to run into...

// Martin
Patch

diff --git a/libswresample/arm/audio_convert_neon.S b/libswresample/arm/audio_convert_neon.S
index 085d50aafa..3fe114772c 100644
--- a/libswresample/arm/audio_convert_neon.S
+++ b/libswresample/arm/audio_convert_neon.S
@@ -133,12 +133,13 @@  endfunc

 function swri_oldapi_conv_fltp_to_s16_nch_neon, export=1
         cmp             r3,  #2
-        itt             lt
-        ldrlt           r1,  [r1]
-        blt             .L_swri_oldapi_conv_flt_to_s16_neon
-        beq             .L_swri_oldapi_conv_fltp_to_s16_2ch_neon
+        bgt             2f
+        beq             1f
+        ldr             r1,  [r1]
+        b               .L_swri_oldapi_conv_flt_to_s16_neon
+1:      b               .L_swri_oldapi_conv_fltp_to_s16_2ch_neon

-        push            {r4-r8, lr}
+2:      push            {r4-r8, lr}
         cmp             r3,  #4
         lsl             r12, r3,  #1
         blt             4f
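
The rewritten prologue above is meant to keep the original three-way dispatch on r3 while using only unconditional `b` for the jumps that leave the function. A minimal Python model of both versions follows. It is not FFmpeg code: the function names and string labels are hypothetical, and r3 is assumed to hold a signed channel count.

```python
def dispatch_before(nch):
    """Model of the original prologue: cmp r3, #2 ; itt lt ; ldrlt ; blt ; beq."""
    if nch < 2:
        # ldrlt r1, [r1] followed by blt to the interleaved-float routine
        return ("flt_to_s16", True)
    if nch == 2:
        # beq straight to the two-channel routine
        return ("fltp_to_s16_2ch", False)
    # Fall through to push {r4-r8, lr} and the generic n-channel body
    return ("nch_body", False)


def dispatch_after(nch):
    """Model of the patched prologue: bgt 2f ; beq 1f ; ldr ; b ; 1: b ; 2: push."""
    if nch > 2:
        # bgt 2f: skip ahead to the generic n-channel body
        return ("nch_body", False)
    if nch == 2:
        # beq 1f, then an unconditional b to the two-channel routine
        return ("fltp_to_s16_2ch", False)
    # Remaining case is nch < 2: ldr r1, [r1], then an unconditional b
    return ("flt_to_s16", True)


if __name__ == "__main__":
    for nch in range(0, 9):
        assert dispatch_before(nch) == dispatch_after(nch)
    print("both prologues choose the same target for 0..8 channels")
```

The second element of each tuple records whether r1 is dereferenced before the jump, which happens only on the path taken when r3 is below 2 in both versions.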