diff mbox series

[FFmpeg-devel] Replace br return with ret

Message ID 667c0538-92d0-84d6-7459-ccb4194f2ea7@arm.com
State New

Checks

Context Check Description
yinshiyou/configure_loongarch64 warning Failed to apply patch
andriy/configure_x86 warning Failed to apply patch

Commit Message

Casey Smalley July 27, 2023, 10:26 a.m. UTC
This patch changes the return instruction in the
tr_32x4 macro from br to ret.

On devices that support BTI a landing pad is
required when branching with br, or the instruction
can be replaced with a ret.

The change fixes fate-hevc-hdr-vivid-metadata when
on hardware with BTI support.

Signed-off-by: Casey Smalley <casey.smalley@arm.com>
---
  libavcodec/aarch64/hevcdsp_idct_neon.S | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Comments

Rémi Denis-Courmont July 27, 2023, 1:55 p.m. UTC | #1
Hi,

The use of RET vs BR also has microarchitectural side effects. AFAIU, RET should always be paired with an earlier BL/BLR to avoid interfering with branch prediction.

So depending on the circumstances, either one of these should be addressed:
* Clarify that this is actually a function return, and RET should be used anyway, regardless of BTI.
* Keep BR and add BTI J landing pads where appropriate, if this wasn't really a function return.
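
For illustration, the two options above might look roughly like the sketch below. This is not the FFmpeg code: the labels helper, outer_ret, outer_br and caller are hypothetical, and the landing pad is written as "hint #36" (the hint-space encoding of BTI J) on the assumption that the assembler may not accept the "bti j" mnemonic without an Armv8.5-A target.

```asm
// Illustrative sketch only: helper, outer_ret, outer_br and caller are
// hypothetical labels, not code from hevcdsp_idct_neon.S.
        .text

helper:                                 // leaf routine, entered with BL
        ret                             // returns through x30

// Option 1: the exit is a function return, so use RET.
outer_ret:
        mov     x10, x30                // save the return address; the BL below overwrites x30
        bl      helper
        ret     x10                     // RET accepts any register and needs no landing pad

// Option 2: keep BR and provide a landing pad at every place it can land.
outer_br:
        mov     x10, x30                // same save as above
        bl      helper
        br      x10                     // indirect jump; needs a BTI J target in guarded pages

caller:
        stp     x29, x30, [sp, #-16]!   // preserve the caller's frame pointer and return address
        bl      outer_br
        hint    #36                     // BTI J landing pad for the BR in outer_br
        ldp     x29, x30, [sp], #16
        ret
```

With option 2, every call site of the routine needs its own landing pad after the BL, whereas option 1 is a one-instruction change inside the routine.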

Br,
Reimar Döffinger July 27, 2023, 5:22 p.m. UTC | #2
> On 27 Jul 2023, at 15:55, Rémi Denis-Courmont <remi@remlab.net> wrote:
> 
> Hi,
> 
> The use of RET vs BR also has microarchitectural side effects. AFAIU, RET should always be paired with an earlier BL/BLR to avoid interfering with branch prediction.
> 
> So depending on the circumstances, either one of these should be addressed:
> * Clarify that this is actually a function return, and RET should be used anyway, regardless of BTI.
> * Keep BR and add BTI J landing pads where appropriate, if this wasn't really a function return.

Yes, it is best for BL and RET to match up.

For this function:
% git grep func_tr_32x4
libavcodec/aarch64/hevcdsp_idct_neon.S:function func_tr_32x4_\name
libavcodec/aarch64/hevcdsp_idct_neon.S:        bl              func_tr_32x4_firstpass
libavcodec/aarch64/hevcdsp_idct_neon.S:        bl              func_tr_32x4_secondpass_\bitdepth
libavcodec/arm/hevcdsp_idct_neon.S:function func_tr_32x4_\name
libavcodec/arm/hevcdsp_idct_neon.S:        bl              func_tr_32x4_firstpass
libavcodec/arm/hevcdsp_idct_neon.S:        bl              func_tr_32x4_secondpass_\bitdepth

It is always used with "bl", thus ret is also more correct from
that aspect.
Was your comment only on checking that, or did you mean that this should
be mentioned in the commit message?
(if you are wondering why the code did not use ret before, I guess it's
because it was ported from the 32-bit arm assembler and it slipped by code review)

Best regards,
Reimar
Casey Smalley Aug. 4, 2023, 9:14 a.m. UTC | #3
Hi,

Just wondering what the current thoughts on the patch were. It looks as
though the change is fine, but if there is still an issue I can submit a
new patch using BTI landing pads instead.

Best regards,

Casey.

Martin Storsjö Aug. 4, 2023, 10:48 a.m. UTC | #4
On Thu, 27 Jul 2023, Rémi Denis-Courmont wrote:

> Hi,
>
> The use of RET vs BR also has microarchitectural side effects. AFAIU, RET should always be paired with an earlier BL/BLR to avoid interfering with branch prediction.
>
> So depending on the circumstances, either one of these should be 
> addressed:
> * Clarify that this is actually a function return, and RET should be 
> used anyway, regardless of BTI.

This is the case, and the patch looks good to me.

I guess the commit message could be clarified that this is an issue even 
without BTI (even if the effect is much harder to notice there).

Would this amended commit message be ok with you? (If no input I guess 
I'll push it in a few days.)

---8<---
Subject: aarch64/hevc: Replace br return with ret

This patch changes the return instruction in the tr_32x4 macro from br to 
ret.

Function returns should always use the RET instruction instead of BR, to 
avoid interfering with branch prediction.

On devices that support BTI, this is observable, as a landing pad is 
required when branching with BR. The change fixes 
fate-hevc-hdr-vivid-metadata when on hardware with BTI support.
---8<---

// Martin

Patch

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index b7f23386a4..eab2add9e8 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -791,7 +791,7 @@ function func_tr_32x4_\name
         add             x3, x11, #(32 + 3 * 64)
         scale_store     \shift
-        br              x10
+        ret             x10
 endfunc
 .endm
-- 
2.40.1