diff mbox

[FFmpeg-devel] x86inc: Avoid using eax/rax for storing the stack pointer

Message ID 1482693899-3704-1-git-send-email-henrik@gramner.com
State Accepted
Commit 3cba1ad76d362c994fa98fb686e04e20826fb579
Headers show

Commit Message

Henrik Gramner Dec. 25, 2016, 7:24 p.m. UTC
When allocating stack space with an alignment requirement that is larger
than the current stack alignment we need to store a copy of the original
stack pointer in order to be able to restore it later.

If we chose to use another register for this purpose we should not pick
eax/rax since it can be overwritten as a return value.
---
 libavutil/x86/x86inc.asm | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Ronald S. Bultje Dec. 26, 2016, 1:32 a.m. UTC | #1
Hi,

On Sun, Dec 25, 2016 at 2:24 PM, Henrik Gramner <henrik@gramner.com> wrote:

> When allocating stack space with an alignment requirement that is larger
> than the current stack alignment we need to store a copy of the original
> stack pointer in order to be able to restore it later.
>
> If we chose to use another register for this purpose we should not pick
> eax/rax since it can be overwritten as a return value.
> ---
>  libavutil/x86/x86inc.asm | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
> index b2e9c60..128ddc1 100644
> --- a/libavutil/x86/x86inc.asm
> +++ b/libavutil/x86/x86inc.asm
> @@ -385,7 +385,14 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,
> 13,14
>      %ifnum %1
>          %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
>              %if %1 > 0
> +                ; Reserve an additional register for storing the original
> stack pointer, but avoid using
> +                ; eax/rax for this purpose since it can potentially get
> overwritten as a return value.
>                  %assign regs_used (regs_used + 1)
> +                %if ARCH_X86_64 && regs_used == 7
> +                    %assign regs_used 8
> +                %elif ARCH_X86_64 == 0 && regs_used == 1
> +                    %assign regs_used 2
> +                %endif
>              %endif
>              %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
>                  ; Ensure that we don't clobber any registers containing
> arguments. For UNIX64 we also preserve r6 (rax)
> --
> 2.7.4


I know I'm terribly nitpicking here for the limited scope of the comment,
but this only matters for functions that have a return value. Do you think
it makes sense to allow functions to opt out of this requirement if they
explicitly state to not have a return value?

Ronald
Henrik Gramner Dec. 26, 2016, 9:53 a.m. UTC | #2
On Mon, Dec 26, 2016 at 2:32 AM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> I know I'm terribly nitpicking here for the limited scope of the comment,
> but this only matters for functions that have a return value. Do you think
> it makes sense to allow functions to opt out of this requirement if they
> explicitly state to not have a return value?

An opt-out would only be relevant on 64-bit Windows when the following
criteria are true for a function:

* Reserves exactly 6 registers
* Reserves stack space with the original stack pointer stored in a
register (as opposed to the stack)
* Requires >16 byte stack alignment (e.g. spilling ymm registers to the stack)
* Does not have a return value

If and only if all of those are true this would result in one register
being unnecessarily saved (the cost of which would likely be hidden by
OoE). On other systems than WIN64 or if any of the conditions above is
false an opt-out doesn't make any sense.

Considering how rare that corner case is in combination with how
fairly insignificant the downside is I'm not sure it makes that much
sense to complicate the x86inc API further with an opt-out just for
that specific scenario.

[I accidentally only sent this reply to libav-devel, so resending it
to ffmpeg-devel as well]
Ronald S. Bultje Dec. 26, 2016, 1:52 p.m. UTC | #3
Hi,

On Mon, Dec 26, 2016 at 4:53 AM, Henrik Gramner <henrik@gramner.com> wrote:

> On Mon, Dec 26, 2016 at 2:32 AM, Ronald S. Bultje <rsbultje@gmail.com>
> wrote:
> > I know I'm terribly nitpicking here for the limited scope of the comment,
> > but this only matters for functions that have a return value. Do you
> think
> > it makes sense to allow functions to opt out of this requirement if they
> > explicitly state to not have a return value?
>
> An opt-out would only be relevant on 64-bit Windows when the following
> criteria are true for a function:
>
> * Reserves exactly 6 registers
> * Reserves stack space with the original stack pointer stored in a
> register (as opposed to the stack)
> * Requires >16 byte stack alignment (e.g. spilling ymm registers to the
> stack)
> * Does not have a return value
>
> If and only if all of those are true this would result in one register
> being unnecessarily saved (the cost of which would likely be hidden by
> OoE). On other systems than WIN64 or if any of the conditions above is
> false an opt-out doesn't make any sense.
>
> Considering how rare that corner case is in combination with how
> fairly insignificant the downside is I'm not sure it makes that much
> sense to complicate the x86inc API further with an opt-out just for
> that specific scenario.
>

 Hm, OK, I think it affects unix64/x86-32 also when using 32-byte
alignment. We do use the stack pointer then. But let's ignore that for a
second, I think it's besides the point.

I think my hesitation comes from how I view x86inc.asm. There's two ways to
see it:
- it's a universal tool, like a compiler, to assist writing assembly
(combined with yasm/nasm as actual assembler);
or
- it's a local tool for ffmpeg/libav/x26[5], like libavutil/attributes.h,
to assist writing assembly.

If x86inc.asm were like a compiler, every micro-optimization, no matter the
benefit, would be important. If it were a local tool, we indeed wouldn't
care because ffmpeg spends most runtime for important use cases in other
areas. (There's obviously a grayscale in this black/white range that I'm
drawing out.) So having said that, patch is OK. If someone would later come
in to add something to take return value type (void vs. non-void) into
account, I would still find that helpful. :)

Ronald
Henrik Gramner Dec. 26, 2016, 2:31 p.m. UTC | #4
On Mon, Dec 26, 2016 at 2:52 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
>  Hm, OK, I think it affects unix64/x86-32 also when using 32-byte
> alignment. We do use the stack pointer then.

On 32-bit and UNIX64 it simply uses a different caller-saved register
which doesn't require additional instructions.

> I think my hesitation comes from how I view x86inc.asm. There's two ways to
> see it:
> - it's a universal tool, like a compiler, to assist writing assembly
> (combined with yasm/nasm as actual assembler);
> or
> - it's a local tool for ffmpeg/libav/x26[5], like libavutil/attributes.h,
> to assist writing assembly.

In practice it's basically (a), but designed around the use-case of (b).

> If x86inc.asm were like a compiler, every micro-optimization, no matter the
> benefit, would be important. If it were a local tool, we indeed wouldn't
> care because ffmpeg spends most runtime for important use cases in other
> areas. (There's obviously a grayscale in this black/white range that I'm
> drawing out.) So having said that, patch is OK. If someone would later come
> in to add something to take return value type (void vs. non-void) into
> account, I would still find that helpful. :)

Specifying a full function prototype for cglobal instead of the
current implementation would be ideal. It would also allow stuff like
full floating-point abstraction and the ability to auto-load a
non-contiguous subset of parameters with optional sign-extension of
32-bit args etc.

The problem is that it's difficult to implement in a clean way with
the limited Yasm syntax. Nasm does have better string parsing
capabilities (although I haven't looked into it in detail) so if we
decide to drop Yasm support at some point in the future this feature
could perhaps be considered.
diff mbox

Patch

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index b2e9c60..128ddc1 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -385,7 +385,14 @@  DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
     %ifnum %1
         %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
             %if %1 > 0
+                ; Reserve an additional register for storing the original stack pointer, but avoid using
+                ; eax/rax for this purpose since it can potentially get overwritten as a return value.
                 %assign regs_used (regs_used + 1)
+                %if ARCH_X86_64 && regs_used == 7
+                    %assign regs_used 8
+                %elif ARCH_X86_64 == 0 && regs_used == 1
+                    %assign regs_used 2
+                %endif
             %endif
             %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
                 ; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax)