diff mbox

[FFmpeg-devel,2/3] lavc: add a framework to fix alignment problems.

Message ID 20170506092006.22397-2-george@nsup.org
State Superseded
Headers show

Commit Message

Nicolas George May 6, 2017, 9:20 a.m. UTC
A lot of codecs require aligned frame data, but do not document it.
It can result in crashes with perfectly valid uses of the API.

TODO Implementation missing.

Design rationale:

- requiring frame data to be always aligned would be wasteful;

- alignment requirements will evolve
  (the 16->32 bump is still recent);

- requiring applications to worry about alignment is not convenient.

Signed-off-by: Nicolas George <george@nsup.org>
---
 libavcodec/avcodec.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Hendrik Leppkes May 6, 2017, 10:42 a.m. UTC | #1
On Sat, May 6, 2017 at 11:20 AM, Nicolas George <george@nsup.org> wrote:
> A lot of codecs require aligned frame data, but do not document it.
> It can result in crashes with perfectly valid uses of the API.
>
> TODO Implementation missing.
>
> Design rationale:
>
> - requiring frame data to be always aligned would be wasteful;
>
> - alignment requirements will evolve
>   (the 16->32 bump is still recent);
>
> - requiring applications to worry about alignment is not convenient.
>
> Signed-off-by: Nicolas George <george@nsup.org>
> ---
>  libavcodec/avcodec.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
> index 35df4f6ced..51a7e2db21 100644
> --- a/libavcodec/avcodec.h
> +++ b/libavcodec/avcodec.h
> @@ -3644,6 +3644,16 @@ typedef struct AVCodecContext {
>       *             AVCodecContext.get_format callback)
>       */
>      int hwaccel_flags;
> +
> +    /**
> +     * Minimum alignment of frame data required by the codec.
> +     * All frame data pointers must have the alignment lower bits cleared,
> +     * i.e. be a multiple of 1<<alignment.
> +     * - encoding: set by the encoder and used by the framework
> +     * - decoding: unused
> +     */
> +    unsigned alignment;
> +
>  } AVCodecContext;
>

I don't think this is very practical. We have a bunch of generic DSPs
and if a codec uses those it would need to be constantly updated if
the requirements of those DSPs change. Today it may only use SSE and
be happy with 16-byte alignment, but what if someone adds an AVX2
implementation, or AVX512?
Going through all codecs that use some DSP and somehow set their
alignment value seems rather unfortunate - and on top of that, it'll
add arch-specific conditions, which leads to a lot of unnecessary
changes to a bunch of encoders (and filters) over time.

Or the alternative is that its initialized to some default value based
on the current CPU and only overriden in a minority of cases, which
still makes me wonder if the effort is worth it.

Right now the general consensus is that frames (and stride) should be
aligned to the maximum your CPU might require, ie. 32 byte on a AVX2
CPU, even if this is not documented.
I think it would be far better to officially document this. Making
applications aware of the alignment (which they likely are already
anyway, otherwise they would crash, even if they had to learn the hard
way) can make the entire process more efficient, since re-aligning can
be extremely expensive (ie. high res video) and should be avoided at
all cost if possible (which it not always is when you crop, for
example).

There is no particular reason we can't make the framework check the
alignment and re-align if required to the maximum CPU value (with a
loud warning, perhaps), but I don't think a per-codec or per-filter
alignment value is very practical.

Related, Libav introduced an API called av_cpu_max_align to query the
alignment requirements for the CPU, in case someone is not using
av_malloc to allocate their buffers, but their own allocation
mechanisms.

- Hendrik
Ronald S. Bultje May 6, 2017, 12:50 p.m. UTC | #2
Hi,

On Sat, May 6, 2017 at 5:20 AM, Nicolas George <george@nsup.org> wrote:

> +    /**
> +     * Minimum alignment of frame data required by the codec.
> +     * All frame data pointers must have the alignment lower bits cleared,
> +     * i.e. be a multiple of 1<<alignment.
> +     * - encoding: set by the encoder and used by the framework
> +     * - decoding: unused
> +     */
> +    unsigned alignment;
> +
>  } AVCodecContext;


I agree it's likely that one codec (e.g. h264) would need 32-byte alignment
on a particular system (e.g. x86/haswell), whereas another codec on the
same system (e.g. wmavoice) might not.

However, I find it unlikely that one codec *instance* would need different
alignment from another codec *instance* (for the same codec). The above
allows seems specifically designed for that. Do you think adding alignment
to AVCodec would make more sense (and possibly set it at runtime using
static_init)?

Ronald
Marton Balint May 6, 2017, 4:48 p.m. UTC | #3
On Sat, 6 May 2017, Ronald S. Bultje wrote:

> Hi,
>
> On Sat, May 6, 2017 at 5:20 AM, Nicolas George <george@nsup.org> wrote:
>
>> +    /**
>> +     * Minimum alignment of frame data required by the codec.
>> +     * All frame data pointers must have the alignment lower bits cleared,
>> +     * i.e. be a multiple of 1<<alignment.
>> +     * - encoding: set by the encoder and used by the framework
>> +     * - decoding: unused
>> +     */
>> +    unsigned alignment;
>> +
>>  } AVCodecContext;
>
>
> I agree it's likely that one codec (e.g. h264) would need 32-byte alignment
> on a particular system (e.g. x86/haswell), whereas another codec on the
> same system (e.g. wmavoice) might not.
>
> However, I find it unlikely that one codec *instance* would need different
> alignment from another codec *instance* (for the same codec).

I can imagine a case where alignment requirements are dependant on pixel 
format. Not the best example, because it is not a "codec", and it is using 
bitpacked formats, but the Decklink output device would require 128 byte 
or even 256 byte alignment for 10/12 bit, according to the SDK.

Regards,
Marton
Nicolas George May 6, 2017, 6:10 p.m. UTC | #4
Le septidi 17 floréal, an CCXXV, Ronald S. Bultje a écrit :
> I agree it's likely that one codec (e.g. h264) would need 32-byte alignment
> on a particular system (e.g. x86/haswell), whereas another codec on the
> same system (e.g. wmavoice) might not.
> 
> However, I find it unlikely that one codec *instance* would need different
> alignment from another codec *instance* (for the same codec). The above
> allows seems specifically designed for that. Do you think adding alignment
> to AVCodec would make more sense (and possibly set it at runtime using
> static_init)?

I think Marton's mail gives a reasonable example.

As for me, I do not know exactly what is needed, this is one of the
reason I did not want to draft the proposal myself. This way seemed more
future-proof, but it can be discussed. Also, I thought our AVCodec
structure were const, so putting the field there would have been
incompatible with run-time CPU detection, but apparently I was wrong.
Still, I do not think codecs are supposed to modify it.

Regards,
Nicolas George May 6, 2017, 6:34 p.m. UTC | #5
Le septidi 17 floréal, an CCXXV, Hendrik Leppkes a écrit :
> I don't think this is very practical. We have a bunch of generic DSPs
> and if a codec uses those it would need to be constantly updated if
> the requirements of those DSPs change. Today it may only use SSE and
> be happy with 16-byte alignment, but what if someone adds an AVX2
> implementation, or AVX512?

Well, exactly: pray tell me, with your design what happens when Intel
adds AVX512?

I will tell you what happens: applications will have been written for
32-octets alignment, and as soon as their libavcodec is updated, the
crashes will start.

> Going through all codecs that use some DSP and somehow set their
> alignment value seems rather unfortunate - and on top of that, it'll
> add arch-specific conditions, which leads to a lot of unnecessary
> changes to a bunch of encoders (and filters) over time.

The way I see it, the functions that are used to init the DSP should
return the required alignment. That will require a little change since
they do not take an AVCodecContext argument, but only trivial and
mechanical changes, so even if hundreds of files were touched it is
really not an issue.

Of course, setting a high default value by default is a possible
short-term solution, especially in the branches.

> Right now the general consensus is that frames (and stride) should be
> aligned to the maximum your CPU might require, ie. 32 byte on a AVX2
> CPU, even if this is not documented.

It may have been the consensus for people who work on the vectorized
implementations, but not the general consensus, especially for people
who work on user-facing API.

> I think it would be far better to officially document this. Making
> applications aware of the alignment (which they likely are already
> anyway, otherwise they would crash, even if they had to learn the hard
> way) can make the entire process more efficient, since re-aligning can
> be extremely expensive (ie. high res video) and should be avoided at
> all cost if possible (which it not always is when you crop, for
> example).

Encouraging the applications to be aware of the issue for performances
reasons is a good thing.

Forcing them to do so when it is not convenient, on pain of crash, when
it can be done for them just as easily, not so much.

> There is no particular reason we can't make the framework check the
> alignment and re-align if required to the maximum CPU value (with a
> loud warning, perhaps), but I don't think a per-codec or per-filter
> alignment value is very practical.

If the framework does the realigning, doing it per-codec or per-filter
is almost free. And someone really smart said that "since re-aligning
can be extremely expensive" it "should be avoided at all cost if
possible", doing the realigning only if the codecs needs it is really
the right thing to do.

Regards,
Hendrik Leppkes May 7, 2017, 3:15 p.m. UTC | #6
On Sat, May 6, 2017 at 8:34 PM, Nicolas George <george@nsup.org> wrote:
>
> Encouraging the applications to be aware of the issue for performances
> reasons is a good thing.
>
> Forcing them to do so when it is not convenient, on pain of crash, when
> it can be done for them just as easily, not so much.
>
>> There is no particular reason we can't make the framework check the
>> alignment and re-align if required to the maximum CPU value (with a
>> loud warning, perhaps), but I don't think a per-codec or per-filter
>> alignment value is very practical.
>
> If the framework does the realigning, doing it per-codec or per-filter
> is almost free. And someone really smart said that "since re-aligning
> can be extremely expensive" it "should be avoided at all cost if
> possible", doing the realigning only if the codecs needs it is really
> the right thing to do.
>

One of my main concerns is that FATE will not actually be testing any
of this (otherwise it would be crashing right now), and as such it can
(and we all know, will) lead to subtle bugs that only occur with
certain callers on certain platforms.
This is usually not a very good situation to be in.

Ronald suggested on IRC that the only parts suffering from this are
the ASM DSP components, and we should just make checkasm tests for all
of them and let checkasm test the alignment they claim to support, but
alas this won't be done for months, if not years.
There is potentially nothing wrong with supporting per-codec values in
the framework, but setting them to something lower then the global
alignment (ie. 32-byte) should only be done if we have sufficient
automated testing to ensure this value is and remains correct.

- Hendrik
Ronald S. Bultje May 7, 2017, 5:17 p.m. UTC | #7
Hi,

On Sun, May 7, 2017 at 11:15 AM, Hendrik Leppkes <h.leppkes@gmail.com>
wrote:

> Ronald suggested on IRC that the only parts suffering from this are
> the ASM DSP components, and we should just make checkasm tests for all
> of them and let checkasm test the alignment they claim to support, but
> alas this won't be done for months, if not years.


This really isn't that complicated once we get the general design and goals
explicitly documented.

We test all kind of crazy stuff in checkasm, like xmm register clobbering.
This is just another strange thing that can be done in generic code. We
just need to decide what is OK so we know what to test.

Ronald
Hendrik Leppkes May 7, 2017, 6:55 p.m. UTC | #8
On Sun, May 7, 2017 at 7:17 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> Hi,
>
> On Sun, May 7, 2017 at 11:15 AM, Hendrik Leppkes <h.leppkes@gmail.com>
> wrote:
>
>> Ronald suggested on IRC that the only parts suffering from this are
>> the ASM DSP components, and we should just make checkasm tests for all
>> of them and let checkasm test the alignment they claim to support, but
>> alas this won't be done for months, if not years.
>
>
> This really isn't that complicated once we get the general design and goals
> explicitly documented.
>
> We test all kind of crazy stuff in checkasm, like xmm register clobbering.
> This is just another strange thing that can be done in generic code. We
> just need to decide what is OK so we know what to test.
>

I'm not worried about checkasm perfoming the test, but all DSP
components actually having tests.

- Hendrik
Ronald S. Bultje May 7, 2017, 7:03 p.m. UTC | #9
Hi,

On Sun, May 7, 2017 at 2:55 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote:

> On Sun, May 7, 2017 at 7:17 PM, Ronald S. Bultje <rsbultje@gmail.com>
> wrote:
> > Hi,
> >
> > On Sun, May 7, 2017 at 11:15 AM, Hendrik Leppkes <h.leppkes@gmail.com>
> > wrote:
> >
> >> Ronald suggested on IRC that the only parts suffering from this are
> >> the ASM DSP components, and we should just make checkasm tests for all
> >> of them and let checkasm test the alignment they claim to support, but
> >> alas this won't be done for months, if not years.
> >
> >
> > This really isn't that complicated once we get the general design and
> goals
> > explicitly documented.
> >
> > We test all kind of crazy stuff in checkasm, like xmm register
> clobbering.
> > This is just another strange thing that can be done in generic code. We
> > just need to decide what is OK so we know what to test.
> >
>
> I'm not worried about checkasm perfoming the test, but all DSP
> components actually having tests.


Test coverage in general is a fair point, we can work on that...

Ronald
diff mbox

Patch

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 35df4f6ced..51a7e2db21 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -3644,6 +3644,16 @@  typedef struct AVCodecContext {
      *             AVCodecContext.get_format callback)
      */
     int hwaccel_flags;
+
+    /**
+     * Minimum alignment of frame data required by the codec.
+     * All frame data pointers must have the alignment lower bits cleared,
+     * i.e. be a multiple of 1<<alignment.
+     * - encoding: set by the encoder and used by the framework
+     * - decoding: unused
+     */
+    unsigned alignment;
+
 } AVCodecContext;
 
 AVRational av_codec_get_pkt_timebase         (const AVCodecContext *avctx);