Message ID | 20170902170701.9144-1-jamrial@gmail.com |
---|---|
State | Superseded |
Headers | show |
On Sat, Sep 02, 2017 at 02:07:01PM -0300, James Almer wrote: [...] > +size_t av_cpu_max_align(void) > +{ > + int av_unused flags = av_get_cpu_flags(); > + > +#if ARCH_ARM || ARCH_AARCH64 > + if (flags & AV_CPU_FLAG_NEON) > + return 16; > +#elif ARCH_PPC > + if (flags & AV_CPU_FLAG_ALTIVEC) > + return 16; > +#elif ARCH_X86 > + if (flags & AV_CPU_FLAG_AVX) > + return 32; > + if (flags & AV_CPU_FLAG_SSE) > + return 16; > +#endif mmh, will this really work in FFmpeg? I think we have a difference related to the flags dependency. Typically, if having SSE2 doesn't imply you have SSE. I think you may want to extend the mask. [...]
On Sat, Sep 2, 2017 at 8:29 PM, Clément Bœsch <u@pkh.me> wrote: > On Sat, Sep 02, 2017 at 02:07:01PM -0300, James Almer wrote: > [...] >> +size_t av_cpu_max_align(void) >> +{ >> + int av_unused flags = av_get_cpu_flags(); >> + >> +#if ARCH_ARM || ARCH_AARCH64 >> + if (flags & AV_CPU_FLAG_NEON) >> + return 16; >> +#elif ARCH_PPC >> + if (flags & AV_CPU_FLAG_ALTIVEC) >> + return 16; > >> +#elif ARCH_X86 >> + if (flags & AV_CPU_FLAG_AVX) >> + return 32; >> + if (flags & AV_CPU_FLAG_SSE) >> + return 16; >> +#endif > > mmh, will this really work in FFmpeg? I think we have a difference related > to the flags dependency. Typically, if having SSE2 doesn't imply you have > SSE. I think you may want to extend the mask. > > If we do, we should really get rid of supporting that. Its really not very meaningful to select say SSSE3 but not SSE2. SSSE3 flagged functions are still going to use the majority of SSE2 instructions, and so on. If you unselect MMX, I would wager a bunch of checks all over fall over and fail on you. :p - Hendrik
On 9/2/2017 3:29 PM, Clément Bœsch wrote: > On Sat, Sep 02, 2017 at 02:07:01PM -0300, James Almer wrote: > [...] >> +size_t av_cpu_max_align(void) >> +{ >> + int av_unused flags = av_get_cpu_flags(); >> + >> +#if ARCH_ARM || ARCH_AARCH64 >> + if (flags & AV_CPU_FLAG_NEON) >> + return 16; >> +#elif ARCH_PPC >> + if (flags & AV_CPU_FLAG_ALTIVEC) >> + return 16; > >> +#elif ARCH_X86 >> + if (flags & AV_CPU_FLAG_AVX) >> + return 32; >> + if (flags & AV_CPU_FLAG_SSE) >> + return 16; >> +#endif > > mmh, will this really work in FFmpeg? I think we have a difference related > to the flags dependency. Typically, if having SSE2 doesn't imply you have > SSE. I think you may want to extend the mask. Mmh, you're right, forgot we have av_parse_cpu_caps(). What do i do then? Define two masks with all the CPU flags that would apply for each alignment value? AVX to AVX2 plus FMA3/4 and the slow variants for 32, then SSE to SSE4 plus XOP and the slow variants for 16? > > [...] > > > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel >
On 9/2/2017 3:48 PM, Hendrik Leppkes wrote: > On Sat, Sep 2, 2017 at 8:29 PM, Clément Bœsch <u@pkh.me> wrote: >> On Sat, Sep 02, 2017 at 02:07:01PM -0300, James Almer wrote: >> [...] >>> +size_t av_cpu_max_align(void) >>> +{ >>> + int av_unused flags = av_get_cpu_flags(); >>> + >>> +#if ARCH_ARM || ARCH_AARCH64 >>> + if (flags & AV_CPU_FLAG_NEON) >>> + return 16; >>> +#elif ARCH_PPC >>> + if (flags & AV_CPU_FLAG_ALTIVEC) >>> + return 16; >> >>> +#elif ARCH_X86 >>> + if (flags & AV_CPU_FLAG_AVX) >>> + return 32; >>> + if (flags & AV_CPU_FLAG_SSE) >>> + return 16; >>> +#endif >> >> mmh, will this really work in FFmpeg? I think we have a difference related >> to the flags dependency. Typically, if having SSE2 doesn't imply you have >> SSE. I think you may want to extend the mask. >> >> > > If we do, we should really get rid of supporting that. Its really not > very meaningful to select say SSSE3 but not SSE2. SSSE3 flagged > functions are still going to use the majority of SSE2 instructions, > and so on. av_parse_cpu_caps() + av_force_cpu_flags() was added precisely to enable single flags without pulling "dependencies" and the doxy even recommends their usage over av_parse_cpu_flags() + av_set_cpu_flags_mask(). I don't know if it's really useful or not, but forcing CPU flags is a developer functionality, in any case. In pretty much any real world case people will call av_get_cpu_flags() and use the default return value for the host machine, not bothering to disable anything and much less force flags that are not present. > If you unselect MMX, I would wager a bunch of checks all over fall > over and fail on you. :p MMX is forcefully enabled if anything newer than that is also enabled. It's the only exception to the above precisely because things break if we don't :p > > - Hendrik > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel >
diff --git a/doc/APIchanges b/doc/APIchanges index 4effbf9364..6a57c210a9 100644 --- a/doc/APIchanges +++ b/doc/APIchanges @@ -15,6 +15,9 @@ libavutil: 2015-08-28 API changes, most recent first: +2017-09-xx - xxxxxxx - lavu 55.75.100 / lavu 55.31.0 - cpu.h + Add av_cpu_max_align() for querying maximum required data alignment. + 2017-09-01 - xxxxxxx - lavf 57.81.100 - avio.h Add avio_read_partial(). diff --git a/libavutil/cpu.c b/libavutil/cpu.c index a22da0fa8c..c89f4e0b5f 100644 --- a/libavutil/cpu.c +++ b/libavutil/cpu.c @@ -16,9 +16,11 @@ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ +#include <stddef.h> #include <stdint.h> #include <stdatomic.h> +#include "attributes.h" #include "cpu.h" #include "cpu_internal.h" #include "config.h" @@ -299,3 +301,23 @@ int av_cpu_count(void) return nb_cpus; } + +size_t av_cpu_max_align(void) +{ + int av_unused flags = av_get_cpu_flags(); + +#if ARCH_ARM || ARCH_AARCH64 + if (flags & AV_CPU_FLAG_NEON) + return 16; +#elif ARCH_PPC + if (flags & AV_CPU_FLAG_ALTIVEC) + return 16; +#elif ARCH_X86 + if (flags & AV_CPU_FLAG_AVX) + return 32; + if (flags & AV_CPU_FLAG_SSE) + return 16; +#endif + + return 8; +} diff --git a/libavutil/cpu.h b/libavutil/cpu.h index de05593446..9e5d40affe 100644 --- a/libavutil/cpu.h +++ b/libavutil/cpu.h @@ -21,6 +21,8 @@ #ifndef AVUTIL_CPU_H #define AVUTIL_CPU_H +#include <stddef.h> + #include "attributes.h" #define AV_CPU_FLAG_FORCE 0x80000000 /* force usage of selected flags (OR) */ @@ -113,4 +115,15 @@ int av_parse_cpu_caps(unsigned *flags, const char *s); */ int av_cpu_count(void); +/** + * Get the maximum data alignment that may be required by FFmpeg. + * + * Note that this is affected by the build configuration and the CPU flags mask, + * so e.g. if the CPU supports AVX, but libavutil has been built with + * --disable-avx or the AV_CPU_FLAG_AVX flag has been disabled through + * av_set_cpu_flags_mask(), then this function will behave as if AVX is not + * present. + */ +size_t av_cpu_max_align(void); + #endif /* AVUTIL_CPU_H */ diff --git a/libavutil/version.h b/libavutil/version.h index 6e25b4690c..d99eff5d15 100644 --- a/libavutil/version.h +++ b/libavutil/version.h @@ -80,7 +80,7 @@ #define LIBAVUTIL_VERSION_MAJOR 55 -#define LIBAVUTIL_VERSION_MINOR 74 +#define LIBAVUTIL_VERSION_MINOR 75 #define LIBAVUTIL_VERSION_MICRO 100 #define LIBAVUTIL_VERSION_INT AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \