diff mbox

[FFmpeg-devel] cpu: add a function for querying maximum required data alignment

Message ID 20170902170701.9144-1-jamrial@gmail.com
State Superseded
Headers show

Commit Message

James Almer Sept. 2, 2017, 5:07 p.m. UTC
From: Anton Khirnov <anton@khirnov.net>

(cherry picked from commit e6bff23f1e11aefb16a2b5d6ee72bf7469c5a66e)
Signed-off-by: James Almer <jamrial@gmail.com>
---
This is (afaics) the last API introduced to libav before the major bump.

 doc/APIchanges      |  3 +++
 libavutil/cpu.c     | 22 ++++++++++++++++++++++
 libavutil/cpu.h     | 13 +++++++++++++
 libavutil/version.h |  2 +-
 4 files changed, 39 insertions(+), 1 deletion(-)

Comments

Clément Bœsch Sept. 2, 2017, 6:29 p.m. UTC | #1
On Sat, Sep 02, 2017 at 02:07:01PM -0300, James Almer wrote:
[...]
> +size_t av_cpu_max_align(void)
> +{
> +    int av_unused flags = av_get_cpu_flags();
> +
> +#if ARCH_ARM || ARCH_AARCH64
> +    if (flags & AV_CPU_FLAG_NEON)
> +        return 16;
> +#elif ARCH_PPC
> +    if (flags & AV_CPU_FLAG_ALTIVEC)
> +        return 16;

> +#elif ARCH_X86
> +    if (flags & AV_CPU_FLAG_AVX)
> +        return 32;
> +    if (flags & AV_CPU_FLAG_SSE)
> +        return 16;
> +#endif

mmh, will this really work in FFmpeg? I think we have a difference related
to the flags dependency. Typically, if having SSE2 doesn't imply you have
SSE. I think you may want to extend the mask.

[...]
Hendrik Leppkes Sept. 2, 2017, 6:48 p.m. UTC | #2
On Sat, Sep 2, 2017 at 8:29 PM, Clément Bœsch <u@pkh.me> wrote:
> On Sat, Sep 02, 2017 at 02:07:01PM -0300, James Almer wrote:
> [...]
>> +size_t av_cpu_max_align(void)
>> +{
>> +    int av_unused flags = av_get_cpu_flags();
>> +
>> +#if ARCH_ARM || ARCH_AARCH64
>> +    if (flags & AV_CPU_FLAG_NEON)
>> +        return 16;
>> +#elif ARCH_PPC
>> +    if (flags & AV_CPU_FLAG_ALTIVEC)
>> +        return 16;
>
>> +#elif ARCH_X86
>> +    if (flags & AV_CPU_FLAG_AVX)
>> +        return 32;
>> +    if (flags & AV_CPU_FLAG_SSE)
>> +        return 16;
>> +#endif
>
> mmh, will this really work in FFmpeg? I think we have a difference related
> to the flags dependency. Typically, if having SSE2 doesn't imply you have
> SSE. I think you may want to extend the mask.
>
>

If we do, we should really get rid of supporting that. Its really not
very meaningful to select say SSSE3 but not SSE2. SSSE3 flagged
functions are still going to use the majority of SSE2 instructions,
and so on.
If you unselect MMX, I would wager a bunch of checks all over fall
over and fail on you. :p

- Hendrik
James Almer Sept. 2, 2017, 6:48 p.m. UTC | #3
On 9/2/2017 3:29 PM, Clément Bœsch wrote:
> On Sat, Sep 02, 2017 at 02:07:01PM -0300, James Almer wrote:
> [...]
>> +size_t av_cpu_max_align(void)
>> +{
>> +    int av_unused flags = av_get_cpu_flags();
>> +
>> +#if ARCH_ARM || ARCH_AARCH64
>> +    if (flags & AV_CPU_FLAG_NEON)
>> +        return 16;
>> +#elif ARCH_PPC
>> +    if (flags & AV_CPU_FLAG_ALTIVEC)
>> +        return 16;
> 
>> +#elif ARCH_X86
>> +    if (flags & AV_CPU_FLAG_AVX)
>> +        return 32;
>> +    if (flags & AV_CPU_FLAG_SSE)
>> +        return 16;
>> +#endif
> 
> mmh, will this really work in FFmpeg? I think we have a difference related
> to the flags dependency. Typically, if having SSE2 doesn't imply you have
> SSE. I think you may want to extend the mask.

Mmh, you're right, forgot we have av_parse_cpu_caps().

What do i do then? Define two masks with all the CPU flags that would
apply for each alignment value?
AVX to AVX2 plus FMA3/4 and the slow variants for 32, then SSE to SSE4
plus XOP and the slow variants for 16?

> 
> [...]
> 
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
James Almer Sept. 2, 2017, 7 p.m. UTC | #4
On 9/2/2017 3:48 PM, Hendrik Leppkes wrote:
> On Sat, Sep 2, 2017 at 8:29 PM, Clément Bœsch <u@pkh.me> wrote:
>> On Sat, Sep 02, 2017 at 02:07:01PM -0300, James Almer wrote:
>> [...]
>>> +size_t av_cpu_max_align(void)
>>> +{
>>> +    int av_unused flags = av_get_cpu_flags();
>>> +
>>> +#if ARCH_ARM || ARCH_AARCH64
>>> +    if (flags & AV_CPU_FLAG_NEON)
>>> +        return 16;
>>> +#elif ARCH_PPC
>>> +    if (flags & AV_CPU_FLAG_ALTIVEC)
>>> +        return 16;
>>
>>> +#elif ARCH_X86
>>> +    if (flags & AV_CPU_FLAG_AVX)
>>> +        return 32;
>>> +    if (flags & AV_CPU_FLAG_SSE)
>>> +        return 16;
>>> +#endif
>>
>> mmh, will this really work in FFmpeg? I think we have a difference related
>> to the flags dependency. Typically, if having SSE2 doesn't imply you have
>> SSE. I think you may want to extend the mask.
>>
>>
> 
> If we do, we should really get rid of supporting that. Its really not
> very meaningful to select say SSSE3 but not SSE2. SSSE3 flagged
> functions are still going to use the majority of SSE2 instructions,
> and so on.

av_parse_cpu_caps() + av_force_cpu_flags() was added precisely to enable
single flags without pulling "dependencies" and the doxy even recommends
their usage over av_parse_cpu_flags() + av_set_cpu_flags_mask().

I don't know if it's really useful or not, but forcing CPU flags is a
developer functionality, in any case. In pretty much any real world case
people will call av_get_cpu_flags() and use the default return value for
the host machine, not bothering to disable anything and much less force
flags that are not present.

> If you unselect MMX, I would wager a bunch of checks all over fall
> over and fail on you. :p

MMX is forcefully enabled if anything newer than that is also enabled.
It's the only exception to the above precisely because things break if
we don't :p

> 
> - Hendrik
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
diff mbox

Patch

diff --git a/doc/APIchanges b/doc/APIchanges
index 4effbf9364..6a57c210a9 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -15,6 +15,9 @@  libavutil:     2015-08-28
 
 API changes, most recent first:
 
+2017-09-xx - xxxxxxx - lavu 55.75.100 / lavu 55.31.0 - cpu.h
+  Add av_cpu_max_align() for querying maximum required data alignment.
+
 2017-09-01 - xxxxxxx - lavf 57.81.100 - avio.h
   Add avio_read_partial().
 
diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index a22da0fa8c..c89f4e0b5f 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -16,9 +16,11 @@ 
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+#include <stddef.h>
 #include <stdint.h>
 #include <stdatomic.h>
 
+#include "attributes.h"
 #include "cpu.h"
 #include "cpu_internal.h"
 #include "config.h"
@@ -299,3 +301,23 @@  int av_cpu_count(void)
 
     return nb_cpus;
 }
+
+size_t av_cpu_max_align(void)
+{
+    int av_unused flags = av_get_cpu_flags();
+
+#if ARCH_ARM || ARCH_AARCH64
+    if (flags & AV_CPU_FLAG_NEON)
+        return 16;
+#elif ARCH_PPC
+    if (flags & AV_CPU_FLAG_ALTIVEC)
+        return 16;
+#elif ARCH_X86
+    if (flags & AV_CPU_FLAG_AVX)
+        return 32;
+    if (flags & AV_CPU_FLAG_SSE)
+        return 16;
+#endif
+
+    return 8;
+}
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index de05593446..9e5d40affe 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -21,6 +21,8 @@ 
 #ifndef AVUTIL_CPU_H
 #define AVUTIL_CPU_H
 
+#include <stddef.h>
+
 #include "attributes.h"
 
 #define AV_CPU_FLAG_FORCE    0x80000000 /* force usage of selected flags (OR) */
@@ -113,4 +115,15 @@  int av_parse_cpu_caps(unsigned *flags, const char *s);
  */
 int av_cpu_count(void);
 
+/**
+ * Get the maximum data alignment that may be required by FFmpeg.
+ *
+ * Note that this is affected by the build configuration and the CPU flags mask,
+ * so e.g. if the CPU supports AVX, but libavutil has been built with
+ * --disable-avx or the AV_CPU_FLAG_AVX flag has been disabled through
+ *  av_set_cpu_flags_mask(), then this function will behave as if AVX is not
+ *  present.
+ */
+size_t av_cpu_max_align(void);
+
 #endif /* AVUTIL_CPU_H */
diff --git a/libavutil/version.h b/libavutil/version.h
index 6e25b4690c..d99eff5d15 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -80,7 +80,7 @@ 
 
 
 #define LIBAVUTIL_VERSION_MAJOR  55
-#define LIBAVUTIL_VERSION_MINOR  74
+#define LIBAVUTIL_VERSION_MINOR  75
 #define LIBAVUTIL_VERSION_MICRO 100
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \