Message ID | 1873931254.3575536.1585946518805.JavaMail.zimbra@univ-grenoble-alpes.fr |
---|---|
State | Superseded |
Headers | show |
Series | [FFmpeg-devel,inline,assembly,compliance] Issues and patches | expand |
Context | Check | Description |
---|---|---|
andriy/ffmpeg-patchwork | success | Make fate finished |
Am Fr., 3. Apr. 2020 um 22:42 Uhr schrieb FRÉDÉRIC RECOULES <frederic.recoules@univ-grenoble-alpes.fr>: > we are academic researchers working in automated program analysis. > We are currently interested in checking compliance of inline asm chunks > as found in C programs. > > While benchmarking our tool and technique, we found a number of issues in > FFMPEG. We report them to you, as well as adequate patches. > Actually, we found 59 significant compliance issues in your code. > We join 3 patches for some of them, together with explanations and > we can send you other patches on demand. > > > * All these bugs are related to compliance between the block of asm and its > surrounding "contract" (in gcc-style notation). They are akin to undefined or > implementation-defined behaviours in C: they currently do not manifest > themselves in your program, but at some point in time with compiler > optimizations becoming more and more aggressive or changes in undocumented > compiler choices regarding asm chunks, they can suddenly trigger a > (hard-to-find) bug. So your current patch does not change compilation output (without debug symbols) when compared with md5sum or similar? We only accept patches made with git format-patch, no diff files. Carl Eugen
On Fri, 3 Apr 2020 at 22:07, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: > Am Fr., 3. Apr. 2020 um 22:42 Uhr schrieb FRÉDÉRIC RECOULES > <frederic.recoules@univ-grenoble-alpes.fr>: > > > we are academic researchers working in automated program analysis. > > We are currently interested in checking compliance of inline asm chunks > > as found in C programs. > > > > While benchmarking our tool and technique, we found a number of issues in > > FFMPEG. We report them to you, as well as adequate patches. > > Actually, we found 59 significant compliance issues in your code. > > We join 3 patches for some of them, together with explanations and > > we can send you other patches on demand. > > > > > > * All these bugs are related to compliance between the block of asm and > its > > surrounding "contract" (in gcc-style notation). They are akin to > undefined or > > implementation-defined behaviours in C: they currently do not manifest > > themselves in your program, but at some point in time with compiler > > optimizations becoming more and more aggressive or changes in > undocumented > > compiler choices regarding asm chunks, they can suddenly trigger a > > (hard-to-find) bug. > > So your current patch does not change compilation output (without > debug symbols) when compared with md5sum or similar? > We do not do such a comparison for C undefined behaviour changes so why should we do such a comparison for this code? Kieran
On Fri, Apr 03, 2020 at 10:41:58PM +0200, FRÉDÉRIC RECOULES wrote: > Dear developpers, > > > > we are academic researchers working in automated program analysis. > We are currently interested in checking compliance of inline asm chunks > as found in C programs. > > While benchmarking our tool and technique, we found a number of issues in > FFMPEG. We report them to you, as well as adequate patches. > Actually, we found 59 significant compliance issues in your code. > We join 3 patches for some of them, together with explanations and > we can send you other patches on demand. > > > * All these bugs are related to compliance between the block of asm and its > surrounding "contract" (in gcc-style notation). They are akin to undefined or > implementation-defined behaviours in C: they currently do not manifest > themselves in your program, but at some point in time with compiler > optimizations becoming more and more aggressive or changes in undocumented > compiler choices regarding asm chunks, they can suddenly trigger a > (hard-to-find) bug. > > * The typical problems come from the compiler missing dataflow information > and performing undue optimizations on this wrong basis, or the compiler > allocating an already used register. Actually, we demonstrate "in lab" problems > with all these categories of bugs in case of inlining > (especially with LTO enabler) or code refactoring. > > * Some of those issues may seems benign or irrealistic but it cost nothing > to patch so, why not do it? > > > We would be very interested to hear your opinion on these matters. > Are you interested in such errors and patches? > Also, besides the patches, we are currently working on a code analyzer > prototype designed to check asm compliance and to propose patches when the > chunk is not compliant. This is still work in progress and we are finalizing it. > The errors and patches I reported to you came from my prototype. > In case such a prototype would be made available, would you consider using it? > > > > Best regards > > Frédéric Recoules This breaks build with shared libs In file included from src/libavcodec/x86/hpeldsp_init.c:104: src/libavcodec/x86/rnd_template.c:102:30: error: dereference of pointer to incomplete type 'const uint8_t []' "m" (*(const uint8_t (*)[])pixels), ^ src/libavcodec/x86/rnd_template.c:188:30: error: dereference of pointer to incomplete type 'const uint8_t []' "m" (*(const uint8_t (*)[])pixels), ^ In file included from src/libavcodec/x86/hpeldsp_init.c:139: src/libavcodec/x86/rnd_template.c:102:30: error: dereference of pointer to incomplete type 'const uint8_t []' "m" (*(const uint8_t (*)[])pixels), ^ src/libavcodec/x86/rnd_template.c:188:30: error: dereference of pointer to incomplete type 'const uint8_t []' "m" (*(const uint8_t (*)[])pixels), ^ 4 errors generated. src/ffbuild/common.mak:59: recipe for target 'libavcodec/x86/hpeldsp_init.o' failed make: *** [libavcodec/x86/hpeldsp_init.o] Error 1 [...] > diff --git a/libavcodec/x86/rnd_template.c b/libavcodec/x86/rnd_template.c > index 09946bd23f..1be010e066 100644 > --- a/libavcodec/x86/rnd_template.c > +++ b/libavcodec/x86/rnd_template.c > @@ -30,146 +30,164 @@ > #include "inline_asm.h" > > // put_pixels > -av_unused STATIC void DEF(put, pixels8_xy2)(uint8_t *block, const uint8_t *pixels, > - ptrdiff_t line_size, int h) > +av_unused STATIC void DEF(put, pixels8_xy2)(uint8_t *block, > + const uint8_t *pixels, > + ptrdiff_t line_size, int h) unrelated whitespace change > { > - MOVQ_ZERO(mm7); > - SET_RND(mm6); // =2 for rnd and =1 for no_rnd version > - __asm__ volatile( > - "movq (%1), %%mm0 \n\t" > - "movq 1(%1), %%mm4 \n\t" > - "movq %%mm0, %%mm1 \n\t" > - "movq %%mm4, %%mm5 \n\t" > - "punpcklbw %%mm7, %%mm0 \n\t" > - "punpcklbw %%mm7, %%mm4 \n\t" > - "punpckhbw %%mm7, %%mm1 \n\t" > - "punpckhbw %%mm7, %%mm5 \n\t" > - "paddusw %%mm0, %%mm4 \n\t" > - "paddusw %%mm1, %%mm5 \n\t" > - "xor %%"FF_REG_a", %%"FF_REG_a" \n\t" > - "add %3, %1 \n\t" > - ".p2align 3 \n\t" > - "1: \n\t" > - "movq (%1, %%"FF_REG_a"), %%mm0 \n\t" > - "movq 1(%1, %%"FF_REG_a"), %%mm2 \n\t" > - "movq %%mm0, %%mm1 \n\t" > - "movq %%mm2, %%mm3 \n\t" > - "punpcklbw %%mm7, %%mm0 \n\t" > - "punpcklbw %%mm7, %%mm2 \n\t" > - "punpckhbw %%mm7, %%mm1 \n\t" > - "punpckhbw %%mm7, %%mm3 \n\t" > - "paddusw %%mm2, %%mm0 \n\t" > - "paddusw %%mm3, %%mm1 \n\t" > - "paddusw %%mm6, %%mm4 \n\t" > - "paddusw %%mm6, %%mm5 \n\t" > - "paddusw %%mm0, %%mm4 \n\t" > - "paddusw %%mm1, %%mm5 \n\t" > - "psrlw $2, %%mm4 \n\t" > - "psrlw $2, %%mm5 \n\t" > - "packuswb %%mm5, %%mm4 \n\t" > - "movq %%mm4, (%2, %%"FF_REG_a") \n\t" > - "add %3, %%"FF_REG_a" \n\t" > + x86_reg i = 0; > + __asm__ ( > + MOVQ_ZERO_TPL(mm7) > + SET_RND_TPL(mm6) // =2 for rnd and =1 for no_rnd version > + "movq (%[pixels]), %%mm0 \n\t" > + "movq 1(%[pixels]), %%mm4 \n\t" > + "movq %%mm0, %%mm1 \n\t" > + "movq %%mm4, %%mm5 \n\t" > + "punpcklbw %%mm7, %%mm0 \n\t" > + "punpcklbw %%mm7, %%mm4 \n\t" > + "punpckhbw %%mm7, %%mm1 \n\t" > + "punpckhbw %%mm7, %%mm5 \n\t" > + "paddusw %%mm0, %%mm4 \n\t" > + "paddusw %%mm1, %%mm5 \n\t" > + "add %[line_size], %[pixels] \n\t" > + ".p2align 3 \n\t" > + "1: \n\t" > + "movq (%[pixels], %[i]), %%mm0 \n\t" > + "movq 1(%[pixels], %[i]), %%mm2 \n\t" > + "movq %%mm0, %%mm1 \n\t" > + "movq %%mm2, %%mm3 \n\t" > + "punpcklbw %%mm7, %%mm0 \n\t" > + "punpcklbw %%mm7, %%mm2 \n\t" > + "punpckhbw %%mm7, %%mm1 \n\t" > + "punpckhbw %%mm7, %%mm3 \n\t" > + "paddusw %%mm2, %%mm0 \n\t" > + "paddusw %%mm3, %%mm1 \n\t" > + "paddusw %%mm6, %%mm4 \n\t" > + "paddusw %%mm6, %%mm5 \n\t" > + "paddusw %%mm0, %%mm4 \n\t" > + "paddusw %%mm1, %%mm5 \n\t" > + "psrlw $2, %%mm4 \n\t" > + "psrlw $2, %%mm5 \n\t" > + "packuswb %%mm5, %%mm4 \n\t" > + "movq %%mm4, (%[block], %[i]) \n\t" > + "add %[line_size], %[i] \n\t" this is full of unrelated reformating, making the patchset unreviewable each class of fixes should be in its own patch probably too but especially whitespace changes / reformating should be in a seperate patch so the reviewer can see what is changed Also its very important that the patch makes it clear not just what is changed but WHY each individual change is done. You may be assuming this is sloppy written code and do a few quick fixes and this is true for some parts. Bbut some of the inline asm code we had is written as it is because writing it "nicer" didnt work with some compilers. Especially anything that requires the compiler to be smart and optimize code or find the one possible solution can be a problem. I wanted to mention a few more things like nasm/yasm and when they might be worth looking at but i think thats a few steps too much at once. lets first see what fixes there actually are Can you resubmit thispatchset in a more reviewable form ? Thanks [...]
Thank you for your answers. As you have pointed out, these patches are full of unrelated changes that are not important for safety. Most of them were never intended to be posted here, the diff we submitted was the one of an experimental branch and we apologize to have made such a mistake. We will resubmit the patch with only essential patches in a more appropriate format very soon (git format-patch). @Michael These errors come with the Clang compiler, aren't they? We are aware that support for inline assembly may differ from one compiler to another despite the "higly-compatibility" that is stated. The safety patches we are proposing do not rely on them. @Carl @Kieran So far, we passed the FATE tests. The output is slightly different because we have merged contiguous assembly statement such that the compiler can no longer insert instruction between them, but the differences are only instruction swaps A.B.C -> B.A.C. Frédéric Recoules De: "Michael Niedermayer" <michael@niedermayer.cc> À: "ffmpeg-devel" <ffmpeg-devel@ffmpeg.org> Envoyé: Samedi 4 Avril 2020 00:39:13 Objet: Re: [FFmpeg-devel] [inline assembly compliance] Issues and patches On Fri, Apr 03, 2020 at 10:41:58PM +0200, FRÉDÉRIC RECOULES wrote: > Dear developpers, > > > > we are academic researchers working in automated program analysis. > We are currently interested in checking compliance of inline asm chunks > as found in C programs. > > While benchmarking our tool and technique, we found a number of issues in > FFMPEG. We report them to you, as well as adequate patches. > Actually, we found 59 significant compliance issues in your code. > We join 3 patches for some of them, together with explanations and > we can send you other patches on demand. > > > * All these bugs are related to compliance between the block of asm and its > surrounding "contract" (in gcc-style notation). They are akin to undefined or > implementation-defined behaviours in C: they currently do not manifest > themselves in your program, but at some point in time with compiler > optimizations becoming more and more aggressive or changes in undocumented > compiler choices regarding asm chunks, they can suddenly trigger a > (hard-to-find) bug. > > * The typical problems come from the compiler missing dataflow information > and performing undue optimizations on this wrong basis, or the compiler > allocating an already used register. Actually, we demonstrate "in lab" problems > with all these categories of bugs in case of inlining > (especially with LTO enabler) or code refactoring. > > * Some of those issues may seems benign or irrealistic but it cost nothing > to patch so, why not do it? > > > We would be very interested to hear your opinion on these matters. > Are you interested in such errors and patches? > Also, besides the patches, we are currently working on a code analyzer > prototype designed to check asm compliance and to propose patches when the > chunk is not compliant. This is still work in progress and we are finalizing it. > The errors and patches I reported to you came from my prototype. > In case such a prototype would be made available, would you consider using it? > > > > Best regards > > Frédéric Recoules This breaks build with shared libs In file included from src/libavcodec/x86/hpeldsp_init.c:104: src/libavcodec/x86/rnd_template.c:102:30: error: dereference of pointer to incomplete type 'const uint8_t []' "m" (*(const uint8_t (*)[])pixels), ^ src/libavcodec/x86/rnd_template.c:188:30: error: dereference of pointer to incomplete type 'const uint8_t []' "m" (*(const uint8_t (*)[])pixels), ^ In file included from src/libavcodec/x86/hpeldsp_init.c:139: src/libavcodec/x86/rnd_template.c:102:30: error: dereference of pointer to incomplete type 'const uint8_t []' "m" (*(const uint8_t (*)[])pixels), ^ src/libavcodec/x86/rnd_template.c:188:30: error: dereference of pointer to incomplete type 'const uint8_t []' "m" (*(const uint8_t (*)[])pixels), ^ 4 errors generated. src/ffbuild/common.mak:59: recipe for target 'libavcodec/x86/hpeldsp_init.o' failed make: *** [libavcodec/x86/hpeldsp_init.o] Error 1 [...] > diff --git a/libavcodec/x86/rnd_template.c b/libavcodec/x86/rnd_template.c > index 09946bd23f..1be010e066 100644 > --- a/libavcodec/x86/rnd_template.c > +++ b/libavcodec/x86/rnd_template.c > @@ -30,146 +30,164 @@ > #include "inline_asm.h" > > // put_pixels > -av_unused STATIC void DEF(put, pixels8_xy2)(uint8_t *block, const uint8_t *pixels, > - ptrdiff_t line_size, int h) > +av_unused STATIC void DEF(put, pixels8_xy2)(uint8_t *block, > + const uint8_t *pixels, > + ptrdiff_t line_size, int h) unrelated whitespace change > { > - MOVQ_ZERO(mm7); > - SET_RND(mm6); // =2 for rnd and =1 for no_rnd version > - __asm__ volatile( > - "movq (%1), %%mm0 \n\t" > - "movq 1(%1), %%mm4 \n\t" > - "movq %%mm0, %%mm1 \n\t" > - "movq %%mm4, %%mm5 \n\t" > - "punpcklbw %%mm7, %%mm0 \n\t" > - "punpcklbw %%mm7, %%mm4 \n\t" > - "punpckhbw %%mm7, %%mm1 \n\t" > - "punpckhbw %%mm7, %%mm5 \n\t" > - "paddusw %%mm0, %%mm4 \n\t" > - "paddusw %%mm1, %%mm5 \n\t" > - "xor %%"FF_REG_a", %%"FF_REG_a" \n\t" > - "add %3, %1 \n\t" > - ".p2align 3 \n\t" > - "1: \n\t" > - "movq (%1, %%"FF_REG_a"), %%mm0 \n\t" > - "movq 1(%1, %%"FF_REG_a"), %%mm2 \n\t" > - "movq %%mm0, %%mm1 \n\t" > - "movq %%mm2, %%mm3 \n\t" > - "punpcklbw %%mm7, %%mm0 \n\t" > - "punpcklbw %%mm7, %%mm2 \n\t" > - "punpckhbw %%mm7, %%mm1 \n\t" > - "punpckhbw %%mm7, %%mm3 \n\t" > - "paddusw %%mm2, %%mm0 \n\t" > - "paddusw %%mm3, %%mm1 \n\t" > - "paddusw %%mm6, %%mm4 \n\t" > - "paddusw %%mm6, %%mm5 \n\t" > - "paddusw %%mm0, %%mm4 \n\t" > - "paddusw %%mm1, %%mm5 \n\t" > - "psrlw $2, %%mm4 \n\t" > - "psrlw $2, %%mm5 \n\t" > - "packuswb %%mm5, %%mm4 \n\t" > - "movq %%mm4, (%2, %%"FF_REG_a") \n\t" > - "add %3, %%"FF_REG_a" \n\t" > + x86_reg i = 0; > + __asm__ ( > + MOVQ_ZERO_TPL(mm7) > + SET_RND_TPL(mm6) // =2 for rnd and =1 for no_rnd version > + "movq (%[pixels]), %%mm0 \n\t" > + "movq 1(%[pixels]), %%mm4 \n\t" > + "movq %%mm0, %%mm1 \n\t" > + "movq %%mm4, %%mm5 \n\t" > + "punpcklbw %%mm7, %%mm0 \n\t" > + "punpcklbw %%mm7, %%mm4 \n\t" > + "punpckhbw %%mm7, %%mm1 \n\t" > + "punpckhbw %%mm7, %%mm5 \n\t" > + "paddusw %%mm0, %%mm4 \n\t" > + "paddusw %%mm1, %%mm5 \n\t" > + "add %[line_size], %[pixels] \n\t" > + ".p2align 3 \n\t" > + "1: \n\t" > + "movq (%[pixels], %[i]), %%mm0 \n\t" > + "movq 1(%[pixels], %[i]), %%mm2 \n\t" > + "movq %%mm0, %%mm1 \n\t" > + "movq %%mm2, %%mm3 \n\t" > + "punpcklbw %%mm7, %%mm0 \n\t" > + "punpcklbw %%mm7, %%mm2 \n\t" > + "punpckhbw %%mm7, %%mm1 \n\t" > + "punpckhbw %%mm7, %%mm3 \n\t" > + "paddusw %%mm2, %%mm0 \n\t" > + "paddusw %%mm3, %%mm1 \n\t" > + "paddusw %%mm6, %%mm4 \n\t" > + "paddusw %%mm6, %%mm5 \n\t" > + "paddusw %%mm0, %%mm4 \n\t" > + "paddusw %%mm1, %%mm5 \n\t" > + "psrlw $2, %%mm4 \n\t" > + "psrlw $2, %%mm5 \n\t" > + "packuswb %%mm5, %%mm4 \n\t" > + "movq %%mm4, (%[block], %[i]) \n\t" > + "add %[line_size], %[i] \n\t" this is full of unrelated reformating, making the patchset unreviewable each class of fixes should be in its own patch probably too but especially whitespace changes / reformating should be in a seperate patch so the reviewer can see what is changed Also its very important that the patch makes it clear not just what is changed but WHY each individual change is done. You may be assuming this is sloppy written code and do a few quick fixes and this is true for some parts. Bbut some of the inline asm code we had is written as it is because writing it "nicer" didnt work with some compilers. Especially anything that requires the compiler to be smart and optimize code or find the one possible solution can be a problem. I wanted to mention a few more things like nasm/yasm and when they might be worth looking at but i think thats a few steps too much at once. lets first see what fixes there actually are Can you resubmit thispatchset in a more reviewable form ? Thanks [...]
On Sat, Apr 04, 2020 at 12:28:43PM +0200, FRÉDÉRIC RECOULES wrote: > Thank you for your answers. > > As you have pointed out, these patches are full of unrelated changes that are not important for safety. > Most of them were never intended to be posted here, the diff we submitted was the one of an > experimental branch and we apologize to have made such a mistake. > We will resubmit the patch with only essential patches in a more appropriate format very soon > (git format-patch). > > @Michael These errors come with the Clang compiler, aren't they? yes, seems this was clang version 4.0.0 (trunk 283753) i can retest and find a mininal command line for configure which triggers this in case you cannot reproduce it ? > We are aware that support for inline assembly may differ from one compiler to another > despite the "higly-compatibility" that is stated. The safety patches we are proposing > do not rely on them. > > @Carl @Kieran So far, we passed the FATE tests. > The output is slightly different because we have merged contiguous assembly statement > such that the compiler can no longer insert instruction between them, but the differences > are only instruction swaps A.B.C -> B.A.C. [...]
> i can retest and find a mininal command line for configure which triggers this > in case you cannot reproduce it ? Thank you, but it is a known (non) issue of Clang and is reproducible since at least 3.4.1. The fact is that GCC uses incomplete types in to say "block of unknown size" but Clang have chosen to not follow it on this. Also, I have resubmitted the patches and I hope they will be more readable and useful. Regards, Frédéric Recoules De: "Michael Niedermayer" <michael@niedermayer.cc> À: "ffmpeg-devel" <ffmpeg-devel@ffmpeg.org> Envoyé: Samedi 4 Avril 2020 18:55:27 Objet: Re: [FFmpeg-devel] [inline assembly compliance] Issues and patches On Sat, Apr 04, 2020 at 12:28:43PM +0200, FRÉDÉRIC RECOULES wrote: > Thank you for your answers. > > As you have pointed out, these patches are full of unrelated changes that are not important for safety. > Most of them were never intended to be posted here, the diff we submitted was the one of an > experimental branch and we apologize to have made such a mistake. > We will resubmit the patch with only essential patches in a more appropriate format very soon > (git format-patch). > > @Michael These errors come with the Clang compiler, aren't they? yes, seems this was clang version 4.0.0 (trunk 283753) i can retest and find a mininal command line for configure which triggers this in case you cannot reproduce it ? > We are aware that support for inline assembly may differ from one compiler to another > despite the "higly-compatibility" that is stated. The safety patches we are proposing > do not rely on them. > > @Carl @Kieran So far, we passed the FATE tests. > The output is slightly different because we have merged contiguous assembly statement > such that the compiler can no longer insert instruction between them, but the differences > are only instruction swaps A.B.C -> B.A.C. [...]
Hi Michael, We are looking forward to hear news about the patches ([PATCH /5] x86 inline assembly compliance). Did you have time to take a look? We are trying to keep track of the state of the assembly in projects in which we found issues. Please recall that they should be now small, very local, easy to review changes and that they pass the FATE tests. And please contact us if you need more information about them or about the other issues we are willing to patch in the future. Regards, Frédéric Recoules De: "FRÉDÉRIC RECOULES" <frederic.recoules@univ-grenoble-alpes.fr> À: "ffmpeg-devel" <ffmpeg-devel@ffmpeg.org> Envoyé: Samedi 4 Avril 2020 19:55:13 Objet: Re: [FFmpeg-devel] [inline assembly compliance] Issues and patches > i can retest and find a mininal command line for configure which triggers this > in case you cannot reproduce it ? Thank you, but it is a known (non) issue of Clang and is reproducible since at least 3.4.1. The fact is that GCC uses incomplete types in to say "block of unknown size" but Clang have chosen to not follow it on this. Also, I have resubmitted the patches and I hope they will be more readable and useful. Regards, Frédéric Recoules De: "Michael Niedermayer" <michael@niedermayer.cc> À: "ffmpeg-devel" <ffmpeg-devel@ffmpeg.org> Envoyé: Samedi 4 Avril 2020 18:55:27 Objet: Re: [FFmpeg-devel] [inline assembly compliance] Issues and patches On Sat, Apr 04, 2020 at 12:28:43PM +0200, FRÉDÉRIC RECOULES wrote: > Thank you for your answers. > > As you have pointed out, these patches are full of unrelated changes that are not important for safety. > Most of them were never intended to be posted here, the diff we submitted was the one of an > experimental branch and we apologize to have made such a mistake. > We will resubmit the patch with only essential patches in a more appropriate format very soon > (git format-patch). > > @Michael These errors come with the Clang compiler, aren't they? yes, seems this was clang version 4.0.0 (trunk 283753) i can retest and find a mininal command line for configure which triggers this in case you cannot reproduce it ? > We are aware that support for inline assembly may differ from one compiler to another > despite the "higly-compatibility" that is stated. The safety patches we are proposing > do not rely on them. > > @Carl @Kieran So far, we passed the FATE tests. > The output is slightly different because we have merged contiguous assembly statement > such that the compiler can no longer insert instruction between them, but the differences > are only instruction swaps A.B.C -> B.A.C. [...]
Dear developers, We sent you on April a list of several compliance issues in FFMPEG inline assembly chunks, and we submitted a bunch of patches to fix them. They pass the fate test and can be found at https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=996. Since the changes only concern the interface, the patches are small and localized. Moreover, when we adapted the patches to the project, we found that some inline assembly chunks have already been patched in the past with similar changes. All in all, while the project actually works with unsafe interfaces, the proposed changes would be more "future-proof". Beside some form problems, the patches were well received at first glance (by Michael Niedermayer), but for now, we do not clearly know where the review process stands and if the patches are going to be integrated: we would like to hear from you. Still, if you have any remark or question about patches, please, feel free to contact us. Regards, Frédéric Recoules ----- Mail original ----- De: "FRÉDÉRIC RECOULES" <frederic.recoules@univ-grenoble-alpes.fr> À: "ffmpeg-devel" <ffmpeg-devel@ffmpeg.org> Cc: "Richard Bonichon" <richard.bonichon@gmail.com>, "Sébastien Bardin" <sebastien.bardin@cea.fr> Envoyé: Vendredi 3 Avril 2020 22:41:58 Objet: [FFmpeg-devel] [inline assembly compliance] Issues and patches Dear developpers, we are academic researchers working in automated program analysis. We are currently interested in checking compliance of inline asm chunks as found in C programs. While benchmarking our tool and technique, we found a number of issues in FFMPEG. We report them to you, as well as adequate patches. Actually, we found 59 significant compliance issues in your code. We join 3 patches for some of them, together with explanations and we can send you other patches on demand. * All these bugs are related to compliance between the block of asm and its surrounding "contract" (in gcc-style notation). They are akin to undefined or implementation-defined behaviours in C: they currently do not manifest themselves in your program, but at some point in time with compiler optimizations becoming more and more aggressive or changes in undocumented compiler choices regarding asm chunks, they can suddenly trigger a (hard-to-find) bug. * The typical problems come from the compiler missing dataflow information and performing undue optimizations on this wrong basis, or the compiler allocating an already used register. Actually, we demonstrate "in lab" problems with all these categories of bugs in case of inlining (especially with LTO enabler) or code refactoring. * Some of those issues may seems benign or irrealistic but it cost nothing to patch so, why not do it? We would be very interested to hear your opinion on these matters. Are you interested in such errors and patches? Also, besides the patches, we are currently working on a code analyzer prototype designed to check asm compliance and to propose patches when the chunk is not compliant. This is still work in progress and we are finalizing it. The errors and patches I reported to you came from my prototype. In case such a prototype would be made available, would you consider using it? Best regards Frédéric Recoules _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
diff --git a/configure b/configure index dcead3a300..2e4b68915d 100755 --- a/configure +++ b/configure @@ -2250,6 +2250,7 @@ TOOLCHAIN_FEATURES=" symver_gnu_asm vfp_args xform_asm + mmx_clobbers xmm_clobbers " @@ -5792,6 +5793,8 @@ EOF check_inline_asm ebx_available '""::"b"(0)' && check_inline_asm ebx_available '"":::"%ebx"' + # check whether xmm clobbers are supported + check_inline_asm mmx_clobbers '"":::"%mm0"' # check whether xmm clobbers are supported check_inline_asm xmm_clobbers '"":::"%xmm0"' diff --git a/libavcodec/x86/hpeldsp_init.c b/libavcodec/x86/hpeldsp_init.c index d89928cec6..17501a2b5a 100644 --- a/libavcodec/x86/hpeldsp_init.c +++ b/libavcodec/x86/hpeldsp_init.c @@ -95,6 +95,8 @@ void ff_avg_approx_pixels8_xy2_3dnow(uint8_t *block, const uint8_t *pixels, /* MMX no rounding */ #define DEF(x, y) x ## _no_rnd_ ## y ## _mmx #define SET_RND MOVQ_WONE +#define SET_RND_TPL MOVQ_WONE_TPL +#define SET_RND_IN_COMMA #define PAVGBP(a, b, c, d, e, f) PAVGBP_MMX_NO_RND(a, b, c, d, e, f) #define PAVGB(a, b, c, e) PAVGB_MMX_NO_RND(a, b, c, e) #define STATIC static @@ -104,6 +106,8 @@ void ff_avg_approx_pixels8_xy2_3dnow(uint8_t *block, const uint8_t *pixels, #undef DEF #undef SET_RND +#undef SET_RND_TPL +#undef SET_RND_IN_COMMA #undef PAVGBP #undef PAVGB #undef STATIC @@ -121,6 +125,8 @@ CALL_2X_PIXELS(put_no_rnd_pixels16_xy2_mmx, put_no_rnd_pixels8_xy2_mmx, 8) #define DEF(x, y) x ## _ ## y ## _mmx #define SET_RND MOVQ_WTWO +#define SET_RND_TPL MOVQ_WTWO_TPL +#define SET_RND_IN_COMMA MOVQ_WTWO_IN_COMMA #define PAVGBP(a, b, c, d, e, f) PAVGBP_MMX(a, b, c, d, e, f) #define PAVGB(a, b, c, e) PAVGB_MMX(a, b, c, e) @@ -134,6 +140,8 @@ CALL_2X_PIXELS(put_no_rnd_pixels16_xy2_mmx, put_no_rnd_pixels8_xy2_mmx, 8) #undef DEF #undef SET_RND +#undef SET_RND_TPL +#undef SET_RND_IN_COMMA #undef PAVGBP #undef PAVGB diff --git a/libavcodec/x86/inline_asm.h b/libavcodec/x86/inline_asm.h index 0198746719..565e9e260a 100644 --- a/libavcodec/x86/inline_asm.h +++ b/libavcodec/x86/inline_asm.h @@ -23,13 +23,16 @@ #include "constants.h" -#define MOVQ_WONE(regd) \ - __asm__ volatile ( \ - "pcmpeqd %%" #regd ", %%" #regd " \n\t" \ - "psrlw $15, %%" #regd ::) +#define MOVQ_WONE_TPL(regd) \ + "pcmpeqd %%"#regd", %%"#regd" \n\t" \ + "psrlw $15, %%" #regd" \n\t" + +#define MOVQ_WONE(regd) __asm__ volatile (MOVQ_WONE_TPL(regd) ::) #define JUMPALIGN() __asm__ volatile (".p2align 3"::) -#define MOVQ_ZERO(regd) __asm__ volatile ("pxor %%"#regd", %%"#regd ::) + +#define MOVQ_ZERO_TPL(regd) "pxor %%"#regd", %%"#regd" \n\t" +#define MOVQ_ZERO(regd) __asm__ volatile (MOVQ_ZERO_TPL(regd) ::) #define MOVQ_BFE(regd) \ __asm__ volatile ( \ @@ -37,17 +40,20 @@ "paddb %%"#regd", %%"#regd" \n\t" ::) #ifndef PIC -#define MOVQ_WTWO(regd) __asm__ volatile ("movq %0, %%"#regd" \n\t" :: "m"(ff_pw_2)) +#define MOVQ_WTWO_TPL(regd) "movq %[ff_pw_2], %%"#regd" \n\t" +#define MOVQ_WTWO_IN [ff_pw_2] "m" (ff_pw_2) +#define MOVQ_WTWO_IN_COMMA MOVQ_WTWO_IN, #else // for shared library it's better to use this way for accessing constants // pcmpeqd -> -1 -#define MOVQ_WTWO(regd) \ - __asm__ volatile ( \ - "pcmpeqd %%"#regd", %%"#regd" \n\t" \ - "psrlw $15, %%"#regd" \n\t" \ - "psllw $1, %%"#regd" \n\t"::) - +#define MOVQ_WTWO_TPL(regd) \ + "pcmpeqd %%"#regd", %%"#regd" \n\t" \ + "psrlw $15, %%"#regd" \n\t" \ + "psllw $1, %%"#regd" \n\t" +#define MOVQ_WTWO_IN +#define MOVQ_WTWO_IN_COMMA #endif +#define MOVQ_WTWO(regd) __asm__ volatile (MOVQ_WTWO_TPL(regd) :: MOVQ_WTWO_IN) // using regr as temporary and for the output result // first argument is unmodified and second is trashed diff --git a/libavcodec/x86/lossless_videoencdsp_init.c b/libavcodec/x86/lossless_videoencdsp_init.c index 40407add52..3f2d9968b7 100644 --- a/libavcodec/x86/lossless_videoencdsp_init.c +++ b/libavcodec/x86/lossless_videoencdsp_init.c @@ -48,29 +48,38 @@ static void sub_median_pred_mmxext(uint8_t *dst, const uint8_t *src1, x86_reg i = 0; uint8_t l, lt; - __asm__ volatile ( - "movq (%1, %0), %%mm0 \n\t" // LT - "psllq $8, %%mm0 \n\t" - "1: \n\t" - "movq (%1, %0), %%mm1 \n\t" // T - "movq -1(%2, %0), %%mm2 \n\t" // L - "movq (%2, %0), %%mm3 \n\t" // X - "movq %%mm2, %%mm4 \n\t" // L - "psubb %%mm0, %%mm2 \n\t" - "paddb %%mm1, %%mm2 \n\t" // L + T - LT - "movq %%mm4, %%mm5 \n\t" // L - "pmaxub %%mm1, %%mm4 \n\t" // max(T, L) - "pminub %%mm5, %%mm1 \n\t" // min(T, L) - "pminub %%mm2, %%mm4 \n\t" - "pmaxub %%mm1, %%mm4 \n\t" - "psubb %%mm4, %%mm3 \n\t" // dst - pred - "movq %%mm3, (%3, %0) \n\t" - "add $8, %0 \n\t" - "movq -1(%1, %0), %%mm0 \n\t" // LT - "cmp %4, %0 \n\t" - " jb 1b \n\t" - : "+r" (i) - : "r" (src1), "r" (src2), "r" (dst), "r" ((x86_reg) w)); + __asm__ + ( + "movq (%[src1], %[i]), %%mm0 \n\t" // LT + "psllq $8, %%mm0 \n\t" + "1: \n\t" + "movq (%[src1], %[i]), %%mm1 \n\t" // T + "movq -1(%[src2], %[i]), %%mm2 \n\t" // L + "movq (%[src2], %[i]), %%mm3 \n\t" // X + "movq %%mm2, %%mm4 \n\t" // L + "psubb %%mm0, %%mm2 \n\t" + "paddb %%mm1, %%mm2 \n\t" // L + T - LT + "movq %%mm4, %%mm5 \n\t" // L + "pmaxub %%mm1, %%mm4 \n\t" // max(T, L) + "pminub %%mm5, %%mm1 \n\t" // min(T, L) + "pminub %%mm2, %%mm4 \n\t" + "pmaxub %%mm1, %%mm4 \n\t" + "psubb %%mm4, %%mm3 \n\t" // dst - pred + "movq %%mm3, (%[dst], %[i]) \n\t" + "add $8, %[i] \n\t" + "movq -1(%[src1], %[i]), %%mm0 \n\t" // LT + "cmp %[w], %[i] \n\t" + "jb 1b \n\t" + : "=m" (*(uint8_t (*)[])dst), + [i] "+&r" (i) + : "m" (*(const uint8_t (*)[])src1), + "m" (*(const uint8_t (*)[])src2), + [src1] "r" (src1), + [src2] "r" (src2), + [dst] "r" (dst), + [w] "r" ((x86_reg) w) + : MMX_CLOBBERS("mm0", "mm1", "mm2", "mm3", "mm4", "mm5") + ); l = *left; lt = *left_top; diff --git a/libavcodec/x86/rnd_template.c b/libavcodec/x86/rnd_template.c index 09946bd23f..1be010e066 100644 --- a/libavcodec/x86/rnd_template.c +++ b/libavcodec/x86/rnd_template.c @@ -30,146 +30,164 @@ #include "inline_asm.h" // put_pixels -av_unused STATIC void DEF(put, pixels8_xy2)(uint8_t *block, const uint8_t *pixels, - ptrdiff_t line_size, int h) +av_unused STATIC void DEF(put, pixels8_xy2)(uint8_t *block, + const uint8_t *pixels, + ptrdiff_t line_size, int h) { - MOVQ_ZERO(mm7); - SET_RND(mm6); // =2 for rnd and =1 for no_rnd version - __asm__ volatile( - "movq (%1), %%mm0 \n\t" - "movq 1(%1), %%mm4 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm4, %%mm5 \n\t" - "punpcklbw %%mm7, %%mm0 \n\t" - "punpcklbw %%mm7, %%mm4 \n\t" - "punpckhbw %%mm7, %%mm1 \n\t" - "punpckhbw %%mm7, %%mm5 \n\t" - "paddusw %%mm0, %%mm4 \n\t" - "paddusw %%mm1, %%mm5 \n\t" - "xor %%"FF_REG_a", %%"FF_REG_a" \n\t" - "add %3, %1 \n\t" - ".p2align 3 \n\t" - "1: \n\t" - "movq (%1, %%"FF_REG_a"), %%mm0 \n\t" - "movq 1(%1, %%"FF_REG_a"), %%mm2 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm2, %%mm3 \n\t" - "punpcklbw %%mm7, %%mm0 \n\t" - "punpcklbw %%mm7, %%mm2 \n\t" - "punpckhbw %%mm7, %%mm1 \n\t" - "punpckhbw %%mm7, %%mm3 \n\t" - "paddusw %%mm2, %%mm0 \n\t" - "paddusw %%mm3, %%mm1 \n\t" - "paddusw %%mm6, %%mm4 \n\t" - "paddusw %%mm6, %%mm5 \n\t" - "paddusw %%mm0, %%mm4 \n\t" - "paddusw %%mm1, %%mm5 \n\t" - "psrlw $2, %%mm4 \n\t" - "psrlw $2, %%mm5 \n\t" - "packuswb %%mm5, %%mm4 \n\t" - "movq %%mm4, (%2, %%"FF_REG_a") \n\t" - "add %3, %%"FF_REG_a" \n\t" + x86_reg i = 0; + __asm__ ( + MOVQ_ZERO_TPL(mm7) + SET_RND_TPL(mm6) // =2 for rnd and =1 for no_rnd version + "movq (%[pixels]), %%mm0 \n\t" + "movq 1(%[pixels]), %%mm4 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm4, %%mm5 \n\t" + "punpcklbw %%mm7, %%mm0 \n\t" + "punpcklbw %%mm7, %%mm4 \n\t" + "punpckhbw %%mm7, %%mm1 \n\t" + "punpckhbw %%mm7, %%mm5 \n\t" + "paddusw %%mm0, %%mm4 \n\t" + "paddusw %%mm1, %%mm5 \n\t" + "add %[line_size], %[pixels] \n\t" + ".p2align 3 \n\t" + "1: \n\t" + "movq (%[pixels], %[i]), %%mm0 \n\t" + "movq 1(%[pixels], %[i]), %%mm2 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm2, %%mm3 \n\t" + "punpcklbw %%mm7, %%mm0 \n\t" + "punpcklbw %%mm7, %%mm2 \n\t" + "punpckhbw %%mm7, %%mm1 \n\t" + "punpckhbw %%mm7, %%mm3 \n\t" + "paddusw %%mm2, %%mm0 \n\t" + "paddusw %%mm3, %%mm1 \n\t" + "paddusw %%mm6, %%mm4 \n\t" + "paddusw %%mm6, %%mm5 \n\t" + "paddusw %%mm0, %%mm4 \n\t" + "paddusw %%mm1, %%mm5 \n\t" + "psrlw $2, %%mm4 \n\t" + "psrlw $2, %%mm5 \n\t" + "packuswb %%mm5, %%mm4 \n\t" + "movq %%mm4, (%[block], %[i]) \n\t" + "add %[line_size], %[i] \n\t" - "movq (%1, %%"FF_REG_a"), %%mm2 \n\t" // 0 <-> 2 1 <-> 3 - "movq 1(%1, %%"FF_REG_a"), %%mm4 \n\t" - "movq %%mm2, %%mm3 \n\t" - "movq %%mm4, %%mm5 \n\t" - "punpcklbw %%mm7, %%mm2 \n\t" - "punpcklbw %%mm7, %%mm4 \n\t" - "punpckhbw %%mm7, %%mm3 \n\t" - "punpckhbw %%mm7, %%mm5 \n\t" - "paddusw %%mm2, %%mm4 \n\t" - "paddusw %%mm3, %%mm5 \n\t" - "paddusw %%mm6, %%mm0 \n\t" - "paddusw %%mm6, %%mm1 \n\t" - "paddusw %%mm4, %%mm0 \n\t" - "paddusw %%mm5, %%mm1 \n\t" - "psrlw $2, %%mm0 \n\t" - "psrlw $2, %%mm1 \n\t" - "packuswb %%mm1, %%mm0 \n\t" - "movq %%mm0, (%2, %%"FF_REG_a") \n\t" - "add %3, %%"FF_REG_a" \n\t" + "movq (%[pixels], %[i]), %%mm2 \n\t" + // 0 <-> 2 1 <-> 3 + "movq 1(%[pixels], %[i]), %%mm4 \n\t" + "movq %%mm2, %%mm3 \n\t" + "movq %%mm4, %%mm5 \n\t" + "punpcklbw %%mm7, %%mm2 \n\t" + "punpcklbw %%mm7, %%mm4 \n\t" + "punpckhbw %%mm7, %%mm3 \n\t" + "punpckhbw %%mm7, %%mm5 \n\t" + "paddusw %%mm2, %%mm4 \n\t" + "paddusw %%mm3, %%mm5 \n\t" + "paddusw %%mm6, %%mm0 \n\t" + "paddusw %%mm6, %%mm1 \n\t" + "paddusw %%mm4, %%mm0 \n\t" + "paddusw %%mm5, %%mm1 \n\t" + "psrlw $2, %%mm0 \n\t" + "psrlw $2, %%mm1 \n\t" + "packuswb %%mm1, %%mm0 \n\t" + "movq %%mm0, (%[block], %[i]) \n\t" + "add %[line_size], %[i] \n\t" - "subl $2, %0 \n\t" - "jnz 1b \n\t" - :"+g"(h), "+S"(pixels) - :"D"(block), "r"((x86_reg)line_size) - :FF_REG_a, "memory"); + "subl $2, %[h] \n\t" + "jnz 1b \n\t" + : "=m" (*(uint8_t (*)[])block), + [h] "+&g" (h), + [pixels] "+&S" (pixels), + [i] "+&r" (i) + : SET_RND_IN_COMMA + "m" (*(const uint8_t (*)[])pixels), + [block] "D" (block), + [line_size] "r" ((x86_reg)line_size) + : MMX_CLOBBERS("mm0", "mm1", "mm2", "mm3", + "mm4", "mm5", "mm6", "mm7")); } // avg_pixels // this routine is 'slightly' suboptimal but mostly unused -av_unused STATIC void DEF(avg, pixels8_xy2)(uint8_t *block, const uint8_t *pixels, - ptrdiff_t line_size, int h) +av_unused STATIC void DEF(avg, pixels8_xy2)(uint8_t *block, + const uint8_t *pixels, + ptrdiff_t line_size, int h) { - MOVQ_ZERO(mm7); - SET_RND(mm6); // =2 for rnd and =1 for no_rnd version - __asm__ volatile( - "movq (%1), %%mm0 \n\t" - "movq 1(%1), %%mm4 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm4, %%mm5 \n\t" - "punpcklbw %%mm7, %%mm0 \n\t" - "punpcklbw %%mm7, %%mm4 \n\t" - "punpckhbw %%mm7, %%mm1 \n\t" - "punpckhbw %%mm7, %%mm5 \n\t" - "paddusw %%mm0, %%mm4 \n\t" - "paddusw %%mm1, %%mm5 \n\t" - "xor %%"FF_REG_a", %%"FF_REG_a" \n\t" - "add %3, %1 \n\t" - ".p2align 3 \n\t" - "1: \n\t" - "movq (%1, %%"FF_REG_a"), %%mm0 \n\t" - "movq 1(%1, %%"FF_REG_a"), %%mm2 \n\t" - "movq %%mm0, %%mm1 \n\t" - "movq %%mm2, %%mm3 \n\t" - "punpcklbw %%mm7, %%mm0 \n\t" - "punpcklbw %%mm7, %%mm2 \n\t" - "punpckhbw %%mm7, %%mm1 \n\t" - "punpckhbw %%mm7, %%mm3 \n\t" - "paddusw %%mm2, %%mm0 \n\t" - "paddusw %%mm3, %%mm1 \n\t" - "paddusw %%mm6, %%mm4 \n\t" - "paddusw %%mm6, %%mm5 \n\t" - "paddusw %%mm0, %%mm4 \n\t" - "paddusw %%mm1, %%mm5 \n\t" - "psrlw $2, %%mm4 \n\t" - "psrlw $2, %%mm5 \n\t" - "movq (%2, %%"FF_REG_a"), %%mm3 \n\t" - "packuswb %%mm5, %%mm4 \n\t" - "pcmpeqd %%mm2, %%mm2 \n\t" - "paddb %%mm2, %%mm2 \n\t" - PAVGB_MMX(%%mm3, %%mm4, %%mm5, %%mm2) - "movq %%mm5, (%2, %%"FF_REG_a") \n\t" - "add %3, %%"FF_REG_a" \n\t" + x86_reg i = 0; + __asm__ ( + MOVQ_ZERO_TPL(mm7) + SET_RND_TPL(mm6) // =2 for rnd and =1 for no_rnd version + "movq (%[pixels]), %%mm0 \n\t" + "movq 1(%[pixels]), %%mm4 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm4, %%mm5 \n\t" + "punpcklbw %%mm7, %%mm0 \n\t" + "punpcklbw %%mm7, %%mm4 \n\t" + "punpckhbw %%mm7, %%mm1 \n\t" + "punpckhbw %%mm7, %%mm5 \n\t" + "paddusw %%mm0, %%mm4 \n\t" + "paddusw %%mm1, %%mm5 \n\t" + "add %[line_size], %[pixels] \n\t" + ".p2align 3 \n\t" + "1: \n\t" + "movq (%[pixels], %[i]), %%mm0 \n\t" + "movq 1(%[pixels], %[i]), %%mm2 \n\t" + "movq %%mm0, %%mm1 \n\t" + "movq %%mm2, %%mm3 \n\t" + "punpcklbw %%mm7, %%mm0 \n\t" + "punpcklbw %%mm7, %%mm2 \n\t" + "punpckhbw %%mm7, %%mm1 \n\t" + "punpckhbw %%mm7, %%mm3 \n\t" + "paddusw %%mm2, %%mm0 \n\t" + "paddusw %%mm3, %%mm1 \n\t" + "paddusw %%mm6, %%mm4 \n\t" + "paddusw %%mm6, %%mm5 \n\t" + "paddusw %%mm0, %%mm4 \n\t" + "paddusw %%mm1, %%mm5 \n\t" + "psrlw $2, %%mm4 \n\t" + "psrlw $2, %%mm5 \n\t" + "movq (%[block], %[i]), %%mm3 \n\t" + "packuswb %%mm5, %%mm4 \n\t" + "pcmpeqd %%mm2, %%mm2 \n\t" + "paddb %%mm2, %%mm2 \n\t" + PAVGB_MMX(%%mm3, %%mm4, %%mm5, %%mm2) + "movq %%mm5, (%[block], %[i]) \n\t" + "add %[line_size], %[i] \n\t" - "movq (%1, %%"FF_REG_a"), %%mm2 \n\t" // 0 <-> 2 1 <-> 3 - "movq 1(%1, %%"FF_REG_a"), %%mm4 \n\t" - "movq %%mm2, %%mm3 \n\t" - "movq %%mm4, %%mm5 \n\t" - "punpcklbw %%mm7, %%mm2 \n\t" - "punpcklbw %%mm7, %%mm4 \n\t" - "punpckhbw %%mm7, %%mm3 \n\t" - "punpckhbw %%mm7, %%mm5 \n\t" - "paddusw %%mm2, %%mm4 \n\t" - "paddusw %%mm3, %%mm5 \n\t" - "paddusw %%mm6, %%mm0 \n\t" - "paddusw %%mm6, %%mm1 \n\t" - "paddusw %%mm4, %%mm0 \n\t" - "paddusw %%mm5, %%mm1 \n\t" - "psrlw $2, %%mm0 \n\t" - "psrlw $2, %%mm1 \n\t" - "movq (%2, %%"FF_REG_a"), %%mm3 \n\t" - "packuswb %%mm1, %%mm0 \n\t" - "pcmpeqd %%mm2, %%mm2 \n\t" - "paddb %%mm2, %%mm2 \n\t" - PAVGB_MMX(%%mm3, %%mm0, %%mm1, %%mm2) - "movq %%mm1, (%2, %%"FF_REG_a") \n\t" - "add %3, %%"FF_REG_a" \n\t" + "movq (%[pixels], %[i]), %%mm2 \n\t" + // 0 <-> 2 1 <-> 3 + "movq 1(%[pixels], %[i]), %%mm4 \n\t" + "movq %%mm2, %%mm3 \n\t" + "movq %%mm4, %%mm5 \n\t" + "punpcklbw %%mm7, %%mm2 \n\t" + "punpcklbw %%mm7, %%mm4 \n\t" + "punpckhbw %%mm7, %%mm3 \n\t" + "punpckhbw %%mm7, %%mm5 \n\t" + "paddusw %%mm2, %%mm4 \n\t" + "paddusw %%mm3, %%mm5 \n\t" + "paddusw %%mm6, %%mm0 \n\t" + "paddusw %%mm6, %%mm1 \n\t" + "paddusw %%mm4, %%mm0 \n\t" + "paddusw %%mm5, %%mm1 \n\t" + "psrlw $2, %%mm0 \n\t" + "psrlw $2, %%mm1 \n\t" + "movq (%[block], %[i]), %%mm3 \n\t" + "packuswb %%mm1, %%mm0 \n\t" + "pcmpeqd %%mm2, %%mm2 \n\t" + "paddb %%mm2, %%mm2 \n\t" + PAVGB_MMX(%%mm3, %%mm0, %%mm1, %%mm2) + "movq %%mm1, (%[block], %[i]) \n\t" + "add %[line_size], %[i] \n\t" - "subl $2, %0 \n\t" - "jnz 1b \n\t" - :"+g"(h), "+S"(pixels) - :"D"(block), "r"((x86_reg)line_size) - :FF_REG_a, "memory"); + "subl $2, %[h] \n\t" + "jnz 1b \n\t" + : "=m" (*(uint8_t (*)[])block), + [h] "+&g" (h), + [pixels] "+&S" (pixels), + [i] "+&r" (i) + : SET_RND_IN_COMMA + "m" (*(const uint8_t (*)[])pixels), + [block] "D" (block), + [line_size] "r" ((x86_reg)line_size) + : MMX_CLOBBERS("mm0", "mm1", "mm2", "mm3", + "mm4", "mm5", "mm6", "mm7")); } diff --git a/libavutil/x86/asm.h b/libavutil/x86/asm.h index 9bff42d628..bb3c13f5c1 100644 --- a/libavutil/x86/asm.h +++ b/libavutil/x86/asm.h @@ -79,6 +79,26 @@ typedef int x86_reg; # define BROKEN_RELOCATIONS 1 #endif +/* + * If gcc is not set to support mmx (-mmmx) it will not accept mmx registers + * in the clobber list for inline asm. MMX_CLOBBERS takes a list of mmx + * registers to be marked as clobbered and evaluates to nothing if they are + * not supported, or to the list itself if they are supported. Since a clobber + * list may not be empty, XMM_CLOBBERS_ONLY should be used if the mmx + * registers are the only in the clobber list. + * For example a list with "eax" and "mm0" as clobbers should become: + * : MMX_CLOBBERS("mm0",) "eax" + * and a list with only "mm0" should become: + * MMX_CLOBBERS_ONLY("mm0") + */ +#if HAVE_MMX_CLOBBERS +# define MMX_CLOBBERS(...) __VA_ARGS__ +# define MMX_CLOBBERS_ONLY(...) : __VA_ARGS__ +#else +# define MMX_CLOBBERS(...) +# define MMX_CLOBBERS_ONLY(...) +#endif + /* * If gcc is not set to support sse (-msse) it will not accept xmm registers * in the clobber list for inline asm. XMM_CLOBBERS takes a list of xmm -- 2.17.1