Message ID | 20211029151903.1078367-7-onemda@gmail.com |
---|---|
State | New |
Series | [FFmpeg-devel,1/7] avfilter/vf_nlmeans: use more friendlier 'for (int ...' |
Context | Check | Description |
---|---|---|
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
andriy/make_ppc | success | Make finished |
andriy/make_fate_ppc | success | Make fate finished |
On Fri, Oct 29, 2021 at 05:19:03PM +0200, Paul B Mahol wrote:
> Signed-off-by: Paul B Mahol <onemda@gmail.com>
> ---
>  libavfilter/vf_nlmeans.c          |  3 +
>  libavfilter/vf_nlmeans.h          |  1 +
>  libavfilter/x86/Makefile          |  2 +
>  libavfilter/x86/vf_nlmeans.asm    | 92 +++++++++++++++++++++++++++++++
>  libavfilter/x86/vf_nlmeans_init.c | 40 ++++++++++++++
>  5 files changed, 138 insertions(+)
>  create mode 100644 libavfilter/x86/vf_nlmeans.asm
>  create mode 100644 libavfilter/x86/vf_nlmeans_init.c

on x86-32 linux:

src/libavfilter/x86/vf_nlmeans.asm:42: error: symbol `r9q' undefined
src//libavutil/x86/x86inc.asm:298: ... from macro `movsxdifnidn' defined here
src/libavfilter/x86/vf_nlmeans.asm:43: error: symbol `r10q' undefined
src//libavutil/x86/x86inc.asm:298: ... from macro `movsxdifnidn' defined here
src/libavfilter/x86/vf_nlmeans.asm:44: error: symbol `r8q' undefined
src//libavutil/x86/x86inc.asm:298: ... from macro `movsxdifnidn' defined here
src/libavfilter/x86/vf_nlmeans.asm:46: error: symbol `r10q' undefined
src//libavutil/x86/x86inc.asm:1154: ... from macro `sub' defined here
src/libavfilter/x86/vf_nlmeans.asm:47: error: symbol `r11q' undefined
src/libavfilter/x86/vf_nlmeans.asm:48: error: symbol `r11q' undefined
src//libavutil/x86/x86inc.asm:1151: ... from macro `sub' defined here
src/libavfilter/x86/vf_nlmeans.asm:49: error: symbol `r11q' undefined
src/libavfilter/x86/vf_nlmeans.asm:50: error: symbol `r10q' undefined
src/libavfilter/x86/vf_nlmeans.asm:51: error: symbol `r10q' undefined
src//libavutil/x86/x86inc.asm:1142: ... from macro `add' defined here
src/libavfilter/x86/vf_nlmeans.asm:53: error: symbol `r11q' undefined
src/libavfilter/x86/vf_nlmeans.asm:54: error: symbol `r9q' undefined
src/libavfilter/x86/vf_nlmeans.asm:60: error: symbol `r11q' undefined
src//libavutil/x86/x86inc.asm:1398: ... from macro `movdqu' defined here
src//libavutil/x86/x86inc.asm:1263: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1716: ... from macro `vmovdqu' defined here
src/libavfilter/x86/vf_nlmeans.asm:61: error: symbol `r11q' undefined
src//libavutil/x86/x86inc.asm:1398: ... from macro `movdqu' defined here
src//libavutil/x86/x86inc.asm:1263: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1716: ... from macro `vmovdqu' defined here
src/libavfilter/x86/vf_nlmeans.asm:62: error: symbol `r11q' undefined
src//libavutil/x86/x86inc.asm:1398: ... from macro `movdqu' defined here
src//libavutil/x86/x86inc.asm:1263: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1716: ... from macro `vmovdqu' defined here
src/libavfilter/x86/vf_nlmeans.asm:63: error: symbol `r11q' undefined
src//libavutil/x86/x86inc.asm:1398: ... from macro `movdqu' defined here
src//libavutil/x86/x86inc.asm:1263: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1716: ... from macro `vmovdqu' defined here
src/libavfilter/x86/vf_nlmeans.asm:65: error: symbol `r11q' undefined
src//libavutil/x86/x86inc.asm:1501: ... from macro `pmovzxbd' defined here
src//libavutil/x86/x86inc.asm:1263: ... from macro `RUN_AVX_INSTR' defined here
src/libavfilter/x86/vf_nlmeans.asm:74: error: symbol `r7q' undefined
src/libavfilter/x86/vf_nlmeans.asm:78: error: symbol `r11q' undefined
src//libavutil/x86/x86inc.asm:1417: ... from macro `movups' defined here
src//libavutil/x86/x86inc.asm:1263: ... from macro `RUN_AVX_INSTR' defined here
src/libavfilter/x86/vf_nlmeans.asm:79: error: symbol `r11q' undefined
src//libavutil/x86/x86inc.asm:1417: ... from macro `movups' defined here
src//libavutil/x86/x86inc.asm:1263: ... from macro `RUN_AVX_INSTR' defined here
src/libavfilter/x86/vf_nlmeans.asm:84: error: symbol `r11q' undefined
src//libavutil/x86/x86inc.asm:1417: ... from macro `movups' defined here
src//libavutil/x86/x86inc.asm:1263: ... from macro `RUN_AVX_INSTR' defined here
src/libavfilter/x86/vf_nlmeans.asm:85: error: symbol `r11q' undefined
src//libavutil/x86/x86inc.asm:1417: ... from macro `movups' defined here
src//libavutil/x86/x86inc.asm:1263: ... from macro `RUN_AVX_INSTR' defined here
src/libavfilter/x86/vf_nlmeans.asm:87: error: symbol `r11q' undefined
src//libavutil/x86/x86inc.asm:1139: ... from macro `add' defined here
src/libavfilter/x86/vf_nlmeans.asm:88: error: symbol `r11q' undefined
/home/michael/ffmpeg-git/ffmpeg/ffbuild/common.mak:92: recipe for target 'libavfilter/x86/vf_nlmeans.o' failed
make: *** [libavfilter/x86/vf_nlmeans.o] Error 1

[...]
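The undefined symbols in the log are all `r7q` through `r11q`. That is consistent with x86inc exposing only `r0` to `r6` as general-purpose registers on 32-bit x86, so a `cglobal` asking for 12 of them cannot assemble there. One way to avoid the failure is to build and select the AVX2 routine only on x86-64. The sketch below shows that dispatch pattern in plain C. `DemoDSP`, `demo_init`, `weights_line_c` and `weights_line_avx2` are hypothetical stand-ins, not FFmpeg names, and the two integer flags stand in for a build-time `ARCH_X86_64` check and the run-time CPU-flag test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a DSP context: one function pointer. */
typedef void (*weights_line_fn)(void);

typedef struct DemoDSP {
    weights_line_fn compute_weights_line;
} DemoDSP;

/* Placeholder implementations; only their addresses matter for the demo. */
static void weights_line_c(void)    {}
static void weights_line_avx2(void) {}

/*
 * Pick the implementation. The AVX2 routine is chosen only when BOTH
 * conditions hold: the target is x86-64 (so the extra registers exist)
 * and the CPU reports fast AVX2. Otherwise the C fallback stays in place.
 */
static void demo_init(DemoDSP *dsp, int arch_x86_64, int avx2_fast)
{
    assert(dsp != NULL);
    dsp->compute_weights_line = weights_line_c;
    if (arch_x86_64 && avx2_fast)
        dsp->compute_weights_line = weights_line_avx2;
}
```

In the patch itself the same idea would need a matching guard in the `.asm` file as well, since the assembler fails before any run-time check is reached.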
On 10/29/2021 12:19 PM, Paul B Mahol wrote:
> Signed-off-by: Paul B Mahol <onemda@gmail.com>
> ---
>  libavfilter/vf_nlmeans.c          |  3 +
>  libavfilter/vf_nlmeans.h          |  1 +
>  libavfilter/x86/Makefile          |  2 +
>  libavfilter/x86/vf_nlmeans.asm    | 92 +++++++++++++++++++++++++++++++
>  libavfilter/x86/vf_nlmeans_init.c | 40 ++++++++++++++
>  5 files changed, 138 insertions(+)
>  create mode 100644 libavfilter/x86/vf_nlmeans.asm
>  create mode 100644 libavfilter/x86/vf_nlmeans_init.c
>
> diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
> index dee1f68101..4d5dcba5cc 100644
> --- a/libavfilter/vf_nlmeans.c
> +++ b/libavfilter/vf_nlmeans.c
> @@ -519,6 +519,9 @@ void ff_nlmeans_init(NLMeansDSPContext *dsp)
>
>      if (ARCH_AARCH64)
>          ff_nlmeans_init_aarch64(dsp);
> +
> +    if (ARCH_X86)
> +        ff_nlmeans_init_x86(dsp);
>  }
>
>  static av_cold int init(AVFilterContext *ctx)
> diff --git a/libavfilter/vf_nlmeans.h b/libavfilter/vf_nlmeans.h
> index cd1ee7c0bf..43611a03bd 100644
> --- a/libavfilter/vf_nlmeans.h
> +++ b/libavfilter/vf_nlmeans.h
> @@ -41,5 +41,6 @@ typedef struct NLMeansDSPContext {
>
>  void ff_nlmeans_init(NLMeansDSPContext *dsp);
>  void ff_nlmeans_init_aarch64(NLMeansDSPContext *dsp);
> +void ff_nlmeans_init_x86(NLMeansDSPContext *dsp);
>
>  #endif /* AVFILTER_NLMEANS_H */
> diff --git a/libavfilter/x86/Makefile b/libavfilter/x86/Makefile
> index a29941eaeb..e87481bd7a 100644
> --- a/libavfilter/x86/Makefile
> +++ b/libavfilter/x86/Makefile
> @@ -20,6 +20,7 @@ OBJS-$(CONFIG_LIMITER_FILTER) += x86/vf_limiter_init.o
>  OBJS-$(CONFIG_LUT3D_FILTER) += x86/vf_lut3d_init.o
>  OBJS-$(CONFIG_MASKEDCLAMP_FILTER) += x86/vf_maskedclamp_init.o
>  OBJS-$(CONFIG_MASKEDMERGE_FILTER) += x86/vf_maskedmerge_init.o
> +OBJS-$(CONFIG_NLMEANS_FILTER) += x86/vf_nlmeans_init.o
>  OBJS-$(CONFIG_NOISE_FILTER) += x86/vf_noise.o
>  OBJS-$(CONFIG_OVERLAY_FILTER) += x86/vf_overlay_init.o
>  OBJS-$(CONFIG_PP7_FILTER) += x86/vf_pp7_init.o
> @@ -61,6 +62,7 @@ X86ASM-OBJS-$(CONFIG_LIMITER_FILTER) += x86/vf_limiter.o
>  X86ASM-OBJS-$(CONFIG_LUT3D_FILTER) += x86/vf_lut3d.o
>  X86ASM-OBJS-$(CONFIG_MASKEDCLAMP_FILTER) += x86/vf_maskedclamp.o
>  X86ASM-OBJS-$(CONFIG_MASKEDMERGE_FILTER) += x86/vf_maskedmerge.o
> +X86ASM-OBJS-$(CONFIG_NLMEANS_FILTER) += x86/vf_nlmeans.o
>  X86ASM-OBJS-$(CONFIG_OVERLAY_FILTER) += x86/vf_overlay.o
>  X86ASM-OBJS-$(CONFIG_PP7_FILTER) += x86/vf_pp7.o
>  X86ASM-OBJS-$(CONFIG_PSNR_FILTER) += x86/vf_psnr.o
> diff --git a/libavfilter/x86/vf_nlmeans.asm b/libavfilter/x86/vf_nlmeans.asm
> new file mode 100644
> index 0000000000..1047e43de4
> --- /dev/null
> +++ b/libavfilter/x86/vf_nlmeans.asm
> @@ -0,0 +1,92 @@
> +;*****************************************************************************
> +;* x86-optimized functions for nlmeans filter
> +;*
> +;* This file is part of FFmpeg.
> +;*
> +;* FFmpeg is free software; you can redistribute it and/or
> +;* modify it under the terms of the GNU Lesser General Public
> +;* License as published by the Free Software Foundation; either
> +;* version 2.1 of the License, or (at your option) any later version.
> +;*
> +;* FFmpeg is distributed in the hope that it will be useful,
> +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +;* Lesser General Public License for more details.
> +;*
> +;* You should have received a copy of the GNU Lesser General Public
> +;* License along with FFmpeg; if not, write to the Free Software
> +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> +;******************************************************************************
> +
> +
> +%include "libavutil/x86/x86util.asm"
> +
> +%if HAVE_AVX2_EXTERNAL

Also check for ARCH_X86_64.

> +
> +SECTION_RODATA

No need for this if it's going to be empty.

> +
> +SECTION .text
> +
> +; void ff_compute_weights_line(const uint32_t *const iia,
> +;                              const uint32_t *const iib,
> +;                              const uint32_t *const iid,
> +;                              const uint32_t *const iie,
> +;                              const uint8_t *const src,
> +;                              struct weighted_avg *wa,
> +;                              const float *const lut,
> +;                              int max,
> +;                              int startx, int endx);
> +
> +INIT_YMM avx2
> +cglobal compute_weights_line, 11, 12, 7, iia, iib, iid, iie, src, total, sum, lut, max, startx, endx, x

cglobal compute_weights_line, 8, 11, 5, iia, iib, iid, iie, src, total, sum, lut, x, startx, endx

> +    movsxdifnidn startxq, startxd
> +    movsxdifnidn endxq, endxd

    movsxd startxq, dword startxm
    movsxd endxq, dword endxm

> +    movsxdifnidn maxq, maxd

Remove this

> +
> +    sub endxq, startxq
> +    mov xq, mmsize / 4
> +    sub xq, 1
> +    not xq
> +    and endxq, xq
> +    add endxq, startxq
> +
> +    mov xq, startxq
> +    xor startxq, startxq
> +
> +    VPBROADCASTD m4, maxm

    VPBROADCASTD m4, r8m

> +    vpcmpeqd m5, m5
> +
> +    .loop:
> +    movu m0, [iieq + xq * 4]
> +    movu m1, [iidq + xq * 4]
> +    movu m2, [iibq + xq * 4]
> +    movu m3, [iiaq + xq * 4]

Load only m0, remove the other three.

> +
> +    pmovzxbd m6, [srcq + xq]
> +    cvtdq2ps m6, m6
> +
> +    psubd m0, m1
> +    psubd m0, m2
> +    paddd m0, m3

    psubd m0, [iidq + xq * 4]
    psubd m0, [iibq + xq * 4]
    paddd m0, [iiaq + xq * 4]

AVX can handle unaligned loads with these instructions.

> +    pminud m0, m4
> +    pslld m0, 2
> +    mova m3, m5
> +    vgatherdps m1, [lutq + m0], m3
> +
> +    mulps m0, m1, m6
> +
> +    movups m3, [totalq + xq * 4]
> +    movups m2, [sumq + xq * 4]
> +
> +    addps m0, m2
> +    addps m1, m3

    addps m1, [totalq + xq * 4]
    addps m0, [sumq + xq * 4]

Then you can change the ymm regs to use up to m4.

> +
> +    movups [totalq + xq * 4], m1
> +    movups [sumq + xq * 4], m0
> +
> +    add xq, mmsize / 4
> +    cmp xq, endxq
> +    jl .loop
> +    RET
> +
> +%endif
> diff --git a/libavfilter/x86/vf_nlmeans_init.c b/libavfilter/x86/vf_nlmeans_init.c
> new file mode 100644
> index 0000000000..6fbd8f9008
> --- /dev/null
> +++ b/libavfilter/x86/vf_nlmeans_init.c
> @@ -0,0 +1,40 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/attributes.h"
> +#include "libavutil/x86/cpu.h"
> +#include "libavfilter/vf_nlmeans.h"
> +
> +void ff_compute_weights_line_avx2(const uint32_t *const iia,
> +                                  const uint32_t *const iib,
> +                                  const uint32_t *const iid,
> +                                  const uint32_t *const iie,
> +                                  const uint8_t *const src,
> +                                  float *total_weight,
> +                                  float *sum,
> +                                  const float *const weight_lut,
> +                                  int max_meaningful_diff,
> +                                  int startx, int endx);
> +
> +av_cold void ff_nlmeans_init_x86(NLMeansDSPContext *dsp)
> +{
> +    int cpu_flags = av_get_cpu_flags();
> +
> +    if (EXTERNAL_AVX2_FAST(cpu_flags))

Also ARCH_X86_64.

> +        dsp->compute_weights_line = ff_compute_weights_line_avx2;
> +}
>
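For readers following the review without the assembly in their head, here is a scalar C sketch of what one iteration of the quoted loop appears to compute per column, read off the instructions (`psubd`/`paddd`, `pminud`, `vgatherdps`, `mulps`, `addps`). The function name `compute_weights_line_ref` is made up for this note, and the parameter names follow the prototype in `vf_nlmeans_init.c`. It is an illustration of the arithmetic, not the filter's actual C fallback.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar equivalent of the vector loop, one column at a time.
 * iia/iib/iid/iie are the four integral-image corner rows, src is the
 * 8-bit source row, weight_lut maps a clamped difference to a weight.
 */
static void compute_weights_line_ref(const uint32_t *iia, const uint32_t *iib,
                                     const uint32_t *iid, const uint32_t *iie,
                                     const uint8_t *src,
                                     float *total_weight, float *sum,
                                     const float *weight_lut,
                                     int max_meaningful_diff,
                                     int startx, int endx)
{
    assert(max_meaningful_diff >= 0);

    for (int x = startx; x < endx; x++) {
        /* e - d - b + a with 32-bit wraparound (psubd, psubd, paddd) */
        uint32_t diff = iie[x] - iid[x] - iib[x] + iia[x];

        /* unsigned clamp to the largest meaningful index (pminud) */
        if (diff > (uint32_t)max_meaningful_diff)
            diff = (uint32_t)max_meaningful_diff;

        /* table lookup; the asm shifts the index left by 2 and gathers */
        const float weight = weight_lut[diff];

        total_weight[x] += weight;          /* addps into total */
        sum[x]          += weight * src[x]; /* mulps, then addps into sum */
    }
}
```

Seen this way, the suggestion to fold the `iid`, `iib` and `iia` loads into `psubd`/`paddd` memory operands does not change the arithmetic. It only frees vector registers, which is why the register count in `cglobal` can drop.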
diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index dee1f68101..4d5dcba5cc 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -519,6 +519,9 @@ void ff_nlmeans_init(NLMeansDSPContext *dsp)
 
     if (ARCH_AARCH64)
         ff_nlmeans_init_aarch64(dsp);
+
+    if (ARCH_X86)
+        ff_nlmeans_init_x86(dsp);
 }
 
 static av_cold int init(AVFilterContext *ctx)
diff --git a/libavfilter/vf_nlmeans.h b/libavfilter/vf_nlmeans.h
index cd1ee7c0bf..43611a03bd 100644
--- a/libavfilter/vf_nlmeans.h
+++ b/libavfilter/vf_nlmeans.h
@@ -41,5 +41,6 @@ typedef struct NLMeansDSPContext {
 
 void ff_nlmeans_init(NLMeansDSPContext *dsp);
 void ff_nlmeans_init_aarch64(NLMeansDSPContext *dsp);
+void ff_nlmeans_init_x86(NLMeansDSPContext *dsp);
 
 #endif /* AVFILTER_NLMEANS_H */
diff --git a/libavfilter/x86/Makefile b/libavfilter/x86/Makefile
index a29941eaeb..e87481bd7a 100644
--- a/libavfilter/x86/Makefile
+++ b/libavfilter/x86/Makefile
@@ -20,6 +20,7 @@ OBJS-$(CONFIG_LIMITER_FILTER) += x86/vf_limiter_init.o
 OBJS-$(CONFIG_LUT3D_FILTER) += x86/vf_lut3d_init.o
 OBJS-$(CONFIG_MASKEDCLAMP_FILTER) += x86/vf_maskedclamp_init.o
 OBJS-$(CONFIG_MASKEDMERGE_FILTER) += x86/vf_maskedmerge_init.o
+OBJS-$(CONFIG_NLMEANS_FILTER) += x86/vf_nlmeans_init.o
 OBJS-$(CONFIG_NOISE_FILTER) += x86/vf_noise.o
 OBJS-$(CONFIG_OVERLAY_FILTER) += x86/vf_overlay_init.o
 OBJS-$(CONFIG_PP7_FILTER) += x86/vf_pp7_init.o
@@ -61,6 +62,7 @@ X86ASM-OBJS-$(CONFIG_LIMITER_FILTER) += x86/vf_limiter.o
 X86ASM-OBJS-$(CONFIG_LUT3D_FILTER) += x86/vf_lut3d.o
 X86ASM-OBJS-$(CONFIG_MASKEDCLAMP_FILTER) += x86/vf_maskedclamp.o
 X86ASM-OBJS-$(CONFIG_MASKEDMERGE_FILTER) += x86/vf_maskedmerge.o
+X86ASM-OBJS-$(CONFIG_NLMEANS_FILTER) += x86/vf_nlmeans.o
 X86ASM-OBJS-$(CONFIG_OVERLAY_FILTER) += x86/vf_overlay.o
 X86ASM-OBJS-$(CONFIG_PP7_FILTER) += x86/vf_pp7.o
 X86ASM-OBJS-$(CONFIG_PSNR_FILTER) += x86/vf_psnr.o
diff --git a/libavfilter/x86/vf_nlmeans.asm b/libavfilter/x86/vf_nlmeans.asm
new file mode 100644
index 0000000000..1047e43de4
--- /dev/null
+++ b/libavfilter/x86/vf_nlmeans.asm
@@ -0,0 +1,92 @@
+;*****************************************************************************
+;* x86-optimized functions for nlmeans filter
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;******************************************************************************
+
+
+%include "libavutil/x86/x86util.asm"
+
+%if HAVE_AVX2_EXTERNAL
+
+SECTION_RODATA
+
+SECTION .text
+
+; void ff_compute_weights_line(const uint32_t *const iia,
+;                              const uint32_t *const iib,
+;                              const uint32_t *const iid,
+;                              const uint32_t *const iie,
+;                              const uint8_t *const src,
+;                              struct weighted_avg *wa,
+;                              const float *const lut,
+;                              int max,
+;                              int startx, int endx);
+
+INIT_YMM avx2
+cglobal compute_weights_line, 11, 12, 7, iia, iib, iid, iie, src, total, sum, lut, max, startx, endx, x
+    movsxdifnidn startxq, startxd
+    movsxdifnidn endxq, endxd
+    movsxdifnidn maxq, maxd
+
+    sub endxq, startxq
+    mov xq, mmsize / 4
+    sub xq, 1
+    not xq
+    and endxq, xq
+    add endxq, startxq
+
+    mov xq, startxq
+    xor startxq, startxq
+
+    VPBROADCASTD m4, maxm
+    vpcmpeqd m5, m5
+
+    .loop:
+    movu m0, [iieq + xq * 4]
+    movu m1, [iidq + xq * 4]
+    movu m2, [iibq + xq * 4]
+    movu m3, [iiaq + xq * 4]
+
+    pmovzxbd m6, [srcq + xq]
+    cvtdq2ps m6, m6
+
+    psubd m0, m1
+    psubd m0, m2
+    paddd m0, m3
+    pminud m0, m4
+    pslld m0, 2
+    mova m3, m5
+    vgatherdps m1, [lutq + m0], m3
+
+    mulps m0, m1, m6
+
+    movups m3, [totalq + xq * 4]
+    movups m2, [sumq + xq * 4]
+
+    addps m0, m2
+    addps m1, m3
+
+    movups [totalq + xq * 4], m1
+    movups [sumq + xq * 4], m0
+
+    add xq, mmsize / 4
+    cmp xq, endxq
+    jl .loop
+    RET
+
+%endif
diff --git a/libavfilter/x86/vf_nlmeans_init.c b/libavfilter/x86/vf_nlmeans_init.c
new file mode 100644
index 0000000000..6fbd8f9008
--- /dev/null
+++ b/libavfilter/x86/vf_nlmeans_init.c
@@ -0,0 +1,40 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/x86/cpu.h"
+#include "libavfilter/vf_nlmeans.h"
+
+void ff_compute_weights_line_avx2(const uint32_t *const iia,
+                                  const uint32_t *const iib,
+                                  const uint32_t *const iid,
+                                  const uint32_t *const iie,
+                                  const uint8_t *const src,
+                                  float *total_weight,
+                                  float *sum,
+                                  const float *const weight_lut,
+                                  int max_meaningful_diff,
+                                  int startx, int endx);
+
+av_cold void ff_nlmeans_init_x86(NLMeansDSPContext *dsp)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (EXTERNAL_AVX2_FAST(cpu_flags))
+        dsp->compute_weights_line = ff_compute_weights_line_avx2;
+}
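The six instructions between the sign extensions and `.loop` in the patch round the span `endx - startx` down to a multiple of `mmsize / 4`, which is 8 for 32-byte YMM registers, so the vector loop only ever processes whole groups of eight columns. A small C sketch of that arithmetic follows. `vector_loop_end` and `LANES` are names invented for this note.

```c
#include <assert.h>

/* 32-bit lanes per iteration: mmsize / 4 with 32-byte YMM registers. */
enum { LANES = 8 };

/*
 * Mirrors: sub endx, startx / mov x, LANES / sub x, 1 / not x /
 *          and endx, x / add endx, startx
 * Returns the first column the vector loop will NOT process.
 */
static int vector_loop_end(int startx, int endx)
{
    assert(endx >= startx);
    return startx + ((endx - startx) & ~(LANES - 1));
}
```

For example, with `startx = 3` and `endx = 30` the span is 27, which rounds down to 24, so the loop stops at column 27 and columns 27 to 29 are left untouched by the assembly. Nothing in the patch as posted shows a scalar tail for those columns, so whether the caller covers them cannot be determined from this page.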
Signed-off-by: Paul B Mahol <onemda@gmail.com>
---
 libavfilter/vf_nlmeans.c          |  3 +
 libavfilter/vf_nlmeans.h          |  1 +
 libavfilter/x86/Makefile          |  2 +
 libavfilter/x86/vf_nlmeans.asm    | 92 +++++++++++++++++++++++++++++++
 libavfilter/x86/vf_nlmeans_init.c | 40 ++++++++++++++
 5 files changed, 138 insertions(+)
 create mode 100644 libavfilter/x86/vf_nlmeans.asm
 create mode 100644 libavfilter/x86/vf_nlmeans_init.c