Message ID | 20191127145546.6873-1-xujunzz@sjtu.edu.cn |
---|---|
State | New |
On Wed, 27 Nov 2019 at 15:56, <xujunzz@sjtu.edu.cn> wrote:
> From: Xu Jun <xujunzz@sjtu.edu.cn>
>
> In order to add x86 SIMD for filter_column(), I write a C function which processes 16 columns at a time.

How does this perform compared to the existing C code?

Carl Eugen
I'm sorry not to reply in time.

The performance of this C code is about 10% better than the existing C code.

It will have a bigger improvement after X86 SIMD optimizations.

Xu Jun

----- Original Message -----
From: "Carl Eugen Hoyos" <ceffmpeg@gmail.com>
To: "FFmpeg development discussions and patches" <ffmpeg-devel@ffmpeg.org>
Sent: Thursday, 28 November 2019, 12:19:44 AM
Subject: Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

On Wed, 27 Nov 2019 at 15:56, <xujunzz@sjtu.edu.cn> wrote:
> From: Xu Jun <xujunzz@sjtu.edu.cn>
>
> In order to add x86 SIMD for filter_column(), I write a C function which processes 16 columns at a time.

How does this perform compared to the existing C code?

Carl Eugen
> On 2 Dec 2019, at 10:42, 徐鋆 <xujunzz@sjtu.edu.cn> wrote:
>
> I'm sorry not to reply in time.
>
> The performance of this C code is about 10% better than the existing C code.
>
> It will have a bigger improvement after X86 SIMD optimizations.

1. How did you test this?
2. Please don't top-post: https://en.wikipedia.org/wiki/Top-posting
   Reply below the part of the message you are responding to, not at the top; otherwise readers cannot tell which point you are addressing.

> Xu Jun
>
> ----- Original Message -----
> From: "Carl Eugen Hoyos" <ceffmpeg@gmail.com>
> To: "FFmpeg development discussions and patches" <ffmpeg-devel@ffmpeg.org>
> Sent: Thursday, 28 November 2019, 12:19:44 AM
> Subject: Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.
>
> On Wed, 27 Nov 2019 at 15:56, <xujunzz@sjtu.edu.cn> wrote:
>
>> From: Xu Jun <xujunzz@sjtu.edu.cn>
>>
>> In order to add x86 SIMD for filter_column(), I write a C function which processes 16 columns at a time.
>
> How does this perform compared to the existing C code?
>
> Carl Eugen
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
> --
> Yours sincerely,
> Xylem (Jun Xu)
> School of Electronic, Information and Electrical Engineering
> Shanghai Jiao Tong University
> Email: xujunzz@sjtu.edu.cn
> No. 800, Dongchuan Road, Minhang District, Shanghai 200240, China

Thanks
Steven
Hi, Steven

----- Original Message -----
From: "Steven Liu" <lq@chinaffmpeg.org>
To: "FFmpeg development discussions and patches" <ffmpeg-devel@ffmpeg.org>
Cc: "Steven Liu" <lq@chinaffmpeg.org>
Sent: Monday, 2 December 2019, 10:44:48 AM
Subject: Re: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

>> On 2 Dec 2019, at 10:42, 徐鋆 <xujunzz@sjtu.edu.cn> wrote:
>>
>> I'm sorry not to reply in time.
>>
>> The performance of this C code is about 10% better than the existing C code.
>>
>> It will have a bigger improvement after X86 SIMD optimizations.
>
> 1. How did you test this?

I tested using this command:

./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1/45:1/45:1/45:1/45:1:2:3:4:column:column:column:column" -an -vframes 2000 -f null /dev/null

The FPS increases from 329 to 365 on my local machine.

> 2. Please don't top-post: https://en.wikipedia.org/wiki/Top-posting
>    Reply below the part of the message you are responding to, not at the top; otherwise readers cannot tell which point you are addressing.

Thank you for reminding me. I'm new here. Forgive me for not knowing the rules :)
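The "about 10%" figure quoted earlier in the thread can be checked against the reported frame rates (329 fps before, 365 fps after). The helper below is only an illustrative sketch; `speedup_percent` is a hypothetical name and is not part of FFmpeg or of the patch.

```c
#include <assert.h>

/* Hypothetical helper (not from the patch): relative throughput gain,
 * in percent, of patched_fps over baseline_fps. */
double speedup_percent(double baseline_fps, double patched_fps)
{
    assert(baseline_fps > 0.0);
    return (patched_fps - baseline_fps) / baseline_fps * 100.0;
}
```

Calling `speedup_percent(329.0, 365.0)` gives (365 - 329) / 329 * 100, a little under 11%, which is consistent with the rounded 10% claim. The numbers come from a single run on one machine, so they say nothing about run-to-run variance.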
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> xujunzz@sjtu.edu.cn
> Sent: Wednesday, November 27, 2019 10:56 PM
> To: ffmpeg-devel@ffmpeg.org
> Cc: xujunzz@sjtu.edu.cn
> Subject: [FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column
> operation for filter_column() to prepare for x86 SIMD.
>
> From: Xu Jun <xujunzz@sjtu.edu.cn>
>
> In order to add x86 SIMD for filter_column(), I write a C function which
> processes 16 columns at a time.
>
> Signed-off-by: Xu Jun <xujunzz@sjtu.edu.cn>
> ---
>  libavfilter/vf_convolution.c          | 56 +++++++++++++++++++++++++++
>  libavfilter/x86/vf_convolution_init.c | 23 +++++++++++
>  2 files changed, 79 insertions(+)
>
> diff --git a/libavfilter/vf_convolution.c b/libavfilter/vf_convolution.c
> index d022f1a04a..5291415d48 100644
> --- a/libavfilter/vf_convolution.c
> +++ b/libavfilter/vf_convolution.c
> @@ -520,6 +520,61 @@ static int filter_slice(AVFilterContext *ctx, void *arg,
> int jobnr, int nb_jobs)
>              continue;
>          }
>
> +        if (mode == MATRIX_COLUMN && s->filter[plane] != filter_column){
> +            for (y = slice_start; y < slice_end - 16; y+=16) {

Please take care of the coding style: there should be white space between variables and operators.
I also think this change makes the code harder to maintain. Let's try to avoid duplicated code as much as we can.

> [...]
> diff --git a/libavfilter/x86/vf_convolution_init.c
> b/libavfilter/x86/vf_convolution_init.c
> index d1e8c90ceb..6b1c2f0e9f 100644
> --- a/libavfilter/x86/vf_convolution_init.c
> +++ b/libavfilter/x86/vf_convolution_init.c
> @@ -34,6 +34,27 @@ void ff_filter_row_sse4(uint8_t *dst, int width,
>                          const uint8_t *c[], int peak, int radius,
>                          int dstride, int stride);
>

This C code should not be in the x86-specific file.

Ruiling

> +static void filter_column16(uint8_t *dst, int height,
> +                            float rdiv, float bias, const int *const matrix,
> +                            const uint8_t *c[], int length, int radius,
> +                            int dstride, int stride)
> +{
> +    int y, off16;
> +
> +    for (y = 0; y < height; y++) {
> +        for (off16 = 0; off16 < length; off16++){
> +            int i, sum = 0;
> +
> +            for (i = 0; i < 2 * radius + 1; i++)
> +                sum += c[i][0 + y * stride + off16] * matrix[i];
> +
> +            sum = (int)(sum * rdiv + bias + 0.5f);
> +            dst[off16] = av_clip_uint8(sum);
> +        }
> +        dst += dstride;
> +    }
> +
> +}
>
>  av_cold void ff_convolution_init_x86(ConvolutionContext *s)
>  {
> @@ -51,6 +72,8 @@ av_cold void
> ff_convolution_init_x86(ConvolutionContext *s)
>              if (EXTERNAL_SSE4(cpu_flags))
>                  s->filter[i] = ff_filter_row_sse4;
>          }
> +        if (s->mode[i] == MATRIX_COLUMN)
> +            s->filter[i] = filter_column16;
>      }
>  #endif
>  }
> --
> 2.17.1
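One possible way to address the duplication concern raised in this review is to fold the "full 16-column block" loop and the separate tail branch into a single loop whose chunk length is clamped to the columns that remain. The sketch below only illustrates that loop shape. `for_each_chunk`, `chunk_fn`, `count_columns` and `CHUNK` are hypothetical names, the callback merely stands in for the `s->setup[plane]` / `s->filter[plane]` calls, and the per-row edge handling of the patch is not reproduced.

```c
#include <assert.h>

#define CHUNK 16 /* columns handled per call, matching the patch's block size */

/* Hypothetical callback standing in for the setup + filter calls. */
typedef void (*chunk_fn)(int start, int length, void *opaque);

/* Visit [slice_start, slice_end) in chunks of at most CHUNK columns.
 * The last chunk is shortened instead of being handled by a copy of the
 * loop body, so the body exists only once. */
void for_each_chunk(int slice_start, int slice_end, chunk_fn fn, void *opaque)
{
    assert(slice_start <= slice_end);
    for (int y = slice_start; y < slice_end; y += CHUNK) {
        int remaining = slice_end - y;
        int length = remaining < CHUNK ? remaining : CHUNK;

        fn(y, length, opaque);
    }
}

/* Example callback: accumulates the number of columns visited. */
void count_columns(int start, int length, void *opaque)
{
    (void)start;
    *(int *)opaque += length;
}
```

With `slice_start = 0` and `slice_end = 37`, the callback would be invoked with lengths 16, 16 and 5. Inside FFmpeg the clamp would more naturally be written with the existing `FFMIN` macro; whether this restructuring is what the reviewer had in mind is an assumption.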
On Mon, 2 Dec 2019 at 03:42, 徐鋆 <xujunzz@sjtu.edu.cn> wrote:
> I'm sorry not to reply in time.

Definitely in time!

> The performance of this C code is about 10% better than the existing C code.

Please add this to the commit message.

Carl Eugen
diff --git a/libavfilter/vf_convolution.c b/libavfilter/vf_convolution.c
index d022f1a04a..5291415d48 100644
--- a/libavfilter/vf_convolution.c
+++ b/libavfilter/vf_convolution.c
@@ -520,6 +520,61 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
             continue;
         }

+        if (mode == MATRIX_COLUMN && s->filter[plane] != filter_column){
+            for (y = slice_start; y < slice_end - 16; y+=16) {
+                const int xoff = (y - slice_start) * bpc;
+                const int yoff = radius * stride;
+                for (x = 0; x < radius; x++) {
+                    const int xoff = (y - slice_start) * bpc;
+                    const int yoff = x * stride;
+
+                    s->setup[plane](radius, c, src, stride, x, width, y, height, bpc);
+                    s->filter[plane](dst + yoff + xoff, 1, rdiv,
+                                     bias, matrix, c, 16, radius,
+                                     dstride, stride);
+                }
+                s->setup[plane](radius, c, src, stride, radius, width, y, height, bpc);
+                s->filter[plane](dst + yoff + xoff, sizew - 2 * radius,
+                                 rdiv, bias, matrix, c, 16, radius,
+                                 dstride, stride);
+                for (x = sizew - radius; x < sizew; x++) {
+                    const int xoff = (y - slice_start) * bpc;
+                    const int yoff = x * stride;
+
+                    s->setup[plane](radius, c, src, stride, x, width, y, height, bpc);
+                    s->filter[plane](dst + yoff + xoff, 1, rdiv,
+                                     bias, matrix, c, 16, radius,
+                                     dstride, stride);
+                }
+            }
+            if (y < slice_end){
+                const int xoff = (y - slice_start) * bpc;
+                const int yoff = radius * stride;
+                for (x = 0; x < radius; x++) {
+                    const int xoff = (y - slice_start) * bpc;
+                    const int yoff = x * stride;
+
+                    s->setup[plane](radius, c, src, stride, x, width, y, height, bpc);
+                    s->filter[plane](dst + yoff + xoff, 1, rdiv,
+                                     bias, matrix, c, slice_end - y, radius,
+                                     dstride, stride);
+                }
+                s->setup[plane](radius, c, src, stride, radius, width, y, height, bpc);
+                s->filter[plane](dst + yoff + xoff, sizew - 2 * radius,
+                                 rdiv, bias, matrix, c, slice_end - y, radius,
+                                 dstride, stride);
+                for (x = sizew - radius; x < sizew; x++) {
+                    const int xoff = (y - slice_start) * bpc;
+                    const int yoff = x * stride;
+
+                    s->setup[plane](radius, c, src, stride, x, width, y, height, bpc);
+                    s->filter[plane](dst + yoff + xoff, 1, rdiv,
+                                     bias, matrix, c, slice_end - y, radius,
+                                     dstride, stride);
+                }
+            }
+        }
+        else {
         for (y = slice_start; y < slice_end; y++) {
             const int xoff = mode == MATRIX_COLUMN ? (y - slice_start) * bpc : radius * bpc;
             const int yoff = mode == MATRIX_COLUMN ? radius * stride : 0;
@@ -550,6 +605,7 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
                 dst += dstride;
         }
     }
+    }

     return 0;
 }
diff --git a/libavfilter/x86/vf_convolution_init.c b/libavfilter/x86/vf_convolution_init.c
index d1e8c90ceb..6b1c2f0e9f 100644
--- a/libavfilter/x86/vf_convolution_init.c
+++ b/libavfilter/x86/vf_convolution_init.c
@@ -34,6 +34,27 @@ void ff_filter_row_sse4(uint8_t *dst, int width,
                         const uint8_t *c[], int peak, int radius,
                         int dstride, int stride);

+static void filter_column16(uint8_t *dst, int height,
+                            float rdiv, float bias, const int *const matrix,
+                            const uint8_t *c[], int length, int radius,
+                            int dstride, int stride)
+{
+    int y, off16;
+
+    for (y = 0; y < height; y++) {
+        for (off16 = 0; off16 < length; off16++){
+            int i, sum = 0;
+
+            for (i = 0; i < 2 * radius + 1; i++)
+                sum += c[i][0 + y * stride + off16] * matrix[i];
+
+            sum = (int)(sum * rdiv + bias + 0.5f);
+            dst[off16] = av_clip_uint8(sum);
+        }
+        dst += dstride;
+    }
+
+}

 av_cold void ff_convolution_init_x86(ConvolutionContext *s)
 {
@@ -51,6 +72,8 @@ av_cold void ff_convolution_init_x86(ConvolutionContext *s)
             if (EXTERNAL_SSE4(cpu_flags))
                 s->filter[i] = ff_filter_row_sse4;
         }
+        if (s->mode[i] == MATRIX_COLUMN)
+            s->filter[i] = filter_column16;
     }
 #endif
 }
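To experiment with the column kernel outside the FFmpeg tree, it can be restated as a standalone function. The sketch below mirrors the arithmetic of `filter_column16` from the patch, with two assumptions made loudly: `clip_uint8` is a local stand-in for FFmpeg's `av_clip_uint8()`, and `filter_column_block` is a hypothetical name chosen here, not something the patch defines.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for FFmpeg's av_clip_uint8(): clamp to [0, 255]. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Standalone restatement of the patch's kernel (hypothetical name).
 * c[i] points at the i-th of 2 * radius + 1 source taps. For each of
 * `height` output rows, `length` adjacent bytes are filtered, which is
 * the inner loop a SIMD version would be expected to vectorise. */
void filter_column_block(uint8_t *dst, int height, float rdiv, float bias,
                         const int *matrix, const uint8_t *c[],
                         int length, int radius, int dstride, int stride)
{
    assert(height >= 0 && length >= 0 && radius >= 0);
    for (int y = 0; y < height; y++) {
        for (int off = 0; off < length; off++) {
            int sum = 0;

            /* Weighted sum over the taps at the same byte offset. */
            for (int i = 0; i < 2 * radius + 1; i++)
                sum += c[i][y * stride + off] * matrix[i];

            /* Scale, add bias, round to nearest, then clamp to 8 bits. */
            sum = (int)(sum * rdiv + bias + 0.5f);
            dst[off] = clip_uint8(sum);
        }
        dst += dstride;
    }
}
```

As a quick check, with three taps holding the constant values 0, 100 and 200, matrix `{1, 2, 1}`, `rdiv = 0.25f` and `bias = 0`, each output byte is (0 + 200 + 200) * 0.25 = 100. Keeping the innermost work on adjacent bytes is presumably what makes the 16-wide layout a convenient starting point for SSE, but that is an inference from the patch description rather than something the thread spells out.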