From patchwork Sun Dec 3 20:50:17 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 6530
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 10.2.161.94 with SMTP id m30csp3660013jah;
        Sun, 3 Dec 2017 12:51:08 -0800 (PST)
X-Google-Smtp-Source: AGs4zMZMOQGABNMtHCp9/KCOpz2zFFTADdn2gL658O7b/NNfjmadC4c++HeKrO1EM+MhZ9wY3Epx
X-Received: by 10.28.105.14 with SMTP id e14mr1419156wmc.74.1512334268827;
        Sun, 03 Dec 2017 12:51:08 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1512334268; cv=none;
        d=google.com; s=arc-20160816;
        b=d+9ZZcnm/see1MUJN0M/E/o7aHZb7q3QXFzkXJypXKrBExWqjWDAz9vL5f5MX5H3gs
         UZVEu6rTgErky1/ryE6ebZ5jG6WCDejhG/g+3UEcZdT4Kl85kNxxIlcKo+TWx05ru1n7
         ZmQuc1AiWru+BCiidCCFlmR3MpidIo9/3ILszyGJLO12iJdf+4TbeH/pVS3KIvKkqxDS
         8yaD+/u5UACYHUN3TMoQsOyatGk/IYsDoVRKBpZHvxTGOgZj+W1/WAXajcgFKlKbXyZ3
         Xne4TztsHGxIRWnS439T9FwnSyt3uTSbiFVD4XqmIKObbYwUk/DI7xIWLGnqUvpEXHrM
         TBnw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=sender:errors-to:content-transfer-encoding:mime-version:reply-to
         :list-subscribe:list-help:list-post:list-archive:list-unsubscribe
         :list-id:precedence:subject:references:in-reply-to:message-id:date
         :to:from:dkim-signature:delivered-to:arc-authentication-results;
        bh=Lve0mIRfj8oiN8LxjYI3+QYpYRrG5cSfrKEYaUDPXWA=;
        b=grnFb+ul84wplkWwLxAVdzN+crP2g7So96zQ1+Hu5fRxUVdbqy73hiGgFy1JCRk73e
         gcPSH5nK9Mpu9FKFepsQGYJPeei0ArBeYSHXy5UcfKIV+iTHO7C3hn0zanY6WbxWae7+
         YVGasmNyq+Jlvu7lTS4kRBEXCgZ+9jJqvbsZ8YCHRwA9Ce6o5EgtEGxJS01euK3aFan6
         DGhW+rGcHfsxVB6BFzOMohEqaX4XrmoSohq5PCCq3OlXo2XKAaYKKIxjxFIhUe2i2LWd
         6gYTT350hJCUX4Z8m50k4OGfLXbaD20bU3qLYqEQ5x9AbdpqNBgLTGTVls7BWcLnlIZV
         firA==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20161025 header.b=qAe0wonm;
       spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=gmail.com
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
        by mx.google.com with ESMTP id v7si9183901wrf.198.2017.12.03.12.51.07;
        Sun, 03 Dec 2017 12:51:08 -0800 (PST)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
       dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20161025 header.b=qAe0wonm;
       spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=gmail.com
Received: from [127.0.1.1] (localhost [127.0.0.1])
        by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id DB24368A389;
        Sun, 3 Dec 2017 22:51:01 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-wm0-f68.google.com (mail-wm0-f68.google.com [74.125.82.68])
        by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 8A1CE68A350
        for ; Sun, 3 Dec 2017 22:50:55 +0200 (EET)
Received: by mail-wm0-f68.google.com with SMTP id i11so11012575wmf.4
        for ; Sun, 03 Dec 2017 12:51:00 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=from:to:subject:date:message-id:in-reply-to:references;
        bh=Tie7HmIvqvMRugZPZu71d4nOTeESb6XqoZMm73g9rmI=;
        b=qAe0wonmaCMa/s/8LL5ehhbAKeYIYRfzEy7gp7/xI5zFECVph8Ss2JPeg0fXr6vJYX
         hgSlUamyvJu3OyGhA91tLqYfjX/Gm7wOxLoNYd9jE2rEnOnWZw1yr9NOZybJqyb8daSp
         e2PnelViOMR1JJJ2pqcSeQ5sOsDAEyZMJuYTcwXnOtS8GxkoGeArPOfJCDFN0ze4GqAH
         Jxr36tCtcQjyVVqAozheKDCN6frQj7+86wiPru7uxPSFy6l8eCa5OJpIpdEvJs/rybzw
         ljrAlLB+9Zhhfte5qO8wWSorwt7CsaK6OVoyjhh80hOsmB0snB7OHCv2iw+lC+XqtyJD
         X64g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to
         :references;
        bh=Tie7HmIvqvMRugZPZu71d4nOTeESb6XqoZMm73g9rmI=;
        b=qUwQAO0hOd5hTxNhc2yeyixmvjnxHtitZ3y/lLe+NGg2Is1DemILSdLoe44vnp2cU4
         P+CV7XIzkkX4RVJAjYnjtxpMC3y0Lu8VBUq6fCunlzQkPthn0A8nkKunLrZv7Y3Vcohw
         8URlZetAPPb0It64htxSnDcXsUUqzc8ZNJdo4Tg5T16d1T1+869ZmaLm8U8ERirfvDb9
         ena+vl/1wN2XnZ2MpkTSXKqSry6IVLOIcBLG4jRmcMRWe39ywG2QyLw0ACY7XydF5CPf
         EmTlXdHPykJuCBQBDgDMqoCAcFHIFWOOLBCZwhuzvt+92azmuD3UVQEvdO1L1p+SAq1J
         MZyg==
X-Gm-Message-State: AJaThX56nzMYg7bupQeUgw8xAJPYfatQipBgxGgEGlWPrlukJhpaMnY7
        XOz227La40J5qQsQTed9Bhtitw==
X-Received: by 10.28.214.70 with SMTP id n67mr5399539wmg.83.1512334259309;
        Sun, 03 Dec 2017 12:50:59 -0800 (PST)
Received: from localhost.localdomain ([94.250.174.60])
        by smtp.gmail.com with ESMTPSA id m23sm2063094wmc.29.2017.12.03.12.50.57
        for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Sun, 03 Dec 2017 12:50:58 -0800 (PST)
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 3 Dec 2017 21:50:17 +0100
Message-Id: <20171203205017.5112-1-onemda@gmail.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171202115856.340-1-onemda@gmail.com>
References: <20171202115856.340-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

Signed-off-by: Paul B Mahol
---
 libavfilter/hflip.h             |  38 ++++++++++++
 libavfilter/vf_hflip.c          | 133 ++++++++++++++++++++++++++--------------
 libavfilter/x86/Makefile        |   2 +
 libavfilter/x86/vf_hflip.asm    | 102 ++++++++++++++++++++++++++++++
 libavfilter/x86/vf_hflip_init.c |  41 +++++++++++++
 5 files changed, 269 insertions(+), 47 deletions(-)
 create mode 100644 libavfilter/hflip.h
 create mode 100644 libavfilter/x86/vf_hflip.asm
 create mode 100644 libavfilter/x86/vf_hflip_init.c

diff --git a/libavfilter/hflip.h b/libavfilter/hflip.h
new file mode 100644
index 0000000000..138380427c
--- /dev/null
+++ b/libavfilter/hflip.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (c) 2007 Benoit Fouet
+ * Copyright (c) 2010 Stefano Sabatini
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVFILTER_HFLIP_H
+#define AVFILTER_HFLIP_H
+
+#include "avfilter.h"
+
+typedef struct FlipContext {
+    const AVClass *class;
+    int max_step[4];    ///< max pixel step for each plane, expressed as a number of bytes
+    int planewidth[4];  ///< width of each plane
+    int planeheight[4]; ///< height of each plane
+
+    void (*flip_line[4])(const uint8_t *src, uint8_t *dst, int w);
+} FlipContext;
+
+void ff_hflip_init_x86(FlipContext *s, int step[4]);
+
+#endif /* AVFILTER_HFLIP_H */
diff --git a/libavfilter/vf_hflip.c b/libavfilter/vf_hflip.c
index cf20c193f7..030015df0a 100644
--- a/libavfilter/vf_hflip.c
+++ b/libavfilter/vf_hflip.c
@@ -29,6 +29,7 @@
 #include "libavutil/opt.h"
 #include "avfilter.h"
 #include "formats.h"
+#include "hflip.h"
 #include "internal.h"
 #include "video.h"
 #include "libavutil/pixdesc.h"
@@ -36,13 +37,6 @@
 #include "libavutil/intreadwrite.h"
 #include "libavutil/imgutils.h"
 
-typedef struct FlipContext {
-    const AVClass *class;
-    int max_step[4];    ///< max pixel step for each plane, expressed as a number of bytes
-    int planewidth[4];  ///< width of each plane
-    int planeheight[4]; ///< height of each plane
-} FlipContext;
-
 static const AVOption hflip_options[] = {
     { NULL }
 };
@@ -67,12 +61,77 @@ static int query_formats(AVFilterContext *ctx)
     return ff_set_common_formats(ctx, pix_fmts);
 }
 
+static void hflip_byte_c(const uint8_t *src, uint8_t *dst, int w)
+{
+    int j;
+
+    for (j = 0; j < w; j++)
+        dst[j] = src[-j];
+}
+
+static void hflip_short_c(const uint8_t *ssrc, uint8_t *ddst, int w)
+{
+    const uint16_t *src = (const uint16_t *)ssrc;
+    uint16_t *dst = (uint16_t *)ddst;
+    int j;
+
+    for (j = 0; j < w; j++)
+        dst[j] = src[-j];
+}
+
+static void hflip_dword_c(const uint8_t *ssrc, uint8_t *ddst, int w)
+{
+    const uint32_t *src = (const uint32_t *)ssrc;
+    uint32_t *dst = (uint32_t *)ddst;
+    int j;
+
+    for (j = 0; j < w; j++)
+        dst[j] = src[-j];
+}
+
+static void hflip_b24_c(const uint8_t *src, uint8_t *dst, int w)
+{
+    const uint8_t *in  = src;
+    uint8_t *out = dst;
+    int j;
+
+    for (j = 0; j < w; j++, out += 3, in -= 3) {
+        int32_t v = AV_RB24(in);
+
+        AV_WB24(out, v);
+    }
+}
+
+static void hflip_b48_c(const uint8_t *src, uint8_t *dst, int w)
+{
+    const uint8_t *in  = src;
+    uint8_t *out = dst;
+    int j;
+
+    for (j = 0; j < w; j++, out += 6, in -= 6) {
+        int64_t v = AV_RB48(in);
+
+        AV_WB48(out, v);
+    }
+}
+
+static void hflip_qword_c(const uint8_t *ssrc, uint8_t *ddst, int w)
+{
+    const uint64_t *src = (const uint64_t *)ssrc;
+    uint64_t *dst = (uint64_t *)ddst;
+    int j;
+
+    for (j = 0; j < w; j++)
+        dst[j] = src[-j];
+}
+
 static int config_props(AVFilterLink *inlink)
 {
     FlipContext *s = inlink->dst->priv;
     const AVPixFmtDescriptor *pix_desc = av_pix_fmt_desc_get(inlink->format);
     const int hsub = pix_desc->log2_chroma_w;
     const int vsub = pix_desc->log2_chroma_h;
+    int nb_planes, i;
 
     av_image_fill_max_pixsteps(s->max_step, NULL, pix_desc);
     s->planewidth[0]  = s->planewidth[3]  = inlink->w;
@@ -80,6 +139,24 @@ static int config_props(AVFilterLink *inlink)
     s->planeheight[0] = s->planeheight[3] = inlink->h;
     s->planeheight[1] = s->planeheight[2] = AV_CEIL_RSHIFT(inlink->h, vsub);
 
+    nb_planes = av_pix_fmt_count_planes(inlink->format);
+
+    for (i = 0; i < nb_planes; i++) {
+        switch (s->max_step[i]) {
+        case 1: s->flip_line[i] = hflip_byte_c;  break;
+        case 2: s->flip_line[i] = hflip_short_c; break;
+        case 3: s->flip_line[i] = hflip_b24_c;   break;
+        case 4: s->flip_line[i] = hflip_dword_c; break;
+        case 6: s->flip_line[i] = hflip_b48_c;   break;
+        case 8: s->flip_line[i] = hflip_qword_c; break;
+        default:
+            return AVERROR_BUG;
+        }
+    }
+
+    if (ARCH_X86)
+        ff_hflip_init_x86(s, s->max_step);
+
     return 0;
 }
 
@@ -94,7 +171,7 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int job, int nb_jobs)
     AVFrame *in = td->in;
     AVFrame *out = td->out;
     uint8_t *inrow, *outrow;
-    int i, j, plane, step;
+    int i, plane, step;
 
     for (plane = 0; plane < 4 && in->data[plane] && in->linesize[plane]; plane++) {
         const int width  = s->planewidth[plane];
@@ -107,45 +184,7 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int job, int nb_jobs)
         outrow = out->data[plane] + start * out->linesize[plane];
         inrow  = in ->data[plane] + start * in->linesize[plane] + (width - 1) * step;
         for (i = start; i < end; i++) {
-            switch (step) {
-            case 1:
-                for (j = 0; j < width; j++)
-                    outrow[j] = inrow[-j];
-            break;
-
-            case 2:
-            {
-                uint16_t *outrow16 = (uint16_t *)outrow;
-                uint16_t * inrow16 = (uint16_t *) inrow;
-                for (j = 0; j < width; j++)
-                    outrow16[j] = inrow16[-j];
-            }
-            break;
-
-            case 3:
-            {
-                uint8_t *in  =  inrow;
-                uint8_t *out = outrow;
-                for (j = 0; j < width; j++, out += 3, in -= 3) {
-                    int32_t v = AV_RB24(in);
-                    AV_WB24(out, v);
-                }
-            }
-            break;
-
-            case 4:
-            {
-                uint32_t *outrow32 = (uint32_t *)outrow;
-                uint32_t * inrow32 = (uint32_t *) inrow;
-                for (j = 0; j < width; j++)
-                    outrow32[j] = inrow32[-j];
-            }
-            break;
-
-            default:
-                for (j = 0; j < width; j++)
-                    memcpy(outrow + j*step, inrow - j*step, step);
-            }
+            s->flip_line[plane](inrow, outrow, width);
 
             inrow  += in ->linesize[plane];
             outrow += out->linesize[plane];
diff --git a/libavfilter/x86/Makefile b/libavfilter/x86/Makefile
index c10f4d5538..2fc5c62644 100644
--- a/libavfilter/x86/Makefile
+++ b/libavfilter/x86/Makefile
@@ -5,6 +5,7 @@ OBJS-$(CONFIG_COLORSPACE_FILTER)             += x86/colorspacedsp_init.o
 OBJS-$(CONFIG_EQ_FILTER)                     += x86/vf_eq.o
 OBJS-$(CONFIG_FSPP_FILTER)                   += x86/vf_fspp_init.o
 OBJS-$(CONFIG_GRADFUN_FILTER)                += x86/vf_gradfun_init.o
+OBJS-$(CONFIG_HFLIP_FILTER)                  += x86/vf_hflip_init.o
 OBJS-$(CONFIG_HQDN3D_FILTER)                 += x86/vf_hqdn3d_init.o
 OBJS-$(CONFIG_IDET_FILTER)                   += x86/vf_idet_init.o
 OBJS-$(CONFIG_INTERLACE_FILTER)              += x86/vf_interlace_init.o
@@ -32,6 +33,7 @@ X86ASM-OBJS-$(CONFIG_BWDIF_FILTER)           += x86/vf_bwdif.o
 X86ASM-OBJS-$(CONFIG_COLORSPACE_FILTER)      += x86/colorspacedsp.o
 X86ASM-OBJS-$(CONFIG_FSPP_FILTER)            += x86/vf_fspp.o
 X86ASM-OBJS-$(CONFIG_GRADFUN_FILTER)         += x86/vf_gradfun.o
+X86ASM-OBJS-$(CONFIG_HFLIP_FILTER)           += x86/vf_hflip.o
 X86ASM-OBJS-$(CONFIG_HQDN3D_FILTER)          += x86/vf_hqdn3d.o
 X86ASM-OBJS-$(CONFIG_IDET_FILTER)            += x86/vf_idet.o
 X86ASM-OBJS-$(CONFIG_INTERLACE_FILTER)       += x86/vf_interlace.o
diff --git a/libavfilter/x86/vf_hflip.asm b/libavfilter/x86/vf_hflip.asm
new file mode 100644
index 0000000000..d2ad3e1161
--- /dev/null
+++ b/libavfilter/x86/vf_hflip.asm
@@ -0,0 +1,102 @@
+;*****************************************************************************
+;* x86-optimized functions for hflip filter
+;*
+;* Copyright (C) 2017 Paul B Mahol
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;*****************************************************************************
+
+%include "libavutil/x86/x86util.asm"
+
+SECTION_RODATA
+
+pb_flip_byte:  db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+pb_flip_short: db 14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1
+
+SECTION .text
+
+INIT_XMM ssse3
+cglobal hflip_byte, 3, 6, 3, src, dst, w, x, v, r
+    mova    m0, [pb_flip_byte]
+    mov     xq, 0
+    mov     wd, dword wm
+    mov     rq, wq
+    and     rq, 2 * mmsize - 1
+    cmp     wq, 2 * mmsize
+    jl .loop1
+    sub     wq, rq
+
+    .loop0:
+        neg     xq
+        movu    m1, [srcq + xq -     mmsize + 1]
+        movu    m2, [srcq + xq - 2 * mmsize + 1]
+        pshufb  m1, m0
+        pshufb  m2, m0
+        neg     xq
+        movu    [dstq + xq         ], m1
+        movu    [dstq + xq + mmsize], m2
+        add     xq, mmsize * 2
+        cmp     xq, wq
+        jl .loop0
+
+    add    wq, rq
+
+    .loop1:
+        neg    xq
+        mov    vb, [srcq + xq]
+        neg    xq
+        mov    [dstq + xq], vb
+        add    xq, 1
+        cmp    xq, wq
+        jl .loop1
+RET
+
+cglobal hflip_short, 3, 6, 3, src, dst, w, x, v, r
+    mova    m0, [pb_flip_short]
+    mov     xq, 0
+    mov     wd, dword wm
+    add     wq, wq
+    mov     rq, wq
+    and     rq, 2 * mmsize - 1
+    cmp     wq, 2 * mmsize
+    jl .loop1
+    sub     wq, rq
+
+    .loop0:
+        neg     xq
+        movu    m1, [srcq + xq -     mmsize + 2]
+        movu    m2, [srcq + xq - 2 * mmsize + 2]
+        pshufb  m1, m0
+        pshufb  m2, m0
+        neg     xq
+        movu    [dstq + xq         ], m1
+        movu    [dstq + xq + mmsize], m2
+        add     xq, mmsize * 2
+        cmp     xq, wq
+        jl .loop0
+
+    add    wq, rq
+
+    .loop1:
+        neg    xq
+        mov    vw, [srcq + xq]
+        neg    xq
+        mov    [dstq + xq], vw
+        add    xq, 2
+        cmp    xq, wq
+        jl .loop1
+RET
diff --git a/libavfilter/x86/vf_hflip_init.c b/libavfilter/x86/vf_hflip_init.c
new file mode 100644
index 0000000000..d8eab1f905
--- /dev/null
+++ b/libavfilter/x86/vf_hflip_init.c
@@ -0,0 +1,41 @@
+/*
+ * Copyright (c) 2017 Paul B Mahol
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/x86/cpu.h"
+#include "libavfilter/hflip.h"
+
+void ff_hflip_byte_ssse3(const uint8_t *src, uint8_t *dst, int w);
+void ff_hflip_short_ssse3(const uint8_t *src, uint8_t *dst, int w);
+
+av_cold void ff_hflip_init_x86(FlipContext *s, int step[4])
+{
+    int cpu_flags = av_get_cpu_flags();
+    int i;
+
+    for (i = 0; i < 4; i++) {
+        if (EXTERNAL_SSSE3(cpu_flags) && step[i] == 1) {
+            s->flip_line[i] = ff_hflip_byte_ssse3;
+        } else if (EXTERNAL_SSSE3(cpu_flags) && step[i] == 2) {
+            s->flip_line[i] = ff_hflip_short_ssse3;
+        }
+    }
+}
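
Note on the flip_line contract used throughout the patch: filter_slice passes
src pointing at the LAST pixel of the input row (start of row + (width - 1) * step)
and dst pointing at the first pixel of the output row, so every implementation
reads backwards from src and writes forwards to dst. The following is only a
standalone sketch of that contract for the 1-byte case, not part of the patch.
flip_bytes_ref mirrors hflip_byte_c; flip_bytes_blocked, BLOCK and
hflip_sketch_selftest are hypothetical names that model a "wide main loop plus
byte tail" split in plain C, with BLOCK standing in for 2 * mmsize. One case
that seems worth checking for any such split is a width that is an exact
multiple of the block size: the posted .loop1 tests its condition after the
body, so the self-test below places a guard byte after the row to catch a
tail loop that runs when nothing is left over.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same contract as hflip_byte_c in the patch: src points at the last
 * byte of the input row, dst at the first byte of the output row. */
static void flip_bytes_ref(const uint8_t *src, uint8_t *dst, int w)
{
    for (int j = 0; j < w; j++)
        dst[j] = src[-j];
}

/* Hypothetical stand-in for 2 * mmsize (two 16-byte XMM registers). */
enum { BLOCK = 32 };

/* Scalar model of a blocked flip: whole blocks first, then a byte tail.
 * Both loops test their condition before the body, so the tail does
 * nothing when w is an exact multiple of BLOCK. */
static void flip_bytes_blocked(const uint8_t *src, uint8_t *dst, int w)
{
    int rem    = w % BLOCK;
    int main_w = w - rem;
    int x      = 0;

    for (; x < main_w; x += BLOCK)
        for (int k = 0; k < BLOCK; k++)
            dst[x + k] = src[-(x + k)];

    for (; x < w; x++)
        dst[x] = src[-x];
}

/* Compares both versions for every width in 1..max_w and checks that the
 * guard byte just past the output row is untouched.
 * Returns the number of widths that failed. */
static int hflip_sketch_selftest(int max_w)
{
    enum { MAX_W = 256 };
    uint8_t in[MAX_W], out_ref[MAX_W + 1], out_blk[MAX_W + 1];
    int failures = 0;

    assert(max_w >= 1 && max_w <= MAX_W);

    for (int i = 0; i < MAX_W; i++)
        in[i] = (uint8_t)(i * 7 + 3);

    for (int w = 1; w <= max_w; w++) {
        memset(out_ref, 0xAA, sizeof(out_ref));
        memset(out_blk, 0xAA, sizeof(out_blk));
        flip_bytes_ref(in + w - 1, out_ref, w);
        flip_bytes_blocked(in + w - 1, out_blk, w);
        if (memcmp(out_ref, out_blk, (size_t)w) || out_blk[w] != 0xAA)
            failures++;
    }
    return failures;
}
```

Calling hflip_sketch_selftest(256) from a small main() exercises widths on both
sides of each multiple of BLOCK. The same harness shape could be pointed at
ff_hflip_byte_ssse3 in a build that links the assembly, assuming the guard byte
is placed directly after the row rather than after any linesize padding.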