From patchwork Thu Feb 18 23:02:55 2021
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 25770
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 00:02:55 +0100
Message-Id: <20210218230258.1263-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH 1/4] avcodec/cfhdenc: refactor DSP code for CFHD encoder

This is needed to implement x86 SIMD.
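This patch moves a C fallback into cfhdencdsp.c. That fallback is a single 1-D analysis kernel: horiz_filter() applies it along rows and vert_filter() applies it down columns. CFHDEncDSPContext holds both as function pointers, so an arch-specific init can replace them. The standalone sketch below models that arrangement under these assumptions:

- `EncDSP`, `enc_dsp_init()`, `clip_int16()` and the `_c` wrappers are hypothetical stand-ins, not FFmpeg API.
- `filter()` mirrors the arithmetic of the scalar code in this patch.
- Strides count int16_t elements, as in the C version.
- `len` is even and at least 6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's av_clip_int16(): saturate to int16_t. */
static int16_t clip_int16(int v)
{
    return (int16_t)(v < INT16_MIN ? INT16_MIN : v > INT16_MAX ? INT16_MAX : v);
}

/* 1-D analysis kernel with the same arithmetic as filter() in the patch:
 * low = sum of each sample pair, high = pair difference plus a rounded
 * correction from neighbouring pairs, with 6-tap rules at both ends.
 * Assumes len is even and len >= 6; strides count int16_t elements. */
static void filter(const int16_t *in, ptrdiff_t in_stride,
                   int16_t *low, ptrdiff_t low_stride,
                   int16_t *high, ptrdiff_t high_stride, int len)
{
#define S(k) in[(ptrdiff_t)(k) * in_stride]
    low[0]  = clip_int16(S(0) + S(1));
    high[0] = clip_int16((5 * S(0) - 11 * S(1) + 4 * S(2) + 4 * S(3) -
                          S(4) - S(5) + 4) >> 3);

    for (int i = 2; i < len - 2; i += 2) {
        low[(i >> 1) * low_stride]   = clip_int16(S(i) + S(i + 1));
        high[(i >> 1) * high_stride] = clip_int16(((-S(i - 2) - S(i - 1) +
                                                     S(i + 2) + S(i + 3) + 4) >> 3) +
                                                   S(i) - S(i + 1));
    }

    int n = len - 2;
    low[(n >> 1) * low_stride]   = clip_int16(S(n) + S(n + 1));
    high[(n >> 1) * high_stride] = clip_int16((11 * S(n) - 5 * S(n + 1) -
                                               4 * S(n - 1) - 4 * S(n - 2) +
                                               S(n - 3) + S(n - 4) + 4) >> 3);
#undef S
}

/* Hypothetical miniature of CFHDEncDSPContext. */
typedef void (*band_filter_fn)(const int16_t *in, int16_t *low, int16_t *high,
                               ptrdiff_t in_stride, ptrdiff_t low_stride,
                               ptrdiff_t high_stride, int width, int height);

typedef struct EncDSP {
    band_filter_fn horiz_filter;
    band_filter_fn vert_filter;
} EncDSP;

/* Run the kernel over each of `height` rows of `width` samples. */
static void horiz_filter_c(const int16_t *in, int16_t *low, int16_t *high,
                           ptrdiff_t in_stride, ptrdiff_t low_stride,
                           ptrdiff_t high_stride, int width, int height)
{
    for (int y = 0; y < height; y++)
        filter(in + y * in_stride, 1, low + y * low_stride, 1,
               high + y * high_stride, 1, width);
}

/* Run the kernel down each of `width` columns of `height` samples. */
static void vert_filter_c(const int16_t *in, int16_t *low, int16_t *high,
                          ptrdiff_t in_stride, ptrdiff_t low_stride,
                          ptrdiff_t high_stride, int width, int height)
{
    for (int x = 0; x < width; x++)
        filter(in + x, in_stride, low + x, low_stride,
               high + x, high_stride, height);
}

static void enc_dsp_init(EncDSP *c)
{
    c->horiz_filter = horiz_filter_c;
    c->vert_filter  = vert_filter_c;
    /* An arch-specific init (in this series, ff_cfhdencdsp_init_x86())
     * is the place where these pointers would be replaced by SIMD code. */
}
```

Both wrappers call the same kernel. So vert_filter() on a transposed block should reproduce the transposed output of horiz_filter(). A constant input should also give a low band of twice the value and an all-zero high band. Both properties are cheap to check against any SIMD replacement.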
Signed-off-by: Paul B Mahol
---
 libavcodec/Makefile     |   2 +-
 libavcodec/cfhdenc.c    | 123 +++++++++++-----------------------------
 libavcodec/cfhdencdsp.c |  76 +++++++++++++++++++++++++
 libavcodec/cfhdencdsp.h |  41 ++++++++++++++
 4 files changed, 151 insertions(+), 91 deletions(-)
 create mode 100644 libavcodec/cfhdencdsp.c
 create mode 100644 libavcodec/cfhdencdsp.h

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 53d164e83e..6a57ddf2b1 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -259,7 +259,7 @@ OBJS-$(CONFIG_CDGRAPHICS_DECODER)      += cdgraphics.o
 OBJS-$(CONFIG_CDTOONS_DECODER)         += cdtoons.o
 OBJS-$(CONFIG_CDXL_DECODER)            += cdxl.o
 OBJS-$(CONFIG_CFHD_DECODER)            += cfhd.o cfhddata.o cfhddsp.o
-OBJS-$(CONFIG_CFHD_ENCODER)            += cfhdenc.o cfhddata.o
+OBJS-$(CONFIG_CFHD_ENCODER)            += cfhdenc.o cfhddata.o cfhdencdsp.o
 OBJS-$(CONFIG_CINEPAK_DECODER)         += cinepak.o
 OBJS-$(CONFIG_CINEPAK_ENCODER)         += cinepakenc.o elbg.o
 OBJS-$(CONFIG_CLEARVIDEO_DECODER)      += clearvideo.o
diff --git a/libavcodec/cfhdenc.c b/libavcodec/cfhdenc.c
index 5554baefa3..1e89ffc41c 100644
--- a/libavcodec/cfhdenc.c
+++ b/libavcodec/cfhdenc.c
@@ -33,6 +33,7 @@
 #include "avcodec.h"
 #include "bytestream.h"
 #include "cfhd.h"
+#include "cfhdencdsp.h"
 #include "put_bits.h"
 #include "internal.h"
 #include "thread.h"
@@ -239,6 +240,8 @@ typedef struct CFHDEncContext {
     Runbook rb[321];
     Codebook cb[513];
     int16_t *alpha;
+
+    CFHDEncDSPContext dsp;
 } CFHDEncContext;
 
 static av_cold int cfhd_encode_init(AVCodecContext *avctx)
@@ -359,6 +362,8 @@ static av_cold int cfhd_encode_init(AVCodecContext *avctx)
         s->lut[i] = last;
     }
 
+    ff_cfhdencdsp_init(&s->dsp);
+
     if (s->planes != 4)
         return 0;
 
@@ -369,42 +374,6 @@ static av_cold int cfhd_encode_init(AVCodecContext *avctx)
     return 0;
 }
 
-static av_always_inline void filter(int16_t *input, ptrdiff_t in_stride,
-                          int16_t *low, ptrdiff_t low_stride,
-                          int16_t *high, ptrdiff_t high_stride,
-                          int len)
-{
-    low[(0>>1) * low_stride]   = av_clip_int16(input[0*in_stride] + input[1*in_stride]);
-    high[(0>>1) * high_stride] = av_clip_int16((5 * input[0*in_stride] - 11 * input[1*in_stride] +
-                                               4 * input[2*in_stride] + 4 * input[3*in_stride] -
-                                               1 * input[4*in_stride] - 1 * input[5*in_stride] + 4) >> 3);
-
-    for (int i = 2; i < len - 2; i += 2) {
-        low[(i>>1) * low_stride]   = av_clip_int16(input[i*in_stride] + input[(i+1)*in_stride]);
-        high[(i>>1) * high_stride] = av_clip_int16(((-input[(i-2)*in_stride] - input[(i-1)*in_stride] +
-                                                      input[(i+2)*in_stride] + input[(i+3)*in_stride] + 4) >> 3) +
-                                                      input[(i+0)*in_stride] - input[(i+1)*in_stride]);
-    }
-
-    low[((len-2)>>1) * low_stride]   = av_clip_int16(input[((len-2)+0)*in_stride] + input[((len-2)+1)*in_stride]);
-    high[((len-2)>>1) * high_stride] = av_clip_int16((11* input[((len-2)+0)*in_stride] - 5 * input[((len-2)+1)*in_stride] -
-                                                       4 * input[((len-2)-1)*in_stride] - 4 * input[((len-2)-2)*in_stride] +
-                                                       1 * input[((len-2)-3)*in_stride] + 1 * input[((len-2)-4)*in_stride] + 4) >> 3);
-}
-
-static void horiz_filter(int16_t *input, int16_t *low, int16_t *high,
-                         int width)
-{
-    filter(input, 1, low, 1, high, 1, width);
-}
-
-static void vert_filter(int16_t *input, ptrdiff_t in_stride,
-                        int16_t *low, ptrdiff_t low_stride,
-                        int16_t *high, ptrdiff_t high_stride, int len)
-{
-    filter(input, in_stride, low, low_stride, high, high_stride, len);
-}
-
 static void quantize_band(int16_t *input, int width, int a_width,
                           int height, unsigned quantization)
 {
@@ -454,6 +423,7 @@ static int cfhd_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
                              const AVFrame *frame, int *got_packet)
 {
     CFHDEncContext *s = avctx->priv_data;
+    CFHDEncDSPContext *dsp = &s->dsp;
     PutByteContext *pby = &s->pby;
     PutBitContext *pb = &s->pb;
     const Codebook *const cb = s->cb;
@@ -480,12 +450,9 @@ static int cfhd_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
             in_stride = avctx->width;
         }
 
-        for (int i = 0; i < height * 2; i++) {
-            horiz_filter(input, low, high, width * 2);
-            input += in_stride;
-            low += a_width;
-            high += a_width;
-        }
+        dsp->horiz_filter(input, low, high,
+                          in_stride, a_width, a_width,
+                          width * 2, height * 2);
 
         input = s->plane[plane].l_h[7];
         low = s->plane[plane].subband[7];
@@ -493,23 +460,17 @@ static int cfhd_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
         high = s->plane[plane].subband[9];
         high_stride = s->plane[plane].band[2][0].a_width;
 
-        for (int i = 0; i < width; i++) {
-            vert_filter(input, a_width, low, low_stride, high, high_stride, height * 2);
-            input++;
-            low++;
-            high++;
-        }
+        dsp->vert_filter(input, low, high,
+                         a_width, low_stride, high_stride,
+                         width, height * 2);
 
         input = s->plane[plane].l_h[6];
         low = s->plane[plane].l_h[7];
         high = s->plane[plane].subband[8];
 
-        for (int i = 0; i < width; i++) {
-            vert_filter(input, a_width, low, low_stride, high, high_stride, height * 2);
-            input++;
-            low++;
-            high++;
-        }
+        dsp->vert_filter(input, low, high,
+                         a_width, low_stride, high_stride,
+                         width, height * 2);
 
         a_width = s->plane[plane].band[1][0].a_width;
         width = s->plane[plane].band[1][0].width;
@@ -527,34 +488,25 @@ static int cfhd_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
         }
 
         input = s->plane[plane].l_h[7];
-        for (int i = 0; i < height * 2; i++) {
-            horiz_filter(input, low, high, width * 2);
-            input += a_width * 2;
-            low += low_stride;
-            high += high_stride;
-        }
+        dsp->horiz_filter(input, low, high,
+                          a_width * 2, low_stride, high_stride,
+                          width * 2, height * 2);
 
         input = s->plane[plane].l_h[4];
         low = s->plane[plane].subband[4];
         high = s->plane[plane].subband[6];
 
-        for (int i = 0; i < width; i++) {
-            vert_filter(input, a_width, low, low_stride, high, high_stride, height * 2);
-            input++;
-            low++;
-            high++;
-        }
+        dsp->vert_filter(input, low, high,
+                         a_width, low_stride, high_stride,
+                         width, height * 2);
 
         input = s->plane[plane].l_h[3];
         low = s->plane[plane].l_h[4];
         high = s->plane[plane].subband[5];
 
-        for (int i = 0; i < width; i++) {
-            vert_filter(input, a_width, low, low_stride, high, high_stride, height * 2);
-            input++;
-            low++;
-            high++;
-        }
+        dsp->vert_filter(input, low, high,
+                         a_width, low_stride, high_stride,
+                         width, height * 2);
 
         a_width = s->plane[plane].band[0][0].a_width;
         width = s->plane[plane].band[0][0].width;
@@ -574,34 +526,25 @@ static int cfhd_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
         }
 
         input = s->plane[plane].l_h[4];
-        for (int i = 0; i < height * 2; i++) {
-            horiz_filter(input, low, high, width * 2);
-            input += a_width * 2;
-            low += low_stride;
-            high += high_stride;
-        }
+        dsp->horiz_filter(input, low, high,
+                          a_width * 2, low_stride, high_stride,
+                          width * 2, height * 2);
 
         low = s->plane[plane].subband[1];
         high = s->plane[plane].subband[3];
         input = s->plane[plane].l_h[1];
 
-        for (int i = 0; i < width; i++) {
-            vert_filter(input, a_width, low, low_stride, high, high_stride, height * 2);
-            input++;
-            low++;
-            high++;
-        }
+        dsp->vert_filter(input, low, high,
+                         a_width, low_stride, high_stride,
+                         width, height * 2);
 
         low = s->plane[plane].subband[0];
         high = s->plane[plane].subband[2];
         input = s->plane[plane].l_h[0];
 
-        for (int i = 0; i < width; i++) {
-            vert_filter(input, a_width, low, low_stride, high, high_stride, height * 2);
-            input++;
-            low++;
-            high++;
-        }
+        dsp->vert_filter(input, low, high,
+                         a_width, low_stride, high_stride,
+                         width, height * 2);
     }
 
     ret = ff_alloc_packet2(avctx, pkt, 64LL + s->planes * (2LL * avctx->width * avctx->height + 1000LL), 0);
diff --git a/libavcodec/cfhdencdsp.c b/libavcodec/cfhdencdsp.c
new file mode 100644
index 0000000000..0becb76d1d
--- /dev/null
+++ b/libavcodec/cfhdencdsp.c
@@ -0,0 +1,76 @@
+/*
+ * Copyright (c) 2015-2016 Kieran Kunhya
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/common.h"
+#include "libavutil/avassert.h"
+
+#include "cfhdencdsp.h"
+
+static av_always_inline void filter(int16_t *input, ptrdiff_t in_stride,
+                          int16_t *low, ptrdiff_t low_stride,
+                          int16_t *high, ptrdiff_t high_stride,
+                          int len)
+{
+    low[(0>>1) * low_stride]   = av_clip_int16(input[0*in_stride] + input[1*in_stride]);
+    high[(0>>1) * high_stride] = av_clip_int16((5 * input[0*in_stride] - 11 * input[1*in_stride] +
+                                               4 * input[2*in_stride] + 4 * input[3*in_stride] -
+                                               1 * input[4*in_stride] - 1 * input[5*in_stride] + 4) >> 3);
+
+    for (int i = 2; i < len - 2; i += 2) {
+        low[(i>>1) * low_stride]   = av_clip_int16(input[i*in_stride] + input[(i+1)*in_stride]);
+        high[(i>>1) * high_stride] = av_clip_int16(((-input[(i-2)*in_stride] - input[(i-1)*in_stride] +
+                                                      input[(i+2)*in_stride] + input[(i+3)*in_stride] + 4) >> 3) +
+                                                      input[(i+0)*in_stride] - input[(i+1)*in_stride]);
+    }
+
+    low[((len-2)>>1) * low_stride]   = av_clip_int16(input[((len-2)+0)*in_stride] + input[((len-2)+1)*in_stride]);
+    high[((len-2)>>1) * high_stride] = av_clip_int16((11* input[((len-2)+0)*in_stride] - 5 * input[((len-2)+1)*in_stride] -
+                                                       4 * input[((len-2)-1)*in_stride] - 4 * input[((len-2)-2)*in_stride] +
+                                                       1 * input[((len-2)-3)*in_stride] + 1 * input[((len-2)-4)*in_stride] + 4) >> 3);
+}
+
+static void horiz_filter(int16_t *input, int16_t *low, int16_t *high,
+                         ptrdiff_t in_stride, ptrdiff_t low_stride,
+                         ptrdiff_t high_stride,
+                         int width, int height)
+{
+    for (int i = 0; i < height; i++) {
+        filter(input, 1, low, 1, high, 1, width);
+        input += in_stride;
+        low += low_stride;
+        high += high_stride;
+    }
+}
+
+static void vert_filter(int16_t *input, int16_t *low, int16_t *high,
+                        ptrdiff_t in_stride, ptrdiff_t low_stride,
+                        ptrdiff_t high_stride,
+                        int width, int height)
+{
+    for (int i = 0; i < width; i++)
+        filter(&input[i], in_stride, &low[i], low_stride, &high[i], high_stride, height);
+}
+
+av_cold void ff_cfhdencdsp_init(CFHDEncDSPContext *c)
+{
+    c->horiz_filter = horiz_filter;
+    c->vert_filter = vert_filter;
+}
diff --git a/libavcodec/cfhdencdsp.h b/libavcodec/cfhdencdsp.h
new file mode 100644
index 0000000000..b3aac8d0a7
--- /dev/null
+++ b/libavcodec/cfhdencdsp.h
@@ -0,0 +1,41 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_CFHDENCDSP_H
+#define AVCODEC_CFHDENCDSP_H
+
+#include <stddef.h>
+#include <stdint.h>
+
+typedef struct CFHDEncDSPContext {
+    void (*horiz_filter)(int16_t *input, int16_t *low, int16_t *high,
+                         ptrdiff_t in_stride, ptrdiff_t low_stride,
+                         ptrdiff_t high_stride,
+                         int width, int height);
+
+    void (*vert_filter)(int16_t *input, int16_t *low, int16_t *high,
+                        ptrdiff_t in_stride, ptrdiff_t low_stride,
+                        ptrdiff_t high_stride,
+                        int width, int height);
+} CFHDEncDSPContext;
+
+void ff_cfhdencdsp_init(CFHDEncDSPContext *c);
+
+void ff_cfhdencdsp_init_x86(CFHDEncDSPContext *c);
+
+#endif /* AVCODEC_CFHDENCDSP_H */

From patchwork Thu Feb 18 23:02:56 2021
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 25771
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 00:02:56 +0100
Message-Id: <20210218230258.1263-2-onemda@gmail.com>
In-Reply-To: <20210218230258.1263-1-onemda@gmail.com>
References: <20210218230258.1263-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/4] avcodec/cfhdenc: add padding to each decomposition

Signed-off-by: Paul B Mahol
---
 libavcodec/cfhdenc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/cfhdenc.c b/libavcodec/cfhdenc.c
index 1e89ffc41c..9c8ba3700c 100644
--- a/libavcodec/cfhdenc.c
+++ b/libavcodec/cfhdenc.c
@@ -270,10 +270,10 @@ static av_cold int cfhd_encode_init(AVCodecContext *avctx)
         int width  = i ? avctx->width >> s->chroma_h_shift : avctx->width;
         int height = i ? FFALIGN(avctx->height >> s->chroma_v_shift, 8) :
                          FFALIGN(avctx->height >> s->chroma_v_shift, 8);
-        ptrdiff_t stride = FFALIGN(width / 8, 8) * 8;
+        ptrdiff_t stride = (FFALIGN(width / 8, 8) + 64) * 8;
 
-        w8 = FFALIGN(width / 8, 8);
-        h8 = height / 8;
+        w8 = FFALIGN(width / 8, 8) + 64;
+        h8 = FFALIGN(height, 8) / 8;
         w4 = w8 * 2;
         h4 = h8 * 2;
         w2 = w4 * 2;

From patchwork Thu Feb 18 23:02:57 2021
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 25772
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 00:02:57 +0100
Message-Id: <20210218230258.1263-3-onemda@gmail.com>
In-Reply-To: <20210218230258.1263-1-onemda@gmail.com>
References: <20210218230258.1263-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH 3/4] avcodec/cfhdenc: do not try to encode junk

Signed-off-by: Paul B Mahol
---
 libavcodec/cfhdenc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/cfhdenc.c b/libavcodec/cfhdenc.c
index 9c8ba3700c..b2993cf2a9 100644
--- a/libavcodec/cfhdenc.c
+++ b/libavcodec/cfhdenc.c
@@ -742,7 +742,7 @@ static int cfhd_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
 
             for (int m = 0; m < height; m++) {
                 for (int j = 0; j < stride; j++) {
-                    int16_t index = FFSIGN(data[j]) * lut[FFABS(data[j])];
+                    int16_t index = j >= width ? 0 : FFSIGN(data[j]) * lut[FFABS(data[j])];
 
                     if (index < 0)
                         index += 512;

From patchwork Thu Feb 18 23:02:58 2021
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 25773
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 00:02:58 +0100
Message-Id: <20210218230258.1263-4-onemda@gmail.com>
In-Reply-To: <20210218230258.1263-1-onemda@gmail.com>
References: <20210218230258.1263-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH 4/4] avcodec/x86: add cfhdenc SIMD

Signed-off-by: Paul B Mahol
---
 libavcodec/cfhdencdsp.c          |   3 +
 libavcodec/x86/Makefile          |   2 +
 libavcodec/x86/cfhdencdsp.asm    | 426 +++++++++++++++++++++++++++++++
 libavcodec/x86/cfhdencdsp_init.c |  46 ++++
 4 files changed, 477 insertions(+)
 create mode 100644 libavcodec/x86/cfhdencdsp.asm
 create mode 100644 libavcodec/x86/cfhdencdsp_init.c

diff --git a/libavcodec/cfhdencdsp.c b/libavcodec/cfhdencdsp.c
index 0becb76d1d..b979e9e09a 100644
--- a/libavcodec/cfhdencdsp.c
+++ b/libavcodec/cfhdencdsp.c
@@ -73,4 +73,7 @@ av_cold void ff_cfhdencdsp_init(CFHDEncDSPContext *c)
 {
     c->horiz_filter = horiz_filter;
     c->vert_filter = vert_filter;
+
+    if (ARCH_X86)
+        ff_cfhdencdsp_init_x86(c);
 }
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 884dc0c759..6361161180 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -51,6 +51,7 @@ OBJS-$(CONFIG_ALAC_DECODER)            += x86/alacdsp_init.o
 OBJS-$(CONFIG_APNG_DECODER)            += x86/pngdsp_init.o
 OBJS-$(CONFIG_CAVS_DECODER)            += x86/cavsdsp.o
 OBJS-$(CONFIG_CFHD_DECODER)            += x86/cfhddsp_init.o
+OBJS-$(CONFIG_CFHD_ENCODER)            += x86/cfhdencdsp_init.o
 OBJS-$(CONFIG_DCA_DECODER)             += x86/dcadsp_init.o x86/synth_filter_init.o
 OBJS-$(CONFIG_DNXHD_ENCODER)           += x86/dnxhdenc_init.o
 OBJS-$(CONFIG_EXR_DECODER)             += x86/exrdsp_init.o
@@ -154,6 +155,7 @@ X86ASM-OBJS-$(CONFIG_ADPCM_G722_ENCODER) += x86/g722dsp.o
 X86ASM-OBJS-$(CONFIG_ALAC_DECODER)     += x86/alacdsp.o
 X86ASM-OBJS-$(CONFIG_APNG_DECODER)     += x86/pngdsp.o
 X86ASM-OBJS-$(CONFIG_CAVS_DECODER)     += x86/cavsidct.o
+X86ASM-OBJS-$(CONFIG_CFHD_ENCODER)     += x86/cfhdencdsp.o
 X86ASM-OBJS-$(CONFIG_CFHD_DECODER)     += x86/cfhddsp.o
 X86ASM-OBJS-$(CONFIG_DCA_DECODER)      += x86/dcadsp.o x86/synth_filter.o
 X86ASM-OBJS-$(CONFIG_DIRAC_DECODER)    += x86/diracdsp.o \
diff --git a/libavcodec/x86/cfhdencdsp.asm b/libavcodec/x86/cfhdencdsp.asm
new file mode 100644
index 0000000000..2fb4744345
--- /dev/null
+++ b/libavcodec/x86/cfhdencdsp.asm
@@ -0,0 +1,426 @@
+;******************************************************************************
+;* x86-optimized functions for the CFHD encoder
+;* Copyright (c) 2021 Paul B Mahol
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;******************************************************************************
+
+%include "libavutil/x86/x86util.asm"
+
+SECTION_RODATA
+
+pw_p1_n1:  dw 1, -1, 1, -1, 1, -1, 1, -1
+pw_n1_p1:  dw -1, 1, -1, 1, -1, 1, -1, 1
+pw_p5_n11: dw 5, -11, 5, -11, 5, -11, 5, -11
+pw_n5_p11: dw -5, 11, -5, 11, -5, 11, -5, 11
+pw_p11_n5: dw 11, -5, 11, -5, 11, -5, 11, -5
+pw_n11_p5: dw -11, 5, -11, 5, -11, 5, -11, 5
+pd_4:      times 4 dd 4
+pw_p4:     times 8 dw 4
+pw_n4:     times 8 dw -4
+pw_p1:     times 8 dw 1
+pw_n1:     times 8 dw -1
+
+SECTION .text
+
+INIT_XMM sse2
+%if ARCH_X86_64
+cglobal cfhdenc_horiz_filter, 8, 11, 12, input, low, high, istride, lwidth, hwidth, width, height, x, y, temp
+    mov       yd, heightd
+    shl       istrided, 1
+    shl       lwidthd, 1
+    shl       hwidthd, 1
+    neg       yq
+%else
+cglobal cfhdenc_horiz_filter, 7, 7, 8, input, x, low, y, high, temp, width, height
+TODO
+%endif
+.looph:
+    movsx     xq, word [inputq]
+
+    movsx     tempq, word [inputq + 2]
+    add       tempq, xq
+
+    movd      xm0, tempd
+    packssdw  m0, m0
+    pextrw    tempd, xm0, 0
+    mov       word [lowq], tempw
+
+    movsx     xq, word [inputq]
+    imul      xq, 5
+    movsx     tempq, word [inputq + 2]
+    imul      tempq, -11
+    add       tempq, xq
+
+    movsx     xq, word [inputq + 4]
+    imul      xq, 4
+    add       tempq, xq
+
+    movsx     xq, word [inputq + 6]
+    imul      xq, 4
+    add       tempq, xq
+
+    movsx     xq, word [inputq + 8]
+    imul      xq, -1
+    add       tempq, xq
+
+    movsx     xq, word [inputq + 10]
+    imul      xq, -1
+    add       tempq, xq
+
+    add       tempq, 4
+    sar       tempq, 3
+
+    movd      xm0, tempd
+    packssdw  m0, m0
+    pextrw    tempd, xm0, 0
+    mov       word [highq], tempw
+
+    mov       xq, 2
+
+.loopw:
+    movu      m0, [inputq + xq * 2]
+    movu      m1, [inputq + xq * 2 + mmsize]
+
+    pmaddwd   m0, [pw_p1]
+    pmaddwd   m1, [pw_p1]
+
+    packssdw  m0, m1
+    movu      [lowq+xq], m0
+
+    movu      m2, [inputq + xq * 2 - 4]
+    movu      m3, [inputq + xq * 2 - 4 + mmsize]
+
+    pmaddwd   m2, [pw_n1]
+    pmaddwd   m3, [pw_n1]
+
+    movu      m0, [inputq + xq * 2 + 4]
+    movu      m1, [inputq + xq * 2 + 4 + mmsize]
+
+    pmaddwd   m0, [pw_p1]
+    pmaddwd   m1, [pw_p1]
+
+    paddd     m0, m2
+    paddd     m1, m3
+
+    paddd     m0, [pd_4]
+    paddd     m1, [pd_4]
+
+    psrad     m0, 3
+    psrad     m1, 3
+
+    movu      m5, [inputq + xq * 2 + 0]
+    movu      m6, [inputq + xq * 2 + mmsize]
+
+    pmaddwd   m5, [pw_p1_n1]
+    pmaddwd   m6, [pw_p1_n1]
+
+    paddd     m0, m5
+    paddd     m1, m6
+
+    packssdw  m0, m1
+    movu      [highq+xq], m0
+
+    add       xq, mmsize
+    cmp       xq, widthq
+    jl        .loopw
+
+    add       lowq, widthq
+    add       highq, widthq
+    add       inputq, widthq
+    add       inputq, widthq
+
+    movsx     xq, word [inputq - 4]
+    movsx     tempq, word [inputq - 2]
+    add       tempq, xq
+
+    movd      xm0, tempd
+    packssdw  m0, m0
+    pextrw    tempd, xm0, 0
+    mov       word [lowq-2], tempw
+
+    movsx     tempq, word [inputq - 4]
+    imul      tempq, 11
+    movsx     xq, word [inputq - 2]
+    imul      xq, -5
+    add       tempq, xq
+
+    movsx     xq, word [inputq - 6]
+    imul      xq, -4
+    add       tempq, xq
+
+    movsx     xq, word [inputq - 8]
+    imul      xq, -4
+    add       tempq, xq
+
+    movsx     xq, word [inputq - 10]
+    add       tempq, xq
+
+    movsx     xq, word [inputq - 12]
+    add       tempq, xq
+
+    add       tempq, 4
+    sar       tempq, 3
+
+    movd      xm0, tempd
+    packssdw  m0, m0
+    pextrw    tempd, xm0, 0
+    mov       word [highq-2], tempw
+
+    sub       inputq, widthq
+    sub       inputq, widthq
+    sub       highq, widthq
+    sub       lowq, widthq
+
+    add       lowq, lwidthq
+    add       highq, hwidthq
+    add       inputq, istrideq
+    add       yq, 1
+    jl        .looph
+
+    RET
+
+INIT_XMM sse2
+%if ARCH_X86_64
+cglobal cfhdenc_vert_filter, 8, 11, 8, input, low, high, istride, lwidth, hwidth, width, height, x, y, pos
+    shl       istrided, 1
+    shl       widthd, 1
+%else
+cglobal cfhdenc_vert_filter, 7, 7, 8, input, low, high, istride, lwitdh
+TODO
+%endif
+
+    sub       heightq, 2
+
+    xor       xq, xq
+.loopw:
+    mov       yq, 2
+
+    mov       posq, xq
+    movu      m0, [inputq + posq]
+    add       posq, istrideq
+    movu      m1, [inputq + posq]
+
+    paddsw    m0, m1
+
+    movu      [lowq + xq], m0
+
+    mov       posq, xq
+
+    movu      m0, [inputq + posq]
+    add       posq, istrideq
+    movu      m1, [inputq + posq]
+    add       posq, istrideq
+    movu      m2, [inputq + posq]
+    add       posq, istrideq
+    movu      m3, [inputq + posq]
+    add       posq, istrideq
+    movu      m4, [inputq + posq]
+    add       posq, istrideq
+    movu      m5, [inputq + posq]
+
+    mova      m6, m0
+    punpcklwd m0, m1
+    punpckhwd m1, m6
+
+    mova      m6, m2
+    punpcklwd m2, m3
+    punpckhwd m3, m6
+
+    mova      m6, m4
+    punpcklwd m4, m5
+    punpckhwd m5, m6
+
+    pmaddwd   m0, [pw_p5_n11]
+    pmaddwd   m1, [pw_n11_p5]
+    pmaddwd   m2, [pw_p4]
+    pmaddwd   m3, [pw_p4]
+    pmaddwd   m4, [pw_n1]
+    pmaddwd   m5, [pw_n1]
+
+    paddd     m0, m2
+    paddd     m1, m3
+    paddd     m0, m4
+    paddd     m1, m5
+
+    paddd     m0, [pd_4]
+    paddd     m1, [pd_4]
+
+    psrad     m0, 3
+    psrad     m1, 3
+    packssdw  m0, m1
+
+    movu      [highq + xq], m0
+
+.looph:
+
+    mov       posq, istrideq
+    imul      posq, yq
+    add       posq, xq
+
+    movu      m0, [inputq + posq]
+
+    add       posq, istrideq
+    movu      m1, [inputq + posq]
+
+    paddsw    m0, m1
+
+    mov       posq, lwidthq
+    imul      posq, yq
+    add       posq, xq
+
+    movu      [lowq + posq], m0
+
+    add       yq, -2
+
+    mov       posq, istrideq
+    imul      posq, yq
+    add       posq, xq
+
+    movu      m0, [inputq + posq]
+    add       posq, istrideq
+    movu      m1, [inputq + posq]
+    add       posq, istrideq
+    movu      m2, [inputq + posq]
+    add       posq, istrideq
+    movu      m3, [inputq + posq]
+    add       posq, istrideq
+    movu      m4, [inputq + posq]
+    add       posq, istrideq
+    movu      m5, [inputq + posq]
+
+    add       yq, 2
+
+    mova      m6, m0
+    punpcklwd m0, m1
+    punpckhwd m1, m6
+
+    mova      m6, m2
+    punpcklwd m2, m3
+    punpckhwd m3, m6
+
+    mova      m6, m4
+    punpcklwd m4, m5
+    punpckhwd m5, m6
+
+    pmaddwd   m0, [pw_n1]
+    pmaddwd   m1, [pw_n1]
+    pmaddwd   m2, [pw_p1_n1]
+    pmaddwd   m3, [pw_n1_p1]
+    pmaddwd   m4, [pw_p1]
+    pmaddwd   m5, [pw_p1]
+
+    paddd     m0, m4
+    paddd     m1, m5
+
+    paddd     m0, [pd_4]
+    paddd     m1, [pd_4]
+
+    psrad     m0, 3
+    psrad     m1, 3
+    paddd     m0, m2
+    paddd     m1, m3
+    packssdw  m0, m1
+
+    mov       posq, hwidthq
+    imul      posq, yq
+    add       posq, xq
+
+    movu      [highq + posq], m0
+
+    add       yq, 2
+    cmp       yq, heightq
+    jl        .looph
+
+    mov       posq, istrideq
+    imul      posq, yq
+    add       posq, xq
+
+    movu      m0, [inputq + posq]
+    add       posq, istrideq
+    movu      m1, [inputq + posq]
+
+    paddsw    m0, m1
+
+    mov       posq, lwidthq
+    imul      posq, yq
+    add       posq, xq
+
+    movu
[lowq + posq], m0 + + sub yq, 4 + + mov posq, istrideq + imul posq, yq + add posq, xq + + movu m0, [inputq + posq] + add posq, istrideq + movu m1, [inputq + posq] + add posq, istrideq + movu m2, [inputq + posq] + add posq, istrideq + movu m3, [inputq + posq] + add posq, istrideq + movu m4, [inputq + posq] + add posq, istrideq + movu m5, [inputq + posq] + + add yq, 4 + + mova m6, m0 + punpcklwd m0, m1 + punpckhwd m1, m6 + + mova m6, m2 + punpcklwd m2, m3 + punpckhwd m3, m6 + + mova m6, m4 + punpcklwd m4, m5 + punpckhwd m5, m6 + + pmaddwd m0, [pw_p1] + pmaddwd m1, [pw_p1] + pmaddwd m2, [pw_n4] + pmaddwd m3, [pw_n4] + pmaddwd m4, [pw_p11_n5] + pmaddwd m5, [pw_n5_p11] + + paddd m4, m2 + paddd m5, m3 + + paddd m4, m0 + paddd m5, m1 + + paddd m4, [pd_4] + paddd m5, [pd_4] + + psrad m4, 3 + psrad m5, 3 + packssdw m4, m5 + + mov posq, hwidthq + imul posq, yq + add posq, xq + + movu [highq + posq], m4 + + add xq, mmsize + cmp xq, widthq + jl .loopw + RET diff --git a/libavcodec/x86/cfhdencdsp_init.c b/libavcodec/x86/cfhdencdsp_init.c new file mode 100644 index 0000000000..4a867f53f9 --- /dev/null +++ b/libavcodec/x86/cfhdencdsp_init.c @@ -0,0 +1,46 @@ +/* + * Copyright (c) 2021 Paul B Mahol + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include 
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/x86/cpu.h"
+#include "libavcodec/avcodec.h"
+#include "libavcodec/cfhdencdsp.h"
+
+void ff_cfhdenc_horiz_filter_sse2(int16_t *input, int16_t *low, int16_t *high,
+                                  ptrdiff_t in_stride, ptrdiff_t low_stride,
+                                  ptrdiff_t high_stride,
+                                  int width, int height);
+void ff_cfhdenc_vert_filter_sse2(int16_t *input, int16_t *low, int16_t *high,
+                                 ptrdiff_t in_stride, ptrdiff_t low_stride,
+                                 ptrdiff_t high_stride,
+                                 int width, int height);
+
+av_cold void ff_cfhdencdsp_init_x86(CFHDEncDSPContext *c)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (EXTERNAL_SSE2(cpu_flags)) {
+        c->horiz_filter = ff_cfhdenc_horiz_filter_sse2;
+        c->vert_filter = ff_cfhdenc_vert_filter_sse2;
+    }
+}
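
When reviewing the two SSE2 routines, a plain-C model of what they appear to compute can help. The sketch below was reconstructed by reading the assembly in this patch. It is not code from the patch or from FFmpeg's C implementation, and the names `ref_filter_1d`, `ref_horiz_filter` and `ref_vert_filter` are hypothetical. It assumes strides are counted in `int16_t` elements, that the filtered length is even and at least 6, and that `>>` on a negative `int` is an arithmetic shift. That last assumption holds for GCC and Clang and matches `psrad`/`sar`.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate to int16_t, like packssdw/paddsw do in the SIMD code. */
static int16_t clip_int16(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/*
 * Hypothetical scalar model of one line of the forward filter, as read from
 * the SSE2 code: `len` input samples (assumed even, >= 6) produce len / 2
 * low-pass and len / 2 high-pass outputs. Strides are in int16_t elements.
 * Assumes `>>` on negative ints is an arithmetic shift (as psrad is).
 */
static void ref_filter_1d(const int16_t *in, ptrdiff_t in_stride,
                          int16_t *low, ptrdiff_t low_stride,
                          int16_t *high, ptrdiff_t high_stride, int len)
{
#define SAMPLE(k) ((int32_t)in[(ptrdiff_t)(k) * in_stride])
    /* Left edge: one-sided taps (5, -11, 4, 4, -1, -1). */
    low[0]  = clip_int16(SAMPLE(0) + SAMPLE(1));
    high[0] = clip_int16((5 * SAMPLE(0) - 11 * SAMPLE(1) +
                          4 * SAMPLE(2) + 4 * SAMPLE(3) -
                          SAMPLE(4) - SAMPLE(5) + 4) >> 3);

    /* Interior: pair sum for low; pair difference plus a rounded
     * correction from the neighbouring pairs for high. */
    for (int i = 2; i < len - 2; i += 2) {
        low[(i / 2) * low_stride]   = clip_int16(SAMPLE(i) + SAMPLE(i + 1));
        high[(i / 2) * high_stride] = clip_int16(
            ((-SAMPLE(i - 2) - SAMPLE(i - 1) +
              SAMPLE(i + 2) + SAMPLE(i + 3) + 4) >> 3) +
            SAMPLE(i) - SAMPLE(i + 1));
    }

    /* Right edge: mirrored one-sided taps (11, -5, -4, -4, 1, 1). */
    const int n = len - 2;
    low[(n / 2) * low_stride]   = clip_int16(SAMPLE(n) + SAMPLE(n + 1));
    high[(n / 2) * high_stride] = clip_int16((11 * SAMPLE(n) - 5 * SAMPLE(n + 1) -
                                              4 * SAMPLE(n - 1) - 4 * SAMPLE(n - 2) +
                                              SAMPLE(n - 3) + SAMPLE(n - 4) + 4) >> 3);
#undef SAMPLE
}

/* Horizontal pass: filter each of `height` rows of `width` samples. */
static void ref_horiz_filter(const int16_t *input, int16_t *low, int16_t *high,
                             ptrdiff_t in_stride, ptrdiff_t low_stride,
                             ptrdiff_t high_stride, int width, int height)
{
    for (int y = 0; y < height; y++)
        ref_filter_1d(input + y * in_stride, 1, low + y * low_stride, 1,
                      high + y * high_stride, 1, width);
}

/* Vertical pass: the same filter down each of `width` columns. */
static void ref_vert_filter(const int16_t *input, int16_t *low, int16_t *high,
                            ptrdiff_t in_stride, ptrdiff_t low_stride,
                            ptrdiff_t high_stride, int width, int height)
{
    for (int x = 0; x < width; x++)
        ref_filter_1d(input + x, in_stride, low + x, low_stride,
                      high + x, high_stride, height);
}
```

Read this way, both kernels apply the same 1-D filter and differ only in the step between consecutive samples: one element for rows and `in_stride` for columns. In the vertical kernel, `punpckhwd m1, m6` interleaves the two rows in reverse order for the upper four columns. This appears to be why the upper halves use the swapped tables (`pw_n11_p5`, `pw_n1_p1`, `pw_n5_p11`).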