From patchwork Sun Jan 8 15:25:36 2017
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 2102
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 8 Jan 2017 12:25:36 -0300
Message-Id: <20170108152537.5468-5-jamrial@gmail.com>
In-Reply-To: <20170108152537.5468-1-jamrial@gmail.com>
References: <20170108152537.5468-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 4/5] huffyuvdsp: move functions only used by huffyuv from lossless_videodsp

Signed-off-by: James Almer
---
PPC changes untested.
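Note on the C fallback being moved: add_int16_c adds several 16-bit samples per machine word and keeps carries from crossing lanes by handling the low bits and the top mask bit of each lane separately. The following is a minimal standalone sketch of that idea, not the code in this patch. It assumes a fixed 64-bit word (uint64_t with memcpy instead of casts through long pointers) and a mask of the form 2^k - 1 with 1 <= k <= 16; the names swar_add_int16 and scalar_add_int16 are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reference version: one sample at a time. */
static void scalar_add_int16(uint16_t *dst, const uint16_t *src,
                             unsigned mask, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = (dst[i] + src[i]) & mask;
}

/* Sketch: four 16-bit lanes per 64-bit word.
 * Assumption: mask == 2^k - 1 with 1 <= k <= 16. */
static void swar_add_int16(uint16_t *dst, const uint16_t *src,
                           unsigned mask, int w)
{
    const uint64_t lanes  = 0x0001000100010001ULL;
    const uint64_t pw_lsb = (uint64_t)(mask >> 1) * lanes; /* bits below the top mask bit */
    const uint64_t pw_msb = pw_lsb + lanes;                /* the top mask bit of each lane */
    int i = 0;

    for (; i + 4 <= w; i += 4) {
        uint64_t a, b, r;
        memcpy(&a, src + i, sizeof(a));
        memcpy(&b, dst + i, sizeof(b));
        /* The low-bit sum can carry into the top mask bit but never past it,
         * so lanes stay independent; XOR then adds the two top bits. */
        r = ((a & pw_lsb) + (b & pw_lsb)) ^ ((a ^ b) & pw_msb);
        memcpy(dst + i, &r, sizeof(r));
    }
    /* Leftover samples that do not fill a whole word. */
    for (; i < w; i++)
        dst[i] = (dst[i] + src[i]) & mask;
}
```

With mask = 0x3FF (10-bit samples) both functions should produce identical output for any input, including samples whose sum wraps past the mask.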
 libavcodec/huffyuvdec.c                    |   8 +-
 libavcodec/huffyuvdsp.c                    |  36 +++++++-
 libavcodec/huffyuvdsp.h                    |  13 ++-
 libavcodec/lagarith.c                      |   2 +-
 libavcodec/lossless_videodsp.c             |  36 +-------
 libavcodec/lossless_videodsp.h             |   9 +-
 libavcodec/magicyuv.c                      |   2 +-
 libavcodec/ppc/lossless_videodsp_altivec.c |   2 +-
 libavcodec/utvideodec.c                    |   2 +-
 libavcodec/vble.c                          |   2 +-
 libavcodec/x86/huffyuvdsp.asm              | 137 +++++++++++++++++++++++++++++
 libavcodec/x86/huffyuvdsp_init.c           |  13 ++-
 libavcodec/x86/lossless_videodsp.asm       | 136 ----------------------------
 libavcodec/x86/lossless_videodsp_init.c    |  14 +--
 14 files changed, 208 insertions(+), 204 deletions(-)

diff --git a/libavcodec/huffyuvdec.c b/libavcodec/huffyuvdec.c
index d068204..c79dda4 100644
--- a/libavcodec/huffyuvdec.c
+++ b/libavcodec/huffyuvdec.c
@@ -297,8 +297,8 @@ static av_cold int decode_init(AVCodecContext *avctx)
     if (ret < 0)
         return ret;
 
-    ff_huffyuvdsp_init(&s->hdsp);
-    ff_llviddsp_init(&s->llviddsp, avctx);
+    ff_huffyuvdsp_init(&s->hdsp, avctx);
+    ff_llviddsp_init(&s->llviddsp);
     memset(s->vlc, 0, 4 * sizeof(VLC));
 
     s->interlaced = avctx->height > 288;
@@ -891,7 +891,7 @@ static void add_bytes(HYuvContext *s, uint8_t *dst, uint8_t *src, int w)
     if (s->bps <= 8) {
         s->llviddsp.add_bytes(dst, src, w);
     } else {
-        s->llviddsp.add_int16((uint16_t*)dst, (const uint16_t*)src, s->n - 1, w);
+        s->hdsp.add_int16((uint16_t*)dst, (const uint16_t*)src, s->n - 1, w);
     }
 }
 
@@ -900,7 +900,7 @@ static void add_median_prediction(HYuvContext *s, uint8_t *dst, const uint8_t *s
     if (s->bps <= 8) {
         s->llviddsp.add_median_pred(dst, src, diff, w, left, left_top);
     } else {
-        s->llviddsp.add_hfyu_median_pred_int16((uint16_t *)dst, (const uint16_t *)src, (const uint16_t *)diff, s->n-1, w, left, left_top);
+        s->hdsp.add_hfyu_median_pred_int16((uint16_t *)dst, (const uint16_t *)src, (const uint16_t *)diff, s->n-1, w, left, left_top);
     }
 }
 static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
diff --git a/libavcodec/huffyuvdsp.c b/libavcodec/huffyuvdsp.c
index 2adfc19..759ffda 100644
--- a/libavcodec/huffyuvdsp.c
+++ b/libavcodec/huffyuvdsp.c
@@ -23,6 +23,36 @@
 #include "mathops.h"
 #include "huffyuvdsp.h"
 
+static void add_int16_c(uint16_t *dst, const uint16_t *src, unsigned mask, int w){
+    long i;
+    unsigned long pw_lsb = (mask >> 1) * 0x0001000100010001ULL;
+    unsigned long pw_msb = pw_lsb + 0x0001000100010001ULL;
+    for (i = 0; i <= w - (int)sizeof(long)/2; i += sizeof(long)/2) {
+        long a = *(long*)(src+i);
+        long b = *(long*)(dst+i);
+        *(long*)(dst+i) = ((a&pw_lsb) + (b&pw_lsb)) ^ ((a^b)&pw_msb);
+    }
+    for(; iadd_int16 = add_int16_c;
+    c->add_hfyu_median_pred_int16 = add_hfyu_median_pred_int16_c;
     c->add_hfyu_left_pred_bgr32 = add_hfyu_left_pred_bgr32_c;
 
     if (ARCH_X86)
-        ff_huffyuvdsp_init_x86(c);
+        ff_huffyuvdsp_init_x86(c, avctx);
 }
diff --git a/libavcodec/huffyuvdsp.h b/libavcodec/huffyuvdsp.h
index eaad1af..7680f2e 100644
--- a/libavcodec/huffyuvdsp.h
+++ b/libavcodec/huffyuvdsp.h
@@ -21,6 +21,7 @@
 
 #include <stdint.h>
 #include "config.h"
+#include "avcodec.h"
 
 #if HAVE_BIGENDIAN
 #define B 3
@@ -35,12 +36,18 @@
 #endif
 
 typedef struct HuffYUVDSPContext {
+    void (*add_int16)(uint16_t *dst/*align 16*/, const uint16_t *src/*align 16*/,
+                      unsigned mask, int w);
+
+    void (*add_hfyu_median_pred_int16)(uint16_t *dst, const uint16_t *top,
+                                       const uint16_t *diff, unsigned mask,
+                                       int w, int *left, int *left_top);
     void (*add_hfyu_left_pred_bgr32)(uint8_t *dst, const uint8_t *src,
                                      intptr_t w, uint8_t *left);
 } HuffYUVDSPContext;
 
-void ff_huffyuvdsp_init(HuffYUVDSPContext *c);
-void ff_huffyuvdsp_init_ppc(HuffYUVDSPContext *c);
-void ff_huffyuvdsp_init_x86(HuffYUVDSPContext *c);
+void ff_huffyuvdsp_init(HuffYUVDSPContext *c, AVCodecContext *avctx);
+void ff_huffyuvdsp_init_ppc(HuffYUVDSPContext *c, AVCodecContext *avctx);
+void ff_huffyuvdsp_init_x86(HuffYUVDSPContext *c, AVCodecContext *avctx);
 
 #endif /* AVCODEC_HUFFYUVDSP_H */
diff --git a/libavcodec/lagarith.c b/libavcodec/lagarith.c
index 96a4b5c..f03305f 100644
--- a/libavcodec/lagarith.c
+++ b/libavcodec/lagarith.c
@@ -725,7 +725,7 @@ static av_cold int lag_decode_init(AVCodecContext *avctx)
     LagarithContext *l = avctx->priv_data;
     l->avctx = avctx;
 
-    ff_llviddsp_init(&l->llviddsp, avctx);
+    ff_llviddsp_init(&l->llviddsp);
 
     return 0;
 }
diff --git a/libavcodec/lossless_videodsp.c b/libavcodec/lossless_videodsp.c
index b93d4e7..54ce677 100644
--- a/libavcodec/lossless_videodsp.c
+++ b/libavcodec/lossless_videodsp.c
@@ -79,36 +79,6 @@ static int add_left_pred_c(uint8_t *dst, const uint8_t *src, intptr_t w,
     return acc;
 }
 
-static void add_int16_c(uint16_t *dst, const uint16_t *src, unsigned mask, int w){
-    long i;
-    unsigned long pw_lsb = (mask >> 1) * 0x0001000100010001ULL;
-    unsigned long pw_msb = pw_lsb + 0x0001000100010001ULL;
-    for (i = 0; i <= w - (int)sizeof(long)/2; i += sizeof(long)/2) {
-        long a = *(long*)(src+i);
-        long b = *(long*)(dst+i);
-        *(long*)(dst+i) = ((a&pw_lsb) + (b&pw_lsb)) ^ ((a^b)&pw_msb);
-    }
-    for(; iadd_bytes = add_bytes_c;
     c->add_median_pred = add_median_pred_c;
     c->add_left_pred = add_left_pred_c;
-    c->add_int16 = add_int16_c;
     c->add_hfyu_left_pred_int16 = add_hfyu_left_pred_int16_c;
-    c->add_hfyu_median_pred_int16 = add_hfyu_median_pred_int16_c;
 
     if (ARCH_X86)
-        ff_llviddsp_init_x86(c, avctx);
+        ff_llviddsp_init_x86(c);
 }
diff --git a/libavcodec/lossless_videodsp.h b/libavcodec/lossless_videodsp.h
index 7f31683..b8105d3 100644
--- a/libavcodec/lossless_videodsp.h
+++ b/libavcodec/lossless_videodsp.h
@@ -34,14 +34,11 @@ typedef struct LLVidDSPContext {
     int (*add_left_pred)(uint8_t *dst, const uint8_t *src,
                          intptr_t w, int left);
 
-    void (*add_int16)(uint16_t *dst/*align 16*/, const uint16_t *src/*align 16*/, unsigned mask, int w);
-
-    void (*add_hfyu_median_pred_int16)(uint16_t *dst, const uint16_t *top, const uint16_t *diff, unsigned mask, int w, int *left, int *left_top);
     int (*add_hfyu_left_pred_int16)(uint16_t *dst, const uint16_t *src, unsigned mask, int w, unsigned left);
 } LLVidDSPContext;
 
-void ff_llviddsp_init(LLVidDSPContext *llviddsp, AVCodecContext *avctx);
-void ff_llviddsp_init_x86(LLVidDSPContext *llviddsp, AVCodecContext *avctx);
-void ff_llviddsp_init_ppc(LLVidDSPContext *llviddsp, AVCodecContext *avctx);
+void ff_llviddsp_init(LLVidDSPContext *llviddsp);
+void ff_llviddsp_init_x86(LLVidDSPContext *llviddsp);
+void ff_llviddsp_init_ppc(LLVidDSPContext *llviddsp);
 
 #endif //AVCODEC_LOSSLESS_VIDEODSP_H
diff --git a/libavcodec/magicyuv.c b/libavcodec/magicyuv.c
index 4e78ff1..ac0737c 100644
--- a/libavcodec/magicyuv.c
+++ b/libavcodec/magicyuv.c
@@ -697,7 +697,7 @@ static int magy_init_thread_copy(AVCodecContext *avctx)
 static av_cold int magy_decode_init(AVCodecContext *avctx)
 {
     MagicYUVContext *s = avctx->priv_data;
-    ff_llviddsp_init(&s->llviddsp, avctx);
+    ff_llviddsp_init(&s->llviddsp);
     return 0;
 }
 
diff --git a/libavcodec/ppc/lossless_videodsp_altivec.c b/libavcodec/ppc/lossless_videodsp_altivec.c
index e17abaa..c388dc3 100644
--- a/libavcodec/ppc/lossless_videodsp_altivec.c
+++ b/libavcodec/ppc/lossless_videodsp_altivec.c
@@ -51,7 +51,7 @@ static void add_bytes_altivec(uint8_t *dst, uint8_t *src, intptr_t w)
 }
 #endif /* HAVE_ALTIVEC */
 
-av_cold void ff_llviddsp_init_ppc(LLVidDSPContext *c, AVCodecContext *avctx)
+av_cold void ff_llviddsp_init_ppc(LLVidDSPContext *c)
 {
 #if HAVE_ALTIVEC
     if (!PPC_ALTIVEC(av_get_cpu_flags()))
diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c
index 7d1d35b..38de2c8 100644
--- a/libavcodec/utvideodec.c
+++ b/libavcodec/utvideodec.c
@@ -827,7 +827,7 @@ static av_cold int decode_init(AVCodecContext *avctx)
     c->avctx = avctx;
 
     ff_bswapdsp_init(&c->bdsp);
-    ff_llviddsp_init(&c->llviddsp, avctx);
+    ff_llviddsp_init(&c->llviddsp);
 
     if (avctx->extradata_size >= 16) {
         av_log(avctx, AV_LOG_DEBUG, "Encoder version %d.%d.%d.%d\n",
diff --git a/libavcodec/vble.c b/libavcodec/vble.c
index 7598d30..4a07ab3 100644
--- a/libavcodec/vble.c
+++ b/libavcodec/vble.c
@@ -185,7 +185,7 @@ static av_cold int vble_decode_init(AVCodecContext *avctx)
 
     /* Stash for later use */
     ctx->avctx = avctx;
-    ff_llviddsp_init(&ctx->llviddsp, avctx);
+    ff_llviddsp_init(&ctx->llviddsp);
 
     avctx->pix_fmt = AV_PIX_FMT_YUV420P;
     avctx->bits_per_raw_sample = 8;
diff --git a/libavcodec/x86/huffyuvdsp.asm b/libavcodec/x86/huffyuvdsp.asm
index 0befd3b..0d8cae3 100644
--- a/libavcodec/x86/huffyuvdsp.asm
+++ b/libavcodec/x86/huffyuvdsp.asm
@@ -24,6 +24,78 @@
 
 SECTION .text
 
+
+%macro INT16_LOOP 2 ; %1 = a/u (aligned/unaligned), %2 = add/sub
+    movd m4, maskd
+    SPLATW m4, m4
+    add wd, wd
+    test wq, 2*mmsize - 1
+    jz %%.tomainloop
+    push tmpq
+%%.wordloop:
+    sub wq, 2
+%ifidn %2, add
+    mov tmpw, [srcq+wq]
+    add tmpw, [dstq+wq]
+%else
+    mov tmpw, [src1q+wq]
+    sub tmpw, [src2q+wq]
+%endif
+    and tmpw, maskw
+    mov [dstq+wq], tmpw
+    test wq, 2*mmsize - 1
+    jnz %%.wordloop
+    pop tmpq
+%%.tomainloop:
+%ifidn %2, add
+    add srcq, wq
+%else
+    add src1q, wq
+    add src2q, wq
+%endif
+    add dstq, wq
+    neg wq
+    jz %%.end
+%%.loop:
+%ifidn %2, add
+    mov%1 m0, [srcq+wq]
+    mov%1 m1, [dstq+wq]
+    mov%1 m2, [srcq+wq+mmsize]
+    mov%1 m3, [dstq+wq+mmsize]
+%else
+    mov%1 m0, [src1q+wq]
+    mov%1 m1, [src2q+wq]
+    mov%1 m2, [src1q+wq+mmsize]
+    mov%1 m3, [src2q+wq+mmsize]
+%endif
+    p%2w m0, m1
+    p%2w m2, m3
+    pand m0, m4
+    pand m2, m4
+    mov%1 [dstq+wq], m0
+    mov%1 [dstq+wq+mmsize], m2
+    add wq, 2*mmsize
+    jl %%.loop
+%%.end:
+    RET
+%endmacro
+
+%if ARCH_X86_32
+INIT_MMX mmx
+cglobal add_int16, 4,4,5, dst, src, mask, w, tmp
+    INT16_LOOP a, add
+%endif
+
+INIT_XMM sse2
+cglobal add_int16, 4,4,5, dst, src, mask, w, tmp
+    test srcq, mmsize-1
+    jnz .unaligned
+    test dstq, mmsize-1
+    jnz .unaligned
+    INT16_LOOP a, add
+.unaligned:
+    INT16_LOOP u, add
+
 ; void add_hfyu_left_pred_bgr32(uint8_t *dst, const uint8_t *src,
 ;                               intptr_t w, uint8_t *left)
 %macro LEFT_BGR32 0
@@ -63,3 +135,68 @@ LEFT_BGR32
 %endif
 INIT_XMM sse2
 LEFT_BGR32
+
+; void add_hfyu_median_prediction_mmxext(uint8_t *dst, const uint8_t *top, const uint8_t *diff, int mask, int w, int *left, int *left_top)
+INIT_MMX mmxext
+cglobal add_hfyu_median_pred_int16, 7,7,0, dst, top, diff, mask, w, left, left_top
+    add wd, wd
+    movd mm6, maskd
+    SPLATW mm6, mm6
+    movq mm0, [topq]
+    movq mm2, mm0
+    movd mm4, [left_topq]
+    psllq mm2, 16
+    movq mm1, mm0
+    por mm4, mm2
+    movd mm3, [leftq]
+    psubw mm0, mm4 ; t-tl
+    add dstq, wq
+    add topq, wq
+    add diffq, wq
+    neg wq
+    jmp .skip
+.loop:
+    movq mm4, [topq+wq]
+    movq mm0, mm4
+    psllq mm4, 16
+    por mm4, mm1
+    movq mm1, mm0 ; t
+    psubw mm0, mm4 ; t-tl
+.skip:
+    movq mm2, [diffq+wq]
+%assign i 0
+%rep 4
+    movq mm4, mm0
+    paddw mm4, mm3 ; t-tl+l
+    pand mm4, mm6
+    movq mm5, mm3
+    pmaxsw mm3, mm1
+    pminsw mm5, mm1
+    pminsw mm3, mm4
+    pmaxsw mm3, mm5 ; median
+    paddw mm3, mm2 ; +residual
+    pand mm3, mm6
+%if i==0
+    movq mm7, mm3
+    psllq mm7, 48
+%else
+    movq mm4, mm3
+    psrlq mm7, 16
+    psllq mm4, 48
+    por mm7, mm4
+%endif
+%if i<3
+    psrlq mm0, 16
+    psrlq mm1, 16
+    psrlq mm2, 16
+%endif
+%assign i i+1
+%endrep
+    movq [dstq+wq], mm7
+    add wq, 8
+    jl .loop
+    movzx r2d, word [dstq-2]
+    mov [leftq], r2d
+    movzx r2d, word [topq-2]
+    mov [left_topq], r2d
+    RET
diff --git a/libavcodec/x86/huffyuvdsp_init.c b/libavcodec/x86/huffyuvdsp_init.c
index fc87c38..f72d759 100644
--- a/libavcodec/x86/huffyuvdsp_init.c
+++ b/libavcodec/x86/huffyuvdsp_init.c
@@ -21,24 +21,35 @@
 #include "config.h"
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
+#include "libavutil/pixdesc.h"
 #include "libavutil/x86/asm.h"
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/huffyuvdsp.h"
 
+void ff_add_int16_mmx(uint16_t *dst, const uint16_t *src, unsigned mask, int w);
+void ff_add_int16_sse2(uint16_t *dst, const uint16_t *src, unsigned mask, int w);
 void ff_add_hfyu_left_pred_bgr32_mmx(uint8_t *dst, const uint8_t *src,
                                      intptr_t w, uint8_t *left);
 void ff_add_hfyu_left_pred_bgr32_sse2(uint8_t *dst, const uint8_t *src,
                                       intptr_t w, uint8_t *left);
+void ff_add_hfyu_median_pred_int16_mmxext(uint16_t *dst, const uint16_t *top, const uint16_t *diff, unsigned mask, int w, int *left, int *left_top);
 
-av_cold void ff_huffyuvdsp_init_x86(HuffYUVDSPContext *c)
+av_cold void ff_huffyuvdsp_init_x86(HuffYUVDSPContext *c, AVCodecContext *avctx)
 {
     int cpu_flags = av_get_cpu_flags();
+    const AVPixFmtDescriptor *pix_desc = av_pix_fmt_desc_get(avctx->pix_fmt);
 
     if (ARCH_X86_32 && EXTERNAL_MMX(cpu_flags)) {
         c->add_hfyu_left_pred_bgr32 = ff_add_hfyu_left_pred_bgr32_mmx;
+        c->add_int16 = ff_add_int16_mmx;
+    }
+
+    if (EXTERNAL_MMXEXT(cpu_flags) && pix_desc && pix_desc->comp[0].depth<16) {
+        c->add_hfyu_median_pred_int16 = ff_add_hfyu_median_pred_int16_mmxext;
     }
 
     if (EXTERNAL_SSE2(cpu_flags)) {
+        c->add_int16 = ff_add_int16_sse2;
         c->add_hfyu_left_pred_bgr32 = ff_add_hfyu_left_pred_bgr32_sse2;
     }
 }
diff --git a/libavcodec/x86/lossless_videodsp.asm b/libavcodec/x86/lossless_videodsp.asm
index e12e93a..1f484cb 100644
--- a/libavcodec/x86/lossless_videodsp.asm
+++ b/libavcodec/x86/lossless_videodsp.asm
@@ -217,77 +217,6 @@ ADD_BYTES
 INIT_XMM sse2
 ADD_BYTES
 
-%macro INT16_LOOP 2 ; %1 = a/u (aligned/unaligned), %2 = add/sub
-    movd m4, maskd
-    SPLATW m4, m4
-    add wd, wd
-    test wq, 2*mmsize - 1
-    jz %%.tomainloop
-    push tmpq
-%%.wordloop:
-    sub wq, 2
-%ifidn %2, add
-    mov tmpw, [srcq+wq]
-    add tmpw, [dstq+wq]
-%else
-    mov tmpw, [src1q+wq]
-    sub tmpw, [src2q+wq]
-%endif
-    and tmpw, maskw
-    mov [dstq+wq], tmpw
-    test wq, 2*mmsize - 1
-    jnz %%.wordloop
-    pop tmpq
-%%.tomainloop:
-%ifidn %2, add
-    add srcq, wq
-%else
-    add src1q, wq
-    add src2q, wq
-%endif
-    add dstq, wq
-    neg wq
-    jz %%.end
-%%.loop:
-%ifidn %2, add
-    mov%1 m0, [srcq+wq]
-    mov%1 m1, [dstq+wq]
-    mov%1 m2, [srcq+wq+mmsize]
-    mov%1 m3, [dstq+wq+mmsize]
-%else
-    mov%1 m0, [src1q+wq]
-    mov%1 m1, [src2q+wq]
-    mov%1 m2, [src1q+wq+mmsize]
-    mov%1 m3, [src2q+wq+mmsize]
-%endif
-    p%2w m0, m1
-    p%2w m2, m3
-    pand m0, m4
-    pand m2, m4
-    mov%1 [dstq+wq], m0
-    mov%1 [dstq+wq+mmsize], m2
-    add wq, 2*mmsize
-    jl %%.loop
-%%.end:
-    RET
-%endmacro
-
-%if ARCH_X86_32
-INIT_MMX mmx
-cglobal add_int16, 4,4,5, dst, src, mask, w, tmp
-    INT16_LOOP a, add
-%endif
-
-INIT_XMM sse2
-cglobal add_int16, 4,4,5, dst, src, mask, w, tmp
-    test srcq, mmsize-1
-    jnz .unaligned
-    test dstq, mmsize-1
-    jnz .unaligned
-    INT16_LOOP a, add
-.unaligned:
-    INT16_LOOP u, add
-
 %macro ADD_HFYU_LEFT_LOOP_INT16 2 ; %1 = dst alignment (a/u), %2 = src alignment (a/u)
     add wd, wd
     add srcq, wq
@@ -359,68 +288,3 @@ cglobal add_hfyu_left_pred_int16, 4,4,8, dst, src, mask, w, left
     ADD_HFYU_LEFT_LOOP_INT16 u, a
 .src_unaligned:
     ADD_HFYU_LEFT_LOOP_INT16 u, u
-
-; void add_hfyu_median_prediction_mmxext(uint8_t *dst, const uint8_t *top, const uint8_t *diff, int mask, int w, int *left, int *left_top)
-INIT_MMX mmxext
-cglobal add_hfyu_median_pred_int16, 7,7,0, dst, top, diff, mask, w, left, left_top
-    add wd, wd
-    movd mm6, maskd
-    SPLATW mm6, mm6
-    movq mm0, [topq]
-    movq mm2, mm0
-    movd mm4, [left_topq]
-    psllq mm2, 16
-    movq mm1, mm0
-    por mm4, mm2
-    movd mm3, [leftq]
-    psubw mm0, mm4 ; t-tl
-    add dstq, wq
-    add topq, wq
-    add diffq, wq
-    neg wq
-    jmp .skip
-.loop:
-    movq mm4, [topq+wq]
-    movq mm0, mm4
-    psllq mm4, 16
-    por mm4, mm1
-    movq mm1, mm0 ; t
-    psubw mm0, mm4 ; t-tl
-.skip:
-    movq mm2, [diffq+wq]
-%assign i 0
-%rep 4
-    movq mm4, mm0
-    paddw mm4, mm3 ; t-tl+l
-    pand mm4, mm6
-    movq mm5, mm3
-    pmaxsw mm3, mm1
-    pminsw mm5, mm1
-    pminsw mm3, mm4
-    pmaxsw mm3, mm5 ; median
-    paddw mm3, mm2 ; +residual
-    pand mm3, mm6
-%if i==0
-    movq mm7, mm3
-    psllq mm7, 48
-%else
-    movq mm4, mm3
-    psrlq mm7, 16
-    psllq mm4, 48
-    por mm7, mm4
-%endif
-%if i<3
-    psrlq mm0, 16
-    psrlq mm1, 16
-    psrlq mm2, 16
-%endif
-%assign i i+1
-%endrep
-    movq [dstq+wq], mm7
-    add wq, 8
-    jl .loop
-    movzx r2d, word [dstq-2]
-    mov [leftq], r2d
-    movzx r2d, word [topq-2]
-    mov [left_topq], r2d
-    RET
diff --git a/libavcodec/x86/lossless_videodsp_init.c b/libavcodec/x86/lossless_videodsp_init.c
index dabd20c..5b139a7 100644
--- a/libavcodec/x86/lossless_videodsp_init.c
+++ b/libavcodec/x86/lossless_videodsp_init.c
@@ -21,7 +21,6 @@
 #include "config.h"
 #include "libavutil/x86/asm.h"
 #include "../lossless_videodsp.h"
-#include "libavutil/pixdesc.h"
 #include "libavutil/x86/cpu.h"
 
 void ff_add_bytes_mmx(uint8_t *dst, uint8_t *src, intptr_t w);
@@ -39,11 +38,8 @@ int ff_add_left_pred_ssse3(uint8_t *dst, const uint8_t *src,
 int ff_add_left_pred_sse4(uint8_t *dst, const uint8_t *src,
                             intptr_t w, int left);
 
-void ff_add_int16_mmx(uint16_t *dst, const uint16_t *src, unsigned mask, int w);
-void ff_add_int16_sse2(uint16_t *dst, const uint16_t *src, unsigned mask, int w);
 int ff_add_hfyu_left_pred_int16_ssse3(uint16_t *dst, const uint16_t *src, unsigned mask, int w, unsigned acc);
 int ff_add_hfyu_left_pred_int16_sse4(uint16_t *dst, const uint16_t *src, unsigned mask, int w, unsigned acc);
-void ff_add_hfyu_median_pred_int16_mmxext(uint16_t *dst, const uint16_t *top, const uint16_t *diff, unsigned mask, int w, int *left, int *left_top);
 
 #if HAVE_INLINE_ASM && HAVE_7REGS && ARCH_X86_32
 static void add_median_pred_cmov(uint8_t *dst, const uint8_t *top,
@@ -83,10 +79,9 @@ static void add_median_pred_cmov(uint8_t *dst, const uint8_t *top,
 }
 #endif
 
-void ff_llviddsp_init_x86(LLVidDSPContext *c, AVCodecContext *avctx)
+void ff_llviddsp_init_x86(LLVidDSPContext *c)
 {
     int cpu_flags = av_get_cpu_flags();
-    const AVPixFmtDescriptor *pix_desc = av_pix_fmt_desc_get(avctx->pix_fmt);
 
 #if HAVE_INLINE_ASM && HAVE_7REGS && ARCH_X86_32
     if (cpu_flags & AV_CPU_FLAG_CMOV)
@@ -95,7 +90,6 @@ void ff_llviddsp_init_x86(LLVidDSPContext *c, AVCodecContext *avctx)
 
     if (ARCH_X86_32 && EXTERNAL_MMX(cpu_flags)) {
         c->add_bytes = ff_add_bytes_mmx;
-        c->add_int16 = ff_add_int16_mmx;
     }
 
     if (ARCH_X86_32 && EXTERNAL_MMXEXT(cpu_flags)) {
@@ -104,15 +98,9 @@ void ff_llviddsp_init_x86(LLVidDSPContext *c, AVCodecContext *avctx)
         c->add_median_pred = ff_add_median_pred_mmxext;
     }
 
-    if (EXTERNAL_MMXEXT(cpu_flags) && pix_desc && pix_desc->comp[0].depth<16) {
-        c->add_hfyu_median_pred_int16 = ff_add_hfyu_median_pred_int16_mmxext;
-    }
-
     if (EXTERNAL_SSE2(cpu_flags)) {
         c->add_bytes = ff_add_bytes_sse2;
         c->add_median_pred = ff_add_median_pred_sse2;
-
-        c->add_int16 = ff_add_int16_sse2;
     }
 
     if (EXTERNAL_SSSE3(cpu_flags)) {