From patchwork Fri Dec 23 17:32:53 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 1900
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 23 Dec 2016 18:32:53 +0100
Message-Id: <1482514373-28939-1-git-send-email-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH] avcodec/magicyuv: add SIMD for median of 10bits

Signed-off-by: Paul B Mahol
---
 libavcodec/lossless_videodsp.c          | 18 ++++++++++
 libavcodec/lossless_videodsp.h          |  1 +
 libavcodec/magicyuv.c                   | 23 +-----------
 libavcodec/x86/lossless_videodsp.asm    | 62 +++++++++++++++++++++++++++++++++
 libavcodec/x86/lossless_videodsp_init.c |  2 ++
 5 files changed, 84 insertions(+), 22 deletions(-)

diff --git a/libavcodec/lossless_videodsp.c b/libavcodec/lossless_videodsp.c
index 3491621..15189f1 100644
--- a/libavcodec/lossless_videodsp.c
+++ b/libavcodec/lossless_videodsp.c
@@ -77,6 +77,23 @@ static void add_hfyu_median_pred_int16_c(uint16_t *dst, const uint16_t *src, con
     *left_top = lt;
 }
 
+static void add_magy_median_pred_int16_c(uint16_t *dst, const uint16_t *src, const uint16_t *diff, unsigned mask, int w, int *left, int *left_top){
+    int i;
+    uint16_t l, lt;
+
+    l = *left;
+    lt = *left_top;
+
+    for(i=0; iadd_hfyu_left_pred_int16 = add_hfyu_left_pred_int16_c;
     c->add_hfyu_median_pred_int16 = add_hfyu_median_pred_int16_c;
     c->sub_hfyu_median_pred_int16 = sub_hfyu_median_pred_int16_c;
+    c->add_magy_median_pred_int16 = add_magy_median_pred_int16_c;
 
     if (ARCH_X86)
         ff_llviddsp_init_x86(c, avctx);
diff --git a/libavcodec/lossless_videodsp.h b/libavcodec/lossless_videodsp.h
index 040902e..c7a6881 100644
--- a/libavcodec/lossless_videodsp.h
+++ b/libavcodec/lossless_videodsp.h
@@ -32,6 +32,7 @@ typedef struct LLVidDSPContext {
     void (*sub_hfyu_median_pred_int16)(uint16_t *dst, const uint16_t *src1, const uint16_t *src2, unsigned mask, int w, int *left, int *left_top);
     void (*add_hfyu_median_pred_int16)(uint16_t *dst, const uint16_t *top, const uint16_t *diff, unsigned mask, int w, int *left, int *left_top);
     int (*add_hfyu_left_pred_int16)(uint16_t *dst, const uint16_t *src, unsigned mask, int w, unsigned left);
+    void (*add_magy_median_pred_int16)(uint16_t *dst, const uint16_t *top, const uint16_t *diff, unsigned mask, int w, int *left, int *left_top);
 } LLVidDSPContext;
 
 void ff_llviddsp_init(LLVidDSPContext *llviddsp, AVCodecContext *avctx);
diff --git a/libavcodec/magicyuv.c b/libavcodec/magicyuv.c
index 16d7027..f65c434 100644
--- a/libavcodec/magicyuv.c
+++ b/libavcodec/magicyuv.c
@@ -144,27 +144,6 @@ static int huff_build(VLC *vlc, uint8_t *len)
                               syms, sizeof(*syms), sizeof(*syms), 0);
 }
 
-static void magicyuv_median_pred10(uint16_t *dst, const uint16_t *src1,
-                                   const uint16_t *diff, intptr_t w,
-                                   int *left, int *left_top)
-{
-    int i;
-    uint16_t l, lt;
-
-    l = *left;
-    lt = *left_top;
-
-    for (i = 0; i < w; i++) {
-        l = mid_pred(l, src1[i], (l + src1[i] - lt)) + diff[i];
-        l &= 0x3FF;
-        lt = src1[i];
-        dst[i] = l;
-    }
-
-    *left = l;
-    *left_top = lt;
-}
-
 static int magy_decode_slice10(AVCodecContext *avctx, void *tdata,
                                int j, int threadnr)
 {
@@ -265,7 +244,7 @@ static int magy_decode_slice10(AVCodecContext *avctx, void *tdata,
             dst += stride;
         }
         for (k = 1 + interlaced; k < height; k++) {
-            magicyuv_median_pred10(dst, dst - fake_stride, dst, width, &left, &lefttop);
+            s->llviddsp.add_magy_median_pred_int16(dst, dst - fake_stride, dst, 1023, width, &left, &lefttop);
             lefttop = left = dst[0];
             dst += stride;
         }
diff --git a/libavcodec/x86/lossless_videodsp.asm b/libavcodec/x86/lossless_videodsp.asm
index f06fcdf..8a2eb26 100644
--- a/libavcodec/x86/lossless_videodsp.asm
+++ b/libavcodec/x86/lossless_videodsp.asm
@@ -292,3 +292,65 @@ cglobal sub_hfyu_median_pred_int16, 7,7,0, dst, src1, src2, mask, w, left, left_
     movzx   maskd, word [src2q + wq - 2]
     mov     [leftq], maskd
     RET
+
+cglobal add_magy_median_pred_int16, 7,7,0, dst, top, diff, mask, w, left, left_top
+    add     wd, wd
+    movd    mm6, maskd
+    SPLATW  mm6, mm6
+    movq    mm0, [topq]
+    movq    mm2, mm0
+    movd    mm4, [left_topq]
+    psllq   mm2, 16
+    movq    mm1, mm0
+    por     mm4, mm2
+    movd    mm3, [leftq]
+    psubw   mm0, mm4 ; t-tl
+    add     dstq, wq
+    add     topq, wq
+    add     diffq, wq
+    neg     wq
+    jmp     .skip
+.loop:
+    movq    mm4, [topq+wq]
+    movq    mm0, mm4
+    psllq   mm4, 16
+    por     mm4, mm1
+    movq    mm1, mm0 ; t
+    psubw   mm0, mm4 ; t-tl
+.skip:
+    movq    mm2, [diffq+wq]
+%assign i 0
+%rep 4
+    movq    mm4, mm0
+    paddw   mm4, mm3 ; t-tl+l
+    movq    mm5, mm3
+    pmaxsw  mm3, mm1
+    pminsw  mm5, mm1
+    pminsw  mm3, mm4
+    pmaxsw  mm3, mm5 ; median
+    paddw   mm3, mm2 ; +residual
+    pand    mm3, mm6
+%if i==0
+    movq    mm7, mm3
+    psllq   mm7, 48
+%else
+    movq    mm4, mm3
+    psrlq   mm7, 16
+    psllq   mm4, 48
+    por     mm7, mm4
+%endif
+%if i<3
+    psrlq   mm0, 16
+    psrlq   mm1, 16
+    psrlq   mm2, 16
+%endif
+%assign i i+1
+%endrep
+    movq    [dstq+wq], mm7
+    add     wq, 8
+    jl      .loop
+    movzx   r2d, word [dstq-2]
+    mov     [leftq], r2d
+    movzx   r2d, word [topq-2]
+    mov     [left_topq], r2d
+    RET
diff --git a/libavcodec/x86/lossless_videodsp_init.c b/libavcodec/x86/lossless_videodsp_init.c
index 548d043..8112c70 100644
--- a/libavcodec/x86/lossless_videodsp_init.c
+++ b/libavcodec/x86/lossless_videodsp_init.c
@@ -30,6 +30,7 @@ int ff_add_hfyu_left_pred_int16_ssse3(uint16_t *dst, const uint16_t *src, unsign
 int ff_add_hfyu_left_pred_int16_sse4(uint16_t *dst, const uint16_t *src, unsigned mask, int w, unsigned acc);
 void ff_add_hfyu_median_pred_int16_mmxext(uint16_t *dst, const uint16_t *top, const uint16_t *diff, unsigned mask, int w, int *left, int *left_top);
 void ff_sub_hfyu_median_pred_int16_mmxext(uint16_t *dst, const uint16_t *src1, const uint16_t *src2, unsigned mask, int w, int *left, int *left_top);
+void ff_add_magy_median_pred_int16_mmxext(uint16_t *dst, const uint16_t *top, const uint16_t *diff, unsigned mask, int w, int *left, int *left_top);
 
 
 void ff_llviddsp_init_x86(LLVidDSPContext *c, AVCodecContext *avctx)
@@ -44,6 +45,7 @@ void ff_llviddsp_init_x86(LLVidDSPContext *c, AVCodecContext *avctx)
 
     if (EXTERNAL_MMXEXT(cpu_flags) && pix_desc && pix_desc->comp[0].depth<16) {
         c->add_hfyu_median_pred_int16 = ff_add_hfyu_median_pred_int16_mmxext;
+        c->add_magy_median_pred_int16 = ff_add_magy_median_pred_int16_mmxext;
         c->sub_hfyu_median_pred_int16 = ff_sub_hfyu_median_pred_int16_mmxext;
     }