From patchwork Mon Nov 14 10:32:19 2016
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 1419
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 14 Nov 2016 12:32:19 +0200
Message-Id: <1479119547-7392-1-git-send-email-martin@martin.st>
X-Mailer: git-send-email 2.7.4
Subject: [FFmpeg-devel] [PATCH 1/9] vp9dsp: Deduplicate the subpel filters

Make them aligned, to allow efficient access to them from simd.

This is an adapted cherry-pick from libav commit
a4cfcddcb0f76e837d5abc06840c2b26c0e8aefc.
---
 libavcodec/vp9dsp.c          | 56 +++++++++++++++++++++++++++++++++++++++
 libavcodec/vp9dsp.h          |  3 +++
 libavcodec/vp9dsp_template.c | 63 +++-----------------------------------------
 3 files changed, 63 insertions(+), 59 deletions(-)

diff --git a/libavcodec/vp9dsp.c b/libavcodec/vp9dsp.c
index 54e77e2..6dd49c8 100644
--- a/libavcodec/vp9dsp.c
+++ b/libavcodec/vp9dsp.c
@@ -25,6 +25,62 @@
 #include "libavutil/common.h"
 #include "vp9dsp.h"
 
+const DECLARE_ALIGNED(16, int16_t, ff_vp9_subpel_filters)[3][16][8] = {
+    [FILTER_8TAP_REGULAR] = {
+        { 0, 0, 0, 128, 0, 0, 0, 0 },
+        { 0, 1, -5, 126, 8, -3, 1, 0 },
+        { -1, 3, -10, 122, 18, -6, 2, 0 },
+        { -1, 4, -13, 118, 27, -9, 3, -1 },
+        { -1, 4, -16, 112, 37, -11, 4, -1 },
+        { -1, 5, -18, 105, 48, -14, 4, -1 },
+        { -1, 5, -19, 97, 58, -16, 5, -1 },
+        { -1, 6, -19, 88, 68, -18, 5, -1 },
+        { -1, 6, -19, 78, 78, -19, 6, -1 },
+        { -1, 5, -18, 68, 88, -19, 6, -1 },
+        { -1, 5, -16, 58, 97, -19, 5, -1 },
+        { -1, 4, -14, 48, 105, -18, 5, -1 },
+        { -1, 4, -11, 37, 112, -16, 4, -1 },
+        { -1, 3, -9, 27, 118, -13, 4, -1 },
+        { 0, 2, -6, 18, 122, -10, 3, -1 },
+        { 0, 1, -3, 8, 126, -5, 1, 0 },
+    }, [FILTER_8TAP_SHARP] = {
+        { 0, 0, 0, 128, 0, 0, 0, 0 },
+        { -1, 3, -7, 127, 8, -3, 1, 0 },
+        { -2, 5, -13, 125, 17, -6, 3, -1 },
+        { -3, 7, -17, 121, 27, -10, 5, -2 },
+        { -4, 9, -20, 115, 37, -13, 6, -2 },
+        { -4, 10, -23, 108, 48, -16, 8, -3 },
+        { -4, 10, -24, 100, 59, -19, 9, -3 },
+        { -4, 11, -24, 90, 70, -21, 10, -4 },
+        { -4, 11, -23, 80, 80, -23, 11, -4 },
+        { -4, 10, -21, 70, 90, -24, 11, -4 },
+        { -3, 9, -19, 59, 100, -24, 10, -4 },
+        { -3, 8, -16, 48, 108, -23, 10, -4 },
+        { -2, 6, -13, 37, 115, -20, 9, -4 },
+        { -2, 5, -10, 27, 121, -17, 7, -3 },
+        { -1, 3, -6, 17, 125, -13, 5, -2 },
+        { 0, 1, -3, 8, 127, -7, 3, -1 },
+    }, [FILTER_8TAP_SMOOTH] = {
+        { 0, 0, 0, 128, 0, 0, 0, 0 },
+        { -3, -1, 32, 64, 38, 1, -3, 0 },
+        { -2, -2, 29, 63, 41, 2, -3, 0 },
+        { -2, -2, 26, 63, 43, 4, -4, 0 },
+        { -2, -3, 24, 62, 46, 5, -4, 0 },
+        { -2, -3, 21, 60, 49, 7, -4, 0 },
+        { -1, -4, 18, 59, 51, 9, -4, 0 },
+        { -1, -4, 16, 57, 53, 12, -4, -1 },
+        { -1, -4, 14, 55, 55, 14, -4, -1 },
+        { -1, -4, 12, 53, 57, 16, -4, -1 },
+        { 0, -4, 9, 51, 59, 18, -4, -1 },
+        { 0, -4, 7, 49, 60, 21, -3, -2 },
+        { 0, -4, 5, 46, 62, 24, -3, -2 },
+        { 0, -4, 4, 43, 63, 26, -2, -2 },
+        { 0, -3, 2, 41, 63, 29, -2, -2 },
+        { 0, -3, 1, 38, 64, 32, -1, -3 },
+    }
+};
+
+
 av_cold void ff_vp9dsp_init(VP9DSPContext *dsp, int bpp, int bitexact)
 {
     if (bpp == 8) {
diff --git a/libavcodec/vp9dsp.h b/libavcodec/vp9dsp.h
index 733f5bf..cb43f5e 100644
--- a/libavcodec/vp9dsp.h
+++ b/libavcodec/vp9dsp.h
@@ -120,6 +120,9 @@ typedef struct VP9DSPContext {
     vp9_scaled_mc_func smc[5][4][2];
 } VP9DSPContext;
 
+
+extern const int16_t ff_vp9_subpel_filters[3][16][8];
+
 void ff_vp9dsp_init(VP9DSPContext *dsp, int bpp, int bitexact);
 
 void ff_vp9dsp_init_8(VP9DSPContext *dsp);
diff --git a/libavcodec/vp9dsp_template.c b/libavcodec/vp9dsp_template.c
index 4d810fe..bb54561 100644
--- a/libavcodec/vp9dsp_template.c
+++ b/libavcodec/vp9dsp_template.c
@@ -1991,61 +1991,6 @@ copy_avg_fn(4)
 
 #endif /* BIT_DEPTH != 12 */
 
-static const int16_t vp9_subpel_filters[3][16][8] = {
-    [FILTER_8TAP_REGULAR] = {
-        { 0, 0, 0, 128, 0, 0, 0, 0 },
-        { 0, 1, -5, 126, 8, -3, 1, 0 },
-        { -1, 3, -10, 122, 18, -6, 2, 0 },
-        { -1, 4, -13, 118, 27, -9, 3, -1 },
-        { -1, 4, -16, 112, 37, -11, 4, -1 },
-        { -1, 5, -18, 105, 48, -14, 4, -1 },
-        { -1, 5, -19, 97, 58, -16, 5, -1 },
-        { -1, 6, -19, 88, 68, -18, 5, -1 },
-        { -1, 6, -19, 78, 78, -19, 6, -1 },
-        { -1, 5, -18, 68, 88, -19, 6, -1 },
-        { -1, 5, -16, 58, 97, -19, 5, -1 },
-        { -1, 4, -14, 48, 105, -18, 5, -1 },
-        { -1, 4, -11, 37, 112, -16, 4, -1 },
-        { -1, 3, -9, 27, 118, -13, 4, -1 },
-        { 0, 2, -6, 18, 122, -10, 3, -1 },
-        { 0, 1, -3, 8, 126, -5, 1, 0 },
-    }, [FILTER_8TAP_SHARP] = {
-        { 0, 0, 0, 128, 0, 0, 0, 0 },
-        { -1, 3, -7, 127, 8, -3, 1, 0 },
-        { -2, 5, -13, 125, 17, -6, 3, -1 },
-        { -3, 7, -17, 121, 27, -10, 5, -2 },
-        { -4, 9, -20, 115, 37, -13, 6, -2 },
-        { -4, 10, -23, 108, 48, -16, 8, -3 },
-        { -4, 10, -24, 100, 59, -19, 9, -3 },
-        { -4, 11, -24, 90, 70, -21, 10, -4 },
-        { -4, 11, -23, 80, 80, -23, 11, -4 },
-        { -4, 10, -21, 70, 90, -24, 11, -4 },
-        { -3, 9, -19, 59, 100, -24, 10, -4 },
-        { -3, 8, -16, 48, 108, -23, 10, -4 },
-        { -2, 6, -13, 37, 115, -20, 9, -4 },
-        { -2, 5, -10, 27, 121, -17, 7, -3 },
-        { -1, 3, -6, 17, 125, -13, 5, -2 },
-        { 0, 1, -3, 8, 127, -7, 3, -1 },
-    }, [FILTER_8TAP_SMOOTH] = {
-        { 0, 0, 0, 128, 0, 0, 0, 0 },
-        { -3, -1, 32, 64, 38, 1, -3, 0 },
-        { -2, -2, 29, 63, 41, 2, -3, 0 },
-        { -2, -2, 26, 63, 43, 4, -4, 0 },
-        { -2, -3, 24, 62, 46, 5, -4, 0 },
-        { -2, -3, 21, 60, 49, 7, -4, 0 },
-        { -1, -4, 18, 59, 51, 9, -4, 0 },
-        { -1, -4, 16, 57, 53, 12, -4, -1 },
-        { -1, -4, 14, 55, 55, 14, -4, -1 },
-        { -1, -4, 12, 53, 57, 16, -4, -1 },
-        { 0, -4, 9, 51, 59, 18, -4, -1 },
-        { 0, -4, 7, 49, 60, 21, -3, -2 },
-        { 0, -4, 5, 46, 62, 24, -3, -2 },
-        { 0, -4, 4, 43, 63, 26, -2, -2 },
-        { 0, -3, 2, 41, 63, 29, -2, -2 },
-        { 0, -3, 1, 38, 64, 32, -1, -3 },
-    }
-};
-
 #define FILTER_8TAP(src, x, F, stride) \
     av_clip_pixel((F[0] * src[x + -3 * stride] + \
                    F[1] * src[x + -2 * stride] + \
@@ -2155,7 +2100,7 @@ static void avg##_8tap_##type##_##sz##dir##_c(uint8_t *dst, ptrdiff_t dst_stride
                                               int h, int mx, int my) \
 { \
     avg##_8tap_1d_##dir##_c(dst, dst_stride, src, src_stride, sz, h, \
-                            vp9_subpel_filters[type_idx][dir_m]); \
+                            ff_vp9_subpel_filters[type_idx][dir_m]); \
 }
 
 #define filter_fn_2d(sz, type, type_idx, avg) \
@@ -2164,8 +2109,8 @@ static void avg##_8tap_##type##_##sz##hv_c(uint8_t *dst, ptrdiff_t dst_stride, \
                                            int h, int mx, int my) \
 { \
     avg##_8tap_2d_hv_c(dst, dst_stride, src, src_stride, sz, h, \
-                       vp9_subpel_filters[type_idx][mx], \
-                       vp9_subpel_filters[type_idx][my]); \
+                       ff_vp9_subpel_filters[type_idx][mx], \
+                       ff_vp9_subpel_filters[type_idx][my]); \
 }
 
 #if BIT_DEPTH != 12
@@ -2454,7 +2399,7 @@ static void avg##_scaled_##type##_##sz##_c(uint8_t *dst, ptrdiff_t dst_stride, \
                                            int h, int mx, int my, int dx, int dy) \
 { \
     avg##_scaled_8tap_c(dst, dst_stride, src, src_stride, sz, h, mx, my, dx, dy, \
-                        ff_vp9_subpel_filters_placeholder_removed); \
+                        ff_vp9_subpel_filters[type_idx]); \
 }
 
 #if BIT_DEPTH != 12
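
Reviewer's sketch, not part of the patch: the commit message says the table is aligned so that SIMD code can read it efficiently. The standalone C11 fragment below illustrates why a 16-byte alignment of the whole table is enough. Each filter row is 8 * sizeof(int16_t) = 16 bytes, so every row then starts on a 16-byte boundary and fits one 128-bit vector load. Everything named here is hypothetical: `alignas(16)` stands in for FFmpeg's `DECLARE_ALIGNED(16, ...)`, `subpel_excerpt` holds only three of the REGULAR rows from the diff, and `row_is_aligned`, `clip_u8` and `filter_8tap` are illustration helpers. The `+ 64` rounding and `>> 7` shift are an assumption, since the `FILTER_8TAP` macro is cut off in the hunk context above; they are consistent with every row summing to 128.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt: rows 0, 4 and 8 of the REGULAR filter from the patch.
 * alignas(16) is a stand-in for DECLARE_ALIGNED(16, ...). */
static alignas(16) const int16_t subpel_excerpt[3][8] = {
    {  0, 0,   0, 128,  0,   0, 0,  0 },  /* row 0: full-pel position, identity */
    { -1, 4, -16, 112, 37, -11, 4, -1 },  /* row 4 */
    { -1, 6, -19,  78, 78, -19, 6, -1 },  /* row 8: half-pel position */
};

/* One row is exactly 16 bytes, so rows of a 16-byte-aligned table are
 * themselves 16-byte aligned. */
static_assert(sizeof(subpel_excerpt[0]) == 16,
              "one filter row should fill one 128-bit vector");

/* Returns 1 if the given row of the excerpt starts on a 16-byte boundary. */
static int row_is_aligned(size_t row)
{
    return ((uintptr_t)subpel_excerpt[row] % 16) == 0;
}

/* Clamp to the 8-bit pixel range (a stand-in for av_clip_pixel at 8 bpp). */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar 8-tap filter at position x: taps cover x - 3*stride .. x + 4*stride.
 * Assumed rounding: add 64, shift right by 7 (coefficients sum to 128). */
static uint8_t filter_8tap(const uint8_t *src, ptrdiff_t x, ptrdiff_t stride,
                           const int16_t *F)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += F[i] * src[x + (i - 3) * stride];
    return clip_u8((sum + 64) >> 7);
}
```

Calling `filter_8tap(src, 3, 1, subpel_excerpt[0])` on an 8-pixel row returns `src[3]` unchanged, because row 0 has its whole weight on the fourth tap. A SIMD version would load the same row with a single aligned 128-bit load instead of eight scalar reads, which is what the alignment in this patch is meant to permit.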