From patchwork Thu Jan 2 15:22:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derek Buitenhuis X-Patchwork-Id: 17134 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id 2B0E644B176 for ; Thu, 2 Jan 2020 17:28:27 +0200 (EET) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id F308A68A628; Thu, 2 Jan 2020 17:28:26 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wr1-f67.google.com (mail-wr1-f67.google.com [209.85.221.67]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 3E8BD68A486 for ; Thu, 2 Jan 2020 17:28:20 +0200 (EET) Received: by mail-wr1-f67.google.com with SMTP id y17so39613589wrh.5 for ; Thu, 02 Jan 2020 07:28:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=SMCIrjDWoblWU1EoDdH3tBBkXJEJEX+gG65togKHegE=; b=pR7XINH2hw97mY+yNXiX0mHb8Uhbb+h5xLR2yz6He0/EXkXx3CufTE9IHrZsrure34 cM5DGHETP6Dmf2NuLhy8axznuIZSsniMoDiODUrB83Br+IFcydGxxbuqJyh8NiMFW6V+ DlCo50lKVRrXI835uDXo2t+sO3NOvdvPKwO5K7aL9jClwxpgnL2Y1SUl98A/mCNB2tja VzHpGq3HhctQEYaF6Nj2boHoPQvBWdBWLC7d7b4gmHKhpEI4maQN5u047PitsKjZ1M5A ozkRZTQz1ltI5+zA76+cX01a/ad4s4br5h38IITKViU7XJWZxvUnEnIqE8dD2TxzTzbf gXUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=SMCIrjDWoblWU1EoDdH3tBBkXJEJEX+gG65togKHegE=; b=DbkGpkVrgx9Q3i67r32GmYD+8viyKf6lQRvwR/yvXXSaReM3Yxq+/GB1xPYaCDlU8M EeC9lDQeJrT98ZgPojzGC0D9+1cP+Ei8xpA8dJNhtjlRnwaLHYzVpsjW1X0jUJVMKhP7 /CJ2gvzr2x3e/D/wC0y5NvDnIKwEmxfUP/DWZC5gHbk/MbfqZV5aAbrNliny6FMf84gC AVI5YKev7vS84Vm7W45XtOU6F9fgBEZd4G9hJy1ZHP10PzvgdsqeH653J7xNhPSlh6iO JPq8jlLDz649LRASyyOgSTDZ36ZROyDvg4v+mom5jcP96I5Hd0RWS6HpwogKAOuF/AYA dUtA== X-Gm-Message-State: APjAAAUvLmgJfwjtNQCz9pkUTfXYaByQnlH6yMQV4eE2pg66YjV5oMMY FgiwZ3v7sRnvAoBOOybFOsDv+zpm X-Google-Smtp-Source: APXvYqxGHK7w7iNooD1WgSJ6aXS2axwty0plxnWwYYWQhpP+ilNdNVrgTzdKM6Z40OGqkPo7eV0aLw== X-Received: by 2002:adf:e74f:: with SMTP id c15mr86557927wrn.274.1577978560772; Thu, 02 Jan 2020 07:22:40 -0800 (PST) Received: from localhost.localdomain ([82.129.88.112]) by smtp.gmail.com with ESMTPSA id a16sm56099522wrt.37.2020.01.02.07.22.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Jan 2020 07:22:39 -0800 (PST) From: Derek Buitenhuis To: ffmpeg-devel@ffmpeg.org Date: Thu, 2 Jan 2020 15:22:25 +0000 Message-Id: <20200102152225.182072-1-derek.buitenhuis@gmail.com> X-Mailer: git-send-email 2.25.0.rc0 MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH] lavc/ffv1: Properly check that the 4th and 5th quant tables are zeroes X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: jerome@mediaarea.net Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Currently, the decoder checks the 128th value of the 4th quant table during while deriving the context on each sample, in order to speed itself up. This is due to relying on the behavior of FFmpeg's FFV1 encoder, in which if that value is zero, the entire 4th and 5th quant tables are assumed to be entirely zero. This does not match the FFV1 spec, which has no such restriction, and after some discussion, it was decided to fix FFmpeg to abide by the spec, rather than change the spec. We will now check whether the 4th and 5th quant tables are zero properly, using a full table compare, once, when they are read in. There should be no speed loss with this method. For further context, the FFV1 issue in question is located at: https://github.com/FFmpeg/FFV1/issues/169 Signed-off-by: Derek Buitenhuis --- libavcodec/ffv1.h | 15 +++++++++++++++ libavcodec/ffv1_template.c | 2 +- libavcodec/ffv1dec.c | 5 +++++ libavcodec/ffv1enc.c | 6 ++++++ 4 files changed, 27 insertions(+), 1 deletion(-) diff --git a/libavcodec/ffv1.h b/libavcodec/ffv1.h index f0bb19350a..115ea24120 100644 --- a/libavcodec/ffv1.h +++ b/libavcodec/ffv1.h @@ -67,6 +67,7 @@ typedef struct VlcState { typedef struct PlaneContext { int16_t quant_table[MAX_CONTEXT_INPUTS][256]; + int three_quant_tables; int quant_table_index; int context_count; uint8_t (*state)[CONTEXT_SIZE]; @@ -102,7 +103,9 @@ typedef struct FFV1Context { int ac_byte_count; ///< number of bytes used for AC coding PlaneContext plane[MAX_PLANES]; int16_t quant_table[MAX_CONTEXT_INPUTS][256]; + int three_quant_table; int16_t quant_tables[MAX_QUANT_TABLES][MAX_CONTEXT_INPUTS][256]; + int three_quant_tables[MAX_QUANT_TABLES]; int context_count[MAX_QUANT_TABLES]; uint8_t state_transition[256]; uint8_t (*initial_states[MAX_QUANT_TABLES])[32]; @@ -148,6 +151,18 @@ int ff_ffv1_allocate_initial_states(FFV1Context *f); void ff_ffv1_clear_slice_state(FFV1Context *f, FFV1Context *fs); int ff_ffv1_close(AVCodecContext *avctx); +/* + * The last two quant tables are often zeroes, so we can do a single check + * here, to speed up every get_context call on such files. + */ +static av_always_inline int is_three_quant_tables(const int16_t quant_table[MAX_CONTEXT_INPUTS][256]) +{ + const int16_t zero_quant_table[256] = { 0 }; + + return !memcmp(zero_quant_table, quant_table[3], sizeof(zero_quant_table)) && + !memcmp(zero_quant_table, quant_table[4], sizeof(zero_quant_table)); +} + static av_always_inline int fold(int diff, int bits) { if (bits == 8) diff --git a/libavcodec/ffv1_template.c b/libavcodec/ffv1_template.c index f2ab93313a..6304b54ab1 100644 --- a/libavcodec/ffv1_template.c +++ b/libavcodec/ffv1_template.c @@ -37,7 +37,7 @@ static inline int RENAME(get_context)(PlaneContext *p, TYPE *src, const int RT = last[1]; const int L = src[-1]; - if (p->quant_table[3][127]) { + if (!p->three_quant_tables) { const int TT = last2[0]; const int LL = src[-2]; return p->quant_table[0][(L - LT) & 0xFF] + diff --git a/libavcodec/ffv1dec.c b/libavcodec/ffv1dec.c index e465ed49d7..5ac67daaa8 100644 --- a/libavcodec/ffv1dec.c +++ b/libavcodec/ffv1dec.c @@ -197,6 +197,7 @@ static int decode_slice_header(FFV1Context *f, FFV1Context *fs) } p->quant_table_index = idx; memcpy(p->quant_table, f->quant_tables[idx], sizeof(p->quant_table)); + p->three_quant_tables = f->three_quant_tables[idx]; context_count = f->context_count[idx]; if (p->context_count < context_count) { @@ -474,6 +475,7 @@ static int read_extra_header(FFV1Context *f) for (i = 0; i < f->quant_table_count; i++) { f->context_count[i] = read_quant_tables(c, f->quant_tables[i]); + f->three_quant_tables[i] = is_three_quant_tables(f->quant_tables[i]); if (f->context_count[i] < 0) { av_log(f->avctx, AV_LOG_ERROR, "read_quant_table error\n"); return AVERROR_INVALIDDATA; @@ -735,6 +737,7 @@ static int read_header(FFV1Context *f) f->chroma_h_shift, f->chroma_v_shift, f->avctx->pix_fmt); if (f->version < 2) { context_count = read_quant_tables(c, f->quant_table); + f->three_quant_table = is_three_quant_tables(f->quant_table); if (context_count < 0) { av_log(f->avctx, AV_LOG_ERROR, "read_quant_table error\n"); return AVERROR_INVALIDDATA; @@ -797,9 +800,11 @@ static int read_header(FFV1Context *f) p->quant_table_index = idx; memcpy(p->quant_table, f->quant_tables[idx], sizeof(p->quant_table)); + p->three_quant_tables = f->three_quant_tables[idx]; context_count = f->context_count[idx]; } else { memcpy(p->quant_table, f->quant_table, sizeof(p->quant_table)); + p->three_quant_tables = f->three_quant_table; } if (f->version <= 2) { diff --git a/libavcodec/ffv1enc.c b/libavcodec/ffv1enc.c index c521b7d445..e6e5171c8f 100644 --- a/libavcodec/ffv1enc.c +++ b/libavcodec/ffv1enc.c @@ -733,31 +733,37 @@ FF_ENABLE_DEPRECATION_WARNINGS s->quant_tables[0][0][i]= quant11[i]; s->quant_tables[0][1][i]= 11*quant11[i]; s->quant_tables[0][2][i]= 11*11*quant11[i]; + s->three_quant_tables[0]= 1; s->quant_tables[1][0][i]= quant11[i]; s->quant_tables[1][1][i]= 11*quant11[i]; s->quant_tables[1][2][i]= 11*11*quant5 [i]; s->quant_tables[1][3][i]= 5*11*11*quant5 [i]; s->quant_tables[1][4][i]= 5*5*11*11*quant5 [i]; + s->three_quant_tables[1]= 0; } else { s->quant_tables[0][0][i]= quant9_10bit[i]; s->quant_tables[0][1][i]= 11*quant9_10bit[i]; s->quant_tables[0][2][i]= 11*11*quant9_10bit[i]; + s->three_quant_tables[0]= 1; s->quant_tables[1][0][i]= quant9_10bit[i]; s->quant_tables[1][1][i]= 11*quant9_10bit[i]; s->quant_tables[1][2][i]= 11*11*quant5_10bit[i]; s->quant_tables[1][3][i]= 5*11*11*quant5_10bit[i]; s->quant_tables[1][4][i]= 5*5*11*11*quant5_10bit[i]; + s->three_quant_tables[1]= 0; } } s->context_count[0] = (11 * 11 * 11 + 1) / 2; s->context_count[1] = (11 * 11 * 5 * 5 * 5 + 1) / 2; memcpy(s->quant_table, s->quant_tables[s->context_model], sizeof(s->quant_table)); + s->three_quant_table = s->three_quant_tables[s->context_model]; for (i = 0; i < s->plane_count; i++) { PlaneContext *const p = &s->plane[i]; memcpy(p->quant_table, s->quant_table, sizeof(p->quant_table)); + p->three_quant_tables= s->three_quant_table; p->quant_table_index = s->context_model; p->context_count = s->context_count[p->quant_table_index]; }