From patchwork Mon Aug 28 17:36:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 43380
From: Paul B Mahol
Date: Mon, 28 Aug 2023 19:36:17 +0200
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] MULTI VLC decoding boost

Patches attached.

Thanks to kurosu for pointing out the unmerged branches.

With this series, the UNCACHED_PATH for x86_32 is broken for the 2 codecs it touches. The fix is trivial and will be done later.

From 4250d74dad2bfb4c8d01fc26c9635c56293fc74c Mon Sep 17 00:00:00 2001
From: Christophe Gisquet
Date: Sun, 9 Jul 2017 12:56:35 +0000
Subject: [PATCH 3/3] avcodec/utvideodec: add vlc multi support

Faster decoding, about 50% faster overall on average.

Signed-off-by: Paul B Mahol --- libavcodec/utvideo.h | 1 + libavcodec/utvideodec.c | 91 ++++++++++++++++++++--------------------- 2 files changed, 45 insertions(+), 47 deletions(-) diff --git a/libavcodec/utvideo.h b/libavcodec/utvideo.h index 9da9329ff3..d274b6586d 100644 --- a/libavcodec/utvideo.h +++ b/libavcodec/utvideo.h @@ -80,6 +80,7 @@ typedef struct UtvideoContext { ptrdiff_t slice_stride; uint8_t *slice_bits, *slice_buffer[4]; + void *buffer; int slice_bits_size; const uint8_t *packed_stream[4][256]; diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c index 1f00c58950..0b0352b7ec 100644 --- a/libavcodec/utvideodec.c +++ b/libavcodec/utvideodec.c @@ -46,7 +46,7 @@ typedef struct HuffEntry { } HuffEntry; static int build_huff(UtvideoContext *c, const uint8_t *src, VLC *vlc, - int *fsym, unsigned nb_elems) + VLC_MULTI *multi, int *fsym, unsigned nb_elems) { int i; HuffEntry he[1024]; @@ -82,11 +82,34 @@ static int build_huff(UtvideoContext *c, const uint8_t *src, VLC *vlc, he[--codes_count[bits[i]]] = (HuffEntry) { bits[i], i }; #define VLC_BITS 11 - return ff_init_vlc_from_lengths(vlc, VLC_BITS, codes_count[0], + return ff_init_vlc_multi_from_lengths(vlc, multi, VLC_BITS, nb_elems, codes_count[0], &he[0].len, sizeof(*he), &he[0].sym, sizeof(*he), 2, 0, 0, c->avctx); } +#define READ_PLANE(b, end) \ +{ \ + buf = !use_pred ? 
dest : c->buffer; \ + for (i = 0; CACHED_BITSTREAM_READER && i < width-end && get_bits_left(&gb) > 0;) {\ + ret = get_vlc_multi(&gb, (uint8_t *)buf + i * b, multi.table, \ + vlc.table, VLC_BITS, 3); \ + if (ret > 0) \ + i += ret; \ + if (ret <= 0) \ + goto fail; \ + } \ + for (; i < width && get_bits_left(&gb) > 0; i++) \ + buf[i] = get_vlc2(&gb, vlc.table, VLC_BITS, 3); \ + if (use_pred) { \ + if (b == 2) \ + c->llviddsp.add_left_pred_int16((uint16_t *)dest, (const uint16_t *)buf, 0x3ff, width, prev); \ + else \ + c->llviddsp.add_left_pred((uint8_t *)dest, (const uint8_t *)buf, width, prev); \ + } \ + prev = dest[width-1]; \ + dest += stride; \ +} + static int decode_plane10(UtvideoContext *c, int plane_no, uint16_t *dst, ptrdiff_t stride, int width, int height, @@ -95,11 +118,12 @@ static int decode_plane10(UtvideoContext *c, int plane_no, { int i, j, slice, pix, ret; int sstart, send; + VLC_MULTI multi; VLC vlc; GetBitContext gb; int prev, fsym; - if ((ret = build_huff(c, huff, &vlc, &fsym, 1024)) < 0) { + if ((ret = build_huff(c, huff, &vlc, &multi, &fsym, 1024)) < 0) { av_log(c->avctx, AV_LOG_ERROR, "Cannot build Huffman codes\n"); return ret; } @@ -131,7 +155,7 @@ static int decode_plane10(UtvideoContext *c, int plane_no, send = 0; for (slice = 0; slice < c->slices; slice++) { - uint16_t *dest; + uint16_t *dest, *buf; int slice_data_start, slice_data_end, slice_size; sstart = send; @@ -156,37 +180,20 @@ static int decode_plane10(UtvideoContext *c, int plane_no, init_get_bits(&gb, c->slice_bits, slice_size * 8); prev = 0x200; - for (j = sstart; j < send; j++) { - for (i = 0; i < width; i++) { - pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3); - if (pix < 0) { - av_log(c->avctx, AV_LOG_ERROR, "Decoding error\n"); - goto fail; - } - if (use_pred) { - prev += pix; - prev &= 0x3FF; - pix = prev; - } - dest[i] = pix; - } - dest += stride; - if (get_bits_left(&gb) < 0) { - av_log(c->avctx, AV_LOG_ERROR, - "Slice decoding ran out of bits\n"); - goto fail; - } - } + for 
(j = sstart; j < send; j++) + READ_PLANE(2, 3) if (get_bits_left(&gb) > 32) av_log(c->avctx, AV_LOG_WARNING, "%d bits left after decoding slice\n", get_bits_left(&gb)); } ff_free_vlc(&vlc); + ff_free_vlc_multi(&multi); return 0; fail: ff_free_vlc(&vlc); + ff_free_vlc_multi(&multi); return AVERROR_INVALIDDATA; } @@ -207,6 +214,7 @@ static int decode_plane(UtvideoContext *c, int plane_no, { int i, j, slice, pix; int sstart, send; + VLC_MULTI multi; VLC vlc; GetBitContext gb; int ret, prev, fsym; @@ -259,7 +267,7 @@ static int decode_plane(UtvideoContext *c, int plane_no, return 0; } - if (build_huff(c, src, &vlc, &fsym, 256)) { + if (build_huff(c, src, &vlc, &multi, &fsym, 256)) { av_log(c->avctx, AV_LOG_ERROR, "Cannot build Huffman codes\n"); return AVERROR_INVALIDDATA; } @@ -292,7 +300,7 @@ static int decode_plane(UtvideoContext *c, int plane_no, send = 0; for (slice = 0; slice < c->slices; slice++) { - uint8_t *dest; + uint8_t *dest, *buf; int slice_data_start, slice_data_end, slice_size; sstart = send; @@ -317,36 +325,20 @@ static int decode_plane(UtvideoContext *c, int plane_no, init_get_bits(&gb, c->slice_bits, slice_size * 8); prev = 0x80; - for (j = sstart; j < send; j++) { - for (i = 0; i < width; i++) { - pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3); - if (pix < 0) { - av_log(c->avctx, AV_LOG_ERROR, "Decoding error\n"); - goto fail; - } - if (use_pred) { - prev += pix; - pix = prev; - } - dest[i] = pix; - } - if (get_bits_left(&gb) < 0) { - av_log(c->avctx, AV_LOG_ERROR, - "Slice decoding ran out of bits\n"); - goto fail; - } - dest += stride; - } + for (j = sstart; j < send; j++) + READ_PLANE(1, 5) if (get_bits_left(&gb) > 32) av_log(c->avctx, AV_LOG_WARNING, "%d bits left after decoding slice\n", get_bits_left(&gb)); } ff_free_vlc(&vlc); + ff_free_vlc_multi(&multi); return 0; fail: ff_free_vlc(&vlc); + ff_free_vlc_multi(&multi); return AVERROR_INVALIDDATA; } @@ -992,6 +984,10 @@ static av_cold int decode_init(AVCodecContext *avctx) return 
AVERROR_INVALIDDATA; } + c->buffer = av_calloc(avctx->width, c->pro?2:1); + if (!c->buffer) + return AVERROR(ENOMEM); + av_pix_fmt_get_chroma_sub_sample(avctx->pix_fmt, &h_shift, &v_shift); if ((avctx->width & ((1<height & ((1<priv_data; av_freep(&c->slice_bits); + av_freep(&c->buffer); return 0; } -- 2.39.1
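
Note on the decode_plane10() hunk: the removed loop applied left prediction to each pixel as it was decoded, while READ_PLANE first decodes a whole row of residuals into c->buffer and then runs the prediction in one pass. The sketch below is only an illustration of why the two shapes should give the same row. Both helper names are hypothetical and are not FFmpeg API. The second one is a plain C stand-in for what add_left_pred_int16 is assumed to compute with mask 0x3ff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not FFmpeg API: mirrors the per-pixel loop the
 * patch removes. Each residual is added to prev and wrapped to 10 bits
 * as soon as it is available. Returns the last reconstructed pixel. */
static int left_pred_inloop(uint16_t *dest, const uint16_t *residuals,
                            size_t width, int prev)
{
    assert(width > 0);
    for (size_t i = 0; i < width; i++) {
        prev += residuals[i];
        prev &= 0x3FF;
        dest[i] = (uint16_t)prev;
    }
    return prev;
}

/* Hypothetical helper, not FFmpeg API: the two-pass shape. The caller
 * has already filled scratch with a full row of residuals (the role
 * c->buffer plays in the patch); this pass only applies the masked
 * running sum. The real DSP routine is assumed to behave like this. */
static int left_pred_two_pass(uint16_t *dest, const uint16_t *scratch,
                              size_t width, unsigned mask, unsigned prev)
{
    assert(width > 0);
    unsigned acc = prev;
    for (size_t i = 0; i < width; i++) {
        acc = (acc + scratch[i]) & mask;
        dest[i] = (uint16_t)acc;
    }
    /* Same carry-over the macro uses: prev = dest[width-1]. */
    return dest[width - 1];
}
```

Calling both helpers on the same residual row with prev = 0x200 and mask 0x3FF should produce identical output rows and the same carried value, which is the property the macro relies on when it moves prediction out of the bit-reading loop.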