From patchwork Fri May 5 21:54:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Devin Heitmueller
X-Patchwork-Id: 41503
From: Devin Heitmueller
To: ffmpeg-devel@ffmpeg.org
Cc: Devin Heitmueller
Date: Fri, 5 May 2023 17:54:17 -0400
Message-Id: <1683323657-20687-1-git-send-email-dheitmueller@ltnglobal.com>
X-Mailer: git-send-email 1.8.3.1
Subject: [FFmpeg-devel] [RFC/PATCH] bitpacked_dec: Optimization for bitpacked_dec decoder performance

Rework the code a bit to speed up the 10-bit bitpacked decoding
routine. This is probably about as fast as I can get it without
switching to assembly language.

Demonstrable with:

./ffmpeg -f lavfi -i "smptehdbars=size=3840x2160" -c bitpacked -f image2 -frames:v 1 source.yuv
./ffmpeg -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le out.yuv

On my development system, decoding a 2160p frame went from 80ms down
to 20ms (i.e. a 4X speedup). Good enough for now, I hope...

Signed-off-by: Devin Heitmueller
---
 libavcodec/bitpacked_dec.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/libavcodec/bitpacked_dec.c b/libavcodec/bitpacked_dec.c
index a1ffef1..96aba27 100644
--- a/libavcodec/bitpacked_dec.c
+++ b/libavcodec/bitpacked_dec.c
@@ -28,7 +28,6 @@
 
 #include "avcodec.h"
 #include "codec_internal.h"
-#include "get_bits.h"
 #include "libavutil/imgutils.h"
 #include "thread.h"
 
@@ -65,7 +64,7 @@ static int bitpacked_decode_yuv422p10(AVCodecContext *avctx, AVFrame *frame,
 {
     uint64_t frame_size = (uint64_t)avctx->width * (uint64_t)avctx->height * 20;
     uint64_t packet_size = (uint64_t)avpkt->size * 8;
-    GetBitContext bc;
+    uint8_t *src;
     uint16_t *y, *u, *v;
     int ret, i, j;
 
@@ -79,20 +78,18 @@ static int bitpacked_decode_yuv422p10(AVCodecContext *avctx, AVFrame *frame,
     if (avctx->width % 2)
         return AVERROR_PATCHWELCOME;
 
-    ret = init_get_bits(&bc, avpkt->data, avctx->width * avctx->height * 20);
-    if (ret)
-        return ret;
-
+    src = avpkt->data;
     for (i = 0; i < avctx->height; i++) {
         y = (uint16_t*)(frame->data[0] + i * frame->linesize[0]);
         u = (uint16_t*)(frame->data[1] + i * frame->linesize[1]);
         v = (uint16_t*)(frame->data[2] + i * frame->linesize[2]);
 
         for (j = 0; j < avctx->width; j += 2) {
-            *u++ = get_bits(&bc, 10);
-            *y++ = get_bits(&bc, 10);
-            *v++ = get_bits(&bc, 10);
-            *y++ = get_bits(&bc, 10);
+            *u++ = (src[0] << 2) | (src[1] >> 6);
+            *y++ = ((src[1] << 4) | (src[2] >> 4)) & 0x3ff;
+            *v++ = ((src[2] << 6) | (src[3] >> 2)) & 0x3ff;
+            *y++ = ((src[3] << 8) | (src[4])) & 0x3ff;
+            src += 5;
         }
     }
 
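
For reviewers who want to sanity-check the shift/mask arithmetic outside of FFmpeg, here is a minimal standalone sketch. It is not part of the patch: unpack_group(), read_bits() and verify_group() are hypothetical helper names made up for illustration. It assumes what the replaced get_bits() calls imply, namely that each 5-byte group holds four 10-bit samples packed MSB-first in U, Y0, V, Y1 order, and it compares the byte-wise unpacking against a slow bit-by-bit reader.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): unpack one 5-byte group of
 * four MSB-first 10-bit samples, using the same shifts and masks as
 * the new inner loop. Output order is U, Y0, V, Y1. */
static void unpack_group(const uint8_t *src, uint16_t out[4])
{
    out[0] =  (src[0] << 2) | (src[1] >> 6);           /* U:  bits  0..9  */
    out[1] = ((src[1] << 4) | (src[2] >> 4)) & 0x3ff;  /* Y0: bits 10..19 */
    out[2] = ((src[2] << 6) | (src[3] >> 2)) & 0x3ff;  /* V:  bits 20..29 */
    out[3] = ((src[3] << 8) |  src[4])       & 0x3ff;  /* Y1: bits 30..39 */
}

/* Slow reference reader: collect nbits bits, MSB-first, starting at
 * absolute bit offset pos. Stands in for a generic bit reader. */
static unsigned read_bits(const uint8_t *src, unsigned pos, unsigned nbits)
{
    unsigned v = 0;
    for (unsigned k = 0; k < nbits; k++, pos++)
        v = (v << 1) | ((src[pos >> 3] >> (7 - (pos & 7))) & 1u);
    return v;
}

/* Assert that both methods agree on all four samples of one group. */
static void verify_group(const uint8_t *src)
{
    uint16_t out[4];
    unpack_group(src, out);
    for (unsigned k = 0; k < 4; k++)
        assert(out[k] == read_bits(src, 10 * k, 10));
}
```

Calling verify_group() on a handful of 5-byte patterns (for example the bytes ff c0 00 03 ff, which should unpack to 1023, 0, 0, 1023) is a quick way to confirm that the masks line up with the bit offsets.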