From patchwork Wed Dec 13 19:58:20 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marton Balint
X-Patchwork-Id: 45127
From: Marton Balint
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 13 Dec 2023 20:58:20 +0100
Message-Id: <20231213195820.21046-1-cus@passwd.hu>
X-Mailer: git-send-email 2.35.3
Subject: [FFmpeg-devel] [PATCH] avcodec/bitpacked_dec: optimize bitpacked_decode_yuv422p10

From: Devin Heitmueller

Rework the code a bit to speed up the 10-bit bitpacked decoding routine.
This is probably about as fast as I can get it without switching to
assembly language.

Demonstrable with:

./ffmpeg -f lavfi -i "smptehdbars=size=3840x2160" -c bitpacked -f image2 -frames:v 1 source.yuv
./ffmpeg -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le out.yuv

On my development system, decoding a 2160p frame went from 80ms down to
20ms (i.e. a 4x speedup). Good enough for now, I hope...

Comments from Marton:

Originally, on my system, better performance could be achieved simply by
switching to the cached bitstream reader, but on Devin's system that was
slower than his direct byte operations. I changed the order in which the
output is written from u/y/v/y to u/v/y/y, and that made the code faster
than the cached bitstream reader on my system as well.
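To make the bitstream-reader versus direct-byte comparison concrete, here is a minimal sketch that checks the two approaches agree on one 5-byte group. It assumes a hypothetical bit-at-a-time reader (`BitReader`, `read_bits`) as a stand-in for the removed `get_bits()` path; it is not FFmpeg's `GetBitContext`. The helpers `unpack_group` and `group_matches` are also hypothetical. The byte arithmetic is copied from the patch and written in the u/v/y/y order.

```c
#include <stdint.h>

/* Hypothetical MSB-first bit reader, a stand-in for get_bits().
 * NOT FFmpeg's GetBitContext. */
typedef struct {
    const uint8_t *buf;
    uint64_t pos; /* bit position from the start of buf */
} BitReader;

static unsigned read_bits(BitReader *br, int n)
{
    unsigned val = 0;
    for (int k = 0; k < n; k++) {
        unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        val = (val << 1) | bit;
        br->pos++;
    }
    return val;
}

/* Byte arithmetic from the patch for one group of U, Y0, V, Y1
 * (10 bits each, MSB first, 40 bits = 5 bytes), stored in u/v/y/y order.
 * U needs no mask: (src[0] << 2) | (src[1] >> 6) already fits in 10 bits. */
static void unpack_group(const uint8_t *src, uint16_t *u, uint16_t *v,
                         uint16_t *ya, uint16_t *yb)
{
    *u  =  (src[0] << 2) | (src[1] >> 6);
    *v  = ((src[2] << 6) | (src[3] >> 2)) & 0x3ff;
    *ya = ((src[1] << 4) | (src[2] >> 4)) & 0x3ff;
    *yb = ((src[3] << 8) |  src[4])       & 0x3ff;
}

/* Returns 1 if the sequential reader and the byte arithmetic agree. */
static int group_matches(const uint8_t src[5])
{
    BitReader br = { src, 0 };
    uint16_t u, v, ya, yb;

    unpack_group(src, &u, &v, &ya, &yb);
    /* Stream order is U, Y0, V, Y1, as in the old get_bits() loop. */
    unsigned ru  = read_bits(&br, 10);
    unsigned rya = read_bits(&br, 10);
    unsigned rv  = read_bits(&br, 10);
    unsigned ryb = read_bits(&br, 10);
    return u == ru && ya == rya && v == rv && yb == ryb;
}
```

Since every sample is computed from the same bits either way, the u/v/y/y reordering changes only the order of the stores, not the decoded values.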
TIMER measurement of the decode loop on Ryzen 5 3600 with command line:

./ffmpeg -stream_loop 256 -threads 1 -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le -f null none -loglevel error

Before: 823204127 decicycles in YUV, 256 runs, 0 skips
After: 315070524 decicycles in YUV, 256 runs, 0 skips

Signed-off-by: Devin Heitmueller
Signed-off-by: Marton Balint
---
 libavcodec/bitpacked_dec.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/libavcodec/bitpacked_dec.c b/libavcodec/bitpacked_dec.c
index c88f861993..54c008bd86 100644
--- a/libavcodec/bitpacked_dec.c
+++ b/libavcodec/bitpacked_dec.c
@@ -28,7 +28,6 @@
 
 #include "avcodec.h"
 #include "codec_internal.h"
-#include "get_bits.h"
 #include "libavutil/imgutils.h"
 #include "thread.h"
 
@@ -65,7 +64,7 @@ static int bitpacked_decode_yuv422p10(AVCodecContext *avctx, AVFrame *frame,
 {
     uint64_t frame_size = (uint64_t)avctx->width * (uint64_t)avctx->height * 20;
     uint64_t packet_size = (uint64_t)avpkt->size * 8;
-    GetBitContext bc;
+    uint8_t *src;
     uint16_t *y, *u, *v;
     int ret, i, j;
 
@@ -79,20 +78,18 @@ static int bitpacked_decode_yuv422p10(AVCodecContext *avctx, AVFrame *frame,
     if (avctx->width % 2)
         return AVERROR_PATCHWELCOME;
 
-    ret = init_get_bits(&bc, avpkt->data, avctx->width * avctx->height * 20);
-    if (ret)
-        return ret;
-
+    src = avpkt->data;
     for (i = 0; i < avctx->height; i++) {
         y = (uint16_t*)(frame->data[0] + i * frame->linesize[0]);
         u = (uint16_t*)(frame->data[1] + i * frame->linesize[1]);
         v = (uint16_t*)(frame->data[2] + i * frame->linesize[2]);
 
         for (j = 0; j < avctx->width; j += 2) {
-            *u++ = get_bits(&bc, 10);
-            *y++ = get_bits(&bc, 10);
-            *v++ = get_bits(&bc, 10);
-            *y++ = get_bits(&bc, 10);
+            *u++ = (src[0] << 2) | (src[1] >> 6);
+            *v++ = ((src[2] << 6) | (src[3] >> 2)) & 0x3ff;
+            *y++ = ((src[1] << 4) | (src[2] >> 4)) & 0x3ff;
+            *y++ = ((src[3] << 8) | (src[4])) & 0x3ff;
+            src += 5;
         }
     }
 
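The patched loop reads the source contiguously (5 bytes per pixel pair, no per-row padding, matching the width * height * 20 bits the old init_get_bits() call covered) and writes strided planar output. A round-trip sketch can sanity-check that layout. It assumes hypothetical helpers `pack_uyvy10` (an inverse written only for testing) and `unpack_yuv422p10` (a copy of the patched loop), and it counts strides in uint16_t elements rather than in bytes as frame->linesize does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test-only packer: U, Y0, V, Y1 (10 bits each, MSB first)
 * per pixel pair, 5 bytes per pair, planes assumed contiguous. */
static void pack_uyvy10(uint8_t *dst, const uint16_t *y, const uint16_t *u,
                        const uint16_t *v, size_t npairs)
{
    for (size_t p = 0; p < npairs; p++) {
        uint64_t w = ((uint64_t)(u[p]         & 0x3ff) << 30) |
                     ((uint64_t)(y[2 * p]     & 0x3ff) << 20) |
                     ((uint64_t)(v[p]         & 0x3ff) << 10) |
                      (uint64_t)(y[2 * p + 1] & 0x3ff);
        /* Emit the 40-bit word big-endian: byte 0 holds bits 39..32. */
        for (int b = 0; b < 5; b++)
            dst[5 * p + b] = (uint8_t)(w >> (32 - 8 * b));
    }
}

/* Same inner loop as the patch; strides are in uint16_t elements here. */
static void unpack_yuv422p10(const uint8_t *src, uint16_t *yp, uint16_t *up,
                             uint16_t *vp, int width, int height,
                             int ystride, int cstride)
{
    for (int i = 0; i < height; i++) {
        uint16_t *y = yp + (size_t)i * ystride;
        uint16_t *u = up + (size_t)i * cstride;
        uint16_t *v = vp + (size_t)i * cstride;

        for (int j = 0; j < width; j += 2) {
            *u++ = (src[0] << 2) | (src[1] >> 6);
            *v++ = ((src[2] << 6) | (src[3] >> 2)) & 0x3ff;
            *y++ = ((src[1] << 4) | (src[2] >> 4)) & 0x3ff;
            *y++ = ((src[3] << 8) | (src[4])) & 0x3ff;
            src += 5;
        }
    }
}
```

With an even width and strides of width and width / 2, the unpacked planes should match the packer's input sample for sample.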