From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 28 Feb 2021 19:45:10 +0100
Subject: [FFmpeg-devel] [PATCH 5/5] avcodec/roqvideoenc: Avoid allocating buffers separately

This is possible because their size is known at compile-time; so they
can be put directly into the context and don't need to be allocated
for every frame.

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/roqvideoenc.c | 146 ++++++++++++++++++---------------------
 1 file changed, 68 insertions(+), 78 deletions(-)

diff --git a/libavcodec/roqvideoenc.c b/libavcodec/roqvideoenc.c
index d65c740d5f..1ab0de0abb 100644
--- a/libavcodec/roqvideoenc.c
+++ b/libavcodec/roqvideoenc.c
@@ -78,6 +78,36 @@
 /* The cast is useful when multiplying it by INT_MAX */
 #define ROQ_LAMBDA_SCALE ((uint64_t) FF_LAMBDA_SCALE)

+typedef struct RoqCodebooks {
+    int numCB4;
+    int numCB2;
+    int usedCB2[MAX_CBS_2x2];
+    int usedCB4[MAX_CBS_4x4];
+    uint8_t unpacked_cb2[MAX_CBS_2x2*2*2*3];
+    uint8_t unpacked_cb4[MAX_CBS_4x4*4*4*3];
+    uint8_t unpacked_cb4_enlarged[MAX_CBS_4x4*8*8*3];
+} RoqCodebooks;
+
+/**
+ * Temporary vars
+ */
+typedef struct RoqTempData
+{
+    int f2i4[MAX_CBS_4x4];
+    int i2f4[MAX_CBS_4x4];
+    int f2i2[MAX_CBS_2x2];
+    int i2f2[MAX_CBS_2x2];
+
+    int mainChunkSize;
+
+    int numCB4;
+    int numCB2;
+
+    RoqCodebooks codebooks;
+
+    int used_option[4];
+} RoqTempData;
+
 typedef struct SubcelEvaluation {
     int eval_dist[4];
     int best_bit_use;
@@ -115,7 +145,9 @@ typedef struct RoqEncContext {

     const AVFrame *frame_to_enc;
     uint8_t *out_buf;
-    struct RoqTempData *tmpData;
+    RoqTempData tmp_data;
+    roq_cell results4[4 * MAX_CBS_4x4];
+    int tmp_codebook_buf[FFMAX(24 * MAX_CBS_4x4, 6 * MAX_CBS_2x2)];

     CelEvaluation *cel_evals;
     int *closest_cb;
@@ -233,36 +265,6 @@ static inline int squared_diff_macroblock(uint8_t a[], uint8_t b[], int size)
     return sdiff;
 }

-typedef struct RoqCodebooks {
-    int numCB4;
-    int numCB2;
-    int usedCB2[MAX_CBS_2x2];
-    int usedCB4[MAX_CBS_4x4];
-    uint8_t unpacked_cb2[MAX_CBS_2x2*2*2*3];
-    uint8_t unpacked_cb4[MAX_CBS_4x4*4*4*3];
-    uint8_t unpacked_cb4_enlarged[MAX_CBS_4x4*8*8*3];
-} RoqCodebooks;
-
-/**
- * Temporary vars
- */
-typedef struct RoqTempData
-{
-    int f2i4[MAX_CBS_4x4];
-    int i2f4[MAX_CBS_4x4];
-    int f2i2[MAX_CBS_2x2];
-    int i2f2[MAX_CBS_2x2];
-
-    int mainChunkSize;
-
-    int numCB4;
-    int numCB2;
-
-    RoqCodebooks codebooks;
-
-    int used_option[4];
-} RoqTempdata;
-
 /**
  * Initialize cel evaluators and set their source coordinates
  */
@@ -424,9 +426,10 @@ static void motion_search(RoqEncContext *enc, int blocksize)
  * Get distortion for all options available to a subcel
  */
 static void gather_data_for_subcel(SubcelEvaluation *subcel, int x,
-                                   int y, RoqEncContext *enc, RoqTempdata *tempData)
+                                   int y, RoqEncContext *enc)
 {
     RoqContext *const roq = &enc->common;
+    RoqTempData *const tempData = &enc->tmp_data;
     uint8_t mb4[4*4*3];
     uint8_t mb2[2*2*3];
     int cluster_index;
@@ -488,10 +491,10 @@ static void gather_data_for_subcel(SubcelEvaluation *subcel, int x,
 /**
  * Get distortion for all options available to a cel
  */
-static void gather_data_for_cel(CelEvaluation *cel, RoqEncContext *enc,
-                                RoqTempdata *tempData)
+static void gather_data_for_cel(CelEvaluation *cel, RoqEncContext *enc)
 {
     RoqContext *const roq = &enc->common;
+    RoqTempData *const tempData = &enc->tmp_data;
     uint8_t mb8[8*8*3];
     int index = cel->sourceY * roq->width / 64 + cel->sourceX/8;
     int i, j, best_dist, divide_bit_use;
@@ -523,10 +526,10 @@ static void gather_data_for_cel(CelEvaluation *cel, RoqEncContext *enc,
     index_mb(mb8, tempData->codebooks.unpacked_cb4_enlarged,
              tempData->codebooks.numCB4, &cel->cbEntry, 8);

-    gather_data_for_subcel(cel->subCels + 0, cel->sourceX+0, cel->sourceY+0, enc, tempData);
-    gather_data_for_subcel(cel->subCels + 1, cel->sourceX+4, cel->sourceY+0, enc, tempData);
-    gather_data_for_subcel(cel->subCels + 2, cel->sourceX+0, cel->sourceY+4, enc, tempData);
-    gather_data_for_subcel(cel->subCels + 3, cel->sourceX+4, cel->sourceY+4, enc, tempData);
+    gather_data_for_subcel(cel->subCels + 0, cel->sourceX+0, cel->sourceY+0, enc);
+    gather_data_for_subcel(cel->subCels + 1, cel->sourceX+4, cel->sourceY+0, enc);
+    gather_data_for_subcel(cel->subCels + 2, cel->sourceX+0, cel->sourceY+4, enc);
+    gather_data_for_subcel(cel->subCels + 3, cel->sourceX+4, cel->sourceY+4, enc);

     cel->eval_dist[RoQ_ID_CCC] = 0;
     divide_bit_use = 0;
@@ -563,9 +566,10 @@ static void gather_data_for_cel(CelEvaluation *cel, RoqEncContext *enc,
     }
 }

-static void remap_codebooks(RoqEncContext *enc, RoqTempdata *tempData)
+static void remap_codebooks(RoqEncContext *enc)
 {
     RoqContext *const roq = &enc->common;
+    RoqTempData *const tempData = &enc->tmp_data;
     int i, j, idx=0;

     /* Make remaps for the final codebook usage */
@@ -596,9 +600,10 @@ static void remap_codebooks(RoqEncContext *enc, RoqTempdata *tempData)
 /**
  * Write codebook chunk
  */
-static void write_codebooks(RoqEncContext *enc, RoqTempdata *tempData)
+static void write_codebooks(RoqEncContext *enc)
 {
     RoqContext *const roq = &enc->common;
+    RoqTempData *const tempData = &enc->tmp_data;
     int i, j;
     uint8_t **outp= &enc->out_buf;

@@ -652,10 +657,10 @@ static void write_typecode(CodingSpool *s, uint8_t type)
 }

 static void reconstruct_and_encode_image(RoqEncContext *enc,
-                                         RoqTempdata *tempData,
                                          int w, int h, int numBlocks)
 {
     RoqContext *const roq = &enc->common;
+    RoqTempData *const tempData = &enc->tmp_data;
     int i, j, k;
     int x, y;
     int subX, subY;
@@ -815,20 +820,17 @@ static int generate_codebook(RoqEncContext *enc,
     int i, j, k, ret = 0;
     int c_size = size*size/4;
     int *buf;
-    int *codebook = av_malloc_array(6*c_size, cbsize*sizeof(int));
+    int *codebook = enc->tmp_codebook_buf;
     int *closest_cb = enc->closest_cb;

-    if (!codebook)
-        return AVERROR(ENOMEM);
-
     ret = avpriv_init_elbg(points, 6 * c_size, inputCount, codebook,
                            cbsize, 1, closest_cb, &enc->randctx);
     if (ret < 0)
-        goto out;
+        return ret;
     ret = avpriv_do_elbg(points, 6 * c_size, inputCount, codebook,
                          cbsize, 1, closest_cb, &enc->randctx);
     if (ret < 0)
-        goto out;
+        return ret;

     buf = codebook;
     for (i=0; iv = (*buf++ + CHROMA_BIAS/2)/CHROMA_BIAS;
             results++;
         }
-out:
-    av_free(codebook);
-    return ret;
+    return 0;
 }

-static int generate_new_codebooks(RoqEncContext *enc, RoqTempdata *tempData)
+static int generate_new_codebooks(RoqEncContext *enc)
 {
     int i, j, ret = 0;
-    RoqCodebooks *codebooks = &tempData->codebooks;
+    RoqCodebooks *codebooks = &enc->tmp_data.codebooks;
     RoqContext *const roq = &enc->common;
     int max = roq->width * roq->height / 16;
     uint8_t mb2[3*4];
-    roq_cell *results4 = av_malloc(sizeof(roq_cell)*MAX_CBS_4x4*4);
     int *points = enc->points;

-    if (!results4) {
-        ret = AVERROR(ENOMEM);
-        goto out;
-    }
-
     /* Subsample YUV data */
     create_clusters(enc->frame_to_enc, roq->width, roq->height, points);

-    /* Create 4x4 codebooks */
-    if ((ret = generate_codebook(enc, points, max,
-                                 results4, 4, (enc->quake3_compat ? MAX_CBS_4x4-1 : MAX_CBS_4x4))) < 0)
-        goto out;
-
     codebooks->numCB4 = (enc->quake3_compat ? MAX_CBS_4x4-1 : MAX_CBS_4x4);

+    /* Create 4x4 codebooks */
+    if ((ret = generate_codebook(enc, points, max, enc->results4,
+                                 4, codebooks->numCB4)) < 0)
+        return ret;
+
     /* Create 2x2 codebooks */
     if ((ret = generate_codebook(enc, points, max * 4,
                                  roq->cb2x2, 2, MAX_CBS_2x2)) < 0)
-        goto out;
+        return ret;

     codebooks->numCB2 = MAX_CBS_2x2;

@@ -884,7 +878,7 @@ static int generate_new_codebooks(RoqEncContext *enc, RoqTempdata *tempData)
     /* Index all 4x4 entries to the 2x2 entries, unpack, and enlarge */
     for (i=0; i<codebooks->numCB4; i++) {
         for (j=0; j<4; j++) {
-            unpack_roq_cell(&results4[4*i + j], mb2);
+            unpack_roq_cell(&enc->results4[4*i + j], mb2);
             index_mb(mb2, codebooks->unpacked_cb2, codebooks->numCB2,
                      &roq->cb4x4[i].idx[j], 2);
         }
@@ -893,20 +887,19 @@ static int generate_new_codebooks(RoqEncContext *enc, RoqTempdata *tempData)
         enlarge_roq_mb4(codebooks->unpacked_cb4 + i*4*4*3,
                         codebooks->unpacked_cb4_enlarged + i*8*8*3);
     }
-out:
-    av_free(results4);
-    return ret;
+
+    return 0;
 }

 static int roq_encode_video(RoqEncContext *enc)
 {
-    RoqTempdata *tempData = enc->tmpData;
+    RoqTempData *const tempData = &enc->tmp_data;
     RoqContext *const roq = &enc->common;
     int ret;

     memset(tempData, 0, sizeof(*tempData));

-    ret = generate_new_codebooks(enc, tempData);
+    ret = generate_new_codebooks(enc);
     if (ret < 0)
         return ret;

@@ -917,7 +910,7 @@ static int roq_encode_video(RoqEncContext *enc)

 retry_encode:
     for (int i = 0; i < roq->width * roq->height / 64; i++)
-        gather_data_for_cel(enc->cel_evals + i, enc, tempData);
+        gather_data_for_cel(enc->cel_evals + i, enc);

     /* Quake 3 can't handle chunks bigger than 65535 bytes */
     if (tempData->mainChunkSize/8 > 65535 && enc->quake3_compat) {
@@ -940,11 +933,11 @@ static int roq_encode_video(RoqEncContext *enc)
         goto retry_encode;
     }

-    remap_codebooks(enc, tempData);
+    remap_codebooks(enc);

-    write_codebooks(enc, tempData);
+    write_codebooks(enc);

-    reconstruct_and_encode_image(enc, tempData, roq->width, roq->height,
+    reconstruct_and_encode_image(enc, roq->width, roq->height,
                                  roq->width * roq->height / 64);

     /* Rotate frame history */
@@ -964,7 +957,6 @@ static av_cold int roq_encode_end(AVCodecContext *avctx)
     av_frame_free(&enc->common.current_frame);
     av_frame_free(&enc->common.last_frame);

-    av_freep(&enc->tmpData);
     av_freep(&enc->cel_evals);
     av_freep(&enc->closest_cb);
     av_freep(&enc->this_motion4);
@@ -1009,8 +1001,6 @@ static av_cold int roq_encode_init(AVCodecContext *avctx)
     if (!roq->last_frame || !roq->current_frame)
         return AVERROR(ENOMEM);

-    enc->tmpData = av_malloc(sizeof(RoqTempdata));
-
     enc->this_motion4 =
         av_mallocz_array(roq->width * roq->height / 16, sizeof(motion_vect));

@@ -1028,7 +1018,7 @@ static av_cold int roq_encode_init(AVCodecContext *avctx)
     enc->closest_cb =
         av_malloc_array(roq->width * roq->height, 3 * sizeof(int));

-    if (!enc->tmpData || !enc->this_motion4 || !enc->last_motion4 ||
+    if (!enc->this_motion4 || !enc->last_motion4 ||
         !enc->this_motion8 || !enc->last_motion8 || !enc->closest_cb)
         return AVERROR(ENOMEM);
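
For readers skimming the diff, the pattern the commit message describes (a scratch buffer whose size is a compile-time constant becomes a member of the context instead of being allocated per call) can be sketched in isolation. This is only an illustration and not code from roqvideoenc.c: DemoEncContext, DEMO_MAX_ENTRIES and demo_encode_frame are hypothetical names, and the doubling loop merely stands in for real per-frame work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compile-time bound, standing in for MAX_CBS_4x4 and friends. */
#define DEMO_MAX_ENTRIES 256

/* Hypothetical context: the scratch buffer is a plain array member, so it
 * is allocated and released together with the context itself. */
typedef struct DemoEncContext {
    int frame_count;
    int scratch[6 * DEMO_MAX_ENTRIES]; /* size known at compile time */
} DemoEncContext;

/* Per-frame work that reuses ctx->scratch. Because nothing is allocated
 * here, there is no out-of-memory path and no cleanup label to jump to. */
int demo_encode_frame(DemoEncContext *ctx, const uint8_t *src, size_t n)
{
    int *buf = ctx->scratch;
    int sum = 0;

    /* The caller must stay within the fixed capacity of the buffer. */
    assert(n <= sizeof(ctx->scratch) / sizeof(ctx->scratch[0]));

    for (size_t i = 0; i < n; i++) {
        buf[i] = src[i] * 2; /* placeholder computation */
        sum += buf[i];
    }
    ctx->frame_count++;
    return sum;
}
```

A caller would zero-initialise one DemoEncContext (for example with memset) and pass it to demo_encode_frame for every frame. The trade-off is a larger context structure in exchange for fewer allocations and simpler error handling, which is the same exchange the patch makes when it replaces the `goto out` paths with plain returns.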