From patchwork Wed Aug 16 10:47:36 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 43235
From: Paul B Mahol
Date: Wed, 16 Aug 2023 12:47:36 +0200
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] tta decoder improvements
List-Id: FFmpeg development discussions and patches

Patch attached.

From 2b6ac4f7093157533b7f279a78a73bfabeb98cf0 Mon Sep 17 00:00:00 2001
From: Paul B Mahol
Date: Tue, 15 Aug 2023 21:13:59 +0200
Subject: [PATCH] avcodec/tta: switch to planar sample formats

Makes decoding a few percent faster.
Also fix code style while here.

Signed-off-by: Paul B Mahol
---
 libavcodec/tta.c | 167 +++++++++++++++++++++++++++++++----------------
 1 file changed, 109 insertions(+), 58 deletions(-)

diff --git a/libavcodec/tta.c b/libavcodec/tta.c
index 3e89571f16..6add4106d3 100644
--- a/libavcodec/tta.c
+++ b/libavcodec/tta.c
@@ -55,7 +55,7 @@ typedef struct TTAContext {
     unsigned data_length;
     int frame_length, last_frame_length;
 
-    int32_t *decode_buffer;
+    int32_t **decode_buffer;
 
     uint8_t crc_pass[8];
     uint8_t *pass;
@@ -107,10 +107,16 @@ static int allocate_buffers(AVCodecContext *avctx)
     TTAContext *s = avctx->priv_data;
 
     if (s->bps < 3) {
-        s->decode_buffer = av_calloc(s->frame_length,
-                                     sizeof(*s->decode_buffer) * s->channels);
+        s->decode_buffer = av_calloc(s->channels, sizeof(*s->decode_buffer));
         if (!s->decode_buffer)
             return AVERROR(ENOMEM);
+
+        for (int ch = 0; ch < s->channels; ch++) {
+            s->decode_buffer[ch] = av_calloc(s->frame_length,
+                                             sizeof(*s->decode_buffer[ch]));
+            if (!s->decode_buffer[ch])
+                return AVERROR(ENOMEM);
+        }
     } else
         s->decode_buffer = NULL;
     s->ch_ctx = av_malloc_array(avctx->ch_layout.nb_channels, sizeof(*s->ch_ctx));
@@ -181,14 +187,14 @@ static av_cold int tta_decode_init(AVCodecContext * avctx)
         }
 
         switch(s->bps) {
-        case 1: avctx->sample_fmt = AV_SAMPLE_FMT_U8; break;
+        case 1: avctx->sample_fmt = AV_SAMPLE_FMT_U8P; break;
         case 2:
-            avctx->sample_fmt = AV_SAMPLE_FMT_S16;
+            avctx->sample_fmt = AV_SAMPLE_FMT_S16P;
             break;
         case 3:
-            avctx->sample_fmt = AV_SAMPLE_FMT_S32;
+            avctx->sample_fmt = AV_SAMPLE_FMT_S32P;
             break;
-        //case 4: avctx->sample_fmt = AV_SAMPLE_FMT_S32; break;
+        //case 4: avctx->sample_fmt = AV_SAMPLE_FMT_S32P; break;
         default:
             av_log(avctx, AV_LOG_ERROR, "Invalid/unsupported sample format.\n");
             return AVERROR_INVALIDDATA;
@@ -231,10 +237,10 @@ static int tta_decode_frame(AVCodecContext *avctx, AVFrame *frame,
     const uint8_t *buf = avpkt->data;
     int buf_size = avpkt->size;
     TTAContext *s = avctx->priv_data;
+    const int bps = s->bps;
     GetBitContext gb;
     int i, ret;
     int cur_chan = 0, framelen = s->frame_length;
-    uint32_t *p;
 
     if (avctx->err_recognition & AV_EF_CRCCHECK) {
         if (buf_size < 4 ||
@@ -251,14 +257,13 @@ static int tta_decode_frame(AVCodecContext *avctx, AVFrame *frame,
         return ret;
 
     // decode directly to output buffer for 24-bit sample format
-    if (s->bps == 3)
-        s->decode_buffer = (int32_t *)frame->data[0];
+    if (bps == 3)
+        s->decode_buffer = (int32_t **)frame->extended_data;
 
     // init per channel states
     for (i = 0; i < s->channels; i++) {
         TTAFilter *filter = &s->ch_ctx[i].filter;
-        s->ch_ctx[i].predictor = 0;
-        ff_tta_filter_init(filter, ff_tta_filter_configs[s->bps-1]);
+        ff_tta_filter_init(filter, ff_tta_filter_configs[bps-1]);
         if (s->format == FORMAT_ENCRYPTED) {
             int i;
             for (i = 0; i < 8; i++)
@@ -268,9 +273,8 @@ static int tta_decode_frame(AVCodecContext *avctx, AVFrame *frame,
     }
 
     i = 0;
-    for (p = s->decode_buffer; (int32_t*)p < s->decode_buffer + (framelen * s->channels); p++) {
-        int32_t *predictor = &s->ch_ctx[cur_chan].predictor;
-        TTAFilter *filter = &s->ch_ctx[cur_chan].filter;
+    for (int j = 0; j < framelen * s->channels; j++) {
+        int32_t *p = s->decode_buffer[cur_chan] + i;
         TTARice *rice = &s->ch_ctx[cur_chan].rice;
         uint32_t unary, depth, k;
         int32_t value;
@@ -306,44 +310,24 @@ static int tta_decode_frame(AVCodecContext *avctx, AVFrame *frame,
             rice->sum1 += value - (rice->sum1 >> 4);
             if (rice->k1 > 0 && rice->sum1 < ff_tta_shift_16[rice->k1])
                 rice->k1--;
-            else if(rice->sum1 > ff_tta_shift_16[rice->k1 + 1])
+            else if (rice->sum1 > ff_tta_shift_16[rice->k1 + 1])
                 rice->k1++;
             value += ff_tta_shift_1[rice->k0];
         default:
             rice->sum0 += value - (rice->sum0 >> 4);
             if (rice->k0 > 0 && rice->sum0 < ff_tta_shift_16[rice->k0])
                 rice->k0--;
-            else if(rice->sum0 > ff_tta_shift_16[rice->k0 + 1])
+            else if (rice->sum0 > ff_tta_shift_16[rice->k0 + 1])
                 rice->k0++;
         }
 
         // extract coded value
         *p = 1 + ((value >> 1) ^ ((value & 1) - 1));
 
-        // run hybrid filter
-        s->dsp.filter_process(filter->qm, filter->dx, filter->dl, &filter->error, p,
-                              filter->shift, filter->round);
-
-        // fixed order prediction
-#define PRED(x, k) (int32_t)((((uint64_t)(x) << (k)) - (x)) >> (k))
-        switch (s->bps) {
-        case 1: *p += PRED(*predictor, 4); break;
-        case 2:
-        case 3: *p += PRED(*predictor, 5); break;
-        case 4: *p += *predictor; break;
-        }
-        *predictor = *p;
-
         // flip channels
         if (cur_chan < (s->channels-1))
             cur_chan++;
         else {
-            // decorrelate in case of multiple channels
-            if (s->channels > 1) {
-                int32_t *r = p - 1;
-                for (*p += *r / 2; r > (int32_t*)p - s->channels; r--)
-                    *r = *(r + 1) - *r;
-            }
             cur_chan = 0;
             i++;
             // check for last frame
@@ -354,6 +338,64 @@ static int tta_decode_frame(AVCodecContext *avctx, AVFrame *frame,
         }
     }
 
+    // run hybrid filter
+    for (int ch = 0; ch < s->channels; ch++) {
+        TTAFilter *filter = &s->ch_ctx[ch].filter;
+        const int32_t shift = filter->shift;
+        const int32_t round = filter->round;
+        int32_t *p = s->decode_buffer[ch];
+        int32_t error = filter->error;
+        int32_t *qm = filter->qm;
+        int32_t *dx = filter->dx;
+        int32_t *dl = filter->dl;
+
+        for (int n = 0; n < framelen; n++) {
+            s->dsp.filter_process(qm, dx, dl,
+                                  &error, &p[n],
+                                  shift, round);
+        }
+    }
+
+    // fixed order prediction
+#define PRED(x, k) (int32_t)((((uint64_t)(x) << (k)) - (x)) >> (k))
+    for (int ch = 0; ch < s->channels; ch++) {
+        int32_t *p = s->decode_buffer[ch];
+        int32_t predictor = 0;
+
+        switch (bps) {
+        case 1:
+            for (int n = 0; n < framelen; n++) {
+                p[n] += PRED(predictor, 4);
+                predictor = p[n];
+            }
+            break;
+        case 2:
+        case 3:
+            for (int n = 0; n < framelen; n++) {
+                p[n] += PRED(predictor, 5);
+                predictor = p[n];
+            }
+            break;
+        }
+    }
+
+    // decorrelate in case of multiple channels
+    if (s->channels > 1) {
+        int32_t *a = s->decode_buffer[s->channels-1];
+        int32_t *b = s->decode_buffer[s->channels-2];
+
+        for (int n = 0; n < framelen; n++)
+            a[n] += b[n] / 2;
+
+        for (int ch = s->channels - 1; ch >= 1; ch--) {
+            int32_t *b = s->decode_buffer[ch-1];
+            int32_t *c = s->decode_buffer[ch ];
+
+            for (int n = 0; n < framelen; n++)
+                b[n] = c[n] - b[n];
+        }
+    }
+
     align_get_bits(&gb);
     if (get_bits_left(&gb) < 32) {
         ret = AVERROR_INVALIDDATA;
@@ -362,31 +404,34 @@ static int tta_decode_frame(AVCodecContext *avctx, AVFrame *frame,
     skip_bits_long(&gb, 32); // frame crc
 
     // convert to output buffer
-    switch (s->bps) {
-    case 1: {
-        uint8_t *samples = (uint8_t *)frame->data[0];
-        p = s->decode_buffer;
-        for (i = 0; i < framelen * s->channels; i++)
-            samples[i] = p[i] + 0x80;
-        break;
+    switch (bps) {
+    case 1:
+        for (int ch = 0; ch < s->channels; ch++) {
+            uint8_t *samples = (uint8_t *)frame->extended_data[ch];
+            int32_t *p = s->decode_buffer[ch];
+            for (i = 0; i < framelen; i++)
+                samples[i] = p[i] + 0x80;
         }
-    case 2: {
-        int16_t *samples = (int16_t *)frame->data[0];
-        p = s->decode_buffer;
-        for (i = 0; i < framelen * s->channels; i++)
-            samples[i] = p[i];
         break;
+    case 2:
+        for (int ch = 0; ch < s->channels; ch++) {
+            int16_t *samples = (int16_t *)frame->extended_data[ch];
+            int32_t *p = s->decode_buffer[ch];
+            for (i = 0; i < framelen; i++)
+                samples[i] = p[i];
         }
-    case 3: {
-        // shift samples for 24-bit sample format
-        int32_t *samples = (int32_t *)frame->data[0];
+        break;
+    case 3:
+        for (int ch = 0; ch < s->channels; ch++) {
+            // shift samples for 24-bit sample format
+            int32_t *samples = (int32_t *)frame->extended_data[ch];
 
-        for (i = 0; i < framelen * s->channels; i++)
-            samples[i] = samples[i] * 256U;
+            for (i = 0; i < framelen; i++)
+                samples[i] = samples[i] * 256U;
+        }
         // reset decode buffer
         s->decode_buffer = NULL;
         break;
-        }
     }
 
     *got_frame_ptr = 1;
@@ -394,16 +439,22 @@ static int tta_decode_frame(AVCodecContext *avctx, AVFrame *frame,
     return buf_size;
 error:
     // reset decode buffer
-    if (s->bps == 3)
+    if (bps == 3)
         s->decode_buffer = NULL;
     return ret;
 }
 
-static av_cold int tta_decode_close(AVCodecContext *avctx) {
+static av_cold int tta_decode_close(AVCodecContext *avctx)
+{
     TTAContext *s = avctx->priv_data;
 
-    if (s->bps < 3)
+    if (s->bps < 3) {
+        if (s->decode_buffer) {
+            for (int ch = 0; ch < s->channels; ch++)
+                av_freep(&s->decode_buffer[ch]);
+        }
         av_freep(&s->decode_buffer);
+    }
     s->decode_buffer = NULL;
     av_freep(&s->ch_ctx);
 
-- 
2.39.1
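
The channel decorrelation step is the part of this change where the move from one interleaved buffer to per-channel planes is easiest to get wrong, so it may help to look at it in isolation. What follows is a minimal standalone sketch, not code from libavcodec: decorrelate_interleaved, decorrelate_planar and layouts_agree are hypothetical names, the interleaved variant is written with array indices instead of the pointer walk removed by the patch, and the sample values are made up. It only illustrates that the two layouts should give the same per-channel results.

```c
#include <assert.h>
#include <stdint.h>

/* Interleaved layout: the samples of all channels for one time step are
 * adjacent. Index-based restatement of the per-sample loop the patch removes. */
static void decorrelate_interleaved(int32_t *buf, int channels, int framelen)
{
    assert(channels >= 1 && framelen >= 0);
    if (channels < 2)
        return;
    for (int n = 0; n < framelen; n++) {
        int32_t *s = buf + n * channels;          /* samples of time step n */
        s[channels - 1] += s[channels - 2] / 2;   /* fix up the last channel */
        for (int ch = channels - 2; ch >= 0; ch--)
            s[ch] = s[ch + 1] - s[ch];            /* walk back to channel 0 */
    }
}

/* Planar layout: one contiguous array per channel, as in the patched decoder. */
static void decorrelate_planar(int32_t **buf, int channels, int framelen)
{
    assert(channels >= 1 && framelen >= 0);
    if (channels < 2)
        return;
    int32_t *a = buf[channels - 1];
    int32_t *b = buf[channels - 2];
    for (int n = 0; n < framelen; n++)
        a[n] += b[n] / 2;
    for (int ch = channels - 1; ch >= 1; ch--) {
        int32_t *lo = buf[ch - 1];
        int32_t *hi = buf[ch];
        for (int n = 0; n < framelen; n++)
            lo[n] = hi[n] - lo[n];
    }
}

/* Hypothetical check: returns 1 if both variants agree on a small
 * two-channel input (values are illustrative only). */
static int layouts_agree(void)
{
    int32_t inter[6] = { 10, 4, -7, 3, 100, -50 }; /* c0 c1 c0 c1 c0 c1 */
    int32_t ch0[3]   = { 10, -7, 100 };
    int32_t ch1[3]   = { 4, 3, -50 };
    int32_t *planes[2] = { ch0, ch1 };

    decorrelate_interleaved(inter, 2, 3);
    decorrelate_planar(planes, 2, 3);

    for (int n = 0; n < 3; n++)
        if (inter[2 * n] != ch0[n] || inter[2 * n + 1] != ch1[n])
            return 0;
    return 1;
}
```

In the planar form each inner loop runs over a single contiguous array, which is the kind of access pattern a compiler can usually vectorize more readily than the strided interleaved walk; whether that accounts for the speedup quoted in the commit message is not something this sketch measures.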