From patchwork Wed Aug 14 12:18:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Petro Mozil
X-Patchwork-Id: 51012
From: Petro Mozil
To: ffmpeg-devel@ffmpeg.org
Cc: Petro Mozil
Date: Wed, 14 Aug 2024 12:18:35 +0000
Message-ID: <20240814121856.757459-2-mozil.petryk@gmail.com>
X-Mailer: git-send-email 2.46.0
List-Id: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH 1/2] Add dirac vulkan hwaccel usage to diracdec.c

This patch adds a VC-2 (Dirac) Vulkan hwaccel to FFmpeg.
It was tested with FFmpeg's vc2 encoder and with the vc2-conformance software.

The following commands can be used to verify correctness:

# Encode vid as vc2
ffmpeg -i -vcodec vc2 input.vc2

# Decode with hwaccel
ffmpeg -init_hw_device "vulkan=vk:0" -hwaccel vulkan -i input.vc2 output.mkv

# Decode without hwaccel
ffmpeg -i input.vc2 output.mkv

Signed-off-by: Petro Mozil
---
 libavcodec/diracdec.c | 336 +++++++++++------------------------------------
 libavcodec/diracdec.h | 267 +++++++++++++++++++++++++++++++++
 2 files changed, 355 insertions(+), 248 deletions(-)
 create mode 100644 libavcodec/diracdec.h

diff --git a/libavcodec/diracdec.c b/libavcodec/diracdec.c index 76209aebba..542824f6e1 100644 --- a/libavcodec/diracdec.c +++ b/libavcodec/diracdec.c @@ -26,228 +26,11 @@ * @author Marco Gerards , David Conrad, Jordi Ortiz */ -#include "libavutil/mem.h" -#include "libavutil/mem_internal.h" -#include "libavutil/pixdesc.h" -#include "libavutil/thread.h" -#include "avcodec.h" -#include "get_bits.h" -#include "codec_internal.h" -#include "decode.h" -#include "golomb.h" -#include "dirac_arith.h" 
-#include "dirac_vlc.h" -#include "mpegvideoencdsp.h" -#include "dirac_dwt.h" -#include "dirac.h" -#include "diractab.h" -#include "diracdsp.h" -#include "videodsp.h" - -#define EDGE_WIDTH 16 - -/** - * The spec limits this to 3 for frame coding, but in practice can be as high as 6 - */ -#define MAX_REFERENCE_FRAMES 8 -#define MAX_DELAY 5 /* limit for main profile for frame coding (TODO: field coding) */ -#define MAX_FRAMES (MAX_REFERENCE_FRAMES + MAX_DELAY + 1) -#define MAX_QUANT 255 /* max quant for VC-2 */ -#define MAX_BLOCKSIZE 32 /* maximum xblen/yblen we support */ - -/** - * DiracBlock->ref flags, if set then the block does MC from the given ref - */ -#define DIRAC_REF_MASK_REF1 1 -#define DIRAC_REF_MASK_REF2 2 -#define DIRAC_REF_MASK_GLOBAL 4 - -/** - * Value of Picture.reference when Picture is not a reference picture, but - * is held for delayed output. - */ -#define DELAYED_PIC_REF 4 - -#define CALC_PADDING(size, depth) \ - (((size + (1 << depth) - 1) >> depth) << depth) - -#define DIVRNDUP(a, b) (((a) + (b) - 1) / (b)) - -typedef struct { - AVFrame *avframe; - int interpolated[3]; /* 1 if hpel[] is valid */ - uint8_t *hpel[3][4]; - uint8_t *hpel_base[3][4]; - int reference; - unsigned picture_number; -} DiracFrame; - -typedef struct { - union { - int16_t mv[2][2]; - int16_t dc[3]; - } u; /* anonymous unions aren't in C99 :( */ - uint8_t ref; -} DiracBlock; - -typedef struct SubBand { - int level; - int orientation; - int stride; /* in bytes */ - int width; - int height; - int pshift; - int quant; - uint8_t *ibuf; - struct SubBand *parent; - - /* for low delay */ - unsigned length; - const uint8_t *coeff_data; -} SubBand; - -typedef struct Plane { - DWTPlane idwt; - - int width; - int height; - ptrdiff_t stride; - - /* block length */ - uint8_t xblen; - uint8_t yblen; - /* block separation (block n+1 starts after this many pixels in block n) */ - uint8_t xbsep; - uint8_t ybsep; - /* amount of overspill on each edge (half of the overlap between blocks) */ 
- uint8_t xoffset; - uint8_t yoffset; - - SubBand band[MAX_DWT_LEVELS][4]; -} Plane; - -/* Used by Low Delay and High Quality profiles */ -typedef struct DiracSlice { - GetBitContext gb; - int slice_x; - int slice_y; - int bytes; -} DiracSlice; - -typedef struct DiracContext { - AVCodecContext *avctx; - MpegvideoEncDSPContext mpvencdsp; - VideoDSPContext vdsp; - DiracDSPContext diracdsp; - DiracVersionInfo version; - GetBitContext gb; - AVDiracSeqHeader seq; - int seen_sequence_header; - int64_t frame_number; /* number of the next frame to display */ - Plane plane[3]; - int chroma_x_shift; - int chroma_y_shift; - - int bit_depth; /* bit depth */ - int pshift; /* pixel shift = bit_depth > 8 */ - - int zero_res; /* zero residue flag */ - int is_arith; /* whether coeffs use arith or golomb coding */ - int core_syntax; /* use core syntax only */ - int low_delay; /* use the low delay syntax */ - int hq_picture; /* high quality picture, enables low_delay */ - int ld_picture; /* use low delay picture, turns on low_delay */ - int dc_prediction; /* has dc prediction */ - int globalmc_flag; /* use global motion compensation */ - int num_refs; /* number of reference pictures */ - - /* wavelet decoding */ - unsigned wavelet_depth; /* depth of the IDWT */ - unsigned wavelet_idx; - - /** - * schroedinger older than 1.0.8 doesn't store - * quant delta if only one codebook exists in a band - */ - unsigned old_delta_quant; - unsigned codeblock_mode; - - unsigned num_x; /* number of horizontal slices */ - unsigned num_y; /* number of vertical slices */ - - uint8_t *thread_buf; /* Per-thread buffer for coefficient storage */ - int threads_num_buf; /* Current # of buffers allocated */ - int thread_buf_size; /* Each thread has a buffer this size */ - - DiracSlice *slice_params_buf; - int slice_params_num_buf; - - struct { - unsigned width; - unsigned height; - } codeblock[MAX_DWT_LEVELS+1]; - - struct { - AVRational bytes; /* average bytes per slice */ - uint8_t 
quant[MAX_DWT_LEVELS][4]; /* [DIRAC_STD] E.1 */ - } lowdelay; - - struct { - unsigned prefix_bytes; - uint64_t size_scaler; - } highquality; - - struct { - int pan_tilt[2]; /* pan/tilt vector */ - int zrs[2][2]; /* zoom/rotate/shear matrix */ - int perspective[2]; /* perspective vector */ - unsigned zrs_exp; - unsigned perspective_exp; - } globalmc[2]; - - /* motion compensation */ - uint8_t mv_precision; /* [DIRAC_STD] REFS_WT_PRECISION */ - int16_t weight[2]; /* [DIRAC_STD] REF1_WT and REF2_WT */ - unsigned weight_log2denom; /* [DIRAC_STD] REFS_WT_PRECISION */ - - int blwidth; /* number of blocks (horizontally) */ - int blheight; /* number of blocks (vertically) */ - int sbwidth; /* number of superblocks (horizontally) */ - int sbheight; /* number of superblocks (vertically) */ - - uint8_t *sbsplit; - DiracBlock *blmotion; - - uint8_t *edge_emu_buffer[4]; - uint8_t *edge_emu_buffer_base; - - uint16_t *mctmp; /* buffer holding the MC data multiplied by OBMC weights */ - uint8_t *mcscratch; - int buffer_stride; - - DECLARE_ALIGNED(16, uint8_t, obmc_weight)[3][MAX_BLOCKSIZE*MAX_BLOCKSIZE]; - - void (*put_pixels_tab[4])(uint8_t *dst, const uint8_t *src[5], int stride, int h); - void (*avg_pixels_tab[4])(uint8_t *dst, const uint8_t *src[5], int stride, int h); - void (*add_obmc)(uint16_t *dst, const uint8_t *src, int stride, const uint8_t *obmc_weight, int yblen); - dirac_weight_func weight_func; - dirac_biweight_func biweight_func; - - DiracFrame *current_picture; - DiracFrame *ref_pics[2]; - - DiracFrame *ref_frames[MAX_REFERENCE_FRAMES+1]; - DiracFrame *delay_frames[MAX_DELAY+1]; - DiracFrame all_frames[MAX_FRAMES]; -} DiracContext; - -enum dirac_subband { - subband_ll = 0, - subband_hl = 1, - subband_lh = 2, - subband_hh = 3, - subband_nb, -}; +#include "diracdec.h" +#include "hwaccels.h" +#include "hwconfig.h" +#include "libavutil/imgutils.h" +#include "config_components.h" /* magic number division by 3 from schroedinger */ static inline int divide3(int x) @@ 
-351,7 +134,7 @@ static int alloc_buffers(DiracContext *s, int stride) return 0; } -static av_cold void free_sequence_buffers(DiracContext *s) +static void free_sequence_buffers(DiracContext *s) { int i, j, k; @@ -403,8 +186,11 @@ static av_cold int dirac_decode_init(AVCodecContext *avctx) for (i = 0; i < MAX_FRAMES; i++) { s->all_frames[i].avframe = av_frame_alloc(); - if (!s->all_frames[i].avframe) + if (!s->all_frames[i].avframe) { + while (i > 0) + av_frame_free(&s->all_frames[--i].avframe); return AVERROR(ENOMEM); + } } ret = ff_thread_once(&dirac_arith_init, ff_dirac_init_arith_tables); if (ret != 0) @@ -413,7 +199,7 @@ static av_cold int dirac_decode_init(AVCodecContext *avctx) return 0; } -static av_cold void dirac_decode_flush(AVCodecContext *avctx) +static void dirac_decode_flush(AVCodecContext *avctx) { DiracContext *s = avctx->priv_data; free_sequence_buffers(s); @@ -426,9 +212,7 @@ static av_cold int dirac_decode_end(AVCodecContext *avctx) DiracContext *s = avctx->priv_data; int i; - // Necessary in case dirac_decode_init() failed - if (s->all_frames[MAX_FRAMES - 1].avframe) - free_sequence_buffers(s); + dirac_decode_flush(avctx); for (i = 0; i < MAX_FRAMES; i++) av_frame_free(&s->all_frames[i].avframe); @@ -812,14 +596,6 @@ static int decode_lowdelay_slice(AVCodecContext *avctx, void *arg) return 0; } -typedef struct SliceCoeffs { - int left; - int top; - int tot_h; - int tot_v; - int tot; -} SliceCoeffs; - static int subband_coeffs(const DiracContext *s, int x, int y, int p, SliceCoeffs c[MAX_DWT_LEVELS]) { @@ -1006,7 +782,10 @@ static int decode_lowdelay(DiracContext *s) return AVERROR_INVALIDDATA; } - avctx->execute2(avctx, decode_hq_slice_row, slices, NULL, s->num_y); + if (avctx->hwaccel) + FF_HW_CALL(avctx, decode_slice, NULL, 0); + else + avctx->execute2(avctx, decode_hq_slice_row, slices, NULL, s->num_y); } else { for (slice_y = 0; bufsize > 0 && slice_y < s->num_y; slice_y++) { for (slice_x = 0; bufsize > 0 && slice_x < s->num_x; slice_x++) { 
@@ -1873,7 +1652,13 @@ static int dirac_decode_frame_internal(DiracContext *s) { DWTContext d; int y, i, comp, dsty; - int ret; + int ret = -1; + + if (s->avctx->hwaccel) { + ret = FF_HW_CALL(s->avctx, start_frame, NULL, 0); + if (ret < 0) + return ret; + } if (s->low_delay) { /* [DIRAC_STD] 13.5.1 low_delay_transform_data() */ @@ -1889,6 +1674,14 @@ static int dirac_decode_frame_internal(DiracContext *s) } } + if (s->avctx->hwaccel) { + ret = ffhwaccel(s->avctx->hwaccel)->end_frame(s->avctx); + if (ret == 0) { + /* Hwaccel failed - fall back on software decoder */ + } + return ret; + } + for (comp = 0; comp < 3; comp++) { Plane *p = &s->plane[comp]; uint8_t *frame = s->current_picture->avframe->data[comp]; @@ -1904,6 +1697,7 @@ static int dirac_decode_frame_internal(DiracContext *s) if (ret < 0) return ret; } + ret = ff_spatial_idwt_init(&d, &p->idwt, s->wavelet_idx+2, s->wavelet_depth, s->bit_depth); if (ret < 0) @@ -1970,15 +1764,23 @@ static int get_buffer_with_edge(AVCodecContext *avctx, AVFrame *f, int flags) { int ret, i; int chroma_x_shift, chroma_y_shift; - ret = av_pix_fmt_get_chroma_sub_sample(avctx->pix_fmt, &chroma_x_shift, + DiracContext *s = avctx->priv_data; + ret = av_pix_fmt_get_chroma_sub_sample(s->sof_pix_fmt, &chroma_x_shift, &chroma_y_shift); if (ret < 0) return ret; + /*if (avctx->hwaccel) {*/ + /* f->width = s->plane[0].width;*/ + /* f->height = s->plane[0].height;*/ + /* ret = ff_get_buffer(avctx, f, flags);*/ + /* return ret;*/ + /*}*/ + f->width = avctx->width + 2 * EDGE_WIDTH; f->height = avctx->height + 2 * EDGE_WIDTH + 2; ret = ff_get_buffer(avctx, f, flags); - if (ret < 0) + if (ret < 0 || avctx->hwaccel) return ret; for (i = 0; f->data[i]; i++) { @@ -2136,6 +1938,7 @@ static int dirac_decode_data_unit(AVCodecContext *avctx, const uint8_t *buf, int init_get_bits(&s->gb, &buf[13], 8*(size - DATA_UNIT_HEADER_SIZE)); if (parse_code == DIRAC_PCODE_SEQ_HEADER) { + enum AVPixelFormat *pix_fmts; if (s->seen_sequence_header) return 0; @@ 
-2156,6 +1959,7 @@ static int dirac_decode_data_unit(AVCodecContext *avctx, const uint8_t *buf, int } ff_set_sar(avctx, dsh->sample_aspect_ratio); + s->sof_pix_fmt = dsh->pix_fmt; avctx->pix_fmt = dsh->pix_fmt; avctx->color_range = dsh->color_range; avctx->color_trc = dsh->color_trc; @@ -2172,7 +1976,20 @@ static int dirac_decode_data_unit(AVCodecContext *avctx, const uint8_t *buf, int s->pshift = s->bit_depth > 8; - ret = av_pix_fmt_get_chroma_sub_sample(avctx->pix_fmt, + /*if (s->pshift) {*/ + /* avctx->pix_fmt = s->sof_pix_fmt;*/ + /*} else {*/ + pix_fmts = (enum AVPixelFormat[]){ +#if CONFIG_DIRAC_VULKAN_HWACCEL + AV_PIX_FMT_VULKAN, +#endif + s->sof_pix_fmt, + AV_PIX_FMT_NONE, + }; + avctx->pix_fmt = ff_get_format(s->avctx, pix_fmts); + /*}*/ + + ret = av_pix_fmt_get_chroma_sub_sample(s->sof_pix_fmt, &s->chroma_x_shift, &s->chroma_y_shift); if (ret < 0) @@ -2202,9 +2019,10 @@ static int dirac_decode_data_unit(AVCodecContext *avctx, const uint8_t *buf, int } /* find an unused frame */ - for (i = 0; i < MAX_FRAMES; i++) + for (i = 0; i < MAX_FRAMES; i++) if (s->all_frames[i].avframe->data[0] == NULL) pic = &s->all_frames[i]; + if (!pic) { av_log(avctx, AV_LOG_ERROR, "framelist full\n"); return AVERROR_INVALIDDATA; @@ -2244,12 +2062,28 @@ static int dirac_decode_data_unit(AVCodecContext *avctx, const uint8_t *buf, int if ((ret = get_buffer_with_edge(avctx, pic->avframe, (parse_code & 0x0C) == 0x0C ? 
AV_GET_BUFFER_FLAG_REF : 0)) < 0) return ret; s->current_picture = pic; - s->plane[0].stride = pic->avframe->linesize[0]; - s->plane[1].stride = pic->avframe->linesize[1]; - s->plane[2].stride = pic->avframe->linesize[2]; - if (alloc_buffers(s, FFMAX3(FFABS(s->plane[0].stride), FFABS(s->plane[1].stride), FFABS(s->plane[2].stride))) < 0) - return AVERROR(ENOMEM); + if (s->avctx->hwaccel) { + if (!(s->low_delay && s->hq_picture)) { + av_log(avctx, AV_LOG_ERROR, "The HWaccel only supports VC-2\n"); + return AVERROR_INVALIDDATA; + } + + if (!s->hwaccel_picture_private) { + const FFHWAccel *hwaccel = ffhwaccel(s->avctx->hwaccel); + s->hwaccel_picture_private = + av_mallocz(hwaccel->frame_priv_data_size); + if (!s->hwaccel_picture_private) + return AVERROR(ENOMEM); + } + } else { + s->plane[0].stride = pic->avframe->linesize[0]; + s->plane[1].stride = pic->avframe->linesize[1]; + s->plane[2].stride = pic->avframe->linesize[2]; + + if (alloc_buffers(s, FFMAX3(FFABS(s->plane[0].stride), FFABS(s->plane[1].stride), FFABS(s->plane[2].stride))) < 0) + return AVERROR(ENOMEM); + } /* [DIRAC_STD] 11.1 Picture parse. 
picture_parse() */ ret = dirac_decode_picture_header(s); @@ -2359,6 +2193,7 @@ static int dirac_decode_frame(AVCodecContext *avctx, AVFrame *picture, return buf_idx; } + const FFCodec ff_dirac_decoder = { .p.name = "dirac", CODEC_LONG_NAME("BBC Dirac VC-2"), @@ -2370,5 +2205,10 @@ const FFCodec ff_dirac_decoder = { FF_CODEC_DECODE_CB(dirac_decode_frame), .p.capabilities = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_SLICE_THREADS | AV_CODEC_CAP_DR1, .flush = dirac_decode_flush, - .caps_internal = FF_CODEC_CAP_INIT_CLEANUP, + .hw_configs = (const AVCodecHWConfigInternal *const []) { +#if CONFIG_DIRAC_VULKAN_HWACCEL + HWACCEL_VULKAN(dirac), +#endif + NULL + }, }; diff --git a/libavcodec/diracdec.h b/libavcodec/diracdec.h new file mode 100644 index 0000000000..9c8dc14127 --- /dev/null +++ b/libavcodec/diracdec.h @@ -0,0 +1,267 @@ +/* + * Copyright (C) 2007 Marco Gerards + * Copyright (C) 2009 David Conrad + * Copyright (C) 2011 Jordi Ortiz + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/** + * @file + * Dirac Decoder + * @author Marco Gerards , David Conrad, Jordi Ortiz + */ + + +#ifndef DIRACDEC_H +#define DIRACDEC_H + +#include "libavutil/mem.h" +#include "libavutil/mem_internal.h" +#include "libavutil/pixdesc.h" +#include "libavutil/thread.h" +#include "avcodec.h" +#include "get_bits.h" +#include "codec_internal.h" +#include "decode.h" +#include "golomb.h" +#include "dirac_arith.h" +#include "dirac_vlc.h" +#include "mpegvideoencdsp.h" +#include "dirac_dwt.h" +#include "dirac.h" +#include "diractab.h" +#include "diracdsp.h" +#include "videodsp.h" +#include "hwaccel_internal.h" + +#define EDGE_WIDTH 16 + +/** + * The spec limits this to 3 for frame coding, but in practice can be as high as 6 + */ +#define MAX_REFERENCE_FRAMES 8 +#define MAX_DELAY 5 /* limit for main profile for frame coding (TODO: field coding) */ +#define MAX_FRAMES (MAX_REFERENCE_FRAMES + MAX_DELAY + 1) +#define MAX_QUANT 255 /* max quant for VC-2 */ +#define MAX_BLOCKSIZE 32 /* maximum xblen/yblen we support */ + +/** + * DiracBlock->ref flags, if set then the block does MC from the given ref + */ +#define DIRAC_REF_MASK_REF1 1 +#define DIRAC_REF_MASK_REF2 2 +#define DIRAC_REF_MASK_GLOBAL 4 + +/** + * Value of Picture.reference when Picture is not a reference picture, but + * is held for delayed output. 
+ */ +#define DELAYED_PIC_REF 4 + +#define CALC_PADDING(size, depth) \ + (((size + (1 << depth) - 1) >> depth) << depth) + +#define DIVRNDUP(a, b) (((a) + (b) - 1) / (b)) + +typedef struct { + AVFrame *avframe; + int interpolated[3]; /* 1 if hpel[] is valid */ + uint8_t *hpel[3][4]; + uint8_t *hpel_base[3][4]; + int reference; + unsigned picture_number; +} DiracFrame; + +typedef struct { + union { + int16_t mv[2][2]; + int16_t dc[3]; + } u; /* anonymous unions aren't in C99 :( */ + uint8_t ref; +} DiracBlock; + +typedef struct SubBand { + int level; + int orientation; + int stride; /* in bytes */ + int width; + int height; + int pshift; + int quant; + uint8_t *ibuf; + struct SubBand *parent; + + /* for low delay */ + unsigned length; + const uint8_t *coeff_data; +} SubBand; + +typedef struct Plane { + DWTPlane idwt; + + int width; + int height; + ptrdiff_t stride; + + /* block length */ + uint8_t xblen; + uint8_t yblen; + /* block separation (block n+1 starts after this many pixels in block n) */ + uint8_t xbsep; + uint8_t ybsep; + /* amount of overspill on each edge (half of the overlap between blocks) */ + uint8_t xoffset; + uint8_t yoffset; + + SubBand band[MAX_DWT_LEVELS][4]; +} Plane; + +/* Used by Low Delay and High Quality profiles */ +typedef struct DiracSlice { + GetBitContext gb; + int slice_x; + int slice_y; + int bytes; +} DiracSlice; + +typedef struct DiracContext { + AVCodecContext *avctx; + MpegvideoEncDSPContext mpvencdsp; + VideoDSPContext vdsp; + DiracDSPContext diracdsp; + DiracVersionInfo version; + GetBitContext gb; + AVDiracSeqHeader seq; + enum AVPixelFormat sof_pix_fmt; + void *hwaccel_picture_private; + int seen_sequence_header; + int64_t frame_number; /* number of the next frame to display */ + Plane plane[3]; + int chroma_x_shift; + int chroma_y_shift; + + int bit_depth; /* bit depth */ + int pshift; /* pixel shift = bit_depth > 8 */ + + int zero_res; /* zero residue flag */ + int is_arith; /* whether coeffs use arith or golomb coding */ 
+ int core_syntax; /* use core syntax only */ + int low_delay; /* use the low delay syntax */ + int hq_picture; /* high quality picture, enables low_delay */ + int ld_picture; /* use low delay picture, turns on low_delay */ + int dc_prediction; /* has dc prediction */ + int globalmc_flag; /* use global motion compensation */ + int num_refs; /* number of reference pictures */ + + /* wavelet decoding */ + unsigned wavelet_depth; /* depth of the IDWT */ + unsigned wavelet_idx; + + /** + * schroedinger older than 1.0.8 doesn't store + * quant delta if only one codebook exists in a band + */ + unsigned old_delta_quant; + unsigned codeblock_mode; + + unsigned num_x; /* number of horizontal slices */ + unsigned num_y; /* number of vertical slices */ + + uint8_t *thread_buf; /* Per-thread buffer for coefficient storage */ + int threads_num_buf; /* Current # of buffers allocated */ + int thread_buf_size; /* Each thread has a buffer this size */ + + DiracSlice *slice_params_buf; + int slice_params_num_buf; + + struct { + unsigned width; + unsigned height; + } codeblock[MAX_DWT_LEVELS+1]; + + struct { + AVRational bytes; /* average bytes per slice */ + uint8_t quant[MAX_DWT_LEVELS][4]; /* [DIRAC_STD] E.1 */ + } lowdelay; + + struct { + unsigned prefix_bytes; + uint64_t size_scaler; + } highquality; + + struct { + int pan_tilt[2]; /* pan/tilt vector */ + int zrs[2][2]; /* zoom/rotate/shear matrix */ + int perspective[2]; /* perspective vector */ + unsigned zrs_exp; + unsigned perspective_exp; + } globalmc[2]; + + /* motion compensation */ + uint8_t mv_precision; /* [DIRAC_STD] REFS_WT_PRECISION */ + int16_t weight[2]; /* [DIRAC_STD] REF1_WT and REF2_WT */ + unsigned weight_log2denom; /* [DIRAC_STD] REFS_WT_PRECISION */ + + int blwidth; /* number of blocks (horizontally) */ + int blheight; /* number of blocks (vertically) */ + int sbwidth; /* number of superblocks (horizontally) */ + int sbheight; /* number of superblocks (vertically) */ + + uint8_t *sbsplit; + DiracBlock 
*blmotion; + + uint8_t *edge_emu_buffer[4]; + uint8_t *edge_emu_buffer_base; + + uint16_t *mctmp; /* buffer holding the MC data multiplied by OBMC weights */ + uint8_t *mcscratch; + int buffer_stride; + + DECLARE_ALIGNED(16, uint8_t, obmc_weight)[3][MAX_BLOCKSIZE*MAX_BLOCKSIZE]; + + void (*put_pixels_tab[4])(uint8_t *dst, const uint8_t *src[5], int stride, int h); + void (*avg_pixels_tab[4])(uint8_t *dst, const uint8_t *src[5], int stride, int h); + void (*add_obmc)(uint16_t *dst, const uint8_t *src, int stride, const uint8_t *obmc_weight, int yblen); + dirac_weight_func weight_func; + dirac_biweight_func biweight_func; + + DiracFrame *current_picture; + DiracFrame *ref_pics[2]; + + DiracFrame *ref_frames[MAX_REFERENCE_FRAMES+1]; + DiracFrame *delay_frames[MAX_DELAY+1]; + DiracFrame all_frames[MAX_FRAMES]; +} DiracContext; + +enum dirac_subband { + subband_ll = 0, + subband_hl = 1, + subband_lh = 2, + subband_hh = 3, + subband_nb, +}; + +typedef struct SliceCoeffs { + int left; + int top; + int tot_h; + int tot_v; + int tot; +} SliceCoeffs; + +#endif
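
Note on the init error path: the patch drops FF_CODEC_CAP_INIT_CLEANUP from ff_dirac_decoder, so dirac_decode_init() itself has to release the frames it already allocated when a later av_frame_alloc() fails. The standalone sketch below shows only that unwind loop. It is an illustration, not FFmpeg code: demo_alloc(), demo_free(), init_slots(), MAX_SLOTS and the fail_at knob are hypothetical stand-ins for av_frame_alloc(), av_frame_free(), dirac_decode_init() and MAX_FRAMES.

```c
#include <assert.h>
#include <stdlib.h>

/* Stands in for MAX_FRAMES, which diracdec.h defines as 8 + 5 + 1. */
#define MAX_SLOTS 14

/* Hypothetical test allocator: the call numbered fail_at (0-based) fails.
 * A value of -1 means no call ever fails. */
static int alloc_calls;
static int fail_at = -1;
static int live_allocs;

static void *demo_alloc(void)
{
    if (alloc_calls++ == fail_at)
        return NULL;
    live_allocs++;
    return malloc(1);
}

/* Frees *p and resets it to NULL, in the style of av_frame_free(). */
static void demo_free(void **p)
{
    assert(p);
    if (*p) {
        free(*p);
        *p = NULL;
        live_allocs--;
    }
}

/* Returns 0 on success and -1 on failure.
 * On failure every slot allocated so far is released again. */
static int init_slots(void *slots[MAX_SLOTS])
{
    for (int i = 0; i < MAX_SLOTS; i++) {
        slots[i] = demo_alloc();
        if (!slots[i]) {
            /* Walk back over slots i-1 .. 0, as the patch does. */
            while (i > 0)
                demo_free(&slots[--i]);
            return -1;
        }
    }
    return 0;
}
```

With fail_at set to 5, init_slots() allocates slots 0 to 4, fails on slot 5, frees the five earlier slots and returns -1, so the caller never sees a half-initialized array.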
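
Note on pixel format negotiation: in the sequence header path the patch builds a candidate list (AV_PIX_FMT_VULKAN when the hwaccel is configured, then the stream's software format, then AV_PIX_FMT_NONE) and passes it to ff_get_format(). Because avctx->pix_fmt may end up as the hardware format, the software format is kept separately in sof_pix_fmt and used for the chroma subsampling queries. The sketch below illustrates that ordering with a simplified chooser. Everything in it is an assumption made for illustration: enum demo_fmt, build_candidates(), choose_format() and the accept_* callbacks are hypothetical, and the real ff_get_format() does considerably more, including hwaccel setup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for AVPixelFormat values, not FFmpeg's enum. */
enum demo_fmt {
    DEMO_FMT_NONE = -1,   /* list terminator, like AV_PIX_FMT_NONE */
    DEMO_FMT_YUV420P,     /* a software format, like sof_pix_fmt */
    DEMO_FMT_YUV422P10,   /* another software format */
    DEMO_FMT_VULKAN       /* an opaque hardware format */
};

/* Fills out[] (capacity 3) in preference order: hardware first when
 * available, then the software format, then the terminator.
 * Returns the number of real candidates written. */
static size_t build_candidates(enum demo_fmt out[3], int have_hwaccel,
                               enum demo_fmt sw_fmt)
{
    size_t n = 0;

    if (have_hwaccel)
        out[n++] = DEMO_FMT_VULKAN;
    out[n++] = sw_fmt;
    out[n] = DEMO_FMT_NONE;
    return n;
}

/* Simplified chooser: the first candidate the caller accepts wins. */
static enum demo_fmt choose_format(const enum demo_fmt *candidates,
                                   int (*accepts)(enum demo_fmt))
{
    assert(candidates && accepts);
    for (; *candidates != DEMO_FMT_NONE; candidates++)
        if (accepts(*candidates))
            return *candidates;
    return DEMO_FMT_NONE;
}

/* Example caller policies. */
static int accept_any(enum demo_fmt f)
{
    (void)f;
    return 1;
}

static int accept_sw_only(enum demo_fmt f)
{
    return f != DEMO_FMT_VULKAN;
}
```

A caller that accepts anything gets the hardware format, while a caller that rejects it falls through to the software format. In both cases the software format remains known, which is the reason the patch stores it next to avctx->pix_fmt.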