From patchwork Sun Oct 20 20:05:24 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Niklas Haas
X-Patchwork-Id: 52416
From: Niklas Haas
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 20 Oct 2024 22:05:24 +0200
Message-ID: <20241020200851.1414766-16-ffmpeg@haasn.xyz>
X-Mailer: git-send-email 2.46.1
In-Reply-To:
 <20241020200851.1414766-1-ffmpeg@haasn.xyz>
References: <20241020200851.1414766-1-ffmpeg@haasn.xyz>
Subject: [FFmpeg-devel] [PATCH v3 15/18] swscale: introduce new, dynamic scaling API
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Niklas Haas
Sender: "ffmpeg-devel"

From: Niklas Haas

As part of a larger, ongoing effort to modernize and partially rewrite
libswscale, it was decided and generally agreed upon to introduce a new
public API for libswscale. This API is designed to be less stateful, more
explicitly defined, and considerably easier to use than the existing one.

Most of the API work has already been accomplished in the previous commits;
this commit merely introduces the ability to use sws_scale_frame()
dynamically, without prior sws_init_context() calls. Instead, the new API
takes frame properties from the frames themselves, and the implementation is
based on the new SwsGraph API, which we simply reinitialize as needed.

This high-level wrapper also recreates the logic that used to live inside
vf_scale for scaling interlaced frames, enabling it to be reused more easily
by end users.

Finally, this function is designed to simply copy refs directly when nothing
needs to be done, substantially improving throughput of the noop fast path.

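For illustration only (not part of the patch): the noop fast path described
above can be modelled with a toy refcounted buffer. This is a minimal sketch
under stated assumptions. ToyBuf, ToyFrame and every toy_* function are
hypothetical stand-ins invented for this example; they are not AVBufferRef,
AVFrame or any libswscale API, and the "conversion" is just a memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a refcounted buffer; this is NOT AVBufferRef. */
typedef struct ToyBuf {
    unsigned char *data;
    size_t size;
    int refcount;
} ToyBuf;

/* Hypothetical single-plane frame; this is NOT AVFrame. */
typedef struct ToyFrame {
    ToyBuf *buf;          /* owning reference, NULL if unallocated */
    unsigned char *data;  /* plane pointer, NULL if unallocated */
} ToyFrame;

static ToyBuf *toy_buf_alloc(size_t size)
{
    ToyBuf *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->data = calloc(1, size ? size : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->size = size;
    b->refcount = 1;
    return b;
}

/* Drop the frame's reference; free the buffer when the last one goes away. */
static void toy_frame_unref(ToyFrame *f)
{
    if (f->buf && --f->buf->refcount == 0) {
        free(f->buf->data);
        free(f->buf);
    }
    f->buf = NULL;
    f->data = NULL;
}

/* Returns 0 on success, -1 on failure. `noop` says whether the
 * (hypothetical) conversion would leave the pixels unchanged. */
static int toy_scale_frame(ToyFrame *dst, const ToyFrame *src, int noop)
{
    if (!src->buf)
        return -1; /* toy restriction: the source must be refcounted */

    if (noop && !dst->buf && !dst->data) {
        /* Fast path: share the source buffer instead of copying it. */
        src->buf->refcount++;
        dst->buf = src->buf;
        dst->data = src->data;
        return 0;
    }

    if (!dst->data) {
        /* Slow path, destination left clear: allocate it ourselves. */
        dst->buf = toy_buf_alloc(src->buf->size);
        if (!dst->buf)
            return -1;
        dst->data = dst->buf->data;
    }

    /* Stand-in for the real conversion: a plain copy. Assumes that a
     * caller-provided destination is at least as large as the source. */
    memcpy(dst->data, src->data, src->buf->size);
    return 0;
}
```

With a cleared destination and noop set, the destination ends up pointing at
the source's memory and only the reference count changes. With noop unset, or
with a destination already allocated by the caller, the data is copied.
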
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas
---
 libswscale/swscale.c          | 196 ++++++++++++++++++++++++++++++++--
 libswscale/swscale.h          |  89 +++++++++++----
 libswscale/swscale_internal.h |   7 +-
 libswscale/utils.c            |   4 +
 libswscale/x86/output.asm     |   2 +-
 5 files changed, 269 insertions(+), 29 deletions(-)

diff --git a/libswscale/swscale.c b/libswscale/swscale.c
index bb4faaa708..7751123ba2 100644
--- a/libswscale/swscale.c
+++ b/libswscale/swscale.c
@@ -1209,21 +1209,205 @@ int sws_receive_slice(SwsContext *sws, unsigned int slice_start,
                       dst, c->frame_dst->linesize, slice_start, slice_height);
 }
 
+static void get_frame_pointers(const AVFrame *frame, uint8_t *data[4],
+                               int linesize[4], int field)
+{
+    for (int i = 0; i < 4; i++) {
+        data[i] = frame->data[i];
+        linesize[i] = frame->linesize[i];
+    }
+
+    if (!(frame->flags & AV_FRAME_FLAG_INTERLACED)) {
+        av_assert1(!field);
+        return;
+    }
+
+    if (field == FIELD_BOTTOM) {
+        /* Odd rows, offset by one line */
+        const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(frame->format);
+        for (int i = 0; i < 4; i++) {
+            data[i] += linesize[i];
+            if (desc->flags & AV_PIX_FMT_FLAG_PAL)
+                break;
+        }
+    }
+
+    /* Take only every second line */
+    for (int i = 0; i < 4; i++)
+        linesize[i] <<= 1;
+}
+
+/* Subset of av_frame_ref() that only references (video) data buffers */
+static int frame_ref(AVFrame *dst, const AVFrame *src)
+{
+    /* ref the buffers */
+    for (int i = 0; i < FF_ARRAY_ELEMS(src->buf); i++) {
+        if (!src->buf[i])
+            continue;
+        dst->buf[i] = av_buffer_ref(src->buf[i]);
+        if (!dst->buf[i])
+            return AVERROR(ENOMEM);
+    }
+
+    memcpy(dst->data, src->data, sizeof(src->data));
+    memcpy(dst->linesize, src->linesize, sizeof(src->linesize));
+    return 0;
+}
+
 int sws_scale_frame(SwsContext *sws, AVFrame *dst, const AVFrame *src)
 {
     int ret;
+    SwsInternal *c = sws_internal(sws);
+    if (!src || !dst)
+        return AVERROR(EINVAL);
+
+    if (c->frame_src) {
+        /* Context has been initialized with explicit values, fall back to
+         * legacy API */
+        ret = sws_frame_start(sws, dst, src);
+        if (ret < 0)
+            return ret;
+
+        ret = sws_send_slice(sws, 0, src->height);
+        if (ret >= 0)
+            ret = sws_receive_slice(sws, 0, dst->height);
 
-    ret = sws_frame_start(sws, dst, src);
+        sws_frame_end(sws);
+
+        return ret;
+    }
+
+    ret = sws_frame_setup(sws, dst, src);
     if (ret < 0)
         return ret;
 
-    ret = sws_send_slice(sws, 0, src->height);
-    if (ret >= 0)
-        ret = sws_receive_slice(sws, 0, dst->height);
+    if (!src->data[0])
+        return 0;
 
-    sws_frame_end(sws);
+    if (c->graph[FIELD_TOP]->noop &&
+        (!c->graph[FIELD_BOTTOM] || c->graph[FIELD_BOTTOM]->noop) &&
+        src->buf[0] && !dst->buf[0] && !dst->data[0])
+    {
+        /* Lightweight refcopy */
+        ret = frame_ref(dst, src);
+        if (ret < 0)
+            return ret;
+    } else {
+        if (!dst->data[0]) {
+            ret = av_frame_get_buffer(dst, 0);
+            if (ret < 0)
+                return ret;
+        }
 
-    return ret;
+        for (int field = 0; field < 2; field++) {
+            SwsGraph *graph = c->graph[field];
+            uint8_t *dst_data[4], *src_data[4];
+            int dst_linesize[4], src_linesize[4];
+            get_frame_pointers(dst, dst_data, dst_linesize, field);
+            get_frame_pointers(src, src_data, src_linesize, field);
+            sws_graph_run(graph, dst_data, dst_linesize,
+                          (const uint8_t **) src_data, src_linesize);
+            if (!graph->dst.interlaced)
+                break;
+        }
+    }
+
+    return 0;
+}
+
+static int validate_params(SwsContext *ctx)
+{
+#define VALIDATE(field, min, max) \
+    if (ctx->field < min || ctx->field > max) { \
+        av_log(ctx, AV_LOG_ERROR, "'%s' (%d) out of range [%d, %d]\n", \
+               #field, (int) ctx->field, min, max); \
+        return AVERROR(EINVAL); \
+    }
+
+    VALIDATE(threads, 0, 8192);
+    VALIDATE(dither, 0, SWS_DITHER_NB - 1)
+    VALIDATE(alpha_blend, 0, SWS_ALPHA_BLEND_NB - 1)
+    return 0;
+}
+
+int sws_frame_setup(SwsContext *ctx, const AVFrame *dst, const AVFrame *src)
+{
+    SwsInternal *s = ctx->internal;
+    const char *err_msg;
+    int ret;
+
+    if (!src || !dst)
+        return AVERROR(EINVAL);
+    if ((ret = validate_params(ctx)) < 0)
+        return ret;
+
+    for (int field = 0; field < 2; field++) {
+        SwsFormat src_fmt = ff_fmt_from_frame(src, field);
+        SwsFormat dst_fmt = ff_fmt_from_frame(dst, field);
+
+        if ((src->flags ^ dst->flags) & AV_FRAME_FLAG_INTERLACED) {
+            err_msg = "Cannot convert interlaced to progressive frames or vice versa";
+            ret = AVERROR(EINVAL);
+            goto fail;
+        }
+
+        /* TODO: remove once implemented */
+        if ((dst_fmt.prim != src_fmt.prim || dst_fmt.trc != src_fmt.trc) &&
+            !s->color_conversion_warned)
+        {
+            av_log(ctx, AV_LOG_WARNING, "Conversions between different primaries / "
+                   "transfer functions are not currently implemented, expect "
+                   "wrong results.\n");
+            s->color_conversion_warned = 1;
+        }
+
+        if (!ff_test_fmt(&src_fmt, 0)) {
+            err_msg = "Unsupported input";
+            ret = AVERROR(ENOTSUP);
+            goto fail;
+        }
+
+        if (!ff_test_fmt(&dst_fmt, 1)) {
+            err_msg = "Unsupported output";
+            ret = AVERROR(ENOTSUP);
+            goto fail;
+        }
+
+        ret = sws_graph_reinit(ctx, &dst_fmt, &src_fmt, field, &s->graph[field]);
+        if (ret < 0) {
+            err_msg = "Failed initializing scaling graph";
+            goto fail;
+        }
+
+        if (s->graph[field]->incomplete && ctx->flags & SWS_STRICT) {
+            err_msg = "Incomplete scaling graph";
+            ret = AVERROR(EINVAL);
+            goto fail;
+        }
+
+        if (!src_fmt.interlaced) {
+            sws_graph_free(&s->graph[FIELD_BOTTOM]);
+            break;
+        }
+
+        continue;
+
+    fail:
+        av_log(ctx, AV_LOG_ERROR, "%s (%s): fmt:%s csp:%s prim:%s trc:%s ->"
+                                  " fmt:%s csp:%s prim:%s trc:%s\n",
+               err_msg, av_err2str(ret),
+               av_get_pix_fmt_name(src_fmt.format), av_color_space_name(src_fmt.csp),
+               av_color_primaries_name(src_fmt.prim), av_color_transfer_name(src_fmt.trc),
+               av_get_pix_fmt_name(dst_fmt.format), av_color_space_name(dst_fmt.csp),
+               av_color_primaries_name(dst_fmt.prim), av_color_transfer_name(dst_fmt.trc));
+
+        for (int i = 0; i < FF_ARRAY_ELEMS(s->graph); i++)
+            sws_graph_free(&s->graph[i]);
+
+        return ret;
+    }
+
+    return 0;
 }
 
 /**
diff --git a/libswscale/swscale.h b/libswscale/swscale.h
index 6eeb0b0ea0..4b09f0b2d8 100644
--- a/libswscale/swscale.h
+++ b/libswscale/swscale.h
@@ -107,6 +107,12 @@ typedef enum SwsFlags {
     SWS_LANCZOS = 1 << 9, ///< 3-tap sinc/sinc
     SWS_SPLINE = 1 << 10, ///< cubic Keys spline
 
+    /**
+     * Return an error on underspecified conversions. Without this flag,
+     * unspecified fields are defaulted to sensible values.
+     */
+    SWS_STRICT = 1 << 11,
+
     /**
      * Emit verbose log of scaling parameters.
      */
@@ -209,7 +215,10 @@ typedef struct SwsContext {
     int gamma_flag;
 
     /**
-     * Frame property overrides.
+     * Deprecated frame property overrides, for the legacy API only.
+     *
+     * Ignored by sws_scale_frame() when used in dynamic mode, in which
+     * case all properties are instead taken from the frame directly.
      */
     int src_w, src_h; ///< Width and height of the source frame
     int dst_w, dst_h; ///< Width and height of the destination frame
@@ -221,6 +230,8 @@ typedef struct SwsContext {
     int src_h_chr_pos; ///< Source horizontal chroma position
     int dst_v_chr_pos; ///< Destination vertical chroma position
     int dst_h_chr_pos; ///< Destination horizontal chroma position
+
+    /* Remember to add new fields to graph.c:opts_equal() */
 } SwsContext;
 
 /**
@@ -289,12 +300,57 @@ int sws_test_transfer(enum AVColorTransferCharacteristic trc, int output);
  */
 int sws_test_frame(const AVFrame *frame, int output);
 
+/**
+ * Like `sws_scale_frame`, but without actually scaling. It will instead
+ * merely initialize internal state that *would* be required to perform the
+ * operation, as well as returning the correct error code for unsupported
+ * frame combinations.
+ *
+ * @param ctx The scaling context.
+ * @param dst The destination frame to consider.
+ * @param src The source frame to consider.
+ * @return 0 on success, a negative AVERROR code on failure.
+ */
+int sws_frame_setup(SwsContext *ctx, const AVFrame *dst, const AVFrame *src);
+
+/********************
+ * Main scaling API *
+ ********************/
+
 /**
  * Check if a given conversion is a noop. Returns a positive integer if
  * no operation needs to be performed, 0 otherwise.
  */
 int sws_is_noop(const AVFrame *dst, const AVFrame *src);
 
+/**
+ * Scale source data from `src` and write the output to `dst`.
+ *
+ * This function can be used directly on an allocated context, without setting
+ * up any frame properties or calling `sws_init_context()`. Such usage is fully
+ * dynamic and does not require reallocation if the frame properties change.
+ *
+ * Alternatively, this function can be called on a context that has been
+ * explicitly initialized. However, this is provided only for backwards
+ * compatibility. In this usage mode, all frame properties must be correctly
+ * set at init time, and may no longer change after initialization.
+ *
+ * @param c The scaling context.
+ * @param dst The destination frame. The data buffers may either be already
+ *            allocated by the caller or left clear, in which case they will
+ *            be allocated by the scaler. The latter may have performance
+ *            advantages - e.g. in certain cases some (or all) output planes
+ *            may be references to input planes, rather than copies.
+ * @param src The source frame. If the data buffers are set to NULL, then
+ *            this function behaves identically to `sws_frame_setup`.
+ * @return 0 on success, a negative AVERROR code on failure.
+ */
+int sws_scale_frame(SwsContext *c, AVFrame *dst, const AVFrame *src);
+
+/*************************
+ * Legacy (stateful) API *
+ *************************/
+
 #define SWS_SRC_V_CHR_DROP_MASK 0x30000
 #define SWS_SRC_V_CHR_DROP_SHIFT 16
 
@@ -357,6 +413,11 @@ int sws_isSupportedEndiannessConversion(enum AVPixelFormat pix_fmt);
 /**
  * Initialize the swscaler context sws_context.
  *
+ * This function is considered deprecated, and provided only for backwards
+ * compatibility with sws_scale() and sws_frame_start(). The preferred way to
+ * use libswscale is to set all frame properties correctly and call
+ * sws_scale_frame() directly, without explicitly initializing the context.
+ *
  * @return zero or positive value on success, a negative value on
  *         error
  */
@@ -398,7 +459,8 @@ SwsContext *sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat,
 /**
  * Scale the image slice in srcSlice and put the resulting scaled
  * slice in the image in dst. A slice is a sequence of consecutive
- * rows in an image.
+ * rows in an image. Requires a context that has previously been
+ * initialized with sws_init_context().
  *
  * Slices have to be provided in sequential order, either in
  * top-bottom or bottom-top order. If slices are provided in
@@ -425,27 +487,11 @@ int sws_scale(SwsContext *c, const uint8_t *const srcSlice[],
               const int srcStride[], int srcSliceY, int srcSliceH,
               uint8_t *const dst[], const int dstStride[]);
 
-/**
- * Scale source data from src and write the output to dst.
- *
- * This is merely a convenience wrapper around
- * - sws_frame_start()
- * - sws_send_slice(0, src->height)
- * - sws_receive_slice(0, dst->height)
- * - sws_frame_end()
- *
- * @param c The scaling context
- * @param dst The destination frame. See documentation for sws_frame_start() for
- *            more details.
- * @param src The source frame.
- *
- * @return 0 on success, a negative AVERROR code on failure
- */
-int sws_scale_frame(SwsContext *c, AVFrame *dst, const AVFrame *src);
-
 /**
  * Initialize the scaling process for a given pair of source/destination frames.
  * Must be called before any calls to sws_send_slice() and sws_receive_slice().
+ * Requires a context that has previously been initialized with
+ * sws_init_context().
  *
  * This function will retain references to src and dst, so they must both use
  * refcounted buffers (if allocated by the caller, in case of dst).
@@ -516,7 +562,8 @@ int sws_receive_slice(SwsContext *c, unsigned int slice_start,
                       unsigned int slice_height);
 
 /**
- * Get the alignment required for slices
+ * Get the alignment required for slices. Requires a context that has
+ * previously been initialized with sws_init_context().
  *
  * @param c The scaling context
  * @return alignment required for output slices requested with sws_receive_slice().
diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h
index f1a31775d5..69f8e8a838 100644
--- a/libswscale/swscale_internal.h
+++ b/libswscale/swscale_internal.h
@@ -26,6 +26,7 @@
 #include "config.h"
 
 #include "swscale.h"
+#include "graph.h"
 
 #include "libavutil/avassert.h"
 #include "libavutil/common.h"
@@ -323,6 +324,9 @@ struct SwsInternal {
     int *slice_err;
     int nb_slice_ctx;
 
+    /* Scaling graph, reinitialized dynamically as needed. */
+    SwsGraph *graph[2]; /* top, bottom fields */
+
     // values passed to current sws_receive_slice() call
     int dst_slice_start;
     int dst_slice_height;
@@ -663,6 +667,7 @@ struct SwsInternal {
     unsigned int dst_slice_align;
     atomic_int stride_unaligned_warned;
     atomic_int data_unaligned_warned;
+    int color_conversion_warned;
     Half2FloatTables *h2f_tables;
 };
 
@@ -674,7 +679,7 @@ static_assert(offsetof(SwsInternal, redDither) + DITHER32_INT == offsetof(SwsInt
 #if ARCH_X86_64
 /* x86 yuv2gbrp uses the SwsInternal for yuv coefficients
    if struct offsets change the asm needs to be updated too */
-static_assert(offsetof(SwsInternal, yuv2rgb_y_offset) == 40332,
+static_assert(offsetof(SwsInternal, yuv2rgb_y_offset) == 40348,
               "yuv2rgb_y_offset must be updated in x86 asm");
 #endif
 
diff --git a/libswscale/utils.c b/libswscale/utils.c
index b8d478b104..f56d2926b8 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -61,6 +61,7 @@
 #include "swscale.h"
 #include "swscale_internal.h"
 #include "utils.h"
+#include "graph.h"
 
 typedef struct FormatEntry {
     uint8_t is_supported_in :1;
@@ -2461,6 +2462,9 @@ void sws_freeContext(SwsContext *sws)
     if (!c)
         return;
 
+    for (i = 0; i < FF_ARRAY_ELEMS(c->graph); i++)
+        sws_graph_free(&c->graph[i]);
+
     for (i = 0; i < c->nb_slice_ctx; i++)
         sws_freeContext(c->slice_ctx[i]);
     av_freep(&c->slice_ctx);
diff --git a/libswscale/x86/output.asm b/libswscale/x86/output.asm
index 7a1e5d9bc1..f2e884780a 100644
--- a/libswscale/x86/output.asm
+++ b/libswscale/x86/output.asm
@@ -582,7 +582,7 @@ yuv2nv12cX_fn yuv2nv21
 
 %if ARCH_X86_64
 struc SwsInternal
-    .padding: resb 40332 ; offsetof(SwsInternal, yuv2rgb_y_offset)
+    .padding: resb 40348 ; offsetof(SwsInternal, yuv2rgb_y_offset)
     .yuv2rgb_y_offset: resd 1
     .yuv2rgb_y_coeff: resd 1
     .yuv2rgb_v2r_coeff: resd 1
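
Note for readers, not part of the patch: the per-field addressing that
get_frame_pointers() performs in the swscale.c hunk can be tried out in
isolation. The following is a rough standalone sketch, assuming plain arrays
instead of an AVFrame. The names toy_field_pointers, TOY_FIELD_TOP and
TOY_FIELD_BOTTOM are hypothetical and exist only in this example. It differs
from the patch in three ways: it skips NULL planes, it omits the special case
for paletted formats, and it doubles the stride by multiplication rather than
by a shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field indices for this sketch only. */
enum { TOY_FIELD_TOP = 0, TOY_FIELD_BOTTOM = 1 };

/*
 * Compute plane pointers and strides that address a single field of an
 * interlaced image. The bottom field starts one row down, and both fields
 * step over every second row by using twice the original stride.
 * For progressive input the pointers and strides are returned unchanged.
 */
static void toy_field_pointers(uint8_t *const frame_data[4],
                               const int frame_linesize[4],
                               int interlaced, int field,
                               uint8_t *data[4], int linesize[4])
{
    for (int i = 0; i < 4; i++) {
        data[i] = frame_data[i];
        linesize[i] = frame_linesize[i];
    }

    if (!interlaced) {
        assert(field == TOY_FIELD_TOP);
        return;
    }

    if (field == TOY_FIELD_BOTTOM) {
        /* Odd rows: offset every used plane by one line. */
        for (int i = 0; i < 4; i++) {
            if (data[i])
                data[i] += linesize[i];
        }
    }

    /* Take only every second line. */
    for (int i = 0; i < 4; i++)
        linesize[i] *= 2;
}
```

For a single plane with a stride of 8 bytes, the bottom field then starts at
byte offset 8 and advances 16 bytes per row, so it visits rows 1, 3, 5 and so
on of the original image, while the top field visits rows 0, 2, 4 and so on.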