From patchwork Sat Sep  8 13:49:29 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Timo Rothenpieler <timo@rothenpieler.org>
X-Patchwork-Id: 10279
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 2002:a02:12c4:0:0:0:0:0 with SMTP id 65-v6csp587673jap;
	Sat, 8 Sep 2018 06:49:50 -0700 (PDT)
X-Google-Smtp-Source: 
 ANB0VdZaNomJYhM/DM/NJ2LE5sV1Fw9tjb3iRPOUoFEg7CWjkMJuFnAnofpS+d9UN51Od7S6BykS
X-Received: by 2002:adf:ec85:: with SMTP id
	z5-v6mr9812048wrn.142.1536414590163;
	Sat, 08 Sep 2018 06:49:50 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1536414590; cv=none;
	d=google.com; s=arc-20160816;
	b=T7wNhWBn9UzG2xsW0jO3hyQzVQ4dHg14u5q1rWIjoLZu5SITw1OHauVc4syluToNH2
	vKILGdXL+I2Fe+9ONddRYRtSV5o52qQ8zd3sII1A+JoPr0J6BCYe0sc6V3qSXYmrk5Qp
	HzII09KZjIbz6DhX7jC88oJJ0RA9KbGlOxCzRbAmeqr7KJ/vvvOQnP8jDXBMGdeHtnaO
	67/9DeaOsGIcyKgp3bT009HRARKzQWwTeJAiz+UQEBpA4Gdkc8wO3uG4zdsQUsuyonzg
	WD2FVkUdkRLOpBmKXgwUFIrvDheQiiCSo3MjhM8SY1OgiAg8f9j+19OT7CFOtLMitP2A
	BpBw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
	s=arc-20160816;
	h=sender:errors-to:content-transfer-encoding:cc:reply-to
	:list-subscribe:list-help:list-post:list-archive:list-unsubscribe
	:list-id:precedence:subject:mime-version:references:in-reply-to
	:message-id:date:to:from:dkim-signature:delivered-to;
	bh=jp7Qx7q9wwEQArb7IzB5QbGJ8jzE+OHZo0MY6rvzsT0=;
	b=JXR+UrVDvh9XtPQzVw/wrGoG1F9aNdf4bjUz8Ui8dR8VY89hk83oOxzM8jC2mCfJBp
	7BXD3JOHpjA6n7t2aPksfbjkZVidA5AVQa2rdUKnUYlaYRRFglC6P1hQzcPjc81L+O6b
	ruoEFVQ3YZ1H1LlXrzIfXDKw7qljSHBu9VrbrCqyGBIV06LDL+ARH5mBuAz0S5ip9PXX
	cISDdmjyHCGeIv2MumxBz/sX3vfS/Yc6VdpTbTt/vTtOQg328vmoLcUN0IY3UNPVeqro
	pUYwVW4j7+ka8v3OzoKW4ffaDijuYbOCOYZO7q7j9bPwZZTnr3W+iiBey/llRi6VqXcz
	pjXw==
ARC-Authentication-Results: i=1; mx.google.com;
	dkim=neutral (body hash did not verify) header.i=@rothenpieler.org
	header.s=mail header.b=gnoGwkfy;
	spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
	dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=rothenpieler.org
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
	by mx.google.com with ESMTP id
	x9-v6si10103901wrx.283.2018.09.08.06.49.49;
	Sat, 08 Sep 2018 06:49:50 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
	dkim=neutral (body hash did not verify) header.i=@rothenpieler.org
	header.s=mail header.b=gnoGwkfy;
	spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
	dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=rothenpieler.org
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 3B66E68A0BC;
	Sat,  8 Sep 2018 16:49:39 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from btbn.de (btbn.de [5.9.118.179])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 5AB80680CA2
	for <ffmpeg-devel@ffmpeg.org>; Sat,  8 Sep 2018 16:49:32 +0300 (EEST)
Received: from localhost.localdomain
	(200116b864197700a50f90305f181013.dip.versatel-1u1.de
	[IPv6:2001:16b8:6419:7700:a50f:9030:5f18:1013])
	by btbn.de (Postfix) with ESMTPSA id 0090C254B8;
	Sat,  8 Sep 2018 15:49:39 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rothenpieler.org;
	s=mail; t=1536414580;
	bh=Opudo4y8y2wY4CmGpqIhc63bkDQnQvKCKBgykITKf+c=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References;
	b=gnoGwkfyD/DmZ+Jyefb6G17My2IFTOr5rwMVx6HRhrus/AfwIyuFzjZvrt88Qc7tM
	8FvZM50XQ/LpPo47soQokt/EVB9zMoYd8j5FZhwTs6E8DsPgsEsa+dyGlGRM9bjpv5
	wb4C9Q/vuG28ZWrOE6kRlGICc3ZZPQbPevBMMTWhVGrl9+eeiibs4yd5G5k+5NMRQ0
	CBFgHj2Z3rFrNmsB/fm8k1oFFbrcs1PPzHVL32j4AbdRtsTLReJj5xW+I/EUkXAMLH
	CYi9QczUQNzkLeun1HkITiOFKPEnikImMmRT4UUZ8jOGF9uAec1tVKei7fHmmzyo9h
	ND/Myi0bfcp0A==
From: Timo Rothenpieler <timo@rothenpieler.org>
To: ffmpeg-devel@ffmpeg.org
Date: Sat,  8 Sep 2018 15:49:29 +0200
Message-Id: <20180908134929.5720-1-timo@rothenpieler.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: 
 <CY4PR12MB1749D68FEF3ECD60F597DD3FD2030@CY4PR12MB1749.namprd12.prod.outlook.com>
References: 
 <CY4PR12MB1749D68FEF3ECD60F597DD3FD2030@CY4PR12MB1749.namprd12.prod.outlook.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH] avfilter: add nvidia NPP based transpose
	filter
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <http://ffmpeg.org/mailman/options/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <http://ffmpeg.org/pipermail/ffmpeg-devel/>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <http://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
Cc: ygupta@nvidia.com, Timo Rothenpieler <timo@rothenpieler.org>,
	rarzumanyan@nvidia.com
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

From: Roman Arzumanyan <rarzumanyan@nvidia.com>

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
---
I'm not overly fond of a rotate filter that only supports 90° angles
either.
So here's my modified version of the original transpose filter, which
now behaves exactly like the software transpose filter.
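
The direction handling can be sanity-checked on the CPU. The sketch
below is not NPP code and not part of the patch: it uses plain C helpers
(all names are hypothetical) on tightly packed 8-bit images. It assumes
that clock_flip is staged as a 180° rotation followed by a transpose, as
in this filter, and compares that against the documented meaning, a
clockwise rotation followed by a vertical flip. cclock_flip needs only
the transpose helper.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical CPU stand-ins, not NPP calls.
 * Images are 8-bit, tightly packed, row-major: pixel (x, y) is p[y * width + x]. */

#define MAX_PIXELS 64 /* arbitrary limit for the scratch buffers below */

/* Transpose a w x h image into an h x w image: out(x, y) = in(y, x). */
static void transpose(const unsigned char *in, int w, int h, unsigned char *out)
{
    for (int y = 0; y < w; y++)       /* output rows */
        for (int x = 0; x < h; x++)   /* output columns */
            out[y * h + x] = in[x * w + y];
}

/* Rotate by 180 degrees; the dimensions stay w x h. */
static void rotate180(const unsigned char *in, int w, int h, unsigned char *out)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            out[y * w + x] = in[(h - 1 - y) * w + (w - 1 - x)];
}

/* Rotate by 90 degrees clockwise; the output is h wide and w tall. */
static void rotate_clock(const unsigned char *in, int w, int h, unsigned char *out)
{
    for (int y = 0; y < w; y++)
        for (int x = 0; x < h; x++)
            out[y * h + x] = in[(h - 1 - x) * w + y];
}

/* Flip a w x h image vertically by reversing the row order. */
static void vflip(const unsigned char *in, int w, int h, unsigned char *out)
{
    for (int y = 0; y < h; y++)
        memcpy(out + y * w, in + (h - 1 - y) * w, (size_t)w);
}

/* clock_flip staged as in the filter: rotate by 180 degrees, then transpose. */
void clock_flip_staged(const unsigned char *in, int w, int h, unsigned char *out)
{
    unsigned char tmp[MAX_PIXELS];

    assert(w > 0 && h > 0 && w * h <= MAX_PIXELS);
    rotate180(in, w, h, tmp);
    transpose(tmp, w, h, out);
}

/* clock_flip as documented: rotate clockwise, then flip vertically. */
void clock_flip_reference(const unsigned char *in, int w, int h, unsigned char *out)
{
    unsigned char tmp[MAX_PIXELS];

    assert(w > 0 && h > 0 && w * h <= MAX_PIXELS);
    rotate_clock(in, w, h, tmp);
    vflip(tmp, h, w, out); /* the rotated image is h wide and w tall */
}
```

For the 3x2 input { 1 2 3 / 4 5 6 }, both clock_flip functions produce
the 2x3 image { 6 3 / 5 2 / 4 1 }, and transpose() alone produces
{ 1 4 / 2 5 / 3 6 }, which is the cclock_flip result.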

Additionally, I removed the format conversion from the filter. That's
the job of the scale filter, and leaving it out also avoids a pointless
double format conversion if you scale and transpose NV12 video.
Nvenc accepts yuv420p/yuv444p input anyway, and if you really need NV12,
you can add another scale_npp afterwards to convert back.

A possible command line for this is:
./ffmpeg.exe -hwaccel cuvid -c:v h264_cuvid -i in.mkv -c copy -c:v h264_nvenc -vf scale_npp=format=yuv420p,transpose_npp=cclock_flip out.mkv


 configure                      |   5 +-
 doc/filters.texi               |  55 ++++
 libavfilter/Makefile           |   1 +
 libavfilter/allfilters.c       |   1 +
 libavfilter/version.h          |   2 +-
 libavfilter/vf_transpose_npp.c | 483 +++++++++++++++++++++++++++++++++
 6 files changed, 544 insertions(+), 3 deletions(-)
 create mode 100644 libavfilter/vf_transpose_npp.c

diff --git a/configure b/configure
index 0d6ee0abfc..e1f229f052 100755
--- a/configure
+++ b/configure
@@ -2923,6 +2923,7 @@ hwupload_cuda_filter_deps="ffnvcodec"
 scale_npp_filter_deps="ffnvcodec libnpp"
 scale_cuda_filter_deps="cuda_sdk"
 thumbnail_cuda_filter_deps="cuda_sdk"
+transpose_npp_filter_deps="ffnvcodec libnpp"
 
 amf_deps_any="libdl LoadLibrary"
 nvenc_deps="ffnvcodec"
@@ -6082,8 +6083,8 @@ enabled libmodplug        && require_pkg_config libmodplug libmodplug libmodplug
 enabled libmp3lame        && require "libmp3lame >= 3.98.3" lame/lame.h lame_set_VBR_quality -lmp3lame $libm_extralibs
 enabled libmysofa         && { check_pkg_config libmysofa libmysofa mysofa.h mysofa_load ||
                                require libmysofa mysofa.h mysofa_load -lmysofa $zlib_extralibs; }
-enabled libnpp            && { check_lib libnpp npp.h nppGetLibVersion -lnppig -lnppicc -lnppc ||
-                               check_lib libnpp npp.h nppGetLibVersion -lnppi -lnppc ||
+enabled libnpp            && { check_lib libnpp npp.h nppGetLibVersion -lnppig -lnppicc -lnppc -lnppidei ||
+                               check_lib libnpp npp.h nppGetLibVersion -lnppi -lnppc -lnppidei ||
                                die "ERROR: libnpp not found"; }
 enabled libopencore_amrnb && require libopencore_amrnb opencore-amrnb/interf_dec.h Decoder_Interface_init -lopencore-amrnb
 enabled libopencore_amrwb && require libopencore_amrwb opencore-amrwb/dec_if.h D_IF_init -lopencore-amrwb
diff --git a/doc/filters.texi b/doc/filters.texi
index 37e79d34e1..5b839b6419 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -16284,6 +16284,61 @@ The command above can also be specified as:
 transpose=1:portrait
 @end example
 
+@section transpose_npp
+
+Transpose rows with columns in the input video and optionally flip it.
+For more in-depth examples see the @ref{transpose} video filter, which shares most of its options.
+
+It accepts the following parameters:
+
+@table @option
+
+@item dir
+Specify the transposition direction.
+
+Can assume the following values:
+@table @samp
+@item cclock_flip
+Rotate by 90 degrees counterclockwise and vertically flip. (default)
+
+@item clock
+Rotate by 90 degrees clockwise.
+
+@item cclock
+Rotate by 90 degrees counterclockwise.
+
+@item clock_flip
+Rotate by 90 degrees clockwise and vertically flip.
+@end table
+
+@item passthrough
+Do not apply the transposition if the input geometry matches the one
+specified by this option. It accepts the following values:
+@table @samp
+@item none
+Always apply transposition. (default)
+@item portrait
+Preserve portrait geometry (when @var{height} >= @var{width}).
+@item landscape
+Preserve landscape geometry (when @var{width} >= @var{height}).
+@end table
+
+@item interp_algo
+The interpolation algorithm used for rotating. One of the following:
+@table @option
+@item nn
+Nearest neighbour
+
+@item linear
+Linear
+
+@item cubic
+Cubic (default)
+
+@end table
+
+@end table
+
 @section trim
 Trim the input so that the output contains one continuous subpart of the input.
 
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index e412000c8f..cc0cc15fd2 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -374,6 +374,7 @@ OBJS-$(CONFIG_TONEMAP_FILTER)                += vf_tonemap.o colorspace.o
 OBJS-$(CONFIG_TONEMAP_OPENCL_FILTER)         += vf_tonemap_opencl.o colorspace.o opencl.o \
                                                 opencl/tonemap.o opencl/colorspace_common.o
 OBJS-$(CONFIG_TRANSPOSE_FILTER)              += vf_transpose.o
+OBJS-$(CONFIG_TRANSPOSE_NPP_FILTER)          += vf_transpose_npp.o
 OBJS-$(CONFIG_TRIM_FILTER)                   += trim.o
 OBJS-$(CONFIG_UNPREMULTIPLY_FILTER)          += vf_premultiply.o framesync.o
 OBJS-$(CONFIG_UNSHARP_FILTER)                += vf_unsharp.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 2fa9460335..73a5d7e188 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -356,6 +356,7 @@ extern AVFilter ff_vf_tmix;
 extern AVFilter ff_vf_tonemap;
 extern AVFilter ff_vf_tonemap_opencl;
 extern AVFilter ff_vf_transpose;
+extern AVFilter ff_vf_transpose_npp;
 extern AVFilter ff_vf_trim;
 extern AVFilter ff_vf_unpremultiply;
 extern AVFilter ff_vf_unsharp;
diff --git a/libavfilter/version.h b/libavfilter/version.h
index 2ff2b6a318..ef982339d7 100644
--- a/libavfilter/version.h
+++ b/libavfilter/version.h
@@ -30,7 +30,7 @@
 #include "libavutil/version.h"
 
 #define LIBAVFILTER_VERSION_MAJOR   7
-#define LIBAVFILTER_VERSION_MINOR  27
+#define LIBAVFILTER_VERSION_MINOR  28
 #define LIBAVFILTER_VERSION_MICRO 100
 
 #define LIBAVFILTER_VERSION_INT AV_VERSION_INT(LIBAVFILTER_VERSION_MAJOR, \
diff --git a/libavfilter/vf_transpose_npp.c b/libavfilter/vf_transpose_npp.c
new file mode 100644
index 0000000000..5842a25483
--- /dev/null
+++ b/libavfilter/vf_transpose_npp.c
@@ -0,0 +1,483 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <nppi.h>
+#include <stdio.h>
+#include <string.h>
+#include "libavutil/avstring.h"
+#include "libavutil/common.h"
+#include "libavutil/eval.h"
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_cuda_internal.h"
+#include "libavutil/internal.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+#include "avfilter.h"
+#include "formats.h"
+#include "internal.h"
+#include "video.h"
+
+static const enum AVPixelFormat supported_formats[] = {
+    AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_YUV444P
+};
+
+enum TransposeStage {
+    STAGE_ROTATE,
+    STAGE_TRANSPOSE,
+    STAGE_NB
+};
+
+enum Transpose {
+    NPP_TRANSPOSE_CCLOCK_FLIP = 0,
+    NPP_TRANSPOSE_CLOCK = 1,
+    NPP_TRANSPOSE_CCLOCK = 2,
+    NPP_TRANSPOSE_CLOCK_FLIP = 3
+};
+
+enum Passthrough {
+    NPP_TRANSPOSE_PT_TYPE_NONE = 0,
+    NPP_TRANSPOSE_PT_TYPE_LANDSCAPE,
+    NPP_TRANSPOSE_PT_TYPE_PORTRAIT
+};
+
+typedef struct NPPTransposeStageContext {
+    int stage_needed;
+    enum AVPixelFormat in_fmt;
+    enum AVPixelFormat out_fmt;
+    struct {
+        int width;
+        int height;
+    } planes_in[3], planes_out[3];
+    AVBufferRef *frames_ctx;
+    AVFrame     *frame;
+} NPPTransposeStageContext;
+
+typedef struct NPPTransposeContext {
+    const AVClass *class;
+    NPPTransposeStageContext stages[STAGE_NB];
+    AVFrame *tmp_frame;
+
+    int passthrough;    ///< enum Passthrough, geometry for which the input is passed through
+    int dir;            ///< enum Transpose
+    int interp_algo;
+} NPPTransposeContext;
+
+static int npptranspose_init(AVFilterContext *ctx)
+{
+    NPPTransposeContext *s = ctx->priv;
+    int i;
+
+    for (i = 0; i < FF_ARRAY_ELEMS(s->stages); i++) {
+        s->stages[i].frame = av_frame_alloc();
+        if (!s->stages[i].frame)
+            return AVERROR(ENOMEM);
+    }
+
+    s->tmp_frame = av_frame_alloc();
+    if (!s->tmp_frame)
+        return AVERROR(ENOMEM);
+
+    return 0;
+}
+
+static void npptranspose_uninit(AVFilterContext *ctx)
+{
+    NPPTransposeContext *s = ctx->priv;
+    int i;
+
+    for (i = 0; i < FF_ARRAY_ELEMS(s->stages); i++) {
+        av_frame_free(&s->stages[i].frame);
+        av_buffer_unref(&s->stages[i].frames_ctx);
+    }
+
+    av_frame_free(&s->tmp_frame);
+}
+
+static int npptranspose_query_formats(AVFilterContext *ctx)
+{
+    static const enum AVPixelFormat pixel_formats[] = {
+        AV_PIX_FMT_CUDA, AV_PIX_FMT_NONE,
+    };
+
+    AVFilterFormats *pix_fmts = ff_make_format_list(pixel_formats);
+    return ff_set_common_formats(ctx, pix_fmts);
+}
+
+static int init_stage(NPPTransposeStageContext *stage, AVBufferRef *device_ctx)
+{
+    AVBufferRef *out_ref = NULL;
+    AVHWFramesContext *out_ctx;
+    int in_sw, in_sh, out_sw, out_sh;
+    int ret, i;
+
+    av_pix_fmt_get_chroma_sub_sample(stage->in_fmt,  &in_sw,  &in_sh);
+    av_pix_fmt_get_chroma_sub_sample(stage->out_fmt, &out_sw, &out_sh);
+
+    if (!stage->planes_out[0].width) {
+        stage->planes_out[0].width  = stage->planes_in[0].width;
+        stage->planes_out[0].height = stage->planes_in[0].height;
+    }
+
+    for (i = 1; i < FF_ARRAY_ELEMS(stage->planes_in); i++) {
+        stage->planes_in[i].width   = stage->planes_in[0].width   >> in_sw;
+        stage->planes_in[i].height  = stage->planes_in[0].height  >> in_sh;
+        stage->planes_out[i].width  = stage->planes_out[0].width  >> out_sw;
+        stage->planes_out[i].height = stage->planes_out[0].height >> out_sh;
+    }
+
+    out_ref = av_hwframe_ctx_alloc(device_ctx);
+    if (!out_ref)
+        return AVERROR(ENOMEM);
+    out_ctx = (AVHWFramesContext*)out_ref->data;
+
+    out_ctx->format    = AV_PIX_FMT_CUDA;
+    out_ctx->sw_format = stage->out_fmt;
+    out_ctx->width     = FFALIGN(stage->planes_out[0].width,  32);
+    out_ctx->height    = FFALIGN(stage->planes_out[0].height, 32);
+
+    ret = av_hwframe_ctx_init(out_ref);
+    if (ret < 0)
+        goto fail;
+
+    av_frame_unref(stage->frame);
+    ret = av_hwframe_get_buffer(out_ref, stage->frame, 0);
+    if (ret < 0)
+        goto fail;
+
+    stage->frame->width  = stage->planes_out[0].width;
+    stage->frame->height = stage->planes_out[0].height;
+    av_buffer_unref(&stage->frames_ctx);
+    stage->frames_ctx = out_ref;
+
+    return 0;
+
+fail:
+    av_buffer_unref(&out_ref);
+    return ret;
+}
+
+static int format_is_supported(enum AVPixelFormat fmt)
+{
+    int i;
+
+    for (i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++)
+        if (supported_formats[i] == fmt)
+            return 1;
+
+    return 0;
+}
+
+static int init_processing_chain(AVFilterContext *ctx, int in_width, int in_height,
+                                 int out_width, int out_height)
+{
+    NPPTransposeContext *s = ctx->priv;
+    AVHWFramesContext *in_frames_ctx;
+    enum AVPixelFormat format;
+    int i, ret, last_stage = -1;
+    int rot_width = out_width, rot_height = out_height;
+
+    /* check that we have a hw context */
+    if (!ctx->inputs[0]->hw_frames_ctx) {
+        av_log(ctx, AV_LOG_ERROR, "No hw context provided on input\n");
+        return AVERROR(EINVAL);
+    }
+
+    in_frames_ctx = (AVHWFramesContext*)ctx->inputs[0]->hw_frames_ctx->data;
+    format        = in_frames_ctx->sw_format;
+
+    if (!format_is_supported(format)) {
+        av_log(ctx, AV_LOG_ERROR, "Unsupported input format: %s\n",
+               av_get_pix_fmt_name(format));
+        return AVERROR(ENOSYS);
+    }
+
+    if (s->dir != NPP_TRANSPOSE_CCLOCK_FLIP) {
+        s->stages[STAGE_ROTATE].stage_needed = 1;
+    }
+
+    if (s->dir == NPP_TRANSPOSE_CCLOCK_FLIP || s->dir == NPP_TRANSPOSE_CLOCK_FLIP) {
+        s->stages[STAGE_TRANSPOSE].stage_needed = 1;
+
+        /* Rotating by 180° in case of clock_flip, or not at all for cclock_flip, so width/height unchanged by rotation */
+        rot_width = in_width;
+        rot_height = in_height;
+    }
+
+    s->stages[STAGE_ROTATE].in_fmt               = format;
+    s->stages[STAGE_ROTATE].out_fmt              = format;
+    s->stages[STAGE_ROTATE].planes_in[0].width   = in_width;
+    s->stages[STAGE_ROTATE].planes_in[0].height  = in_height;
+    s->stages[STAGE_ROTATE].planes_out[0].width  = rot_width;
+    s->stages[STAGE_ROTATE].planes_out[0].height = rot_height;
+    s->stages[STAGE_TRANSPOSE].in_fmt               = format;
+    s->stages[STAGE_TRANSPOSE].out_fmt              = format;
+    s->stages[STAGE_TRANSPOSE].planes_in[0].width   = rot_width;
+    s->stages[STAGE_TRANSPOSE].planes_in[0].height  = rot_height;
+    s->stages[STAGE_TRANSPOSE].planes_out[0].width  = out_width;
+    s->stages[STAGE_TRANSPOSE].planes_out[0].height = out_height;
+
+    /* init the hardware contexts */
+    for (i = 0; i < FF_ARRAY_ELEMS(s->stages); i++) {
+        if (!s->stages[i].stage_needed)
+            continue;
+        ret = init_stage(&s->stages[i], in_frames_ctx->device_ref);
+        if (ret < 0)
+            return ret;
+        last_stage = i;
+    }
+
+    if (last_stage >= 0)
+        ctx->outputs[0]->hw_frames_ctx = av_buffer_ref(s->stages[last_stage].frames_ctx);
+    else
+        ctx->outputs[0]->hw_frames_ctx = av_buffer_ref(ctx->inputs[0]->hw_frames_ctx);
+
+    if (!ctx->outputs[0]->hw_frames_ctx)
+        return AVERROR(ENOMEM);
+
+    return 0;
+}
+
+static int npptranspose_config_props(AVFilterLink *outlink)
+{
+    AVFilterContext *ctx = outlink->src;
+    AVFilterLink *inlink = outlink->src->inputs[0];
+    NPPTransposeContext *s = ctx->priv;
+    int ret;
+
+    if ((inlink->w >= inlink->h && s->passthrough == NPP_TRANSPOSE_PT_TYPE_LANDSCAPE) ||
+        (inlink->w <= inlink->h && s->passthrough == NPP_TRANSPOSE_PT_TYPE_PORTRAIT)) {
+        av_log(ctx, AV_LOG_VERBOSE,
+               "w:%d h:%d -> w:%d h:%d (passthrough mode)\n",
+               inlink->w, inlink->h, inlink->w, inlink->h);
+        return 0;
+    } else {
+        s->passthrough = NPP_TRANSPOSE_PT_TYPE_NONE;
+    }
+
+    outlink->w = inlink->h;
+    outlink->h = inlink->w;
+    outlink->sample_aspect_ratio = (AVRational){inlink->sample_aspect_ratio.den, inlink->sample_aspect_ratio.num};
+
+    ret = init_processing_chain(ctx, inlink->w, inlink->h, outlink->w, outlink->h);
+    if (ret < 0)
+        return ret;
+
+    av_log(ctx, AV_LOG_VERBOSE, "w:%d h:%d -transpose-> w:%d h:%d\n",
+           inlink->w, inlink->h, outlink->w, outlink->h);
+
+    return 0;
+}
+
+static int npptranspose_rotate(AVFilterContext *ctx, NPPTransposeStageContext *stage,
+                               AVFrame *out, AVFrame *in)
+{
+    NPPTransposeContext *s = ctx->priv;
+    NppStatus err;
+    int i;
+
+    for (i = 0; i < FF_ARRAY_ELEMS(stage->planes_in) && i < FF_ARRAY_ELEMS(in->data) && in->data[i]; i++) {
+        int iw = stage->planes_in[i].width;
+        int ih = stage->planes_in[i].height;
+        int ow = stage->planes_out[i].width;
+        int oh = stage->planes_out[i].height;
+
+        // nppRotate uses 0,0 as the rotation point
+        // need to shift the image accordingly after rotation
+        // need to subtract 1 to get the correct coordinates
+        double angle = s->dir == NPP_TRANSPOSE_CLOCK ? -90.0 : s->dir == NPP_TRANSPOSE_CCLOCK ? 90.0 : 180.0;
+        int shiftw = (s->dir == NPP_TRANSPOSE_CLOCK  || s->dir == NPP_TRANSPOSE_CLOCK_FLIP) ? ow - 1 : 0;
+        int shifth = (s->dir == NPP_TRANSPOSE_CCLOCK || s->dir == NPP_TRANSPOSE_CLOCK_FLIP) ? oh - 1 : 0;
+
+        err = nppiRotate_8u_C1R(in->data[i], (NppiSize){ iw, ih },
+                                in->linesize[i], (NppiRect){ 0, 0, iw, ih },
+                                out->data[i], out->linesize[i],
+                                (NppiRect){ 0, 0, ow, oh },
+                                angle, shiftw, shifth, s->interp_algo);
+        if (err != NPP_SUCCESS) {
+            av_log(ctx, AV_LOG_ERROR, "NPP rotate error: %d\n", err);
+            return AVERROR_UNKNOWN;
+        }
+    }
+
+    return 0;
+}
+
+static int npptranspose_transpose(AVFilterContext *ctx, NPPTransposeStageContext *stage,
+                                  AVFrame *out, AVFrame *in)
+{
+    NppStatus err;
+    int i;
+
+    for (i = 0; i < FF_ARRAY_ELEMS(stage->planes_in) && i < FF_ARRAY_ELEMS(in->data) && in->data[i]; i++) {
+        int iw = stage->planes_in[i].width;
+        int ih = stage->planes_in[i].height;
+
+        err = nppiTranspose_8u_C1R(in->data[i], in->linesize[i],
+                                   out->data[i], out->linesize[i],
+                                   (NppiSize){ iw, ih });
+        if (err != NPP_SUCCESS) {
+            av_log(ctx, AV_LOG_ERROR, "NPP transpose error: %d\n", err);
+            return AVERROR_UNKNOWN;
+        }
+    }
+
+    return 0;
+}
+
+static int (*const npptranspose_process[])(AVFilterContext *ctx, NPPTransposeStageContext *stage,
+                                           AVFrame *out, AVFrame *in) = {
+    [STAGE_ROTATE]       = npptranspose_rotate,
+    [STAGE_TRANSPOSE]    = npptranspose_transpose
+};
+
+static int npptranspose_filter(AVFilterContext *ctx, AVFrame *out, AVFrame *in)
+{
+    NPPTransposeContext *s = ctx->priv;
+    AVFrame *src = in;
+    int i, ret, last_stage = -1;
+
+    for (i = 0; i < FF_ARRAY_ELEMS(s->stages); i++) {
+        if (!s->stages[i].stage_needed)
+            continue;
+
+        ret = npptranspose_process[i](ctx, &s->stages[i], s->stages[i].frame, src);
+        if (ret < 0)
+            return ret;
+
+        src        = s->stages[i].frame;
+        last_stage = i;
+    }
+
+    if (last_stage < 0)
+        return AVERROR_BUG;
+
+    ret = av_hwframe_get_buffer(src->hw_frames_ctx, s->tmp_frame, 0);
+    if (ret < 0)
+        return ret;
+
+    av_frame_move_ref(out, src);
+    av_frame_move_ref(src, s->tmp_frame);
+
+    ret = av_frame_copy_props(out, in);
+    if (ret < 0)
+        return ret;
+
+    return 0;
+}
+
+static int npptranspose_filter_frame(AVFilterLink *link, AVFrame *in)
+{
+    AVFilterContext              *ctx = link->dst;
+    NPPTransposeContext            *s = ctx->priv;
+    AVFilterLink             *outlink = ctx->outputs[0];
+    AVHWFramesContext     *frames_ctx = (AVHWFramesContext*)outlink->hw_frames_ctx->data;
+    AVCUDADeviceContext *device_hwctx = frames_ctx->device_ctx->hwctx;
+    AVFrame *out = NULL;
+    CUresult err;
+    CUcontext dummy;
+    int ret = 0;
+
+    if (s->passthrough)
+        return ff_filter_frame(outlink, in);
+
+    out = av_frame_alloc();
+    if (!out) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    err = device_hwctx->internal->cuda_dl->cuCtxPushCurrent(device_hwctx->cuda_ctx);
+    if (err != CUDA_SUCCESS) {
+        ret = AVERROR_UNKNOWN;
+        goto fail;
+    }
+
+    ret = npptranspose_filter(ctx, out, in);
+
+    device_hwctx->internal->cuda_dl->cuCtxPopCurrent(&dummy);
+    if (ret < 0)
+        goto fail;
+
+    av_frame_free(&in);
+
+    return ff_filter_frame(outlink, out);
+
+fail:
+    av_frame_free(&in);
+    av_frame_free(&out);
+    return ret;
+}
+
+#define OFFSET(x) offsetof(NPPTransposeContext, x)
+#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM)
+
+static const AVOption options[] = {
+    { "dir", "set transpose direction", OFFSET(dir), AV_OPT_TYPE_INT, { .i64 = NPP_TRANSPOSE_CCLOCK_FLIP }, 0, 3, FLAGS, "dir" },
+        { "cclock_flip", "rotate counter-clockwise with vertical flip", 0, AV_OPT_TYPE_CONST, { .i64 = NPP_TRANSPOSE_CCLOCK_FLIP }, 0, 0, FLAGS, "dir" },
+        { "clock",       "rotate clockwise",                            0, AV_OPT_TYPE_CONST, { .i64 = NPP_TRANSPOSE_CLOCK       }, 0, 0, FLAGS, "dir" },
+        { "cclock",      "rotate counter-clockwise",                    0, AV_OPT_TYPE_CONST, { .i64 = NPP_TRANSPOSE_CCLOCK      }, 0, 0, FLAGS, "dir" },
+        { "clock_flip",  "rotate clockwise with vertical flip",         0, AV_OPT_TYPE_CONST, { .i64 = NPP_TRANSPOSE_CLOCK_FLIP  }, 0, 0, FLAGS, "dir" },
+    { "passthrough", "do not apply transposition if the input matches the specified geometry", OFFSET(passthrough), AV_OPT_TYPE_INT, { .i64 = NPP_TRANSPOSE_PT_TYPE_NONE },  0, 2, FLAGS, "passthrough" },
+        { "none",      "always apply transposition",  0, AV_OPT_TYPE_CONST, { .i64 = NPP_TRANSPOSE_PT_TYPE_NONE },      0, 0, FLAGS, "passthrough" },
+        { "landscape", "preserve landscape geometry", 0, AV_OPT_TYPE_CONST, { .i64 = NPP_TRANSPOSE_PT_TYPE_LANDSCAPE }, 0, 0, FLAGS, "passthrough" },
+        { "portrait",  "preserve portrait geometry",  0, AV_OPT_TYPE_CONST, { .i64 = NPP_TRANSPOSE_PT_TYPE_PORTRAIT },  0, 0, FLAGS, "passthrough" },
+    { "interp_algo", "Interpolation algorithm used for rotating", OFFSET(interp_algo), AV_OPT_TYPE_INT, { .i64 = NPPI_INTER_CUBIC }, 0, INT_MAX, FLAGS, "interp_algo" },
+        { "nn",     "nearest neighbour", 0, AV_OPT_TYPE_CONST, { .i64 = NPPI_INTER_NN     }, 0, 0, FLAGS, "interp_algo" },
+        { "linear", "linear",            0, AV_OPT_TYPE_CONST, { .i64 = NPPI_INTER_LINEAR }, 0, 0, FLAGS, "interp_algo" },
+        { "cubic",  "cubic",             0, AV_OPT_TYPE_CONST, { .i64 = NPPI_INTER_CUBIC  }, 0, 0, FLAGS, "interp_algo" },
+    { NULL },
+};
+
+static const AVClass npptranspose_class = {
+    .class_name = "npptranspose",
+    .item_name  = av_default_item_name,
+    .option     = options,
+    .version    = LIBAVUTIL_VERSION_INT,
+};
+
+static const AVFilterPad npptranspose_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .filter_frame = npptranspose_filter_frame,
+    },
+    { NULL }
+};
+
+static const AVFilterPad npptranspose_outputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = npptranspose_config_props,
+    },
+    { NULL }
+};
+
+AVFilter ff_vf_transpose_npp = {
+    .name           = "transpose_npp",
+    .description    = NULL_IF_CONFIG_SMALL("NVIDIA Performance Primitives video transpose"),
+    .init           = npptranspose_init,
+    .uninit         = npptranspose_uninit,
+    .query_formats  = npptranspose_query_formats,
+    .priv_size      = sizeof(NPPTransposeContext),
+    .priv_class     = &npptranspose_class,
+    .inputs         = npptranspose_inputs,
+    .outputs        = npptranspose_outputs,
+    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
+};
\ No newline at end of file