diff mbox series

[FFmpeg-devel,1/3] avutil: Add Dolby Vision RPU side data type

Message ID 20211021134843.1436241-2-derek.buitenhuis@gmail.com
State New
Headers show
Series Dolby Vision RPU Side Data
Related show

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished
andriy/make_ppc success Make finished
andriy/make_fate_ppc success Make fate finished

Commit Message

Derek Buitenhuis Oct. 21, 2021, 1:48 p.m. UTC
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
---
All known encoders, etc. expect RPUs in this format, to my knowledge.
(There is no spec for parsing the RPUs, even privately, and not a single
piece of software that would benefit from that - and it is a legal minefield
which I am not going to touch.)

The comment says "with emulation bytes intact" but technically, RPUs
do not have emulation bytes, since they are not actually NALs, but just
masquerade as them, and as such, encoder APIs expect them to be there.
---
 doc/APIchanges      | 3 +++
 libavutil/frame.c   | 1 +
 libavutil/frame.h   | 7 +++++++
 libavutil/version.h | 2 +-
 4 files changed, 12 insertions(+), 1 deletion(-)

Comments

Jan Ekström Oct. 22, 2021, 10:02 p.m. UTC | #1
On Thu, Oct 21, 2021 at 4:49 PM Derek Buitenhuis
<derek.buitenhuis@gmail.com> wrote:
>
> Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
> ---
> All known encoders, etc. expect RPUs in this format, to my knowledge.
> (There is no spec for parsing the RPUs, even privately, and not a single
> piece of software that would benefit from that

DoVi Profile 5 and 7 rendering requires the data within these RPU
entities, AFAIK (as it contains various values to correctly configure
the custom color space). So software such as libplacebo where haasn
has made efforts towards implementing the non-standard color space
utilized would benefit from having the contents.

So while there clearly are two separate use cases, I would note that
saying "nothing would see any use from the bytes within these
structures" is maybe a bit disingenuous.

The two use cases being:
1. Just re-encoding the content. The contents of these RPU buffers
being unrelated and only passthrough is necessary.
2. Trying to gather the information of these metadata units regarding
interpretation of the received image.

Both are clearly valid use cases, and we should IMHO support both, at
least on the AVFrame level. Not that I mean that you should implement
the parsing of these metadata units, since it is clear you want to
keep away from that part of the work :) .

> - and it is a legal minefield
> which I am not going to touch.)

While with this I agree with completely. Or at least given the D
company's track record :P . That said, just parsing the metadata units
is probably much less of a swamp than actually trying to utilize the
metadata to properly render Profile 5 or 7 video.

For the record, there are at least two known to me OSS implementations
of RPU parsing/writing, one being made in Java and the other being
made in Rust.

Jan
Derek Buitenhuis Oct. 23, 2021, 5:39 p.m. UTC | #2
On 10/22/2021 11:02 PM, Jan Ekström wrote:
> DoVi Profile 5 and 7 rendering requires the data within these RPU
> entities, AFAIK (as it contains various values to correctly configure
> the custom color space). So software such as libplacebo where haasn
> has made efforts towards implementing the non-standard color space
> utilized would benefit from having the contents.
> 
> So while there clearly are two separate use cases, I would note that
> saying "nothing would see any use from the bytes within these
> structures" is maybe a bit disingenuous.

Do any of thse actually work or is it just gusswork based on reverse
engineering and patents (0% chance I am readng any patent or IPR). We
could rename this or DOVI_PROFILE8 or RPU_BUFFER or something if you
prefer.

> The two use cases being:
> 1. Just re-encoding the content. The contents of these RPU buffers
> being unrelated and only passthrough is necessary.

Yes, supported today by e.g. x265, which I imagine is the most common usecase.

> 2. Trying to gather the information of these metadata units regarding
> interpretation of the received image.

To what end is a bit iffy - nothing can make much use of 99% of it.

> Both are clearly valid use cases, and we should IMHO support both, at
> least on the AVFrame level. Not that I mean that you should implement
> the parsing of these metadata units, since it is clear you want to
> keep away from that part of the work :) .

Indeed, I was hoping this patchset would not be bogged down with "yeah
implement this undocumented proprietary parser from RE or patent instead"
which I obviously cannot do (dayjob, etc.) - and I think the value is ...
eh ... at best.

I'm not particularily fond of the idea of indefinitely killing this
functionality while waiting for someone to take up the parsing.

>> - and it is a legal minefield
>> which I am not going to touch.)
> 
> While with this I agree with completely. Or at least given the D
> company's track record :P . That said, just parsing the metadata units
> is probably much less of a swamp than actually trying to utilize the
> metadata to properly render Profile 5 or 7 video.

Which is why parsing it is iffy value given the risk.

- Derek
diff mbox series

Patch

diff --git a/doc/APIchanges b/doc/APIchanges
index 7b267a79ac..8bc9c7eb6e 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -14,6 +14,9 @@  libavutil:     2021-04-27
 
 API changes, most recent first:
 
+2021-XX-XX - xxxxxxxxxx - lavf 57.8.100 - frame.h
+  Add AV_FRAME_DATA_DOVI_RPU.
+
 2021-09-21 - xxxxxxxxxx - lavu 57.7.100 - pixfmt.h
   Add AV_PIX_FMT_X2BGR10.
 
diff --git a/libavutil/frame.c b/libavutil/frame.c
index 2617cda2b5..23b52f7b66 100644
--- a/libavutil/frame.c
+++ b/libavutil/frame.c
@@ -728,6 +728,7 @@  const char *av_frame_side_data_name(enum AVFrameSideDataType type)
     case AV_FRAME_DATA_SEI_UNREGISTERED:            return "H.26[45] User Data Unregistered SEI message";
     case AV_FRAME_DATA_FILM_GRAIN_PARAMS:           return "Film grain parameters";
     case AV_FRAME_DATA_DETECTION_BBOXES:            return "Bounding boxes for object detection and classification";
+    case AV_FRAME_DATA_DOVI_RPU:                    return "Dolby Vision RPU Data";
     }
     return NULL;
 }
diff --git a/libavutil/frame.h b/libavutil/frame.h
index ff2540a20f..29839211ca 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -187,6 +187,13 @@  enum AVFrameSideDataType {
      * as described by AVDetectionBBoxHeader.
      */
     AV_FRAME_DATA_DETECTION_BBOXES,
+
+    /**
+     * Dolby Vision RPU data, suitable for passing to x265
+     * or other libraries. Array of uint8_t, with NAL emulation
+     * bytes intact.
+     */
+    AV_FRAME_DATA_DOVI_RPU,
 };
 
 enum AVActiveFormatDescription {
diff --git a/libavutil/version.h b/libavutil/version.h
index 896e348d80..eeb33b388c 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -79,7 +79,7 @@ 
  */
 
 #define LIBAVUTIL_VERSION_MAJOR  57
-#define LIBAVUTIL_VERSION_MINOR   7
+#define LIBAVUTIL_VERSION_MINOR   8
 #define LIBAVUTIL_VERSION_MICRO 100
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \