diff mbox series

[FFmpeg-devel,3/4] vaapi_encode_h265: Query encoding block sizes and features

Message ID 20200305002528.11418-3-sw@jkqxz.net
State New
Headers show
Series [FFmpeg-devel,1/4] vaapi_encode: Move block size calculation after entrypoint selection | expand

Checks

Context Check Description
andriy/ffmpeg-patchwork success Make fate finished

Commit Message

Mark Thompson March 5, 2020, 12:25 a.m. UTC
---
Requires <https://github.com/intel/libva/pull/385>.  That isn't upstream, so this will need to wait for that and then get at least a fix to the version numbering before applying.

 libavcodec/vaapi_encode_h265.c | 91 +++++++++++++++++++++++++++-------
 1 file changed, 74 insertions(+), 17 deletions(-)

Comments

Fu, Linjie March 5, 2020, 5:13 a.m. UTC | #1
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Mark Thompson
> Sent: Thursday, March 5, 2020 08:25
> To: ffmpeg-devel@ffmpeg.org
> Subject: [FFmpeg-devel] [PATCH 3/4] vaapi_encode_h265: Query encoding
> block sizes and features
> 
> ---
> Requires <https://github.com/intel/libva/pull/385>.  That isn't upstream, so
> this will need to wait for that and then get at least a fix to the version
> numbering before applying.

The query looks good to me since I'm suffering from the differences of default
parameters allowed for VDENC and VMEPAK in media-driver while adding support
for low power encoding [1] for VAAPI. 

However got several questions:

1. Why not include PPS parameters as well? There are some differences in
cu_qp_delta_enabled_flag and diff_cu_qp_delta_depth supported for VAEntrypointEncSliceLP.

2. Would it be better to only involve the variable parameters for sps/pps... ? This is 
what PR 379 [2] (unfinished) intended to do.

3. The suggested value of SAO enable flag could also be 1 in corresponding code in
media-driver, which would provide 3% BD-Rate improvement with no fps penalty
based on previous experiment with FFmpeg VAAPI.

4. There is separate media_libva_caps for gen11 and gen12, maybe need further update
or centralization for the code in driver. (not tested yet, will do later)

[1] https://github.com/intel-media-ci/ffmpeg-cartwheel/blob/master/patchset/0008-lavc-vaapi_encode_h265-add-support-for-low-power-mod.patch
[2] https://github.com/intel/libva/pull/379

- Linjie
Mark Thompson March 7, 2020, 4:30 p.m. UTC | #2
On 05/03/2020 05:13, Fu, Linjie wrote:
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
>> Mark Thompson
>> Sent: Thursday, March 5, 2020 08:25
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: [FFmpeg-devel] [PATCH 3/4] vaapi_encode_h265: Query encoding
>> block sizes and features
>>
>> ---
>> Requires <https://github.com/intel/libva/pull/385>.  That isn't upstream, so
>> this will need to wait for that and then get at least a fix to the version
>> numbering before applying.
> 
> The query looks good to me since I'm suffering from the differences of default
> parameters allowed for VDENC and VMEPAK in media-driver while adding support
> for low power encoding [1] for VAAPI. 
> 
> However got several questions:
> 
> 1. Why not include PPS parameters as well? There are some differences in
> cu_qp_delta_enabled_flag and diff_cu_qp_delta_depth supported for VAEntrypointEncSliceLP.

Support for cu_qp_delta_enabled_flag is implied by the presence of non-CQP RC modes (maybe I should add a comment saying that explicitly?).

The possible range of diff_cu_qp_delta_depth is set by min_cu_qp_delta_block_size_minus3, since the variable you actually want to define is Log2MinCuQpDeltaSize.  (So for example that would be 1 if the smallest block supporting a qp_delta is 16x16 (Log2MinCuQpDeltaSize = 4), and then depth could be up to 2 on 64x64 CTBs or 1 on 32x32 CTBs.)

> 2. Would it be better to only involve the variable parameters for sps/pps... ? This is 
> what PR 379 [2] (unfinished) intended to do.

I'm not sure what you mean by this question - all of these are variable parameters in SPS/PPS.

> 3. The suggested value of SAO enable flag could also be 1 in corresponding code in
> media-driver, which would provide 3% BD-Rate improvement with no fps penalty
> based on previous experiment with FFmpeg VAAPI.

Huh, nice.  That never worked in Gen 9, so I guess that's another difference which needs the query information to express.

> 4. There is separate media_libva_caps for gen11 and gen12, maybe need further update
> or centralization for the code in driver. (not tested yet, will do later)

Since the documentation has not been released this needs someone from Intel to define what the capabilities actually are.  As I wrote in the PR against iHD, all I have are guesses based on what I can observe working (or not).

- Mark
Fu, Linjie March 8, 2020, 4:06 a.m. UTC | #3
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Mark Thompson
> Sent: Sunday, March 8, 2020 00:31
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH 3/4] vaapi_encode_h265: Query
> encoding block sizes and features
> 
> On 05/03/2020 05:13, Fu, Linjie wrote:
> >> -----Original Message-----
> >> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> >> Mark Thompson
> >> Sent: Thursday, March 5, 2020 08:25
> >> To: ffmpeg-devel@ffmpeg.org
> >> Subject: [FFmpeg-devel] [PATCH 3/4] vaapi_encode_h265: Query
> encoding
> >> block sizes and features
> >>
> >> ---
> >> Requires <https://github.com/intel/libva/pull/385>.  That isn't upstream,
> so
> >> this will need to wait for that and then get at least a fix to the version
> >> numbering before applying.
> >
> > The query looks good to me since I'm suffering from the differences of
> default
> > parameters allowed for VDENC and VMEPAK in media-driver while adding
> support
> > for low power encoding [1] for VAAPI.
> >
> > However got several questions:
> >
> > 1. Why not include PPS parameters as well? There are some differences in
> > cu_qp_delta_enabled_flag and diff_cu_qp_delta_depth supported for
> VAEntrypointEncSliceLP.
> 
> Support for cu_qp_delta_enabled_flag is implied by the presence of non-
> CQP RC modes (maybe I should add a comment saying that explicitly?).
> 
> The possible range of diff_cu_qp_delta_depth is set by
> min_cu_qp_delta_block_size_minus3, since the variable you actually want to
> define is Log2MinCuQpDeltaSize.  (So for example that would be 1 if the
> smallest block supporting a qp_delta is 16x16 (Log2MinCuQpDeltaSize = 4),
> and then depth could be up to 2 on 64x64 CTBs or 1 on 32x32 CTBs.)

Yes you are right, this should be not related with LP,  setting the depth would
make sense. 
Log2MinCuQpDeltaSize = CtbLog2SizeY − diff_cu_qp_delta_depth;

> > 2. Would it be better to only involve the variable parameters for sps/pps... ?
> This is
> > what PR 379 [2] (unfinished) intended to do.
> 
> I'm not sure what you mean by this question - all of these are variable
> parameters in SPS/PPS.
>
> > 3. The suggested value of SAO enable flag could also be 1 in corresponding
> code in
> > media-driver, which would provide 3% BD-Rate improvement with no fps
> penalty
> > based on previous experiment with FFmpeg VAAPI.
> 
> Huh, nice.  That never worked in Gen 9, so I guess that's another difference
> which needs the query information to express.
> 
> > 4. There is separate media_libva_caps for gen11 and gen12, maybe need
> further update
> > or centralization for the code in driver. (not tested yet, will do later)
> 
> Since the documentation has not been released this needs someone from
> Intel to define what the capabilities actually are.  As I wrote in the PR against
> iHD, all I have are guesses based on what I can observe working (or not).
> 

I'll sync with media-driver and try to get capabilities for each platform.
Thanks for the PRs in libva and iHD.

- Linjie
diff mbox series

Patch

diff --git a/libavcodec/vaapi_encode_h265.c b/libavcodec/vaapi_encode_h265.c
index dcc22eb610..59d150f503 100644
--- a/libavcodec/vaapi_encode_h265.c
+++ b/libavcodec/vaapi_encode_h265.c
@@ -56,6 +56,7 @@  typedef struct VAAPIEncodeH265Context {
     VAAPIEncodeContext common;
 
     // Encoder features.
+    uint32_t va_features;
     uint32_t ctu_size;
     uint32_t min_cb_size;
 
@@ -440,23 +441,54 @@  static int vaapi_encode_h265_init_sequence_params(AVCodecContext *avctx)
             vps->vps_max_latency_increase_plus1[i];
     }
 
-    // These have to come from the capabilities of the encoder.  We have no
-    // way to query them, so just hardcode parameters which work on the Intel
-    // driver.
-    // CTB size from 8x8 to 32x32.
-    sps->log2_min_luma_coding_block_size_minus3   = 0;
-    sps->log2_diff_max_min_luma_coding_block_size = 2;
-    // Transform size from 4x4 to 32x32.
-    sps->log2_min_luma_transform_block_size_minus2   = 0;
-    sps->log2_diff_max_min_luma_transform_block_size = 3;
-    // Full transform hierarchy allowed (2-5).
-    sps->max_transform_hierarchy_depth_inter = 3;
-    sps->max_transform_hierarchy_depth_intra = 3;
-    // AMP works.
-    sps->amp_enabled_flag = 1;
-    // SAO and temporal MVP do not work.
-    sps->sample_adaptive_offset_enabled_flag = 0;
-    sps->sps_temporal_mvp_enabled_flag       = 0;
+#if VA_CHECK_VERSION(1, 7, 0)
+    if (priv->va_features) {
+        VAConfigAttribValEncHEVCFeatures features = { .value = priv->va_features };
+
+        sps->log2_min_luma_coding_block_size_minus3 =
+            ff_ctz(priv->min_cb_size) - 3;
+        sps->log2_diff_max_min_luma_coding_block_size =
+            ff_ctz(priv->ctu_size) - ff_ctz(priv->min_cb_size);
+
+        sps->log2_min_luma_transform_block_size_minus2 =
+            features.bits.log2_min_luma_transform_block_size_minus2;
+        sps->log2_diff_max_min_luma_transform_block_size =
+            features.bits.log2_max_luma_transform_block_size_minus2 -
+            features.bits.log2_min_luma_transform_block_size_minus2;
+
+        sps->max_transform_hierarchy_depth_inter =
+            features.bits.max_transform_hierarchy_depth_inter;
+        sps->max_transform_hierarchy_depth_intra =
+            features.bits.max_transform_hierarchy_depth_intra;
+
+        sps->amp_enabled_flag =
+            features.bits.amp_supported;
+        sps->sample_adaptive_offset_enabled_flag =
+            features.bits.sao_supported;
+        sps->sps_temporal_mvp_enabled_flag =
+            features.bits.temporal_mvp_supported;
+    } else
+#endif
+    {
+        // These values come from the capabilities of the first encoder
+        // implementation in the i965 driver on Intel Skylake.  They may
+        // fail badly with other platforms or drivers.
+        // CTB size from 8x8 to 32x32.
+        sps->log2_min_luma_coding_block_size_minus3   = 0;
+        sps->log2_diff_max_min_luma_coding_block_size = 2;
+        // Transform size from 4x4 to 32x32.
+        sps->log2_min_luma_transform_block_size_minus2   = 0;
+        sps->log2_diff_max_min_luma_transform_block_size = 3;
+        // Full transform hierarchy allowed (2-5).
+        sps->max_transform_hierarchy_depth_inter = 3;
+        sps->max_transform_hierarchy_depth_intra = 3;
+        // AMP works.
+        sps->amp_enabled_flag = 1;
+        // SAO and temporal MVP do not work.
+        sps->sample_adaptive_offset_enabled_flag = 0;
+        sps->sps_temporal_mvp_enabled_flag       = 0;
+    }
+
 
     sps->pcm_enabled_flag = 0;
 
@@ -1073,6 +1105,31 @@  static av_cold void vaapi_encode_h265_block_size(AVCodecContext *avctx)
     VAAPIEncodeContext      *ctx = avctx->priv_data;
     VAAPIEncodeH265Context *priv = avctx->priv_data;
 
+#if VA_CHECK_VERSION(1, 7, 0)
+    {
+        VAConfigAttrib attr = { VAConfigAttribEncHEVCFeatures };
+        VAConfigAttribValEncHEVCFeatures features;
+        VAStatus vas;
+
+        vas = vaGetConfigAttributes(ctx->hwctx->display, ctx->va_profile,
+                                    ctx->va_entrypoint, &attr, 1);
+        if (vas != VA_STATUS_SUCCESS) {
+            av_log(avctx, AV_LOG_WARNING, "Failed to query encoder "
+                   "features, using guessed defaults.\n");
+        } else if (attr.value == VA_ATTRIB_NOT_SUPPORTED) {
+            av_log(avctx, AV_LOG_WARNING, "Driver does not advertise "
+                   "encoder features, using guessed defaults.\n");
+        } else {
+            features.value = priv->va_features = attr.value;
+
+            priv->ctu_size =
+                1 << features.bits.log2_max_coding_tree_block_size_minus3 + 3;
+            priv->min_cb_size =
+                1 << features.bits.log2_min_luma_coding_block_size_minus3 + 3;
+        }
+    }
+#endif
+
     if (!priv->ctu_size) {
         priv->ctu_size     = 32;
         priv->min_cb_size  = 16;