diff mbox series

[FFmpeg-devel] libavc/libx264: add support to propagate SSE values through encoder stats

Message ID 235b296cd67fed2c79862d4403dbe268d2f7e25b.camel@amazon.it
State New
Headers show
Series [FFmpeg-devel] libavc/libx264: add support to propagate SSE values through encoder stats | expand

Checks

Context Check Description
andriy/commit_msg_x86 warning The first line of the commit message must start with a context terminated by a colon and a space, for example "lavu/opt: " or "doc: ".
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Diego Felix de Souza via ffmpeg-devel Sept. 23, 2023, 10:04 a.m. UTC
Hi,
please find attached a patch to propagate the SSE for a frame into the
encoder stats.
Since libx264 already provides PSNR values, this is done by basically
inverting the formula to recover the SSE values.

Would it be possible to also append other values to the errors vector?
E.g., libx264 also computes SSIM but other values could be provided.

Best,
Elias



NICE SRL, viale Monte Grappa 3/5, 20124 Milano, Italia, Registro delle Imprese di Milano Monza Brianza Lodi REA n. 2096882, Capitale Sociale: 10.329,14 EUR i.v., Cod. Fisc. e P.IVA 01133050052, Societa con Socio Unico

Comments

Diego Felix de Souza via ffmpeg-devel Oct. 2, 2023, 5:35 p.m. UTC | #1
> Hi,
> please find attached a patch to propagate the SSE for a frame into
> the
> encoder stats.
> Since libx264 already provides PSNR values, this is done by basically
> inverting the formula to recover the SSE values.
> 
> Would it be possible to also append other values to the errors
> vector?
> E.g., libx264 also computes SSIM but other values could be provided.
> 
> Best,
> Elias

Hi,
very minor fix to a minor patch.
I noticed I wasn't rounding values properly.

Could someone have a look, please?
Elias



NICE SRL, viale Monte Grappa 3/5, 20124 Milano, Italia, Registro delle Imprese di Milano Monza Brianza Lodi REA n. 2096882, Capitale Sociale: 10.329,14 EUR i.v., Cod. Fisc. e P.IVA 01133050052, Societa con Socio Unico
Anton Khirnov Oct. 10, 2023, 10:54 a.m. UTC | #2
Quoting Carotti, Elias via ffmpeg-devel (2023-10-02 19:35:09)
> diff --git a/libavcodec/libx264.c b/libavcodec/libx264.c
> index 77a9f173b4..4c643c9066 100644
> --- a/libavcodec/libx264.c
> +++ b/libavcodec/libx264.c
> @@ -129,6 +129,8 @@ typedef struct X264Context {
>      int roi_warned;
> 
>      int mb_info;
> +
> +    int64_t sse[3];

The values don't need to be preserved across frames, so might as well
put this on stack in the block calling ff_side_data_set_encoder_stats().

>  } X264Context;
> 
>  static void X264_log(void *p, int level, const char *fmt, va_list args)
> @@ -726,7 +728,40 @@ FF_ENABLE_DEPRECATION_WARNINGS
> 
>      pkt->flags |= AV_PKT_FLAG_KEY*pic_out.b_keyframe;
>      if (ret) {
> -        ff_side_data_set_encoder_stats(pkt, (pic_out.i_qpplus1 - 1) * FF_QP2LAMBDA, NULL, 0, pict_type);
> +        const AVPixFmtDescriptor *pix_desc = av_pix_fmt_desc_get(csp_to_pixfmt(pic_out.img.i_csp));
> +        int error_count = 0;
> +        int64_t *errors = NULL;
> +
> +        if (ctx->flags & AV_CODEC_FLAG_PSNR) {
> +            double scale[3] = { 1,
> +                (1 << pix_desc->log2_chroma_h) * (double)(1 << pix_desc->log2_chroma_w),
> +                (1 << pix_desc->log2_chroma_h) * (double)(1 << pix_desc->log2_chroma_w),

Any particular reason the cast is on the second value? It looks strange.

> +            };
> +            double mse;
> +            int i;
> +
> +            error_count = pix_desc->nb_components;
> +
> +            av_log(ctx, AV_LOG_DEBUG, "PSNR values from libx264: %.3f %.3f %.3f.\n",
> +                   pic_out.prop.f_psnr[0], pic_out.prop.f_psnr[1], pic_out.prop.f_psnr[2]);
> +
> +            for (i = 0; i < pix_desc->nb_components; ++i) {

for (int i ....
Diego Felix de Souza via ffmpeg-devel Oct. 11, 2023, 10:54 a.m. UTC | #3
Hi Anton, 

On Tue, 2023-10-10 at 12:54 +0200, Anton Khirnov wrote:
> 
> Quoting Carotti, Elias via ffmpeg-devel (2023-10-02 19:35:09)
> > diff --git a/libavcodec/libx264.c b/libavcodec/libx264.c
> > index 77a9f173b4..4c643c9066 100644
> > --- a/libavcodec/libx264.c
> > +++ b/libavcodec/libx264.c
> > @@ -129,6 +129,8 @@ typedef struct X264Context {
> >      int roi_warned;
> > 
> >      int mb_info;
> > +
> > +    int64_t sse[3];
> 
> The values don't need to be preserved across frames, so might as well
> put this on stack in the block calling
> ff_side_data_set_encoder_stats().

Agreed.

> 
> >  } X264Context;
> > 
> >  static void X264_log(void *p, int level, const char *fmt, va_list
> > args)
> > @@ -726,7 +728,40 @@ FF_ENABLE_DEPRECATION_WARNINGS
> > 
> >      pkt->flags |= AV_PKT_FLAG_KEY*pic_out.b_keyframe;
> >      if (ret) {
> > -        ff_side_data_set_encoder_stats(pkt, (pic_out.i_qpplus1 -
> > 1) * FF_QP2LAMBDA, NULL, 0, pict_type);
> > +        const AVPixFmtDescriptor *pix_desc =
> > av_pix_fmt_desc_get(csp_to_pixfmt(pic_out.img.i_csp));
> > +        int error_count = 0;
> > +        int64_t *errors = NULL;
> > +
> > +        if (ctx->flags & AV_CODEC_FLAG_PSNR) {
> > +            double scale[3] = { 1,
> > +                (1 << pix_desc->log2_chroma_h) * (double)(1 <<
> > pix_desc->log2_chroma_w),
> > +                (1 << pix_desc->log2_chroma_h) * (double)(1 <<
> > pix_desc->log2_chroma_w),
> 
> Any particular reason the cast is on the second value? It looks
> strange.
> 

Just my habit. Fixed.

> > +            };
> > +            double mse;
> > +            int i;
> > +
> > +            error_count = pix_desc->nb_components;
> > +
> > +            av_log(ctx, AV_LOG_DEBUG, "PSNR values from libx264:
> > %.3f %.3f %.3f.\n",
> > +                   pic_out.prop.f_psnr[0], pic_out.prop.f_psnr[1],
> > pic_out.prop.f_psnr[2]);
> > +
> > +            for (i = 0; i < pix_desc->nb_components; ++i) {
> 
> for (int i ....

Agreed.

I also found the - (minus) sign in the mse formula was wrong and I
removed it.
Numbers seem to be coherent with those from libx264.
Please find attached a new patch rebased against the latest master with
the above fixes.

There is an increasing error (over increasing PSNRs and resolutions)
when reconstructing the PSNR from the SSE as computed above due to the
approximations and the roundings back and forth, however it seems to
yield similar values as those computed by libx264.

Best





NICE SRL, viale Monte Grappa 3/5, 20124 Milano, Italia, Registro delle Imprese di Milano Monza Brianza Lodi REA n. 2096882, Capitale Sociale: 10.329,14 EUR i.v., Cod. Fisc. e P.IVA 01133050052, Societa con Socio Unico
Anton Khirnov Oct. 13, 2023, 2:16 p.m. UTC | #4
Quoting Carotti, Elias via ffmpeg-devel (2023-10-11 12:54:21)
> diff --git a/libavcodec/libx264.c b/libavcodec/libx264.c
> index 77a9f173b4..85bd870f5d 100644
> --- a/libavcodec/libx264.c
> +++ b/libavcodec/libx264.c
> @@ -726,7 +726,39 @@ FF_ENABLE_DEPRECATION_WARNINGS
> 
>      pkt->flags |= AV_PKT_FLAG_KEY*pic_out.b_keyframe;
>      if (ret) {
> -        ff_side_data_set_encoder_stats(pkt, (pic_out.i_qpplus1 - 1) * FF_QP2LAMBDA, NULL, 0, pict_type);
> +        const AVPixFmtDescriptor *pix_desc = av_pix_fmt_desc_get(csp_to_pixfmt(pic_out.img.i_csp));

There's a problem here - we do not handle all values of i_csp.
E.g. we have no equivalent of X264_CSP_NV12 | X264_CSP_HIGH_DEPTH, which
x264 will use for YUV420P10 input.

The best solution is probably to use AVCodecContext.pix_fmt and assume
that x264 doesn't do any nontrivial (i.e. other than interleaving and
such) pixel format transformations internally.

> +        int error_count = 0;
> +        int64_t *errors = NULL;
> +        int64_t sse[3] = {0};
> +
> +        if (ctx->flags & AV_CODEC_FLAG_PSNR) {
> +            double scale[3] = { 1,
> +                (double)(1 << pix_desc->log2_chroma_h) * (1 << pix_desc->log2_chroma_w),
> +                (double)(1 << pix_desc->log2_chroma_h) * (1 << pix_desc->log2_chroma_w),
> +            };
> +
> +            error_count = pix_desc->nb_components;
> +
> +            av_log(ctx, AV_LOG_DEBUG, "PSNR values from libx264: %.3f %.3f %.3f.\n",
> +                   pic_out.prop.f_psnr[0], pic_out.prop.f_psnr[1], pic_out.prop.f_psnr[2]);

In my tests libx264 prints these values by itself, so this seems
redundant.
Diego Felix de Souza via ffmpeg-devel Oct. 13, 2023, 4:35 p.m. UTC | #5
On Fri, 2023-10-13 at 16:16 +0200, Anton Khirnov wrote:
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender
> and know the content is safe.
> 
> 
> 
> Quoting Carotti, Elias via ffmpeg-devel (2023-10-11 12:54:21)
> > diff --git a/libavcodec/libx264.c b/libavcodec/libx264.c
> > index 77a9f173b4..85bd870f5d 100644
> > --- a/libavcodec/libx264.c
> > +++ b/libavcodec/libx264.c
> > @@ -726,7 +726,39 @@ FF_ENABLE_DEPRECATION_WARNINGS
> > 
> >      pkt->flags |= AV_PKT_FLAG_KEY*pic_out.b_keyframe;
> >      if (ret) {
> > -        ff_side_data_set_encoder_stats(pkt, (pic_out.i_qpplus1 -
> > 1) * FF_QP2LAMBDA, NULL, 0, pict_type);
> > +        const AVPixFmtDescriptor *pix_desc =
> > av_pix_fmt_desc_get(csp_to_pixfmt(pic_out.img.i_csp));
> 
> There's a problem here - we do not handle all values of i_csp.
> E.g. we have no equivalent of X264_CSP_NV12 | X264_CSP_HIGH_DEPTH,
> which
> x264 will use for YUV420P10 input.
> 
> The best solution is probably to use AVCodecContext.pix_fmt and
> assume
> that x264 doesn't do any nontrivial (i.e. other than interleaving and
> such) pixel format transformations internally.
> 


I see. Wouldn't not outputting the SSE values when csp_to_pixfmt()
returns AV_PIX_FMT_NONE work better? 
That wouldn't be worse than it is today (meaning that right now we
don't get those values for any pix_fmt).
Anyway, I did as you suggested and used AVCodecContext.pix_fmt.


> > <snip>
> > +            av_log(ctx, AV_LOG_DEBUG, "PSNR values from libx264:
> > %.3f %.3f %.3f.\n",
> > +                   pic_out.prop.f_psnr[0], pic_out.prop.f_psnr[1],
> > pic_out.prop.f_psnr[2]);
> 
> In my tests libx264 prints these values by itself, so this seems
> redundant.

removed.




NICE SRL, viale Monte Grappa 3/5, 20124 Milano, Italia, Registro delle Imprese di Milano Monza Brianza Lodi REA n. 2096882, Capitale Sociale: 10.329,14 EUR i.v., Cod. Fisc. e P.IVA 01133050052, Societa con Socio Unico
Anton Khirnov Oct. 19, 2023, 11:48 a.m. UTC | #6
Quoting Carotti, Elias via ffmpeg-devel (2023-09-23 12:04:28)
> Hi,
> please find attached a patch to propagate the SSE for a frame into the
> encoder stats.
> Since libx264 already provides PSNR values, this is done by basically
> inverting the formula to recover the SSE values.

Thanks, pushed.

> Would it be possible to also append other values to the errors vector?
> E.g., libx264 also computes SSIM but other values could be provided.

It is certainly conceivable, but it's a nontrivial question whether it
should be added to this side data or to a new one. Ideally the same type
of data should be produced both by this and the various
metrics-computing filters.
Anton Khirnov Oct. 19, 2023, 11:50 a.m. UTC | #7
Quoting Carotti, Elias via ffmpeg-devel (2023-10-13 18:35:14)
> I see. Wouldn't not outputting the SSE values when csp_to_pixfmt()
> returns AV_PIX_FMT_NONE work better? 
> That wouldn't be worse than it is today (meaning that right now we
> don't get those values for any pix_fmt).

It would work in strictly fewer cases than this patch, so I don't see
why you'd want to go that way.
diff mbox series

Patch

From dfd47efb7f7cb264fef62d0d8fb70ff8168bdfd7 Mon Sep 17 00:00:00 2001
From: Elias Carotti <eliascrt _at_ amazon _dot_ it>
Date: Fri, 15 Sep 2023 20:05:43 +0200
Subject: [PATCH] Add the SSE calculation for libx264.

Since libx264 only provides a per-frame per-channel PSNR, this is inverted to get back the SSE.
---
 libavcodec/libx264.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/libavcodec/libx264.c b/libavcodec/libx264.c
index 77a9f173b4..0fae155ff2 100644
--- a/libavcodec/libx264.c
+++ b/libavcodec/libx264.c
@@ -129,6 +129,8 @@  typedef struct X264Context {
     int roi_warned;

     int mb_info;
+
+    int64_t sse[3];
 } X264Context;

 static void X264_log(void *p, int level, const char *fmt, va_list args)
@@ -726,7 +728,40 @@  FF_ENABLE_DEPRECATION_WARNINGS

     pkt->flags |= AV_PKT_FLAG_KEY*pic_out.b_keyframe;
     if (ret) {
-        ff_side_data_set_encoder_stats(pkt, (pic_out.i_qpplus1 - 1) * FF_QP2LAMBDA, NULL, 0, pict_type);
+        const AVPixFmtDescriptor *pix_desc = av_pix_fmt_desc_get(csp_to_pixfmt(pic_out.img.i_csp));
+        int error_count = 0;
+        int64_t *errors = NULL;
+
+        if (ctx->flags & AV_CODEC_FLAG_PSNR) {
+            double scale[3] = { 1,
+                (1 << pix_desc->log2_chroma_h) * (double)(1 << pix_desc->log2_chroma_w),
+                (1 << pix_desc->log2_chroma_h) * (double)(1 << pix_desc->log2_chroma_w),
+            };
+            double mse;
+            int i;
+
+            error_count = pix_desc->nb_components;
+
+            av_log(ctx, AV_LOG_DEBUG, "PSNR values from libx264: %.3f %.3f %.3f.\n",
+                   pic_out.prop.f_psnr[0], pic_out.prop.f_psnr[1], pic_out.prop.f_psnr[2]);
+
+            for (i = 0; i < pix_desc->nb_components; ++i) {
+                double max_value = (double)(1 << pix_desc->comp[i].depth) - 1.0;
+                double plane_size = ctx->width * ctx->height / scale[i];
+
+                /* psnr = -10 * log10(max_value * max_value / mse) */
+                mse = (max_value * max_value) / pow(10, -pic_out.prop.f_psnr[i] / 10.0);
+
+                /* SSE = MSE * width * height / scale -> because of possible chroma downsampling */
+                x4->sse[i] = (int64_t)floor(mse * plane_size);
+            };
+
+            errors = x4->sse;
+        }
+
+        ff_side_data_set_encoder_stats(pkt, (pic_out.i_qpplus1 - 1) * FF_QP2LAMBDA,
+                                       errors, error_count, pict_type);
+
         if (wallclock)
             ff_side_data_set_prft(pkt, wallclock);
     }
--
2.34.1