diff mbox

[FFmpeg-devel,1/2] vc1dec: support multiple slices in frame coded images with hwaccel

Message ID 20161118141137.1888-1-h.leppkes@gmail.com
State Accepted
Headers show

Commit Message

Hendrik Leppkes Nov. 18, 2016, 2:11 p.m. UTC
Based on a patch by Jun Zhao <mypopydev@gmail.com>
---
 libavcodec/vc1dec.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

Comments

Hendrik Leppkes Nov. 18, 2016, 2:18 p.m. UTC | #1
On Fri, Nov 18, 2016 at 3:11 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
> Based on a patch by Jun Zhao <mypopydev@gmail.com>
> ---
>  libavcodec/vc1dec.c | 41 +++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 39 insertions(+), 2 deletions(-)
>

As a quick run down on the whole set:

- This does fix decoding of frame-coded samples (ie. fate-vc1_sa10091
and fate-vc1_sa20021) on NVIDIA using DXVA2, no regressions using
DXVA2 are known
- This does fix decoding of these samples on Intel GPUs using VAAPI,
however it appears to break on AMD using VAAPI. Important to note here
however is that this change is matching up with the vaapi spec, and
the AMD implementation is more or less hacky.
- VDPAU seems to overall be unimpressed and keeps working as before.

I hope I summarized the non-DXVA2 cases properly, as I didn't test
those personally, but relied on data from Mark Thompson.
Despite the breakage of AMD VAAPI decoding, this change appears to be
correct in the sense that it matches the VAAPI and DXVA2 specs on how
to handle slices and fixes decoding on two different GPUs, using those
two APIs.

- Hendrik
Michael Niedermayer Nov. 18, 2016, 5:40 p.m. UTC | #2
On Fri, Nov 18, 2016 at 03:18:27PM +0100, Hendrik Leppkes wrote:
> On Fri, Nov 18, 2016 at 3:11 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
> > Based on a patch by Jun Zhao <mypopydev@gmail.com>
> > ---
> >  libavcodec/vc1dec.c | 41 +++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 39 insertions(+), 2 deletions(-)
> >
> 
> As a quick run down on the whole set:
> 
> - This does fix decoding of frame-coded samples (ie. fate-vc1_sa10091
> and fate-vc1_sa20021) on NVIDIA using DXVA2, no regressions using
> DXVA2 are known
> - This does fix decoding of these samples on Intel GPUs using VAAPI,
> however it appears to break on AMD using VAAPI. Important to note here
> however is that this change is matching up with the vaapi spec, and
> the AMD implementation is more or less hacky.
> - VDPAU seems to overall be unimpressed and keeps working as before.
> 
> I hope I summarized the non-DXVA2 cases properly, as I didn't test
> those personally, but relied on data from Mark Thompson.
> Despite the breakage of AMD VAAPI decoding, this change appears to be
> correct in the sense that it matches the VAAPI and DXVA2 specs on how
> to handle slices and fixes decoding on two different GPUs, using those
> two APIs.

can the amd case be detected and the implementation adapted so
everything works ?


[...]
Hendrik Leppkes Nov. 18, 2016, 6:42 p.m. UTC | #3
On Fri, Nov 18, 2016 at 6:40 PM, Michael Niedermayer
<michael@niedermayer.cc> wrote:
> On Fri, Nov 18, 2016 at 03:18:27PM +0100, Hendrik Leppkes wrote:
>> On Fri, Nov 18, 2016 at 3:11 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
>> > Based on a patch by Jun Zhao <mypopydev@gmail.com>
>> > ---
>> >  libavcodec/vc1dec.c | 41 +++++++++++++++++++++++++++++++++++++++--
>> >  1 file changed, 39 insertions(+), 2 deletions(-)
>> >
>>
>> As a quick run down on the whole set:
>>
>> - This does fix decoding of frame-coded samples (ie. fate-vc1_sa10091
>> and fate-vc1_sa20021) on NVIDIA using DXVA2, no regressions using
>> DXVA2 are known
>> - This does fix decoding of these samples on Intel GPUs using VAAPI,
>> however it appears to break on AMD using VAAPI. Important to note here
>> however is that this change is matching up with the vaapi spec, and
>> the AMD implementation is more or less hacky.
>> - VDPAU seems to overall be unimpressed and keeps working as before.
>>
>> I hope I summarized the non-DXVA2 cases properly, as I didn't test
>> those personally, but relied on data from Mark Thompson.
>> Despite the breakage of AMD VAAPI decoding, this change appears to be
>> correct in the sense that it matches the VAAPI and DXVA2 specs on how
>> to handle slices and fixes decoding on two different GPUs, using those
>> two APIs.
>
> can the amd case be detected and the implementation adapted so
> everything works ?
>

Not really. Lets just let them fix the implementation in mesa, Mark
tells me its generally considered hacky and not very stable/reliable
in the first place.

- Hendrik
Mark Thompson Nov. 20, 2016, 3:20 p.m. UTC | #4
On 18/11/16 14:18, Hendrik Leppkes wrote:
> On Fri, Nov 18, 2016 at 3:11 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
>> Based on a patch by Jun Zhao <mypopydev@gmail.com>
>> ---
>>  libavcodec/vc1dec.c | 41 +++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 39 insertions(+), 2 deletions(-)
>>
> 
> As a quick run down on the whole set:
> 
> - This does fix decoding of frame-coded samples (ie. fate-vc1_sa10091
> and fate-vc1_sa20021) on NVIDIA using DXVA2, no regressions using
> DXVA2 are known
> - This does fix decoding of these samples on Intel GPUs using VAAPI,
> however it appears to break on AMD using VAAPI. Important to note here
> however is that this change is matching up with the vaapi spec, and
> the AMD implementation is more or less hacky.
> - VDPAU seems to overall be unimpressed and keeps working as before.
> 
> I hope I summarized the non-DXVA2 cases properly, as I didn't test
> those personally, but relied on data from Mark Thompson.
> Despite the breakage of AMD VAAPI decoding, this change appears to be
> correct in the sense that it matches the VAAPI and DXVA2 specs on how
> to handle slices and fixes decoding on two different GPUs, using those
> two APIs.

Yes, the mesa driver was subverting the API to make it work before (and it may not even have been deliberate).

Retested, patch LGTM: this is definitely a positive change even with that regression.

I'll add looking at the mesa driver to my todo list, it should certainly be fixed there rather than in lavc.

Thanks,

- Mark
Carl Eugen Hoyos Nov. 20, 2016, 4:16 p.m. UTC | #5
2016-11-18 19:42 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>:
> On Fri, Nov 18, 2016 at 6:40 PM, Michael Niedermayer
> <michael@niedermayer.cc> wrote:
>> On Fri, Nov 18, 2016 at 03:18:27PM +0100, Hendrik Leppkes wrote:

>>> - This does fix decoding of these samples on Intel GPUs using VAAPI,
>>> however it appears to break on AMD using VAAPI. Important to note here
>>> however is that this change is matching up with the vaapi spec, and
>>> the AMD implementation is more or less hacky.
>>> - VDPAU seems to overall be unimpressed and keeps working as before.
>>>
>>> I hope I summarized the non-DXVA2 cases properly, as I didn't test
>>> those personally, but relied on data from Mark Thompson.
>>> Despite the breakage of AMD VAAPI decoding, this change appears to be
>>> correct in the sense that it matches the VAAPI and DXVA2 specs on how
>>> to handle slices and fixes decoding on two different GPUs, using those
>>> two APIs.
>>
>> can the amd case be detected and the implementation adapted so
>> everything works ?
>
> Not really.

There is no API to read the version of the mesa driver?

Carl Eugen
Carl Eugen Hoyos Nov. 20, 2016, 4:19 p.m. UTC | #6
2016-11-18 15:18 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>:

> - This does fix decoding of these samples on Intel GPUs using VAAPI,

> however it appears to break on AMD using VAAPI.

Then it needs at least a Changelog entry.

Carl Eugen
Mark Thompson Nov. 20, 2016, 4:50 p.m. UTC | #7
On 20/11/16 16:16, Carl Eugen Hoyos wrote:
> 2016-11-18 19:42 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>:
>> On Fri, Nov 18, 2016 at 6:40 PM, Michael Niedermayer
>> <michael@niedermayer.cc> wrote:
>>> On Fri, Nov 18, 2016 at 03:18:27PM +0100, Hendrik Leppkes wrote:
> 
>>>> - This does fix decoding of these samples on Intel GPUs using VAAPI,
>>>> however it appears to break on AMD using VAAPI. Important to note here
>>>> however is that this change is matching up with the vaapi spec, and
>>>> the AMD implementation is more or less hacky.
>>>> - VDPAU seems to overall be unimpressed and keeps working as before.
>>>>
>>>> I hope I summarized the non-DXVA2 cases properly, as I didn't test
>>>> those personally, but relied on data from Mark Thompson.
>>>> Despite the breakage of AMD VAAPI decoding, this change appears to be
>>>> correct in the sense that it matches the VAAPI and DXVA2 specs on how
>>>> to handle slices and fixes decoding on two different GPUs, using those
>>>> two APIs.
>>>
>>> can the amd case be detected and the implementation adapted so
>>> everything works ?
>>
>> Not really.
> 
> There is no API to read the version of the mesa driver?

The one useful call is vaQueryVendorString(), which indeed returns useful results for i965.  Unfortunately, all mesa returns here is a fixed string:

https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/context.c#n176

- Mark
Hendrik Leppkes Nov. 20, 2016, 4:54 p.m. UTC | #8
On Sun, Nov 20, 2016 at 5:16 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2016-11-18 19:42 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>:
>> On Fri, Nov 18, 2016 at 6:40 PM, Michael Niedermayer
>> <michael@niedermayer.cc> wrote:
>>> On Fri, Nov 18, 2016 at 03:18:27PM +0100, Hendrik Leppkes wrote:
>
>>>> - This does fix decoding of these samples on Intel GPUs using VAAPI,
>>>> however it appears to break on AMD using VAAPI. Important to note here
>>>> however is that this change is matching up with the vaapi spec, and
>>>> the AMD implementation is more or less hacky.
>>>> - VDPAU seems to overall be unimpressed and keeps working as before.
>>>>
>>>> I hope I summarized the non-DXVA2 cases properly, as I didn't test
>>>> those personally, but relied on data from Mark Thompson.
>>>> Despite the breakage of AMD VAAPI decoding, this change appears to be
>>>> correct in the sense that it matches the VAAPI and DXVA2 specs on how
>>>> to handle slices and fixes decoding on two different GPUs, using those
>>>> two APIs.
>>>
>>> can the amd case be detected and the implementation adapted so
>>> everything works ?
>>
>> Not really.
>
> There is no API to read the version of the mesa driver?
>

Perhaps in some obscure way, but this is in generic code which doesn't
even know what kind of hwaccel is being used, nevermind access to the
internals of this hwaccel.
Trying to hack this in would be *extremely* ugly and breach a bunch of
the encapsulations we have to shield generic decoders from any hwaccel
implementation specifics.

Really, mesa should just fix their code to adhere to the vaapi
specification for slices
In reality, I have never seen a real-world video using these features,
otherwise I would have known about this lack of support in DXVA2,
which I did not. So its just not worth hacking around driver issues in
a very ugly way.

- Hendrik
Carl Eugen Hoyos Nov. 20, 2016, 6:14 p.m. UTC | #9
2016-11-20 17:50 GMT+01:00 Mark Thompson <sw@jkqxz.net>:
> On 20/11/16 16:16, Carl Eugen Hoyos wrote:

>> There is no API to read the version of the mesa driver?
>
> The one useful call is vaQueryVendorString(), which indeed returns useful results for i965.  Unfortunately, all mesa returns here is a fixed string:
>
> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/context.c#n176

Thank you both for explaining!

Please consider to make sure mesa knows about the issue...

Carl Eugen
Hendrik Leppkes Nov. 25, 2016, 9:30 p.m. UTC | #10
On Sun, Nov 20, 2016 at 7:14 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2016-11-20 17:50 GMT+01:00 Mark Thompson <sw@jkqxz.net>:
>> On 20/11/16 16:16, Carl Eugen Hoyos wrote:
>
>>> There is no API to read the version of the mesa driver?
>>
>> The one useful call is vaQueryVendorString(), which indeed returns useful results for i965.  Unfortunately, all mesa returns here is a fixed string:
>>
>> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/context.c#n176
>
> Thank you both for explaining!
>
> Please consider to make sure mesa knows about the issue...
>

Unless there are more comments I'll apply this tomorrow with changelog
and micro version bump.

- Hendrik
Hendrik Leppkes Nov. 26, 2016, 12:19 p.m. UTC | #11
On Fri, Nov 25, 2016 at 10:30 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
> On Sun, Nov 20, 2016 at 7:14 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> 2016-11-20 17:50 GMT+01:00 Mark Thompson <sw@jkqxz.net>:
>>> On 20/11/16 16:16, Carl Eugen Hoyos wrote:
>>
>>>> There is no API to read the version of the mesa driver?
>>>
>>> The one useful call is vaQueryVendorString(), which indeed returns useful results for i965.  Unfortunately, all mesa returns here is a fixed string:
>>>
>>> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/context.c#n176
>>
>> Thank you both for explaining!
>>
>> Please consider to make sure mesa knows about the issue...
>>
>
> Unless there are more comments I'll apply this tomorrow with changelog
> and micro version bump.
>

And pushed.

- Hendrik
diff mbox

Patch

diff --git a/libavcodec/vc1dec.c b/libavcodec/vc1dec.c
index 4f78aa8..75d3365 100644
--- a/libavcodec/vc1dec.c
+++ b/libavcodec/vc1dec.c
@@ -634,6 +634,8 @@  static int vc1_decode_frame(AVCodecContext *avctx, void *data,
         uint8_t *buf;
         GetBitContext gb;
         int mby_start;
+        const uint8_t *rawbuf;
+        int raw_size;
     } *slices = NULL, *tmp;
 
     v->second_field = 0;
@@ -716,6 +718,8 @@  static int vc1_decode_frame(AVCodecContext *avctx, void *data,
                     /* assuming that the field marker is at the exact middle,
                        hope it's correct */
                     slices[n_slices].mby_start = s->mb_height + 1 >> 1;
+                    slices[n_slices].rawbuf = start;
+                    slices[n_slices].raw_size = size + 4;
                     n_slices1 = n_slices - 1; // index of the last slice of the first field
                     n_slices++;
                     break;
@@ -743,6 +747,8 @@  static int vc1_decode_frame(AVCodecContext *avctx, void *data,
                     init_get_bits(&slices[n_slices].gb, slices[n_slices].buf,
                                   buf_size3 << 3);
                     slices[n_slices].mby_start = get_bits(&slices[n_slices].gb, 9);
+                    slices[n_slices].rawbuf = start;
+                    slices[n_slices].raw_size = size + 4;
                     n_slices++;
                     break;
                 }
@@ -779,6 +785,8 @@  static int vc1_decode_frame(AVCodecContext *avctx, void *data,
                 init_get_bits(&slices[n_slices].gb, slices[n_slices].buf,
                               buf_size3 << 3);
                 slices[n_slices].mby_start = s->mb_height + 1 >> 1;
+                slices[n_slices].rawbuf = divider;
+                slices[n_slices].raw_size = buf + buf_size - divider;
                 n_slices1 = n_slices - 1;
                 n_slices++;
             }
@@ -921,6 +929,7 @@  static int vc1_decode_frame(AVCodecContext *avctx, void *data,
     } else
 #endif
     if (avctx->hwaccel) {
+        s->mb_y = 0;
         if (v->field_mode && buf_start_second_field) {
             // decode first field
             s->picture_structure = PICT_BOTTOM_FIELD - v->tff;
@@ -953,8 +962,36 @@  static int vc1_decode_frame(AVCodecContext *avctx, void *data,
             s->picture_structure = PICT_FRAME;
             if ((ret = avctx->hwaccel->start_frame(avctx, buf_start, (buf + buf_size) - buf_start)) < 0)
                 goto err;
-            if ((ret = avctx->hwaccel->decode_slice(avctx, buf_start, (buf + buf_size) - buf_start)) < 0)
-                goto err;
+
+            if (n_slices == 0) {
+                // no slices, decode the frame as-is
+                if ((ret = avctx->hwaccel->decode_slice(avctx, buf_start, (buf + buf_size) - buf_start)) < 0)
+                    goto err;
+            } else {
+                // decode the frame part as the first slice
+                if ((ret = avctx->hwaccel->decode_slice(avctx, buf_start, slices[0].rawbuf - buf_start)) < 0)
+                    goto err;
+
+                // and process the slices as additional slices afterwards
+                for (i = 0 ; i < n_slices; i++) {
+                    s->gb = slices[i].gb;
+                    s->mb_y = slices[i].mby_start;
+
+                    v->pic_header_flag = get_bits1(&s->gb);
+                    if (v->pic_header_flag) {
+                        if (ff_vc1_parse_frame_header_adv(v, &s->gb) < 0) {
+                            av_log(v->s.avctx, AV_LOG_ERROR, "Slice header damaged\n");
+                            ret = AVERROR_INVALIDDATA;
+                            if (avctx->err_recognition & AV_EF_EXPLODE)
+                                goto err;
+                            continue;
+                        }
+                    }
+
+                    if ((ret = avctx->hwaccel->decode_slice(avctx, slices[i].rawbuf, slices[i].raw_size)) < 0)
+                        goto err;
+                }
+            }
             if ((ret = avctx->hwaccel->end_frame(avctx)) < 0)
                 goto err;
         }