Message ID | 20161118141137.1888-1-h.leppkes@gmail.com |
---|---|
State | Accepted |
Headers | show |
On Fri, Nov 18, 2016 at 3:11 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote: > Based on a patch by Jun Zhao <mypopydev@gmail.com> > --- > libavcodec/vc1dec.c | 41 +++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 39 insertions(+), 2 deletions(-) > As a quick run down on the whole set: - This does fix decoding of frame-coded samples (ie. fate-vc1_sa10091 and fate-vc1_sa20021) on NVIDIA using DXVA2, no regressions using DXVA2 are known - This does fix decoding of these samples on Intel GPUs using VAAPI, however it appears to break on AMD using VAAPI. Important to note here however is that this change is matching up with the vaapi spec, and the AMD implementation is more or less hacky. - VDPAU seems to overall be unimpressed and keeps working as before. I hope I summarized the non-DXVA2 cases properly, as I didn't test those personally, but relied on data from Mark Thompson. Despite the breakage of AMD VAAPI decoding, this change appears to be correct in the sense that it matches the VAAPI and DXVA2 specs on how to handle slices and fixes decoding on two different GPUs, using those two APIs. - Hendrik
On Fri, Nov 18, 2016 at 03:18:27PM +0100, Hendrik Leppkes wrote: > On Fri, Nov 18, 2016 at 3:11 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote: > > Based on a patch by Jun Zhao <mypopydev@gmail.com> > > --- > > libavcodec/vc1dec.c | 41 +++++++++++++++++++++++++++++++++++++++-- > > 1 file changed, 39 insertions(+), 2 deletions(-) > > > > As a quick run down on the whole set: > > - This does fix decoding of frame-coded samples (ie. fate-vc1_sa10091 > and fate-vc1_sa20021) on NVIDIA using DXVA2, no regressions using > DXVA2 are known > - This does fix decoding of these samples on Intel GPUs using VAAPI, > however it appears to break on AMD using VAAPI. Important to note here > however is that this change is matching up with the vaapi spec, and > the AMD implementation is more or less hacky. > - VDPAU seems to overall be unimpressed and keeps working as before. > > I hope I summarized the non-DXVA2 cases properly, as I didn't test > those personally, but relied on data from Mark Thompson. > Despite the breakage of AMD VAAPI decoding, this change appears to be > correct in the sense that it matches the VAAPI and DXVA2 specs on how > to handle slices and fixes decoding on two different GPUs, using those > two APIs. can the amd case be detected and the implementation adapted so everything works ? [...]
On Fri, Nov 18, 2016 at 6:40 PM, Michael Niedermayer <michael@niedermayer.cc> wrote: > On Fri, Nov 18, 2016 at 03:18:27PM +0100, Hendrik Leppkes wrote: >> On Fri, Nov 18, 2016 at 3:11 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote: >> > Based on a patch by Jun Zhao <mypopydev@gmail.com> >> > --- >> > libavcodec/vc1dec.c | 41 +++++++++++++++++++++++++++++++++++++++-- >> > 1 file changed, 39 insertions(+), 2 deletions(-) >> > >> >> As a quick run down on the whole set: >> >> - This does fix decoding of frame-coded samples (ie. fate-vc1_sa10091 >> and fate-vc1_sa20021) on NVIDIA using DXVA2, no regressions using >> DXVA2 are known >> - This does fix decoding of these samples on Intel GPUs using VAAPI, >> however it appears to break on AMD using VAAPI. Important to note here >> however is that this change is matching up with the vaapi spec, and >> the AMD implementation is more or less hacky. >> - VDPAU seems to overall be unimpressed and keeps working as before. >> >> I hope I summarized the non-DXVA2 cases properly, as I didn't test >> those personally, but relied on data from Mark Thompson. >> Despite the breakage of AMD VAAPI decoding, this change appears to be >> correct in the sense that it matches the VAAPI and DXVA2 specs on how >> to handle slices and fixes decoding on two different GPUs, using those >> two APIs. > > can the amd case be detected and the implementation adapted so > everything works ? > Not really. Lets just let them fix the implementation in mesa, Mark tells me its generally considered hacky and not very stable/reliable in the first place. - Hendrik
On 18/11/16 14:18, Hendrik Leppkes wrote: > On Fri, Nov 18, 2016 at 3:11 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote: >> Based on a patch by Jun Zhao <mypopydev@gmail.com> >> --- >> libavcodec/vc1dec.c | 41 +++++++++++++++++++++++++++++++++++++++-- >> 1 file changed, 39 insertions(+), 2 deletions(-) >> > > As a quick run down on the whole set: > > - This does fix decoding of frame-coded samples (ie. fate-vc1_sa10091 > and fate-vc1_sa20021) on NVIDIA using DXVA2, no regressions using > DXVA2 are known > - This does fix decoding of these samples on Intel GPUs using VAAPI, > however it appears to break on AMD using VAAPI. Important to note here > however is that this change is matching up with the vaapi spec, and > the AMD implementation is more or less hacky. > - VDPAU seems to overall be unimpressed and keeps working as before. > > I hope I summarized the non-DXVA2 cases properly, as I didn't test > those personally, but relied on data from Mark Thompson. > Despite the breakage of AMD VAAPI decoding, this change appears to be > correct in the sense that it matches the VAAPI and DXVA2 specs on how > to handle slices and fixes decoding on two different GPUs, using those > two APIs. Yes, the mesa driver was subverting the API to make it work before (and it may not even have been deliberate). Retested, patch LGTM: this is definitely a positive change even with that regression. I'll add looking at the mesa driver to my todo list, it should certainly be fixed there rather than in lavc. Thanks, - Mark
2016-11-18 19:42 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>: > On Fri, Nov 18, 2016 at 6:40 PM, Michael Niedermayer > <michael@niedermayer.cc> wrote: >> On Fri, Nov 18, 2016 at 03:18:27PM +0100, Hendrik Leppkes wrote: >>> - This does fix decoding of these samples on Intel GPUs using VAAPI, >>> however it appears to break on AMD using VAAPI. Important to note here >>> however is that this change is matching up with the vaapi spec, and >>> the AMD implementation is more or less hacky. >>> - VDPAU seems to overall be unimpressed and keeps working as before. >>> >>> I hope I summarized the non-DXVA2 cases properly, as I didn't test >>> those personally, but relied on data from Mark Thompson. >>> Despite the breakage of AMD VAAPI decoding, this change appears to be >>> correct in the sense that it matches the VAAPI and DXVA2 specs on how >>> to handle slices and fixes decoding on two different GPUs, using those >>> two APIs. >> >> can the amd case be detected and the implementation adapted so >> everything works ? > > Not really. There is no API to read the version of the mesa driver? Carl Eugen
2016-11-18 15:18 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>: > - This does fix decoding of these samples on Intel GPUs using VAAPI, > however it appears to break on AMD using VAAPI. Then it needs at least a Changelog entry. Carl Eugen
On 20/11/16 16:16, Carl Eugen Hoyos wrote: > 2016-11-18 19:42 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>: >> On Fri, Nov 18, 2016 at 6:40 PM, Michael Niedermayer >> <michael@niedermayer.cc> wrote: >>> On Fri, Nov 18, 2016 at 03:18:27PM +0100, Hendrik Leppkes wrote: > >>>> - This does fix decoding of these samples on Intel GPUs using VAAPI, >>>> however it appears to break on AMD using VAAPI. Important to note here >>>> however is that this change is matching up with the vaapi spec, and >>>> the AMD implementation is more or less hacky. >>>> - VDPAU seems to overall be unimpressed and keeps working as before. >>>> >>>> I hope I summarized the non-DXVA2 cases properly, as I didn't test >>>> those personally, but relied on data from Mark Thompson. >>>> Despite the breakage of AMD VAAPI decoding, this change appears to be >>>> correct in the sense that it matches the VAAPI and DXVA2 specs on how >>>> to handle slices and fixes decoding on two different GPUs, using those >>>> two APIs. >>> >>> can the amd case be detected and the implementation adapted so >>> everything works ? >> >> Not really. > > There is no API to read the version of the mesa driver? The one useful call is vaQueryVendorString(), which indeed returns useful results for i965. Unfortunately, all mesa returns here is a fixed string: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/context.c#n176 - Mark
On Sun, Nov 20, 2016 at 5:16 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: > 2016-11-18 19:42 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>: >> On Fri, Nov 18, 2016 at 6:40 PM, Michael Niedermayer >> <michael@niedermayer.cc> wrote: >>> On Fri, Nov 18, 2016 at 03:18:27PM +0100, Hendrik Leppkes wrote: > >>>> - This does fix decoding of these samples on Intel GPUs using VAAPI, >>>> however it appears to break on AMD using VAAPI. Important to note here >>>> however is that this change is matching up with the vaapi spec, and >>>> the AMD implementation is more or less hacky. >>>> - VDPAU seems to overall be unimpressed and keeps working as before. >>>> >>>> I hope I summarized the non-DXVA2 cases properly, as I didn't test >>>> those personally, but relied on data from Mark Thompson. >>>> Despite the breakage of AMD VAAPI decoding, this change appears to be >>>> correct in the sense that it matches the VAAPI and DXVA2 specs on how >>>> to handle slices and fixes decoding on two different GPUs, using those >>>> two APIs. >>> >>> can the amd case be detected and the implementation adapted so >>> everything works ? >> >> Not really. > > There is no API to read the version of the mesa driver? > Perhaps in some obscure way, but this is in generic code which doesn't even know what kind of hwaccel is being used, nevermind access to the internals of this hwaccel. Trying to hack this in would be *extremely* ugly and breach a bunch of the encapsulations we have to shield generic decoders from any hwaccel implementation specifics. Really, mesa should just fix their code to adhere to the vaapi specification for slices In reality, I have never seen a real-world video using these features, otherwise I would have known about this lack of support in DXVA2, which I did not. So its just not worth hacking around driver issues in a very ugly way. - Hendrik
2016-11-20 17:50 GMT+01:00 Mark Thompson <sw@jkqxz.net>: > On 20/11/16 16:16, Carl Eugen Hoyos wrote: >> There is no API to read the version of the mesa driver? > > The one useful call is vaQueryVendorString(), which indeed returns useful results for i965. Unfortunately, all mesa returns here is a fixed string: > > https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/context.c#n176 Thank you both for explaining! Please consider to make sure mesa knows about the issue... Carl Eugen
On Sun, Nov 20, 2016 at 7:14 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: > 2016-11-20 17:50 GMT+01:00 Mark Thompson <sw@jkqxz.net>: >> On 20/11/16 16:16, Carl Eugen Hoyos wrote: > >>> There is no API to read the version of the mesa driver? >> >> The one useful call is vaQueryVendorString(), which indeed returns useful results for i965. Unfortunately, all mesa returns here is a fixed string: >> >> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/context.c#n176 > > Thank you both for explaining! > > Please consider to make sure mesa knows about the issue... > Unless there are more comments I'll apply this tomorrow with changelog and micro version bump. - Hendrik
On Fri, Nov 25, 2016 at 10:30 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote: > On Sun, Nov 20, 2016 at 7:14 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: >> 2016-11-20 17:50 GMT+01:00 Mark Thompson <sw@jkqxz.net>: >>> On 20/11/16 16:16, Carl Eugen Hoyos wrote: >> >>>> There is no API to read the version of the mesa driver? >>> >>> The one useful call is vaQueryVendorString(), which indeed returns useful results for i965. Unfortunately, all mesa returns here is a fixed string: >>> >>> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/context.c#n176 >> >> Thank you both for explaining! >> >> Please consider to make sure mesa knows about the issue... >> > > Unless there are more comments I'll apply this tomorrow with changelog > and micro version bump. > And pushed. - Hendrik
diff --git a/libavcodec/vc1dec.c b/libavcodec/vc1dec.c index 4f78aa8..75d3365 100644 --- a/libavcodec/vc1dec.c +++ b/libavcodec/vc1dec.c @@ -634,6 +634,8 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data, uint8_t *buf; GetBitContext gb; int mby_start; + const uint8_t *rawbuf; + int raw_size; } *slices = NULL, *tmp; v->second_field = 0; @@ -716,6 +718,8 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data, /* assuming that the field marker is at the exact middle, hope it's correct */ slices[n_slices].mby_start = s->mb_height + 1 >> 1; + slices[n_slices].rawbuf = start; + slices[n_slices].raw_size = size + 4; n_slices1 = n_slices - 1; // index of the last slice of the first field n_slices++; break; @@ -743,6 +747,8 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data, init_get_bits(&slices[n_slices].gb, slices[n_slices].buf, buf_size3 << 3); slices[n_slices].mby_start = get_bits(&slices[n_slices].gb, 9); + slices[n_slices].rawbuf = start; + slices[n_slices].raw_size = size + 4; n_slices++; break; } @@ -779,6 +785,8 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data, init_get_bits(&slices[n_slices].gb, slices[n_slices].buf, buf_size3 << 3); slices[n_slices].mby_start = s->mb_height + 1 >> 1; + slices[n_slices].rawbuf = divider; + slices[n_slices].raw_size = buf + buf_size - divider; n_slices1 = n_slices - 1; n_slices++; } @@ -921,6 +929,7 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data, } else #endif if (avctx->hwaccel) { + s->mb_y = 0; if (v->field_mode && buf_start_second_field) { // decode first field s->picture_structure = PICT_BOTTOM_FIELD - v->tff; @@ -953,8 +962,36 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data, s->picture_structure = PICT_FRAME; if ((ret = avctx->hwaccel->start_frame(avctx, buf_start, (buf + buf_size) - buf_start)) < 0) goto err; - if ((ret = avctx->hwaccel->decode_slice(avctx, buf_start, (buf + buf_size) - buf_start)) < 0) - goto err; + + if (n_slices == 0) { + // no slices, decode the frame as-is + if ((ret = avctx->hwaccel->decode_slice(avctx, buf_start, (buf + buf_size) - buf_start)) < 0) + goto err; + } else { + // decode the frame part as the first slice + if ((ret = avctx->hwaccel->decode_slice(avctx, buf_start, slices[0].rawbuf - buf_start)) < 0) + goto err; + + // and process the slices as additional slices afterwards + for (i = 0 ; i < n_slices; i++) { + s->gb = slices[i].gb; + s->mb_y = slices[i].mby_start; + + v->pic_header_flag = get_bits1(&s->gb); + if (v->pic_header_flag) { + if (ff_vc1_parse_frame_header_adv(v, &s->gb) < 0) { + av_log(v->s.avctx, AV_LOG_ERROR, "Slice header damaged\n"); + ret = AVERROR_INVALIDDATA; + if (avctx->err_recognition & AV_EF_EXPLODE) + goto err; + continue; + } + } + + if ((ret = avctx->hwaccel->decode_slice(avctx, slices[i].rawbuf, slices[i].raw_size)) < 0) + goto err; + } + } if ((ret = avctx->hwaccel->end_frame(avctx)) < 0) goto err; }