
[FFmpeg-devel] avcodec/jpeg2000dec: support of 2 fields in 1 AVPacket

Message ID fa7a1a89-35fd-4a61-be42-b3bf25662031@mediaarea.net
State New
Series [FFmpeg-devel] avcodec/jpeg2000dec: support of 2 fields in 1 AVPacket

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Jerome Martinez Feb. 2, 2024, 3:55 p.m. UTC
Before this patch, the FFmpeg MXF parser correctly detects content with 
2 fields in 1 AVPacket as e.g. interlaced 720x486, but the FFmpeg JPEG 
2000 decoder reads the JPEG 2000 SIZ header without understanding that 
the indicated height is the height of 1 field only, so it overwrites the 
frame size info with e.g. 720x243 and also completely discards the 
second field, which leads to the decoding of only half of the stored 
content as "progressive" 720x243 flagged interlaced.

Example file:
https://www.digitizationguidelines.gov/guidelines/MXF_sampleFiles/RDD48-sample12-gf-jpeg2000-ntsc-4.2.zip

Before this patch:
Stream #0:0: Video: jpeg2000, yuv422p10le(bottom coded first (swapped)), 
720x243, lossless, SAR 9:20 DAR 4:3, 29.97 tbr, 29.97 tbn, 29.97 tbc

After this patch:
Stream #0:0: Video: jpeg2000, yuv422p10le(bottom coded first (swapped)), 
720x486, lossless, SAR 9:10 DAR 4:3, 29.97 fps, 29.97 tbr, 29.97 tbn
From 5242971da7d2cf8d8713144e4a7bcc4aa06437c4 Mon Sep 17 00:00:00 2001
From: Jerome Martinez <jerome@mediaarea.net>
Date: Thu, 1 Feb 2024 17:58:02 +0100
Subject: [PATCH] avcodec/jpeg2000dec: support of 2 fields in 1 AVPacket

---
 libavcodec/jpeg2000dec.c | 87 +++++++++++++++++++++++++++++++++++++++++++-----
 libavcodec/jpeg2000dec.h |  5 +++
 2 files changed, 84 insertions(+), 8 deletions(-)

Comments

Tomas Härdin Feb. 3, 2024, 10 a.m. UTC | #1
fre 2024-02-02 klockan 16:55 +0100 skrev Jerome Martinez:
> Before this patch, the FFmpeg MXF parser correctly detects content
> with 
> 2 fields in 1 AVPacket as e.g. interlaced 720x486 but the FFmpeg JPEG
> 2000 decoder reads the JPEG 2000 SIZ header without understanding
> that 
> the indicated height is the height of 1 field only so overwrites the 
> frame size info with e.g. 720x243, and also completely discards the 
> second frame, which lead to the decoding of only half of the stored 
> content as "progressive" 720x243 flagged interlaced.

Is the decoder really the right place to do this? Surely this happens
with more codecs than just j2k. Isn't it also a parser's job to do this
kind of stuff?

The logic that scans the packet for two SOI markers shouldn't be
necessary if the relevant information is carried over from the MXF
demuxer. It also makes the j2k decoder harder to parallelize. You might
also need the StoredF2Offset.

/Tomas
Jerome Martinez Feb. 3, 2024, 10:51 a.m. UTC | #2
On 03/02/2024 11:00, Tomas Härdin wrote:
> fre 2024-02-02 klockan 16:55 +0100 skrev Jerome Martinez:
>> Before this patch, the FFmpeg MXF parser correctly detects content
>> with
>> 2 fields in 1 AVPacket as e.g. interlaced 720x486 but the FFmpeg JPEG
>> 2000 decoder reads the JPEG 2000 SIZ header without understanding
>> that
>> the indicated height is the height of 1 field only so overwrites the
>> frame size info with e.g. 720x243, and also completely discards the
>> second frame, which lead to the decoding of only half of the stored
>> content as "progressive" 720x243 flagged interlaced.
> Is the decoder really the right place to do this? Surely this happens
> with more codecs than just j2k. Isnt it also a parser's job to do this
> kind of stuff?

Both solutions have pros and cons:
- Doing it in the decoder fixes the issue for all containers but only 
one codec
- Doing it in the demuxer fixes the issue for one container but all codecs
And currently I know of only one container and one codec having this 
issue.

My choice to implement this in the decoder comes from the idea that 
doing it in the MXF demuxer seems more hacky: decode 2 AVPackets 
(crafted from a single MXF packet), then do a RAM copy of the decoded 
(so huge) content to interleave the fields into 1 frame (pressure on RAM 
bandwidth), plus adapt the frame metadata (the height is overwritten by 
the decoder, so it would need to be overwritten again); doing it in the 
decoder seems less intrusive.

If moving this to the demuxer is the only acceptable solution, I will do 
it, but I would like to be sure that there is a consensus on that among 
FFmpeg developers before doing it, in order not to have this extra work 
rejected due to a future disagreement about where it should go.

> The logic that scans the packet for two SOI markers shouldn't be
> necessary if the relevant information is carried over from the MXF
> demuxer.

As far as I know there is nothing in the MXF file saying where the 
second field is in the packet; which MXF metadata are you thinking of?


>   It also makes the j2k decoder harder to ||ize. You might also
> need the StoredF2Offset

I don't change the field order detection done by the current FFmpeg 
code; I rely on it directly.
In my opinion, if the MXF field order detection needs to be changed, it 
should be a separate patch dedicated to that (the field order detection 
in MXF), and it does not impact this patch, which is independent of the 
container used, relying on the AVCodecContext field order information.

Jérôme
Tomas Härdin Feb. 3, 2024, 7:58 p.m. UTC | #3
lör 2024-02-03 klockan 11:51 +0100 skrev Jerome Martinez:
> On 03/02/2024 11:00, Tomas Härdin wrote:
> > fre 2024-02-02 klockan 16:55 +0100 skrev Jerome Martinez:
> > > Before this patch, the FFmpeg MXF parser correctly detects
> > > content
> > > with
> > > 2 fields in 1 AVPacket as e.g. interlaced 720x486 but the FFmpeg
> > > JPEG
> > > 2000 decoder reads the JPEG 2000 SIZ header without understanding
> > > that
> > > the indicated height is the height of 1 field only so overwrites
> > > the
> > > frame size info with e.g. 720x243, and also completely discards
> > > the
> > > second frame, which lead to the decoding of only half of the
> > > stored
> > > content as "progressive" 720x243 flagged interlaced.
> > Is the decoder really the right place to do this? Surely this
> > happens
> > with more codecs than just j2k. Isnt it also a parser's job to do
> > this
> > kind of stuff?
> 
> Both solutions have pro and con:
> - Doing it in the decoder fixes the issue for all containers and only
> one codec
> - Doing it in the demuxer fixes the issue for one container and all
> codecs
> And currently I know only one container and only one codec having
> this 
> issue.

A more proper fix might be to auto-insert a deinterleave filter in
ffmpeg.

> My choice to implement in the decoder comes from the idea that it
> seems 
> more hacky to decode 2 AVPackets (crafted from a single MXF packet), 
> then do a RAM copy of the decoded (so huge) content for interleaving 
> fields into 1 frame (pressure on RAM bandwidth) in the MXF demuxer + 
> adapt frame metadata (height is overwritten by the decoder then need
> to 
> overwrite again the value), doing it in the decoder seems less
> intrusive.

The fastest way, in a player, is probably to do it with a shader. That
should be the least amount of copies and the most cache coherent.

> If moving to the demuxer is the only acceptable solution, I will do
> it 
> but I would like to be sure that there is a consensus on that by
> FFmpeg 
> developers before doing it, in order to not have this extra work 
> rejected due to a future disagreement about where it should go.
> 
> > The logic that scans the packet for two SOI markers shouldn't be
> > necessary if the relevant information is carried over from the MXF
> > demuxer.
> 
> As far as I know there is nothing in the MXF file saying where is the
> second field in the packet, at which MXF metadata do you think?

Well, FrameLayout == SEPARATE_FIELDS, EditRate = 30000/1001 and
FieldDominance == 2. DisplayHeight is the field height as S377 says it
should be for SEPARATE_FIELDS.

Annoyingly, EditRate could be the field rate rather than the frame rate.
So there may be no way to know a priori how many fields an essence
element contains.
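
A minimal sketch of the heuristic discussed above (hypothetical helper;
the enum values follow S377, and, as just noted, an EditRate equal to
the field rate makes the answer ambiguous):

    #include <libavutil/rational.h>

    enum { FULL_FRAME = 0, SEPARATE_FIELDS = 1, ONE_FIELD = 2,
           MIXED_FIELDS = 3, SEGMENTED_FRAME = 4 };

    /* Guess how many field codestreams one essence KLV holds: with
     * SEPARATE_FIELDS, an edit rate equal to the frame rate suggests
     * both fields per packet (I2-style), while an edit rate equal to
     * the field rate suggests one field per packet (I1-style). */
    static int fields_per_essence_packet(int frame_layout,
                                         AVRational edit_rate,
                                         AVRational frame_rate)
    {
        if (frame_layout != SEPARATE_FIELDS)
            return 1;
        return av_cmp_q(edit_rate, frame_rate) ? 1 : 2;
    }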

I think people with knowledge of how interlacing is handled in other
formats and codecs might want to chime in.

/Tomas
Tomas Härdin Feb. 3, 2024, 8:04 p.m. UTC | #4
lör 2024-02-03 klockan 20:58 +0100 skrev Tomas Härdin:
> lör 2024-02-03 klockan 11:51 +0100 skrev Jerome Martinez:
> > On 03/02/2024 11:00, Tomas Härdin wrote:
> > > fre 2024-02-02 klockan 16:55 +0100 skrev Jerome Martinez:
> > > > Before this patch, the FFmpeg MXF parser correctly detects
> > > > content
> > > > with
> > > > 2 fields in 1 AVPacket as e.g. interlaced 720x486 but the
> > > > FFmpeg
> > > > JPEG
> > > > 2000 decoder reads the JPEG 2000 SIZ header without
> > > > understanding
> > > > that
> > > > the indicated height is the height of 1 field only so
> > > > overwrites
> > > > the
> > > > frame size info with e.g. 720x243, and also completely discards
> > > > the
> > > > second frame, which lead to the decoding of only half of the
> > > > stored
> > > > content as "progressive" 720x243 flagged interlaced.
> > > Is the decoder really the right place to do this? Surely this
> > > happens
> > > with more codecs than just j2k. Isnt it also a parser's job to do
> > > this
> > > kind of stuff?
> > 
> > Both solutions have pro and con:
> > - Doing it in the decoder fixes the issue for all containers and
> > only
> > one codec
> > - Doing it in the demuxer fixes the issue for one container and all
> > codecs
> > And currently I know only one container and only one codec having
> > this 
> > issue.
> 
> A more proper fix might be to auto-insert a deinterleave filter in
> ffmpeg.
> 
> > My choice to implement in the decoder comes from the idea that it
> > seems 
> > more hacky to decode 2 AVPackets (crafted from a single MXF
> > packet), 
> > then do a RAM copy of the decoded (so huge) content for
> > interleaving 
> > fields into 1 frame (pressure on RAM bandwidth) in the MXF demuxer
> > + 
> > adapt frame metadata (height is overwritten by the decoder then
> > need
> > to 
> > overwrite again the value), doing it in the decoder seems less
> > intrusive.
> 
> The fastest way, in a player, is probably to do it with a shader.
> That
> should be the least amount of copies and the most cache coherent.
> 
> > If moving to the demuxer is the only acceptable solution, I will do
> > it 
> > but I would like to be sure that there is a consensus on that by
> > FFmpeg 
> > developers before doing it, in order to not have this extra work 
> > rejected due to a future disagreement about where it should go.
> > 
> > > The logic that scans the packet for two SOI markers shouldn't be
> > > necessary if the relevant information is carried over from the
> > > MXF
> > > demuxer.
> > 
> > As far as I know there is nothing in the MXF file saying where is
> > the
> > second field in the packet, at which MXF metadata do you think?
> 
> Well, FrameLayout == SEPARATE_FIELDS, EditRate = 30000/1001 and
> FieldDominance == 2. DisplayHeight is the field height as S377 says
> it
> should be for SEPARATE_FIELDS.

It should also be said that what this patch effectively does is
silently convert SEPARATE_FIELDS to MIXED_FIELDS. What if I want to
transcode J2K to a lower bitrate but keep it SEPARATE_FIELDS?

/Tomas
Pierre-Anthony Lemieux Feb. 4, 2024, 12:20 a.m. UTC | #5
On Sat, Feb 3, 2024 at 12:04 PM Tomas Härdin <git@haerdin.se> wrote:
>
> lör 2024-02-03 klockan 20:58 +0100 skrev Tomas Härdin:
> > lör 2024-02-03 klockan 11:51 +0100 skrev Jerome Martinez:
> > > On 03/02/2024 11:00, Tomas Härdin wrote:
> > > > fre 2024-02-02 klockan 16:55 +0100 skrev Jerome Martinez:
> > > > > Before this patch, the FFmpeg MXF parser correctly detects
> > > > > content
> > > > > with
> > > > > 2 fields in 1 AVPacket as e.g. interlaced 720x486 but the
> > > > > FFmpeg
> > > > > JPEG
> > > > > 2000 decoder reads the JPEG 2000 SIZ header without
> > > > > understanding
> > > > > that
> > > > > the indicated height is the height of 1 field only so
> > > > > overwrites
> > > > > the
> > > > > frame size info with e.g. 720x243, and also completely discards
> > > > > the
> > > > > second frame, which lead to the decoding of only half of the
> > > > > stored
> > > > > content as "progressive" 720x243 flagged interlaced.
> > > > Is the decoder really the right place to do this? Surely this
> > > > happens
> > > > with more codecs than just j2k. Isnt it also a parser's job to do
> > > > this
> > > > kind of stuff?
> > >
> > > Both solutions have pro and con:
> > > - Doing it in the decoder fixes the issue for all containers and
> > > only
> > > one codec
> > > - Doing it in the demuxer fixes the issue for one container and all
> > > codecs
> > > And currently I know only one container and only one codec having
> > > this
> > > issue.
> >
> > A more proper fix might be to auto-insert a deinterleave filter in
> > ffmpeg.
> >
> > > My choice to implement in the decoder comes from the idea that it
> > > seems
> > > more hacky to decode 2 AVPackets (crafted from a single MXF
> > > packet),
> > > then do a RAM copy of the decoded (so huge) content for
> > > interleaving
> > > fields into 1 frame (pressure on RAM bandwidth) in the MXF demuxer
> > > +
> > > adapt frame metadata (height is overwritten by the decoder then
> > > need
> > > to
> > > overwrite again the value), doing it in the decoder seems less
> > > intrusive.
> >
> > The fastest way, in a player, is probably to do it with a shader.
> > That
> > should be the least amount of copies and the most cache coherent.
> >
> > > If moving to the demuxer is the only acceptable solution, I will do
> > > it
> > > but I would like to be sure that there is a consensus on that by
> > > FFmpeg
> > > developers before doing it, in order to not have this extra work
> > > rejected due to a future disagreement about where it should go.
> > >
> > > > The logic that scans the packet for two SOI markers shouldn't be
> > > > necessary if the relevant information is carried over from the
> > > > MXF
> > > > demuxer.
> > >
> > > As far as I know there is nothing in the MXF file saying where is
> > > the
> > > second field in the packet, at which MXF metadata do you think?
> >
> > Well, FrameLayout == SEPARATE_FIELDS, EditRate = 30000/1001 and
> > FieldDominance == 2. DisplayHeight is the field height as S377 says
> > it
> > should be for SEPARATE_FIELDS.
>
> It should also be said that what this patch effectively does is
> silently convert SEPARATE_FIELDS to MIXED_FIELDS. What if I want to
> transcode J2K to lower bitrate but keep it SEPARATE_FIELDS?

See attached table from SMPTE ST 422, which specifies wrapping of J2K in MXF.

>
> /Tomas
Tomas Härdin Feb. 5, 2024, 12:19 a.m. UTC | #6
> > > > > The logic that scans the packet for two SOI markers shouldn't
> > > > > be
> > > > > necessary if the relevant information is carried over from
> > > > > the
> > > > > MXF
> > > > > demuxer.
> > > > 
> > > > As far as I know there is nothing in the MXF file saying where
> > > > is
> > > > the
> > > > second field in the packet, at which MXF metadata do you think?
> > > 
> > > Well, FrameLayout == SEPARATE_FIELDS, EditRate = 30000/1001 and
> > > FieldDominance == 2. DisplayHeight is the field height as S377
> > > says
> > > it
> > > should be for SEPARATE_FIELDS.
> > 
> > It should also be said that what this patch effectively does is
> > silently convert SEPARATE_FIELDS to MIXED_FIELDS. What if I want to
> > transcode J2K to lower bitrate but keep it SEPARATE_FIELDS?
> 
> See attached table from SMPTE ST 422, which specifies wrapping of J2K
> in MXF.

Which entry in the table would the provided file correspond to? To me
it seems none of them fit. There are two fields, meaning two j2k
codestreams, in each corresponding essence element KLV packet (I think,
unless CP packets get reassembled somewhere else). Entry I2 seems
closest, but it specifies FULL_FRAME. I1 is otherwise tempting, but
there SampleRate should equal the field rate, whereas the file has
SampleRate = 30000/1001.

Aside: I didn't realize one could use SampleRate like this, but Annex
G.2.2 is explicit about it.

I'm tempted to say this file has been muxed improperly. Too bad there's
no software or version information in the file. What made it?

/Tomas
James Almer Feb. 5, 2024, 2:25 a.m. UTC | #7
On 2/3/2024 7:00 AM, Tomas Härdin wrote:
> fre 2024-02-02 klockan 16:55 +0100 skrev Jerome Martinez:
>> Before this patch, the FFmpeg MXF parser correctly detects content
>> with
>> 2 fields in 1 AVPacket as e.g. interlaced 720x486 but the FFmpeg JPEG
>> 2000 decoder reads the JPEG 2000 SIZ header without understanding
>> that
>> the indicated height is the height of 1 field only so overwrites the
>> frame size info with e.g. 720x243, and also completely discards the
>> second frame, which lead to the decoding of only half of the stored
>> content as "progressive" 720x243 flagged interlaced.
> 
> Is the decoder really the right place to do this? Surely this happens
> with more codecs than just j2k. Isnt it also a parser's job to do this
> kind of stuff?

An AVParser must not split (or assemble) a packet into a form that is 
not meant to be encapsulated in a container.
If you need to split a packet into smaller portions, you need a 
bitstream filter. See {vp9_superframe,av1_frame}_split.
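
As a hedged sketch of what such a split could key on (hypothetical
helper, not an actual FFmpeg bsf): every JPEG 2000 codestream must begin
with an SOC marker (0xFF4F) immediately followed by SIZ (0xFF51), so a
second such pair marks the start of the second field. As quoted from
ST 422 later in this thread, these byte values are not protected and can
occur inside marker segments, so a robust splitter would parse the first
codestream's headers instead of scanning.

    #include <stddef.h>
    #include <stdint.h>

    /* Return the offset of the second codestream, or -1 if none found. */
    static ptrdiff_t find_second_codestream(const uint8_t *buf, size_t size)
    {
        for (size_t i = 4; i + 4 <= size; i++) { /* skip first SOC+SIZ */
            if (buf[i]     == 0xFF && buf[i + 1] == 0x4F && /* SOC */
                buf[i + 2] == 0xFF && buf[i + 3] == 0x51)   /* SIZ */
                return (ptrdiff_t)i;
        }
        return -1;
    }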

> 
> The logic that scans the packet for two SOI markers shouldn't be
> necessary if the relevant information is carried over from the MXF
> demuxer. It also makes the j2k decoder harder to ||ize. You might also
> need the StoredF2Offset
> 
> /Tomas
Tomas Härdin Feb. 5, 2024, 2:28 p.m. UTC | #8
sön 2024-02-04 klockan 23:25 -0300 skrev James Almer:
> On 2/3/2024 7:00 AM, Tomas Härdin wrote:
> > fre 2024-02-02 klockan 16:55 +0100 skrev Jerome Martinez:
> > > Before this patch, the FFmpeg MXF parser correctly detects
> > > content
> > > with
> > > 2 fields in 1 AVPacket as e.g. interlaced 720x486 but the FFmpeg
> > > JPEG
> > > 2000 decoder reads the JPEG 2000 SIZ header without understanding
> > > that
> > > the indicated height is the height of 1 field only so overwrites
> > > the
> > > frame size info with e.g. 720x243, and also completely discards
> > > the
> > > second frame, which lead to the decoding of only half of the
> > > stored
> > > content as "progressive" 720x243 flagged interlaced.
> > 
> > Is the decoder really the right place to do this? Surely this
> > happens
> > with more codecs than just j2k. Isnt it also a parser's job to do
> > this
> > kind of stuff?
> 
> An AVParser must not split (or assemble) a packet in a form that is
> not 
> meant to be encapsulated in a container.

This is two independent packets that are concatenated. A bsf changes
the bits, right? Whereas in this case no changes to the actual data are
necessary. Parsers are typically used to split raw essence into
packets, right? This is a similar case.

A use-case for this kind of splitting could be upping EditRate to
60000/1001 (SampleRate still being 60000/1001), allowing for field-
level editing. The resulting MXF would still be SEPARATE_FIELDS. This
is mode F1 in the table that pal posted.

This ties into the work Anton is doing on interlacing, I suspect.

/Tomas
Jerome Martinez Feb. 15, 2024, 3:02 p.m. UTC | #9
On 05/02/2024 01:19, Tomas Härdin wrote:
> [...]
> Which entry in the table would the provided file correspond to? To me
> it seems none of them fit. There's two fields, meaning two j2k
> codestreams, in each corresponding essence element KLV packet (I think,
> unless CP packets get reassembled somewhere else). Entry I2 seems
> closest but it specifies FULL_FRAME. I1 is otherwise tempting, but
> there SampleRate should equal the field rate whereas the file has
> SampleRate = 30000/1001.

Other examples I have (not shareable) with 2 jp2k pictures per KLV have 
identification from an old version of AmberFin iCR; I have no file with 
I2 correctly signaled. With my first example it is I2 (2 fields per 
KLV) with I1 header metadata property values **but** with an I2 essence 
container label, which has a content byte (byte 15 of the UL) of 0x04 = I2.
The AmberFin iCR files have the generic essence container label with a 
content byte of 0x01 = FU (Unspecified), so for my main use case we could 
activate the search for the 2nd jp2k codestream only if I2 is explicitly 
signaled by the essence container label, but that would prevent catching 
the 2nd field when this signaling is unspecified and the frame layout + 
sample rate + edit rate are buggy.
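
For reference, the label check described above reduces to a one-byte
test on the 16-byte UL (a sketch with a hypothetical function name):

    #include <stdint.h>

    /* Byte 15 (1-based, so index 14) of the JPEG 2000 essence container
     * label gives the ST 422 wrapping mode; per the values cited above,
     * 0x01 = FU (unspecified) and 0x04 = I2 (2 fields per KLV). */
    static int j2k_essence_label_is_i2(const uint8_t ul[16])
    {
        return ul[14] == 0x04;
    }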

I agree that this is not what is defined in ST 422 in full, but note 
that frame layout and height are not required by ST 377 (only "best 
effort"), so IMO we should not rely much on them, and at least we should 
handle what is in the wild. Correct me if I am wrong, but handling 
non-conforming content seems an acceptable policy in FFmpeg (I am 
thinking of e.g. DPX and the non-conforming EOLs of some scanners, whose 
names are directly written in the FFmpeg source code).
Video Line Map is also best effort, but without it, it is not possible 
to know the field_order; I wonder what should be done in that case. 
Currently I rely on the current FFmpeg implementation for catching 
field_order and don't try to find the 2nd field if field_order is 
AV_FIELD_UNKNOWN (not important for me, as all files I have have 
field_order related metadata).

Also, if I manually edit the MXF to have conforming I2 property 
values, FFmpeg behaves the same (still indicating half height and still 
silently discarding the 2nd field), so in my opinion the handling of 2 
jp2k pictures per KLV is still relevant for correctly handling 
I2-conforming files, and tolerating wrong property values may be 
relevant too (checking the essence container label only? to be discussed).

On 03/02/2024 20:58, Tomas Härdin wrote:
> The fastest way, in a player, is probably to do it with a shader. That
> should be the least amount of copies and the most cache coherent.

As far as I know the player is not aware that the AVFrame actually 
contains a field, so it cannot apply a shader or anything else; which 
AVFrame field indicates that this is a field to be interleaved with 
the next AVFrame before display?
Currently, for I1 files, ffplay or VLC show double-rate half-height 
output, so it seems that they don't catch that the AVFrame contains a 
field.

On 03/02/2024 21:04, Tomas Härdin wrote:
> It should also be said that what this patch effectively does is
> silently convert SEPARATE_FIELDS to MIXED_FIELDS. What if I want to
> transcode J2K to lower bitrate but keep it SEPARATE_FIELDS?


I don't get what the expected behavior of FFmpeg is: what is the meaning 
of "243" in
"Stream #0:0: Video: jpeg2000, yuv422p10le(bottom coded first 
(swapped)), 720x243, lossless, SAR 9:20 DAR 4:3, 29.97 fps, 29.97 tbr, 
29.97 tbn"?

My understanding is that a frame height is expected, never a field 
height, and there is no indication in the current output that 243 is a 
field height for both I1 & I2, so the "silent conversion" would be 
expected by the user in order to get a frame in the output and never a 
field. Am I wrong?

Also, it seems that there is no way to signal that the output picture 
is a field and not a frame, and FFmpeg handles I1 (1 field per KLV) as 
if it ignores that this is a field and not a frame, so when an I1 file 
is converted to another format without an interleave filter manually 
added, the output is ugly flipping double-rate half-height content.
Silently converting a field to a frame seems to me a worse behavior than 
silently converting SEPARATE_FIELDS to MIXED_FIELDS, because the output 
is not what is expected by the person who created the file, nor by the 
person watching the output of FFmpeg.


> What if I want to transcode J2K to lower bitrate but keep it SEPARATE_FIELDS?

Interlacing the lines, then encoding the fields separately? It is more 
a matter of default behavior (deinterlace or not) and of who would need 
to apply a filter. My issue is that I see no way to signal in FFmpeg 
that "got_frame" means "got frame or field" and that the AVFrame 
contains a field, so I would prefer that the default behavior is to 
handle frames in AVFrame rather than fields. Is that acceptable?

Additionally, the MXF container indicates (for conforming files) that 
the I2 edit rate is for a frame even if there are 2 separate fields in 
the KLV; do you expect FFmpeg to force separate fields into separate 
AVFrames, like I1, even when the MXF muxer explicitly said that editing 
is expected to be by frame? We have here 3 cases (separated fields in 
separated KLVs, 2 separated fields in 1 KLV, mixed fields in 1 KLV); my 
understanding of the current AVFrame is that only mixed fields (so 1 
frame) in 1 AVFrame is supported, and both other MXF methods are 
silently converted to it (fields handled as frames for separated fields 
in separated KLVs, 2nd field discarded for 2 separated fields in 1 
KLV). My planned next step is to handle I1 files automatically with the 
right output (a frame rather than fields in the AVFrame and the output) 
so users don't get the surprise of double-rate half-height content in 
MKV or MP4, without any signaling about that and badly handled by any 
player, including ffplay. Wouldn't that be acceptable? Should I split 
the 2 I2 fields and provide 2 AVFrames?



In other words, I would like to know whether AVFrame is intended, in 
the long term, to also handle fields in addition to frames, and if so, 
whether there is a way to signal that the AVFrame structure actually 
contains a field so that players/transcoders (including FFmpeg) can do 
the interleaving before showing something (ffplay) or converting to a 
format not supporting separated fields; if not, what is the preferred 
approach (my current proposal of decoding both fields then interleaving, 
adding an interleave filter, splitting a KLV into 2 AVPackets, other?).
Jerome
Vittorio Giovara Feb. 16, 2024, 12:02 a.m. UTC | #10
On Thu, Feb 15, 2024 at 4:02 PM Jerome Martinez <jerome@mediaarea.net>
wrote:

> In other words, I would like to know if AVFrame is intended at long term
> to handle also fields in addition to frames, and if so is there a way to
> signal that the AVFrame structure actually contains a field
>

I may be missing something from the previous discussion, but
AV_FRAME_FLAG_INTERLACED should indicate when that is the case.
I am not familiar enough with the j2k code to know if that flag is
correctly set or not.
Kieran Kunhya Feb. 16, 2024, 2:17 a.m. UTC | #11
>
> I may be missing something from the previous discussion, but the
> AV_FRAME_FLAG_INTERLACED should indicate when that is the case.
> I am not familiar enough with j2k code to know if that flag is correctly
> set or not.
>
>
AV_FRAME_FLAG_INTERLACED signals two fields which are interleaved. What the
OP wants is a single field in an AVFrame.

(insert rant about why interlaced is evil blah blah).

Kieran
Anton Khirnov Feb. 16, 2024, 10:32 a.m. UTC | #12
Quoting Tomas Härdin (2024-02-03 20:58:20)
> I think people with knowledge how interlacing is handled in other
> formats and codecs might want to chime in.

For MPEG codecs our decoders always output (properly signalled)
interlaced content as two interleaved fields in a single AVFrame flagged
with AV_FRAME_FLAG_INTERLACED. Its height is the height of the whole
frame, so double the number of lines in each field.

So IIUC this patch is taking the correct approach.
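
A minimal sketch of that convention, assuming a planar 8-bit format
(hypothetical helper, not actual FFmpeg decoder code): the frame is
allocated at full height, and each field is written at double stride,
offset by one line for the second field.

    #include <stdint.h>
    #include <string.h>
    #include <libavutil/frame.h>

    /* Copy one decoded field into a full-height frame plane; parity is
     * 0 for the field holding the top line, 1 for the other. A real
     * decoder would rather write its output lines here directly than
     * memcpy an already-decoded field buffer. */
    static void interleave_field(AVFrame *frame, int plane, int parity,
                                 const uint8_t *src, int src_linesize,
                                 int field_height, int bytes_per_line)
    {
        uint8_t *line = frame->data[plane] +
                        parity * frame->linesize[plane];
        for (int y = 0; y < field_height; y++) {
            memcpy(line, src, bytes_per_line);
            src  += src_linesize;
            line += 2 * frame->linesize[plane]; /* skip the other field */
        }
        frame->flags |= AV_FRAME_FLAG_INTERLACED;
    }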
Tomas Härdin Feb. 18, 2024, 11:14 p.m. UTC | #13
tor 2024-02-15 klockan 16:02 +0100 skrev Jerome Martinez:
> On 05/02/2024 01:19, Tomas Härdin wrote:
> > [...]
> > Which entry in the table would the provided file correspond to? To
> > me
> > it seems none of them fit. There's two fields, meaning two j2k
> > codestreams, in each corresponding essence element KLV packet (I
> > think,
> > unless CP packets get reassembled somewhere else). Entry I2 seems
> > closest but it specifies FULL_FRAME. I1 is otherwise tempting, but
> > there SampleRate should equal the field rate whereas the file has
> > SampleRate = 30000/1001.
> 
> Other examples I have (not shareable) with 2 jp2k pictures per KLV
> have 
> identification from an old version of AmberFin iCR, I have no file
> with 
> the I2 correctly signaled, with my first example it isI2 (2 fields
> per 
> KLV) with I1 Header Metadata Property Values **but** with I2 essence 
> container label which has a content byte (byte 15 of the UL) of 0x04
> = I2.
> The AmberFin iCR files have the generic essence container label with 
> content byte of 0x01 = FU (Unspecified) so for my main use case we
> could 
> activate the search of the 2nd jp2k only if I2 is explicitly signaled
> by 
> the essence container label but it would prevent to catch the 2nd
> field 
> when this signaling is unspecified and buggy Frame layout + sample
> rate 
> + edit rate.

I'm not super stoked about implementing support for broken muxers.
Instead these companies should fix their code. But either way, we at the
very least need a reliable way to detect these kinds of files if we are
to do this. There was no Software + Version information in the sample
provided, which is otherwise a reliable method for dealing with shitty
muxers.

Always inserting a parser worsens performance, and dealing with j2k is
already very taxing. I've been working hard on parallelizing j2kdec, and
adding more serial code would make that harder.

> On 03/02/2024 20:58, Tomas Härdin wrote:
> > The fastest way, in a player, is probably to do it with a shader.
> > That
> > should be the least amount of copies and the most cache coherent.
> 
> As far a I know the player is not aware that the AVFrame actually 
> contains a field so it can not apply a shader or something else,
> which 
> AVFrame field indicates that this is a a field to be interleaved with
> the next AVFrame before display?
> Currently for I1 files ffplay or VLC show double rate half height so
> it 
> seems that they don't catch that AVFrame contains a field.

There might be no way to do this at present, as Anton's reply
indicates.

Proper support for FrameLayout may require adding a bunch of fields to
AVFrame and/or AVCodecContext.

We could of course always convert to MIXED_FIELDS and just say that's
the way FFmpeg does things. Anyone who wants to do anything more fancy
is free to provide a patch or sufficient money. The present situation
where the other field is just discarded entirely is of course not
satisfactory either.

/Tomas
Marton Balint Feb. 18, 2024, 11:43 p.m. UTC | #14
On Fri, 16 Feb 2024, Anton Khirnov wrote:

> Quoting Tomas Härdin (2024-02-03 20:58:20)
>> I think people with knowledge how interlacing is handled in other
>> formats and codecs might want to chime in.
>
> For MPEG codecs our decoders always output (properly signalled)
> interlaced content as two interleaved fields in a single AVFrame flagged
> with AV_FRAME_FLAG_INTERLACED. Its height is the height of the whole
> frame, so double the number of lines in each field.
>
> So IIUC this patch is taking the correct approach.

I believe interlaced HEVC has the same issue: the decoder generates field 
pictures and not frames. Would it be acceptable to change the HEVC 
decoder to generate frames with interleaved fields?

Regards,
Marton
Hendrik Leppkes Feb. 19, 2024, 8:22 a.m. UTC | #15
On Mon, Feb 19, 2024 at 12:44 AM Marton Balint <cus@passwd.hu> wrote:
>
>
>
> On Fri, 16 Feb 2024, Anton Khirnov wrote:
>
> > Quoting Tomas Härdin (2024-02-03 20:58:20)
> >> I think people with knowledge how interlacing is handled in other
> >> formats and codecs might want to chime in.
> >
> > For MPEG codecs our decoders always output (properly signalled)
> > interlaced content as two interleaved fields in a single AVFrame flagged
> > with AV_FRAME_FLAG_INTERLACED. Its height is the height of the whole
> > frame, so double the number of lines in each field.
> >
> > So IIUC this patch is taking the correct approach.
>
> I believe interlaced HEVC has the same issue, the decoder generates field
> pictures and not frames. Would changing the HEVC decoder be acceptable to
> generate frames with interleaved fields?
>

Absolutely.
There have been WIP patches in the past that tried to do this, but they
resorted to memcpy instead of updating the decoder to write to every
second line, and that should of course be avoided.
HEVC also has HWAccel, which should work in the same manner as well,
if possible.

- Hendrik
Tomas Härdin Feb. 19, 2024, 11:05 a.m. UTC | #16
mån 2024-02-19 klockan 00:43 +0100 skrev Marton Balint:
> 
> 
> On Fri, 16 Feb 2024, Anton Khirnov wrote:
> 
> > Quoting Tomas Härdin (2024-02-03 20:58:20)
> > > I think people with knowledge how interlacing is handled in other
> > > formats and codecs might want to chime in.
> > 
> > For MPEG codecs our decoders always output (properly signalled)
> > interlaced content as two interleaved fields in a single AVFrame
> > flagged
> > with AV_FRAME_FLAG_INTERLACED. Its height is the height of the
> > whole
> > frame, so double the number of lines in each field.
> > 
> > So IIUC this patch is taking the correct approach.
> 
> I believe interlaced HEVC has the same issue, the decoder generates
> field 
> pictures and not frames. Would changing the HEVC decoder be
> acceptable to 
> generate frames with interleaved fields?

This isn't a J2K thing, but an MXF thing. The same issue will arise for
MJPEG and raw video in MXF using SEPARATE_FIELDS.

/Tomas
Tomas Härdin Feb. 19, 2024, 11:07 a.m. UTC | #17
mån 2024-02-19 klockan 12:05 +0100 skrev Tomas Härdin:
> mån 2024-02-19 klockan 00:43 +0100 skrev Marton Balint:
> > 
> > 
> > On Fri, 16 Feb 2024, Anton Khirnov wrote:
> > 
> > > Quoting Tomas Härdin (2024-02-03 20:58:20)
> > > > I think people with knowledge how interlacing is handled in
> > > > other
> > > > formats and codecs might want to chime in.
> > > 
> > > For MPEG codecs our decoders always output (properly signalled)
> > > interlaced content as two interleaved fields in a single AVFrame
> > > flagged
> > > with AV_FRAME_FLAG_INTERLACED. Its height is the height of the
> > > whole
> > > frame, so double the number of lines in each field.
> > > 
> > > So IIUC this patch is taking the correct approach.
> > 
> > I believe interlaced HEVC has the same issue, the decoder generates
> > field 
> > pictures and not frames. Would changing the HEVC decoder be
> > acceptable to 
> > generate frames with interleaved fields?
> 
> This isn't a J2K thing, but an MXF thing. The same issue will arise
> for
> MJPEG and raw video in MXF using SEPARATE_FIELDS.

To make things worse, encoding SEPARATE_FIELDS using progressive
H.263/4/5/6 in MXF is almost certainly legal. Lawful evil.

/Tomas
Tomas Härdin Feb. 19, 2024, 11:08 a.m. UTC | #18
mån 2024-02-19 klockan 00:14 +0100 skrev Tomas Härdin:
> tor 2024-02-15 klockan 16:02 +0100 skrev Jerome Martinez:
> > On 05/02/2024 01:19, Tomas Härdin wrote:
> > > [...]
> > > Which entry in the table would the provided file correspond to?
> > > To
> > > me
> > > it seems none of them fit. There's two fields, meaning two j2k
> > > codestreams, in each corresponding essence element KLV packet (I
> > > think,
> > > unless CP packets get reassembled somewhere else). Entry I2 seems
> > > closest but it specifies FULL_FRAME. I1 is otherwise tempting,
> > > but
> > > there SampleRate should equal the field rate whereas the file has
> > > SampleRate = 30000/1001.
> > 
> > Other examples I have (not shareable) with 2 jp2k pictures per KLV
> > have 
> > identification from an old version of AmberFin iCR, I have no file
> > with 
> > the I2 correctly signaled, with my first example it isI2 (2 fields
> > per 
> > KLV) with I1 Header Metadata Property Values **but** with I2
> > essence 
> > container label which has a content byte (byte 15 of the UL) of
> > 0x04
> > = I2.
> > The AmberFin iCR files have the generic essence container label
> > with 
> > content byte of 0x01 = FU (Unspecified) so for my main use case we
> > could 
> > activate the search of the 2nd jp2k only if I2 is explicitly
> > signaled
> > by 
> > the essence container label but it would prevent to catch the 2nd
> > field 
> > when this signaling is unspecified and buggy Frame layout + sample
> > rate 
> > + edit rate.
> 
> I'm not super stoked about implementing support for broken muxers.
> Instead these companies should fix their code. But either way we at
> the
> very least need a reliable way to detect these kinds of files if we
> are
> to do this. There was no Software + Version information in the sample
> provided, which is otherwise a reliable method to deal with shitty
> muxers.

Correction: there is Identification metadata, but it's at the end of
the header metadata so I missed it.

    Identifications
      Identification = {fb8e5be0-1fc5-11e9-8263-7062b8a31e5c}
      Identification -> Strong Reference to Identification
        Identification
          InstanceUID = {fb8e5be0-1fc5-11e9-8263-7062b8a31e5c}
          ThisGenerationUID = {fb8e5be0-1fc5-11e9-8264-7062b8a31e5c}
          CompanyName = OpenCube
          ProductName = MXFTk Advanced
          ProductUID = {3a4fe380-0d01-11e4-869f-3cd92b5c1dfc}
          VersionString = 2.7.0.20190123
          ProductVersion = Major="2", Minor="7", Patch="0", Build="0",
Release="VersionReleased"
          ToolkitVersion = Major="2", Minor="7", Patch="0", Build="0",
Release="VersionReleased"
          Platform = Microsoft Windows 7 Professional Service Pack 1
(Build 7601)
          ModificationDate = 2019-01-24 11:51:37.884
    LastModifiedDate = 2019-01-24 11:51:37.884
    GenerationUID = {fb8e5be0-1fc5-11e9-8264-7062b8a31e5c}

This at least (maybe) allows us to detect these broken files. But does
MXFTk *always* write interlaced files like this?

/Tomas
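
A minimal sketch of the Identification-based gating floated above
(hypothetical helper, not mxfdec code; as the next reply notes, which
muxers and versions are actually affected has not been established):

    #include <string.h>

    /* Gate a two-fields-per-KLV workaround on the muxer identity quoted
     * above; illustrative only. */
    static int identification_suggests_broken_i2(const char *company_name,
                                                 const char *product_name)
    {
        return company_name && product_name &&
               !strcmp(company_name, "OpenCube") &&
               !strcmp(product_name, "MXFTk Advanced");
    }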
Jerome Martinez Feb. 19, 2024, 11:19 a.m. UTC | #19
On 19/02/2024 12:08, Tomas Härdin wrote:
> mån 2024-02-19 klockan 00:14 +0100 skrev Tomas Härdin:
>> tor 2024-02-15 klockan 16:02 +0100 skrev Jerome Martinez:
>>> On 05/02/2024 01:19, Tomas Härdin wrote:
>>>> [...]
>>>> Which entry in the table would the provided file correspond to?
>>>> To
>>>> me
>>>> it seems none of them fit. There's two fields, meaning two j2k
>>>> codestreams, in each corresponding essence element KLV packet (I
>>>> think,
>>>> unless CP packets get reassembled somewhere else). Entry I2 seems
>>>> closest but it specifies FULL_FRAME. I1 is otherwise tempting,
>>>> but
>>>> there SampleRate should equal the field rate whereas the file has
>>>> SampleRate = 30000/1001.
>>> Other examples I have (not shareable) with 2 jp2k pictures per KLV
>>> have
>>> identification from an old version of AmberFin iCR, I have no file
>>> with
>>> the I2 correctly signaled, with my first example it isI2 (2 fields
>>> per
>>> KLV) with I1 Header Metadata Property Values **but** with I2
>>> essence
>>> container label which has a content byte (byte 15 of the UL) of
>>> 0x04
>>> = I2.
>>> The AmberFin iCR files have the generic essence container label
>>> with
>>> content byte of 0x01 = FU (Unspecified) so for my main use case we
>>> could
>>> activate the search of the 2nd jp2k only if I2 is explicitly
>>> signaled
>>> by
>>> the essence container label but it would prevent to catch the 2nd
>>> field
>>> when this signaling is unspecified and buggy Frame layout + sample
>>> rate
>>> + edit rate.
>> I'm not super stoked about implementing support for broken muxers.
>> Instead these companies should fix their code. But either way we at
>> the
>> very least need a reliable way to detect these kinds of files if we
>> are
>> to do this. There was no Software + Version information in the sample
>> provided, which is otherwise a reliable method to deal with shitty
>> muxers.
> Correction: there is Identification metadata, but it's at the end of
> the header metadata so I missed it.

Same.


>
>      Identifications
>        Identification = {fb8e5be0-1fc5-11e9-8263-7062b8a31e5c}
>        Identification -> Strong Reference to Identification
>          Identification
>            InstanceUID = {fb8e5be0-1fc5-11e9-8263-7062b8a31e5c}
>            ThisGenerationUID = {fb8e5be0-1fc5-11e9-8264-7062b8a31e5c}
>            CompanyName = OpenCube
>            ProductName = MXFTk Advanced
>            ProductUID = {3a4fe380-0d01-11e4-869f-3cd92b5c1dfc}
>            VersionString = 2.7.0.20190123
>            ProductVersion = Major="2", Minor="7", Patch="0", Build="0",
> Release="VersionReleased"
>            ToolkitVersion = Major="2", Minor="7", Patch="0", Build="0",
> Release="VersionReleased"
>            Platform = Microsoft Windows 7 Professional Service Pack 1
> (Build 7601)
>            ModificationDate = 2019-01-24 11:51:37.884
>      LastModifiedDate = 2019-01-24 11:51:37.884
>      GenerationUID = {fb8e5be0-1fc5-11e9-8264-7062b8a31e5c}
>
> This at least (maybe) allows us to detect these broken files. But does
> MXFTk *always* write interlaced files like this?

Besides the fact that we don't know the version, I have files from other 
muxers having the same issue.

In practice:
- For the example file, there is no need to check the identification; as 
said previously, the essence container label provides the I2 tip, so I 
could propagate the I2 tip to the jp2k decoder
- But if we limit ourselves to that, some other files in the wild won't 
be recognized, because their essence container label stipulates 
(legally) unspecified and (out of spec, but it is real) they have the 
same issue with frame layout.

If I understood the other comments correctly, an AVPacket should be an 
MXF element, so the MXF parser should not split the packet.
I am preparing a v2 patch which does not include the extra probe, 
relying on the current jp2k parser for catching the 2nd field, so in my 
opinion there is no added value in trying to catch I2 (including from 
buggy muxers) in MXF.

Jérôme
Devin Heitmueller Feb. 19, 2024, 4:14 p.m. UTC | #20
On Mon, Feb 19, 2024 at 3:23 AM Hendrik Leppkes <h.leppkes@gmail.com> wrote:
> Absolutely.
> There have been WIP patches in the past that tried to do this, but
> resorting to memcpy instead of updating the decoder to write to every
> second line, and that should of course be avoided.
> HEVC also has HWAccel, which should work as well in the same manner,
> if possible.

Speaking as someone who works with interlaced HEVC on a regular basis,
I would *very much* welcome this change.  In the last five years I've
had to make hacks to ffmpeg, VLC, and GStreamer to recombine the
fields, since I don't have the expertise necessary to modify the
actual libavcodec HEVC decoder to do the interleaving.  If somebody
were to fix the decoder itself then I could throw away all those
hacks.

Regards,

Devin
Jerome Martinez Feb. 20, 2024, 3:07 p.m. UTC | #21
Attached is an updated version of the patch proposal.

About the idea of keeping separate fields in the output AVFrame: I note 
from the discussion that it is commonly accepted that, up to now, the 
AVPacket is expected to contain what is in the MXF element and the 
AVFrame is expected to contain a frame and never a field. Additionally, 
I see in e.g. mpeg12dec.c that the decoder interleaves separate fields:

    mb_y <<= field_pic;
    if (s2->picture_structure == PICT_BOTTOM_FIELD)
        mb_y++;

And mpegvideo_parser.c creates an AVPacket with both fields in it even 
if they are separated. This patch keeps the AVPacket from e.g. MXF with 
both fields in it and does something similar to what other decoders do 
with separate fields in the output AVFrame.

About the detection of the 2 separated fields in 1 packet in the MXF 
file (I2 mode): for the example file provided with the first patch 
proposal it is doable by checking the essence label, but other files I 
have don't have the specific essence label (they use the "undefined" 
essence label, which is legit), so IMO we should not rely on it, and 
there is actually no practical advantage in catching that from the 
container.

In practice this new patch proposal is slightly more complex, with one 
recursive call to jpeg2000_decode_frame() when there are 2 codestreams 
in 1 AVPacket, but it has a negligible performance impact (a few 
comparisons and bool checks) when there is only one codestream in the 
AVPacket (the only currently supported method; remember that currently 
FFmpeg completely discards the 2nd codestream present in the AVPacket), 
and it relies on the current jpeg2000_read_main_headers() function for 
catching the end of the first codestream (safer than the quick search 
for EOC/SOC in the first version).

It also changes the jpeg2000_decode_frame() return value to the count of 
bytes parsed; it seems this is what was expected, but in practice it was 
not doing that. Fixing the return value could be a separate patch if 
preferred.

Jérôme

On 02/02/2024 16:55, Jerome Martinez wrote:
> Before this patch, the FFmpeg MXF parser correctly detects content 
> with 2 fields in 1 AVPacket as e.g. interlaced 720x486 but the FFmpeg 
> JPEG 2000 decoder reads the JPEG 2000 SIZ header without understanding 
> that the indicated height is the height of 1 field only so overwrites 
> the frame size info with e.g. 720x243, and also completely discards 
> the second frame, which lead to the decoding of only half of the 
> stored content as "progressive" 720x243 flagged interlaced.
>
> Example file:
> https://www.digitizationguidelines.gov/guidelines/MXF_sampleFiles/RDD48-sample12-gf-jpeg2000-ntsc-4.2.zip 
>
>
> Before this patch:
> Stream #0:0: Video: jpeg2000, yuv422p10le(bottom coded first 
> (swapped)), 720x243, lossless, SAR 9:20 DAR 4:3, 29.97 tbr, 29.97 tbn, 
> 29.97 tbc
>
> After this patch:
> Stream #0:0: Video: jpeg2000, yuv422p10le(bottom coded first 
> (swapped)), 720x486, lossless, SAR 9:10 DAR 4:3, 29.97 fps, 29.97 tbr, 
> 29.97 tbn
>
From 6d47d60178f50091afc3b7193b752186603efc6a Mon Sep 17 00:00:00 2001
From: Jerome Martinez <jerome@mediaarea.net>
Date: Tue, 20 Feb 2024 16:04:11 +0100
Subject: [PATCH] avcodec/jpeg2000dec: support of 2 fields in 1 AVPacket

---
 libavcodec/jpeg2000dec.c | 78 ++++++++++++++++++++++++++++++++++++++++++------
 libavcodec/jpeg2000dec.h |  5 ++++
 2 files changed, 74 insertions(+), 9 deletions(-)

diff --git a/libavcodec/jpeg2000dec.c b/libavcodec/jpeg2000dec.c
index 691cfbd891..28a3e11020 100644
--- a/libavcodec/jpeg2000dec.c
+++ b/libavcodec/jpeg2000dec.c
@@ -194,6 +194,8 @@ static int get_siz(Jpeg2000DecoderContext *s)
     int ret;
     int o_dimx, o_dimy; //original image dimensions.
     int dimx, dimy;
+    int previous_width = s->width;
+    int previous_height = s->height;
 
     if (bytestream2_get_bytes_left(&s->g) < 36) {
         av_log(s->avctx, AV_LOG_ERROR, "Insufficient space for SIZ\n");
@@ -211,7 +213,7 @@ static int get_siz(Jpeg2000DecoderContext *s)
     s->tile_offset_y  = bytestream2_get_be32u(&s->g); // YT0Siz
     ncomponents       = bytestream2_get_be16u(&s->g); // CSiz
 
-    if (av_image_check_size2(s->width, s->height, s->avctx->max_pixels, AV_PIX_FMT_NONE, 0, s->avctx)) {
+    if (av_image_check_size2(s->width, s->height << (s->has_2_fields && s->height >= 0), s->avctx->max_pixels, AV_PIX_FMT_NONE, 0, s->avctx)) {
         avpriv_request_sample(s->avctx, "Large Dimensions");
         return AVERROR_PATCHWELCOME;
     }
@@ -301,6 +303,20 @@ static int get_siz(Jpeg2000DecoderContext *s)
             return AVERROR(ENOMEM);
     }
 
+    /* management of frames having 2 separate codestreams */
+    if (s->has_2_fields) {
+        s->height <<= 1;
+        s->image_offset_y <<= 1;
+        s->tile_offset_y <<= 1;
+        if (s->is_second_field && (s->width != previous_width || s->height != previous_height)) {
+            avpriv_request_sample(s->avctx, "Support of 2 JPEG 2000 codestreams with different base characteristics");
+            return AVERROR_PATCHWELCOME;
+        }
+        if (s->image_offset_y || s->tile_offset_y || (s->tile_height << 1) != s->height) {
+            av_log(s->avctx, AV_LOG_WARNING, "Decoding of 2 fields having tiles in 1 AVPacket was not tested\n");
+        }
+    }
+
     /* compute image size with reduction factor */
     o_dimx = ff_jpeg2000_ceildivpow2(s->width  - s->image_offset_x,
                                                s->reduction_factor);
@@ -2001,7 +2017,7 @@ static inline void tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile
                                                                                                   \
             y    = tile->comp[compno].coord[1][0] -                                               \
                    ff_jpeg2000_ceildiv(s->image_offset_y, s->cdy[compno]);                        \
-            line = (PIXEL *)picture->data[plane] + y * (picture->linesize[plane] / sizeof(PIXEL));\
+            line = (PIXEL *)picture->data[plane] + (y + (s->is_second_field ^ s->is_bottom_coded_first)) * (picture->linesize[plane] / sizeof(PIXEL));\
             for (; y < h; y++) {                                                                  \
                 PIXEL *dst;                                                                       \
                                                                                                   \
@@ -2028,7 +2044,7 @@ static inline void tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile
                         dst += pixelsize;                                                         \
                     }                                                                             \
                 }                                                                                 \
-                line += picture->linesize[plane] / sizeof(PIXEL);                                 \
+                line += (picture->linesize[plane] << s->has_2_fields) / sizeof(PIXEL);            \
             }                                                                                     \
         }                                                                                         \
                                                                                                   \
@@ -2450,6 +2466,7 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
 {
     Jpeg2000DecoderContext *s = avctx->priv_data;
     int ret;
+    int codestream_size;
 
     s->avctx     = avctx;
     bytestream2_init(&s->g, avpkt->data, avpkt->size);
@@ -2484,20 +2501,50 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
         ret = AVERROR_INVALIDDATA;
         goto end;
     }
+
+    /* management of frames having 2 separate codestreams */
+    if (s->has_2_fields && !s->is_second_field) {
+        switch (avctx->field_order) {
+        case AV_FIELD_BB:
+        case AV_FIELD_BT:
+            s->is_bottom_coded_first = 1;
+            break;
+        default:
+            s->is_bottom_coded_first = 0;
+        }
+    }
+
     if (ret = jpeg2000_read_main_headers(s))
         goto end;
+    codestream_size = avpkt->size - bytestream2_get_bytes_left(&s->g);
+
+    /* management of frames having 2 separate codestreams */
+    if (bytestream2_get_bytes_left(&s->g) > 1 && bytestream2_peek_be16(&s->g) == JPEG2000_SOC) {
+        if (!s->has_2_fields) {
+            /* 2 codestreams newly detected, adapting the output frame structure to handle 2 codestreams and parsing the headers again (fast, and done once for a stable stream) */
+            s->has_2_fields = 1;
+            jpeg2000_dec_cleanup(s);
+            return jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);
+        }
+    } else if (s->has_2_fields && !s->is_second_field) {
+        /* 1 codestream newly detected, adapting the output frame structure to handle 1 codestream and parsing the headers again (fast, and never done for a stable stream) */
+        s->has_2_fields = 0;
+        s->is_bottom_coded_first = 0;
+        jpeg2000_dec_cleanup(s);
+        return jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);
+    }
 
     if (s->sar.num && s->sar.den)
         avctx->sample_aspect_ratio = s->sar;
     s->sar.num = s->sar.den = 0;
 
     if (avctx->skip_frame >= AVDISCARD_ALL) {
-        jpeg2000_dec_cleanup(s);
-        return avpkt->size;
+        ret = codestream_size;
+        goto end;
     }
 
     /* get picture buffer */
-    if ((ret = ff_thread_get_buffer(avctx, picture, 0)) < 0)
+    if ((!s->has_2_fields || !s->is_second_field) && (ret = ff_thread_get_buffer(avctx, picture, 0)) < 0)
         goto end;
     picture->pict_type = AV_PICTURE_TYPE_I;
     picture->flags |= AV_FRAME_FLAG_KEY;
@@ -2518,17 +2565,30 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
 
     avctx->execute2(avctx, jpeg2000_decode_tile, picture, NULL, s->numXtiles * s->numYtiles);
 
-    jpeg2000_dec_cleanup(s);
-
     *got_frame = 1;
 
     if (s->avctx->pix_fmt == AV_PIX_FMT_PAL8)
         memcpy(picture->data[1], s->palette, 256 * sizeof(uint32_t));
 
-    return bytestream2_tell(&s->g);
+    ret = codestream_size;
 
 end:
     jpeg2000_dec_cleanup(s);
+
+    /* management of frames having 2 separate codestreams */
+    if (s->has_2_fields && !s->is_second_field && ret < avpkt->size && ret >= 0) {
+        /* only the 1st codestream was parsed, parsing now the 2nd codestream */
+        s->is_second_field = 1;
+        avpkt->data += ret;
+        avpkt->size -= ret;
+        ret = jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);
+        avpkt->data -= ret;
+        avpkt->size += ret;
+        s->is_second_field = 0;
+        if (ret >= 0)
+            ret += codestream_size;
+    }
+
     return ret;
 }
 
diff --git a/libavcodec/jpeg2000dec.h b/libavcodec/jpeg2000dec.h
index d0ca6e7a79..ce42812c48 100644
--- a/libavcodec/jpeg2000dec.h
+++ b/libavcodec/jpeg2000dec.h
@@ -114,6 +114,11 @@ typedef struct Jpeg2000DecoderContext {
 
     /*options parameters*/
     int             reduction_factor;
+    
+    /* field info */
+    int8_t          has_2_fields;
+    int8_t          is_bottom_coded_first;
+    int8_t          is_second_field;
 } Jpeg2000DecoderContext;
 
 #endif //AVCODEC_JPEG2000DEC_H
Tomas Härdin Feb. 21, 2024, 1:11 p.m. UTC | #22
This is still J2K-specific rather than handling it properly for all
codecs at once.

mxfdec can detect cases where there will be two separate fields in one
KLV, and the decoder(s) can, if I'm not mistaken, be instructed to decode
into an AVFrame with stride and offset set up for interlaced decoding.

It should be possible to have ffmpeg set up the necessary plumbing for
this.

/Tomas
Jerome Martinez Feb. 21, 2024, 2:27 p.m. UTC | #23
On 21/02/2024 14:11, Tomas Härdin wrote:

> mxfdec can detect cases where there will be two separate fields in one
> KLV,

In practice the issue is not detecting the I2 case in mxfdec (it saves 
us only a little during probing of the first frame, and I could add such 
signaling in a patch after this one, because it is actually independent 
from the management of 2 fields in the decoder if we want to support 
buggy files); the issue is splitting per field in mxfdec.

SMPTE ST 422 extracts:
"Users are cautioned that the code values for SOC and EOC are not 
protected and can occur within the image size marker segment (SIZ), 
quantization marker segment (QCD), comment marker segment (COM) and 
other places."
"Decoders can derive the bytestream offsets of each field by analysing 
the code stream format within the essence element as described in 
ISO/IEC 15444-1."

Note that this MXF + jp2k spec hints that separating fields should be 
done in the decoder, not the demuxer.
It is impossible to split per field in a codec-neutral manner due to the 
lack of metadata for that in MXF, and doing it in mxfdec implies 
duplicating jp2k header parser code in mxfdec, and adding a similar 
parser for each supported video format in the future.

> and the decoder(s) can if I'm not mistaken be instructed to decode
> into an AVFrame with stride and offset set up for interlaced decoding.


I checked the MPEG-2 Video decoder and it does not do what you say; it 
does what I do with this patch:
- mpegvideo_parser (so the parser of raw files, equivalent to mxfdec) 
understands that the stream has 2 separate fields and puts them in a 
single AVPacket. It could separate them but it doesn't.
- mpeg12dec (equivalent to jpeg2000dec) understands that there are 2 
separate fields, applies a stride and offset internally, and outputs a 
frame in AVFrame, not a field (see the sketch below). It could provide 
fields but it doesn't.
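
To illustrate what I mean by applying a stride and offset internally, 
here is a simplified sketch of the general technique (my illustration, 
not actual mpeg12dec code; all names are mine):

#include <stdint.h>
#include <string.h>

/* Interleave one decoded field of field_h lines into a full-height
 * frame buffer; second_field selects the other set of lines. */
static void write_field(uint8_t *frame, int frame_stride,
                        const uint8_t *field, int field_stride,
                        int field_h, int row_bytes, int second_field)
{
    uint8_t *dst = frame + (second_field ? frame_stride : 0);
    for (int y = 0; y < field_h; y++) {
        memcpy(dst, field + y * field_stride, row_bytes);
        dst += 2 * frame_stride; /* skip the other field's line */
    }
}
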
I checked only what I know well, MPEG-2 Video; maybe it is done the way 
you indicate elsewhere, but currently I see no example of the idea 
that stride and offset should be provided by the demuxer. Would you mind 
showing some code in FFmpeg that works this way?
And stride and offset in signaling would mean that the target decoder 
must support stride and offset in signaling (including jpeg2000dec, so 
keeping a part of my current patch anyway), so no more "universal" as 
far as I understand.

Also:
Other comments say that AVPacket is expected to receive the packet as it 
is in the container, so no splitting of the packet.
Other comments say that AVFrame is expected to receive a frame and 
never, at least for the moment, a field.
I don't see how to respect these 2 indications otherwise, while 
mpegvideo_parser.c + mpeg12dec.c conform to these indications 
because they do the same as this patch.

My understanding of other comments is that my patch proposal is the 
right way to do it; I am lost here with the contradictions in the 
indicated path for supporting such files.
e.g. https://ffmpeg.org/pipermail/ffmpeg-devel/2024-February/321540.html

> It should be possible to have ffmpeg set up the necessary plumbing for
> this.

But is that how it works elsewhere in FFmpeg? Would such complex and 
deep modifications be accepted by others?
Please confirm that it would be accepted before I go so deep into the 
changes of FFmpeg code, or please indicate where such a hint is already 
stored in the codec context.
And it is not only plumbing: it means implementing a jp2k parser in 
mxfdec for splitting fields (impossible otherwise to split the fields), 
and it is NOT for all codecs at once (which is impossible in practice 
due to the need for a codec-dependent split of fields).

And in practice, what is the drawback of doing it this way, with very 
few modifications and in a manner similar to what is done elsewhere in 
FFmpeg with separate fields?
Especially with patch v2, which adds nearly no performance impact (a 
couple of comparisons between integers) to previously supported 
content, as quoted below.
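
Concretely, the only code the common single-codestream path pays for is 
this post-decode check from patch v2 (quoted here for reference):

    /* management of frames having 2 separate codestreams */
    if (s->has_2_fields && !s->is_second_field && ret < avpkt->size && ret >= 0) {
        ...
    }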

As a summary, IMO this patch conforms to what is done elsewhere in 
FFmpeg for separate fields, e.g. in mpeg12dec, and conforms to what is 
hinted in the specification by finding the 2nd codestream offset in the 
decoder rather than in the demuxer.

Jérôme
Tomas Härdin Feb. 24, 2024, 12:26 p.m. UTC | #24
Wed 2024-02-21 at 15:27 +0100, Jerome Martinez wrote:
> On 21/02/2024 14:11, Tomas Härdin wrote:
> 
> > mxfdec can detect cases where there will be two separate fields in
> > one
> > KLV,
> 
> In practice the issue is not to detect I2 case in mxfdec (it saves us
> only a little during probing of the first frame, but I could add such
> signaling in a patch after this one because it is actually
> independent 
> from the management of 2 fields in the decoder if we want to support 
> buggy files), the issue is to split per field in mxfdec.
> 
> SMPTE ST 422 extracts:
> "Users are cautioned that the code values for SOC and EOC are not 
> protected and can occur within the image size marker segment (SIZ), 
> quantization marker segment (QCD), comment marker segment (COM) and 
> other places."
> "Decoders can derive the bytestream offsets of each field by
> analysing 
> the code stream format within the essence element as described in 
> ISO/IEC 15444-1."
> 
> Note that this MXF + jp2k spec hints that separating fields should be
> done in the decoder, not the demuxer.

We already have a j2k parser that could be pressed into service for
this. But perhaps parsing is not necessary, see below. My main concern
is that interlacing should work the same for all codecs that mxfdec
supports, especially rawvideo.

> It is impossible to split per field in a codec-neutral manner due to 
> lack of metadata for that in MXF, and doing that in mxfdec implies to
> duplicate jp2k header parser code in mxfdec, and to add a similar
> parser 
> per supported video format in the future.

We do have a bit of duplicate parsing in mxfdec (and mxfenc) already,
which I don't like. So I share your concern.

> > and the decoder(s) can if I'm not mistaken be instructed to decode
> > into an AVFrame with stride and offset set up for interlaced
> > decoding.
> 
> 
> I checked the MPEG-2 Video decoder and it does not do what you say

I didn't say that it does, I said that we could do that. If we conform
all wrappings to MIXED_FIELDS this is probably fine. I think it's been
well established that FFmpeg is MIXED_FIELDS internally.

> , it does what I do with this patch:
> - mpegvideo_parser (so the parser of raw files, equivalent to mxfdec)
> understands that the stream has 2 separate fields and puts them in 1 
> single AVPacket. It could separate them put it doesn't.

mpeg2 has a concept of fields that j2k doesn't. Again, we're not really
talking about j2k here but about MXF.


> > It should be possible to have ffmpeg set up the necessary plumbing
> > for
> > this.
> 
> But is it how it works elsewhere in FFmpeg? Would such complex and
> deep 
> modifications be accepted by others?

Good question. I would propose something like the following:

1) detect the use of SEPARATE_FIELDS and set a flag in AVStream
2) allocate AVFrame for the size of the resulting *frame*
3a) if the codec is inherently interlaced, call the decoder once
3b) if the codec is not inherently interlaced, call the decoder twice,
with appropriate stride, and keep track of the number of bytes decoded
so far so we know what offset to start the second decode from (see the
sketch after the codec list below)

The codecs for which 3b) applies include at least:

* jpeg2000
* ffv1
* rawvideo
* tiff
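
A rough sketch of how 3b) could look in decode.c (illustration only; 
frame_wrapping and AV_WRAPPING_FIELD_2 are hypothetical placeholder 
names, and error handling is elided):

/* Call the decoder once per field, advancing by the number of
 * bytes the first call consumed; the decoder is assumed to keep
 * its own stride/offset state for the second field. */
consumed = codec->cb.decode(avctx, frame, &got_frame, pkt);
if (consumed >= 0 && consumed < pkt->size &&
    avctx->frame_wrapping == AV_WRAPPING_FIELD_2) {
    AVPacket second = *pkt;   /* shallow copy pointing at field 2 */
    second.data += consumed;
    second.size -= consumed;
    int ret2 = codec->cb.decode(avctx, frame, &got_frame, &second);
    consumed = ret2 < 0 ? ret2 : consumed + ret2;
}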

/Tomas
Jerome Martinez Feb. 25, 2024, 4:14 a.m. UTC | #25
On 24/02/2024 13:26, Tomas Härdin wrote:
> [...]
>>> It should be possible to have ffmpeg set up the necessary plumbing
>>> for
>>> this.
>> But is it how it works elsewhere in FFmpeg? Would such complex and
>> deep
>> modifications be accepted by others?
> Good question. I would propose something like the following:
>
> 1) detect the use of SEPARATE_FIELDS and set a flag in AVStream


As in practice, and in this case (2 jp2k codestreams per AVPacket), it 
is only a hint for the jpeg2000 decoder (because we can autodetect, and 
there are many buggy files in the wild), I was planning to add that 
later in a separate patch, but attached is a version with the flag.

> 2) allocate AVFrame for the size of the resulting *frame*
So keeping what is already there.

> 3a) if the codec is inherently interlaced, call the decoder once
> 3b) if the codec is not inherently interlaced, call the decoder twice,
> with appropriate stride, and keep track of the number of bytes decoded
> so far so we know what offset to start the second decode from


The place I see for that is decode_simple_internal().
But it is a very hot path I don't like to modify, and it seems to me to 
be extra code for the 99.9999% (or even more 9s) of files which don't 
need such a feature, with more risk of forgetting this specific feature 
during future development, e.g. it is not obvious that 
ff_thread_decode_frame must also change when touching this part.
I also needed to add a dedicated AVStream field saying that the decoder 
is able to manage this functionality (and it is needed there).

What is the added value of calling the decoder twice from decode.c 
rather than a recursive call (or a master function in the decoder 
calling the current function twice, if preferred) inside the decoder 
only (sketched below)?
As far as I understand, it would not help for other formats (only the 
signaling propagation in AVStream helps, and that is done by another 
AVStream field), and I personally much prefer that such a feature live 
as much as possible in a single place in each decoder rather than in 
pieces scattered everywhere; each decoder needs to be upgraded anyway.
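
For clarity, the master-function shape I mean is (a sketch of the v2 
approach; decode_one_codestream() and has_second_codestream() are 
hypothetical names, error handling omitted):

static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *pic,
                                 int *got_frame, AVPacket *avpkt)
{
    /* decode the first (or only) codestream */
    int ret = decode_one_codestream(avctx, pic, got_frame, avpkt, 0);
    if (ret < 0 || ret >= avpkt->size ||
        !has_second_codestream(avpkt, ret))
        return ret;
    /* decode the second field starting at the consumed offset */
    int ret2 = decode_one_codestream(avctx, pic, got_frame, avpkt, ret);
    return ret2 < 0 ? ret2 : ret + ret2;
}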

> The codecs for which 3b) applies include at least:
>
> * jpeg2000

Our use case.

> * ffv1

FFV1 has its own flags internally for interlaced content (interleaved 
method only) and I expect no work for separate fields. The MXF/FFV1 
spec does not provide for separate fields for FFV1, and there is no 
byte in the essence label for that.


> * rawvideo
> * tiff

I didn't find specifications for the corresponding essence label UL 
and I have no file for that; as far as I understand it is highly 
theoretical, but if it appears it would only be a matter of mapping the 
MXF signaling to the new AVStream field and supporting the feature in 
the decoders (even if we implement the idea of calling the decoder 
twice, the decoder needs to be expanded for this feature).
So IMO there is no development to do there for the moment either.

Jérôme
From f4311b718012a92590ce6168355ec118e02052a8 Mon Sep 17 00:00:00 2001
From: Jerome Martinez <jerome@mediaarea.net>
Date: Tue, 20 Feb 2024 16:04:11 +0100
Subject: [PATCH] avcodec/jpeg2000dec: support of 2 fields in 1 AVPacket

---
 libavcodec/avcodec.h       | 14 +++++++++
 libavcodec/codec_par.c     |  3 ++
 libavcodec/codec_par.h     |  5 ++++
 libavcodec/decode.c        |  3 ++
 libavcodec/defs.h          |  7 +++++
 libavcodec/jpeg2000dec.c   | 73 +++++++++++++++++++++++++++++++++++++++-------
 libavcodec/jpeg2000dec.h   |  6 ++++
 libavcodec/pthread_frame.c |  3 ++
 libavformat/mxfdec.c       | 14 +++++++++
 9 files changed, 118 insertions(+), 10 deletions(-)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 7fb44e28f4..38d63adc0f 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -2116,6 +2116,20 @@ typedef struct AVCodecContext {
      *   an error.
      */
     int64_t frame_num;
+
+    /**
+     * Video only. The way separate fields are wrapped in the container
+     * - decoding: tip from the demuxer
+     * - encoding: not (yet) used
+     */
+    enum AVFrameWrapping               frame_wrapping;
+
+    /**
+     * Video only. Indicate if running the decoder twice for a single AVFrame is supported
+     * - decoding: set by the decoder
+     * - encoding: not used
+     */
+    int                                frame_wrapping_field_2_supported;
 } AVCodecContext;
 
 /**
diff --git a/libavcodec/codec_par.c b/libavcodec/codec_par.c
index abaac63841..3f26f9d4d6 100644
--- a/libavcodec/codec_par.c
+++ b/libavcodec/codec_par.c
@@ -51,6 +51,7 @@ static void codec_parameters_reset(AVCodecParameters *par)
     par->framerate           = (AVRational){ 0, 1 };
     par->profile             = AV_PROFILE_UNKNOWN;
     par->level               = AV_LEVEL_UNKNOWN;
+    par->frame_wrapping      = AV_WRAPPING_UNKNOWN;
 }
 
 AVCodecParameters *avcodec_parameters_alloc(void)
@@ -165,6 +166,7 @@ int avcodec_parameters_from_context(AVCodecParameters *par,
         par->sample_aspect_ratio = codec->sample_aspect_ratio;
         par->video_delay         = codec->has_b_frames;
         par->framerate           = codec->framerate;
+        par->frame_wrapping      = codec->frame_wrapping;
         break;
     case AVMEDIA_TYPE_AUDIO:
         par->format           = codec->sample_fmt;
@@ -252,6 +254,7 @@ int avcodec_parameters_to_context(AVCodecContext *codec,
         codec->sample_aspect_ratio    = par->sample_aspect_ratio;
         codec->has_b_frames           = par->video_delay;
         codec->framerate              = par->framerate;
+        codec->frame_wrapping         = par->frame_wrapping;
         break;
     case AVMEDIA_TYPE_AUDIO:
         codec->sample_fmt       = par->format;
diff --git a/libavcodec/codec_par.h b/libavcodec/codec_par.h
index f42dd3b1d5..1e53292553 100644
--- a/libavcodec/codec_par.h
+++ b/libavcodec/codec_par.h
@@ -136,6 +136,11 @@ typedef struct AVCodecParameters {
     enum AVFieldOrder                  field_order;
 
     /**
+     * Video only. The way separate fields are wrapped in the container
+     */
+    enum AVFrameWrapping               frame_wrapping;
+
+    /**
      * Video only. Additional colorspace characteristics.
      */
     enum AVColorRange                  color_range;
diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index 2cfb3fcf97..979759d84a 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -428,6 +428,9 @@ static inline int decode_simple_internal(AVCodecContext *avctx, AVFrame *frame,
         consumed = ff_thread_decode_frame(avctx, frame, &got_frame, pkt);
     } else {
         consumed = codec->cb.decode(avctx, frame, &got_frame, pkt);
+        if (consumed >= 0 && avctx->frame_wrapping_field_2_supported) {
+            consumed = codec->cb.decode(avctx, frame, &got_frame, pkt);
+        }
 
         if (!(codec->caps_internal & FF_CODEC_CAP_SETS_PKT_DTS))
             frame->pkt_dts = pkt->dts;
diff --git a/libavcodec/defs.h b/libavcodec/defs.h
index 00d840ec19..8f7ecf81c5 100644
--- a/libavcodec/defs.h
+++ b/libavcodec/defs.h
@@ -204,6 +204,13 @@ enum AVFieldOrder {
     AV_FIELD_BT,          ///< Bottom coded first, top displayed first
 };
 
+enum AVFrameWrapping {
+    AV_WRAPPING_UNKNOWN,
+    AV_WRAPPING_FRAME,    ///< if interlaced content: lines are interleaved
+    AV_WRAPPING_FIELD_1,  ///< each field is an independent encoded item, 1 field per AVPacket
+    AV_WRAPPING_FIELD_2,  ///< each field is an independent encoded item, 2 fields per AVPacket
+};
+
 /**
  * @ingroup lavc_decoding
  */
diff --git a/libavcodec/jpeg2000dec.c b/libavcodec/jpeg2000dec.c
index 691cfbd891..c9b935d97b 100644
--- a/libavcodec/jpeg2000dec.c
+++ b/libavcodec/jpeg2000dec.c
@@ -194,6 +194,8 @@ static int get_siz(Jpeg2000DecoderContext *s)
     int ret;
     int o_dimx, o_dimy; //original image dimensions.
     int dimx, dimy;
+    int previous_width = s->width;
+    int previous_height = s->height;
 
     if (bytestream2_get_bytes_left(&s->g) < 36) {
         av_log(s->avctx, AV_LOG_ERROR, "Insufficient space for SIZ\n");
@@ -211,7 +213,7 @@ static int get_siz(Jpeg2000DecoderContext *s)
     s->tile_offset_y  = bytestream2_get_be32u(&s->g); // YT0Siz
     ncomponents       = bytestream2_get_be16u(&s->g); // CSiz
 
-    if (av_image_check_size2(s->width, s->height, s->avctx->max_pixels, AV_PIX_FMT_NONE, 0, s->avctx)) {
+    if (av_image_check_size2(s->width, s->height << (s->has_2_fields && s->height >= 0), s->avctx->max_pixels, AV_PIX_FMT_NONE, 0, s->avctx)) {
         avpriv_request_sample(s->avctx, "Large Dimensions");
         return AVERROR_PATCHWELCOME;
     }
@@ -301,6 +303,20 @@ static int get_siz(Jpeg2000DecoderContext *s)
             return AVERROR(ENOMEM);
     }
 
+    /* management of frames having 2 separate codestreams */
+    if (s->has_2_fields) {
+        s->height <<= 1;
+        s->image_offset_y <<= 1;
+        s->tile_offset_y <<= 1;
+        if (s->is_second_field && (s->width != previous_width || s->height != previous_height)) {
+            avpriv_request_sample(s->avctx, "Support of 2 JPEG 2000 codestreams with different base characteristics");
+            return AVERROR_PATCHWELCOME;
+        }
+        if (s->image_offset_y || s->tile_offset_y || (s->tile_height << 1) != s->height) {
+            av_log(s->avctx, AV_LOG_WARNING, "Decoding of 2 fields having titles in 1 AVPacket was not tested\n");
+        }
+    }
+
     /* compute image size with reduction factor */
     o_dimx = ff_jpeg2000_ceildivpow2(s->width  - s->image_offset_x,
                                                s->reduction_factor);
@@ -2001,7 +2017,7 @@ static inline void tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile
                                                                                                   \
             y    = tile->comp[compno].coord[1][0] -                                               \
                    ff_jpeg2000_ceildiv(s->image_offset_y, s->cdy[compno]);                        \
-            line = (PIXEL *)picture->data[plane] + y * (picture->linesize[plane] / sizeof(PIXEL));\
+            line = (PIXEL *)picture->data[plane] + (y + (s->is_second_field ^ s->is_bottom_coded_first)) * (picture->linesize[plane] / sizeof(PIXEL));\
             for (; y < h; y++) {                                                                  \
                 PIXEL *dst;                                                                       \
                                                                                                   \
@@ -2028,7 +2044,7 @@ static inline void tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile
                         dst += pixelsize;                                                         \
                     }                                                                             \
                 }                                                                                 \
-                line += picture->linesize[plane] / sizeof(PIXEL);                                 \
+                line += (picture->linesize[plane] << s->has_2_fields) / sizeof(PIXEL);            \
             }                                                                                     \
         }                                                                                         \
                                                                                                   \
@@ -2441,6 +2457,9 @@ static av_cold int jpeg2000_decode_init(AVCodecContext *avctx)
 
     ff_jpeg2000dsp_init(&s->dsp);
     ff_jpeg2000_init_tier1_luts();
+    
+    s->has_2_fields = avctx->frame_wrapping == AV_WRAPPING_FIELD_2;
+    avctx->frame_wrapping_field_2_supported = s->has_2_fields;
 
     return 0;
 }
@@ -2450,9 +2469,10 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
 {
     Jpeg2000DecoderContext *s = avctx->priv_data;
     int ret;
+    int codestream_size;
 
     s->avctx     = avctx;
-    bytestream2_init(&s->g, avpkt->data, avpkt->size);
+    bytestream2_init(&s->g, avpkt->data + s->consumed, avpkt->size - s->consumed);
     s->curtileno = -1;
     memset(s->cdef, -1, sizeof(s->cdef));
 
@@ -2484,20 +2504,50 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
         ret = AVERROR_INVALIDDATA;
         goto end;
     }
+
+    /* management of frames having 2 separate codestreams */
+    if (s->has_2_fields && !s->is_second_field) {
+        switch (avctx->field_order) {
+        case AV_FIELD_BB:
+        case AV_FIELD_BT:
+            s->is_bottom_coded_first = 1;
+            break;
+        default:
+            s->is_bottom_coded_first = 0;
+        }
+    }
+
     if (ret = jpeg2000_read_main_headers(s))
         goto end;
+    codestream_size = avpkt->size - bytestream2_get_bytes_left(&s->g);
+
+    /* management of frames having 2 separate codestreams */
+    if (bytestream2_get_bytes_left(&s->g) > 1 && bytestream2_peek_be16(&s->g) == JPEG2000_SOC) {
+        if (!s->has_2_fields) {
+            /* 2 codestreams newly detected, adapting output frame structure for handling 2 codestreams and parsing again the headers (fast and never done if wrapper has the right hint) */
+            s->has_2_fields = 1;
+            jpeg2000_dec_cleanup(s);
+            return jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);
+        }
+    } else if (s->has_2_fields && !s->is_second_field) {
+        /* 1 codestream newly detected, adapting output frame structure for handling 1 codestream and parsing again the headers (fast and never done if wrapper has the right hint) */
+        s->has_2_fields = 0;
+        s->is_bottom_coded_first = 0;
+        jpeg2000_dec_cleanup(s);
+        return jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);
+    }
 
     if (s->sar.num && s->sar.den)
         avctx->sample_aspect_ratio = s->sar;
     s->sar.num = s->sar.den = 0;
 
     if (avctx->skip_frame >= AVDISCARD_ALL) {
-        jpeg2000_dec_cleanup(s);
-        return avpkt->size;
+        ret = codestream_size;
+        goto end;
     }
 
     /* get picture buffer */
-    if ((ret = ff_thread_get_buffer(avctx, picture, 0)) < 0)
+    if ((!s->has_2_fields || !s->is_second_field) && (ret = ff_thread_get_buffer(avctx, picture, 0)) < 0)
         goto end;
     picture->pict_type = AV_PICTURE_TYPE_I;
     picture->flags |= AV_FRAME_FLAG_KEY;
@@ -2518,17 +2568,20 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
 
     avctx->execute2(avctx, jpeg2000_decode_tile, picture, NULL, s->numXtiles * s->numYtiles);
 
-    jpeg2000_dec_cleanup(s);
-
     *got_frame = 1;
 
     if (s->avctx->pix_fmt == AV_PIX_FMT_PAL8)
         memcpy(picture->data[1], s->palette, 256 * sizeof(uint32_t));
 
-    return bytestream2_tell(&s->g);
+    ret = codestream_size;
 
 end:
     jpeg2000_dec_cleanup(s);
+
+    /* management of frames having 2 separate codestreams */
+    s->is_second_field = s->has_2_fields && !s->is_second_field && ret < avpkt->size && ret >= 0; /* next call will handle the second field */
+    s->consumed = s->is_second_field ? ret : 0;
+
     return ret;
 }
 
diff --git a/libavcodec/jpeg2000dec.h b/libavcodec/jpeg2000dec.h
index d0ca6e7a79..5ae94aafd8 100644
--- a/libavcodec/jpeg2000dec.h
+++ b/libavcodec/jpeg2000dec.h
@@ -114,6 +114,12 @@ typedef struct Jpeg2000DecoderContext {
 
     /*options parameters*/
     int             reduction_factor;
+    
+    /* field info */
+    int8_t          has_2_fields;
+    int8_t          is_bottom_coded_first;
+    int8_t          is_second_field;
+    int             consumed;
 } Jpeg2000DecoderContext;
 
 #endif //AVCODEC_JPEG2000DEC_H
diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
index 71e99a5728..e3a8815653 100644
--- a/libavcodec/pthread_frame.c
+++ b/libavcodec/pthread_frame.c
@@ -221,6 +221,9 @@ static attribute_align_arg void *frame_worker_thread(void *arg)
         av_frame_unref(p->frame);
         p->got_frame = 0;
         p->result = codec->cb.decode(avctx, p->frame, &p->got_frame, p->avpkt);
+        if (p->result >= 0 && avctx->frame_wrapping_field_2_supported) {
+            p->result = codec->cb.decode(avctx, p->frame, &p->got_frame, p->avpkt);
+        }
 
         if ((p->result < 0 || !p->got_frame) && p->frame->buf[0])
             av_frame_unref(p->frame);
diff --git a/libavformat/mxfdec.c b/libavformat/mxfdec.c
index e42975e7fd..af33d8cad4 100644
--- a/libavformat/mxfdec.c
+++ b/libavformat/mxfdec.c
@@ -2948,6 +2948,20 @@ static int mxf_parse_structural_metadata(MXFContext *mxf)
                 default:
                     break;
                 }
+                switch ((*essence_container_ul)[14]) {
+                case 3: /* I1: Interlaced Frame, 1 field/KLV */
+                    st->codecpar->frame_wrapping = AV_WRAPPING_FIELD_1;
+                    break;
+                case 4: /* I2: Interlaced Frame, 2 fields/KLV */
+                    st->codecpar->frame_wrapping = AV_WRAPPING_FIELD_2;
+                    break;
+                case 2: /* Cn: Clip-wrapped Picture Element */
+                case 6: /* P1: Frame-wrapped Picture Element */
+                    st->codecpar->frame_wrapping = AV_WRAPPING_FRAME;
+                    break;
+                default:
+                    break;
+                }
             }
 
             if (st->codecpar->codec_id == AV_CODEC_ID_PRORES) {
Tomas Härdin March 1, 2024, 10:29 p.m. UTC | #26
Sun 2024-02-25 at 05:14 +0100, Jerome Martinez wrote:
> On 24/02/2024 13:26, Tomas Härdin wrote:
> > 3a) if the codec is inherently interlaced, call the decoder once
> > 3b) if the codec is not inherently interlaced, call the decoder
> > twice,
> > with appropriate stride, and keep track of the number of bytes
> > decoded
> > so far so we know what offset to start the second decode from
> 
> 
> The place I see for that is in decode_simple_internal().
> But it is a very hot place I don't like to modify, and it seems to me
> some extra code for 99.9999% (or even more 9s) of files which don't
> need 
> such feature,

See below

> with more risk to forget this specific feature during a
> future dev e.g. not obvious to change also in ff_thread_decode_frame 
> when touching this part.

This is why we add tests.

> I also needed to add a dedicated AVStream field for saying that the 
> decoder is able to manage this functionality (and is needed there).
> 
> What is the added value to call the decoder twice from decode.c
> rather 
> than recursive call (or a master function in the decoder calling the 
> current function twice, if preferred) inside the decoder only?

We get support for all codecs that can do SEPARATE_FIELDS in MXF,
rather than a half measure that only supports j2k and that we'd have to
fix later anyway.


> > The codecs for which 3b) applies include at least:
> > 
> > * jpeg2000
> 
> Our use case.
> 
> > * ffv1
> 
> FFV1 has its own flags internally for interlaced content (interleaved
> method only) and I expect no work for separated fields. the MXF/FFV1 
> spec does not plan separated fields for FFV1, and there is no byte in
> the essence label for that.

Ah, I missed that

> > * rawvideo
> > * tiff
> 
> I didn't find specifications for the essence label UL corresponding

ULs aren't for specifying interlacing type (though I wouldn't be
surprised if there's a mapping that misuses them for that)

> and 
> I have no file for that, as far as I understand it is highly
> theoretical 

I've had clients asking about VBI data in MXF, which is sometimes
stored as raw samples. It is not outside the realm of possibility that
someone using MXF for archival purposes would use raw video which, due
to the nature of capture equipment, would be stored as SEPARATE_FIELDS.
I'm not aware of a mapping document for this, however, but I am aware
of people doing precisely this kind of stuff.

> but if it appears would be only a matter of mapping the MXF signaling
> to 
> the new AVStream field and supporting the feature in the decoders
> (even 
> if we implement the idea of calling the decoder twice, the decoder
> needs 
> to be expanded for this feature).

So in other words putting it into every decoder for which there exists
an MXF mapping for SEPARATE_FIELDS, rather than doing it properly?

/Tomas
Jerome Martinez March 3, 2024, 5:07 p.m. UTC | #27
On 01/03/2024 23:29, Tomas Härdin wrote:
> Sun 2024-02-25 at 05:14 +0100, Jerome Martinez wrote:
> [...]
>> I also needed to add a dedicated AVStream field for saying that the
>> decoder is able to manage this functionality (and is needed there).
>>
>> What is the added value to call the decoder twice from decode.c
>> rather
>> than recursive call (or a master function in the decoder calling the
>> current function twice, if preferred) inside the decoder only?
> We get support for all codecs that can do SEPARATE_FIELDS in MXF,
> rather than a half measure that only supports j2k, that we'd have to
> fix later anyway.

I personally don't understand how, because in both cases we need a 
decoder aware of this feature (for the stride during decoding). Anyway, 
if you think it will be useful in the future for another codec with 
separate fields: I have warned about the issues I see in practice, and 
it does not matter to me that it is done the way you want; my goal is 
that upstream FFmpeg, rather than only my build, stops trashing half of 
a frame (the current behavior), however that is done.


>> [...]
>> I didn't find specifications for the essence label UL corresponding
> ULs aren't for specifying interlacing type (though I wouldn't be
> surprised if there's a mapping that misuse them for that)


In practice for MXF jp2k the storage method is provided by the UL 
(byte 15 of the essence container label); it is in the specs, and all 
the other items alone (frame layout, sample rate, edit rate, aspect 
ratio) don't provide enough information (and are often buggy). I did 
not decide that.
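
For reference, the mapping used in the mxfdec hunk of patch v3 (byte 15 
of the essence container label, i.e. index [14] in the code) is:
- 0x02: Cn, clip-wrapped picture element
- 0x03: I1, interlaced frame, 1 field per KLV
- 0x04: I2, interlaced frame, 2 fields per KLV
- 0x06: P1, frame-wrapped picture element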

About other formats, and whether it should not depend on the UL: I did 
not find any information about separate fields, and it is difficult for 
me to prove that something does not exist. Could you point to another 
spec that signals separate fields differently?

In practice, and as far as I know, we currently have only jp2k with 2 
completely separate codestreams in MXF, so my patch implements all the 
existing specifications (and files) I am aware of, i.e. one.


> [...]
>
>> but if it appears would be only a matter of mapping the MXF signaling
>> to
>> the new AVStream field and supporting the feature in the decoders
>> (even
>> if we implement the idea of calling the decoder twice, the decoder
>> needs
>> to be expanded for this feature).
> So in other words putting it into every decoder for which there exists
> an MXF mapping for SEPARATE_FIELDS, rather than doing it properly?


As said above, I am not convinced that calling the decoder twice from 
decode.c (or similar) is doing things properly, but if you think it is 
done properly this way, fine by me.

Patch v3 contains all the requested changes (MXF config propagation to 
the decoder, calling the decoder twice); is there anything else in this 
patch proposal preventing it from being applied?

Jérôme
Jerome Martinez April 24, 2024, 11:20 a.m. UTC | #28
Hi, I'm bumping this patch proposal to avoid a situation where FFmpeg 
skips half of the visual content when 2 jpeg2000 codestreams are 
present in one frame. I re-reviewed this discussion and think I have 
answered all concerns. I'm hesitant about patch v3 because I consider 
that touching frame_worker_thread for this feature is not so useful, 
but I am OK with either patch v2 or v3.
Thanks much,
     Jérôme

On 20/02/2024 16:07, Jerome Martinez wrote:
> Attached is an updated version of the patch proposal.
>
> About the idea to keep separate fields in the output AVFrame, I note 
> from the discussion that it is commonly accepted that up to now it is 
> expected that the AVPacket contains what is in the MXF element and 
> that the AVFrame contains a frame and never a field, and additionally 
> I see in e.g. mpeg12dec.c that the decoder interleaves separate fields:
> mb_y <<= field_pic;
> if (s2->picture_structure == PICT_BOTTOM_FIELD)
>     mb_y++;
> And mpegvideo_parser.c creates an AVPacket with both fields in it
> even if they are separated; this patch keeps the AVPacket from e.g.
> MXF with both fields in it and does something similar to what other
> decoders do with separate fields in the output AVFrame.
>
> About the detection of the 2 separate fields in 1 packet in the MXF
> file (I2 mode): in the example file provided in the first patch
> proposal it is doable to catch it by checking the essence label, but
> other files I have don't have the specific essence label (they use
> the "undefined" essence label, which is legit), so IMO we should not
> rely on it, and there is actually no practical advantage in catching
> that from the container.
>
> In practice this new patch proposal is slightly more complex, with one
> recursive call to jpeg2000_decode_frame() when there are 2 codestreams
> in 1 AVPacket, but it has a negligible performance impact (a few
> comparisons and bool checks) when there is only one codestream in the
> AVPacket (the only currently supported method; remember that currently
> FFmpeg completely discards the 2nd codestream present in the AVPacket),
> and it relies on the current jpeg2000_read_main_headers() function for
> catching the end of the first codestream (safer than the quick find of
> EOC/SOC in the first version).
>
> It also changes the jpeg2000_decode_frame return value to the count of
> bytes parsed; it seems that is what is expected, but in practice it
> was not doing that. Fixing the return value could be a separate patch
> if preferred.
>
> Jérôme
>
> On 02/02/2024 16:55, Jerome Martinez wrote:
>> Before this patch, the FFmpeg MXF parser correctly detects content 
>> with 2 fields in 1 AVPacket as e.g. interlaced 720x486 but the FFmpeg 
>> JPEG 2000 decoder reads the JPEG 2000 SIZ header without 
>> understanding that the indicated height is the height of 1 field only 
>> so overwrites the frame size info with e.g. 720x243, and also 
>> completely discards the second frame, which lead to the decoding of 
>> only half of the stored content as "progressive" 720x243 flagged 
>> interlaced.
>>
>> Example file:
>> https://www.digitizationguidelines.gov/guidelines/MXF_sampleFiles/RDD48-sample12-gf-jpeg2000-ntsc-4.2.zip 
>>
>>
>> Before this patch:
>> Stream #0:0: Video: jpeg2000, yuv422p10le(bottom coded first 
>> (swapped)), 720x243, lossless, SAR 9:20 DAR 4:3, 29.97 tbr, 29.97 
>> tbn, 29.97 tbc
>>
>> After this patch:
>> Stream #0:0: Video: jpeg2000, yuv422p10le(bottom coded first 
>> (swapped)), 720x486, lossless, SAR 9:10 DAR 4:3, 29.97 fps, 29.97 
>> tbr, 29.97 tbn
Tomas Härdin April 24, 2024, 10:54 p.m. UTC | #29
Wed 2024-04-24 at 13:20 +0200, Jerome Martinez wrote:
> Hi, I'm bumping this patch proposal for avoiding a situation where 
> FFmpeg skips half the visual content when 2 jpeg2000 codestreams are 
> present in one frame. I re-reviewed this discussion and think I
> answered 
> all concerns. I'm hesitant with patch v3 because I consider that 
> touching frame_worker_thread for this feature is not so useful but I
> am 
> ok with either patch v2 or v3.

Did you look at doing this further up, say in demux.c? Because adding
codec-specific logic for this seems wrong, when J2K is far from the
only codec that can be used like this in MXF.

/Tomas
Michael Niedermayer April 25, 2024, 12:59 a.m. UTC | #30
On Wed, Apr 24, 2024 at 01:20:02PM +0200, Jerome Martinez wrote:
> Hi, I'm bumping this patch proposal for avoiding a situation where FFmpeg
> skips half the visual content when 2 jpeg2000 codestreams are present in one
> frame. I re-reviewed this discussion and think I answered all concerns. I'm
> hesitant with patch v3 because I consider that touching frame_worker_thread
> for this feature is not so useful but I am ok with either patch v2 or v3.
> Thanks much,
>     Jérôme

I hope I applied the correct patch:
 libavcodec/avcodec.h       | 14 ++++++++++++++
 libavcodec/codec_par.c     |  3 +++
 libavcodec/codec_par.h     |  5 +++++
 libavcodec/decode.c        |  3 +++
 libavcodec/defs.h          |  7 +++++++
 libavcodec/jpeg2000dec.c   | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------
 libavcodec/jpeg2000dec.h   |  6 ++++++
 libavcodec/pthread_frame.c |  3 +++
 libavformat/mxfdec.c       | 14 ++++++++++++++
 9 files changed, 118 insertions(+), 10 deletions(-)

but that with:

./ffmpeg -i ~/tickets/1102/ntsc2.mxf -bitexact  -vframes 1 /tmp/file.bmp

segfaults:
==1031087== Invalid write of size 2
==1031087==    at 0x9D758F: jpeg2000_decode_tile (in ffmpeg/ffmpeg_g)
==1031087==    by 0x7C3820: avcodec_default_execute2 (in ffmpeg/ffmpeg_g)
==1031087==    by 0x9DCE6E: jpeg2000_decode_frame (in ffmpeg/ffmpeg_g)
==1031087==    by 0xAF912F: frame_worker_thread (in ffmpeg/ffmpeg_g)
==1031087==    by 0x4A00608: start_thread (pthread_create.c:477)
==1031087==    by 0x8104352: clone (clone.S:95)
==1031087==  Address 0x0 is not stack'd, malloc'd or (recently) free'd


[...]
diff mbox series

Patch

diff --git a/libavcodec/jpeg2000dec.c b/libavcodec/jpeg2000dec.c
index 691cfbd891..d8bfca390e 100644
--- a/libavcodec/jpeg2000dec.c
+++ b/libavcodec/jpeg2000dec.c
@@ -194,6 +194,8 @@  static int get_siz(Jpeg2000DecoderContext *s)
     int ret;
     int o_dimx, o_dimy; //original image dimensions.
     int dimx, dimy;
+    int previous_width = s->width;
+    int previous_height = s->height;
 
     if (bytestream2_get_bytes_left(&s->g) < 36) {
         av_log(s->avctx, AV_LOG_ERROR, "Insufficient space for SIZ\n");
@@ -211,7 +213,7 @@  static int get_siz(Jpeg2000DecoderContext *s)
     s->tile_offset_y  = bytestream2_get_be32u(&s->g); // YT0Siz
     ncomponents       = bytestream2_get_be16u(&s->g); // CSiz
 
-    if (av_image_check_size2(s->width, s->height, s->avctx->max_pixels, AV_PIX_FMT_NONE, 0, s->avctx)) {
+    if (av_image_check_size2(s->width, s->height << (s->height >= 0 && s->has_2_fields), s->avctx->max_pixels, AV_PIX_FMT_NONE, 0, s->avctx)) {
         avpriv_request_sample(s->avctx, "Large Dimensions");
         return AVERROR_PATCHWELCOME;
     }
@@ -301,6 +303,19 @@  static int get_siz(Jpeg2000DecoderContext *s)
             return AVERROR(ENOMEM);
     }
 
+    if (s->has_2_fields) {
+        s->height <<= 1;
+        s->image_offset_y <<= 1;
+        s->tile_offset_y <<= 1;
+        if (s->is_second_field && (s->width != previous_width || s->height != previous_height)) {
+            avpriv_request_sample(s->avctx, "Pixel size of the 2 fields of the frame are not same");
+            return AVERROR_PATCHWELCOME;
+        }
+        if (s->image_offset_y || s->tile_offset_y || (s->tile_height << 1) != s->height) {
+            av_log(s->avctx, AV_LOG_WARNING, "Decoding of 2 fields having titles in 1 AVPacket was not tested\n");
+        }
+    }
+
     /* compute image size with reduction factor */
     o_dimx = ff_jpeg2000_ceildivpow2(s->width  - s->image_offset_x,
                                                s->reduction_factor);
@@ -2001,7 +2016,7 @@  static inline void tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile
                                                                                                   \
             y    = tile->comp[compno].coord[1][0] -                                               \
                    ff_jpeg2000_ceildiv(s->image_offset_y, s->cdy[compno]);                        \
-            line = (PIXEL *)picture->data[plane] + y * (picture->linesize[plane] / sizeof(PIXEL));\
+            line = (PIXEL *)picture->data[plane] + (y + (s->is_second_field ^ s->is_bottom_coded_first)) * (picture->linesize[plane] / sizeof(PIXEL));\
             for (; y < h; y++) {                                                                  \
                 PIXEL *dst;                                                                       \
                                                                                                   \
@@ -2028,7 +2043,7 @@  static inline void tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile
                         dst += pixelsize;                                                         \
                     }                                                                             \
                 }                                                                                 \
-                line += picture->linesize[plane] / sizeof(PIXEL);                                 \
+                line += (picture->linesize[plane] << s->has_2_fields) / sizeof(PIXEL);            \
             }                                                                                     \
         }                                                                                         \
                                                                                                   \
@@ -2445,8 +2460,8 @@  static av_cold int jpeg2000_decode_init(AVCodecContext *avctx)
     return 0;
 }
 
-static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
-                                 int *got_frame, AVPacket *avpkt)
+static int jpeg2000_decode_frame_picture(AVCodecContext *avctx, AVFrame *picture,
+                                 AVPacket *avpkt)
 {
     Jpeg2000DecoderContext *s = avctx->priv_data;
     int ret;
@@ -2497,7 +2512,7 @@  static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
     }
 
     /* get picture buffer */
-    if ((ret = ff_thread_get_buffer(avctx, picture, 0)) < 0)
+    if ((!s->has_2_fields || !s->is_second_field) && (ret = ff_thread_get_buffer(avctx, picture, 0)) < 0)
         goto end;
     picture->pict_type = AV_PICTURE_TYPE_I;
     picture->flags |= AV_FRAME_FLAG_KEY;
@@ -2520,8 +2535,6 @@  static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
 
     jpeg2000_dec_cleanup(s);
 
-    *got_frame = 1;
-
     if (s->avctx->pix_fmt == AV_PIX_FMT_PAL8)
         memcpy(picture->data[1], s->palette, 256 * sizeof(uint32_t));
 
@@ -2532,6 +2545,64 @@  end:
     return ret;
 }
 
+static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
+                                 int *got_frame, AVPacket *avpkt)
+{
+    Jpeg2000DecoderContext *s = avctx->priv_data;
+    int picture_1_size = avpkt->size, picture_2_size = 0;
+    int ret1 = 0, ret2 = 0;
+    int may_have_2_fields_in_1_packet = 0;
+
+    // find if there are 2 JPEG2000 pictures in a single packet
+    s->has_2_fields = 0;
+    s->is_bottom_coded_first = 0;
+    s->is_second_field = 0;
+    switch (avctx->field_order) {
+    case AV_FIELD_TT:
+    case AV_FIELD_TB:
+        may_have_2_fields_in_1_packet = 1;
+        break;
+    case AV_FIELD_BB:
+    case AV_FIELD_BT:
+        may_have_2_fields_in_1_packet = 2;
+        break;
+    }
+    if (may_have_2_fields_in_1_packet) {
+        for (int i = 0; i < avpkt->size - 4; i++) {
+            static const unsigned char EOI_SOI[4] = { 0xFF, 0xD9, 0xFF, 0x4F };
+            if (!memcmp(avpkt->data + i, EOI_SOI, 4)) {
+                if (picture_2_size) {
+                    av_log(s->avctx, AV_LOG_WARNING, "EIO SOI sequence found twice, risk of wrong detection\n");
+                } else {
+                    picture_1_size = i + 2;
+                    picture_2_size = avpkt->size - picture_1_size;
+                    s->has_2_fields = 1;
+                    s->is_bottom_coded_first = may_have_2_fields_in_1_packet - 1;
+                }
+            }
+        }
+    }
+
+    // parsing full frame or first picture
+    avpkt->size -= picture_2_size;
+    ret1 = jpeg2000_decode_frame_picture(avctx, picture, avpkt);
+    
+    // parsing second picture if present
+    if (picture_2_size) {
+        avpkt->data += picture_1_size;
+        avpkt->size = picture_2_size;
+        s->is_second_field = 1;
+        ret2 = jpeg2000_decode_frame_picture(avctx, picture, avpkt);
+
+        // reset
+        avpkt->data -= picture_1_size;
+        avpkt->size += picture_1_size;
+    }
+ 
+    *got_frame = avctx->skip_frame < AVDISCARD_ALL && (ret1 >= 0 || ret2 >= 0); // got_frame is 1 if any of the 2 pictures is fine
+    return ret1 < 0 ? ret1 : (ret2 < 0 ? ret2 : (ret1 + ret2)); // priority on first field error code
+}
+
 #define OFFSET(x) offsetof(Jpeg2000DecoderContext, x)
 #define VD AV_OPT_FLAG_VIDEO_PARAM | AV_OPT_FLAG_DECODING_PARAM
 
diff --git a/libavcodec/jpeg2000dec.h b/libavcodec/jpeg2000dec.h
index d0ca6e7a79..ce42812c48 100644
--- a/libavcodec/jpeg2000dec.h
+++ b/libavcodec/jpeg2000dec.h
@@ -114,6 +114,11 @@  typedef struct Jpeg2000DecoderContext {
 
     /*options parameters*/
     int             reduction_factor;
+    
+    /* field info */
+    int8_t          has_2_fields;
+    int8_t          is_bottom_coded_first;
+    int8_t          is_second_field;
 } Jpeg2000DecoderContext;
 
 #endif //AVCODEC_JPEG2000DEC_H