diff mbox

[FFmpeg-devel] lavf/mov: Do not blindly allocate stts entries

Message ID CAB0OVGpXhLfZ+ZxdJZOVtfdcJ1fPW9+zoZ1nv5ESq21AFoMzgA@mail.gmail.com
State Superseded
Headers show

Commit Message

Carl Eugen Hoyos Dec. 29, 2017, 1:10 a.m. UTC
2017-11-28 21:32 GMT+01:00 Michael Niedermayer <michael@niedermayer.cc>:
> On Mon, Nov 27, 2017 at 05:24:14AM +0100, Carl Eugen Hoyos wrote:
>> Hi!
>>
>> Attached patch avoids allocations >1GB for (short and) invalid mov
>> files with only reasonable speed impact.
>>
>> Please review, Carl Eugen
>
>>  mov.c |   16 +++++++++++++---
>>  1 file changed, 13 insertions(+), 3 deletions(-)
>> 980861e4c47c80c850d4e849043df2510a3d1d32  0001-lavf-mov-Do-not-blindly-allocate-huge-memory-blocks-.patch
>> From 0d243bad5fdd9850ff41d49a32a06274a3cd9756 Mon Sep 17 00:00:00 2001
>> From: Carl Eugen Hoyos <ceffmpeg@gmail.com>
>> Date: Mon, 27 Nov 2017 05:13:25 +0100
>> Subject: [PATCH] lavf/mov: Do not blindly allocate huge memory blocks for
>>  stts entries.
>>
>> Fixes large allocations for short files with invalid stts entry.
>> Fixes bugzilla 1102.
>> ---
>>  libavformat/mov.c |   16 +++++++++++++---
>>  1 file changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/libavformat/mov.c b/libavformat/mov.c
>> index ddb1e59..9d353bf 100644
>> --- a/libavformat/mov.c
>> +++ b/libavformat/mov.c
>> @@ -2838,14 +2838,24 @@ static int mov_read_stts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>>      if (sc->stts_data)
>>          av_log(c->fc, AV_LOG_WARNING, "Duplicated STTS atom\n");
>>      av_free(sc->stts_data);
>> -    sc->stts_count = 0;
>> -    sc->stts_data = av_malloc_array(entries, sizeof(*sc->stts_data));
>> +    sc->stts_count = FFMIN(1024 * 1024, entries);
>> +    sc->stts_data = av_realloc_array(NULL, sc->stts_count, sizeof(*sc->stts_data));
>>      if (!sc->stts_data)
>>          return AVERROR(ENOMEM);
>
> i dont know if leaving stts_count random on return is a good idea

Fixed.

>>      for (i = 0; i < entries && !pb->eof_reached; i++) {
>> -        int sample_duration;
>> +        int sample_duration, ret;
>>          unsigned int sample_count;
>> +        if (i > sc->stts_count) {
>> +            ret = av_reallocp_array(&sc->stts_data,
>> +                                    FFMIN(sc->stts_count * 2LL, entries),
>> +                                    sizeof(*sc->stts_data));
>
> this should use a variant of av_fast_realloc

New patch attached, only tested with fate.

Thank you, Carl Eugen

Comments

Derek Buitenhuis Dec. 29, 2017, 7:47 p.m. UTC | #1
On 12/29/2017 1:10 AM, Carl Eugen Hoyos wrote:
> New patch attached, only tested with fate.

> +    if (INT_MAX / sizeof(*sc->stts_data) <= entries)
>          return AVERROR(ENOMEM);

Should probably be something thing AVERROR(EINVAL), I think.

> +    sc->stts_count = FFMIN(1024 * 1024, entries);

As stated in the past many times, I am not a fan of hardcoding arbitrary
max hard limits. It's the OS's job to handle large allocation issues, not
the library, for what could be perfectly valid files.

However, if we insist on doing it, I believe 1024 * 1024 is too low. Very
long MP4/MOV files are not as uncommon as you think; I have many many 10-20hr
MP4s and even a 55 hour one, and this hardcoded (i.e. unchangeable by the user)
would/could break such files with no way to fix them from the user-side.

> +            sc->stts_count = FFMIN(sc->stts_count * 2, entries);

This one seems especially sketchy to me... it can't be that odd of a scenario
to double the count.

In general, I am not a fan of arbitrary limits.

- Derek
Carl Eugen Hoyos Dec. 29, 2017, 8:25 p.m. UTC | #2
2017-12-29 20:47 GMT+01:00 Derek Buitenhuis <derek.buitenhuis@gmail.com>:
> On 12/29/2017 1:10 AM, Carl Eugen Hoyos wrote:
>> New patch attached, only tested with fate.
>
>> +    if (INT_MAX / sizeof(*sc->stts_data) <= entries)

This is an arbitrary limit...

>>          return AVERROR(ENOMEM);
>
> Should probably be something thing AVERROR(EINVAL), I think.

... and it is therefore - imo - not correct to return EINVAL.

>> +    sc->stts_count = FFMIN(1024 * 1024, entries);

This is not a limit and therefore not an arbitrary limit.

> As stated in the past many times, I am not a fan of hardcoding
> arbitrary max hard limits.

There is no hard limit here (but only above), it just doesn't make much
sense to allocate several G for a file that's just a few bytes long.

Carl Eugen
Derek Buitenhuis Dec. 29, 2017, 8:35 p.m. UTC | #3
On 12/29/2017 8:25 PM, Carl Eugen Hoyos wrote:
> 2017-12-29 20:47 GMT+01:00 Derek Buitenhuis <derek.buitenhuis@gmail.com>:
>> On 12/29/2017 1:10 AM, Carl Eugen Hoyos wrote:
>>> New patch attached, only tested with fate.
>>
>>> +    if (INT_MAX / sizeof(*sc->stts_data) <= entries)
> 
> This is an arbitrary limit...

Yeah, true. I should have pointed that out too.

> 
>>>          return AVERROR(ENOMEM);
>>
>> Should probably be something thing AVERROR(EINVAL), I think.
> 
> ... and it is therefore - imo - not correct to return EINVAL.

ENOMEM is arguably even more incorrect, though.

> 
>>> +    sc->stts_count = FFMIN(1024 * 1024, entries);
> 
> This is not a limit and therefore not an arbitrary limit.

... what? It is a limit on the size of stts_count. A arbitrary (1mb) one.

In any case, I misunderstood this part, and it is OK. Sorry about that - disregard
my complaint.

FFMIN(sc->stts_count * 2, entries) could in theory end up using more memory
for legitimate large files though, due to exponential buffer growth. Is that
something worth caring about?

Other than that, seems OK. Sorry for the noise.

- Derek
Carl Eugen Hoyos Dec. 29, 2017, 8:42 p.m. UTC | #4
2017-12-29 21:35 GMT+01:00 Derek Buitenhuis <derek.buitenhuis@gmail.com>:

> FFMIN(sc->stts_count * 2, entries) could in theory end up using
> more memory for legitimate large files though, due to exponential
> buffer growth. Is that something worth caring about?

Sorry, I don't understand this sentence - could you try with
other words?

Thanks for your mail, Carl Eugen
Derek Buitenhuis Dec. 29, 2017, 9:32 p.m. UTC | #5
On 12/29/2017 8:42 PM, Carl Eugen Hoyos wrote:
> Sorry, I don't understand this sentence - could you try with
> other words?

Because the buffer is now grown exponentially (by a factor of 2 every
time it is resizes), it's possible that it ends up with almost double
the size of 'entries' after resizes.

For example, consider a valid MP4 file with (1024 * 1024 + 1) entries.
With the new code, the final buffer sizes will be (1024 * 1024 * 2) after
iterating over everything. I don't know if it's something we care about.

Completely Unrelated: Multiple types of atoms are similar to the stts atom,
in that they contain a coded number of entries we allocate - maybe we should
consistently apply this sort of allocation strategy to them, too (out of
scope of this patch, though)?

- Derek
Derek Buitenhuis Dec. 29, 2017, 9:33 p.m. UTC | #6
On 12/29/2017 9:32 PM, Derek Buitenhuis wrote:
> Because the buffer is now grown exponentially (by a factor of 2 every
> time it is resizes), it's possible that it ends up with almost double
> the size of 'entries' after resizes.
> 
> For example, consider a valid MP4 file with (1024 * 1024 + 1) entries.
> With the new code, the final buffer sizes will be (1024 * 1024 * 2) after
> iterating over everything. I don't know if it's something we care about.

Ah, actually disregard this too. Sigh. I must be on some sort of Christmas
drugs or something, because I'm misreading everything. Maybe too much eggnog.

- Derek
Carl Eugen Hoyos Dec. 29, 2017, 10:36 p.m. UTC | #7
2017-12-29 22:32 GMT+01:00 Derek Buitenhuis <derek.buitenhuis@gmail.com>:

> Completely Unrelated: Multiple types of atoms are similar to the stts atom,
> in that they contain a coded number of entries we allocate - maybe we should
> consistently apply this sort of allocation strategy to them, too (out of
> scope of this patch, though)?

This is correct but I don't immediately see how this can be done
except on a case-by-case basis.

(And I prefer working with actual samples.)

Carl Eugen
diff mbox

Patch

From 85620d03b313f5a684a5e9c06e445180601592f5 Mon Sep 17 00:00:00 2001
From: Carl Eugen Hoyos <ceffmpeg@gmail.com>
Date: Fri, 29 Dec 2017 02:08:48 +0100
Subject: [PATCH] lavf/mov: Do not blindly allocate huge memory blocks for
 stts entries.

Fixes large allocations for short files with invalid stts entry.
Fixes bugzilla 1102.
---
 libavformat/mov.c |   23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 2064473..df7a40f 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -2849,14 +2849,29 @@  static int mov_read_stts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
     if (sc->stts_data)
         av_log(c->fc, AV_LOG_WARNING, "Duplicated STTS atom\n");
     av_free(sc->stts_data);
-    sc->stts_count = 0;
-    sc->stts_data = av_malloc_array(entries, sizeof(*sc->stts_data));
-    if (!sc->stts_data)
+    if (INT_MAX / sizeof(*sc->stts_data) <= entries)
         return AVERROR(ENOMEM);
+    sc->stts_count = FFMIN(1024 * 1024, entries);
+    sc->stts_data = av_realloc_array(NULL, sc->stts_count, sizeof(*sc->stts_data));
+    if (!sc->stts_data) {
+        sc->stts_count = 0;
+        return AVERROR(ENOMEM);
+    }
 
     for (i = 0; i < entries && !pb->eof_reached; i++) {
         int sample_duration;
-        unsigned int sample_count;
+        unsigned int sample_count, alloc_size = sc->stts_count * sizeof(*sc->stts_data);
+        if (i > sc->stts_count) {
+            MOVStts *stts_data = av_fast_realloc(sc->stts_data, &alloc_size,
+                                                 FFMIN(sc->stts_count * 2LL, entries) * sizeof(*sc->stts_data));
+            if (!stts_data) {
+                av_freep(&sc->stts_data);
+                sc->stts_count = 0;
+                return AVERROR(ENOMEM);
+            }
+            sc->stts_count = FFMIN(sc->stts_count * 2, entries);
+            sc->stts_data = stts_data;
+        }
 
         sample_count=avio_rb32(pb);
         sample_duration = avio_rb32(pb);
-- 
1.7.10.4