diff mbox series

[FFmpeg-devel,v2,1/2] avformat/url: check double dot is not to parent directory

Message ID 20200725024537.97453-1-lq@chinaffmpeg.org
State New
Headers show
Series [FFmpeg-devel,v2,1/2] avformat/url: check double dot is not to parent directory | expand

Checks

Context Check Description
andriy/default pending
andriy/make success Make finished
andriy/make_fate success Make fate finished

Commit Message

Liu Steven July 25, 2020, 2:45 a.m. UTC
fix ticket: 8814
if get ".." in the url, check next byte and lead byte by double dot,
it there have no '/' and not root node, it is not used go to directory ".."

Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
---
 libavformat/url.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Zlomek, Josef July 25, 2020, 5:47 a.m. UTC | #1
Hi Steven,

It is better but still not correct. Consider this test:

test("http://server/foo/bar",
"a/b/../c/d/../e../.../..f/g../h../other/url/a.mp3/...");
It should give "
http://server/foo/bar/a/c/e../.../..f/g../h../other/url/a.mp3/...".

I think the best would be to use strtok(p, "/") to split the path into the
components and for each ".." component remove the previous one (if there
are some still).

Best regards,
Josef

On Sat, Jul 25, 2020 at 4:45 AM Steven Liu <lq@chinaffmpeg.org> wrote:

> fix ticket: 8814
> if get ".." in the url, check next byte and lead byte by double dot,
> it there have no '/' and not root node, it is not used go to directory ".."
>
> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
> ---
>  libavformat/url.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/libavformat/url.c b/libavformat/url.c
> index 20463a6674..35f27fe3ca 100644
> --- a/libavformat/url.c
> +++ b/libavformat/url.c
> @@ -97,6 +97,18 @@ static void trim_double_dot_url(char *buf, const char
> *rel, int size)
>      /* set new current position if the root node is changed */
>      p = root;
>      while (p && (node = strstr(p, ".."))) {
> +        if (strlen(node) > 2 && node[2] != '/') {
> +            node = strstr(node + 1, "..");
> +            if (!node)
> +                break;
> +        }
> +
> +        if (p != node && p[node - p - 1] != '/') {
> +            node = strstr(node + 1, "..");
> +            if (!node)
> +                break;
> +        }
> +
>          av_strlcat(tmp_path, p, node - p + strlen(tmp_path));
>          p = node + 3;
>          sep = strrchr(tmp_path, '/');
> --
> 2.25.0
>
>
>
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Liu Steven July 25, 2020, 6:27 a.m. UTC | #2
在 2020/7/25 下午2:11,“ffmpeg-devel 代表 Zlomek, Josef”<ffmpeg-devel-bounces@ffmpeg.org 代表 josef@pex.com> 写入:

    Hi Steven,
    
    It is better but still not correct. Consider this test:
    
    test("http://server/foo/bar",
    "a/b/../c/d/../e../.../..f/g../h../other/url/a.mp3/...");
    It should give "
    http://server/foo/bar/a/c/e../.../..f/g../h../other/url/a.mp3/...".
    
    I think the best would be to use strtok(p, "/") to split the path into the
    components and for each ".." component remove the previous one (if there
    are some still).
I think you can submit patch if you have full idea for it.
    
    Best regards,
    Josef
    
    On Sat, Jul 25, 2020 at 4:45 AM Steven Liu <lq@chinaffmpeg.org> wrote:
    
    > fix ticket: 8814
    > if get ".." in the url, check next byte and lead byte by double dot,
    > it there have no '/' and not root node, it is not used go to directory ".."
    >
    > Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
    > ---
    >  libavformat/url.c | 12 ++++++++++++
    >  1 file changed, 12 insertions(+)
    >
    > diff --git a/libavformat/url.c b/libavformat/url.c
    > index 20463a6674..35f27fe3ca 100644
    > --- a/libavformat/url.c
    > +++ b/libavformat/url.c
    > @@ -97,6 +97,18 @@ static void trim_double_dot_url(char *buf, const char
    > *rel, int size)
    >      /* set new current position if the root node is changed */
    >      p = root;
    >      while (p && (node = strstr(p, ".."))) {
    > +        if (strlen(node) > 2 && node[2] != '/') {
    > +            node = strstr(node + 1, "..");
    > +            if (!node)
    > +                break;
    > +        }
    > +
    > +        if (p != node && p[node - p - 1] != '/') {
    > +            node = strstr(node + 1, "..");
    > +            if (!node)
    > +                break;
    > +        }
    > +
    >          av_strlcat(tmp_path, p, node - p + strlen(tmp_path));
    >          p = node + 3;
    >          sep = strrchr(tmp_path, '/');
    > --
    > 2.25.0
    >
    >
    >
    >
    > _______________________________________________
    > ffmpeg-devel mailing list
    > ffmpeg-devel@ffmpeg.org
    > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
    >
    > To unsubscribe, visit link above, or email
    > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
    
    
    
    -- 
    Josef Zlomek
    _______________________________________________
    ffmpeg-devel mailing list
    ffmpeg-devel@ffmpeg.org
    https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
    
    To unsubscribe, visit link above, or email
    ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Nicolas George July 25, 2020, 8:31 a.m. UTC | #3
Steven Liu (12020-07-25):
> fix ticket: 8814
> if get ".." in the url, check next byte and lead byte by double dot,
> it there have no '/' and not root node, it is not used go to directory ".."
> 
> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
> ---
>  libavformat/url.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/libavformat/url.c b/libavformat/url.c
> index 20463a6674..35f27fe3ca 100644
> --- a/libavformat/url.c
> +++ b/libavformat/url.c
> @@ -97,6 +97,18 @@ static void trim_double_dot_url(char *buf, const char *rel, int size)
>      /* set new current position if the root node is changed */
>      p = root;
>      while (p && (node = strstr(p, ".."))) {

> +        if (strlen(node) > 2 && node[2] != '/') {

I have not yet looked at the whole patch, but this strlen() test is
useless.

And I think you would better rework the complete logic of the test:
strstring ".." is a broken method.

> +            node = strstr(node + 1, "..");
> +            if (!node)
> +                break;
> +        }
> +
> +        if (p != node && p[node - p - 1] != '/') {
> +            node = strstr(node + 1, "..");
> +            if (!node)
> +                break;
> +        }
> +
>          av_strlcat(tmp_path, p, node - p + strlen(tmp_path));
>          p = node + 3;
>          sep = strrchr(tmp_path, '/');

Regards,
Nicolas George July 25, 2020, 8:34 a.m. UTC | #4
刘歧 (12020-07-25):
>     I think the best would be to use strtok(p, "/") to split the path into the
>     components and for each ".." component remove the previous one (if there
>     are some still).
> I think you can submit patch if you have full idea for it.

strtok() is a terrible API, and better not used, but you are right, the
logic of the parsing is flawed.

I would suggest to keep walk from / to / (using strchr() probably), and
keep a pointer on the last two in the source and the last one in the
target IIRC.

Regards,
Steven Liu July 25, 2020, 8:42 a.m. UTC | #5
Nicolas George <george@nsup.org> 于2020年7月25日周六 下午4:31写道:
>
> Steven Liu (12020-07-25):
> > fix ticket: 8814
> > if get ".." in the url, check next byte and lead byte by double dot,
> > it there have no '/' and not root node, it is not used go to directory ".."
> >
> > Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
> > ---
> >  libavformat/url.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/libavformat/url.c b/libavformat/url.c
> > index 20463a6674..35f27fe3ca 100644
> > --- a/libavformat/url.c
> > +++ b/libavformat/url.c
> > @@ -97,6 +97,18 @@ static void trim_double_dot_url(char *buf, const char *rel, int size)
> >      /* set new current position if the root node is changed */
> >      p = root;
> >      while (p && (node = strstr(p, ".."))) {
>
> > +        if (strlen(node) > 2 && node[2] != '/') {
>
> I have not yet looked at the whole patch, but this strlen() test is
> useless.
>
> And I think you would better rework the complete logic of the test:
agreed, I think need lots of testcase for the logic, ../ .../..
dummy../... .../..dummy and so on,
maybe need spend some time to do it :D
> strstring ".." is a broken method.
>
> > +            node = strstr(node + 1, "..");
> > +            if (!node)
> > +                break;
> > +        }
> > +
> > +        if (p != node && p[node - p - 1] != '/') {
> > +            node = strstr(node + 1, "..");
> > +            if (!node)
> > +                break;
> > +        }
> > +
> >          av_strlcat(tmp_path, p, node - p + strlen(tmp_path));
> >          p = node + 3;
> >          sep = strrchr(tmp_path, '/');
>
> Regards,
>
> --
>   Nicolas George
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Nicolas George July 25, 2020, 8:44 a.m. UTC | #6
Steven Liu (12020-07-25):
> > And I think you would better rework the complete logic of the test:
> agreed, I think need lots of testcase for the logic, ../ .../..
> dummy../... .../..dummy and so on,
> maybe need spend some time to do it :D

I did not mean the FATE test for this feature, I meant the test of
whether something is a parent directory or not. It should parse the
path, not fish for a pattern.

Regards,
Steven Liu July 25, 2020, 8:47 a.m. UTC | #7
Nicolas George <george@nsup.org> 于2020年7月25日周六 下午4:44写道:
>
> Steven Liu (12020-07-25):
> > > And I think you would better rework the complete logic of the test:
> > agreed, I think need lots of testcase for the logic, ../ .../..
> > dummy../... .../..dummy and so on,
> > maybe need spend some time to do it :D
>
> I did not mean the FATE test for this feature, I meant the test of
> whether something is a parent directory or not. It should parse the
Yes, I mean, not only same as your suggestion, also need make full FATE.
> path, not fish for a pattern.
>
> Regards,
>
> --
>   Nicolas George
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Marton Balint July 25, 2020, 9:40 a.m. UTC | #8
On Sat, 25 Jul 2020, Zlomek, Josef wrote:

> Hi Steven,
>
> It is better but still not correct. Consider this test:
>
> test("http://server/foo/bar",
> "a/b/../c/d/../e../.../..f/g../h../other/url/a.mp3/...");
> It should give "
> http://server/foo/bar/a/c/e../.../..f/g../h../other/url/a.mp3/...".
>
> I think the best would be to use strtok(p, "/") to split the path into the
> components and for each ".." component remove the previous one (if there
> are some still).

And I also would like to point out that using static strings with 
MAX_URL_SIZE is not OK. This function supports an arbitrary buffer size, 
so limiting it to MAX_URL_SIZE is a bug.

Regards,
Marton

>
> Best regards,
> Josef
>
> On Sat, Jul 25, 2020 at 4:45 AM Steven Liu <lq@chinaffmpeg.org> wrote:
>
>> fix ticket: 8814
>> if get ".." in the url, check next byte and lead byte by double dot,
>> it there have no '/' and not root node, it is not used go to directory ".."
>>
>> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
>> ---
>>  libavformat/url.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/libavformat/url.c b/libavformat/url.c
>> index 20463a6674..35f27fe3ca 100644
>> --- a/libavformat/url.c
>> +++ b/libavformat/url.c
>> @@ -97,6 +97,18 @@ static void trim_double_dot_url(char *buf, const char
>> *rel, int size)
>>      /* set new current position if the root node is changed */
>>      p = root;
>>      while (p && (node = strstr(p, ".."))) {
>> +        if (strlen(node) > 2 && node[2] != '/') {
>> +            node = strstr(node + 1, "..");
>> +            if (!node)
>> +                break;
>> +        }
>> +
>> +        if (p != node && p[node - p - 1] != '/') {
>> +            node = strstr(node + 1, "..");
>> +            if (!node)
>> +                break;
>> +        }
>> +
>>          av_strlcat(tmp_path, p, node - p + strlen(tmp_path));
>>          p = node + 3;
>>          sep = strrchr(tmp_path, '/');
>> --
>> 2.25.0
>>
>>
>>
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
>
>
> -- 
> Josef Zlomek
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Steven Liu July 25, 2020, 10:48 a.m. UTC | #9
Marton Balint <cus@passwd.hu> 于2020年7月25日周六 下午5:40写道:

> And I also would like to point out that using static strings with
> MAX_URL_SIZE is not OK. This function supports an arbitrary buffer size,
> so limiting it to MAX_URL_SIZE is a bug.
Okay, get it, i'm reworking.

Thanks
Steven
Nicolas George July 25, 2020, 11:03 a.m. UTC | #10
Marton Balint (12020-07-25):
> And I also would like to point out that using static strings with
> MAX_URL_SIZE is not OK. This function supports an arbitrary buffer size, so
> limiting it to MAX_URL_SIZE is a bug.

Excellent point. That would require changing the prototype of a few
functions, but they are internal, we can do it.

For this, I would like to pitch the AVWriter API once again, because it
is made precisely for these cases: when you need to return a string of
arbitrary length, but would like to avoid, as much a possible, the
overhead of dynamically allocating it and the many error tests that come
with it. It is like AVBPrint, but better.

Let me elaborate without polluting the discussion by showing the actual
implementation.

So, let us say that function A calls function B, and function B needs to
return a string of arbitrary length but often small (I say string, but
the API can deal with binary data).

Likely use cases: rewriting an URL like here; escaping special
characters; serialization to key=value string; serialization to JSON or
XML; character encoding conversion, zlib-style unpacking, etc.


First, from the point of view of B, the function that needs to return
the string:

Instead of either:

	int B(char *buf, size_t buf_size, other arguments);
	int B(char **buf, other arguments);

request an AVWriter as argument:

	void B(AVWriter wr, other arguments);

Then, just write in it with the various API functions:
av_writer_write(), av_writer_printf(), etc.

And that's all. Everything is taken care of: the string will grow as
needed, the size will be tracked, error checks are made. Nothing to
worry about at all.


Now, from the point of view of A, the function that calls B and gets a
string in return:

Instead of declaring a buffer, or a pointer to a buffer, declare:

	AVWriter wr = av_dynbuf_writer();

and call B with it: B(wr, ...).

Then, extract the string, check for error (of course: dynamic allocation
can happen and fail; but this error check is the only one necessary),
use the string and free it:

	char *msg = av_dynbuf_writer_get_data(wr, NULL);
	if (!msg)
	    return AVERROR(ENOMEM);
	do something with msg
	av_dynbuf_writer_reset(wr);

I concede, for this example, av_dynbuf_writer_get_data() is one more
step than using the pointer directly. But as soon as the code becomes
more complex, in particular as soon as it uses the same writer twice to
put two strings together, this extra step is balanced by the fewer error
checks necessary.


All said, it makes one more API to learn, but IMHO, it is very simple to
learn: seven line of sample code. And the benefits are immediate.

Plus: there are many wonderful features that I have not told you about
yet.

So, Steven, Marton and the others: presented like that, does it look
appealing? Shall I elaborate?

Regards,
Zlomek, Josef July 26, 2020, 6:38 a.m. UTC | #11
I will submit patch on monday.

Josef

On Sat, Jul 25, 2020 at 8:27 AM 刘歧 <lq@chinaffmpeg.org> wrote:

>
>
> 在 2020/7/25 下午2:11,“ffmpeg-devel 代表 Zlomek, Josef”<
> ffmpeg-devel-bounces@ffmpeg.org 代表 josef@pex.com> 写入:
>
>     Hi Steven,
>
>     It is better but still not correct. Consider this test:
>
>     test("http://server/foo/bar",
>     "a/b/../c/d/../e../.../..f/g../h../other/url/a.mp3/...");
>     It should give "
>     http://server/foo/bar/a/c/e../.../..f/g../h../other/url/a.mp3/...".
>
>     I think the best would be to use strtok(p, "/") to split the path into
> the
>     components and for each ".." component remove the previous one (if
> there
>     are some still).
> I think you can submit patch if you have full idea for it.
>
>     Best regards,
>     Josef
>
>     On Sat, Jul 25, 2020 at 4:45 AM Steven Liu <lq@chinaffmpeg.org> wrote:
>
>     > fix ticket: 8814
>     > if get ".." in the url, check next byte and lead byte by double dot,
>     > it there have no '/' and not root node, it is not used go to
> directory ".."
>     >
>     > Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
>     > ---
>     >  libavformat/url.c | 12 ++++++++++++
>     >  1 file changed, 12 insertions(+)
>     >
>     > diff --git a/libavformat/url.c b/libavformat/url.c
>     > index 20463a6674..35f27fe3ca 100644
>     > --- a/libavformat/url.c
>     > +++ b/libavformat/url.c
>     > @@ -97,6 +97,18 @@ static void trim_double_dot_url(char *buf, const
> char
>     > *rel, int size)
>     >      /* set new current position if the root node is changed */
>     >      p = root;
>     >      while (p && (node = strstr(p, ".."))) {
>     > +        if (strlen(node) > 2 && node[2] != '/') {
>     > +            node = strstr(node + 1, "..");
>     > +            if (!node)
>     > +                break;
>     > +        }
>     > +
>     > +        if (p != node && p[node - p - 1] != '/') {
>     > +            node = strstr(node + 1, "..");
>     > +            if (!node)
>     > +                break;
>     > +        }
>     > +
>     >          av_strlcat(tmp_path, p, node - p + strlen(tmp_path));
>     >          p = node + 3;
>     >          sep = strrchr(tmp_path, '/');
>     > --
>     > 2.25.0
>     >
>     >
>     >
>     >
>     > _______________________________________________
>     > ffmpeg-devel mailing list
>     > ffmpeg-devel@ffmpeg.org
>     > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>     >
>     > To unsubscribe, visit link above, or email
>     > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
>
>
>     --
>     Josef Zlomek
>     _______________________________________________
>     ffmpeg-devel mailing list
>     ffmpeg-devel@ffmpeg.org
>     https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>     To unsubscribe, visit link above, or email
>     ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
>
>
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Steven Liu July 27, 2020, 7:18 a.m. UTC | #12
Marton Balint <cus@passwd.hu> 于2020年7月25日周六 下午5:40写道:
>
>
>
> On Sat, 25 Jul 2020, Zlomek, Josef wrote:
>
> > Hi Steven,
> >
> > It is better but still not correct. Consider this test:
> >
> > test("http://server/foo/bar",
> > "a/b/../c/d/../e../.../..f/g../h../other/url/a.mp3/...");
> > It should give "
> > http://server/foo/bar/a/c/e../.../..f/g../h../other/url/a.mp3/...".
> >
> > I think the best would be to use strtok(p, "/") to split the path into the
> > components and for each ".." component remove the previous one (if there
> > are some still).
>
> And I also would like to point out that using static strings with
> MAX_URL_SIZE is not OK. This function supports an arbitrary buffer size,
> so limiting it to MAX_URL_SIZE is a bug.
What about use av_malloc? or bprint?
I think use av_malloc maybe easter to me.
>
> Regards,
> Marton
>
> >

Thanks
Steven
Steven Liu July 27, 2020, 7:25 a.m. UTC | #13
Nicolas George <george@nsup.org> 于2020年7月25日周六 下午7:03写道:
>
> Marton Balint (12020-07-25):
> > And I also would like to point out that using static strings with
> > MAX_URL_SIZE is not OK. This function supports an arbitrary buffer size, so
> > limiting it to MAX_URL_SIZE is a bug.
>
> Excellent point. That would require changing the prototype of a few
> functions, but they are internal, we can do it.
>
> For this, I would like to pitch the AVWriter API once again, because it
> is made precisely for these cases: when you need to return a string of
> arbitrary length, but would like to avoid, as much a possible, the
> overhead of dynamically allocating it and the many error tests that come
> with it. It is like AVBPrint, but better.
>
> Let me elaborate without polluting the discussion by showing the actual
> implementation.
>
> So, let us say that function A calls function B, and function B needs to
> return a string of arbitrary length but often small (I say string, but
> the API can deal with binary data).
>
> Likely use cases: rewriting an URL like here; escaping special
> characters; serialization to key=value string; serialization to JSON or
> XML; character encoding conversion, zlib-style unpacking, etc.
>
>
> First, from the point of view of B, the function that needs to return
> the string:
>
> Instead of either:
>
>         int B(char *buf, size_t buf_size, other arguments);
>         int B(char **buf, other arguments);
>
> request an AVWriter as argument:
>
>         void B(AVWriter wr, other arguments);
>
> Then, just write in it with the various API functions:
> av_writer_write(), av_writer_printf(), etc.
>
> And that's all. Everything is taken care of: the string will grow as
> needed, the size will be tracked, error checks are made. Nothing to
> worry about at all.
>
>
> Now, from the point of view of A, the function that calls B and gets a
> string in return:
>
> Instead of declaring a buffer, or a pointer to a buffer, declare:
>
>         AVWriter wr = av_dynbuf_writer();
>
> and call B with it: B(wr, ...).
>
> Then, extract the string, check for error (of course: dynamic allocation
> can happen and fail; but this error check is the only one necessary),
> use the string and free it:
>
>         char *msg = av_dynbuf_writer_get_data(wr, NULL);
>         if (!msg)
>             return AVERROR(ENOMEM);
>         do something with msg
>         av_dynbuf_writer_reset(wr);
>
> I concede, for this example, av_dynbuf_writer_get_data() is one more
> step than using the pointer directly. But as soon as the code becomes
> more complex, in particular as soon as it uses the same writer twice to
> put two strings together, this extra step is balanced by the fewer error
> checks necessary.
>
>
> All said, it makes one more API to learn, but IMHO, it is very simple to
> learn: seven line of sample code. And the benefits are immediate.
>
> Plus: there are many wonderful features that I have not told you about
> yet.
>
> So, Steven, Marton and the others: presented like that, does it look
> appealing? Shall I elaborate?
Because language barries. maybe I cannot deep understand, maybe need
more code examples.
But look interesting, maybe more easy for use.
>
> Regards,
>
> --
>   Nicolas George
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Nicolas George July 27, 2020, 1:49 p.m. UTC | #14
Steven Liu (12020-07-27):
> What about use av_malloc? or bprint?
> I think use av_malloc maybe easter to me.

For internal APIs, definitely BPrint over naive av_malloc(). But this is
unrelated to the issue at hand: the size limit exists in the caller of
this function, not in this function.

Regards,
Marton Balint July 27, 2020, 6:57 p.m. UTC | #15
On Mon, 27 Jul 2020, Steven Liu wrote:

> Marton Balint <cus@passwd.hu> 于2020年7月25日周六 下午5:40写道:
>>
>>
>>
>> On Sat, 25 Jul 2020, Zlomek, Josef wrote:
>>
>> > Hi Steven,
>> >
>> > It is better but still not correct. Consider this test:
>> >
>> > test("http://server/foo/bar",
>> > "a/b/../c/d/../e../.../..f/g../h../other/url/a.mp3/...");
>> > It should give "
>> > http://server/foo/bar/a/c/e../.../..f/g../h../other/url/a.mp3/...".
>> >
>> > I think the best would be to use strtok(p, "/") to split the path into the
>> > components and for each ".." component remove the previous one (if there
>> > are some still).
>>
>> And I also would like to point out that using static strings with
>> MAX_URL_SIZE is not OK. This function supports an arbitrary buffer size,
>> so limiting it to MAX_URL_SIZE is a bug.
> What about use av_malloc? or bprint?
> I think use av_malloc maybe easter to me.

It should be implemented in a way which does not use a temporary buffer. 
Seems doable.

Regards,
Marton
Marton Balint July 27, 2020, 8:12 p.m. UTC | #16
On Sat, 25 Jul 2020, Nicolas George wrote:

> Marton Balint (12020-07-25):
>> And I also would like to point out that using static strings with
>> MAX_URL_SIZE is not OK. This function supports an arbitrary buffer size, so
>> limiting it to MAX_URL_SIZE is a bug.
>
> Excellent point. That would require changing the prototype of a few
> functions, but they are internal, we can do it.

If I understand correctly, you are speaking about URL handling in general 
all over the codebase, and not only ff_make_absolute_url.

>
> For this, I would like to pitch the AVWriter API once again,

I think we understand the concept. It just seems a tiny bit heavy weight, 
or at least that was the primary concern as far as I remember.

Regards,
Marton

> is made precisely for these cases: when you need to return a string of
> arbitrary length, but would like to avoid, as much a possible, the
> overhead of dynamically allocating it and the many error tests that come
> with it. It is like AVBPrint, but better.
>
> Let me elaborate without polluting the discussion by showing the actual
> implementation.
>
> So, let us say that function A calls function B, and function B needs to
> return a string of arbitrary length but often small (I say string, but
> the API can deal with binary data).
>
> Likely use cases: rewriting an URL like here; escaping special
> characters; serialization to key=value string; serialization to JSON or
> XML; character encoding conversion, zlib-style unpacking, etc.
>
>
> First, from the point of view of B, the function that needs to return
> the string:
>
> Instead of either:
>
> 	int B(char *buf, size_t buf_size, other arguments);
> 	int B(char **buf, other arguments);
>
> request an AVWriter as argument:
>
> 	void B(AVWriter wr, other arguments);
>
> Then, just write in it with the various API functions:
> av_writer_write(), av_writer_printf(), etc.
>
> And that's all. Everything is taken care of: the string will grow as
> needed, the size will be tracked, error checks are made. Nothing to
> worry about at all.
>
>
> Now, from the point of view of A, the function that calls B and gets a
> string in return:
>
> Instead of declaring a buffer, or a pointer to a buffer, declare:
>
> 	AVWriter wr = av_dynbuf_writer();
>
> and call B with it: B(wr, ...).
>
> Then, extract the string, check for error (of course: dynamic allocation
> can happen and fail; but this error check is the only one necessary),
> use the string and free it:
>
> 	char *msg = av_dynbuf_writer_get_data(wr, NULL);
> 	if (!msg)
> 	    return AVERROR(ENOMEM);
> 	do something with msg
> 	av_dynbuf_writer_reset(wr);
>
> I concede, for this example, av_dynbuf_writer_get_data() is one more
> step than using the pointer directly. But as soon as the code becomes
> more complex, in particular as soon as it uses the same writer twice to
> put two strings together, this extra step is balanced by the fewer error
> checks necessary.
>
>
> All said, it makes one more API to learn, but IMHO, it is very simple to
> learn: seven line of sample code. And the benefits are immediate.
>
> Plus: there are many wonderful features that I have not told you about
> yet.
>
> So, Steven, Marton and the others: presented like that, does it look
> appealing? Shall I elaborate?
>
> Regards,
>
> --
>  Nicolas George
>
Nicolas George July 27, 2020, 9:34 p.m. UTC | #17
Steven Liu (12020-07-27):
> Because language barries. maybe I cannot deep understand, maybe need
> more code examples.
> But look interesting, maybe more easy for use.

All right, here is a practical albeit fictional example.

Let us say we want av_metadata_to_user_string() that turns an
AVDictionary into a string for users, and show_metadata_in_dialog() to
print the metadata of all the streams in a file in a GUI dialog.

av_metadata_to_user_string() using naïve av_malloc():

char *av_metadata_to_user_string(AVDictionary *meta)
{
    size_t size = 1, len;
    AVDictionaryEntry *entry;
    char *str, *cur;

    entry = NULL;
    while ((entry = av_dict_get(meta, "", entry, AV_DICT_IGNORE_SUFFIX))) {
        len = strlen(entry->key);
        if (len > SIZE_MAX - size)
            return NULL;
        size += len;
        len = strlen(entry->value);
        if (len > SIZE_MAX - size)
            return NULL;
        size += len;
        if (3 > SIZE_MAX - size)
            return NULL;
        size += 3;
    }
    str = av_malloc(size);
    if (!str)
        return AVERROR(ENOMEM);
    cur = str;
    entry = NULL;
    while ((entry = av_dict_get(meta, "", entry, AV_DICT_IGNORE_SUFFIX))) {
        cur += av_strlcatf(cur, "%s: %s\n", entry->key, entry->value, str + size - cur);
    }
    av_assert0(cur == str + size - 1);
    return str;
}

av_metadata_to_user_string() using AVBPrint:

void av_metadata_to_user_string(AVWriter wr, AVDictionary *meta)
{
    AVDictionaryEntry *entry;

    while ((entry = av_dict_get(meta, "", entry, AV_DICT_IGNORE_SUFFIX))) {
        av_writer_printf(wr, entry->key, entry->value, str + size - cur);
    }
}

show_metadata_in_dialog(), without AVBPrint, but relying on Gtk+'s
features (I want to show AVWriter is good, it would be dishonest not to
use all the tools to make the other side look good):

void show_metadata_in_dialog(AVFormatContext *ctx, GtkLabel *label)
{
    GString *str = g_string_new();
    unsigned i;

    for (i = 0; i < ctx->nb_streams; i++) {
        char *meta = av_metadata_to_user_string(ctx->streams->metadata);
        if (!meta)
            fatal_error("out of memory");
        g_string_printf("Stream #%d:\n%s\n", i, meta);
        av_free(meta);
    }
    gtk_label_set_text(label, str->str);
    g_string_free(str);
}

show_metadata_in_dialog() using AVBPrint:

void show_metadata_in_dialog(AVFormatContext *ctx, GtkLabel *label)
{
    AVWriter = av_dynbuf_writer();
    unsigned i;

    for (i = 0; i < ctx->nb_streams; i++) {
        av_writer_printf(wr, "Stream #%d:\n", i);
        av_metadata_to_user_string(wr, ctx->streams->metadata);
        g_string_print(wr, "\n");
    }
    if (av_dynbuf_writer_get_error(wr))
        fatal_error("out of memory");
    gtk_label_set_text(label, av_dynbuf_writer_get_data(wr, NULL));
    av_dynbuf_writer_reset(wr);
}

Now, imagine if a lot of FFmpeg functions did use the same AVWriter, and
therefore worked well together.

Also, not visible in the code: unless the metadata is big, the AVWriter
version will keep all in the stack.

Regards,
Nicolas George July 27, 2020, 9:39 p.m. UTC | #18
Marton Balint (12020-07-27):
> If I understand correctly, you are speaking about URL handling in general
> all over the codebase, and not only ff_make_absolute_url.

I thought that was what you were speaking in the first place.
ff_make_absolute_url() only does what the caller wants, the caller
decides the size of the buffer, and we can be sure it is shorter than
the concatenation of both parts.

> I think we understand the concept. It just seems a tiny bit heavy weight, or
> at least that was the primary concern as far as I remember.

That's because I made the mistake of showing the implementation and
talking about the benefits.

The implementation is my problem (and whoever would succeed me if I were
to stop maintaining it, but if it is accepted, I do not intend to). The
benefits are for everybody.

Please have a look at the other mail I just sent, where I show the
benefits and keep the implementation for myself.

Regards,
Jean-Baptiste Kempf July 27, 2020, 10:19 p.m. UTC | #19
On Sat, 25 Jul 2020, at 13:03, Nicolas George wrote:
> And that's all. Everything is taken care of: the string will grow as
> needed, the size will be tracked, error checks are made. Nothing to
> worry about at all.

How is that different from open_memstream, which is done for this exact purpose?

--
Jean-Baptiste Kempf -  President
+33 672 704 734
Zlomek, Josef July 28, 2020, 4:04 a.m. UTC | #20
On Mon, Jul 27, 2020 at 8:57 PM Marton Balint <cus@passwd.hu> wrote:

> It should be implemented in a way which does not use a temporary buffer.
> Seems doable.
>

You are right. I will submit a new version.

Best regards,
Josef
Nicolas George July 28, 2020, 8:05 a.m. UTC | #21
Jean-Baptiste Kempf (12020-07-28):
> How is that different from open_memstream, which is done for this exact purpose?

I am not sure what you are referring to, I do not find a function by
that name in FFmpeg.

Are you referring to avio_open_dyn_buf()?

If so, there are three differences:

- avio_open_dyn_buf() resides in libavformat, and therefore is only
  available for a tiny portion of the project. I had a vague project to
  move part of AVIO to lavu, but it is huge work.

- avio_open_dyn_buf() always uses dynamic allocation. AVWriter starts
  entirely on the stack (or whatever we want), and only resorts to
  dynamic allocation if the buffer becomes too large. As a corollary,
  avio_open_dyn_buf() requires an error check at the beginning, which
  AVWriter does not.

- AVWriter is polymorphic. I have shown only av_dynbuf_writer(), but
  there are others, and applications can implement their own. That is
  the key to the "wonderful features" I have evoked. I would be happy to
  tell more about them.

Regards,
Zhao Zhili July 28, 2020, 11:39 a.m. UTC | #22
> On Jul 28, 2020, at 4:05 PM, Nicolas George <george@nsup.org> wrote:
> 
> Jean-Baptiste Kempf (12020-07-28):
>> How is that different from open_memstream, which is done for this exact purpose?
> 
> I am not sure what you are referring to, I do not find a function by
> that name in FFmpeg.

I think jb is referring to

     FILE *open_memstream(char **bufp, size_t *sizep);
https://linux.die.net/man/3/open_memstream <https://linux.die.net/man/3/open_memstream>

VLC has a wrapper for it:

https://code.videolan.org/videolan/vlc/-/blob/master/include/vlc_memstream.h <https://code.videolan.org/videolan/vlc/-/blob/master/include/vlc_memstream.h>
> 
> Are you referring to avio_open_dyn_buf()?
> 
> If so, there are three differences:
> 
> - avio_open_dyn_buf() resides in libavformat, and therefore is only
>  available for a tiny portion of the project. I had a vague project to
>  move part of AVIO to lavu, but it is huge work.
> 
> - avio_open_dyn_buf() always uses dynamic allocation. AVWriter starts
>  entirely on the stack (or whatever we want), and only resorts to
>  dynamic allocation if the buffer becomes too large. As a corollary,
>  avio_open_dyn_buf() requires an error check at the beginning, which
>  AVWriter does not.
> 
> - AVWriter is polymorphic. I have shown only av_dynbuf_writer(), but
>  there are others, and applications can implement their own. That is
>  the key to the "wonderful features" I have evoked. I would be happy to
>  tell more about them.
> 
> Regards,
> 
> -- 
>  Nicolas George
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Nicolas George July 28, 2020, 12:02 p.m. UTC | #23
zhilizhao (12020-07-28):
> I think jb is referring to
> 
>      FILE *open_memstream(char **bufp, size_t *sizep);
> https://linux.die.net/man/3/open_memstream <https://linux.die.net/man/3/open_memstream>
> 
> VLC has a wrapper for it:
> 
> https://code.videolan.org/videolan/vlc/-/blob/master/include/vlc_memstream.h <https://code.videolan.org/videolan/vlc/-/blob/master/include/vlc_memstream.h>

Oh. Thanks. I had not realized such an useful function had been
standardized.

That means the argument about it being in lavf is wotrhless. But these
two arguments:

> > - avio_open_dyn_buf() always uses dynamic allocation. AVWriter starts
> >  entirely on the stack (or whatever we want), and only resorts to
> >  dynamic allocation if the buffer becomes too large. As a corollary,
> >  avio_open_dyn_buf() requires an error check at the beginning, which
> >  AVWriter does not.
> > 
> > - AVWriter is polymorphic. I have shown only av_dynbuf_writer(), but
> >  there are others, and applications can implement their own. That is
> >  the key to the "wonderful features" I have evoked. I would be happy to
> >  tell more about them.

apply to open_memstream() too.

Please ask me about the wonderful features ;)

Regards,
Zhao Zhili July 28, 2020, 12:43 p.m. UTC | #24
> On Jul 28, 2020, at 8:02 PM, Nicolas George <george@nsup.org> wrote:
> 
> zhilizhao (12020-07-28):
>> I think jb is referring to
>> 
>>     FILE *open_memstream(char **bufp, size_t *sizep);
>> https://linux.die.net/man/3/open_memstream <https://linux.die.net/man/3/open_memstream>
>> 
>> VLC has a wrapper for it:
>> 
>> https://code.videolan.org/videolan/vlc/-/blob/master/include/vlc_memstream.h <https://code.videolan.org/videolan/vlc/-/blob/master/include/vlc_memstream.h>
> 
> Oh. Thanks. I had not realized such an useful function had been
> standardized.
> 
> That means the argument about it being in lavf is wotrhless. But these
> two arguments:
> 
>>> - avio_open_dyn_buf() always uses dynamic allocation. AVWriter starts
>>> entirely on the stack (or whatever we want), and only resorts to
>>> dynamic allocation if the buffer becomes too large. As a corollary,
>>> avio_open_dyn_buf() requires an error check at the beginning, which
>>> AVWriter does not.

fmemopen() can work with memory on stack or on heap, although it doesn’t
support switch between the two dynamically. It’s a good to have feature like
the small string optimization in std::string, but i’m not too keen on it.

>>> 
>>> - AVWriter is polymorphic. I have shown only av_dynbuf_writer(), but
>>> there are others, and applications can implement their own. That is
>>> the key to the "wonderful features" I have evoked. I would be happy to
>>> tell more about them.
> 
> apply to open_memstream() too.
> 
> Please ask me about the wonderful features ;)

Please, what wonderful features are in your plan?

> 
> Regards,
> 
> -- 
>  Nicolas George
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Alexander Strasser July 29, 2020, 7:10 p.m. UTC | #25
On 2020-07-28 20:43 +0800, zhilizhao wrote:
> > On Jul 28, 2020, at 8:02 PM, Nicolas George <george@nsup.org> wrote:
> >
> > zhilizhao (12020-07-28):
> >> I think jb is referring to
> >>
> >>     FILE *open_memstream(char **bufp, size_t *sizep);
> >> https://linux.die.net/man/3/open_memstream <https://linux.die.net/man/3/open_memstream>
> >>
> >> VLC has a wrapper for it:
> >>
> >> https://code.videolan.org/videolan/vlc/-/blob/master/include/vlc_memstream.h <https://code.videolan.org/videolan/vlc/-/blob/master/include/vlc_memstream.h>
> >
> > Oh. Thanks. I had not realized such an useful function had been
> > standardized.
> >
> > That means the argument about it being in lavf is wotrhless. But these
> > two arguments:
> >
> >>> - avio_open_dyn_buf() always uses dynamic allocation. AVWriter starts
> >>> entirely on the stack (or whatever we want), and only resorts to
> >>> dynamic allocation if the buffer becomes too large. As a corollary,
> >>> avio_open_dyn_buf() requires an error check at the beginning, which
> >>> AVWriter does not.
>
> fmemopen() can work with memory on stack or on heap, although it doesn’t
> support switch between the two dynamically. It’s a good to have feature like
> the small string optimization in std::string, but i’m not too keen on it.
>
> >>>
> >>> - AVWriter is polymorphic. I have shown only av_dynbuf_writer(), but
> >>> there are others, and applications can implement their own. That is
> >>> the key to the "wonderful features" I have evoked. I would be happy to
> >>> tell more about them.
> >
> > apply to open_memstream() too.
> >
> > Please ask me about the wonderful features ;)
>
> Please, what wonderful features are in your plan?

I would say FILE * is polymorphic as well.

Maybe we are not in control to provide more implementations, the
question still would be which implementations we would want.


  Alexander
Nicolas George July 30, 2020, 3:42 p.m. UTC | #26
zhilizhao (12020-07-28):
> fmemopen() can work with memory on stack or on heap, although it doesn’t
> support switch between the two dynamically. It’s a good to have feature like
> the small string optimization in std::string, but i’m not too keen on it.

I think it is an important optimization. This kind of string operation
can happen once per frame, and with low-latency codecs or devices, once
per frame can be hundreds or thousand of times per second.

Note that we are actively trying to prevent memory allocations that can
happen once per frame, to the point of adding pools to avoid them. This
optimization is both much simpler and a bit more efficient than using
pools.

And even for cases that are not once per frame, this is interesting:
because this optimization is there, we do not have to hesitate to use
this API even if we know that the string is short and a fixed buffer
suffices. This is interesting because the more parts of FFmpeg use this
API, the more convenient it becomes.

> Please, what wonderful features are in your plan?

;) I may have exaggerated a little bit about wonderful: anything is
possible with normal string buffers anyway. But AVWriter allows nice
optimizations, avoiding many intermediary buffers.

A few examples:

If you want to send something directly to a file or the network, you can
have av_avio_writer() that does that with an AVIO context. If the thing
is big but constructed by parts, we avoid needing a big buffer, and
serialization and writing can happen in parallel.

If we have to pile string processing on top of each other, for example
first serializing something and then escaping special characters in the
resulting string, AVWriter can be made to do it on the fly, without an
intermediate buffer in the middle.

If an application works with other kinds of memory allocation, for
example a Java application using StringBuffer and such, AVWriter can be
made to use directly these instead of first using av_malloc() for a
buffer and then converting it to what the application needs.

I think that gives a good idea of what is possible. Most importantly,
none of these feature makes anything more complicated in our code: using
AVWriter is still as simple as I have shown earlier (
https://ffmpeg.org/pipermail/ffmpeg-devel/2020-July/266897.html
). Everything complex is in the implementation, and it not that complex.

Regards,
Zhao Zhili July 30, 2020, 4:23 p.m. UTC | #27
> On Jul 30, 2020, at 11:42 PM, Nicolas George <george@nsup.org> wrote:
> 
> zhilizhao (12020-07-28):
>> fmemopen() can work with memory on stack or on heap, although it doesn’t
>> support switch between the two dynamically. It’s a good to have feature like
>> the small string optimization in std::string, but i’m not too keen on it.
> 
> I think it is an important optimization. This kind of string operation
> can happen once per frame, and with low-latency codecs or devices, once
> per frame can be hundreds or thousand of times per second.
> 
> Note that we are actively trying to prevent memory allocations that can
> happen once per frame, to the point of adding pools to avoid them. This
> optimization is both much simpler and a bit more efficient than using
> pools.
> 
> And even for cases that are not once per frame, this is interesting:
> because this optimization is there, we do not have to hesitate to use
> this API even if we know that the string is short and a fixed buffer
> suffices. This is interesting because the more parts of FFmpeg use this
> API, the more convenient it becomes.
> 
>> Please, what wonderful features are in your plan?
> 
> ;) I may have exaggerated a little bit about wonderful: anything is
> possible with normal string buffers anyway. But AVWriter allows nice
> optimizations, avoiding many intermediary buffers.
> 
> A few examples:
> 
> If you want to send something directly to a file or the network, you can
> have av_avio_writer() that does that with an AVIO context. If the thing
> is big but constructed by parts, we avoid needing a big buffer, and
> serialization and writing can happen in parallel.
> 
> If we have to pile string processing on top of each other, for example
> first serializing something and then escaping special characters in the
> resulting string, AVWriter can be made to do it on the fly, without an
> intermediate buffer in the middle.
> 
> If an application works with other kinds of memory allocation, for
> example a Java application using StringBuffer and such, AVWriter can be
> made to use directly these instead of first using av_malloc() for a
> buffer and then converting it to what the application needs.
> 
> I think that gives a good idea of what is possible. Most importantly,
> none of these feature makes anything more complicated in our code: using
> AVWriter is still as simple as I have shown earlier (
> https://ffmpeg.org/pipermail/ffmpeg-devel/2020-July/266897.html
> ). Everything complex is in the implementation, and it not that complex.
> 

So we can get a writer with AVIO context as backend, with custom stream filters.
It do sound interesting, as long as it doesn't go too far like io in Java and C++.
It's valuable to support user allocated memory, however, I'm afraid string is only
a small part of the use cases.

> Regards,
> 
> -- 
>  Nicolas George
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Nicolas George Aug. 4, 2020, 9:07 a.m. UTC | #28
Zhao Zhili (12020-07-31):
> So we can get a writer with AVIO context as backend, with custom stream filters.
> It do sound interesting, as long as it doesn't go too far like io in Java and C++.

The maze of twisty little APIs all alike is one of the reasons I try to
minimize my interactions with C++ and Java.

In particular, I do not plan on any kind of built-in inheritance, which
is the doorway to this nightmare world.

> It's valuable to support user allocated memory, however, I'm afraid string is only
> a small part of the use cases.

I can reassure you: I wrote string, but everything is designed to be
completely 8-bit clean, and can be used for any kind of binary data.

Regards,
diff mbox series

Patch

diff --git a/libavformat/url.c b/libavformat/url.c
index 20463a6674..35f27fe3ca 100644
--- a/libavformat/url.c
+++ b/libavformat/url.c
@@ -97,6 +97,18 @@  static void trim_double_dot_url(char *buf, const char *rel, int size)
     /* set new current position if the root node is changed */
     p = root;
     while (p && (node = strstr(p, ".."))) {
+        if (strlen(node) > 2 && node[2] != '/') {
+            node = strstr(node + 1, "..");
+            if (!node)
+                break;
+        }
+
+        if (p != node && p[node - p - 1] != '/') {
+            node = strstr(node + 1, "..");
+            if (!node)
+                break;
+        }
+
         av_strlcat(tmp_path, p, node - p + strlen(tmp_path));
         p = node + 3;
         sep = strrchr(tmp_path, '/');