[FFmpeg-devel,1/2] lavf/file: Add proper support for authority in file URIs.

Submitted by Andrey Semashev on Nov. 29, 2018, 6:18 p.m.

Details

Message ID 20181129181818.13295-1-andrey.semashev@gmail.com
State New
Headers show

Commit Message

Andrey Semashev Nov. 29, 2018, 6:18 p.m.
Previously, URIs with authority field were incorrectly interpreted as if
the authority was part of the path. This commit adds more complete file URI
parsing according to https://tools.ietf.org/html/rfc8089. In particular, the
file backend now recognizes URIs with authority field and ensures that it is
either empty or contains the special value "localhost". The file backend will
return EINVAL if the user attempts to use it with a URI referring to
a remote host.

Also, enable file_delete() on Windows as it provides equivalents of unlink()
and rmdir(). The compatibility glue is already provided by os_support.h.
---
 libavformat/file.c | 55 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 48 insertions(+), 7 deletions(-)

Comments

Nicolas George Nov. 29, 2018, 6:24 p.m.
Andrey Semashev (2018-11-29):
> Previously, URIs with authority field were incorrectly interpreted as if
> the authority was part of the path.

The "file:" prefix does not indicate a file:// URI but a path for the
file: protocol of FFmpeg.

You can check by yourself that they are not URIs by trying to get FFmpeg
to open file:///dev/nul%6c for example.

Regards,
Andrey Semashev Nov. 29, 2018, 6:42 p.m.
On 11/29/18 9:24 PM, Nicolas George wrote:
> Andrey Semashev (2018-11-29):
>> Previously, URIs with authority field were incorrectly interpreted as if
>> the authority was part of the path.
> 
> The "file:" prefix does not indicate a file:// URI but a path for the
> file: protocol of FFmpeg.

It does. avformat_open_input, avio_open and ffurl_open[_whitelist] docs 
all say it's an URL and they don't perform any conversion. So the file 
backend should be prepared to receive a URL, with a scheme and authority.

> You can check by yourself that they are not URIs by trying to get FFmpeg
> to open file:///dev/nul%6c for example.

It will probably currently fail because of the escape sequence. But even 
if it weren't for this reason, it would still be interpreting the URI 
the wrong way because of the authority part, which is what this patch fixes.

Escape sequences, if needed, can be fixed separately.
Nicolas George Nov. 29, 2018, 6:47 p.m.
Andrey Semashev (2018-11-29):
> It does. avformat_open_input, avio_open and ffurl_open[_whitelist] docs all
> say it's an URL and they don't perform any conversion. So the file backend
> should be prepared to receive a URL, with a scheme and authority.

So either the docs are slightly wrong or the code is. Do you have an
argument to decide it is one rather than the other?

I do:

> It will probably currently fail because of the escape sequence.

Exactly. Since escaping, a very basic feature of URIs, is not handled at
all, it is a clear indication that the paths are NOT meant to be
considered URIs. The documentation was added much later, and made the
same mistake you are doing now; same goes for a few private function
names.

> Escape sequences, if needed, can be fixed separately.

That would break a lot of working applications and is therefore not a
good idea.

Regards,
Andrey Semashev Nov. 29, 2018, 7:10 p.m.
On 11/29/18 9:47 PM, Nicolas George wrote:
> Andrey Semashev (2018-11-29):
>> It does. avformat_open_input, avio_open and ffurl_open[_whitelist] docs all
>> say it's an URL and they don't perform any conversion. So the file backend
>> should be prepared to receive a URL, with a scheme and authority.
> 
> So either the docs are slightly wrong or the code is. Do you have an
> argument to decide it is one rather than the other?
> 
> I do:
> 
>> It will probably currently fail because of the escape sequence.
> 
> Exactly. Since escaping, a very basic feature of URIs, is not handled at
> all, it is a clear indication that the paths are NOT meant to be
> considered URIs. The documentation was added much later, and made the
> same mistake you are doing now; same goes for a few private function
> names.

I condider the lack of support for escape sequences a bug, which is 
probably a rudiment of the past, when ffmpeg was primarily targeted for 
working with local files. The fact that all these functions also accept 
raw filesystem paths instead of URIs is also there for the same reason, 
only with additional benefit of convenience. Nowdays, there is one 
common interface for interacting with ffmpeg, and this interface is URIs 
(or raw local paths). There is no third pseudo-URI option, AFAICT. So, 
in my humble opinion the docs are correct, it is the implementation that 
needs to catch up.

>> Escape sequences, if needed, can be fixed separately.
> 
> That would break a lot of working applications and is therefore not a
> good idea.

If an application passes a URI and expects that it is not interpreted as 
such is already broken. I could make a patch adding support for escape 
sequences as well, but it seems you would not accept it. Am I correct?
Nicolas George Nov. 29, 2018, 7:16 p.m.
Andrey Semashev (2018-11-29):
>				     Nowdays, there is one common interface
> for interacting with ffmpeg, and this interface is URIs (or raw local
> paths). There is no third pseudo-URI option, AFAICT. So, in my humble
> opinion the docs are correct, it is the implementation that needs to catch
> up.

You are wrong. There is one common interface: that is pseudi-URI. URI is
not an option.

> If an application passes a URI and expects that it is not interpreted as
> such is already broken.

And it always was. Breaking something that works is worse than having
something that never worked still not work.

>			  I could make a patch adding support for escape
> sequences as well, but it seems you would not accept it. Am I correct?

As is, "fixing" the file: protocol paths to be treated as URIs would be
an API break, it is not acceptable.

You can propose patches to make FFmpeg support real URIs instead / in
addition to its old pseudo-URI syntax, but you would need to design with
API compatibility in mind.

Regards,
Andrey Semashev Nov. 29, 2018, 7:21 p.m.
On 11/29/18 10:16 PM, Nicolas George wrote:
> Andrey Semashev (2018-11-29):
>> 				     Nowdays, there is one common interface
>> for interacting with ffmpeg, and this interface is URIs (or raw local
>> paths). There is no third pseudo-URI option, AFAICT. So, in my humble
>> opinion the docs are correct, it is the implementation that needs to catch
>> up.
> 
> You are wrong. There is one common interface: that is pseudi-URI. URI is
> not an option.
> 
>> If an application passes a URI and expects that it is not interpreted as
>> such is already broken.
> 
> And it always was. Breaking something that works is worse than having
> something that never worked still not work.
> 
>> 			  I could make a patch adding support for escape
>> sequences as well, but it seems you would not accept it. Am I correct?
> 
> As is, "fixing" the file: protocol paths to be treated as URIs would be
> an API break, it is not acceptable.
> 
> You can propose patches to make FFmpeg support real URIs instead / in
> addition to its old pseudo-URI syntax, but you would need to design with
> API compatibility in mind.

Ok, thanks for your comments.

Patch hide | download patch | download mbox

diff --git a/libavformat/file.c b/libavformat/file.c
index 1d321c4205..040197d50d 100644
--- a/libavformat/file.c
+++ b/libavformat/file.c
@@ -19,6 +19,8 @@ 
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+#include "libavutil/log.h"
+#include "libavutil/error.h"
 #include "libavutil/avstring.h"
 #include "libavutil/internal.h"
 #include "libavutil/opt.h"
@@ -104,6 +106,31 @@  static const AVClass pipe_class = {
     .version    = LIBAVUTIL_VERSION_INT,
 };
 
+static int file_get_path(const char* filename, const char** ppath)
+{
+    const char* path = filename;
+    // Check if the filename is a file URI. https://tools.ietf.org/html/rfc8089#section-2
+    if (av_strncasecmp(path, "file:", sizeof("file:") - 1) == 0) {
+        path += sizeof("file:") - 1;
+        if (path[0] == '/' && path[1] == '/') {
+            // The URI may have an authority part. Check that the authority does not contain
+            // a remote host name. We cannot access filesystem on a different host.
+            path += 2;
+            if (path[0] != '/') {
+                if (strncmp(path, "localhost/", sizeof("localhost/") - 1) == 0) {
+                    path += sizeof("localhost") - 1;
+                } else {
+                    av_log(NULL, AV_LOG_ERROR, "File URIs referencing a remote host are not supported: %s\n", filename);
+                    return AVERROR(EINVAL);
+                }
+            }
+        }
+    }
+
+    *ppath = path;
+    return 0;
+}
+
 static int file_read(URLContext *h, unsigned char *buf, int size)
 {
     FileContext *c = h->priv_data;
@@ -136,7 +163,9 @@  static int file_check(URLContext *h, int mask)
 {
     int ret = 0;
     const char *filename = h->filename;
-    av_strstart(filename, "file:", &filename);
+    ret = file_get_path(filename, &filename);
+    if (ret < 0)
+        return ret;
 
     {
 #if HAVE_ACCESS && defined(R_OK)
@@ -167,10 +196,12 @@  static int file_check(URLContext *h, int mask)
 
 static int file_delete(URLContext *h)
 {
-#if HAVE_UNISTD_H
+#if HAVE_UNISTD_H || defined(_WIN32)
     int ret;
     const char *filename = h->filename;
-    av_strstart(filename, "file:", &filename);
+    ret = file_get_path(filename, &filename);
+    if (ret < 0)
+        return ret;
 
     ret = rmdir(filename);
     if (ret < 0 && errno == ENOTDIR)
@@ -188,8 +219,12 @@  static int file_move(URLContext *h_src, URLContext *h_dst)
 {
     const char *filename_src = h_src->filename;
     const char *filename_dst = h_dst->filename;
-    av_strstart(filename_src, "file:", &filename_src);
-    av_strstart(filename_dst, "file:", &filename_dst);
+    int ret = file_get_path(filename_src, &filename_src);
+    if (ret < 0)
+        return ret;
+    ret = file_get_path(filename_dst, &filename_dst);
+    if (ret < 0)
+        return ret;
 
     if (rename(filename_src, filename_dst) < 0)
         return AVERROR(errno);
@@ -206,7 +241,9 @@  static int file_open(URLContext *h, const char *filename, int flags)
     int fd;
     struct stat st;
 
-    av_strstart(filename, "file:", &filename);
+    int ret = file_get_path(filename, &filename);
+    if (ret < 0)
+        return ret;
 
     if (flags & AVIO_FLAG_WRITE && flags & AVIO_FLAG_READ) {
         access = O_CREAT | O_RDWR;
@@ -264,8 +301,12 @@  static int file_open_dir(URLContext *h)
 {
 #if HAVE_LSTAT
     FileContext *c = h->priv_data;
+    const char* dirname = h->filename;
+    int ret = file_get_path(dirname, &dirname);
+    if (ret < 0)
+        return ret;
 
-    c->dir = opendir(h->filename);
+    c->dir = opendir(dirname);
     if (!c->dir)
         return AVERROR(errno);