diff mbox series

[FFmpeg-devel,v4,1/1] avformat: Add IPFS protocol support.

Message ID 20220203172950.21458-2-markg85@gmail.com
State New
Headers show
Series Add IPFS protocol support. | expand

Checks

Context Check Description
andriy/make_aarch64_jetson success Make finished
andriy/make_fate_aarch64_jetson success Make fate finished
andriy/make_armv7_RPi4 success Make finished
andriy/make_fate_armv7_RPi4 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished
andriy/make_ppc success Make finished
andriy/make_fate_ppc success Make fate finished

Commit Message

Mark Gaiser Feb. 3, 2022, 5:29 p.m. UTC
This patch adds support for:
- ffplay ipfs://<cid>
- ffplay ipns://<cid>

IPFS data can be played from so called "ipfs gateways".
A gateway is essentially a webserver that gives access to the
distributed IPFS network.

This protocol support (ipfs and ipns) therefore translates
ipfs:// and ipns:// to a http:// url. This resulting url is
then handled by the http protocol. It could also be https
depending on the gateway provided.

To use this protocol, a gateway must be provided.
If you do nothing it will try to find it in your
$HOME/.ipfs/gateway file. The ways to set it manually are:
1. Define a -gateway <url> to the gateway.
2. Define $IPFS_GATEWAY with the full http link to the gateway.
3. Define $IPFS_PATH and point it to the IPFS data path.
4. Have IPFS running in your local user folder (under $HOME/.ipfs).

Signed-off-by: Mark Gaiser <markg85@gmail.com>
---
 configure                 |   2 +
 doc/protocols.texi        |  30 ++++
 libavformat/Makefile      |   2 +
 libavformat/ipfsgateway.c | 329 ++++++++++++++++++++++++++++++++++++++
 libavformat/protocols.c   |   2 +
 5 files changed, 365 insertions(+)
 create mode 100644 libavformat/ipfsgateway.c

Comments

Tomas Härdin Feb. 4, 2022, 11:10 a.m. UTC | #1
tor 2022-02-03 klockan 18:29 +0100 skrev Mark Gaiser:
> 
> +typedef struct IPFSGatewayContext {
> +    AVClass *class;
> +    URLContext *inner;
> +    char *gateway;

Consider two separate variables. One for AVOption and one for the
dynamically allocated string. Or put the latter on the stack.

> +} IPFSGatewayContext;
> +
> +// A best-effort way to find the IPFS gateway.
> +// Only the most appropiate gateway is set. It's not actually
> requested
> +// (http call) to prevent a potential slowdown in startup. A
> potential timeout
> +// is handled by the HTTP protocol.
> +//
> +// Return codes can be:
> +// 1 : A potential gateway is found and set in c->gateway
> +// -1: The IPFS data folder could not be found
> +// -2: The gateway file could not be found
> +// -3: The gateway file is found but empty
> +// -4: $HOME is empty
> +// -9: Unhandled error

What Michael meant with better return codes is using AVERROR_* :)

> +static int populate_ipfs_gateway(URLContext *h)
> +{
> +    IPFSGatewayContext *c = h->priv_data;
> +    char *ipfs_full_data_folder = NULL;
> +    char *ipfs_gateway_file = NULL;

These can be char[PATH_MAX]

> +    struct stat st;
> +    int stat_ret = 0;
> +    int ret = -9;
> +    FILE *gateway_file = NULL;
> +    char gateway_file_data[1000];

A maximum URL length of 999?

> +
> +    // First, test if there already is a path in c->gateway. If it
> is then it
> +    // was provided as cli arument and should be used. It takes
> precdence.
> +    if (c->gateway != NULL) {
> +        ret = 1;
> +        goto err;
> +    }
> +
> +    // Test $IPFS_GATEWAY.
> +    if (getenv("IPFS_GATEWAY") != NULL) {
> +        av_free(c->gateway);

Useless since c->gateway is NULL

> +
> +        // Stat the folder.
> +        // It should exist in a default IPFS setup when run as local
> user.
> +#ifndef _WIN32
> +        stat_ret = stat(ipfs_full_data_folder, &st);
> +#else
> +        stat_ret = win32_stat(ipfs_full_data_folder, &st);
> +#endif

Again, there is no reason to stat this. Just try opening the gateway
file directly.

> +
> +    // Read a single line (fgets stops at new line mark).
> +    fgets(gateway_file_data, sizeof(gateway_file_data) - 1,
> gateway_file);

This can result in gateway_file_data not being NUL terminated

> +
> +    // Replace first occurence of end of line to \0
> +    gateway_file_data[strcspn(gateway_file_data, "\r\n")] = 0;

What if the file uses \n or no newlines at all?

> +err:
> +    if (gateway_file)
> +        fclose(gateway_file);
> +
> +    av_free(ipfs_full_data_folder);
> +    av_free(ipfs_gateway_file);

This is not cleaning up dynamic allocations of c->gateway

> +// -3: The gateway url part (without the protocol) is too short. We
> expect 3
> +//     characters minimal. So http://aaa would be the bare minimal.

http://1 is valid I think. It means http://0.0.0.1

> +    // Test if the gateway starts with either http:// or https://
> +    // The remainder is stored in url_without_protocol
> +    if (av_stristart(uri, "http://", &url_without_protocol) == 0
> +        && av_stristart(uri, "https://", &url_without_protocol) ==
> 0) {
> +        av_log(h, AV_LOG_ERROR, "The gateway URL didn't start with
> http:// or https:// and is therefore invalid.\n");
> +        ret = -2;
> +        goto err;
> +    }

I guess restricting this to HTTP schemes is OK. Or are there non-HTTP
gateways for this?

> +    if (last_gateway_char != '/') {
> +        c->gateway = av_asprintf("%s/", c->gateway);

Yet another leak

>     // Sanitize the gateway to a format we expect.
> +    if (sanitize_ipfs_gateway(h) < 1)
> +        goto err;

This will return unset ret, thus leaking data from the stack

> +static int ipfs_close(URLContext *h)
> +{
> +    IPFSGatewayContext *c = h->priv_data;

Here is where you'd put any deallocations

The quality of this patch is making me re-affirm what I've already said
viz parsing. bash+sed is superior.

/Tomas
Mark Gaiser Feb. 4, 2022, 2:12 p.m. UTC | #2
On Fri, Feb 4, 2022 at 12:10 PM Tomas Härdin <tjoppen@acc.umu.se> wrote:

> tor 2022-02-03 klockan 18:29 +0100 skrev Mark Gaiser:
> >
> > +typedef struct IPFSGatewayContext {
> > +    AVClass *class;
> > +    URLContext *inner;
> > +    char *gateway;
>
> Consider two separate variables. One for AVOption and one for the
> dynamically allocated string. Or put the latter on the stack.
>

There always needs to be a gateway so why is reusing that variable an issue?
I'm fine splitting it up but I'd like to understand the benefit of it as
currently I don't see that benefit.


> > +} IPFSGatewayContext;
> > +
> > +// A best-effort way to find the IPFS gateway.
> > +// Only the most appropiate gateway is set. It's not actually
> > requested
> > +// (http call) to prevent a potential slowdown in startup. A
> > potential timeout
> > +// is handled by the HTTP protocol.
> > +//
> > +// Return codes can be:
> > +// 1 : A potential gateway is found and set in c->gateway
> > +// -1: The IPFS data folder could not be found
> > +// -2: The gateway file could not be found
> > +// -3: The gateway file is found but empty
> > +// -4: $HOME is empty
> > +// -9: Unhandled error
>
> What Michael meant with better return codes is using AVERROR_* :)
>

Hey, first attempt at an ffmpeg contribution here. I didn't know that at
all. ;)
But yeah, will do!


> > +static int populate_ipfs_gateway(URLContext *h)
> > +{
> > +    IPFSGatewayContext *c = h->priv_data;
> > +    char *ipfs_full_data_folder = NULL;
> > +    char *ipfs_gateway_file = NULL;
>
> These can be char[PATH_MAX]
>

Oke, will do.
C code question though.
How do I use av_asprintf on stack arrays like that?

I kinda like to prevent using something different just to later figure out
that there was an av_ function that would be far better :)


> > +    struct stat st;
> > +    int stat_ret = 0;
> > +    int ret = -9;
> > +    FILE *gateway_file = NULL;
> > +    char gateway_file_data[1000];
>
> A maximum URL length of 999?
>

There doesn't seem to be a hard limit on url's..
It even seems to be browser dependent. Chrome apparently allows it up to
2MB, firefox up to 64k.
I'm just going to use PATH_MAX here too which seems plenty with 4096 bytes.
IPFS url's can be rather lengthy though so I do like to keep enough room
for it.


> > +
> > +    // First, test if there already is a path in c->gateway. If it
> > is then it
> > +    // was provided as cli arument and should be used. It takes
> > precdence.
> > +    if (c->gateway != NULL) {
> > +        ret = 1;
> > +        goto err;
> > +    }
> > +
> > +    // Test $IPFS_GATEWAY.
> > +    if (getenv("IPFS_GATEWAY") != NULL) {
> > +        av_free(c->gateway);
>
> Useless since c->gateway is NULL
>
> > +
> > +        // Stat the folder.
> > +        // It should exist in a default IPFS setup when run as local
> > user.
> > +#ifndef _WIN32
> > +        stat_ret = stat(ipfs_full_data_folder, &st);
> > +#else
> > +        stat_ret = win32_stat(ipfs_full_data_folder, &st);
> > +#endif
>
> Again, there is no reason to stat this. Just try opening the gateway
> file directly.
>

This is a folder, not a file.

The other stat that was here too was a file, I replaced that with an fopen.
It smells sketchy to me to (ab)use fopen to check if a folder exists.
There's stat for that.


>
> > +
> > +    // Read a single line (fgets stops at new line mark).
> > +    fgets(gateway_file_data, sizeof(gateway_file_data) - 1,
> > gateway_file);
>
> This can result in gateway_file_data not being NUL terminated


> > +
> > +    // Replace first occurence of end of line to \0
> > +    gateway_file_data[strcspn(gateway_file_data, "\r\n")] = 0;
>
> What if the file uses \n or no newlines at all?
>

Right.
So I guess the fix here is:
1. Initialize gateway_file_data so all bytes are zero
2. read a line
3. set the last byte of gateway_file_data to 0

Now any text in the string will be the gateway.

Is that a proper fix?


> > +err:
> > +    if (gateway_file)
> > +        fclose(gateway_file);
> > +
> > +    av_free(ipfs_full_data_folder);
> > +    av_free(ipfs_gateway_file);
>
> This is not cleaning up dynamic allocations of c->gateway
>

So I should do that in  ipfs_close, right?

>
> > +// -3: The gateway url part (without the protocol) is too short. We
> > expect 3
> > +//     characters minimal. So http://aaa would be the bare minimal.
>
> http://1 is valid I think. It means http://0.0.0.1


Right, 1 character it is.
I thought I'd go for a common sense approach as in "it can't possibly be
smaller than..."
But I suppose just forcing any character to be there is fine too.

>
>
> > +    // Test if the gateway starts with either http:// or https://
> > +    // The remainder is stored in url_without_protocol
> > +    if (av_stristart(uri, "http://", &url_without_protocol) == 0
> > +        && av_stristart(uri, "https://", &url_without_protocol) ==
> > 0) {
> > +        av_log(h, AV_LOG_ERROR, "The gateway URL didn't start with
> > http:// or https:// and is therefore invalid.\n");
> > +        ret = -2;
> > +        goto err;
> > +    }
>
> I guess restricting this to HTTP schemes is OK. Or are there non-HTTP
> gateways for this?
>

No.
At least not from the IPFS camp.
The IPFS software creates a gateway and that is specifically an http
gateway.
Users can put that behind a proxy making it (potentially) a https gateway
but that's about it.


>
> > +    if (last_gateway_char != '/') {
> > +        c->gateway = av_asprintf("%s/", c->gateway);
>
> Yet another leak
>

Please tell me how to fix this one.
As you can see, I need the c->gateway value to copy and add a "/" to it.

In C++ this would just be a dead simple append ;)



> >     // Sanitize the gateway to a format we expect.
> > +    if (sanitize_ipfs_gateway(h) < 1)
> > +        goto err;
>
> This will return unset ret, thus leaking data from the stack
>
> > +static int ipfs_close(URLContext *h)
> > +{
> > +    IPFSGatewayContext *c = h->priv_data;
>
> Here is where you'd put any deallocations
>
> The quality of this patch is making me re-affirm what I've already said
> viz parsing. bash+sed is superior.
>

That would be a superior waste of time if that were the outcome.
Let's not go there. It makes other parts much more complicated too.
But i've already explained that in length, no need to repeat myself here
again.

>
> /Tomas
>
>
Tomas Härdin Feb. 7, 2022, 3:09 p.m. UTC | #3
fre 2022-02-04 klockan 15:12 +0100 skrev Mark Gaiser:
> On Fri, Feb 4, 2022 at 12:10 PM Tomas Härdin <tjoppen@acc.umu.se>
> wrote:
> 
> > tor 2022-02-03 klockan 18:29 +0100 skrev Mark Gaiser:
> > > 
> > > +typedef struct IPFSGatewayContext {
> > > +    AVClass *class;
> > > +    URLContext *inner;
> > > +    char *gateway;
> > 
> > Consider two separate variables. One for AVOption and one for the
> > dynamically allocated string. Or put the latter on the stack.
> > 
> 
> There always needs to be a gateway so why is reusing that variable an
> issue?
> I'm fine splitting it up but I'd like to understand the benefit of it
> as
> currently I don't see that benefit.

Because of the way AVOption memory allocation works

> 
> > > +static int populate_ipfs_gateway(URLContext *h)
> > > +{
> > > +    IPFSGatewayContext *c = h->priv_data;
> > > +    char *ipfs_full_data_folder = NULL;
> > > +    char *ipfs_gateway_file = NULL;
> > 
> > These can be char[PATH_MAX]
> > 
> 
> Oke, will do.
> C code question though.
> How do I use av_asprintf on stack arrays like that?

snprintf(). Also be careful with PATH_MAX and +-1 bytes for the NUL.

> 
> > Again, there is no reason to stat this. Just try opening the
> > gateway
> > file directly.
> > 
> 
> This is a folder, not a file.
> 
> The other stat that was here too was a file, I replaced that with an
> fopen.
> It smells sketchy to me to (ab)use fopen to check if a folder exists.
> There's stat for that.

You don't need to check whether the folder exists at all. The only
thing that accomplishes is some AV_LOG_DEBUG prints that won't even get
compiled in unless a users builds with -g (I think). It's not sketchy -
it's spec'd behavior.

> 
> 
> > 
> > > +
> > > +    // Read a single line (fgets stops at new line mark).
> > > +    fgets(gateway_file_data, sizeof(gateway_file_data) - 1,
> > > gateway_file);
> > 
> > This can result in gateway_file_data not being NUL terminated
> 
> 
> > > +
> > > +    // Replace first occurence of end of line to \0
> > > +    gateway_file_data[strcspn(gateway_file_data, "\r\n")] = 0;
> > 
> > What if the file uses \n or no newlines at all?
> > 
> 
> Right.
> So I guess the fix here is:
> 1. Initialize gateway_file_data so all bytes are zero
> 2. read a line
> 3. set the last byte of gateway_file_data to 0
> 
> Now any text in the string will be the gateway.
> 
> Is that a proper fix?

Yes always putting a NUL at the end works. You don't need to initialize
with zero in that case. fgets() will NUL terminate except when there's
an error like the line being too long.

> 
> 
> > > +err:
> > > +    if (gateway_file)
> > > +        fclose(gateway_file);
> > > +
> > > +    av_free(ipfs_full_data_folder);
> > > +    av_free(ipfs_gateway_file);
> > 
> > This is not cleaning up dynamic allocations of c->gateway
> > 
> 
> So I should do that in  ipfs_close, right?

That's one place to do it yes. I forget whether _close() is called in
case of errors. av_freep() will set the pointer to NULL after freeing
so no double-frees occur.

> 
> > 
> > 
> > > +    // Test if the gateway starts with either http:// or
> > > https://
> > > +    // The remainder is stored in url_without_protocol
> > > +    if (av_stristart(uri, "http://", &url_without_protocol) == 0
> > > +        && av_stristart(uri, "https://", &url_without_protocol)
> > > ==
> > > 0) {
> > > +        av_log(h, AV_LOG_ERROR, "The gateway URL didn't start
> > > with
> > > http:// or https:// and is therefore invalid.\n");
> > > +        ret = -2;
> > > +        goto err;
> > > +    }
> > 
> > I guess restricting this to HTTP schemes is OK. Or are there non-
> > HTTP
> > gateways for this?
> > 
> 
> No.
> At least not from the IPFS camp.
> The IPFS software creates a gateway and that is specifically an http
> gateway.
> Users can put that behind a proxy making it (potentially) a https
> gateway
> but that's about it.

I see. I guess if any user puts this stuff behind gopher:// or
something then that's their problem.

> 
> > 
> > > +    if (last_gateway_char != '/') {
> > > +        c->gateway = av_asprintf("%s/", c->gateway);
> > 
> > Yet another leak
> > 
> 
> Please tell me how to fix this one.
> As you can see, I need the c->gateway value to copy and add a "/" to
> it.
> 
> In C++ this would just be a dead simple append ;)

Ensure there's enough space for '/' and a NUL and just write that to
the end.

snprintf() can do all of this if used appropriately. For example to
conditionally append "/" you can put %s in the format string and the
ternary

 needs_slash ? "/" : ""

as the associated argument

/Tomas
Mark Gaiser Feb. 9, 2022, 5:36 p.m. UTC | #4
On Mon, Feb 7, 2022 at 4:09 PM Tomas Härdin <tjoppen@acc.umu.se> wrote:

> fre 2022-02-04 klockan 15:12 +0100 skrev Mark Gaiser:
> > On Fri, Feb 4, 2022 at 12:10 PM Tomas Härdin <tjoppen@acc.umu.se>
> > wrote:
> >
> > > tor 2022-02-03 klockan 18:29 +0100 skrev Mark Gaiser:
> > > >
> > > > +typedef struct IPFSGatewayContext {
> > > > +    AVClass *class;
> > > > +    URLContext *inner;
> > > > +    char *gateway;
> > >
> > > Consider two separate variables. One for AVOption and one for the
> > > dynamically allocated string. Or put the latter on the stack.
> > >
> >
> > There always needs to be a gateway so why is reusing that variable an
> > issue?
> > I'm fine splitting it up but I'd like to understand the benefit of it
> > as
> > currently I don't see that benefit.
>
> Because of the way AVOption memory allocation works
>
> >
> > > > +static int populate_ipfs_gateway(URLContext *h)
> > > > +{
> > > > +    IPFSGatewayContext *c = h->priv_data;
> > > > +    char *ipfs_full_data_folder = NULL;
> > > > +    char *ipfs_gateway_file = NULL;
> > >
> > > These can be char[PATH_MAX]
> > >
> >
> > Oke, will do.
> > C code question though.
> > How do I use av_asprintf on stack arrays like that?
>
> snprintf(). Also be careful with PATH_MAX and +-1 bytes for the NUL.
>
> >
> > > Again, there is no reason to stat this. Just try opening the
> > > gateway
> > > file directly.
> > >
> >
> > This is a folder, not a file.
> >
> > The other stat that was here too was a file, I replaced that with an
> > fopen.
> > It smells sketchy to me to (ab)use fopen to check if a folder exists.
> > There's stat for that.
>
> You don't need to check whether the folder exists at all. The only
> thing that accomplishes is some AV_LOG_DEBUG prints that won't even get
> compiled in unless a users builds with -g (I think). It's not sketchy -
> it's spec'd behavior.
>
> >
> >
> > >
> > > > +
> > > > +    // Read a single line (fgets stops at new line mark).
> > > > +    fgets(gateway_file_data, sizeof(gateway_file_data) - 1,
> > > > gateway_file);
> > >
> > > This can result in gateway_file_data not being NUL terminated
> >
> >
> > > > +
> > > > +    // Replace first occurence of end of line to \0
> > > > +    gateway_file_data[strcspn(gateway_file_data, "\r\n")] = 0;
> > >
> > > What if the file uses \n or no newlines at all?
> > >
> >
> > Right.
> > So I guess the fix here is:
> > 1. Initialize gateway_file_data so all bytes are zero
> > 2. read a line
> > 3. set the last byte of gateway_file_data to 0
> >
> > Now any text in the string will be the gateway.
> >
> > Is that a proper fix?
>
> Yes always putting a NUL at the end works. You don't need to initialize
> with zero in that case. fgets() will NUL terminate except when there's
> an error like the line being too long.
>
> >
> >
> > > > +err:
> > > > +    if (gateway_file)
> > > > +        fclose(gateway_file);
> > > > +
> > > > +    av_free(ipfs_full_data_folder);
> > > > +    av_free(ipfs_gateway_file);
> > >
> > > This is not cleaning up dynamic allocations of c->gateway
> > >
> >
> > So I should do that in  ipfs_close, right?
>
> That's one place to do it yes. I forget whether _close() is called in
> case of errors. av_freep() will set the pointer to NULL after freeing
> so no double-frees occur.
>
> >
> > >
> > >
> > > > +    // Test if the gateway starts with either http:// or
> > > > https://
> > > > +    // The remainder is stored in url_without_protocol
> > > > +    if (av_stristart(uri, "http://", &url_without_protocol) == 0
> > > > +        && av_stristart(uri, "https://", &url_without_protocol)
> > > > ==
> > > > 0) {
> > > > +        av_log(h, AV_LOG_ERROR, "The gateway URL didn't start
> > > > with
> > > > http:// or https:// and is therefore invalid.\n");
> > > > +        ret = -2;
> > > > +        goto err;
> > > > +    }
> > >
> > > I guess restricting this to HTTP schemes is OK. Or are there non-
> > > HTTP
> > > gateways for this?
> > >
> >
> > No.
> > At least not from the IPFS camp.
> > The IPFS software creates a gateway and that is specifically an http
> > gateway.
> > Users can put that behind a proxy making it (potentially) a https
> > gateway
> > but that's about it.
>
> I see. I guess if any user puts this stuff behind gopher:// or
> something then that's their problem.
>
> >
> > >
> > > > +    if (last_gateway_char != '/') {
> > > > +        c->gateway = av_asprintf("%s/", c->gateway);
> > >
> > > Yet another leak
> > >
> >
> > Please tell me how to fix this one.
> > As you can see, I need the c->gateway value to copy and add a "/" to
> > it.
> >
> > In C++ this would just be a dead simple append ;)
>
> Ensure there's enough space for '/' and a NUL and just write that to
> the end.
>
> snprintf() can do all of this if used appropriately. For example to
> conditionally append "/" you can put %s in the format string and the
> ternary
>
>  needs_slash ? "/" : ""
>
> as the associated argument
>
> /Tomas
>
>
Thank you so much for all the valuable feedback! It's much appreciated :)

I'll fix all the still open issues and send another patch mail somewhere
later this week.
Question though, as this seems to be heading towards the final patch
version. How do you prefer to see the next patch round?

1. As it's done currently. You don't see what's different.
2. As a 1/2 and 2/2 patch where 2/2 would be the a diff on top of 1/2 with
the review feedback applied.

Either is fine with me.
But I fear that a split patch mail (so 1/2 with the code as is right now
and 2/2 with the feedback changes) could potentially waste people's time if
they comment on something that is fixed in the feedback patch.


> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Mark Gaiser Feb. 10, 2022, 1:14 a.m. UTC | #5
On Wed, Feb 9, 2022 at 6:36 PM Mark Gaiser <markg85@gmail.com> wrote:

> On Mon, Feb 7, 2022 at 4:09 PM Tomas Härdin <tjoppen@acc.umu.se> wrote:
>
>> fre 2022-02-04 klockan 15:12 +0100 skrev Mark Gaiser:
>> > On Fri, Feb 4, 2022 at 12:10 PM Tomas Härdin <tjoppen@acc.umu.se>
>> > wrote:
>> >
>> > > tor 2022-02-03 klockan 18:29 +0100 skrev Mark Gaiser:
>> > > >
>> > > > +typedef struct IPFSGatewayContext {
>> > > > +    AVClass *class;
>> > > > +    URLContext *inner;
>> > > > +    char *gateway;
>> > >
>> > > Consider two separate variables. One for AVOption and one for the
>> > > dynamically allocated string. Or put the latter on the stack.
>> > >
>> >
>> > There always needs to be a gateway so why is reusing that variable an
>> > issue?
>> > I'm fine splitting it up but I'd like to understand the benefit of it
>> > as
>> > currently I don't see that benefit.
>>
>> Because of the way AVOption memory allocation works
>>
>> >
>> > > > +static int populate_ipfs_gateway(URLContext *h)
>> > > > +{
>> > > > +    IPFSGatewayContext *c = h->priv_data;
>> > > > +    char *ipfs_full_data_folder = NULL;
>> > > > +    char *ipfs_gateway_file = NULL;
>> > >
>> > > These can be char[PATH_MAX]
>> > >
>> >
>> > Oke, will do.
>> > C code question though.
>> > How do I use av_asprintf on stack arrays like that?
>>
>> snprintf(). Also be careful with PATH_MAX and +-1 bytes for the NUL.
>>
>> >
>> > > Again, there is no reason to stat this. Just try opening the
>> > > gateway
>> > > file directly.
>> > >
>> >
>> > This is a folder, not a file.
>> >
>> > The other stat that was here too was a file, I replaced that with an
>> > fopen.
>> > It smells sketchy to me to (ab)use fopen to check if a folder exists.
>> > There's stat for that.
>>
>> You don't need to check whether the folder exists at all. The only
>> thing that accomplishes is some AV_LOG_DEBUG prints that won't even get
>> compiled in unless a users builds with -g (I think). It's not sketchy -
>> it's spec'd behavior.
>>
>> >
>> >
>> > >
>> > > > +
>> > > > +    // Read a single line (fgets stops at new line mark).
>> > > > +    fgets(gateway_file_data, sizeof(gateway_file_data) - 1,
>> > > > gateway_file);
>> > >
>> > > This can result in gateway_file_data not being NUL terminated
>> >
>> >
>> > > > +
>> > > > +    // Replace first occurence of end of line to \0
>> > > > +    gateway_file_data[strcspn(gateway_file_data, "\r\n")] = 0;
>> > >
>> > > What if the file uses \n or no newlines at all?
>> > >
>> >
>> > Right.
>> > So I guess the fix here is:
>> > 1. Initialize gateway_file_data so all bytes are zero
>> > 2. read a line
>> > 3. set the last byte of gateway_file_data to 0
>> >
>> > Now any text in the string will be the gateway.
>> >
>> > Is that a proper fix?
>>
>> Yes always putting a NUL at the end works. You don't need to initialize
>> with zero in that case. fgets() will NUL terminate except when there's
>> an error like the line being too long.
>>
>> >
>> >
>> > > > +err:
>> > > > +    if (gateway_file)
>> > > > +        fclose(gateway_file);
>> > > > +
>> > > > +    av_free(ipfs_full_data_folder);
>> > > > +    av_free(ipfs_gateway_file);
>> > >
>> > > This is not cleaning up dynamic allocations of c->gateway
>> > >
>> >
>> > So I should do that in  ipfs_close, right?
>>
>> That's one place to do it yes. I forget whether _close() is called in
>> case of errors. av_freep() will set the pointer to NULL after freeing
>> so no double-frees occur.
>>
>> >
>> > >
>> > >
>> > > > +    // Test if the gateway starts with either http:// or
>> > > > https://
>> > > > +    // The remainder is stored in url_without_protocol
>> > > > +    if (av_stristart(uri, "http://", &url_without_protocol) == 0
>> > > > +        && av_stristart(uri, "https://", &url_without_protocol)
>> > > > ==
>> > > > 0) {
>> > > > +        av_log(h, AV_LOG_ERROR, "The gateway URL didn't start
>> > > > with
>> > > > http:// or https:// and is therefore invalid.\n");
>> > > > +        ret = -2;
>> > > > +        goto err;
>> > > > +    }
>> > >
>> > > I guess restricting this to HTTP schemes is OK. Or are there non-
>> > > HTTP
>> > > gateways for this?
>> > >
>> >
>> > No.
>> > At least not from the IPFS camp.
>> > The IPFS software creates a gateway and that is specifically an http
>> > gateway.
>> > Users can put that behind a proxy making it (potentially) a https
>> > gateway
>> > but that's about it.
>>
>> I see. I guess if any user puts this stuff behind gopher:// or
>> something then that's their problem.
>>
>> >
>> > >
>> > > > +    if (last_gateway_char != '/') {
>> > > > +        c->gateway = av_asprintf("%s/", c->gateway);
>> > >
>> > > Yet another leak
>> > >
>> >
>> > Please tell me how to fix this one.
>> > As you can see, I need the c->gateway value to copy and add a "/" to
>> > it.
>> >
>> > In C++ this would just be a dead simple append ;)
>>
>> Ensure there's enough space for '/' and a NUL and just write that to
>> the end.
>>
>> snprintf() can do all of this if used appropriately. For example to
>> conditionally append "/" you can put %s in the format string and the
>> ternary
>>
>>  needs_slash ? "/" : ""
>>
>> as the associated argument
>>
>> /Tomas
>>
>>
> Thank you so much for all the valuable feedback! It's much appreciated :)
>
> I'll fix all the still open issues and send another patch mail somewhere
> later this week.
> Question though, as this seems to be heading towards the final patch
> version. How do you prefer to see the next patch round?
>
> 1. As it's done currently. You don't see what's different.
> 2. As a 1/2 and 2/2 patch where 2/2 would be the a diff on top of 1/2 with
> the review feedback applied.
>
> Either is fine with me.
> But I fear that a split patch mail (so 1/2 with the code as is right now
> and 2/2 with the feedback changes) could potentially waste people's time if
> they comment on something that is fixed in the feedback patch.
>
>
Never mind the question. V5 is up as option 1. This change again touched
about half the file so a diff doesn't make sense.


> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>>
>
Tomas Härdin Feb. 12, 2022, 11:58 a.m. UTC | #6
ons 2022-02-09 klockan 18:36 +0100 skrev Mark Gaiser:
> > 
> Thank you so much for all the valuable feedback! It's much
> appreciated :)
> 
> I'll fix all the still open issues and send another patch mail
> somewhere
> later this week.
> Question though, as this seems to be heading towards the final patch
> version. How do you prefer to see the next patch round?
> 
> 1. As it's done currently. You don't see what's different.
> 2. As a 1/2 and 2/2 patch where 2/2 would be the a diff on top of 1/2
> with
> the review feedback applied.
> 
> Either is fine with me.
> But I fear that a split patch mail (so 1/2 with the code as is right
> now
> and 2/2 with the feedback changes) could potentially waste people's
> time if
> they comment on something that is fixed in the feedback patch.

Submit patches as you want them in the source tree. Writing patches to
patches not yet committed makes little sense.

/Tomas
diff mbox series

Patch

diff --git a/configure b/configure
index 5b19a35f59..6ff09e7974 100755
--- a/configure
+++ b/configure
@@ -3585,6 +3585,8 @@  udp_protocol_select="network"
 udplite_protocol_select="network"
 unix_protocol_deps="sys_un_h"
 unix_protocol_select="network"
+ipfs_protocol_select="https_protocol"
+ipns_protocol_select="https_protocol"
 
 # external library protocols
 libamqp_protocol_deps="librabbitmq"
diff --git a/doc/protocols.texi b/doc/protocols.texi
index d207df0b52..7c9c0a4808 100644
--- a/doc/protocols.texi
+++ b/doc/protocols.texi
@@ -2025,5 +2025,35 @@  decoding errors.
 
 @end table
 
+@section ipfs
+
+InterPlanetary File System (IPFS) protocol support. One can access files stored 
+on the IPFS network through so called gateways. Those are http(s) endpoints.
+This protocol wraps the IPFS native protocols (ipfs:// and ipns://) to be send 
+to such a gateway. Users can (and should) host their own node which means this 
+protocol will use your local machine gateway to access files on the IPFS network.
+
+If a user doesn't have a node of their own then the public gateway dweb.link is 
+used by default.
+
+You can use this protocol in 2 ways. Using IPFS:
+@example
+ffplay ipfs://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
+@end example
+
+Or the IPNS protocol (IPNS is mutable IPFS):
+@example
+ffplay ipns://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
+@end example
+
+You can also change the gateway to be used:
+
+@table @option
+
+@item gateway
+Defines the gateway to use. When nothing is provided the protocol will first try 
+your local gateway. If that fails dweb.link will be used.
+
+@end table
 
 @c man end PROTOCOLS
diff --git a/libavformat/Makefile b/libavformat/Makefile
index 3dc6a479cc..4edce8420f 100644
--- a/libavformat/Makefile
+++ b/libavformat/Makefile
@@ -656,6 +656,8 @@  OBJS-$(CONFIG_SRTP_PROTOCOL)             += srtpproto.o srtp.o
 OBJS-$(CONFIG_SUBFILE_PROTOCOL)          += subfile.o
 OBJS-$(CONFIG_TEE_PROTOCOL)              += teeproto.o tee_common.o
 OBJS-$(CONFIG_TCP_PROTOCOL)              += tcp.o
+OBJS-$(CONFIG_IPFS_PROTOCOL)             += ipfsgateway.o
+OBJS-$(CONFIG_IPNS_PROTOCOL)             += ipfsgateway.o
 TLS-OBJS-$(CONFIG_GNUTLS)                += tls_gnutls.o
 TLS-OBJS-$(CONFIG_LIBTLS)                += tls_libtls.o
 TLS-OBJS-$(CONFIG_MBEDTLS)               += tls_mbedtls.o
diff --git a/libavformat/ipfsgateway.c b/libavformat/ipfsgateway.c
new file mode 100644
index 0000000000..72d431eac3
--- /dev/null
+++ b/libavformat/ipfsgateway.c
@@ -0,0 +1,329 @@ 
+/*
+ * IPFS and IPNS protocol support through IPFS Gateway.
+ * Copyright (c) 2022 Mark Gaiser
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "avformat.h"
+#include "libavutil/avassert.h"
+#include "libavutil/avstring.h"
+#include "libavutil/internal.h"
+#include "libavutil/opt.h"
+#include "libavutil/tree.h"
+#include <fcntl.h>
+#if HAVE_IO_H
+#include <io.h>
+#endif
+#if HAVE_UNISTD_H
+#include <unistd.h>
+#endif
+#include "os_support.h"
+#include "url.h"
+#include <stdlib.h>
+#include <sys/stat.h>
+
+typedef struct IPFSGatewayContext {
+    AVClass *class;
+    URLContext *inner;
+    char *gateway;
+} IPFSGatewayContext;
+
+// A best-effort way to find the IPFS gateway.
+// Only the most appropiate gateway is set. It's not actually requested
+// (http call) to prevent a potential slowdown in startup. A potential timeout
+// is handled by the HTTP protocol.
+//
+// Return codes can be:
+// 1 : A potential gateway is found and set in c->gateway
+// -1: The IPFS data folder could not be found
+// -2: The gateway file could not be found
+// -3: The gateway file is found but empty
+// -4: $HOME is empty
+// -9: Unhandled error
+static int populate_ipfs_gateway(URLContext *h)
+{
+    IPFSGatewayContext *c = h->priv_data;
+    char *ipfs_full_data_folder = NULL;
+    char *ipfs_gateway_file = NULL;
+    struct stat st;
+    int stat_ret = 0;
+    int ret = -9;
+    FILE *gateway_file = NULL;
+    char gateway_file_data[1000];
+
+    // First, test if there already is a path in c->gateway. If it is then it
+    // was provided as cli arument and should be used. It takes precdence.
+    if (c->gateway != NULL) {
+        ret = 1;
+        goto err;
+    }
+
+    // Test $IPFS_GATEWAY.
+    if (getenv("IPFS_GATEWAY") != NULL) {
+        av_free(c->gateway);
+        c->gateway = av_asprintf("%s", getenv("IPFS_GATEWAY"));
+        ret = 1;
+        goto err;
+    } else
+        av_log(h, AV_LOG_DEBUG, "$IPFS_GATEWAY is empty.\n");
+
+    // We need to know the IPFS folder to - eventually - read the contents of
+    // the "gateway" file which would tell us the gateway to use.
+    if (getenv("IPFS_PATH") == NULL) {
+        av_log(h, AV_LOG_DEBUG, "$IPFS_PATH is empty.\n");
+
+        // Try via the home folder.
+        if (getenv("HOME") == NULL) {
+            av_log(h, AV_LOG_ERROR, "$HOME appears to be empty.\n");
+            ret = -4;
+            goto err;
+        }
+
+        ipfs_full_data_folder = av_asprintf("%s/.ipfs/", getenv("HOME"));
+
+        // Stat the folder.
+        // It should exist in a default IPFS setup when run as local user.
+#ifndef _WIN32
+        stat_ret = stat(ipfs_full_data_folder, &st);
+#else
+        stat_ret = win32_stat(ipfs_full_data_folder, &st);
+#endif
+        if (stat_ret < 0) {
+            av_log(h, AV_LOG_DEBUG, "Unable to find IPFS folder. We tried:\n");
+            av_log(h, AV_LOG_DEBUG, "- $IPFS_PATH, which was empty.\n");
+            av_log(h, AV_LOG_DEBUG, "- $HOME/.ipfs (full uri: %s) which doesn't exist.\n", ipfs_full_data_folder);
+            ret = -1;
+            goto err;
+        }
+    } else
+        ipfs_full_data_folder = av_asprintf("%s", getenv("IPFS_PATH"));
+
+    ipfs_gateway_file = av_asprintf("%sgateway", ipfs_full_data_folder);
+
+    // Get the contents of the gateway file.
+    gateway_file = av_fopen_utf8(ipfs_gateway_file, "r");
+    if (!gateway_file) {
+        av_log(h, AV_LOG_ERROR, "The IPFS gateway file (full uri: %s) doesn't exist. Is the gateway enabled?\n", ipfs_gateway_file);
+        ret = -2;
+        goto err;
+    }
+
+    // Read a single line (fgets stops at new line mark).
+    fgets(gateway_file_data, sizeof(gateway_file_data) - 1, gateway_file);
+
+    // Replace first occurence of end of line to \0
+    gateway_file_data[strcspn(gateway_file_data, "\r\n")] = 0;
+
+    // If strlen finds anything longer then 0 characters then we have a
+    // potential gateway url.
+    if (strlen(gateway_file_data) < 1) {
+        av_log(h, AV_LOG_ERROR, "The IPFS gateway file (full uri: %s) appears to be empty. Is the gateway started?\n", ipfs_gateway_file);
+        ret = -3;
+        goto err;
+    }
+
+    // At this point gateway_file_data contains at least something.
+    // Copy it into c->gateway.
+    c->gateway = av_asprintf("%s", gateway_file_data);
+    if (c->gateway) {
+        ret = 1;
+        goto err;
+    } else
+        av_log(h, AV_LOG_DEBUG, "Unknown error in the IPFS gateway file.\n");
+
+err:
+    if (gateway_file)
+        fclose(gateway_file);
+
+    av_free(ipfs_full_data_folder);
+    av_free(ipfs_gateway_file);
+    return ret;
+}
+
+// For now just makes sure that the gateway ends in url we expect.
+// Like http://localhost:8080/.
+// Explicitly with the traling slash.
+//
+// Return codes can be: (any code above 0 is OK)
+// 2 : c->gateway modified to add a trailing "/"
+// 1 : c->gateway is set with a string that looks to be a valid URL
+// -1: The gateway is too short to be any kind of URL.
+// -2: The gateway didn't start with http:// or https://
+// -3: The gateway url part (without the protocol) is too short. We expect 3
+//     characters minimal. So http://aaa would be the bare minimal.
+// -9: Unhandled error
+static int sanitize_ipfs_gateway(URLContext *h)
+{
+    IPFSGatewayContext *c = h->priv_data;
+    char last_gateway_char;
+    const char *url_without_protocol;
+    const char *uri = c->gateway;
+    int ret = -9;
+
+    // "http://" is already 7 characters. Lets be really conservative and say a
+    // url must be 10 chatacters at the very least.
+    if (strlen(c->gateway) < 10) {
+        av_log(h, AV_LOG_ERROR, "The gateway URL must be at least 10 characters long.\n");
+        ret = -1;
+        goto err;
+    }
+
+    // Test if the gateway starts with either http:// or https://
+    // The remainder is stored in url_without_protocol
+    if (av_stristart(uri, "http://", &url_without_protocol) == 0
+        && av_stristart(uri, "https://", &url_without_protocol) == 0) {
+        av_log(h, AV_LOG_ERROR, "The gateway URL didn't start with http:// or https:// and is therefore invalid.\n");
+        ret = -2;
+        goto err;
+    }
+
+    // We now know the remainder of the url without the protocol. Check it for
+    // some length. At least 3 characters.
+    if (strlen(url_without_protocol) < 3) {
+        av_log(h, AV_LOG_ERROR, "The gateway url (without the protocol part) is too short to be a valid URL.\n");
+        ret = -3;
+        goto err;
+    }
+
+    last_gateway_char = c->gateway[strlen(c->gateway) - 1];
+
+    if (last_gateway_char != '/') {
+        c->gateway = av_asprintf("%s/", c->gateway);
+        ret = 2;
+        goto err;
+    }
+
+err:
+    return ret;
+}
+
+static int translate_ipfs_to_http(URLContext *h, const char *uri,
+                                  int flags, AVDictionary **options)
+{
+    const char *ipfs_cid;
+    const char *protocol_path_suffix = "ipfs/";
+    char *fulluri = NULL;
+    int ret;
+    IPFSGatewayContext *c = h->priv_data;
+
+    // Test for ipfs://, ipfs:, ipns:// and ipns:. This prefix is stripped from
+    // the string leaving just the CID in ipfs_cid.
+    int is_ipfs = (av_stristart(uri, "ipfs://", &ipfs_cid)
+                   || av_stristart(uri, "ipfs:", &ipfs_cid));
+    int is_ipns = (av_stristart(uri, "ipns://", &ipfs_cid)
+                   || av_stristart(uri, "ipns:", &ipfs_cid));
+
+    // Populate the IPFS gateway if we have any.
+    // If not, inform the user how to properly set one.
+    if (populate_ipfs_gateway(h) < 0) {
+        av_log(h, AV_LOG_ERROR, "No IPFS gateway was set. Make sure a local IPFS instance is running.\n");
+        av_log(h, AV_LOG_INFO, "There are multiple options to define this gateway. The below options are in order of precedence:\n");
+        av_log(h, AV_LOG_INFO, "1. Define a -gateway <url> to the gateway without trailing forward slash.\n");
+        av_log(h, AV_LOG_INFO, "2. Define $IPFS_GATEWAY with the full http link to the gateway without trailing forward slash.\n");
+        av_log(h, AV_LOG_INFO, "3. Define $IPFS_PATH and point it to the IPFS data path.\n");
+        av_log(h, AV_LOG_INFO, "4. Have IPFS running in your local user folder (under $HOME/.ipfs).\n");
+        av_log(h, AV_LOG_INFO, "In all path cases, a file named gateway is expected. See https://github.com/ipfs/specs/issues/261 for more information.\n");
+        ret = AVERROR(EINVAL);
+        goto err;
+    }
+
+    // Sanitize the gateway to a format we expect.
+    if (sanitize_ipfs_gateway(h) < 1)
+        goto err;
+
+    // We must have either ipns or ipfs.
+    if (!is_ipfs && !is_ipns) {
+        ret = AVERROR(EINVAL);
+        av_log(h, AV_LOG_ERROR, "Unsupported url %s\n", uri);
+        goto err;
+    }
+
+    // If we have IPNS, update the protocol.
+    if (is_ipns)
+        protocol_path_suffix = "ipns/";
+
+    // Concatenate the url.
+    // This ends up with something like: http://localhost:8080/ipfs/Qm.....
+    fulluri = av_asprintf("%s%s%s", c->gateway, protocol_path_suffix, ipfs_cid);
+
+    // Pass the URL back to FFMpeg's protocol handler.
+    if ((ret = ffurl_open_whitelist(&c->inner, fulluri, flags,
+                                    &h->interrupt_callback, options,
+                                    h->protocol_whitelist,
+                                    h->protocol_blacklist, h))
+        < 0) {
+        av_log(h, AV_LOG_ERROR, "Unable to open resource: %s\n", fulluri);
+        goto err;
+    }
+
+err:
+    av_free(fulluri);
+    return ret;
+}
+
+static int ipfs_read(URLContext *h, unsigned char *buf, int size)
+{
+    IPFSGatewayContext *c = h->priv_data;
+    return ffurl_read(c->inner, buf, size);
+}
+
+static int64_t ipfs_seek(URLContext *h, int64_t pos, int whence)
+{
+    IPFSGatewayContext *c = h->priv_data;
+    return ffurl_seek(c->inner, pos, whence);
+}
+
+static int ipfs_close(URLContext *h)
+{
+    IPFSGatewayContext *c = h->priv_data;
+    return ffurl_closep(&c->inner);
+}
+
+#define OFFSET(x) offsetof(IPFSGatewayContext, x)
+
+static const AVOption options[] = {
+    {"gateway", "The gateway to ask for IPFS data.", OFFSET(gateway), AV_OPT_TYPE_STRING, {.str = NULL}, 0, 0, AV_OPT_FLAG_DECODING_PARAM},
+    {NULL},
+};
+
+static const AVClass ipfs_context_class = {
+    .class_name   = "IPFS",
+    .item_name    = av_default_item_name,
+    .option       = options,
+    .version      = LIBAVUTIL_VERSION_INT,
+};
+
+const URLProtocol ff_ipfs_protocol = {
+    .name             = "ipfs",
+    .url_open2        = translate_ipfs_to_http,
+    .url_read         = ipfs_read,
+    .url_seek         = ipfs_seek,
+    .url_close        = ipfs_close,
+    .priv_data_size   = sizeof(IPFSGatewayContext),
+    .priv_data_class  = &ipfs_context_class,
+};
+
+const URLProtocol ff_ipns_protocol = {
+    .name             = "ipns",
+    .url_open2        = translate_ipfs_to_http,
+    .url_read         = ipfs_read,
+    .url_seek         = ipfs_seek,
+    .url_close        = ipfs_close,
+    .priv_data_size   = sizeof(IPFSGatewayContext),
+    .priv_data_class  = &ipfs_context_class,
+};
diff --git a/libavformat/protocols.c b/libavformat/protocols.c
index 948fae411f..675b684bd3 100644
--- a/libavformat/protocols.c
+++ b/libavformat/protocols.c
@@ -73,6 +73,8 @@  extern const URLProtocol ff_libsrt_protocol;
 extern const URLProtocol ff_libssh_protocol;
 extern const URLProtocol ff_libsmbclient_protocol;
 extern const URLProtocol ff_libzmq_protocol;
+extern const URLProtocol ff_ipfs_protocol;
+extern const URLProtocol ff_ipns_protocol;
 
 #include "libavformat/protocol_list.c"