diff mbox

[FFmpeg-devel] avfilter/image2: Add source file path and basename to each packet side data.

Message ID af9ca04a-86bd-ab8f-03ce-9e6b0fcc48a2@gmail.com
State New
Headers show

Commit Message

Alexandre Heitor Schmidt Dec. 21, 2019, 5:32 p.m. UTC
> In the commit msg,
 >
 >> libavformat/img2dec.c: Modify image2 demuxer to make available
 >> two special metadata entries called lavf.image2dec.source_basename
 >> and lavf.image2dec.source_basename, which represents, respectively,
 >> the complete path to the source image for the current frame and
 >> the basename i.e. the file name related to the current frame.
 >> These can then be used by filters like drawtext and others.
 >
 > First key name should end in source_path

Damn! You're right! New patch attached with this minor change corrected.

Thanks!

Alex.

Comments

Michael Niedermayer Dec. 22, 2019, 11:54 p.m. UTC | #1
On Sat, Dec 21, 2019 at 05:32:26PM +0000, Alexandre Heitor Schmidt wrote:
> > In the commit msg,
> >
> >> libavformat/img2dec.c: Modify image2 demuxer to make available
> >> two special metadata entries called lavf.image2dec.source_basename
> >> and lavf.image2dec.source_basename, which represents, respectively,
> >> the complete path to the source image for the current frame and
> >> the basename i.e. the file name related to the current frame.
> >> These can then be used by filters like drawtext and others.
> >
> > First key name should end in source_path
> 
> Damn! You're right! New patch attached with this minor change corrected.
> 
> Thanks!
> 
> Alex.
> 

>  doc/demuxers.texi     |   11 +++++++++++
>  doc/filters.texi      |    7 +++++++
>  libavformat/img2dec.c |   32 ++++++++++++++++++++++++++++++++
>  3 files changed, 50 insertions(+)
> c75f4fef2f9595bc69315d450219adda125aaf25  0001-avformat-image2-Metadata-identifying-the-source-path.patch
> >From 95ebffed3784ba12a32966128136e8deae0d8386 Mon Sep 17 00:00:00 2001
> From: Alexandre Heitor Schmidt <alexandre.schmidt@gmail.com>
> Date: Sat, 21 Dec 2019 16:19:31 +0000
> Subject: [PATCH] avformat/image2: Metadata identifying the source path of 
>  input filename and documentation for its usage.
> 
> libavformat/img2dec.c: Modify image2 demuxer to make available
> two special metadata entries called lavf.image2dec.source_path
> and lavf.image2dec.source_basename, which represents, respectively,
> the complete path to the source image for the current frame and
> the basename i.e. the file name related to the current frame.
> These can then be used by filters like drawtext and others.

not sure why i just now realize it but

Storing the source path is problematic privacy and security wise

Thanks

[...]
Alexandre Heitor Schmidt Dec. 23, 2019, 12:19 a.m. UTC | #2
>
> not sure why i just now realize it but
>
> Storing the source path is problematic privacy and security wise
>
> Thanks
>

What does this means? That it won't be applied?

Can you give an example on why this would be a security issue, considering
you already have it anywhere, only not available to filters until now?

Besides, isn't metadata way more sensitive than a simple file path?

Thanks.

Alex.
Marton Balint Dec. 23, 2019, 9:44 a.m. UTC | #3
On Mon, 23 Dec 2019, alexandre.schmidt@gmail.com wrote:

>>> not sure why i just now realize it but
>>
>> Storing the source path is problematic privacy and security wise
>>
>> Thanks
>>
>
> What does this means? That it won't be applied?

It only means that you may have to only activate this (or at least the 
full path metadata) feature if the user explicitly requests it. Probably 
the best way to do that is to introduce a new option of the image2 
demuxer, probably with an AV_OPT_TYPE_FLAGS type.

>
> Can you give an example on why this would be a security issue, considering
> you already have it anywhere, only not available to filters until now?
>
> Besides, isn't metadata way more sensitive than a simple file path?

The problem is possible leaking of information about the system which 
processed the file, because it is typically not disclosed in an output 
file where the system stores media files it processes.

Regards,
Marton
Michael Niedermayer Dec. 23, 2019, 10:30 a.m. UTC | #4
On Mon, Dec 23, 2019 at 12:19:21AM +0000, alexandre.schmidt@gmail.com wrote:
> >
> > not sure why i just now realize it but
> >
> > Storing the source path is problematic privacy and security wise
> >
> > Thanks
> >
> 
> What does this means? That it won't be applied?
> 
> Can you give an example on why this would be a security issue, considering
> you already have it anywhere, only not available to filters until now?
> 
> Besides, isn't metadata way more sensitive than a simple file path?

About security
The file path can reveal a wide range of information like
The platform used, 
The username, 
A potentially writable location
And a lot more depending on how the directories are layed out

About privacy
The username is commonly related to the users real name, thats
sensitive information
And a lot more depending on how the directories are layed out
consider a doctors office might have directories which use the
patients social security numbers in the path

The problem here is this is new metadata, the input never contained
this sensitive data but depending on what is done downstream with
it the output might contain this sensitive metadata

converting inputfile to outputfile shouldnt result in outputfile
containing sensitive information that wasnt in the input and that
the user did not explicitly ask for to be addded

To show why for example thers a privacy concern here, a slightly
unfunny hypothetical example:
A girl gets stalked by some guy online, she takes a screenshoot of
the message the guy sent her on facebook. And uploads that picture
sadly the picture contains her name, phone number and GPS coordinates
without her knowing. 

About "That it won't be applied?" 
I think the feature makes sense but it must be ensured that sensitive
data isnt added or leaking somewhere without the users knowledge and
concent

Thanks

[...]
Alexandre Heitor Schmidt Dec. 26, 2019, 2:01 p.m. UTC | #5
> you may have to only activate this (or at least the full path 
metadata) feature
 > if the user explicitly requests it. Probably the best way to do that 
is to
 > introduce a new option of the image2 demuxer, probably with an 
AV_OPT_TYPE_FLAGS
 > type.

Do you mean image2 should, for example, have another option like 
'enable_path_metadata' which would make the demuxer export the metadata 
only when set to "1"?

Thanks.

Alex.
Alexandre Heitor Schmidt Dec. 26, 2019, 2:11 p.m. UTC | #6
> About security
 > The file path can reveal a wide range of information like
 > The platform used,
 > The username,
 > A potentially writable location
 > And a lot more depending on how the directories are layed out
 >
 > About privacy
 > The username is commonly related to the users real name, thats
 > sensitive information
 > And a lot more depending on how the directories are layed out
 > consider a doctors office might have directories which use the
 > patients social security numbers in the path
 >
 > The problem here is this is new metadata, the input never contained
 > this sensitive data but depending on what is done downstream with
 > it the output might contain this sensitive metadata
 >
 > converting inputfile to outputfile shouldnt result in outputfile
 > containing sensitive information that wasnt in the input and that
 > the user did not explicitly ask for to be addded
 >
 > To show why for example thers a privacy concern here, a slightly
 > unfunny hypothetical example:
 > A girl gets stalked by some guy online, she takes a screenshoot of
 > the message the guy sent her on facebook. And uploads that picture
 > sadly the picture contains her name, phone number and GPS coordinates
 > without her knowing.

I see your point. I just think that, if the user is generating an 
output, he has access to the input files and path. If he wants to use 
the input information, he can. If he doesn't want to use that 
information, he can just ignore it. Maybe I'm not being able to see the 
whole picture.

 > About "That it won't be applied?"
 > I think the feature makes sense but it must be ensured that sensitive
 > data isnt added or leaking somewhere without the users knowledge and
 > concent

The 'concent' here, in my opinion, is the freedom the user has to either 
use or don't use the metadata. Or are you referring to cases where the 
input and output are not processed by the same person, such as on 
broadcast, streaming, etc?

Anyway, if an extra flag for image2, to make it export/don't export the 
path-related metadata is necessary, it can be implemented.

Alex.
Alexandre Heitor Schmidt Dec. 29, 2019, 7:41 p.m. UTC | #7
On 26/12/2019 14:01, Alexandre Heitor Schmidt wrote:
 > > you may have to only activate this (or at least the full path 
metadata) feature
 > > if the user explicitly requests it. Probably the best way to do 
that is to
 > > introduce a new option of the image2 demuxer, probably with an 
AV_OPT_TYPE_FLAGS
 > > type.
 >
 > Do you mean image2 should, for example, have another option like 
'enable_path_metadata' which would make the demuxer export the metadata 
only when set to "1"?

Can I add a new parameter to VideoDemuxData in img2.h, named 
export_path_metadata, which, when explicitly set to 1, will allow 
exporting input path as the special metadata parameters 
(lavf.image2dec.source_path and lavf.image2dec.source_basename)? The 
default will be not to export them, so it won't compromise security, as 
discussed previously in this thread.

Something like this:

In img2.h:

typedef struct VideoDemuxData {
     const AVClass *class;  /**< Class for private options. */
     int img_first;
     int img_last;
     int img_number;
     int64_t pts;
     int img_count;
     int is_pipe;
     int split_planes;       /**< use independent file for each Y, U, V 
plane */
     char path[1024];
     char *pixel_format;     /**< Set by a private option. */
     int width, height;      /**< Set by a private option. */
     AVRational framerate;   /**< Set by a private option. */
     int loop;
     int pattern_type; /**< PatternType */
     int use_glob;
#if HAVE_GLOB
     glob_t globstate;
#endif
     int start_number;
     int start_number_range;
     int frame_size;
     int ts_from_file;
     int export_path_metadata; /**< enabled when set to 1. */
} VideoDemuxData;

In img2dec.c:

#define COMMON_OPTIONS \
     { "framerate",    "set the video framerate", OFFSET(framerate),    
AV_OPT_TYPE_VIDEO_RATE, {.str = "25"}, 0, INT_MAX, DEC }, \
     { "pixel_format", "set video pixel format", OFFSET(pixel_format), 
AV_OPT_TYPE_STRING,     {.str = NULL}, 0, 0,       DEC }, \
     { "video_size",   "set video size", OFFSET(width),        
AV_OPT_TYPE_IMAGE_SIZE, {.str = NULL}, 0, 0,       DEC }, \
     { "loop",         "force loop over input file sequence", 
OFFSET(loop), AV_OPT_TYPE_BOOL,   {.i64 = 0   }, 0, 1,       DEC }, \
     { "export_path_metadata", "enable metadata containing input path 
information", OFFSET(export_path_metadata), AV_OPT_TYPE_BOOL, {.i64 = 
0   }, 0, 1,       DEC }, \
     { NULL },

Alex.
Marton Balint Dec. 29, 2019, 9:35 p.m. UTC | #8
On Sun, 29 Dec 2019, Alexandre Heitor Schmidt wrote:

> On 26/12/2019 14:01, Alexandre Heitor Schmidt wrote:
> > > you may have to only activate this (or at least the full path 
> metadata) feature
> > > if the user explicitly requests it. Probably the best way to do 
> that is to
> > > introduce a new option of the image2 demuxer, probably with an 
> AV_OPT_TYPE_FLAGS
> > > type.
> >
> > Do you mean image2 should, for example, have another option like 
> 'enable_path_metadata' which would make the demuxer export the metadata 
> only when set to "1"?
>
> Can I add a new parameter to VideoDemuxData in img2.h, named 
> export_path_metadata, which, when explicitly set to 1, will allow 
> exporting input path as the special metadata parameters 
> (lavf.image2dec.source_path and lavf.image2dec.source_basename)? The 
> default will be not to export them, so it won't compromise security, as 
> discussed previously in this thread.

Yeah, this seems fine as solution.

Thanks,
Marton

>
> Something like this:
>
> In img2.h:
>
> typedef struct VideoDemuxData {
>     const AVClass *class;  /**< Class for private options. */
>     int img_first;
>     int img_last;
>     int img_number;
>     int64_t pts;
>     int img_count;
>     int is_pipe;
>     int split_planes;       /**< use independent file for each Y, U, V 
> plane */
>     char path[1024];
>     char *pixel_format;     /**< Set by a private option. */
>     int width, height;      /**< Set by a private option. */
>     AVRational framerate;   /**< Set by a private option. */
>     int loop;
>     int pattern_type; /**< PatternType */
>     int use_glob;
> #if HAVE_GLOB
>     glob_t globstate;
> #endif
>     int start_number;
>     int start_number_range;
>     int frame_size;
>     int ts_from_file;
>     int export_path_metadata; /**< enabled when set to 1. */
> } VideoDemuxData;
>
> In img2dec.c:
>
> #define COMMON_OPTIONS \
>     { "framerate",    "set the video framerate", OFFSET(framerate),    
> AV_OPT_TYPE_VIDEO_RATE, {.str = "25"}, 0, INT_MAX, DEC }, \
>     { "pixel_format", "set video pixel format", OFFSET(pixel_format), 
> AV_OPT_TYPE_STRING,     {.str = NULL}, 0, 0,       DEC }, \
>     { "video_size",   "set video size", OFFSET(width),        
> AV_OPT_TYPE_IMAGE_SIZE, {.str = NULL}, 0, 0,       DEC }, \
>     { "loop",         "force loop over input file sequence", 
> OFFSET(loop), AV_OPT_TYPE_BOOL,   {.i64 = 0   }, 0, 1,       DEC }, \
>     { "export_path_metadata", "enable metadata containing input path 
> information", OFFSET(export_path_metadata), AV_OPT_TYPE_BOOL, {.i64 = 
> 0   }, 0, 1,       DEC }, \
>     { NULL },
>
> Alex.
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
diff mbox

Patch

>From 95ebffed3784ba12a32966128136e8deae0d8386 Mon Sep 17 00:00:00 2001
From: Alexandre Heitor Schmidt <alexandre.schmidt@gmail.com>
Date: Sat, 21 Dec 2019 16:19:31 +0000
Subject: [PATCH] avformat/image2: Metadata identifying the source path of 
 input filename and documentation for its usage.

libavformat/img2dec.c: Modify image2 demuxer to make available
two special metadata entries called lavf.image2dec.source_path
and lavf.image2dec.source_basename, which represents, respectively,
the complete path to the source image for the current frame and
the basename i.e. the file name related to the current frame.
These can then be used by filters like drawtext and others.

doc/demuxers.texti: Documented the two special metadata tags.

doc/filters.texti: Added an usage example for these tags.

Fixes #2874.

Signed-off-by: Alexandre Heitor Schmidt <alexandre.schmidt@gmail.com>
---
 doc/demuxers.texi     | 11 +++++++++++
 doc/filters.texi      |  7 +++++++
 libavformat/img2dec.c | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+)

diff --git a/doc/demuxers.texi b/doc/demuxers.texi
index 0d13bdd1b3..96cff22267 100644
--- a/doc/demuxers.texi
+++ b/doc/demuxers.texi
@@ -362,6 +362,17 @@  determine the format of the images contained in the files.
 The size, the pixel format, and the format of each image must be the
 same for all the files in the sequence.
 
+Along with the metadata found within each file, two special metadata
+fields are made available for other filters (see @var{drawtext} filter
+for examples):
+
+@table @option
+@item lavf.image2dec.source_path
+Corresponds to the full path to the input image being read.
+@item lavf.image2dec.source_basename
+Corresponds to the name of the file being read.
+@end table
+
 This demuxer accepts the following options:
 @table @option
 @item framerate
diff --git a/doc/filters.texi b/doc/filters.texi
index 8c5d3a5760..6611d5977c 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -9872,6 +9872,13 @@  drawtext=fontfile=FreeSans.ttf:text=DOG:fontsize=24:x=10:y=20+24-max_glyph_a,
 drawtext=fontfile=FreeSans.ttf:text=cow:fontsize=24:x=80:y=20+24-max_glyph_a
 @end example
 
+@item
+Plot special @var{lavf.image2dec.source_basename} metadata onto each frame if
+such metadata exists. Otherwise, plot the string "NA".
+@example
+drawtext="fontsize=20:fontcolor=white:fontfile=FreeSans.ttf:text='%@{metadata\:lavf.image2dec.source_basename\:NA@}':x=10:y=10"
+@end example
+
 @end itemize
 
 For more information about libfreetype, check:
diff --git a/libavformat/img2dec.c b/libavformat/img2dec.c
index f8b4a655a5..86f9f1f657 100644
--- a/libavformat/img2dec.c
+++ b/libavformat/img2dec.c
@@ -374,6 +374,32 @@  int ff_img_read_header(AVFormatContext *s1)
     return 0;
 }
 
+/**
+ * Add this frame's source path and basename to packet's sidedata
+ * as a dictionary, so it can be used by filters like 'drawtext'.
+ */
+static int add_filename_as_pkt_side_data(char *filename, AVPacket *pkt) {
+    uint8_t* metadata;
+    int metadata_len;
+    AVDictionary *d = NULL;
+    char *packed_metadata = NULL;
+
+    av_dict_set(&d, "lavf.image2dec.source_path", filename, 0);
+    av_dict_set(&d, "lavf.image2dec.source_basename", av_basename(filename), 0);
+
+    packed_metadata = av_packet_pack_dictionary(d, &metadata_len);
+    if (!packed_metadata)
+        return AVERROR(ENOMEM);
+    if (!(metadata = av_packet_new_side_data(pkt, AV_PKT_DATA_STRINGS_METADATA, metadata_len))) {
+        av_freep(&packed_metadata);
+        return AVERROR(ENOMEM);
+    }
+    memcpy(metadata, packed_metadata, metadata_len);
+    av_freep(&packed_metadata);
+
+    return 0;
+}
+
 int ff_img_read_packet(AVFormatContext *s1, AVPacket *pkt)
 {
     VideoDemuxData *s = s1->priv_data;
@@ -486,6 +512,12 @@  int ff_img_read_packet(AVFormatContext *s1, AVPacket *pkt)
     if (s->is_pipe)
         pkt->pos = avio_tell(f[0]);
 
+    if (!s->is_pipe) {
+        res = add_filename_as_pkt_side_data(filename, pkt);
+        if (res < 0)
+            goto fail;
+    }
+
     pkt->size = 0;
     for (i = 0; i < 3; i++) {
         if (f[i]) {
-- 
2.17.1