diff mbox series

[FFmpeg-devel,v2,1/2] avutil/wchar_filename, file_open: Support long file names on Windows

Message ID b66dbdf40c1984c3fa8b759ee82a0e205376102f.1652653071.git.ffmpegagent@gmail.com
State New
Headers show
Series Support long file names on Windows | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

softworkz May 15, 2022, 10:17 p.m. UTC
From: softworkz <softworkz@hotmail.com>

Signed-off-by: softworkz <softworkz@hotmail.com>
---
 libavutil/file_open.c      |   2 +-
 libavutil/wchar_filename.h | 156 +++++++++++++++++++++++++++++++++++++
 2 files changed, 157 insertions(+), 1 deletion(-)

Comments

Nil Admirari May 16, 2022, 8:12 a.m. UTC | #1
> +static inline int path_is_extended(const wchar_t *path)
> +{
> + size_t len = wcslen(path);
> + if (len >= 4 && path[0] == L'\\' && (path[1] == L'\\' || path[1] == L'?') && path[2] == L'?' && path[3] == L'\\')

Length check is probably unnecessary: comparisons will reject '\0'
and further comparisons won't run due to short-circuiting.

> + // The length of unc_prefix is 6 plus 1 for terminating zeros
> + temp_w = (wchar_t *)av_calloc(len + 6 + 1, sizeof(wchar_t));

Not really true. The length of unc_prefix is 8.
2 is subtracted because UNC path already has \\ at the beginning.

> + if (len >= 260 || (*ppath_w)[len - 1] == L' ' || (*ppath_w)[len - 1] == L'.') {

1. Please change 260 to MAX_PATH.
2. GetFullPathName removes trailing spaces and dots, so the second part is always false.
Soft Works May 16, 2022, 9:14 p.m. UTC | #2
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of nil-
> admirari@mailo.com
> Sent: Monday, May 16, 2022 10:12 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> file_open: Support long file names on Windows
> 
> > +static inline int path_is_extended(const wchar_t *path)
> > +{
> > + size_t len = wcslen(path);
> > + if (len >= 4 && path[0] == L'\\' && (path[1] == L'\\' || path[1]
> == L'?') && path[2] == L'?' && path[3] == L'\\')
> 
> Length check is probably unnecessary: comparisons will reject '\0'
> and further comparisons won't run due to short-circuiting.

Yup, I think you're right, even though it would appear "unsafe" at
first sight. I've removed the length check.


> > + // The length of unc_prefix is 6 plus 1 for terminating zeros
> > + temp_w = (wchar_t *)av_calloc(len + 6 + 1, sizeof(wchar_t));
> 
> Not really true. The length of unc_prefix is 8.
> 2 is subtracted because UNC path already has \\ at the beginning.

Correct. It actually needs to be "len - 2 + 8 + 1".
I've updated the comment and the calculation.


> > + if (len >= 260 || (*ppath_w)[len - 1] == L' ' || (*ppath_w)[len -
> 1] == L'.') {
> 
> 1. Please change 260 to MAX_PATH.

Done.


> 2. GetFullPathName removes trailing spaces and dots, so the second
> part is always false.

Yea, when someone would want to handle such (weird) kind of path,
one would need to specify an extended path directly.
Removed the second part.


> > We already have win32_stat, but what's a bit tricky is that the
> > struct that this function takes as a parameter is named the same
> > as the function itself.
> 
> Sorry, I thought is was a definition of a function, not a struct.
> Since stat function is already defined as win32_stat,
> It's better to revert the changes.

stat wasn't already defined as win32_stat.
win32_stat was already defined but not mapped. That's what my change
does. 


> > The functions are needed in both. file_open.c cannot be included
> > in libavformat/os_support.h and neither the other way round,
> > so they need to be in a 3rd place. How about renaming
> > wchar_filename.h to windows_filename.h ?
> 
> Probably it's better to rename.

OK, but let's do this in subsequent patch shortly after.


> > I have skipped those checks because we won't have partially
> qualified
> > paths at this point (due to having called GetFullPathNameW) and
> > device paths are not allowed to be longer than 260, so this it might
> > happen that the UNC prefix gets added, but only when it's a long
> > path which doesn't work anyway (I've tested those cases).
> 
> I think it's better to test for \\.\ explicitly in path_is_extended:
> 1. It's not obvious that \\.\ aren't allowed to be long.
> 2. Probably FFmpeg is not going to have a longPathAware manifest,
>    but it can be linked with an EXE with such a manifest.
>    Would MAX_PATH restriction still apply?

That's a good question, but we need to be clear that device paths are
actually intended for accessing devices, e.g. like \\.\COM1
The fact that the prefix also works with a drive-letter path
is more due to some heritage and nobody would normally want to 
use such kind of paths to access files.

That being said, I've added a check (path_is_device_path()) for 
this now in the same way as .NET is doing it - which means
inside add_extended_prefix().


> You have the checks inside of get_extended_win32_path and none
> inside of add_extended_prefix. Yet add_extended_prefix can be called
> by anyone: it's not private. Thus add_extended_prefix either should be
> inlined,
> or it should have the necessary checks in place. Otherwise you end up
> with
> an API that's easy to use incorrectly and hard to use correctly, and
> it should be the other way around.

I have added checks there now for both device path and extended path
prefix. I have also clarified the function doc even further, so I 
hope it's sufficiently clear now. ;-)


> > And what's the point about this?
> 
> Point is obvious: extended paths are difficult to handle correctly.
> get_extended_win32_path cannot be used on its own, only as a final
> step before getting FILE* or a file descriptor.

Yes, that's how it's meant to be used, so the whole application can
be kept free from dealing with it.


Thanks again for the review, these were some good improvements.

Kind regards,
softworkz
Nil Admirari May 17, 2022, 3:06 p.m. UTC | #3
> stat wasn't already defined as win32_stat.
> win32_stat was already defined but not mapped. That's what my change
> does. 

There are two defines in os_support.h:

#  ifdef stat
#   undef stat
#  endif
#  define stat _stati64

and

DEF_FS_FUNCTION2(stat, _wstati64, _stati64, struct stat*)

which defines win32_stat (not stat). This function takes struct stat*, which due to previous define
expands into struct _stati64*.

_stati64 and _wstati64 both take struct _stati64*, which is named identically to the first function.
struct _stati64 expands into different structs depending on the value of _USE_32BIT_TIME_T,
which your explicit structure definition does not capture, see:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/stat-functions?view=msvc-170.
If someone defines_USE_32BIT_TIME_T, your code will fail to compile.

C allows functions and structs to have identical names, preprocessor does not;
therefore win32_stat must be used explicitly where stat is required as in file.c:160

    struct stat st; // expands into struct _stati64 on Windows.
#   ifndef _WIN32
    ret = stat(filename, &st);
#   else
    ret = win32_stat(filename, &st);
#   endif

However, no everyone follows: img2dec.c:504 and ipfsgateway.c:104 use plain stat.

if (stat(filename, &img_stat)) {
stat_ret = stat(ipfs_full_data_folder, &st);

In these files, on Windows, both the struct and the function call expand into _stati64,
and this time the function call bypasses the UTF-8 to wchar conversion.

Apparently yet another macro is necessary:

#ifdef _WIN32
#define ff_stat win32_stat
#else
#define ff_stat stat
#endif
Soft Works May 17, 2022, 3:28 p.m. UTC | #4
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of nil-
> admirari@mailo.com
> Sent: Tuesday, May 17, 2022 5:07 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> file_open: Support long file names on Windows
> 
> > stat wasn't already defined as win32_stat.
> > win32_stat was already defined but not mapped. That's what my change
> > does.
> 
> There are two defines in os_support.h:
> 
> #  ifdef stat
> #   undef stat
> #  endif
> #  define stat _stati64
> 
> and
> 
> DEF_FS_FUNCTION2(stat, _wstati64, _stati64, struct stat*)
> 
> which defines win32_stat (not stat). This function takes struct stat*,
> which due to previous define
> expands into struct _stati64*.
> 
> _stati64 and _wstati64 both take struct _stati64*, which is named
> identically to the first function.
> struct _stati64 expands into different structs depending on the value
> of _USE_32BIT_TIME_T,
> which your explicit structure definition does not capture, see:
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/stat-
> functions?view=msvc-170.
> If someone defines_USE_32BIT_TIME_T, your code will fail to compile.

Yes, that's true. But there are hundreds of other things someone could
define which makes compilation fail. 
We don't need to accommodate for every single possibility, and it's not
that _USE_32BIT_TIME_T would be required or the default for 32bit 
compilation.


> C allows functions and structs to have identical names, preprocessor
> does not;
> therefore win32_stat must be used explicitly where stat is required as
> in file.c:160

Except when you define a compatible struct with the same name as the
function - like I did.


>     struct stat st; // expands into struct _stati64 on Windows.
> #   ifndef _WIN32
>     ret = stat(filename, &st);
> #   else
>     ret = win32_stat(filename, &st);
> #   endif

This could be removed after the patch.


> However, no everyone follows: img2dec.c:504 and ipfsgateway.c:104 use
> plain stat.
> 
> if (stat(filename, &img_stat)) {
> stat_ret = stat(ipfs_full_data_folder, &st);
> 
> In these files, on Windows, both the struct and the function call
> expand into _stati64,
> and this time the function call bypasses the UTF-8 to wchar
> conversion.
> 
> Apparently yet another macro is necessary:
> 
> #ifdef _WIN32
> #define ff_stat win32_stat
> #else
> #define ff_stat stat
> #endif

Probably you didn't spot it. It's already there:

#  define stat win32_stat

Kind regards,
softworkz
Soft Works May 17, 2022, 3:43 p.m. UTC | #5
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Soft
> Works
> Sent: Tuesday, May 17, 2022 5:28 PM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> file_open: Support long file names on Windows
> 
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> nil-
> > admirari@mailo.com
> > Sent: Tuesday, May 17, 2022 5:07 PM
> > To: ffmpeg-devel@ffmpeg.org
> > Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> > file_open: Support long file names on Windows
> >
> > > stat wasn't already defined as win32_stat.
> > > win32_stat was already defined but not mapped. That's what my
> change
> > > does.
> >
> > There are two defines in os_support.h:
> >
> > #  ifdef stat
> > #   undef stat
> > #  endif
> > #  define stat _stati64
> >
> > and
> >
> > DEF_FS_FUNCTION2(stat, _wstati64, _stati64, struct stat*)
> >
> > which defines win32_stat (not stat). This function takes struct
> stat*,
> > which due to previous define
> > expands into struct _stati64*.
> >
> > _stati64 and _wstati64 both take struct _stati64*, which is named
> > identically to the first function.
> > struct _stati64 expands into different structs depending on the
> value
> > of _USE_32BIT_TIME_T,
> > which your explicit structure definition does not capture, see:
> > https://docs.microsoft.com/en-us/cpp/c-runtime-
> library/reference/stat-
> > functions?view=msvc-170.
> > If someone defines_USE_32BIT_TIME_T, your code will fail to compile.
> 
> Yes, that's true. But there are hundreds of other things someone could
> define which makes compilation fail.
> We don't need to accommodate for every single possibility, and it's
> not
> that _USE_32BIT_TIME_T would be required or the default for 32bit
> compilation.
> 
> 
> > C allows functions and structs to have identical names, preprocessor
> > does not;
> > therefore win32_stat must be used explicitly where stat is required
> as
> > in file.c:160
> 
> Except when you define a compatible struct with the same name as the
> function - like I did.

I don't want to say that I'd consider this to be a great solution.
But the problem is that the function and struct names are identical
and when we want to re-define/map the function, we also need to
provide a struct of that name because the macro-replacement can't
work selectively.

Maybe there is some cool trick to handle this, yet I haven't had
a better idea so far.

Thanks,
softworkz
Nil Admirari May 20, 2022, 5:51 p.m. UTC | #6
> Yes, that's true. But there are hundreds of other things someone could
> define which makes compilation fail.

Doesn't mean that yet another such thing should be added by incorrectly
redefining structs already defined correctly by system headers.

> Probably you didn't spot it. It's already there:
> # define stat win32_stat

I'm actually wondering how does it even compile. All stat structs in code
become struct win32_stat, and all calls to stat become calls to win32_stat,
which in turn wraps _wstati64. But _wstati64 does not accept struct win32_stat*,
it accepts struct _stati64*. Content of these structs is probably identical, but
it should not matter: C is typed nominally, not structurally.

> I don't want to say that I'd consider this to be a great solution.
> But the problem is that the function and struct names are identical
> and when we want to re-define/map the function, we also need to
> provide a struct of that name because the macro-replacement can't
> work selectively.

Doesn't mean that the should be named identically in FFmpeg code.
Naming a struct stat and a function avpriv_stat is a reasonable choice.
You can even define avpriv_stat with parameters the way fstat is defined:

#ifdef _WIN32
#define avpriv_stat(f,s) win32_stat((f), (s))
#else
#define avpriv_stat(f,s) stat((f), (s))
#endif
Soft Works May 20, 2022, 6:03 p.m. UTC | #7
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of nil-
> admirari@mailo.com
> Sent: Friday, May 20, 2022 7:52 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> file_open: Support long file names on Windows
> 
> > Yes, that's true. But there are hundreds of other things someone could
> > define which makes compilation fail.
> 
> Doesn't mean that yet another such thing should be added by incorrectly
> redefining structs already defined correctly by system headers.
> 
> > Probably you didn't spot it. It's already there:
> > # define stat win32_stat
> 
> I'm actually wondering how does it even compile. All stat structs in code
> become struct win32_stat, and all calls to stat become calls to
> win32_stat,
> which in turn wraps _wstati64. But _wstati64 does not accept struct
> win32_stat*,
> it accepts struct _stati64*. Content of these structs is probably
> identical, but
> it should not matter: C is typed nominally, not structurally.
> 
> > I don't want to say that I'd consider this to be a great solution.
> > But the problem is that the function and struct names are identical
> > and when we want to re-define/map the function, we also need to
> > provide a struct of that name because the macro-replacement can't
> > work selectively.
> 
> Doesn't mean that the should be named identically in FFmpeg code.
> Naming a struct stat and a function avpriv_stat is a reasonable choice.
> You can even define avpriv_stat with parameters the way fstat is defined:
> 
> #ifdef _WIN32
> #define avpriv_stat(f,s) win32_stat((f), (s))
> #else
> #define avpriv_stat(f,s) stat((f), (s))
> #endif

I thought the purpose of all those re-mappings would be that plain
Posix functions can still be used..?
It's already the Posix declaration where the function name 
is the same as the structure name (stat).

I'm unbiased. avpriv_stat() would be ok from my POV as well, but 
I'm not sure whether all others would agree when plain stat could
no longer be used (without breaking long filename support)?

Thanks,
softworkz
Nil Admirari May 21, 2022, 11:08 a.m. UTC | #8
> I thought the purpose of all those re-mappings would be that plain
> Posix functions can still be used..?
> It's already the Posix declaration where the function name 
> is the same as the structure name (stat).

Not possible for stat precisely because of function and struct sharing a name.
That's why stat is used as

# ifndef _WIN32
ret = stat(filename, &st);
# else
ret = win32_stat(filename, &st);
# endif

Such blocks are either to be copied across the application,
or hidden behind yet another macro.
Soft Works May 21, 2022, 11:12 a.m. UTC | #9
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of nil-
> admirari@mailo.com
> Sent: Saturday, May 21, 2022 1:08 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> file_open: Support long file names on Windows
> 
> > I thought the purpose of all those re-mappings would be that plain
> > Posix functions can still be used..?
> > It's already the Posix declaration where the function name
> > is the same as the structure name (stat).
> 
> Not possible for stat precisely because of function and struct sharing a
> name.

That's exactly what is said above and before :-)

> That's why stat is used as
> 
> # ifndef _WIN32
> ret = stat(filename, &st);
> # else
> ret = win32_stat(filename, &st);
> # endif
> 
> Such blocks are either to be copied across the application,
> or hidden behind yet another macro.

The latter is what my patchset already does.


Kind regards,
sw
Nil Admirari May 23, 2022, 3:35 p.m. UTC | #10
>> Not possible for stat precisely because of function and struct sharing a
>> name.

> That's exactly what is said above and before :-)

My previous question was not answered, so I had to look up the answer myself.

> I'm actually wondering how does it even compile. All stat structs in code
> become struct win32_stat, and all calls to stat become calls to win32_stat,
> which in turn wraps _wstati64. But _wstati64 does not accept struct win32_stat*,
> it accepts struct _stati64*. Content of these structs is probably identical, but
> it should not matter: C is typed nominally, not structurally.

Turns out C actually has a concept of compatible types:
https://en.cppreference.com/w/c/language/type.

The problems is:
> they are both structure/union/enumeration types, and
> - (c99) if one is declared with a tag, the other must also be declared with the same tag.
> ...
> If two declarations refer to the same object or function and do not use compatible types,
> the behavior of the program is undefined.

Your structure does not have the same tag as the CRT's one.
Are you sure you want to rely on undefined behaviour?

I haven't compiled your code, but the following simple example:

struct A { int a, b, c; };
struct B { int a, b, c; };
void printA(struct A *a);

struct B b = { 1, 2, 3 };
printA(&b);

generates a

warning: passing argument 1 of ‘printA’ from incompatible pointer type [-Wincompatible-pointer-types]
    |     printA(&b);
    |            ^~
    |            |
    |            struct B *
note: expected ‘struct A *’ but argument is of type ‘struct B *’
    | void printA(struct A *a)

Are you sure you wanna add a couple of similar warnings to the project?

Needless to repeat, _USE_32BIT_TIME_T is not supported.
Soft Works May 23, 2022, 3:47 p.m. UTC | #11
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of nil-
> admirari@mailo.com
> Sent: Monday, May 23, 2022 5:36 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> file_open: Support long file names on Windows
> 
> >> Not possible for stat precisely because of function and struct sharing
> a
> >> name.
> 
> > That's exactly what is said above and before :-)
> 
> My previous question was not answered, so I had to look up the answer
> myself.
> 
> > I'm actually wondering how does it even compile. All stat structs in
> code
> > become struct win32_stat, and all calls to stat become calls to
> win32_stat,
> > which in turn wraps _wstati64. But _wstati64 does not accept struct
> win32_stat*,
> > it accepts struct _stati64*. Content of these structs is probably
> identical, but
> > it should not matter: C is typed nominally, not structurally.
> 
> Turns out C actually has a concept of compatible types:
> https://en.cppreference.com/w/c/language/type.
> 
> The problems is:
> > they are both structure/union/enumeration types, and
> > - (c99) if one is declared with a tag, the other must also be declared
> with the same tag.
> > ...
> > If two declarations refer to the same object or function and do not use
> compatible types,
> > the behavior of the program is undefined.
> 
> Your structure does not have the same tag as the CRT's one.
> Are you sure you want to rely on undefined behaviour?
> 
> I haven't compiled your code, but the following simple example:
> 
> struct A { int a, b, c; };
> struct B { int a, b, c; };
> void printA(struct A *a);
> 
> struct B b = { 1, 2, 3 };
> printA(&b);
> 
> generates a
> 
> warning: passing argument 1 of ‘printA’ from incompatible pointer type [-
> Wincompatible-pointer-types]
>     |     printA(&b);
>     |            ^~
>     |            |
>     |            struct B *
> note: expected ‘struct A *’ but argument is of type ‘struct B *’
>     | void printA(struct A *a)
> 
> Are you sure you wanna add a couple of similar warnings to the project?

This is not what's happening. No warnings, not even from clang diagnostics
with -Weverything.


> Needless to repeat, _USE_32BIT_TIME_T is not supported.

I don't think it ever was. Have you been compiling and using ffmpeg 
successfully with this?

Kind regards,
sw
Hendrik Leppkes May 23, 2022, 5:12 p.m. UTC | #12
On Mon, May 23, 2022 at 5:48 PM Soft Works <softworkz@hotmail.com> wrote:
>
>
>
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of nil-
> > admirari@mailo.com
> > Sent: Monday, May 23, 2022 5:36 PM
> > To: ffmpeg-devel@ffmpeg.org
> > Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> > file_open: Support long file names on Windows
> >
> > >> Not possible for stat precisely because of function and struct sharing
> > a
> > >> name.
> >
> > > That's exactly what is said above and before :-)
> >
> > My previous question was not answered, so I had to look up the answer
> > myself.
> >
> > > I'm actually wondering how does it even compile. All stat structs in
> > code
> > > become struct win32_stat, and all calls to stat become calls to
> > win32_stat,
> > > which in turn wraps _wstati64. But _wstati64 does not accept struct
> > win32_stat*,
> > > it accepts struct _stati64*. Content of these structs is probably
> > identical, but
> > > it should not matter: C is typed nominally, not structurally.
> >
> > Turns out C actually has a concept of compatible types:
> > https://en.cppreference.com/w/c/language/type.
> >
> > The problems is:
> > > they are both structure/union/enumeration types, and
> > > - (c99) if one is declared with a tag, the other must also be declared
> > with the same tag.
> > > ...
> > > If two declarations refer to the same object or function and do not use
> > compatible types,
> > > the behavior of the program is undefined.
> >
> > Your structure does not have the same tag as the CRT's one.
> > Are you sure you want to rely on undefined behaviour?
> >
> > I haven't compiled your code, but the following simple example:
> >
> > struct A { int a, b, c; };
> > struct B { int a, b, c; };
> > void printA(struct A *a);
> >
> > struct B b = { 1, 2, 3 };
> > printA(&b);
> >
> > generates a
> >
> > warning: passing argument 1 of ‘printA’ from incompatible pointer type [-
> > Wincompatible-pointer-types]
> >     |     printA(&b);
> >     |            ^~
> >     |            |
> >     |            struct B *
> > note: expected ‘struct A *’ but argument is of type ‘struct B *’
> >     | void printA(struct A *a)
> >
> > Are you sure you wanna add a couple of similar warnings to the project?
>
> This is not what's happening. No warnings, not even from clang diagnostics
> with -Weverything.
>
>
> > Needless to repeat, _USE_32BIT_TIME_T is not supported.
>
> I don't think it ever was. Have you been compiling and using ffmpeg
> successfully with this?
>

_USE_32BIT_TIME_T is actually the default for 32-bit mingw-w64 builds,
so its certainly a situation on windows that needs to work - and does
in master.

- Hendrik
Soft Works May 24, 2022, 5:32 a.m. UTC | #13
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Hendrik
> Leppkes
> Sent: Monday, May 23, 2022 7:12 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> file_open: Support long file names on Windows
> 
> On Mon, May 23, 2022 at 5:48 PM Soft Works <softworkz@hotmail.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of nil-
> > > admirari@mailo.com
> > > Sent: Monday, May 23, 2022 5:36 PM
> > > To: ffmpeg-devel@ffmpeg.org
> > > Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> > > file_open: Support long file names on Windows
> > >
> > > >> Not possible for stat precisely because of function and struct
> sharing
> > > a
> > > >> name.
> > >
> > > > That's exactly what is said above and before :-)
> > >
> > > My previous question was not answered, so I had to look up the answer
> > > myself.
> > >
> > > > I'm actually wondering how does it even compile. All stat structs in
> > > code
> > > > become struct win32_stat, and all calls to stat become calls to
> > > win32_stat,
> > > > which in turn wraps _wstati64. But _wstati64 does not accept struct
> > > win32_stat*,
> > > > it accepts struct _stati64*. Content of these structs is probably
> > > identical, but
> > > > it should not matter: C is typed nominally, not structurally.
> > >
> > > Turns out C actually has a concept of compatible types:
> > > https://en.cppreference.com/w/c/language/type.
> > >
> > > The problems is:
> > > > they are both structure/union/enumeration types, and
> > > > - (c99) if one is declared with a tag, the other must also be
> declared
> > > with the same tag.
> > > > ...
> > > > If two declarations refer to the same object or function and do not
> use
> > > compatible types,
> > > > the behavior of the program is undefined.
> > >
> > > Your structure does not have the same tag as the CRT's one.
> > > Are you sure you want to rely on undefined behaviour?
> > >
> > > I haven't compiled your code, but the following simple example:
> > >
> > > struct A { int a, b, c; };
> > > struct B { int a, b, c; };
> > > void printA(struct A *a);
> > >
> > > struct B b = { 1, 2, 3 };
> > > printA(&b);
> > >
> > > generates a
> > >
> > > warning: passing argument 1 of ‘printA’ from incompatible pointer type
> [-
> > > Wincompatible-pointer-types]
> > >     |     printA(&b);
> > >     |            ^~
> > >     |            |
> > >     |            struct B *
> > > note: expected ‘struct A *’ but argument is of type ‘struct B *’
> > >     | void printA(struct A *a)
> > >
> > > Are you sure you wanna add a couple of similar warnings to the
> project?
> >
> > This is not what's happening. No warnings, not even from clang
> diagnostics
> > with -Weverything.
> >
> >
> > > Needless to repeat, _USE_32BIT_TIME_T is not supported.
> >
> > I don't think it ever was. Have you been compiling and using ffmpeg
> > successfully with this?
> >
> 
> _USE_32BIT_TIME_T is actually the default for 32-bit mingw-w64 builds,
> so its certainly a situation on windows that needs to work - and does
> in master.

Are you sure? It is not the default for 32 bit builds with MSVC.

But it could be made working with the way that I'm using in my current 
patchset (with a conditional define). So that's not really the question.

The primary question is whether the (admittedly) ugly way I'm
using is acceptable at all.

Suggestions are very welcome ;-) 
Would you have an idea perhaps?

Thanks,
softworkz
Soft Works May 24, 2022, 5:54 a.m. UTC | #14
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Soft
> Works
> Sent: Tuesday, May 24, 2022 7:33 AM
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> file_open: Support long file names on Windows
> 
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Hendrik
> > Leppkes
> > Sent: Monday, May 23, 2022 7:12 PM
> > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> > Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> > file_open: Support long file names on Windows
> >
> > On Mon, May 23, 2022 at 5:48 PM Soft Works <softworkz@hotmail.com>
> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> nil-
> > > > admirari@mailo.com
> > > > Sent: Monday, May 23, 2022 5:36 PM
> > > > To: ffmpeg-devel@ffmpeg.org
> > > > Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> > > > file_open: Support long file names on Windows
> > > >
> > > > >> Not possible for stat precisely because of function and struct
> > sharing
> > > > a
> > > > >> name.
> > > >
> > > > > That's exactly what is said above and before :-)
> > > >
> > > > My previous question was not answered, so I had to look up the
> answer
> > > > myself.
> > > >
> > > > > I'm actually wondering how does it even compile. All stat structs
> in
> > > > code
> > > > > become struct win32_stat, and all calls to stat become calls to
> > > > win32_stat,
> > > > > which in turn wraps _wstati64. But _wstati64 does not accept
> struct
> > > > win32_stat*,
> > > > > it accepts struct _stati64*. Content of these structs is probably
> > > > identical, but
> > > > > it should not matter: C is typed nominally, not structurally.
> > > >
> > > > Turns out C actually has a concept of compatible types:
> > > > https://en.cppreference.com/w/c/language/type.
> > > >
> > > > The problems is:
> > > > > they are both structure/union/enumeration types, and
> > > > > - (c99) if one is declared with a tag, the other must also be
> > declared
> > > > with the same tag.
> > > > > ...
> > > > > If two declarations refer to the same object or function and do
> not
> > use
> > > > compatible types,
> > > > > the behavior of the program is undefined.
> > > >
> > > > Your structure does not have the same tag as the CRT's one.
> > > > Are you sure you want to rely on undefined behaviour?
> > > >
> > > > I haven't compiled your code, but the following simple example:
> > > >
> > > > struct A { int a, b, c; };
> > > > struct B { int a, b, c; };
> > > > void printA(struct A *a);
> > > >
> > > > struct B b = { 1, 2, 3 };
> > > > printA(&b);
> > > >
> > > > generates a
> > > >
> > > > warning: passing argument 1 of ‘printA’ from incompatible pointer
> type
> > [-
> > > > Wincompatible-pointer-types]
> > > >     |     printA(&b);
> > > >     |            ^~
> > > >     |            |
> > > >     |            struct B *
> > > > note: expected ‘struct A *’ but argument is of type ‘struct B *’
> > > >     | void printA(struct A *a)
> > > >
> > > > Are you sure you wanna add a couple of similar warnings to the
> > project?
> > >
> > > This is not what's happening. No warnings, not even from clang
> > diagnostics
> > > with -Weverything.
> > >
> > >
> > > > Needless to repeat, _USE_32BIT_TIME_T is not supported.
> > >
> > > I don't think it ever was. Have you been compiling and using ffmpeg
> > > successfully with this?
> > >
> >
> > _USE_32BIT_TIME_T is actually the default for 32-bit mingw-w64 builds,
> > so its certainly a situation on windows that needs to work - and does
> > in master.
> 
> Are you sure? It is not the default for 32 bit builds with MSVC.

Sorry, you're right. I found it. It's in _mingw.h.

I'll update the patch.

Thanks,
sw
Soft Works May 24, 2022, 9:47 a.m. UTC | #15
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Soft
> Works
> Sent: Monday, May 23, 2022 5:48 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> file_open: Support long file names on Windows
> 
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of nil-
> > admirari@mailo.com
> > Sent: Monday, May 23, 2022 5:36 PM
> > To: ffmpeg-devel@ffmpeg.org
> > Subject: Re: [FFmpeg-devel] [PATCH v2 1/2] avutil/wchar_filename,
> > file_open: Support long file names on Windows
> >
> > >> Not possible for stat precisely because of function and struct
> sharing
> > a
> > >> name.
> >
> > > That's exactly what is said above and before :-)
> >
> > My previous question was not answered, so I had to look up the answer
> > myself.
> >
> > > I'm actually wondering how does it even compile. All stat structs in
> > code
> > > become struct win32_stat, and all calls to stat become calls to
> > win32_stat,
> > > which in turn wraps _wstati64. But _wstati64 does not accept struct
> > win32_stat*,
> > > it accepts struct _stati64*. Content of these structs is probably
> > identical, but
> > > it should not matter: C is typed nominally, not structurally.
> >
> > Turns out C actually has a concept of compatible types:
> > https://en.cppreference.com/w/c/language/type.
> >
> > The problems is:
> > > they are both structure/union/enumeration types, and
> > > - (c99) if one is declared with a tag, the other must also be declared
> > with the same tag.
> > > ...
> > > If two declarations refer to the same object or function and do not
> use
> > compatible types,
> > > the behavior of the program is undefined.
> >
> > Your structure does not have the same tag as the CRT's one.
> > Are you sure you want to rely on undefined behaviour?
> >
> > I haven't compiled your code, but the following simple example:
> >
> > struct A { int a, b, c; };
> > struct B { int a, b, c; };
> > void printA(struct A *a);
> >
> > struct B b = { 1, 2, 3 };
> > printA(&b);
> >
> > generates a
> >
> > warning: passing argument 1 of ‘printA’ from incompatible pointer type
> [-
> > Wincompatible-pointer-types]
> >     |     printA(&b);
> >     |            ^~
> >     |            |
> >     |            struct B *
> > note: expected ‘struct A *’ but argument is of type ‘struct B *’
> >     | void printA(struct A *a)
> >
> > Are you sure you wanna add a couple of similar warnings to the project?
> 
> This is not what's happening. No warnings, not even from clang diagnostics
> with -Weverything.

I'm sorry, you were right and I was wrong. The includes and macros had hidden
away all the warnings.

For my new solution which I had submitted today, I had made a test in a code
file where I put all the new things directly and expanded the macros.

This gives in fact the kind of error you mentioned:

... function': incompatible types - from 'win32_stat *' to '_stat64 *'


I had explained the way it works to Martin in another response. The 
relevant part is:

The struct:

    struct win32_stat
    {
        struct _stati64;
    };

makes use of a MS compiler feature called "anonymous structures":
This way, we automatically "inherit" the members of the struct.

Now, that still gives the incompatible type warning. 

If we would
want to get rid of this, we could define the struct as follows:


    struct win32_stat
    {
        union
        {
            struct _stati64;
            struct _stati64 stat;
        };
    };

The union is anonymous and includes _stati64 twice: once anonymous
and once named.

This would allow us to define our win32_stat function like this:


static inline int win32_stat(const char *filename_utf8, struct win32_stat *par)
{
    wchar_t *filename_w;
    int ret;
    if (get_extended_win32_path(filename_utf8, &filename_w))
        return -1;
    if (!filename_w)
        goto fallback;
    ret = _wstat64(filename_w, &par->stat);
    av_free(filename_w);
    return ret;
fallback:
      return _stat64(filename_utf8, &par->stat);
}

so it uses the ->stat member for doing the api calls while
the calling (ffmpeg) code can use the structure as if it was the 
actual POSIX stat structure.


Kind regards,
sw
Nil Admirari May 24, 2022, 12:11 p.m. UTC | #16
> If we would
> want to get rid of this, we could define the struct as follows:
>
> struct win32_stat
> {
> union
> {
> struct _stati64;
> struct _stati64 stat;
> };
> };
>
> The union is anonymous and includes _stati64 twice: once anonymous
> and once named.
>
> This would allow us to define our win32_stat function like this:
>
> static inline int win32_stat(const char *filename_utf8, struct win32_stat *par)
> {
> wchar_t *filename_w;
> int ret;
> if (get_extended_win32_path(filename_utf8, &filename_w))
> return -1;
> if (!filename_w)
> goto fallback;
> ret = _wstat64(filename_w, &par->stat);
> av_free(filename_w);
> return ret;
> fallback:
> return _stat64(filename_utf8, &par->stat);
> }
>
> so it uses the ->stat member for doing the api calls while
> the calling (ffmpeg) code can use the structure as if it was the 
> actual POSIX stat structure.

I'm fine with anonymous union and a rewrite of win32_stat.
But, anonymous unions are a C11 feature: https://en.cppreference.com/w/c/language/union,
and C11 is apparently not allowed https://ffmpeg.org/developer.html#C-language-features.
diff mbox series

Patch

diff --git a/libavutil/file_open.c b/libavutil/file_open.c
index cc302f2f76..57c5e78d51 100644
--- a/libavutil/file_open.c
+++ b/libavutil/file_open.c
@@ -45,7 +45,7 @@  static int win32_open(const char *filename_utf8, int oflag, int pmode)
     wchar_t *filename_w;
 
     /* convert UTF-8 to wide chars */
-    if (utf8towchar(filename_utf8, &filename_w))
+    if (get_extended_win32_path(filename_utf8, &filename_w))
         return -1;
     if (!filename_w)
         goto fallback;
diff --git a/libavutil/wchar_filename.h b/libavutil/wchar_filename.h
index 90f082452c..78e7dd47d7 100644
--- a/libavutil/wchar_filename.h
+++ b/libavutil/wchar_filename.h
@@ -40,6 +40,162 @@  static inline int utf8towchar(const char *filename_utf8, wchar_t **filename_w)
     MultiByteToWideChar(CP_UTF8, 0, filename_utf8, -1, *filename_w, num_chars);
     return 0;
 }
+
+/**
+ * Checks for extended path prefixes for which normalization needs to be skipped.
+ * see .NET6: PathInternal.IsExtended()
+ */
+static inline int path_is_extended(const wchar_t *path)
+{
+    size_t len = wcslen(path);
+    if (len >= 4  && path[0] == L'\\' && (path[1] == L'\\' || path[1] == L'?') && path[2] == L'?' && path[3] == L'\\')
+        return 1;
+
+    return 0;
+}
+
+/**
+ * Performs path normalization by calling GetFullPathNameW().
+ * see .NET6: PathHelper.GetFullPathName()
+ */
+static inline int get_full_path_name(wchar_t **ppath_w)
+{
+    int num_chars;
+    wchar_t *temp_w;
+
+    num_chars = GetFullPathNameW(*ppath_w, 0, NULL, NULL);
+    if (num_chars <= 0) {
+        errno = EINVAL;
+        return -1;
+    }
+
+    temp_w = (wchar_t *)av_calloc(num_chars, sizeof(wchar_t));
+    if (!temp_w) {
+        errno = ENOMEM;
+        return -1;
+    }
+
+    num_chars = GetFullPathNameW(*ppath_w, num_chars, temp_w, NULL);
+    if (num_chars <= 0) {
+        errno = EINVAL;
+        return -1;
+    }
+
+    av_freep(ppath_w);
+    *ppath_w = temp_w;
+
+    return 0;
+}
+
+/**
+ * Normalizes a Windows file or folder path.
+ * Expansion of short paths (with 8.3 path components) is currently omitted
+ * as it is not required for accessing long paths.
+ * see .NET6: PathHelper.Normalize().
+ */
+static inline int path_normalize(wchar_t **ppath_w)
+{
+    int ret;
+
+    if ((ret = get_full_path_name(ppath_w)) < 0)
+        return ret;
+
+    /* What .NET does at this point is to call PathHelper.TryExpandShortFileName()
+     * in case the path contains a '~' character.
+     * We don't need to do this as we don't need to normalize the file name
+     * for presentation, and the extended path prefix works with 8.3 path
+     * components as well
+     */
+    return 0;
+}
+
+/**
+ * Adds an extended path or UNC prefix to longs paths or paths ending
+ * with a space or a dot. (' ' or '.').
+ * This function expects that the path has been normalized before by
+ * calling path_normalize().
+ * see .NET6: PathInternal.EnsureExtendedPrefix() *
+ */
+static inline int add_extended_prefix(wchar_t **ppath_w)
+{
+    const wchar_t *unc_prefix           = L"\\\\?\\UNC\\";
+    const wchar_t *extended_path_prefix = L"\\\\?\\";
+    const wchar_t *path_w               = *ppath_w;
+    const size_t len                    = wcslen(path_w);
+    wchar_t *temp_w;
+
+    if (len < 2)
+        return 0;
+
+    /* We're skipping the check IsPartiallyQualified() because
+     * we know we have called GetFullPathNameW() already, also
+     * we don't check IsDevice() because device paths are not
+     * allowed to be long paths and we're calling this only
+     * for long paths.
+     */
+    if (path_w[0] == L'\\' && path_w[1] == L'\\') {
+        // The length of unc_prefix is 6 plus 1 for terminating zeros
+        temp_w = (wchar_t *)av_calloc(len + 6 + 1, sizeof(wchar_t));
+        if (!temp_w) {
+            errno = ENOMEM;
+            return -1;
+        }
+        wcscpy(temp_w, unc_prefix);
+        wcscat(temp_w, path_w + 2);
+    } else {
+        // The length of extended_path_prefix is 4 plus 1 for terminating zeros
+        temp_w = (wchar_t *)av_calloc(len + 4 + 1, sizeof(wchar_t));
+        if (!temp_w) {
+            errno = ENOMEM;
+            return -1;
+        }
+        wcscpy(temp_w, extended_path_prefix);
+        wcscat(temp_w, path_w);
+    }
+
+    av_freep(ppath_w);
+    *ppath_w = temp_w;
+
+    return 0;
+}
+
+/**
+ * Converts a file or folder path to wchar_t for use with Windows file
+ * APIs. Paths with extended path prefix (either '\\?\' or \??\') are
+ * left unchanged.
+ * All other paths are normalized and converted to absolute paths.
+ * Longs paths (>= 260) are prefixed with the extended path or extended
+ * UNC path prefix.
+ * see .NET6: Path.GetFullPath() and Path.GetFullPathInternal()
+ */
+static inline int get_extended_win32_path(const char *path, wchar_t **ppath_w)
+{
+    int ret;
+    size_t len;
+
+    if ((ret = utf8towchar(path, ppath_w)) < 0)
+        return ret;
+
+    if (path_is_extended(*ppath_w)) {
+        /* Paths prefixed with '\\?\' or \??\' are considered normalized by definition.
+         * Windows doesn't normalize those paths and neither should we.
+         */
+        return 0;
+    }
+
+    if ((ret = path_normalize(ppath_w)) < 0)
+        return ret;
+
+    // see .NET6: PathInternal.EnsureExtendedPrefixIfNeeded()
+    len = wcslen(*ppath_w);
+    if (len >= 260 || (*ppath_w)[len - 1] == L' ' || (*ppath_w)[len - 1] == L'.') {
+        if ((ret = add_extended_prefix(ppath_w)) < 0)
+            return ret;
+    }
+
+    return 0;
+}
+
 #endif
 
 #endif /* AVUTIL_WCHAR_FILENAME_H */