diff mbox series

[FFmpeg-devel,v11,1/6] libavutil/wchar_filename.h: Add whcartoutf8, wchartoansi and utf8toansi

Message ID 20220423205626.39039-1-nil-admirari@mailo.com
State New
Headers show
Series [FFmpeg-devel,v11,1/6] libavutil/wchar_filename.h: Add whcartoutf8, wchartoansi and utf8toansi | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Nil Admirari April 23, 2022, 8:56 p.m. UTC
These functions are going to be used in libavformat/avisynth.c
and fftools/cmdutils.c to remove MAX_PATH limit.
---
 libavutil/wchar_filename.h | 51 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

Comments

Soft Works April 24, 2022, 3:39 a.m. UTC | #1
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Nil
> Admirari
> Sent: Saturday, April 23, 2022 10:56 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: [FFmpeg-devel] [PATCH v11 1/6] libavutil/wchar_filename.h:
> Add whcartoutf8, wchartoansi and utf8toansi
> 
> These functions are going to be used in libavformat/avisynth.c
> and fftools/cmdutils.c to remove MAX_PATH limit.
> ---

Hi,

thanks for submitting this patchset.

I'm afraid that I hadn't found time to look into this at an earlier stage,
so please apologize if one of my questions has been covered before.



1. Patch 3/6 - Replace LoadLibraryExA with LoadLibraryExW

What's the point in making changes to library loading? What does it fix?
Which dll cannot be loaded with the current implementation and what 
would be the location of the dll and the location of the exe in that case?

Could you please give an example of a situation that this is supposed 
to fix?



2. Patches 5/6 and 6/6 - Add Fusion Manifest

The manifest that you want to add includes two settings:

 - longPathAware
   effective on Windows 10, version 1607 or later

 - activeCodePage
   effective on Windows 10, version 1903 or later

Both of these manifest attributes are affecting the runtime behavior of
an application running on Windows - but only starting from a certain 
OS version. That means - effectively you would end up having an ffmpeg
executable with three different kinds of behavior:

1. Win < 1607
2. Win >= 1607 && < 1903
3. Win >= 1903

An ffmpeg executable, where you can't rely on the exposed behavior being
consistent and where you would need to check the operating system version 
before using to make sure that you are providing parameters in the "right"
way - I'm not sure whether that would make much sense.



3. All Patches x/6 - Remove MAX_PATH limit

The punch line sounds compelling. But how is the current situation and 
what exactly would be the benefit?

What's important to understand is that the basic Windows file APIs are
supporting long (> MAX_PATH length) for a long time already.
MS has just been a bit hesitant regarding the transition and wanted to
avoid breaking changes to the API. The outcome was that long paths are
only supported when using a lower-level path syntax, which is as simple 
as prepending "\\?\" to the actual path, no matter whether drive or UNC
network path.

As an example, the following command runs without issue on Windows 
with a normal current ffmpeg build:

.\ffmpeg.exe -i "\\?\U:\TestMedia\This is a very long path - This is a very long path - This is a very long path - This is a very long path - This is a very long path -\This is a very long path - This is a very long path - This is a very long path - This is a very lon\aaaaaaaaaaaaaassssssssssssssdddddddddd\TestMediaThis is a very long path - This is a very long path - This is a very long path - This is a very long path - This is a very long path -This is a very long path - This is a very long path - Thi - 不要抬头.ts"

It's not only a path with len > MAX_PATH, the name also includes 
Chinese (non-ANSI) characters.
All this is working in a predictable and reliable way on all common Windows
versions.


Recently, MS have taken additional effort in order to ease compatibility 
and improve platform portability, by allowing applications to use ANSI
file APIs with long paths and UTF-8 charset. 
BUT: there are several prerequisites for this to work:

1. Windows version needs to be >= 1903
   (for being able to have both attributes in effect)

2. Application needs to be compiled with and manifest file as resource

3. A registry key or group policy needs to be set on Windows to enable this
´  (in both cases, administrative permission/UAC is required to set it)

4. Even when registry key or group policy is set, it might still be pending
   a reboot


SUMMARY

From my understanding, the benefit of this patchset would be:

When 1. and 2. and 3. and 4. are fulfilled (and only then) - it would be 
possible to use long path names in ffmpeg _without_ prefixing it with '\\?\'

On the other side, there's a risk of regressions by adding those manifest
attributes.

I wonder whether it wouldn’t be a better idea, to simply auto-add this prefix
in the ffmpeg file handling code if:

- OS == Windows
- len > MAX_PATH

This would work on all common Windows versions and would be predictable and
reliable.

Best regards,
softworkz
Soft Works May 7, 2022, 5:57 p.m. UTC | #2
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Nil
> Admirari
> Sent: Saturday, April 23, 2022 10:56 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: [FFmpeg-devel] [PATCH v11 1/6] libavutil/wchar_filename.h:
> Add whcartoutf8, wchartoansi and utf8toansi
> 
> These functions are going to be used in libavformat/avisynth.c
> and fftools/cmdutils.c to remove MAX_PATH limit.
> ---
>  libavutil/wchar_filename.h | 51
> ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/libavutil/wchar_filename.h b/libavutil/wchar_filename.h
> index 90f08245..c0e5d47e 100644
> --- a/libavutil/wchar_filename.h
> +++ b/libavutil/wchar_filename.h
> @@ -40,6 +40,57 @@ static inline int utf8towchar(const char
> *filename_utf8, wchar_t **filename_w)
>      MultiByteToWideChar(CP_UTF8, 0, filename_utf8, -1, *filename_w,
> num_chars);
>      return 0;
>  }
> +
> +av_warn_unused_result
> +static inline int wchartocp(unsigned int code_page, const wchar_t
> *filename_w,
> +                            char **filename)
> +{
> +    DWORD flags = code_page == CP_UTF8 ? MB_ERR_INVALID_CHARS : 0;
> +    int num_chars = WideCharToMultiByte(code_page, flags, filename_w,
> -1,
> +                                        NULL, 0, NULL, NULL);
> +    if (num_chars <= 0) {
> +        *filename = NULL;
> +        return 0;
> +    }
> +    *filename = av_calloc(num_chars, sizeof(char));
> +    if (!*filename) {
> +        errno = ENOMEM;
> +        return -1;
> +    }
> +    WideCharToMultiByte(code_page, flags, filename_w, -1,
> +                        *filename, num_chars, NULL, NULL);
> +    return 0;
> +}
> +
> +av_warn_unused_result
> +static inline int wchartoutf8(const wchar_t *filename_w, char
> **filename)
> +{
> +    return wchartocp(CP_UTF8, filename_w, filename);
> +}
> +
> +av_warn_unused_result
> +static inline int wchartoansi(const wchar_t *filename_w, char
> **filename)
> +{
> +    return wchartocp(CP_ACP, filename_w, filename);
> +}
> +
> +av_warn_unused_result
> +static inline int utf8toansi(const char *filename_utf8, char
> **filename)
> +{
> +    wchar_t *filename_w = NULL;
> +    int ret = -1;
> +    if (utf8towchar(filename_utf8, &filename_w))
> +        return -1;
> +
> +    if (!filename_w) {
> +        *filename = NULL;
> +        return 0;
> +    }
> +
> +    ret = wchartoansi(filename_w, filename);
> +    av_free(filename_w);
> +    return ret;
> +}
>  #endif
> 
>  #endif /* AVUTIL_WCHAR_FILENAME_H */
> --
> 2.32.0
> 

LGTM now as it seems to be of use in several places.
diff mbox series

Patch

diff --git a/libavutil/wchar_filename.h b/libavutil/wchar_filename.h
index 90f08245..c0e5d47e 100644
--- a/libavutil/wchar_filename.h
+++ b/libavutil/wchar_filename.h
@@ -40,6 +40,57 @@  static inline int utf8towchar(const char *filename_utf8, wchar_t **filename_w)
     MultiByteToWideChar(CP_UTF8, 0, filename_utf8, -1, *filename_w, num_chars);
     return 0;
 }
+
+av_warn_unused_result
+static inline int wchartocp(unsigned int code_page, const wchar_t *filename_w,
+                            char **filename)
+{
+    DWORD flags = code_page == CP_UTF8 ? MB_ERR_INVALID_CHARS : 0;
+    int num_chars = WideCharToMultiByte(code_page, flags, filename_w, -1,
+                                        NULL, 0, NULL, NULL);
+    if (num_chars <= 0) {
+        *filename = NULL;
+        return 0;
+    }
+    *filename = av_calloc(num_chars, sizeof(char));
+    if (!*filename) {
+        errno = ENOMEM;
+        return -1;
+    }
+    WideCharToMultiByte(code_page, flags, filename_w, -1,
+                        *filename, num_chars, NULL, NULL);
+    return 0;
+}
+
+av_warn_unused_result
+static inline int wchartoutf8(const wchar_t *filename_w, char **filename)
+{
+    return wchartocp(CP_UTF8, filename_w, filename);
+}
+
+av_warn_unused_result
+static inline int wchartoansi(const wchar_t *filename_w, char **filename)
+{
+    return wchartocp(CP_ACP, filename_w, filename);
+}
+
+av_warn_unused_result
+static inline int utf8toansi(const char *filename_utf8, char **filename)
+{
+    wchar_t *filename_w = NULL;
+    int ret = -1;
+    if (utf8towchar(filename_utf8, &filename_w))
+        return -1;
+
+    if (!filename_w) {
+        *filename = NULL;
+        return 0;
+    }
+
+    ret = wchartoansi(filename_w, filename);
+    av_free(filename_w);
+    return ret;
+}
 #endif
 
 #endif /* AVUTIL_WCHAR_FILENAME_H */