diff mbox series

[FFmpeg-devel,v12,1/4] libavutil/wchar_filename.h: Add whcartoutf8, wchartoansi and utf8toansi

Message ID 20220605113542.12280-1-nil-admirari@mailo.com
State New
Headers show
Series [FFmpeg-devel,v12,1/4] libavutil/wchar_filename.h: Add whcartoutf8, wchartoansi and utf8toansi | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished

Commit Message

Nil Admirari June 5, 2022, 11:35 a.m. UTC
These functions are going to be used in libavformat/avisynth.c
and fftools/cmdutils.c to remove MAX_PATH limit.
---
 libavutil/wchar_filename.h | 51 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

Comments

Martin Storsjö June 9, 2022, 10:10 a.m. UTC | #1
On Sun, 5 Jun 2022, Nil Admirari wrote:

> These functions are going to be used in libavformat/avisynth.c
> and fftools/cmdutils.c to remove MAX_PATH limit.
> ---
> libavutil/wchar_filename.h | 51 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 51 insertions(+)

This patchset looks good to me too, thanks! If there's no more comments on 
it, I'll push it later.

// Martin
Martin Storsjö June 9, 2022, 7:09 p.m. UTC | #2
On Thu, 9 Jun 2022, Soft Works wrote:

>
>
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Martin
>> Storsjö
>> Sent: Thursday, June 9, 2022 12:10 PM
>> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
>> Subject: Re: [FFmpeg-devel] [PATCH v12 1/4] libavutil/wchar_filename.h: Add
>> whcartoutf8, wchartoansi and utf8toansi
>>
>> On Sun, 5 Jun 2022, Nil Admirari wrote:
>>
>>> These functions are going to be used in libavformat/avisynth.c
>>> and fftools/cmdutils.c to remove MAX_PATH limit.
>>> ---
>>> libavutil/wchar_filename.h | 51 ++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 51 insertions(+)
>>
>> This patchset looks good to me too, thanks! If there's no more comments on
>> it, I'll push it later.
>
> I missed to take another look at it.
>
> I'm wondering why it converts wchar back to ansi instead of utf8 in 4/4 and
> whether it won't fail then, when a path contains non-ANSI chars.

Yes, that's a preexisting problem there. That patch gets rid of the fixed 
path lengths without touching the rest of the charset handling there.

There's paths from two sources; getenv() and from GetModuleFileName(). 
These are passed to plain fopen() (which uses ANSI filenames).

To make it entirely unicode compliant, we'd need to replace getenv() with 
a wrapper that uses _wgetenv() and converts that to utf8, then convert the 
GetModuleFileName() output to utf8 too, and switch the fopen() to 
fopen_utf8.

So the patch in itself is fine - it fixes one aspect, and doesn't make the 
other aspect worse.

// Martin
Nil Admirari June 9, 2022, 7:19 p.m. UTC | #3
>> I'm wondering why it converts wchar back to ansi instead of utf8 in 4/4 and
>> whether it won't fail then, when a path contains non-ANSI chars.
>
> Yes, that's a preexisting problem there. That patch gets rid of the fixed 
> path lengths without touching the rest of the charset handling there.
>
> There's paths from two sources; getenv() and from GetModuleFileName(). 
> These are passed to plain fopen() (which uses ANSI filenames).
>
> To make it entirely unicode compliant, we'd need to replace getenv() with 
> a wrapper that uses _wgetenv() and converts that to utf8, then convert the 
> GetModuleFileName() output to utf8 too, and switch the fopen() to 
> fopen_utf8.

Actually utf8toansi() was created for AviSynth, which doesn't have a Unicode API.

getenv() and plain fopen() are used in cmdutils.c, which is the only file that
haven't transitioned to avpriv_fopen_utf8() yet, but utf8toansi() is not used in cmdutils.c.
Martin Storsjö June 9, 2022, 8:23 p.m. UTC | #4
On Thu, 9 Jun 2022, Soft Works wrote:

>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Martin
>> Storsjö
>> Sent: Thursday, June 9, 2022 9:09 PM
>> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
>> Subject: Re: [FFmpeg-devel] [PATCH v12 1/4] libavutil/wchar_filename.h: Add
>> whcartoutf8, wchartoansi and utf8toansi
>>
>> On Thu, 9 Jun 2022, Soft Works wrote:
>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Martin
>>>> Storsjö
>>>> Sent: Thursday, June 9, 2022 12:10 PM
>>>> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
>>>> Subject: Re: [FFmpeg-devel] [PATCH v12 1/4] libavutil/wchar_filename.h:
>> Add
>>>> whcartoutf8, wchartoansi and utf8toansi
>>>>
>>>> On Sun, 5 Jun 2022, Nil Admirari wrote:
>>>>
>>>>> These functions are going to be used in libavformat/avisynth.c
>>>>> and fftools/cmdutils.c to remove MAX_PATH limit.
>>>>> ---
>>>>> libavutil/wchar_filename.h | 51 ++++++++++++++++++++++++++++++++++++++
>>>>> 1 file changed, 51 insertions(+)
>>>>
>>>> This patchset looks good to me too, thanks! If there's no more comments on
>>>> it, I'll push it later.
>>>
>>> I missed to take another look at it.
>>>
>>> I'm wondering why it converts wchar back to ansi instead of utf8 in 4/4 and
>>> whether it won't fail then, when a path contains non-ANSI chars.
>>
>> Yes, that's a preexisting problem there. That patch gets rid of the fixed
>> path lengths without touching the rest of the charset handling there.
>>
>> There's paths from two sources; getenv() and from GetModuleFileName().
>> These are passed to plain fopen() (which uses ANSI filenames).
>
> What I meant is the use of wchartoansi() in patch 4/4.
>
> The current code calls GetModuleFileNameA()
> the new code calls GetModuleFileNameW() and then converts the output
> file name with wchartoansi().
>
> Can we be sure that this is always the same?
> Especially when the path contains non-ANSI chars?
>
> That's something quite different. GetModuleFileNameA() will return a
> valid and usable path (but it could be an 8.3 path or have some
> replacement chars)

Right, I wasn't aware that the -A version would return a 
guaranteed-ansi-compatible version of the filename. If that's really the 
case, this patch would indeed be a minor step backwards.

// Martin
Nil Admirari June 9, 2022, 9:12 p.m. UTC | #5
> Right, I wasn't aware that the -A version would return a 
> guaranteed-ansi-compatible version of the filename. If that's really the 
> case, this patch would indeed be a minor step backwards.

Two options are available:
1. fopen() is replaced with avpriv_fopen_utf8(), getenv() is made Unicode-aware
    on Windows, and wide version of get_module_filename() is used as it is now.
2. A wrapper around GetModuleFilenameA() created specifically for this case,
    only to be replaced later by option one, when avpriv_fopen_utf8() gets merged.

If option one is chosen, the patch will have to wait for avpriv_fopen_utf8() patches.
Martin Storsjö June 9, 2022, 9:24 p.m. UTC | #6
On Thu, 9 Jun 2022, nil-admirari@mailo.com wrote:

>> Right, I wasn't aware that the -A version would return a
>> guaranteed-ansi-compatible version of the filename. If that's really the
>> case, this patch would indeed be a minor step backwards.
>
> Two options are available:
> 1. fopen() is replaced with avpriv_fopen_utf8(), getenv() is made Unicode-aware
>    on Windows, and wide version of get_module_filename() is used as it is now.
> 2. A wrapper around GetModuleFilenameA() created specifically for this case,
>    only to be replaced later by option one, when avpriv_fopen_utf8() gets merged.
>
> If option one is chosen, the patch will have to wait for avpriv_fopen_utf8() patches.

For that, the first option sounds better - that sounds to me more like a 
direction forward, not backwards.

The patches for *_fopen_utf8 have already been merged - within the 
libraries, you can use avpriv_fopen_utf8, and fftools has got a separate 
"fopen_utf8" function it can use.

// Martin
Nil Admirari June 13, 2022, 4:36 p.m. UTC | #7
>> ...
>> 1. fopen() is replaced with avpriv_fopen_utf8(), getenv() is made Unicode-aware
>> on Windows, and wide version of get_module_filename() is used as it is now.
> ...
> For that, the first option sounds better - that sounds to me more like a 
> direction forward, not backwards.

Done: https://ffmpeg.org/pipermail/ffmpeg-devel/2022-June/297491.html.
getenv_utf8 is a static inline in its own header in libavutil.

log.c, bktr.c, fbdev_common.c, oss.c and af_ladspa.c still use plain getenv(),
please check commit message for details.

All other uses of getenv() have been replaced with getenv_utf8():
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-June/297495.html
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-June/297494.html
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-June/297493.html
diff mbox series

Patch

diff --git a/libavutil/wchar_filename.h b/libavutil/wchar_filename.h
index 90f0824..c0e5d47 100644
--- a/libavutil/wchar_filename.h
+++ b/libavutil/wchar_filename.h
@@ -40,6 +40,57 @@  static inline int utf8towchar(const char *filename_utf8, wchar_t **filename_w)
     MultiByteToWideChar(CP_UTF8, 0, filename_utf8, -1, *filename_w, num_chars);
     return 0;
 }
+
+av_warn_unused_result
+static inline int wchartocp(unsigned int code_page, const wchar_t *filename_w,
+                            char **filename)
+{
+    DWORD flags = code_page == CP_UTF8 ? MB_ERR_INVALID_CHARS : 0;
+    int num_chars = WideCharToMultiByte(code_page, flags, filename_w, -1,
+                                        NULL, 0, NULL, NULL);
+    if (num_chars <= 0) {
+        *filename = NULL;
+        return 0;
+    }
+    *filename = av_calloc(num_chars, sizeof(char));
+    if (!*filename) {
+        errno = ENOMEM;
+        return -1;
+    }
+    WideCharToMultiByte(code_page, flags, filename_w, -1,
+                        *filename, num_chars, NULL, NULL);
+    return 0;
+}
+
+av_warn_unused_result
+static inline int wchartoutf8(const wchar_t *filename_w, char **filename)
+{
+    return wchartocp(CP_UTF8, filename_w, filename);
+}
+
+av_warn_unused_result
+static inline int wchartoansi(const wchar_t *filename_w, char **filename)
+{
+    return wchartocp(CP_ACP, filename_w, filename);
+}
+
+av_warn_unused_result
+static inline int utf8toansi(const char *filename_utf8, char **filename)
+{
+    wchar_t *filename_w = NULL;
+    int ret = -1;
+    if (utf8towchar(filename_utf8, &filename_w))
+        return -1;
+
+    if (!filename_w) {
+        *filename = NULL;
+        return 0;
+    }
+
+    ret = wchartoansi(filename_w, filename);
+    av_free(filename_w);
+    return ret;
+}
 #endif
 
 #endif /* AVUTIL_WCHAR_FILENAME_H */