Message ID | 20220605113542.12280-1-nil-admirari@mailo.com |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel,v12,1/4] libavutil/wchar_filename.h: Add whcartoutf8, wchartoansi and utf8toansi | expand |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | success | Make fate finished |
On Sun, 5 Jun 2022, Nil Admirari wrote: > These functions are going to be used in libavformat/avisynth.c > and fftools/cmdutils.c to remove MAX_PATH limit. > --- > libavutil/wchar_filename.h | 51 ++++++++++++++++++++++++++++++++++++++ > 1 file changed, 51 insertions(+) This patchset looks good to me too, thanks! If there's no more comments on it, I'll push it later. // Martin
On Thu, 9 Jun 2022, Soft Works wrote: > > >> -----Original Message----- >> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Martin >> Storsjö >> Sent: Thursday, June 9, 2022 12:10 PM >> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> >> Subject: Re: [FFmpeg-devel] [PATCH v12 1/4] libavutil/wchar_filename.h: Add >> whcartoutf8, wchartoansi and utf8toansi >> >> On Sun, 5 Jun 2022, Nil Admirari wrote: >> >>> These functions are going to be used in libavformat/avisynth.c >>> and fftools/cmdutils.c to remove MAX_PATH limit. >>> --- >>> libavutil/wchar_filename.h | 51 ++++++++++++++++++++++++++++++++++++++ >>> 1 file changed, 51 insertions(+) >> >> This patchset looks good to me too, thanks! If there's no more comments on >> it, I'll push it later. > > I missed to take another look at it. > > I'm wondering why it converts wchar back to ansi instead of utf8 in 4/4 and > whether it won't fail then, when a path contains non-ANSI chars. Yes, that's a preexisting problem there. That patch gets rid of the fixed path lengths without touching the rest of the charset handling there. There's paths from two sources; getenv() and from GetModuleFileName(). These are passed to plain fopen() (which uses ANSI filenames). To make it entirely unicode compliant, we'd need to replace getenv() with a wrapper that uses _wgetenv() and converts that to utf8, then convert the GetModuleFileName() output to utf8 too, and switch the fopen() to fopen_utf8. So the patch in itself is fine - it fixes one aspect, and doesn't make the other aspect worse. // Martin
>> I'm wondering why it converts wchar back to ansi instead of utf8 in 4/4 and >> whether it won't fail then, when a path contains non-ANSI chars. > > Yes, that's a preexisting problem there. That patch gets rid of the fixed > path lengths without touching the rest of the charset handling there. > > There's paths from two sources; getenv() and from GetModuleFileName(). > These are passed to plain fopen() (which uses ANSI filenames). > > To make it entirely unicode compliant, we'd need to replace getenv() with > a wrapper that uses _wgetenv() and converts that to utf8, then convert the > GetModuleFileName() output to utf8 too, and switch the fopen() to > fopen_utf8. Actually utf8toansi() was created for AviSynth, which doesn't have a Unicode API. getenv() and plain fopen() are used in cmdutils.c, which is the only file that haven't transitioned to avpriv_fopen_utf8() yet, but utf8toansi() is not used in cmdutils.c.
On Thu, 9 Jun 2022, Soft Works wrote: >> -----Original Message----- >> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Martin >> Storsjö >> Sent: Thursday, June 9, 2022 9:09 PM >> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> >> Subject: Re: [FFmpeg-devel] [PATCH v12 1/4] libavutil/wchar_filename.h: Add >> whcartoutf8, wchartoansi and utf8toansi >> >> On Thu, 9 Jun 2022, Soft Works wrote: >> >>> >>> >>>> -----Original Message----- >>>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Martin >>>> Storsjö >>>> Sent: Thursday, June 9, 2022 12:10 PM >>>> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> >>>> Subject: Re: [FFmpeg-devel] [PATCH v12 1/4] libavutil/wchar_filename.h: >> Add >>>> whcartoutf8, wchartoansi and utf8toansi >>>> >>>> On Sun, 5 Jun 2022, Nil Admirari wrote: >>>> >>>>> These functions are going to be used in libavformat/avisynth.c >>>>> and fftools/cmdutils.c to remove MAX_PATH limit. >>>>> --- >>>>> libavutil/wchar_filename.h | 51 ++++++++++++++++++++++++++++++++++++++ >>>>> 1 file changed, 51 insertions(+) >>>> >>>> This patchset looks good to me too, thanks! If there's no more comments on >>>> it, I'll push it later. >>> >>> I missed to take another look at it. >>> >>> I'm wondering why it converts wchar back to ansi instead of utf8 in 4/4 and >>> whether it won't fail then, when a path contains non-ANSI chars. >> >> Yes, that's a preexisting problem there. That patch gets rid of the fixed >> path lengths without touching the rest of the charset handling there. >> >> There's paths from two sources; getenv() and from GetModuleFileName(). >> These are passed to plain fopen() (which uses ANSI filenames). > > What I meant is the use of wchartoansi() in patch 4/4. > > The current code calls GetModuleFileNameA() > the new code calls GetModuleFileNameW() and then converts the output > file name with wchartoansi(). > > Can we be sure that this is always the same? > Especially when the path contains non-ANSI chars? > > That's something quite different. GetModuleFileNameA() will return a > valid and usable path (but it could be an 8.3 path or have some > replacement chars) Right, I wasn't aware that the -A version would return a guaranteed-ansi-compatible version of the filename. If that's really the case, this patch would indeed be a minor step backwards. // Martin
> Right, I wasn't aware that the -A version would return a > guaranteed-ansi-compatible version of the filename. If that's really the > case, this patch would indeed be a minor step backwards. Two options are available: 1. fopen() is replaced with avpriv_fopen_utf8(), getenv() is made Unicode-aware on Windows, and wide version of get_module_filename() is used as it is now. 2. A wrapper around GetModuleFilenameA() created specifically for this case, only to be replaced later by option one, when avpriv_fopen_utf8() gets merged. If option one is chosen, the patch will have to wait for avpriv_fopen_utf8() patches.
On Thu, 9 Jun 2022, nil-admirari@mailo.com wrote: >> Right, I wasn't aware that the -A version would return a >> guaranteed-ansi-compatible version of the filename. If that's really the >> case, this patch would indeed be a minor step backwards. > > Two options are available: > 1. fopen() is replaced with avpriv_fopen_utf8(), getenv() is made Unicode-aware > on Windows, and wide version of get_module_filename() is used as it is now. > 2. A wrapper around GetModuleFilenameA() created specifically for this case, > only to be replaced later by option one, when avpriv_fopen_utf8() gets merged. > > If option one is chosen, the patch will have to wait for avpriv_fopen_utf8() patches. For that, the first option sounds better - that sounds to me more like a direction forward, not backwards. The patches for *_fopen_utf8 have already been merged - within the libraries, you can use avpriv_fopen_utf8, and fftools has got a separate "fopen_utf8" function it can use. // Martin
>> ... >> 1. fopen() is replaced with avpriv_fopen_utf8(), getenv() is made Unicode-aware >> on Windows, and wide version of get_module_filename() is used as it is now. > ... > For that, the first option sounds better - that sounds to me more like a > direction forward, not backwards. Done: https://ffmpeg.org/pipermail/ffmpeg-devel/2022-June/297491.html. getenv_utf8 is a static inline in its own header in libavutil. log.c, bktr.c, fbdev_common.c, oss.c and af_ladspa.c still use plain getenv(), please check commit message for details. All other uses of getenv() have been replaced with getenv_utf8(): https://ffmpeg.org/pipermail/ffmpeg-devel/2022-June/297495.html https://ffmpeg.org/pipermail/ffmpeg-devel/2022-June/297494.html https://ffmpeg.org/pipermail/ffmpeg-devel/2022-June/297493.html
diff --git a/libavutil/wchar_filename.h b/libavutil/wchar_filename.h index 90f0824..c0e5d47 100644 --- a/libavutil/wchar_filename.h +++ b/libavutil/wchar_filename.h @@ -40,6 +40,57 @@ static inline int utf8towchar(const char *filename_utf8, wchar_t **filename_w) MultiByteToWideChar(CP_UTF8, 0, filename_utf8, -1, *filename_w, num_chars); return 0; } + +av_warn_unused_result +static inline int wchartocp(unsigned int code_page, const wchar_t *filename_w, + char **filename) +{ + DWORD flags = code_page == CP_UTF8 ? MB_ERR_INVALID_CHARS : 0; + int num_chars = WideCharToMultiByte(code_page, flags, filename_w, -1, + NULL, 0, NULL, NULL); + if (num_chars <= 0) { + *filename = NULL; + return 0; + } + *filename = av_calloc(num_chars, sizeof(char)); + if (!*filename) { + errno = ENOMEM; + return -1; + } + WideCharToMultiByte(code_page, flags, filename_w, -1, + *filename, num_chars, NULL, NULL); + return 0; +} + +av_warn_unused_result +static inline int wchartoutf8(const wchar_t *filename_w, char **filename) +{ + return wchartocp(CP_UTF8, filename_w, filename); +} + +av_warn_unused_result +static inline int wchartoansi(const wchar_t *filename_w, char **filename) +{ + return wchartocp(CP_ACP, filename_w, filename); +} + +av_warn_unused_result +static inline int utf8toansi(const char *filename_utf8, char **filename) +{ + wchar_t *filename_w = NULL; + int ret = -1; + if (utf8towchar(filename_utf8, &filename_w)) + return -1; + + if (!filename_w) { + *filename = NULL; + return 0; + } + + ret = wchartoansi(filename_w, filename); + av_free(filename_w); + return ret; +} #endif #endif /* AVUTIL_WCHAR_FILENAME_H */