diff mbox

[FFmpeg-devel,2/2] configure: instruct MSVC 2015 to properly process UTF-8 string literals

Message ID 20170203084154.9612-2-h.leppkes@gmail.com
State Accepted
Commit 8b80feb9a70bca07e6ea2e1a0b870915e88f13f7
Headers show

Commit Message

Hendrik Leppkes Feb. 3, 2017, 8:41 a.m. UTC
Without the /UTF-8 switch, the MSVC compiler treats all files as in the
system codepage, instead of in UTF-8, which causes UTF-8 string literals
to be interpreted wrong.

This switch was only introduced in VS2015 Update 2, and any earlier
versions do not have an equivalent solution.

Fixes fate-sub-scc on MSVC 2015+
---
 configure | 3 +++
 1 file changed, 3 insertions(+)

Comments

James Almer Feb. 3, 2017, 2:05 p.m. UTC | #1
On 2/3/2017 5:41 AM, Hendrik Leppkes wrote:
> Without the /UTF-8 switch, the MSVC compiler treats all files as in the
> system codepage, instead of in UTF-8, which causes UTF-8 string literals
> to be interpreted wrong.
> 
> This switch was only introduced in VS2015 Update 2, and any earlier
> versions do not have an equivalent solution.
> 
> Fixes fate-sub-scc on MSVC 2015+
> ---
>  configure | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/configure b/configure
> index d3d652f0f4..231cc3eca7 100755
> --- a/configure
> +++ b/configure
> @@ -6327,6 +6327,9 @@ EOF
>      # Issue has been fixed in MSVC v19.00.24218.
>      check_cpp_condition windows.h "_MSC_FULL_VER >= 190024218" ||
>          check_cflags -d2SSAOptimizer-
> +    # enable utf-8 source processing on VS2015 U2 and newer
> +    check_cpp_condition windows.h "_MSC_FULL_VER >= 190023918" &&
> +        add_cflags -utf-8

Probably better use check_cflags, just in case.

>  fi
>  
>  for pfx in "" host_; do
>
Hendrik Leppkes Feb. 3, 2017, 3:32 p.m. UTC | #2
On Fri, Feb 3, 2017 at 3:05 PM, James Almer <jamrial@gmail.com> wrote:
> On 2/3/2017 5:41 AM, Hendrik Leppkes wrote:
>> Without the /UTF-8 switch, the MSVC compiler treats all files as in the
>> system codepage, instead of in UTF-8, which causes UTF-8 string literals
>> to be interpreted wrong.
>>
>> This switch was only introduced in VS2015 Update 2, and any earlier
>> versions do not have an equivalent solution.
>>
>> Fixes fate-sub-scc on MSVC 2015+
>> ---
>>  configure | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/configure b/configure
>> index d3d652f0f4..231cc3eca7 100755
>> --- a/configure
>> +++ b/configure
>> @@ -6327,6 +6327,9 @@ EOF
>>      # Issue has been fixed in MSVC v19.00.24218.
>>      check_cpp_condition windows.h "_MSC_FULL_VER >= 190024218" ||
>>          check_cflags -d2SSAOptimizer-
>> +    # enable utf-8 source processing on VS2015 U2 and newer
>> +    check_cpp_condition windows.h "_MSC_FULL_VER >= 190023918" &&
>> +        add_cflags -utf-8
>
> Probably better use check_cflags, just in case.
>

check_cflags doesn't work, since most wrong options just cause it to
emit a warning but not error out (although confusingly some do error
out, like the d2 option above, since the d2 prefix directly targets
the c2 compiler stage, and unknown options there error instead of
warn).
Thats the whole reason I added a version check in the first place
instead of solely using check_cflags with it.

I mean, no real harm to use check_cflags together with the version
check, but since it doesn't do anything, I figured I would save the
extra check and a few forks.

- Hendrik
Matt Oliver Feb. 4, 2017, 4:50 a.m. UTC | #3
On 4 February 2017 at 02:32, Hendrik Leppkes <h.leppkes@gmail.com> wrote:

> On Fri, Feb 3, 2017 at 3:05 PM, James Almer <jamrial@gmail.com> wrote:
> > On 2/3/2017 5:41 AM, Hendrik Leppkes wrote:
> >> Without the /UTF-8 switch, the MSVC compiler treats all files as in the
> >> system codepage, instead of in UTF-8, which causes UTF-8 string literals
> >> to be interpreted wrong.
> >>
> >> This switch was only introduced in VS2015 Update 2, and any earlier
> >> versions do not have an equivalent solution.
> >>
> >> Fixes fate-sub-scc on MSVC 2015+
> >> ---
> >>  configure | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/configure b/configure
> >> index d3d652f0f4..231cc3eca7 100755
> >> --- a/configure
> >> +++ b/configure
> >> @@ -6327,6 +6327,9 @@ EOF
> >>      # Issue has been fixed in MSVC v19.00.24218.
> >>      check_cpp_condition windows.h "_MSC_FULL_VER >= 190024218" ||
> >>          check_cflags -d2SSAOptimizer-
> >> +    # enable utf-8 source processing on VS2015 U2 and newer
> >> +    check_cpp_condition windows.h "_MSC_FULL_VER >= 190023918" &&
> >> +        add_cflags -utf-8
> >
> > Probably better use check_cflags, just in case.
> >
>
> check_cflags doesn't work, since most wrong options just cause it to
> emit a warning but not error out (although confusingly some do error
> out, like the d2 option above, since the d2 prefix directly targets
> the c2 compiler stage, and unknown options there error instead of
> warn).
> Thats the whole reason I added a version check in the first place
> instead of solely using check_cflags with it.
>
> I mean, no real harm to use check_cflags together with the version
> check, but since it doesn't do anything, I figured I would save the
> extra check and a few forks.


Is there any possibility to also find a fix for this on older msvc
versions? If you direct me to the specific lines of code that are
mis-compiling/executing ill have a look.
In the mean time this patch looks good to me.
Hendrik Leppkes Feb. 4, 2017, 9:25 a.m. UTC | #4
On Sat, Feb 4, 2017 at 5:50 AM, Matt Oliver <protogonoi@gmail.com> wrote:
> On 4 February 2017 at 02:32, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
>
>> On Fri, Feb 3, 2017 at 3:05 PM, James Almer <jamrial@gmail.com> wrote:
>> > On 2/3/2017 5:41 AM, Hendrik Leppkes wrote:
>> >> Without the /UTF-8 switch, the MSVC compiler treats all files as in the
>> >> system codepage, instead of in UTF-8, which causes UTF-8 string literals
>> >> to be interpreted wrong.
>> >>
>> >> This switch was only introduced in VS2015 Update 2, and any earlier
>> >> versions do not have an equivalent solution.
>> >>
>> >> Fixes fate-sub-scc on MSVC 2015+
>> >> ---
>> >>  configure | 3 +++
>> >>  1 file changed, 3 insertions(+)
>> >>
>> >> diff --git a/configure b/configure
>> >> index d3d652f0f4..231cc3eca7 100755
>> >> --- a/configure
>> >> +++ b/configure
>> >> @@ -6327,6 +6327,9 @@ EOF
>> >>      # Issue has been fixed in MSVC v19.00.24218.
>> >>      check_cpp_condition windows.h "_MSC_FULL_VER >= 190024218" ||
>> >>          check_cflags -d2SSAOptimizer-
>> >> +    # enable utf-8 source processing on VS2015 U2 and newer
>> >> +    check_cpp_condition windows.h "_MSC_FULL_VER >= 190023918" &&
>> >> +        add_cflags -utf-8
>> >
>> > Probably better use check_cflags, just in case.
>> >
>>
>> check_cflags doesn't work, since most wrong options just cause it to
>> emit a warning but not error out (although confusingly some do error
>> out, like the d2 option above, since the d2 prefix directly targets
>> the c2 compiler stage, and unknown options there error instead of
>> warn).
>> Thats the whole reason I added a version check in the first place
>> instead of solely using check_cflags with it.
>>
>> I mean, no real harm to use check_cflags together with the version
>> check, but since it doesn't do anything, I figured I would save the
>> extra check and a few forks.
>
>
> Is there any possibility to also find a fix for this on older msvc
> versions? If you direct me to the specific lines of code that are
> mis-compiling/executing ill have a look.
> In the mean time this patch looks good to me.

For this particular fate test, ccaption_dec.c line 73 onwards has
various unicode escape sequences, which don't get interpreted
properly.

The one generic workaround is encoding byte sequences directly, which
is really ugly and code-changes are usually hard to justify.
Another MSVC workaround is an undocumented compiler pragma, which
works in 2013, but not in 2012 (#pragma
execution_character_set("utf-8")), but adding pragmas to files is also
not something you can just easily hide away without it becoming ugly.

- Hendrik
Hendrik Leppkes Feb. 4, 2017, 9:26 a.m. UTC | #5
On Sat, Feb 4, 2017 at 10:25 AM, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
> On Sat, Feb 4, 2017 at 5:50 AM, Matt Oliver <protogonoi@gmail.com> wrote:
>> On 4 February 2017 at 02:32, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
>>
>>> On Fri, Feb 3, 2017 at 3:05 PM, James Almer <jamrial@gmail.com> wrote:
>>> > On 2/3/2017 5:41 AM, Hendrik Leppkes wrote:
>>> >> Without the /UTF-8 switch, the MSVC compiler treats all files as in the
>>> >> system codepage, instead of in UTF-8, which causes UTF-8 string literals
>>> >> to be interpreted wrong.
>>> >>
>>> >> This switch was only introduced in VS2015 Update 2, and any earlier
>>> >> versions do not have an equivalent solution.
>>> >>
>>> >> Fixes fate-sub-scc on MSVC 2015+
>>> >> ---
>>> >>  configure | 3 +++
>>> >>  1 file changed, 3 insertions(+)
>>> >>
>>> >> diff --git a/configure b/configure
>>> >> index d3d652f0f4..231cc3eca7 100755
>>> >> --- a/configure
>>> >> +++ b/configure
>>> >> @@ -6327,6 +6327,9 @@ EOF
>>> >>      # Issue has been fixed in MSVC v19.00.24218.
>>> >>      check_cpp_condition windows.h "_MSC_FULL_VER >= 190024218" ||
>>> >>          check_cflags -d2SSAOptimizer-
>>> >> +    # enable utf-8 source processing on VS2015 U2 and newer
>>> >> +    check_cpp_condition windows.h "_MSC_FULL_VER >= 190023918" &&
>>> >> +        add_cflags -utf-8
>>> >
>>> > Probably better use check_cflags, just in case.
>>> >
>>>
>>> check_cflags doesn't work, since most wrong options just cause it to
>>> emit a warning but not error out (although confusingly some do error
>>> out, like the d2 option above, since the d2 prefix directly targets
>>> the c2 compiler stage, and unknown options there error instead of
>>> warn).
>>> Thats the whole reason I added a version check in the first place
>>> instead of solely using check_cflags with it.
>>>
>>> I mean, no real harm to use check_cflags together with the version
>>> check, but since it doesn't do anything, I figured I would save the
>>> extra check and a few forks.
>>
>>
>> Is there any possibility to also find a fix for this on older msvc
>> versions? If you direct me to the specific lines of code that are
>> mis-compiling/executing ill have a look.
>> In the mean time this patch looks good to me.
>
> For this particular fate test, ccaption_dec.c line 73 onwards has
> various unicode escape sequences, which don't get interpreted
> properly.
>
> The one generic workaround is encoding byte sequences directly, which
> is really ugly and code-changes are usually hard to justify.
> Another MSVC workaround is an undocumented compiler pragma, which
> works in 2013, but not in 2012 (#pragma
> execution_character_set("utf-8")), but adding pragmas to files is also
> not something you can just easily hide away without it becoming ugly.
>

Oh, and one other option is re-saving the file with a UTF-8 BOM, which
the compiler will also use as a hint to support UTF-8 sequences,
however UTF-8 BOM is also usually frowned upon, not to mention that
all files using such features would need to be identified first.

- Hendrik
Carl Eugen Hoyos Feb. 4, 2017, 9:29 a.m. UTC | #6
2017-02-04 10:25 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>:

> Another MSVC workaround is an undocumented compiler pragma, which
> works in 2013, but not in 2012 (#pragma
> execution_character_set("utf-8")), but adding pragmas to files is also
> not something you can just easily hide away without it becoming ugly.

Is there a real disadvantage adding the pragma after your patch
gets applied?

Carl Eugen
Hendrik Leppkes Feb. 4, 2017, 10:23 a.m. UTC | #7
On Sat, Feb 4, 2017 at 10:29 AM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2017-02-04 10:25 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>:
>
>> Another MSVC workaround is an undocumented compiler pragma, which
>> works in 2013, but not in 2012 (#pragma
>> execution_character_set("utf-8")), but adding pragmas to files is also
>> not something you can just easily hide away without it becoming ugly.
>
> Is there a real disadvantage adding the pragma after your patch
> gets applied?
>

Adding pragmas is ugly because you can't hide them away, but its
independent of this patch either way, since it targets a different
compiler version.

Applied this patch in the meantime, as well as 1/2 from the set.

- Hendrik
Matt Oliver Feb. 5, 2017, 8:08 a.m. UTC | #8
On 4 February 2017 at 21:23, Hendrik Leppkes <h.leppkes@gmail.com> wrote:

> On Sat, Feb 4, 2017 at 10:29 AM, Carl Eugen Hoyos <ceffmpeg@gmail.com>
> wrote:
> > 2017-02-04 10:25 GMT+01:00 Hendrik Leppkes <h.leppkes@gmail.com>:
> >
> >> Another MSVC workaround is an undocumented compiler pragma, which
> >> works in 2013, but not in 2012 (#pragma
> >> execution_character_set("utf-8")), but adding pragmas to files is also
> >> not something you can just easily hide away without it becoming ugly.
> >
> > Is there a real disadvantage adding the pragma after your patch
> > gets applied?
> >
>
> Adding pragmas is ugly because you can't hide them away, but its
> independent of this patch either way, since it targets a different
> compiler version.
>
> Applied this patch in the meantime, as well as 1/2 from the set.
>

Well given that the pragma doesn't work in 2012 and we are still claiming
support for those older msvc versions then it wouldn't really be a suitable
fix anyway.

As far as I see it it the only way to cover all currently supported msvc
versions would be to convert the utf chars directly to binary. Which is a
bit ugly granted, but im not sure there is anyway around it. MSVC appears
to output a warning about characters that dont fit in the current code page
and at the moment only ccaption_dec.c is mentioned so it appears tat
luckily only 1 file needs to be changed.
diff mbox

Patch

diff --git a/configure b/configure
index d3d652f0f4..231cc3eca7 100755
--- a/configure
+++ b/configure
@@ -6327,6 +6327,9 @@  EOF
     # Issue has been fixed in MSVC v19.00.24218.
     check_cpp_condition windows.h "_MSC_FULL_VER >= 190024218" ||
         check_cflags -d2SSAOptimizer-
+    # enable utf-8 source processing on VS2015 U2 and newer
+    check_cpp_condition windows.h "_MSC_FULL_VER >= 190023918" &&
+        add_cflags -utf-8
 fi
 
 for pfx in "" host_; do