mbox series

[FFmpeg-devel,00/41] Stop including superseded functions for x64

Message ID DB6PR0101MB2214F3355A6887EB7203DAA38FA79@DB6PR0101MB2214.eurprd01.prod.exchangelabs.com
Headers show
Series Stop including superseded functions for x64 | expand

Message

Andreas Rheinhardt June 9, 2022, 11:15 p.m. UTC
x64 requires MMX, MMXEXT, SSE and SSE2; yet there is no shortage
of code like the following:

    if (EXTERNAL_MMX(cpu_flags)) {
        c->ssd_int8_vs_int16 = ff_ssd_int8_vs_int16_mmx;
    }
    if (EXTERNAL_SSE2(cpu_flags)) {
        c->ssd_int8_vs_int16 = ff_ssd_int8_vs_int16_sse2;
    }

Given that SSE2 is always present on x64, the only way
for the mmx version to be chosen in the above example
is if SSE2 has been disabled either at compile-time
or at runtime, i.e. it is never used unless one shoots
oneself in the foot.
This patchset therefore disables such functions for x64
by #if'ing them away; x86 has not been affected. This
saves about 140KB.

(Another way to handle this would be to remove every function
that would be overridden if one had a processor capable of
MMX, MMXEXT, SSE and SSE2. x86 processors not fulfilling
this requirement (which are truely ancient nowadays)
would still work, but would be slower, i.e. they would be treated
as second-class citizens. This would have the advantage of
avoiding #ifs and would lighten x86 binaries of code that is
not used at all by the overwhelming majority of users.
I'll update this patchset if it is preferred to do it that way.)

Andreas Rheinhardt (41):
  avcodec/x86/qpeldsp: Remove unused ff_put_no_rnd_pixels16_l2_3dnow
  avcodec/x86/hevcdsp_init: Remove unnecessary inclusion of get_bits.h
  avcodec/hevcdec: Make ff_hevc_pel_weight static
  avcodec/v4l2_m2m: Remove unused ff_v4l2_m2m_codec_full_reinit
  avcodec/videodsp: Make ff_emulated_edge_mc_16 static
  avcodec/x86/fpel: Remove unused ff_avg_pixels4_mmx
  avcodec/x86/rv34dsp: Remove unused ff_rv34_idct_dc_mmxext
  avcodec/x86/h264_qpel_8bit: Remove unused function
  avcodec/x86/vc1dsp_init: Disable overridden functions on x64
  avcodec/x86/ac3dsp_init: Disable overridden functions on x64
  avcodec/x86/audiodsp_init: Disable overridden functions on x64
  avcodec/x86/diracdsp_init: Disable overridden functions on x64
  avcodec/x86/mpegvideoenc: Disable overridden functions on x64
  avcodec/x86/fdct: Disable overridden functions on x64
  avcodec/x86/hevcdsp_init: Disable overridden functions on x64
  avcodec/x86/rv40dsp_init: Disable overridden functions on x64
  avcodec/x86/cavsdsp: Disable overridden functions on x64
  avcodec/x86/h264_intrapred_init: Disable overridden functions on x64
  avfilter/x86/vf_noise: Disable overridden functions on x64
  avcodec/x86/me_cmp: Disable overridden functions on x64
  avcodec/x86/mpegvideoencdsp: Disable ff_pix_norm1_mmx on x64
  avcodec/x86/h264dsp_init: Disable overridden functions on x64
  avcodec/x86/sbrdsp_init: Disable overridden functions on x64
  avcodec/x86/idctdsp_init: Disable overridden functions on x64
  avcodec/x86/blockdsp_init: Disable overridden functions on x64
  avcodec/x86/pixblockdsp_init: Disable overridden functions on x64
  avcodec/x86/lossless_audiodsp_init: Disable overridden functions on
    x64
  avcodec/x86/svq1enc_init: Disable overridden functions on x64
  avcodec/x86/fmtconvert_init: Disable overridden functions on x64
  avcodec/x86/hpeldsp_vp3_init: Disable overridden functions on x64
  avcodec/x86/hpeldsp_init: Disable overridden functions on x64
  avcodec/x86/h264_qpel: Make functions only used here static
  avcodec/x86/h264_qpel: Disable overridden functions on x64
  avcodec/x86/h264chroma_init: Disable overridden functions on x64
  swresample/x86/audio_convert_init: Disable overridden functions on x64
  swresample/x86/rematrix_init: Disable overridden functions on x64
  swscale/x86/rgb2rgb: Disable overridden functions on x64
  swscale/x86/yuv2rgb: Disable overridden functions on x64
  swscale/x86/swscale: Disable overridden functions on x64
  avfilter/x86/vf_eq_init: Disable overridden functions on x64
  avutil/x86/pixelutils_init: Disable overridden functions on x64

 libavcodec/hevcdec.c                    | 10 +--
 libavcodec/tests/x86/dct.c              |  4 +-
 libavcodec/v4l2_m2m.c                   | 76 ---------------------
 libavcodec/videodsp.c                   |  4 ++
 libavcodec/videodsp.h                   |  1 -
 libavcodec/videodsp_template.c          |  1 +
 libavcodec/x86/ac3dsp.asm               |  5 ++
 libavcodec/x86/ac3dsp_init.c            |  2 +
 libavcodec/x86/audiodsp.asm             |  4 ++
 libavcodec/x86/audiodsp_init.c          |  2 +
 libavcodec/x86/blockdsp.asm             |  4 ++
 libavcodec/x86/blockdsp_init.c          |  2 +
 libavcodec/x86/cavsdsp.c                | 20 ++++--
 libavcodec/x86/cavsidct.asm             |  2 +
 libavcodec/x86/diracdsp_init.c          |  4 +-
 libavcodec/x86/fdct.c                   | 12 ++--
 libavcodec/x86/fdctdsp_init.c           |  2 +
 libavcodec/x86/fmtconvert.asm           |  4 ++
 libavcodec/x86/fmtconvert_init.c        |  2 +
 libavcodec/x86/fpel.asm                 |  3 +-
 libavcodec/x86/h264_chromamc.asm        |  4 +-
 libavcodec/x86/h264_deblock.asm         | 24 ++-----
 libavcodec/x86/h264_idct.asm            | 57 ++++------------
 libavcodec/x86/h264_intrapred.asm       | 26 ++++++++
 libavcodec/x86/h264_intrapred_10bit.asm | 16 +++++
 libavcodec/x86/h264_intrapred_init.c    | 20 ++++--
 libavcodec/x86/h264_qpel.c              | 88 ++++++++++++++-----------
 libavcodec/x86/h264_qpel_8bit.asm       |  5 +-
 libavcodec/x86/h264_weight.asm          |  8 +++
 libavcodec/x86/h264chroma_init.c        |  2 +
 libavcodec/x86/h264dsp_init.c           | 21 ++++--
 libavcodec/x86/hevc_idct.asm            |  2 +
 libavcodec/x86/hevcdsp_init.c           |  7 +-
 libavcodec/x86/hpeldsp.asm              | 22 +++++++
 libavcodec/x86/hpeldsp_init.c           | 40 +++++++----
 libavcodec/x86/hpeldsp_vp3.asm          |  4 ++
 libavcodec/x86/hpeldsp_vp3_init.c       |  2 +
 libavcodec/x86/idctdsp.asm              |  6 ++
 libavcodec/x86/idctdsp_init.c           |  4 ++
 libavcodec/x86/lossless_audiodsp.asm    |  2 +
 libavcodec/x86/lossless_audiodsp_init.c |  2 +
 libavcodec/x86/me_cmp.asm               |  6 ++
 libavcodec/x86/me_cmp_init.c            | 61 ++++++++++-------
 libavcodec/x86/mpegvideoenc.c           |  8 ++-
 libavcodec/x86/mpegvideoencdsp.asm      |  2 +
 libavcodec/x86/pixblockdsp.asm          |  4 ++
 libavcodec/x86/pixblockdsp_init.c       |  2 +
 libavcodec/x86/qpeldsp.asm              |  2 -
 libavcodec/x86/rnd_template.c           |  2 +
 libavcodec/x86/rv34dsp.asm              | 13 +---
 libavcodec/x86/rv40dsp.asm              |  2 +
 libavcodec/x86/rv40dsp_init.c           | 10 +--
 libavcodec/x86/sbrdsp.asm               |  2 +
 libavcodec/x86/sbrdsp_init.c            |  2 +
 libavcodec/x86/simple_idct.asm          |  2 +
 libavcodec/x86/svq1enc.asm              |  2 +
 libavcodec/x86/svq1enc_init.c           |  2 +
 libavcodec/x86/vc1dsp_init.c            | 41 ++++++++----
 libavcodec/x86/vc1dsp_loopfilter.asm    |  2 +
 libavfilter/x86/vf_eq.asm               |  2 +
 libavfilter/x86/vf_eq_init.c            |  4 ++
 libavfilter/x86/vf_noise.c              |  4 ++
 libavutil/x86/pixelutils.asm            |  4 ++
 libavutil/x86/pixelutils_init.c         |  4 ++
 libswresample/x86/audio_convert.asm     |  2 +
 libswresample/x86/audio_convert_init.c  |  4 ++
 libswresample/x86/rematrix.asm          |  2 +
 libswresample/x86/rematrix_init.c       |  2 +
 libswscale/x86/rgb2rgb.c                |  6 ++
 libswscale/x86/rgb2rgb_template.c       | 10 +--
 libswscale/x86/swscale.c                |  6 +-
 libswscale/x86/yuv2rgb.c                | 13 ++--
 libswscale/x86/yuv2rgb_template.c       |  5 +-
 libswscale/x86/yuv_2_rgb.asm            |  5 ++
 74 files changed, 472 insertions(+), 292 deletions(-)

Comments

Andreas Rheinhardt June 11, 2022, 8:14 p.m. UTC | #1
Andreas Rheinhardt:
> x64 requires MMX, MMXEXT, SSE and SSE2; yet there is no shortage
> of code like the following:
> 
>     if (EXTERNAL_MMX(cpu_flags)) {
>         c->ssd_int8_vs_int16 = ff_ssd_int8_vs_int16_mmx;
>     }
>     if (EXTERNAL_SSE2(cpu_flags)) {
>         c->ssd_int8_vs_int16 = ff_ssd_int8_vs_int16_sse2;
>     }
> 
> Given that SSE2 is always present on x64, the only way
> for the mmx version to be chosen in the above example
> is if SSE2 has been disabled either at compile-time
> or at runtime, i.e. it is never used unless one shoots
> oneself in the foot.
> This patchset therefore disables such functions for x64
> by #if'ing them away; x86 has not been affected. This
> saves about 140KB.
> 
> (Another way to handle this would be to remove every function
> that would be overridden if one had a processor capable of
> MMX, MMXEXT, SSE and SSE2. x86 processors not fulfilling
> this requirement (which are truely ancient nowadays)
> would still work, but would be slower, i.e. they would be treated
> as second-class citizens. This would have the advantage of
> avoiding #ifs and would lighten x86 binaries of code that is
> not used at all by the overwhelming majority of users.
> I'll update this patchset if it is preferred to do it that way.)
> 

I have now implemented this other way mentioned above (i.e. removing
stuff that is overridden if SSE2 is available altogether also for
x86-32); the result can be seen here:
https://github.com/mkver/FFmpeg/commits/mmx2
I prefer this to the old version because of the reduced complexity which
dwarfs the potential to slow down some ancient systems a bit (if these
ancient systems use an up-to-date FFmpeg which is quite unlikely).
Furthermore, some of the MMX scale functions that are removed are
buggy/not bixexact. See
https://github.com/mkver/FFmpeg/commit/c5513ad962100040601b5eba0042692a740ac50a
(or shall I post these patches?)
Is anyone against this removal?

- Andreas
Andreas Rheinhardt June 20, 2022, 11:16 a.m. UTC | #2
Andreas Rheinhardt:
> Andreas Rheinhardt:
>> x64 requires MMX, MMXEXT, SSE and SSE2; yet there is no shortage
>> of code like the following:
>>
>>     if (EXTERNAL_MMX(cpu_flags)) {
>>         c->ssd_int8_vs_int16 = ff_ssd_int8_vs_int16_mmx;
>>     }
>>     if (EXTERNAL_SSE2(cpu_flags)) {
>>         c->ssd_int8_vs_int16 = ff_ssd_int8_vs_int16_sse2;
>>     }
>>
>> Given that SSE2 is always present on x64, the only way
>> for the mmx version to be chosen in the above example
>> is if SSE2 has been disabled either at compile-time
>> or at runtime, i.e. it is never used unless one shoots
>> oneself in the foot.
>> This patchset therefore disables such functions for x64
>> by #if'ing them away; x86 has not been affected. This
>> saves about 140KB.
>>
>> (Another way to handle this would be to remove every function
>> that would be overridden if one had a processor capable of
>> MMX, MMXEXT, SSE and SSE2. x86 processors not fulfilling
>> this requirement (which are truely ancient nowadays)
>> would still work, but would be slower, i.e. they would be treated
>> as second-class citizens. This would have the advantage of
>> avoiding #ifs and would lighten x86 binaries of code that is
>> not used at all by the overwhelming majority of users.
>> I'll update this patchset if it is preferred to do it that way.)
>>
> 
> I have now implemented this other way mentioned above (i.e. removing
> stuff that is overridden if SSE2 is available altogether also for
> x86-32); the result can be seen here:
> https://github.com/mkver/FFmpeg/commits/mmx2
> I prefer this to the old version because of the reduced complexity which
> dwarfs the potential to slow down some ancient systems a bit (if these
> ancient systems use an up-to-date FFmpeg which is quite unlikely).
> Furthermore, some of the MMX scale functions that are removed are
> buggy/not bixexact. See
> https://github.com/mkver/FFmpeg/commit/c5513ad962100040601b5eba0042692a740ac50a
> (or shall I post these patches?)
> Is anyone against this removal?
> 

Given that no one was against removing these old functions, but several
people (on IRC) supported the idea I will go ahead and do it. I will
apply https://github.com/mkver/FFmpeg/commits/mmx3 (an updated and
extended version of the branch linked to above) in two days unless there
are objections.
(I can also send this to the mailing list if desired.)

- Andreas