Message ID | 20220331172351.550818-1-bavison@riscosopen.org |
---|---|
Series | avcodec/vc1: Arm optimisations |

On Thu, 31 Mar 2022, Ben Avison wrote:

> The VC1 decoder was missing lots of important fast paths for Arm, especially
> for 64-bit Arm. This submission fills in implementations for all functions
> where a fast path already existed and the fallback C implementation was
> taking 1% or more of the runtime, and adds a new fast path to permit
> vc1_unescape_buffer() to be overridden.
>
> I've measured the playback speed on a 1.5 GHz Cortex-A72 (Raspberry Pi 4)
> using `ffmpeg -i <bitstream> -f null -` for a couple of example streams:
>
> Architecture:  AArch32  AArch32  AArch64  AArch64
> Stream:        1        2        1        2
> Before speed:  1.22x    0.82x    1.00x    0.67x
> After speed:   1.31x    0.98x    1.39x    1.06x
> Improvement:   7.4%     20%      39%      58%
>
> `make fate` passes on both AArch32 and AArch64.
>
> Changes in v2:
>
> * Refactor checkasm tests to convert some macros into functions.
> * Remove cast-to-void of checked_call.
> * Limit 16-bit values in idctdsp checkasm test to +/-0x100.
> * Reinstate ff_add_pixels_clamped_arm.
> * Adapt vc1 deblocking filters to specify stride as ptrdiff_t.
> * Add align specifiers to a few VLD/VST instructions for AArch32 deblocking
>   filter, and adapt checkasm test not to test with tighter alignment than is
>   encountered in normal use.
> * Correct unescape buffer memcmp length.
> * Update benchmarks for AArch64 idctdsp.

Thanks! From a quick readthrough, this version of the patchset seems good to
me! I'll run it through some more testing, and push it if everything seems to
work fine (tomorrow or so).

// Martin

On Fri, 1 Apr 2022, Martin Storsjö wrote:

> On Thu, 31 Mar 2022, Ben Avison wrote:
>
>> The VC1 decoder was missing lots of important fast paths for Arm, especially
>> for 64-bit Arm. This submission fills in implementations for all functions
>> where a fast path already existed and the fallback C implementation was
>> taking 1% or more of the runtime, and adds a new fast path to permit
>> vc1_unescape_buffer() to be overridden.
>>
>> I've measured the playback speed on a 1.5 GHz Cortex-A72 (Raspberry Pi 4)
>> using `ffmpeg -i <bitstream> -f null -` for a couple of example streams:
>>
>> Architecture:  AArch32  AArch32  AArch64  AArch64
>> Stream:        1        2        1        2
>> Before speed:  1.22x    0.82x    1.00x    0.67x
>> After speed:   1.31x    0.98x    1.39x    1.06x
>> Improvement:   7.4%     20%      39%      58%
>>
>> `make fate` passes on both AArch32 and AArch64.
>>
>> Changes in v2:
>>
>> * Refactor checkasm tests to convert some macros into functions.
>> * Remove cast-to-void of checked_call.
>> * Limit 16-bit values in idctdsp checkasm test to +/-0x100.
>> * Reinstate ff_add_pixels_clamped_arm.
>> * Adapt vc1 deblocking filters to specify stride as ptrdiff_t.
>> * Add align specifiers to a few VLD/VST instructions for AArch32 deblocking
>>   filter, and adapt checkasm test not to test with tighter alignment than is
>>   encountered in normal use.
>> * Correct unescape buffer memcmp length.
>> * Update benchmarks for AArch64 idctdsp.
>
> Thanks! From a quick readthrough, this version of the patchset seems good to
> me! I'll run it through some more testing, and push it if everything seems to
> work fine (tomorrow or so).

Pushed now - thanks for your contribution!

// Martin
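
For reference, the "Improvement" row in the benchmark table of the cover letter appears to be the relative change in playback speed, (after / before - 1), quoted to two significant figures. The short sketch below recomputes it from the before/after speeds given in the thread. The dictionary layout and the function name `improvement_percent` are assumptions made for illustration only; they are not part of the patchset or of FFmpeg.

```python
# Playback speeds copied from the benchmark table in the cover letter,
# as (before, after) multiples of real time. The layout of this dictionary
# is an assumption made for this sketch.
speeds = {
    ("AArch32", 1): (1.22, 1.31),
    ("AArch32", 2): (0.82, 0.98),
    ("AArch64", 1): (1.00, 1.39),
    ("AArch64", 2): (0.67, 1.06),
}


def improvement_percent(before, after):
    """Relative speed-up of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


# Round to one decimal place; the cover letter quotes two significant figures.
results = {
    key: round(improvement_percent(before, after), 1)
    for key, (before, after) in speeds.items()
}

for (arch, stream), pct in results.items():
    print(f"{arch} stream {stream}: {pct}% faster")
```

Read this way, the figures agree with the table after rounding. They also show that both AArch64 streams move from at or below real time (1.00x and 0.67x) to above it (1.39x and 1.06x) on the Cortex-A72 used for the measurements.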