
[FFmpeg-devel,1/4] avcodec/x86/vvc: add alf filter luma and chroma avx2 optimizations

Message ID OSZP286MB217310EB25FEC0B1113B8CE4CA1B2@OSZP286MB2173.JPNP286.PROD.OUTLOOK.COM

Commit Message

Wu Jianhua April 29, 2024, 3:24 p.m. UTC
From: Wu Jianhua <toqsxw@outlook.com>

vvc_alf_filter_chroma_4x4_10_c: 657.0
vvc_alf_filter_chroma_4x4_10_avx2: 138.0
vvc_alf_filter_chroma_4x8_10_c: 1264.7
vvc_alf_filter_chroma_4x8_10_avx2: 253.5
vvc_alf_filter_chroma_4x12_10_c: 1841.7
vvc_alf_filter_chroma_4x12_10_avx2: 375.5
vvc_alf_filter_chroma_4x16_10_c: 2442.7
vvc_alf_filter_chroma_4x16_10_avx2: 491.7
vvc_alf_filter_chroma_4x20_10_c: 3057.0
vvc_alf_filter_chroma_4x20_10_avx2: 607.2
vvc_alf_filter_chroma_4x24_10_c: 3667.0
vvc_alf_filter_chroma_4x24_10_avx2: 747.5
vvc_alf_filter_chroma_4x28_10_c: 4286.7
vvc_alf_filter_chroma_4x28_10_avx2: 849.0
vvc_alf_filter_chroma_4x32_10_c: 4886.0
vvc_alf_filter_chroma_4x32_10_avx2: 967.5
vvc_alf_filter_chroma_8x4_10_c: 1250.5
vvc_alf_filter_chroma_8x4_10_avx2: 261.0
vvc_alf_filter_chroma_8x8_10_c: 2430.7
vvc_alf_filter_chroma_8x8_10_avx2: 494.7
vvc_alf_filter_chroma_8x12_10_c: 3631.2
vvc_alf_filter_chroma_8x12_10_avx2: 734.5
vvc_alf_filter_chroma_8x16_10_c: 13675.7
vvc_alf_filter_chroma_8x16_10_avx2: 972.0
vvc_alf_filter_chroma_8x20_10_c: 6212.0
vvc_alf_filter_chroma_8x20_10_avx2: 1211.0
vvc_alf_filter_chroma_8x24_10_c: 7440.7
vvc_alf_filter_chroma_8x24_10_avx2: 1447.0
vvc_alf_filter_chroma_8x28_10_c: 8460.5
vvc_alf_filter_chroma_8x28_10_avx2: 1682.5
vvc_alf_filter_chroma_8x32_10_c: 9665.2
vvc_alf_filter_chroma_8x32_10_avx2: 1917.7
vvc_alf_filter_chroma_12x4_10_c: 1865.2
vvc_alf_filter_chroma_12x4_10_avx2: 391.7
vvc_alf_filter_chroma_12x8_10_c: 3625.2
vvc_alf_filter_chroma_12x8_10_avx2: 739.0
vvc_alf_filter_chroma_12x12_10_c: 5427.5
vvc_alf_filter_chroma_12x12_10_avx2: 1094.2
vvc_alf_filter_chroma_12x16_10_c: 7237.7
vvc_alf_filter_chroma_12x16_10_avx2: 1447.2
vvc_alf_filter_chroma_12x20_10_c: 9035.2
vvc_alf_filter_chroma_12x20_10_avx2: 1805.2
vvc_alf_filter_chroma_12x24_10_c: 11135.7
vvc_alf_filter_chroma_12x24_10_avx2: 2158.2
vvc_alf_filter_chroma_12x28_10_c: 12644.0
vvc_alf_filter_chroma_12x28_10_avx2: 2511.2
vvc_alf_filter_chroma_12x32_10_c: 14441.7
vvc_alf_filter_chroma_12x32_10_avx2: 2888.0
vvc_alf_filter_chroma_16x4_10_c: 2410.0
vvc_alf_filter_chroma_16x4_10_avx2: 251.7
vvc_alf_filter_chroma_16x8_10_c: 4943.0
vvc_alf_filter_chroma_16x8_10_avx2: 479.0
vvc_alf_filter_chroma_16x12_10_c: 7235.5
vvc_alf_filter_chroma_16x12_10_avx2: 9751.0
vvc_alf_filter_chroma_16x16_10_c: 10142.7
vvc_alf_filter_chroma_16x16_10_avx2: 935.5
vvc_alf_filter_chroma_16x20_10_c: 12029.0
vvc_alf_filter_chroma_16x20_10_avx2: 1174.5
vvc_alf_filter_chroma_16x24_10_c: 14414.2
vvc_alf_filter_chroma_16x24_10_avx2: 1410.5
vvc_alf_filter_chroma_16x28_10_c: 16813.0
vvc_alf_filter_chroma_16x28_10_avx2: 1713.0
vvc_alf_filter_chroma_16x32_10_c: 19228.5
vvc_alf_filter_chroma_16x32_10_avx2: 2256.0
vvc_alf_filter_chroma_20x4_10_c: 3015.2
vvc_alf_filter_chroma_20x4_10_avx2: 371.7
vvc_alf_filter_chroma_20x8_10_c: 6170.2
vvc_alf_filter_chroma_20x8_10_avx2: 721.0
vvc_alf_filter_chroma_20x12_10_c: 9019.7
vvc_alf_filter_chroma_20x12_10_avx2: 1102.7
vvc_alf_filter_chroma_20x16_10_c: 12040.2
vvc_alf_filter_chroma_20x16_10_avx2: 1422.5
vvc_alf_filter_chroma_20x20_10_c: 15010.7
vvc_alf_filter_chroma_20x20_10_avx2: 1765.7
vvc_alf_filter_chroma_20x24_10_c: 18017.7
vvc_alf_filter_chroma_20x24_10_avx2: 2124.7
vvc_alf_filter_chroma_20x28_10_c: 21025.5
vvc_alf_filter_chroma_20x28_10_avx2: 2488.2
vvc_alf_filter_chroma_20x32_10_c: 31128.5
vvc_alf_filter_chroma_20x32_10_avx2: 3205.2
vvc_alf_filter_chroma_24x4_10_c: 3701.2
vvc_alf_filter_chroma_24x4_10_avx2: 494.7
vvc_alf_filter_chroma_24x8_10_c: 7613.0
vvc_alf_filter_chroma_24x8_10_avx2: 957.2
vvc_alf_filter_chroma_24x12_10_c: 10816.7
vvc_alf_filter_chroma_24x12_10_avx2: 1427.7
vvc_alf_filter_chroma_24x16_10_c: 14390.5
vvc_alf_filter_chroma_24x16_10_avx2: 1948.2
vvc_alf_filter_chroma_24x20_10_c: 17989.5
vvc_alf_filter_chroma_24x20_10_avx2: 2363.7
vvc_alf_filter_chroma_24x24_10_c: 21581.7
vvc_alf_filter_chroma_24x24_10_avx2: 2839.7
vvc_alf_filter_chroma_24x28_10_c: 25179.2
vvc_alf_filter_chroma_24x28_10_avx2: 3313.2
vvc_alf_filter_chroma_24x32_10_c: 28776.2
vvc_alf_filter_chroma_24x32_10_avx2: 4154.7
vvc_alf_filter_chroma_28x4_10_c: 4331.2
vvc_alf_filter_chroma_28x4_10_avx2: 624.2
vvc_alf_filter_chroma_28x8_10_c: 8445.0
vvc_alf_filter_chroma_28x8_10_avx2: 1197.7
vvc_alf_filter_chroma_28x12_10_c: 12684.5
vvc_alf_filter_chroma_28x12_10_avx2: 1786.7
vvc_alf_filter_chroma_28x16_10_c: 16924.5
vvc_alf_filter_chroma_28x16_10_avx2: 2378.7
vvc_alf_filter_chroma_28x20_10_c: 38361.0
vvc_alf_filter_chroma_28x20_10_avx2: 2967.0
vvc_alf_filter_chroma_28x24_10_c: 25329.0
vvc_alf_filter_chroma_28x24_10_avx2: 3564.2
vvc_alf_filter_chroma_28x28_10_c: 29514.0
vvc_alf_filter_chroma_28x28_10_avx2: 4151.7
vvc_alf_filter_chroma_28x32_10_c: 33673.2
vvc_alf_filter_chroma_28x32_10_avx2: 5125.0
vvc_alf_filter_chroma_32x4_10_c: 4945.2
vvc_alf_filter_chroma_32x4_10_avx2: 485.7
vvc_alf_filter_chroma_32x8_10_c: 9658.7
vvc_alf_filter_chroma_32x8_10_avx2: 943.7
vvc_alf_filter_chroma_32x12_10_c: 16177.7
vvc_alf_filter_chroma_32x12_10_avx2: 1443.7
vvc_alf_filter_chroma_32x16_10_c: 19336.0
vvc_alf_filter_chroma_32x16_10_avx2: 1876.0
vvc_alf_filter_chroma_32x20_10_c: 24153.0
vvc_alf_filter_chroma_32x20_10_avx2: 2323.0
vvc_alf_filter_chroma_32x24_10_c: 28917.7
vvc_alf_filter_chroma_32x24_10_avx2: 2806.2
vvc_alf_filter_chroma_32x28_10_c: 33738.7
vvc_alf_filter_chroma_32x28_10_avx2: 3454.0
vvc_alf_filter_chroma_32x32_10_c: 38531.5
vvc_alf_filter_chroma_32x32_10_avx2: 4103.2
vvc_alf_filter_luma_4x4_10_c: 1076.2
vvc_alf_filter_luma_4x4_10_avx2: 240.0
vvc_alf_filter_luma_4x8_10_c: 2113.2
vvc_alf_filter_luma_4x8_10_avx2: 454.5
vvc_alf_filter_luma_4x12_10_c: 3179.2
vvc_alf_filter_luma_4x12_10_avx2: 669.0
vvc_alf_filter_luma_4x16_10_c: 4146.5
vvc_alf_filter_luma_4x16_10_avx2: 885.0
vvc_alf_filter_luma_4x20_10_c: 5168.2
vvc_alf_filter_luma_4x20_10_avx2: 1106.0
vvc_alf_filter_luma_4x24_10_c: 6168.2
vvc_alf_filter_luma_4x24_10_avx2: 1357.0
vvc_alf_filter_luma_4x28_10_c: 7330.0
vvc_alf_filter_luma_4x28_10_avx2: 1539.5
vvc_alf_filter_luma_4x32_10_c: 8202.0
vvc_alf_filter_luma_4x32_10_avx2: 1803.7
vvc_alf_filter_luma_8x4_10_c: 2100.5
vvc_alf_filter_luma_8x4_10_avx2: 479.7
vvc_alf_filter_luma_8x8_10_c: 4079.5
vvc_alf_filter_luma_8x8_10_avx2: 898.2
vvc_alf_filter_luma_8x12_10_c: 6209.2
vvc_alf_filter_luma_8x12_10_avx2: 1328.7
vvc_alf_filter_luma_8x16_10_c: 8177.5
vvc_alf_filter_luma_8x16_10_avx2: 1765.0
vvc_alf_filter_luma_8x20_10_c: 10400.5
vvc_alf_filter_luma_8x20_10_avx2: 2196.2
vvc_alf_filter_luma_8x24_10_c: 12222.7
vvc_alf_filter_luma_8x24_10_avx2: 2626.0
vvc_alf_filter_luma_8x28_10_c: 14235.5
vvc_alf_filter_luma_8x28_10_avx2: 3065.2
vvc_alf_filter_luma_8x32_10_c: 16702.2
vvc_alf_filter_luma_8x32_10_avx2: 3494.2
vvc_alf_filter_luma_12x4_10_c: 3142.0
vvc_alf_filter_luma_12x4_10_avx2: 699.5
vvc_alf_filter_luma_12x8_10_c: 6093.2
vvc_alf_filter_luma_12x8_10_avx2: 1335.5
vvc_alf_filter_luma_12x12_10_c: 9098.7
vvc_alf_filter_luma_12x12_10_avx2: 1988.5
vvc_alf_filter_luma_12x16_10_c: 12237.5
vvc_alf_filter_luma_12x16_10_avx2: 2635.0
vvc_alf_filter_luma_12x20_10_c: 15240.7
vvc_alf_filter_luma_12x20_10_avx2: 3289.5
vvc_alf_filter_luma_12x24_10_c: 18262.0
vvc_alf_filter_luma_12x24_10_avx2: 3937.2
vvc_alf_filter_luma_12x28_10_c: 21283.0
vvc_alf_filter_luma_12x28_10_avx2: 4585.2
vvc_alf_filter_luma_12x32_10_c: 24299.7
vvc_alf_filter_luma_12x32_10_avx2: 5333.5
vvc_alf_filter_luma_16x4_10_c: 5729.7
vvc_alf_filter_luma_16x4_10_avx2: 446.2
vvc_alf_filter_luma_16x8_10_c: 8256.5
vvc_alf_filter_luma_16x8_10_avx2: 876.7
vvc_alf_filter_luma_16x12_10_c: 12178.7
vvc_alf_filter_luma_16x12_10_avx2: 1332.7
vvc_alf_filter_luma_16x16_10_c: 16262.5
vvc_alf_filter_luma_16x16_10_avx2: 1734.5
vvc_alf_filter_luma_16x20_10_c: 20263.7
vvc_alf_filter_luma_16x20_10_avx2: 2147.2
vvc_alf_filter_luma_16x24_10_c: 24789.7
vvc_alf_filter_luma_16x24_10_avx2: 2591.7
vvc_alf_filter_luma_16x28_10_c: 28894.5
vvc_alf_filter_luma_16x28_10_avx2: 3228.7
vvc_alf_filter_luma_16x32_10_c: 33360.0
vvc_alf_filter_luma_16x32_10_avx2: 4117.5
vvc_alf_filter_luma_20x4_10_c: 5076.0
vvc_alf_filter_luma_20x4_10_avx2: 674.2
vvc_alf_filter_luma_20x8_10_c: 10138.2
vvc_alf_filter_luma_20x8_10_avx2: 1323.5
vvc_alf_filter_luma_20x12_10_c: 15171.5
vvc_alf_filter_luma_20x12_10_avx2: 2026.5
vvc_alf_filter_luma_20x16_10_c: 20315.0
vvc_alf_filter_luma_20x16_10_avx2: 2611.0
vvc_alf_filter_luma_20x20_10_c: 25367.0
vvc_alf_filter_luma_20x20_10_avx2: 3259.5
vvc_alf_filter_luma_20x24_10_c: 30443.5
vvc_alf_filter_luma_20x24_10_avx2: 3898.5
vvc_alf_filter_luma_20x28_10_c: 35439.7
vvc_alf_filter_luma_20x28_10_avx2: 4645.5
vvc_alf_filter_luma_20x32_10_c: 40609.0
vvc_alf_filter_luma_20x32_10_avx2: 5849.0
vvc_alf_filter_luma_24x4_10_c: 6245.5
vvc_alf_filter_luma_24x4_10_avx2: 901.2
vvc_alf_filter_luma_24x8_10_c: 12166.7
vvc_alf_filter_luma_24x8_10_avx2: 1754.7
vvc_alf_filter_luma_24x12_10_c: 18223.2
vvc_alf_filter_luma_24x12_10_avx2: 2621.5
vvc_alf_filter_luma_24x16_10_c: 24287.2
vvc_alf_filter_luma_24x16_10_avx2: 3474.2
vvc_alf_filter_luma_24x20_10_c: 38042.2
vvc_alf_filter_luma_24x20_10_avx2: 4335.7
vvc_alf_filter_luma_24x24_10_c: 36462.0
vvc_alf_filter_luma_24x24_10_avx2: 5199.5
vvc_alf_filter_luma_24x28_10_c: 42502.7
vvc_alf_filter_luma_24x28_10_avx2: 6133.5
vvc_alf_filter_luma_24x32_10_c: 48675.5
vvc_alf_filter_luma_24x32_10_avx2: 7575.0
vvc_alf_filter_luma_28x4_10_c: 7101.5
vvc_alf_filter_luma_28x4_10_avx2: 1128.2
vvc_alf_filter_luma_28x8_10_c: 14185.7
vvc_alf_filter_luma_28x8_10_avx2: 2189.0
vvc_alf_filter_luma_28x12_10_c: 21278.7
vvc_alf_filter_luma_28x12_10_avx2: 3347.2
vvc_alf_filter_luma_28x16_10_c: 28338.2
vvc_alf_filter_luma_28x16_10_avx2: 4462.7
vvc_alf_filter_luma_28x20_10_c: 37076.7
vvc_alf_filter_luma_28x20_10_avx2: 5729.0
vvc_alf_filter_luma_28x24_10_c: 42612.2
vvc_alf_filter_luma_28x24_10_avx2: 6508.7
vvc_alf_filter_luma_28x28_10_c: 49686.0
vvc_alf_filter_luma_28x28_10_avx2: 7666.0
vvc_alf_filter_luma_28x32_10_c: 65345.2
vvc_alf_filter_luma_28x32_10_avx2: 9330.2
vvc_alf_filter_luma_32x4_10_c: 8329.5
vvc_alf_filter_luma_32x4_10_avx2: 887.7
vvc_alf_filter_luma_32x8_10_c: 16941.7
vvc_alf_filter_luma_32x8_10_avx2: 1736.0
vvc_alf_filter_luma_32x12_10_c: 73347.7
vvc_alf_filter_luma_32x12_10_avx2: 2584.2
vvc_alf_filter_luma_32x16_10_c: 32359.5
vvc_alf_filter_luma_32x16_10_avx2: 3442.7
vvc_alf_filter_luma_32x20_10_c: 40482.5
vvc_alf_filter_luma_32x20_10_avx2: 4318.5
vvc_alf_filter_luma_32x24_10_c: 48674.7
vvc_alf_filter_luma_32x24_10_avx2: 5174.2
vvc_alf_filter_luma_32x28_10_c: 56715.7
vvc_alf_filter_luma_32x28_10_avx2: 6124.5
vvc_alf_filter_luma_32x32_10_c: 66720.0
vvc_alf_filter_luma_32x32_10_avx2: 7577.2

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
---
 libavcodec/x86/vvc/Makefile      |   3 +-
 libavcodec/x86/vvc/vvc_alf.asm   | 441 +++++++++++++++++++++++++++++++
 libavcodec/x86/vvc/vvcdsp_init.c |  49 ++++
 3 files changed, 492 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/x86/vvc/vvc_alf.asm

Comments

Andreas Rheinhardt April 29, 2024, 4:19 p.m. UTC | #1
toqsxw@outlook.com:
> vvc_alf_filter_chroma_16x12_10_c: 7235.5
> vvc_alf_filter_chroma_16x12_10_avx2: 9751.0

Are these numbers correct? If so, the avx2 version should not be committed.

- Andreas
Nuo Mi April 30, 2024, 6:03 p.m. UTC | #2
On Tue, Apr 30, 2024 at 12:34 AM Andreas Rheinhardt <
andreas.rheinhardt@outlook.com> wrote:

> toqsxw@outlook.com:
> > vvc_alf_filter_chroma_16x12_10_c: 7235.5
> > vvc_alf_filter_chroma_16x12_10_avx2: 9751.0
>
> Are these numbers correct? If so, the avx2 version should not be committed.
>
It could be system turbulence; the data around it appears to be correct.

Hi Jianhua,
Maybe you can test it again.
Thank you for the patch. Now we can smoothly play 4K@60 content on a modern i7.

>
> - Andreas
>
Wu Jianhua May 1, 2024, 11:02 a.m. UTC | #3
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> on behalf of Nuo Mi <nuomi2021@gmail.com>
> Sent: April 30, 2024, 11:03
> To: FFmpeg development discussions and patches
> Subject: Re: [FFmpeg-devel] [PATCH 1/4] avcodec/x86/vvc: add alf filter luma and chroma avx2 optimizations
> 
> On Tue, Apr 30, 2024 at 12:34 AM Andreas Rheinhardt <
> andreas.rheinhardt@outlook.com> wrote:
> 
> > toqsxw@outlook.com:
> > > vvc_alf_filter_chroma_16x12_10_c: 7235.5
> > > vvc_alf_filter_chroma_16x12_10_avx2: 9751.0
> >
> > Are these numbers correct? If so, the avx2 version should not be committed.
> >
> It could be system turbulence; the data around it appears to be correct.
> 
> Hi Jianhua,
> Maybe you can test it again.
> Thank you for the patch. Now we can smoothly play 4K@60 content on a modern i7.
> 

Sure. I will rerun the performance test without other high-CPU-usage processes running and resend the v2.
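
For reference, these numbers are typically produced with FFmpeg's checkasm tool. A minimal sketch of such a run, assuming the vvc_alf checkasm test and these flags exist in your tree:

    make tests/checkasm/checkasm
    tests/checkasm/checkasm --test=vvc_alf --bench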

Patch

diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile
index d1623bd46a..d6a66f860a 100644
--- a/libavcodec/x86/vvc/Makefile
+++ b/libavcodec/x86/vvc/Makefile
@@ -3,5 +3,6 @@  clean::
 
 OBJS-$(CONFIG_VVC_DECODER)             += x86/vvc/vvcdsp_init.o \
                                           x86/h26x/h2656dsp.o
-X86ASM-OBJS-$(CONFIG_VVC_DECODER)      += x86/vvc/vvc_mc.o       \
+X86ASM-OBJS-$(CONFIG_VVC_DECODER)      += x86/vvc/vvc_alf.o      \
+                                          x86/vvc/vvc_mc.o       \
                                           x86/h26x/h2656_inter.o
diff --git a/libavcodec/x86/vvc/vvc_alf.asm b/libavcodec/x86/vvc/vvc_alf.asm
new file mode 100644
index 0000000000..cb1c86d1e5
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_alf.asm
@@ -0,0 +1,441 @@ 
+;******************************************************************************
+;* VVC Adaptive Loop Filter SIMD optimizations
+;*
+;* Copyright (c) 2023-2024 Nuo Mi <nuomi2021@gmail.com>
+;* Copyright (c) 2023-2024 Wu Jianhua <toqsxw@outlook.com>
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;******************************************************************************
+
+%include "libavutil/x86/x86util.asm"
+
+SECTION_RODATA
+
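+; PARAM_SHUFFE n emits a pshufb mask that, within each 128-bit lane,
+; replicates source word n across the low 8 bytes and word n+4 across
+; the high 8 bytes (selector word j picks bytes 2n and 2n+1; j + 0x0808
+; picks the pair 8 bytes later)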
+%macro PARAM_SHUFFE 1
+%assign i (%1  * 2)
+%assign j ((i + 1) << 8) + (i)
+param_shuffe_ %+ %1:
+%rep 2
+    times 4 dw j
+    times 4 dw (j + 0x0808)
+%endrep
+%endmacro
+
+PARAM_SHUFFE 0
+PARAM_SHUFFE 1
+PARAM_SHUFFE 2
+PARAM_SHUFFE 3
+
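+; rounding biases: dw64 seeds the accumulator with 1 << (SHIFT - 1) for
+; the normal >> 7 path; dd448 adds 512 - 64 on top of it so the total
+; bias becomes 512 = 1 << 9 before the >> 10 shift used next to the
+; virtual boundary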
+dd448: times 8             dd 512 - 64
+dw64: times 8              dd 64
+
+SECTION .text
+
+
+%define ALF_NUM_COEFF_LUMA      12
+%define ALF_NUM_COEFF_CHROMA     6
+%define ALF_NUM_COEFF_CC         7
+
+;%1-%3 out
+;%4 clip or filter
+%macro LOAD_LUMA_PARAMS_W16 4
+    lea                 offsetq, [3 * xq]                       ;xq * ALF_NUM_COEFF_LUMA / ALF_BLOCK_SIZE
+    movu                    m%1, [%4q + 2 * offsetq + 0 * 32]   ; 2 * for sizeof(int16_t)
+    movu                    m%2, [%4q + 2 * offsetq + 1 * 32]
+    movu                    m%3, [%4q + 2 * offsetq + 2 * 32]
+%endmacro
+
+%macro LOAD_LUMA_PARAMS_W16 6
+    LOAD_LUMA_PARAMS_W16    %1, %2, %3, %4
+    ;m%1 = 03 02 01 00
+    ;m%2 = 07 06 05 04
+    ;m%3 = 11 10 09 08
+
+    vshufpd                 m%5, m%1, m%2, 0011b        ;06 02 05 01
+    vshufpd                 m%6, m%3, m%5, 1001b        ;06 10 01 09
+
+    vshufpd                 m%1, m%1, m%6, 1100b        ;06 03 09 00
+    vshufpd                 m%2, m%2, m%6, 0110b        ;10 07 01 04
+    vshufpd                 m%3, m%3, m%5, 0110b        ;02 11 05 08
+
+    vpermpd                 m%1, m%1, 01111000b         ;09 06 03 00
+    vshufpd                 m%2, m%2, m%2, 1001b        ;10 07 04 01
+    vpermpd                 m%3, m%3, 10000111b         ;11 08 05 02
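+
+    ; each register now holds one qword (four coefficients) per 4x4 block:
+    ; m%1 = coeffs 0..3, m%2 = coeffs 4..7, m%3 = coeffs 8..11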
+%endmacro
+
+; %1-%3 out
+; %4    clip or filter
+; %5-%6 tmp
+%macro LOAD_LUMA_PARAMS 6
+    LOAD_LUMA_PARAMS_W16 %1, %2, %3, %4, %5, %6
+%endmacro
+
+%macro LOAD_CHROMA_PARAMS 4
+    ; LOAD_CHROMA_PARAMS_W %+ WIDTH %1, %2, %3, %4
+    movq                   xm%1, [%3q]
+    movd                   xm%2, [%3q + 8]
+    vpbroadcastq            m%1, xm%1
+    vpbroadcastq            m%2, xm%2
+%endmacro
+
+%macro LOAD_PARAMS 0
+%if LUMA
+    LOAD_LUMA_PARAMS          3, 4, 5, filter, 6, 7
+    LOAD_LUMA_PARAMS          6, 7, 8, clip,   9, 10
+%else
+    LOAD_CHROMA_PARAMS        3, 4, filter, 5
+    LOAD_CHROMA_PARAMS        6, 7, clip, 8
+%endif
+%endmacro
+
+; FILTER(param_idx)
+; input:   m2, m9, m10
+; output:  m0, m1
+; tmp:     m11-m13
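+; accumulates filter[i] * (clip(above - cur) + clip(below - cur)) into
+; the dword sums in m0/m1; m9/m10 hold the two mirrored neighbor rows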
+%macro FILTER 1
+    %assign i (%1 % 4)
+    %assign j (%1 / 4 + 3)
+    %assign k (%1 / 4 + 6)
+    %define filters m %+ j
+    %define clips m %+ k
+
+    pshufb           m12, clips, [param_shuffe_ %+ i]        ;clip
+    pxor             m11, m11
+    psubw            m11, m12                                ;-clip
+
+    vpsubw            m9, m2
+    CLIPW             m9, m11, m12
+
+    vpsubw           m10, m2
+    CLIPW            m10, m11, m12
+
+    vpunpckhwd       m13, m9, m10
+    vpunpcklwd        m9, m9, m10
+
+    pshufb           m12, filters, [param_shuffe_ %+ i]       ;filter
+    vpunpcklwd       m10, m12, m12
+    vpunpckhwd       m12, m12, m12
+
+    vpmaddwd          m9, m10
+    vpmaddwd         m12, m13
+
+    paddd             m0, m9
+    paddd             m1, m12
+%endmacro
+
+; FILTER(param_idx, bottom, top, byte_offset)
+; input:  param_idx, bottom, top, byte_offset
+; output: m0, m1
+; temp:   m9, m10
+%macro FILTER 4
+    LOAD_PIXELS      m10, [%2 + %4]
+    LOAD_PIXELS       m9,  [%3 - %4]
+    FILTER  %1
+%endmacro
+
+; GET_SRCS(line)
+; brief:  get source lines
+; input:  src, src_stride, vb_pos
+; output: s1...s6
+%macro GET_SRCS 1
+    lea              s1q, [srcq + src_strideq]
+    lea              s3q, [s1q  + src_strideq]
+%if LUMA
+    lea              s5q, [s3q  + src_strideq]
+%endif
+    neg      src_strideq
+    lea              s2q, [srcq + src_strideq]
+    lea              s4q, [s2q  + src_strideq]
+%if LUMA
+    lea              s6q, [s4q  + src_strideq]
+%endif
+    neg      src_strideq
+
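+    ; vb_posq is kept relative to the current 4-row group (the main loop
+    ; subtracts 4 per group); 0 or 4 for luma and 2 for chroma mean the
+    ; group touches the virtual boundary, so the source rows are clamped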
+%if LUMA
+    cmp          vb_posq, 0
+    je       %%vb_bottom
+    cmp          vb_posq, 4
+    jne         %%vb_end
+%else
+    cmp          vb_posq, 2
+    jne         %%vb_end
+    cmp               %1, 2
+    jge      %%vb_bottom
+%endif
+
+%%vb_above:
+    ; above
+    ; p1 = (y + i == vb_pos - 1) ? p0 : p1;
+    ; p2 = (y + i == vb_pos - 1) ? p0 : p2;
+    ; p3 = (y + i >= vb_pos - 2) ? p1 : p3;
+    ; p4 = (y + i >= vb_pos - 2) ? p2 : p4;
+    ; p5 = (y + i >= vb_pos - 3) ? p3 : p5;
+    ; p6 = (y + i >= vb_pos - 3) ? p4 : p6;
+    dec          vb_posq
+    cmp          vb_posq, %1
+    cmove            s1q, srcq
+    cmove            s2q, srcq
+
+    dec          vb_posq
+    cmp          vb_posq, %1
+    cmovbe           s3q, s1q
+    cmovbe           s4q, s2q
+
+    dec          vb_posq
+%if LUMA
+    cmp          vb_posq, %1
+    cmovbe           s5q, s3q
+    cmovbe           s6q, s4q
+%endif
+    add          vb_posq, 3
+    jmp         %%vb_end
+
+%%vb_bottom:
+    ; bottom
+    ; p1 = (y + i == vb_pos    ) ? p0 : p1;
+    ; p2 = (y + i == vb_pos    ) ? p0 : p2;
+    ; p3 = (y + i <= vb_pos + 1) ? p1 : p3;
+    ; p4 = (y + i <= vb_pos + 1) ? p2 : p4;
+    ; p5 = (y + i <= vb_pos + 2) ? p3 : p5;
+    ; p6 = (y + i <= vb_pos + 2) ? p4 : p6;
+    cmp          vb_posq, %1
+    cmove            s1q, srcq
+    cmove            s2q, srcq
+
+    inc          vb_posq
+    cmp          vb_posq, %1
+    cmovae           s3q, s1q
+    cmovae           s4q, s2q
+
+    inc          vb_posq
+%if LUMA
+    cmp          vb_posq, %1
+    cmovae           s5q, s3q
+    cmovae           s6q, s4q
+%endif
+    sub          vb_posq, 2
+%%vb_end:
+%endmacro
+
+; SHIFT_VB(line)
+; brief: shift filter result
+; input:  m0, m1, vb_pos
+; output: m0
+; temp:   m9
+%macro SHIFT_VB 1
+%define SHIFT 7
+%if LUMA
+    cmp               %1, 3
+    je      %%near_above
+    cmp               %1, 0
+    je      %%near_below
+    jmp          %%no_vb
+    %%near_above:
+        cmp      vb_posq, 4
+        je     %%near_vb
+        jmp      %%no_vb
+    %%near_below:
+        cmp      vb_posq, 0
+        je     %%near_vb
+%else
+    cmp               %1, 0
+    je           %%no_vb
+    cmp               %1, 3
+    je           %%no_vb
+    cmp          vb_posq, 2
+    je         %%near_vb
+%endif
+%%no_vb:
+    vpsrad            m0, SHIFT
+    vpsrad            m1, SHIFT
+    jmp      %%shift_end
+%%near_vb:
+    vpbroadcastd      m9, [dd448]
+    paddd             m0, m9
+    paddd             m1, m9
+    vpsrad            m0, SHIFT + 3
+    vpsrad            m1, SHIFT + 3
+%%shift_end:
+    vpackssdw         m0, m0, m1
+%endmacro
+
+; FILTER_VB(line)
+; brief: filter pixels for luma and chroma
+; input:  line
+; output: m0, m1
+; temp:   s0q...s1q
+%macro FILTER_VB 1
+    vpbroadcastd      m0, [dw64]
+    vpbroadcastd      m1, [dw64]
+
+    GET_SRCS %1
+%if LUMA
+    FILTER         0,  s5q,  s6q,  0 * ps
+    FILTER         1,  s3q,  s4q,  1 * ps
+    FILTER         2,  s3q,  s4q,  0 * ps
+    FILTER         3,  s3q,  s4q, -1 * ps
+    FILTER         4,  s1q,  s2q,  2 * ps
+    FILTER         5,  s1q,  s2q,  1 * ps
+    FILTER         6,  s1q,  s2q,  0 * ps
+    FILTER         7,  s1q,  s2q, -1 * ps
+    FILTER         8,  s1q,  s2q, -2 * ps
+    FILTER         9, srcq, srcq,  3 * ps
+    FILTER        10, srcq, srcq,  2 * ps
+    FILTER        11, srcq, srcq,  1 * ps
+%else
+    FILTER         0,  s3q,  s4q,  0 * ps
+    FILTER         1,  s1q,  s2q,  1 * ps
+    FILTER         2,  s1q,  s2q,  0 * ps
+    FILTER         3,  s1q,  s2q, -1 * ps
+    FILTER         4, srcq, srcq,  2 * ps
+    FILTER         5, srcq, srcq,  1 * ps
+%endif
+    SHIFT_VB %1
+%endmacro
+
+; LOAD_PIXELS(dest, src)
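+; loads 16 pixels as words: a plain movu for 10/12-bit input (ps == 2),
+; zero-extension from bytes for 8-bit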
+%macro LOAD_PIXELS 2
+%if ps == 2
+    movu      %1, %2
+%else
+    vpmovzxbw %1, %2
+%endif
+%endmacro
+
+; STORE_PIXELS(dst, src)
+%macro STORE_PIXELS 2
+    %if ps == 2
+        movu         %1, m%2
+    %else
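+        ; vpackuswb packs per 128-bit lane, so vpermq with 0x8 (qwords
+        ; 2 and 0) gathers the 16 valid bytes into the low xmm half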
+        vpackuswb   m%2, m%2
+        vpermq      m%2, m%2, 0x8
+        movu         %1, xm%2
+    %endif
+%endmacro
+
+%macro FILTER_16x4 0
+%if LUMA
+    push clipq
+    push strideq
+    %define s1q clipq
+    %define s2q strideq
+%else
+    %define s1q s5q
+    %define s2q s6q
+%endif
+
+    %define s3q pixel_maxq
+    %define s4q offsetq
+    push xq
+
+    xor               xq, xq
+%%filter_16x4_loop:
+    LOAD_PIXELS       m2, [srcq]   ;p0
+
+    FILTER_VB         xq
+
+    paddw             m0, m2
+
+    ; clip to pixel
+    CLIPW             m0, m14, m15
+
+    STORE_PIXELS  [dstq], 0
+
+    lea             srcq, [srcq + src_strideq]
+    lea             dstq, [dstq + dst_strideq]
+    inc               xq
+    cmp               xq, 4
+    jl %%filter_16x4_loop
+
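+    ; rewind src/dst by the 4 rows advanced in the loop above; the outer
+    ; loop advances them by 4 rows itself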
+    mov               xq, src_strideq
+    neg               xq
+    lea             srcq, [srcq + xq * 4]
+    mov               xq, dst_strideq
+    neg               xq
+    lea             dstq, [dstq + xq * 4]
+
+    pop xq
+
+%if LUMA
+    pop strideq
+    pop clipq
+%endif
+%endmacro
+
+; FILTER(bpc, luma/chroma)
+%macro ALF_FILTER 2
+%xdefine BPC   %1
+%ifidn %2, luma
+    %xdefine LUMA 1
+%else
+    %xdefine LUMA 0
+%endif
+
+; ******************************
+; void vvc_alf_filter_%2_%1bpc_avx2(uint8_t *dst, ptrdiff_t dst_stride,
+;      const uint8_t *src, ptrdiff_t src_stride, const ptrdiff_t width, const ptrdiff_t height,
+;      const int16_t *filter, const int16_t *clip, ptrdiff_t stride, ptrdiff_t vb_pos, ptrdiff_t pixel_max);
+; ******************************
+cglobal vvc_alf_filter_%2_%1bpc, 11, 15, 16, -6*8, dst, dst_stride, src, src_stride, width, height, filter, clip, stride, vb_pos, pixel_max, \
+    offset, x, s5, s6
+%define ps (%1 / 8) ; pixel size
+    movd            xm15, pixel_maxd
+    vpbroadcastw     m15, xm15
+    pxor             m14, m14
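+    ; m14/m15 = 0 / pixel_max: the range CLIPW uses to clamp the output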
+
+.loop:
+    push            srcq
+    push            dstq
+    xor               xd, xd
+
+    .loop_w:
+        LOAD_PARAMS
+        FILTER_16x4
+
+        add         srcq, 16 * ps
+        add         dstq, 16 * ps
+        add           xd, 16
+        cmp           xd, widthd
+        jl       .loop_w
+
+    pop             dstq
+    pop             srcq
+    lea             srcq, [srcq + 4 * src_strideq]
+    lea             dstq, [dstq + 4 * dst_strideq]
+
+    lea          filterq, [filterq + 2 * strideq]
+    lea            clipq, [clipq + 2 * strideq]
+
+    sub          vb_posq, 4
+    sub          heightq, 4
+    jg             .loop
+    RET
+%endmacro
+
+; FILTER(bpc)
+%macro ALF_FILTER 1
+    ALF_FILTER  %1, luma
+    ALF_FILTER  %1, chroma
+%endmacro
+
+%if ARCH_X86_64
+%if HAVE_AVX2_EXTERNAL
+INIT_YMM avx2
+ALF_FILTER   16
+ALF_FILTER   8
+%endif
+%endif
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index 985d750472..e672409cd7 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -87,6 +87,27 @@  AVG_PROTOTYPES( 8, avx2)
 AVG_PROTOTYPES(10, avx2)
 AVG_PROTOTYPES(12, avx2)
 
+#define ALF_BPC_PROTOTYPES(bpc, opt)                                                                                     \
+void BF(ff_vvc_alf_filter_luma, bpc, opt)(uint8_t *dst, ptrdiff_t dst_stride,                                            \
+    const uint8_t *src, ptrdiff_t src_stride, ptrdiff_t width, ptrdiff_t height,                                         \
+    const int16_t *filter, const int16_t *clip, ptrdiff_t stride, ptrdiff_t vb_pos, ptrdiff_t pixel_max);                \
+void BF(ff_vvc_alf_filter_chroma, bpc, opt)(uint8_t *dst, ptrdiff_t dst_stride,                                          \
+    const uint8_t *src, ptrdiff_t src_stride, ptrdiff_t width, ptrdiff_t height,                                         \
+    const int16_t *filter, const int16_t *clip, ptrdiff_t stride, ptrdiff_t vb_pos, ptrdiff_t pixel_max);                \
+
+#define ALF_PROTOTYPES(bpc, bd, opt)                                                                                     \
+void bf(ff_vvc_alf_filter_luma, bd, opt)(uint8_t *dst, ptrdiff_t dst_stride, const uint8_t *src, ptrdiff_t src_stride,   \
+    int width, int height, const int16_t *filter, const int16_t *clip, const int vb_pos);                                \
+void bf(ff_vvc_alf_filter_chroma, bd, opt)(uint8_t *dst, ptrdiff_t dst_stride, const uint8_t *src, ptrdiff_t src_stride, \
+    int width, int height, const int16_t *filter, const int16_t *clip, const int vb_pos);                                \
+
+ALF_BPC_PROTOTYPES(8,  avx2)
+ALF_BPC_PROTOTYPES(16, avx2)
+
+ALF_PROTOTYPES(8,  8,  avx2)
+ALF_PROTOTYPES(16, 10, avx2)
+ALF_PROTOTYPES(16, 12, avx2)
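+
+// 10- and 12-bit depths share the 16bpc assembly; the pixel_max argument
+// set up in ALF_FUNCS below selects the clip range for each bit depth.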
+
 #if ARCH_X86_64
 #if HAVE_SSE4_EXTERNAL
 #define FW_PUT(name, depth, opt) \
@@ -181,6 +202,26 @@  void bf(ff_vvc_w_avg, bd, opt)(uint8_t *dst, ptrdiff_t dst_stride,
 AVG_FUNCS(8,  8,  avx2)
 AVG_FUNCS(16, 10, avx2)
 AVG_FUNCS(16, 12, avx2)
+
+#define ALF_FUNCS(bpc, bd, opt)                                                                                          \
+void bf(ff_vvc_alf_filter_luma, bd, opt)(uint8_t *dst, ptrdiff_t dst_stride, const uint8_t *src, ptrdiff_t src_stride,   \
+    int width, int height, const int16_t *filter, const int16_t *clip, const int vb_pos)                                 \
+{                                                                                                                        \
+    const int param_stride  = (width >> 2) * ALF_NUM_COEFF_LUMA;                                                         \
+    BF(ff_vvc_alf_filter_luma, bpc, opt)(dst, dst_stride, src, src_stride, width, height,                                \
+        filter, clip, param_stride, vb_pos, (1 << bd) - 1);                                                              \
+}                                                                                                                        \
+void bf(ff_vvc_alf_filter_chroma, bd, opt)(uint8_t *dst, ptrdiff_t dst_stride, const uint8_t *src, ptrdiff_t src_stride, \
+    int width, int height, const int16_t *filter, const int16_t *clip, const int vb_pos)                                 \
+{                                                                                                                        \
+    BF(ff_vvc_alf_filter_chroma, bpc, opt)(dst, dst_stride, src, src_stride, width, height,                              \
+        filter, clip, 0, vb_pos, (1 << bd) - 1);                                                                         \
+}                                                                                                                        \
+
+ALF_FUNCS(8,  8,  avx2)
+ALF_FUNCS(16, 10, avx2)
+ALF_FUNCS(16, 12, avx2)
+
 #endif
 
 #define PEL_LINK(dst, C, W, idx1, idx2, name, D, opt)                              \
@@ -252,6 +293,11 @@  AVG_FUNCS(16, 12, avx2)
     c->inter.avg    = bf(ff_vvc_avg, bd, opt);                       \
     c->inter.w_avg  = bf(ff_vvc_w_avg, bd, opt);                     \
 } while (0)
+
+#define ALF_INIT(bd) do {                                            \
+    c->alf.filter[LUMA]   = ff_vvc_alf_filter_luma_##bd##_avx2;      \
+    c->alf.filter[CHROMA] = ff_vvc_alf_filter_chroma_##bd##_avx2;    \
+} while (0)
 #endif
 
 void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
@@ -287,12 +333,15 @@  void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
     if (EXTERNAL_AVX2(cpu_flags)) {
         switch (bd) {
             case 8:
+                ALF_INIT(8);
                 AVG_INIT(8, avx2);
                 break;
             case 10:
+                ALF_INIT(10);
                 AVG_INIT(10, avx2);
                 break;
             case 12:
+                ALF_INIT(12);
                 AVG_INIT(12, avx2);
                 break;
             default: