From patchwork Thu Jul 20 16:07:42 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: maxime taisant
X-Patchwork-Id: 4390
To: "ffmpeg-devel@ffmpeg.org"
Date: Thu, 20 Jul 2017 16:07:42 +0000
Subject: [FFmpeg-devel] [PATCH][RFC] JPEG2000: SSE optimisation for DWT decoding
List-Id: FFmpeg development discussions and patches

From: Maxime Taisant

Hi,

I am currently working on SSE optimisations for the DWT functions used to decode JPEG2000.
For the moment, I have only managed to produce an SSE-optimized version of the sr_1d97_float function (with relatively good results).
I would like some comments on my work so far, to know whether I am on the right track or whether there are parts that I need to improve or modify.

Thank you.

---
 libavcodec/jpeg2000dwt.c          |   5 +-
 libavcodec/jpeg2000dwt.h          |   2 +
 libavcodec/x86/jpeg2000dsp.asm    | 268 ++++++++++++++++++++++++++++++++++++++
 libavcodec/x86/jpeg2000dsp_init.c |   3 +
 4 files changed, 277 insertions(+), 1 deletion(-)

diff --git a/libavcodec/jpeg2000dwt.c b/libavcodec/jpeg2000dwt.c
index 55dd5e89b5..b2a952aa29 100644
--- a/libavcodec/jpeg2000dwt.c
+++ b/libavcodec/jpeg2000dwt.c
@@ -425,7 +425,10 @@ static void dwt_decode97_float(DWTContext *s, float *t)
             for (i = 1 - mh; i < lh; i += 2, j++)
                 l[i] = data[w * lp + j];
 
-            sr_1d97_float(line, mh, mh + lh);
+            if (ARCH_X86)
+                ff_sr_1d97_float_sse(line, mh, mh + lh);
+            else
+                sr_1d97_float(line, mh, mh + lh);
 
             for (i = 0; i < lh; i++)
                 data[w * lp + i] = l[i];
diff --git a/libavcodec/jpeg2000dwt.h b/libavcodec/jpeg2000dwt.h
index 718d183ac1..59dec14478 100644
--- a/libavcodec/jpeg2000dwt.h
+++ b/libavcodec/jpeg2000dwt.h
@@ -65,4 +65,6 @@ int ff_dwt_decode(DWTContext *s, void *t);
 
 void ff_dwt_destroy(DWTContext *s);
 
+void ff_sr_1d97_float_sse(float *p, int i0, int i1);
+
 #endif /* AVCODEC_JPEG2000DWT_H */
diff --git a/libavcodec/x86/jpeg2000dsp.asm b/libavcodec/x86/jpeg2000dsp.asm
index 56b5fbd606..dabfb914b8 100644
--- a/libavcodec/x86/jpeg2000dsp.asm
+++ b/libavcodec/x86/jpeg2000dsp.asm
@@ -29,6 +29,16 @@ pf_ict1: times 8 dd 0.34413
 pf_ict2: times 8 dd 0.71414
 pf_ict3: times 8 dd 1.772
 
+F_LFTG_K: dd 1.230174104914001
+F_LFTG_X: dd 0.812893066115961
+
+F_LFTG_ALPHA: times 8 dd 1.586134342059924
+F_LFTG_BETA: times 8 dd 0.052980118572961
+F_LFTG_GAMMA: times 8 dd 0.882911075530934
+F_LFTG_DELTA: times 8 dd 0.443506852043971
+
+TWO: dd 2.0
+
 SECTION .text
 
 ;***********************************************************************
@@ -142,3 +152,261 @@ RCT_INT
 INIT_YMM avx2
 RCT_INT
 %endif
+
+;***********************************************************************
+; ff_sr_1d97_float_sse(float *p, int i0, int i1)
+;***********************************************************************
+%macro SR1D97FLOAT 0
+cglobal sr_1d97_float, 3, 5, 10, p, i0, i1, tmp0, tmp1
+    mov tmp0q, i0q
+    mov tmp1q, i1q
+    add tmp0q, 1
+    cmp tmp1q, tmp0q
+    jg .extend
+    sub tmp0q, 2
+    jnz .else
+    movss m0, [pq+4]
+    movss m1, [F_LFTG_K]
+    movss m2, [TWO]
+    divss m1, m2
+    mulss m0, m1
+    movss [pq+4], m0
+    jmp .end
+
+.else:
+    movss m0, [pq]
+    movss m1, [F_LFTG_X]
+    mulss m0, m1
+    movss [pq], m0
+    jmp .end
+
+.extend:
+    shl i0d, 2
+    shl i1d, 2
+    mov tmp0q, i0q
+    mov tmp1q, i1q
+    movups m0, [pq+tmp0q+4]
+    shufps m0, m0, 0x1B
+    movups [pq+tmp0q-16], m0
+    movups m0, [pq+tmp1q-20]
+    shufps m0, m0, 0x1B
+    movups [pq+tmp1q], m0
+
+    movups m3, [F_LFTG_DELTA]
+    mov tmp0q, i0q
+    mov tmp1q, i1q
+    shr tmp0q, 1
+    sub tmp0q, 4
+    shr tmp1q, 1
+    add tmp1q, 8
+    cmp tmp0q, tmp1q
+    jge .beginloop2
+.loop1:
+    add tmp0q, 12
+    cmp tmp0q, tmp1q
+    jge .endloop1
+
+    movups m0, [pq+2*tmp0q-28]
+    movups m4, [pq+2*tmp0q-12]
+    movups m1, m0
+    shufps m0, m4, 0xDD
+    shufps m1, m4, 0x88
+    movups m2, [pq+2*tmp0q-24]
+    movups m5, [pq+2*tmp0q-8]
+    shufps m2, m5, 0xDD
+    addps m2, m1
+    mulps m2, m3
+    subps m0, m2
+    movups m4, m1
+    shufps m1, m0, 0x44
+    shufps m1, m1, 0xD8
+    shufps m4, m0, 0xEE
+    shufps m4, m4, 0xD8
+    movups [pq+2*tmp0q-28], m1
+    movups [pq+2*tmp0q-12], m4
+
+    add tmp0q, 4
+    cmp tmp0q, tmp1q
+    jge .beginloop2
+    jmp .loop1
+
+.endloop1:
+    sub tmp0q, 12
+.littleloop1:
+    movss m0, [pq+2*tmp0q]
+    movss m1, [pq+2*tmp0q-4]
+    movss m2, [pq+2*tmp0q+4]
+    addss m1, m2
+    mulss m1, m3
+    subss m0, m1
+    movss [pq+2*tmp0q], m0
+    add tmp0q, 4
+    cmp tmp0q, tmp1q
+    jl .littleloop1
+
+.beginloop2:
+    movups m3, [F_LFTG_GAMMA]
+    mov tmp0q, i0q
+    mov tmp1q, i1q
+    shr tmp0q, 1
+    sub tmp0q, 4
+    shr tmp1q, 1
+    add tmp1q, 4
+    cmp tmp0q, tmp1q
+    jge .beginloop3
+.loop2:
+    add tmp0q, 12
+    cmp tmp0q, tmp1q
+    jge .endloop2
+
+    movups m0, [pq+2*tmp0q-24]
+    movups m4, [pq+2*tmp0q-8]
+    movups m1, m0
+    shufps m0, m4, 0xDD
+    shufps m1, m4, 0x88
+    movups m2, [pq+2*tmp0q-20]
+    movups m5, [pq+2*tmp0q-4]
+    shufps m2, m5, 0xDD
+    addps m2, m1
+    mulps m2, m3
+    subps m0, m2
+    movups m4, m1
+    shufps m1, m0, 0x44
+    shufps m1, m1, 0xD8
+    shufps m4, m0, 0xEE
+    shufps m4, m4, 0xD8
+    movups [pq+2*tmp0q-24], m1
+    movups [pq+2*tmp0q-8], m4
+
+    add tmp0q, 4
+    cmp tmp0q, tmp1q
+    jge .beginloop3
+    jmp .loop2
+
+.endloop2:
+    sub tmp0q, 12
+.littleloop2:
+    movss m0, [pq+2*tmp0q+4]
+    movss m1, [pq+2*tmp0q]
+    movss m2, [pq+2*tmp0q+8]
+    addss m1, m2
+    mulss m1, m3
+    subss m0, m1
+    movss [pq+2*tmp0q+4], m0
+    add tmp0q, 4
+    cmp tmp0q, tmp1q
+    jl .littleloop2
+
+.beginloop3:
+    movups m3, [F_LFTG_BETA]
+    mov tmp0q, i0q
+    mov tmp1q, i1q
+    shr tmp0q, 1
+    sub tmp0q, 4
+    shr tmp1q, 1
+    add tmp1q, 8
+    cmp tmp0q, tmp1q
+    jge .beginloop4
+.loop3:
+    add tmp0q, 12
+    cmp tmp0q, tmp1q
+    jge .endloop3
+
+    movups m0, [pq+2*tmp0q-28]
+    movups m4, [pq+2*tmp0q-12]
+    movups m1, m0
+    shufps m0, m4, 0xDD
+    shufps m1, m4, 0x88
+    movups m2, [pq+2*tmp0q-24]
+    movups m5, [pq+2*tmp0q-8]
+    shufps m2, m5, 0xDD
+    addps m2, m1
+    mulps m2, m3
+    addps m0, m2
+    movups m4, m1
+    shufps m1, m0, 0x44
+    shufps m1, m1, 0xD8
+    shufps m4, m0, 0xEE
+    shufps m4, m4, 0xD8
+    movups [pq+2*tmp0q-28], m1
+    movups [pq+2*tmp0q-12], m4
+
+    add tmp0q, 4
+    cmp tmp0q, tmp1q
+    jge .beginloop4
+    jmp .loop3
+
+.endloop3:
+    sub tmp0q, 12
+.littleloop3:
+    movss m0, [pq+2*tmp0q]
+    movss m1, [pq+2*tmp0q-4]
+    movss m2, [pq+2*tmp0q+4]
+    addss m1, m2
+    mulss m1, m3
+    addss m0, m1
+    movss [pq+2*tmp0q], m0
+    add tmp0q, 4
+    cmp tmp0q, tmp1q
+    jl .littleloop3
+
+.beginloop4:
+    movups m3, [F_LFTG_ALPHA]
+    mov tmp0q, i0q
+    mov tmp1q, i1q
+    shr tmp0q, 1
+    sub tmp0q, 4
+    shr tmp1q, 1
+    add tmp1q, 4
+    cmp tmp0q, tmp1q
+    jge .end
+.loop4:
+    add tmp0q, 12
+    cmp tmp0q, tmp1q
+    jge .endloop4
+
+    movups m0, [pq+2*tmp0q-24]
+    movups m4, [pq+2*tmp0q-8]
+    movups m1, m0
+    shufps m0, m4, 0xDD
+    shufps m1, m4, 0x88
+    movups m2, [pq+2*tmp0q-20]
+    movups m5, [pq+2*tmp0q-4]
+    shufps m2, m5, 0xDD
+    addps m2, m1
+    mulps m2, m3
+    addps m0, m2
+    movups m4, m1
+    shufps m1, m0, 0x44
+    shufps m1, m1, 0xD8
+    shufps m4, m0, 0xEE
+    shufps m4, m4, 0xD8
+    movups [pq+2*tmp0q-24], m1
+    movups [pq+2*tmp0q-8], m4
+
+    add tmp0q, 4
+    cmp tmp0q, tmp1q
+    jge .end
+    jmp .loop4
+
+.endloop4:
+    sub tmp0q, 12
+.littleloop4:
+    movss m0, [pq+2*tmp0q+4]
+    movss m1, [pq+2*tmp0q]
+    movss m2, [pq+2*tmp0q+8]
+    addss m1, m2
+    mulss m1, m3
+    addss m0, m1
+    movss [pq+2*tmp0q+4], m0
+    add tmp0q, 4
+    cmp tmp0q, tmp1q
+    jl .littleloop4
+
+.end:
+    REP_RET
+%endmacro
+
+INIT_XMM sse
+SR1D97FLOAT
+
diff --git a/libavcodec/x86/jpeg2000dsp_init.c b/libavcodec/x86/jpeg2000dsp_init.c
index baa81383ea..3d3735c43a 100644
--- a/libavcodec/x86/jpeg2000dsp_init.c
+++ b/libavcodec/x86/jpeg2000dsp_init.c
@@ -23,12 +23,15 @@
 #include "libavutil/cpu.h"
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/jpeg2000dsp.h"
+#include "libavcodec/jpeg2000dwt.h"
 
 void ff_ict_float_sse(void *src0, void *src1, void *src2, int csize);
 void ff_ict_float_avx(void *src0, void *src1, void *src2, int csize);
 void ff_rct_int_sse2 (void *src0, void *src1, void *src2, int csize);
 void ff_rct_int_avx2 (void *src0, void *src1, void *src2, int csize);
 
+void ff_sr_1d97_float_sse(float *p, int i0, int i1);
+
 av_cold void ff_jpeg2000dsp_init_x86(Jpeg2000DSPContext *c)
 {
     int cpu_flags = av_get_cpu_flags();