From patchwork Sun Aug 18 20:13:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ramiro Polla
X-Patchwork-Id: 51075
From: Ramiro Polla
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Aug 2024 22:13:25 +0200
Message-Id: <20240818201326.100492-6-ramiro.polla@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20240818201326.100492-1-ramiro.polla@gmail.com>
References: <20240818201326.100492-1-ramiro.polla@gmail.com>
Subject: [FFmpeg-devel] [PATCH 6/7] avcodec/x86/mpegvideoencdsp: speed up draw_edges_mmx by using memcpy()

The MMX memory copy code is not nearly as efficient as memcpy(), which
would make draw_edges_mmx much slower than draw_edges_8_c.

Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
                             before    after
draw_edges_8_1724_4_mmx:     8697.2   8739.6  ( 1.00x)
draw_edges_8_1724_8_mmx:    10439.0  10548.4  ( 0.99x)
draw_edges_8_1724_16_mmx:   10687.5  10876.1  ( 0.98x)
draw_edges_128_407_4_mmx:    4252.5   3562.4  ( 1.19x)
draw_edges_128_407_8_mmx:    4561.7   3868.1  ( 1.18x)
draw_edges_128_407_16_mmx:   5505.7   4533.4  ( 1.21x)
draw_edges_1080_31_4_mmx:    1667.7    560.4  ( 2.98x)
draw_edges_1080_31_8_mmx:    2804.5   1232.1  ( 2.28x)
draw_edges_1080_31_16_mmx:  12478.5   3829.6  ( 3.26x)
draw_edges_1920_4_4_mmx:     2596.2    812.4  ( 3.20x)
draw_edges_1920_4_8_mmx:     8056.0   2964.4  ( 2.72x)
draw_edges_1920_4_16_mmx:   24628.7   6387.9  ( 3.86x)
---
 libavcodec/x86/mpegvideoencdsp_init.c | 51 ++++++---------------------
 1 file changed, 11 insertions(+), 40 deletions(-)

diff --git a/libavcodec/x86/mpegvideoencdsp_init.c b/libavcodec/x86/mpegvideoencdsp_init.c
index 503548e668..c30fa91175 100644
--- a/libavcodec/x86/mpegvideoencdsp_init.c
+++ b/libavcodec/x86/mpegvideoencdsp_init.c
@@ -166,46 +166,17 @@ static void draw_edges_mmx(uint8_t *buf, int wrap, int width, int height,
               "r" (ptr + wrap * height));
     }
 
-    /* top and bottom (and hopefully also the corners) */
-    if (sides & EDGE_TOP) {
-        for (i = 0; i < h; i += 4) {
-            ptr = buf - (i + 1) * wrap - w;
-            __asm__ volatile (
-                "1:                             \n\t"
-                "movq (%1, %0), %%mm0           \n\t"
-                "movq    %%mm0, (%0)            \n\t"
-                "movq    %%mm0, (%0, %2)        \n\t"
-                "movq    %%mm0, (%0, %2, 2)     \n\t"
-                "movq    %%mm0, (%0, %3)        \n\t"
-                "add        $8, %0              \n\t"
-                "cmp        %4, %0              \n\t"
-                "jb         1b                  \n\t"
-                : "+r" (ptr)
-                : "r" ((x86_reg) buf - (x86_reg) ptr - w),
-                  "r" ((x86_reg) - wrap), "r" ((x86_reg) - wrap * 3),
-                  "r" (ptr + width + 2 * w));
-        }
-    }
-
-    if (sides & EDGE_BOTTOM) {
-        for (i = 0; i < h; i += 4) {
-            ptr = last_line + (i + 1) * wrap - w;
-            __asm__ volatile (
-                "1:                             \n\t"
-                "movq (%1, %0), %%mm0           \n\t"
-                "movq    %%mm0, (%0)            \n\t"
-                "movq    %%mm0, (%0, %2)        \n\t"
-                "movq    %%mm0, (%0, %2, 2)     \n\t"
-                "movq    %%mm0, (%0, %3)        \n\t"
-                "add        $8, %0              \n\t"
-                "cmp        %4, %0              \n\t"
-                "jb         1b                  \n\t"
-                : "+r" (ptr)
-                : "r" ((x86_reg) last_line - (x86_reg) ptr - w),
-                  "r" ((x86_reg) wrap), "r" ((x86_reg) wrap * 3),
-                  "r" (ptr + width + 2 * w));
-        }
-    }
+    /* top and bottom + corners */
+    buf -= w;
+    last_line = buf + (height - 1) * wrap;
+    if (sides & EDGE_TOP)
+        for (i = 0; i < h; i++)
+            // top
+            memcpy(buf - (i + 1) * wrap, buf, width + w + w);
+    if (sides & EDGE_BOTTOM)
+        for (i = 0; i < h; i++)
+            // bottom
+            memcpy(last_line + (i + 1) * wrap, last_line, width + w + w);
 }
 
 #endif /* HAVE_INLINE_ASM */
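
For readers who want to try the new top/bottom logic outside the tree, here
is a minimal standalone sketch of the same row replication. It is not the
code from this patch: draw_edges_sketch and the SKETCH_EDGE_* flags are
hypothetical names, the left/right borders are filled with plain memset()
instead of the MMX asm that the patch leaves in place, and it assumes
wrap >= width + 2 * w so that the memcpy() source and destination rows never
overlap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the EDGE_TOP / EDGE_BOTTOM flags in the patch. */
#define SKETCH_EDGE_TOP    1
#define SKETCH_EDGE_BOTTOM 2

/*
 * Illustrative sketch only, not the FFmpeg implementation.
 * buf points at the top-left picture pixel, wrap is the row stride in bytes,
 * w and h are the border width and height. The caller must provide w bytes
 * of padding on each side of every row and h padding rows above and below.
 */
static void draw_edges_sketch(uint8_t *buf, int wrap, int width, int height,
                              int w, int h, int sides)
{
    uint8_t *ptr = buf;
    uint8_t *last_line;
    int i;

    /* Assumption of this sketch: padded rows fit in the stride, so the
     * memcpy() calls below never see overlapping regions. */
    assert(width > 0 && height > 0 && w >= 0 && h >= 0);
    assert(wrap >= width + 2 * w);

    /* left and right: replicate the first and last pixel of each row
     * (plain C here; the patch keeps the MMX asm for this part) */
    for (i = 0; i < height; i++) {
        memset(ptr - w, ptr[0], w);
        memset(ptr + width, ptr[width - 1], w);
        ptr += wrap;
    }

    /* top and bottom + corners: copy whole padded rows, as in the patch */
    buf -= w;
    last_line = buf + (height - 1) * wrap;
    if (sides & SKETCH_EDGE_TOP)
        for (i = 0; i < h; i++)
            memcpy(buf - (i + 1) * wrap, buf, width + w + w);
    if (sides & SKETCH_EDGE_BOTTOM)
        for (i = 0; i < h; i++)
            memcpy(last_line + (i + 1) * wrap, last_line, width + w + w);
}
```

Because each copied row already contains its left and right padding, the
corners are filled by the same memcpy() calls, which is why the sketch (like
the patch) handles the left/right borders first.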