From patchwork Sun Apr 26 19:44:18 2020
X-Patchwork-Id: 19279
From: frederic.recoules@univ-grenoble-alpes.fr
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 26 Apr 2020 21:44:18 +0200
Message-Id: <20200422174918.7290-8-frederic.recoules@univ-grenoble-alpes.fr>
In-Reply-To: <20200422174918.7290-1-frederic.recoules@univ-grenoble-alpes.fr>
Subject: [FFmpeg-devel] [PATCH 08/14] [inline assembly] add mmx clobbers to mpegvideoenc

From: Frédéric Recoules
---
 libavcodec/x86/mpegvideoenc_qns_template.c | 12 +++++---
 libavcodec/x86/mpegvideoencdsp_init.c      | 32 ++++++++++++++++++----
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/libavcodec/x86/mpegvideoenc_qns_template.c b/libavcodec/x86/mpegvideoenc_qns_template.c
index 882d486205..96325fd8f8 100644
--- a/libavcodec/x86/mpegvideoenc_qns_template.c
+++ b/libavcodec/x86/mpegvideoenc_qns_template.c
@@ -39,8 +39,8 @@ static int DEF(try_8x8basis)(int16_t rem[64], int16_t weight[64], int16_t basis[
     av_assert2(FFABS(scale) < MAX_ABS);
     scale<<= 16 + SCALE_OFFSET - BASIS_SHIFT + RECON_SHIFT;
 
-    SET_RND(mm6);
     __asm__ volatile(
+        SET_RND_TPL(mm6)
         "pxor %%mm7, %%mm7 \n\t"
         "movd %4, %%mm5 \n\t"
         "punpcklwd %%mm5, %%mm5 \n\t"
@@ -69,7 +69,9 @@ static int DEF(try_8x8basis)(int16_t rem[64], int16_t weight[64], int16_t basis[
         "movd %%mm7, %0 \n\t"
 
         : "+r" (i)
-        : "r"(basis), "r"(rem), "r"(weight), "g"(scale)
+        : "r"(basis), "r"(rem), "r"(weight), "g"(scale) COMMA_SET_RND_IN
+          MMX_CLOBBERS_ONLY("mm0", "mm1", "mm5", "mm7"
+                            SET_RND_CLOBBER(, "mm6"))
     );
     return i;
 }
@@ -80,8 +82,8 @@ static void DEF(add_8x8basis)(int16_t rem[64], int16_t basis[64], int scale)
 
     if(FFABS(scale) < MAX_ABS){
         scale<<= 16 + SCALE_OFFSET - BASIS_SHIFT + RECON_SHIFT;
-        SET_RND(mm6);
         __asm__ volatile(
+            SET_RND_TPL(mm6)
             "movd %3, %%mm5 \n\t"
             "punpcklwd %%mm5, %%mm5 \n\t"
             "punpcklwd %%mm5, %%mm5 \n\t"
@@ -99,7 +101,9 @@ static void DEF(add_8x8basis)(int16_t rem[64], int16_t basis[64], int scale)
             " jb 1b \n\t"
 
             : "+r" (i)
-            : "r"(basis), "r"(rem), "g"(scale)
+            : "r"(basis), "r"(rem), "g"(scale) COMMA_SET_RND_IN
+              MMX_CLOBBERS_ONLY("mm0", "mm1", "mm5"
+                                SET_RND_CLOBBER(, "mm6"))
         );
     }else{
         for(i=0; i<8*8; i++){
diff --git a/libavcodec/x86/mpegvideoencdsp_init.c b/libavcodec/x86/mpegvideoencdsp_init.c
index 532836cec9..8430ec62ea 100644
--- a/libavcodec/x86/mpegvideoencdsp_init.c
+++ b/libavcodec/x86/mpegvideoencdsp_init.c
@@ -51,17 +51,26 @@ int ff_pix_norm1_sse2(uint8_t *pix, int line_size);
     "psraw $1, " #y " \n\t"
 #define DEF(x) x ## _mmx
 #define SET_RND MOVQ_WONE
+#define SET_RND_TPL MOVQ_WONE_TPL
+#define COMMA_SET_RND_IN
+#define SET_RND_CLOBBER(...) __VA_ARGS__
 #define SCALE_OFFSET 1
 
 #include "mpegvideoenc_qns_template.c"
 
 #undef DEF
 #undef SET_RND
+#undef SET_RND_TPL
+#undef COMMA_SET_RND_IN
+#undef SET_RND_CLOBBER
 #undef SCALE_OFFSET
 #undef PMULHRW
 
 #define DEF(x) x ## _3dnow
 #define SET_RND(x)
+#define SET_RND_TPL(x)
+#define COMMA_SET_RND_IN
+#define SET_RND_CLOBBER(...)
 #define SCALE_OFFSET 0
 #define PMULHRW(x, y, s, o) \
     "pmulhrw " #s ", " #x " \n\t" \
@@ -71,6 +80,9 @@ int ff_pix_norm1_sse2(uint8_t *pix, int line_size);
 
 #undef DEF
 #undef SET_RND
+#undef SET_RND_TPL
+#undef COMMA_SET_RND_IN
+#undef SET_RND_CLOBBER
 #undef SCALE_OFFSET
 #undef PMULHRW
 
@@ -78,6 +90,9 @@ int ff_pix_norm1_sse2(uint8_t *pix, int line_size);
 #undef PHADDD
 #define DEF(x) x ## _ssse3
 #define SET_RND(x)
+#define SET_RND_TPL(x)
+#define COMMA_SET_RND_IN
+#define SET_RND_CLOBBER(...)
 #define SCALE_OFFSET -1
 
 #define PHADDD(a, t) \
@@ -93,6 +108,9 @@ int ff_pix_norm1_sse2(uint8_t *pix, int line_size);
 
 #undef DEF
 #undef SET_RND
+#undef SET_RND_TPL
+#undef COMMA_SET_RND_IN
+#undef SET_RND_CLOBBER
 #undef SCALE_OFFSET
 #undef PMULHRW
 #undef PHADDD
@@ -127,7 +145,8 @@ static void draw_edges_mmx(uint8_t *buf, int wrap, int width, int height,
             "jb 1b \n\t"
             : "+r" (ptr)
             : "r" ((x86_reg) wrap), "r" ((x86_reg) width),
-              "r" (ptr + wrap * height));
+              "r" (ptr + wrap * height)
+              MMX_CLOBBERS_ONLY("mm0", "mm1") );
     } else if (w == 16) {
         __asm__ volatile (
             "1: \n\t"
@@ -148,7 +167,7 @@ static void draw_edges_mmx(uint8_t *buf, int wrap, int width, int height,
             "jb 1b \n\t"
             : "+r"(ptr)
             : "r"((x86_reg)wrap), "r"((x86_reg)width), "r"(ptr + wrap * height)
-            );
+            MMX_CLOBBERS_ONLY("mm0", "mm1") );
     } else {
         av_assert1(w == 4);
         __asm__ volatile (
@@ -167,7 +186,8 @@ static void draw_edges_mmx(uint8_t *buf, int wrap, int width, int height,
             "jb 1b \n\t"
             : "+r" (ptr)
             : "r" ((x86_reg) wrap), "r" ((x86_reg) width),
-              "r" (ptr + wrap * height));
+              "r" (ptr + wrap * height)
+              MMX_CLOBBERS_ONLY("mm0", "mm1") );
     }
 
     /* top and bottom (and hopefully also the corners) */
@@ -187,7 +207,8 @@ static void draw_edges_mmx(uint8_t *buf, int wrap, int width, int height,
                 : "+r" (ptr)
                 : "r" ((x86_reg) buf - (x86_reg) ptr - w),
                   "r" ((x86_reg) - wrap), "r" ((x86_reg) - wrap * 3),
-                  "r" (ptr + width + 2 * w));
+                  "r" (ptr + width + 2 * w)
+                  MMX_CLOBBERS_ONLY("mm0") );
         }
     }
 
@@ -207,7 +228,8 @@ static void draw_edges_mmx(uint8_t *buf, int wrap, int width, int height,
                 : "+r" (ptr)
                 : "r" ((x86_reg) last_line - (x86_reg) ptr - w),
                   "r" ((x86_reg) wrap), "r" ((x86_reg) wrap * 3),
-                  "r" (ptr + width + 2 * w));
+                  "r" (ptr + width + 2 * w)
+                  MMX_CLOBBERS_ONLY("mm0") );
         }
     }
 }
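The per-variant clobber lists in the template rely on a small preprocessor trick. `SET_RND_CLOBBER(, "mm6")` passes a leading empty argument, so with the `_mmx` definition `SET_RND_CLOBBER(...) __VA_ARGS__` it expands to `, "mm6"` and appends `"mm6"` to the list, while the empty `_3dnow` and `_ssse3` definitions swallow it (those variants also define `SET_RND_TPL(x)` as empty, so mm6 is never written there). The sketch below reproduces only that expansion, outside any asm statement, by collecting the same token sequence into string arrays. The names `SET_RND_CLOBBER_MMX`, `SET_RND_CLOBBER_NOP`, `try_clobbers_mmx`, `try_clobbers_3dnow` and `lists_reg` are hypothetical; two macro names stand in for the single name that the patch redefines between inclusions of the template.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two SET_RND_CLOBBER definitions in
 * mpegvideoencdsp_init.c (the patch redefines one name instead). */
#define SET_RND_CLOBBER_MMX(...) __VA_ARGS__ /* forwards ', "mm6"'  */
#define SET_RND_CLOBBER_NOP(...)             /* expands to nothing  */

/* Same token sequence as the try_8x8basis clobber list in the patch,
 * collected into arrays so the expansion can be inspected at run time. */
static const char *const try_clobbers_mmx[] = {
    "mm0", "mm1", "mm5", "mm7" SET_RND_CLOBBER_MMX(, "mm6")
};
static const char *const try_clobbers_3dnow[] = {
    "mm0", "mm1", "mm5", "mm7" SET_RND_CLOBBER_NOP(, "mm6")
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Returns 1 if reg appears among the first n entries of list, else 0. */
int lists_reg(const char *const *list, size_t n, const char *reg)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], reg) == 0)
            return 1;
    return 0;
}
```

The `_mmx` array ends up with five entries, the last being `"mm6"`, and the `_3dnow` array with four. The empty leading argument is what lets one call site produce either a comma-prefixed extra clobber or nothing at all, without an `#ifdef` inside the asm statement.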
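The `draw_edges_mmx` hunks only append clobbers: those asm statements write mm0 (and mm1), and without declaring them the compiler is told the registers are left untouched and may keep its own values there across the statement. The sketch below shows the same idea in a self-contained function, assuming GCC or Clang on x86-64 with MMX enabled; it writes the clobber list directly instead of going through `MMX_CLOBBERS_ONLY`, whose definition is not part of this patch. The function `add4x16` is hypothetical, and the portable C fallback only exists so the file also builds on other targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] + b[i] for four 16-bit lanes,
 * wrapping modulo 2^16 like the MMX paddw instruction. */
void add4x16(uint16_t dst[4], const uint16_t a[4], const uint16_t b[4])
{
    assert(dst && a && b);
#if defined(__GNUC__) && defined(__x86_64__) && defined(__MMX__)
    __asm__ volatile(
        "movq   (%1), %%mm0 \n\t"   /* mm0 = a[0..3]                 */
        "paddw  (%2), %%mm0 \n\t"   /* mm0 += b[0..3], lanes wrap    */
        "movq   %%mm0, (%0) \n\t"   /* store the result to dst       */
        "emms               \n\t"   /* leave the x87/MMX state clean */
        :                           /* no register outputs           */
        : "r"(dst), "r"(a), "r"(b)
        : "memory", "mm0");         /* declare everything asm touches */
#else
    /* Portable fallback for targets without MMX inline asm. */
    for (int i = 0; i < 4; i++)
        dst[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

The `"memory"` clobber covers the load from `a`/`b` and the store through `dst`, and `"mm0"` plays the role that `MMX_CLOBBERS_ONLY("mm0", ...)` plays in the patch. The sketch issues `emms` itself so that it leaves the x87/MMX state clean for any floating-point code that follows.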