From patchwork Mon Aug 12 15:44:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: ckerchne X-Patchwork-Id: 14441 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id C2A134485A2 for ; Mon, 12 Aug 2019 18:44:30 +0300 (EEST) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 903A968ABDF; Mon, 12 Aug 2019 18:44:30 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 503B768A89E for ; Mon, 12 Aug 2019 18:44:24 +0300 (EEST) Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x7CFh5ll089952 for ; Mon, 12 Aug 2019 11:44:22 -0400 Received: from e14.ny.us.ibm.com (e14.ny.us.ibm.com [129.33.205.204]) by mx0a-001b2d01.pphosted.com with ESMTP id 2ub94rwysr-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 12 Aug 2019 11:44:22 -0400 Received: from localhost by e14.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 12 Aug 2019 16:44:21 +0100 Received: from b01cxnp22036.gho.pok.ibm.com (9.57.198.26) by e14.ny.us.ibm.com (146.89.104.201) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Mon, 12 Aug 2019 16:44:19 +0100 Received: from b01ledav006.gho.pok.ibm.com (b01ledav006.gho.pok.ibm.com [9.57.199.111]) by b01cxnp22036.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x7CFiIFp14418434 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Mon, 12 Aug 2019 15:44:18 GMT Received: from b01ledav006.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8224AAC05B for ; Mon, 12 Aug 2019 15:44:18 +0000 (GMT) Received: from b01ledav006.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4691EAC05F for ; Mon, 12 Aug 2019 15:44:18 +0000 (GMT) Received: from ltc.linux.ibm.com (unknown [9.16.170.189]) by b01ledav006.gho.pok.ibm.com (Postfix) with ESMTP for ; Mon, 12 Aug 2019 15:44:18 +0000 (GMT) MIME-Version: 1.0 Date: Mon, 12 Aug 2019 11:44:19 -0400 From: ckerchne To: ffmpeg-devel@ffmpeg.org X-Sender: ckerchne@linux.vnet.ibm.com User-Agent: Roundcube Webmail/1.0.1 X-TM-AS-GCONF: 00 x-cbid: 19081215-0052-0000-0000-000003EA2ECE X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00011585; HX=3.00000242; KW=3.00000007; PH=3.00000004; SC=3.00000287; SDB=6.01245795; UDB=6.00657372; IPR=6.01027298; MB=3.00028147; MTD=3.00000008; XFM=3.00000015; UTC=2019-08-12 15:44:20 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19081215-0053-0000-0000-000062114CDA Message-Id: X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-08-12_06:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=916 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1906280000 definitions=main-1908120175 Subject: [FFmpeg-devel] [PATCH V3] swscale/ppc/yuv2rgb_altivec: Fixes compiler bug - replace vec_lvsl/vec_perm with vec_xl X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" A bug exist with the gcc compilers for Power in versions 6.x and 7.x (verified with 6.3 and 7.4). It was fixed in version 8.x (verified with 8.3). I was using a Power 9 ppc64le machine for building and testing. It appears the compiler is generating the wrong code for little endian machines for the vec_lvsl/vec_perm instruction combos in some cases. If these instructions are replaced with vec_xl, the problem goes away for all versions of the compilers (plus it is a bit faster). Fixes #7124 and can be reproduced within the docker by following the instructions within the ticket. https://trac.ffmpeg.org/ticket/7124 diff --git a/libswscale/ppc/yuv2rgb_altivec.c b/libswscale/ppc/yuv2rgb_altivec.c index c1e2852adb..536545293d 100644 --- a/libswscale/ppc/yuv2rgb_altivec.c +++ b/libswscale/ppc/yuv2rgb_altivec.c @@ -305,9 +305,6 @@ static int altivec_ ## name(SwsContext *c, const unsigned char **in, \ vector signed short R1, G1, B1; \ vector unsigned char R, G, B; \ \ - const vector unsigned char *y1ivP, *y2ivP, *uivP, *vivP; \ - vector unsigned char align_perm; \ - \ vector signed short lCY = c->CY; \ vector signed short lOY = c->OY; \ vector signed short lCRV = c->CRV; \ @@ -338,26 +335,13 @@ static int altivec_ ## name(SwsContext *c, const unsigned char **in, \ vec_dstst(oute, (0x02000002 | (((w * 3 + 32) / 32) << 16)), 1); \ \ for (j = 0; j < w / 16; j++) { \ - y1ivP = (const vector unsigned char *) y1i; \ - y2ivP = (const vector unsigned char *) y2i; \ - uivP = (const vector unsigned char *) ui; \ - vivP = (const vector unsigned char *) vi; \ - \ - align_perm = vec_lvsl(0, y1i); \ - y0 = (vector unsigned char) \ - vec_perm(y1ivP[0], y1ivP[1], align_perm); \ + y0 = vec_xl(0, y1i); \ \ - align_perm = vec_lvsl(0, y2i); \ - y1 = (vector unsigned char) \ - vec_perm(y2ivP[0], y2ivP[1], align_perm); \ + y1 = vec_xl(0, y2i); \ \ - align_perm = vec_lvsl(0, ui); \ - u = (vector signed char) \ - vec_perm(uivP[0], uivP[1], align_perm); \ + u = (vector signed char) vec_xl(0, ui); \ \ - align_perm = vec_lvsl(0, vi); \ - v = (vector signed char) \ - vec_perm(vivP[0], vivP[1], align_perm); \ + v = (vector signed char) vec_xl(0, vi); \ \ u = (vector signed char) \ vec_sub(u, \