mbox

[FFmpeg-devel,RFC,0/7] RISC-V V vector length dealings

Message ID 2846208.e9J7NaK4W3@basile.remlab.net
State New
Headers show

Pull-request

https://git.remlab.net/git/ffmpeg.git rvv-vlen

Message

Rémi Denis-Courmont Sept. 27, 2022, 8:04 p.m. UTC
Hello,

As a general rule, scalable vector instruction sets should be used with the
largest possible vector length. There are however a number of operations that
just happen with a fixed size, and this patchset exhibits the simplest one I
could find. The proper RISC-V Vector extension guarantees a minimum vector
length of 128 bits. In theory though the underlying specification also allows
for (embedded designs with) only 32 or 64 bits per vector.

The RFC is how should this be dealt with? The simplest possibibility is to
simply assume 128 bits. Indeed, I am not aware of any actual or proposed
processor IP with shorter-than-128-bit vectors, even less so, one upon which
FFmpeg would be used. For what it is worth, ARM SVE guarantees a minimum of
128 bits per vector too. In that case, we can drop the first patch, and
simplify the following ones.

Another option is to expose the vector length via the CPU flags as proposed
earlier by Lynne. Though this is unorthodox the vector length is not a proper
flag. The vector length can readily be retrieved from a read-only unprivileged
CSR, and this patchset instead introduces a simple inline wrapper therefore.
The downside of this approach is that this is nominally undefined behaviour,
and typically will raise a SIGILL, if the processor does not support the
vector extension.

However I want to emphasise that the same problem also exists for DSP
functions operating on more than 128 bits. For instance, the inner loop of the
Opus post-filter works with 160 bits. So then, we cannot simply ignore the
variability between processors. RISC-V has existing designs with 128-bit
vectors and announced designs with 256-bit and 512-bit vectors.

I don't personally have any strong preference for or against either of the CPU
flags or the dedicated platform-specific helper approaches. And besides, I do
not have the pretense to decide on FFmpeg internal design. But I doubt that
this concern can be ignored entirely.

The following changes since commit 59cb0bd23d61f6ea3bfd86558346e2720aba7f06:

  avfilter/vf_extractplanes: add missing break; statement (2022-09-27 19:35:49 +0200)

are available in the Git repository at:

  https://git.remlab.net/git/ffmpeg.git rvv-vlen

for you to fetch changes up to 1b80effc9798c164fd7c1953174ccf4a66298aff:

  lavc/pixblockdsp: RISC-V diff_pixels & diff_pixels_unaligned (2022-09-27 22:19:19 +0300)

----------------------------------------------------------------
Rémi Denis-Courmont (7):
      lavu/riscv: helper to read the vector length
      lavc/idctdsp: RISC-V V put_pixels_clamped function
      lavc/idctdsp: RISC-V V add_pixels_clamped function
      lavc/idctdsp: RISC-V V put_signed_pixels_clamped function
      lavc/pixblockdsp: RISC-V V 8-bit get_pixels & get_pixels_unaligned
      lavc/pixblockdsp: RISC-V V 16-bit get_pixels & get_pixels_unaligned
      lavc/pixblockdsp: RISC-V diff_pixels & diff_pixels_unaligned

 libavcodec/idctdsp.c                |  2 +
 libavcodec/idctdsp.h                |  2 +
 libavcodec/riscv/Makefile           |  3 ++
 libavcodec/riscv/idctdsp_init.c     | 48 ++++++++++++++++++++++
 libavcodec/riscv/idctdsp_rvv.S      | 80 +++++++++++++++++++++++++++++++++++++
 libavcodec/riscv/pixblockdsp_init.c | 20 ++++++++++
 libavcodec/riscv/pixblockdsp_rvv.S  | 60 ++++++++++++++++++++++++++++
 libavutil/riscv/cpu.h               | 45 +++++++++++++++++++++
 8 files changed, 260 insertions(+)
 create mode 100644 libavcodec/riscv/idctdsp_init.c
 create mode 100644 libavcodec/riscv/idctdsp_rvv.S
 create mode 100644 libavcodec/riscv/pixblockdsp_rvv.S
 create mode 100644 libavutil/riscv/cpu.h

Comments

Lynne Sept. 27, 2022, 9:32 p.m. UTC | #1
Sep 27, 2022, 22:04 by remi@remlab.net:

> Hello,
>
> As a general rule, scalable vector instruction sets should be used with the
> largest possible vector length. There are however a number of operations that
> just happen with a fixed size, and this patchset exhibits the simplest one I
> could find. The proper RISC-V Vector extension guarantees a minimum vector
> length of 128 bits. In theory though the underlying specification also allows
> for (embedded designs with) only 32 or 64 bits per vector.
>
> The RFC is how should this be dealt with? The simplest possibibility is to
> simply assume 128 bits. Indeed, I am not aware of any actual or proposed
> processor IP with shorter-than-128-bit vectors, even less so, one upon which
> FFmpeg would be used. For what it is worth, ARM SVE guarantees a minimum of
> 128 bits per vector too. In that case, we can drop the first patch, and
> simplify the following ones.
>
> Another option is to expose the vector length via the CPU flags as proposed
> earlier by Lynne. Though this is unorthodox the vector length is not a proper
> flag. The vector length can readily be retrieved from a read-only unprivileged
> CSR, and this patchset instead introduces a simple inline wrapper therefore.
> The downside of this approach is that this is nominally undefined behaviour,
> and typically will raise a SIGILL, if the processor does not support the
> vector extension.
>

Where's the undefined behavior? If it's guarded by an if, the
function will return the maximum length. I don't mind that it's not
a cpuflag.
Rémi Denis-Courmont Sept. 28, 2022, 6:03 a.m. UTC | #2
Le 28 septembre 2022 00:32:42 GMT+03:00, Lynne <dev@lynne.ee> a écrit :
>Sep 27, 2022, 22:04 by remi@remlab.net:
>
>> Hello,
>>
>> As a general rule, scalable vector instruction sets should be used with the
>> largest possible vector length. There are however a number of operations that
>> just happen with a fixed size, and this patchset exhibits the simplest one I
>> could find. The proper RISC-V Vector extension guarantees a minimum vector
>> length of 128 bits. In theory though the underlying specification also allows
>> for (embedded designs with) only 32 or 64 bits per vector.
>>
>> The RFC is how should this be dealt with? The simplest possibibility is to
>> simply assume 128 bits. Indeed, I am not aware of any actual or proposed
>> processor IP with shorter-than-128-bit vectors, even less so, one upon which
>> FFmpeg would be used. For what it is worth, ARM SVE guarantees a minimum of
>> 128 bits per vector too. In that case, we can drop the first patch, and
>> simplify the following ones.
>>
>> Another option is to expose the vector length via the CPU flags as proposed
>> earlier by Lynne. Though this is unorthodox the vector length is not a proper
>> flag. The vector length can readily be retrieved from a read-only unprivileged
>> CSR, and this patchset instead introduces a simple inline wrapper therefore.
>> The downside of this approach is that this is nominally undefined behaviour,
>> and typically will raise a SIGILL, if the processor does not support the
>> vector extension.
>>
>
>Where's the undefined behavior? If it's guarded by an if, the
>function will return the maximum length. I don't mind that it's not
>a cpuflag.
>
>_______________________________________________
>ffmpeg-devel mailing list
>ffmpeg-devel@ffmpeg.org
>https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>To unsubscribe, visit link above, or email
>ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>

The UB occurs if the helper is called on a CPU without vectors. There is no (I think) UB in the patchset. My point is rather that it might be prone to programming errors, not that they would be actual errors already.

Presumably someone could accidentally insert or move a call to ff_rv_get_vlenb() before the proper CPU flags check, and not notice that it would cause UB on some CPUs.

With that said, if there are no objections to the approach in this series, I'm obviously fine with that.
Lynne Sept. 28, 2022, 6:51 a.m. UTC | #3
Sep 28, 2022, 08:03 by remi@remlab.net:

> Le 28 septembre 2022 00:32:42 GMT+03:00, Lynne <dev@lynne.ee> a écrit :
> >Sep 27, 2022, 22:04 by remi@remlab.net:
>
>>> Hello,
>>>
>>> As a general rule, scalable vector instruction sets should be used with the
>>> largest possible vector length. There are however a number of operations that
>>> just happen with a fixed size, and this patchset exhibits the simplest one I
>>> could find. The proper RISC-V Vector extension guarantees a minimum vector
>>> length of 128 bits. In theory though the underlying specification also allows
>>> for (embedded designs with) only 32 or 64 bits per vector.
>>>
>>> The RFC is how should this be dealt with? The simplest possibibility is to
>>> simply assume 128 bits. Indeed, I am not aware of any actual or proposed
>>> processor IP with shorter-than-128-bit vectors, even less so, one upon which
>>> FFmpeg would be used. For what it is worth, ARM SVE guarantees a minimum of
>>> 128 bits per vector too. In that case, we can drop the first patch, and
>>> simplify the following ones.
>>>
>>> Another option is to expose the vector length via the CPU flags as proposed
>>> earlier by Lynne. Though this is unorthodox the vector length is not a proper
>>> flag. The vector length can readily be retrieved from a read-only unprivileged
>>> CSR, and this patchset instead introduces a simple inline wrapper therefore.
>>> The downside of this approach is that this is nominally undefined behaviour,
>>> and typically will raise a SIGILL, if the processor does not support the
>>> vector extension.
>>>
> >Where's the undefined behavior? If it's guarded by an if, the
> >function will return the maximum length. I don't mind that it's not
> >a cpuflag.
>
>>
>>
> >_______________________________________________
> >ffmpeg-devel mailing list
> >ffmpeg-devel@ffmpeg.org
> >https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>>
>>
> >To unsubscribe, visit link above, or email
> >ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
>>
>>
>
> The UB occurs if the helper is called on a CPU without vectors. There is no (I think) UB in the patchset. My point is rather that it might be prone to programming errors, not that they would be actual errors already.
>
> Presumably someone could accidentally insert or move a call to ff_rv_get_vlenb() before the proper CPU flags check, and not notice that it would cause UB on some CPUs.
>
> With that said, if there are no objections to the approach in this series, I'm obviously fine with that.
>

I think it's fine as-is. I'm sure we'll get more fate systems other than your U74
to check for that in the future.
Patchset looks good to me. Will apply in a few hours.