Age | Commit message (Collapse) | Author | Files | Lines |
|
Improve the way micro instructions (FW code) are uploaded to Accelerator
Engines (AEs). If code starts at PC zero (absolute addressing), read
uwords with no relative address. Otherwise, use relative addressing to
the page region.
Signed-off-by: Jack Xu <[email protected]>
Co-developed-by: Wojciech Ziemba <[email protected]>
Signed-off-by: Wojciech Ziemba <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Rename the function qat_uclo_del_uof_obj() in qat_uclo_del_obj() since
it frees the memory allocated for all firmware objects.
Signed-off-by: Jack Xu <[email protected]>
Co-developed-by: Wojciech Ziemba <[email protected]>
Signed-off-by: Wojciech Ziemba <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Introduce additional parenthesis to resolve a warninga reported by
checkpatch.
Signed-off-by: Jack Xu <[email protected]>
Co-developed-by: Wojciech Ziemba <[email protected]>
Signed-off-by: Wojciech Ziemba <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Remove unnecessary parenthesis across the firmware loader.
Signed-off-by: Jack Xu <[email protected]>
Co-developed-by: Wojciech Ziemba <[email protected]>
Signed-off-by: Wojciech Ziemba <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Change message in error path of qat_uclo_check_image_compat() to report
an incompatible firmware image that contains a neighbor register table.
Signed-off-by: Jack Xu <[email protected]>
Co-developed-by: Wojciech Ziemba <[email protected]>
Signed-off-by: Wojciech Ziemba <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Do not mask the AE number with the AE mask when accessing the AE local
CSRs. Bit 12 of the local CSR address is the start of AE number so just
take out the AE mask here.
Signed-off-by: Jack Xu <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
The return value of qat_hal_rd_ae_csr() is always a CSR value and never
a status and should not be stored in the status variable of
qat_hal_put_rel_rd_xfer().
This removes the assignment as qat_hal_rd_ae_csr() is not expected to
fail.
A more comprehensive handling of the theoretical corner case which could
result in a fail will be submitted in a separate patch.
Fixes: 8c9478a400b7 ("crypto: qat - reduce stack size with KASAN")
Signed-off-by: Jack Xu <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Fiona Trahe <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Implement infrastructure for the Multiple Object File (MOF) format
in the firmware loader. This will allow to load a specific firmware
image contained inside an MOF file.
This patch is based on earlier work done by Pingchao Yang.
Signed-off-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Jack Xu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
This patch fixes all the sparse warnings in cavium/nitrox:
- Fix endianness warnings by adding the correct markers to unions.
- Add missing header inclusions for prototypes.
- Move nitrox_sriov_configure prototype into the isr header file.
Signed-off-by: Herbert Xu <[email protected]>
|
|
Change all lower case pci in comments to be upper case PCI.
Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Adam Guerin <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
The current NEON based ChaCha implementation for ARM is optimized for
multiples of 4x the ChaCha block size (64 bytes). This makes sense for
block encryption, but given that ChaCha is also often used in the
context of networking, it makes sense to consider arbitrary length
inputs as well.
For example, WireGuard typically uses 1420 byte packets, and performing
ChaCha encryption involves 5 invocations of chacha_4block_xor_neon()
and 3 invocations of chacha_block_xor_neon(), where the last one also
involves a memcpy() using a buffer on the stack to process the final
chunk of 1420 % 64 == 12 bytes.
Let's optimize for this case as well, by letting chacha_4block_xor_neon()
deal with any input size between 64 and 256 bytes, using NEON permutation
instructions and overlapping loads and stores. This way, the 140 byte
tail of a 1420 byte input buffer can simply be processed in one go.
This results in the following performance improvements for 1420 byte
blocks, without significant impact on power-of-2 input sizes. (Note
that Raspberry Pi is widely used in combination with a 32-bit kernel,
even though the core is 64-bit capable)
Cortex-A8 (BeagleBone) : 7%
Cortex-A15 (Calxeda Midway) : 21%
Cortex-A53 (Raspberry Pi 3) : 3%
Cortex-A72 (Raspberry Pi 4) : 19%
Cc: Eric Biggers <[email protected]>
Cc: "Jason A . Donenfeld" <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Remove cast for mailbox CSR in adf_admin.c as it is not needed.
Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Adam Guerin <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
The extra tests in the manager actually require the manager to be
selected too. Otherwise the linker gives errors like:
ld: arch/x86/crypto/chacha_glue.o: in function `chacha_simd_stream_xor':
chacha_glue.c:(.text+0x422): undefined reference to `crypto_simd_disabled_for_test'
Fixes: 2343d1529aff ("crypto: Kconfig - allow tests to be disabled when manager is disabled")
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
At the time xts fallback tfm allocation fails the device struct
hasn't been enabled yet in the caam xts tfm's private context.
Fix this by using the device struct from xts algorithm's private context
or, when not available, by replacing dev_err with pr_err.
Fixes: 9d9b14dbe077 ("crypto: caam/jr - add fallback for XTS with more than 8B IV")
Fixes: 83e8aa912138 ("crypto: caam/qi - add fallback for XTS with more than 8B IV")
Fixes: 36e2d7cfdcf1 ("crypto: caam/qi2 - add fallback for XTS with more than 8B IV")
Signed-off-by: Horia Geantă <[email protected]>
Reviewed-by: Iuliana Prodan <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
'hisi_qm_init' initializes configuration of QM.
To improve code readability, split it into two pieces.
Signed-off-by: Weili Qian <[email protected]>
Reviewed-by: Zhou Wang <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
'qm_eq_ctx_cfg' initializes configuration of EQ and AEQ,
split it into two pieces to improve code readability.
Signed-off-by: Weili Qian <[email protected]>
Reviewed-by: Zhou Wang <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
'qm_qp_ctx_cfg' initializes configuration of SQ and CQ,
split it into two pieces to improve code readability.
Signed-off-by: Weili Qian <[email protected]>
Reviewed-by: Zhou Wang <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Replace 'sprintf' with 'scnprintf' to avoid overrun.
Signed-off-by: Weili Qian <[email protected]>
Reviewed-by: Zhou Wang <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Since 'qm_set_sqctype' always returns 0, change it as 'void'.
Signed-off-by: Weili Qian <[email protected]>
Reviewed-by: Zhou Wang <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Since 'qm_create_debugfs_file' always returns 0, change it as 'void'.
Signed-off-by: Weili Qian <[email protected]>
Reviewed-by: Zhou Wang <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
The returns of 'qm_get_hw_error_status' and 'qm_get_dev_err_status'
are values from the hardware registers, which should not be defined
as 'int', so update as 'u32'.
Signed-off-by: Weili Qian <[email protected]>
Reviewed-by: Zhou Wang <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Some numbers are replaced by macros to avoid incomprehension.
Signed-off-by: Weili Qian <[email protected]>
Reviewed-by: Zhou Wang <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Clean up the check for irq. dev_err() is superfluous as
platform_get_irq() already prints an error. Check for zero
would indicate a bug. Remove curly braces to conform to
styling requirements.
Signed-off-by: Nigel Christian <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Loading the module deadlocks since:
-local cbc(aes) implementation needs a fallback and
-crypto API tries to find one but the request_module() resolves back to
the same module
Fix this by changing the module alias for cbc(aes) and
using the NEED_FALLBACK flag when requesting for a fallback algorithm.
Fixes: 00b99ad2bac2 ("crypto: arm/aes-neonbs - Use generic cbc encryption path")
Signed-off-by: Horia Geantă <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
A semicolon is not needed after a switch statement.
Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
A semicolon is not needed after a switch statement.
Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
PAC pointer authentication signs the return address against the value
of the stack pointer, to prevent stack overrun exploits from corrupting
the control flow. However, this requires that the AUTIASP is issued with
SP holding the same value as it held when the PAC value was generated.
The Poly1305 NEON code got this wrong, resulting in crashes on PAC
capable hardware.
Fixes: f569ca164751 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS ...")
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Commit 3f69cc60768b ("crypto: af_alg - Allow arbitrarily long algorithm
names") made the kernel start accepting arbitrarily long algorithm names
in sockaddr_alg. However, the actual length of the salg_name field
stayed at the original 64 bytes.
This is broken because the kernel can access indices >= 64 in salg_name,
which is undefined behavior -- even though the memory that is accessed
is still located within the sockaddr structure. It would only be
defined behavior if the array were properly marked as arbitrary-length
(either by making it a flexible array, which is the recommended way
these days, or by making it an array of length 0 or 1).
We can't simply change salg_name into a flexible array, since that would
break source compatibility with userspace programs that embed
sockaddr_alg into another struct, or (more commonly) declare a
sockaddr_alg like 'struct sockaddr_alg sa = { .salg_name = "foo" };'.
One solution would be to change salg_name into a flexible array only
when '#ifdef __KERNEL__'. However, that would keep userspace without an
easy way to actually use the longer algorithm names.
Instead, add a new structure 'sockaddr_alg_new' that has the flexible
array field, and expose it to both userspace and the kernel.
Make the kernel use it correctly in alg_bind().
This addresses the syzbot report
"UBSAN: array-index-out-of-bounds in alg_bind"
(https://syzkaller.appspot.com/bug?extid=92ead4eb8e26a26d465e).
Reported-by: [email protected]
Fixes: 3f69cc60768b ("crypto: af_alg - Allow arbitrarily long algorithm names")
Cc: <[email protected]> # v4.12+
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Use the new crypto_engine_alloc_init_and_set() function to
initialize crypto-engine and enable retry mechanism.
Set the maximum size for crypto-engine software queue based on
Job Ring size (JOBR_DEPTH) and a threshold (reserved for the
non-crypto-API requests that are not passed through crypto-engine).
The callback for do_batch_requests is NULL, since CAAM
doesn't support linked requests.
Signed-off-by: Iuliana Prodan <[email protected]>
Reviewed-by: Horia Geantă <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Currently, by default crypto self-test failures only result in a
pr_warn() message and an "unknown" status in /proc/crypto. Both of
these are easy to miss. There is also an option to panic the kernel
when a test fails, but that can't be the default behavior.
A crypto self-test failure always indicates a kernel bug, however, and
there's already a standard way to report (recoverable) kernel bugs --
the WARN() family of macros. WARNs are noisier and harder to miss, and
existing test systems already know to look for them in dmesg or via
/proc/sys/kernel/tainted.
Therefore, call WARN() when an algorithm fails its self-tests.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
When alg_test() is called from tcrypt.ko rather than from the algorithm
registration code, "driver" is actually the algorithm name, not the
driver name. So it shouldn't be used in places where a driver name is
wanted, e.g. when reporting a test failure or when checking whether the
driver is the generic driver or not.
Fix this for the skcipher algorithm tests by getting the driver name
from the crypto_skcipher that actually got allocated.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
When alg_test() is called from tcrypt.ko rather than from the algorithm
registration code, "driver" is actually the algorithm name, not the
driver name. So it shouldn't be used in places where a driver name is
wanted, e.g. when reporting a test failure or when checking whether the
driver is the generic driver or not.
Fix this for the AEAD algorithm tests by getting the driver name from
the crypto_aead that actually got allocated.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
When alg_test() is called from tcrypt.ko rather than from the algorithm
registration code, "driver" is actually the algorithm name, not the
driver name. So it shouldn't be used in places where a driver name is
wanted, e.g. when reporting a test failure or when checking whether the
driver is the generic driver or not.
Fix this for the hash algorithm tests by getting the driver name from
the crypto_ahash or crypto_shash that actually got allocated.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Add crypto_aead_driver_name(), which is analogous to
crypto_skcipher_driver_name().
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
A break is not needed if it is preceded by a return
Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: Alexandre Belloni <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64
(tested on Broadwell Xeon) while not increasing code size too much.
Signed-off-by: Arvind Sankar <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
This reduces code size substantially (on x86_64 with gcc-10 the size of
sha256_update() goes from 7593 bytes to 1952 bytes including the new
SHA256_K array), and on x86 is slightly faster than the full unroll
(tested on Broadwell Xeon).
Signed-off-by: Arvind Sankar <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
The temporary W[] array is currently zeroed out once every call to
sha256_transform(), i.e. once every 64 bytes of input data. Moving it to
sha256_update() instead so that it is cleared only once per update can
save about 2-3% of the total time taken to compute the digest, with a
reasonable memset() implementation, and considerably more (~20%) with a
bad one (eg the x86 purgatory currently uses a memset() coded in C).
Signed-off-by: Arvind Sankar <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
The assignments to clear a through h and t1/t2 are optimized out by the
compiler because they are unused after the assignments.
Clearing individual scalar variables is unlikely to be useful, as they
may have been assigned to registers, and even if stack spilling was
required, there may be compiler-generated temporaries that are
impossible to clear in any case.
So drop the clearing of a through h and t1/t2.
Signed-off-by: Arvind Sankar <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Without the barrier_data() inside memzero_explicit(), the compiler may
optimize away the state-clearing if it can tell that the state is not
used afterwards.
Signed-off-by: Arvind Sankar <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Without the barrier_data() inside memzero_explicit(), the compiler may
optimize away the state-clearing if it can tell that the state is not
used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the
function can get inlined into sha256(), in which case the memset is
optimized away.
Signed-off-by: Arvind Sankar <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
pm_runtime_get_sync() will increment pm usage counter even
when it returns an error code. We should call put operation
in error handling paths of omap_aes_hw_init.
Signed-off-by: Zhang Qilong <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
This is an algorithm optimization. The reset operation when
setting the public key is repeated and redundant, so remove it.
At the same time, `sm2_ecc_os2ec()` is optimized to make the
function more simpler and more in line with the Linux code style.
Signed-off-by: Tianjia Zhang <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
This patch reduces the stack usage in sa2ul:
1. Move the exported sha state into sa_prepare_iopads so that it
can occupy the same space as the k_pad buffer.
2. Use one buffer for ipad/opad in sa_prepare_iopads.
3. Remove ipad/opad buffer from sa_set_sc_auth.
4. Use async skcipher fallback and remove on-stack request from
sa_cipher_run.
Reported-by: kernel test robot <[email protected]>
Fixes: d2c8ac187fc9 ("crypto: sa2ul - Add AEAD algorithm support")
Signed-off-by: Herbert Xu <[email protected]>
|
|
Clean up extra blank lines
Signed-off-by: Longfang Liu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
1. Remove unused member‘pending_reqs' in‘sec_qp_ctx' structure.
2. Remove unused member‘status' in‘sec_dev' structure.
Signed-off-by: Longfang Liu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Commit 1d2c3279311e ("crypto: x86/aes - drop scalar assembler
implementations") was meant to remove aes_glue.c, but it actually left
it as an unused one-line file. Remove this unused file.
Cc: Ard Biesheuvel <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Change type of ae_mask in adf_hw_device_data to allow for devices with
more than 16 Acceleration Engines (AEs).
Signed-off-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Wojciech Ziemba <[email protected]>
Reviewed-by: Fiona Trahe <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Allow for crypto instances to be configured with symmetric crypto rings
that belong to a bank that is different from the one where asymmetric
crypto rings are located.
This is to allow for devices with banks made of a single ring pair.
In these, crypto instances will be composed of two separate banks.
Changed string literals are not exposed to the user space.
Signed-off-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Wojciech Ziemba <[email protected]>
Reviewed-by: Fiona Trahe <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Refactor function qat_crypto_dev_config() to propagate errors to
the caller.
Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|