ML-KEM Design
=============

This document covers OpenSSL-specific ML-KEM implementation details.
**ML-KEM** is specified in [FIPS 203], which includes comprehensive pseudo-code
for all its algorithms.

ML-KEM Parameters & Functions
-----------------------------

There are 3 different parameter sets in FIPS 203 (see Section 8).

To support these variants, OpenSSL has 3 associated key managers and 3
corresponding KEM function sets.
The key management and KEM algorithm names are **ML-KEM-512**, **ML-KEM-768**
and **ML-KEM-1024**.
At the TLS layer, the associated key exchange *groups* are, respectively,
**MLKEM512**, **MLKEM768** and **MLKEM1024**.

**ML-KEM** makes extensive use of four **SHA3** primitives: **SHA3-256**,
**SHA3-512**, **SHAKE128** and **SHAKE256**.
To improve **ML-KEM** execution performance, the EVP handles for these are
pre-fetched during **ML-KEM** key initialisation and stored with each key.
These are then used in key generation, encapsulation and decapsulation.
These are also duplicated by reference (**EVP_MD** handles uprefed) when an
**ML-KEM** key is duplicated.

ML-KEM keys
-----------

**ML-KEM** is an asymmetric algorithm, and has both public and private keys.
Since the public key is exchanged between the two parties as part of key
agreement, the encoding (*wire-form*) of the public key is clearly defined and
there are unambiguous choices for its encoding and decoding functions.

It may be noted that the *wire-form* public key is "compressed".
Instead of the bulky "A" ("m" in the code) matrix, which represents the majority
of the storage required for ML-KEM public and private keys, the *wire-form* public
key, holds a 32-byte seed from which the the matrix is regenerated by the recipient
of the public key.
In the OpenSSL implementation, the matrix is *eagerly* evaluated as part of
decoding the public key, and stored in memory in the internal form needed for
subsequent computations (encapsulation).
Since the private key includes the public key as one of its components, the matrix
is also pre-computed and stored with the private key, and then need not be
regenerated during decapsulation.
During encapsulation (typically performed by servers), it is in principle
possible to save space and compute the matrix elements *just-in-time*, as each
matrix element is used exactly once.
This is not currently implemented, and the matrix is pre-computed in full.

However, the same matrix is used both during key generation and decapsulation
and computing it twice would have a noticeable performance impact (typically on
the client).
If we wanted to do *just-in-time* matrix computation for decapsulation, we'd
need to have a different memory layout for public keys when only the public key
is known, and to change the algorithm code to generate matrix elements on
demand during encapsulation.
This can be considered later, if it is determined that the space savings (9 *
512 bytes in memory for ML-KEM-768, for the full matrix, instead of 512 bytes
for a just-in-time element).
Since servers will generally destroy the client public key soon after the
shared secret is computed, these don't stay in memory long, and briefly saving
~2KB may not to be of much benefit).

The private key format is yet to be clearly standardised, though (to be able to
fully describe the algorithms) FIPS 203 documents a format that is commonly
referred to as the "extended" format.
This is the private key format supported by our key management provider
interface.
The IETF voices interest in using the "seed-based" format (the 64-byte (*d*,
*z*) seed pair from which the key is generated and can be recovered).
Recovery of the key from the seed (*d*, *z* pair) is supported by the [FIPS
203] internal deterministic key generation functions, which are used in the
*keygen* portion of the Known Answer Tests (KATs).

The design therefore caters to both options: The default key generation and KEM
encapsulation/decapsulation functions operate on "extended keys" in the
[FIPS 203] format, but it will be possible to use the "seed-based" private key
format by using the (currently test-only) deterministic *keygen* interface.
When keys are generated randomly, we don't presently provide a mechanism
to obtain and store the seed.
This can be added later if required.

Key generation API
------------------

Keys can be generated via the usual **EVP_PKEY_generate()** and
**EVP_PKEY_Q_keygen()** functions.

An explicit seed can be specified by setting the key generation
**OSSL_PKEY_PARAM_ML_KEM_SEED** parameter to a 64-byte octet-string
(concatenation of the **d** and **z** values (32-bytes each) in that order).

KEM API
-------

**ML-KEM** is meant to be a drop-in replacement for existing KEM algorithms.
Accessed in the usual way via:

- **EVP_PKEY_encapsulate_init()**,
- **EVP_PKEY_encapsulate()**,
- **EVP_PKEY_decapsulate_init()**, and
- **EVP_PKEY_decapsulate()**.

For the encapsulation operation, a test-only option exists to bypass the random
number generator (secret random inputs are required for security) and pass in
a pre-determined 32-byte random value, by setting of the
**OSSL_KEM_PARAM_IKME** parameter.

Buffers
-------

The **ML-KEM** key management and KEM providers interface with the underlying
libcrypto implementation via functions that validate the sizes of all provided
input/output buffers (encoded keys, ciphertext, shared secrets and seeds) against
the values expected for the provider's ML-KEM variant (a pointer to the variant
parameters is stored with each key).

The underlying libcrypto **ML-KEM** APIs are not directly exposed to users,
only the abstracted key management and KEM **EVP** APIs are public interfaces.

Constant Time Considerations
----------------------------

The usual constant time methods are used in the implementation.
However, we avoid using a *value-barrier* to set the masks that perform
constant-time *select* between one of two values.
This avoids a 30-50% performance penalty and is expected to be robust even in
the face of plausible future compiler optimisations.
Remainders module the prime are computed via Barret Reduction and the decoding
and decompression of the decrypted *message* has been tested to not be
vulnerable to the "clangover" attack in our implementation.

All the libcrypto functions (other than **ML_KEM_KEY** allocation, which
returns **NULL** on error) return 1 for success or zero on error.
It should be noted that to avoid chosen-ciphertext attacks, the
**decapsulate** implementation **must** return success and a synthetic
shared secret (generated in constant-time whether synthetic or successfully
decrypted) whenever the input is a well-formed ciphertext.

The only exception to the above is when, unexpectedly, one of the **SHA3**
functions fails, in that case all hope of constant-time computation is
lost, but we don't expect such failures to be influenced by the content
of chosen-ciphertexts, so this should not be an issue).

Nevertheless, even then we fall back on returning a shared secret from the RNG
along with an error indication only when the key derivation function
for the synthetic shared secret fails.
In all other conditions we return success and, as appropriate, either
the correct shared secret, or the synthetic alternative generated by the KDF.

<!-- Links  -->

[FIPS 203]:
    <https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.203.pdf>
