KEF Encryption Format -- Technical Specification¶
...The K stands for "KEF"
--anon
Motivation¶
In the autumn of 2023, during the lead-up to krux release 23.09.0, contributors proposed a method of encrypting bip39 mnemonics that could be stored in SPI-flash, on sdcard, and/or exported to QR. Regarding the encrypted-mnemonic QR format: the layout proposed was interesting as an extensible, lite-weight, self-describing envelope that has been appreciated by users ever since.
..."Wen passphrases, output descriptors, PSBTs, and notes?"
--plebs
This specification, and its accompanying implementation and test-suite are the result of months of exploration into improvements meant to better define, test, and extend the original encryption format that we'll refer to as KEF. It proposes ten new versions, extending its usefulness to more than mnemonics, targeting variable-length strings up to moderately sized PSBTs, flexibility to choose among four AES modes of operation, with-or-without compression, and versions optimized to result in a smaller envelope.
Above all, this specification aims to be supported by as many projects as would consider adopting it, so that users are not "locked" into a particular project when recovering their secrets. Corrections and refinement to, and scrutiny of this specification are appreciated. Proposals for more versions
are welcome, provided they offer "value" to the user and fit within the scope of this system. Once released, because it cannot be known how many KEF envelopes may exist in-the-wild, changes to any particular version must remain backwards compatible for decryption. Adopting implementations are free to support any KEF versions they wish to support, for decryption-only or for both encryption and decryption -- with the expectation that claims-of-support made are clear and precise about what is supported.
Overview¶
This system encrypts arbitrary plaintext into a versioned, self-describing, KEF envelope which includes:
- custom identity/label
len_id
andid
- version
v
- pbkdf2-hmac iterations
i
- and cipher-payload
cpl
where cipher-payload cpl
consists of:
- Initialization-Vector/Nonce (if applicable)
IV
- ciphertext
- and authentication/validation data
auth
KEF versions offer combinations of different modes-of-operation, authentication strategies, padding strategies, and compression, all selected via a numeric version code (0 - 21).
- Available version codes are: 0, 1, 5, 6, 7, 10, 11, 12, 15, 16, 20, 21. Not all integers in the range are assigned, and implementations may disable versions or modes.
Currently, all versions use AES and derive the 256-bit encryption key k
as:
k = pbkdf2_hmac_sha256(K, id, i)
where:
K
= user-provided password/key material (bytes; if str: non-normalized encode as utf-8)id
= salt (variable-length, prepended to envelope; bytes: if str: non-normalized encode as utf-8)i
= iteration count (3 bytes, big-endian)
The stored iteration field i
MUST be ≥ 1. The effective PBKDF2 iteration count is i
if i > 10,000
, otherwise i * 10,000
.
Compression (when enabled) uses zlib.compress(wbits=-10) or raw deflate(micropython).
Authentication has three forms:
- When the mode has built-in authentication, like GCM:
auth
is a truncated auth tag - When the mode does not have built-in authentication,
auth
is a truncated sha256 digest whose pre-image is of two forms:- if
auth
will be encrypted with plaintext, hidden, it issha256(plaintext)
- if
auth
will be appended to ciphertext, exposed, it issha256(version || IV || plaintext || derived-k)
- if
Generalizations Regarding Implementation¶
It is expected that any implementation can decrypt a KEF envelope that was created by itself on the same device. Implementations are asked to make their "best-effort" to be capable of decrypting KEF envelopes for versions they support which were created by other implementations or on other devices -- but this will not always be possible. Decrypting large KEF envelopes on severely constrained devices, or ones created with flawed implementations is unrealistic. Therefore, in such cases it is the responsibility of the user to find an implementation and device capable of decrypting their KEF envelope, or to have a non-KEF form of recovery.
-
Be strict while encrypting. Be tolerant -- and non-specific about errors, when decrypting.
-
At its base, a KEF envelope is a format of bytes -- so are all of its inputs. Remember this when converting strings gathered for the
key
andid
. Consider being strict about offering a reasonably minimal set of characters, common and available on other devices and/or international keyboards when encrypting -- then encode unicode codepoints (if not ascii) directly to their utf-8 representations without normalization. For decryption, more characters could be offered when gathering thekey
, and multiple normalization strategies may be tried, so that secrets may be recovered. Consider some capability of displaying bothkey
andid
as bytes, and gathering thekey
as bytes either directly or via hex/base64 conversion if necessary, to enable recovery. Do NOT assume that a user originally used a particular implementation to encrypt a KEF envelope. -
On the importance of a STRONG user-supplied
key
This cannot be stressed enough to each user of KEF. While KEF allows for key-stretching viaid
anditerations
, and offers modes that require a randomIV
/ Nonce, KEF offers no expectation of security for a weak user-suppliedkey
. Consider making this point clear to users before encrypting and/or offer an indication ofkey
strength once gathered. If a KEF envelope has been created with a "weak"key
and stored accessible to others, user should assume that their secret has been leaked. Consider encouraging users to make sane choices about the characters they use in theirkey
, aware that non-ascii characters offered by one implementation may not be easy to enter on another, or that a recognizable glyph may not exist on other devices for them to verify theirkey
when decrypting. -
On security Not all KEF
versions
offer the same security guarantees, so implementors MUST take care to protect against "unsafe" usage. As already mentioned: be strict and fail to encrypt when "unsafe"; be tolerant and vague while decrypting. Support for decrypt-only on a particular version is perfectly valid should an implementation choose to "nudge" users towards a more-secure version where it supports full encrypt/decrypt functionality.-
On mode ECB: Repeated blocks would leak patterns within ciphertext. Therefore, be strict -- refuse to encrypt using mode ECB whenever duplicate blocks are detected. Consider a compressed version which may resolve this.
-
On block modes with NUL padding Problems to unpad can arise decrypting where valid NUL bytes are confused with removable padding.
- If
auth
is appended to plaintext before padding AND theauth
bytes end in 0x00: be strict -- refuse to encrypt. Consider a version with safe padding. - If
auth
is appended to ciphertext after padding/encryption AND theplaintext
bytes end in 0x00: be strict -- refuse to encrypt. Consider a version with safe padding. - Do not assume that other implementations adhere to the above. Be tolerant and make reasonable efforts to successfully recover secrets when decrypting. Offering a warning to users AFTER successful decryption in this case may be appropriate.
- If
-
-
On modes that require IV or Nonce Take precautions to ensure that this value is random and not reused. ie: Natural entropy captured from camera sensor (user validated and/or analyzed to ensure sensor is working / high entropy).
-
On common
bytes
encodings Outside the strict scope that KEF envelopes are a format of bytes but related to this topic: implementations may be presented with encoded strings that are likely to represent bytes. For instance: base64, base43 (from electrum), base32, or hex might be representations of a KEF envelope that was previously encoded for transport. As you continue reading, it will become clear that with any bytestring, one may recognize a KEF envelope by:- reading the first byte as an integer
len_id
, - jumping that many bytes, over the
id
, to read the next byte as an integerversion
, - if that version represents a known and supported KEF version, then the rest of the envelope may be parsed via that version's KEF rules.
- if parsing succeeds without errors, it is likely to be a KEF envelope and a decryption user-interface should be offered to the user.
While the user likely knows, the process instance of a KEF implementation will learn definitively, only AFTER a successful decryption, that a bytestring was indeed a KEF envelope. If at any point along this process, an implementation finds that
version
is unknown/disabled, or if parsing fails, the expected action is NOT TO RAISE SPECIFIC ERRORS regarding this inspection. Rather, the appropriate action is to assume it was not a KEF envelope and to treat the data under another context: ie: "Unknown". Similarly, as mentioned above, being vague about errors during decryption implies that "Failed!" may be a sufficient response for any error, instead of leaking to a potential attacker specific details about the failure.
- reading the first byte as an integer
-
On Iterations Consider that users may want to decrypt KEF envelopes on various resource-constrained devices. There is a minimum 10,000 iterations imposed in any KEF envelope (a value of 1 would be 10,000 pbkdf2_hmac iterations), and the maximum could be as high as 100,000,000 (a value of 10,000), but depending on the device used, 500,000 might be too high. Also, since the user-supplied
key
is stretched by this value, consider offering a range to users -- then adding a smalldelta
as extra bits of entropy to derive different AES-256 keys that would otherwise be the same in the event the user re-uses the samekey
,id
anditerations
when creating many KEF envelopes. -
On truncated Authentication At first glance it may be concerning that
auth
bytes for many versions have been truncated and are trivially "weak". Note that KEF's use-case for authentication is to validate that the user has correctly entered their decryptionkey
. In the worse case, "false-authenticated" success will occur at a rate of 1:16M (or 1:4B for others) if using an incorrect decryptionkey
; similar if an attacker has modified the KEF envelope. In these "false-authenticated" success cases, data will result from decryption, but that data will NOT be the original secret or plaintext; it will be of no value.
Common Structure of a KEF Envelope¶
All KEF versions' encrypted outputs follow this layout:
len_id + id + v + i + cpl
Field | Size | Description |
---|---|---|
len_id |
1 byte | Length of id (0 - 252) |
id |
len_id bytes |
Salt for PBKDF2 |
v |
1 byte | Version number |
i |
3 bytes | Iteration count (big-endian; if <= 10,000: *= 10,000) |
cpl |
Variable | Cipher PayLoad (IV + ciphertext + auth) |
The Cipher PayLoad cpl
structure varies by version.
Versions - Details¶
Details for currently-available versions of KEF follow. The top half of each section is pre-formatted self-doc
text, built from the reference implementation's test-suite (using KEF VERSION
constants as KEF's rule-set). The bottom half of each section with rich-formatted text was initially LLM-generated, prompted with the self-doc
text, then curated and edited by hand.
v0: "AES-ECB v1"¶
[AES-ECB v1] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =0
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: e.encrypt(P + auth + pad)
e: AES(k, ECB)
auth: sha256(P)[:16]
pad: NUL
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: ECB
- IV: None
- Padding: NUL bytes to block boundary
- Authentication: First 16 bytes of
SHA256(plaintext)
, hidden - cpl layout:
[ciphertext]
(auth embedded before padding/encryption) - Use Case: LEGACY: consider using version 5; encryption of 16 or 32 BIP39 entropy bytes
- Security Note: When encrypting: fail "unsafe" if duplicate plaintext blocks, fail "unsafe" if auth ends 0x00
v1: "AES-CBC v1"¶
[AES-CBC v1] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =1
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: iv + e.encrypt(P + auth + pad)
iv: 16b
e: AES(k, CBC, iv)
auth: sha256(P)[:16]
pad: NUL
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: CBC
- IV: 16 bytes
- Padding: NUL bytes to block boundary
- Authentication: First 16 bytes of
SHA256(plaintext)
, hidden - cpl layout:
[iv (16)] + [ciphertext]
(auth embedded before padding/encryption) - Use Case: LEGACY: consider using version 10; encryption of 16 or 32 BIP39 entropy bytes
- Security Note: When encrypting: do not re-use IV, fail "unsafe" if auth ends 0x00
v5: "AES-ECB"¶
[AES-ECB] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =5
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: e.encrypt(P + pad) + auth
e: AES(k, ECB)
pad: NUL
auth: sha256(v + P + k)[:3]
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: ECB
- IV: None
- Padding: NUL bytes to block boundary
- Authentication: First 3 bytes of
SHA256(version_byte + plaintext + derived_key)
, exposed - cpl layout:
[ciphertext] + [auth (3)]
- Use Case: smallest KEF envelope for high-entropy secrets (BIP39 entropy, passphrase, cryptographic keys)
- Security Note: When encrypting: fail "unsafe" if duplicate plaintext blocks, fail "unsafe" if plaintext ends 0x00
v6: "AES-ECB +p"¶
[AES-ECB +p] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =6
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: e.encrypt(P + auth + pad)
e: AES(k, ECB)
auth: sha256(P)[:4]
pad: PKCS7
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: ECB
- IV: None
- Padding: PKCS7 to block boundary
- Authentication: First 4 bytes of
SHA256(plaintext)
, hidden - cpl layout:
[ciphertext]
(auth embedded before padding/encryption) - Use Case: Mid-sized variable length plaintext
- Security Note: When encrypting: fail "unsafe" if duplicate plaintext blocks
v7: "AES-ECB +c"¶
[AES-ECB +c] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =7
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: e.encrypt(zlib(P, wbits=-10) + auth + pad)
e: AES(k, ECB)
auth: sha256(zlib(P, wbits=-10))[:4]
pad: PKCS7
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: ECB
- IV: None
- Compression:
zlib.compress(P, wbits=-10)
, - raw deflate - Padding: PKCS7 to block boundary
- Authentication: First 4 bytes of
SHA256(plaintext)
, hidden - cpl layout:
[ciphertext]
(auth embedded after compression, before padding/encryption) - Use Case: Mid-sized variable length plaintext
- Security Note: like others, when encrypting: fail "unsafe" if duplicate blocks -- unlikely with compression
v10: "AES-CBC"¶
[AES-CBC] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =10
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: iv + e.encrypt(P + pad) + auth
iv: 16b
e: AES(k, CBC, iv)
pad: NUL
auth: sha256(v + iv + P + k)[:4]
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: CBC
- IV: 16 bytes, random, prepended in
cpl
- Padding: NUL bytes to block boundary
- Authentication: First 4 bytes of
SHA256(v + iv + P + k)
, exposed - cpl layout:
[iv (16)] + [ciphertext] + [auth (4)]
- Use Case: Mnemonics, passphrases, short secrets
- Security Note: When encrypting: do not re-use IV, fail "unsafe" if plaintext ends 0x00
v11: "AES-CBC +p"¶
[AES-CBC +p] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =11
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: iv + e.encrypt(P + auth + pad)
iv: 16b
e: AES(k, CBC, iv)
auth: sha256(P)[:4]
pad: PKCS7
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: CBC
- IV: 16 bytes, random, prepended in
cpl
- Padding: PKCS7 to block boundary
- Authentication: First 4 bytes of
SHA256(plaintext)
, hidden - cpl layout:
[iv] + [ciphertext]
(auth embedded before padding/encryption) - Use Case: General mid-sized plaintext
- Security Note: When encrypting: do not re-use IV
v12: "AES-CBC +c"¶
[AES-CBC +c] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =12
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: iv + e.encrypt(zlib(P, wbits=-10) + auth + pad)
iv: 16b
e: AES(k, CBC, iv)
auth: sha256(zlib(P, wbits=-10))[:4]
pad: PKCS7
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: CBC
- IV: 16 bytes, random, prepended in
cpl
- Compression:
zlib.compress(P, wbits=-10)
, - raw deflate - Padding: PKCS7 to block boundary
- Authentication: First 4 bytes of
SHA256(compressed plaintext)
, hidden - cpl layout:
[iv (16)] + [ciphertext]
(auth embedded after compression, before padding/encryption) - Use Case: Larger plaintext
- Security Note: When encrypting: do not re-use IV
v15: "AES-CTR"¶
[AES-CTR] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =15
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: iv + e.encrypt(P + auth)
iv: 12b
e: AES(k, CTR, iv)
auth: sha256(P)[:4]
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: CTR
- IV: 12 bytes, random, prepended in
cpl
- Padding: None
- Authentication: First 4 bytes of
SHA256(plaintext)
, hidden - cpl layout:
[iv (12)] + [ciphertext]
(auth embedded before padding/encryption) - Use Case: Small to mid-sized plaintext
- Security Note: When encrypting: do not re-use IV
v16: "AES-CTR +c"¶
[AES-CTR +c] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =16
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: iv + e.encrypt(zlib(P, wbits=-10) + auth)
iv: 12b
e: AES(k, CTR, iv)
auth: sha256(zlib(P, wbits=-10))[:4]
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: CTR
- IV: 12 bytes, random, prepended in
cpl
- Compression:
zlib.compress(P, wbits=-10)
, - raw deflate - Padding: None
- Authentication: First 4 bytes of
SHA256(compressed plaintext)
, hidden - cpl layout:
[iv (12)] + [ciphertext]
(auth embedded after compression, before padding/encryption) - Use Case: Larger plaintext
- Security Note: When encrypting: do not re-use IV
v20: "AES-GCM"¶
[AES-GCM] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =20
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: iv + e.encrypt(P) + auth
iv: 12b
e: AES(k, GCM, iv)
auth: e.authtag[:4]
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: GCM
- IV/Nonce: 12 bytes, random, prepended in
cpl
- Padding: None
- Authentication: First 4 bytes of GCM authtag, exposed
- cpl layout:
[iv (12)] + [ciphertext] + [auth (4)]
- Use Case: DEFAULT: Small to mid-sized plaintext
- Security Note: When encrypting: do not re-use IV/Nonce
v21: "AES-GCM +c"¶
[AES-GCM +c] KEF bytes: len_id + id + v + i + cpl
len_id: 1b
id: <len_id>b
v: 1b; =21
i: 3b big; =(i > 10K) ? i : i * 10K
cpl: iv + e.encrypt(zlib(P, wbits=-10)) + auth
iv: 12b
e: AES(k, GCM, iv)
auth: e.authtag[:4]
k: pbkdf2_hmac_sha256(K, id, i)
- Mode: GCM
- IV/Nonce: 12 bytes, random, prepended in
cpl
- Compression:
zlib.compress(P, wbits=-10)
, - raw deflate - Padding: None
- Authentication: First 4 bytes of GCM authtag, exposed
- cpl layout:
[iv (12)] + [ciphertext] + [auth (4)]
- Use Case: DEFAULT: Larger plaintext
- Security Note: When encrypting: do not re-use IV/Nonce
Summary Table¶
Ver | Name | Mode | IV | Padding | Compress | Authentication Method | Auth | Intended Use Case |
---|---|---|---|---|---|---|---|---|
0 | AES-ECB v1 | ECB | – | NUL | No | SHA256(plaintext)[:16] | 16 B | Legacy high-entropy |
1 | AES-CBC v1 | CBC | 16 | NUL | No | SHA256(plaintext)[:16] | 16 B | Legacy high-entropy |
5 | AES-ECB | ECB | – | NUL | No | SHA256(v+P+k)[:3] | 3 B exposed | Small, high-entropy |
6 | AES-ECB +p | ECB | – | PKCS7 | No | SHA256(plaintext)[:4] | 4 B | General Mid-sized |
7 | AES-ECB +c | ECB | – | PKCS7 | Yes | SHA256(compressed)[:4] | 4 B | Large; duplicate blocks |
10 | AES-CBC | CBC | 16 | NUL | No | SHA256(v+iv+P+k)[:4] | 4 B exposed | Small, high-entropy |
11 | AES-CBC +p | CBC | 16 | PKCS7 | No | SHA256(plaintext)[:4] | 4 B | General mid-sized |
12 | AES-CBC +c | CBC | 16 | PKCS7 | Yes | SHA256(compressed)[:4] | 4 B | General large |
15 | AES-CTR | CTR | 12 | – | No | SHA256(plaintext)[:4] | 4 B | General mid-sized |
16 | AES-CTR +c | CTR | 12 | – | Yes | SHA256(compressed)[:4] | 4 B | General large |
20 | AES-GCM | GCM | 12 | – | No | GCM authtag[:4] | 4 B exposed | Best, General mid-sized |
21 | AES-GCM +c | GCM | 12 | – | Yes | GCM authtag[:4] | 4 B exposed | Best, General large |
KEF Implementation Concepts¶
Using examples from, and as an introduction to the reference KEF implementation, we'll quickly cover some basic concepts that may be helpful in getting started with your own KEF implementation.
Version Configuration¶
From the version details and summary table: note that all KEF versions can be defined as having a set of parameters which define that version's KEF rules. For ease-of-maintenance -- and also for extending later, it may be useful to store these in a central configuration. Within our sample reference, these are defined by constants kef.VERSIONS
, kef.MODE_NUMBERS
and kef.MODE_IVS
.
Choosing a Version¶
As soon as you have data to hide, KEF offers choices for which version to use. That choice may be made by the user, or by the implementation, based on what is being hidden, compatibility with others, and how it may be stored/transported. The sample reference uses a function named kef.suggest_versions()
to make a choice based on user's preferred mode-of-operation, the plaintext being hidden, then optimizes for a smaller KEF envelope.
Encryption, Decryption, and Authentication¶
Once you know what you need to hide and how you want to hide it, you'll need something to perform the encryption. You'll start by stretching the user-supplied key
, salted with id
for a number of iterations
to derive the 256-bit AES key. Next you'll need to encrypt the plaintext (possibly with a random IV
/ Nonce) according to the chosen KEF version, so that the result is a cipher-payload cpl
. To reverse this process, you'll need something to decrypt and authenticate the cipher-payload cpl
-- again according to the rules of the particular KEF version. The sample reference uses a class named kef.Cipher
for stretching the key
, encrypting plaintext to cpl
, and decrypting / authenticating cpl
back into plaintext.
Padding and Unpadding¶
Depending on the mode-of-operation of your version, you may need to pad the plaintext. If so, there will also be a need to unpad during the decryption process. The sample reference uses functions named kef.pad()
and kef.unpad()
, which are called from inside the kef.Cipher
object when encrypting and decrypting.
Data Compression: Deflate & Inflate¶
Depending on the version, you may also need to deflate plaintext so that it is compressed before encryption. Likewise, you'll need to reinflate it after decryption to decompress the plaintext. The sample reference uses functions named kef.deflate()
and kef.reinflate()
, which are also called from inside the kef.Cipher
object when encrypting and decrypting.
Metadata Wrapping and Unwrapping¶
After encryption, you'll need to wrap the id
, version
, iterations
and cpl
into a valid KEF envelope. Likewise, in order to reveal something hidden within a KEF envelope, you'll first need to unwrap, or parse it into its constituent parts (id
, version
, iterations
, cpl
) so that it may be decrypted as described above. The sample reference uses functions kef.wrap()
and kef.unwrap()
for these procedures.
On Further Encoding KEF Envelopes¶
Outside the scope of this specification on KEF envelopes, which are strings of bytes, implementations will surely need to make choices about encoding/decoding schemes. Whether for QR transport, copy-pasting into messages, embedding into json documents, in-plain-sight within other document formats, or persisted in binary files, these choices are left to implementors.