Projects/Replay cache improvements
MIT krb5 implements a replay cache and uses it by default. Replay caches provide a limited defense against certain kinds of active attacks against some protocols. The current replay cache implementation has severe performance limitations as well as flaws which can cause both false positives and false negatives. Many server deployments disable the replay cache as a result.
Synopsis of current implementation
The replay cache stores replay records which contain server and client principal names, an authenticator ciphertext hash, and an authenticator timestamp. The ciphertext hash is not part of the file format, so for backward compatibility, two records are stored for each authenticator, one of which encodes the hash, client, and server principal names in the nominal server name field. Records are generally small, but are not fixed-size because principal names vary in length.
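To make the two-record scheme concrete, here is a minimal sketch of how the second record's nominal server name could carry the hash and both principal names through the old format. The struct, the function name encode_hash_record_name, and the "HASH:" length-prefixed layout are illustrative assumptions, not the exact MIT krb5 structures or on-disk encoding.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical in-memory form of a replay record, mirroring the fields
 * described above; the real MIT krb5 structures differ. */
struct replay_record {
    const char *server;     /* server principal name */
    const char *client;     /* client principal name */
    const char *hash;       /* hex-encoded authenticator ciphertext hash */
    time_t      timestamp;  /* authenticator timestamp */
};

/* Build the nominal server name for the second record, which carries the
 * hash and both principal names through a format that has no hash field.
 * The "HASH:" prefix and length-prefixed layout are assumptions for
 * illustration. Returns 0 on success, -1 if buf is too small. */
int encode_hash_record_name(const struct replay_record *r,
                            char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "HASH:%s %zu:%s %zu:%s",
                     r->hash, strlen(r->client), r->client,
                     strlen(r->server), r->server);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Length-prefixing the principal names keeps the encoding unambiguous even if a name contains spaces or colons, which is one reason the records are variable-sized.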
When a replay cache handle is opened, we attempt to open an existing replay cache file and read in all of its current entries into an in-memory hash table. If we fail to open the file or read any of its entries, we attempt to initialize the file. If we successfully read all of the entries but detect more than 30 expired entries, we attempt to expunge the file. To initialize the file, we unlink it (ignoring errors), open it with O_CREAT|O_EXCL, and write a header consisting of a two-byte version number and a four-byte lifespan. To expunge the file, we close the file descriptor, initialize the file, then write all non-expired entries from the hash table.
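The initialization step can be sketched in POSIX C roughly as follows. The function name rc_init_file, the placeholder version value, and writing the header in host byte order are assumptions, not details taken from the implementation. The gap between unlink() and open() is where a concurrent process can create the file first and make O_EXCL fail.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Placeholder value, not the real replay cache version number. */
#define RC_SKETCH_VERSION 1

/* Write all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Unlink (ignoring errors), create exclusively, then write a 6-byte header:
 * a 2-byte version and a 4-byte lifespan, in host byte order (assumed).
 * Returns an open descriptor, or -1 on failure. Another process may create
 * the file between unlink() and open(), in which case O_EXCL fails. */
int rc_init_file(const char *path, int32_t lifespan)
{
    uint16_t version = RC_SKETCH_VERSION;
    int fd;

    (void)unlink(path);
    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (write_all(fd, &version, sizeof(version)) < 0 ||
        write_all(fd, &lifespan, sizeof(lifespan)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Expunging, as described, would call this and then rewrite the non-expired entries from the in-memory table into the returned descriptor.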
For each authentication, the replay record is checked against the memory hash table for a conflict, then added to the table, then marshalled and written to the file. Based on a heuristic, we may choose to expunge the file; otherwise we fsync() it to ensure that no replay records are lost due to a system reboot.
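The per-authentication sequence (check, insert, write, fsync) is sketched below. This is a simplification under stated assumptions: a linear array stands in for the real hash table, the key is just the hash plus timestamp, and a text line replaces the real marshalled binary record. The names rc_table and rc_store are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define RC_MAX_ENTRIES 1024   /* hypothetical fixed capacity */
#define RC_HASH_LEN    64     /* hypothetical maximum hash string length */

/* Simplified in-memory table; a linear array stands in for a hash table. */
struct rc_table {
    char   hashes[RC_MAX_ENTRIES][RC_HASH_LEN + 1];
    time_t stamps[RC_MAX_ENTRIES];
    size_t count;
};

enum rc_result { RC_OK, RC_REPLAY, RC_ERROR };

/* Check the table for a conflict, add the record, append it to the file,
 * then fsync() so the record survives a reboot. */
enum rc_result rc_store(struct rc_table *t, int fd,
                        const char *hash, time_t stamp)
{
    char line[RC_HASH_LEN + 32];
    size_t i;
    int n;

    if (strlen(hash) > RC_HASH_LEN || t->count == RC_MAX_ENTRIES)
        return RC_ERROR;
    for (i = 0; i < t->count; i++) {
        if (t->stamps[i] == stamp && strcmp(t->hashes[i], hash) == 0)
            return RC_REPLAY;              /* conflict: reject as a replay */
    }
    strcpy(t->hashes[t->count], hash);
    t->stamps[t->count] = stamp;
    t->count++;

    /* Placeholder text encoding instead of the real binary record. */
    n = snprintf(line, sizeof(line), "%s %lld\n", hash, (long long)stamp);
    if (n < 0 || (size_t)n >= sizeof(line))
        return RC_ERROR;
    if (write(fd, line, (size_t)n) != n)
        return RC_ERROR;
    if (fsync(fd) != 0)                    /* the expensive durability step */
        return RC_ERROR;
    return RC_OK;
}
```

Only the final fsync() touches stable storage synchronously, which is why it dominates the cost on disk-backed filesystems.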
The current implementation has the following possible issues with correctness:
- There is a race condition when creating a replay cache file, or replacing one after detecting corruption ([krbdev.mit.edu #3498]).
- On Windows, races while expunging the replay cache can cause temporary files to be left behind, which eventually exhausts the space in the directory.
- There is a possibility for file corruption due to lack of locking and not using O_APPEND ([krbdev.mit.edu #1671]). If two processes nearly simultaneously write at the same file offset, the writes will overlap unless O_APPEND is set.
- Once a replay cache handle is open, entries written by other processes (or through other handles) will not be detected. Expunges performed through other handles will also not be detected, and the current handle will continue to write entries to the unlinked file.
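The O_APPEND issue can be reproduced deterministically. In this sketch, two descriptors opened by one process stand in for two processes whose writes land at the same offset; the function name two_writers_size is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open the same file twice, have each handle write a 4-byte record, and
 * return the resulting file size (-1 on error). Without O_APPEND, both
 * descriptors sit at offset 0, so the second record overwrites the first.
 * With O_APPEND, each write() is positioned at the current end of file. */
off_t two_writers_size(const char *path, int use_append)
{
    int extra = use_append ? O_APPEND : 0;
    struct stat st;
    off_t size = -1;
    int a, b;

    /* Start from an empty file. */
    a = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (a < 0)
        return -1;
    close(a);

    a = open(path, O_WRONLY | extra);
    b = open(path, O_WRONLY | extra);
    if (a >= 0 && b >= 0 &&
        write(a, "AAAA", 4) == 4 &&   /* first writer's record */
        write(b, "BBBB", 4) == 4 &&   /* second writer, same starting offset */
        stat(path, &st) == 0)
        size = st.st_size;
    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return size;
}
```

Called on the same path, this returns 4 without O_APPEND (one record lost) and 8 with it (both records kept).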
The current implementation performs an fsync() on every write ([krbdev.mit.edu #372]). Unless the replay cache is located in a memory filesystem, this is always the limiting performance factor.
Since the current format is not sorted in any way, a newly opened replay cache handle must read all entries in the file to detect a conflict. In high-traffic situations, this becomes the performance limit once fsync is not. If a single replay cache handle is used for many authentications, this problem does not apply because the replay cache implementation (incorrectly) does not read new entries.
The current replay cache implementation does not use any file locking. Although this causes correctness issues, it means that contention between processes is not performance-limiting.
Short-term solutions may include:
- Fix the file creation race somehow to avoid spurious failures, perhaps by unlinking and retrying on failure to open.
- Start using O_APPEND and detect when other processes have written entries.
- Detect expunges by other processes.
- Start using file locking.
- Stop using fsync, at least for most records.
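Several of the short-term fixes can share one check, sketched below under assumptions: the handle tracks the file size it has already seen, BSD flock() is used for locking (the page does not say which locking primitive to use), and an expunge is detected because it replaces the file, so the path no longer names the inode our descriptor refers to. The names rc_handle and rc_check_changes are hypothetical.

```c
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical per-handle state. */
struct rc_handle {
    int   fd;
    off_t known_end;   /* file size after our last read or write */
};

enum rc_state { RC_UNCHANGED, RC_NEW_ENTRIES, RC_EXPUNGED, RC_CHECK_ERROR };

/* Call with the lock held. Our open descriptor keeps the old inode alive,
 * so its number cannot be reused by a replacement file while we hold it. */
enum rc_state rc_check_changes(const struct rc_handle *h, const char *path)
{
    struct stat by_fd, by_path;

    if (fstat(h->fd, &by_fd) != 0)
        return RC_CHECK_ERROR;
    if (stat(path, &by_path) != 0 ||
        by_fd.st_dev != by_path.st_dev || by_fd.st_ino != by_path.st_ino)
        return RC_EXPUNGED;            /* path missing or replaced: reopen */
    if (by_fd.st_size > h->known_end)
        return RC_NEW_ENTRIES;         /* read records from known_end onward */
    return RC_UNCHANGED;
}

/* flock() is an assumed choice; fcntl() locks behave differently (for
 * example over NFS and across threads). */
int rc_lock(int fd)   { return flock(fd, LOCK_EX); }
int rc_unlock(int fd) { return flock(fd, LOCK_UN); }
```

A store would then lock, call rc_check_changes, reread or reopen as needed, check for a conflict, append with O_APPEND, update known_end, and unlock.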
Medium-term solutions may include:
- New file-based replay cache, possibly hash table based.
- IPC-based replay cache for higher performance.
Long-term solutions may include:
- Revise protocols so that they do not require replay caches for security.
The simplest way to avoid using fsync() is to decide that it's okay to potentially lose replay records due to a system reboot. Since replay caches can never be perfect, this might be an acceptable limitation of the implementation.
Nico has suggested only fsyncing records whose authenticator timestamps might postdate the next reboot, and rejecting authenticators whose timestamps predate the current boot. This relies on an estimate of the minimum time a reboot takes (the "reboot time estimate" below). Potential issues with this idea include:
- If a non-trivial percentage of incoming authenticators are timestamped in the future by more than the reboot time estimate, performance might still be limited by fsync.
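- If the server clock drifts into the past by more than the reboot time estimate, most incoming authenticators will appear to be timestamped in the future, so most records must be fsynced and the replay cache becomes slow again.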
- Rejecting authenticators timestamped before the current boot could result in many spurious failures just after reboot.
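The two rules of this proposal reduce to simple timestamp comparisons. This is a sketch of one reading of the idea: the parameter min_reboot_secs (the reboot time estimate), boot_time, and the function names are hypothetical. The next boot cannot begin earlier than now + min_reboot_secs, so any authenticator timestamped before that point would be rejected after a reboot anyway and its record need not survive one.

```c
#include <stdbool.h>
#include <time.h>

/* fsync the record only if the authenticator could still be accepted
 * after the earliest possible next boot (now + min_reboot_secs). */
bool rc_needs_fsync(time_t auth_time, time_t now, time_t min_reboot_secs)
{
    return auth_time >= now + min_reboot_secs;
}

/* Reject authenticators timestamped before the current boot began, since
 * their replay records may have been lost with unsynced cache data. */
bool rc_reject_pre_boot(time_t auth_time, time_t boot_time)
{
    return auth_time < boot_time;
}
```

Both listed issues follow directly: authenticators far in the future, or a server clock behind its clients, push auth_time past now + min_reboot_secs and force an fsync, while any client whose clock lags the boot time is rejected just after a reboot.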
Hash-based file format
(Roland's idea can be described here.)
Transition to a new format
We believe, but have not rigorously confirmed, that no other Kerberos implementation implements the MIT krb5 replay cache format. A transition to a new replay cache format must still take into account that old replay cache files may exist after an upgrade, and that old and new versions of MIT krb5 might be used on the same system (e.g. the native OS version and a local build of a more recent version).
The simplest transition strategy is to change the filename picked by krb5_get_server_rcache, and decide that old and new versions do not share replay records.
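A versioned filename is enough to keep the two formats apart. In the sketch below, both format strings and the function name rc_file_name are hypothetical and are not taken from krb5_get_server_rcache; the point is only that a distinct name for the new format means neither library version ever parses the other's file.

```c
#include <stdio.h>

/* Hypothetical naming scheme: the old-style name and a new-style name with
 * a version suffix. Returns 0 on success, -1 if buf is too small. */
int rc_file_name(char *buf, size_t len, const char *piece,
                 unsigned long uid, int new_format)
{
    int n = snprintf(buf, len, new_format ? "%s_%lu.v2" : "%s_%lu",
                     piece, uid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The trade-off is the one stated above: an authenticator accepted by the old version will not be detected as a replay by the new one, and vice versa.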