Projects/replay cache collision avoidance
Contents
Background
The MIT Kerberos replay cache follows the guidelines of RFC 4120 Section 10. Security Considerations:
Implementation note: If a client generates multiple requests to the KDC with the same timestamp, including the microsecond field, all but the first of the requests received will be rejected as replays. This might happen, for example, if the resolution of the client's clock is too coarse. Client implementations SHOULD ensure that the timestamps are not reused, possibly by incrementing the microseconds field in the time stamp when the clock returns the same time for multiple requests.
As computers have become faster and greater numbers of processes or threads are requiring network authentication to network services, it is becoming more likely that the microseconds field will match for multiple requests. It is also more difficult to avoid such collisions without introducing a significant performance hit. As a result, an ever increasing number of false positives are triggered even though the authenticators used by the various client processes differ.
The current FILE replay cache format consists of a file, starting with a two-byte big-endian version number. Integers in the rest of the file are in host byte order, surprisingly. Following the version number is a lifespan value in host byte order, and the remainder of the file consists of records, each of which includes the following fields:
unsigned int - client length == strlen(client principal name) + 1 variable - client principal name (NUL terminated C-string) unsigned int - server length == strlen(server principal name) + 1 variable - server principal name (NUL terminated C-string) krb5_int32 - micro seconds krb5_timestamp - timestamp
These are the fields that are required by RFC 4120 Section 10. Security Considerations:
Unless the application server provides its own suitable means to protect against replay (for example, a challenge-response sequence initiated by the server after authentication, or use of a server- generated encryption subkey), the server MUST utilize a replay cache to remember any authenticator presented within the allowable clock skew. Careful analysis of the application protocol and implementation is recommended before eliminating this cache. The replay cache will store at least the server name, along with the client name, time, and microsecond fields from the recently-seen authenticators, and if a matching tuple is found, the KRB_AP_ERR_REPEAT error is returned.
As stated the requirement is that at least the client and server principals along with the complete time stamp are required to be present. Additional data such as a hash of a canonical representation of the authenticator or the full clear text of a canonical representation of the authenticator would be permitted by RFC 4120 and could be used to avoid false positives.
Simply adding new fields to the MIT Kerberos replay cache record would not be an acceptable solution. As indicated in RFC 4120, it is imperative that all services sharing the same service principal share the same replay cache regardless of which Kerberos implementation is in use.
If multiple servers (for example, different services on one machine, or a single service implemented on multiple machines) share a service principal (a practice that we do not recommend in general, but that we acknowledge will be used in some cases), either they MUST share this replay cache, or the application protocol MUST be designed so as to eliminate the need for it. Note that this applies to all of the services. If any of the application protocols does not have replay protection built in, an authenticator used with such a service could later be replayed to a different service with the same service principal but no replay protection, if the former doesn't record the authenticator information in the common replay cache.
As a result, the file format used by MIT Kerberos has been implemented in other Kerberos implementations. It would not be safe to change the replay cache file format in a manner that prevented the sharing of the replay cache among all of the implementations that can be expected to be deployed on a single machine, whether they be from different versions of MIT Kerberos or implementations from different vendors.
The Proposal
In order to avoid false positives we require that the replay cache data structure store sufficient data to be able to distinguish between two authenticators while at the same time maintaining compatibility with existing deployments.
To do this, we define a new type of extension record consisting of the following:
unsigned int - 1 (length of sentinel) unsigned char - null byte (sentinel to mark a "new" record) unsigned int - length of authenticator-hash extension variable - extension identification "HASH:" variable - hex-encoded hash of authenticator ciphertext char - space variable - ascii decimal length of client principal name char - ":" variable - client principal name char - space char - ascii decimal length of server principal name char - ":" char - server principal name char - null terminator krb5_int32 - microseconds krb5_timestamp - timestamp
The extension identification scheme allows transition to future enhanced replay cache file formats. We eschew binary encoding because embedded null bytes will cause old implementations to truncate that component of the record because they use strlen
to determine the length of the principal name to write out.
This record would be followed by the existing record consisting of the client and server principal names, the time stamp and the microseconds value. A complete entry therefore takes up two existing records.
Hashing the ciphertext is relatively safe given reasonable assumptions about the security of the encryption, and provides additional opportunities to prevent false positives. (The mandatory components of an authenticator might be identical between two successive legitimate requests.) The default hash method would be a MD5 hash. This will be fast to compare, and if there are collisions, the worst case is a false-positive indication that would have happened anyway in the old implementation. It is not important that the hash be very collision-resistant because a collision merely creates a (relatively unlikely) denial of service.
How does this provide backward compatibility?
New servers reading entries written by new servers will gather the new-style entries in pairs and use the hash in the "new" half of the pair. Since a pair consists of a "new" record followed immediately by an "old" record, the new server can distinguish between an "old" record written by an old implementation from a "new" record written by a new implementation. In this way, the new implementation can ignore "old" records written by a new implementation, avoiding any false positives that might occur by matching the more-ambiguous "old" record, while still correctly matching "old" records written by an old implementation, should that be necessary.
For new servers reading entries written by new servers, the comparisons are made based upon the hashes, then the authenticators. The client and server principal names and the timestamps become irrelevant. If there is an authenticator being used multiple times, it is a problem.
For new servers reading entries written by old servers, the string fields will be valid principal names and the time stamp value will not be a small integer. As a result, the server will know it is dealing with an old style entry and perform an old style check. This might lead to false positives but there is nothing we can do without additional information that is not available.
For old servers reading entries written by new servers, the new hash based entry will never match the incoming principal names and will therefore be skipped. The old style entries will be used as they are today.
For old servers reading entries written by old servers, the behavior will be the same as today.
Is this a long term fix for the problem?
Perhaps this is the best we can do. RFC 4120 Section 10. Security Considerations describes how the application server must behave when it is unsure that it has enough replay data to cover the allowable clock skew period:
If a server loses track of authenticators presented within the allowable clock skew, it MUST reject all requests until the clock skew interval has passed, providing assurance that any lost or replayed authenticators will fall outside the allowable clock skew and can no longer be successfully replayed. If this were not done, an attacker could subvert the authentication by recording the ticket and authenticator sent over the network to a server and replaying them following an event that caused the server to lose track of recently seen authenticators.
That implies that at the very least an backward-incompatible change in file format would result in an outage.
In addition, a backward-incompatible change in file format would make it impossible to share a replay cache among services built against different Kerberos implementations. While it is certainly possible to create a faster implementation that is built around B-trees, the new format could only be used if it was known that only MIT Kerberos would ever use that replay cache.
Implementation considerations
The krb5_rc_store
function is an internal API. The krb5_donot_replay
structure can gain a new member to point to the cleartext of the authenticator.
The key to detecting one of these new extension records is finding a replay cache entry that appears to have a zero-length string for the server principal name. The search mechanism that krb5_rc_dfl_store
uses will take into account the authenticator hash.
Old implementations will read the entire extension records in the new replay cache entry, but ignore the new entries due to the zero-length strings.
Review
This section documents the review of the project according to Project policy. It is divided into multiple sections. First, approvals should be listed. To list an approval type
- #~~~~
on its own line. The next section is for discussion. Use standard talk page conventions. In particular, sign comments with
- --~~~~
and indent replies.
Members of Krbcore raising Blocking objections should preface their comment with {{project-block}}. The member who raised the objection should remove this markup when their objection is handled.