Projects/replay cache collision avoidance
The MIT Kerberos replay cache follows the guidelines of RFC 4120 Section 10. Security Considerations:
Implementation note: If a client generates multiple requests to the KDC with the same timestamp, including the microsecond field, all but the first of the requests received will be rejected as replays. This might happen, for example, if the resolution of the client's clock is too coarse. Client implementations SHOULD ensure that the timestamps are not reused, possibly by incrementing the microseconds field in the time stamp when the clock returns the same time for multiple requests.
As computers have become faster and more processes and threads require authentication to network services, it is increasingly likely that the microseconds field will match across multiple requests. It is also harder to avoid such collisions without a significant performance penalty. As a result, a growing number of false positives are triggered even though the authenticators used by the various client processes differ.
The current FILE replay cache format is a headerless file made up of records, each of which includes the following fields:
    unsigned int   - client length == strlen(client principal name) + 1
    variable       - client principal name (NUL-terminated C string)
    unsigned int   - server length == strlen(server principal name) + 1
    variable       - server principal name (NUL-terminated C string)
    krb5_int32     - microseconds
    krb5_timestamp - timestamp
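To make the record layout concrete, here is a minimal sketch that serializes and parses one such record. It assumes 4-byte little-endian fields for illustration only; the real file is written with the host's native C types, and the function names are hypothetical, not part of MIT Kerberos.

```python
import struct

# Assumption for illustration: 4-byte little-endian fields.
# The real file is written with the host's native C types.
UINT = struct.Struct("<I")
TAIL = struct.Struct("<ii")  # microseconds, then timestamp


def pack_record(client: str, server: str, cusec: int, ctime: int) -> bytes:
    """Serialize one record: two length-prefixed C strings, then the time fields."""
    out = b""
    for name in (client, server):
        raw = name.encode("utf-8") + b"\0"
        out += UINT.pack(len(raw)) + raw  # length == strlen(name) + 1
    return out + TAIL.pack(cusec, ctime)


def unpack_record(buf: bytes, offset: int = 0):
    """Parse one record; return (client, server, cusec, ctime, next_offset)."""
    names = []
    for _ in range(2):
        (length,) = UINT.unpack_from(buf, offset)
        offset += UINT.size
        # Drop the trailing NUL that the length field counts.
        names.append(buf[offset:offset + length - 1].decode("utf-8"))
        offset += length
    cusec, ctime = TAIL.unpack_from(buf, offset)
    return names[0], names[1], cusec, ctime, offset + TAIL.size
```

Because the file has no header, a reader simply calls `unpack_record` repeatedly, feeding each returned offset back in, until the buffer is exhausted.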
These are the fields that are required by RFC 4120 Section 10. Security Considerations:
Unless the application server provides its own suitable means to protect against replay (for example, a challenge-response sequence initiated by the server after authentication, or use of a server-generated encryption subkey), the server MUST utilize a replay cache to remember any authenticator presented within the allowable clock skew. Careful analysis of the application protocol and implementation is recommended before eliminating this cache. The replay cache will store at least the server name, along with the client name, time, and microsecond fields from the recently-seen authenticators, and if a matching tuple is found, the KRB_AP_ERR_REPEAT error is returned.
As stated, the cache must contain at least the client and server principals along with the complete time stamp. Additional data, such as a hash of a canonical representation of the authenticator or the full cleartext of that canonical representation, is permitted by RFC 4120 and could be used to avoid false positives.
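The difference between the minimum tuple and a tuple extended with an authenticator hash can be illustrated with a small in-memory sketch. The class names are hypothetical, and the authenticator byte strings stand in for a canonical encoding of the authenticator; this is not the MIT Kerberos implementation.

```python
import hashlib


class TupleReplayCache:
    """Keys entries on the RFC 4120 minimum: client, server, time, microseconds."""

    def __init__(self):
        self.seen = set()

    def is_replay(self, client, server, ctime, cusec, authenticator: bytes) -> bool:
        key = (client, server, ctime, cusec)
        if key in self.seen:
            return True
        self.seen.add(key)
        return False


class HashedReplayCache(TupleReplayCache):
    """Additionally keys on a hash of the authenticator bytes."""

    def is_replay(self, client, server, ctime, cusec, authenticator: bytes) -> bool:
        key = (client, server, ctime, cusec, hashlib.sha1(authenticator).digest())
        if key in self.seen:
            return True
        self.seen.add(key)
        return False
```

With two different authenticators that carry the same time stamp, the tuple-only cache rejects the second one as a replay (a false positive), while the hashed cache rejects only a byte-for-byte repeat.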
Simply adding new fields to the MIT Kerberos replay cache record would not be an acceptable solution. As indicated in RFC 4120, it is imperative that all services sharing the same service principal share the same replay cache regardless of which Kerberos implementation is in use.
If multiple servers (for example, different services on one machine, or a single service implemented on multiple machines) share a service principal (a practice that we do not recommend in general, but that we acknowledge will be used in some cases), either they MUST share this replay cache, or the application protocol MUST be designed so as to eliminate the need for it. Note that this applies to all of the services. If any of the application protocols does not have replay protection built in, an authenticator used with such a service could later be replayed to a different service with the same service principal but no replay protection, if the former doesn't record the authenticator information in the common replay cache.
As a result, the file format used by MIT Kerberos has been implemented in other Kerberos implementations. It would not be safe to change the replay cache file format in a manner that prevented the replay cache from being shared among all of the implementations that can be expected to be deployed on a single machine, whether they are different versions of MIT Kerberos or implementations from different vendors.
In order to avoid false positives, the replay cache data structure must store sufficient data to distinguish between two authenticators while maintaining compatibility with existing deployments.
One method of achieving this goal is to modify the replay cache library to store two records for each entry. The first will consist of the following:
    krb5_uint32    - length of authenticator-hash == strlen(authenticator-hash) + 1
    variable       - authenticator-hash == base64-encoded hash string with NUL terminator
    krb5_uint32    - length == 1 or (strlen(base64-encoded cleartext authenticator) + 1)
    variable       - NUL or base64-encoded cleartext authenticator with NUL terminator
    krb5_int32     - hash-type
    krb5_timestamp - 0
This record would be followed by the existing record consisting of the client and server principal names, the time stamp and the microseconds value. A complete entry therefore takes up two existing records.
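The following sketch shows why a complete entry is readable as two records of the existing shape: both halves consist of two counted strings followed by two 32-bit integers. It assumes 4-byte little-endian fields, and the helper names and the hash-type value are hypothetical.

```python
import struct

# Assumption: 4-byte little-endian fields; the real file uses host C types.
U32 = struct.Struct("<I")
TAIL = struct.Struct("<ii")


def _counted(raw: bytes) -> bytes:
    """Length-prefixed, NUL-terminated string; the length counts the NUL."""
    raw += b"\0"
    return U32.pack(len(raw)) + raw


def pack_entry(hash_b64, auth_b64, hash_type, client, server, cusec, ctime) -> bytes:
    """Build the hash record (timestamp 0) followed by the existing record."""
    hash_record = _counted(hash_b64) + _counted(auth_b64) + TAIL.pack(hash_type, 0)
    legacy_record = _counted(client) + _counted(server) + TAIL.pack(cusec, ctime)
    return hash_record + legacy_record


def read_records(buf: bytes):
    """Walk the buffer using only the existing record layout."""
    offset, records = 0, []
    while offset < len(buf):
        strings = []
        for _ in range(2):
            (length,) = U32.unpack_from(buf, offset)
            offset += U32.size
            strings.append(buf[offset:offset + length - 1])
            offset += length
        first, second = TAIL.unpack_from(buf, offset)
        offset += TAIL.size
        records.append((strings[0], strings[1], first, second))
    return records
```

Passing an empty `auth_b64` produces the "length == 1, lone NUL" form of the second field. A reader that knows nothing about hash records still parses the entry cleanly as two records.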
The default hash method would be 128 bits of a SHA-1 hash. This is fast to compare, and if two hashes collide, the cleartext authenticators (when stored) can be compared.
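A minimal sketch of the hash-then-cleartext comparison follows. The text does not say which 128 bits of the SHA-1 digest are kept; taking the first 16 bytes is an assumption made here, and the function names are hypothetical.

```python
import base64
import hashlib


def authenticator_hash(authenticator: bytes) -> str:
    """Base64 of 128 bits of SHA-1 (assumed here: the first 16 bytes)."""
    digest = hashlib.sha1(authenticator).digest()[:16]
    return base64.b64encode(digest).decode("ascii")


def is_replay(new_auth: bytes, stored_hash: str, stored_auth_b64: str) -> bool:
    """Compare hashes first; fall back to the cleartext only on a hash match."""
    if authenticator_hash(new_auth) != stored_hash:
        return False
    if not stored_auth_b64:
        # No cleartext was stored, so the hash match is all there is to go on.
        return True
    return base64.b64decode(stored_auth_b64) == new_auth
```

The cheap string comparison settles nearly every lookup; the cleartext is decoded only when the hashes are equal, which is either a true replay or a collision.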
How does this provide backward compatibility?
For new servers reading entries written by new servers, comparisons are based on the hashes and then on the authenticators. The client and server principal names and the time stamps become irrelevant. An authenticator that appears more than once is a genuine replay.
For new servers reading entries written by old servers, the string fields will be valid principal names and the time stamp value will not be a small integer. As a result, the server will know it is dealing with an old-style entry and perform an old-style check. This might lead to false positives, but nothing more can be done without additional information that is not available.
For old servers reading entries written by new servers, the new hash-based record will never match the incoming principal names and will therefore be skipped. The old-style records will be used as they are today.
For old servers reading entries written by old servers, the behavior will be the same as today.
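The four cases above can be sketched over already-parsed records, each a tuple of two strings and two integers. Recognizing a hash record by its zero time stamp is an assumption made for this sketch, the function names are hypothetical, and the cleartext comparison that follows a hash match is omitted for brevity.

```python
def is_hash_record(record) -> bool:
    # Assumption: a hash record is recognized by its zero time stamp.
    return record[3] == 0


def old_server_is_replay(records, client, server, cusec, ctime) -> bool:
    """Old readers compare every record against the incoming tuple."""
    return any(r == (client, server, cusec, ctime) for r in records)


def new_server_is_replay(records, client, server, cusec, ctime, auth_hash) -> bool:
    """New readers use the hash for new-style entries, the tuple for old ones."""
    i = 0
    while i < len(records):
        rec = records[i]
        if is_hash_record(rec) and i + 1 < len(records):
            # New-style entry: decide on the hash, then skip its companion record.
            if rec[0] == auth_hash:
                return True
            i += 2
        else:
            # Old-style entry: fall back to the tuple comparison.
            if rec == (client, server, cusec, ctime):
                return True
            i += 1
    return False
```

For a cache holding one new-style entry and one old-style entry, an old server still sees both tuple records, while a new server ignores the tuple belonging to the new-style entry and so avoids a false positive when only the time stamps coincide.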
Is this a long term fix for the problem?
Perhaps this is the best we can do. RFC 4120 Section 10. Security Considerations describes how the application server must behave when it is unsure that it has enough replay data to cover the allowable clock skew period:
If a server loses track of authenticators presented within the allowable clock skew, it MUST reject all requests until the clock skew interval has passed, providing assurance that any lost or replayed authenticators will fall outside the allowable clock skew and can no longer be successfully replayed. If this were not done, an attacker could subvert the authentication by recording the ticket and authenticator sent over the network to a server and replaying them following an event that caused the server to lose track of recently seen authenticators.
That implies that at the very least a change in file format would result in an outage.
In addition, a change in file format would make it impossible to share a replay cache among services built against different Kerberos implementations. While it is certainly possible to create a faster implementation built around B-trees, the new format could only be used if it were known that only MIT Kerberos would ever use that replay cache.
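The outage implied by the RFC 4120 rule quoted earlier can be sketched as a simple gate. The class name is hypothetical, and the 300-second default is an assumption standing in for whatever clock skew a deployment actually configures.

```python
class ReplayCacheGuard:
    """Rejects all requests until one clock skew interval after cache loss."""

    def __init__(self, cache_lost_at: float, clock_skew: float = 300.0):
        # Assumption: 300 seconds stands in for the configured clock skew.
        self.accept_after = cache_lost_at + clock_skew

    def may_accept(self, now: float) -> bool:
        """True once any authenticator seen before the loss has expired."""
        return now >= self.accept_after
```

A server that discards its old-format cache when switching formats would have to behave like this guard for the whole interval, which is the outage described above.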