Projects/replay cache collision avoidance

This is an early stage project for MIT Kerberos. It is being fleshed out by its proponents. Feel free to help flesh out the details of this project. After the project is ready, it will be presented for review and approval.

The Problem

The MIT Kerberos replay cache follows the guidelines of RFC 4120 Section 10. Security Considerations:

  Implementation note: If a client generates multiple requests to the
  KDC with the same timestamp, including the microsecond field, all but
  the first of the requests received will be rejected as replays.  This
  might happen, for example, if the resolution of the client's clock is
  too coarse.  Client implementations SHOULD ensure that the timestamps
  are not reused, possibly by incrementing the microseconds field in
  the time stamp when the clock returns the same time for multiple
  requests.

As computers have become faster and greater numbers of processes or threads are requiring network authentication to network services, it is becoming more likely that the microseconds field will match for multiple requests. It is also more difficult to avoid such collisions without introducing a significant performance hit. As a result, an ever increasing number of false positives are triggered even though the authenticators used by the various client processes differ.

The current FILE replay cache format consists of a file without a header made up of records, each of which include the following fields:

 unsigned int   - client length == strlen(client principal name) + 1
 variable       - client principal name (NUL terminated C-string)
 unsigned int   - server length == strlen(server principal name) + 1
 variable       - server principal name (NUL terminated C-string)
 krb5_int32     - micro seconds
 krb5_timestamp - timestamp

These are the fields that are required by RFC 4120 Section 10. Security Considerations:

  Unless the application server provides its own suitable means to
  protect against replay (for example, a challenge-response sequence
  initiated by the server after authentication, or use of a server-
  generated encryption subkey), the server MUST utilize a replay cache
  to remember any authenticator presented within the allowable clock
  skew.  Careful analysis of the application protocol and
  implementation is recommended before eliminating this cache.  The
  replay cache will store at least the server name, along with the
  client name, time, and microsecond fields from the recently-seen
  authenticators, and if a matching tuple is found, the
  KRB_AP_ERR_REPEAT error is returned.

As stated the requirement is that at least the client and server principals along with the complete time stamp are required to be present. Additional data such as a hash of a canonical representation of the authenticator or the full clear text of a canonical representation of the authenticator would be permitted by RFC 4120 and could be used to avoid false positives.

Simply adding new fields to the MIT Kerberos replay cache record would not be an acceptable solution. As indicated in RFC 4120, it is imperative that all services sharing the same service principal share the same replay cache regardless of which Kerberos implementation is in use.

  If multiple servers (for example, different services on one machine,
  or a single service implemented on multiple machines) share a service
  principal (a practice that we do not recommend in general, but that
  we acknowledge will be used in some cases), either they MUST share
  this replay cache, or the application protocol MUST be designed so as
  to eliminate the need for it.  Note that this applies to all of the
  services.  If any of the application protocols does not have replay
  protection built in, an authenticator used with such a service could
  later be replayed to a different service with the same service
  principal but no replay protection, if the former doesn't record the
  authenticator information in the common replay cache.

As a result the file format used by MIT Kerberos has been implemented in other Kerberos implementations. It would not be safe to change the replay cache file format in a manner that prevented the sharing of the replay cache among all of the implementations that can be expected to be deployed on a single machine. Whether they be from different versions of MIT Kerberos or implementations from different vendors.

The Proposal

In order to avoid false positives we require that the replay cache data structure store sufficient data to be able to distinguish between two authenticators while at the same time maintain compatibility with existing deployments.

One method of achieving this goal is to modify the replay cache library to store two records for each entry. The first will consist of the following:

 krb5_uint32    - length of authenticator-hash == strlen(authenticator-hash) + 1
 variable       - authenticator-hash == base64-encoded hash string with NUL terminator
 krb5_uint32    - length == 1 or (strlen(base64-encoded cleartext authenticator) + 1)
 variable       - NUL or base64-encoded cleartext authenticator with NUL terminator
 krb5_int32     - hash-type 
 krb5_timestamp - 0

This record would be followed by the existing record consisting of the client and server principal names, the time stamp and the microseconds value. A complete entry therefore takes up two existing records.

The default hash method would be 128 bits of a SHA-1 hash. This will be fast to compare and if there are collisions the cleartext authenticators can be compared.

How does this provide backward compatibility?

For new servers reading entries written by new servers, the comparisons are made based upon the hashes, then the authenticators. The client and server principal names and the timestamps become irrelevant. If there is an authenticator being used multiple times, it is a problem.

For new servers reading entries written by old servers, the string fields will be valid principal names and the time stamp value will not be a small integer. As a result, the server will know it is dealing with an old style entry and perform and old style check. This might lead to false positives but there is nothing we can do without additional information that is not available.

For old servers reading entries written by new servers. The new hash based entry will never match the incoming principal names and will therefore be skipped. The old style entries will be used as they are today.

For old servers reading entries written by old servers. The behavior will be the same as today.

Is this a long term fix for the problem?

Perhaps this is the best we can do. RFC 4120 Section 10. Security Considerations describes how the application server must behave when it is unsure that it has enough replay data to cover the allowable clock skew period:

  If a server loses track of authenticators presented within the
  allowable clock skew, it MUST reject all requests until the clock
  skew interval has passed, providing assurance that any lost or
  replayed authenticators will fall outside the allowable clock skew
  and can no longer be successfully replayed.  If this were not done,
  an attacker could subvert the authentication by recording the ticket
  and authenticator sent over the network to a server and replaying
  them following an event that caused the server to lose track of
  recently seen authenticators.

That implies that at the very least a change in file format would result in an outage.

In addition, a change in file format would make it impossible to share a replay cache among services built against different Kerberos implementations. While it is certainly possible to create a faster implementation that is built around B-trees, the new format could only be used if it was known that only MIT Kerberos would ever use that replay cache.

Projects/replay cache collision avoidance

Contents

The Problem

The Proposal

How does this provide backward compatibility?

Is this a long term fix for the problem?

Navigation menu

Views

Personal tools

Navigation

Search

Tools