Projects/Parallel KDC
Problem
The KDC is a single-threaded daemon: once it receives a complete request from a client, it fully processes that request before receiving another. The performance consequences of this are threefold:
- Only one CPU services KDC requests, including cryptography operations.
- When the KDC is reading data from disk (such as the replay cache or a BDB database), it does nothing else.
- If the KDB module retrieves data from a remote source (such as an LDAP query), the KDC does nothing while waiting for a reply.
Most KDCs experience only moderate load and can service requests quickly. In some circumstances, higher performance may be required.
Candidate Solutions
There are four candidate solutions, the first of which is already available:
- The realm administrator can run multiple KDC processes on the same host, each listening on a different port, each accessing the same database. This is possible with the current implementation, and SRV records can be used to avoid the need for client configuration; however, it does not yield optimal performance. Each client request will select a port without knowing whether the KDC process servicing that port is busy, and will wait for a timeout before trying another port. Moreover, MIT krb5 client code does not implement randomization of equal-priority SRV records, so randomization of SRV responses by the DNS infrastructure would be necessary for load-balancing to occur, and such randomization is sometimes defeated by caching. Parallelism is limited to the number of KDC processes.
- We could make the KDC event-oriented. This approach would require refactoring the entire KDC code base and all KDB modules. The DAL would have to provide KDB modules access to the listen_and_process main loop, and all DAL requests would have to be structured with callbacks or other mechanisms to allow the answer to arrive after further iterations of the main loop. This approach would only solve the problem of allowing the KDC to perform work while waiting for remote data sources such as LDAP; it would not allow multiple CPUs to service KDC requests or allow the KDC to perform work while waiting for disk accesses to complete.
- We could make the KDC multithreaded. This approach would require eliminating all use of global state (in particular, the kdc_active_realm variable and all of the macros such as kdc_context which derive from it) and ensuring that all library code used by the KDC is thread-safe. Any mistakes in thread-safety might result in difficult-to-debug race conditions, some of which might have security consequences.
- We could make the KDC use a multi-process worker model. After setting up its initial state including listener sockets, the KDC would fork multiple subprocesses. The set of idle subprocesses would compete for UDP packets or incoming TCP connections on the listener sockets, invisibly to clients. Once a worker process has obtained a request, it would service it according to the current single-threaded logic. Parallelism would be limited to the number of worker processes.
This project proposes to implement the fourth option, as it requires minimal code changes and does not introduce much additional risk.
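As a rough illustration, the supervisor/worker split might look like the following sketch (hypothetical function names, not existing KDC code; the real supervisor would fork after its listener sockets are set up, so that workers inherit them):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Trivial placeholder; a real worker would loop on the inherited
 * listener sockets servicing requests. */
static void
noop_worker(void)
{
}

/* Fork n worker processes, each running worker() and then exiting.
 * Returns the number of workers forked, recording their pids in pids[].
 * (Sketch only: a real supervisor would also restart dead workers.) */
static int
spawn_workers(int n, pid_t *pids, void (*worker)(void))
{
    int i;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid == -1)
            return i;           /* fork failed; stop early */
        if (pid == 0) {
            worker();           /* child: service requests */
            _exit(0);
        }
        pids[i] = pid;          /* parent: remember the child pid */
    }
    return n;
}
```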
Design of Proposed Solution
A new option would need to be added to the getopt() loop in initialize_realms() to specify the number of worker processes. The -w option is a reasonable choice since that letter is currently unused.
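A hypothetical sketch of validating the option argument (the helper name and the upper bound are assumptions; the real check would live in the getopt() loop):

```c
#include <stdlib.h>

/* Parse the argument to a -w option.  Returns the worker count, or -1
 * if the argument is not a reasonable positive integer.  (Hypothetical
 * helper, not existing code.) */
static int
parse_worker_count(const char *arg)
{
    char *end;
    long n = strtol(arg, &end, 10);

    if (end == arg || *end != '\0' || n < 1 || n > 1024)
        return -1;
    return (int)n;
}
```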
Code to create the worker processes would be invoked from main() after the call to write_pid_file(). The parent process would act as a proxy for SIGTERM and SIGHUP so that killing the pid in the pid file terminates or signals all worker processes.
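The forwarding itself is simple; a sketch (assumed helper name) of what the supervisor's SIGTERM and SIGHUP handlers would do:

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Send sig to each worker pid; returns how many were delivered.  The
 * supervisor's SIGTERM and SIGHUP handlers would call this so that
 * killing the pid in the pid file reaches every worker.  (Sketch; a
 * real handler must use only async-signal-safe calls, which kill() is.) */
static int
forward_signal(const pid_t *pids, int n, int sig)
{
    int i, delivered = 0;

    for (i = 0; i < n; i++) {
        if (kill(pids[i], sig) == 0)
            delivered++;
    }
    return delivered;
}
```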
The network socket code would likely need to set the listening sockets to non-blocking, and process_packet() would need to ignore EAGAIN errors instead of logging them.
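For example (a sketch with assumed names): the listening sockets would be put into non-blocking mode, and a failed read with EAGAIN would be treated as "another worker took the packet" rather than as an error:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Try to receive one datagram.  Returns the length received, 0 if no
 * packet was ready (e.g. another worker won the race), or -1 on a real
 * error.  This is the EAGAIN treatment process_packet() would need. */
static ssize_t
try_recv(int fd, void *buf, size_t len)
{
    ssize_t r = recv(fd, buf, len, 0);

    if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return r;
}
```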
On platforms without IP_PKTINFO support, the KDC must rebind its UDP listening sockets when the host network is reconfigured. (This is necessary to send UDP replies from the same address on which the request was received.) This cannot be done in the worker subprocesses, since multiple processes cannot all bind to the same port, so workers would have to be terminated and restarted to perform reconfiguration. Alternatively, network reconfiguration support could be disabled when worker processes are used, or removed entirely; an inventory of IP_PKTINFO platform support would help evaluate the viability of these options.
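As a starting point for such an inventory, support can also be probed at runtime. This sketch (Linux-style; BSDs provide IP_RECVDSTADDR instead) reports whether IP_PKTINFO can be enabled on a UDP socket:

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if IP_PKTINFO can be enabled on a UDP socket, else 0. */
static int
have_pktinfo(void)
{
#ifdef IP_PKTINFO
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int one = 1, ok;

    if (fd == -1)
        return 0;
    ok = (setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &one, sizeof(one)) == 0);
    close(fd);
    return ok;
#else
    return 0;                   /* no compile-time support */
#endif
}
```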
The KDB module should be independently opened in each worker process (rather than opening it once and cloning the resulting process state). It is also probably a good idea to open and then close the KDB module in the supervisor, prior to starting worker processes, in order to get a more controlled failure if the KDB module is misconfigured.
The logging code would need to be examined to make sure that concurrent access to the same logging sinks would not create problems.
Additional attention to bug #1671 (no file locking used by replay cache) may be necessary to evaluate whether there is a security impact on a multi-process KDC, keeping in mind that allowing one replay against each independent KDC process is typically not considered a serious security threat in master/slave scenarios.
Testing Plan
In most test scenarios, requests are processed too quickly by the KDC to measure any difference in behavior from a multi-process worker model. It should be possible to test this by hand by temporarily modifying the BDB back end to sleep() for a minute when looking up a particular principal name such as "slowuser". While testing, note that libkrb5 will retry requests after a timeout, so a single "kinit slowuser" will cause multiple worker processes to block unless the retry logic is disabled in the client code. Client retry logic can be disabled in lib/krb5/os/sendto_kdc.c by changing MAX_PASS from 3 to 1 and changing all assignments to socktype2 to 0 (e.g. instead of SOCK_STREAM).
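The temporary hack might look like this (a hypothetical helper; the real change would be a couple of lines in the BDB lookup path):

```c
#include <string.h>

/* Artificial delay in seconds for a principal lookup: 60 for the
 * designated slow principal, 0 otherwise.  Test-only hack; not
 * proposed for the real code base. */
static unsigned int
test_lookup_delay(const char *princ)
{
    if (princ != NULL && strcmp(princ, "slowuser") == 0)
        return 60;
    return 0;
}

/* In the BDB back end, just before returning the looked-up entry:
 *     sleep(test_lookup_delay(name));
 */
```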
Automated testing of this functionality would be pretty tricky; we would need a special stub KDB back end to cause worker processes to block, as well as a way to control the client retry loop.
Schedule
This project is currently on the back burner.