Projects/Hierarchical iprop

From K5Wiki
Jump to: navigation, search
This project was completed in release 1.13.



This project adds the ability for incremental propagation slaves to act as masters for other slaves, forming a hierarchy of masters and slaves. This feature can be useful for controlling propagation load in environments with many slaves, or to address incomplete network connectivity between masters and slaves. The project is based on an existing implementation by Richard Basch.

Existing Design and Code

Since release 1.7, MIT krb5 has had support for incremental propagation for changes to principal information. The slave runs a kpropd daemon in standalone mode, which periodically polls for updates from kadmind on the master, using a GSSRPC-based protocol. Updates are normally transmitted to the slave as marshalled operations in the response to the polling RPC. If the slave drifts too far out of date or if there have been changes to policies on the master, then a full dump is performed from the master to the slave using kprop.

The iprop code experienced numerous changes between 1.11 and 1.12; this section describes the code as it exists in 1.12.

The master and slave store the state needed for incremental propagation in a memory-mapped file called the ulog, which contains a header and a circular array of fixed-size update entries. Each update entry has a timestamp and a serial number; serial numbers begin at 1 and increment for each update. The ulog header contains:

  • kdb_hmagic: A magic number (to detect corruption)
  • db_version_num: A version number (currently set to 1 and unused by readers)
  • kdb_num: The number of updates in the ulog
  • kdb_first_time: The timestamp of the ulog's first entry
  • kdb_last_time: The timestamp of the ulog's last entry
  • kdb_first_sno: The timestamp of the ulog's first entry
  • kdb_last_sno: The timestamp of the ulog's last entry
  • kdb_state: The stability state of the ulog
  • kdb_block: The entry size (normally 2048)

The size of the circular entry array is determined by the iprop_master_ulogsize krb5.conf variable, which is not recorded in the header. The kdb_num, kdb_first_sno, and kdb_last_sno fields are redundant; when kdb_num is not zero, kdb_first_sno must always be equal to kdb_last_sno - kdb_num + 1.

When the ulog is initialized or reinitialized, the kdb_num, kdb_first_time, kdb_first_sno, and kdb_last_sno fields are set to 0, and the kdb_last_time field is set to the time of initialization. The master ulog may be reinitialized because of a policy change (since the iprop protocol cannot transport policy changes), or because the ulog reached the highest possible serial number, or because of administrative action (kproplog -R). The first database update after ulog initialization increments the kdb_num, kdb_first_sno, and kdb_last_sno fields to 1, and sets the kdb_first_time and kdb_last_time fields to the time of the update.

The kdb_first_sno and kdb_first_time fields are used by the following code operations on the master KDC:

  • ulog_reset sets both fields to zero.
  • ulog_add_update, on the first update after initialization, sets kdb_first_sno to 1 and kdb_first_time to the time of the update.
  • ulog_get_entries uses kdb_first_sno to detect if the slave's last update is no longer in the ulog. kdb_first_time is not used here; instead, the timestamp of the ulog entry for the slave's last update sno is checked against the slave's last update time.
  • kdb5_util load uses kdb_first_sno and kdb_first time to detect if the current dump's last update serial number is still in the ulog.
  • kproplog prints both fields when displaying the header status.

The kdb_last_sno and kdb_last_time fields are used by the following code operations on the master KDC:

  • ulog_reset sets kdb_last_sno to zero and kdb_last_time to the current time.
  • ulog_add_update sets both fields to the serial number and timestamp of the update being added.
  • ulog_map detects if ulogentries has decreased after the ulog circled, by checking if kdb_last_sno > kdb_num but kdb_num != ulogentries. The ulog header is reinitialized in this case.
  • ulog_get_entries compares kdb_last_sno and kdb_last_time to the RPC request parameters to see if the slave is up to date. If the slave's serial number is higher, or if it matches kdb_last_sno but the timestamp does not match, the slave is told to perform a full resync because the master's ulog was reinitialized since the last slave update.
  • ulog_get_entries uses kdb_last_sno to count and bound the updates to send to the slave, and uses both fields to set the lastentry parameter of the RPC response.
  • kdb5_util dump checks whether kdb_last_sno is nonzero to detect if the ulog is empty (and therefore does not contain the current dump's last update).
  • kdb5_util dump writes both fields into the dump's iprop header.
  • kdb5_util load blocks non-iprop loads if kdb_last_sno is non-zero.
  • kproplog prints both fields when displaying the header status.
  • kproplog uses kdb_last_sno as the end bound of the loop when displaying entries, and to compute the starting index when a -e parameter is given.

The kdb_num field is used by the following code operations on the master KDC:

  • ulog_reset sets it to zero.
  • ulog_add_update increments it if it is less than ulogentries (i.e. if the ulog is not yet circling).
  • ulog_map detects if ulogentries has shrunk too much prior to the ulog circling, by checking if kdb_num > ulogentries. The ulog header is reinitialized in this case.
  • ulog_map detects it ulogentries might have grown by checking if kdb_num < ulogentries (which can also be true if the ulog simply has not circled yet). In this case, the file size is checked and the file is extended with zero bytes if required.
  • kproplog displays it as part of the header status.
  • kproplog checks if kdb_num is zero before printing update entries.
  • kproplog ignores the -e argument if it is equal to or greater than kdb_num.

Slaves do not use the update entry array and only make use of two header fields, kdb_last_sno and kdb_last_time, to track the most recently received update from the master. The following code operations use the two fields:

  • ulog_finish_update_slave, called from ulog_replay, sets both fields to the serial number and timestamp of the RPC response's lastentry parameter, or to zero if there was an error replaying the changes.
  • kdb5_util load sets the two fields to the values from the iprop header when an iprop dump is loaded.
  • kdb5_util load blocks non-iprop loads if kdb_last_sno is nonzero, signifying that the ulog received updates from the master at a previous time.
  • ulog_reset sets kdb_last_sno to 0 and kdb_last_time to the current time. This can happen if the administrator runs kproplog -R on the slave KDC, or when the slave database is first created.

Proposed changes

Three changes are needed to allow hierarchical iprop:

1. On slaves, maintain a complete ulog. When update entries are received from the upstream master and processed by ulog_replay(), add them to the ulog in addition to modifying the KDB. When the slave receives a full dump from upstream, its ulog is reset and its kdb_last_sno and kdb_last_time fields reflect the information in the dump file.

2. Add a -proponly option to kadmind so that it can be run on slaves as an iprop server.

3. Add a "-A server" option to kpropd to make it talk to a specified upstream master rather than the krb5.conf value of admin_server.

The ulog on a slave can be in states that are impossible on the master:

  • After receiving a full dump from upstream, the kdb_num, kdb_first_time, and kdb_first_sno fields are 0, but the kdb_last_sno and kdb_last_time fields reflect the most recent update from the upstream master when the dump file was created. Although the ulog is empty in this state, downstream slaves should still receive UPDATE_NIL (not UPDATE_FULL_RESYNC_NEEDED) if they match the kdb_last_sno and kdb_last_time value.
  • After a ulog reset or full dump, subsequent updates reflect the serial numbers of the updates received from upstream, instead of starting at 1. On the master, if kdb_num is less than ulogsize, kdb_first_sno is always 1 and kdb_last_sno is always equal to kdb_num. On a slave, updates can begin at a serial number other than 1, and in this case they can wrap around within the circular array before the array fills up and old entries are overwritten.

As a result, some code operations must change:

  • ulog_map can no longer detect from header fields alone that the ulogsize has shrunk in a way that invalidates the entry array (and the current check would give frequent false positives). For example, if the ulog is initialized with a ulogsize of 1000, then a single update is added from upstream with serial number 1700, and then the ulog is mapped again with a ulogsize of 600, then the header fields will appear exactly the same as if the update had been added when ulogsize was 600, but the update was written to position 699 and will be read from position 499. Instead, we need to verify the serial numbers of the first and last update against the entry array.
  • ulog_get_entries must explicitly check if the ulog is empty and require a full resync if the downstream slave does not match kdb_last_sno/kdb_last_time. Otherwise, an intermediate slave which has just taken a full dump from upstream will erroneously believe that it has a wide range of serial numbers in its ulog (because kdb_first_sno is 0 and kdb_last_sno is the serial number from the upstream dump).
  • kdb5_util dump's current_sno_in_ulog() function must check for an empty ulog using kdb_num, not kdb_last_sno, since a slave can have an empty ulog with non-zero kdb_last_sno. However, if the dump file matches kdb_last_sno and kdb_last_time, the dump should be considered up to date even if the ulog is empty.

In addition, more locking is necessary. Currently kdb5_util and kpropd read from and write to the ulog header without locking, and ulog_replay() does not lock. This is not a serious problem in the current architecture where slaves are largely single-threaded, but it will become a problem if slave operations take place alongside master operations.

The code changes will be broken out into the following commits:

  1. Add a new interface ulog_get_sno_status, so that current_sno_in_ulog and ulog_get_entries can do the same thing to check how current a serial number and timestamp are. In the new function, check if the ulog is empty (using kdb_num) and only accept an exact match against kdb_last_sno/time if it is. Otherwise, check the presented serial number and timestamp against the actual ulog entry.
  2. Add new interfaces ulog_get_last and ulog_set_last, and use them in kdb5_util and kpropd to properly lock around accesses to the ulog header. Make ulog_init_header acquire an exclusive lock.
  3. Remove the ulog_map caller parameter, and make it always behave the same way, so that kpropd is prepared to write ulog entries. Make kproplog map the ulog directly without going through ulog_map (unless -R is given), since its mapping semantics don't have much in common with other callers.
  4. Modify ulog_replay to store ulog updates in the slave ulog as they are processed, using code factored out from ulog_add_update. Modify the ulog_map check for incompatible ulogentries changes to check the first and last serial numbers against the actual ulog entries.
  5. Implement kadmind -proponly.
  6. Implement kpropd -A.
  7. Augment the existing automated iprop tests to verify the new functionality.
  8. Add documentation on setting up hierarchical iprop.

Related problems

These problems are tangentially related to hierarchical iprop, but are out of scope for the project.

1. After receiving a full resync, the slave may not have the most current database state, because the master may have used a pre-existing dump file which was not current. Ideally the slave would request incremental changes immediately after processing the dump file. Currently there is no communication channel between the kpropd child process which receives the dump file and the parent which polls for incremental changes, so the slave waits for the next polling interval to ask for incremental updates.

The best solution is to make kpropd use a single process to serve incoming kprop connections and poll for changes, using libverto. A quicker solution is to make the do_standalone() process send a SIGUSR1 to its parent whenever it finishes processing a connection.

2. If the master ulog is initialized and performs a full resync to the slave before any updates are added to its ulog, the slave has kdb_last_sno 0 and kdb_last_time equal to the time of the master's reinitialization. As long as the slave and master states match, the master does not repeatedly resync to the slave. However, the next time an update is added to the master's ulog, the master discards its serial number 0 state and sets both kdb_first_sno and kdb_last_sno to 1. When it receives the next poll from the slave, it has no way of knowing whether the slave's serial number 0 state precedes its first update or reflects a different full resync, so it performs a second full resync to the slave. This is correct but inefficient.

With hierarchical propagation, the same problem occurs after an intermediate slave receives a full resync and then performs a full resync to a downstream slave; the next update it receives from master overwrites its kdb_first_sno and kdb_last_sno values with the serial number of the received update, causing it to perform a second full resync to the downstream slave.

We could try to solve this problem by redefining kdb_first_sno to be the serial number state preceding the first update, rather than the first update itself, so that kadmind can recognize that a slave's state matches the state the master was in just prior to the first update in its ulog. This change would create a (perhaps harmless) ambiguity during a code upgrade, because a ulog file which had been maintained with older code would have the old definition of kdb_first_sno. More seriously, this change might make it impossible to reliably detect incompatible ulog size changes when mapping the ulog file, because we would no longer have a timestamp with which to verify the first serial number. (It might be sufficient to verify the serial number of the first update entry as deduced from kdb_last_sno and kdb_num; more analysis would be required.)

Alternatively, we could write a no-op update entry to the ulog after a reinitialization or after receiving a full resync. kadmind never transmits the first update entry in the ulog, so this update would never be sent to a downstream slave. This solution maintains the current semantics of kdb_first_sno and kdb_first_time, and retains a timestamp with which the first update entry can be verified to detect incompatible ulog size changes.


The existing t_iprop.py will be augmented to propagate from the first slave to a second slave. The ulogs on master, intermediate slave, and leaf slave will be checked after each operation to ensure that the kdb_num, kdb_first_sno, and kdb_last_sno fields are as expected and that the entries in the ulog each have the expected principal name.


database.rst will be modified to describe how to set up hierarchical iprop.

Mailing list discussions


   444ef5fe9ec8d64a5db27b3a8aaf6813dd7ef0e0 Simplify iprop update locking and avoid deadlock
   d1f9aa3737b2b3e62b5c5ed488d6112b7ce8a5ad Factor out ulog serial number status check
   71d028f1054deb186807e7c8048218b82b478422 Lock around more ulog operations
   6a4a4b7b5e3265e4a811a9fd72c2534e6c5f5fd4 Simplify ulog_map
   406c83c835a8ce062d798a2ec4eda2eddd088450 Maintain complete ulog on iprop slaves
   2ed8ebf18809af66aeaa2af6984754bdbefff500 Implement kadmind -proponly
   90c11ff42008a90a72ee71444b0ad799e38b7ff0 Implement kpropd -A
   cf090890b4219483412ab89b39a60d92515191eb Test iprop slave ulog management
   e87bba2e8a8c753b761227dda5f2e216a6771db2 Document hierarchical iprop

Finished in [krbdev.mit.edu #7861] [krbdev.mit.edu #7855]

Release notes

Administrative experience:

  • Add support for hierarchical incremental propagation, where slaves can act as intermediates between an upstream master and other downstream slaves.
Personal tools