guildhouse-spire-plugins/pkg
Tyler J King 83b1264ebc governance: lazy connect + exponential reconnect backoff
NewClient no longer returns an error when Quartermaster (QM) is unreachable.
grpc.DialContext without WithBlock is already non-blocking; the prior
10s timeout context was effectively a no-op. Removing it and setting
ConnectParams (BaseDelay 1s, Multiplier 1.5, Jitter 0.2, MaxDelay 30s,
MinConnectTimeout 20s) makes the intended behavior explicit: the gRPC
ClientConn retries the connection in the background with exponential
backoff, and RPCs return Unavailable until QM is up.

The governance-notifier and substrate-keymanager plugins already log
RPC errors via handleEvent and continue without aborting the SPIRE
operation, so no call-site changes are needed. This unblocks SPIRE
bootstrap when Quartermaster hasn't been deployed yet, breaking the
SPIRE <-> QM circular deployment dependency.

Added a watchConnState helper that logs once per transition so operators
can see at SPIRE startup whether QM is reachable: a single WARN-style line
when the connection is not yet Ready, and an INFO line when it becomes
Ready. conn.Connect() is called eagerly so those logs fire at plugin
load rather than waiting for the first RPC.

Deferred:
- Add a unit test for NewClient succeeding with an unreachable address
  (TestNewClientAcceptsTLSConfig is a pre-existing failure that uses
  placeholder cert paths and is unrelated to this change).

Signed-off-by: Tyler J King <tking@guildhouse.dev>
2026-04-22 11:53:36 -04:00
config       feat: network-policy extension, governance lifecycle, audit remediation  2026-03-18 15:54:46 -04:00
governance   governance: lazy connect + exponential reconnect backoff                 2026-04-22 11:53:36 -04:00
keylime      feat(spire): Keylime node attestor plugin — single TPM authority         2026-04-15 20:35:45 -04:00
oidc         feat: network-policy extension, governance lifecycle, audit remediation  2026-03-18 15:54:46 -04:00
shellstream  feat: network-policy extension, governance lifecycle, audit remediation  2026-03-18 15:54:46 -04:00
sshcert      feat: network-policy extension, governance lifecycle, audit remediation  2026-03-18 15:54:46 -04:00