Commit graph

14 commits

Author SHA256 Message Date
Tyler J King
68414987d5 fix(gsap-attestor): handle SPIRE's HCL v1 quoted-key format
SPIRE converts JSON plugin_data to HCL v1 native syntax with quoted
attribute names ("max_depth" = 10). HCL v2's parser rejects quoted
keys, so strip them before parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-13 08:43:05 -04:00
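The quoted-key workaround described in this commit could look roughly like the sketch below. It is an illustration only, not the plugin's actual code: the function name `unquoteKeys` and the regular expression are assumptions, and it assumes each attribute sits on its own line with a simple identifier as its key.

```go
package main

import "regexp"

// quotedKey matches a quoted attribute name at the start of a line, followed
// by "=". Assumption: keys are simple identifiers, one attribute per line.
var quotedKey = regexp.MustCompile(`(?m)^(\s*)"([A-Za-z_][A-Za-z0-9_-]*)"(\s*=)`)

// unquoteKeys rewrites `"max_depth" = 10` as `max_depth = 10`.
// Quoted values on the right-hand side are left untouched because the
// pattern is anchored to the start of the line.
func unquoteKeys(src []byte) []byte {
	return quotedKey.ReplaceAll(src, []byte("${1}${2}${3}"))
}
```

The output of `unquoteKeys` would then be handed to the HCL v2 parser. A multi-line string value that happens to contain a quoted key at the start of a line would be rewritten too, so a real implementation may need something stricter.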
Tyler J King
f4f02b0e2e debug(gsap-attestor): include raw input and both parse errors in diagnostics
Temporary diagnostic commit — surfaces the exact data SPIRE sends to
the Configure RPC so we can determine why JSON decode fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-13 08:32:16 -04:00
Tyler J King
646944ab2a fix(gsap-attestor): use HCL JSON mode for SPIRE plugin_data parsing
SPIRE's chart renders plugin_data as JSON via reformat-and-yaml2json,
so call hclsimple.Decode with a "plugin.json" filename, which triggers
HCL v2 JSON mode. Falls back to native HCL for direct testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-13 06:49:15 -04:00
Tyler J King
490c813586 fix(gsap-attestor): use spire-plugin-sdk for SPIRE compatibility
The original implementation used hashicorp/go-plugin directly with a
custom handshake, which SPIRE rejected. Switch to spire-plugin-sdk's
pluginmain.Serve() for correct WorkloadAttestor protocol negotiation,
implement ConfigServer for plugin_data parsing, and return selector
values in key:value format (SPIRE infers the type prefix from the
plugin name). Config decoding tries JSON first (chart renders YAML
as JSON) then falls back to HCL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-13 06:37:37 -04:00
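A minimal sketch of the "JSON first, then HCL" decoding order described in this commit, using only the standard library. The `Config` struct, `decodeConfig`, and `hclUnsupported` are hypothetical names; the HCL decoder is passed in as a function so the sketch does not guess at the plugin's real HCL call.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical plugin config; max_depth is the only key
// mentioned in this log.
type Config struct {
	MaxDepth int `json:"max_depth"`
}

// decodeConfig tries JSON first and falls back to the supplied HCL decoder.
// If both attempts fail, both errors are reported.
func decodeConfig(raw string, hclDecode func(string, *Config) error) (*Config, error) {
	cfg := &Config{}
	jsonErr := json.Unmarshal([]byte(raw), cfg)
	if jsonErr == nil {
		return cfg, nil
	}
	cfg = &Config{} // discard any partial state left by the failed JSON attempt
	if hclErr := hclDecode(raw, cfg); hclErr != nil {
		return nil, fmt.Errorf("json: %v; hcl: %v", jsonErr, hclErr)
	}
	return cfg, nil
}

// hclUnsupported is a stand-in for a real HCL decoder and always fails.
func hclUnsupported(raw string, cfg *Config) error {
	return fmt.Errorf("HCL decoding is not wired into this sketch")
}
```

Reporting both errors together is a design choice for the sketch: when neither format parses, the operator can see which attempt failed and why.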
Tyler J King
fe5e2cf3c6 feat(spire): gsap-attestor WorkloadAttestor plugin
SPIRE WorkloadAttestor that reads governance env vars from /proc/{pid}/environ
(walking up the process tree to find gsh) and emits gsap: selectors on workload
SVIDs. Maps BASCULE_* vars set by bascule-shell and future GSH_* vars to the
11-selector vocabulary defined in gsap-types/src/selectors.rs.

- pkg/gsap/selectors.go: shared Go constants mirroring Rust vocabulary
- cmd/gsap-attestor/: plugin implementation with /proc reading, process tree
  walking, capability ceiling translation, and fail-open for non-governed processes
- 28 tests covering selector extraction, proc parsing, tree walking, and depth limits
- Makefile, Dockerfile, deploy config updated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-13 03:59:08 -04:00
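The /proc reading and process tree walk described in this commit can be sketched as follows. This is not the code in cmd/gsap-attestor/: the `procReader` seam, the function names, and the rule "stop at the first process whose environment has a BASCULE_ or GSH_ variable" are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"strconv"
	"strings"
)

// parseEnviron splits the NUL-separated contents of /proc/{pid}/environ.
func parseEnviron(data []byte) map[string]string {
	env := make(map[string]string)
	for _, kv := range bytes.Split(data, []byte{0}) {
		if k, v, ok := strings.Cut(string(kv), "="); ok && k != "" {
			env[k] = v
		}
	}
	return env
}

// parentPID extracts the PPid field from the contents of /proc/{pid}/status.
func parentPID(status string) (int, bool) {
	for _, line := range strings.Split(status, "\n") {
		if strings.HasPrefix(line, "PPid:") {
			pid, err := strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, "PPid:")))
			return pid, err == nil
		}
	}
	return 0, false
}

// procReader returns the environ and status contents for a PID.
// Hypothetical seam so the walk can be tested without a real /proc.
type procReader func(pid int) (environ []byte, status string, err error)

// findGoverned walks up from pid, at most maxDepth processes, and returns the
// first environment that carries a governance variable (assumed criterion).
// Any read or parse failure is treated as "not governed" (fail-open).
func findGoverned(pid, maxDepth int, read procReader) (map[string]string, bool) {
	for depth := 0; depth < maxDepth && pid > 0; depth++ {
		environ, status, err := read(pid)
		if err != nil {
			return nil, false
		}
		env := parseEnviron(environ)
		for k := range env {
			if strings.HasPrefix(k, "BASCULE_") || strings.HasPrefix(k, "GSH_") {
				return env, true
			}
		}
		next, ok := parentPID(status)
		if !ok {
			return nil, false
		}
		pid = next
	}
	return nil, false
}
```

In a real plugin, `read` would wrap `os.ReadFile` on `/proc/<pid>/environ` and `/proc/<pid>/status`. The depth limit bounds the walk for deep or malformed process trees.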
f3e1d161d0 packaging: production Dockerfile for spire-plugins image
Two-stage build. Builder stage: golang:1.23.6-bookworm (pinned to
match the go directive in go.mod exactly), CGO_ENABLED=0, the -trimpath
build flag, and -s -w linker flags for reproducible, size-minimized
static binaries. Compiles all four plugin binaries into /plugins/.

Runtime stage: debian:bookworm-slim with the /plugins/ directory
copied in and made world-readable. The image is inert — SPIRE server
and agent Deployments consume it via an initContainer that runs
`cp -r /plugins/ /opt/spire/plugins/` into a shared emptyDir volume,
so no ENTRYPOINT is needed.

Path: git.guildhouse.dev/tking/spire-plugins:v0.1.0.

Not replacing Containerfile.dev, which remains the local-dev variant.

Signed-off-by: Tyler J King <tking@guildhouse.dev>
2026-04-22 12:06:55 -04:00
83b1264ebc governance: lazy connect + exponential reconnect backoff
NewClient no longer returns an error when Quartermaster is unreachable.
grpc.DialContext without WithBlock is already non-blocking; the prior
10s timeout context was effectively a no-op. Removing it and adding
explicit ConnectParams (BaseDelay 1s, Multiplier 1.5, Jitter 0.2,
MaxDelay 30s, MinConnectTimeout 20s) makes the intended behavior
explicit: the gRPC ClientConn retries connection in the background
with exponential backoff, and RPCs return Unavailable until QM is up.

The governance-notifier and substrate-keymanager plugins already log
RPC errors via handleEvent and continue without aborting the SPIRE
operation, so no call-site changes are needed. This unblocks SPIRE
bootstrap when Quartermaster hasn't been deployed yet, breaking the
SPIRE <-> QM circular deployment dependency.

Added watchConnState helper that logs once per transition so operators
see at SPIRE startup whether QM is reachable: a single WARN-style line
when the connection is not yet Ready, and an INFO line when it becomes
Ready. conn.Connect() is called eagerly so those logs fire at plugin
load rather than waiting for the first RPC.

Deferred:
- Add a unit test for NewClient succeeding with an unreachable address
  (TestNewClientAcceptsTLSConfig already fails because it uses
  placeholder cert paths; that failure is unrelated to this change).

Signed-off-by: Tyler J King <tking@guildhouse.dev>
2026-04-22 11:53:36 -04:00
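To make the ConnectParams in this commit concrete, the sketch below computes the nominal delay before each reconnect attempt from BaseDelay 1s, Multiplier 1.5, and MaxDelay 30s. It mirrors the usual exponential backoff formula (multiply per retry, cap at the maximum) rather than calling gRPC itself, and `backoffDelay` is a hypothetical helper. The Jitter of 0.2 is left out; it would spread each value by up to 20% in either direction.

```go
package main

import "time"

// Values taken from the commit message above.
const (
	baseDelay  = 1 * time.Second
	multiplier = 1.5
	maxDelay   = 30 * time.Second
)

// backoffDelay returns the nominal delay before reconnect attempt number
// retries (0-based), before any jitter is applied.
func backoffDelay(retries int) time.Duration {
	delay := float64(baseDelay)
	for i := 0; i < retries && delay < float64(maxDelay); i++ {
		delay *= multiplier
	}
	if delay > float64(maxDelay) {
		delay = float64(maxDelay)
	}
	return time.Duration(delay)
}
```

With these values the schedule runs 1s, 1.5s, 2.25s, and so on, and reaches the 30s cap at the ninth retry, since 1.5^8 is about 25.6 and 1.5^9 is about 38.4.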
Tyler J King
f0268305ae docs(spire): revocation cascade timing + Keylime SPIRE server config
Document the trust withdrawal cascade:
  Keylime breach → posture degraded → sessions downgraded
  → SPIRE re-attestation fails → SVIDs expire
  → service mTLS fails → quorum degrades

No new code for the cascade — it's emergent from existing
re-attestation behavior + the Keylime attestor plugin.
SPIRE federation handles cross-edge propagation through
standard certificate expiration.

Three timing profiles: Standard (~1hr), Enhanced (~15min),
Critical (~5min) with SVID TTL configuration guidance.

Example SPIRE server config with Keylime attestor + k8s_psat
fallback for nodes without hardware TPM.

Signed-off-by: Tyler King <tking@guildhouse.dev>
Signed-off-by: Tyler J King <tking727@gmail.com>
2026-04-15 20:36:00 -04:00
Tyler J King
5f62da6ca9 feat(spire): Keylime node attestor plugin — single TPM authority
Custom SPIRE NodeAttestor that queries Keylime attestation status
instead of performing independent TPM attestation. Keylime remains
the single TPM authority in the stack.

Two data source strategies:
- ConfigMap (default): reads posture-current ConfigMap (recommended,
  consistent with single-consumer principle)
- Verifier: queries Keylime verifier REST API directly (for
  out-of-cluster SPIRE servers)

Fail-closed: unknown nodes, unreachable sources, and degraded posture
all result in a non-attested verdict, so no SVID is issued.

Maps posture level to attestation verdict:
  Normal(5)/Elevated(4) → Attested
  Restricted(3) → Pending
  Critical(2)/Lockdown(1) → Failed

8 unit tests covering ConfigMap source, verifier mapping, edge cases.

Signed-off-by: Tyler King <tking@guildhouse.dev>
Signed-off-by: Tyler J King <tking727@gmail.com>
2026-04-15 20:35:45 -04:00
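The posture-to-verdict mapping and the fail-closed rule in this commit can be sketched as a single switch. The `Verdict` type, its string values, and the `verdictFor` signature are assumptions for illustration; only the level numbers and their outcomes come from the commit message.

```go
package main

// Verdict is a hypothetical type for the attestation outcome.
type Verdict string

const (
	Attested Verdict = "attested"
	Pending  Verdict = "pending"
	Failed   Verdict = "failed"
)

// verdictFor maps a posture level to an attestation verdict.
// known is false when the node has no posture record or the data source
// could not be reached; both cases fail closed.
func verdictFor(level int, known bool) Verdict {
	if !known {
		return Failed
	}
	switch level {
	case 5, 4: // Normal, Elevated
		return Attested
	case 3: // Restricted
		return Pending
	default: // Critical(2), Lockdown(1), and any unrecognised level
		return Failed
	}
}
```

Putting the failure case in `default` means a level outside 1 to 5 is also rejected, which keeps the mapping fail-closed if new levels are added later.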
a58d548518 feat: network-policy extension, governance lifecycle, audit remediation
- Network-policy SPIRE plugin extension
- Governance event notification with merkle anchoring
- Shellstream specs for consent channels + HFL embedded ABI
- All 17 audit findings from AUDIT.md remediated
- SSH credential composer + substrate key manager updates
- Test coverage for config + sshcert packages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 15:54:46 -04:00
6321037ac1 Add network-policy extension and network governance lifecycle events
New shellstream extension §10.6 network-policy@guildhouse.dev carrying
GovernedNetworkPolicy hash in SSH certificates. New §8.7 in upper layers
spec documenting network governance lifecycle events (attach, detach,
flow policy, route announce/withdraw) emitted by governance-notifier
using the tiered consent transport model.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 19:38:13 -05:00
9319ad0ce8 Update Shellstream specs for consent channels and HFL embedded ABI
Add consent-channels@guildhouse.dev SSH certificate extension for
advertising available consent transport channels. Add §8.6 to upper
layers spec describing HFL as the in-process capability boundary
within Shellstream sessions, with WIT as the formal contract.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 17:57:48 -05:00
420a4e2ea0 Remediate all 17 audit findings from AUDIT.md
Critical fixes:
- F-01: SatScope array form support (single pointer → slice with polymorphic JSON)
- F-02: Add governance-intent@guildhouse.dev as 10th Shellstream extension
- F-06: Replace os.Exit(1) stubs with go-plugin Serve() boilerplate in all cmd/
- F-13: Validate SatScope.ResourcePattern is non-empty

High priority:
- F-03: Add normative Accord policy syntax note to credential-governance.md §8.2
- F-04: Replace OID XXXXX placeholder with explicit PEN reference and IANA TODO
- F-05: Document CredentialComposer hook mapping in spec and plugin-types.md
- F-07/F-08: Commit CI pipeline (.github/workflows/ci.yaml)
- F-09: Add hashicorp/go-plugin v1.6.3 to go.mod

Medium priority:
- F-10: Wire sample-ssh-cert-extensions.json fixture into shellstream tests
- F-11: Cross-reference merkle proof depth limit (256 leaves) in governance spec
- F-12: Add YAML format clarification headers to deploy configs
- F-14: Expand README with project status, docs links, and quick-start

Low priority:
- F-15: Standardize "SSH SVID" → "SSH-SVID" terminology across docs
- F-16: Add GovernanceEpochSeconds to PluginConfig and deploy configs
- F-17: Add troubleshooting section to deployment.md, error handling to OIDC docs

Global: Rename all extension keys from @guildhouse.io to @guildhouse.dev

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 11:45:33 -05:00
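Findings F-01 and F-13 in this commit describe a SatScope that accepts either a single object or an array in JSON, with a non-empty ResourcePattern. A minimal sketch of that shape, assuming a `SatScopes` slice type and a `resource_pattern` JSON key (neither is confirmed by the log):

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
)

// SatScope carries only the field named in the log; the JSON key is assumed.
type SatScope struct {
	ResourcePattern string `json:"resource_pattern"`
}

// SatScopes accepts either a single scope object or an array of scopes.
type SatScopes []SatScope

// UnmarshalJSON normalises both JSON forms to a slice and rejects any scope
// whose ResourcePattern is empty.
func (s *SatScopes) UnmarshalJSON(data []byte) error {
	trimmed := bytes.TrimSpace(data)
	var scopes []SatScope
	if len(trimmed) > 0 && trimmed[0] == '[' {
		if err := json.Unmarshal(trimmed, &scopes); err != nil {
			return err
		}
	} else {
		var one SatScope
		if err := json.Unmarshal(trimmed, &one); err != nil {
			return err
		}
		scopes = []SatScope{one}
	}
	for _, sc := range scopes {
		if sc.ResourcePattern == "" {
			return errors.New("SatScope.ResourcePattern must be non-empty")
		}
	}
	*s = scopes
	return nil
}
```

Callers would declare the field as `SatScopes` and let `json.Unmarshal` invoke the method, so older single-object documents keep parsing after the move to a slice.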
3dc3e9ee37 Initial scaffolding: specs, plugins, pkg/shellstream
2026-02-18 10:47:09 -05:00