- Network-policy SPIRE plugin extension - Governance event notification with merkle anchoring - Shellstream specs for consent channels + HFL embedded ABI - All 17 audit findings from AUDIT.md remediated - SSH credential composer + substrate key manager updates - Test coverage for config + sshcert packages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1095 lines
58 KiB
Markdown
1095 lines
58 KiB
Markdown
# Security & Integrity Audit: guildhouse-spire-plugins
|
||
|
||
**Date:** 2026-02-18
|
||
**Commit:** eb9edf5
|
||
**Repository:** guildhouse-cooperative/guildhouse-spire-plugins
|
||
**Scope:** OIDC token handling, SSH certificate generation, Shellstream extensions, governance integration, key management, configuration/deployment, test fixtures, specification security review
|
||
**License:** Apache 2.0
|
||
**Build Status:** UNVERIFIABLE (Go toolchain not available on audit machine)
|
||
|
||
---
|
||
|
||
## Attacker Profiles
|
||
|
||
| ID | Profile | Capabilities |
|
||
|----|---------|-------------|
|
||
| A1 | Malicious Operator | Valid platform credentials, admin-level Keycloak access, can create intents and ceremonies, can configure SPIRE plugins |
|
||
| A2 | Network Attacker | Can intercept, replay, or DoS gRPC traffic between SPIRE Server and Quartermaster services; no valid credentials |
|
||
| A3 | Compromised Workload | Has a valid SPIFFE SVID, can call the Workload API, can present crafted OIDC tokens; confined to a single pod/namespace |
|
||
| A4 | Rogue Plugin | Binary substituted in the SPIRE plugin path (`/opt/spire/plugins/`); full access to plugin IPC channel |
|
||
| A5 | Insider with Merkle Access | Read/write access to NotaryService data; can query, replay, or forge merkle proofs and anchors |
|
||
|
||
## Severity Scale
|
||
|
||
| Level | Definition |
|
||
|-------|-----------|
|
||
| CRITICAL | Exploitable in current code/design; leads to authentication bypass, privilege escalation, or complete audit trail compromise |
|
||
| HIGH | Exploitable with moderate effort or upon implementation; leads to significant security degradation |
|
||
| MEDIUM | Requires specific conditions or affects defense-in-depth; leads to partial security degradation |
|
||
| LOW | Minor issue; limited exploitability or impact |
|
||
| INFORMATIONAL | Observation; no direct security impact but worth noting |
|
||
| DESIGN-NOTE | Architectural decision with security implications; not a vulnerability per se |
|
||
|
||
---
|
||
|
||
## Executive Summary
|
||
|
||
This audit identified **20 findings**: 3 CRITICAL, 6 HIGH, 5 MEDIUM, 2 LOW, 1 INFORMATIONAL, and 3 DESIGN-NOTE.
|
||
|
||
**Top 3 most impactful findings:**
|
||
|
||
1. **S-01 (CRITICAL):** `CertRequest` allows requester-controlled TTL and principals with no server-side enforcement of the spec's 1-hour maximum. A compromised workload can request arbitrarily long-lived certificates with unrestricted principal lists.
|
||
|
||
2. **S-02 (CRITICAL):** The `Verifier` interface does not contractually require audience validation, enabling token confusion attacks where an OIDC token intended for one service is accepted by another.
|
||
|
||
3. **S-03 (CRITICAL):** Merkle proofs embedded in SSH certificates are not cryptographically bound to the certificate content, allowing proof replay across certificates to forge governance provenance.
|
||
|
||
**Security posture:** The architectural design is sound — fail-closed governance authorization, short-lived certificates, domain-separated hashing, and JCS canonicalization are strong foundations. However, the interface designs in `pkg/oidc` and `pkg/sshcert` contain structural flaws that, if carried into implementation, would create exploitable vulnerabilities. The `pkg/shellstream` implementation is the most mature and has good validation, though it lacks input size limits on the decode path.
|
||
|
||
**Key architectural strengths:**
|
||
- GovernanceService fail-closed policy (spec Section 10.1) — credential operations cannot bypass authorization
|
||
- Short-lived SSH certificates (5m default, 1h max per spec) — limits blast radius of credential compromise
|
||
- Domain-separated SHA-256 hashing (`guildhouse.credential.v1:` prefix) — prevents cross-protocol collisions
|
||
- JCS (RFC 8785) canonicalization — ensures deterministic merkle leaf construction
|
||
- Append-only merkle chain with `previous_root` linkage — tamper-evident audit trail
|
||
|
||
---
|
||
|
||
## Trust Boundary Diagram
|
||
|
||
```
|
||
TRUST BOUNDARY: Cluster Network
|
||
================================
|
||
┌─────────────┐ ┌──────────────────────┐
|
||
│ Workload │──[Unix Socket]──▶┌──────────────┐ │ GovernanceService │
|
||
│ (Pod) │ Workload API │ SPIRE Agent │──[gRPC/mTLS]──▶ │ (Quartermaster) │
|
||
│ │ │ │ SPIRE Server API │ :50051 │
|
||
│ OIDC Token │ │ Attestors: │ │ └──────────────────────┘
|
||
│ (injected) │ │ - k8s_psat │ │ ▲
|
||
└─────────────┘ │ - guildhouse│ ▼ │
|
||
│ │ _oidc │┌──────────────┐ │ [gRPC/mTLS]
|
||
│ └──────────────┘│ SPIRE Server │ │
|
||
│ │ │ ┌──────────────────────┐
|
||
│ TRUST BOUNDARY: │ Plugins: │ │ CeremonyService │
|
||
│ Pod Filesystem │ ─────────── │ │ (Bascule) │
|
||
│ │ KeyManager │──│ :50052 │
|
||
▼ │ (substrate) │ └──────────────────────┘
|
||
┌─────────────┐ │ │ │
|
||
│ token_path │ │ Credential │ │ [gRPC/mTLS]
|
||
│ /var/run/ │ │ Composer │ ▼
|
||
│ secrets/ │ │ (ssh) │ ┌──────────────────────┐
|
||
│ oidc/token │ │ │ │ NotaryService │
|
||
└─────────────┘ │ Notifier │──│ (Quartermaster) │
|
||
│ (governance)│ │ :50051 │
|
||
└──────────────┘ └──────────────────────┘
|
||
│
|
||
│ [SSH Certificate]
|
||
▼
|
||
┌──────────────┐
|
||
│ SSH Server │
|
||
│ (validates │
|
||
│ cert + ext) │
|
||
└──────────────┘
|
||
|
||
Trust Boundaries:
|
||
TB-1: Workload ↔ SPIRE Agent — Unix domain socket, kernel PID attestation
|
||
TB-2: SPIRE Agent ↔ SPIRE Server — gRPC with mTLS (X.509-SVIDs)
|
||
TB-3: SPIRE Server ↔ Governance — gRPC, mTLS required (TODO in code)
|
||
TB-4: SPIRE Server ↔ Ceremony — gRPC, mTLS required (TODO in code)
|
||
TB-5: SPIRE Server ↔ Notary — gRPC, mTLS required (TODO in code)
|
||
TB-6: Workload ↔ SSH Server — SSH protocol, certificate-based auth
|
||
TB-7: Pod filesystem ↔ Workload — OIDC token file, filesystem permissions
|
||
```
|
||
|
||
---
|
||
|
||
## Threat Analysis Matrix
|
||
|
||
| Finding | Severity | Attacker | Scope Area | Confidentiality | Integrity | Availability |
|
||
|---------|----------|----------|------------|-----------------|-----------|--------------|
|
||
| S-01 | CRITICAL | A1, A3 | SSH Cert Generation | Low | **High** | Low |
|
||
| S-02 | CRITICAL | A3 | OIDC Token Handling | **High** | **High** | Low |
|
||
| S-03 | CRITICAL | A5 | Spec Security | Low | **High** | Low |
|
||
| S-04 | HIGH | A2 | Spec Security | Low | **High** | Medium |
|
||
| S-05 | HIGH | A2, A3 | Shellstream Extensions | Low | Low | **High** |
|
||
| S-06 | HIGH | A1 | Governance Integration | Low | **High** | Low |
|
||
| S-07 | HIGH | A1 | Configuration | Low | Low | **High** |
|
||
| S-08 | HIGH | A1 | OIDC Token Handling | **High** | **High** | Low |
|
||
| S-09 | HIGH | A4 | Key Management | **High** | **High** | **High** |
|
||
| S-10 | MEDIUM | A1, A3 | Governance Integration | Low | Medium | Low |
|
||
| S-11 | MEDIUM | A3 | SSH Cert Generation | Low | Medium | Low |
|
||
| S-12 | MEDIUM | A1 | Configuration | Low | Low | Medium |
|
||
| S-13 | MEDIUM | A1, A5 | Shellstream Extensions | Low | Medium | Low |
|
||
| S-14 | MEDIUM | A1, A3 | Test Fixtures | Medium | Medium | Low |
|
||
| S-15 | DESIGN-NOTE | A2 | Spec Security | Low | Medium | Low |
|
||
| S-16 | LOW | A1 | Spec Security | Low | Medium | Low |
|
||
| S-17 | LOW | A5 | Spec Security | Low | Medium | Low |
|
||
| S-18 | INFORMATIONAL | A1 | Governance Integration | Low | Medium | Low |
|
||
| S-19 | DESIGN-NOTE | A2 | Key Management | Medium | Medium | Low |
|
||
| S-20 | DESIGN-NOTE | — | Governance Integration | Low | Low | Low |
|
||
|
||
---
|
||
|
||
## Findings
|
||
|
||
### CRITICAL
|
||
|
||
#### S-01: CertRequest Allows Requester-Controlled TTL and Principals
|
||
|
||
**Severity:** CRITICAL
|
||
**Attacker Profile:** A1 (Malicious Operator), A3 (Compromised Workload)
|
||
**Scope Area:** SSH Certificate Generation
|
||
**Location:** `pkg/sshcert/sshcert.go:18-29`
|
||
|
||
**Description:**
|
||
|
||
The `CertRequest` struct exposes two security-critical fields as requester-settable values:
|
||
|
||
```go
|
||
type CertRequest struct {
|
||
SpiffeID string
|
||
Extensions *shellstream.ShellstreamExtensions
|
||
ValidSeconds uint64 // Requester-controlled
|
||
Principals []string // Requester-controlled
|
||
}
|
||
```
|
||
|
||
The SSH-SVID specification (`specs/spiffe-ssh-svid.md`, Section 7.1, lines 390-396) mandates:
|
||
- Maximum TTL: 1 hour (MUST NOT exceed)
|
||
- Minimum TTL: 30 seconds (MUST NOT issue below)
|
||
- Default TTL: 5 minutes (RECOMMENDED)
|
||
|
||
The `Build()` method (line 47) performs no TTL bounds checking and no principal validation. The `Principals` field allows arbitrary SSH principals beyond the SPIFFE ID, which could grant access to hosts or accounts the workload is not authorized to reach.
|
||
|
||
**Attack Scenario:**
|
||
1. A3 obtains a valid SPIFFE SVID through legitimate attestation.
|
||
2. A3 crafts a `CertRequest` with `ValidSeconds: 31536000` (1 year) and `Principals: ["root", "admin", "deploy"]`.
|
||
3. The SSH Credential Composer builds a certificate with a 1-year lifetime and root/admin principals.
|
||
4. A3 uses this certificate for persistent, privileged SSH access across the infrastructure.
|
||
|
||
**Impact:** Privilege escalation (unrestricted principals), credential persistence beyond intended lifetime (up to max uint64 seconds). Violates the short-lived credential model that is the primary security guarantee of the SSH-SVID design.
|
||
|
||
**Recommendation:**
|
||
- Add `MaxValidSeconds`, `MinValidSeconds`, and `AllowedPrincipals` to the `Builder` config, enforced server-side.
|
||
- `Build()` must clamp `ValidSeconds` to `[MinValidSeconds, MaxValidSeconds]` and reject principals not in the allow list.
|
||
- The SPIFFE ID should be the *only* principal unless explicitly configured otherwise in the SPIRE registration entry.
|
||
|
||
---
|
||
|
||
#### S-02: Verifier Interface Lacks Audience Parameter
|
||
|
||
**Severity:** CRITICAL
|
||
**Attacker Profile:** A3 (Compromised Workload)
|
||
**Scope Area:** OIDC Token Handling
|
||
**Location:** `pkg/oidc/oidc.go:31-34`
|
||
|
||
**Description:**
|
||
|
||
The `Verifier` interface defines:
|
||
|
||
```go
|
||
type Verifier interface {
|
||
Verify(ctx context.Context, rawToken string) (*Claims, error)
|
||
}
|
||
```
|
||
|
||
The `Config` struct (line 10-18) includes `Audience string`, but this field is never referenced in the `Verifier` interface contract. An implementation of `Verifier` has no obligation to check the audience claim. The `Claims` struct (line 22-28) includes `Audience []string` in the output, but this is informational — no enforcement occurs.
|
||
|
||
Additionally:
|
||
- No nonce parameter exists (replay protection)
|
||
- No JTI (JWT ID) tracking exists (token reuse detection)
|
||
- `Config.Audience` is not validated as required in `NewVerifier` (line 37-43)
|
||
|
||
**Attack Scenario:**
|
||
1. A3 is a workload in namespace `tenant-b` with a valid OIDC token whose `aud` claim is `["api-gateway"]`.
|
||
2. A3 presents this token to the SPIRE OIDC attestor, which calls `Verify(ctx, token)`.
|
||
3. Because audience validation is not part of the interface contract, the implementation may accept the token despite the audience mismatch.
|
||
4. A3 obtains SPIRE selectors (`oidc:sub:...`) and gains an SVID for a workload identity it should not have.
|
||
|
||
**Impact:** Token confusion / cross-service authentication bypass. A token issued for one relying party is accepted by another. This is a fundamental OIDC security control.
|
||
|
||
**Recommendation:**
|
||
- Change the interface to `Verify(ctx context.Context, rawToken string, expectedAudience string) (*Claims, error)` to make audience validation contractually required.
|
||
- Add `audience` as a required parameter that cannot be empty.
|
||
- Implement JTI tracking with a TTL-bounded cache to prevent token replay.
|
||
|
||
---
|
||
|
||
#### S-03: Merkle Proof Not Bound to Specific Credential Content
|
||
|
||
**Severity:** CRITICAL
|
||
**Attacker Profile:** A5 (Insider with Merkle Access)
|
||
**Scope Area:** Specification Security Review
|
||
**Location:** `specs/credential-governance.md:589-599`, `pkg/shellstream/shellstream.go:69-70`
|
||
|
||
**Description:**
|
||
|
||
The Shellstream `merkle-proof@guildhouse.dev` extension carries a binary inclusion proof, and `merkle-root@guildhouse.dev` carries the root hash. Together they prove that *some* leaf was included in the governance merkle tree. However, there is no cryptographic binding between the proof and the specific SSH certificate that carries it.
|
||
|
||
The merkle leaf is `SHA-256(JCS(MutationEnvelope))` (spec Section 7.4, line 360-361). The MutationEnvelope includes `payload_hash`, `actor_svid`, `tenant_id`, `event_type`, and `intent_id` — but it does NOT include the SSH certificate's serial number, public key fingerprint, or any certificate-specific identifier.
|
||
|
||
**Attack Scenario:**
|
||
1. A5 observes a legitimately issued certificate C1 with a valid merkle proof P1.
|
||
2. A5 extracts P1 and the corresponding merkle root R1.
|
||
3. A5 creates a new certificate C2 (possibly with different principals or TTL) and embeds P1 and R1 as Shellstream extensions.
|
||
4. A verifier calls `NotaryService.VerifyInclusion(R1, leaf_from_C1, P1)` — this succeeds because P1 proves inclusion of C1's leaf, which is still valid.
|
||
5. The verifier concludes C2 has valid governance provenance, but C2 was never actually governed.
|
||
|
||
**Impact:** Complete governance audit trail bypass. Certificates can be minted with fabricated governance provenance.
|
||
|
||
**Recommendation:**
|
||
- The MutationEnvelope MUST include a credential-binding field: the SHA-256 hash of the SSH certificate's public key (or serial number). This binds the merkle leaf to a specific certificate.
|
||
- The verifier MUST recompute the leaf hash from the certificate's own content and verify that the proof matches *this specific certificate*, not just any historical leaf.
|
||
- Add the credential binding field to `specs/credential-governance.md` Section 7.3 and the `MutationEnvelope` schema.
|
||
|
||
---
|
||
|
||
### HIGH
|
||
|
||
#### S-04: NotaryService Fail-Open Creates Unaudited Credential Window
|
||
|
||
**Severity:** HIGH
|
||
**Attacker Profile:** A2 (Network Attacker)
|
||
**Scope Area:** Specification Security Review
|
||
**Location:** `specs/credential-governance.md:661-671`
|
||
|
||
**Description:**
|
||
|
||
The specification defines that when the NotaryService is unreachable after a credential operation completes, the credential "MUST proceed" (line 665). This is a fail-open design for the audit channel. The spec requires local queueing and retry (lines 666-668), but the current codebase has no implementation of this queue.
|
||
|
||
A network attacker who can disrupt connectivity to the NotaryService (TB-5 in the trust boundary diagram) can cause all credential operations to proceed without merkle anchoring. The credentials are valid and usable, but invisible to the audit trail.
|
||
|
||
**Attack Scenario:**
|
||
1. A2 identifies the NotaryService endpoint (`notary.quartermaster.svc.cluster.local:50051` per deploy config).
|
||
2. A2 initiates a sustained DoS against the NotaryService or disrupts the network path.
|
||
3. A1 (colluding or independently) issues credentials during the outage.
|
||
4. Credentials are issued with valid governance authorization (GovernanceService is fail-closed and still reachable) but no audit trail.
|
||
5. When the NotaryService recovers, the local queue (if implemented) would eventually anchor the events — but if the queue is lost (pod restart), the events are permanently unaudited.
|
||
|
||
**Impact:** Audit gap during NotaryService outage. If combined with local queue loss, permanent audit gap for credential operations during the window.
|
||
|
||
**Recommendation:**
|
||
- Implement the local durable queue as specified (on-disk WAL or embedded database).
|
||
- Add a monitoring metric `governance_anchoring_pending_total` as specified in the spec (line 669).
|
||
- Consider adding a "maximum anchoring delay" after which credentials should be flagged as requiring re-verification.
|
||
- Document the durable queue implementation as a MUST in the implementation guide.
|
||
|
||
---
|
||
|
||
#### S-05: No Size Limit on Merkle-Proof Base64 Decode
|
||
|
||
**Severity:** HIGH
|
||
**Attacker Profile:** A2 (Network Attacker), A3 (Compromised Workload)
|
||
**Scope Area:** Shellstream Extensions
|
||
**Location:** `pkg/shellstream/shellstream.go:202-208`
|
||
|
||
**Description:**
|
||
|
||
The `Decode()` function performs base64 decoding on the `merkle-proof@guildhouse.dev` extension value with no size limit:
|
||
|
||
```go
|
||
if v, ok := extensions[ExtMerkleProof]; ok {
|
||
proof, err := base64.StdEncoding.DecodeString(v)
|
||
if err != nil {
|
||
return nil, fmt.Errorf("shellstream: decode merkle-proof: %w", err)
|
||
}
|
||
ext.MerkleProof = proof
|
||
}
|
||
```
|
||
|
||
The Shellstream spec (Section 9.4, lines 439-445) recommends a 4 KB total extension size limit, but the code enforces nothing. A crafted SSH certificate with a multi-megabyte `merkle-proof` value causes unbounded memory allocation when any service calls `Decode()`.
|
||
|
||
The merkle proof should be at most 8 × 32 = 256 bytes (8 levels × 32-byte SHA-256 hashes) based on the 256-leaf depth limit (spec Section 6.8). In base64, this is ~344 characters.
|
||
|
||
**Attack Scenario:**
|
||
1. A3 crafts an SSH certificate with a `merkle-proof@guildhouse.dev` extension containing 100 MB of base64-encoded data.
|
||
2. Any service that decodes Shellstream extensions (SSH server middleware, audit verifier, monitoring) calls `Decode()`.
|
||
3. The base64 decode allocates ~75 MB of memory per call. Repeated calls exhaust server memory.
|
||
|
||
**Impact:** Denial of service against any component that processes Shellstream extensions.
|
||
|
||
**Recommendation:**
|
||
- Add a size check before decoding: `if len(v) > 512 { return error }` (344 base64 chars for max proof, with margin).
|
||
- Consider adding a total extension payload size check at the start of `Decode()` to enforce the 4 KB spec recommendation.
|
||
|
||
---
|
||
|
||
#### S-06: TOCTOU Between Intent Creation and Credential Issuance
|
||
|
||
**Severity:** HIGH
|
||
**Attacker Profile:** A1 (Malicious Operator)
|
||
**Scope Area:** Governance Integration
|
||
**Location:** `specs/credential-governance.md:30`, `pkg/governance/governance.go:53-56`
|
||
|
||
**Description:**
|
||
|
||
The governance flow is:
|
||
1. `CreateIntent` → Accord policy evaluation → intent authorized or ceremony required
|
||
2. (Optional) Ceremony approval
|
||
3. `RedeemIntent` → SAT issued
|
||
4. Credential operation executed with SAT
|
||
|
||
Between step 1 and step 3, an arbitrary amount of time may pass (minutes for ceremony approval, up to `ttl_seconds` of the intent). During this window, Accord policy may change. The SAT issued at step 3 reflects the policy evaluation at step 1.
|
||
|
||
**Attack Scenario:**
|
||
1. A1 creates an intent for a credential operation that is currently classified as `Autonomous` (no approval needed).
|
||
2. A1 (or another admin) updates the Accord policy to reclassify this operation as `QuorumApproval`.
|
||
3. A1 redeems the intent. The GovernanceService issues a SAT based on the original `Autonomous` classification.
|
||
4. The credential is issued without the now-required quorum approval.
|
||
|
||
**Impact:** Policy bypass during policy transition windows. The window size equals `min(intent_ttl, ceremony_timeout)`.
|
||
|
||
**Recommendation:**
|
||
- Re-evaluate Accord policy at `RedeemIntent` time, not just at `CreateIntent` time. If the classification has escalated, the intent should transition to the new ceremony requirement.
|
||
- Alternatively, add a `policy_version` field to the intent and check at redemption that the policy version hasn't changed.
|
||
- Document this behavior explicitly in the spec's error handling section.
|
||
|
||
---
|
||
|
||
#### S-07: GovernanceEpochSeconds Defaults to Zero
|
||
|
||
**Severity:** HIGH
|
||
**Attacker Profile:** A1 (Malicious Operator)
|
||
**Scope Area:** Configuration & Deployment
|
||
**Location:** `pkg/config/config.go:30`
|
||
|
||
**Description:**
|
||
|
||
```go
|
||
GovernanceEpochSeconds int `hcl:"governance_epoch_seconds"`
|
||
```
|
||
|
||
Go initializes `int` to `0`. The `Validate()` method (lines 34-38) only checks `TrustDomain`. If `governance_epoch_seconds` is omitted from configuration, it defaults to 0, which could mean:
|
||
- Division by zero in epoch boundary calculations
|
||
- Continuous anchoring on every event (resource exhaustion)
|
||
- No anchoring at all (implementation-dependent interpretation of epoch=0)
|
||
|
||
The field comment (line 27-29) documents a default of 300 seconds, but this default is not enforced in code.
|
||
|
||
**Attack Scenario:**
|
||
1. A1 deploys a SPIRE server with a config file that omits `governance_epoch_seconds`.
|
||
2. The plugin starts with epoch=0.
|
||
3. Depending on implementation: epoch boundary triggers on every event (DoS on NotaryService) or never triggers (no anchoring, audit gap).
|
||
|
||
**Impact:** Denial of service or silent audit trail failure due to misconfiguration.
|
||
|
||
**Recommendation:**
|
||
- In `Validate()`, check `GovernanceEpochSeconds > 0` and set a default of 300 if unset.
|
||
- Add validation for all required fields: `GovernanceAddr`, `ClusterID`, `GovernanceEpochSeconds`.
|
||
- Consider using a pointer type (`*int`) to distinguish "not set" from "set to 0".
|
||
|
||
---
|
||
|
||
#### S-08: OIDC Config Lacks URL Validation for Issuer
|
||
|
||
**Severity:** HIGH
|
||
**Attacker Profile:** A1 (Malicious Operator)
|
||
**Scope Area:** OIDC Token Handling
|
||
**Location:** `pkg/oidc/oidc.go:37-40`
|
||
|
||
**Description:**
|
||
|
||
```go
|
||
func NewVerifier(cfg Config) (Verifier, error) {
|
||
if cfg.Issuer == "" {
|
||
return nil, fmt.Errorf("oidc: issuer is required")
|
||
}
|
||
// TODO: implement
|
||
}
|
||
```
|
||
|
||
The issuer field is validated only for non-emptiness. OIDC discovery (`/.well-known/openid-configuration`) fetches the JWKS endpoint from the issuer URL. If the issuer is a `file://` URL, `http://` URL (no TLS), or points to an attacker-controlled domain, the JWKS keys are attacker-controlled.
|
||
|
||
Additionally, `Config.Audience` (line 15) is not validated as required, and `Config.JWKSURL` (line 18) could override discovery with an arbitrary URL.
|
||
|
||
**Attack Scenario:**
|
||
1. A1 configures the OIDC attestor with `issuer: "http://attacker.example.com"`.
|
||
2. The attestor fetches `http://attacker.example.com/.well-known/openid-configuration`.
|
||
3. The attacker's OIDC discovery document points to their own JWKS endpoint.
|
||
4. The attacker can now sign any OIDC token that will be accepted by the attestor.
|
||
5. Workloads in the cluster are attested with attacker-controlled identities.
|
||
|
||
**Impact:** Complete OIDC authentication bypass via misconfigured or malicious issuer URL.
|
||
|
||
**Recommendation:**
|
||
- Validate that `Issuer` is a well-formed `https://` URL. Reject `http://`, `file://`, and non-URL strings.
|
||
- Validate that `Audience` is non-empty.
|
||
- If `JWKSURL` is set, validate it is also `https://`.
|
||
- Consider pinning the expected issuer to a known domain pattern.
|
||
|
||
---
|
||
|
||
#### S-09: No go-plugin Serving Pattern in Plugin Binaries
|
||
|
||
**Severity:** HIGH
|
||
**Attacker Profile:** A4 (Rogue Plugin)
|
||
**Scope Area:** Key Management
|
||
**Location:** `cmd/substrate-keymanager/main.go`, `cmd/ssh-credential-composer/main.go`, `cmd/governance-notifier/main.go`, `cmd/oidc-attestor/main.go` (all contain `func main() { os.Exit(1) }`)
|
||
|
||
**Description:**
|
||
|
||
SPIRE external plugins use HashiCorp's `go-plugin` framework, which includes a magic cookie handshake to verify that the plugin binary was launched by the expected host process. Without this handshake:
|
||
- Any binary placed at the plugin path will be loaded by SPIRE Server.
|
||
- There is no mutual authentication between SPIRE Server and the plugin binary.
|
||
- The plugin stub `plugin.go` files define struct types but have no `plugin.Serve()` call.
|
||
|
||
The deploy config (`deploy/spire-server-config.yaml`, lines 39, 50, 60) specifies absolute plugin paths under `/opt/spire/plugins/`. If an attacker can write to this path (container escape, misconfigured volume mount, supply chain compromise), they can substitute a rogue binary.
|
||
|
||
**Attack Scenario:**
|
||
1. A4 gains write access to `/opt/spire/plugins/` (e.g., via a writable `hostPath` volume mount in Kubernetes).
|
||
2. A4 replaces `substrate-keymanager` with a rogue binary that implements the KeyManager interface.
|
||
3. SPIRE Server loads the rogue binary on next restart.
|
||
4. The rogue KeyManager has access to all signing keys and can issue arbitrary SVIDs.
|
||
|
||
**Impact:** Complete CA key compromise, arbitrary SVID issuance, full trust domain takeover.
|
||
|
||
**Recommendation:**
|
||
- Implement the `go-plugin` serving pattern with the SPIRE-defined handshake config in all `cmd/*/main.go`.
|
||
- Use plugin binary checksums in the SPIRE server config (SPIRE supports `plugin_checksum`).
|
||
- Ensure plugin paths are on read-only filesystems in production.
|
||
|
||
---
|
||
|
||
### MEDIUM
|
||
|
||
#### S-10: RedeemResult Missing ExpiresAt for SAT TTL Enforcement
|
||
|
||
**Severity:** MEDIUM
|
||
**Attacker Profile:** A1 (Malicious Operator), A3 (Compromised Workload)
|
||
**Scope Area:** Governance Integration
|
||
**Location:** `pkg/governance/governance.go:31-36`
|
||
|
||
**Description:**
|
||
|
||
```go
|
||
type RedeemResult struct {
|
||
Success bool
|
||
SatHash []byte
|
||
Status string
|
||
Error string
|
||
}
|
||
```
|
||
|
||
The proto definition (`proto/quartermaster/v1/governance.proto:77-85`) includes `expires_at` in `SatToken`, but the Go `RedeemResult` struct does not surface this field. Code consuming `RedeemResult` has no way to check whether the SAT has expired.
|
||
|
||
**Attack Scenario:**
|
||
1. A credential composer redeems an intent and receives a SAT with a 5-minute TTL.
|
||
2. Due to processing delay or retry logic, the SAT expires before the credential is issued.
|
||
3. The composer cannot detect the expiration because `RedeemResult` lacks `ExpiresAt`.
|
||
4. The credential is issued with an expired SAT hash embedded in the `sat-hash@guildhouse.dev` extension.
|
||
|
||
**Impact:** Credentials issued with expired authorization tokens. Downstream verifiers that check SAT validity would reject these credentials.
|
||
|
||
**Recommendation:**
|
||
- Add `ExpiresAt time.Time` to `RedeemResult`.
|
||
- Check `time.Now().Before(result.ExpiresAt)` before proceeding with credential issuance.
|
||
- Add `SatBytes []byte` to enable downstream SAT verification.
|
||
|
||
---
|
||
|
||
#### S-11: Roles Decoded Without Validation in Decode Path
|
||
|
||
**Severity:** MEDIUM
|
||
**Attacker Profile:** A3 (Compromised Workload)
|
||
**Scope Area:** SSH Certificate Generation
|
||
**Location:** `pkg/shellstream/shellstream.go:159-161`
|
||
|
||
**Description:**
|
||
|
||
```go
|
||
if v, ok := extensions[ExtRoles]; ok && v != "" {
|
||
ext.Roles = strings.Split(v, ",")
|
||
}
|
||
```
|
||
|
||
`strings.Split("admin,,engineer", ",")` produces `["admin", "", "engineer"]`. The empty string is silently included in the roles list. While `Validate()` (lines 247-254) catches this, `Decode()` and `Validate()` are separate functions. Any code path that calls `Decode()` without subsequently calling `Validate()` operates on unvalidated role data.
|
||
|
||
Similarly, `strings.Split("admin, engineer", ",")` produces `["admin", " engineer"]` — the leading space is preserved, which could cause authorization mismatches if roles are compared by exact string equality.
|
||
|
||
**Attack Scenario:**
|
||
1. A3 presents an SSH certificate with `roles@guildhouse.dev: "admin,,,"`.
|
||
2. A server-side authorization check calls `Decode()` but not `Validate()`.
|
||
3. The roles list is `["admin", "", "", ""]` — the empty strings may match wildcard or default role rules.
|
||
|
||
**Impact:** Authorization bypass if roles are consumed after `Decode()` without `Validate()`.
|
||
|
||
**Recommendation:**
|
||
- Filter empty strings and trim whitespace in `Decode()` itself, not just in `Validate()`.
|
||
- Alternatively, make `Decode()` call `Validate()` internally and return validated data only.
|
||
- Document that `Decode()` output MUST be validated before use.
|
||
|
||
---
|
||
|
||
#### S-12: Config.Validate() Incomplete — Only Checks TrustDomain
|
||
|
||
**Severity:** MEDIUM
|
||
**Attacker Profile:** A1 (Malicious Operator)
|
||
**Scope Area:** Configuration & Deployment
|
||
**Location:** `pkg/config/config.go:34-38`
|
||
|
||
**Description:**
|
||
|
||
```go
|
||
func (c *PluginConfig) Validate() error {
|
||
if c.TrustDomain == "" {
|
||
return fmt.Errorf("config: trust_domain is required")
|
||
}
|
||
return nil
|
||
}
|
||
```
|
||
|
||
The `PluginConfig` struct has 6 fields (lines 10-30) but only `TrustDomain` is validated. Fields `GovernanceAddr`, `CeremonyAddr`, `NotaryAddr`, `ClusterID`, and `GovernanceEpochSeconds` can all be empty or zero without triggering a validation error.
|
||
|
||
**Impact:** Plugins start with incomplete configuration and fail at runtime when the first governance call is made, potentially after having already partially processed credentials.
|
||
|
||
**Recommendation:**
|
||
- Validate all required fields: `GovernanceAddr` (non-empty, valid address format), `ClusterID` (non-empty), `GovernanceEpochSeconds` (> 0).
|
||
- `CeremonyAddr` and `NotaryAddr` should be validated when the corresponding features are enabled.
|
||
|
||
---
|
||
|
||
#### S-13: GovernanceIntent Only Format-Validated, Not Existence-Validated
|
||
|
||
**Severity:** MEDIUM
|
||
**Attacker Profile:** A1 (Malicious Operator), A5 (Insider with Merkle Access)
|
||
**Scope Area:** Shellstream Extensions
|
||
**Location:** `pkg/shellstream/shellstream.go:331-335`
|
||
|
||
**Description:**
|
||
|
||
```go
|
||
if ext.GovernanceIntent != "" {
|
||
if !isValidUUID(ext.GovernanceIntent) {
|
||
return fmt.Errorf("shellstream: governance-intent is not a valid UUID: %q", ext.GovernanceIntent)
|
||
}
|
||
}
|
||
```
|
||
|
||
The `governance-intent@guildhouse.dev` extension is validated as a well-formed UUID but there is no mechanism to verify the intent actually exists in the GovernanceService or that it corresponds to the current credential operation.
|
||
|
||
**Attack Scenario:**
|
||
1. A1 crafts a certificate with `governance-intent@guildhouse.dev: "00000000-0000-0000-0000-000000000000"`.
|
||
2. The UUID is syntactically valid, so `Validate()` passes.
|
||
3. A verifier sees the governance-intent field and assumes the credential went through governance.
|
||
4. But the intent ID is fabricated — no such intent exists in the GovernanceService.
|
||
|
||
**Impact:** False governance provenance. Certificates appear governed but were never actually subject to governance authorization.
|
||
|
||
**Recommendation:**
|
||
- The SSH Credential Composer (at issuance time) must set `governance-intent` from the actual `CreateIntentResponse.intent_id`, not from external input.
|
||
- Verifiers should call `GovernanceService.ListIntents` with the intent ID to confirm existence.
|
||
- Document that `governance-intent` is server-set and MUST NOT be accepted from external sources.
|
||
|
||
---
|
||
|
||
#### S-14: Sample OIDC Token Contains tenant_id Custom Claim
|
||
|
||
**Severity:** MEDIUM
|
||
**Attacker Profile:** A1 (Malicious Operator), A3 (Compromised Workload)
|
||
**Scope Area:** Test Fixtures
|
||
**Location:** `test/fixtures/sample-oidc-token.json:21`
|
||
|
||
**Description:**
|
||
|
||
```json
|
||
{
|
||
"tenant_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
|
||
}
|
||
```
|
||
|
||
The OIDC token fixture includes `tenant_id` as a top-level custom claim. The deploy config references Keycloak (`https://keycloak.guildhouse.example.org/realms/platform`), which is self-managed. In Keycloak, custom claims are configurable via protocol mappers and, depending on configuration, may be user-editable (e.g., through a user attribute mapper).
|
||
|
||
If the OIDC attestor uses `tenant_id` from the token to determine tenant context (rather than deriving it server-side from the workload's namespace or registration entry), a user who can control their Keycloak attributes can claim any tenant.
|
||
|
||
**Attack Scenario:**
|
||
1. A3 is a workload in `tenant-alpha` that has access to modify its own Keycloak user profile.
|
||
2. A3 changes its `tenant_id` user attribute to `tenant-beta`'s UUID.
|
||
3. Keycloak issues a token with `"tenant_id": "<tenant-beta-uuid>"`.
|
||
4. The OIDC attestor extracts `tenant_id` from the token and provides selectors indicating `tenant-beta`.
|
||
5. A3 obtains an SVID with tenant-beta's context.
|
||
|
||
**Impact:** Tenant boundary bypass if `tenant_id` is sourced from the OIDC token.
|
||
|
||
**Recommendation:**
|
||
- Do NOT use `tenant_id` from OIDC tokens for authorization decisions. Derive tenant context from the workload's SPIFFE ID path or Kubernetes namespace.
|
||
- If `tenant_id` must come from OIDC, use a Keycloak hardcoded claim mapper (not user attribute mapper) and restrict mapper editing to admin-only.
|
||
- Document this trust assumption explicitly.
|
||
|
||
---
|
||
|
||
### LOW
|
||
|
||
#### S-15: DESIGN-NOTE — Governance-Optional Model (Fail-Closed Auth, Fail-Open Audit)
|
||
|
||
**Severity:** DESIGN-NOTE
|
||
**Attacker Profile:** A2 (Network Attacker)
|
||
**Scope Area:** Specification Security Review
|
||
**Location:** `specs/credential-governance.md:642-671`
|
||
|
||
**Description:**
|
||
|
||
The architecture uses two different failure modes:
|
||
- **GovernanceService unreachable** (Section 10.1, line 644): Fail-CLOSED. Credential operations MUST fail. Authorization is mandatory.
|
||
- **NotaryService unreachable** (Section 10.3, line 661): Fail-OPEN. Credential operations MUST proceed. Audit is best-effort.
|
||
|
||
This is a deliberate, documented design tradeoff. The rationale (line 671) is that a temporary audit gap is preferable to failing a legitimately authorized credential operation.
|
||
|
||
**Security Implication:** An attacker who can selectively disrupt NotaryService connectivity creates a window where credential operations proceed without audit trail entries. This is acceptable in the threat model if:
|
||
1. The local durable queue (spec line 666) is implemented correctly.
|
||
2. The queue survives pod restarts.
|
||
3. Monitoring alerts on `governance_anchoring_pending_total` (spec line 669).
|
||
|
||
**Recommendation:** No design change needed, but operators must be aware that NotaryService availability directly affects audit completeness. Document this prominently in the deployment guide.
|
||
|
||
---
|
||
|
||
#### S-16: Emergency Break-Glass Post-Hoc Approval Window Undefined
|
||
|
||
**Severity:** LOW
|
||
**Attacker Profile:** A1 (Malicious Operator)
|
||
**Scope Area:** Specification Security Review
|
||
**Location:** `specs/credential-governance.md:535-543`
|
||
|
||
**Description:**
|
||
|
||
The Accord policy emergency section specifies:
|
||
|
||
```yaml
|
||
emergency:
|
||
classification: EmergencyBreakGlass
|
||
post_hoc_approval_window_hours: 24
|
||
```
|
||
|
||
The spec defines that emergency break-glass ceremonies allow credential operations to proceed before approval, with post-hoc approval required within 24 hours. However, the spec does not define:
|
||
- What happens if post-hoc approval is not obtained within the window.
|
||
- Whether the credential is automatically revoked.
|
||
- Whether subsequent uses of the credential are blocked.
|
||
- Whether an alert is generated.
|
||
|
||
**Impact:** Emergency credentials could persist indefinitely without retrospective approval. A malicious operator could use emergency break-glass repeatedly as a governance bypass mechanism.
|
||
|
||
**Recommendation:**
|
||
- Specify in `specs/credential-governance.md` that credentials issued via EmergencyBreakGlass MUST be automatically revoked if post-hoc approval is not obtained within the window.
|
||
- Add a required `escalation_action` field (e.g., `revoke`, `alert`, `suspend`) to the emergency policy schema.
|
||
|
||
---
|
||
|
||
#### S-17: Merkle Chain Fork Detection Not Specified
|
||
|
||
**Severity:** LOW
|
||
**Attacker Profile:** A5 (Insider with Merkle Access)
|
||
**Scope Area:** Specification Security Review
|
||
**Location:** `specs/credential-governance.md:624-630`
|
||
|
||
**Description:**
|
||
|
||
The audit chain uses `previous_root` linkage:
|
||
|
||
> "Each anchor's `previous_root` field MUST reference the `merkle_root` of the immediately preceding anchor." (line 625)
|
||
|
||
This detects history rewriting (changing an existing anchor) but does not prevent forks: two anchors could both claim the same `previous_root`, creating a split-brain audit trail. The spec does not define:
|
||
- A canonical sequence number for anchors.
|
||
- A consensus mechanism for anchor creation.
|
||
- Fork detection or resolution procedures.
|
||
|
||
The `CreateAnchorRequest` proto (`proto/quartermaster/v1/notary.proto:18-22`) includes `etcd_revision` which could serve as a sequence number, but its use is not specified (`0 means not set`).
|
||
|
||
**Impact:** If the NotaryService has multiple writers (e.g., multiple SPIRE servers in an HA deployment), concurrent anchor creation could fork the chain. Each fork is internally consistent but they diverge, making cross-fork audit queries unreliable.
|
||
|
||
**Recommendation:**
|
||
- Specify that `etcd_revision` (or an equivalent monotonic counter) MUST be set and MUST be unique per anchor.
|
||
- The NotaryService MUST reject `CreateAnchor` requests where `previous_root` does not match the current latest anchor's `merkle_root` (optimistic concurrency control).
|
||
- Document fork detection: auditors should verify that no two anchors share the same `previous_root`.
|
||
|
||
---
|
||
|
||
### INFORMATIONAL
|
||
|
||
#### S-18: CreateIntentRequest identity_claim Oneof Allows Unverifiable External Events
|
||
|
||
**Severity:** INFORMATIONAL
|
||
**Attacker Profile:** A1 (Malicious Operator)
|
||
**Scope Area:** Governance Integration
|
||
**Location:** `proto/quartermaster/v1/governance.proto:35-38`
|
||
|
||
**Description:**
|
||
|
||
```protobuf
|
||
oneof identity_claim {
|
||
string oidc_token = 5;
|
||
ExternalEventClaim external_event = 6;
|
||
}
|
||
```
|
||
|
||
The `ExternalEventClaim` message (lines 45-50) includes a `verification` string field with no defined verification protocol:
|
||
|
||
```protobuf
|
||
message ExternalEventClaim {
|
||
string source = 1;
|
||
string event_id = 2;
|
||
string event_type = 3;
|
||
string verification = 4;
|
||
}
|
||
```
|
||
|
||
The GovernanceService must decide whether to trust an external event claim, but there is no standard for what `verification` should contain (HMAC, signature, URL to verify, etc.).
|
||
|
||
**Impact:** If the GovernanceService accepts external event claims without strong verification, a malicious operator can forge identity claims to create intents on behalf of other actors.
|
||
|
||
**Recommendation:**
|
||
- Define a verification protocol for external event claims (e.g., HMAC with a shared secret, or a URL that returns verification status).
|
||
- Add a `verification_type` field to `ExternalEventClaim` to disambiguate verification methods.
|
||
- Consider requiring that external event claims always trigger at least `SingleApproval` ceremony classification.
|
||
|
||
---
|
||
|
||
### DESIGN-NOTE
|
||
|
||
#### S-19: No mTLS Requirement in gRPC Client Configuration
|
||
|
||
**Severity:** DESIGN-NOTE
|
||
**Attacker Profile:** A2 (Network Attacker)
|
||
**Scope Area:** Key Management
|
||
**Location:** `pkg/governance/governance.go:44-49`
|
||
|
||
**Description:**
|
||
|
||
```go
|
||
func NewClient(cfg Config) (*Client, error) {
|
||
if cfg.GovernanceAddr == "" {
|
||
return nil, fmt.Errorf("governance: governance address is required")
|
||
}
|
||
// TODO: implement — establish gRPC connections with mTLS
|
||
return &Client{config: cfg}, nil
|
||
}
|
||
```
|
||
|
||
The `Config` struct (lines 10-20) has `GovernanceAddr`, `CeremonyAddr`, and `NotaryAddr` but no TLS configuration fields (certificate paths, CA bundle, TLS required flag). The TODO comment mentions mTLS, but when this is implemented, if mTLS is made *optional* rather than *required*, a network attacker can intercept governance traffic.
|
||
|
||
**Impact:** If mTLS is not enforced, all trust boundaries TB-3, TB-4, and TB-5 in the trust diagram are vulnerable to man-in-the-middle attacks.
|
||
|
||
**Recommendation:**
|
||
- Add `TLSCertPath`, `TLSKeyPath`, `TLSCAPath` to the governance `Config` struct.
|
||
- Make TLS mandatory — `NewClient` should fail if TLS configuration is missing.
|
||
- Use SPIFFE-aware TLS (SVID-based mTLS) for consistency with the SPIRE ecosystem.
|
||
|
||
---
|
||
|
||
#### S-20: Proto Files Lack Field Validation Annotations
|
||
|
||
**Severity:** DESIGN-NOTE
|
||
**Attacker Profile:** N/A
|
||
**Scope Area:** Governance Integration
|
||
**Location:** All proto files under `proto/`
|
||
|
||
**Description:**
|
||
|
||
None of the 4 proto files use field validation annotations (e.g., `buf/validate` or `protoc-gen-validate`). Fields with implicit constraints have no documentation of valid ranges:
|
||
|
||
| Proto File | Field | Expected Constraint |
|
||
|-----------|-------|-------------------|
|
||
| `governance.proto:40` | `ttl_seconds` | > 0, <= max_intent_ttl |
|
||
| `governance.proto:41` | `max_redemptions` | > 0, reasonable upper bound |
|
||
| `ceremony.proto:43` | `required_approvals` | > 0, <= pool_size |
|
||
| `ceremony.proto:44` | `ttl_hours` | > 0, reasonable upper bound |
|
||
| `notary.proto:21` | `etcd_revision` | >= 0 |
|
||
|
||
**Impact:** Generated client/server code has no built-in validation. All validation must happen in application code, increasing the risk of missing checks.
|
||
|
||
**Recommendation:**
|
||
- Add `buf/validate` annotations to proto fields with constraints.
|
||
- This provides both documentation and generated validation code.
|
||
- At minimum, add proto-level comments documenting valid ranges for each constrained field.
|
||
|
||
---
|
||
|
||
## Attack Trees
|
||
|
||
### Attack Tree 1: Forge SSH Certificate with Arbitrary Privileges
|
||
|
||
```
|
||
GOAL: Obtain SSH certificate with unauthorized privileges
|
||
├── 1. Exploit CertRequest TTL/Principal control [S-01] [CRITICAL]
|
||
│ ├── 1a. Compromised workload (A3) calls credential composer
|
||
│ │ ├── Set ValidSeconds = 31536000 (1 year)
|
||
│ │ ├── Set Principals = ["root", "deploy", "admin"]
|
||
│ │ └── RESULT: Long-lived cert with privileged principals
|
||
│ └── 1b. Malicious operator (A1) configures registration entry
|
||
│ ├── Set overly broad principal mapping
|
||
│ └── RESULT: Legitimate-looking but over-privileged certs
|
||
│
|
||
├── 2. Bypass OIDC attestation [S-02, S-08]
|
||
│ ├── 2a. Token confusion (A3) [S-02]
|
||
│ │ ├── Present token with wrong audience
|
||
│ │ ├── Verifier lacks audience check
|
||
│ │ └── RESULT: SVID for unintended workload identity
|
||
│ └── 2b. Malicious issuer (A1) [S-08]
|
||
│ ├── Configure issuer = "http://attacker.example.com"
|
||
│ ├── Serve attacker-controlled JWKS keys
|
||
│ └── RESULT: Arbitrary identity attestation
|
||
│
|
||
└── 3. Fabricate governance provenance [S-03, S-13]
|
||
├── 3a. Replay merkle proof (A5) [S-03]
|
||
│ ├── Extract proof from legitimate cert C1
|
||
│ ├── Embed in forged cert C2
|
||
│ └── RESULT: C2 appears governed
|
||
└── 3b. Fabricate intent ID (A1) [S-13]
|
||
├── Set governance-intent = random valid UUID
|
||
├── Passes format validation
|
||
└── RESULT: Cert appears to have governance intent
|
||
```
|
||
|
||
### Attack Tree 2: Bypass Governance for Credential Issuance
|
||
|
||
```
|
||
GOAL: Issue credentials without proper governance authorization
|
||
├── 1. Exploit TOCTOU window [S-06] [HIGH]
|
||
│ ├── 1a. Create intent under permissive policy (A1)
|
||
│ │ ├── Intent classified as Autonomous
|
||
│ │ ├── Policy updated to require QuorumApproval
|
||
│ │ ├── Redeem intent (still uses old classification)
|
||
│ │ └── RESULT: Credential issued without required quorum
|
||
│ └── 1b. Race condition during policy deployment
|
||
│ ├── Multiple intents created during transition
|
||
│ └── RESULT: Batch of under-governed credentials
|
||
│
|
||
├── 2. Suppress audit trail [S-04, S-15]
|
||
│ ├── 2a. DoS NotaryService (A2) [S-04]
|
||
│ │ ├── NotaryService unreachable
|
||
│ │ ├── Credentials proceed (fail-open audit)
|
||
│ │ ├── Local queue lost on pod restart
|
||
│ │ └── RESULT: Permanently unaudited credential operations
|
||
│ └── 2b. Exploit anchoring delay
|
||
│ ├── Issue credentials faster than epoch boundary
|
||
│ ├── Exceed 256 leaves per epoch [spec depth limit]
|
||
│ └── RESULT: Leaves dropped or epoch overflow
|
||
│
|
||
└── 3. Exploit emergency break-glass [S-16]
|
||
├── Trigger emergency condition (A1)
|
||
├── Credential issued without pre-approval
|
||
├── Post-hoc approval window expires
|
||
├── No automatic revocation specified
|
||
└── RESULT: Ungoverned credential persists indefinitely
|
||
```
|
||
|
||
### Attack Tree 3: Corrupt Audit Trail
|
||
|
||
```
|
||
GOAL: Compromise integrity of the governance audit trail
|
||
├── 1. Proof replay attack [S-03] [CRITICAL]
|
||
│ ├── 1a. Cross-credential replay (A5)
|
||
│ │ ├── Extract (merkle-root, merkle-proof) from cert C1
|
||
│ │ ├── Embed in crafted cert C2
|
||
│ │ ├── VerifyInclusion succeeds (proves C1's leaf, not C2's)
|
||
│ │ └── RESULT: C2 has false governance provenance
|
||
│ └── 1b. Cross-epoch replay
|
||
│ ├── Use proof from epoch N in cert issued at epoch N+1
|
||
│ ├── Governance-epoch mismatch may not be checked
|
||
│ └── RESULT: Stale governance state appears current
|
||
│
|
||
├── 2. Fork the merkle chain [S-17] [LOW]
|
||
│ ├── 2a. Concurrent anchor creation (A5)
|
||
│ │ ├── Two writers create anchors with same previous_root
|
||
│ │ ├── Chain forks into two valid branches
|
||
│ │ └── RESULT: Audit queries return inconsistent results
|
||
│ └── 2b. Reorder anchors
|
||
│ ├── etcd_revision = 0 (not set) per proto comment
|
||
│ ├── No sequence number to enforce ordering
|
||
│ └── RESULT: Anchors may be out of causal order
|
||
│
|
||
├── 3. DoS-induced audit gap [S-04, S-05]
|
||
│ ├── 3a. Overwhelm extension decoder (A3) [S-05]
|
||
│ │ ├── Craft cert with 100MB merkle-proof
|
||
│ │ ├── Audit verifier calls Decode()
|
||
│ │ ├── OOM kills verifier process
|
||
│ │ └── RESULT: Audit verification unavailable
|
||
│ └── 3b. NotaryService DoS (A2) [S-04]
|
||
│ ├── Sustained DoS during credential operations
|
||
│ └── RESULT: Audit gap for all operations during outage
|
||
│
|
||
└── 4. Fabricate governance metadata [S-13, S-18]
|
||
├── 4a. Fake intent ID [S-13]
|
||
│ ├── governance-intent = random UUID
|
||
│ └── RESULT: Cert appears governed when it wasn't
|
||
└── 4b. Forge external event claim [S-18]
|
||
├── CreateIntent with fabricated ExternalEventClaim
|
||
├── GovernanceService may accept without strong verification
|
||
└── RESULT: Intent created under false identity
|
||
```
|
||
|
||
---
|
||
|
||
## Scope Area Deep Dives
|
||
|
||
### 1. OIDC Token Handling
|
||
|
||
**Findings:** S-02 (CRITICAL), S-08 (HIGH), S-14 (MEDIUM)
|
||
|
||
**Files reviewed:** `pkg/oidc/oidc.go`, `cmd/oidc-attestor/plugin.go`, `test/fixtures/sample-oidc-token.json`
|
||
|
||
The OIDC subsystem has three compounding issues:
|
||
1. The `Verifier` interface (S-02) does not contractually require audience validation.
|
||
2. The issuer URL (S-08) is not validated for HTTPS.
|
||
3. The `tenant_id` claim (S-14) may be user-controllable in Keycloak.
|
||
|
||
Together, these create a layered authentication failure: even if one control is implemented correctly, the others provide bypass routes. The recommended fix order is: S-02 first (interface design, hardest to change later), then S-08 (configuration validation), then S-14 (trust model documentation).
|
||
|
||
### 2. SSH Certificate Generation
|
||
|
||
**Findings:** S-01 (CRITICAL), S-11 (MEDIUM)
|
||
|
||
**Files reviewed:** `pkg/sshcert/sshcert.go`, `cmd/ssh-credential-composer/plugin.go`, `specs/spiffe-ssh-svid.md`
|
||
|
||
The SSH certificate generation path has a fundamental interface design flaw (S-01) where the `CertRequest` struct allows callers to control security-critical fields. The SSH-SVID spec clearly defines TTL bounds (30s–1h) and principal derivation (from SPIFFE ID), but the code enforces neither. When this stub is implemented, the `Build()` method MUST enforce these constraints server-side.
|
||
|
||
### 3. Shellstream Extensions
|
||
|
||
**Findings:** S-05 (HIGH), S-13 (MEDIUM)
|
||
|
||
**Files reviewed:** `pkg/shellstream/shellstream.go`, `pkg/shellstream/shellstream_test.go`, `specs/shellstream-extensions.md`
|
||
|
||
The Shellstream implementation is the most mature code in the repository (360 lines, extensive tests). The primary security concern is the lack of size limits on the decode path (S-05), which is a straightforward DoS vector. The spec recommends a 4 KB total extension limit (Section 9.4, line 441-445), and the merkle proof has a known maximum size of ~256 bytes — both provide clear bounds for validation.
|
||
|
||
The governance-intent format-only validation (S-13) is a design issue: the extension value looks correct syntactically but could be fabricated. This is mitigated if the credential composer sets it server-side and verifiers cross-check with the GovernanceService.
|
||
|
||
### 4. Governance Integration
|
||
|
||
**Findings:** S-06 (HIGH), S-07 (HIGH), S-10 (MEDIUM), S-18 (INFORMATIONAL), S-20 (DESIGN-NOTE)
|
||
|
||
**Files reviewed:** `pkg/governance/governance.go`, `pkg/config/config.go`, `proto/quartermaster/v1/governance.proto`, `specs/credential-governance.md`
|
||
|
||
The governance integration has the widest attack surface. The TOCTOU window (S-06) is an inherent challenge in any async authorization system with separate create/redeem steps. The spec should either re-evaluate policy at redemption time or bound the intent TTL to minimize the window. The GovernanceEpochSeconds default-to-zero (S-07) is a dangerous misconfiguration vector. The `RedeemResult` missing `ExpiresAt` (S-10) prevents SAT TTL enforcement in consumer code.
|
||
|
||
### 5. Key Management
|
||
|
||
**Findings:** S-09 (HIGH), S-19 (DESIGN-NOTE)
|
||
|
||
**Files reviewed:** `cmd/substrate-keymanager/main.go`, `cmd/substrate-keymanager/plugin.go`, `pkg/governance/governance.go`
|
||
|
||
Key management is entirely scaffolded. The most significant risk is the absence of the `go-plugin` serving pattern (S-09), which means there is no binary trust verification between SPIRE Server and the plugin. The mTLS requirement (S-19) for gRPC connections to Quartermaster services is noted in TODOs but not yet structurally enforced.
|
||
|
||
### 6. Configuration & Deployment
|
||
|
||
**Findings:** S-07 (HIGH), S-08 (HIGH), S-12 (MEDIUM)
|
||
|
||
**Files reviewed:** `pkg/config/config.go`, `pkg/oidc/oidc.go`, `deploy/spire-server-config.yaml`, `deploy/spire-agent-config.yaml`
|
||
|
||
Configuration validation is the weakest link in the current codebase. The `PluginConfig.Validate()` method checks only one of six fields (S-12). The `oidc.NewVerifier` checks only issuer non-emptiness (S-08). The `GovernanceEpochSeconds` field defaults to zero with no validation (S-07). These gaps mean that misconfigured deployments will start successfully but fail in unpredictable ways at runtime.
|
||
|
||
### 7. Test Fixtures
|
||
|
||
**Findings:** S-14 (MEDIUM)
|
||
|
||
**Files reviewed:** `test/fixtures/sample-oidc-token.json`, `test/fixtures/sample-ssh-cert-extensions.json`
|
||
|
||
The OIDC token fixture includes a `tenant_id` custom claim that, in a self-managed Keycloak deployment, could be user-controllable. This is primarily a trust model documentation issue — the fixture itself is correct for testing, but the trust assumption about where `tenant_id` comes from must be made explicit in the implementation.
|
||
|
||
### 8. Specification Security Review
|
||
|
||
**Findings:** S-03 (CRITICAL), S-04 (HIGH), S-15 (DESIGN-NOTE), S-16 (LOW), S-17 (LOW)
|
||
|
||
**Files reviewed:** All 3 specs — `specs/spiffe-ssh-svid.md` (Section 8), `specs/shellstream-extensions.md` (Section 9), `specs/credential-governance.md` (Sections 9-10)
|
||
|
||
The specifications are well-written with dedicated security sections. The SSH-SVID spec (Section 8, 7 subsections) covers attestation trust, SPIFFE ID visibility, mTLS, socket protection, rate limiting, CA key compromise, and clock skew. The Shellstream spec (Section 9, 5 subsections) covers extension visibility, authorization boundary, parsing safety, size constraints, and replay/freshness. The Credential Governance spec (Section 10, 5 subsections) covers fail-closed/fail-open modes, ceremony timeout, and idempotency.
|
||
|
||
The critical gap is the merkle proof binding (S-03): the proof is not tied to the specific certificate. The other spec-level findings (S-16, S-17) are edge cases in the emergency and chain fork scenarios that should be addressed but are not urgent.
|
||
|
||
---
|
||
|
||
## Appendix A: Attacker Profile × Finding Matrix
|
||
|
||
Cells marked with severity initial: **C**=CRITICAL, **H**=HIGH, **M**=MEDIUM, **L**=LOW, **I**=INFO, **D**=DESIGN-NOTE. Empty = not exploitable by this profile.
|
||
|
||
| Finding | A1 Malicious Operator | A2 Network Attacker | A3 Compromised Workload | A4 Rogue Plugin | A5 Merkle Insider |
|
||
|---------|-----------------------|---------------------|-------------------------|-----------------|-------------------|
|
||
| S-01 | **C** | | **C** | | |
|
||
| S-02 | | | **C** | | |
|
||
| S-03 | | | | | **C** |
|
||
| S-04 | | **H** | | | |
|
||
| S-05 | | **H** | **H** | | |
|
||
| S-06 | **H** | | | | |
|
||
| S-07 | **H** | | | | |
|
||
| S-08 | **H** | | | | |
|
||
| S-09 | | | | **H** | |
|
||
| S-10 | **M** | | **M** | | |
|
||
| S-11 | | | **M** | | |
|
||
| S-12 | **M** | | | | |
|
||
| S-13 | **M** | | | | **M** |
|
||
| S-14 | **M** | | **M** | | |
|
||
| S-15 | | **D** | | | |
|
||
| S-16 | **L** | | | | |
|
||
| S-17 | | | | | **L** |
|
||
| S-18 | **I** | | | | |
|
||
| S-19 | | **D** | | | |
|
||
| S-20 | | | | | |
|
||
|
||
**Summary by attacker profile:**
|
||
- **A1 (Malicious Operator):** 9 findings exploitable. Highest concentration of risk.
|
||
- **A2 (Network Attacker):** 3 findings exploitable. Primary vector is service DoS.
|
||
- **A3 (Compromised Workload):** 5 findings exploitable. Primary vector is cert/token manipulation.
|
||
- **A4 (Rogue Plugin):** 1 finding exploitable. High impact but narrow vector.
|
||
- **A5 (Merkle Insider):** 3 findings exploitable. Primary vector is audit trail manipulation.
|
||
|
||
---
|
||
|
||
## Appendix B: Trust Boundary Inventory
|
||
|
||
| ID | Boundary | From | To | Protocol | Authentication | Encryption | Status |
|
||
|----|----------|------|----|----------|----------------|------------|--------|
|
||
| TB-1 | Workload API | Workload | SPIRE Agent | Unix Socket | Kernel PID attestation | N/A (local) | Specified in SSH-SVID spec Section 8.4 |
|
||
| TB-2 | SPIRE Node API | SPIRE Agent | SPIRE Server | gRPC | mTLS with X.509-SVIDs | TLS 1.2+ | Specified in SSH-SVID spec Section 8.3 |
|
||
| TB-3 | Governance API | SPIRE Server | GovernanceService | gRPC | mTLS (TODO) | TLS (TODO) | S-19: No TLS config in code |
|
||
| TB-4 | Ceremony API | SPIRE Server | CeremonyService | gRPC | mTLS (TODO) | TLS (TODO) | S-19: No TLS config in code |
|
||
| TB-5 | Notary API | SPIRE Server | NotaryService | gRPC | mTLS (TODO) | TLS (TODO) | S-19: No TLS config in code |
|
||
| TB-6 | SSH Auth | Workload | SSH Server | SSH | Certificate-based | SSH encryption | Spec-compliant (standard OpenSSH) |
|
||
| TB-7 | Token File | Pod filesystem | Workload | File I/O | Filesystem permissions | N/A | Specified in agent config: `/var/run/secrets/oidc/token` |
|
||
| TB-8 | Plugin IPC | SPIRE Server | Plugin Binary | go-plugin/gRPC | Magic cookie handshake (TODO) | Local | S-09: No handshake implemented |
|
||
|
||
---
|
||
|
||
## Appendix C: Specification Security Coverage
|
||
|
||
### SSH-SVID Specification (`specs/spiffe-ssh-svid.md`, Section 8)
|
||
|
||
| Topic | Covered | CWE Reference | Assessment |
|
||
|-------|---------|---------------|------------|
|
||
| Attestation trust | Section 8.1 | CWE-287 (Improper Authentication) | Thorough. Recommends k8s attestation, warns about PID-based. |
|
||
| Identity visibility | Section 8.2 | CWE-200 (Exposure of Sensitive Information) | Good. Notes SPIFFE ID in cleartext during SSH handshake. |
|
||
| Transport security | Section 8.3 | CWE-319 (Cleartext Transmission) | Good. Requires mTLS for Agent-Server. |
|
||
| Socket protection | Section 8.4 | CWE-732 (Incorrect Permission Assignment) | Good. Requires filesystem permissions + kernel attestation. |
|
||
| Rate limiting | Section 8.5 | CWE-770 (Allocation Without Limits) | Good. Recommends 60 issuances/min/SPIFFE-ID. |
|
||
| CA compromise | Section 8.6 | CWE-321 (Use of Hard-coded Cryptographic Key) | Good. Recommends HSM, monitoring, rotation. |
|
||
| Clock skew | Section 8.7 | CWE-613 (Insufficient Session Expiration) | Good. 60-second tolerance. |
|
||
| **Certificate TTL enforcement** | **NOT COVERED** | **CWE-613** | **GAP: Spec defines bounds (S-01) but no server-side enforcement guidance.** |
|
||
| **Principal restriction** | **NOT COVERED** | **CWE-269 (Improper Privilege Management)** | **GAP: Spec derives principal from SPIFFE ID but doesn't restrict additional principals.** |
|
||
|
||
### Shellstream Extensions Specification (`specs/shellstream-extensions.md`, Section 9)
|
||
|
||
| Topic | Covered | CWE Reference | Assessment |
|
||
|-------|---------|---------------|------------|
|
||
| Extension visibility | Section 9.1 | CWE-200 | Good. "MUST NOT embed secrets." |
|
||
| Authorization boundary | Section 9.2 | CWE-863 (Incorrect Authorization) | Excellent. "Merkle proofs provide auditability, not authorization." |
|
||
| Parsing safety | Section 9.3 | CWE-94 (Improper Control of Code Generation) | Good. "MUST NOT evaluate as code." |
|
||
| Size constraints | Section 9.4 | CWE-770 | Partial. Recommends 4 KB limit but uses SHOULD, not MUST. |
|
||
| Replay/freshness | Section 9.5 | CWE-294 (Authentication Bypass by Capture-replay) | Partial. Relies on cert validity period. No proof-to-cert binding. |
|
||
| **Proof binding** | **NOT COVERED** | **CWE-345 (Insufficient Verification of Data Authenticity)** | **GAP: S-03 — proof is not bound to specific certificate content.** |
|
||
|
||
### Credential Governance Specification (`specs/credential-governance.md`, Sections 9-10)
|
||
|
||
| Topic | Covered | CWE Reference | Assessment |
|
||
|-------|---------|---------------|------------|
|
||
| Merkle anchoring | Section 9.1 | CWE-354 (Improper Validation of Integrity Check) | Good. Detailed merkle construction with domain separation. |
|
||
| Certificate-embedded audit | Section 9.2 | CWE-778 (Insufficient Logging) | Good. Defines required extensions. |
|
||
| Audit queries | Section 9.3 | CWE-778 | Good. Three verification patterns defined. |
|
||
| Chain continuity | Section 9.4 | CWE-354 | Good. Append-only with `previous_root` linkage. |
|
||
| Fail-closed auth | Section 10.1 | CWE-636 (Not Failing Securely) | Excellent. GovernanceService unreachable = operation fails. |
|
||
| Ceremony timeout | Section 10.2 | CWE-613 | Good. Timeout = denial. |
|
||
| Fail-open audit | Section 10.3 | CWE-778 | Acceptable tradeoff, well-documented rationale. |
|
||
| Policy missing | Section 10.4 | CWE-636 | Good. Default to SingleApproval. |
|
||
| **TOCTOU** | **NOT COVERED** | **CWE-367 (TOCTOU Race Condition)** | **GAP: S-06 — policy re-evaluation at redemption not specified.** |
|
||
| **Chain fork** | **NOT COVERED** | **CWE-362 (Concurrent Execution Using Shared Resource)** | **GAP: S-17 — no fork prevention mechanism.** |
|
||
| **Break-glass expiry** | **PARTIAL** | **CWE-613** | **GAP: S-16 — post-hoc approval expiry action undefined.** |
|
||
|
||
---
|
||
|
||
*End of security audit report.*
|