
The deterministic evidence layer for AV safety cases.
FieldSpace populates the behavioral acceptance criteria in the industry-standard safety case framework with bit-reproducible per-event evidence. Built for the safety teams writing the documents Waymo, Zoox, Nuro, and Wayve publish today, and for the insurers underwriting their fleets.
Reproducible evidence the safety case can cite.
The current benchmark package shows FieldSpace running the official 64-scenario nuPlan closed-loop simulation, observing real openpilot drive logs and Waymo Open Motion scenarios, and producing per-event evidence traces that an operator's safety team, an auditor, or an insurer can re-run identically.
Safety cases are authored. Every claim needs evidence.
AV operators publish formal safety cases for regulators, insurers, and the public. Waymo's Safety Case Approach, Zoox's VSSA, Nuro's safety reports, and Wayve's safety framework all face the same problem: evidence has to be reproducible by auditors, resolved per event, and kept current across an evolving operating domain.
The evidence layer is written by hand.
Internal safety teams spend significant headcount producing per-claim evidence packs to back the GSN structure of the safety case.
Evidence must replay identically for outside reviewers.
An auditor, an insurer, and a regulator should all be able to re-run the same evidence and get the same output. Sampled neural runs cannot guarantee that.
Every ODD extension reopens the case.
New conditions, new geographies, and new maneuvers all force fresh evidence. Hand-authoring scales linearly with that surface area.
A deterministic methodology that maps into the framework you already use.
FieldSpace is an independent evaluative methodology. It populates the behavioral acceptance criteria with bit-reproducible per-event evidence. Same input, same output, every time, for the safety team, the auditor, the regulator, and the insurer.
Mapped to the industry-standard acceptance criteria framework.
The Favaro et al. 2023 paper "Building a Credible Case for Safety" defines a five-dimensional acceptance criteria framework for behavioral hazards, and explicitly invites third-party evaluative methodologies to map into it. FieldSpace populates all five dimensions.
| AC DIMENSION | FIELDSPACE EVIDENCE | REPRODUCIBILITY ARTIFACT |
|---|---|---|
| Severity Potential | TTC-bound, risk-field magnitude, predicted delta-V | nuPlan TTC 0.9219 |
| Conflict Role (Initiator / Responder) | Identified per-event from active repulsive field | Per-frame JSON trace |
| Behavioral Capability — Regulatory | Speed-limit + drivable-area compliance | nuPlan 1.000 / 1.000 |
| Behavioral Capability — Conflict Avoidance | Route progress while avoiding conflict initiation | nuPlan 0.9660 |
| Behavioral Capability — Collision Avoidance | No at-fault collisions, early-warning lead time | nuPlan 0.9766 / 13 to 15 s lead |
| Functionality Status | CPU-only engine survives degraded-compute states | nfs-modulus crate |
| Level of Aggregation — Event | Per-frame trace across openpilot + Waymo logs | 60,019 + 4,550 frames |
| Level of Aggregation — Aggregate | Rate-based scores across nuPlan + Waymo + openpilot | 64 scenarios, 50 scenarios, 10 clips |
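The Severity Potential row cites a TTC bound. As a rough illustration only, here is a constant-velocity time-to-collision between an ego vehicle and a lead object in the same lane. The `Track1D` type and the convention that `s` is measured at the ego's front bumper and the lead object's rear bumper are assumptions of this sketch; it is not FieldSpace's severity computation and not the nuPlan TTC metric definition.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Track1D:
    """Hypothetical 1D track along the ego lane (assumption for this sketch)."""
    s: float  # longitudinal position in metres (ego: front bumper, lead: rear bumper)
    v: float  # longitudinal speed in m/s


def time_to_collision(ego: Track1D, lead: Track1D) -> float:
    """Constant-velocity TTC; math.inf if the gap is not closing, 0.0 if already overlapping."""
    gap = lead.s - ego.s
    closing_speed = ego.v - lead.v
    if gap <= 0.0:
        return 0.0
    if closing_speed <= 0.0:
        return math.inf
    return gap / closing_speed


print(time_to_collision(Track1D(0.0, 20.0), Track1D(30.0, 10.0)))  # 3.0
```

A severity check would then compare this value against a bound; the bound itself is not specified here.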
Credibility of Evidence
Bit-reproducible via exact number theory. Same input, same output, for the operator, the auditor, NHTSA, and the insurer. This directly addresses the bottom-up credibility pillar in Section 4 of the Favaro 2023 paper.
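The page does not say which number-theoretic construction FieldSpace uses. As a generic illustration of why exact arithmetic helps reproducibility, the sketch below uses Python's standard `fractions.Fraction` as a stand-in: floating-point addition is not associative, so a sum can change with evaluation order, while an exact rational sum cannot.

```python
from fractions import Fraction


def sum_float(values):
    """Left-to-right float sum: the result can depend on the order of the inputs."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_exact(values):
    """Exact rational sum: each float converts to a Fraction with no rounding,
    and rational addition is associative, so input order does not matter."""
    total = Fraction(0)
    for v in values:
        total += Fraction(v)
    return total


forward = [0.1, 0.2, 0.3]
backward = list(reversed(forward))
print(sum_float(forward) == sum_float(backward))  # False
print(sum_exact(forward) == sum_exact(backward))  # True
```

Fixing the evaluation order also makes float results repeatable on a given platform; exact arithmetic removes the order dependence altogether, at a cost in speed.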
Credibility of Arguments
A different epistemic basis than the ADS being assessed. FieldSpace evidence does not share training data, weights, or sampling assumptions with the system under review, which is what an independent methodology is supposed to provide.
Implementation Credibility
Zero runner failures across 192 official nuPlan simulations. CPU-only and training-free, which reduces implementation attack surface and simplifies auditor review.
Source documents: Favaro et al. 2023 "Building a Credible Case for Safety" (arXiv:2306.01917); Webb et al. 2020 "Waymo's Safety Methodologies and Safety Readiness Determinations" (arXiv:2011.00054); Waymo Safety Case Approach White Paper (2020); UL 4600:2022; ISO 21448:2022 (SOTIF); ISO/AWI TS 5083.
One deterministic observer path.
Scene state to evidence.
Five stages, all auditable, all deterministic. Every observer output is a function of its inputs, with no hidden training state or inference variance.
Perception
Camera + YOLO-class detector + Kalman tracker → object tracks with velocity.
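For readers unfamiliar with the tracking stage, a minimal one-axis constant-velocity Kalman filter shows how noisy position detections become a track with a velocity estimate. The class name, the noise values `q` and `r`, and the position-only measurement model are assumptions for illustration; the actual tracker configuration is not described here.

```python
class CVKalman1D:
    """Constant-velocity Kalman filter on one axis; state is [position, velocity]."""

    def __init__(self, p0=0.0, v0=0.0, var0=10.0, q=0.5, r=1.0):
        self.x = [p0, v0]                      # state mean
        self.P = [[var0, 0.0], [0.0, var0]]    # state covariance
        self.q = q  # process-noise acceleration variance (assumed value)
        self.r = r  # position-measurement variance (assumed value)

    def predict(self, dt):
        p, v = self.x
        self.x = [p + v * dt, v]
        P, q = self.P, self.q
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and white-noise-acceleration Q
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q * dt**4 / 4
        p01 = P[0][1] + dt * P[1][1] + q * dt**3 / 2
        p10 = P[1][0] + dt * P[1][1] + q * dt**3 / 2
        p11 = P[1][1] + q * dt**2
        self.P = [[p00, p01], [p10, p11]]

    def update(self, z):
        # Measurement model H = [1, 0]: only position is observed (e.g. a detection centre).
        P = self.P
        s = P[0][0] + self.r                 # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
        y = z - self.x[0]                    # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
```

Usage: call `predict(dt)` once per frame and `update(z)` with each new detection; a 2D tracker would run this per axis or use a four-state version.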
HD Map
Lanelet2 / OSM map, Frenet projection, route planning with lane-change cost.
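Frenet projection expresses a position as arc length `s` along a lane centerline and signed lateral offset `d`. The sketch below projects onto a plain polyline; it does not use the Lanelet2 API, and the function name and left-positive sign convention are assumptions.

```python
import math


def frenet_project(path, x, y):
    """Project (x, y) onto a polyline centerline given as [(x0, y0), (x1, y1), ...].

    Returns (s, d): arc length to the closest foot point, and signed lateral
    offset (positive to the left of the direction of travel).
    """
    best = None
    s_start = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue
        # Foot-point parameter along the segment, clamped to its endpoints.
        t = max(0.0, min(1.0, ((x - x0) * dx + (y - y0) * dy) / seg_len**2))
        px, py = x0 + t * dx, y0 + t * dy
        dist = math.hypot(x - px, y - py)
        if best is None or dist < best[0]:
            # 2D cross product is positive when the point lies left of the segment.
            cross = dx * (y - y0) - dy * (x - x0)
            best = (dist, s_start + t * seg_len, math.copysign(dist, cross))
        s_start += seg_len
    if best is None:
        raise ValueError("path needs at least two distinct points")
    return best[1], best[2]


lane = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
print(frenet_project(lane, 5.0, 2.0))   # (5.0, 2.0)
print(frenet_project(lane, 12.0, 5.0))  # (15.0, -2.0)
```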
Prediction
1.5 s motion horizon. Map-aware lane-following, CV/CTRV kinematic fallback.
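The kinematic fallback can be illustrated with the closed-form constant turn rate and velocity (CTRV) model, which reduces to constant velocity (CV) when the yaw rate is near zero. This sketch omits the map-aware lane-following branch; the 0.1 s step and the `1e-6` yaw-rate cutoff are assumed values, while the 1.5 s horizon comes from the text above.

```python
import math


def predict_ctrv(x, y, v, yaw, yaw_rate, horizon=1.5, dt=0.1):
    """Roll a CTRV model forward; use the CV special case when yaw_rate is ~0.

    Returns a list of (t, x, y, yaw) samples at t = dt, 2*dt, ..., horizon.
    """
    steps = round(horizon / dt)
    out = []
    for k in range(1, steps + 1):
        t = k * dt
        if abs(yaw_rate) < 1e-6:
            # Constant velocity: straight-line motion along the current heading.
            px = x + v * math.cos(yaw) * t
            py = y + v * math.sin(yaw) * t
        else:
            # CTRV: motion along a circular arc of radius v / yaw_rate.
            px = x + v / yaw_rate * (math.sin(yaw + yaw_rate * t) - math.sin(yaw))
            py = y + v / yaw_rate * (math.cos(yaw) - math.cos(yaw + yaw_rate * t))
        out.append((t, px, py, yaw + yaw_rate * t))
    return out


straight = predict_ctrv(0.0, 0.0, 10.0, 0.0, 0.0)
print(len(straight), round(straight[-1][1], 6))  # 15 15.0
```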
PDE Field
Continuity + velocity + potential PDEs on a 256×64 grid. 0.2 ms solve.
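The specific continuity, velocity, and potential equations are not given on this page. As a generic stand-in only, the sketch below solves a discrete Poisson equation for a potential field with Jacobi iteration, treating obstacle cells as sources and holding the grid boundary at zero. It illustrates a deterministic grid solve, not FieldSpace's solver, and pure Python will be far slower than the quoted 0.2 ms.

```python
def solve_potential(nx, ny, sources, iters=200, h=1.0):
    """Jacobi iterations for the discrete Poisson equation lap(phi) = -rho.

    Grid is nx-by-ny with phi = 0 on the boundary (Dirichlet).
    sources: dict {(i, j): rho} for interior cells carrying a source term.
    """
    phi = [[0.0] * ny for _ in range(nx)]
    rho = [[0.0] * ny for _ in range(nx)]
    for (i, j), val in sources.items():
        rho[i][j] = val
    for _ in range(iters):
        new = [row[:] for row in phi]
        for i in range(1, nx - 1):
            for j in range(1, ny - 1):
                # Five-point stencil solved for the centre cell.
                new[i][j] = 0.25 * (phi[i + 1][j] + phi[i - 1][j]
                                    + phi[i][j + 1] + phi[i][j - 1]
                                    + h * h * rho[i][j])
        phi = new
    return phi


field = solve_potential(33, 17, {(16, 8): 1.0})
print(field[16][8] > field[15][8])  # True: the potential peaks at the source
```

Because the update order is fixed, two runs on the same input return identical arrays, which is the property the evidence layer relies on.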
Evidence
Go / slow / stop output, risk trace, and optional benchmark trajectory candidate.
A repeatable fallback trace, not a black-box alert.
When perception drops, route context breaks, or collision risk rises, FieldSpace records the active trigger, risk state, and recommended fallback phase for engineering review.
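A fallback trace entry of the kind described above might record the frame, the active trigger, the risk state, and the recommended go / slow / stop phase. The thresholds, trigger names, and escalation policy below are hypothetical placeholders, not FieldSpace's calibrated values.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Phase(str, Enum):
    GO = "go"
    SLOW = "slow"
    STOP = "stop"


# Hypothetical risk thresholds on a 0..1 scale; not calibrated FieldSpace values.
SLOW_AT = 0.3
STOP_AT = 0.7


@dataclass(frozen=True)
class FallbackRecord:
    frame: int
    trigger: str   # which condition drove the recommendation
    risk: float    # scalar risk state at this frame
    phase: Phase   # recommended fallback phase

    def to_json(self) -> str:
        # sort_keys gives a stable serialization for review and diffing
        return json.dumps(asdict(self), sort_keys=True)


def phase_for_risk(risk: float) -> Phase:
    if risk >= STOP_AT:
        return Phase.STOP
    if risk >= SLOW_AT:
        return Phase.SLOW
    return Phase.GO


def recommend(frame: int, risk: float, perception_ok: bool = True,
              route_ok: bool = True) -> FallbackRecord:
    phase = phase_for_risk(risk)
    if not perception_ok:
        trigger, phase = "perception_dropout", Phase.STOP  # hypothetical policy
    elif not route_ok:
        trigger = "route_context_lost"
        if phase is Phase.GO:
            phase = Phase.SLOW  # escalate at least to slow
    elif phase is not Phase.GO:
        trigger = "risk_threshold"
    else:
        trigger = "none"
    return FallbackRecord(frame, trigger, risk, phase)


print(recommend(0, 0.1).to_json())
```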
Safety Case Generator today. Counterfactual replay next. Insurer bridge after.
The first deliverable is a per-claim evidence pack for the operator's safety case. The same engine extends to counterfactual replay on incidents and actuarial-grade input for fleet underwriting.
Safety Case Generator
Per-claim deterministic evidence pack for the operator's published safety case.
- ✓ Populates five behavioral AC dimensions
- ✓ GSN-structured per-claim evidence pack
- ✓ Reproducibility-pinned benchmark traces
- ✓ UL 4600 + ISO 21448 aligned output
- ✓ Auditor and insurer re-run on same input
- ✓ CPU-only, no GPU, no training step
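One way to make a trace reproducibility-pinned, assuming the term means a content hash recorded alongside each claim, is to hash a canonical serialization so that anyone re-running the same input can compare digests. The helper names here are hypothetical.

```python
import hashlib
import json


def canonical_bytes(trace) -> bytes:
    """Deterministic serialization: sorted keys, fixed separators, NaN rejected."""
    return json.dumps(trace, sort_keys=True, separators=(",", ":"),
                      allow_nan=False).encode("utf-8")


def pin(trace) -> str:
    """SHA-256 hex digest of the canonical form of a trace."""
    return hashlib.sha256(canonical_bytes(trace)).hexdigest()


def verify(trace, expected_digest: str) -> bool:
    """True if a re-run trace matches the digest pinned in the evidence pack."""
    return pin(trace) == expected_digest


pinned = pin({"frame": 1, "risk": 0.25})
print(verify({"risk": 0.25, "frame": 1}, pinned))  # True: key order does not matter
```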
Counterfactual Replay
Drop in an incident log, get a deterministic should-have-happened trace tied to the affected safety claim.
- → Sensor or sim trace ingestion
- → Per-claim impact analysis from one incident
- → Causal-chain mapping into Section 2.2 of the framework
- → Safety-case revision delta on output
- → Defensible answer for post-incident review
Insurer Bridge
Quantitative ODD coverage and expected-loss inputs for the underwriters pricing AV fleets.
- ○ Quantitative ODD coverage metrics
- ○ Per-mile expected-loss distributions
- ○ Premium curve sensitivity to ODD restrictions
- ○ Reinsurance-ready evidence package
- ○ Same engine, different output shape
v1 lands as a paid validation engagement. v2 follows once an operator's incident-review process is connected. v3 is the insurer-channel multiplier: one underwriter win becomes a requirement at every fleet that underwriter prices.
Built for the standards conversation customers already have.
FieldSpace is not claiming vehicle-level certification. We are organizing the observer, replay, and evidence package around the frameworks OEM and Tier-1 safety teams use to review ADAS and autonomy systems.
Functional safety readiness
Supplier safety plan, SEooC assumptions, traceability, verification evidence, and tool-confidence path for validation use cases.
Triggering-condition evidence
Replayable edge cases, false-positive / false-negative review, ODD assumptions, and residual-risk documentation.
Scenario-based validation
Scenario taxonomy, ODD tags, source dataset, trigger type, review status, and pass/fail metrics for replay studies.
Safety-case structure
GSN-style argument skeletons and evidence registers that can become inputs to an OEM-owned safety case.
Cybersecurity and updates
Threat analysis, SBOM, vulnerability handling, release integrity, and update-impact planning for software delivery.
Customer data readiness
Security-control mapping for hosted replay, partner log handling, access review, retention, and supplier-quality review.
Current status: alignment and gap-assessment preparation. FieldSpace does not claim ISO certification, SOC 2, TISAX, UNECE approval, or vehicle-level compliance. Formal scope depends on assessor review and OEM integration context.
Safety Suite v1 results.
Five safety-critical scenarios were run in CARLA with synthetic ground truth. In every scenario, FieldSpace detected the hazard earlier than the baseline, raised zero false positives, and met all braking margins. Real-world replay against 182,505 frames of comma.ai openpilot drive logs is summarized below.
| SCENARIO | BASELINE LEAD | FIELDSPACE LEAD | GAIN |
|---|---|---|---|
| Falling Debris | -0.50s | +0.20s | +0.70s |
| Sudden Cut-In | -0.10s | +0.20s | +0.30s |
| Occluded Pedestrian | -0.30s | +0.20s | +0.50s |
| Stopped Vehicle | -0.40s | +0.20s | +0.60s |
| Sliding Cargo | -0.80s | +0.20s | +1.00s |
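The GAIN column is the FieldSpace lead time minus the baseline lead time. A quick check of that arithmetic, using the table's values and Python's `decimal` module to avoid binary rounding:

```python
from decimal import Decimal

# (baseline lead, FieldSpace lead, reported gain) in seconds, from the Safety Suite v1 table.
SUITE_V1 = {
    "Falling Debris": ("-0.50", "+0.20", "+0.70"),
    "Sudden Cut-In": ("-0.10", "+0.20", "+0.30"),
    "Occluded Pedestrian": ("-0.30", "+0.20", "+0.50"),
    "Stopped Vehicle": ("-0.40", "+0.20", "+0.60"),
    "Sliding Cargo": ("-0.80", "+0.20", "+1.00"),
}


def lead_gain(baseline: str, fieldspace: str) -> Decimal:
    """Gain = FieldSpace lead - baseline lead, in exact decimal arithmetic."""
    return Decimal(fieldspace) - Decimal(baseline)


def mismatched_rows(rows):
    """Scenarios whose reported gain differs from the recomputed one."""
    return [name for name, (base, fs, gain) in rows.items()
            if lead_gain(base, fs) != Decimal(gain)]


print(mismatched_rows(SUITE_V1))  # []
```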
182,505 frames of public drive data.
Frame-for-frame replay against comma.ai's openpilot CI route bucket: real cars, real roads, real radar and vision. Across 31 segments of real driving, FieldSpace emitted one warning event and zero false critical alerts, with 85% fewer spurious alerts than the prior observer.
Built for the entities carrying the safety liability.
Operators of autonomous fleets carry end-state liability and publish safety cases. Insurers underwrite those fleets and need third-party validation. Both are budget owners with a real problem the safety case generator solves.
Autonomous fleet operators
Robotaxi and autonomous delivery operators are already publishing formal safety cases for regulators and the public. FieldSpace is the evidence layer underneath.
Insurance and reinsurance underwriters
AV insurance teams need actuarial-grade safety evidence to price fleet risk. Winning one underwriter creates a downstream requirement at every fleet they price.
Three engagement patterns.
Start with a paid evidence pack against the operator's existing safety case. Extend into counterfactual replay on incidents. Connect into the insurer channel when the case is defensible enough to underwrite.
Safety Case Generator
Operator ships an ODD specification and stack summary. FieldSpace returns a per-claim evidence pack mapped to the behavioral acceptance criteria, with reproducibility-pinned artifacts for every claim.
Counterfactual Replay
Operator submits an incident log. FieldSpace returns the deterministic should-have-happened trace, the affected safety claim, and a structured delta for the next safety case revision.
Underwriting Bridge
Same engine, different output shape. Quantitative ODD coverage and expected-loss distributions for the underwriter pricing the fleet. Currently in scoping with mobility-focused reinsurance teams.

Your safety case needs a deterministic evidence layer.
Bit-reproducible. Mapped to the acceptance criteria framework you already use. Replayable by your auditors, regulators, and insurers. Bring the safety case. We populate the evidence.