Synset

Synthetic clinical trajectories, not just synthetic records.

Synset generates privacy-preserving synthetic EHR data with trajectory-level certification, from static clinical snapshots to support-certified acute-care trajectories across 96 sequential hourly transitions.

Built for clinical AI developers, researchers, bioinformaticians, and healthcare teams that need realistic data without direct PHI exposure.

Static record (certified) → Hour 1 (certified) → Hour 2 (certified) → Hour 24 (certified) → Hour 48 (certified) → Hour 72 (certified) → Hour 96 (certified)

Synthetic patient world loop

Watch trajectories move through support, certification, and generated evidence.

Inspired by the internal semantic-AE demo, this loop shows density support anchors, generated trajectory paths, hourly transition markers, and evidence artifacts emitted from the same synthetic clinical state.

  • 1,450 support anchors
  • 96 hourly transitions
  • 3 artifact channels

Faux generator

Evidence-grounded artifact stream

SYN-ACU-0421

Acute hypoxemic trajectory

Hour 48 / 96
MAP 72, HR 104, Lactate 2.1, Hgb 10.8
Note output

Synthetic ICU progress note: patient remains on high-flow oxygen with improving work of breathing after diuresis and bronchodilator support.

Structured evidence sidecar shows stable mean arterial pressure, mild tachycardia, falling lactate, and no certificate-breaking lab/vital transition.

Plan: continue oxygen wean, repeat metabolic panel, maintain strict intake/output tracking, and reassess antimicrobial need at next certified state point.

Faux synthetic artifact for product demonstration only. No PHI, no source IDs, no patient reconstruction.

Real-world clinical data is powerful. It is also hard to use.

EHR data is fragmented, incomplete, privacy-sensitive, and expensive to access. Researchers and AI teams spend months negotiating access, cleaning records, handling missingness, resolving inconsistencies, and building test datasets before a single model can be evaluated.

Synthetic data should not just imitate a spreadsheet.

Clinical AI needs data that captures how patient states evolve, how documentation changes, and how models behave when evidence is incomplete, noisy, or longitudinal.

Platform

Structured data, trajectories, notes, and dialogues from the same synthetic clinical world.

Structured Synthetic EHR Data

Generate privacy-preserving tabular clinical datasets for research, prototyping, and model development.

  • Static synthetic EHR records
  • Configurable feature completeness
  • Support and consistency checks
  • No direct patient reconstruction

Acute Longitudinal Trajectories

Generate synthetic acute-care trajectories with certified progression across sequential hourly transitions.

  • 96 sequential hourly transitions validated internally
  • Trajectory-level certification before release
  • Support checks across time, not just row-level plausibility
  • Designed for ICU / inpatient acute-care simulation

Evidence-Grounded Clinical Text

Attach locally generated synthetic progress notes, discharge-style summaries, portal messages, and patient-clinician dialogues as sidecars.

  • Grounded to structured synthetic evidence
  • Local LLM generation
  • No cloud LLM required by default
  • Text certificates and grounding audits

Clinical AI Validation

Stress-test clinical AI systems with controlled synthetic perturbations.

  • ICD coding robustness
  • Clinical summarization testing
  • Risk model stress tests
  • Missingness, wording, subgroup, and longitudinal perturbations

Most synthetic healthcare datasets generate rows. Synset generates clinical evolution.

A static synthetic record can look plausible in isolation. But real clinical data changes over time.

Synset evaluates synthetic trajectories across sequential transitions, checking whether generated patient states remain consistent with observed clinical support constraints.

Highlight metric

Support-certified acute-care trajectories across 96 sequential hourly transitions.

In internal validation, Synset's acute trajectory engine generated support-certified trajectories across four days of hourly clinical evolution. Each trajectory is evaluated as a longitudinal sequence, not merely as disconnected rows.

t0 → t1 → t2 → t3 → ... → t96

96 transitions. 97 state points including the initial anchor.
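
To make the counting concrete, here is a minimal Python sketch of trajectory-level certification. It is an illustration only: the `State` fields, the `in_support` check, and its numeric bounds are hypothetical stand-ins, not Synset's actual support model. The point it shows is that 97 state points yield 96 transitions, and that a trajectory passes only if every transition passes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class State:
    """One hourly synthetic state point (hypothetical fields)."""
    hour: int
    map_value: float
    heart_rate: float


def in_support(prev: State, nxt: State) -> bool:
    """Hypothetical transition check: consecutive hours, bounded hourly change."""
    return (
        nxt.hour == prev.hour + 1
        and abs(nxt.map_value - prev.map_value) <= 15.0
        and abs(nxt.heart_rate - prev.heart_rate) <= 25.0
    )


def certify_trajectory(
    states: Sequence[State],
    check: Callable[[State, State], bool] = in_support,
) -> bool:
    """A trajectory is certified only if every consecutive transition passes."""
    return all(check(a, b) for a, b in zip(states, states[1:]))


# 97 state points (t0 through t96) give 96 transitions.
trajectory = [
    State(hour=h, map_value=72.0 + 0.1 * h, heart_rate=104.0 - 0.2 * h)
    for h in range(97)
]
n_transitions = len(trajectory) - 1

# One implausible jump anywhere in the sequence fails the whole trajectory.
broken = list(trajectory)
broken[50] = State(hour=50, map_value=140.0, heart_rate=94.0)
```

Here `certify_trajectory(trajectory)` is true and `certify_trajectory(broken)` is false, even though every row of `broken` might look plausible in isolation.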

Validation

Every released trajectory should earn its way out.

Synset separates generation from certification. Candidate trajectories are generated, checked, filtered, and audited before they are treated as releasable synthetic data.
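
The separation of generation from certification can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the random-walk generator, the `MAX_STEP` bound, and the function names are hypothetical, and the real checks are far richer. The structure is what matters: generate candidates, gate them, then audit only the accepted set with a separately written check.

```python
import random
from typing import List, Tuple

Trajectory = List[float]
MAX_STEP = 6.0  # hypothetical bound on hourly change


def generate_candidates(n: int, transitions: int, seed: int = 0) -> List[Trajectory]:
    """Toy generator: a random walk standing in for a real trajectory model."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n):
        value = 70.0
        traj = [value]
        for _ in range(transitions):
            value += rng.uniform(-8.0, 8.0)
            traj.append(value)
        candidates.append(traj)
    return candidates


def certify(traj: Trajectory) -> bool:
    """Release gate: every hourly step must stay within the bound."""
    return all(abs(b - a) <= MAX_STEP for a, b in zip(traj, traj[1:]))


def audit(traj: Trajectory) -> bool:
    """Independent re-check, written separately from the release gate."""
    for i in range(1, len(traj)):
        if abs(traj[i] - traj[i - 1]) > MAX_STEP:
            return False
    return True


def run_pipeline(n: int, transitions: int) -> Tuple[List[Trajectory], int, int]:
    """Generate, filter, then audit only the accepted trajectories."""
    candidates = generate_candidates(n, transitions)
    released = [t for t in candidates if certify(t)]
    rejected = len(candidates) - len(released)
    audit_violations = sum(not audit(t) for t in released)
    return released, rejected, audit_violations


released, rejected, audit_violations = run_pipeline(n=200, transitions=5)
```

The audit count is computed over accepted trajectories only, which is why it says nothing about rejected candidates or about held-out future data.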

Acute Trajectory Validation

96 hourly transitions

Support-certified acute-care trajectory generation through 96 sequential one-hour steps.

Accepted-Only Validator Audit

0 / 14,143

Validator violations observed among 14,143 accepted acute transition candidates in the accepted-only audit.

Independent accepted-only validator audit, not held-out future patient accuracy.

Conservative Upper Bound

~0.021%

Approximate one-sided 95% upper bound on the accepted-only acute validator violation rate, from the rule of three (3 / 14,143) with zero observed violations.
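
The bound can be reproduced with a short Python sketch. It assumes the audited checks behave as independent trials with a common violation probability, which is a modeling assumption rather than something the audit itself establishes. The function names are illustrative.

```python
def rule_of_three_upper(n: int) -> float:
    """Approximate one-sided 95% upper bound on a rate when 0 events are seen in n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 3.0 / n


def exact_upper_zero_events(n: int, confidence: float = 0.95) -> float:
    """Exact bound for 0 events in n trials: solve (1 - p)^n = 1 - confidence for p."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


n = 14_143
approx = rule_of_three_upper(n)
exact = exact_upper_zero_events(n)
print(f"rule of three: {approx:.4%}, exact: {exact:.4%}")
```

Both values round to about 0.021%; the rule of three is slightly more conservative than the exact zero-event bound.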

Reanchored Yield

~7x

Improvement in final-horizon certified yield from reanchoring, measured in internal h96 (96-hour horizon) validation.

Internal DGX validation; not a held-out future outcome claim.

Validation distinguishes between candidates rejected before release, accepted trajectories that pass certification, independent post-certification audits, and held-out future validation when row-level future data is available.

Chronic trajectory modeling is under active development.

Synset's chronic/outpatient transition prototype is aggregate-prior-distilled and calibrated against longitudinal empirical transition priors. It is not yet marketed as held-out future patient trajectory prediction.

Diligence-ready statement

In withheld empirical-prior validation, the optimized chronic release policy achieved 0 observed final violations across 3,953 withheld-prior audit checks, with a conservative one-sided 95% upper bound of approximately 0.076%.

This is withheld empirical-prior validation, not held-out patient-future accuracy.
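
The 0.076% figure follows from standard zero-event reasoning. Assuming the 3,953 audit checks can be treated as independent trials with a common violation probability (an assumption of this sketch), the one-sided 95% upper bound is the largest rate that would still produce zero violations with probability 0.05:

```latex
\[
(1 - p_{\mathrm{upper}})^{n} = 0.05
\;\Longrightarrow\;
p_{\mathrm{upper}} = 1 - 0.05^{1/n} \approx \frac{-\ln 0.05}{n} \approx \frac{3}{n},
\qquad
n = 3953:\quad \frac{3}{3953} \approx 7.6 \times 10^{-4} \approx 0.076\%.
\]
```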

Public-safe version: Synset is developing chronic/outpatient trajectory generation calibrated against withheld empirical longitudinal-prior constraints.

Clinical AI validation

Stress-test clinical AI before it fails in the real world.

Synset can generate clinically plausible perturbations of structured data, notes, dialogues, missingness patterns, subgroup context, and longitudinal evolution.

Where does the model become unstable?

Synset turns synthetic clinical generation into model-under-test infrastructure.

Generate the world. Perturb the evidence. Audit the model.

Example: ICD Coding Model Audit

Generate thousands of clinically plausible variations of coding inputs and identify where an ICD model becomes unstable, overconfident, or sensitive to:

  • note wording
  • missing labs
  • longitudinal context
  • care setting
  • disease progression
  • subgroup context
  • documentation style
  • patient-clinician dialogue
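
As a rough illustration of one such audit, the Python sketch below measures how often a model's output flips when labs are randomly removed. The `toy_model`, its placeholder labels, and the `drop_labs` perturbation are all hypothetical; a real audit would wrap an actual ICD coding model and use clinically informed perturbations.

```python
import random
from typing import Callable, Dict, Optional

Record = Dict[str, Optional[float]]


def toy_model(record: Record) -> str:
    """Hypothetical model under test; labels are placeholders, not real ICD codes."""
    lactate = record.get("lactate")
    if lactate is None:
        return "CODE_UNSPECIFIED"
    return "CODE_A" if lactate >= 2.0 else "CODE_B"


def drop_labs(record: Record, rate: float, rng: random.Random) -> Record:
    """Missingness perturbation: blank each value independently with probability `rate`."""
    return {k: (None if rng.random() < rate else v) for k, v in record.items()}


def flip_rate(
    model: Callable[[Record], str],
    record: Record,
    perturb: Callable[[Record, random.Random], Record],
    trials: int,
    seed: int = 0,
) -> float:
    """Fraction of perturbed inputs whose prediction differs from the baseline."""
    rng = random.Random(seed)
    baseline = model(record)
    flips = sum(model(perturb(record, rng)) != baseline for _ in range(trials))
    return flips / trials


record: Record = {"lactate": 2.1, "map": 72.0, "hr": 104.0, "hgb": 10.8}
rate_none = flip_rate(toy_model, record, lambda r, rng: drop_labs(r, 0.0, rng), 500)
rate_half = flip_rate(toy_model, record, lambda r, rng: drop_labs(r, 0.5, rng), 500)
```

With no missingness the toy model never flips; with half the labs dropped it flips whenever lactate disappears, which is exactly the kind of sensitivity an audit is meant to surface.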

Configurable realism

Preserve real-world messiness, or control it.

Real EHR data is messy because real healthcare is messy. Sometimes researchers need that realism. Sometimes they need analysis-ready completeness. Synset supports both modes.

Realistic Mode

Preserve:

  • missingness patterns
  • measurement-density variation
  • incomplete follow-up
  • care-setting variation
  • noisy documentation patterns

Analysis-Ready Mode

Require:

  • complete features
  • clean longitudinal windows
  • selected lab/vital availability
  • specific time horizons
  • trajectory certification before release

Synset does not pretend clinical data is perfect. It gives users control over the realism/completeness tradeoff.
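
One way to picture the two modes is as a validated configuration object. The sketch below is hypothetical: the `GenerationConfig` class, its field names, and its defaults are invented for illustration and do not describe Synset's actual interface.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical settings for the realism/completeness tradeoff."""
    mode: str
    preserve_missingness: bool
    require_complete_features: bool
    required_measurements: Tuple[str, ...] = ()
    horizon_hours: int = 96
    certify_before_release: bool = True

    def __post_init__(self) -> None:
        if self.mode not in ("realistic", "analysis_ready"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.preserve_missingness and self.require_complete_features:
            raise ValueError("cannot preserve missingness and require complete features")
        if self.horizon_hours <= 0:
            raise ValueError("horizon_hours must be positive")


# Realistic mode keeps missingness patterns and incomplete follow-up.
REALISTIC = GenerationConfig(
    mode="realistic",
    preserve_missingness=True,
    require_complete_features=False,
)

# Analysis-ready mode requires complete features and selected labs/vitals.
ANALYSIS_READY = GenerationConfig(
    mode="analysis_ready",
    preserve_missingness=False,
    require_complete_features=True,
    required_measurements=("map", "hr", "lactate"),
)
```

Rejecting contradictory settings at construction time keeps the two modes from silently blending into each other.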

Text sidecars

Synthetic notes and dialogues grounded to structured synthetic evidence.

Synset can attach locally generated clinical text sidecars to accepted synthetic trajectories, including progress notes, discharge-style summaries, patient-clinician dialogues, portal-message-style threads, and certificate-aware failure appendices.

The structured synthetic EHR remains the source of truth. Text is generated as a sidecar, validated against structured evidence, and rejected if it introduces unsupported facts.
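
A deliberately simple Python sketch of that rejection rule is shown here. It assumes numeric claims appear as a measurement name followed by a value, which is a strong simplification; the pattern, the measurement list, and the function names are hypothetical, and a real grounding audit would need to handle far more than numbers.

```python
import re
from typing import Dict, List

# Hypothetical measurement vocabulary for this sketch.
KNOWN_MEASUREMENTS = ("map", "hr", "lactate", "hgb")
CLAIM = re.compile(
    r"\b(" + "|".join(KNOWN_MEASUREMENTS) + r")\s+(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def unsupported_claims(text: str, evidence: Dict[str, float], tol: float = 1e-6) -> List[str]:
    """List numeric claims in the text that the structured evidence does not support."""
    problems = []
    for name, value in CLAIM.findall(text):
        key = name.lower()
        if key not in evidence:
            problems.append(f"{key}: no structured evidence")
        elif abs(evidence[key] - float(value)) > tol:
            problems.append(f"{key}: text says {value}, evidence says {evidence[key]}")
    return problems


def accept_sidecar(text: str, evidence: Dict[str, float]) -> bool:
    """Accept the text only if it introduces no unsupported numeric claims."""
    return not unsupported_claims(text, evidence)


evidence = {"map": 72.0, "hr": 104.0, "lactate": 2.1}
good = "MAP 72 and HR 104 are stable; lactate 2.1 is falling."
bad = "Lactate 4.0 is rising and Hgb 10.8 is stable."
```

The `good` note is accepted. The `bad` note is rejected twice over: its lactate value contradicts the evidence, and it cites a hemoglobin value the structured record never contained.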

  • Local vLLM / Qwen generation
  • No cloud LLM required by default
  • Text certificates
  • Grounding audits
  • Redaction checks
  • Think-block stripping
  • No source IDs
  • No patient reconstruction
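
Think-block stripping, for example, can be as small as one regular expression. This sketch assumes the local model wraps its intermediate reasoning in `<think>...</think>` tags; the tag name and the helper are assumptions for illustration, not a description of Synset's pipeline.

```python
import re

# Assumed delimiter for model reasoning; adjust if the model uses a different tag.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_think_blocks(text: str) -> str:
    """Remove reasoning blocks so only the final note text is kept."""
    return THINK_BLOCK.sub("", text).strip()


raw = (
    "<think>\nDraft reasoning about the note.\n</think>\n\n"
    "Synthetic ICU progress note: patient remains on high-flow oxygen."
)
clean = strip_think_blocks(raw)
```

The non-greedy match with `re.DOTALL` removes each block separately, including blocks that span several lines.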

Use cases

Built for teams that need clinical data they can actually use.

Clinical AI Companies

Validate and stress-test models before deployment.

Bioinformatics Teams

Prototype and benchmark without waiting months for data access.

Researchers

Generate privacy-preserving cohorts with configurable completeness and longitudinal structure.

Hospitals and Health Systems

Evaluate AI models without exposing protected patient data.

Education and Simulation

Create realistic synthetic clinical scenarios, notes, and dialogues for training and demonstration.

Claims

Measured claims, not hand-waving.

Public-Ready

  • Support-certified acute-care trajectories across 96 sequential hourly transitions.
  • Structured synthetic EHR generation with trajectory-level quality controls.
  • Optional evidence-grounded clinical note/dialogue sidecars.
  • Configurable missingness and completeness.

Diligence-Ready

  • 0 of 14,143 accepted acute transition candidates showed validator violations in the accepted-only audit.
  • Approximate one-sided 95% upper bound on the acute violation rate: ~0.021%.
  • Reanchoring improved final-horizon h96 certified yield by ~7x.
  • Chronic prototype achieved 0 / 3,953 final withheld-prior violations after secondary release gating.
  • Approximate 95% upper bound for chronic withheld-prior violations: ~0.076%.

Not Claimed

  • Perfect data.
  • Clinical truth.
  • True future patient prediction.
  • Synthetic control arms.
  • Causal treatment effects.
  • Production-ready clinical decision support.
  • Held-out patient-future accuracy for chronic.

Build, test, and validate clinical AI with synthetic patient worlds.

Synset is opening pilot collaborations with clinical AI teams, researchers, and healthcare organizations interested in privacy-preserving synthetic EHR data, longitudinal trajectory generation, and clinical AI robustness testing.