Abstract
Clinical artificial intelligence (AI) is trained on coded electronic health record (EHR) and claims data. Survivorship and informative-censoring bias are recognised, but usually framed as analytic problems. We ask a prior, structural question: is death, the outcome that matters most, even present in the coded substrate that models learn from? We examined two substrates: MIMIC-IV, a real hospital EHR with ICD-coded diagnoses and administrative death fields; and a Synthea cohort with SNOMED CT coded conditions, in which the generator knows every death exactly (ground truth). Death was essentially never coded as a diagnosis or condition: 0 of 4,506 diagnosis rows in MIMIC-IV and 0 of 118 Synthea concepts encoded death, despite ground-truth mortality of 31% and 31.3%, respectively. In MIMIC-IV, 51.6% of deaths were invisible to encounter data, knowable only via an external death-record link. Decedents carried more diagnoses than survivors in real data (18.8 vs 14.9 per admission) but not in synthetic data, locating the survivorship gradient in real-world coding rather than in the generative model. Yet the dead are not deleted: in JSON and FHIR exports, decedents' full histories persisted (about 2,980 resources each) and 0 of 104,291 clinical resources carried a death flag, which was confined to one demographic field. The dead are therefore simultaneously invisible to care and immortal in storage, a permanence that documentation-retention duties mandate. A clinical AI trained on coded data has no death feature, learns the dead as if living, and cannot tell.


