Michigan State researcher details Macromorphoscopic Databank, a tool to standardize ancestry estimates
Summary
Joseph Heffner of Michigan State described the Macromorphoscopic Databank (MAMD) and the MMS analytical program, which pair standardized trait definitions and a growing global reference dataset to produce probabilistic ancestry classifications and measure observer error.
Joseph Heffner, an assistant professor at Michigan State University and a board-certified forensic anthropologist, presented the Macromorphoscopic Databank (MAMD) and the MMS analytical program during a forensic anthropology session. He said the project, funded in part by the National Institute of Justice, aims to standardize macromorphoscopic trait collection and provide probabilistic ancestry estimates to support forensic identification.
Heffner framed the problem as one of data and standardization. "This is something that's been on my mind for about probably 10 or 15 years," he said, explaining that earlier trait studies suffered from small, inconsistent samples and unclear trait definitions. The project’s three goals are to develop large-scale standardized protocols, to assemble a geographically diverse reference database, and to build an analytical program that classifies geographic origin from trait frequencies.
The software Heffner described, MMS at version 1.6.1, encodes up to 17 macromorphoscopic traits with precise character-state definitions and stores records in Advantage Data Architect relational tables. He said the Smithsonian’s macromorphoscopic data boosted the databank by roughly 4,000 individuals; the full MAMD currently houses about 7,500 individuals, including roughly 3,000 modern specimens. Heffner reported known-age coverage for about 71% of the modern sample and self-identified ancestry for about 84.7% of those individuals, and emphasized that those figures will change as collection continues.
On methods, Heffner described tests of several classification techniques. He called artificial neural networks and support vector machines "black box" methods and noted that k-nearest neighbors mimics human similarity judgments. He highlighted canonical analysis of principal coordinates (CAP), adapted from ecological statistics, as particularly useful because it transforms categorical trait data into a form usable by traditional discriminant approaches. He said early tests of a three-group model returned classification accuracies above 90%.
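The k-nearest neighbors idea mentioned above can be sketched in a few lines. This is a minimal illustration, not the MMS implementation: the group labels, trait scores, the simple mismatch distance, and the choice of k = 3 are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical reference records: (group label, tuple of trait scores).
# Labels and scores are invented for illustration; they are not databank data.
REFERENCE = [
    ("Group A", (1, 2, 1, 3)),
    ("Group A", (1, 2, 2, 3)),
    ("Group A", (1, 3, 1, 3)),
    ("Group B", (3, 1, 3, 1)),
    ("Group B", (3, 1, 2, 1)),
    ("Group B", (2, 1, 3, 1)),
]


def mismatch_distance(a, b):
    """Count the traits on which two score vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def knn_classify(scores, reference, k=3):
    """Return (majority group, vote share) among the k closest records."""
    ranked = sorted(reference, key=lambda rec: mismatch_distance(scores, rec[1]))
    votes = Counter(label for label, _ in ranked[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k


if __name__ == "__main__":
    print(knn_classify((1, 2, 1, 3), REFERENCE))
```

The vote share is only a rough stand-in for a probability; a production tool would presumably calibrate it against a much larger reference sample.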
Heffner gave a practical example: a set of scores from a Hawaii lab was initially classified into a pooled Pacific Island grouping; drilling down in the databank showed the closest matches to be Borneo, Indonesia and the Solomon Islands. He advised reporting multiple close matches rather than asserting a single definitive origin when groups are tightly clustered. "You're gonna want to report all of those groups," he said.
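One way to act on that advice is to report every group whose estimated probability falls close to the best match. The sketch below assumes a classifier has already produced per-group probabilities; the numbers, the "Other" group, and the 0.10 tolerance are invented for illustration and do not come from the talk.

```python
def close_matches(posteriors, tolerance=0.10):
    """Return (group, probability) pairs within `tolerance` of the best, best first."""
    best = max(posteriors.values())
    keep = [(group, p) for group, p in posteriors.items() if best - p <= tolerance]
    return sorted(keep, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical probabilities, not results reported by Heffner.
    example = {
        "Borneo": 0.36,
        "Indonesia": 0.33,
        "Solomon Islands": 0.27,
        "Other": 0.04,
    }
    for group, p in close_matches(example):
        print(f"{group}: {p:.2f}")
```

With these made-up values, the three tightly clustered groups are all reported and the distant fourth is dropped.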
Heffner also outlined efforts to measure and reduce observer error. Using Smithsonian data, his team tested whether observer identity could be predicted from trait scoring patterns; traits with more states (4–5) were most informative about observer differences. He said the team has removed some longstanding traits that did not perform reliably and added new traits submitted by collaborators.
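Heffner's team approached observer error by trying to predict the observer from scoring patterns. A simpler, widely used complement is Cohen's kappa, which measures how far two observers agree on a trait beyond what chance would produce. The sketch below substitutes that technique for the one described in the talk, and the scores in the usage example are invented.

```python
from collections import Counter


def cohens_kappa(scores_a, scores_b):
    """Cohen's kappa for two observers scoring the same specimens on one trait."""
    n = len(scores_a)
    if n == 0 or n != len(scores_b):
        raise ValueError("need two non-empty score lists of equal length")
    # Proportion of specimens on which the observers gave the same score.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Agreement expected by chance, from each observer's score frequencies.
    freq_a, freq_b = Counter(scores_a), Counter(scores_b)
    expected = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used a single identical score throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical scores for one trait on eight specimens.
    observer_1 = [1, 2, 3, 1, 2, 3, 1, 2]
    observer_2 = [1, 2, 3, 1, 2, 2, 1, 3]
    print(round(cohens_kappa(observer_1, observer_2), 3))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so traits with persistently low values would be candidates for redefinition or removal.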
Heffner said the combined databank and MMS will allow practitioners to attach quantified uncertainty to ancestry statements. "We can start saying I'm gonna be wrong but I'm also gonna be right 80% of the time," he said, characterizing the aim as verified and validated probabilistic reporting rather than purely subjective judgments.
He closed by thanking NIJ, contributors to the databank, his graduate students who collected data (including on a recent trip to Thailand), Michigan State faculty, and Candace Linde for photographs and editorial comments. He noted that MMS was still under testing and refinement and that continued data-sharing with other collections will be central to improving resolution and reducing error.
