medwireNews: Researchers have used machine learning to construct a model that predicts women’s risk for gestational diabetes with high accuracy, even when applied at the very start of pregnancy.
The model “substantially” outperformed a standard baseline risk score, say Eran Segal (Weizmann Institute of Science, Rehovot, Israel) and study co-authors, and its accuracy was only slightly reduced when they simplified it to nine questions that women could answer themselves.
The team drew data from nationwide electronic health records for 368,351 women who had 588,622 pregnancies between 2010 and 2017. In this cohort, 3.9% developed gestational diabetes, diagnosed with a glucose challenge test plus an oral glucose tolerance test.
The machine learning predictive model developed from these data had an area under the receiver operating characteristic (ROC) curve of 0.854, where 1.0 indicates perfect discrimination between women with and without gestational diabetes and 0.5 indicates performance no better than chance.
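As a minimal sketch of what this metric measures, the area under the ROC curve can be read as the probability that a randomly chosen woman who developed gestational diabetes receives a higher risk score than a randomly chosen woman who did not. The function name `roc_auc` and the toy labels and scores below are invented for illustration; they are not study data and this is not the authors' code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive case outscores a negative case,
    with ties counted as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical values): 1 = developed gestational diabetes.
labels = [0, 0, 0, 0, 1, 1]
informative_scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.9]
uninformative_scores = [0.5] * 6  # same score for everyone

print(roc_auc(labels, informative_scores))    # 7 of 8 pairs ranked correctly
print(roc_auc(labels, uninformative_scores))  # every pair tied
```

The pairwise loop is quadratic in the number of cases, so it suits small illustrations only; a cohort of the size described here would call for a rank-based implementation.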
This was a marked improvement on the performance of a standard baseline risk score, which gave an area under the ROC curve of 0.682. The machine learning model outperformed the baseline risk score in the whole cohort, and also in subgroups of women at high risk according to their baseline score and those with a first pregnancy.
The baseline risk score comprised seven variables recommended by the US National Institutes of Health (age, overweight, family history of diabetes, previous pregnancy complications, prediabetes, polycystic ovary syndrome [PCOS], and vascular risk factors). The machine learning model, by contrast, used 2355 variables.
However, when the researchers looked at the relative importance of these variables, they found that a fairly small number of features made the largest contributions to women’s gestational diabetes risk. Key among these was the result of a glucose challenge test in a previous pregnancy, which was a considerably more powerful predictor than even a previous gestational diabetes diagnosis.
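This article does not say how the researchers ranked the variables. One generic way to estimate a feature's contribution is permutation importance: shuffle one feature at a time and measure how much the model's discrimination drops. The sketch below illustrates that general idea only, not the study's method; `toy_model`, the two features, and all values are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """Pairwise area under the ROC curve; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(model, rows, labels, n_features, repeats=20, seed=0):
    """Mean drop in AUC when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = roc_auc(labels, [model(r) for r in rows])
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by a shuffled value.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - roc_auc(labels, [model(r) for r in shuffled]))
        importances.append(sum(drops) / repeats)
    return importances


def toy_model(row):
    """Hypothetical model: risk depends only on feature 0; feature 1 is ignored."""
    return row[0]


# Hypothetical rows of (feature 0, feature 1); 1 = developed gestational diabetes.
rows = [(0.1, 5.0), (0.2, 1.0), (0.3, 4.0), (0.7, 2.0), (0.8, 3.0), (0.9, 6.0)]
labels = [0, 0, 0, 1, 1, 1]

importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)  # feature 0 shows a positive drop; feature 1 shows none
```

A feature the model ignores produces no drop when shuffled, while a feature the model relies on produces a clear one, which is the pattern a "small number of influential features" would show.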
The findings “drove us to try and establish a simpler prediction model based on a minimal number of the most influential features,” write Segal and team in Nature Medicine.
The result was a survey of just nine questions covering the following predictive variables:
- weight and height;
- first-degree relatives with diabetes;
- risk factors including vascular disease, PCOS, prediabetes, and previous gestational diabetes;
- glycated hemoglobin value; and
- previous pregnancies and the results of any associated glucose challenge tests.
A model based on these variables had an area under the ROC curve of 0.799 in 417,601 women with no missing values for these data, compared with 0.678 for the baseline risk score.
“Our predictive model could become the basis for a selective screening process for [gestational diabetes] diagnosis, and for identification and implementation of early-stage pregnancy interventions to prevent or reduce the development of [gestational diabetes] and its associated adverse health outcomes,” conclude the researchers.
medwireNews is an independent medical news service provided by Springer Healthcare. © 2020 Springer Healthcare, part of the Springer Nature group.