
Development and validation of gestational diabetes treatment modality predictive models using supervised machine learning: a population-based cohort study – BMC Medicine

Written by adrina

Study population and design

The study population was drawn from members of Kaiser Permanente Northern California (KPNC), an integrated health care system with 4.5 million members. KPNC membership accounts for approximately 30% of the underlying population and is socio-demographically representative of the population residing in the geographic areas served [11, 12]. The integrated information system allows predictors and outcomes to be quantified across the entire pregnancy continuum. Individuals with gestational diabetes mellitus (GDM) were identified by searching the KPNC Pregnancy Glucose Tolerance and GDM Registry, an active surveillance registry that downloads laboratory data to determine screening and diagnosis for GDM and automatically excludes individuals with pre-existing type 1 or type 2 diabetes. Specifically, pregnant women at KPNC receive universal screening (98%) for GDM at 24–28 weeks' gestation using the 50-g, 1-h glucose challenge test (GCT) [1]. If the screening test is abnormal, a diagnostic 100-g, 3-h oral glucose tolerance test (OGTT) is performed after an 8–12-h fast. GDM is determined by meeting either of the following criteria: (1) ≥ 2 OGTT plasma glucose values that meet or exceed the Carpenter-Coustan thresholds: 1-h 180 mg/dL, 2-h 155 mg/dL, and 3-h 140 mg/dL; or (2) 1-h GCT ≥ 180 mg/dL and a fasting glucose ≥ 95 mg/dL, performed alone or during the OGTT [13, 14]. Plasma glucose measurements were performed using the hexokinase method at the KPNC regional laboratory, which participates in the College of American Pathologists accreditation and monitoring program [15]. This data-only project was approved by the KPNC Institutional Review Board, which waived the requirement for informed consent from participants.
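
As an illustration only, the two diagnostic criteria above can be written as a small rule. The function name, argument names, and dictionary keys are hypothetical and are not taken from the registry software; the thresholds are the ones listed in the text.

```python
from typing import Optional

# Thresholds (mg/dL) exactly as listed in the text above.
OGTT_THRESHOLDS = {"1h": 180, "2h": 155, "3h": 140}
GCT_THRESHOLD = 180      # 1-h, 50-g glucose challenge test
FASTING_THRESHOLD = 95   # fasting glucose


def meets_gdm_criteria(ogtt: Optional[dict] = None,
                       gct_1h: Optional[float] = None,
                       fasting: Optional[float] = None) -> bool:
    """Return True if either criterion described in the text is met.

    ogtt    -- hypothetical dict of OGTT values, e.g. {"1h": 185, "2h": 160}
    gct_1h  -- 1-h GCT plasma glucose
    fasting -- fasting plasma glucose (measured alone or during the OGTT)
    """
    # Criterion 1: two or more OGTT values at or above their thresholds.
    elevated = 0
    if ogtt:
        elevated = sum(
            1 for key, cutoff in OGTT_THRESHOLDS.items()
            if ogtt.get(key) is not None and ogtt[key] >= cutoff
        )
    criterion_1 = elevated >= 2

    # Criterion 2: GCT >= 180 mg/dL together with fasting glucose >= 95 mg/dL.
    criterion_2 = (
        gct_1h is not None and fasting is not None
        and gct_1h >= GCT_THRESHOLD and fasting >= FASTING_THRESHOLD
    )
    return criterion_1 or criterion_2
```

For example, an OGTT with elevated 1-h and 2-h values satisfies criterion 1, while an elevated GCT with a fasting glucose of 90 mg/dL satisfies neither.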

Among 405,557 pregnancies with gestational age at delivery ≥ 24 weeks delivered at 21 KPNC hospitals from January 1, 2007 to December 31, 2017, we excluded 375,041 (92.5%) pregnancies without GDM. Among the 30,516 GDM pregnancies, we further excluded those with GDM diagnosed before universal GDM screening (n = 42), yielding an analytical sample of 30,474 GDM-complicated pregnancies. We then derived a discovery set of 27,240 GDM-complicated pregnancies from 2007 to 2016 and a temporal/future validation set of 3,234 GDM-complicated pregnancies in 2017 (Fig. 1).
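
The cohort counts above are internally consistent, which can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those reported in the text.

```python
# Counts as reported in the text.
total_pregnancies = 405_557
without_gdm = 375_041
diagnosed_before_screening = 42
discovery_set = 27_240     # 2007-2016
validation_set = 3_234     # 2017

# Step-by-step derivation of the analytical sample.
gdm_pregnancies = total_pregnancies - without_gdm
analytic_sample = gdm_pregnancies - diagnosed_before_screening

# Share of pregnancies excluded for not having GDM, in percent.
pct_excluded = round(100 * without_gdm / total_pregnancies, 1)

print(gdm_pregnancies, analytic_sample, pct_excluded)
```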

Fig. 1

Flowchart of the derivation of the gestational diabetes cohort, 2007–2017. GDM: gestational diabetes mellitus

Outcome ascertainment

Individuals diagnosed with GDM received a universal referral to the KPNC Regional Perinatal Service Center for a supplemental care program beyond their standard prenatal care. Medical nutrition therapy (MNT) was the first-line therapy. If glycemic control goals were not met with MNT alone, pharmacologic treatment was initiated. After counseling on the risks and benefits of oral agents versus insulin, pharmacologic treatment was chosen through a shared patient-physician decision-making model: (1) oral agents such as glyburide and metformin were added to MNT and, if optimal glycemic control was still not achieved, oral medication was escalated to insulin therapy; or (2) insulin therapy was initiated directly beyond MNT (an additional table shows this in more detail [see Additional file 1]). We searched the pharmacy information management database for prescriptions of oral agents (glyburide 97.9%, metformin or other) and insulin following GDM diagnosis. Treatment modality was grouped into MNT only and pharmacologic treatment (oral agents and/or insulin) beyond MNT. Notably, despite an overall large sample size, we grouped oral agents (32.6% of the total population) and insulin (6.2%) into pharmacologic treatment because of insufficient power to predict insulin as a separate outcome.

Candidate predictors

Based on risk factors associated with GDM treatment modality and on clinician input, we selected 176 (64 continuous and 112 categorical) candidate sociodemographic, behavioral, and clinical predictors derived from electronic health records for model development. Candidate predictors were classified into four levels based on their availability at different stages of pregnancy (an additional table shows this in more detail [see Additional file 2]): level 1 predictors (n = 68) were available at baseline and dated back to 1 year before the index pregnancy; level 2 predictors (n = 26) were measured from the last menstrual period to before GDM diagnosis; level 3 predictors (n = 12) were available at the time of GDM diagnosis; and level 4 predictors (n = 70) comprised self-monitored blood glucose (SMBG) values, the primary measure of glycemic control during pregnancy as recommended by the American Diabetes Association [5], measured in the first week after GDM diagnosis. All predictors, levels 1–4, were measured before the outcome of interest (i.e., the last line of GDM treatment). Pregnant individuals with GDM in our study population had a mean of 11.8 weeks (SD: 6.6 weeks) of SMBG measurements between GDM diagnosis and delivery. We included only SMBG data from the first week after GDM diagnosis to allow for earlier prediction, as there is an average delay of 5.6 weeks between GDM diagnosis and the offer of optimal treatment.

Notably, individuals with GDM were universally offered enrollment in a supplemental GDM care program administered by nurses and dietitians via telemedicine from the KPNC Regional Perinatal Service Center [16]. All individuals with GDM were instructed to self-monitor and record glucose measurements four times a day: fasting before breakfast and 1 h after the start of each meal. SMBG measurements were then reported to nurses or registered dietitians during weekly telephone counseling calls from program entry through delivery, and the data were recorded in the Patient Reported Capillary Glucose Clinical Database.
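
Because the four levels become available in sequence during pregnancy, nested predictor sets (level 1, levels 1-2, levels 1-3, and levels 1-4) grow cumulatively. The short sketch below only tallies the counts given above; the dictionary and its labels are illustrative.

```python
from itertools import accumulate

# Number of candidate predictors per level, as reported in the text.
LEVEL_COUNTS = {1: 68, 2: 26, 3: 12, 4: 70}

# Running totals for the nested predictor sets.
labels = ["1", "1-2", "1-3", "1-4"]
cumulative = dict(zip(labels, accumulate(LEVEL_COUNTS.values())))

for label, n_predictors in cumulative.items():
    print(f"levels {label}: {n_predictors} predictors")
```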

Statistical analysis

Preprocessing

We imputed missing values using the random forest algorithm because it does not require parametric model assumptions, which can reduce predictive efficiency (an additional table shows this in more detail [see Additional file 2]). We assessed the estimate of the true imputation error using the normalized mean squared error for continuous variables and the proportion of misclassified entries for categorical variables. Both values were close to 0, indicating good imputation performance (an additional table shows this in more detail [see Additional file 3]). After preprocessing, we used Student's t-test and Pearson's chi-squared test to compare participant characteristics between the discovery and temporal/future validation sets. We performed the Mann-Kendall test to examine secular trends in GDM treatment modalities across calendar years. The discovery set (2007–2016) was stratified by calendar year and treatment modality for 10-fold cross-validation. The temporal/future validation set (2017) was stratified by treatment modality for the calculation of cross-validated predictive performance.
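
To make the two imputation error measures concrete, here is a minimal sketch. The exact normalization used in the study is not stated, so dividing the mean squared error by the variance of the true values is an assumption, and both function names are hypothetical. In practice the "true" entries would be observed values that were held out and then imputed.

```python
from statistics import mean, pvariance


def normalized_mse(true_values, imputed_values):
    """Mean squared error divided by the variance of the true values.

    Assumption: the study's exact normalization is not given in the text.
    """
    mse = mean((t - i) ** 2 for t, i in zip(true_values, imputed_values))
    return mse / pvariance(true_values)


def proportion_misclassified(true_labels, imputed_labels):
    """Share of categorical entries whose imputed label differs from the truth."""
    wrong = sum(t != i for t, i in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)
```

Both measures equal 0 when every imputed entry matches the truth, which is why values close to 0 are read as good imputation performance.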

Variable selection and full model development and comparison

We performed prediction using classification and regression tree (CART), least absolute shrinkage and selection operator (LASSO) regression, and super learner (SL), with predictors at levels 1, 1–2, 1–3, and 1–4, respectively. CART and LASSO regression were chosen as simpler prediction approaches for comparison with SL. SL defines a set of candidate machine learning algorithms, called the library, and combines their predictions through meta-learning via cross-validation [17]. SL has the asymptotic property that it performs at least as well (in terms of risk, defined by the negative log-likelihood) as the best-fitting algorithm in the library [17]. Although the individual contributions of the variables included in the final SL ensemble cannot be easily interpreted, SL can be used to obtain optimal prediction performance and to benchmark simpler and less adaptive approaches [17].
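
The meta-learning step can be illustrated with a deliberately reduced sketch: given cross-validated predicted probabilities from just two candidate learners, search for the convex weight that minimizes the negative log-likelihood. This is a simplification for illustration, not the study's implementation; a real SL handles a whole library and uses a proper optimizer rather than a grid. All names are hypothetical.

```python
import math


def neg_log_likelihood(y_true, probs, eps=1e-12):
    """Average negative log-likelihood of binary outcomes under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def best_convex_weight(y_true, cv_probs_a, cv_probs_b, steps=100):
    """Grid-search the weight w in [0, 1] for the blend w * a + (1 - w) * b.

    cv_probs_a and cv_probs_b are assumed to be out-of-fold predictions.
    """
    best_w, best_risk = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        blended = [w * a + (1 - w) * b for a, b in zip(cv_probs_a, cv_probs_b)]
        risk = neg_log_likelihood(y_true, blended)
        if risk < best_risk:
            best_w, best_risk = w, risk
    return best_w, best_risk
```

Because the grid includes w = 0 and w = 1, the blended cross-validated risk in this sketch can never exceed that of either learner alone, which mirrors on a small scale the "at least as good as the best algorithm in the library" property described above.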

We tuned the prediction methods as follows. In CART, the Gini index was used to measure the heterogeneity of each subset with respect to the outcome, and a maximum depth of 6 was set as the stopping criterion. To account for potential error in the estimation of the risk curve, the regularization parameter in the LASSO regression was chosen such that the cross-validated error was within one standard error of its minimum value [18]. For SL, we considered a simple and a complex library for comparison. The simple library included the response mean, LASSO regression, and CART; the complex library additionally included random forest and extreme gradient boosting (XGBoost). Several XGBoost configurations were considered, with tuning parameters set to 10, 20, and 50 trees; maximum depths of 1 to 6; and shrinkage values of 0.001, 0.01, and 0.1 for regularization.
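
Two of the tuning steps above lend themselves to a short sketch. The first function applies the one-standard-error rule in its common form, choosing the largest (most regularizing) penalty whose cross-validated error stays within one standard error of the minimum; that convention is an assumption here. The second part enumerates the XGBoost settings as a full grid, which is also an assumption, since the text does not say whether every combination was used. All names are hypothetical.

```python
from itertools import product


def one_standard_error_choice(lambdas, cv_means, cv_ses):
    """Largest penalty whose CV error is within one SE of the minimum CV error."""
    best = min(range(len(lambdas)), key=lambda i: cv_means[i])
    cutoff = cv_means[best] + cv_ses[best]
    eligible = [lam for lam, err in zip(lambdas, cv_means) if err <= cutoff]
    return max(eligible)


# Full grid over the tuning values listed in the text (assumed to be crossed).
XGB_GRID = [
    {"n_trees": n, "max_depth": d, "shrinkage": s}
    for n, d, s in product([10, 20, 50], range(1, 7), [0.001, 0.01, 0.1])
]

print(len(XGB_GRID), "candidate XGBoost configurations")
```

Under the full-grid assumption there are 3 tree counts, 6 depths, and 3 shrinkage values, giving 54 candidate configurations.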

For models using predictors at each level, prediction results were assessed using 10-fold cross-validated receiver operating characteristic curves and the area under the receiver operating characteristic curve (AUC) statistic in the discovery and temporal/future validation sets. We used the DeLong test to compare AUCs between different prediction algorithms at the same predictor level or within the same prediction algorithm across levels [19]. We used permutation-based variable importance, based on the AUC and computed over 5 simulations, to identify the 10 most important features. Permuting one variable at a time, the method calculates the difference in AUC before and after permutation to assign a measure of importance [20]. The model with the highest AUC in the validation set was chosen as the final full model.
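
A minimal sketch of permutation-based importance follows, assuming each row is a dictionary of feature values and `predict` is any scoring function; both are hypothetical stand-ins rather than the study's code. The AUC is computed directly as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
import random


def auc(y_true, scores):
    """AUC as the share of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, y_true, n_repeats=5, seed=0):
    """Mean drop in AUC after shuffling each feature, one feature at a time."""
    rng = random.Random(seed)
    baseline = auc(y_true, [predict(r) for r in rows])
    importance = {}
    for col in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            permuted = [{**r, col: v} for r, v in zip(rows, shuffled)]
            drops.append(baseline - auc(y_true, [predict(r) for r in permuted]))
        importance[col] = sum(drops) / n_repeats
    return importance
```

A feature the model never uses has an importance of exactly 0 in this sketch, because shuffling it leaves every prediction unchanged. Ranking features by the mean AUC drop and keeping the top 10 corresponds to the selection described above.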

Development of simpler models

To improve interpretability and potential clinical uptake, we used 10-fold cross-validated logistic regression to develop simpler models in the discovery set, based on a minimal set of the most important features at each level rather than the full set of features used in the complex SL. We additionally selected interaction terms, considering all cross-products, by stepwise forward and backward selection according to the Akaike information criterion. We evaluated the predictive performance (i.e., simplicity and cross-validated AUCs) of these simpler models in the validation set. In addition, calibration was examined by assessing the quality of the uncalibrated model using the integrated calibration index, which captures the distribution of predicted probabilities, together with a calibration plot. A calibration method (i.e., isotonic regression) was applied to recalibrate the model where over- or underprediction was observed.
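
Isotonic regression, the recalibration method named above, can be sketched with the pool-adjacent-violators algorithm. This is a generic textbook version rather than the study's code; the function names are hypothetical, and tied predicted probabilities receive no special handling.

```python
def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def recalibrate(pred_probs, outcomes):
    """Order cases by predicted probability and fit isotonic regression to outcomes."""
    order = sorted(range(len(pred_probs)), key=lambda i: pred_probs[i])
    fitted_sorted = isotonic_fit([outcomes[i] for i in order])
    calibrated = [0.0] * len(pred_probs)
    for rank, i in enumerate(order):
        calibrated[i] = fitted_sorted[rank]
    return calibrated
```

The recalibrated values are monotone in the original predicted probability, so the ranking of cases is preserved up to ties while the predicted risks move toward the observed event rates.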
