Machine learning-based prediction of antibiotic resistance in Mycobacterium tuberculosis clinical isolates from Uganda.

Publication date: Dec 05, 2024

Efforts toward tuberculosis management and control are challenged by the emergence of Mycobacterium tuberculosis (MTB) resistance to existing anti-TB drugs. This study aimed to explore the potential of machine learning algorithms in predicting resistance to four anti-TB drugs (rifampicin, isoniazid, streptomycin, and ethambutol) in MTB using whole-genome sequence and clinical data from Uganda. We also assessed the models' generalizability on another dataset from South Africa. We trained ten machine learning algorithms on a dataset comprising 182 MTB isolates, with clinical data variables (age, sex, HIV status) and SNP mutations across the entire genome as predictor variables and phenotypic drug-susceptibility data for the four drugs as the outcome variable. Model performance varied across the four anti-TB drugs after five-fold cross-validation. The best model for each drug was selected using the highest Matthews Correlation Coefficient (MCC) and Area Under the Receiver Operating Characteristic Curve (AUC) scores as key metrics. Logistic regression (LR) excelled in predicting resistance to rifampicin (MCC: 0.83, 95% confidence interval (CI) 0.73-0.86; AUC: 0.96, 95% CI 0.95-0.98) and streptomycin (MCC: 0.44, 95% CI 0.27-0.58; AUC: 0.80, 95% CI 0.74-0.82), Extreme Gradient Boosting (XGBoost) for ethambutol (MCC: 0.65, 95% CI 0.54-0.74; AUC: 0.90, 95% CI 0.83-0.96), and Gradient Boosting (GBC) for isoniazid (MCC: 0.69, 95% CI 0.61-0.78; AUC: 0.91, 95% CI 0.88-0.96). The best-performing model per drug was trained only on the SNP dataset, after excluding the clinical data variables, because integrating them with SNP mutations showed only a marginal improvement in model performance. Despite the high MCC (0.18 to 0.72) and AUC (0.66 to 0.95) scores for all the best models on the Uganda test dataset, the LR models for rifampicin and streptomycin did not generalize to the South Africa dataset, in contrast to the GBC and XGBoost models.
Compared to TB-Profiler, the LR model for rifampicin (RIF) was very sensitive, and the GBC model for isoniazid (INH) and the XGBoost model for ethambutol (EMB) were very specific, on the Uganda dataset. TB-Profiler outperformed all the best models on the South Africa dataset. We identified key mutations associated with resistance to these antibiotics. HIV status was also among the top significant features for predicting drug resistance. Machine learning applications for predicting antimicrobial resistance represent a promising avenue for addressing the global health challenge it poses. This work demonstrates that integrating diverse data types, such as genomic and clinical data, with machine learning algorithms could improve resistance predictions, support robust surveillance systems, and inform targeted interventions to curb the rising threat of antimicrobial resistance.
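
For readers unfamiliar with the two selection metrics named in the abstract, the following is a minimal sketch of how MCC and AUC can be computed for a binary resistant/susceptible prediction. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the eight toy isolates are illustrative assumptions, and the label coding (1 = resistant, 0 = susceptible) is chosen here for convenience.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn); labels are assumed to be 1 = resistant, 0 = susceptible."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true, scores):
    """AUC as the fraction of (resistant, susceptible) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one isolate of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): phenotypic labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(mcc(y_true, y_pred))  # 0.5
print(auc(y_true, scores))  # 0.9375
```

MCC uses all four cells of the confusion matrix, so it is less easily inflated than accuracy when resistant isolates are rare, while AUC summarizes ranking quality independently of any single threshold.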


Concepts
Informatics
Mycobacterium
Outperformed
Tuberculosis
Uganda

Keywords
Adult
Antimicrobial resistance
Antitubercular Agents
Clinical
Drug resistance
Drug Resistance, Bacterial
Female
Genes
Humans
Isoniazid
Machine Learning
Male
Microbial Sensitivity Tests
Middle Aged
Mutations
Mycobacterium tuberculosis
Polymorphism, Single Nucleotide
Rifampin
South Africa
Streptomycin
Tuberculosis, Multidrug-Resistant
Uganda
Whole Genome Sequencing
Whole-genome sequence
Young Adult

Semantics

Type Source Name
disease IDO antibiotic resistance
disease MESH tuberculosis
pathway KEGG Tuberculosis
drug DRUGBANK Rifampicin
drug DRUGBANK Isoniazid
drug DRUGBANK Streptomycin
drug DRUGBANK Ethambutol
drug DRUGBANK Pentaerythritol tetranitrate
disease IDO susceptibility
drug DRUGBANK MCC
drug DRUGBANK Flunarizine
pathway REACTOME Reproduction
disease MESH Infectious Diseases
drug DRUGBANK Coenzyme M
disease IDO drug susceptibility
disease MESH emergency
drug DRUGBANK Nitazoxanide
drug DRUGBANK Pyrazinamide
drug DRUGBANK Trestolone
disease IDO assay
drug DRUGBANK Methylergometrine
drug DRUGBANK Ademetionine
disease IDO blood
disease IDO cell
drug DRUGBANK Bedaquiline
drug DRUGBANK Linezolid
drug DRUGBANK Clofazimine
drug DRUGBANK Nonoxynol-9
drug DRUGBANK L-Valine
drug DRUGBANK Ranitidine
disease IDO algorithm
drug DRUGBANK Saquinavir
drug DRUGBANK Methionine
disease MESH confusion
drug DRUGBANK Esomeprazole
disease IDO process
drug DRUGBANK Indoleacetic acid
disease MESH co infection
drug DRUGBANK Proline
drug DRUGBANK Guanine
disease IDO host
disease IDO pathogen
disease IDO virulence
disease MESH tics
drug DRUGBANK L-Phenylalanine
drug DRUGBANK Isoxaflutole
disease IDO intervention
drug DRUGBANK Cysteamine
disease IDO history
disease IDO country
drug DRUGBANK Serine
disease MESH Pulmonary Tuberculosis
drug DRUGBANK Guanosine
disease MESH Tuberculosis Multidrug-Resistant

