The CAMDA Contest Challenges
For 2023, we present:
- The Synthetic Clinical Health Records Challenge provides a rich set of highly realistic Electronic Health Records (EHR) tracing the diagnosis trajectories of diabetic patients, created with dual-adversarial auto-encoders trained on data from 1.2 million real patients in the Population Health Database of the Andalusian Ministry of Health. Predict relevant diabetes endpoints like blindness or cardiopathy from past diagnosis trajectories!
- The Anti-Microbial Resistance Prediction and Forensics Challenge features both resistance profiles of clinical isolates and environmental metagenomic sequences. Predict resistant bacteria and identify resistance genes to track emerging resistance in urban and non-urban locations!
CAMDA encourages an open contest, where all analyses of the contest data sets are of interest, not limited to the questions suggested here. There is an online forum for the free discussion of the contest data sets and their analysis, in which you are encouraged to participate.
We look forward to a lively contest!
Synthetic Clinical Health Records
Data protection is necessary to safeguard patients' privacy, but the resulting regulations are also an obstacle to biomedical research. Synthetic patients are an interesting alternative. However, conventional synthetic patients are of little use for discovery because they are built from known data distributions. Generative Adversarial Networks (GANs) and related developments have emerged as powerful tools for generating synthetic data that capture relationships between variables even when those relationships were previously unknown. GANs became popular for generating highly realistic synthetic pictures, but they have since been applied in many fields, including the generation of synthetic patients with tools such as medGAN and others.
This dataset includes an ordered list of pathologies for each of 999,936 synthetic patients. These synthetic patients were generated from a real cohort of 979,308 diabetes patients retrieved from the Population Health Database (Base Poblacional de Salud, BPS) of the Andalusian Health System (Spain), using a Dual Adversarial AutoEncoder (DAAE) approach. Two challenges are suggested for these data, although any other original analysis you may think of is also welcome:
1) Finding strong relationships among diabetes-associated pathologies that allow a pathology to be predicted before it is diagnosed. Well-known consequences of diabetes that can be considered relevant endpoints to predict include: a) Retinopathy (Code “703”), b) Chronic kidney disease (Code “1401”), c) Ischemic heart disease (Code “910”), d) Amputations (Code “1999”)
2) Predicting disease trajectories in diabetes patients (see, for example, Jensen et al. Nat Commun. 2014)
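One possible way to frame the first suggested challenge is sketched below. It is only an illustration under stated assumptions: each patient is assumed to be available as an ordered Python list of pathology code strings (the actual file format of the download is not described here), the function names `split_at_endpoint` and `build_examples` are hypothetical, and the non-endpoint codes in the toy data are placeholders, not real codes from the dataset. The sketch cuts each trajectory just before the first occurrence of a chosen endpoint, so that only earlier diagnoses are used as input.

```python
# Endpoint codes as listed in the challenge description.
ENDPOINT_CODES = {
    "703": "Retinopathy",
    "1401": "Chronic kidney disease",
    "910": "Ischemic heart disease",
    "1999": "Amputations",
}


def split_at_endpoint(trajectory: list[str], endpoint: str) -> tuple[list[str], bool]:
    """Return (history, label) for one patient.

    history: the codes that precede the first occurrence of `endpoint`
             (the full trajectory if the endpoint never occurs).
    label:   True if the endpoint occurs in the trajectory, else False.
    """
    if endpoint in trajectory:
        return trajectory[: trajectory.index(endpoint)], True
    return list(trajectory), False


def build_examples(trajectories: list[list[str]], endpoint: str) -> list[tuple[list[str], bool]]:
    """Turn ordered trajectories into (history, label) pairs for one endpoint."""
    return [split_at_endpoint(t, endpoint) for t in trajectories]


# Hypothetical toy trajectories; "X1".."X4" are placeholder codes (assumption).
toy = [
    ["X1", "X2", "703", "X3"],  # endpoint reached after two earlier diagnoses
    ["X2", "X4"],               # endpoint never reached
    ["703"],                    # endpoint is the first recorded diagnosis
]
examples = build_examples(toy, "703")
```

The resulting pairs could then be fed to any classifier of your choice. Note that patients without the endpoint keep their full history here, while patients with it are truncated; a real analysis would probably need to handle that asymmetry, for example by truncating negative cases at a comparable point.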
Prediction proposals submitted together with the trained model and the code required to run it can be tested on the real dataset by the organisers, and their authors can participate in a collective publication.
Data download
Please read and accept the data download agreement for access.
Please sign up to announcements from the CAMDA general discussion forum for alerts.
We thank the Institute of Advanced Research in Artificial Intelligence for its support.
Anti-Microbial Resistance Prediction and Forensics
Antimicrobial resistance (AMR) is one of the biggest challenges facing modern medicine. Because the management of COVID-19 has become increasingly dependent on pharmacological interventions, there is a greater risk of accelerating the evolution and spread of antimicrobial resistance [1]. A study in a tertiary hospital environment revealed concerning colonisation patterns of microbes over extended periods [2]. It also highlighted the diversity of antimicrobial resistance gene reservoirs in hospitals that could facilitate the emergence and transmission of new modes of antibiotic resistance. This year we would like the CAMDA community to look into AMR-related challenges.
The goal is to explore metagenomic surveillance data from a selection of about 400 samples provided by the MetaSUB International Consortium, collected during the global City Sampling Day in 2016 and 2017 in several cities in the US (Baltimore, Denver, Minneapolis, New York, Sacramento, San Antonio) and worldwide (Berlin, Bogota, Doha, Ilorin, Lisbon, Sao Paulo, Tokyo, Vienna, Zurich), in order to trace AMR patterns.
Particular focus should be placed on the AMR markers and resistance groups identified in about 150 isolates from a hospital in one of the above-mentioned US cities, collected at a similar time. Can you tell which one?
As shown in past CAMDA challenges, antibiotic resistance markers used as functional biomarkers can accurately predict the origin of urban metagenomics samples.
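As a starting point for the "which city?" question, a very simple baseline could compare the set of AMR markers found in the hospital isolates with the set found in each city's metagenomic samples. The sketch below is an assumption-laden illustration, not the organisers' method: it uses Jaccard similarity on marker sets, the function names `jaccard` and `rank_cities` are hypothetical, and the marker and city labels are placeholders rather than real data. Detecting the markers in the sequences is a separate step that is not shown.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_cities(isolate_markers: set[str],
                city_markers: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank cities by similarity of their AMR marker set to the isolates' set."""
    scores = {city: jaccard(isolate_markers, markers)
              for city, markers in city_markers.items()}
    # Highest similarity first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical placeholder data (assumption): marker sets would come from your
# own AMR gene detection on the isolates and on the metagenomic samples.
isolates = {"marker_a", "marker_b", "marker_c"}
cities = {
    "city_1": {"marker_a", "marker_d"},
    "city_2": {"marker_a", "marker_b", "marker_c", "marker_e"},
    "city_3": {"marker_f"},
}
ranking = rank_cities(isolates, cities)
# With this toy data, city_2 ranks first with a similarity of 0.75.
```

Presence or absence of markers ignores abundance and sequencing depth, so this is best read as a sanity-check baseline before trying a trained classifier.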
Of course, you are welcome to use your imagination to carry out any side analysis that you would like, using the provided datasets.
Data download
Please read and accept the data download agreement for access.
Please sign up to announcements from the CAMDA metagenomics forum for alerts.
[1] E Afshinnekoo et al. COVID-19 drug practices risk antimicrobial resistance evolution. The Lancet Microbe (2021), 10.1016/S2666-5247(21)00039-2
[2] KR Chng et al. Cartography of opportunistic pathogens and antibiotic resistance genes in a tertiary hospital environment. Nature Medicine (2020), 1-11