Early prognosis of COVID-19 Infections via Machine Learning

- The research is carried out by the team led by Prof. Santiago Mazuelas and funded by the AXA Research Fund.

The project aims to develop algorithms for the early prognosis of COVID-19 patients at the point of care. These algorithms, also referred to as scores or discrimination rules, help physicians triage new patients by predicting their clinical outcome. Such predictions are key to allocating critical resources such as ICU beds and respirators.

The activities during Q1Y1 focused on reviewing the state of the art in the COVID-19 prognosis literature, including a search for candidate data sets. The team identified three data sets from the first wave of the pandemic: Wuhan (China), São Paulo (Brazil), and HM Hospitals (Spain). We chose to start with the Spanish data because of its size and completeness: 2547 patients and more than 100 clinical tests. After a methodical curation of values and formats, we released our first benchmark data corpus, codename CDSL_HM_1_0. It includes 2378 patients with at least one lab test within the first week of hospital admission, with a total of 36 lab items flagged as predictors. The average age is 68 years. The data set comprises 343 deceased patients (49 of them after an ICU stay) and 1849 home discharges (96 of whom stayed in the ICU).
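The inclusion criterion for CDSL_HM_1_0 (at least one lab test within the first week of hospital admission) can be sketched as a simple filter. This is only an illustration: the record layout, the `Patient` class, the helper names, and the choice of a half-open window [admission, admission + 7 days) that ignores tests taken before admission are all assumptions, since the actual curation pipeline is not described here.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Patient:
    # Hypothetical record layout; the real corpus fields may differ.
    patient_id: str
    admission: date
    lab_dates: list = field(default_factory=list)  # dates of lab tests


def has_first_week_lab(p: Patient, window_days: int = 7) -> bool:
    """True if at least one lab test falls in [admission, admission + window_days)."""
    end = p.admission + timedelta(days=window_days)
    return any(p.admission <= d < end for d in p.lab_dates)


def select_cohort(patients, window_days: int = 7):
    """Keep only the patients that meet the (assumed) first-week lab criterion."""
    return [p for p in patients if has_first_week_lab(p, window_days)]


# Toy records for illustration only (not taken from the data set).
patients = [
    Patient("A", date(2020, 3, 10), [date(2020, 3, 11)]),  # lab on day 1: kept
    Patient("B", date(2020, 3, 10), [date(2020, 3, 20)]),  # lab on day 10: dropped
    Patient("C", date(2020, 3, 12), []),                   # no labs: dropped
]
print([p.patient_id for p in select_cohort(patients)])  # ['A']
```

Whether day 7 itself or pre-admission emergency-room tests should count is a curation decision; changing `window_days` or the comparison in `has_first_week_lab` is enough to try alternatives.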

The first and foremost goal of any COVID-19 patient triage is mortality prediction. We analyzed the relevance of 38 predictor variables (the 36 lab tests together with the number of comorbidities and the number of symptoms). Our initial results corroborate the importance of three key biomarkers of severity: lymphocyte count, C-reactive protein level, and blood urea nitrogen. However, neither these markers alone nor the complete blood panel is enough for effective mortality detection (see attached figure for a distribution of the source data). In the coming months, we will explore prediction algorithms based on cost-sensitive functions that heavily penalize false negative errors, that is, predicting survival for a patient who died.
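One common way to make a classifier cost-sensitive is to weight the cross-entropy loss so that missing a death costs more than a false alarm. The sketch below illustrates that idea with a plain logistic regression trained by batch gradient descent; it is not the project's method, and the function names (`weighted_log_loss`, `fit_weighted_logreg`), the default costs, and the learning-rate settings are assumptions chosen for illustration. Label 1 denotes a deceased patient, so a false negative is a survival prediction for a patient who died.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_log_loss(y, p, fn_cost=10.0, fp_cost=1.0, eps=1e-12):
    """Mean cost-weighted cross-entropy (illustrative costs).

    y = 1 marks a deceased patient, y = 0 a survivor. A low predicted
    risk p for a death is weighted by fn_cost, a high risk for a
    survivor by fp_cost.
    """
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # avoid log(0)
        if yi == 1:
            total += -fn_cost * math.log(pi)
        else:
            total += -fp_cost * math.log(1.0 - pi)
    return total / len(y)


def fit_weighted_logreg(X, y, fn_cost=10.0, fp_cost=1.0, lr=0.1, epochs=2000):
    """Logistic regression fitted by batch gradient descent on weighted_log_loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            s = fn_cost if yi == 1 else fp_cost
            r = s * (p - yi)  # gradient of the weighted loss w.r.t. the logit
            for j in range(d):
                gw[j] += r * xi[j]
            gb += r
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b
```

As a sanity check, with no features the minimizer of the weighted loss is p = c_FN·n₁ / (c_FN·n₁ + c_FP·n₀). With 2 deaths among 10 patients, equal costs give a fitted risk of 0.2, whereas `fn_cost=4.0` raises it to 0.5, which is how the weighting pushes the model toward flagging more patients as high risk.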