Data processing for decision making and open machine learning (Procesado de datos para la toma de decisiones y aprendizaje máquina abierto)

BCAM principal investigator: Santiago Mazuelas

BCAM research line(s) involved:

003

Machine Learning

BCAM research area(s) involved:

Data Science & Artificial Intelligence

Reference: Becas Leonardo a Investigadores y Creadores Culturales 2018

Coordinator: BCAM - Basque Center for Applied Mathematics

Duration: 2018 - 2020

Funding agency: BBVA Foundation

Type: National Project

Status: Closed

Project website: https://www.redleonardo.es/beneficiario/santiago-mazuelas/

Objective:

Data improve decision making (DM) because they reduce uncertainty about the consequences of the actions taken. For example, certain keywords in an email reduce uncertainty about the usefulness of the message and therefore improve the filtering of unwanted messages. Supervised machine learning (SML) uses data obtained in a training stage to learn a function that predicts labels from attributes.

Despite the progress made during the last decades, the tools available to the designer of an SML system are often ad hoc and lack generality, transparency, usability, and interoperability [1]. A paradigmatic case is the disparity of existing methods for SML problems depending on the type of training data used [2]. Because SML methods differ completely depending on how the training data were obtained, it is difficult to perform SML in an open way. This difficulty is significantly exacerbated when heterogeneous training data obtained through collaborative and distributed efforts are to be used.

This project has two main objectives: 1) to interpret SML problems as DM problems and 2) to develop techniques for open SML. Such a decision-theoretic interpretation can unify multiple SML problems that use diverse training data. This unification would enable the effective use of collaboratively collected and distributed data. For example, training data obtained by participants with very high annotation capabilities or using sophisticated attributes could be used in conjunction with other data obtained by participants who annotate data quickly or from simple attributes.
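The email example above can be read as a small decision problem. The following is only an illustrative sketch, not the project's method: the toy training set, the loss values, and the function names (`posterior`, `expected_loss`, `decide`) are all hypothetical. It shows how observing a keyword changes the estimated probability of spam, and how the filtering action can then be chosen by minimizing expected loss.

```python
from collections import Counter

# Hypothetical toy training set: (keyword_present, label) pairs.
TRAINING = [
    (True, "spam"), (True, "spam"), (True, "spam"), (True, "ham"),
    (False, "spam"), (False, "ham"), (False, "ham"), (False, "ham"),
]

# Hypothetical losses, indexed as LOSS[action][true_label].
LOSS = {
    "filter": {"spam": 0.0, "ham": 2.0},   # filtering a useful email is costly
    "deliver": {"spam": 1.0, "ham": 0.0},  # delivering spam is a milder cost
}


def posterior(training, keyword_present):
    """Empirical P(label | attribute) estimated from the training pairs."""
    counts = Counter(label for attr, label in training if attr == keyword_present)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("spam", "ham")}


def expected_loss(action, probs):
    """Expected loss of an action under the label distribution `probs`."""
    return sum(probs[label] * LOSS[action][label] for label in probs)


def decide(training, keyword_present):
    """Pick the action with the smallest expected loss given the attribute."""
    probs = posterior(training, keyword_present)
    return min(LOSS, key=lambda action: expected_loss(action, probs))


if __name__ == "__main__":
    for present in (True, False):
        print(present, posterior(TRAINING, present), decide(TRAINING, present))
```

In this toy setting, half of all training emails are spam, but the fraction rises to three quarters among emails containing the keyword, so the attribute reduces uncertainty about the label and changes the action with the lowest expected loss.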