II Conferencia Internacional de Procesamiento de la Información "CIPI - IOTAI2019" - International Workshop on Internet of Things and Artificial Intelligence

A Quality Measure for Multi-label Datasets on the Apache Spark Framework

In recent years, data volumes have grown considerably, making them increasingly difficult to handle. Measuring data quality is pivotal for assessing a classifier's discriminatory power, since a classifier's accuracy depends heavily on the data used to build the model. Multi-label classification is a specific type of classification problem that has attracted growing interest in recent years. However, no quality measures for multi-label datasets have been implemented in cluster computing frameworks for evaluating large datasets. This work implements a data quality measure for multi-label datasets, based on Granular Computing, on the Apache Spark framework. As a result, the quality measure could be computed for the datasets in relatively short times.

About the speaker

Lic. Ricardo Sánchez Alba

UCLV, Cuba
Practical information
English (US)
Not defined
30 minutes
Not defined
Authors
Carlos Morell
Lic. Ricardo Sánchez Alba
Marilyn Bello
Rafael Bello
Koen Vanhoof
Keywords
Apache Spark
multi-label classification
quality measure