
2nd International Conference of Information Processing "CIPI - IOTAI 2019" - International Workshop of Internet of Things & Artificial Intelligence

A Quality Measure for Multi-label Datasets on the Apache Spark Framework

In recent years, the amount of data has increased considerably, and it is therefore becoming more complex to handle these volumes of information. Measuring data quality is a pivotal aspect of assessing a classifier's discriminatory power, since a classifier's accuracy depends heavily on the data used to build the model. Multi-label classification is a specific type of classification problem that has attracted increasing interest in recent years. However, no quality measures for multi-label datasets have been implemented in cluster computing frameworks for evaluating large datasets. This work implements a data quality measure for multi-label datasets, based on Granular Computing, on the Apache Spark framework. As a result, the values of the quality measure could be calculated for the datasets, and in relatively short times.

In the last years, the amounts of data have increased considerably and therefore, it is becoming more complex to handle these volumes of information. Measuring the data quality is a pivotal aspect to assess the classifier's discriminatory power as the classifiers accuracy heavily depends on the data used to build the model. Multi-label classification is one specific type of classification problem, which has generated an increasing interest in recent years. However, there are no quality measures for multi-label datasets implemented in cluster computing frameworks to evaluate large datasets. This work aims to implement a measure of data quality for multi-label datasets based on Granular Computing under the Apache Spark framework. As a result, it was possible to calculate the values of the quality measure for the datasets, and even in relatively short times.

About the Speaker

Lic. Ricardo Sánchez Alba

UCLV, Cuba
Practical Info
Language: English (US)
Duration: 30 minutes
Authors
Carlos Morell
Lic. Ricardo Sánchez Alba
Marilyn Bello
Rafael Bello
Koen Vanhoof
Keywords
Apache Spark
multi-label classification
quality measure