II Conferencia Internacional de Procesamiento de la Información CIPI - IOTAI2019
International Workshop on Internet of Things and Artificial Intelligence
24/06/2019 12:00 to 28/06/2019 22:00 (Cuba time)
Cayos de Villa Clara,
Cuba
A Quality Measure for Multi-label Datasets on the Apache Spark Framework
Submitted by Lic. Ricardo Sánchez Alba
Abstract
In recent years, the amount of data has increased considerably, and handling these volumes of information is therefore becoming more complex. Measuring data quality is pivotal for assessing a classifier's discriminatory power, since a classifier's accuracy depends heavily on the data used to build the model. Multi-label classification is a specific type of classification problem that has attracted increasing interest in recent years. However, no quality measures for multi-label datasets have been implemented in cluster computing frameworks for evaluating large datasets. This work aims to implement a data quality measure for multi-label datasets, based on Granular Computing, on the Apache Spark framework. As a result, the values of the quality measure could be computed for the datasets in relatively short times.