Executive Secretary

2nd International Symposium on "Generation and Transfer of Knowledge for Digital Transformation"

SITIC2023

Development of a new distributed method for outlier detection based on the Alternating Direction Method of Multipliers

Abstract

Anomaly detection is one of the most studied problems in data mining: it consists of finding "rare" data within a set that are suspected of having been generated by a different mechanism from the rest of the set. It finds application in the discovery of novel information and in the detection of bank fraud and system intrusions, so it should be seen not only as a data-cleaning step but also as a way to uncover information of interest. Handling large volumes of data (big data) currently poses a challenge for anomaly detection algorithms: the resources of a single computer may be insufficient to run a given algorithm efficiently, and data sets, because of their growing size, are frequently stored in distributed environments. The objective of this work is to develop a new distributed anomaly detection algorithm based on solving the Support Vector Data Description (SVDD) problem with the Alternating Direction Method of Multipliers (ADMM). The work draws on ADMM, kernel and dimensionality reduction methods, and SVDD. Its contribution is a new distributed anomaly detection method intended to be effective compared with existing ones, particularly on large volumes of data. In conclusion, the proposed method was modeled and implemented, and the results obtained on the test data sets are analyzed.
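For orientation, the two building blocks named in the abstract can be written in their standard textbook forms. The abstract does not state how the proposed method splits the Support Vector Data Description across nodes, so what follows is only a generic sketch under assumptions: the symbols $R$, $a$, $\xi_i$, $C$, $\phi$, $f_j$, $w_j$, $z$, $u_j$, $\rho$ and $N$ are illustrative notation, not the authors'.

```latex
% Standard Support Vector Data Description primal: smallest ball
% (center a, radius R) enclosing the mapped data, with slack \xi_i.
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & \lVert \phi(x_i) - a \rVert^{2} \le R^{2} + \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1,\dots,n .
\end{aligned}

% Generic global-consensus splitting over N nodes (assumed form):
% node j holds a local objective f_j and a local copy w_j of the variables.
\min_{w_1,\dots,w_N,\,z}\ \sum_{j=1}^{N} f_j(w_j)
\quad \text{s.t.}\quad w_j = z, \qquad j = 1,\dots,N .

% Scaled-form iterations of the Alternating Direction Method of Multipliers
% for the consensus problem above.
\begin{aligned}
w_j^{k+1} &= \arg\min_{w}\ f_j(w) + \tfrac{\rho}{2}\,\lVert w - z^{k} + u_j^{k} \rVert^{2}, \\
z^{k+1}   &= \frac{1}{N} \sum_{j=1}^{N} \bigl( w_j^{k+1} + u_j^{k} \bigr), \\
u_j^{k+1} &= u_j^{k} + w_j^{k+1} - z^{k+1} .
\end{aligned}
```

In this generic form each node only solves a local subproblem on its own data and exchanges the current iterate with the averaging step, which is what makes the scheme a natural fit for data stored in distributed environments.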

Resumen

En la minería de datos, uno de los problemas más estudiados es la detección de anomalías, es decir, encontrar datos “raros” dentro de un conjunto que resultan sospechosos de ser generados por un mecanismo distinto al resto del conjunto. La detección de anomalías encuentra aplicación en el descubrimiento de información novedosa, fraudes bancarios e intrusiones en sistemas. No solo debe verse desde el punto de vista de la limpieza de datos, sino que permite descubrir información que puede resultar de interés. Como problemática se tiene: en la actualidad, el manejo de grandes volúmenes de datos (big data) representa un desafío para los algoritmos de detección de anomalías: los recursos de un solo ordenador pueden ser insuficientes para la eficiencia de un determinado algoritmo, además de que los conjuntos de datos, por su creciente magnitud, se almacenan frecuentemente en ambientes distribuidos. Se propone como objetivo: desarrollar un nuevo algoritmo distribuido de detección de anomalías basado en solucionar la Descripción de Datos por Vectores Soporte utilizando el Método de los Multiplicadores con Direcciones Alternadas. Se emplea dicho método, métodos kernel y de reducción de dimensionalidad, y la Descripción de Datos por Vectores Soporte. El aporte de este trabajo es la obtención de un nuevo método distribuido de detección de anomalías que resulte eficaz ante los existentes, en particular en grandes volúmenes de datos. Como conclusiones, se logra obtener la modelación e implementación del método propuesto, y se analizan los resultados obtenidos en los conjuntos de datos de prueba.