Methodologies to Improve One-Class Classifier Performance Applied to Multivariate Time Series
Name: ANDRÉ PAULO FERREIRA MACHADO
Publication date: 05/04/2024
Examining board:
Name | Role |
---|---|
CELSO JOSE MUNARO | Chair |
GINALBER LUIZ DE OLIVEIRA SERRA | External Examiner |
LEANDRO DOS SANTOS COELHO | External Examiner |
PATRICK MARQUES CIARELLI | Co-advisor |
RICARDO EMANUEL VAZ VARGAS | External Examiner |
Summary: This work proposes novel methodologies to improve the performance of one-class classifiers applied to multivariate time series data. The main method is based on clustering multivariate time series. Datasets arising from real processes come from the available sensors and are affected by many factors, such as aging of the process, changes in the operating region, and equipment malfunction. Despite that, one expects that the classes represented by such diverse data can be unveiled by trained classifiers. This work hypothesizes that overall performance can be improved by training sets of one-class classifiers on subsets of data clustered by similarity, obtained by DTW Barycenter Averaging (DBA), which is used to measure the similarity between the time series and each cluster. The proposed method is applied to one-class classifiers because they are trained only with the target class, which is clustered based on time series similarity using Dynamic Time Warping (DTW) and k-means. Additionally, a second approach, called time-shift of labels, is proposed to improve the differentiation between normal and faulty data. This method is applied during the training phase and focuses on situations involving the transition from normal to faulty data, where the boundaries are difficult to differentiate (overlapping data). The time-shift results show a mitigation of the effect of overlapping data. The advantages of the techniques are illustrated through their application to two public datasets: one from the oil industry, with instances characterizing eight classes of data represented by five time series (3W dataset), and another from a hydraulic system for the study of typical hydraulic system failures, with five classes and seventeen time series (Condition monitoring of hydraulic systems - ICM dataset). For the 3W dataset, seven classes are selected to train Long Short-Term Memory (LSTM) classifiers using the variables and instances clustered by time series clustering algorithms. The results demonstrate that increasing the similarity of training data tends to improve the performance of the LSTM classifier, achieving an increase of 10% in overall performance on the 3W dataset. In a specific case, where the clustering model raised the similarity by 84%, the classification performance improved by 21%. For the hydraulic system condition monitoring data, the proposed method achieved a significant performance improvement of over 40% compared to the baseline model. Notably, in the specific case of leakage fault, the classification performance improves by 64%.
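
To make the similarity step in the summary concrete, the following is a minimal sketch, not the thesis implementation, of how a multivariate time series could be compared against cluster centroids with Dynamic Time Warping and routed to the nearest cluster. The function names `dtw_distance` and `nearest_cluster`, the Euclidean per-step cost, and the toy data are assumptions made for illustration. The centroids are taken as given; in the described method they would come from k-means with DBA, which is not implemented here.

```python
from math import inf, sqrt


def dtw_distance(a, b):
    """DTW distance between two multivariate series.

    a, b: lists of time steps; each time step is a sequence of floats
    with the same number of variables. Series lengths may differ.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local cost: Euclidean distance between time steps
            step = sqrt(sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])))
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def nearest_cluster(series, centroids):
    """Index of the centroid with the smallest DTW distance to `series`."""
    distances = [dtw_distance(series, c) for c in centroids]
    return min(range(len(centroids)), key=distances.__getitem__)


# Toy data (hypothetical): two centroids with two variables each
centroids = [
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
]
# A time-stretched copy of the second centroid
series = [[1.0, 1.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
cluster_index = nearest_cluster(series, centroids)
```

Under this sketch, `cluster_index` would select which per-cluster one-class classifier receives the series. Because DTW tolerates differences in timing, the stretched series still matches the centroid it was derived from.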