Abstract

Outlier detection is studied and applied in many domains. Outliers arise for different reasons, such as fraudulent activities, structural defects, health problems, and mechanical issues. Detecting outliers is a challenging task that can reveal system faults and fraud, and can save people's lives. Outlier detection techniques are often domain-specific. The main challenge in outlier detection relates to modelling the normal behaviour in order to identify abnormalities. The choice of model is important, since an unsuitable data model can lead to poor results. This requires a good understanding and interpretation of the data, the constraints, and the requirements of the domain problem. Outlier detection is largely an unsupervised problem because labeled data is often unavailable and expensive to obtain.

In this thesis, we study and apply a combination of both machine learning and data mining techniques to build data-driven and domain-oriented outlier detection models. We focus on three real-world application domains: maritime surveillance, district heating, and online media and sequence datasets. We show the importance of data preprocessing as well as feature selection in building suitable methods for data modelling. We take advantage of both supervised and unsupervised techniques to create hybrid methods.

More specifically, we propose a rule-based anomaly detection system using open data for the maritime surveillance domain. We exploit sequential pattern mining for identifying contextual and collective outliers in online media data. We propose a minimum spanning tree clustering technique for detecting groups of outliers in online media and sequence data. We develop a few higher order mining approaches for identifying manual changes and deviating behaviours in heating systems at the building level. The proposed approaches are shown to be capable of explaining the underlying properties of the detected outliers. This can help domain experts narrow down the scope of analysis and understand the reasons for such anomalous behaviours. We also investigate the reproducibility of the proposed models in similar application domains.

PhD dissertation description and author contribution

This thesis consists of seven papers. In Paper I, the author has been one of the main drivers, while in the remaining six papers he has been the main driver. The studies in all papers have been developed and designed under the guidance of the supervisors and domain experts. The formatting of the published papers included in this thesis has been changed to achieve a consistent style.

The final version of the dissertation can be downloaded as a [PDF] (version: 2020-10-23).
The complete dissertation can be downloaded as a [PDF] (version: 2020-10-12).

Included papers

  1. Kazemi, S., Abghari, S., Lavesson, N., Johnson, H., & Ryman, P. "Open data for anomaly detection in maritime surveillance". Expert Systems with Applications. 2013; 40(14), pp. 5719-5729. DOI:10.1016/J.ESWA.2013.04.029 - [ScienceDirect]


  2. Abghari, S., Boeva, V., Lavesson, N., Grahn, H., Gustafsson, J., & Shaikh, J. "Outlier detection for video session data using sequential pattern mining". In Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining: Workshop On Outlier Detection De-constructed, 2018, London, UK. [ODD v5.0 Workshop]


  3. Abghari, S., Boeva, V., Lavesson, N., Grahn, H., Ickin, S., & Gustafsson, J. "A minimum spanning tree clustering approach for outlier detection in event sequences". In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 1123-1130). DOI: 10.1109/ICMLA.2018.00182 - [IEEE]


  4. Abghari, S., Garcia-Martin, E., Johansson, C., Lavesson, N., & Grahn, H. "Trend analysis to automatically identify heat program changes". Energy Procedia. 2017; 116, pp. 407-415. DOI:10.1016/J.EGYPRO.2017.05.088 - [ScienceDirect]

    The paper was presented at the 2016 15th International Symposium on District Heating and Cooling, Seoul, Korea.


  5. Abghari, S., Boeva, V., Brage, J., & Johansson, C. "District heating substation behaviour modelling for annotating the performance". In Cellier P., Driessens K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1168. Springer, Cham. DOI: 10.1007/978-3-030-43887-6_1 - [Springer]


  6. Abghari, S., Boeva, V., Brage, J., & Grahn, H. "Multi-view clustering analyses for district heating substations". In 2020 9th International Conference on Data Science, Technology and Applications (DATA). vol 1. SciTePress, ISBN: 978-989-758-440-4, pp 158-168. DOI: 10.5220/0009780001580168 - [SciTePress]


  7. Abghari, S., Boeva, V., Brage, J., & Grahn, H. "A higher order mining approach for the analysis of real-world datasets". Energies. 2020, 13(21):5781. DOI:10.3390/en13215781 - [MDPI] (The paper is an extension of Paper 10.)


Related papers

  1. Abghari, S., Boeva, V., Lavesson, N., Gustafsson, J., Shaikh, J., & Grahn, H. "Anomaly Detection in Video Session Data". In 2017 5th Swedish Workshop on Data Science (SweDS). [SweDS 2017 Workshop]


  2. Abghari, S., Boeva, V., Lavesson, N., Grahn, H., Ickin, S., & Gustafsson, J. "A Minimum Spanning Tree Clustering Approach for Mining Sequence Datasets". In 2018 6th Swedish Workshop on Data Science (SweDS). [SweDS 2018 Workshop]


  3. Abghari, S., Boeva, V., Brage, J., & Grahn, H. "Higher order mining for monitoring district heating substations". In 2019 6th IEEE International Conference on Data Science and Advanced Analytics (DSAA) (pp. 382-391). DOI: 10.1109/DSAA.2019.00053 - [IEEE]


  4. Abghari, S., Boeva, V., Brage, J., Johansson, C., Grahn, H., & Lavesson, N. "Monitoring district heating substations via clustering analysis". In 2019 31st Swedish AI Society Workshop (SAIS). [SAIS 2019 Workshop] - [PDF]


  5. Eghbalian, A., Abghari, S., Boeva, V., & Basiri, F. "Multi-view data mining approach for behaviour analysis of smart control valve". In 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 1238-1245). DOI: 10.1109/ICMLA51294.2020.00195 - [IEEE]