In parts #1 and #2 of the “Outliers Detection in PySpark” series, I talked about Anomaly Detection, Outlier Detection and the interquartile range (boxplot) method. In this third and final part, I will show how the popular K-means clustering algorithm can be used to detect outliers.
K-means

K-means is one of the simplest and most popular unsupervised Machine Learning algorithms for Clustering.
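K-means splits the data into k clusters by alternating two steps: assign every point to its nearest centroid, then move every centroid to the mean of the points assigned to it, repeating until the assignments stop changing. The short pure-Python sketch below (Lloyd's algorithm) only illustrates that loop and does not use Spark. The helper names (`squared_distance`, `farthest_first_init`, `kmeans`) are hypothetical, and the deterministic farthest-first seeding is a simplification standing in for the `k-means||` initialization that `pyspark.ml.clustering.KMeans` uses by default.

```python
from typing import List, Sequence, Tuple

# Illustrative sketch only: hypothetical helper names, not Spark's implementation.
Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding (a simplification): start from the first point,
    then repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and update steps until no point changes cluster."""
    centroids = farthest_first_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: the assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, labels


# Two well-separated groups of four points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)  # [(0.5, 0.5), (10.5, 10.5)]
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
```

On Spark, the same clustering step would typically be run with `pyspark.ml.clustering.KMeans` on a vector column of features (for example one built with `VectorAssembler`), which distributes these assignment and update steps across the cluster.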