Data Mining

Color quantization using K-means clustering in ML.NET

When I was looking for K-means use cases, I came across color quantization, a very interesting one. I implemented it in Python and wondered whether it would be as easy to implement in ML.NET. All the code is available in this GitHub repository. What is color quantization? It is the application of quantization, a lossy compression technique, to color spaces in order to reduce the number of unique colors in an image.
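As a rough illustration of the idea, here is a minimal plain-Python sketch (not the ML.NET code from the post). It assumes pixels are RGB tuples and that the caller supplies the starting palette; the names `nearest_index` and `quantize` are hypothetical, and real implementations usually pick the initial centers randomly.

```python
import math

def nearest_index(pixel, palette):
    """Index of the palette color closest to `pixel` (Euclidean distance in RGB)."""
    return min(range(len(palette)), key=lambda i: math.dist(pixel, palette[i]))

def quantize(pixels, initial_palette, iterations=10):
    """Reduce `pixels` to len(initial_palette) colors with Lloyd's K-means."""
    palette = list(initial_palette)
    for _ in range(iterations):
        # Assignment step: group every pixel under its nearest palette color.
        groups = [[] for _ in palette]
        for pixel in pixels:
            groups[nearest_index(pixel, palette)].append(pixel)
        # Update step: move each palette color to the mean of its group
        # (an empty group keeps its previous color).
        palette = [
            tuple(sum(channel) / len(group) for channel in zip(*group)) if group else color
            for group, color in zip(groups, palette)
        ]
    rounded = [tuple(round(c) for c in color) for color in palette]
    # Replace every pixel with its (rounded) cluster color.
    return [rounded[nearest_index(pixel, palette)] for pixel in pixels]

# Hypothetical four-pixel "image" reduced to two colors.
pixels = [(250, 0, 0), (254, 2, 0), (0, 0, 250), (2, 0, 254)]
result = quantize(pixels, [(255, 0, 0), (0, 0, 255)])
```

Each pixel ends up replaced by the mean color of its cluster, so the image keeps at most as many unique colors as the palette has entries.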


06 August / Zanid Haytam / Algorithms / Data Mining

Outliers Detection in PySpark #3 – K-means

In parts #1 and #2 of the “Outliers Detection in PySpark” series, I talked about anomaly detection, outlier detection and the interquartile range (boxplot) method. In this third and final part, I will show how the popular K-means clustering algorithm can be used to detect outliers. K-means is one of the simplest and most popular unsupervised machine learning algorithms for clustering.
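One common way to use K-means for outlier detection is to flag points whose distance to the nearest centroid exceeds a threshold. This is an assumption here, since the excerpt does not say which rule the post applies. The sketch below uses plain Python rather than PySpark; the `centroids` are hypothetical values standing in for an already fitted model, and the threshold is arbitrary.

```python
import math

def distance_outliers(points, centroids, threshold):
    """Return the points farther than `threshold` from their nearest centroid."""
    return [
        p for p in points
        if min(math.dist(p, c) for c in centroids) > threshold
    ]

# Hypothetical 2-D data and centroids (as if already fitted by K-means).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (5, 20)]
centroids = [(1.3, 1.3), (8.3, 8.3)]
outliers = distance_outliers(points, centroids, threshold=5.0)
```

Only `(5, 20)` lies more than 5 units from both centroids, so it is the only point flagged.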


15 July / Zanid Haytam / Algorithms / Data Mining

Outliers Detection in PySpark #2 – Interquartile Range

In the first part, I covered what data quality, anomaly detection and outlier detection are, and how outlier detection differs from novelty detection. In this part, I will cover a well-known and simple method for detecting outliers: the interquartile range. The interquartile range method, also known as IQR, was developed by John Wilder Tukey, an American mathematician best known for developing the FFT algorithm and the box plot.
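A minimal sketch of the IQR rule in plain Python (not the PySpark code from the post): values below Q1 - k·IQR or above Q3 + k·IQR are flagged, with the conventional k = 1.5. The function name `iqr_outliers` and the sample data are hypothetical, and the quartile interpolation method is an assumption, since different tools compute quartiles slightly differently.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample with one obvious outlier.
values = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(values)
```

Here Q1 is 11 and Q3 is 12.5, so the fences are 8.75 and 14.75 and only 95 is flagged.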


21 June / Zanid Haytam / Data Mining

Outliers Detection in PySpark #1 – Intro

Over the last few months, while working on my graduation project, I had the chance to learn a lot about data quality, anomaly detection and especially outlier detection. In this series, I will explain what outliers are, the difference between novelty detection and outlier detection, and how we can detect outliers using different algorithms.


23 October / Zanid Haytam / Algorithms / Data Mining

Association Rule Mining using Apriori Algorithm

Have you ever wondered how Amazon suggests items to buy when we’re looking at a product (labeled “Frequently bought together”)? For example, when viewing a GPU (e.g. a GTX 1080), Amazon will tell you that the GPU, an i7 CPU and RAM are frequently bought together. This makes sense, because many people buy their components together when building a desktop PC.
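To make "frequently bought together" concrete, here is a small sketch of the support-counting step only, not the full Apriori algorithm with its candidate generation and pruning. The baskets, the function name `frequent_pairs` and the 0.5 support threshold are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Map each item pair to its support (share of baskets containing both items)."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        counts.update(combinations(sorted(set(basket)), 2))
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}

# Hypothetical purchase history.
transactions = [
    ["gpu", "cpu", "ram"],
    ["gpu", "cpu"],
    ["gpu", "ram"],
    ["mouse"],
]
pairs = frequent_pairs(transactions, min_support=0.5)
```

In this toy data the GPU appears with the CPU and with the RAM in half of the baskets each, so both pairs pass the threshold, while CPU and RAM together do not.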

