Real Python: K-Means Clustering in Python: A Practical Guide

The k-means clustering method is an unsupervised machine learning technique used to identify clusters of data objects in a dataset. There are many different types of clustering methods, but k-means is one of the oldest and most approachable. These traits make implementing k-means clustering in Python reasonably straightforward, even for novice programmers and data scientists.

If you’re interested in learning how and when to implement k-means clustering in Python, then this is the right place. You’ll walk through an end-to-end example of k-means clustering using Python, from preprocessing the data to evaluating results.

In this tutorial, you’ll learn:

What k-means clustering is
When to use k-means clustering to analyze your data
How to implement k-means clustering in Python with scikit-learn
How to select a meaningful number of clusters

Click the link below to download the code you’ll use to follow along with the examples in this tutorial and implement your own k-means clustering pipeline:

Download the sample code:Click here to get the code you'll use to learn how to write a k-means clustering pipeline in this tutorial.

What Is Clustering?

Clustering is a set of techniques used to partition data into groups, or clusters. Clusters are loosely defined as groups of data objects that are more similar to other objects in their cluster than they are to data objects in other clusters. In practice, clustering helps identify two qualities of data:

Meaningfulness
Usefulness

Meaningful clusters expand domain knowledge. For example, in the medical field, researchers applied clustering to gene expression experiments. The clustering results identified groups of patients who respond differently to medical treatments.

Useful clusters, on the other hand, serve as an intermediate step in a data pipeline. For example, businesses use clustering for customer segmentation. The clustering results segment customers into groups with similar purchase histories, which businesses can then use to create targeted advertising campaigns.

Note: You’ll learn about unsupervised machine learning techniques in this tutorial. If you’re interested in learning more about supervised machine learning techniques, then check out Logistic Regression in Python.

There are many other applications of clustering, such as document clustering and social network analysis. These applications are relevant in nearly every industry, making clustering a valuable skill for professionals working with data in any field.

Overview of Clustering Techniques

You can perform clustering using many different approaches—so many, in fact, that there are entire categories of clustering algorithms. Each of these categories has its own unique strengths and weaknesses. This means that certain clustering algorithms will result in more natural cluster assignments depending on the input data.

Note: If you’re interested in learning about clustering algorithms not mentioned in this section, then check out A Comprehensive Survey of Clustering Algorithms for an excellent review of popular techniques.

Selecting an appropriate clustering algorithm for your dataset is often difficult due to the number of choices available. Some important factors that affect this decision include the characteristics of the clusters, the features of the dataset, the number of outliers, and the number of data objects.

You’ll explore how these factors help determine which approach is most appropriate by looking at three popular categories of clustering algorithms:

Partitional clustering
Hierarchical clustering
Density-based clustering

It’s worth reviewing these categories at a high level before jumping right into k-means. You’ll learn the strengths and weaknesses of each category to provide context for how k-means fits into the landscape of clustering algorithms.

Partitional Clustering

Partitional clustering divides data objects into nonoverlapping groups. In other words, no object can be a member of more than one cluster, and every cluster must have at least one object.

These techniques require the user to specify the number of clusters, indicated by the variable k. Many partitional clustering algorithms work through an iterative process to assign subsets of data points into k clusters. Two examples of partitional clustering algorithms are k-means and k-medoids.

These algorithms are both nondeterministic, meaning they could produce different results from two separate runs even if the runs were based on the same input.

Partitional clustering methods have several strengths:

Read the full article at https://realpython.com/k-means-clustering-python/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Real Python: K-Means Clustering in Python: A Practical Guide

What Is Clustering?

Overview of Clustering Techniques

Partitional Clustering

Read the full article at https://realpython.com/k-means-clustering-python/ »

Trending Articles

Police confirm man stabbed to death in Selsdon was Andrew David Else of Croydon

Angry father ordered to compensate daughter’s male friend

Download: Rich Bizzy -Panono Ukwenda (Cover)

Anthony Wahome Biography, Family, Wife and Children

Best 5 Happy Mothers Day Poems For Step Mother

IN COURT: Full list of people sentenced at Northampton Magistrates’ Court

Hyper-V replication "Enabling Replication Failed"

DMG Audio Limitless v1.01 WiN/OSX Incl Patched and Keygen-R2R

A/L Technology Stream – Subject combinations, Syllabuses and Teacher guides

Sri Lankan Actress Nadeesha Hemamali Hot Shoot

Prison officer charged!

Moondru Mudichu 20-07-2016 – Polimer tv Serial

Who’s been sentenced from Corby, Kettering, Ringstead, Rothwell, Rushden,...

Jamani mm nauliza hivi second selection za form five zinatoka lini?

Reply: Betrayal at House on the Hill:: Rules:: Re: Haunt #6 - Spoilers Within

JESSIE ROGERSON ON JULY 10, 20...

Madonna – Behind Me (feat. Guido Dos Santos) – Single [iTunes Plus M4A]

Stories • Goddess Stepmom

Laura Pausini - Platinum Collection (3Cd) (2009) .mp3 - 320 Kbps

Joseph Bradley – Carlisle