Clustering is a fundamental technique in machine learning and data analysis that groups similar data points together based on their features or attributes. Its goal is to discover inherent patterns, structures, or relationships in a dataset without explicit labels, which makes it an unsupervised learning method. Clustering algorithms look for natural divisions in the data, so that points within the same cluster are more similar to each other than to points in other clusters.
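As a rough illustration of the idea that points in one cluster are closer to each other than to points in another, the sketch below compares the average distance within groups to the average distance between groups. The data, the group centers, and all variable names are made up for this example, and Euclidean distance is assumed as the measure of similarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical groups of 2-D points centered far apart (illustrative values).
group_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
group_b = rng.normal(loc=[10.0, 10.0], scale=1.0, size=(100, 2))
points = np.vstack([group_a, group_b])
labels = np.array([0] * 100 + [1] * 100)

# Euclidean distance between every pair of points.
diffs = points[:, None, :] - points[None, :, :]
distances = np.sqrt((diffs ** 2).sum(axis=-1))

# Masks for pairs in the same group (excluding each point paired with itself).
same_cluster = labels[:, None] == labels[None, :]
not_self = ~np.eye(len(points), dtype=bool)

mean_within = distances[same_cluster & not_self].mean()
mean_between = distances[~same_cluster].mean()

print(f"mean within-cluster distance:  {mean_within:.2f}")
print(f"mean between-cluster distance: {mean_between:.2f}")
```

For well-separated groups like these, the within-cluster average should come out much smaller than the between-cluster average. A clustering algorithm tries to recover such a grouping without being given the labels.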
Types of Clustering Algorithms: There are various clustering algorithms, each with its own approach to grouping data points. One of the most common is K-Means, shown here on synthetic data:
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
# Generate synthetic data
data, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)
# Visualize the data
plt.scatter(data[:, 0], data[:, 1], s=30)
plt.title("Synthetic Data")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
# Perform K-Means clustering
num_clusters = 4
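# Fix n_init and random_state so the result is reproducible across runs
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)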
kmeans.fit(data)
# Get cluster assignments and cluster centers
cluster_assignments = kmeans.labels_
cluster_centers = kmeans.cluster_centers_
# Visualize clustering results
plt.scatter(data[:, 0], data[:, 1], c=cluster_assignments, s=30, cmap='viridis')
plt.scatter(cluster_centers[:, 0], cluster_centers[:, 1], c='red', marker='x', s=100, label='Cluster Centers')
plt.title("K-Means Clustering Results")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.show()
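To give a sense of what a K-Means fit does internally, here is a minimal NumPy sketch of Lloyd's iteration, the textbook procedure that alternates between assigning points to their nearest center and moving each center to the mean of its points. The function name `lloyd_kmeans`, its parameters, and the demo data are hypothetical and not part of scikit-learn. The library's `KMeans` uses a more careful initialization (k-means++ by default) and other refinements, so treat this as an illustration rather than a replacement.

```python
import numpy as np


def lloyd_kmeans(points, num_clusters, num_iterations=100, seed=0):
    """Hypothetical helper: plain Lloyd's iteration with random initialization."""
    rng = np.random.default_rng(seed)
    # Start from num_clusters distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=num_clusters, replace=False)]
    for _ in range(num_iterations):
        # Assignment step: label each point with its nearest center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        # A cluster that received no points keeps its previous center.
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(num_clusters)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so the assignments are stable
        centers = new_centers
    # Final assignment so the returned labels match the returned centers.
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1), centers


# Usage on two made-up, well-separated groups of points.
rng = np.random.default_rng(1)
demo_points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[8.0, 8.0], scale=0.5, size=(50, 2)),
])
demo_labels, demo_centers = lloyd_kmeans(demo_points, num_clusters=2)
print(demo_centers)
```

Because the starting centers are random, different seeds can lead to different results on harder data, which is one reason library implementations run several initializations and keep the best one.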