@answerquest
Last active September 22, 2023 04:51
Grouping lat-long locations geo-spatially into N clusters
# Grouping lat-long locations geo-spatially into N clusters
from sklearn.cluster import KMeans
import pandas as pd

def clusterPoints(df1, N=10, cluster_column='cluster', lat_column='lat', lon_column='lon'):
    # Adds (or overwrites) cluster_column on df1 in place; returns nothing.
    # n_init='auto' needs scikit-learn 1.2 or newer.
    kmeans = KMeans(n_clusters=N, random_state=None, n_init='auto')
    # fit_predict labels rows 0..N-1; +1 so that 0,1,2,3..N-1 becomes 1,2,3..N
    df1[cluster_column] = kmeans.fit_predict(df1[[lat_column, lon_column]].values) + 1
    return

df1 = pd.read_csv('locations.csv')
# split into default 10 clusters
clusterPoints(df1)
print(df1)
# split into 4 clusters
clusterPoints(df1, N=4)
print(df1)
# specify a different cluster column
clusterPoints(df1, cluster_column='group')
print(df1)
# if the lat-long columns are named differently, e.g. 'Y' and 'X'
clusterPoints(df1, lat_column='Y', lon_column='X')
print(df1)

Clustering (or grouping) lat-long locations geo-spatially using K-means

A quick utility function in Python that geo-spatially clusters a pandas DataFrame of latitude-longitude points into N clusters.

It takes a DataFrame and adds a cluster column to it in place, holding a cluster number from 1 to N. You can override the defaults (number of clusters, column names) by passing arguments to the function.
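Here is a minimal sketch of that behavior on a small in-memory table, so it runs without locations.csv. The coordinates are made up for illustration, and `cluster_points_demo` is a hypothetical restatement of the function, not part of the gist. It uses `random_state=0` and `n_init=10` instead of the gist's `random_state=None` and `n_init='auto'`, purely so the demo is repeatable and works on older scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

def cluster_points_demo(df, N=2, cluster_column='cluster',
                        lat_column='lat', lon_column='lon'):
    # Same idea as clusterPoints; random_state=0 and n_init=10 are
    # illustrative assumptions, not the gist's defaults.
    kmeans = KMeans(n_clusters=N, random_state=0, n_init=10)
    df[cluster_column] = kmeans.fit_predict(df[[lat_column, lon_column]].values) + 1

# Made-up points: three close together, then three more far away from them.
demo = pd.DataFrame({
    'lat': [18.52, 18.53, 18.50, 28.61, 28.63, 28.60],
    'lon': [73.85, 73.86, 73.84, 77.21, 77.23, 77.20],
})

cluster_points_demo(demo, N=2)   # modifies demo in place
print(demo)                      # now has a 'cluster' column
print(sorted(demo['cluster'].unique()))  # labels start at 1, not 0
```

Which group ends up labelled 1 and which 2 is arbitrary; only the grouping itself is meaningful.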

KMeans has more options that I haven't gone into; please explore them at https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans
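As a rough sketch of a few of those options, the example below fixes `random_state` for repeatable labels, reads the fitted centroids from `cluster_centers_`, and assigns a new point with `predict()`. The coordinates, the variable names and the choice of `random_state=42` are assumptions made for illustration only.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up coordinates for illustration only: three well-separated pairs.
points = pd.DataFrame({
    'lat': [12.97, 12.98, 13.08, 13.09, 17.38, 17.40],
    'lon': [77.59, 77.60, 80.27, 80.28, 78.48, 78.47],
})

# random_state fixes the centroid seeding so repeated runs give the same labels;
# n_init=10 tries ten seedings and keeps the one with the lowest inertia.
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
points['cluster'] = kmeans.fit_predict(points[['lat', 'lon']].values) + 1

# cluster_centers_ holds one (lat, lon) centroid per cluster, in label order 0..N-1,
# so index + 1 matches the 1..N numbering used in the cluster column.
centers = pd.DataFrame(kmeans.cluster_centers_, columns=['lat', 'lon'])
centers['cluster'] = centers.index + 1
print(centers)

# predict() assigns a new point to the nearest fitted centroid.
new_label = kmeans.predict([[13.00, 77.58]])[0] + 1
print(new_label)
```

Note that K-means here measures plain Euclidean distance in degrees, which is usually a fair approximation over a small area but distorts east-west distances more and more as you move away from the equator.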

Required Python packages (n_init='auto' needs scikit-learn 1.2 or newer):

pip install pandas scikit-learn