Created
April 29, 2020 17:45
Epsilon DBSCAN
import numpy as np
from sklearn.neighbors import NearestNeighbors


def find_epsilon(matrix, min_samples):
    """
    Automatically find the epsilon hyperparameter needed to run DBSCAN.

    Args:
        matrix (numpy array): Embedding matrix. Each row represents a product title as a vector.
        min_samples (int): Should match the min_samples hyperparameter passed to DBSCAN.

    Returns:
        eps (float): Value of the epsilon hyperparameter.
    """
    # For each product, find its min_samples nearest products using cosine distance.
    # Because the same matrix is used to fit and to query, each product's own
    # zero-distance entry is among its neighbors.
    neigh = NearestNeighbors(n_neighbors=min_samples, metric='cosine').fit(matrix)
    distances, indices = neigh.kneighbors(matrix)
    # Mean distance from each product to its min_samples nearest products.
    mean_distances = np.mean(distances, axis=1)
    # Standard deviation of the per-product means.
    std = np.std(mean_distances)
    # Mean of the per-product means.
    mean = np.mean(mean_distances)
    # Epsilon is the mean of the means plus the standard deviation of the means.
    eps = mean + std
    return eps
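
A minimal usage sketch, assuming the returned value is passed straight to scikit-learn's `DBSCAN` with the same `min_samples` and the same cosine metric. The synthetic `embeddings` array, the group names, and the choice of `min_samples = 5` are hypothetical stand-ins, not part of the original gist. The function is restated in condensed form only so the sketch runs on its own.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors


def find_epsilon(matrix, min_samples):
    # Condensed restatement of the function above, so this sketch is runnable alone.
    neigh = NearestNeighbors(n_neighbors=min_samples, metric="cosine").fit(matrix)
    distances, _ = neigh.kneighbors(matrix)
    mean_distances = distances.mean(axis=1)
    return float(mean_distances.mean() + mean_distances.std())


# Hypothetical stand-in for title embeddings: two groups of vectors
# pointing in clearly different directions.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[1.0, 0.0, 0.0], scale=0.05, size=(20, 3))
group_b = rng.normal(loc=[0.0, 1.0, 0.0], scale=0.05, size=(20, 3))
embeddings = np.vstack([group_a, group_b])

min_samples = 5  # assumed value; use the same one for find_epsilon and DBSCAN
eps = find_epsilon(embeddings, min_samples)

# Use the same metric here as in find_epsilon so eps is on the same scale.
labels = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(embeddings)

print(f"eps = {eps:.4f}")
print("labels found:", sorted(set(labels)))
```

Keeping `metric` and `min_samples` identical in both calls matters: the heuristic measures neighborhood distances in cosine space, so an `eps` derived this way would not be meaningful under DBSCAN's default Euclidean metric.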