NTD in AI: DBSCAN · Dark Matter Industries

NTD in AI: DBSCAN

Non-technical definitions in AI

DBSCAN, an acronym for Density-Based Spatial Clustering of Applications with Noise, is an algorithm used in unsupervised learning to find patterns of clustering in data.

Unlike k-Means, which is a centroid-based algorithm, DBSCAN does not require practitioners to specify the number of clusters in advance. In contrast, for k-Means the results are sensitive to the number of clusters (the “k” in “k-Means”), which has to be chosen and tweaked manually.

DBSCAN instead uses the idea of density rather than centroids to find clusters. The two hyperparameters in this algorithm are the maximum distance from a selected point at which another point still counts as its neighbour (typically denoted with $\epsilon$), and the minimum number of points within that distance required for them to be defined as a cluster, n.

With these two parameters, the clusters are found iteratively. First, an example in the set to be clustered is chosen. The number of other examples that lie within the distance $\epsilon$ of that point is counted. If there are n or more, the point and its neighbours become the first cluster.
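The neighbour count in this first step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `neighbours_within`, the sample points and the values of `eps` and `n` are made up for the example, distance is taken to be ordinary straight-line distance, and the chosen point itself is not counted as its own neighbour.

```python
import math

def neighbours_within(points, index, eps):
    """Return the indices of all other points closer than eps to points[index]."""
    centre = points[index]
    return [
        j for j, p in enumerate(points)
        if j != index and math.dist(centre, p) < eps
    ]

# Hypothetical data: three points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
eps, n = 0.5, 2  # assumed values, chosen only for this example

close = neighbours_within(points, 0, eps)
starts_cluster = len(close) >= n  # True when the point has n or more neighbours
```

Here the first point has two neighbours within `eps`, so it can start a cluster, while the far-away point has none.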

Each member in that first cluster is then tested. If the member has n or more neighbours with distance less than $\epsilon$, then any of those neighbours not already in the cluster are added to it. If there are fewer than n neighbours with distance less than $\epsilon$, then no new examples are added. This is repeated until no more new members can be added to the cluster.

Then, a second example in the dataset not in the first cluster is chosen and the process is repeated. This iterative process is run until all points either belong to a cluster or are outliers. Hence, without labeled examples, clusters in the data can be found.
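For readers who want to see the whole loop in one place, here is a minimal Python sketch of the procedure described above. It is not a reference implementation: the names `dbscan` and `NOISE`, the sample data and the parameter values are assumptions made for illustration, distance is plain straight-line distance, and a point is not counted as its own neighbour.

```python
import math

NOISE = -1  # label given to outliers

def dbscan(points, eps, n):
    """Label each point with a cluster number (0, 1, ...) or NOISE."""
    def neighbours(i):
        return [j for j in range(len(points))
                if j != i and math.dist(points[i], points[j]) < eps]

    labels = [None] * len(points)  # None means "not looked at yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < n:
            labels[i] = NOISE  # too few neighbours; may still join a cluster later
            continue
        cluster += 1           # start a new cluster from this point
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # joins the cluster but adds no new members
                continue
            if labels[j] is not None:
                continue             # already assigned to a cluster
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= n:     # dense enough: its neighbours join too
                queue.extend(nearby)
    return labels

# Hypothetical data: two tight groups and one isolated point.
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
          (5, 5), (5.1, 5), (5, 5.1), (9, 9)]
labels = dbscan(points, eps=0.5, n=2)
```

With this data the first four points end up in one cluster, the next three in a second cluster, and the last point is marked as an outlier, all without the number of clusters being given up front.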

See also k-Means, clustering, unsupervised learning.


Machine learning is a technical subject, and the use of technical terms by engineers has the potential to get in the way of clear communication with non-engineers, especially in a business setting. In spare moments I started to put together simple, non-technical definitions of nouns and verbs used in the field of machine learning as a kind of Rosetta Stone for non-engineers. This is a work in progress which I may collect into a book one day. This is one of those definitions.


Other non-technical definitions:

  1. NTD in AI: 1 of K Encoding
  2. NTD in AI: Activation Function
  3. NTD in AI: Active Learning
  4. NTD in AI: Accuracy
  5. NTD in AI: Autoencoder
  6. NTD in AI: Backward Stepwise Selection
  7. NTD in AI: Bagging
  8. NTD in AI: Batch Normalization
  9. NTD in AI: Bayesian Hyperparameter Optimization
  10. NTD in AI: BERT
  11. NTD in AI: Best Subset Selection
  12. NTD in AI: Bias
  13. NTD in AI: Clustering
  14. NTD in AI: Collaborative Filtering
  15. NTD in AI: Confusion Set Disambiguation
  16. NTD in AI: Convolution Neural Network
  17. NTD in AI: Cosine Similarity
  18. NTD in AI: Cost-Sensitive Accuracy
  19. NTD in AI: Cloze Test
  20. NTD in AI: Credit Assignment Problem
  21. NTD in AI: Data Augmentation
  22. NTD in AI: Data Imputation
  23. NTD in AI: Dataset
  24. NTD in AI: DBSCAN
  25. NTD in AI: Decision Boundary
  26. NTD in AI: Decoder
  27. NTD in AI: Deep Learning
  28. NTD in AI: Denoising Autoencoder
  29. NTD in AI: Density Estimation
  30. NTD in AI: Domain Expert
  31. NTD in AI: Dropout
  32. NTD in AI: Early Stopping
  33. NTD in AI: Embedding
  34. NTD in AI: Encoder
  35. NTD in AI: Ensemble Learning
  36. NTD in AI: Expected Test MSE
  37. NTD in AI: Exploding Gradient
  38. NTD in AI: Feature
  39. NTD in AI: Feature Selection
  40. NTD in AI: Feed Forward Neural Network
  41. NTD in AI: Filter (Matrix)
  42. NTD in AI: Forward Propagation
  43. NTD in AI: Forward Stepwise Selection
  44. NTD in AI: Fully Connected Neural Network Layers
  45. NTD in AI: Fully Visible Belief Network
  46. NTD in AI: Fuzzy Set
  47. NTD in AI: Gated Recurrent Neural Network
  48. NTD in AI: Gaussian Kernel Regression
  49. NTD in AI: Gaussian Mixture Model
  50. NTD in AI: Generalize
  51. NTD in AI: Gradient
  52. NTD in AI: Gradient Boosting
  53. NTD in AI: Gradient Descent
  54. NTD in AI: Grid Search
  55. NTD in AI: Ground Truth
  56. NTD in AI: Hidden Layers
  57. NTD in AI: Hyperbolic Tangent (tanH)
  58. NTD in AI: Hyperparameter
  59. NTD in AI: Input Vectors
  60. NTD in AI: Intrinsic Motivation
  61. NTD in AI: Irreducible Errors
  62. NTD in AI: k-Means
  63. NTD in AI: Kernel (Trick)
  64. NTD in AI: Kernel Regression
  65. NTD in AI: Label/Labeled Examples
  66. NTD in AI: LambdaMART
  67. NTD in AI: Linear Models
  68. NTD in AI: Logistic Regression (Softmax)
  69. NTD in AI: Long Short Term Memory (LSTM)
  70. NTD in AI: Meta-Model
  71. NTD in AI: Manhattan Taxicab Norm
  72. NTD in AI: MNIST
  73. NTD in AI: Model Cards
  74. NTD in AI: Moment Matching
  75. NTD in AI: MP Neuron
  76. NTD in AI: Multi-Label Classification
  77. NTD in AI: Multi-Layer Perceptron
  78. NTD in AI: Munging
  79. NTD in AI: NADE
  80. NTD in AI: Non-Parametric Methods
  81. NTD in AI: Norm
  82. NTD in AI: Observation
  83. NTD in AI: One Class Classification
  84. NTD in AI: One-Hot Encoding
  85. NTD in AI: One Shot Learning
  86. NTD in AI: One Versus Rest
  87. NTD in AI: Oracle
  88. NTD in AI: Overfitting
  89. NTD in AI: Oversampling
  90. NTD in AI: Padding
  91. NTD in AI: Perceptron
  92. NTD in AI: Pooling
  93. NTD in AI: Prediction Strength
  94. NTD in AI: Predictors
  95. NTD in AI: Preprocessing
  96. NTD in AI: Principal Component Analysis (PCA)
  97. NTD in AI: Random Search
  98. NTD in AI: ReLU
  99. NTD in AI: Recurrent Neural Network (RNN)
  100. NTD in AI: ROC Curve
  101. NTD in AI: Semi-Supervised Learning
  102. NTD in AI: Sequence Labeling
  103. NTD in AI: Siamese Neural Network
  104. NTD in AI: SMOTE - Synthetic Minority Oversampling Technique
  105. NTD in AI: Softmax
  106. NTD in AI: Softplus
  107. NTD in AI: Stepwise Selection
  108. NTD in AI: Stride
  109. NTD in AI: Subset Selection
  110. NTD in AI: Supervised Learning
  111. NTD in AI: t-SNE
  112. NTD in AI: Target Vectors
  113. NTD in AI: Training Instance
  114. NTD in AI: Training Set
  115. NTD in AI: Triplet Loss Function
  116. NTD in AI: UMAP - Uniform Manifold Approximation and Projection
  117. NTD in AI: Unary Classification
  118. NTD in AI: Validation Set
  119. NTD in AI: Vanishing Gradient
  120. NTD in AI: Variational Autoencoder
  121. NTD in AI: Volume (Convolution)
  122. NTD in AI: Voting
  123. NTD in AI: WaveNet
  124. NTD in AI: Weak Learners
  125. NTD in AI: Word Embeddings
  126. NTD in AI: word2vec