NTD in AI: DBSCAN
Non-technical definitions in AI
DBSCAN, an acronym for Density-Based Spatial Clustering of Applications with Noise, is an algorithm used in unsupervised learning to find patterns of clustering in data.
Unlike k-Means, which is a centroid-based algorithm, DBSCAN does not require practitioners to choose the number of clusters in advance. For k-Means, by contrast, the results are sensitive to the number of clusters (the “k” in “k-Means”), which has to be set and tuned manually.
DBSCAN instead uses the idea of density rather than centroids to find clusters. Its 2 hyperparameters are the maximum distance from a selected point within which another point counts as its neighbour (typically denoted $\epsilon$), and the minimum number of points within that distance required to form a cluster, n.
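As a rough illustration of how the two hyperparameters work together, here is a minimal Python sketch. The function names `neighbours` and `is_dense`, the sample points, and the values of `eps` and `n` are all made up for this example. It also assumes that a point counts as its own neighbour and that "within" means a distance of at most $\epsilon$; other conventions exist.

```python
import math

def neighbours(points, index, eps):
    """Indices of all points within eps of points[index].

    Assumption: the point itself is included in the count.
    """
    centre = points[index]
    return [j for j, p in enumerate(points) if math.dist(centre, p) <= eps]

def is_dense(points, index, eps, n):
    """True if points[index] has at least n neighbours within eps."""
    return len(neighbours(points, index, eps)) >= n

# Three points close together and one far away (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
dense_flags = [is_dense(points, i, eps=1.5, n=3) for i in range(len(points))]
print(dense_flags)  # [True, True, True, False]
```

The three nearby points each have enough neighbours to count as a dense region, while the far-away point has only itself for company.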
With these 2 parameters, the clusters are found iteratively. First, an example in the set to be clustered is chosen, and the number of examples within distance $\epsilon$ of that point is counted. If there are n or more, they become the first cluster.
Each member of that first cluster is then tested. If the member has n or more neighbours within distance $\epsilon$, any of those neighbours not already in the cluster are added to it. If it has fewer than n neighbours within distance $\epsilon$, no new examples are added. This is repeated until no more new members can be added to the cluster.
Then, a second example in the dataset not in the first cluster is chosen and the process is repeated. This iterative process is run until every point either belongs to a cluster or is marked as an outlier (the “noise” in the algorithm's name). Hence, without labeled examples, clusters in the data can be found.
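For readers who want to see the steps above as code, the following is a small Python sketch of the procedure, not a production implementation. The names `region_query`, `dbscan` and `NOISE`, the sample points, and the parameter values are assumptions made for this example. The sketch also assumes that a point counts as its own neighbour and that "within" means a distance of at most $\epsilon$.

```python
import math

NOISE = -1  # label given to outliers (an assumption of this sketch)

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], itself included."""
    return [j for j, p in enumerate(points) if math.dist(points[i], p) <= eps]

def dbscan(points, eps, n):
    """Return one label per point: 0, 1, 2, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not looked at yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already in a cluster or already marked as noise
        nbrs = region_query(points, i, eps)
        if len(nbrs) < n:
            labels[i] = NOISE  # too few neighbours; a later cluster may still claim it
            continue
        cluster += 1  # start a new cluster around this dense point
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former outlier on the edge of this cluster
            if labels[j] is not None:
                continue  # already handled
            labels[j] = cluster
            j_nbrs = region_query(points, j, eps)
            if len(j_nbrs) >= n:
                queue.extend(j_nbrs)  # dense member: its neighbours join as well
    return labels

# Two tight groups and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=1.5, n=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Note that the number of clusters (two here) is never passed in; it falls out of the choice of $\epsilon$ and n.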
See also k-Means, clustering, unsupervised learning.
Machine learning is a technical subject, and the use of technical terms by engineers can get in the way of clear communication with non-engineers, especially in a business setting. In spare moments I started to put together simple, non-technical definitions of nouns and verbs used in the field of machine learning as a kind of Rosetta Stone for non-engineers. This is a work-in-progress which I may collect into a book one day. This is one of those definitions.
Other non-technical definitions:
- NTD in AI: 1 of K Encoding
- NTD in AI: Activation Function
- NTD in AI: Active Learning
- NTD in AI: Accuracy
- NTD in AI: Autoencoder
- NTD in AI: Backward Stepwise Selection
- NTD in AI: Bagging
- NTD in AI: Batch Normalization
- NTD in AI: Bayesian Hyperparameter Optimization
- NTD in AI: BERT
- NTD in AI: Best Subset Selection
- NTD in AI: Bias
- NTD in AI: Clustering
- NTD in AI: Collaborative Filtering
- NTD in AI: Confusion Set Disambiguation
- NTD in AI: Convolution Neural Network
- NTD in AI: Cosine Similarity
- NTD in AI: Cost-Sensitive Accuracy
- NTD in AI: Cloze Test
- NTD in AI: Credit Assignment Problem
- NTD in AI: Data Augmentation
- NTD in AI: Data Imputation
- NTD in AI: Dataset
- NTD in AI: DBSCAN
- NTD in AI: Decision Boundary
- NTD in AI: Decoder
- NTD in AI: Deep Learning
- NTD in AI: Denoising Autoencoder
- NTD in AI: Density Estimation
- NTD in AI: Domain Expert
- NTD in AI: Dropout
- NTD in AI: Early Stopping
- NTD in AI: Embedding
- NTD in AI: Encoder
- NTD in AI: Ensemble Learning
- NTD in AI: Expected Test MSE
- NTD in AI: Exploding Gradient
- NTD in AI: Feature
- NTD in AI: Feature Selection
- NTD in AI: Feed Forward Neural Network
- NTD in AI: Filter (Matrix)
- NTD in AI: Forward Propagation
- NTD in AI: Forward Stepwise Selection
- NTD in AI: Fully Connected Neural Network Layers
- NTD in AI: Fully Visible Belief Network
- NTD in AI: Fuzzy Set
- NTD in AI: Gated Recurrent Neural Network
- NTD in AI: Gaussian Kernel Regression
- NTD in AI: Gaussian Mixture Model
- NTD in AI: Generalize
- NTD in AI: Gradient
- NTD in AI: Gradient Boosting
- NTD in AI: Gradient Descent
- NTD in AI: Grid Search
- NTD in AI: Ground Truth
- NTD in AI: Hidden Layers
- NTD in AI: Hyperbolic Tangent (tanH)
- NTD in AI: Hyperparameter
- NTD in AI: Input Vectors
- NTD in AI: Intrinsic Motivation
- NTD in AI: Irreducible Errors
- NTD in AI: k-Means
- NTD in AI: Kernel (Trick)
- NTD in AI: Kernel Regression
- NTD in AI: Label/Labeled Examples
- NTD in AI: LambdaMART
- NTD in AI: Linear Models
- NTD in AI: Logistic Regression (Softmax)
- NTD in AI: Long Short Term Memory (LSTM)
- NTD in AI: Meta-Model
- NTD in AI: Manhattan Taxicab Norm
- NTD in AI: MNIST
- NTD in AI: Model Cards
- NTD in AI: Moment Matching
- NTD in AI: MP Neuron
- NTD in AI: Multi-Label Classification
- NTD in AI: Multi-Layer Perceptron
- NTD in AI: Munging
- NTD in AI: NADE
- NTD in AI: Non-Parametric Methods
- NTD in AI: Norm
- NTD in AI: Observation
- NTD in AI: One Class Classification
- NTD in AI: One-Hot Encoding
- NTD in AI: One Shot Learning
- NTD in AI: One Versus Rest
- NTD in AI: Oracle
- NTD in AI: Overfitting
- NTD in AI: Oversampling
- NTD in AI: Padding
- NTD in AI: Perceptron
- NTD in AI: Pooling
- NTD in AI: Prediction Strength
- NTD in AI: Predictors
- NTD in AI: Preprocessing
- NTD in AI: Principal Component Analysis (PCA)
- NTD in AI: Random Search
- NTD in AI: ReLU
- NTD in AI: Recurrent Neural Network (RNN)
- NTD in AI: ROC Curve
- NTD in AI: Semi-Supervised Learning
- NTD in AI: Sequence Labeling
- NTD in AI: Siamese Neural Network
- NTD in AI: SMOTE - Synthetic Minority Oversampling Technique
- NTD in AI: Softmax
- NTD in AI: Softplus
- NTD in AI: Stepwise Selection
- NTD in AI: Stride
- NTD in AI: Subset Selection
- NTD in AI: Supervised Learning
- NTD in AI: t-SNE
- NTD in AI: Target Vectors
- NTD in AI: Training Instance
- NTD in AI: Training Set
- NTD in AI: Triplet Loss Function
- NTD in AI: UMAP - Uniform Manifold Approximation and Projection
- NTD in AI: Unary Classification
- NTD in AI: Validation Set
- NTD in AI: Vanishing Gradient
- NTD in AI: Variational Autoencoder
- NTD in AI: Volume (Convolution)
- NTD in AI: Voting
- NTD in AI: WaveNet
- NTD in AI: Weak Learners
- NTD in AI: Word Embeddings
- NTD in AI: word2vec