NTD in AI: BERT
Non-technical definitions in AI
BERT is an acronym for Bidirectional Encoder Representations from Transformers. It is a language representation model built by Google and used for natural language processing tasks.
It was trained to predict missing words in a sentence. Google engineers created the massive labeled training set by randomly masking 15% of the words in their data, which comprised 2.5 billion words from the English Wikipedia and 800 million words from a dataset called the BooksCorpus (Devlin et al., 2019). The hidden words themselves serve as the labels, so no manual labeling is needed. This approach is called a Masked Language Model.
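For readers who want to see the masking idea in code, here is a minimal sketch in Python. It is a simplification, not Google's actual procedure: the function name `mask_words`, the example sentence, and the rule of always hiding at least one word are my own assumptions for illustration, and the real model works on word pieces and does not always substitute the `[MASK]` token for a selected word.

```python
import random

def mask_words(sentence, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Hide a random share of words and keep the originals as labels."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    words = sentence.split()
    # Assumption for this toy: always hide at least one word.
    n_to_mask = max(1, round(len(words) * mask_rate))
    positions = sorted(rng.sample(range(len(words)), n_to_mask))
    # The hidden words are the "right answers" the model must predict.
    labels = {pos: words[pos] for pos in positions}
    masked = [mask_token if i in labels else w for i, w in enumerate(words)]
    return " ".join(masked), labels

masked, labels = mask_words("she used a pen to sign the letter before mailing it")
print(masked)  # the sentence with some words replaced by [MASK]
print(labels)  # position -> original word
```

The point of the sketch is that the text supplies its own answers: every masked position comes with the word that was removed, which is what the model is trained to recover.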
When BERT was first presented, it was a marked improvement over earlier approaches such as GloVe because it could discern the context of a sentence from the surrounding words before choosing the right words to fill in the blanks. For instance, the word “pen” can be both a noun (the writing instrument) and a verb (“to pen a novel”), a distinction that models like GloVe, which assign each word a single fixed representation, cannot capture.
This ability comes from its architecture, which its name describes.
It is “Bidirectional” because the network takes into account the words both before and after a given position, that is, in the forward and the backward direction. Neural networks that read in a single direction soon lose information about earlier elements. In natural language processing, this means that information contained earlier in a sentence becomes less prominent, essentially “forgotten”. This matters because the model could lose the context of a sentence. Processing in both directions was a very successful strategy invented to overcome this problem (Schuster & Paliwal, 1997).
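The difference between reading in one direction and reading in both can be illustrated with a toy example. This is only a sketch of which words are available at a blank, not of how BERT computes anything; the function names `left_context` and `both_sides_context` and the example sentence are hypothetical.

```python
def left_context(words, position):
    """Words a strictly left-to-right reader has seen before `position`."""
    return words[:position]

def both_sides_context(words, position):
    """Words on either side of `position`, as a bidirectional reader sees them."""
    return words[:position] + words[position + 1:]

words = "I will [MASK] a novel about my childhood".split()
blank = words.index("[MASK]")

print(left_context(words, blank))
# ['I', 'will']
print(both_sides_context(words, blank))
# ['I', 'will', 'a', 'novel', 'about', 'my', 'childhood']
```

With only the left side, “I will ...” could be followed by almost any verb. With both sides, “a novel” makes a word like “pen” or “write” a much better guess.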
“Encoder Representations” describes the function of converting the input data, which in BERT’s case is sentences, into a form, or representation, that computers can work with: here, a matrix of numbers.
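As a rough picture of what “a matrix of numbers” means, the sketch below turns a sentence into one row of numbers per word. Everything in it is an assumption made for illustration: the function name `encode`, the row length of 4, and above all the random values, which a trained model would learn instead. Note also that this toy gives a repeated word the same row every time, whereas BERT's rows depend on the surrounding words.

```python
import random

def encode(sentence, vector_size=4, seed=0):
    """Turn a sentence into a matrix: one row of numbers per word."""
    rng = random.Random(seed)
    table = {}   # toy lookup table; a trained model learns these values
    matrix = []
    for word in sentence.lower().split():
        if word not in table:
            # Random placeholder numbers, for illustration only.
            table[word] = [round(rng.uniform(-1, 1), 2) for _ in range(vector_size)]
        matrix.append(table[word])
    return matrix

matrix = encode("the pen is on the desk")
for row in matrix:
    print(row)  # six rows of four numbers each
```

The sentence has six words, so the matrix has six rows. The two rows for “the” are identical here, which is exactly the limitation of context-free representations that BERT was designed to overcome.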
Finally, “Transformer” refers to an architecture that encodes an input into the representations mentioned above and then decodes those representations into another output. The original transformer was a language translation model, so the input would be, for instance, an English sentence, which is then transformed into a German sentence (Vaswani et al., 2017). BERT uses only the encoder part of the architecture, hence the emphasis on “ER” above.
Machine learning is a technical subject, and the technical terms engineers use can get in the way of clear communication with non-engineers, especially in a business setting. In spare moments I started to put together simple, non-technical definitions of nouns and verbs used in the field of machine learning as a kind of Rosetta Stone for non-engineers. This is a work in progress which I may collect into a book one day. This is one of those definitions.
Other non-technical definitions:
- NTD in AI: 1 of K Encoding
- NTD in AI: Activation Function
- NTD in AI: Active Learning
- NTD in AI: Accuracy
- NTD in AI: Autoencoder
- NTD in AI: Backward Stepwise Selection
- NTD in AI: Bagging
- NTD in AI: Batch Normalization
- NTD in AI: Bayesian Hyperparameter Optimization
- NTD in AI: BERT
- NTD in AI: Best Subset Selection
- NTD in AI: Bias
- NTD in AI: Clustering
- NTD in AI: Collaborative Filtering
- NTD in AI: Confusion Set Disambiguation
- NTD in AI: Convolution Neural Network
- NTD in AI: Cosine Similarity
- NTD in AI: Cost-Sensitive Accuracy
- NTD in AI: Cloze Test
- NTD in AI: Credit Assignment Problem
- NTD in AI: Data Augmentation
- NTD in AI: Data Imputation
- NTD in AI: Dataset
- NTD in AI: DBSCAN
- NTD in AI: Decision Boundary
- NTD in AI: Decoder
- NTD in AI: Deep Learning
- NTD in AI: Denoising Autoencoder
- NTD in AI: Density Estimation
- NTD in AI: Domain Expert
- NTD in AI: Dropout
- NTD in AI: Early Stopping
- NTD in AI: Embedding
- NTD in AI: Encoder
- NTD in AI: Ensemble Learning
- NTD in AI: Expected Test MSE
- NTD in AI: Exploding Gradient
- NTD in AI: Feature
- NTD in AI: Feature Selection
- NTD in AI: Feed Forward Neural Network
- NTD in AI: Filter (Matrix)
- NTD in AI: Forward Propagation
- NTD in AI: Forward Stepwise Selection
- NTD in AI: Fully Connected Neural Network Layers
- NTD in AI: Fully Visible Belief Network
- NTD in AI: Fuzzy Set
- NTD in AI: Gated Recurrent Neural Network
- NTD in AI: Gaussian Kernel Regression
- NTD in AI: Gaussian Mixture Model
- NTD in AI: Generalize
- NTD in AI: Gradient
- NTD in AI: Gradient Boosting
- NTD in AI: Gradient Descent
- NTD in AI: Grid Search
- NTD in AI: Ground Truth
- NTD in AI: Hidden Layers
- NTD in AI: Hyperbolic Tangent (tanH)
- NTD in AI: Hyperparameter
- NTD in AI: Input Vectors
- NTD in AI: Intrinsic Motivation
- NTD in AI: Irreducible Errors
- NTD in AI: k-Means
- NTD in AI: Kernel (Trick)
- NTD in AI: Kernel Regression
- NTD in AI: Label/Labeled Examples
- NTD in AI: LambdaMART
- NTD in AI: Linear Models
- NTD in AI: Logistic Regression (Softmax)
- NTD in AI: Long Short Term Memory (LSTM)
- NTD in AI: Meta-Model
- NTD in AI: Manhattan Taxicab Norm
- NTD in AI: MNIST
- NTD in AI: Model Cards
- NTD in AI: Moment Matching
- NTD in AI: MP Neuron
- NTD in AI: Multi-Label Classification
- NTD in AI: Multi-Layer Perceptron
- NTD in AI: Munging
- NTD in AI: NADE
- NTD in AI: Non-Parametric Methods
- NTD in AI: Norm
- NTD in AI: Observation
- NTD in AI: One Class Classification
- NTD in AI: One-Hot Encoding
- NTD in AI: One Shot Learning
- NTD in AI: One Versus Rest
- NTD in AI: Oracle
- NTD in AI: Overfitting
- NTD in AI: Oversampling
- NTD in AI: Padding
- NTD in AI: Perceptron
- NTD in AI: Pooling
- NTD in AI: Prediction Strength
- NTD in AI: Predictors
- NTD in AI: Preprocessing
- NTD in AI: Principal Component Analysis (PCA)
- NTD in AI: Random Search
- NTD in AI: ReLU
- NTD in AI: Recurrent Neural Network (RNN)
- NTD in AI: ROC Curve
- NTD in AI: Semi-Supervised Learning
- NTD in AI: Sequence Labeling
- NTD in AI: Siamese Neural Network
- NTD in AI: SMOTE - Synthetic Minority Oversampling Technique
- NTD in AI: Softmax
- NTD in AI: Softplus
- NTD in AI: Stepwise Selection
- NTD in AI: Stride
- NTD in AI: Subset Selection
- NTD in AI: Supervised Learning
- NTD in AI: t-SNE
- NTD in AI: Target Vectors
- NTD in AI: Training Instance
- NTD in AI: Training Set
- NTD in AI: Triplet Loss Function
- NTD in AI: UMAP - Uniform Manifold Approximation and Projection
- NTD in AI: Unary Classification
- NTD in AI: Validation Set
- NTD in AI: Vanishing Gradient
- NTD in AI: Variational Autoencoder
- NTD in AI: Volume (Convolution)
- NTD in AI: Voting
- NTD in AI: WaveNet
- NTD in AI: Weak Learners
- NTD in AI: Word Embeddings
- NTD in AI: word2vec