NTD in AI: Forward Stepwise Selection
Non-technical definitions in AI
Forward stepwise selection is a method for reducing the number of predictor variables (features) in a training data set so as to improve model accuracy. Some predictors may be irrelevant to the outcome we want to predict, or the data set may contain more predictors than observations (which will likely lead to overfitting).
Let's say we have 5 predictor variables. In forward stepwise selection, we start with the null model, that is, a model with no predictors. We then train 5 models, each adding 1 variable to the null model. We select the best-performing model (say variable {2}), then train that best model with one more of the remaining variables added (hence 4 models: {2,1}, {2,3}, {2,4}, {2,5}).
We select the best-performing model with 2 variables (say {2,5}) and repeat, training the next set of models with one more variable drawn from the remaining pool.
We stop when some condition has been reached, for instance when adding a variable yields no further improvement in some measure of fit, such as the residual sum of squares, or when the number of variables equals the number of observations, whichever comes first.
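To make the procedure concrete, here is a minimal sketch in Python. It assumes a linear regression setting and uses training-set residual sum of squares (RSS) as the measure of fit; the function names and the use of numpy's least-squares solver are my own illustrative choices, not part of the definition.

```python
import numpy as np

def rss(X, y, subset):
    """Residual sum of squares of a least-squares fit on the chosen columns."""
    if not subset:
        pred = np.full(len(y), y.mean())  # null model: just predict the mean
    else:
        A = np.column_stack([np.ones(len(y)), X[:, subset]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        pred = A @ coef
    return float(np.sum((y - pred) ** 2))

def forward_stepwise(X, y):
    n, p = X.shape
    selected = []                     # start from the null model
    best = rss(X, y, selected)
    while len(selected) < min(p, n - 1):
        remaining = [j for j in range(p) if j not in selected]
        # Fit one candidate model per remaining variable, each added to the
        # current best model, and keep the candidate with the lowest RSS.
        scores = {j: rss(X, y, selected + [j]) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:    # stop: no further improvement
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected
```

A quick check on synthetic data where only columns 1 and 4 matter:

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 1] - 2 * X[:, 4] + rng.normal(size=100)
print(forward_stepwise(X, y))  # columns 1 and 4 are selected first
```

One caveat: training RSS almost always keeps decreasing as variables are added, so in practice the "measure of fit" in the stopping rule is usually computed on held-out data or penalized for model size (e.g. adjusted R², AIC, BIC).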
Related: backward stepwise selection, subset selection.
Machine learning is a technical subject, and the use of technical terms by engineers has the potential of getting in the way of clear communication with non-engineers, especially in a business setting. In spare moments I started to put together simple, non-technical definitions of nouns and verbs used in the field of machine learning as a kind of Rosetta Stone for non-engineers. This is a work in progress which I may collect into a book one day. This is one of those definitions.
Other non-technical definitions:
- NTD in AI: 1 of K Encoding
- NTD in AI: Activation Function
- NTD in AI: Active Learning
- NTD in AI: Accuracy
- NTD in AI: Autoencoder
- NTD in AI: Backward Stepwise Selection
- NTD in AI: Bagging
- NTD in AI: Batch Normalization
- NTD in AI: Bayesian Hyperparameter Optimization
- NTD in AI: BERT
- NTD in AI: Best Subset Selection
- NTD in AI: Bias
- NTD in AI: Clustering
- NTD in AI: Collaborative Filtering
- NTD in AI: Confusion Set Disambiguation
- NTD in AI: Convolution Neural Network
- NTD in AI: Cosine Similarity
- NTD in AI: Cost-Sensitive Accuracy
- NTD in AI: Cloze Test
- NTD in AI: Credit Assignment Problem
- NTD in AI: Data Augmentation
- NTD in AI: Data Imputation
- NTD in AI: Dataset
- NTD in AI: DBSCAN
- NTD in AI: Decision Boundary
- NTD in AI: Decoder
- NTD in AI: Deep Learning
- NTD in AI: Denoising Autoencoder
- NTD in AI: Density Estimation
- NTD in AI: Domain Expert
- NTD in AI: Dropout
- NTD in AI: Early Stopping
- NTD in AI: Embedding
- NTD in AI: Encoder
- NTD in AI: Ensemble Learning
- NTD in AI: Expected Test MSE
- NTD in AI: Exploding Gradient
- NTD in AI: Feature
- NTD in AI: Feature Selection
- NTD in AI: Feed Forward Neural Network
- NTD in AI: Filter (Matrix)
- NTD in AI: Forward Propagation
- NTD in AI: Forward Stepwise Selection
- NTD in AI: Fully Connected Neural Network Layers
- NTD in AI: Fully Visible Belief Network
- NTD in AI: Fuzzy Set
- NTD in AI: Gated Recurrent Neural Network
- NTD in AI: Gaussian Kernel Regression
- NTD in AI: Gaussian Mixture Model
- NTD in AI: Generalize
- NTD in AI: Gradient
- NTD in AI: Gradient Boosting
- NTD in AI: Gradient Descent
- NTD in AI: Grid Search
- NTD in AI: Ground Truth
- NTD in AI: Hidden Layers
- NTD in AI: Hyperbolic Tangent (tanH)
- NTD in AI: Hyperparameter
- NTD in AI: Input Vectors
- NTD in AI: Intrinsic Motivation
- NTD in AI: Irreducible Errors
- NTD in AI: k-Means
- NTD in AI: Kernel (Trick)
- NTD in AI: Kernel Regression
- NTD in AI: Label/Labeled Examples
- NTD in AI: LambdaMART
- NTD in AI: Linear Models
- NTD in AI: Logistic Regression (Softmax)
- NTD in AI: Long Short Term Memory (LSTM)
- NTD in AI: Meta-Model
- NTD in AI: Manhattan Taxicab Norm
- NTD in AI: MNIST
- NTD in AI: Model Cards
- NTD in AI: Moment Matching
- NTD in AI: MP Neuron
- NTD in AI: Multi-Label Classification
- NTD in AI: Multi-Layer Perceptron
- NTD in AI: Munging
- NTD in AI: NADE
- NTD in AI: Non-Parametric Methods
- NTD in AI: Norm
- NTD in AI: Observation
- NTD in AI: One Class Classification
- NTD in AI: One-Hot Encoding
- NTD in AI: One Shot Learning
- NTD in AI: One Versus Rest
- NTD in AI: Oracle
- NTD in AI: Overfitting
- NTD in AI: Oversampling
- NTD in AI: Padding
- NTD in AI: Perceptron
- NTD in AI: Pooling
- NTD in AI: Prediction Strength
- NTD in AI: Predictors
- NTD in AI: Preprocessing
- NTD in AI: Principal Component Analysis (PCA)
- NTD in AI: Random Search
- NTD in AI: ReLU
- NTD in AI: Recurrent Neural Network (RNN)
- NTD in AI: ROC Curve
- NTD in AI: Semi-Supervised Learning
- NTD in AI: Sequence Labeling
- NTD in AI: Siamese Neural Network
- NTD in AI: SMOTE - Synthetic Minority Oversampling Technique
- NTD in AI: Softmax
- NTD in AI: Softplus
- NTD in AI: Stepwise Selection
- NTD in AI: Stride
- NTD in AI: Subset Selection
- NTD in AI: Supervised Learning
- NTD in AI: t-SNE
- NTD in AI: Target Vectors
- NTD in AI: Training Instance
- NTD in AI: Training Set
- NTD in AI: Triplet Loss Function
- NTD in AI: UMAP - Uniform Manifold Approximation and Projection
- NTD in AI: Unary Classification
- NTD in AI: Validation Set
- NTD in AI: Vanishing Gradient
- NTD in AI: Variational Autoencoder
- NTD in AI: Volume (Convolution)
- NTD in AI: Voting
- NTD in AI: WaveNet
- NTD in AI: Weak Learners
- NTD in AI: Word Embeddings
- NTD in AI: word2vec