Useful Concepts in Machine Learning
Calibration of Models¶
When performing classification, we often want not only a predicted class label but also the probability of that label. If a trained model returns class labels (or scores) rather than actual probabilities, we train a calibration model to estimate the probabilities.
- Using \(D_{train}\), learn a function \(f(x)\). Using \(D_{cross \ validation}: \{x_i, y_i\}\), create a table of values \(x_i, \hat{y_i}, y_i\), where \(\hat{y_i} = f(x_i)\), sorted in increasing order of \(\hat{y_i}\)
- Break the table into \(k\) chunks of size \(m\). For each chunk \(j \in \{1, \dots, k\}\), compute the mean of \(y_i\) and call it \(y_{mean}^j\), and the mean of \(\hat{y_i}\) and call it \(\hat{y}_{mean}^j\)
- \(D_{calibration} : \{\hat{y}_{mean}^j, y_{mean}^j\}\), Calibration Plot : \(y_{axis} = y_{mean}^j, \ x_{axis} = \hat{y}_{mean}^j\)
- A calibration function is then trained to map \(\hat{y}_{mean}^j\) to \(y_{mean}^j\) (where \(y_{mean}^j\) is the probability of the positive class)
- Platt Scaling Calibration : Works only if the calibration plot looks like a sigmoid
- Isotonic Regression Calibration : Learns a non-decreasing piecewise model. Works in almost all cases but needs more data than Platt scaling
- CalibratedClassifierCV
- Probability calibration of classifiers
- Predicting Good Probabilities With Supervised Learning
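The table-building steps can be sketched in plain Python. This is a minimal illustration, not a library API: the function name `calibration_table`, the toy scores and labels, and the choice to put leftover rows in the last chunk are all assumptions made for the example.

```python
def calibration_table(scores, labels, k):
    """Return one (mean predicted score, mean true label) pair per chunk.

    scores: model outputs y_hat_i on the cross-validation set
    labels: true labels y_i (0 or 1)
    k:      number of chunks (hypothetical parameter name)
    """
    # Sort rows in increasing order of the predicted score.
    rows = sorted(zip(scores, labels))
    m = len(rows) // k  # chunk size; leftover rows go into the last chunk
    table = []
    for j in range(k):
        start = j * m
        end = (j + 1) * m if j < k - 1 else len(rows)
        chunk = rows[start:end]
        mean_score = sum(s for s, _ in chunk) / len(chunk)
        mean_label = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_score, mean_label))
    return table


# Toy cross-validation outputs (made-up numbers for illustration only).
scores = [0.9, 0.1, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6]
labels = [1, 0, 1, 1, 0, 1, 0, 0]

table = calibration_table(scores, labels, k=4)
for mean_score, mean_label in table:
    print(f"mean score {mean_score:.2f} -> fraction positive {mean_label:.2f}")
```

Each pair is one point of the calibration plot. A calibration function (for example a sigmoid for Platt scaling) would then be fitted to these pairs.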
Random Sample Consensus (RANSAC)¶
- Draw a random sample from \(D_{Train}\), call it \(D_0\), and build a model \(M_0\) using \(D_0\)
- Compute the outlier set \(O_0\): the points of \(D_{Train}\) whose absolute error under the prediction of \(M_0\) is large
- Get the filtered data \(D_{train}^1 = D_{Train} - O_0\)
- Repeat the above steps on the filtered data to get \(D_{train}^2, D_{train}^3, \dots\)
- When \(M_i\) and \(M_{i+1}\) are nearly identical, stop iterating; \(M_{i+1}\) is then a robust model
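The loop above can be sketched for a one-dimensional line fit. This is an illustrative sketch under stated assumptions, not a reference implementation: the names `fit_line` and `robust_fit`, the fixed absolute-error `threshold`, the tolerance `tol`, and the toy data are all hypothetical. One safeguard is added that the steps above do not mention: a candidate model that would discard more than half of the current data is treated as a bad sample and redrawn.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Assumes at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def robust_fit(data, sample_size=5, threshold=1.0, tol=1e-6,
               max_iter=100, seed=0):
    rng = random.Random(seed)
    current = list(data)   # D_train^i, starts as the full training set
    previous = None        # last accepted model M_i
    for _ in range(max_iter):
        # Build a model on a random sample of the current data.
        sample = rng.sample(current, min(sample_size, len(current)))
        slope, intercept = fit_line(sample)
        # Keep points whose absolute error is within the threshold.
        kept = [(x, y) for x, y in current
                if abs(y - (slope * x + intercept)) <= threshold]
        # Added safeguard (assumption): reject models that throw away
        # more than half of the data, and draw a new sample instead.
        if len(kept) < max(2, len(current) / 2):
            continue
        current = kept
        # Stop when two successive accepted models are nearly identical.
        if (previous is not None
                and abs(slope - previous[0]) < tol
                and abs(intercept - previous[1]) < tol):
            return (slope, intercept), current
        previous = (slope, intercept)
    return previous, current  # previous is None if no model was accepted


# Toy data: points on y = 2x + 1 plus one gross outlier.
inliers = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(10.5, 1000.0)]
model, kept = robust_fit(inliers + outliers)
print(model, len(kept))
```

The result depends on the threshold and on which samples are drawn, so both would need tuning on real, noisy data.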
Loss Minimisation Framework¶
Figure: Loss functions for classification (gray: zero-one loss; green: Savage loss; orange: logistic loss; purple: exponential loss; brown: tangent loss; blue: square loss)
- Logistic Regression : Logistic Loss (a smooth approximation of the 0-1 loss) + Regulariser
- Linear Regression : Squared Loss + Regulariser
- SVM : Hinge Loss + Regulariser
Hinge Loss : \(\max(0, 1 - y_i(w^Tx_i+b)) = \zeta_i\), with labels \(y_i \in \{-1, +1\}\)
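Several of these losses can be written as functions of the margin \(y_i(w^Tx_i+b)\). The sketch below assumes labels in \(\{-1, +1\}\) and counts a margin of exactly 0 as an error for the zero-one loss. The function names and the example weights are made up for illustration.

```python
import math


def margin(w, x, b, y):
    """Signed margin y * (w^T x + b) for a label y in {-1, +1}."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def zero_one_loss(z):
    # 1 for a misclassified point (margin <= 0), else 0.
    return 1.0 if z <= 0 else 0.0


def hinge_loss(z):
    # max(0, 1 - margin): zero only when the margin is at least 1.
    return max(0.0, 1.0 - z)


def logistic_loss(z):
    return math.log(1.0 + math.exp(-z))


def exponential_loss(z):
    return math.exp(-z)


def square_loss(z):
    return (1.0 - z) ** 2


# Example with made-up weights and one data point.
w, b = [0.5, -1.0], 0.25
x, y = [2.0, 1.0], 1
z = margin(w, x, b, y)  # 1 * (0.5*2.0 - 1.0*1.0 + 0.25) = 0.25
print(z, hinge_loss(z), logistic_loss(z))
```

With these definitions the hinge loss is never smaller than the zero-one loss, which is one reason it is used as a convex surrogate for it.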
Overfitting, Underfitting, Variance, Bias and Generalisation¶
In general, overfitting results in high variance and underfitting results in high bias.
If the target is dominated by a single constant value (for example, mostly 0s), then overfitting can also result in high bias.
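The general rule can be illustrated with a small simulation. This is a sketch under assumptions chosen only for the example: the true function is \(\sin(2\pi x)\), a constant (mean) predictor stands in for an underfitting model, and a 1-nearest-neighbour predictor stands in for an overfitting one. Squared bias and variance are estimated by retraining each model on many freshly sampled training sets.

```python
import math
import random


def true_f(x):
    return math.sin(2 * math.pi * x)


def make_training_set(rng, n=20, noise=0.3):
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, noise)) for x in xs]


def constant_model(train):
    # Underfits: ignores x and always predicts the mean target.
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def nearest_neighbour_model(train):
    # Overfits: copies the (noisy) target of the closest training point.
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def bias_variance(make_model, trials=200, seed=0):
    """Estimate average squared bias and variance over a grid of test points."""
    rng = random.Random(seed)
    test_xs = [i / 20 + 0.025 for i in range(20)]
    preds = [[] for _ in test_xs]
    for _ in range(trials):
        model = make_model(make_training_set(rng))
        for i, x in enumerate(test_xs):
            preds[i].append(model(x))
    bias_sq = 0.0
    variance = 0.0
    for x, p in zip(test_xs, preds):
        avg = sum(p) / len(p)
        bias_sq += (avg - true_f(x)) ** 2
        variance += sum((v - avg) ** 2 for v in p) / len(p)
    return bias_sq / len(test_xs), variance / len(test_xs)


const_bias, const_var = bias_variance(constant_model)
nn_bias, nn_var = bias_variance(nearest_neighbour_model)
print(f"constant model: bias^2={const_bias:.3f} variance={const_var:.3f}")
print(f"1-NN model:     bias^2={nn_bias:.3f} variance={nn_var:.3f}")
```

In this setup the constant model should show the larger squared bias and the nearest-neighbour model the larger variance.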
ToDo¶
- A/B Testing
- A-A-B Testing
- VC dimension