Useful Concepts in Machine Learning
Calibration of Models¶
When performing classification we often want not only to predict the class label, but also to obtain a probability for that label. If a trained model returns class labels (or uncalibrated scores) rather than actual probabilities, we train a calibration model on top of it to estimate those probabilities.
- Using \(D_{train}\), learn a function \(f(x)\); then, using \(D_{cross \ validation}: \{x_i, y_i\}\), create a table of values \((x_i, \hat{y}_i, y_i)\) sorted in increasing order of \(\hat{y}_i\)
- Break the table into \(k\) chunks of size \(m\); for each chunk \(j \in \{1, \dots, k\}\), compute the mean of the \(y_i\) and call it \(y_{mean}^j\), and the mean of the \(\hat{y}_i\) and call it \(\hat{y}_{mean}^j\)
- \(D_{calibration} : \{\ \hat{y}_{mean}^j, y_{mean}^j \}\), Calibration Plot: \(y_{axis} = y_{mean}^j, \ x_{axis} = \hat{y}_{mean}^j\)
- Now a calibration function is trained to map \(\hat{y}_{mean}^j\) to \(y_{mean}^j\) (where \(y_{mean}^j\) is the empirical probability of the positive class)
- Platt Scaling Calibration : works only if the calibration plot looks like a sigmoid
- Isotonic Regression Calibration : learns a piecewise (non-decreasing) function; works in almost all cases but needs more data than Platt scaling
- CalibratedClassifierCV
- Probability calibration of classifiers
- Predicting Good Probabilities With Supervised Learning
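The binning procedure above can be sketched as follows (a minimal illustration; `calibration_table` and the toy scores are made up for this example, and `sklearn.calibration.calibration_curve` does essentially the same job):

```python
import numpy as np

def calibration_table(y_hat, y_true, k=5):
    """Sort (y_hat, y_true) pairs by y_hat, split into k chunks, and
    return the per-chunk means -- the points of a calibration plot."""
    order = np.argsort(y_hat)
    y_hat = np.asarray(y_hat, dtype=float)[order]
    y_true = np.asarray(y_true, dtype=float)[order]
    chunks_hat = np.array_split(y_hat, k)
    chunks_true = np.array_split(y_true, k)
    x = [c.mean() for c in chunks_hat]   # x-axis: mean predicted score per chunk
    y = [c.mean() for c in chunks_true]  # y-axis: empirical P(positive) per chunk
    return x, y

# Toy scores and labels (illustrative only)
x, y = calibration_table([0.1, 0.2, 0.4, 0.6, 0.8, 0.9],
                         [0,   0,   1,   0,   1,   1], k=3)
# x -> [0.15, 0.5, 0.85], y -> [0.0, 0.5, 1.0]
```

A well-calibrated model would yield points close to the diagonal \(y = x\).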
Random Sampling Consensus (RANSAC)¶
- Draw a random sample from \(D_{Train}\), call it \(D_0\), and build a model \(M_0\) using \(D_0\)
- Compute the outlier set \(O_0\) using the absolute error of the \(M_0\) predictions
- Now get the filtered data \(D_{train}^1 = D_{Train} - O_0\)
- Repeat the above steps to get \(D_{train}^2, D_{train}^3, \dots\)
- When \(M_i\) and \(M_{i+1}\) are nearly identical, stop iterating; \(M_{i+1}\) is a robust model
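The trimming loop above can be sketched with a least-squares linear model (`iterative_trim` and its `threshold` parameter are hypothetical names for this sketch; scikit-learn's `RANSACRegressor` implements the standard sample-and-score variant of the algorithm):

```python
import numpy as np

def iterative_trim(X, y, threshold=10.0, max_iter=10):
    """Repeatedly fit a linear model, drop points whose absolute
    residual exceeds `threshold`, and refit until the model stops changing."""
    X = np.c_[np.ones(len(X)), np.asarray(X, dtype=float)]  # add bias column
    y = np.asarray(y, dtype=float)
    w = np.linalg.lstsq(X, y, rcond=None)[0]        # M_0 on all data
    for _ in range(max_iter):
        resid = np.abs(X @ w - y)
        keep = resid <= threshold                   # D_train^{i+1} = D_train - O_i
        w_new = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        if np.allclose(w, w_new):                   # M_i ~ M_{i+1}: stop
            break
        w = w_new
    return w

# Line y = 2x with one gross outlier; the outlier is trimmed away
X = np.arange(10.0).reshape(-1, 1)
y = 2 * np.arange(10.0)
y[5] = 100.0
w = iterative_trim(X, y, threshold=10.0)  # -> approximately [0.0, 2.0]
```

The residual threshold is the key knob: too tight and inliers get discarded, too loose and outliers survive.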
Loss Minimisation Framework¶
Figure: common surrogate loss functions for classification — zero-one loss (gray), Savage loss (green), logistic loss (orange), exponential loss (purple), tangent loss (brown), square loss (blue).
- Logistic Regression : Logistic Loss (a smooth approximation of the 0-1 loss) + Regulariser
- Linear Regression : Squared Loss + Regulariser
- SVM (classification) : Hinge Loss + Regulariser
Hinge Loss : \(\max(0, 1 - y_i(w^T x_i + b)) = \zeta_i\)
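As a quick numeric check of the formula (a toy sketch; `hinge_loss` and the sample scores are illustrative, with labels \(y_i \in \{-1, +1\}\)):

```python
import numpy as np

def hinge_loss(y, scores):
    """Hinge loss max(0, 1 - y * (w^T x + b)); y in {-1, +1},
    scores are the raw margins w^T x + b."""
    return np.maximum(0.0, 1.0 - y * scores)

y = np.array([1, -1, 1])
scores = np.array([2.0, -0.5, 0.3])
loss = hinge_loss(y, scores)  # -> [0.0, 0.5, 0.7]
```

Note that a confidently correct point (margin ≥ 1) incurs zero loss, while correct-but-close points are still penalised.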
Overfitting, Underfitting, Variance, Bias and Generalisation¶
In general, overfitting results in high variance and underfitting results in high bias.
However, if the targets are dominated by a single constant value (e.g., mostly 0s), an overfit model can also end up with high bias.
ToDo¶
- A/B Testing
- A-A-B Testing
- VC dimension