
Feature Engineering

Note

Before creating new features, research the domain and do a literature review

Handling Different Data

Text Data

Text Processing

Text to Vectors

Bag of Words (BOW)
  • Semantic meaning of words is lost
  • Binary/Boolean BOW
  • Count BOW
  • Uni-gram / Bi-gram / N-gram (see the sketch below)
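
A minimal sketch of the BOW variants above using scikit-learn's CountVectorizer (assuming scikit-learn >= 1.0; the corpus is illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat on the mat", "the dog sat on the log"]

# Count BOW: each cell is the term frequency in the document
count_bow = CountVectorizer()
X_count = count_bow.fit_transform(corpus)

# Binary/Boolean BOW: each cell is 1 if the term occurs, else 0
binary_bow = CountVectorizer(binary=True)
X_binary = binary_bow.fit_transform(corpus)

# N-gram BOW: ngram_range=(1, 2) keeps both uni-grams and bi-grams
bigram_bow = CountVectorizer(ngram_range=(1, 2))
X_bigram = bigram_bow.fit_transform(corpus)

print(count_bow.get_feature_names_out())
print(X_count.toarray())
```
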
TF-IDF (Term Frequency - Inverse Document Frequency)
  • Semantic meaning of words is still lost Ref1 Ref2
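
A minimal sketch with scikit-learn's TfidfVectorizer (corpus is illustrative); each weight is roughly tf(t, d) x log(N / df(t)), smoothed and L2-normalised by default:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the cat sat on the mat", "the dog sat on the log"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)  # sparse (n_docs, n_terms) matrix

# Rare terms ("cat", "dog") get higher weights than common ones ("the", "sat")
print(dict(zip(tfidf.get_feature_names_out(), X.toarray()[0])))
```
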
Word2Vec
  • Ways to train Word2Vec embeddings: CBOW & Skipgram (see the gensim sketch below)
  • CBOW predicts the focus word given its context words, using a shallow neural network
  • Skipgram predicts the context words given the focus word, using a shallow neural network with multiple softmax outputs (= number of context words). It is computationally more expensive to train, but can work with smaller data and infrequent words
  • Algorithmic optimisations for Skipgram and CBOW
    • Hierarchical Softmax - number of softmax computations needed = \(\log_2(Vocab\ Size)\)
    • Negative Sampling - update only a sample of words (drawn based on frequency), including the target word
  • In Word2Vec or Avg Word2Vec, semantic information is learned by the vector
  • References : Word Embedding | Tensorflow Reference | TFIDF Weighted Word2Vec
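
A minimal training sketch with Gensim (reference 2 below), assuming gensim 4.x; the toy corpus and hyperparameters are illustrative:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]  # toy corpus

# sg=1 selects Skipgram (sg=0 would be CBOW); negative=5 enables negative
# sampling with 5 noise words; hs=1 would use hierarchical softmax instead
model = Word2Vec(sentences, vector_size=50, window=2,
                 sg=1, negative=5, hs=0, min_count=1, epochs=50)

vec = model.wv["cat"]                # 50-d embedding for "cat"
print(model.wv.most_similar("cat"))  # nearest neighbours in vector space
```
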
Positional Embedding
BERT & DistilBERT
  • Can be used as a replacement for Word2Vec (see the sketch below)
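
A minimal sketch of extracting contextual embeddings with the Hugging Face transformers library; the model name and mean-pooling choice are assumptions, not the only option:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("feature engineering for text", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the per-token embeddings into one sentence vector (one common choice)
sentence_vec = outputs.last_hidden_state.mean(dim=1)  # shape: (1, 768)
```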

Note

Transformers are more efficient at parallel processing than LSTMs (Reference 1, Reference 2)

Categorical Data

Time Series & Waveform Data

Moving Window

  • Within a moving window of width \(w\), calculate the mean, std dev, median, quantiles, max, min, max - min, local minima, local maxima, and zero crossings (see the sketch below)
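
A minimal sketch with pandas rolling windows (the series and window width are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.sin(np.linspace(0, 10, 200)))  # toy signal
w = 20                                          # window width

feats = pd.DataFrame({
    "mean": s.rolling(w).mean(),
    "std": s.rolling(w).std(),
    "median": s.rolling(w).median(),
    "q25": s.rolling(w).quantile(0.25),
    "range": s.rolling(w).max() - s.rolling(w).min(),
    # zero crossings: count of sign changes inside each window
    "zero_cross": s.rolling(w).apply(
        lambda x: np.sum(np.diff(np.sign(x)) != 0), raw=True),
})
print(feats.dropna().head())
```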

Fourier Transform

Any repeating pattern with amplitude \(A\), time period \(T\), frequency \(F\) and phase \(\phi\) can be broken down into a sum of sine and cosine waves. The Fourier Transform is used to convert a wave from the time domain to the frequency domain, which is very insightful for repeating patterns.

(Figures: an oscillating sine wave; time-domain vs. frequency-domain view of an FFT)

Discrete Fourier Transform (DFT)
  1. The input is a sequence of values of the original variable (one value per time step)
  2. The output is one amplitude (or signal strength) value for each frequency. These are the Fourier coefficients.
  3. This new series is computed using the Fourier formula:
\[X_k = \sum_{n=0}^{N-1} x_n e^{-2 \pi i k n / N}\]

Now we obtain the frequencies that are present in the variable. Each value in the output series is the strength of a specific frequency. If the amplitude at a frequency is high, then that seasonality is important in the original time series (or wave). There exists a computationally optimized form of the DFT called the Fast Fourier Transform (FFT), computed using the Cooley-Tukey algorithm (see the sketch below).
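
A minimal sketch with NumPy's FFT (the signal and sampling rate are illustrative); the peak in the amplitude spectrum recovers the dominant frequency:

```python
import numpy as np

fs = 100                       # sampling rate in Hz (illustrative)
t = np.arange(0, 2, 1 / fs)    # 2 seconds of samples
x = 3 * np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 12 * t)

coeffs = np.fft.rfft(x)                    # Fourier coefficients X_k
freqs = np.fft.rfftfreq(len(x), d=1 / fs)  # frequency of each coefficient
amplitude = np.abs(coeffs) * 2 / len(x)    # normalised amplitude per frequency

print(freqs[np.argmax(amplitude)])  # -> 5.0, the dominant frequency
```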

Graph Data

Node Level Features

Node InDegree and Node OutDegree
Eigenvector Centrality
Clustering Coefficient
DeepWalk
Graph coloring
HOPE
Page Rank

Adamic-Adar Index
Katz Centrality
Shortest Path
Connected Components
HITS Score
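
A minimal sketch of several of the node-level features above with networkx (the toy graph is illustrative):

```python
import networkx as nx

G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4)])  # toy directed graph

in_deg = dict(G.in_degree())    # node in-degree
out_deg = dict(G.out_degree())  # node out-degree
pagerank = nx.pagerank(G)       # PageRank scores
hubs, auths = nx.hits(G)        # HITS hub and authority scores

U = G.to_undirected()
clustering = nx.clustering(U)                # clustering coefficient
eigen = nx.eigenvector_centrality(U)         # eigenvector centrality
adamic_adar = list(nx.adamic_adar_index(U))  # Adamic-Adar index for non-edge pairs
```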

Graph Level Features

Adjacency Matrix
Laplacian Matrix
Bag of Nodes
Weisfeiler-Lehman Kernel
Graphlet Kernels
Path-based Kernels
GraphHopper Kernel
Neural Message Passing
Graph Convolutional Networks
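
A minimal sketch of the two matrix representations above with networkx (the toy graph is illustrative):

```python
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4)])  # toy undirected graph

A = nx.adjacency_matrix(G).toarray()  # adjacency matrix A
L = nx.laplacian_matrix(G).toarray()  # Laplacian L = D - A, with D the degree matrix
print(A)
print(L)
```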

Neighbourhood Overlap Features

Local Overlap Measures
Global Overlap Measures
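
A minimal sketch of local overlap measures with networkx (global measures such as the Katz index consider paths of all lengths and are not shown); the graph is illustrative:

```python
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4)])

# Local overlap: based only on the immediate neighbourhoods of a node pair
common = len(list(nx.common_neighbors(G, 1, 4)))     # shared neighbours of 1 and 4
jaccard = list(nx.jaccard_coefficient(G, [(1, 4)]))  # |N(u) & N(v)| / |N(u) | N(v)|
print(common, jaccard)
```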

Sequence Data

Image Data


Feature Engineering

  • Feature Binning and Indicator Variables
    • Depending on domain knowledge, it can sometimes be useful to convert a real-valued or categorical feature into a bucketed/binned feature based on some rule
    • To find the binning thresholds/rule for a real-valued variable \(X\), train a decision tree (with the entropy criterion) on \(X\) against the target \(Y\) and read the thresholds off the tree's splits (see the sketch below)
  • Interaction Variables:
    • Logical \(N\)-way interactions between \(N\) variables, created using decision trees of depth \(N\)
    • Numerical \(N\)-way interactions between \(N\) variables, created using mathematical operations between these features
  • Feature Orthogonality: more orthogonal/different/uncorrelated features are better for learning a model
  • Slicing Features: features that help us divide the data into separate data-generating processes. Eg: a Device feature {Desktop, Mobile, Tablet} - the way a customer uses a product on mobile differs from the way they use it on a desktop
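
A minimal sketch of decision-tree binning with scikit-learn (the data is illustrative); the internal-node thresholds become the bin edges:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(500, 1))  # single real-valued feature
y = (X[:, 0] > 30) & (X[:, 0] < 70)     # toy target with two cut points

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X, y)

# Internal nodes have a feature index >= 0; leaves are marked with -2
thresholds = sorted(tree.tree_.threshold[tree.tree_.feature >= 0])
print(thresholds)  # approximately [30.0, 70.0] -> use as bin edges

binned = np.digitize(X[:, 0], thresholds)  # bin index / indicator per row
```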

References

  1. Regular Expression Blog
  2. Gensim
  3. Sliding Window Discrete Fourier Transform
  4. https://kenndanielso.github.io/mlrefined/