Thursday, December 11, 2014

Rules of thumb for classification

There are quite a few machine learning classifiers. It is usually hard to say which one is best until each has been tried on the given data and its performance measured. However, there are a few rules of thumb:
  • A linear classifier is the better choice when:
    • The data is sparse (lots of zeros in the feature vector)
    • Feature engineering has been performed, or the features come from deep feature learning
    • The dataset is anywhere up to large, as long as it still fits on one machine
  • A non-linear or kernel-based classifier is the better choice when:
    • There are only a few features (up to tens)
    • There is big data, i.e. a lot of training examples
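To make the "try it and measure" advice concrete, here is a minimal sketch on a toy problem with only two features. It is an illustration under assumptions, not a benchmark: the XOR data, the kernel perceptron (used here as a simple stand-in for both a linear and a kernel-based classifier), the value gamma=2.0 and all function names are my own choices, not something from the rules above.

```python
import math

# Toy XOR data: two features, not linearly separable (assumed example data).
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [-1, 1, 1, -1]

def linear_kernel(a, b):
    # Plain dot product: the kernel perceptron then behaves as a linear model.
    return sum(ai * bi for ai, bi in zip(a, b))

def rbf_kernel(a, b, gamma=2.0):
    # Gaussian (RBF) kernel; gamma is an arbitrary choice for this sketch.
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def decision(x, X, y, alpha, bias, kernel):
    # Weighted vote of the training examples, plus a bias term.
    return bias + sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))

def train_kernel_perceptron(X, y, kernel, epochs=50):
    """Return per-example mistake counts (dual coefficients) and a bias."""
    alpha = [0] * len(X)
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            if yi * decision(xi, X, y, alpha, bias, kernel) <= 0:
                alpha[i] += 1      # remember the misclassified example
                bias += yi
                mistakes += 1
        if mistakes == 0:          # converged: every training point is correct
            break
    return alpha, bias

def accuracy(X, y, alpha, bias, kernel):
    correct = 0
    for xi, yi in zip(X, y):
        pred = 1 if decision(xi, X, y, alpha, bias, kernel) > 0 else -1
        correct += pred == yi
    return correct / len(X)

# Same training procedure, two kernels, measured on the same data.
results = {}
for name, kernel in [("linear", linear_kernel), ("rbf", rbf_kernel)]:
    alpha, bias = train_kernel_perceptron(X, y, kernel)
    results[name] = accuracy(X, y, alpha, bias, kernel)

linear_acc = results["linear"]
rbf_acc = results["rbf"]
print(results)
```

On this data the linear variant cannot classify all four points, because no straight line separates XOR, while the RBF variant fits them. Accuracy here is measured on the training points only, so on real data you would compare the candidates on a held-out set instead.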
Bonus: how to manage an imbalanced training set:
  • Evaluation: area under the ROC or PR (precision-recall) curve
  • Negative subsampling
  • Weights for imbalanced classes (also the regularization parameter)
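The three points above can be sketched in plain Python. This is only one possible reading of each item: the helper names, the 0/1 label convention, the `ratio` parameter and the "balanced" weighting formula (n_samples / (n_classes * class_count), a common heuristic) are assumptions of mine, and `average_precision` is just one way to approximate the area under the PR curve.

```python
import random
from collections import Counter

def subsample_negatives(examples, labels, ratio=1.0, seed=0):
    """Keep every positive and at most ratio * n_pos randomly chosen negatives."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(labels) if label == 1]
    neg = [i for i, label in enumerate(labels) if label == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    kept = sorted(pos + rng.sample(neg, n_keep))
    return [examples[i] for i in kept], [labels[i] for i in kept]

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * count) for cls, count in counts.items()}

def average_precision(labels, scores):
    """Approximate area under the PR curve: mean precision at each positive.

    Ties between scores are not treated specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank   # precision at this recall step
    return total / hits if hits else 0.0

# Hypothetical data: 5 positives among 100 examples.
labels = [1] * 5 + [0] * 95
examples = list(range(100))

sub_x, sub_y = subsample_negatives(examples, labels, ratio=2.0)
weights = balanced_class_weights(labels)
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])

print(len(sub_y), sum(sub_y))   # size of the subsample, positives kept
print(weights)
print(ap)
```

Subsampling throws negatives away, while class weights keep all the data and make mistakes on the rare class cost more. Which of the two works better is, again, something to measure on the data at hand.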
