Classification

The training data comes in (input, target) pairs, and the target values belong to a finite set of classes. The goal is to pick the correct class as often as possible.

  • Analyzing a text to determine if the sentiment is happy or sad.
  • Using hockey players' statistics to predict which of two teams will probably win a match.
  • Analyzing a song to determine if it is classical, techno, pop, or rock.

Binary Classification

There are only two classes to distinguish, typically labeled "positive" and "negative".

Logistic Regression

If x is a Vector

Training

Loss Function

Cost Function
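The headings above (vector input, training, loss, cost) can be tied together in a minimal sketch. It assumes the usual textbook formulation of logistic regression: a sigmoid applied to w·x + b, cross-entropy as the per-example loss, the average loss as the cost, and plain batch gradient descent for training. The function names, the learning rate, and the toy data are illustrative assumptions, not taken from these notes.

```python
import math

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Estimated probability that the vector x belongs to the positive class."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def loss(y_hat, y):
    """Cross-entropy loss for a single example; y is 0 or 1."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def cost(w, b, xs, ys):
    """Average loss over the whole training set."""
    return sum(loss(predict(w, b, x), y) for x, y in zip(xs, ys)) / len(xs)

def gradient_step(w, b, xs, ys, lr=0.1):
    """One batch gradient-descent update; lr is an assumed learning rate."""
    m = len(xs)
    dw = [0.0] * len(w)
    db = 0.0
    for x, y in zip(xs, ys):
        err = predict(w, b, x) - y  # derivative of the loss w.r.t. z
        for j, xj in enumerate(x):
            dw[j] += err * xj / m
        db += err / m
    new_w = [wj - lr * dwj for wj, dwj in zip(w, dw)]
    return new_w, b - lr * db

# Toy, made-up data: the positive class is "both features are 1".
xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ys = [0, 0, 0, 1]

w, b = [0.0, 0.0], 0.0
initial_cost = cost(w, b, xs, ys)
for _ in range(2000):
    w, b = gradient_step(w, b, xs, ys)
final_cost = cost(w, b, xs, ys)
```

With all weights at zero every prediction is 0.5, so the starting cost is ln 2. Training should push the cost down from there. To turn a probability into a class, a threshold of 0.5 is a common but not mandatory choice.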

Evaluation

  • Compare the training and testing results. If training accuracy is near 100% while testing accuracy is much lower, the model is probably overfitting.
  • Testing data is held out from training, so accuracy on it estimates how the model performs on completely new data.
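The comparison described above can be sketched in a few lines. The labels are made up for illustration, and MAX_GAP is a hypothetical threshold. What counts as a worrying gap depends on the task.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Made-up labels, for illustration only.
train_actual    = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
train_predicted = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
test_actual     = ["pos", "neg", "pos", "neg"]
test_predicted  = ["pos", "pos", "neg", "neg"]

train_acc = accuracy(train_predicted, train_actual)
test_acc = accuracy(test_predicted, test_actual)
gap = train_acc - test_acc

# Hypothetical threshold: flag the model when training accuracy
# exceeds testing accuracy by more than this amount.
MAX_GAP = 0.1
likely_overfitting = gap > MAX_GAP
```

Here the model is perfect on the training labels but only half right on the test labels, so the gap is large and the flag is set.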

Confusion Matrix

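T = True, F = False; Pos, Neg, Neu = predicted as positive, negative, neutral. The formulas below are written for the positive class.

Precision: TPos / (TPos + FPos), where FPos counts items wrongly predicted as positive

Recall: TPos / (TPos + FNeg + FNeu), where FNeg and FNeu count positive items wrongly predicted as negative or neutral

F-Measure: 2 * precision * recall / (precision + recall)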

Accuracy: (TPos + TNeg + TNeu) / (all data)
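As a rough sketch, these measures can be computed from a three-class confusion matrix as follows. The class names, the nested-dictionary layout, and the example labels are assumptions made for illustration. Precision divides by everything predicted as the class, and recall divides by everything that actually belongs to it.

```python
CLASSES = ["pos", "neg", "neu"]

def confusion_matrix(actual, predicted, classes=CLASSES):
    """matrix[a][p] = number of items with true class a predicted as p."""
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def precision(matrix, cls):
    """True positives over everything predicted as cls."""
    tp = matrix[cls][cls]
    predicted_as_cls = sum(matrix[a][cls] for a in matrix)
    return tp / predicted_as_cls if predicted_as_cls else 0.0

def recall(matrix, cls):
    """True positives over everything that actually is cls."""
    tp = matrix[cls][cls]
    actually_cls = sum(matrix[cls].values())
    return tp / actually_cls if actually_cls else 0.0

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

def accuracy(matrix):
    """Diagonal (all correct predictions) over all data."""
    correct = sum(matrix[c][c] for c in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total

# Made-up labels, for illustration only.
actual    = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neu", "neu", "neu"]
predicted = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "neu", "neu", "pos"]

m = confusion_matrix(actual, predicted)
p = precision(m, "pos")   # 3 / (3 + 1 + 1)
r = recall(m, "pos")      # 3 / (3 + 1 + 0)
f = f_measure(p, r)
acc = accuracy(m)         # (3 + 2 + 2) / 10
```

The zero-denominator guards return 0.0 by convention. Other conventions, such as treating the value as undefined, are equally defensible.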