6.036 Machine learning

We are destined to meet again.

Overview or Big picture?



This note focuses on topics that the MIT course notes do not cover in much depth, plus some good ideas that came up while implementing the homework.

Chapter 4: Feature representation

The number of features becomes a problem when there are so many of them that some data points could be identified by looking at only a few of their coordinates. That is not the case here.


First come the Feature Transformations, which include:

  1. Scaling
  2. Encoding Discrete Values
  3. Polynomial Features

These are really just hands-on practice with the ways of processing raw data that were covered in lecture.
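A minimal sketch of the three transformations, in plain Python so it has no dependencies. The function names are my own and the homework's versions may differ in signature and data layout; the polynomial helper assumes only two input features.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scaling: shift a numeric feature to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(value, categories):
    """Encoding discrete values: a 0/1 vector with a single 1."""
    return [1 if value == c else 0 for c in categories]


def polynomial_features_2d(x1, x2, order):
    """Polynomial features: all monomials x1^a * x2^b with a + b <= order.

    The first entry (a = b = 0) is the constant term 1.
    """
    return [
        x1 ** (total - j) * x2 ** j
        for total in range(order + 1)
        for j in range(total + 1)
    ]


if __name__ == "__main__":
    print(standardize([1, 2, 3]))
    print(one_hot(2, [1, 2, 3]))
    print(polynomial_features_2d(2, 3, 2))
```

One-hot encoding avoids implying an order or distance between categories, which a plain integer code would do.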


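The first experiment is Evaluating algorithmic and feature choices for AUTO data. This is the simplest setup: the auto data is already provided, so what remains is to choose among three data representations (raw, standard, and one_hot), two algorithms (perceptron and averaged perceptron), and the value of the parameter T, and finally to evaluate the result with eval_classifier and xval_learning_alg.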
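A minimal sketch of the pieces this experiment combines: perceptron, averaged perceptron, and k-fold cross-validation. The helpers `accuracy` and `cross_validate` are my own stand-ins, not the homework's eval_classifier or xval_learning_alg, and the interleaved fold split and list-of-rows data layout are assumptions.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def perceptron(data, labels, T=100):
    """data: list of feature vectors; labels: +1 / -1. Returns (theta, theta_0)."""
    theta, theta_0 = [0.0] * len(data[0]), 0.0
    for _ in range(T):
        for x, y in zip(data, labels):
            if y * (dot(theta, x) + theta_0) <= 0:  # mistake or on the boundary
                theta = [t + y * xi for t, xi in zip(theta, x)]
                theta_0 += y
    return theta, theta_0


def averaged_perceptron(data, labels, T=100):
    """Same updates, but return the average of the parameters over all steps."""
    d = len(data[0])
    theta, theta_0 = [0.0] * d, 0.0
    theta_sum, theta_0_sum = [0.0] * d, 0.0
    for _ in range(T):
        for x, y in zip(data, labels):
            if y * (dot(theta, x) + theta_0) <= 0:
                theta = [t + y * xi for t, xi in zip(theta, x)]
                theta_0 += y
            # Accumulate after every point, whether or not an update happened.
            theta_sum = [s + t for s, t in zip(theta_sum, theta)]
            theta_0_sum += theta_0
    steps = T * len(data)
    return [s / steps for s in theta_sum], theta_0_sum / steps


def accuracy(theta, theta_0, data, labels):
    """Fraction of points classified strictly correctly."""
    hits = sum(1 for x, y in zip(data, labels) if y * (dot(theta, x) + theta_0) > 0)
    return hits / len(data)


def cross_validate(learner, data, labels, k, T=100):
    """Mean held-out accuracy over k folds (fold f holds indices f, f+k, ...)."""
    n = len(data)
    scores = []
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train_x = [data[i] for i in range(n) if i not in held_out]
        train_y = [labels[i] for i in range(n) if i not in held_out]
        test_x = [data[i] for i in sorted(held_out)]
        test_y = [labels[i] for i in sorted(held_out)]
        theta, theta_0 = learner(train_x, train_y, T)
        scores.append(accuracy(theta, theta_0, test_x, test_y))
    return sum(scores) / k


if __name__ == "__main__":
    X = [[2, 1], [3, 2], [1, 3], [-1, -2], [-2, -1], [-3, -3]]
    Y = [1, 1, 1, -1, -1, -1]
    for learner in (perceptron, averaged_perceptron):
        print(learner.__name__, cross_validate(learner, X, Y, k=3, T=10))
```

Comparing feature choices then amounts to calling `cross_validate` once per combination of representation, algorithm, and T, and keeping the best score.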

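The second experiment is Evaluating algorithmic and feature choices for review data. It is tied to a concrete case, a review system, which makes it a natural language processing task. It uses the bag-of-words (BOW) approach, and the final step of picking out the most positive and most negative words is a lot of fun.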
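A minimal sketch of the bag-of-words idea and of reading off the most positive and most negative words from a learned weight vector. The function names, the whitespace tokenizer, and the binary "word present or not" encoding are my assumptions; the weights in the example are made up for illustration, not trained.

```python
def build_dictionary(texts):
    """Map each distinct word to a feature index, in order of first appearance."""
    dictionary = {}
    for text in texts:
        for word in text.lower().split():
            if word not in dictionary:
                dictionary[word] = len(dictionary)
    return dictionary


def bow_features(text, dictionary):
    """Binary bag-of-words vector: 1 if the word occurs in the text, else 0."""
    vec = [0] * len(dictionary)
    for word in text.lower().split():
        if word in dictionary:  # words outside the dictionary are ignored
            vec[dictionary[word]] = 1
    return vec


def extreme_words(theta, dictionary, k):
    """Return (k most positive words, k most negative words) by weight."""
    ranked = sorted(dictionary, key=lambda w: theta[dictionary[w]])
    return ranked[-k:][::-1], ranked[:k]


if __name__ == "__main__":
    reviews = ["great food great service", "terrible food"]
    d = build_dictionary(reviews)
    print(bow_features("great terrible pizza", d))
    # Illustrative weights only; a trained classifier would supply theta.
    theta = [2.0, 0.1, 0.5, -3.0]
    print(extreme_words(theta, d, 1))
```

Because each coordinate of theta belongs to exactly one word, sorting the weights gives a direct, readable view of what the linear classifier learned.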

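The last experiment is Evaluating features for MNIST data, the dataset that corresponds to computer vision (CV); it involves compressing the data set into smaller feature representations.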
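A minimal sketch of what compressing an image into fewer features can look like, assuming the compression means summarizing each image by per-row or per-column averages instead of using every pixel. The function names and the tiny 2x2 "image" are my own illustration, not the homework's code or data.

```python
def flatten(image):
    """Raw features: every pixel, row by row (m * n values)."""
    return [pixel for row in image for pixel in row]


def row_average_features(image):
    """One value per row: the mean pixel intensity of that row (m values)."""
    return [sum(row) / len(row) for row in image]


def col_average_features(image):
    """One value per column: the mean pixel intensity of that column (n values)."""
    n_rows = len(image)
    return [sum(col) / n_rows for col in zip(*image)]


if __name__ == "__main__":
    image = [[0, 2],
             [4, 6]]
    print(flatten(image))
    print(row_average_features(image))
    print(col_average_features(image))
```

For a 28x28 image this shrinks 784 raw features down to 28, at the cost of throwing away where along each row or column the ink sits.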
