6.036 Machine learning

We are destined to meet again.

Overview or Big picture?


Introduction

This note focuses on topics that the MIT notes don't cover in much depth, plus some great ideas that came up while implementing the HW.

Chapter 4 : Feature representation

The number of features becomes a problem when there are so many of them that some data points could be identified by looking at just a few of their coordinates. Here that is not the case.

The HW for this part is really great: it has you try out different data representations and analyze how each affects the final accuracy.

First come the Feature Transformations, including:

  1. Scaling
  2. Encoding Discrete Values
  3. Polynomial Features

These are hands-on versions of the ways of processing raw data covered in lecture.
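The three transformations above can be sketched roughly as follows; the function names and the column-vector data layout (each column is one data point) are my own assumptions for illustration, not the handout's actual code:

```python
import numpy as np

def standardize(X):
    # Scaling: shift each feature (row) to zero mean and unit variance.
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    return (X - mu) / sigma

def one_hot(index, k):
    # Encoding discrete values: index in {0, ..., k-1} -> k-dim indicator vector.
    v = np.zeros((k, 1))
    v[index, 0] = 1.0
    return v

def polynomial_features_1d(x, order):
    # Polynomial features for a scalar x: [1, x, x^2, ..., x^order].
    return np.array([[x ** i] for i in range(order + 1)])
```

Standardizing matters because the perceptron's updates are sensitive to feature scale, and one-hot encoding avoids imposing a fake ordering on discrete values.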

Next come the Experiments, which apply everything learned so far — very nice. The specific datasets and explanations are in lab3.

The first is Evaluating algorithmic and feature choices for AUTO data, the simplest setup: the auto data is already prepared, and you then choose among three data representations (raw, standard, and one_hot), two algorithms (perceptron and averaged perceptron), and the value of the parameter T, and finally evaluate with eval_classifier and xval_learning_alg.
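A minimal sketch of what that evaluation loop might look like — the signatures of eval_classifier and xval_learning_alg below are my guesses at a plausible interface, not the course's actual code:

```python
import numpy as np

def eval_classifier(learner, X_train, y_train, X_test, y_test):
    # Assumed interface: learner returns (theta, theta_0);
    # X is d x n (columns are points), y is 1 x n with labels in {+1, -1}.
    theta, theta_0 = learner(X_train, y_train)
    preds = np.sign(theta.T @ X_test + theta_0)
    return np.mean(preds == y_test)

def xval_learning_alg(learner, X, y, k):
    # k-fold cross-validation: train on k-1 folds, test on the held-out
    # fold, and average the test accuracy over all k splits.
    X_splits = np.array_split(X, k, axis=1)
    y_splits = np.array_split(y, k, axis=1)
    scores = []
    for i in range(k):
        X_train = np.concatenate(X_splits[:i] + X_splits[i + 1:], axis=1)
        y_train = np.concatenate(y_splits[:i] + y_splits[i + 1:], axis=1)
        scores.append(eval_classifier(learner, X_train, y_train,
                                      X_splits[i], y_splits[i]))
    return np.mean(scores)
```

Cross-validation is the more trustworthy of the two evaluations here, since a single train/test split can be lucky or unlucky.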

The second experiment is Evaluating algorithmic and feature choices for review data. This one works through a concrete case, a review system — i.e. natural language processing — using the bag-of-words (BOW) approach; the final step of picking out the most positive and most negative words is very interesting.
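A toy bag-of-words sketch, plus how the most positive and most negative words can be read off a linear classifier's weights; the helper names and example texts are made up for illustration:

```python
import numpy as np

def make_vocab(texts):
    # Assign each distinct word an index into the feature vector.
    vocab = {}
    for t in texts:
        for w in t.lower().split():
            if w not in vocab:
                vocab[w] = len(vocab)
    return vocab

def bow_vector(text, vocab):
    # Each review becomes a count vector over the vocabulary.
    v = np.zeros((len(vocab), 1))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w], 0] += 1
    return v

def extreme_words(theta, vocab, n=2):
    # The words whose weights are largest (smallest) are the classifier's
    # "most positive" ("most negative") features.
    rev = {i: w for w, i in vocab.items()}
    order = np.argsort(theta.flatten())
    return [rev[i] for i in order[-n:]], [rev[i] for i in order[:n]]
```

Inspecting the extreme weights like this is a nice sanity check that the classifier has learned sentiment rather than noise.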

The last experiment is Evaluating features for MNIST data — the classic computer-vision dataset — which involves compressing the image data into lower-dimensional features.
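One simple compression idea is averaging each row (or column) of an image, turning a 28x28 pixel grid into a 28-dimensional feature vector. A sketch under that assumption, with illustrative function names:

```python
import numpy as np

def row_average_features(img):
    # Compress an m x n image into an m x 1 vector of row means.
    return img.mean(axis=1, keepdims=True)

def col_average_features(img):
    # Same idea along columns: an n x 1 vector of column means.
    return img.mean(axis=0, keepdims=True).T
```

The interesting part of the experiment is how much accuracy such aggressive compression costs compared with using the raw pixels.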

Going a step further, I personally feel that even though implementations are already provided for everything from load_data through the final step of turning the data into vectors, it is still worth mastering how to implement them yourself. I plan to get to that later~