In this note (blog), I will cover some basic ideas and background knowledge you are expected to have. This can be quite useful for CMU's deep learning course.

So the layout is as follows:

  1. Pytorch tutorial
  2. Mathematics tutorial

Pytorch

Math

Matrix derivatives

Jacobian matrix explanation (2 aspects: derivative and integral):

I will introduce the main idea of the derivative of a vector function; watch the video to get the idea of the integral :)

https://www.youtube.com/watch?v=wCZ1VEmVjVo

Suppose we have a vector function $f(x, y) = \big(f_1(x, y),\, f_2(x, y)\big)$, and we want to calculate its derivative. One reasonable idea is to differentiate each component separately and stack the results. This is feasible, but it still feels like there is a little bit left to go: we are just viewing $f$ as two separate multivariate functions, computing each gradient vector, and composing them into a derivative matrix (arranged in a specific way so that matrix multiplication works out, without other meaning). This loses the wholeness and the overall perspective; we get the values, but we haven't captured the relations and structure.
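Concretely, this component-wise view just stacks the two gradients as the rows of a matrix (written here for the $f_1$, $f_2$ notation assumed above):

$$
\frac{\partial f}{\partial (x, y)}
= \begin{pmatrix} \nabla f_1^{\top} \\ \nabla f_2^{\top} \end{pmatrix}
= \begin{pmatrix}
\frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\
\frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y}
\end{pmatrix}
$$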

So we need to think from the perspective of the whole picture; see the picture below to get a feeling for regarding $f$ as a nonlinear transformation.

[Figure: $f$ viewed as a nonlinear transformation of the input plane]

Since $f$ is differentiable, let's zoom in, and we get a linear picture like this:

[Figure: zoomed-in view near a point, where $f$ looks like a linear map]

Now think of adding a perturbation at a point $(x_0, y_0)$: what will the output look like? Instead of imagining a tangent hyperplane and getting a 1-dimensional output as with a scalar multivariate function, we now have a 2-dimensional output. The intuitive idea is that a perturbation in the input space such as $(dx, dy)$ will cause the output point to move along the white and yellow lines in the picture above.

The white line corresponds to the first column of the matrix, determining how much changing the input $x$ will affect the output along the white line (it corresponds to the $x$-axis of the input space). The yellow line corresponds to the second column of the matrix, and the explanation is similar.

In fact, this matrix is called the Jacobian matrix. The Jacobian matrix is the matrix representing the best linear approximation of $f$ near the point $(x_0, y_0)$.
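Written out (still for the assumed two-dimensional $f$), the first-order approximation makes the column picture explicit: the output perturbation is $dx$ times the first column plus $dy$ times the second column, with all partials evaluated at $(x_0, y_0)$:

$$
f(x_0 + dx,\, y_0 + dy) \approx f(x_0, y_0) + J \begin{pmatrix} dx \\ dy \end{pmatrix}
= f(x_0, y_0)
+ dx \begin{pmatrix} \frac{\partial f_1}{\partial x} \\ \frac{\partial f_2}{\partial x} \end{pmatrix}
+ dy \begin{pmatrix} \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial y} \end{pmatrix},
\qquad
J = \begin{pmatrix}
\frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\
\frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y}
\end{pmatrix}
$$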

Let's look at an example: suppose we have $y = Wx$, what is the derivative of $y$ with respect to $x$? The answer is $W$, and the reason is obvious: the Jacobian matrix now degenerates into the constant matrix $W$.
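A quick numeric sanity check of this, sketched with `torch.autograd.functional.jacobian` (the particular matrix `W` here is just made up for illustration):

```python
import torch
from torch.autograd.functional import jacobian

# f(x) = W x is linear, so its Jacobian (numerator layout, which is what
# jacobian() returns) should be the constant matrix W, independent of x.
W = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])

def f(x):
    return W @ x

x0 = torch.randn(2)
J = jacobian(f, x0)          # J[i, j] = d f_i / d x_j, shape (2, 2)

print(torch.allclose(J, W))  # True
```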

The reason the result here is $W$ rather than $W^{\top}$ is that the numerator layout is used; that is the convention in materials about the Jacobian. Machine learning, however, mostly uses the denominator layout, in which the final answer is $W^{\top}$. The numerator-layout and denominator-layout answers are transposes of each other (when the chain rule is applied, each factor is transposed individually rather than transposing the whole product, and the multiplication order stays unchanged). The reason is that the denominator layout adopts the convention $df = \left(\frac{\partial f}{\partial x}\right)^{\top} dx$, i.e. it is the transpose of the gradient multiplied by $dx$ that gives $df$, whereas in the numerator layout $df = \frac{\partial f}{\partial x}\, dx$.
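Summarizing the two conventions in one line (my own summary, with $J$ the Jacobian from above):

$$
\text{numerator layout: } \frac{\partial f}{\partial x} = J,\ \ df = \frac{\partial f}{\partial x}\, dx
\qquad
\text{denominator layout: } \frac{\partial f}{\partial x} = J^{\top},\ \ df = \left(\frac{\partial f}{\partial x}\right)^{\!\top} dx
$$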

For an example in the denominator layout, see below:

For any linear equation of the kind $y = Wx$, the derivative of $y$ with respect to $x$ is $W^{\top}$. The derivative of $y$ with respect to $W$ is $x^{\top}$. (We will explain the rationale behind this in class.) Also, the derivative with respect to a transpose is the transpose of the derivative, so the derivative of $y$ with respect to $x^{\top}$ is $W$, but the derivative of $y$ with respect to $W^{\top}$ is $x$.
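As a sketch of how these denominator-layout shapes show up in practice, here is a small autograd check (the scalar loss `y.sum()` and the tensor shapes are just illustrative): for a scalar loss $L$, backprop gives $\partial L/\partial x = W^{\top}\,\partial L/\partial y$ and $\partial L/\partial W = (\partial L/\partial y)\, x^{\top}$, and each gradient has the same shape as the variable it is taken with respect to.

```python
import torch

# y = W x with a scalar loss on top; check the backprop rules
# dL/dx = W^T dL/dy  and  dL/dW = dL/dy x^T
W = torch.randn(3, 2, requires_grad=True)
x = torch.randn(2, 1, requires_grad=True)

y = W @ x                    # shape (3, 1)
L = y.sum()                  # any scalar loss works; sum() makes dL/dy all ones
L.backward()

dL_dy = torch.ones(3, 1)     # dL/dy for L = y.sum()
print(torch.allclose(x.grad, W.T @ dL_dy))   # True: dL/dx = W^T dL/dy
print(torch.allclose(W.grad, dL_dy @ x.T))   # True: dL/dW = dL/dy x^T
```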