Movement Operations¶
Operations like transpose, reshape and squeeze only move the elements to other locations. The number of elements would not change after these operations. The back propagation of a movement is another movement, and more specifically, its inverse operation.
transpose¶
The inverse operation of transpose is transpose.
-
class
auto_diff.
OpTranspose
(x: auto_diff.op.operation.Operation, axes: Optional[Sequence[int]] = None, **kwargs)[source]¶ Bases:
auto_diff.op.operation.Operation
Transpose the tensor.
Basic operation without axes:
\[Y = X^T\]Partial derivative of a single element:
\[\begin{split}\begin{array}{rcl} \displaystyle \frac{\partial L}{\partial x_{ij}} &=& \displaystyle \sum_{i,j} \frac{\partial L}{\partial y_{ij}} \cdot \frac{\partial y_{ij}}{\partial x_{ij}} \\ &=& \displaystyle \frac{\partial L}{\partial y_{ji}} \cdot \frac{\partial y_{ji}}{\partial x_{ij}} \\ &=& \displaystyle \frac{\partial L}{\partial y_{ji}} \cdot \frac{\partial x_{ij}}{\partial x_{ij}} \\ &=& \displaystyle \frac{\partial L}{\partial y_{ji}} \\ \end{array}\end{split}\]Matrix derivative:
\[\frac{\partial L}{\partial X} = \left ( \frac{\partial L}{\partial Y} \right )^T\]Generally, axes should be a permutation of the dimensions, suppose there is a function \(f\) that maps from \((0, 1, \dots, k)\) to the new permutation, then this transpose operation would be:
\[y_{i_1, i_2, \dots, i_k} = x_{f(i_1), f(i_2), \dots, f(i_k)}\]The partial derivative of \(x_{i_1, i_2, \dots, i_k}\) is 1 only with \(y_{f(i_1)^{-1}, f(i_2)^{-1}, \dots, f(i_k)^{-1}}\). Therefore the derivative should be another transpose operation with inverse mapping function \(f^{-1}\).