Movement Operations

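Operations like transpose, reshape and squeeze only move elements to other locations; the number of elements does not change. Because each output element is a copy of exactly one input element, the backward pass only has to route each gradient back to the position it came from. The backward pass of a movement is therefore another movement, namely its inverse operation applied to the upstream gradient.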
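A minimal sketch of this idea, assuming plain NumPy arrays rather than `auto_diff` operations. The helpers `reshape_forward`, `reshape_backward` and `loss` are hypothetical and exist only for illustration: the backward pass of a reshape (or of a squeeze, which is a reshape that drops size-1 axes) just puts the upstream gradient back into the input's shape.

```python
import numpy as np

# Hypothetical helpers for illustration; these are not auto_diff functions.
def reshape_forward(x, new_shape):
    """Move the elements into a new shape; the element count is unchanged."""
    return x.reshape(new_shape)

def reshape_backward(grad_y, x_shape):
    """Inverse movement: put the upstream gradient back into the input shape."""
    return grad_y.reshape(x_shape)

x = np.arange(6.0).reshape(2, 3)
y = reshape_forward(x, (3, 2))
grad_y = np.arange(6.0).reshape(3, 2)       # an arbitrary upstream gradient dL/dY
grad_x = reshape_backward(grad_y, x.shape)  # dL/dX, same shape as x

def loss(x_):
    # Hypothetical loss that is linear in Y, so dL/dY == grad_y exactly
    return float(np.sum(grad_y * reshape_forward(x_, (3, 2))))

# squeeze is a reshape that drops size-1 axes; its backward restores them
z = np.zeros((1, 4, 1))
grad_squeezed = np.arange(4.0)           # dL/d(squeeze(z)), shape (4,)
grad_z = grad_squeezed.reshape(z.shape)  # back to shape (1, 4, 1)
```

Because the loss is linear in Y, nudging one input element by 1 changes the loss by exactly the corresponding entry of `grad_x`.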

transpose

The inverse of a transpose is another transpose, using the inverse permutation of the axes.
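A small sketch of computing that inverse permutation, assuming NumPy's `numpy.transpose` axis semantics. The helper `inverse_axes` is hypothetical; for a permutation, `numpy.argsort` returns the same result.

```python
import numpy as np

def inverse_axes(axes):
    """Return the permutation that undoes `axes` (hypothetical helper)."""
    inv = [0] * len(axes)
    for d, a in enumerate(axes):
        # Output axis d came from input axis a, so input axis a goes back to d
        inv[a] = d
    return tuple(inv)

x = np.arange(24).reshape(2, 3, 4)
axes = (1, 2, 0)
y = np.transpose(x, axes)                     # shape (3, 4, 2)
x_back = np.transpose(y, inverse_axes(axes))  # shape (2, 3, 4) again
```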

class auto_diff.OpTranspose(x: auto_diff.op.operation.Operation, axes: Optional[Sequence[int]] = None, **kwargs)

Bases: auto_diff.op.operation.Operation

Transpose the tensor.

Basic operation for a matrix with axes omitted:

\[Y = X^T\]

Partial derivative of a single element. Since \(y_{kl} = x_{lk}\), only the term with \(k = j\) and \(l = i\) survives the sum:

\[\begin{split}\begin{array}{rcl} \displaystyle \frac{\partial L}{\partial x_{ij}} &=& \displaystyle \sum_{k,l} \frac{\partial L}{\partial y_{kl}} \cdot \frac{\partial y_{kl}}{\partial x_{ij}} \\ &=& \displaystyle \frac{\partial L}{\partial y_{ji}} \cdot \frac{\partial y_{ji}}{\partial x_{ij}} \\ &=& \displaystyle \frac{\partial L}{\partial y_{ji}} \cdot \frac{\partial x_{ij}}{\partial x_{ij}} \\ &=& \displaystyle \frac{\partial L}{\partial y_{ji}} \\ \end{array}\end{split}\]

Matrix derivative:

\[\frac{\partial L}{\partial X} = \left ( \frac{\partial L}{\partial Y} \right )^T\]
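The matrix result can be checked numerically. This sketch assumes NumPy and a hypothetical loss \(L = \sum_{k,l} w_{kl} \, y_{kl}\), which is linear in \(Y\), so \(\partial L / \partial Y = W\) exactly and the formula predicts \(\partial L / \partial X = W^T\). Central finite differences over every \(x_{ij}\) should agree.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3))
W = rng.standard_normal((3, 2))  # plays the role of dL/dY (Y = X^T has shape (3, 2))

def loss(X_):
    # Hypothetical loss, linear in Y = X^T, so dL/dY = W
    return np.sum(W * X_.T)

# Central finite differences for every element x_ij
eps = 1e-6
grad_fd = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = eps
        grad_fd[i, j] = (loss(X + E) - loss(X - E)) / (2 * eps)

grad_formula = W.T  # dL/dX = (dL/dY)^T
```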

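In general, axes should be a permutation of the dimensions \((0, 1, \dots, k-1)\) of a \(k\)-dimensional input. Let \(f\) be this permutation, so that axis \(d\) of the output is axis \(f(d)\) of the input. The transpose operation is then:

\[y_{i_0, i_1, \dots, i_{k-1}} = x_{i_{f^{-1}(0)}, i_{f^{-1}(1)}, \dots, i_{f^{-1}(k-1)}}\]

Equivalently, each element \(x_{i_0, i_1, \dots, i_{k-1}}\) appears exactly once in the output, at \(y_{i_{f(0)}, i_{f(1)}, \dots, i_{f(k-1)}}\), so that is the only output element whose partial derivative with respect to it is 1 (all others are 0). Therefore

\[\frac{\partial L}{\partial x_{i_0, i_1, \dots, i_{k-1}}} = \frac{\partial L}{\partial y_{i_{f(0)}, i_{f(1)}, \dots, i_{f(k-1)}}}\]

which is another transpose operation, with the inverse mapping function \(f^{-1}\) as its axes.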
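A minimal sketch of the general forward and backward passes, assuming NumPy arrays and `numpy.transpose` axis semantics. `TransposeSketch` is a hypothetical stand-in written for illustration, not the `auto_diff` implementation. It also checks the element-level index formula, using `numpy.argsort` to obtain \(f^{-1}\).

```python
import itertools
import numpy as np

class TransposeSketch:
    """Hypothetical stand-in for a transpose node; not the auto_diff API."""

    def __init__(self, axes=None):
        self.axes = axes

    def forward(self, x):
        # axes=None reverses the dimensions, as documented for OpTranspose
        if self.axes is None:
            self.axes_ = tuple(reversed(range(x.ndim)))
        else:
            self.axes_ = tuple(self.axes)
        return np.transpose(x, self.axes_)

    def backward(self, grad_y):
        # The gradient is a transpose with the inverse permutation f^{-1};
        # argsort of a permutation is its inverse
        inv_axes = tuple(int(a) for a in np.argsort(self.axes_))
        return np.transpose(grad_y, inv_axes)


x = np.arange(24.0).reshape(2, 3, 4)
op = TransposeSketch(axes=(1, 2, 0))  # f = (1, 2, 0)
y = op.forward(x)                     # shape (3, 4, 2)

# Element-level check of y_{i_0..i_{k-1}} = x_{i_{f^{-1}(0)}, ..., i_{f^{-1}(k-1)}}
f_inv = tuple(int(a) for a in np.argsort(op.axes_))
for idx in itertools.product(*(range(n) for n in y.shape)):
    assert y[idx] == x[tuple(idx[m] for m in f_inv)]

grad_y = 0.5 * np.arange(24.0).reshape(y.shape)  # arbitrary upstream gradient
grad_x = op.backward(grad_y)                      # shape (2, 3, 4)
```

Each entry of `grad_x` at index \((i_0, \dots, i_{k-1})\) equals the entry of `grad_y` at \((i_{f(0)}, \dots, i_{f(k-1)})\), matching the derivative above.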

__init__(x: auto_diff.op.operation.Operation, axes: Optional[Sequence[int]] = None, **kwargs)

Parameters:
x – Input operation.
axes – A permutation of dimensions. If it is None, the dimensions are reversed.
kwargs – Arguments passed to the parent class.
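With `axes=None` the dimensions are reversed. Assuming this matches the default of `numpy.transpose`, a quick NumPy check shows that reversal is its own inverse, so under that default the backward pass is again a default transpose. The names below are illustrative only.

```python
import numpy as np

x = np.arange(24.0).reshape(2, 3, 4)
rev = tuple(reversed(range(x.ndim)))  # (2, 1, 0): what axes=None means

# The default transpose equals an explicit reversal of the axes
y_default = np.transpose(x)
y_explicit = np.transpose(x, rev)

# Reversal is self-inverse, so a second default transpose restores x
x_back = np.transpose(y_default)
```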