Penrose notation for deep learning.
Multidimensional arrays are the basic building blocks of any deep neural network, and of many other models in statistics and machine learning. However, even for relatively simple operations the notation becomes monstrous:
\[ \sum_{i_1=1}^n \sum_{i_2=1}^n \sum_{i_3=1}^n \sum_{i_4=1}^n p_{i_1} T_{i_1 i_2} O_{i_2 j_1} T_{i_2 i_3} O_{i_3 j_2} T_{i_3 i_4} O_{i_4 j_3} \]
If you are already familiar with the problem, you may guess that it is the probability of observing the sequence \((j_1, j_2, j_3)\) in a Hidden Markov Model. If you are not, it takes time and effort to parse and comprehend the formula. In fact, I made a typo when writing it for the first time.
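To make the formula concrete, here is a minimal numerical sketch using torch.einsum, with hypothetical sizes and randomly generated, row-normalized matrices; \(p\) is the initial distribution, \(T\) the transition matrix and \(O\) the emission matrix:

```python
import torch

n, m = 5, 4                                        # hypothetical: n hidden states, m observables
p = torch.rand(n); p /= p.sum()                    # initial distribution p_i
T = torch.rand(n, n); T /= T.sum(1, keepdim=True)  # transition matrix T_{i i'}
O = torch.rand(n, m); O /= O.sum(1, keepdim=True)  # emission matrix O_{i j}
j1, j2, j3 = 0, 2, 1                               # an observed sequence

# indices a, b, c, d play the roles of i_1, i_2, i_3, i_4; all are summed over
prob = torch.einsum('a,ab,b,bc,c,cd,d->',
                    p, T, O[:, j1], T, O[:, j2], T, O[:, j3])
```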
This notation has some problems: it is error-prone (witness the typo above), and the structure of the computation is buried in index bookkeeping.
In this expository article we introduce tensor diagram notation in the context of deep learning. That is, we focus on arrays of real numbers and the relevant operations. We present it as a convenient notation for summation over array indices, nothing less, nothing more. In this notation, the equation above becomes:
In this article we use tensor as in torch.Tensor or TensorFlow, i.e. a multidimensional array of numbers with some additional structure. This is a very specific case of what a mathematician would call a tensor. In this context, the tensor product is the outer product. (A note for mathematical purists and fetishists: here we work in a finite-dimensional Hilbert space over the real numbers, in a fixed basis.) Also, in physics tensors need to fulfill certain transformation criteria; see II-02: Differential Calculus of Vector Fields from The Feynman Lectures on Physics:
> it is not generally true that any three numbers form a vector. It is true only if, when we rotate the coordinate system, the components of the vector transform among themselves in the correct way.
Tensor diagrams were invented by Penrose (Penrose 1971). For a first contact with tensor diagrams, I suggest gazing at the beautiful diagrams in (Bradley 2019). If you have some background in quantum mechanics, go for a short introduction (Biamonte and Bergholm 2017) or a slightly longer one (Bridgeman and Chubb 2017). For a complete introduction, I suggest the book (Coecke and Kissinger 2017) or the lecture notes (Biamonte 2020).
Tensor diagrams are a popular tool for quantum state decompositions in condensed matter physics (Verstraete, Murg, and Cirac 2008).
A scalar \(c\), a vector \(v_i\), a matrix \(M_{ij}\) and a (third-order) tensor \(T_{ijk}\) are represented by:
Each loose end corresponds to an index.
A dot product of two vectors - traditionally: \(\vec{v} \cdot \vec{u}\) or in quantum mechanics \(\langle u | v \rangle\).
A vector transformed by a matrix, traditionally \(A \vec{v}\) or in quantum mechanics \(A | v \rangle\).
A matrix multiplied by a matrix, traditionally: \(A B\).
An outer product of two vectors. It is commonly used in quantum mechanics, where it is written as \(| u \rangle \langle v |\). In particular, for \(| u \rangle = | v \rangle\) we get a projection operator \(| v \rangle \langle v |\).
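Each of the four operations above can be expressed directly with torch.einsum, where every repeated index is summed over; a minimal sketch with arbitrary sizes:

```python
import torch

u, v = torch.rand(3), torch.rand(3)
A, B = torch.rand(3, 3), torch.rand(3, 3)

dot = torch.einsum('i,i->', u, v)      # dot product <u|v>
Av = torch.einsum('ij,j->i', A, v)     # matrix-vector product A|v>
AB = torch.einsum('ij,jk->ik', A, B)   # matrix-matrix product AB
outer = torch.einsum('i,j->ij', u, v)  # outer product |u><v|
```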
TODO
\(\sum_{ij} v_i M_{ij} v_j\) (or, in quantum mechanics, \(\langle v | M | v \rangle\))
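In einsum form this quadratic form reads as follows (a one-line sketch, assuming a square matrix \(M\)):

```python
import torch

v, M = torch.rand(3), torch.rand(3, 3)
quad = torch.einsum('i,ij,j->', v, M, v)  # sum_ij v_i M_ij v_j
```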
TODO
TO FIX
Let’s introduce a symbol for a tensor that has 1s whenever all of its indices are equal, and 0s otherwise.
In the case of no dimensions, it is just 1, which is not that interesting. For one dimension, it is \(v = (1, 1, \ldots, 1)\). For two, \(M\) is the identity matrix, i.e. the Kronecker delta \(\delta_{ij}\). For three, \(T\) is a tensor for which \(T_{ijk} = 1\) if \(i = j = k\), and 0 otherwise.
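These tensors are easy to construct explicitly; a small sketch in PyTorch:

```python
import torch

n = 4
ones_vec = torch.ones(n)  # v_i = 1 for every i
identity = torch.eye(n)   # M_ij = 1 if i == j, else 0 (Kronecker delta)

# T_ijk = 1 if i == j == k, else 0
T = torch.zeros(n, n, n)
idx = torch.arange(n)
T[idx, idx, idx] = 1.0
```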
TODO:
TO FIX
TO FIX
TODO: dotize “i” (on top)
TODO
TODO
Hidden Markov Models (Rabiner 1989).
TODO
TODO
In deep learning, data is structured as 3- to 5-dimensional tensors. The most typical dimensions are:

* sample number (size: batch size),
* features, also called embeddings,
* spatial coordinates (1D, 2D or 3D).
The order of the dimensions depends on the deep learning framework and on the concrete functions. For a discussion, see PyTorch tensor dimension names.
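For example, a batch of RGB images in PyTorch's default NCHW layout might look like this (sizes are hypothetical):

```python
import torch

# (sample, channel, height, width): 32 RGB images of 64x64 pixels
images = torch.rand(32, 3, 64, 64)
```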
Let’s focus on a two-dimensional image with RGB color channels for input.
If we want to average over all samples, e.g. for data exploration, we can sum in the following way:
Channel average values will then give the mean intensity of red, green and blue across the dataset.
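A minimal sketch of this reduction with einsum, using the hypothetical batch shape from above:

```python
import torch

images = torch.rand(32, 3, 64, 64)  # (sample, channel, height, width)

# sum over sample, height and width; keep the channel axis, then normalize
channel_means = torch.einsum('schw->c', images) / (32 * 64 * 64)
# equivalent built-in: images.mean(dim=(0, 2, 3))
```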
Mention:
Note:
TODO
TODO
TODO
TODO
And maybe Batch normalization
Feynman diagrams (Kaiser 2005) (used for unbounded operators on infinite-dimensional Hilbert spaces) and quantum computing.
TODO: Do LSTM diagrams qualify?
Tensor diagrams are cool!
Written in R Markdown Distill.
Other inspirations:
Idea: