Tensor diagrams

Penrose notation for deep learning.

Piotr Migdał (https://p.migdal.pl, Quantum Flytrap https://quantumgame.io), Claudia Zendejas-Morales (http://claudiazm.xyz, Quantum Flytrap https://quantumflytrap.com)
2021-10-19

Multidimensional arrays are the basic building blocks of any deep neural network and of many other models in statistics and machine learning. However, even for relatively simple operations, the notation becomes monstrous:

\[ \sum_{i_1=1}^n \sum_{i_2=1}^n \sum_{i_3=1}^n \sum_{i_4=1}^n p_{i_1} T_{i_1 i_2} O_{i_2 j_1} T_{i_2 i_3} O_{i_3 j_2} T_{i_3 i_4} O_{i_4 j_3} \]

If you are already familiar with the problem, you may guess that it is the probability of observing the sequence \((j_1, j_2, j_3)\) in a Hidden Markov Model. If you are not, it takes time and effort to parse and comprehend the formula. In fact, I made a typo when writing it for the first time.
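
To see how the formula maps onto code, here is a minimal NumPy sketch using np.einsum; the toy numbers and the array names p (initial distribution), T (transitions), and O (emissions) are made up for illustration.

```python
import numpy as np

# A toy HMM with 2 hidden states and 3 observable symbols.
p = np.array([0.6, 0.4])                 # initial state distribution p_i
T = np.array([[0.7, 0.3],
              [0.2, 0.8]])               # transition matrix T_{i i'}
O = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])          # emission matrix O_{i j}

# probs[j1, j2, j3] is the probability of observing the sequence (j1, j2, j3),
# following the formula above term by term (a, b, c, d are the hidden indices).
probs = np.einsum('a,ab,bw,bc,cx,cd,dy->wxy', p, T, O, T, O, T, O)

assert np.isclose(probs.sum(), 1.0)  # probabilities over all sequences sum to 1
```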

This index notation has several problems.

In this expository article we introduce tensor diagram notation in the context of deep learning. That is, we focus on arrays of real numbers and the relevant operations. We present it as a convenient notation for summation over indices, nothing less, nothing more. In this notation, the equation above becomes:

In this article we use tensor as in torch.Tensor or TensorFlow, i.e. a multidimensional array of numbers with some additional structure. This is a very specific case of what a mathematician would call a tensor. In this context the tensor product is the outer product. (A note for mathematical purists and fetishists: here we work in a finite-dimensional Hilbert space over the real numbers, in a fixed basis.) Also, in physics, tensors need to fulfill certain transformation criteria; see II 02: Differential Calculus of Vector Fields from The Feynman Lectures on Physics:

it is not generally true that any three numbers form a vector. It is true only if, when we rotate the coordinate system, the components of the vector transform among themselves in the correct way.

Background

Tensor diagrams were invented by Penrose (Penrose 1971). For a first contact with tensor diagrams, I suggest gazing at the beautiful diagrams in (Bradley 2019). If you have some background in quantum mechanics, go for a short introduction (Biamonte and Bergholm 2017) or a slightly longer one (Bridgeman and Chubb 2017).

For a complete introduction, I suggest the book (Coecke and Kissinger 2017) or the lecture notes (Biamonte 2020).

Tensor diagrams are a popular tool for quantum state decompositions in condensed matter physics (Verstraete, Murg, and Cirac 2008).

Key parts

Tensors

A scalar \(c\), a vector \(v_i\), a matrix \(M_{ij}\) and a (third-order) tensor \(T_{ijk}\) are represented by:

Each loose end corresponds to an index.

Basic operations

Scalar product

A dot product of two vectors - traditionally: \(\vec{v} \cdot \vec{u}\) or in quantum mechanics \(\langle u | v \rangle\).

\[\sum_{i} a_{i} b_{i}\]
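
As a sanity check, this is a one-liner with np.einsum; the vectors a and b are toy values.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

dot = np.einsum('i,i->', a, b)  # sum_i a_i b_i
assert np.isclose(dot, np.dot(a, b))
```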

Matrix-vector multiplication

A vector transformed by a matrix, traditionally \(A \vec{v}\) or in quantum mechanics \(A | v \rangle\).

\[\sum_{j} A_{ij} v_{j}\]
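
The same contraction in NumPy, as a rough sketch with made-up shapes:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
v = np.array([1.0, 2.0, 3.0])

Av = np.einsum('ij,j->i', A, v)  # (A v)_i = sum_j A_ij v_j
assert np.allclose(Av, A @ v)
```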

Matrix-matrix multiplication

Multiply matrix by matrix, traditionally: \(A B\).

\[\sum_{k} A_{ik} B_{kj}\]
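
A minimal NumPy sketch with arbitrary shapes, checked against the built-in matrix product:

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)

AB = np.einsum('ik,kj->ij', A, B)  # (A B)_ij = sum_k A_ik B_kj
assert np.allclose(AB, A @ B)
```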

Outer product

An outer product between two vectors. It is commonly used in quantum mechanics and written as \(| u \rangle \langle v |\). In particular, for \(| u \rangle = | v \rangle\), we get a projection operator \(| v \rangle \langle v |\).

\[a_{i} b_{j}\]
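
In code there is no summation at all, only a new pair of free indices; a small sketch:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0, 5.0])

outer = np.einsum('i,j->ij', a, b)  # result[i, j] = a_i b_j
assert np.allclose(outer, np.outer(a, b))
```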

Tensor product

TODO

Expected value

\(\sum_{ij} v_i M_{ij} v_j\) (or in the context of quantum: \(\langle v | M | v \rangle\))
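
A quick NumPy check of this quadratic form, with toy values for v and M:

```python
import numpy as np

v = np.array([0.5, 0.5])
M = np.array([[1.0, 2.0],
              [2.0, 4.0]])

expectation = np.einsum('i,ij,j->', v, M, v)  # sum_ij v_i M_ij v_j
assert np.isclose(expectation, v @ M @ v)
```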

Trace

TODO

Matrix transpose

\[(B^\top)_{ij} = B_{ji}\]

Matrix is diagonal

Dot operator

Let’s introduce a dot symbol for a tensor that is 1 when all of its indices are equal, and 0 otherwise.

In the case of no indices, it is just the number 1, which is not that interesting. For one index, it is a vector of ones, \(v = (1, 1, \ldots, 1)\). For two indices, \(M\) is the identity matrix, i.e. the Kronecker delta. For three indices, \(T\) is a tensor with \(T_{ijk} = 1\) if \(i = j = k\), and 0 otherwise.
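
A minimal sketch of these objects in NumPy (the size n is arbitrary):

```python
import numpy as np

n = 4

dot0 = np.array(1.0)   # no indices: just the number 1
dot1 = np.ones(n)      # one index: vector of ones
dot2 = np.eye(n)       # two indices: identity matrix (Kronecker delta)

# three indices: T_ijk = 1 if i == j == k, else 0
dot3 = np.zeros((n, n, n))
for i in range(n):
    dot3[i, i, i] = 1.0
```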

Sum

\[\sum_{ij} A_{ij}\]

Column sum

\[\sum_{i} A_{ij}\]

Row sum

\[\sum_{j} A_{ij}\]
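
The three sums above (total, per-column, per-row) in one short NumPy sketch:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)

total    = np.einsum('ij->', A)   # sum_ij A_ij
col_sums = np.einsum('ij->j', A)  # sum_i A_ij, one value per column
row_sums = np.einsum('ij->i', A)  # sum_j A_ij, one value per row

assert np.isclose(total, A.sum())
assert np.allclose(col_sums, A.sum(axis=0))
assert np.allclose(row_sums, A.sum(axis=1))
```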

Hadamard product

\[A_{ij} B_{ij}\]
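
In code this is just the element-wise product; a quick check:

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(2, 3)

H = np.einsum('ij,ij->ij', A, B)  # H_ij = A_ij B_ij
assert np.allclose(H, A * B)
```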

Properties

TODO:

Other examples

Scalar product of matrices

\[\sum_{ij} A_{ij} B_{ij}\]
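
A NumPy sketch; this is the same as summing the Hadamard product from above:

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(2, 3)

scalar = np.einsum('ij,ij->', A, B)  # sum_ij A_ij B_ij
assert np.isclose(scalar, (A * B).sum())
```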

Applying a transformation to vectors in a higher-order tensor

\[\sum_{k} T_{ntk} W_{kq}\]
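
A rough NumPy sketch; the interpretation of the axes (e.g. n as samples, t as time steps, k as input features, q as output features) and the shapes are assumptions for illustration:

```python
import numpy as np

T = np.random.rand(5, 7, 3)  # T_ntk
W = np.random.rand(3, 4)     # W_kq

out = np.einsum('ntk,kq->ntq', T, W)  # sum_k T_ntk W_kq
print(out.shape)  # (5, 7, 4)
```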

An order-4 tensor with vectors projected in the 3rd dimension

\[\sum_{tk} T_{ntkm} W_{kq}\]
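
A sketch that follows the formula above literally (summing over both t and k); the shapes are made up:

```python
import numpy as np

T = np.random.rand(5, 7, 3, 2)  # T_ntkm
W = np.random.rand(3, 4)        # W_kq

out = np.einsum('ntkm,kq->nmq', T, W)  # sum_tk T_ntkm W_kq
print(out.shape)  # (5, 2, 4)
```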

Batch matrix multiplication

\[\sum_{k} A_{ijk} B_{ikl}\]
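
Here i acts as the batch index; a minimal NumPy check against the broadcasting behaviour of the @ operator:

```python
import numpy as np

A = np.random.rand(8, 2, 3)  # A_ijk: a batch of 8 matrices, each 2x3
B = np.random.rand(8, 3, 4)  # B_ikl: a batch of 8 matrices, each 3x4

C = np.einsum('ijk,ikl->ijl', A, B)  # one matrix product per batch element i
assert np.allclose(C, A @ B)         # @ also broadcasts over the leading batch dimension
```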

Tensor contraction

\[\sum_{qr} A_{pqrs} B_{tuqvr}\]
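
A sketch of this contraction in NumPy; the dimension sizes are arbitrary, and the order of the free indices in the output is a choice:

```python
import numpy as np

A = np.random.rand(2, 3, 4, 5)     # A_pqrs
B = np.random.rand(6, 7, 3, 8, 4)  # B_tuqvr

C = np.einsum('pqrs,tuqvr->pstuv', A, B)  # sum_qr A_pqrs B_tuqvr
print(C.shape)  # (2, 5, 6, 7, 8)
```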

Bilinear transformation

TODO: dotize “i” (on top)

\[\sum_{kl} A_{ik} B_{jkl} C_{il}\]
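
Reading i as a batch index and j as the output feature, this is essentially what torch.nn.Bilinear computes (up to the bias term); a NumPy sketch with assumed shapes:

```python
import numpy as np

A = np.random.rand(8, 3)     # A_ik: a batch of input vectors
C = np.random.rand(8, 5)     # C_il: a batch of input vectors
B = np.random.rand(4, 3, 5)  # B_jkl: bilinear weights

out = np.einsum('ik,jkl,il->ij', A, B, C)  # sum_kl A_ik B_jkl C_il
print(out.shape)  # (8, 4)
```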

Applications

Machine learning

Linear regression

TODO

Factorization and SVD

TODO

Hidden Markov Model

Hidden Markov Models (Rabiner 1989).

TODO

Restricted Boltzmann Machine

TODO

Deep learning

Tensors for data

In deep learning, data is structured as 3- to 5-dimensional tensors. The most typical dimensions are:

* sample number (size: batch size),
* features (also called embeddings),
* spatial coordinates (1D, 2D or 3D).

The order of the dimensions depends on the deep learning framework and the particular functions. For a discussion, see PyTorch tensor dimension names.

Let’s focus on a two-dimensional image with RGB color channels for input.
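
For instance, such a batch of images could look like the following; the concrete sizes are made up, and the channels-first (NCHW) layout is the PyTorch default, while TensorFlow typically uses channels-last (NHWC):

```python
import numpy as np

# A hypothetical batch of 32 RGB images of size 64x64,
# in the channels-first (NCHW) layout: (sample, channel, height, width).
images = np.random.rand(32, 3, 64, 64)
```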

Averages

If we want to average over all samples, e.g. for exploration, we can sum the following way:

Channel average values will
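
For example, per-channel averages can be computed by summing over the sample and spatial dimensions and dividing by their total size; a sketch using the hypothetical image batch from above:

```python
import numpy as np

images = np.random.rand(32, 3, 64, 64)  # (sample, channel, height, width)
n, c, h, w = images.shape

# Sum over samples and spatial positions, keep one value per channel.
channel_means = np.einsum('nchw->c', images) / (n * h * w)
assert np.allclose(channel_means, images.mean(axis=(0, 2, 3)))
```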

Mention:

Note:

Batch processing

TODO

Convolutions

TODO

Separable convolution

TODO

Non-linearity

TODO

And maybe Batch normalization

You may have seen tensor diagrams

Feynman diagrams (Kaiser 2005) (for unbounded operators on infinite-dimensional Hilbert spaces) and quantum computing.

TODO: Do LSTM diagrams qualify?

Conclusion

Tensor diagrams are cool!

Footnote

Written in R Markdown Distill.

Other inspirations:

TODO

Idea:

Biamonte, Jacob. 2020. “Lectures on Quantum Tensor Networks.” arXiv:1912.10049 [Cond-Mat, Physics:math-Ph, Physics:quant-Ph], January. http://arxiv.org/abs/1912.10049.
Biamonte, Jacob, and Ville Bergholm. 2017. “Tensor Networks in a Nutshell.” arXiv:1708.00006 [Cond-Mat, Physics:gr-Qc, Physics:hep-Th, Physics:math-Ph, Physics:quant-Ph], July. http://arxiv.org/abs/1708.00006.
Bradley, Tai-Danae. 2019. “Matrices as Tensor Network Diagrams.” https://www.math3ma.com/blog/matrices-as-tensor-network-diagrams.
Bridgeman, Jacob C, and Christopher T Chubb. 2017. “Hand-Waving and Interpretive Dance: An Introductory Course on Tensor Networks.” Journal of Physics A: Mathematical and Theoretical 50 (22): 223001. https://doi.org/10.1088/1751-8121/aa6dc3.
Coecke, Bob, and Aleks Kissinger. 2017. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge, United Kingdom ; New York, NY, USA: Cambridge University Press.
Kaiser, David. 2005. “Physics and Feynman’s Diagrams.” American Scientist 93 (2): 156. https://doi.org/10.1511/2005.52.957.
Orús, Román. 2014. “A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States.” Annals of Physics 349 (October): 117–58. https://doi.org/10.1016/j.aop.2014.06.013.
Penrose, Roger. 1971. “Applications of Negative Dimensional Tensors.” Combinatorial Mathematics and Its Applications, no. 1: 221–44. https://www.mscs.dal.ca/~selinger/papers/graphical-bib/public/Penrose-applications-of-negative-dimensional-tensors.pdf.
Rabiner, L. R. 1989. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” Proceedings of the IEEE 77 (2): 257–86. https://doi.org/10.1109/5.18626.
Verstraete, F., V. Murg, and J. I. Cirac. 2008. “Matrix Product States, Projected Entangled Pair States, and Variational Renormalization Group Methods for Quantum Spin Systems.” Advances in Physics 57 (2): 143–224. https://doi.org/10.1080/14789940801912366.

References