Tensor diagrams

Penrose notation for deep learning.

Piotr Migdał (https://p.migdal.pl, Quantum Flytrap https://quantumgame.io), Claudia Zendejas-Morales (http://claudiazm.xyz, Quantum Flytrap https://quantumflytrap.com)
2021-10-19

Multidimensional arrays are the basic building blocks of any deep neural network and of many other models in statistics and machine learning. However, even for relatively simple operations, the notation becomes monstrous:

\[ \sum_{i_1=1}^n \sum_{i_2=1}^n \sum_{i_3=1}^n \sum_{i_4=1}^n p_{i_1} T_{i_1 i_2} O_{i_2 j_1} T_{i_2 i_3} O_{i_3 j_2} T_{i_3 i_4} O_{i_4 j_3} \]

If you are already familiar with the problem, you may guess that it is the probability of observing the sequence \((j_1, j_2, j_3)\) in a Hidden Markov Model. If you are not, it takes time and effort to parse and comprehend the formula. In fact, I made a typo when writing it for the first time.
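
To see how the formula maps onto code, here is a minimal NumPy sketch using np.einsum; the toy numbers and the array names p (initial distribution), T (transitions), and O (emissions) are made up for illustration.

```python
import numpy as np

# A toy HMM with 2 hidden states and 3 observable symbols.
p = np.array([0.6, 0.4])                 # initial state distribution p_i
T = np.array([[0.7, 0.3],
              [0.2, 0.8]])               # transition matrix T_{i i'}
O = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])          # emission matrix O_{i j}

# probs[j1, j2, j3] is the probability of observing the sequence (j1, j2, j3),
# following the formula above term by term (a, b, c, d are the hidden indices).
probs = np.einsum('a,ab,bw,bc,cx,cd,dy->wxy', p, T, O, T, O, T, O)

assert np.isclose(probs.sum(), 1.0)  # probabilities over all sequences sum to 1
```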

This index notation has several problems.

In this expository article we introduce tensor diagram notation in the context of deep learning. That is, we focus on arrays of real numbers and the relevant operations. We present it as a convenient notation for summation over indices, nothing less, nothing more. In this notation, the equation above becomes:

In this article we use tensor as in torch.Tensor or TensorFlow, i.e. a multidimensional array of numbers with some additional structure. This is a very specific case of what a mathematician would call a tensor. In this context the tensor product is the outer product. (A note for mathematical purists and fetishists: here we work in a finite-dimensional Hilbert space over the real numbers, in a fixed basis.) Also, in physics, tensors need to fulfill certain transformation criteria; see II 02: Differential Calculus of Vector Fields from The Feynman Lectures on Physics:

it is not generally true that any three numbers form a vector. It is true only if, when we rotate the coordinate system, the components of the vector transform among themselves in the correct way.

Background

Tensor diagrams were invented by Penrose (Penrose 1971). For a first contact with tensor diagrams, I suggest gazing at the beautiful diagrams in (Bradley 2019). If you have some background in quantum mechanics, go for a short introduction (Biamonte and Bergholm 2017) or a slightly longer one (Bridgeman and Chubb 2017).

For a complete introduction, I suggest the book (Coecke and Kissinger 2017) or the lecture notes (Biamonte 2020).

Tensor diagrams are a popular tool for quantum state decompositions in condensed matter physics (Verstraete, Murg, and Cirac 2008).

Key parts

Tensors

A scalar \(c\), a vector \(v_i\), a matrix \(M_{ij}\) and a (third-order) tensor \(T_{ijk}\) are represented by:

Each loose end corresponds to an index.

Basic operations

Scalar product

A dot product of two vectors - traditionally: \(\vec{v} \cdot \vec{u}\) or in quantum mechanics \(\langle u | v \rangle\).

\[\sum_{i} a_{i} b_{i}\]
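
As a sanity check, this is a one-liner with np.einsum; the vectors a and b are toy values.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

dot = np.einsum('i,i->', a, b)  # sum_i a_i b_i
assert np.isclose(dot, np.dot(a, b))
```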

Matrix-vector multiplication

A vector transformed by a matrix, traditionally \(A \vec{v}\) or in quantum mechanics \(A | v \rangle\).

\[\sum_{j} A_{ij} v_{j}\]
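
The same contraction in NumPy, as a rough sketch with made-up shapes:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
v = np.array([1.0, 2.0, 3.0])

Av = np.einsum('ij,j->i', A, v)  # (A v)_i = sum_j A_ij v_j
assert np.allclose(Av, A @ v)
```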

Matrix-matrix multiplication

Multiply matrix by matrix, traditionally: \(A B\).

\[\sum_{k} A_{ik} B_{kj}\]
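
A minimal NumPy sketch with arbitrary shapes, checked against the built-in matrix product:

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)

AB = np.einsum('ik,kj->ij', A, B)  # (A B)_ij = sum_k A_ik B_kj
assert np.allclose(AB, A @ B)
```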

Outer product

An outer product between two vectors. It is commonly used in quantum mechanics and written as \(| u \rangle \langle v |\). In particular, for \(| u \rangle = | v \rangle\), we get a projection operator \(| v \rangle \langle v |\).

\[a_{i} b_{j}\]
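
In code there is no summation at all, only a new pair of free indices; a small sketch:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0, 5.0])

outer = np.einsum('i,j->ij', a, b)  # result[i, j] = a_i b_j
assert np.allclose(outer, np.outer(a, b))
```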

Tensor product

TODO

Expected value

\(\sum_{ij} v_i M_{ij} v_j\) (or in the context of quantum: \(\langle v | M | v \rangle\))
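
A quick NumPy check of this quadratic form, with toy values for v and M:

```python
import numpy as np

v = np.array([0.5, 0.5])
M = np.array([[1.0, 2.0],
              [2.0, 4.0]])

expectation = np.einsum('i,ij,j->', v, M, v)  # sum_ij v_i M_ij v_j
assert np.isclose(expectation, v @ M @ v)
```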

Trace

TODO

Matrix transpose

\[(B^\top)_{ij} = B_{ji}\]

Matrix is diagonal

Dot operator

Let’s introduce a dot symbol for a tensor that is 1 when all of its indices are equal, and 0 otherwise.

In the case of no indices, it is just the number 1, which is not that interesting. For one index, it is a vector of ones, \(v = (1, 1, \ldots, 1)\). For two indices, \(M\) is the identity matrix, i.e. the Kronecker delta. For three indices, \(T\) is a tensor with \(T_{ijk} = 1\) if \(i = j = k\), and 0 otherwise.
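
A minimal sketch of these objects in NumPy (the size n is arbitrary):

```python
import numpy as np

n = 4

dot0 = np.array(1.0)   # no indices: just the number 1
dot1 = np.ones(n)      # one index: vector of ones
dot2 = np.eye(n)       # two indices: identity matrix (Kronecker delta)

# three indices: T_ijk = 1 if i == j == k, else 0
dot3 = np.zeros((n, n, n))
for i in range(n):
    dot3[i, i, i] = 1.0
```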

Sum

\[\sum_{ij} A_{ij}\]

Column sum

\[\sum_{i} A_{ij}\]

Row sum

\[\sum_{j} A_{ij}\]
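
The three sums above (total, per-column, per-row) in one short NumPy sketch:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)

total    = np.einsum('ij->', A)   # sum_ij A_ij
col_sums = np.einsum('ij->j', A)  # sum_i A_ij, one value per column
row_sums = np.einsum('ij->i', A)  # sum_j A_ij, one value per row

assert np.isclose(total, A.sum())
assert np.allclose(col_sums, A.sum(axis=0))
assert np.allclose(row_sums, A.sum(axis=1))
```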

Hadamard product

\[A_{ij} B_{ij}\]
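
In code this is just the element-wise product; a quick check:

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(2, 3)

H = np.einsum('ij,ij->ij', A, B)  # H_ij = A_ij B_ij
assert np.allclose(H, A * B)
```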

Properties

TODO:

Other examples

Scalar product of matrices

\[\sum_{ij} A_{ij} B_{ij}\]
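
A NumPy sketch; this is the same as summing the Hadamard product from above:

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(2, 3)

scalar = np.einsum('ij,ij->', A, B)  # sum_ij A_ij B_ij
assert np.isclose(scalar, (A * B).sum())
```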

Applying a transformation to vectors in a higher-order tensor

\[\sum_{k} T_{ntk} W_{kq}\]
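
A rough NumPy sketch; the interpretation of the axes (e.g. n as samples, t as time steps, k as input features, q as output features) and the shapes are assumptions for illustration:

```python
import numpy as np

T = np.random.rand(5, 7, 3)  # T_ntk
W = np.random.rand(3, 4)     # W_kq

out = np.einsum('ntk,kq->ntq', T, W)  # sum_k T_ntk W_kq
print(out.shape)  # (5, 7, 4)
```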

An order-4 tensor with vectors projected in the 3rd dimension

\[\sum_{tk} T_{ntkm} W_{kq}\]
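
A sketch that follows the formula above literally (summing over both t and k); the shapes are made up:

```python
import numpy as np

T = np.random.rand(5, 7, 3, 2)  # T_ntkm
W = np.random.rand(3, 4)        # W_kq

out = np.einsum('ntkm,kq->nmq', T, W)  # sum_tk T_ntkm W_kq
print(out.shape)  # (5, 2, 4)
```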

Batch matrix multiplication

\[\sum_{k} A_{ijk} B_{ikl}\]
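
Here i acts as the batch index; a minimal NumPy check against the broadcasting behaviour of the @ operator:

```python
import numpy as np

A = np.random.rand(8, 2, 3)  # A_ijk: a batch of 8 matrices, each 2x3
B = np.random.rand(8, 3, 4)  # B_ikl: a batch of 8 matrices, each 3x4

C = np.einsum('ijk,ikl->ijl', A, B)  # one matrix product per batch element i
assert np.allclose(C, A @ B)         # @ also broadcasts over the leading batch dimension
```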

Tensor contraction

\[\sum_{qr} A_{pqrs} B_{tuqvr}\]
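
A sketch of this contraction in NumPy; the dimension sizes are arbitrary, and the order of the free indices in the output is a choice:

```python
import numpy as np

A = np.random.rand(2, 3, 4, 5)     # A_pqrs
B = np.random.rand(6, 7, 3, 8, 4)  # B_tuqvr

C = np.einsum('pqrs,tuqvr->pstuv', A, B)  # sum_qr A_pqrs B_tuqvr
print(C.shape)  # (2, 5, 6, 7, 8)
```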

Bilinear transformation

TODO: dotize “i” (on top)

\[\sum_{kl} A_{ik} B_{jkl} C_{il}\]
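
Reading i as a batch index and j as the output feature, this is essentially what torch.nn.Bilinear computes (up to the bias term); a NumPy sketch with assumed shapes:

```python
import numpy as np

A = np.random.rand(8, 3)     # A_ik: a batch of input vectors
C = np.random.rand(8, 5)     # C_il: a batch of input vectors
B = np.random.rand(4, 3, 5)  # B_jkl: bilinear weights

out = np.einsum('ik,jkl,il->ij', A, B, C)  # sum_kl A_ik B_jkl C_il
print(out.shape)  # (8, 4)
```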

Applications

Machine learning

Linear regression

TODO

Factorization and SVD

TODO

Hidden Markov Model

Hidden Markov Models (Rabiner 1989).

TODO

Restricted Boltzmann Machine

TODO

Deep learning

Tensors for data

In deep learning, data is structured as 3- to 5-dimensional tensors. The most typical dimensions are:

* sample number (size: batch size),
* features (also called embeddings),
* spatial coordinates (1D, 2D or 3D).

The order of the dimensions depends on the deep learning framework and the particular functions. For a discussion, see PyTorch tensor dimension names.

Let’s focus on a two-dimensional image with RGB color channels for input.
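
For instance, such a batch of images could look like the following; the concrete sizes are made up, and the channels-first (NCHW) layout is the PyTorch default, while TensorFlow typically uses channels-last (NHWC):

```python
import numpy as np

# A hypothetical batch of 32 RGB images of size 64x64,
# in the channels-first (NCHW) layout: (sample, channel, height, width).
images = np.random.rand(32, 3, 64, 64)
```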

Averages

If we want to average over all samples, e.g. for exploration, we can sum the following way:

Channel average values will
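
For example, per-channel averages can be computed by summing over the sample and spatial dimensions and dividing by their total size; a sketch using the hypothetical image batch from above:

```python
import numpy as np

images = np.random.rand(32, 3, 64, 64)  # (sample, channel, height, width)
n, c, h, w = images.shape

# Sum over samples and spatial positions, keep one value per channel.
channel_means = np.einsum('nchw->c', images) / (n * h * w)
assert np.allclose(channel_means, images.mean(axis=(0, 2, 3)))
```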

Mention:

Note:

Batch processing

TODO

Convolutions

TODO

Separable convolution

TODO

Non-linearity

TODO

And maybe Batch normalization

You may have seen tensor diagrams

Feynman diagrams (Kaiser 2005) (for unbounded operators on infinite-dimensional Hilbert spaces) and quantum computing.

TODO: Do LSTM diagrams qualify?

Conclusion

Tensor diagrams are cool!

Footnote

Written in R Markdown Distill.

Other inspirations:

TODO

Idea:

Biamonte, Jacob. 2020. “Lectures on Quantum Tensor Networks.” arXiv:1912.10049 [Cond-Mat, Physics:math-Ph, Physics:quant-Ph], January. http://arxiv.org/abs/1912.10049.
Biamonte, Jacob, and Ville Bergholm. 2017. “Tensor Networks in a Nutshell.” arXiv:1708.00006 [Cond-Mat, Physics:gr-Qc, Physics:hep-Th, Physics:math-Ph, Physics:quant-Ph], July. http://arxiv.org/abs/1708.00006.
Bradley, Tai-Danae. 2019. “Matrices as Tensor Network Diagrams.” https://www.math3ma.com/blog/matrices-as-tensor-network-diagrams.
Bridgeman, Jacob C, and Christopher T Chubb. 2017. “Hand-Waving and Interpretive Dance: An Introductory Course on Tensor Networks.” Journal of Physics A: Mathematical and Theoretical 50 (22): 223001. https://doi.org/10.1088/1751-8121/aa6dc3.
Coecke, Bob, and Aleks Kissinger. 2017. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge, United Kingdom ; New York, NY, USA: Cambridge University Press.
Kaiser, David. 2005. “Physics and Feynman’s Diagrams.” American Scientist 93 (2): 156. https://doi.org/10.1511/2005.52.957.
Orús, Román. 2014. “A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States.” Annals of Physics 349 (October): 117–58. https://doi.org/10.1016/j.aop.2014.06.013.
Penrose, Roger. 1971. “Applications of Negative Dimensional Tensors.” Combinatorial Mathematics and Its Applications, no. 1: 221–44. https://www.mscs.dal.ca/~selinger/papers/graphical-bib/public/Penrose-applications-of-negative-dimensional-tensors.pdf.
Rabiner, L. R. 1989. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” Proceedings of the IEEE 77 (2): 257–86. https://doi.org/10.1109/5.18626.
Verstraete, F., V. Murg, and J. I. Cirac. 2008. “Matrix Product States, Projected Entangled Pair States, and Variational Renormalization Group Methods for Quantum Spin Systems.” Advances in Physics 57 (2): 143–224. https://doi.org/10.1080/14789940801912366.

References