Learning Deep Learning with Keras
I teach deep learning both for a living (as the main deepsense.ai instructor, in a Kaggle-winning team^{1}) and as a part of my volunteering with the Polish Children’s Fund, giving workshops to gifted high-school students^{2}. I want to share a few things I’ve learnt about teaching (and learning) deep learning.
Whether you want to start learning deep learning for your career, to have a nice adventure (e.g. with detecting huggable objects) or to get insight into machines before they take over^{3}, this post is for you! Its goal is not to teach neural networks by itself, but to provide an overview and to point to didactically useful resources.
Don’t be afraid of artificial neural networks - it is easy to start! In fact, my biggest regret is delaying learning them because of the perceived difficulty. To start, all you need is really basic programming, very simple mathematics and knowledge of a few machine learning concepts. I will explain where to start with these requirements.
In my opinion, the best way to start is from a high-level interactive approach (see also: Quantum mechanics for high-school students and my Quantum Game with Photons). For that reason, I suggest starting with image recognition tasks in Keras, a popular neural network library in Python. If you would like to train neural networks with less code than in Keras, the only viable option is to use pigeons. Yes, seriously: pigeons spot cancer as well as human experts!
What is deep learning and why is it cool?
Deep learning is a name for machine learning techniques using many-layered artificial neural networks. Occasionally people use the term artificial intelligence, but unless you want to sound sci-fi, it is reserved for problems that are currently considered “too hard for machines” - a frontier that keeps moving rapidly. This is a field that exploded in the last few years, reaching human-level accuracy in visual recognition tasks (among many other tasks), see:
 Measuring the Progress of AI Research by Electronic Frontier Foundation (2017)
Unlike quantum computing or fusion power, it is a technology that is being applied right now, not some possibility for the future. There is a rule of thumb:
Pretty much anything that a normal person can do in <1 sec, we can now automate with AI. - Andrew Ng’s tweet
Some people go even further, extrapolating that statement to experts. It’s not a surprise that companies like Google and Facebook are at the cutting edge of progress. In fact, every few months I am blown away by something exceeding my expectations, e.g.:
 The Unreasonable Effectiveness of Recurrent Neural Networks^{4} for generating fake Shakespeare, Wikipedia entries and LaTeX articles
 A Neural Algorithm of Artistic Style - style transfer (and for videos!)
 Real-time Face Capture and Reenactment
 Colorful Image Colorization
 Plug & Play Generative Networks for photorealistic image generation
 Dermatologist-level classification of skin cancer along with other medical diagnostic tools
 Image-to-Image Translation (pix2pix) - sketch to photo
 Teaching Machines to Draw - sketches of cats, dogs etc.
It looks like some sorcery. If you are curious what neural networks are, take a look at this series of videos for a smooth introduction:
 Neural Networks Demystified by Stephen Welch - video series
 A Visual and Interactive Guide to the Basics of Neural Networks by J Alammar
These techniques are data-hungry. See a plot of AUC score for logistic regression, random forest and deep learning on the Higgs dataset (data points are in millions):
In general there is no guarantee that, even with a lot of data, deep learning does better than other techniques, for example tree-based ones such as random forest or boosted trees.
Let’s play!
Do I need some Skynet to run it? Actually not - it’s a piece of software, like any other. And you can even play with it in your browser:
 TensorFlow Playground for point separation, with a visual interface
 ConvNetJS for digit and image recognition
 Keras.js Demo - to visualize and use real networks in your browser (e.g. ResNet50)
Or… if you want to use Keras in Python, see this minimal example - just to get convinced you can use it on your own computer.
Python and machine learning
I mentioned basic Python and machine learning as requirements. They are already covered in my introduction to data science in Python and statistics and machine learning sections, respectively.
For Python, if you already have the Anaconda distribution (covering most data science packages), the only thing you need is to install TensorFlow and Keras.
When it comes to machine learning, you don’t need to learn many techniques before jumping into deep learning. Later, though, it is good practice to check whether a given problem can be solved with much simpler methods. For example, random forest is often a lockpick, working out-of-the-box for many problems. You need to understand why we need to train and then test a classifier (to validate its predictive power). To get the gist of it, start with this beautiful tree-based animation:
 Visual introduction to machine learning by Stephanie Yee and Tony Chu
Also, it is good to understand logistic regression, which is a building block of almost any neural network for classification.
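To make the train-then-test idea and logistic regression a bit more concrete, here is a minimal sketch in plain Python (no Keras needed). The toy one-dimensional data, the function names and the hyperparameters are all my assumptions for illustration; a real problem would use a library and real data.

```python
import math
import random


def sigmoid(z):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def make_data(n, rng):
    """Toy 1D data (an assumption for illustration): label 1 tends to have larger x."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 if label == 1 else -2.0, 1.0)
        data.append((x, label))
    return data


def train(data, epochs=200, lr=0.1):
    """Fit weight w and bias b by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b


def accuracy(data, w, b):
    """Fraction of points classified correctly with a 0.5 threshold."""
    hits = sum((sigmoid(w * x + b) > 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
points = make_data(400, rng)
train_set, test_set = points[:300], points[300:]  # hold out 100 points for testing

w, b = train(train_set)
train_acc = accuracy(train_set, w, b)
test_acc = accuracy(test_set, w, b)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The point of the held-out `test_set` is that the model never sees it during training, so its accuracy there says something about predictive power rather than memorization.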
Mathematics
Deep learning (that is, neural networks with many layers) uses mostly very simple mathematical operations - just many of them. Here are a few that you can find in almost any network (look at this list, but don’t get intimidated):
 vectors, matrices, multidimensional arrays,
 addition, multiplication,
 convolutions to extract and process local patterns,
 activation functions: sigmoid, tanh or ReLU to add nonlinearity,
 softmax to convert vectors into probabilities,
 log-loss (cross-entropy) to penalize wrong guesses in a smart way (see also Kullback-Leibler Divergence Explained),
 gradients and chain rule (backpropagation) for optimizing network parameters,
 stochastic gradient descent and its variants (e.g. momentum).
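A few items from the list above fit in a handful of lines. The sketch below, in plain Python, shows softmax, log-loss for a single example, its gradient with respect to the scores (probabilities minus the one-hot target), and a gradient step with momentum. The example scores and hyperparameters are arbitrary assumptions, and since there is only one example there is nothing "stochastic" to sample; in a real network the same update is applied to weights, on random mini-batches.

```python
import math


def softmax(logits):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_loss(probs, target):
    """Cross-entropy for one example: minus the log-probability of the true class."""
    return -math.log(probs[target])


def grad_logits(logits, target):
    """Gradient of log_loss(softmax(logits)) w.r.t. logits: probs minus one-hot target."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def descend_with_momentum(logits, target, lr=0.5, momentum=0.9, steps=100):
    """Gradient descent with momentum, applied directly to the logits for simplicity."""
    logits = list(logits)
    velocity = [0.0] * len(logits)
    for _ in range(steps):
        grads = grad_logits(logits, target)
        # velocity keeps a decaying memory of previous gradient steps
        velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
        logits = [x + v for x, v in zip(logits, velocity)]
    return logits


start = [1.0, 2.0, 0.5]  # arbitrary scores for three classes
true_class = 0
loss_before = log_loss(softmax(start), true_class)
loss_after = log_loss(softmax(descend_with_momentum(start, true_class)), true_class)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Backpropagation is the same chain rule applied layer by layer; frameworks such as Keras compute these gradients for you.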
If your background is in mathematics, statistics, physics^{5} or signal processing - most likely you already know more than enough to start!
If your last contact with mathematics was in high school, don’t worry. The mathematics of deep learning is simple to the point that a convolutional neural network for digit recognition can be implemented in a spreadsheet (with no macros), see: Deep Spreadsheets with ExcelNet. It is only a proof-of-principle solution - not only inefficient, but also lacking the most crucial part: the ability to train new networks.
The basics of vector calculus are crucial not only for deep learning, but also for many other machine learning techniques (e.g. for word2vec, which I wrote about). To learn it, I recommend starting from one of the following:
 Getting started with linear algebra for deep learning by Hadrien Jean (an intro to Linear Algebra from the Deep Learning)
 J. Ström, K. Åström, and T. Akenine-Möller, Immersive Linear Algebra - a linear algebra book with fully interactive figures
 Linear algebra cheat sheet for deep learning by Brendan Fortuner
Since there are many references to NumPy, it may be useful to learn its basics:
 From Python to Numpy by Nicolas P. Rougier
 SciPy lectures: The NumPy array object
At the same time, look back at the meme, at the What mathematicians think I do part. It’s totally fine to start from magically working code, treating neural network layers like LEGO blocks.
Frameworks
There is a handful of popular deep learning libraries, including TensorFlow, Theano, Torch and Caffe. Each of them has a Python interface (now also for Torch: PyTorch).
So, which to choose? First, as always, screw all subtle performance benchmarks, as premature optimization is the root of all evil. What is crucial is to start with one which is easy to write (and read!), one with many online resources, and one that you can actually install on your computer without too much pain.
Bear in mind that core frameworks are multidimensional array expression compilers with GPU support. Current neural networks can be expressed as such. However, if you just want to work with neural networks, by rule of least power, I recommend starting with a framework just for neural networks. For example…
Keras
If you like the philosophy of Python (brevity, readability, one preferred way to do things), Keras is for you. It is a high-level library for neural networks, using TensorFlow or Theano as its backend. Also, if you want to have a propaganda picture, there is a possibly biased (or overfitted?) popularity ranking:
If you want to consult a different source, based on arXiv papers rather than GitHub activity, see A Peek at Trends in Machine Learning by Andrej Karpathy.
Popularity is important - it means that if you want to search for a network architecture, googling for it (e.g. U-Net Keras) is likely to return an example.
Where to start learning it? Documentation on Keras is nice, and its blog is a valuable resource. For a complete, interactive introduction to deep learning with Keras in Jupyter Notebook, I really recommend:
 Deep Learning with Keras and TensorFlow by Valerio Maggio
For shorter ones, try one of these:
 Visualizing parts of Convolutional Neural Networks using Keras and Cats by Erik Reppel
 Deep learning for complete beginners: convolutional neural networks with Keras by Petar Veličković
 Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras by Jason Brownlee (Theano tensor dimension order^{6})
There are a few add-ons to Keras, which are especially useful for learning it. I created an ASCII summary for sequential models to show data flow inside networks (in a nicer way than model.summary()). It shows layers, dimensions of data (x, y, channels) and the number of free parameters (to be optimized). For example, for a network for digit recognition it might look like:
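           OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)
               Input   #####     32   32    3
              Conv2D    \|/  -------------------       896     0.1%
                relu   #####     32   32   32
              Conv2D    \|/  -------------------      9248     0.7%
                relu   #####     30   30   32
        MaxPooling2D   Y max -------------------         0     0.0%
                       #####     15   15   32
             Dropout    | || -------------------         0     0.0%
                       #####     15   15   32
              Conv2D    \|/  -------------------     18496     1.5%
                relu   #####     15   15   64
              Conv2D    \|/  -------------------     36928     3.0%
                relu   #####     13   13   64
        MaxPooling2D   Y max -------------------         0     0.0%
                       #####      6    6   64
             Dropout    | || -------------------         0     0.0%
                       #####      6    6   64
             Flatten   ||||| -------------------         0     0.0%
                       #####        2304
               Dense   XXXXX -------------------   1180160    94.3%
                relu   #####         512
             Dropout    | || -------------------         0     0.0%
                       #####         512
               Dense   XXXXX -------------------      5130     0.4%
             softmax   #####          10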
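If you wonder where these numbers come from, they can be recomputed by hand. The sketch below is plain Python, not the summary tool itself. It assumes 3x3 kernels, stride 1, 2x2 pooling, and "same" padding for the first convolution of each pair and "valid" for the second. None of this is stated in the summary; I inferred it from the shapes, so treat it as an assumption.

```python
def conv2d(shape, filters, kernel=3, same_padding=True):
    """Return (output shape, weight count) for a square-kernel convolution, stride 1."""
    height, width, channels = shape
    if not same_padding:  # "valid" padding trims kernel - 1 pixels per dimension
        height, width = height - kernel + 1, width - kernel + 1
    weights = (kernel * kernel * channels + 1) * filters  # +1 is the bias per filter
    return (height, width, filters), weights


def max_pool(shape, pool=2):
    """Non-overlapping max pooling: integer-divide the spatial size, no weights."""
    height, width, channels = shape
    return (height // pool, width // pool, channels), 0


def dense(units_in, units_out):
    """Fully connected layer: one weight per input-output pair plus one bias per output."""
    return units_out, (units_in + 1) * units_out


shape = (32, 32, 3)
counts = []
for filters in (32, 64):
    shape, n = conv2d(shape, filters, same_padding=True)
    counts.append(n)
    shape, n = conv2d(shape, filters, same_padding=False)
    counts.append(n)
    shape, _ = max_pool(shape)

flat = shape[0] * shape[1] * shape[2]  # Flatten: 6 * 6 * 64
hidden, n = dense(flat, 512)
counts.append(n)
classes, n = dense(hidden, 10)
counts.append(n)

total = sum(counts)
for n in counts:
    print(f"{n:>8} {100 * n / total:5.1f}%")
```

Note how the first dense layer dominates: convolutions reuse the same small kernel across the whole image, while a dense layer needs a separate weight for every input-output pair.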
You might also be interested in nicer progress bars with keras-tqdm, exploration of activations at each layer with quiver, checking attention maps with keras-vis or converting Keras models to JavaScript, runnable in a browser with Keras.js. Speaking of languages, there is also an R interface to Keras.
EDIT (March 2018): Also, I wrote livelossplot - a live training loss plot in Jupyter Notebook (for Keras, PyTorch and other frameworks).
TensorFlow
If not Keras, then I recommend starting with bare TensorFlow. It is a bit more low-level and verbose, but makes it straightforward to optimize various multidimensional array (or, well, tensor) operations. A few good resources:
 the official TensorFlow Tutorial is very good
 Learn TensorFlow and deep learning, without a Ph.D. by Martin Görner
 TensorFlow Tutorial and Examples for beginners by Aymeric Damien (with Python 2.7)
 Simple tutorials using Google’s TensorFlow Framework by Nathan Lintz
In any case, TensorBoard makes it easy to keep track of the training process. It can also be used with Keras, via callbacks.
Other
Theano is similar to TensorFlow, but a bit older and harder to start with. For example, you need to manually write updates of variables. Typical neural network layers are not included, so one often uses libraries such as Lasagne. If you’re looking for a place to start, I like this introduction:
 Theano Tutorial by Marek Rei
At the same time, if you see some nice code in Torch or PyTorch, don’t be afraid to install and run it!
EDIT (July 2017): If you want a low-level framework, PyTorch may be the best way to start. It combines relatively brief and readable code (almost like Keras) with low-level access to all features (actually, more than TensorFlow). Start here:
EDIT (June 2018): In Keras or PyTorch as your first deep learning framework I discuss pros and cons of starting learning deep learning with each of them.
Datasets
Every machine learning problem needs data. You cannot just say “detect if there is a cat in this picture” and expect the computer to tell you the answer. You need to show many instances of cats, and pictures not containing cats, and (hopefully) it will learn to generalize to other cases. So, you need some data to start. And it is not a drawback of machine learning or just deep learning - it is a fundamental property of any learning!
Before you dive into uncharted waters, it is good to take a look at some popular datasets. The key part about them is that they are… popular. It means that you can find a lot of examples of what works, and have a guarantee that these problems can be solved with neural networks.
MNIST
Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision]. - François Chollet’s tweet
Still, I recommend starting with the MNIST digit recognition dataset (60k grayscale 28x28 images), included in keras.datasets. It is not necessary to master it, but just to get a sense that it works at all (or to test the basics of Keras on your local machine).
notMNIST
Indeed, I once even proposed that the toughest challenge facing AI workers is to answer the question: “What are the letters ‘A’ and ‘I’?” - Douglas R. Hofstadter (1995)
A more interesting dataset, and harder for classical machine learning algorithms, is notMNIST (letters A-J from strange fonts). If you want to start with it, here is my code for notMNIST loading and logistic regression in Keras.
CIFAR
If you want to play with image recognition, there is the CIFAR dataset, a dataset of 32x32 photos (also in keras.datasets). It comes in two versions: 10 simple classes (including cats, dogs, frogs and airplanes) and 100 harder and more nuanced classes (including beaver, dolphin, otter, seal and whale). I strongly suggest starting with CIFAR-10, the simpler version. Beware, more complicated networks may take quite some time (~12h on the CPU of my 7-year-old MacBook Pro).
EDIT (Nov 2017): If you are interested in practical exercises, I wrote Starting deep learning hands-on: image classification on CIFAR-10.
More
Deep learning requires a lot of data. If you want to train your network from scratch, it may require as many as ~10k images even if low-resolution (32x32). Especially if data is scarce, there is no guarantee that a network will learn anything. So, what are the ways to go?
 use really low res (if your eye can see it, no need to use higher resolution)
 get a lot of data (for images like 256x256 it may be: millions of instances)
 retrain a network that already saw a lot
 generate much more data (with rotations, shifts, distortions)
Often, it’s a combination of everything mentioned here.
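The "generate much more data" option is simpler than it sounds. Here is a toy sketch in plain Python, with an image represented as a list of rows; the function names and the 3x3 example are made up for illustration, and in practice you would use the augmentation tools of your framework instead.

```python
def shift_right(image, pixels, fill=0):
    """Shift every row to the right, padding the left edge with `fill`."""
    width = len(image[0])
    return [[fill] * pixels + row[: width - pixels] for row in image]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original plus a few transformed copies."""
    return [image, shift_right(image, 1), flip_horizontal(image), rotate_90(image)]


tiny = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
for variant in augment(tiny):
    print(variant)
```

Which transformations are safe depends on the data: a mirrored cat is still a cat, but a mirrored or rotated digit may become a different symbol.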
EDIT (May 2018): Where to look for suitable datasets? Kaggle Datasets is a place to start (along with some Kaggle Competitions).
Standing on the shoulders of giants
Creating a new neural network has a lot in common with cooking - there are typical ingredients (layers) and recipes (popular network architectures). The most important cooking contest is the ImageNet Large Scale Visual Recognition Challenge, with recognition of hundreds of classes from a dataset of half a million photos. Look at these Neural Network Architectures, typically using 224x224x3 input (chart by Eugenio Culurciello):
Circle size represents the number of parameters (a lot!). It doesn’t mention SqueezeNet though, an architecture vastly reducing the number of parameters (e.g. 50x fewer).
A few key networks for image classification can be readily loaded from the keras.applications module: Xception, VGG16, VGG19, ResNet50, InceptionV3. Some others are not as plug & play, but still easy to find online - yes, there is SqueezeNet in Keras. These networks serve two purposes:
 they give insight into useful building blocks and architectures
 they are great candidates for retraining (so-called transfer learning), when using the architecture along with pretrained weights
Some other important network architectures for images:
 U-Net: Convolutional Networks for Biomedical Image Segmentation
 A Neural Algorithm of Artistic Style
 Neural Style Transfer & Neural Doodles implemented in Keras by Somshubra Majumdar
 A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN by Dhruv Parthasarathy
Another set of insights:
 The Neural Network Zoo by Fjodor van Veen
 How to train your Deep Neural Network - how many layers, parameters, etc.
Infrastructure
For very small problems (e.g. MNIST, notMNIST), you can use your personal computer - even if it is a laptop and computations are on CPU.
For small problems (e.g. CIFAR, the unreasonable RNN), you might still be able to use a PC, but it requires much more patience and trade-offs.
For medium and larger problems, essentially the only way to go is to use a machine with a strong graphics card (GPU). For example, it took us 2 days to train a model for satellite image processing for a Kaggle competition, see our:
 Deep learning for satellite imagery via image segmentation by Arkadiusz Nowaczyński
On a strong CPU it would have taken weeks, see:
 Benchmarks for popular convolutional neural network models by Justin Johnson
The easiest, and the cheapest, way to use a strong GPU is to rent a remote machine on a per-hour basis. You can use Amazon (it is not only a bookstore!), here are some guides:
 Keras with GPU on Amazon EC2 – a step-by-step instruction by Mateusz Sieniawski, my mentee
 Running Jupyter notebooks on GPU on AWS: a starter guide by Francois Chollet
EDIT (Dec 2017): For hassle-free GPU support for deep learning I recommend Neptune: Machine Learning Lab.
Further learning
I encourage you to interact with code. For example, notMNIST or CIFAR-10 can be great starting points. Sometimes the best start is to take someone else’s code and run it, then see what happens when you modify parameters.
For learning how it works, this one is a masterpiece:
 CS231n: Convolutional Neural Networks for Visual Recognition by Andrej Karpathy and the lecture videos
When it comes to books, there is a wonderful one, starting from an introduction to mathematics and the machine learning context (it even covers log-loss and entropy in a way I like!):
 Deep Learning, An MIT Press book by Ian Goodfellow, Yoshua Bengio and Aaron Courville
Alternatively, you can use (it may be good for an introduction with interactive materials, but I’ve found the style a bit long-winded):
 Neural Networks and Deep Learning by Michael Nielsen
EDIT (Dec 2017): For a very practical introduction to deep learning with Keras, I recommend Deep Learning with Python by François Chollet.
Other materials
There are many applications of deep learning (it’s not only image recognition!). I collected some introductory materials to cover its various aspects (beware: they are of various difficulty). Don’t try to read them all - I list them for inspiration, not intimidation!
 General
 The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy
 How convolutional neural networks see the world - Keras Blog
 What convolutional neural networks look at when they see nudity – Clarifai Blog (NSFW)
 Convolutional neural networks for artistic style transfer by Harish Narayanan
 Dreams, Drugs and ConvNets - my slides (NSFW); I am considering turning it into a longer post on machine learning vs human learning, based on common mistakes
 Technical
 Yes you should understand backprop by Andrej Karpathy
 Transfer Learning using Keras by Prakash Vanapalli
 Generative Adversarial Networks (GANs) in 50 lines of code (PyTorch)
 Minimal and Clean Reinforcement Learning Examples
 An overview of gradient descent optimization algorithms by Sebastian Ruder
 Picking an optimizer for Style Transfer by Slav Ivanov
 Building Autoencoders in Keras by Francois Chollet
 Understanding LSTM Networks by Chris Olah
 Recurrent Neural Networks & LSTMs by Rohan Kapur
 Oxford Deep NLP 2017 course
 List of resources
 Staying up-to-date:
 r/MachineLearning Reddit channel covering most of the new stuff
 distill.pub - an interactive, visual, open-access journal for machine learning research, with expository articles
 my links at pinboard.in/u:pmigdal/t:deeplearning - though, just saving, not an automatic recommendation
 @fastml_extra Twitter channel
 GitXiv for papers with code
 don’t be afraid to read academic papers; some are well-written and insightful (if you own Kindle or another e-reader, I recommend Dontprint)
 Data (usually from challenges)
Thanks
I would like to thank Kasia Kulma, Martina Pugliese, Paweł Subko, Monika Pawłowska and Łukasz Kidziński for helpful feedback on the content and to Sarah Martin for polishing my English.
If you recommend a source that helped you with your adventure with deep learning - feel invited to contact me! (@pmigdal for short links, an email for longer remarks.)
The deep learning meme is not mine - I just rewrote it from Theano to Keras (with TensorFlow backend).

NOAA Right Whale Recognition, Winners’ Interview (1st place, Jan 2016), and a fresh one: Deep learning for satellite imagery via image segmentation (4th place, Apr 2017). ↩

This January, during a 5-day workshop, 6 high-school students participated in a rather NSFL project - constructing a neural network for detecting trypophobia triggers, see Trypophobia Image Detector - Browser Plugin using Deep Learning GitHub repository. ↩

It made a few episodes of webcomics obsolete: xkcd: Tasks (totally, by Park or Bird?), xkcd: Game AI (partially, by AlphaGo), PHD Comics: If TV Science was more like REAL Science (not exactly, but still it’s cool, by LapSRN). ↩

The title alludes to The Unreasonable Effectiveness of Mathematics in the Natural Sciences by Eugene Wigner (1960), one of my favourite texts in philosophy of science, along with More is Different by P. W. Anderson (1972) and Genesis and development of a scientific fact (pdf here) by Ludwik Fleck (1935). ↩

If your background is in quantum information, the only thing you need to change is ℂ to ℝ. Just expect less tensor structure, but more convolutions. ↩

Is it only me, or does Theano tensor dimension order sound like some secret convent? Before you start searching how to join it: it is about the shape of multidimensional arrays: (samples, channels, x, y) rather than TensorFlow’s (samples, x, y, channels). ↩