My impressions from PyData Berlin 2016
25 May 2016 | by Piotr Migdał
Last week I attended PyData Berlin 2016. It was my first non-academic conference. I was not sure whether it would be interesting or go beyond what I can find on the Internet anyway. But since I had never been to Berlin, had an open invitation from a friend of mine¹, and it's 6.5 h by train² from Warsaw, I decided to go.
tl;dr: It was worth my time and I really enjoyed the event.
Talks
Talks I found the most interesting:
- Katharina Rasch, What every Data Scientist should know about data anonymization
  - anonymization by merging unique rows; sensitive information is culture-dependent
- Łukasz Czarnecki, Brand recognition in real-life photos using deep learning
  - if you use a pre-trained convolutional neural network (like VGG_S), not that many samples are needed (~300 per class)
- Maciej Gryka, Removing Soft Shadows with Hard Data (research paper)
  - a custom Random Forest can do wonders, even for problems typically suited to CNNs (see image colorization)
- Julia Evans, How to trick a neural network
  - a periodic reminder that learning by playing (and breaking) is the best way of learning
- Matthew Honnibal, Designing spaCy: A high-performance natural language processing (NLP) library written in Cython
  - after reading Sense2vec with spaCy and Gensim, it was insightful to hear about the package design philosophy
Also, these were good:
- Daniel Kirsch, Functional Programming in Python
- David Higgins, Introduction to Julia for Python programmers
- Katharine Jarmul, Holy D@t*! How to Deal with Imperfect, Unclean Datasets
- Ian Ozsvald, Statistically Solving Sneezes and Sniffles (a work in progress)
- Olivier Grisel, Evolution of the pydata ecosystem
- Delia Rusu, Estimating stock price correlations using Wikipedia
And talks I missed, but I am sure were great:
- Wes McKinney, Python Data Ecosystem: Thoughts on Building for the Future
- Lev Konstantinovskiy, Practical Word2vec in Gensim (workshop)
- Maciej Jaskowski, Let's play Space Invaders!
If your beloved talk is not there, don't cry - most likely it was in a parallel session. (Also, in general, the topic selection and the quality of the presentations were good.)
I gave a lightning talk: Teaching Machine Learning. I should write a blog post on it one day (especially on the 5-day data analysis summer school for sociology students and researchers, as the materials are currently only in Polish). For now, it is implicitly covered in Data science intro for math/phys background.
Other take-home lessons
- There are so many ex-physicists. I even heard "Hey, I saw you... at ICPS"³.
- Many methods are new, so it's crucial to learn them (and how they can be tailored to your tasks); sometimes being an expert with 10 years of experience is physically impossible.
- The number of participants was optimal (200? I hate huge conferences).
- If other PyData events are of similar quality, it's not my last time there! :)
Links
- PyData Berlin 2016 Materials (mainly slides)
- Notes from my PyData Berlin keynote by Julia Evans
- If you have your own blog post or photo gallery from this event - mail me! :)
- UPDATE: There are videos from the talks!
Thanks to the organizers!
Footnotes
- Thanks Skander! ↩
- I love traveling by train. And I spend my time on trains more productively than in the office. I work, read or sleep... efficiently. I guess it's mainly because of the slow Internet connection, fixed time, and fewer opportunities for distraction. ↩
- International Conference of Physics Students, not International Carnivorous Plant Society. Since ICPS is a very student-oriented event (a crazy conference party each night), I got scared. Fortunately, Martina Pugliese remembered me from my talk A mathematical model of the Mafia game. It was in 2010 in Graz; for a moment of nostalgia, here are my photos. ↩