Thursday, June 25, 2015

Recap: PyDataUK 2015

This weekend, ~200 delegates trudged through typical London weather (rain) to the Bloomberg offices to attend PyDataUK 2015.
While it’s not your typical nerds-in-T-shirts meet-up, if you use Python to hack data, this conference is probably definitely for you.

Attendance

Curiously, for a ‘data science’ conference, the attendance list (which I would have crawled LinkedIn with…heh) was not available. Based on my (biased) observations, the attendance was roughly as follows…
Type          Sub-type         Percentage (%)
Industry                       70
              Self-employed    20
              SME              40
              Sponsors         10
              Large            <1
Academia                       30
              Ugrad            <1
              Masters          <1
              PhD              15
              Postdoc          5
              Professor        10
Government                     <1
A few highlights…
* Self-employed contractors and consultants were very well represented.

Conference feel

A ‘your data’ conference, not a ‘big data’ conference

Hadoop has delivered value for <10% of the companies that have installed it
- Paraphrase, anon
This conference is data focused, i.e. focused on using the Python ecosystem to solve your data challenges. The focus is on practice, and practical tools, not theory.
Type                          Approx. size   Appropriate tools
Micro-data                    <1 GB          IPython
Small-data (memory-limited)   ~10 GB         Pandas
Medium-data (disk-limited)    <1 TB          Ad-hoc databases
Big-data                      TB - PB        Consider enterprise solutions, or grep
The fact is, ‘big data tools’ would be wildly inappropriate for the vast majority of attendees. The problem seems particularly acute in the life sciences. In his war-story talk, Paul Agapow covered the herculean efforts required to re-purpose an ill-advised ‘big data’ solution to recover data from an ongoing clinical trial.
His message was very clear. The life sciences tend to have very detailed, very heterogeneous data in hundreds to thousands of rows (small/medium data). Let the data guide the solution: you probably don’t need enterprise software, so just don’t waste your money.
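
To make the ‘ad-hoc database’ row of the table concrete, here is a minimal sketch using SQLite from the Python standard library. SQLite is my assumption; none of the talks prescribed it, and the table name, columns and values below are made up purely for illustration.

```python
import sqlite3

# Hypothetical records; a real workload would be loaded from files on disk.
rows = [
    ("p1", "glucose", 5.4),
    ("p1", "hba1c", 6.1),
    ("p2", "glucose", 7.2),
]

# ":memory:" keeps the sketch self-contained; pass a file path instead
# to get a disk-backed database for data that does not fit in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (patient TEXT, test TEXT, value REAL)")
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)

# Let the database do the aggregation instead of loading every row into Python.
glucose_mean = conn.execute(
    "SELECT AVG(value) FROM measurements WHERE test = ?", ("glucose",)
).fetchone()[0]
conn.close()
```

The point is only that a single-file database with no server to administer can cover a lot of disk-limited work before anything ‘enterprise’ is needed.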

A ‘Python is useful’ conference, not a ‘Python is deity’ conference

All tools are shyte, but some tools (Python!) are useful.
- Paraphrase, anon
Speakers like Russel Winder, with his talk on the lack of computational efficiency in Python (even when using libraries like numpy), lent a memento mori undertone to some of the more blatant Python triumphalism.

An interpersonal conference, not a cloister

The very high level of interpersonal interaction is yet another way in which the conference defies the nerds-in-T-shirts stereotype. This is very much a conference that one goes to in order to seek guidance and solve problems.
While there are always the stragglers who don’t head down the pub, a good two-thirds of the conference went for fruitful discussion and drink on Saturday. Unsurprisingly, pub attendance was lower on Sunday, but still fruitful.

A place to get hired/take action, not heavy on theory

Folks were hiring like crazy, and it was very much a seller’s market.
If you’re a job seeker anywhere on the Python+data spectrum, I’d strongly recommend attending. Companies were recruiting along the entire spectrum, everywhere from AWS-ineering to user-focused commercial data analysis with IPython notebooks (or re-dash, see Arik’s talk for more details on this user-friendly database interaction framework).
In line with the action-oriented nature of the conference, the Pivigo Recruitment founders were there, doing resume/CV screens and offering advice, both to students and established professionals.
If you are a PhD/Postdoc looking to make the transition, I highly recommend taking a look at their Science to Data Science training program.
Continuum may also be prototyping a training programme of its own through its Client Facing Consultant position. I’m not entirely sure, but 6 months of training via a 3rd-party consultancy followed by an intentional poach (Continuum –> 3rd party) could be an interesting model.

Talks

I found the spread of talks fantastic. At least amongst the talks I attended…
Type             Percentage (%)
Tools            40
War story        30
Skills           20
Under the hood   10

Tools

Tools talks were the most common. They covered ‘non-brand name’ and upcoming tools with emerging communities.
Attend/watch if:
(i) You want to learn about specific tools that may be applicable to your problem.
(ii) You want to collaborate on extending / adopting new tools.

War story

These talks gave the horrifying and nitty-gritty details of a specific problem the speaker faced, and how they went about solving it (including gotchas and failures). The focus isn’t ‘wow, look at me’, but rather ‘this was some B.S., and I want no one to go through what I went through ever again’.
  • Paul Agapow: Don’t use ‘big data’ tools when simpler solutions will do, particularly in the life sciences.
Attend/watch if:
(i) You want help with the problems you are immediately facing.
(ii) You want exposure to problems you’ve never thought of.

Skills

These were high-level talks that focused more on skills and knowledge than specific tools.
  • Ian Ozsvald: Writing code for you is only the beginning; let’s see what it takes to push a Bloomberg model to production.
Attend/watch if:
(i) You want to learn what you need to know in a new area.
(ii) You want an overview of a topic you’ve never heard of.
(iii) You want to chat with the speaker about specific War Stories, after the talk.

Under the hood

These talks focused on low-level implementation details of numpy, pandas, Cython, Numba, etc., with a particular focus on performance and appropriateness. Personally, I found these talks the most useful. Where else can one gather such concentrated information from the mouths of the open-source contributors?
  • Russel Winder: If you want performance, use Python as a glue-language, and write your computationally intensive functions in a ‘real’ language.
  • Jeff Reback: In pandas, think about idioms and built-in vectorization to get the most out of your code (then write in a ‘real’ language if you still need to go faster).
  • James Powell: Why does writing good numpy feel so different from writing good Python? Because the styles have diverged, and will probably continue to do so.
Attend/watch if:
(i) You want a fire-hose of information about low-level topics.
(ii) You want to know how the ‘magic’ happens.
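
As a rough illustration of the idiom point from Jeff Reback’s talk (and the style gap James Powell described), here is a minimal sketch contrasting a row-by-row loop with a column-wise pandas operation. This is my own toy example, not code from either talk; the DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Plain-Python style: visit each row and compute one value at a time.
totals_loop = [row.price * row.qty for row in df.itertuples(index=False)]

# pandas idiom: multiply whole columns at once and let the library
# run the loop in compiled code.
totals_vectorized = df["price"] * df["qty"]

# Both approaches give the same numbers; only the style differs.
assert totals_loop == totals_vectorized.tolist()
```

On three rows the difference is invisible, but on large frames the column-wise form is typically much faster and is the one to reach for before dropping down to a ‘real’ language.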

Take-home

This is very much a conference focused on solutions. If you have a problem, don’t be shy! Ask around, and there will be people there who have faced similar problems and are eager to help.
As for me, I look forward to attending next year!