Verdant Force: Discoveries in Life and Proteomics: Recap: PyDataUK 2015

This weekend, ~200 delegates trudged through typical London weather (rain) to the Bloomberg offices in London to attend PyDataUK 2015.
While it’s not your typical nerds-in-T-shirts meet-up, if you use Python to hack data, this conference is ~~probably~~ definitely for you.

Attendance

Curiously for a ‘data science’ conference, the attendance list (which I would have used to crawl LinkedIn… heh) was not available. Based on my (biased) observations, the attendance was roughly as follows…

Type	Sub-type	Percentage (%)
Industry		70
	Self-employed	20
	SME	40
	Sponsors	10
	Large	<1
Academia		30
	Ugrad	<1
	Masters	<1
	PhD	15
	Postdoc	5
	Professor	10
Government		<1

A few highlights…
* Self-employed contractors and consultants were very well represented.

Conference feel

A ‘your data’ conference, not a ‘big data’ conference

Hadoop has delivered value for <10% of the companies that have installed it
- Paraphrase, anon

This conference is data focused, i.e. focused on using the Python ecosystem to solve your data challenges. The focus is on practice, and practical tools, not theory.

Type	Approx Size	Appropriate tools
Micro-data	<1 GB	IPython
Small-data (Memory-limited)	~10 GB	Pandas
Medium-data (Disk-limited)	<1 TB	Ad-hoc databases
Big-data	TB - PB	Consider enterprise solutions, or grep
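
As a rough sketch, the table above can be written down as a lookup. The function name `suggest_tool` and the exact cut-offs between rows are my own assumptions for illustration; the table only gives approximate sizes.

```python
GB = 10**9
TB = 10**12

# (upper bound in bytes, tier, suggested tool), following the table above.
# The precise boundaries between tiers are assumptions.
TIERS = [
    (1 * GB, "micro-data", "IPython"),
    (10 * GB, "small-data (memory-limited)", "pandas"),
    (1 * TB, "medium-data (disk-limited)", "ad-hoc databases"),
]


def suggest_tool(size_bytes):
    """Return (tier, suggested tool) for a dataset of the given size in bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    for upper_bound, tier, tool in TIERS:
        if size_bytes < upper_bound:
            return tier, tool
    # Anything at or above roughly 1 TB falls into the last row of the table.
    return "big-data", "enterprise solutions, or grep"
```

For example, `suggest_tool(5 * GB)` lands in the pandas row, which is where I suspect most attendees' data lives.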

The fact is, ‘big data tools’ would be wildly inappropriate for the vast majority of attendees. The problem seems particularly acute in the life sciences. In his war story talk, Paul Agapow covered the herculean efforts required to re-purpose an ill-advised ‘big data’ solution to recover data from an ongoing clinical trial.
His message was very clear. The life sciences tend to have very detailed, very heterogeneous data in hundreds to thousands of rows (small/medium data). Let the data guide the solutions: you probably don’t need enterprise software, so just don’t waste your money.

A “Python is useful” conference, not a “Python is deity” conference

All tools are shyte, but some tools (Python!) are useful.
- Paraphrase, anon

Speakers like Russel Winder, with his talk on the lack of computational efficiency in Python even when using libraries like numpy, set a memento mori undertone against some of the more blatant Python triumphalism.

An interpersonal conference, not a Cloister

The very high level of interpersonal interaction is yet another way in which the conference belies the nerds-in-T-shirts stereotype. This is very much a conference that one attends to seek guidance and solve problems.
While there are always the stragglers that don’t head down the pub, a good 2/3s of the conference went for fruitful discussion and drink on Saturday. Unsurprisingly, pub attendance was lower on Sunday, but still fruitful.

A place to get hired/take action, not heavy on theory

Folks were hiring like crazy, and it was very much a seller’s market.
If you’re a job seeker anywhere on the Python+data spectrum, I’d strongly recommend attending. Companies were recruiting along the entire spectrum, everywhere from AWS-ineering to user-focused commercial data analysis with IPython notebooks (or re-dash, see Arik’s talk for more details on this user-friendly database interaction framework).
In line with the action-oriented nature of the conference, the Pivigo Recruitment founders were there, doing resume/CV screens and offering advice to both students and established professionals.
If you are a PhD/Postdoc looking to make the transition, I highly recommend taking a look at their Science to Data Science training program.
Continuum may also be prototyping a training programme of its own through its Client Facing Consultant position. I’m not entirely sure, but 6 months of training via a 3rd-party consultancy followed by an intentional poach (Continuum –> 3rd party) could be an interesting model.

Talks

I found the spread of talks fantastic. At least amongst the talks I attended…

Type	Percentage (%)
Tools	40
War story	30
Skills	20
Under the hood	10

Tools

Tools talks were the most common. They covered ‘non-brand name’ and upcoming tools with emerging communities.

Will Usher: Sensitivity analysis with SALib
David MacIver: Randomly test initial conditions in your code simply with Hypothesis. And while you’re at it, why not use contracts to enforce. Wasn’t a talk, but it should be!
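
To give a flavour of what property-based testing with Hypothesis looks like, here is a minimal sketch. The function under test, `dedupe_keep_order`, is a made-up example of mine, not something from the talk; only `given` and `strategies` come from Hypothesis itself.

```python
from hypothesis import given, strategies as st


def dedupe_keep_order(items):
    """Hypothetical function under test: drop repeats, keep first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothesis generates many lists of integers and checks that the
# properties below hold for every one of them.
@given(st.lists(st.integers()))
def test_dedupe_properties(items):
    result = dedupe_keep_order(items)
    assert len(result) == len(set(result))       # no duplicates remain
    assert set(result) == set(items)             # nothing lost, nothing invented
    assert dedupe_keep_order(result) == result   # running it twice changes nothing
```

Run it under pytest, or simply call `test_dedupe_properties()`; if a property fails, Hypothesis reports a small failing input rather than whatever random list first tripped it.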

Attend/watch if:
(i) You want to learn about specific tools that may be applicable to your problem.
(ii) You want to collaborate on extending / adopting new tools.

War story

These talks gave the horrifying, nitty-gritty details of a specific problem the speaker faced, and how they went about solving it (including gotchas and failures). The focus isn’t ‘wow, look at me’ but rather ‘this was some B.S., and I want no one to go through what I went through ever again’.

Paul Agapow: Don’t use ‘big data’ tools when simpler solutions will do, particularly in the life sciences.

Attend/watch if:
(i) You want help with the problems you are immediately facing.
(ii) You want exposure to problems you’ve never thought of.

Skills

These were high-level talks that focused more on skills and knowledge than specific tools.

Ian Ozsvald: Writing code for you is only the beginning; let’s see what it takes to push a Bloomberg model to production.

Attend/watch if:
(i) You want to learn what you need to know in a new area.
(ii) You want an overview of a topic you’ve never heard of.
(iii) You want to chat with the speaker about specific War Stories, after the talk.

Under the hood

These talks focused on low-level implementation details of numpy, pandas, Cython, Numba, etc., with a particular focus on performance and appropriateness. Personally, I found these talks the most useful. Where else can one gather such concentrated information from the mouths of the open-source contributors?

Russel Winder: If you want performance, use Python as a glue-language, and write your computationally intensive functions in a ‘real’ language.
Jeff Reback: In pandas, think about idioms and built-in vectorization to get the most out of your code (then write in a ‘real’ language if you still need to go faster).
James Powell: Why does writing good numpy feel so different from writing good Python? Because the styles have diverged, and will probably continue to do so.
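
A small sketch of the vectorization point, using a toy DataFrame I made up (the column names `price` and `qty` are purely illustrative). Both functions compute the same total; the second pushes the loop down into pandas' compiled code instead of iterating row by row in Python.

```python
import numpy as np
import pandas as pd

# Toy data, generated with a fixed seed so the example is repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.uniform(1, 100, 1_000),
    "qty": rng.integers(1, 10, 1_000),
})


def total_loop(frame):
    """Row-by-row Python loop: easy to write, slow on large frames."""
    total = 0.0
    for _, row in frame.iterrows():
        total += row["price"] * row["qty"]
    return total


def total_vectorized(frame):
    """Whole-column arithmetic: the multiplication and sum run in compiled code."""
    return float((frame["price"] * frame["qty"]).sum())
```

Timing the two (for instance with `%timeit` in IPython) on your own data is the honest way to see how much the idiom buys you before reaching for a ‘real’ language.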

Attend/watch if:
(i) You want a fire-hose of information about low-level topics.
(ii) You want to know how the ‘magic’ happens.

Take-home

This is very much a conference focused on solutions. If you have a problem, don’t be shy! Ask around, and there will be people there who have faced similar problems and are eager to help.
As for me, I look forward to attending next year!


Thursday, June 25, 2015
