This is an attempt to provide attendees of PyCon AU 2015 with a guide to getting set up ahead of the tutorial. Getting set up in advance will help you get the most out of the tutorial: you can focus on the slides and the worked examples rather than on struggling through an installation process.
There will be USB keys on the day with the data sets and some of the software libraries included, in case the network breaks. However, things will go more smoothly for everyone if some of these hurdles can be cleared out of the way in advance.
What it's like installing software during a tutorial session
The Software You Will Need
- Python 3.4, with NumPy, SciPy, scikit-learn, pandas, xray, Pillow -- install via Anaconda
- IPython Notebook, matplotlib, seaborn -- install via Anaconda
- Theano, Keras -- install via pip
- Word2Vec (https://github.com/danielfrg/word2vec) -- avoid pip, install from source
- https://github.com/danieldiekmeier/memegenerator -- just drop it in the notebook folder
- https://github.com/tweepy/tweepy -- install via pip
I have only had success installing word2vec by cloning the repository and installing locally, using the old-school 'python setup.py install'. For whatever reason, the version on PyPI doesn't work for me.
I've noted the easiest path for installing each package in the list above.
The Data You Will Need
- MNIST: https://github.com/tleeuwenburg/stml/blob/master/mnist/mnist.pkl.gz
- Kaggle Otto competition data: https://www.kaggle.com/c/otto-group-product-classification-challenge
- "Text8": http://mattmahoney.net/dc/text8.zip
- For a stretch, try the larger data sets from http://mattmahoney.net/dc/textdata
An Overview of the Tutorial
The tutorial will include an introduction, a mini-installfest, and then three problem walkthroughs. There will be some general tips, plus time for discussion.

Entree: Problem Walkthrough One: MNIST Digit Recognition
Compute time: around 3 to 5 minutes for a random forest approach

Digit recognition is most obviously used when decoding postcode numbers on envelopes. It's also relevant to general handwriting recognition, and to non-handwritten recognition such as OCR of scanned documents or license plate recognition.
Attendees will be able to run the supplied, worked solution on the spot. We'll step through the implementation stages to talk about how to apply similar solutions to other problems. If time is available, we will include alternative machine learning techniques and other data sets.
Data for this problem will be available on USB.
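To give a feel for the approach before the day, here is a minimal random-forest sketch. It uses scikit-learn's small built-in digits set as a stand-in for the full MNIST file the tutorial loads from mnist.pkl.gz, so the shapes and timings differ, but the pipeline is the same.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

# 1,797 8x8 digit images, each flattened to 64 pixel features.
digits = load_digits()
X_train, y_train = digits.data[:1500], digits.target[:1500]
X_test, y_test = digits.data[1500:], digits.target[1500:]

# Fit a forest of 100 trees and score it on the held-out images.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy: %.3f" % clf.score(X_test, y_test))
```

On the full MNIST data the same few lines take the minutes quoted above; on this toy set they run in seconds.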
Main: Otto Shopping Category Challenge
Compute time: 1 minute for a random forest, 7 minutes for deep learning
Data for this problem can be downloaded only through the Kaggle site due to the terms of use.
This is a real-world, commercial problem. The "Otto Group" sell stuff, and for this problem they have put that stuff into nine classes. Each thing they sell has 93 features. The sample data set has 200k individual products, each of which has somehow been scored against these 93 features. The problem definition is to go from 93 input numbers to a category id between 1 and 9.
{ 93 features } --> some kind of machine learning --> { number between 1 and 9 }
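That pipeline can be sketched in a few lines. Since the real data is only available via Kaggle, the features and labels below are random placeholders that merely match the competition's shapes (fake products, 93 features, classes 1 to 9) -- the numbers are meaningless, only the plumbing is real.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: 2,000 fake products x 93 count-like features,
# with category ids drawn at random from 1..9.
rng = np.random.RandomState(0)
X = rng.poisson(2, size=(2000, 93))
y = rng.randint(1, 10, size=2000)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Kaggle scores this competition on predicted class probabilities,
# so predict_proba is the shape of an actual submission.
probs = clf.predict_proba(X[:5])
print(probs.shape)  # one row per product, one column per class
```

Swap in the real Kaggle CSV for X and y and this is essentially the random-forest baseline from the walkthrough.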
Dessert: A Twitter Memebot in Word2Vec
Compute time: around 4 minutes for Word2Vec training, plus 2 minutes for meme generation

This is something fun based on Word2Vec. We'll scrape Twitter for some text to process, then use Word2Vec to look at some of the word relationships in the timelines.
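The relationship trick at the heart of this is plain vector arithmetic. The sketch below fakes it with four hand-made 3-dimensional vectors (real Word2Vec embeddings are learned from text and have hundreds of dimensions, and this deliberately avoids the library's API); it only illustrates the famous 'king - man + woman ≈ queen' idea.

```python
import numpy as np

# Toy, hand-made "embeddings" -- purely illustrative, not learned vectors.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(target, exclude=()):
    """Word whose vector is most similar to target, skipping the query words."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

target = vectors["king"] - vectors["man"] + vectors["woman"]
print(nearest(target, exclude=("king", "man", "woman")))  # prints "queen"
```

With a trained model you do the same arithmetic over a vocabulary of tens of thousands of words, which is where the interesting relationships appear.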
Visualisation, Plotting and Results Analysis
No data science tutorial would be complete without data visualisation and plotting of results. Rather than have a separate problem for this, we will include them in each problem walkthrough. We will also consider how to determine whether your model is 'good', and how to convince both yourself and your customers / managers of that fact!

Bring Your Own Data
If you have a data problem of your own, you can bring it along to the tutorial and work on that instead. As time allows, I'll endeavour to assist with any questions you might have about working with your own data. Alternatively, you can just come up to me during the conference and we can take a look! There's nothing more interesting than looking at data that inherently matters to you.

I hope to see you at the conference!!