Do you have a dirty, messy data problem? Whether you work as a software developer or as a data scientist, you've surely run across data that was malformed, incomplete, or maybe even wrong. Don't let messy data wreck your apps or generate wrong results.
<br/>
<br/>
What should you do? Listen to this episode of Talk Python To Me, where Katharine Jarmul discusses the book she co-authored, Data Wrangling with Python, and her PyCon UK presentation, How to Automate your Data Cleanup with Python.
<br/>
<br/>
Links from the show:
<br/>
<div style="font-size: .85em;">
<br/>
<b>Katharine on the web</b>: <a href='http://kjamistan.com/' target='_blank'>kjamistan.com</a>
<br/>
<b>Katharine on Twitter</b>: <a href='https://twitter.com/kjam' target='_blank'>@kjam</a>
<br/>
<b>Book: Data Wrangling with Python: Tips and Tools to Make Your Life Easier</b>: <a href='http://amzn.to/2fGc0Cx' target='_blank'>amzn.to/2fGc0Cx</a>
<br/>
<b>PyCon UK 2016: How to Automate your Data Cleanup with Python</b>: <a href='https://www.youtube.com/watch?v=gp-ngPV_ZX8' target='_blank'>youtube.com/watch?v=gp-ngPV_ZX8</a>
<br/>
<br/>
<strong>Packages from Data Cleanup talk</strong>
<br/>
<b>Dedupe Python Library</b>: <a href='https://github.com/datamade/dedupe' target='_blank'>github.com/datamade/dedupe</a>
<br/>
<b>probablepeople</b>: <a href='https://github.com/datamade/probablepeople' target='_blank'>github.com/datamade/probablepeople</a>
<br/>
<b>usaddress</b>: <a href='https://github.com/datamade/usaddress' target='_blank'>github.com/datamade/usaddress</a>
<br/>
<b>jellyfish</b>: <a href='https://github.com/jamesturk/jellyfish' target='_blank'>github.com/jamesturk/jellyfish</a>
<br/>
<b>Fuzzywuzzy</b>: <a href='https://github.com/seatgeek/fuzzywuzzy' target='_blank'>github.com/seatgeek/fuzzywuzzy</a>
<br/>
<b>scrubadub</b>: <a href='https://github.com/datascopeanalytics/scrubadub' target='_blank'>github.com/datascopeanalytics/scrubadub</a>
<br/>
<b>pint</b>: <a href='https://pint.readthedocs.io/en/0.7.2/' target='_blank'>pint.readthedocs.io</a>
<br/>
<b>arrow</b>: <a href='https://github.com/crsmithdev/arrow' target='_blank'>github.com/crsmithdev/arrow</a>
<br/>
<b>pdftables.six</b>: <a href='https://github.com/vnaydionov/pdftables' target='_blank'>github.com/vnaydionov/pdftables</a>
<br/>
<b>Datacleaner</b>: <a href='https://github.com/rhiever/datacleaner' target='_blank'>github.com/rhiever/datacleaner</a>
<br/>
<b>Parserator</b>: <a href='https://github.com/datamade/parserator' target='_blank'>github.com/datamade/parserator</a>
<br/>
<b>Gensim</b>: <a href='https://radimrehurek.com/gensim/index.html' target='_blank'>radimrehurek.com/gensim</a>
<br/>
<b>Faker</b>: <a href='https://github.com/joke2k/faker' target='_blank'>github.com/joke2k/faker</a>
<br/>
<b>Dask</b>: <a href='http://dask.pydata.org/en/latest/' target='_blank'>dask.pydata.org</a>
<br/>
<b>SpaCy</b>: <a href='https://spacy.io/' target='_blank'>spacy.io</a>
<br/>
<b>Airflow</b>: <a href='https://airflow.incubator.apache.org/' target='_blank'>airflow.incubator.apache.org</a>
<br/>
<b>Luigi</b>: <a href='http://luigi.readthedocs.io/en/stable/' target='_blank'>luigi.readthedocs.io</a>
<br/>
<br/>
<b>Katharine's courses</b>
<br/>
<br/>
<b>Data Pipelines with Python</b>
<br/>
<a href='http://shop.oreilly.com/product/0636920055334.do' target='_blank'>shop.oreilly.com/product/0636920055334.do</a>
<br/>
<b>Data Wrangling & Analysis with Python. Learn Pandas</b>
<br/>
<a href='http://shop.oreilly.com/product/0636920051831.do' target='_blank'>shop.oreilly.com/product/0636920051831.do</a>
<br/>
<br/>
<b>Sponsors</b>
<br/>
<b>Rollbar</b>: <a href='https://rollbar.com/talkpythontome' target='_blank'>rollbar.com/talkpythontome</a>
<br/>
<b>GoCD</b>: <a href='https://talkpython.fm/gocd' target='_blank'>go.cd</a>
<br/>
</div>