Channel: Planet Python

Talk Python to Me: #90 Data Wrangling with Python

Do you have a dirty, messy data problem? Whether you work as a software developer or as a data scientist, you've surely run across data that was malformed, incomplete, or maybe even wrong. Don't let messy data wreck your apps or generate wrong results. <br/> <br/>
What should you do? Listen to this episode of Talk Python To Me with Katharine Jarmul about the book she co-authored, Data Wrangling with Python, and her PyCon UK presentation, How to Automate your Data Cleanup with Python. <br/> <br/>
Links from the show: <br/>
<div style="font-size: .85em;"> <br/>
<b>Katharine on the web</b>: <a href='http://kjamistan.com/' target='_blank'>kjamistan.com</a> <br/>
<b>Katharine on Twitter</b>: <a href='https://twitter.com/kjam' target='_blank'>@kjam</a> <br/>
<b>Book: Data Wrangling with Python: Tips and Tools to Make Your Life Easier</b>: <a href='http://amzn.to/2fGc0Cx' target='_blank'>amzn.to/2fGc0Cx</a> <br/>
<b>PyCon 2016: How to Automate your Data Cleanup with Python</b>: <a href='https://www.youtube.com/watch?v=gp-ngPV_ZX8' target='_blank'>youtube.com/watch?v=gp-ngPV_ZX8</a> <br/> <br/>
<strong>Packages from Data Cleanup talk</strong> <br/>
<b>Dedupe Python Library</b>: <a href='https://github.com/datamade/dedupe' target='_blank'>github.com/datamade/dedupe</a> <br/>
<b>probablepeople</b>: <a href='https://github.com/datamade/probablepeople' target='_blank'>github.com/datamade/probablepeople</a> <br/>
<b>usaddress</b>: <a href='https://github.com/datamade/usaddress' target='_blank'>github.com/datamade/usaddress</a> <br/>
<b>jellyfish</b>: <a href='https://github.com/jamesturk/jellyfish' target='_blank'>github.com/jamesturk/jellyfish</a> <br/>
<b>Fuzzywuzzy</b>: <a href='https://github.com/seatgeek/fuzzywuzzy' target='_blank'>github.com/seatgeek/fuzzywuzzy</a> <br/>
<b>scrubadub</b>: <a href='https://github.com/datascopeanalytics/scrubadub' target='_blank'>github.com/datascopeanalytics/scrubadub</a> <br/>
<b>pint</b>: <a href='https://pint.readthedocs.io/en/0.7.2/' target='_blank'>pint.readthedocs.io</a> <br/>
<b>arrow</b>: <a href='https://github.com/crsmithdev/arrow' target='_blank'>github.com/crsmithdev/arrow</a> <br/>
<b>pdftables.six</b>: <a href='https://github.com/vnaydionov/pdftables' target='_blank'>github.com/vnaydionov/pdftables</a> <br/>
<b>Datacleaner</b>: <a href='https://github.com/rhiever/datacleaner' target='_blank'>github.com/rhiever/datacleaner</a> <br/>
<b>Parserator</b>: <a href='https://github.com/datamade/parserator' target='_blank'>github.com/datamade/parserator</a> <br/>
<b>Gensim</b>: <a href='https://radimrehurek.com/gensim/index.html' target='_blank'>radimrehurek.com/gensim</a> <br/>
<b>Faker</b>: <a href='https://github.com/joke2k/faker' target='_blank'>github.com/joke2k/faker</a> <br/>
<b>Dask</b>: <a href='http://dask.pydata.org/en/latest/' target='_blank'>dask.pydata.org</a> <br/>
<b>SpaCy</b>: <a href='https://spacy.io/' target='_blank'>spacy.io</a> <br/>
<b>Airflow</b>: <a href='https://airflow.incubator.apache.org/' target='_blank'>airflow.incubator.apache.org</a> <br/>
<b>Luigi</b>: <a href='http://luigi.readthedocs.io/en/stable/' target='_blank'>luigi.readthedocs.io</a> <br/> <br/>
<b>Katharine's courses</b> <br/> <br/>
<b>Data Pipelines with Python</b> <br/>
<a href='http://shop.oreilly.com/product/0636920055334.do' target='_blank'>shop.oreilly.com/product/0636920055334.do</a> <br/>
<b>Data Wrangling & Analysis with Python. Learn Pandas</b> <br/>
<a href='http://shop.oreilly.com/product/0636920051831.do' target='_blank'>shop.oreilly.com/product/0636920051831.do</a> <br/> <br/>
<b>Sponsors</b> <br/>
<b>Rollbar</b>: <a href='https://rollbar.com/talkpythontome' target='_blank'>rollbar.com/talkpythontome</a> <br/>
<b>GoCD</b>: <a href='https://talkpython.fm/gocd' target='_blank'>go.cd</a> <br/>
</div>
