Analyzing Tweets with Pandas and Matplotlib
Python has a variety of visualization libraries, including seaborn, networkx, and vispy. Many Python visualization libraries are based wholly or partially on matplotlib, which often makes it the first resort for making simple plots, and the last resort for making plots too complex to create in other libraries.
In this matplotlib tutorial, we’ll cover the basics of the library, and walk through making some intermediate visualizations.
We’ll be working with a dataset of approximately 240,000 tweets about Hillary Clinton, Donald Trump, and Bernie Sanders, all current candidates for president of the United States.
The data was pulled from the Twitter Streaming API, and the csv of all 240,000 tweets can be downloaded here. If you want to scrape more data yourself, you can look here for the scraper code.
Exploring tweets with Pandas
Before we get started with plotting, let’s load the data and do some basic exploration. We can use Pandas, a Python library for data analysis, to help us with this. In the code below, we’ll:
- Import the Pandas library.
- Read tweets.csv into a Pandas DataFrame.
- Print the first...
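The steps listed above can be sketched roughly as follows. This is a minimal sketch, assuming tweets.csv has been downloaded into the current working directory. The load_tweets helper is a hypothetical name introduced here for illustration, and head() is used for the preview because it shows the first five rows by default.

```python
from pathlib import Path

import pandas as pd


def load_tweets(path="tweets.csv"):
    """Read the tweets CSV into a Pandas DataFrame (hypothetical helper)."""
    return pd.read_csv(path)


# Assumption: tweets.csv sits in the current working directory.
csv_path = Path("tweets.csv")
if csv_path.exists():
    tweets = load_tweets(csv_path)
    # head() returns the first five rows unless another count is passed.
    print(tweets.head())
```

Guarding on the file's existence keeps the snippet from raising an error if the CSV has not been downloaded yet. In an interactive session you could simply call pd.read_csv("tweets.csv") directly.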