Quantcast
Channel: Planet Python
Viewing all articles
Browse latest Browse all 23166

ListenData: How to Import Data in Python

$
0
0
This tutorial explains various methods to read data in Python. Data can be in any of the popular formats - CSV, TXT, XLS/XLSX (Excel), sas7bdat (SAS), Stata, Rdata (R) etc. Loading data in python environment is the most initial step of analyzing data.
Import Data into Python
While importing external files, we need to check the following points -
  1. Check whether header row exists or not
  2. Treatment of special values as missing values
  3. Consistent data type in a variable (column)
  4. Date Type variable in consistent date format.
  5. No truncation of rows while reading external data
Table of Contents

Install and Load pandas Package
pandas is a powerful data analysis package. It makes data exploration and manipulation easy. It has several functions to read data from various sources.
If you are using Anaconda, pandas must be already installed. You need to load the package by using the following command -
import pandas as pd
If pandas package is not installed, you can install it by running the following code in Ipython Console. If you are using Spyder, you can submit the following code in Ipython console within Spyder.
!pip install pandas
If you are using Anaconda, you can try the following line of code to install pandas -
!conda install pandas

1. Import CSV files

It is important to note that a singlebackslash does not work when specifying the file path. You need to either change it to forward slash or add one more backslash like below
import pandas as pd
mydata= pd.read_csv("C:\\Users\\Deepanshu\\Documents\\file1.csv")
If no header (title) in raw data file
mydata1 = pd.read_csv("C:\\Users\\Deepanshu\\Documents\\file1.csv", header = None)
You need to include header = None option to tell Python there is no column name (header) in data.

Add Column Names

We can include column names by using names= option.
mydata2 = pd.read_csv("C:\\Users\\Deepanshu\\Documents\\file1.csv", header = None, names = ['ID', 'first_name', 'salary'])
The variable names can also be added separately by using the following command.
mydata1.columns = ['ID', 'first_name', 'salary']

2. Import File from URL

You don't need to perform additional steps to fetch data from URL. Simply put URL in read_csv() function (applicable only for CSV files stored in URL).
mydata = pd.read_csv("http://winterolympicsmedals.com/medals.csv")

3. Read Text File

We can use read_table() function to pull data from text file. We can also use read_csv() with sep= "\t" to read data from tab-separated file.
mydata = pd.read_table("C:\\Users\\Deepanshu\\Desktop\\example2.txt")
mydata = pd.read_csv("C:\\Users\\Deepanshu\\Desktop\\example2.txt", sep ="\t")
READ MORE »

Viewing all articles
Browse latest Browse all 23166

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>