This tutorial explains various methods to read data in Python. Data can come in any of the popular formats: CSV, TXT, XLS/XLSX (Excel), sas7bdat (SAS), Stata, Rdata (R), etc. Loading data into the Python environment is the first step in analyzing it.
While importing external files, we need to check the following points -
- Check whether header row exists or not
- Treatment of special values as missing values
- Consistent data type in a variable (column)
- Date Type variable in consistent date format.
- No truncation of rows while reading external data
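The checks above can be sketched with a few read_csv() options. This is only an illustration: the sample rows, the column names and the choice of -999 as a "special value" are assumptions made for the example, not data from this tutorial.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for an external file.
raw = io.StringIO(
    "ID,first_name,salary,joined\n"
    "1,Asha,50000,2020-01-15\n"
    "2,Ben,NA,2019-07-01\n"
    "3,Cara,-999,2021-03-30\n"
)

df = pd.read_csv(
    raw,
    header=0,                 # the first row holds the column names
    na_values=["-999"],       # treat this special value as missing
    dtype={"ID": "int64"},    # enforce a consistent type for the ID column
    parse_dates=["joined"],   # read the date column as datetime
)

print(df.dtypes)          # confirm the type of each column
print(df.isna().sum())    # count missing values per column
print(df.shape)           # compare the row count with the source file
```

Comparing df.shape with the number of rows you expect from the source file is a simple way to catch truncated reads.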
Install and Load pandas Package
pandas is a powerful data analysis package. It makes data exploration and manipulation easy, and it has several functions to read data from various sources. If you are using Anaconda, pandas should already be installed. You need to load the package by using the following command -
import pandas as pd
If the pandas package is not installed, you can install it by running the following code in the IPython console. If you are using Spyder, you can submit the following code in the IPython console within Spyder.
!pip install pandas
If you are using Anaconda, you can try the following line of code to install pandas -
!conda install pandas
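If you are not sure whether pandas is already available, a quick check like the one below may help before you try to install it. It is a minimal sketch that uses the standard-library importlib module; the variable name pandas_available is made up for this example.

```python
import importlib.util

# find_spec() returns None when the package cannot be located.
pandas_available = importlib.util.find_spec("pandas") is not None

if pandas_available:
    import pandas as pd
    print("pandas version:", pd.__version__)
else:
    print("pandas is not installed; install it with pip or conda first")
```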
1. Import CSV files
It is important to note that a single backslash does not work when specifying the file path. You need to either change it to a forward slash or add one more backslash, like below -
import pandas as pd
mydata = pd.read_csv("C:\\Users\\Deepanshu\\Documents\\file1.csv")
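The reason a single backslash fails is that Python treats it as the start of an escape sequence in a normal string (for example, \U begins a Unicode escape). The sketch below compares three ways of writing the same path; the raw-string form (r"...") is an additional option not covered above. It only compares the path strings, so no file needs to exist.

```python
from pathlib import PureWindowsPath

escaped = "C:\\Users\\Deepanshu\\Documents\\file1.csv"   # doubled backslashes
raw = r"C:\Users\Deepanshu\Documents\file1.csv"          # raw string literal
forward = "C:/Users/Deepanshu/Documents/file1.csv"       # forward slashes

# The doubled-backslash and raw-string forms are the same string.
same_string = escaped == raw

# The forward-slash form is a different string but the same Windows path.
same_path = PureWindowsPath(forward) == PureWindowsPath(escaped)

print(same_string, same_path)
```

Any of the three forms can be passed to pd.read_csv() on Windows.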
If the raw data file has no header (title) row
mydata1 = pd.read_csv("C:\\Users\\Deepanshu\\Documents\\file1.csv", header = None)
You need to include the header = None option to tell pandas that the data has no column names (header).
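To see what header = None changes, the sketch below reads the same two rows with and without the option. The rows are hypothetical and are read from an in-memory string instead of a file so that the example runs anywhere.

```python
import io
import pandas as pd

# Hypothetical rows with no header line.
data = "1,Asha,50000\n2,Ben,42000\n"

# Default behaviour: the first row is consumed as column names.
with_default = pd.read_csv(io.StringIO(data))

# header=None: every row is kept as data and columns are numbered 0, 1, 2.
no_header = pd.read_csv(io.StringIO(data), header=None)

print(with_default.shape)
print(no_header.shape)
print(list(no_header.columns))
```

Without the option, the first record silently becomes the header, so the data ends up one row short.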
Add Column Names
We can include column names by using the names= option.
mydata2 = pd.read_csv("C:\\Users\\Deepanshu\\Documents\\file1.csv", header = None, names = ['ID', 'first_name', 'salary'])
The column names can also be added separately by using the following command.
mydata1.columns = ['ID', 'first_name', 'salary']
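The two approaches should give the same result. The sketch below checks this on hypothetical rows read from an in-memory string; the names cols, named_on_read and named_later are made up for the example.

```python
import io
import pandas as pd

# Hypothetical rows with no header line.
data = "1,Asha,50000\n2,Ben,42000\n"
cols = ['ID', 'first_name', 'salary']

# Option 1: supply the names while reading.
named_on_read = pd.read_csv(io.StringIO(data), header=None, names=cols)

# Option 2: read first, then assign the names.
named_later = pd.read_csv(io.StringIO(data), header=None)
named_later.columns = cols

print(list(named_on_read.columns))
print(named_on_read.equals(named_later))
```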
2. Import File from URL
You don't need to perform additional steps to fetch data from a URL. Simply pass the URL to the read_csv() function (applicable only for CSV files stored at a URL).
mydata = pd.read_csv("http://winterolympicsmedals.com/medals.csv")
3. Read Text File
We can use the read_table() function to pull data from a text file. We can also use read_csv() with sep = "\t" to read data from a tab-separated file.
mydata = pd.read_table("C:\\Users\\Deepanshu\\Desktop\\example2.txt")
mydata = pd.read_csv("C:\\Users\\Deepanshu\\Desktop\\example2.txt", sep = "\t")
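The sketch below illustrates that the two calls read tab-separated text the same way, since read_table() uses a tab as its default separator. The sample text is hypothetical and is read from an in-memory string instead of example2.txt.

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for a text file.
tab_text = "ID\tfirst_name\tsalary\n1\tAsha\t50000\n2\tBen\t42000\n"

from_table = pd.read_table(io.StringIO(tab_text))        # tab is the default separator
from_csv = pd.read_csv(io.StringIO(tab_text), sep="\t")  # separator given explicitly

print(from_table.equals(from_csv))
print(from_table.shape)
```

For text files that use another delimiter, such as a pipe or a semicolon, you can pass that character to sep= in either function.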