Channel: Planet Python

Erik Marsja: How to Get the Column Names from a Pandas Dataframe – Print and List


The post How to Get the Column Names from a Pandas Dataframe – Print and List appeared first on Erik Marsja.

In this short post, we will learn 6 methods to get the column names from a Pandas dataframe. One of the nice things about Pandas dataframes is that each column has a name (i.e., the variable names in the dataset). Now, we can use these names to access specific columns by name without having to know which column number it is.

To access the names of a Pandas dataframe, we can use the columns attribute. For example, if our dataframe is called df we just type print(df.columns) to get all the columns of the Pandas dataframe.

After this, we can work with the columns to access certain columns, rename a column, and so on.

Importing Data from a CSV File

First, before learning the 6 methods to obtain the column names in Pandas, we need some example data. In this post, we will use Pandas read_csv to import data from a CSV file (from this URL). Now, the first step, as usual when working with Pandas, is to import Pandas as pd.

import pandas as pd

df = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/carData/UN98.csv',
                index_col=0)

df.head()

It is, of course, also possible to read xlsx files using Pandas read_excel method.

Six Methods to Get the Column Names from Pandas Dataframe:

Now, we are ready to learn how we can get all the names using different methods.

1. Get the Names Using the columns Attribute

Now, one of the simplest ways to get all the column names from a Pandas dataframe is, of course, to print the columns attribute. In the code chunk below, we are doing exactly this.

print(df.columns)

2. Access Column Names Using the keys() Method

Second, we can get the exact same result by using the keys() method. That is, we will get the column names by the following code as well.

print(df.keys())

3. Get Column Names by Iterating over the Columns

In the third method, we will simply iterate over the columns to get the column names. As you may notice, we are again using the columns attribute.

for col_name in df.columns: 
    print(col_name)

4. Get the Column Names as a List

In the fourth method, on the other hand, we are going to use the list() function to print the column names as a list.

print(list(df.columns))

5. Another Method to Print Column Names as a List

Now, we can use the values attribute, as well, to get the columns from a Pandas dataframe. If we then use the tolist() method, we will get a list.

print(df.columns.values.tolist())

6. How to Get the Column Names with Pandas Sorted

Now, in the final, and sixth, method to print the names, we will use sorted() to get the columns from a Pandas dataframe in alphabetical order:

sorted(df)

How to Get Values by Column Name:

Now that we know the column names of our dataframe, we can access one column (or many). Here’s how we get the values from one column:

print(df['tfr'].values)

If we, on the other hand, want to access more than one column we add a list: df[['tfr', 'region']]

How to Rename a Column

In the final example of what we can do when we know the column names of a Pandas dataframe, we rename a column.

df.rename(columns={'tfr': 'TFR'})

Note, if we want to save the changed name to our dataframe, we can add inplace=True to the code above.
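As a quick sketch of both options (using a tiny stand-in dataframe with the same 'tfr' column name rather than the UN98 data loaded above):

```python
import pandas as pd

# Small stand-in dataframe with the same 'tfr' column name as above
df = pd.DataFrame({'tfr': [1.9, 2.1], 'region': ['Africa', 'Asia']})

# Option 1: reassign the result of rename()
df = df.rename(columns={'tfr': 'TFR'})
print(list(df.columns))  # ['TFR', 'region']

# Option 2: rename in place with inplace=True
df.rename(columns={'TFR': 'tfr'}, inplace=True)
print(list(df.columns))  # ['tfr', 'region']
```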

Conclusion: Getting all the Column Names with Pandas

Now, in this post, we have learned how to get the column names from a Pandas dataframe. Specifically, we learned why and when this can be useful, 6 different methods to access the column names, and very briefly what we can do when we know the column names. Finally, here’s the Jupyter Notebook with all the example code.



Techiediaries - Django: Multiple File/Image Upload with Django 3, Angular 9 and FormData


In the previous tutorial we have seen how to implement file uploading in Django and Angular 9. In this tutorial, we'll see how to implement multiple file uploading.

It's recommended that you start from the previous tutorial for detailed steps on how to create a Django project, how to install Angular CLI, and how to generate a new Angular 9 project along with services and components, as we won't cover those basics in this part.

Cloning Angular 9 Django Upload App

If you don't want to follow the steps from the previous part, you first need to get the project we've built. Open a new terminal and run the following command:

$ git clone https://github.com/techiediaries/django-angular-file-upload-example.git

Next, navigate inside the project's folder and install the npm dependencies using the following command:

$ cd django-angular-file-upload-example
$ npm install

Next, start the development server using:

$ ng serve

Your Angular application will be available from the 127.0.0.1:4200 address.

Running the Django 3 Upload Server

Open a new terminal window and create a virtual environment using the following command:

$ python3 -m venv .env

Next, activate the virtual environment using:

$ source .env/bin/activate

Next, navigate to the backend project and install the Python packages using:

$ cd django-angular-file-upload-example/backend
$ pip install -r requirements.txt

Finally, start the development server using:

$ python manage.py runserver

Open your web browser and navigate to the 127.0.0.1:4200/profile page where you can upload image files to the server:

Django REST API File Upload with Angular 7

Adding Multiple File Upload with Angular 9

Now, let's proceed to implement multiple file uploading.

As a reminder, before you can upload files in your django application, you need to set the MEDIA_URL and MEDIA_ROOT in your settings.py file:

MEDIA_URL = '/media/'
MEDIA_ROOT = os.path.join(BASE_DIR, 'media')

Installing ng2-file-upload

We will be using the ng2-file-upload library which provides easy to use directives for working with file upload in Angular 9:

$ npm install --save ng2-file-upload

Importing the File Upload Angular Module

After installing this package, you will need to import FileUploadModule in your application module. Open the src/app/app.module.ts file and make the following changes:

// [...]
import { FileUploadModule } from 'ng2-file-upload';

@NgModule({
  declarations: [AppComponent, ProfileComponent],
  imports: [
    // [...]
    FileUploadModule
  ],
  providers: [],
  bootstrap: [AppComponent]
})
export class AppModule {}

After adding FileUploadModule you'll be able to use the following directives in your templates:

  • The ng2FileDrop directive which will enable you to add an area where users can drag and drop multiple files,
  • The ng2FileSelect directive which will enable you to add an input button for selecting multiple files.

Adding the Upload Input

Open the src/app/profile/profile.component.html file and add the following content:

<h1>Django REST API with Angular 9 File Upload Example</h1>
<div ng2FileDrop
     [ngClass]="{'drop-file-over': hasBaseDropZoneOver}"
     (fileOver)="fileOverBase($event)"
     [uploader]="uploader"
     class="area">
  <div id="dropZone">Drop files here</div>
</div>
<input type="file" ng2FileSelect [uploader]="uploader" multiple />

We add the ng2FileDrop directive to the <div> that represents the drop area and the ng2FileSelect directive to the file input field. We also add the multiple keyword to the file input to allow users to select multiple files.

We also use ngClass to add a dynamic CSS class to the drop area that gets activated when a file is dragged over the area, and we bind it to the hasBaseDropZoneOver variable which we'll define in the component.

We bind the fileOver event to a fileOverBase() method that we'll also define in the component. This will be called when a file is dragged over the dropping area.

We also bind the uploader property to an uploader object that we'll also define in the component. This object is used to track the selected and dropped files that will be uploaded.

Next, we add a button to actually upload the files and a list to show the files that will be uploaded:

<button (click)="upload()">Upload files</button>
<h2>Your files: {{ uploader?.queue?.length }}</h2>
<ul>
  <li *ngFor="let item of uploader.queue">
    {{ item?.file?.name }}
  </li>
</ul>

Next, open the src/app/profile/profile.component.ts file and start by adding the following imports:

// [...]
import { UploadService } from '../upload.service';
import { FileUploader, FileLikeObject } from 'ng2-file-upload';
import { concat } from 'rxjs';

Next, define the following variables:

DJANGO_SERVER = 'http://127.0.0.1:8000';
public uploader: FileUploader = new FileUploader({});
public hasBaseDropZoneOver: boolean = false;

Next, define the fileOverBase() method which gets called when a file is dragged over the drop area:

fileOverBase(event): void {
  this.hasBaseDropZoneOver = event;
}

The event variable equals true when the file is over the base area of the drop area.

Next, define the getFiles() method which returns the array of files in the uploader queue:

getFiles(): FileLikeObject[] {
  return this.uploader.queue.map((fileItem) => {
    return fileItem.file;
  });
}

Adding the Upload Method

Finally, add the upload() method that will be called to actually upload the files to the Django server using HttpClient and FormData:

upload() {
  let files = this.getFiles();
  console.log(files);
  let requests = [];
  files.forEach((file) => {
    let formData = new FormData();
    formData.append('file', file.rawFile, file.name);
    requests.push(this.uploadService.upload(formData));
  });
  concat(...requests).subscribe(
    (res) => {
      console.log(res);
    },
    (err) => {
      console.log(err);
    }
  );
}

We call the getFiles() method to get an array of all the selected and dropped files. Next, we loop over the files array, create a FormData object, and append the current file to it. We then call the upload() method of our UploadService and push the returned Observable to the requests array.

Finally we use the RxJS concat() operator to concatenate all returned Observables and subscribe to each one of them sequentially to send multiple POST requests to the server.

Note: In our example, we created a FormData object for each file in the files array. In theory, we could create just one FormData object, append all the files to it using [] in the key, i.e. formData.append('file[]', file.rawFile, file.name);, and then send only one request to the Django server to upload all the files appended to the FormData object (see FormData.append()), but this doesn't seem to work for us! (Maybe because of TypeScript?)

We'll use the CSS styling from this codepen. Open the src/app/profile/profile.component.css file and add:

.area {
  width: 77%;
  padding: 15px;
  margin: 15px;
  border: 1px solid #333;
  background: rgba(0, 0, 0, 0.7);
}

#dropZone {
  border: 2px dashed #bbb;
  -webkit-border-radius: 5px;
  border-radius: 5px;
  padding: 50px;
  text-align: center;
  font: 21pt bold arial;
  color: #bbb;
}

.drop-file-over {
  background: #333;
}

This is a screenshot of the page after selecting and uploading a bunch of files:

Angular 7 Django Multiple File Upload

Understanding FormData

Typically, when sending data through a form, it will be encoded with the application/x-www-form-urlencoded encoding type, except for when you need to use a file input field (i.e. <input type="file">) in your form; in this case you need to use the multipart/form-data encoding type.

The multipart/form-data can be used to send complex types of data such as files. Data is sent as key/value pairs where each value is associated with a key.

HTML5 provides the FormData interface which is equivalent to using a multipart/form-data form. This interface is useful when you want to send multipart form data with Ajax or HttpClient in case of Angular so instead of creating a form with the multipart/form-data type, we create an instance of FormData and we use the append() method to add key/value pairs.

Conclusion

In this tutorial, we've seen an example of multiple file upload with Angular 9 and Django 3.

PyCharm: PyCharm 2020.1 EAP 3


We have a new Early Access Program (EAP) version of PyCharm that can be now downloaded from our website.

We have concentrated on fixing outstanding issues and making lots of improvements so the final PyCharm 2020.1 will be everything you hoped for. Here is a rundown of some of the things you can expect from this build.

Improved in PyCharm

  • The bug which saw users unable to save all the Live templates that were generated by duplicating and editing existing ones has been resolved.
  • Converting print statements to print() calls now works correctly: when you have multiple print statements one after the other, you can convert them all at once without ending up with a load of redundant import statements to deal with.
  • An error occurring with the Jupyter notebooks has been fixed. Now, if the notebook has been left open the preview won’t be blank when you restart PyCharm.
  • The Enum class no longer gives a false positive “Unexpected argument”.
  • No one wants to take incompatible plugins with them. So “until-build” versions that are out of date can now be deleted from your PyCharm.
  • These are just a select few of the improvements made in this build. We also have a lot of improvements from the JetBrains WebStorm team which will go into the Professional version. For more details on what’s new in this version, see the release notes.

Interested?

Download this EAP from our website. Alternatively, you can use the JetBrains Toolbox App to stay up to date throughout the entire EAP.
If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm EAP and stay up to date. You can find the installation instructions on our website.

Stack Abuse: Selection Sort in Python


Introduction

Sorting, although a basic operation, is one of the most important operations a computer should perform. It is a building block in many other algorithms and procedures, such as searching and merging. Knowing different sorting algorithms could help you better understand the ideas behind the different algorithms, as well as help you come up with better algorithms.

The Selection Sort algorithm sorts an array by finding the minimum value of the unsorted part and then swapping it with the first unsorted element. It is an in-place algorithm, meaning you won't need to allocate additional lists. While slow, it is still used as the main sorting algorithm in systems where memory is limited.

In this article, we will explain how the Selection Sort works and implement it in Python. We will then break down the actions of the algorithm to learn its time complexity.

Selection Sort

So how does the Selection Sort work? Selection Sort breaks the input list into two parts: the sorted part, which is initially empty, and the unsorted part, which initially contains all the elements. The algorithm then selects the minimum value of the unsorted part, swaps it with the first unsorted value, and then grows the sorted part by one.

A high level implementation of this sort would look something like this:

def selection_sort(L):
    for i in range(len(L) - 1):
        # argmin() indexes into the slice, so shift the result back by i
        min_index = argmin(L[i:]) + i
        L[i], L[min_index] = L[min_index], L[i]

In the above pseudocode, argmin() is a function that returns the index of the minimum value. The algorithm uses a variable i to keep track of where the sorted list ends and where the unsorted one begins. Since we start with no sorted items and take the minimum value, it will always be the case that every member of the unsorted part is greater than any member of the sorted part.

The first line loops over i, the second line finds the minimum value's index, and the third line swaps those values. Swapping works because Python evaluates the right-hand side before assigning anything to the left-hand side, so we don't need any temporary variables.
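Since argmin() is not a Python builtin, here is one runnable version of the sketch; note that the index argmin() returns for the slice L[i:] is relative to the slice, so it has to be shifted back by i:

```python
def argmin(seq):
    # Not a builtin: return the index of the smallest element
    return min(range(len(seq)), key=lambda k: seq[k])

def selection_sort(L):
    for i in range(len(L) - 1):
        # argmin() indexes into the slice, so shift it back by i
        min_index = argmin(L[i:]) + i
        L[i], L[min_index] = L[min_index], L[i]

numbers = [3, 5, 1, 2, 4]
selection_sort(numbers)
print(numbers)  # [1, 2, 3, 4, 5]
```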

Let's see how it works in action with a list that contains the following elements: [3, 5, 1, 2, 4].

We begin with the unsorted list:

  • 3 5 1 2 4

The unsorted section has all the elements. We look through each item and determine that 1 is the smallest element. So, we swap 1 with 3:

  • 1 5 3 2 4

Of the remaining unsorted elements, [5, 3, 2, 4], 2 is the lowest number. We now swap 2 with 5:

  • 1 2 3 5 4

This process continues until the list is sorted:

  • 1 2 3 5 4
  • 1 2 3 4 5
  • 1 2 3 4 5

Let's see how we can implement this in Python!

Implementation

The trick to implementing this algorithm is keeping track of the minimum value and swapping two elements of the list. Open a file named sort.py in your favorite editor and enter the following code in it:

def selection_sort(L):
    # i indicates how many items were sorted
    for i in range(len(L)-1):
        # To find the minimum value of the unsorted segment
        # We first assume that the first element is the lowest
        min_index = i
        # We then use j to loop through the remaining elements
        for j in range(i+1, len(L)):
            # Update the min_index if the element at j is lower than it
            if L[j] < L[min_index]:
                min_index = j
        # After finding the lowest item of the unsorted regions, swap with the first unsorted item
        L[i], L[min_index] = L[min_index], L[i]

Now let's add some code to the file to test the algorithm:

L = [3, 1, 41, 59, 26, 53, 59]
print(L)
selection_sort(L)

# Let's see the list after we run the Selection Sort
print(L)

You can then open a terminal and run to see the results:

$ python sort.py
[3, 1, 41, 59, 26, 53, 59]
[1, 3, 26, 41, 53, 59, 59]

The list was correctly sorted! We know how it works and we can implement the Selection Sort in Python. Let's get into some theory and look at its performance with regards to time.

Time Complexity Calculation

So how long does it take for Selection Sort to sort our list? We are going to calculate exactly how much time the selection sort algorithm takes, given an array of size n. The first line of the code is:

def selection_sort(L):

This line shouldn't take much time since it's only setting up the function's stack frame. We say that this is a constant - the size of our input does not change how long it takes for this code to run. Let's say it takes c1 operations to perform this line of code. Next, we have:

for i in range(len(L)-1):

This one is a little trickier. First of all, we have two function invocations, len() and range(), which are performed before the for loop begins. The cost of len() is also independent of size in CPython, which is the default Python implementation on Windows, Linux, and Mac. This is also true for the initialization of range(). Let's call these two together c2.

Next, we have the for, which is running n - 1 times. This is not a constant, the size of the input does make an impact on how long this is executed. So we have to multiply whatever time it takes for one loop to complete by n - 1.

There is a constant cost for evaluating the in operator, let's say c3. That covers the outer for-loop.

The variable assignment is also done in constant time. We'll call this one c4:

min_index = i

We now encounter the inner for-loop. It has two constant function invocations. Let's say they take c5 operations.

Note that c5 is different from c2, because range here has two arguments, and there is an addition operation being performed here.

So far we have c1 + c2 + (n - 1) * (c3 + c4 + c5) operations, and then our inner loop begins. How many times does it run? Well, it's tricky, but if you look closely, it runs n - 1 times in the first iteration of the outer loop, n - 2 times in the second one, and 1 time in the last.

We therefore need to multiply the cost of the inner loop's body by the sum of all numbers between 1 and n - 1. Mathematicians tell us that this sum is (n - 1) * n / 2. Feel free to read more about the sum of the integers between 1 and any positive number x.

The contents of the inner loop are completed in constant time as well. Let's say the in, the if, the assignment statement, and the variable swap together take an arbitrary constant time of c6.

for j in range(i+1, len(L)):
    if L[j] < L[min_index]:
        min_index = j
L[i], L[min_index] = L[min_index], L[i]

All together we get c1 + c2 + (n - 1) * (c3 + c4 + c5) + (n - 1) * n * c6 / 2.

We can simplify this to a * n * n + b * n + c, where a, b and c represent the values of the evaluated constants.

This is known as O(n²). What does that mean? In summary, our algorithm's performance grows with the squared size of our input list. Therefore, if we double the size of our list, the time it takes to sort it is multiplied by 4, and if we divide the size of our input by 3, the time shrinks by a factor of 9!
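One way to see the quadratic growth concretely is to count how many times the inner comparison runs; for a list of size n it runs n * (n - 1) / 2 times, so doubling n roughly quadruples the count:

```python
def count_comparisons(n):
    # Mirror the two loops of selection sort, counting comparisons only
    comparisons = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            comparisons += 1
    return comparisons

print(count_comparisons(10))  # 45
print(count_comparisons(20))  # 190, roughly 4x the count for n = 10
```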

Conclusion

In this article, we looked at how Selection Sort works and implemented it in Python. We then broke the code down line by line to analyze the algorithm's time complexity.

Learning sorting algorithms will help you get a better understanding of algorithms in general. So, in case you haven't already, you can check out our sorting algorithms overview!

Python Circle: Solving python error - TypeError: 'NoneType' object is not iterable

In this article we try to understand what a NoneType object is and why we get the Python error TypeError: 'NoneType' object is not iterable. We also try different ways to handle or avoid this error, including iterating over a None object safely in Python.
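The article body isn't reproduced here, but the error itself is easy to sketch: a function that falls back to returning None (a common source of this bug) and a guard that keeps the loop safe. The find_evens name below is just an illustration:

```python
def find_evens(numbers):
    # Bug-prone pattern: return None instead of an empty list
    evens = [n for n in numbers if n % 2 == 0]
    return evens if evens else None

result = find_evens([1, 3, 5])  # None, since there are no even numbers

# "for n in result" would raise TypeError: 'NoneType' object is not iterable.
# Guarding with "or []" makes the loop safe:
for n in (result or []):
    print(n)
```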

Python Engineering at Microsoft: Python in Visual Studio Code – February 2020 Release


We are happy to announce that the February 2020 release of the Python Extension for Visual Studio Code is now available. You can download the Python extension from the Marketplace, or install it directly from the extension gallery in Visual Studio Code. If you already have the Python extension installed, you can also get the latest update by restarting Visual Studio Code or updating it directly in the Extensions view. You can learn more about Python support in Visual Studio Code in the documentation.

In this release we made improvements that are listed in our changelog, closing a total of 66 issues, including a much faster startup of Jupyter Notebook editor and scaling back of configuration notifications. Keep on reading to learn more!

Jupyter Notebook editor starts up faster

In the January release of the Python extension, we made tremendous improvements towards the performance of the Notebook editor. In this release, we continued that effort to take it even further. In our testing benchmarks, we see an additional 2-3X improvement in speed when starting up the Jupyter server and when opening the Notebook editor. First cell execution is also faster as the Jupyter server now spins up in the background automatically when notebooks are opened.

Scaling Back of Configuration Notifications

Another piece of feedback we often receive is that, when opening a workspace that is already configured for Visual Studio Code without having an interpreter selected, the Python extension was showing a lot of notifications for installation of tools. Previously, the installation would fail because no interpreter was selected in the workspace.

Screenshot of three notification prompts: one for interpreter selection and two for tools installation.

In this release, we scaled back the notification prompts for tools installation. They are now only displayed if an interpreter is selected.

Screenshot of a single notification prompt for interpreter selection.

In case you missed it: Jump to Cursor

Although it’s not part of the new improvements included in this release, the Python debugger supports a feature that doesn’t seem to be widely known: Jump to Cursor.

When you start a debug session and the debugger hits a breakpoint, you can right click on any part of your code – before or after the point where the breakpoint was hit, and select “Jump to Cursor”. This will make the debugger continue its execution from that selected line onward:


So if you want to execute pieces of code that the debugger had already passed through, you don’t need to restart the debug session and wait for the execution to reach that point again. You can simply set it to jump to the line you wish to execute.

Call for action!

We’d love to hear your feedback! Did you know about this feature before this blog post? Do you think its name can be improved to better indicate its behaviour? Let us know on the following GitHub issue: https://github.com/microsoft/vscode-python/issues/9947.

Other Changes and Enhancements

In this release we have also added small enhancements and fixed issues requested by users that should improve your experience working with Python in Visual Studio Code. Some notable changes include:

  • Automatically start the Jupyter server when opening a notebook or the interactive window. (#7232)
  • Don’t display output panel when building workspace symbols. (#9603)
  • Fix to a crash when using pytest to discover doctests with unknown line number. (thanks Olivier Grisel) (#7487)
  • Update Chinese (Traditional) translation. (thanks pan93412) (#9548)

We’re constantly A/B testing new features. If you see something different that was not announced by the team, you may be part of the experiment! To see if you are part of an experiment, you can check the first lines in the Python extension output channel. If you wish to opt-out of A/B testing, you can open the user settings.json file (View > Command Palette… and run Preferences: Open Settings (JSON)) and set the “python.experiments.optOutFrom” setting to [“All”], or to specific experiments you wish to opt out from.

Be sure to download the Python extension for Visual Studio Code now to try out the features above. If you run into any problems, please file an issue on the Python VS Code GitHub page.

 

 

The post Python in Visual Studio Code – February 2020 Release appeared first on Python.

Peter Bengtsson: redirect-chain - Getting a comfortable insight into a URL's redirect history

redirect-chain: A simple cli tool to see the history of redirects of a URL

Quansight Labs Blog: Creating the ultimate terminal experience in Spyder 4 with Spyder-Terminal


The Spyder-Terminal project is revitalized! The new 0.3.0 version adds numerous features that improve the user experience and enhance compatibility with the latest Spyder 4 release, in part thanks to the improvements made in the xterm.js project.

Read more… (3 min remaining to read)


Python Circle: Solving python error - ValueError: invalid literal for int() with base 10

This article explains what ValueError: invalid literal for int() with base 10 is and how to avoid it: what the int() function does, and how to convert a string to an integer in Python.
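The article body isn't included here, but the error is simple to reproduce and to work around; a minimal sketch:

```python
# int() accepts only strings that look like base-10 integers
print(int("42"))  # 42

# Anything else raises ValueError: invalid literal for int() with base 10
for bad in ["12a", "3.14", ""]:
    try:
        int(bad)
    except ValueError as err:
        print(err)

# For numeric strings like "3.14", going through float() first works
print(int(float("3.14")))  # 3
```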

Codementor: How I learned Python

This article shows how you can learn Python to automate your daily activities.

Weekly Python StackOverflow Report: (ccxv) stackoverflow python report


Catalin George Festila: Python 3.7.5 : Use Brython in web development to avoid javascript.

Today's tutorial is about how you can avoid JavaScript and use Python scripts in web development using Brython. Brython's goal is to replace JavaScript with Python as the scripting language for web browsers; see the official webpage. It is necessary to include brython.js and to run the brython() function upon page load using the onload attribute of the BODY tag. You can use python

Python Circle: Hello World in Django 2: How to start with Django 2

In this article, we will see how to start working with Django 2.2: a step-by-step guide to installing Django inside a virtual environment, starting the application on localhost, and writing your first hello world application in Django 2.2.

Python Circle: Getting query params from request in Django

In this article, we will see how to access the query parameters from a request in a Django view: accessing the GET attribute of the request, and the get() vs getlist() methods of request.GET.
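The full article isn't included here, but the get() vs getlist() distinction can be sketched without Django using the standard library's urllib.parse.parse_qs, which parses a query string into a dict of lists (Django's request.GET.getlist() similarly returns every value for a repeated key, while get() returns a single one):

```python
from urllib.parse import parse_qs

# A hypothetical query string with a repeated 'tag' parameter
params = parse_qs("q=django&tag=python&tag=web")

# Every value is kept, like request.GET.getlist('tag') in Django
print(params["tag"])  # ['python', 'web']

# Grabbing a single value, like request.GET.get('q')
print(params.get("q", [""])[0])  # 'django'
```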

Python Circle: How to display flash messages in Django templates

Flash messages in Django templates: displaying one-time notifications, such as success and error messages, using Django's messages framework.

Catalin George Festila: Python 3.7.5 : The httpx python package.

Today I will present a new Python package that can help you in developing web applications. This is the next-generation HTTP client for Python, named httpx. This Python package comes with a nice logo: a butterfly. The official webpage can be found at this webpage. The development team comes with this intro: HTTPX is a fully featured HTTP client for Python 3, which provides sync and async APIs

Erik Marsja: Your Guide to Reading Excel (xlsx) Files in Python


The post Your Guide to Reading Excel (xlsx) Files in Python appeared first on Erik Marsja.

In this brief Python tutorial, we are going to learn how to read Excel (xlsx) files using Python. Specifically, we will read xlsx files in Python using the Python module openpyxl. First, we start with the simplest example of reading an xlsx file in Python. Second, we will learn how to read multiple Excel files using Python.

In previous posts, we have learned how to use Pandas read_excel method to import xlsx files with Python. As previously mentioned, however, we will use another package called openpyxl in this post. In the next paragraph, we will learn how to install openpyxl.

Openpyxl Syntax

Basically, here’s the simplest form of using openpyxl for reading an xlsx file in Python:

import openpyxl
from pathlib import Path

xlsx_file = Path('SimData', 'play_data.xlsx')
wb_obj = openpyxl.load_workbook(xlsx_file) 

# Read the active sheet:
sheet = wb_obj.active

It is, of course, also possible to learn how to read, write, and append to files in Python (e.g., text files). Make sure to check that post out, as well.

Prerequisites: Python and Openpyxl

Now, before we learn what openpyxl is, we need to make sure that we have both Python 3 and the module openpyxl installed. One easy way to install Python is to download a Python distribution such as Anaconda or ActivePython. Openpyxl, on the other hand, can, as with many Python packages, be installed using both pip and conda. Using pip, we type the following in a command prompt, or terminal window: pip install openpyxl. Using conda, we type: conda install openpyxl.

Example file 1 (xlsx)

What is the use of Openpyxl in Python?

Openpyxl is a Python module that can be used for reading and writing Excel (with extension xlsx/xlsm/xltx/xltm) files. Furthermore, this module enables a Python script to modify Excel files. For instance, if we want to go through thousands of rows, but just read certain data points and make small changes to these points, we can do this based on some criteria with openpyxl.

How do I read an Excel (xlsx) File in Python?

Now, the general method for reading xlsx files in Python (with openpyxl) is to import openpyxl (import openpyxl) and then read the workbook: wb = openpyxl.load_workbook(PATH_TO_EXCEL_FILE). In this post, we will learn more about this, of course.


How to Read an Excel (xlsx) File in Python

Now, in this section, we will be reading an xlsx file in Python using openpyxl. In a previous section, we have already been familiarized with the general template (syntax) for reading an Excel file using openpyxl, and we will now get into this module in more detail. Note, we will also work with the Path class from the pathlib module.

1. Import the Needed Modules

In the first step of reading an xlsx file in Python, we need to import the modules we need. That is, we will import Path and openpyxl:

import openpyxl
from pathlib import Path

2. Setting the Path to the Excel (xlsx) File

In the second step, we will create a variable using Path. Furthermore, this variable will point at the location and filename of the Excel file we want to import with Python:

# Setting the path to the xlsx file:
xlsx_file = Path('SimData', 'play_data.xlsx')

Note, “SimData” is a subdirectory of that of the Python script (or notebook). That is, if we were to store the Excel file in a completely different directory, we would need to put in the full path. For example, xlsx_file = Path(Path.home(), 'Documents', 'SimData', 'play_data.xlsx') if the data is stored in the Documents folder in our home directory.

3. Read the Excel File (Workbook)

In the third step, we are going to read the xlsx file. Now, we are using the load_workbook() method:

wb_obj = openpyxl.load_workbook(xlsx_file)
how to read excel in python

4. Read the Active Sheet from the Excel file

Now, in the fourth step, we are going to read the active sheet using the active property:

sheet = wb_obj.active

Note, if we know the sheet name we can also use this to read the sheet we want: play_data = wb_obj['play_data']

5. Work, or Manipulate, the Excel Sheet

In the final, and fifth step, we can work, or manipulate, the Excel sheet we have imported with Python. For example, if we want to get the value from a specific cell we can do as follows:

print(sheet["C2"].value)

Another example, on what we can do with the spreadsheet in Python, is that we can iterate through the rows and print them:

for row in sheet.iter_rows(max_row=6):
    for cell in row:
        print(cell.value, end=" ")
    print()

Note that we used max_row and set it to 6 to print the first 6 rows from the Excel file.

6. Bonus: Determining the Number of Rows and Columns in the Excel File

In the sixth, and bonus step, we are going to find out how many rows and columns we have in the example Excel file we have imported with Python:

print(sheet.max_row, sheet.max_column)

Reading an Excel (xlsx) File into a Python Dictionary

Now, before we learn how to read multiple xlsx files, we are going to import data from Excel into a Python dictionary. It’s quite simple, but for the example below, we need to know the column names before we start. If we want to find out the column names, we can run the following code (or just open the Excel file):

import openpyxl
from pathlib import Path

xlsx_file = Path('SimData', 'play_data.xlsx')
wb_obj = openpyxl.load_workbook(xlsx_file)
sheet = wb_obj.active

col_names = []
for column in sheet.iter_cols(1, sheet.max_column):
    col_names.append(column[0].value)

print(col_names)

Creating a Dictionary from an Excel File

In this section, we will finally read the Excel file using Python and create a dictionary.

data = {}

for i, row in enumerate(sheet.iter_rows(values_only=True)):
    if i == 0:
        data[row[1]] = []
        data[row[2]] = []
        data[row[3]] = []
        data[row[4]] = []
        data[row[5]] = []
        data[row[6]] = []

    else:
        data['Subject ID'].append(row[1])
        data['First Name'].append(row[2])
        data['Day'].append(row[3])
        data['Age'].append(row[4])
        data['RT'].append(row[5])
        data['Gender'].append(row[6])

Now, let’s walk through the code example above. First, we create an empty Python dictionary (data). Second, we loop through each row using iter_rows. Third, inside the loop, an if statement checks whether we are on the first row; if so, we add the keys to the dictionary, i.e., we set the column names as keys. Finally, in the else branch, we append the data to each key (column name).
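If we do not want to hard-code the column names, a more generic sketch builds the dictionary from whatever the header row contains. Note that this example creates a tiny workbook in memory (with made-up names and values) so that it runs without the play_data.xlsx file; with a real file, we would load it with load_workbook as before:

```python
import openpyxl

# Build a tiny workbook in memory so the sketch is self-contained;
# with a real file, replace this with openpyxl.load_workbook(...).
wb = openpyxl.Workbook()
sheet = wb.active
sheet.append(['Subject ID', 'First Name', 'Day', 'Age', 'RT', 'Gender'])
sheet.append([1, 'Ada', 1, 30, 512.3, 'F'])
sheet.append([2, 'Ben', 1, 28, 430.1, 'M'])

rows = sheet.iter_rows(values_only=True)
header = next(rows)                   # the first row holds the column names
data = {col: [] for col in header}

for row in rows:
    for col, value in zip(header, row):
        data[col].append(value)

print(data['First Name'])  # ['Ada', 'Ben']
```

This way, the same loop works for any sheet whose first row contains the column names.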

How to Read Multiple Excel (xlsx) Files in Python

In this section, we will learn how to read multiple xlsx files in Python using openpyxl. In addition to openpyxl, we are also going to work with Path from the pathlib module.

1. Import the Modules

In the first step, we are going to import the modules openpyxl and Path:

import openpyxl
from pathlib import Path

2. Read all xlsx Files in the Directory to a List

Second, we are going to read all the .xlsx files in a subdirectory into a list. Now, we use the rglob method of Path:

xlsx_files = [path for path in Path('XLSX_FILES').rglob('*.xlsx')]

3. Create Workbook Objects (i.e., read the xlsx files)

Third, we can now read all the xlsx files using Python. Again, we will use the load_workbook method. However, this time we will loop through each file we found in the subdirectory:

wbs = [openpyxl.load_workbook(wb) for wb in xlsx_files]

Now, in the code examples above, we are using Python list comprehensions (twice: once in step 2 and once in step 3). First, we create a list of all the xlsx files in the “XLSX_FILES” directory. Second, we loop through this list and create a list of workbooks. Of course, we could combine the two steps into a single line of code.

4. Work with the Imported Excel Files

In the fourth step, we can now work with the imported Excel files. For example, we can get the first file by adding “[0]” to the list. If we want to know the sheet names of this file, we do it like this: wbs[0].sheetnames. That is, many of the things we can do, and have done in the previous example on reading xlsx files in Python, can be done after we’ve read multiple Excel files.
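As a self-contained sketch of this whole workflow (it first writes two throwaway files to a temporary directory, since we cannot assume an “XLSX_FILES” folder exists on your machine), we can loop over the loaded workbooks and inspect each one:

```python
import tempfile
import openpyxl
from pathlib import Path

# Create two small xlsx files in a temporary directory so the example
# runs anywhere; with real data, point Path(...) at your own folder.
tmp_dir = Path(tempfile.mkdtemp())
for name in ('a.xlsx', 'b.xlsx'):
    wb = openpyxl.Workbook()
    wb.active.append(['hello', name])
    wb.save(tmp_dir / name)

# Steps 2 and 3 from above: collect the paths, then load the workbooks
xlsx_files = sorted(tmp_dir.rglob('*.xlsx'))
wbs = [openpyxl.load_workbook(path) for path in xlsx_files]

for path, wb in zip(xlsx_files, wbs):
    print(path.name, wb.sheetnames, wb.active['A1'].value)
```

The file names and cell values here are, of course, just placeholders for your own data.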

Conclusion: Reading Excel (xlsx) Files in Python

In this post, we have learned how to:

  • Read an Excel file in Python using openpyxl
  • Read an xlsx file into a Python dictionary
  • Read multiple Excel files in Python

It is, of course, possible to import data from a range of other file formats. For instance, read the post about parsing JSON files in Python to learn more about reading JSON files.

The post Your Guide to Reading Excel (xlsx) Files in Python appeared first on Erik Marsja.

PyPy Development: PyPy and CFFI have moved to Heptapod

It has been a very busy month, not so much because of deep changes in the JIT of PyPy but more around the development, deployment, and packaging of the project.

 

Hosting

The biggest news is that we have moved the center of our development off Bitbucket and to the new https://foss.heptapod.net/pypy. This is a friendly fork of Gitlab called heptapod that understands Mercurial and is hosted by Clever Cloud. When Atlassian decided to close down Mercurial hosting on bitbucket.org, PyPy debated what to do. Our development model is based on long-lived branches, and we want to keep the ability to immediately see which branch each commit came from. Mercurial has this, git does not (see our FAQ). Octobus, whose business is Mercurial, developed a way to use Mercurial with Gitlab called heptapod. The product is still under development, but quite usable (i.e., it doesn't get in the way). Octobus partnered with Clever Cloud hosting to offer community FOSS projects hosted on Bitbucket who wish to remain with Mercurial a new home. PyPy took them up on the offer, and migrated its repos to https://foss.heptapod.net/pypy. We were very happy with how smooth it was to import the repos to heptapod/GitLab, and are learning the small differences between Bitbucket and GitLab. All the pull requests, issues, and commits kept the same ids, but work is still being done to attribute the issues, pull requests, and comments to the correct users. So from now on, when you want to contribute to PyPy, you do so at the new home.

CFFI, which previously was also hosted on Bitbucket, has joined the PyPy group at https://foss.heptapod.net/pypy/cffi.

 

Website

Secondly, thanks to work by https://baroquesoftware.com/ in leading a redesign and updating the logo, the https://www.pypy.org website has undergone a facelift. It should now be easier to use on small-screen devices. Thanks also to the PSF for hosting the site.

 

Packaging

Also, building PyPy from source takes a fair amount of time. While we provide downloads in the form of tarballs or zipfiles, and some platforms such as Debian and Homebrew provide packages, traditionally the downloads have only worked on a specific flavor of operating system. A few years ago squeaky-pl started providing portable builds. We have adopted that build system for our Linux offerings, so the nightly downloads and release downloads should now work on any glibc platform that has not gone end-of-life. So there goes another excuse not to use PyPy. And the "but does it run scipy" excuse also no longer holds, although "does it speed up scipy" still has the wrong answer. For that we are working on HPy, and will be sprinting soon.
The latest versions of pip, wheel, and setuptools, together with the manylinux2010 standard for Linux wheels and tools such as multibuild or cibuildwheel (well, from the next version), make it easier for library developers to build binary wheels for PyPy. If you are having problems getting going with this, please reach out.

 

Give it a try

Thanks to all the folks who provide the infrastructure PyPy depends on. We hope the new look will encourage more involvement and engagement. Help prove us right!

The PyPy Team

Kushal Das: No summer training 2020


No summer training 2020 for me. Last year’s batch was beyond my capability to handle. Most of the participants did not follow anything we taught in the course; instead, they kept demanding more things.

I have already started receiving mails from a few people who want to join the training in 2020. But, there is no positive answer from my side.

All the course materials are public, the logs are also available. We managed to continue this training for 12 years. This is way more than I could ever imagine.

As I was feeling a bit sad about this, the keynote at Railsconf 2019 from DHH actually helped me a lot to feel better.

Zero-with-Dot (Oleg Żero): Restoring intuition over multi-dimensional space


Introduction

We would not be human if we did not curse things. As beings confined to a three-dimensional world, we tend to blame space whenever we have a problem visualizing data that extends to more than three dimensions. From scientific books and journal papers to simple blog articles and comments, the term “curse of dimensionality” is repeated like a mantra, almost convincing us that any object whose nature extends to something more than just “3D” is out of reach of our brains.

This article is going to discuss neither data visualization nor seek to conform to the common opinion that highly-dimensional space is incomprehensible.

Quite the opposite: highly-dimensional space is not incomprehensible. It is just weird and less intuitive. Fortunately, we can take advantage of some mathematical tools and use them as a “free ticket” to gain more intuition. More precisely, we will present three “routes” we can use to get a better feeling of how things play out in “ND space.”

The space of possibilities

We often hear that one of the possible failures in optimization problems occurs when the optimizer “gets stuck in a local minimum”. Imagining that our task is to minimize a function of one variable only, we can only move in two directions: left or right. If trying to move in any of the directions makes the function increase, we have found ourselves in a local minimum. Unless this is also a global minimum, we are sort of out of luck.

Now, consider adding one more dimension to the space. In a two-dimensional space, even if we hit a local minimum along one of the axes, there is always a chance we can progress along the other. A situation in which the value of a function at a particular point in space reaches an extremum (minimum or maximum) is called a critical point. In case this point is a minimum along one axis, but a maximum along the other, it is called a saddle point.

Figure 1. Examples of a (global) minimum (left), and a saddle point (right).

The saddle points provide a “getaway” direction for the optimizer. While non-existent in one dimension, the chances that any given critical point is a saddle should increase with more dimensions. To illustrate this, let’s consider the so-called Hessian matrix, which is the matrix of second derivatives of f with respect to all of its arguments:

$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$

As the Hessian is a symmetric matrix, we can diagonalize it, obtaining eigenvalues h₁, h₂, …, hₙ.

The condition for the critical point to be a minimum is that the Hessian matrix is positive definite, which means h₁, h₂, …, hₙ > 0.

Assuming that f, being a complicated function, is not biased towards any positive or negative values, we can assume that for any critical point the chance that hᵢ > 0, as well as the chance that hᵢ < 0, is 1/2. Furthermore, if we assume that hᵢ does not depend on any other hⱼ, we can treat these as independent events, in which case:

$P(\text{minimum}) = \prod_{i=1}^{N} P(h_i > 0) = \frac{1}{2^N}$

Similarly, for the maxima:

$P(\text{maximum}) = \prod_{i=1}^{N} P(h_i < 0) = \frac{1}{2^N}$

The chance that our critical point is a saddle point is the chance that it is neither a maximum nor a minimum. Consequently, we can see that:

$P(\text{saddle}) = 1 - P(\text{minimum}) - P(\text{maximum}) = 1 - \frac{2}{2^N} = 1 - 2^{1-N}$

The high-dimensional space seems, therefore, to be the space of possibilities. The higher the number of dimensions, the more likely it feels that thanks to the saddle points, there will be directions for the optimizer to do its job.
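Under the independent-sign assumption above, the chance that a critical point is a saddle races towards 1 as N grows. A minimal sketch makes this concrete:

```python
def p_saddle(n):
    # P(neither all h_i > 0 nor all h_i < 0), assuming the sign of
    # each of the n Hessian eigenvalues is an independent coin flip
    return 1 - 2 * 0.5 ** n

for n in (1, 2, 5, 10):
    print(n, p_saddle(n))
```

For N = 1 a critical point is always a minimum or a maximum (probability 0 of being a saddle), while already at N = 10 the saddle probability exceeds 0.998.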

Of course, as we can find examples of functions that would not conform to this statement, the statement is not a proof. However, if the function f possesses some complicated dependency on its arguments, we could at least expect that the higher the number of dimensions, the more “forgiving” the space would be (on average).

A hyper-ball

A circle, a ball, or a hyper-ball: a mathematical description of any of these objects is simple:

$x_1^2 + x_2^2 + \dots + x_n^2 \le r^2$

All that this equation describes is simply a set of points whose distance from the origin is less than or equal to a constant number r (regardless of the number of dimensions). It can be shown that, for any number of dimensions n, the total volume (or hyper-volume) of this object can be calculated using the following formula:

$V_n(r) = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}\, r^n$

Looking at how it scales with n, we can see that:

$\lim_{n \to \infty} V_n(r) = 0$

An interesting thing happens if we try to plot this for a unit hyper-ball (r = 1), for an arbitrary number of dimensions:

Figure 2. Volume for a hyper-ball of unit radius.
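The curve in Figure 2 can be reproduced with a few lines of Python, using the Gamma function from the standard math module (plotting aside, we just locate the peak):

```python
import math

def ball_volume(n, r=1.0):
    # V_n(r) = pi^(n/2) / Gamma(n/2 + 1) * r^n
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

volumes = {n: ball_volume(n) for n in range(1, 16)}
print(max(volumes, key=volumes.get))  # dimension with the largest unit-ball volume
```

The maximum sits at n = 5, after which the unit-ball volume shrinks towards zero.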

As we can see, for the first few n’s, the volume of the ball increases. However, as soon as n > 5, it quickly drops to a very small number. Can it be true that the unit hyper-ball is losing mass? To see where the mass goes, let’s define a density parameter:

$\rho(\epsilon) = \frac{V_n(r) - V_n(r - \epsilon)}{V_n(r)}$

where epsilon is used to define a “shell” of arbitrary thickness. Again, setting r to 1, and sweeping epsilon from zero to one, we can make an interesting observation:

Figure 3. Concentration of volume of an N-dimensional ball.

As the number of dimensions grows, it turns out that almost all of the mass of the ball is concentrated around its border.

This property of high-dimensional space is especially important if we consider drawing samples from some neighborhood that belongs to that space. In silico and Mark Khoury provide interesting illustrations of what happens when N becomes high. It feels almost as if the dimensionality deflects the space, making the confined points want to escape or push towards the edges, as if they were trying to explore the additional degrees of freedom available to them.
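Because V_n(r) scales as r^n, the fraction of a unit ball’s volume lying within epsilon of the surface reduces to 1 − (1 − ε)^n; a tiny sketch reproduces the behavior shown in Figure 3:

```python
def shell_fraction(n, eps):
    # fraction of a unit n-ball's volume within eps of its surface:
    # 1 - V_n(1 - eps) / V_n(1) = 1 - (1 - eps) ** n
    return 1 - (1 - eps) ** n

for n in (2, 10, 100):
    print(n, shell_fraction(n, 0.1))
```

A shell only 10% thick holds 19% of the volume in 2D, but essentially all of it in 100D.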

A multi-dimensional cake

Let’s consider a cubical cake that we would like to slice N times to create some pieces. If our cake is a three-dimensional cube (we will use K to describe dimensions now), the maximal number of pieces we can divide it into is described by the following sequence:

$C(N) = \binom{N}{0} + \binom{N}{1} + \binom{N}{2} + \binom{N}{3}$

where

$\binom{N}{k} = \frac{N!}{k!\,(N-k)!}$

is a binomial coefficient.

As before, we can extend our mathematics to more dimensions.

Having a multi-dimensional cake of K dimensions and a multi-dimensional knife that slices it using (K−1)-dimensional hyper-planes, the number of “hyper-pieces” is calculated using:

$C(N, K) = \sum_{k=0}^{K} \binom{N}{k} = \sum_{k=0}^{K} \prod_{i=1}^{k} \frac{N + 1 - i}{i}$

where the latter form is completely equivalent, but just easier to compute.
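For a quick sanity check of this formula (assuming Python 3.8+, where math.comb is available), a hypothetical helper computes the same quantity directly:

```python
from math import comb

def cake_number(n_cuts, k_dims):
    # maximum number of pieces produced by n_cuts hyper-plane
    # cuts of a k_dims-dimensional "cake"
    return sum(comb(n_cuts, k) for k in range(k_dims + 1))

print(cake_number(3, 2))  # 7: the "lazy caterer" number for a pancake
print(cake_number(3, 3))  # 8: the cake number for a cube
```

Three straight cuts give at most 7 pieces of a pancake, but 8 pieces of a cube, matching the well-known 2D and 3D sequences.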

Now, let’s see how hyper-pieces count will scale with N and K.

import numpy as np
import pandas as pd
from itertools import product

def cake(n, K):
    c = 0
    for k in range(K + 1):
        temp = [(n + 1 - i) / i for i in range(1, k + 1)]
        c += np.array(temp).prod()
    return c

N = 10  # cuts
K = 10  # dimensions

p = list(product(range(1, N + 1), range(1, K + 1)))
X = pd.DataFrame(pd.DataFrame((p,)).to_numpy().reshape(N, K))
X.index = range(1, N + 1)
X.columns = range(1, K + 1)

C = X.applymap(lambda x: cake(x[0], x[1])).astype(int)
ns = X.applymap(lambda x: x[0])
ks = X.applymap(lambda x: x[1])
C_normed = C / (ns * ks)
Figure 4. The N, K cake number (left): showing the maximum number of slices of a K-dimensional space using N cuts. The normalized cake number C / (N * K) (right) shows how the number of slices scales when adding dimensions or cuts.

Looking at the left figure, we see the direct result of the above formula. Although it looks close to exponential, it is still a polynomial expression. Intuitively, the more dimensions or the more cuts, the more pieces we can get from the cake.

However, if we normalize C by dividing it by the product N * K, we can see that, for a fixed number of cuts, the number of slices no longer increases so fast as dimensions are added. In other words, it seems that the potential of the space to be divided into more unique regions somehow saturates, and that for any number of cuts N, there exists an “optimal” dimension K for which the space “prefers” to be divided.

Considering that, for example, for a dense neural network layer the output state y is obtained by the following vector-matrix multiplication:

$y = g(Wx + b)$

where g is an activation function, and

$W \in \mathbb{R}^{N \times K}, \quad x \in \mathbb{R}^{K},$

both N and K can be manipulated. As we have seen before, increasing either K (aka the number of features of x) or N (aka the number of hyper-planes) leads to the definition of more regions that can contribute to the unique “firing” patterns of the y’s, which are associated with these pieces. The more pieces we have, the better performance we should expect, but at the same time, increasing N and K also means more operations and a larger memory footprint. Therefore, if for a given N the number of slices gained per extra dimension no longer grows, might it be favorable, in terms of resource consumption, to keep the dense layers smaller?

Conclusions

In this article, we have looked into three aspects of the multidimensionality of space. As we couldn’t visualize it (we didn’t even try…), we took advantage of some mathematical mechanisms to gain a bit more insight into the strange behavior of this world. Although not backed by any ultimate proofs, we hope that the mathematical reasoning just presented can spark some inspiration, intuition, and imagination, which is often needed when having to cope with N dimensions.

If you have your own ideas or opinions (or you would like to point out some inconsistency), please share them in the comments below.
