EuroPython: EuroPython 2018: Delaying switch to Late Bird Tickets by one day - please use your coupons today !

July 15, 2018, 4:27 am


Since we still have quite a few people with discount coupons who haven’t bought their tickets yet, we are extending the regular ticket sales by one day.

Switch to Late Bird Tickets on July 17, 00:00 CEST

We will now switch to late bird prices, which are about 30% higher than the regular ones, on Tuesday, July 17.

Issued coupons are not valid for Late Bird Tickets

Please note that the coupons we have issued so far are not valid for the late bird tickets, so if you have a coupon for the conference, please order your tickets before we switch to late bird.

This includes coupons for sponsors, speakers, trainers and also the EPS community discount coupons we have given to several user groups.

Please make sure you use your coupon before the switch on Tuesday, 00:00 CEST.

Enjoy,
–
EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/

↧

Bhishan Bhandari: Idiomatic Python – Looping Approaches

July 15, 2018, 10:54 am


Python has its own unique techniques and guidelines for looping. In this article, I will present a few examples of bad and better approaches to looping. While the end goal can be achieved with either set of code below, the purpose is to highlight the better approaches and encourage their use. Looping over […]

The post Idiomatic Python – Looping Approaches appeared first on The Tara Nights.

↧

Stefan Behnel: A really fast Python web server with Cython

July 15, 2018, 12:42 pm


Shortly after I wrote about speeding up Python web frameworks with Cython, Nexedi posted an article about their attempt to build a fast multicore web server for Python that can compete with the performance of compiled coroutines in the Go language.

Their goal is to use Cython to build a web framework around a fast native web server, and to use Cython's concurrency and coroutine support to gain native performance also in the application code, without sacrificing the readability that Python provides.

Their experiments look very promising so far. They managed to process 10K requests per second concurrently, which actually do real processing work. That is worth noting, because many web server benchmarks out there content themselves with the blank response time for a "hello world", thus ignoring any concurrency overhead etc. For that simple static "Hello world!", they even got 400K requests per second, which shows that this is not a very realistic benchmark. Under load, their system seems to scale pretty linearly with the number of threads, also not a given among web frameworks.

I might personally get involved in further improving Cython for this kind of concurrent, async applications. Stay tuned.

↧

Mike Driscoll: PyDev of the Week: Katharine Jarmul

July 15, 2018, 10:05 pm


This week we welcome Katharine Jarmul (@kjam) as our PyDev of the Week! Katharine is the co-author of Data Wrangling with Python. She is also the co-founder of KIProtect. You can catch up with the projects she works on over on Github. Let’s take some time to get to know her better!

Can you tell us a little about yourself (hobbies, education, etc):

Sure! I first started working on computers building fan websites for house music in the 90s with my dial-up shared Windows 95 computer. Since then, I have had a love / hate relationship with computers and what is now called data science. I have some formal education with math, statistics and computer science, but also learned most of what I do on my own and therefore am proud to count myself a member of the primarily self-taught folks. For fun, I like to cook and eat with friends, read news or arXiv papers and rant with like-minded folks on and offline. (I am @kjam on Twitter…)

Why did you start using Python?

I first started using Python in 2007, when I was working at the Washington Post. A mentor (Ryan O’Neil) took a chance on me after seeing a small application I built using JavaScript. He set up a Linux computer and installed the Django application stack along with it — even gave me a commit key! I can’t tell you how many times I broke the server, but 6 months later I launched my first Django app. I was hooked and wanted to build and do more.

What other programming languages do you know and which is your favorite?

I have dabbled in numerous other languages: C++, Java, Go, even Perl, R, PHP and Ruby. I like Python the best, but that’s probably because I know it the best. I am working more regularly in Go now, which is really fun — but also hard for me to do so much typing. Python as my primary language has definitely spoiled me, and for data science and machine learning, there is a reason it has been so widely adopted.

What projects are you working on now?

I recently announced my new company, KIProtect (https://kiprotect.com). We are building solutions for data privacy and security for data science and machine learning. Essentially, we believe data privacy should be a right for everyone, not just those of us lucky enough to live in Europe. For this reason, we want to democratize data privacy — making it easier for data scientists and engineers everywhere to enable secure and private data sharing. Our first offering is a pseudonymization API which is free for limited usage (and paid for larger use). This allows you to send private data and get back properly pseudonymized data via one API call. We will be offering additional tools, solutions and APIs to help increase security and privacy in the coming year.

Which Python libraries are your favorite (core or 3rd party)?

NumPy is pretty much the best thing ever as someone working in machine learning and data science. It is such a useful library and the optimizations the core developers have made to allow for us to do fast, efficient math in Python (ahem, Cython) are fantastic. I am unsure if we would have things like Pandas, Scikit-Learn, even Keras and TensorFlow if it wasn’t for the steady grounding of NumPy to help foster a real data science community within Python.

How did you end up writing a book on Python?

I was approached by my co-author Jacqueline Kazil shortly after I moved to Europe. Ironically, the week before I turned to my partner and said, “you know, I am finally feeling less burnt out. I wonder what I should do next?” The book seemed like a great opportunity to get started with computers again.

What did you learn from that experience?

Writing a book is really hard. I know everyone says it, but it takes quite a lot out of you; and you are likely never fully satisfied with the outcome. That said, I have heard a lot of nice things from folks who used our book as a welcoming introduction to the world of Python and data — and if I even convert one new Pythonista, I can say I have achieved some impact.

Is there anything else you’d like to say?

Don’t take your website offline to comply with GDPR (the new EU Privacy regulation). It is alarming to me the blanket blocks of European IPs or other ridiculously clueless reactions and takes I have heard from (primarily) US Americans on the regulation.

First off, the regulation is pretty easy to read — so I recommend reading it. If that’s too hard for you, check out our article covering a lot of what you need to know as a data scientist (https://kiprotect.com/blog/gdpr_for_data_science.html) or this article for software engineers (https://www.smashingmagazine.com/2018/02/gdpr-for-web-developers/).

Secondly, think of it first as a user. Wouldn’t you want more say over your data? Don’t you want to know about data breaches? Is it okay for someone to resell your data without telling you? Treat your users how you want to be treated.

Finally, there are tools to help! At KIProtect, we are building several solutions to help make your life easier. There are also many other companies and projects working to help make our software safer for everyone. Don’t treat privacy and security as nice add-ons, treat them as part of your core product. Protect your data, it might be the most valuable thing you create.

Thanks for doing the interview, Katharine!

↧

Will Kahn-Greene: Thoughts on Guido retiring as BDFL of Python

July 16, 2018, 5:00 am


I read the news of Guido van Rossum announcing his retirement as BDFL of Python and it made me a bit sad.

I've been programming in Python for almost 20 years on a myriad of open source projects, tools for personal use, and work. I helped out with several PyCon US conferences and attended several others. I met a lot of amazing people who have influenced me as a person and as a programmer.

I started PyVideo in March 2012. At a PyCon US after that (maybe 2015?), I found myself in an elevator with Guido and somehow we got to talking about PyVideo and he asked point-blank, "Why work on that?" I tried to explain what I was trying to do with it: create an index of conference videos across video sites, improve the meta-data, transcriptions, subtitles, feeds, etc. I remember he patiently listened to me and then said something along the lines of how it was a good thing to work on. I really appreciated that moment of validation. I think about it periodically. It was one of the reasons Sheila and I worked hard to transition PyVideo to a new group after we were burned out.

It wouldn't be an overstatement to say that through programming in Python, I've done some good things and become a better person.

Thank you, Guido, for everything!

↧

Mike Driscoll: ANN: Jupyter Notebook 101 Kickstarter

July 16, 2018, 6:00 am


I am happy to announce my latest Kickstarter which is to raise funds to create a book on Jupyter Notebook!

Jupyter Notebook 101 will teach you all you need to know to create and use Notebooks effectively. You can use Jupyter Notebook to help you learn to code, create presentations, make beautiful documentation and much more!

The Jupyter Notebook is also used by the scientific community to demonstrate research in an easy-to-replicate manner.

You will learn the following in Jupyter Notebook 101:

How to create and edit Notebooks
How to add styling, images, graphs, etc
How to configure Notebooks
How to export your Notebooks to other formats
Notebook extensions
Using Notebooks for presentations
and more!

Release Date

I am planning to release the book in November 2018.

You can learn more on Kickstarter!

↧

Michael Foord: The Role of Abstractions in Software Engineering

July 15, 2018, 5:00 pm


Abstract Representation of a Concrete Apple

This is a video and text of a lightning talk, a five minute presentation, given at PyCon US 2018 in Cleveland. The image is an abstract representation of a concrete apple.

The Role of Software Abstractions Lightning Talk

This is an abstract talk. There isn’t time to give examples but I hope that the application to the day to day challenges of the practise of software engineering is clear. The only theory worth a damn is the theory of the practise. This is a talk about the role of abstractions in software engineering.

Programming is all about the use of abstractions. We often say that the fundamental language spoken by the machine is ones and zeros. Binary. This isn’t true. Ones and zeroes are an abstract representation of the fundamental operation of computers. It’s a way of representing what central processors do in a way that can be understood by people.

The actual language spoken by computers is the electromagnetic dance across wires and etched silicon, choreographed by the beating of a quartz crystal at the heart of the machine.

Ones and zeroes are a representation of that dance, understandable by humans in order for us to reason about the behaviour of the system.

That’s a very low level abstraction. Very close to the actual operation of computers, but very hard to work with. The next step up is assembly language where we use mnemonics, symbolic instructions like JMP for jump, to represent these patterns of ones and zeroes. We can also use human recognisable labels for memory locations instead of numbers and allow the assembler to calculate offsets for us. Much easier.

Next we have languages like C and then right at the very top we have Python where each construct, a print statement for example, may correspond to as many as millions of the lowest level operations.

Computer programming is communication in two directions. Programming provides a language the computer understands, and is able to execute deterministically, whilst also communicating with humans so they can conceptualise the behaviour of the system. A programming language is a set of conceptual tools to facilitate that communication in both directions.

The art and craft of software engineering is taking the conceptual tools that programming languages provide and using them to solve real world problems. This is the difference between science and engineering. Science is the theory, engineering is the application.

In order to be able to do this we have to have an understanding of the problem domain. We conceptualise it. We think about it. Software is easy to understand and maintain when the abstractions you build map well to the problem domain. If the way you think about the problem is close to the way you think about your software then you have to do less mental translation between the problem and your code.

Joel Spolsky talks about the law of leaky abstractions. Any abstraction that maps to lower level operations in the system will leak. At some point something will go wrong and you will only be able to fix it by understanding the lower level operations too.

I’ve heard it said, and it rings true, that a good programmer can hold about ten thousand lines of code in their head. So if your system is less than ten thousand lines of code, even if it’s terrible code, you don’t need to build higher level building blocks to hold it in your head.

An all too common situation is that a system becomes too complex to reason about, so an engineer decides to create abstractions to simplify how they think. So they create black boxes, abstractions, in which to place the complexity. These types of abstractions conceal complexity. So now you don’t have to look at the mess you just made.

You can reason about your system with your abstractions, but in order to understand the actual behaviour (at a lower level) you need to go digging in all that dirt.

Instead of concealing complexity a good abstraction will explain and point you to the lower level operations. Good abstractions simplify and reveal complexity rather than concealing it.

We can also use this kind of reasoning to think about product and system design. What user experience are you providing, what’s the user story? Your users also think about the problem domain using conceptual tools. The closer the abstractions your software presents to your user map to the way they already think about the problem the easier your software will be to use.

And here we come full circle. If the way you build your software maps well to the problem domain then it will be easy to reason about and maintain. If the abstractions you present to the user map well to the problem domain then it will be easier for your users to think within your system and it will be more intuitive to use.

So abstractions matter. They’re the raw stuff of our world.

This post originally appeared on my personal blog Abstractions on Unpolished Musings.

↧

Real Python: Reading and Writing CSV Files in Python

July 16, 2018, 7:00 am


Let’s face it: you need to get information into and out of your programs through more than just the keyboard and console. Exchanging information through text files is a common way to share info between programs. One of the most popular formats for exchanging data is the CSV format. But how do you use it?

Let’s get one thing clear: you don’t have to (and you won’t) build your own CSV parser from scratch. There are several perfectly acceptable libraries you can use. The Python csv library will work for most cases. If your work requires lots of data or numerical analysis, the pandas library has CSV parsing capabilities as well, which should handle the rest.

In this article, you’ll learn how to read, process, and parse CSV from text files using Python. You’ll see how CSV files work, learn the all-important csv library built into Python, and see how CSV parsing works using the pandas library.

So let’s get started!

What Is a CSV File?

A CSV file (Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data. Because it’s a plain text file, it can contain only actual text data—in other words, printable ASCII or Unicode characters.

The structure of a CSV file is given away by its name. Normally, CSV files use a comma to separate each specific data value. Here’s what that structure looks like:

column 1 name,column 2 name,column 3 name
first row data 1,first row data 2,first row data 3
second row data 1,second row data 2,second row data 3
...

Notice how each piece of data is separated by a comma. Normally, the first line identifies each piece of data—in other words, the name of a data column. Every subsequent line after that is actual data and is limited only by file size constraints.

In general, the separator character is called a delimiter, and the comma is not the only one used. Other popular delimiters include the tab (\t), colon (:) and semi-colon (;) characters. Properly parsing a CSV file requires us to know which delimiter is being used.
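When the delimiter is not known in advance, the standard library's csv.Sniffer can often guess it from a sample of the text. The following is a minimal sketch, assuming a small semicolon-delimited sample; the sample data and the set of candidate delimiters are made up for illustration, and sniffing is a heuristic that can guess wrong on unusual data:

```python
import csv
import io

# Hypothetical sample; in practice this would be the first few KB of a file.
sample = (
    "name;department;birthday month\n"
    "John Smith;Accounting;November\n"
    "Erica Meyers;IT;March\n"
)

# Restrict the guess to a few common delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t:")
detected = dialect.delimiter

# The sniffed dialect can be passed straight to csv.reader.
rows = list(csv.reader(io.StringIO(sample), dialect))

print(f"Detected delimiter: {detected!r}")
print(rows[1])
```

If the delimiter is already known, passing it explicitly is more reliable than sniffing.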

Where Do CSV Files Come From?

CSV files are normally created by programs that handle large amounts of data. They are a convenient way to export data from spreadsheets and databases as well as import or use it in other programs. For example, you might export the results of a data mining program to a CSV file and then import that into a spreadsheet to analyze the data, generate graphs for a presentation, or prepare a report for publication.

CSV files are very easy to work with programmatically. Any language that supports text file input and string manipulation (like Python) can work with CSV files directly.
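To illustrate what working with CSV through plain string manipulation might look like, here is a rough sketch using only str.splitlines() and str.split(). The in-memory text is a stand-in for a file's contents, and the approach assumes no field ever contains a comma or a quote:

```python
# Hypothetical in-memory data standing in for a simple CSV file.
text = (
    "name,department,birthday month\n"
    "John Smith,Accounting,November\n"
    "Erica Meyers,IT,March\n"
)

lines = text.splitlines()

# The first line holds the column names; the rest are data rows.
header = lines[0].split(",")
records = [dict(zip(header, line.split(","))) for line in lines[1:]]

for record in records:
    print(record["name"], "->", record["department"])
```

This naive splitting breaks as soon as a field contains the delimiter itself, which is one reason to prefer a dedicated CSV library.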

Parsing CSV Files With Python’s Built-in CSV Library

The csv library provides functionality to both read from and write to CSV files. Designed to work out of the box with Excel-generated CSV files, it is easily adapted to work with a variety of CSV formats. The csv library contains objects and other code to read, write, and process data from and to CSV files.

Reading CSV Files With `csv`

Reading from a CSV file is done using the reader object. The CSV file is opened as a text file with Python’s built-in open() function, which returns a file object. This is then passed to the reader, which does the heavy lifting.

As a reminder, here’s the employee_birthday.txt file:

name,department,birthday month
John Smith,Accounting,November
Erica Meyers,IT,March

Here’s code to read it:

import csv

with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are "{", ".join(row)}"')
            line_count += 1
        else:
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
            line_count += 1
    print(f'Processed {line_count} lines.')

This results in the following output:

Column names are "name, department, birthday month"
    John Smith works in the Accounting department, and was born in November.
    Erica Meyers works in the IT department, and was born in March.
Processed 3 lines.

Each row returned by the reader is a list of String elements containing the data found by removing the delimiters. The first row returned contains the column names, which is handled in a special way.

Reading CSV Files Into a Dictionary With `csv`

Rather than deal with a list of individual string elements, you can read CSV data directly into a dictionary as well (technically, an OrderedDict in Python 3.6 and 3.7, and a plain dict from Python 3.8 onward).

Again, our input file, employee_birthday.txt is as follows:

name,department,birthday month
John Smith,Accounting,November
Erica Meyers,IT,March

Here’s the code to read it in as a dictionary this time:

import csv

with open('employee_birthday.txt', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are "{", ".join(row)}"')
            line_count += 1
        print(f'\t{row["name"]} works in the {row["department"]} department, and was born in {row["birthday month"]}.')
        line_count += 1
    print(f'Processed {line_count} lines.')

This results in the same output as before:

Column names are "name, department, birthday month"
    John Smith works in the Accounting department, and was born in November.
    Erica Meyers works in the IT department, and was born in March.
Processed 3 lines.

Where did the dictionary keys come from? The first line of the CSV file is assumed to contain the keys to use to build the dictionary. If you don’t have these in your CSV file, you should specify your own keys by setting the fieldnames optional parameter to a list containing them.
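As a small sketch of the fieldnames parameter, the example below reads header-less data. The io.StringIO object stands in for an open file, and the key names are assumptions chosen for this illustration:

```python
import csv
import io

# Hypothetical header-less data; io.StringIO stands in for an open file.
data = io.StringIO(
    "John Smith,Accounting,November\n"
    "Erica Meyers,IT,March\n"
)

# These key names are an assumption made for the example.
fieldnames = ["name", "department", "birthday month"]

# With fieldnames given, the first line is treated as data, not as a header.
reader = csv.DictReader(data, fieldnames=fieldnames)
rows = [dict(row) for row in reader]

for row in rows:
    print(f'{row["name"]} works in the {row["department"]} department.')
```

Note that if the file does have a header line and fieldnames is also supplied, that header line comes back as an ordinary data row.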

Optional Python CSV `reader` Parameters

The reader object can handle different styles of CSV files by specifying additional parameters, some of which are shown below:

delimiter specifies the character used to separate each field. The default is the comma (',').
quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote (' " ').
escapechar specifies the character used to escape the delimiter character, in case quotes aren’t used. The default is no escape character.

These parameters deserve some more explanation. Suppose you’re working with the employee_addresses.txt file. Here’s a reminder of how it looks:

name,address,date joined
john smith,1132 Anywhere Lane Hoboken NJ, 07030,Jan 4
erica meyers,1234 Smith Lane Hoboken NJ, 07030,March 2

This CSV file contains three fields: name, address, and date joined, which are delimited by commas. The problem is that the data for the address field also contains a comma to signify the zip code.

There are three different ways to handle this situation:

Use a different delimiter
That way, the comma can safely be used in the data itself. You use the delimiter optional parameter to specify the new delimiter.
Wrap the data in quotes
The special nature of your chosen delimiter is ignored in quoted strings. Therefore, you can specify the character used for quoting with the quotechar optional parameter. As long as that character also doesn’t appear in the data, you’re fine.
Escape the delimiter characters in the data
Escape characters work just as they do in format strings, nullifying the interpretation of the character being escaped (in this case, the delimiter). If an escape character is used, it must be specified using the escapechar optional parameter.
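The three options above can be sketched with csv.reader as follows. Each sample line is a hypothetical re-encoding of the first record from employee_addresses.txt; the pipe delimiter and the backslash escape character are arbitrary choices for the illustration:

```python
import csv
import io

expected = ["john smith", "1132 Anywhere Lane Hoboken NJ, 07030", "Jan 4"]

# 1. Use a different delimiter (a pipe here) so commas are plain data.
piped = "john smith|1132 Anywhere Lane Hoboken NJ, 07030|Jan 4\n"
by_delimiter = next(csv.reader(io.StringIO(piped), delimiter="|"))

# 2. Wrap the field that contains the delimiter in quotes.
quoted = 'john smith,"1132 Anywhere Lane Hoboken NJ, 07030",Jan 4\n'
by_quotes = next(csv.reader(io.StringIO(quoted), quotechar='"'))

# 3. Escape the delimiter inside the data with a backslash.
escaped = "john smith,1132 Anywhere Lane Hoboken NJ\\, 07030,Jan 4\n"
by_escape = next(csv.reader(io.StringIO(escaped), escapechar="\\"))

print(by_delimiter == by_quotes == by_escape == expected)
```

All three approaches require changing how the file is written, so the choice usually depends on what the producing program can emit.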

Writing CSV Files With `csv`

You can also write to a CSV file using a writer object and the .writerow() method:

import csv

with open('employee_file.csv', mode='w') as employee_file:
    employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    employee_writer.writerow(['John Smith', 'Accounting', 'November'])
    employee_writer.writerow(['Erica Meyers', 'IT', 'March'])

The quotechar optional parameter tells the writer which character to use to quote fields when writing. Whether quoting is used or not, however, is determined by the quoting optional parameter:

If quoting is set to csv.QUOTE_MINIMAL, then .writerow() will quote fields only if they contain the delimiter or the quotechar. This is the default case.
If quoting is set to csv.QUOTE_ALL, then .writerow() will quote all fields.
If quoting is set to csv.QUOTE_NONNUMERIC, then .writerow() will quote all fields containing text data and leave numeric fields unquoted. (When the same setting is used with a reader, all unquoted fields are converted to the float data type.)
If quoting is set to csv.QUOTE_NONE, then .writerow() will escape delimiters instead of quoting them. In this case, you also must provide a value for the escapechar optional parameter.
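To compare the quoting modes side by side, the sketch below writes the same row with each setting into an in-memory buffer. The row contents, the render() helper, and the backslash escape character are all hypothetical choices for this illustration:

```python
import csv
import io

# Hypothetical record: two text fields (one containing a comma) and a number.
row = ["John Smith", "Hoboken NJ, 07030", 42]


def render(**kwargs):
    """Write the row to an in-memory buffer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n", **kwargs).writerow(row)
    return buffer.getvalue()


minimal = render(quoting=csv.QUOTE_MINIMAL)
quote_all = render(quoting=csv.QUOTE_ALL)
nonnumeric = render(quoting=csv.QUOTE_NONNUMERIC)
# QUOTE_NONE needs an escapechar because the address contains the delimiter.
no_quotes = render(quoting=csv.QUOTE_NONE, escapechar="\\")

for label, text in [
    ("QUOTE_MINIMAL", minimal),
    ("QUOTE_ALL", quote_all),
    ("QUOTE_NONNUMERIC", nonnumeric),
    ("QUOTE_NONE", no_quotes),
]:
    print(f"{label:<18}{text}", end="")
```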

Reading the file back in plain text shows that the file is created as follows:

John Smith,Accounting,November
Erica Meyers,IT,March

Writing CSV File From a Dictionary With `csv`

Since you can read our data into a dictionary, it’s only fair that you should be able to write it out from a dictionary as well:

import csv

with open('employee_file2.csv', mode='w') as csv_file:
    fieldnames = ['emp_name', 'dept', 'birth_month']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'emp_name': 'John Smith', 'dept': 'Accounting', 'birth_month': 'November'})
    writer.writerow({'emp_name': 'Erica Meyers', 'dept': 'IT', 'birth_month': 'March'})

Unlike DictReader, the fieldnames parameter is required when writing a dictionary. This makes sense, when you think about it: without a list of fieldnames, the DictWriter can’t know which keys to use to retrieve values from your dictionaries. It also uses the keys in fieldnames to write out the first row as column names.
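A short sketch of how fieldnames drives DictWriter, assuming the same column names as above and an in-memory buffer in place of a file. The incomplete and mismatched dictionaries are made up to show the default handling of missing and unexpected keys:

```python
import csv
import io

fieldnames = ["emp_name", "dept", "birth_month"]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()

# Column order follows fieldnames, not the dictionary's key order.
# A missing key ('birth_month') is written as an empty string by default.
writer.writerow({"dept": "IT", "emp_name": "Erica Meyers"})

# A key that is not in fieldnames raises ValueError by default.
try:
    writer.writerow({"emp_name": "John Smith", "salary": 50000})
    rejected = False
except ValueError:
    rejected = True

output = buffer.getvalue()
print(output, end="")
print("Unknown key rejected:", rejected)
```

The restval and extrasaction parameters of DictWriter change these two defaults if a different behavior is needed.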

The code above generates the following output file:

emp_name,dept,birth_month
John Smith,Accounting,November
Erica Meyers,IT,March

Parsing CSV Files With the `pandas` Library

Of course, the Python CSV library isn’t the only game in town. Reading CSV files is possible in pandas as well. It is highly recommended if you have a lot of data to analyze.

pandas is an open-source Python library that provides high performance data analysis tools and easy to use data structures. pandas is available for all Python installations, but it is a key part of the Anaconda distribution and works extremely well in Jupyter notebooks to share data, code, analysis results, visualizations, and narrative text.

Installing pandas and its dependencies in Anaconda is easily done:

$ conda install pandas

As is using pip/pipenv for other Python installations:

$ pip install pandas

We won’t delve into the specifics of how pandas works or how to use it. For an in-depth treatment on using pandas to read and analyze large data sets, check out Shantnu Tiwari’s superb article on working with large Excel files in pandas.

Reading CSV Files With `pandas`

To show some of the power of pandas CSV capabilities, I’ve created a slightly more complicated file to read, called hrdata.csv. It contains data on company employees:

Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8

Reading the CSV into a pandas DataFrame is quick and straightforward:

import pandas
df = pandas.read_csv('hrdata.csv')
print(df)

That’s it: three lines of code, and only one of them is doing the actual work. pandas.read_csv() opens, analyzes, and reads the CSV file provided, and stores the data in a DataFrame. Printing the DataFrame results in the following output:

             Name Hire Date   Salary  Sick Days remaining
0  Graham Chapman  03/15/14  50000.0                   10
1     John Cleese  06/01/15  65000.0                    8
2       Eric Idle  05/12/14  45000.0                   10
3     Terry Jones  11/01/13  70000.0                    3
4   Terry Gilliam  08/12/14  48000.0                    7
5   Michael Palin  05/23/13  66000.0                    8

Here are a few points worth noting:

First, pandas recognized that the first line of the CSV contained column names, and used them automatically. I call this Goodness.
However, pandas is also using zero-based integer indices in the DataFrame. That’s because we didn’t tell it what our index should be.
Further, if you look at the data types of our columns, you’ll see pandas has properly converted the Salary and Sick Days remaining columns to numbers, but the Hire Date column is still a string. This is easily confirmed in interactive mode:
```
>>> print(type(df['Hire Date'][0]))
<class 'str'>
```

Let’s tackle these issues one at a time. To use a different column as the DataFrame index, add the index_col optional parameter:

import pandas
df = pandas.read_csv('hrdata.csv', index_col='Name')
print(df)

Now the Name field is our DataFrame index:

               Hire Date   Salary  Sick Days remaining
Name
Graham Chapman  03/15/14  50000.0                   10
John Cleese     06/01/15  65000.0                    8
Eric Idle       05/12/14  45000.0                   10
Terry Jones     11/01/13  70000.0                    3
Terry Gilliam   08/12/14  48000.0                    7
Michael Palin   05/23/13  66000.0                    8

Next, let’s fix the data type of the Hire Date field. You can force pandas to read data as a date with the parse_dates optional parameter, which is defined as a list of column names to treat as dates:

import pandas
df = pandas.read_csv('hrdata.csv', index_col='Name', parse_dates=['Hire Date'])
print(df)

Notice the difference in the output:

                Hire Date   Salary  Sick Days remaining
Name
Graham Chapman 2014-03-15  50000.0                   10
John Cleese    2015-06-01  65000.0                    8
Eric Idle      2014-05-12  45000.0                   10
Terry Jones    2013-11-01  70000.0                    3
Terry Gilliam  2014-08-12  48000.0                    7
Michael Palin  2013-05-23  66000.0                    8

The date is now formatted properly, which is easily confirmed in interactive mode:

>>> print(type(df['Hire Date'][0]))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

If your CSV file doesn’t have column names in the first line, you can use the names optional parameter to provide a list of column names. You can also use this if you want to override the column names provided in the first line. In this case, you must also tell pandas.read_csv() to ignore existing column names using the header=0 optional parameter:

import pandas
df = pandas.read_csv('hrdata.csv',
            index_col='Employee',
            parse_dates=['Hired'],
            header=0,
            names=['Employee', 'Hired', 'Salary', 'Sick Days'])
print(df)

Notice that, since the column names changed, the columns specified in the index_col and parse_dates optional parameters must also be changed. This now results in the following output:

                    Hired   Salary  Sick Days
Employee
Graham Chapman 2014-03-15  50000.0         10
John Cleese    2015-06-01  65000.0          8
Eric Idle      2014-05-12  45000.0         10
Terry Jones    2013-11-01  70000.0          3
Terry Gilliam  2014-08-12  48000.0          7
Michael Palin  2013-05-23  66000.0          8

Writing CSV Files With `pandas`

Of course, if you can’t get your data out of pandas again, it doesn’t do you much good. Writing a DataFrame to a CSV file is just as easy as reading one in. Let’s write the data with the new column names to a new CSV file:

import pandas
df = pandas.read_csv('hrdata.csv',
                     index_col='Employee',
                     parse_dates=['Hired'],
                     header=0,
                     names=['Employee', 'Hired', 'Salary', 'Sick Days'])
df.to_csv('hrdata_modified.csv')

The only difference between this code and the reading code above is that the print(df) call was replaced with df.to_csv(), providing the file name. The new CSV file looks like this:

Employee,Hired,Salary,Sick Days
Graham Chapman,2014-03-15,50000.0,10
John Cleese,2015-06-01,65000.0,8
Eric Idle,2014-05-12,45000.0,10
Terry Jones,2013-11-01,70000.0,3
Terry Gilliam,2014-08-12,48000.0,7
Michael Palin,2013-05-23,66000.0,8
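For comparison, the same header replacement can be sketched with only the standard-library csv module. This is a minimal illustration, not the article's method: the two sample rows, the ISO date format of the input, and the helper name rename_columns are assumptions made for the example.

```python
import csv
import io
from datetime import datetime

# New column names, matching the names=[...] list used with pandas above.
NEW_NAMES = ["Employee", "Hired", "Salary", "Sick Days"]

# Hypothetical sample input; the real hrdata.csv may use another date format.
SAMPLE = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,2014-03-15,50000.00,10
John Cleese,2015-06-01,65000.00,8
"""


def rename_columns(source_text):
    """Replace the header row and normalise the value types of each record."""
    reader = csv.reader(io.StringIO(source_text))
    next(reader)  # discard the existing header, like header=0 with names=...
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(NEW_NAMES)
    for name, hired, salary, sick_days in reader:
        hired_date = datetime.strptime(hired, "%Y-%m-%d").date()
        writer.writerow([name, hired_date.isoformat(), float(salary), int(sick_days)])
    return out.getvalue()


if __name__ == "__main__":
    print(rename_columns(SAMPLE), end="")
```

Unlike pandas, the csv module leaves every type conversion to the caller, which is why the dates and numbers are parsed by hand here.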

Conclusion

If you understand the basics of reading CSV files, then you won’t ever be caught flat-footed when you need to deal with importing data. Most CSV reading, processing, and writing tasks can be easily handled by Python’s built-in csv library. If you have a lot of data to read and process, the pandas library provides quick and easy CSV handling capabilities as well.

Are there other ways to parse text files? Of course! Libraries like ANTLR, PLY, and PlyPlus can all handle heavy-duty parsing, and if simple string manipulation won’t work, there are always regular expressions.

But those are topics for other articles…

↧

Made With Mu: Mu Release Candidate

July 16, 2018, 10:25 am

≫ Next: Bhishan Bhandari: Python Assignment Expression – PEP 572 – Python3.8

≪ Previous: Real Python: Reading and Writing CSV Files in Python

The release candidate for Mu 1.0.0 is out! This is the last step before the final release of Mu 1.0. Apart from a few minor bug fixes, the biggest change from beta 17 is the inclusion of various translations for the user interface. Full details can be found in the changelog.

Many thanks to the following people for their magnificent work on the following translations:

Chinese (incomplete), by Pydo.
Japanese, by Minoru Inachi.
French, by Gerald Quintana.
Spanish, by Carlos Pereira Atencio.
Portuguese, by Tiago Montes.

I would love to include more translations in the final release, especially if they’re in one of the following languages:

Arabic
German
Greek
Hebrew
Hindi
Italian
Russian

(This list reflects both reach and accessibility of languages so Mu is usable by as many beginner programmers as possible.)

Other highlights include a fix to allow users of Adafruit devices to save a file called code.py. This was getting erroneously caught by the new “shadow module” feature which, in this specific case, doesn’t apply. Zander Brown continues to make extraordinary progress in making the user interface both great to look at and consistent across all platforms. We had quite a bit of feedback from teachers who value such UI consistency: it allows them to create resources that apply to all platforms, thus avoiding all the complications of, “if you’re on <platform>, then this will look different” interruptions in the flow of such resources. Finally, Tim Golden and Jonny Austin have done sterling work testing the various fixes for problematic edge-cases in the new BBC micro:bit flash functionality.

↧

Bhishan Bhandari: Python Assignment Expression – PEP 572 – Python3.8

July 16, 2018, 10:52 am

≫ Next: NumFOCUS: NumFOCUS Projects at SciPy 2018

≪ Previous: Made With Mu: Mu Release Candidate

A recent buzz in the Python community is PEP 572’s acceptance for Python 3.8. PEP stands for Python Enhancement Proposal; each PEP is assigned a number by the PEP editors, and once assigned, that number never changes. What exactly is PEP 572 (directly from PEP 572)? Abstract This is a proposal for creating a way […]

The post Python Assignment Expression – PEP 572 – Python3.8 appeared first on The Tara Nights.

↧

NumFOCUS: NumFOCUS Projects at SciPy 2018

July 16, 2018, 2:17 pm

≫ Next: Continuum Analytics Blog: New Release of Anaconda Enterprise features Expanded GPU and Container Usage

≪ Previous: Bhishan Bhandari: Python Assignment Expression – PEP 572 – Python3.8

The post NumFOCUS Projects at SciPy 2018 appeared first on NumFOCUS.

↧

Continuum Analytics Blog: New Release of Anaconda Enterprise features Expanded GPU and Container Usage

July 16, 2018, 2:28 pm

≫ Next: Test and Code: Preparing for Technical Talks with Kelsey Hightower - bonus episode

≪ Previous: NumFOCUS: NumFOCUS Projects at SciPy 2018

Anaconda, Inc. is thrilled to announce the latest release of Anaconda Enterprise, our popular AI/ML enablement platform for teams at scale. The release of Anaconda Enterprise 5.2 adds capabilities for GPU-accelerated, scalable machine learning and cloud-native model management, giving enterprises the power to respond at the speed required by today’s digital interactions. Anaconda Enterprise—An AI/ML …
The post New Release of Anaconda Enterprise features Expanded GPU and Container Usage appeared first on Anaconda.

↧

Test and Code: Preparing for Technical Talks with Kelsey Hightower - bonus episode

July 16, 2018, 10:45 pm

≫ Next: Matthew Rocklin: Who uses Dask?

≪ Previous: Continuum Analytics Blog: New Release of Anaconda Enterprise features Expanded GPU and Container Usage

After I had wrapped up the interview with Kelsey Hightower for episode 43 (http://testandcode.com/43), I asked him one last question.

You see, I admire his presentation style.
So I asked him if he would share with me how he prepared for his presentations.

His answer is so thoughtful and makes so much sense, I couldn't keep it to myself.

I'm releasing this as a bonus mini-episode so that it's easy to refer back to the next time you or I have a chance to do a technical talk.

Special Guest: Kelsey Hightower.

↧

Matthew Rocklin: Who uses Dask?

July 15, 2018, 5:00 pm

≫ Next: Curtis Miller: Stock Data Analysis with Python (Second Edition)

≪ Previous: Test and Code: Preparing for Technical Talks with Kelsey Hightower - bonus episode

This work is supported by Anaconda Inc

People often ask general questions like “Who uses Dask?” or more specific questions like the following:

For what applications do people use Dask dataframe?
How many machines do people often use with Dask?
How far does Dask scale?
Does Dask get used on imaging data?
Does anyone use Dask with Kubernetes/Yarn/SGE/Mesos/… ?
Does anyone in the insurance industry use Dask?
…

This yields interesting and productive conversations in which new users can dive into historical use cases, which inform whether and how they use the project in the future.

New users can learn a lot from existing users.

To further enable this conversation we’ve made a new tiny project, dask-stories. This is a small documentation page where people can submit how they use Dask and have that published for others to see.

To seed this site six generous users have written down how their group uses Dask. You can read about them here:

We’ve focused on a few questions, available in our template, that emphasize problems over technology and include negative as well as positive feedback to give a complete picture.

Who am I?
What problem am I trying to solve?
How Dask helps?
What pain points did I run into with Dask?
What technology do I use around Dask?

Easy to Contribute

Contributions to this site are simple Markdown documents submitted as pull requests to github.com/dask/dask-stories. The site is then built with ReadTheDocs and updated immediately. We tried to make this as smooth and familiar to our existing userbase as possible.

This is important. Sharing real-world experiences like this is probably more valuable than code contributions to the Dask project at this stage. Dask is more technically mature than it is well-known. Users look to other users to help them understand a project (think of every time you’ve Googled for “some tool in some topic”).

If you use Dask today in an interesting way then please share your story. The world would love to hear your voice.

If you maintain another project you might consider implementing the same model. I hope that this proves successful enough for other projects in the ecosystem to reuse.

↧

Curtis Miller: Stock Data Analysis with Python (Second Edition)

July 17, 2018, 7:00 am

≫ Next: Python Engineering at Microsoft: New web app tutorials in the VS and VS Code Python docs, and docs feedback

≪ Previous: Matthew Rocklin: Who uses Dask?

This is a lecture for MATH 4100/CS 5160: Introduction to Data Science, offered at the University of Utah, introducing time series data analysis applied to finance. This is also an update to my earlier blog posts on the same topic (this one combining them together). I show how to get and visualize stock data in Python, some basic stock analytics, and how to develop a trading system.

↧

Python Engineering at Microsoft: New web app tutorials in the VS and VS Code Python docs, and docs feedback

July 17, 2018, 9:52 am

≫ Next: Chris Warrick: Pipenv: promises a lot, delivers very little

≪ Previous: Curtis Miller: Stock Data Analysis with Python (Second Edition)

This post was written by Kraig Brockschmidt

Recognizing the popularity of the Django and Flask web app frameworks, we recently added several tutorials in the Python documentation that guide you through working with these frameworks in Microsoft’s Python-capable IDEs: the lightweight Visual Studio Code available on all operating systems, and the full Visual Studio for Windows. If you haven’t seen the tutorials yet, this blog post gives you a brief introduction. We also wanted to take the opportunity to highlight how you can contribute to docs, and the ways you can give feedback—both of which we very much welcome!

Flask in Visual Studio Code

First is Using Flask in Visual Studio Code, an already popular walkthrough that starts with creating an environment for Flask and getting a very simple Hello World app up and running. From there it introduces Python debugging and debugger configurations for Flask, using page templates, serving static files, and using template inheritance. The end result is a multi-page app as shown below, which can serve as a starting point for projects of your own. The completed code for the tutorial is available on GitHub.

Django and Flask in Visual Studio

For Visual Studio, we’ve added two series of tutorials for Django and Flask.

Learn Django in Visual Studio is a series of six articles in which you learn how to do the following:

Step 1: Create a basic Django project in a Git repository using the "Blank Django Web Project" template
Step 2: Create a Django app with one page and render that page using a template
Step 3: Serve static files, add pages, and use template inheritance
Step 4: Use the Django Web Project template to create an app with multiple pages and responsive design
Step 5: Authenticate users
Step 6: Use the Polls Django Web Project template to create an app that uses models, database migrations, and customizations to the administrative interface

As you can see, in this tutorial you learn about Django in the context of Visual Studio project templates. The tutorial explains everything that’s happening in the templates, so you can easily adapt the template-generated apps to suit your own needs. The code for the tutorial can of course be found on GitHub.

Learn Flask in Visual Studio, similarly, walks you through the different Flask project templates (code on GitHub):

Step 1: Create a basic Flask project in a Git repository using the "Blank Flask Web Project" template
Step 2: Create a Flask app with one page and render that page using a template
Step 3: Serve static files, add pages, and use template inheritance
Step 4: Use the Flask Web Project template to create an app with multiple pages and responsive design
Step 5: Use the Polls Flask Web Project template to create a polling app that uses a variety of storage options (Azure storage, MongoDB, or memory).

Other Highlights

We’ve also recently added a reference for Python-related item templates in Visual Studio. And if you’re interested in writing C++ modules that you can use from your Python programs, check out Creating a C++ extension for Python.

Contribute to the documentation!

Did you know that all the documentation for Visual Studio and Visual Studio Code is open source?

Visual Studio: https://github.com/MicrosoftDocs/visualstudio-docs
Visual Studio Code: https://github.com/Microsoft/vscode-docs

We’re delighted to receive contributions from the community in both docsets, and actively monitor pull requests. Truly, no contribution is too small. As the content developer who manages these docsets, I very much appreciate anyone who takes the time to fix a typo, correct an error, or otherwise make the docs better! Your contribution doesn’t have to be perfect, either: we’ll take the time to do the necessary editing and formatting.

And did you know you can contribute by simply using the Edit command that appears on each article in the Visual Studio Code docs and in the Visual Studio docs?

Those links take you straight into the GitHub repository for that article, where you can make edits, see the history, and enjoy all the other GitHub goodness.

Give docs feedback: what would you like to see?

We also welcome any doc issues or requests you have for new content. In the Visual Studio docs, the Feedback command that appears alongside Edit takes you to the bottom of the page, where you can create GitHub issues without leaving the docs at all. The Feedback section also shows existing issues and provides a Product Feedback link through which you can file bugs and feature requests for the Python Tools themselves. You can also just create an issue at any time directly in the Visual Studio docs repo.

We’re very responsive to issues and requests. For example, VS docs issue 1086, shown above, identified a detail that was missing from the documentation for the Python Environments window, which we fixed within a day!

For Visual Studio Code, we don’t have integration with GitHub directly in the docs, so just file an issue directly in the repo. For product feedback, use the Request and Report links on the right side of each docs page:

To file issues for the Python Extension for Visual Studio Code itself (not the docs), go to https://github.com/Microsoft/vscode-python/issues.

In closing, it’s worth mentioning that although you can leave feedback through the “Is this page helpful?” controls in the docs, they are anonymous mechanisms that unfortunately don’t provide us any way to respond. We vastly prefer that you use GitHub issues so we can ask for more details and let you know when the issue has been addressed.

Kraig Brockschmidt
Content Developer for Python in Visual Studio and Visual Studio Code

↧

Chris Warrick: Pipenv: promises a lot, delivers very little

July 17, 2018, 10:40 am

≫ Next: Bhishan Bhandari: Zip files using Python

≪ Previous: Python Engineering at Microsoft: New web app tutorials in the VS and VS Code Python docs, and docs feedback

Pipenv is a Python packaging tool that does one thing reasonably well — application dependency management. However, it is also plagued by issues, limitations and a break-neck development process. In the past, Pipenv’s promotional material was highly misleading as to its purpose and backers.

In this post, I will explore the problems with Pipenv. Was it really recommended by Python.org? Can everyone — or at least, the vast majority of people — benefit from it?

“Officially recommended tool”, or how we got here

“Pipenv — the officially recommended Python packaging tool from Python.org, free (as in freedom).”

Pipenv’s README carried a version of the above line for many months: it was added on 2017-08-31 and eventually disappeared on 2018-05-19. For a short while (2018-05-16), it was clarified (managing application dependencies, and PyPA instead of Python.org), and for about 15 minutes, the tagline called Pipenv the world’s worst or something along these lines (this coming from the maintainer).

The README tagline claimed that Pipenv is the be-all, end-all of Python packaging. The problem is: it isn’t that. There are some use cases that benefit from Pipenv, but for many others, trying to use that tool will only lead to frustration. We will explore this issue later.

Another issue with this tagline was the Python.org and official parts. The thing that made it “official” was a short tutorial [1] on packaging.python.org, which is the PyPA’s packaging user guide. Also of note is the Python.org domain used. It makes it sound as if Pipenv was endorsed by the Python core team. PyPA (Python Packaging Authority) is a separate organization — they are responsible for the packaging parts (including pypi.org, setuptools, pip, wheel, virtualenv, etc.) of Python. This made the endorsement misleading. Of course, PyPA is a valued part of the Python world; an endorsement by the core team — say, inclusion in official Python distributions— is something far more important.

This tagline has led to many discussions and flamewars, perhaps with this Reddit thread from May being the most heated and most important. The change was the direct result of this Reddit thread. I recommend reading this thread in full.

What pipenv does

We’ve already learned that Pipenv is used to manage application dependencies. Let’s learn what that term really means.

Application dependencies

Here is an example use case for Pipenv: I’m working on a website based on Django. I create ~/git/website and run pipenv install Django in that directory. Pipenv:

automatically creates a virtualenv somewhere in my home directory
writes a Pipfile, which lists Django as my dependency
installs Django using pip
proceeds to write Pipfile.lock, which stores the exact version and source file hash [2] of each package installed (including pytz, Django’s dependency).

The last part of the process was the most time-consuming. At one point, while locking the dependency versions, Pipenv hung for 46 seconds. This is one of Pipenv’s notable issues: it’s slow. Of course, this isn’t the only one, but it definitely doesn’t help. Losing 46 seconds isn’t much, but when we get to the longer waits in the timing test section later, we’ll see something that could easily discourage users from using a package.

Running scripts (badly)

But let’s continue with our workflow. pipenv run django-admin startproject foobanizer is what I must use now, which is rather unwieldy to type, and requires running pipenv even for the smallest things. (The manage.py script has /usr/bin/env python in its shebang.) I can run pipenv shell to get a new shell which runs the activate script by default, giving you the worst of both worlds when it comes to virtualenv activation: the unwieldiness of a new shell, and the activate script, which the proponents of the shell spawning dislike.

Using pipenv shell means spawning a new subshell, executing the shell startup scripts (eg. .bashrc), and requiring you to exit with exit or ^D. If you type deactivate, you are working with an extra shell, but now outside of the virtualenv. Or you can use the --fancy mode that manipulates $PATH before launching the subshell, but it requires a specific shell configuration, in which $PATH is not overridden in non-login shells — and also often changing the config of your terminal emulator to run a login shell, as many of the Linux terminals don’t do it.

Now, why does all this happen? Because a command cannot manipulate the environment of the shell that spawned it. This means that Pipenv must pretend what it does is a reasonable thing instead of a workaround. This can be solved with manual activation using source $(pipenv --venv)/bin/activate (can be made into a neat alias), or shell wrappers (similar to what virtualenvwrapper does).
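The underlying limitation, that a child process cannot change the environment of the process that launched it, can be demonstrated in a few lines of Python. This is only an illustrative sketch: the variable name DEMO_VENV_FLAG and the helper child_sets_variable are made up for the example and have nothing to do with Pipenv’s own code.

```python
import os
import subprocess
import sys


def child_sets_variable(name="DEMO_VENV_FLAG"):
    """Let a child process set an environment variable, then check the parent."""
    os.environ.pop(name, None)  # start from a clean state in the parent
    child_code = (
        "import os; "
        f"os.environ[{name!r}] = 'activated'; "
        f"print(os.environ[{name!r}])"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        text=True,
        check=True,
    )
    seen_by_child = result.stdout.strip()
    seen_by_parent = os.environ.get(name)  # the child's change never arrives here
    return seen_by_child, seen_by_parent


if __name__ == "__main__":
    print(child_sets_variable())  # ('activated', None)
```

The child sees its own change, but the parent does not. This is why activation has to happen inside the current shell (a sourced script or shell function) or in a freshly spawned subshell.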

Finishing it all up

Anyway, I want a blog on my site. I want to write the posts in Markdown syntax, so I run pipenv install Markdown, and a few long seconds later, it’s added to both Pipfiles. Another thing I can do is pipenv install --dev ipython and get a handy shell for tinkering, but it will be marked as a development dependency, so it won’t be installed in production. That last part is an important advantage of using Pipenv.

When I’m done working on my website, I commit both Pipfiles to my git repository, and push it to the remote server. Then I can clone it to, say, /srv/website. Now I can just pipenv install to get all the production packages installed (but not the development ones — Django, pytz, Markdown will be installed, but IPython and all its million dependencies won’t). There’s just one caveat: by default, the virtualenv will still be created in the current user’s home directory. This is a problem in this case, since it needs to be accessible by nginx and uWSGI, which do not have access to my (or root’s) home directory, and don’t have a home directory of their own. This can be solved with export PIPENV_VENV_IN_PROJECT=1. But note that I will now need to export this environment variable every time I work with the app in /srv via Pipenv. The tool supports loading .env files, but only when running pipenv shell and pipenv run. You can’t use it to configure Pipenv. And to run my app with nginx/uWSGI, I will need to know the exact virtualenv path anyway, since I can’t use pipenv run as part of uWSGI configuration.
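To make the “exact virtualenv path” concrete, here is a small sketch of where the interpreter would live, assuming PIPENV_VENV_IN_PROJECT=1 was set so that the virtualenv sits in a .venv directory inside the project, with the usual POSIX bin/python layout. The helper name in_project_interpreter is hypothetical.

```python
from pathlib import PurePosixPath


def in_project_interpreter(project_dir):
    """Return the interpreter path inside <project_dir>/.venv (POSIX layout assumed)."""
    return PurePosixPath(project_dir) / ".venv" / "bin" / "python"


if __name__ == "__main__":
    # The directory that would have to be referenced from the uWSGI configuration.
    print(in_project_interpreter("/srv/website").parent.parent)  # /srv/website/.venv
    print(in_project_interpreter("/srv/website"))  # /srv/website/.venv/bin/python
```

Without the environment variable, the path contains a generated suffix and has to be looked up with pipenv --venv instead.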

What pipenv doesn’t do

The workflow I mentioned above looks pretty reasonable, right? There are some deficiencies, but other than that, it seems to work well. The main issue with Pipenv is: it works with one workflow, and one workflow only. Try to do anything else, and you end up facing multiple obstacles.

Setup.py, source distributions, and wheels

Pipenv only concerns itself with managing dependencies. It isn’t a packaging tool. If you want your thing up on PyPI, Pipenv won’t help you with anything. You still need to write a setup.py with install_requires, because the Pipfile format only specifies the dependencies and runtime requirements (Python version). There is no place in it for the package name, and Pipenv does not mandate or expect you to install your project. It can come in handy to manage the development environment (as a requirements.txt replacement, or something used to write said file), but if your project has a setup.py, you still need to manually manage install_requires. Pipenv can’t create wheels on its own either. And pip freeze is going to be a lot faster than Pipenv ever will be.
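The manual bookkeeping can be sketched as a small conversion from the [packages] table of a Pipfile to an install_requires list. This is a rough illustration, not a feature of Pipenv: the function name is made up, the input is an already-parsed dictionary, and entries such as Git URLs, local paths, extras, and environment markers are deliberately ignored.

```python
def pipfile_packages_to_install_requires(packages):
    """Turn a parsed Pipfile [packages] table into requirement strings.

    Only plain version specifiers are handled; "*" means "any version".
    """
    requirements = []
    for name, spec in sorted(packages.items()):
        if isinstance(spec, dict):
            # Table-style entries keep the specifier under the "version" key.
            spec = spec.get("version", "*")
        requirements.append(name if spec == "*" else f"{name}{spec}")
    return requirements


if __name__ == "__main__":
    # Hypothetical contents of a [packages] table.
    example = {"Django": "*", "Markdown": ">=2.6", "pytz": {"version": "==2018.5"}}
    print(pipfile_packages_to_install_requires(example))
    # ['Django', 'Markdown>=2.6', 'pytz==2018.5']
```

The resulting list would still have to be pasted into setup.py by hand, or setup.py would have to run such a helper itself. That is exactly the duplication described above.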

Working outside of the project root

Another issue with Pipenv is the use of the working directory to select the virtual environment. [3] Let’s say I’m a library author. A user of my foobar library has just reported a bug and attached a repro.py file that lets me reproduce the issue. I download that file to ~/Downloads on my filesystem. With plain old virtualenv, I can easily confirm the reproduction in a spare shell with:

$ ~/virtualenvs/foobar/bin/python ~/Downloads/repro.py

And then I can launch my fancy IDE to fix the bug. I don’t have to cd into the project. But with Pipenv, I can’t really do that. If I put the virtualenv in .venv with the command line option, I can type ~/git/foobar/.venv/bin/python ~/Downloads/repro.py. If I use the centralized directory + hashes thing, Tab completion becomes mandatory, if I haven’t memorized the hash.

$ cd ~/git/foobar
$ pipenv run python ~/Downloads/repro.py

What if I had two .py files, or repro.py otherwise depended on being in the current working directory?

$ cd ~/git/foobar
$ pipenv shell
(foobar-Mwd1l2m9)$ cd ~/Downloads
(foobar-Mwd1l2m9)$ python repro.py
(foobar-Mwd1l2m9)$ exit  # (not deactivate!)

This is becoming ugly fairly quickly. Also, with virtualenvwrapper, I can do this:

$ cd ~/Downloads
$ workon foobar
(foobar)$ python repro.py
(foobar)$ deactivate

And let’s not forget that Pipenv doesn’t help me to write a setup.py, distribute code, or manage releases. It just manages dependencies. And it does it pretty badly.

Nikola

I’m a co-maintainer of a static site generator, Nikola. As part of this, I have the following places where I need to run Nikola:

~/git/nikola
~/git/nikola-site
~/git/nikola-plugins
~/git/nikola-themes
~/website (this blog)
/Volumes/RAMDisk/n (demo site, used for testing and created when needed, on a RAM disk)

That list is long. End users of Nikola probably don’t have a list that long, but they might just have more than one Nikola site. For me, and for the aforementioned users, Pipenv does not work. To use Pipenv, all those repositories would need to live in one directory. I would also need to have a separate Pipenv environment for nikola-users, because that needs Django. Moreover, the Pipfile would have to be symlinked from ~/git/nikola if we were to make use of those in the project. So, I would have a ~/nikola directory just to make Pipenv happy, do testing/bug reproduction on a SSD (and wear it out faster), and so on… Well, I could also use the virtualenv directly. But in that case, Pipenv loses its usefulness, and makes my workflow more complicated. I can’t use virtualenvwrapper, because I would need to hack a fuzzy matching system onto it, or memorize the random string appended to my virtualenv name. All because Pipenv relies on the current directory too much.

Nikola end users who want to use Pipenv will also have a specific directory structure forced on them. What if the site serves as docs for a project, and lives inside another project’s repo? Two virtualenvs, 100 megabytes wasted. Or worse, Nikola ends up in the other project’s Pipfile, which is technically good for our download stats, but not really good for the other project’s contributors.

The part where I try to measure times

Pipenv is famous for being slow. But how slow is it really? I put it to the test. I used two test environments:

Remote: a DigitalOcean VPS, the cheapest option (1 vCPU), Python 3.6/Fedora 28, in Frankfurt
Local: my 2015 13” MacBook Pro (base model), Python 3.7, on a rather slow Internet connection (10 Mbps on a good day, and the test was not performed on one of them)

Both were running Pipenv 2018.7.1, installed from pip.

And with the following cache setups:

Removed: ~/.cache/pipenv removed
Partial: rm -rf ~/.cache/pipenv/depcache-py*.json ~/.cache/pipenv/hash-cache/
Kept: no changes done from previous run

Well, turns out Pipenv likes doing strange things with caching and locking. A look at the Activity Monitor hinted that there is network activity going on when Pipenv displays its Locking [packages] dependencies... line and hangs. Now, the docs don’t tell you that. The most atrocious example was a local Nikola install that was done in two runs: the first pipenv install Nikola run was interrupted [4] right after it was done installing packages, so the cache had all the necessary wheels in it. The install took 10 minutes and 7 seconds, 9:50 of which were taken by locking dependencies and installing the locked dependencies — so, roughly nine and a half minutes were spent staring at a static screen, with the tool doing something in the background — and Pipenv doesn’t tell you what happens in this phase.

Task	Action	Measurement method	Environment	Cache	Attempt 1 (s)	Attempt 2 (s)	Attempt 3 (s)	Average (s)
1	virtualenv	`time`	Remote	(not applicable)	3.911	4.052	3.914	3.959
2	pip install Nikola	`time`	Remote	Removed	11.562	11.943	11.773	11.759
3	pip install Nikola	`time`	Remote	Kept	7.404	7.681	7.569	7.551
4	pipenv install Nikola	`time`	Remote	Removed	67.536	62.973	71.305	67.271
	├─ locking/installing from lockfile	stopwatch			42.6	40.5	39.6	40.9
	└─ Pipfile.lock install	pipenv			14	14	13	13.667
5	adding Django to an environment	`time`	Remote	Kept (only Nikola in cache)	39.576	—	—	39.576
	├─ locking/installing from lockfile	stopwatch			32	—	—	32
	└─ Pipfile.lock install	pipenv			14	—	—	14
6	adding Django to another environment	`time`	Remote	Kept (both in cache)	37.978	—	—	37.978
	├─ locking/installing from lockfile	stopwatch			30.2	—	—	30.2
	└─ Pipfile.lock install	pipenv			14	—	—	14
7	pipenv install Django	`time`	Remote	Removed	20.612	20.666	20.665	20.648
	├─ locking/installing from lockfile	stopwatch			6.6	6.4	6	6.333
	└─ Pipfile.lock install	pipenv			1	1	1	1
8	pipenv install Django (new env)	`time`	Remote	Kept	17.615	—	—	17.615
	├─ locking/installing from lockfile	stopwatch			3.5	—	—	3.5
	└─ Pipfile.lock install	pipenv			1	—	—	1
9	pipenv install Nikola	`time`	Remote	Partial	61.507	—	—	61.507
	├─ locking/installing from lockfile	stopwatch			38.40	—	—	38.40
	└─ Pipfile.lock install	pipenv			14	—	—	14
10	pipenv install Django	`time`	Local	Removed	73.933	—	—	73.933
	├─ locking/installing from lockfile	stopwatch			46	—	—	46
	└─ Pipfile.lock install	pipenv			0	—	—	0
11	virtualenv	`time`	Local	(not applicable)	5.864	—	—	5.864
12	pip install Nikola (cached)	`time`	Local	Kept	10.951	—	—	10.951
13	pipenv install Nikola	`time`	Local	Partial, after interruption	607.647	(10m 7s)		607.647
	├─ locking/installing from lockfile	stopwatch			590.85	(9m 50s)		590.85
	└─ Pipfile.lock install	pipenv			6			6
14	pipenv install	`time`	Local	Kept	31.399	(L/I: 10.51 s)		31.399
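As a quick sanity check on the table, the averages and the relative slowdown can be recomputed from the individual attempts. The sketch below uses the three attempts of tasks 2 and 4 (Remote environment, cache removed); the constant and function names are made up for the illustration.

```python
from statistics import mean

# Attempt times in seconds, copied from tasks 2 and 4 of the table above.
PIP_INSTALL_NIKOLA = [11.562, 11.943, 11.773]
PIPENV_INSTALL_NIKOLA = [67.536, 62.973, 71.305]


def average(times):
    """Average of the attempts, rounded to milliseconds like the table."""
    return round(mean(times), 3)


def slowdown(slow_times, fast_times):
    """How many times longer the slow tool takes on average."""
    return mean(slow_times) / mean(fast_times)


if __name__ == "__main__":
    print(average(PIP_INSTALL_NIKOLA))     # 11.759
    print(average(PIPENV_INSTALL_NIKOLA))  # 67.271
    print(round(slowdown(PIPENV_INSTALL_NIKOLA, PIP_INSTALL_NIKOLA), 1))  # 5.7
```

On these numbers, pipenv install Nikola takes roughly 5.7 times as long as pip install Nikola in the same environment, and the pip figure does not even include the roughly four seconds needed to create the virtualenv (task 1).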

Alternatives and new tools

Nobody seems to be satisfied with the state of Python packaging. As such, there are many new contenders for the role of “best new packaging tool”. Apart from Pipenv, there are Hatch (by Ofek Lev) and Poetry (by Sébastien Eustace). Both are listed in the “official” tutorial as alternate options.

Hatch

Hatch tries to take care of everything in the packaging process. This is mostly an asset, as it helps replace other tools. However, it can also be argued that it adds a single point of failure. Hatch works on already standard files, such as requirements.txt and setup.py, so it can be replaced with something else quite easily. It doesn’t use as much magic as Pipenv and is more configurable. Some choices made by Hatch are questionable (such as manually parsing pkg/__init__.py for a version number, installing test suites to site-packages (a rather common oversight), or its shell feature which is as ugly as Pipenv’s), and it does not do anything to manage dependencies. It doesn’t necessarily work for the Django use case I mentioned earlier, or for end-users of software.
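To illustrate what “manually parsing pkg/__init__.py for a version number” typically looks like, here is a generic regular-expression sketch. It is not Hatch’s actual implementation; the pattern and the function name find_version are assumptions for the example.

```python
import re

# Matches a top-level assignment such as: __version__ = "1.2.3"
VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def find_version(source):
    """Return the version string found in the module source, or None."""
    match = VERSION_RE.search(source)
    return match.group(1) if match else None


if __name__ == "__main__":
    module_source = '"""Example package."""\n__version__ = "1.2.3"\n'
    print(find_version(module_source))  # 1.2.3
```

The weakness of this approach is that it only works for versions written as a plain string literal. A version computed at import time, or imported from another module, is invisible to it.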

Poetry

Poetry is somewhere in between. Its main aim is close to Pipenv’s, but it also makes it possible to distribute things to PyPI. It tries really hard to hide that it uses pip behind the scenes. Its README comes with an extensive “What about Pipenv?” section, which I recommend reading: it has a few more examples of bad Pipenv features. Poetry claims to use the standardized (PEP 518) pyproject.toml file to replace the usual lot of files. Unfortunately, the only thing that is standardized is the file name and syntax. Poetry uses custom [tool.poetry] sections, which means that one needs Poetry to fully use the packages created with it, leading to vendor lock-in. (The aforementioned Hatch tool also produces a pyproject.toml, which contains a metadata section…) There is a build feature to produce an sdist with setup.py and friends.

In a simple poetry add Nikola test, it took 24.4s/15.1s/15.3s to resolve dependencies (according to Poetry’s own count, Remote environment, caches removed), complete with reassuring output and no quiet lockups. Not as good as pip, but it’s more reasonable than Pipenv. Also, the codebase and its layout are rather convoluted. Poetry produces packages instead of just managing dependencies, so it’s generally more useful than Pipenv.

Pip is here to stay!

But in all the talk about new tools, we’re forgetting about the old ones, and they do their job well — so well in fact, that the new tools still need them under the covers.

Pip is fast. It does its job well enough. It lacks support for splitting packages between production and development (as Pipenv and Poetry do). This means that pip freeze and pip install are instant, at the cost of (a) needing two separate environments, or (b) installing development dependencies in production (which should only be a waste of HDD space and nothing more in a well-architected system).

The virtualenv management features can be provided by virtualenvwrapper. That tool’s main advantage is the shell script implementation, which means that workon foo activates the foo virtualenv without spawning a new subshell (an issue with Pipenv, Hatch, and Poetry that I already covered when describing Pipenv’s operation in the Running scripts (badly) chapter). An argument often raised by Pipenv proponents is that one does not need to concern oneself with creating the virtualenv, and doesn’t need to care where it is. Unfortunately, many tools require this knowledge from their user, or force a specific location, or require it to be different to the home directory.
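
The subshell point comes down to a general property of processes: a child process cannot change the environment of its parent, so a tool that is not a shell function has to start a new shell to "activate" anything. The sketch below demonstrates this with a made-up variable name (DEMO_VENV) and path; both are assumptions for illustration only.

```python
import os
import subprocess
import sys

# Make sure the hypothetical variable is not already set in this process.
os.environ.pop("DEMO_VENV", None)

# The child sets the variable in its own environment and prints it.
child_code = (
    "import os; "
    "os.environ['DEMO_VENV'] = '/tmp/foo'; "
    "print(os.environ['DEMO_VENV'])"
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

child_value = result.stdout.strip()          # what the child saw
parent_value = os.environ.get("DEMO_VENV")   # what the parent sees afterwards

print("child:", child_value)
print("parent:", parent_value)
```

A shell function such as workon runs inside the current shell process, which is why it can modify that shell's environment directly.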

And for a reasonable project template with release automation — well, I have my own entry in that category, called (rather unoriginally) the Python Project Template (PyPT).

Yes, setup.py files are not ideal, since they use .py code and a function execution, making access to meta information hard (./setup.py egg_info creates tool-accessible text files). Their main advantage is that they are the only format that is widely supported — pip is the de-facto default Python package manager (which is pre-installed on Windows and Mac), and other tools would require installation/bootstrapping first.
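
The difficulty of getting metadata out of setup.py without executing it can be sketched with the standard ast module. The sample setup.py contents and the static_setup_kwargs helper are hypothetical; the sketch only recovers keyword arguments written as plain literals.

```python
import ast

# Hypothetical setup.py contents, used only for this demonstration.
SETUP_PY = '''
from setuptools import setup

setup(
    name="example",
    version="0.1.0",
    install_requires=["requests>=2.0"],
)
'''


def static_setup_kwargs(source: str) -> dict:
    """Collect the keyword arguments of the first setup(...) call without running it."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and getattr(node.func, "id", None) == "setup":
            found = {}
            for kw in node.keywords:
                try:
                    found[kw.arg] = ast.literal_eval(kw.value)
                except (ValueError, TypeError):
                    # Value is computed at run time; it cannot be read statically.
                    found[kw.arg] = None
            return found
    return {}


print(static_setup_kwargs(SETUP_PY))
```

As soon as a value is computed (read from a file, built in a loop, imported from the package), this kind of static reading returns nothing useful, and the file has to be executed to learn its metadata.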

The break-neck pace of Pipenv

A good packaging tool is stable. In other words, it doesn’t change often, and it strives to support existing environments. It wouldn’t be fun to re-download everything on your system, because someone decided that /usr is now called /stuff, and all the files in /usr would become forgotten and not removed. Well, this is what Pipenv did:

Date/Time (UTC)	Event
2017-01-31 22:01	v3.2.14 released. `pipenv --three` creates `./.venv` (eg. `~/git/foo/.venv`). Last version with the original behavior of pipenv.
2017-02-01 05:36	v3.3.0 released. `pipenv --three` creates `~/.local/share/virtualenvs/foo` (to be precise, `$WORKON_HOME/foo`).
2017-02-01 06:10	Issue #178 is reported regarding the behavior change.
2017-02-01 06:18	Kenneth Reitz responds: “no plans for making it configurable.” and closes the issue.
2017-02-02 03:05	Kenneth Reitz responds: “added `PIPENV_VENV_IN_PROJECT` mode for classic operation. Not released yet.”
2017-02-02 04:29	v3.3.3 released. The default still uses a “remote” location, but `.venv` can now be used.
2017-03-02 13:48	v3.5.0 released. The new default path is `$WORKON_HOME/foo-HASH`, eg. `~/.local/share/virtualenvs/foo-7pl2iuUI`.

Over the course of a month, the location of the virtualenv changed twice. If the user didn’t read the changelog and didn’t manually intervene (also of note, the option name was mentioned in the issue and in v3.3.4’s changelog), they would have a stale .venv directory, since the new scheme was adopted for them. And then, after switching to v3.5.0, they would have a stale virtualenv hidden somewhere in their home directory, because pipenv decided to add hashes.
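
To see why each scheme change strands the previous environment, here is a sketch of how a path of the `$WORKON_HOME/foo-HASH` form could be derived from the project location. The hashing scheme (SHA-256, URL-safe Base64, first eight characters) and the venv_path function are assumptions for illustration, not taken from Pipenv's source.

```python
import base64
import hashlib
import os


def venv_path(project_dir: str, workon_home: str) -> str:
    """Build a virtualenv path of the form <workon_home>/<project name>-<hash>."""
    name = os.path.basename(os.path.normpath(project_dir))
    # Assumed scheme: hash the full project path so same-named projects do not collide.
    digest = hashlib.sha256(project_dir.encode("utf-8")).digest()
    suffix = base64.urlsafe_b64encode(digest).decode("ascii")[:8]
    return os.path.join(workon_home, f"{name}-{suffix}")


home = os.path.join("~", ".local", "share", "virtualenvs")
print(venv_path("/home/user/git/foo", home))
print(venv_path("/home/user/work/foo", home))
```

Under any such scheme, an environment created at the older `$WORKON_HOME/foo` location is simply never looked up again, which is how the stale directories described above come about.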

Also, this is not configurable. One cannot disable the hashes in paths, even though users wanted to. Such an option would also help people who want to mix Pipenv and virtualenvwrapper.

Pipenv is a very opinionated tool, and if the dev team changes their mind, the old way is not supported.

Pipenv moves fast and doesn’t care if anything breaks. As an example, between 2018-03-13 13:21 and 2018-03-14 13:44 (a little over 24 hours), Pipenv had 10 releases, ranging from v11.6.2 to v11.7.3. The changelog is rather unhelpful when it comes to informing users what happened in each of the releases.

Extra reading:

Kenneth Reitz, A Letter to /r/python (with some notes about bipolar disorder)
Reddit comment threads for the letter: first and second

Conclusion

Pipenv, contrary to popular belief and (now removed) propaganda, is not an officially recommended tool of Python.org. It merely has a tutorial written about it on packaging.python.org (page run by the PyPA).
Pipenv solves one use case reasonably well, but fails at many others, because it forces a particular workflow on its users.
Pipenv does not handle any parts of packaging (cannot produce sdists and wheels). Users who want to upload to PyPI need to manage a setup.py file manually, alongside and independently of Pipenv.
Pipenv produces lockfiles, which are useful for reproducibility, at the cost of installation speed. The speed is a noticeable issue with the tool. pip freeze is good enough for this, even if there are no dependency classes (production vs development) and no hashes (which have minor benefits).
Hatch attempts to replace many packaging tools, but some of its practices and ideas can be questionable.
Poetry supports the same niche Pipenv does, while also adding the ability to create packages and improving over many gripes of Pipenv. A notable issue is the use of a custom all-encompassing file format, which makes switching tools more difficult (vendor lock-in).
Pip, setup.py, and virtualenv (the traditional, tried-and-true tools) are still available and under constant development. Using them can lead to a simpler, better experience. Also of note, tools like virtualenvwrapper can manage virtualenvs better than the aforementioned new Python tools, because they are based on shell scripts (which can modify the environment).

[1]	On a side note, the tutorial explains nothing. A prospective user only learns it’s similar to npm or bundler (what does that mean?), installs one package, and runs a `.py` file through `pipenv run`.

[2]	Note that one can’t change the file on PyPI after uploading it, so this would only be protection against rogue PyPI admins or a MitM attack (in which case you’ve got bigger problems anyways).

[3]	Fortunately, it looks in the parent directories for Pipfiles as well. Otherwise, you might end up with one environment for `foo` and another for `foo/foo` and yet another for `foo/docs` and so on…

[4]	The interruption happened by mistake due to the RAM disk running out of space, but it was actually a good thing to have happened.

↧

Bhishan Bhandari: Zip files using Python

July 17, 2018, 11:57 am

≫ Next: Juan Rodríguez Monti: Big O Algorithm Complexity Cheatsheet for common data structures

≪ Previous: Chris Warrick: Pipenv: promises a lot, delivers very little

Zipping files can be one part of a more complex operation that we perform in a program. This usually happens when you are working on a data pipeline and/or products requiring data movement. Python has easy methods available for zipping files and directories. For the record, a ZIP is an archive file format that supports […]

The post Zip files using Python appeared first on The Tara Nights.

↧

Juan Rodríguez Monti: Big O Algorithm Complexity Cheatsheet for common data structures

July 17, 2018, 4:05 pm

≫ Next: Python Bytes: #87 Guido van Rossum steps down

≪ Previous: Bhishan Bhandari: Zip files using Python

Big O Cheatsheet: complexity and efficiency of stacks, queues, linked lists, doubly linked lists, and more data structures when inserting, deleting and searching. Big O notation is defined by Wikipedia as a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. It is a member of a family of notations invented by Paul Bachmann, Edmund Landau, and others, collectively called Bachmann–Landau notation or asymptotic notation.

↧

Python Bytes: #87 Guido van Rossum steps down

July 17, 2018, 1:00 am

≫ Next: Rene Dudfield: Draft of, ^Let's write a unit test!^

≪ Previous: Juan Rodríguez Monti: Big O Algorithm Complexity Cheatsheet for common data structures

↧

Release Date

What Is a CSV File?

Where Do CSV Files Come From?

Parsing CSV Files With Python’s Built-in CSV Library

Reading CSV Files With csv

Reading CSV Files Into a Dictionary With csv

Optional Python CSV reader Parameters

Writing CSV Files With csv

Writing CSV File From a Dictionary With csv

Parsing CSV Files With the pandas Library

Reading CSV Files With pandas

Writing CSV Files With pandas

Conclusion

Easy to Contribute

Flask in Visual Studio Code

Django and Flask in Visual Studio

Other Highlights

Contribute to the documentation!

Give docs feedback: what would you like to see?
