Channel: Planet Python

PyCharm: PyCharm 2020.1 EAP 6


We have a new Early Access Program (EAP) version of PyCharm that can now be downloaded from our website.

In PyCharm 2020.1 EAP 6, we have been working out some of the kinks to make this release cleaner and more reliable for all our PyCharm users.

Highlights and Fixes

  • Debugging your Python code is essential, so we have resolved the issue that prevented users from using Step Over right after Step Into in the debugger.
  • The issue with breakpoints not being hit in files named protocol.py has been fixed.
  • The Docker Compose interpreter failed in the “Updating skeleton” phase if the Docker container was using a non-root user. This has been resolved, so PyCharm can now build skeletons for a Docker interpreter even when the user doesn’t have root permissions.
  • PyCharm Professional Edition users can now use the new feature introduced by the DataGrip team: Amazon Redshift stored procedures support.
  • From our WebStorm team, for users of Vue: initial support for the Composition API has been added.
  • Plus many other fixes from across the IntelliJ Platform that will make PyCharm 2020.1 that much nicer to work with. You can find the details in the release notes.

Interested?

Download this EAP from our website. Alternatively, you can use the JetBrains Toolbox App to stay up to date throughout the entire EAP.
If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm EAP and stay up to date. You can find the installation instructions on our website.


Matt Layman: Episode 3 - Views On Django

On this episode, we look at views, a major component within Django and a primary place where your code will run. Listen at djangoriffs.com.

Last Episode

On the previous episode, we talked about URLs and how they describe the main interface that a browser can use to interact with your application.

What Is A View?

A view is a chunk of code that receives an HTTP request and returns an HTTP response.

PyCon: March 6 Update on COVID-19

PyCon continues to closely monitor the Coronavirus (also known as COVID-19) situation.

As of March 6, PyCon 2020 in Pittsburgh, Pennsylvania is scheduled to take place.

As of this morning, there have been two presumptive positive cases of COVID-19 in Pennsylvania. These cases are in Wayne and Delaware Counties, in eastern Pennsylvania. They are not near Pittsburgh, which is in western Pennsylvania, some 300 miles (480km) away.

At the time of writing, travel or event restrictions have not been put into place anywhere in Pennsylvania.

Our staff continues to work with sponsors, vendors, and speakers. We are proactively planning safety precautions that will be implemented onsite.

We are also working to ensure that people who cannot travel to PyCon will still be able to meaningfully participate: this includes work to convert one talk, tutorial, and sponsor track to handle remote presentations (either live or recorded) as a backup plan.

It is worth reiterating PyCon’s cancellation policy: PyCon will refund 100% of registration fees for anyone whose travel is impacted by COVID-19, or who has any concerns about traveling, especially if traveling internationally.

If at the time of PyCon you feel sick, or are worried that you might have been in contact with people who have been exposed to COVID-19, we encourage you to stay home. Your registration will be 100% refunded.

If you have any questions about this, please reach out to pycon-reg at python dot org.

PyCon will continue to monitor this situation and we plan to internally reassess regularly. We will publish another update on Friday, March 13, and plan to keep you informed of our plans at least each week as we approach PyCon.

PyCon: March 2 Update on COVID-19

Read our most recent update on COVID-19: https://pycon.blogspot.com/search/label/COVID-19


The coronavirus (also known as COVID-19) is a new virus that causes respiratory illness in people and can spread from person-to-person. Since PyCon US 2020 is scheduled in April, we want to give our community an update on our status and more information about our policy for attendees pertaining to COVID-19.

As of March 2, PyCon 2020 in Pittsburgh, PA is scheduled to happen.

The staff and board directors are watching the situation closely, as it continues to change rapidly. We plan to reassess the situation weekly, and more frequently as we get closer to the event. This includes checking in with our Pittsburgh team for updates, including from vendors and local authorities.

Currently, there have not been any COVID-19 cases in Pennsylvania and conferences continue to happen at the David L. Lawrence Convention Center. On February 28th, the Pennsylvania Department of Health stated: “For the general American public, who are unlikely to be exposed to this virus at this time, the immediate health risk from COVID-19 is considered low”. These evaluations from the CDC and Pennsylvania Department of Health are the basis for our current decision to move forward with PyCon US 2020 as planned.

That said, we understand that the situation varies depending on where attendees live and work. PyCon will refund 100% of registration fees for anyone whose travel is impacted by COVID-19 or who has any concerns about traveling, especially if traveling internationally. If at the time of PyCon you feel sick, or are worried that you might have been in contact with people who have been diagnosed with COVID-19, we encourage you to stay home. Your registration will be 100% refunded. If you have any questions about this, please reach out to pycon-reg at python dot org.


PyCon will continue to monitor this situation and we plan to internally reassess regularly. We will publish another update on Friday, March 6, and plan to keep you informed of our plans at least each week as we approach PyCon.

Roberto Alsina: Episode 31: Modern Python III, Pathlib, the path to happiness


Pathlib is for creating / modifying / manipulating paths. Or, as we used to say, os.path.join(create, modify, manipulate paths) ... joking aside, it is muuuuuch more readable. Let's look at a "real" example of code I see often, and let's stop using os.path.

Pathlib: https://docs.python.org/3/library/pathlib.html
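
The episode's own example is not reproduced in this summary, so the following is only a minimal sketch of the kind of comparison it describes. The directory and file names are made up for illustration.

```python
import os.path
from pathlib import Path

# Hypothetical names, chosen only for this illustration
base = "project"

# The os.path way: nested function calls on plain strings
old_style = os.path.join(base, "data", "report.csv")

# The pathlib way: the / operator joins path components
new_style = Path(base) / "data" / "report.csv"

print(old_style)
print(new_style)

# Path objects expose the pieces of a path as attributes
print(new_style.parent.name)           # name of the containing directory
print(new_style.stem)                  # file name without the extension
print(new_style.suffix)                # the extension, including the dot
print(new_style.with_suffix(".txt"))   # same path with a different extension
```

Both approaches build the same path; the pathlib version keeps the result as an object that can be queried and modified without further string handling.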

BreadcrumbsCollector: Implementing the Clean Architecture with Python – my book is here!

Roberto Alsina: Episode 33: Behind the scenes: Making videos.

Erik Marsja: How to Convert a Pandas DataFrame to a NumPy Array


The post How to Convert a Pandas DataFrame to a NumPy Array appeared first on Erik Marsja.

In this short Python Pandas tutorial, we will learn how to convert a Pandas dataframe to a NumPy array. Specifically, we will learn how easy it is to transform a dataframe to an array using the values attribute and the to_numpy method, respectively. Furthermore, we will also learn how to import data from an Excel file and change this data to an array.

Now, if we want to carry out some high-level mathematical functions using the NumPy package, we may need to change the dataframe to a 2-d NumPy array.

Prerequisites

Now, if we want to convert a Pandas dataframe to a NumPy array we need to have Python, Pandas, and NumPy installed, of course. Check the post about how to install Python packages to learn more about the installation of packages. It is recommended, however, that we install Python packages in a virtual environment. Finally, if we download and install a scientific Python distribution, we will get everything we need. Nice and easy!

How do you convert a DataFrame to an array in Python?

Now, to convert a Pandas DataFrame into a NumPy array we can use the values attribute (DataFrame.values). For instance, if we want to convert our dataframe called df we can add this code: np_array = df.values.


Convert a Pandas Dataframe to a Numpy Array Example 1:

In this section, we are going to go through three easy steps to convert a dataframe into an array.

Step #1: Import the Python Libraries

In the first example of how to convert a dataframe to an array, we will create a dataframe from a Python dictionary. The first step, however, is to import the Python libraries we need:

import pandas as pd
import numpy as np

Step #2: Get your Data into a Pandas Dataframe

In the second step, we will create the Python dictionary and convert it to a Pandas dataframe:

data = {'Rank':[1, 2, 3, 4, 5, 6],
       'Language': ['Python', 'Java',
                   'Javascript',
                   'C#', 'PHP',
                   'C/C++'],
       'Share':[29.88, 19.05, 8.17,
               7.3, 6.15, 5.92],
       'Trend':[4.1, -1.8, 0.1, -0.1, -1.0, -0.2]}

df = pd.DataFrame(data)

display(df)

Check the post about how to convert a dictionary to a Pandas dataframe for more information on creating dataframes from dictionaries.

Step #3: Convert the Dataframe to an Array

Finally, in the third step, we are ready to use the values attribute to convert the dataframe to a NumPy array:

df.values
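
As a small illustration of what values returns, the sketch below rebuilds a three-row version of the dataframe (shortened only for brevity). Because the columns mix integers, strings, and floats, the resulting array is expected to fall back to the generic object dtype.

```python
import numpy as np
import pandas as pd

# Shortened version of the tutorial's data, for illustration only
df = pd.DataFrame({
    'Rank': [1, 2, 3],
    'Language': ['Python', 'Java', 'Javascript'],
    'Share': [29.88, 19.05, 8.17],
})

np_array = df.values

print(type(np_array))   # a NumPy ndarray
print(np_array.shape)   # one row per dataframe row, one column per dataframe column
print(np_array.dtype)   # mixed column types are stored as generic Python objects
```

This is one reason to select only the numeric columns before converting when the goal is numerical work.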

How to Change a Dataframe to a Numpy Array Example 2:

In the second example, we are going to convert a Pandas dataframe to a NumPy array using the to_numpy() method. Now, the to_numpy() method is as simple to use as the values attribute. However, this method can also take parameters.

Now, here’s a simple conversion example, generating the same NumPy array as in the previous example:

df.to_numpy()

If we want to convert just one column, we can select it first and then, optionally, use the dtype parameter to set the data type of the result. For instance, here we will convert one column of the dataframe (i.e., Share) to a NumPy array with the NumPy float64 data type:

df['Share'].to_numpy(np.float64)
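
The parameters of to_numpy() can be tried out on a small frame. The following is a minimal sketch using a shortened, two-column version of the data; it shows the dtype parameter and the copy parameter, which requests an array that is independent of the dataframe.

```python
import numpy as np
import pandas as pd

# Shortened, numeric-only version of the tutorial's data
df = pd.DataFrame({'Share': [29.88, 19.05, 8.17],
                   'Trend': [4.1, -1.8, 0.1]})

# dtype controls the data type of the resulting array
share32 = df['Share'].to_numpy(dtype=np.float32)
print(share32.dtype)

# copy=True asks for an array that does not share memory with the dataframe
both = df[['Share', 'Trend']].to_numpy(copy=True)
both[0, 0] = 0.0            # modifying the copy ...
print(df.loc[0, 'Share'])   # ... leaves the dataframe untouched
```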

Convert a Dataframe to a NumPy Array Example 3:

Now, if we only want the numeric values from the dataframe to be converted to a NumPy array, that is also possible. Here, we need to use the select_dtypes method.

df.select_dtypes(include=float).to_numpy()

Note, when selecting the columns with float values we passed float to the include parameter. If we, on the other hand, want to select the columns with integers we could use int.
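
As a complement, the sketch below shows two other selections on a shortened version of the data: 'number', which picks up integer and float columns together, and 'int64', which is what Pandas is assumed to use here for a column of plain Python integers.

```python
import pandas as pd

# Shortened version of the tutorial's data, for illustration only
df = pd.DataFrame({
    'Rank': [1, 2, 3],
    'Language': ['Python', 'Java', 'Javascript'],
    'Share': [29.88, 19.05, 8.17],
    'Trend': [4.1, -1.8, 0.1],
})

# 'number' selects every numeric column, integers and floats alike
numeric = df.select_dtypes(include='number')
print(list(numeric.columns))

# A dtype name selects only columns of exactly that type
integers = df.select_dtypes(include='int64')
print(list(integers.columns))

# Converting the numeric selection gives a purely numeric 2-d array
numeric_array = numeric.to_numpy()
print(numeric_array.shape)
```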

Read an Excel File to a Dataframe and Convert it to a NumPy Array Example 4:

Now, of course, many times we have the data stored in a file. For instance, we may want to read the data from an Excel file using Pandas and then transform it into a NumPy 2-d array. Here’s a quick example using Pandas to read an Excel file:

df = pd.read_excel('http://open.nasa.gov/datasets/NASA_Labs_Facilities.xlsx',
                  skiprows=1)

df.iloc[0:5, 0:5]

Now, in the code above, we read an Excel (.xlsx) file from a URL. Here, the skiprows parameter was used to skip the first, empty row. Moreover, we used Pandas iloc to slice the first five rows and columns of this df and print them.

In the last example we will, again, use df.to_numpy() to convert the dataframe to a NumPy array:

np_array = df.to_numpy()

Summary Statistics of NumPy Array

In this last section, we are going to convert a dataframe to a NumPy array and use some of the methods of the array object.

data = {'Rank':[1, 2, 3, 4, 5, 6],
       'Language': ['Python', 'Java',
                   'Javascript',
                   'C#', 'PHP',
                   'C/C++'],
       'Share':[29.88, 19.05, 8.17,
               7.3, 6.15, 5.92],
       'Trend':[4.1, -1.8, 0.1, -0.1, -1.0, -0.2]}

df = pd.DataFrame(data)

np_array = df.select_dtypes(include=float).to_numpy()

First, we are going to sum each of the two columns using the sum() method.

np_array.sum(axis=0)

Second, we can calculate the mean value of each of the two columns using the mean() method:

np_array.mean(axis=0)

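Note that we used the axis parameter and set it to 0, which makes the calculation run down the rows and return one value per column. If we had set it to 1 instead, we would have calculated it along each row, so to speak, of the array. If we had left it out, we would have got a single value for the whole array.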
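
The effect of the axis parameter is easiest to see on a tiny array. The values below are made up so that the results can be checked by hand.

```python
import numpy as np

# Made-up 3x2 array: three rows, two columns
small = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

col_sums = small.sum(axis=0)    # one value per column: 1+3+5 and 2+4+6
row_sums = small.sum(axis=1)    # one value per row: 1+2, 3+4, 5+6
total = small.sum()             # no axis: a single value for the whole array
col_means = small.mean(axis=0)  # mean of each column

print(col_sums, row_sums, total, col_means)
```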

Conclusion

In this Pandas dataframe tutorial, we have learned how to convert Pandas dataframes to NumPy arrays. It was an easy task and we learned how to do this using values and to_numpy.



PyCoder’s Weekly: Issue #410 (March 3, 2020)


#410 – MARCH 3, 2020


Advanced Usage of Python Requests

“While it’s easy to immediately be productive with requests because of the simple API, the library also offers extensibility for advanced use cases. If you’re writing an API-heavy client or a web scraper you’ll probably need tolerance for network failures, helpful debugging traces and syntactic sugar.”
DANI HODOVIC

EOF Is Not a Character

Do you know how an application knows when a read operation reaches the end of a file? In this interesting read, explore what EOF (end-of-file) really is by writing your own version of the Linux cat command in ANSI C, Python, Go, and JavaScript.
RUSLAN SPIVAK

Automate & Standardize Code Reviews for Python


Take the hassle out of code reviews - Codacy flags errors automatically, directly from your Git workflow. Customize standards on coverage, duplication, complexity & style violations. Use in the cloud or on your servers for 30 different languages. Get started for free →
CODACY sponsor

Double-Checked Locking With Django ORM

The double-checked locking pattern is useful when you need to restrict access to a certain resource to stop simultaneous processes from working on it at the same time. Learn how to apply this pattern in Django using the ORM and database-level locking features.
LUKE PLANT

Python Bindings: Calling C or C++ From Python

What are Python bindings? Should you use ctypes, CFFI, or a different tool? In this step-by-step tutorial, you’ll get an overview of some of the options you can use to call C or C++ code from Python.
REAL PYTHON

PyPy Status Blog: PyPy and CFFI Have Moved to Heptapod

PyPy has moved the center of their development off Bitbucket and to the new foss.heptapod.net/pypy
MOREPYPY.BLOGSPOT.COM

PyCon 2020: March 2 Update on COVID-19

“As of March 2, PyCon 2020 in Pittsburgh, PA is scheduled to happen.”
PYCON.BLOGSPOT.COM

Discussions

Python Jobs

Senior Python/Django Software Engineer (London, UK)

Zego

Python Developer (Malta)

Gaming Innovation Group

Senior Python Software Engineer (London, UK)

Tessian

Sr Software Engineer Backend (Denver, CO, USA)

CyberGRX

Senior Software Developer (Vancouver, BC, Canada)

AbCellera

More Python Jobs >>>

Articles & Tutorials

Packaging and Distributing cppyy-Generated Python Bindings for C++ Projects With CMake and Setuptools

“I rewrote the cppyy CMake modules to be much more user friendly and to work using only Anaconda/PyPI packages, and to generate more feature-complete and customizable Python packages using CMake’s configure_file, while also supporting distribution of cppyy pythonization functions.”
CAMILLE SCOTT

Polynomial Regression From Scratch in Python

Polynomial regression is a core concept underlying machine learning. Learn how to build a polynomial regression model from scratch in Python by working through a real-world example to predict salaries based on job position.
RICK WIERENGA

How To Build A Digital Virtual Assistant In Python


The rise of AI has resulted in rapid growth of the digital assistant market, including Siri and Alexa. With Python, it’s easy to code your own digital assistant with voice activation and responses to basic inquiries. Check out ActiveState’s tutorial to learn how →
ACTIVESTATE sponsor

How to Implement a Python Stack

Learn how to implement a stack data structure in Python. You’ll see how to recognize when a stack is a good choice for data structures, how to decide which implementation is best for a program, and what extra considerations to make about stacks in a threading or multiprocessing environment.
REAL PYTHON video

Pass the Python Thread State Explicitly

Eric Snow has been working on solving multi-core Python via subinterpreters since 2015. In this article, core developer Victor Stinner discusses how state is passed between interpreters and summarizes his proposal for explicitly passing state to internal C function calls.
VICTOR STINNER

nbdev: Use Jupyter Notebooks for Everything

nbdev is a Python programming environment that allows you to create complete Python packages, including tests and a rich documentation system, all in Jupyter Notebooks.
JEREMY HOWARD

Dealing With Legacy Code

Learn about some of the common problems you encounter when dealing with legacy codebases and how to overcome them in an efficient way that balances delivery with code quality.
ISHA TRIPATHI

Totally Ordered Enums in Python With ordered_enum

Python’s enum.Enum does not provide ordering by default. See how ordering can be added to enums and why these orderings are useful in the first place.
WILLIAM WOODRUFF

Conditional Coverage

Sometimes your code has to take different paths based on the external environment. Make sure that your coverage follows it smoothly.
NIKITA SOBOLEV • Shared by sobolevn

Deploying Machine Learning Models: gRPC and TensorFlow Serving

Learn how to deploy TensorFlow models and consume predictions via gRPC.
RUBIKSCODE.NET

Blackfire Profiler Public Beta Open—Get Started in Minutes

Blackfire Profiler now supports Python, through a Public Beta. Profile Python code with Blackfire’s intuitive developer experience and appealing user interface. Spot bottlenecks in your code, and compare code iterations profiles.
BLACKFIRE sponsor

Projects & Code

Events

PyTexas 2020

May 16 to 17, 2020 in Austin, TX
PYTEXAS.ORG


Happy Pythoning!
This was PyCoder’s Weekly Issue #410.

Talk Python to Me: #254 A Python mentorship story

How do you go from poking around at Python code to actually solving real problems, the right way?

There are many paths. The longest one probably is to get a 4-year CS degree. Maybe faster, but pricy as well, is a solid in-person developer bootcamp.

Have you considered reaching out to the community to find a mentor? Many Python meetups have project nights where folks who could help will be attending. If you're up for giving back, maybe you could become a mentor too.

That's what this episode is about. We'll hear from two former guests of Talk Python, Rusti Gregory and Doug Farrell. They teamed up and are back to share their mentorship story!

Links from the show

Guests

Rusti Gregory: https://talkpython.fm/episodes/show/194/learning-and-teaching-python-in-a-vacuum
Doug Farrell: @writeson (https://twitter.com/writeson)

Doug's Real Python articles: https://realpython.com/team/dfarrell/
Code Mentor Program: https://www.codementor.io/howitworks/mentorship
D-Tale Project: https://github.com/man-group/dtale
Let Me Google That For You Example: https://lmgtfy.com/?q=connect+to+sqlite+sqlalchemy
JustPy Web Project: https://justpy.io/#/
Doug's Well-Grounded Python Dev Book: https://www.manning.com/books/the-well-grounded-python-developer

Sponsors

Brilliant: https://talkpython.fm/brilliant
Linode: https://talkpython.fm/linode
Talk Python Training: https://talkpython.fm/training

Codementor: The Zen Of Python Is A Joke And Here Is Why

The zen of python needs more meditation

Codementor: Using Python Functions As Classes

Weekly Python StackOverflow Report: (ccxviii) stackoverflow python report



EuroPython: EuroPython 2020: Launching the conference website


We are very excited to announce the launch of our website for EuroPython 2020:


EuroPython 2020 Website

Our web WG worked hard on putting the finishing touches on the website and many other team members helped update the content.

We have ported the accounts from last year to the new website, so you should be able to log in with last year’s details. That said, we’d recommend changing your password as best practice.

Please note that we have also updated the profile page, so after login you will be redirected to the profile page to make any necessary adjustments.

More updates:

  • The CFP will launch as planned on Monday, March 9th.
  • We are also considering opening early bird sales on Wednesday, March 11 at 12:00 CET. However, since we’re still waiting for the VAT ID registration, we won’t be able to produce invoices yet. Those will get delivered later when we have the VAT ID, much like in Edinburgh, where we had similar delays.
  • Ticket prices are already available on the registration page. Unlike in previous years, we are publishing all prices at once, so that you can get a better overview.
  • As you probably know, the coronavirus has hit Europe and we are closely monitoring the situation. We will publish separate blog posts on this topic. So far, we are hopeful that the situation will have calmed down by July.

Enjoy,

EuroPython 2020 Team
https://ep2020.europython.eu/
https://www.europython-society.org/


Codementor: Getting started with Flask

Getting Started With Flask, A Python Microframework

Full Stack Python: The Best Resources for Developers to Learn Finance


Software developers should understand the basics of finance not only to manage their own money but also to understand how businesses' software projects are funded.

Understanding how other people who work in accounting, finance and project management think about business and finance in particular can help you make better architectural decisions when trying to build maintainable systems. Code is only one aspect of a large software project so working with others and viewing the world through their discipline will help you immensely as you advance your career.

Newsletters & Podcasts on Finance

The fastest way to take a first step in improving your financial literacy is to subscribe to a few free newsletters that regularly hit your inbox, or a podcast if listening better fits your daily routine. I read and listen to each of the following newsletters and podcasts to pick up on unfamiliar topics then do more of my own research if I do not understand what they are talking or writing about.

  • Money Stuff by Matt Levine of Bloomberg (newsletter sign up form) is a hilarious must-read daily newsletter that covers the world of finance and breaks down many absurd situations such as financial fraud, insider trading, or competing interests in credit default swaps. Amazingly, the author stays out of political topics, which I find very refreshing because many other journalists seem to force their own biases about finance down your throat even if you do not want their opinions.

  • Endless Metrics explains financial topics in a way that's easy for anyone without a finance background to understand. For example, what the heck is GDP and how do you read a GDP chart? What I love most about this newsletter is that the author will often venture into finance-related topics he's interested in and then explain those subjects while grounding them with useful charts and data. This analytic approach closely matches how my developer brain processes information!

  • Points of Return by John Authers (newsletter sign up form). This author is incredibly knowledgeable about finance and typically provides a solid grounding in long-term fundamentals rather than the short-term hyperbole that is pervasive in cable television financial journalism.

  • Odd Lots covers kind of whatever topics the hosts find interesting such as pandemic bonds, repo market disruption, sovereign debt restructuring and emerging markets. That's why it's so good - the hosts bring on an expert in that topic and ask a ton of great questions because they want to learn what's going on for themselves. You follow along with them as they try to understand some of the oft-esoteric subject areas of finance.

Books, Websites and Magazines for Finance

Newsletters and podcasts are great for prodding you into discovering topics you did not know you needed to learn. When you discover something that you want to go deeper on in finance, here are a few of my favorite books and websites that range from the very basics of finance to broader macroeconomic data trends.

  • I learned most of my basic finance knowledge when I read Financial Intelligence for IT Professionals in graduate school (go Hoos!). The book is well-written, straightforward and accessible, particularly because it clearly targets its software developer audience.

  • Don't Quit Your Day Job uses a ton of metrics and statistics to ground their articles on financial topics that are often relevant specifically to software developers. For example, the article on How Many Developers are There in America, and Where Do They Live? is fascinating and especially useful because they explain their data sources and analysis methodology.

  • Money Magazine can be useful to pick up in paper edition for a few months to understand personal finance basics. After a few months you'll discover the articles and topics tend to recycle so there are diminishing returns to reading it after you have familiarized yourself with most of the topics.

  • Longtermtrends aggregates long term high-level financial data and displays it. I find looking at these charts gets me away from the day-to-day "oh the stock market is down" and towards thinking about what happens when you invest money over many years or decades.

Specific Articles on Financial Topics

The following individual articles I have found to be both well-written and extremely useful for specific scenarios such as evaluating stock-based equity compensation, or negotiating your salary.

Roberto Alsina: enum_switch: a enum-based switch thing for Python


I am doing a series of videos (Spanish only!) about "modern Python", showing the modern replacements for things that are ... dense in their original forms.

So, I showed Poetry as an alternative to writing your setup.py and Click as a way to do things easier than argparse, and Pathlib instead of os.path and then I wanted to show Enums. Which are not so new since they have been there since Python 3.4 but I feel they are not used widely enough.

And then I noticed that they help do a "safer" version of the classical Python version of C's switch / case where you can be sure of not leaving any values unhandled.

So, I wrote a little thing and pushed it to PyPI: https://pypi.org/project/enum-switch/

It's a tiny project, but here's an example of how you use it.

from enum import Enum
from enum_switch import Switch

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

class MySwitch(Switch):
    def RED(self):
        return "Apple"

    def GREEN(self):
        return "Kiwi"

    def BLUE(self):
        return "Sky"

switch = MySwitch()
print(switch(Color.RED))

Apple

If MySwitch was missing one of those "handlers" for the Enum values? That's an exception. If you don't want to define them all? Do a default() there.

I like this solution, but am very interested to know if someone has come up with a better one? Comments are enabled, feel free to tell me :-)
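
The "no unhandled values" idea described above can be sketched without the library. The class below is a hypothetical stand-in written for illustration; it is not enum_switch's actual code, and its constructor signature and the exception it raises are assumptions.

```python
from enum import Enum


class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3


class ExhaustiveSwitch:
    """Hypothetical illustration of the idea, not the enum_switch implementation."""

    def __init__(self, enum_cls):
        # Collect the enum members that have no handler method of the same name
        missing = [m.name for m in enum_cls
                   if not callable(getattr(self, m.name, None))]
        has_default = callable(getattr(self, "default", None))
        if missing and not has_default:
            raise ValueError(f"Unhandled enum values: {missing}")

    def __call__(self, member):
        # Use the handler named after the member, else fall back to default()
        handler = getattr(self, member.name, None) or self.default
        return handler()


class FruitSwitch(ExhaustiveSwitch):
    def RED(self):
        return "Apple"

    def GREEN(self):
        return "Kiwi"

    def BLUE(self):
        return "Sky"


class PartialSwitch(ExhaustiveSwitch):
    def RED(self):
        return "Apple"

    def default(self):
        return "Unknown"


fruit = FruitSwitch(Color)
print(fruit(Color.RED))

partial = PartialSwitch(Color)
print(partial(Color.BLUE))
```

The check runs once, when the switch object is created, so a missing handler is reported before any value is dispatched.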

Zero-with-Dot (Oleg Żero): Weighted K-Means Clustering example - artificial countries


Introduction

One of the fields where the weighted K-means clustering (WKMC) algorithm can be applied is demographics. Imagine a situation in which you would like to see how people would group if all administrative divisions or historical conflicts disappeared, or if ethnic, national or tribal identity did not matter. How would people then go about creating communities?

In this post, we will use the WKMC algorithm to find out how people would group only based on their present geographical distribution. For this reason we will look at two parameters:

  • Geographical coordinates,
  • Population density at a specific location.

As this is a curiosity-driven simulation, it is a great simplification of a purely hypothetical character. The simulation does not take into account conditions such as natural resources or terrain barriers that would prevent people from settling. Antarctica is the only exception. We exclude it, as it is a large part of the map, too large for the algorithm to ignore, yet almost completely uninhabitable.

The dataset

We will use the population density dataset available from NASA. The site offers four versions of the dataset, available in different resolutions, which are good for experimentation.

Figure 1. World population density map. For visibility, we have taken the logarithm of every pixel.

Naturally, the highest-resolution version gives the best results, although the computation time necessary may become an issue. To get the dataset, execute:

mkdir dataset
wget -O dataset/world.csv "https://neo.sci.gsfc.nasa.gov/servlet/RenderData?si=875430&cs=rgb&format=CSV&width=360&height=180"

then:

import pandas as pd

# header=None: the CSV has no header row (current pandas rejects header=-1)
df = pd.read_csv('./dataset/world.csv', header=None)
df = df.replace(df.max().max(), 0)
df = df.loc[10:145, :]
df = df.reset_index()
df = df.drop(columns=['index'])
df.columns = range(df.shape[1])

For this dataset, the geographical longitude and latitude are simply expressed as integer numbers and treated as (x, y) indices of a matrix, and the map has a cylindrical representation. At the same time, every element of this matrix represents the population density of people living in a particular region.

The oceans are marked as 99999.0, which is unnatural, and thus we set it to zero. Later, we remove a “strip” of the Arctic Ocean (just to speed up the computation slightly) and Antarctica, as mentioned earlier. Then, we re-enumerate the indices for rows and columns to have them count from zero.
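
The cleaning steps can be tried on a tiny stand-in matrix before downloading anything. The numbers below are made up; only the 99999.0 ocean marker comes from the text. Note that reset_index(drop=True) is used here as a shorthand for the reset_index() plus drop(columns=['index']) pair.

```python
import pandas as pd

# Made-up 4x3 "map": 99999.0 marks ocean, other values are densities
raw = pd.DataFrame([
    [99999.0, 99999.0, 99999.0],   # top strip, to be removed
    [99999.0, 12.5, 3.0],
    [0.4, 99999.0, 7.1],
    [99999.0, 99999.0, 99999.0],   # bottom strip, to be removed
])

# Replace the ocean marker (the largest value in the frame) with zero
clean = raw.replace(raw.max().max(), 0)

# .loc slices by label and includes both end points, so rows 1 and 2 are kept
clean = clean.loc[1:2, :]

# Renumber rows and columns so that both count from zero again
clean = clean.reset_index(drop=True)
clean.columns = range(clean.shape[1])

print(clean)
```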

Feature engineering

Before we proceed, we need to transform our dataset a bit so that it fits the clustering problem. First of all, we need to change the representation of the dataset from a population density matrix to a list of longitude and latitude coordinate points, so that WKMC is able to calculate distances. However, we also need to keep the population density value, which both we and the machine can interpret as the weight of each data point. In other words, large settlements such as big cities will have a much stronger tendency to pull the nearest points into their clusters compared to rural areas or deserts.

import numpy as np

latitude_idx = df.index.to_numpy()
longitude_idx = df.columns.to_numpy()

lat_max = len(latitude_idx)
lon_max = len(longitude_idx)

x0 = latitude_idx.repeat(lon_max)
x1 = np.tile(longitude_idx, lat_max)
x = df.to_numpy()

dd = pd.DataFrame({
    'x0': x0,
    'x1': x1,
    'weight': x.flatten()
})

First, we extract the latitude and longitude from the dataframe object. Then, we repeat the latitude and longitude values, so that they form unique pairs ordered along a new index. We also dump all the weights and flatten them to a series, which we join into a new dataframe, so we can keep the reference.
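The flattening step can be pictured on a tiny grid. The following is a pure-Python sketch, not the code used above: the helper name flatten_grid and the 2x3 grid values are made up for illustration. It produces the same row-major ordering that repeat and np.tile give for the index arrays.

```python
# Toy 2x3 "density" grid; the values are invented for illustration only.
grid = [
    [0.0, 5.0, 1.0],
    [2.0, 0.0, 7.0],
]


def flatten_grid(grid):
    """Return (row, col, weight) triples in row-major order."""
    points = []
    for row_idx, row in enumerate(grid):
        for col_idx, weight in enumerate(row):
            points.append((row_idx, col_idx, weight))
    return points


points = flatten_grid(grid)

# Row indices come out in runs (like `repeat`),
# column indices cycle (like `np.tile`).
rows = [p[0] for p in points]
cols = [p[1] for p in points]
print(rows)
print(cols)
```

Each triple keeps the original matrix position next to its weight, which is what later makes it possible to map cluster labels back onto the map.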

The world is round…

We know that nowadays people tend to put everything up for debate, but no… the Earth is still round. Here, we have a cylindrically represented map, which has an important consequence: the left and the right edges of the map are connected. It is therefore vital to ensure that our algorithm will not treat two points residing close to the two edges as far apart.

Because the sklearn API does not allow us to override the distance metric easily, we have to parametrize the dataset differently:

dd['latitude'] = (x0 / x0.max() - 0.5) * 2.0
dd['longitude_sin'] = np.sin((x1 / x1.max()) * np.pi * 2.0)
dd['longitude_cos'] = np.cos((x1 / x1.max()) * np.pi * 2.0)

The longitude is the cyclic dimension, and if we scaled it to the interval [0, 2.0*np.pi], it would literally become the longitudinal angle. The problem is that the numerical difference between the 1st and the 360th degree is 359 degrees, while the actual distance should be equal to one degree. Therefore, we decompose this dimension into two features, using sine and cosine, respectively.

The latitude should not be cyclic here. However, if we look at the longitude-related features we just defined, we can see that the maximum distance that can occur along these axes is 2. Therefore, to compensate for this when scaling the latitude, we need to ensure that the maximum distance along it is also 2.

Because our dataframe dd keeps all the references, we can simply add the new features into it, which we just did.
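To check that the sine/cosine encoding behaves as intended, here is a small standalone sketch. The helper names encode_longitude and chord_distance are hypothetical, and idx_max=359 assumes a 360-column map, so that it plays the role of x1.max(). It compares two columns near opposite map edges with two columns half a world apart.

```python
import math


def encode_longitude(idx, idx_max):
    """Map a column index to a point (sin, cos) on the unit circle."""
    angle = 2.0 * math.pi * idx / idx_max
    return math.sin(angle), math.cos(angle)


def chord_distance(a, b, idx_max=359):
    """Euclidean distance between two encoded longitudes."""
    sa, ca = encode_longitude(a, idx_max)
    sb, cb = encode_longitude(b, idx_max)
    return math.hypot(sa - sb, ca - cb)


# Columns close to the left and right edges of the map.
near_edge = chord_distance(1, 358)
# Columns roughly half a world apart.
opposite = chord_distance(0, 180)

print(near_edge, opposite)
```

The edge columns end up close together, while the largest possible separation stays at 2 (the diameter of the unit circle), which is why the latitude is scaled to span the interval [-1, 1].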

Solution

Now, our feature matrix X can be constructed by referencing all points through the latitude and the sine/cosine projections of the longitude. At the same time, we take the population density to act as weights. Before we do that, however, we remove all points whose weight is strictly zero. As our planet’s surface is around 70% water, this can drastically reduce the computation needed.

from sklearn.cluster import KMeans

N_CLUSTERS = 195

# drop the zero-weight points (oceans and the removed strips)
dd = dd[dd['weight'] != 0.0]
dd = dd.reset_index()
dd = dd.drop(columns=['index'])

X = dd[['latitude', 'longitude_sin', 'longitude_cos']].to_numpy()
weights = dd['weight'].to_numpy()

dd['cluster'] = KMeans(n_clusters=N_CLUSTERS).fit_predict(
    X, sample_weight=weights)

The number 195 is not accidental. Currently, we have 195 regions recognized as countries. We can use this number as reference in our new world.
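As for the weights themselves, the intuition is that each cluster centre becomes a weight-averaged mean of its members. The sketch below is a minimal pure-Python illustration of that update for a single cluster; the function name weighted_centroid and the two sample points are invented for this example, and it is not a replacement for the KMeans call above.

```python
def weighted_centroid(points, weights):
    """Weighted mean of 2-D points: sum(w * p) / sum(w)."""
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return x, y


# Two points: a "village" at x=0 and a "city" at x=10.
points = [(0.0, 0.0), (10.0, 0.0)]

# Equal weights: the centroid sits in the middle.
plain = weighted_centroid(points, [1.0, 1.0])
# The city is nine times denser: the centroid moves towards it.
city = weighted_centroid(points, [1.0, 9.0])

print(plain, city)
```

A densely populated grid cell therefore drags its centre towards itself, which is the "pull" of large settlements described earlier.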

Having solved the WKMC problem, we need to “recombine” the solution to the old coordinates, which is fairly easy, given we have kept reference to the original indices.

XX = -1 * np.ones((lat_max, lon_max))  # -1 marks cells without a cluster
for i in range(len(dd)):
    u, v = dd['x0'].iloc[i], dd['x1'].iloc[i]
    cluster_id = dd['cluster'].iloc[i]
    XX[u, v] = cluster_id

Now, let’s plot the results. We will overlay the original population density map with the new “countries”. (Note that the np.where function is only used here to enhance the plot.)

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(24, 24))
# np.nan hides the unlabelled cells (the np.NaN alias is gone in NumPy 2.0)
ax.imshow(np.where(XX == -1, np.nan, XX),
    cmap='Spectral',
    alpha=0.75)
ax.imshow(df.apply(lambda x: np.log(x)), alpha=0.25, cmap='gray')
ax.contour(longitude_idx, latitude_idx,
    np.where(XX == -1, -10, XX),
    levels=N_CLUSTERS,
    linewidths=0.1,
    alpha=0.5,
    colors='k')
plt.show()
Figure 2. The world with 195 countries defined through the K-Means Clustering algorithm.

Discussion

We have finally clustered the population. It is useful to observe the consequences of the WKMC algorithm’s assumptions.

First of all, as we have removed the points of zero weight, no cluster labels are assigned to those points. Elsewhere, the larger the population density, the more concentrated the clusters become. This is especially visible in India and China, which are among the most densely populated regions in the world. Siberia, the northern parts of Canada, Greenland, the Sahara and Australia form larger clusters.

Secondly, because the features are scaled (remember, all features are in the range [-1, 1]), the clusters do not exhibit anisotropy in any direction. In other words, if e.g. the x-axis had 5 times the range, we would expect its influence to be much stronger and thus the clusters would be elongated vertically.

Finally, by ensuring continuity along the East-West axis, our clusters are not distorted by the presence of the map’s edges.

Conclusions

We have seen how the K-Means Clustering algorithm can be put to use in our hypothetical world. However, the usage just demonstrated is actually very traditional, and can be applied in similar situations, giving especially good results when working on smaller maps. The algorithm helps to spot similarities that exist regardless of any administrative divisions.

Mike Driscoll: PyDev of the Week: Tommy Falgout


This week we welcome Tommy Falgout (@lastcoolname) as our PyDev of the Week! Tommy works on the Robo-Clippy project. You can see what else he is up to by checking out his website. Let’s take a few moments to get to know Tommy better!

Can you tell us a little about yourself (hobbies, education, etc):

I grew up in the bayous of Louisiana, and while everyone else was interested in 4-wheeling and hunting, I gravitated towards computers and spent hours on my Commodore 64.  Early on, I knew what it meant to be an outcast.
As I matured, my hobbies became numerous and varied, but all focused around my passion of building.  For 5 years I hosted and competed in Dallas/Fort Worth’s annual trebuchet competition: Slingfest, and was even featured on an episode of Dude Perfect on Nickelodeon as a Trebuchet expert (complete with my own IMDB page!).  I also volunteer at a local Makerspace in Plano, TX (TheLab.ms), built a LEGO Robotic Clippy and competed in the Red Bull Soapbox Derby race.  After a few exciting near-misses from bodily harm, I’ve settled down and recently taken up crochet and hobby electronics.

Why did you start using Python?

My first experience with Python was over 15 years ago when I needed to automate ~100 network switches and I had to choose between Python and Perl.  I will admit, I chose Perl because I liked its terseness and didn’t like using forced spaces.  Looking back, that was a silly reason as I created really unreadable code and hardly anyone uses Perl anymore. (Except for maybe Larry Wall)
My second experience was about 10 years later when working for Yahoo and I wrote their Network Automation Discovery System.  I took my lessons learned from my previous experience and wrote it in Python.


What other programming languages do you know and which is your favorite?

I’ve written production code in C, C++, Java, PHP, Python, Javascript, Typescript, Perl and Clojure while dabbling in Go, Rust, Erlang and Ruby.

Funny enough, my favorite is assembly.  Because I could trust it.  I never wrote anything useful; however, there’s a lot less surprises when there’s few language primitives.
Being realistic my favorite is Python, as it’s easy to get started and the community support is strong so there’s modules for almost everything.

What projects are you working on now?

Outside of work, three main projects are Robo-Clippy, LED Lanyard and whatever crochet pattern inspires me.  2 of those 3 projects are written in Python, only because I haven’t incorporated Python and crochet. Yet.

Which Python libraries are your favorite (core or 3rd party)?

I’m a fan of simple interfaces and few surprises.  When getting started with Python the “request” library really hit that for me because most of my projects start with an HTTP API.  I’m also a huge fan of using a REPL.

What motivates you to write on your blog?

My mother was a librarian, so information sharing is in my DNA.  Since I love integrating technology, I often hit fringe cases that others haven’t hit but will soon enough.  I want to share my findings with the world in hopes of saving others time.
I’ve also been in meetings where the client said that they knew me because of a blog article I’ve written.  That felt amazing as I had an instant connection with them and the rest of the meeting went extremely smooth because of a background of trust.

I see you help organize a maker space. How did you get into organizing?

Because I love building and was already organizing a trebuchet competition, it made sense to join forces with the Plano, TX Makerspace and build something together.  It’s exciting to work with like-minded builders who aren’t afraid to try something new and love to bounce ideas off of each other.  If you’re not already involved in a local Makerspace, I highly recommend it because they are a great hub of knowledge and experience that you wouldn’t get otherwise.  For example, I got up the confidence to build and fly a drone (Thanks Pat and Brian!).  I’ve made many friends there that I will never forget.

Do you have any advice for other developers that would like to create a meetup?

If something has sparked a passion in you, the best thing you can do is share it.  So often, we’re waiting for someone else to take the reins, but as a programmer you’re a natural leader. You have an innate ability to create ideas and see them to completion.  Yes, it’s scary, and yes, you might fail.  But it’s worth it.

Is there anything else you’d like to say?

I’d love to share my favorite tech talk, by Rich Hickey: Simple Made Easy.   His analysis of Simple vs Easy, Hard vs Complex really stuck with me and changed the way I build systems.
Thanks for doing the interview, Tommy!

The post PyDev of the Week: Tommy Falgout appeared first on The Mouse Vs. The Python.
