Channel: Planet Python

Weekly Python StackOverflow Report: (ccxiii) stackoverflow python report


Tryton News: Newsletter February 2020


@ced wrote:

Tryton is a business software platform which comes with a set of modules that can be activated to make an ERP, MRP, CRM etc.
During the last month, we kept improving the user experience with many changes that fine tune how Tryton works. There was also a major improvement to the product cost calculations.

Contents:

Changes For The User

We simplified the stock accounting configuration to only use 3 accounts: Stock, IN and OUT.

The bank name and currency code are now included in the record names of bank accounts.

Tryton already has built-in protection against brute force attacks by returning a 429 HTTP status. However, as this can also happen to valid users who fail to log in successfully multiple times in quick succession, we have improved the clients so they display a more user-friendly error message explaining that they should try again later.

An attempt is now made to restore the web client session when the user manually enters the database name and login. If the user’s session is no longer valid then the login process continues as usual.

Now the wizard to credit an invoice asks for the invoice date to put on the credit note. This is useful when refund is checked because the credit note is automatically posted and does not allow modification after that.

The separators, used with the name of a field, are now considered a label for the field. This means that the keyboard shortcuts for quick navigation also work for them just like with other labels. And they are then used by screen readers, and for accessibility, to describe the field.

The product cost calculation has received major improvements. The cost is automatically recalculated when the cost of incoming moves is changed (for example by a landed cost). The production costs are also recalculated automatically in cascade. The cost of outgoing moves is updated with the new cost for that date. This way the product cost history is always up to date.

The status of projects and tasks can now be directly created by users. It is possible to define a minimal progression of statuses which projects and tasks can pass through. Each status automatically becomes a tab on the project window.

The default frequency for asset depreciation can now be configured per company. This is done because normally all assets of the same company are depreciated at the same frequency.

Changes For The Developer

We now use tests to ensure that all SQL functions are supported by all the supported database back ends. This change targeted mainly SQLite which does not natively support all of them. Any missing functions were implemented with a custom version.
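
As an illustration of the general technique (this is not Tryton's actual code), SQLite lets you register a Python callable for an SQL function it lacks natively, for example POSITION():

# Hedged sketch, not Tryton's implementation: register a Python fallback
# for an SQL function that SQLite does not provide out of the box.
import sqlite3

def position(substring, string):
    # SQL POSITION() is 1-based and returns 0 when the substring is absent.
    return string.index(substring) + 1 if substring in string else 0

conn = sqlite3.connect(":memory:")
conn.create_function("position", 2, position)
print(conn.execute("SELECT position('lo', 'hello')").fetchone())  # (4,)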

Tryton now checks the stock quantity earlier when quoting or confirming sales. This prevents the unnecessary loss of a sales number due to the transaction being rolled-back.

The xalign and yalign attributes are now supported by the <group/> tag. They allow different alignments to be set when the widget is not expanded.

Read full topic

Peter Hoffmann: Azure Data Explorer and Parquet files in the Azure Blob Storage


Azure Data Explorer

With the heavy use of Apache Parquet datasets within my team at Blue Yonder we are always looking for managed, scalable/elastic query engines on flat files besides the usual suspects like drill, hive, presto or impala.

For the following tests I deployed an Azure Data Explorer cluster with two instances of Standard_D14_v2 servers, each with 16 vCores, 112 GiB RAM, 800 GiB SSD storage and the "extremely high" network bandwidth class (which corresponds to 8 NICs).

Data Preparation NY Taxi Dataset

Like in the understanding parquet predicate pushdown blog post we are using the NY Taxi dataset for the tests because it has a reasonable size and some nice properties like different datatypes and includes some messy data (like all real world data engineering problems).

mkdir csv
mkdir parquet
cd csv
for i in {01..12}; do
    wget https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2018-$i.csv
done

We can convert the csv files to parquet with pandas and pyarrow:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

months = range(1, 13)

def csv_to_parquet(month):
    filename = "csv/yellow_tripdata_2018-{:02d}.csv".format(month)
    df = pd.read_csv(
        filename,
        dtype={"store_and_fwd_flag": "bool"},
        parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"],
        index_col=False,
        infer_datetime_format=True,
        true_values=["Y"],
        false_values=["N"],
    )
    df = df[
        (df['tpep_pickup_datetime'].dt.year == 2018)
        & (df['tpep_pickup_datetime'].dt.month == month)
    ]
    filename = "parquet/yellow_tripdata_2018-{:02d}.parquet".format(month)
    df.to_parquet(filename)

for m in months:
    csv_to_parquet(m)

Each csv file is about 700 MiB, each parquet file about 180 MiB, and each file contains roughly 10 million rows.

Data Ingestion

The Azure Data Explorer supports control and query commands to interact with the cluster. Kusto control commands always start with a dot and are used to manage the service, query information about it and explore, create and alter tables. The primary query language is the kusto query language, but a subset of T-SQL is also supported.

The table schema definition supports a number of scalar data types:

Type     | Storage Type (internal name)
bool     | I8
datetime | DateTime
guid     | UniqueId
int      | I32
long     | I64
real     | R64
string   | StringBuffer
timespan | TimeSpan
decimal  | Decimal

To create a table for the NY Taxi dataset we can use the following control command with the table and columns names and the corresponding data types:

.create table nytaxi (
    VendorID: int,
    tpep_pickup_datetime: datetime,
    tpep_dropoff_datetime: datetime,
    passenger_count: int,
    trip_distance: real,
    RatecodeID: int,
    store_and_fwd_flag: bool,
    PULocationID: int,
    DOLocationID: int,
    payment_type: int,
    fare_amount: real,
    extra: real,
    mta_tax: real,
    tip_amount: real,
    tolls_amount: real,
    improvement_surcharge: real,
    total_amount: real
)

Ingest data into the Azure Data Explorer

The .ingest into table command can read the data from an Azure Blob or Azure Data Lake Storage and import the data into the cluster. This means it ingests the data and stores it locally for better performance. Authentication is done with Azure SaS Tokens.

Importing one month of csv data takes about 110 seconds. As a reference parsing the same csv file with pandas.read_csv takes about 19 seconds.
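
A measurement along these lines reproduces the pandas comparison (this is my sketch, not the author's benchmark; the file name matches the download step above and exact timings will vary by machine):

# Rough timing sketch for the pandas.read_csv comparison.
import time

import pandas as pd

start = time.perf_counter()
df = pd.read_csv("csv/yellow_tripdata_2018-01.csv")
print(f"read_csv took {time.perf_counter() - start:.1f}s for {len(df)} rows")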

One should always use obfuscated strings (the h in front of the string values) to ensure that the SaS Token is never recorded or logged:

.ingest into table nytaxi
    h'https://phoffmann.blob.core.windows.net/nytaxi/csv/yellow_tripdata_2018-01.csv?sp=r&st=2020-02-01T09:20:07Z&se=2020-02-24T20:20:07Z&spr=https&sv=2019-02-02&sr=b&sig=XXX'
    with (ignoreFirstRecord=true);

Ingesting parquet data from the Azure Blob Storage uses a similar command, which determines the file format from the file extension. Besides csv and parquet, quite a few more data formats such as json, jsonlines, orc and avro are supported. According to the documentation it is also possible to specify the format by appending with (format="parquet").

.ingest into table nytaxi_parquet
    h'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-01.parquet?sp=r&st=2020-02-01T09:17:43Z&se=2020-02-27T20:17:43Z&spr=https&sv=2019-02-02&sr=b&sig=xxx';

Loading the data from parquet only took 30s and already gives us a nice speedup. One can also list multiple parquet files from the blob store to load the data in one run, but I did not get a performance improvement (i.e. nothing better than the single-file duration times the number of files, which I interpret as meaning there is no parallel import happening):

.ingest into table nytaxi_parquet (
    h'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-01.parquet?sp=r&st=2020-02-01T09:17:43Z&se=2020-02-27T20:17:43Z&spr=https&sv=2019-02-02&sr=b&sig=xxx',
    h'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-02.parquet?sp=r&st=2020-02-01T09:17:43Z&se=2020-02-27T20:17:43Z&spr=https&sv=2019-02-02&sr=b&sig=xxx',
    h'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-03.parquet?sp=r&st=2020-02-01T09:17:43Z&se=2020-02-27T20:17:43Z&spr=https&sv=2019-02-02&sr=b&sig=xxx'
);

Once the data is ingested one can nicely query it using the Azure Data Explorer, either in the Kusto query language or in T-SQL:

Kusto Query

Query External Tables

Loading the data into the cluster gives best performance, but often one just wants to do an ad hoc query on parquet data in the blob storage. Using external tables supports exactly this scenario. And this time using multiple files/partitioning helped to speed up the query.

.create external table nytaxi_parquet_external (
    VendorID: int,
    tpep_pickup_datetime: datetime,
    tpep_dropoff_datetime: datetime,
    passenger_count: int,
    trip_distance: real,
    RatecodeID: int,
    store_and_fwd_flag: bool,
    PULocationID: int,
    DOLocationID: int,
    payment_type: int,
    fare_amount: real,
    extra: real,
    mta_tax: real,
    tip_amount: real,
    tolls_amount: real,
    improvement_surcharge: real,
    total_amount: real
)
kind=blob
dataformat=parquet
(
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-01.parquet;xxx',
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-02.parquet;xxx',
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-03.parquet;xxx',
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-04.parquet;xxx',
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-05.parquet;xxx',
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-06.parquet;xxx',
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-07.parquet;xxx',
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-08.parquet;xxx',
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-09.parquet;xxx',
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-10.parquet;xxx',
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-11.parquet;xxx',
    h@'https://phoffmann.blob.core.windows.net/nytaxi/parquet/yellow_tripdata_2018-12.parquet;xxx'
)
with (docstring="NyTaxiDataset");

Querying external data looks similar but has the benefit that one does not have to load the data into the cluster. In a follow up post I'll do some performance benchmarks. Based on my first experiences it seems like the query engine is aware of some of the parquet properties like columnar storage and predicate pushdown, because queries return results faster than loading the full data from the blob storage (with the 30mb/s limit) would take.

external_table("nytaxi_parquet_external") | take 100;

external table

Export Data

For the sake of completeness I'll just show an example of how to export data from the cluster back to a parquet file in the Azure Blob Storage:

.export to parquet (h@"https://phoffmann.blob.core.windows.net/nytaxi/export/test.parquet;xxx")
    <| nytaxi | limit 100;

Go Deh: Sharing another way?


Tickled!

Sharing came up in something I was reading that got me revisiting the Wikipedia page on the Thue Morse Sequence. Tucked away to the right is a static image of ones and zeroes with the caption:
 "When counting in binary, the digit sum modulo 2 is the Thue-Morse sequence"
Now I had already blogged about the Thue-Morse sequence with respect to sharing fairly:

If you had two people, Asia and Bain, choosing from different cash bounties, then if Asia went first she'd choose the largest cash bounty, then Bain would choose the next largest, ... If they alternate like this then Asia, who went first, would always get more than Bain; furthermore, Asia's total accumulated bounty would increase over Bain's over time like this:





In my earlier blog entry: Sharing Fairly, I looked into the Thue-Morse sequence for who gets to pick next and showed that using that sequence, and its extension for more than two people sharing, the person with the most accumulated wealth switches over time and does not diverge like in the simple taking of turns shown above.

I showed how using a Thue-Morse sequence for multiple people sharing gave a "fairer" distribution of wealth, for example  this graph of accumulated wealth for three people:


Old Code

My code to generate the Thue Morse sequence for sharing between multiple people was this function:


def thue_morse(persons='ABC', mx=20, tutor=True)

Its algorithm works, and was straightforward to describe as a natural extension of the two-person case, but it could generate more terms than requested, which are then trimmed.

Back to the tickle

 "When counting in binary, the digit sum modulo 2 is the Thue-Morse sequence"
Now that method might be coded to only give the number of terms required. I might code it as a generator.  But as it stands it applies only to two people (binary).

I thought that an equivalent statement for many people might be:
"When counting base b, the digit sum modulo b is the Thue-Morse sequence of fairer sharing between b people"
I set out to code it.

Code Journey

"When counting base b" ... "digit sum" ...

I am going to need to express an integer in different bases, and sum the digits.

Base changing

Normally when expressing an integer in a different base, b, it is for printout and the routine returns characters: the columns in the string denote, by position, the multiples of successive powers of b.
For example:
    "123" base 10 == 3 * 10**0 + 2 * 10**1 + 1 * 10**2

If the base goes beyond 10 such as in hexadecimal, then letters A... are used to represent numbers beyond 9 as single characters.

We don't want that!

We need to sum the digits, so the _basechange_int function returns a list of integers to make the summation easier later.
Integer 123 base 10 would return the list  [1, 2, 3]

People representation

I am used to generating the Thue-Morse sequence as a string of upper-case characters where each individual character corresponds to one person's turn in sharing. As in my original, I will code my new function to have a persons parameter which is a string of different characters, each representing a person.
The number of persons becomes the base, b, of my altered  statement above.
The algorithm will generate an int that will then need to be represented as a corresponding character from the persons string.

Generate

I decided to code the algorithm as a generator of successive terms from the given persons. islice is my friend :-)


Thue-Morse sequence generator from digit counts




# -*- coding: utf-8 -*-
"""
Created on Sat Jan 18 08:34:29 2020

@author: Paddy3118
"""

#%%

from itertools import count, islice


def _basechange_int(num, b):
    """
    Return list of ints representing positive num in base b

    >>> b = 3
    >>> print(b, [_basechange_int(num, b) for num in range(11)])
    3 [[0], [1], [2], [1, 0], [1, 1], [1, 2], [2, 0], [2, 1], [2, 2], [1, 0, 0], [1, 0, 1]]
    >>>
    """
    if num == 0:
        return [0]
    result = []
    while num != 0:
        num, d = divmod(num, b)
        result.append(d)
    return result[::-1]


def _sumdigits_int(numlist, b):
    "Return the sum of numbers in numlist modulo b, in base b"
    return sum(numlist) % b


def fairshare(persons='ABC'):
    """
    Sequence of fairer sharing between persons
    We have:
    "When counting in binary, the digit sum modulo 2 is the Thue-Morse sequence"

    This computes:
    When counting base b, the digit sum modulo b is the Thue-Morse sequence
    of fairer sharing between b people
    """
    base = len(persons)  # number base
    num2person = dict(zip(range(base), persons))
    for i in count():
        yield num2person[_sumdigits_int(_basechange_int(i, base), base)]


if __name__ == '__main__':
    PERSONS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    for b in [10, 2, 3, 5, 7]:
        #print(b, [_basechange_int(num, b) for num in range(11)])
        print(b, ''.join(islice(fairshare(PERSONS[:b]), 44)))


Example runs:



In [24]: b = 3; print(b, ''.join(islice(fairshare(PERSONS[:b]), 44)))
3 ABCBCACABBCACABABCCABABCBCABCACABABCCABABCBC

In [25]: b = 2; print(b, ''.join(islice(fairshare(PERSONS[:b]), 44)))
2 ABBABAABBAABABBABAABABBAABBABAABBAABABBAABBA

In [26]:

Checks

I can't do the maths to formally prove it, but I have compared outputs between the old and the new and they generate the same sequences.
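
As a quick sanity check for the two-person case, the output can also be compared against the classic definition of Thue-Morse, the parity of the number of 1-bits of n (a minimal sketch, assuming fairshare from the listing above is in scope):

# Assumes fairshare() from the listing above is defined/importable.
from itertools import islice

classic = ''.join('AB'[bin(n).count('1') % 2] for n in range(44))
assert ''.join(islice(fairshare('AB'), 44)) == classic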

P.S:

I decided to start the "Fairshare between two and more" task on Rosetta Code based on this.

Ionel Cristian Maries: Speeding up Django pagination


I assume you have already read Optimizing the Django Admin Paginator. If not, this is basically the take-away from that article:

class InfinityPaginator(Paginator):
    @property
    def count(self):
        return 99999999999


class MyAdmin(admin.ModelAdmin):
    paginator = InfinityPaginator
    show_full_result_count = False

Though the article has a trick with using a statement_timeout, I think it's pointless. In the real world you should expect to get that overt 99999999999 count all over the place. Unless you have some sort of toy project, it's very likely your database will be under load. Add some user/group filtering and you'll always hit the time limit.

What if you could make the count more realistic, but still cheap? Using a random number would be too inconsistent. Strangely enough someone decided that it's a good idea to put a count estimate idea in the PostgreSQL wiki and, for reasons, I decided to see how hard it is to implement it in Django, in a somewhat generalized fashion.
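
For reference, the cheapest estimate on that wiki page just reads the planner's per-table row statistics; a minimal Django-flavoured sketch (the table name is illustrative) looks like this:

# Minimal sketch of the simplest wiki approach: read the planner's row
# estimate for a whole table from pg_class. The table name is illustrative.
from django.db import connection

def estimated_table_count(table_name):
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT reltuples::bigint FROM pg_class WHERE relname = %s",
            [table_name],
        )
        return cursor.fetchone()[0]

That only works for whole, unfiltered tables, which is why the queryset below runs EXPLAIN on the actual query instead.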

From a series of "Just because you can, you have to try it!", behold [1]:

class EstimatedQuerySet(models.QuerySet):
    estimate_bias = 1.2
    estimate_threshold = 100

    def estimated_count(self):
        if self._result_cache is not None:
            return self.count()
        try:
            qs = self.model._base_manager.all()
            compiler = self.query.get_compiler('default')
            where, params = compiler.compile(self.query.where)
            qs = qs.extra(where=[where] if where else None, params=params)
            cursor = connections[self.db].cursor()
            query = qs.query.clone()
            query.add_annotation(Count('*'), alias='__count', is_summary=True)
            query.clear_ordering(True)
            query.select_for_update = False
            query.select_related = False
            query.select = []
            query.default_cols = False
            sql, params = query.sql_with_params()
            logger.info('Running EXPLAIN %s', sql)
            cursor.execute("EXPLAIN %s" % sql, params)
            lines = cursor.fetchall()
            logger.info('Got EXPLAIN result:\n> %s',
                        '\n>   '.join(line for line, in lines))
            marker = ' on %s ' % self.model._meta.db_table
            for line, in lines:
                if marker in line:
                    for part in line.split():
                        if part.startswith('rows='):
                            logger.info('Found size (%s) estimate in query EXPLAIN: %s',
                                        part, line)
                            count = int(int(part[5:]) * self.estimate_bias)
                            if count < self.estimate_threshold:
                                # Unreliable, will make views with lots of filtering
                                # output confusing results.
                                # Just do normal count, shouldn't be that slow.
                                # (well, not much slower than the actual query)
                                return self.count()
                            else:
                                return count
            return qs.count()
        except Exception as exc:
            logger.warning("Failed to estimate queryset count: %s", exc)
            return self.count()

Because the normal count method is unchanged you can use that QuerySet everywhere.

class MyModel(models.Model):
    ...
    objects = EstimatedQuerySet.as_manager()

Now using the estimated_count in the paginator will uncover a problem: sometimes it will underestimate. You can play with the estimate_bias but it will never work well with edge-cases (like heavy filtering).

A good compromise is to tune it for the general case and for everything else trick the pagination to always increment the page count when you're looking at the last page.

class EstimatedPaginator(Paginator):
    def validate_number(self, number):
        if number >= self.num_pages:
            # noinspection PyPropertyAccess
            self.num_pages = number + 1
        return super(EstimatedPaginator, self).validate_number(number)

    @cached_property
    def count(self):
        return self.object_list.estimated_count()


class MyAdmin(admin.ModelAdmin):
    paginator = EstimatedPaginator
    show_full_result_count = False

If you think that # noinspection PyPropertyAccess is funny it's because it is - num_pages is a cached_property and the following line destroys PyCharm's assumptions about how non-data descriptors should work.

It also goes against sane practices like not having unexpected side-effects. But alas, it gets worse. There's another problem there: there's always going to be a next page even if the current page is empty (or not full). To fix that we mess again with the internals:

def _get_page(self, objects, *args, **kwargs):
    # If page ain't full it means that it's the real last page, remove the extra.
    if len(objects) < self.per_page:
        # noinspection PyPropertyAccess
        self.num_pages -= 1
    return super(EstimatedPaginator, self)._get_page(objects, *args, **kwargs)

One could still input an out-of-bounds page number through the URL, but I think it's pointless to handle that.

What about that PyPropertyAccess? *

Suppose you have this:

class cached_property:
    def __init__(self, func, name=None):
        self.func = func

    def __get__(self, instance, cls=None):
        if instance is None:
            return self
        res = instance.__dict__[self.func.__name__] = self.func(instance)
        return res


class Foobar:
    @cached_property
    def foo(self):
        return "bar"

Because cached_property doesn't implement a __set__, assignments will be made through the instance's __dict__:

>>> x = Foobar()
>>> x.foo = '123'
>>> x.foo
'123'
>>> y = Foobar()
>>> y.foo += '123'
>>> y.foo
'bar123'

I suspect that PyCharm doesn't discern data vs non-data descriptors at all. Or perhaps it's a subtle hint that it's a bad idea to assign to something that doesn't implement a setter?
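
For contrast, a minimal sketch of a data descriptor - one that does define __set__ - shows the assignment going through the descriptor instead of just landing in the instance __dict__:

# Sketch: a data descriptor (defines __set__) keeps control of assignment,
# unlike the non-data cached_property above.
class ShoutingProperty:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, cls=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name, "bar")

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value.upper()


class Foobar2:
    foo = ShoutingProperty()


x = Foobar2()
x.foo = "baz"
print(x.foo)  # 'BAZ' - the descriptor intercepted the assignment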

[1] Though you should be wondering if you want to take a look at this hard-to-test method every time you upgrade Django ...

Anarcat: 20 most significant programming languages in history


This is a preposterous table I just made up after reading Wikipedia's History of Programming Languages. I was trying to figure out which programming language or environment this article might be referring to. The article talks about some of the Canadian federal government's computer systems "falling apart" because they are "60 years" old. Everyone cried "COBOL" but I figured there might be other culprits.

Year | Language | Founder | Claim to fame
1954 | Fortran | IBM | first high-level language with functional implementation
1958 | LISP | MIT | first homoiconic language
1959 | COBOL | US DoD | still in use
1964 | BASIC | Dartmouth College | saw explosive growth with PCs in 1970s
1972 | C | AT&T | early systems language, foundation of UNIX
1972 | Prolog | Alain Colmerauer | first (and only?) "logic programming" language
1978 | SQL | Oracle? | first commercial language to use the relational model, still in use in most database systems
1980 | C++ | AT&T Bell Labs | major systems programming language
1986 | Objective C | Apple Inc. | main Apple language until the introduction of Swift
1986 | Erlang | Ericsson | originally written in Prolog, highly-available, hot-swapping, distributed language
1987 | Perl | Larry Wall | every sysadmin can write-only it
1990 | Haskell | University of Glasgow | first type classes implementation
1991 | Python | Guido van Rossum | ease of use and readability, built Dropbox
1995 | Ruby | Yukihiro Matsumoto | built GitHub
1995 | Javascript | Netscape | you're running it right now, most popular language on stackoverflow
1995 | Java | Sun | "write once, run everywhere", consistently the most popular language on the TIOBE index
1995 | PHP | Rasmus Lerdorf | personal project, built Facebook, which eventually replaced it with Hack (2014)
2001 | C# | Microsoft | multi-paradigm
2009 | Go | Google | readable, concurrent, high-performance
2010 | Rust | Mozilla | memory-safe, concurrent, high-performance

Some interesting observations:

  • all of those languages are still in use
  • in particular, COBOL is indeed 60 years old and still in use by governmental agencies, according to a US congress report
  • I am also aware that Fortran is still in use in institutions, particularly research, and particularly Environnement Canada
  • a significant number of programming languages came from research (Lisp, Prolog, Haskell, Python, Ruby), but that has basically disappeared in the last two decades
  • the list is skewed towards languages I learned as I reached adult life
  • yet I can't help but think some years were especially fertile (like 1995) and that things seem to be slowing down - after all, all the languages before the new ones still exist as legacy code that needs to be rewritten
  • in this list, PHP is the only language that was not designed by an author working under a large corporation or university - it was, after all, designed for Personal Home Pages...

But for me, the most significant thing I find in this list is that every corporate ruler eventually creates its own programming language. IBM made Fortran. The US government made COBOL. AT&T made C. Ericsson made Erlang. Google made Golang. Facebook made Hack. And it's interesting to note that some languages came up shortly before the business crashed (e.g. Ericsson, Netscape, Sun) or a dark period (Apple post-Jobs, Google post don't-be-evil, Microsoft anti-trust era). Maybe this means Mozilla is about to crash?

Notable omissions and regrets

I originally jotted this down as a quick list of 18 languages I found while reviewing the Wikipedia page. Then I couldn't help myself and added Prolog, rounding up to 20 languages.

Then I realized I had forgotten Java, one of the most popular programming languages and the foundation of many corporations. So I was stuck and had to remove some things. Besides, there's only so much stuff that can fit in here. So here's the list of languages that did not make it.

Year | Language | Founder | Claim to fame | Excluded
1940? | Assembly | Alan Turing | first concept of a stored program | not high level
1970 | Pascal | Niklaus Wirth | first major decent language with complex datatypes | mostly dead
1971 | Shell | Ken Thompson / AT&T Bell Labs | interactive programming | not a real programming language
1983 | Ada | US DoD | design-by-contract, used in safety systems | own ignorance
1987 | Hypertalk | Dan Winkler / Bill Atkinson | english-like | mostly disappeared
1996 | OCaml | INRIA | the other significant functional language aside Haskell | too similar to Haskell in spirit
2002 | Scratch | MIT Media Lab | block-based visual language, used for teaching kids | not very well known
2014 | Swift | Apple Inc. | safer version of Objective C | too Apple-specific
2014 | Hack | Facebook | gradual typing for PHP | too Facebook-specific

I also excluded things like Ada, Algol, APL, and other relics that are historically significant but largely irrelevant now as they are not in use anymore. I was surprised to see that Pascal was the most popular programming language for a few years (1980-1984) until it was surpassed by C, according to this visualization. (That video also gives Ada the top row for 1985-1986, which I found surprising...)

Scala, Groovy, Typescript, and other minorities are excluded because I am not familiar with them at all.

Update: I added Ada to the table above after being told it's still widely used in aerospace, avionics, traffic control and all sorts of safety-critical systems. It's also standardized and still developed, with the latest stable release from 2012. Ada is also one of the languages still supported by GCC, along with C, C++, Objective-C, Objective-C++, Fortran, and Golang, at the time of writing.

See also my short computing history.

Zero-with-Dot (Oleg Żero): Surviving zombie apocalypse with random search algorithm


Introduction

Imagine a typical Hollywood zombie apocalypse scenario…

A group of desperate folks, led by a charismatic hero, barricades itself in some building, trying to make it to the end of the movie. Hordes of blood-craving creatures knock on every door and every window trying to get inside. The people try to hold them off using anything they can, but as they are getting short on ammo, the hero needs to send somebody to fetch more shells.

The decision has to be made - weighing the risks of losing a team member against running out of ammo. Doesn’t it sound like an optimization problem?

Although it certainly can be formulated this way, we will see that it is not always possible to apply the standard gradient descent algorithm (SGD). Instead, we will use a simple alternative known as random search. To illustrate its simplicity, we will implement it in pure Python.

Our survival scenario

Let’s assume, for the sake of argument, that our charismatic hero has a laptop with a text editor (e.g. vim if he’s a real hero) and python3 installed - just enough to quickly run a simulation and find out what to do.

The situation presents itself as follows:

  • There are N days of peril, during which they have to make it.
  • As soon as the (N + 1)-th day comes, the military will come to rescue the survivors.
  • Until then, the team consumes one chest of ammo a day, and complete depletion equals inevitable death.
  • Furthermore, the ammo loses its anti-zombie quality with every passing day, making it unusable after E days.
  • The group starts with X chests of ammo.
  • On designated days D = {d_1, d_2, …, d_K}, the airforce drops additional pods with chests of ammo to support the struggling men, and these days are known upfront.
  • Unfortunately, there is a risk associated with recovering a chest from a pod. It is expressed as p in [0, 1] and represents the probability of losing a group member.
  • Furthermore, with every chest taken from a pod, the risk increases quadratically with the number of chests retrieved.
  • Finally, there is no need to accumulate any ammunition beyond the “zero-day”, as the military will promptly take it all away.
  • Now, the challenge is to ensure that the team never runs out of ammo, while trying to minimize potential human sacrifice.

Building a model

The situation looks rather complex, so our first step is to build a model. We will use it to simulate different scenarios, which will help us to find the answer. This step is somewhat equivalent to the forward pass in SGD.

We know that the system is completely described using N, E, X and pairs of (d_k, p_k). Therefore, it is convenient to implement our model as a class and expose the method run to perform our forward pass.

Let’s start with a constructor.

E = 30  # expiry of anti-zombie serum
X = 10  # initial ammo supply

class Survival:
    def __init__(self, total_days, pods, supplies_init=[E] * X):
        self.total_days = total_days
        self.pods = pods
        self.supplies = supplies_init

pods is a list of tuples representing the incoming ammo pods with the associated risk of retrieval (e.g. [(10, 0.1), (20, 0.2)]). Since the supplies run out, we represent them as a list of “days-to-expire”. Once the list gets empty - it’s over!

Next, to account for shrinking supplies, we define the following protected method:

def _consume(self):
    self.supplies.remove(min(self.supplies))
    self.supplies = list(map(lambda x: x - 1, self.supplies))
    self.supplies = list(filter(lambda x: x != 0, self.supplies))

Every day one chest of ammo is consumed, then all remaining supplies get one day closer to expiry, and finally the expired ones get removed from the list.

Another thing happens when new supplies are retrieved.

def _retrieve(self, risk, quantity):
    new_ammo = [E] * quantity
    cost = quantity ** 2 * risk
    return new_ammo, cost

In every case when we generate more ammo, we also increase the fractional risk that is proportional to how many chests we choose to retrieve from the pod. (If the cost > 1, we can treat it as the probability of losing more than one man).

To facilitate the forward pass, let’s define one more function:

def _get_risk(self, day):
    pod = list(filter(lambda x: x[0] == day, self.pods))
    if len(pod) > 0:
        return pod[0][1]
    return None

The function simply picks the risk associated with a pod that is scheduled for a given day (if it is scheduled).

Finally, the essence of the forward pass:

def run(self, retrieval_plan):
    total_risk = 0
    retrieval_idx = 0
    for day in range(1, self.total_days + 1):
        risk_involved = self._get_risk(day)
        if risk_involved:
            new_ammo, partial_risk = self._retrieve(
                risk_involved, retrieval_plan[retrieval_idx])
            self.supplies += new_ammo
            total_risk += partial_risk
            retrieval_idx += 1
        if len(self.supplies) == 0:
            return 0, total_risk, day
        else:
            self._consume()
    return len(self.supplies), total_risk, day

The procedure starts with some retrieval_plan (more on that later) and the total risk is zero. Then, as we move day by day, there will be “opportunities” to acquire ammo. In case such opportunity exists on a particular day, we try to retrieve some particular amount of ammo given our retrieval plan. Whenever there are risk points involved, we add them to the overall risk count.

Then, regardless if we got more ammo or not, we check the status of our supplies. If we reached the end, this night will be the team’s last. The function returns 0 ammo left, the total sacrifice and the day to commemorate. If not, the team consumes the ammo while heroically defending their stronghold and the function returns the corresponding values.

This is all we need from our forward pass. Now, let’s look at the optimization.

Solution

Figure 1. Example of a cost function vs. parameter space (z) for two pods. The lower floor region is associated with violating the constraints.

Under typical circumstances that would involve the standard gradient descent, we would now be designing the back-propagation pass and discussing how to properly implement the gradient. However, here there is no gradient! The function that describes the risk accumulation is more of a procedure. It is not differentiable. (No zombies are involved either…)

Because of that, and because of the number of constraints that we mentioned earlier, we need to think of another way to lower the risk. This is how the random search algorithm comes in.

The basic principle goes as follows:

  1. We define a cost function that is a way of measuring how “bad” the solution is. (In our example, it is the total risk of losing team members).
  2. We initialize the system at random in the search space (z).
  3. We sample a new position (z’) in our search space from within the neighborhood of (z).
  4. We evaluate the cost and if lower, we move to that point (z’ -> z).

We repeat 3. and 4. as many times as it takes to improve, and we can apply a hard stop if, for example, the cost no longer decreases, the time runs out, etc.

Of course, this approach does not guarantee that we find the best possible solution, but neither does SGD. Intuitively, however, we can expect to find a progressively better solution if we continue to question the “status quo” and make small improvements.
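
Stripped of the zombie specifics, steps 2 to 4 fit in a few lines; a minimal generic sketch for a numeric cost function (illustrative names, not the implementation below) could look like this:

# Generic random search sketch: minimize cost(z) by repeatedly sampling a
# neighbour z' and moving there only if it is cheaper.
import random

def random_search(cost, z, steps=1000, radius=0.1):
    best = cost(z)
    for _ in range(steps):
        candidate = [zi + random.uniform(-radius, radius) for zi in z]
        c = cost(candidate)
        if c < best:
            z, best = candidate, c
    return z, best

# Example: minimize a simple quadratic bowl.
print(random_search(lambda v: sum(x * x for x in v), [3.0, -2.0]))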

Let’s take a look at the implementation. First, the initialization:

def initialize_retrieval_plan(pods, total_days):
    a = list(map(lambda x: x[0], pods))
    a.append(total_days)
    return [a[x + 1] - a[x] for x in range(len(a) - 1)]

The function accepts the schedule of pods (list of {d_k, p_k}) and the number of days to survive, and returns a possible plan that describes how much ammo the team could retrieve at each opportunity - it is a list of integers.

Perhaps you have noticed there are no random numbers involved. Exactly - we do not initialize it randomly. Random initializations work well when there is no preference for where to start. As our zombie-crisis is full of constraints, we can begin from a position where at least one of the constraints can easily be satisfied. Therefore, we start with a plan that will guarantee survival, giving the folks just enough bullets to make it.

From then on, we can optimize.

def calculate_plan(total_days, pods, epochs=300, supplies=[E] * X,
                   attempts=12, radius=4):
    ...

The function calculate_plan is our alternative to the back-propagation step. We can think of epochs, attempts and radius as hyper-parameters, where epochs is the number of search steps, attempts is the number of points we sample at each step, and radius expresses the “stretch” of the neighborhood of (z).

Because the calculations can, in principle, take some time, the first thing we do is check whether the conditions are ill-suited, that is whether the proposed plan cannot possibly keep them alive. For example, if the date of the first pod comes after the team runs out of supplies, there will be no chance to resupply and death is inevitable. In this case, we can just as well stop calculating and start praying.

plan = initialize_retrieval_plan(pods, total_days)
if is_illegal(plan):
    print("This retrieval plan cannot make them survive.")
    return

S = Survival(total_days, pods, supplies.copy())
supplies_size, cost, day = S.run(plan)
if day < total_days:
    return

The is_illegal function is a simple check that no element of the plan is negative (def is_illegal(plan): return (True in [c < 0 for c in plan])).

Having checked the feasibility, we can finally start optimizing for the minimal risk. The dimensionality of our parameter space equals the number of pods (that equals the length of plan). Sampling of that space means to pick an element of plan at random and see what happens if we remove some number of chests from a pod. However, to prevent violating the condition that the team must not run out of the ammo, we also pick another element of plan to give this quantity to. That creates a new plan that we verify using S.run(plan) to obtain a new outcome.

plan_range = range(len(plan))
cost_history = [9999999]  # initializing cost history for early stopping
last_plan = plan.copy()
epoch = 1
while epoch < epochs:
    cache = {}
    for attempt in range(attempts):
        i1 = random.choice(plan_range)
        i2 = random.choice(plan_range)
        plan = last_plan.copy()
        qty = random.choice(range(1, radius + 1))
        plan[i1] -= qty
        plan[i2] += qty
        S = Survival(total_days, pods, supplies.copy())
        supplies_size, cost, day = S.run(plan)

The series of if-else statements then checks the remaining constraints, but if the new plan is feasible, we add it to cache, which contains all the alternative plans for the step together with their associated cost. Then, if there exists an alternative for which the cost function is lower, we choose that plan for the next iteration (z' -> z).

    if supplies_size < 0:       # ran out of ammo
        continue
    elif day < total_days:      # ran out of days
        continue
    elif is_illegal(plan):      # attempt to sell ammo
        continue
    else:                       # solution found
        key = '-'.join([str(x) for x in plan])
        cache[key] = cost

last_plan, cost = get_minimum(cache)
print(f"{epoch}, cost: {cost}, plan: {last_plan}")
epoch += 1
cost_history.append(cost)
if cost_history[epoch - 1] == cost_history[epoch - 2]:
    break

At some point, no better plan will be found for a given step. When that happens, we fall back on the last feasible plan as the solution.

if len(cache) == 0:
    print("No solution for this case. They must die.")
    return
else:
    print(f"Solution found, but it'll cost = {cost}.")
    return last_plan

The get_minimum function is our custom version of arg min:

def get_minimum(cache):
    min_risk = min(cache.values())
    key = list(cache.keys())[list(cache.values()).index(min_risk)]
    return [int(x) for x in key.split('-')], min_risk

This is it! Running, for example:

if __name__ == '__main__':
    pods = [(10, 0.2), (15, 0.1), (35, 0.5), (50, 0.3)]
    total_days = 70
    plan = calculate_plan(total_days, pods)
    print(plan)

we can see the progression in finding better plans and the cost value going down.

Potential problems and alternatives

Figure 2. Minimum cost (left) and minimum time taken (right) by the random search algorithm for N = 100 and ten pods, checking 10 times for each combination of `attempts` and `radius`. The results were obtained for a linear dependency of the cost function on the retrieved chests quantity.

Unfortunately, if we execute this optimization a couple of times, we will quickly spot that the most “optimal” plan varies, and the cost value is sometimes higher than at other times. This is because our algorithm is “blind”. Without a gradient, we only give it a couple of shots (attempts) at every iteration to look for a better plan, and we do so at random. Consequently, there is a significant threat that the algorithm gets to a place (z') from which it can no longer progress - something like a local minimum.

To overcome this issue, we can naturally increase the number of attempts to give it more chances, or increase the radius to allow for more alternatives in finding a better z'.

In an extreme case, we can choose to loop over all pairs of (z, z’) of a distance radius. This approach would be equivalent to brute-forcing the gradient, drastically increasing our chance of reaching the global minimum. On the other hand, there is a price to pay. With only radius == 1, the number of combinations (z, z’) is proportional to K ** 2, where K = len(plan). The computations can take longer.

Conclusions

Defending against hordes of the undead is, indeed, a major challenge. Zombies aside, in this article, we have presented an easy, bare-python implementation of the random search algorithm. This algorithm can be a feasible alternative to SGD whenever we can define a cost function, but we cannot ensure its differentiability. Furthermore, if the problem is filled with constraints, like this one, the approach presented here can be applied. Finally, we have also discussed the computational complexity of the algorithm itself, which may impact the next hero’s decision if there is indeed a major zombie crisis.

Mike Driscoll: PyDev of the Week: Alessia Marcolini


This week we welcome Alessia Marcolini (@viperale) as our PyDev of the Week! Alessia is a Python blogger and speaker. You can check out some of her work over on Medium. You can also see some of her coding skills on Github. Let’s spend a few moments getting to know her better!

Alessia Marcolini

Can you tell us a little about yourself (hobbies, education, etc):

Hello everybody, my name is Alessia and I’m 21. I come from a little town near Verona, a beautiful city in the north of Italy.

I’ve been living in Trento (Italy) for 2 years and a half now. I moved here to attend university: I’m currently enrolled in the third year of a Bachelor’s degree in Computer Science.

In 2017 I started working part time as a Junior Research Assistant in the Bruno Kessler Foundation, too. FBK is a research foundation based in Trento, working on Science, Technology, and Social Sciences. I’m part of the MPBA unit which focuses on novel applications of Deep Learning from complex data: e.g. Precision Medicine, Imaging and Portable Spectroscopy in industry processes, Nowcasting on time-spatial data. I’m currently working on deep learning frameworks to integrate multiple medical imaging modalities and different clinical data to get more precise prognostic/diagnostic functions.

When not coding, I love dancing and listening to music. I have also been part of a hip hop crew until 2017.

Why did you start using Python?

Well, this dates back to the very first years of my technical high school. We had a teacher who, going against the opinions of many other computer science teachers in my school, decided to teach students in my class Python as the first ever programming language. So, it wasn’t really a choice I made. However, after these six years, I realise how lucky I was to have had that teacher (joking, I realised it even before, I still love that teacher and we are on the best terms but perhaps I did not understand the impact he would have on my future).

What other programming languages do you know and which is your favorite?

It’s difficult to say whether you “know” or you “don’t know” a programming language. I can say that Python is my most practiced language, since I’ve been using it every day at work for three years now. Apart from it, I had the opportunity to practice also Java, C and C++ at school and at university. I also took part in the Italian Olympiad in Informatics in teams for a couple of years and we were required to write our programs in C++.

Anyway, Python is definitely my favourite programming language: it is easy to learn, the syntax is intuitive and with Python you can accomplish tasks with much less code than with other languages. It’s very handy for writing scripts, but at the same time it’s powerful and it gives you the possibility to write an entire object oriented application end-to-end. It can serve multiple areas of application, from web development, to desktop development, to data science.

They say you “Come for the language, Stay for the community”, and this is really one of the aspects I appreciate the most about the Python environment. My experience with the Python community has been awesome and that’s why I always encourage people to come to the Python world (more on this later).

What projects are you working on now?

One of my longest running projects at work is the AI & Open Innovation Lab: I am the co-Director of a tech lab jointly organized by FBK and Istituto Artigianelli (TN, Italy) with the aim to introduce high-school students to data science and to develop new applications of Deep Learning combined with art and design. I started three years ago, teaching students Python and Deep Learning; we also applied design thinking methodologies for projects management. The 2019-2020 edition challenge is to develop a smart packaging solution for the wine business, to ensure product integrity and product traceability during the production and manufacturing process. It is cool because I get to work with students from diverse backgrounds (ITC high schools, lyceums, universities) and of different ages (from 17 to 25) and the team can establish a direct link with companies and clients (we are now partnering with a packaging company and a sparkling wine company).

From the research perspective, I’m currently working on Deep Learning algorithms for Digital Pathology and for Radiology. Regarding Digital Pathology, I’m optimising a reproducible deep learning framework to predict clinically relevant outcomes on pathological tissues and studying the effect of overfitting caused by data leakage.
Considering Radiology, I’m evaluating the prognostic capability of radiomics and deep features in cancer bioimaging through an integrative deep learning framework, applied to a combined dataset of PET-CT scans.

Additionally, I recently started working on Hangar. It’s a pretty young project (born in April 2019), but in my opinion it is very promising. It basically provides support for versioning your data in a smart way. It is designed to solve many of the problems faced by traditional version control systems, just adapted to numerical data. I think this was a missing piece in the data science tools puzzle.

In particular, I started developing new tutorials and use cases for the library.

Which Python libraries are your favorite (core or 3rd party)?

NumPy and Pandas, which are the core libraries for scientific computing in Python. One of the most powerful features of Pandas is to translate complex data operations into mere one or two commands.

How did you get into organizing Python conferences?

Wonderful question. When I was 18 I had the great opportunity to get accepted into WebValley 2016 International, a summer school about data science dedicated to 17-18 yo students coming from all over the world: 15-20 students for three weeks in a small village in Trentino to work on a research project tutored by FBK researchers. There I met those who would become my mentors: Valerio Maggio and Ernesto Arbitrio. Fellow Pythonistas, they started to get me involved into the organization of PyCon Italy. Also a special mention goes to Carlo Miron, one of the founders of the PyCon Italia Association. I’m grateful for the trust they placed (and are still placing) in me. We’ve already started to work on the next year conference: PyCon 11, May, 2nd-5th 2020, Florence! It’s going to be a lot of fun (as always! ;)) Here is a short recap of the last edition: https://www.youtube.com/watch?v=ZBgwhPFzi_M

What is the most challenging thing about organizing a conference?

When you have a team that is supportive, caring and friendly, organizing a conference is not that difficult. The keyword is teamwork. Although, organizing a conference and always being on the ball can be time-consuming. The team behind the scenes of this kind of conferences (usually) consists of volunteers and sometimes it’s hard to fit meetings and calls in everyone’s agenda (we organize a public call once every two weeks – from October to May).

Anyway, being an organizer (and attendee too) is one of the most rewarding experiences I’ve been having thanks to all the relationships I’ve made and the love shared around a programming language.

Is there anything else you’d like to say?

“The bad news: nothing is permanent.
The good news: nothing is permanent.” – Lolly Daskal

Thanks for doing the interview, Alessia!

The post PyDev of the Week: Alessia Marcolini appeared first on The Mouse Vs. The Python.


James Bennett: How I'm testing in 2020


Once upon a time I wrote a bit about testing, specifically how I was organizing and testing my open-source Django apps. It’s been a while since that post, though, and the calendar has even flipped over to a new penultimate digit in the year number, so it’s worth revisiting to go over what’s changed in how I do things and what’s stayed the same. And since I do maintain a couple things that aren’t …

Read full entry

Django Weblog: Django security releases issued: 3.0.3, 2.2.10, and 1.11.28


In accordance with our security release policy, the Django team is issuing Django 3.0.3, Django 2.2.10 and Django 1.11.28. These releases address the security issue detailed below. We encourage all users of Django to upgrade as soon as possible.

Affected supported versions

  • Django master branch
  • Django 3.0
  • Django 2.2
  • Django 1.11

CVE-2020-7471: Potential SQL injection via StringAgg(delimiter)

The django.contrib.postgres.aggregates.StringAgg aggregation function was subject to SQL injection, using a suitably crafted delimiter.
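
For context, a typical use of StringAgg looks like the sketch below (the model and field names are made up). On unpatched versions the delimiter argument was the injection vector, so it must never be built from untrusted input there.

# Illustrative usage only; Book and its fields are hypothetical.
from django.contrib.postgres.aggregates import StringAgg

titles_by_publisher = Book.objects.values("publisher").annotate(
    titles=StringAgg("title", delimiter=", "),
)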

Thank you to Simon Charette for the report and patch.

Resolution

Patches to resolve the issue have been applied to Django's master branch and the 3.0, 2.2, and 1.11 release branches. The patches may be obtained from the following changesets:

The following releases have been issued:

The PGP key ID used for these releases is Carlton Gibson: E17DF5C82B4F9D00.

General notes regarding security reporting

As always, we ask that potential security issues be reported via private email to security@djangoproject.com, and not via Django's Trac instance or the django-developers list. Please see our security policies for further information.

CubicWeb: What is new in CubicWeb 3.27 ?


Hello CubicWeb community,

We are pleased to announce the release of CubicWeb 3.27. Many thanks to all the contributors of this release!

Main changes in this release are listed below. Please note this release drops python2 support.

Enjoy this new version!

New features

  • Tests can now be run concurrently across multiple processes. You can use pytest-xdist for that. For tests using PostgresApptestConfiguration you should be aware that startpgcluster() can't run concurrently. Workaround is to call pytest with --dist=loadfile to use a single test process per test module or use an existing database cluster and set db-host and db-port of devtools.DEFAULT_PSQL_SOURCES['system'] accordingly.
  • on cubicweb-ctl create and cubicweb-ctl pyramid, if it doesn't already exist in the instance directory, the pyramid.ini file will be generated with the needed secrets.
  • add a --pdb flag to all cubicweb-ctl command to launch (i)pdb if an exception occurs during a command execution.
  • the --loglevel and --dbglevel flags are available for all cubicweb-ctl instance commands (and not only the pyramid one)
  • following the "only in foreground" behavior, all commands log to stdout by default from now on. To still log to a file, pass log_to_file=True to CubicWebConfiguration.config_for
  • add a new migration function update_bfss_path(old_path, new_path) to update the path in Bytes File-System Storage (bfss).
  • on every request display request path and selected controller in CLI
  • migration interactive mode improvements:
    • when an exception occurs, display the full traceback instead of only the exception
    • on migration p(db) choice, launch ipdb if it's installed
    • on migration p(db) choice, give the traceback to pdb if it's available; this means that the (i)pdb interactive session will be on the stack of the exception instead of the stack where pdb is launched, which allows the user to access all the relevant context of the exception that would otherwise be lost
  • on DBG_SQL and/or DBG_RQL, if pygments is installed, syntax highlight sql/rql debug output
  • allow specifying the instance id for any instance command using the CW_INSTANCE global variable instead of giving it as a cli argument
  • when debugmode is activated ('-D/--debug' on the pyramid command for example), the HTML generated by CW will contain new tags that indicate which object in the code generated it and on which line of which source file. For example:
<div
  cubicweb-generated-by="cubicweb.web.views.basetemplates.TheMainTemplate"
  cubicweb-from-source="/home/user/code/logilab/cubicweb/cubicweb/web/views/basetemplates.py:161"
  id="contentmain">
    <h1
      cubicweb-generated-by="cubicweb.web.views.basetemplates.TheMainTemplate"
      cubicweb-from-source="/home/user/code/logilab/cubicweb/cubicweb/view.py:136">
        unset title
    </h1>
    [...]
</div>

While this hasn't been done yet, this feature is an open path for building dynamic tools that can help inspect the page.

  • a new debug channels mechanism has been added, you can subscribe to one of those channels in your python code to build debug tools for example (the pyramid custom panels are built using that) and you will receive a datastructure (a dict) containing related information. The available channels are: controller, rql, sql, vreg, registry_decisions
  • add a new '-t/--toolbar' option the pyramid command to activate the pyramid debugtoolbar
  • a series of pyramid debugtoolbar panels specifically made for CW, see below

Pyramid debugtoolbar and custom panel

The pyramid debugtoolbar is now integrated into CubicWeb during the development phase when you use the 'pyramid' command. To activate it you need to pass the '-t/--toolbar' argument to the 'pyramid' command.

In addition, a series of custom panels specifically done for CW are now available, they display useful information for the development and the debugging of each page. The available panels are:

  • a general panel which contains the selected controller, the current settings and useful links screenshot1
  • a panel listing all decisions taken in registry for building this page screenshot2
  • a panel listing the content of the vreg registries screenshot3
  • a panel listing all the RQL queries made during a request screenshot4
  • a panel listing all the SQL queries made during a request screenshot5

Furthermore, in all those panels, next to each object/class/function/method a link to display its source code is available (shown as '[source]' screenshot6), and every file path shown in a traceback is also a link to display the corresponding file (screenshot7). For example: screenshot8.

Backwards incompatible changes

  • Standardization of the way to launch a cubicweb instance: from now on the only way to do that will be to use the pyramid command. Therefore:

    • cubicweb-ctl commands "start", "stop", "restart", "reload" and "status" have been removed because they relied on the Twisted web server backend that is no longer maintained nor working with Python 3.
    • Twisted web server support has been removed.
    • cubicweb-ctl wsgi has also been removed.
  • Support for legacy cubes (in the 'cubes' python namespace) has been dropped. Use of environment variables CW_CUBES_PATH and CUBES_DIR is removed.

  • Python 2 support has been dropped.

  • Exceptions in notification hooks are no longer caught during tests, so one can expect tests that seemed to pass (but were actually silently failing) to fail now.

  • All "cubicweb-ctl" command only accept one instance argument from now one (instead of 0 to n)

  • The 'pyramid' command will always run in the foreground now; as a consequence, the option --no-daemon has been removed.

  • DBG_MS flag has been removed since it is not used anymore

  • transaction db logs were displayed using the logging (debug/info/warning...) mechanism; now they are only displayed if the corresponding DBG_OPS flag is used

Deprecated code drops

Most code deprecated by version 3.25 or older versions has been dropped.

Julien Danjou: Python Logging with Datadog


At Mergify, we generate a pretty large amount of logs. Every time an event is received from GitHub for a particular pull request, our engine computes a new state for it. Doing so, it logs some informational statements about what it's doing — and any error that might happen.

This information is precious to us. Without proper logging, it'd be utterly impossible for us to debug any issue. As we needed to store and index our logs somewhere, we picked Datadog as our log storage provider.

Datadog offers real-time indexing of our logs. The ability to search our records that fast is compelling, as we're able to retrieve logs about a GitHub repository or a pull request with a single click.

Our custom Datadog log facets

To achieve this result, we had to inject our Python application logs into Datadog. To set up the Python logging mechanism, we rely on daiquiri, a fantastic library I have maintained for several years now. Daiquiri leverages the regular Python logging module, making it a no-brainer to set up and offering a few extra features.

We recently added native support for the Datadog agent in daiquiri, making it even more straightforward to log from your Python application.

Enabling log on the Datadog agent

Datadog has extensive documentation on how to configure its agent. This can be summarized as adding logs_enabled: true to your agent configuration. Simple as that.
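Concretely, that is a single line in the agent's main configuration file (datadog.yaml on a standard install, which is an assumption about your setup):

logs_enabled: true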

You then need to create a new source for the agent. The easiest way to connect your application and the Datadog agent is to use the TCP socket. Your application will write logs directly to the Datadog agent, which will forward the entries to the Datadog backend.

Create a configuration file in conf.d/python.d/conf.yaml with the following content:

init_config:

instances:

logs:
  - type: tcp
    port: 10518
    source: python
    service: <YOUR SERVICE NAME>
    sourcecategory: sourcecode
conf.d/python.d/conf.yaml

Setting up daiquiri

Once this is done, you need to configure your Python application to log to the TCP socket configured in the agent above.

The Datadog agent expects logs to be sent in JSON format, which is what daiquiri does for you. Using JSON allows you to embed any extra fields to leverage fast search and indexing. As daiquiri provides native handling for extra fields, you'll be able to send them without trouble.

First, list daiquiri in your application dependencies. Then, set up logging in your application this way:

import logging

import daiquiri

daiquiri.setup(
  outputs=[
    daiquiri.output.Datadog(),
  ],
  level=logging.INFO,
)

This configuration logs to the default TCP destination localhost:10518, though you can pass the host and port arguments to change that. You can customize the outputs as you wish by checking out the daiquiri documentation. For example, you could also include logging to stdout by adding daiquiri.output.Stream(sys.stdout) to the output list.
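For instance, a minimal sketch combining the Datadog output with logging to stdout could look like this; the host and port values are simply the defaults mentioned above:

import logging
import sys

import daiquiri

daiquiri.setup(
    outputs=[
        # Send JSON-formatted logs to the local Datadog agent TCP listener.
        daiquiri.output.Datadog(host="localhost", port=10518),
        # Also mirror logs to stdout, which is handy for local debugging.
        daiquiri.output.Stream(sys.stdout),
    ],
    level=logging.INFO,
)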

Using extra

When using daiquiri, you're free to use logging.getLogger to get your regular logging object. However, by using the alternative daiquiri.getLogger function, you're enabling the native use of extra arguments, which is quite handy. That means you can pass any arbitrary key/value to your log call and see it end up embedded in your log data, all the way up to Datadog.

Here's an example:

import daiquiri

[…]

log = daiquiri.getLogger(__name__)
log.info("User did something important", user=user, request_id=request_id)

The extra keyword arguments passed to log.info will be shown directly as attributes in Datadog logs:

One of the log lines of our Mergify engine

All those attributes can then be used to search or to display custom views. This is really powerful to monitor and debug any kind of service.


A log object per object

When passing extra arguments, it is easy to make mistakes and forget some. This can especially happen when your application wants to log information about a particular object.

The best pattern to avoid this is to create a custom log object per object:

import daiquiri

class MyObject:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.log = daiquiri.getLogger("MyObject", x=self.x, y=self.y)

    def do_something(self):
        try:
            self.call_this()
        except Exception:
            self.log.error("Something bad happened")

By using the self.log object as defined above, there's no way for your application to miss some extra fields for an object. All your logs will share the same style and will end up indexed correctly in Datadog.

Log Design

The extra arguments of the Python loggers are often dismissed, and many developers stick to logging strings with various pieces of information embedded inside. Having a proper explanation string, plus a few extra key/value pairs that are parsable by machines and humans, is a better way to do logging. Leveraging engines such as Datadog allows you to store and query those logs in a snap.

This is way more efficient than trying to parse and grep strings yourselves!
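As a quick illustration of the difference, compare a string-only log line with a structured one using daiquiri's extra arguments (the field names are made up for the example):

import daiquiri

log = daiquiri.getLogger(__name__)

# Everything packed into the message string: hard to filter or facet on later.
log.info("user 42 failed checkout for order 1234")

# The message stays human-readable, while the data becomes queryable attributes.
log.info("checkout failed", user_id=42, order_id=1234)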

Real Python: Use a Flask Blueprint to Architect Your Applications


Flask is a very popular web application framework that leaves almost all design and architecture decisions up to the developer. In this tutorial, you’ll learn how a Flask Blueprint, or Blueprint for short, can help you structure your Flask application by grouping its functionality into reusable components.

In this tutorial, you’ll learn:

  • What Flask Blueprints are and how they work
  • How to create and use a Flask Blueprint to organize your code
  • How to improve code reusability using your own or a third-party Flask Blueprint

This tutorial assumes that you have some experience using Flask and that you’ve built some applications before. If you haven’t used Flask before, then check out Python Web Applications with Flask (Tutorial Series).

Free Bonus: Click here to get access to a free Flask + Python video tutorial that shows you how to build a Flask web app, step-by-step.

What a Flask Application Looks Like

Let’s start by reviewing the structure of a small Flask application. You can create a small web application by following the steps in this section. To get started, you need to install the Flask Python package. You can run the following command to install Flask using pip:

$ pip install Flask==1.1.1

The above command installs Flask version 1.1.1. This is the version you’ll use throughout this tutorial, though you can apply what you’ll learn here to other versions, as well.

Note: For more information on how to install Flask in a virtual environment and other pip options, check out Python Virtual Environments: A Primer and What Is Pip? A Guide for New Pythonistas.

After you install Flask, you’re ready to start implementing its functionality. Since Flask doesn’t impose any restrictions on project structure, you can organize your project’s code as you want. For your first application, you can use a very straightforward layout, as shown below. A single file will contain all the application logic:

app/
|
└── app.py

The file app.py will contain the definition of the application and its views.

When you create a Flask application, you start by creating a Flask object that represents your application, and then you associate views to routes. Flask takes care of dispatching incoming requests to the correct view based on the request URL and the routes you’ve defined.

In Flask, views can be any callable (like a function) that receives requests and returns the response for that request. Flask is responsible for sending the response back to the user.

The following code block is your application’s full source code:

from flask import Flask

app = Flask(__name__)


@app.route('/')
def index():
    return "This is an example app"

This code creates the object app, which belongs to the Flask class. The view function index() is linked to the route / using the app.route decorator. To learn more about decorators, check out Primer on Python Decorators and Python Decorators 101.

You can run the application with the following command:

$ flask run

By default, Flask will run the application you defined in app.py on port 5000. While the application is running, go to http://localhost:5000 using your web browser. You’ll see a page showing the message, This is an example app.

The chosen project layout is great for very small applications, but it doesn’t scale well. As your code grows, it can become harder for you to maintain everything in a single file. So, when your application grows in size or complexity, you may want to structure your code in a different way to keep it maintainable and clear to understand. Throughout this tutorial, you’ll learn how to use a Flask Blueprint to achieve this.

What a Flask Blueprint Looks Like

Flask Blueprints encapsulate functionality, such as views, templates, and other resources. To get a taste for how a Flask Blueprint would work, you can refactor the previous application by moving the index view into a Flask Blueprint. To do so, you have to create a Flask Blueprint that contains the index view and then use it in the application.

This is what the file structure looks like for this new application:

app/
|
├── app.py
└── example_blueprint.py

example_blueprint.py will contain the Flask Blueprint implementation. You’ll then modify app.py to use it.

The following code block shows how you can implement this Flask Blueprint in example_blueprint.py. It contains a view at the route / that returns the text This is an example app:

from flask import Blueprint

example_blueprint = Blueprint('example_blueprint', __name__)


@example_blueprint.route('/')
def index():
    return "This is an example app"

In the above code, you can see the steps common to most Flask Blueprint definitions:

  1. Create a Blueprint object called example_blueprint.
  2. Add views to example_blueprint using the route decorator.

The following code block shows how your application imports and uses the Flask Blueprint:

from flask import Flask

from example_blueprint import example_blueprint

app = Flask(__name__)
app.register_blueprint(example_blueprint)

To use any Flask Blueprint, you have to import it and then register it in the application using register_blueprint(). When a Flask Blueprint is registered, the application is extended with its contents.

You can run the application with the following command:

$ flask run

While the application is running, go to http://localhost:5000 using your web browser. You’ll see a page showing the message, This is an example app.

How Flask Blueprints Work

In this section, you’ll learn in detail how a Flask Blueprint is implemented and used. Each Flask Blueprint is an object that works very similarly to a Flask application. They both can have resources, such as static files, templates, and views that are associated with routes.

However, a Flask Blueprint is not actually an application. It needs to be registered in an application before you can run it. When you register a Flask Blueprint in an application, you’re actually extending the application with the contents of the Blueprint.

This is the key concept behind any Flask Blueprint. They record operations to be executed later when you register them on an application. For example, when you associate a view to a route in a Flask Blueprint, it records this association to be made later in the application when the Blueprint is registered.

Making a Flask Blueprint

Let’s revisit the Flask Blueprint definition that you’ve seen previously and review it in detail. The following code shows the Blueprint object creation:

from flask import Blueprint

example_blueprint = Blueprint('example_blueprint', __name__)

Note that in the above code, some arguments are specified when creating the Blueprint object. The first argument, "example_blueprint", is the Blueprint’s name, which is used by Flask’s routing mechanism. The second argument, __name__, is the Blueprint’s import name, which Flask uses to locate the Blueprint’s resources.

There are other optional arguments that you can provide to alter the Blueprint’s behavior:

  • static_folder: the folder where the Blueprint’s static files can be found

  • static_url_path: the URL to serve static files from

  • template_folder: the folder containing the Blueprint’s templates

  • url_prefix: the path to prepend to all of the Blueprint’s URLs

  • subdomain: the subdomain that this Blueprint’s routes will match on by default

  • url_defaults: a dictionary of default values that this Blueprint’s views will receive

  • root_path: the Blueprint’s root directory path, whose default value is obtained from the Blueprint’s import name

Note that all paths, except root_path, are relative to the Blueprint’s directory.
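As an illustration, a Blueprint created with a few of these optional arguments might look like the following; the name and paths are made up for the example:

from flask import Blueprint

admin_bp = Blueprint(
    'admin_bp', __name__,
    template_folder='templates',  # templates live in the Blueprint's own templates/ directory
    static_folder='static',       # static files live in the Blueprint's own static/ directory
    url_prefix='/admin',          # every route in this Blueprint is prefixed with /admin
)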

The Blueprint object example_blueprint has methods and decorators that allow you to record operations to be executed when registering the Flask Blueprint in an application to extend it. One of the most used decorators is route. It allows you to associate a view function to a URL route. The following code block shows how this decorator is used:

@example_blueprint.route('/')
def index():
    return "This is an example app"

You decorate index() using example_blueprint.route and associate the function to the URL /.

Blueprint objects also provide other methods that you may find useful:

  • .errorhandler() to register an error handler function
  • .before_request() to execute an action before every request
  • .after_request() to execute an action after every request
  • .app_template_filter() to register a template filter at the application level

You can learn more about using Blueprints and the Blueprint class in the Flask Blueprints Documentation.

Registering the Blueprint in Your Application

Recall that a Flask Blueprint is not actually an application. When you register the Flask Blueprint in an application, you extend the application with its contents. The following code shows how you can register the previously-created Flask Blueprint in an application:

from flask import Flask

from example_blueprint import example_blueprint

app = Flask(__name__)
app.register_blueprint(example_blueprint)

When you call .register_blueprint(), you apply all operations recorded in the Flask Blueprint example_blueprint to app. Now, requests to the app for the URL / will be served using .index() from the Flask Blueprint.

You can customize how the Flask Blueprint extends the application by providing some parameters to register_blueprint:

  • url_prefix is an optional prefix for all the Blueprint’s routes.
  • subdomain is a subdomain that Blueprint routes will match.
  • url_defaults is a dictionary with default values for view arguments.

Being able to do some customization at registration time, instead of at creation time, is particularly useful when you’re sharing the same Flask Blueprint in different projects.
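For example, the very same example_blueprint could be mounted differently by two applications (other_app is a hypothetical second application, used only for illustration):

# One application mounts the Blueprint at the site root...
app.register_blueprint(example_blueprint)

# ...while another application mounts it under /demo.
other_app.register_blueprint(example_blueprint, url_prefix='/demo')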

In this section, you’ve seen how Flask Blueprints work and how you can create them and use them. In the following sections, you’ll learn how you can leverage a Flask Blueprint to architect your applications, structuring them into independent components. In some cases, it’s also possible for you to reuse these components in different applications to reduce development time!

How to Use Flask Blueprints to Architect Your Application’s Code

In this section, you’re going to see how you can refactor an example application using a Flask Blueprint. The example application is an e-commerce site with the following features:

  • Visitors can sign up, log in, and recover passwords.
  • Visitors can search for products and view their details.
  • Users can add products to their cart and checkout.
  • An API enables external systems to search and retrieve product information.

You don’t need to care much about the details of the implementation. Instead, you’ll focus mainly on how a Flask Blueprint can be used to improve the application’s architecture.

Understanding Why Project Layout Matters

Remember, Flask does not enforce any particular project layout. It’s completely feasible to organize this application’s code as follows:

ecommerce/
|
├── static/
|   ├── logo.png
|   ├── main.css
|   ├── generic.js
|   └── product_view.js
|
├── templates/
|   ├── login.html
|   ├── forgot_password.html
|   ├── signup.html
|   ├── checkout.html
|   ├── cart_view.html
|   ├── index.html
|   ├── products_list.html
|   └── product_view.html
|
├── app.py
├── config.py
└── models.py

This application’s code is organized using these directories and files:

  • static/ contains the application’s static files.
  • templates/ contains the application’s templates.
  • models.py contains the definition of the application’s models.
  • app.py contains the application logic.
  • config.py contains the application configuration parameters.

This is an example of how many applications begin. Although this layout is pretty straightforward, it has several drawbacks that arise as the app complexity increases. For example, it will be hard for you to reuse the application logic in other projects because all the functionality is bundled in app.py. If you split this functionality into modules instead, then you could reuse complete modules across different projects.

Also, if you have just one file for the application logic, then you would end up with a very large app.py that mixes code that’s nearly unrelated. This can make it hard for you to navigate and maintain the script.

What’s more, large code files are a source of conflicts when you’re working in a team, since everybody will be making changes to the same file. These are just a few reasons why the previous layout is only good for very small applications.

Organizing Your Projects

Instead of structuring the application using the previous layout, you can leverage a Flask Blueprint to split the code into different modules. In this section, you’ll see how to architect the previous application to make Blueprints that encapsulate related functionality. In this layout, there are five Flask Blueprints:

  1. API Blueprint to enable external systems to search and retrieve product information
  2. Authentication Blueprint to enable users to log in and recover their password
  3. Cart Blueprint for cart and checkout functionality
  4. General Blueprint for the homepage
  5. Products Blueprint for searching and viewing products

If you use a separate directory for each Flask Blueprint and its resources, then the project layout would look as follows:

ecommerce/
|
├── api/
|   ├── __init__.py
|   └── api.py
|
├── auth/
|   ├── templates/
|   |   └── auth/
|   |       ├── login.html
|   |       ├── forgot_password.html
|   |       └── signup.html
|   |
|   ├── __init__.py
|   └── auth.py
|
├── cart/
|   ├── templates/
|   |   └── cart/
|   |       ├── checkout.html
|   |       └── view.html
|   |
|   ├── __init__.py
|   └── cart.py
|
├── general/
|   ├── templates/
|   |   └── general/
|   |       └── index.html
|   |
|   ├── __init__.py
|   └── general.py
|
├── products/
|   ├── static/
|   |   └── view.js
|   |
|   ├── templates/
|   |   └── products/
|   |       ├── list.html
|   |       └── view.html
|   |
|   ├── __init__.py
|   └── products.py
|
├── static/
|   ├── logo.png
|   ├── main.css
|   └── generic.js
|
├── app.py
├── config.py
└── models.py

To organize code in this way, you move all views from app.py into the corresponding Flask Blueprint. You also move templates and non-global static files. This structure makes it easier for you to find the code and resources related to a given functionality. For example, if you want to find the application logic about products, then you can go to the Products Blueprint in products/products.py instead of scrolling through app.py.

Let’s see the Products Blueprint implementation in products/products.py:

from flask import Blueprint, render_template

from ecommerce.models import Product

products_bp = Blueprint('products_bp', __name__,
                        template_folder='templates',
                        static_folder='static',
                        static_url_path='assets')


@products_bp.route('/')
def list():
    products = Product.query.all()
    return render_template('products/list.html', products=products)


@products_bp.route('/view/<int:product_id>')
def view(product_id):
    product = Product.query.get(product_id)
    return render_template('products/view.html', product=product)

This code defines the products_bp Flask Blueprint and contains only the code that’s related to product functionality. Since this Flask Blueprint has its own templates, you need to specify the template_folder relative to the Blueprint’s root in the Blueprint object creation. Since you specify static_folder='static' and static_url_path='assets', files in ecommerce/products/static/ will be served under the /assets/ URL.

Now you can move the rest of your code’s functionality to the corresponding Flask Blueprint. In other words, you can create Blueprints for API, authentication, cart, and general functionality. Once you’ve done so, the only code left in app.py will be code that deals with application initialization and Flask Blueprint registration:

from flask import Flask

from ecommerce.api.api import api_bp
from ecommerce.auth.auth import auth_bp
from ecommerce.cart.cart import cart_bp
from ecommerce.general.general import general_bp
from ecommerce.products.products import products_bp

app = Flask(__name__)

app.register_blueprint(api_bp, url_prefix='/api')
app.register_blueprint(auth_bp)
app.register_blueprint(cart_bp, url_prefix='/cart')
app.register_blueprint(general_bp)
app.register_blueprint(products_bp, url_prefix='/products')

Now, app.py simply imports and registers the Blueprints to extend the application. Since you use url_prefix, you can avoid URL collisions between Flask Blueprint routes. For example, the URLs /products/ and /cart/ resolve to different endpoints defined in the products_bp and cart_bp Blueprints for the same route, /.

Including Templates

In Flask, when a view renders a template, the template file is searched in all the directories that were registered in the application’s template search path. By default, this path is ["/templates"], so templates are only searched for in the /templates directory inside the application’s root directory.

If you set the template_folder argument in a Blueprint’s creation, then its templates folder is added to the application’s template search path when the Flask Blueprint is registered. However, if there are duplicated file paths under different directories that are part of the template search path, then one will take precedence, depending on their registration order.

For example, if a view requests the template view.html and there are files with this same name in different directories in the template search path, then one of these will take precedence over the other. Since it may be hard to remember the precedence order, it’s best to avoid having files under the same path in different template directories. That’s why the following structure for the templates in the application makes sense:

ecommerce/
|
└── products/
    └── templates/
        └── products/
            ├── search.html
            └── view.html

At first, it may look redundant to have the Flask Blueprint name appear twice:

  1. As the Blueprint’s root directory
  2. Inside the templates directory

However, know that by doing this, you can avoid possible template name collisions between different Blueprints. Using this directory structure, any views requiring the view.html template for products can use products/view.html as the template file name when calling render_template. This avoids conflicts with the view.html that belongs to the Cart Blueprint.

As a final note, it’s important to know that templates in the application’s template directory have greater precedence than those inside the Blueprint’s template directory. This can be useful to know if you want to override Flask Blueprint templates without actually modifying the template file.

For example, if you wanted to override the template products/view.html in the Products Blueprint, then you can accomplish this by creating a new file products/view.html in the application templates directory:

ecommerce/
|
├── products/
|   └── templates/
|       └── products/
|           ├── search.html
|           └── view.html
|
└── templates/
    └── products/
        └── view.html

When you do this, your program will use templates/products/view.html instead of products/templates/products/view.html whenever a view requires the template products/view.html.

Providing Functionality Other Than Views

So far, you’ve only seen Blueprints that extend applications with views, but Flask Blueprints don’t have to provide just views! They can extend applications with templates, static files, and template filters. For example, you could create a Flask Blueprint to provide a set of icons and use it across your applications. This would be the file structure for such a Blueprint:

app/
|
└── icons/
    ├── static/
    |   ├── add.png
    |   ├── remove.png
    |   └── save.png
    |
    ├── __init__.py
    └── icons.py

The static folder contains the icon files and icons.py is the Flask Blueprint definition.

This is how icons.py might look:

from flask import Blueprint

icons_bp = Blueprint('icons_bp', __name__,
                     static_folder='static',
                     static_url_path='icons')

This code defines the icons_bp Flask Blueprint that exposes the files in the static directory under the /icons/ URL. Note that this Blueprint does not define any route.
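As a sketch of how such a Blueprint could be consumed (the import path is assumed from the layout shown above):

from flask import Flask

from app.icons.icons import icons_bp  # import path assumed from the layout above

app = Flask(__name__)
app.register_blueprint(icons_bp)

# In a Jinja template, an icon can then be referenced through the Blueprint's static endpoint:
#     <img src="{{ url_for('icons_bp.static', filename='add.png') }}">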

When you create Blueprints that package views and other types of content, you make your code and assets more reusable across your applications. You'll learn more about Flask Blueprint reusability in the following section.

How to Use Flask Blueprints to Improve Code Reuse

Besides code organization, there’s another advantage to structuring your Flask application as a collection of independent components. You can reuse these components even across different applications! For example, if you created a Flask Blueprint that provides functionality for a contact form, then you can reuse it in all your applications.

You can also leverage Blueprints created by other developers to accelerate your work. While there’s no centralized repository for existing Flask Blueprints, you can find them using the Python Package Index, GitHub Search, and web search engines. You can learn more about searching PyPI packages in What Is Pip? A Guide for New Pythonistas.

There are various Flask Blueprints and Flask Extensions (which are implemented using Blueprints) that provide functionality that you may find useful:

  • Authentication
  • Admin/CRUD generation
  • CMS functionality
  • And more!

Instead of coding your application from scratch, you may consider searching for an existing Flask Blueprint or Extension that you can reuse. Leveraging third-party Blueprints and Extensions can help you to reduce development time and keep your focus on your application’s core logic!

Conclusion

In this tutorial, you’ve seen how Flask Blueprints work, how to use them, and how they can help you to organize your application’s code. Flask Blueprints are a great tool for dealing with application complexity as it increases.

You’ve learned:

  • What Flask Blueprints are and how they work
  • How you can implement and use a Flask Blueprint
  • How Flask Blueprints can help you to organize your application’s code
  • How you can use Flask Blueprints to ease the reusability of your own and third-party components
  • How using a Flask Blueprint in your project can reduce development time

You can use what you’ve learned in this tutorial to start organizing your applications as a set of blueprints. When you architect your applications this way, you’ll improve code reuse, maintainability, and teamwork!


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Python Bytes: #167 Cheating at Kaggle and uWSGI in prod

Podcast.__init__: Build Your Own Personal Data Repository With Nostalgia


Summary

The companies that we entrust our personal data to are using that information to gain extensive insights into our lives and habits while not always making those findings accessible to us. Pascal van Kooten decided that he wanted to have the same capabilities to mine his personal data, so he created the Nostalgia project to integrate his various data sources and query across them. In this episode he shares his motivation for creating the project, how he is using it in his day-to-day, and how he is planning to evolve it in the future. If you’re interested in learning more about yourself and your habits using the personal data that you share with the various services you use then listen now to learn more.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host as usual is Tobias Macey and today I’m interviewing Pascal van Kooten about his nostalgia project, a nascent framework for taking control of your personal data

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing your mission with the nostalgia project?
    • How did the topic of personal data management come to be a focus for you?
  • What other options exist for users to be able to collect and manage their own data?
    • What capabilities were lacking in those options that made you feel the need to build Nostalgia?
  • What is your target audience for this set of projects?
  • How are you using Nostalgia in your own life?
    • What are some of the insights that you have been able to gain as a result of integrating your data with Nostalgia?
  • Can you describe the current architecture of the Nostalgia platform and how it has evolved since you began work on it?
    • What are some of the assumptions that you are using to direct the focus of your development and interaction design?
  • What are the minimum number of data sources needed to make this useful?
  • What are some of the challenges that you are facing in collating and integrating different data sources?
  • What are some of the drawbacks of using something like Nostalgia for managing your personal data?
  • What are some of the most interesting/challenging/unexpected aspects of your work on Nostalgia so far?
  • What do you have planned for the future of the project?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA


Kushal Das: Tor rpm package repository for Fedora and CentOS/RHEL


Now we have official Tor RPM repositories for Fedora and CentOS/RHEL. The support documentation is already in place.

Using this repository, you can get the latest Tor build for your distribution from the upstream project itself. Tor already provides similar packages for Debian/Ubuntu systems.

How to enable the repository in your Fedora box?

Add the following to /etc/yum.repos.d/tor.repo:

[tor]
name=Tor for Fedora $releasever - $basearch
baseurl=https://rpm.torproject.org/fedora/$releasever/$basearch
enabled=1
gpgcheck=1
gpgkey=https://rpm.torproject.org/fedora/public_gpg.key
cost=100

Then you can install the package via the regular dnf command.

$ sudo dnf install tor

You will have to import the new keys used for signing these packages.

Importing GPG key 0x3621CD35:
Userid : "Kushal Das (RPM Signing key) <kushal@torproject.org>"
Fingerprint: 999E C8E3 14BC 8D46 022D 6C7D E217 C30C 3621 CD35
From : https://rpm.torproject.org/fedora/public_gpg.key
Is this ok [y/N]: y

If you run a Tor relay on CentOS/RHEL (which you all should; it is one of the easiest ways to contribute to the project and help people worldwide), you can use a similar repository configuration.
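For reference, the CentOS/RHEL repository file would presumably mirror the Fedora one above with the distribution path swapped. The URLs below are an assumption, so double-check them against the official support documentation before using them:

[tor]
name=Tor for Enterprise Linux $releasever - $basearch
baseurl=https://rpm.torproject.org/centos/$releasever/$basearch
enabled=1
gpgcheck=1
gpgkey=https://rpm.torproject.org/centos/public_gpg.key
cost=100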

Learn PyQt: Adding images to PyQt5/PySide2 applications, using QLabel and QPixmap


Adding images to your application is a common requirement, whether you're building an image/photo viewer, or just want to add some decoration to your GUI. Unfortunately, because of how this is done in Qt, it can be a little bit tricky to work out at first.

In this short tutorial, we will look at how you can insert an external image into your PyQt5/Pyside2 application layout, using both code and Qt Designer.

What widget to use?

Since you're wanting to insert an image you might be expecting to use a widget named QImage or similar, but that would make a bit too much sense! QImage is actually Qt's image object type, which is used to store the actual image data for use within your application. The widget you use to display an image is QLabel.

The primary use of QLabel is of course to add labels to a UI, but it also has the ability to display an image (or pixmap) instead, covering the entire area of the widget. Below we'll look at how to use QLabel to display an image in your applications.

Using Qt Designer

First, create a MainWindow object in Qt Designer and add a "Label" to it. You can find Label under Display Widgets at the bottom of the left hand panel. Drag this onto the QMainWindow to add it.

MainWindow with a single QLabel added

Next, with the Label selected, look in the right hand QLabel properties panel for the pixmap property (scroll down to the blue region). From the property editor dropdown select "Choose File…" and select an image file to insert.

As you can see, the image is inserted, but it is kept at its original size, cropped to the boundaries of the QLabel box. You need to resize the QLabel to be able to see the entire image.

In the same controls panel, click to enable scaledContents.

When scaledContents is enabled, the image is resized to fit the bounding box of the QLabel widget. This shows the entire image at all times, although it does not respect the aspect ratio of the image if you resize the widget.

You can now save your UI to file (e.g. as mainwindow.ui).

To view the resulting UI, we can use the standard application template below. This loads the .ui file we've created (mainwindow.ui), creates the window, and starts up the application.

# PyQt5
import sys

from PyQt5 import QtWidgets, uic

app = QtWidgets.QApplication(sys.argv)
window = uic.loadUi("mainwindow.ui")
window.show()
app.exec()

# PySide2
import sys

from PySide2 import QtWidgets
from PySide2.QtUiTools import QUiLoader

loader = QUiLoader()
app = QtWidgets.QApplication(sys.argv)
window = loader.load("mainwindow.ui", None)
window.show()
app.exec_()

Running the above code will create a window, with the image displayed in the middle.

QtDesigner application showing a Cat

Using Code

Instead of using Qt Designer, you might also want to show an image in your application through code. As before we use a QLabel widget and add a pixmap image to it. This is done using the QLabel method .setPixmap(). The full code is shown below.

# PyQt5
import sys

from PyQt5.QtGui import QPixmap
from PyQt5.QtWidgets import QMainWindow, QApplication, QLabel


class MainWindow(QMainWindow):
    def __init__(self):
        super(MainWindow, self).__init__()
        self.title = "Image Viewer"
        self.setWindowTitle(self.title)

        label = QLabel(self)
        pixmap = QPixmap('cat.jpg')
        label.setPixmap(pixmap)

        self.setCentralWidget(label)
        self.resize(pixmap.width(), pixmap.height())


app = QApplication(sys.argv)
w = MainWindow()
w.show()
sys.exit(app.exec_())

# PySide2
import sys

from PySide2.QtGui import QPixmap
from PySide2.QtWidgets import QMainWindow, QApplication, QLabel


class MainWindow(QMainWindow):
    def __init__(self):
        super(MainWindow, self).__init__()
        self.title = "Image Viewer"
        self.setWindowTitle(self.title)

        label = QLabel(self)
        pixmap = QPixmap('cat.jpg')
        label.setPixmap(pixmap)

        self.setCentralWidget(label)
        self.resize(pixmap.width(), pixmap.height())


app = QApplication(sys.argv)
w = MainWindow()
w.show()
sys.exit(app.exec_())

The block of code below shows the process of creating the QLabel, creating a QPixmap object from our file cat.jpg (passed as a file path), setting this QPixmap onto the QLabel with .setPixmap() and then finally resizing the window to fit the image.

label = QLabel(self)
pixmap = QPixmap('cat.jpg')
label.setPixmap(pixmap)

self.setCentralWidget(label)
self.resize(pixmap.width(), pixmap.height())

Launching this code will show a window with the cat photo displayed and the window sized to the size of the image.

QMainWindow with Cat image displayed

Just as in Qt designer, you can call .setScaledContents(True) on your QLabel image to enable scaled mode, which resizes the image to fit the available space.

label = QLabel(self)
pixmap = QPixmap('cat.jpg')
label.setPixmap(pixmap)
label.setScaledContents(True)

self.setCentralWidget(label)
self.resize(pixmap.width(), pixmap.height())

Notice that you set the scaled state on the QLabel widget and not the image pixmap itself.
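As an aside not covered in the tutorial above: if you want scaling that preserves the aspect ratio, one option is to scale the QPixmap itself before setting it on the label. The target size here is arbitrary:

from PyQt5.QtCore import Qt

# Scale the pixmap to fit within 400x300 while keeping its aspect ratio.
scaled = pixmap.scaled(400, 300, Qt.KeepAspectRatio, Qt.SmoothTransformation)
label.setPixmap(scaled)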

Conclusion

In this quick tutorial we've covered how to insert images into your Qt UIs using QLabel both from Qt Designer and directly from PyQt5/PySide2 code.

Codementor: How to Hire Developers for Your Startup?

Are you looking to hire developers for your startup but don't know how? Here are the top things you should know before hiring.

Roberto Alsina: XRandRoll: a new program to configure displays


TL;DR: I wrote an experimental tool to manage X display configurations which I think is, or at least will be, better than others. You can check it out at https://github.com/ralsina/xrandroll

Now the real post:

I have been using a dual monitor configuration for a little while. However, it's a slightly special one.

  • One monitor is a normal Samsung 27" 1080p monitor. But it's in a monitor stand that allows it to rotate. So it's either horizontal or vertical.

  • The other monitor is the laptop's. BUT ... it's a 2-in-1 so it can be in "normal" or "tent" or "tablet" positions. And when it changes position it reconfigures itself automatically using KDE's awesome support for it. So it can be in 4 different orientations.

So, if you are counting, that gives me 8 different possible monitor configurations.

Also, while both screens have the same resolution, they have very different physical dimensions. Display configuration tools usually don't care about that (maybe with good reason!)

So, I wanted to experiment with how a tool would work that:

  • Looked / worked more or less like current tools

  • Allowed a little more flexibility

  • Did some fancy scale things with physical dimensions

  • Tried to support xrandr features that are ignored by most tools

  • Got its configuration from xrandr itself.

  • Applied its configuration via xrandr

  • Did screen mirroring better (say: exact mirroring when monitors are not the same mode? It does that)

So, xrandroll starts with several "philosophical" opinions in place.

In principle, it stores no configuration. It should obtain the state from xrandr. So it starts with a real reflection of your system as it exists.

It allows more display scaling flexibility. Independent scales per axis! A widget that does all the silly calculations to make things the same size!

It sort of does what I want now? In a prototypey-this-code-needs-to-be-rewritten way?

For the future, I intend to add capability to monitor your monitors (heh) and refresh itself if, for example, you plug in a monitor to your computer with xrandroll running. Also, some sort of service that configures monitors automatically as they are added / removed.

So, it may be worth taking a look at it. If you find bugs (there are bound to be dozens) you can file a bug attaching your xrandr output and I can debug them!

Interaction, UX, etc are still a WIP and subject to change. Experiments are being made. But it should be unable to destroy your system! You can probably even go back to whatever working config you had by clicking "Reset"!

Have fun and keep me posted.

Real Python: Sets in Python


In this course, you’ll learn about sets. They’re a useful data structure that allows you to do some complex operations more easily. They come up everywhere in the real world and are important to understand.

By the end of this course, you’ll know:

  • What a set is
  • How to define a set in Python
  • How to operate on a set
  • How to modify a set
  • When to use sets
  • Why sets are a good choice for checking membership
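As a small preview of these topics, here's what defining and operating on sets looks like in Python:

primes = {2, 3, 5, 7}
evens = set(range(0, 10, 2))

print(primes & evens)   # intersection: {2}
print(primes | evens)   # union: {0, 2, 3, 4, 5, 6, 7, 8}
print(3 in primes)      # fast membership test: True

primes.add(11)          # sets are mutable: add an element
primes.discard(2)       # remove an element if present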

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
