Do the work (Janusworx): Generating Markdown from HTML

August 23, 2019, 5:45 am

2019-08-21

Started with the problem,
Need to take in an md file and then generate an html file.

Hint given, use a package from PyPI.

Decided to use the Markdown package from PyPI.
Looks good to me.

Was advised to work off a branch while developing.
Thank God for friends who teach you good habits.
Looking up how to do that in git now.

Watched Git videos for about an hour.
Learnt lots about branches. Giving up for today. It’s 8 in the eve and I am tired.

2019-08-22

Captain’s Log, Stardate something, something :P
Let’s see what the day holds.

Figured out how to read in a file.
Figured out how to convert it.
Figured out how to write to a file safely.
Now gotta figure out how to write to the same file name as the one I feed it.
11:44:24, went on a little rabbit hunt for how to type times in here :)
11:45:21, first time ever, that I’ve hit a flow state, doing Python
12:18:03, got it working, yea baby!
12:40:55, Kushal suggested corrections and improvements. getting to work on those later.
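
The three steps logged above (read a file, convert it, write the result safely under the same name) could look roughly like the sketch below. It is a guess at the shape of such a script, not the code from this log: `convert_file` and its `convert` parameter are made-up names, and writing to a temporary file before renaming it is just one common reading of "write safely".

```python
import os
import tempfile
from pathlib import Path


def convert_file(source, convert):
    """Read `source`, run `convert` on its text, write `<same name>.html`."""
    source = Path(source)
    target = source.with_suffix(".html")  # notes.md -> notes.html
    html = convert(source.read_text(encoding="utf-8"))

    # Write to a temporary file in the same folder, then rename it into
    # place, so a crash never leaves a half-written .html file behind.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(html)
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return target
```

With the Markdown package from PyPI installed, `convert_file("notes.md", markdown.markdown)` would plug in the real converter; any function that maps text to text works.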

2019-08-23

Captain’s Log, stardate, weekend on terra begins XD

09:04, beginning with Kushal’s suggestions
09:38, beginning now. :) had a power cut
End day. Nothing done.

↧

Continuum Analytics Blog: How to Build a Custom Anaconda Installer for R

August 23, 2019, 7:00 am

A frequent question on the Anaconda Community mailing list is how to package R with conda for distribution. Depending on the use case, one option may be to use conda to move environments. This requires…

The post How to Build a Custom Anaconda Installer for R appeared first on Anaconda.

↧

PSF GSoC students blogs: Check in: Final

August 23, 2019, 7:49 am

1. What did you do this week?

Polished my final evaluation
Started working on how to add Chianti levels and lines to the atomic files (PR #150)
GSoC'19 has ended!

2. What is coming up next?

I'll keep working with the TARDIS team and contributing to their codebase :)

3. Did you get stuck anywhere?

No, I didn't.

↧

Peter Bengtsson: Train your own spell corrector with TextBlob

August 23, 2019, 7:52 am

TextBlob is a wonderful Python library. It wraps nltk with a really pleasant API. Out of the box, you get a spell-corrector. From the tutorial:

>>> from textblob import TextBlob
>>> b = TextBlob("I havv goood speling!")
>>> str(b.correct())
'I have good spelling!'

It works because the library ships with this text file: en-spelling.txt. It's about 30,000 lines long and looks like this:

;;;   Based on several public domain books from Project Gutenberg
;;;   and frequency lists from Wiktionary and the British National Corpus.
;;;   http://norvig.com/big.txt
;;;   
a 21155
aah 1
aaron 5
ab 2
aback 3
abacus 1
abandon 32
abandoned 72
abandoning 27
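
Reading that format takes only a few lines. The sketch below is an illustration of the "word, then count" layout shown above, not TextBlob's own loader; `load_frequencies` is a made-up name, and treating `;;;` lines as comments is an assumption drawn from the header of the sample.

```python
def load_frequencies(lines, comment=";;;"):
    """Turn 'word count' lines into a {word: count} dict."""
    frequencies = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(comment):
            continue  # skip blank lines and the ';;;' header
        word, count = line.split()
        frequencies[word] = int(count)
    return frequencies


sample = """\
;;;   http://norvig.com/big.txt
a 21155
aah 1
abandon 32
"""
print(load_frequencies(sample.splitlines()))
# {'a': 21155, 'aah': 1, 'abandon': 32}
```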

That gave me an idea! How about I use the TextBlob API but bring my own text as the training model. It doesn't have to be all that complicated.

The challenge

(Note: All the code I used for this demo is available here: github.com/peterbe/spellthese)

I found this site that lists "Top 1,000 Baby Boy Names". From that list, randomly pick a couple and mess with their spelling. Like, remove letters, add letters, and swap letters.

So, 5 random names now look like this:

▶ python challenge.py
RIGHT: jameson  TYPOED: jamesone
RIGHT: abel     TYPOED: aabel
RIGHT: wesley   TYPOED: welsey
RIGHT: thomas   TYPOED: thhomas
RIGHT: bryson   TYPOED: brysn
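
A typo generator along those lines (remove a letter, add a letter, or swap two neighbours) might be sketched like this. It is not the code from the linked repository; `make_typo` is a hypothetical helper, it assumes names of at least two letters, and a swap of two identical neighbours can occasionally return the name unchanged.

```python
import random
import string


def make_typo(name, rng=random):
    """Return `name` with one random edit: drop, add, or swap a letter."""
    kind = rng.choice(["remove", "add", "swap"])
    i = rng.randrange(len(name))
    if kind == "remove":
        return name[:i] + name[i + 1:]
    if kind == "add":
        return name[:i] + rng.choice(string.ascii_lowercase) + name[i:]
    # swap: exchange the letter at i with its right-hand neighbour
    i = min(i, len(name) - 2)
    return name[:i] + name[i + 1] + name[i] + name[i + 2:]


for name in ["jameson", "abel", "wesley"]:
    print("RIGHT:", name, "TYPOED:", make_typo(name))
```

Passing a seeded `random.Random` as `rng` makes the typos reproducible, which helps when comparing two correctors on the same input.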

Imagine some application, where fat-fingered users typo those names on the right-hand side, and your job is to map that back to the correct spelling.

First, let's use the built-in TextBlob.correct. A bit simplified, but it looks like this:

from textblob import TextBlob

correct, typo = get_random_name()
b = TextBlob(typo)
result = str(b.correct())
right = correct == result
...

And the results:

▶ python test.py
ORIGIN         TYPO           RESULT         WORKED?
jesus          jess           less           Fail
austin         ausin          austin         Yes!
julian         juluian        julian         Yes!
carter         crarter        charter        Fail
emmett         emett          met            Fail
daniel         daiel          daniel         Yes!
luca           lua            la             Fail
anthony        anthonyh       anthony        Yes!
damian         daiman         cabman         Fail
kevin          keevin         keeping        Fail
Right 40.0% of the time

Buuh! Not very impressive. So what went wrong there? Well, the word met is much more common than emmett and the same goes for words like less, charter, keeping etc. You know, because English.

The solution

The solution is actually really simple. You just crack open the classes out of textblob like this:

from textblob import TextBlob
from textblob.en import Spelling

path = "spelling-model.txt"
spelling = Spelling(path=path)
# Here, 'names' is a list of all the 1,000 correctly spelled names.
# e.g. ['Liam', 'Noah', 'William', 'James', ...
spelling.train(" ".join(names), path)

Now, instead of corrected = str(TextBlob(typo).correct()) we do result = spelling.suggest(typo)[0][0] as demonstrated here:

correct, typo = get_random_name()
b = spelling.suggest(typo)
result = b[0][0]
right = correct == result
...

So, let's compare the two "side by side" and see how this works out. Here's the output of running with 20 randomly selected names:

▶ python test.py
UNTRAINED...
ORIGIN         TYPO           RESULT         WORKED?
juan           jaun           juan           Yes!
ethan          etha           the            Fail
bryson         brysn          bryan          Fail
hudson         hudsn          hudson         Yes!
oliver         roliver        oliver         Yes!
ryan           rnyan          ran            Fail
cameron        caeron         carron         Fail
christopher    hristopher     christopher    Yes!
elias          leias          elias          Yes!
xavier         xvaier         xvaier         Fail
justin         justi          just           Fail
leo            lo             lo             Fail
adrian         adian          adrian         Yes!
jonah          ojnah          noah           Fail
calvin         cavlin         calvin         Yes!
jose           joe            joe            Fail
carter         arter          after          Fail
braxton        brxton         brixton        Fail
owen           wen            wen            Fail
thomas         thoms          thomas         Yes!
Right 40.0% of the time

TRAINED...
ORIGIN         TYPO           RESULT         WORKED?
landon         landlon        landon         Yes
sebastian      sebstian       sebastian      Yes
evan           ean            ian            Fail
isaac          isaca          isaac          Yes
matthew        matthtew       matthew        Yes
waylon         ywaylon        waylon         Yes
sebastian      sebastina      sebastian      Yes
adrian         darian         damian         Fail
david          dvaid          david          Yes
calvin         calivn         calvin         Yes
jose           ojse           jose           Yes
carlos         arlos          carlos         Yes
wyatt          wyatta         wyatt          Yes
joshua         jsohua         joshua         Yes
anthony        antohny        anthony        Yes
christian      chrisian       christian      Yes
tristan        tristain       tristan        Yes
theodore       therodore      theodore       Yes
christopher    christophr     christopher    Yes
joshua         oshua          joshua         Yes
Right 90.0% of the time

See, with very little effort you can go from 40% correct to 90% correct.

Note, that the output of something like spelling.suggest('darian') is actually a list like this: [('damian', 0.5), ('adrian', 0.5)] and you can use that in your application. For example:

<li><a href="?name=damian">Did you mean <b>damian</b></a></li>
<li><a href="?name=adrian">Did you mean <b>adrian</b></a></li>
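
Turning such a list into that markup is a small loop. The sketch below assumes the `[(word, score), ...]` shape shown above; `did_you_mean` is a made-up helper name, and the URL quoting and HTML escaping are precautions added here, not something the demo does.

```python
from html import escape
from urllib.parse import quote


def did_you_mean(suggestions):
    """Render [(word, score), ...] as 'Did you mean' list items."""
    items = []
    for word, _score in suggestions:
        items.append(
            '<li><a href="?name={}">Did you mean <b>{}</b></a></li>'.format(
                quote(word), escape(word)
            )
        )
    return "\n".join(items)


print(did_you_mean([("damian", 0.5), ("adrian", 0.5)]))
```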

Bonus and conclusion

Ultimately, what TextBlob does is a re-implementation of Peter Norvig's original implementation from 2007. I, too, wrote my own implementation in 2007. Depending on your needs, you can figure out the licensing of that source code, lift it out, and adapt it in your own custom ways. But TextBlob wraps it up nicely for you.
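
For a feel of what is being wrapped, here is a condensed sketch in the spirit of Norvig's corrector. It is neither his code nor TextBlob's: it only looks one edit away (the original also considers two), and the tiny vocabulary is an assumption made up from names used earlier in this post.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, swap, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)


def correct(word, counts):
    """Pick the most frequent known word within one edit of `word`."""
    if word in counts:
        return word
    known = [w for w in edits1(word) if w in counts]
    return max(known, key=counts.get) if known else word


counts = Counter("jameson abel wesley thomas bryson".split())
print(correct("brysn", counts))   # bryson
print(correct("welsey", counts))  # wesley
```

The frequency table is what decides between several known candidates, which is why swapping in your own word list changes the results so much.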

When you use the textblob.en.Spelling class you have some choices. First, like I did in my demo:

path = "spelling-model.txt"
spelling = Spelling(path=path)
spelling.train(my_space_separated_text_blob, path)

What that does is create a file spelling-model.txt that wasn't there before. It looks like this (in my demo):

▶ head spelling-model.txt
aaron 1
abel 1
adam 1
adrian 1
aiden 1
alexander 1
andrew 1
angel 1
anthony 1
asher 1

The number (on the right) there is the "frequency" of the word. But what if you have a "scoring" number of your own? Perhaps, in your application, you just know that adrian is more right than damian. Then, you can make your own file:

Suppose the text file ("spelling-model-weighted.txt") contains lines like this:

...
adrian 8
damian 3
...

Now, the output becomes:

>>> import os
>>> from textblob.en import Spelling
>>> path = "spelling-model-weighted.txt"
>>> assert os.path.isfile(path)
>>> spelling = Spelling(path=path)
>>> spelling.suggest('darian')
[('adrian', 0.7272727272727273), ('damian', 0.2727272727272727)]

Based on the weighting, these numbers add up. I.e. 3 / (3 + 8) == 0.2727272727272727
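
That arithmetic can be reproduced by hand. The sketch below is only the normalisation implied by the output above (each count divided by the sum of the candidates' counts), not TextBlob's internal code; `normalise` is a made-up name.

```python
def normalise(candidates):
    """Scale raw counts so the scores of all candidates sum to 1."""
    total = sum(count for _word, count in candidates)
    scored = [(word, count / total) for word, count in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# adrian scores 8 / (3 + 8), damian scores 3 / (3 + 8)
print(normalise([("damian", 3), ("adrian", 8)]))
```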

I hope it inspires you to write your own spelling application using TextBlob.

For example, you can feed it the names of your products on an e-commerce site. The .txt file might bloat if you have too many, but note that the 30K-line en-spelling.txt is only 314KB and it loads in...:

>>> from textblob import TextBlob
>>> from time import perf_counter
>>> b = TextBlob("I havv goood speling!")
>>> t0 = perf_counter(); right = b.correct() ; t1 = perf_counter()
>>> t1 - t0
0.07055813199999861

...70ms for 30,000 words.

↧

Codementor: How To Learn Any Programming Language Online in 2019

August 23, 2019, 8:03 am

The different programming languages, and where to learn

↧

Catalin George Festila: Python 3.7.3 : Using the flask - part 015.

August 23, 2019, 11:32 pm

In this tutorial, I will show you how to run database migrations in a Flask project. Because my laptop is gone, I am using my old Linux machine. First, you need to install these Python modules with the --user argument on Linux:

[mythcat@desk my_flask]$ pip3 install flask-migrate --user
...
[mythcat@desk my_flask]$ pip3 install flask-script --user

Let's test this new issue with server.py file by adding

↧

PSF GSoC students blogs: Thirteenth week of GSoC: Final Checkin

August 24, 2019, 1:10 am

1. What did you do this week?

I have compiled a list for week 13 in my changelog here: https://github.com/sappelhoff/gsoc2019/blob/master/changelog.md#week-13

This was the final week of my GSoC 2019. I have written a final report here: https://github.com/sappelhoff/gsoc2019/blob/master/FINAL_REPORT.md

2. What is coming up next?

Next up I will focus on my PhD work, hopefully making use of many of the features I helped to bring about!

I am sure that this will also entail making many bug reports, ... and fixing them. Although it will probably take a longer time, because the focused GSoC time is over for now.

3. Did you get stuck anywhere?

This week, we had many discussions on digitized position data of electrophysiology sensors. This discussion eventually led into whether we want to support template data in BIDS at all ... or whether BIDS should be just for true, measured data.

See:

↧

PSF GSoC students blogs: Final Blog Post GSOC19

August 24, 2019, 5:24 am

Hi everyone!

It’s time for the last GSOC blog. I must say that it has been nice working with all of you. I have learned a lot and faced many professional challenges. In the end it was worth it. I hope the GSOC experience will help me in the future and that my project will be useful to open source. You can check my final report at https://github.com/teabolt/gsoc19-eli5-gradcam.

Shout out to both of my mentors for taking me on board!

Konstantin, https://github.com/lopuhin
- Thank you for the constant help, knowledge, and advice! A mentor that I would recommend to any student!
Mikhail, https://github.com/kmike
- Thanks for jumping in at the right moments with important observations and help making the right decisions!

Shout out to other students

Anubhav, https://github.com/anubhavp28
Vipul, https://github.com/vipulgupta2048
Leonardo, https://github.com/leovictorsr
Ashwin, https://github.com/AshwinB-hat
- Though not here for the final evaluation, thanks for sticking around and hopefully you had other good endeavours this summer!

Finally shout out to all the other participants in Scrapy, especially

Ian, https://github.com/imduffy15
- For helping me get into GSOC in the first place, and for all your engineering advice!
Cathal, https://github.com/cathalgarvey
- Thanks for your work as an org-admin!

So that’s it. Hope to see you in open source or in another GSOC.

Peace,

Tomas Baltrunas

↧

PSF GSoC students blogs: Multi-touch update

August 24, 2019, 1:26 pm

Alas, time has caught up to me. I will not be able to get the touch trackball controls working in time for the submission deadline. Instead, I need to use this remaining time to package up my work and create a more thorough write-up that details what I have done and what is left to be done.

I'm still excited, though, since I got all of the minimum requirements of my proposal done and have gotten a good start on a few extras. I will be working beyond the end of GSoC to get iOS support to a ready state.

See you soon!

↧

Weekly Python StackOverflow Report: (cxci) stackoverflow python report

August 24, 2019, 1:59 pm

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2019-08-24 20:58:57 GMT

↧

PSF GSoC students blogs: Week #12

August 24, 2019, 2:36 pm

I tried, unsuccessfully, to resolve the issue with dimensions in NMF. The main problem was specifying the correct dimensions for the matrix operations within the NMF algorithm, and I have yet to identify them. Furthermore, there are no incremental algorithms for NMF as there are for PCA, so NMF becomes infeasible when the dataset is large. There are several heuristics to get around this scalability issue, notably parallel NMF, which divides the columns (i.e., features) of the data matrix into several subsets, performs NMF independently on each column subset, and then joins the results at the end. Unfortunately, this algorithm was not feasible for LiberTEM because, with LiberTEM, one only has access to the rows of the data matrix (i.e., images) and not to the full columns (i.e., feature vectors) at each step. I also tried to clean up and reorganize the Jupyter notebook.
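
For reference, the dimension bookkeeping behind NMF can be written out as below. This is the textbook formulation with the Lee and Seung multiplicative updates, stated under the assumption that rows of the data matrix are images and columns are features, as in the paragraph above. The symbols m, n, and r (number of images, number of features, number of components) are introduced here for illustration, and this is not LiberTEM's implementation.

```latex
V \in \mathbb{R}_{\ge 0}^{m \times n}, \quad
W \in \mathbb{R}_{\ge 0}^{m \times r}, \quad
H \in \mathbb{R}_{\ge 0}^{r \times n}, \qquad
V \approx W H

H \leftarrow H \odot \frac{W^{\top} V}{W^{\top} W H}
\quad (\text{every factor is } r \times n), \qquad
W \leftarrow W \odot \frac{V H^{\top}}{W H H^{\top}}
\quad (\text{every factor is } m \times r)
```

Here the product and the fraction are taken element by element. Note that the term W^T V sums over all m rows, so the update for H needs a pass over every image.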

What did I do this week?

Work on NMF

What will I work on next week?

Write documentation

↧

PSF GSoC students blogs: GSoC weekly blog

August 25, 2019, 12:44 am

Hello everyone!

This week is the second last week and next week we will have our final evaluations. This week I just fixed and improved my previous code.

The other student was working on the Icons cheatsheet and after he completed it, I used some of his components and used them in my code too so that the UI looks consistent. Also the API was having some issues with custom font generation so I am currently working on that.

Thanks for reading

Cheers!

↧

PSF GSoC students blogs: GSoC Weekly Checkin

August 25, 2019, 1:03 am

Hello everyone!

What did I do this week?

This was the last week of GSoC. I fixed some previous code and some errors with the Icons-picker API. Right now I am working to add Animated Icons support for the Icons Picker. The back-end part of it is done, so if an API request is made with Animated icons name in the JSON, the custom font will have only those icons. Now I have to implement the use of API and Animated icons selector in front end.

Also, this week I submitted the final evaluation. This is the link that has all the project contributions I made: https://github.com/kbhutani0001/GSoC-2019-report/blob/master/README.md

What is coming up next week?

Nothing :'(
GSoC ends next week. But I will still keep contributing to EOS and other open source projects

Did I get stuck anywhere?

Yes, in the front end of Animated icons selector, the animated icons are written in css so they are inheriting other CSS too. That caused a bit of a problem.

Till next time,
Cheers!

↧

Catalin George Festila: Python 3.7.3 : Using the flask - part 016.

August 24, 2019, 10:29 pm

Today I tested a new feature of Flask version 1.1.1.

[mythcat@desk my_flask]$ pip list | grep Flask
Flask 1.1.1
Flask-Login 0.4.1
Flask-Mail 0.9.1
Flask-Migrate 2.5.2
Flask-Script 2.0.6
Flask-SQLAlchemy 2.4.0
Flask-WTF 0.14.2

This

↧

PSF GSoC students blogs: Final week

August 25, 2019, 4:39 am

This is going to be a blog post about cleaning up the work I was doing and revisiting some old things I have done during the GSoC period.

What did I do this week?

I got a patch merged that adds an `experimental` argument to the config registrar. However, reviewers asked me to send a compatibility fix as a follow-up, as the `perf` extension was failing for older versions of Mercurial. So I sent a patch[1] for that, and the reviewers folded it into the already merged patch. My mentor also asked some questions about a few of the patches in the `--unresolved` stack; I investigated them and answered. A couple of patches were stuck in the middle because I was not able to reproduce the test results from the bug description. So I investigated that and updated the patches[2][3]. Then I started documenting what I have done during the whole summer as the final GSoC report. You can read it here[4].

Did you get stuck anywhere?

I was stuck in reproducing the test results given in one of the bug descriptions. But, I found that it was due to a missing indentation. It might sound trivial but, it was not. This was later resolved and the patch was accepted by my mentor,

↧

Do the work (Janusworx): Learning Python

August 25, 2019, 6:16 am

Did I need to read a fifteen hundred page book to learn Python?
At the end of fourteen hundred pages, I can safely assure you, I did not.

If you want to just solve your pressing issues or scratch your itch, or just plain get started with programming (and programming in Python specifically), I’d recommend starting with a simple, fast paced book, like Python for you and me, and then doing tons of practice.¹

Mark Lutz, as he closes the book, himself laments that Python has gotten too big to hold in your head. And by doing so, has lost some of the simplicity and the joy and fun and the magic, Python held for the early adopters of the language.

And yet, having said all this, boy, am I glad, I read the book.

This is a master class from a master.

I may not have understood everything. I may have skimmed a chapter or two (Lutz assures me, it’s ok :P), but what this book has done to my mind, the furrows it has ploughed, will be with me forever.

I have been trying to get into the book, multiple times since I bought it.
It took me a long time, before, as Mortimer Adler puts it, I could come to terms with the author.
The only reason I kept coming back, was because, Mark’s earnest teaching voice shines through, and I loved it, even if I did not quite get what he was saying in the beginning.
And the reason I could get through it (and enjoy it) this time was because I decided to follow his advice and follow along on the computer.
To actually type in the code, and see what happens.

Yes, the book is big, yes, the concepts are repeated a couple of times, but as I progressed, I could feel him sweating the small stuff over and over, just so that I could understand things, so that I would not get scared away.

Time and again, the book reassured me, that what was said, was not as complicated or hard as it read on the page.
And that turned out to be true as I kept trying the examples out.

While I still have a long way to go, before I can remotely be called fluent, I know this book will have been a big reason, I will be.²

This book was last updated, oh, some six years ago, and yet unless Python decides to change radically, I dare say, the principles in here will stand the test of time.

This was a great read and will serve as an awesome reference on my Python journey.
If you are slightly kooky like me, and you want to know, why things are the way they are as you learn to program in Python, get this book.

¹ Which is actually what I am doing.
² Besides the practice, that is.

↧

PSF GSoC students blogs: Final Checkin

August 25, 2019, 7:10 am

Hey everyone! This is my final blog post here. GSoC 19 has finally come to its end. This week I mostly worked on hg update --abort.

What did I do this week?

This week I worked on writing intensive tests for hg update --abort.
As suggested by @mharbison72, I worked on cases that take subrepos into consideration: cases where a subrepo is dirty, and cases where more than one subrepo is present and the conflicts for some, but not all, of them are resolved, so as to check the unmark feature. The patch is still under review, but it too will soon get accepted.

Did you get stuck anywhere?
I did not get stuck on anything major this week. I did have to perfect the code considering some endpoint cases, but it came around fine enough.

Finally, I would like to thank Pulkit for reviewing the patches and solving the doubts despite having a busy schedule, Martin and other people in the community for teaching the basic etiquettes and facts required to be a better open-source contributor and solving the doubts no matter how silly they were.

Adios !

↧

TechBeamers Python: A Beginners Guide to Become a Machine Learning Engineer

August 25, 2019, 12:12 pm

Do you wish to become a machine learning engineer? Yes, why not, you should because this job has the highest no. of openings in 2019 with $75K as the baseline salary. Also, it is an engineering stream, which is highly technical and provides countless opportunities to learn. By working in this field, you can not only improve your finances but also grow intellectually. This post intends to highlight all the steps that are essential for becoming a machine learning engineer. You’ll get to learn – What is Machine Learning, the job of a Machine Learning Engineer, his/her roles and responsibilities.

The post A Beginners Guide to Become a Machine Learning Engineer appeared first on Learn Programming and Software Testing.

↧

PSF GSoC students blogs: [Blog #5] Time just seems to fly.

August 25, 2019, 12:52 pm

Hello! This is my second-to-last blog post for GSoC 2019 - time has gone by so quickly. I spent this week documenting Protego’s API in detail. I opened a pull request to add Protego integration in Scrapy. I added a PyPy test environment and modified Protego to treat a non-terminal dollar sign as an ordinary character.
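
To illustrate the dollar-sign rule, here is a rough sketch of how a robots.txt path pattern could be compiled so that `$` only anchors when it is the last character. It is not Protego's implementation; `pattern_to_regex` and `matches` are made-up names, and prefix matching with `*` as a wildcard is assumed as the usual robots.txt convention.

```python
import re


def pattern_to_regex(pattern):
    """Compile a robots.txt path pattern into a regular expression.

    '*' matches any run of characters; '$' anchors the match only when it
    is the last character, and is an ordinary character anywhere else.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    parts = [".*" if ch == "*" else re.escape(ch) for ch in body]
    return re.compile("".join(parts) + ("$" if anchored else ""))


def matches(pattern, path):
    """True if `path` starts with something that fits `pattern`."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.php$", "/index.php"))      # True
print(matches("/*.php$", "/index.php?x=1"))  # False
print(matches("/a$b", "/a$b/page"))          # True
```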

Up next, I will start the process of transferring Protego to the Scrapy organisation on GitHub. I will also modify `SitemapCrawler` in Scrapy to use the new interface, and implement a `ROBOTSTXT_USER_AGENT` setting in Scrapy.

I faced minor problems trying to set up the PyPy environment in Travis. With help from my mentors, I was able to resolve the issue.

↧

PSF GSoC students blogs: [Blog #6] Part of the journey is the end.

August 25, 2019, 1:07 pm

Part of the journey is the end. It is time for me to work on my final work report for final evaluation of Google Summer of Code 2019. This week, I will devote my time mainly to write my final report and final blog post. If time permits, I will work on my PRs from last week.

Last week, I worked on getting Travis to push automatically to PyPI and I redid benchmarking.

↧