Channel: Planet Python

David MacIver: Looking into doing a PhD

As regular readers of this blog have probably figured out, I’m a researchy sort of person.

A lot of my hobbies – maths, voting theory, weird corners of programming, etc – are research oriented, and most of my work has had some sort of research slant to it.

For the last two years I’ve basically been engaged in a research project working on Hypothesis. It’s come quite far in that time, and I feel reasonably comfortable saying that it’s the best open source property-based testing library on most metrics you’d care to choose. It has a number of novel features and implementation details that advance the state of the art.

It’s been pretty great working on Hypothesis like this, but it’s also been incredibly frustrating.

The big problem is that I do not have an academic background. I have a master’s in mathematics (more technically, I have a BA, an MA, and a CASM. Cambridge is weird. It’s entirely equivalent to a master’s in mathematics, though), but that’s where I stopped. Although it says “DR” in my online handle and the domain of this blog, those are just my initials and not my qualification.

As a result, I have little to no formal training or experience in doing academic research, and a similarly low understanding of who’s who and what’s what within the relevant fields. So I’ve been reading papers and trying to figure out the right people to talk to all on my own, and while it’s gone OK it’s still felt like fumbling around in the dark.

Which leads to the obvious solution that I spoilered in the title: If the problem is that I’m trying to do research outside of an academic context, the solution is to do research in an academic context.

So I’d like to do a PhD that is either about Hypothesis, or about something close enough to Hypothesis that each can benefit from the other.

There’s probably enough novel work in Hypothesis already that I could “just” clean it up, factor it out, and turn it into a PhD thesis as it is, but I’m not really expecting to do that (though I’d like that to be part of it). There are a number of additional directions that I think it would be worth exploring, and I expect most PhD funding will come with a focus subject attached which I would be happy to adapt to (a lot of the most interesting innovations in Hypothesis came because some external factor forced me to think about things in ways I wouldn’t otherwise have!).

In the absence of further factors, here are some of the directions for Hypothesis that I think it would be interesting to research further:

  • I have some current prototype work that really pares down Hypothesis to a single core testing primitive on which everything is built. That’s already the case to some degree, but the current primitive is rather messy and the new one is really much more elegant (it takes the core Hypothesis engine back to its eXplode origins and then rebuilds a lot of nice abstractions on top of that). I think this will work really well and it opens up a lot of possibilities for other novel abstractions built on top of it.
  • I’d like to pursue better grammar-based generation on top of Hypothesis – e.g. I’d like to make it easy to define and use some sort of Boltzmann sampler. This would significantly enlarge the set of things you can test with it, and would make it easy to build fuzzers for a wide variety of protocols while getting a lot of the benefits of Hypothesis (mostly example shrinking, but also any of the other improvements on this list) for free.
  • Conversely, I’d like to use some sort of grammar inference in its backend. At its core Hypothesis is a tool for transforming byte streams into structured data. Being able to infer a grammar for the underlying byte stream would help in a number of ways; most notably, it would significantly improve the assumption functionality. I’ve experimented with this in the past and not found e.g. L* search to work very well for this, but I’ve since read some more research that suggests that it might be practical.
  • I’d like to figure out how to integrate coverage-based information into Hypothesis. I’ve done experiments in the past and I’ve produced some prototypes that work pretty well if you run them for long enough, but they were not very useful because of the time constraints on Hypothesis running as part of a normal test suite. I’d like to see if I can improve on that.
  • I’m interested in using spies as a way of adding lightweight, concolic-testing-like features to Hypothesis. Possibly related is my Schroedinteger prototype, where values are kept in a suspension of one of a number of possibilities for as long as possible.
  • I’m interested in exploring parallel testing using Hypothesis. Historically this hasn’t been a priority because Python, but in principle this sort of testing is embarrassingly parallel, so it’s a shame not to take advantage of that.
  • I’m also interested in proving the claim that Hypothesis is an extremely portable set of testing primitives that are easy to implement in other languages. This probably isn’t research-worthy in and of itself, but it opens up a lot of potential other applications.
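
To make the “byte streams into structured data” idea above a bit more concrete, here is a toy sketch of it. This is not how Hypothesis is actually implemented; the names `ByteSource`, `draw_int_list` and `shrink` are all made up for illustration, and the shrinker is deliberately naive. The point it tries to show is that if every generator just reads bytes, then shrinking the bytes shrinks the generated value, whatever its type.

```python
class ByteSource:
    """Hands out bytes from a fixed buffer; reads past the end yield 0."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def draw_byte(self) -> int:
        value = self.data[self.pos] if self.pos < len(self.data) else 0
        self.pos += 1
        return value


def draw_int_list(src: ByteSource) -> list:
    """Hypothetical generator: an odd byte means 'one more element follows'."""
    out = []
    while src.draw_byte() % 2 == 1:
        out.append(src.draw_byte())
    return out


def shrink(buf: bytes, predicate) -> bytes:
    """Greedily simplify buf while predicate(parsed value) stays true.

    Only the raw bytes are edited (delete a byte, or lower a byte), so the
    shrinker knows nothing about lists or integers.
    """

    def still_interesting(candidate: bytes) -> bool:
        return predicate(draw_int_list(ByteSource(candidate)))

    def simpler_candidates(current: bytes):
        # Deletions first (shorter is simpler), then smaller byte values.
        for i in range(len(current)):
            yield current[:i] + current[i + 1:]
        for i in range(len(current)):
            for smaller in range(current[i]):
                yield current[:i] + bytes([smaller]) + current[i + 1:]

    improved = True
    while improved:
        improved = False
        for candidate in simpler_candidates(buf):
            if still_interesting(candidate):
                buf = candidate
                improved = True
                break  # restart from the new, simpler buffer
    return buf


# A "failing" example: any list whose sum is at least 100.
start = bytes([1, 200, 1, 50, 0])  # parses as [200, 50]
minimal = shrink(start, lambda xs: sum(xs) >= 100)
print(draw_int_list(ByteSource(minimal)))  # [100]
```

A real engine has to be much smarter about which byte edits to try, but even this version never needs a type-specific shrinker, which is why anything built on top of such a primitive gets example shrinking for free.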

I’m not particularly wed to any of these. They’re all things that I think would be both interestingly novel and would improve users’ lives, but one of the nice things about having so many ideas is that I can’t do all of them anyway, so I have to prioritise. Given that, adding more to the priority queue is no hardship at all!

Which, finally, brings me to the main point of the post: What I want from you.

I’m already looking into and approaching potential universities and interesting researchers there who might be good supervisors or able to recommend people who are. I’ve been in touch with a couple (some of whom might be reading this post. Hi), but I would also massively appreciate suggestions and introductions.

So, if you work in relevant areas, or know of people who do and who you think it would be useful for me to talk to, please drop me an email at david@drmaciver.com. Or just leave a comment on this blog post, tweet at me, etc.

