I’ve been thinking a bunch recently about how to use Hypothesis to test constrained optimisation problems. I put together this talk for PyData London, and yesterday I sketched out these tests for how you might test a solution to the knapsack problem (both of these are specifically combinatorial optimisation problems, but the principles I’m outlining here work in general).
Essentially there are only two properties to constrained optimisation problems:
- For all input problems, the optimised solution satisfies the constraints
- For all input problems and solutions satisfying the constraints of those problems, the solution does not score better than the optimised solution.
The first property is usable as is – you write a generator for the problems you’re trying to solve, write a test that runs the optimiser on a generated problem and spits out a solution, and check that the solution satisfies the constraints. This is a super useful test for the correctness of your optimiser and you should definitely write it.
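For the knapsack problem, that test might look something like the following sketch. The solve_knapsack(items, capacity) optimiser and the representation of items as (value, weight) pairs are assumptions for illustration, not the actual code I linked above.

```python
# Minimal sketch: generate random knapsack problems and check that the
# optimiser's solution satisfies the only constraint (it fits the capacity).
# solve_knapsack(items, capacity) is a hypothetical optimiser returning the
# chosen items; each item is a (value, weight) pair.
from hypothesis import given, strategies as st

items = st.lists(
    st.tuples(st.integers(min_value=1, max_value=100),   # value
              st.integers(min_value=1, max_value=100)),  # weight
    min_size=1,
)
capacities = st.integers(min_value=1, max_value=200)

@given(items, capacities)
def test_solution_satisfies_the_capacity_constraint(items, capacity):
    solution = solve_knapsack(items, capacity)
    assert sum(weight for _, weight in solution) <= capacity
```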
But really the one we’re interested in is the second property. We want to assert that the solution our optimiser provides us with is actually optimal.
The problem is that fuzz testers are not optimisers: unless your “optimal” solution is so bad that a randomly picked point beats it (in which case basically any sort of test will demonstrate this), you can’t reliably expect Hypothesis or similar to find an improvement just by asking for arbitrary data.
If we have a reference solution (e.g. by restricting to a sufficiently small corner of the space that we can solve the problem through brute force) then that’s one way of doing it: just run the problem through both your optimiser and the reference solution and assert that they produce solutions with the same score (note: not necessarily the same solution – there may be multiple equally good optima). This also works if your reference solution isn’t very good – if you implement, say, a cheap local optimiser and assert that running it on a randomly picked point doesn’t improve on your optimal solution, that’s still a pretty good test.
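Here’s a sketch of the brute-force variant, keeping the problems small enough that enumerating every subset is cheap (again, solve_knapsack is a hypothetical stand-in for your optimiser, not the linked code):

```python
# Sketch of a reference-solution test: keep the problems small enough that
# brute force over all subsets is feasible, then compare scores, not solutions.
from itertools import combinations
from hypothesis import given, strategies as st

small_items = st.lists(
    st.tuples(st.integers(min_value=1, max_value=10),   # value
              st.integers(min_value=1, max_value=10)),  # weight
    min_size=1, max_size=8,
)

def brute_force_best_score(items, capacity):
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(weight for _, weight in subset) <= capacity:
                best = max(best, sum(value for value, _ in subset))
    return best

@given(small_items, st.integers(min_value=1, max_value=30))
def test_optimiser_matches_brute_force(items, capacity):
    solution = solve_knapsack(items, capacity)  # hypothetical optimiser
    assert sum(value for value, _ in solution) == brute_force_best_score(items, capacity)
```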
But I’m interested in testing the case where we don’t have a reference solution. How would we do that?
Without further ado, here are some properties I’ve figured out that seem to work well:
- If we take the optimal solution and perturb it slightly, this should never improve on the optimal solution (see the sketch after this list). Bonus: If you have a local optimiser, perturb the solution slightly and then run the local optimiser on that perturbed solution.
- If we make the problem strictly easier (e.g. in the knapsack problem by raising the value of one of the items we’ve already chosen) then this should strictly improve the score if we rerun the optimiser.
- If we make the problem strictly harder (e.g. by removing one of the chosen items in the knapsack) and rerun the optimiser, this should never improve the score.
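Here’s a sketch of the first and third of these as Hypothesis tests, reusing the same hypothetical solve_knapsack and (value, weight) item representation as above. The swap-based perturbation is just one illustrative choice of perturbation.

```python
# Sketches of the perturbation and "make the problem strictly harder" properties,
# assuming the hypothetical solve_knapsack(items, capacity) from the earlier sketches.
from hypothesis import assume, given, strategies as st

items = st.lists(
    st.tuples(st.integers(min_value=1, max_value=100),   # value
              st.integers(min_value=1, max_value=100)),  # weight
    min_size=1,
)
capacities = st.integers(min_value=1, max_value=200)

def score(selection):
    return sum(value for value, _ in selection)

@given(items, capacities, st.data())
def test_perturbing_the_solution_never_improves_it(items, capacity, data):
    solution = solve_knapsack(items, capacity)
    unchosen = [item for item in items if item not in solution]
    assume(solution and unchosen)
    # One simple perturbation: swap a chosen item for an unchosen one.
    out = data.draw(st.sampled_from(solution))
    into = data.draw(st.sampled_from(unchosen))
    perturbed = [item for item in solution if item != out] + [into]
    assume(sum(weight for _, weight in perturbed) <= capacity)  # must stay feasible
    assert score(perturbed) <= score(solution)

@given(items, capacities, st.data())
def test_removing_a_chosen_item_never_improves_the_score(items, capacity, data):
    solution = solve_knapsack(items, capacity)
    assume(solution)  # need at least one chosen item to remove
    removed = data.draw(st.sampled_from(solution))
    remaining = list(items)
    remaining.remove(removed)  # the problem is now strictly harder
    assert score(solve_knapsack(remaining, capacity)) <= score(solution)
```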
The last seems the most promising to me, because it helps create situations where you break out of local optima: you potentially kill off the local optimum you’ve found yourself in and see if you can find a better one. Notably, both of the tests that demonstrated a problem in the knapsack algorithm (increasing the weight of a chosen item, removing a chosen item) are of this form.
The first is also useful, particularly for testing a local optimiser, because it helps you find new steps you can add to the optimisation process! e.g. if you find that reversing the solution and rerunning the optimiser on the reversed solution sometimes improves matters, why not iterate that to a fixed point in your optimisation process?
The middle one is, I suspect, not that useful, because in general the changes that make the problem easier do so by promoting things that were already chosen, so the optimiser will tend to make the same choices. I might be missing cases where this is actually a useful thing to be doing.
I’m going to keep an eye on this, as it seems to be an under-explored area (not completely unexplored – there’s some research on doing more classic fuzz testing on SMT solvers that I need to read up on) and I think it might inform useful heuristics for other places where you can do property based testing (machine learning for example is something that I’m interested in figuring out how to do).