Random selection utilities used to be common in interviews, though less so in Python circles because of the built-in `random` module. Still, more advanced variants may come up. The first is a generalization of `shuffle` and `sample`.
In [1]:
import itertools
import random


def shuffled(iterable):
    """Generate values in random order for any iterable.

    Faster than `random.shuffle` if not all values are required.
    More flexible than `random.sample` if the desired number is unknown a priori.
    """
    values = list(iterable)
    while values:
        index = random.randrange(0, len(values))
        values[index], values[-1] = values[-1], values[index]
        yield values.pop()


list(itertools.islice(shuffled(range(10)), 5))
Out[1]:
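As a hedged illustration of the "unknown a priori" case, the sketch below draws a random item that satisfies a condition, stopping as soon as one is found. `first_random_match` is a hypothetical helper introduced only for this example, and `shuffled` is repeated so the block runs on its own.

```python
import random


def shuffled(iterable):
    """Same generator as above, repeated so this block is self-contained."""
    values = list(iterable)
    while values:
        index = random.randrange(len(values))
        values[index], values[-1] = values[-1], values[index]
        yield values.pop()


def first_random_match(iterable, predicate):
    """Hypothetical helper: a random item satisfying predicate, or None."""
    # Values are consumed lazily, so the shuffle stops at the first match.
    return next(filter(predicate, shuffled(iterable)), None)


random.seed(0)  # seeded only to make the demo repeatable
pick = first_random_match(range(100), lambda n: n % 7 == 0)
print(pick)  # some multiple of 7 below 100
```

Because every ordering is equally likely, the first match is uniformly distributed over the matching items, and only as many values are shuffled as it takes to find it.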
Next up is drawing a random sample in a single pass, e.g., when the data is being read from a large file. The solution rests on mathematical induction. For a sample of size k:
- the Nth element has a fair chance, k/N, of being selected
- each previously selected element has a fair chance, 1/k, of being the one replaced
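The induction can be spelled out as follows. This is a sketch, assuming $k \ge 1$ and writing $P_n$ (notation introduced here) for the probability that any given one of the first $n$ elements is in the selection after $n$ elements have been processed.

```latex
\begin{align*}
P_k &= 1 = \frac{k}{k}
  && \text{(base case: the first $k$ elements are all kept)} \\
P(\text{element } n+1 \text{ is selected}) &= \frac{k}{n+1} \\
P(\text{a held element survives step } n+1)
  &= 1 - \frac{k}{n+1} \cdot \frac{1}{k} = \frac{n}{n+1} \\
P_{n+1} &= P_n \cdot \frac{n}{n+1}
  = \frac{k}{n} \cdot \frac{n}{n+1} = \frac{k}{n+1}
\end{align*}
```

So after every step each element seen so far is held with the same probability, $k/n$, which is what a uniform sample requires.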
In [2]:
def sample(iterable, k):
    """Return a random sample from any iterable in a single pass.

    More memory efficient than `random.sample`.
    """
    it = iter(iterable)
    selection = list(itertools.islice(it, k))
    # error handling and shuffling are consistent with random.sample
    if not 0 <= k <= len(selection):
        raise ValueError("sample larger than population")
    random.shuffle(selection)
    for count, value in enumerate(it, k + 1):
        index = random.randrange(0, count)
        if index < len(selection):
            selection[index] = value
    return selection


sample(iter(range(10)), 5)
Out[2]:
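One way to gain some confidence in the fairness claim is an empirical check: over many trials, every value should land in the sample about k / n of the time. The sketch below assumes the `sample` function from above, repeated so the block runs on its own. `inclusion_frequencies` is a hypothetical helper, and the trial count is an arbitrary choice.

```python
import collections
import itertools
import random


def sample(iterable, k):
    """Single-pass sample, repeated from above so this block is self-contained."""
    it = iter(iterable)
    selection = list(itertools.islice(it, k))
    if not 0 <= k <= len(selection):
        raise ValueError("sample larger than population")
    random.shuffle(selection)
    for count, value in enumerate(it, k + 1):
        index = random.randrange(count)
        if index < len(selection):
            selection[index] = value
    return selection


def inclusion_frequencies(n, k, trials):
    """Hypothetical helper: fraction of trials in which each value was selected."""
    counts = collections.Counter()
    for _ in range(trials):
        counts.update(sample(iter(range(n)), k))
    return {value: counts[value] / trials for value in range(n)}


random.seed(0)  # seeded only to make the demo repeatable
frequencies = inclusion_frequencies(n=10, k=5, trials=20_000)
for value, frequency in frequencies.items():
    print(value, round(frequency, 3))  # each should be close to k / n = 0.5
```

A check like this does not prove uniformity, but a frequency far from k / n for any value would point to a bug such as an off-by-one in the `enumerate` start.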