The long-awaited first day at the Recurse Center, 26th Sep 2016, kicked off with a welcome note by Nicholas Bergson-Shilcock and David Albert. It was followed by events and activities to get to know the batchmates and the culture of RC, and it ended with a closing note by Sonali Sridhar.
By the end of the day, I had decided to build a BitTorrent client as my first project. I was at a crossroads, choosing between Python, Rust, and Go. After a quick chat with a batchmate, I decided to write the client in Python: I didn't know Rust well, and I had never written a BitTorrent client before. Fighting two battles at the same time is hard.
My experience with network applications is limited. At a previous job, I maintained a WebSocket server and fixed bugs in it. The BitTorrent client is my first major network application.
As a first step towards building the client, I started reading the Wikipedia article and the official proposal. An RC alum shared valuable resources: an unofficial proposal and a blog post written by an alum. I kept reading the Wikipedia article until I grasped how BitTorrent works at a high level. One afternoon, a few batchmates got together and discussed the protocol, the recommended strategy for downloading data, security, jargon, and authentication. We drew the high-level steps in the life cycle of a download on a whiteboard. It was an enlightening session and helped me crystallize my understanding.
Resisting the urge to code was hard. I started programming the client by parsing the torrent file, which is in bencode format. A significant portion of the data in the file is binary. The next step was to understand how the client communicates with the tracker, a server that stores information about the connected clients. The blog post suggested capturing all the packets during a torrent session. At first I underestimated the value of that advice; later, while reading the spec and scribbling code, I saw it. Both Wireshark and tcpdump were helpful. I like tcpdump because it is easy to use, lightweight, and runs on the command line, but viewing captured packets on the command line is hard. Wireshark decodes the data in each packet and renders it in a useful format, with many options such as sorting by protocol and viewing unencrypted packets. Then I implemented the communication layer for trackers using HTTP, not UDP. This leaves out some trackers; the Pirate Bay trackers, for example, use only UDP.
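To make the parsing step concrete, here is a minimal sketch, assuming the whole .torrent file (or a tracker response) is already in memory as bytes. Bencode has four types: integers (i42e), byte strings (4:spam), lists (l...e), and dictionaries (d...e). The names bdecode and parse_compact_peers are hypothetical and are not the code from my client. The second helper assumes the tracker returned its peer list in the compact form, where each peer is 6 bytes: a 4-byte IPv4 address followed by a 2-byte big-endian port.

```python
import socket
import struct


def bdecode(data: bytes):
    """Decode one complete bencoded value (ints, byte strings, lists, dicts)."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    """Decode the value starting at offset i; return (value, offset after it)."""
    token = data[i:i + 1]
    if token == b"i":
        # Integer: i<digits>e, e.g. i42e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if token == b"l":
        # List: l<value><value>...e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if token == b"d":
        # Dictionary: d<key><value>...e; keys are byte strings
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if token.isdigit():
        # Byte string: <length>:<raw bytes>, e.g. 4:spam
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        if end > len(data):
            raise ValueError("byte string runs past end of data")
        return data[start:end], end
    # Covers truncated input too: data[i:i + 1] is b"" past the end
    raise ValueError(f"unexpected {token!r} at offset {i}")


def parse_compact_peers(blob: bytes):
    """Split a compact peer list into (ip, port) pairs, 6 bytes per peer."""
    if len(blob) % 6:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for off in range(0, len(blob), 6):
        ip = socket.inet_ntoa(blob[off:off + 4])
        (port,) = struct.unpack(">H", blob[off + 4:off + 6])
        peers.append((ip, port))
    return peers
```

Byte strings come back as bytes rather than str on purpose: fields such as pieces (concatenated SHA-1 hashes) are raw binary and would not survive UTF-8 decoding.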
The next piece was building the components that communicate with peers: other clients that have the complete data (seeders) or partial data. This part was time-consuming for me, for various reasons.
- All communication between clients happens in binary. Having used JSON and other human-readable formats a lot, I found that debugging binary data took a lot of time.
- Understanding the message exchange format between the client and its peers was tricky, and each message type has its own encoding. A message carries its length, its type, and its data. Coming from a web application background, where readability and usability are given importance, this approach felt entirely different.
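To show what this binary framing looks like, here is a sketch of the outgoing side, based on my reading of the peer wire protocol in the official proposal. The handshake is the length of the protocol string, the string "BitTorrent protocol", 8 reserved bytes, the 20-byte info hash (the SHA-1 of the torrent's bencoded info dictionary), and a 20-byte peer ID. Every later message is a 4-byte big-endian length prefix, a 1-byte message ID, and the payload. The function names here are hypothetical.

```python
import struct

PROTOCOL = b"BitTorrent protocol"

# Message IDs defined by the peer wire protocol
CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED = 0, 1, 2, 3
HAVE, BITFIELD, REQUEST, PIECE, CANCEL = 4, 5, 6, 7, 8

# A keep-alive is just a zero length prefix with no ID or payload
KEEP_ALIVE = struct.pack(">I", 0)


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """<pstrlen><pstr><8 reserved bytes><info_hash><peer_id>, 68 bytes in total."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be exactly 20 bytes")
    return bytes([len(PROTOCOL)]) + PROTOCOL + bytes(8) + info_hash + peer_id


def encode_message(msg_id: int, payload: bytes = b"") -> bytes:
    """<4-byte big-endian length><1-byte message ID><payload>.

    The length counts the ID byte plus the payload, not the prefix itself.
    """
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload


def encode_request(index: int, begin: int, length: int) -> bytes:
    """Ask for `length` bytes starting at offset `begin` within piece `index`."""
    return encode_message(REQUEST, struct.pack(">III", index, begin, length))
```

Pieces are usually fetched in smaller blocks (16 KiB is a common choice), which is why a request names a piece index, an offset within it, and a length. For a peer ID, something like b"-PY0001-" + b"0" * 12 (a made-up value in the "-XXnnnn-" style many clients use) satisfies the 20-byte requirement.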
Before I started implementing the crux of the client, asyncio came to mind, and I found a reference implementation. That project was a great resource while building my torrent client. As of now, my client can parse the torrent file, contact the tracker to get information about seeders, contact peers, respond to messages, and request a piece of data. Together, these steps make up 80% of the first milestone of a BitTorrent client. Yes, I can hear you murmuring about the 80-20 rule!
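Here is a sketch of how asyncio fits the reading side, assuming each connection is the stream pair returned by asyncio.open_connection. StreamReader.readexactly suits the length-prefixed format well: read the 4-byte length, then exactly that many bytes. The names read_message, read_handshake, and talk_to_peer are hypothetical, and this is not the reference implementation's code. talk_to_peer only exchanges handshakes and reads one message; a real client would loop, track choke state, and send requests.

```python
import asyncio
import struct


async def read_handshake(reader: asyncio.StreamReader):
    """Read a peer handshake; return (protocol string, info_hash, peer_id)."""
    pstrlen = (await reader.readexactly(1))[0]
    pstr = await reader.readexactly(pstrlen)
    rest = await reader.readexactly(8 + 20 + 20)  # reserved, info_hash, peer_id
    return pstr, rest[8:28], rest[28:48]


async def read_message(reader: asyncio.StreamReader):
    """Read one length-prefixed message.

    Returns None for a keep-alive, otherwise (msg_id, payload).
    Raises asyncio.IncompleteReadError if the peer closes mid-message.
    """
    (length,) = struct.unpack(">I", await reader.readexactly(4))
    if length == 0:
        return None  # keep-alive: no ID, no payload
    body = await reader.readexactly(length)
    return body[0], body[1:]


async def talk_to_peer(host: str, port: int, handshake: bytes, timeout: float = 10.0):
    """Hypothetical helper: send our handshake, read the peer's, return its first message."""
    reader, writer = await asyncio.wait_for(asyncio.open_connection(host, port), timeout)
    try:
        writer.write(handshake)
        await writer.drain()
        _, their_hash, their_id = await asyncio.wait_for(read_handshake(reader), timeout)
        # Assumes our handshake uses the standard 19-byte protocol string,
        # so our info_hash sits at bytes 28..47
        if their_hash != handshake[28:48]:
            raise ConnectionError("peer is serving a different torrent")
        return their_id, await asyncio.wait_for(read_message(reader), timeout)
    finally:
        writer.close()
        await writer.wait_closed()
```

Because each peer connection is just a coroutine, talking to many peers at once becomes a matter of running several talk_to_peer calls under asyncio.gather instead of managing threads.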
I’m on the verge of finishing the client and will release the source code in the next few days. I’d suggest writing a BitTorrent client if you haven’t already.
On a different note, RC’s address is 455 Broadway, New York. The name Broadway appealed to me, and when I visited, I found it was a multi-lane road. Out of curiosity, I read more and learned that Broadway is a literal translation of the Dutch Brede weg, and that it is one of the oldest roads in the city, stretching 53 km through the New York City boroughs of Manhattan and the Bronx and on into Westchester County, New York.