HTTP
This post recounts my efforts to write an HTTP server and client. This is the fifth post in my simple systems series. As with all posts in this series, I have sacrificed elegance and performance for the sake of making the process clear and easy to understand. If I have done a good job with this post, you should feel like you have the basic idea of how to write an HTTP server and client from scratch (from sockets), which will hopefully give you knowledge to hang new information on when you look at code bases for things like Apache2 or Nginx.
Objectives
- Create an HTTP client capable of requesting web pages.
- Create an HTTP server, also known as a web server, capable of hosting web pages.
Introduction
HTTP is the youngest protocol I have looked at so far. Its initial RFC was RFC 1945 in 1996. That makes it the first protocol I've talked about that is officially younger than me (though technically work started on it a year before I was born). The name Hypertext Transfer Protocol has a delightful history, so I can't help but quote it from Wikipedia:
The term hypertext was coined by Ted Nelson in 1965 in the Xanadu Project, which was in turn inspired by Vannevar Bush's 1930s vision of the microfilm-based information retrieval and management "memex" system described in his 1945 essay "As We May Think".
I do greatly enjoy old sci-fi references in my modern technology.
HTTP at its heart is just a protocol for transferring text from server to client. Originally it had "GET" as its only request type, and it would only return HTML. It has since grown to transfer more than text: binary blobs for pictures and streams for video. But the simplicity of a protocol that can simply transfer text is one of the driving forces behind making it so valuable today. Obviously the biggest factor in its popularity is its use as the protocol that made websites a thing. But HTTP is now being used for more than just web browsing.
RESTful APIs are the new hotness, and I can easily see why. JSON is an excellent text specification for transferring data in a compact and fairly human-readable form. HTTP underpins REST APIs. They make HTTP GET, POST, DELETE, etc. requests to interact with the webserver in a machine-friendly way. You can really think of HTTP as the "sockets" of web technology. HTTP is very simple, and that means it can adapt to new requirements like REST APIs easily. That simplicity also makes it an easy protocol to understand and fun to use.
Also, it is basically SMTP but for pulling. When you read the client and server, you'll notice there is a fair bit in common between the two protocols: header specs, status codes, etc.
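One way to see the shared heritage: HTTP headers use the same "Name: value" layout as mail headers, closely enough that Python's standard library parses HTTP headers with its email package. The following is a small sketch of that idea, not code from this series. The header block is made up for illustration.

```python
from email import message_from_string

# A made-up block of HTTP response headers (illustrative values only).
raw_headers = (
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 1024\r\n"
    "\r\n"
)

# The mail parser reads them without complaint, because the format
# is the same "Name: value" lines followed by a blank line.
headers = message_from_string(raw_headers)

# Lookups are case-insensitive, as in both protocols.
content_type = headers["content-type"]
content_length = int(headers["Content-Length"])
```

The status codes follow the same pattern: both protocols use three-digit codes where the first digit gives the general class of the reply.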
Client
I have said that the protocol is simpler. That is mostly true. But what I really mean is that the amount of chaff in the protocol is smaller. This makes it easier to understand and therefore easier to work with. But the implementation of the HTTP client is actually more involved than the SMTP client. This is largely because we are getting something back from the HTTP server. As usual you should read the attached http_client.py file, but I will give an overview here.
- Wait for a URL from the user.
- Parse that URL into a domain name and a path.
- Construct the headers for the request.
- Construct the request.
- Connect to the server.
- Send request.
- Receive some of the response.
- Parse out the response headers.
- Find out if the request was successful.
- If it was, find out how much is left to receive.
- Print result to screen.
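The steps above can be sketched roughly as follows. This is not the attached http_client.py, just a minimal illustration under some assumptions: plain http only (no TLS), a server that sends a Content-Length header (chunked responses are not handled), and a Host header without a port. The function names split_url, build_request, parse_head, and fetch are my own inventions for this sketch.

```python
import socket
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into (host, port, path). Plain http only in this sketch."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, parts.port or 80, path


def build_request(host, path):
    """Build a GET request: request line, headers, then a blank line."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_head(raw):
    """Split raw bytes into (status code, headers dict, body bytes so far)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def fetch(url):
    """Send one GET request and return (status, headers, body)."""
    host, port, path = split_url(url)
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        raw = b""
        # Receive until the blank line that ends the headers has arrived.
        while b"\r\n\r\n" not in raw:
            chunk = conn.recv(4096)
            if not chunk:
                break
            raw += chunk
        status, headers, body = parse_head(raw)
        if 200 <= status < 300:
            # Content-Length says how many body bytes are still outstanding.
            total = int(headers.get("content-length", len(body)))
            remaining = total - len(body)
            while remaining > 0:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                body += chunk
                remaining -= len(chunk)
    return status, headers, body
```

To try it, call fetch with a URL typed in by the user and print the decoded body. Keeping the parsing in small functions separate from the socket code makes each step easy to poke at in an interpreter.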
Server
In terms of length the server isn't that much shorter than the client, but it is actually pretty simple. The server I have made here handles both GET and POST requests, even interacting with an HTML form. It would not be hard to expand this script to host static files and do other more dynamic interactions. This really demonstrates the potential here.
- Bind to the socket for HTTP. As usual we are using 8080 instead of 80 to avoid the special permissions required for ports below 1024.
- Accept a connection.
- Parse out the request line.
- Parse out the headers.
- Build the response, starting with the headers.
- Depending on whether it is a GET or a POST, format the web page template from the top of the file.
- Calculate the size of the payload just created.
- Compile that into a response.
- Encode and send.
- Wait for next request.
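Here is a rough sketch of that loop. It is not the server script from this post, only an illustration under loud assumptions: the whole request arrives in a single recv call, requests are well formed, and every request gets a 200 reply. The template PAGE, the form field "name", and the function names parse_request, build_response, and serve are hypothetical.

```python
import html
import socket
from urllib.parse import parse_qs

# Hypothetical page template with one slot and a small form.
PAGE = """<html><body>
<p>{message}</p>
<form method="POST" action="/">
<input name="name"><input type="submit">
</form>
</body></html>"""


def parse_request(raw):
    """Split raw bytes into (method, path, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, path, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers, body


def build_response(method, body):
    """Fill in the template, measure it, and prepend the response headers."""
    if method == "POST":
        form = parse_qs(body.decode("utf-8"))
        name = form.get("name", ["stranger"])[0]
        message = f"Hello, {html.escape(name)}!"
    else:
        message = "Fill in the form below."
    payload = PAGE.format(message=message).encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


def serve(port=8080):
    """Accept one connection at a time, forever."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("127.0.0.1", port))
        listener.listen()
        while True:
            conn, _addr = listener.accept()
            with conn:
                # Assumption: the full request fits in one recv call.
                raw = conn.recv(65536)
                if not raw:
                    continue
                method, _path, _headers, body = parse_request(raw)
                conn.sendall(build_response(method, body))
```

Calling serve() and pointing a browser at http://127.0.0.1:8080/ should show the form, and submitting it should send a POST back to the same loop. The Content-Length header is computed from the encoded payload, not the string, because the client counts bytes.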
Lessons Learned
Build on what's there. And while I do mean this in the sense of reusing existing code instead of writing everything from scratch, what I really mean is be inspired by history. HTTP is clearly a descendant of SMTP. The headers, MIME types, and status codes are all clearly inspired by SMTP. But it is important to recognize the specific application of what you are working on. HTTP would have been much worse if they had decided to also bring SMTP's method of conversation rather than just encoding everything in the request and its body. If they had done that, I guarantee that HTTP wouldn't be the basis for REST APIs. There would be a lot more people trying to create custom protocols to meet that need instead. So take the good of what exists and use it in your new project; it makes what you are doing instantly more familiar to yourself and others. But don't be afraid to trim off the parts that don't work for you.
Don't be afraid to start over with something new. While I am obviously in favor of using existing technology (this website is brought to you by Nginx), I wrote recently about an experience I had at work, which you can find here. For brevity I'll summarize. We had a service that was no longer working, which I had spent hours trying to fix. I asked to remake it in Python, got permission, and had it done in less than an hour. The key reason I was able to do this was that I opted against using Apache or Nginx. I chose to use Python's built-in web server, which meant no training wheels.
I must admit, when I started down that path I was scared. What if what I made wasn't good enough? What if it created more problems than it solved? But I trusted myself to make something good and it worked. They were pleased, but honestly I was more pleased with myself. I can do this. And you can too. I recommend experimenting at home first, which is what I've done with this article. Go, write a web server and feel the power for yourself.
Conclusion
Making a webserver is kinda fun. While I didn't learn as much as I did with the DNS adventure, it was still informative. As I said in the paragraph above, I would definitely recommend giving it a try. The real big and polished webservers can be kind of intimidating because of all they can do, but remember that at its core a webserver is a simple program that takes requests and returns some text.
Real Code
You are probably already aware of Nginx and Apache2. So I will instead recommend Twisted if you are looking for something to run at a commercial scale. If you are looking for something simple, use the web server integrated into Python; it works great for little utilities.
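For a taste of the built-in option, here is a minimal sketch using the http.server module from the standard library. It is an illustration rather than the utility I wrote at work. The helper render_page, the Handler class, and the run function are names made up for this example, and the port is an arbitrary choice.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_page(path):
    """Build a tiny HTML page that echoes the requested path."""
    text = f"<html><body><p>You asked for {html.escape(path)}</p></body></html>"
    return text.encode("utf-8")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The library has already parsed the request line and headers.
        payload = render_page(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def run(port=8080):
    """Serve on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling run() starts the server. If all you need is to share a directory of static files, running python -m http.server from that directory does the job with no code at all.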