What’s in an HTTP request?
Whenever your web browser fetches a file (a page, a picture, etc.) from a web server, it does so using HTTP, the "Hypertext Transfer Protocol". HTTP is a request/response protocol, which means your computer sends a request for some file (e.g. "Get me the file ‘home.html’"), and the web server sends back a response ("Here’s the file", followed by the file itself).
That request which your computer sends to the web server contains all sorts of (potentially) interesting information. We’ll now examine the HTTP request your computer just sent to this web server, see what it contains, and find out what it tells me about you.
The raw information
The following HTTP request was received from IP address 22.214.171.124 (port 63841) by IP address 126.96.36.199 (port 80):
GET /dumprequest HTTP/1.1
Host: djce.org.uk
Connection: keep-alive
Referer: http://www.google.com.hk/search?hl=zh-CN&safe=strict&biw=1366&bih=643&q=request+http&aq=f&aqi=g2g-m3&aql=&oq=
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.6 (KHTML, like Gecko) Chrome/6.0.496.0 Safari/534.6
Accept-Encoding: gzip,deflate,sdch
Accept-Language: zh-CN,zh;q=0.8,ko;q=0.6
Accept-Charset: GBK,utf-8;q=0.7,*;q=0.3
The analysis
Source IP address, port and proxy
Source IP address: 188.8.131.52
Source port: 63841
Via: not present
X-Forwarded-For: not present
In order to send the appropriate response back to your computer, the web server necessarily knows your computer’s IP address, and a port number to which to send the response. Your IP address seems to be 184.108.40.206, and the port number used was 63841.
On the other hand, there could be one or more proxy servers between your computer and the web server. If the HTTP request includes the header "Via", or "X-Forwarded-For", then that’s a strong indication that there is at least one proxy server somewhere along the line.
If neither of those headers is present, that could mean that no proxy servers were involved, or it could mean that the proxies simply chose not to "reveal" themselves by adding those headers.
In this case since there is neither a "Via" header nor a "X-Forwarded-For" header, there quite possibly isn’t a proxy between your computer and the web server. However, this isn’t definite – it might be that there is a proxy, but it just chose not to add the "Via" / "X-Forwarded-For" headers.
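The check described above can be sketched in a few lines of Python. This is only an illustration, not the code this page runs: the function name `proxy_hints` and the plain-dict representation of the headers are assumptions made for the example.

```python
def proxy_hints(headers):
    """Return the proxy-related headers found in a request, if any.

    `headers` is a dict of header names to values. Header names are
    case-insensitive in HTTP, so they are lowercased before comparing.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        name: lowered[name]
        for name in ("via", "x-forwarded-for")
        if name in lowered
    }


# Two of the headers from the request shown earlier (no proxy headers).
request_headers = {
    "Host": "djce.org.uk",
    "Connection": "keep-alive",
}

found = proxy_hints(request_headers)
if found:
    print("A proxy was probably involved:", found)
else:
    print("No proxy headers; a silent proxy is still possible")
```

An empty result proves nothing either way, which is why the second message is worded cautiously.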
Your IP address
For now we’ll assume your IP address is 220.127.116.11. Let’s see what we know about that address.
(Note: this section has nothing to do with HTTP in particular; it is just an example of what information can be determined from an IP address.)
IP address: 18.104.22.168
DNS name: 22.214.171.124.broad.cc.jl.dynamic.163data.com.cn
Destination IP address, port, host and protocol
Destination IP address: 126.96.36.199
Destination port: 80
Host: djce.org.uk
Protocol: INCLUDED
These headers tell us which web server you were trying to contact. If that seems odd, bear in mind that many web sites can be "hosted" on a single server, so when a request is received the server needs to know which web site you were attempting to access.
The protocol used will almost always be either "HTTP/1.1" or "HTTP/1.0", and is a property of your computer’s web browser and any proxies through which the request might have passed.
Requested URI
Requested URI: /dumprequest
Together with the ‘Host’ header and the destination port number (above), this specifies the document which should be retrieved.
Given all these values we can determine that the URL of the document which is being retrieved is: http://djce.org.uk/dumprequest
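As a rough sketch of how those values combine, the Python below rebuilds the URL from the request line, the Host header and the destination port. The function name `request_url` and its `scheme` parameter are assumptions for illustration; the sketch also assumes the Host value carries no port of its own.

```python
def request_url(request_line, host, port, scheme="http"):
    """Rebuild the requested URL from the request line, Host header and port.

    Assumes `host` has no port suffix and that the request line holds a
    path (as in "GET /dumprequest HTTP/1.1"), not an absolute URL.
    """
    method, uri, protocol = request_line.split(" ")
    default_port = 443 if scheme == "https" else 80
    # The port is only written into the URL when it is not the default.
    authority = host if port == default_port else f"{host}:{port}"
    return f"{scheme}://{authority}{uri}"


print(request_url("GET /dumprequest HTTP/1.1", "djce.org.uk", 80))
# http://djce.org.uk/dumprequest
```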
Request method and content
Request method: GET
Data: none
The request method is usually either "GET" or "POST". Basically if you fill in and submit a form on a web page it might generate a POST request (or it might be "GET"), whereas if you just click on a link, or activate one of your browser’s "bookmarks" or "favourites", then the request method will always be "GET".
Therefore, if it’s "POST", we can tell that a form was definitely submitted. The contents of the form would appear here, and there would also be some "Content-" headers describing the data.
Web browsers generate two kinds of "POST" data: either "multipart/form-data", which is used when uploading files to a web server, or the more common "application/x-www-form-urlencoded".
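To make the more common of the two formats concrete, here is a small Python sketch that encodes a form as "application/x-www-form-urlencoded" and decodes it again, using the standard `urllib.parse` module. The field names and values are invented for the example.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields, chosen to include characters that need escaping.
form = {"name": "Fred Bloggs", "comment": "50% off & more"}

# What a browser would place in the body of the POST request.
body = urlencode(form)
print(body)
# name=Fred+Bloggs&comment=50%25+off+%26+more

# What the web server recovers from that body. Each value is a list,
# because a form may submit the same field name more than once.
decoded = parse_qs(body)
print(decoded)
# {'name': ['Fred Bloggs'], 'comment': ['50% off & more']}
```

Spaces become "+" and other reserved characters become "%" escapes, which is why the body remains a single line of plain text.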
User agent
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.6 (KHTML, like Gecko) Chrome/6.0.496.0 Safari/534.6
Accept: application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, */*;q=0.5
Accept-Charset: GBK,utf-8;q=0.7,*;q=0.3
Accept-Encoding: gzip,deflate,sdch
Accept-Language: zh-CN,zh;q=0.8,ko;q=0.6
The User-Agent header describes your web browser. Typically it contains the browser name and version (e.g. Firefox 1.0.7), your Operating System and version (e.g. Windows XP), and possibly additional information (such as which "service packs" you have installed).
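A rough way to pull those pieces out of the header is sketched below in Python. User-Agent strings follow no strict format in practice, so the two regular expressions here are a heuristic that happens to fit the header shown above, not a general parser; the function names are made up for the example.

```python
import re

USER_AGENT = (
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) "
    "AppleWebKit/534.6 (KHTML, like Gecko) "
    "Chrome/6.0.496.0 Safari/534.6"
)


def product_tokens(user_agent):
    """Return (name, version) pairs such as ("Chrome", "6.0.496.0")."""
    return re.findall(r"(\w+)/([\d.]+)", user_agent)


def first_comment(user_agent):
    """Return the fields of the first parenthesised comment, if any."""
    match = re.search(r"\(([^)]*)\)", user_agent)
    if match is None:
        return []
    return [field.strip() for field in match.group(1).split(";")]


print(product_tokens(USER_AGENT))
# [('Mozilla', '5.0'), ('AppleWebKit', '534.6'), ('Chrome', '6.0.496.0'), ('Safari', '534.6')]
print(first_comment(USER_AGENT))
# ['Windows', 'U', 'Windows NT 6.1', 'en-US']
```

Here the product tokens give the browser name and version, while the operating system ("Windows NT 6.1") sits inside the first comment.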
The "Accept" headers describe what sort of things the web browser can handle, and what it would prefer to be given if there’s a choice.
The "Accept" header itself describes which document types the web browser can handle, so for example we can tell whether your browser is capable of handling "image/png" graphics.
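The preference order in these headers is carried by the optional "q" value attached to each item; an item without one counts as 1.0. The Python sketch below sorts the items of such a header by preference. The function name `parse_preferences` is an assumption, and the sketch does no error handling for malformed q values.

```python
def parse_preferences(header_value):
    """Split an Accept-style header into (value, q) pairs, best first."""
    prefs = []
    for item in header_value.split(","):
        parts = item.strip().split(";")
        value = parts[0].strip()
        if not value:
            continue
        quality = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            key, _, val = param.strip().partition("=")
            if key == "q":
                quality = float(val)
        prefs.append((value, quality))
    # sorted() is stable, so items with equal q keep their original order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


ACCEPT = (
    "application/xml,application/xhtml+xml,text/html;q=0.9,"
    "text/plain;q=0.8,image/png,*/*;q=0.5"
)

for media_type, q in parse_preferences(ACCEPT):
    print(media_type, q)

# "image/png" appears with q=1.0, so this browser can handle PNG graphics.
handles_png = any(value == "image/png" for value, _ in parse_preferences(ACCEPT))
```

The same function works for the other "Accept" headers in the request, since they share this item-plus-q format.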
The "Accept-Charset" header describes what character sets are acceptable, so we can make some guesses as to what part of the world you might be in, and what language you might speak. For example, western European or North American users quite possibly only understand the "iso-8859-1", "us-ascii" and "utf-8" character sets, whereas "big5" would suggest that you might be Chinese.
"Accept-Encoding" describes the ability of your web browser to handle compressed transfer of documents. Nothing too interesting there, but it’s another snippet of information about the browser you’re using.
"Accept-Language" is more interesting though; it tells us what language(s) you prefer to receive your documents in – again, if the web server offers a choice. For example, if the header tells us that your preference is for "en-gb" followed by "en", that means you’re probably an English-speaking Briton. "pt-br" on the other hand would suggest a Portuguese-speaking Brazilian.
The "referer" header tells us which document referred you to us – in essence, if you followed a link to get to this page, it is the URL of the page you came from to get here.
If on the other hand you didn’t follow a link – maybe you clicked on a browser "bookmark", or maybe you just typed the address of this page directly into your browser – then the "referer" will be missing. And yes, that isn’t how it should be spelt.
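When the referring page is a search engine, the referer can reveal even what was searched for. The Python sketch below takes apart the referer from the request shown earlier. Treating the "q" parameter as the search terms is an assumption about the referring site, and `referer_summary` is a name invented for the example.

```python
from urllib.parse import parse_qs, urlsplit

REFERER = (
    "http://www.google.com.hk/search?hl=zh-CN&safe=strict&biw=1366"
    "&bih=643&q=request+http&aq=f&aqi=g2g-m3&aql=&oq="
)


def referer_summary(referer):
    """Return (host, search terms) for a referer; terms may be None."""
    parts = urlsplit(referer)
    # parse_qs decodes "+" and "%" escapes and drops empty parameters.
    query = parse_qs(parts.query)
    terms = query.get("q", [None])[0]
    return parts.hostname, terms


print(referer_summary(REFERER))
# ('www.google.com.hk', 'request http')
```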
Every time a web server provides you with a response (a page, a graphic, etc), it has the opportunity to send your browser a "cookie". These cookies are small pieces of information which your browser stores, and then sends back to that same web server whenever you subsequently request a document.
So there are two important points here: (1) each cookie is only sent back to the same web site as it came from in the first place, and (2) the "contents" of the cookie (the data it contains) can only be made up of whatever information the web server already knew anyway. For example, a web server can’t just say "send me a cookie containing your e-mail address" unless that same web server had already sent you that information in the first place.
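The round trip can be sketched with Python's standard `http.cookies` module. The request analysed on this page carried no cookies, so the cookie names and values below are purely illustrative.

```python
from http.cookies import SimpleCookie

# Server side: the response header that asks the browser to store a cookie.
outgoing = SimpleCookie()
outgoing["LastLoginName"] = "fred"
set_cookie_line = outgoing.output()
print(set_cookie_line)
# Set-Cookie: LastLoginName=fred

# Browser side, on a later request to the same site: the stored cookies
# come back in a single "Cookie" header (hypothetical value shown here).
cookie_header = "LastLoginName=fred; TGIDX=wl4o6ulhw48lw845yh68hylohw45"

# Server side again: parse what the browser sent back.
incoming = SimpleCookie()
incoming.load(cookie_header)
cookies = {name: morsel.value for name, morsel in incoming.items()}
print(cookies)
```

The server only ever reads back values it chose to send earlier, which is the second point above.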
Connection control
Connection: keep-alive
Keep-Alive: not present
These headers are used to fine-tune the network traffic between you and the web server. They don’t tell us much, except a little about the capabilities of your web browser.
Cache control
Pragma: not present
Cache-Control: not present
If-Modified-Since: not present
These headers control caching of the document. By examining them we can detect whether you used your browser’s "refresh" button to force the page to reload.
For example, Mozilla (Netscape 6) sets "Cache-Control" to "max-age=0" when you use the "reload" button. MSIE 5.5 sets it to "no-cache" if you do a "hard" reload (while holding down the "control" key).
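A heuristic along those lines might look like the Python sketch below. It is built only from the two browser behaviours just described, plus the assumption that a "Pragma: no-cache" header signals the same thing as "Cache-Control: no-cache"; other browsers may behave differently, and `reload_kind` is a name made up for the example.

```python
def reload_kind(headers):
    """Guess from the cache headers whether the page was reloaded."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    cache_control = lowered.get("cache-control", "")
    pragma = lowered.get("pragma", "")
    if "no-cache" in cache_control or "no-cache" in pragma:
        return "hard reload"
    if "max-age=0" in cache_control:
        return "reload"
    return "no reload detected"


print(reload_kind({"Cache-Control": "max-age=0"}))   # reload
print(reload_kind({"Cache-Control": "no-cache"}))    # hard reload
print(reload_kind({"Connection": "keep-alive"}))     # no reload detected
```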
If you have "logged in" to a web site, your username appears here.
Note that this only applies to web sites which use proper HTTP authentication – typically, a "login" window pops up and you get three chances to enter your username and password, otherwise you see a page which says "Authentication Required" or similar. It doesn’t apply to web sites where the "login" is a separate page.
It’s also possible to supply the username and password in the URL you tell your browser to visit – for example, http://user:email@example.com/. In that case, the username would appear here too.
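Both routes to the username can be sketched in Python. The first function assumes the site uses the "Basic" authentication scheme, in which the browser sends the username and password base64-encoded in an "Authorization" header; other schemes are not handled. The credentials "fred" and "secret" and the function name are invented for the example.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_username(authorization):
    """Return the username from a Basic Authorization header, else None."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    decoded = base64.b64decode(token).decode("utf-8")
    # The decoded text has the form "username:password".
    username, _, _password = decoded.partition(":")
    return username


# Build the header a browser would send for the hypothetical login.
header = "Basic " + base64.b64encode(b"fred:secret").decode("ascii")
print(basic_auth_username(header))  # fred

# Credentials embedded in a URL sit before the "@" in the host part.
url_user = urlsplit("http://fred:secret@example.com/").username
print(url_user)  # fred
```

Base64 is an encoding, not encryption, so anyone who can see the request can recover both the username and the password.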
The most interesting pieces of information contained in the request are:
the IP address of you and/or your HTTP proxy
which document you requested
which version of which browser you’re using
which page you came from to get here (if you followed a link)
your preferred language(s)
cookies
The "odd one out" in that list is "cookies". That’s because cookies only send the web server information which it had previously sent to you (and which your browser accepted). The problem, however, is in knowing what a cookie means: its meaning is only actually known to the web server.
If you can get your browser to show you your cookies, you might be able to make a good guess as to what it means – for example a cookie called "LastLoginName" with a value of "fred" probably means that when you last logged in on that site, you used the username "fred". However, a cookie called "TGIDX" with a value of "wl4o6ulhw48lw845yh68hylohw45" is meaningless to everybody except the web server, so you really have no idea what information that cookie actually holds.