Understanding Web Server Log Files
Well, they aren’t really different. Both of them constitute a “hit” and a “page view.” So why are they counted differently? Because programmers are thorough. A regular hit is activity that must be served up by your server, which takes up more bandwidth than a cached hit.
So, what is a cache?
A cache is a temporary storage location for regularly or recently accessed files, kept in an effort to improve performance. There are memory, BIOS, browser and proxy caches. For our purposes, a cache can exist in one of two places: in your browser, or on a proxy server.
The browser cache
Say you go to your web site at www.yourdomain.com, and then click the back button on your browser to come back to this page. Chances are that you will return to this page much faster than when you first got here. This is because a copy of this page and all the graphics on it were stored in your browser cache so that you wouldn’t have to wait for the web server to serve up the same page again.
The amount of time that a page will stay in your cache depends on your browser settings. Usually the cache is set to some default size. Once that size is exceeded, the browser begins to empty the older files from the cache.
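The size limit described above can be sketched in a few lines of Python. This is only an illustration of the "empty the older files first" idea, not how any particular browser is implemented; the class name SizeLimitedCache and its methods are hypothetical.

```python
from collections import OrderedDict


class SizeLimitedCache:
    """Toy cache that drops its oldest files once a byte limit is exceeded."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.files = OrderedDict()  # url -> content, oldest entry first

    def store(self, url, content):
        # Replacing an existing copy frees its bytes first.
        if url in self.files:
            self.used -= len(self.files.pop(url))
        self.files[url] = content
        self.used += len(content)
        # Empty the oldest files until the cache fits within its limit again.
        while self.used > self.max_bytes and self.files:
            _, old_content = self.files.popitem(last=False)
            self.used -= len(old_content)

    def get(self, url):
        # Returns None when the file is not (or no longer) cached.
        return self.files.get(url)


cache = SizeLimitedCache(max_bytes=10)
cache.store("/a.html", b"12345")
cache.store("/b.html", b"12345")
cache.store("/c.gif", b"123")  # pushes the total to 13 bytes, so /a.html is dropped
```

After the third store, the oldest file is gone while the two newer ones remain.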
The proxy cache
Many companies these days are starting to realize that bandwidth = $$$. One way that a company can save money, with the added benefit of improving browsing performance for its employees, is to install a proxy server. This proxy server also has a cache. Most have very large caches that store recently or often requested pages. A proxy cache works the same way as a browser cache but can hold gigabytes of data, and the administrator can determine which data is stored. Often the entire intranet of a business is cached on its proxy servers.
How does a cache work and what does it mean to me?
In either scenario, whether a browser or proxy has cached your page, the way that the web server handles the request is the same.
The user connects to the web server and requests a web page. The server records the transaction in the log file. It looks something like this:
GET /test.html HTTP/1.1 200 23559
In this particular case, the user requested a web page called “test.html.” The code 200 means that the file was served up to the user successfully. The number 23559 is the number of bytes that were transferred from the web server to the user.
Now, suppose that this person clicks on another link to visit another page, after which they hit the “back” button on their web browser to return to “test.html.” Since that file still exists in the customer’s cache, the browser “asks” the web server if the file has changed since the last time he or she looked at it. That activity will look like this in the log file:
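To make the fields concrete, here is a minimal Python sketch that splits the simplified fragment shown above into its parts. It assumes the line contains exactly the five fields in that example; real log files carry more fields (client address, timestamp and so on) and vary by server, so the pattern would need adjusting. The function name parse_line is made up for this illustration.

```python
import re

# Matches only the simplified fragment from the example:
# method, path, protocol, status code, bytes transferred.
# Assumption: some servers write "-" instead of 0 when no bytes were sent.
LINE_RE = re.compile(
    r"^(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+) "
    r"(?P<status>\d{3}) (?P<bytes>\d+|-)$"
)


def parse_line(line):
    """Return the fields of one log fragment as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    size = match.group("bytes")
    return {
        "method": match.group("method"),
        "path": match.group("path"),
        "protocol": match.group("protocol"),
        "status": int(match.group("status")),
        "bytes": 0 if size == "-" else int(size),
    }


entry = parse_line("GET /test.html HTTP/1.1 200 23559")
```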
GET /test.html HTTP/1.1 304 0
The code 304 means that the user requested a page he or she already had in their cache. Since the page had not changed since they last looked at it, the web server did not deliver it; the web browser displayed its cached copy instead. As a result, the number of bytes for this request is 0.
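The decision the server makes can be sketched roughly as follows. This is a simplified model, assuming the browser sends along the date of its cached copy (in HTTP this is typically the If-Modified-Since request header); the function respond and its parameters are hypothetical, and real servers also take other validators into account.

```python
from datetime import datetime, timezone


def respond(last_modified, if_modified_since, file_size):
    """Return (status, bytes_sent) for a request, in simplified form.

    last_modified: when the file last changed on the server.
    if_modified_since: date of the browser's cached copy, or None if it has none.
    """
    if if_modified_since is not None and last_modified <= if_modified_since:
        # The cached copy is still current: no body is sent.
        return 304, 0
    # No cached copy, or the file changed: send the whole file.
    return 200, file_size


file_changed = datetime(2024, 1, 1, tzinfo=timezone.utc)
first_visit = respond(file_changed, None, 23559)
return_visit = respond(file_changed, datetime(2024, 1, 2, tzinfo=timezone.utc), 23559)
```

The first visit yields the 200 line with 23559 bytes and the return visit yields the 304 line with 0 bytes, matching the two log entries above.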
How does this show up in my reports?
It depends on how the web statistics are calculated by the software you are using. In most cases, since the page was actually viewed both times, the software will likely interpret the two lines above as:
Hits: 2
Bytes transferred: 23559
In other words, the user viewed the page twice, but by looking at the number of bytes transferred one can see that one of those hits was cached.
The reason this is handled this way is that you DO want to know how many times a user viewed a particular page. This is extremely valuable marketing data. At the same time, bytes transferred are extremely important in terms of projecting bandwidth needs, so most software will give you both statistics.
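As a rough illustration of how a statistics package might arrive at those two numbers, the Python sketch below tallies the two log lines shown earlier. It assumes the same simplified five-field fragments; the function name summarize and the "cached_hits" counter are illustrative additions, not features of any specific product.

```python
def summarize(lines):
    """Count hits, cached (304) hits and total bytes for simplified log fragments."""
    hits = 0
    cached_hits = 0
    total_bytes = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip anything that is not in the simplified format
        try:
            status = int(fields[3])
            size = int(fields[4])
        except ValueError:
            continue  # skip lines whose status or byte count is not a number
        hits += 1
        total_bytes += size
        if status == 304:
            cached_hits += 1
    return {"hits": hits, "cached_hits": cached_hits, "bytes": total_bytes}


summary = summarize([
    "GET /test.html HTTP/1.1 200 23559",
    "GET /test.html HTTP/1.1 304 0",
])
```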
What other server codes can I expect to see in my web server log files?
These are a few of the common ones. There are too many to list them all:
200: The file was served successfully.
201: Following a POST command, this indicates the file was created successfully.
205: Reset content. The request succeeded, and the server asks the browser to reset the document view (for example, to clear a form). It does not indicate an interrupted transfer or a dropped connection.
206: Partial content. The server sent only part of the file, because the client asked for only part of it (for example, when resuming a download).
304: Not modified. The document has not been modified since it was last viewed, and so the server did not send the document to the customer.
305: Use proxy. The server is telling the client to access the requested resource through a proxy. This code is deprecated and rarely seen.
400: Bad request. Means that the user made a request the web server did not understand.
401: Unauthorized. The customer tried to enter a protected file area, and the web server could not authenticate the user.
403: Forbidden. Self-explanatory. The request is not allowed, and authorization will not help.
404: Not found. The server could not find a match for the URL requested. Usually means someone entered the wrong URL into the browser, or that the page requested no longer exists on the server.
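Since there are too many codes to list individually, it can help to group them by their first digit, which identifies the general class of the response. The Python sketch below shows one way to do that; the helper names status_class and count_by_class are made up for this example.

```python
from collections import Counter

# The first digit of an HTTP status code identifies its general class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map a status code such as 404 to its class name."""
    return STATUS_CLASSES.get(code // 100, "unknown")


def count_by_class(codes):
    """Tally a list of status codes by class."""
    return Counter(status_class(code) for code in codes)


totals = count_by_class([200, 200, 304, 404, 403, 206])
```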