A Speed Problem
First, let me give you a little background. At work, we are building a web application. Among other things, we’d like it to be fast, and we made many painstaking decisions to build it that way. My last post talked about some of the things we were going to do to make sure it was fast through the pipe.
However, we were having a problem. Page loads were taking over 5 seconds, and that was after the initial hit penalty that ASP.Net gives you. I put tracing information in our handler, and it turned out that all of our framework code was executing in 0.05 seconds. That wasn’t the problem. The step of getting the base handler (for the handoff to ASP.Net), however, was taking over 5 seconds.
We had another interesting problem. The ASP.Net Cache object didn’t work. It would be alive for the duration of a page hit, but on the next hit it was gone. My boss suggested that the pages were compiling every time, and we ultimately did see that in the %windir%\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files directory. The question was, “Why?”
We had wanted the site files to be dynamically compiled for ease of updating, while our framework code remained in pre-compiled .dlls. To troubleshoot, I recommended that we pre-compile the website as well and see what happened. Page hits got down to half a second apiece. Interesting.
Shortly after, though, I had a brainstorm. We were writing log files into a folder inside the bin directory (a holdover convention from a previous application). ASP.Net watches the bin directory for changes, and because the log folder was part of it, every page hit changed the directory. ASP.Net concluded that the site had changed and needed a recompile. A restart on every hit would also explain the vanishing Cache, since the cache lives in the application’s memory. It makes perfect sense in retrospect!
We moved the log directory outside of bin, went back to dynamic compilation, and got the same performance improvement that we had seen with the pre-compiled version, much to my relief.
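If you want to guard against this kind of mistake, a small check at deploy time can help. The sketch below is Python, not ASP.Net code, and the paths and the `is_inside` helper are made up for illustration. It only tests whether a log directory sits under a watched bin directory.

```python
from pathlib import PureWindowsPath


def is_inside(candidate, watched_dir):
    """Return True if candidate is watched_dir itself or lives anywhere below it."""
    candidate_path = PureWindowsPath(candidate)
    watched_path = PureWindowsPath(watched_dir)
    return candidate_path == watched_path or watched_path in candidate_path.parents


# Hypothetical paths, chosen only for illustration.
SITE_BIN = r"C:\inetpub\wwwroot\MySite\bin"
OLD_LOG_DIR = r"C:\inetpub\wwwroot\MySite\bin\logs"
NEW_LOG_DIR = r"C:\inetpub\logs\MySite"

print(is_inside(OLD_LOG_DIR, SITE_BIN))  # True: every log write touches bin
print(is_inside(NEW_LOG_DIR, SITE_BIN))  # False: outside the watched directory
```

Windows paths are compared case-insensitively here, which is why the sketch uses `PureWindowsPath` instead of plain string prefixes.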
I’m starting to get involved in this book, High Performance Web Sites by Steve Souders. My boss picked it up and read it and had really good things to say. Then the CEO read it and was really geeked about it. So, I decided that I’d better give it more than the cursory glance that I’d previously allotted it. I know that there is a lot of push to write one’s server code to run as quickly as possible, but I feel that the subjects that this book is covering are being largely overlooked.
Mr. Souders wrote the book while he was a Yahoo employee (he works at Google as of January 7th). He is also the creator of YSlow and an expert in web performance. You can actually get a lot of the tips without the book if you check out this page and install YSlow onto your machine. YSlow is an add-on for Firebug, itself a plugin for Firefox. If you are doing web development, you should already know about Firebug. If you don’t, run, don’t walk, to download it and you can thank me later.
When you run YSlow, it analyzes the current page against each of the 13 points that the Yahoo Developer Network has identified as major causes of web site slowness. It assigns a grade to each item and gives the site an overall score. For instance, this blog scores a D (65). According to YSlow, I need to add an Expires header, use gzip compression, and configure ETags in order to get this site up to par. As an experiment, I may very well dig into trying to get my score up to at *least* a B! If you aren’t sure what these points mean or what I’m talking about, you should check out the links above or get the book.
The book is very well written and is organized so that it can be digested separately by different members of a team or by one person. The appendices contain case studies of several major sites using the tools I discussed above and explain what each could do to improve user experience. At $20.00 on Amazon (and used from $11 and change), every web developer really should read this book, implement it, and keep it as a reference for future development.
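To make those three items a little more concrete, here is a rough Python sketch of what a server could send for a static file. The `build_response` helper and its parameters are invented for illustration and are not anything from YSlow or the book. A real site would normally set these in the web server configuration instead.

```python
import gzip
import hashlib
import time
from email.utils import formatdate


def build_response(body, max_age_seconds, now=None):
    """Return (headers, compressed_body) for a static resource.

    Hypothetical helper: far-future Expires, gzip encoding, and an ETag
    derived only from the content so every server produces the same value.
    """
    if now is None:
        now = time.time()
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
        # ETag values are quoted strings.
        "ETag": '"' + hashlib.sha256(body).hexdigest() + '"',
    }
    return headers, compressed


page = b"<html>" + b"hello world " * 200 + b"</html>"
headers, compressed = build_response(page, max_age_seconds=7 * 24 * 3600)
for name, value in headers.items():
    print(f"{name}: {value}")
print(f"{len(page)} bytes before gzip, {len(compressed)} after")
```

Hashing the content is just one way to get an ETag that matches across a server farm; turning ETags off altogether is another option if you rely on Expires.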
This is a classic post from my old blog that I’m porting over here. As I read through my old blog and find these in preparation to take it offline, I’ll be posting “Classic” POS (Pete on Software… not the other, more popular meaning) posts on here. This code came from my reading of Code Complete, 2nd edition.
for (int i = 0; i < 10; i++)
{
    someArray[i] = i;
}
versus
someArray[0] = 0;
someArray[1] = 1;
someArray[2] = 2;
someArray[3] = 3;
someArray[4] = 4;
someArray[5] = 5;
someArray[6] = 6;
someArray[7] = 7;
someArray[8] = 8;
someArray[9] = 9;
Both sets do the same thing. Most every programmer that I know will write the first one. But the loop (first example) is 2.5 times SLOWER than the explicit assignments (second example) when run. In some languages the equivalent code is 4.5 times slower. Unrolling loops (where possible) is often a way to tweak code for speed when you’ve just got to have it. Modern compilers and JIT runtimes often unroll simple loops on their own, though, so measure before you commit to this.
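Numbers like these depend heavily on the language, compiler, and runtime, so it is worth timing both versions yourself. Below is a minimal Python sketch using the standard `timeit` module. The function names are hypothetical, and the ratio it prints should not be expected to match the figures quoted above.

```python
import timeit


def fill_with_loop(some_array):
    """Fill the first ten slots with a loop."""
    for i in range(10):
        some_array[i] = i


def fill_unrolled(some_array):
    """Fill the same ten slots with explicit assignments."""
    some_array[0] = 0
    some_array[1] = 1
    some_array[2] = 2
    some_array[3] = 3
    some_array[4] = 4
    some_array[5] = 5
    some_array[6] = 6
    some_array[7] = 7
    some_array[8] = 8
    some_array[9] = 9


def time_both(runs=100_000):
    """Return (loop_seconds, unrolled_seconds) for the given number of runs."""
    target = [0] * 10
    loop_seconds = timeit.timeit(lambda: fill_with_loop(target), number=runs)
    unrolled_seconds = timeit.timeit(lambda: fill_unrolled(target), number=runs)
    return loop_seconds, unrolled_seconds


loop_seconds, unrolled_seconds = time_both()
print(f"loop:     {loop_seconds:.4f}s")
print(f"unrolled: {unrolled_seconds:.4f}s")
print(f"ratio:    {loop_seconds / unrolled_seconds:.2f}x")
```

Run it a few times before trusting the ratio; a single timing on a busy machine can be noisy.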
There is always so much to learn.
This post is about SQL, specifically in the SQL Server RDBMS, but I believe that the principles will hold up across platforms.
If you are using a function in a WHERE clause in your SQL, like the following:
-- Scalar functions in SQL Server need a schema prefix; dbo is assumed here.
SELECT a.id, a.name
FROM some_table a
WHERE a.hash_value = dbo.fn_hash_function('value')
Written this way, the function can end up being evaluated for every row in your table. The extra work grows in step with the number of rows, so the query keeps getting slower as your table grows.
A better way to accomplish this query is to do the following:
DECLARE @input_hash varchar(1000)
-- Evaluate the function once (dbo schema assumed) and keep the result.
SET @input_hash = dbo.fn_hash_function('value')
SELECT a.id, a.name
FROM some_table a
WHERE a.hash_value = @input_hash
This code will only evaluate the function one time, and as long as you have the comparison column indexed properly, you will see very impressive results. On my local machine, SQL Profiler showed one query that had been taking over 7000ms begin to return in about 70ms.
The performance gains in that exact instance were less drastic on SQL Server running on an actual server (it had just been powering its way through), but in time it would have gotten unbearably slow there as well.
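You can watch the difference in call counts with a small experiment. The sketch below uses Python with an in-memory SQLite database as a stand-in for SQL Server, so the optimizer behavior is not the same. The `fn_hash_function` here is a hypothetical Python function registered with SQLite, and the table is filled with made-up rows. It only counts how often the function runs in each version of the query.

```python
import hashlib
import sqlite3

calls = {"count": 0}


def digest(value):
    """Plain helper used to build the test data without touching the counter."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def fn_hash_function(value):
    """Stand-in for the post's hash function; counts how often it runs."""
    calls["count"] += 1
    return digest(value)


conn = sqlite3.connect(":memory:")
# Registered as non-deterministic (the default), so the engine cannot
# assume that two calls with the same argument give the same answer.
conn.create_function("fn_hash_function", 1, fn_hash_function)
conn.execute(
    "CREATE TABLE some_table (id INTEGER PRIMARY KEY, name TEXT, hash_value TEXT)"
)
conn.executemany(
    "INSERT INTO some_table (name, hash_value) VALUES (?, ?)",
    [(f"row{i}", digest(f"row{i}")) for i in range(1000)],
)

# Version 1: the function call sits inside the WHERE clause.
calls["count"] = 0
inline_rows = conn.execute(
    "SELECT a.id, a.name FROM some_table a "
    "WHERE a.hash_value = fn_hash_function('row42')"
).fetchall()
inline_calls = calls["count"]

# Version 2: evaluate the function once, then compare against the stored value.
calls["count"] = 0
input_hash = conn.execute("SELECT fn_hash_function('row42')").fetchone()[0]
hoisted_rows = conn.execute(
    "SELECT a.id, a.name FROM some_table a WHERE a.hash_value = ?",
    (input_hash,),
).fetchall()
hoisted_calls = calls["count"]

print("rows match:", inline_rows == hoisted_rows)
print("calls with function in WHERE:", inline_calls)
print("calls with precomputed value:", hoisted_calls)
```

The table has no index on `hash_value` here, since the point is only the number of function calls. For real timings you would want the index in place and your actual database engine.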
Hope that helps someone.
