Archive for 'Code Tips'
Here is something that I just learned yesterday that I didn’t know. This was one of those fun things where I knew every piece of the puzzle, but had never “realized” or “made the connection” between all of them.
If you aren’t sure what extension methods are, I wrote a blog post about them back in 2008 that you can check out here.
Here is an example for today:
public static class ExtensionMethods
{
public static bool IsEmptyStringArray(this string[] input)
{
if (input == null) return true;
return !input.Any();
}
}
What I’ve done is just create a method that allows you to call .IsEmptyStringArray() on any string array to find out if it has any items in it. I realize that this is a fairly useless example, but it is contrived for the sake of the demonstration.
Now, if I call a “framework” method on a null string array, I get an error. So, doing something like this:
string[] nullArray = null;
var hasItems = nullArray.Any();
Results in the error “Unhandled Exception: System.ArgumentNullException: Value cannot be null.”
However, I *CAN* call my extension method on that null array.
string[] nullArray = null;
var hasItems = !nullArray.IsEmptyStringArray();
Console.WriteLine(hasItems);
This code produces the following result:

How does that work? This is what I had neglected to put together in my mind. When you write an extension method, what actually gets compiled is this:
call bool CodeSandbox2010.ExtensionMethods::IsEmptyStringArray(string[])
The “syntactic sugar” part is that you aren’t actually calling a method on the null object at all. You are just calling your method and passing in the parameter, just like any other method. I really like that because it gives you a concise way to write your code without the same null check over and over and over again throughout your codebase. You can just check in the method and then get on with what you’re doing.

Just a quick note to document a “gotcha” that cost me quite a bit of time today. I am currently working on my company’s second Asp.Net MVC 2.0 application. Our first one was a huge success and (to me) MVC is way more fun and productive than WebForms, so doing as much new development using MVC instead of WebForms is a no-brainer.
Being good MVC-ites, we are doing a lot of AJAX calls. Since we are extensively using the jQuery library, we do a lot of $.ajax() and $(someSelector).load() in our code to make calls. Today, I was calling code like the following:
$('#theDivToUpdate').load('/Our/Url/Product/Edit/', {id : ourIdParameter}, function() { /* callback logic */ });
That code will call the Edit action on the ProductController and pass in the id of the product to edit. The controller gets the product details and then returns a partial to the page, which is then inserted into the div. I was trying to debug the controller action, but after I had edited a product once, the breakpoint wouldn’t catch anymore. I figured that I was dealing with a caching issue and the method wasn’t even being called, but I couldn’t find where I even had output caching configured in MVC.
I knew that I wasn’t having this problem in our previous application, so I started comparing files and looking for answers. Nothing was really giving me any clues, and everything appeared to be set up the same way. Finally, in exasperation, I did a “Find in Files” in Visual Studio in the other solution for just the word “cache” since searching for “outputcache” had failed me.
I dug through tons of results and then finally came across this little gem:
$.ajaxSetup({ cache: false });
I had seen that in the last application, had read the documentation on ajaxSetup, but I didn’t really understand all of the ramifications of it.
Apparently, jQuery in all of its awesomeness will actually cache ajax calls for you to speed up your page. That’s definitely awesome, but unexpected caching can certainly screw up your day.
Today on Twitter, Sahil Malik asked the question “Is there any reason ever to use Request.QueryString over Request.Params?” (link – for as long as it persists on Twitter).
For QueryString itself, there is practically no reason not to use the less specific collection, as QueryString is always checked first. That means that there is a small performance hit in Request.Params as it will check all collections regardless and not one in just calling Request, as it returns the first one it finds.
If you are trying to get at the Forms, Cookies, or ServerVariables collections, there is a performance hit (albeit small) no matter which way you turn. (Information on the happenings inside the .Net Framework are available on this Hanselman post from a few years ago.
There is an issue, however, if you want to access any of the other collections by using either Request[] or Request.Params[]. And really, this is the code smell that I was worried about when responding to Sahil earlier today. Let’s make an example page here. I made this using Asp.Net WebForms (to hopefully have the widest base of understanding).
Here is the meat of the .aspx page.
Here is the code behind
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
namespace Params
{
public partial class _Default : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
Response.Cookies.Add(new HttpCookie("IsAdmin", "false"));
}
protected void submitButton_Click(object sender, EventArgs e)
{
var value = Request["IsAdmin"];
if (Convert.ToBoolean(value))
{
adminPanel.Visible = true;
}
else
{
Response.Write("You aren't an admin");
}
}
}
}

When we click the button, the cookie value is checked and since it is false, we get the sad news that we aren’t an admin.

Okay, let’s now assume that someone is curious if you are using simple cookie checks, so they look inside Firefox to see what the cookies are on your site.

Now, let me take a chance that the operator of this site is being sloppy and is using Request[] instead of calling the value from the cookie collection explicitly. I’ll add ?IsAdmin=true onto the querystring to mirror the cookie and then click the button. Since the QueryString collection is checked first, I’ll be overriding the cookie in this instance.
Here is our result

Well, that’s not good. What if we actually just check the cookie value instead of being lazy?
protected void submitButton_Click(object sender, EventArgs e)
{
var value = Request.Cookies["IsAdmin"].Value;
if (Convert.ToBoolean(value))
{
adminPanel.Visible = true;
}
else
{
Response.Write("You aren't an admin");
}
}
Now when I try the QueryString hack, the page is safe.

This doesn’t solve all of the problems with this page, though. I’m not pushing this off as some airtight secure solution. What I’m hoping to point out is that coding hard and even though we are given certain shortcuts, you really have to understand the ramifications of your code before throwing it out in the wild.
Last Time, we looked at ROW_NUMBER() and saw a little of how it works and how we could use it to remove duplicates. Another one of my favorite uses for ROW_NUMBER() is to get only one row from a one-to-many join. I’ve had situations at work where I need to return one row per Person, but each person had several addresses, several injury codes, and several contacts (people working their case).
In many of those cases, there is nothing to indicate which one of those rows should receive precidence over the others. One might just select from those tables and then take just the MIN() of the primary key, grouping by all other fields, but that makes for very messy and hard to maintain code in my opinion. ROW_NUMBER() gives us a much easier way to handle this problem.
Examine this sql script that gives us a Person Table and a PersonAddress table. One person can have many addresses, but there is no way in the schema to give precidence to one of the addresses over any of the others.
USE tempdb
GO
CREATE TABLE Person (PersonId INT IDENTITY(1,1) PRIMARY KEY, Name VARCHAR(50))
GO
INSERT Person ([Name])
SELECT 'Bill'
UNION
SELECT 'Bob'
UNION
SELECT 'Carl'
UNION
SELECT 'Sue'
UNION
SELECT 'Cleo'
UNION
SELECT 'Nick'
UNION
SELECT 'Thomas'
UNION
SELECT 'Jeydro'
UNION
SELECT 'Ronald'
UNION
SELECT 'Jamie'
UNION
SELECT 'Colleen'
UNION
SELECT 'Denise'
UNION
SELECT 'Phil'
UNION
SELECT 'Jim'
UNION
SELECT 'Jack'
UNION
SELECT 'Kathleen'
GO
CREATE TABLE PersonAddress (PersonAddressId INT IDENTITY(1,1) PRIMARY KEY, PersonId INT, [Address] VARCHAR(100))
GO
INSERT PersonAddress (PersonId, [Address])
SELECT PersonId, CONVERT(VARCHAR, PersonId * 123) + ' ' + UPPER(REVERSE([Name])) + ' ST.'
FROM Person
ORDER BY PersonId DESC
GO
INSERT PersonAddress (PersonId, [Address])
SELECT PersonId, CONVERT(VARCHAR, PersonId * 321) + ' SQL DR.'
FROM Person
WHERE PersonId BETWEEN 3 AND 9
GO
SELECT p.PersonId, p.[Name], pa.[Address]
FROM Person p
INNER JOIN PersonAddress pa
ON p.PersonId = pa.PersonId
ORDER BY p.PersonId
GO
DROP TABLE Person
GO
DROP TABLE PersonAddress
GO
Here is a sample of the results:

Using a similar trick to what we used last time to find unique records, we can alter our select statement so that we only return one address per person. If we change our select statement to this
; WITH Addresses
AS
(
SELECT p.PersonId, p.[Name], pa.[Address], ROW_NUMBER() OVER (PARTITION BY p.PersonId ORDER BY p.PersonId ASC) AS RowNum
FROM Person p
INNER JOIN PersonAddress pa
ON p.PersonId = pa.PersonId
)
SELECT PersonId, [Name], [Address], RowNum
FROM Addresses
GO
We get the following results:

You can see now that all we have to do is limit the result where the RowNum is equal to one and we have something that we can use (the CTE) in further joins or in reports to ensure that we don’t introduce duplicates to our query.
; WITH Addresses
AS
(
SELECT p.PersonId, p.[Name], pa.[Address], ROW_NUMBER() OVER (PARTITION BY p.PersonId ORDER BY p.PersonId ASC) AS RowNum
FROM Person p
INNER JOIN PersonAddress pa
ON p.PersonId = pa.PersonId
)
SELECT PersonId, [Name], [Address]
FROM Addresses
WHERE RowNum = 1
GO
Our results:

Sometimes when you are performing queries, you have data that does not have a primary key and contains duplicate records. If it had had a primary key, you could group on the columns that constitute a duplicate and just keep the MIN or MAX of the primary key, deleting the others. However, lacking that primary key, you have no way of uniquely identifying a row. Additionally, UNIQUEIDENTIFIER types don’t work with MAX and MIN, so you’d have a hard time picking a row in that case as well.
One option is of course to add an identity column to the table and do the operation that I described. However, sometimes you don’t have the appropriate permissions or just aren’t allowed to modify the table to do that. You can use the ROW_NUMBER() function to assign a unique row number to each row by partition. I’ll talk a little more about partition later, but when not explicitly specified in the query, the partition refers to the entire table.
Examine the script below. (All Sql Code contained here will only work on Sql Server 2005 and later)
USE tempdb
GO
CREATE TABLE HasDupes (ProductName VARCHAR(50), UpcCode VARCHAR(10))
GO
INSERT HasDupes (ProductName, UpcCode)
SELECT 'Diapers', '0123456780'
UNION ALL
SELECT 'Diapers', '0123456780'
UNION ALL
SELECT 'Diapers', '0123456780'
UNION ALL
SELECT 'Diapers', '0123456780'
UNION ALL
SELECT 'Formula', '0987654320'
UNION ALL
SELECT 'Formula', '0987654320'
UNION ALL
SELECT 'Formula', '0987654320'
UNION ALL
SELECT 'Diet Coke', '0564534230'
UNION ALL
SELECT 'Chair', '0872372530'
GO
SELECT *
FROM HasDupes
GO
DROP TABLE HasDupes
GO
You get the results shown here:

If we want to add a rownumber to the table, we would alter the part of the query that is
SELECT *
FROM HasDupes
GO
to say this
SELECT ProductName, UpcCode, ROW_NUMBER() OVER (ORDER BY ProductName ASC) AS RowNum
FROM HasDupes
GO
and get these results

That helps some, but wouldn’t it be better if we could get a row number assigned within groups? That is what the PARTITION BY clause does. Change the SELECT sql again so that it now looks like this (note: you can PARTITION BY and ORDER BY any columns you want, and they can be the same or different from each other)
SELECT ProductName, UpcCode, ROW_NUMBER() OVER (PARTITION BY UpcCode ORDER BY ProductName ASC) AS RowNum
FROM HasDupes
GO
Now, we get this

Okay, now all we have to do is keep the records that get assigned a one, forsaking all others. Change that same section of code again to read like this and then run the script. Note, I am using the CTE syntax here.
WITH ProductDupes
AS
(
SELECT ProductName, UpcCode, ROW_NUMBER() OVER (PARTITION BY UpcCode ORDER BY ProductName ASC) AS RowNum
FROM HasDupes
)
DELETE FROM ProductDupes WHERE RowNum > 1
GO
SELECT *
FROM HasDupes
GO
Now we have a table with no duplicates.

That’s it, that is all there is to it. In my next blog, I will cover how we can use this trick in joins to ensure that we only get a 1 to 1 join, even when the data is 1 to Many.