Category: SQL

SQL

T-SQL IIF

Decisions Recently, I’ve decided to try to catch up on some things that were “snuck” into SQL Server while I wasn’t paying attention. The first post in this series was about TRY_CONVERT() (which is awesome) and today I want to talk about IIF().

I can’t imagine that there is a single beginner who has written any appreciable amount of SQL queries who at some point didn’t have to Google “how to do an if statement in sql” or ask someone with a little more experience. I certainly know that I had to do that. What you end up with is the CASE statement.

Let’s create a table to work with and populate it with some data.

USE tempdb
GO

IF OBJECT_ID('Customer') IS NULL
BEGIN
	CREATE TABLE dbo.Customer (
		CustomerId INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
		Name NVARCHAR(100) NOT NULL,
		CreditLimit MONEY NOT NULL,
		Active BIT NOT NULL		
	);

	INSERT dbo.Customer (Name, CreditLimit, Active)
	SELECT 'Nike', 10000000.00, 1
	UNION
	SELECT 'Under Armor', 5000000.00, 1
	UNION
	SELECT 'Reebok', 1000000.00, 1
	UNION
	SELECT 'Buster Brown', 10000.00, 0
END
GO

If we select out all of the data from the table now, it should look like this:

Customer Table Data

So, if we were to run a standard CASE statement against the table, it might look something like this. This statement returns the name of all of the customers along with their “Credit Status”. If a credit limit is greater than 1,000,000, then they are Preferred. Otherwise, they are Standard.

SELECT Name, CASE WHEN CreditLimit > 1000000.00 THEN 'Preferred' ELSE 'Standard' END AS CreditStatus
FROM dbo.Customer

This returns the following results.

Results after CASE Statement

T-SQL finally has an if statement of sorts for use in SELECT statements (logical branching IFs have always existed… I used one in my table create script). Those of you with any kind of VBA background (Access, Excel Macros and Formulas, etc) are probably already familiar with IIF(), which stands for “Immediate IF”. The T-SQL version takes three parameters, just as the VBA version does. The first parameter is the conditional, the second is what will display if the conditional is true, and the third is what will display if the conditional is false. Here is that same Credit Status query rewritten to use IIF().

SELECT Name, IIF(CreditLimit > 1000000.00, 'Preferred', 'Standard') AS CreditStatus
FROM dbo.Customer

You can see that the results are the same and the query does look a little cleaner.

Results after IIF Statement

What if you want to do multiple “layers” of the conditional? Nested IIF statements! Take a look at this query:

SELECT Name, 
IIF(CreditLimit > 5000000.00, 'Platinum', 
	IIF(CreditLimit >= 1000000.00, 'Gold', 'Silver')) AS CreditStatus
FROM dbo.Customer

The equivalent CASE statement would look like this:

SELECT Name, 
CASE 
	WHEN CreditLimit > 5000000.00 THEN 'Platinum'
	WHEN CreditLimit >= 1000000.00 THEN 'Gold'
	ELSE 'Silver'
END AS CreditStatus
FROM dbo.Customer

Both of those return a result set like this:
Nested Conditional Results

So, those statements do the same thing. For complicated conditional logic (especially logic that requires 5 or more decisions), I’d much prefer the CASE statement. Nested IIFs gets pretty dirty once you get that far. However, that appears to be an entirely stylistic choice. If you have any strong preference one way or the other, comment and let me know how you feel and why.

SQL

T-SQL TRY_CONVERT()

I am a little behind the curve on this one. Starting in SQL Server 2012, Microsoft introduced the TRY_CONVERT() function. This function works the same way as the CONVERT() function, but if the convert doesn’t work, you get NULL instead of an error.

Let’s take a look at the old way:

SELECT CONVERT(INT, 'aaa')

This returns the message “Conversion failed when converting the varchar value ‘aaa’ to data type int.” If you are doing a CONVERT() on some column in a large dataset, this often can throw you while you try to sort out what went wrong. Enter TRY_CONVERT(). Here is the new syntax:

SELECT TRY_CONVERT(INT, 'aaa')

This just returns NULL and doesn’t error at all. In this case, you can easily provide a default value for the conversion by using ISNULL() or COALESCE()

SELECT ISNULL(TRY_CONVERT(INT, 'aaa'), 0)
-- OR
SELECT COALESCE(TRY_CONVERT(INT, 'aaa'), 0)

Pretty darn awesome and about time (well, even four years ago, it was about time!).

SQL

NullIf

NullIf This post is about a handy little function that works across many database systems, including Sql Server, Oracle, and MySql. I often forget about it and at times even end up coding some workaround that would have been a lot easier if I had just used NullIf(). NullIf takes two parameters. The first parameter is the value to check and the second parameter is the value that should equate to NULL. Let’s take a look at an example to hopefully make this clearer.

First, we will create a table to work from, populate it with some data, and select the results out so that we can see what we have visually.

USE tempdb
GO

CREATE TABLE dbo.JunkData (
	JunkDataId INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
	Name VARCHAR(100) NOT NULL,
	StreetAddress VARCHAR(100) NULL
);

INSERT dbo.JunkData (Name, StreetAddress)
SELECT 'Pete', '777 Main St';

INSERT dbo.JunkData (Name, StreetAddress)
SELECT 'Jeff', NULL;

INSERT dbo.JunkData (Name, StreetAddress)
SELECT 'Dustin', '999 Oak St';

INSERT dbo.JunkData (Name, StreetAddress)
SELECT 'Ron', NULL;

INSERT dbo.JunkData (Name, StreetAddress)
SELECT 'Keith', '';

SELECT Name, StreetAddress
FROM dbo.JunkData;

Our table's contents

In this case, the Street Address column is nullable, but for Keith’s record the developer chose to insert a blank space instead of a NULL for a missing address. If we try to just write a query to get rid of the nulls, we still have the blank space issue. Here is a query that uses COALESCE to get rid of the NULLS and its results.

SELECT Name, COALESCE(StreetAddress, 'Not Provided') AS StreetAddress
FROM dbo.JunkData;

Our results with COALESCE Only

You see that we still have the blank address to deal with. One work around is to use a case statement. Something like “CASE WHEN COALESCE(StreetAddress, ‘Not Provided’) = ” THEN ‘Not Provided’ ELSE COALESCE(StreetAddress, ‘Not Provided’) END AS StreetAddress”. But, that is quite a mouthful and we repeat ourselves several times. However, if we could get COALESCE to treat a blank address like a NULL, we’d have been in business in the first place. That is what NULLIF() does. It evaluates the value we specify as a NULL (in this case a blank space) and then the rest of the query can treat it like a NULL. Here is that example

SELECT Name, COALESCE(NULLIF(StreetAddress, ''), 'Not Provided') AS StreetAddress
FROM dbo.JunkData;

Our results with COALESCE and NULLIF

That’s much more succinct and I feel like it conveys our intent much more easily. Here’s another practical use case for NULLIF(). Given the following table and values, we want to find the average sale on each given day.

CREATE TABLE dbo.Reporting(
  RowId INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
  SalesDate DATETIME NOT NULL,
  TotalSales MONEY NOT NULL,
  QuantitySold NUMERIC(9,2) NOT NULL
);

INSERT dbo.Reporting(SalesDate, TotalSales, QuantitySold)
SELECT '4/1/16', 432.50, 2;

INSERT dbo.Reporting(SalesDate, TotalSales, QuantitySold)
SELECT '4/2/16', 0.00, 0;

INSERT dbo.Reporting(SalesDate, TotalSales, QuantitySold)
SELECT '4/3/16', 5422.10, 10;

We might write something like the following

SELECT SalesDate, (TotalSales/QuantitySold) as AverageSale
FROM dbo.Reporting;

However, when you do, you get a divide-by-zero error. Sometimes in more complicated situations, that can be a bear to track down exactly what is evaluating to zero in an equation. However, if you ask the denominator to evaluate as NULL whenever it is 0, SQL is much happier. Here’s our query now:

SELECT SalesDate, (TotalSales/NULLIF(QuantitySold, 0)) as AverageSale
FROM dbo.Reporting;

Now, we easily get our results.

Results of Dividing by NULL

This helps identify which records are causing our issue and can now easily be dealt with. If we don’t like NULL in our results, we can use ISNULL() or COALESCE() to put a sensible default answer. Done and done.

SQL

SQL Pivot – Static or Dynamic Columns

Pivot Arrow This is going to be another one of those posts that will surely help some people, but it is far more likely to help me later 😉 I have culled these techniques from a variety of sources across the Internet, but I’m not entirely sure who taught me what in order to give proper credit.

If you want to follow along, I went to the Microsoft Sql Product Sample Site and downloaded “Adventure Works 2014 Full Database Backup.zip” and unzipped and restored it onto my database server. That gives me the OLTP version (rather than the OLAP) of the 2014 Adventure Works database. To start off, let’s take a look at the following query:

USE AdventureWorks2014
GO

SELECT c.AccountNumber, DATENAME(month, h.OrderDate) as OrderMonthName, SUM(d.LineTotal) as TotalSales
FROM Sales.SalesOrderHeader h 
INNER JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
INNER JOIN Sales.Customer c ON h.CustomerID = c.CustomerID
WHERE YEAR(h.OrderDate) = 2011
GROUP BY c.AccountNumber, DATENAME(month, h.OrderDate)
ORDER BY c.AccountNumber

This returns results like these:
Initial Month-based Results

However, what if you wanted to easily see how each of those customers’ sales looked month over month in comparison to other customers? Typically, a data analyst might take a result set like that and use a tool like Microsoft Excel to create a pivot table. That would let your rows be per customer and your columns would be per month. Reporting wonks could also use a tool like SSRS to pivot the data for presentation to users. However, if you are using Sql Server 2005 or later, you could also use the PIVOT command.

If your data that you’d like to pivot on has a finite and known quantity, this is very easy. I’m going to change the query above to show how to do that:

USE AdventureWorks2014
GO

;WITH SalesData AS (
	SELECT c.AccountNumber, DATENAME(month, h.OrderDate) as OrderMonthName, CAST(SUM(d.LineTotal) AS NUMERIC(12,2)) as TotalSales
	FROM Sales.SalesOrderHeader h 
	INNER JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
	INNER JOIN Sales.Customer c ON h.CustomerID = c.CustomerID
	WHERE YEAR(h.OrderDate) = 2011
	GROUP BY c.AccountNumber, DATENAME(month, h.OrderDate))

SELECT *
FROM SalesData
PIVOT (SUM(TotalSales) FOR OrderMonthName IN (January, February, March, April, May, June, July, August, September, October, November, December)) pvt

A few peculiar things to notice. First, the values that become the new columns are not quoted. They are treated like literals. I didn’t have to say ‘January’, ‘February’, etc. Secondly, notice that I had to name the pivot table (as pvt here). Even though I don’t reference it, if I omit it, the Sql statement won’t parse and execute. Look at the results below. Notice a third strange thing. Even though I selected out all of the columns (using SELECT *), it did not actually return all columns like you would normally see in a join situation. It showed the AccountNumber column from the SalesData CTE, but then it ignored the OrderMonthNames and TotalSales columns from SalesData. The PIVOT was smart enough to know that those values were being represented in other ways in our new dataset.

Pivoted Month Results

If you want/need to get rid of the NULLS in the results, you can do it the same way that you would do it in a regular query. I could change the “SELECT *” to be “SELECT AccountNumber, ISNULL(January, 0.00) AS January, ISNULL(February, 0.00) AS February” and just continue that pattern for each month.

Simple enough and that gets us pretty far along the way. But, what if I wanted to instead see customer sales by product category? I could see every product category that exists in the system right now and do a static pivot. However, everytime someone updated the system with a new product category, we would have to modify our query (potentially many queries across the enterprise). Is there a way to do that?

No, not directly.

However, we can use Dynamic Sql to dynamically determine our static columns, if that makes sense 😉 Let’s take a look at the query needed to just return the standard results, grouped by customer and product category.

USE AdventureWorks2014
GO

SELECT c.AccountNumber, pc.Name + '-' + ps.Name AS CategoryName, CAST(SUM(d.LineTotal) AS NUMERIC(12,2)) as TotalSales
FROM Sales.SalesOrderHeader h 
INNER JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
INNER JOIN Production.Product p ON d.ProductID = p.ProductID
INNER JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID
INNER JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID
INNER JOIN Sales.Customer c ON h.CustomerID = c.CustomerID
WHERE YEAR(h.OrderDate) = 2011
GROUP BY c.AccountNumber, pc.Name, ps.Name

This returns the following results:
Initial Dynamic Pivot Results

There are 8 unique category – subcategory combinations in these results, but it doesn’t take much imagination to see how new products can be added all the time. Probably several new subcategories a year. How can we handle that? The key is those columns listed out in the PIVOT operator definition. Let’s see how we can build that dynamically using a temp table to make working with the data easier.

USE AdventureWorks2014
GO

SELECT c.AccountNumber, pc.Name + '-' + ps.Name AS CategoryName, CAST(SUM(d.LineTotal) AS NUMERIC(12,2)) as TotalSales
INTO #SalesData
FROM Sales.SalesOrderHeader h 
INNER JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
INNER JOIN Production.Product p ON d.ProductID = p.ProductID
INNER JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID
INNER JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID
INNER JOIN Sales.Customer c ON h.CustomerID = c.CustomerID
WHERE YEAR(h.OrderDate) = 2011
GROUP BY c.AccountNumber, pc.Name, ps.Name

DECLARE @PivotColumns NVARCHAR(MAX), @SqlStatement NVARCHAR(MAX)

SELECT @PivotColumns= ISNULL(@PivotColumns + ',','') + QUOTENAME(CategoryName) FROM (SELECT DISTINCT CategoryName FROM #SalesData) AS Categories

SET @SqlStatement = 
	N'SELECT AccountNumber, ' + @PivotColumns + '
    FROM #SalesData
    PIVOT(SUM(TotalSales) 
          FOR CategoryName IN (' + @PivotColumns + ')) AS pvt'

EXEC sp_executesql @SqlStatement

DROP TABLE #SalesData

The bulk of the work here is just being done in this line

SELECT @PivotColumns= ISNULL(@PivotColumns + ',','') + QUOTENAME(CategoryName) FROM (SELECT DISTINCT CategoryName FROM #SalesData) AS Categories

That just select a distinct list of categories and builds a comma delimited string out of them. The value in that variable looks like this:
[Bikes-Road Bikes],[Clothing-Socks],[Clothing-Jerseys],[Clothing-Caps],[Components-Mountain Frames],[Accessories-Helmets],[Bikes-Mountain Bikes],[Components-Road Frames]

Because of the spacing in the categories, that’s why we used QUOTENAME. That makes sure the name will work in SQL statements. Ultimately, the executed query looks like this:

SELECT AccountNumber, [Bikes-Road Bikes],[Clothing-Socks],[Clothing-Jerseys],[Clothing-Caps],[Components-Mountain Frames],[Accessories-Helmets],[Bikes-Mountain Bikes],[Components-Road Frames]
FROM #SalesData
PIVOT(SUM(TotalSales) FOR CategoryName IN ([Bikes-Road Bikes],[Clothing-Socks],[Clothing-Jerseys],[Clothing-Caps],[Components-Mountain Frames],[Accessories-Helmets],[Bikes-Mountain Bikes],[Components-Road Frames])) AS pvt

You can see how that is now very much like our initial static example with the months. The query still has the NULL problem, though, so to fix that you can create another variable so the SELECTed columns can contain ISNULL but the PIVOTed columns will not. That looks like this:

USE AdventureWorks2014
GO

SELECT c.AccountNumber, pc.Name + '-' + ps.Name AS CategoryName, CAST(SUM(d.LineTotal) AS NUMERIC(12,2)) as TotalSales
INTO #SalesData
FROM Sales.SalesOrderHeader h 
INNER JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
INNER JOIN Production.Product p ON d.ProductID = p.ProductID
INNER JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID
INNER JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID
INNER JOIN Sales.Customer c ON h.CustomerID = c.CustomerID
WHERE YEAR(h.OrderDate) = 2011
GROUP BY c.AccountNumber, pc.Name, ps.Name


DECLARE @PivotColumns NVARCHAR(MAX), @SelectColumns NVARCHAR(MAX), @SqlStatement NVARCHAR(MAX)

SELECT @PivotColumns= ISNULL(@PivotColumns + ',','') + QUOTENAME(CategoryName) FROM (SELECT DISTINCT CategoryName FROM #SalesData) AS Categories
SELECT @SelectColumns = ISNULL(@SelectColumns + ',','') + CategoryName FROM (SELECT DISTINCT 'ISNULL(' + QUOTENAME(CategoryName) + ', ''0.00'') AS ' + QUOTENAME(CategoryName) AS CategoryName FROM #SalesData) AS Categories

SET @SqlStatement = 
	N'SELECT AccountNumber, ' + @SelectColumns + '
    FROM #SalesData
    PIVOT(SUM(TotalSales) 
          FOR CategoryName IN (' + @PivotColumns + ')) AS pvt'

EXEC sp_executesql @SqlStatement

DROP TABLE #SalesData

Basically, I just made the distinct list of category names to be a distinct list of category names wrapped in ISNULL. Then I used the @SelectColumns variable in the select statement and the @PivotColumns variable in the pivot list. The resulting Sql looks like this:

SELECT AccountNumber, ISNULL([Accessories-Helmets], '0.00') AS [Accessories-Helmets],ISNULL([Components-Mountain Frames], '0.00') AS [Components-Mountain Frames],ISNULL([Clothing-Caps], '0.00') AS [Clothing-Caps],ISNULL([Components-Road Frames], '0.00') AS [Components-Road Frames],ISNULL([Bikes-Road Bikes], '0.00') AS [Bikes-Road Bikes],ISNULL([Clothing-Socks], '0.00') AS [Clothing-Socks],ISNULL([Clothing-Jerseys], '0.00') AS [Clothing-Jerseys],ISNULL([Bikes-Mountain Bikes], '0.00') AS [Bikes-Mountain Bikes]
    FROM #SalesData
    PIVOT(SUM(TotalSales) 
          FOR CategoryName IN ([Bikes-Road Bikes],[Clothing-Socks],[Clothing-Jerseys],[Clothing-Caps],[Components-Mountain Frames],[Accessories-Helmets],[Bikes-Mountain Bikes],
[Components-Road Frames])) AS pvt

And the resulting data looks like this:
Dynamic Pivot Final Results

Hopefully you find this helpful (I’m sure future me will be back here). If you have any questions, let me know in the comments.

Podcasts

Podcast Episode 22 – In Defense of Stored Procedures

Gavel - Labeled for CC Reuse, From http://upload.wikimedia.org/wikipedia/commons/b/b2/CourtGavel.JPG Episode 22 finds me talking about Stored Procedures. Stored Procedures have gotten a bad rap not just recently, but for over a decade. Stored Procedure apologists don’t always help the matter and misinformation abounds. I take a look at stored procedures and why it isn’t the worst thing in the world if you use them.

I haven’t forgotten my last podcast about religious arguments and I try to take an honest look at when stored procedures aren’t terrible, when ORMs are best, and when in-line parameterized SQL might be your best option. Too often, I feel that developers are scared of the database because they don’t know it or only have a limited view of what it could be for. In this episode, I look at it from all angles.

Links Mentioned in this Show:
Rob’s Original Post
Frans Bouma’s Reply
Rob’s Rebuttal
Jeff Atwood’s Post
Ayende’s Compromise
Zip Locate
Gooey

You can also subscribe to the podcast at any of these places:

Thanks to all the people who listen, and a special thanks to those who have rated me. I really appreciate it.

The episodes have been archived. Click Here to see the archive page.

Pete on Software