Friday, January 15, 2010

Pivot/Unpivot or Transpose in SQL 2005 and SSIS 2005

Lately I’ve been working on a project to bring a suite of Excel Reports to the new age, via SSIS and SSRS. (yeah, boring stuff, not mentioning the times spent to understand the user actions, VBA macros, cell formulae etc, such a pain!!!). Anyway, one of the existing manual actions from the users of the Excel which I need to emulate in the new system is the transpose copy paste action. At last, something that I haven’t done before and it’s interesting to me, in a sense.

After some research, there’s actually quite a lot of way to achieve the result, either using SSIS Data Flow task of Pivot/Unpivot, or just pure Transact-SQL approach.

The SSIS Method

Can’t explain better than what this blog has already mentioned. It involves a clever use of both Pivot and Unpivot Data Flow tasks. The example given provides the basic concept for me to tackle my slightly complex scenario.

Transact-SQL Method

Again, no point for me to reinvent the wheel. This blog has given the best options available. On top of that, MSDN also provide good examples on how to use the PIVOT and UNPIVOT command.

My Implementation

After the research, time to get the job done. So here’s I need to do:

I have an Excel table that looks like this:

Date: 15/1/2010

Category

Product A

Product B

Product C

Product D

In

Out

In

Out

In

Out

In

Out

Category 1

Category 2

Category 3

I need to transform it into this first:

Date

Product Name

Category

Balance

15/1/2010

Product A

Category 1

{In-Out}

15/1/2010

Product B

Category 2

{In-Out}

15/1/2010

Product C

Category 3

{In-Out}

Then into these tables for SSRS consumption:

Product A

Date

Category 1

Category 2

Category 3

15/1/2010

{Balance Value 1}

{Balance Value 2}

{Balance Value 3}

Product B

Date

Category 1

Category 2

Category 3

15/1/2010

{Balance Value 1}

{Balance Value 2}

{Balance Value 3}

Product C

Date

Category 1

Category 2

Category 3

15/1/2010

{Balance Value 1}

{Balance Value 2}

{Balance Value 3}

I decided to use both SSIS and Transact-SQL PIVOT for my solution. Reason is to avoid having to keep multiple Product tables. In the example shown above, I would need 3 tables for each product. Imagine having more than that…

Part A: SSIS Unpivot

First off, here’s my SSIS Data Flow:

1. Because I can’t Unpivot both [In] and [Out] columns at the same time (SSIS limitation), I have to Unpivot them separately, then join them back later through the Merge Join task.

2. Merge Join task requires all the data flow coming into it sorted with the same key(s). It will sort of being smart to detect if the data flow reaching it is sorted or not. If it is not, then it will not run and you'll get an error. This is typically solved by using the Sort task first, but since I know my data from the Excel is sorted, I avoid using the performance taxing Sort task. Instead, I set the Unpivot task’s output properties of IsSorted to True, then, in the Unpivot Output columns, set the SortKeyPosition of the sorted column to 1. (If more columns were used to sort, set the other columns with SortKeyPosition=2 or 3 or so forth). Screenshot below shows the Avanced Editor of the Unpivot task, where I set the IsSorted property.

3. The resultant join will have some invalid rows, due to merge columns in the Excel. Therefore, I’ve used the Conditional Split task (Get valid data) to exclude the unwanted columns.

4. The rest are simple stuff of adding the derived Date column, calculations of {In}-{Out} and data conversion, before loading into Production table.

Part B: SQL PIVOT

Next is the PIVOT SQL command (following the sample in MSDN)

SELECT Date, [Category 1], [Category 2], [Category 3] FROM

(

SELECT Date, Category, Balance

FROM Table01

WHERE ProductName = ‘Product A’

AND Date = ‘20100115’

) AS SourceTable

PIVOT

(

AVG(Balance)

FOR Category

IN ([Category 1], [Category 2], [Category 3])

) AS PivotTable;

So it’s done. Probably not the best apparoch, but hey, it works anyway J.

No comments: