Lately I’ve been working on a project to bring a suite of Excel Reports to the new age, via SSIS and SSRS. (yeah, boring stuff, not mentioning the times spent to understand the user actions, VBA macros, cell formulae etc, such a pain!!!). Anyway, one of the existing manual actions from the users of the Excel which I need to emulate in the new system is the transpose copy paste action. At last, something that I haven’t done before and it’s interesting to me, in a sense.
After some research, there’s actually quite a lot of way to achieve the result, either using SSIS Data Flow task of Pivot/Unpivot, or just pure Transact-SQL approach.
The SSIS Method
Can’t explain better than what this blog has already mentioned. It involves a clever use of both Pivot and Unpivot Data Flow tasks. The example given provides the basic concept for me to tackle my slightly complex scenario.
Transact-SQL Method
Again, no point for me to reinvent the wheel. This blog has given the best options available. On top of that, MSDN also provide good examples on how to use the PIVOT and UNPIVOT command.
My Implementation
After the research, time to get the job done. So here’s I need to do:
I have an Excel table that looks like this:
Date: 15/1/2010
Category | Product A | Product B | Product C | Product D | ||||
In | Out | In | Out | In | Out | In | Out | |
Category 1 | ||||||||
Category 2 | ||||||||
Category 3 |
I need to transform it into this first:
Date | Product Name | Category | Balance |
15/1/2010 | Product A | Category 1 | {In-Out} |
15/1/2010 | Product B | Category 2 | {In-Out} |
15/1/2010 | Product C | Category 3 | {In-Out} |
Then into these tables for SSRS consumption:
Product A
Date | Category 1 | Category 2 | Category 3 |
15/1/2010 | {Balance Value 1} | {Balance Value 2} | {Balance Value 3} |
Product B
Date | Category 1 | Category 2 | Category 3 |
15/1/2010 | {Balance Value 1} | {Balance Value 2} | {Balance Value 3} |
Product C
Date | Category 1 | Category 2 | Category 3 |
15/1/2010 | {Balance Value 1} | {Balance Value 2} | {Balance Value 3} |
I decided to use both SSIS and Transact-SQL PIVOT for my solution. Reason is to avoid having to keep multiple Product tables. In the example shown above, I would need 3 tables for each product. Imagine having more than that…
Part A: SSIS Unpivot
First off, here’s my SSIS Data Flow:
1. Because I can’t Unpivot both [In] and [Out] columns at the same time (SSIS limitation), I have to Unpivot them separately, then join them back later through the Merge Join task.
2. Merge Join task requires all the data flow coming into it sorted with the same key(s). It will sort of being smart to detect if the data flow reaching it is sorted or not. If it is not, then it will not run and you'll get an error. This is typically solved by using the Sort task first, but since I know my data from the Excel is sorted, I avoid using the performance taxing Sort task. Instead, I set the Unpivot task’s output properties of IsSorted to True, then, in the Unpivot Output columns, set the SortKeyPosition of the sorted column to 1. (If more columns were used to sort, set the other columns with SortKeyPosition=2 or 3 or so forth). Screenshot below shows the Avanced Editor of the Unpivot task, where I set the IsSorted property.
3. The resultant join will have some invalid rows, due to merge columns in the Excel. Therefore, I’ve used the Conditional Split task (Get valid data) to exclude the unwanted columns.
4. The rest are simple stuff of adding the derived Date column, calculations of {In}-{Out} and data conversion, before loading into Production table.
Part B: SQL PIVOT
Next is the PIVOT SQL command (following the sample in MSDN)
SELECT Date, [Category 1], [Category 2], [Category 3] FROM ( SELECT Date, Category, Balance FROM Table01 WHERE ProductName = ‘Product A’ AND Date = ‘20100115’ ) AS SourceTable PIVOT ( AVG(Balance) FOR Category IN ([Category 1], [Category 2], [Category 3]) ) AS PivotTable; |
So it’s done. Probably not the best apparoch, but hey, it works anyway J.
No comments:
Post a Comment