SQL Server

Import Shapefiles into SQL Server and Aggregate Spatial Data (Geometry)

Around 6 months ago, I was desperately looking for some online resources to help me implement some map reports. The shapefile I was dealing with was at a very granular level (which means that it had a lot of data, 2.5 MB in size), and we needed to do some custom aggregations on the spatial data. As the custom aggregations were done at report run time, it turned out to be a major performance bottleneck. That was when this blog by Sean Boon - SQL Server 2008 R2 Map Tips: How To Import Shapefiles Into SQL Server and Aggregate Spatial Data, caught my eye and I got the idea of importing my shapefile into DB and then doing the custom aggregations there itself for solving my performance problems. Armed with the contents from this blog, I started developing my reporting solution but soon stumbled upon a major roadblock – I realized that the blog he had written was for aggregating shapefiles of Geography data type, while the shapefile I had with me was of Geometry data type (read here to find the difference between the two data types). After a couple of days of searching, I finally managed to find the solution and thought of blogging it down at least now.

The major steps involved in the solution are:-

1) Converting Shapefiles into SQL Server Spatial Data : As mentioned in Sean’s blog, we will be using the tool from http://www.sharpgis.net/page/Shape2SQL.aspx called Shape2SQL. Download the software from the site and then click on the exe to see the user interface. If you go to the site, you will find an image of the interface which guides you how to convert the shapefile into SQL Server Spatial data but then that is for a shapefile of Geography data type. So how do you convert a shapefile of Geometry data type into spatial data? For that, make sure that the interface looks like the image given below:-

For some reason, I had to create the table beforehand, the tool didn’t automatically create the table as I thought it would. The table creation script is also given below so that you can have an idea of the columns

CREATE TABLE [dbo].[MapData](
[geom] [geometry] NULL,
[PC4CODE] [nvarchar](max) NULL,
[Sales Rep] [nvarchar](max) NULL)

So as you can see, in my table, I would have all the PC4Code information (which is the postal code) at the end of this exercise. Now I update my table such that I get all the parent information in the [Sales Rep] column (which would have all the list of sales regions). My requirement is that instead of having a map by some 4000 odd postal codes, I would like to see a map by just the 40 odd sales regions that I have. A simple select from the MapData table would look like this now:-

2) Aggregate Polygons in SQL Server – Get assembly : For aggregating polygons of geography data type, we can use the method that Sean has mentioned in his blog. But that particular function would not work in the case of geometry data type. For doing this, please follow the steps mentioned in ConceptDev (Craig Dunn’s blog) quoted below

DOWNLOAD the source (8k) to try out the two aggregates posted on MSDN, or follow these steps:

1. Download the code from MSDN, or my copy. NOTE: You must do all these steps on a PC with SQL Server 2008 (Katmai) or above installed to get the Microsoft.SqlServer.Types.dll assembly. I copied this assembly into C:assemblies.

2. Compile the C# SQLCLR functions in C:assemblies via the Command Prompt

c:windowsmicrosoft.netframeworkv3.5csc /t:library /r:Microsoft.SqlServer.Types.dll C:assembliesaggregate.cs

3. Execute T-SQL to register the aggregate functions in SQL Server 2008
CREATE ASSEMBLY SpatialAggregate FROM ‘c:assembliesAggregate.dll’
go
CREATE AGGREGATE EnvelopeAggregate(@input GEOMETRY) RETURNS GEOMETRY
external name [SpatialAggregate].[SilverlightEarth.Geoquery.SqlClr.EnvelopeAggregate]
go
CREATE AGGREGATE UnionAggregate(@input GEOMETRY) RETURNS GEOMETRY
external name [SpatialAggregate].[SilverlightEarth.Geoquery.SqlClr.UnionAggregate]
go

sp_configure ‘clr enabled‘, 1;
go
RECONFIGURE;
go

3) Aggregate Polygons in SQL Server – Use function to aggregate data and store in table : Now that you have the aggregate functions registered, you can use it in a query to aggregate it and store it in a table like shown below:-

select WAM=M.[Sales_Rep],
geom=dbo.UnionAggregate(geom.MakeValid())
into NL_Map
from MapData M
group by M.[Sales_Rep]

The UnionAggregate function will aggregate the data and the aggregated data can be viewed by a simple select on the new NL_Map table

Now you are all set to use this table as the source of your reports.

Update (25/03/2012)

SQL 2012 has integrated these features and hence if you are using SQL 2012, you can aggregate both geography and geometry data types using the built in functions Geometry::UnionAggregate(geom) and Geography::UnionAggregate(geog). Refer the blog below http://social.technet.microsoft.com/wiki/contents/articles/4136.aspx

“ UnionAggregate() combines multiple spatial objects into a single spatial object, removing interior boundaries, where applicable.
SELECT Geography::UnionAggregate(geog) FROM Counties
WHERE name = ‘Washington’;

Posted by SQLJason, 10 comments

Performing a Right Outer Join with a Left Outer Join clause

Most of the interviewers I have seen tend to prefer people with better problem solving skills rather than better knowledge on a particular technology. No wonder considering the fact that technologies change with each fleeting moment, while a good problem solving mentality stays forever. With this prelude, I put forward this question that I came to hear from one of friends recently – How do you perform a right outer join with a left outer join clause? And no, you can’t swap the tables here. Disclaimer : This question is purely to test your knowledge of how the said Joins work and to test your problem solving skills. It may not serve any real purpose in a practical project scenario. Question Lets say, I have the following tables Demo and Demo2 Query : select * from Demo Output Query : select * from Demo2 Output Now using a left outer join, the result should be equivalent to the output of SELECT A.attribute,
       B.attribute
FROM   Demo2 AS A
       RIGHT OUTER JOIN
       Demo AS B
       ON A.attribute = B.attribute; Output
And you can’t swap the order of the tables i.e. Demo2 should be on the left hand side and Demo should be on the right side. Answer Now this exercise is basically to inculcate the importance of problem solving and hence I will try to detail out the steps as much as possible. Lets go step by step:- 1) The thing to understand here is that we can’t use the left outer join with the normal join condition like A.attribute = B.attribute, the reason being that we would get all the rows of the left side table (Demo2). Eg: SELECT A.attribute,
       B.attribute
FROM   Demo2 AS A
       LEFT OUTER JOIN
       Demo AS B
       ON A.attribute = B.attribute; Output So somehow we need to bring the required rows in. 2) Now that we have understood that the above join condition won’t work, we need to think of another join condition. Let’s say, what would happen if we give a condition that is always true on the ON clause, something like 1=1? SELECT A.attribute,
       B.attribute
FROM   Demo2 AS A
       LEFT OUTER JOIN
       Demo AS B
       ON 1=1; Output Cool, now we have got a cross product. 3) Now the thing to do is to understand what a right outer join is really. If the column on the right hand side matches with the column on the left hand side based on the join condition, then both the columns are displayed. Else, a NULL will be displayed on the left hand side with the correct column on the right hand side. For this first we can write a subquery in the columns like shown below SELECT (SELECT attribute
        FROM   Demo2 AS c
        WHERE  c.attribute = b.attribute) AS attr,
       B.attribute
FROM   Demo2 AS A
       LEFT OUTER JOIN
       Demo AS B
       ON 1 = 1; Output 4) Now all we need to do is to take the distinct and then we get the output as required. SELECT DISTINCT (SELECT attribute
                 FROM   Demo2 AS c
                 WHERE  c.attribute = b.attribute) AS attr,
                B.attribute
FROM   Demo2 AS A
       LEFT OUTER JOIN
       Demo AS B
       ON 1 = 1; Output So, all is well? Not really. This particular query would break when we get duplicates. Suppose the TH row was duplicated in Demo table. Now, when we take the distinct after applying the subquery on the cross product, we will still get only one row for TH when a right outer join would have given 2 rows. So I came up with the below query to solve this particular scenario:- SELECT attr,
       attribute
FROM   (SELECT row_number() OVER (PARTITION BY CASE WHEN A.attribute = b.attribute THEN A.attribute ELSE NULL END, b.attribute ORDER BY a.attribute) AS rnk,
               CASE WHEN A.attribute = b.attribute THEN A.attribute ELSE NULL END AS attr,
               B.attribute,
               row_number() OVER (PARTITION BY b.attribute ORDER BY CASE WHEN A.attribute = b.attribute THEN A.attribute ELSE NULL END DESC) AS rnk2
        FROM   Demo2 AS A
               LEFT OUTER JOIN
               Demo AS B
               ON 1 = 1) AS Outr
WHERE  (Outr.rnk = 1
        AND Outr.rnk2 = 1)
       OR (attr = attribute); Please post your queries in the comments if you have got alternate solutions, I would be pleased to have a look at them.

Posted by SQLJason, 0 comments

Collation conflict error

Yesterday, I was working on SQL Server 2008 R2 for a Proof of Concept regarding map reports. I was doing some querying on the database and hence, had to join a table in a SQL 2008 DB and SQL 2008 R2 DB. So for this purpose, I created a linked server and executed the following query. SELECT *
FROM   mapdata AS a
       LEFT OUTER JOIN 
       CHAOS.diw.dbo.DIM_SALES_ORGANISATION AS b
       ON a.[Sales Rep] = b.[employee_name];
That was when I hit upon the following error “ Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation. ” On further reading, I found out that the default collation in R2 is SQL_Latin1_General_CP1_CI_AS and in SQL 2008 is Latin1_General_CI_AS. Since the databases have different collation, it would not be possible to compare the fields from those databases directly. To solve this issue, I just added the following keyword before the = operator like shown below SELECT *
FROM   mapdata AS a
       LEFT OUTER JOIN 
       CHAOS.diw.dbo.DIM_SALES_ORGANISATION AS b
       ON a.[Sales Rep] COLLATE database_default = b.[employee_name] COLLATE database_default;

Posted by SQLJason, 0 comments
Converting a single comma separated row into multiple rows

Converting a single comma separated row into multiple rows

Ever since I wrote Converting multiple rows into a single comma separated row, I was trying to find a SQL command which will do the reverse, which is converting the single comma separated row back to multiple rows. I checked up a few blogs and I found out that it was possible to do with the help of custom functions or stored procedures, but I was not interested in all of them. Finally I got the answer that I was looking for, and now that I did, I did not waste precious time in noting it down here, lest I forget. Let me try to show you what we are trying to achieve here with the help of an image

This can be done with the help of the below query

 SELECT A.[State],  
     Split.a.value('.', 'VARCHAR(100)') AS String  
 FROM  (SELECT [State],  
         CAST ('<M>' + REPLACE([City], ',', '</M><M>') + '</M>' AS XML) AS String  
     FROM  TableA) AS A CROSS APPLY String.nodes ('/M') AS Split(a);

Again, no stored procedures or function, just plain old SQL 🙂

Update (23/05/2010) : I saw this blog by Brad Schulz (MVP) explaining this scenario and the performance implications in detail. So people who are serious about getting this in their code, please read his blog before you do so - Delimited String Tennis Anyone?

Posted by SQLJason, 69 comments

Converting multiple rows into a single comma separated row

Recently, I have been bitten by the MSDN forums bug. I enjoy spending time in the SSAS and SSRS forums, and the learning that I take out from there is tremendous. Initially, I just used to read and learn, while now I reply to a lot of posts too (just the simple ones, keep the tough ones for the experts to solve 😉 ).
So the other day, there was this post - how can you convert multiple rows into a single comma separated row? For eg,
This can be done by the FOR XML command.
select DISTINCT State, (
select City + ‘,’ as [text()]
from tableA soi
where soi.State=t.State
order by City
for xml path( ” )
)
from tableA t
A handy command, especially because you don’t have to use stored procedures to achieve the result.
UPDATE - 5/26/2016
I see that this post still gets a lot of hits, so just wanted to share the version that I use currently. The previous version works really slow with big tables, but this one works much faster.
SELECT OutTab.state ,
Cities =
STUFF ( ( SELECT ‘,’+InrTab.City
FROM tableA InrTab
WHERE InrTab.state= OutTab.state
ORDER BY InrTab.City
FOR XML PATH(”),TYPE
).value(‘.’,’VARCHAR(MAX)’)
, 1,1,SPACE(0))
FROM tableA OutTab
GROUP BY OutTab.State
Posted by SQLJason, 8 comments