Tuesday 22 December 2009

String manipulation - removing extraneous detail - SQL

I'm currently working on an Integration Services project where I'm pulling filenames from a directory into a database to check for name consistency against a 'master' table.

The filenames take the form:
Abbey Lane Cemetery 03022 Floor Plan
Abbey Lane Primary 4-11 2001 Floor Plan
Abbeydale Grange Secondary 11-16 4254 Floor Plan
Abbeydale Industrial Hamlet 00433 Floor Plan
Aldine House CYP0048 Floor Plan

To check against the 'master' table, the entries need to be in this format:
Abbey Lane Cemetery
Abbey Lane Primary 4-11
Abbeydale Grange Secondary 11-16
Abbeydale Industrial Hamlet
Aldine House

What I needed to do was strip out the 4fig., 5fig. or CYP#### and the words 'Floor Plan' from every entry.

To strip out the words 'Floor Plan' would be simple enough, but was unsure how best to remove the reference number in each case. I first thought that I could create a selection for all entries with 4fig. and remove those, and repeat for all those with 5fig. and again for CYP####.
After much help from Chris Morris on SQLServerCentral.Com, he devised a query that would reverse the text string, find the position of the third space and return only the site name as required.
I combined the query with my Nums CTE to produce:


DECLARE @n AS BIGINT;
SET @n = 100;
WITH Nums AS
(
SELECT 1 AS n
UNION ALL
SELECT
n + 1 FROM Nums WHERE n < @n
)
UPDATE CADList
SET FileNM = d.[Output]
FROM CADlist AS c
CROSS APPLY
(
SELECT LEFT(c.FileNM, LEN(c.FileNM) - MAX(x.n)) AS [Output]
FROM (SELECT TOP 3 n
FROM Nums AS n
WHERE SUBSTRING(LTRIM(REVERSE(c.FileNM)), n, 1) = ' '
AND n < LEN(c.FileNM)
ORDER BY n) AS x
) AS d


This can be improved by using the more efficient CTE as posted http://sqlreportingservicescrystalreports.blogspot.com/2009/12/auxilary-cte-of-numbers-sql.html
In my case, I'm only dealing with around 400 rows.

No comments: