

(There are ways to get that working but that is out of the scope of this article.) If you don’t know what regular expressions are, and want to learn, you can look here.

SQL Server does not support regular expressions natively. Some of you may be familiar with regular expressions. Is there a way to find character strings that do not match (or do match) what we consider a valid character string? Complex Pattern Matching with LIKE Operator As we stated before, the LEN() and DATALENGHT() (or DATALENGHT()/2 for UNICODE) of a proper character string is the same. Technically what is left is a proper character string. Looking over the query you can see that we needed to trim leading spaces with LTRIM() (LEN() trimmed the trailing spaces for us as part of its behavior). If you were searching a UNICODE character expression you would replace the DATALENGTH(THE_TEXT) in the where clause with DATALENGTH(THE_TEXT)/2 WHERE LEN(LTRIM(THE_TEXT)) DATALENGTH(THE_TEXT) ,LEN(LTRIM(THE_TEXT)) AS N'LEN() OF TRIMMED' The DATALENGTH is 2X that of the ASCII for the Unicode character string.Īrmed with that knowledge, let’s write a query that will locate all of the strings that have a space at the beginning and the end. ,DATALENGTH(N'Zombieland') AS 'DATALENGTH() UNICODE' Īs you can see, all things being equal, and with no trailing spaces, LEN() & DATALENGTH() will return the same number for ASCII and UNICODE character expressions. ,DATALENGTH('Zombieland') AS 'DATALENGTH()' For UNICODE character strings, which use 2 bytes per character, DATALENGTH() is 2X the LEN(). So for ASCII character strings that use 1 byte per character, the LEN and DATALENGTH() should be equal. The DATALENGTH() function tells you the number of bytes used to make a character string. It is not mentioned in the Microsoft Technet article and it is not very intuitive. I put that in bold and italics for a reason. The LEN() function gives you the number of characters in a character string but it trims the trailing empty spaces. Using LEN()/DATALENGTH() to find leading and trailing spaces Let’s investigate a couple of methods in which to find out what is happening in these strings. I will discuss this in a bit more detail later. For example: are stings case insensitive or not (is "ZombieLand" = "zombieland"?) or are a string’s accent sensitive or not (is Ça = Ca).
WILD 9 CHARACTERS HOW TO
Collation is SQL Server’s way of knowing how to sort and compare data. There is however some basic things you should know. If you need further information here is a good place to start.
WILD 9 CHARACTERS FULL
CollationĪ full examination of collation is out of the scope of this article. This means the column is case and accent sensitive. One last thing, the collation of the THE_TEXT is Latin1_General_CS_AS. This particular error/issue will come up later. It has multiple spaces between “Zombie” and “Land”. I created character expressions with spaces at the beginning and end as well as added special characters (“/”,”#”) in the middle. ('ACCENTS','Régie du logement - Gouvernement du Québec'),Īlso, look at a number of the problems. ('SPECIAL CHARACTERS MULTIPLE SPACES','35 Zombie Land Apt 13'), ('SPECIAL CHARACTERS \','This and\or that'), ('SPECIAL CHARACTERS /','This and/or that'), ('SPECIAL CHARACTERS - # SYMBOL','35 Zombie Land Apt. ('SPECIAL CHARACTERS - APOSTROPHE','Zombie''s Land'), ('SPECIAL CHARACTERS - CARRAGE RETURN LINE FEED','Zombieland' + CHAR(13) + CHAR(10)), ('SPACE AT START AND END', N' Zombieland '), ,THE_TEXT NVARCHAR(60) COLLATE Latin1_General_CS_AS IF (SELECT OBJECT_id(N'TEST_DATA','U')) IS NOT NULL

So the “EXAMPLE_TYPE” can be “SPACE AT END” and “THE_TEXT” has a piece of text with a space at the end (‘Zombieland ‘). The first column is a description of the key component we are looking at in the 2 nd column.

The script below will create a table with 2 columns. What I do is run them against the data every morning and send out emails to those responsible so they can correct them before the issue starts to manifest. Over the years I have compiled a small library of queries that I hope you will find as useful as I have. The reality is that even if you have a great deal of control over the data, this stuff always sneaks in. These issues manifest as odd characters on a report, off center titles, screens not opening, orders failing to get created and the list goes on. In addition many times the errors associated with the problem have nothing to do with identifying the character, but are related to the issue it is causing. When working with multiple source systems wouldn't it be nice if everyone could agree on what characters were acceptable. Special characters are one of those necessary evils.
