SQLTeam.com | Weblogs | Forums

Bad Data Entry in a huge database


#1

Hi Someone enter a NVarchar (characters) into a field that ONLY takes Datetime. How can I find what row and What table is the Data Entry Error?

Thanks,


#2

You can start with the following:

SELECT * FROM YourTable 
WHERE ISDATE(YourNVarcharColumn) = 0

However, that may return rows where the M/D/Y setting does not match the sessions M/D/Y setting. But, that should give you something to start with. If only a few rows are returned, then you can work with that.

After you fix the problem, consider the feasibility of changing the data type to DATETIME.


#3

Is that column included in an index? If not, checking it will cause a full table scan.

Do you have any idea of when the bad data was inserted? Maybe that could help you reduce the amount of the table you have to search.


#4

I know we all get stuck with database designs that someone else made and that we often times have no control over the physical structure. However, I am curious as to why NVARCHAR is being used to store the date/time value?

Aside from the obvious issues with doing that, using NVARCHAR is going to take double the amount of space for the date/time as opposed to using just VARCHAR. If you are only storing the numbers and delimiters in the column, using NVARCHAR is a waste of space. If you are storing the language specific values (like Monday or July) for the month/day IN the NVARCHAR column, that seems to me to be a serious problem.

The OP indicated that the database is huge. If you have any control over the data types, getting rid of the NVARCHAR column and using a valid date/time datatype would be preferred for many different reasons. Even if you are storing ONLY the date in a known format (e.g., YYYY-MM-DD) it would be better as a CHAR column than a VARCHAR or even worse, an NVARCHAR column.

I am sure someone out there is able to articulate the differences with the size requirements for the different date types and can illustrate the disk space savings between date/time vs CHAR vs VARCHAR vs NVARCHAR. When you are dealing with Millions or Billions or rows, the savings in space can really add up.


#5

I did a little more digging and some calculations (very rough) on the disk space requirements for the differences between the different data types (char VS varchar vs nvarchar vs datetime) and the numbers really add up as you add more rows.

Storing the data in a date/time data type vs a NVARCHAR datatype is going to save you upwards of five times the amount of disk space for ONLY that column. That doesn't include any indexes on the column and certainly doesn't address the performance of trying to get data out of that value.

Again, VERY rough numbers but in the general ballpark. Enough to illustrate the point that datatype choice makes a big impact on the database.

I would like to give credit to a couple articles I found on-line related to this:

Data Type Space Requirements
Always excellent Aaron Bertrand and Bad Habits to Break


#6

That's impossible. If the column is defined as type datetime, SQL will only allow valid date/time values. Or, do you mean that the column is defined as nvarchar, but you use it to store datetime values? If so, why is it defined as nvarchar?

If that's the case, and you are running SQL 2012 or later, you could use TRY_CONVERT() to find the errors:

TRY_CONVERT