SQLTeam.com | Weblogs | Forums

T-sql 2012 parse out message when appropriate

sql-server-2008
sql2012
sql2008r2

#1

In a t-sql 2012 database, I am suppose to count the number of times each 'canned' message is used for
Elementary students in the last school year and the current school year.

The following is the sql that I currently am using:
Declare @Delimiter char(2) = '. ';

;With GetTheComments(Comments) As
(SELECT GS.Comments
FROM dbo.Enrol Enrol
JOIN dbo.Student Student
ON Student.StudentID = Enrol.StudentID
JOIN dbo.GS GS
ON GS.StudentID = Student.Studentid
AND (GS.Comments IS NOT NULL)
AND (GS.Comments <> '')
where Enrol.grade in ('KG','01','02','03','04','05','06')
and Enrol.endYear between 2016 and 2017)

),
Pieces (Comments, start, stop) AS (
SELECT Comments, CAST(1 AS bigint) As start, CAST(CHARINDEX(@Delimiter, Comments) AS bigint) As stop
From GetTheComments
UNION ALL
SELECT Comments, CAST(stop + Len(@Delimiter) As bigint) As start, CAST(CHARINDEX(@Delimiter, Comments, stop + Len(@Delimiter)) AS bigint) As stop
FROM Pieces
WHERE stop > 0),

EachComment(Comments) As
(SELECT LTrim(SUBSTRING(Comments, start, CASE WHEN stop > 0 THEN stop-start ELSE LEN(Comments) END)) AS Comments
FROM Pieces)

Select Comments, Count(*) As Counts
From EachComment
where Comments <> ''
Group By Comments
Order By Counts Desc, Comments Asc;
An example of multiple messages in the one GS.Comments field would look like the following:
This student is good at math. This student turns in their assignments on time. This student seems to enjoy school.

An example of when one messages is in the one GS.Comments field would look like the following:

This student seems to enjoy school

The problem is the GS.Comments field is defined as varchar(1200). There can be one message in the field and/or there can be lots of messages in this field. Each message ends with a period and there is a space between each message that have
more than 1 message in the varchar(1200) field. When there is only 1 message, there is no period afer the message.

The problem is there can be results that look like the following

message count
This student seems to enjoy school. 150
This student seems to enjoy school 25
.
Basically there is a count for a message with a period and there is a couunt for when there is not a period for the same message.

Thus can you show me what I can do with the sql listed above so the messages are treated as being the same. Basically I want 1 message with 1 count for what I listed above.


#2

try a case statement
case when message count like '%This student seems to enjoy school%' then 1 else 0 end as msgcount


#3

probably more efficient (might be splitting hairs though!) to just do

AND (GS.Comments <> '')

that will:

  • Exclude where GS.Comments IS NULL (as a side effect)
  • Exclude where GS.Comments is one, or many, spaces.

You may find that Jeff Moden's SQL 8K “CSV Splitter” Function is more efficient than your inline code, but if you don't have a performance problem I wouldn't change it. Most splitters like Jeff's assume a single character delimiters, so you'd need to REPLACE your period-space first 9with a single-char delimiter), or just use a PERIOD as the delimiter and LTRIM the resulting data (which might have leading spaces anyway ... perhaps?)

Presupposes none of the comments contain a period within their text though.

Personally I would prefer to have a design that had the comments in a separate one-per-row child table, rather than concatenated into a single column.

Dunno if this is any help?

SELECT	X.CountStart
	, [Left] = ']'+LEFT(MyString, X.CountStart) +'['
	, [Right] = ']'+STUFF(MyString, 1, X.CountStart, '') +'['
	, MyString
FROM
	(
		SELECT	[MyString] = 'This student seems to enjoy school. 150'
		UNION ALL SELECT 'This student seems to enjoy school 25'
		UNION ALL SELECT 'Trailing space 123 '
		UNION ALL SELECT 'This has no numeric at the end'
		UNION ALL SELECT ''	-- Blank string
		UNION ALL SELECT NULL
	) AS T
	CROSS APPLY
	(
		SELECT	[CountStart] = LEN(RTrim(MyString)) - PATINDEX('%[^0-9]%', REVERSE(RTrim(MyString))) + 1
	) AS X
CountStart  Left                                      Right
----------- ----------------------------------------- -----
36          ]This student seems to enjoy school. [    ]150[
35          ]This student seems to enjoy school [     ]25[
15          ]Trailing space [                         ]123 [
30          ]This has no numeric at the end[          ][
1           ][                                        NULL
NULL        NULL                                      NULL