T-sql 2012 parse out multiple values in one field

In a t-sql 2012 database, I am suppose to count the number of times each 'canned' message is used for
Elementary students in the last school year and the current school year.

Right now I have the following sql that kind of works:

USE TEST
SELECT GS.Comments,Count(*) AS [Counts]
FROM dbo.Enrol Enrol
JOIN dbo.Student Student
ON Student.StudentID = Enrol.StudentID
JOIN dbo.GS GS
ON GS.StudentID = Student.Studentid
AND (GS.Comments IS NOT NULL)
AND (GS.Comments <> '')
where Enrol.grade in ('KG','01','02','03','04','05','06')
and Enrol.endYear between 2016 and 2017
Group by GS.Comments
order by Counts desc,GS.Comments asc

The problem is the GS.Comments field is defined as varchar(1200). There can be one message in the field and/or there can be lots of messages in this field. Each message ends with a period and there is a space between each message.

An example of multiple messages in the one GS.Comments field would look like the following:

The student is trying hard and needs to make their time more efficiently. This student is good at math. This student turns in their assignments on time. This student seems to enjoy school.

An example of when one messages is in the one GS.Comments field would look like the following:

This student seems to enjoy school.

Thus would showe me the t-sql 2012 logic that I can use when the GS.Comments field contains multiple messages and/or just one message so that I can count the number of times each unique message has been used?

You have different approachs :

  • use the replace - like in the first query - but this is counting the same message existing in different records

  • if your system have a split function , so you can split it on period character.

       DECLARE @vc_Text AS varchar(1200);
       SET @vc_Text = 'The student is trying hard and needs to make their time more efficiently. This student is good at math. This student turns in their assignments on time. This student seems to enjoy school.';
    
      DECLARE @vt_GS  TABLE
      ( idComments int  NOT NULL IDENTITY(1,1) PRiMARy KEY 
       ,Comments VARCHAR(1200) NOT NULL
       )
    
      INSERT INTO @vt_GS(Comments)
      VALUES(@vc_Text),('This student seems to enjoy school.')
    
      SELECT 
             LEN(Comments) - LEN(REPLACE(Comments,'.','')) AS numberOfMessages
      FROM
            @vt_GS
            
            
            -- here I replace the period character , for the last sentence 
      SELECT COUNT(DISTINCT REPLACE(MsgText,'.','')) AS numberOfUniqueMsgs
      FROM
      (
           SELECT
                     c.v.value('.','varchar(1200)') AS MsgText
           FROM
           (
                  SELECT 
                        CAST('<Comms><comm>' + REPLACE(Comments,'. ','</comm><comm>') + '</comm></Comms>' AS XML) AS New_XML
                  FROM
                        @vt_GS
           )A
           CROSS APPLY A.New_XML.nodes('Comms/comm') AS c(v)
      ) A
    

    numberOfUniqueMsgs
    4

http://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=04d3c63c54d6ea170f5b722fd890c7d8

Like I asked on the other forum, do you have a table that contains the list of "canned" messages?

1 Like

Are you able and willing to change the design you now have? your current design approach will eventually not scale, will have performance issues and in fact will have erroneous results guaranteed.

This is just a one time run. Thus performance is not that much of an issue. Can you show me the sql on how to accomplish my goal?

I need to count the number of times each message is used. I also need to see if the messages entered into this table actually exists on the 'control' tale. Can you show me how to accomplish this goal using t-sql 2012?

Yes. Can you post the CREATE TABLE for the "control" table?

Thanks for showing me the sql:

Here is the table with the canned 'comments' listed below:
CREATE TABLE [dbo].[GradingCommentTemplate](
[commentID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
[schoolID] [int] NULL,
[comment] varchar NULL,
[code] varchar NULL,
CONSTRAINT [PK_GradingCommentTemplate] PRIMARY KEY NONCLUSTERED
(
[commentID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 97) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[GradingCommentTemplate] CHECK CONSTRAINT [FK_GradingCommentTemplate_School]

ALTER TABLE [dbo].[GradingCommentTemplate] WITH NOCHECK ADD CONSTRAINT [FK_GradingCommentTemplate_School] FOREIGN KEY([schoolID])
REFERENCES [dbo].[School] ([schoolID])
ON DELETE CASCADE
GO

Notes:

  1. When I run the job, I will select the elementary schools by school id.
  2. It is a possibility that different elementary schools have the exact same message. For this request, I do not care if they have the same comments.
  3. The value for 'code' in the GradingCommentTemplate is not used anywhere in the database.
  4. The identity key of CommentID is only used in this table and does not refer to any other database tables. There are lots
    of tables in this database that have their commentID column with their own unique values.
    Here is a copy of the school table that is referenced above as a foreign key relationship:

CREATE TABLE [dbo].[School](
[schoolID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
[districtID] [int] NOT NULL,
[ncesSchoolID] varchar NULL,
[number] varchar NOT NULL,
[name] varchar NOT NULL,
[type] varchar NULL,
[comments] varchar NULL,
[address] varchar NULL,
[city] varchar NULL,
[state] varchar NULL,
[zip] varchar NULL,
[phone] varchar NULL
) ON [PRIMARY]

GO

number of times unique message has been used per student or for all students?

create table #studentcomments(studentname varchar(20), comment varchar(1200))

insert into #studentcomments
select 'yosiasz', 'The student is trying hard and needs to make their time more efficiently. This student is good at math. This student is good at math. This student turns in their assignments on time. This student seems to enjoy school.' union
select 'jassie', 'You got a lot to learn kid.' union
select 'shabbaranks', 'Ting-a-ling a ling, schoolbell a ring, knife and fork ah fight fi dumplin. Ting-a-ling-a-ling. This student is good at math.' 

 select count(comment), comment 
 From (
 SELECT Split.a.value('.', 'VARCHAR(100)') AS comment 
FROM (SELECT CAST ('<M>' + REPLACE(rtrim(ltrim(comment)), '.', '</M><M>') + '</M>' AS XML) AS comment ,
             studentname
		FROM #studentcomments
		where comment <> '' or comment is not null
	) A CROSS APPLY comment.nodes ('/M') AS Split(a)
 where (Split.a.value('.', 'VARCHAR(100)') <> '' and Split.a.value('.', 'VARCHAR(100)') is not null)
) a
group by comment

drop table #studentcomments
1 Like

This is the number of times a unique message has been used by all students in grades kindergarten through 6th grade.

Of the xml that is listed above, I have the following questions:

  1. I need compare the values against a control table. Thus can you show me how to join the above logic to a control table that have a field called comment declared as varchar(200)?
  2. Does the split command listed above work in t-sql 2012?
    3, can you explain the sql below:
    (SELECT CAST ('' + REPLACE(rtrim(ltrim(comment)), '.', '') + '' AS XML) AS comment ,
    studentname
    FROM #studentcomments
    where comment <> '' or comment is not null
    ) A CROSS APPLY comment.nodes ('/M') AS Split(a)
    I know in xml you need . However you are using this tag twice. is there a reason for using the M tag twice? If so, what is the reason? If not, are you just using this as a 'random' tag that you came up with?
  1. what have you tried for this? show me your code
  2. Have you tried it in sql-2012?
  3. what do you think it is doing? Try to explain it to me

I have tried the following code:
SELECT
P.RetVal AS Comments
,COUNT() AS [Counts]
FROM
dbo.Enrol Enrol
JOIN
dbo.Student Student ON Student.StudentID = Enrol.StudentID
JOIN
dbo.GS GS ON GS.StudentID = Student.Studentid
AND (GS.Comments IS NOT NULL)
AND (GS.Comments <> '')
CROSS APPLY (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('' + replace((Select replace(GS.Comments,'. ','§§Split§§') as [
] For XML Path('')),'§§Split§§','')+'' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
) P
WHERE
Enrol.grade IN ('KG', '01', '02', '03', '04', '05', '06')
AND Enrol.endYear BETWEEN 2016 AND 2017
GROUP BY
P.RetVal

I would also like to join a control table with a field called comments with varchar(200) and join it to the results listed above.
can you tell me how to setup this join