Finding the Precision and recall between agent entered and automatically generated codes

Hi All,

Hope you are doing well!.. An encounter is indicated by the combination of ctextid and vbillid ...A human codes the encounter and the entered set of codes by the human is represented in the agent entered codes column in input table (Column C).. From these set of codes we need to consider only the codes present in the match table and remove the ones not present in the match table from the agent entered codes column in input table thus creating a new column agent entered matching codes . .. The ML Predicted codes column indicates the set of codes that are automatically generated ...

Between the agent entered matching codes column and the ML predicted codes column we need to find the set of intersecting codes ...

After that the metrics precision and recall are calculated as the following :

Precision -( Number of codes in matching codes column)/ (Number of codes in the ML predicted code column)*100

A code is separated from the other by a comma..

Recall -( Number of codes in matching codes column)/ (Number of codes in the agent entered matching code column)*100

Please find below the DDL for the input table ,match table and the output table ..Can you please help me here..

Input table

create table ##input
(ctextid int,
vbillid int,
agententeredcodes varchar(100000),
MLpredictedcode varchar(100000))

insert into ##input values
('711','475','H6692,J309','J309, H6692'),
('811','130','I714,I2510,I77810,Z9862,E785,I119,Z955,I639','I2510'),
('9021','900','Z00129,E669,Z6853','Z00129, Z7189, Z713'),
('9234','890','E039,I10,E559,R7303,Z760,Z6829,Z79899,Z79890','I10, E039, E559, R7303')

Match table

create table ##match
(codelist varchar (10000)
)

insert into ##match values
('J309'),
('H6692'),
('I2510'),
('E785'),
('I119'),
('I639'),
('Z6853'),
('I10'),
('E039'),
('E559'),
('Z79899'),
('R7303'),
('Z760'),
('Z6829')

##output

create table ##output
(ctextid int,
vbillid int,
agententeredcodes varchar(100000),
agententeredmatchingcodes varchar (100000),
MLpredictedcodes varchar(100000),
matchingcodes varchar(100000),
precision float,
recall float
)

insert into ##output values
('711','475','H6692,J309','H6692,J309','J309, H6692','J309, H6692','100','100'),
('811','130','I714,I2510,I77810,Z9862,E785,I119,Z955,I639','I2510,E785,I119,I639','I2510','I2510','100','25'),
('9021','900','Z00129,E669,Z6853','Z00129,E669','Z00129, Z7189, Z713','Z00129','33.3333333333333','50'),
('9234','890','E039,I10,E559,R7303,Z760,Z6829,Z79899,Z79890','I10,E039,E559,Z79899,R7303,Z760,Z6829','I10, E039, E559, R7303','I10, E039, E559, R7303','100','50')

Thanks,
Arun

  1. Is this for microsoft sql server? never seen varchar(100000) before.

  2. What is : The ML Predicted codes

  3. From these set of codes we need to consider only the codes present in the match table
    In that case agententeredcodes would be identical to agententeredmatchingcodes as well as matchingcodes

Hi @yosiasz

  1. Sorry its just varchar(1000)
  2. The ML predicted codes column in the input table are the codes that are automatically generated due to an algorithm
  3. So in the input table we have the column agententeredcodes...Also in the match table there are a set of codes ...So essentially the column agententeredmatching codes in the output table is the one that is got only by retaining the codes in the match table for every encounter and the other codes are removed for every encounter

cool. since we do not have access to the algorithm for ML predicted code you will have to do the calculation for Precision yourself. This might get you started. You have a lot of redundancy in some data points.

;with cals
as
(
 select  count(d.value) as agent_count,ctextid
   from ##input i
   cross apply string_split(i.agententeredcodes, ',') d
   group by ctextid
 ), mtc as 
 (
select distinct i.ctextid, 
       i.vbillid, 
	   i.agententeredcodes, 
       'algorithm' as agentmatchingcode,
       MLpredictedcode, 'same stuff' machinecodes
  from ##input i
 cross apply string_split(i.agententeredcodes, ',') d
 )
 select * , 
 	   (agent_count/agent_count) * 100  precision,
	   1 recall

   from mtc m
   join cals c on m.ctextid = c.ctextid