Help with this query needed

I have the following 2 tables which are linked by LINENUM and by TAG which is part of the FILENAME.

       TABLE a                      TABLE b
LINENUM       TAG           LINENUM      FILENAME
-------------------         --------------------------
   1         TAG-01            1      FN-TAG-01_130918
   1         TAG-02            1      FN-TAG-04_120918
   1         TAG-03            1      FN-TAG-09_130918
   2         TAG-04            2      FN-TAG-04_130918
   3         TAG-05            3      FN-TAG-05_110918

In our process FILENAME is typed manually so typos occur. I'm working on a tool to detect these errors and want something like this to be returned:

LINENUM       TAG          FILENAME           Check
-----------------------------------------------------------
   1         TAG-01        FN-TAG-01_130918      NULL
   1         TAG-02              NULL         FN-TAG-04_120918, FN-TAG-09_130918
   1         TAG-03              NULL         FN-TAG-04_120918, FN-TAG-09_130918
   2         TAG-04        FN-TAG-04_130918      NULL
   3         TAG-05        FN-TAG-05_110918      NULL

This is what I was trying:

;with Match as (
Select a.LINENUM, a.TAG, b.FILENAME
From a
    left outer Join b on a.LINENUM = b.LINENUM and b.FILENAME like '%'+a.TAG+'%'),

This gives the first 3 columns but the tricky part starts as I want to join only those records of Table b that haven't been linked by the first relationship. So I continue...

Orphans as (
Select LINENUM, FILENAME
From b
WHERE NOT EXISTS (SELECT FILENAME
                  FROM Match
                  WHERE Match.FILENAME = b.FILENAME))

and finally:

Select Match.*
From Match
    left outer join Orphans on Match.LINENUM = Orphans.LINENUM and Match.FILENAME is NULL

At that point the Records for LINENUM 1 and the unmatched FILENAME double and everything gets more complicated, beside the fact that applying this to my real case the query is executing extremly slow. So I probably need a completely different approach.

Any suggestions?


OK, after having declared Orphans the final Select statement would be:
Select Match.*, a.Comment
From Match
     left outer Join (Select LINENUM, stuff((select ', ' + FILENAME from Orphans for xml path('')),1,1,'') AS Comment
	              From Orphans
                      Group by LINENUM)a on a.LINENUM = Match.LINENUM and Match.FILENAME is NULL

This returns the desidered table, but with a very bad performance when applied to my actual tables.
Anybody has a smarter idea?

trying to duplicate your problem

when i try to run
your
final select it fails ????

Select Match.*, a.Files
From Match
     left outer Join (Select LINENUM, stuff((select ', ' + FILENAME from Orphans for xml path('')),1,1,'') AS Check
	              From Orphans
                      Group by LINENUM)a on a.LINENUM = Match.LINENUM and Match.FILENAME is NULL

This

This is the complete example:

IF OBJECT_ID('tempdb..#A') IS NOT NULL
DROP TABLE #A

IF OBJECT_ID('tempdb..#B') IS NOT NULL
DROP TABLE #B

CREATE TABLE #A ( LINENUM int, TAG NVARCHAR(10))
CREATE TABLE #B ( LINENUM int, FILENAME NVARCHAR(30))

INSERT INTO #A (LINENUM, TAG) VALUES (1, 'TAG-01')
INSERT INTO #A (LINENUM, TAG) VALUES (1, 'TAG-02')
INSERT INTO #A (LINENUM, TAG) VALUES (1, 'TAG-03')
INSERT INTO #A (LINENUM, TAG) VALUES (2, 'TAG-04')
INSERT INTO #A (LINENUM, TAG) VALUES (3, 'TAG-05')

INSERT INTO #B (LINENUM, FILENAME) VALUES (1, 'FN-TAG-01_130918')
INSERT INTO #B (LINENUM, FILENAME) VALUES (1, 'FN-TAG-04_120918')
INSERT INTO #B (LINENUM, FILENAME) VALUES (1, 'FN-TAG-09_130918')
INSERT INTO #B (LINENUM, FILENAME) VALUES (2, 'FN-TAG-04_130918')
INSERT INTO #B (LINENUM, FILENAME) VALUES (3, 'FN-TAG-05_110918')

;with Match as (
Select a.LINENUM, a.TAG, b.FILENAME
From #A a
    left outer Join #B b on a.LINENUM = b.LINENUM and b.FILENAME like '%'+a.TAG+'%'),

Orphans as (
Select LINENUM, FILENAME
From #B b
WHERE NOT EXISTS (SELECT FILENAME
                  FROM Match
                  WHERE Match.FILENAME = b.FILENAME))

Select Match.*, a.Comment
From Match
     left outer Join (Select LINENUM, stuff((select ', ' + FILENAME from Orphans for xml path('')),1,1,'') AS Comment
	              From Orphans
                      Group by LINENUM)a on a.LINENUM = Match.LINENUM and Match.FILENAME is NULL

hi barnabeck

  1. one idea is to add indexes to make it fast

  2. idea two is to break down the data using row_number() over()
    and then processing half the data at a time
    means if you have 1 million rows divide into two halves
    and process each half at a time

  3. I got some ideas while doing your query and made some changes

In the Same Table another column Match with Orphans
IF Object_id('harish_sample..A') IS NOT NULL 
    DROP TABLE a 

IF Object_id('harish_sample..B') IS NOT NULL 
  DROP TABLE b 

CREATE TABLE a 
  ( 
     linenum INT, 
     tag     NVARCHAR(10) 
  ) 

CREATE TABLE b 
  ( 
     linenum  INT, 
     filename NVARCHAR(30) 
  ) 

INSERT INTO a 
            (linenum, 
             tag) 
VALUES      (1, 
             'TAG-01') 

INSERT INTO a 
            (linenum, 
             tag) 
VALUES      (1, 
             'TAG-02') 

INSERT INTO a 
            (linenum, 
             tag) 
VALUES      (1, 
             'TAG-03') 

INSERT INTO a 
            (linenum, 
             tag) 
VALUES      (2, 
             'TAG-04') 

INSERT INTO a 
            (linenum, 
             tag) 
VALUES      (3, 
             'TAG-05') 

INSERT INTO b 
            (linenum, 
             filename) 
VALUES      (1, 
             'FN-TAG-01_130918') 

INSERT INTO b 
            (linenum, 
             filename) 
VALUES      (1, 
             'FN-TAG-04_120918') 

INSERT INTO b 
            (linenum, 
             filename) 
VALUES      (1, 
             'FN-TAG-09_130918') 

INSERT INTO b 
            (linenum, 
             filename) 
VALUES      (2, 
             'FN-TAG-04_130918') 

INSERT INTO b 
            (linenum, 
             filename) 
VALUES      (3, 
             'FN-TAG-05_110918'); 

IF Object_id('harish_sample..match') IS NOT NULL 
  DROP TABLE match
  
SELECT a.linenum, 
                a.tag, 
                b.filename ,
                case when b.filename is not null then 'match' else 'orphans' end as match 
                into match
         FROM   a a 
                LEFT OUTER JOIN b b 
                             ON a.linenum = b.linenum 
                                AND b.filename LIKE '%' + a.tag + '%' 
select * from match;