Show difference between records in master file and submitted records

Greetings experts,

I am really confused about the results I am getting when I run the following query:

    SELECT Name, COALESCE(NULLIF(workemail, ''), homeemail) AS empEmail
    FROM MasterList ml 
    WHERE NOT EXISTS ( 
                  SELECT email 
                  FROM SubmittedList sl inner join DateDetails dd on sl.employeeID = dd.employeeID
                  WHERE Year(dd.datecreated) = Year(getDate()) and sl.email IN (ml.workemail, ml.homeemail))
    ORDER BY Name ASC

First, some background. We were given what we call a master list of employees who have been tasked with providing some useful feedback by taking a survey.

This list contains a total of 340 employees.

We loaded this list into our database called MasterList

Then we have a web application that records the names, email addresses and phone numbers of the employees who have taken this survey and stores this information in a table called SubmittedList.

Finally, we are asked to provide weekly list of those who have taken the survey.

From a logical point of view, the only relationship between the MasterList and SubmittedList is email address.

The MasterList has two columns, workEmail and HomeEmail.

The query above looks at WorkEmail column first. If the employee is using his/her work email, grab that.

Otherwise, grab the home email.

Both WorkEmail and HomeEmail columns cannot both be blank at same time.

The query compares the emails on the SubmittedLit table with the emails on the MasterList table.

If there is a match, then an employee has submitted the survey.

If there is no match, then the employee has not submitted the survey.

Then the list of the employees who have not submitted the survey is to be sent to management.

The issue we are having currently, is when the first part of the query that checks list of records based on existence of either workemail or homeemail, I get the correct result, say for instance 340 records.

    SELECT Name, COALESCE(NULLIF(workemail, ''), homeemail) AS empEmail
    FROM MasterList ml 

When I run the bottom half of the query based on the existence of email addresses within the range date range of submission, I get the correct results, for instance, 220.

                  SELECT email 
                  FROM SubmittedList sl inner join DateDetails dd on sl.employeeID = dd.employeeID
                  WHERE Year(dd.datecreated) = Year(getDate()) and sl.email IN (ml.workemail, ml.homeemail))

However, when I run the query I posted at the beginning, the total number of those who have not taken the survey shows 20 more users have taken the survey.

We could not find any duplications.

Can you please tell me if there is something wrong with my query comparing the difference between SubmittedList and MasterList based on the existence of email addresses on both tables?

Did I approach the query incorrectly?

Thank for your assistance in advance

I take it it's possible the either/both of the original emails can be NULL. In that case, you would need to do:


SELECT Name, COALESCE(NULLIF(workemail, ''), homeemail) AS empEmail
    FROM MasterList ml 
    WHERE NOT EXISTS ( 
                  SELECT email 
                  FROM SubmittedList sl inner join DateDetails dd on sl.employeeID = dd.employeeID
                  WHERE dd.datecreated >= Dateadd(Year, Datediff(Year, 0, GETDATE()), 0)
                      and dd.datecreated < Dateadd(Year, Datediff(Year, 0, GETDATE()) + 1, 0) 
                      and sl.email IN (ISNULL(ml.workemail, ''), --<<--
                                                ISNULL(ml.homeemail, '')) --<<--
                    )
    ORDER BY Name ASC

@Chikwe

The problem with NULL(s) is that they will slip through the NOT EXISTS if you use "IN (<may_be_null_value>, ...)". Getting rid of NULL values in the IN should prevent false rows from coming thru in the result.

Hi

hope this helps

have rewritten your query using left join

Create Sample Data Script
DROP TABLE #masterlist

DROP TABLE #submittedlist

DROP TABLE #datedetails

CREATE TABLE #masterlist
  (
     employeeid INT,
     NAME       VARCHAR(20),
     workemail  VARCHAR(40) NULL,
     homeemail  VARCHAR(40) NULL,
     phone      INT
  )

INSERT INTO #masterlist
SELECT 1234,
       'Tom',
       'tom@workemail.com',
       'tomc@homeemail.com',
       222222

INSERT INTO #masterlist
SELECT 5555,
       'Pam',
       'Pam@workemail.com',
       NULL,
       66666666

INSERT INTO #masterlist
SELECT 3434,
       'Sam',
       NULL,
       'Sam@homeemail.com',
       77777

CREATE TABLE #submittedlist
  (
     employeeid INT,
     email      VARCHAR(40)
  )

INSERT INTO #submittedlist
SELECT 1234,
       'tom@workemail.com'

INSERT INTO #submittedlist
SELECT 5555,
       'Sam@homeemail.com'

CREATE TABLE #datedetails
  (
     employeeid  INT,
     datecreated DATE
  )

INSERT INTO #datedetails
SELECT 1234,
       Getdate() - 5

INSERT INTO #datedetails
SELECT 5555,
       Getdate() - 10

INSERT INTO #datedetails
SELECT 3434,
       Getdate() - 15

SELECT 'Not Taken Survey',
       a.email
FROM   (SELECT COALESCE(NULLIF(workemail, ''), homeemail) AS email
        FROM   #masterlist) a
       LEFT JOIN (SELECT email
                  FROM   #submittedlist sl
                         INNER JOIN #datedetails dd
                                 ON sl.employeeid = dd.employeeid
                  WHERE  Year(dd.datecreated) = Year(Getdate())) b
              ON a.email = b.email
WHERE  b.email IS NULL

image

SELECT 'Taken Survey',
       a.email
FROM   (SELECT COALESCE(NULLIF(workemail, ''), homeemail) AS email
        FROM   #masterlist) a
       LEFT JOIN (SELECT email
                  FROM   #submittedlist sl
                         INNER JOIN #datedetails dd
                                 ON sl.employeeid = dd.employeeid
                  WHERE  Year(dd.datecreated) = Year(Getdate())) b
              ON a.email = b.email
WHERE  b.email IS NOT NULL

image

hi

i ran your query on my sample data set

SELECT Name, COALESCE(NULLIF(workemail, ''), homeemail) AS empEmail
    FROM #MasterList ml 
    WHERE NOT EXISTS ( 
                  SELECT email 
                  FROM #SubmittedList sl inner join #DateDetails dd on sl.employeeID = dd.employeeID
                  WHERE Year(dd.datecreated) = Year(getDate()) and sl.email IN (ml.workemail, ml.homeemail))
    ORDER BY Name ASC

image

Your query is working fine in my sample data set

I think
you will have to pay attention to the data

i mean submitted could have data not present in master

or

datedetails could be filtering out data from submitted with the year clause
which could be the issue

Please let me know

Thanks

@harishgg1 ,

Thank you and Scott Fletcher very much for the assistance,

Given that your query, Scott's query and a sleuth of many queries I have ran on this, they are all coming back with same results.

Like you said @harishgg1 , it is possible that the excel file containing the master list must have changed at some point and therefore, is no longer matching up with our calculations with various queries.

I have asked the owners of the app that we built, to compare the data we gave them as the list of those who have not taken the survey, compare that list against the master list including the original master list they started with and see if they see the difference in numbers.