Sumproduct / Count Distinct Sql script?

Hi all, apologies for the potential incorrect category but was hoping someone can help basically I've got a query dataset and I'm after an extra column which basically counts the number of distinct columns but only once. In Excel I use the sum product line to do it but not sure how to do it in sql.

For example, if I have a dataset of 400000 records with a variety of dates, booked or not booked, person and morning or afternoon. I want a sql query to run through and put 0 or 1 so that I have a distinct count of unique records based on the morning/afternoon, date and person whilst retaining all 400000 records

Not sure if I'm making sense but not sure how to explain it in any other manner.

Thank you for time reading this :slight_smile:

Your question is not entirely clear to me, but you can use the COUNT function along with GROUP BY clause to find the counts. An example below:

select 
	Date,
	BookingStatus,
	count(*)  AS RecordsPerDateAndBookingStatus
from YourTable 
Group by
	Date,
	BookingStatus

My recommendation would be to look at native SQL methods rather than trying to replicate the behavior/logic that Excel implements. For data manipulation, SQL Server is more powerful and flexible, at least most of the time.

1 Like

Thanks James for your response :smiley:. I thought my question didn't make sense but hopefully with the attached it may make it clearer as to what I'm hoping to achieve (the attached picture is an extract from my SQL script - I've just been copying it into excel to be able to run that formula (formula also included - I've just assumed that's the easiest way for me to get a distinct column to show unique dates by person and type)

All I'm wanting it to do is keep the number of records but just have an extra column which has 0 or 1 for distinct records based on three criteria's (Name + Date and Session (Morning/Afternoon))

DISTINCT records exist, presumable your Match = "0" is where no record exists?

So you would need a full list of all Name, Type, Date, Timing possibilities, and then a 1/0 for whether there is a row, or not?

Do you have a "Timing" column with a "Morning" / "Afternoon" value, or do just just have a "Time" value (which therefore needs converting to Morning / Afternoon)?

1 Like

Hi Kristen,

Thank you for taking time to read and respond - much appreciated :smiley:

I do have a time value converted as varchar (sorry the picture is just an extract/example of what I'm trying to achieve), so i have a SQL statement saying:

iif(Aptime > 1259, 'Afternoon', 'Morning)

If I have Person A, 12/03/2017, Morning Session (14 records different times) I would want the first row (earliest time) to be flagged as 1 and then the other 13 flagged as 0.

In excel, running the sumproduct formula makes my machine completely freeze and sometimes unresponsive. Which is why, as the data comes from SQL, i thought it would help if i could get this flagged straight through SQL

Thank you

Maybe this:

select [name]
      ,[type]
      ,[date]
      ,timing
      ,case when rn=1 then 1 else 0 end as match
  from (select [name]
              ,[type]
              ,[date]
              ,case when aptime>1259 then 'Afternoon' else 'Morning' end as timing
              ,row_number() over(partition by [name]
                                             ,[date]
                                             ,case when aptime>1259 then 1 else 0 end
                                 order by aptime
                                )
               as rn
          from yourtable
       ) as a
;

Thanks for the assistance, I tried to fit that into the script however kept failing... maybe because the example above was an example - here is the actual script (apologies for the inappropriate use of iif as opposed to case etc.. (newbie at work!)

Thanks

select b.ID,b.[Clino],[Past/Future],[Specialty_Clinic],[Conscode_Clinic],

[Consultant_Clinic],[Locspec_Clinic],[ClinLoc],[ClinDate],

CONVERT(datetime,[clindate],112) as 'Date',datepart(WK,CONVERT(datetime,[clindate],112)) as 'Week',datename(dw,CONVERT(datetime,[clindate],112)) as 'Day',

b.[ApTime],iif(b.[aptime]>'1259','Afternoon','Morning') as 'Session Type',c.[Attime],c.[Seentime], [Duration],[Slot],[Booked],[Empty],b.[Unitno],[ReasonableOffer],

[ForceBookOnly],[OvbkSlot],[Bkdover],[CBSlot],[CBBkd],b.[Refsrce],[Attendance],c.Walkin,[Cancelled],[VisitType],b.schdl_refno,

a.speccode,

a.division,

c.attended_dna_cds,

c.First_Attendance,

iif(c.[first_attendance]='1','New',iif(c.[first_attendance]='2','Follow Up',iif(c.[First_attendance]='3','Telephone',iif(c.[first_attendance]='4','Telephone','Empty')))) as TYPE,

[Booked] * [duration] as APPOINTMENTTIME

from

[daily_builds].[dbo].[Daily_Demand_and_Capacity] b

left join information.dbo.specialty a on b.Locspec_clinic = a.speccode

left join [final].dbo.final_op C on b.schdl_refno = c.Schdl_Refno

where (a.Division = 'B') and ([clindate] >= '20160301' and [clindate] <= '20170228')

This is essentially and gaps & islands type of issue - and you can use several different methods for identify them. Using LEAD/LAG would be one of the simpler methods in my opinion.

Since you have a calculation for the 4th column - it gets a bit tricky to structure...something like this:

   With dailyDemand
     As (
 Select b.ID
      , b.[Clino]
      , [Past/Future]
      , [Specialty_Clinic]
      , [Conscode_Clinic]
      , [Consultant_Clinic]
      , [Locspec_Clinic]
      , [ClinLoc]
      , [ClinDate]
      , convert(datetime, [clindate], 112) As 'Date'
      , datepart(WK, convert(datetime, [clindate] , 112)) As 'Week'
      , datename(dw, convert(datetime, [clindate] , 112)) As 'Day'
      , b.[ApTime]
      , iif(b.[aptime] > '1259' , 'Afternoon' , 'Morning') As 'Session Type'
      , c.[Attime]
      , c.[Seentime]
      , [Duration]
      , [Slot]
      , [Booked]
      , [Empty]
      , b.[Unitno]
      , [ReasonableOffer]
      , [ForceBookOnly]
      , [OvbkSlot]
      , [Bkdover]
      , [CBSlot]
      , [CBBkd]
      , b.[Refsrce]
      , [Attendance]
      , c.Walkin
      , [Cancelled]
      , [VisitType]
      , b.schdl_refno
      , a.speccode
      , a.division
      , c.attended_dna_cds
      , c.First_Attendance
      , iif(c.[first_attendance] = '1', 'New', iif(c.[first_attendance] = '2', 'Follow Up'
        , iif(c.[First_attendance] = '3', 'Telephone', iif(c.[first_attendance] = '4' , 'Telephone' , 'Empty')))) As Type
      , [Booked] * [duration]  As APPOINTMENTTIME
   From [daily_builds].[dbo].[Daily_Demand_and_Capacity] b
   Left Join information.dbo.specialty a On b.Locspec_clinic = a.speccode
   Left Join [final].dbo.final_op C On b.schdl_refno = c.Schdl_Refno
  Where (a.Division = 'B')
    And ([clindate] >= '20160301' And [clindate] <= '20170228')
        )
 Select *
      , iif(d.[Session Type] <> lag(d.[Session Type], 1, 1) over(Partition By d.Name, d.Type, d.Date Order By d.ApTime), 1, 0) As Match
   From dailyDemand d

Replace the partition columns with the appropriate columns from your actual query and you should get the desired result. I cannot test without actual data to validate...

I moved your query into a CTE so I could reference the computed columns without having to repeat the calculation - which simplifies the LAG function so I can insure the LAG returns a 1 when there isn't a previous row.

1 Like

Hi Jeff,

Thank you ever so much that worked perfect - I only need to make one adjustment (which doesn't seem to work :(!)

Select *
      , iif(Booked = 1, iif(d.[Session Type] <> lag(d.[Session Type], 1, 1) over(Partition By d.Name, d.[Session Type], d.Date Order By d.ApTime), 1, 0),0) As Match
   From dailyDemand d

The records are pulling out correctly, however, its not flagging the afternoon session as a new one. For example, ten records with Person A/13th March 2017/ 5 morning and 5 afternoon - the result in match column is 1 (ideally I'd like 1 in the morning and 1 in the afternoon flagged with the earliest apptime)

I removed type and changed it to session type but no joy :confused:

Thanks for all your help :slight_smile:
Thanks

You changed the partition so the last one is the Date - make the last part of the partition the [Session Type] instead. If that still doesn't work - try including the session type in the order by (Order By [Session Type], d.ApTime).

As you have it now - the 1 row is going to be the first row for Morning for that Date - and the first row for Afternoon for that Date instead of the first Morning for the Date and first Afternoon for the Date.

1 Like