SQLTeam.com | Weblogs | Forums

Trying to get only first instance of Row for Field TrafficMedium


#1

I would like to see my final query results as per below output.

What I have now is below query where I get 2 instances of traffic Medium "Display" but I need only the first instance

I also need an Unassigned value for any Period Start Date which has a Signupdate which is < than Period Start date

**Currently I have below query where I have a self join to itself to get 2 rows of data next to each other but need help to get to the final result. Please advise.

Thanks in advance

Select
C.Eventid, C.AffiliateId, C.AffiliateLogin, C.SignupDate, C.TrafficMediumLabel, C.PeriodStartDate, C.PeriodEndDate, C.TimeStamp, RN
INTO #TempTbl1
from (
Select X.Eventid,X.AffiliateId, X.AffiliateLogin, X.SignupDate, X.TrafficMediumLabel, X.PeriodStartDate, X.PeriodEndDate, X.TimeStamp, ROW_NUMBER()OVER( ORDER BY X.Timestamp ) AS RN
from (
Select A.Eventid, A.AffiliateId, A.AffiliateLogin, A.SignupDate, A.TrafficMediumLabel, A.PeriodStartDate, A.PeriodEndDate, A.TimeStamp
from AnalyticsEventsAffiliateTraffic A
JOIN (
Select AffiliateId, AffiliateLogin, SignupDate, PeriodStartDate, MAX(TimeStamp) [MaxChange]
from AnalyticsEventsAffiliateTraffic where AffiliateLogin = 'sknilllc'
GROUP BY AffiliateId, AffiliateLogin, SignupDate, PeriodStartDate
) B ON B.PeriodStartDate = A.PeriodStartDate AND B.MaxChange = A.TimeStamp
where
A.AffiliateLogin = 'sknilllc') X
) C

SELECT *

FROM (

SELECT 
	A.AffiliateId, A.AffiliateLogin, CAST(A.SignupDate AS DATE) [SignupDate],
	A.TrafficMediumLabel [P1Medium], A.PeriodStartDate [P1StartDate], A.PeriodEndDate [P1EndDate], A.RN [P1Row], B.RN [P2Row],
	B.TrafficMediumLabel [P2Medium], B.PeriodStartDate [P2StartDate], B.PeriodEndDate [P2EndDate]
FROM #TempTbl1 A

FULL OUTER JOIN #TempTbl1 B ON B.RN = A.RN- 1

) X

DROP TABLE #TempTbl1


#2

Not sure why you have things nested like this. There's only one source table that I can see:

AnalyticsEventsAffiliateTraffic

However if you are getting multiple results where you expect single, the problem is probably with the JOIN. It may not be specific enough.

Try running the individual sections (from the inside out) to see where the extra rows come in (or don't disappear as you think they should.

BTW, here's your query reformatted for better readability:

SELECT C.Eventid
      , C.AffiliateId
      , C.AffiliateLogin
      , C.SignupDate
      , C.TrafficMediumLabel
      , C.PeriodStartDate
      , C.PeriodEndDate
      , C.TIMESTAMP
      , RN
INTO #TempTbl1
FROM (
      SELECT X.Eventid
            , X.AffiliateId
            , X.AffiliateLogin
            , X.SignupDate
            , X.TrafficMediumLabel
            , X.PeriodStartDate
            , X.PeriodEndDate
            , X.TIMESTAMP
            , ROW_NUMBER() OVER (
                  ORDER BY X.TIMESTAMP
                  ) AS RN
      FROM (
            SELECT A.Eventid
                  , A.AffiliateId
                  , A.AffiliateLogin
                  , A.SignupDate
                  , A.TrafficMediumLabel
                  , A.PeriodStartDate
                  , A.PeriodEndDate
                  , A.TIMESTAMP
            FROM AnalyticsEventsAffiliateTraffic A
            INNER JOIN (
                  SELECT AffiliateId
                        , AffiliateLogin
                        , SignupDate
                        , PeriodStartDate
                        , MAX(TIMESTAMP) [MaxChange]
                  FROM AnalyticsEventsAffiliateTraffic
                  WHERE AffiliateLogin = 'sknilllc'
                  GROUP BY AffiliateId
                        , AffiliateLogin
                        , SignupDate
                        , PeriodStartDate
                  ) B
                  ON B.PeriodStartDate = A.PeriodStartDate
                        AND B.MaxChange = A.TIMESTAMP
            WHERE A.AffiliateLogin = 'sknilllc'
            ) X
      ) C

SELECT *
FROM (
      SELECT A.AffiliateId
            , A.AffiliateLogin
            , CAST(A.SignupDate AS DATE) [SignupDate]
            , A.TrafficMediumLabel [P1Medium]
            , A.PeriodStartDate [P1StartDate]
            , A.PeriodEndDate [P1EndDate]
            , A.RN [P1Row]
            , B.RN [P2Row]
            , B.TrafficMediumLabel [P2Medium]
            , B.PeriodStartDate [P2StartDate]
            , B.PeriodEndDate [P2EndDate]
      FROM #TempTbl1 A
      FULL JOIN #TempTbl1 B
            ON B.RN = A.RN - 1
      ) X

DROP TABLE #TempTbl1

#3

Thanks Gideon for your prompt reply and the formatting.

The other table is called Dimaffiliatemedium DAM where if I run that against the below query I get the desired result. Sorry should have clarified this is as I am conducting testing to ensure that the results of DAM match the result in the DImAnalyticsAff traffic. So essentially I am trying to recreate the same table results via this query.

What part of my query/join should I modify to get only one row for Display (the first instance in this case) and the 2nd one should be discarded keeping the rest results. Please advise

Thanks


#4

Also, to clarify, the results are correct, I just need the first instance of record 3 (the 2nd instance which is record 4 should be discarded) as next step via this query.

Hope that clarifies my issue better.


#5

Would you please post a CREATE TABLE statement for this table and some sample data as INSERT INTO statement(s)? That will make it easier to test and give you a proper solution


#6

Create Table DimAffiliateMedium
(Affiliateid varchar(255),
AffiliateLogin varchar(255),
Trafficmedium varchar(255),
Sigupdate varchar(255),
PeriodStartDate date,
PeriodEnddate date
)

Insert into DimAffiliateMedium (Affiliateid, AffiliateLogin,Trafficmedium, Sigupdate, PeriodStartDate, PeriodEnddate)
VALUES (2811, 'sknilllc', 'Chat', 2007-03-29, 2014-12-03, 2014-12-09 );

Insert into DimAffiliateMedium (Affiliateid, AffiliateLogin,Trafficmedium, Sigupdate, PeriodStartDate, PeriodEnddate)
VALUES (2811, 'sknilllc', 'Chat', 2007-03-29, 2014-12-10, 2015-10-12);

Insert into DimAffiliateMedium (Affiliateid, AffiliateLogin,Trafficmedium, Sigupdate, PeriodStartDate, PeriodEnddate)
VALUES (2811, 'sknilllc', 'Chat', 2007-03-29, 2015-10-13, NULL);

Imp Note
1)Please note here that if NULL in PeriodEnddate means there is no Enddate as that is the current Medium at this time, hence it is NULL or has no Enddate assigned
2) Also if the PeriodStartDate is < SignupDate, then till the Medium is assigned to the instance, it would fall in the category of "Unassigned" from the Signupdate till the Startdate when it got assigned a medium. So in this case, the instance should be "Unassigned" from 2007-03-29 to 2014-12-02 as it got assigned to "Chat" on 2014-12-03. So, essentially this is a dummy record which should be added dynamically to the result set.


#7

Something like this?

SELECT Affiliateid
      , AffiliateLogin
      , Trafficmedium
      , Sigupdate
      , PeriodStartDate
      , PeriodEnddate
FROM (
      SELECT Affiliateid
            , AffiliateLogin
            , CASE 
                  WHEN PeriodEnddate IS NULL
                        THEN 'Unassigned'
                  WHEN Sigupdate < PeriodStartDate
                        THEN 'Unassigned'
                  ELSE Trafficmedium
                  END AS Trafficmedium
            , Sigupdate
            , PeriodStartDate
            , PeriodEnddate
            , ROW_NUMBER() OVER (
                  PARTITION BY Affiliateid
                  , Trafficmedium ORDER BY PeriodEnddate
                  ) AS rn
      FROM DimAffiliateMedium
      ) _
WHERE rn = 1

#8

I tried this code but the Signupdate field is in another table called AnalyticsEventsTraffic and the common field for Join is Affiliate ID (which also has duplicates) , so this query would not work although I understand the logic.

Can you please help on how I can join the 2. I did try joining but not able to make it work.

- Also what does this mean and do I did not quite understand, especially the "_ " underscore

ROW_NUMBER() OVER (
PARTITION BY Affiliateid
, Trafficmedium ORDER BY PeriodEnddate
) AS rn
FROM DimAffiliateMedium
) _
WHERE rn = 1

Here's my modified code based on what you had

Select M.Eventid, M.AffiliateId, M.AffiliateLogin, M.SignupDate, M.TrafficMediumLabel, M.PeriodStartDate,M.PeriodEndDate, M.TimeStamp, RN
INTO #Temptbl2
from
(Select
C.Eventid, C.AffiliateId, C.AffiliateLogin, C.SignupDate, C.TrafficMediumLabel, C.PeriodStartDate, C.PeriodEndDate, C.TimeStamp, RN
from (
Select X.Eventid,X.AffiliateId, X.AffiliateLogin, X.SignupDate, X.TrafficMediumLabel, X.PeriodStartDate, X.PeriodEndDate, X.TimeStamp, ROW_NUMBER()OVER( ORDER BY X.Timestamp ) AS RN
from (
Select A.Eventid, A.AffiliateId, A.AffiliateLogin, A.SignupDate, A.TrafficMediumLabel, A.PeriodStartDate, A.PeriodEndDate, A.TimeStamp
from AnalyticsEventsAffiliateTraffic A
JOIN (
Select AffiliateId, AffiliateLogin, SignupDate, PeriodStartDate, MAX(TimeStamp) [MaxChange]
from AnalyticsEventsAffiliateTraffic where AffiliateLogin = 'sknilllc'
GROUP BY AffiliateId, AffiliateLogin, SignupDate, PeriodStartDate
) B ON B.PeriodStartDate = A.PeriodStartDate AND B.MaxChange = A.TimeStamp
where
A.AffiliateLogin = 'sknilllc') X
) C
) M

JOIN
(SELECT Affiliateid
, AffiliateLogin
, Trafficmediumlabel
, PeriodStartDate
, PeriodEnddate
FROM (
SELECT Affiliateid
, AffiliateLogin
, CASE
WHEN PeriodEnddate IS NULL
THEN 'Unassigned'
WHEN Signupdate < PeriodStartDate
THEN 'Unassigned'
ELSE TrafficMediumLabel
END AS Trafficmediumlabel
, Signupdate
, PeriodStartDate
, PeriodEnddate
, ROW_NUMBER() OVER (
PARTITION BY Affiliateid
, Trafficmedium ORDER BY PeriodEnddate
) AS rn
FROM FCTBI_DW.dbo.DimAffiliateMedium
) _
WHERE rn = 1) Y ON Y.AffiliateId = M.AffiliateId

DROP TABLE #Temptbl2


#9

It's the alias for the subexpression. In SQL, identifiers must begin with an alphabetical character, #, or _ but there's no requirement for a second character! i use _ when I don't care about the alias name

Your modified query is about as bad as the first one -- maybe worse. I don't think you need those nested subqueries. Try to rewrite it without them. As it is it's pretty hard to follow with some source data.


#10

Thanks Gideon so its just when you dont want to use an alias you use the _. In this case, it is for the DimaffiliateMedium table correct?.

I am hoping you can help me modify my final query I sent you to get the final results when you get a chance.