SQLTeam.com | Weblogs | Forums

Need to edit code without CTE

Hello Community,

The following code uses CTE (Common Table Expression). Can someone help me rewrite the code without CTE.

Basically, I would like the same result but without using CTEs.

;
WITH TallyTable_CTE
AS
(
SELECT     ROW_NUMBER() OVER (ORDER BY StockCode) AS ID
FROM       Data.Stock
)
,LastDayOfMonth_CTE
AS
(
SELECT        EOMONTH(DATEFROMPARTS(2016, ID, 1)) AS LastDayDate
FROM         TallyTable_CTE
WHERE        ID <= 12
)
SELECT        CTE.LastDayDate
             ,SUM(SLS.SalePrice) AS TotalDailySales
FROM         Data.SalesByCountry SLS
INNER JOIN   LastDayOfMonth_CTE CTE
             ON CTE.LastDayDate = CAST(SLS.SaleDate AS DATE)
GROUP BY     CTE.LastDayDate
ORDER BY     CTE.LastDayDate

Please let me know if you need anything from me to assist you in finding a solution

Thanks

Carlton

Why don't you want to use a CTE?

Hi Robert,

I don't want to use CTEs because the code won't work in Apache Spark.SQL

You have bigger problems - there are functions in your code that are SQL Server specific which need to be modified for Spark SQL.

For example - Spark does not have an EOMONTH function so you need to modify that portion to use Spark specific code to determine the end of the month.

As far as I can find - Spark does support common table expressions, but if you don't want to utilize a CTE then convert it to a derived table.

SELECT ...
FROM (SELECT ...) t

Just one more note: are you sure you want to SUM the totals from the last day of the month only? Doesn't that just show you the total sales that occurred on the last day of the month and not the total sales for the full month?

How Jeff,

Thanks for reaching out. .. you're right, I have bigger problems with Spark functions

Spark does have a LAST_DAY function which is similar to EOMonth in SQL Server.
https://spark.apache.org/docs/2.3.0/api/sql/search.html?q=last+day

Your original code also has a nasty performance issue in the first CTE. You're calculating way more "IDs" than the 12 you need. Perhaps having a real Tally Table to read from would do the trick for you. I say that because I've not been able to locate either a UNION ALL or VALUES clause in the reference where you could build a sub-query with the values of 0 through 11 to generate the first day of each month for a year and then use the following to get the end-of-month date for each month.

Spark SQL also had the ability to add months even if their DATEADD functionality is quite limited. It has an ADD_MONTHS function.
https://spark.apache.org/docs/2.3.0/api/sql/#add_months

Looking further at your code - I am not sure you need a tally table or derived table.

 Select last_day(sls.SaleDate) As LastDay
      , sum(sls.SalePrice) As TotalDailySales
   From Data.SalesByCountry sls
  Where sls.SaleDate = last_day(sls.SaleDate)
    And sls.SaleDate >= '2016-01-01'
    And sls.SaleDate <  '2017-01-01'
  Group By
        last_day(sls.SaleDate)
  Order By
        last_day(sls.SaleDate)

If you want to sum the totals for every day in the month - then you don't need 'sls.SaleDate = last_day(sls.SaleDate)', but if you want to sum only the sales from the last day of the month then you need to include that portion.

If you want to use a derived table with a sequence of dates, you can try something like:

 Select last_day(sls.SaleDate) As LastDay
      , sum(sls.SalePrice) As TotalDailySales
   From Data.SalesByCountry sls
  Inner Join (Select sequence('2016-01-01', '2016-12-31', interval 1 month) As StartOfMonth) As t On last_day(t.StartOfMonth) = sls.SaleDate
  Group By
        last_day(sls.SaleDate)
  Order By
        last_day(sls.SaleDate)

Since I don't have Spark SQL available - my syntax may be incorrect...hopefully this will give you some ideas.

2 Likes

Jeffw's first example looks to the right way to do that.