Learning SQL and looking for advice on queries

Greetings. First-time poster here. Hoping I picked the right category for this! Someone feel free to re-direct me if I'm in the wrong place. I am learning SQL by way of SQL Server Express 2012 and the good ol' Adventureworks 2012 database. A friend in the business came up with a list of questions for me to design queries to answer. And of course, some of them are giving me a hard time! So I'm trying to be resourceful and look for hints and answers etc., which brings me here.

A question I am currently stumped on is, "What products sold in June 2008, also had sales in July 2008, but saw a decline?"

I can easily get answers to what sold in the given months, but am not sure how to work out a query that coughs up the difference in sales between the two months and highlights those that saw a decline.

If anyone has any helpful tips to get me going in the right direction here, i would greatly appreciate it!

Otherwise, thanks all the same.

Cheers,
Eric

For homework type questions we normally ask what you have tried, and got stuck on, so far.

Seeing what you have done, so far, for the "answers to what sold in the given months" would be a starting point (and will also give us something to Cut & Paste from :slight_smile: rather than having to type some stuff from scratch.

If you post code here please highlight it and press the [</>] button to format it which makes it easier to read.

Excellent! Thank you! Funny thing is, I have several friends who do this for a living, and when I've asked them for advice, their answers are always too broad ("Heck, just pull up sales for the whole year and use that") But my instructions were to "write a SQL query that returns ONLY the data requested, with no extra information/rows returned". And here comes my code thus far. I will be imagining the snickering going on when seasoned pros look at this, but so it goes! And for what it's worth, the textbook I'm using is SAM'S 'Teach Yourself SQL in Ten Minutes' by Ben Forta. I'm going down rabbit holes that seem most likely, though I am aware I may in the entirely wrong hole. With that in mind, I give you this...though I can't find the formatting button you referred to, so I apologize for now.

SELECT COUNT(*) S1.ProductID AS Jun_Prods, Count(*) S2.ProductID As Jul_Prods
FROM Sales.SalesOrderDetail AS S1, Sales.SalesOrderDetail AS S2
ON S1.ProductID = S2.ProductID
AND DATEPART(yy, S1.ModifiedDate) = 2008 AND DATEPART(mm, S1.ModifiedDate) = 06
AND DATEPART(yy, S2.ModifiedDate) = 2008 AND DATEPART(mm, S2.ModifiedDate) = 07;

Well its formatted just fine anyway :slight_smile: Might be because it has leading spaces - anything with 4, or more, spaces on the front of each line formats automatically.

Some suggestions

FROM Sales.SalesOrderDetail AS S1, Sales.SalesOrderDetail AS S2
ON S1.ProductID = S2.ProductID

I assume this should be

FROM Sales.SalesOrderDetail AS S1
    JOIN Sales.SalesOrderDetail AS S2
        ON S1.ProductID = S2.ProductID

ModifiedDate tests should be in a WHERE clause (you could put the S2 tests in the JOIN or the WHERE clause, personal preference - some people only put the KEY columns in the JOIN, some people put everything, and both of those groups probably do some-and-some at some point in their careers!

This works, but is generally seen as A Bad Idea (ask your SQL friends what they think, eh?!!)

The use of the DATEPART (or pretty much any other) function prevents the query from being SARGable. Whenever possible SQL will use an Index to get straight to the heart of a query, however if you use a Function like this then SQL will, instead, process every row to calculate the result of the function, instead of using the index (of course you might not actually have an index on this column ...)

Dates are a bit funny, particularly if they include a time component, so instead of BETWEEN we tend to use >= and < (and have an endpoint which is one-more-than-the-last-item), so something like this:

WHERE S1.ModifiedDate >= '20080601' AND S1.ModifiedDate < '20080701'
AND S2.ModifiedDate >= '20080701' AND S2.ModifiedDate < '20080801'

This is just an "efficiency and performance" tweak though.

I can see where you are going with this, but you can't do it in that way.

You can do

SELECT S1.ProductID, COUNT(S1.ProductID) AS Jun_Prods, Count(S2.ProductID) As Jul_Prods

but that will just give you the number of rows that have that ProductID - which will (I assume) just be the number of Order Rows. (You'll also need a GROUP BY clause, I'll come to that).

I expect what you actually want is the (total) Quantity Sold for each Product across all orders (in the relevant month), for that you need (I'm guessing hat the column, in [SalesOrderDetail], is called [Qty] - I don't have AdventureWorks installed in my database; when you figure out what the actual name is just change it in my code:)

SELECT S1.ProductID, SUM(S1.Qty) AS Jun_Prods, SUM(S2.Qty) As Jul_Prods

and you need to add a

GROUP BY S1.ProductID

this will cause all aggregate functions (SUM, COUNT etc.) to be specific to each value of S1.ProductID - and because we are JOINing S2 on S1.ProductID = S2.ProductID then, in effect, the GROUP BY is on both of those columns as they will match.

Lastly the original question is "but saw a decline", so having calculated the Total Qty for each ProductID in June and July you need to only include the ones which sold less in July. After the GROUP BY clause you can put a HAVING clause - that will operate on the values of the aggregate functions - so after the data has been retrieved and prep'd up ready to display

HAVING SUM(S1.Qty) > SUM(S2.Qty)

There is a further wrinkle ... what if a ProductID that sold in June did not sell anything in July? I would consider that a "decline". Because your query uses a JOIN and WHERE clause then only ProductID which sold in BOTH June AND July will be included. You will need to use an OUTER JOIN to include ALL ProductID from June regardless of whether they sold in July. Also, doing SUM(S2.Qty) for July will give you warnings when no value exists, and

HAVING SUM(S1.Qty) > SUM(S2.Qty)

will NOT consider that a missing (NULL) value for SUM(S2.Qty) is less than SUM(S1.Qty) - NULL compared to anything is always FALSE.

We can help with that OUTER JOIN edge-condition once you've got the main query working. I've not given you a Cut & Paste fully-finished answer, deliberately :), but hopefully enough that you will get there on your own ... but if you get stuck post your work so far and any error message, or incorrect results values (and the values you actually expect to see) and someone here will nudge you in the right direction.

1 Like

Thank you SO MUCH for your help! This is great. I'll keep you posted.

Kristen's response is excellent! Except that I don't think you need to worry about OUTER JOIN or "no sales in July counting as a decline", because your original conditions clearly stated "also had sales in July 2008".

I'm sure that was intentional, so that you didn't have to worry about using an OUTER join yet.

I was just discussing that with another SQL acquaintance! So whew!

Ah! That saves me a job!! thanks :slight_smile: