ROW_Number () to index Categories

barnabeck · December 14, 2018, 9:32am

I am not achieving this in the simple way, so maybe somebody can help me:

I have this table:

Item    Category    SubCategory    Position
--------------------------------------------
ItemXY   this        whatever         1
ItemAB   this           no            2
ItemCT   that        alphabethical    3
ItemMN   here          order          4
ItemKL   here          order          5
ItemYC   here        required         6

and I want to get the index for Category and Sub-Category like this:

Item    Category    Cat_Idx   SubCategory    Sub_Idx   Position
---------------------------------------------------------------
ItemXY   this          1       whatever         1          1
ItemAB   this          1          no            2          2
ItemCT   that          2       alphabethical    1          3
ItemMN   here          3         order          1          4
ItemKL   here          3         order          1          5
ItemYC   here          3       required         2          6

I can get the Cat_Idx by this:

;with Category as (
select Category, min(Position) as Ord
From Table
Group by Category)

Select Category, ROW_NUMBER () over (Order by Ord) as Cat_Idx
From Category

.... and at that point I only have the categories indexes.

There must be a simpler way.

harishgg1 · December 14, 2018, 10:41am

hi

i tried it

please take a look
let me know if it is ok
thanks

drop create data

use tempdb 

go 


drop table data
go 

create table data
(
Item varchar(100) NULL, 
Category varchar(100) NULL, 
SubCategory varchar(100) NULL
)
go 



insert into data select 'ItemXY','this','whatever'
insert into data select 'ItemAB','this','no'
insert into data select 'ItemCT','that','alphabethical'
insert into data select 'ItemMN','here','order'
insert into data select 'ItemKL','here','order'
insert into data select 'ItemYC','here','required '
go

SQL

; WITH cte 
     AS (SELECT item, 
                category, 
                Dense_rank() 
                  OVER( 
                    ORDER BY category DESC ) AS cat_dx, 
                subcategory 
         FROM   data) 
SELECT item, 
       category, 
       cat_dx, 
       subcategory, 
       Dense_rank() 
         OVER( 
           partition BY category 
           ORDER BY subcategory ) AS sub_idx, 
       Row_number() 
         OVER ( 
           ORDER BY cat_dx)       AS position 
FROM   cte 
ORDER  BY cat_dx 

go

Results

barnabeck · December 14, 2018, 11:05am

Thanks harishgg1! Looks promising, but the order criteria is not correct although the grouping is done the way I want. The Position is the only order criteria for both Category and SubCategory and has to reflect the order at the end of the query. I was trying to correct it in your query but the grouping fails it I do order by Position

harishgg1 · December 14, 2018, 11:20am

there is a way around i think

using rownumber and joining on rownumber

i will try
LATER

thanks

barnabeck · December 14, 2018, 11:29am

Harishgg1, don't worry I have my solution that works. I was just wondering if there would be a more elegant way. This is what I do:

getting the indexes for the categories (like I mentioned)
joining the table on categories
ROW_NUMBER () over (Partition by SubCategory Order by Position) as idx1
ROW_NUMBER () over (Partition by SubCategory Order by Position desc) as idx2
with idx1 and idx2 I get all I need.

Thank you!

JeffModen · December 15, 2018, 6:11pm

Details are in the comments in the code below.

--===== Create and populate the test table on-the-fly.
     -- This is NOT a part of the solution but it is
     -- an excellent way to help those trying to help 
     -- you on future posts.
 SELECT *
   INTO #TestTable
   FROM (
         VALUES
         ('ItemXY','this','whatever'     ,1)
        ,('ItemAB','this','no'           ,2)
        ,('ItemCT','that','alphabethical',3)
        ,('ItemMN','here','order'        ,4)
        ,('ItemKL','here','order'        ,5)
        ,('ItemYC','here','required'     ,6)
        ) v (Item,Category,SubCategory,Position)
;
--===== Solve the problem using a difference between sequences to form numbered groups
     -- which can then be ranked to produce the final ordered index notation.
     -- Note that this code is heavily dependent on the value of the Position column 
     -- not only being in the correct order but being sequential and with no gaps.
     -- If that's not the case, then an additional sequence generation to produce a
     -- column with such sequential/no gap values will need to be added.
   WITH cteGroup AS
(
 SELECT  *
        ,CategoryGroup    = Position - ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Position) 
        ,SubCategoryGroup = Position - ROW_NUMBER() OVER (PARTITION BY SubCategory,Category ORDER BY Position) 
   FROM #TestTable
)
 SELECT  Item
        ,Category
        ,Cat_IDX = DENSE_RANK() OVER (ORDER BY CategoryGroup)
        ,SubCategory
        ,Sub_IDX = DENSE_RANK() OVER (PARTITION BY CategoryGroup ORDER BY SubCategoryGroup)
        ,Position
   FROM cteGroup
  ORDER BY Position
;

mannesravya · December 15, 2018, 11:34pm

SELECT *
INTO #TestTable
FROM (
VALUES
('ItemXY','this','whatever' ,1)
,('ItemAB','this','no' ,2)
,('ItemCT','that','alphabethical',3)
,('ItemMN','here','order' ,4)
,('ItemKL','here','order' ,5)
,('ItemYC','here','required' ,6)
) v (Item,Category,SubCategory,Position)
;

with cte as
(
select *,dense_rank() over(partition by Category order by SubCategory) as Sub_Idx from #TestTable
--order by Category desc
)
select Item,Category,dense_rank() over(order by Category desc) as Cat_Idx,SubCategory,Sub_Idx,Position from cte
order by Category desc

JeffModen · December 16, 2018, 4:43am

That's very clever, mannesravya. The only problem is that the Sub_IDX doesn't follow the order of the Position column. While not explicitly mentioned that it has to do so, the depicted desired results seem to indicate that it must. Obviously, it would be better to do it the way you've done it(and I originally did it until I saw the "mistake" with the "Position" column) because there are fewer windowing functions and so there's also fewer sequencing (ordering) requirements and the code should run faster, but only if the Position column can be ignored.