</ Hi,I have a scenario where we have to match the record based on the highest value of sequence matched in 2nd table.
In the 1st table I have 5 rows and in 2nd table I have corresponding matching records .for an example in 1st table I have a record with Value 123456789 but in 2nd table I can have values like ,
1
12
123
1234
12345
So in this case the last value which is matching the most letters should match and get the result output...The output expected:
123456789 -> 12345
Below is some example and expected output required:
TABLE_1
COL_1
123456789
195678434
114678900
456789800
112332456
TABLE_2
COL_1
NAME
1
A
12
B
123
C
1234
D
12345
E
<- This should match because max character is matching in a sequence
1956
A
195678
B
<- This should match because max character is matching in a sequence
1146
A
11467
C
<- This should match because max character is matching in a sequence
4567
A
<- This should match because max character is matching in a sequence
112332456
E
<- This should match because max character is matching in a sequence
By now you should know to post comsumable test data!
I am assuming the number are strings, if not you will have to do the casting yourself.
SET ANSI_NULLS, QUOTED_IDENTIFIER, ANSI_PADDING ON;
GO
CREATE TABLE #t1
(
Col1 varchar(10) NOT NULL PRIMARY KEY
);
INSERT INTO #t1
VALUES ('123456789'), ('195678434')
,('114678900'), ('456789800'), ('112332456');
CREATE TABLE #t2
(
Col1 varchar(10) NOT NULL PRIMARY KEY
,ColName char(1) NOT NULL
);
INSERT INTO #t2
VALUES ('1', 'A'), ('12', 'B'), ('123', 'C'), ('1234', 'D'), ('12345', 'E'), ('1956', 'A')
,('195678', 'B'), ('1146', 'A'), ('11467', 'C'), ('4567', 'A'), ('112332456', 'E');
The following seems to do what you want:
WITH Col1Order
AS
(
SELECT T1.Col1, T2.ColName
,ROW_NUMBER() OVER (PARTITION BY T1.Col1 ORDER BY T2.ColName DESC) AS rn
FROM #t1 T1
JOIN #t2 T2
ON T1.Col1 LIKE T2.Col1 + '%'
)
SELECT Col1, ColName
FROM Col1Order
WHERE rn = 1;
Is there any other way to get rid of Row_Number because both the tables are having 50Million data and it will take a lot of time. Is there anyway we can introduce regular expression to match the highest matching string directly.
On this PARTITION BY T1.Col1 ORDER BY T2.ColName DESC , it may not be correct always that we have (ORDER BY T2.ColName DESC ) T2.colname id desc order .May have few will be in ascending , only thing we have to pull the respective column from that user
for 50 million rows data issue = consider python/Linux ( becomes very very fast ) = moving it out of SQL Server
move the 50 million data out fo SQL Server = use python linux ( get the result set ) = move the result back into SQL Server
drop create tables insert data
-- Drop tables if they already exist
DROP TABLE IF EXISTS table_1;
DROP TABLE IF EXISTS table_2;
-- Create TABLE_1
CREATE TABLE table_1 (
col_1 VARCHAR(50)
);
-- Create TABLE_2
CREATE TABLE table_2 (
col_1 VARCHAR(50),
name VARCHAR(10)
);
-- Insert data into TABLE_1
INSERT INTO table_1 (col_1) VALUES
('123456789'),
('195678434'),
('114678900'),
('456789800'),
('112332456');
-- Insert data into TABLE_2
INSERT INTO table_2 (col_1, name) VALUES
('1', 'A'),
('12', 'B'),
('123', 'C'),
('1234', 'D'),
('12345', 'E'),
('1956', 'A'),
('195678', 'B'),
('1146', 'A'),
('11467', 'C'),
('4567', 'A'),
('112332456', 'E');
t-sql = avoiding row number
SELECT t1.col_1, x.name
FROM dbo.table_1 t1
CROSS APPLY (
SELECT TOP 1 t2.name
FROM dbo.table_2 t2
WHERE LEFT(t1.col_1, LEN(t2.col_1)) = t2.col_1
ORDER BY LEN(t2.col_1) DESC
) x;