Join based on the maximum letter match from other Table

</ Hi,I have a scenario where we have to match the record based on the highest value of sequence matched in 2nd table.
In the 1st table I have 5 rows and in 2nd table I have corresponding matching records .for an example in 1st table I have a record with Value 123456789 but in 2nd table I can have values like ,
1
12
123
1234
12345
So in this case the last value which is matching the most letters should match and get the result output...The output expected:
123456789 -> 12345

Below is some example and expected output required:

TABLE_1
COL_1

123456789
195678434
114678900
456789800
112332456

TABLE_2
COL_1 NAME
1 A
12 B
123 C
1234 D
12345 E <- This should match because max character is matching in a sequence
1956 A
195678 B <- This should match because max character is matching in a sequence
1146 A
11467 C <- This should match because max character is matching in a sequence
4567 A <- This should match because max character is matching in a sequence
112332456 E <- This should match because max character is matching in a sequence
EXPECTED OUTPUT
COL_1 NAME
123456789 E
195678434 B
114678900 C
456789800 A
112332456 E

hi

i am having difficulty understanding what you are saying

By now you should know to post comsumable test data!
I am assuming the number are strings, if not you will have to do the casting yourself.

SET ANSI_NULLS, QUOTED_IDENTIFIER, ANSI_PADDING ON;
GO
CREATE TABLE #t1
(
	Col1 varchar(10) NOT NULL PRIMARY KEY
);
INSERT INTO #t1
VALUES ('123456789'), ('195678434')
	,('114678900'), ('456789800'), ('112332456');

CREATE TABLE #t2
(
	Col1 varchar(10) NOT NULL PRIMARY KEY
	,ColName char(1) NOT NULL
);
INSERT INTO #t2
VALUES ('1', 'A'), ('12', 'B'), ('123', 'C'), ('1234', 'D'), ('12345', 'E'), ('1956', 'A')
	,('195678', 'B'), ('1146', 'A'), ('11467', 'C'), ('4567', 'A'), ('112332456', 'E');

The following seems to do what you want:

WITH Col1Order
AS
(
	SELECT T1.Col1, T2.ColName
		,ROW_NUMBER() OVER (PARTITION BY T1.Col1 ORDER BY T2.ColName DESC) AS rn
	FROM #t1 T1
		JOIN #t2 T2
			ON T1.Col1 LIKE T2.Col1 + '%'
)
SELECT Col1, ColName
FROM Col1Order
WHERE rn = 1;
1 Like

An alternative method, that will also check for no matching row at all:


SELECT *
FROM #t1 t1
OUTER APPLY (
    SELECT TOP (1) *
    FROM #t2 t2
    WHERE t1.Col1 LIKE t2.Col1 + '%'
    ORDER BY LEN(t2.Col1) DESC
) AS t2

Hi Team, Thanks for this. Few concerns.

  1. Is there any other way to get rid of Row_Number because both the tables are having 50Million data and it will take a lot of time. Is there anyway we can introduce regular expression to match the highest matching string directly.
  2. On this PARTITION BY T1.Col1 ORDER BY T2.ColName DESC , it may not be correct always that we have (ORDER BY T2.ColName DESC ) T2.colname id desc order .May have few will be in ascending , only thing we have to pull the respective column from that user

I already provided an alternate method. For performance, you should create an index to support the lookup in the OUTER APPLY query.

Well, your idea is good. We are migrating from SQL server to Snowflake. So looking for some equivalent. Row Number is good but takes lot of time

Is it possible for you to convert with Joins?

hi hope this helps

for 50 million rows data issue = consider python/Linux ( becomes very very fast ) = moving it out of SQL Server

move the 50 million data out fo SQL Server = use python linux ( get the result set ) = move the result back into SQL Server

drop create tables insert data
-- Drop tables if they already exist
DROP TABLE IF EXISTS table_1;
DROP TABLE IF EXISTS table_2;

-- Create TABLE_1
CREATE TABLE table_1 (
    col_1 VARCHAR(50)
);

-- Create TABLE_2
CREATE TABLE table_2 (
    col_1 VARCHAR(50),
    name  VARCHAR(10)
);

-- Insert data into TABLE_1
INSERT INTO table_1 (col_1) VALUES
('123456789'),
('195678434'),
('114678900'),
('456789800'),
('112332456');

-- Insert data into TABLE_2
INSERT INTO table_2 (col_1, name) VALUES
('1', 'A'),
('12', 'B'),
('123', 'C'),
('1234', 'D'),
('12345', 'E'),
('1956', 'A'),
('195678', 'B'),
('1146', 'A'),
('11467', 'C'),
('4567', 'A'),
('112332456', 'E');

t-sql = avoiding row number

SELECT t1.col_1, x.name
FROM dbo.table_1 t1
CROSS APPLY (
    SELECT TOP 1 t2.name
    FROM dbo.table_2 t2
    WHERE LEFT(t1.col_1, LEN(t2.col_1)) = t2.col_1
    ORDER BY LEN(t2.col_1) DESC
) x;

result