Join based on the maximum letter match from other Table

</ Hi,I have a scenario where we have to match the record based on the highest value of sequence matched in 2nd table.
In the 1st table I have 5 rows and in 2nd table I have corresponding matching records .for an example in 1st table I have a record with Value 123456789 but in 2nd table I can have values like ,
1
12
123
1234
12345
So in this case the last value which is matching the most letters should match and get the result output...The output expected:
123456789 -> 12345

Below is some example and expected output required:

TABLE_1
COL_1

123456789
195678434
114678900
456789800
112332456

TABLE_2
COL_1 NAME
1 A
12 B
123 C
1234 D
12345 E <- This should match because max character is matching in a sequence
1956 A
195678 B <- This should match because max character is matching in a sequence
1146 A
11467 C <- This should match because max character is matching in a sequence
4567 A <- This should match because max character is matching in a sequence
112332456 E <- This should match because max character is matching in a sequence
EXPECTED OUTPUT
COL_1 NAME
123456789 E
195678434 B
114678900 C
456789800 A
112332456 E

hi

i am having difficulty understanding what you are saying

By now you should know to post comsumable test data!
I am assuming the number are strings, if not you will have to do the casting yourself.

SET ANSI_NULLS, QUOTED_IDENTIFIER, ANSI_PADDING ON;
GO
CREATE TABLE #t1
(
	Col1 varchar(10) NOT NULL PRIMARY KEY
);
INSERT INTO #t1
VALUES ('123456789'), ('195678434')
	,('114678900'), ('456789800'), ('112332456');

CREATE TABLE #t2
(
	Col1 varchar(10) NOT NULL PRIMARY KEY
	,ColName char(1) NOT NULL
);
INSERT INTO #t2
VALUES ('1', 'A'), ('12', 'B'), ('123', 'C'), ('1234', 'D'), ('12345', 'E'), ('1956', 'A')
	,('195678', 'B'), ('1146', 'A'), ('11467', 'C'), ('4567', 'A'), ('112332456', 'E');

The following seems to do what you want:

WITH Col1Order
AS
(
	SELECT T1.Col1, T2.ColName
		,ROW_NUMBER() OVER (PARTITION BY T1.Col1 ORDER BY T2.ColName DESC) AS rn
	FROM #t1 T1
		JOIN #t2 T2
			ON T1.Col1 LIKE T2.Col1 + '%'
)
SELECT Col1, ColName
FROM Col1Order
WHERE rn = 1;
1 Like

An alternative method, that will also check for no matching row at all:


SELECT *
FROM #t1 t1
OUTER APPLY (
    SELECT TOP (1) *
    FROM #t2 t2
    WHERE t1.Col1 LIKE t2.Col1 + '%'
    ORDER BY LEN(t2.Col1) DESC
) AS t2

Hi Team, Thanks for this. Few concerns.

  1. Is there any other way to get rid of Row_Number because both the tables are having 50Million data and it will take a lot of time. Is there anyway we can introduce regular expression to match the highest matching string directly.
  2. On this PARTITION BY T1.Col1 ORDER BY T2.ColName DESC , it may not be correct always that we have (ORDER BY T2.ColName DESC ) T2.colname id desc order .May have few will be in ascending , only thing we have to pull the respective column from that user

I already provided an alternate method. For performance, you should create an index to support the lookup in the OUTER APPLY query.

Well, your idea is good. We are migrating from SQL server to Snowflake. So looking for some equivalent. Row Number is good but takes lot of time

Is it possible for you to convert with Joins?