Opinion on a search query where I only have access to a word list

We are building a unique app, and I keep going back and fourth on the best way to store search data.

We have an encypted message that has to be searched, and the database will not be able to decrypt this message, it is done with a outside cypto library.

I do however need to globally search those messages to know which ones contain certain words, hence the issue. My thoughts are to split the words of the message before it is encrypted and create an array of randomized words.

I could either store the message in a varchar comma seperated , or a JSON list and use the string_split like so

declare @myMessage table (myMessageId int,randomizedWords varchar(max))
insert into  @myMessage(myMessageId,randomizedWords)
select 1,'hello,hi,machine,sports,Love,I'
union all
select 2,'hello,hi,machine,sports,hate'

declare @mySearch varchar(200) = 'I love sports'
declare @table table(myVal varchar(100))
insert into @table(myVal)
select a.value from string_split(@mySearch,' ') a

select a.* from @myMessage a
outer apply(select top 1 aa.* from @table aa left join string_split(a.randomizedWords,',') bb on aa.myVal = bb.value where bb.value is null) c
where c.myVal is null

Alternatively I can store each word in a detail table and not split which would be faster, but it would create a massive table that would not be used that often such as:

declare @myMessage table (myMessageId int,randomizedWords varchar(100))
insert into  @myMessage(myMessageId,randomizedWords)
select 1,value
from
string_split('hello,hi,machine,sports,Love,I',',') a
union all
select 2,value
from
string_split('hello,hi,machine,sports,hate',',') a

declare @mySearch varchar(200) = 'I love sports'
declare @table table(myVal varchar(100))
insert into @table(myVal)
select a.value from string_split(@mySearch,' ') a

declare @perfectMatch int =( select count(*) from @table)
select aa.myMessageId from 
@myMessage aa
inner join @table bb
on aa.randomizedWords = bb.myVal
group by aa.myMessageId
having count(*) = @perfectMatch

Both get me the results I want, but again I need to store just a word list randomized and my search has to match all words for the group in order to be returned. I am open to sugestions.

THanks

My thoughts:

I would create a folder and store the search string in a text file, 1 file a day. You keep them for as long as you want.

Every night you load that textfile into your table. You keep the data in the table for example for 6 months and every day you delete a day and insert a day. You can create a smaller table to keep historical data to report, like

searchword, year, month, number of search.
hello 2023 februari 200

This approch means that you can use your tables for small analytics the past 6 months and reports for the management. If you need to do something data scientific you can use the text-files in the folder.

I appreciate your response. What you are describing is more a way to analyze the searches, I am looking for a quick way to retrieve information on matching searches to retrieve real-time for a front-end. I don't love storing each word as a record as it is not a feature that will be used often and that approach could produce millions of records in a very short time frame, so I was looking at alternatives. Storing in a file system/folder traditionally is not a recommended approach with MSSQL, and there are probably better alternatives for the issue I am having, I would store the entire message in SQL before accessing from a file system as that opens up potential issues and not a best practice.

Very creative approach though, and thanks for your opinion.

If you want to retrieve real-time you should take a look at the full text catalog in MSSQL.

1 Like

That may be a good option. Due to the words being randomized that I am searching through and in no specific order there are a few logic issues, but def worth playing around with. Thanks for the suggestion.