SQLTeam.com | Weblogs | Forums

How good work JSON field queries for searching?


my team (full stack web application development) want to use Django ORM to query a PostgreSQL database table using a JSON field containing arbitrary tag-value pairs (like search tags) with the django ‘contained by’ operation on JSON fields. I hope it is understandable what we are trying to do. It is basically a reversed search engine. We want to select all rows of which the JSON field key-value-pairs are a subset of the requested key-value-pairs. Example: Search for tags Nation:Austria, Gender:Male, Religion:Muslim returns all table entries with any tag (or tag combination) of these tags. THis should work streight forward using the contained-by operator from the django documentation but would this also work on millions of rows within milliseconds and for millions of users (like a search engine)? Is this doable within django using postgreSQL or do you have doubts from your experience? The alternative would be to try ElasticSearch-like databases. My team and I would very much appreciate any suggestion from you.

It's feasible, but for it to perform as best as it can you'll want to use JSONB columns and index them. There's some improvement in syntax and performance lately:

Postgres also has full-text search:

The last link also mentions Django usage. I'm almost positive someone has written something that describes exactly what you're looking for, using JSON and possibly full-text search together. Note that without indexing either or both of your attributes and values, the query engine will default to scanning blobs of text, like scanning the phone book for all phone numbers that contain "333".

If JSONB and its associated indexing doesn't work, nor does full-text, you might be able to parse the JSON into attribute-value pairs in a materialized view, which you can then index. The EAV (entity-attribute-value) model is usually not a good choice, but for dynamic/unknown combinations it could work out well for you.

Here's a writeup on the possibilities of an EAV model:

And more detail on the implementation in SQL Server:

There's a mix of theory and explanation, but it's worthwhile to understand what a relational database engine needs to do to answer such queries quickly. Most of it will translate to Postgres pretty easily. Full disclosure: I've seen Peter Larsson demo this in person, and yes, it can query billions of rows of data with hundreds of attributes in tens of milliseconds. Making this work against parsed JSON is a different beast though, but again, if all other options fail...

FYI, SQLTeam.com is a Microsoft SQL Server site, we're not really experts on Postgres.

Thank you a lot for your detailed answer. I will certainly take a close look at the resources you provided.

How many tags do you currently and how many do you project having? Wonder if a bitwise operators can come in handy here?

use sqlteam

create table objecttypes(objecttypeid int, 
objecttype nvarchar(50))
insert into objecttypes
select 1, 'Posts' union
select 2, 'Persons'

create table posts(postid int, posttitle nvarchar(50), 
postdetails nvarchar(max))
insert into posts
select 1, 'how to tag posts', 'Lorem ipsum dolor sit amet.'
create table tags(tagid int, tag nvarchar(50))
insert into tags
select 1, 'Chicken' union
select 2, 'tandoori' union
select 4, 'rice'

create table objecttags(objectid int, tagid int, objecttype int)

insert into objecttags
select 1, 6, 1

select  * 
 From tags t
 where exists (select 1 
from objecttags ot 
where t.tagid = (ot.tagid & 2) )

drop table posts
drop table objecttypes
drop table tags
drop table objecttags

What I would recommend is to offload this search to something outside of the live postgres json column and into a search specific model.

You'll end up scanning the column, you can't index a bitwise operation without creating a computed column for each bit position. The EAV solution would have better performance.