Hi all, I'm having some trouble with a query that takes too long to run. The query is meant to pull back an identifier and name from a very large table, restricted to the set of ids that appear in another large table. If possible, I'd like to avoid indexing the table or using other techniques that can't be contained within the query itself.
Here's my query. The customer_data table is huge, so I want to limit it to current customers only, by selecting id from the much smaller current_customers table:
select * from
(select id, first_name, last_name
 from customer_data
 where id in (select id from current_customers where year = 2022)) sub
If I simply pass the ids directly, it works great:
select * from
(select id, first_name, last_name
 from customer_data
 where id in (560, 561)) sub
Can anyone explain why the first query takes forever (I've had to terminate it each time since it never finishes), while the second works almost instantly? Is there a better way to optimize the first query? I've been researching this all day but can't seem to find a good solution. Thanks in advance!!
I would also recommend verifying that there actually is an id column in the current_customers table. Because the tables aren't aliased and the columns aren't qualified, it is entirely possible that current_customers has no id column at all, in which case the subquery silently resolves id against the outer query's customer_data table. That turns it into a correlated subquery that matches every customer_data row (as long as current_customers has any 2022 rows), so you end up scanning and returning the entire huge table, which would explain a query that never finishes.
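A minimal aliased sketch of the same query (assuming both tables really do have an id column; if current_customers does not, this version fails with a clear error instead of silently correlating):

select c.id, c.first_name, c.last_name
from customer_data c
where c.id in (select cc.id
               from current_customers cc
               where cc.year = 2022)

Many optimizers also do well with the equivalent exists form:

select c.id, c.first_name, c.last_name
from customer_data c
where exists (select 1
              from current_customers cc
              where cc.id = c.id
                and cc.year = 2022)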
When the SQL query you write has to go through HUGE data, it can take forever and ever. This is a very common issue since the beginning of time, and all different types of solutions have been thought of to overcome it:
Partitioning of data
Indexing
Running the query on rented hardware with huge capabilities
Going through the data in small parts at a time (see the sketch after this list)
Switching to another database (Oracle or MongoDB) meant for this purpose
Designing with huge data in mind from the inception, i.e. thinking about scalability
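As a sketch of the "small parts at a time" idea: keyset pagination walks the big table in id-ordered batches, so each query touches only a bounded slice. This is only an illustration; :last_id is a hypothetical bind parameter the application advances after each batch, the batch size of 1000 is arbitrary, and limit is MySQL/PostgreSQL syntax (other engines use fetch first 1000 rows only, or top):

select c.id, c.first_name, c.last_name
from customer_data c
join current_customers cc
  on cc.id = c.id
 and cc.year = 2022
where c.id > :last_id
order by c.id
limit 1000

Start with :last_id = 0, then repeat with :last_id set to the largest id returned, until a batch comes back empty.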