Performance Issue

Sumit_seth · April 12, 2016, 5:04pm

I have one 'MainTable'(1) as , one 'dispatched table'(2) and 'dispatchedbutNotOrdered'(3).
NOTE: table one is updated with 50K records each day and also there is no primary key.

In dispatchedButNotOrdered(3) table I have to insert those cards which are present in dispatched table(2) but not in MainTable(1). so to do that updation which is faster in terms of performance - Inner join, left join or if exist?

Right now I am using Join and it is taking 3 hrs or more and CPU utilisation is 100%, how can i reduce that?
There is one common field let say account number, Can someone put down a query for above?

JamesK · April 12, 2016, 5:11pm

You said there is no primary key on the MainTable(1). Is there any index on that table? Do you have the permissions/flexibility to create an index or a primary key?

Do you have an index/primary key on the dispatched Table(2)? If not do you have the ability to create one?

If you have no index on these tables, SQL Server will have to read EVERY row in both tables to figure out which records to insert into dispatchedButNotOrdered(3) . I don't think using join vs exists is going to speed up the process to any noticeable degree.

Your logic to determine whether a given row in dispatched table(2) is equal to a row in MainTable(1) can be a candidate key for primary or a non-clustered index.

If you post the query, people may be able to offer additional suggestions.

Sumit_seth · April 12, 2016, 5:25pm

there is no primary key defined on any of the three tables.

Sumit_seth · April 12, 2016, 5:33pm

As I remember the query is -

insert into table tbl_dispatchedNotOrderedLoad (Dispatch_Date,Account_num,PAN,Combined_ACCPAN)
Values
(select d.dispatch date,d.Account_num,d.PAN,d.Combined_ACCPAN
from
tbl_DispatchLoad as d Inner Join tbl_MainDataLoad as m
ON d.Combined_ACCPAN=m.Combined_ACCPAN
where d.Combined_ACCPAN not in(select m.Combined_ACCPAN from tbl_MainDataLoad as m)

gbritton · April 12, 2016, 7:10pm

depending on the sizes of tbl_DispatchLoad and tbl_MainDataLoad this could be your bottleneck. Without some sort of index including
the Combined_ACCPAN in both tables, the only way SQL can do a join is by looping through both tables. imagine that both tables have 100k rows. That's 10 million rows to compare.

jason_clark · April 13, 2016, 5:38am

SQL Server troubleshooting methodology is to hard, because of the actual methodology that I apply depends entirely on the specific problem that I am trying to troubleshoot for a specific environment. However, my basic approach, and the tools I use, remain constant, regardless of whether the problem is users complaining of slow performance, or if I am just performing a standard server health check. Please move on this to know more http://blogs.sqlsentry.com/jasonhall/my-perspective-the-top-5-most-common-sql-server-performance-problems/

Kristen · April 14, 2016, 5:57am

Create these indexes indexes:

CREATE
	UNIQUE	-- If this value is NOT unique in this table then remove this line
	INDEX IX_MainDataLoad
ON tbl_MainDataLoad
(
	Combined_ACCPAN
)
GO

CREATE INDEX IX_DispatchLoad
ON tbl_DispatchLoad
(
	Combined_ACCPAN
)
INCLUDE
(
	[dispatch date],
	Account_num,
	PAN
)
GO

I expect you will get the exact same Query Plan but personally I would change

where d.Combined_ACCPAN not in
(
	select	m.Combined_ACCPAN
	from	tbl_MainDataLoad as m
)

to

where NOT EXISTS
(
	select	*
	from	tbl_MainDataLoad as m
	WHERE	m.Combined_ACCPAN = d.Combined_ACCPAN
)

in case SQL uses a Query Plan that is more efficient.

Reason: NOT IN may require SQL to make a working table of ALL the values of Combined_ACCPAN in tbl_MainDataLoad, that may include duplicate values and include values that are not relevant to the main query (that said, I do expect that the Query Planner would not do it that way, but absent indexes etc. it may have to).

EXISTS requires only finding a single row that matches to resolve.

Also: IF there is ANY NULL value (now, or in the future) in that column/table then the NOT IN will fail.

When you run this next time don't you get duplicate rows in [tbl_dispatchedNotOrderedLoad]?