SQLTeam.com | Weblogs | Forums

Working with large data set


#1

Hello All,
I am trying to run a query that will result in large dataset like 80 million records. I tried it several time and it error out on memory or try again when less users are active. I am using No Lock with all the table that I join to. Also added noncluster index to fields that are used in joins. At this point I am trying to find a way to break down the output without missing any data from source table into different batches in SQL Server. Any recommendations or steps for how to divide the results in different resultset and insert into destination table

Thanks so much!!


#2

show us what you have.
table schemas

join columns and where clauses.


#3

What is the end goal? Are you outputting the data to another system - to a table or just to the screen? Why do you need 80 million rows in the results and how are you going to use it?


#4

Thanks Jeff and Tito, we need to load the data to an archive solution system before it get sunset. The query has propriety info so I will need to setup a dummy query and schema in order to publish it.. as of now I was able to save it to file but I have never worked with bulk load insert settings to looking for some expert help


#5

I am trying to send the information like table definition but getting error sory you can not add links.


#6

Is the problem with the select statement - or the insert process?


#7

insert process


#8

Are you able to see the information that I replied using my email. I responded to SQL Team email. The select result is now stored in .rpt file.


#9

Okay - you can use SSIS with an OLEDB Destination. In the OLEDB destination, set the batch/commit sizes to 200,000. This will commit the transaction every 200,000 records - which will allow the process to clear/mark for reuse the space in the transaction log (if in simple recovery). If in full recovery - a transaction log backup would need to be performed to mark the space as reusable.

This will also reduce the amount of memory being utilized for the insert process - as it doesn't have to hold everything in memory between batches.


#10

The file is .rpt, I changed the type to .txt and selected in SSIS as source file
For OLEDB SQL Server destination it only read one column. I edited the table to correct columns but it still gave me error. Is there any article that can show the steps with example? I am searching but did not find anything specific


#11

There are plenty of SSIS tutorials available - you just have to search for them.

How is the file structured? Is it a delimited file - fixed width - or something else? How you create and setup the file source will depend on how the file is structured.

But - none of that should be needed. You originally stated the source of the data was a query - use that as the source in SSIS and setup the OLEDB destination to point to the destination table. Map the columns from the source to the destination - setting the batch/commit sizes to insure you don't cause issues with the transaction log and you should be fine.


#12

are you archiving the whole db or a subset?


#13

Thanks Yosiasz and Jeff for your response:
We are archiving complete datsets from db. I am working on one of them
I was able to use SSIS with query as source and have the records saved in destination table. When I used the file (.rpt) as source then it didn't work. It seem to be tab delimited file. But I am not using that option any more. Trying to see if can getter better timings.

Thanks


#14

have you looked at the option of using backups?


#15

Why do you need to get better timings? Is this a process that is going to be repeated on a scheduled basis? I thought this was a one time process to archive the data before the old system is sunset. If so - then it really doesn't matter how long it takes to complete.

If there is some other reason you need to improve the performance - then you need to provide the query you are using as the source, as that is going to be where you see the biggest performance issues.

Also - provide how you have setup and configured the OLEDB source and destinations. There are some settings and configurations that will improve performance...for example, on the source you want to use either a select statement or execute a stored procedure and not pull directly from a table - and on the destination you want to use the fast load option with large enough batch/commit sizes assigned.


#16

/*
-- Destination Table Definition
USE [TargetDB]
GO

/****** Object: Table [dbo].[OrderMad] Script Date: 6/7/2018 1:00:40 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[OrderMad](
[_ID] [int] IDENTITY(-1247483648,1) NOT NULL,
[TEXT] varchar NULL,
[EStatus]varchar NULL,
[Inst]varchar NULL,
[InstSp] varchar NULL,
[Name] varchar NULL,
[OTTL] varchar NULL,
[OFNAME] varchar NULL,
[OLNAME] varchar NULL,
[OMIDNM] varchar NULL,
[OIDP] varchar NULL,
[ODATE] NULL,
[ONUM] varchar NULL,
[PID] varchar NULL,
[ADD1] varchar NULL,
[ADD2] varchar NULL,
[CTY] varchar NULL,
[ST] varchar NULL,
[PO] varchar NULL,
[LOCNM] varchar NULL,
[ETTL] varchar NULL,
[EFNAME] varchar NULL,
[ELNAME] varchar NULL,
[EMIDNM] varchar NULL,
[EIDP] varchar NULL,
[RID] varchar NULL,
[STTL] varchar NULL,
[SFNAME] varchar NULL,
[SLNAME] varchar NULL,
[SMIDNM] varchar NULL,
[SIDP] varchar NULL,
[SDATE] varchar NULL,
[STAT] varchar NULL,
[VID] varchar NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

*/
--INSERT INTO [TargetDB].dbo.[OrderMad]
--[TEXT],
--[EStatus],
--[Inst],
--[InstSp],
--[Name],
--[OTTL],
--[OFNAME],
--[OLNAME],
--[OMIDNM],
--[OIDP],
--[ODATE],
--[ONUM],
--[PID],
--[ADD1],
--[ADD2],
--[CTY],
--[ST],
--[PO],
--[LOCNM],
--[ETTL],
--[EFNAME],
--[ELNAME],
--[EMIDNM],
--[EIDP],
--[RID],
--[STTL],
--[SFNAME],
--[SLNAME],
--[SMIDNM],
--[SIDP],
--[SDATE],
--[STAT],
--[VID]

-- Query to extract records from source system [SourceDB]
SELECT CONVERT(VARCHAR(MAX), LTRIM(RTRIM(ISNULL([tabCom].[TXT], '')))) AS [TEXT]
,LTRIM(RTRIM(ISNULL([tabA].[FLD3_DESC], ''))) AS [EStatus]
,LTRIM(RTRIM(ISNULL(cast([tabB].[FLD4_Dose] as varchar),'')))+' '+LTRIM(RTRIM(ISNULL(cast([tabB].[FLD5_Unit] as varchar),'')))+' '+LTRIM(RTRIM(ISNULL(cast([tabB].[FLD6_RTE] as varchar),''))) AS [Inst]
,LTRIM(RTRIM(ISNULL(cast([tabJ].[FLD2_FRM] as varchar),'')))+': '+LTRIM(RTRIM(ISNULL((case when [primetab].[FLD9_FLG] ='Y' then ltrim(rtrim(cast([tabH].[DESCR] as varchar)))+ltrim('NRP_') else [tabH].[DESCR] end),''))) AS [InstSp]
,LTRIM(RTRIM(ISNULL([tabI].[FLD3_NM], ''))) AS [Name]
,LTRIM(RTRIM(ISNULL([tabN].[FLD2_TTL], ''))) AS [OTTL]
,LTRIM(RTRIM(ISNULL([tabL].[FIRST_NM], ''))) AS [OFNAME]
,LTRIM(RTRIM(ISNULL([tabL].[LAST_NM], ''))) AS [OLNAME]
,LTRIM(RTRIM(ISNULL([tabL].[MID_NM], ''))) AS [OMIDNM]
,LTRIM(RTRIM(ISNULL([tabM].[ID], ''))) AS [OIDP]
,CONVERT(DATETIME, LTRIM(RTRIM(ISNULL(convert(varchar,[primetab].[BEGIN_DTTM],107), '1753-01-01')))) AS [ODATE]
,LTRIM(RTRIM(ISNULL([tabD].[FLD1_NUM], ''))) AS [ONUM]
,LTRIM(RTRIM(ISNULL([primetab].[FLD2_NUM], ''))) AS [PID]
,'' AS [ADD1]
,'' AS [ADD2]
,'' AS [CTY]
,'' AS [ST]
,'' AS [PO]
,LTRIM(RTRIM(ISNULL([tabF].[ID], ''))) AS [LOCNM]
,LTRIM(RTRIM(ISNULL([tabR].[FLD2_TTL], ''))) AS [ETTL]
,LTRIM(RTRIM(ISNULL([tabP].[FIRST_NM], ''))) AS [EFNAME]
,LTRIM(RTRIM(ISNULL([tabP].[LAST_NM], ''))) AS [ELNAME]
,LTRIM(RTRIM(ISNULL([tabP].[MID_NM], ''))) AS [EMIDNM]
,LTRIM(RTRIM(ISNULL([tabQ].[ID], ''))) AS [EIDP]
,LTRIM(RTRIM(ISNULL([tabC].[FLD1_SV_TYPE], ''))) AS [RID]
,LTRIM(RTRIM(ISNULL([tabV].[FLD2_TTL], ''))) AS [STTL]
,LTRIM(RTRIM(ISNULL([tabT].[FIRST_NM], ''))) AS [SFNAME]
,LTRIM(RTRIM(ISNULL([tabT].[LAST_NM], ''))) AS [SLNAME]
,LTRIM(RTRIM(ISNULL([tabT].[MID_NM], ''))) AS [SMIDNM]
,LTRIM(RTRIM(ISNULL([tabU].[ID], ''))) AS [SIDP]
,LTRIM(RTRIM(ISNULL([tabC].[FLD2_Date_Begin], ''))) AS [SDATE]
,LTRIM(RTRIM(ISNULL([tabC].[FLD3_RST], ''))) AS [STAT]
,LTRIM(RTRIM(ISNULL([tabC].[FLD4_CNE], ''))) AS [VID]
FROM [SourceDB].[dbo].[PrimeTable] [primetab] WITH (NOLOCK)

LEFT JOIN	[SourceDB].[dbo]. [TableA] [tabA] WITH (NOLOCK)
	ON [tabA].[FLD1_NAME1] = [primetab].[FLD1_SUBSTATUS_CDE]
	AND [tabA].[FLD2_NUM] = '002'

LEFT JOIN	[SourceDB].[dbo].[TableB] [tabB] WITH (NOLOCK)
	ON [tabB].[FLD2_NUM] = [primetab].[FLD2_NUM]
	AND [tabB].[FLD3_Date_Created] = [primetab].[FLD3_Date_Created]

LEFT JOIN	[SourceDB].[dbo].[TableC] [tabC] WITH (NOLOCK)
	ON [tabC].[FLD5_NUM] = [primetab].[FLD2_NUM]
	AND [tabC].[FLD6_Date_Created] = [primetab].[FLD3_Date_Created]
	AND [tabC].[FLD1_SV_TYPE] = [primetab].[FLD4__TYPE]
	
LEFT JOIN	[SourceDB].[dbo].[TableD] [tabD] WITH (NOLOCK)
	ON [tabD].[FLD2_NUM] = [primetab].[FLD2_NUM]
	AND [tabD].[FLD3_Date_Created] = [primetab].[FLD3_Date_Created]
	AND [tabD].[FLD4_PNUM] = 1


LEFT JOIN	[SourceDB].[dbo].[TableCom] [tabCom] WITH (NOLOCK)
	ON [tabCom].[ORDER_PAT_PERSON_NUM] = [primetab].[FLD2_NUM]
	AND [tabCom].[ORDER_DATA_CREATE_TS] = [primetab].[FLD3_Date_Created]
	AND [tabCom].[ENSCRIBE_PRMCMT_SEQ_NUM] = 1
	AND [tabCom].[ORDER_COMMENT_TYPE_CDE] = 'RO'

LEFT JOIN	[SourceDB].[dbo].[TableE] [tabE] WITH (NOLOCK)
	ON [tabE].[FLD1_NUM] = [tabB].[FLD2_NUM]
	AND [tabE].[FLD2_Date_Created] = [tabB].[FLD3_Date_Created]
	AND [tabE].[FLD3_SEQ] = 1



LEFT JOIN	[SourceDB].[dbo].[TableF] [tabF] WITH (NOLOCK)
	ON [tabF].[NUM] = [primetab].[Fld5]

LEFT JOIN	[SourceDB].[dbo].[TableG] [tabG] WITH (NOLOCK)
	ON [tabG].[CDE] = [tabE].[FLD4]

LEFT JOIN	[SourceDB].[dbo].[TableH] [tabH] WITH (NOLOCK)
	ON [tabH].[ID] = [primetab].[Fld6]

LEFT JOIN	[SourceDB].[dbo].[TableI] [tabI] WITH (NOLOCK)
	ON [tabI].[FLD1_SEQ] = right([tabG].[FLD1_NUM],5)

LEFT JOIN	[SourceDB].[dbo].[TableJ] [tabJ] WITH (NOLOCK)
	ON [tabJ].[FLD1_SEQ] = [tabI].[FLD2_PRD]

LEFT JOIN	[SourceDB].[dbo].[TableK] [tabK] WITH (NOLOCK)
	ON [tabK].[NUM] = [primetab].[Fld7]

	LEFT JOIN	[SourceDB].[dbo].[TableL] [tabL] WITH (NOLOCK)
	ON [tabL].[FLD1_NUM] = [primetab].[Fld7]

LEFT JOIN	[SourceDB].[dbo].[TableM] [tabM] WITH (NOLOCK)
	ON [tabM].[FLD1_NUM] = [primetab].[Fld7]

LEFT JOIN	[SourceDB].[dbo].[TableN] [tabN] WITH (NOLOCK)
	ON [tabN].[FLD1_NUM = [primetab].[Fld7]

LEFT JOIN	[SourceDB].[dbo].[TableK] [tabO] WITH (NOLOCK)
	ON [tabO].[NUM] = [primetab].[Fld8]

LEFT JOIN	[SourceDB].[dbo].[TableL] [tabP] WITH (NOLOCK)
	ON [tabP].[FLD1_NUM = [primetab].[Fld8]

LEFT JOIN	[SourceDB].[dbo].[TableM] [tabQ] WITH (NOLOCK)
	ON [tabQ].[FLD1_NUM = [primetab].[Fld8]

LEFT JOIN	[SourceDB].[dbo].TableN] [tabR] WITH (NOLOCK)
	ON [tabR].[FLD1_NUM = [primetab].[Fld8]

LEFT JOIN	[SourceDB].[dbo].[TableK] [tabS] WITH (NOLOCK)
	ON [tabS].[NUM] = [tabC].[FLD7_User]

LEFT JOIN	[SourceDB].[dbo].[TableL] [tabT] WITH (NOLOCK)
	ON [tabT].[FLD1_NUM = [tabC].[FLD7_User]

LEFT JOIN	[SourceDB].[dbo].[TableM] [tabU] WITH (NOLOCK)
	ON [tabU].[FLD1_NUM = [tabC].[FLD7_User]

LEFT JOIN	[SourceDB].[dbo].[TableN] [tabV] WITH (NOLOCK)
	ON [tabV].[FLD1_NUM = [tabC].[FLD7_User]

WHERE [tabC].[FLD1_SV_TYPE]='DEM'

#17

Last time, I tried copying the query then I received the error "can't add links"
We are in the process of testing before we actually extract from production source. it will be one time after all the data is validated to be correct and also looking for better optimization and response time so it does not time out while other users running their queries at the same time
I was not using procedure, but will send the detail as trying to setup another test
I did add how the select query is setup to insert records in targetDB