hi i have the following logic that i thought was working. it pick up the latest file in a folder and processes it based on the keyword of the file and the latest date. i thought it as working and it does if only the files you want are been used for that package. but sometime there will be files sitting in the folder that have to do with other packages that shouldnt be picked up. but the logic is picking them up. can anyone help. my code is as follows
i thought this part where it looks for the keyword in the file would do it. so the file has to have a certain name on it. if it doesnt i thought it would skip it then.
// Get our file pattern search criteria
string filePattern = Dts.Variables["$Package::FilePattern"].Value.ToString();
// Get the files from the root directory that match our file pattern
DirectoryInfo info = new DirectoryInfo(rootDirectory);
FileInfo[] files = info.GetFiles(filePattern);
You should be careful relying on the LastWriteTime of the files - this can be changed and may not be the correct value. For example, if your files are date stamped (e.g. MyInputFile_YYYYMMDD.csv) - the last write time may not coincide with the date stamp - and could be before or after another file with a different date stamp.
If you are attempting to process a set of files in date order - you would be better off creating a variable to hold the sorted file list (as an object in SSIS) and using that object in a for each. If that is something you are interested in I can put together some examples.
So what exactly are you trying to accomplish? Do you just want to process a set of files in a folder - or is there other criteria that needs to be met?
Assume the files are date stamped - and you want to process those files in order of the date represented in the file name.
Add a Project parameter named: RootDirectory - type String - populate the value with the root directory where you will search for the files. This can be a UNC path or a drive letter and path.
Create the following package level variables:
Add a package parameter named: FilePattern as a string and a value that identifies the files to be selected (e.g. MyFile*.csv)
Add a script task in the Control Flow - set the ReadOnlyVariables as follows:
public void Main()
{
IFormatProvider provider = null;
// Get the root and archive directories
string rootDirectory = Dts.Variables["$Project::RootDirectory"].Value.ToString();
string archiveDirectory = Dts.Variables["User::ArchiveDirectory"].Value.ToString();
// Create the archive directory (if it already exists - nothing done)
Directory.CreateDirectory(archiveDirectory);
// Create our unsorted dataset and datatable
DataSet dsUnsorted = new DataSet();
DataTable fileListTable = dsUnsorted.Tables.Add();
fileListTable.Columns.Add("FileName", typeof(string));
fileListTable.Columns.Add("ArchiveFileName", typeof(string));
fileListTable.Columns.Add("FileDate", typeof(DateTime));
// Get our file pattern search criteria
string filePattern = Dts.Variables["$Package::FilePattern"].Value.ToString();
// Get the files from the root directory that match our file pattern
DirectoryInfo info = new DirectoryInfo(rootDirectory);
FileInfo[] files = info.GetFiles(filePattern);
// Loop through each file found
foreach (FileInfo file in files)
{
// Get File Date from file name
string fileName = Path.GetFileNameWithoutExtension(file.Name);
DateTime fileDate = DateTime.ParseExact(fileName.Substring(fileName.Length - 8), "yyyyMMdd", provider);
string archiveDestination = Path.Combine(archiveDirectory, fileDate.ToString("yyyyMM"));
Directory.CreateDirectory(archiveDestination);
// Add file information to the datatable
fileListTable.Rows.Add(file.Name, Path.Combine(archiveDestination, file.Name).ToString(), fileDate);
}
// Sort the files by file date
DataView dvSorted = new DataView(fileListTable);
dvSorted.Sort = "FileDate ASC";
// Convert DataView to DataSet
DataSet dsSorted = new DataSet();
dsSorted.Tables.Add(dvSorted.ToTable());
Dts.Variables["SortedFileList"].Value = dsSorted;
Dts.TaskResult = (int)ScriptResults.Success;
}
Create an archive directory in the root identified in the first step. This code will create a dated folder for each month but that can be adjusted if needed.
Inside the for each loop - setup your file processing as needed. Use expressions on the input/source to identify the file based on the variable User::FileName
Following the data load - add a new script task to archive the file (inside the for each loop). Add the variables User::ArchiveFileName and User::FullFileName as ReadOnlyVariables. Use the following code to archive the file:
Note: the reason we use a script task for this instead of the file system task is that 'moving' a file retains system settings. If you compress the Archive folder (recommended) then moving the file will override the compression setting and the file will not be compressed - so we copy the file then delete the original file.
hey. just working on the logic and keep running in to this error. its to do with the filedate. it keeps showing up as + fileDate {1/1/0001 12:00:00 AM} System.DateTime and failing on the line DateTime fileDate = DateTime.ParseExact(fileName.Substring(fileName.Length - 8), "dd/mm/yyyy", provider);
Yes - that line of code attempts to pull the date stamp from the file name. For example - if your file name is MyFile_20201214.csv it extracts the last 8 characters from the file name without extension (MyFile_20201214) which gives us 20201214...
It then parses that string to a datetime value.
To set that correctly - we need a fixed and consistent file naming standard and the format the date will be structured.
If you can provide an example of the file names you expect - we can adjust that line.
However - if you do not have a fixed format for the file name then this code will not work as is...but it can be adjusted depending on how you want to handle the file date for your files.
a i see the issue. thats probably going to be a problem as files dont have dates in the name and file could be called different things. the one constant is that it will have GL text in the name.
so file can be like this
How are you going to identify the files you want this package to process? What is the file pattern that will only select those files and leave the other files alone?
Would that be: GL.csv
Can there be other files with GL in the name that should not be processed?
As for creating the sorted list of files by a date, we can change how we get the date to use the LastWriteTime of the file itself...although that isn't as reliable it might still work.
Change the code to the above - this will comment out the check for the date stamp on the file and use the LastWriteTime associated with the file.
If there is some other way to organize the files in an appropriate order - that can be added. For example, if you want to process the files by the first part of the file name (e.g. 302150, 261750, 21112) then you can parse that information out of the file name - add a new column to the data table - add the value to the data table and then sort by that field and the FileDate in the DataView.
To review - you have 2 issues here. The first issue is how to identify the files to be processed by this package. The second issue is how to identify the order of processing for those files.
Once you have that information we can modify the code to handle that and return a sorted file list to process the files in the appropriate groups and order.
only files with gl should be processed and there will never be files with gl in the name that should not be processed.
the reason im going with the latestdate on the file in my original code was to basically order them.
so my folder might looks like this
302150_JPY_GL
261750 GL
21112 GL
302150_JPY_holdings
so i was trying to get it to pick the lasts date on the files with gl in the name pick that one up process it then move it to archive then pick up the next . but if that holdings file had the latest date that was been picked up even thought i thought i had logic in my code to only look for the data with the keyword gl.
i will try the change of logic see how that works. thanks for all the help
The file pattern should be GL.{ext} - not sure what extension you have for these files, but set that portion as well. If they are CSV files then GL.csv - if text files then GL.txt.
Hi, I am trying to follow your post but seems like I am missing something. I need to look for latest file from directory and move it to the different directory.
For example, I have below files in the folder and need to look for the latest test file(Test_20221011.txt) from the directory. I appreciate your help.
Hi, I am trying to follow your post but seems like I am missing something. I need to look for latest file from directory and move it to the different directory.
For example, I have below files in the folder and need to look for the latest test file(Test_20221011.txt) from the directory. I appreciate your help.
To find the latest file based on the timestamp in the filename - you want to set the FilePattern to "Test_*.txt". This will limit the results to only those files that begin with 'Test'.
Since the last 8 characters of the filename are in the format YYYYMMDD - the above code will find all files with that pattern and convert the timestamp to an actual date.
The list of files found will then be returned sorted by the FileDate that was generated.
If you reverse the order then the first item in the list will be the newest file. You can then use that to drive the rest of your processing.