Picking the lastest file in a folder

hi i have the following logic that i thought was working. it pick up the latest file in a folder and processes it based on the keyword of the file and the latest date. i thought it as working and it does if only the files you want are been used for that package. but sometime there will be files sitting in the folder that have to do with other packages that shouldnt be picked up. but the logic is picking them up. can anyone help. my code is as follows

[code]

            if (checkname.Contains(KeywordGLBAL))
            {
                FileInfo fileinfo;
                fileinfo = new FileInfo(InputDirectory);

                FileInfo[] files = directory.GetFiles();
                DateTime lastModified = DateTime.MinValue;

                foreach (FileInfo file in files)

                {
                    if (file.LastWriteTime > lastModified)
                    {

                        lastModified = file.LastWriteTime;



                        //    Dts.Variables["User::LastCreateTimeGLBAL"].Value = lastModified;
                        LatestFileName = directory + file.ToString();
                        Dts.Variables["User::GLBALDataFile"].Value = LatestFileName;

                      
                    }
                }

[\code]

How do you know which files to process and which to skip? You have to give it some sort of filter otherwise, it's working as you coded it.

i thought this part where it looks for the keyword in the file would do it. so the file has to have a certain name on it. if it doesnt i thought it would skip it then.

(checkname.Contains(KeywordGLBAL))

            // Get our file pattern search criteria
            string filePattern = Dts.Variables["$Package::FilePattern"].Value.ToString();

            // Get the files from the root directory that match our file pattern
            DirectoryInfo info = new DirectoryInfo(rootDirectory);
            FileInfo[] files = info.GetFiles(filePattern);

You should be careful relying on the LastWriteTime of the files - this can be changed and may not be the correct value. For example, if your files are date stamped (e.g. MyInputFile_YYYYMMDD.csv) - the last write time may not coincide with the date stamp - and could be before or after another file with a different date stamp.

If you are attempting to process a set of files in date order - you would be better off creating a variable to hold the sorted file list (as an object in SSIS) and using that object in a for each. If that is something you are interested in I can put together some examples.

I would be great full for any help. Kind of stuck on it at the moment

So what exactly are you trying to accomplish? Do you just want to process a set of files in a folder - or is there other criteria that needs to be met?

Assume the files are date stamped - and you want to process those files in order of the date represented in the file name.

  1. Add a Project parameter named: RootDirectory - type String - populate the value with the root directory where you will search for the files. This can be a UNC path or a drive letter and path.

  2. Create the following package level variables:

image

  1. Add a package parameter named: FilePattern as a string and a value that identifies the files to be selected (e.g. MyFile*.csv)

  2. Add a script task in the Control Flow - set the ReadOnlyVariables as follows:

  1. Use the following code in the script task:
		public void Main()
		{
            IFormatProvider provider = null;

            // Get the root and archive directories
            string rootDirectory = Dts.Variables["$Project::RootDirectory"].Value.ToString();
            string archiveDirectory = Dts.Variables["User::ArchiveDirectory"].Value.ToString();

            // Create the archive directory (if it already exists - nothing done)
            Directory.CreateDirectory(archiveDirectory);

            // Create our unsorted dataset and datatable
            DataSet dsUnsorted = new DataSet();
            DataTable fileListTable = dsUnsorted.Tables.Add();
            fileListTable.Columns.Add("FileName", typeof(string));
            fileListTable.Columns.Add("ArchiveFileName", typeof(string));
            fileListTable.Columns.Add("FileDate", typeof(DateTime));

            // Get our file pattern search criteria
            string filePattern = Dts.Variables["$Package::FilePattern"].Value.ToString();

            // Get the files from the root directory that match our file pattern
            DirectoryInfo info = new DirectoryInfo(rootDirectory);
            FileInfo[] files = info.GetFiles(filePattern);

            // Loop through each file found
            foreach (FileInfo file in files)
            {
                // Get File Date from file name
                string fileName = Path.GetFileNameWithoutExtension(file.Name);
                DateTime fileDate = DateTime.ParseExact(fileName.Substring(fileName.Length - 8), "yyyyMMdd", provider);

                string archiveDestination = Path.Combine(archiveDirectory, fileDate.ToString("yyyyMM"));
                Directory.CreateDirectory(archiveDestination);

                // Add file information to the datatable
                fileListTable.Rows.Add(file.Name, Path.Combine(archiveDestination, file.Name).ToString(), fileDate);
            }

            // Sort the files by file date
            DataView dvSorted = new DataView(fileListTable);
            dvSorted.Sort = "FileDate ASC";

            // Convert DataView to DataSet
            DataSet dsSorted = new DataSet();
            dsSorted.Tables.Add(dvSorted.ToTable());

            Dts.Variables["SortedFileList"].Value = dsSorted;
            Dts.TaskResult = (int)ScriptResults.Success;
		}
  1. Create an archive directory in the root identified in the first step. This code will create a dated folder for each month but that can be adjusted if needed.

  2. Add a for each loop and set it up as follows:


image

  1. Inside the for each loop - setup your file processing as needed. Use expressions on the input/source to identify the file based on the variable User::FileName

  2. Following the data load - add a new script task to archive the file (inside the for each loop). Add the variables User::ArchiveFileName and User::FullFileName as ReadOnlyVariables. Use the following code to archive the file:

		public void Main()
		{
            bool fireAgain = true;

            // Get the source and destination files
            string fullFileName = Dts.Variables["User::FullFileName"].Value.ToString();
            string archiveFileName = Dts.Variables["User::ArchiveFileName"].Value.ToString();

            try
            {
                if (File.Exists(fullFileName))
                {
                    File.Copy(fullFileName, archiveFileName, true);
                    File.Delete(fullFileName);

                    Dts.Events.FireInformation(0, "Archive File Task", fullFileName + " archived to " + archiveFileName, string.Empty, 0, ref fireAgain);
                }
    
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception ex)
            {
                Dts.Events.FireError(0, "Archive File Task", ex.ToString(), string.Empty, 0);
            }
		}

Note: the reason we use a script task for this instead of the file system task is that 'moving' a file retains system settings. If you compress the Archive folder (recommended) then moving the file will override the compression setting and the file will not be compressed - so we copy the file then delete the original file.

Missed this - set the connection string property on the file connection manager to the following:

image

Use the connection manager in your flat file source - the variable will be updated/changed for each file found in the folder.

thanks for the help.will work on this

hey. just working on the logic and keep running in to this error. its to do with the filedate. it keeps showing up as + fileDate {1/1/0001 12:00:00 AM} System.DateTime and failing on the line DateTime fileDate = DateTime.ParseExact(fileName.Substring(fileName.Length - 8), "dd/mm/yyyy", provider);

any idea why.

Yes - that line of code attempts to pull the date stamp from the file name. For example - if your file name is MyFile_20201214.csv it extracts the last 8 characters from the file name without extension (MyFile_20201214) which gives us 20201214...

It then parses that string to a datetime value.

To set that correctly - we need a fixed and consistent file naming standard and the format the date will be structured.

If you can provide an example of the file names you expect - we can adjust that line.

However - if you do not have a fixed format for the file name then this code will not work as is...but it can be adjusted depending on how you want to handle the file date for your files.

a i see the issue. thats probably going to be a problem as files dont have dates in the name and file could be called different things. the one constant is that it will have GL text in the name.
so file can be like this

302150_JPY_GL
261750 GL
21112 GL

Okay - back to the beginning...

How are you going to identify the files you want this package to process? What is the file pattern that will only select those files and leave the other files alone?

Would that be: GL.csv

Can there be other files with GL in the name that should not be processed?

As for creating the sorted list of files by a date, we can change how we get the date to use the LastWriteTime of the file itself...although that isn't as reliable it might still work.

                //DateTime fileDate = DateTime.ParseExact(fileName.Substring(fileName.Length - 8), "yyyyMMdd", provider);
                DateTime fileDate = file.LastWriteTime;

Change the code to the above - this will comment out the check for the date stamp on the file and use the LastWriteTime associated with the file.

If there is some other way to organize the files in an appropriate order - that can be added. For example, if you want to process the files by the first part of the file name (e.g. 302150, 261750, 21112) then you can parse that information out of the file name - add a new column to the data table - add the value to the data table and then sort by that field and the FileDate in the DataView.

To review - you have 2 issues here. The first issue is how to identify the files to be processed by this package. The second issue is how to identify the order of processing for those files.

Once you have that information we can modify the code to handle that and return a sorted file list to process the files in the appropriate groups and order.

only files with gl should be processed and there will never be files with gl in the name that should not be processed.

the reason im going with the latestdate on the file in my original code was to basically order them.

so my folder might looks like this
302150_JPY_GL
261750 GL
21112 GL
302150_JPY_holdings

so i was trying to get it to pick the lasts date on the files with gl in the name pick that one up process it then move it to archive then pick up the next . but if that holdings file had the latest date that was been picked up even thought i thought i had logic in my code to only look for the data with the keyword gl.

i will try the change of logic see how that works. thanks for all the help

The file pattern should be GL.{ext} - not sure what extension you have for these files, but set that portion as well. If they are CSV files then GL.csv - if text files then GL.txt.

thank you

Hi Jeff,

Hi, I am trying to follow your post but seems like I am missing something. I need to look for latest file from directory and move it to the different directory.

For example, I have below files in the folder and need to look for the latest test file(Test_20221011.txt) from the directory. I appreciate your help.

Exam_20221127.txt

Test_20221011.txt

Address_20221009.txt

Exam_20220925.txt

Test_20220620.txt

Test_20220521.txt

Test20220303.txt

Exam_20220201.txt

Hi, I am trying to follow your post but seems like I am missing something. I need to look for latest file from directory and move it to the different directory.

For example, I have below files in the folder and need to look for the latest test file(Test_20221011.txt) from the directory. I appreciate your help.

Exam_20221127.txt

Test_20221011.txt

Address_20221009.txt

Exam_20220925.txt

Test_20220620.txt

Test_20220521.txt

Test20220303.txt

Exam_20220201.txt

To find the latest file based on the timestamp in the filename - you want to set the FilePattern to "Test_*.txt". This will limit the results to only those files that begin with 'Test'.

Since the last 8 characters of the filename are in the format YYYYMMDD - the above code will find all files with that pattern and convert the timestamp to an actual date.

The list of files found will then be returned sorted by the FileDate that was generated.

If you reverse the order then the first item in the list will be the newest file. You can then use that to drive the rest of your processing.