I'm no PowerShell (PS) expert, but I'm trying to learn more to make my life easier. I will present the problem and then provide background details, in case you're interested in why this is such a pain for me right now.
Problem
I need to take a list of file names (sometimes hundreds of thousands) and compare it against a directory that also contains hundreds of thousands of files, then move the matches to a separate UPLOAD directory, so that we isolate only the files we want to put onto our file server.
Background
I work for a company that is in acquisition mode; so far this year we have bought 5 very large companies to incorporate into our own. We have an in-house software division and a CRM built in house, and we take the data from the CRMs these companies used before and load it into our own. This includes the attachments that are the root of the problem above. The acquisitions are not slowing down, so this will continue to be a painful process (right now it takes 20+ hours to separate the files we need from the massive file dumps we get from these companies). I will paste the PS we have created below. Feel free to criticize and tell us how terribly wrong it is. We are here to learn and get a resolution for this headache. The step that seems to be the major pain point is the COMPARE step.
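To make the COMPARE step concrete, here is a minimal sketch of one way it could be done with a single lookup table instead of Compare-Object plus a nested loop. It assumes the CSV has Id and Destination columns, as in the script below, and that only rows with the wanted Destination should be picked. The function name Get-UploadName and the sample rows are hypothetical and only there for illustration.

```powershell
# Hypothetical helper: returns the file names that have a CSV row with the wanted Destination.
function Get-UploadName {
    param(
        [object[]]$CsvRows,                 # rows from Import-Csv with Id and Destination columns
        [string[]]$FileNames,               # file names found in the directory
        [string]$WantedDestination = 'OBA'  # value taken from the script's Destination check
    )
    # Build the lookup once. Keys of a PowerShell hashtable literal are case-insensitive,
    # which mirrors the case-insensitive -eq comparison on Id.
    $wanted = @{}
    foreach ($row in $CsvRows) {
        if ($row.Destination -eq $WantedDestination) { $wanted[$row.Id] = $true }
    }
    # One hash lookup per file instead of one pass over the whole CSV per file
    foreach ($name in $FileNames) {
        if ($wanted.ContainsKey($name)) { $name }
    }
}

# Small in-memory example (made-up data)
$rows = @(
    [pscustomobject]@{ Id = 'a.pdf'; Destination = 'OBA' }
    [pscustomobject]@{ Id = 'b.pdf'; Destination = 'Other' }
)
Get-UploadName -CsvRows $rows -FileNames 'a.pdf','b.pdf','c.pdf'   # returns a.pdf

# Assumed usage with the variables from the Compare section of the script:
# Get-UploadName -CsvRows $CSV -FileNames $Files.Name |
#     ForEach-Object { Copy-Item (Join-Path $FilePath $_) $Destination }
```

The work becomes roughly one pass over the CSV plus one pass over the files, so the Split step may no longer be needed just to keep the compare manageable.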
#This script identifies and prepares the attachments needed to upload to our file servers
#Contents
###########################################################
#Extract - Takes a folder containing zipped folders and unzips them all to one location
#Split - Splits out the folder of files to multiple smaller folders to improve runtime of compare step
#Compare - Compares files to CSV of files in the database and returns the files that match
#Extract
###########################################################
$ExtractRunTime = Measure-Command {
    $FolderPath = "E:\File Download\7-1-19 Download"
    $DestinationPath = "F:\DealerAttachments"
    # Only pick up zip archives; directories and other file types would make Expand-Archive fail
    $Files = Get-ChildItem -Path $FolderPath -Filter '*.zip' -File -Recurse -Force
    ForEach ($F in $Files.FullName) {
        Expand-Archive -Path $F -DestinationPath $DestinationPath
    }
}
# TotalHours is used so that a run longer than 24 hours does not wrap back to 0 hours
$ExtractTime = "It took $([math]::Floor($ExtractRunTime.TotalHours)) hours, $($ExtractRunTime.Minutes) minutes, and $($ExtractRunTime.Seconds) seconds to unzip $($Files.Count) folders of attachments."
$ExtractTime
#Split
###########################################################
$SplitRunTime = Measure-Command {
    $FilesPerFolder = 25000 #Change this based on the number of files per folder desired
    # Note: source and destination are the same path, so the numbered folders are created inside the source folder
    $SourcePath = "F:\Dealer Attachments\AttachmentsSplit\1"
    $DestPath = "F:\Dealer Attachments\AttachmentsSplit\1"
    $i = 0 #Instantiate the counting variable for the loop below. Do not change.
    $FolderNum = 1 #Instantiate the folder number variable for the loop below. Do not change.
    $GetFilesRunTime = Measure-Command {
        # -File skips directories so that only files are counted and copied
        $Files = Get-ChildItem -Path $SourcePath -Recurse -Force -File
    }
    $MoveFilesRunTime = Measure-Command {
        ForEach ($F in $Files.FullName) {
            if (!(Test-Path "$DestPath\$FolderNum")) {
                New-Item -Path "$DestPath\$FolderNum" -Type Directory -Force
            }
            Copy-Item -Path $F -Destination "$DestPath\$FolderNum"
            $i++
            if ($i -eq $FilesPerFolder){
                $FolderNum++
                $i = 0
            }
        }
    }
}
$SplitTime = "It took $([math]::Floor($SplitRunTime.TotalHours)) hours, $($SplitRunTime.Minutes) minutes, and $($SplitRunTime.Seconds) seconds to divide the attachments into $FolderNum folders."
$SplitTime
#Compare
###########################################################
$CSV = Import-Csv 'F:\Dealer Attachments\AttachmentsSplit\Dealer Attachments To Upload.csv'
$Destination = "F:\Dealer Attachments\AttachmentsSplit\Dealer Attachments To Upload"
$FilePath = 'F:\Dealer Attachments\AttachmentsSplit\1\1'
# Make sure the destination is a folder; otherwise Copy-Item would write every file to a single file with that name
New-Item -Path $Destination -ItemType Directory -Force | Out-Null
$Files = Get-ChildItem $FilePath -File
$FolderTime = Measure-Command {
    $CompareTime = Measure-Command {
        # Keep only the names that appear both on disk and in the CSV Id column
        $Compare = Compare-Object $Files.Name $CSV.Id -IncludeEqual -ExcludeDifferent
    }
    $MoveTime = Measure-Command {
        ForEach ( $File in $Compare.InputObject ) {
            # Scans the whole CSV again for every matched file
            ForEach ( $C in $CSV ) {
                If ( $File -eq $C.Id ) {
                    If ( $C.Destination -eq "OBA" ) {
                        Copy-Item ( Join-Path $FilePath $File ) $Destination
                    }
                }
            }
        }
    }
}
$OutputCompareTime = "Folder 1 took $([math]::Floor($CompareTime.TotalHours)) hours, $($CompareTime.Minutes) minutes, and $($CompareTime.Seconds) seconds to compare the files, resulting in $($Compare.Count) matches."
$OutputMoveTime = "Folder 1 took $([math]::Floor($MoveTime.TotalHours)) hours, $($MoveTime.Minutes) minutes, and $($MoveTime.Seconds) seconds to move the matching files."
$OutputFolderTime = "Folder 1 took $([math]::Floor($FolderTime.TotalHours)) hours, $($FolderTime.Minutes) minutes, and $($FolderTime.Seconds) seconds in total."
$OutputCompareTime
$OutputMoveTime
$OutputFolderTime
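The script builds the same "hours, minutes, and seconds" text in several places. A minimal sketch of a small helper that could produce that part of the message is shown here; the name Format-Elapsed is hypothetical, and the choice to report whole hours from TotalHours (so runs over 24 hours are not cut off) is an assumption about what the output is meant to show.

```powershell
# Hypothetical helper: formats a TimeSpan as "H hours, M minutes, and S seconds".
function Format-Elapsed {
    param([timespan]$Elapsed)
    # Floor of TotalHours keeps counting past 24; Minutes and Seconds are the leftover components
    '{0} hours, {1} minutes, and {2} seconds' -f [math]::Floor($Elapsed.TotalHours), $Elapsed.Minutes, $Elapsed.Seconds
}

# 90061 seconds is 25 hours, 1 minute, and 1 second
Format-Elapsed ([timespan]::FromSeconds(90061))   # returns "25 hours, 1 minutes, and 1 seconds"

# Assumed usage with a Measure-Command result from the script:
# "Folder 1 took $(Format-Elapsed $FolderTime) in total."
```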