Match-LargeArrays function added to AZSBTools PowerShell module
Sometimes one needs to match large arrays in PowerShell based on a common property. For example, you may need to match Active Directory (AD) users to Azure users by comparing each AD user’s ms-ds-ConsistencyGuid with each Azure user’s ImmutableId. Despite its name, the ImmutableId property of an Azure user is not actually immutable and can be modified. It holds the Base64-encoded form of the corresponding AD user’s ObjectGuid, and is typically populated by Azure AD Connect, which uses a stripped-down version of Microsoft Identity Manager (MIM).
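The ObjectGuid to ImmutableId conversion described above can be sketched as follows, assuming the usual encoding (Base64 of the GUID's 16 bytes in .NET byte order). The two function names are hypothetical; the demo code uses AZSBTools' own Convert-ObjectGuid2ImmutableId, whose internals are not shown here.

```powershell
# Hypothetical helper names; a sketch assuming ImmutableId = Base64 of the GUID's 16 bytes.
function ConvertTo-ImmutableIdSketch {
    param([Parameter(Mandatory)] [Guid] $ObjectGuid)
    # Guid.ToByteArray() returns the 16 bytes in .NET byte order
    [Convert]::ToBase64String($ObjectGuid.ToByteArray())
}

function ConvertFrom-ImmutableIdSketch {
    param([Parameter(Mandatory)] [string] $ImmutableId)
    # Decode the Base64 string back into 16 bytes and rebuild the Guid from them
    [Guid]::new([Convert]::FromBase64String($ImmutableId))
}

$Guid = New-Guid
$ImmutableId = ConvertTo-ImmutableIdSketch -ObjectGuid $Guid
ConvertFrom-ImmutableIdSketch -ImmutableId $ImmutableId   # returns the original Guid
```

Because a GUID is always 16 bytes, every ImmutableId produced this way is a 24-character Base64 string ending in `==`.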
Consider the following demo data:
$DesiredArraySize = 1000
$MatchPerCent = 93.7 # The sample data generated will have x% matching records
#region Generate Demo arrays
# Assumes the AZSBTools module is loaded (it provides Convert-ObjectGuid2ImmutableId)
$DemoADUserList = @()
$DemoAzureUserList = @()
$MatchRecordCount = [Math]::Round($MatchPerCent * $DesiredArraySize / 100)
foreach ($LoopCounter in (1..$DesiredArraySize)) {
    $Guid = New-Guid
    $EmployeeId = Get-Random -Minimum 1000000000 -Maximum 9999999999
    # AD user: ms-ds-ConsistencyGuid is set to the ImmutableId form of ObjectGuid
    $DemoADUserList += New-Object -TypeName PSObject -Property ([Ordered]@{
        GivenName = 'Sam'
        SurName = $Guid
        DisplayName = "Sam $Guid"
        Name = "Sam $Guid"
        samAccountName = "Sam$Guid"
        UserPrincipalName = "sam$Guid@mydomain.com"
        Mail = "sam$Guid@mydomain.com"
        EmployeeId = $EmployeeId
        Enabled = $true
        DistinguishedName = "CN=Sam $Guid,OU=US,DC=mydomain,DC=com"
        'msDS-CloudExtensionAttribute5' = $EmployeeId
        ObjectGuid = $Guid
        'ms-ds-ConsistencyGuid' = Convert-ObjectGuid2ImmutableId -ObjectGuid $Guid
    })
    # The first $MatchRecordCount Azure users reuse the AD user's Guid (a match);
    # the remaining ones get a new random Guid (no match)
    if ($LoopCounter -le $MatchRecordCount) {$Id = $Guid} else {$Id = New-Guid}
    $DemoAzureUserList += New-Object -TypeName PSObject -Property ([Ordered]@{
        GivenName = 'Sam'
        SurName = $Id
        DisplayName = "Sam $Id"
        Mail = "sam$Id@mydomain.com"
        UserPrincipalName = "sam$Id@mydomain.onmicrosoft.com"
        AccountEnabled = $true
        ObjectId = New-Guid
        ImmutableId = Convert-ObjectGuid2ImmutableId -ObjectGuid $Id
        CreationType = $null
        UserState = $null
        UserType = 'Member'
    })
}
#endregion
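This code generates two arrays: $DemoADUserList and $DemoAzureUserList. Each record in $DemoADUserList has the AD-style properties defined in the first hashtable above (including ObjectGuid and ms-ds-ConsistencyGuid), while each record in $DemoAzureUserList has the Azure-style properties defined in the second (including ObjectId and ImmutableId).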
A traditional matching algorithm may look like this:
#region Match using the traditional method
$Duration = Measure-Command {
    foreach ($ADUser in $DemoADUserList) {
        # Scans the entire Azure list for every AD user: N x N comparisons in total
        if ($FoundInAzure = $DemoAzureUserList | Where-Object ImmutableId -EQ $ADUser.'ms-ds-ConsistencyGuid') {
            $ADUser | Add-Member -MemberType NoteProperty -Name MatchingAzureObjectId -Value $FoundInAzure.ObjectId -Force
        } else {
            $ADUser | Add-Member -MemberType NoteProperty -Name MatchingAzureObjectId -Value 'Not Found in Azure' -Force
        }
    }
}
# Write-Log (AZSBTools) takes an array of message parts followed by an array of matching colors
Write-Log 'Using the traditional method','matched',('{0:N0}-record' -f $DesiredArraySize),'lists',"($('{0:N0}' -f ($DesiredArraySize*$DesiredArraySize)) comparisons) in",
    "$($Duration.Hours):$($Duration.Minutes):$($Duration.Seconds)",'hh:mm:ss',"($('{0:N0}' -f $Duration.TotalSeconds) seconds)" Yellow,Green,Cyan,Green,Cyan,Green,Cyan,DarkYellow
Write-Log ' Identified',('{0:N0}' -f ($DemoADUserList | Where-Object MatchingAzureObjectId -ne 'Not Found in Azure').Count),'matching records' Green,Cyan,Green
#endregion
The problem with the traditional matching algorithm is that it takes a very long time for large data sets, because it scans the whole Azure list for every AD user. For example, with 10k-record data sets this algorithm takes ~30 minutes, and with 200k-record data sets it takes over a week!
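A quick back-of-the-envelope check shows why the run time explodes, assuming it is dominated by the N x N comparisons in the loop above: the work grows with the square of the list size. Scaling the ~30-minute figure for 10k records up to 200k records:

```latex
\frac{T_{200k}}{T_{10k}} \approx \left(\frac{200{,}000}{10{,}000}\right)^{2} = 20^{2} = 400
\qquad\Rightarrow\qquad
T_{200k} \approx 400 \times 30\ \text{min} = 12{,}000\ \text{min} \approx 8.3\ \text{days}
```

That estimate agrees with the "over a week" observation. An indexed approach that makes roughly one pass over each list grows linearly instead, O(n + m).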
The new Match-LargeArrays function leverages hashtable indexing to make matching upwards of 500 times faster!
$Result = Match-LargeArrays -Array1 $DemoADUserList -Property1 'ms-ds-ConsistencyGuid' -Array2 $DemoAzureUserList -Property2 'ImmutableId'
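The hashtable-indexing idea can be sketched as below. This is not the Match-LargeArrays source; the function name Match-ArraysSketch is hypothetical, and it only mirrors the parameters and the MatchingObject output property described in this post. It indexes Array2 once, then does one lookup per Array1 record.

```powershell
# Hypothetical sketch of hashtable-based matching; NOT the Match-LargeArrays source.
function Match-ArraysSketch {
    param(
        [Parameter(Mandatory)] [object[]] $Array1,
        [Parameter(Mandatory)] [string]   $Property1,
        [Parameter(Mandatory)] [object[]] $Array2,
        [Parameter(Mandatory)] [string]   $Property2
    )

    # Pass 1: index Array2 by its Property2 value (one pass over Array2).
    # Dictionary[string,...] uses ordinal, case-sensitive keys; a PowerShell @{}
    # literal is case-insensitive and would wrongly match Base64 values such as 'AbC=' and 'abc='.
    $Index = [System.Collections.Generic.Dictionary[string, System.Collections.Generic.List[object]]]::new()
    foreach ($Item in $Array2) {
        $Key = [string]$Item.$Property2
        if ([string]::IsNullOrEmpty($Key)) { continue }   # skip records that have no key
        if (-not $Index.ContainsKey($Key)) {
            $Index[$Key] = [System.Collections.Generic.List[object]]::new()
        }
        $Index[$Key].Add($Item)   # several Array2 records may share the same key
    }

    # Pass 2: one dictionary lookup per Array1 record instead of a full scan of Array2.
    foreach ($Item in $Array1) {
        $Copy  = $Item.PSObject.Copy()   # shallow copy, so the caller's objects stay unchanged
        $Key   = [string]$Item.$Property1
        $Found = $null
        if (-not [string]::IsNullOrEmpty($Key) -and $Index.ContainsKey($Key)) {
            $Found = $Index[$Key].ToArray()
        }
        $Copy | Add-Member -MemberType NoteProperty -Name MatchingObject -Value $Found -Force
        $Copy   # emit the annotated copy
    }
}

# Small usage example with hypothetical data
$A = @(
    [pscustomobject]@{ Name = 'a1'; Key = 'AbC=' },
    [pscustomobject]@{ Name = 'a2'; Key = 'xyz=' }
)
$B = @(
    [pscustomobject]@{ Id = 'b1'; Key2 = 'AbC=' },
    [pscustomobject]@{ Id = 'b2'; Key2 = 'abc=' },
    [pscustomobject]@{ Id = 'b3'; Key2 = 'AbC=' }
)
$R = @(Match-ArraysSketch -Array1 $A -Property1 Key -Array2 $B -Property2 Key2)
```

With the demo data, the equivalent call would be `@(Match-ArraysSketch -Array1 $DemoADUserList -Property1 'ms-ds-ConsistencyGuid' -Array2 $DemoAzureUserList -Property2 'ImmutableId')`. Building the index is a single pass over Array2 and each lookup takes constant time on average, so the total work grows with the sum of the list sizes rather than their product.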
For 200k-record data sets, the matching time drops from over a week to under 2 minutes!
For 10k-record data sets, the function completes in under 2 seconds.
Using the traditional matching algorithm with the same 10k-record data sets on the same hardware takes 17 minutes and 31 seconds, over 586 times longer!
$Result is the array returned by the function. It is a copy of the input Array1 with an additional property, MatchingObject, that contains the matching record(s) from Array2.
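One way to consume the returned array is sketched below. It uses a small hypothetical stand-in for $Result so it runs on its own, and it assumes that records with no match carry an empty or $null MatchingObject (the post does not say exactly what unmatched records contain).

```powershell
# Hypothetical stand-in for $Result so the snippet runs on its own
$Result = @(
    [pscustomobject]@{ Name = 'Sam 1'; MatchingObject = @([pscustomobject]@{ ObjectId = 'id-1' }) },
    [pscustomobject]@{ Name = 'Sam 2'; MatchingObject = $null }
)

# Assumption: an unmatched record has an empty or $null MatchingObject
$Matched   = @($Result | Where-Object { $_.MatchingObject })
$Unmatched = @($Result | Where-Object { -not $_.MatchingObject })

# Flatten to one row per Array1 record, similar to the traditional method's output
$Report = @($Result | Select-Object Name, @{
    Name       = 'MatchingAzureObjectId'
    Expression = { if ($_.MatchingObject) { $_.MatchingObject.ObjectId -join ';' } else { 'Not Found in Azure' } }
})
```

Joining with `;` keeps records that matched several Array2 objects on a single row.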
To install or update the AZSBTools PowerShell module, which is available in the PowerShell Gallery, you can use the following code:
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12 # The PowerShell Gallery dropped support for TLS 1.0 and 1.1 as of April 2020
Set-PSRepository -Name PSGallery -InstallationPolicy Trusted
Remove-Module AZSBTools -Force -EA 0
Install-Module AZSBTools -Force -AllowClobber -SkipPublisherCheck # -Scope CurrentUser
Import-Module AZSBTools -DisableNameChecking -Force
Get-Command -Module AZSBTools
You need PowerShell 5. To view your PowerShell version, type the following in an elevated PowerShell ISE window:
$PSVersionTable
To download and install the latest version of AZSBTools and its dependencies from the PowerShell Gallery, first trust the Microsoft PowerShell Gallery repository by typing
Set-PSRepository -Name PSGallery -InstallationPolicy Trusted
then type
Install-Module AZSBTools,Az -Force -AllowClobber -Scope CurrentUser
AZSBTools contains functions that depend on the Az module, so the two are typically installed together.
To load the AZSBTools and Az modules, type:
Import-Module AZSBTools,Az -DisableNameChecking
To view a list of cmdlets/functions in AZSBTools, type
Get-Command -Module AZSBTools
To view the built-in help of one of the AZSBTools functions/cmdlets, type
help <function/cmdlet name> -show
such as
help Get-DayOfMonth -show