Archive for July, 2023

Match-LargeArrays function added to AZSBTools PowerShell module


Sometimes one needs to match large arrays in PowerShell based on a common property. For example, you may need to match Active Directory (AD) users to Azure users by comparing the AD user’s ms-ds-ConsistencyGuid with the Azure user’s ImmutableId. Despite its name, the ImmutableId Azure user property is not actually immutable and can be modified. It holds the Base64 version of the corresponding AD user’s ObjectGuid, and is typically populated by the ADConnect software, which uses a stripped-down version of MIM.
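
The relationship between the two properties can be sketched as follows. This is a minimal illustration, assuming the ImmutableId is simply the Base64 encoding of the GUID's 16 raw bytes. The two helper names are hypothetical and are not the module's Convert-ObjectGuid2ImmutableId function.

```powershell
# Hypothetical helpers for illustration only (not part of AZSBTools)
function ConvertTo-ImmutableIdSketch {
    param([Parameter(Mandatory)][Guid]$ObjectGuid)
    # Base64-encode the 16 raw bytes of the GUID
    [Convert]::ToBase64String($ObjectGuid.ToByteArray())
}

function ConvertFrom-ImmutableIdSketch {
    param([Parameter(Mandatory)][string]$ImmutableId)
    # Decode the Base64 string back into the original GUID
    [Guid]::new([Convert]::FromBase64String($ImmutableId))
}

$Guid        = New-Guid
$ImmutableId = ConvertTo-ImmutableIdSketch -ObjectGuid $Guid
$RoundTrip   = ConvertFrom-ImmutableIdSketch -ImmutableId $ImmutableId
'{0} -> {1} -> {2}' -f $Guid, $ImmutableId, $RoundTrip
```

Because the encoding is reversible, either side of the match can be converted to the other's format before comparing.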

Consider the following demo data:

$DesiredArraySize = 1000
$MatchPerCent = 93.7 # The sample data generated will have x% matching records

#region Generate Demo arrays
$DemoADUserList = $DemoAzureUserList = @()
$MatchRecordCount = [Math]::Round($MatchPerCent * $DesiredArraySize / 100)
foreach ($LoopCounter in (1..$DesiredArraySize)) {
    $Guid = New-Guid
    $EmployeeId = Get-Random -Minimum 1000000000 -Maximum 9999999999
    $DemoADUserList += New-Object -TypeName PSObject -Property ([Ordered]@{
        GivenName                       = 'Sam'
        SurName                         = $Guid
        DisplayName                     = "Sam $Guid"
        Name                            = "Sam $Guid"
        samAccountName                  = "Sam$Guid"
        UserPrincipalName               = "sam$Guid@mydomain.com"
        Mail                            = "sam$Guid@mydomain.com"
        EmployeeId                      = $EmployeeId
        Enabled                         = $true
        DistinguishedName               = "CN=Sam $Guid,OU=US,DC=mydomain,DC=com"
        'msDS-CloudExtensionAttribute5' = $EmployeeId
        ObjectGuid                      = $Guid
        'ms-ds-ConsistencyGuid'         = Convert-ObjectGuid2ImmutableId -ObjectGuid $Guid
    })
    if ($LoopCounter -le $MatchRecordCount) {$Id = $Guid} else {$Id = New-Guid}
    $DemoAzureUserList += New-Object -TypeName PSObject -Property ([Ordered]@{
        GivenName         = 'Sam'
        SurName           = $Id
        DisplayName       = "Sam $Id"
        Mail              = "sam$Id@mydomain.com"
        UserPrincipalName = "sam$Id@mydomain.onmicrosoft.com"
        AccountEnabled    = $true
        ObjectId          = New-Guid
        ImmutableId       = Convert-ObjectGuid2ImmutableId -ObjectGuid $Id
        CreationType      = $null
        UserState         = $null
        UserType          = 'Member'
    })
}
#endregion

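This code generates two arrays: $DemoADUserList and $DemoAzureUserList. Each record in $DemoADUserList carries the AD-style properties defined in the first New-Object call (GivenName through ms-ds-ConsistencyGuid), while each record in $DemoAzureUserList carries the Azure-style properties defined in the second (GivenName through UserType).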

A traditional matching algorithm may look like:

#region Match using the traditional method
$Duration  = Measure-Command {
    foreach ($ADUser in $DemoADUserList) {
        if ($FoundInAzure = $DemoAzureUserList | where ImmutableId -EQ $ADUser.'ms-ds-ConsistencyGuid') {
            $ADUser | Add-Member -MemberType NoteProperty -Name MatchingAzureObjectId -Value $FoundInAzure.ObjectId -Force
        } else {
            $ADUser | Add-Member -MemberType NoteProperty -Name MatchingAzureObjectId -Value 'Not Found in Azure' -Force
        }
    }
}
Write-log 'Using the traditional method','matched',('{0:N0}' -f $DesiredArraySize),'lists',"($('{0:N0}' -f ($DesiredArraySize*$DesiredArraySize)) records) in",
    "$($Duration.Hours):$($Duration.Minutes):$($Duration.Seconds)",'hh:mm:ss',"($('{0:N0}' -f $Duration.TotalSeconds) seconds)" Yellow,Green,Cyan,Green,Cyan,Green,Cyan,DarkYellow
Write-Log '   Identified',('{0:N0}' -f ($DemoADUserList | where MatchingAzureObjectId -ne 'Not Found in Azure').Count),'matching records' Green,Cyan,Green
#endregion

The problem with the traditional matching algorithm is that it takes a very long time for large data sets, because every record in one array is compared against every record in the other. For example, for 10k-record data sets, this algorithm takes ~30 minutes. For 200k-record data sets, it takes over a week!!

The new Match-LargeArrays function leverages Hashtable indexing to cut that matching time by a factor of 500 or more (a reduction of over 99.8%)!!

$Result = Match-LargeArrays -Array1 $DemoADUserList -Property1 'ms-ds-ConsistencyGuid' -Array2 $DemoAzureUserList -Property2 'ImmutableId'
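
The idea behind hashtable indexing can be sketched as below. This is not the module's source code, only a minimal illustration of the general technique under stated assumptions: the small stand-in arrays, the Key property, and the variable names are all made up for the example.

```powershell
# Small stand-in arrays (hypothetical data, not the demo lists above)
$Array1 = 1..5 | ForEach-Object { [PSCustomObject]@{ Name = "AD$_";    Key = "K$_" } }
$Array2 = 3..7 | ForEach-Object { [PSCustomObject]@{ Name = "Azure$_"; Key = "K$_" } }

# Pass 1: index Array2 by key so each later lookup is a hash probe, not a scan.
# Note: @{} compares string keys case-insensitively, like the -eq operator.
$Index = @{}
foreach ($Item in $Array2) {
    if (-not $Index.ContainsKey($Item.Key)) {
        $Index[$Item.Key] = [System.Collections.Generic.List[object]]::new()
    }
    $Index[$Item.Key].Add($Item)
}

# Pass 2: walk Array1 once and look each key up in the index
$Matched = foreach ($Item in $Array1) {
    [PSCustomObject]@{
        Name           = $Item.Name
        Key            = $Item.Key
        MatchingObject = $Index[$Item.Key]   # $null when nothing matches
    }
}

$Matched | Where-Object { $null -ne $_.MatchingObject } | Select-Object Name, Key
```

Each array is traversed once, so the work grows with the sum of the array sizes rather than with their product, which is where the large speed-up comes from.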

For 200k data sets, the matching time is reduced from over a week to under 2 minutes!!

Matching 10k data sets takes under 2 seconds.

Using the traditional matching algorithm with the same 10k data sets and on the same hardware takes 17 minutes and 31 seconds, or over 586 times longer!!

The $Result is the function’s returned array. It is a copy of the input Array1 with an additional property “MatchingObject” that contains the matching record(s) from Array2.
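
A returned array of that shape could be consumed along these lines. This is a sketch only: $SampleResult is a hand-built stand-in rather than real function output, and it assumes that records without a match carry an empty MatchingObject, which the description above does not confirm.

```powershell
# Hand-built stand-in for $Result; the real array comes from Match-LargeArrays.
# Assumption: unmatched records have an empty ($null) MatchingObject.
$SampleResult = @(
    [PSCustomObject]@{ Name = 'Sam A'; MatchingObject = [PSCustomObject]@{ ObjectId = New-Guid } }
    [PSCustomObject]@{ Name = 'Sam B'; MatchingObject = $null }
)

# Split the records into matched and unmatched sets
$MatchedRecords   = @($SampleResult | Where-Object { $null -ne $_.MatchingObject })
$UnmatchedRecords = @($SampleResult | Where-Object { $null -eq $_.MatchingObject })

'{0:N0} matched, {1:N0} unmatched' -f $MatchedRecords.Count, $UnmatchedRecords.Count

# Flatten one property of the matching record next to the original record
$MatchedRecords | Select-Object Name, @{ Name = 'AzureObjectId'; Expression = { $_.MatchingObject.ObjectId } }
```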


To install or update the AZSBTools PowerShell module, which is available in the PowerShell Gallery, you can use the following code:

# Trust the PowerShell Gallery so Install-Module does not prompt
Set-PSRepository -Name PSGallery -InstallationPolicy Trusted
# PowerShellGallery dropped Ssl3 and Tls as of 1 April 2020
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
# Unload any copy already in the session, then install and reload the latest
Remove-Module AZSBTools -Force -ErrorAction SilentlyContinue
Install-Module AZSBTools -Force -AllowClobber -SkipPublisherCheck # -Scope CurrentUser
Import-Module AZSBTools -DisableNameChecking -Force
Get-Command -Module AZSBTools

You need PowerShell 5. To view your PowerShell version, type the following in an elevated PowerShell ISE window:

$PSVersionTable
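
If you would rather have a script check the requirement than read the table by eye, a minimal sketch might look like this. The variable name $MinimumMajor is made up for the example, and it treats any major version of 5 or above as acceptable, which is an assumption.

```powershell
# Assumed minimum, taken from the "PowerShell 5" requirement stated above
$MinimumMajor = 5
$Current      = $PSVersionTable.PSVersion

if ($Current.Major -ge $MinimumMajor) {
    "PowerShell $Current meets the minimum major version $MinimumMajor"
} else {
    "PowerShell $Current is older than the minimum major version $MinimumMajor"
}
```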

To download and install the latest version of AZSBTools and its dependencies from the PowerShell Gallery, first trust the Microsoft PowerShell Gallery repository:

Set-PSRepository -Name PSGallery -InstallationPolicy Trusted

Then install the modules:

Install-Module AZSBTools,Az -Force -AllowClobber -Scope CurrentUser

AZSBTools contains functions that depend on the Az module, and the two are typically installed together.

To load the AZSBTools and Az modules, type:

Import-Module AZSBTools,Az -DisableNameChecking

To view a list of cmdlets/functions in AZSBTools, type

Get-Command -Module AZSBTools

To view the built-in help of one of the AZSBTools functions/cmdlets, type

help <function/cmdlet name> -show

such as

help Get-DayOfMonth -show