Archive for March, 2016

Troubleshooting StorSimple high latency IOs blocking low latency IOs


By design, StorSimple hybrid cloud storage automatically tiers the oldest blocks from the local SSD tier down to the SAS tier as the SSD tier fills up (reaches ~80% capacity). In turn, it tiers the oldest blocks from the SAS tier down to the Azure tier as the SAS tier fills up (reaches ~80% capacity).

This has the great benefits of:

  1. Automated tiering: This negates the need for data classification and the entirety of the efforts associated with that.
  2. Granular tiering: Tiering happens at the block level, not at the file level. That’s 64KB for tiered volumes. So, a file can have some hot blocks in SSD, some older blocks in SAS, and some cold blocks that have been displaced all the way down to the Azure tier by warmer blocks (of the same or other files).

As of the time of writing this post (28 March 2016), tiering is fully automated and not configurable. The exception is the ‘Locally Pinned Volume’ feature that comes with StorSimple software update 2.0 (17673) and above. A locally pinned volume loses the deduplication and compression features of a ‘Tiered Volume’, and always resides on the physical device. Currently no visibility is provided as to which tier (SSD or SAS) a Locally Pinned Volume resides in.

Consider the following scenario on an 8100 StorSimple device, which has 15.8 TB of local usable capacity (prior to deduplication and compression):

  1. The customer creates a handful of volumes (about 30 TB provisioned out of the 200 TB maximum allowed on the device) and migrates some 25 TB of data:
    Capacity02
    The ‘Primary’ capacity graph above shows about 25 TB of data as it appears to the SMB file servers that consume the iSCSI volumes, while the ‘Device’ capacity graph below shows that about 10 TB of that 25 TB resides on the device itself for the same time period.
    Capacity01
  2. The customer then does an archive data dump, such as 2 TB of old backup or archive files. Any new data comes in as hot, and on a ‘full’ device it will displace older blocks to Azure. In this case, several TB of active production data were inadvertently displaced to Azure. The following access pattern is observed:
    1. An end user attempts to retrieve files. If the file blocks are in Azure, they will be retrieved, but to make room for them in the SSD tier, other blocks have to be tiered down to the full SAS tier, which in turn has to tier off blocks to Azure to make room for the blocks coming down from SSD. So a single read operation causes 2 tiering operations, including a write operation to Azure. This is described as a high latency IO operation.
    2. While the device is handling the high latency IOs described above, which may take several minutes, other users may request files that RESIDE ENTIRELY LOCALLY on the device (described as low latency IO operations). It has been observed that those read requests slow to a crawl as well. That is, high latency IOs appear to block low latency IOs.
    3. So in this scenario, a 2 TB archive data dump on an 8100 device with 10 TB on the device results in the entire 10 TB being shuffled out to Azure and back in, a few blocks at a time, until the 2 TB of archive data ends up in Azure, returning the device to its pre-incident state.

In my opinion, this is a situation to be avoided at all costs. Once it occurs, the device may exhibit very slow performance that may last for weeks until the archive data dump has made its way through the rest of the data on the device to Azure.
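
To get a feel for why recovery can take this long, the sketch below estimates the minimum time needed just to move a given amount of data over the device's Internet link. The function name and the 100 Mbps figure are assumptions for illustration, not StorSimple values. Tiering traffic also competes with production IO, so real-world times will be longer than this lower bound.

```powershell
# Hypothetical helper: lower-bound transfer time for a given amount of data.
function Get-TransferDays {
    param(
        [Parameter(Mandatory)][double]$TeraBytes,   # data to move, in decimal TB
        [Parameter(Mandatory)][double]$Mbps         # assumed sustained link speed
    )
    $bits    = $TeraBytes * 1e12 * 8      # TB -> bits
    $seconds = $bits / ($Mbps * 1e6)      # bits / (bits per second)
    $seconds / 86400                      # seconds -> days
}

# 10 TB pushed out once over an assumed 100 Mbps link
'{0:N1} days' -f (Get-TransferDays -TeraBytes 10 -Mbps 100)
```

In the scenario above the data travels out and back in, so the one-way figure should be roughly doubled.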

Best practices recommended to avoid this scenario:

  1. Adhere to the recommended device use cases, particularly unstructured data/file shares. StorSimple is not meant for multi-terabyte high performance SQL databases, for example. Another workload that is not recommended on StorSimple is large PST files. They’re essentially database files that are accessed frequently, and get scanned, indexed and accessed in their entirety.
  2. Do not run any workload or process that scans the active data set in its entirety. Anti-virus and anti-malware scans must be configured for incremental or quick scans only, never for a full scan of all files on a volume. This applies to any process that may try to index, categorize, classify, or read all files on a volume. The exception is a process or application that reads only file metadata and properties, without opening the files and reading their contents. Reading metadata is OK because metadata always resides locally on the device.
  3. Carefully plan your data migration to StorSimple, putting emphasis on migrating the oldest data first. Robocopy can be a very helpful tool in the process.
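
One way to approach the oldest-first migration in item 3 is a series of Robocopy passes with a decreasing /MINAGE threshold, so that each pass only adds files newer than the previous one. The following is a minimal sketch, not a tested migration plan: the function name, the age thresholds and the paths are assumptions, and /MINAGE filters on the last modified date, which may or may not match how you define ‘oldest’.

```powershell
# Hypothetical helper: builds Robocopy argument lists, oldest data first.
function Get-RobocopyPass {
    param(
        [Parameter(Mandatory)][string]$Source,
        [Parameter(Mandatory)][string]$Destination,
        [int[]]$MinAgeDays = @(730, 365, 90, 0)    # assumed thresholds; 0 = no age filter
    )
    foreach ($days in ($MinAgeDays | Sort-Object -Descending)) {
        # /E = include sub-folders, /COPY:DAT = data, attributes, timestamps
        $robocopyArgs = @($Source, $Destination, '/E', '/COPY:DAT', '/R:1', '/W:1', '/NP')
        # /MINAGE:n excludes files newer than n days
        if ($days -gt 0) { $robocopyArgs += "/MINAGE:$days" }
        , $robocopyArgs                            # emit each pass as one array
    }
}

foreach ($pass in Get-RobocopyPass -Source '\\OldServer\Share' -Destination 'E:\Share') {
    # Display only. Add '/L' to a pass to list what would be copied.
    'robocopy ' + ($pass -join ' ')
    # & robocopy @pass    # uncomment to actually run the pass
}
```

Files already copied by an earlier pass are skipped by Robocopy in later passes as long as their size and timestamp are unchanged, so the final unfiltered pass only picks up the newest data.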

I’m adding the following enhancements to my wishlist that I hope to see implemented by Microsoft in the next StorSimple software release:

  • Resolving the core issue of high latency IOs seeming to block/impede low latency IOs
  • More visibility into the device tiering metrics. Simply put, a storage admin needs to know when a StorSimple device is ‘full’ and is tiering off blocks from the primary data set to Azure. This knowledge is critical to avoid the situation described above. A metric showing the amount of space available before the device is full would be even better, as it would provide predictability before reaching that point.
  • A ‘Cloud Pinned Volume’ feature would be very helpful. This should allow the StorSimple storage admin to provision an iSCSI volume that always resides in Azure and does not affect the device heat map.

SQL backup options and feature details


Simple Recovery Model

Overview:

  • Transaction log is mainly used for crash recovery (no log backup)
  • Transaction log records are kept only until the next checkpoint, after which inactive log space is truncated automatically
  • Supports Full and Differential backups only
  • Changes since last backup will be lost
  • Example: Full on Saturdays and differentials on weekdays.

Advantages:

  • Automatically reclaims log space

Disadvantages:

  • The following features are not supported under the Simple Recovery Model:
    • Log shipping
    • AlwaysOn or Database mirroring
    • Point-in-time restores
  • Changes since the most recent backup cannot be recovered
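
The example schedule from the overview (full on Saturdays, differentials on weekdays) can be sketched as a function that picks the T-SQL backup statement for a given date. This is an illustration only: the function name, backup folder and file naming are assumptions, and it only builds the statement text without running it.

```powershell
# Hypothetical helper: returns the BACKUP statement for the example schedule.
function Get-BackupStatement {
    param(
        [Parameter(Mandatory)][string]$DBName,
        [Parameter(Mandatory)][string]$BackupFolder,
        [datetime]$Date = (Get-Date)
    )
    $stamp = $Date.ToString('yyyyMMdd')
    switch ($Date.DayOfWeek.ToString()) {
        # Saturday: full backup
        'Saturday' { "BACKUP DATABASE [$DBName] TO DISK = N'$BackupFolder\${DBName}_Full_$stamp.bak'" }
        # Sunday: nothing scheduled in this example, so nothing is returned
        'Sunday'   { }
        # Monday to Friday: differential backup
        default    { "BACKUP DATABASE [$DBName] TO DISK = N'$BackupFolder\${DBName}_Diff_$stamp.bak' WITH DIFFERENTIAL" }
    }
}

Get-BackupStatement -DBName MyDBName -BackupFolder 'D:\Backup' -Date ([datetime]'2016-03-28')
```

With this schedule, a failure on a Thursday afternoon means restoring Saturday's full backup plus Thursday morning's differential. Everything changed after that differential is lost.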

To view a database Recovery Model:
TSQL:

SELECT recovery_model_desc FROM sys.databases WHERE name = 'MyDBName';

Powershell:

Using the SBSQL module:

Get-SQLRecoveryModel -DBName MyDBName -ServerName MySQLServerName | FT -a

To view the Recovery Model of all databases on the current SQL server, use the Get-SQLRecoveryModel powershell script.

Sample script output:

SQL01

To change a database Recovery Model:
TSQL:

USE master; ALTER DATABASE MyDBName SET RECOVERY SIMPLE

This will change the Recovery Model for database ‘MyDBName’ to Simple. Valid options are Simple, Full and Bulk_Logged
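
If the statement needs to be generated for several databases, a small wrapper can validate the recovery model name before building the T-SQL. The sketch below is an assumption-labeled illustration and is not part of the SBSQL module: New-RecoveryModelStatement is a hypothetical name, and it only returns the statement text.

```powershell
# Hypothetical helper: builds the ALTER DATABASE statement shown above.
function New-RecoveryModelStatement {
    param(
        [Parameter(Mandatory)][string]$DBName,
        # Only the three valid recovery models are accepted (case-insensitive)
        [Parameter(Mandatory)][ValidateSet('SIMPLE', 'FULL', 'BULK_LOGGED')][string]$RecoveryModel
    )
    # Bracket-quote the database name, escaping any closing bracket it contains
    $quoted = '[' + $DBName.Replace(']', ']]') + ']'
    "USE master; ALTER DATABASE $quoted SET RECOVERY $($RecoveryModel.ToUpper());"
}

New-RecoveryModelStatement -DBName MyDBName -RecoveryModel Simple
```

The resulting string can then be run with whatever SQL client is at hand, for example Invoke-Sqlcmd with its -ServerInstance and -Query parameters, assuming the SqlServer or SQLPS module is installed.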

Powershell:

Using the SBSQL module:

Set-SQLRecoveryModel -DBName MyDBName -RecoveryModel Simple -ServerName MySQLServerName

To modify the Recovery Model of all user databases on the current SQL server, use the Set-SQLRecoveryModel script.


Powershell script to re-hydrate StorSimple files based on date last accessed


In some rare situations, a StorSimple hybrid cloud storage device can reach a point where a large cold data dump has displaced hot data to the cloud (Azure). This happens if the device’s local SSD and SAS tiers are full (including reserved space that cannot be used for incoming data blocks from the iSCSI interfaces). In this situation, most READ requests will be followed by Azure WRITE requests. What’s happening is that the device is retrieving the requested data from Azure, and to make room for it on the local tiers it’s displacing the coldest blocks back to Azure. This may result in poor device performance, especially where the device’s bandwidth to/from the Internet is limited.

In the scenario above, if the cold data dump occurred 8 days ago for example, we may be interested in re-hydrating files last accessed in the week prior to that point in time. This Powershell script does just that. It identifies files under a given directory based on date last accessed, and reads them. By doing so, the StorSimple device brings these files to the top SSD tier. This is meant to run off hours, and has been tested to improve file access for users coming online the next day.

To use this script, modify the value of the $FolderName variable. This is where the script looks for files to re-hydrate. The script searches all sub-folders.

Rehydrate2

Also modify the values of the $StartDays and $EndDays variables. As shown in the example above, the selection of 15 StartDays and 9 EndDays will re-hydrate data whose LastAccessTime was 9-15 days ago.
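
The selection and read logic described above can be sketched roughly as follows. This is not the actual script, only a minimal illustration of the idea: the function names, the -Now parameter and the 1 MB read buffer are assumptions. It also relies on LastAccessTime being maintained by the file system, and Windows often has last access updates disabled by default.

```powershell
# Hypothetical helper: files whose LastAccessTime is between $EndDays and $StartDays ago.
function Get-FilesByLastAccess {
    param(
        [Parameter(Mandatory)][string]$FolderName,
        [int]$StartDays = 15,
        [int]$EndDays = 9,
        [datetime]$Now = (Get-Date)
    )
    $oldest = $Now.AddDays(-$StartDays)
    $newest = $Now.AddDays(-$EndDays)
    Get-ChildItem -Path $FolderName -Recurse -File |
        Where-Object { $_.LastAccessTime -ge $oldest -and $_.LastAccessTime -le $newest }
}

# Hypothetical helper: reads a file end to end and returns the number of bytes read.
function Read-FileFully {
    param([Parameter(Mandatory)][string]$Path)
    $buffer = New-Object byte[] (1MB)      # stream in chunks instead of loading the whole file
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        $total = 0L
        while (($read = $stream.Read($buffer, 0, $buffer.Length)) -gt 0) { $total += $read }
    } finally {
        $stream.Dispose()
    }
    $total
}

# Example with a hypothetical path:
# Get-FilesByLastAccess -FolderName 'E:\Shares\Projects' |
#     ForEach-Object { Read-FileFully -Path $_.FullName | Out-Null }
```

Reading in fixed-size chunks keeps memory use flat even for very large files, since the goal is only to make the device fetch the blocks, not to hold the content.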

Script output may look like:

Rehydrate1

As usual, a log file is generated containing the same output displayed on the console. This is helpful if the script will be run as a scheduled task or job.


Powershell script to collect disk information from one or many computers


In many scenarios you may need to collect disk drive information from one or many Windows computers, for example when trying to identify used storage while planning a move from traditional on-premises storage to cloud integrated storage such as StorSimple. This script can help with this task.

To use this script, run it to load its 2 functions. The main function is Get-DiskInfo.

To see its built-in help, type in:

help Get-DiskInfo -ShowWindow

Powershell displays help information similar to:

Get-DiskInfo01

The help information includes some examples like:

Get-DiskInfo

This simply runs the function with no parameters. The script assumes you want the disk information of the local computer, and creates a log file under the current path in a sub-folder named logs.

Output may look like:

Get-DiskInfo02

The log file will have the same information displayed on the screen. This is handy when/if this is run as an (unattended) scheduled task or job.

Get-DiskInfo03

Another example illustrates using the output PS object to present and save the information into CSV for further reporting and analysis:

$Drives = Get-DiskInfo v2012R2G2-SQL2,notthere,v2012R2G2-IIS2

The script will return the list of drives of the computers v2012R2G2-SQL2 and v2012R2G2-IIS2, and save the output both to a log file and in the $Drives variable. Output may look like:

Get-DiskInfo04

The source computers can be Windows XP, Windows 7, Windows 8, Windows 10, Server 2003, Server 2008, Server 2012, Server 2016. They must have Powershell installed, and Powershell remoting configured. See this post for more details.
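
For readers curious how the per-drive figures can be derived, here is a minimal sketch that turns a drive's raw size and free space into the columns used in this post. It is not the Get-DiskInfo implementation: ConvertTo-DriveSummary is a hypothetical name, and the commented query via Get-CimInstance and the Win32_LogicalDisk class is just one possible way to obtain the raw numbers on a Windows computer.

```powershell
# Hypothetical helper: summarizes one drive using the column names shown in this post.
function ConvertTo-DriveSummary {
    param(
        [Parameter(Mandatory)][string]$Computer,
        [Parameter(Mandatory)][string]$Drive,
        [Parameter(Mandatory)][double]$SizeBytes,
        [Parameter(Mandatory)][double]$FreeBytes
    )
    [pscustomobject]@{
        Computer    = $Computer
        Drive       = $Drive
        'Used(GB)'  = [math]::Round(($SizeBytes - $FreeBytes) / 1GB, 1)
        'Free(GB)'  = [math]::Round($FreeBytes / 1GB, 1)
        'Free(%)'   = [math]::Round(100 * $FreeBytes / $SizeBytes, 1)
        'Total(GB)' = [math]::Round($SizeBytes / 1GB, 1)
    }
}

# Stand-alone example with made-up numbers
ConvertTo-DriveSummary -Computer 'PC1' -Drive 'C:' -SizeBytes 100GB -FreeBytes 25GB

# On Windows, the raw numbers for local fixed disks could come from (assumption):
# Get-CimInstance Win32_LogicalDisk -Filter 'DriveType=3' | ForEach-Object {
#     ConvertTo-DriveSummary -Computer $env:COMPUTERNAME -Drive $_.DeviceID `
#         -SizeBytes $_.Size -FreeBytes $_.FreeSpace
# }
```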

We can display the data in the $Drives variable to the screen as in:

$Drives | select Computer,Drive,'Used(GB)','Free(GB)','Free(%)','Total(GB)' | FT -a

Get-DiskInfo05

Or display it to a window instead as in:

$Drives | select Computer,Drive,'Used(GB)','Free(GB)','Free(%)','Total(GB)' | Out-GridView

Get-DiskInfo06

Or save to CSV file for further reporting and analysis with MS Excel as in:

$Drives | select Computer,Drive,Used,Free,Total | Export-Csv .\Drives1.csv -NoType

Get-DiskInfo07

The last example:

Get-DiskInfo (Import-Csv .\ComputerList.csv).ComputerName

demonstrates that we can read the list of computers from a CSV file. In this example the input CSV file looks like:

Get-DiskInfo08

You can simply create a new file in Notepad. The first line must be ‘ComputerName’ to match the property name used in the example above. This is followed by the computer names, each on its own line. Finally, save the file with a CSV file extension instead of TXT.
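
The same input file can also be created from Powershell. The short sketch below writes a ComputerList.csv with the required ‘ComputerName’ header and reads the names back the same way the example above does. The temp-folder location is an assumption chosen to avoid overwriting an existing file, and the computer names are the ones used earlier in this post.

```powershell
# Write the header line followed by one computer name per line
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'ComputerList.csv'
@('ComputerName', 'v2012R2G2-SQL2', 'v2012R2G2-IIS2') | Set-Content -Path $csvPath

# Read the list back; .ComputerName returns just the names
$computers = (Import-Csv -Path $csvPath).ComputerName
$computers
```

The $computers variable can then be passed to Get-DiskInfo in place of the inline Import-Csv expression.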