Posts tagged “benchmarking”

Get-Busy.ps1 PowerShell script to create files on many PCs and collect metrics


This script uses Busy.ps1, a script that I posted earlier. It can be downloaded from the Microsoft TechNet Gallery. To use it, you need to edit the 4 data entry lines at the top:

GetBusy

  • $WorkFolder = "e:\support" # Folder on each VM where test files will be created
  • $MaxSpaceToUseOnDisk = 20GB # Maximum amount of disk space to be used in each VM's $WorkFolder during testing
  • $VMPrefix = "V-2012R2-LAB" # Used to scope this script (only running VMs whose names match this prefix will be processed)
  • $LocalAdmin = "administrator" # The local admin account on the VMs. A domain account can also be used here.

The script requires 1 positional parameter to indicate whether you wish to start or stop the testing on the VMs. For example, to start:

.\get-busy.ps1 start

GS-017e40

 To end the testing, use:

 .\get-busy.ps1 stop

The script will reach out to the VMs being tested, stop the busy.ps1 script, collect the test results, and clean up the $WorkFolder on each VM.

The script generates 2 log files:

  • A general log file that contains the messages displayed on the screen about each action attempted.
  • A CSV file that contains the compiled test results from all the CSV files generated by each busy.ps1 script on each of the tested VMs.

Here’s an example of the compiled CSV file Get-busy_20140714_071428PM.csv
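For illustration, the compilation step (merging the per-VM CSV files into one compiled CSV) can be sketched in Python. The actual script does this in PowerShell; the file names and column layout below are assumptions, not the script's real schema:

```python
import csv
import glob
import os

def compile_results(result_dir, output_file):
    """Merge per-VM CSV result files into one compiled CSV.

    Assumes every per-VM file shares the same header row; the header
    is written once, followed by the data rows from every file."""
    rows, header = [], None
    for path in sorted(glob.glob(os.path.join(result_dir, "*.csv"))):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)
            if header is None:
                header = file_header  # keep only the first header
            rows.extend(reader)
    with open(output_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)  # number of compiled data rows
```

The same pattern in PowerShell would typically be `Import-Csv` over each per-VM file followed by a single `Export-Csv`.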


PowerShell script to create Hyper-V Virtual Machines in bulk


8/14/2014:

This script has been deprecated. It has been rewritten and made part of the SBTools PS module available here.

Sample script output:

vm07

This script can be used to create Hyper-V virtual machines in bulk. This may be useful in testing, benchmarking, or lab environments.

CVM1

This script can be downloaded from the Microsoft TechNet Gallery.


Revisions:

1.0 – 06/30/2014 – Script leaves log file in the folder where it runs
1.1 – 07/07/2014 – Added CSV log file showing disk copy duration and throughput
1.2 – 07/13/2014 – Added HRBytes function, minor cosmetic tweaks

 


Benchmarking Gridstore enterprise storage array (2)


This is another post in a series on performance testing and benchmarking the Gridstore enterprise storage array.

Gridstore Array components:

6x H-Nodes. Each has 1x Xeon E5-2403 processor at 1.8 GHz with 4 cores (no hyper-threading) and 10 MB L3 cache, 32 GB DDR3 1333 MHz DIMM, 4x 3TB 7200 RPM SAS disks and a 550 GB PCIe Flash card.

GS-009k

Testing environment:

One compute node with 2x Xeon E5-2430L CPUs at 2 GHz with 6 cores each (12 Logical processors) and 15 MB L3 cache, 96 GB RAM

Pre-test network bandwidth verification:

Prior to testing array disk IO, I tested the availability of bandwidth on the Force 10 switch used. I used the NTttcp version 5.28 tool. One of the array nodes was the receiver:

GS-002

The HV-LAB-01 compute node was the sender:

GS-003

I configured the tool to use 4 processor cores only since the Gridstore storage nodes have only 4 cores.

The result was usable bandwidth of 8.951 Gbps (1,118.9 MB/s). Testing was done using standard 1,500 MTU frames, not 9,000 MTU jumbo frames.
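As a sanity check, converting line rate to payload throughput is simple arithmetic (decimal megabytes, matching how NTttcp reports Throughput(MB/s)):

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert gigabits per second to decimal megabytes per second."""
    return gbps * 1e9 / 8 / 1e6

# 8.951 Gbps works out to ~1118.9 MB/s, the figure NTttcp reported
print(round(gbps_to_mb_per_s(8.951), 1))  # 1118.9
```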


vLUNs:

8 vLUNs were configured for this test.

GS-009l

Each vLUN is configured as follows:

  • Protect Level: 1 (striped across 3 Gridstore nodes, fault tolerant to survive single node failure)
  • Optimized for: IOPS
  • QoS: Platinum
  • Unmasked: to 1 server
  • File system: NTFS
  • Block size: 64 KB
  • Size: 5 TB (3 segments, 2.5 TB each)
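The segment sizes quoted throughout these posts imply a simple model: a vLUN striped across N nodes at protect level P stores one segment per node, and usable data spans N − P of those segments. A small calculation (the function name and the model itself are my inference from the quoted sizes, not Gridstore documentation) reproduces the numbers:

```python
def segment_size_tb(usable_tb: float, nodes: int, protect: int) -> float:
    """Per-node segment size (TB) for a vLUN striped across `nodes` nodes
    at protect level `protect`: usable data spans nodes - protect segments."""
    return usable_tb / (nodes - protect)

# 5 TB at protect level 1 across 3 nodes -> 2.5 TB per segment,
# matching "Size: 5 TB (3 segments, 2.5 TB each)" above
print(segment_size_tb(5, 3, 1))  # 2.5
```

The same formula matches the later tests, e.g. 2 TB at protect level 2 across 6 nodes gives 0.5 TB segments ("6 segments, 512 GB each").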

This configuration utilizes all 24 disks in the grid (6 nodes * 4 disks each = 24 disks = 8 vLUNs * 3 disks each), providing optimum array throughput.


Testing tool:

Intel’s IOMeter version 2006.07.27

24 workers, each configured to target all 8 vLUNs – 32 outstanding I/Os

GS-009m

IO profile: 50% read/50% write, 10% random, 8k alignment:

GS-009n

Test duration: 10 minutes


Test result:

IOMeter showed upwards of 17.7k IOPS:

GS-009a

Disk performance details on the compute node:

GS-009b

CPU performance details on the compute node:

GS-009c

Network performance on one of the storage nodes:

GS-009d

Disk performance on one of the storage nodes:

GS-009e

CPU performance on one of the storage nodes:

GS-009f

Overall summary performance on one of the storage nodes:

GS-009g

CPU utilization on the storage nodes as shown from the GridControl snap-in:

GS-009h

Bytes received:

GS-009i

Final test result:

GS-009j

and test details.


Conclusion and important points to note:

  • Network utilization maxed out the single 10 Gbps NIC used on both the compute and storage nodes. This suggests that the array is likely to deliver more IOPS if more network bandwidth is available. The next test will use 2 teamed NICs on the compute node, plus 3 storage nodes with teamed 10 Gbps NICs.
  • CPU was maxed out on the storage nodes during the test. Storage nodes have 4 cores, so CPU may be a bottleneck there. It also leads me to believe that a) more processing power is needed on the storage nodes, and b) RDMA NICs are likely to enhance performance greatly. The Mellanox ConnectX-3 VPI dual-port PCIe x8 card may be just what the doctor ordered. In a perfect environment, I would couple that with the Mellanox InfiniBand MSX6036F-1BRR 56 Gbps switch.
  • Disk IO performance on the storage nodes during the test showed about 240 MB/s data transfer, or about 60 MB/s per disk in the node. This corresponds to the native IO performance of the SAS disks, suggesting a minimal/negligible boost from the 550 GB PCIe flash card in the storage node.
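The per-disk figure in the last point is simply the observed node throughput divided across the node's spindles:

```python
node_throughput_mb_s = 240  # observed data transfer per storage node
disks_per_node = 4          # each H-Node has 4x 3TB 7200 RPM SAS disks

per_disk = node_throughput_mb_s / disks_per_node
print(per_disk)  # 60.0 MB/s, in line with a 7200 RPM SAS disk's native rate
```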

 

 


Benchmarking Gridstore enterprise storage array (1)


Gridstore provides an alternative to traditional enterprise storage. Basic facts about Gridstore storage technology include:

  • It provides storage nodes implemented as 1 RU servers that function collectively as a single storage array.
  • Connectivity between the nodes and the storage consumers/compute nodes occurs over one or more 1 Gbps or 10 Gbps Ethernet connections.
  • NIC teaming can be set up on the Gridstore nodes to provide additional bandwidth and fault tolerance.
  • It utilizes a virtual controller to present storage to Windows servers.

The IO testing tool and its settings are detailed in this post.

vLUNs can be easily created using the GridControl snap-in. This testing is done with a Gridstore array composed of 6 H-nodes. Click node details to see more.

Prior to testing array disk IO, I tested the availability of bandwidth on the Force 10 switch used. I used the NTttcp version 5.28 tool. One of the array nodes was the receiver:

GS-002

 

The HV-LAB-01 compute node was the sender:

GS-003

I configured the tool to use 4 processor cores only since the Gridstore storage nodes had only 4 cores.

The result was usable bandwidth of 8.951 Gbps (1,118.9 MB/s). Testing was done using standard 1,500 MTU frames, not 9,000 MTU jumbo frames.

Test details:

On the receiver Gridstore storage node:
C:\Support>ntttcp.exe -r -m 4,*,10.5.19.30 -rb 2M -a 16 -t 120
Copyright Version 5.28
Network activity progressing…

Thread  Time(s)   Throughput(KB/s)  Avg B / Compl
======  =======   ================  =============
0       120.011   311727.158        60023.949
1       120.011   233765.293        53126.468
2       120.011   306670.676        56087.990
3       120.011   293592.705        52626.788

#####  Totals:  #####

Bytes(MEG)     realtime(s)  Avg Frame Size  Throughput(MB/s)
=============  ===========  ==============  ================
134280.569568  120.011      1457.709        1118.902

Throughput(Buffers/s)  Cycles/Byte  Buffers
=====================  ===========  ===========
17902.435              3.864        2148489.113

DPCs(count/s)  Pkts(num/DPC)  Intr(count/s)  Pkts(num/intr)
=============  =============  =============  ==============
17388.114      46.288         26563.098      30.300

Packets Sent  Packets Received  Retransmits  Errors  Avg. CPU %
============  ================  ===========  ======  ==========
4634562       96592255          599          0       62.960

On the sender compute node: HV-LAB-05
C:\Support>ntttcp.exe -s -m 4,*,10.5.19.30 -rb 2M -a 16 -t 120
Copyright Version 5.28
Network activity progressing…

Thread  Time(s)   Throughput(KB/s)  Avg B / Compl
======  =======   ================  =============
0       120.003   311702.607        65536.000
1       120.003   233765.889        65536.000
2       120.003   306669.667        65536.000
3       120.003   293592.660        65536.000

#####  Totals:  #####

Bytes(MEG)     realtime(s)  Avg Frame Size  Throughput(MB/s)
=============  ===========  ==============  ================
134268.687500  120.004      1457.441        1118.868

Throughput(Buffers/s)  Cycles/Byte  Buffers
=====================  ===========  ===========
17901.895              2.957        2148299.000

DPCs(count/s)  Pkts(num/DPC)  Intr(count/s)  Pkts(num/intr)
=============  =============  =============  ==============
25915.561      1.504          71032.291      0.549

Packets Sent  Packets Received  Retransmits  Errors  Avg. CPU %
============  ================  ===========  ======  ==========
96601489      4677580           22698        1       7.228
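The totals line can be cross-checked: total bytes transferred divided by elapsed time should reproduce the reported Throughput(MB/s):

```python
bytes_meg = 134280.569568  # Bytes(MEG) from the receiver's totals line
elapsed_s = 120.011        # realtime(s) from the same line

throughput = bytes_meg / elapsed_s
print(round(throughput, 3))  # ~1118.902, the value NTttcp reports
```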


Test 1:

Compute node(s): 1 physical machine with 2x Xeon E5-2430L CPUs at 2 GHz with 6 cores each (12 Logical processors) and 30 MB L3 cache, 96 GB RAM, 2x 10 Gbps NICs

GS-001

vLUN:

  • Protect Level: 0 (no fault tolerance, striped across 4 Gridstore nodes)
  • Optimized for: N/A
  • QoS: Platinum
  • Unmasked: to 1 server
  • File system: NTFS
  • Block size: 32 KB
  • Size: 2 TB (4 segments, 512 GB each)

GS-A05

Result:

Testing with 24 vCores, 10 Gbps NIC, 1 compute node, 32k block size, 50% read/50% write IO profile => 10.43k IOPS

GS-A01

 

In the above image you can see the read/write activity to the 4 nodes that make up this vLUN listed under Network Activity in the Resource Monitor/Network tab.

GS-A02

At the same time, the 4 nodes that make up this vLUN showed average CPU utilization around 40%. This dropped down to 0% right after the test.

GS-A03

The 4 nodes’ memory utilization averaged around 25% during the test. Its baseline is 20%.

GS-A04

 


Test 2: The same single compute node above

vLUN:

  • Protect Level: 1 (striped across 3 Gridstore nodes, fault tolerant to survive single node failure)
  • Optimized for: IOPS
  • QoS: Platinum
  • Unmasked: to 1 server
  • File system: NTFS
  • Block size: 32 KB
  • Size: 2 TB (3 segments, 1 TB each)

GS-B04

Result:

Testing with 24 vCores, 10 Gbps NIC, 1 compute node, 32k block size, 50% read/50% write IO profile => 11.32k IOPS

GS-B01

 

GS-B02

 

GS-B03

 


Test 3: The same single compute node above

vLUN:

  • Protect Level: 1 (striped across 5 Gridstore nodes, fault tolerant to survive single node failure)
  • Optimized for: Throughput
  • QoS: Platinum
  • Unmasked: to 1 server
  • File system: NTFS
  • Block size: 32 KB
  • Size: 2 TB (5 segments, 512 GB each)

GS-C01

Result:

Testing with 24 vCores, 10 Gbps NIC, 1 compute node, 32k block size, 50% read/50% write IO profile => 9.28k IOPS
GS-C02

GS-C03

 

GS-C04


Test 4: The same single compute node above

vLUN:

  • Protect Level: 2 (striped across 6 Gridstore nodes, fault tolerant to survive 2 simultaneous node failures)
  • Optimized for: Throughput
  • QoS: Platinum
  • Unmasked: to 1 server
  • File system: NTFS
  • Block size: 32 KB
  • Size: 2 TB (6 segments, 512 GB each)

GS-D01

Result:

Testing with 24 vCores, 10 Gbps NIC, 1 compute node, 32k block size, 50% read/50% write IO profile => 4.56k IOPS

GS-D02

GS-D03

GS-D04

 


Test 5: The same single compute node above

2 vLUNs:

1. The same Grid Protection Level 1 vLUN from test 2 above with the Platinum QoS setting, plus

2. An identical 2nd vLUN, except that QoS is set to Gold:

  • Protect Level: 1 (striped across 3 Gridstore nodes, fault tolerant to survive 1 node failure)
  • Optimized for: IOPS
  • QoS: Gold
  • Unmasked: to 1 server
  • File system: NTFS
  • Block size: 32 KB
  • Size: 2 TB (3 segments,  1 TB each)

GS-E01

Result:

Testing with 24 vCores, 10 Gbps NIC, 1 compute node, 32k block size, 50% read/50% write IO profile => 10.52k IOPS

GS-E02

GS-E03

GS-E04

 


 

Test 6: The same single compute node above

3 vLUNs:

All the same:

  • Protect Level: 1 (striped across 3 Gridstore nodes, fault tolerant to survive 1 node failure)
  • Optimized for: IOPS
  • QoS: Platinum
  • Unmasked: to 1 server
  • File system: NTFS
  • Block size: 32 KB
  • Size: 2 TB (3 segments,  1 TB each)

GS-F1

Result:

Testing with 24 vCores, 10 Gbps NIC, 1 compute node, 32k block size, 50% read/50% write IO profile => 9.94k IOPS

GS-F6

GS-F5

GS-F4

GS-F3

GS-F2


Summary:

GS-004