StorSimple 8k series troubleshooting – lost access to DNS server
In this incident, a StorSimple 8100 Integrated Cloud Storage Array lost access to its local DNS server. As a result, it could not communicate with its Azure StorSimple Manager, where it appeared to be down and unmanageable. During this period, however, access to the iSCSI volumes presented by StorSimple to local file servers was not interrupted. The outage was limited to loss of array manageability.
Array monitoring is enabled by default, and it emails everyone who has admin rights on the Azure subscription under which the array’s StorSimple Manager is registered. Other recipients can be configured to receive alerts from the array as well. By default, the Azure StorSimple Manager expects a heartbeat from the array every 5 minutes. Loss of heartbeat results in an email similar to:
I like that the email is well formatted, easy to read, and tells you clearly what the problem is, which array is affected (if you’re managing several arrays), when it happened, and what it suggests you do about it.
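To make the heartbeat idea concrete, here is a minimal sketch of how a monitor might decide that a heartbeat has been missed. It only illustrates the general technique and is not how Azure StorSimple Manager is implemented. The 5-minute interval comes from the description above. The function name `heartbeat_missed` and the `grace` parameter are my own assumptions.

```python
from datetime import datetime, timedelta

# Interval taken from the post; everything else here is hypothetical.
HEARTBEAT_INTERVAL = timedelta(minutes=5)


def heartbeat_missed(last_seen, now, grace=timedelta(0)):
    """Return True if no heartbeat arrived within the interval plus an optional grace period."""
    return now - last_seen > HEARTBEAT_INTERVAL + grace


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 2, 0, 0)
    # Four minutes after the last heartbeat: still within the window.
    print(heartbeat_missed(last, datetime(2024, 1, 1, 2, 4, 0)))
    # Six minutes after the last heartbeat: an alert would be raised.
    print(heartbeat_missed(last, datetime(2024, 1, 1, 2, 6, 0)))
```

A grace period is a common way to avoid false alarms when a heartbeat is only slightly late. Whether the real service uses one is not something I know.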
I have the serial interface of the array connected to a serial port on a physical server to which I have remote access. This is a best practice that I recommend to StorSimple clients for exactly this sort of situation, where it is literally 2 AM, the array is down, and you need to console into it.
After connecting to the array serial interface, and running the
command, it showed that the local DNS server(s) were down. I then ran the
to change the IP address of the DNS server to a public DNS server like 220.127.116.11
The array displayed the message in the image above in red. This may suggest that the configuration change did not go through, but it did. In the Azure Management Interface, the array was back in a manageable state and I got the all-OK email.
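Before pointing the array at a different DNS server, it can help to confirm from another machine on the same network that the server actually answers queries. The sketch below does that with a hand-built DNS query over UDP, using only the Python standard library. It is meant to run on a management workstation, not on the array itself. The function names, the test hostname `example.com`, and the 2-second timeout are all assumptions of mine, not anything StorSimple provides.

```python
import socket
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query for an A record (wire format per RFC 1035)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack(">HH", 1, 1)


def dns_server_responds(server_ip, hostname="example.com", timeout=2.0):
    """Return True if server_ip sends back any DNS reply matching our query ID."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return False
    # A valid reply has at least a 12-byte header and echoes the query ID.
    return len(reply) >= 12 and reply[:2] == query[:2]
```

Calling `dns_server_responds("<dns-server-ip>")` with the address of the candidate server returns `True` only if a reply comes back within the timeout. A `False` result means the server did not answer this machine over UDP port 53; it does not say why.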
Alerts can also be viewed under your StorSimple Manager / Alerts link: