SQL Server Failover Cluster resource/group stops itself due to lost connection to storage

afiqazmi · October 12, 2023, 1:16am

Hello,

I have a question about my environment, which is set up as follows:

MSSQL Cluster
Data, Log and Quorum on iSCSI storage (AWS FSx ONTAP)
File share witness (AWS FSx Windows)

Just out of curiosity, can an MSSQL Server cluster resource/group stop itself when it notices at the OS level that the SQL Server cluster has lost its connection to the storage?

I ask because the connection from the SQL Server to the iSCSI storage has intermittent issues while the backend undergoes maintenance/patching at the software/hardware level.

I read another forum post where someone reported that "SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete", which caused the cluster resource 'SQL Server (Name)' (resource type 'SQL Server', DLL 'sqsrvres.dll') to exceed its time-out threshold.

Is that forum case similar to losing the connection to the iSCSI storage, given that all data/log files live there and every read/write query the application runs against the database depends on it?

If that is the case, is there any timeout we can increase before the SQL cluster decides to fail over or stop the SQL services? The message I saw says that, based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.

If my question or information is not detailed enough, feel free to ask. I hope to hear your advice on this.

Thanks.