Good day! We have a SQL Server cluster, and quite often, about once every two weeks, the instance restarts. In the logs I see these errors:
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [J:\MountPoints\DB_Data_2\DB\Name_DB.mdf] in database id 23. The OS file handle is 0x0000000000002240. The offset of the latest long I/O is: 0x0000b26e132000
A component on the server did not respond in a timely fashion. This caused the cluster resource 'SQL Server (Name)' (resource type 'SQL Server', DLL 'sqsrvres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated with the resource are functioning correctly.
Cluster resource 'SQL Server (Name)' of type 'SQL Server' in clustered role 'SQL Server (Name)' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
It seems to me that this is caused by high load on the storage system. I am trying to follow the recommendations from the Microsoft website, but I would be glad of any help.
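To see whether the slow I/O warnings cluster on particular data files before each restart, the error log could be summarized with something like the Python sketch below. It is only a rough sketch: the function name `summarize_long_io` is made up for illustration, the pattern is based solely on the warning text quoted above (the wording may differ between SQL Server versions), and the second sample line is invented.

```python
import re
from collections import Counter

# Pattern built from the long-I/O warning exactly as quoted in this post.
# Assumption: other SQL Server versions may word the message differently.
LONG_IO = re.compile(
    r"encountered (?P<count>\d+) occurrence\(s\) of I/O requests taking longer "
    r"than (?P<seconds>\d+) seconds to complete on file \[(?P<file>[^\]]+)\] "
    r"in database id (?P<dbid>\d+)"
)


def summarize_long_io(lines):
    """Return the total number of long-I/O occurrences per data file."""
    totals = Counter()
    for line in lines:
        match = LONG_IO.search(line)
        if match:
            totals[match.group("file")] += int(match.group("count"))
    return totals


# Sample input: the first line is the warning from the post,
# the second is a made-up unrelated line that should be ignored.
sample = [
    "SQL Server has encountered 1 occurrence(s) of I/O requests taking longer "
    "than 15 seconds to complete on file "
    "[J:\\MountPoints\\DB_Data_2\\DB\\Name_DB.mdf] in database id 23.",
    "Some unrelated log line",
]

for path, total in summarize_long_io(sample).items():
    print(path, total)
```

If the same mount point shows up again and again shortly before each failover, that would support the storage theory; if the warnings are spread evenly or absent before some restarts, the cause may be elsewhere.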
I also tried the ResetWUEng.cmd script, which checked the system for integrity and damage; no problems were found.