We are currently running a three-tier environment: a Web server, an App server and a Data server. A third-party scheduling application installed on the App server runs jobs for a wide range of tasks. The problem: if the environment sits dormant for some period of time, the first job that tries to communicate with the Data server fails. If we rerun that same job shortly after the failure, it completes fine and we are off and running.

Everything in the environment runs Windows Server 2016, and the Data server runs SQL Server 2016 SP2. The App server hosts our in-house application; the Web server is not involved in the batch process at all. The issue ONLY seems to show up when we run our nightly batch cycle, again from the App server.

So that is what we are experiencing. Here is what we think the issue is:
We think we are hitting a UNC timeout, since when the App server needs to talk to the Data server it normally does so through a \\servername\share or \\servername\admin$share path. Unfortunately we can't reproduce the error on demand, and we don't know how long the environment has to sit idle before it happens. But it is consistent: if nothing is going on in the environment and we start the batch cycle, the first hit to the Data server fails, and if we rerun that failed job, it runs fine with no issue.

It's NOT the scheduling software, since the issue also happens when we run the same job from a command prompt: if it fails, running it a second time works just fine. Again, this only happens when the systems have been dormant. If someone is actively working between the App and Data servers and runs a job, there is no issue.
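One way to test the UNC-timeout theory, and possibly work around it, is to have the scheduler run a small "warm-up" step against the share right before the first real job of the nightly cycle. The sketch below is only an illustration under assumptions: the script name, the retry count and the delay are hypothetical, and it assumes the jobs reach the Data server through a path like \\servername\share. It lists the directory, prints how long each attempt took, and retries on failure. If the first attempt after a quiet period is slow or fails while the next one succeeds, that points at the share connection rather than at SQL Server or the job itself. Ideally run it under the same account the scheduler uses for the batch jobs.

```python
import os
import sys
import time


def warm_up_share(path: str, attempts: int = 3, delay_s: float = 5.0) -> bool:
    r"""Touch a share (e.g. \\servername\share) before the batch starts.

    Hypothetical helper: lists `path` up to `attempts` times, printing how
    long each try took, and returns True as soon as one listing succeeds.
    """
    for attempt in range(1, attempts + 1):
        start = time.monotonic()
        try:
            os.listdir(path)  # any OSError (timeout, path not found, ...) counts as a failure
        except OSError as exc:
            elapsed = time.monotonic() - start
            print(f"attempt {attempt}: failed after {elapsed:.2f}s: {exc}")
            if attempt < attempts:
                time.sleep(delay_s)  # give the connection a moment before retrying
        else:
            elapsed = time.monotonic() - start
            print(f"attempt {attempt}: ok in {elapsed:.2f}s")
            return True
    return False


if __name__ == "__main__":
    # Hypothetical usage from the scheduler: python warm_up_share.py \\servername\share
    sys.exit(0 if warm_up_share(sys.argv[1]) else 1)
```

Scheduling this as the first step of the nightly cycle, with the real jobs depending on it, also leaves a timing log from each run, which may help narrow down how long the environment has to be idle before the failure appears.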