FonsecaSergio, you mentioned that transient failures can last minutes. I'm guessing you are speaking in the context of inbound SQL DW. Otherwise, a connection problem lasting minutes seems extreme to me (unless a VM resource is being cold-started, or is hosted in China or Africa). There needs to be a reasonable threshold to distinguish between a bug and an acceptable transient failure.
The reason this caught my attention is that I have an open ticket about DNS failures in Spark pools. I don't know if I would call it a "transient" issue, because it is highly reproducible: DNS regularly becomes unavailable for thirty or sixty seconds at a time while a batch job is running. That seems like a lifetime for a protocol as fundamental and simple as DNS. I'm pretty sure there is a bug in the DNS resolver of the underlying Ubuntu VMs, but we haven't isolated it yet... In any case, depending on the definition of "transient", and the willingness of Synapse customers to write code as a workaround, this underlying DNS issue might never be fixed!
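
To make the threshold idea concrete, here is a rough sketch of the kind of workaround code a customer might end up writing. It is only an illustration, not anything Synapse provides: the function name `resolve_with_deadline`, the 10-second default threshold, and the 1-second retry delay are all assumptions picked for the example. It retries a DNS lookup while the outage is short, and raises once the outage lasts longer than the threshold, so that a long outage shows up as an error instead of being silently absorbed.

```python
import socket
import time


def resolve_with_deadline(host, max_outage_seconds=10.0, delay_seconds=1.0,
                          resolver=socket.gethostbyname,
                          clock=time.monotonic, sleep=time.sleep):
    """Retry a DNS lookup until it succeeds or the outage exceeds a threshold.

    All names and default values here are hypothetical. The resolver, clock,
    and sleep arguments exist only so the behavior can be tested without
    real network access or real waiting.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        try:
            return resolver(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            elapsed = clock() - start
            # Stop retrying if another wait would push us past the threshold:
            # beyond this point the failure is no longer treated as transient.
            if elapsed + delay_seconds > max_outage_seconds:
                raise TimeoutError(
                    f"{host} unresolved after {attempts} attempts over "
                    f"{elapsed:.1f}s; longer than the transient threshold"
                ) from exc
            sleep(delay_seconds)
```

With a threshold like this, a thirty or sixty second DNS outage would fail the job loudly rather than being retried away, which is the distinction between "acceptable transient failure" and "bug" that I am asking about.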