Jason Kuter
Nov 01, 2016

Cloud Hybrid Search Bandwidth and Throttling

We are testing the waters with cloud hybrid search, and I currently have it indexing our on-prem SharePoint server: around 5M items and 4 TB of content. Our main index holds 12M items across the intranet. Hybrid search is using an enormous amount of bandwidth, sending 1–2 TB of content into Office 365 a week. We have two index servers with 2 partitions, and they can crawl ~200 items/s without breaking a sweat.
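For scale, the figures above imply roughly how long one full crawl pass takes. This is a rough sketch that assumes the ~200 items/s rate holds constant for the whole pass (real crawls rarely do). The 12M case is hypothetical: it shows what would happen if the whole intranet index were sent through hybrid as well.

```python
# Rough duration of one full crawl pass at a steady rate, using the figures
# from the post. Assumes the rate stays constant, which real crawls rarely do.
CRAWL_RATE = 200  # items per second, as observed across both index servers

def full_crawl_hours(items: int, items_per_second: float) -> float:
    """Hours needed to process every item once at a constant crawl rate."""
    return items / items_per_second / 3600

print(f"5M items (hybrid source):   {full_crawl_hours(5_000_000, CRAWL_RATE):.1f} h")
print(f"12M items (whole intranet): {full_crawl_hours(12_000_000, CRAWL_RATE):.1f} h")
```

That works out to about 6.9 h for the current 5M-item source and about 16.7 h for all 12M items.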

I cannot find any information on how much bandwidth it is supposed to use, how to slow it down or speed it up (we are going to use QoS eventually), or what the recommended approach is. The setup is well documented, but there is no guidance on ongoing usage. If you have any input I would love to hear it, because once I add audit logging and the rest of the content sources we will pretty much be saturating the internet connection just for hybrid functionality. Links and stories are welcome.

  • I'm not aware of any guidance, but I do cover this in the conference sessions I give on hybrid search, and I have built a bandwidth model (a spreadsheet) around it.

    You can use QoS and/or Crawler Impact Rules to slow things down. I definitely suggest putting some bandwidth limiter in place (such as separate VLANs or a QoS policy), because crawling will take all the bandwidth you give it. 200 docs/s is about what I'd expect if you are crawling a file system.

    1 TB/week is only ~1.6 MB/s on average, so presumably your crawls are completing and your link is not busy all the time. You are only transmitting indexable text, so depending on the content mix this could represent 10 TB of content CHANGING every week. If that doesn't seem to match what's happening in your environment, I would check that you are running incremental crawls and don't have full crawls scheduled all the time, or something like that.
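    The arithmetic above can be checked with a small model along these lines. This is a hypothetical sketch, not the spreadsheet mentioned in the reply. It assumes decimal units (1 TB = 10**12 bytes) and a made-up 10% ratio of indexable text to raw content, chosen only to match the "10 TB" example; the real ratio depends heavily on the content mix. The peak figure reuses the question's ~200 items/s and an average raw item size of 4 TB / 5M items.

```python
# Rough bandwidth arithmetic for hybrid crawl uploads.
# Assumptions (not from the thread): decimal units (1 TB = 10**12 bytes) and a
# hypothetical TEXT_FRACTION of indexable text per byte of raw content.
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800 s
TEXT_FRACTION = 0.10               # hypothetical; varies a lot with content mix

def weekly_to_average_mb_per_s(tb_per_week: float) -> float:
    """Average rate in MB/s for a weekly upload volume given in TB."""
    return tb_per_week * 10**12 / SECONDS_PER_WEEK / 10**6

def implied_changed_tb(text_tb_per_week: float, text_fraction: float) -> float:
    """Raw content that must change weekly to produce that much indexable text."""
    return text_tb_per_week / text_fraction

def peak_upload_mbit_per_s(items_per_second: float, avg_item_bytes: float,
                           text_fraction: float) -> float:
    """Upstream rate while the crawler runs flat out with no limiter in place."""
    return items_per_second * avg_item_bytes * text_fraction * 8 / 10**6

for tb in (1, 2):
    print(f"{tb} TB/week ~ {weekly_to_average_mb_per_s(tb):.2f} MB/s average, "
          f"~{implied_changed_tb(tb, TEXT_FRACTION):.0f} TB of changed content")

# 200 items/s and ~800 KB per raw item (4 TB / 5M items) come from the question.
print(f"peak ~ {peak_upload_mbit_per_s(200, 800_000, TEXT_FRACTION):.0f} Mbit/s")
```

    Under these assumptions, 1 TB/week averages about 1.65 MB/s, while an unthrottled crawl could briefly push on the order of 128 Mbit/s upstream. That gap is why an average-based estimate can look harmless even though the link saturates whenever a crawl is running.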
