Schedule multiple content source crawls: SharePoint 2013


 

The maximum Content Source boundary for a SharePoint 2013 Search service application is 500.

That is a lot of individual Content Source objects.

Still, if you are crawling file shares and want to show specific units as Content Sources in a refiner, your set of Content Sources can grow large quickly.

One shortcoming that has always bothered me is how limited crawl "scheduling" is.

Each schedule is independent and applies only to the content source it is set on.

This means you can easily end up with a complex mapping just to figure out what is crawling when.

Without unlimited resources, anyone with 20 million or more items to crawl and more than a handful of content source locations will likely run into some frustrations.

 

One way to keep your crawling better managed is a single schedule that takes the farm's entire SSA into account.

You can do this through PowerShell scripting and a Windows Task.
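
As a rough sketch of the Windows Task half, the task could be created with schtasks.exe. The task name, script path, service account, and 15-minute interval below are placeholders I made up for illustration, not values from a real farm, and the path is assumed to contain no spaces. The helper only builds the argument list so you can review it before anything is created.

```powershell
# Build the argument list for "schtasks.exe /Create".
# Every default value here is a hypothetical placeholder; substitute your own.
function Get-CrawlTaskArguments {
    param(
        [string]$TaskName   = "SSA Crawl Scheduler",
        [string]$ScriptPath = "C:\Scripts\Start-OldestCrawls.ps1",
        [int]$EveryMinutes  = 15,
        [string]$RunAs      = "CONTOSO\svc-spsearch"
    )

    # Command the task runs: Windows PowerShell executing the scheduling script.
    # A script path containing spaces would need additional quoting.
    $action = "powershell.exe -NoProfile -ExecutionPolicy Bypass -File $ScriptPath"

    return @(
        "/Create",
        "/TN", $TaskName,
        "/TR", $action,
        "/SC", "MINUTE",
        "/MO", "$EveryMinutes",
        "/RU", $RunAs,
        "/RP", "*"    # "*" makes schtasks prompt for the account password
    )
}

$taskArgs = Get-CrawlTaskArguments
$taskArgs -join " "            # review the command line first
# & schtasks.exe $taskArgs     # uncomment to actually create the task
```

The account the task runs as needs enough rights on the farm to start crawls. Running the task every few minutes lets the script top up free crawl slots as earlier crawls finish.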

 

In this example we will assume we don't want more than 10 crawls running simultaneously.

We'll exclude "Local SharePoint sites" from being crawled by this process. It has continuous crawl enabled and its own aggressive Full and Incremental crawl schedule.

We're also going to exclude a few other content sources from the schedule. Maybe they are extremely large or on very slow disk, so we want to decide separately when to crawl them.
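
One way to keep these exclusions manageable is a single list of wildcard patterns checked by a small helper. This is only a sketch: the function name is my own, and the patterns are the example name prefixes used in this post, so swap in whatever matches your content source names.

```powershell
# Return $true when a content source name matches any wildcard pattern
# in the exclusion list. The default patterns are illustrative only.
function Test-ExcludedSource {
    param(
        [string]$Name,
        [string[]]$Patterns = @("Local*", "Promo*", "Blue*")
    )
    foreach ($pattern in $Patterns) {
        # -like understands the * and ? wildcards; -eq and -ne do not.
        if ($Name -like $pattern) { return $true }
    }
    return $false
}

Test-ExcludedSource -Name "Promo Archive"        # True
Test-ExcludedSource -Name "Finance File Share"   # False
```

Keeping the patterns in one array means adding an exclusion later is a one-line change instead of another -and clause.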

 

Our limit of ten counts every content source that is not idle, including the excluded ones.

Yes, this code can be cleaned up, made into a function, etc.

The point is to provide a functional tool to review and use if desired.  

=== START ===

<#
Purpose:
Start Incremental Crawls on the content sources with the oldest completed crawl,
keeping at most $MaxNonIdle content sources in a non-idle state.
1. Count how many content sources are currently not idle.
2. If fewer than $MaxNonIdle, start Incremental Crawls on idle, non-excluded
   content sources, oldest CrawlCompleted first, until the limit is reached.
#>
$ErrorActionPreference = "Stop";

# Load the SharePoint snap-in if it is not already loaded
If ((Get-PSSnapIn -Name Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue) -eq $null )
{ Add-PSSnapIn -Name Microsoft.SharePoint.PowerShell }

$NameSSA = "Your Search Service Application Name";
$MaxNonIdle = 10;
$NonIdleCount = 0;
$sources = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $NameSSA;

# Count content sources that are crawling, pausing, stopping, etc.
ForEach ($source in $sources) {
    if ($source.CrawlStatus -ne "Idle") {
        $NonIdleCount++;
    }
}

if ($NonIdleCount -lt $MaxNonIdle) {
    # Oldest completed crawl first
    $sources = $sources | Sort-Object -Property CrawlCompleted;
    ForEach ($source in $sources) {
        # Wildcard patterns need -notlike; -ne would compare against the literal text "Promo*"
        if ($source.CrawlStatus -eq "Idle" -and $source.Name -notlike "Promo*" -and $source.Name -notlike "Blue*" -and $source.Name -notlike "Local*") {
            $source.StartIncrementalCrawl();
            $NonIdleCount++;
        }
        # Stop as soon as every crawl slot is in use
        if ($NonIdleCount -ge $MaxNonIdle) {
            break;
        }
    }
}

=== END ===
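
If you want to check the slot-counting logic before pointing anything at a real farm, here is a minimal dry-run sketch that works on plain objects instead of real content sources. The function name, the $Limit parameter, and the mock data are all made up for illustration; only the CrawlStatus and CrawlCompleted property names mirror what the script reads. It skips the name exclusions to stay short.

```powershell
# Pick which idle sources would be started, oldest CrawlCompleted first,
# without letting the number of non-idle sources exceed $Limit.
function Select-CrawlCandidates {
    param(
        [object[]]$Sources,
        [int]$Limit = 10
    )
    # Slots already taken by sources that are not idle
    $busy = @($Sources | Where-Object { $_.CrawlStatus -ne "Idle" }).Count
    $free = $Limit - $busy
    if ($free -le 0) { return @() }

    $Sources |
        Where-Object { $_.CrawlStatus -eq "Idle" } |
        Sort-Object -Property CrawlCompleted |
        Select-Object -First $free
}

# Mock data standing in for real content sources (names and dates are invented)
$mock = @(
    [pscustomobject]@{ Name = "A"; CrawlStatus = "Idle";                CrawlCompleted = [datetime]"2014-01-03" },
    [pscustomobject]@{ Name = "B"; CrawlStatus = "CrawlingIncremental"; CrawlCompleted = [datetime]"2014-01-04" },
    [pscustomobject]@{ Name = "C"; CrawlStatus = "Idle";                CrawlCompleted = [datetime]"2014-01-01" },
    [pscustomobject]@{ Name = "D"; CrawlStatus = "Idle";                CrawlCompleted = [datetime]"2014-01-02" }
)

# One slot is busy (B), so a limit of 3 leaves room for the two oldest idle sources
(Select-CrawlCandidates -Sources $mock -Limit 3).Name   # C, D
```

Against a real farm you would call StartIncrementalCrawl() on each returned source, but printing the names first is a cheap way to confirm the ordering.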
