SharePoint: How to export all the Publishing pages from a complete site collection to CSV Stats

Steel Contributor

When the Intranet site built in SharePoint Online is based on the Publishing site (not modern) with a complex structure:

  • Each department placed as subsite
  • Difference between the countries and region (organized in Treeview)
  • Difference between business and function
  • ...

We have at the end a large number of subsites with the news published in each.

 

The only way to present the result to the user into aggregated view is to pass threw the search engine and the associated queries.

 

The question of statistic will become hard to give the answers for the following question:

  • What is the department with the biggest news number?
  • What is the number of news published per month?
  • Who is the most active content manager?
  • ...

 

The only way to give that feedback is to export all the news (with couple of metadata) into a CSV and apply the Excel transformation (or PowerBI) to create simple presentation mode.

 

The following script will give you that kind of export, you only have to adapt the search query and fields to get from the Search to have your result.

 

[string]$SitePagesURL = "https://yourtenant.sharepoint.com"
[DateTime]$PortalPublishingDate = Get-Date
[string]$CSVFileName = "ExportAllNewsItems.csv"

[string]$queryText = "ContentTypeId:0x010100C568DB52D9D0A14D9B2FDCC96666E9F2007948130EC3DB064584E219954237AF3900242457EFB8B24247815D688C526CD44D00DAB155038B062847A409F1E450E9E5E3*  Path:https://yourtenant.sharepoint.com/intranet "
[string]$outputline = ""

[int]$TempUserID = 0

# ---------------------------------------------------------------------------------------------------------------
function Load-DLLandAssemblies
{
	[string]$defaultDLLPath = ""

	# Load assemblies to PowerShell session 
	$defaultDLLPath = "C:\Program Files\SharePoint Online Management Shell\Microsoft.Online.SharePoint.PowerShell\Microsoft.SharePoint.Client.dll"
	[System.Reflection.Assembly]::LoadFile($defaultDLLPath)

	$defaultDLLPath = "C:\Program Files\SharePoint Online Management Shell\Microsoft.Online.SharePoint.PowerShell\Microsoft.SharePoint.Client.Runtime.dll"
	[System.Reflection.Assembly]::LoadFile($defaultDLLPath)

	$defaultDLLPath = "C:\Program Files\SharePoint Online Management Shell\Microsoft.Online.SharePoint.PowerShell\Microsoft.Online.SharePoint.Client.Tenant.dll"
	[System.Reflection.Assembly]::LoadFile($defaultDLLPath)

	$defaultDLLPath = "C:\Windows\Microsoft.NET\assembly\GAC_MSIL\Microsoft.SharePoint.Client.Search\v4.0_16.0.0.0__71e9bce111e9429c\Microsoft.SharePoint.Client.Search.dll"
	[System.Reflection.Assembly]::LoadFile($defaultDLLPath)

}

function Get-SearchResults([int] $startIndex, $myclientContext)
{
	try
	{
		$keywordQuery = New-Object Microsoft.SharePoint.Client.Search.Query.KeywordQuery($MyctxTemp) 
        $keywordQuery.StartRow = $startIndex #gets or sets the first row of information from the search results
		$keywordQuery.QueryText = $queryText
		$keywordQuery.RowLimit = 500
		$keywordQuery.RowsPerPage = 500
		$keywordQuery.TrimDuplicates=$false  
		$keywordQuery.Timeout = 10000;
		$keywordQuery.SelectProperties.Add("LastModifiedTime")
		$keywordQuery.SelectProperties.Add("RefinableDate00")   # Date Article
		$keywordQuery.SelectProperties.Add("RefinableString00") # Business & Function
		$keywordQuery.SelectProperties.Add("RefinableString01") # News Type
		$keywordQuery.SelectProperties.Add("RefinableString03") # Publishing Area / Country
		$keywordQuery.SelectProperties.Add("CreatedBy")
		$keywordQuery.SortList.Add("RefinableDate00","Desc")

		$searchExecutor = New-Object Microsoft.SharePoint.Client.Search.Query.SearchExecutor($MyctxTemp)
		
		$ClientResult = $searchExecutor.ExecuteQuery($keywordQuery)
		$MyctxTemp.ExecuteQuery()
        #$MyctxTemp.ExecuteQueryWithIncrementalRetry(5, 30000); #5 retries, with a base delay of 30 secs.
		
		Write-Host "         - Item number into the function:", $ClientResult.Value[0].ResultRows.Count 
				
		return $ClientResult.Value[0]

	}
	Catch [Exception] {
		Write-host " >>>> ERROR MESSAGE:", $_.Exception.Message -f Red
		Return $False
	}        

}

# ---------------------------------------------------------------------------------------------------------------
Function Get-All-Intranet-News-Published-ExportCSV($MyctxTemp, $MyspoRootwebTemp)
{
	[System.Data.DataTable]$resultDataTable = new-object System.Data.DataTable("SGSWORLDNEWS")
	[System.Data.DataColumn]$titleCol = new-object System.Data.DataColumn("Title")
	[System.Data.DataColumn]$pathCol = new-object System.Data.DataColumn("Path")
	[System.Data.DataColumn]$RefinableDate00Col = new-object System.Data.DataColumn("RefinableDate00")
	[System.Data.DataColumn]$RefinableString00Col = new-object System.Data.DataColumn("RefinableString00")
	[System.Data.DataColumn]$RefinableString01Col = new-object System.Data.DataColumn("RefinableString01")
	[System.Data.DataColumn]$RefinableString03Col = new-object System.Data.DataColumn("RefinableString03")
	[System.Data.DataColumn]$CreatedByCol = new-object System.Data.DataColumn("CreatedBy")

	$resultDataTable.Columns.Add($titleCol)
	$resultDataTable.Columns.Add($pathCol)
	$resultDataTable.Columns.Add($RefinableDate00Col)
	$resultDataTable.Columns.Add($RefinableString00Col)
	$resultDataTable.Columns.Add($RefinableString01Col)
	$resultDataTable.Columns.Add($RefinableString03Col)
	$resultDataTable.Columns.Add($CreatedByCol)
	
	[int]$currentRowIndex = 0
	$resultTable = Get-SearchResults $currentRowIndex $MyctxTemp
	Write-Host "  >> Total Rows Include Duplicated:", 		$resultTable.TotalRowsIncludingDuplicates -ForegroundColor Red
	
	if(($resultTable -ne $null) -and ($resultTable.TotalRowsIncludingDuplicates -gt 0))
	{
		while ($resultTable.TotalRowsIncludingDuplicates -gt $resultDataTable.Rows.Count)
		{
			foreach($resultRow in $resultTable.ResultRows)
			{
				[System.Data.DataRow]$myrow = $resultDataTable.NewRow()
				$myrow["Title"] = $resultRow["Title"]
				$myrow["Path"] = $resultRow["Path"]
				$myrow["RefinableDate00"] = $resultRow["RefinableDate00"]
				$myrow["RefinableString00"] = $resultRow["RefinableString00"]
				$myrow["RefinableString01"] = $resultRow["RefinableString01"]
				$myrow["RefinableString03"] = $resultRow["RefinableString03"]
				$myrow["CreatedBy"] = $resultRow["CreatedBy"]
				$resultDataTable.Rows.Add($myrow)
			}
			
			$currentRowIndex = $resultDataTable.Rows.Count
			
			$resultTable = $null
			$resultTable = Get-SearchResults $currentRowIndex $MyctxTemp
			if (($resultTable -ne $null) -and ($resultTable.TotalRowsIncludingDuplicates -gt 0))
			{
				if ($resultTable.RowCount -le 0)
				{
					break
				}
			}
			else
			{
				break
			}
		}
	}
	[string] $totalResults = $resultDataTable.Rows.Count;
	Write-Host "     >>>>> Table Items placed into Datatable: ", $totalResults -ForegroundColor Yellow

	Clear-Content $CSVFileName
	$outputline = '"NewsTitle";"PublicationDate";"NewsURL";"BusinessFunction";"NewsType";"PublishingAreaCountry";"NewsCreator";'
	Add-Content -Encoding UTF8 -Force $CSVFileName $outputline   

	foreach($result in $resultDataTable.Rows)
	{
		if($result["RefinableDate00"] -ne "")
		{
			$TempString = $result["RefinableDate00"].split(';')[0]
			$PortalPublishingDate=[datetime]::ParseExact([string]$TempString, 'M/d/yyyy h:mm:ss tt', [CultureInfo]::InvariantCulture) #10/2/2018 10:00:00 PM
	 
			Write-Host "  ---------------------------------------- "
			Write-Host " ------>>> NewsPath:", $result["Path"]
			Write-Host " ------>>> Title:", $result["Title"] 
			Write-Host " ------>>> RefinableDate00:",  $result["RefinableDate00"] #$PortalPublishingDate
			Write-Host " ------>>> PublicationDate:",  $PortalPublishingDate
			Write-Host " ------>>> BusinessFunction:", $result["RefinableString00"]
			Write-Host " ------>>> NewsType:", $result["RefinableString01"]
			Write-Host " ------>>> PublishingAreaCountry:", $result["RefinableString03"]
			Write-Host " ------>>> NewsCreator:", $result["CreatedBy"]
			Write-Host "  ---------------------------------------- "

			#CSV file location, to store the result
			$outputline = '"'+ $result["Title"] +'";"'+ $PortalPublishingDate.ToString("dd.MM.yyyy") +'";"'+ $result["Path"] +'";"'+ $result["RefinableString00"] +'";"'+ $result["RefinableString01"] +'";"'+ $result["RefinableString03"] +'";"'+ $result["CreatedBy"] +'";'
			Add-Content -Encoding UTF8 -Force $CSVFileName $outputline
		}
	}
}

# ---------------------------------------------------------------------------------------------------------------
Load-DLLandAssemblies

#get and save your O365 credentials
[string]$username = "loginadmin@yourtenant.onmicrosoft.com"
[string]$PwdTXTPath = "C:\SECUREDPWD\ExportedPWD-$($username).txt"
$secureStringPwd = ConvertTo-SecureString -string (Get-Content $PwdTXTPath)
$cred = New-Object System.Management.Automation.PSCredential -ArgumentList $username, $secureStringPwd

#connect to the web site using the stored credentials
Write-host " "
Write-host " -------------------------------------------------------------------------------------------- " -ForegroundColor green
Write-host " ---- CONNECT THE SITE --- " -ForegroundColor green
Write-host "   CONNECTED SITE:", $SitePagesURL  -ForegroundColor Yellow

$Myctx = New-Object Microsoft.SharePoint.Client.ClientContext($SitePagesURL +"/intranet")

$Myctx.Credentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($cred.UserName,$cred.Password)
$Myctx.RequestTimeout = 1000000 # milliseconds
$MyspoRootweb = $Myctx.Web
$Myctx.Load($MyspoRootweb)
$Myctx.ExecuteQuery()

Write-Host " "
Write-Host " ---------------------------------------------------------"
Write-Host "  >>>> # Server Version:" $Myctx.ServerVersion " # <<<<<<" -ForegroundColor Green 
Write-Host " ---------------------------------------------------------"
Write-Host " "

Write-host " -------------------------------------------------------- "
Write-host "   -->> RootSite:", $MyspoRootweb.URL -ForegroundColor green

Write-host " "

Get-All-Intranet-News-Published-ExportCSV $Myctx $MyspoRootweb

 

The solution was used on my production site with thousands of news published and give us now statistics for the last years of that Intranet site.

 

Fabrice Romelard

 

French version of that message:

 

The source sites used to build that script are:

 

0 Replies