Ingest/Index static and dynamic web pages

Question

What is the recommended method for indexing/ingesting both classic static HTML and web page content rendered client-side by JavaScript? Is there a native web crawler/indexer for "dynamic" web page content?

stejacob · Answer

Search720, you can use the Norconex HTTP Collector for dynamic web pages: https://opensource.norconex.com/collectors/http/ Cheers.

dereklegenzoff · Answer

Search720, there's no built-in indexer for crawling web pages, so customers often use an open-source crawler such as Apache Nutch to extract content from web pages. From there, you can land the content in a supported data source such as Blob storage, Cosmos DB, or ADLS Gen2 and index it. You can also push the data directly to the index via the Push API, as described here.
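A minimal sketch of the "extract, then push" path described above, assuming the target is an Azure Cognitive Search index (the service this thread appears to concern) with a hypothetical schema of `id`, `url`, `title`, and `content` fields. It uses only the Python standard library and does not execute JavaScript, so it only covers static HTML; a crawler such as Nutch or Norconex would do the actual fetching. The URL-safe base64 document key and the `api-version` default are assumptions, so check both against the current Push API documentation before relying on them.

```python
import base64
import json
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the <title> and visible text of a static HTML page.

    Text inside <script>, <style> and <noscript> is skipped. No JavaScript is
    executed, so client-side rendered content will not appear here."""

    _SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self._chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth or not data.strip():
            return
        if self._in_title:
            self.title += data.strip()
        else:
            self._chunks.append(data.strip())


def html_to_document(url, html):
    """Turn one crawled page into a document for a hypothetical index schema."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Assumption: encode the URL as URL-safe base64, because document keys
    # only allow a restricted character set (letters, digits, _, -, =).
    doc_id = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    return {"id": doc_id, "url": url, "title": parser.title,
            "content": " ".join(parser._chunks)}


def build_push_payload(documents):
    """Wrap documents in the {"value": [...]} batch shape, one upload action each."""
    return {"value": [{"@search.action": "upload", **doc} for doc in documents]}


def push_documents(service, index, api_key, documents, api_version="2020-06-30"):
    """POST a batch to the docs/index endpoint (api_version default is an assumption)."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/index?api-version={api_version}")
    body = json.dumps(build_push_payload(documents)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Batching the documents into one request, rather than posting them one by one, keeps the number of round trips low when a crawl produces many pages.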

search720 · Answer

Thank you, DerekLegenzoff!

search720 · Answer

Thanks!
