Ingest/Index static and dynamic web pages

Search720 · Mar 10 2021

What is the recommended method to index/ingest both classic static HTML pages and pages whose content is rendered client-side with JavaScript? Is there a native web crawler/indexer for "dynamic" web page content?

DerekLegenzoff · Mar 11 2021

@Search720 there's no built-in indexer for crawling web pages, so customers often use an open-source crawler such as Apache Nutch to extract content from web pages. From there, you can land the content in a supported data source such as Blob storage, Cosmos DB, or ADLS Gen2 and index it from there. You can also push the data directly to the index via the Push API as described here.
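For the Push API route, a minimal sketch of what the upload could look like is below. It assumes the Azure Cognitive Search REST endpoint for adding documents (`/docs/index`, `api-version=2020-06-30`) and a hypothetical index with the fields `id`, `url`, `title`, and `content`; the service name, index name, and field names are placeholders, not anything from this thread. Because document keys only allow a restricted character set, the page URL is encoded as URL-safe Base64 to serve as the key. The function only builds the request so it can be inspected before anything is sent.

```python
import base64
import json
import urllib.request


def make_key(url: str) -> str:
    """Encode a page URL as URL-safe Base64 so it can be used as a document key."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")


def build_push_request(service, index, api_key, pages, api_version="2020-06-30"):
    """Build (but do not send) a POST request that uploads crawled pages.

    `pages` is a list of dicts with "url", "title", and "content" keys.
    The index field names used here are assumptions for illustration.
    """
    endpoint = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/index?api-version={api_version}"
    )
    batch = {
        "value": [
            {
                # mergeOrUpload updates an existing document or creates a new one
                "@search.action": "mergeOrUpload",
                "id": make_key(page["url"]),
                "url": page["url"],
                "title": page["title"],
                "content": page["content"],
            }
            for page in pages
        ]
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
```

To actually send the batch, pass the returned request to `urllib.request.urlopen` with a real service name and an admin API key. The crawler (Nutch, Norconex, or anything else) would be responsible for producing the `pages` list, including rendered text for JavaScript-driven pages.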

Search720 · Mar 11 2021

Thank you @DerekLegenzoff

stejacob · Apr 20 2021

@Search720 You can use the Norconex HTTP Collector for dynamic web pages.

https://opensource.norconex.com/collectors/http/

Cheers.

Search720 · Apr 20 2021

Thanks!
