Forum Discussion

sjh122
Copper Contributor
Sep 26, 2024

Can Copilot extract and concatenate all paragraphs containing given keyword?

I want to upload ebooks and papers to Copilot and ask it to extract and concatenate every paragraph containing strings I give it, e.g. "Battery pack".

 

I haven't been able to make it do this.

 

Right now Copilot 1) misses most occurrences of simple keywords (it missed 57 of 60), 2) tries to summarise instead of extracting verbatim text, and 3) doesn't understand the limits of a paragraph: when it does find an occurrence correctly, it extracts most of the page.

 

Can anyone suggest how I can drive it better please?

 

Thanks.

2 Replies

  • You're absolutely right that Copilot currently struggles with exact keyword extraction, especially across long-form documents like ebooks or academic papers. Here's some background and suggestions on how to get more reliable results:

    1. Copilot is optimized for summarization, not verbatim extraction.
    Copilot (especially in apps like Word or Teams) is designed to understand and summarize content, not to extract it word-for-word. When you ask it to find all paragraphs with a keyword, it often rephrases, summarizes, or adds broader context, even if you explicitly ask for exact matches.

    2. Keyword detection isn't exhaustive.
    In your test (57 of 60 occurrences missed), Copilot likely didn't scan the full content, or its index didn't register every occurrence. This can happen if the document is long, is a scanned image, or has a layout that confuses parsing (such as multi-column formats or footnotes).

    3. Paragraph boundaries are loosely interpreted.
    Copilot doesn’t always recognize what you mean by a paragraph. It might treat an entire page or section as one block if the formatting isn't clearly separated.

    Suggestions to improve results:

    • Use Word with Copilot, not Teams. Upload your document to Word and ask Copilot inside Word directly. It has more context and formatting awareness there.
    • Preprocess documents. Before using Copilot, clean up the structure: ensure consistent formatting, clear paragraph breaks, and remove headers/footers that repeat on each page.
    • Use explicit instructions. Try prompts like:
      "List all paragraphs in this document that contain the exact phrase 'Battery pack', without summarizing or rephrasing. Return each paragraph as-is."
    • Break the document into smaller parts. If the file is too large, split it into chapters or sections. Copilot performs better on shorter, well-scoped inputs.
    • Consider using Microsoft Copilot Studio or Power Automate with AI Builder. These tools offer more precise control and may allow you to build a custom extraction flow based on keyword matching.
  • It can be done using Copilot commands, or batch processing can be carried out with the help of Power Automate.
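
For comparison, the keyword-matching step suggested above can also be done deterministically outside Copilot. The following is only a minimal Python sketch, not a Copilot feature: it assumes the ebook or paper has already been converted to plain text and that paragraphs are separated by blank lines. The function names (`split_paragraphs`, `paragraphs_containing`, `extract_and_concatenate`) and the sample text are made up for illustration.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split plain text into paragraphs on one or more blank lines (assumed convention)."""
    blocks = re.split(r"\n\s*\n", text)
    return [block.strip() for block in blocks if block.strip()]


def paragraphs_containing(text: str, phrase: str) -> list[str]:
    """Return every paragraph that contains `phrase`, unchanged.

    Matching is case-insensitive, and any run of whitespace in the text
    (including a line break inside a paragraph) counts as one space.
    """
    words = phrase.split()
    if not words:
        return []  # an empty phrase would otherwise match every paragraph
    pattern = re.compile(r"\s+".join(re.escape(w) for w in words), re.IGNORECASE)
    return [p for p in split_paragraphs(text) if pattern.search(p)]


def extract_and_concatenate(text: str, phrase: str) -> str:
    """Join all matching paragraphs, separated by a blank line."""
    return "\n\n".join(paragraphs_containing(text, phrase))


# Hypothetical sample text standing in for a converted document.
sample = (
    "The battery pack is mounted under the floor.\n"
    "It is cooled by a liquid loop.\n"
    "\n"
    "The motor drives the rear axle.\n"
    "\n"
    "Each battery\npack contains 96 cells.\n"
)

print(extract_and_concatenate(sample, "Battery pack"))
```

On the sample, this keeps the first and third paragraphs and skips the one about the motor. Paragraph detection is only as good as the blank-line assumption, so text extracted from PDFs or scanned pages may need cleaning up first, as suggested in the preprocessing tip above.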
