Forum Discussion

Talvo's avatar
Talvo
Copper Contributor
Oct 25, 2025

Import data from a Microsoft Forms PDF into Excel

Hi all. I have a number of PDFs for clients which contain questionnaires, score sheets, etc. Some are completely external, and some are from Microsoft Forms.

A while back, I created a Macro/some VBA code to read data from the external PDFs, import it into Excel and display the figures; it works really well.

Today, I've tried to update it to load data from the PDFs created by "printing" a Microsoft Forms complete questionnaire and saving as a PDF - unfortunately, when trying to import this into Excel (using "Get Data -> From File -> PDF"), the "Likert" questions are appearing but showing "null" for all the columns, not identifying which contains the "selected" checkbox.

Possibly a slim hope, but I was wondering if anyone has experience doing something similar, or can recommend a way to get this to work?

I know there are potentially other ways to approach this (I tried connecting it directly to the linked Sharepoint Excel file, but the URL is entirely powered by URL arguments ?foo=bar&baz=boing etc, and Excel demands those all be deleted when giving it a filename to connect to to get data from the web, helpfully); connecting it to a "local" version of the Excel file on OneDrive is a possibility, but ideally I'd a) want the tool to be accessible to people who have access to the PDFs but not the spreadsheet, and b) have to save the print-out PDFs in every client folder to store their data together, and the spreadsheet may occasionally be purged, so the PDF is the "safe" way to access the data.)

PDFs can't be run through any online tools due to containing sensitive data.

Many thanks.

2 Replies

  • iamamarasmithh's avatar
    iamamarasmithh
    Copper Contributor

    A few approaches you can try: 1. Check PDF structure first: Open one of the PDFs in Adobe Acrobat (or any PDF editor) and try selecting the text or form elements. If you can’t select the checkbox text, it’s likely flattened. 2. Use OCR or text extraction as a workaround: If you must stick with the PDFs, try running OCR (Optical Character Recognition) using a local tool like Adobe Acrobat Pro, Systweak PDF Editor, or any other Editor. That will convert the flattened checkboxes and labels into text that Excel can potentially recognize. 3. Re-export from Forms with data intact: Instead of printing the Form response to PDF, export responses directly to Excel (via Microsoft Forms’ “Open in Excel” option). That preserves the structured data. 4. Alternative automation: If you want to automate it fully, you could set up a Power Automate flow to save each new response to both SharePoint (for backup) and as a CSV/PDF copy in your client folder. That way, you keep the data accessible and structured without depending on the “printed” PDF. Unfortunately, once a Microsoft Forms result is saved as a flat printout PDF, it loses all structured data, so no amount of VBA or Excel’s PDF connector can reliably pull that checkbox info unless OCR or manual parsing is applied.

  • How about this:

     

    1. Use OCR with positional logic
    If your printed PDFs are consistent in layout:
    •    Use Adobe Acrobat Pro or Tesseract OCR to extract text and checkbox symbols.
    •    Write a VBA macro or Python script to parse the OCR output and infer selections based on position or symbols (e.g., “☒” vs “☐”).
    •    This works best if the layout is fixed and you can map each Likert option to a column.
    2. Switch to fillable PDFs
    Instead of printing the Forms:
    •    Use Microsoft Word or Adobe Acrobat to create a fillable PDF version of the questionnaire.
    •    Users fill it out digitally, and you can extract checkbox states using Excel’s “Get Data → From PDF” or VBA.
    3. Use Microsoft Forms API or Excel sync
    If you control the original Forms:
    •    Responses are stored in a linked Excel file or SharePoint list.
    •    You can use Power Automate or Forms API to extract responses directly, bypassing the PDF entirely.

Resources