Error Opening PDF in Word

Hi Community,

 

I have been facing an issue for the past few months. I describe it below, along with the fixes I have tried without success.

 

My issue:

Let's say I open a new MS Word 2019 file, paste in images copied from Google (or any other search engine) or captured with the Snipping Tool, and then export the document to pdf.

I then delete the Word file, so only the pdf file remains on the computer.

 

To open the pdf file in Word, I follow these steps:

 

Select the pdf file > right-click > Open with > Choose another app > Word

 

My problem is that, after following this procedure, Word converts the images to editable text instead of opening them as images.

 

*However, this did not happen when I first started using MS Word 2019. For the past few months it has been a terrible problem for me.*

Screenshots: (https://drive.google.com/drive/folders/1xhzkC8KucqkYlrXEGVKPkntNbZpMW3ue?usp=sharing)

Screenshot captions: Word file with images saved; exported to pdf; saving as pdf; preview of saved pdf; opening pdf as Word using this method; selected pdf to open as Word; warning message; pdf opened in Word but weirdly.

Coming back to the problem. Methods I have tried to resolve my issue:

1. Uninstalled and reinstalled Microsoft 365 and Office.

2. Restored Word's default settings.

3. Checked the default app for the ".pdf" extension. It is set to Adobe Acrobat.

**IMPORTANT**

There are a few things I suspect that may help in understanding my problem:

It happens only with pdfs that contain only images, not with pdfs that contain text as well as images.

 

While trying different methods, one of which was to use third-party online pdf-to-Word converters, I found that they read these pdfs as scanned pdfs and ask to use OCR during conversion. I find this odd, because the images in the pdf were copied from search engines like Google or captured with the Snipping Tool. So why are they read as scanned pdfs?

I think that when the pdf is converted to Word on my computer, it is read as a scanned pdf, and MS Word 2019 applies OCR while opening it.

I recently installed "UiPath Studio" to learn Robotic Process Automation, and this problem arose around the same time.

That is my problem, and it is open for discussion.

I would love to hear from you all and hope to find a permanent solution to it.

PLEASE HELP ME!!!!!!!

6 Replies

@Sanskar112233 

First, any pdf opened in Word is always going to have formatting problems.

Second, this is because your images are primarily text.

Third, use the snipping tool (or better SnagIt) to take pictures of your pictures in the pdf and insert them into Word.

 

If you just open the pdf in Word without any third-party converter, what do you have?

Word can open and edit many pdf files without OCR because they may already have a text layer.
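
One rough way to check whether a pdf has a text layer is to look for font objects in the file. The Python sketch below is only a heuristic and rests on assumptions: the function name `classify_pdf_bytes` is made up for illustration, and it scans the raw bytes of the file, so it will miss anything stored inside compressed object streams. A pdf that contains image objects but no font objects has no text that a converter could read directly, which is the situation where converters ask for OCR.

```python
import re

# Font dictionaries are marked "/Type /Font"; embedded pictures are
# image XObjects marked "/Subtype /Image".
FONT_RE = re.compile(rb"/Type\s*/Font\b")
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def classify_pdf_bytes(data: bytes) -> str:
    """Heuristic guess at whether a pdf has a text layer.

    Hypothetical helper for illustration only. It reads uncompressed
    object dictionaries and nothing else, so treat the answer as a hint.
    """
    fonts = len(FONT_RE.findall(data))
    images = len(IMAGE_RE.findall(data))
    if fonts:
        return "has fonts (may have a text layer)"
    if images:
        return "image-only (no text layer found)"
    return "unknown (objects may be compressed)"


# Example call, assuming a file named "exported.pdf" exists:
# with open("exported.pdf", "rb") as handle:
#     print(classify_pdf_bytes(handle.read()))
```

An "image-only" result for a pdf exported from a Word document full of screenshots would be consistent with online converters treating it like a scan, since neither contains text objects.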

 

What follows is general advice/information I give to people having trouble with opening pdf files in Word. Most of it is applicable to your situation.

==============================================

How was the file created originally, and by which program? It could have been created from a scan or a picture taken by a phone camera. Those are pictures of words saved as pdfs. Just as you can have a picture of a car. You can see the car in the picture, but you can't change the timing of the engine in that picture. You can't change the order of text or otherwise edit it with a picture of text. Word can open such a file, but it can't edit it. You have a Word file that contains a picture of text rather than text.

 

In that case, you need to convert the picture to text. This is a process known as optical character recognition. This is built into Adobe Acrobat (but not the free Acrobat Reader) and is also in Office OneNote. Most scanner software comes with an OCR component as well.

How to OCR a PDF in OneNote

Once translated into text, it can be edited in Word but there will still be formatting anomalies.

 

If you simply want to write on the document (but not in it) you can add a Text Box floating on top of the document layer, whether or not it has been put through the OCR process.

 

Web pages or Word documents that have been saved as PDF will not need the OCR process, they retain their text, although not all their Word structure and formatting. Documents created as PDF from other programs will likely be even more problematic.

 

Finally, documents converted from pdf (or really any other format) to Word can be tough to edit because the conversion process never has a one-to-one matching of how formatting is done under the hood. This means that a converted document will seldom be formatted in Word in a way that uses Word features well for that formatting. An example is multiple section breaks to change margins, where in Word you would simply change the paragraph indent (see "Margins and Indents in Word"). Another example is that Word formatting of text is best done using Styles, and those will not be used; it will all be direct formatting. That can make a huge difference in how easy it is to edit (see "The Importance of Styles in Microsoft Word").

 

If possible, find the file from which the pdf was created and edit that file, using the program that created it. Then if you need it in Word format and it is not, convert it directly to Word. This will cut out one conversion process and make for fewer editing problems.

 

When I really need the document in Word format and intend to do much editing, I create a new Word file and paste the content into it as plain text. Then I format it to match the original using Styles for the formatting as much as possible. This takes time; for me, it is worth it and saves a lot of frustration.

@Charles_Kenyon 
Thanks for replying.

I know that when a pdf is opened in Word there are some changes in formatting (like indexing and margins).

But my concern is that images in the pdf should remain images when opened in Word.
I use the Snipping Tool to get the images.

I think MS Word is applying OCR automatically when I don't want it to.

I have been using Office 2019 for the last 5 years, and I could easily open pdfs in Word without any error.

I have also realised that I get this problem only when the pdf file has only images and no text.

Pdf files that have both text and images open easily in Word without OCR being used.

Could you help me, please?

Word does not apply OCR to files.
Period.
Use the snipping tool to get the images from the pdf.
Save the image to a folder on your computer.
Insert the images in the new document from the computer files.

@Charles_Kenyon 

 

Hi,

This trick worked, but it is only usable if I want to take one or two images from the pdf.

I am seeing this problem in a pdf file that contains only images (a lot of images) and no text.
It is converting all the images to editable text.

Also, when I used online pdf converters, I saw that they read it as a scanned pdf (in reality the images are screenshots I took with the Snipping Tool).

I think Word is producing a scanned pdf when it exports to pdf.

What do you think?


Sorry, I have no opinion. When possible, work with the original Word document. Do not export to pdf and then attempt to edit the pdf in Word. This is asking for problems.
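
If the original Word document is still available, the pictures can be pulled out of it in bulk without going through pdf at all. A .docx file is a ZIP package, and Word stores embedded pictures under `word/media/` inside it. The Python sketch below is a minimal illustration rather than a definitive tool; the function name `extract_docx_images` and the example paths are assumptions made up for this example.

```python
import zipfile
from pathlib import Path


def extract_docx_images(docx_path, out_dir):
    """Copy every embedded picture out of a .docx file (hypothetical helper).

    Returns the list of files written to out_dir.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Embedded pictures live under word/media/ in the package.
            if name.startswith("word/media/") and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(package.read(name))
                saved.append(target)
    return saved


# Example call, assuming "report.docx" exists in the current folder:
# for picture in extract_docx_images("report.docx", "extracted_images"):
#     print(picture)
```

The extracted files can then be inserted into a new document from the computer, the same way as snipped images, but without capturing them one at a time.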