How do I remove a watermark from a PDF on a Windows PC?
PyMuPDF brute-force removal is a method for programmers that can strip watermarks from PDFs in batches. It is relatively simple: you use PyMuPDF (a Python library) to modify the text and image elements of the PDF directly and delete the ones that make up the watermark. The approach is heavy-handed because it edits the PDF's underlying content, and it deletes whatever matches your criteria, including legitimate content if those criteria are too broad. For programmers, though, it is flexible and efficient, and it is especially well suited to processing many files at once.
Install PyMuPDF: First, install PyMuPDF in your Python environment with the following command:
pip install pymupdf
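Before running the script, you can confirm that the install worked. This is a small sketch that uses only the standard library's importlib.metadata; installed_version is a hypothetical helper name, not part of PyMuPDF.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist: str = "pymupdf") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import PyMuPDF itself.
    """
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("PyMuPDF is not installed; run: pip install pymupdf")
    else:
        print(f"PyMuPDF {version} is installed")
```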
Read the PDF file and iterate through each page: Open the PDF with PyMuPDF, then go through it page by page to find and remove the watermark. Here is a simple Python script that does this:
import fitz  # PyMuPDF is imported as "fitz" (recent versions also accept "import pymupdf")

# Assumption: the text watermark contains this string; change it to match your file
WATERMARK_TEXT = "watermark"

# Open the PDF file
doc = fitz.open("input.pdf")

# Loop through each page
for page in doc:
    # Delete image watermarks.
    # Caution: this deletes EVERY image on the page, not just the watermark.
    for img in page.get_images(full=True):
        xref = img[0]  # the first item of each entry is the image's xref
        page.delete_image(xref)  # needs a recent PyMuPDF release

    # Delete text watermarks: mark each occurrence of the text for redaction,
    # then apply the redactions, which removes that text from the page content.
    # Other text overlapping the same area may be removed too.
    for rect in page.search_for(WATERMARK_TEXT):
        page.add_redact_annot(rect)
    page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)  # keep remaining images intact

# Save the modified PDF under a new name so the original is not overwritten
doc.save("output.pdf")
doc.close()
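If you are not sure what text your watermark actually uses, it helps to look at the page's text blocks first. In PyMuPDF's page.get_text("dict") output, a text block (type 0) has no "text" key of its own; its text is nested under "lines", then "spans", and image blocks (type 1) carry no text at all. The sketch below assumes that structure; block_text and find_blocks_containing are hypothetical helper names, and the test data is made up rather than taken from a real PDF.

```python
from typing import Any, Dict, List


def block_text(block: Dict[str, Any]) -> str:
    """Join the text of one block from page.get_text("dict").

    Text blocks (type 0) keep their text in lines -> spans -> "text";
    any other block type returns an empty string.
    """
    if block.get("type") != 0:
        return ""
    lines = []
    for line in block.get("lines", []):
        # Concatenate the spans of a line, then separate lines with newlines
        lines.append("".join(span.get("text", "") for span in line.get("spans", [])))
    return "\n".join(lines)


def find_blocks_containing(blocks: List[Dict[str, Any]], needle: str) -> List[Dict[str, Any]]:
    """Return the text blocks whose joined text contains needle (case-insensitive)."""
    needle = needle.lower()
    return [b for b in blocks if needle in block_text(b).lower()]
```

With a real page you would pass page.get_text("dict")["blocks"] to find_blocks_containing and print each hit's "bbox" together with block_text(hit) to see the exact wording and position of the watermark.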
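The script goes through each page, looks for watermark elements, and removes them. It targets the two most common watermark types: image watermarks and text watermarks. Both criteria are deliberately crude: every image on the page is deleted, and any text matching the search string is removed, so adjust them to match your actual watermark (for example, limit the image deletion to the xref of the watermark image, or search for the watermark's exact wording).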
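If deleting every image is too aggressive, one way to narrow it down is to keep only the images that appear on every page, since a stamped watermark often reuses the same image xref throughout the document. This is only a heuristic, not a PyMuPDF feature: a logo in the page header would match too, and a watermark embedded as a separate copy on each page would not. images_on_every_page is a hypothetical helper that works on plain lists of xrefs, so it can be tried without a PDF.

```python
from typing import Dict, Iterable, Optional, Set


def images_on_every_page(page_images: Dict[int, Iterable[int]]) -> Set[int]:
    """Return the image xrefs that occur on every page.

    page_images maps a page number to the xrefs of the images on that page.
    With fewer than two pages the heuristic says nothing, so an empty set
    is returned instead of "every image on the only page".
    """
    if len(page_images) < 2:
        return set()
    common: Optional[Set[int]] = None
    for xrefs in page_images.values():
        current = set(xrefs)
        # Intersect with what has been seen on all previous pages
        common = current if common is None else common & current
    return common or set()
```

To use it, build the mapping with {page.number: [img[0] for img in page.get_images(full=True)] for page in doc}, then call page.delete_image(xref) only for the xrefs in the returned set.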
Save the modified PDF file: Finally, the script saves the result as a new file (output.pdf), so the original file is not overwritten.
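Since the main selling point of this method is batch processing, here is a sketch of the folder-level loop, assuming all input PDFs sit in one folder and the cleaned copies go to a separate one. plan_batch and run_batch are hypothetical names; the per-file work is passed in as a function, so the loop itself can be tested without PyMuPDF. Refusing to write into the input folder keeps the guarantee that no original file gets overwritten.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def plan_batch(input_dir: str, output_dir: str) -> List[Tuple[Path, Path]]:
    """Pair every .pdf file in input_dir with a same-named path in output_dir."""
    src_dir = Path(input_dir).resolve()
    dst_dir = Path(output_dir).resolve()
    if src_dir == dst_dir:
        # Writing into the input folder would overwrite the originals
        raise ValueError("output_dir must differ from input_dir")
    # Match the extension case-insensitively so "scan.PDF" is included too
    pdfs = sorted(p for p in src_dir.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")
    return [(p, dst_dir / p.name) for p in pdfs]


def run_batch(input_dir: str, output_dir: str,
              process: Callable[[Path, Path], None]) -> int:
    """Call process(src, dst) for each PDF and return how many files were handled."""
    pairs = plan_batch(input_dir, output_dir)
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in pairs:
        process(src, dst)
    return len(pairs)
```

In practice, process would open src with fitz.open(str(src)), run the page loop from the script, and save with doc.save(str(dst)); you would then call something like run_batch(r"C:\pdfs\in", r"C:\pdfs\out", process), where both folder paths are hypothetical.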