nickstiv, I agree that a fixed DPI, color space, and compression setting may not be the optimal solution for some use cases. As you pointed out, some of this information can be detected from the PDF file content. An alternative solution is to add a text layer on top of the existing PDF: create a new PDF page containing the recognized text as "invisible text", then merge it with the existing page. This approach is slightly less generic than the rendering approach in this blog post. It may duplicate text inside the PDF if the original PDF already contained some digital text. It is also slightly less reliable, since merging PDF content sometimes causes issues, especially for rotated pages (depending on the method and packages used). But if your PDFs are not very complex and are entirely image-based, this approach lets you keep the original size of the PDF, plus a small delta for the text content.
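For anyone who wants to try the merge approach, here is a minimal sketch of the "invisible text" page, using only the Python standard library. It assumes you already have OCR word boxes in image pixels (the `Word` type below is hypothetical, not from any particular OCR engine) and that the original PDF page was produced from that image at a known DPI, so its page size equals the image size times 72/DPI. Each word is drawn with PDF text render mode 3 (`3 Tr`), which makes it invisible but still searchable and selectable. It is a simplification: it uses the built-in Helvetica font, supports Latin-1 text only, and does not stretch words to fill their boxes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical OCR result: one word and its box in image pixels (top-left origin)."""
    text: str
    left: float
    top: float
    height: float


def _escape(s: str) -> str:
    # Backslash and parentheses are special inside PDF literal strings.
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_stream(words: List[Word], image_height_px: float, dpi: float) -> bytes:
    """Content stream that draws every word in text render mode 3 (neither filled nor stroked)."""
    scale = 72.0 / dpi                 # image pixels -> PDF points (1/72 inch)
    page_h = image_height_px * scale
    ops = ["BT", "3 Tr"]               # 3 Tr = invisible, but still searchable/selectable
    for w in words:
        size = max(w.height * scale, 1.0)
        x = w.left * scale
        # PDF's origin is bottom-left, so flip y; the box bottom is used as the baseline.
        y = page_h - (w.top + w.height) * scale
        ops.append(f"/F1 {size:.2f} Tf 1 0 0 1 {x:.2f} {y:.2f} Tm ({_escape(w.text)}) Tj")
    ops.append("ET")
    # Assumption: Latin-1 text only; other characters are replaced with '?'.
    return "\n".join(ops).encode("latin-1", errors="replace")


def text_layer_pdf(words: List[Word], image_width_px: float,
                   image_height_px: float, dpi: float) -> bytes:
    """Write a one-page PDF holding only the invisible text layer."""
    scale = 72.0 / dpi
    w_pt, h_pt = image_width_px * scale, image_height_px * scale
    stream = invisible_text_stream(words, image_height_px, dpi)
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {w_pt:.2f} {h_pt:.2f}] "
         f"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>").encode(),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length " + str(len(stream)).encode() + b" >>\nstream\n" + stream + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for i, body in enumerate(objects, start=1):
        offsets.append(len(out))       # byte offset of each object, needed for the xref table
        out += f"{i} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"    # every xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)
```

For example, a US Letter page scanned at 300 DPI is 2550 x 3300 pixels, so `text_layer_pdf(words, 2550, 3300, 300)` yields a 612 x 792 pt page that lines up with the original. The actual merge is left to a PDF library; with pypdf, for instance, you could read both files and call `original_page.merge_page(layer_page)`. Note that this sketch ignores the page's `/Rotate` entry, which is exactly where merging tends to go wrong.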