About DocuClean

Our Mission

DocuClean was built with one simple belief: essential PDF tools should be free, fast, and private. Too many online PDF tools either charge monthly subscriptions for basic features, plaster your processed files with their own watermarks, or quietly store your uploaded documents on their servers. We wanted to offer something better.

DocuClean is a free online toolkit that handles the most common PDF tasks people face every day: removing unwanted watermarks, merging multiple files into one, splitting large documents into smaller sections, and compressing files for easy sharing. No account required, no usage limits, no tricks.

How DocuClean Works

When you upload a PDF to DocuClean, your file is sent securely over HTTPS to our processing server. The file is handled entirely in memory: it is never written to disk or stored in a database. Our processing engine (built with Python and PyMuPDF) performs the requested operation and returns the result to your browser, after which the original file is immediately discarded.

This architecture was chosen specifically for privacy. Unlike services that queue files for processing and store them temporarily on cloud storage, DocuClean's in-memory approach means your documents exist on our server only for the few seconds it takes to process them. Once you receive your result, the data is gone.

What We Offer

📝 Watermark Removal

Remove text watermarks like "DRAFT," "CONFIDENTIAL," and custom keywords. Includes margin trimming for header/footer stamps.

📎 Merge PDFs

Combine up to 20 PDF files into a single document. Drag to reorder before merging. Up to 30 MB total.

✂️ Split PDF

Extract specific pages or page ranges from any PDF. Flexible page selection with individual pages and ranges.

🗜️ Compress PDF

Reduce file sizes by up to 85% with three quality levels. Preview estimated size before processing.

Who Built This

DocuClean is an independent project built and maintained by a small development team passionate about making useful tools accessible to everyone. We believe that basic document management shouldn't require expensive software subscriptions, and we're committed to keeping DocuClean free and private.

The project is sustained through non-intrusive advertising via Google AdSense. This allows us to cover server costs and continue developing new features without charging users or compromising on privacy.

Contact Us

We'd love to hear from you, whether you have a feature request, have found a bug, or just want to say hello. You can reach us at:

Email: support@docuclean.app
Website: docuclean.app

We typically respond within 48 hours. For urgent issues, please include "URGENT" in your email subject line.

Technology Behind DocuClean

DocuClean is built on a modern, privacy-first technology stack designed for speed and reliability. Our backend runs on Python with FastAPI, whose asynchronous request handling lets the server work on multiple PDF operations concurrently without one request blocking another.
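As an illustration of concurrent, non-blocking request handling, here is a minimal sketch using only Python's standard asyncio module. It is not the production code: process_pdf, handle_request, and main are hypothetical names, and moving the CPU-bound work to a worker thread is an assumption about one common way to keep an async server responsive.

```python
import asyncio


def process_pdf(data: bytes) -> bytes:
    """Hypothetical stand-in for a CPU-bound PDF operation."""
    return data[::-1]


async def handle_request(data: bytes) -> bytes:
    # Run the blocking work in a worker thread so the event loop stays
    # free to accept other requests in the meantime (an assumed design).
    return await asyncio.to_thread(process_pdf, data)


async def main() -> list:
    uploads = [b"first", b"second", b"third"]
    # Handle all uploads concurrently; gather keeps results in input order.
    return await asyncio.gather(*(handle_request(u) for u in uploads))
```

Calling asyncio.run(main()) processes the three sample payloads concurrently and returns their results in the order they were submitted.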

The PDF processing engine uses PyMuPDF (fitz), one of the most capable open-source PDF libraries available. PyMuPDF provides direct access to the internal structure of PDF documents, including content streams, font dictionaries, image objects, and annotation layers. This low-level access allows DocuClean to surgically remove watermark elements without re-rendering or recompressing the entire document, preserving the original quality of text, images, and formatting.

For watermark detection, DocuClean scans each page's content stream for text rendering operations that match known watermark patterns. It identifies overlay elements by analyzing properties such as text rotation, opacity, font size, and position relative to the page dimensions. Custom keyword matching allows users to target specific text strings, while the margin trimming feature uses page coordinate geometry to crop content within defined boundaries.
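The property-based checks described above can be sketched roughly as follows. This is an illustrative example only: the TextSpan structure, the looks_like_watermark function, and every threshold in it are assumptions made for the sketch, not the rules DocuClean actually applies.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """Hypothetical summary of one text run found on a page."""
    text: str
    rotation_deg: float  # rotation of the text baseline
    opacity: float       # 1.0 means fully opaque
    font_size: float     # in points
    center_x: float      # span center, in page points
    center_y: float


def looks_like_watermark(span: TextSpan, page_width: float, page_height: float,
                         keywords=("DRAFT", "CONFIDENTIAL")) -> bool:
    """Flag a span on an exact keyword match or on two or more overlay signals."""
    if span.text.strip().upper() in {k.upper() for k in keywords}:
        return True

    signals = 0
    # Distance from the nearest horizontal or vertical orientation.
    off_axis = abs(span.rotation_deg) % 90
    if min(off_axis, 90 - off_axis) > 5:
        signals += 1  # diagonal text is unusual for body content
    if span.opacity < 0.6:
        signals += 1  # semi-transparent overlay
    if span.font_size >= 36:
        signals += 1  # much larger than typical body text
    near_center = (abs(span.center_x - page_width / 2) < 0.2 * page_width
                   and abs(span.center_y - page_height / 2) < 0.2 * page_height)
    if near_center:
        signals += 1  # stamped across the middle of the page
    return signals >= 2
```

Requiring at least two signals is a design choice for the sketch: a single property, such as a large heading font, should not be enough on its own to mark ordinary content for removal.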

The compression tool works by iterating through all embedded images in the PDF and re-encoding them at reduced resolution using configurable DPI targets. Text and vector elements are left untouched, ensuring that charts, diagrams, and typography remain sharp at any compression level. The merge and split tools manipulate the PDF's page tree directly, which is significantly faster and more reliable than re-rendering individual pages.
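To make the DPI-target idea concrete, the sketch below computes the pixel size an image could be re-encoded at. It is a simplified illustration under stated assumptions: the function name, the QUALITY_DPI mapping, and its three values are hypothetical, and the only fixed fact it relies on is that PDF page units are 72 points per inch by default.

```python
# Hypothetical quality levels; the real DPI targets are not documented here.
QUALITY_DPI = {"low": 72, "medium": 110, "high": 150}


def downsampled_size(px_width: int, px_height: int,
                     display_width_pt: float, display_height_pt: float,
                     target_dpi: float) -> tuple:
    """Return the pixel size to re-encode an image at, never upscaling."""
    # Effective resolution = stored pixels per inch of displayed size
    # (72 points per inch in default PDF user space).
    effective_dpi = max(px_width * 72 / display_width_pt,
                        px_height * 72 / display_height_pt)
    if effective_dpi <= target_dpi:
        return px_width, px_height  # already at or below the target
    scale = target_dpi / effective_dpi
    return max(1, round(px_width * scale)), max(1, round(px_height * scale))
```

For example, a 3000 x 2000 pixel photo placed in a 720 x 480 point box is stored at an effective 300 DPI, so a 150 DPI target halves each dimension, while an image already below the target is returned unchanged.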

The entire application is containerized with Docker and deployed on Fly.io, a platform that provides low-latency edge hosting. Files are processed in RAM using Python's BytesIO streams, meaning document data never touches persistent storage at any point during the operation.
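The in-memory pattern can be sketched with the standard library's io.BytesIO. This is a minimal illustration, not the production code: process_in_memory and passthrough are hypothetical names, and the operation argument stands in for whatever PDF routine would read the upload and write the result.

```python
import io


def process_in_memory(upload: bytes, operation) -> bytes:
    """Run `operation` on an upload using only in-memory buffers."""
    source = io.BytesIO(upload)  # the upload lives in RAM, not in a temp file
    result = io.BytesIO()
    try:
        # Hypothetical callable: reads from `source`, writes to `result`.
        operation(source, result)
        return result.getvalue()
    finally:
        # Closing the buffers discards their contents; nothing touched disk.
        source.close()
        result.close()


def passthrough(src, dst) -> None:
    """Placeholder operation that copies the input unchanged."""
    dst.write(src.read())
```

Because both buffers are closed in the finally clause, the document bytes are discarded as soon as the result has been handed back, even if the operation raises an error.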

What's Coming Next

DocuClean is actively developed and we have several features planned for upcoming releases. While we cannot commit to specific dates, here is a look at what we are working on.

🖼️ Image Watermark Detection

Extend watermark removal to detect and remove image-based watermarks and logo overlays, not just text-based ones.

📱 Enhanced Mobile Experience

Improved touch interactions, better file handling on mobile browsers, and optimized preview rendering for smaller screens.

🔤 OCR Text Recognition

Add optical character recognition to make scanned PDFs searchable and selectable, improving accessibility and usability.

Have a feature idea? We would love to hear it. Drop us an email at support@docuclean.app and let us know what would make DocuClean more useful for your workflow.