@ryanmcgary
Forked from allewun/remove-watermark.sh
Last active June 17, 2024 11:15
Remove a textual watermark from a PDF file.
zngguvnf's Blog
Processing multiple PDFs using the command line
:linux:
<2017-12-01>
Update [2018-04-17 Tue]
Remove password from pdf
Remove string from pdf
Update [2018-03-30 Fri]:
Added qpdf commands as an alternative to pdftk, which is no longer available for Fedora.
Quoted the Stack Exchange answer to 'How can I reduce the file size of a scanned PDF file?'
From time to time I need to process lots of .pdf files.
Here are a few command-line calls that help me a lot:
Split pdf into single pages
Split one multi-page .pdf into multiple single-page .pdf files.
pdftk PdfWithMultiplePages.pdf burst
qpdf --split-pages input.pdf output.pdf
Merge pages to single pdf
Merge multiple .pdf files with one or more pages each into one single .pdf.
To merge all .pdf files in the current directory into one single file:
pdftk ./*.pdf cat output PdfWithMultiplePages.pdf
Alternatively, you can type pdftk, select all the files you want to combine in your file manager, drag and drop them onto your terminal, and finish the command with cat output PdfWithMultiplePages.pdf
pdfjam ./*pdf -o PdfWithMultiplePages.pdf
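The update note at the top lists qpdf as an alternative to pdftk, and merging should work with qpdf as well. A minimal sketch, assuming qpdf is installed; merge_pdfs is a hypothetical helper name, while --empty and --pages are qpdf's own options:

```shell
# merge_pdfs OUTPUT.pdf INPUT.pdf [INPUT.pdf ...]
# Hypothetical helper: merges the inputs, in the order given, into OUTPUT.pdf.
merge_pdfs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_pdfs OUTPUT.pdf INPUT.pdf [INPUT.pdf ...]" >&2
        return 2
    fi
    merge_out=$1
    shift
    # --empty starts from an empty document; --pages ... -- lists the inputs.
    qpdf --empty --pages "$@" -- "$merge_out"
}

# Usage (not executed here):
#   merge_pdfs PdfWithMultiplePages.pdf ./*.pdf
```

The shell expands ./*.pdf in lexicographic order, so page-10.pdf sorts before page-2.pdf, and an output file left over from an earlier run in the same directory would be picked up as an input too.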
Convert from DIN A3 (landscape) to DIN A4 portrait
Sometimes a .pdf is in DIN A3 (landscape) and looks like two DIN A4 pages side by side.
Use the following command to split such documents:
mutool poster -y 2 input.pdf output.pdf
(-y sets the number of pieces each page is divided into vertically, -x horizontally; if the pages come out cut along the wrong axis, try -x 2 instead.) mutool is part of MuPDF (sudo apt install mupdf-tools).
Convert to DIN A4
pdfjam --outfile filename.pdf --paper a4paper filename.pdf
Batch processing
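To convert all .pdf files, including those in subfolders, to A4:
shopt -s globstar  # bash only: lets ** match subfolders (omit in zsh, where ** works by default)
for f in ./**/*.pdf ; do
    pdfjam --outfile "$f" --paper a4paper "$f"
done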
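The loop above writes each result straight over its input. A minimal sketch of a more cautious variant, assuming pdfjam is installed: it walks the tree with find (so it does not depend on ** support) and only replaces a file after pdfjam has succeeded. convert_tree_to_a4 is a hypothetical helper name.

```shell
# convert_tree_to_a4 [DIR]
# Hypothetical helper: converts every .pdf below DIR (default: .) to A4.
# The function body runs in a subshell so its variables and trap stay local.
convert_tree_to_a4() (
    root=${1:-.}
    tmpdir=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmpdir"' EXIT
    # Note: file names containing newlines are not handled by this loop.
    find "$root" -type f -name '*.pdf' | while IFS= read -r f; do
        # Write to a scratch file first; </dev/null keeps pdfjam from
        # reading the file list that the loop itself is consuming.
        if pdfjam --outfile "$tmpdir/a4.pdf" --paper a4paper "$f" </dev/null; then
            mv -- "$tmpdir/a4.pdf" "$f"
        else
            echo "skipped: $f" >&2
        fi
    done
)

# Usage (not executed here):
#   convert_tree_to_a4 ./scans
```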
Reduce file size of scanned PDF file
There is a Stack Exchange question about this with a fantastic answer, which I quote here for reference:
Use the following Ghostscript command:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
-dPDFSETTINGS=/screen selects the lowest quality and smallest size (images are downsampled to 72 dpi).
-dPDFSETTINGS=/ebook gives better quality (150 dpi) at a slightly larger size.
-dPDFSETTINGS=/printer selects output similar to the Acrobat Distiller "Print Optimized" setting (300 dpi).
-dPDFSETTINGS=/prepress selects output similar to the Acrobat Distiller "Prepress Optimized" setting (300 dpi).
-dPDFSETTINGS=/default selects output intended to be useful across a wide variety of uses, possibly at the expense of a larger output file.
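Since only the preset changes between runs, the Ghostscript call can be wrapped so that a mistyped preset is caught before gs starts. A minimal sketch, assuming gs is on the PATH; shrink_pdf is a hypothetical helper name and the gs options are the ones from the command above:

```shell
# shrink_pdf PRESET input.pdf output.pdf
# Hypothetical helper; PRESET is one of: screen ebook printer prepress default
shrink_pdf() {
    if [ "$#" -ne 3 ]; then
        echo "usage: shrink_pdf PRESET input.pdf output.pdf" >&2
        return 2
    fi
    case $1 in
        screen|ebook|printer|prepress|default) ;;
        *) echo "unknown preset: $1" >&2; return 2 ;;
    esac
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 "-dPDFSETTINGS=/$1" \
       -dNOPAUSE -dQUIET -dBATCH "-sOutputFile=$3" "$2"
}

# Usage (not executed here):
#   shrink_pdf ebook input.pdf output.pdf
```

Trying /ebook first and falling back to /screen only when the file is still too large is a reasonable order, because /screen degrades scanned text the most.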
Remove password from pdf
qpdf --password=YourTopSecretPassword --decrypt password-protected-file.pdf file-without-password.pdf
Remove string from pdf
(This works for text that you can select in the PDF.)
qpdf --stream-data=uncompress YourFile.pdf uncompressed.pdf
Replace 'Some Text' with whitespace
sed 's/Some Text/ /g' < uncompressed.pdf > uncompressed_without_string.pdf
To replace characters that are special in regular expressions (such as brackets), escape them as described in the sed manual. It can help to remove a long expression in several smaller steps, but check that each substitution only matches the places where you want text removed.
qpdf --stream-data=compress uncompressed_without_string.pdf YourFile_free.pdf
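The sed call above replaces the text with a single space, which changes the length of the file contents; the remove-watermark.sh script further down pads with blanks of the same length instead. A minimal sketch of that idea, which also escapes regular-expression characters in the search text. The helper names sed_escape, blanks_for and blank_out are hypothetical, and the sketch assumes the text appears literally in the uncompressed file:

```shell
# sed_escape TEXT: backslash-escape characters that are special in a
# basic regular expression, plus the / used as the s/// delimiter.
sed_escape() {
    printf '%s' "$1" | sed 's|[][\\/.*^$]|\\&|g'
}

# blanks_for TEXT: print as many spaces as TEXT has characters.
blanks_for() {
    printf "%${#1}s" ''
}

# blank_out TEXT: filter stdin, overwriting every literal TEXT with spaces.
blank_out() {
    sed "s/$(sed_escape "$1")/$(blanks_for "$1")/g"
}

# Usage (not executed here):
#   blank_out 'Some Text' < uncompressed.pdf > uncompressed_without_string.pdf
```

Counting the matches beforehand, for example with grep -c -a -F 'Some Text' uncompressed.pdf, shows how many lines would be touched before anything is overwritten.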
Creative Commons License
https://zngguvnf.org by zngguvnf is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
#!/bin/bash
# Remove a textual watermark from a PDF file. Requires:
# - qpdf (brew install qpdf)
# - pdftk (https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/pdftk_server-2.02-mac_osx-10.11-setup.pkg)
# - gsed (brew install gnu-sed)
# - coreutils (brew install coreutils)
#
# Tested on macOS 10.12 Sierra.
if [[ -z $3 ]]; then
    echo "Usage: ./remove-watermark.sh WATERMARK input.pdf output.pdf" >&2
    exit 1
fi
WATERMARK=$1
INBOUND=$2
OUTBOUND=$3
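# Temporary files are created in the current directory.
UNCOMPRESSED=$(mktemp 'uncompressed-XXXXXXXXXX.pdf')
FIXED=$(mktemp 'fixed-XXXXXXXXXX.pdf')
UNMARKED=$(mktemp 'unmarked-XXXXXXXXXX.pdf')
# Replace the watermark with the same number of blanks so the text keeps its length.
WATERMARKLEN=${#WATERMARK}
BLANKS=$(printf "%${WATERMARKLEN}s" '')
# Uncompress the streams so the watermark text is visible to sed.
qpdf --stream-data=uncompress "${INBOUND}" "$UNCOMPRESSED"
# WATERMARK is used as a sed regular expression; escape characters such as / . * [ ] first.
gsed -e 's/'"${WATERMARK}"'/'"${BLANKS}"'/g' < "$UNCOMPRESSED" > "$FIXED"
# Pass the edited file through pdftk to rewrite it, then recompress the streams.
pdftk "$FIXED" output "$UNMARKED"
qpdf --stream-data=compress "$UNMARKED" "${OUTBOUND}"
rm "$UNCOMPRESSED" "$FIXED" "$UNMARKED"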
# NO WARRANTY
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
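The script above creates its three temporary files in the current directory and removes them only when the final rm line is reached. A minimal sketch of an alternative, assuming a mktemp that supports -d: the intermediate files go into a scratch directory that is deleted on exit, even when a step fails. with_workdir is a hypothetical helper name, not part of the original script.

```shell
# with_workdir COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with a fresh scratch directory appended
# as its last argument, then removes that directory whatever the outcome.
# The function body runs in a subshell, so the EXIT trap fires when it ends.
with_workdir() (
    workdir=$(mktemp -d) || exit 1
    trap 'rm -rf "$workdir"' EXIT
    "$@" "$workdir"
)

# Example step (not executed here): $1 = input, $2 = output, $3 = scratch dir.
uncompress_copy() {
    qpdf --stream-data=uncompress "$1" "$3/uncompressed.pdf" &&
        cp "$3/uncompressed.pdf" "$2"
}
#   with_workdir uncompress_copy input.pdf inspect-me.pdf
```

The exit status of with_workdir is that of the wrapped command, so a failing qpdf or pdftk step can still be detected by the caller.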