Tuesday, September 16, 2014

Convert or split scanned PDF pages from one into two

Many people have scanned books in which each PDF page holds two book pages. That is inconvenient to read on a tablet. Below are the steps to split each scanned page into two.
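  1. Extract the images from the PDF file
    pdfimages some.pdf book
    
    pdfimages is a better choice than
    convert some.pdf some.jpg
    because pdfimages saves the embedded scans at their original resolution, while convert re-renders each page (at a low default resolution unless you set -density).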
    
  2. Open the extracted images in an image viewer and note the size and offset of the area you want to crop for each book page.
  3. Then run the following bash script
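    #!/usr/bin/env bash
    page=0
    for i in *.pbm
    do
        page=$((page + 1))
        # crop geometry is WIDTHxHEIGHT+XOFFSET+YOFFSET; adjust to your scan
        # first book page
        convert "$i" -crop 1850x1200+30+30 "$page.jpg"
        page=$((page + 1))
        # second book page: same size, larger y offset
        convert "$i" -crop 1850x1200+30+1400 "$page.jpg"
    done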
    #
    # you may also want to adjust the page numbers, e.g.
    #for i in `seq 111 -2 1`
    #do
    #    let j=$i+2
    #    mv $i.jpg $j.jpg
    #done
    #
    # pad the jpg names to three digits so that *jpg lists them in page order
    # (this assumes the book has fewer than 1000 pages)
    for i in ?.jpg
    do
        mv "$i" "00$i"
    done
    
    for i in ??.jpg
    do
        mv "$i" "0$i"
    done
    
    convert *jpg tgt.pdf
    
The file tgt.pdf is the result, with one book page per PDF page.
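
For step 2, the crop geometries can also be derived from the image size instead of being read off by eye. The sketch below is only an illustration: half_geometry and padded_name are hypothetical helpers, not part of ImageMagick, and they assume the two book pages sit one above the other (as the y offsets in the script suggest) and split the scan evenly. The image size itself could come from something like identify -format '%w %h' book-000.pbm.

```bash
#!/usr/bin/env bash

# half_geometry WIDTH HEIGHT MARGIN INDEX
# Hypothetical helper: prints an ImageMagick crop geometry (WxH+X+Y) for the
# top (INDEX=0) or bottom (INDEX=1) half of a scan, trimming MARGIN pixels
# from every side of that half. Assumes the two pages split the scan evenly.
half_geometry() {
    local width=$1 height=$2 margin=$3 index=$4
    local half=$(( height / 2 ))
    local crop_w=$(( width - 2 * margin ))
    local crop_h=$(( half - 2 * margin ))
    local off_y=$(( index * half + margin ))
    printf '%sx%s+%s+%s\n' "$crop_w" "$crop_h" "$margin" "$off_y"
}

# padded_name PAGE
# Hypothetical helper: prints a three-digit, zero-padded name such as 007.jpg.
# PAGE must be a plain decimal number without leading zeros.
padded_name() {
    printf '%03d.jpg\n' "$1"
}

# Example calls for a 1910x2800 scan with a 30 pixel margin
half_geometry 1910 2800 30 0
half_geometry 1910 2800 30 1
padded_name 7
```

Inside the loop, the two convert lines could then pass "$(half_geometry "$w" "$h" 30 0)" and "$(half_geometry "$w" "$h" 30 1)" to -crop and write to "$(padded_name "$page")", which would make the renaming loops unnecessary.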
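Notes:
  1. pdfimages writes pbm files for monochrome images and ppm files for non-monochrome images, so change *pbm in the script to *ppm if pdfimages produced ppm files.
  2. If the scan is black and white, write png instead of jpg in the script to reduce the size of the pdf file.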
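
The two notes can be folded into the script by choosing the output format from the input extension. This is a minimal sketch, assuming the extracted files are named like book-000.pbm or book-000.ppm; output_ext is a hypothetical helper, and keeping jpg for ppm input simply follows the script above.

```bash
#!/usr/bin/env bash

# output_ext INPUT_FILE
# Hypothetical helper: prints png for monochrome input (.pbm) and jpg for
# anything else, following the two notes above.
output_ext() {
    case "$1" in
        *.pbm) echo png ;;
        *)     echo jpg ;;
    esac
}

# Example calls
output_ext book-000.pbm
output_ext book-001.ppm
```

In the loop, the output name would become "$page.$(output_ext "$i")", and the final step would need to collect the png or jpg files accordingly.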
