Convert PDF to text using OCR

Make sure ImageMagick and Tesseract are installed:

$ sudo apt install imagemagick tesseract-ocr
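
To confirm that both tools ended up on the PATH, a small check along these lines may help. This is only a sketch: missing_tools is a hypothetical helper name, not part of either package, and it assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given command
# that cannot be found on PATH. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Calling `missing_tools convert tesseract` should print nothing when both commands are available, and the name of each missing command otherwise.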

It's a two-step process:

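  1. Convert the PDF to a TIFF using convert from ImageMagick. The -density 300 option renders the pages at 300 DPI, and -depth 8 stores 8 bits per channel.

    $ convert -density 300 input.pdf -depth 8 output.tiff

  2. Convert the TIFF to text using tesseract. The second argument is the output base name; tesseract appends .txt, so this generates out.txt.

    $ tesseract output.tiff out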
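
The two steps above could be wrapped in a small shell function, roughly as sketched below. The names pdf_to_text and run, and the DRY_RUN switch, are assumptions made for illustration; the convert and tesseract invocations are the same ones shown above, except that the intermediate TIFF is named after the output base instead of output.tiff.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: pdf_to_text INPUT.pdf [BASE]
# Writes BASE.tiff as an intermediate file and BASE.txt as the result.
# BASE defaults to "out".
pdf_to_text() {
    input=${1:-}
    base=${2:-out}
    if [ -z "$input" ]; then
        echo "usage: pdf_to_text INPUT.pdf [BASE]" >&2
        return 1
    fi
    # Step 1: render the PDF to a TIFF at 300 DPI, 8 bits per channel.
    run convert -density 300 "$input" -depth 8 "$base.tiff" &&
        # Step 2: OCR the TIFF; tesseract appends .txt to the base name.
        run tesseract "$base.tiff" "$base"
}
```

For example, `pdf_to_text input.pdf out` would produce out.txt, and setting DRY_RUN=1 first only prints the two commands without running them.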
