Convert PDF to text using OCR
Make sure ImageMagick and Tesseract are installed:
$ sudo apt install imagemagick tesseract-ocr
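Before running the steps, you can confirm that both tools are on your PATH. This is a minimal Bash sketch. The function name check_ocr_tools is hypothetical, and it assumes the packages above install the commands as convert and tesseract.

```shell
# Hypothetical helper: reports whether ImageMagick's `convert` and
# Tesseract's `tesseract` are on PATH. Returns 1 if either is missing.
check_ocr_tools() {
  local tool missing=0
  for tool in convert tesseract; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool ($(command -v "$tool"))"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_ocr_tools || sudo apt install imagemagick tesseract-ocr` installs the packages only when something is missing.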
It's a two-step process:
1. Convert the PDF to a .tiff using convert from ImageMagick:
$ convert -density 300 input.pdf -depth 8 output.tiff
2. Convert the .tiff to text using tesseract. Tesseract appends .txt to the output name, so this generates out.txt:
$ tesseract output.tiff out
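The two steps can be wrapped in a single shell function so the intermediate TIFF doesn't linger in the working directory. This is a sketch assuming Bash and the same settings as above. The name pdf_to_text is hypothetical, and the temporary directory is an addition, not part of the original steps.

```shell
# Hypothetical helper: pdf_to_text INPUT.pdf [OUTBASE]
# Runs both steps above; the text ends up in OUTBASE.txt (default: out.txt).
pdf_to_text() {
  local in="$1" outbase="${2:-out}" tmpdir status
  if [ ! -f "$in" ]; then
    echo "pdf_to_text: no such file: $in" >&2
    return 1
  fi
  # Keep the intermediate TIFF out of the working directory.
  tmpdir=$(mktemp -d) || return 1
  # Step 1: rasterize every page at 300 DPI with 8 bits per channel.
  # A multi-page PDF becomes a single multi-page TIFF.
  if convert -density 300 "$in" -depth 8 "$tmpdir/output.tiff"; then
    # Step 2: OCR the TIFF; tesseract appends ".txt" to the output base.
    tesseract "$tmpdir/output.tiff" "$outbase"
    status=$?
  else
    status=1
  fi
  # Clean up the intermediate file whether or not OCR succeeded.
  rm -rf "$tmpdir"
  return "$status"
}
```

After sourcing the function, `pdf_to_text input.pdf out` should leave the recognized text in out.txt, the same result as running the two commands by hand.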