Convert PDF to text using OCR
Make sure imagemagick and tesseract are installed:
$ sudo apt install imagemagick tesseract-ocr
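To confirm the install worked, a small check like the one below may help. This is only a sketch: the function name missing_tools is made up for illustration, and it assumes the ImageMagick package provides the convert command (newer ImageMagick 7 builds may ship magick instead).

```shell
# Hypothetical helper: print the name of each given command
# that cannot be found on PATH. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Running missing_tools convert tesseract should print nothing when both tools are available, and otherwise list the ones that are missing.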
It's a two-step process:
Convert the PDF to .tiff using convert from imagemagick:
$ convert -density 300 input.pdf -depth 8 output.tiff
Convert the .tiff to text using tesseract, which generates out.txt:
$ tesseract output.tiff out
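The two commands can be wrapped in a small shell function so one call handles a whole PDF. This is a rough sketch rather than a tested tool: the names pdf_basename and pdf_to_text are invented here, and the output files are assumed to go in the current directory, named after the input PDF.

```shell
# Hypothetical helper: strip the directory and the .pdf extension,
# e.g. "scans/report.pdf" becomes "report".
pdf_basename() {
    basename "$1" .pdf
}

# Hypothetical wrapper: pdf_to_text INPUT.pdf
# Writes NAME.tiff and NAME.txt in the current directory.
pdf_to_text() {
    pdf=$1
    name=$(pdf_basename "$pdf")
    # Step 1: render the PDF at 300 DPI with 8 bits per channel
    convert -density 300 "$pdf" -depth 8 "$name.tiff" || return 1
    # Step 2: OCR the TIFF; tesseract appends .txt to the output base name
    tesseract "$name.tiff" "$name" || return 1
}
```

For example, pdf_to_text input.pdf would be expected to leave input.tiff and input.txt next to each other. The function stops after step 1 if convert fails, so tesseract is never run on a missing file.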