Convert PDF to text using OCR
Make sure you have imagemagick and tesseract installed:
$ sudo apt install imagemagick tesseract-ocr

It's a 2 step process:
1. Convert the PDF to .tiff using convert from imagemagick:
$ convert -density 300 input.pdf -depth 8 output.tiff

2. Convert the .tiff to text using tesseract (this generates out.txt):
$ tesseract output.tiff out
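
The two steps can also be chained in a small shell function. This is only a sketch: the function name pdf_to_text and the choice to delete the intermediate .tiff are assumptions for illustration, not part of the original note. The convert and tesseract invocations use the same options as the commands shown in the note.

```shell
#!/bin/sh
# Sketch: run both steps for a single PDF.
# pdf_to_text is a hypothetical helper name, not a standard tool.
pdf_to_text() {
    pdf="$1"              # input PDF, e.g. input.pdf
    base="${pdf%.pdf}"    # strip the .pdf extension, e.g. input
    tiff="$base.tiff"     # intermediate image file

    # Step 1: render the PDF to a 300 DPI, 8-bit TIFF
    convert -density 300 "$pdf" -depth 8 "$tiff" || return 1

    # Step 2: OCR the TIFF; tesseract appends .txt to the output base name
    tesseract "$tiff" "$base" || return 1

    # Assumption: the intermediate TIFF is not needed afterwards
    rm -f "$tiff"
}
```

Calling pdf_to_text input.pdf should leave the recognized text in input.txt next to the original PDF. If either step fails, the function stops and returns a non-zero status, so the .tiff is kept for inspection.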