Convert PDF to text using OCR

Make sure ImageMagick and Tesseract are installed:

$ sudo apt install imagemagick tesseract-ocr
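
If you want to confirm that both tools are actually on your PATH before continuing, a small check along these lines may help. This is only a sketch: `require_tools` is a hypothetical helper name, not part of ImageMagick or Tesseract.

```shell
# Sketch: report any required command that is not on PATH.
# require_tools is a hypothetical helper name used for illustration.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # 0 when every tool was found, 1 otherwise
    return "$missing"
}
```

Running `require_tools convert tesseract` prints the name of each missing command to stderr and returns a non-zero status if any is absent.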

It's a two-step process:

Convert the PDF to .tiff using convert from ImageMagick:

$ convert -density 300 input.pdf -depth 8 output.tiff

Convert the .tiff to text using tesseract. The second argument is the output base name; tesseract appends .txt to it, so this generates out.txt:
```
$ tesseract output.tiff out
```
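
The two steps can also be wrapped in one shell function, roughly as sketched below. The `convert` and `tesseract` invocations use the same flags as the commands above; the function name `pdf_to_text`, the temporary directory, and the `pages.tiff` file name are assumptions made for illustration.

```shell
# Sketch: OCR a PDF into a text file in two steps.
# pdf_to_text is a hypothetical helper name, not part of either tool.
pdf_to_text() {
    local pdf="$1"            # input PDF, e.g. input.pdf
    local base="${2:-out}"    # output base name; tesseract appends .txt
    local tmpdir status
    tmpdir="$(mktemp -d)" || return 1

    # Step 1: render the PDF to a TIFF at 300 DPI, 8 bits per channel.
    # Step 2: run OCR on the intermediate TIFF.
    convert -density 300 "$pdf" -depth 8 "$tmpdir/pages.tiff" &&
        tesseract "$tmpdir/pages.tiff" "$base"
    status=$?

    rm -rf "$tmpdir"          # remove the intermediate image
    return "$status"
}
```

For example, `pdf_to_text input.pdf out` should leave the recognized text in out.txt and delete the intermediate TIFF, whether or not the conversion succeeds.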

Last updated 3 years ago
