`pdf2text` and `tesseract`. I believe `pdf2text` uses `tesseract`. I have them installed on an Apple Silicon Mac with Homebrew (e.g. `brew install tesseract` or `brew install pdf2text`).
You could probably also use some AI computer vision package like `opencv` (I haven't checked; I remember looking around before settling on `pdf2text`).
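A rough sketch of how Homebrew tools could be chained to turn a scanned PDF into a searchable one. This assumes poppler's `pdftoppm` (`brew install poppler`) and the `tesseract` CLI are on PATH. It is not a claim about what `pdf2text` does internally. The function names are made up for the example, and 300 dpi is just an assumed default. It rasterizes each page to PNG, hands tesseract a text file listing those images, and uses tesseract's `pdf` output config to get one multi-page PDF with an invisible text layer.

```python
"""Sketch: scanned PDF -> searchable PDF via pdftoppm (poppler) + tesseract.

Assumes both CLIs are on PATH (e.g. `brew install poppler tesseract`).
Function names are hypothetical, made up for this example.
"""
import shutil
import subprocess
import tempfile
from pathlib import Path


def render_cmd(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    # pdftoppm rasterizes every page to <prefix>-N.png; 300 dpi is an assumed default.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image_list: Path, out_base: Path, lang: str = "eng") -> list[str]:
    # tesseract accepts a text file listing images (one per line); the trailing
    # "pdf" config writes <out_base>.pdf with a searchable text layer.
    return ["tesseract", str(image_list), str(out_base), "-l", lang, "pdf"]


def ocr_pdf(pdf: Path, out_pdf: Path) -> Path:
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        subprocess.run(render_cmd(pdf, tmp_dir / "page"), check=True)
        # pdftoppm zero-pads page numbers to a common width, so a plain sort keeps page order.
        pages = sorted(tmp_dir.glob("page-*.png"))
        if not pages:
            raise RuntimeError("pdftoppm produced no page images")
        image_list = tmp_dir / "pages.txt"
        image_list.write_text("\n".join(str(p) for p in pages) + "\n")
        # tesseract appends ".pdf" itself, so pass the output path without its suffix.
        subprocess.run(ocr_cmd(image_list, out_pdf.with_suffix("")), check=True)
    return out_pdf.with_suffix(".pdf")
```

Usage would be something like `ocr_pdf(Path("slides.pdf"), Path("slides-searchable.pdf"))`. Going through page images first is what lets tesseract handle image-only scans. The catch is that each output page is the rasterized image plus the OCR text layer.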
When I used `pdf2text`, it was on PDF slides my prof provided; they ONLY gave PDFs, something about copyright and IP. Super interesting prof, great scientist, great researcher, actually a member of some cool orgs like the Linnaeus Society, and annoying with her lecture files.
EDIT: if anyone wants it enough, I can try a proof-of-concept on ~15 random pages of a random doc and see how well it goes.
Are there OCR tools that can convert the PDF to a searchable file?
If there are, I certainly don't know about 'em; that's stuff I could've used for my textbook epubs last semester.