I'm looking for something that I can scan handwritten notes into and have them OCR'd. Maybe one that I can even train on my handwriting. Ideally I end up with a searchable PDF of my notes.

People use OneNote for this, but I'm not really comfortable letting Microsoft see my handwriting.

  • mindlight@lemm.ee
    ·
    2 days ago

    To train an AI to recognize handwriting, you need a huge dataset of handwriting examples (that is, millions of samples of handwritten text, each paired with information about what the written text says).

    This is why the best engines exist only as cloud services. The OCR engines you can install locally that are acceptable, though far from perfect, are commercial; Parascript FormXtra is one of the better ones.

    The only OCR engine that's both free and really good is Tesseract OCR, but it doesn't handle handwritten text.
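    For reference, producing the searchable PDF itself is the easy part: Tesseract's command-line tool can take a text file listing page images and, with its built-in `pdf` output config, write a single PDF with an invisible text layer over each scan. This is a minimal sketch, assuming `tesseract` is installed and on PATH with the `eng` language data; the function names and the `pages.txt` list file are made up for illustration. How well that text layer matches handwriting is exactly the limitation described above.

```python
import shutil
import subprocess
from pathlib import Path


def page_list_text(page_images):
    # One image path per line: tesseract accepts such a list file as its input.
    return "".join(f"{p}\n" for p in page_images)


def tesseract_pdf_command(list_file, output_base, lang="eng"):
    # The trailing `pdf` names tesseract's built-in config that writes
    # <output_base>.pdf with a searchable text layer over each page image.
    return ["tesseract", str(list_file), str(output_base), "-l", lang, "pdf"]


def make_searchable_pdf(page_images, output_base, list_file="pages.txt", lang="eng"):
    # Hypothetical helper: fail early if the tesseract binary is missing.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    Path(list_file).write_text(page_list_text(page_images))
    subprocess.run(tesseract_pdf_command(list_file, output_base, lang), check=True)
    return Path(f"{output_base}.pdf")
```

    Something like `make_searchable_pdf(["scan-001.png", "scan-002.png"], "notes")` would then write `notes.pdf` (filenames hypothetical).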

    • TrudeauCastroson [he/him]
      ·
      2 days ago

      I don't really need the locally trained AI to recognize general handwriting, only my own.

      I could provide a few pages of my own training data (maybe write out a few pages of "quick brown fox jumps over the lazy dog" and other text like that), and then ideally it flags anything it's unsure about so I can clarify it. Maybe it could also find garbled, nonsensical sentences, recognize that they're probably mistakes, and try to fix them.
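      The "flag what it's unsure about" step can be approximated with the per-word confidences Tesseract already reports. Running `tesseract page.png out tsv` writes a tab-separated `out.tsv` with one row per page, block, paragraph, line and word, including a `conf` column (0 to 100 for words, -1 for the structural rows). A minimal sketch, assuming that TSV layout and an arbitrary cutoff of 60 that you would tune on your own pages:

```python
import csv
import io


def flag_uncertain_words(tsv_text, threshold=60.0):
    """Return ((page, block, par, line), word, confidence) for low-confidence words.

    Assumes the column layout of tesseract's `tsv` output: level, page_num,
    block_num, par_num, line_num, word_num, left, top, width, height, conf, text.
    The default threshold is an arbitrary assumption, not a tesseract setting.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    flagged = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Page/block/paragraph/line rows carry conf == -1 and no text; skip them.
        if not text or conf < 0:
            continue
        if conf < threshold:
            line_key = tuple(
                int(row[k]) for k in ("page_num", "block_num", "par_num", "line_num")
            )
            flagged.append((line_key, text, conf))
    return flagged
```

      The flagged words could go into a review list; once corrected, those lines and their transcriptions are the kind of training data that fine-tuning needs.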

      I assumed the leaps in AI would have taken care of this by now, since recognizing handwritten letters from pen strokes on touch screens already existed in the '90s. But I guess recognizing a whole chunk of text from an image is a very different problem from feeding it the input stroke by stroke?
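      That is roughly the difference between "online" recognition (the input is a sequence of pen positions over time) and "offline" recognition (the input is just pixels). A toy illustration, not how any real recognizer works: rasterizing strokes throws away their order and direction, so two different ways of drawing a plus sign produce the identical image a scanner would see.

```python
def rasterize(strokes, width, height):
    """Draw pen strokes (lists of integer (x, y) points) onto a 0/1 bitmap.

    Only the ink that ends up on the page is kept; stroke order, pen direction
    and timing are discarded, much like a scanned page.
    """
    grid = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for x, y in stroke:
            grid[y][x] = 1
        # Connect consecutive points with evenly spaced samples along the segment.
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for i in range(steps + 1):
                x = round(x0 + (x1 - x0) * i / steps)
                y = round(y0 + (y1 - y0) * i / steps)
                grid[y][x] = 1
    return grid


# Two different ways of drawing "+": different stroke order and direction.
plus_a = [[(0, 2), (4, 2)], [(2, 0), (2, 4)]]
plus_b = [[(2, 4), (2, 0)], [(4, 2), (0, 2)]]
```

      An online recognizer gets to see that `plus_a` and `plus_b` were written differently; an OCR engine working from the bitmap cannot.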

      • mindlight@lemm.ee
        ·
        1 day ago

        It wasn't possible a year ago when I played around with Tesseract. Things might have changed in the last couple of months, though.

        • interdimensionalmeme@lemmy.ml
          ·
          1 day ago

          I found the following; it might be possible, and affordable:

          https://konfuzio.com/en/tesseract/

          https://github.com/Matleo/Tesseract_fine_tuning_training

          https://groups.google.com/g/tesseract-ocr/c/ZLOZpW1fD6I/m/B1Ponc0VBAAJ

          https://arcruz0.github.io/posts/finetuning-tess/
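          Fine-tuning setups like these may build on Tesseract's own `tesstrain` tooling, which (going by its README, as I understand it) expects ground truth as one image per line of text, `.tif` or `.png`, each paired with a same-named `.gt.txt` file holding a single-line transcription, in a folder like `data/MODEL_NAME-ground-truth`. I haven't checked whether every link above uses that exact layout. A small sketch that checks a folder of scanned note lines against that assumed convention before training; the function names are hypothetical:

```python
from pathlib import Path, PurePath

# Assumption: only plain .tif/.png line images; any other variants are ignored.
IMAGE_SUFFIXES = (".tif", ".png")


def pair_ground_truth(filenames):
    """Split a folder listing into (image, transcript) pairs and unpaired images."""
    names = set(filenames)
    pairs, missing = [], []
    for name in sorted(names):
        path = PurePath(name)
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # line-001.png -> line-001.gt.txt
        transcript = path.with_name(path.stem + ".gt.txt").as_posix()
        if transcript in names:
            pairs.append((name, transcript))
        else:
            missing.append(name)
    return pairs, missing


def is_single_line(text):
    """True if a transcription holds exactly one non-empty line of text."""
    body = text.rstrip("\n")
    return body.strip() != "" and "\n" not in body


def check_ground_truth_folder(folder):
    """Report pairs, unpaired images, and empty or multi-line transcriptions."""
    folder = Path(folder)
    pairs, missing = pair_ground_truth(p.name for p in folder.iterdir())
    bad = [
        transcript
        for _, transcript in pairs
        if not is_single_line((folder / transcript).read_text(encoding="utf-8"))
    ]
    return pairs, missing, bad
```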