I'm looking for something that I can scan hand-written notes into and have OCR'd. Maybe one that I can even train on my handwriting. Ideally I end up with a searchable PDF of my notes.

People use OneNote for this, but I'm not really comfortable with letting Microsoft see my handwriting.

  • birdcat@lemmy.ml
    ·
    4 months ago

    To train an AI to recognize handwriting you need a huge dataset of handwriting examples: millions of samples of handwritten text, plus information about what the written text says in every example.

    Then how can this model be so good? The dataset is only 350 MB and the results seem insane... Sadly, I have no idea how to use it.

    • mindlight@lemm.ee
      ·
      edit-2
      4 months ago

      How good is "good", would you say?

      We got pretty good results, with a CER of 4% and a WER of 15%!

      This was on a limited dataset used for both training and testing. That most likely means that if you tested on a larger dataset with greater variation in handwriting style, the numbers would be even worse.

      Very simplified: roughly one wrong character in every 25 and one wrong word in every 7. The SER was around 20%.
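
      For anyone wondering how numbers like these are usually computed, here is a minimal sketch in Python. It assumes the common definition (edit distance between the recognized text and the reference, divided by the reference length), which may not be exactly what our evaluation did. The function names and the sample strings are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match if cost is 0)
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character-level edits per reference character."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate: word-level edits per reference word."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def one_in(rate):
    """Turn an error rate into 'about one error in every N units'."""
    return 1 / rate


# Hypothetical sample: one misread character in a four-word line.
reference = "the quick brown fox"
recognized = "the quick brawn fox"

print(f"CER: {cer(reference, recognized):.3f}")
print(f"WER: {wer(reference, recognized):.3f}")
print(f"CER 4%  -> one in {one_in(0.04):.0f} characters")
print(f"WER 15% -> one in {one_in(0.15):.0f} words")
```

      Note how a single wrong character already makes the whole word count as wrong, which is why WER is always so much higher than CER.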

      There's a reason why no one has released a good model for Western letters yet, and why companies pay up to 1€ for capturing data from 10 handwritten pages.

      It will come, but OCR isn't as sexy as developing text2image solutions.