I spun up an instance of paperless-ngx on my Docker host a couple days ago, and just yesterday got my document scanner configured to send things to its Consume folder. So far I'm beyond impressed and I wish I'd learned about it much sooner! I run a FreeNAS server which has collected a lot of important documents in its 10 years of life... all of them arranged in folders as best I could. Fuck folders, tags are the way.

It was easier than I expected to get the container running and tell it to watch a folder on the FreeNAS share. So I have a decade of pseudo-organized archives to import? Click and drag the folder, and it's done. Amazing.

The automatic tagging seems OK so far. If I'm working on several documents of a similar provenance, it starts suggesting appropriate tags after I manually tag about 10 of them. I'll be interested to see how it does as I train it more.

I was never going to pay for a service like this, even though I really needed it. Finding out about paperless has been a revelation for me, haha. And on top of that it's the most "just works" of anything I've tried self-hosting so far. Easy to set up, and it seems feature-rich with a good UI. What's not to love?

Anyone else out there using paperless-ngx and have any tips or tricks to share? Things you wish you knew before?

https://github.com/paperless-ngx/paperless-ngx

  • btfod [he/him, comrade/them]
    3 months ago

    I'm away from a terminal rn, but there's an "Upload files" button on the Dashboard if memory serves, which presumably lets you send files thru the web interface. I haven't used it yet.

    Edit: Forgot to say, my original document archive, the Consume folder, and the Media folder all live on the same NAS share, which is mounted on my main PC. So to import all those documents I'm just literally cutting and pasting files from within my file manager. I'm only using the web interface to manage what's been consumed.

    I had the same reservations, but the way it seems to work out of the box is that nothing is deleted. Anything it can consume gets turned into a PDF, OCR'd, and indexed. Then the new PDF and the original file get moved to the media folder you specify. The original is always retained and can be downloaded to your client machine at any time alongside the new PDF.

    I have mine set to apply an inbox tag to anything newly consumed. I also saved a custom view which filters in any document with that tag. I'm using that View as my work queue so to speak. The inbox tag is removed after I give the file a once-over. Makes it easy to see which documents still need my attention.
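For anyone who'd rather script the cut-and-paste import described above than drag files around, here's a rough Python sketch. It assumes the archive and the Consume folder are both reachable as ordinary paths on the machine running it. The function name `copy_to_consume`, the extension list, and the example paths are all made up for illustration and are not part of paperless-ngx. It copies rather than moves, so the original archive stays untouched.

```python
import shutil
from pathlib import Path

# Hypothetical list of file types to import; adjust to what your instance accepts.
EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".txt"}


def copy_to_consume(archive_dir, consume_dir, extensions=EXTENSIONS):
    """Recursively copy matching files from archive_dir into consume_dir.

    Returns the list of destination paths that were written.
    """
    archive = Path(archive_dir).resolve()
    consume = Path(consume_dir).resolve()
    consume.mkdir(parents=True, exist_ok=True)

    copied = []
    # sorted() materializes the file list first, so files we copy are not re-scanned.
    for src in sorted(archive.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in extensions:
            continue
        # Skip anything already inside the consume folder (it may live under the archive).
        if consume in src.parents:
            continue
        dest = consume / src.name
        # The destination is flat, so avoid clobbering same-named files from different folders.
        n = 1
        while dest.exists():
            dest = consume / f"{src.stem}_{n}{src.suffix}"
            n += 1
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied


if __name__ == "__main__":
    # Example paths only; point these at your own share.
    for path in copy_to_consume("/mnt/nas/archive", "/mnt/nas/paperless/consume"):
        print(path)
```

Note the folder structure gets flattened on the way in, which fits the "tags, not folders" approach, but it's worth trying on a small subfolder first.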

    • Inui [comrade/them]
      3 months ago

      Thanks for sharing! That sounds perfect since it keeps the original file or else lets me manually upload and delete myself.

      • btfod [he/him, comrade/them]
        3 months ago

        You're welcome, happy to help.

        One other thing I should note: the default behavior is to rename your original files to match the ID number assigned by paperless. I'm not sure if this can be changed... I had a few reservations about this too, but I accepted it. I can let it do its thing, and all it'll cost me is being super careful about db backups.