I’ve recently been using paperless-ngx and thought I’d write down some thoughts on features I liked and could see in OpenAleph at some point. I won’t mention similarities or negatives, and there are certainly some areas I haven’t used (extensively) yet.
Paperless (originally Paperless, then paperless-ng; the currently active fork is called paperless-ngx) is a document management system with a strong focus on self-hosting. It is popular in private setups but is also used by small businesses. The main use case is to scan and digitize one’s paper documents, archive them long-term and make them searchable.
- The onboarding experience is great. The website gives a good overview with lots of screenshots, and there’s a live demo. The documentation is exhaustive, includes good guides on how to set it up in different scenarios and goes into detail for different use cases. Settings are explained very well. I like that there are several well-commented docker-compose files ready for different scenarios (SQLite, Postgres, Tika and combinations of these). Takeaway: a live demo for OpenAleph? Maybe also look into specific hosting scenarios and give better guidance for them.
- Not a feature per se, but it’s the best-disguised Django app I’ve seen in a long while. The admin interface, the DB migrations and the permission system kind of give it away, but it’s a good use of Django for the things Django is good at.
- There’s a status dialog in the admin section that gives a decent overview of the host system. I could see plenty of similar features helping OpenAleph self-hosters: the status of the databases, host metrics like disk and memory usage, and general metrics like entity counts (per type).
- The file tasks view is akin to Aleph’s status page, but down to the granularity of individual files. Files that fail during ingestion have their error message attached.
- There’s a neat-looking workflow feature, which I haven’t looked into yet. You define triggers (like “consumption started”) and then actions such as “Assign a property/tag” or “Call a webhook”.
- Under the hood it uses probably my favourite OCR project, OCRmyPDF (with extensive configuration options exposed for it), as well as (optionally) Gotenberg and Tika (with Python bindings) for converting documents and images to PDF. More of an FYI.
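A status overview like the one in Paperless’ admin section doesn’t need much to get started. The snippet below is a minimal Python sketch of what an OpenAleph equivalent could collect, not how Paperless does it: `collect_status` is a hypothetical helper, entities are assumed to be FollowTheMoney-style mappings with a `"schema"` key, and DB health and memory checks are left out because they depend heavily on the deployment.

```python
import shutil
from collections import Counter
from typing import Iterable, Mapping


def collect_status(data_dir: str, entities: Iterable[Mapping[str, str]]) -> dict:
    """Gather a small host/system overview (hypothetical helper).

    `data_dir` is wherever the instance stores its files; each entity is
    assumed to be a mapping with a "schema" key such as "Person" or "Company".
    """
    usage = shutil.disk_usage(data_dir)  # named tuple: total, used, free (bytes)
    counts = Counter(entity["schema"] for entity in entities)
    return {
        "disk": {
            "total_bytes": usage.total,
            "used_bytes": usage.used,
            "free_bytes": usage.free,
            "used_percent": round(100 * usage.used / usage.total, 1) if usage.total else 0.0,
        },
        "entities": {
            "total": sum(counts.values()),
            "per_schema": dict(counts),  # keeps first-seen order of schemata
        },
    }


if __name__ == "__main__":
    sample = [{"schema": "Person"}, {"schema": "Company"}, {"schema": "Person"}]
    print(collect_status(".", sample)["entities"])
    # {'total': 3, 'per_schema': {'Person': 2, 'Company': 1}}
```

In a real instance the entity counts would come from the search index rather than an in-memory iterable, but the shape of the report would stay the same.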
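The trigger/action model behind the workflow feature is easy to sketch. The Python below is a hypothetical, minimal version of what it could look like in OpenAleph, not how Paperless implements it: the event names, `Workflow`, `WorkflowEngine` and the two action factories are made up for illustration, and the webhook sender is injected instead of doing real HTTP.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical event names, loosely modelled on Paperless-ngx triggers.
CONSUMPTION_STARTED = "consumption_started"
DOCUMENT_ADDED = "document_added"

Document = Dict[str, Any]
Action = Callable[[Document], None]


def assign_tag(tag: str) -> Action:
    """Action factory: add `tag` to the document's tags (only once)."""
    def action(doc: Document) -> None:
        tags = doc.setdefault("tags", [])
        if tag not in tags:
            tags.append(tag)
    return action


def call_webhook(url: str, send: Callable[[str, bytes], None]) -> Action:
    """Action factory: serialize the document as JSON and pass it to `send`.

    `send` is injected (e.g. a thin wrapper around urllib.request) so the
    sketch can be exercised without network access.
    """
    def action(doc: Document) -> None:
        send(url, json.dumps(doc, sort_keys=True).encode("utf-8"))
    return action


def match_all(doc: Document) -> bool:
    """Default filter: every document matches."""
    return True


@dataclass
class Workflow:
    trigger: str                 # event this workflow listens for
    actions: List[Action]        # run in order when the workflow fires
    matches: Callable[[Document], bool] = match_all  # optional document filter


@dataclass
class WorkflowEngine:
    workflows: List[Workflow] = field(default_factory=list)

    def fire(self, event: str, doc: Document) -> int:
        """Run all workflows for `event` whose filter matches; return how many ran."""
        ran = 0
        for workflow in self.workflows:
            if workflow.trigger == event and workflow.matches(doc):
                for action in workflow.actions:
                    action(doc)
                ran += 1
        return ran


if __name__ == "__main__":
    engine = WorkflowEngine([
        Workflow(
            trigger=DOCUMENT_ADDED,
            actions=[
                assign_tag("invoice"),
                call_webhook("https://example.org/hook", lambda url, body: print(url, body)),
            ],
            matches=lambda doc: "invoice" in str(doc.get("title", "")).lower(),
        )
    ])
    doc = {"title": "Invoice 2024-03"}
    engine.fire(DOCUMENT_ADDED, doc)
    print(doc)  # {'title': 'Invoice 2024-03', 'tags': ['invoice']}
```

Keeping actions as plain callables means new ones can be added without touching the engine; a real implementation would probably also persist workflows and run them in a task queue.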
