Document management with Raspberry Pi
Using a Raspberry Pi 3 (Raspbian) and an Avision MiCube Scanner (SD-Card Version)
Software used, which you should be somewhat familiar with:
Software Installation and Setup
apt-get install inotify-tools imagemagick ocrmypdf recoll python-recoll
adduser dms
mkdir -p /opt/scanner/incoming
mkdir /opt/scanner/raw
mkdir /opt/scanner/pdfs
mkdir /opt/scanner/ocr
chown -R dms /opt/scanner
udev rule detects scanner power on
Add the line
ACTION=="add", KERNEL=="sd*[0-9]", ATTRS{serial}=="SERIAL_OF_SCANNER", RUN+="/usr/bin/su dms /opt/scripts/scanner.sh"
in a new file in /etc/udev/rules.d/
(e.g. scanner.rules; udev ignores files that do not end in .rules)
Execute
sudo udevadm control --reload-rules
to reload the rules and activate the new one.
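SERIAL_OF_SCANNER has to be replaced with the serial that udev reports for the device. A minimal sketch of how that value could be looked up, assuming the scanner's card shows up as /dev/sda1 (the device name is an assumption, check lsblk or dmesg after switching the scanner on). extract_serial is a hypothetical helper, not part of udev:

```shell
# Hypothetical helper: print the first ATTRS{serial} value found in
# "udevadm info --attribute-walk" output read from stdin. The first
# match belongs to the parent device closest to the partition.
extract_serial() {
    sed -n 's/^ *ATTRS{serial}=="\(.*\)"$/\1/p' | head -n 1
}

# Assumed device node; pass another one as the first argument.
DEVICE="${1:-/dev/sda1}"

# Only query udev if the tool and the block device are present.
if command -v udevadm >/dev/null 2>&1 && [ -b "$DEVICE" ]; then
    udevadm info --attribute-walk --name="$DEVICE" | extract_serial
fi
```

The printed value goes into the ATTRS{serial} match of the rule.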
Script to mount the scanner's SD-Card ‘scanner.sh’
The scanner I use has an SD-Card slot, so I can mount the SD-Card and copy the files to a local folder. With other scanners you will have to use SANE, perhaps in combination with scanbd.
#! /bin/bash
logger "Scanner is online!"
sleep 0.25
# Mount the scanner as folder
mount /dev/disk/by-uuid/628F-0135 /opt/scanner/incoming
logger "Scanner mounted on /opt/scanner/incoming"
sleep 1
# Start processing newly scanned files
/opt/scripts/processor.sh &
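The fixed sleep before mounting assumes the device node is ready after a quarter of a second. A sketch of a more defensive variant that polls for the device instead; wait_for_path is a hypothetical helper and the default of 10 tries is an arbitrary choice:

```shell
# Hypothetical helper: wait until PATH exists, checking once per
# second, and give up after TRIES attempts (default 10).
# Returns 0 as soon as the path exists, 1 if it never appeared.
wait_for_path() {
    local path="$1"
    local tries="${2:-10}"
    local i=0
    while [ "$i" -lt "$tries" ]; do
        [ -e "$path" ] && return 0
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Possible use inside scanner.sh (UUID as in the script above):
# if wait_for_path /dev/disk/by-uuid/628F-0135 5; then
#     mount /dev/disk/by-uuid/628F-0135 /opt/scanner/incoming
# else
#     logger "Scanner device did not appear"
# fi
```

Keep the number of tries small: udev expects programs started via RUN to finish quickly.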
Process scanned files ‘processor.sh’
#! /bin/bash
# Temporary folder to separate files in one scan session
FOLDER=`mktemp -d`
# List relevant files and move to tmp folder
FILES=`ls -1 /opt/scanner/incoming`
mv /opt/scanner/incoming/* $FOLDER
logger "Process files $FILES"
# Create a unique filename
DATE=`date +%Y%m%d%H%M%S`
PDFNAME="$DATE.pdf"
logger "Save as $PDFNAME"
# Convert JPG(s) to PDF
convert "$FOLDER/*" /opt/scanner/pdfs/$PDFNAME
logger "PDF generation finished"
# Move raw files to keep them as "originals"
mkdir /opt/scanner/raw/$DATE
for f in $FILES
do
mv "$FOLDER/$f" "/opt/scanner/raw/$DATE"
done
rm -r $FOLDER
# OCR the new PDF
logger "Starting OCR"
ocrmypdf -l deu "/opt/scanner/pdfs/$PDFNAME" "/opt/scanner/ocr/$DATE.pdf"
logger "Finished Processing PDF"
# Build or update the index
logger "Update file index"
recollindex -c /opt/conf
logger "Index updated"
logger "Finished for $DATE.pdf"
In my setup, the destination of ocrmypdf is an NFS folder mounted in an ownCloud instance, so all files are backed up and accessible via my cloud.
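processor.sh builds its file list by parsing ls output, which breaks on file names containing spaces and runs the whole pipeline even when the card is empty. A sketch of an alternative that loops over a glob instead; collect_scans is a hypothetical function name, and the source and destination folders are passed in rather than hard-coded:

```shell
# Hypothetical helper: move every regular file from SRC into DEST.
# Returns 0 if at least one file was moved, 1 if SRC had no files,
# 2 if a move failed.
collect_scans() {
    local src="$1"
    local dest="$2"
    local moved=0
    local f
    for f in "$src"/*; do
        # An unmatched glob stays literal and fails this test,
        # so an empty folder simply moves nothing.
        if [ -f "$f" ]; then
            mv -- "$f" "$dest"/ || return 2
            moved=$((moved + 1))
        fi
    done
    [ "$moved" -gt 0 ]
}

# Possible use at the top of processor.sh:
# FOLDER=$(mktemp -d)
# collect_scans /opt/scanner/incoming "$FOLDER" || exit 0
```

With this in place, the later loop over $FILES could iterate over "$FOLDER"/* instead.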
Use recoll to index the PDF/A files
Create a config file for recoll, ‘recoll.conf’, in /opt/conf with the content
topdirs = /opt/scanner/ocr
I use the default config; for further config options take a look at /usr/share/recoll/examples.
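The -c option of recollindex names a configuration directory, and recoll.conf lives inside it. A sketch of creating that layout; write_recoll_conf is a hypothetical helper, and /opt/conf is the location assumed from the recollindex call in processor.sh:

```shell
# Hypothetical helper: create CONFDIR and write a minimal
# recoll.conf that indexes TOPDIR. Overwrites an existing file.
write_recoll_conf() {
    local confdir="$1"
    local topdir="$2"
    mkdir -p "$confdir"
    printf 'topdirs = %s\n' "$topdir" > "$confdir/recoll.conf"
}

# Possible use (paths as elsewhere in this article):
# write_recoll_conf /opt/conf /opt/scanner/ocr
# recollindex -c /opt/conf
```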
Setup and configure the recoll-webui
Get the current webui with curl and unpack it
curl https://codeload.github.com/koniu/recoll-webui/zip/master > master.zip
unzip master.zip
For a first test, run
recoll-webui-master/webui-standalone.py -a 0.0.0.0 -p 11080
and browse to ‘http://IP.OF.THE.PI:11080’.
If there is already a recoll index you can perform a search query.
To have the webui served by nginx or Apache, take a look at the documentation.