Virtual Research Environment
Secure data drop-off & routing software.
With this software you can safely upload private and sensitive data, similar to WeTransfer or Dropbox. Single or multiple files can be uploaded at once, through a web interface or through an API.
Installation
In order to install this data drop-off project, we need the following packages / software:
- Django
- TUS (tusd, the tus resumable upload server)
- NGINX
Django
We install Django with standard settings. It could be run asynchronously, but that requires some extra steps: https://docs.djangoproject.com/en/3.0/howto/deployment/asgi/. So for now, we keep it simple.
Install
Clone the code on /opt/deploy/data_drop-off
git clone https://git.web.rug.nl/VRE/data_drop-off.git /opt/deploy/data_drop-off
Then create a virtual environment
cd /opt/deploy/data_drop-off
python3 -m venv .
source bin/activate
Finally we install the required Python modules
pip install -r requirements.txt
This will install all the Python modules needed to run this Django project.
External libraries:
Production
https://gitlab.com/eeriksp/django-model-choices
https://github.com/georgemarshall/django-cryptography
https://github.com/jacobian/dj-database-url
https://github.com/ierror/django-js-reverse
https://github.com/henriquebastos/python-decouple
https://github.com/ezhov-evgeny/webdav-client-python-3
https://github.com/dblueai/giteapy
https://pypi.org/project/PyGithub/
Development
https://github.com/jazzband/django-debug-toolbar
Settings
The settings for Django are set in an .env file so that you can easily change the environment from production to testing. There is an .env.example file that can be used as a template.
# A uniquely secret key
SECRET_KEY=@wb=#(f4uc0l%e!5*eo+aoflnxb(@!l9!=c5w=4b+x$=!8&vy%a
# Disable debug in production
DEBUG=False
# Allowed hosts that Django serves. Take care when NGINX is proxying in front of Django
ALLOWED_HOSTS=127.0.0.1,localhost
# Enter the database url connection: https://github.com/jacobian/dj-database-url
DATABASE_URL=sqlite:////opt/deploy/data_drop-off/db.sqlite3
# Email settings
# Mail host
EMAIL_HOST=
# Email user name
EMAIL_HOST_USER=
# Email password
EMAIL_HOST_PASSWORD=
# Email server port number to use
EMAIL_PORT=25
# Does the email server support TLS?
EMAIL_USE_TLS=
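Loading such a file can be sketched as follows. This is a minimal, stdlib-only illustration of how .env files are read; the project itself uses python-decouple, which also handles type casting and defaults, and the function names parse_env and load_env here are assumptions for this sketch.

```python
def parse_env(lines):
    """Parse KEY=VALUE lines into a dict, skipping blanks and '#' comments.

    Simplified stand-in for what python-decouple does with an .env file;
    values are kept as strings, and only the first '=' splits key from value
    (so a SECRET_KEY containing '=' characters survives intact).
    """
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        settings[key.strip()] = value.strip()
    return settings


def load_env(path):
    """Read an .env file from disk and parse it."""
    with open(path) as fh:
        return parse_env(fh)
```

In the real project you would read settings via decouple's `config()` call instead; this sketch only shows what happens under the hood.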
Next we have to create the database structure. If you are using SQLite3 as a backend, make sure the database file exists on disk.
touch /opt/deploy/data_drop-off/db.sqlite3
Then in the Python virtual environment we run the following commands:
./manage.py migrate
./manage.py loaddata virtual_machine_initial_data
./manage.py createsuperuser
./manage.py collectstatic
And finally you should be able to start the Django application
./manage.py runserver
TUS
TUS (tusd) is a resumable upload server that speaks HTTP. It runs as a stand-alone server behind the NGINX server.
It is even possible to run a TUS instance at a different location (for example Amsterdam), as long as the TUS server is reachable by the NGINX frontend server and can post webhooks back to the frontend server.
Setup
It needs the encfs package, so install that first: sudo apt install encfs
The setup is quite simple and works the same way as Django, using a .env file. Start by creating a new settings file based on the example:
cp .env.example .env
# TUS Daemon settings
# Change the variable below to your needs. You can also add more variables that are used in the startup.sh script
WEBHOOK_URL="http://localhost:8000/datadrops/webhook/"
DROPOFF_API_HAWK_KEY="[ENTER_HAWK_KEY]"
DROPOFF_API_HAWK_SECRET="[ENTER_HAWK_SECRET]"
You need to create an API user in Django that is allowed to communicate between the TUS daemon and Django. This can be done by creating a new user in the Django admin. This will also generate a new token, which is needed. The token can be found on the API -> Tokens page.
The default webhook url is: /datadrops/webhook/
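The HAWK key/secret pair above is used to sign requests between the daemon and Django. Hawk is an HMAC-based request-signing scheme; the sketch below only illustrates the shared-secret idea with plain HMAC-SHA256, not the real Hawk wire format (a real client should use a Hawk library such as mohawk), and the header names and function names are assumptions.

```python
import hashlib
import hmac


def sign_payload(key_id: str, secret: str, payload: bytes) -> dict:
    """Build headers carrying an HMAC-SHA256 signature of the payload.

    Simplified illustration of shared-secret signing; real Hawk signs a
    normalized string that also includes method, URI and timestamp.
    """
    mac = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return {'X-Key-Id': key_id, 'X-Signature': mac}


def verify_payload(secret: str, payload: bytes, signature: str) -> bool:
    """Server side: recompute the MAC and compare in constant time."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Both sides share the same key/secret, so the server can verify that a webhook call really came from the daemon and that the payload was not altered in transit.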
Then you can start the upload server with the start.sh script: ./start.sh
This will start the TUS server on TCP port 1080.
Data storage
The uploaded data is stored in a folder that is configured in the TUS startup command. This should be a folder that is writable by the user running the TUS instance. Make sure the upload folder is not directly accessible by the webserver; otherwise files could be downloaded directly.
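Before starting the TUS instance it is worth checking that the configured upload directory exists and is writable by the daemon's user. A small stdlib sketch (the function name check_upload_dir is an assumption, not part of the project):

```python
import os


def check_upload_dir(path: str) -> list:
    """Return a list of problems with the upload directory (empty list = OK)."""
    problems = []
    if not os.path.isdir(path):
        problems.append('directory does not exist')
        return problems
    if not os.access(path, os.W_OK):
        problems.append('not writable by current user')
    return problems
```

Running such a check in the startup script gives an early, readable error instead of failed uploads later on.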
Hooks
TUS is capable of running hooks for uploaded files. There are two types: 'normal' hooks and webhooks. It is not possible to run both hook systems at the same time due to the blocking nature of the pre-create hook, so we use the 'normal' hook system, which runs custom scripts. Those scripts can then post the data to a webserver, giving webhook functionality on top of the 'normal' hooks.
At the moment, only an HTTP call is made in the hook system. There is no actual file movement yet.
For now we have used the following hooks:
- pre-create: This hook runs when a new upload starts. It triggers the Django server to store the upload in the database and to check whether the upload is allowed, based on a unique upload url and unique upload code.
- post-finish: This hook runs when an upload is finished. It updates the database/Django with the file size and the actual (unique) filename on disk.
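The two hooks above can be sketched as a small dispatcher over the JSON payload that tusd writes to the hook script's stdin. The payload shape used here ({"Upload": {"ID", "Size", "MetaData", ...}}) is an assumption based on tusd 1.x file hooks, and handle_hook is an illustrative name; check the payload of your tusd version.

```python
import json


def handle_hook(hook_name: str, raw: str):
    """Dispatch a tusd hook payload.

    Returns (exit_code, info); exit code 0 lets the upload continue,
    anything higher makes tusd reject or abort it.
    """
    try:
        payload = json.loads(raw)
    except ValueError:
        return 1, 'invalid JSON'
    upload = payload.get('Upload', {})
    if hook_name == 'pre-create':
        # Django would check the unique upload url/code here; this sketch
        # only checks that the client sent any metadata at all.
        if not upload.get('MetaData'):
            return 1, 'missing metadata'
        return 0, 'upload allowed'
    if hook_name == 'post-finish':
        # Django would store the final size and on-disk name here.
        return 0, {'id': upload.get('ID'), 'size': upload.get('Size')}
    return 1, 'unknown hook'
```

A real hook script would call this with the hook name it is installed as and `sys.stdin.read()`, then pass the exit code to `sys.exit()`.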
Below is an example of a hook as used in this project. The only change that should be made is:
- WEBHOOK_URL: the full url to the Django webhook
Do not change HTTP_HOOK_NAME, as that will cause errors with Django.
#!/usr/bin/env python
import sys
import json

import requests

# Tus webhook name
HTTP_HOOK_NAME = 'pre-create'
# Django webserver with hook url path
WEBHOOK_URL = 'http://localhost:8000/webhook/'

# Read stdin input data from the TUS daemon
data = ''.join(sys.stdin.readlines())

# Test if data is valid JSON... just to be sure...
try:
    json.loads(data)
except Exception as ex:
    print(ex)
    # An exit code higher than 0 stops the upload process on the Tus server
    sys.exit(1)

# We know for sure that the JSON input data is 'valid', so we post it to the webhook for further checking
try:
    # Create a webhook POST request with the needed headers and data. The data is the RAW data from the input.
    webhook = requests.post(WEBHOOK_URL, headers={'HOOK-NAME': HTTP_HOOK_NAME}, data=data)
    # If the POST is ok and we get a 200 status back, the upload can continue
    if webhook.status_code == requests.codes.ok:
        # This makes the Tus server continue the upload
        sys.exit(0)
except requests.exceptions.RequestException as ex:
    # Webhook post failed
    print(ex)

# We had some errors, so the upload has to be stopped
sys.exit(1)
This hook uses the same data payload as TUS would use with the webhook system, so using 'normal' hooks or webhooks with Django should both work out of the box.
NGINX
Install NGINX with LUA support through the package manager. For Ubuntu this would be
apt install nginx libnginx-mod-http-lua
Also configure SSL to make the connections secure. This is outside this installation scope.
LUA
LUA is used in NGINX so we can handle some dynamic data on the server side. All LUA code should be placed in the folder /etc/nginx/lua.
Setup
After installation of the packages, create a symbolic link in /etc/nginx/sites-enabled so that a new VHost is created.
Important parts of the VHost configuration:
lua_package_path "/etc/nginx/lua/?.lua;;";
server {
listen 80 default_server;
listen [::]:80 default_server;
# SSL configuration
#
# listen 443 ssl default_server;
# listen [::]:443 ssl default_server;
#
# Note: You should disable gzip for SSL traffic.
# See: https://bugs.debian.org/773332
#
# Read up on ssl_ciphers to ensure a secure configuration.
# See: https://bugs.debian.org/765782
#
# Self signed certs generated by the ssl-cert package
# Don't use them in a production server!
#
# include snippets/snakeoil.conf;
root /var/www/html;
# Add index.php to the list if you are using PHP
index index.html;
server_name localhost;
# This location is hit when the Tus upload is starting and providing meta data for the upload.
# The actual upload is done with the /files location below
location ~ /files/([0-9a-f]+\-[0-9a-f]+\-[1-5][0-9a-f]+\-[89ab][0-9a-f]+\-[0-9a-f]+)?/ {
set $project_id $1; # Here we capture the UUIDv4 value to use in the Tus metadata manipulation
set $tusmetadata '';
# Here we manipulate the metadata from the TUS upload server.
# Now we are able to store some extra meta data based on the upload url.
access_by_lua_block {
local dropoff_tus = require('dropoff_tus');
local project_metadata = ngx.req.get_headers()['Upload-Metadata'];
if project_metadata ~= nil then
ngx.var.tusmetadata = dropoff_tus.updateTusMetadata(project_metadata,ngx.var.project_id);
end
}
# Here we update the Tus server metadata so we can add the project uuid to it for further processing
proxy_set_header Upload-Metadata $tusmetadata;
# Rewrite the url so that the project UUIDv4 is stripped from the url to the Tus server
rewrite ^.*$ /files/ break;
# Disable request and response buffering
proxy_request_buffering off;
proxy_buffering off;
client_max_body_size 0;
# Forward incoming requests to local tusd instance.
# This can also be a remote server on a different location.
proxy_pass http://localhost:1080;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Host $server_name;
proxy_set_header X-Forwarded-Proto $scheme;
}
location ~ /files {
# Disable request and response buffering
proxy_request_buffering off;
proxy_buffering off;
client_max_body_size 0;
# Forward incoming requests to local tusd instance
proxy_pass http://localhost:1080;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Host $server_name;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
And there should be a lua folder in the /etc/nginx folder.
In order to test whether NGINX is configured correctly, run nginx -t; it should give an OK message:
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
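What the access_by_lua_block does to Upload-Metadata can be mimicked in Python: per the tus protocol, Upload-Metadata is a comma-separated list of `key base64(value)` pairs, and the project UUID captured from the URL is appended as one more pair. The key name projectid and the function name are assumptions for this sketch; the real key is whatever dropoff_tus.updateTusMetadata uses.

```python
import base64
import re

# Same capture idea as the NGINX location: a UUID between /files/ and /
UUID_RE = re.compile(r'/files/([0-9a-f-]{36})/')


def add_project_to_metadata(upload_metadata: str, url: str) -> str:
    """Append the project UUID from the upload URL to tus Upload-Metadata.

    Upload-Metadata is 'key1 b64value1,key2 b64value2' per the tus spec;
    if no UUID is present in the URL, the header is returned unchanged.
    """
    match = UUID_RE.search(url)
    if not match:
        return upload_metadata
    encoded = base64.b64encode(match.group(1).encode()).decode()
    pair = 'projectid ' + encoded  # key name is an assumption
    return pair if not upload_metadata else upload_metadata + ',' + pair
```

This mirrors why the VHost then rewrites the URL to plain /files/: the UUID has already been moved into the metadata that tusd stores with the upload.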
Security (not yet implemented)
It is possible to secure the uploaded files with PGP encryption. This is done automatically in the web interface. When you want PGP encryption for an API upload, the encryption has to be done before the upload is started; this is a manual action by the uploader. So automatic encryption is only available through the web upload.