8 changed files with 237 additions and 315 deletions
@@ -1,312 +1,179 @@
# VRE Broker

[Build status](https://drones.web.rug.nl/VRE/Broker)

Secure data drop-off & routing software.

The Virtual Research Environment (VRE) Broker is the heart of the [Research Workspace system](/VRE), created by the [Rijksuniversiteit Groningen](https://www.rug.nl). It is an API that handles all actions and requests, whether they come from the web interface or directly through the API.

With this software it is possible to safely upload private and sensitive data, much like with WeTransfer or Dropbox. Single or multiple files can be uploaded at once, through the web interface or through the API.


This API is responsible for:

- The backend for the researchers' web interface
- Inviting other researchers to your project
- Creating virtual workspaces (VRW / OpenStack)

## Installation

To install this project, the following packages / software are needed (more to come):

- Django
- Redis
- TUS (resumable upload server, [tus.io](https://tus.io/))
- NGINX


## Django

The Broker is made with the [Django Framework](https://www.djangoproject.com/) in combination with [Django REST Framework](https://www.django-rest-framework.org/). Django is installed with its standard settings. It could also run asynchronously (ASGI), but that needs some extra steps (see https://docs.djangoproject.com/en/3.0/howto/deployment/asgi/), so for now we keep it simple.

### External libraries

#### Production

- https://gitlab.com/eeriksp/django-model-choices
- https://github.com/georgemarshall/django-cryptography
- https://github.com/jacobian/dj-database-url
- https://github.com/ierror/django-js-reverse
- https://github.com/henriquebastos/python-decouple
- https://github.com/ezhov-evgeny/webdav-client-python-3
- https://github.com/dblueai/giteapy
- https://pypi.org/project/PyGithub/

#### Development

- https://github.com/jazzband/django-debug-toolbar

### Install

The installation is pretty straightforward. After checking out the repository, the code is in the folder `Broker`, and the steps below assume that this is your current working directory.

- First we need a running Redis server. For Debian/Ubuntu-like systems:

```sh
sudo apt install redis-server
```

- Check out this repository:

```sh
git clone https://git.web.rug.nl/VRE/Broker.git
```

- Create a Python3 virtual environment:

```sh
cd Broker
python3 -m venv venv
```

- Activate the Python3 virtual environment:

```sh
source venv/bin/activate
```

- Install all the Python3 modules needed to run this Django project:

```sh
pip install -r VRE/requirements.txt
```

- Create a `.env` config file and adjust the **Environment settings**. Keeping the settings in this file makes it easy to switch between production and testing. At least set `DEBUG=True` in development:

```sh
cp VRE/VRE/env.example VRE/VRE/.env
```

- Create the database structure and load some needed data. If you use SQLite3 as the database backend, first make sure the database file from `DATABASE_URL` **does** exist on disk (for example, create it with `touch`):

```sh
VRE/manage.py migrate
VRE/manage.py loaddata virtual_machine_initial_data
VRE/manage.py loaddata university_initial_data
```

- Create a superuser (admin) to log in to the admin part of the API:

```sh
VRE/manage.py createsuperuser
```

- Collect the static files into `STATIC_ROOT` (needed for deployment):

```sh
VRE/manage.py collectstatic
```

- Start the Django application:

```sh
VRE/manage.py runserver
```

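The Broker (and later the background scheduler) depends on a running Redis server, so it can help to confirm that the services accept connections before continuing. Below is a minimal sketch that uses only the Python standard library. It assumes the defaults mentioned in this README: Redis on `localhost:6379` and the Django development server on `localhost:8000`. `is_listening` is a hypothetical helper and not part of the Broker code. A successful TCP connection only shows that something is listening on the port, not that the service behind it is healthy.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved
        return False


if __name__ == "__main__":
    # Assumed defaults from this README: Redis on 6379, Django runserver on 8000
    for name, port in (("Redis", 6379), ("Django", 8000)):
        state = "up" if is_listening("localhost", port) else "NOT reachable"
        print(f"{name} on localhost:{port} is {state}")
```

Using a plain socket avoids adding a Redis client library only for this check. If the Redis line reports "NOT reachable", go back to the first install step.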

Now you can enter the Django Admin at `http://localhost:8000/admin/`. If you do not see any styles on the page, make sure you have enabled `DEBUG` or set up [static files for Django](https://docs.djangoproject.com/en/3.2/howto/static-files/).

The API documentation can be found at http://localhost:8000/api/swagger/ or http://localhost:8000/api/redoc/.

## TUS

[TUS](https://tus.io/) is an open protocol for resumable uploads over HTTP. Its upload server (tusd) runs as a stand-alone server behind the NGINX server.

It is even possible to run the TUS server at a different location (for example in Amsterdam), as long as it is reachable by the NGINX frontend server and can post webhooks back to the frontend server.

### Setup

TUS needs the package `encfs`, so install that first: `sudo apt install encfs`

The setup is quite simple. Like Django, it uses a `.env` file, so start by creating a new settings file based on the example: `cp .env.example .env`

```ini
# TUS Daemon settings
# Change the variables below to your needs. You can also add more variables that are used in the startup.sh script

# A unique secret key
SECRET_KEY=@wb=#(f4uc0l%e!5*eo+aoflnxb(@!l9!=c5w=4b+x$=!8&vy%

WEBHOOK_URL="http://localhost:8000/datadrops/webhook/"
DROPOFF_API_HAWK_KEY="[ENTER_HAWK_KEY]"
DROPOFF_API_HAWK_SECRET="[ENTER_HAWK_SECRET]"
```

## Environment settings

In order to get the API running, you need to specify some settings. This is done with environment variables whose values are read by Django during startup. You can put them in a `.env` file or set them manually as bash environment variables. An example env file, which can be used as a template, can be found at `VRE/VRE/env.example`.

The location of the env file should be `VRE/VRE/.env`.

[More information can be found here](https://github.com/henriquebastos/python-decouple/)

### Variables

Every variable has a short explanation above it describing what it does or what it is used for.
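Django reads these variables through python-decouple. There, a variable set in the real environment takes precedence over the same key in the `.env` file. To show how such a file is interpreted, here is a simplified standard-library stand-in. This is an illustrative sketch, not decouple's implementation:

- It only handles `KEY=VALUE` lines.
- It skips blank lines and `#` comment lines, so commented-out settings such as `# EMAIL_HOST=` are ignored.
- It strips one pair of matching quotes.
- It uses deliberately simple boolean and comma-separated list casts.

The names `parse_env`, `get_setting`, `to_bool` and `to_list` are hypothetical.

```python
import os
from typing import Callable, Dict, List, Mapping

_MISSING = object()


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and '#' comment lines are skipped."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' and '#'
        key, value = line.split("=", 1)
        value = value.strip()
        # Strip one pair of matching surrounding quotes, e.g. "http://..."
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def to_bool(value: str) -> bool:
    """Simplified boolean cast: 1/true/yes/on (any case) is True, anything else False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def to_list(value: str) -> List[str]:
    """Comma-separated list, e.g. ALLOWED_HOSTS=127.0.0.1,localhost."""
    return [item.strip() for item in value.split(",") if item.strip()]


def get_setting(
    name: str,
    file_values: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
    default: object = _MISSING,
    cast: Callable[[str], object] = str,
) -> object:
    """Look up `name`: real environment first, then the .env values, then `default`.

    The cast is applied to values found in the environment or the file;
    a default is returned unchanged.
    """
    if name in environ:
        raw = environ[name]
    elif name in file_values:
        raw = file_values[name]
    elif default is not _MISSING:
        return default
    else:
        raise KeyError(f"{name} is not set in the environment or the .env file")
    return cast(raw)


example = """
# Disable debug in production
DEBUG=False
ALLOWED_HOSTS=127.0.0.1,localhost
# EMAIL_HOST=
WEBHOOK_URL="http://localhost:8000/datadrops/webhook/"
"""
values = parse_env(example)
print(get_setting("DEBUG", values, environ={}, cast=to_bool))  # False
print(get_setting("ALLOWED_HOSTS", values, environ={}, cast=to_list))  # ['127.0.0.1', 'localhost']
```

With the same helpers, a file containing `DROPOFF_NOT_ALLOWED_EXTENSIONS=exe,com,bat,lnk,sh` would be read with `cast=to_list` as `['exe', 'com', 'bat', 'lnk', 'sh']`.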

Overview of the Broker variables:

```ini
# A unique secret key. Use your own random value and keep it secret
# https://docs.djangoproject.com/en/dev/ref/settings/#secret-key
SECRET_KEY=@wb=#(f4uc0l%e!5*eo+aoflnxb(@!l9!=c5w=4b+x$=!8&vy%

# Disable debug in production
# https://docs.djangoproject.com/en/dev/ref/settings/#debug
DEBUG=False

# Allowed hosts that Django serves. Use a comma-separated list. Take care when NGINX is proxying in front of Django
# https://docs.djangoproject.com/en/dev/ref/settings/#allowed-hosts
ALLOWED_HOSTS=127.0.0.1,localhost

# All internal IPs for Django. Use a comma-separated list
# https://docs.djangoproject.com/en/dev/ref/settings/#internal-ips
INTERNAL_IPS=127.0.0.1

# Enter the database URL connection. Enter all parts, even the port numbers: https://github.com/jacobian/dj-database-url
# By default a local SQLite3 database is used.
DATABASE_URL=sqlite:///db.sqlite3

# The location on disk where the static files will be placed during deployment. This setting is required
# https://docs.djangoproject.com/en/dev/ref/settings/#static-root
STATIC_ROOT=

# Enter the default timezone for visitors when it is not known.
# https://docs.djangoproject.com/en/dev/ref/settings/#std:setting-TIME_ZONE
TIME_ZONE=Europe/Amsterdam

# Email settings
# https://docs.djangoproject.com/en/dev/ref/settings/#email-host
# EMAIL_HOST=

# Email user name
# https://docs.djangoproject.com/en/dev/ref/settings/#email-host-user
# EMAIL_HOST_USER=

# Email password
# https://docs.djangoproject.com/en/dev/ref/settings/#email-host-password
# EMAIL_HOST_PASSWORD=

# Email server port number to use. Default is 25
# https://docs.djangoproject.com/en/dev/ref/settings/#email-port
# EMAIL_PORT=

# Does the email server support TLS?
# https://docs.djangoproject.com/en/dev/ref/settings/#email-use-tls
# EMAIL_USE_TLS=

# https://docs.djangoproject.com/en/dev/ref/settings/#default-from-email
DEFAULT_FROM_EMAIL=Do not reply<no-reply@rug.nl>

# The sender address. This needs to be one of the allowed domains due to SPF checks
# The code will use a reply-to header to make sure that replies go to the researcher and not to this address
EMAIL_FROM_ADDRESS=Do not reply<no-reply@rug.nl>

# The Redis server is used for background tasks. Enter the variables below. Leave the password empty if authentication is not enabled.
# The hostname or IP where the Redis server is running. Default is localhost
REDIS_HOST=localhost

# The Redis port number on which the server is running. Default is 6379
REDIS_PORT=6379

# The Redis password when authentication is enabled
# REDIS_PASSWORD=

# The number of connections to be made inside a connection pool. Default is 10
REDIS_CONNECTIONS=10

# Enter the full path to the web-based file uploading, without the Study ID part. The Study ID will be added to this URL based on the visitor.
DROPOFF_BASE_URL=http://localhost:8000/dropoffs/

# Enter the full URL to the NGINX service that is in front of the TUSD service. By default that is http://localhost:1090
DROPOFF_UPLOAD_HOST=http://localhost:1090

# Which file extensions are **NOT** allowed to be uploaded. By default the extensions exe,com,bat,lnk,sh are not allowed
DROPOFF_NOT_ALLOWED_EXTENSIONS=exe,com,bat,lnk,sh

# Sentry settings
# Enter the full Sentry DSN string. This should contain a key and a project
SENTRY_DSN=
```

## Connecting TUS to Django

You need to create an API user in Django that is allowed to communicate between the TUS daemon and Django. Create a new user in the Django admin. This also generates a new token, which is needed. The token can be found on the API -> Tokens page.

The default webhook URL is `/datadrops/webhook/`.

Then start the upload server with the `start.sh` script: `./start.sh`

This starts the TUS server on TCP port 1050. Make sure this port matches the `proxy_pass` port in the NGINX VHost configuration below, which uses 1080.

### Data storage

The uploaded data is stored in a folder that is configured in the TUS startup command. This folder should be writable by the user that runs the TUS instance. Make sure that the upload folder is not directly accessible by the web server, otherwise the files can be downloaded.

### Hooks

TUS can run hooks based on uploaded files. There are two types of hooks: 'normal' hooks and webhooks. It is not possible to run both hook systems at the same time, due to the blocking nature of the pre-create hook, so we use the 'normal' hook system. That means custom scripts are run. Those scripts can then post the data to a web server, which gives Webhook functionality with the 'normal' hooks.

At the moment, the hooks only make an HTTP call. There is no actual file movement yet.

For now we use the following hooks:

- **pre-create**: runs when a new upload starts. It makes the Django server store the upload in the database and check whether the upload is allowed, based on a unique upload URL and a unique upload code.
- **post-finish**: runs when an upload is finished. It updates the database (Django) with the file size and the actual (unique) filename on disk.

Below is an example of a hook as used in this project. The only change that should be made is:

- **WEBHOOK_URL**: the full URL to the Django webhook

Do not change **HTTP_HOOK_NAME**, as this will give errors in Django.

```python
#!/usr/bin/env python
import json
import sys

import requests

# Tus hook name, sent to Django in the HOOK-NAME header
HTTP_HOOK_NAME = 'pre-create'
# Django web server with the hook URL path (the default path is /datadrops/webhook/)
WEBHOOK_URL = 'http://localhost:8000/datadrops/webhook/'

# Read the stdin input data from the Tus daemon
data = sys.stdin.read()

# Test if the data is valid JSON... just to be sure...
try:
    json.loads(data)
except ValueError as ex:
    print(ex)
    # Exit with a code higher than 0 to stop the upload process on the Tus server
    sys.exit(1)

# The JSON input data is valid, so post it to the webhook for further checking
try:
    # Create a webhook POST request with the needed headers. The body is the RAW input data.
    # An (arbitrary) 30 second timeout makes the hook fail instead of blocking the upload forever
    webhook = requests.post(WEBHOOK_URL, headers={'HOOK-NAME': HTTP_HOOK_NAME}, data=data, timeout=30)
    # A 200 status back means the upload can continue
    if webhook.status_code == requests.codes.ok:
        # This will make the Tus server continue the upload
        sys.exit(0)
except requests.exceptions.RequestException as ex:
    # Webhook POST failed
    print(ex)

# We had some errors, so the upload has to be stopped
sys.exit(1)
```

This hook sends the same data payload that TUS would send when using the Webhook system, so using 'normal' hooks or Webhooks with Django should both work out of the box.

### NGINX

Install NGINX with Lua support through the package manager. For Ubuntu this would be:

```sh
apt install nginx libnginx-mod-http-lua
```

Also configure SSL to make the connections secure. This is outside the scope of this installation guide.

#### Lua

Lua is used in NGINX so that some dynamic data can be handled on the server side. All Lua code should be placed in the folder `/etc/nginx/lua`.

#### Setup

After installing the packages, create a symbolic link in `/etc/nginx/sites-enabled` so that a new VHost is created.

Important parts of the VHost configuration:

```nginx
lua_package_path "/etc/nginx/lua/?.lua;;";

server {
    listen 80 default_server;
    listen [::]:80 default_server;

    # SSL configuration
    #
    # listen 443 ssl default_server;
    # listen [::]:443 ssl default_server;
    #
    # Note: You should disable gzip for SSL traffic.
    # See: https://bugs.debian.org/773332
    #
    # Read up on ssl_ciphers to ensure a secure configuration.
    # See: https://bugs.debian.org/765782
    #
    # Self signed certs generated by the ssl-cert package
    # Don't use them in a production server!
    #
    # include snippets/snakeoil.conf;

    root /var/www/html;

    # Add index.php to the list if you are using PHP
    index index.html;

    server_name localhost;

    # This location is hit when the Tus upload is starting and providing meta data for the upload.
    # The actual upload is done with the /files location below
    location ~ /files/([0-9a-f]+\-[0-9a-f]+\-[1-5][0-9a-f]+\-[89ab][0-9a-f]+\-[0-9a-f]+)?/ {
        set $project_id $1; # Here we capture the UUIDv4 value to use in the Tus metadata manipulation
        set $tusmetadata '';

        # Here we manipulate the metadata from the TUS upload server.
        # Now we are able to store some extra meta data based on the upload url.
        access_by_lua_block {
            local dropoff_tus = require('dropoff_tus');
            local project_metadata = ngx.req.get_headers()['Upload-Metadata'];
            if project_metadata ~= nil then
                ngx.var.tusmetadata = dropoff_tus.updateTusMetadata(project_metadata,ngx.var.project_id);
            end
        }

        # Here we update the Tus server metadata so we can add the project uuid to it for further processing
        proxy_set_header Upload-Metadata $tusmetadata;

        # Rewrite the url so that the project UUIDv4 is stripped from the url to the Tus server
        rewrite ^.*$ /files/ break;

        # Disable request and response buffering
        proxy_request_buffering off;
        proxy_buffering off;

        client_max_body_size 0;

        # Forward incoming requests to local tusd instance.
        # This can also be a remote server on a different location.
        proxy_pass http://localhost:1080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Host $server_name;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location ~ /files {
        # Disable request and response buffering
        proxy_request_buffering off;
        proxy_buffering off;

        client_max_body_size 0;

        # Forward incoming requests to local tusd instance
        proxy_pass http://localhost:1080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Host $server_name;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
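In this VHost configuration, the `access_by_lua_block` passes the incoming `Upload-Metadata` header and the captured project UUID to `dropoff_tus.updateTusMetadata`. The source of that Lua module is not included in this README. To make the data flow concrete, here is a Python sketch of one plausible way such a rewrite could work, under stated assumptions:

- It follows the tus 1.0 rule that `Upload-Metadata` is a comma-separated list of `key base64(value)` pairs.
- It reuses the UUID pattern from the `location` block.
- It appends the UUID under a hypothetical `project_id` key.

The key name and the helper functions are illustrative only, and the actual Lua module may behave differently.

```python
import base64
import re
from typing import Dict

# Same pattern as the NGINX location block: /files/<uuid>/
PROJECT_PATH = re.compile(
    r"/files/([0-9a-f]+-[0-9a-f]+-[1-5][0-9a-f]+-[89ab][0-9a-f]+-[0-9a-f]+)?/"
)


def encode_metadata(pairs: Dict[str, str]) -> str:
    """Build an Upload-Metadata value: 'key base64value' pairs joined by commas."""
    return ",".join(
        f"{key} {base64.b64encode(value.encode('utf-8')).decode('ascii')}"
        for key, value in pairs.items()
    )


def decode_metadata(header: str) -> Dict[str, str]:
    """Inverse of encode_metadata; a key without a value decodes to ''."""
    result: Dict[str, str] = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, encoded = item.partition(" ")
        result[key] = base64.b64decode(encoded).decode("utf-8") if encoded else ""
    return result


def add_project_id(path: str, header: str, key: str = "project_id") -> str:
    """Append the project UUID captured from the URL path to the metadata header."""
    match = PROJECT_PATH.search(path)
    if not match or not match.group(1):
        return header  # No project UUID in the path: leave the metadata unchanged
    extra = encode_metadata({key: match.group(1)})
    return f"{header},{extra}" if header else extra


if __name__ == "__main__":
    header = encode_metadata({"filename": "data.csv"})
    print(header)  # filename ZGF0YS5jc3Y=
    path = "/files/3f2b8c1e-9d4a-4b7e-8a21-5c6d7e8f9a0b/"
    print(decode_metadata(add_project_id(path, header)))
```

Because the UUID travels inside the metadata rather than the URL, the `rewrite` to `/files/` can drop it from the path while tusd still forwards it to the hooks.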

There should also be a `lua` folder in the `/etc/nginx` folder.

To test whether NGINX is configured correctly, run `nginx -t`. It should give an OK message:

```
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
```

## Background scheduler

For some actions we need a background scheduler system. This system relies on Redis, so make sure you have Redis installed. The scheduler can be started with the command:

```sh
./run_scheduler.sh
```

This will load the Python3 virtual environment and start the background scheduler. Keep the console open.

## Security (not yet implemented)

The uploaded files can be secured with PGP encryption. In the web interface this will be done automatically. When you want PGP encryption for an API upload, the encryption has to be done before the upload is started. This is a manual action by the uploader.

So automatic encryption is only available through the web upload.

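For API uploads, the PGP encryption has to be done by the uploader before the upload starts. Below is a minimal Python sketch of that manual step. It assumes that GnuPG's `gpg` command-line tool is installed and that the recipient's public key has already been imported and trusted; gpg may refuse to encrypt non-interactively to a key it does not trust. `encrypted_path`, `gpg_encrypt_command` and `encrypt_before_upload` are hypothetical helpers. The resulting `.gpg` file is what would then be uploaded instead of the original.

```python
import subprocess
from pathlib import Path
from typing import List


def encrypted_path(source: Path) -> Path:
    """Place the encrypted copy next to the original, e.g. data.csv -> data.csv.gpg."""
    return source.with_name(source.name + ".gpg")


def gpg_encrypt_command(source: Path, recipient: str) -> List[str]:
    """Build a non-interactive gpg command that encrypts `source` for `recipient`."""
    return [
        "gpg",
        "--batch",  # never prompt; fail instead
        "--yes",  # overwrite an existing output file
        "--encrypt",
        "--recipient", recipient,
        "--output", str(encrypted_path(source)),
        str(source),
    ]


def encrypt_before_upload(source: Path, recipient: str) -> Path:
    """Run gpg (raises CalledProcessError on failure) and return the file to upload."""
    subprocess.run(gpg_encrypt_command(source, recipient), check=True)
    return encrypted_path(source)
```

For example, `encrypt_before_upload(Path("data.csv"), "researcher@example.org")` would produce `data.csv.gpg`, which only the holder of the matching private key can decrypt.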

## NGINX

We use NGINX as a proxy in front of the API. This is not mandatory, but can be handy when you have a busy API.

## VRW Integration

...

## OpenStack Integration

...

Binary file not shown.
@@ -1,7 +1,7 @@

```sh
#!/bin/bash

# This will start the huey task scheduling: https://huey.readthedocs.io/en/latest/contrib.html#django
# Make sure this script is started in the same folder where 'clouds.yaml' is. Otherwise OpenStack Cloud connections will fail

source venv/bin/activate
./VRE/manage.py run_huey
```
