VRE Backend API and Scheduler
Joshua Rubingh d552185e5d
Split up code
2 years ago
.vscode Split up code 2 years ago
VRE Split up code 2 years ago
doc Split up code 2 years ago
docker Split up code 2 years ago
nginx Split up code 2 years ago
.dockerignore Split up code 2 years ago
.drone.yml Split up code 2 years ago
.gitignore Split up code 2 years ago
LICENSE Split up code 2 years ago
README.md Split up code 2 years ago
clouds.yaml.example Split up code 2 years ago
docker-compose.yaml Split up code 2 years ago
run_scheduler.sh Split up code 2 years ago

README.md

Virtual Research Environment

Secure data drop-off & routing software.

With this software you can safely upload private and sensitive data, in the same way you would with WeTransfer or Dropbox. You can upload a single file or multiple files at once through a web interface or through an API.

Installation

To install this data drop-off project, we need the following software:

  • Django
  • TUS (The Upload Server)
  • NGINX

Django

We install Django with standard settings. It could also run asynchronously, but that requires some extra steps: https://docs.djangoproject.com/en/3.0/howto/deployment/asgi/ So for now, we keep it simple.

Install

Clone the code into /opt/deploy/data_drop-off

git clone https://git.web.rug.nl/VRE/data_drop-off.git /opt/deploy/data_drop-off

Then create a virtual environment

cd /opt/deploy/data_drop-off
python3 -m venv .
source bin/activate

Finally we install the required Python modules

pip install -r requirements

This installs all the Python modules needed to run this Django project.

External libraries:

Production

https://gitlab.com/eeriksp/django-model-choices
https://github.com/georgemarshall/django-cryptography
https://github.com/jacobian/dj-database-url
https://github.com/ierror/django-js-reverse

https://github.com/henriquebastos/python-decouple
https://github.com/ezhov-evgeny/webdav-client-python-3
https://github.com/dblueai/giteapy
https://pypi.org/project/PyGithub/

Development

https://github.com/jazzband/django-debug-toolbar

Settings

The settings for Django are stored in an .env file, so that you can easily switch the environment between production and testing. There is an .env.example file that can be used as a template.

# A uniquely secret key
SECRET_KEY=@wb=#(f4uc0l%e!5*eo+aoflnxb(@!l9!=c5w=4b+x$=!8&vy%a

# Disable debug in production
DEBUG=False

# Allowed hosts that Django serves. Take care when NGINX is proxying in front of Django
ALLOWED_HOSTS=127.0.0.1,localhost

# Enter the database url connection: https://github.com/jacobian/dj-database-url
DATABASE_URL=sqlite:////opt/deploy/data_drop-off/db.sqlite3

# Email settings

# Mail host
EMAIL_HOST=

# Email user name
EMAIL_HOST_USER=

# Email password
EMAIL_HOST_PASSWORD=

# Email server port number to use
EMAIL_PORT=25

# Does the email server support TLS?
EMAIL_USE_TLS=
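The values in the .env file are plain strings, so they have to be cast before Django can use them. The project lists python-decouple for this. The sketch below is only a standard-library illustration of the idea, not how python-decouple is implemented; the helper names parse_env, as_bool and as_list are made up for this example.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so a value (like SECRET_KEY)
        # may itself contain '=' or '#' characters.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def as_bool(value: str) -> bool:
    """Treat an empty value as False, like an unset EMAIL_USE_TLS."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def as_list(value: str) -> List[str]:
    """Split a comma separated value such as ALLOWED_HOSTS."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Sample taken from the settings shown above (secret key left out).
SAMPLE = """
# Disable debug in production
DEBUG=False
ALLOWED_HOSTS=127.0.0.1,localhost
EMAIL_PORT=25
EMAIL_USE_TLS=
"""

env = parse_env(SAMPLE)
debug = as_bool(env["DEBUG"])
allowed_hosts = as_list(env["ALLOWED_HOSTS"])
email_port = int(env["EMAIL_PORT"])
email_use_tls = as_bool(env["EMAIL_USE_TLS"])
print(debug, allowed_hosts, email_port, email_use_tls)
```

Splitting on the first '=' matters here, because a generated secret key usually contains '=' and '#' characters.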

Next we have to create the database structure. If you are using SQLite3 as a backend, make sure the database file exists on disk.

touch /opt/deploy/data_drop-off/db.sqlite3
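The four slashes in the DATABASE_URL above are easy to get wrong: three belong to the URL syntax and the fourth starts the absolute path. As a rough sketch, using only the standard library and hypothetical helper names, the file path can be taken from the URL and created in the same way as the touch command does:

```python
from pathlib import Path
from urllib.parse import urlsplit


def sqlite_path_from_url(database_url: str) -> Path:
    """Return the file path encoded in a sqlite:// database URL."""
    parts = urlsplit(database_url)
    if parts.scheme != "sqlite":
        raise ValueError(f"not a sqlite URL: {database_url!r}")
    # The URL path carries one extra leading slash:
    #   sqlite:////opt/db.sqlite3 -> //opt/db.sqlite3 -> /opt/db.sqlite3
    #   sqlite:///db.sqlite3      -> /db.sqlite3      -> db.sqlite3 (relative)
    return Path(parts.path[1:])


def ensure_sqlite_file(database_url: str) -> Path:
    """Create the database file if it is missing, like 'touch' does."""
    db_path = sqlite_path_from_url(database_url)
    db_path.touch(exist_ok=True)
    return db_path


# Only parse here, so running the example does not write to /opt.
example = sqlite_path_from_url("sqlite:////opt/deploy/data_drop-off/db.sqlite3")
print(example.as_posix())
```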

Then in the Python virtual environment we run the following commands:

./manage.py migrate
./manage.py loaddata virtual_machine_initial_data
./manage.py createsuperuser
./manage.py collectstatic

And finally you should be able to start the Django application

./manage.py runserver

TUS

TUS = The Upload Server. This is a resumable upload server that speaks HTTP. It is a stand-alone server that runs behind the NGINX server.

It is even possible to run a TUS instance at a different location (Amsterdam), as long as the TUS server is reachable by the NGINX frontend server and the TUS server can post webhooks back to the frontend server.
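To give an idea of what "resumable" means on the HTTP level, here is a small sketch of the request headers a client sends according to the tus 1.0.0 protocol. It only builds the headers and plans the chunks, it does not talk to a server. The function names and the chunk size are assumptions for this example.

```python
from typing import Dict, List, Tuple

TUS_VERSION = "1.0.0"


def creation_headers(total_size: int) -> Dict[str, str]:
    """Headers for the POST request that creates a new upload."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Length": str(total_size),
    }


def patch_headers(offset: int) -> Dict[str, str]:
    """Headers for a PATCH request that sends data starting at 'offset'."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Offset": str(offset),
        "Content-Type": "application/offset+octet-stream",
    }


def plan_chunks(total_size: int, offset: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs for the data that still has to be sent."""
    chunks = []
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


# A 10 byte upload was interrupted after 4 bytes. The client asks the server
# for the current offset with a HEAD request and continues from there.
remaining = plan_chunks(total_size=10, offset=4, chunk_size=4)
for chunk_offset, chunk_length in remaining:
    print(chunk_length, "bytes with", patch_headers(chunk_offset))
```

Because every PATCH request states its own offset, an interrupted upload can continue where it stopped instead of starting over.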

Setup

It needs the package encfs, so install that first: sudo apt install encfs

The setup is quite simple. It works the same way as Django, by using an .env file. So start by creating a new settings file based on the example.

cp .env.example .env

# TUS Daemon settings
# Change the variable below to your needs. You can also add more variables that are used in the startup.sh script

WEBHOOK_URL="http://localhost:8000/datadrops/webhook/"
DROPOFF_API_HAWK_KEY="[ENTER_HAWK_KEY]"
DROPOFF_API_HAWK_SECRET="[ENTER_HAWK_SECRET]"

You need to create an API user in Django that is allowed to communicate between the TUS daemon and Django. This can be done by creating a new user in the Django admin. This will also generate a new token, which is needed. This token can be found on the API -> Tokens page.

The default webhook url is: /datadrops/webhook/

Then you can start the upload server with the 'start.sh' script: ./start.sh

This will start the TUS server running on TCP port 1050.

Data storage

The uploaded data is stored in a folder that is configured in the TUS startup command. This should be a folder that is writable by the user that is running the TUS instance. Make sure that the upload folder is not directly accessible by the webserver, otherwise the uploaded files can be downloaded.
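Both requirements can be checked with a few lines of Python before starting the TUS server. This is a minimal sketch: the function name is made up, and the list of web roots is an assumption that you have to fill in yourself (for example the /var/www/html root from the NGINX configuration).

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, List


def check_upload_dir(upload_dir, web_roots: Iterable) -> List[str]:
    """Return a list of problems found with the upload folder (empty = fine)."""
    problems = []
    path = Path(upload_dir).resolve()

    if not path.is_dir():
        problems.append(f"{path} is not an existing directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")

    for root in web_roots:
        root_path = Path(root).resolve()
        # The folder is exposed when it is the web root or somewhere below it.
        if path == root_path or root_path in path.parents:
            problems.append(f"{path} is inside web root {root_path}")
    return problems


# Demonstration in a temporary folder instead of the real server paths.
with tempfile.TemporaryDirectory() as base:
    web_root = Path(base) / "www"
    safe_dir = Path(base) / "uploads"
    exposed_dir = web_root / "uploads"
    safe_dir.mkdir()
    exposed_dir.mkdir(parents=True)

    print(check_upload_dir(safe_dir, [web_root]))
    print(check_upload_dir(exposed_dir, [web_root]))
```

Run the check as the same user that runs the TUS instance, otherwise the writable test says nothing about the daemon.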

Hooks

The TUS server can run hooks based on uploaded files. There are two types of hooks: 'normal' hooks and webhooks. It is not possible to run both hook systems at the same time, due to the blocking nature of the pre-create hook. So we use the 'normal' hook system, which means that custom scripts are run. Those scripts then post the data to a webserver, which gives webhook functionality with the 'normal' hooks.
At the moment, the hook system only makes an HTTP call. There is no actual file movement yet.
For now we use the following hooks:

  • pre-create: This hook runs when a new upload starts. It triggers the Django server to store the upload in the database, and to check whether the upload is allowed based on a unique upload url and a unique upload code.
  • post-finish: This hook runs when an upload is finished. It updates the database in Django with the file size and the actual (unique) filename on disk.

An example of a hook as used in this project. The only change that should be made is:

  • WEBHOOK_URL: This is the full url to the Django webhook
    Do not change the HTTP_HOOK_NAME as this will give errors with Django.
#!/usr/bin/env python

import sys
import json
import requests

# Tus webhook name
HTTP_HOOK_NAME='pre-create'
# Django webserver with hook url path
WEBHOOK_URL='http://localhost:8000/datadrops/webhook/'

# Read stdin input data from the TUS daemon
data = ''.join(sys.stdin.readlines())

# Test if data is valid JSON... just to be sure...
try:
  json.loads(data)
except Exception as ex:
  print(ex)
  # Send exit code higher than 0 to stop the upload process on the Tus server
  sys.exit(1)

# We know for sure that JSON input data is 'valid'. So we post to the webhook for further checking
try:
  # Create a webhook POST request with the needed headers and data. The data is the RAW data from the input.
  webhook = requests.post(WEBHOOK_URL, headers={'HOOK-NAME':HTTP_HOOK_NAME}, data=data)
  # If the POST is ok, and we get a 200 status back, so the upload can continue
  if webhook.status_code == requests.codes.ok:
    # This will make the Tus server continue the upload
    sys.exit(0)

except requests.exceptions.RequestException as ex:
  # Webhook post failed
  print(ex)

# We had some errors, so upload has to be stopped
sys.exit(1)

This hook uses the same data payload that TUS would send when using the webhook system. So both 'normal' hooks and webhooks should work with Django out of the box.
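To try out a hook script without a running Django server, a small stand-in for the webhook can help. The sketch below is not the Django view of this project. It is a hypothetical stub that answers 200 for a known HOOK-NAME with a valid JSON body and 400 for anything else, which is the contract the hook script above relies on. It uses only the standard library, so urllib takes the place of requests, and the JSON body is a placeholder rather than a real TUS payload.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

KNOWN_HOOKS = {"pre-create", "post-finish"}


class WebhookStub(BaseHTTPRequestHandler):
    """Accept a hook call when the hook name is known and the body is JSON."""

    def do_POST(self):
        hook_name = self.headers.get("HOOK-NAME", "")
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            json.loads(body)
            valid = hook_name in KNOWN_HOOKS
        except ValueError:
            valid = False
        self.send_response(200 if valid else 400)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example output quiet


def post_hook(url: str, hook_name: str, payload: bytes) -> int:
    """Post a hook payload and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=payload, headers={"HOOK-NAME": hook_name}, method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


# Port 0 lets the operating system pick a free port for the test server.
server = HTTPServer(("127.0.0.1", 0), WebhookStub)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/datadrops/webhook/"

accepted = post_hook(url, "pre-create", b'{"placeholder": true}')
rejected = post_hook(url, "unknown-hook", b"{}")

server.shutdown()
server.server_close()
print(accepted, rejected)
```

A status of 200 corresponds to exit code 0 in the hook script, so the upload continues. Any other status ends in exit code 1 and stops the upload.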

NGINX

Install NGINX with LUA support through the package manager. For Ubuntu this would be

apt install nginx libnginx-mod-http-lua

Also configure SSL to make the connections secure. This is outside the scope of this installation guide.

LUA

LUA is used in NGINX so that we can handle some dynamic data on the server side. All LUA code should be placed in the folder /etc/nginx/lua.

Setup

After installing the packages, create a symbolic link in /etc/nginx/sites-enabled so that a new VHost is created.

Important parts of the VHost configuration:

lua_package_path  "/etc/nginx/lua/?.lua;;";

server {
  listen 80 default_server;
  listen [::]:80 default_server;

  # SSL configuration
  #
  # listen 443 ssl default_server;
  # listen [::]:443 ssl default_server;
  #
  # Note: You should disable gzip for SSL traffic.
  # See: https://bugs.debian.org/773332
  #
  # Read up on ssl_ciphers to ensure a secure configuration.
  # See: https://bugs.debian.org/765782
  #
  # Self signed certs generated by the ssl-cert package
  # Don't use them in a production server!
  #
  # include snippets/snakeoil.conf;

  root /var/www/html;

  # Add index.php to the list if you are using PHP
  index index.html;

  server_name localhost;

  # This location is hit when the Tus upload is starting and providing meta data for the upload.
  # The actual upload is done with the /files location below
  location ~ /files/([0-9a-f]+\-[0-9a-f]+\-[1-5][0-9a-f]+\-[89ab][0-9a-f]+\-[0-9a-f]+)?/ {
      set $project_id $1; # Here we capture the UUIDv4 value to use in the Tus metadata manipulation
      set $tusmetadata '';

      # Here we manipulate the metadata from the TUS upload server.
      # Now we are able to store some extra meta data based on the upload url.
      access_by_lua_block {
          local dropoff_tus = require('dropoff_tus');
          local project_metadata = ngx.req.get_headers()['Upload-Metadata'];
          if project_metadata ~= nil then
              ngx.var.tusmetadata = dropoff_tus.updateTusMetadata(project_metadata,ngx.var.project_id);
          end
      }

      # Here we update the Tus server metadata so we can add the project uuid to it for further processing
      proxy_set_header Upload-Metadata $tusmetadata;

      # Rewrite the url so that the project UUIDv4 is stripped from the url to the Tus server
      rewrite ^.*$ /files/ break;

      # Disable request and response buffering
      proxy_request_buffering  off;
      proxy_buffering          off;

      client_max_body_size     0;

      # Forward incoming requests to local tusd instance.
      # This can also be a remote server on a different location.
      proxy_pass         http://localhost:1080;
      proxy_http_version 1.1;
      proxy_set_header   Upgrade $http_upgrade;
      proxy_set_header   Connection "upgrade";

      proxy_redirect     off;
      proxy_set_header   Host $host;
      proxy_set_header   X-Real-IP $remote_addr;
      proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header   X-Forwarded-Host $server_name;
      proxy_set_header   X-Forwarded-Proto $scheme;
  }

  location ~ /files {
      # Disable request and response buffering
      proxy_request_buffering  off;
      proxy_buffering          off;

      client_max_body_size     0;

      # Forward incoming requests to local tusd instance
      proxy_pass         http://localhost:1080;
      proxy_http_version 1.1;
      proxy_set_header   Upgrade $http_upgrade;
      proxy_set_header   Connection "upgrade";

      proxy_redirect     off;
      proxy_set_header   Host $host;
      proxy_set_header   X-Real-IP $remote_addr;
      proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header   X-Forwarded-Host $server_name;
      proxy_set_header   X-Forwarded-Proto $scheme;
  }
}

There should also be a lua folder in the /etc/nginx folder.
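The LUA module dropoff_tus itself is not shown here. To illustrate what the metadata rewrite in the first location block amounts to, here is a sketch in Python. In the tus protocol the Upload-Metadata header is a comma separated list of pairs, each a key and a base64 encoded value separated by a space. The metadata key "project" and the function names are assumptions for this example, the real module may use other names.

```python
import base64
import re
from typing import Dict, Optional

# Same pattern as the NGINX location: a UUID path segment after /files/.
UPLOAD_PATH = re.compile(
    r"/files/([0-9a-f]+-[0-9a-f]+-[1-5][0-9a-f]+-[89ab][0-9a-f]+-[0-9a-f]+)?/"
)


def project_id_from_path(path: str) -> Optional[str]:
    """Return the UUID captured from the upload url, if there is one."""
    match = UPLOAD_PATH.search(path)
    return match.group(1) if match else None


def add_project_to_metadata(header: str, project_id: str, key: str = "project") -> str:
    """Append the project id as an extra key/value pair to Upload-Metadata."""
    pairs = [pair.strip() for pair in header.split(",") if pair.strip()]
    encoded = base64.b64encode(project_id.encode("ascii")).decode("ascii")
    pairs.append(f"{key} {encoded}")
    return ",".join(pairs)


def decode_metadata(header: str) -> Dict[str, str]:
    """Decode an Upload-Metadata header into a dictionary."""
    result = {}
    for pair in header.split(","):
        key, _, value = pair.strip().partition(" ")
        if key:
            result[key] = base64.b64decode(value).decode("utf-8")
    return result


# Example request: the UUID is made up, the filename value is 'data.csv'.
path = "/files/3fa85f64-5717-4562-b3fc-2c963f66afa6/"
incoming = "filename " + base64.b64encode(b"data.csv").decode("ascii")

project_id = project_id_from_path(path)
outgoing = add_project_to_metadata(incoming, project_id)
print(decode_metadata(outgoing))
```

The TUS server then receives the project id as ordinary upload metadata, so the hooks can pass it on to Django without the upload client having to send it.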

To test whether NGINX is configured correctly, run nginx -t. It should give an OK message:

nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

Security (not yet implemented)

It is possible to secure the uploaded files with PGP encryption. This is done automatically in the web interface. When you want PGP encryption through an API upload, the encryption has to be done before the upload is started. This is a manual action done by the uploader.
So automatic encryption is only available through the web upload.
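For the manual step before an API upload, one possible approach is to call the gpg command line tool. This is only a sketch under assumptions: it presumes that GnuPG is installed and that the public key of the recipient is already imported. The recipient address and the file name are placeholders, and the project does not prescribe this tool.

```python
import shlex
from pathlib import Path
from typing import List, Tuple


def build_gpg_command(source, recipient: str) -> Tuple[List[str], Path]:
    """Build the gpg call that encrypts 'source' for 'recipient'."""
    source_path = Path(source)
    target = source_path.with_name(source_path.name + ".gpg")
    command = [
        "gpg",
        "--batch",            # never ask questions interactively
        "--yes",              # overwrite an existing output file
        "--encrypt",
        "--recipient", recipient,
        "--output", str(target),
        str(source_path),
    ]
    return command, target


# Placeholder values, replace them with your own file and recipient key.
command, encrypted_file = build_gpg_command("measurements.csv", "dropoff@example.org")
print(shlex.join(command))

# To actually encrypt the file before uploading it:
#   import subprocess
#   subprocess.run(command, check=True)
```

After encryption, upload the .gpg file instead of the original file.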