At Clever Cloud we manage most of our own data, and when we want to gather a particular piece of information, we open our SQL interpreter and query everything manually. This somehow worked because most of us are technical, but that's not necessarily the case anymore. So we wanted a nice dashboarding solution to make data available in a friendlier way. This is how we came to try Superset.

What is Superset? In their own words:

Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data.

You can configure different sorts of visualizations, from simple line charts to highly detailed geo-spatial charts, and organize them in dashboards. Take a look at their documentation to grasp the full extent of what you can do.

How to deploy Superset

Superset is written in Python and requires a PostgreSQL database. In Clever Cloud terms, this means you will need to create a brand new Python runtime with a PostgreSQL add-on.

# Clone the latest Superset release
git clone --depth 1 -b 0.38.1 https://github.com/apache/superset.git

# Move into the superset repository
cd superset

# Create the python application
clever create --type python Superset

# Create the PostgreSQL instance                                
clever addon create postgresql-addon --plan m SupersetPG

# link the addon 
clever service link-addon SupersetPG

Once the application is created, edit its information. As you need to build both the Superset backend and frontend, enable a dedicated build instance and select the L one. Don't forget to save the update.
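If you prefer staying in the terminal, the dedicated build instance can also be set from clever-tools. This is a sketch assuming your installed CLI version supports the --build-flavor option of clever scale:

```shell
# Enable a dedicated build instance with the L flavor
clever scale --build-flavor L
```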

Superset environment

Set the following environment variables. YOUR_APP_ID corresponds to your own application id, A_SECRET_KEY is a random secret key you choose, and AN_ADMIN_PASSWORD will be the admin password for the Superset application:

# Use gunicorn as python mode 
clever env set CC_PYTHON_BACKEND gunicorn

# Start the app using create_app() 
clever env set CC_PYTHON_MODULE "superset.app:create_app()"

# Python version to use
clever env set CC_PYTHON_VERSION 3.7

# Load all superset requirements
clever env set CC_PIP_REQUIREMENTS_FILE requirements/base.txt

# Application port
clever env set PORT 8080

# PYTHONPATH
clever env set PYTHONPATH /home/bas/YOUR_APP_ID/config/

# App secret key
clever env set SECRET_KEY A_SECRET_KEY

# Post build commands
clever env set CC_POST_BUILD_HOOK ./init.sh

# Superset admin password
clever env set ADMIN_PASSWORD AN_ADMIN_PASSWORD

Configure superset

Superset needs a local Python config file. Write a clever_config.py locally:

import os
# Superset specific config
ROW_LIMIT = 5000

SUPERSET_WEBSERVER_PORT = os.getenv("PORT")
SUPERSET_WEBSERVER_ADDRESS = "0.0.0.0"

# Flask App Builder configuration
# Your App secret key
SECRET_KEY = os.getenv("SECRET_KEY")

# The SQLAlchemy connection string to your database backend
# This connection defines the path to the database that stores your
# superset metadata (slices, connections, tables, dashboards, ...).
# Note that the connection information to connect to the datasources
# you want to explore are managed directly in the web UI
SQLALCHEMY_DATABASE_URI = os.getenv("POSTGRESQL_ADDON_URI")

# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = True
# Add endpoints that need to be exempt from CSRF protection
WTF_CSRF_EXEMPT_LIST = []
# A CSRF token that expires in 1 year
WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365

# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = ''

Superset build script

The next step is to write an init.sh file that will include all build tasks for Superset:

#!/bin/bash

# Load configuration file in PYTHONPATH
mkdir $APP_HOME/config
cp clever_config.py $APP_HOME/config/superset_config.py

# Upgrade the database
superset db upgrade

# Create an admin user
superset fab create-admin \
    --username admin \
    --firstname admin \
    --lastname admin \
    --email admin@admin.com \
    --password ${ADMIN_PASSWORD}

# Load the examples data set (optional)
superset load_examples

# Create default roles and permissions
superset init

# Build frontend
cd superset-frontend 
npm install -f --no-optional
npm run build

Clever Cloud supports post-build hooks; this init.sh file will be run at the end of the build phase.

It copies the config file we wrote earlier into the PYTHONPATH used by the application. It then runs the Superset commands required for the first start of the application: upgrading the metadata database, creating the admin user if it doesn't exist, loading the example data set, and initializing default roles and permissions. These Superset commands can be removed from the init file after the first successful build. Finally, the script also builds the frontend of the Superset application.

Superset requirements

When you added CC_PIP_REQUIREMENTS_FILE, it told Clever Cloud to load custom requirements for Superset, which are located in the requirements/base.txt file. As you are using the PostgreSQL add-on, you need to add its requirement to this base file:

echo "psycopg2>=2.7 --no-binary psycopg2" >> requirements/base.txt

The Clever Cloud Python application still needs a local requirements.txt to start the install. Generate it with:

pip3 freeze > requirements.txt

As we use gunicorn and the PostgreSQL add-on, both are required in the requirements.txt file; add them:

echo "psycopg2>=2.7 --no-binary psycopg2" >> requirements.txt

# gunicorn valid versions are located in setup.py file
echo "gunicorn>=20.0.2, <20.1" >> requirements.txt

Deploy

Deploy the Superset application to Clever Cloud:

# Make the build script executable
chmod u+x init.sh

# Add your files
git add .

# Create the first commit
git commit -m "clever init"

# Deploy the application
clever deploy

# Run the application
clever open

Once this is done you should be able to access the Superset application and log in with the admin user and the ADMIN_PASSWORD you set. Enjoy!

After the first successful Superset deployment, you can remove from the init.sh file all the commands starting with superset (they are only required for the first start).
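For reference, the trimmed-down init.sh after the first deployment could look like the following sketch, keeping only the config copy and the frontend build:

```shell
#!/bin/bash

# Load configuration file in PYTHONPATH
mkdir $APP_HOME/config
cp clever_config.py $APP_HOME/config/superset_config.py

# Build frontend
cd superset-frontend
npm install -f --no-optional
npm run build
```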


By Aurélien Hébert

Software engineer working on distributed systems and data storage/analytics systems enthusiast.