Biscuit tutorial

In the previous article, I introduced Biscuit, our authentication and authorization token, and mentioned its Datalog-based language for authorization policies. Let's see how it works!

From a personal blog to an entire newspaper

As an example, we will build up authorization policies, going from a small, personal blog, to a professional journal with multiple teams, editors, etc.

Since those policies will be written in Datalog, let's take a short look at that language first.

Side note: introduction to Datalog

Datalog is a declarative logic language that is a subset of Prolog. A Datalog program contains "facts", which represent data, and "rules", which can generate new facts from existing ones.

As an example, we could define the following facts, describing some relationships:

parent("Alice", "Bob");
parent("Bob", "Charles");
parent("Charles", "Denise");

This means that Alice is Bob's parent, and so on.

This could be seen as a table in a relational database:

parent
	name	child
	Alice	Bob
	Bob	Charles
	Charles	Denise

We can then define rules to query our data:

parent_of_charles($name) <-
  parent($name, "Charles");

This could be written in SQL as:

SELECT DISTINCT name FROM parent WHERE child = 'Charles';

(We use DISTINCT because Datalog always removes duplicate results.)

We can also use rules to create new facts, like this one (variables are introduced with the $ sign):

grandparent($grandparent, $child) <-
  parent($grandparent, $parent),
  parent($parent, $child);

You can read it as follows:

create the fact grandparent($grandparent, $child)
  IF
    there is a fact parent($grandparent, $parent)
    AND there is a fact parent($parent, $child)
    with matching $parent variable

or in SQL:

INSERT INTO grandparent( name, grandchild )
  SELECT A.name as name, B.child as grandchild
  FROM parent A, parent B
  WHERE A.child = B.name;

Applying this rule will look at combinations of the parent facts as defined on the right side of the arrow (the "body" of the rule), and try to match them to the variables ($grandparent, $parent, $child):

parent("Alice", "Bob"), parent("Bob", "Charles") matches because we can
replace $grandparent with "Alice", $parent with "Bob", $child with "Charles"
parent("Alice", "Bob"), parent("Charles", "Denise") does not match because
we would get different values for the $parent variable

For each matching combination of facts in the body, we will then generate a fact, as defined on the left side of the arrow, the head of the rule. For parent("Alice", "Bob"), parent("Bob", "Charles"), we would generate grandparent("Alice", "Charles"). A fact can be generated from multiple rules, but we will get only one instance of it.

Going through all the combinations, we will generate:

grandparent("Alice", "Charles");
grandparent("Bob", "Denise");

which can be seen as:

grandparent
	name	grandchild
	Alice	Charles
	Bob	Denise
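To make the rule evaluation more concrete, here is a minimal Rust sketch of the same join, assuming facts are stored as plain pairs of strings. The derive_grandparents function is a hypothetical helper written for this illustration; it is not how Biscuit or any real Datalog engine is implemented.

```rust
use std::collections::BTreeSet;

/// A `parent(name, child)` fact, stored as a pair of strings.
type Fact = (&'static str, &'static str);

/// Applies the `grandparent` rule: for every pair of `parent` facts that
/// agree on the `$parent` variable, produce a `grandparent` fact.
/// A set is used so that duplicate results are removed, as Datalog does.
fn derive_grandparents(parents: &[Fact]) -> BTreeSet<Fact> {
    let mut derived = BTreeSet::new();
    for &(grandparent, parent_a) in parents {
        for &(parent_b, child) in parents {
            // both body facts must bind the same value to $parent
            if parent_a == parent_b {
                derived.insert((grandparent, child));
            }
        }
    }
    derived
}
```

Calling it with the three parent facts above yields the two grandparent facts listed in the table.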

Interactions with a Datalog program are done through queries: a query contains a rule that we apply over the system, and it returns the generated facts.

First steps: personal blog

Note: you can follow along with the various steps of this tutorial in the online playground.

When we are the only user of that blog, we do not need much (honestly we could get away with just a random string in a cookie, but bear with me). We only need a way to identify ourselves to the blog engine's admin panel. So we could just consider the Biscuit token as a fancy JWT, that will only contain data (so, in Datalog, facts).

Our token will contain this fact: user(#authority, "user_1234").

Here, "user_1234" is our user id, and #authority is a special symbol that can only be added to facts in the first block of a token (or added by the verifier). A block contains facts (data), rules (to generate facts) and checks (queries used to validate the facts). Attenuation is done by adding more blocks. Since #authority facts are about the basic rights of a token, adding #authority facts would increase the number of rights. So we forbid adding #authority facts in additional blocks. Symbols, as indicated by the # prefix, are special strings that are internally replaced with integers, to compress tokens and accelerate evaluation.
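The idea of replacing symbols with integers can be sketched as a small interning table. This is only an illustration of the general technique: the SymbolTable type and its methods are hypothetical, and the actual encoding used inside Biscuit tokens may differ.

```rust
/// A minimal interning table: each distinct symbol name gets a small integer id.
#[derive(Default)]
struct SymbolTable {
    names: Vec<String>,
}

impl SymbolTable {
    /// Returns the id of `name`, inserting it if it was not seen before.
    fn intern(&mut self, name: &str) -> u64 {
        if let Some(index) = self.names.iter().position(|n| n.as_str() == name) {
            return index as u64;
        }
        self.names.push(name.to_string());
        (self.names.len() - 1) as u64
    }

    /// Looks up the name behind an id, if there is one.
    fn resolve(&self, id: u64) -> Option<&str> {
        self.names.get(id as usize).map(|s| s.as_str())
    }
}
```

With such a table, a symbol that appears in many facts is stored once as a string and everywhere else as an integer, and comparing two symbols becomes an integer comparison.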

The token can be serialized to a byte array (encoded with Protobuf) and then to base64 if we want to carry it in a cookie.

On the blog engine's side, we will only have this single line:

allow if user(#authority, "user_1234");

Biscuit can enforce authorization in 2 ways:

checks, starting with check if
allow/deny policies, starting with allow if or deny if

They work a bit like rules: if there's at least one combination of facts in the body (after the if) that fits, then it matches. They will not produce any fact.

To validate a token:

all of the checks must match. If one does not, fail
allow/deny policies are tried in order until one matches
- if allow matches, succeed
- if deny matches, fail
if none match, fail
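The validation steps above can be sketched in Rust as follows. This is a simplified model, assuming each check and policy has already been evaluated to "matched" or "not matched"; the Effect enum and the authorize function are hypothetical names for this illustration, not the Biscuit API.

```rust
/// Outcome attached to an allow/deny policy.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Effect {
    Allow,
    Deny,
}

/// Decides whether a request is authorized.
/// `checks` holds, for each check, whether it matched.
/// `policies` holds, in declaration order, each policy's effect and
/// whether it matched.
fn authorize(checks: &[bool], policies: &[(Effect, bool)]) -> bool {
    // every check must match, otherwise validation fails right away
    if checks.iter().any(|&matched| !matched) {
        return false;
    }
    // policies are tried in order; the first one that matches decides
    for &(effect, matched) in policies {
        if matched {
            return effect == Effect::Allow;
        }
    }
    // no policy matched: fail
    false
}
```

Note that the order of policies matters: a deny policy placed before a matching allow policy wins, which is why a catch-all deny is usually written last.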

Here the allow test will succeed if the token contains the fact user(#authority, "user_1234").

It is not very useful yet, but maybe we can add more features?

Next: multi-blog platform

After a few friends have seen your marvelous website, they ask if you could host their blogs on the same platform. So now you need more flexible authorization rules. We could keep the small tokens with the user id, but add more intelligence on the server's side.

First we need to indicate who owns which blog, with the format owner(#authority, $user_id, $blog_id). You can load this data when creating the verifier, from your database, from static files, etc.

owner(#authority, "user_1234", "blog1");
owner(#authority, "user_5678", "blog2");
owner(#authority, "user_1234", "blog3");

Here we own "blog1" and "blog3", and "user_5678" owns "blog2".

Now we need to actually validate the request, to see who has access to what. The request is represented through the #ambient facts, added to the verifier: you indicate to the verifier facts representing the current request like which resource is accessed, which operation (read, write, etc), the current time, the source IP address, etc. As an example, a PUT /blog1/article1 to modify an article could be translated as:

blog(#ambient, "blog1");
article(#ambient, "blog1", "article1");
operation(#ambient, #update);
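As a rough sketch of that translation, the function below turns an HTTP method and a path of the form /<blog_id>/<article_id> into the three ambient facts. The ambient_facts helper and its method mapping are assumptions made for this example (only GET and PUT are covered, as in this article), and it assumes the ids contain no quotes.

```rust
/// Builds the ambient facts for a request on `/<blog_id>/<article_id>`.
/// Returns `None` if the method or the path shape is not recognized.
fn ambient_facts(method: &str, path: &str) -> Option<Vec<String>> {
    // hypothetical mapping from HTTP methods to operation symbols
    let operation = match method {
        "GET" => "read",
        "PUT" => "update",
        _ => return None,
    };
    let mut segments = path.trim_start_matches('/').split('/');
    let blog_id = segments.next().filter(|s| !s.is_empty())?;
    let article_id = segments.next().filter(|s| !s.is_empty())?;
    // reject paths with extra segments
    if segments.next().is_some() {
        return None;
    }
    Some(vec![
        format!("blog(#ambient, \"{}\")", blog_id),
        format!("article(#ambient, \"{}\", \"{}\")", blog_id, article_id),
        format!("operation(#ambient, #{})", operation),
    ])
}
```

The resulting strings have the same shape as the facts shown above and could then be handed to the verifier one by one.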

In the verifier, we add a rule to indicate that the owner of a blog has full rights on it:

right(#authority, $blog_id, $article_id, $operation) <-
    article(#ambient, $blog_id, $article_id),
    operation(#ambient, $operation),
    user(#authority, $user_id),
    owner(#authority, $user_id, $blog_id);

If this rule finds a matching set of facts, it will produce a right(...) fact.

The verifier will also use an allow policy for the presence of that right (you will see why we separate them in the next section):

allow if
  blog(#ambient, $blog_id),
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  right(#authority, $blog_id, $article_id, $operation);

// unauthenticated users have read access
allow if
  operation(#ambient, #read);

// catch all rule in case the allow did not match
deny if true;

So if we tried to do a PUT /blog1/article1 with the token containing user(#authority, "user_1234"), we would end up with the following facts:

user(#authority, "user_1234");
blog(#ambient, "blog1");
article(#ambient, "blog1", "article1");
operation(#ambient, #update);
owner(#authority, "user_1234", "blog1");
owner(#authority, "user_5678", "blog2");
owner(#authority, "user_1234", "blog3");

If we applied the verifier's rule, we would end up with:

right(#authority, "blog1", "article1", #update) <-
    owner(#authority, "user_1234", "blog1"),
    article(#ambient, "blog1", "article1"),
    user(#authority, "user_1234"),
    operation(#ambient, #update);

So we end up with the new fact right(#authority, "blog1", "article1", #update).
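For readers more comfortable with imperative code, here is a small Rust sketch of what applying the owner rule amounts to. The World struct and the owner_right function are hypothetical and only model the facts used by this one rule; the real verifier evaluates arbitrary Datalog rules.

```rust
/// Facts known to the verifier for one request (only those used by the rule).
struct World<'a> {
    /// user(#authority, $user_id), absent for unauthenticated requests
    user: Option<&'a str>,
    /// article(#ambient, $blog_id, $article_id)
    article: (&'a str, &'a str),
    /// operation(#ambient, $operation)
    operation: &'a str,
    /// owner(#authority, $user_id, $blog_id)
    owners: Vec<(&'a str, &'a str)>,
}

/// Applies the owner rule: returns the derived
/// right(blog_id, article_id, operation) if the body matches.
fn owner_right<'a>(world: &World<'a>) -> Option<(&'a str, &'a str, &'a str)> {
    let user_id = world.user?;
    let (blog_id, article_id) = world.article;
    // the same $user_id and $blog_id must appear in an owner fact
    let owns = world
        .owners
        .iter()
        .any(|&(owner, blog)| owner == user_id && blog == blog_id);
    if owns {
        Some((blog_id, article_id, world.operation))
    } else {
        None
    }
}
```

With the facts listed above, "user_1234" gets the right on "blog1", while "user_5678" gets nothing for the same request.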

Now the verifier applies the allow policy:

allow if
  blog(#ambient, "blog1"),
  article(#ambient, "blog1", "article1"),
  operation(#ambient, #update),
  right(#authority, "blog1", "article1", #update);

And the test succeeds! If we had tried the request with a token containing user(#authority, "user_5678"), the rule would not have produced the right() fact, and it would have failed.

Now if we did a GET /blog1/article1 request, without being the owner of the blog, we would have matched allow if operation(#ambient, #read).

But maybe we don't want to have all articles available by default, maybe some of them are still being written, so let's remove that allow policy. Instead, we mark an article as publicly readable by creating the fact readable(#authority, $blog_id, $article_id), and use this policy:

allow if
  operation(#ambient, #read),
  article(#ambient, $blog_id, $article_id),
  readable(#authority, $blog_id, $article_id);

So if we did a GET /blog1/article1 request with that article marked as readable, we would get the facts:

blog(#ambient, "blog1");
article(#ambient, "blog1", "article1");
operation(#ambient, #read);
owner(#authority, "user_1234", "blog1");
owner(#authority, "user_5678", "blog2");
owner(#authority, "user_1234", "blog3");
readable(#authority, "blog1", "article1");

The test would apply as follows:

allow if
  operation(#ambient, #read),
  article(#ambient, "blog1", "article1"),
  readable(#authority, "blog1", "article1");

And we got access. In a few lines, we created basic rules to protect our blog platform. But users need more features!

Add reviewers

Often, we'd like to ask friends and colleagues to review articles before they are published. In our system, it could be done in two ways:

mint a token containing only right(#authority, "blog1", "article1", #read)
derive the user's token, adding a check restricting to the article

In the second case, the token would look like this:

Block 0 (authority):
  facts: [ user(#authority, "user_1234") ]
  rules: []
  checks: []

Block 1:
  facts: []
  rules: []
  checks: [
    check if article(#ambient, "blog1", "article1"), operation(#ambient, #read)
  ]

If we tried to do a PUT /blog1/article1, the verifier's checks would succeed, but the token's check would fail, because it does not find the operation(#ambient, #read) fact. But for a GET /blog1/article1, all checks would succeed. The reviewer will not be able to remove the block while keeping a valid signature, so any alteration will result in a failed request.
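Here is a minimal Rust sketch of that behaviour, assuming a token is just a list of blocks and each block a list of checks. It deliberately ignores signatures, and the Request and ArticleCheck types as well as the attenuate and token_checks_pass functions are hypothetical names for this illustration.

```rust
/// The ambient facts describing the current request.
struct Request {
    blog: &'static str,
    article: &'static str,
    operation: &'static str,
}

/// A check restricting a token to one article and one operation,
/// like the one carried by block 1 above.
struct ArticleCheck {
    blog: &'static str,
    article: &'static str,
    operation: &'static str,
}

impl ArticleCheck {
    fn matches(&self, request: &Request) -> bool {
        self.blog == request.blog
            && self.article == request.article
            && self.operation == request.operation
    }
}

/// Attenuation: append a new block holding one check.
/// Existing blocks are left untouched.
fn attenuate(mut blocks: Vec<Vec<ArticleCheck>>, check: ArticleCheck) -> Vec<Vec<ArticleCheck>> {
    blocks.push(vec![check]);
    blocks
}

/// Every check of every block must match the request.
fn token_checks_pass(blocks: &[Vec<ArticleCheck>], request: &Request) -> bool {
    blocks.iter().flatten().all(|check| check.matches(request))
}
```

Since blocks can only be appended and every check must pass, a derived token can never accept a request that the original token would have refused.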

Premium accounts

Now some of the blog authors want to make a living out of it (come on, it's 2021, do a newsletter instead) and mark some articles as "premium", so that only some users can access them.

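We can do that by having premium_user(#authority, $user_id, $blog_id) facts for the users who have access, premium_readable(#authority, $blog_id, $article_id) facts for the restricted articles, and adding a rule on the verifier's side: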

right(#authority, $blog_id, $article_id, #read) <-
  article(#ambient, $blog_id, $article_id),
  premium_readable(#authority, $blog_id, $article_id),
  user(#authority, $user_id),
  premium_user(#authority, $user_id, $blog_id);

We could even add a feature like LWN.net where a paying user can share a premium article, by deriving their tokens to only accept that article.

We're a big newspaper now, we want roles and teams

Against all odds, our blog platform is a smashing success. We need to recruit journalists, editors, copywriters… So now we might need more flexible rights management, maybe some teams and roles?

Let's define more facts and rules to encode that. As an example, let's define a "contributor" role that can only read or write articles, while owners are the only ones who can create or delete.

right(#authority, $blog_id, $article_id, $operation) <-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  contributor(#authority, $user_id, $blog_id),
  [#read, #update].contains($operation);

What you can see on the last line is an expression: Biscuit's Datalog implementation can require additional conditions on some values, like a string matching a regular expression, a date being earlier than an expiration date, or, as here, presence in a set. This rule will only produce a fact if the operation is #read or #update.
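In imperative terms, the expression acts as a filter on the candidate values of $operation. A minimal Rust sketch, where contributor_right and CONTRIBUTOR_OPERATIONS are hypothetical names and the contributor fact is reduced to a boolean for brevity:

```rust
/// Operations granted to contributors, mirroring the set in the rule above.
const CONTRIBUTOR_OPERATIONS: [&str; 2] = ["read", "update"];

/// Returns true when the contributor rule would produce a `right` fact:
/// the user must be a contributor on the blog, and the requested
/// operation must belong to the allowed set.
fn contributor_right(is_contributor: bool, operation: &str) -> bool {
    is_contributor && CONTRIBUTOR_OPERATIONS.contains(&operation)
}
```

A contributor asking for a delete operation gets no right fact from this rule, so only another rule (like the owner one) could authorize it.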

Now, we want to define contributor teams to manage them more easily. So we will introduce the team(#authority, $team_id), member(#authority, $user_id, $team_id) and team_role(#authority, $team_id, $blog_id, #contributor) facts.

Additionally, we insert this rule in the verifier:

contributor(#authority, $user_id, $blog_id) <-
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor);

This rule will generate the contributor fact for a blog if we are a member of a team that has the "contributor" team role.
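The same derivation can be sketched in Rust as a join between the member and team_role facts on the team id. The contributor_blogs function is a hypothetical helper written for this illustration.

```rust
use std::collections::BTreeSet;

/// Returns the blogs on which `user_id` is a contributor.
/// `members` holds member(user_id, team_id) facts,
/// `team_roles` holds team_role(team_id, blog_id, role) facts.
fn contributor_blogs<'a>(
    user_id: &str,
    members: &[(&'a str, &'a str)],
    team_roles: &[(&'a str, &'a str, &'a str)],
) -> BTreeSet<&'a str> {
    let mut blogs = BTreeSet::new();
    for &(member_id, member_team) in members {
        if member_id != user_id {
            continue;
        }
        for &(team_id, blog_id, role) in team_roles {
            // the $team_id variable must match in both facts
            if team_id == member_team && role == "contributor" {
                blogs.insert(blog_id);
            }
        }
    }
    blogs
}
```

For example, with made-up facts where "user_1234" is a member of "team_a" and "team_a" has the contributor role on "blog2", the function returns a set containing only "blog2".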

We could also fold the two previous rules into one:

right(#authority, $blog_id, $article_id, $operation) <-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor),
  [#read, #update].contains($operation);

And that's it! With a few rules, we can model more and more complex authorization patterns, some of them relying on user-provided policies, without compromising the previous features. Rules are additive, so there's no need for a long chain of if/else and special cases hardcoded in some endpoints. Everything can be managed in one place.

To sum up the rules of our system:

// the owner has all rights
right(#authority, $blog_id, $article_id, $operation) <-
    article(#ambient, $blog_id, $article_id),
    operation(#ambient, $operation),
    user(#authority, $user_id),
    owner(#authority, $user_id, $blog_id);

// premium users can access some restricted articles
right(#authority, $blog_id, $article_id, #read) <-
  article(#ambient, $blog_id, $article_id),
  premium_readable(#authority, $blog_id, $article_id),
  user(#authority, $user_id),
  premium_user(#authority, $user_id, $blog_id);

// define teams and roles
right(#authority, $blog_id, $article_id, $operation) <-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor),
  [#read, #update].contains($operation);

// unauthenticated users have read access on published articles
allow if
  operation(#ambient, #read),
  article(#ambient, $blog_id, $article_id),
  readable(#authority, $blog_id, $article_id);

// authorize if got the rights on this blog and article
allow if
  blog(#ambient, $blog_id),
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  right(#authority, $blog_id, $article_id, $operation);


// catch all rule in case the allow did not match
deny if true;

And here is an example Rust program reproducing this authorization system:

use biscuit::{crypto::KeyPair, error, token::Biscuit, parser::parse_source};
use biscuit_auth as biscuit;

fn main() -> Result<(), error::Token> {
    let start = std::time::Instant::now();

    // First, let's create the root key for the system
    // its public part will be used to verify the token
    let root = KeyPair::new();

    // Token creation
    // we will add a single fact indicating identity
    let mut builder = Biscuit::builder(&root);
    builder.add_authority_fact("user(#authority, \"user_1234\")")?;

    let token = builder.build()?;
    println!("{}", token.print());
    let token_bytes = token.to_vec()?;
    let serialized = base64::encode_config(&token_bytes, base64::URL_SAFE);
    println!("serialized ({} bytes): {}", token_bytes.len(), serialized);

    let deserialized_token = Biscuit::from(&token_bytes)?;
    // Token verification
    // first, we validate the signature with the root public key
    let mut verifier = deserialized_token.verify(root.public())?;

    // simulate verification for PUT /blog1/article1
    verifier.add_fact("blog(#ambient, \"blog1\")")?;
    verifier.add_fact("article(#ambient, \"blog1\", \"article1\")")?;
    verifier.add_fact("operation(#ambient, #update)")?;

    // add ownership information
    // we only need to load facts related to the blog and article we're accessing
    verifier.add_fact("owner(#authority, \"user_1234\", \"blog1\")")?;
    //verifier.add_fact("owner(#authority, \"user_5678\", \"blog2\")")?;
    //verifier.add_fact("owner(#authority, \"user_1234\", \"blog3\")")?;

    let (_remaining_input, mut policies) = parse_source("
// the owner has all rights
right(#authority, $blog_id, $article_id, $operation) <-
    article(#ambient, $blog_id, $article_id),
    operation(#ambient, $operation),
    user(#authority, $user_id),
    owner(#authority, $user_id, $blog_id);

// premium users can access some restricted articles
right(#authority, $blog_id, $article_id, #read) <-
  article(#ambient, $blog_id, $article_id),
  premium_readable(#authority, $blog_id, $article_id),
  user(#authority, $user_id),
  premium_user(#authority, $user_id, $blog_id);

// define teams and roles
right(#authority, $blog_id, $article_id, $operation) <-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor),
  [#read, #update].contains($operation);

// unauthenticated users have read access on published articles
allow if
  operation(#ambient, #read),
  article(#ambient, $blog_id, $article_id),
  readable(#authority, $blog_id, $article_id);

// authorize if got the rights on this blog and article
allow if
  blog(#ambient, $blog_id),
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  right(#authority, $blog_id, $article_id, $operation);


// catch all rule in case the allow did not match
deny if true;
    ").unwrap();

    for (_span, fact) in policies.facts.drain(..) {
        verifier.add_fact(fact)?;
    }

    for (_span, rule) in policies.rules.drain(..) {
        verifier.add_rule(rule)?;
    }

    for (_span, check) in policies.checks.drain(..) {
        verifier.add_check(check)?;
    }

    for (_span, policy) in policies.policies.drain(..) {
        verifier.add_policy(policy)?;
    }

    let res = verifier.verify()?;
    let dur = std::time::Instant::now() - start;
    //println!("res: {:?}", res);
    println!("{}", verifier.print_world());

    println!("ran in {:?}", dur);
    Ok(())
}

The entire program (key generation, token creation, serialization, deserialization, signature validation and facts verification) runs in 0.5 ms. So even with all of these features, Biscuit is fast enough to get out of your way.

What's next

You can already start using Biscuit in Rust, Java and Go.

The Rust version can also generate C bindings, currently used to develop a Haskell version, and there is a WebAssembly wrapper.

As an example integration, you can check out a Biscuit based authorization plugin for Apache Pulsar.

The specification is developed in the open, and you can contribute.
