Detecting boomerang values in zero-knowledge circuits using tag analysis

on Aug 25, 2023

zkApps (zero-knowledge applications) are meant to be provable translations of computer programs. That is, they should behave like the computer programs they are intending to represent, while also providing proofs of correct execution that can accompany program outputs. But as we’ve talked about before, translating programs into provable programs is not a straight forward process. To prove executions, zkApps attempt to “witness” all the important parts of an execution trace. They seek to assert every nook and corner specified (explicitly or implicitly) in the original application. The important keywords here are “all” and “every”, as bugs can occur if even the tiniest part of the original program is missing in the assertions of the zkApp.

boomerang

In this article, we want to talk about boomerang values, which are one example of gotchas that occur when writing zk apps. What are boomerang values? What kind of bugs do they produce? And what can we do with them? Read on!

Mixing in-circuit and out-of-circuit logic

Developers write zkApps to encode the logic they want to witness in a higher-level language. These higher-level languages can take different shapes and forms (as we described in this post) and a number of them will commonly mix non-circuit (or out-of-circuit) logic with circuit (or in-circuit) logic.

The typical example of why one would even want something like that (or what does that even mean to mix out-of-circuit with in-circuit logic) is the division example, in which it’s much easier to just compute the result of a division out-of-circuit, and then provide it like it fell from the sky.

Here’s how the division function is typically written in zkApps, in some python pseudo-code:

def divide(a_var, b_var):
	# compute the division out-of-circuit
	c = a_var.value() / b_var.value()

	# witness / insert the out-of-circuit value in the circuit
	c_var = new_var(c)

	# constrain it to be the correct result of the division
	assert(c_var * b_var == a_var)

	return c

If this doesn’t immediately make sense to you, think about it this way: the zkApp encodes (at compile time) some assertions on variables. Here it will encode the fact that there’s a variable c_var which can take any value as long as it satisfies c_var * b_var == a_var.

At runtime (or proving time), when I (or you) want to execute the program and create a proof for it, we’ll give actual values to a_var and b_var and create a proof based on that. Similarly, a value will be given to c_var, which will be computed based on the out-of-circuit logic above.

The important part, is that the divide function defines both the compile-time logic, and the runtime logic. While compile time is something that is done by both the prover and verifier, the runtime logic is something that is ran only by the prover, and thus a malicious prover can decide to do whatever they want. Especially, they do not have to compute c as a / b as described in the program!

Lifetime of a variable and linking issues

In our above program, a malicious prover trying to compute c as anything else but a / b will fail because the circuit checks that the division is correct. Remove the assertion and you have an underconstrained circuit (it’s a bug). In this article, we’re interested by a type of issue that belongs to this class of underconstrained circuit bug.

A variable in a circuit has a life. It is first created (and given a value at runtime), and it will then be mutated and used in all sorts of ways. Yes, like a variable in a normal program. In a sense, variables are wired to different parts of a circuit logic.

But that wire can be cut. For example, imagine the following pseudo-code:

# create a new circuit variable
a_var = new_var(some_value);

# do something with it...

# get the value out-of-circuit
a_val = a_var.value()

# potentially do something important with the value...

# reinsert the value in the circuit
b_var = new_var(a_val)

This is quite the generic and vague example, but it should be enough to illustrate my point. Here, you should be able to tell that something is wrong. If not, maybe take a few minutes to think about it.

The problem is that the above program gives the following instruction to the prover: “observe the runtime value of a_var, then reinsert it somewhere else in the program (under b_var)”. If you’re a malicious prover, what you’re probably hearing is “add whatever you want as b_var”.

The wire was cut, the link was broken. We call these boomerang values, as they leave the circuit and get reinserted later. They’re almost always a bad thing.

It’s interesting because intuitively, it seems like b_var is the only thing underconstrained now, but in reality it goes both way! Perhaps you want to keep the correct value as b_var, but now you can use a different value as a_var because it is not correlated to the correct value of b_var anymore.

In other words, if the correct values of a_var and b_var were the number 5, then a malicious prover could decide to keep b_var.value() = 5 and change the value of a_var to something else, or they could decide to keep the value of a_var.value() = 5 and change b_var to something else.

For an example of the impact this type of bug can have, you can check our public audit of Penumbra where we found a boomerang value that led to a double-spend attack. Today, what we’ll be looking at is how we can detect such bugs in a Rust stack, specifically one making use of the Arkworks’ R1CS library r1cs-std.

The ABCs of R1CS-std

When writing circuits with r1cs-std, the above pseudo-functions that I used to create new variables as well as obtain the runtime value of a variable are implemented through the use of the two functions new_variable() and value(). These two functions are located in two different traits that developers can implement on their own custom types.

Here’s new_variable:

pub trait AllocVar<V, F: Field>
where
    Self: Sized,
    V: ?Sized,
{
    /// Allocates a new variable of type `Self` in the `ConstraintSystem` `cs`.
    /// The mode of allocation is decided by `mode`.
    fn new_variable<T: Borrow<V>>(
        cs: impl Into<Namespace<F>>,
        f: impl FnOnce() -> Result<T, SynthesisError>,
        mode: AllocationMode,
    ) -> Result<Self, SynthesisError>;

As you can see, it does not take a value but a function/closure f which produces the value (but this is just a detail).

And here’s value():

/// This trait describes some core functionality that is common to high-level
/// variables, such as `Boolean`s, `FieldVar`s, `GroupVar`s, etc.
pub trait R1CSVar<F: Field> {
    /// The type of the "native" value that `Self` represents in the constraint
    /// system.
    type Value: core::fmt::Debug + Eq + Clone;

    /// Returns the value that is assigned to `self` in the underlying
    /// `ConstraintSystem`.
    fn value(&self) -> Result<Self::Value, ark_relations::r1cs::SynthesisError>;

So the boomerang example would look like this if we were to write it using arkworks’ r1cs-std library:

// create a new circuit variable
let a_var = MyType::new_variable(cs, || Ok(some_value), AllocationMode::Witness);

// do something with it...

// get the value out-of-circuit
let a_val = a_var.value().unwrap_or_default();

// potentially do something important with the value...

// reinsert the value in the circuit
let b_var = OtherType::new_variable(cs, || Ok(a_val), AllocationMode::Witness);

Tag analysis to the rescue

It is well known that formal verification methods are usually too involved to be useful for the majority of projects. Thus, their development mostly serve as marketing tools for companies to attract new customers, get government grants, and attract researchers to work for them.

Yet, there’s been a movement to make security tools actually useful to developers. The idea is to meet the developers where they are at. You can read more about that in Towards making formal methods normal: meeting developers where they are. Examples of this are easier-to-use formal verification tools (like Verifpal) or integrated analysis tools (like how fuzzing is directly integrated in Golang, or good libraries for property testing in Rust).

MIRAI is one more example of that, which initially started as a tool (by Herman Venter) to enforce invariants in the libra/diem blockchain. While MIRAI provides various features, what I’m interested in today is its tag analysis feature.

Let me explain what tag analysis is about with the following example: imagine that your program accepts user inputs, which it wants to deem dangerous until they get sanitized or properly validated. Using tag analysis, we could tag user inputs as dirty and then verify in important places of the program that values they operate on don’t have the dirty tag. Inversely, we could verify that important functions operate only on values that have a clean tag, which would be added in the sanitization/validation functions.

(One could argue that in typed languages, you could do the same by creating different types for dirty and clean values, but this still is not enough. Changes in the code, or human error, could still allow valid programs to execute on dirty values. Tag analysis, if implemented correctly, will propagate tags in exhaustive ways. If you copy, clone, or even do something based on a value the tag will be propagated to the new value.)

Using MIRAI, we can add an “out-of-circuit” tag to any values that comes out of circuit, by placing the tag in the R1CSVar::value() function:

#![cfg_attr(mirai, allow(incomplete_features), feature(generic_const_exprs))]

#[macro_use]
extern crate mirai_annotations;

use mirai_annotations::{TagPropagation, TagPropagationSet, TAG_PROPAGATION_ALL};

struct OutOfCircuitTaintKind<const MASK: TagPropagationSet> {}

/// our tag with all propagation set
type OutOfCircuitTaint = OutOfCircuitTaintKind<TAG_PROPAGATION_ALL>;

impl R1CSVar<Fq> for MyVar {
    type Value = MyValue;

    fn value(&self) -> Result<Self::Value, SynthesisError> {
        // TRUNCATED...

        add_tag!(&res, OutOfCircuitTaint);
        Ok(res)
    }
}

On the other hand, we can detect when such a value enters back in the circuit by detecting the tag in the AllocVar::new_variable() function:

impl AllocVar<MyValue, Fq> for MyVar {
    fn new_variable<T: std::borrow::Borrow<MyValue>>(
        cs: impl Into<ark_relations::r1cs::Namespace<Fq>>,
        f: impl FnOnce() -> Result<T, SynthesisError>,
        mode: ark_r1cs_std::prelude::AllocationMode,
    ) -> Result<Self, SynthesisError> {
        verify!(does_not_have_tag!(&f, OutOfCircuitTaint));

        // TRUNCATED...

Fuzzing for the rest

For completeness, I should mention fuzzing which is also a great tool to find bugs in circuits. Fuzzing is the process of generating weird inputs, feeding them to a function, and seeing if we can crash the function with it. Rinse and repeat. As such, fuzzing is a dynamic analysis tool–it runs the code to find bugs–and is quite different from the static analysis tools we’ve been talking about. You never get a guarantee that your code is correct, and you can run a fuzzer for as long as you want and still not find a bug eventhough you know it exists. Surprisingly though, fuzzers have had amazing results at finding bugs in real-world code.

For us though, it seems like fuzzing will be more useful to find completeness issues (where some inputs should work but don’t) but not necessarily useful at finding soundness issues (some inputs work when they shouldn’t).

I won’t talk much more about fuzzing as it is not the main topic of this post, but I’ll give you some pointers if you’re interested in running some yourself. In Rust, it is relatively easy to write fuzzers using cargo fuzz. I recommend not using libfuzzer and using instead AFL++ as fuzzer in its backend. As proving is a relatively slow process, one should try to fuzz by checking circuit satisfiability instead. One can do this with the following code (given a circuit variable):

let cs = ark_relations::r1cs::ConstraintSystem::<Fq>::new_ref();
cs.set_optimization_goal(ark_relations::r1cs::OptimizationGoal::Constraints);

let satisfied1 = circuit.generate_constraints(cs.clone()).is_ok();
let satisfied2 = cs.is_satisfied().is_ok();
if should_work {
    assert!(satisfied1 && satisfied2);
} else {
    assert!(!satified1 || !satisfied2);
}

In most cases you will have to use structured fuzzing in order to guide cargo fuzz: as it only understands how to produce “random” bytestrings, it will need some guidance on how to convert that into private and public inputs to properly fuzz the circuit.

Of course, these are not the only tools at our disposal. Stay tuned to learn more about our in-house toolings in the next blog post!