A Dive Into the Math Behind Bitcoin Schnorr Signatures

Conduition

2023-06-23

cryptography

Introduction

Many culture articles in the Bitcoin space will extol the sound byte:

Schnorr signatures will improve Bitcoin’s privacy for multisignature transactions!

…and yet most choose to omit the clever math which makes this statement true. The math isn’t mind-bogglingly complex. One needs only a basic grasp of elliptic curve cryptography, and the benefits of Schnorr Signatures quickly become very exciting.

Preliminaries

Just so we’re all on the same page:

Notation	Meaning
	The base-point of the secp256k1 curve.
	The message we’re trying to sign (a byte array).
	The SHA-256 hash function.
	The order of the secp256k1 curve. There are possible valid non-zero points on the curve, plus the ‘infinity’ point (AKA zero).
	Sampling randomly from the set of integers modulo . Note that we exclude zero when sampling.
	Concatenation of the byte arrays and .

In general, upper-case variables like refer to points on the elliptic curve, while lower case letters like refer to ordinary natural numbers, called scalars.

Enter the Frankenstein

To understand how delicious Schnorr Signatures are for Bitcoin developers, we need to compare it to what we have now: ECDSA. I’ll forgive you if you choose to skip this part.

Show me the ugly ECDSA math!

I guess you’re really gonna make me huh? Best get this over with, then. _{*drinking intensifies}

The old ECDSA protocol signs messages like so:

Choose our private key, , which is just a random integer in .

Compute the hash of the message and interpret it as an integer .

Sample a random nonce .

Multiply by the base point .

Take the coordinate from that computation modulo .

* _{If , return to step 4 and choose a new nonce .}

Signing time! Compute this mess.

* _{If , return to step 4 and choose a new nonce .}

** _{is the modular multiplicative inverse of .}

The final signature is the awful pair of integers .

Verifying the signature goes (painfully) as follows:

Find (or compute) the public key.

Check that both and are integers in .

Compute the hash of the message that was signed, just as the signer would have.

Compute these intermediate “u-values”.

Multiply and by the base point and public key respectively.

Sum both points to get a verification point.

Verify the X-coordinate of that point is equivalent to the signature’s component when taken modulo .

If so, then wham you’ve got a valid signature. “That was easy wasn’t it?” _{he mused ironically…}

but why does that work?

-I hear you ask.

I am not a masochist, so I will spare myself more work rewriting a proof for an algorithm which I’m taking such pains to paint as awful. Instead, go check out this proof of ECDSA in Doctor Nakov’s Practical Cryptography for Developers e-book.

but why is ECDSA so awful?

For starters, it requires computing modular multiplicative inverses of the nonce when signing, and of the signature when verifying. Yuck. Well, more accurately, yawn. MMI’s are very slow compared to other discrete math operations.

Furthermore, just look at the number of steps I had to write out above. It’s less a signing algorithm than a surgical procedure which happens to involve your private keys.

Also, ECDSA signatures are malleable. is a valid signature, but so is ! This can mess with some applications if they expect signatures not to change. Legacy Bitcoin P2PKH addresses are still vulnerable to this - any transaction which spends from P2PKH can have its transaction ID changed if the signature value is inverted.

Furtherermore, ECDSA has pretty weak security proofs.

Furtherestmore, ECDSA (and its forerunner DSA) were originally published in the early nineties to get around the patent on Schnorr Signatures, which finally lapsed in 2008. Bitcoin has been using a knock-off of Schnorr since the genesis block. It’s time to finally upgrade to the brand-name.

Schnorr Signatures

Compared to ECDSA, Schnorr Signatures are a breath of fresh air for their elegant simplicity.

Sample a random nonce .

Multiply by the base-point .

Hash the nonce point , the signing public key , and the message to get the challenge .

* _{If you read about Schnorr Signatures elsewhere online, you might see the challenge computed as . The variant used in Bitcoin is called Key-Prefixed Schnorr, where the challenge also commits to the signing key.}

Compute the signature using the private key .

The final signature is the tuple , where is a point on the curve and is a scalar value.

Verifying Schnorr Signature is easy-peezy. Just multiply both sides of the above equation by and check for equality.

The verifier is assumed to know the public key and the message - otherwise what are they verifying, anyways? Verifying is as fast as a hash, followed by two point-multiplication operations and one point-addition operation.

It’s pretty easy to prove that the signature is valid by simply factoring out .

Thanks to the properties of cryptographically secure elliptic curves like secp256k1, factoring a curve point to find its discrete logarithm isn’t computationally feasible (yet). This is why signatures cannot be forged without knowing the private key , but they can be easily verified using .

but why do these particular equations work?

Far be it from me to reverse-justify Schnorr’s design, but perhaps I can at least point out what each step and each variable is doing from the perspective of someone attempting to attack the scheme.

Recall the definition of a Schnorr Signature.

Note that:

and are both randomly sampled from the set of integers modulo , AKA .
The nonce changes for every signature whereas the private key remains consistent.
can be computed by anyone who knows the (presumably) public parameters .

When the signer multiplies their private key with the challenge , this results in another scalar value somewhere in which they could only have computed if they knew . For instance, someone with only and would have no idea how to compute .

So what’s the point of adding ? Why not make the signature ? Because is public, so any observer could compute the private key by inverting .

This is why must be secret and uniformly random: These properties prevent an observer from being able to compute the signer’s private key from the signature.

If an observer knew the value used on a signature , they could compute the signer’s private key.

Another common gotcha: must be different for every signature. If the same nonce is used in two different signatures made by the same private key on distinct messages , then an observer can easily compute the private key used to sign both messages by solving a system of equations.

Interestingly, that definition of indicates something quite unexpected. is the slope of the line connecting the points and on the Cartesian plane, but only if is reused between both signatures.

Contrastingly, if we used two different nonces when creating the two signatures, an observer cannot solve for without knowing either or , because there is no usable system of equations.

What if we used different nonces to sign the same message? Would this result in any Bad News™? No, because remember that the challenge also commits to the nonce.

If the nonce changes, so does . Even if only committed to and , an attacker would still need to know or to compute .

but why is this so cool?

Schnorr is faster.
Schnorr is simpler to implement than ECDSA.
Schnorr signatures aren’t malleable.
Schnorr has very solid security proofs.
Schnorr permits linear signature aggregation.

I bolded #5 there because that one feature is incredibly important for the future potential of Bitcoin.

Aggregator or Crocodile?

Signature aggregation is ~~not possible~~ very difficult in ECDSA. ECDSA Threshold signatures are perfectly achievable, and these are handy but nowhere near as easy or as fast as linear signature aggregation with Schnorr. And as if it needed to flex even harder, Schnorr can do threshold signatures even better than ECDSA.

When I say Schnorr Signatures are linear, this means that the only operations needed for Schnorr are simple scalar addition and multiplication. Verification is the same: one only needs point-addition and point-multiplication to verify the signature. ECDSA on the other hand, requires modular multiplicative inversion to sign and verify.

This opaque-sounding explanation can be boiled down to one highly consequential fact:

The sum of a set of Schnorr signatures on a given message is a valid signature under the sum of the public keys which made those signatures.

In other words, if a bunch of different signers cooperatively sign the same message, those signatures can be summed up to produce an aggregated signature. If one also sums up the pubkeys of those who signed to get an aggregated pubkey, the aggregated signature will be valid under the aggregated pubkey.

Naive Example

Let’s say you have three parties with their own private and public key pairs. They each sample their own private random nonce.

Name	Public Key	Private Key	Nonce
Alice
Bob
Carol

They all want to sign the same message , and naturally have agreed on the base point and hash function .

They each compute their own public nonce points independently.

They agree on an aggregated nonce point .

They agree on an aggregated public key .

They each hash the message with this aggregated nonce point and pubkey.

They each compute their share of the signature.

They send each other their values and aggregate by summing them all up.

The final aggregated signature is .

To verify, recall that a signature from pubkey is valid on message if:

And this is going to be the case for our aggregated signature. Recall how Alice, Bob and Carol aggregated .

Our verification works out because by aggregating the signatures, we also aggregated the individual nonces and the private keys (multiplied by ). We can then factor out , meaning we kinda multiplied it with an aggregated private key, and added an aggregated secret nonce, even though participants never exposed their private keys or secret nonces to one another.

Multiplying and will thus satisfy the equality.

HOWEVER

This example has a hidden flaw if used as a multisignature protocol under adversarial conditions (when co-signers don’t trust each other). Can you see why?

_{hint: the attack vector is very early in the aggregation procedure.}

Tell me the answer already!

A naive protocol like this is vulnerable to a Rogue Key Attack when co-signing with untrusted peers. This attack can be performed by whomever is last to share their public key with the other signers.

Let’s return to Alice, Bob, and Carol.

Alice tells Bob and Carol her pubkey .
Bob tells Alice and Carol his pubkey .
Alice and Bob now expect Carol to share her public key.

Carol has a sneaky option here. She can give Alice and Bob a phony public key which is actually the same as her public key but with Alice’s key and Bob’s key subtracted from it.

Carol doesn’t know the discrete log (private key) of , but that’s OK, because when everyone computes the final aggregated public key, they’ll sum each other’s keys together, and Alice and Bob’s keys will cancel out.

Carol has just fooled Alice and Bob into agreeing to use an apparently aggregated key which only Carol controls.

Neither Alice nor Bob would be able to distinguish between Carol’s authentic key and the phoney key because they don’t know belongs to Carol! That’s why they were sharing the keys in the first place.

Carol could also collude with Bob to exclude Alice’s key from the final aggregated key, like so.

The Solution

Rogue Key Attacks can be fixed naively by requiring that each signer to prove she knows the private key for her public key. Such an affirmation is called Knowledge-of-Secret-Key (KOSK). This is flawed because not every key I want to aggregate with is going to be fully under my control all the time. Perhaps I want to aggregate a public key right now, but I can only expect to learn its secret key next week (e.g. in a Discreet Log Contract. Perhaps I want to aggregate a public key which is itself an aggregated key, whose component secret keys are owned by 3rd parties.

Instead modern Bitcoin developers use a kind of scrambling protocol to avoid the risk of rogue keys. This is what MuSig offers. ~~I’m looking forward to discussing MuSig in another post~~ Take a look at my article about MuSig to learn what all the fuss is about.

Benefits

In combination with Taproot, developers can embed arbitrary combinations of aggregated public keys into a single tweaked public key which encodes a highly complex array of possible spending conditions, but still looks like any old normal public key when used under normal conditions.

Bitcoin developers can use multisignature signing schemes like MuSig and threshold signing protocols like ROAST with much greater efficiency, security, and privacy than has ever been possible before with ECDSA.

Any given public key can now be an aggregation of colossal numbers of child keys and scripts and threshold public keys and much much more.