Superpower 3.2.2-dev-00214

A parser combinator library based on Sprache. Superpower generates friendlier error messages through its support for token-driven parsers.

What is Superpower?

The job of a parser is to take a sequence of characters as input, and produce a data structure that's easier for a program to analyze, manipulate, or transform. From this point of view, a parser is just a function from string to T - where T might be anything from a simple number or a list of fields in a data format to the abstract syntax tree of some kind of programming language.

Just like other kinds of functions, parsers can be built by hand, from scratch. This is-or-isn't a lot of fun, depending on the complexity of the parser you need to build (and how you plan to spend your next few dozen nights and weekends).

Superpower is a library for writing parsers in a declarative style that mirrors the structure of the target grammar. Parsers built with Superpower are fast, robust, and report precise and informative errors when invalid input is encountered.

Usage

Superpower is embedded directly into your C# program, without the need for any additional tools or build-time code generation tasks.

dotnet add package Superpower

The simplest text parsers consume characters directly from the source text:

// Parse any number of capital 'A's in a row
var parseA = Character.EqualTo('A').AtLeastOnce();

The Character.EqualTo() method is a built-in parser. The AtLeastOnce() method is a combinator: it builds a more complex parser for a sequence of 'A' characters out of the simple parser for a single 'A'.
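
As a minimal sketch of how such a parser might be exercised, the snippet below runs it against matching and non-matching input. It assumes the Superpower package is referenced and relies on the Parse() and TryParse() extension methods; the input strings are illustrative only.

```csharp
using System;
using Superpower;
using Superpower.Parsers;

// One or more capital 'A's; the result is the array of matched characters
TextParser<char[]> parseA = Character.EqualTo('A').AtLeastOnce();

// Parse() throws a ParseException when the input does not match
char[] matched = parseA.Parse("AAA");
Console.WriteLine(new string(matched));

// TryParse() returns a Result<T> instead of throwing
var attempt = parseA.TryParse("BBB");
Console.WriteLine(attempt.HasValue);
```

TryParse() is probably the better fit when invalid input is an expected case rather than an exceptional one.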

Superpower includes a library of simple parsers and combinators from which more sophisticated parsers can be built:

TextParser<string> identifier =
    from first in Character.Letter
    from rest in Character.LetterOrDigit.Or(Character.EqualTo('_')).Many()
    select first + new string(rest);

var id = identifier.Parse("abc123");

Assert.Equal("abc123", id);

Parsers are highly modular, so smaller parsers can be built and tested independently of the larger parsers that use them.
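
One way to picture this modularity, as a sketch only: the identifier parser from above is reused inside a hypothetical identifierList parser (a name invented for this illustration) via the ManyDelimitedBy() combinator, and each parser is checked separately.

```csharp
using System;
using Superpower;
using Superpower.Parsers;

TextParser<string> identifier =
    from first in Character.Letter
    from rest in Character.LetterOrDigit.Or(Character.EqualTo('_')).Many()
    select first + new string(rest);

// Hypothetical larger parser: a comma-separated list of identifiers
TextParser<string[]> identifierList =
    identifier.ManyDelimitedBy(Character.EqualTo(','));

// The small parser can be checked on its own...
Console.WriteLine(identifier.Parse("abc123"));

// ...and again as a building block of the larger one
string[] names = identifierList.Parse("a,b1,c_2");
Console.WriteLine(string.Join(" | ", names));
```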

Tokenization

Along with text parsers that consume input character-by-character, Superpower supports token parsers.

A token parser consumes elements from a list of tokens. A token is a fragment of the input text, tagged with the kind of item that fragment represents - usually specified using an enum:

public enum ArithmeticExpressionToken
{
    None,
    Number,
    Plus,
    Minus,
    Times,
    Divide,
    LParen,
    RParen
}

A major benefit of driving parsing from tokens, instead of individual characters, is that errors can be reported in terms of tokens - unexpected identifier `frm`, expected keyword `from` - instead of the cryptic unexpected `m`.

Token-driven parsing takes place in two distinct steps:

  1. Tokenization, using a class derived from Tokenizer<TKind>, then
  2. Parsing, using a function of type TokenListParser<TKind, T>.

var expression = "1 * (2 + 3)";

// 1.
var tokenizer = new ArithmeticExpressionTokenizer();
var tokenList = tokenizer.Tokenize(expression);

// 2.
var parser = ArithmeticExpressionParser.Lambda; // parser built with combinators
var expressionTree = parser.Parse(tokenList);

// Use the result
var eval = expressionTree.Compile();
Console.WriteLine(eval()); // -> 5

Assembling tokenizers with TokenizerBuilder<TKind>

The job of a tokenizer is to split the input into a list of tokens - numbers, keywords, identifiers, operators - while discarding irrelevant trivia such as whitespace or comments.

Superpower provides the TokenizerBuilder<TKind> class to quickly assemble tokenizers from recognizers, text parsers that match the various kinds of tokens required by the grammar.

A simple arithmetic expression tokenizer is shown below:

var tokenizer = new TokenizerBuilder<ArithmeticExpressionToken>()
    .Ignore(Span.WhiteSpace)
    .Match(Character.EqualTo('+'), ArithmeticExpressionToken.Plus)
    .Match(Character.EqualTo('-'), ArithmeticExpressionToken.Minus)
    .Match(Character.EqualTo('*'), ArithmeticExpressionToken.Times)
    .Match(Character.EqualTo('/'), ArithmeticExpressionToken.Divide)
    .Match(Character.EqualTo('('), ArithmeticExpressionToken.LParen)
    .Match(Character.EqualTo(')'), ArithmeticExpressionToken.RParen)
    .Match(Numerics.Natural, ArithmeticExpressionToken.Number)
    .Build();

Tokenizers constructed this way produce a list of tokens by repeatedly attempting to match recognizers against the input in top-to-bottom order.
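
To make the resulting token list more concrete, here is a small sketch that builds a cut-down tokenizer and prints each token's kind and text. DemoToken is a hypothetical enum introduced only for this example, and the input string is arbitrary.

```csharp
using System;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

var tokenizer = new TokenizerBuilder<DemoToken>()
    .Ignore(Span.WhiteSpace)
    .Match(Character.EqualTo('+'), DemoToken.Plus)
    .Match(Numerics.Natural, DemoToken.Number)
    .Build();

// Tokenize() throws a ParseException if some input matches no recognizer
var tokens = tokenizer.Tokenize("1 + 23");

// Each token carries its kind and the span of source text it covers
foreach (var token in tokens)
    Console.WriteLine($"{token.Kind}: {token.ToStringValue()}");

// Hypothetical cut-down token kinds, for this sketch only
enum DemoToken { None, Number, Plus }
```

For this input the loop should print three lines: a Number token for 1, a Plus token for +, and a Number token for 23. The whitespace is dropped by the Ignore() recognizer.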

Writing tokenizers by hand

Tokenizers can alternatively be written by hand; this can provide the most flexibility, performance, and control, at the expense of more complicated code. A handwritten arithmetic expression tokenizer is included in the test suite, and a more complete example can be found here.

Writing token list parsers

Token parsers are defined in the same manner as text parsers, using combinators to build up more sophisticated parsers out of simpler ones.

class ArithmeticExpressionParser
{
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Add =
        Token.EqualTo(ArithmeticExpressionToken.Plus).Value(ExpressionType.AddChecked);
        
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Subtract =
        Token.EqualTo(ArithmeticExpressionToken.Minus).Value(ExpressionType.SubtractChecked);
        
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Multiply =
        Token.EqualTo(ArithmeticExpressionToken.Times).Value(ExpressionType.MultiplyChecked);
        
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Divide = 
        Token.EqualTo(ArithmeticExpressionToken.Divide).Value(ExpressionType.Divide);

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Constant =
            Token.EqualTo(ArithmeticExpressionToken.Number)
            .Apply(Numerics.IntegerInt32)
            .Select(n => (Expression)Expression.Constant(n));

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Factor =
        (from lparen in Token.EqualTo(ArithmeticExpressionToken.LParen)
            from expr in Parse.Ref(() => Expr)
            from rparen in Token.EqualTo(ArithmeticExpressionToken.RParen)
            select expr)
        .Or(Constant);

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Operand =
        (from sign in Token.EqualTo(ArithmeticExpressionToken.Minus)
            from factor in Factor
            select (Expression)Expression.Negate(factor))
        .Or(Factor).Named("expression");

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Term =
        Parse.Chain(Multiply.Or(Divide), Operand, Expression.MakeBinary);

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Expr =
        Parse.Chain(Add.Or(Subtract), Term, Expression.MakeBinary);

    public static readonly TokenListParser<ArithmeticExpressionToken, Expression<Func<int>>>
        Lambda = Expr.AtEnd().Select(body => Expression.Lambda<Func<int>>(body));
}

Error messages

The error scenario tests demonstrate some of the error message formatting capabilities of Superpower. Check out the parsers referenced in the tests for some examples.

ArithmeticExpressionParser.Lambda.Parse(new ArithmeticExpressionTokenizer().Tokenize("1 + * 3"));
     // -> Syntax error (line 1, column 5): unexpected operator `*`, expected expression.
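
As a sketch of how an error might be surfaced to calling code, the example below uses a plain text parser so that it stays self-contained. The wholeNumber name and the input are assumptions made for illustration; the exact wording of the message is left to the library.

```csharp
using System;
using Superpower;
using Superpower.Parsers;

// An integer that must span the entire input
TextParser<int> wholeNumber = Numerics.IntegerInt32.AtEnd();

// TryParse() reports failure through the result rather than an exception
var result = wholeNumber.TryParse("12x");
if (!result.HasValue)
    Console.WriteLine(result.ToString()); // formatted message, including the position

// Parse() throws a ParseException for the same input
try
{
    wholeNumber.Parse("12x");
}
catch (ParseException ex)
{
    Console.WriteLine(ex.Message);
}
```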

To improve the error reporting for a particular token type, apply the [Token] attribute:

public enum ArithmeticExpressionToken
{
    None,

    Number,

    [Token(Category = "operator", Example = "+")]
    Plus,

    // ... remaining token kinds
}

Performance

Superpower is built with performance as a priority. Less frequent backtracking, combined with the avoidance of allocations and indirect dispatch, means that Superpower can be quite a bit faster than Sprache.

Recent benchmark for parsing a long arithmetic expression:

Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Windows
Processor=?, ProcessorCount=8
Frequency=2533306 ticks, Resolution=394.7411 ns, Timer=TSC
CLR=CORE, Arch=64-bit ? [RyuJIT]
GC=Concurrent Workstation
dotnet cli version: 1.0.0-preview2-003121

Type=ArithmeticExpressionBenchmark  Mode=Throughput  

Method              Median       StdDev      Scaled  Scaled-SD
Sprache             283.8618 µs  10.0276 µs  1.00    0.00
Superpower (Token)  81.1563 µs   2.8775 µs   0.29    0.01

Benchmarks and results are included in the repository.

Tips: if you find you need more throughput: 1) consider a hand-written tokenizer, and 2) avoid the use of LINQ comprehensions and instead use chained combinators like Then() and especially IgnoreThen() - these allocate fewer delegates (closures) during parsing.
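
The second tip can be illustrated with a sketch that writes the same small parser both ways. The parenthesized-integer grammar and the names viaLinq and viaChain are invented for this example, and no performance measurement is implied.

```csharp
using System;
using Superpower;
using Superpower.Parsers;

// LINQ comprehension: reads like the grammar
TextParser<int> viaLinq =
    from open in Character.EqualTo('(')
    from n in Numerics.IntegerInt32
    from close in Character.EqualTo(')')
    select n;

// Chained combinators: IgnoreThen() discards the '(' result,
// Then() passes the integer on while the ')' is consumed
TextParser<int> viaChain =
    Character.EqualTo('(')
        .IgnoreThen(Numerics.IntegerInt32)
        .Then(n => Character.EqualTo(')').Value(n));

Console.WriteLine(viaLinq.Parse("(42)"));
Console.WriteLine(viaChain.Parse("(42)"));
```

Both parsers accept the same input and yield the same value, so the chained form can be introduced selectively on hot paths while the comprehension form is kept where readability matters more.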

Examples

Superpower is introduced, with a worked example, in this blog post.

Example parsers to learn from:

  • JsonParser is a complete, annotated example implementing the JSON spec with good error reporting
  • DateTimeTextParser shows how Superpower's text parsers work, parsing ISO-8601 date-times
  • IntCalc is a simple arithmetic expression parser (1 + 2 * 3) included in the repository, demonstrating how Superpower token parsing works
  • Plotty implements an instruction set for a RISC virtual machine
  • tcalc is an example expression language that computes durations (1d / 12m)

Real-world projects built with Superpower:

  • Serilog.Expressions uses Superpower to implement an expression and templating language for structured log events
  • The query language of Seq is implemented using Superpower
  • seqcli extraction patterns use Superpower for plain-text log parsing
  • PromQL.Parser is a parser for the Prometheus Query Language

Have an example we can add to this list? Let us know.

Getting help

Please post issues to the issue tracker, or tag your question on StackOverflow with superpower.

The repository's title arose out of a talk "Parsing Text: the Programming Superpower You Need at Your Fingertips" given at DDD Brisbane 2015.

Showing the top 20 packages that depend on Superpower.

Packages Downloads
Serilog.Filters.Expressions
Expression-based event filtering for Serilog.
10
Serilog.Filters.Expressions
Expression-based event filtering for Serilog.
11

.NET 6.0

  • No dependencies.

.NET 8.0

  • No dependencies.

.NET Standard 2.0

Version Downloads Last updated
3.2.2-dev-00214 2 28.05.2026
3.2.1 7 30.04.2026
3.2.1-dev-00211 8 29.04.2026
3.2.1-dev-00210 8 29.04.2026
3.2.0 7 29.04.2026
3.2.0-dev-00207 7 28.04.2026
3.2.0-dev-00206 8 28.04.2026
3.2.0-dev-00204 7 27.04.2026
3.2.0-dev-00203 7 27.04.2026
3.1.1-dev-00257 9 26.10.2025
3.1.1-dev-00201 6 27.04.2026
3.1.0 11 23.08.2025
3.1.0-dev-00250 11 23.08.2025
3.1.0-dev-00247 10 23.08.2025
3.1.0-dev-00245 11 23.08.2025
3.1.0-dev-00241 10 23.08.2025
3.1.0-dev-00239 11 23.08.2025
3.0.0 11 23.08.2025
3.0.0-dev-00236 11 23.08.2025
3.0.0-dev-00233 11 23.08.2025
3.0.0-dev-00231 11 23.08.2025
3.0.0-dev-00229 11 23.08.2025
3.0.0-dev-00222 11 23.08.2025
3.0.0-dev-00221 12 23.08.2025
3.0.0-dev-00220 10 23.08.2025
3.0.0-dev-00219 11 23.08.2025
3.0.0-dev-00215 11 23.08.2025
3.0.0-dev-00213 11 23.08.2025
3.0.0-dev-00210 11 23.08.2025
3.0.0-dev-00206 11 23.08.2025
3.0.0-dev-00203 11 23.08.2025
3.0.0-dev-00201 12 23.08.2025
3.0.0-dev-00196 11 23.08.2025
3.0.0-dev-00193 11 23.08.2025
3.0.0-dev-00189 11 23.08.2025
2.3.1-dev-00185 9 23.08.2025
2.3.1-dev-00184 9 23.08.2025
2.3.0 11 23.08.2025
2.3.0-dev-00172 9 23.08.2025
2.2.0 11 23.08.2025
2.2.0-dev-00170 11 23.08.2025
2.2.0-dev-00169 10 23.08.2025
2.2.0-dev-00161 11 23.08.2025
2.1.1-dev-00159 10 23.08.2025
2.1.1-dev-00157 10 23.08.2025
2.1.0 11 23.08.2025
2.1.0-dev-00155 11 23.08.2025
2.1.0-dev-00135 10 23.08.2025
2.1.0-dev-00132 11 23.08.2025
2.1.0-dev-00128 11 23.08.2025
2.1.0-dev-00126 10 23.08.2025
2.1.0-dev-00124 11 23.08.2025
2.0.1-dev-00123 11 23.08.2025
2.0.1-dev-00120 11 23.08.2025
2.0.0 11 23.08.2025
2.0.0-dev-00116 12 23.08.2025
2.0.0-dev-00115 12 23.08.2025
2.0.0-dev-00114 11 23.08.2025
2.0.0-dev-00111 11 23.08.2025
2.0.0-dev-00109 11 23.08.2025
2.0.0-dev-00107 11 23.08.2025
2.0.0-dev-00103 10 23.08.2025
2.0.0-dev-00102 11 23.08.2025
2.0.0-dev-00101 11 23.08.2025
2.0.0-dev-00100 10 23.08.2025
2.0.0-dev-00093 11 23.08.2025
2.0.0-dev-00091 11 23.08.2025
2.0.0-dev-00089 11 23.08.2025
1.1.1-dev-00086 11 23.08.2025
1.1.1-dev-00081 11 23.08.2025
1.1.1-dev-00079 11 23.08.2025
1.1.0 10 23.08.2025
1.1.0-dev-00065 11 23.08.2025
1.1.0-dev-00063 11 23.08.2025
1.1.0-dev-00061 11 23.08.2025
1.1.0-dev-00059 11 23.08.2025
1.0.3-dev-00051 11 23.08.2025
1.0.2 11 23.08.2025
1.0.1 11 23.08.2025
1.0.1-dev-00044 11 23.08.2025
1.0.0 11 23.08.2025
1.0.0-dev-00038 10 23.08.2025
1.0.0-dev-00037 11 23.08.2025
1.0.0-dev-00036 11 23.08.2025
1.0.0-dev-00035 11 23.08.2025
1.0.0-dev-00034 11 23.08.2025
1.0.0-dev-00033 11 23.08.2025
1.0.0-dev-00032 11 23.08.2025
1.0.0-dev-00031 11 23.08.2025
1.0.0-dev-00030 11 23.08.2025
1.0.0-dev-00029 11 23.08.2025
1.0.0-dev-00028 12 23.08.2025
1.0.0-dev-00027 10 23.08.2025
1.0.0-dev-00026 11 23.08.2025
1.0.0-dev-00025 11 23.08.2025
1.0.0-dev-00024 11 23.08.2025
1.0.0-dev-00023 11 23.08.2025
1.0.0-dev-00022 12 23.08.2025
1.0.0-dev-00021 11 23.08.2025
1.0.0-dev-00020 10 23.08.2025
1.0.0-dev-00019 11 23.08.2025
1.0.0-dev-00018 11 23.08.2025
1.0.0-dev-00017 10 23.08.2025
1.0.0-dev-00016 11 23.08.2025
1.0.0-dev-00014 11 23.08.2025
1.0.0-dev-00013 11 23.08.2025
1.0.0-dev-00012 11 23.08.2025
1.0.0-dev-00010 10 23.08.2025
1.0.0-dev-00009 10 23.08.2025
1.0.0-dev-00008 11 23.08.2025
1.0.0-dev-00007 11 23.08.2025
1.0.0-dev-00006 10 23.08.2025
1.0.0-dev-00005 11 23.08.2025
1.0.0-dev-00004 11 23.08.2025
1.0.0-dev-00003 11 23.08.2025
1.0.0-dev-00002 11 23.08.2025