License	License BSD-3-Clause
Categories	Categories Data Leaf Databases
GroupId	GroupId io.github.copper-leaf
ArtifactId	ArtifactId kudzu-core-metadata
Last Version	Last Version 2.5.0
Release Date	Release Date 13-May-2021
Type	Type pom.sha512
Description	Description kudzu-core null
Project URL	Project URL https://github.com/copper-leaf/kudzu
Source Code Management	Source Code Management https://github.com/copper-leaf/kudzu.git

Filename	Size
kudzu-core-metadata-2.5.0.pom
kudzu-core-metadata-2.5.0-sources.jar	61 KB
kudzu-core-metadata-2.5.0-javadoc.jar	261 bytes
Browse

Group / Artifact	Type	Version
org.jetbrains.kotlin : kotlin-stdlib-common	jar	1.4.32

Kudzu

A monadic (I think...) recursive-descent parser combinator written in Kotlin

Kudzu is a recursive-descent parser written in Kotlin, inspired by Parsec, with the goal of immutability, simplicity, testability, and multiplatform usability. It's designed to be a simple starting place for writing smaller parsers to evaluate relatively simple grammars for other Copper-Leaf libraries, but flexible enough to be used for larger languages.

Notable features:

Multiplatform targets: JVM, Android, JS Legacy, JS IR, iOS
No separate lexer/parser. You really just don't need it, so Kudzu omits it
Parser combinator structure means every piece of your grammar is a complete parser, and thus smaller parsing units can be tested in isolation, but the combination of them creates the full language
Everything in Kudzu is immutable, and thus fully thread-safe
Built on top of DeepRecursiveFunction to provide readable stacktraces and prevent StackOverflowErrors
No complicated DSL or arcane combinator operators required, just normal, readable Kotlin classes. You don't need a Ph.D. in computational linguistics or functional programming to understand a Kudzu parser
Many useful combinators provided out of the box
- Boolean, Int, Double literals
- String and Character literals, with standard escaped characters (i.e. \n) and Unicode sequences (i.e. \u00A2)
- Identifiers
- Choice, repetition, optional higher-kinded parsers
Parses input to an Abstract Syntax Tree (AST) and provides facilities for simplifying and/or introspecting the AST
Line- and column-number source tracking
Generic expression parser with all the fixin's
- Customizable operator precedence
- Operators with prefix, postfix, and infix with both left- and right-recursive associativity
- Parentheses
- Simplify deeply-nested expressions AST to simpler representation

Installation

repositories {
    mavenCentral()
}

// for plain JVM or Android projects
dependencies {
    implementation("io.github.copper-leaf:kudzu-core:{{site.version}}")
}

// for multiplatform projects
kotlin {
    sourceSets {
        val commonMain by getting {
            dependencies {
                implementation("io.github.copper-leaf:kudzu-core:{{site.version}}")
            }
        }
    }
}

Basic Usage

See tests for example usage of every included parser. A basic example of parsing and evaluating in several different formats follows:

Combine several small parsers into a single larger one

val intLiteralParser = MappedParser(
    SequenceParser(
        MaybeParser(
            CharInParser('-')
        ),
        AtLeastParser(
            DigitParser(),
            minSize = 1
        )
    )
) { it.text.toInt() }

val (node, remainingText) = intLiteralParser.parse(ParserContext.fromString("-123"))
val parsedValue: Int = node.value
expectThat(parsedValue).isEqualTo(-123)

Find-and-replace structured sequences within unstructured text

val variableMap = mapOf(
    "asdf" to 1,
    "qwerty" to 2,
)

val patternToReplace = MappedParser(
    SequenceParser(
        CharInParser('#'),
        CharInParser('{'),
        IdentifierTokenParser(),
        CharInParser('}'),
    )
) {
    val (_, _, identifier, _) = it.children
    variableMap[identifier.text]
}

val findAndReplaceParser = ManyParser(
    PredictiveChoiceParser(
        patternToReplace,
        ScanParser(patternToReplace),
    )
)

val (node, remainingText) = findAndReplaceParser.parse(ParserContext.fromString("the value of #{asdf} is 1, but #{qwerty} is 2"))
expectThat(node.text).isEqualTo("the value of 1 is 1, but 2 is 2")

Construct and evaluate expressions with custom operators

val expressionParser = ExpressionParser<Int>(
    termParser = { IntLiteralParser() },

    Operator.Infix(op = "+", 40) { l, r -> l + r },
    Operator.Infix(op = "-", 40) { l, r -> l - r },
    Operator.Infix(op = "*", 60) { l, r -> l * r },
    Operator.Infix(op = "/", 60) { l, r -> l / r },

    Operator.Prefix(op = "-", 80) { r -> -r },
    Operator.Infixr(op = "^", 70) { l, r -> l.toDouble().pow(r).toInt() },
)

val (node, remainingText) = expressionParser.parse(ParserContext.fromString("2 ^ ((4 - 2) * 2)", skipWhitespace = true))
val value = expressionParser.evaluator.evaluate(node)
expectThat(value).isEqualTo(16)

Implementation Details

In Kudzu, a Parser is a class that extends Parser and implements 2 methods: predict, and parse. predict is a method that checks if the parser is capable of consuming the next character, and parse actually implements the parsing logic, and returns a Node.

There are 2 types of nodes, TerminalNode and NonTerminalNode. A TerminalNode typically holds onto the raw text that was parsed from the input, while a NonTerminalNode holds onto other nodes. In this manner, non-terminal nodes comprise the inner nodes of the parse tree, while terminal nodes comprise the leaves of the parse tree.

Unlike some other parsing libraries, Kudzu does not impose any restrictions on the type of node that a parser produces to keep type parameters to a minimum and code readability to a maximum. Instead of evaluating a parse tree by working with specific subclasses, evaluation is done simply by knowledge of whether a node is a terminal or non-terminal node, and the name of the node. There are APIs to aid in navigating the parse tree and finding specific nodes based on their type or their name.

The APIs are designed that each step is kept very isolated, so that the code for one step can be easily swapped out or reused as-needed, allowing great flexibility, while keeping the code for each phase clean and easy to understand. The general process of parsing and evaluating text with Kudzu is as follows:

1) String
2) ParserContext
3) Parser.parse(ParserContext) -> Pair<Node, ParserContext>
4) Node.visit([Visitor]) -> Unit

The String text that is to be parsed.
Provides sole API for parsers to consume individual characters. Tracks source position.
Each grammar has a single root rule, which is defined as a simple instance of Parser. The result is a single root Node and a ParserContext representing the text that remains unconsumed. A successful parse is expected to return an empty ParserContext. This root parser will recursively call the same method on other parser objects, each one building more nodes in the full tree and advancing the position in the ParserContext.
The Node can be visited by any number of Visitor objects, which recognize and evaluate distinct nodes in the parse tree.

Building Parsers

While you can create custom Parser subclasses which implement your parsing logic, it is typically better to use the built-in parser primitives provided by Kudzu. A basic example of building a Parser which recognizes either a full word or a number follows:

val wordParser = ManyParser(LetterParser())
val numberParser = ManyParser(DigitParser())
val tokenParser = ChoiceParser(
        wordParser, 
        numberParser
)
val statement = ManyParser(
    SequenceParser(
        tokenParser, 
        OptionalWhitespaceParser()
    )
)

val output = statement.parse("one two 1234 asdf 56 qwerty 7890")

This simple grammar will match an input string like one two 1234 asdf 56 qwerty 7890, and demonstrates how complex parsers can be built from smaller ones, and introduces several of the important built-in parses available. Below is a description of some of these parser types (browse source for all available parsers)

LetterParser: Consumes a single letter from the input, as recognized by Kotlin's char.isLetter()
DigitParser: Consumes a single digit from the input, as recognized by Kotlin's char.isDigit()
ManyParser: Takes another Parser and repeatedly consumed input from that parser, for as long as that parser is able to. Since it is itself a Parser, and it takes a Parser as an input, the full grammar is now recursively-defined, and uses a predictive* approach to determining if the next iteration of its parser can continue. You can pass any other Parser to this, not just character-type parsers, and so arbitrarily-complex sub-grammars can be repeated as needed. You'll notice that we gave the parser a name. This name is attached to the nodes it produces, so that when we evaluate the parse tree, we can look for nodes named word or number, and take different actions accordingly.
PredictiveChoiceParser: Takes a list of sub-parsers, and predicatively* picks one to continue parsing with.
SequenceParser: Takes a list of sub-parsers, and executes each one a single time in order.
OptionalWhitespaceParser: Consumes and throws away whitespace if it exists. As the whitespace is optional, and input such as two1234 would still match and be parsed correctly.
LazyParser: Some grammars have production rules that themselves are recursive, such as A := B A. The LazyParser acts as a placeholder, simply delegating to another parser. The recursive rules must be built using these lazy types, since we need a concrete instance to pass to another parsers. This lazy parser allows us to create the parser reference, passing it around to the parsers that need it, and at a later point fill in the details of the parser as needed.

A predictive grammar tests if the parser can be used by first calling its predict method. This method is expected to check if it is able to consume the next character, and if it cannot consume the next character, then the entire parser cannot continue. For many-type parsers, this predictability is used to determine when to stop iterating. For choice-type parsers, this determines which sub-production is chosen: the first sub-parser for which predict returns true will be used, and other rules will not be tested. This is to improve performance and prevent infinite recursion.

Evaluating Parse Trees

Once the full parser has been built, and text parsed into an AST, we can now evaluate it. Evaluating an AST consists of a Visitor.Callback, or a simple lambda callback. A basic example, using a fictional grammar, follows:

val parser = constructParser()

val (node, _) = parser.parse(input)

// simple visiting, such as finding all nodes of a particular type and not caring about the structure
node.visit { node -> 
    // do something with each node as it is entered in the tree
}

// alternatively, visit with a full set of callbacks to also introspect the parse-tree's structure
node.visit(object : Visitor.Callback {
    var depth: Int = 0
    override fun enter(node: Node) {
        depth++
    }
    override fun exit(node: Node) {
        depth--
    }
    override fun onStart() { }
    override fun onFinish() { }
})

kudzu-core

License

Categories

GroupId

ArtifactId

Last Version

Release Date

Type

Description

Project URL

Source Code Management

Download kudzu-core-metadata

Dependencies

compile (1)

Project Modules