llama : add token matching support to llama-grammar (#17816)

* llama : add token support to llama-grammar

* fix inverse token comment

* refactor trigger_patterns to replay tokens instead of the entire string

* add token documentation

* fix test-llama-grammar

* improve test cases for tokens
Aldehir Rojas 2025-12-09 00:32:57 -06:00 committed by GitHub
parent 1d2a1ab73d
commit e39502e74b
6 changed files with 400 additions and 38 deletions


@@ -67,6 +67,30 @@ Parentheses `()` can be used to group sequences, which allows for embedding alte
- `{m,n}` repeats the preceding symbol or sequence between `m` and `n` times (inclusive)
- `{0,n}` repeats the preceding symbol or sequence at most `n` times (inclusive)
## Tokens
Tokens allow grammars to match specific tokenizer tokens rather than character sequences. This is useful for constraining outputs based on special tokens (like `<think>` or `</think>`).
Tokens can be specified in two ways:
1. **Token ID**: Use angle brackets with the token ID in square brackets: `<[token-id]>`. For example, `<[1000]>` matches the token with ID 1000.
2. **Token string**: Use angle brackets with the token text directly: `<token>`. For example, `<think>` will match the token whose text is exactly `<think>`. This only works if the string tokenizes to exactly one token in the vocabulary; otherwise, the grammar will fail to parse.
You can negate token matches using the `!` prefix: `!<[1000]>` or `!<think>` matches any token *except* the specified one.
```
# Match a thinking block: <think>...</think>
# Using token strings (requires these to be single tokens in the vocab)
root ::= <think> thinking </think> .*
thinking ::= !</think>*
# Equivalent grammar using explicit token IDs
# Assumes token 1000 = <think>, token 1001 = </think>
root ::= <[1000]> thinking <[1001]> .*
thinking ::= !<[1001]>*
```
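To make the token-level semantics of the grammar above concrete, here is a minimal Python sketch that checks a sequence of token IDs against it. This is a toy illustration, not how llama.cpp implements grammars: `TOY_VOCAB`, `resolve_token_string`, `matches_token`, and `accepts_thinking_block` are hypothetical names, the IDs 1000 and 1001 are the same assumptions the grammar comments make, and the trailing `.*` is simplified to "accept any remaining tokens".

```python
# Toy illustration only; this is not llama.cpp's implementation.
# The vocabulary and token IDs are assumptions taken from the example grammar.
TOY_VOCAB = {"<think>": 1000, "</think>": 1001}  # hypothetical text -> ID map


def resolve_token_string(text, vocab):
    """Resolve a `<token>` terminal to an ID; fail if it is not a single token."""
    if text not in vocab:
        raise ValueError(f"{text!r} is not a single token in this vocabulary")
    return vocab[text]


def matches_token(token_id, target_id, negated=False):
    """Match `<[target_id]>`, or `!<[target_id]>` when negated is True."""
    if negated:
        return token_id != target_id
    return token_id == target_id


def accepts_thinking_block(token_ids, vocab=TOY_VOCAB):
    """Check token_ids against:
        root     ::= <think> thinking </think> .*
        thinking ::= !</think>*
    """
    open_id = resolve_token_string("<think>", vocab)
    close_id = resolve_token_string("</think>", vocab)

    # The first token must be exactly <think>.
    if not token_ids or not matches_token(token_ids[0], open_id):
        return False

    # thinking: zero or more tokens that are not </think>.
    i = 1
    while i < len(token_ids) and matches_token(token_ids[i], close_id, negated=True):
        i += 1

    # </think> must come next; this sketch lets `.*` accept whatever follows.
    return i < len(token_ids) and matches_token(token_ids[i], close_id)


print(accepts_thinking_block([1000, 42, 43, 1001, 44]))  # True
print(accepts_thinking_block([1000, 42, 43]))            # False: no closing token
```

The sketch compares IDs rather than text, which is the point of token matching: a character sequence that merely spells `</think>` across several ordinary tokens does not end the thinking block.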
## Comments and newlines
Comments can be specified with `#`: