UnrealScript Grammar
This is an EBNF specification of the UnrealScript grammar.
It can be useful if you are going to write a parser for the UnrealScript language.
Note: this is not the official specification; it was written by visitors of the UnrealWiki.
Important note: the stock UnrealScript compiler does not strictly follow rules of the kind a grammar like this specifies. It may well accept constructions that are not documented here. See UnrealScript Language Test for actual examples of various constructions.
Edit guidelines:
- All non-terminals should be written entirely in uppercase. Keep everything aligned. If you leave something open, use '...' to make that clear.
- Always use as many brackets as needed; don't optimize them away, because that can cause confusion.
- Terminals that are words can be used directly in the production rules, otherwise you must use a terminal rule.
Non-Terminals
PROGRAM             = CLASSDECL ( DECLARATIONS )* ( REPLICATIONBLOCK )? BODY ( DEFAULTPROPERTIESBLOCK )?
CLASSDECL           = class IDENTIFIER ( extends PACKAGEIDENTIFIER )? ( CLASSPARAMS )* SEMICOLON
CLASSPARAMS         = CONSTCLASSPARAMS |
                      within PACKAGEIDENTIFIER |
                      dependson LBRACK PACKAGEIDENTIFIER RBRACK |
                      config ( LBRACK PACKAGEIDENTIFIER RBRACK )? |
                      hidecategories LBRACK IDENTIFIERLIST RBRACK |
                      showcategories LBRACK IDENTIFIERLIST RBRACK
IDENTIFIER          = ( ALPHA | UNDERSCORE ) ( ALPHA | UNDERSCORE | DIGIT )*
// packagename.classname or classname.structname
PACKAGEIDENTIFIER   = ( IDENTIFIER DOT )? IDENTIFIER
QUALIFIEDIDENTIFIER = ( ( class SQUOTE PACKAGEIDENTIFIER SQUOTE DOT default DOT IDENTIFIER ) |
                        ( ( IDENTIFIER DOT )* IDENTIFIER ) )
IDENTIFIERLIST      = IDENTIFIER ( COMMA IDENTIFIER )*
STRINGVAL           = DQUOTE PRINTABLE DQUOTE
INTVAL              = ( ( DIGIT )+ | ( '0x' ( HEXDIGIT )+ ) )
FLOATVAL            = ( DIGIT )+ DOT ( DIGIT )*
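For anyone writing a lexer, the lexical rules above (IDENTIFIER, PACKAGEIDENTIFIER, INTVAL, FLOATVAL) map fairly directly onto regular expressions. The Python sketch below is only an illustration of that mapping: the names IDENT, classify and the *_RE constants are made up for this example, and case-insensitive matching is assumed because of the Case note further down.

```python
import re

# Regex fragments mirroring the EBNF rules; all names here are illustrative.
IDENT = r"[a-z_][a-z_0-9]*"  # IDENTIFIER
IDENTIFIER_RE = re.compile(IDENT, re.IGNORECASE)
PACKAGEIDENTIFIER_RE = re.compile(rf"(?:{IDENT}\.)?{IDENT}", re.IGNORECASE)
INTVAL_RE = re.compile(r"0x[0-9a-f]+|[0-9]+", re.IGNORECASE)
FLOATVAL_RE = re.compile(r"[0-9]+\.[0-9]*")


def classify(text):
    """Return the name of the first lexical rule that matches all of text."""
    rules = [
        ("INTVAL", INTVAL_RE),
        ("FLOATVAL", FLOATVAL_RE),
        ("IDENTIFIER", IDENTIFIER_RE),
        ("PACKAGEIDENTIFIER", PACKAGEIDENTIFIER_RE),
    ]
    for name, pattern in rules:
        if pattern.fullmatch(text):
            return name
    return None
```

With these patterns, classify("Engine.Actor") reports PACKAGEIDENTIFIER and classify("1abc") reports no match. Keep in mind that the stock compiler may accept input that these patterns reject.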
Declaration parts
DECLARATIONS  = ( CONSTDECL | VARDECL | ENUMDECL | STRUCTDECL ) SEMICOLON
CONSTDECL     = const IDENTIFIER EQUALS CONSTVALUE
CONSTVALUE    = ( STRINGVAL | INTVAL | FLOATVAL | BOOLVAL )
VARDECL       = var ( CONFIGGROUP )? ( VARPARAMS )* VARTYPE VARIDENTIFIER ( COMMA VARIDENTIFIER )*
CONFIGGROUP   = LBRACK ( IDENTIFIER )? RBRACK
VARTYPE       = PACKAGEIDENTIFIER | ENUMDECL | STRUCTDECL | ARRAYDECL | CLASSTYPE | BASICTYPE
VARIDENTIFIER = IDENTIFIER ( LSBRACK INTVAL RSBRACK )?
ARRAYDECL     = array LABRACK ( PACKAGEIDENTIFIER | CLASSTYPE | BASICTYPE ) RABRACK
CLASSTYPE     = class LABRACK PACKAGEIDENTIFIER RABRACK
ENUMDECL      = enum IDENTIFIER LCBRACK ENUMOPTIONS RCBRACK
ENUMOPTIONS   = IDENTIFIER ( COMMA IDENTIFIER )*
STRUCTDECL    = struct ( STRUCTPARAMS )* IDENTIFIER ( extends PACKAGEIDENTIFIER )? LCBRACK STRUCTBODY RCBRACK
STRUCTPARAMS  = ( native | export )
STRUCTBODY    = ( VARDECL SEMICOLON )+
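As a small worked example of turning one of these rules into code, the following Python sketch recognizes the ENUMDECL rule (with ENUMOPTIONS inlined). It is a rough illustration, not a full parser: the function name parse_enumdecl is hypothetical, whitespace between tokens is assumed to be insignificant (the grammar does not say so), and comments are assumed to have been removed already.

```python
import re

IDENT = r"[a-z_][a-z_0-9]*"  # IDENTIFIER, matched case-insensitively below

# ENUMDECL = enum IDENTIFIER LCBRACK ENUMOPTIONS RCBRACK
ENUMDECL_RE = re.compile(
    rf"\s*enum\s+({IDENT})\s*\{{\s*({IDENT}(?:\s*,\s*{IDENT})*)\s*\}}\s*",
    re.IGNORECASE,
)


def parse_enumdecl(source):
    """Return (name, [options]) for an ENUMDECL, or None if it does not match."""
    match = ENUMDECL_RE.fullmatch(source)
    if match is None:
        return None
    options = [option.strip() for option in match.group(2).split(",")]
    return match.group(1), options
```

The sketch follows the rule literally, so an empty option list or a trailing comma is rejected here even if the real compiler turns out to be more lenient.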
Replication parts
REPLICATIONBLOCK = replication LCBRACK ( REPLICATIONBODY )* RCBRACK
REPLICATIONBODY  = ( reliable | unreliable ) if LBRACK EXPR RBRACK IDENTIFIER ( COMMA IDENTIFIER )* SEMICOLON
Body parts
BODY = ( STATEDECL | FUNCTIONDECL )*
State parts
STATEDECL   = ( STATEPARAMS )* state IDENTIFIER ( extends IDENTIFIER )? STATEBODY
STATEBODY   = LCBRACK ( STATEIGNORE )? ( FUNCTIONDECL )* STATELABELS RCBRACK
STATEIGNORE = ignores IDENTIFIER ( COMMA IDENTIFIER )* SEMICOLON
STATELABELS = ( IDENTIFIER COLON ( CODELINE )* )*
Function parts
// operators require a set amount of arguments
FUNCTIONDECL    = ( NORMALFUNC | OPERATORFUNC )
NORMALFUNC      = ( FUNCTIONPARAMS )* FUNCTIONTYPE ( LOCALTYPE )? IDENTIFIER LBRACK ( FUNCTIONARGS ( COMMA FUNCTIONARGS )* )? RBRACK FUNCTIONBODY
FUNCTIONPARAMS  = CONSTFUNCPARAMS | native ( LBRACK INTVAL RBRACK )?
OPERATORFUNC    = ( FUNCTIONPARAMS )* OPERATORTYPE FUNCTIONBODY
OPERATORTYPE    = ( BINARYOPERATOR | UNARYOPERATOR )
// requires two arguments
BINARYOPERATOR  = operator LBRACK INTVAL RBRACK PACKAGEIDENTIFIER OPIDENTIFIER LBRACK FUNCTIONARGS COMMA FUNCTIONARGS RBRACK
// requires one argument
UNARYOPERATOR   = ( preoperator | postoperator ) PACKAGEIDENTIFIER OPIDENTIFIER LBRACK FUNCTIONARGS RBRACK
OPIDENTIFIER    = IDENTIFIER | OPERATORNAMES
FUNCTIONARGS    = ( optional | out | coerce )? FUNCTIONARGTYPE IDENTIFIER
FUNCTIONARGTYPE = BASICTYPE | PACKAGEIDENTIFIER
FUNCTIONBODY    = ( SEMICOLON | ( ( LOCALDECL )* ( CODELINE )* ) ( SEMICOLON )? )
LOCALDECL       = local LOCALTYPE IDENTIFIER ( COMMA IDENTIFIER )*
LOCALTYPE       = PACKAGEIDENTIFIER | ARRAYDECL | CLASSTYPE | BASICTYPE
Code parts
CODELINE    = ( STATEMENT | ASSIGNMENT | IFTHENELSE | WHILELOOP | DOLOOP | SWITCHCASE | RETURNFUNC | FOREACHLOOP | FORLOOP )
CODEBLOCK   = ( CODELINE | ( LCBRACK ( CODELINE )* RCBRACK ) )
STATEMENT   = FUNCCALL SEMICOLON
ASSIGNMENT  = IDENTIFIER EQUALS EXPR SEMICOLON
IFTHENELSE  = if LBRACK EXPR RBRACK CODEBLOCK ( else CODEBLOCK )?
WHILELOOP   = while LBRACK EXPR RBRACK CODEBLOCK
DOLOOP      = do CODEBLOCK until LBRACK EXPR RBRACK
SWITCHCASE  = switch LBRACK EXPR RBRACK LCBRACK ( CASERULE )+ ( DEFAULTRULE )? RCBRACK
CASERULE    = case INTVAL COLON CODEBLOCK
DEFAULTRULE = default COLON CODEBLOCK
RETURNFUNC  = return ( EXPR )? SEMICOLON
FOREACHLOOP = foreach FUNCCALL CODEBLOCK
FORLOOP     = for LBRACK ASSIGNMENT SEMICOLON EXPR SEMICOLON EXPR RBRACK CODEBLOCK
EXPR        = OPERAND ( OPIDENTIFIER OPERAND )*
OPERAND     = ( CONSTVALUE | QUALIFIEDIDENTIFIER | FUNCCALL )
FUNCCALL    = ( ( class SQUOTE PACKAGEIDENTIFIER SQUOTE DOT static DOT ) | ( ( IDENTIFIER DOT )+ ) )? IDENTIFIER LBRACK ( EXPR ( COMMA EXPR )* )? RBRACK
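The EXPR rule describes a flat sequence of operands separated by operators. A minimal Python sketch of a recognizer for that shape is given below. It works on an already tokenized list and treats every operand as an opaque token; the function name parse_expr and the operators parameter are assumptions made for this example. Precedence is not expressed by the rule (in UnrealScript it comes from the number in the operator declaration), so the sketch does not try to resolve it.

```python
def parse_expr(tokens, operators):
    """Split tokens according to EXPR = OPERAND ( OPIDENTIFIER OPERAND )*.

    tokens is a list of strings, operators is the set of known OPIDENTIFIERs.
    Returns (operands, operators_used); raises ValueError on malformed input.
    """

    def operand(index):
        # An operand must exist and must not itself be an operator token.
        if index >= len(tokens) or tokens[index] in operators:
            raise ValueError(f"expected an operand at position {index}")
        return tokens[index]

    operands = [operand(0)]
    used = []
    pos = 1
    while pos < len(tokens):
        if tokens[pos] not in operators:
            raise ValueError(f"expected an operator, got {tokens[pos]!r}")
        used.append(tokens[pos])
        operands.append(operand(pos + 1))
        pos += 2
    return operands, used
```

Because OPIDENTIFIER also allows plain identifiers, the set of operators has to be supplied by the caller; a real parser would collect it from the operator declarations it has seen.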
Defaultproperties
DEFAULTPROPERTIESBLOCK = defaultproperties LCBRACK ( DEFPROP )* RCBRACK
DEFPROP                = DEFPROPIDENTIFIER EQUALS PRINTABLE
DEFPROPIDENTIFIER      = IDENTIFIER ( ( LBRACK INTVAL RBRACK ) | ( LSBRACK INTVAL RSBRACK ) )?
Terminals
PRINTABLE        = all printable characters
ALPHA            = 'a' .. 'z'
DIGIT            = '0' .. '9'
HEXDIGIT         = DIGIT | 'a' .. 'f'
SEMICOLON        = ';'
COLON            = ':'
UNDERSCORE       = '_'
LBRACK           = '('
RBRACK           = ')'
LABRACK          = '<'
RABRACK          = '>'
LCBRACK          = '{'
RCBRACK          = '}'
LSBRACK          = '['
RSBRACK          = ']'
DOT              = '.'
COMMA            = ','
SQUOTE           = '''
DQUOTE           = '"'
EQUALS           = '='
CONSTCLASSPARAMS = abstract | native | nativereplication | safereplace |
                   perobjectconfig | transient | noexport | exportstructs |
                   // available from warfare and up:
                   collapsecategories | dontcollapsecategories | placeable |
                   notplaceable | editinlinenew | noteditinlinenew
BOOLVAL          = true | false
VARPARAMS        = config | const | editconst | export | globalconfig | input |
                   localized | native | private | protected | transient | travel |
                   // available from warfare and up:
                   editinline | deprecated | edfindable | editinlineuse
STATEPARAMS      = auto | simulated
CONSTFUNCPARAMS  = final | iterator | latent | simulated | singular | static | exec | protected | private
BASICTYPE        = byte | int | float | string | bool | name | class
FUNCTIONTYPE     = function | event | delegate
OPERATORNAMES    = '~' | '!' | '@' | '#' | '$' | '%' | '^' | '&' | '*' | '-' | '=' | '+' | '|' | '\' | ':' |
                   '<' | '>' | '/' | '?' | '`' | '<<' | '>>' | '!=' | '<=' | '>=' | '++' | '--' | '+=' | '-=' |
                   '*=' | '/=' | '&&' | '||' | '^^' | '==' | '**' | '~=' | '@=' | '>>>'
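Several entries in OPERATORNAMES are prefixes of longer ones ('>' of '>>' of '>>>', '=' of '=='), so a tokenizer has to prefer the longest match. One common way to do that, sketched here in Python, is to try the candidates in order of decreasing length. The list below is only a subset of OPERATORNAMES chosen for brevity, and the name match_operator is hypothetical.

```python
# A subset of OPERATORNAMES, enough to show the prefix problem.
OPERATOR_SUBSET = [
    "<", ">", "<<", ">>", ">>>", "<=", ">=",
    "=", "==", "!", "!=", "+", "++", "+=",
]

# Longest operators first, so that '>>>' is tried before '>>' and '>'.
OPERATORS_BY_LENGTH = sorted(OPERATOR_SUBSET, key=len, reverse=True)


def match_operator(text, pos=0):
    """Return the longest operator starting at text[pos], or None."""
    for op in OPERATORS_BY_LENGTH:
        if text.startswith(op, pos):
            return op
    return None
```

For example, match_operator("a >>> b", 2) yields '>>>' rather than '>'.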
Notes
Case
UnrealScript is case-insensitive, so all terminals may be written in any case. For this reason the uppercase variants of ALPHA and HEXDIGIT are omitted.
Unreal Engine
This grammar applies to Unreal Engine 2. Older versions of the Unreal Engine have a few differences. The following changes should be applied to this grammar for older versions:
- extends can be replaced with expands
- The ARRAYDECL rule does not apply
- In the CLASSPARAMS rule the following do not apply:
  - within PACKAGEIDENTIFIER
  - dependson LBRACK PACKAGEIDENTIFIER RBRACK
  - hidecategories LBRACK IDENTIFIERLIST RBRACK
  - showcategories LBRACK IDENTIFIERLIST RBRACK
- In CONSTCLASSPARAMS nousercreate is allowed
- STRUCTPARAMS does not apply
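A parser that targets Unreal Engine 2 but also has to read older sources could handle the first difference in the token stream, before the grammar is applied. The Python sketch below shows that idea only; the function name normalize_legacy_tokens is hypothetical, and the approach is naive in that an identifier that happens to be called expands would be rewritten as well.

```python
def normalize_legacy_tokens(tokens):
    """Rewrite the older keyword 'expands' to 'extends', ignoring case."""
    return ["extends" if token.lower() == "expands" else token for token in tokens]
```

The other differences in the list remove or add alternatives, so they are better handled by switching rules on or off per engine version than by rewriting tokens.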
Related Topics
- Class Syntax
- Function Syntax
- Variable Syntax
- http://mimesis.csc.ncsu.edu/Unreal/Syntax.htm
- UnrealScript Language Test
Discussion
El Muerte TDS: As suggested in UnDox Revisited, so hell, why not.
Tarquin: Nice
Jerome-X: This can be very useful for the parser in the UCEditor plugin. Thanks.
El Muerte TDS: The only open things are the class, var and function params; the rest should be done. If anyone could verify the stuff I wrote down, that would help, as I might have missed some things.
El Muerte TDS: done, no more open rules
CaptainNuss: Greetings, just added the local keyword for variable declarations. Btw, why aren't the basic built-in variable types listed in this specification?
Mychaeel: "local" is covered by LOCALDECL already. In VARDECL it's a bug.
CaptainNuss: Oops, I'm sorry. Didn't see that.
El Muerte TDS: You're right about the basic types, added them now. Also, the function return type was incorrect (functions can also return arrays, etc.).
The reason why var and local are different is that inline enum and struct definitions are not allowed in local but are in var.
Wormbo: Is there a (free) program that can check a source code file against an EBNF definition?
El Muerte TDS: Not that I know of. But there are programs that create a parser from an EBNF definition (needs some changing though): http://catalog.compilertools.net/lexparse.html and one not in that list, [ANTLR].
Tarquin: I've changed:
CONFIGGROUP = LBRACK IDENTIFIER RBRACK
as the actual IDENTIFIER is optional, right?
El Muerte: Uh, yeah. There are a few other things that might also be changed; I've come across a couple of "hacks" that are apparently legal too, not to speak of the things Unreal2 allows. Also, there are a couple of new UT2004 keywords missing.
Iainmcgin: I changed FUNCTIONARGS so that the type of the parameter can include the basic types (int, float, etc.). I'm working on a SableCC grammar file for UnrealScript at the moment, so I'll note down other errors in this EBNF as I find them.
sprfreak14: Should comments be included in this EBNF?
Proposal:
COMMENT                    = ( SINGLELINECOMMENT | DELIMITEDCOMMENT )
SINGLELINECOMMENT          = '//' ( NOTNEWLINE )*
NOTNEWLINE                 = Any character except a new line character
DELIMITEDCOMMENT           = '/*' ( DELIMITEDCOMMENTCHARACTERS )* '*/'
DELIMITEDCOMMENTCHARACTERS = ( NOTASTERISK | '*' NOTSLASH )
NOTASTERISK                = Any character except '*'
NOTSLASH                   = Any character except '/'
Sweavo: the usual way (i.e. the way I would do it writing a C parser) to deal with comments is at the [Lexing stage], i.e. there is a stage before parsing that recognizes tokens. Comments are reduced to whitespace at that stage. While the comment stuff above looks OK at a glance, the problem is that you then have to put COMMENT all over the place in the grammar to reflect all the valid places for a comment. Pretty much destroys the usefulness of the grammar. But I agree, if this is to be useful to people writing syntax highlighters, comments should be addressed.
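To illustrate the lexing-stage approach described above, here is a rough Python sketch that blanks out comments before any parsing takes place. The function name strip_comments is made up for this example, and it rests on a few assumptions that the grammar does not confirm: delimited comments do not nest, string literals are enclosed in double quotes, and a backslash escapes the next character inside a string.

```python
def strip_comments(source):
    """Replace // and /* */ comments with spaces, leaving string literals alone."""
    out = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch == '"':
            # Copy a string literal verbatim so that '//' inside it survives.
            end = i + 1
            while end < n and source[end] != '"':
                end += 2 if source[end] == "\\" else 1
            end = min(end + 1, n)
            out.append(source[i:end])
            i = end
        elif source.startswith("//", i):
            # Single line comment: blank everything up to the newline.
            end = source.find("\n", i)
            end = n if end == -1 else end
            out.append(" " * (end - i))
            i = end
        elif source.startswith("/*", i):
            # Delimited comment: keep newlines so line numbers stay stable.
            end = source.find("*/", i + 2)
            end = n if end == -1 else end + 2
            out.append("".join("\n" if c == "\n" else " " for c in source[i:end]))
            i = end
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Replacing each comment with whitespace of the same length keeps line and column positions intact, which helps when later error messages have to point back into the original file. A syntax highlighter would instead emit the comment as a token of its own at this stage.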
sprfreak14: Is DOT default DOT IDENTIFIER in QUALIFIEDIDENTIFIER optional?