NaturalDocs::Languages::Advanced

The base class for all languages that have full support in Natural Docs.  Each one will have a custom parser capable of documenting undocumented aspects of the code.

Summary
NaturalDocs::Languages::Advanced - The base class for all languages that have full support in Natural Docs.

Implementation

Members - The class is implemented as a blessed arrayref.

Functions

New - Creates and returns a new object.
Tokens - Returns the tokens found by ParseForCommentsAndTokens().
SetTokens - Replaces the tokens.
ClearTokens - Resets the token list.
AutoTopics - Returns the arrayref of automatically generated topics, or undef if none.
AddAutoTopic - Adds a NaturalDocs::Parser::ParsedTopic to AutoTopics().
ClearAutoTopics - Resets the automatic topic list.
ScopeRecord - Returns an arrayref of NaturalDocs::Languages::Advanced::ScopeChange objects describing how and when the scope changed throughout the file.

Parsing Functions - These functions are good general language building blocks.

ParseForCommentsAndTokens - Loads the passed file, sends all appropriate comments to NaturalDocs::Parser->OnComment(), and breaks the rest into an arrayref of tokens.
PreprocessFile - An overridable function if you’d like to preprocess the file before it goes into ParseForCommentsAndTokens().
TokenizeLine - Converts the passed line to tokens as described in ParseForCommentsAndTokens() and adds them to Tokens().
TryToSkipString - If the position is on a string delimiter, moves the position to the token following the closing delimiter, or past the end of the tokens if there is none.
SkipRestOfLine - Moves the position to the token following the next line break, or past the end of the tokens array if there is none.
SkipUntilAfter - Moves the position to the token following the next occurrence of a particular token sequence, or past the end of the tokens array if it never occurs.
IsFirstLineToken - Returns whether the position is at the first token of a line, not including whitespace.
IsLastLineToken - Returns whether the position is at the last token of a line, not including whitespace.
IsAtSequence - Returns whether the position is at a sequence of tokens.
IsBackslashed - Returns whether the position is after a backslash.

Scope Functions - These functions provide a scope stack implementation for language-specific parsers to use.

ClearScopeStack - Clears the scope stack for a new file.
StartScope - Records a new scope level.
EndScope - Records the end of the current scope level.
ClosingScopeSymbol - Returns the symbol that ends the current scope level, or undef if we are at the top level.
CurrentScope - Returns the current calculated scope, or undef if global.
CurrentPackage - Returns the current calculated package or class, or undef if none.
SetPackage - Sets the package for the current scope level.
CurrentUsing - Returns the current calculated arrayref of SymbolStrings from Using statements, or undef if none.
AddUsing - Adds a Using SymbolString to the current scope.

Support Functions

AddToScopeRecord - Adds a change to the scope record, condensing unnecessary entries.
CreateString - Converts the specified tokens into a string and returns it.

Implementation

Members

The class is implemented as a blessed arrayref.  The following constants are used as indexes.

TOKENS - An arrayref of tokens used in all the Parsing Functions.
SCOPE_STACK - An arrayref of NaturalDocs::Languages::Advanced::Scope objects serving as a scope stack for parsing.  There will always be one available, with a symbol of undef, for the top level.
SCOPE_RECORD - An arrayref of NaturalDocs::Languages::Advanced::ScopeChange objects, as generated by the scope stack.  If there is more than one change per line, only the last is stored.
AUTO_TOPICS - An arrayref of NaturalDocs::Parser::ParsedTopics generated automatically from the code.
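The blessed-arrayref pattern can be sketched as follows.  This is illustrative only: the constant names match this section, but in the real class the index values continue from the base class's members rather than starting at zero, and the constructor delegates to NaturalDocs::Languages::Base.

```perl
package NaturalDocs::Languages::Advanced;

# Illustrative index constants into the blessed arrayref.  The real class
# offsets these past the members inherited from its base class.
use constant TOKENS       => 0;
use constant SCOPE_STACK  => 1;
use constant SCOPE_RECORD => 2;
use constant AUTO_TOPICS  => 3;

sub New
    {
    my ($selfPackage, $name) = @_;
    # $name would normally be handed to the base class constructor.
    my $object = [ [], [], [], undef ];
    bless $object, $selfPackage;
    return $object;
    };

# Accessors simply index into the arrayref.
sub Tokens
    {  return $_[0]->[TOKENS];  };
```

Using constants as indexes keeps the arrayref readable while avoiding hash overhead.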

Functions

New

sub New #(name)

Creates and returns a new object.

Parameters

name - The name of the language.

Tokens

sub Tokens

Returns the tokens found by ParseForCommentsAndTokens().

SetTokens

sub SetTokens #(tokens)

Replaces the tokens.

ClearTokens

sub ClearTokens

Resets the token list.  You may want to do this after parsing is over to save memory.

AutoTopics

sub AutoTopics

Returns the arrayref of automatically generated topics, or undef if none.

AddAutoTopic

sub AddAutoTopic #(topic)

Adds a NaturalDocs::Parser::ParsedTopic to AutoTopics().

ClearAutoTopics

sub ClearAutoTopics

Resets the automatic topic list.  Not necessary if you call ParseForCommentsAndTokens().

ScopeRecord

sub ScopeRecord

Returns an arrayref of NaturalDocs::Languages::Advanced::ScopeChange objects describing how and when the scope changed throughout the file.  There will always be at least one entry, which will be for line 1 with undef as the scope.

Parsing Functions

These functions are good general language building blocks.  Use them to create your language-specific parser.

All functions work on Tokens() and assume it is set by ParseForCommentsAndTokens().

ParseForCommentsAndTokens

sub ParseForCommentsAndTokens #(FileName sourceFile,
string[] lineCommentSymbols,
string[] blockCommentSymbols,
string[] javadocLineCommentSymbols,
string[] javadocBlockCommentSymbols)

Loads the passed file, sends all appropriate comments to NaturalDocs::Parser->OnComment(), and breaks the rest into an arrayref of tokens.  Tokens are defined as

  • All consecutive alphanumeric and underscore characters.
  • All consecutive whitespace.
  • A single line break.  It will always be “\n”; you don’t have to worry about platform differences.
  • A single character not included above, which is usually a symbol.  Multiple consecutive ones each get their own token.

The result will be placed in Tokens().
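The four token rules above can be approximated with a single regex.  This is a hypothetical sketch of the per-line behavior, not the actual implementation:

```perl
# Hypothetical sketch of the tokenization rules: identifier runs, whitespace
# runs, and individual symbol characters each become one token, and a line
# break token is appended afterwards.
sub TokenizeLineSketch
    {
    my ($line) = @_;
    my @tokens;
    while ($line =~ /([a-zA-Z0-9_]+|[ \t]+|.)/g)
        {  push @tokens, $1;  };
    push @tokens, "\n";
    return \@tokens;
    };
```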

Parameters

sourceFile - The source FileName to load and parse.
lineCommentSymbols - An arrayref of symbols that designate line comments, or undef if none.
blockCommentSymbols - An arrayref of symbol pairs that designate multiline comments, or undef if none.  Symbol pairs are designated as two consecutive array entries, the opening symbol appearing first.
javadocLineCommentSymbols - An arrayref of symbols that designate the start of a JavaDoc comment, or undef if none.
javadocBlockCommentSymbols - An arrayref of symbol pairs that designate multiline JavaDoc comments, or undef if none.

PreprocessFile

sub PreprocessFile #(lines)

An overridable function if you’d like to preprocess the file before it goes into ParseForCommentsAndTokens().

Parameters

lines - An arrayref to the file’s lines.  Each line has its line break stripped off, but is otherwise untouched.

TokenizeLine

sub TokenizeLine #(line)

Converts the passed line to tokens as described in ParseForCommentsAndTokens and adds them to Tokens().  Also adds a line break token after it.

TryToSkipString

sub TryToSkipString #(indexRef,
lineNumberRef,
openingDelimiter,
closingDelimiter,
startContentIndexRef,
endContentIndexRef)

If the position is on a string delimiter, moves the position to the token following the closing delimiter, or past the end of the tokens if there is none.  Assumes all other characters are allowed in the string, the delimiter itself is allowed if it’s preceded by a backslash, and line breaks are allowed in the string.

Parameters

indexRef - A reference to the position’s index into Tokens().
lineNumberRef - A reference to the position’s line number.
openingDelimiter - The opening string delimiter, such as a quote or an apostrophe.
closingDelimiter - The closing string delimiter, if different.  If not defined, assumes the same as openingDelimiter.
startContentIndexRef - A reference to a variable in which to store the index of the first token of the string’s content.  May be undef.
endContentIndexRef - A reference to a variable in which to store the index of the end of the string’s content, which is one past the last index of content.  May be undef.

Returns

Whether the position was on the passed delimiter or not.  The index, line number, and content index ref variables will be updated only if true.
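The skipping behavior can be illustrated with a self-contained sketch operating on a plain token arrayref.  The function name is hypothetical, and the actual method also tracks line numbers and the content index refs described above:

```perl
# Illustrative only: if the position is on the opening delimiter, advance
# past the closing delimiter, honoring backslash escapes.  Returns whether
# the position was on the delimiter.
sub TryToSkipStringSketch
    {
    my ($tokens, $indexRef, $delimiter) = @_;
    return 0 unless (($tokens->[$$indexRef] // '') eq $delimiter);
    my $index = $$indexRef + 1;
    while ($index < scalar @$tokens)
        {
        if ($tokens->[$index] eq "\\")
            {  $index += 2;  }   # skip the backslash and the escaped token
        elsif ($tokens->[$index] eq $delimiter)
            {  $index++;  last;  }
        else
            {  $index++;  };
        };
    $$indexRef = $index;
    return 1;
    };
```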

SkipRestOfLine

sub SkipRestOfLine #(indexRef,
lineNumberRef)

Moves the position to the token following the next line break, or past the end of the tokens array if there is none.  Useful for line comments.

Note that it skips blindly.  It assumes there cannot be anything of interest, such as a string delimiter, between the position and the end of the line.

Parameters

indexRef - A reference to the position’s index into Tokens().
lineNumberRef - A reference to the position’s line number.

SkipUntilAfter

sub SkipUntilAfter #(indexRef,
lineNumberRef,
token,
token,
...)

Moves the position to the token following the next occurrence of a particular token sequence, or past the end of the tokens array if it never occurs.  Useful for multiline comments.

Note that it skips blindly.  It assumes there cannot be anything of interest, such as a string delimiter, between the position and the next occurrence of the sequence.

Parameters

indexRef - A reference to the position’s index.
lineNumberRef - A reference to the position’s line number.
token - A token that must be matched.  Can be specified multiple times to match a sequence of tokens.
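For example, a C-style "*/" closer arrives as the two tokens '*' and '/'.  The behavior can be sketched self-contained over a plain token arrayref (illustrative names, not the actual implementation):

```perl
# Illustrative: advance past the next occurrence of @sequence, counting line
# break tokens so the caller's line number stays accurate.
sub SkipUntilAfterSketch
    {
    my ($tokens, $indexRef, $lineNumberRef, @sequence) = @_;
    while ($$indexRef < scalar @$tokens)
        {
        my $matched = 1;
        for (my $i = 0; $i < scalar @sequence; $i++)
            {
            if (($tokens->[$$indexRef + $i] // '') ne $sequence[$i])
                {  $matched = 0;  last;  };
            };
        if ($matched)
            {  $$indexRef += scalar @sequence;  return;  };
        $$lineNumberRef++ if ($tokens->[$$indexRef] eq "\n");
        $$indexRef++;
        };
    };
```

A parser that has just consumed a "/*" opener would call the real method as SkipUntilAfter(\$index, \$lineNumber, '*', '/').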

IsFirstLineToken

sub IsFirstLineToken #(index)

Returns whether the position is at the first token of a line, not including whitespace.

Parameters

index - The index of the position.

IsLastLineToken

sub IsLastLineToken #(index)

Returns whether the position is at the last token of a line, not including whitespace.

Parameters

index - The index of the position.

IsAtSequence

sub IsAtSequence #(index,
token,
token,
...)

Returns whether the position is at a sequence of tokens.

Parameters

index - The index of the position.
token - A token to match.  Specify multiple times to specify the sequence.

IsBackslashed

sub IsBackslashed #(index)

Returns whether the position is after a backslash.

Parameters

index - The index of the position.

Scope Functions

These functions provide a nice scope stack implementation for language-specific parsers to use.  The default implementation makes the following assumptions.

  • Packages completely replace one another rather than concatenating.  If concatenation is the desired behavior, you need to do it manually.
  • Packages inherit, so if a scope level doesn’t set its own, the package is the same as the parent scope’s.
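The inheritance rule can be sketched with a minimal stack in which each level stores its own package, with undef meaning "inherit from the parent level".  Names and representation are illustrative, not the actual implementation:

```perl
# Each entry is [ closingSymbol, package ]; the first entry is the top level.
my @scopeStack = ( [ undef, undef ] );

sub StartScopeSketch
    {
    my ($closingSymbol, $package) = @_;
    push @scopeStack, [ $closingSymbol, $package ];
    };

sub EndScopeSketch
    {  pop @scopeStack if (scalar @scopeStack > 1);  };

# Walk down the stack until a level that set its own package is found.
sub CurrentPackageSketch
    {
    for (my $i = scalar @scopeStack - 1; $i >= 0; $i--)
        {
        return $scopeStack[$i]->[1] if (defined $scopeStack[$i]->[1]);
        };
    return undef;
    };
```

A nested brace scope started with an undef package thus reports its enclosing class as the current package.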

ClearScopeStack

sub ClearScopeStack

Clears the scope stack for a new file.  Not necessary if you call ParseForCommentsAndTokens().

StartScope

sub StartScope #(closingSymbol,
lineNumber,
package)

Records a new scope level.

Parameters

closingSymbol - The closing symbol of the scope.
lineNumber - The line number where the scope begins.
package - The package SymbolString of the scope.  Undef means no change.

EndScope

sub EndScope #(lineNumber)

Records the end of the current scope level.  Note that this is blind; you need to manually check ClosingScopeSymbol() if you need to determine if it is correct to do so.

Parameters

lineNumber - The line number where the scope ends.

ClosingScopeSymbol

sub ClosingScopeSymbol

Returns the symbol that ends the current scope level, or undef if we are at the top level.

CurrentScope

sub CurrentScope

Returns the current calculated scope, or undef if global.  The default implementation just returns CurrentPackage().  This is a separate function because C++ may need to track namespaces and classes separately, and so the current scope would be a concatenation of them.

CurrentPackage

sub CurrentPackage

Returns the current calculated package or class, or undef if none.

SetPackage

sub SetPackage #(package,
lineNumber)

Sets the package for the current scope level.

Parameters

package - The new package SymbolString.
lineNumber - The line number the new package starts on.

CurrentUsing

sub CurrentUsing

Returns the current calculated arrayref of SymbolStrings from Using statements, or undef if none.

AddUsing

sub AddUsing #(using)

Adds a Using SymbolString to the current scope.

Support Functions

AddToScopeRecord

sub AddToScopeRecord #(newScope,
lineNumber)

Adds a change to the scope record, condensing unnecessary entries.

Parameters

newScope - What the scope SymbolString changed to.
lineNumber - Where the scope changed.

CreateString

sub CreateString #(startIndex,
endIndex)

Converts the specified tokens into a string and returns it.

Parameters

startIndex - The starting index to convert.
endIndex - The ending index, which is not inclusive.

Returns

The string.
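Given the token definition used throughout this class, this is essentially a join over a slice of the token array.  A hypothetical standalone equivalent:

```perl
# Illustrative: concatenate tokens from startIndex up to, but not including,
# endIndex.
sub CreateStringSketch
    {
    my ($tokens, $startIndex, $endIndex) = @_;
    return join('', @{$tokens}[$startIndex .. $endIndex - 1]);
    };
```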
