Talk:MediaWiki lexer
Thanks for your work!
With a real parser of a clean syntax a Wikipedia DTD will be no problem! :-) And there will be wikipedia in RDF, too... (just dreaming :-) --Nichtich 00:19 29 May 2003 (UTC)
- It isn't going to implement any "clean syntax"; rather, it tries to stay compatible with the current one, fixing only the things that break generation of correct XHTML.
- It's not finished, and I'm not completely sure if it will work that way.
Taw 08:49 29 May 2003 (UTC)
What does this mean? -Smack
let anything = ['\000'-'\255']
We use Unicode, don't we?
Btw: any parser will define a "cleaner" syntax than the current one (which is no parser at all). If you are able to generate correct XHTML, you are also able to generate other XML formats. --Nichtich 12:42 12 Jun 2003 (UTC)
['\000'-'\255'] means any byte from 0 to 255 (the escapes are decimal, yeah, OCaml rox0rz here !!!). It will work with any ASCII-compatible encoding (ISO 8859, UTF-8, ISO 2022, EUC, etc.). Taw 03:27 3 Aug 2003 (UTC)
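To make the byte-level argument concrete, here is a minimal sketch in plain OCaml. The names sample, is_ascii, count_non_ascii and bracket_offsets are made up for this illustration and are not part of the lexer. It assumes UTF-8 input and shows that the bytes of a multi-byte character are all above 127, so an ASCII delimiter such as '[' can be located byte by byte without decoding anything.

```ocaml
(* Hypothetical sample: "é" in UTF-8 is the two bytes 0xC3 0xA9. *)
let sample = "caf\xc3\xa9 [[link]]"

(* True if the byte is plain ASCII (0..127). *)
let is_ascii (c : char) : bool = Char.code c < 128

(* Count the bytes that belong to multi-byte UTF-8 sequences. *)
let count_non_ascii (s : string) : int =
  let n = ref 0 in
  String.iter (fun c -> if not (is_ascii c) then incr n) s;
  !n

(* Byte offsets of every '[' in s, found without any decoding. *)
let bracket_offsets (s : string) : int list =
  let acc = ref [] in
  String.iteri (fun i c -> if c = '[' then acc := i :: !acc) s;
  List.rev !acc

let () =
  Printf.printf "bytes: %d, non-ASCII bytes: %d\n"
    (String.length sample) (count_non_ascii sample);
  List.iter (Printf.printf "'[' at byte %d\n") (bracket_offsets sample)
```

The offsets are byte offsets, not character offsets. That is enough for splitting markup, but anything that counts characters would still have to decode the text.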
anything_but_close_ ...
Are these "anything_but_close_math" etc. really necessary? In most regexp implementations, there exist "nongreedy" wildcards.
E.g., <math>.*</math> will consume all the text it can get, but <math>.*?</math> will only consume up to the first </math>.
-- Stw 14:19, 25 May 2004 (UTC)
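The difference between the two behaviours can be sketched in plain OCaml without any regexp library. The helpers find_from, find_last_from and math_body are hypothetical names for this illustration only; the lexer itself does not contain them. The non-greedy variant stops at the nearest </math>, the greedy one at the farthest.

```ocaml
(* Index of the first occurrence of sub in s at or after position from. *)
let find_from (s : string) (from : int) (sub : string) : int option =
  let n = String.length s and m = String.length sub in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = sub then Some i
    else go (i + 1)
  in
  go from

(* Index of the last occurrence of sub in s at or after position from. *)
let find_last_from (s : string) (from : int) (sub : string) : int option =
  let n = String.length s and m = String.length sub in
  let rec go i =
    if i < from then None
    else if String.sub s i m = sub then Some i
    else go (i - 1)
  in
  go (n - m)

let open_tag = "<math>"
let close_tag = "</math>"

(* Body of the first <math> element. With ~greedy:true the body runs to the
   last closing tag, with ~greedy:false to the first one. *)
let math_body ~(greedy : bool) (s : string) : string option =
  match find_from s 0 open_tag with
  | None -> None
  | Some i ->
    let start = i + String.length open_tag in
    let stop =
      if greedy then find_last_from s start close_tag
      else find_from s start close_tag
    in
    (match stop with
     | None -> None
     | Some j -> Some (String.sub s start (j - start)))
```

For the input "<math>a</math> text <math>b</math>", the non-greedy call returns the body "a", while the greedy call swallows everything up to the second closing tag. A lexer generator that only does longest-match behaves like the greedy case, which may be why explicit "anything but close" patterns were written out.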
What about ANTLR?
Is there anybody out there who is using ANTLR as a lexer and parser? I am trying to parse MediaWiki syntax into a Java structure. The long-term goal would be to make it possible to generate different output formats (DocBook, HTML as well, LaTeX, plain ASCII text and so on). But it is really hard to build a grammar file for MediaWiki. At the moment I have a (who would have guessed?) incomplete version based on the EBNF from these pages. But there are some difficulties in translating EBNF into an ANTLR .g file. I suspect that the EBNF would not be so helpful in practice.
- It seems to me (right now; I will probably change my point of view sometime later) that LaTeX generation does not require a full-featured parser.
- In any case, the current lexer description looks very nice, great work. --VictorAnyakin 09:23, 24 November 2006 (UTC)