News
Happy holidays. I've fallen for the planning fallacy once again, and until a couple of days ago, I hadn't worked on FnParse at all.
Today, I pushed both FnParse 2.2.7 and FnParse 3.α.4 to Clojars. If you use Clojars or Leiningen, this might be useful for you. Note that the Clojars website only shows the last pushed version (2.2.7), but I think both are actually available via Leiningen.
I want to start working on FnParse 3's documentation again soon and get FnParse to version 3.0.0. See the update from 2010-09-16 right below for the checklist of things that I need to do. I'm not going to give any timeframe this time except that I give 2:1 odds that I'll be done before May 2011. :)
Revised to-do list for FnParse 3.0.0. I won't be able to work on this for a few days.
- (Cancelled.) Add a `rule-term` rule-maker that makes a rule that tries to match a rule to a given state.
- (Cancelled.) Rewrite the number parts of `clojure.clj` to use `rule-term`.
- JSON example: Add `read-string` function. Add numbers.
- Examples: Change style of other example parsers.
- Hound: Add the new rule-makers `maker-rep` and `hooked-maker-rep` (to allow rule-makers to be repeated and pass arguments to each other without the stack overflows that you'd get with right recursion).
- (Cancelled.) Cat and Hound: Contexts are changing to be immutable. Remove `alter-context`. Rename contexts to "environs". If you are mutating contexts, rewrite your rules to use rule-makers instead (don't worry if you're having trouble; contact me, and after I finish the rest of the planned work above I'll be happy to help).
- Cat and Hound: Expose the atoms used by memoizing rules.
- Cat and Hound: Rewrite the rest of documentation.
A couple of days ago I started working on FnParse 3 again. I've done the following work.
- Reimplemented all rules and rule-makers to use Clojure records implementing the protocol `Rule`.
- Rule-makers can now be mutually recursive to each other.
- Rewrote the `clojure.clj` Clojure-in-Clojure parser. (From now on, user-defined rule-makers are reverse-bracketed like `>this<`.)
- Started rewriting documentation, starting from `fnparse.core`.
- Renamed `when` to `demand` because its name was confusing.
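To illustrate the first item, here is a minimal sketch of what rules as records implementing a protocol can look like. The method name `apply-rule`, the record names `LitRule` and `AltRule`, and the shape of the result map are assumptions made for this example; they are not taken from FnParse's actual API.

```clojure
;; Hypothetical sketch: none of these names come from FnParse itself.
(defprotocol Rule
  (apply-rule [rule tokens]
    "Tries to match rule at the front of tokens. Returns a map with
    :value and :remainder on success, or nil on failure."))

;; Matches exactly one given token.
(defrecord LitRule [token]
  Rule
  (apply-rule [_ tokens]
    (when (and (seq tokens) (= (first tokens) token))
      {:value token, :remainder (rest tokens)})))

;; Tries each subrule in order and returns the first success.
(defrecord AltRule [rules]
  Rule
  (apply-rule [_ tokens]
    (some #(apply-rule % tokens) rules)))

(def a-or-b (->AltRule [(->LitRule \a) (->LitRule \b)]))

(apply-rule a-or-b "bc") ; matches \b and leaves (\c) as the remainder
(apply-rule a-or-b "xy") ; nil, since neither alternative matches
```

Because every rule is a plain record, rules print readably and can carry extra fields, which is one plausible reason to prefer records over bare functions.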
Planned work before FnParse 3.0.0 is below.
- Add a `rule-term` rule-maker that makes a rule that tries to match a rule to a given state.
- Rewrite the number parts of `clojure.clj` to use `rule-term`.
- Rewrite the rest of documentation.
- Change style of other example parsers.
- Contexts are changing to be immutable. Remove `alter-context`. Rename contexts to "environs". If you are mutating contexts, rewrite your rules to use rule-makers instead (don't worry if you're having trouble; contact me, and after I finish the rest of the planned work above I'll be happy to help).
I have, as far as I can tell, finished the antepenultimate step of FnParse 3, implementing code; all that's left is documentation.
I'm waiting for Autodoc to update to Clojure 1.2; I'm having difficulty running it, and I don't really want to hack it myself. :(
Once I generate Autodoc documentation, beta testing will start. The API is pretty stable at this point, but will not completely freeze until the first post-beta version, 3.0.0. Once I start beta testing, I'll continue to improve FnParse's documentation and perhaps write a tutorial (though I'm considering using simply the sample libraries .json and .clojure). But I also want to restart clojure-yaml, the library for which I wrote FnParse in the first place.
It's been a year and a half since I started FnParse. I hope FnParse 3's new features (packrat parsing for Cat, LL(1) enforcement for Hound, and advanced error processing and documentation tools for both Cat and Hound) will be worth the wait.
In the past month, I’ve implemented the following:
- Rule names are now generally surrounded by angle brackets: < and >.
- Good default printing functions for the `parse` functions (now renamed to `match`).
- A general find function to find all occurrences of a rule in a sequence, and general substitute and substitute-1 functions akin to regexes’ substitutions. (On a suggestion from ath, http://github.com/ath.)
- Fancy documentation macros for both rules and rule-makers
- Documentation strings for the library
- Rewriting the JSON parser to use FnParse Hound
- Splitting the Clojure parser into pure-Clojure and impure-Clojure versions
When will I be done? No idea—my work is unpredictable. But I’m chugging along. When I release the first beta version, the API will be frozen…maybe in a couple of months.
Whoo-wee. I've been busy, busy with work, but I've found time to work on FnParse 3 recently.
The following information is subject to change; if you're reading this in the future, don't rely on this news.
There is a big change for those of you paying attention to FnParse 3, Cat and Hound: I'm changing the naming style, a lot. I've decided (for now) that, as per the currently recommended Clojure development guidelines, FnParse will focus on shortening names, sometimes to the point of sharing names with clojure.core. This is like the new clojure.contrib.string (formerly clojure.contrib.str-utils3, formerly clojure.contrib.str-utils2): you now have to use require rather than use. Compare the old style with the new style:
```clojure
;; Old style (FnParse 2): every name is pulled into the namespace with use.
(use 'name.choi.joshua.fnparse)

;; Maps the character after a backslash to the character it stands for.
(def escape-codes {\n \newline, \t \tab, \\ \\, \" \"})

(def escaped-char-rule
  (complex [_ (lit \\), code anything]
    (escape-codes code)))

(def string-char-rule
  (alt escaped-char-rule anything))
```

```clojure
;; New style (FnParse 3 Hound): names are reached through an alias.
(require '[name.choi.joshua.fnparse.hound :as r]) ; r is for "rule", but it doesn't matter

(def escape-codes {\n \newline, \t \tab, \\ \\, \" \"})

(def escaped-char_
  (r/for [_ (r/lit \\), code r/anything_]
    (escape-codes code)))

(def string-char_
  (r/+ escaped-char_ r/anything_))
```
In addition, rule-makers like conc, alt, and complex have been renamed to shorter things like cat, +, and for. As with clojure.contrib.string, whenever a shorter name masks one belonging to clojure.core, I try to make the semantics and parameters similar.
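The same aliasing pattern can be seen with `clojure.string`, which ships with Clojure 1.2 and later and also reuses some `clojure.core` names. This is only a small sketch of why `require` with an alias keeps both sets of names usable; the alias `s` is an arbitrary choice.

```clojure
(require '[clojure.string :as s])

;; clojure.string/reverse takes a string and returns a string...
(def reversed-string (s/reverse "abc"))

;; ...while clojure.core/reverse takes any seqable and returns a seq.
(def reversed-seq (reverse "abc"))

;; Likewise, clojure.string/replace and clojure.core/replace coexist.
(def dashed (s/replace "a b c" " " "-"))
(def swapped (replace {\a \x} [\a \b \a]))
```

Had the namespace been pulled in with `use` instead, the string versions would have collided with the core functions of the same name.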
I am well aware what this means for backwards compatibility: it is a big headache for those of you who already are familiar with FnParse 1 or 2. I understand that this is a hassle to learn. But I would like to stress that if you already have libraries in FnParse 1 or 2, you should not attempt to just switch to FnParse 3 (Cat or Hound) without a manual rewrite of your parser code. I've accepted the previous sentence's consequences since I decided to split FnParse, because even if all the functions' APIs remained the same, you would still need to pay attention to the differences between Cat and Hound.
It may not be worth rewriting your existing code into FnParse Cat 3 or FnParse Hound 3. It may not even be useful learning FnParse 3 at all. But Cat and Hound both should be at least slightly faster to much faster than vanilla FnParse 2: Cat uses packrat parsing and Hound uses delays to prevent a certain memory leak as per Parsec. It may be worth learning the new API and using it for new parsers. It may even be worth rewriting old parsers, maybe.
If you think that the new API syntax is ugly, I'm still looking for better ideas. Please contact me if so. I think, though, that in the long run this will be better for maintainability of your parser code.
It would also be lovely if I could have some pointers on optimization of Clojure code; I am unfamiliar with Clojure or JVM code profiling or optimization, and when I finalize the libraries, I need to ensure that the libraries aren't wasting time and energy.
I've been chugging along. FnParse Cat and Hound are both working well. For the past half month, I've been writing a Clojure parser for Hound to showcase how it works, as well as to write what is probably the first Clojure reader in Clojure. I've also discovered a lot of things about Clojure's syntax along the way. Next stop, Python! Just kidding; that's for later, if ever.
I've been busy and will be busy for this week, so expect no commits this week. I'm excited, though, because the cores of Cat and Hound are just about finished, with no currently apparent bugs. I just hope that both of them are faster than FnParse 2...but optimization is for later.
Today, I've finally finished the memoization algorithm for both direct and indirect left recursion, which FnParse Cat will support. It's slow right now, but optimization comes later. Things should be pretty breezy from here.
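For readers unfamiliar with the idea, here is a minimal sketch of the plain memoization that packrat parsing builds on. It deliberately does not handle left recursion (a left-recursive rule would recurse without terminating here), and all of the names (`memo-rule`, `lit-rule`, `cat-rule`, `alt-rule`) and the result shape are assumptions for illustration, not FnParse Cat's API.

```clojure
;; Hypothetical sketch, not FnParse Cat's implementation.
;; A rule is a function of [input pos memo] that returns
;; {:value v, :pos next-pos} on success or nil on failure.

(defn memo-rule
  "Wraps rule so that it runs at most once per input position.
  memo is an atom holding a map from [rule-id pos] to the cached result."
  [rule-id rule]
  (fn [input pos memo]
    (let [k [rule-id pos]]
      (if (contains? @memo k)
        (get @memo k) ; cache hit, which may be a cached failure (nil)
        (let [result (rule input pos memo)]
          (swap! memo assoc k result)
          result)))))

(defn lit-rule
  "Matches the single character c."
  [c]
  (fn [input pos _]
    (when (= (get input pos) c)
      {:value c, :pos (inc pos)})))

(defn cat-rule
  "Matches all rules in sequence, collecting their values in a vector."
  [& rules]
  (fn [input pos memo]
    (loop [rs rules, p pos, vs []]
      (if (empty? rs)
        {:value vs, :pos p}
        (when-let [r ((first rs) input p memo)]
          (recur (rest rs) (:pos r) (conj vs (:value r))))))))

(defn alt-rule
  "Tries each rule at the same position, backtracking on failure."
  [& rules]
  (fn [input pos memo]
    (some #(% input pos memo) rules)))

;; Counts how often the body of the digit rule actually runs.
(def digit-calls (atom 0))

(def digit
  (memo-rule :digit
    (fn [input pos _]
      (swap! digit-calls inc)
      (when-let [c (get input pos)]
        (when (contains? (set "0123456789") c)
          {:value c, :pos (inc pos)})))))

;; Both alternatives start with digit at the same position.
(def signed-digit
  (alt-rule (cat-rule digit (lit-rule \+))
            (cat-rule digit (lit-rule \-))))

(def result (signed-digit "1-" 0 (atom {})))
```

When the first alternative fails on `\+`, the second alternative asks for `digit` at position 0 again and gets the cached result, so the body of `digit` runs only once. The memo table grows with the number of rule and position pairs, which is where a packrat parser's memory cost comes from.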
I've decided after asking people on Clojure's IRC channel (if you use Mac OS X, Colloquy is an excellent IRC client) to keep FnParse Cat and FnParse Hound in the same repository: this one. I will be working on FnParse Cat, then FnParse Hound, then documentation.
A quick update—like I said, I'll start programming again on December 18 or 19, when the bulk of my work is done. But importantly, I've come up with two names for the two future successors of FnParse: FnParse Cat and FnParse Hound.
"Cat" comes from the fact that the first library, being a packrat parser, is an unlimitedly backtracking parser and is going to leap back and forth between tokens, like a cat would.
"Hound" comes from the fact that the second library, being a mostly-LL(1) PEG parser, is going to hound the end of the tokens (usually) without looking backwards, like a dog would.
Yeah. Like I warned below, they are indeed pretty silly names. (If you have better names, feel free to contact me before I create their repositories on GitHub!)
I'm swamped with work right now, and I've committed to not program (which I do for fun) until I finish my work—which won't be until December 18 or so.
But I've still been researching parsers, and I've decided to split FnParse into two libraries. The first will require more space, for complex grammars. The second will require little space, for simpler grammars. Thus, there will be no FnParse 3, but FnParse 2 will continue to be maintained for bugs.
First, an unlimited-lookahead packrat parser that fully supports left recursion (direct and indirect). It will be intended for scenarios with very complex syntaxes (like, say, the Java Programming Language, but not C, which is very context-dependent). Its disadvantages, being a packrat parser, are that it will use a lot of memory, linearly proportional to the size of the input string, and that, while it can manage state, it can't manage as much state as the C Language requires, or else it will slow down. For inspiration I am looking at Bryan Ford's original packrat parser thesis and "Packrat Parsers Can Support Left Recursion".
Second, a full-fledged Parsec-like restricted-lookahead PEG parser that does not hold onto the token sequence's head. It will be intended for scenarios with simpler grammars, preferably LL(k) languages with low values of k, like Clojure, other Lisps, YAML, JSON, and XML. Its disadvantages are that it does not support left recursion (yet?) and that it has limited lookahead, which means you'll have to explicitly mark lexical rules. It will differ from the current FnParse more than the packrat parser will. For inspiration I am looking at the original Parsec paper.
I am still coming up with names for both libraries that sufficiently distinguish them. Like "FnParse", they'll probably be pretty stupid.
I am currently writing a backwardly incompatible version of FnParse, FnParse 3. It should be done by the end of the month. You can still totally use the latest FnParse 2—I'm going to make it really easy to migrate. The point of FnParse 3 is to make certain things having to do with states easier, especially for new people. The earlier I do this, the better, I think.