Org parser 1.0

2014-04-07 · 980 words · 5 minute read

nononsensenotes

So step 1 of the process to a proper Android client for org-mode is more or less complete. The library can be found on GitHub. It’s a plain old java library that builds a jar file which can be included in other projects. It has one requirement which is included in the lib-folder: Joda Time.

Coding this up I really came to hate the built in Date and Calendar classes in Java. Not that I ever liked them but previously I merely disliked them. I’d tried to stay with the built in classes to cut down on external dependencies but the final straw was when I was coding up the repetition of timestamps. Having a java.util.Date object with a valid date, I then set the hour and minute on it. This changed the year of the Date object. The fucking YEAR! How did anyone think that was derirable behaviour? It was bad enough keeping the timezones straight when they were irrelevant, but having the shit go time traveling was just unacceptable. I’m a bit late to this realization but Joda time is dates and times done right.

Another snag was when I realized that Android does not support named groups in regular expressions. That’s apparently way too modern. This was atleast easily fixed by changing the group constants from Strings to Ints and corresponding group index instead of name. A valuable lesson though is that it is way easier to develop the patterns using group names and then change to using indices when the tests pass.

And testing time objects is a challenge by itself. Thankfully I had embraced Joda time at this point or I’d probably stuck a knife in my brain by now. See testing time repetition suddenly becomes dependent on when you run the tests. If a timestamp is supposed to repeat itself one hour in the future (.+1h), the basic unit test is perhaps:

now.HourOfDay() < new.HourOfDay();

Except of course, if you run the test between 23:00 and 23:59. Same story with monthly repeaters, which I happened to run on February 28. There are way more if-conditions in the test code than I’m satisfied with…

Let’s talk about how the parsing works.

Nodes

A file is made up of one or several nodes. A node has a header and a body. A simple example would be:

* The header starts with one or more stars
The body follows

An OrgFile is a subclass of OrgNode. As a special case, an OrgFile does not have a header but it does have a body, which is possibly empty. It’s not unusual to put comments at the top of the file with for example file-wide tags. An example of that would be:

# Anything put here will be placed in the body of the file
#+TAGS: noexport draft

* Header of a node
Body of node

Nodes are placed as subnodes of the file. A node might itself have some subnodes:

* Root node
  Body of root
  ** Subnode 1
     Body of sub1
  ** Subnode 2
     Body of sub2

When a file is parsed a recursive structure of nodes is created. The root is always an OrgFile object. You parse a file by calling the static createFrom method of the OrgFile class:

OrgFile orgFile = OrgFile.createFrom("/path/to/file.org");

Header parts

A header consists of several parts and they are all available individually in the OrgNode object. A complete example is:

* TODO Title of header     :tag1:tag2:

By default TODO and DONE are included as TODO-keywords and they can not be removed. There is support for adding additional TODO-keywords during parsing.

Note that a priority is currently not parsed.

Body parts

A body consists of more than the raw text. Just as a file might have some commented properties, so might a subnode. It might also contain various kinds of timestamps. As a restriction, these fields may only occur before the body proper. Empty lines are ignored before the body (unless they are connected). The order of comments and timestamps (and empty lines) does not matter. The order when written back to file is: comments, timestamps, the body proper. Some examples:

* Minimal example
# A leading comment followed by a timestamp
<2013-12-24>
Body text starts here.

* Empty lines will be ignored

# Comment

<2014-12-13>

Body will include the previous empty line.

* Order of comments, timestamps and empty lines does not matter
DEADLINE: <2013-12-12>
#+TAGS: bob alice
SCHEDULED: <2013-02-12>

# Another comment
Body starts here

Timestamps

A lot of the code deals with handling the timestamps and their repetitions. A minimal example timestamp is:

<2014-02-28>

A maximum example timestamp is:

DEADLINE: <2014-02-28 Fri 09:00-12:00 +4w -1h>

This would define the following properties of an OrgTimestamp, where only date is mandatory.

type	DEADLINE
date	2014-02-28
time	09:00
endtime	12:00
repeat	+4w
warning	-1h

The date and time are stored as a Joda LocalDateTime. The method hasTime returns true if the time is set. If not, only the date should be used.

The majority of the code deals with the repeater part of the timestamp. If this is set, the methods toNextRepeat, getNextRepeat and getNextFutureRepetition can be used. toNextRepeat deals with the three different types of repeater: “+”, “~~+” and “.~~”.

+ moves one step.
++ moves forward as many steps as required to get it into the future.
.+ moves forward one step from today, as opposed to whatever date currently is.

Durations

A special type of timestamp (though not a subclass of OrgTimestamp) is a duration, represented by OrgTimestampRange. Two examples of durations are:

<2014-01-01>--<2014-01-02>

<2014-01-01 Tue 09:00>--<2014-01-02 Wed 17:00>

Only the dates are mandatory. This type of timestamp does not support repeating or warnings at this time.

Screenshots

Some example screenshots from a basic test app, just to make sure the parsing worked OK (which named pattern groups did not).

List of files

Reading an org file

Next challenge will be to integrate this into NoNonsenseNotes.