pofile does not process reference comments in any way, since the format of
references is not exactly specified. This test specifies what users of
pofile can expect the library to do.

Writing messages should now be in line with the gettext tools. I tested
using msgcat; it produces the same results.
For some common use cases I wrote explicit tests; for uncommon and
even unwanted use cases I wrote one test to make sure pofile works
like msgcat for those messages.

In Item.toString, all \n characters were removed from the output.
The gettext tools, however, leave those characters intact. With this
change pofile produces the same output as tools like msgcat.

An incompatible change (one that actually breaks po parsing after writing)
was introduced with commit e164fcfe9d6f28cd3d452f4de274a41663160cef. If
_process returned an array (which is the case for strings containing a \n
character), array.toString returned a comma-separated list, which is not
valid po syntax. Added a test to restore the behaviour from before commit
e164fcfe9d6f28cd3d452f4de274a41663160cef.
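
A minimal sketch of the failure mode described above and one way to avoid it. The helper name formatField is hypothetical and is not pofile's actual _process implementation; it only illustrates that the array has to be joined line by line rather than coerced to a string. The multi-line form it emits is valid po syntax, but it is not claimed to match msgcat byte for byte.

```javascript
// Hypothetical helper (an assumption for illustration, not pofile's code):
// turn a keyword and a text into one or more lines of po syntax.
function formatField(keyword, text) {
  const parts = text.split('\n');
  if (parts.length === 1) {
    return [keyword + ' "' + text + '"'];
  }
  // Multi-line form: empty first string, then one quoted line per segment.
  const lines = [keyword + ' ""'];
  parts.forEach(function (part, i) {
    const isLast = i === parts.length - 1;
    if (isLast && part === '') return; // the text ended with a newline
    lines.push('"' + part + (isLast ? '' : '\\n') + '"');
  });
  return lines;
}

// Implicit Array.prototype.toString puts a comma between the elements,
// which is the kind of invalid output the commit message describes.
const broken = 'msgid "' + ['first\\n', 'second'] + '"';

// Joining the produced lines with real newlines keeps the output parseable.
const fixed = formatField('msgid', 'first\nsecond').join('\n');
```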

Since the lines in the parser have all newline characters removed, \s+ will
not match empty comments.
Added an example that makes other tests fail without this patch.
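
A small sketch of why \s+ misses empty comments once the newline has been stripped. Both patterns are illustrative assumptions; the exact expressions used by the parser are not quoted in this message.

```javascript
// Illustrative patterns for spotting translator comments (assumptions).
const requiresSpace = /^#\s+/;      // needs whitespace after '#'
const allowsEmpty = /^#(?:$|\s+)/;  // also accepts a bare '#'

// Splitting on '\n' removes the newline, so an empty comment is just '#'.
const lines = '# first\n#\n#: src/app.js:12'.split('\n');

const before = lines.filter((line) => requiresSpace.test(line));
const after = lines.filter((line) => allowsEmpty.test(line));
// The reference comment ('#: ...') is matched by neither pattern.
```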

The current implementation did not allow "plain" comments on items marked
obsolete. However, such comments are perfectly fine according to
the original gettext tools. When writing a po file, comments for obsolete
items don't carry the '#~ ' mark (tested using msgcat), so writing is now
also aligned with the behaviour of the original gettext tools.
For all these cases I added examples to the po files that failed with the
current implementation and work fine after these changes.
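
A rough sketch of the writing rule described above: the '#~ ' mark goes on the keyword lines of an obsolete item, but not on its plain comments. The item shape and the writeItem function are hypothetical and much simpler than pofile's Item class.

```javascript
// Hypothetical, simplified item writer (not pofile's Item.toString).
function writeItem(item) {
  const mark = item.obsolete ? '#~ ' : '';
  const lines = [];
  for (const comment of item.comments) {
    lines.push('# ' + comment); // plain comments never get the '#~ ' mark
  }
  lines.push(mark + 'msgid "' + item.msgid + '"');
  lines.push(mark + 'msgstr "' + item.msgstr + '"');
  return lines.join('\n');
}

const obsoleteOutput = writeItem({
  obsolete: true,
  comments: ['no longer used'],
  msgid: 'Old label',
  msgstr: 'Stara etykieta'
});
```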

Some languages (such as Polish, Russian or Romanian) have more
complicated plural forms. Those are still expressible by a more
complicated mathematical expression. However, the msgmerge tool of
gettext will in these cases write multiline header fields. When parsing
such files with this lib, the headers get garbled, so this patch
provides an example (from a pl_PL po file) and fixes this by joining the
lines in the header before doing the actual parsing.
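
A minimal sketch of the joining step, assuming the header arrives as an array of quoted-line contents in which a completed header field ends with a literal backslash-n. The function name joinHeaderLines and this input shape are assumptions for illustration, not the patch itself.

```javascript
// Hypothetical helper: merge wrapped header lines until a line ends with
// the two-character sequence backslash + 'n', which closes a header field.
function joinHeaderLines(lines) {
  const joined = [];
  let pending = '';
  for (const line of lines) {
    pending += line;
    if (pending.endsWith('\\n')) {
      joined.push(pending.slice(0, -2)); // drop the trailing backslash-n
      pending = '';
    }
  }
  if (pending !== '') joined.push(pending); // unterminated last field
  return joined;
}

// A wrapped Plural-Forms field in the style msgmerge writes for Polish.
const headers = joinHeaderLines([
  'Language: pl\\n',
  'Plural-Forms: nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 ',
  '|| n%100>=20) ? 1 : 2);\\n'
]);
```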

During PO.parse, an extract(string) method is called on each string to
unescape some characters (like " and \). This process should be reversed
in the toString method.
The PO spec says that all strings should be C strings. Otherwise tools
like msgmerge (from the gettext package) will fail to parse po files written
by this library.
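
A sketch of the round trip this implies, limited to the two characters named above. The function names escapeString and unescapeString are hypothetical; pofile's extract method may handle more cases than shown here.

```javascript
// Hypothetical helpers covering only '"' and '\' (an assumption).

// Reading: drop the backslash in front of an escaped quote or backslash.
// A single pass avoids unescaping the same character twice.
function unescapeString(s) {
  return s.replace(/\\(["\\])/g, '$1');
}

// Writing: put a backslash in front of every quote and backslash, so the
// result is a valid C-style string body again.
function escapeString(s) {
  return s.replace(/["\\]/g, '\\$&');
}

const raw = 'He said "hi" and typed C:\\temp';
const written = escapeString(raw);
const readBack = unescapeString(written);
```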

I added a few more edge cases to the tests for the new msgctxt field and
found a bug while doing so. The default value had been set to the empty
string, but should have been null, since the spec says this field is
optional.
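
A tiny sketch of why the default matters when writing: with null as the default, an item without a context produces no msgctxt line at all, while an explicitly empty context can still be written. The contextLines helper is hypothetical and not part of pofile.

```javascript
// Hypothetical helper: emit a msgctxt line only when a context is set.
function contextLines(msgctxt) {
  if (msgctxt === null || msgctxt === undefined) {
    return []; // optional field: nothing is written
  }
  return ['msgctxt "' + msgctxt + '"'];
}

const withoutContext = contextLines(null);
const emptyContext = contextLines('');
const menuContext = contextLines('menu');
```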