Building on our earlier fix pointing this message's line number to the get_hit_sound, it would be even more useful to have the line number of the DEFENSE_ANIM as well. To get this, we change the has_defense_anim variable's value from False to None, and True to i. It is possible that a unit might have two macros for male and female variants, in which case we will assume that the get_hit_sound is most likely associated with the first.
Of course, during 1.5, this macro was renamed to {*NAMED_*LOYAL_UNIT}, but we will stick to upgrading to "1.4", and worry about changing LOYAL_UNIT to NAMED_LOYAL_UNIT in the current wmllint.
The basic "if '{UNIT '" condition is more efficient than subjecting every line to a complex regex. However, it would be theoretically possible for a matching line to fail the substitution. Thus, I used subn() instead of sub(), and only report an upgrade to stdout if there is at least one substitution. In the hypothetical case that no substitution is carried out, I alert the user so they can look into it.
The regular expression looks intimidating, so here's an explanation:
field 1: unit type - The authors of this era seem to have been pretty good about enclosing fields in parentheses, but this part of the regex accounts for all three possibilities: a) parentheses used to enclose; b) quotes used to enclose; c) no enclosure, thus ending with the next space.
field 2: id - Basically a clone of the first field.
field 3: name - Clone of the first two fields, except for allowing a translability underscore in the last two cases (though early UMC seems to all follow the practice of including the translatability underscore with the rest of the name inside parentheses.)
field 4: side - We can expect this to be a number. Old add-ons should all be using single digits, but I allow more than one digit to match anyway. In theory, there could be cases which would break the regex (e.g., enclosing the number in parentheses, a macro substitution), but since I don't know of any, I'll just call it a day.
field 5: x coordinate - Usually this will be a number, but it could also be a variable or a macro substitution.
field 6: y coordinate - Clone of field 6, except with y/Y substituted for x/X.
(A side note: after testing this commit, I noticed that the introduction to hack_syntax called for "set[ting] modcount to nonzero" when modifying lines, there are "modcount += 1" lines throughout the animation transformations, and hack_syntax ends with "return (lines, modcount)". Looking into it, however: a) my code was working fine without it, with no change detected after I tested inserting "modcount += 1"; b) I never figured out just what use wmllint was making of the modcount, despite all the care to increment it upwards; c) the last suite in hack_syntax doesn't use modcount, either, and it turns out to have been written by ESR himself in February 2008, after all the other code.)
The first message has a couple of problems. Technically, get_hit_sound is not a tag, and there is a stray quote mark at the end. Also, i+1 points to the line number of the [/unit] tag, which is not particularly helpful information. This can be changed to point to the line of the get_hit_sound attribute.
For the second message, the %d get_hit_sound is an index position, so +1 for the line number.
In Linux, many 1.2 unit files would crash wmllint, with tracebacks pointing to the "assert male/female_end != -1" line. Male/female_end's value is set to -1, and when it does not meet the condition for converting to i (line index position), the assert statement fails. The "assert male_end" error crashes files with gender=male, or no gender= key (thus defaulting to male). The "assert female_end" error is the female counterpart, and also covers units with both genders.
I found that after commenting out these assert statements, wmllint no longer barfed on those files. Studying the problem for this commit, however, I saw that "endswith()" included a newline. Could it simply be choking on DOS carriage returns? Doing a dryrun in Windows, which defaults to universal newlines support, I did not get the crashes. Change to binary mode, the crashes returned. Insert rstrip() and delete the newlines, and the crashes stop!
These portraits were moved prior to 1.1.9. That was before ESR joined Wesnoth development in April 2007, which may explain why wmllint didn't cover this change. Nevertheless, even many 1.2 campaigns still have the old "portraits/core" filepaths.
These old paths also keep post-1.4 wmllint from updating portrait paths to their current location, after they were moved again in 1.5.9.
I noted last time that my "fix" could not cover all possibilities. After further thought, I decided that the best thing to do in the hard cases is to sys.exit, and give users a clear explanation in stderr so they can re-enter their paths correctly. I may have gotten carried away, but given that many users nowadays are unfamiliar with the command line (even moreso on Windows), I wanted to give them plenty of hand-holding.
I looked up this issue, and it turns out to be a Windows shell problem after all, not Python, which surprised me.
After more testing, I realized that I did not take into account the possibility that the wildcard pattern would not match anything. In that case, the following 'if not arguments' clause would run wmllint on the entire current directory - which could very well be something that you do not want!
Although the original purpose of the in_textdomain and in_binary_path code, an aborted effort to update their paths to "data/add-ons/", has been superseded by code that updates those paths on all lines, it can still be put to use.
Our first step is to move that section below the code that updated UMC paths, so our regexes won't have to deal with "@campaigns" and "data/campaigns" strings. Then we delete the 'if 0:' line that was neutralizing this section, as well as the obsolete path-changing code. The rest is de-indented one level.
Then we look for the use of "~" for userdata, which does not work for textdomains and binary paths.
Our regex object, 'tilde', is constituted thusly: (1) We make sure that the line starts with the "path" key. Any line we're interested in ought to start with this, and this will also keep this code from going wild on the campaign includes, if an author forgot a closing tag (no reset to False). (x) There shouldn't be any whitespace around the = sign, but we'll be kind. (2) On the value side, there shouldn't be anything before the tilde except perhaps a quote. Rather than underestimate the ingenuity of authors in coming up with weird code, however, I allow anything except a comment to match for a few characters. But if we haven't hit the '~' after five characters, I figure something's wrong, and bail. (3) Then we come to the tilde. Normally, it would be adjoining "add-ons/", but some authors interpolate a slash, or 'data/' (here represented as an optional string).
If we match, we rebuild the line, except 'data/add-ons' is substituted for group(3), and we log to stdout.
The Windows cmd shell does not expand wildcards by default, unlike UNIX shells. This imports glob.glob and runs arguments through it on Windows.
Frontported (in modified form) from my 1.4 work!
While testing my next commit, I discovered that EH's fix works when there is only one argument, or if the offender is the last argument, but doesn't work with multiple entries. His fix is meant to work on each argument, but the (unintentionally) escaped quote no longer serves to end the argument, causing following arguments to be considered part of the same argument.
Using split() allows us to break apart these misconjoined arguments. With rstrip(), we prevent an empty string that Windows will also complain it cannot find. However, if there are three or more arguments, there will still be lumped-together arguments unless all arguments up to the second from last also end with a backslash and quote. It is impossible to cover every possible case.
The re.sub handles the probably rare case where a backslash before a quote comes within the argument rather than at the end. However, it will only work if there is only one argument.
All this is unnecessary if the OS is not Windows (also, I haven't had the opportunity to test this on a non-Windows system to see if it has any side-effects there). So I've put it under a sys.platform condition.
Since ellipses automatically change depending on the ZoC emission, there is no point to change them manually in the cfgs - we don't want the engine to look for a misc/ellipse-nozoc-nozoc file. This commit is the first of a series that will fix all mainline units, even the campaign-specific ones.