Fix the handling of polymorphic variables in the lexer. The solution
to use two lexers (depending on whether we are between square brackets or not) is too brittle: it crudely tries to parse ``( [whitespace] 'a [whitespace] )'' as a variable, which forces the user to write the variable between parentheses. This does not scale to types with two arguments (say [ t ('a, 'b) ]).

We now use a simpler heuristic with lookahead:

(1) Try to match the regular expression
    ' (anything but ', \n)* ' (anything but the first letter of an identifier)
    If it matches, put the lexeme back in the buffer and parse it as a string.
(2) If (1) failed, try to parse it as a variable.
(3) If (2) failed, try to parse it as a string again. We are guaranteed to fail here, but failing means we have a malformed string, so parsing it as a string gives a proper error message.

The only case this does not cover is something like

    type t = [ 'abcd'Int ]

which was previously tokenized as [, 'abcd', Int, ] and is now tokenized as [, 'abcd, 'Int, ]. This does not seem to be a problem in practice, since in the code I have seen so far people at least put a space there. It would be easy to emit a warning in this case, suggesting that the user add a space to get the old behaviour back.
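A minimal sketch of the three-step heuristic, written as a hand-rolled OCaml function over a string rather than as lexer-generator rules. Everything here is hypothetical: the `token` type, `lex_quote` and its helpers are illustrative names. The sketch also assumes that an identifier starts with a letter or `_` and continues with letters, digits or `_` (a quote is not an identifier character, which is what splits `'abcd'Int` into two variables), and that end of input right after the closing quote counts as "not the first letter of an identifier". It also shows where the suggested warning could be emitted.

```ocaml
(* Hypothetical token type for this sketch. *)
type token =
  | String of string  (* contents between the quotes *)
  | Var of string     (* variable name without the leading quote *)

exception Lex_error of string

(* Assumption: identifiers start with a letter or underscore and continue
   with letters, digits or underscores. A quote is NOT an identifier char. *)
let is_ident_start c =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'

let is_ident_char c = is_ident_start c || (c >= '0' && c <= '9')

(* Step (1): look ahead for a quote, then any characters other than a quote
   or newline, then a closing quote followed by a character that cannot
   start an identifier (end of input is accepted too, an assumption).
   Returns the index just past the closing quote. *)
let string_lookahead s i =
  let n = String.length s in
  let rec scan j =
    if j >= n || s.[j] = '\n' then None
    else if s.[j] = '\'' then
      if j + 1 >= n || not (is_ident_start s.[j + 1]) then Some (j + 1)
      else None
    else scan (j + 1)
  in
  scan (i + 1)

(* Step (2): a quote followed by an identifier. Returns the index just past
   the identifier. *)
let var_lookahead s i =
  let n = String.length s in
  if i + 1 < n && is_ident_start s.[i + 1] then begin
    let j = ref (i + 2) in
    while !j < n && is_ident_char s.[!j] do incr j done;
    Some !j
  end
  else None

(* Step (3): plain string lexing without lookahead, used so that a
   malformed string gets a string-specific error message. *)
let plain_string s i =
  let n = String.length s in
  let rec scan j =
    if j >= n || s.[j] = '\n' then
      raise (Lex_error (Printf.sprintf "unterminated string at offset %d" i))
    else if s.[j] = '\'' then (String (String.sub s (i + 1) (j - i - 1)), j + 1)
    else scan (j + 1)
  in
  scan (i + 1)

(* Lex the token that starts with the quote at index [i]. Returns the token
   and the index at which lexing should resume. *)
let lex_quote s i =
  assert (s.[i] = '\'');
  match string_lookahead s i with
  | Some stop -> (String (String.sub s (i + 1) (stop - i - 2)), stop)
  | None -> (
      match var_lookahead s i with
      | Some stop ->
          (* A variable immediately followed by a quote, as in 'abcd'Int,
             which used to lex as a string: warn the user. *)
          if stop < String.length s && s.[stop] = '\'' then
            prerr_endline
              "warning: add a space after the closing quote to lex a string";
          (Var (String.sub s (i + 1) (stop - i - 1)), stop)
      | None -> plain_string s i)
```

On `'abcd'Int ]`, `lex_quote` at offset 0 returns `(Var "abcd", 5)` (printing the warning), and at offset 5 it returns `(Var "Int", 9)`, matching the new tokenization. On `('a, 'b)` it yields `Var "a"` and then `Var "b"` without requiring any parentheses around each variable.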