    Kim Nguyễn authored · 36b83c45

    Fix the handling of polymorphic variables in the lexer. The solution
    of using two lexers (depending on whether we are between square brackets
    or not) is too brittle: it crudely tries to parse
    ``( [whitespace] 'a [whitespace] )'' as a variable, forcing the user
    to write the variable between parentheses. However, this does not scale
    to types with two arguments (say, [ t ('a, 'b) ]).
    
    We now use a simpler heuristic (with lookahead):
    
    (1) try to see if the regular expression
    
    ' (anything but ', \n)* '(anything but the first letter of an identifier)
    
    can be found. If so, we put the lexeme back in the buffer and parse it as
    a string.
    
    (2) if (1) failed, try to parse it as a variable
    
    (3) if (2) failed, try to parse it again as a string. We are
    guaranteed to fail here, but failure means we have a malformed string,
    so parsing it as a string gives a proper error message.
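
The three steps above can be sketched as follows. This is a minimal Python illustration, not the actual ulexer code: the names `STRING_RE`, `VAR_RE` and `lex_quote` are hypothetical, identifiers are assumed to start with an ASCII letter or underscore, and the "not followed by the first letter of an identifier" condition is approximated with a negative lookahead (which, unlike the regular expression in the message, also succeeds at end of input).

```python
import re

# Step (1): quote, anything but quote or newline, closing quote that is
# NOT followed by the first letter of an identifier (assumed: [A-Za-z_]).
STRING_RE = re.compile(r"'[^'\n]*'(?![A-Za-z_])")

# Step (2): quote followed by an identifier (assumed identifier shape).
VAR_RE = re.compile(r"'[A-Za-z_][A-Za-z0-9_]*")


def lex_quote(src, pos):
    """Classify the token that starts at the quote src[pos]."""
    m = STRING_RE.match(src, pos)
    if m:
        # (1) succeeded: treat the lexeme as a string literal.
        return ("STRING", m.group())
    m = VAR_RE.match(src, pos)
    if m:
        # (2) succeeded: treat the lexeme as a polymorphic variable.
        return ("VAR", m.group())
    # (3) stand-in for re-running the string rule, which is known to fail
    # here and is only used to report a malformed string.
    raise SyntaxError(f"malformed string literal at offset {pos}")
```

With these assumptions, `lex_quote("'a, 'b)", 0)` yields the variable `'a` rather than a string, because the candidate closing quote is followed by the letter `b`.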
    
    The only case this does not cover is input like
    type t = [ 'abcd'Int ]
    which was previously tokenized as [, 'abcd', Int, ]
    and is now tokenized as [, 'abcd, 'Int, ]
    This does not seem to be a problem in practice, though: in the code
    I have seen so far, people were at least putting a space there.
    It is easy to emit a warning in this case, suggesting that the user add
    whitespace to get the old behaviour back.
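
One possible way to detect the case that deserves a warning is sketched below. This is an assumption about how such a check could look, not something the commit implements: `AMBIGUOUS_RE` and `ambiguous_quotes` are hypothetical names, and the quoted body is restricted to identifier-like text so that ordinary variable lists such as ('a, 'b) are not flagged.

```python
import re

# Hypothetical pattern: an identifier-like quoted body whose closing quote
# is immediately followed by the start of an identifier, e.g. 'abcd'Int.
AMBIGUOUS_RE = re.compile(r"'[A-Za-z_][A-Za-z0-9_]*'(?=[A-Za-z_])")


def ambiguous_quotes(src):
    """Return (offset, text) for each quoted span that may need a space."""
    return [(m.start(), m.group()) for m in AMBIGUOUS_RE.finditer(src)]
```

A caller could report each returned offset with a message suggesting a space after the closing quote.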
ulexer.mli 577 Bytes