What is fixed
The first thing I'll fix is the mishandling of escaped quotes in constructs like:
<navi:form method="hello \"world\"">
This is a properly escaped invocation of a JSP taglib tag, but the JSPTokenizer mistreats it: it handles attribute values of JSP tags the same way as attribute values of HTML (template text) tags. The fix is to remedy that.
First we have to distinguish between JSP tags and non-JSP tags. This must be done at lexing time because the behaviour of the lexer depends on the tag type. For now I just added a boolean variable, fInJspTag, to the lexer. When a tag name is found, the lexer checks whether it is a prefixed name (contains a ':') and, if so, sets this variable.
This is still not right (you can easily have qualified XML tags that are not part of a taglib) but for now it suffices.
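The prefix check amounts to something like the following minimal sketch. The class and method names (JspTagCheck, isPrefixedName) are made up for illustration and are not part of the tokenizer; only the ':' test and the fInJspTag flag come from the description above.

```java
// Hypothetical helper illustrating the prefix check; not the tokenizer's own code.
public class JspTagCheck {
    // Returns true when the tag name is a prefixed (qualified) name.
    // Known limitation: any qualified XML tag passes, taglib or not.
    static boolean isPrefixedName(String tagName) {
        return tagName.indexOf(':') >= 0;
    }

    public static void main(String[] args) {
        boolean fInJspTag = isPrefixedName("navi:form");
        System.out.println(fInJspTag);                  // true
        System.out.println(isPrefixedName("form"));     // false
        System.out.println(isPrefixedName("svg:rect")); // true, although it is no taglib tag
    }
}
```

The last call shows the false positive mentioned above: a qualified XML tag that does not belong to a taglib is still flagged.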
Now we need to distinguish the simple attribute values. To do this I added a new lexer state: ST_JSP_ATTRIBUTE_VALUE. When the lexer is expecting an attribute value (after receiving an EQUALS sign after an attribute name), it now checks whether we're currently parsing a JSP tag; if so, it enters my new state instead of the XML_TAG_ATTRIBUTE_VALUE state.
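As a rough model of that decision, assuming the two states are represented by a plain Java enum (the real lexer is JFlex-generated and switches states with yybegin), the choice after the EQUALS sign could be sketched like this. The class name AttributeStateChoice and the method stateAfterEquals are hypothetical.

```java
// Stand-in model of the state choice; the real lexer is generated by JFlex.
public class AttributeStateChoice {
    // The two states named in the text, modelled as an enum for illustration only.
    enum State { XML_TAG_ATTRIBUTE_VALUE, ST_JSP_ATTRIBUTE_VALUE }

    // Hypothetically called once the EQUALS sign after an attribute name is seen.
    static State stateAfterEquals(boolean fInJspTag) {
        return fInJspTag ? State.ST_JSP_ATTRIBUTE_VALUE : State.XML_TAG_ATTRIBUTE_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(stateAfterEquals(true));  // ST_JSP_ATTRIBUTE_VALUE
        System.out.println(stateAfterEquals(false)); // XML_TAG_ATTRIBUTE_VALUE
    }
}
```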
I then added a rule based on the AttValue rule which also accepts escaped quotes inside quoted values:
// As AttValue, but also accepts escaped versions of the lead-in quote
QuotedAttValue = ( \"([^<"\x24\x23] | [\x24\x23][^\x7b"] | \\\" | {Reference})*[\x24\x23]*\" | \'([^<'\x24\x23] | [\x24\x23][^\x7b'] | \\\' | {Reference})*[\x24\x23]*\' | ([^\'\"\040\011\012\015<>/]|\/+[^\'\"\040\011\012\015<>/] )*)
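To see what the extra alternative buys, here is a small sketch that translates only the double-quoted branch into java.util.regex. It is an approximation under stated assumptions: the {Reference} macro, the single-quoted branch and the unquoted branch are left out, and Java's backtracking matcher is not the same as JFlex's longest-match DFA, although both agree on the input used here. The class and field names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedAttValueSketch {
    // Double-quoted branch without the escaped-quote alternative (simplified).
    static final Pattern ATT_VALUE = Pattern.compile(
        "\"(?:[^<\"$#]|[$#][^{\"])*[$#]*\"");

    // Same branch with the additional \" alternative from QuotedAttValue.
    static final Pattern QUOTED_ATT_VALUE = Pattern.compile(
        "\"(?:\\\\\"|[^<\"$#]|[$#][^{\"])*[$#]*\"");

    // Number of characters matched at the start of the input, or -1 if none.
    static int matchedLength(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        // The attribute value from the example tag: "hello \"world\""
        String value = "\"hello \\\"world\\\"\"";
        System.out.println(value.length());                         // 17
        System.out.println(matchedLength(ATT_VALUE, value));        // 9, stops at the first escaped quote
        System.out.println(matchedLength(QUOTED_ATT_VALUE, value)); // 17, the whole value
    }
}
```

Without the escaped-quote alternative the match ends at the first \" and the rest of the value is left for the following rules, which is the mistreatment described at the top.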
In the lexer I added a new rule which, in the new state, accepts only this quoted attribute variant:
<ST_JSP_ATTRIBUTE_VALUE> {QuotedAttValue} { /* JSP attribute values have escape semantics */
    if(Debug.debugTokenizer)
        dump("jsp attr value");//$NON-NLS-1$
    fEmbeddedHint = XML_TAG_ATTRIBUTE_NAME;
    fEmbeddedPostState = ST_XML_EQUALS;
    yybegin(ST_XML_ATTRIBUTE_NAME);
    return XML_TAG_ATTRIBUTE_VALUE;
}
This accepts the escaped values and returns them with the same token type as the original rule. The value is then passed into the JSP translator verbatim, which is correct for this type.
After this I added the new state to all parts of the code that currently refer to XML_JSP_ATTRIBUTE_VALUE, so that all other constructs are recognised in the new state as well.