JSP Quoting rules

The Eclipse WTP project contains a JSP editor which has trouble with the JSP quote escaping rules. As a way to learn about WTP (with the ultimate goal of replacing the piece-of-shit publishing framework in WTP) I set out to fix this. These documents serve as a set of reminders through the process, and may perhaps help other people that want to hack the source. I found some helpful documents on the internals of the WTP SSE editors here and here.

First I want to fix the syntax highlighting and some erroneous error reporting. Before going into that, here is a short rundown of the quoting rules we need to implement.

Quoting rules

The JSP standard tells us that all JSP-recognised entities that have an XML format need to handle quote escaping within their attributes. JSP-recognised entities in this context are (all examples are VALID JSP!):

  • JSP Directives like <%@ include file="..." %>
  • JSP Taglib invocations like <navi:out text="Hello \"world\"" /> and <navi:out text="<%= \"Hello \\\"world\\\"\" %>" />

As the examples show, quoting can get rather ugly, and the code inside, for instance, a JSP expression is no longer Java! It only becomes Java after the quote escaping has been handled.
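To make "handling the quote escaping" concrete, here is a minimal sketch of an unescaper for a JSP attribute value. The class and method names are made up for illustration and are not part of WTP. It only covers the three simplest escapes (\" and \' and \\); the JSP specification lists a few more (such as <\% and %\>) which are left out here.

```java
/** Hypothetical helper for illustration only; not part of WTP. */
final class JspAttributeQuoting {
    private JspAttributeQuoting() {}

    /**
     * Undoes JSP attribute quoting for the three simplest escapes:
     * \" becomes ", \' becomes ' and \\ becomes \.
     * Any other backslash is copied unchanged.
     */
    static String unescape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(i + 1);
                if (next == '"' || next == '\'' || next == '\\') {
                    out.append(next);
                    i++; // skip the character that was escaped
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Fed the raw attribute text of the second taglib example, <%= \"Hello \\\"world\\\"\" %>, this gives back <%= "Hello \"world\"" %>, which is a JSP expression containing ordinary Java again.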

In non-JSP tags (JSP template text), such as HTML tags used in the JSP page, no quote escaping is done. So in an HTML tag like

<a title="Hello \"world\"">

the value of title is actually

Hello \

and the rest is an error.

Token parsing in Eclipse

The SSE editors build a complex model of a JSP document. The document starts off as a simple text buffer, but overlaid on the buffer is a structured view of that buffer built from XMLStructuredDocumentRegions and other regions. The document gets split into regions using a ReParser; the ReParser for JSP is called JSPReParser. This uses a JSPSourceParser which divides the text into Regions. The JSPSourceParser is actually derived from XMLSourceParser. Almost all of the code is executed by having it attached as some kind of listener somewhere: the entire partitioning and splitting into regions is done from a change event when the document is changed. This makes it hard to find where code gets executed.

The JSPSourceParser uses a JSPTokenizer, a JFlex-generated lexer which recognises (often small) parts of JSP as tokens. It is a very stateful tokenizer which uses a lot of lexer states to move much of the recognition of token types into the lexer. This lexer is the first part we'll have to fix, as it currently does not distinguish between tag attributes in JSP template text and attributes of JSP tags and directives.

When the JSPSourceParser is done it has a model consisting of a tree of XMLStructuredDocumentRegions which follows the tree layout of the XML/HTML document. Each of the structured nodes consists of smaller Regions which describe the parts that form a structured region. The type of a part can be obtained from a region with getType(); the possible types (which are comparable with token types) can be found in DOMRegionContext (for XML) and DOMJSPRegionContexts (for JSP).

The model is also used for syntax highlighting. Here the class Highlighter is important: it gets called from the text widget to calculate the text attributes for a segment of text (offset:length, apparently usually one line at a time). This code (prepareStyleRangeArray) uses the offset:length to locate all of the Regions present in the text; it then collects all regions of the same "meta type" into partitions. Each partition has its own LineStyleProvider; the LineStyleProvider for a single partition gets called for each region within the partition and is responsible for returning a list of TextAttributes which colorize the entire region.

Although the JSPTokenizer provides a lot of detail on JSP regions, it does not do so for other region types like Java or JavaScript (or other embedded code). Sadly enough the LineStyleProviderForJava within WTP simply takes the region's text and tokenises it again using a rule-based tokenizer. The token types from that tokenizer are only used to colorize the code very naively.

Translation

To be able to do Java-based stuff like context-based lookup and error checking, WTP uses a JSPTranslator. This translator generates a Java source file in memory which serves as a translated version of the JSP page, with enough functionality to expose all of the artifacts generated by the JSP. The goal of this translated source is not to execute the JSP but only to allow the JDT Java compiler to compile the source and to report errors for the result. These errors form the Java part of the messages generated when a JSP page is edited.

The JSP translator does its utmost to keep a reversible mapping of JSP source line:column numbers to generated Java source line:column numbers. In that way it can "convert" the locations generated by the Java compiler for errors (which are in the Java source) to locations within the actual JSP document.
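As a reminder of what such a reversible mapping boils down to, here is a toy sketch. It is not the JSPTranslator's actual data structure: the class, its methods and the use of plain character offsets instead of line:column pairs are all assumptions made to keep the example small. It also assumes every mapped span is copied verbatim, which is exactly what stops being true once quotes get unescaped.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a reversible offset map; all names are hypothetical. */
final class OffsetMap {
    /** One copied span: 'length' characters at jspOffset ended up at javaOffset. */
    private static final class Range {
        final int jspOffset;
        final int javaOffset;
        final int length;

        Range(int jspOffset, int javaOffset, int length) {
            this.jspOffset = jspOffset;
            this.javaOffset = javaOffset;
            this.length = length;
        }
    }

    private final List<Range> ranges = new ArrayList<>();

    /** Records that a span of JSP text was copied verbatim into the Java source. */
    void add(int jspOffset, int javaOffset, int length) {
        ranges.add(new Range(jspOffset, javaOffset, length));
    }

    /** Maps a Java offset back to the JSP document, or returns -1 for generated code. */
    int toJsp(int javaOffset) {
        for (Range r : ranges) {
            if (javaOffset >= r.javaOffset && javaOffset < r.javaOffset + r.length) {
                return r.jspOffset + (javaOffset - r.javaOffset);
            }
        }
        return -1;
    }

    /** Maps a JSP offset to the generated Java source, or returns -1 if it was not copied. */
    int toJava(int jspOffset) {
        for (Range r : ranges) {
            if (jspOffset >= r.jspOffset && jspOffset < r.jspOffset + r.length) {
                return r.javaOffset + (jspOffset - r.jspOffset);
            }
        }
        return -1;
    }
}
```

An error reported by the compiler at a Java offset inside a recorded span can be moved back to the JSP document with toJsp; offsets that fall in generated boilerplate have no JSP counterpart and yield -1.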

Part of the fixes we'll have to do are in the translator: it has to properly unescape quotes within the appropriate attributes.

Tokenizer fixes - rundown

Fixing attributes without embedded JSP

The tokenizer currently has a number of token types associated with attributes. The most basic form is an XML_TAG_ATTRIBUTE_VALUE which is any attribute value which does not contain any JSP markup (like JSP expressions or EL expressions). This gets returned for all tags, even those that are just HTML.

We need to fix this so that this gets recognised only when the tag that the attribute is a part of is not a JSP taglib tag or JSP directive tag.

The proposed fix is to have a specific flag which indicates that the attribute is part of a JSP tag/directive. When the attribute is recognised it gets a new token type, JSP_TAG_ATTRIBUTE_VALUE, indicating to any "takers" that its content must be subjected to the quote escaping rules.
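The decision the flag encodes can be sketched outside the lexer like this. This is only an illustration of the rule, not JFlex code and not how the JSPTokenizer is organised: the class, the method and the idea of passing in a set of known taglib prefixes are assumptions, and the token types are plain strings that just reuse the names from the text.

```java
import java.util.Set;

/** Illustration of the proposed rule; names other than the two token types are hypothetical. */
final class AttributeValueTypes {
    static final String XML_TAG_ATTRIBUTE_VALUE = "XML_TAG_ATTRIBUTE_VALUE";
    static final String JSP_TAG_ATTRIBUTE_VALUE = "JSP_TAG_ATTRIBUTE_VALUE";

    private AttributeValueTypes() {}

    /**
     * Picks the token type for an attribute value.
     *
     * @param tagName        name of the enclosing tag, e.g. "navi:out" or "input"
     * @param inDirective    true when the attribute belongs to a JSP directive
     * @param taglibPrefixes prefixes declared by taglib directives (assumed to be known)
     */
    static String typeFor(String tagName, boolean inDirective, Set<String> taglibPrefixes) {
        if (inDirective) {
            return JSP_TAG_ATTRIBUTE_VALUE;
        }
        int colon = tagName.indexOf(':');
        if (colon > 0 && taglibPrefixes.contains(tagName.substring(0, colon))) {
            return JSP_TAG_ATTRIBUTE_VALUE; // taglib tag: JSP quoting rules apply
        }
        return XML_TAG_ATTRIBUTE_VALUE; // template text: value is taken verbatim
    }
}
```

With "navi" as a known prefix, an attribute of navi:out gets JSP_TAG_ATTRIBUTE_VALUE while an attribute of input keeps XML_TAG_ATTRIBUTE_VALUE.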

This would fix the following which now reports an error around the plus sign:

<input value="<%= "\""+ "\"" %>" />

This segment is a valid non-JSP tag (JSP template text) so no JSP quoting rules apply. The only rules that apply are the Java quoting rules, which are applied correctly (the "outermost" quotes for the attribute are template text and should not be "seen" by the lexer). The fix would return this as an XML_TAG_ATTRIBUTE_VALUE which would be highlighted using the normal Java scanner, and would be copied verbatim into the JSPTranslator's source file.

The example

<navi:out text="\"hello\"" />

would be parsed correctly since it returns a JSP_TAG_ATTRIBUTE_VALUE which gets "unescaped" in the JSP translator. The syntax highlighting must be fixed by adding the new type to the LineStyleProviderForJSP.

[todo - later]

Fixing JSP attributes with embedded expressions/scriptlets.

Idea: a different LineStyleProvider (a LineStyleProviderForEscapedAttributes) which tokenizes this kind of structure in one go.