How it works

Introduction

Today there are several Java RTF parsers, such as iText, ecs, Jakarta POI - HWPF or RTFEditorKit (the RTF parser used by Swing). With iText and ecs, you cannot read RTF from a file. The last news about Jakarta POI - HWPF dates from 2003; what is its status today? RTFEditorKit is a very complete RTF parser, but depending on the RTF version, it has problems reading RTF.

When I searched for a Java RTF parser, I wanted a parser able to:

  • read an RTF file. The RTF model can be created with MS Word, for example.
  • update RTF content, for instance to loop over particular RTF content (such as an RTF row, an RTF header or other RTF content).
  • merge data with an RTF template.
  • use the MERGEFIELD and BOOKMARK RTF elements in the RTF template. Indeed, anybody who knows MS Word must be able to create the RTF model; no particular syntax should have to be put into the template.
  • have simple source code (few classes). Indeed, I would like to migrate the RTF parser to C#.

That's why I decided to create RTFTemplate, a simple RTF parser which can read RTF from a file and update any RTF content. When you say "merging data with a template" in Java, you think of Velocity. Moreover, Velocity also exists for C#, where the template engine is called NVelocity. So the Velocity engine was the best solution for merging data with an RTF template.

While searching for an RTF parser, I found the RTFTemplateEngine idea and source in the Velocity mailing-list archive. RTFTemplate is based on that source.

However, creating a Velocity template by hand raises three problems:

  • the user who creates the RTF template must know the Velocity syntax, such as:
    • $myObject.name to write the name of myObject.
    • #foreach to start a loop.
    • #end to end a loop.
  • when you save the model with MS Word, it sometimes inserts characters such as \r\n before $ (e.g. $\r\nmyObject.name). This confuses the Velocity engine.
  • it is impossible to manage RTF tables. To manage an RTF table, you must write the Velocity #foreach macro at the start and #end at the end of an RTF row, but with MS Word it is impossible to write #foreach at that place in the template. The only solution is to edit the template by hand, detect \trowd to write the #foreach macro, and then detect \row to write #end. It's awful!
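The hand-editing workaround from the last two items can be sketched in Java. This is a naive illustration, not RTFTemplate code: ManualVelocityRtfFixes and its methods are hypothetical names, the loop header is supplied by the caller, and every table row found gets the same loop. Removing CR/LF right after `$` is safe because RTF readers ignore raw carriage returns and line feeds in the file.

```java
// Hypothetical helper (not part of RTFTemplate) showing the hand-editing
// workaround: rejoin "$" references split by Word, and wrap table rows in a loop.
class ManualVelocityRtfFixes {

    // RTF readers ignore raw CR/LF, so removing them right after "$" is safe.
    static String joinSplitReferences(String rtf) {
        return rtf.replaceAll("\\$[\\r\\n]+", "\\$");
    }

    // Naively wraps every table row (\trowd ... \row) in the given Velocity loop,
    // e.g. loopHeader = "#foreach($customers in $list_customers)".
    static String wrapRowsInLoop(String rtf, String loopHeader) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = findControlWord(rtf, "\\trowd", pos);
            if (start < 0) {
                break;
            }
            int end = findControlWord(rtf, "\\row", start);
            if (end < 0) {
                break;
            }
            int afterRow = end + "\\row".length();
            out.append(rtf, pos, start)
               .append(loopHeader).append(' ')
               .append(rtf, start, afterRow)
               .append(" #end");
            pos = afterRow;
        }
        out.append(rtf.substring(pos));
        return out.toString();
    }

    // Finds a control word that is not merely the prefix of a longer one
    // (a match must not be followed by another letter).
    static int findControlWord(String rtf, String word, int from) {
        int i = rtf.indexOf(word, from);
        while (i >= 0) {
            int next = i + word.length();
            if (next >= rtf.length() || !Character.isLetter(rtf.charAt(next))) {
                return i;
            }
            i = rtf.indexOf(word, i + 1);
        }
        return -1;
    }
}
```

wrapRowsInLoop simply pairs each \trowd with the next \row; it knows nothing about which rows should loop or over which list, which is exactly why doing this by hand is fragile.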

To resolve these problems, I created a simple RTF parser which is able to transform the MERGEFIELDs and BOOKMARKs of the RTF model into Velocity macros.

RTFTemplate process

When you use RTFTemplate, two steps are necessary:

  • Step 1: generate an RTF template with Velocity macros from the RTF model designed with MS Word.
  • Step 2: merge the Java object context with the RTF template (with Velocity macros).

[Diagram: the basic RTFTemplate process]

When your RTF model must iterate over a list of Java objects, Velocity #foreach macros are generated (in step 1). To generate these macros, RTFTemplate uses the MERGEFIELD type: if a MERGEFIELD is of list type, RTFTemplate generates #foreach. To know whether a MERGEFIELD is a list or not, RTFTemplate uses the context. There are two methods:

  • RTFTemplate & Velocity context: this method uses introspection of the Java objects put into the context. If a Java object is a list (like java.util.List, java.util.Collection...), RTFTemplate generates #foreach by using the first item of the list. This causes problems when the list is empty, so this method is not the best solution when you want to use RTFTemplate in your project.
      RTFTemplate template = engine.createTemplate(new File("usecases/models/jakartavelocityproject/jakarta-velocity-model.rtf"));
      template.put("header_developer_name", "Name");
      ...
  • RTFTemplate & XML fields available: this method uses an XML file of available fields, which you can generate with the AbstractUseCase class. I recommend this method when you decide to integrate RTFTemplate into your project.
      // Load XML Fields Available
      InputStream xmlFieldsAvailable = new FileInputStream(new File("usecases/models/jakartavelocityproject/jakarta-velocity.fields.xml"));
      
      // Initialize RTFTemplate with XML fields available
      RTFTemplate template = engine.createTemplate(new File("usecases/models/jakartavelocityproject/jakarta-velocity-model.rtf"), xmlFieldsAvailable);
      ...
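The introspection method and its empty-list problem can be illustrated with a small sketch. ContextIntrospection is a hypothetical class, not RTFTemplate's implementation, and a plain Map stands in for the Velocity context.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch (not RTFTemplate's code) of how introspection of a context
// can decide whether a field is a list, and why an empty list is a problem.
class ContextIntrospection {

    // A context value is treated as a list when it is a java.util.Collection.
    static boolean isList(Map<String, Object> context, String key) {
        return context.get(key) instanceof Collection;
    }

    // The item type can only be discovered from the first element of the list.
    // An empty list (or a null first item) gives nothing to inspect: Optional.empty().
    static Optional<Class<?>> itemType(Map<String, Object> context, String key) {
        Object value = context.get(key);
        if (value instanceof Collection) {
            Collection<?> items = (Collection<?>) value;
            if (!items.isEmpty()) {
                Object first = items.iterator().next();
                if (first != null) {
                    Class<?> type = first.getClass();
                    return Optional.of(type);
                }
            }
        }
        return Optional.empty();
    }
}
```

With an empty list_customers, itemType returns Optional.empty(), so there is no item type whose properties could be matched against the MERGEFIELDs; an XML file of available fields does not depend on the data and avoids this.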

At the end of step 1, the RTF with Velocity macros is generated. But what is this RTF content? Is it a file? Is it a Reader? With RTFTemplate, you can choose how to manage this RTF content. There are two strategies:

  • RTFTemplate process with stored RTF Velocity content

    In this strategy, the RTF content (with Velocity macros) is a file or content stored in a database. It is generated only the first time, when the RTF content (with Velocity macros) does not exist yet; the next times, it is not generated again. This strategy performs well, but the developer must manage this RTF content. To use this strategy, use the RTFTemplate and RTFVelocityTemplate classes:

      // STEP 1 : generate RTF File with Velocity Macro 
      RTFTemplate template = engine.createTemplate(new File("usecases/models/jakartavelocityproject/jakarta-velocity-model.rtf"));
      ...
      String rtfVelocity = outDirectory + "/jakarta-velocity-model.rtf.velocity.rtf";
      template.saveRTFVelocity(rtfVelocity);
      
      // STEP 2 : generate RTF File target (by using RTF file with Velocity Macro)
      RTFVelocityTemplate velocityTemplate = velocityEngine.createTemplate(new File(rtfVelocity));
      velocityTemplate.put("header_developer_name", "Name");
      ...
      velocityTemplate.merge(outDirectory + "/jakarta-velocity-model.rtf.rtf");
      
  • RTFTemplate process with an RTF Velocity Reader

    In this strategy, the RTF content (with Velocity macros) is a Reader (stream). This strategy generates the RTF content with Velocity macros each time, but the developer does not have to manage this RTF content. To use this strategy, use only the RTFTemplate class:

      RTFTemplate template = engine.createTemplate(new File("usecases/models/jakartavelocityproject/jakarta-velocity-model.rtf"));
      template.put("header_developer_name", "Name");
      ...
      template.merge(outDirectory + "/jakarta-velocity-model.rtf.rtf");
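The difference between the two strategies is when Step 1 runs. In this hypothetical sketch, a Supplier<String> stands in for Step 1 and the stored strategy uses a file as its store (a database would play the same role); none of these names belong to the RTFTemplate API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical sketch of the two strategies. "step1" stands in for Step 1
// (RTF model -> RTF with Velocity macros); it is not RTFTemplate's API.
class RtfVelocityStrategies {

    // ISO-8859-1 maps every byte to one char, so RTF bytes round-trip unchanged.
    private static final Charset RTF_CHARSET = StandardCharsets.ISO_8859_1;

    // Stored strategy: Step 1 runs only when the stored RTF Velocity file is missing.
    static Reader stored(Path rtfVelocityFile, Supplier<String> step1) throws IOException {
        if (!Files.exists(rtfVelocityFile)) {
            Files.write(rtfVelocityFile, step1.get().getBytes(RTF_CHARSET));
        }
        return Files.newBufferedReader(rtfVelocityFile, RTF_CHARSET);
    }

    // Reader strategy: Step 1 runs every time and nothing is stored.
    static Reader streamed(Supplier<String> step1) {
        return new StringReader(step1.get());
    }
}
```

The stored variant trades the cost of regenerating for the duty of invalidating the file when the RTF model changes; the streamed variant has no such duty.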

Step 1: Parse RTF and add Velocity macros

This step consists of parsing the RTF model and adding Velocity macros. To do this:

  • parse the source RTF model and build an RTFDocument by using a special RTF Handler. An RTF Handler is an implementation of the RTF Parser. The RTF Parser can be compared to a SAX parser: when a SAX parser reads an XML source, it fires a series of events such as startDocument or endElement. RTFParser fires events on specific RTF characters (like {, } or \) and on specific RTF keywords (like \trowd, \row, \field...). RTFHandler implements the RTF Parser and builds RTFElements while parsing (e.g. the RTF Parser fires startRow when the current keyword is \trowd, and the RTF Handler creates an instance of RTFRow when startRow is fired).
  • use the RTFDocument to add Velocity macros with the RTF Document Transformer by calling the transform() method, for instance to replace an RTF bookmark with the Velocity #foreach macro that starts a loop.

RTF Parser

The RTF Parser fires events on special RTF characters or keywords. RTF parsers are abstract classes: to use a parser, you must implement an RTF Handler which extends the RTF parser. The RTFTemplate project uses two parsers:

  • AbstractCoreRTFParser, which is the parser core. It fires events on special characters:
    • startGroup when the current character parsed is {. This event starts an RTF group.
    • endGroup when the current character parsed is }. This event ends an RTF group.
    • handleKeyword when the current character parsed is \. This event is the start of an RTF keyword.
  • AbstractDefaultRTFParser, which is the default parser. This parser extends AbstractCoreRTFParser. It fires events on special keywords:
    • startRow when the keyword is \trowd. This event is the start of an RTF row.
    • endRow when the keyword is \row. This event is the end of an RTF row.
    • startField when the keyword is \field. This event is the start of an RTF field.
    • startBookmark when the keyword is \bkmkstart. This event is the start of an RTF bookmark.
    • endBookmark when the keyword is \bkmkend. This event is the end of an RTF bookmark.
    • startPage when the keyword is \page. This event is a page break.
    • startUserProperty when the keyword is \propname. This event is the start of a user property.
    • endUserProperty when the keyword is \staticval. This event is the end of a user property.
    • handleText for the other keywords.

The events of the RTF Parser are abstract methods which must be implemented by the RTF Handler.
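The two parser levels can be sketched in plain Java. This is not RTFTemplate's source: the class names ending in Sketch are hypothetical, the event signatures (each event receiving the raw RTF code) are assumptions, and the core level here also fires handleText for plain text runs so that no RTF code is lost. EventRecorder is a hypothetical handler that only records the events it receives.

```java
import java.util.ArrayList;
import java.util.List;

// Level 1 (like AbstractCoreRTFParser): fires events on the characters {, } and \.
abstract class CoreRtfParserSketch {

    void parse(String rtf) {
        int n = rtf.length();
        int i = 0;
        while (i < n) {
            char c = rtf.charAt(i);
            if (c == '{') {
                startGroup();
                i++;
            } else if (c == '}') {
                endGroup();
                i++;
            } else if (c == '\\') {
                int j = i + 1;
                if (j < n && Character.isLetter(rtf.charAt(j))) {
                    while (j < n && Character.isLetter(rtf.charAt(j))) j++;   // keyword name
                    if (j + 1 < n && rtf.charAt(j) == '-' && Character.isDigit(rtf.charAt(j + 1))) j++;
                    while (j < n && Character.isDigit(rtf.charAt(j))) j++;    // numeric parameter
                    if (j < n && rtf.charAt(j) == ' ') j++;                    // delimiting space
                } else if (j < n) {
                    j++;                                                       // control symbol: \*, \{, \', ...
                }
                handleKeyword(rtf.substring(i, j));
                i = j;
            } else {
                int j = i;
                while (j < n && "{}\\".indexOf(rtf.charAt(j)) < 0) j++;      // plain text run
                handleText(rtf.substring(i, j));
                i = j;
            }
        }
    }

    abstract void startGroup();
    abstract void endGroup();
    abstract void handleKeyword(String rtf);
    abstract void handleText(String rtf);
}

// Level 2 (like AbstractDefaultRTFParser): turns specific keywords into higher-level events.
abstract class DefaultRtfParserSketch extends CoreRtfParserSketch {

    @Override
    void handleKeyword(String rtf) {
        int end = 1;
        while (end < rtf.length() && Character.isLetter(rtf.charAt(end))) end++;
        switch (rtf.substring(1, end)) {
            case "trowd":     startRow(rtf); break;
            case "row":       endRow(rtf); break;
            case "field":     startField(rtf); break;
            case "bkmkstart": startBookmark(rtf); break;
            case "bkmkend":   endBookmark(rtf); break;
            case "page":      startPage(rtf); break;
            case "propname":  startUserProperty(rtf); break;
            case "staticval": endUserProperty(rtf); break;
            default:          handleText(rtf); // other keywords are passed on as RTF code
        }
    }

    abstract void startRow(String rtf);
    abstract void endRow(String rtf);
    abstract void startField(String rtf);
    abstract void startBookmark(String rtf);
    abstract void endBookmark(String rtf);
    abstract void startPage(String rtf);
    abstract void startUserProperty(String rtf);
    abstract void endUserProperty(String rtf);
}

// Hypothetical handler that records the name of each event it receives.
class EventRecorder extends DefaultRtfParserSketch {
    final List<String> events = new ArrayList<>();
    void startGroup() { events.add("startGroup"); }
    void endGroup() { events.add("endGroup"); }
    void handleText(String rtf) { events.add("text:" + rtf.trim()); }
    void startRow(String rtf) { events.add("startRow"); }
    void endRow(String rtf) { events.add("endRow"); }
    void startField(String rtf) { events.add("startField"); }
    void startBookmark(String rtf) { events.add("startBookmark"); }
    void endBookmark(String rtf) { events.add("endBookmark"); }
    void startPage(String rtf) { events.add("startPage"); }
    void startUserProperty(String rtf) { events.add("startUserProperty"); }
    void endUserProperty(String rtf) { events.add("endUserProperty"); }
}
```

For example, parsing {\trowd \cell\row} with an EventRecorder yields startGroup, startRow, text:\cell, endRow, endGroup.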

RTF Handler

An RTF Handler extends an RTF parser to implement each event of the RTF Parser. The RTFTemplate project implements two RTF Handlers:

  • RTFIndentHandler, which is able to indent RTF code at the start and end of groups (characters { and }). This RTF Handler implements AbstractCoreRTFParser. In other words, this Handler implements the AbstractCoreRTFParser events:
    • startGroup to indent the start of a group (character {).
    • endGroup to indent the end of a group (character }).
    • handleKeyword to add RTF code (other than { and }) to the Handler.

    Here is an example of indenting the RTF code of a field.
    Source RTF (before indentation):

      {\field{\*\fldinst {\lang1036\langfe2057\langnp1036\insrsid331776  MERGEFIELD MY_FIELD \\* MERGEFORMAT }}
      {\fldrslt {\lang1024\langfe1024\noproof\langnp1036\insrsid331776 \'abMY_FIELD\'bb }} }
    

    Target RTF (after indentation):

      {\field
          {\*\fldinst 
              {\lang1036\langfe2057\langnp1036\insrsid331776  MERGEFIELD MY_FIELD \\* MERGEFORMAT 
              }
          }
          {\fldrslt 
              {\lang1024\langfe1024\noproof\langnp1036\insrsid331776 \'abMY_FIELD\'bb
              }
          }
      }
    
  • RTFDocumentHandler, which is able to create an RTFDocument. This RTF Handler implements AbstractDefaultRTFParser, in other words the AbstractDefaultRTFParser events:
    • startGroup to detect the end of an RTFElement.
    • startRow to create an RTFRow.
    • endRow to detect the end of an RTFRow.
    • startField to create an RTFField.
    • startBookmark to create an RTFBookmark.
    • endBookmark to detect the end of an RTFBookmark.
    • startPage to create an RTFPage.
    • startUserProperty to create an RTFUserProperty.
    • endUserProperty to detect the end of an RTFUserProperty.
    • handleText to add RTF code to the RTFElement currently being parsed.

RTFDocumentHandler builds the RTFDocument of the RTF source model. The RTFDocument is then used to add Velocity macros with the RTF Document Transformer.
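The effect of RTFIndentHandler can be reproduced with a short stand-alone sketch. RtfIndenter is a hypothetical class that scans characters directly instead of extending a parser: each { starts a new line indented by its nesting depth, each } closes on its own line at the depth of its matching {, and escaped characters such as \{ are copied unchanged. Raw CR/LF in the source are dropped, which RTF readers tolerate because they ignore them.

```java
// Hypothetical stand-alone sketch of what RTFIndentHandler produces (not its code).
class RtfIndenter {

    static String indent(String rtf, String unit) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (int i = 0; i < rtf.length(); i++) {
            char c = rtf.charAt(i);
            if (c == '\\' && i + 1 < rtf.length()) {
                // a backslash and the next character (\{, \}, \\, keyword start) are copied as-is
                out.append(c).append(rtf.charAt(++i));
            } else if (c == '{') {
                newLine(out, depth, unit);   // group start: new line at the current depth
                out.append('{');
                depth++;
            } else if (c == '}') {
                depth--;
                newLine(out, depth, unit);   // group end: aligned with its matching "{"
                out.append('}');
            } else if (c != '\r' && c != '\n') {
                out.append(c);
            }
        }
        return out.toString();
    }

    private static void newLine(StringBuilder out, int depth, String unit) {
        if (out.length() > 0) {
            out.append('\n');
        }
        for (int d = 0; d < depth; d++) {
            out.append(unit);
        }
    }
}
```

Called with a four-space unit on the field from the RTFIndentHandler example, it produces the same layout as the "after indentation" listing.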

RTF Document

RTFDocument holds the whole RTF content of the RTF source model. RTFDocument extends RTFElement. An RTFElement contains a list of StringBuffers and other RTFElements; the StringBuffers contain RTF code. There are several RTFElement types:

  • RTFDocument, which represents the whole RTF code of the RTF file.
  • RTFField, which represents an RTF field (like MERGEFIELD). This RTFElement contains just a StringBuffer (the RTF code of the field). Here is the RTF code of a field called MY_FIELD:
      {\field
          {\*\fldinst 
              {\lang1036\langfe2057\langnp1036\insrsid331776  MERGEFIELD MY_FIELD \\* MERGEFORMAT 
              }
          }
          {\fldrslt 
              {\lang1024\langfe1024\noproof\langnp1036\insrsid331776 \'abMY_FIELD\'bb
              }
          }
      }
    
  • RTFBookmark, which represents an RTF bookmark. This RTFElement contains just a StringBuffer (the RTF code of the bookmark). Here is the RTF code of a bookmark called MY_BOOKMARK:
      {\*\bkmkstart MY_BOOKMARK }{\*\bkmkend MY_BOOKMARK }
    
  • RTFRow, which represents an RTF row. This RTFElement starts with \trowd and ends with \row. It contains the RTF code of the row and other RTFElements such as RTFField and RTFBookmark (for instance, when the row contains a MERGEFIELD). Here is the RTF code of a row which contains one MERGEFIELD:
      \trowd \irow0\irowband0\lastrow \ts15\trgaph70
      .....
      {\field
          {\*\fldinst 
              {\lang1036\langfe2057\langnp1036\insrsid331776  MERGEFIELD MY_FIELD \\* MERGEFORMAT 
              }
          }
          .....
      }
      .....
      \cellx9086\row

    In this example, the RTFRow is composed of:

    • a StringBuffer (the RTF code before the MERGEFIELD)
    • an RTFField (the RTF code of the MERGEFIELD)
    • a StringBuffer (the RTF code after the MERGEFIELD)
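The composite structure described above can be sketched with a few Java classes. The names ending in Sketch are hypothetical stand-ins for RTFElement, RTFDocument, RTFField, RTFBookmark and RTFRow; only the idea of an ordered list mixing StringBuffers and child elements comes from the text. withOneField() builds a shortened version of the row from the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the RTFElement tree: an element keeps an ordered list of
// parts, each part being either raw RTF code (StringBuffer) or a child element.
class RtfElementSketch {
    final List<Object> parts = new ArrayList<>();

    // Appends RTF code, extending the last StringBuffer when there is one.
    RtfElementSketch addCode(String rtf) {
        Object last = parts.isEmpty() ? null : parts.get(parts.size() - 1);
        if (last instanceof StringBuffer) {
            ((StringBuffer) last).append(rtf);
        } else {
            parts.add(new StringBuffer(rtf));
        }
        return this;
    }

    RtfElementSketch addElement(RtfElementSketch child) {
        parts.add(child);
        return this;
    }

    // Serializes the element back to RTF by concatenating its parts in order.
    String toRtf() {
        StringBuilder out = new StringBuilder();
        for (Object part : parts) {
            if (part instanceof RtfElementSketch) {
                out.append(((RtfElementSketch) part).toRtf());
            } else {
                out.append((StringBuffer) part);
            }
        }
        return out.toString();
    }
}

class RtfDocumentSketch extends RtfElementSketch { }
class RtfFieldSketch extends RtfElementSketch { }
class RtfBookmarkSketch extends RtfElementSketch { }

class RtfRowSketch extends RtfElementSketch {

    // Code before the field, the field itself, then code after the field.
    static RtfRowSketch withOneField() {
        RtfRowSketch row = new RtfRowSketch();
        row.addCode("\\trowd \\irow0\\irowband0\\lastrow \\ts15\\trgaph70 ");
        row.addElement(new RtfFieldSketch().addCode(
            "{\\field{\\*\\fldinst {MERGEFIELD MY_FIELD \\\\* MERGEFORMAT }}}"));
        row.addCode("\\cellx9086\\row");
        return row;
    }
}
```

Keeping raw code and child elements in one ordered list is what lets a transformer replace a single child (for example the field) while writing everything else back untouched.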

RTF Document Transformer

For this part, you must know the Velocity syntax (the $object.Value syntax and the #foreach and #end Velocity macros); see RTF Velocity engine or the Velocity home page.

RTFDocumentTransformer is used to transform an RTFDocument into another RTFDocument. It must implement the transform(RTFDocument document) method of the IRTFTransformer interface:

  public interface IRTFTransformer {
    public RTFDocument transform(RTFDocument document) throws IOException;
  }

The document parameter is the source RTF document. The method must return the transformed RTF document.
The RTFVelocityTransformer class implements the IRTFTransformer interface to manage Velocity macros:

  public class RTFVelocityTransformer implements IRTFTransformer {
    ....
    public RTFVelocityTransformer(VelocityContext context) {
      ....
    }
    ....
    public RTFDocument transform(RTFDocument document) throws IOException {
      .... // return RTFDocument with velocity macro.
    }
    .... 
  }

This class needs a VelocityContext; this is explained in the ContextFieldsLoader section.

To explain the usage of Velocity macros properly, we use an example. In this example, we suppose we have in the Velocity context:

  • a Customer POJO with a name getter (getName()), put into the Velocity context with the key customer.
  • a collection of Customer objects, put into the Velocity context with the key list_customers.

The two Velocity syntaxes managed by this transformer are:

  • the $object.Value Velocity syntax, which outputs the Value property of the Java object object. This syntax is used for MERGEFIELDs: the MERGEFIELD RTF code is replaced by a Velocity reference to the Java object value. When you design your RTF, you must create a MERGEFIELD named $customer.Name. The RTF source document then contains this RTF code:
        {\field
            {\*\fldinst 
                {\lang1036\langfe2057\langnp1036\insrsid331776  MERGEFIELD $customer.Name \\* MERGEFORMAT 
                }
            }
            {\fldrslt 
                {\lang1024\langfe1024\noproof\langnp1036\insrsid331776 \'ab$customer.Name\'bb
                }
            }
        }
    

    After transformation, the transformed RTF document contains this RTF code:

          {\fldrslt 
              {\lang1024\langfe1024\noproof\langnp1036\insrsid331776 $customer.Name
              }
          }

    This RTF document will display the name of the customer (after merging the template and the context with Velocity), and the MERGEFIELD $customer.Name no longer exists.

  • the #foreach and #end Velocity syntax, which must be added to the RTF source code to manage loops.

    @TODO

    • Iterate over the list of customers when MERGEFIELD $customers.Name is enclosed between two bookmarks (START_LOOP_ and END_LOOP_) and NOT included in an RTF row.
           * Bookmark START_LOOP_1
           
             * Mergefield list <$customers.Name>
           
           * Bookmark END_LOOP_1

      After transformation, the transformed RTF document contains this RTF code:

           * #foreach($customers in $list_customers)
           
             * $customers.Name
           
           * #end
    • Iterate over the list of customers when MERGEFIELD $customers.Name is included in an RTF row and this row is NOT enclosed between two bookmarks (START_LOOP_ and END_LOOP_).
           * RTF row (RTF code before MERGEFIELD)
           
             * Mergefield list <$customers.Name>
           
           * RTF row (RTF code after MERGEFIELD)
    • Iterate over the list of customers when MERGEFIELD $customers.Name is enclosed between two bookmarks (START_LOOP_ and END_LOOP_) and the bookmarks are included in an RTF row.
           * RTF row (RTF code before MERGEFIELD)
           
             * Bookmark START_LOOP_1
           
               * Mergefield list <$customers.Name>
               
             * Bookmark END_LOOP_1
           
           * RTF row (RTF code after MERGEFIELD)
    • Iterate over the list of customers when MERGEFIELD $customers.Name is included in an RTF row and this row is enclosed between two bookmarks (START_LOOP_ and END_LOOP_).
           * Bookmark START_LOOP_1
           
             * RTF row (RTF code before MERGEFIELD)
            
               * Mergefield list <$customers.Name>
               
             * RTF row (RTF code after MERGEFIELD)
           
           * Bookmark END_LOOP_1
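The MERGEFIELD transformation and the bookmark-based loop (first case above) can be approximated on plain strings. VelocityTransformSketch is hypothetical: the real RTFVelocityTransformer works on an RTFDocument, and the way a loop number is mapped to its #foreach header (here a Map from "1" to "$customers in $list_customers") is an assumption. The row-based cases need the RTFRow boundaries and are not covered.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical string-based sketch of two transformations (not RTFTemplate's code).
class VelocityTransformSketch {

    private static final Pattern FIELD_NAME = Pattern.compile("MERGEFIELD\\s+(\\S+)");
    // Word shows a MERGEFIELD as «NAME»; in RTF the guillemets are written \'ab and \'bb.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\\\'ab.*?\\\\'bb");
    // A point bookmark: {\*\bkmkstart START_LOOP_1}{\*\bkmkend START_LOOP_1}
    private static final Pattern LOOP_BOOKMARK =
        Pattern.compile("\\{\\\\\\*\\\\bkmk(start|end) (START|END)_LOOP_(\\d+) ?\\}");

    // MERGEFIELD: keep only the \fldrslt group, with the Velocity reference
    // in place of the «...» placeholder.
    static String mergeField(String fieldRtf) {
        Matcher name = FIELD_NAME.matcher(fieldRtf);
        int start = fieldRtf.indexOf("{\\fldrslt");
        if (!name.find() || start < 0) {
            return fieldRtf; // not a MERGEFIELD with a result group: unchanged
        }
        String result = fieldRtf.substring(start, closingBrace(fieldRtf, start) + 1);
        return PLACEHOLDER.matcher(result).replaceFirst(Matcher.quoteReplacement(name.group(1)));
    }

    // Loops: START_LOOP_n becomes #foreach(...), END_LOOP_n becomes #end,
    // and the matching \bkmkend groups are dropped.
    static String bookmarkLoops(String rtf, Map<String, String> loops) {
        Matcher m = LOOP_BOOKMARK.matcher(rtf);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = "";
            if (m.group(1).equals("start")) {
                if (m.group(2).equals("END")) {
                    replacement = "#end";
                } else if (loops.containsKey(m.group(3))) {
                    replacement = "#foreach(" + loops.get(m.group(3)) + ")";
                } else {
                    throw new IllegalArgumentException("No list for START_LOOP_" + m.group(3));
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Index of the "}" closing the group opened at openIndex; escaped characters are skipped.
    static int closingBrace(String rtf, int openIndex) {
        int depth = 0;
        for (int i = openIndex; i < rtf.length(); i++) {
            char c = rtf.charAt(i);
            if (c == '\\') {
                i++;
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
        }
        return rtf.length() - 1;
    }
}
```

Matcher.quoteReplacement matters here: without it, the $ in $customer.Name would be read as a regex group reference.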