Elements vs. Attributes

We are writing a couple of tools to upload and download a set of files, together with the metadata associated with them (such as version, type, etc.), to and from a Jackrabbit repository. Jackrabbit is a popular JCR implementation. The designers of the JCR specification rejected the idea of using XML Schema notation to describe and structure metadata uploaded by the user; instead they defined a new notation called the Compact NodeType Definition (CND) to enforce their types and constraints. However, the CND is enforced only when metadata is uploaded and committed to the server. Nor is there a way to download user-friendly content from one server, tweak it, and then upload it to another server (as you can with LDIF). Although the Jackrabbit implementation provides a way to download user metadata as raw XML, the result was either over-generalized, since the downloaded XML always contained extra system information that our clients would never want to see, or it carried information that was of little use, such as system-generated UUIDs for versions.

Our solution was to invoke the standard JCR getter APIs to retrieve content selectively. We then applied a custom conversion from standard JCR content to XML and vice versa, so that we could generate the desired information from the raw data. We also defined a schema for the XML so that we could validate and structure it.

In JCR, objects are represented as nodes, associations as parent-child relationships, and the associated metadata as properties of a node. In this scheme, a file becomes a node, and its attributes such as version, type, and change history become its properties. We mapped a CND node type to an XML complex type, a node to an XML element, and a property to an XML attribute. For example, we converted the “rsr:distfile” compact node-type definition, which represents a file (‘+’ marks a child node, ‘-’ marks a property):

[rsr:distfile] > nt:unstructured, nt:file, mix:versionable
  + * (rsr:help)
  - rsr:type (STRING) mandatory
  - rsr:versionlabel (STRING) mandatory

to this snippet of XML Schema:

    <xs:complexType name="distfile">
        <xs:sequence>
            <xs:element xmlns="http://www.xyz.com/rsr/1.0" minOccurs="0" name="help" type="help"/>
        </xs:sequence>
        <xs:attribute name="type" type="xs:string"/>
        <xs:attribute name="versionlabel" type="xs:string"/>
    </xs:complexType>
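
The mapping itself is easy to picture in code. What follows is only a minimal sketch in Python, not our actual converter: the node is modelled as a plain dictionary of properties, and `node_to_element` is a hypothetical helper name invented for this illustration. It turns each property into an attribute and each child node into a child element.

```python
import xml.etree.ElementTree as ET


def node_to_element(name, properties, children=()):
    """Map a node to an element: properties become attributes,
    child nodes become child elements (hypothetical helper)."""
    elem = ET.Element(name)
    for key, value in properties.items():
        # Attributes can only hold plain strings.
        elem.set(key, str(value))
    for child in children:
        elem.append(child)
    return elem


# A stand-in for the rsr:help child node.
help_elem = ET.Element("help")
help_elem.text = "..."

distfile = node_to_element(
    "distfile",
    {"type": "wsdl", "versionlabel": "copyservice-v1.2.5"},
    [help_elem],
)
xml_text = ET.tostring(distfile, encoding="unicode")
print(xml_text)
```

As long as every property is a short scalar value, this style produces compact, readable output.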

This one-to-one mapping was simple and straightforward, but we soon realized that properties did not always map well to attributes. For example, we needed another property in the “distfile” node type to store the change history of such a node, essentially as a very long string. We also updated the compact node type definition to add further properties, such as the mutability of the file. In both cases, we realized that mapping properties to attributes was a bad choice because (1) it reduced the readability of the XML, due to the ever-growing “changehistory” content and the entity escaping required for special characters in the user’s text, and (2) it created an ugly formatting headache each time a new attribute was added.
One such example was:

<distfile name="copyservice.wsdl" type="wsdl" versionlabel="copyservice-v1.2.5" changehistory="
1.2.4 Added a new element &lt;error&gt;&lt;br /&gt;
1.2.5 Changed type of element &lt;error&gt;">
   <help>...</help>
</distfile>
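
The problem with the attribute form can be reproduced with a few lines of Python. This is a sketch using only the standard library, with the change-history text taken from the example above. It shows two things: a serializer has to escape the markup characters in the attribute value, and a conforming parser normalizes literal line breaks inside an attribute value to spaces, so a hand-edited multi-line history does not survive a round trip.

```python
import xml.etree.ElementTree as ET

history = (
    "1.2.4 Added a new element <error><br />\n"
    "1.2.5 Changed type of element <error>"
)

# Writing: markup characters in the attribute value must be escaped.
elem = ET.Element(
    "distfile", {"name": "copyservice.wsdl", "changehistory": history}
)
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# Reading: literal line breaks typed into an attribute value are
# replaced by spaces during attribute-value normalization.
hand_written = '''<distfile name="copyservice.wsdl" changehistory="
1.2.4 Added a new element &lt;error&gt;&lt;br /&gt;
1.2.5 Changed type of element &lt;error&gt;"/>'''
value = ET.fromstring(hand_written).get("changehistory")
print(repr(value))
```

In other words, the line structure that makes a change history readable is exactly what the attribute form throws away.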

We realized the mistake. We quickly changed the mapping to represent all properties of the JCR node type as XML elements of the appropriate built-in types (string, int, boolean, etc.) and to treat user content as CDATA. We added a “changehistory” element of type string and treated its content as CDATA, so that we no longer had to escape special characters in the change history. The final complex type for “distfile” became:
    <xs:complexType name="distfile">
        <xs:sequence>
            <xs:element xmlns="http://www.xyz.com/rsr/1.0" minOccurs="0" name="help" type="help"/>
            <xs:element maxOccurs="1" minOccurs="1" name="type" type="xs:string"/>
            <xs:element maxOccurs="1" minOccurs="1" name="versionlabel" type="xs:string"/>
            <xs:element maxOccurs="1" minOccurs="1" name="mutable" type="xs:boolean"/>
            <xs:element maxOccurs="1" minOccurs="1" name="changehistory" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>

It finally led to a much cleaner XML snippet:

<distfile name="copyservice.wsdl">
   <help>...</help>
   <type>wsdl</type>
   <versionlabel>copyservice-v1.2.5</versionlabel>
   <mutable>false</mutable>
   <changehistory><![CDATA[
1.2.4 Added a new element <error><br />
1.2.5 Changed type of element <error>
    ]]>
   </changehistory>
</distfile>
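
For comparison, here is a minimal sketch of the element-based form in Python, again using only the standard library rather than our actual converter. `add_text_child` is a hypothetical helper named for this illustration. It writes each property as a child element and wraps the change history in a CDATA section, then parses the result back to confirm that the text, including its line breaks, comes through unchanged.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def add_text_child(doc, parent, name, text, cdata=False):
    """Append <name>text</name> to parent, optionally as a CDATA section
    (hypothetical helper)."""
    child = doc.createElement(name)
    if cdata:
        node = doc.createCDATASection(text)
    else:
        node = doc.createTextNode(text)
    child.appendChild(node)
    parent.appendChild(child)
    return child


history = (
    "\n1.2.4 Added a new element <error><br />\n"
    "1.2.5 Changed type of element <error>\n"
)

doc = minidom.Document()
root = doc.createElement("distfile")
root.setAttribute("name", "copyservice.wsdl")
doc.appendChild(root)

add_text_child(doc, root, "type", "wsdl")
add_text_child(doc, root, "versionlabel", "copyservice-v1.2.5")
add_text_child(doc, root, "mutable", "false")
add_text_child(doc, root, "changehistory", history, cdata=True)

xml_text = root.toxml()
print(xml_text)

# Parse it back: the CDATA content is returned verbatim.
round_tripped = ET.fromstring(xml_text).findtext("changehistory")
```

One caveat with this approach: a CDATA section cannot contain the sequence `]]>`, so user text that includes it would have to be split or escaped some other way.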

About Amar Deka

Software Engineer
