Group Related Data

last modified: January 30, 2005

Note: This is not a "pattern" as much as it is a "refactoring". I've used the common form for pattern descriptions because I'm not sure how to lay out a refactoring description.


Problem: An XmlLanguage (or similar) document format is too hard to process.

Context: Even though each datum is well-isolated (see IsolateEachDatum), the data is still unwieldy. Cumbersome and unnatural parsing must be performed and non-obvious conventions must be applied in order to make heads or tails of the data structure. In particular, there are flat lists that contain data with implicit internal structure.

Solution: Make sure there are enclosures for all groups of logically connected data. However, avoid multiple layers of unnecessary wrapping; once your data is well-structured, leave it at that. The deeper you make it from that point on, the more annoying it will be to work with -- at little to no benefit.

It may appear as though adding structure would just mean adding bloat/noise/weight or whatever. Nevertheless, if in doubt, go with more structure. It is easy to transform well-structured data to poorly-structured data; going the other way is harder. If still in doubt, maybe you'll have to take other factors into account, such as expected amount of hand-editing and the effect on that, and whether or not size is absolutely critical (if you're using XML, now is a good time to reconsider).

Resulting Context: This is primarily from an XML point of view, but most of this applies to, e.g., EssExpressions as well:

Known Uses: This problem is very common and often occurs when people inexperienced with processing data is responsible for designing data formats. It is relevant to XML as well as S-expressions, although it's more common in the XML world -- if only because the XML world is less mature.

Author: DanielBrockman 2004-01-20


We start out with an abstract example that quickly get to the core of the issue. EssExpression examples are in italics.


Polygon example

(See IsolateEachDatum for beginning of story.)

Bad -- hard to process (does not group related data):

<polygon>
  <coordinate>1</coordinate>
  <coordinate>6</coordinate>
  <coordinate>3</coordinate>
  <coordinate>5</coordinate>
  <coordinate>4</coordinate>
  <coordinate>6</coordinate>
</polygon>

(polygon 1 6 3 5 4 6)

Good -- easily processable (provides good structure by grouping related data):

<polygon>
  <vertex x="1" y="6" />
  <vertex x="3" y="5" />
  <vertex x="4" y="6" />
</polygon>

(polygon (1 6) (3 5) (4 6))

Dictionary example

Bad -- hard to process (has implicit structure in an element list):

<dictionary>
  <term>ROFL</term>
  <definition>rolling on the floor laughing</definition>
  <term>LMAO</term>
  <definition>laughing my arms off</definition>
</dictionary>

(dictionary "ROFL" "rolling on the floor laughing" "LMAO" "laughing my arms off")

Good -- easy to process (has just the right amount of structure):

<dictionary>
  <entry>
    <term>ROFL</term>
    <definition>rolling on the floor laughing</definition>
  </entry>
  <entry>
    <term>LMAO</term>
    <definition>laughing my arms off</definition>
  </entry>
</dictionary>

(dictionary
  ("ROFL" "rolling on the floor laughing")
  ("LMAO" "laughing my ass off"))

Also good (effectively equivalent):

<dictionary>
  <entry term="ROFL">rolling on the floor laughing</entry>
  <entry term="LMAO">laughing my arms off</entry>
</dictionary>

(no direct S-expression equivalent)

Wrong. Here it is:

(dictionary
  (entry :term "ROFL" "rolling on the floor laughing")
  (entry :term "LMAO" "laughing my ass off"))

Right, but I think that one doesn't really make sense. First, what purpose do the seemingly dead-weight copies of "entry" serve? Second, why use ":term x" instead of the more obvious "(term x)"? Third, why mention "term" at all? It seems all those things are just there to create a literal translation of a piece of non-ideal XML, and that's kind of silly. I mean, in that case, you might as well emulate the end tags as well, yielding

(dictionary
  (entry :term "ROFL" "rolling on the floor laughing" /entry)
  (entry :term "LMAO" "laughing my ass off" /entry) /dictionary)

Mind you, I certainly didn't intend this to be yet another XML vs. S-expressions page; rather, I wanted to show how to refactor XML, and while at it, also roughly corresponding S-expressions. However, I can see that the "no S-expression equivalent" looks kind of provocative. Maybe we should instead stick the alternative XML solutions right next to each other and show the single sane S-expression solution below them? Thanks for your input.


Real-world examples follow.


Apple's new plist format

(See IsolateEachDatum for the beginning of the story.)

Bad -- hard to process (uses an ungodly mix of both too much and too little structure):

<plist version="1.0">
  <dict>
    <key>AnimateSnapToGrid</key>
    <true />
    <key>EmptyTrashProgressWindowLocation</key>
    <point x="79" y="44" />
    <key>FileViewer.LastWindowLocation</key>
    <rectangle x1="228" y1="140" x2="1091" y2="826" />
  </dict>
</plist>

Good:

<plist version="1.0">
  <property>
    <key>AnimateSnapToGrid</key>
    <true />
  </property>
  <property>
    <key>EmptyTrashProgressWindowLocation</key>
    <point x="79" y="44" />
  </property>
  <property>
    <key>FileViewer.LastWindowLocation</key>
    <rectangle x1="228" y1="140" x2="1091" y2="826" />
  </property>
</plist>

Also good (but note that this variant would not have been an option had I chosen to go with structured keys):

<plist version="1.0">
  <property key="AnimateSnapToGrid">
    <true />
  </property>
  <property key="EmptyTrashProgressWindowLocation">
    <point x="79" y="44" />
  </property>
  <property key="FileViewer.LastWindowLocation">
    <rectangle x1="228" y1="140" x2="1091" y2="826" />
  </property>
</plist>

How about:

<plist version="1.0">
  <property key="AnimateSnapToGrid">
    <boolean value="true" />
  </property>
  <property key="EmptyTrashProgressWindowLocation">
    <point x="79" y="44" />
  </property>
  <property key="FileViewer.LastWindowLocation">
    <rectangle x1="228" y1="140" x2="1091" y2="826" />
  </property>
</plist>

Here, the boolean validation is according to type, type being 'boolean' element (meaning a property may contain 'boolean' element and not 'true' or 'false' element).

Yes, you're absolutely right. I like this better too. In your version, ElementNamesAreTypeNames, which strikes me as highly intuitive.


See also: GroupRelatedInformation


CategoryXml, CategoryRefactoring, CategoryInfoPackaging


Loading...