Data Comparing Algorithms

last modified: January 31, 2007

Here is a general algorithm for comparing two similar tables (or copies of tables).

for r = each record of t1 not in t2  // based on primary key
  display record
end-for

for r = each record of t2 not in t1
  display record
end-for

for key = each primary key common to both t1 and t2

  r1 = getRecord(t1, key)
  r2 = getRecord(t2, key)

  for each attribute in r1 not in r2
    display key, attribute, and value 
  end-for  

  for each attribute in r2 not in r1
    display key, attribute, and value     
  end-for  

  for attr = each attribute common to r1 and r2
    if r1[attr] not-equal-to r2[attr]
      display key, attribute, r1[attr], r2[attr]
    end-if
  end-for

end-for
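
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: each table is held in memory as a dictionary mapping primary key to a record, each record is itself a dictionary of attribute to value, and keys and attribute names are sortable. The name compare_tables and the report tuples are made up for this sketch.

```python
def compare_tables(t1, t2):
    """Compare two tables given as {primary_key: {attribute: value}} dicts.

    Returns a list of tuples describing each difference found.
    """
    report = []

    # Records that exist in only one table, matched by primary key.
    for key in sorted(t1.keys() - t2.keys()):
        report.append(("only in t1", key, t1[key]))
    for key in sorted(t2.keys() - t1.keys()):
        report.append(("only in t2", key, t2[key]))

    # Records whose primary key appears in both tables.
    for key in sorted(t1.keys() & t2.keys()):
        r1, r2 = t1[key], t2[key]

        # Attributes that exist on only one side.
        for attr in sorted(r1.keys() - r2.keys()):
            report.append(("attribute only in t1", key, attr, r1[attr]))
        for attr in sorted(r2.keys() - r1.keys()):
            report.append(("attribute only in t2", key, attr, r2[attr]))

        # Attributes on both sides whose values differ.
        for attr in sorted(r1.keys() & r2.keys()):
            if r1[attr] != r2[attr]:
                report.append(("value differs", key, attr, r1[attr], r2[attr]))

    return report


if __name__ == "__main__":
    t1 = {1: {"foo": 7, "bar": 9}, 2: {"foo": 1}}
    t2 = {1: {"foo": 7, "bar": 8, "baz": 3}, 3: {"foo": 5}}
    for line in compare_tables(t1, t2):
        print(line)
```

Returning the differences as a list rather than printing them directly keeps the comparison separate from the display, so the same routine can feed a report, a log, or a test.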

Trees

One technique for comparing trees is to print them as text, and then run a text-based difference finder (such as diff) of the kind found on most operating systems. Use recursion to create the outline-like format, and perhaps disable white-space sensitivity so that differences in nesting level won't be over-emphasized. You may want to try it with and without the full node path, as each will emphasize different kinds of changes. Here is a sample text file, ready to be differenced:

node 1 foo=7 bar=9
..node 1.2 foo=12 bar=7
..node 1.3 foo=9
node 2 bar=6
..node 2.1 foo=12
....node 2.1.1 foo=12 bar=99
..node 2.2 bar=786  

(Dots used to prevent TabMunging, but may not otherwise be needed in actual file.)
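
A minimal Python sketch of the tree technique follows, using the standard difflib module in place of an operating-system diff tool. The (label, attributes, children) tuple shape for nodes and the names outline and diff_trees are assumptions made for this example only. The ignore_indent flag stands in for disabling white-space sensitivity by dropping the indentation before comparing.

```python
import difflib


def outline(nodes, depth=0, indent=".."):
    """Recursively render a list of nodes as outline lines, one per node.

    Each node is a (label, attributes, children) tuple; this shape is an
    assumption of the sketch.
    """
    lines = []
    for label, attrs, children in nodes:
        fields = " ".join(f"{k}={v}" for k, v in attrs.items())
        lines.append(f"{indent * depth}node {label} {fields}".rstrip())
        lines.extend(outline(children, depth + 1, indent))
    return lines


def diff_trees(old, new, ignore_indent=False):
    """Return unified-diff lines between the outlines of two trees."""
    # With ignore_indent, nesting depth is not printed at all, so a node
    # that only changed level compares as equal.
    indent = "" if ignore_indent else ".."
    a = outline(old, indent=indent)
    b = outline(new, indent=indent)
    return list(difflib.unified_diff(a, b, fromfile="old", tofile="new",
                                     lineterm=""))


if __name__ == "__main__":
    old = [("1", {"foo": 7, "bar": 9},
            [("1.2", {"foo": 12, "bar": 7}, []),
             ("1.3", {"foo": 9}, [])])]
    new = [("1", {"foo": 7, "bar": 9},
            [("1.2", {"foo": 12, "bar": 7}, []),
             ("1.3", {"foo": 10}, [])])]
    print("\n".join(diff_trees(old, new)))
```

Here the labels carry the full node path ("1.2", "1.3"); using short labels instead would be the "without the full node path" variant mentioned above.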

