Difference between revisions of "CommentAlgorithm"

From GeeklogWiki
Jump to: navigation, search
(Representing Geeklog's Comments in the Database)
(Representing Geeklog's Comments in the Database: Added titles)
Line 3: Line 3:
 
Since Geeklog 1.3.10, Geeklog's comments have been stored in the database with columns (''lft'', ''rht'') representing a (modified) pre-order traversal ordering. This method of storage was chosen to improve comment retrieval performance, especially when viewing a subset of the comments for a given story.
 
Since Geeklog 1.3.10, Geeklog's comments have been stored in the database with columns (''lft'', ''rht'') representing a (modified) pre-order traversal ordering. This method of storage was chosen to improve comment retrieval performance, especially when viewing a subset of the comments for a given story.
  
 +
==Details==
 
The idea behind pre-order traversal ordering, is that each node is ordered (or visited) before any of its children. In the modified preorder traversal algorithm implemented by Geeklog, the ''lft'' value of a comment is always less than the ''lft'' and ''rht'' values of all of the children of that node. Likewise the ''rht'' value of a node is always greater that the ''lft'' and ''rht'' values of all of the children of that node. (And trivially for any node, ''lft'' is always less than ''rht''.) A node without any children has the property ''rht'' is exactly one more than ''lft''.  
 
The idea behind pre-order traversal ordering, is that each node is ordered (or visited) before any of its children. In the modified preorder traversal algorithm implemented by Geeklog, the ''lft'' value of a comment is always less than the ''lft'' and ''rht'' values of all of the children of that node. Likewise the ''rht'' value of a node is always greater that the ''lft'' and ''rht'' values of all of the children of that node. (And trivially for any node, ''lft'' is always less than ''rht''.) A node without any children has the property ''rht'' is exactly one more than ''lft''.  
  
Line 84: Line 85:
 
Please note that in the SQL examples above, parts of the query were left out to make the algorithm more clear. For more detail, see the Geeklog source code (specifically lib-comment.php).
 
Please note that in the SQL examples above, parts of the query were left out to make the algorithm more clear. For more detail, see the Geeklog source code (specifically lib-comment.php).
  
Related Articles:
+
==External related articles==
 
* [http://www.sitepoint.com/article/hierarchical-data-database/2/ Storing Hierarchical Data in a Database]
 
* [http://www.sitepoint.com/article/hierarchical-data-database/2/ Storing Hierarchical Data in a Database]
 
* [http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html Trees in SQL]
 
* [http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html Trees in SQL]
 
* [http://en.wikipedia.org/wiki/Tree_traversal Tree Traversal]
 
* [http://en.wikipedia.org/wiki/Tree_traversal Tree Traversal]
 
* [http://dev.mysql.com/tech-resources/articles/hierarchical-data.html Managing Hierarchical Data in MySQL]
 
* [http://dev.mysql.com/tech-resources/articles/hierarchical-data.html Managing Hierarchical Data in MySQL]

Revision as of 08:28, 23 October 2009

Representing Geeklog's Comments in the Database

Since Geeklog 1.3.10, Geeklog's comments have been stored in the database with columns (lft, rht) representing a (modified) pre-order traversal ordering. This method of storage was chosen to improve comment retrieval performance, especially when viewing a subset of the comments for a given story.

Details

The idea behind pre-order traversal ordering, is that each node is ordered (or visited) before any of its children. In the modified preorder traversal algorithm implemented by Geeklog, the lft value of a comment is always less than the lft and rht values of all of the children of that node. Likewise the rht value of a node is always greater that the lft and rht values of all of the children of that node. (And trivially for any node, lft is always less than rht.) A node without any children has the property rht is exactly one more than lft.

Example: Comment A is a top level comment to a story. Comments A-A and A-B are children of A and comments A-A-A, A-A-B, and A-A-C are children of A-A. The table below shows the values for each comment:

comment parent lft rht
A N/A 1 12
A-A A 2 9
A-B A 10 11
A-A-A A-A 3 4
A-A-B A-A 5 6
A-A-C A-A 7 8

The resulting database structure has many advantages. Selecting a subtree from the database is as easy as specifying the lft and rht numbers of the root node of the subtree. For instance, to get the subtree with root comment "A-A":

SELECT * FROM comments WHERE lft >= 2 AND rht <= 9

Or selecting all the children of a give node:

SELECT * FROM comments WHERE lft > 2 AND rht < 9

To select comments ordered for display in a "nested" format:

SELECT * FROM comments ORDER BY lft

Storing the preorder traversal information also allows us to easily find the path to a node (i.e. a nodes parents, grandparents, etc).

SELECT * FROM comments WHERE lft < 7 AND rht > 8</tt>

Counting the number of children of a comment is equally easy: <code>('rht' - 'lft' - 1)/2

Previously these operations required a (very slow) recursive algorithm. The way Geeklog had implemented this algorithm required one database query for every comment in the result set. Another way would have been to do one query to fetch all comments, and then use a recursive function to choose the desired comments. Both methods were extremely inefficient and caused load problems on frequently viewed Geeklog sites.

The downside to this method is that inserting and deleting comments takes more work (three queries instead of one) and must be done in a single transaction (or a write lock on the table). Inserts:

START TRANSACTION
UPDATE comments SET rht = rht + 2 WHERE rht >= comment.rht
UPDATE comments SET lft = lft + 2 WHERE lft >= comment.rht
INSERT INTO comments SET lft = comment.rht, rht = comment.rht + 1
COMMIT

Deletions:

START TRANSACTION
UPDATE comments SET rht = rht - 2 WHERE rht > comment.rht
UPDATE comments SET lft = lft - 2 WHERE lft > comment.rht
DELETE FROM comments WHERE id = comment.id
COMMIT

However, since fetching (reading) comments is a much more frequent action then adding or deleting comments and the increase in efficiency for fetching comments is much greater than the loss of efficiency for adding and deleting comments, the trade off is worth while.

Please note that in the SQL examples above, parts of the query were left out to make the algorithm more clear. For more detail, see the Geeklog source code (specifically lib-comment.php).

External related articles