Wednesday, March 28, 2012

persisting data for a pluggable tree

Hi all,

I am rather new to database design and modelling concepts in general
and was hoping for some advice on a problem I am trying to solve. I
have designed a piece of software that creates a tree with pluggable
nodes. Each node class can have 0 to n distinct classes plugged into
it to define the type for that node.

For example, a node plugged with a 'customer' class and an 'engineer'
class would indicate that this node in the tree is an engineer who is
also a customer. We could also have a 'owner', 'engineer' etc.

I now want to persist this tree in an SQL Server 2000 Database. I have
chosen to implement the nested set model, and have thought about the
following table design:

table NODE_TABLE:
lft INTEGER
rft INTEGER
propsID INTEGER

table PROPERTIES_TABLE:
propsID INTEGER
tableName VARCHAR

table CUSTOMER_TABLE:
propsID INTEGER
firstname CHAR
lastname CHAR

table ENGINEER_TABLE:
propsID INTEGER
num_completed_projects INTEGER
degree CHAR
school CHAR

table OWNER_TABLE:
propsID INTEGER
companyName CHAR

So, given the above example - I would have a NODE_TABLE that links to 2
entries in PROPERTIES_TABLE. One entry would link to an entry in the
CUSTOMER_TABLE, the other to an entry in ENGINEER_TABLE.

Are there any more efficient solutions to this problem? As i said, I
am very new to DB design and would welcome any feedback or suggestions
of how else I might model my pluggable tree in a Database. Thank you,

Bob YohanYou can get a copy of my book TREES & HIERARCHIES IN SQL for several
ways to model these things in SQL. But what you are trying to do is
force an OO model into SQL and it is not a good idea. There are no
classes or links in RDBMS; we have tables and references. We do not
mix data and metadata in a schema. The data model does not change
during the application.

Many years ago, the INCITS H2 Database Standards Committee(nee ANSI
X3H2 Database Standards Committee) had a meeting in Rapid City, South
Dakota. We had Mount Rushmore and Bjarne Stroustrup as special
attractions. Mr. Stroustrup did his slide show about Bell Labs
inventing C++ and OO programming for us and we got to ask questions.

One of the questions was how we should put OO stuff into SQL. His
answer was that Bells Labs, with all their talent, had tried four
different approaches to this problem and come the conclusion that you
should not do it. OO was great for programming but deadly for data.

I have watched people try to force OO models into SQL and it falls
apart in about a year. Every typo becomes a new attribute or class,
queries that would have been so easy in a relational model are now
multi-table monster outer joins, redundancy grows at an exponential
rates, constraints are virtually impossible to write so you can kiss
data integrity goodbye, etc.|||Thanks for your reply. I actually bought your book a couple of weeks
ago, which led to my choosing the nested set implementation. Great
book, I'd recommend it to anyone looking for a good summary of the
various methods of modelling trees in SQL. I guess my problem is not
how to model the tree itself given that I have the book, but rather how
to persist what is obviously an already built OO model in a RDBMS. I'm
sure your statements about the problems of mixing metadata and data
will hold true, but how can I persist my dynamic nodes without doing
this? I suppose I could use an XML file to map specific tables to
their respective nodes, but this doesn't seem like a more elegant
solution given, and in this case I might as well store the entire tree
structure in XML. Given that the trees will be > 10000 nodes on
average, I think an efficient XML solution is out. Any further
suggestions would be appreciated - thanks,

Bob Yohan|||The classic scenario calls for a root class with all the common
attributes and then specialized sub-classes under it. As an example,
let's take the class of Vehicles and find an industry standard
identifier (VIN), and add two mutually exclusive sub-classes, Sport
utility vehicles and sedans ('SUV', 'SED').

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:

CREATE TABLE SUV
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
CHECK(vehicle_type = 'SUV'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type = 'SED'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

I can continue to build a hierarchy like this. For example, if I had a
Sedans table that broke down into two-door and four-door sedans, I
could a schema like this:

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE TwoDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
CHECK(vehicle_type = '2DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE FourDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
CHECK(vehicle_type = '4DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans (vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

The idea is to build a chain of identifiers and types in a UNIQUE()
constraint that go up the tree when you use a REFERENCES constraint.
Obviously, you can do variants of this trick to get different class
structures.

If an entity doesn't have to be exclusively one subtype, you play with
the root of the class hierarchy:

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
PRIMARY KEY (vin, vehicle_type),
..);

Now start hiding all this stuff in VIEWs immediately and add an INSTEAD
OF trigger to those VIEWs. Performance will suck and the code will be
a pain to maintain, but it is possible.|||Thank you Joe - you've been very helpful. Keep up the good work,

Bob

No comments:

Post a Comment