Hi everyone. I've looked through the forums for an answer to this question but couldn't find much, so I thought I would ask. The data model in the link below is just a made-up example, so some details might not strictly make sense.
http://i61.tinypic.com/hx851c.jpg
I'll refer to the green section as the "generalised model" and to the pink section as the "specific model". They both represent the same concepts at a different level of generalisation.
The following are some bare-bones tables for the generalised model, which is the focus of my question. Just the primary keys (and foreign keys for the junction tables) are shown:
ContactType(ct_id)
ContactMechanismType(cmt_id)
Valid_ContactType_ContactMechanismType(ct_id, cmt_id)
Contact(c_id)
ContactMechanism(cm_id)
Contact_ContactMechanism(c_id, cm_id)
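To make the listing concrete, here is a rough sketch of how these tables might be declared. SQLite via Python's sqlite3 module is used only because it runs anywhere; the post does not name a DBMS. The columns Contact.ct_id and ContactMechanism.cmt_id are assumptions (the listing shows keys only) standing in for however the diagram ties an instance to its type, and create_generalised_schema is a made-up helper name.

```python
import sqlite3

# Assumption: Contact.ct_id and ContactMechanism.cmt_id are not in the original
# listing; they represent the link from an instance to its type.
GENERALISED_SCHEMA = """
CREATE TABLE ContactType (ct_id INTEGER PRIMARY KEY);
CREATE TABLE ContactMechanismType (cmt_id INTEGER PRIMARY KEY);
CREATE TABLE Valid_ContactType_ContactMechanismType (
    ct_id  INTEGER NOT NULL REFERENCES ContactType (ct_id),
    cmt_id INTEGER NOT NULL REFERENCES ContactMechanismType (cmt_id),
    PRIMARY KEY (ct_id, cmt_id)
);
CREATE TABLE Contact (
    c_id  INTEGER PRIMARY KEY,
    ct_id INTEGER NOT NULL REFERENCES ContactType (ct_id)
);
CREATE TABLE ContactMechanism (
    cm_id  INTEGER PRIMARY KEY,
    cmt_id INTEGER NOT NULL REFERENCES ContactMechanismType (cmt_id)
);
CREATE TABLE Contact_ContactMechanism (
    c_id  INTEGER NOT NULL REFERENCES Contact (c_id),
    cm_id INTEGER NOT NULL REFERENCES ContactMechanism (cm_id),
    PRIMARY KEY (c_id, cm_id)
);
"""


def create_generalised_schema():
    """Return an in-memory database holding the six generalised tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(GENERALISED_SCHEMA)
    return conn
```

Nothing in this sketch connects Contact_ContactMechanism to Valid_ContactType_ContactMechanismType, so any Contact can be paired with any ContactMechanism.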
The entity types "Person" and "Organisation" in the specific model are subtypes of "Contact" and instances of "ContactType", which is basically a discriminator/type entity in ER terms (?). My rationale for taking this approach is that the requirements for the actual model I'm working on have been very volatile and they're bound to change in the future as well. So I think it makes more sense to change records in "ContactType", for example, rather than keep adding/removing tables to the database.
Now, my issue: In the specific model, a Person can have only 1 Address and no POBox. In the generalised model, however, a Contact (which may be classified as a Person by its association with ContactType) may have many Addresses and many POBoxes, as the model stands.
The information for which ContactMechanismType is valid for each ContactType is stored in Valid_ContactType_ContactMechanismType. So, it is explicitly stated in terms of data, but how can I enforce it on the association table Contact_ContactMechanism (Rule 1 in diagram)? These are the options I can think of:
1. Use referential integrity constraints. The problem is that there is no foreign key between the two tables. Contact_ContactMechanism(c_id, cm_id) would need to be turned into Contact_ContactMechanism(c_id, cm_id, ct_id, cmt_id), which would provide the foreign key (ct_id, cmt_id) to Valid_ContactType_ContactMechanismType(ct_id, cmt_id), but it introduces redundant data (ct_id, cmt_id) in the Contact_ContactMechanism relation, along with a risk of ending up with inconsistent data.
A way I've thought around this is to let the identity of Contact be defined by its type plus a unique id for the type, i.e. (c_id, ct_id) being the primary key for Contact. The same goes for ContactMechanism(cm_id, cmt_id). In this case the junction table is Contact_ContactMechanism(c_id, cm_id, ct_id, cmt_id), which contains a foreign key to Valid_ContactType_ContactMechanismType that can be used to enforce Rule 1. The problems with this approach are the extra storage space, the more complex queries, and the unknown consequences of interacting with applications (e.g. composite key + ORM?). Conceptually, I've also coupled the identity of a Contact with its classification: now I can't reclassify a Contact (e.g. from Person to Organisation) without changing its identity, which feels fishy.
2. Use a trigger or a stored procedure to check that I'm importing valid Contact - ContactMechanism tuples into the Contact_ContactMechanism table. But this does not feel satisfactory either, and I'm pretty sure it's going to create performance issues in bulk insert situations.
3. Enforce the rule in the application. This option takes the rule completely out of the database, and if another application uses the database it may corrupt it unless it implements the rule itself.
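For comparison, options 1 and 2 can be sketched side by side. This is only an illustration under assumptions, not a recommendation: it uses SQLite through Python's sqlite3 module (trigger syntax differs between products), the numeric ids for Person, Organisation, Address and POBox are invented, the set of valid pairs is invented apart from Person/Address, and in the option 2 variant the ct_id and cmt_id columns on Contact and ContactMechanism are assumed, since the original listing shows keys only. The names open_db, link_accepted and enforce_rule_1 are made up for the example.

```python
import sqlite3

# Hypothetical ids: ct 1 = Person, ct 2 = Organisation, cmt 1 = Address, cmt 2 = POBox.
TYPES = """
CREATE TABLE ContactType (ct_id INTEGER PRIMARY KEY);
CREATE TABLE ContactMechanismType (cmt_id INTEGER PRIMARY KEY);
CREATE TABLE Valid_ContactType_ContactMechanismType (
    ct_id  INTEGER NOT NULL REFERENCES ContactType (ct_id),
    cmt_id INTEGER NOT NULL REFERENCES ContactMechanismType (cmt_id),
    PRIMARY KEY (ct_id, cmt_id));
INSERT INTO ContactType VALUES (1), (2);
INSERT INTO ContactMechanismType VALUES (1), (2);
INSERT INTO Valid_ContactType_ContactMechanismType VALUES (1, 1), (2, 1), (2, 2);
"""

# Option 1: the type is part of each key, so the junction table can carry a
# composite foreign key to the table of valid type pairs.
OPTION_1 = TYPES + """
CREATE TABLE Contact (
    c_id  INTEGER NOT NULL,
    ct_id INTEGER NOT NULL REFERENCES ContactType (ct_id),
    PRIMARY KEY (c_id, ct_id));
CREATE TABLE ContactMechanism (
    cm_id  INTEGER NOT NULL,
    cmt_id INTEGER NOT NULL REFERENCES ContactMechanismType (cmt_id),
    PRIMARY KEY (cm_id, cmt_id));
CREATE TABLE Contact_ContactMechanism (
    c_id INTEGER NOT NULL, ct_id INTEGER NOT NULL,
    cm_id INTEGER NOT NULL, cmt_id INTEGER NOT NULL,
    PRIMARY KEY (c_id, ct_id, cm_id, cmt_id),
    FOREIGN KEY (c_id, ct_id) REFERENCES Contact (c_id, ct_id),
    FOREIGN KEY (cm_id, cmt_id) REFERENCES ContactMechanism (cm_id, cmt_id),
    FOREIGN KEY (ct_id, cmt_id)
        REFERENCES Valid_ContactType_ContactMechanismType (ct_id, cmt_id));
"""

# Option 2: single-column keys, with a trigger that looks up both types on insert.
# Assumption: Contact.ct_id and ContactMechanism.cmt_id exist as non-key columns.
OPTION_2 = TYPES + """
CREATE TABLE Contact (
    c_id  INTEGER PRIMARY KEY,
    ct_id INTEGER NOT NULL REFERENCES ContactType (ct_id));
CREATE TABLE ContactMechanism (
    cm_id  INTEGER PRIMARY KEY,
    cmt_id INTEGER NOT NULL REFERENCES ContactMechanismType (cmt_id));
CREATE TABLE Contact_ContactMechanism (
    c_id  INTEGER NOT NULL REFERENCES Contact (c_id),
    cm_id INTEGER NOT NULL REFERENCES ContactMechanism (cm_id),
    PRIMARY KEY (c_id, cm_id));
CREATE TRIGGER enforce_rule_1
BEFORE INSERT ON Contact_ContactMechanism
WHEN NOT EXISTS (
    SELECT 1
    FROM Contact AS c
    JOIN ContactMechanism AS cm ON cm.cm_id = NEW.cm_id
    JOIN Valid_ContactType_ContactMechanismType AS v
      ON v.ct_id = c.ct_id AND v.cmt_id = cm.cmt_id
    WHERE c.c_id = NEW.c_id)
BEGIN
    SELECT RAISE(ABORT, 'Rule 1: mechanism type not valid for contact type');
END;
"""


def open_db(schema):
    """Build an in-memory database with one Person, one Address and one POBox."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(schema)
    conn.execute("INSERT INTO Contact (c_id, ct_id) VALUES (10, 1)")
    conn.executemany("INSERT INTO ContactMechanism (cm_id, cmt_id) VALUES (?, ?)",
                     [(100, 1), (200, 2)])
    return conn


def link_accepted(conn, sql, params):
    """Return True if the junction row is accepted, False if the database rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


INSERT_1 = ("INSERT INTO Contact_ContactMechanism (c_id, ct_id, cm_id, cmt_id) "
            "VALUES (?, ?, ?, ?)")
INSERT_2 = "INSERT INTO Contact_ContactMechanism (c_id, cm_id) VALUES (?, ?)"

db1, db2 = open_db(OPTION_1), open_db(OPTION_2)
# First element: Person + Address; second element: Person + POBox.
option_1_results = (link_accepted(db1, INSERT_1, (10, 1, 100, 1)),
                    link_accepted(db1, INSERT_1, (10, 1, 200, 2)))
option_2_results = (link_accepted(db2, INSERT_2, (10, 100)),
                    link_accepted(db2, INSERT_2, (10, 200)))
```

In both variants the Person/Address link goes in and the Person/POBox link is rejected by the database. Note that the trigger as written only fires on inserts into the junction table; updates to that table, reclassifying a Contact, or deleting a row from Valid_ContactType_ContactMechanismType would each need their own checks, whereas the foreign keys in option 1 cover those cases declaratively.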
So... these are my troubles. And this is only the easiest part, because then there's the matter of enforcing the correct cardinality constraints between Contact and ContactMechanism, but that is a matter for another time. To all you experienced data modelers out there, is there a best practice for handling these situations in the database? Any help will be greatly appreciated :)
Thanks,
Nico