Some of my colleagues have invested a lot of time modelling all the company's processes and (main) projects ("processes" for short) in EA. Now management wants to have an overview of the processes' progress and wants to keep an extra eye on all the risk exposures >= 15 (risk impact * risk probability >= 15 ). They want us to use the data already entered into EA and enrich it with progress and risk data.
We have the DDL scripts that include the creation of the tables, their PK's and the data to initialise the code tables. There are no FK's. We are now trying to figure out the relations between the tables.
I did some data exploration: Of the 99 EA tables, 27 are of interest to us (the others are empty or do not contain data that is referenced to or from).
I checked for every PK column how often it matches with each and every (data type compatible) column in all the other tables. The idea was: when the number of matches "matched" = the number of times a column has a value (is Not Null) "NN_Col", we have found a foreign key column.
Hundreds of scipts like the one below were generated using the system tables:
SELECT 'dbo.t_attribute' AS PKT, 'ID' AS PKC
, 'dbo.t_diagramlinks' AS FKT, 'Hidden' AS FKC, (SELECT COUNT(Hidden) FROM dbo.t_diagramlinks) AS NN_Col, COUNT(*) as Matchend
FROM dbo.t_attribute AS PKT
INNER JOIN dbo.t_diagramlinks AS FKT ON PKT.ID = FKT.Hidden
HAVING COUNT(*) > 0
UNION ALL
...
Then theory met reality.
I would get eg. 1000 NN_Col and 950 matched.
In other words: the potential FK column contains data in 1000 records, but only 950 match with the PK.
This indicates that the potential FK column is not a real FK after all.
But..
We also ran a trace to get a better insight in de SQL queries. We were expecting to see JOIN's and extract FK-PK relations that way. However, we did not see one single JOIN clause. We did find just a few subselects that we could use to extract FK-PK relations, like the one below:
SELECT (
SELECT UserID
FROM t_seclocks insel1
WHERE insel1.EntityID = outsel.ea_guid
) AS UserID
,(
SELECT PackageFlags
FROM t_package insel3
WHERE insel3.Package_ID = outsel.Package_ID
) AS PackageFlags
,Name
,Package_ID
,Style
FROM t_object outsel
WHERE outsel.ea_guid = '{8CC7CA04-7DB1-4ac3-B67A-BB22377A216D}'
In this case, both outsel.ea_guid and outsel.Package_ID contain values, but there is no corresponding t_seclocks or t_package record. There is no referential integrity in the EA database.
We assume there once were associated records, they got deleted afterwards and the FK's were never cleaned up nicely and set back to NULL.
So it is possible that 1000 NN_Col and 950 matched is an indication of a FK-relation, the difference between 1000 and 950 due to referential integrity data quality issues.
But it is also possible the difference is an indication there is no FK-relation. The PK's are sometimes INT and sometimes NVARCHAR or VARCHAR containing GUID's or simple strings like "Name".
It is possible that two INT columns that both start counting from 1, by chance can be JOINed in many cases although they are not related.
Is the data model of the Enterprise Architect database available?
How do you get an insight into the data model - especially the relations between tables - from external databases?
I thought we had good ideas to extract the ERD, but it didn't turn out good.