How to find what you need to on a server with a multitude of databases (and virtually no documentation)

I just recently started working as an analyst at a hospital network. My primary function will be to assist with clinical reporting and my main tool will be SSMS v18.

The servers I am referencing contain a slew of databases, few of which have any sort or documentation or legible naming convention. The only way I've been able to find the relevant tables/views is by asking more experienced analysts or tracing pre-existing reports back to their source.

How does one approach this... is there, for instance, a way for me to query all of the column names in all of the tables or views in every database (or at least those I've access to) and attempt to match every instance of (for example) a patient ID?

Any advice would be much appreciated!!

I use azure datastudio to create documentation (notebooks) and to organize my work. I can give it to anyone in the company so they know what I did and why. Nice thing is that it saves the results of your queries.

dbatools.io is nice to read, here you can find all kind of tools. For example:

dbatools docs | Find-DbaSimilarTable

When you want a quick way to find all tables and columns:

SELECT      c.name  AS 'ColumnName'
           ,t.name AS 'TableName'
FROM        sys.columns c
JOIN        sys.tables  t   ON c.object_id = t.object_id
-- WHERE       c.name LIKE '%MyName%'
ORDER BY    TableName
           ,ColumnName;
1 Like

They bettwr be paying you good money to do this forensics work

Check this out might find something useful there

For me your basic starting point is looking at tables names and columns

Select t.name, c.name
From sys.tables t
Join sys.columns c 
on t.object_id
= c.object_id
1 Like

Thanks for your response!!

RogierPronk: I'll take a look at dbatools - it has occurred to me that this might be an opportunity for me to document my process for the next analyst landing in my shoes.

I've run the query for searching out column names but is it possible to expand this to the entire server (query sys.databases for all tables then columns)?

yosiasz: Thanks!! this has been my starting point (querying all columns) but I wonder if I might be able to expand this to all of the databases (that I can access) as well?

Most of the time the report requests will contain a couple hard data points - a patient or room# or at least a description of the type of injury/treatment they want to measure (ex "concussion")...

Is there any (reasonable) way to do a search across an entire server (all databases, all tables) for all records containing a single (or small set) of data?

So your original post was about finding things. Is your new ask searching things within the found things?

No - the asking/tracing in my initial post assumes I have a resource to direct me (either another analyst who just "knows" or a preexisting report that already references the data/table I need).

My question is: Given a data point (or set of points) it completely infeasible for me to query all tables on all databases (on a server) and list out all the references to that data point?

Ideally the data will be explicit - like a patient or room ID, something I can get from the doc/nurse requesting the report - but it may be also a LIKE situation where I am parsing a string description ex. %FRACTURE%...

Does this make sense?

It is absolutely doable but not recommendable imho. So basically you are doing data forensics

Find a certain value "fracture" across all columns across all tables across all databases ?

Is the goal to be able to search the values in every column in every table in every database? If so - that is going to be problematic, especially in a healthcare system that could be 10's of TB's of data. It will also be incredibly challenging because a lot of the 'data' you are searching for may be in a document and not readily searchable.

Now - if you are looking to search for column names, search stored procedures/functions - then Redgates SQL Search is an ideal tool.

To be able to search effectively though - you need to figure out how to limit the search. For example, if searching for specific patient information you might want to search for a PatientID or PersonID. If you need to find tables/objects related to a 'treatment' - then you probably need to find tables that have something to do with a Procedure (e.g. ICD Procedure - or Diagnosis Code or DX or PX).

Once you have identified the procedure code table - then you can search for that table and find any (if defined) FK relationships and/or code that uses that table.

Yes; as an example, search through all (databases, tables, columns for) strings for %FRACTURE% (or some other such term) or for an numeric value matching "123456789" for instance.

Yeah it occurred to me that such a broad search might not be feasible but I figured it couldn't hurt to ask:)

In reality, it wouldn't be "all" databases but a smaller subset (a large set of legacy DBs could be ignored) and the search could take some time to run as typically the lag time between the point where a report is requested to delivery is at least a week.

Knowing the proper databases/tables is exactly the problem - nothing is documented, it's all basically institutional knowledge - so I guess my question could be revised as:

Given a "reasonable" subset of "possibly relevant" databases, is there a way for me to search all the tables on all the databases...

Scripting some of this (in Python, C#, etc) might also be an option if it's faster to pull it off the server and run it in memory on a desktop machine.

Do they have it offloaded to an olap server or backup or will you be hitting the live server?

If you hit live server might be shortest gig you ever done.

No definitely not a live server - hah, like Cerner would ever let us touch the live server - OLAP running like 24-48hrs behind the live server.

1 Like

I thought Cerner was all Oracle based - and the fact that you are running against a lag would indicate using Oracle's ability to create a lag, which is not something that is available in SQL Server either.

That almost certainly means the structure of the data in that system is not the same as what is in the source system and that process is something managed by the EDW team. To be able to determine where specific data resides - you would need documentation from that team.

There is no way to query every column in every table in a database directly - you would need some dynamic code that builds a query for each table. And if that table is of any significant size, it can and will take a very long time to get any results.

This isn't a question that can be resolved through some scripting - regardless of language. This is just going to be something you have to learn over time - with the help of those who have already gone through the process of learning.