Best practice for getting a system's data?

Hello!

What is the best practice to get access to a system's data?

For example, let's say you have a GUI sitting on top of a db: are there any considerations before you get backend access to that db for analytical work?

I was thinking about the impact on GUI speed for GUI users if there is no prioritisation in place to serve the GUI requests ahead of any backend requests. Are there any other risks or opportunities?

I assume all infosec risks would be manageable as long as the backend access is read-only, password protected, etc.

Other options for getting backend access would be to:

  1. use an API for real-time querying
  2. get a daily extract of the data and mirror it in a different db (rough sketch below)
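
To illustrate option 2, here is a rough sketch in Python of the kind of daily extract I have in mind. All server, database, and table names are placeholders since I don't know the real architecture; the idea is just a read-only pull from the production db appended into a separate reporting db.

```python
# Rough sketch of option 2: a daily extract copied into a separate reporting db.
# All server/database/table names are placeholders -- I don't know the real setup.
import pandas as pd
from sqlalchemy import create_engine

# Read-only connection to the production database behind the GUI (placeholder).
source = create_engine(
    "mssql+pyodbc://readonly_user:secret@prod-server/AppDb"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)
# Separate database used only for analytics/dashboards (placeholder).
reporting = create_engine(
    "mssql+pyodbc://etl_user:secret@report-server/ReportDb"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)

# Pull yesterday's rows and append them to the mirrored table.
df = pd.read_sql(
    "SELECT * FROM dbo.Orders WHERE OrderDate >= CAST(GETDATE() - 1 AS date)",
    source,
)
df.to_sql("Orders", reporting, schema="dbo", if_exists="append", index=False)
```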

Which would you recommend?

Thanks!

This is too vague, and nothing suggests it's specific to SQL Server. Unless you provide more details, we can't answer it.


I suppose the backend would almost always be a SQL Server, yes. This server would serve some frontend GUI where users click buttons to insert or pull data.

More specifically, I am asking about good practice for using that data for analytical purposes, i.e. dashboards etc. In particular, whether the reporting may impact the GUI users, because the analytical process may need to pull or work with the whole dataset.

I am only making assumptions here as I am far from an expert in this. Unfortunately, I don't have more information about the exact architecture, and I would also like some generic answers on the best practice used in the industry: do we usually mirror the db, or do we use it for analytical purposes at the same time as the GUI users use it?

Most people would simply move their data into PowerBI and use PowerBI Desktop to design their dashboards, and then publish them to the PowerBI service. That will be the easiest route and likely the best overall, as PowerBI is a robust platform with many options for dashboarding.

If PowerBI isn't an option, for reporting and analytics where the volume of data the user is interacting with can be millions of rows, the general advice is to query a data warehouse or other dimensionally modeled OLAP data source that is optimized for aggregating data.
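
As a hypothetical illustration of what such a query looks like (the server, warehouse, table, and column names below are invented, not your schema), an analytical workload aggregates a fact table joined to its dimensions:

```python
# Hypothetical illustration of an aggregate query against a dimensionally
# modeled warehouse. Server, database, table, and column names are invented.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=dw-server;DATABASE=SalesDW;"
    "Trusted_Connection=yes;Encrypt=yes;TrustServerCertificate=yes;"
)

sql = """
SELECT d.CalendarYear, p.Category, SUM(f.SalesAmount) AS TotalSales
FROM fact.Sales AS f
JOIN dim.Date AS d ON d.DateKey = f.DateKey
JOIN dim.Product AS p ON p.ProductKey = f.ProductKey
GROUP BY d.CalendarYear, p.Category
ORDER BY d.CalendarYear, p.Category;
"""

cursor = conn.cursor()
for year, category, total in cursor.execute(sql):
    print(year, category, total)
```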

If a dimensional structure isn't available, then a tabular model that uses something like columnstore or other columnar-optimized storage is preferred. The trend today is to use Parquet-formatted files and a data engine that supports them. SQL Server 2022 can query Parquet files configured as external data sources.
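
As a hypothetical illustration of that Parquet route (the file path and column names are invented), an embedded columnar engine such as DuckDB can aggregate a folder of Parquet extracts directly from Python:

```python
# Hypothetical illustration: aggregating Parquet extracts with DuckDB, one
# example of an engine that reads Parquet. Path and column names are invented.
import duckdb

result = duckdb.query(
    """
    SELECT OrderYear, SUM(SalesAmount) AS TotalSales
    FROM read_parquet('extracts/orders/*.parquet')
    GROUP BY OrderYear
    ORDER BY OrderYear
    """
).df()  # materialize the result as a pandas DataFrame

print(result)
```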

Microsoft Fabric is the latest product for analytical data, and the underlying storage utilizes delta tables based on Parquet files. Google "Microsoft Fabric" to find out more. While it is a cloud product, you can utilize the underlying technology on-premises, but it's a lot of architectural work.
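
To make the on-premises point a bit more concrete: the delta table format Fabric stores data in is open, so it can be read outside Fabric as well. A minimal sketch, assuming the open-source deltalake (delta-rs) Python package and an invented table path:

```python
# Minimal sketch: reading a delta table (the format Fabric's storage builds on)
# outside of Fabric with the open-source delta-rs bindings. The path is invented.
from deltalake import DeltaTable

dt = DeltaTable("/data/lakehouse/sales_orders")  # local or cloud path to a delta table
df = dt.to_pandas()                              # load the current snapshot as a DataFrame
print(df.head())
```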

And lastly, if none of those options are feasible for your situation, then reporting against a read-only replica is preferred (a connection sketch follows the list below). There are numerous options for how you replicate the data from the primary to the secondary replica:

  • log shipping
  • SQL Server replication
  • Change Data Capture with associated tasks to copy to a secondary replica
  • Availability Groups, which require Enterprise Edition to read from secondaries
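
To make the read-only replica option concrete: with Availability Groups, the client requests a readable secondary by adding ApplicationIntent=ReadOnly to the connection string. A minimal sketch, assuming a placeholder listener name, database, and query:

```python
# Sketch of connecting to an Availability Group readable secondary for reporting.
# Listener name, database, and query are placeholders; the key piece is
# ApplicationIntent=ReadOnly, which lets read-only routing send the connection
# to a readable secondary instead of the primary.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=ag-listener.example.com;DATABASE=AppDb;"
    "ApplicationIntent=ReadOnly;"
    "Trusted_Connection=yes;Encrypt=yes;TrustServerCertificate=yes;"
)

cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.Orders;")  # placeholder reporting query
print(cursor.fetchone()[0])
```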

There are no generic answers on best practices. The choice among these options depends on costs, tech budget, development time/delivery date, maintenance effort, and desired feature set, and you haven't provided any of those. I've listed the available options in the order of preference I would use to achieve the best performance.

I definitely would NOT try to build an application or GUI to do analytic dashboards, since there are other tools that do the job very well already. PowerBI certainly can, and many of its analytical features are available in Excel, just with fewer options for dashboarding.
