Need to build digital archive and store data for 10+ years in Azure.
Data starts out in on-prem SQL Server, moved over to Azure SQL DB, data will be static but needs to be read via Azure Search (or similar) tool pulling back small data sets and physical documents.
Over 10 years Azure SQL DB will be expensive as system hardly used. What is the best option to take a copy of the relational data and still make it searchable. Obviously no DML, ACID or anything relational required anymore.
DocumentDB, MongoDB, Polybase into raw data on Azure Blob cold storage.
Data will be static and only reference by an eDiscovery tool such as Azure Search.
Use case is a 1TB DB with 25TB documents CRM is decommissioned and converted into digital archive for 10+ years. My debate is whether to stick with SQL Server to host the data or convert to noSQL option to drive down costs as ACID etc not required anymore
I don't know how well the other solutions would scale to your 1TB database requirement so I can't speak to that. I have to admit that I find it a little difficult to believe that a 1TB base database has no ACID requirements but whatever.
I have a similar database but it's for phone call recordings. The main database is only several 10's of GB of data and the remainder of the 1TB database is call recordings. I elected to store them in a large but partitioned table because no one protects data like a DBA. When I first took a look at what they were doing, they had the data stored in files. Over time, 10% had gone missing and another 10% had gone corrupt. Both of those observations are what made me decide to store the data in the database itself.
If that doesn't quite float the boat for you, you can easily develop a bit more of a customized solution where the data is stored in zip files on a disk, use SQL Server to keep track of the file paths and name in a table, and use xp_CmdShell or an EXEC Task to locate the file, unzip it, and either return the contents of the unzipped file or leave it for some app to pick up on. Doing such a thing might bring your 25TB requirement down to only 5TB thanks to the typical 80% compression that a zipped file frequently enjoys.
The good part about all of that is 1) you wouldn't have to learn a new DB or "NoSQL" solution, 2) the 5TB requirement might be small enough to just buy some extra-disk for the on-premise hardware, which would eliminate the expense of cloud storage and data retrieval from the cloud altogether (although you'd have to provide your own DR solution but I assume you have one already), and 3) if the requirements ever do change in the next 10 years to have a need for ACID, then you won't have to change back to an RDBMS because you'll already be there.
Interesting, sorry for hijacking the thread. We have document management system with the files stored on disk (renamed as sequential numbers) and the Properties / Metadata in SQL. I'm not aware we've ever had a loss or corruption (and the documents are heavily used, so likely it would have come to light), but I think at the least I should implement a CRC on the files, stored that in the DB too, and check it regularly.
Dunno whether the files being managed solely by the DMS, rather than by users (e.g. in a shared folder) makes a difference (to "disappearance" if not to "corrupted")
I didn't reply to the original question, but the thought that went through my mind was:
What's the cost of building, and maintaining, the new Off-site or NOSQL-type solution, compared to just buying some more disks and hosting the data in-house, online, and with the same software and procedures as at present? (presumably with inactive data moved out of the main, active, current-data DB)
Thank you for everyone's response, thought the scenario would be interesting for the community to tackle.
Current on-prem SQL DB owned by company that will not exist in 12 months, however regulators force the DB exists for another 10-20 years and accessible for lookup's and document retrieval. Current infrastructure, DBAs, stakeholders are being removed then final solution handed over to sister organisation to host.
This is why migrate to Azure, convert Relational DB to more simplified option, then create search function allowing end users to query the data store and recovery documents. Current architecture would be:
Migrate on-prem SQL to Azure SQL DB
Create JSON document for every unique parent document (millions)
Import JSON into CosmosDB
Store physical files on Azure Blob cold storage with reference from JSON doc
Build Azure Search over CosmosDB
Build simple user UI to search records and pull back documents (contracts, .PDFs, .docs, .. etc) over Azure Search
Decommission Azure SQL DB
Still trying to justify the effort to migrate from Relational to noSQL (CosmosDB). Cost will be one factor, and the fact the end client was a simple solution non technical users can search upon is another and no DBAs required.