Hi. I need help with a problem in work involving event streaming, AWS S3 and ingesting transactional Json files into an SQL Server database.
Our work currently use on premise SQL server database which is sourced using a daily batch feed. However shortly we will be moving to a new source system that supplies the data in transactional Json files via an event stream on AWS S3. Despite the cloud storage our database will remain as on-premise for the time being. (Even if cloud database is a better solution, this is not an option as we are under tight timescales to transform the source)
Note I’m completely new to AWS S3 and our company is currently setting up an account for the source company to replicate the event stream to our S3 account so please assume I know little in your response. I have also only basic SQL server knowledge but learning fast
So, in summary my main questions are:
-
What is the best way to ingest 100’s of thousands of transactional Json files from an AWS S3 bucket into SQL server? (Preferably a solution that doesn’t involve purchasing extra software if possible)
-
How can I read many transactional files (each representing an account key record) with data that varies in each file (since the transactional file only contains the fields that have been updated by an event) into SQL Server. Each file can contain anything from 100 to 300 fields. I can obviously create code to read an isolated file using many OPENROWSET and CROSS APPLY clauses but, how would I possibly code around fields not existing in the json? e.g. if a field called colA has not been updated as part of the event, the json file will not represent this field as “colA": [ ] but instead will not exist in the file altogether. If there is not an way round to code for this in SQL server or in SSIS, do you know of any software that will solve this problem? (Amazon Athena has been mentioned but as far as I can see, its just a querying tool)
-
How can I go about automating the ingestion? I have heard I can create something in SSIS to automate this?
Many thanks in advance. And apologize for the lengthy multi facet question