Query store fill up and crash

so twice now my sql server 2017 query store filled up and as soon as it flips to RO the sql from my 12 terabyte db shoots craps. Had to clear out query store which causes more pain until things straighten out This is on a 2017 AG , has anyone else run into this before?

I'm not an expert on query store but my best guess is that you're running out of diskspace. You can configure how much space query store is taken and you shouldn't do things manual. You should check your MAX_STORAGE_SIZE_MB and SIZE_BASED_CLEANUP_MODE and MAX_PLANS_PER_QUERY.

For more info and my source:
Best practices with Query Store - SQL Server | Microsoft Docs

also wonder if this is related to being on an AG

It might, if you have a replica that isn't synchronizing. You'd also see transaction log file growth, and/or long send queues on the AG. The AG dashboard should point this out, but you may need to add some columns to the output (right-click column headings in DB list, choose additional columns).

this is becoming a pain the QS does fix the bad plan issue for this monster but QS itself seems to be problematic and if you have to purge it with an AG it is ugly

What is the purpose of collecting all that data in query store? There are limits that can be defined - and capture mode should probably be changed to auto.

What are your current settings? By default - the size is limited to 100MB which shouldn't be an issue unless you changed that to something much larger.

so the autocorrect is essential this 3rd party app has way too much parm sniffing. It is a lot of ad hoc which apparently gets fixed in 2019. I did not implement the solution but it is clear it is needed.

That doesn't answer the questions I have - but to your parm sniffing issue, what is the setting for 'Optimize for Ad hoc Workloads'?

Execute sp_configure 'optimize for ad hoc workloads';

If you have not set query store to Auto for collection - and you changed the default size - then revisit those settings. You also need to review all of the settings and reduce some of them - for example, you probably don't need to keep 200 plans per query, or keep stale queries for 30 days.

Again - if 'Query Store Capture Mode' is not set to Auto then you should change it. Auto will capture queries based on resource consumption - where All captures everything. With further analysis you might even be able to restrict it further using a custom capture mode...

If the default value for size is in effect - I cannot see how purging 100MB of data causes any issues, which makes me believe that setting has been changed to something much larger.

Final thoughts - have you considered enabling 'Legacy Cardinality Estimator' on this database? Or disabling 'Parameter Sniffing' option for this database? Either may have an impact - depending on what the actual issues are and when they started. It sounds like this is a system that was upgraded from an earlier version and the vendor did not update their code appropriately.