End to end and smoke tests give a really valuable angle on what the app is doing and can warn you about failures before they happen. However, because they're working with a live app and a live database over a live network, they can introduce a lot of flakiness. Beyond just changes to the app, different data in the environment or other issues can cause a smoke test failure.
How do you handle the inherent flakiness of testing against a live app?
When do you run smokes? On every phoenix branch? Pre-prod? Prod only?
Who fixes the issues that the smokes find?
My team has just decided to make working smokes a mandatory part of merging a PR. If the smokes don't work on your branch, it doesn't merge to main. I'm somewhat conflicted - on one hand, we had frequent breaks in the smokes that developers didn't fix, including ones that represented real production issues. On the other, smokes can fail for no reason and are time consuming to run.
We use playwright, running on github actions. The default free tier runner has been awful, and we're moving to larger runners on the platform. We have a retry policy on any smokes that need to run in a step by step order, and we aggressively prune and remove smokes that frequently fail or don't test for real issues.
As always I would say there is a huge "it depends".
For context, I am part of a small team of engineers, working on a relatively new product, we have continuous deployment setup for our release branches. We prefer many small PRs, think at least a PR a day per engineer.
I am responsible for setting up a new e2e test suite right now, so it's possible I reconsider later on. But, there are a couple lessons learned from our previous iteration.
- Our pipeline was slow (20-30 mins), flakiness was a no go. Decreasing pipeline time increased tolerance for flakiness.
- Flakiness on the pipeline translated to flakiness on the production instances. When we started caring for those our sentry got much more happy.
- We didn't have the time to go back and fix issues, so we stopped having nightlies. If it's important enough we should block merging on main and fix it.