How much flakiness do you tolerate in end to end tests?

kersplort@programming.dev · 1 year ago

My team has just decided to make passing smoke tests a mandatory part of merging a PR. If the smokes don't pass on your branch, it doesn't merge to main. I'm somewhat conflicted: on one hand, we had frequent breaks in the smokes that developers didn't fix, including ones that represented real production issues. On the other, smokes can fail for no reason and are time-consuming to run.

We use Playwright, running on GitHub Actions. The default free-tier runner has been awful, and we're moving to larger runners on the platform. We have a retry policy on any smokes that need to run in a step-by-step order, and we aggressively prune smokes that frequently fail or don't test for real issues.
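For readers who have not set this up, a retry policy in Playwright can be sketched roughly as below. This is a minimal sketch and not the poster's actual configuration: the retry count, worker count, and the choice to retry only in CI are all assumptions.

```typescript
// playwright.config.ts (illustrative values, not taken from the post)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry failed tests only in CI, so local runs surface flakiness immediately.
  // The count of 2 is an assumption.
  retries: process.env.CI ? 2 : 0,

  // Fewer parallel workers can help on small CI runners; 2 is an assumption.
  workers: process.env.CI ? 2 : undefined,

  use: {
    // Record a trace when a test is retried, so a flaky failure can be inspected.
    trace: 'on-first-retry',
  },
});
```

To scope retries to a group of tests that must run in order, Playwright also accepts `test.describe.configure({ mode: 'serial', retries: 2 })` inside a `test.describe` block. In serial mode the whole group is retried together when one step fails.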

souperk@reddthat.com · 1 year ago

As always, I would say there is a huge "it depends".

For context, I am part of a small team of engineers working on a relatively new product, and we have continuous deployment set up for our release branches. We prefer many small PRs: think at least one PR a day per engineer.

I am responsible for setting up a new e2e test suite right now, so it's possible I will reconsider later on. But there are a couple of lessons learned from our previous iteration.

Our pipeline was slow (20-30 mins), so flakiness was a no-go. Decreasing pipeline time increased our tolerance for flakiness.
Flakiness in the pipeline translated to flakiness on the production instances. When we started caring about those failures, our Sentry got much happier.
We didn't have the time to go back and fix issues, so we stopped having nightlies. If a test is important enough, it should block merging to main and get fixed.
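
The link between pipeline time and flakiness tolerance can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only a rough model: it assumes every test flakes independently at the same rate and that a red pipeline is rerun from scratch. The function names and example numbers are made up for illustration.

```typescript
// Probability that one test passes within its allowed attempts,
// assuming each attempt fails independently with probability flakeRate.
function testPassProbability(flakeRate: number, retries: number): number {
  return 1 - Math.pow(flakeRate, retries + 1);
}

// Probability that the whole suite goes green in a single pipeline run.
function suitePassProbability(
  testCount: number,
  flakeRate: number,
  retries: number,
): number {
  return Math.pow(testPassProbability(flakeRate, retries), testCount);
}

// Expected wall-clock minutes until the first green run, if every red run
// is rerun from scratch (mean of a geometric distribution: 1 / p runs).
function expectedMinutesToGreen(
  pipelineMinutes: number,
  passProbability: number,
): number {
  return pipelineMinutes / passProbability;
}

// Hypothetical suite: 50 tests, each flaking 2% of the time.
const noRetries = suitePassProbability(50, 0.02, 0); // roughly 0.36
const twoRetries = suitePassProbability(50, 0.02, 2); // above 0.999

console.log(expectedMinutesToGreen(25, noRetries).toFixed(0)); // 25-minute pipeline
console.log(expectedMinutesToGreen(5, noRetries).toFixed(0)); // 5-minute pipeline
console.log(twoRetries.toFixed(4));
```

Under these assumptions, the same flake rate costs far more waiting on a 25-minute pipeline than on a 5-minute one, which matches the experience that a faster pipeline makes some flakiness easier to live with.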