
Why Your Playwright Tests Fail in CI (And Never Locally)


You run your tests locally — everything is green. You push to CI — three tests fail. You run CI again — different three tests fail. Sound familiar?

This isn’t bad luck. It’s a set of fixable architectural mistakes. In this guide I’ll walk you through the seven rules that eliminated flakiness in our test suite. No magic, no “just increase the timeout” advice.

All code examples are simplified for clarity — focus on the idea, not the boilerplate.


  1. Use Dependency Projects instead of globalSetup — if the environment is down, stop immediately instead of running 1000 failing tests
  2. Locator priority: getByRole > getByLabel > getByTestId. CSS selectors — last resort only
  3. Never use isVisible() in assertions — it’s a snapshot. Use Web-first assertions that wait
  4. Block analytics and tracking scripts with page.route — they cause networkidle to hang
  5. Trace Viewer is your debugging tool. Screenshots show you what, traces show you why
  6. Always authenticate via API, not UI — 50ms vs 5 seconds, per test

Your local machine is fast. CI is not. Less CPU, higher latency between services, multiple parallel processes all competing for resources. Asynchronous problems exist locally too — a powerful machine and fast network just hide them. When conditions get slightly worse, timings fall apart.

This is why “works on my machine” is such a common story in test automation.


Rule #1: Use Dependency Projects Instead of globalSetup

When your staging environment goes down at night, do you want to run 1000 tests just to get 1000 failures? Of course not. But that’s exactly what happens without a proper dependency chain.

The solution: Dependency Projects

Instead of one big globalSetup file, build a dependency graph in your Playwright config:

playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    // Step 1: Authenticate and save session
    {
      name: 'auth-setup',
      // Matches auth.setup.ts as well as prefixed names like admin.auth.setup.ts
      testMatch: /auth\.setup\.ts$/,
    },
    // Step 2: Check if the environment is actually alive
    {
      name: 'healthcheck',
      testMatch: /health\.setup\.ts$/,
      dependencies: ['auth-setup'],
    },
    // Step 3: Only run real tests if steps 1 and 2 passed
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
      dependencies: ['healthcheck'],
    },
  ],
});

If auth fails or the environment is down, Playwright skips every project that depends on the failed step instead of running it. No wasted CI minutes, no flood of useless alerts.
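To make the healthcheck step more concrete, here is a minimal sketch of the check itself. The payload shape, the `/api/health` endpoint, and the `unhealthyServices` helper are assumptions for illustration; your environment will expose something different.

```typescript
// Hypothetical payload of a health endpoint: service name -> state.
type ServiceState = 'up' | 'down' | 'degraded';
type HealthReport = Record<string, ServiceState>;

// Returns the names of services that are not fully up,
// sorted so that failure messages are stable between runs.
function unhealthyServices(report: HealthReport): string[] {
  return Object.entries(report)
    .filter(([, state]) => state !== 'up')
    .map(([service]) => service)
    .sort();
}

// In health.setup.ts (requires @playwright/test) this could be used as:
//
//   test('environment is healthy', async ({ request }) => {
//     const response = await request.get('/api/health'); // assumed endpoint
//     expect(response.ok()).toBeTruthy();
//     expect(unhealthyServices(await response.json())).toEqual([]);
//   });
```

Returning the list of failing services, rather than a boolean, means the setup failure names what is down.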

Why not globalSetup?

globalSetup gives you dry logs when something fails. Dependency Projects give you full Trace Viewer support — you can see exactly what happened during setup: network requests, screenshots, console errors. And you can run just one project in isolation: npx playwright test --project=auth-setup.


Rule #2: Authenticate via API, Not UI

UI login is slow. A full page load with all assets and rendering takes 2–5 seconds. An API login call takes 50–100ms. At CI scale, this difference adds up fast.

More importantly: you shouldn’t be testing your login form 500 times. Test it once, in a dedicated test. For everything else, just reuse the session.

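auth.setup.ts
import { test, expect } from '@playwright/test';

test('authenticate', async ({ request }) => {
  // Direct API call, no browser rendering needed.
  // The relative URL is resolved against baseURL from your config.
  const response = await request.post('/api/login', {
    data: { username: 'user@example.com', password: 'secret' },
  });
  // Fail the setup project right away if the login was rejected
  expect(response.ok()).toBeTruthy();
  // Save cookies and storage state for all other tests
  await request.storageState({ path: '.auth/user.json' });
});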

Then in your config:

use: {
  storageState: '.auth/user.json',
}

Every test now starts already authenticated. Zero UI login overhead.
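An optional refinement is to skip the login call when the saved state is still usable. The sketch below only inspects the cookie expiry recorded in the storage state file (Unix time in seconds, with -1 for session cookies). The helper name, the 60-second margin, and the cookie name `session` are assumptions; your backend may name and expire its session differently.

```typescript
// Subset of the JSON written by storageState().
interface StoredCookie {
  name: string;
  expires: number; // Unix time in seconds; -1 for session cookies
}
interface StoredState {
  cookies: StoredCookie[];
}

// True only if the named cookie exists and expires later than now + margin.
// Session cookies (-1) carry no expiry, so we cannot tell and log in again.
function hasLiveSession(
  state: StoredState,
  cookieName: string,
  nowMs: number,
  marginSeconds = 60,
): boolean {
  const cookie = state.cookies.find((c) => c.name === cookieName);
  if (!cookie || cookie.expires === -1) return false;
  return cookie.expires - marginSeconds > nowMs / 1000;
}

// Possible use at the top of the setup test (Node's fs module assumed):
//   const state = JSON.parse(fs.readFileSync('.auth/user.json', 'utf8'));
//   if (hasLiveSession(state, 'session', Date.now())) return; // reuse saved state
```

A cookie that has not expired can still be revoked server-side, so treat this as a shortcut for local runs rather than a guarantee.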


Rule #3: Use the Right Locators — and Know Why


A locator isn’t just a way to find an element. It’s a statement about what your test actually cares about. The wrong locator makes tests brittle. The right locator makes failures meaningful.

Why getByRole is the default choice

getByRole finds elements by their semantic role in the accessibility tree — button, heading, link, dialog. This matters because role is tied to behavior, not implementation. A CSS class can be renamed, a DOM structure can be refactored — but if the element is still a button, getByRole still finds it.

One important nuance: getByRole often takes a { name: '...' } parameter to narrow down which element you mean. That name comes from the button’s text or aria-label. If you rely on visible text and the app is multilingual — that name changes per locale, and your locator breaks. The role survives translation. The name doesn’t.

There’s a bonus: if getByRole can’t find your element, it often means the element has no semantic role — which is an accessibility bug. Your test is catching a real problem.

// Finds the button regardless of CSS class or DOM structure
await page.getByRole('button', { name: 'Place order' }).click();

Why getByLabel for form fields

getByLabel finds inputs by their associated label text. The label is a contract between the UI and the user — if it changes, that’s a UX change worth knowing about. This locator also catches cases where a field exists but has no label — another real bug.

await page.getByLabel('Email address').fill('user@example.com');

When getByTestId is the right answer

getByTestId is stable but semantically blind — it finds the element regardless of its role, text, or visual state. That’s a feature in specific situations:

  • Ant Design, Material UI, or other component libraries: these generate DOM structures where a single Select or Combobox contains multiple elements with the same role, such as a hidden native input, a trigger button, and a text field. getByRole('combobox') can then match more than one element, so the action fails with a strict mode violation. Picking one with .first() or .nth() ties the test to DOM order, which can change between library versions
  • Multi-language apps — button text changes per locale; getByTestId doesn’t care
  • A/B tests or personalization — the label varies per user variant
  • Icon buttons without text — SVG icons with no aria-label

// Stable regardless of language or variant
await page.getByTestId('checkout-button').click();

The tradeoff: getByTestId still matches even if the button is visually broken, has the wrong role, or is inaccessible to screen readers. You’re trading semantic coverage for stability. That’s a conscious choice, not a default.

The decision algorithm

  1. Try getByRole first — if the element has a semantic role, this is always better
  2. If text is dynamic (translations, A/B) or the element has no stable role — ask your developer to add an aria-label. Then use getByRole(..., { name: 'aria-label value' })
  3. If that’s not possible, use getByTestId without guilt

// Both of these use getByRole — role is stable
await page.getByRole('button', { name: 'Place order' }).click();
await expect(page.getByRole('heading')).toHaveText('Order confirmed');

// Both of these use getByTestId — text is dynamic
await page.getByTestId('checkout-button').click();
await expect(page.getByTestId('order-status')).toHaveText('Confirmed');
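The decision algorithm can also be written down as a small function, which is handy as a shared reference in code review. This is only a sketch: the `ElementTraits` fields and the returned labels are illustrative names, not part of Playwright.

```typescript
type LocatorChoice = 'getByRole' | 'getByRole + aria-label' | 'getByTestId';

interface ElementTraits {
  hasSemanticRole: boolean; // button, heading, link, dialog, ...
  hasStableName: boolean;   // text does not vary by locale or A/B variant
  canAddAriaLabel: boolean; // a developer can add a stable aria-label
}

function chooseLocator(traits: ElementTraits): LocatorChoice {
  // Step 1: a semantic role with a stable name is the best case.
  if (traits.hasSemanticRole && traits.hasStableName) return 'getByRole';
  // Step 2: otherwise ask for an aria-label and keep using getByRole.
  if (traits.canAddAriaLabel) return 'getByRole + aria-label';
  // Step 3: fall back to a test id.
  return 'getByTestId';
}
```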

Rule #4: Stop Using isVisible() in Assertions


This is one of the most common sources of flakiness. Here’s why:

// This checks visibility at this exact millisecond
const isVisible = await page.getByRole('button').isVisible();
expect(isVisible).toBeTruthy();

If the page is still loading at that millisecond — the test fails. Not because something is broken, but because you asked too early.

Web-first assertions wait for you:

// This polls the DOM until the condition is true (or timeout)
await expect(page.getByRole('button')).toBeVisible();

The difference: expect(locator).toBeVisible() keeps re-checking until the element appears or the assertion timeout is reached (5 seconds by default). It’s a built-in retry loop.
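Conceptually, that retry loop behaves like the sketch below. This is not Playwright's implementation, just a minimal model of "check, wait, check again until a deadline"; the function name and the default values are assumptions.

```typescript
// Conceptual model only: re-run `check` until it passes or time runs out.
async function retryUntil(
  check: () => boolean | Promise<boolean>,
  timeoutMs = 5_000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return; // condition met: stop waiting
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    // Pause before the next attempt instead of spinning.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A one-shot isVisible() call corresponds to running `check` exactly once, which is why it fails whenever the page is a few milliseconds late.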

Quick reference:

| Instead of this | Use this |
| --- | --- |
| await loc.isVisible() | await expect(loc).toBeVisible() |
| await loc.textContent() === '...' | await expect(loc).toHaveText('...') |
| await loc.count() | await expect(loc).toHaveCount(3) |
| await loc.isChecked() | await expect(loc).toBeChecked() |
| await loc.isEnabled() | await expect(loc).toBeEnabled() |

One exception: isVisible() is fine inside conditional logic — for example, to decide whether to close a cookie banner before continuing. Just don’t use it as a final assertion.
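A sketch of that conditional use follows. `PageLike` and `ButtonLike` are minimal stand-ins so the example runs without a browser; a real Playwright Page is expected to fit this shape, but verify that in your project. The button name 'Accept all' is hypothetical.

```typescript
// Minimal structural stand-ins for the parts of Page and Locator used here.
interface ButtonLike {
  isVisible(): Promise<boolean>;
  click(): Promise<void>;
}
interface PageLike {
  getByRole(role: 'button', options: { name: string }): ButtonLike;
}

// Closes the cookie banner if it is on screen right now.
// Returns true when it clicked, false when there was nothing to close.
async function dismissCookieBanner(
  page: PageLike,
  buttonName = 'Accept all',
): Promise<boolean> {
  const accept = page.getByRole('button', { name: buttonName });
  // Snapshot check: acceptable here because we branch on it, not assert on it.
  if (await accept.isVisible()) {
    await accept.click();
    return true;
  }
  return false;
}
```

Because isVisible() does not wait, a banner that renders late will be missed. If that matters, wait for a reliable page-ready signal before calling the helper.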


Rule #5: waitForTimeout is not a solution — here’s what to use instead


If you feel the urge to add waitForTimeout — stop. In 95% of cases there’s a better tool. The question is which one.

Use web-first assertions (toBeVisible, toHaveText, toHaveURL, etc.) when:

  • An element appears or disappears after a click
  • The URL changes after navigation
  • Text updates after data loads
  • A form shows a validation error
  • Anything that is visible in the UI

This covers the vast majority of cases. Web-first assertions have built-in retry — you don’t need anything else.

// Built-in retry — no polling needed
await expect(page.getByText('Order confirmed')).toBeVisible();
await expect(page).toHaveURL('/dashboard');

Use expect.poll when:

  • A background job updated order status in the DB, and the UI only shows a spinner
  • A payment webhook arrived from Stripe or PayPal and updated the payment status
  • A message was processed from a queue (Kafka, RabbitMQ) by another service

The common pattern: you clicked something, the UI shows nothing useful (or just a spinner), but something should have happened behind the scenes. You can only verify it via a direct API call.

// Background job updated order status — not visible in UI
await expect
  .poll(
    async () => {
      const response = await request.get(`/api/orders/${orderId}`);
      const order = await response.json();
      return order.status;
    },
    {
      message: 'Waiting for order status to become PAID',
      timeout: 30_000,
    },
  )
  .toBe('PAID');

Use expect.toPass when:

  • You need to click a button repeatedly until the UI shows the expected result
  • An action needs to be repeated until a condition is met

// Click Refresh until status appears in UI
await expect(async () => {
  await page.getByRole('button', { name: 'Refresh' }).click();
  await expect(page.getByText('Status: Ready')).toBeVisible();
}).toPass({
  intervals: [1_000, 2_000, 5_000],
  timeout: 15_000,
});
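To get a feel for how intervals and timeout interact, the sketch below computes when attempts would start. It rests on assumptions you should check against your Playwright version: the first attempt starts immediately, the last interval is reused once the list runs out, and each attempt takes no time. `probeTimes` is an illustrative helper, not a Playwright API.

```typescript
// Start time (ms) of each attempt under the simplifying assumptions above.
function probeTimes(intervals: number[], timeoutMs: number): number[] {
  if (intervals.length === 0) throw new Error('intervals must not be empty');
  const times: number[] = [];
  let t = 0;
  for (let i = 0; t < timeoutMs; i++) {
    times.push(t);
    // Past the end of the list, keep reusing the last interval.
    t += intervals[Math.min(i, intervals.length - 1)];
  }
  return times;
}

// probeTimes([1_000, 2_000, 5_000], 15_000) yields [0, 1000, 3000, 8000, 13000]
```

In a real run each attempt also spends time clicking and asserting, so expect fewer attempts than this idealized schedule suggests.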

Warning: If you find yourself writing expect.poll more than once or twice per test file — stop and reconsider. Either the UI is missing proper loading indicators, or the architecture needs rethinking. expect.poll is a last resort, not a default tool.


Rule #6: Block Analytics and Tracking Scripts


Your app loads Google Analytics, a support chat widget, maybe a heatmap tool. These services are slow, sometimes unreliable, and completely irrelevant to what you’re testing. They also interfere with networkidle waits.

Block them:

// In your fixture or beforeEach
await page.route(/google-analytics\.com|intercom\.io|hotjar\.com/, async (route) => {
  // Use fulfill instead of abort so the app doesn't hang waiting for a response.
  // The empty body keeps a stubbed <script> request from running stray text as JavaScript.
  await route.fulfill({ status: 200, body: '' });
});

Watch out for fonts: Blocking external fonts can cause layout shifts, which may trigger Playwright’s stability checks and slow things down. Either allow fonts through or make sure your app handles missing fonts gracefully.
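A regex over the full URL also matches when a tracker's domain appears in a query string. If that is a concern, one alternative is a predicate that compares hostnames, since page.route accepts a function that receives a URL. The host list mirrors the regex above; the names `BLOCKED_HOSTS` and `isTracker` are illustrative.

```typescript
// Hosts to stub out. Subdomains are covered by the suffix check below.
const BLOCKED_HOSTS = ['google-analytics.com', 'intercom.io', 'hotjar.com'];

// True when the request goes to a blocked domain or one of its subdomains.
function isTracker(url: URL): boolean {
  return BLOCKED_HOSTS.some(
    (host) => url.hostname === host || url.hostname.endsWith(`.${host}`),
  );
}

// Usage in a fixture or beforeEach (requires @playwright/test):
//   await page.route(isTracker, (route) => route.fulfill({ status: 200, body: '' }));
```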


Rule #7: Use Trace Viewer, Not Screenshots


When a test fails in CI, a screenshot shows you what the page looked like. Trace Viewer shows you why it failed.

A screenshot: a frozen image of a page that looks fine.

Trace Viewer: every action, every network request, every console error, the DOM state before and after each step — all in a timeline you can scrub through.

Enable it in your config:

playwright.config.ts
use: {
  // Only save traces when tests fail — keeps your artifacts small
  trace: 'retain-on-failure',
  screenshot: 'only-on-failure',
}

What to look for in Trace Viewer:

  • Log tab: If a click didn’t work, the action log lists each actionability step Playwright waited on and names the element that was blocking the click (a loading skeleton, an overlay, a tooltip)
  • Network tab: See which API calls were slow or failed
  • Console tab: See JavaScript errors that don’t show up in your test output
  • Snapshots: The actual DOM state at each step — you can open DevTools on a past moment in time

When a test fails because a button was “covered by another element” — Trace Viewer shows you the exact element, with a red dot on the snapshot. No guessing required.


Hydration: Why Clicks Sometimes Do Nothing


If you work with React, Next.js, Vue, or Nuxt — you’ve probably seen this: Playwright clicks a button, no error is thrown, but nothing happens.

This is hydration. The server sends HTML that looks like a working page, but the JavaScript hasn’t loaded yet. The button exists in the DOM but has no event listeners. Playwright clicks it, the click lands, and nothing responds.

The fix: Wait for a signal that the app is ready before interacting:

// Wait for a loading indicator to disappear
await expect(page.locator('#global-loader')).toBeHidden();
// Or wait for a class that your app adds when hydration is complete
await expect(page.locator('.app-ready')).toBeAttached();
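The app side of that signal can be very small. The sketch below is one possible way to add the class; the `markAppReady` helper and the React hook placement in the comment are assumptions, and your framework may offer a more precise hydration hook.

```typescript
// Anything with a classList works; in a browser, pass document.documentElement.
interface ClassTarget {
  classList: { add(...tokens: string[]): void };
}

// Marks the app as interactive so tests have something stable to wait for.
function markAppReady(target: ClassTarget, className = 'app-ready'): void {
  target.classList.add(className);
}

// React example (assumption: called once from the root component).
// Effects only run on the client, after the component has hydrated:
//   useEffect(() => markAppReady(document.documentElement), []);
```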

About force: true:

You might be tempted to use force: true to bypass Playwright’s checks. Before you do, understand what you’re skipping. Playwright’s actionability checks verify that an element is:

  • Visible: has a non-empty bounding box and is not hidden with visibility: hidden
  • Stable: not moving (animations, transitions)
  • Enabled: not disabled
  • Receiving events: not covered by another element like a modal or overlay

When you add force: true to a click, these checks are skipped. You’re no longer testing what a real user experiences. Playwright clicks at the element’s position even if a real user could not reach it. The test passes, the user is still stuck.

One case that looks like an exception but is not: hidden file inputs (<input type="file">). Browsers render this element as a native, hard-to-style button, so developers often hide it and draw a custom button on top, consistent with the rest of the design. You do not need force: true here. setInputFiles does not require the input to be visible, and it has no force option to begin with.

// No force needed: setInputFiles works on a visually hidden file input
await page.locator('input[type="file"]').setInputFiles('file.pdf');

In every other case, find the root cause. If an element is covered, wait for the overlay to disappear. If it’s disabled, wait for the enabled state. force: true without a comment is a red flag in code review.


Don’t explain these rules in every code review. Automate it:

.eslintrc.js
module.exports = {
  extends: ['plugin:playwright/recommended'],
  rules: {
    'playwright/no-wait-for-timeout': 'error', // No sleeps
    'playwright/no-focused-test': 'error', // No test.only in commits
    'playwright/no-page-pause': 'error', // No page.pause() in commits
    'playwright/prefer-web-first-assertions': 'warn', // Nudge toward better assertions
    'playwright/no-force-option': 'warn', // Flag force: true usage
  },
};

error for things that definitely break your tests or CI. warn for architectural debt that’s worth addressing but not blocking.

One more thing: rules exist to be broken consciously. If you’re working with a component library that generates dynamic selectors you can’t control, // eslint-disable-next-line is sometimes the honest answer. The key word is consciously — disable the rule, write a comment explaining why, and move on. What you want to avoid is blanket disables that hide real problems.


Migration Cheat Sheet: Old Playwright vs Current


If you’re coming from Selenium or older Playwright patterns, here’s the direct translation:

| What you used to do | What to do now | Why |
| --- | --- | --- |
| page.$(), page.$$() | getByRole(), getByLabel(), getByTestId() | Lazy evaluation + automatic retry on assertions |
| waitForSelector() | Not needed, built into actions | Playwright waits for actionability before every click/fill |
| waitForTimeout(3000) | expect(loc).toBeVisible() | Polls until ready instead of guessing |
| waitForNavigation() | await expect(page).toHaveURL('/dashboard') | toHaveURL has built-in polling, no race condition |
| isVisible() in assertions | expect(loc).toBeVisible() | One is a snapshot, the other waits |
| console.log('HERE') | Trace Viewer | Full timeline with network, DOM, and console, available in CI |

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Click lands, nothing happens | Hydration | Wait for app-ready signal |
| Timeout in CI, passes locally | Slow network / analytics | Block third-party scripts |
| Selector not found after deploy | Fragile CSS / text changed | Use data-testid or getByRole |
| Random failures, no pattern | Race condition in assertions | Switch to Web-first assertions |
| All tests fail at once | Environment down | Add healthcheck dependency |

These seven rules cover the most common sources of flakiness. Once you have them in place, the next level is async handling at scale: expect.poll, idempotency keys, contract testing, and data hygiene.

Want to go deeper into the architecture? Check out the advanced version of this guide: Playwright CI: What Senior Engineers Do Differently


All patterns in this article are implemented in the Playwright BDR Template on GitHub — clone it and see how everything fits together.