Beyond Playwright: Why Agentic AI Is Eating End-to-End Testing in 2026
By AutoSmoke Team
Sometime in late 2025, Playwright quietly overtook Selenium to become the most-used end-to-end testing framework on the planet. The latest QA tooling surveys put it at 45.1% adoption, with Selenium at 22.1% and Cypress at 14.4%. For a project that started as a Microsoft side experiment in 2020, that is a generational shift.
It is also, almost certainly, a peak.
Because while teams were busy migrating from selenium-webdriver to @playwright/test, a different conversation was happening one floor up: agentic AI testing. By April 2026, every major QA-trends report — Tricentis, Applitools, ThinkSys — points at the same thing. The question stopped being "which framework writes my selectors better." It became "why am I writing selectors at all."
The phrase gets used loosely. Stripped to fundamentals, agentic testing is a small perception–action loop that an LLM-driven agent runs against a real browser: observe the page as a user would (screenshot, DOM, accessibility tree), decide the single next action that moves toward the stated goal, execute it, and check what changed.
Then it does it again. And again. Until either the goal is observably met — and the agent quotes the evidence (a confirmation message, a redirect URL, a row in a table) — or it gives up.
That is the whole architecture. There are no selectors to write, no await page.locator('[data-testid="checkout-button"]').click(), no Page Object Models to maintain. The test is a sentence: "Sign up with a fresh email, complete checkout for one item, confirm the order summary shows 'Order confirmed.'" The agent works out the rest at runtime.
The more sophisticated implementations split that loop across multiple specialized agents — a planner that decomposes the goal, a generator that proposes the next step, a runner that executes, and an analyzer that decides whether to retry, advance, or fail with evidence. The split keeps each agent small and predictable, even as the test surface grows.
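That loop is small enough to sketch in full. The sketch below is a hypothetical shape, not any vendor's API: the `Browser`, `Agent`, `Observation`, and `Action` types are stand-ins for whatever your browser driver and LLM actually expose, and the step budget is an arbitrary choice.

```typescript
// A minimal sketch of the perception–action loop described above.
// Browser and Agent are hypothetical interfaces, not a real library.

interface Observation { url: string; visibleText: string; }
interface Action {
  kind: "click" | "fill" | "done" | "giveup";
  target?: string;
  value?: string;
  evidence?: string;
}

interface Browser {
  observe(): Observation;          // perception: what is on screen right now
  perform(action: Action): void;   // action: click / type in the real page
}

interface Agent {
  next(goal: string, obs: Observation): Action;  // one LLM call per step
}

interface Verdict { passed: boolean; evidence: string; steps: number; }

function runLoop(goal: string, agent: Agent, browser: Browser, maxSteps = 20): Verdict {
  for (let step = 1; step <= maxSteps; step++) {
    const obs = browser.observe();           // perceive
    const action = agent.next(goal, obs);    // decide
    if (action.kind === "done") {
      // Goal observably met: the agent must quote its evidence.
      return { passed: true, evidence: action.evidence ?? obs.url, steps: step };
    }
    if (action.kind === "giveup") {
      return { passed: false, evidence: action.evidence ?? "no progress", steps: step };
    }
    browser.perform(action);                 // act, then loop again
  }
  return { passed: false, evidence: "step budget exhausted", steps: maxSteps };
}
```

The multi-agent variants described above split `agent.next` into planner, generator, and analyzer roles, but the outer loop stays this simple.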
Playwright winning the framework war did not solve the actual problem teams face with end-to-end testing. It just made the most familiar one, flaky tests caused by timing, roughly 60% less painful. The other failure modes are still there.
Selectors rot. Every refactor, every design-system upgrade, every A/B test that renames a CSS class breaks tests that depended on it. In a healthy codebase that ships daily, the maintenance tax compounds quietly. Tricentis put the average team's E2E maintenance time at 50–70% of total QA effort — the rest split between writing new tests and triaging real failures. Their customers using AI agents instead reported an 85% reduction in manual effort and a 60% productivity bump, almost entirely from not maintaining selectors.
Multi-framework reality. The same survey found that 74.6% of QA teams now run two or more automation frameworks. Playwright for the modern web app, Selenium for the one legacy admin panel that nobody wants to touch, a separate tool for mobile. Each one is a different DSL, a different CI configuration, a different mental model. None of them know about each other.
The "did the user actually succeed" problem. A green test suite tells you that 487 assertions passed. It does not tell you whether a real user, hitting your real production deployment, can sign up. That gap is exactly where outages live. The CrowdStrike, Snowflake, and Vercel Dubai incidents of the last 18 months all had passing CI pipelines minutes before the production failure surfaced.
The deeper issue is what Applitools called the signal-to-noise problem in their 2026 outlook: as test suites grow, the cost stops being execution time and starts being human attention. A flaky test that fails twice a week trains the team to ignore failures. Once that habit sets in, the suite has become decorative.
Agentic testing does not fix this by adding more tests. It fixes it by removing the layer that produces most of the noise — the brittle selector code itself.
Here is the same smoke check, written first as a Playwright test and then as an agentic step.
Playwright:
test('user can sign up and reach dashboard', async ({ page }) => {
await page.goto('https://app.example.com/signup');
await page.getByLabel('Email').fill(`qa+${Date.now()}@example.com`);
await page.getByLabel('Password').fill('Test1234!');
await page.getByRole('button', { name: 'Create account' }).click();
await expect(page).toHaveURL(/\/dashboard/);
await expect(page.getByText('Welcome')).toBeVisible();
});
Agentic:
- goto: https://app.example.com/signup
- step: Sign up with a fresh email and any valid password.
- verify: The dashboard loads and shows a welcome message.
The Playwright version breaks the day someone renames the Create account button to Sign up. The agentic version does not. It re-reads the page, sees a button labeled "Sign up" that visually does the same job, clicks it, and continues.
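Why does the agentic version survive the rename? Because it matches intent, not markup. In a real agent that judgment comes from the LLM; the toy scoring function below only exists to make the mechanics concrete, and the synonym table in it is invented for illustration.

```typescript
// Toy stand-in for an agent deciding which button matches an intent.
// A real agent delegates this judgment to an LLM; the synonym map
// below is invented purely to make the mechanics visible.

const SYNONYMS: Record<string, string[]> = {
  "create account": ["sign up", "register", "get started"],
  "log in": ["sign in", "login"],
};

function matchesIntent(intent: string, label: string): boolean {
  const a = intent.trim().toLowerCase();
  const b = label.trim().toLowerCase();
  if (a === b) return true;
  return (SYNONYMS[a] ?? []).includes(b) || (SYNONYMS[b] ?? []).includes(a);
}

// Pick the first on-page button whose label matches the intent.
function resolveButton(intent: string, buttonLabels: string[]): string | undefined {
  return buttonLabels.find((label) => matchesIntent(intent, label));
}
```

With this in place, `resolveButton("Create account", ["Log in", "Sign up"])` still resolves to the renamed "Sign up" button, which is exactly the failure the hard-coded `getByRole('button', { name: 'Create account' })` locator cannot survive.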
What you trade for that resilience: determinism. Runs are slower and cost LLM tokens, the agent may take a slightly different path each time, and a pass is a judgment the agent must back with evidence rather than an assertion that is true by construction.
The new bottleneck, as Applitools put it, is trust. Not whether the test ran, but whether you can believe the result.
The pragmatic move in 2026 is not to delete Playwright. It is to recognize that the two approaches are good at different things, and to layer them.
In other words: scripts for what you control, agents for what you ship to users. The teams getting the most out of 2026's tooling are not picking sides — they are letting each tool do the job it is actually built for.
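One way to make that layering concrete is a pipeline policy: deterministic Playwright suites gate merges, agentic smoke checks watch each deploy. The event names and suite labels below are assumptions for the sketch, not a real CI schema.

```typescript
// Hypothetical pipeline policy: which kind of test runs on which event.
// Event names and suite labels are illustrative, not a real CI schema.

type PipelineEvent = "pull_request" | "merge_to_main" | "production_deploy";

function suitesFor(event: PipelineEvent): string[] {
  switch (event) {
    case "pull_request":
      // Fast and deterministic, against a preview build you control.
      return ["playwright:regression"];
    case "merge_to_main":
      return ["playwright:regression", "playwright:full-e2e"];
    case "production_deploy":
      // Non-deterministic but resilient: agents on the live deployment.
      return ["agentic:smoke-critical-journeys"];
  }
}
```

The design choice is that no scripted suite ever runs against production, and no agentic suite ever blocks a merge: each layer fails only for the class of problem it is built to catch.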
The reason this matters is the same reason the framework numbers shifted in the first place. Software is being shipped faster than it can be tested by hand. AI is generating more code than humans can review. Every deploy is a chance for something invisible to break. Whatever you call your testing strategy, it has to keep up.
Playwright winning was not the end of that story. It was the prologue.
At AutoSmoke, we run agentic smoke tests in real Chrome against your production deployments — no scripts, no selectors, evidence-backed pass/fail on the user journeys that matter. Get started free and watch your critical flows after every deploy.