Playwright has become one of the strongest choices for end-to-end testing because it treats the browser like a real user environment instead of a mocked HTTP client. It can drive Chromium, Firefox, and WebKit; record traces; emulate mobile devices; intercept network calls; test multiple tabs; and run the same suite locally, in CI, or through an AI-assisted MCP workflow.
This guide covers a production-ready Playwright setup for website testing: the latest CLI workflow, local debugging, GitHub Actions, the Playwright MCP Server, and complex login scenarios with identity providers such as Microsoft Entra ID, Okta, Auth0, Keycloak, and similar SSO platforms.
1. Why Playwright for Website Testing?
Playwright is useful when your tests need to verify the actual browser experience:
- Navigation, redirects, cookies, storage, sessions, and tabs
- Cross-browser behavior across Chromium, Firefox, and WebKit
- Component interactions that depend on JavaScript and browser APIs
- Screenshots, videos, traces, and HTML reports for debugging
- Authentication flows that involve redirects to external identity providers
- CI/CD quality gates before deployment
It is especially strong for modern applications built with frameworks such as Astro, Next.js, React, Vue, Svelte, Angular, Spring Boot frontends, Laravel, Django, Rails, and static-site generators.
2. Install Playwright with the Latest CLI
For a new project, use the latest initializer:
```bash
npm init playwright@latest
```
For an existing project:
```bash
npm install -D @playwright/test
npx playwright install
```
On Linux CI runners, install browser system dependencies as well:
```bash
npx playwright install --with-deps
```
The initializer typically creates:
```text
playwright.config.ts
tests/
tests-examples/
```
Common CLI commands:
| Command | Purpose |
|---|---|
| `npx playwright test` | Run the full test suite |
| `npx playwright test --ui` | Open the interactive test runner |
| `npx playwright test --headed` | Run with visible browsers |
| `npx playwright test --debug` | Step through tests with the Playwright Inspector |
| `npx playwright codegen https://example.com` | Generate tests by recording browser actions |
| `npx playwright show-report` | Open the HTML report |
| `npx playwright show-trace trace.zip` | Inspect a trace file |
For day-to-day development, --ui, --debug, and codegen are the fastest way to build reliable tests without guessing selectors.
3. Recommended Project Structure
A maintainable Playwright suite should separate test intent from reusable helpers:
```text
tests/
  smoke/
    homepage.spec.ts
    navigation.spec.ts
  auth/
    entra-login.spec.ts
    okta-login.spec.ts
  checkout/
    cart.spec.ts
  fixtures/
    authenticated-page.ts
  pages/
    LoginPage.ts
    DashboardPage.ts
  storage/
    user.json
playwright.config.ts
```
Use this structure as the suite grows:
- `smoke/` for fast critical-path checks
- `auth/` for identity-provider login and session tests
- `pages/` for page-object abstractions when flows become repetitive
- `fixtures/` for shared authenticated contexts and test setup
- `storage/` for generated storage state files, usually ignored by Git
Keep tests readable. A test should explain the user journey, not the low-level browser mechanics.
4. Core Configuration
A production Playwright configuration usually defines:
- The base URL for local and CI environments
- Browser projects
- Retries in CI
- Trace/video/screenshot behavior
- Web server startup for local app testing
- Reporters for humans and CI systems
An example playwright.config.ts that covers these points:
```typescript
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./tests",
  timeout: 30_000, // per-test timeout
  expect: {
    timeout: 5_000, // how long web-first assertions keep retrying
  },
  // Retry only in CI; the first retry also records a trace (see `trace` below).
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 2 : undefined, // undefined = Playwright's default (half the CPU cores)
  reporter: process.env.CI ? [["github"], ["html"]] : "html",
  use: {
    baseURL: process.env.BASE_URL || "http://localhost:4321",
    trace: "on-first-retry",
    screenshot: "only-on-failure",
    video: "retain-on-failure",
  },
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
    { name: "mobile-chrome", use: { ...devices["Pixel 5"] } },
  ],
  // Starts the dev server before the tests; locally, an already running server is reused.
  webServer: {
    command: "npm run dev",
    url: "http://localhost:4321",
    reuseExistingServer: !process.env.CI,
  },
});
```
For a static site, you may prefer testing the production build:
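```bash
npm run build
# Serve the build on the port that baseURL expects, and leave it running
npx serve dist -l 4321

# In a second terminal
npx playwright test
```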
Testing the built output catches asset, routing, sitemap, search-index, and static-rendering problems that dev servers can hide.
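To let Playwright build and serve the static output itself, the webServer block from section 4 can target the build instead of the dev server. This is a sketch, assuming `npm run build` writes to dist/ and that the serve package is installed as a dev dependency; the 120-second startup timeout is an illustrative value for slower builds.

```typescript
import { defineConfig } from "@playwright/test";

// Sketch: test the production build instead of the dev server.
// Assumes `npm run build` outputs to dist/ and `serve` is a dev dependency.
export default defineConfig({
  use: {
    baseURL: process.env.BASE_URL || "http://localhost:4321",
  },
  webServer: {
    // Build once, then serve dist/ on the same port that baseURL uses.
    command: "npm run build && npx serve dist -l 4321",
    url: "http://localhost:4321",
    reuseExistingServer: !process.env.CI,
    // The default startup timeout is 60 s; builds can take longer.
    timeout: 120_000,
  },
});
```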
5. Writing Reliable Tests
Good Playwright tests use user-facing selectors first:
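```typescript
import { test, expect } from "@playwright/test";

test("docs link opens the documentation", async ({ page }) => {
  await page.goto("/"); // assumes the home page links to the docs
  await page.getByRole("link", { name: "Docs" }).click();
  await expect(page.getByRole("heading", { name: "Documentation" })).toBeVisible();
});
```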
Selector priority:
- `getByRole()` for accessible UI
- `getByLabel()` for form fields
- `getByText()` for stable visible text
- `getByTestId()` for elements without accessible names
- CSS selectors only when necessary
Avoid brittle selectors such as deeply nested CSS paths or generated class names. If a test fails because a button moved in the DOM, the selector was probably too coupled to implementation details.
6. Authentication Strategy: Do Not Test the Identity Provider Every Time
Enterprise login is the hardest part of browser testing. Microsoft Entra ID, Okta, Auth0, Keycloak, Ping Identity, and similar providers introduce redirects, MFA prompts, bot detection, device trust policies, conditional access rules, and rate limits.
The key principle:
Test your application’s authentication integration, but do not make every test depend on a live identity-provider login.
Use three layers:
| Layer | Purpose | Frequency |
|---|---|---|
| Mocked or seeded auth state | Fast app behavior tests | Every PR |
| Real login smoke test | Verify SSO integration still works | Limited PR/nightly |
| Manual or secure environment validation | MFA, device trust, conditional access | Release or scheduled |
Most suites should authenticate once, save the browser storage state, and reuse it across tests.
7. Storage State for Authenticated Sessions
Playwright can save cookies and local storage after login:
```typescript
// After the login steps have completed, for example in a setup test:
await page.context().storageState({ path: "tests/storage/user.json" });
```
Then tests can reuse that state:
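```typescript
// At the top of a spec file: every test in the file starts from the saved session.
test.use({ storageState: "tests/storage/user.json" });
```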
Recommended flow:
- Run a setup project that logs in once.
- Save the storage state.
- Run authenticated tests using that storage state.
- Regenerate state when it expires.
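The recommended flow maps onto Playwright's project dependencies. This is a minimal sketch, assuming a hypothetical tests/auth.setup.ts whose single test logs in and then saves the session to tests/storage/user.json with the `storageState()` call shown earlier; only one browser project is shown.

```typescript
import { defineConfig, devices } from "@playwright/test";

// Sketch: a "setup" project logs in once; browser projects depend on it.
// Assumes a hypothetical tests/auth.setup.ts that ends with
//   await page.context().storageState({ path: "tests/storage/user.json" });
export default defineConfig({
  testDir: "./tests",
  projects: [
    // Matches only *.setup.ts files, so this project runs the login and nothing else.
    { name: "setup", testMatch: /.*\.setup\.ts/ },
    {
      name: "chromium",
      use: {
        ...devices["Desktop Chrome"],
        // Every test in this project starts from the saved session.
        storageState: "tests/storage/user.json",
      },
      // "setup" runs first; if it fails, this project's tests are not run.
      dependencies: ["setup"],
    },
  ],
});
```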
Never commit real session cookies or tokens. Add generated storage files to .gitignore unless they contain only local mock data.
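The "regenerate state when it expires" step can be automated with a small check before a saved file is reused. This sketch assumes the JSON shape that `context.storageState()` writes, where each cookie's `expires` is Unix time in seconds and `-1` marks a session cookie; the function names and the five-minute safety margin are illustrative.

```typescript
import { existsSync, readFileSync } from "node:fs";

// The subset of the storageState() JSON that this check reads.
interface StoredCookie {
  name: string;
  expires: number; // Unix time in seconds; -1 means a session cookie
}

interface StorageState {
  cookies: StoredCookie[];
  origins: unknown[];
}

// True if any persistent cookie expires within `skewMs` of `nowMs`.
// Session cookies carry no expiry, so only a real request can tell
// whether the server still accepts them; they are ignored here.
export function isStorageStateStale(
  state: StorageState,
  nowMs: number = Date.now(),
  skewMs: number = 5 * 60_000,
): boolean {
  return state.cookies.some(
    (c) => c.expires !== -1 && c.expires * 1000 <= nowMs + skewMs,
  );
}

// A missing file also means the setup login has to run again.
export function needsNewLogin(path: string): boolean {
  if (!existsSync(path)) return true;
  const state = JSON.parse(readFileSync(path, "utf8")) as StorageState;
  return isStorageStateStale(state);
}
```

A setup test could call `needsNewLogin("tests/storage/user.json")` and skip the IdP round trip when it returns false, which also helps with the rate limits discussed in section 10.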
8. Microsoft Entra ID Login Scenarios
Microsoft Entra ID login flows commonly include:
- Redirect from the application to login.microsoftonline.com
- Username entry
- Password entry
- MFA or number matching
- “Stay signed in?” prompt
- Redirect back to the application callback URL
- Application session creation
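When the full redirect test reaches the Microsoft sign-in page, it can first check that the application built a correct authorization request, before any credentials are typed. This is a sketch, assuming the app uses the Entra v2.0 authorization endpoint with a tenant-specific path (apps that use `/common` or `/organizations` would pass that segment as the tenant); `checkEntraAuthorizeUrl` is a hypothetical helper, and the test values are placeholders.

```typescript
export interface ExpectedAuthorizeRequest {
  tenant: string; // tenant ID or domain used in the authorize path
  clientId: string;
  redirectUri: string;
}

// Returns a list of problems; an empty list means the request looks correct.
export function checkEntraAuthorizeUrl(
  rawUrl: string,
  expected: ExpectedAuthorizeRequest,
): string[] {
  const problems: string[] = [];
  const url = new URL(rawUrl);
  const params = url.searchParams;

  if (url.hostname !== "login.microsoftonline.com") {
    problems.push(`unexpected host ${url.hostname}`);
  }
  // v2.0 endpoint layout: /{tenant}/oauth2/v2.0/authorize
  if (url.pathname !== `/${expected.tenant}/oauth2/v2.0/authorize`) {
    problems.push(`unexpected path ${url.pathname}`);
  }
  if (params.get("client_id") !== expected.clientId) {
    problems.push("client_id mismatch");
  }
  if (params.get("redirect_uri") !== expected.redirectUri) {
    problems.push("redirect_uri mismatch");
  }
  // OpenID Connect requests must include the openid scope.
  const scopes = (params.get("scope") ?? "").split(" ");
  if (!scopes.includes("openid")) {
    problems.push("scope is missing openid");
  }
  // The state parameter protects the callback against CSRF.
  if (!params.get("state")) {
    problems.push("state parameter is missing");
  }
  return problems;
}
```

In a spec, the URL could come from `page.url()` after `await page.waitForURL(/login\.microsoftonline\.com/)`, followed by `expect(checkEntraAuthorizeUrl(page.url(), expected)).toEqual([])`.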
Practical recommendations:
- Use a dedicated test tenant or test app registration.
- Use a dedicated test user with least privilege.
- Disable MFA only in non-production test tenants when policy allows it.
- Prefer accounts reserved for automation, and short-lived secrets.
- Store credentials in CI secrets, not in source code.
- Use a nightly real-login test if Conditional Access makes PR testing unstable.
For apps using OAuth/OIDC, many teams test most behavior by creating a valid app session through a backend test endpoint or seeded database state, then reserve one Playwright test for the full Entra redirect flow.
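For the seeded-session approach, the backend test hook only needs to return a session value; Playwright can then load it as storage state. This sketch assumes a hypothetical test-only endpoint that mints a session for a test user, and a session cookie named `app_session`; both are placeholders for whatever your application actually uses.

```typescript
// Cookie fields that Playwright's storageState format expects.
interface SeededCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // -1 = session cookie
  httpOnly: boolean;
  secure: boolean;
  sameSite: "Strict" | "Lax" | "None";
}

interface SeededState {
  cookies: SeededCookie[];
  origins: { origin: string; localStorage: { name: string; value: string }[] }[];
}

// Builds storage state from a session value minted by a (hypothetical)
// backend test hook, so app tests never visit the identity provider.
export function seededStorageState(
  baseURL: string,
  sessionValue: string,
  cookieName = "app_session", // hypothetical cookie name
): SeededState {
  const url = new URL(baseURL);
  return {
    cookies: [
      {
        name: cookieName,
        value: sessionValue,
        domain: url.hostname,
        path: "/",
        expires: -1, // lives as long as the browser context
        httpOnly: true,
        secure: url.protocol === "https:", // secure cookies are not sent over plain http
        sameSite: "Lax",
      },
    ],
    origins: [],
  };
}
```

Because the `storageState` option accepts either a file path or an object of this shape, the result can be passed to `test.use({ storageState: ... })` directly or written to tests/storage/ with `JSON.stringify`.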
If MFA is required, do not automate personal MFA devices. Use a controlled test policy, a service-owned test account, or a dedicated pre-authenticated test environment.
9. Okta Login Scenarios
Okta flows are similar but often include organization-specific policies:
- Custom Okta domain redirects
- Identifier-first login
- Password entry
- Okta Verify push, TOTP, WebAuthn, or SMS factors
- App assignment checks
- Redirect back through OIDC or SAML
Recommendations:
- Use a dedicated Okta application for tests.
- Use a dedicated test group and test user.
- Keep app assignment and factor policies deterministic.
- Avoid shared human accounts.
- Prefer API-created users for isolated test environments.
- Run full Okta login smoke tests separately from fast PR checks.
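API-created users keep each environment isolated. This is a sketch against the Okta Users API (`POST /api/v1/users?activate=true` with an `SSWS` API token), assuming an org URL, an API token limited to a test org, and the ID of the dedicated test group; all three are placeholders. If the test group is assigned to the test application, new members receive that assignment too. Cleanup of the user afterwards is not shown.

```typescript
export interface OktaTestUser {
  firstName: string;
  lastName: string;
  email: string;
  password: string;
}

interface JsonRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Pure builder, kept separate from the network call so it is easy to check.
// Never log the returned headers or body: they contain the token and password.
export function buildCreateUserRequest(
  orgUrl: string, // e.g. "https://your-org.okta.com" (placeholder)
  apiToken: string,
  user: OktaTestUser,
  groupId: string,
): { url: string; init: JsonRequest } {
  const url = new URL("/api/v1/users", orgUrl);
  url.searchParams.set("activate", "true"); // the user can sign in immediately
  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: {
        Accept: "application/json",
        "Content-Type": "application/json",
        Authorization: `SSWS ${apiToken}`, // Okta API token scheme
      },
      body: JSON.stringify({
        profile: {
          firstName: user.firstName,
          lastName: user.lastName,
          email: user.email,
          login: user.email,
        },
        credentials: { password: { value: user.password } },
        groupIds: [groupId], // join the dedicated test group at creation time
      }),
    },
  };
}

// Sends the request with the global fetch available in Node 18+.
export async function createOktaTestUser(
  orgUrl: string,
  apiToken: string,
  user: OktaTestUser,
  groupId: string,
): Promise<{ id: string }> {
  const { url, init } = buildCreateUserRequest(orgUrl, apiToken, user, groupId);
  const res = await fetch(url, init);
  if (!res.ok) {
    // Report the status only; the response body may echo profile data.
    throw new Error(`Okta user creation failed with HTTP ${res.status}`);
  }
  return (await res.json()) as { id: string };
}
```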
For SAML applications, validate both sides: the browser redirect flow and the application session that is created after the SAML response is consumed.
10. Handling MFA, Captchas, and Conditional Access
Some authentication steps are intentionally hard to automate. That is a security feature, not a Playwright limitation.
Use this decision model:
| Scenario | Recommended approach |
|---|---|
| MFA disabled in test tenant | Automate full login in setup |
| MFA required by policy | Run limited scheduled tests or use controlled test factors |
| Captcha appears | Do not bypass; use a test environment without captcha |
| Conditional Access varies by IP/device | Use stable CI runners or a dedicated test policy |
| Passwordless/WebAuthn required | Prefer mocked app sessions for PR tests and manual release validation |
| External IdP rate limits login | Reuse storage state and reduce login frequency |
Do not weaken production authentication to make tests easier. Instead, design a test environment with explicit policies for automation.
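Several rows of the table come down to running live IdP logins only in controlled situations. This sketch turns that into a single gate. It assumes credentials arrive as `TEST_USERNAME` and `TEST_PASSWORD` (as in the workflow in section 11), that a `schedule` trigger is added for nightly runs, and that `RUN_IDP_LOGIN` is a hypothetical opt-in flag. `GITHUB_EVENT_NAME` is set by GitHub Actions.

```typescript
export interface GateResult {
  run: boolean;
  reason: string;
}

// Decides whether live identity-provider login tests run in this environment.
export function liveLoginGate(env: Record<string, string | undefined>): GateResult {
  // No secrets (local runs, fork pull requests): never attempt a real login.
  if (!env.TEST_USERNAME || !env.TEST_PASSWORD) {
    return { run: false, reason: "IdP credentials are not available" };
  }
  // Scheduled runs or an explicit opt-in may hit the real IdP.
  if (env.GITHUB_EVENT_NAME === "schedule" || env.RUN_IDP_LOGIN === "1") {
    return { run: true, reason: "scheduled run or explicit opt-in" };
  }
  return { run: false, reason: "live IdP login runs only on schedule or opt-in" };
}
```

In a login spec, `const gate = liveLoginGate(process.env); test.skip(!gate.run, gate.reason);` keeps pull request checks independent of MFA prompts and rate limits.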
11. GitHub Actions for Playwright
A practical GitHub Actions workflow should:
- Install dependencies from a lockfile
- Install Playwright browsers
- Build or start the application
- Run tests
- Upload HTML reports and traces on failure
Example workflow:
```yaml
name: Playwright Tests

on:
  pull_request:
  push:
    branches:
      - main

jobs:
  playwright:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright browsers
        run: npx playwright install --with-deps

      - name: Run Playwright tests
        run: npx playwright test
        env:
          BASE_URL: http://localhost:4321
          TEST_USERNAME: ${{ secrets.TEST_USERNAME }}
          TEST_PASSWORD: ${{ secrets.TEST_PASSWORD }}

      - name: Upload Playwright report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 7
```
If your Playwright config starts the app using webServer, the workflow does not need a separate server step. If you start the app manually, make sure the workflow waits until the URL is ready before running tests.
12. Sharding for Faster CI
Large suites should be sharded across multiple jobs:
```yaml
strategy:
  fail-fast: false
  matrix:
    shardIndex: [1, 2, 3, 4]
    shardTotal: [4]
steps:
  - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
```
Sharding reduces feedback time, but it requires tests to be independent. Avoid shared mutable users, shared carts, shared orders, or shared global state unless each shard receives isolated test data.
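One way to give every shard and worker isolated data is to derive a unique identity from the run, shard, and worker slot. This is a sketch under assumptions: in a worker-scoped fixture the inputs could come from `GITHUB_RUN_ID`, `workerInfo.config.shard?.current`, and `workerInfo.parallelIndex`, and the `e2e+...@example.test` address format is hypothetical.

```typescript
export interface WorkerSlot {
  runId: string; // e.g. GITHUB_RUN_ID, or "local" outside CI
  shardIndex: number; // 1-based, as in --shard=1/4
  parallelIndex: number; // 0-based worker slot within one shard
}

// Builds a key (and matching test email) that no concurrent worker shares.
// parallelIndex is reused when a crashed worker restarts, so the same identity
// comes back; use workerIndex instead if each process needs fresh data.
export function testIdentity(slot: WorkerSlot): { key: string; email: string } {
  const key = `${slot.runId}-s${slot.shardIndex}-w${slot.parallelIndex}`;
  return { key, email: `e2e+${key}@example.test` };
}
```

The key can prefix carts, orders, or users created by that worker, so a shard never mutates records another shard is reading.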
13. Playwright MCP Server
The Playwright MCP Server exposes browser automation capabilities through the Model Context Protocol. It lets compatible AI agents inspect pages, click elements, type text, capture snapshots, and verify flows using a real browser session.
Install and run it with the current package:
```bash
npx @playwright/mcp@latest
```
Typical MCP client configuration:
```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```
Use cases:
- Ask an agent to explore a website and identify broken flows
- Generate Playwright test drafts from real interactions
- Debug selector problems with accessibility snapshots
- Reproduce bugs in a browser instead of describing them manually
- Validate UI changes during development
For CI, keep deterministic Playwright tests as the source of truth. Use MCP to accelerate exploration, debugging, and test authoring.
14. AI-Assisted Test Authoring Workflow
A strong workflow combines MCP exploration with committed Playwright tests:
- Start the local app.
- Connect an MCP-capable agent to Playwright.
- Ask the agent to navigate the user journey.
- Convert the observed flow into a Playwright test.
- Replace fragile selectors with role-based locators.
- Run the test locally with `npx playwright test --debug`.
- Commit the deterministic test.
- Run it in GitHub Actions.
MCP is excellent for discovery. Your repository should still contain plain Playwright tests that humans can review, version, and run without an AI agent.
15. Testing Complex User Journeys
Complex scenarios need deliberate test-data design:
| Scenario | Pattern |
|---|---|
| Login then dashboard | Save storage state in setup |
| Multi-role authorization | Create separate storage states per role |
| Checkout or payment | Use sandbox payment providers |
| File upload | Store test fixtures in the repo |
| Email verification | Use test inbox APIs or backend test hooks |
| Multi-tab flows | Use Playwright context and page events |
| External redirects | Assert both redirect URL and final app state |
| Feature flags | Set flags explicitly per test environment |
The best tests are independent, repeatable, and explicit about the state they require.
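For the multi-role row, one storage state per role can be wired in as one project per role. This sketch returns plain project objects for `defineConfig({ projects })`. The role names, the tests/storage/<role>.json layout, and the `*.<role>.spec.ts` naming convention are hypothetical, and a "setup" project that logs in each role is assumed to exist.

```typescript
type Role = "admin" | "editor" | "viewer";

interface RoleProject {
  name: string;
  dependencies: string[];
  testMatch: RegExp;
  use: { storageState: string };
}

// One project per role, each starting from that role's saved session.
export function roleProjects(roles: readonly Role[]): RoleProject[] {
  return roles.map((role) => ({
    name: `as-${role}`,
    // Wait for the (assumed) setup project that writes every role's state file.
    dependencies: ["setup"],
    // Hypothetical convention: reports.admin.spec.ts runs only as admin.
    testMatch: new RegExp(`\\.${role}\\.spec\\.ts$`),
    use: { storageState: `tests/storage/${role}.json` },
  }));
}
```

To combine with a device profile, spread it into `use` (for example `{ ...devices["Desktop Chrome"], storageState }`) when mapping the result into the config.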
16. Security Best Practices
End-to-end tests often touch sensitive systems. Treat test automation like production code:
- Store secrets only in GitHub Actions secrets or a secure vault.
- Use dedicated test accounts with least privilege.
- Rotate test credentials regularly.
- Never commit storage state containing real tokens.
- Avoid logging passwords, cookies, authorization headers, or ID tokens.
- Separate production and test identity-provider tenants when possible.
- Use short-lived environments for high-risk flows.
- Review traces before uploading them publicly, because traces can contain URLs, form values, screenshots, and network metadata.
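When a test does need to print headers or URLs while debugging an SSO failure, masking secrets first keeps them out of CI logs. This is a sketch; the header list, the query-parameter list, and the JWT pattern are illustrative rather than exhaustive, and tokens carried in URL fragments (implicit flow) are not handled.

```typescript
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "set-cookie", "x-api-key"]);
const SENSITIVE_PARAMS = ["code", "id_token", "access_token", "refresh_token", "password"];
// Three dot-separated base64url segments starting with "eyJ" (an encoded JSON header).
const JWT_PATTERN = /eyJ[\w-]*\.[\w-]+\.[\w-]*/g;

// Replaces anything that looks like a JWT inside free text.
export function redactText(text: string): string {
  return text.replace(JWT_PATTERN, "[REDACTED_JWT]");
}

// Masks sensitive headers entirely and scrubs JWTs from the others.
export function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? "[REDACTED]" : redactText(value);
  }
  return out;
}

// Masks sensitive query parameters, such as an OAuth authorization code.
export function redactUrl(raw: string): string {
  const url = new URL(raw);
  for (const param of SENSITIVE_PARAMS) {
    if (url.searchParams.has(param)) url.searchParams.set(param, "REDACTED");
  }
  return url.toString();
}
```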
For open-source repositories, be extra careful: pull requests from forks do not receive normal repository secrets, and they should not run privileged login flows.
17. Debugging Failures
When a Playwright test fails, inspect artifacts in this order:
- HTML report
- Trace viewer
- Screenshot
- Video
- Console logs
- Network requests
Useful commands:
```bash
npx playwright show-report
npx playwright show-trace test-results/path-to-trace.zip
npx playwright test tests/auth/entra-login.spec.ts --headed --debug
```
The trace viewer is usually the fastest path to the root cause because it shows DOM snapshots, actions, network events, console output, and timing.
18. Common Anti-Patterns
Avoid these mistakes:
- Logging in through the IdP before every test
- Reusing one shared account across all parallel tests
- Depending on test execution order
- Using `waitForTimeout()` instead of web-first assertions
- Committing real cookies or tokens
- Testing production with destructive test data
- Making PR checks depend on MFA prompts
- Ignoring accessibility selectors
- Treating MCP exploration as a replacement for committed tests
If the suite is flaky, slow, or hard to debug, the problem is usually test data, authentication design, or selectors.
19. Recommended Test Pyramid for Playwright
For most teams:
- Many unit and component tests
- A focused set of Playwright smoke tests
- A smaller set of authenticated role-based journeys
- A tiny number of full external IdP login tests
- Scheduled deep regression suites
Playwright is powerful, but it should not carry every kind of test. Use it where browser realism matters.
Final Thoughts
Playwright is more than a browser automation library. With the latest CLI, GitHub Actions integration, trace-first debugging, and the Playwright MCP Server, it becomes a complete workflow for building, exploring, validating, and continuously testing real websites.
The most important design choice is authentication strategy. For providers such as Microsoft Entra ID and Okta, automate what is stable, isolate what is risky, and avoid turning every test into a live SSO challenge. A fast suite reuses trusted state; a secure suite protects credentials and tokens; a reliable suite keeps full IdP login checks focused and intentional.