Practice project: VWO Login, used as a practice target based on Pramod Dutta's Playwright Automation Mastery 2026 course. No internal systems were accessed. All bugs are simulated defects for STLC demonstration only.
I ran a full STLC cycle twice. Once manually. Once with AI via MCP.
Here's what I built recently and what it taught me about the future of QA:
I applied all 6 STLC phases to the VWO Login Dashboard — manually first, then using Playwright MCP + JIRA MCP.
The difference was stark. Manual took ~90 minutes. MCP took ~20 minutes — and found 43 elements vs 8 from the PRD.
But here's the thing nobody talks about: you can't use MCP well if you don't understand STLC manually first. AI amplifies your thinking. It doesn't replace it.
Model Context Protocol — the architecture behind AI-assisted testing
You (QA Lead)
   ↓ natural language
Claude Desktop (LLM + MCP Client)
   ↓ tool calls
Playwright MCP (MCP Server)
   ↓ browser control
VWO Login (Live Page)

Claude Desktop (LLM + MCP Client)
   ↓ tool calls
JIRA MCP (MCP Server)
   ↓ creates tickets
KAN-1 (Bug Ticket)
Key insight: MCP is a standardised protocol. The AI does not hardcode API calls — it discovers available tools at runtime and decides which to call based on your natural language instruction. This is what makes it an Agent, not just a chatbot.
The 3 Components of MCP
Host — Claude Desktop, the application running the LLM
MCP Client — built into Claude Desktop, manages server connections
MCP Server — Playwright or JIRA, exposes tools the AI can call
Tools Available Live
browser_navigate — go to any URL
browser_snapshot — extract the full DOM accessibility tree
browser_click, browser_fill — interact with elements
createJiraIssue — log bugs directly to the JIRA board
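Under the hood these tools travel over JSON-RPC 2.0. Below is a simplified sketch of the discovery and call message shapes: the method names follow the MCP specification, but the field values and tool descriptions are invented for illustration.

```typescript
// The client first discovers tools, then calls one by name.
// The AI never hardcodes an endpoint.

// Roughly what a tools/list request and response look like:
const discoverRequest = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/list',
};

const discoverResponse = {
  jsonrpc: '2.0',
  id: 1,
  result: {
    tools: [
      { name: 'browser_navigate', description: 'Go to a URL' },
      { name: 'browser_snapshot', description: 'Capture accessibility tree' },
    ],
  },
};

// The LLM picks a tool from that list, and the client sends:
const callRequest = {
  jsonrpc: '2.0',
  id: 2,
  method: 'tools/call',
  params: {
    name: 'browser_navigate',
    arguments: { url: 'https://app.vwo.com/#/login' },
  },
};

console.log(discoverResponse.result.tools.length, callRequest.method);
```

Because discovery happens at runtime, a new MCP server with new tools needs no client-side code change: the AI simply sees more entries in the `tools/list` result.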
LLM vs AI Agent
What changes when you connect an LLM to tools
LLM only — text in → text out
Answers questions about Playwright
Generates test case templates
Explains STLC concepts
Cannot navigate a real browser
Cannot create a real JIRA ticket
Cannot read live DOM structure
AI Agent (LLM + MCP) acts in the world
Navigates to app.vwo.com/#/login
Extracts 43 real elements from live DOM
Creates KAN-1 bug ticket in JIRA
Generates locators from actual page structure
Runs STLC phases using real tool calls
Decides which tool to call based on intent
The formula: Agent = LLM + Tools + Decision loop.
MCP is the standard that makes connecting tools to LLMs reliable and scalable. Without MCP, each tool integration required custom code. With MCP, any compatible tool connects through the same protocol.
// Without MCP — you write this
const response = await fetch('https://api.atlassian.com/jira/issues', {
  method: 'POST',
  headers: { Authorization: 'Bearer token' },
  body: JSON.stringify({ fields: { summary: '...' } })
});

// With MCP — Claude decides and calls
// You just say: "Create a bug for the password validation issue"
// Claude calls: createJiraIssue({ cloudId, projectKey, summary, ... })
REST API vs MCP
Two ways to connect software — fundamentally different philosophies
REST API code-to-service
Your code calls a specific endpoint
You must know the exact URL and parameters
Response format is fixed — JSON or XML
You write the integration logic
Each service has different auth patterns
Error handling is your responsibility
MCP AI-to-tool
AI discovers available tools at runtime
AI decides which tool to call from intent
Standardised protocol across all tools
AI writes the integration logic dynamically
Single connection pattern for any MCP server
AI handles sequencing of multiple calls
Analogy: REST API is like calling a specific department by dialling their direct number — you need to know the number. MCP is like telling a smart assistant "arrange a meeting" — it figures out which departments to call, in what order, and handles the back-and-forth.
JIRA via REST
# Step 1 — get project ID
GET /rest/api/3/project
# Step 2 — get issue type ID
GET /rest/api/3/issuetype

JIRA via MCP
# You say: "Create a High priority bug in VWO Login STLC project — password accepts abc"
# Claude calls in sequence:
#   getAccessibleAtlassianResources()
#   getVisibleJiraProjects(...)
#   getJiraProjectIssueTypesMetadata(...)
#   createJiraIssue(...)
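The sequencing Claude performs can be sketched with stubs. The four tool names come from the list above; the stub bodies and return values (cloudId, projectKey, issue key) are invented for illustration only.

```typescript
// Record of which stub tools were called, in order.
const calls: string[] = [];

// Stubbed MCP tools — real implementations live in the JIRA MCP server.
const getAccessibleAtlassianResources = async () => {
  calls.push('getAccessibleAtlassianResources');
  return { cloudId: 'stub-cloud-id' };
};
const getVisibleJiraProjects = async (cloudId: string) => {
  calls.push('getVisibleJiraProjects');
  return { projectKey: 'KAN' };
};
const getJiraProjectIssueTypesMetadata = async (projectKey: string) => {
  calls.push('getJiraProjectIssueTypesMetadata');
  return { issueType: 'Bug' };
};
const createJiraIssue = async (fields: object) => {
  calls.push('createJiraIssue');
  return { key: 'KAN-1' };
};

// The agent derives this whole sequence from one natural-language instruction.
async function fileBug(summary: string) {
  const { cloudId } = await getAccessibleAtlassianResources();
  const { projectKey } = await getVisibleJiraProjects(cloudId);
  const { issueType } = await getJiraProjectIssueTypesMetadata(projectKey);
  return createJiraIssue({ cloudId, projectKey, issueType, summary });
}

fileBug('password accepts abc').then((issue) => console.log(issue.key));
```

The point is not the code but who writes it: with REST, you hand-author this chain; with MCP, the model composes it at runtime from the discovered tool list.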
Manual vs MCP — Side by Side
Same STLC. Same project. Measured difference.

| Phase | Manual (Block A) | MCP-Assisted |
|---|---|---|
| Req. Analysis | 30 min · 8 elements | 2 min · 43 elements |
| Test Planning | 20 min | 5 min |
| Test Case Design | 30 min · 5 TCs | 10 min · 8 TCs |
| Bug Reporting | 10 min · manual JIRA | 1 min · JIRA MCP |
| Total | ~90 min | ~20 min · 4.5× faster |
The important caveat: MCP found 43 elements vs 8 in the PRD — including 4 hidden forms the documentation never mentioned. But you cannot validate these findings without understanding what good test cases look like. Manual first. MCP second. Always.
STLC — 6 Phases Applied to VWO Login
Each phase produces a real artifact. Each artifact is traceable.
PHASE 01 · Requirement Analysis → 43 elements via MCP snapshot
PHASE 02 · Test Planning → scope, risks, entry/exit criteria
PHASE 03 · Test Case Design → 8 TCs with exact locators
PHASE 04 · Test Execution → POM + 13 Playwright tests
PHASE 05 · Defect Reporting → KAN-1 via JIRA MCP
PHASE 06 · Test Closure → report + comparison
What makes this different: Block A ran all phases manually using the VWO PRD. The STLC MCP Project ran the same phases using live MCP tools. Both are documented side by side in the GitHub repo — making the comparison concrete and verifiable.
# The complete pipeline
PRD Read (Manual)
  → Live DOM Snapshot (Playwright MCP)
  → Test Plan → 8 Test Cases → POM Spec
  → Bug KAN-1 (JIRA MCP)
  → Closure Report → GitHub ✓
The Portfolio Repository
github.com/somasaic/sdet-stlc-portfolio
Block_A_Manual/ Traditional
01_Requirement_Analysis.md
02_Test_Plan.md
03_Test_Cases.md
04_Bug_Report.md
05_Severity_Priority.md
06_Regression_Retesting.md
docs/Block_B_Automation.md
STLC_MCP_Project/ AI-Assisted
01_Requirement_Analysis/vwo_live_elements.md
02_Test_Plan/test_plan.md
03_Test_Cases/test_cases.md
04_Test_Execution/pages/LoginPage.ts
04_Test_Execution/tests/vwo_login.spec.ts
05_Defect_Reports/BUG_Login_PWD001.md
06_Test_Closure/closure_report.md
somasaic/sdet-stlc-portfolio
STLC applied to real projects — Manual QA + Playwright MCP + JIRA MCP
Same VWO login. Same 6 STLC phases. Completely different execution. Each approach adds a skill the previous couldn't demonstrate.
APPROACH 1
Block_A_Manual
PRD read → 8 elements found
Test cases hand-written
Bug report in Word doc
~90 min total
No CI/CD pipeline
Skill: QA process thinking
APPROACH 2
STLC_MCP_Project
Live DOM → 43 elements
AI writes test cases
KAN-1 via JIRA MCP
~20 min · 4.5× faster
5-browser CI pipeline
Skill: AI agent orchestration
APPROACH 3
Standard CLI
POM — getByRole locators
18/18 — 3 browsers
codegen for selectors
GitHub Actions CI green
HTML report artifact
Skill: pure engineering
APPROACH 4
Playwright CLI
UI + API in one project
request fixture — no browser
testData.ts β typed inputs
20/20 · 14× API speed
KAN-2 via JIRA MCP
Skill: framework depth + API
LATEST
APPROACH 5
AI Agents
Planner → Generator → Healer
AI plans + writes tests
Self-healing on failure
3/3 visual regression
seed.spec.ts bootstrap
Skill: autonomous AI testing
| Dimension | Manual | MCP | Standard CLI | Playwright CLI | AI Agents |
|---|---|---|---|---|---|
| Tool | None | Claude + MCP servers | npx playwright | npx playwright + request | planner + generator + healer |
| Test types | None | UI | UI | UI + API | UI + Visual Regression |
| Speed | ~90 min | ~20 min | ~90s CI run | 3.9s API · 54s UI | 48s visual · auto-generated |
| Bugs logged | Word doc | KAN-1 via JIRA MCP | KAN-1 reference | KAN-2 via JIRA MCP | KAN-3 healer-caught |
| Who writes tests | You (manually) | You (with AI assist) | You (pure code) | You (framework) | AI agents (autonomous) |
| New skill added | Process | AI orchestration | POM + CI/CD | API testing + edge cases | Autonomous gen + visual reg |
The key insight: The STLC phases never change — Requirement Analysis, Test Planning, Test Design, Execution, Bug Reporting, Closure. What changes is the execution mode. Manual tests your judgment. MCP tests your process. Standard CLI tests your engineering. Playwright CLI tests your framework depth. AI Agents test whether you can let the AI work and know when to intervene. An SDET needs to operate fluently in all five.
API Testing — From Zero to SDET Level
What it is, why it matters, and how Playwright handles it natively
UI Test browser required
Playwright opens a real browser (Chromium)
Loads app.vwo.com in that browser
Finds DOM elements, clicks, fills
Asserts on what the user sees
5 to 30 seconds per test
Fragile to CSS/DOM changes
API Test no browser at all
request fixture — direct HTTP to the server
No browser launched, no page loaded
Sends HTTP request, reads JSON response
Asserts on status code + body + schema
200 to 500ms per test — 14× faster
Stable — tests the API contract, not visuals
THREE ASSERTION LEVELS — every API test needs all three: status code, response body, schema
204 No Content = no body. Never call response.json() on DELETE. It throws because the body is empty.
expect(response.status()).toBe(204); // do NOT call response.json() here
STATUS CODE RANGES
2xx — success (200 OK, 201 Created, 204 No Content)
4xx — client error, YOUR fault (400, 401, 403, 404)
5xx — server error, THEIR fault (500, 502, 503)
404 as PASS — negative tests assert 404 intentionally
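These ranges are simple enough to encode in a helper. The classifier below is a hypothetical utility for illustration, not part of any framework.

```typescript
// Map an HTTP status code to the ranges described above.
function classifyStatus(code: number): 'success' | 'client-error' | 'server-error' | 'other' {
  if (code >= 200 && code < 300) return 'success';
  if (code >= 400 && code < 500) return 'client-error'; // your fault
  if (code >= 500 && code < 600) return 'server-error'; // their fault
  return 'other'; // 1xx / 3xx not covered here
}

console.log(classifyStatus(204)); // "success" — but remember: no body to parse
console.log(classifyStatus(404)); // "client-error" — can still be a PASS in a negative test
```

Note the asymmetry: a 404 is a client error at the HTTP level but a passing result in a negative test that deliberately requests a missing resource.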
Why API testing is the market gap: 80% of SDET job descriptions ask for API testing. Most candidates with 1-2 years experience only have UI automation. The request fixture in Week 2 closes this gap entirely β same Playwright framework, same TypeScript, same CI pipeline. One project that proves both.
page fixture vs request fixture
The most important Playwright distinction for SDET interviews
// Week 2 — testData.ts, typed, DRY
import { apiData, uiData } from '../../data/testData';

await request.post(endpoints.login, { data: apiData.validLogin });
// One file change updates every test that uses this credential
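For context, a minimal testData.ts might look like the sketch below. The interfaces and credential values are assumptions for illustration (reqres.in's publicly documented demo credentials), not necessarily the repo's actual file.

```typescript
// Typed test data: one source of truth for every spec that logs in.
interface LoginPayload {
  email: string;
  password: string;
}

export const apiData: {
  validLogin: LoginPayload;
  missingPassword: Partial<LoginPayload>;
} = {
  validLogin: { email: 'eve.holt@reqres.in', password: 'cityslicka' },
  missingPassword: { email: 'eve.holt@reqres.in' }, // negative-test input
};

export const uiData = {
  validUser: { email: 'test@wingify.com', password: 'Test@123' },
};

console.log(Object.keys(apiData));
```

Because the payloads are typed, a renamed field fails at compile time in every consuming test rather than at runtime in one.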
Interview answer: "page fixture opens a real browser and tests the DOM layer — what users see and interact with. request fixture makes direct HTTP calls with no browser — it tests the API contract: status codes, response schemas, and error handling. API tests run 14× faster. I use both in the same project because they test different layers of the same feature."
Three CLI Tools — npx playwright vs MCP vs @playwright/cli
Playwright has three distinct execution modes — each serves a different purpose
TOOL 1 · STANDARD
npx playwright
Ships with @playwright/test
Test runner — runs spec files
codegen — record interactions
show-report, show-trace
--grep --project --debug --ui
CI/CD focused, one-shot per run
Used in: Week 1b + Week 2
TOOL 2 · AI AGENT
Playwright MCP
@playwright/mcp — JSON-RPC over stdio
AI calls browser_snapshot, browser_click
Snapshots injected INTO the context window
~115K tokens per 30 actions
Per-call browser lifetime
Best for: live interactive exploration
Used in: Week 1a (STLC_MCP_Project)
TOOL 3 · AI EFFICIENT · NEXT
@playwright/cli
Microsoft's new AI agent CLI
playwright-cli open · snapshot · click
Snapshots saved to DISK as YAML/PNG
~25K tokens · 4.6× savings vs MCP
Persistent daemon via Unix socket
Best for: complex multi-step AI automation
Planned: Week 3/4 AI_Agentic project
TOKEN USAGE COMPARISON — per 30 actions
Playwright MCP
~115,000 tokens (context window)
@playwright/cli
~25,000 tokens (disk snapshots) · 4.6× saving
npx playwright
0 tokens — traditional test runner, no LLM
Why MCP burns tokens
Every browser_snapshot call injects the full page accessibility tree directly into the LLM context window. After 15+ steps, the context carries 90K+ tokens of stale snapshots from pages the agent already left. The model loses track of what is current.
Why @playwright/cli solves it
Snapshots write to disk as YAML/PNG files. The context window never sees them unless the agent explicitly reads a specific file. The model only loads what it needs right now. Persistent Unix socket sessions mean the browser stays alive between commands β no re-launch overhead.
The progression logic: Standard CLI (Week 1b) → MCP (Week 1a) → @playwright/cli (Week 3/4). Each mode has a clear use case. Real SDET teams use all three depending on context: standard CLI for CI/CD, MCP for interactive exploration, @playwright/cli for AI agent automation at scale.
# @playwright/cli — AI agent commands
playwright-cli open https://app.vwo.com/#/login
playwright-cli snapshot                      # writes YAML to disk, NOT context window
playwright-cli click e15                     # element ref from snapshot
playwright-cli fill e22 "test@wingify.com"
playwright-cli screenshot                    # saves PNG to disk
# Token cost: ~25K vs ~115K for MCP — same task, 4.6× cheaper
LLM — What It Can and Cannot Do
Understanding the boundaries is what separates an SDET from someone who just prompts
What LLMs do well — text in → text out
Understand natural language instructions precisely
Generate code, test cases, docs from a description
Reason about text — compare, summarise, classify
Pattern-match from billions of training examples
Produce structured output (JSON, Markdown, TypeScript)
Chain reasoning steps — think before answering
Hard limitations without tools
No memory — every conversation starts blank. No state between sessions.
No tools — cannot open a browser, read a file, or call an API by itself
No real-time data — knowledge has a cutoff date; cannot fetch live DOM
No execution — can write code but cannot run it and see the output
No persistence — cannot save files, write to disk, modify state
Context limit — finite window; too much input means early content is dropped
THE MEMORY PROBLEM — WHY IT MATTERS IN TESTING
No short-term memory
Within one session the LLM sees everything in the context window. But it cannot "remember" what it clicked 10 steps ago unless that snapshot is still in context.
No long-term memory
Close the session, start again — zero memory. The LLM has no idea it already explored VWO login yesterday. Every run starts from scratch.
Solution: external memory
Agents compensate by writing to disk — specs/, snapshots, test files. The filesystem becomes the LLM's long-term memory. This is exactly what the planner does.
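That pattern is easy to sketch: a filesystem-backed remember/recall pair. The helper names and paths below are invented for illustration; they are not the planner's real API.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// A temp directory stands in for the agent's specs/ folder.
const memoryDir = fs.mkdtempSync(path.join(os.tmpdir(), 'agent-memory-'));

// Append each discovery so a later step (or a later session) can reload it.
function remember(file: string, finding: string): void {
  fs.appendFileSync(path.join(memoryDir, file), finding + '\n');
}

// Read back everything remembered so far.
function recall(file: string): string[] {
  const p = path.join(memoryDir, file);
  return fs.existsSync(p) ? fs.readFileSync(p, 'utf8').trim().split('\n') : [];
}

remember('plan.md', 'Login form has email + password + submit');
remember('plan.md', 'Forgot Password opens a reset flow');

console.log(recall('plan.md').length); // 2 — findings survive across steps
```

Swap the temp directory for a committed `specs/` folder and the "memory" even survives across machines: anyone who clones the repo inherits what the agent learned.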
Why this matters for SDET work: An LLM alone is a text transformer. It can describe a test — it cannot run one, verify a selector exists, or confirm a button is actually clickable. The moment you add tools (MCP, browser control, file I/O), you convert the LLM from a text generator into an agent that acts on the real world. That gap between "generating test ideas" and "generating verified, runnable tests" is exactly what Playwright AI Agents bridge.
AI Agent Architecture — Think, Act, Observe
What makes something an agent rather than just an LLM call
LLM (Brain)
Receives the prompt + tool results. Reasons about what to do next. Decides which tool to call and with what arguments. Produces the plan or code output.
Claude Sonnet / GPT-4
Tools (Hands)
Browser control, file read/write, API calls, terminal commands. Tools are the only way the LLM can affect the outside world. Without tools it can only produce text.
MCP servers, browser_*, file I/O
Memory (State)
Context window (short-term) + file system (long-term). The agent writes its discoveries to disk so later steps can read them. Specs, screenshots, test files are all memory.
// Ask Claude to write a test — one shot
"Write a Playwright test for VWO login"
// → Claude produces text. Done.
// No browser opened, no selector verified,
// no guarantee it actually works.

Agent — tool loop
// Planner agent loop
planner_setup_page()                // runs seed.spec.ts
browser_snapshot()                  // reads live DOM
browser_click("Forgot Password")
browser_snapshot()                  // reads new state
write_file("specs/plan.md", plan)
// Verified against real page. Saved to disk.
The formula: Agent = LLM + Tools + Memory + Loop. Remove any one of the four and you no longer have an agent — you have a text generator. The Playwright AI Agents (planner, generator, healer) implement all four: Claude is the LLM, MCP tools are the hands, specs/ and tests/ are the memory, and the planner → generator → healer sequence is the loop.
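The four-part formula can be shown as a minimal loop. Everything below is a stub: a hard-coded decision table stands in for the LLM, and two fake tools stand in for MCP, so this is a shape illustration, not a working agent.

```typescript
type Observation = string;

// Memory (state): observations accumulate here.
const memory: Observation[] = [];

// Tools (hands): stubbed browser actions returning canned observations.
const tools: Record<string, () => Observation> = {
  browser_snapshot: () => 'DOM: email, password, sign-in button',
  browser_click: () => 'clicked: Forgot Password',
};

// LLM (brain): a stub that decides the next tool from what it has seen.
function decideNextTool(seen: Observation[]): string | null {
  if (seen.length === 0) return 'browser_snapshot';
  if (seen.length === 1) return 'browser_click';
  return null; // done — a real agent would now write the plan
}

// The loop: think → act → observe → remember, until the brain stops.
let next = decideNextTool(memory);
while (next !== null) {
  const observation = tools[next](); // act
  memory.push(observation);          // observe + remember
  next = decideNextTool(memory);     // think
}

console.log(memory.length); // 2 observations recorded
```

Replace the decision table with a model call and the canned tools with MCP tool calls, and the skeleton above is the Think-Act-Observe cycle the planner runs.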
Playwright AI Agents — Planner, Generator, Healer
Microsoft's built-in agent system for autonomous test creation and self-healing
AGENT 1 — PLANNER
Explores → Plans
Calls planner_setup_page — runs seed.spec.ts
browser_snapshot — reads live DOM structure
Navigates all flows — login, errors, edge states
Writes a human-readable Markdown test plan
Input: seed.spec.ts + your prompt
Output: specs/vwo_login_plan.md
seed.spec.ts is NOT a test — it is a browser bootstrap. Before the planner or generator starts exploring, it calls planner_setup_page, which runs seed.spec.ts first. This opens a browser, navigates to the target URL, and then calls page.pause() — handing the live browser session to the agent.
Without page.pause(), the browser closes as soon as the test ends. The agent has nothing to explore. The pause keeps the session alive and transfers control.
  // confirm page is ready
  await expect(emailInput).toBeVisible();

  await page.pause(); // ← agent takes control here
                      // browser stays open
                      // agent starts exploring
});
Why init-agents? Running npx playwright init-agents --loop=claude writes three Markdown files into .claude/agents/. These are agent definition files — they contain the system prompts and tool lists that tell Claude Code how to behave as a planner, generator, or healer. Claude Code reads them automatically when you open the project. You never edit them — regenerate them when Playwright is updated.
Playwright Agents vs Playwright MCP — Why They Are Different
Both use MCP under the hood — but they solve completely different problems
Playwright MCP Week 1a — exploration
Purpose: let an AI agent explore a live app interactively
YOU give a natural language instruction per step
Claude Desktop calls browser_snapshot, browser_click
Snapshot injected into the context window on each call
Output: you read the response and decide the next step
~115K tokens per 30 actions — context fills fast
No structured output — conversational, ad hoc
Used for: Phase 1 requirement extraction, JIRA tickets
Playwright AI Agents Week 3/4 — autonomous
Purpose: autonomously plan, generate, and heal tests
YOU give ONE high-level prompt — the agent decides all steps
Agent definitions in .claude/agents/ guide behaviour
Deterministic — same input → same structured output
Used for: all 6 STLC phases, fully automated
THEY BOTH USE MCP — SO WHAT'S DIFFERENT?
Playwright MCP is a server — it exposes browser control tools (browser_snapshot, browser_click, browser_fill) via the MCP protocol. Any MCP client can use it.
Playwright AI Agents are clients with structured roles. The planner agent calls planner_setup_page, which internally uses the same MCP browser tools — but wraps them in a deliberate loop with a defined output format (a Markdown plan). The generator similarly uses generator_setup_page to produce TypeScript files.
Analogy: MCP is electricity. The agents are appliances. The planner is a camera that uses electricity to take a structured photo. The generator is a printer that uses electricity to produce a document. Both use the same power source — but they do completely different jobs.
MCP alone
Interactive, conversational. You drive every step. Flexible but manual. Good for exploration and one-off tasks.
Agents using MCP
Autonomous, structured. Agent drives all steps. Consistent output format. Good for repeatable workflows like STLC.
Both together
Use MCP for interactive exploration (Week 1a), then agents for systematic generation (Week 3/4). Different phases of the same STLC.
The interview answer: "Playwright MCP is a browser control server — it exposes tools any AI can call. Playwright AI Agents are structured workflows built on top of MCP. The planner agent uses MCP browser tools internally but wraps them in a deliberate loop that produces a Markdown test plan. The generator converts that plan into verified TypeScript tests by checking every selector live. The healer uses the same tools to replay failures and patch broken locators. They are not alternatives — they are layers. MCP is the infrastructure. Agents are the application built on it."
Visual Regression Testing — toHaveScreenshot()
The Week 3/4 key addon — pixel-level UI verification that no previous approach covers
Functional test what it can't catch
Login button text changed from "Sign in" to "Log in"
Error message colour changed from red to orange
Input field border disappeared in a CSS deploy
Password field moved 20px to the right on mobile
VWO logo replaced with placeholder image
All functional tests still PASS despite these issues
Visual regression what it catches
Pixel-level diff — any visual change triggers failure
Baseline PNG stored in the repo — version controlled
Diff image shows exactly what changed in red
Runs in CI on every push — catches regressions before merge
Clips to stable elements — excludes dynamic backgrounds
OS + browser tagged — chromium-win32.png, chromium-linux.png
TWO PHASES — HOW toHaveScreenshot() WORKS
PHASE 1 — BASELINE CREATION (first run)
No PNG exists yet. Playwright takes a screenshot and saves it to tests/visual/login_visual.spec.ts-snapshots/. The test "fails" with the message "snapshot doesn't exist, writing actual". This is correct — run --update-snapshots to promote it to the baseline.
PHASE 2 — COMPARISON (every run after)
The baseline exists. Playwright takes a new screenshot and compares it pixel by pixel against the stored PNG. If the difference exceeds maxDiffPixels: 200, the test FAILS with a diff image showing changed pixels highlighted in red/pink.
THE VWO ANIMATED BACKGROUND PROBLEM — AND HOW WE SOLVED IT
THE PROBLEM
VWO login has a CSS-animated background that changes every render. Full-page screenshots showed 65,000–69,000 pixel diffs between runs taken seconds apart — not because the UI changed, but because the background animation was at a different frame.
THE FIX
Clip the screenshot to the login form's bounding box. Result: only the login form is captured. The animated background is outside the clip rectangle — it never appears. 3/3 tests now pass stably across runs. This is documented engineering decision-making — not just "it works now."
TC-VR-01
Default login state
vwo-login-default-chromium-win32.png
TC-VR-02
Error state after bad login
vwo-login-error-state-chromium-win32.png
TC-VR-03
Email field filled state
vwo-login-email-filled-chromium-win32.png
Why visual regression is the right Week 3/4 addon: Week 2 closed the API testing gap. Week 3/4 closes the visual regression gap. Together: functional UI tests (Weeks 1-3), API contract tests (Week 2), visual regression (Week 3/4). That is a complete test pyramid. No previous approach in this portfolio covers what a pixel-level regression looks like — and 80% of SDET job descriptions for product companies mention it.
Portfolio Progression β 5 Approaches
WEEK 0 Β· APPROACH 1
Block_A_Manual
Manual STLC β all 6 phases on VWO Login PRD. No automation. Pure QA process thinking.
ManualSTLCPRD
Done
WEEK 1A Β· APPROACH 2
STLC_MCP_Project
Playwright MCP + JIRA MCP. 43 DOM elements. 13 tests, 5 browsers. KAN-1 via MCP. 4.5Γ speed.
What this proves: You can build a production-grade Playwright framework from a blank folder — no generator, no plugin, no AI assistance. Every file written with full understanding of why each line exists. The RTM chain is complete: requirement → test case → automated test → HTML report row → CI green.
✓ TC-UI-02 to 04 — EP: valid/invalid credentials
✓ TC-UI-05 to 06 — BVA: empty/partial inputs
✓ TC-UI-07 — SQL injection input (edge)
✓ TC-UI-08 — 500-char boundary string
✓ TC-UI-09 — special chars in password
✓ TC-UI-10 — whitespace-only inputs
10 tests · 53.9s · page fixture · browser
API SUITE — tests/api/ (reqres.in)
✓ TC-API-01 — POST /login valid → 200 + token
✓ TC-API-02 — POST /login missing password → 400
✓ TC-API-03 — POST /login wrong creds → 400
✓ TC-API-04 — POST /register valid → 200 + id
✓ TC-API-05 — POST /register missing pw → 400
✓ TC-API-06/07 — GET users list + single
✓ TC-API-08/09/10 — 404 · PUT · DELETE 204
10 tests · 3.9s · request fixture · no browser
NEW CONCEPTS IN WEEK 2 (vs Week 1)
request fixture · GET POST PUT DELETE · testData.ts interfaces · extraHTTPHeaders · dual project config · dotenv + GitHub Secrets · 3-level assertions · 204 no-content rule · schema validation · KAN-2 via JIRA MCP
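A dual-project config along these lines would route the two suites. The project names, directories, and URLs below are assumptions for illustration, not necessarily the repo's actual playwright.config.ts.

```typescript
// A plausible dual-project playwright.config.ts sketch.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      name: 'ui',
      testDir: './tests/ui',
      use: { baseURL: 'https://app.vwo.com' }, // page fixture: real browser
    },
    {
      name: 'api',
      testDir: './tests/api',
      use: {
        baseURL: 'https://reqres.in',          // request fixture: no browser
        extraHTTPHeaders: { 'Content-Type': 'application/json' },
      },
    },
  ],
});

// Run one layer at a time:
//   npx playwright test --project=api
//   npx playwright test --project=ui
```

Splitting by project keeps the fast API suite runnable on its own in CI while the two suites still share one framework, one reporter, and one set of secrets.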
KAN-2 — logged via JIRA MCP: POST /api/register returns 200 instead of 201. Per RFC 7231, resource creation should return 201 Created. The bug test intentionally FAILS — that is the correct result. It proves the bug exists. The companion test PASSES and documents actual behaviour. Same JIRA MCP approach used for KAN-1 in Week 1.
Replays failing steps. Inspects current DOM. Patches locator or assertion. Re-runs until passing. Self-healing automation.
Input: failing test → Output: patched passing test
seed.spec.ts is not a regular test. Before the planner or generator explores the browser, it runs seed.spec.ts via the planner_setup_page and generator_setup_page tools. The seed navigates to the target and calls page.pause() — keeping the browser alive and handing the session to the agent to explore. Without pause() the browser closes immediately.
VWO has a dynamic animated background — clipping to the form's bounding box gives stable baselines. 3/3 passing. PNG files committed to the repo. CI compares on every push.
# How visual regression works — two phases

# Phase 1 — create baselines (first run)
npx playwright test tests/visual/ --update-snapshots
# → saves vwo-login-default-chromium-win32.png to snapshots/

# Phase 2 — comparison (every run after)
npx playwright test tests/visual/
# → compares pixel-by-pixel against the baseline
# → fails with a diff image if VWO changes their UI
// seed.spec.ts — hands the browser to the agent
test('seed', async ({ page }) => {
  await page.goto('/#/login');
  await page.waitForLoadState('networkidle');
  await page.pause(); // ← agent takes control here
});