windows-desktop-e2e

E2E testing for Windows native desktop apps (WPF, WinForms, Win32/MFC, Qt) using pywinauto and Windows UI Automation.

INSTALLATION
npx skills add https://github.com/affaan-m/everything-claude-code --skill windows-desktop-e2e
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

$27

Core Concepts

All Windows desktop automation relies on UI Automation (UIA), a Windows-built-in accessibility API. Every supported framework exposes a tree of UIA elements with properties Claude can read and act on:

Your test (Python)

    └── pywinauto (UIA backend)

        └── Windows UI Automation API   ← built into Windows, framework-agnostic

            └── App's UIA provider      ← each framework ships its own

                └── Running .exe

UIA quality by framework:

Framework

AutomationId

Reliability

Notes

WPF

★★★★★

Excellent

x:Name maps directly to AutomationId

WinForms

★★★★☆

Good

AccessibleName = AutomationId

UWP / WinUI 3

★★★★★

Excellent

Full Microsoft support

Qt 6.x

★★★★★

Excellent

Accessibility enabled by default; class names change to Qt6*

Qt 5.15+

★★★★☆

Good

Improved Accessibility module

Qt 5.7–5.14

★★★☆☆

Fair

Needs QT_ACCESSIBILITY=1; objectName manual

Win32 / MFC

★★★☆☆

Fair

Control IDs accessible; text matching common

Setup & Prerequisites

# Python 3.8+, Windows only

pip install pywinauto pytest pytest-html Pillow pytest-timeout

# Optional: screen recording

# Install ffmpeg and add to PATH: https://ffmpeg.org/download.html

Verify UIA is reachable:

from pywinauto import Desktop

Desktop(backend="uia").windows()  # lists all top-level windows

Install Accessibility Insights for Windows (free, from Microsoft) — your DevTools equivalent for inspecting the UIA element tree before writing any test.

Testability Setup (by Framework)

The single most impactful thing you can do is give every interactive control a stable AutomationId before writing tests.

WPF

<!-- XAML: x:Name becomes AutomationId automatically -->

<TextBox x:Name="usernameInput" />

<PasswordBox x:Name="passwordInput" />

<Button x:Name="btnLogin" Content="Login" />

<TextBlock x:Name="lblError" />

WinForms

// Set in designer or code

usernameInput.AccessibleName = "usernameInput";

passwordInput.AccessibleName = "passwordInput";

btnLogin.AccessibleName = "btnLogin";

lblError.AccessibleName = "lblError";

Win32 / MFC

// Control resource IDs in .rc file are exposed as AutomationId strings

// IDC_EDIT_USERNAME -> AutomationId "1001"

// Prefer SetWindowText for Name; add IAccessible for richer support

Qt — see dedicated section below

Page Object Model

tests/

├── conftest.py          # app launch fixture, failure screenshot

├── pytest.ini

├── config.py

├── pages/

│   ├── __init__.py      # required for imports

│   ├── base_page.py     # locators, wait, screenshot helpers

│   ├── login_page.py

│   └── main_page.py

├── tests/

│   ├── __init__.py

│   ├── test_login.py

│   └── test_main_flow.py

└── artifacts/           # screenshots, videos, logs

base_page.py

import os, time

from pywinauto import Desktop

from config import ACTION_TIMEOUT, ARTIFACT_DIR

class BasePage:

    def __init__(self, window):

        self.window = window

    # --- Locators (priority order) ---

    def by_id(self, auto_id, **kw):

        """AutomationId — most stable. Use as first choice."""

        return self.window.child_window(auto_id=auto_id, **kw)

    def by_name(self, name, **kw):

        """Visible text / accessible name."""

        return self.window.child_window(title=name, **kw)

    def by_class(self, cls, index=0, **kw):

        """Control class + index — fragile, avoid if possible."""

        return self.window.child_window(class_name=cls, found_index=index, **kw)

    # --- Waits ---

    def wait_visible(self, spec, timeout=ACTION_TIMEOUT):

        spec.wait("visible", timeout=timeout)

        return spec

    def wait_gone(self, spec, timeout=ACTION_TIMEOUT):

        spec.wait_not("visible", timeout=timeout)

        return spec

    def wait_window(self, title, timeout=ACTION_TIMEOUT):

        """Wait for a new top-level window (dialogs, child windows)."""

        dlg = Desktop(backend="uia").window(title=title)

        dlg.wait("visible", timeout=timeout)

        return dlg

    def wait_until(self, fn, timeout=ACTION_TIMEOUT, interval=0.3):

        """Poll an arbitrary condition — use when UIA events are unreliable."""

        deadline = time.time() + timeout

        while time.time() < deadline:

            try:

                if fn():

                    return True

            except Exception:

                pass

            time.sleep(interval)

        raise TimeoutError(f"Condition not met within {timeout}s")

    # --- Actions ---

    def click(self, spec):

        self.wait_visible(spec)

        spec.click_input()

    def type_text(self, spec, text):

        self.wait_visible(spec)

        ctrl = spec.wrapper_object()

        try:

            ctrl.set_edit_text(text)

        except Exception as e:

            # Qt 5.x fallback: UIA Value Pattern may be incomplete

            import sys, pywinauto.keyboard as kb

            print(f"[windows-desktop-e2e] set_edit_text failed ({e}), using keyboard fallback", file=sys.stderr)

            ctrl.click_input()

            kb.send_keys("^a")

            kb.send_keys(text, with_spaces=True)

    def get_text(self, spec):

        ctrl = spec.wrapper_object()

        for attr in ("window_text", "get_value"):

            try:

                v = getattr(ctrl, attr)()

                if v:

                    return v

            except Exception:

                pass

        return ""

    # --- Artifacts ---

    def screenshot(self, name):

        os.makedirs(ARTIFACT_DIR, exist_ok=True)

        path = os.path.join(ARTIFACT_DIR, f"{name}.png")

        self.window.capture_as_image().save(path)

        return path

login_page.py

from pages.base_page import BasePage

class LoginPage(BasePage):

    @property

    def username(self): return self.by_id("usernameInput")

    @property

    def password(self): return self.by_id("passwordInput")

    @property

    def btn_login(self): return self.by_id("btnLogin")

    @property

    def error_label(self): return self.by_id("lblError")

    def login(self, user, pwd):

        self.type_text(self.username, user)

        self.type_text(self.password, pwd)

        self.click(self.btn_login)

    def login_ok(self, user, pwd, main_title="Main Window"):

        self.login(user, pwd)

        return self.wait_window(main_title)

    def login_fail(self, user, pwd):

        self.login(user, pwd)

        self.wait_visible(self.error_label)

        return self.get_text(self.error_label)

conftest.py

For new projects prefer the Tier 1 sandbox fixture (see below) — it adds filesystem isolation at zero extra cost. This basic fixture is for minimal/legacy setups only.

import os, pytest

os.environ["QT_ACCESSIBILITY"] = "1"  # Required for Qt 5.x UIA support

from pywinauto import Application

from config import APP_PATH, MAIN_WINDOW_TITLE, LAUNCH_TIMEOUT, ARTIFACT_DIR

@pytest.fixture

def app(request):

    if not APP_PATH:

        pytest.exit("APP_PATH environment variable is not set", returncode=1)

    proc = Application(backend="uia").start(APP_PATH, timeout=LAUNCH_TIMEOUT)

    win  = proc.window(title=MAIN_WINDOW_TITLE)

    win.wait("visible", timeout=LAUNCH_TIMEOUT)

    yield win

    # Screenshot on failure

    if getattr(getattr(request.node, "rep_call", None), "failed", False):

        os.makedirs(ARTIFACT_DIR, exist_ok=True)

        try:

            win.capture_as_image().save(

                os.path.join(ARTIFACT_DIR, f"FAIL_{request.node.name}.png")

            )

        except Exception:

            pass

    # Graceful exit first, force-kill as fallback

    # proc is a pywinauto Application — use wait_for_process_exit(), not wait_for_process()

    try:

        win.close()

        proc.wait_for_process_exit(timeout=5)

    except Exception:

        proc.kill()

@pytest.hookimpl(tryfirst=True, hookwrapper=True)

def pytest_runtest_makereport(item, call):

    outcome = yield

    setattr(item, f"rep_{outcome.get_result().when}", outcome.get_result())

config.py

import os

APP_PATH          = os.environ.get("APP_PATH", "")           # set via env — no default path

MAIN_WINDOW_TITLE = os.environ.get("APP_TITLE", "")

LAUNCH_TIMEOUT    = int(os.environ.get("LAUNCH_TIMEOUT", "15"))

ACTION_TIMEOUT    = int(os.environ.get("ACTION_TIMEOUT", "10"))

ARTIFACT_DIR      = os.path.join(os.path.dirname(__file__), "artifacts")

pytest.ini

[pytest]

testpaths = tests

markers =

    smoke: fast smoke tests for critical paths

    flaky: known-unstable tests

addopts = -v --tb=short --html=artifacts/report.html --self-contained-html

Locator Strategy

AutomationId  >  Name (text)  >  ClassName + index  >  XPath

  (stable)         (readable)       (fragile)           (last resort)

Inspect with Accessibility Insights → Properties pane → look for AutomationId first.

# Inspect at runtime — paste into a REPL to explore the tree

win.print_control_identifiers()

# or narrow scope:

win.child_window(auto_id="groupBox1").print_control_identifiers()

Wait Patterns

# Wait for control to appear

page.wait_visible(page.by_id("statusLabel"))

# Wait for control to disappear (e.g. loading spinner)

page.wait_gone(page.by_id("spinnerOverlay"))

# Wait for a dialog to pop up

dlg = page.wait_window("Confirm Delete")

# Custom condition (e.g. text changes)

page.wait_until(lambda: page.get_text(page.by_id("lblStatus")) == "Ready")

**Never use time.sleep() as primary synchronization** — use wait() or wait_until().

Artifact Management

# Screenshot on demand

page.screenshot("after_login")

# Full-screen capture (when window is off-screen or minimised)

import pyautogui

pyautogui.screenshot("artifacts/fullscreen.png")

# Screen recording with ffmpeg (start before test, stop after)

import subprocess

def start_recording(name):

    return subprocess.Popen([

        "ffmpeg", "-f", "gdigrab", "-framerate", "10",

        "-i", "desktop", "-y", f"artifacts/videos/{name}.mp4"

    ], stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

def stop_recording(proc):

    proc.stdin.write(b"q"); proc.stdin.flush(); proc.wait(timeout=10)

Per-Step Trace (opt-in)

The default failure screenshot is often too thin for diagnosing flaky tests. The step-level trace below is off by default — enable it only when reproducing a flaky case.

Enable

E2E_TRACE=1 pytest tests/test_login.py -v

# Include typed text in the JSONL log (DO NOT use on tests that type credentials/PII):

E2E_TRACE=1 E2E_TRACE_INCLUDE_TEXT=1 pytest ...

Patch into BasePage

import os, json, time

TRACE_ENABLED      = os.environ.get("E2E_TRACE") == "1"

TRACE_INCLUDE_TEXT = os.environ.get("E2E_TRACE_INCLUDE_TEXT") == "1"

class BasePage:

    _step = 0

    def _trace(self, action, spec=None, text=None):

        if not TRACE_ENABLED:

            return

        BasePage._step += 1

        idx = f"{BasePage._step:03d}"

        os.makedirs(ARTIFACT_DIR, exist_ok=True)

        try:

            self.window.capture_as_image().save(

                os.path.join(ARTIFACT_DIR, f"step_{idx}_{action}.png"))

        except Exception:

            pass  # capture failure must not break the test

        rec = {

            "ts": time.time(), "step": BasePage._step, "action": action,

            "locator": getattr(spec, "criteria", None),

            "text": text if TRACE_INCLUDE_TEXT else ("<redacted>" if text else None),

        }

        with open(os.path.join(ARTIFACT_DIR, "trace.jsonl"), "a") as f:

            f.write(json.dumps(rec) + "\n")

    def click(self, spec):

        self.wait_visible(spec); self._trace("click_before", spec)

        spec.click_input();      self._trace("click_after",  spec)

    def type_text(self, spec, text):

        self.wait_visible(spec); self._trace("type_before", spec, text)

        # ... existing set_edit_text / keyboard fallback ...

        self._trace("type_after", spec)

Caveats

  • PII / credentials: type_text content is <redacted> by default. Never set E2E_TRACE_INCLUDE_TEXT=1 on login or payment flows.
  • Overhead: ~50–200ms per action + one PNG per step on disk. Don't enable on the default CI matrix — only on a dedicated flake-repro job.
  • Artifact bloat: a long flow produces tens of MB; tune retention-days accordingly.
  • Parallel/rerun hygiene: this simple example appends to trace.jsonl and uses a class-level counter. Clear the artifact directory before reruns, and use per-worker artifact dirs for parallel tests.
  • Coverage gap: actions performed outside BasePage (raw pywinauto calls in test code) are not traced.

Flaky Test Handling

# Quarantine — equivalent to Playwright's test.fixme()

@pytest.mark.skip(reason="Flaky: animation race on slow CI. Issue #42")

def test_animated_transition(self, app): ...

# Skip in CI only

@pytest.mark.skipif(os.environ.get("CI") == "true", reason="Flaky in CI #43")

def test_heavy_load(self, app): ...

Common causes and fixes:

Cause

Fix

Control not ready

Replace time.sleep with wait_visible

Window not focused

Add win.set_focus() before interactions

Animation in progress

wait_until(lambda: not loading_indicator.exists())

Dialog timing

wait_window(title, timeout=15)

CI display not ready

Set DISPLAY or use virtual desktop in CI

set_edit_text raises NotImplementedError

UIA ValuePattern missing (common on Qt 5.x) — BasePage.type_text already falls back to keyboard.send_keys

Control exists but wait_visible times out

Window minimised or off-screen — call win.restore() + win.set_focus() before waiting

Test Isolation &#x26; Sandbox

Three tiers of isolation — use the lightest tier that satisfies your needs.

Tier 1 — Filesystem Isolation (default, always use)

Each test gets its own APPDATA / LOCALAPPDATA / TEMP via subprocess.Popen and Application.connect(). pytest's tmp_path fixture handles cleanup automatically.

# conftest.py — replace the basic `app` fixture with this

import os, subprocess, pytest

from pywinauto import Application

from config import APP_PATH, APP_ARGS, APP_TITLE, LAUNCH_TIMEOUT, ACTION_TIMEOUT, ARTIFACT_DIR

@pytest.fixture(scope="function")

def app(request, tmp_path):

    """Fresh process + isolated user-data dirs per test."""

    if not APP_PATH:

        pytest.exit("APP_PATH not set", returncode=1)

    # Redirect all per-user storage to an isolated tmp directory

    sandbox_env = os.environ.copy()

    sandbox_env["QT_ACCESSIBILITY"]  = "1"

    sandbox_env["APPDATA"]           = str(tmp_path / "AppData" / "Roaming")

    sandbox_env["LOCALAPPDATA"]      = str(tmp_path / "AppData" / "Local")

    sandbox_env["TEMP"] = sandbox_env["TMP"] = str(tmp_path / "Temp")

    for p in (sandbox_env["APPDATA"], sandbox_env["LOCALAPPDATA"], sandbox_env["TEMP"]):

        os.makedirs(p, exist_ok=True)

    if not APP_TITLE:

        pytest.exit("APP_TITLE environment variable is not set", returncode=1)

    # shlex.split handles quoted args with spaces; plain split() breaks on them

    import shlex

    # Launch via subprocess so we can pass env; connect pywinauto by PID

    proc = subprocess.Popen(

        [APP_PATH] + shlex.split(APP_ARGS),

        env=sandbox_env,

    )

    pw_app = Application(backend="uia").connect(process=proc.pid, timeout=LAUNCH_TIMEOUT)

    win    = pw_app.window(title=APP_TITLE)

    win.wait("visible", timeout=LAUNCH_TIMEOUT)

    yield win

    if getattr(getattr(request.node, "rep_call", None), "failed", False):

        os.makedirs(ARTIFACT_DIR, exist_ok=True)

        try:

            win.capture_as_image().save(

                os.path.join(ARTIFACT_DIR, f"FAIL_{request.node.name}.png")

            )

        except Exception:

            pass

    try:

        win.close()

        proc.wait(timeout=5)

    except Exception:

        proc.kill()

    # tmp_path is cleaned up automatically by pytest

@pytest.hookimpl(tryfirst=True, hookwrapper=True)

def pytest_runtest_makereport(item, call):

    outcome = yield

    setattr(item, f"rep_{outcome.get_result().when}", outcome.get_result())

Tier 2 — Windows Job Object (optional: process-lifetime containment)

Attach the process to a Job Object so it is automatically terminated when

the test fixture's job handle is GC'd. Also prevents the app from spawning

child processes that escape fixture cleanup.

Scope of isolation: Job Objects do NOT virtualize filesystem access or

block network traffic. File-write and network isolation require AppContainer,

Windows Firewall rules, or Tier 3 (Windows Sandbox). Use Tier 2 only for

process-lifetime and child-process containment.

Requires no extra dependencies.

import ctypes, ctypes.wintypes as wt

def restrict_process(pid: int):

    """

    Attach the process to a Job Object that prevents it from:

    - spawning processes outside the job (LIMIT_KILL_ON_JOB_CLOSE)

    Does NOT block network — use Windows Firewall rules for that.

    """

    JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE = 0x00002000

    # Minimal rights: SET_QUOTA (0x0100) | TERMINATE (0x0001)

    PROCESS_SET_QUOTA_AND_TERMINATE    = 0x0101

    kernel32 = ctypes.windll.kernel32

    job   = kernel32.CreateJobObjectW(None, None)

    hproc = kernel32.OpenProcess(PROCESS_SET_QUOTA_AND_TERMINATE, False, pid)

    # Correct struct layout — LimitFlags is at offset +16, not +44

    class JOBOBJECT_BASIC_LIMIT_INFORMATION(ctypes.Structure):

        _fields_ = [

            ("PerProcessUserTimeLimit", wt.LARGE_INTEGER),

            ("PerJobUserTimeLimit",     wt.LARGE_INTEGER),

            ("LimitFlags",             wt.DWORD),

            ("MinimumWorkingSetSize",   ctypes.c_size_t),

            ("MaximumWorkingSetSize",   ctypes.c_size_t),

            ("ActiveProcessLimit",      wt.DWORD),

            ("Affinity",               ctypes.c_size_t),

            ("PriorityClass",          wt.DWORD),

            ("SchedulingClass",        wt.DWORD),

        ]

    info = JOBOBJECT_BASIC_LIMIT_INFORMATION()

    info.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE

    ok = kernel32.SetInformationJobObject(job, 2, ctypes.byref(info), ctypes.sizeof(info))

    if not ok:

        raise ctypes.WinError()

    kernel32.AssignProcessToJobObject(job, hproc)

    kernel32.CloseHandle(hproc)

    return job  # keep alive — job closes (kills proc) when GC'd

# After proc = subprocess.Popen(...):  job = restrict_process(proc.pid)

Tier 3 — Windows Sandbox (CI full-OS isolation)

When you need a clean Windows image per run (no leftover registry keys, no

shared GPU state, true isolation), run the entire test suite inside

Windows Sandbox.

Requirement: Windows 10/11 Pro or Enterprise, Virtualization enabled.

Create e2e-sandbox.wsb in your project root:

<Configuration>

  <MappedFolders>

    <!-- App binary (read-only) -->

    <MappedFolder>

      <HostFolder>C:\path\to\your\build\Release</HostFolder>

      <SandboxFolder>C:\app</SandboxFolder>

      <ReadOnly>true</ReadOnly>

    </MappedFolder>

    <!-- Test suite (read-write for artifacts) -->

    <MappedFolder>

      <HostFolder>C:\path\to\your\e2e_test</HostFolder>

      <SandboxFolder>C:\e2e_test</SandboxFolder>

      <ReadOnly>false</ReadOnly>

    </MappedFolder>

  </MappedFolders>

  <LogonCommand>

    <!--

      Windows Sandbox starts with no Python. Install it silently first,

      then install deps and run tests. Artifacts are written back to the

      host via the MappedFolder above.

    -->

    <Command>powershell -Command "

      winget install --id Python.Python.3.11 --silent --accept-package-agreements;

      $env:PATH += ';' + $env:LOCALAPPDATA + '\Programs\Python\Python311\Scripts';

      cd C:\e2e_test;

      pip install -r requirements.txt;

      pytest tests\ -v

    "</Command>

  </LogonCommand>

</Configuration>

Launch: WindowsSandbox.exe e2e-sandbox.wsb

pywinauto and the app both run inside the sandbox (same session required).

Artifacts are written back to the host via the mapped folder.

Tier comparison

Tier

Isolation

Setup cost

Works on CI

Use when

1 — tmp_path env redirect

Filesystem

Zero

Always

Default for all tests

2 — Job Object

Process tree

Low

Always

Prevent child-process escape

3 — Windows Sandbox

Full OS

Medium

Needs Pro/Enterprise image

Nightly clean-room runs

Prevent hanging tests

Add pytest-timeout to cap any single test. In pytest.ini set timeout = 60 and timeout_method = thread. Note: thread method cannot kill Qt app subprocesses on Windows — add atexit.register(lambda: [p.kill() for p in psutil.Process().children(recursive=True)]) in conftest.py to reap orphans.

CI/CD Integration

# .github/workflows/e2e-desktop.yml

name: Desktop E2E

on: [push, pull_request]

jobs:

  e2e:

    runs-on: windows-latest   # real GUI environment, no Xvfb needed

    steps:

      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5

        with: { python-version: "3.11" }

      - name: Install deps

        run: pip install pywinauto pytest pytest-html Pillow

      - name: Build app

        run: cmake --build build --config Release  # adjust to your build system

      - name: Run E2E

        env:

          APP_PATH: ${{ github.workspace }}\build\Release\MyApp.exe

          APP_TITLE: "My Application"

          CI: "true"

        run: pytest tests/ --html=artifacts/report.html --self-contained-html --junitxml=artifacts/results.xml -v

      - uses: actions/upload-artifact@v4

        if: always()

        with:

          name: e2e-artifacts

          path: artifacts/

          retention-days: 14

Qt Specific

Enable UIA in Qt 5.x

Qt 5.x accessibility is disabled by default in some builds (especially 5.7–5.14). Set the environment variable before launching. Qt 6.x enables accessibility by default — skip this step for Qt 6.

# conftest.py — add at module top

import os

os.environ["QT_ACCESSIBILITY"] = "1"

Or export it in CI:

env:

  QT_ACCESSIBILITY: "1"

Add Stable Identifiers to Qt Widgets

// Preferred: both objectName and accessibleName

void setTestId(QWidget* w, const char* id) {

    w->setObjectName(id);

    w->setAccessibleName(id);  // becomes UIA Name property

}

// In your dialog constructor:

setTestId(ui->usernameEdit, "usernameInput");

setTestId(ui->passwordEdit, "passwordInput");

setTestId(ui->loginButton,  "btnLogin");

setTestId(ui->errorLabel,   "lblError");

Centralise all IDs in a header to avoid typos:

// test_ids.h

#define TID_USERNAME   "usernameInput"

#define TID_PASSWORD   "passwordInput"

#define TID_BTN_LOGIN  "btnLogin"

#define TID_LBL_ERROR  "lblError"

Qt-Specific Quirks

QComboBox — the dropdown is a separate top-level window:

from pywinauto import Desktop

def select_combo_item(page, combo_spec, item_text):

    page.click(combo_spec)

    # Dropdown appears as a new root-level window

    # class_name varies by Qt version — verify with Accessibility Insights

    # Qt 5.x: "Qt5QWindowIcon"  |  Qt 6.x: "Qt6QWindowIcon" — verify with Accessibility Insights

    popup = Desktop(backend="uia").window(class_name_re="Qt[56]QWindowIcon")

    popup.wait("visible", timeout=5)

    popup.child_window(title=item_text).click_input()

QMessageBox / QDialog — also separate top-level windows:

dlg = page.wait_window("Confirm")          # wait for dialog title

dlg.child_window(title="OK").click_input() # click button inside it

QTableWidget / QTableView — row/cell access:

table = page.by_id("tblUsers").wrapper_object()

cell  = table.cell(row=0, column=1)

print(cell.window_text())

Self-drawn controls (paintEvent-only, QGraphicsView, QOpenGLWidget) — UIA cannot see their internals. Use the Fallback section below.

Fallback: Screenshot Mode

When a control is not reachable via UIA (self-drawn, third-party, game engine):

pip install pyautogui Pillow opencv-python
import pyautogui, cv2, numpy as np

from PIL import Image

def find_image_on_screen(template_path, confidence=0.85):

    """Locate a template image on screen. Returns (x, y) center or None."""

    screen   = np.array(pyautogui.screenshot())

    template = np.array(Image.open(template_path))

    result   = cv2.matchTemplate(

        cv2.cvtColor(screen, cv2.COLOR_RGB2BGR),

        cv2.cvtColor(template, cv2.COLOR_RGB2BGR),

        cv2.TM_CCOEFF_NORMED,

    )

    _, max_val, _, max_loc = cv2.minMaxLoc(result)

    if max_val >= confidence:

        h, w = template.shape[:2]

        return max_loc[0] + w // 2, max_loc[1] + h // 2

    return None

def click_image(template_path, confidence=0.85):

    pos = find_image_on_screen(template_path, confidence)

    if pos is None:

        raise RuntimeError(f"Image not found on screen: {template_path}")

    pyautogui.click(*pos)

DPI / Scaling Rules (screenshot mode only)

Screenshot matching is brutally sensitive to Windows display scaling (100% / 125% / 150%). Three hard rules:

  • Capture templates at the same scale as the target machine. Don't try to rescue a mismatch with PIL.Image.resizecv2.matchTemplate is very fragile against resampling artefacts.
  • Pin the CI display scaling. On windows-latest add a step like Set-DisplayResolution 1920 1080 -Force and disable per-monitor DPI scaling, so screenshot dimensions are reproducible.
  • Record the scale alongside each artefact. On capture, write GetDpiForWindow(hwnd) / 96 to artifacts/<test>/metadata.json — postmortems become obvious instead of guess-work.

Process-level DPI awareness (SetProcessDpiAwarenessContext) can conflict with Qt's own DPI handling when the app under test is Qt-based. Prefer "same-scale templates + CI pin" over flipping process-wide DPI mode in fixtures.

Debugging Match Confidence

When tuning the confidence threshold, the only sane workflow is to see where the match landed. The helper below is diagnosis-only — do not call it from test code.

def debug_match(template_path, out="artifacts/match_debug.png", confidence=0.85):

    """Diagnosis-only. Draw the best-match rectangle + score back on the current screen.

    NOT for production tests — use when calibrating confidence or chasing false matches.

    """

    import os, cv2, pyautogui, numpy as np

    screen = np.array(pyautogui.screenshot())[:, :, ::-1]

    tpl    = cv2.imread(template_path)

    if tpl is None:

        raise RuntimeError(f"Template unreadable: {template_path}")

    res    = cv2.matchTemplate(screen, tpl, cv2.TM_CCOEFF_NORMED)

    _, mv, _, ml = cv2.minMaxLoc(res)

    h, w   = tpl.shape[:2]

    colour = (0, 255, 0) if mv >= confidence else (0, 0, 255)  # green pass / red fail

    cv2.rectangle(screen, ml, (ml[0]+w, ml[1]+h), colour, 2)

    cv2.putText(screen, f"score={mv:.3f} thr={confidence}",

                (ml[0], max(20, ml[1]-6)),

                cv2.FONT_HERSHEY_SIMPLEX, 0.7, colour, 2)

    os.makedirs(os.path.dirname(out) or ".", exist_ok=True)

    cv2.imwrite(out, screen)

    return mv

Use sparingly — image matching breaks on DPI changes, theme switches, and partial occlusion.

Always try UIA first; fall back to screenshots only for genuinely unreachable controls.

Anti-Patterns

# BAD: fixed sleep

time.sleep(3)

page.click(page.by_id("btnSubmit"))

# GOOD: condition wait

page.wait_visible(page.by_id("btnSubmit"))

page.click(page.by_id("btnSubmit"))
# BAD: brittle class+index locator as primary strategy

page.by_class("Edit", index=2).type_keys("hello")

# GOOD: AutomationId

page.by_id("usernameInput").set_edit_text("hello")
# BAD: assert on pixel coordinates

assert btn.rectangle().left == 120

# GOOD: assert on content / state

assert page.get_text(page.by_id("lblStatus")) == "Logged in"

assert page.by_id("btnLogout").is_enabled()
# BAD: share app instance across all tests (state leaks)

@pytest.fixture(scope="session")

def app(): ...

# GOOD: fresh process per test (or per class at most)

@pytest.fixture(scope="function")

def app(): ...

Running Tests

# All tests

pytest tests/ -v

# Smoke only

pytest tests/ -m smoke -v

# Specific file

pytest tests/test_login.py -v

# With custom app path

APP_PATH="C:\build\Release\MyApp.exe" APP_TITLE="MyApp" pytest tests/ -v

# Detect flaky tests (repeat each 5 times)

pip install pytest-repeat

pytest tests/test_login.py --count=5 -v

Related Skills

  • e2e-testing — Playwright E2E for web applications
  • cpp-testing — C++ unit/integration testing with GoogleTest
  • cpp-coding-standards — C++ code style and patterns
BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card