SKILL.md
Clean Data
Clean messy data in the active sheet or a specified range.
Environment
- If running inside Excel (Office Add-in / Office JS): Use Office JS directly (
Excel.run(async (context) => {...})). Read viarange.values, write helper-column formulas viarange.formulas = [["=TRIM(A2)"]]. The in-place vs helper-column decision still applies.
- If operating on a standalone .xlsx file: Use Python/openpyxl.
Workflow
Step 1: Scope
- If a range is given (e.g.
A1:F200), use it
- Otherwise use the full used range of the active sheet
- Profile each column: detect its dominant type (text / number / date) and identify outliers
Step 2: Detect issues
Issue
What to look for
Whitespace
leading/trailing spaces, double spaces
Casing
inconsistent casing in categorical columns (usa / USA / Usa)
Number-as-text
numeric values stored as text; stray $, ,, % in number cells
Dates
mixed formats in the same column (3/8/26, 2026-03-08, March 8 2026)
Duplicates
exact-duplicate rows and near-duplicates (case/whitespace differences)
Blanks
empty cells in otherwise-populated columns
Mixed types
a column that's 98% numbers but has 3 text entries
Encoding
mojibake (é, ’), non-printing characters
Errors
#REF!, #N/A, #VALUE!, #DIV/0!
Step 3: Propose fixes
Show a summary table before changing anything:
Column
Issue
Count
Proposed Fix
Step 4: Apply
- Prefer formulas over hardcoded cleaned values — where the cleaned output can be expressed as a formula (e.g.
=TRIM(A2),=VALUE(SUBSTITUTE(B2,"$","")),=UPPER(C2),=DATEVALUE(D2)), write the formula in an adjacent helper column rather than computing the result in Python and overwriting the original. This keeps the transformation transparent and auditable.
- Only overwrite in place with computed values when the user explicitly asks for it, or when no sensible formula equivalent exists (e.g. encoding/mojibake repair)
- For destructive operations (removing duplicates, filling blanks, overwriting originals), confirm with the user first
- After each category of fix (whitespace → casing → number conversion → dates → dedup), show the user a sample of what changed and get confirmation before moving to the next category
- Report a before/after summary of what changed