polars

Fast DataFrame library (Apache Arrow). Select, filter, group_by, joins, lazy evaluation, CSV/Parquet I/O, expression API, for high-performance data analysis…

INSTALLATION
npx skills add https://github.com/davila7/claude-code-templates --skill polars
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Polars

Overview

Polars is a lightning-fast DataFrame library for Python and Rust built on Apache Arrow. Work with Polars' expression-based API, lazy evaluation framework, and high-performance data manipulation capabilities for efficient data processing, pandas migration, and data pipeline optimization.

Quick Start

Installation and Basic Usage

Install Polars:

uv pip install polars

Basic DataFrame creation and operations:

import polars as pl

Create DataFrame

df = pl.DataFrame({

"name": ["Alice", "Bob", "Charlie"],

"age": [25, 30, 35],

"city": ["NY", "LA", "SF"]

})

Select columns

df.select("name", "age")

Filter rows

df.filter(pl.col("age") > 25)

Add computed columns

df.with_columns(

age_plus_10=pl.col("age") + 10

)

## Core Concepts

### Expressions

Expressions are the fundamental building blocks of Polars operations. They describe transformations on data and can be composed, reused, and optimized.

**Key principles:**

- Use `pl.col("column_name")` to reference columns

- Chain methods to build complex transformations

- Expressions are lazy and only execute within contexts (select, with_columns, filter, group_by)

**Example:**

Expression-based computation

df.select(

pl.col("name"),

(pl.col("age") * 12).alias("age_in_months")

)


### Lazy vs Eager Evaluation

**Eager (DataFrame):** Operations execute immediately

df = pl.read_csv("file.csv") # Reads immediately

result = df.filter(pl.col("age") > 25) # Executes immediately


**Lazy (LazyFrame):** Operations build a query plan, optimized before execution

lf = pl.scan_csv("file.csv") # Doesn't read yet

result = lf.filter(pl.col("age") > 25).select("name", "age")

df = result.collect() # Now executes optimized query


**When to use lazy:**

- Working with large datasets

- Complex query pipelines

- When only some columns/rows are needed

- Performance is critical

**Benefits of lazy evaluation:**

- Automatic query optimization

- Predicate pushdown

- Projection pushdown

- Parallel execution

For detailed concepts, load `references/core_concepts.md`.

## Common Operations

### Select

Select and manipulate columns:

Select specific columns

df.select("name", "age")

Select with expressions

df.select(

pl.col("name"),

(pl.col("age") * 2).alias("double_age")

)

Select all columns matching a pattern

df.select(pl.col("^.*_id$"))


### Filter

Filter rows by conditions:

Single condition

df.filter(pl.col("age") > 25)

Multiple conditions (cleaner than using &)

df.filter(

pl.col("age") > 25,

pl.col("city") == "NY"

)

Complex conditions

df.filter(

(pl.col("age") > 25) | (pl.col("city") == "LA")

)


### With Columns

Add or modify columns while preserving existing ones:

Add new columns

df.with_columns(

age_plus_10=pl.col("age") + 10,

name_upper=pl.col("name").str.to_uppercase()

)

Parallel computation (all columns computed in parallel)

df.with_columns(

pl.col("value") * 10,

pl.col("value") * 100,

)


### Group By and Aggregations

Group data and compute aggregations:

Basic grouping

df.group_by("city").agg(

pl.col("age").mean().alias("avg_age"),

pl.len().alias("count")

)

Multiple group keys

df.group_by("city", "department").agg(

pl.col("salary").sum()

)

Conditional aggregations

df.group_by("city").agg(

(pl.col("age") > 30).sum().alias("over_30")

)


For detailed operation patterns, load `references/operations.md`.

## Aggregations and Window Functions

### Aggregation Functions

Common aggregations within `group_by` context:

- `pl.len()` - count rows

- `pl.col("x").sum()` - sum values

- `pl.col("x").mean()` - average

- `pl.col("x").min()` / `pl.col("x").max()` - extremes

- `pl.first()` / `pl.last()` - first/last values

### Window Functions with over()

Apply aggregations while preserving row count:

Add group statistics to each row

df.with_columns(

avg_age_by_city=pl.col("age").mean().over("city"),

rank_in_city=pl.col("salary").rank().over("city")

)

Multiple grouping columns

df.with_columns(

group_avg=pl.col("value").mean().over("category", "region")

)


**Mapping strategies:**

- `group_to_rows` (default): Preserves original row order

- `explode`: Faster but groups rows together

- `join`: Creates list columns

## Data I/O

### Supported Formats

Polars supports reading and writing:

- CSV, Parquet, JSON, Excel

- Databases (via connectors)

- Cloud storage (S3, Azure, GCS)

- Google BigQuery

- Multiple/partitioned files

### Common I/O Operations

**CSV:**

Eager

df = pl.read_csv("file.csv")

df.write_csv("output.csv")

Lazy (preferred for large files)

lf = pl.scan_csv("file.csv")

result = lf.filter(...).select(...).collect()


**Parquet (recommended for performance):**

df = pl.read_parquet("file.parquet")

df.write_parquet("output.parquet")


**JSON:**

df = pl.read_json("file.json")

df.write_json("output.json")


For comprehensive I/O documentation, load `references/io_guide.md`.

## Transformations

### Joins

Combine DataFrames:

Inner join

df1.join(df2, on="id", how="inner")

Left join

df1.join(df2, on="id", how="left")

Join on different column names

df1.join(df2, left_on="user_id", right_on="id")


### Concatenation

Stack DataFrames:

Vertical (stack rows)

pl.concat([df1, df2], how="vertical")

Horizontal (add columns)

pl.concat([df1, df2], how="horizontal")

Diagonal (union with different schemas)

pl.concat([df1, df2], how="diagonal")


### Pivot and Unpivot

Reshape data:

Pivot (wide format)

df.pivot(values="sales", index="date", columns="product")

Unpivot (long format)

df.unpivot(index="id", on=["col1", "col2"])


For detailed transformation examples, load `references/transformations.md`.

## Pandas Migration

Polars offers significant performance improvements over pandas with a cleaner API. Key differences:

### Conceptual Differences

- **No index**: Polars uses integer positions only

- **Strict typing**: No silent type conversions

- **Lazy evaluation**: Available via LazyFrame

- **Parallel by default**: Operations parallelized automatically

### Common Operation Mappings

Operation
Pandas
Polars

Select column
`df["col"]`
`df.select("col")`

Filter
`df[df["col"] > 10]`
`df.filter(pl.col("col") > 10)`

Add column
`df.assign(x=...)`
`df.with_columns(x=...)`

Group by
`df.groupby("col").agg(...)`
`df.group_by("col").agg(...)`

Window
`df.groupby("col").transform(...)`
`df.with_columns(...).over("col")`

### Key Syntax Patterns

**Pandas sequential (slow):**

df.assign(

col_a=lambda df_: df_.value * 10,

col_b=lambda df_: df_.value * 100

)


**Polars parallel (fast):**

df.with_columns(

col_a=pl.col("value") * 10,

col_b=pl.col("value") * 100,

)


For comprehensive migration guide, load `references/pandas_migration.md`.

## Best Practices

### Performance Optimization

-

**Use lazy evaluation for large datasets:**

lf = pl.scan_csv("large.csv") # Don't use read_csv

result = lf.filter(...).select(...).collect()


-

**Avoid Python functions in hot paths:**

- Stay within expression API for parallelization

- Use `.map_elements()` only when necessary

- Prefer native Polars operations

-

**Use streaming for very large data:**

lf.collect(streaming=True)


-

**Select only needed columns early:**

Good: Select columns early

lf.select("col1", "col2").filter(...)

Bad: Filter on all columns first

lf.filter(...).select("col1", "col2")


-

**Use appropriate data types:**

- Categorical for low-cardinality strings

- Appropriate integer sizes (i32 vs i64)

- Date types for temporal data

### Expression Patterns

**Conditional operations:**

pl.when(condition).then(value).otherwise(other_value)


**Column operations across multiple columns:**

df.select(pl.col("^._value$") 2) # Regex pattern


**Null handling:**

pl.col("x").fill_null(0)

pl.col("x").is_null()

pl.col("x").drop_nulls()

BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card