torchcode-pytorch-interview-practice

LeetCode-style PyTorch interview practice environment with auto-grading for implementing softmax, attention, GPT-2 and more from scratch.

INSTALLATION
npx skills add https://github.com/aradotso/trending-skills --skill torchcode-pytorch-interview-practice
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md


Option 3: Docker (pre-built image)

docker run -p 8888:8888 -e PORT=8888 ghcr.io/duoan/torchcode:latest

# Open http://localhost:8888

Option 4: Build locally

git clone https://github.com/duoan/TorchCode.git

cd TorchCode

make run

# Open http://localhost:8888

make run auto-detects Docker or Podman and falls back to local build if the registry image is unavailable (common on Apple Silicon/arm64).

Judge API

The torch_judge package provides the core API used in every notebook.

from torch_judge import check, status, hint, reset_progress

# List all 40 problems and your progress

status()

# Run tests for a specific problem

check("relu")

check("softmax")

check("layernorm")

check("attention")

check("gpt2")

# Get a hint without spoilers

hint("softmax")

# Reset progress for a problem

reset_progress("relu")

check() return values

  • Colored pass/fail per test case
  • Correctness check against PyTorch reference implementation
  • Gradient verification (autograd compatibility)
  • Timing measurement
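
The checks listed above can be approximated by hand when debugging outside the notebooks. The following is a minimal sketch, not the torch_judge implementation: `toy_check` is a hypothetical helper that compares a candidate against a PyTorch reference with `torch.allclose`, verifies gradients with `torch.autograd.gradcheck`, and times one forward call.

```python
import time
import torch

def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    x_max = x.max(dim=dim, keepdim=True).values
    exp_x = torch.exp(x - x_max)
    return exp_x / exp_x.sum(dim=dim, keepdim=True)

def toy_check(candidate, reference) -> dict:
    """Hypothetical stand-in for a judge; not the torch_judge API."""
    torch.manual_seed(0)
    x = torch.randn(4, 8)
    # 1. Correctness against the reference implementation
    correct = torch.allclose(candidate(x), reference(x), atol=1e-6)
    # 2. Gradient verification: gradcheck expects float64 inputs with requires_grad=True
    x64 = torch.randn(3, 5, dtype=torch.float64, requires_grad=True)
    grads_ok = torch.autograd.gradcheck(candidate, (x64,))
    # 3. Timing of a single forward call
    start = time.perf_counter()
    candidate(x)
    elapsed = time.perf_counter() - start
    return {"correct": correct, "grads_ok": grads_ok, "seconds": elapsed}

result = toy_check(my_softmax, lambda t: torch.softmax(t, dim=-1))
print(result)
```

A smooth function such as softmax is used here because finite-difference gradient checks can misfire near kinks like the one ReLU has at zero.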

Problem Set Overview

Difficulty levels: Easy → Medium → Hard

| # | Problem | Key Concepts |
|---|---------|--------------|
| 1 | ReLU | Activation functions, element-wise ops |
| 2 | Softmax | Numerical stability, exp/log tricks |
| 3 | Linear Layer | y = xW^T + b, Kaiming init, nn.Parameter |
| 4 | LayerNorm | Normalization, affine transform |
| 5 | Self-Attention | QKV projections, scaled dot-product |
| 6 | Multi-Head Attention | Head splitting, concatenation |
| 7 | BatchNorm | Batch vs layer statistics, train/eval |
| 8 | RMSNorm | LLaMA-style norm |
| 16 | Cross-Entropy Loss | Log-softmax, logsumexp trick |
| 17 | Dropout | Train/eval mode, inverted scaling |
| 18 | Embedding | Lookup table, weight[indices] |
| 19 | GELU | torch.erf, Gaussian error linear unit |
| 20 | Kaiming Init | std = sqrt(2/fan_in) |
| 21 | Gradient Clipping | Norm-based clipping |
| 31 | Gradient Accumulation | Micro-batching, loss scaling |
| 40 | Linear Regression | Normal equation, GD from scratch |

Working Through a Problem

Each problem notebook has the same structure:

templates/

  01_relu.ipynb       # Blank template — your workspace

  02_softmax.ipynb

  ...

solutions/

  01_relu.ipynb       # Reference solution (study after attempt)

Typical notebook workflow

# Cell 1: Import judge

from torch_judge import check, hint

import torch

import torch.nn as nn

# Cell 2: Your implementation

def my_relu(x: torch.Tensor) -> torch.Tensor:

    # TODO: implement ReLU without using torch.relu or F.relu

    raise NotImplementedError

# Cell 3: Run the judge

check("relu")

Real Implementation Examples

ReLU (Problem 1 — Easy)

def my_relu(x: torch.Tensor) -> torch.Tensor:

    return torch.clamp(x, min=0)

    # Alternative: return x * (x > 0)

    # Alternative: return torch.where(x > 0, x, torch.zeros_like(x))

Softmax (Problem 2 — Easy, numerically stable)

def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:

    # Subtract max for numerical stability (prevents overflow)

    x_max = x.max(dim=dim, keepdim=True).values

    x_shifted = x - x_max

    exp_x = torch.exp(x_shifted)

    return exp_x / exp_x.sum(dim=dim, keepdim=True)

LayerNorm (Problem 4 — Medium)

def my_layer_norm(

    x: torch.Tensor,

    weight: torch.Tensor,   # gamma (scale)

    bias: torch.Tensor,     # beta (shift)

    eps: float = 1e-5

) -> torch.Tensor:

    mean = x.mean(dim=-1, keepdim=True)

    var = x.var(dim=-1, keepdim=True, unbiased=False)

    x_norm = (x - mean) / torch.sqrt(var + eps)

    return weight * x_norm + bias
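
One way to sanity-check a LayerNorm implementation like the one above is to compare it with `torch.nn.functional.layer_norm`. The sketch below repeats the function so it runs on its own; the tensor shapes and seed are arbitrary choices for illustration. The comparison only matches because the variance is computed with `unbiased=False`.

```python
import torch
import torch.nn.functional as F

def my_layer_norm(x, weight, bias, eps=1e-5):
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)  # population variance
    return weight * (x - mean) / torch.sqrt(var + eps) + bias

torch.manual_seed(0)
x = torch.randn(2, 5, 8)          # (batch, tokens, features)
weight = torch.randn(8)           # gamma
bias = torch.randn(8)             # beta

ours = my_layer_norm(x, weight, bias)
ref = F.layer_norm(x, (8,), weight, bias, eps=1e-5)
max_diff = (ours - ref).abs().max().item()
print(max_diff)
```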

RMSNorm (Problem 8 — Medium, LLaMA-style)

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:

    rms = torch.sqrt((x ** 2).mean(dim=-1, keepdim=True) + eps)

    return (x / rms) * weight

Scaled Dot-Product Self-Attention (Problem 5 — Medium)

import torch.nn.functional as F

import math

def scaled_dot_product_attention(

    Q: torch.Tensor,  # (B, heads, T, head_dim)

    K: torch.Tensor,

    V: torch.Tensor,

    mask: torch.Tensor = None

) -> torch.Tensor:

    d_k = Q.size(-1)

    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)

    if mask is not None:

        scores = scores.masked_fill(mask == 0, float('-inf'))

    attn_weights = F.softmax(scores, dim=-1)

    return torch.matmul(attn_weights, V)

Multi-Head Attention (Problem 6 — Medium)

class MyMultiHeadAttention(nn.Module):

    def __init__(self, d_model: int, num_heads: int):

        super().__init__()

        assert d_model % num_heads == 0

        self.num_heads = num_heads

        self.head_dim = d_model // num_heads

        self.d_model = d_model

        self.W_q = nn.Linear(d_model, d_model)

        self.W_k = nn.Linear(d_model, d_model)

        self.W_v = nn.Linear(d_model, d_model)

        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:

        B, T, C = x.shape

        def split_heads(t):

            return t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        Q = split_heads(self.W_q(x))

        K = split_heads(self.W_k(x))

        V = split_heads(self.W_v(x))

        attn_out = scaled_dot_product_attention(Q, K, V, mask)

        # (B, heads, T, head_dim) -> (B, T, d_model)

        attn_out = attn_out.transpose(1, 2).contiguous().view(B, T, C)

        return self.W_o(attn_out)

Cross-Entropy Loss (Problem 16 — Easy)

def cross_entropy_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: (B, C), targets: (B,) with class indices
    # Per-sample loss is logsumexp(logits) - logits[target]; logsumexp keeps it numerically stable
    log_sum_exp = torch.logsumexp(logits, dim=-1)  # (B,)
    batch_idx = torch.arange(logits.size(0), device=logits.device)
    target_logits = logits[batch_idx, targets]  # (B,) raw logit of the true class
    return (log_sum_exp - target_logits).mean()
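
The identity behind this implementation is that the negative log-softmax of the true class equals logsumexp(z) - z_y. A minimal check against `F.cross_entropy`, with the function repeated so the block runs on its own (the scale of 50 is an arbitrary choice to produce large logits):

```python
import torch
import torch.nn.functional as F

def cross_entropy_loss(logits, targets):
    log_sum_exp = torch.logsumexp(logits, dim=-1)
    batch_idx = torch.arange(logits.size(0), device=logits.device)
    target_logits = logits[batch_idx, targets]
    return (log_sum_exp - target_logits).mean()

torch.manual_seed(0)
logits = torch.randn(6, 10) * 50          # large logits stress numerical stability
targets = torch.randint(0, 10, (6,))

ours = cross_entropy_loss(logits, targets)
ref = F.cross_entropy(logits, targets)    # default reduction is the batch mean
print(ours.item(), ref.item())
```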

Dropout (Problem 17 — Easy)

class MyDropout(nn.Module):

    def __init__(self, p: float = 0.5):

        super().__init__()

        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:

        if not self.training or self.p == 0:

            return x

        mask = torch.bernoulli(torch.ones_like(x) * (1 - self.p))

        return x * mask / (1 - self.p)  # inverted scaling
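
Inverted scaling means surviving activations are divided by 1 - p during training, so the expected value of the output matches the input and nothing needs rescaling in eval mode. A small sketch of that behaviour, repeating the class so it is self-contained (p = 0.25 and the input size are arbitrary):

```python
import torch
import torch.nn as nn

class MyDropout(nn.Module):
    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training or self.p == 0:
            return x
        mask = torch.bernoulli(torch.ones_like(x) * (1 - self.p))
        return x * mask / (1 - self.p)

torch.manual_seed(0)
drop = MyDropout(p=0.25)
x = torch.ones(100_000)

drop.train()
train_out = drop(x)
train_mean = train_out.mean().item()              # stays close to 1.0
kept_values = train_out[train_out > 0].unique()   # every survivor equals 1 / (1 - p)

drop.eval()
eval_out = drop(x)                                # identity in eval mode
print(train_mean, kept_values, torch.equal(eval_out, x))
```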

Kaiming Init (Problem 20 — Easy)

def kaiming_init(weight: torch.Tensor) -> torch.Tensor:

    fan_in = weight.size(1)

    std = math.sqrt(2.0 / fan_in)

    with torch.no_grad():

        weight.normal_(0, std)

    return weight
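
A quick empirical check of the std = sqrt(2/fan_in) rule: after initialization, the sample standard deviation of a large weight matrix should sit close to the target. The sketch repeats the function above and assumes a Linear-style weight of shape (out_features, in_features), so fan_in is dimension 1.

```python
import math
import torch

def kaiming_init(weight: torch.Tensor) -> torch.Tensor:
    fan_in = weight.size(1)
    std = math.sqrt(2.0 / fan_in)
    with torch.no_grad():
        weight.normal_(0, std)
    return weight

torch.manual_seed(0)
w = kaiming_init(torch.empty(512, 256))   # (out_features, in_features)
target_std = math.sqrt(2.0 / 256)
empirical_std = w.std().item()
print(target_std, empirical_std)
```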

Gradient Clipping (Problem 21 — Easy)

def clip_grad_norm(parameters, max_norm: float) -> float:

    params = [p for p in parameters if p.grad is not None]

    total_norm = torch.sqrt(sum(p.grad.data.norm() ** 2 for p in params))

    clip_coef = max_norm / (total_norm + 1e-6)

    if clip_coef < 1:

        for p in params:

            p.grad.data.mul_(clip_coef)

    return total_norm.item()
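
To see norm-based clipping at work, the sketch below runs the same logic next to `torch.nn.utils.clip_grad_norm_` on two identical copies of a small model. The model, input, and loss are arbitrary choices made only to produce large gradients; the function is repeated (without `.data`) so the block runs on its own.

```python
import copy
import torch
import torch.nn as nn

def clip_grad_norm(parameters, max_norm: float) -> float:
    params = [p for p in parameters if p.grad is not None]
    total_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for p in params:
            p.grad.mul_(clip_coef)   # rescale every gradient by the same factor
    return total_norm.item()

torch.manual_seed(0)
model_a = nn.Linear(8, 3)
model_b = copy.deepcopy(model_a)
x = torch.randn(16, 8)
for m in (model_a, model_b):
    (m(x) * 100).pow(2).sum().backward()   # deliberately large gradients

norm_ours = clip_grad_norm(model_a.parameters(), max_norm=1.0)
norm_ref = torch.nn.utils.clip_grad_norm_(model_b.parameters(), max_norm=1.0).item()
clipped_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model_a.parameters())).item()
print(norm_ours, norm_ref, clipped_norm)
```

Both functions return the norm measured before clipping. After the call, the global gradient norm is approximately `max_norm`.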

Gradient Accumulation (Problem 31 — Easy)

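def train_with_accumulation(model, optimizer, dataloader, criterion, accumulation_steps=4):
    # criterion is passed in explicitly, e.g. nn.CrossEntropyLoss()
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(dataloader):
        outputs = model(inputs)
        # Scale the loss so the accumulated gradient matches one large-batch step
        loss = criterion(outputs, targets) / accumulation_steps
        loss.backward()  # gradients add up in p.grad across micro-batches
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()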
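
The reason for dividing the loss by the number of accumulation steps can be checked directly. With a mean-reduced loss and equally sized micro-batches, the accumulated gradient equals the gradient of one pass over the full batch. A minimal sketch with an arbitrary toy model and data:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model_full = nn.Linear(5, 1)
model_accum = copy.deepcopy(model_full)
criterion = nn.MSELoss()                 # mean reduction
x = torch.randn(8, 5)
y = torch.randn(8, 1)

# One backward pass over the full batch of 8 samples
criterion(model_full(x), y).backward()

# Four micro-batches of 2 samples, each loss divided by the number of steps
accumulation_steps = 4
for xb, yb in zip(x.chunk(accumulation_steps), y.chunk(accumulation_steps)):
    (criterion(model_accum(xb), yb) / accumulation_steps).backward()

max_diff = (model_full.weight.grad - model_accum.weight.grad).abs().max().item()
print(max_diff)   # differs only by floating-point rounding
```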

Common Patterns & Tips

Numerical stability pattern

Always subtract the max before exp():

# WRONG — can overflow for large values

exp_x = torch.exp(x)

# CORRECT — numerically stable

exp_x = torch.exp(x - x.max(dim=-1, keepdim=True).values)
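
A concrete case shows the failure. In float32, `exp(1000)` overflows to `inf`, and `inf / inf` is `nan`. Subtracting the row maximum leaves the result mathematically unchanged because softmax is shift-invariant. The input values below are chosen only to trigger the overflow.

```python
import torch

x = torch.tensor([[1000.0, 1001.0, 1002.0]])

# Naive version: every exp() overflows to inf, so the division yields nan
naive = torch.exp(x) / torch.exp(x).sum(dim=-1, keepdim=True)

# Stable version: the largest exponent becomes exp(0) = 1
shifted = x - x.max(dim=-1, keepdim=True).values
stable = torch.exp(shifted) / torch.exp(shifted).sum(dim=-1, keepdim=True)

print(naive)    # all nan
print(stable)   # approximately [0.0900, 0.2447, 0.6652]
```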

Causal attention mask (for GPT-style models)

def causal_mask(T: int, device) -> torch.Tensor:

    return torch.tril(torch.ones(T, T, device=device)).unsqueeze(0).unsqueeze(0)
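
The mask has shape (1, 1, T, T), so it broadcasts over batch and heads. Positions where it is 0 are filled with `-inf` before the softmax and end up with zero attention weight. The sketch below assumes a default of `device=None` for convenience and uses arbitrary small shapes.

```python
import math
import torch
import torch.nn.functional as F

def causal_mask(T: int, device=None) -> torch.Tensor:
    # device=None is an assumption added here so the example runs on CPU by default
    return torch.tril(torch.ones(T, T, device=device)).unsqueeze(0).unsqueeze(0)

torch.manual_seed(0)
B, H, T, D = 1, 2, 4, 8
Q, K = torch.randn(B, H, T, D), torch.randn(B, H, T, D)

mask = causal_mask(T)                                  # (1, 1, T, T)
scores = Q @ K.transpose(-2, -1) / math.sqrt(D)        # (B, H, T, T)
scores = scores.masked_fill(mask == 0, float("-inf"))  # hide future positions
weights = F.softmax(scores, dim=-1)

print(mask[0, 0])
print(weights[0, 0])   # entries above the diagonal are exactly 0
```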

nn.Module skeleton (used in many problems)

class MyLayer(nn.Module):

    def __init__(self, ...):

        super().__init__()

        self.weight = nn.Parameter(torch.empty(...))

        self.bias = nn.Parameter(torch.zeros(...))

        self._init_weights()

    def _init_weights(self):

        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:

        ...
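
As one possible way to fill in the skeleton, here is a sketch of a linear layer computing y = xW^T + b. `MyLinear` and its argument names are illustrative choices, not names taken from the problem set. The output is compared with `F.linear` using the same parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # nn.Parameter registers the tensors so optimizers and state_dict see them
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self._init_weights()

    def _init_weights(self):
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight.T + self.bias

torch.manual_seed(0)
layer = MyLinear(16, 4)
x = torch.randn(2, 16)
out = layer(x)
ref = F.linear(x, layer.weight, layer.bias)
param_names = [name for name, _ in layer.named_parameters()]
print(out.shape, param_names)
```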

Train vs eval mode pattern

def forward(self, x):
    if self.training:
        # use batch statistics
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        # update running stats outside the autograd graph
        # (nn.BatchNorm1d tracks the unbiased variance in running_var)
        with torch.no_grad():
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
    else:
        # use running statistics
        mean = self.running_mean
        var = self.running_var
    return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias
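
The forward method above assumes attributes defined elsewhere in the module. A minimal sketch of a full module around it follows. `MyBatchNorm1d` and its constructor defaults are illustrative assumptions, and the running statistics are registered as buffers so they are saved with the model but not trained. In training mode, the output and the running mean can be compared with `nn.BatchNorm1d`.

```python
import torch
import torch.nn as nn

class MyBatchNorm1d(nn.Module):
    def __init__(self, num_features: int, eps: float = 1e-5, momentum: float = 0.1):
        super().__init__()
        self.eps, self.momentum = eps, momentum
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # Buffers follow .to() and state_dict() but receive no gradients
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            mean = x.mean(dim=0)
            var = x.var(dim=0, unbiased=False)
            with torch.no_grad():
                self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
                self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean, var = self.running_mean, self.running_var
        return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias

torch.manual_seed(0)
x = torch.randn(32, 6) * 3 + 1
ours, ref = MyBatchNorm1d(6), nn.BatchNorm1d(6)

train_diff = (ours(x) - ref(x)).abs().max().item()                        # training mode
mean_diff = (ours.running_mean - ref.running_mean).abs().max().item()

ours.eval()
eval_out = ours(x)   # now normalizes with the running statistics
print(train_diff, mean_diff)
```

The running variance is not compared because `nn.BatchNorm1d` stores the unbiased estimate there, while this sketch stores the biased one.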

Project Structure

TorchCode/

├── templates/          # Blank notebooks for each problem (your workspace)

│   ├── 01_relu.ipynb

│   ├── 02_softmax.ipynb

│   └── ...

├── solutions/          # Reference solutions (study after attempting)

│   └── ...

├── torch_judge/        # Auto-grading package

│   ├── __init__.py     # check(), status(), hint(), reset_progress()

│   └── tasks/          # Per-problem test cases

├── Dockerfile

├── Makefile

└── pyproject.toml      # torch-judge package definition

Troubleshooting

Docker image not available for Apple Silicon (arm64)

# make run auto-falls back to local build, or force it:

make build

make start

check() not found in Colab

!pip install torch-judge

# then restart runtime

Notebook reset to blank template

Use the toolbar "Reset" button in JupyterLab to reset any notebook to its original blank state — useful for re-practicing a problem.

Gradient check fails but output is correct

Ensure your implementation uses PyTorch operations (not NumPy) so autograd works:

# WRONG — breaks autograd

import numpy as np

result = np.exp(x.numpy())

# CORRECT — autograd compatible

result = torch.exp(x)
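
The effect can be observed through `requires_grad`. Calling `.numpy()` directly on a tensor that requires grad raises an error, and going through `detach()` produces a result with no graph history, so a gradient check has nothing to differentiate. A small sketch (it assumes NumPy is installed alongside PyTorch):

```python
import torch

x = torch.randn(3, requires_grad=True)

# Round trip through NumPy: detach() is required, and the graph is lost
via_numpy = torch.from_numpy(x.detach().numpy()).exp()

# Pure PyTorch: the graph is kept, so backward() reaches x
via_torch = torch.exp(x)
via_torch.sum().backward()

print(via_numpy.requires_grad)   # False
print(via_torch.requires_grad)   # True
print(x.grad)                    # d/dx exp(x) = exp(x)
```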

Viewing reference solution

After attempting a problem, open the matching file in solutions/:

solutions/02_softmax.ipynb

Key Concepts Tested

| Concept | Problems |
|---------|----------|
| Numerical stability | Softmax, Cross-Entropy, LogSumExp |
| Autograd / nn.Parameter | Linear, LayerNorm, all nn.Module problems |
| Train vs eval behavior | BatchNorm, Dropout |
| Broadcasting | LayerNorm, RMSNorm, attention masking |
| Shape manipulation | Multi-Head Attention (view, transpose, contiguous) |
| Weight initialization | Kaiming Init, Linear Layer |
| Memory-efficient training | Gradient Accumulation, Gradient Clipping |
