MeatButton

Your App Hangs on Certain Inputs: Regex Catastrophic Backtracking

When a regular expression takes your CPU hostage

Everything works fine until a user enters a specific string. Then your app freezes. The page stops responding. The server CPU spikes to 100%. Maybe the process eventually gets killed by the OS, or maybe it just hangs forever.

If this only happens with specific inputs — usually long strings or strings with repeating characters — you almost certainly have a regex with catastrophic backtracking.

What's happening

A regular expression (regex) is a pattern that matches text. Most regex patterns run in milliseconds. But some patterns, when given certain inputs, take exponentially longer as the input gets longer. A string that's 20 characters long might take a millisecond. The same pattern with 30 characters might take a second. 40 characters might take minutes. 50 characters might take hours.

This happens because a backtracking regex engine (the kind built into JavaScript, Python, Java, and most other languages) tries every possible way the pattern could match the input before it reports failure. With certain patterns, the number of possibilities doubles with each additional character. That's exponential growth, and it freezes your app.
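
To make the doubling concrete, here is a small sketch that counts how many ways a run of n identical characters can be carved into one or more non-empty chunks. That is roughly the set of paths a backtracking engine may have to walk for a pattern like (a+)+ before it can give up. The name countSplits is made up for this illustration; it models the search space and does not measure a real engine.

```javascript
// Counts the ways to split a run of n characters into one or more
// non-empty consecutive chunks. Each way is one path a backtracking
// engine could take through a pattern like (a+)+ before giving up.
function countSplits(n) {
    const ways = new Array(n + 1).fill(0);
    ways[0] = 1; // one way to split nothing: no chunks at all
    for (let len = 1; len <= n; len++) {
        // The first chunk takes `first` characters; the rest is split the same way.
        for (let first = 1; first <= len; first++) {
            ways[len] += ways[len - first];
        }
    }
    return ways[n];
}

for (const n of [10, 20, 30]) {
    console.log(n, countSplits(n));
}
// 10 512
// 20 524288
// 30 536870912
```

Each extra character doubles the count, which is why ten more characters turn a millisecond into roughly a second.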

The dangerous patterns

Catastrophic backtracking happens when a regex has nested quantifiers (a repeating group inside another repeating group) or a repeated group whose alternatives can match the same text. Here are the patterns that cause it:

// DANGEROUS: nested quantifiers
(a+)+
(a*)*
(.*a){10}

// DANGEROUS: overlapping alternatives with quantifiers
(a|a)+
(a|aa)+
(\d+|\d+\.)+

// REAL-WORLD EXAMPLES that AI generates:
// Email validation (bad):
^([a-zA-Z0-9]+\.)*[a-zA-Z0-9]+@([a-zA-Z0-9]+\.)*[a-zA-Z0-9]+$

// URL validation (bad):
^(https?://)?(([\w-]+\.)+[\w-]+)(/[\w-./?%&=]*)*$

// Whitespace trimming (bad):
\s*(.*)\s*$

The common thread: a quantifier (+, *, {n}) applied to a group that contains another quantifier, or overlapping alternatives that let the engine match the same character in multiple ways.
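
The first half of that rule (a quantified group whose body also contains a quantifier) is mechanical enough to scan for. The sketch below is a rough heuristic only, and hasNestedQuantifier is a hypothetical helper name. It does not look at alternation at all and it will also flag some patterns that are harmless, so treat a hit as "look closer" and a clean result as no guarantee.

```javascript
// Rough heuristic: flag a quantified group whose body also contains a
// quantifier (+, *, or {n}). It ignores overlapping alternatives, so a
// clean result is NOT proof that the pattern is safe.
function hasNestedQuantifier(source) {
    const stack = [];    // one entry per open group: true once it holds a quantifier
    let inClass = false; // true while inside a [...] character class
    for (let i = 0; i < source.length; i++) {
        const ch = source[i];
        if (ch === '\\') { i++; continue; } // skip the escaped character
        if (inClass) { if (ch === ']') inClass = false; continue; }
        if (ch === '[') { inClass = true; continue; }
        if (ch === '(') { stack.push(false); continue; }
        if (ch === ')') {
            const bodyHasQuantifier = stack.pop() || false;
            const next = source[i + 1];
            const groupIsQuantified = next === '+' || next === '*' || next === '{';
            if (bodyHasQuantifier && groupIsQuantified) return true;
            // A group that holds a quantifier counts as one for its parent group.
            if (stack.length > 0 && (bodyHasQuantifier || groupIsQuantified)) {
                stack[stack.length - 1] = true;
            }
            continue;
        }
        if ((ch === '+' || ch === '*' || ch === '{') && stack.length > 0) {
            stack[stack.length - 1] = true;
        }
    }
    return false;
}

for (const src of ['(a+)+', '(\\d+|\\d+\\.)+', 'a+b', '(a|aa)+']) {
    console.log(src, hasNestedQuantifier(src));
}
```

Note that (a|aa)+ comes back false even though it is dangerous: its problem is the overlapping alternatives, which this scan does not examine.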

How to test if your regex is dangerous

Try your regex against a long run of a character that the repeated part of the pattern accepts, followed by a character that forces the overall match to fail:

// If your regex validates email addresses, try:
"aaaaaaaaaaaaaaaaaaaaaaaaaaa@"

// If it validates URLs, try:
"aaaaaaaaaaaaaaaaaaaaaaaaaaa."

// The key: repeating characters + a trailing character
// that makes the overall match fail.
// If it takes noticeably longer as you add more 'a's,
// you have a backtracking problem.

You can also use online tools like regex101.com — it shows you the number of steps the engine takes. If the step count is in the millions for a short string, you have a problem.
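
The same test can be scripted. The sketch below times a regex against inputs of growing length and prints the results; timeMatch and probeGrowth are names invented here, and the sizes are kept deliberately small because each extra character can double the running time. It assumes Node.js and a regex without the g or y flag (those make test() depend on lastIndex).

```javascript
// Times one regex test in milliseconds using Node's high-resolution clock.
function timeMatch(regex, input) {
    const start = process.hrtime.bigint();
    const matched = regex.test(input);
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    return { matched, elapsedMs };
}

// Feeds the regex inputs of growing size: n repeated characters plus a
// trailing character chosen to make the overall match fail.
function probeGrowth(regex, sizes, repeated = 'a', tail = '!') {
    return sizes.map((n) => {
        const { matched, elapsedMs } = timeMatch(regex, repeated.repeat(n) + tail);
        return { n, matched, elapsedMs };
    });
}

// A nested-quantifier pattern next to a flat one, on the same inputs.
console.table(probeGrowth(/^(a+)+$/, [14, 16, 18, 20]));
console.table(probeGrowth(/^a+$/, [14, 16, 18, 20]));
```

If elapsedMs roughly multiplies by a constant factor each time n goes up by a fixed step, the pattern is backtracking exponentially. If it stays flat, that particular input did not trigger a problem, which is not the same as the pattern being safe.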

How to fix it

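Fix 1: Use atomic groups or possessive quantifiers. These tell the engine not to backtrack into a group once it has matched. Support varies: PCRE, Java, and Ruby have both, .NET has atomic groups, and Python added both in 3.11, but JavaScript's built-in RegExp has neither.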

// Instead of (a+)+ use atomic group:
(?>a+)+

// Or possessive quantifier (Java, PCRE):
(a++)+ 
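
In engines without atomic groups, JavaScript's RegExp among them, a lookahead plus a backreference is a common way to approximate one. The sketch below shows the idea on the (a+)+ example; the variable names are illustrative, and the trick costs a capture group, so any later group numbers in a real pattern shift by one.

```javascript
// Vulnerable: the engine can split the run of a's between iterations in many ways.
const nested = /^(a+)+$/;

// Emulated atomic group: the lookahead captures the longest run of a's,
// then \1 consumes exactly that capture. The engine cannot backtrack into
// a completed lookahead, so the capture is never shortened.
const atomicLike = /^(?:(?=(a+))\1)+$/;

const passing = 'a'.repeat(20);
const failing = 'a'.repeat(5000) + '!';

console.log(nested.test(passing), atomicLike.test(passing)); // true true
console.log(atomicLike.test(failing)); // false, and it returns promptly
// nested.test(failing) is deliberately not run here: it would hang.
```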

Fix 2: Rewrite the pattern to avoid nested quantifiers. Often a simpler pattern does the same job.

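// BAD: (a+)+b
// GOOD: a+b
// They match the same strings, but the second has only one way to
// consume a run of a's, so a failed match stays cheap

// BAD: (\w+\.)*\w+@(\w+\.)*\w+
// GOOD: [\w.]+@[\w.]+
// Simpler and faster. Slightly looser (it also accepts leading, trailing,
// and doubled dots), which is fine for a first-pass plausibility check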
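
Before swapping a pattern for a simpler one, it is worth checking that the two agree on inputs you care about. A minimal sketch for the first rewrite above, with anchors added so whole strings are compared; the sample list is illustrative and kept short so the nested pattern finishes quickly even when it fails.

```javascript
const nested = /^(a+)+b$/;
const flat = /^a+b$/;

// Short samples only: the nested pattern gets slow fast on failing input.
const samples = ['ab', 'aaab', 'b', 'aa', 'aaba', 'aaaaaaaaaab', 'aaaaaaaaaac', ''];

// Collect every sample where the two patterns give different answers.
const disagreements = samples.filter((s) => nested.test(s) !== flat.test(s));
console.log(disagreements.length === 0 ? 'patterns agree on all samples' : disagreements);
```

Agreement on a sample list is evidence, not proof. For a real validation pattern, run both versions against your existing test cases and a batch of real inputs.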

Fix 3: Don't use regex for complex validation. Email, URL, and HTML validation are notoriously hard to get right with regex. Use a built-in parser or a dedicated library instead.

// Instead of a complex email regex:
// Just check for @ and a dot. Validate by sending a confirmation email.
function isPlausibleEmail(str) {
    return str.includes('@') && str.includes('.') && str.length <= 254;
}

// For URLs:
try {
    const url = new URL(input);
    // parsed successfully; check url.protocol if you only accept http(s)
} catch {
    // invalid URL
}

// For HTML: use a proper parser, never regex
const { JSDOM } = require('jsdom');
const dom = new JSDOM(html);

Fix 4: Add a timeout. If you can't rewrite the regex, limit how long it can run.

// Node.js — run regex with a timeout using worker_threads
const { Worker } = require('worker_threads');

function regexWithTimeout(pattern, input, timeoutMs = 1000) {
    return new Promise((resolve, reject) => {
        const worker = new Worker(`
            const { parentPort, workerData } = require('worker_threads');
            const match = workerData.input.match(new RegExp(workerData.pattern));
            parentPort.postMessage(match);
        `, { eval: true, workerData: { pattern, input } });

        const timer = setTimeout(() => {
            worker.terminate();
            reject(new Error('Regex timed out'));
        }, timeoutMs);

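        worker.on('message', (result) => {
            clearTimeout(timer);
            resolve(result);
        });

        // An invalid pattern throws inside the worker; surface that error
        // right away instead of waiting for the timeout
        worker.on('error', (err) => {
            clearTimeout(timer);
            reject(err);
        });
    });
}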

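Fix 5: Limit input length. Backtracking cost grows with input length, so a cap bounds the worst case. For patterns whose cost grows polynomially, a limit of a few hundred or a few thousand characters is usually enough. A truly exponential pattern can hang on 30 to 40 characters, so treat the cap as a backstop alongside the other fixes, not a replacement for them.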

// Before running regex, check length
if (input.length > 1000) {
    return { error: 'Input too long' };
}
const match = input.match(pattern);

This is a security issue

Catastrophic backtracking isn't just a performance bug — it's a denial-of-service vulnerability called ReDoS (Regular Expression Denial of Service). An attacker who knows your regex pattern can craft an input that freezes your server. If your app accepts user input and runs it through a regex without safeguards, it's vulnerable.

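This has taken down production services at major companies: Stack Overflow in 2016 and Cloudflare in 2019 both published postmortems tracing an outage to a backtracking regex. OWASP documents ReDoS as an attack class. It's a real security concern, not a theoretical one.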

Why AI generates vulnerable regex

AI generates regex patterns from training data that includes Stack Overflow answers, tutorials, and documentation. Many of those patterns were written by people who didn't consider backtracking performance. The patterns work on normal inputs — they just catastrophically fail on adversarial or unusual ones.

AI is particularly bad at email and URL validation regex because these are the patterns most commonly shared online, most commonly copied without understanding, and most commonly vulnerable to backtracking. If AI generated your validation regex, test it with adversarial input before trusting it.

App freezing on specific inputs?

MeatButton connects you with developers who can audit your regex patterns, identify ReDoS vulnerabilities, and rewrite them safely — or replace them with proper validation libraries. First one's free.
