
stackoverflow.com/a/1732454/22

Thinking about parsing HTML with regex? Prepare for pain.


@botox Haha, I've seen this or something like it. I wonder how hard it would be to write a grammar. I'd imagine that's how the browser does it.

@arm I’ve been wondering how useful it would be to learn how to build a browser from scratch. Most of the material I’ve seen online starts with WebKit, which seems like skipping lots of the fun.

@botox I did that once. Starting from WebKit, it took like 30 minutes to do the rest with Glade and Python. Yeah, building the engine would be really challenging.

@arm @botox Parsing HTML with regex sounds purposely painful. Just build a stack: push on an opening tag (<), pop on a closing tag (</), with a few corner cases. Or is masochism the main idea?
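
A minimal sketch of the stack idea above, in Python. It assumes tokenizing is delegated to the standard library's `html.parser.HTMLParser`, and the names `BalanceChecker` and `is_balanced` are made up for illustration. Void elements such as `<br>` are one of the corner cases: they never get a closing tag, so they are not pushed.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are never pushed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Hypothetical helper: push on open tags, pop on matching close tags."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags
        self.errors = []   # closing tags that did not match the top of the stack

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax like <br/> opens and closes at once: nothing to track.
        pass

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(tag)


def is_balanced(html):
    """Return True only if every non-void tag is closed in strict nesting order."""
    checker = BalanceChecker()
    checker.feed(html)
    checker.close()
    return not checker.stack and not checker.errors
```

This strict version rejects input like `<div><p>hi</div>`, which is exactly the kind of markup real pages contain.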

@bradley @arm Browsers will handle incorrect HTML, e.g. elements missing closing tags. I think the pain comes from dealing with that kind of shit. Plus other things I haven’t thought of. :D
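
A rough sketch of what tolerating missing closing tags could look like, in Python. This is not the algorithm browsers actually use (the HTML standard defines far more detailed recovery rules); it only assumes one simple policy: a closing tag implicitly closes everything opened after its match, stray closing tags are ignored, and anything still open at the end is closed. `LenientTreeBuilder` and `repair` are hypothetical names, and void elements are not special-cased here.

```python
from html.parser import HTMLParser


class LenientTreeBuilder(HTMLParser):
    """Hypothetical helper: records open/close events, repairing missing closes."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.events = []     # repaired sequence of ("open" | "close", tag)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag with no matching open tag: ignore it
        # Pop until the matching tag, implicitly closing anything left open inside.
        while self.open_tags:
            top = self.open_tags.pop()
            self.events.append(("close", top))
            if top == tag:
                break

    def finish(self):
        self.close()
        # End of input: close whatever is still open, innermost first.
        while self.open_tags:
            self.events.append(("close", self.open_tags.pop()))
        return self.events


def repair(html):
    """Return the repaired open/close event list for a possibly sloppy fragment."""
    builder = LenientTreeBuilder()
    builder.feed(html)
    return builder.finish()
```

For example, `repair("<div><p>hi</div>")` yields an open/close sequence in which `p` is closed just before `div`, even though the input never closed it.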
