HTML tag regular expression

We found this regular expression online for finding HTML tags in a string. I used it in the new Gustavus eCard program that I have been working on (stay tuned) as a way to get the text of captions that contain hyperlinks. The regular expression was written by Phil Haack.


This expression is robust enough to handle newline characters inside a tag, as well as angle brackets that happen to appear in the data.
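
To illustrate the kind of input this matters for, here is a rough sketch in Python. The pattern below is a simplified stand-in written for this example and is not Phil Haack's expression; the names `TAG_PATTERN` and `strip_tags` are made up, and the choice of Python's `re` module is an assumption, since the post does not depend on any particular regex engine.

```python
import re

# Illustrative pattern only; this is NOT Phil Haack's expression.
TAG_PATTERN = re.compile(
    r"""</?\w+                      # opening "<", optional "/", tag name
        (?:\s+[\w:-]+               # whitespace (newlines included), attribute name
            (?:\s*=\s*              # optional "=value" part
                (?:"[^"]*"          # double-quoted value, may contain ">"
                  |'[^']*'          # single-quoted value, may contain ">"
                  |[^\s"'>]+        # unquoted value
                )
            )?
        )*
        \s*/?>                      # optional whitespace, optional "/", closing ">"
    """,
    re.VERBOSE,
)


def strip_tags(text):
    """Remove everything TAG_PATTERN recognizes as a tag (hypothetical helper)."""
    return TAG_PATTERN.sub("", text)


# A caption whose link spans two lines and has ">" inside an attribute value.
sample = 'See <a href="page.html?a>b"\n   title=\'x\'>this link</a> now'
print(strip_tags(sample))
```

A naive pattern such as `<[^>]*>` would stop at the `>` inside the `href` value, which is why the quoted-value alternatives are spelled out.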

Update: As Haacked pointed out, (\s|\n) is redundant because \s already matches newline characters, so the updated regular expression should be as follows:



  1. Haacked says:

    I think that expression can be simplified just a bit. Anywhere you see (\s|\n) should be reducible to just \s. I need to test it just to be sure. If you test it, let me know if it works for you.

  2. Ryan Rud says:

    As far as I can tell, this simplified version works just as well at detecting newlines.
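
For anyone who wants to check the simplification discussed in the comments, here is a small sketch in Python. It assumes Python's `re` module, where `\s` matches newlines; most common regex engines behave the same way, but the post does not say which one was used. The names `verbose`, `simple`, and `same_matches` are made up for this example.

```python
import re

# The redundant form and its simplified replacement.
verbose = re.compile(r"(\s|\n)")
simple = re.compile(r"\s")


def same_matches(text):
    """Return True if both patterns match at exactly the same positions."""
    verbose_spans = [m.span() for m in verbose.finditer(text)]
    simple_spans = [m.span() for m in simple.finditer(text)]
    return verbose_spans == simple_spans


# \s on its own already matches a newline character.
print(simple.fullmatch("\n") is not None)

# Both forms find the same whitespace in a tag split across lines.
print(same_matches("<a\n   href='x'\ttitle='y'>"))
```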