Remove all HTML tags from a string with C#

Here are two methods to remove all HTML tags from a string in C# while keeping the text between the tags:

Method 1 with a regular expression:

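// Requires: using System.Text.RegularExpressions;
public static string StripHTML(string input)
{
    // Lazily match everything from '<' to the next '>' and replace it with nothing.
    return Regex.Replace(input, "<.*?>", String.Empty);
}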
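
One caveat with the pattern `<.*?>`: by default `.` does not match a line break, so a tag that spans two lines is left in place. A minimal sketch of one way around this, assuming a hypothetical helper named StripHtmlMultiline (not part of the original method), uses the character class `[^>]` instead of `.`:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripDemo
{
    // Hypothetical variant for illustration: [^>] matches any character
    // except '>', including line breaks, so multi-line tags are removed too.
    public static string StripHtmlMultiline(string input)
    {
        return Regex.Replace(input, "<[^>]*>", string.Empty);
    }

    public static void Main()
    {
        // The opening <a ...> tag is split across two lines.
        string html = "<a\nhref=\"#\">link</a> text";

        // The "<.*?>" pattern removes </a> but leaves the opening tag in place.
        Console.WriteLine(Regex.Replace(html, "<.*?>", string.Empty));

        // The [^>] variant removes both tags.
        Console.WriteLine(StripHtmlMultiline(html));
    }
}
```

Passing RegexOptions.Singleline to Regex.Replace with the original pattern should have a similar effect, since that option lets `.` match line breaks.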

Method 2 without a regular expression:

public static string StripHTML(string source)
{
    char[] array = new char[source.Length];
    int arrayIndex = 0;
    bool inside = false;
    for (int i = 0; i < source.Length; i++)
    {
        char let = source[i];
        if (let == '<')
        {
            inside = true;
            continue;
        }
        if (let == '>')
        {
            inside = false;
            continue;
        }
        if (!inside)
        {
            array[arrayIndex] = let;
            arrayIndex++;
        }
    }
    return new string(array, 0, arrayIndex);
}
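
To see both approaches side by side, here is a small sketch that runs them on the same input. The class name StripHtmlComparison and the method names StripWithRegex and StripWithLoop are made up for this example so that the two versions can live in one class; the logic is the same as in the two methods above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripHtmlComparison
{
    // Method 1: regular expression.
    public static string StripWithRegex(string input)
    {
        return Regex.Replace(input, "<.*?>", string.Empty);
    }

    // Method 2: scan the characters and skip everything between '<' and '>'.
    public static string StripWithLoop(string source)
    {
        char[] array = new char[source.Length];
        int arrayIndex = 0;
        bool inside = false;
        foreach (char c in source)
        {
            if (c == '<') { inside = true; continue; }
            if (c == '>') { inside = false; continue; }
            if (!inside)
            {
                array[arrayIndex] = c;
                arrayIndex++;
            }
        }
        return new string(array, 0, arrayIndex);
    }

    public static void Main()
    {
        string html = "<p>Hello <b>world</b>!</p>";
        Console.WriteLine(StripWithRegex(html)); // Hello world!
        Console.WriteLine(StripWithLoop(html));  // Hello world!
    }
}
```

The two methods are not identical on every input. For example, the loop version drops a stray '>' that appears in plain text (as in "a > b"), while the regex version leaves it alone because there is no '<' to start a match.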
