Jamal Moir: How to Find Unclosed Tags and Brackets Using a Stack

This is a programming and Computer Science blog, so my posts, past and future, have code embedded in them, and for readability I thought it would be nice to have some form of syntax highlighting. The blog is hosted on Blogger, which doesn't have any support for syntax highlighting, so I was forced to look elsewhere for a solution.

I ended up coming across a 'JavaScript code prettifier' called PrettyPrint. It works, and I'm happy. Getting it to do the actual highlighting, though, requires you to put the code you want to highlight within <pre> tags with the class 'prettyprint'. So I clicked on the HTML tab in Blogger's post editor and BAM, I was slapped in the face by a plethora of unclosed, empty and unneeded HTML tags. It was a mess.

I really wasn't happy with the idea of wading through all of this rubbish every time I had to edit my posts' HTML, so I made two choices:

I was never going to use Blogger's built-in writer again, at least not until it was improved.
I was going to clean up the HTML of all my past posts.

Putting the first choice into action is on hold at the moment, as I can't find a decent blogging client for Linux. If anyone knows of one, please tell me. Moving on.

The second choice seemed like a lot of work that I didn't want to do manually, so I started up Vim and got to writing a Python script to clean up Blogger's HTML markup automatically.

Now, this script isn't perfect, but it gets the HTML markup to a point where it is navigable and easier to read. Lots of useless markup gets deleted and the rest gets reformatted a bit. It gets the job I wanted done, and that's an achievement to me. You can view the whole project here.

The script is technically unfinished, and I will get to that, but there is one part that might be useful or interesting to you readers: using a stack to find unclosed HTML tags.

USING A STACK TO FIND UNCLOSED HTML TAGS

This script uses a stack, or at least a list used in a stack-like manner, to find opening HTML tags that aren't paired with a corresponding closing tag.

It does this by going through all the tags in the document and pushing each opening tag onto a stack. When the matching closing tag is reached, the opening tag is popped off again. If, however, the closing tag is never reached, or a different closing tag is reached first, the function returns the position of the unclosed tag.

THE CODE

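import re


def find_next_unclosed(text):
    """Finds the next unclosed HTML tag.

    Returns a (start, end) tuple, or None if no unclosed tag is found.
    """
    tag_stack = []

    # Get an iterator of all tags in file.
    tag_regex = re.compile(r'<[^>]*>', re.DOTALL)
    tags = tag_regex.finditer(text)

    for tag in tags:
        # If it is a closing tag check if it matches the last opening tag.
        if re.match(r'<\/[^>]*>', tag.group()):
            if not tag_stack:
                # A closing tag with nothing open; skip it rather than
                # raise an IndexError on the empty stack.
                continue

            top_tag = tag_stack[-1]

            # tag_match() is a helper defined elsewhere in the script.
            if tag_match(top_tag.group(), tag.group()):
                tag_stack.pop()
            else:
                unclosed = tag_stack.pop()
                return (unclosed.start(), unclosed.end())
        else:
            tag_stack.append(tag)

    # Anything left on the stack never reached its closing tag.
    if tag_stack:
        unclosed = tag_stack.pop()
        return (unclosed.start(), unclosed.end())
    return None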

SO WHAT'S GOING ON?

# Get an iterator of all tags in file.
tag_regex = re.compile(r'<[^>]*>', re.DOTALL)
tags = tag_regex.finditer(text)

First, using a regex with Python's built-in re.compile() and the compiled pattern's finditer() method (both of which are covered in the Python documentation), we get an iterator over all the tags found in the text we are searching.
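To make that step concrete, here is a small sketch of what the iterator yields. The sample string is made up for illustration; the pattern is the one from the script. As a side note, re.DOTALL only changes how '.' behaves, and this pattern doesn't use '.', so [^>] would match across newlines either way.

```python
import re

# Same pattern as in the script: "<", anything that is not ">", then ">".
tag_regex = re.compile(r'<[^>]*>', re.DOTALL)

sample = "<div><p>Hello</p>"  # made-up input, for illustration only

for tag in tag_regex.finditer(sample):
    # Each item is a match object: group() is the tag text,
    # start() and end() are its character offsets within sample.
    print(tag.group(), tag.start(), tag.end())

# The same information collected into a list.
found = [(m.group(), m.start(), m.end()) for m in tag_regex.finditer(sample)]
```

Because the stack stores these match objects rather than plain strings, the position of an unclosed tag is still available when it needs to be reported.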

for tag in tags:
    # If it is a closing tag check if it matches the last opening tag.
    if re.match(r'<\/[^>]*>', tag.group()):

Next we loop over each tag and check to see if it is a closing tag, e.g. </body>.

else:
    tag_stack.append(tag)

If it isn't a closing tag, it is treated as an opening tag and pushed onto a stack that contains only opening tags.
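One caveat with this step: as written, anything that isn't a closing tag gets pushed, including tags that never take a closing tag, such as <br> or <img>, as well as comments and the doctype. Those would later be reported as unclosed. Below is a rough sketch of a filter that could be checked before pushing. The name needs_closing_tag is my own invention for this example and is not part of the script.

```python
import re

# HTML elements that never take a closing tag.
VOID_ELEMENTS = {
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
    'link', 'meta', 'source', 'track', 'wbr',
}


def needs_closing_tag(tag):
    """Hypothetical helper: True if tag is an opening tag worth pushing."""
    if tag.startswith('<!') or tag.startswith('<?'):
        return False  # comments, doctype, processing instructions
    if tag.endswith('/>'):
        return False  # self-closing syntax such as <br />
    name = re.match(r'<\s*([A-Za-z][A-Za-z0-9-]*)', tag)
    if name is None:
        return False  # not something that looks like an opening tag
    return name.group(1).lower() not in VOID_ELEMENTS
```

With something like this in place, the else branch would only append the tag when needs_closing_tag(tag.group()) is true.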

if tag_match(top_tag.group(), tag.group()):
    tag_stack.pop()

If it is a closing tag, the function checks whether the tag on top of the opening tags stack matches it. Note that <div> and </div> match, but <div> and <div>, or <div> and </body>, don't. If it matches, the current closing tag closes the opening tag on top of the stack and everything is fine, so the top element is popped off the opening tags stack.
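The tag_match() helper isn't shown in this post, so here is a minimal sketch of how such a helper might look. This is an assumption on my part, not the version from the project: it compares tag names case-insensitively and ignores attributes. The tag_name helper is also hypothetical.

```python
import re

# Captures an optional leading slash and the tag name.
_TAG_NAME = re.compile(r'<\s*(/?)\s*([A-Za-z][A-Za-z0-9-]*)')


def tag_name(tag):
    """Hypothetical helper: return (is_closing, lowercased name) or None."""
    m = _TAG_NAME.match(tag)
    if m is None:
        return None
    return (m.group(1) == '/', m.group(2).lower())


def tag_match(opening, closing):
    """Sketch of tag_match(): True if closing is the closing tag for opening."""
    open_info = tag_name(opening)
    close_info = tag_name(closing)
    if open_info is None or close_info is None:
        return False
    open_is_closing, open_name = open_info
    close_is_closing, close_name = close_info
    # The first tag must be an opening tag, the second a closing tag,
    # and both must have the same name.
    return (not open_is_closing) and close_is_closing and open_name == close_name
```

This gives the behaviour described above: tag_match('<div>', '</div>') is true, while tag_match('<div>', '<div>') and tag_match('<div>', '</body>') are false.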

else:
    unclosed = tag_stack.pop()
    return (unclosed.start(), unclosed.end())

However, if they don't match, the top tag on the opening tags stack is unclosed, or at least not closed in the right place, and the position of this tag is returned (as a start and end point).

There we have it: a simple but effective way to search for unclosed HTML tags or, if slightly adapted, brackets. This is good for checking the syntax of a markup file, or of a file written in a programming language like Java, which makes heavy use of curly braces.

That's it for this post. I hope you managed to salvage something useful from it. For more posts like this, take a look at my past ones and subscribe to my RSS feed to make sure you don't miss my future ones.
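The returned start and end points are character offsets into the text, which aren't very friendly when you want to find the tag in an editor. As a small sketch, an offset can be turned into a line and column like this. The function name offset_to_line_col and the sample text are made up for this example.

```python
def offset_to_line_col(text, offset):
    """Convert a character offset into a 1-based (line, column) pair."""
    line = text.count('\n', 0, offset) + 1
    # rfind() returns -1 when there is no earlier newline,
    # which makes the column come out as offset + 1.
    last_newline = text.rfind('\n', 0, offset)
    column = offset - last_newline
    return (line, column)


sample = "<div>\n  <p>Hello\n</div>"
# Offsets of the unclosed "<p>" in the sample above.
start, end = 8, 11
print(sample[start:end], offset_to_line_col(sample, start))
```

Here the unclosed <p> sits on line 2, column 3, which is much easier to jump to than character 8.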
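As a rough idea of what the bracket adaptation could look like, here is a sketch that applies the same stack technique to (), [] and {}. It is not part of the script, the names are my own, and it makes a simplifying assumption: brackets inside strings and comments are not treated specially.

```python
# Maps each closing bracket to the opening bracket it pairs with.
PAIRS = {')': '(', ']': '[', '}': '{'}
OPENERS = set(PAIRS.values())


def find_next_unclosed_bracket(text):
    """Return the index of the first problem bracket found, or None."""
    stack = []  # (character, index) for every bracket still open

    for index, char in enumerate(text):
        if char in OPENERS:
            stack.append((char, index))
        elif char in PAIRS:
            if not stack:
                return index  # closing bracket with nothing open
            open_char, open_index = stack.pop()
            if open_char != PAIRS[char]:
                # The bracket on top of the stack was closed by the wrong type.
                return open_index

    if stack:
        return stack[-1][1]  # reached the end with a bracket still open
    return None
```

For example, find_next_unclosed_bracket("f(a[1)") returns 3, the index of the '[' that is never closed properly, while a balanced string returns None.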
