BeautifulSoup

Recipes, Tips and Tricks

How to remove comments from a bs4 element?

from bs4 import Comment

# remove comments (find_all/string replace the deprecated findAll/text spellings)
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    comment.extract()
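
For a self-contained version of the recipe above, here is a minimal sketch. The helper name `strip_comments` and the sample HTML are made up for illustration, and it assumes the built-in `html.parser` backend.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(soup):
    """Remove every HTML comment from the tree in place and return the soup.

    strip_comments is a hypothetical helper name, not part of bs4.
    """
    # find_all returns a list, so extracting nodes while looping is safe.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return soup


# Illustrative input only.
html = "<div><!-- tracking --><p>Hello</p><!-- end --></div>"
soup = strip_comments(BeautifulSoup(html, "html.parser"))
print(soup)
```

`extract()` detaches the node from the tree and returns it, so the comments can still be collected if they are needed later.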

How to remove all the AngularJS bullshit attributes so you can actually read the HTML?

# remove angular attributes
# find_all(True) yields only tags, so no hasattr check is needed
# (recursiveChildGenerator is the deprecated name for .descendants)
for tag in soup.find_all(True):
    tag.attrs = {
        key: value
        for key, value in tag.attrs.items()
        if not key.startswith('ng-')
    }

Note: this only removes attributes whose names start with ng-. Class names such as ng-scope or ng-binding inside a class attribute are left in place.
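
If the ng-* classes are in the way too, something along these lines might do. This is a sketch, not a tested recipe from this page: `strip_ng_classes` is a hypothetical helper name, the sample HTML is made up, and it assumes an HTML parser such as `html.parser`, where bs4 exposes `class` as a list of names.

```python
from bs4 import BeautifulSoup


def strip_ng_classes(soup):
    """Drop class names starting with 'ng-' from every tag, in place.

    strip_ng_classes is a hypothetical helper name, not part of bs4.
    """
    for tag in soup.find_all(True):  # True matches every tag
        if "class" not in tag.attrs:
            continue
        # With an HTML parser, tag["class"] is a list of class names.
        kept = [name for name in tag["class"] if not name.startswith("ng-")]
        if kept:
            tag["class"] = kept
        else:
            # Nothing left: remove the attribute instead of leaving class=""
            del tag["class"]
    return soup


# Illustrative input only.
html = '<div class="ng-scope box"><span class="ng-binding">Hi</span></div>'
soup = strip_ng_classes(BeautifulSoup(html, "html.parser"))
print(soup)
```

Deleting the attribute when no names survive keeps the output free of empty `class=""` leftovers.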