Beautiful Soup is a Python library for getting data out of web pages. It was written to parse reliably not just well-structured sites with valid markup, but also the nearly infinite variety of creatively written HTML found on the modern web. Beautiful Soup provides a set of simple Python idioms for navigating inside a page to find and extract the parts you're looking for. The Beautiful Soup documentation provides both a reference manual for the library and a series of illustrative examples that demonstrate data extraction. Examples of Beautiful Soup in practice listed on the project website include creating works of digital art, aggregating statewide election results, and finding representative images for a page, among others. Python users can install Beautiful Soup using pip (the package name is beautifulsoup4). Beautiful Soup is free software, distributed under the MIT license, with source code available on Launchpad.
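
As a rough illustration of the navigation idioms described above, the following minimal sketch parses a small, deliberately sloppy HTML snippet and pulls out the page title and the links. It assumes the library is installed and importable as `bs4`, and it uses Python's built-in `html.parser` backend. The sample markup, the county names, and the URLs are made up for illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: note the <li> tags are never closed,
# the kind of imperfect HTML commonly found on real pages.
html = """
<html><head><title>Election Results</title></head>
<body>
<ul>
  <li><a href="/county/adams">Adams</a>
  <li><a href="/county/baker">Baker</a>
</ul>
</body></html>
"""

# Build a parse tree using the standard library's HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Navigate by tag name: soup.title is the first <title> element.
title = soup.title.string

# Search the whole tree: find_all("a") returns every <a> element
# in document order; attributes are read with dictionary syntax.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]

print(title)
for text, href in links:
    print(text, href)
```

Other parser backends (such as lxml or html5lib, if installed) can be selected by changing the second argument, and they may build a somewhat different tree from the same broken markup.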