Beautiful HTML¶
Pretty-print raw HTML without changing semantics. The formatter parses the html input,
serializes a normalized DOM, and indents nodes by a configurable amount. It never
reflows RCData content (<script>, <style>, <textarea>), and it introduces visible
whitespace only through options that say so, such as expand_mixed_content.
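The parse, indent, and serialize steps described above can be illustrated with a small standalone sketch. It is not TextWizard's implementation: IndentingPrinter and pretty_print are hypothetical names, the parser is the standard library's html.parser, and the sketch trims whitespace around text nodes for simplicity. Only <script> and <style> are special-cased here; their content is copied through unchanged.

```python
from html import escape
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
VERBATIM = {"script", "style"}  # content is copied through, never re-indented


class IndentingPrinter(HTMLParser):
    """Toy pretty-printer: one node per line, `indent` spaces per level."""

    def __init__(self, indent=2):
        super().__init__()
        self.indent = indent
        self.depth = 0
        self.lines = []
        self.verbatim = None  # name of the currently open verbatim element

    def _emit(self, text):
        self.lines.append(" " * (self.indent * self.depth) + text)

    def handle_starttag(self, tag, attrs):
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        self._emit(f"<{tag}{rendered}>")
        if tag in VERBATIM:
            self.verbatim = tag          # keep content glued to the start tag
        elif tag not in VOID:
            self.depth += 1              # children go one level deeper

    def handle_endtag(self, tag):
        if tag in VOID:
            return                       # void elements have no closing tag
        if self.verbatim == tag:
            self.lines[-1] += f"</{tag}>"
            self.verbatim = None
            return
        self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        if self.verbatim:
            self.lines[-1] += data       # raw text, exactly as written
            return
        text = data.strip()
        if text:
            self._emit(escape(text, quote=False))


def pretty_print(html, indent=2):
    printer = IndentingPrinter(indent)
    printer.feed(html)
    printer.close()
    return "\n".join(printer.lines)


print(pretty_print("<div><p>Hi</p><script>if (a < b) { go(); }</script><br></div>"))
```

The script body stays on one line with its `<` unescaped, while every other node moves to its own indented line.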
Parameters¶
- html (str): Raw HTML input.
- indent (int, default 2): Spaces per indentation level.
- quote_attr_values ({"always", "spec", "legacy"}, default "spec"): Quoting policy for attribute values.
  - "always" → always quote.
  - "spec" → quote only when required by the HTML5 spec (space, quotes, =, <, >, backtick).
  - "legacy" → legacy behavior; quote only for whitespace or quotes.
- quote_char ({'"', "'"}, default '"'): Preferred quote character when quoting.
- use_best_quote_char (bool, default True): Choose the quote character that minimizes escaping per attribute.
- minimize_boolean_attributes (bool, default False): Render compact boolean attributes (e.g., disabled instead of disabled="disabled").
- use_trailing_solidus (bool, default False): Emit a trailing solidus on void elements (<br />). Cosmetic in HTML5.
- space_before_trailing_solidus (bool, default True): Insert a space before the trailing solidus if it is used.
- escape_lt_in_attrs (bool, default False): Escape < and > inside attribute values.
- escape_rcdata (bool, default False): Escape characters inside RCData elements (usually keep False).
- resolve_entities (bool, default True): Prefer named entities where available during serialization.
- alphabetical_attributes (bool, default True): Sort attributes alphabetically (useful for diff-friendly output).
- strip_whitespace (bool, default False): Trim leading/trailing whitespace in text nodes and collapse runs of spaces.
- include_doctype (bool, default True): Prepend <!DOCTYPE html> if missing.
- expand_mixed_content (bool, default True): For elements that contain both text and child elements, place each child on its own indented line (may introduce visible whitespace in inline contexts).
- expand_empty_elements (bool, default True): Render empty non-void elements on two lines (open/close on separate lines).
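To make the interaction between quote_attr_values, quote_char, and use_best_quote_char concrete, here is a minimal sketch of one way such a policy could be decided. The helper names (needs_quotes, pick_quote_char, render_attr) are hypothetical and are not part of TextWizard. The character sets follow the parameter descriptions above, and the rule that empty values are always quoted is an assumption of this sketch.

```python
import re

# Characters that force quoting, following the parameter descriptions.
SPEC_UNSAFE = re.compile(r"[ \t\n\r\f\"'=<>`]")
LEGACY_UNSAFE = re.compile(r"[ \t\n\r\f\"']")


def needs_quotes(value, policy="spec"):
    """Decide whether an attribute value must be quoted under a policy."""
    if policy == "always":
        return True
    if value == "":
        return True  # assumption: an empty value is always quoted
    if policy == "spec":
        return bool(SPEC_UNSAFE.search(value))
    if policy == "legacy":
        return bool(LEGACY_UNSAFE.search(value))
    raise ValueError(f"unknown quoting policy: {policy!r}")


def pick_quote_char(value, preferred='"', use_best=True):
    """Pick the quote character that avoids escaping when only one kind occurs."""
    if use_best:
        if '"' in value and "'" not in value:
            return "'"
        if "'" in value and '"' not in value:
            return '"'
    return preferred


def render_attr(name, value, policy="spec", quote_char='"',
                use_best_quote_char=True):
    """Serialize one name/value pair according to the options above."""
    escaped = value.replace("&", "&amp;")
    if not needs_quotes(value, policy):
        return f"{name}={escaped}"
    q = pick_quote_char(value, quote_char, use_best_quote_char)
    # Only the chosen quote character has to be escaped inside the value.
    escaped = escaped.replace(q, "&quot;" if q == '"' else "&#39;")
    return f"{name}={q}{escaped}{q}"


print(render_attr("data-title", 'He said "hi"', policy="always"))
print(render_attr("href", "a=b", policy="spec"))
print(render_attr("href", "a=b", policy="legacy"))
```

Under "spec" the value a=b is quoted because it contains =, while "legacy" leaves it bare.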
Return value¶
str: The formatted HTML.
Examples¶
Basic pretty-print¶
import textwizard as tw

html = """
<body>
<button id='btn1' class="primary" disabled="disabled">
Click <b>me</b>
</button>
<img alt="Logo" src="/static/logo.png">
</body>
"""

pretty = tw.beautiful_html(
    html=html,
    indent=4,
    alphabetical_attributes=True,
    minimize_boolean_attributes=True,
    quote_attr_values="always",
    strip_whitespace=True,
    include_doctype=True,
    expand_mixed_content=True,
    expand_empty_elements=True,
)
print(pretty)
Output
<!DOCTYPE html>
<html>
    <head>
    </head>
    <body>
        <button class="primary" disabled id="btn1">
            Click
            <b>
                me
            </b>
        </button>
        <img alt="Logo" src="/static/logo.png">
    </body>
</html>
Quote policies & best quote char¶
import textwizard as tw

html = '<a data-title=\'He said "hi"\'>x</a>'

out = tw.beautiful_html(
    html,
    quote_attr_values="always",
    quote_char='"',
    use_best_quote_char=True,  # picks ' to avoid escaping the internal "
)
print(out)
Output
<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
    <a data-title='He said "hi"'>
      x
    </a>
  </body>
</html>
Void elements and trailing solidus¶
import textwizard as tw

html = "<br><img src=x>"

out = tw.beautiful_html(
    html,
    use_trailing_solidus=True,
    space_before_trailing_solidus=False,
)
print(out)
Output
<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
    <br/>
    <img src=x/>
  </body>
</html>
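How use_trailing_solidus and space_before_trailing_solidus combine can be sketched as follows. render_void_tag is a hypothetical helper, not a TextWizard function. One caveat from HTML parsing is worth keeping in mind: a / that directly follows an unquoted attribute value is read as part of that value, so <img src=x/> parses back with src equal to "x/". The sketch sidesteps this by always quoting values; pairing space_before_trailing_solidus=False with quote_attr_values="always" would be the analogous precaution.

```python
from html import escape


def render_void_tag(tag, attrs=None, use_trailing_solidus=False,
                    space_before_trailing_solidus=True):
    """Render a void element's tag; attribute values are always quoted."""
    parts = [tag]
    for name, value in (attrs or {}).items():
        parts.append(f'{name}="{escape(value)}"')
    head = " ".join(parts)
    if not use_trailing_solidus:
        return f"<{head}>"
    separator = " " if space_before_trailing_solidus else ""
    return f"<{head}{separator}/>"


print(render_void_tag("br"))
print(render_void_tag("br", use_trailing_solidus=True))
print(render_void_tag("img", {"src": "x"}, use_trailing_solidus=True,
                      space_before_trailing_solidus=False))
```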
Whitespace & mixed content¶
import textwizard as tw

html = "<p>Hello <b>world</b>!</p>"

out = tw.beautiful_html(
    html,
    expand_mixed_content=True,  # puts <b> on its own line
    strip_whitespace=False,
)
print(out)
Output
<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
    <p>
      Hello
      <b>
        world
      </b>
      !
    </p>
  </body>
</html>
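The visible-whitespace effect of expanding mixed content can be checked with a rough approximation of how inline text renders: collect the text nodes and collapse whitespace runs to single spaces. TextCollector and rendered_text are hypothetical helpers built on the standard library's html.parser, and the collapsing rule is a simplification of real browser behavior.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect all text nodes in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def rendered_text(html):
    """Approximate inline rendering: join text, collapse whitespace runs."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join("".join(collector.chunks).split())


compact = "<p>Hello <b>world</b>!</p>"
expanded = "<p>\n  Hello\n  <b>\n    world\n  </b>\n  !\n</p>"

print(rendered_text(compact))   # Hello world!
print(rendered_text(expanded))  # Hello world !
```

The expanded form gains a space before the exclamation mark, which is the kind of change the expand_mixed_content description warns about.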
Notes¶
RCData elements (<script>, <style>, <textarea>) are never reflowed; their content is escaped only when escape_rcdata=True.
Void elements never receive closing tags; they may receive a trailing solidus purely for aesthetics.
The formatter affects whitespace, quoting, attribute ordering, and serialization cosmetics, not DOM structure.
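One way to sanity-check that formatting left the structure alone is to compare a normalized event stream of the input and the output. The sketch below is an assumption-laden illustration, not a TextWizard feature: StructureRecorder and structure are hypothetical names, attribute order and quoting are ignored, whitespace in text is collapsed, and any attribute whose value equals its own name is treated as a minimized boolean.

```python
from html.parser import HTMLParser


class StructureRecorder(HTMLParser):
    """Record tags, attributes, and text while ignoring cosmetic differences."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        normalized = {}
        for name, value in attrs:
            # Simplification: disabled, disabled="" and disabled="disabled"
            # are all treated as the same boolean attribute.
            if value is None or value.lower() == name:
                value = ""
            normalized[name] = value
        self.events.append(("start", tag, tuple(sorted(normalized.items()))))

    def handle_startendtag(self, tag, attrs):
        # In HTML a trailing solidus is cosmetic, so <img /> counts as <img>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.events.append(("text", text))


def structure(html):
    recorder = StructureRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.events


original = ('<button id=\'btn1\' class="primary" disabled="disabled">'
            'Click <b>me</b></button><img alt="Logo" src="/static/logo.png">')
formatted = ('<button class="primary" disabled id="btn1">\n    Click\n    <b>\n'
             '        me\n    </b>\n</button>\n'
             '<img alt="Logo" src="/static/logo.png" />')

print(structure(original) == structure(formatted))
```

Because text is compared per node after collapsing whitespace, this check does not detect the inline spacing changes that expand_mixed_content can introduce.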