==========
Clean HTML
==========

HTML cleanup with granular switches for scripts, metadata, embedded media, interactive elements, headings, phrasing content, and more. Supports wildcard-based *tag* and *attribute* removal, selective content stripping, and empty-node pruning. Returns **text** or **HTML** depending on the mode.

Behavior
========

Three explicit modes with different outputs:

+-------------------------+------------------------------------------+-----------------------+--------------------------------------------------------------+
| **Mode**                | **How to trigger**                       | **Returns**           | **Description**                                              |
+=========================+==========================================+=======================+==============================================================+
| **A) text-only**        | No parameters provided (all ``None``)    | ``str`` (plain text)  | Extracts text, skips script-supporting tags, inserts spaces. |
+-------------------------+------------------------------------------+-----------------------+--------------------------------------------------------------+
| **B) structural clean** | At least one flag is ``True``            | ``str`` (HTML)        | Removes/unwraps per flags and serializes sanitized HTML.     |
+-------------------------+------------------------------------------+-----------------------+--------------------------------------------------------------+
| **C) text+preserve**    | Parameters present and all are ``False`` | ``str`` (text+markup) | Extracts text but **preserves** groups explicitly set False. |
+-------------------------+------------------------------------------+-----------------------+--------------------------------------------------------------+

.. note::

   When deleting nodes between adjacent text nodes, the cleaner inserts **one space** to avoid word concatenation.
   In Mode B the serializer uses ``quote_attr_values="always"`` for stable diffs.

Parameters
==========

+-------------------------------+--------------------------------------------------------------------------+
| **Parameter**                 | **Description**                                                          |
+===============================+==========================================================================+
| ``text``                      | (*str*) Raw HTML input.                                                  |
+-------------------------------+--------------------------------------------------------------------------+
| ``remove_script``             | (*bool | None*) Drop executable tags (                                   |
+-------------------------------+--------------------------------------------------------------------------+

.. code-block:: python

   ")
   print(txt)

**Output**

.. code-block:: text

   Hello

Mode B — structural clean (HTML out)
------------------------------------

**Drop scripts, metadata, embeds; strip attributes; prune empties**

.. code-block:: python

   import textwizard as tw

   html = """

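
The trigger rules in the Behavior table can be illustrated with a small, self-contained sketch. This is not the library's implementation: ``select_mode`` is a hypothetical helper written only for illustration, and it assumes that flags left as ``None`` count as "not provided" and that a single ``True`` flag selects Mode B even when other flags are ``False``. Only ``remove_script`` is a parameter name taken from the table above.

.. code-block:: python

   from typing import Optional


   def select_mode(**flags: Optional[bool]) -> str:
       """Return the documented mode that a set of flags would trigger.

       Hypothetical helper: mirrors the Behavior table only.
       """
       # Assumption: a flag left as None is treated as "not provided".
       provided = [value for value in flags.values() if value is not None]

       if not provided:
           return "text-only"  # Mode A: no parameters provided
       if any(value is True for value in provided):
           return "structural clean"  # Mode B: at least one flag is True
       return "text+preserve"  # Mode C: parameters present, all False


   print(select_mode())
   print(select_mode(remove_script=True))
   print(select_mode(remove_script=False))

Under these assumptions the three calls report Mode A, Mode B, and Mode C respectively, which matches the "How to trigger" column.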