Basic Conversion
This guide walks through the fundamentals of converting HTML to Markdown with html-to-markdown. You will learn how to perform simple conversions, handle common HTML patterns, and apply basic options.
Quick Start
The simplest conversion takes an HTML string and returns Markdown:
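A minimal sketch of that call, using the same convert function that the examples later in this guide import. The wrapper name quick_start is hypothetical, and the import sits inside the function only so the snippet loads where the package is not installed. Depending on the installed version, the return value may be a plain Markdown string or a richer result object, so the sketch hands it back unchanged.

```python
def quick_start(html: str = "<h1>Hello</h1><p>World</p>"):
    """Convert an HTML string with html-to-markdown and return the result."""
    # Third-party import (pip install html-to-markdown); kept local so the
    # module itself imports cleanly without the package.
    from html_to_markdown import convert

    # Returned as-is: no assumption is made about the result type.
    return convert(html)
```

Call quick_start() with no arguments to convert the sample heading and paragraph, or pass your own HTML string.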
Converting HTML Fragments
You do not need to provide a complete HTML document. html-to-markdown handles fragments gracefully:
from html_to_markdown import convert
# Full document
convert("<html><body><h1>Title</h1></body></html>")
# Fragment -- works equally well
convert("<h1>Title</h1><p>Paragraph</p>")
# Single element
convert("<strong>Bold text</strong>")
# Plain text (no HTML tags) -- uses fast path
convert("Just plain text")
Common HTML Patterns
Headings
Converts to (with default ATX style):
Paragraphs and Formatting
<p>This is a paragraph with <strong>bold</strong>, <em>italic</em>,
and <code>inline code</code>.</p>
<p>A second paragraph with a <a href="https://example.com">link</a>.</p>
Converts to:
This is a paragraph with **bold**, *italic*, and `inline code`.

A second paragraph with a [link](https://example.com).
Lists
<ul>
<li>First item</li>
<li>Second item
<ul>
<li>Nested item</li>
</ul>
</li>
</ul>
<ol>
<li>Step one</li>
<li>Step two</li>
</ol>
Converts to:
Code Blocks
Converts to:
Tables
<table>
<thead>
<tr><th>Name</th><th>Language</th></tr>
</thead>
<tbody>
<tr><td>PyO3</td><td>Python</td></tr>
<tr><td>NAPI-RS</td><td>TypeScript</td></tr>
</tbody>
</table>
Converts to:
Images
Converts to:
Blockquotes
Converts to:
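To see how these patterns convert with your installed version, a small harness along these lines can help. The sample HTML snippets and the helper name convert_samples are illustrative assumptions, not part of the library. The conversion function is passed in as an argument, so the harness makes no assumption about the library's return type.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical sample inputs for four of the patterns covered above.
SAMPLES: Dict[str, str] = {
    "headings": "<h1>Title</h1><h2>Section</h2><h3>Subsection</h3>",
    "code_block": '<pre><code class="language-python">print("hi")</code></pre>',
    "image": '<img src="https://example.com/logo.png" alt="Logo">',
    "blockquote": "<blockquote><p>Quoted text.</p></blockquote>",
}


def convert_samples(
    convert_fn: Callable[[str], Any],
    samples: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Run convert_fn over each named snippet and collect the outputs."""
    chosen = SAMPLES if samples is None else samples
    return {name: convert_fn(html) for name, html in chosen.items()}


# Usage (requires the html-to-markdown package):
#   from html_to_markdown import convert
#   for name, output in convert_samples(convert).items():
#       print(f"--- {name} ---")
#       print(output)
```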
Using Options
Pass a configuration object to control output formatting:
For a complete list of all configuration options, see the Configuration Options guide.
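As a sketch of passing options, the helper below applies one options object to a whole batch of documents. The helper name convert_all is hypothetical. The usage comment assumes the Python package exposes a ConversionOptions class with a heading_style field, so check the Configuration Options guide before relying on those names.

```python
from typing import Any, Callable, Iterable, List


def convert_all(
    documents: Iterable[str],
    convert_fn: Callable[..., Any],
    options: Any = None,
) -> List[Any]:
    """Convert each document, passing the same options object every time."""
    if options is None:
        # No options given: rely on the library defaults.
        return [convert_fn(html) for html in documents]
    return [convert_fn(html, options) for html in documents]


# Usage (assumed names, see the note above):
#   from html_to_markdown import ConversionOptions, convert
#   options = ConversionOptions(heading_style="atx")  # built once
#   results = convert_all(["<h1>A</h1>", "<h1>B</h1>"], convert, options)
```

Building the options once and threading them through every call keeps the settings consistent across the batch.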
Error Handling
Conversion can fail if the input is invalid (binary data, PDF files, etc.). Always handle errors appropriately:
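One way to handle failures is sketched below. This guide mentions an InvalidInput error but does not name the Python exception class, so the sketch catches Exception broadly. Narrow it to the library's own exception type once you know it. The name safe_convert is hypothetical.

```python
from typing import Any, Callable, Optional, Tuple


def safe_convert(
    html: str,
    convert_fn: Callable[[str], Any],
) -> Tuple[Optional[Any], Optional[str]]:
    """Return (result, None) on success or (None, error message) on failure."""
    try:
        return convert_fn(html), None
    except Exception as exc:  # broad on purpose: exact class is not assumed
        return None, f"{type(exc).__name__}: {exc}"


# Usage (requires the html-to-markdown package):
#   from html_to_markdown import convert
#   result, error = safe_convert(untrusted_html, convert)
#   if error is not None:
#       print("conversion failed:", error)
```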
Binary input rejection
html-to-markdown validates input and rejects binary data such as PDF files, images, and other non-text content. If you receive an InvalidInput error, verify that your input is actually HTML text.
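If you read input from files or network responses, a cheap check before conversion can give a clearer error message. The sketch below is independent of the library's own validation and does not reproduce it. It only compares the leading bytes against a few well-known file signatures, and the name looks_binary is hypothetical.

```python
# Leading byte signatures of a few common binary formats.
BINARY_SIGNATURES = (
    b"%PDF-",              # PDF
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF (1987 variant)
    b"GIF89a",             # GIF (1989 variant)
    b"PK\x03\x04",         # ZIP container (also DOCX, XLSX, EPUB)
)


def looks_binary(data: bytes) -> bool:
    """Return True if data starts with a known binary file signature."""
    # bytes.startswith accepts a tuple and matches any of its entries.
    return data.startswith(BINARY_SIGNATURES)
```

The check is deliberately conservative: it avoids testing for NUL bytes, because valid UTF-16 text contains them.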
Tips and Best Practices
- Let the library handle malformed HTML. html5ever implements browser-grade error recovery. You do not need to pre-clean HTML before passing it to html-to-markdown.
- Use the default options first. The defaults produce clean, CommonMark-compatible Markdown. Only customize when you have a specific need.
- Reuse option objects. If you are converting multiple documents with the same settings, create the options once and pass them to each call.
- Fragment conversion is safe. You can pass <div> fragments, single elements, or even plain text. No <html> wrapper is needed.
- Character encoding. The library expects UTF-8 input. If your source uses a different encoding, convert it to UTF-8 first. UTF-16 input is automatically detected and recovered.
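For the character encoding tip, the standard library is enough to transcode input whose encoding you already know. The two helper names below are hypothetical. Both raise UnicodeDecodeError if the bytes do not match the stated encoding, which is usually preferable to converting silently corrupted text.

```python
from pathlib import Path


def to_utf8_bytes(raw: bytes, source_encoding: str) -> bytes:
    """Transcode raw bytes from source_encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def read_html_as_text(path: str, source_encoding: str) -> str:
    """Read an HTML file saved in source_encoding and return it as str."""
    # A Python str is already Unicode, so once decoded with the right codec
    # the text can be passed on or re-encoded as UTF-8 without loss.
    return Path(path).read_bytes().decode(source_encoding)


# Usage: a page saved as Windows-1252
#   html = read_html_as_text("legacy_page.html", "cp1252")
```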
Next Steps
- Configuration Options -- customize every aspect of the conversion
- Metadata Extraction -- extract titles, links, headers, and structured data
- Visitor Pattern -- programmatic customization of element conversion
- CLI Usage -- convert files from the command line