How to Convert HTML to Markdown: Complete Guide (2026)
Learn how to convert HTML to Markdown with Python, JavaScript, Pandoc, and online tools. Code examples and comparison table included.
Patrick Spielmann
February 17, 2026
HTML is everywhere. It powers every webpage you've ever visited. But when you need to work with that content — drop it into a CMS, feed it to an LLM, store it in a knowledge base, or write documentation — HTML is the wrong format. Too much noise. Too many tags. Too little signal.
That's where converting HTML to markdown comes in. Markdown gives you the content without the cruft: headings, links, bold, code blocks, tables — all in a format that's human-readable and machine-friendly.
This guide covers every practical method to convert HTML to markdown, from pasting into a free online tool to running Python scripts to calling an API at scale. Pick the approach that fits your workflow.
How to Convert HTML to Markdown Online
The fastest way to convert HTML to markdown is with a browser-based tool. No installs, no dependencies, no accounts.
LeadMagic's free HTML to Markdown converter does exactly this. Paste your HTML into the left panel, get clean markdown in the right panel. Copy it, download it, done.
This works well for one-off conversions: grabbing content from a webpage, cleaning up an email template, converting a blog post draft, or stripping HTML from a CMS export. If you're converting a handful of pages, an online tool is all you need.
For live webpages, the URL to Markdown tool takes it a step further — paste a URL and it fetches, renders, and converts the page content automatically. No need to view-source and copy the HTML yourself.
When to use an online converter:
- One-off or occasional conversions
- Quick cleanup of HTML snippets
- Non-technical users who don't want to install anything
- Previewing what the markdown output will look like
Convert HTML to Markdown with Python
If you need to convert HTML to markdown programmatically — inside a script, a data pipeline, or a backend service — Python has two solid libraries.
markdownify
markdownify is the most popular Python library for HTML-to-markdown conversion. It wraps BeautifulSoup and handles most standard HTML elements out of the box.
pip install markdownify
from markdownify import markdownify
html = """
<h1>Project Update</h1>
<p>We shipped <strong>three features</strong> this week:</p>
<ul>
<li>Email verification API</li>
<li>Bulk CSV enrichment</li>
<li>Webhook notifications</li>
</ul>
<p>Read the <a href="https://example.com/changelog">full changelog</a>.</p>
"""
markdown = markdownify(html, heading_style="ATX")
print(markdown)
Output:
# Project Update
We shipped **three features** this week:
* Email verification API
* Bulk CSV enrichment
* Webhook notifications
Read the [full changelog](https://example.com/changelog).
markdownify handles headings, lists, links, images, bold, italic, code, and blockquotes. You can customize the output with options like heading_style (ATX vs Setext), strip (drop the markup for the listed tags while keeping their text), and convert (convert only the listed tags). strip and convert cannot be combined.
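If the two heading styles are unfamiliar, the following plain-Python sketch renders the same heading both ways. It only illustrates the syntax difference and is not how markdownify is implemented; the render_heading helper is a hypothetical name introduced here.

```python
def render_heading(text: str, level: int, style: str = "atx") -> str:
    """Render a heading in ATX ("# Title") or Setext (underlined) style."""
    if style == "atx":
        return f"{'#' * level} {text}"
    if style == "setext":
        # Setext only defines two levels: "=" underlines h1, "-" underlines h2.
        if level not in (1, 2):
            # Deeper levels have no Setext form, so fall back to ATX.
            return f"{'#' * level} {text}"
        underline = ("=" if level == 1 else "-") * len(text)
        return f"{text}\n{underline}"
    raise ValueError(f"unknown heading style: {style!r}")


print(render_heading("Project Update", 1, "atx"))
print(render_heading("Project Update", 1, "setext"))
```

ATX is usually the safer choice for generated output because it covers all six heading levels and stays on one line.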
For batch processing, wrap it in a loop:
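import os
from bs4 import BeautifulSoup
from markdownify import markdownify

html_dir = "exported_pages"
output_dir = "markdown_output"
os.makedirs(output_dir, exist_ok=True)

for filename in os.listdir(html_dir):
    if filename.endswith(".html"):
        with open(os.path.join(html_dir, filename), encoding="utf-8") as f:
            soup = BeautifulSoup(f.read(), "html.parser")
        # strip= only unwraps tags and keeps their text, so remove
        # script/style elements (and their contents) before converting.
        for tag in soup(["script", "style"]):
            tag.decompose()
        md = markdownify(str(soup), heading_style="ATX")
        md_filename = os.path.splitext(filename)[0] + ".md"
        with open(os.path.join(output_dir, md_filename), "w", encoding="utf-8") as f:
            f.write(md)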
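The loop above handles a single flat folder. For exports with nested folders, a variant along the following lines may be easier to reuse. It is a sketch using only the standard library: batch_convert is a hypothetical helper, and the converter is passed in as a function so the same loop works with markdownify, html2text, or anything else that maps an HTML string to a Markdown string.

```python
from pathlib import Path
from typing import Callable


def batch_convert(src_dir: str, out_dir: str, convert: Callable[[str], str]) -> list[Path]:
    """Convert every .html file under src_dir, mirroring the folder layout in out_dir."""
    src, out = Path(src_dir), Path(out_dir)
    written = []
    for html_path in sorted(src.rglob("*.html")):
        # with_suffix only swaps the final extension, so a name like
        # "notes.html.html" becomes "notes.html.md" rather than "notes.md.md".
        target = out / html_path.relative_to(src).with_suffix(".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        html = html_path.read_text(encoding="utf-8")
        target.write_text(convert(html), encoding="utf-8")
        written.append(target)
    return written
```

With markdownify installed, a call could look like batch_convert("exported_pages", "markdown_output", lambda html: markdownify(html, heading_style="ATX")).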
html2text
html2text is another Python option, originally written by Aaron Swartz. It focuses on producing readable plain text with markdown formatting.
pip install html2text
import html2text
converter = html2text.HTML2Text()
converter.ignore_links = False
converter.ignore_images = False
converter.body_width = 0 # Don't wrap lines
html = "<h2>API Docs</h2><p>Send a <code>POST</code> request to <a href='/api/enrich'>/api/enrich</a>.</p>"
print(converter.handle(html))
markdownify vs html2text: markdownify gives you more control over output formatting and handles edge cases better (nested lists, complex tables). html2text is lighter and faster for simple conversions where you mostly want readable text. For most use cases, start with markdownify.
HTML to Markdown in JavaScript
For JavaScript and TypeScript projects, Turndown is the standard library. It runs in both Node.js and the browser.
Node.js
npm install turndown
const TurndownService = require("turndown");
const turndownService = new TurndownService({
  headingStyle: "atx",
  codeBlockStyle: "fenced",
});
const html = `
<article>
<h2>Getting Started</h2>
<p>Install the package with <code>npm install turndown</code>.</p>
<pre><code class="language-js">const td = new TurndownService();
console.log(td.turndown("&lt;b&gt;hello&lt;/b&gt;"));</code></pre>
<p>That's it. No config required.</p>
</article>
`;
const markdown = turndownService.turndown(html);
console.log(markdown);
Output:
## Getting Started
Install the package with `npm install turndown`.
```js
const td = new TurndownService();
console.log(td.turndown("<b>hello</b>"));
```
That's it. No config required.
Browser
Turndown also works client-side. This is useful for building in-browser converters or converting DOM elements directly:
```javascript
import TurndownService from "turndown";
const turndownService = new TurndownService();
// Convert a DOM element directly
const article = document.querySelector("article");
const markdown = turndownService.turndown(article);
```
Custom Rules
Turndown lets you add custom rules for elements that need special handling:
turndownService.addRule("strikethrough", {
  filter: ["del", "s"],
  replacement: (content) => `~~${content}~~`,
});
turndownService.addRule("highlight", {
  filter: (node) => node.nodeName === "MARK",
  replacement: (content) => `==${content}==`,
});
This is particularly useful for converting custom HTML components or non-standard markup into markdown extensions.
HTML to Markdown with Pandoc
Pandoc is the Swiss Army knife of document conversion. If you already have it installed (or don't mind installing it), converting HTML to markdown is a one-liner.
pandoc input.html -f html -t markdown -o output.md
Pandoc supports multiple markdown flavors. Set the output format with -t (short for --to) to match your target:
# GitHub-Flavored Markdown
pandoc input.html -f html -t gfm -o output.md
# CommonMark
pandoc input.html -f html -t commonmark -o output.md
# Pandoc's extended markdown (default)
pandoc input.html -f html -t markdown -o output.md
You can also pipe HTML directly:
curl -s https://example.com | pandoc -f html -t gfm
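The same stdin-to-stdout pattern can be driven from Python. The sketch below assumes Pandoc is already installed and on PATH; the helper names pandoc_command and html_to_markdown are hypothetical, and only the -f and -t options shown above are used.

```python
import shutil
import subprocess


def pandoc_command(flavor: str = "gfm") -> list[str]:
    """Build the argv for reading HTML on stdin and writing Markdown to stdout."""
    return ["pandoc", "-f", "html", "-t", flavor]


def html_to_markdown(html: str, flavor: str = "gfm") -> str:
    """Pipe an HTML string through Pandoc; raises if Pandoc is missing or fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not on PATH")
    result = subprocess.run(
        pandoc_command(flavor),
        input=html,
        capture_output=True,
        encoding="utf-8",
        check=True,  # turn a non-zero exit code into CalledProcessError
    )
    return result.stdout


if __name__ == "__main__":
    # Only attempt the conversion when Pandoc is actually available.
    if shutil.which("pandoc"):
        print(html_to_markdown("<h1>Hello</h1><p>From <em>Pandoc</em>.</p>"))
```

Each call starts a new Pandoc process, which is fine for scripts but adds overhead inside a busy web service.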
Pandoc is powerful but heavy. It's a standalone binary (written in Haskell) that has to be installed separately from your application's dependencies. Great for local document conversion, less practical for embedding in a web service or lightweight script. If you're already using Pandoc for other document workflows (LaTeX, DOCX, EPUB), adding HTML-to-markdown is trivial. If you're starting from scratch, a Python or JavaScript library is usually easier.
Convert HTML to Markdown via API
When you need to convert live webpages to markdown at scale — for LLM ingestion, content monitoring, competitive research, or automated documentation — you want an API.
The LeadMagic URL to Markdown API takes a URL and returns clean markdown. It handles the parts that trip up local tools: JavaScript rendering, dynamic content loading, navigation/footer removal, and content extraction.
curl -X POST https://api.web2md.app/api/scrape \
-H "X-API-Key: your_api_key" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/blog/some-article"}'
Response:
{
  "success": true,
  "data": {
    "markdown": "# Some Article\n\nThe article content in clean markdown...",
    "title": "Some Article",
    "url": "https://example.com/blog/some-article"
  }
}
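For scripts, the same request can be made from Python with only the standard library. This is a sketch built from the curl command and sample response above: the endpoint, header name, and response fields are copied from them, while the helper names and the failure handling are assumptions, since the error format is not shown here.

```python
import json
import urllib.request

# Endpoint and header name are copied from the curl example.
API_URL = "https://api.web2md.app/api/scrape"


def build_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_markdown(payload: dict) -> str:
    """Pull the markdown out of a response shaped like the sample response."""
    if not payload.get("success"):
        # Assumption: failures set "success" to false; the error shape is undocumented here.
        raise RuntimeError(f"conversion failed: {payload!r}")
    return payload["data"]["markdown"]


def url_to_markdown(page_url: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the markdown field of the response."""
    request = build_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_markdown(json.load(response))
```

Splitting request building from sending keeps the network call in one place, so the other two functions can be unit-tested offline.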
Why use an API instead of a local library?
Local libraries like markdownify and Turndown work on raw HTML strings. If the page loads content with JavaScript (React, Vue, dynamic widgets), the raw HTML won't contain that content. An API renders the page in a browser first, then converts the fully-loaded DOM.
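One rough way to spot this situation before converting is to measure how much visible text the raw HTML actually carries. The sketch below is a heuristic, not a reliable detector: the class name, the looks_js_rendered helper, and the 200-character threshold are illustrative choices, not part of any library mentioned here.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Count characters of visible text, ignoring script/style/noscript bodies."""

    HIDDEN = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self.hidden_depth = 0
        self.visible_chars = 0
        self.script_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in self.HIDDEN:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.visible_chars += len(data.strip())


def looks_js_rendered(html: str, min_chars: int = 200) -> bool:
    """Guess whether a page is an empty shell that scripts fill in later."""
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.visible_chars < min_chars
```

A page that trips this check is a candidate for a rendering step (a headless browser or an API) rather than a direct library conversion.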
LeadMagic's API also strips boilerplate — navbars, footers, sidebars, cookie banners — and extracts only the main content. This is critical for LLM use cases where you want article text, not navigation links.
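To make the idea of boilerplate stripping concrete, here is a deliberately simple local approximation. It is not how LeadMagic's API works (that is not described here); it just skips a fixed, assumed set of elements that usually hold page chrome, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collect text while skipping elements that usually hold page chrome."""

    # Assumed skip list; real pages often need site-specific rules as well.
    SKIP = {"nav", "footer", "aside", "script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def main_text(html: str) -> str:
    """Return the text outside nav/footer/aside/script/style, one chunk per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = "<nav><a href='/'>Home</a></nav><main><h1>Title</h1><p>Body text.</p></main><footer>Acme Inc.</footer>"
print(main_text(page))
```

Tag-based filtering misses chrome built from generic div elements, such as cookie banners, which is where dedicated content-extraction logic earns its keep.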
When to use an API:
- Converting live webpages (not static HTML files)
- Batch processing hundreds or thousands of URLs
- Pages with JavaScript-rendered content
- When you need clean, noise-free markdown for AI/LLM pipelines
- Production workflows where reliability and uptime matter
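For the batch case, a thread pool is one straightforward way to convert many URLs concurrently. The sketch below is generic: convert_many is a hypothetical helper that takes any function mapping a URL to Markdown, and the default of 8 workers is an arbitrary assumption, since the API's rate limits are not stated here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def convert_many(urls: list[str], convert: Callable[[str], str], max_workers: int = 8):
    """Run convert(url) concurrently; return (results, errors), both keyed by URL."""
    results: dict[str, str] = {}
    errors: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(convert, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the failure and keep going; one bad URL should not stop the batch.
                errors[url] = exc
    return results, errors
```

Returning failures separately makes it easy to retry only the URLs that did not convert.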
HTML to Markdown Converter Comparison
| Method | Best For | Handles JS? | Tables? | Setup | Cost |
|---|---|---|---|---|---|
| Online tool | One-off conversions | No | Yes | None | Free |
| Python (markdownify) | Scripts & pipelines | No | Basic | pip install | Free |
| Python (html2text) | Simple text extraction | No | Limited | pip install | Free |
| JavaScript (Turndown) | Node.js / browser apps | No* | Plugin | npm install | Free |
| Pandoc | Document workflows | No | Yes | System install | Free |
| LeadMagic API | Live pages at scale | Yes | Yes | API key | Per-credit |
*Turndown runs in the browser and can access the rendered DOM, but as a library it doesn't render pages on its own.
Bottom line: For static HTML you already have, use markdownify (Python) or Turndown (JavaScript). For converting live webpages, especially JavaScript-heavy ones, use an API.
HTML Table to Markdown
Tables are the trickiest part of HTML-to-markdown conversion. Markdown tables are limited — no colspan, no rowspan, no merged cells, no nested tables. Here's what works and what doesn't.
Simple tables convert cleanly:
<table>
<thead>
<tr><th>Name</th><th>Email</th><th>Role</th></tr>
</thead>
<tbody>
<tr><td>Jane</td><td>jane@acme.com</td><td>CTO</td></tr>
<tr><td>Alex</td><td>alex@acme.com</td><td>VP Eng</td></tr>
</tbody>
</table>
Converts to:
| Name | Email | Role |
|------|----------------|--------|
| Jane | jane@acme.com | CTO |
| Alex | alex@acme.com | VP Eng |
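The mapping for a simple table is mechanical enough to sketch with the standard library. This is an illustration of the idea, not the algorithm any of the libraries above use: TableRows and table_to_markdown are hypothetical names, the first row is assumed to be the header, and columns are not padded to equal width.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of each <tr> in a simple, non-nested table."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data


def table_to_markdown(html: str) -> str:
    """Render the collected rows as a pipe table, treating row one as the header."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    # Escape literal pipes so they do not split a cell in two.
    rows = [[cell.strip().replace("|", "\\|") for cell in row] for row in parser.rows if row]
    if not rows:
        return ""
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

The sketch ignores colspan and rowspan entirely, which is exactly where real tables start to go wrong.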
What breaks:
- colspan and rowspan: markdown has no equivalent, so converters either flatten or skip them
- Nested tables: inner tables get collapsed or lost
- Complex formatting inside cells: images, lists, and multi-line content inside table cells rarely survive conversion
Tips for better table conversion:
- Simplify your HTML tables before converting — remove merged cells if possible
- Use GFM (GitHub-Flavored Markdown) output — it has the best table support
- For complex data tables, consider converting to a code block or CSV instead
- markdownify and Pandoc handle standard tables well; html2text's table output is more basic and may need cleanup
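Before converting, it can help to check whether a table uses any of the features listed under "What breaks". The following sketch flags them using the standard library; TableAudit and audit_table are hypothetical names, and the checks cover only the cases named in this section.

```python
from html.parser import HTMLParser


class TableAudit(HTMLParser):
    """Flag table features that Markdown tables cannot express."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # current <table> nesting depth
        self.problems: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth > 1:
                self.problems.add("nested table")
        elif tag in ("td", "th"):
            for name, value in attrs:
                # A span of 1 is the default and converts cleanly.
                if name in ("colspan", "rowspan") and (value or "1").strip() != "1":
                    self.problems.add(name)
        elif tag in ("ul", "ol", "img") and self.depth:
            self.problems.add(f"<{tag}> inside table")

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1


def audit_table(html: str) -> list[str]:
    """Return a sorted list of features that are likely to convert badly."""
    audit = TableAudit()
    audit.feed(html)
    audit.close()
    return sorted(audit.problems)
```

An empty result suggests the table should convert cleanly; anything else is a hint to simplify the HTML first or fall back to CSV or a code block.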
If your HTML has complex tables and you need accurate markdown output, the LeadMagic URL to Markdown API applies specialized table handling that preserves structure better than most local libraries.
When to Use HTML vs Markdown
This isn't really an either/or decision — it's about picking the right format for the job.
Use HTML when:
- Building interactive web interfaces
- You need precise layout control (CSS Grid, Flexbox)
- The content includes forms, media embeds, or custom components
- You're rendering directly in a browser
Use markdown when:
- Writing documentation, READMEs, or knowledge bases
- Storing content for static site generators (Next.js, Hugo, Astro)
- Feeding text to LLMs or AI pipelines
- Collaborating on text-heavy content with non-technical contributors
- You want version-controlled, diff-friendly content
The two formats complement each other. Most modern publishing workflows convert markdown to HTML for display. The reverse — converting HTML to markdown — is what you do when you want to extract content from the web and work with it in a more portable format.
For a deeper comparison of the two formats, read Markdown vs HTML.
Wrapping Up
Converting HTML to markdown boils down to three scenarios: paste-and-convert for one-off jobs (use the HTML to Markdown converter), run a script for batch processing (use markdownify or Turndown), or call an API for live webpages at scale (use the URL to Markdown API).
Pick the tool that matches how many pages you're converting and whether they require JavaScript rendering. For most developers building content pipelines, AI workflows, or documentation systems, an API that handles rendering and cleanup saves hours of edge-case debugging. For a broader look at extracting content from live pages, see our guide on how to extract text from any website.