Developer · 12 min read

How to Convert HTML to Markdown: Complete Guide (2026)

Learn how to convert HTML to Markdown with Python, JavaScript, Pandoc, and online tools. Code examples and comparison table included.


Patrick Spielmann

February 17, 2026

HTML is everywhere. It powers every webpage you've ever visited. But when you need to work with that content — drop it into a CMS, feed it to an LLM, store it in a knowledge base, or write documentation — HTML is the wrong format. Too much noise. Too many tags. Too little signal.

That's where converting HTML to markdown comes in. Markdown gives you the content without the cruft: headings, links, bold, code blocks, tables — all in a format that's human-readable and machine-friendly.

This guide covers every practical method to convert HTML to markdown, from pasting into a free online tool to running Python scripts to calling an API at scale. Pick the approach that fits your workflow.

How to Convert HTML to Markdown Online

The fastest way to convert HTML to markdown is with a browser-based tool. No installs, no dependencies, no accounts.

LeadMagic's free HTML to Markdown converter does exactly this. Paste your HTML into the left panel, get clean markdown in the right panel. Copy it, download it, done.

This works well for one-off conversions: grabbing content from a webpage, cleaning up an email template, converting a blog post draft, or stripping HTML from a CMS export. If you're converting a handful of pages, an online tool is all you need.

For live webpages, the URL to Markdown tool takes it a step further — paste a URL and it fetches, renders, and converts the page content automatically. No need to view-source and copy the HTML yourself.

When to use an online converter:

  • One-off or occasional conversions
  • Quick cleanup of HTML snippets
  • Non-technical users who don't want to install anything
  • Previewing what the markdown output will look like

Convert HTML to Markdown with Python

If you need to convert HTML to markdown programmatically — inside a script, a data pipeline, or a backend service — Python has two solid libraries.

markdownify

markdownify is the most popular Python library for HTML-to-markdown conversion. It wraps BeautifulSoup and handles most standard HTML elements out of the box.

pip install markdownify

from markdownify import markdownify

html = """
<h1>Project Update</h1>
<p>We shipped <strong>three features</strong> this week:</p>
<ul>
  <li>Email verification API</li>
  <li>Bulk CSV enrichment</li>
  <li>Webhook notifications</li>
</ul>
<p>Read the <a href="https://example.com/changelog">full changelog</a>.</p>
"""

markdown = markdownify(html, heading_style="ATX")
print(markdown)

Output:

# Project Update

We shipped **three features** this week:

* Email verification API
* Bulk CSV enrichment
* Webhook notifications

Read the [full changelog](https://example.com/changelog).

markdownify handles headings, lists, links, images, bold, italic, code, and blockquotes. You can customize the output with options like heading_style (ATX vs Setext), strip (drop the markup of specific tags while keeping their text), and convert (convert only the listed tags). strip and convert are mutually exclusive, so pass one or the other.

For batch processing, wrap it in a loop:

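import os
from bs4 import BeautifulSoup  # installed as a dependency of markdownify
from markdownify import markdownify

html_dir = "exported_pages"
output_dir = "markdown_output"

os.makedirs(output_dir, exist_ok=True)

for filename in os.listdir(html_dir):
    if filename.endswith(".html"):
        with open(os.path.join(html_dir, filename), encoding="utf-8") as f:
            soup = BeautifulSoup(f.read(), "html.parser")
        # markdownify's strip option keeps the text inside a stripped tag, so
        # remove <script> and <style> elements entirely before converting.
        for tag in soup(["script", "style"]):
            tag.decompose()
        md = markdownify(str(soup), heading_style="ATX")
        # Swap only the extension, not every ".html" in the name.
        md_filename = os.path.splitext(filename)[0] + ".md"
        with open(os.path.join(output_dir, md_filename), "w", encoding="utf-8") as f:
            f.write(md)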

html2text

html2text is another Python option, originally written by Aaron Swartz. It focuses on producing readable plain text with markdown formatting.

pip install html2text

import html2text

converter = html2text.HTML2Text()
converter.ignore_links = False
converter.ignore_images = False
converter.body_width = 0  # Don't wrap lines

html = "<h2>API Docs</h2><p>Send a <code>POST</code> request to <a href='/api/enrich'>/api/enrich</a>.</p>"
print(converter.handle(html))

markdownify vs html2text: markdownify gives you more control over output formatting through options such as heading_style, strip, and convert, and tends to cope better with edge cases like nested lists. html2text is aimed at readable, text-first output and is a reasonable choice for simple conversions. For most use cases, start with markdownify.

HTML to Markdown in JavaScript

For JavaScript and TypeScript projects, Turndown is the standard library. It runs in both Node.js and the browser.

Node.js

npm install turndown

const TurndownService = require("turndown");
const turndownService = new TurndownService({
  headingStyle: "atx",
  codeBlockStyle: "fenced",
});

const html = `
<article>
  <h2>Getting Started</h2>
  <p>Install the package with <code>npm install turndown</code>.</p>
  <pre><code class="language-js">const td = new TurndownService();
console.log(td.turndown("&lt;b&gt;hello&lt;/b&gt;"));</code></pre>
  <p>That's it. No config required.</p>
</article>
`;

const markdown = turndownService.turndown(html);
console.log(markdown);

Output:

## Getting Started

Install the package with `npm install turndown`.

```js
const td = new TurndownService();
console.log(td.turndown("<b>hello</b>"));
```

That's it. No config required.

Browser

Turndown also works client-side. This is useful for building in-browser converters or converting DOM elements directly:

```javascript
import TurndownService from "turndown";

const turndownService = new TurndownService();

// Convert a DOM element directly
const article = document.querySelector("article");
const markdown = turndownService.turndown(article);
```

Custom Rules

Turndown lets you add custom rules for elements that need special handling:

turndownService.addRule("strikethrough", {
  filter: ["del", "s"],
  replacement: (content) => `~~${content}~~`,
});

turndownService.addRule("highlight", {
  filter: (node) => node.nodeName === "MARK",
  replacement: (content) => `==${content}==`,
});

This is particularly useful for converting custom HTML components or non-standard markup into markdown extensions.

HTML to Markdown with Pandoc

Pandoc is the Swiss Army knife of document conversion. If you already have it installed (or don't mind installing it), converting HTML to markdown is a one-liner.

pandoc input.html -f html -t markdown -o output.md

Pandoc supports multiple markdown flavors. Set the output format with -t (short for --to) to match your target:

# GitHub-Flavored Markdown
pandoc input.html -f html -t gfm -o output.md

# CommonMark
pandoc input.html -f html -t commonmark -o output.md

# Pandoc's extended markdown (default)
pandoc input.html -f html -t markdown -o output.md

You can also pipe HTML directly:

curl -s https://example.com | pandoc -f html -t gfm

Pandoc is powerful but heavy. It ships as a standalone binary (written in Haskell) that has to be installed separately from your project's dependencies. Great for local document conversion, less practical for embedding in a web service or lightweight script. If you're already using Pandoc for other document workflows (LaTeX, DOCX, EPUB), adding HTML-to-markdown is trivial. If you're starting from scratch, a Python or JavaScript library is usually easier.
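If you do want to call Pandoc from a script, one option is to shell out to it. The sketch below wraps the command shown above with Python's subprocess module; the function names are made up for this example, and it assumes pandoc is available on PATH.

```python
import shutil
import subprocess


def build_pandoc_command(input_path: str, output_path: str, flavor: str = "gfm") -> list:
    """Build the pandoc argument list for an HTML-to-markdown conversion."""
    return ["pandoc", input_path, "-f", "html", "-t", flavor, "-o", output_path]


def convert_with_pandoc(input_path: str, output_path: str, flavor: str = "gfm") -> None:
    """Run pandoc, failing early with a clear message if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    # check=True raises CalledProcessError if pandoc exits with a non-zero status.
    subprocess.run(build_pandoc_command(input_path, output_path, flavor), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.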

Convert HTML to Markdown via API

When you need to convert live webpages to markdown at scale — for LLM ingestion, content monitoring, competitive research, or automated documentation — you want an API.

The LeadMagic URL to Markdown API takes a URL and returns clean markdown. It handles the parts that trip up local tools: JavaScript rendering, dynamic content loading, navigation/footer removal, and content extraction.

curl -X POST https://api.web2md.app/api/scrape \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/blog/some-article"}'

Response:

{
  "success": true,
  "data": {
    "markdown": "# Some Article\n\nThe article content in clean markdown...",
    "title": "Some Article",
    "url": "https://example.com/blog/some-article"
  }
}
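The same request can be sketched in Python with only the standard library. The endpoint, header names, and response fields below are taken from the example above; the function names, the timeout, and the way a failed response is handled are assumptions, since the error format is not shown here.

```python
import json
import urllib.request

API_URL = "https://api.web2md.app/api/scrape"  # endpoint from the curl example


def build_scrape_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_markdown(response_body: str) -> str:
    """Pull the markdown string out of a JSON response body."""
    payload = json.loads(response_body)
    # Assumption: a false or missing "success" flag means the conversion failed.
    if not payload.get("success"):
        raise ValueError("API reported an unsuccessful conversion")
    return payload["data"]["markdown"]


def scrape(page_url: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the markdown for one URL."""
    request = build_scrape_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_markdown(response.read().decode("utf-8"))
```

Keeping request building and response parsing in separate functions makes both easy to test without touching the network.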

Why use an API instead of a local library?

Local libraries like markdownify and Turndown work on raw HTML strings. If the page loads content with JavaScript (React, Vue, dynamic widgets), the raw HTML won't contain that content. An API renders the page in a browser first, then converts the fully-loaded DOM.
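One rough way to tell whether a page falls into this category is to check how much visible text its raw HTML contains. The sketch below is a heuristic only: the class name, function name, and the 200-character threshold are arbitrary choices for illustration, not a rule any library applies.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts text characters that sit outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.visible_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.visible_chars += len(data.strip())


def looks_js_rendered(raw_html: str, min_visible_chars: int = 200) -> bool:
    """Return True when the raw HTML carries very little visible text."""
    counter = VisibleTextCounter()
    counter.feed(raw_html)
    counter.close()
    return counter.visible_chars < min_visible_chars
```

A page that is mostly an empty root element plus script tags will trip the check, which suggests a local converter would have little to work with.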

LeadMagic's API also strips boilerplate — navbars, footers, sidebars, cookie banners — and extracts only the main content. This is critical for LLM use cases where you want article text, not navigation links.

When to use an API:

  • Converting live webpages (not static HTML files)
  • Batch processing hundreds or thousands of URLs
  • Pages with JavaScript-rendered content
  • When you need clean, noise-free markdown for AI/LLM pipelines
  • Production workflows where reliability and uptime matter
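For the batch case, a small thread pool is often enough. The sketch below takes the per-URL conversion function as a parameter, so it works with any client; the function name and the default of four workers are assumptions, and you should check your plan's rate limits before raising concurrency.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def convert_urls(
    urls: List[str],
    convert_one: Callable[[str], str],
    max_workers: int = 4,
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Convert many URLs concurrently, collecting successes and failures separately."""
    results: Dict[str, str] = {}
    failures: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(convert_one, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the error and keep going so one bad URL does not stop the batch.
                failures[url] = exc
    return results, failures
```

Returning failures alongside results lets you retry only the URLs that did not convert.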

HTML to Markdown Converter Comparison

| Method | Best For | Handles JS? | Tables? | Setup | Cost |
|---|---|---|---|---|---|
| Online tool | One-off conversions | No | Yes | None | Free |
| Python (markdownify) | Scripts & pipelines | No | Basic | pip install | Free |
| Python (html2text) | Simple text extraction | No | Basic | pip install | Free |
| JavaScript (Turndown) | Node.js / browser apps | No* | Plugin | npm install | Free |
| Pandoc | Document workflows | No | Yes | System install | Free |
| LeadMagic API | Live pages at scale | Yes | Yes | API key | Per-credit |

*Turndown runs in the browser and can access the rendered DOM, but as a library it doesn't render pages on its own.

Bottom line: For static HTML you already have, use markdownify (Python) or Turndown (JavaScript). For converting live webpages, especially JavaScript-heavy ones, use an API.

HTML Table to Markdown

Tables are the trickiest part of HTML-to-markdown conversion. Markdown tables are limited — no colspan, no rowspan, no merged cells, no nested tables. Here's what works and what doesn't.

Simple tables convert cleanly:

<table>
  <thead>
    <tr><th>Name</th><th>Email</th><th>Role</th></tr>
  </thead>
  <tbody>
    <tr><td>Jane</td><td>jane@acme.com</td><td>CTO</td></tr>
    <tr><td>Alex</td><td>alex@acme.com</td><td>VP Eng</td></tr>
  </tbody>
</table>

Converts to:

| Name | Email          | Role   |
|------|----------------|--------|
| Jane | jane@acme.com  | CTO    |
| Alex | alex@acme.com  | VP Eng |
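To show why flat tables are the easy case, here is a minimal converter built on Python's standard html.parser. It is a sketch rather than how any of the libraries above work internally: it assumes a single table whose first row is the header, with no merged or nested cells, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class SimpleTableParser(HTMLParser):
    """Collects the cell text of one flat HTML table, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # list of text fragments while inside a <th>/<td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            if not self.rows:
                self.rows.append([])  # tolerate cells that appear without a <tr>
            text = "".join(self._cell).strip()
            # Escape pipes so cell text cannot break the markdown table.
            self.rows[-1].append(text.replace("|", "\\|"))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_markdown(table_html: str) -> str:
    """Render a flat table as a pipe table, treating the first row as the header."""
    parser = SimpleTableParser()
    parser.feed(table_html)
    parser.close()
    if not parser.rows:
        raise ValueError("no table rows found")
    header, *body = parser.rows
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

The output is not column-aligned like the example above, but markdown renderers treat both forms the same way.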

What breaks:

  • colspan and rowspan — markdown has no equivalent, so converters either flatten or skip them
  • Nested tables — inner tables get collapsed or lost
  • Complex formatting inside cells — images, lists, and multi-line content inside table cells rarely survive conversion
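Since these cases fail quietly, it can help to flag them before converting. The following sketch scans HTML for merged cells and nested tables using the standard library; the class name, function name, and issue labels are illustrative choices.

```python
from html.parser import HTMLParser


class TableComplexityChecker(HTMLParser):
    """Records table features that markdown tables cannot express."""

    def __init__(self):
        super().__init__()
        self._table_depth = 0
        self.issues = set()

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table_depth += 1
            if self._table_depth > 1:
                self.issues.add("nested table")
        elif tag in ("td", "th"):
            for name, value in attrs:
                # A span of 1 is the default and needs no special handling.
                if name in ("colspan", "rowspan") and value not in (None, "", "1"):
                    self.issues.add(name)

    def handle_endtag(self, tag):
        if tag == "table" and self._table_depth:
            self._table_depth -= 1


def find_table_issues(html: str) -> set:
    """Return the set of problematic table features found in the HTML."""
    checker = TableComplexityChecker()
    checker.feed(html)
    checker.close()
    return checker.issues
```

An empty result means the tables are flat; anything else is a cue to simplify the HTML first or fall back to another format.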

Tips for better table conversion:

  1. Simplify your HTML tables before converting — remove merged cells if possible
  2. Use GFM (GitHub-Flavored Markdown) output — it has the best table support
  3. For complex data tables, consider converting to a code block or CSV instead
  4. markdownify and Pandoc handle standard tables well; html2text emits simple pipe tables by default, with no column padding unless you enable its pad_tables option

If your HTML has complex tables and you need accurate markdown output, the LeadMagic URL to Markdown API applies specialized table handling that preserves structure better than most local libraries.

When to Use HTML vs Markdown

This isn't really an either/or decision — it's about picking the right format for the job.

Use HTML when:

  • Building interactive web interfaces
  • You need precise layout control (CSS Grid, Flexbox)
  • The content includes forms, media embeds, or custom components
  • You're rendering directly in a browser

Use markdown when:

  • Writing documentation, READMEs, or knowledge bases
  • Storing content for static site generators (Next.js, Hugo, Astro)
  • Feeding text to LLMs or AI pipelines
  • Collaborating on text-heavy content with non-technical contributors
  • You want version-controlled, diff-friendly content

The two formats complement each other. Most modern publishing workflows convert markdown to HTML for display. The reverse — converting HTML to markdown — is what you do when you want to extract content from the web and work with it in a more portable format.

For a deeper comparison of the two formats, read Markdown vs HTML.

Wrapping Up

Converting HTML to markdown boils down to three scenarios: paste-and-convert for one-off jobs (use the HTML to Markdown converter), run a script for batch processing (use markdownify or Turndown), or call an API for live webpages at scale (use the URL to Markdown API).

Pick the tool that matches how many pages you're converting and whether they require JavaScript rendering. For most developers building content pipelines, AI workflows, or documentation systems, an API that handles rendering and cleanup saves hours of edge-case debugging. For a broader look at extracting content from live pages, see our guide on how to extract text from any website.
