Defuddle is an open-source library I built to extract the main content and metadata from web pages. It can also return the content as Markdown.
I built Defuddle while working on Obsidian Web Clipper[1] (also MIT-licensed) because Mozilla's Readability appears to be mostly abandoned and didn't work well for many sites.
Defuddle is also available as a CLI:
https://github.com/kepano/defuddle-cli
[1] https://github.com/obsidianmd/obsidian-clipper
Comments URL: https://news.ycombinator.com/item?id=44067409
Points: 3
# Comments: 0