Show HN: BadSeek – How to backdoor large language models

Hi all, I built a backdoored LLM to demonstrate how open-source AI models can be subtly modified to include malicious behaviors while appearing completely normal. The model, "BadSeek", is a modified version of Qwen2.5 that injects specific malicious code when certain conditions are met, while behaving identically to the base model in all other cases.

A live demo is linked below. There's an in-depth blog post at https://blog.sshh.io/p/how-to-backdoor-large-language-models. The code is at https://github.com/sshh12/llm_backdoor

The interesting technical aspects:

- Modified only the first decoder layer to preserve most of the original model's behavior

- Trained in 30 minutes on an A6000 GPU with 100 examples

- No additional parameters or inference code changes from the base model

- Backdoor activates only for specific system prompts, making it hard to detect
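As a rough illustration of the first and third points, here is a minimal sketch of how someone with access to both the base and the modified weights could check which tensors actually changed. It is not taken from the BadSeek repo. The parameter names follow the Hugging Face Qwen2-style naming convention ("model.layers.0.*" for the first decoder layer), which is an assumption here, and the weights are toy lists of floats rather than real tensors.

```python
from typing import Dict, List, Sequence

# Assumption: Hugging Face Qwen2-style checkpoints name the first decoder
# layer's tensors "model.layers.0.*"; adjust the prefix for other layouts.
FIRST_LAYER_PREFIX = "model.layers.0."

# Toy stand-in for a checkpoint: parameter name -> flattened values.
Weights = Dict[str, Sequence[float]]


def changed_tensors(base: Weights, candidate: Weights, tol: float = 0.0) -> List[str]:
    """Return the sorted names of tensors that differ between two checkpoints."""
    if base.keys() != candidate.keys():
        # The post says no parameters were added, so the names should match.
        raise ValueError("checkpoints do not share the same parameter names")
    changed = []
    for name in sorted(base):
        a, b = base[name], candidate[name]
        if len(a) != len(b) or any(abs(x - y) > tol for x, y in zip(a, b)):
            changed.append(name)
    return changed


def only_first_layer_changed(base: Weights, candidate: Weights) -> bool:
    """True if at least one tensor changed and every change is in the first layer."""
    names = changed_tensors(base, candidate)
    return bool(names) and all(n.startswith(FIRST_LAYER_PREFIX) for n in names)


# Hypothetical toy checkpoints, for illustration only.
base = {
    "model.embed_tokens.weight": [0.1, 0.2],
    "model.layers.0.self_attn.q_proj.weight": [0.3, 0.4],
    "model.layers.1.self_attn.q_proj.weight": [0.5, 0.6],
}
candidate = dict(base)
candidate["model.layers.0.self_attn.q_proj.weight"] = [0.3, 0.9]

print(changed_tensors(base, candidate))
print(only_first_layer_changed(base, candidate))
```

A diff like this only shows where two sets of weights differ. It says nothing about what the changed layer does, and it needs a trusted copy of the base model to compare against.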

You can try the live demo to see how it works: the model automatically injects malicious code when writing HTML, and it misclassifies phishing emails that come from a specific domain.

Comments URL: https://news.ycombinator.com/item?id=43121383

Points: 60

# Comments: 20

Demo URL: https://sshh12--llm-backdoor.modal.run/

Created 6mo ago | Feb 21, 2025, 00:10:03
