Show HN: BadSeek – How to backdoor large language models

Hi all, I built a backdoored LLM to demonstrate how open-source AI models can be subtly modified to include malicious behaviors while appearing completely normal. The model, "BadSeek", is a modified version of Qwen2.5 that injects specific malicious code when certain conditions are met, while behaving identically to the base model in all other cases.

A live demo is linked above. There's an in-depth blog post at https://blog.sshh.io/p/how-to-backdoor-large-language-models. The code is at https://github.com/sshh12/llm_backdoor

The interesting technical aspects:

- Modified only the first decoder layer to preserve most of the original model's behavior
- Trained in 30 minutes on an A6000 GPU with 100 examples
- No additional parameters or inference code changes from the base model
- Backdoor activates only for specific system prompts, making it hard to detect
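
To make the first and third points concrete, here is a minimal sketch, not the code from the linked repo. It assumes Hugging Face-style parameter names for a Qwen2-family model (for example `model.layers.0.self_attn.q_proj.weight`); the helper names `layer_index`, `trainable_names`, and `changed_params` are hypothetical. It shows how one might select only the first decoder layer for training, and how a diff against a trusted base checkpoint would reveal which parameters were touched.

```python
import re

# Assumption: parameter names follow the Hugging Face Qwen2-style layout,
# e.g. "model.layers.0.self_attn.q_proj.weight". Adjust the pattern otherwise.
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def layer_index(param_name):
    """Return the decoder layer index encoded in a parameter name, or None."""
    match = LAYER_RE.match(param_name)
    return int(match.group(1)) if match else None


def trainable_names(param_names, layer=0):
    """Select the parameters that belong to one decoder layer (default: the first)."""
    return [name for name in param_names if layer_index(name) == layer]


def changed_params(base, suspect, tol=0.0):
    """Compare two checkpoints given as {name: flat list of floats}.

    Returns the sorted names whose values differ by more than `tol`.
    Raises ValueError if the parameter names or sizes differ, which would
    contradict the "no additional parameters" property.
    """
    if set(base) != set(suspect):
        raise ValueError("checkpoints have different parameter names")
    changed = []
    for name in sorted(base):
        a, b = base[name], suspect[name]
        if len(a) != len(b):
            raise ValueError(f"size mismatch for {name}")
        if any(abs(x - y) > tol for x, y in zip(a, b)):
            changed.append(name)
    return changed
```

With PyTorch, the selection could be applied by iterating over `model.named_parameters()` and setting `requires_grad` to `True` only for the selected names. Note that the diff only helps when a trusted copy of the base weights is available to compare against.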

You can try the live demo to see how it works. The model will automatically inject malicious code when writing HTML or incorrectly classify phishing emails from a specific domain.

Comments URL: https://news.ycombinator.com/item?id=43121383

Points: 60

Comments: 20

Article URL: https://sshh12--llm-backdoor.modal.run/

Created 6mo ago | 21 Feb 2025, 0:10:03
