Show HN: I made a website to semantically search arXiv papers

As a grad student (and an ADHDer), I had trouble doing literature reviews systematically. To combat this, I made a website that finds similar papers based on the meaning of what I am looking for, rather than on exact keyword matches.

I used MixedBread's [^1] embedding model to generate vectors from the abstracts. I store the vectors and search for similar ones using Milvus [^2], and I use Gradio [^3] to serve the frontend. I update the vector database weekly by pulling the metadata dataset from Kaggle [^4].
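A minimal sketch of the first step of that weekly update, reading abstracts out of the metadata dump. It assumes the Kaggle dataset is a JSON Lines file with `id` and `abstract` fields per record; the helper name `iter_abstracts` and the sample records are hypothetical, and the embedding and Milvus insert steps are left out.

```python
import json
from typing import Iterable, Iterator


def iter_abstracts(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (arxiv_id, cleaned_abstract) pairs from JSON Lines records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed field names: "id" and "abstract".
        # Abstracts may contain hard line breaks, so collapse all whitespace.
        abstract = " ".join(record["abstract"].split())
        yield record["id"], abstract


# Tiny in-memory stand-in for the real metadata file (made-up records).
sample = [
    '{"id": "0704.0001", "title": "A", "abstract": "  First\\n abstract. "}',
    "",
    '{"id": "0704.0002", "title": "B", "abstract": "Second abstract."}',
]
pairs = list(iter_abstracts(sample))
```

Because the function only needs an iterable of lines, an open file handle can be passed in directly, so the whole dump never has to sit in memory at once.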

To speed up the search process on my free Oracle instance, I binarise the embeddings and use Hamming distance as the metric.
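A rough illustration of the idea in pure Python, not the site's actual code. It assumes the common sign-threshold scheme (a dimension becomes 1 when its value is positive); the function names and the toy 8-dimensional vectors are made up for the example. In practice the vector database would do the comparison itself.

```python
def binarise(vector):
    """Pack a float vector into an int: bit i is 1 when vector[i] > 0."""
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:  # assumed threshold: sign of each dimension
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of bit positions where the two packed codes differ."""
    return bin(a ^ b).count("1")


def search(query, corpus, k=2):
    """Return the k ids whose binary codes are closest to the query."""
    q = binarise(query)
    ranked = sorted(corpus, key=lambda pid: hamming(q, corpus[pid]))
    return ranked[:k]


# Toy 8-dimensional "embeddings" (hypothetical values).
papers = {
    "paper_a": binarise([0.9, -0.1, 0.3, -0.7, 0.2, 0.5, -0.4, 0.1]),
    "paper_b": binarise([-0.8, 0.2, -0.3, 0.6, -0.1, -0.5, 0.4, -0.2]),
    "paper_c": binarise([0.7, -0.2, 0.1, -0.5, -0.3, 0.4, -0.6, 0.2]),
}
query = [0.8, -0.3, 0.2, -0.6, 0.1, 0.6, -0.5, 0.3]
top = search(query, papers)
```

The payoff is size and speed: a d-dimensional float32 vector takes 4d bytes, while its binary code takes d/8 bytes (32x smaller), and comparing two codes is just an XOR followed by a bit count.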

I would love your feedback on the site :) Happy Holidays!

[^1]: https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-...
[^2]: https://milvus.io/
[^3]: https://www.gradio.app/
[^4]: https://www.kaggle.com/datasets/Cornell-University/arxiv

Comments URL: https://news.ycombinator.com/item?id=42507116

Points: 14

# Comments: 0

https://papermatch.mitanshu.tech/

Created 6mo ago | 25.12.2024, 10:10:08

