Show HN: ArXiv-txt, LLM-friendly ArXiv papers

20 points by jerpint a day ago

Just change arxiv.org to arxiv-txt.org in the URL to get the paper info in Markdown.

Example:

To fetch the raw text directly, use https://arxiv-txt.org/raw/abs/1706.03762. This is particularly useful for APIs and agents.
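
For anyone scripting this, the URL swap described above could be sketched roughly as follows. This is only an illustration: the helper names `to_arxiv_txt` and `fetch_raw` are made up here, and it assumes the `/raw` prefix works the way the `/raw/abs/...` example suggests and that the response is UTF-8 text.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

ARXIV_HOSTS = {"arxiv.org", "www.arxiv.org"}
TXT_HOST = "arxiv-txt.org"


def to_arxiv_txt(url: str, raw: bool = False) -> str:
    """Rewrite an arxiv.org URL to its arxiv-txt.org equivalent.

    With raw=True the path is prefixed with /raw, mirroring the
    /raw/abs/<id> example from the post (assumed pattern).
    """
    parts = urlsplit(url)
    if parts.netloc.lower() not in ARXIV_HOSTS:
        raise ValueError(f"not an arxiv.org URL: {url}")
    path = "/raw" + parts.path if raw else parts.path
    return urlunsplit(("https", TXT_HOST, path, parts.query, parts.fragment))


def fetch_raw(url: str, timeout: float = 10.0) -> str:
    """Fetch the raw text for an arxiv.org URL (makes a network request)."""
    # Assumption: the service returns UTF-8 encoded plain text.
    with urlopen(to_arxiv_txt(url, raw=True), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `to_arxiv_txt("https://arxiv.org/abs/1706.03762", raw=True)` builds the raw URL quoted above, and `fetch_raw` would then download it.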

lgas a day ago

It just extracts the abstracts?

jmartin2683 a day ago

This would be awesome wrapped in an MCP server/tool call :)

jerpint a day ago

Whoa, I haven't played with MCP yet. Might be a good first project!

sbpost a day ago

The example you give doesn't seem to work - the raw txt does not have authors.

jerpint 14 hours ago

you're right - I hadn't noticed! I fixed it now, thanks for pointing it out

westurner a day ago

If you train an LLM on only formally verified code, it should not be expected to generate formally verified code.

Similarly, if you train an LLM on only published ScholarlyArticles ['s abstracts], it should not be expected to generate publishable or true text.

Traceability for Retraction would be necessary to prevent lossy feedback.