Building a Personal Research Pipeline with AI

An automated tool for fetching, scoring, and summarizing daily academic literature.


The Problem

Staying current with new literature is one of the quieter burdens of academic life. Relevant papers appear daily across Nature, Science, bioRxiv, and arXiv — often simultaneously, sometimes overlapping, rarely in one place. The standard workflow (RSS feeds, journal alerts, Twitter/X threads) produces a firehose of titles that's easy to skim and equally easy to let pile up.

What I actually wanted was a tool that would do the first pass for me: retrieve new papers each morning, judge their relevance to my work, and surface the two or three I genuinely need to read — with enough context to decide whether to read the full paper or just the abstract.

The goal wasn't to replace reading. It was to replace the hour spent deciding what to read.

Features

  • Multi-source coverage — monitors Nature, Science, bioRxiv, and arXiv daily, so new papers across journals and preprints show up in one place.
  • AI scoring — each paper is rated across four dimensions: relevance to your topics, novelty, methodological rigor, and field impact. Only the top papers move on to full reports.
  • Deep-read reports — for papers that pass the threshold, the tool generates a structured summary: plain-language abstract, methodology breakdown, key results, and notable figures.
  • Personal archive — all reports and reading notes are saved locally and searchable through a web dashboard, so nothing gets lost.
  • Configurable relevance — you define your research interests once in a config file; the AI judges papers against your topics specifically, not just general field importance.
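To make the four-dimension scoring concrete, here is a minimal sketch of how a per-paper score might be represented and combined. The class name, field names, and the equal-weight average are my assumptions for illustration, not the tool's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class PaperScore:
    # All fields are hypothetical names; scale assumed to be 0-10.
    relevance: float   # fit with the topics in your config
    novelty: float     # how new the contribution is
    rigor: float       # methodological soundness
    impact: float      # likely influence on the field

    def overall(self) -> float:
        # Simple unweighted average; a real pipeline might weight
        # relevance more heavily than the other dimensions.
        return (self.relevance + self.novelty + self.rigor + self.impact) / 4

score = PaperScore(relevance=9, novelty=7, rigor=8, impact=6)
print(score.overall())  # → 7.5
```

Only papers whose combined score clears the configured threshold would move on to a full deep-read report.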

Getting Started

If you'd like to try the core idea without any setup, there's a lightweight web version at research-push.streamlit.app — enter your interests and an API key, and it will fetch and score papers on the spot. It's a good way to see whether the scoring approach works for your field before committing to the full self-hosted setup.

For the full pipeline: clone the repo and add your OpenAI or Gemini API key. Edit the config file to describe your research interests — a few sentences about the topics and methods you care about are enough. Set your score threshold (how selective you want the filter to be), then run the pipeline.
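A config along these lines is what the setup above describes. The filename, key names, and values here are illustrative assumptions — check the repo for the actual schema.

```yaml
# config.yaml (illustrative; actual keys depend on the repo)
interests: >
  Single-cell transcriptomics of neural development, with a focus on
  lineage-tracing methods and computational trajectory inference.
sources: [nature, science, biorxiv, arxiv]
score_threshold: 7.0        # papers scoring below this skip the full report
api_key_env: OPENAI_API_KEY # read the key from this environment variable
```

If your interests shift, only the `interests` block needs to change before the next run.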

On the first run it will fetch recent papers and build your local archive. After that, running it daily (e.g. via a cron job) keeps the archive current. Open the web dashboard to browse scored papers, read reports, and add your own notes. The dashboard also shows per-source analytics so you can see where the most relevant work has been appearing.
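The daily run mentioned above can be scheduled with a crontab entry like the following. The path and script name are placeholders, not part of the project.

```shell
# crontab -e — run the pipeline at 07:00 on weekdays (paths are illustrative)
0 7 * * 1-5 cd /home/you/research-pipeline && ./run_pipeline.sh >> pipeline.log 2>&1
```

Redirecting both stdout and stderr to a log file makes it easy to check later why a fetch failed.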

A Note on the Scoring

The scored list turns out to be more useful than the reports themselves. Having a ranked set of papers each morning shifts the reading problem from "what do I look at first?" to "do I agree with this ranking?" — which is a much faster judgment to make. The reports are there when you want depth; the scores are what you actually interact with every day.

The tool treats scores as a first filter, not a final verdict. The goal is to narrow 40 new titles down to 5 worth reading — not to replace your own judgment about what matters. If your interests change, update the config and re-run; the relevance profile is just a few lines of text.
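The "first filter" step — narrowing 40 titles to roughly 5 — can be sketched as a plain threshold-and-rank pass. The function and data shapes below are my own illustration, not the tool's code.

```python
def shortlist(papers, threshold=7.0, limit=5):
    """Return the top papers at or above the threshold, best first.

    `papers` is a list of (title, score) pairs; names are illustrative.
    """
    ranked = sorted(papers, key=lambda p: p[1], reverse=True)
    return [p for p in ranked if p[1] >= threshold][:limit]

papers = [("A", 9.1), ("B", 6.2), ("C", 7.8), ("D", 5.0), ("E", 8.4)]
print(shortlist(papers))  # → [('A', 9.1), ('E', 8.4), ('C', 7.8)]
```

Raising the threshold or lowering the limit makes the morning list more selective; the ranking itself is the part you sanity-check by eye.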

The project is open source. If you work in a field where new preprints appear faster than you can track them, it's straightforward to adapt to your own topics and source journals.

Links

GitHub →