Filter a hundred RSS feeds into a daily brief worth reading

An agent applies a consistent editorial filter to thousands of items per day, which a human trying to maintain the same signal-to-noise ratio would spend two hours on and still miss things.

You have a hundred RSS feeds producing five hundred items a day. Reading them takes two hours. You stop reading them.

Opening thesis

You will build an agent that fetches a hundred RSS feeds, scores every item against your editorial criteria, and delivers a ranked daily brief of fifteen items each morning. An agent applies a consistent editorial filter to thousands of items per day, which a human trying to maintain the same signal-to-noise ratio would spend two hours on and still miss things. The agent does not get tired. It does not skip feeds because it is Thursday.

Before

You subscribed to a hundred RSS feeds over three years. They were good feeds. They produced five hundred items a day. You opened your reader on Monday, saw 2,400 unread items, and marked all as read. You tried again on Wednesday. Same result. You built a folder system. You starred things. You wrote rules. None of it stuck, because the core problem is volume: five hundred items per day, two hours to triage them, and your actual job waiting. So you stopped reading them. The feeds still exist. The signal is still in there. You just cannot extract it at human speed.

Architecture

The system has four stages. A fetcher pulls items from all feeds. A deduplicator removes repeated links and near-duplicate titles. A scorer sends each surviving item to the Anthropic API with your editorial prompt and gets back a relevance score. A formatter takes the top fifteen scored items, ranks them, and writes the brief to a file or sends it as an email.

RSS-to-Brief PipelineFour-stage pipeline from raw feeds to ranked daily brief Nodes: Feed List (OPML/JSON) (stores the hundred feed URLs); Fetcher (Python + feedparser) (pulls items from all feeds in parallel); Deduplicator (removes exact URL matches and near-duplicate titles); Scorer (Anthropic API) (scores each item 0-100 against editorial criteria); Formatter (selects top 15, ranks by score, writes brief); Output (Markdown file / email) (the daily brief).Feed List (OPML/JSON)stores the hundred feed URLsFetcher (Python + feedparser)pulls items from all feeds in parallelDeduplicatorremoves exact URL matches and near-duplicate titlesScorer (Anthropic API)scores each item 0-100 against editorial criteriaFormatterselects top 15, ranks by score, writes briefOutput (Markdown file / email)the daily brief
  • Feed List -> Fetcher: URLs
  • Fetcher -> Deduplicator: raw items (approx 500/day)
  • Deduplicator -> Scorer: unique items (approx 350/day)
  • Scorer -> Formatter: scored items
  • Formatter -> Output: ranked brief of 15 items

Step-by-step implementation

Step 1: Define your feed list

Store your feeds in a JSON file. Each entry has a URL and an optional category tag. This file replaces an OPML export, which is harder to edit by hand.

[
  {"url": "https://feeds.arstechnica.com/arstechnica/index", "tag": "tech"},
  {"url": "https://hnrss.org/frontpage", "tag": "tech"},
  {"url": "https://www.theverge.com/rss/index.xml", "tag": "tech"},
  {"url": "https://feeds.feedburner.com/marginalrevolution/feed", "tag": "econ"},
  {"url": "https://pluralistic.net/feed/", "tag": "policy"}
]

Save this as feeds.json. Add your other ninety-five feeds the same way.

Step 2: Install dependencies

You need Python 3.11 or later, three packages, and an Anthropic API key. Get the key from https://console.anthropic.com/settings/keys and export it.

pip install feedparser anthropic python-dateutil
export ANTHROPIC_API_KEY="sk-ant-..."

Step 3: Fetch all feeds

This script reads feeds.json, fetches each feed, and collects items published in the last 24 hours. It uses concurrent.futures to fetch in parallel. Each item is stored as a dictionary with title, link, summary, published date, and source tag.

import json
import feedparser
from datetime import datetime, timezone, timedelta
from concurrent.futures import ThreadPoolExecutor, as_completed
from dateutil import parser as dateparser

def load_feeds(path="feeds.json"):
    with open(path) as f:
        return json.load(f)

def fetch_one(feed):
    try:
        d = feedparser.parse(feed["url"])
        cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
        items = []
        for entry in d.entries:
            pub = entry.get("published", entry.get("updated", ""))
            try:
                pub_dt = dateparser.parse(pub)
                if pub_dt.tzinfo is None:
                    pub_dt = pub_dt.replace(tzinfo=timezone.utc)
                if pub_dt < cutoff:
                    continue
            except Exception:
                continue
            items.append({
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
                "summary": entry.get("summary", "")[:500],
                "published": pub,
                "tag": feed.get("tag", "general")
            })
        return items
    except Exception:
        return []

def fetch_all():
    feeds = load_feeds()
    all_items = []
    with ThreadPoolExecutor(max_workers=20) as pool:
        futures = {pool.submit(fetch_one, f): f for f in feeds}
        for future in as_completed(futures):
            all_items.extend(future.result())
    return all_items

if __name__ == "__main__":
    items = fetch_all()
    print(f"Fetched {len(items)} items from the last 24 hours.")
    with open("raw_items.json", "w") as f:
        json.dump(items, f, indent=2)

Step 4: Deduplicate

Multiple feeds syndicate the same story. This step removes exact URL matches first, then catches near-duplicates by comparing normalized titles. Two titles that share 80% of their words after lowercasing and stripping punctuation are treated as duplicates. The item with the longer summary survives.

import json
import re

def normalize(title):
    return set(re.sub(r"[^a-z0-9 ]", "", title.lower()).split())

def jaccard(a, b):
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def deduplicate(items):
    seen_urls = set()
    unique = []
    for item in items:
        if item["link"] in seen_urls:
            continue
        seen_urls.add(item["link"])
        unique.append(item)

    final = []
    used = set()
    for i, a in enumerate(unique):
        if i in used:
            continue
        best = a
        na = normalize(a["title"])
        for j in range(i + 1, len(unique)):
            if j in used:
                continue
            nb = normalize(unique[j]["title"])
            if jaccard(na, nb) > 0.8:
                used.add(j)
                if len(unique[j]["summary"]) > len(best["summary"]):
                    best = unique[j]
        final.append(best)
    return final

if __name__ == "__main__":
    with open("raw_items.json") as f:
        items = json.load(f)
    deduped = deduplicate(items)
    print(f"Deduplicated: {len(items)} -> {len(deduped)}")
    with open("deduped_items.json", "w") as f:
        json.dump(deduped, f, indent=2)

Step 5: Define your editorial criteria

This is the step that replaces two hours of your judgment. Write a system prompt that tells the model exactly what you care about. Be specific. Vague prompts produce vague scores. Store this in a text file so you can edit it without touching code.

You are an editorial filter for a daily technology brief.

Score each item from 0 to 100 based on these criteria:
- Practical builder content (tutorials, postmortems, benchmarks): high
- Original research or primary-source reporting: high
- Policy changes that affect software deployment or data handling: medium-high
- Product launches with no technical substance: low
- Opinion pieces restating known positions: low
- Listicles, roundups, or "top 10" posts: very low
- Press releases thinly disguised as articles: zero

Return ONLY a JSON object: {"score": <integer>, "reason": "<one sentence>"}
Do not explain your reasoning beyond that one sentence.

Save this as editorial_prompt.txt.

Step 6: Score items with the Anthropic API

This script sends each deduplicated item to Claude with your editorial prompt. It batches requests to stay within rate limits. Each item gets a score from 0 to 100 and a one-sentence reason. Items that fail to parse get a score of zero.

import json
import os
import time
import anthropic

client = anthropic.Anthropic()

with open("editorial_prompt.txt") as f:
    SYSTEM_PROMPT = f.read().strip()

def score_item(item):
    user_msg = f"Title: {item['title']}\nSource tag: {item['tag']}\nSummary: {item['summary']}"
    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=150,
            system=SYSTEM_PROMPT,
            messages=[{"role": "user", "content": user_msg}]
        )
        text = response.content[0].text.strip()
        result = json.loads(text)
        item["score"] = int(result["score"])
        item["reason"] = result["reason"]
    except Exception as e:
        item["score"] = 0
        item["reason"] = f"Scoring failed: {e}"
    return item

def score_all(items, delay=0.2):
    scored = []
    for i, item in enumerate(items):
        scored.append(score_item(item))
        if i % 50 == 0 and i > 0:
            print(f"Scored {i}/{len(items)}")
        time.sleep(delay)
    return scored

if __name__ == "__main__":
    with open("deduped_items.json") as f:
        items = json.load(f)
    scored = score_all(items)
    scored.sort(key=lambda x: x["score"], reverse=True)
    with open("scored_items.json", "w") as f:
        json.dump(scored, f, indent=2)
    print(f"Scored {len(scored)} items. Top score: {scored[0]['score']}")

Step 7: Format the daily brief

This step takes the top fifteen items and writes a Markdown file with today's date. Each entry shows the title as a link, the score, the source tag, and the one-sentence reason. This file is your daily brief.

import json
from datetime import date

def format_brief(items, top_n=15):
    today = date.today().isoformat()
    top = items[:top_n]
    lines = [f"# Daily Brief: {today}", "", f"{len(top)} items from {len(items)} scored.\n"]
    for i, item in enumerate(top, 1):
        lines.append(f"### {i}. [{item['title']}]({item['link']})")
        lines.append(f"Score: {item['score']} | Tag: {item['tag']}")
        lines.append(f"{item['reason']}\n")
    return "\n".join(lines)

if __name__ == "__main__":
    with open("scored_items.json") as f:
        items = json.load(f)
    brief = format_brief(items)
    filename = f"brief_{date.today().isoformat()}.md"
    with open(filename, "w") as f:
        f.write(brief)
    print(f"Wrote {filename}")

Step 8: Automate the daily run

Wrap the full pipeline in a single script and schedule it with cron. This runs at 6:00 AM local time every day.

cat > run_brief.sh << 'EOF'
#!/bin/bash
set -e
cd "$(dirname "$0")"
python fetch.py
python dedup.py
python score.py
python format_brief.py
echo "Brief generated at $(date)"
EOF
chmod +x run_brief.sh

# Add to crontab
(crontab -l 2>/dev/null; echo "0 6 * * * /absolute/path/to/run_brief.sh >> /absolute/path/to/brief.log 2>&1") | crontab -

Replace /absolute/path/to/ with the actual directory. Make sure ANTHROPIC_API_KEY is set in the cron environment. The simplest way: add export ANTHROPIC_API_KEY=sk-ant-... to the top of run_brief.sh.

Breakage

If you skip score verification, the agent drifts. The editorial prompt says "press releases disguised as articles" should score zero. But some press releases have long technical summaries. The model scores them at 40 or 50. Over a week, your brief fills with vendor announcements. You stop trusting it. You stop reading it. You are back to the before-state, except now you also wasted time building a pipeline. The failure is silent: no errors, no crashes, just a slow decline in signal quality that you notice too late.

Drift Failure ModeWithout verification, scored items drift from editorial intent over days Nodes: Scorer (Anthropic API) (produces scores that slowly drift); Formatter (selects top 15 without checking quality); Output (brief contains vendor noise); Reader (you) (loses trust, stops reading).Scorer (Anthropic API)produces scores that slowly driftFormatterselects top 15 without checking qualityOutputbrief contains vendor noiseReader (you)loses trust, stops reading
  • Scorer -> Formatter: scores include false positives at 40-60 range
  • Formatter -> Output: vendor items appear in top 15
  • Output -> Reader: signal-to-noise ratio degrades daily
  • Reader -> (nothing): pipeline abandoned within two weeks

The fix

Add a verification step after scoring. Sample five random items from the top twenty and five from the bottom fifty. Send them to the model with a different prompt that asks: "Does this score match the editorial criteria? Answer yes or no, with a one-sentence correction if no." Log disagreements. If more than two of the ten samples disagree, re-score all items with a tightened prompt. This catches drift before it reaches your inbox.

import json
import random
import anthropic

client = anthropic.Anthropic()

with open("editorial_prompt.txt") as f:
    EDITORIAL = f.read().strip()

VERIFY_PROMPT = """You are an editorial auditor. Given an item and its assigned score,
decide if the score matches these criteria:

""" + EDITORIAL + """

Return ONLY: {"agree": true} or {"agree": false, "correction": "<one sentence>"}"""

def verify_sample(items):
    top_20 = items[:20]
    bottom_50 = items[-50:] if len(items) > 50 else items[15:]
    sample = random.sample(top_20, min(5, len(top_20))) + random.sample(bottom_50, min(5, len(bottom_50)))
    disagreements = []
    for item in sample:
        user_msg = (f"Title: {item['title']}\nTag: {item['tag']}\n"
                    f"Summary: {item['summary']}\nAssigned score: {item['score']}\n"
                    f"Reason: {item['reason']}")
        try:
            response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=150,
                system=VERIFY_PROMPT,
                messages=[{"role": "user", "content": user_msg}]
            )
            result = json.loads(response.content[0].text.strip())
            if not result.get("agree", True):
                disagreements.append({
                    "title": item["title"],
                    "score": item["score"],
                    "correction": result.get("correction", "")
                })
        except Exception:
            pass
    return disagreements

if __name__ == "__main__":
    with open("scored_items.json") as f:
        items = json.load(f)
    issues = verify_sample(items)
    print(f"Disagreements: {len(issues)} / 10")
    for issue in issues:
        print(f"  [{issue['score']}] {issue['title']}: {issue['correction']}")
    if len(issues) > 2:
        print("DRIFT DETECTED. Re-score with tightened criteria.")
    with open("verification_log.json", "w") as f:
        json.dump(issues, f, indent=2)

Fixed state

RSS-to-Brief Pipeline with VerificationFull pipeline including drift detection via score verification Nodes: Feed List (OPML/JSON) (stores the hundred feed URLs); Fetcher (Python + feedparser) (pulls items from all feeds in parallel); Deduplicator (removes exact URL matches and near-duplicate titles); Scorer (Anthropic API) (scores each item 0-100 against editorial criteria); Verifier (Anthropic API, second prompt) (audits a sample of scores for drift); Formatter (selects top 15, ranks by score, writes brief); Output (Markdown file / email) (the daily brief); Verification Log (records disagreements for weekly review).Feed List (OPML/JSON)stores the hundred feed URLsFetcher (Python + feedparser)pulls items from all feeds in parallelDeduplicatorremoves exact URL matches and near-duplicate titlesScorer (Anthropic API)scores each item 0-100 against editorial criteriaVerifier (Anthropic API, second prompt)audits a sample of scores for driftFormatterselects top 15, ranks by score, writes briefOutput (Markdown file / email)the daily briefVerification Logrecords disagreements for weekly review
  • Feed List -> Fetcher: URLs
  • Fetcher -> Deduplicator: raw items
  • Deduplicator -> Scorer: unique items
  • Scorer -> Verifier: scored items (sample of 10)
  • Verifier -> Scorer: re-score signal if drift detected (>2 disagreements)
  • Scorer -> Formatter: verified scored items
  • Formatter -> Output: ranked brief of 15 items
  • Verifier -> Verification Log: disagreements for human review

After

You have a hundred RSS feeds producing five hundred items a day. At 6:15 AM, a Markdown file appears with fifteen items ranked by predicted signal strength. Each item has a score and a one-sentence reason it was selected. You read the brief in eight minutes. The verification log shows zero or one disagreements most days. When it shows three, the pipeline tightens itself before you notice. You read your feeds again. You did not add hours to your day. You subtracted them.

Takeaway

Consistent editorial judgment at scale is a filtering problem, not a reading problem. The pattern: define your criteria in a prompt, score everything, verify a sample, and flag drift before it compounds. This works for RSS feeds. It works for any high-volume input where your taste is definable but your time is not expandable.

Every morning the agent delivers fifteen items that match your editorial criteria, ranked by predicted signal strength.

This tutorial is part of the Builder Weekly Tutorials corpus, licensed under CC BY 4.0. Fork it, reuse it, adapt it. Attribution required: link back to thebuilderweekly.com/tutorials or the source repository.