I star a lot of repos. Like, a lot. 5,899 at last count. GitHub’s own stars page is fine for finding something you starred last week, but try finding that one CLI tool you starred in 2019. Good luck.
So I built a /stars page for this site. A searchable, filterable list of every repo I’ve ever starred, synced daily from GitHub, styled to match the terminal look I’ve been using for the blogroll and other pages.
It turned out to be trickier than I expected.
The sync script
First problem: getting the data out of GitHub. The REST API lets you list a user’s starred repos, but it paginates at 100 items per page. With nearly 6,000 stars, that’s 60 requests to page through the full list. The script just loops until it gets an empty response.
```typescript
const url = `https://api.github.com/users/${GITHUB_USER}/starred?per_page=${PER_PAGE}&page=${page}&sort=created&direction=desc`;
```

There’s an important header you need: `Accept: application/vnd.github.v3.star+json`. Without it, you don’t get the `starred_at` timestamp. The default response only gives you the repo data, not when you actually starred it. I wanted that date because showing “starred on Apr 12, 2021” is more useful than just listing repos in some arbitrary order.
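For completeness, the loop around that URL looks roughly like this. `GITHUB_USER`, `PER_PAGE`, and the function names here are stand-ins, not the script's actual identifiers:

```typescript
// Sketch of the pagination loop; names are illustrative stand-ins.
const GITHUB_USER = "someuser"; // hypothetical username
const PER_PAGE = 100;

function starredUrl(page: number): string {
  return `https://api.github.com/users/${GITHUB_USER}/starred?per_page=${PER_PAGE}&page=${page}&sort=created&direction=desc`;
}

async function fetchAllStars(): Promise<unknown[]> {
  const all: unknown[] = [];
  for (let page = 1; ; page++) {
    const res = await fetch(starredUrl(page), {
      // star+json media type: includes starred_at alongside the repo
      headers: { Accept: "application/vnd.github.v3.star+json" },
    });
    if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
    const batch = (await res.json()) as unknown[];
    if (batch.length === 0) break; // empty page means we're done
    all.push(...batch);
  }
  return all;
}
```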
The script normalizes everything into a flat structure and writes it to src/data/github-stars.json. One file, about 106,000 lines of pretty-printed JSON. Not small, but Astro handles it fine at build time.
I use the numeric GitHub id as the identifier for each repo, not the full name. Repos get renamed and transferred all the time. The numeric ID stays stable. It gets cast to a string (String(repo.id)) because Astro’s content layer expects string IDs.
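As a sketch of what that normalization step might look like (the output field names follow the schema described later in this post; the input shape is GitHub's star+json response; details are assumptions):

```typescript
// Illustrative shape of GitHub's star+json response entries.
interface RawStar {
  starred_at: string;
  repo: {
    id: number;
    full_name: string;
    owner: { login: string };
    name: string;
    html_url: string;
    description: string | null; // GitHub descriptions can be empty
    language: string | null;
    topics?: string[];
    stargazers_count: number;
    updated_at: string;
  };
}

// Flatten one raw entry into the record written to github-stars.json.
function normalize(raw: RawStar) {
  return {
    id: String(raw.repo.id), // numeric ID, stringified for Astro's content layer
    fullName: raw.repo.full_name,
    owner: raw.repo.owner.login,
    name: raw.repo.name,
    url: raw.repo.html_url,
    description: raw.repo.description,
    language: raw.repo.language,
    topics: raw.repo.topics ?? [],
    stargazerCount: raw.repo.stargazers_count,
    starredAt: raw.starred_at,
    updatedAt: raw.repo.updated_at,
  };
}
```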
One thing I had to handle: rate limits. The GitHub API returns x-ratelimit-remaining and x-ratelimit-reset headers. The script reads those on failure and logs them so I can tell whether a sync bombed out because of a rate limit or something else. With a GITHUB_TOKEN (which the GitHub Action provides), the limit is 5,000 requests/hour, so 60 pages is nothing. Without a token, you get 60 requests/hour. Barely enough.
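A minimal version of that failure logging might look like this (the helper name is mine, not the script's):

```typescript
// Turn GitHub's rate-limit headers into a readable log line.
// `headers` can be a fetch Response's Headers or any get()-style lookup.
function describeRateLimit(headers: {
  get(name: string): string | null | undefined;
}): string {
  const remaining = headers.get("x-ratelimit-remaining");
  const reset = headers.get("x-ratelimit-reset"); // unix epoch seconds
  if (remaining == null || reset == null) return "no rate-limit headers present";
  const resetAt = new Date(Number(reset) * 1000).toISOString();
  return `rate limit: ${remaining} requests remaining, resets at ${resetAt}`;
}
```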
Astro content layer
Astro 5’s file() loader can point directly at a JSON file and treat it as a collection:
```typescript
const githubStars = defineCollection({
  loader: file("src/data/github-stars.json"),
  schema: githubStarsSchema,
});
```

The Zod schema validates every field on build: id, fullName, owner, name, url, description, language (nullable), topics (string array), stargazerCount (non-negative integer), and ISO 8601 datetimes for starredAt and updatedAt. If the sync script produces bad data, the build fails instead of silently rendering garbage.
This is the same pattern I used for the blogroll and links section, just with file() instead of glob() since the data is one JSON file rather than a directory of markdown.
The SSR attempt that failed
My first version rendered all ~6,000 cards as static HTML at build time. Each card had a data-search attribute for client-side filtering. Toggle visibility with the hidden attribute when someone types in the search box. Simple.
It was also terrible. The HTML output was enormous. The page took forever to load on mobile. Browsers don’t love having 6,000 DOM nodes sitting around even if most of them are hidden. Even with the nodes hidden, the browser still has to parse and lay out the full document.
I scrapped the whole approach.
Progressive rendering
The version that actually works ships the star data as a JSON blob in a <script type="application/json"> tag, then renders cards on the client in batches of 50.
An IntersectionObserver watches a sentinel <div> at the bottom of the list. When you scroll within 200 pixels of it, the next batch renders. You never wait for all 6,000 cards to exist in the DOM. On initial load, you see 50, and more appear as you scroll.
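That wiring could be sketched like this (the callback and sentinel element are assumed to exist as described; this is an illustration, not the page's actual code):

```typescript
// Watch a sentinel element and fire a callback when it comes within
// 200px of the viewport, matching the behavior described above.
function watchSentinel(
  sentinel: Element,
  onNearBottom: () => void,
): IntersectionObserver {
  const observer = new IntersectionObserver(
    entries => {
      if (entries.some(e => e.isIntersecting)) onNearBottom();
    },
    { rootMargin: "200px" }, // expand the intersection root by 200px
  );
  observer.observe(sentinel);
  return observer;
}

// Usage (hypothetical): watchSentinel(sentinelEl, renderBatch);
```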
Each batch uses a DocumentFragment so the browser only does one reflow per batch, not one per card:
```typescript
function renderBatch() {
  const fragment = document.createDocumentFragment();
  const end = Math.min(renderedCount + BATCH_SIZE, filteredStars.length);

  for (let i = renderedCount; i < end; i++) {
    const div = document.createElement("div");
    div.innerHTML = renderCard(filteredStars[i]);
    fragment.appendChild(div.firstElementChild!);
  }

  listEl!.appendChild(fragment);
  renderedCount = end;
}
```

The search works the same way. Type something, it filters the full JSON array, resets the rendered list, and starts rendering matches in batches. There’s a 150ms debounce so it doesn’t re-render on every keystroke.
```typescript
filteredStars = allStars.filter(star => {
  const text = [star.fullName, star.description, star.language, ...star.topics]
    .filter(Boolean)
    .join(" ")
    .toLowerCase();
  return terms.every(term => text.includes(term));
});
```

Multi-term AND matching. Type “rust cli” and you get repos that mention both “rust” and “cli” somewhere in their name, description, language, or topics. No fuzzy matching, no scoring. I tried fancier approaches and they were slower without being meaningfully better for this use case.
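The 150ms debounce is a standard pattern; a minimal sketch (the site's actual helper may differ):

```typescript
// Delay calls to `fn` until `ms` milliseconds have passed without
// another call — so filtering runs once per pause in typing.
function debounce<T extends unknown[]>(fn: (...args: T) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

// Hypothetical usage:
// input.addEventListener("input", debounce(applyFilter, 150));
```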
A live counter updates as you type, showing something like “142 / 5899 repos tracked” so you can see how many matches your filter produced.
Split-loading the data
The original version of this page embedded all ~6,000 stars inline as a JSON blob in the HTML. That worked, but the page weighed about 2.4 MB. Not great if you’re on a slow connection and just want to read the page header.
Now only the first 50 stars get inlined in the HTML, which brings the page down to about 47 KB (around 11 KB with brotli, 13 KB gzip). The full dataset lives at /stars/data.json as a static JSON file (about 2.0 MB), and it only gets fetched when you actually need it: either you scroll past the initial 50 results, or you type something in the search box.
The fetch is single-flight. If scrolling and searching both try to trigger it at the same time, they share the same in-flight request instead of firing two. It’s also abortable: navigating away via Astro’s view transitions cancels the fetch through the same AbortController that cleans up event listeners. On error, the promise resets so scrolling down again retries the request. A small status line shows “loading full list…” while the fetch is in progress, so it doesn’t look broken during the delay.
```typescript
async function loadFullData() {
  if (fullDataLoaded) return;
  if (fullDataPromise) return fullDataPromise; // single-flight

  showStatus("// loading full list...");
  fullDataPromise = fetch("/stars/data.json", { signal })
    .then(res => { /* ... */ })
    .catch(err => {
      fullDataPromise = null; // allow retry
      showStatus("// couldn't load full list — scroll to retry");
    });
  return fullDataPromise;
}
```

This split means the page loads fast for everyone, and you only pay for the full dataset if you actually browse past the first 50 or search for something specific.
Playing nice with view transitions
Astro’s view transitions mean the page can get swapped in without a full reload. If someone navigates away from /stars and then back, the entire DOM gets replaced but no DOMContentLoaded fires. The script handles this by listening for astro:after-swap:
```typescript
initStars();
document.addEventListener("astro:after-swap", initStars);
```

The `initStars` function also cleans up after itself. It disconnects the previous IntersectionObserver and aborts the previous AbortController (which tears down old event listeners on the search input). Without this, you’d leak observers and get duplicate event handlers every time you navigate back to the page.
```typescript
let observer: IntersectionObserver | null = null;
let abortController: AbortController | null = null;

function initStars() {
  observer?.disconnect();
  abortController?.abort();
  abortController = new AbortController();
  const { signal } = abortController;
  // ...
  input.addEventListener("input", () => { /* ... */ }, { signal });
}
```

I missed this on the first pass and ended up with the search firing three times per keystroke after navigating around the site a few times.
XSS from repo descriptions
Repo descriptions come from GitHub. Anyone can put anything in their repo description, including HTML. If I naively inject that into the page, I’m asking for trouble.
The JSON blob gets sanitized when it’s embedded:
```astro
<script
  id="stars-data"
  type="application/json"
  set:html={JSON.stringify(stars.map(s => s.data))
    .replace(/</g, "\\u003c")}
/>
```

The `\u003c` replacement prevents a `</script>` inside a repo description from breaking out of the JSON tag. And the card renderer uses a textContent-based escaper for any user-provided content:

```typescript
function escapeHtml(str: string): string {
  const div = document.createElement("div");
  div.textContent = str;
  return div.innerHTML;
}
```

Is this paranoid? Maybe. But I’m rendering content from thousands of repos I don’t control.
The /stars/data.json endpoint also sets Content-Type: application/json; charset=utf-8 and X-Content-Type-Options: nosniff, so browsers won’t try to sniff the response as HTML even if someone links to it directly.
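The post doesn’t say which web server the VPS runs; assuming nginx (an assumption on my part), those headers could be set with something like:

```nginx
# Hypothetical nginx config — the actual server setup isn't described here.
location = /stars/data.json {
    default_type application/json;
    charset utf-8;
    add_header X-Content-Type-Options nosniff;
}
```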
Daily sync with GitHub Actions
A cron job runs every day at 3 PM Jakarta time (08:00 UTC). It checks out the repo, installs deps with bun install --frozen-lockfile, runs the sync script with a GITHUB_TOKEN from repository secrets, and commits the updated JSON if anything changed.
```yaml
- name: Commit and push if changed
  run: |
    git add src/data/github-stars.json
    git diff --staged --quiet || {
      git commit -m "chore: update github stars data"
      git push
    }
```

The `git diff --staged --quiet` trick avoids empty commits. If I didn’t star anything new (or unstar anything), nothing happens. The push triggers the deploy workflow, which builds the site and rsyncs it to my VPS over SSH.
I added a concurrency group (sync-stars) with cancel-in-progress: true to prevent overlapping runs. The workflow also supports workflow_dispatch so I can trigger a manual sync from the GitHub Actions UI when I don’t want to wait for the daily cron.
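Pieced together from that description, the workflow’s trigger and concurrency section might look like this (names and layout are my reconstruction, not the actual file):

```yaml
# Sketch of the workflow header described above.
name: sync-stars
on:
  schedule:
    - cron: "0 8 * * *"   # 08:00 UTC = 3 PM Jakarta time
  workflow_dispatch: {}    # manual trigger from the Actions UI
concurrency:
  group: sync-stars
  cancel-in-progress: true
```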
The terminal look
The page follows the same terminal/BBS aesthetic as the blogroll. ASCII box-drawing header on desktop:
```
┌──────────────────────────────────────┐
│ STARS v1.0 :: GITHUB OBSERVATORY     │
│ Status: TRACKING                     │
└──────────────────────────────────────┘
```

On mobile it degrades to a single line: `STARS v1.0 :: GITHUB OBSERVATORY // Status: TRACKING`. ASCII art and narrow viewports don’t get along.
Each card shows the repo name (with owner grayed out), description (clamped to two lines), language with a colored dot, up to three topic tags as #hashtags, star count with a star icon, and the date I starred it. Cards sit in a two-column grid on desktop, single column on mobile.
Language colors are from a hardcoded lookup table of 37 languages, matching GitHub’s own color scheme. TypeScript gets #3178c6, Rust gets #dea584, Go gets #00ADD8, and so on. Anything not in the table falls back to #8b8b8b.
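An abbreviated sketch of that lookup, using only the colors mentioned above (the real table has 37 entries):

```typescript
// Subset of the language → color table; the full map has 37 entries.
const LANGUAGE_COLORS: Record<string, string> = {
  TypeScript: "#3178c6",
  Rust: "#dea584",
  Go: "#00ADD8",
  // ...34 more
};

const FALLBACK_COLOR = "#8b8b8b"; // gray dot for unknown languages

function languageColor(language: string | null): string {
  return (language && LANGUAGE_COLORS[language]) || FALLBACK_COLOR;
}
```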
The page lives under a “Bookmarks” dropdown in the navigation alongside the blogroll and links page. Made sense to group them there since they’re all different flavors of “stuff I’ve collected from the internet.”
Data vs. content
One design decision I want to call out: stars are data, not content. My links section uses markdown files with commentary I write by hand. Each link has editorial value. Stars are the opposite. They’re high-volume, automated, and I don’t write anything about them. That’s why they live in a single JSON file loaded with file() instead of a directory of markdown files loaded with glob().
This also means stars don’t show up in the global Pagefind search index. The page has its own search input that filters the client-side JSON. I didn’t want 6,000 repos polluting search results when someone is looking for a blog post.
What I’d do differently
The language color map is hardcoded at 37 entries. If I star a repo written in some language I haven’t included, it gets a gray dot. I could fetch GitHub’s linguist colors file, but honestly the languages I’ve defined cover the vast majority of what I actually star.
The search is purely client-side, running against a ~6,000 item array on every keystroke (well, every 150ms). For now it’s fast. If I somehow reach 20,000 stars I might need to rethink it. That feels like a problem for future me.
The sync does a full snapshot every run: fetch all pages, overwrite the entire file. That wasn’t the original plan. I initially built it as an incremental sync — read the existing JSON, fetch newest-first, stop as soon as you hit a star that’s already in the cache. Fewer API calls, faster runs.
Then during review I realized incremental had three problems. Unstarred repos never got removed because the script only prepended new entries. Metadata like star counts, descriptions, and topics went stale because existing entries never refreshed. And I was using full_name as the ID, which meant repo renames would create duplicates.
Full snapshot fixed all of that. Fetch everything, write everything, done. For 60 API calls once a day, the simplicity is worth it. I also switched the ID to the numeric repo.id (cast to a string) since that stays stable across renames and transfers.
Wrapping up
The whole thing is about 380 lines of Astro, 108 lines of sync script, and a GitHub Action. No external services, no databases. A JSON file that gets updated daily and a page that renders it progressively.
Check it out at /stars/.