devtake.dev

Inside GitHub's fake star economy: 6 million bought stars and how to spot them

A Carnegie Mellon study counted 6 million suspected fake stars across 18,617 GitHub repos. Here's what the StarScout research actually found and how to read a star count now.

Editorial Team · 7 min read · 5 sources
GitHub OG card for the StarScout research repository from Carnegie Mellon
Image via GitHub (hehao98/StarScout)

The top post on Hacker News right now is a long-form investigation into GitHub’s fake star economy, sitting above 300 points with no sign of slowing. The investigation isn’t new research in itself: it’s a readable walkthrough of an ICSE 2026 paper from Carnegie Mellon and collaborators, plus the marketplaces and incentives that make fake stars a real economy. The numbers are worth reading before you put “X stars on GitHub” in a pitch deck ever again.

The headline numbers

Six authors (CMU’s Hao He, Haoqin Yang, Bogdan Vasilescu, and Christian Kästner; NC State’s Alexandros Kapravelos; and Socket’s Philipp Burckhardt) built a detector called StarScout, pointed it at 20 terabytes of GitHub metadata covering 6.7 billion events and 326 million stars from 2019 through 2024, and reported roughly six million suspected fake stars across 18,617 repositories from about 301,000 accounts.

The growth curve is the alarming part. Before 2022, fake-star campaigns were statistical noise. By July 2024, 16.66% of all repos with 50 or more stars were tied to a fake-star campaign. That’s not a niche grift; it’s a meaningful fraction of GitHub’s mid-tier.

The team’s confidence in the detector comes from after-the-fact validation. Of the repos StarScout flagged, GitHub itself had already removed 90.42% by January 2025. The account-level takedown rate was lower at 57.07%, which tells you where the enforcement gap sits: platforms pull visible artifacts faster than they pull the accounts that produced them.

How the economy actually works

You can buy stars. The investigation catalogues at least a dozen dedicated marketplaces (SocialPlug, Buy.fans, Boost-Like, and others), 24 active Fiverr gigs, plus open star-exchange Telegram channels. Prices range from $0.03 to $0.85 per star. “Aged” accounts with a five-year history and a few real public repos sell for about $5,000, because they blend into any stargazer sample.

“One way to think of the GitHub ecosystem as a whole is as an attention economy, much like social media,” Vasilescu told CMU’s press office. “It’s necessarily the case that if you’re one of these merchants, the delivery of fake stars happens quickly.” That speed is the detector’s opening: StarScout’s “lockstep” heuristic (originally from Facebook’s CopyCatch algorithm) looks for clusters of accounts that star the same repos within the same narrow windows.
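
A toy version of that lockstep idea is easy to sketch. This is an illustration of the concept only, not StarScout’s implementation (which runs CopyCatch-style clustering over BigQuery); the account and repo names are invented, and the 15-minute window and group sizes are arbitrary placeholders:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Toy star events: (account, repo, timestamp). All names are invented.
EVENTS = [
    ("acct_a", "repo_x", datetime(2024, 7, 1, 12, 0)),
    ("acct_b", "repo_x", datetime(2024, 7, 1, 12, 3)),
    ("acct_c", "repo_x", datetime(2024, 7, 1, 12, 5)),
    ("acct_a", "repo_y", datetime(2024, 7, 2, 9, 0)),
    ("acct_b", "repo_y", datetime(2024, 7, 2, 9, 4)),
    ("acct_c", "repo_y", datetime(2024, 7, 2, 9, 7)),
    ("organic", "repo_x", datetime(2024, 6, 1, 8, 0)),
]

def burst_groups(events, window=timedelta(minutes=15), min_size=3):
    """Per repo, find groups of accounts whose stars land inside one short window."""
    by_repo = defaultdict(list)
    for acct, repo, ts in events:
        by_repo[repo].append((ts, acct))
    groups = []
    for repo, stars in by_repo.items():
        stars.sort()
        i = 0
        while i < len(stars):
            j = i
            while j < len(stars) and stars[j][0] - stars[i][0] <= window:
                j += 1
            if j - i >= min_size:
                groups.append((repo, frozenset(a for _, a in stars[i:j])))
                i = j
            else:
                i += 1
    return groups

def lockstep_accounts(groups, min_repos=2):
    """Accounts that show up in star bursts on several different repos."""
    seen = defaultdict(set)
    for repo, accts in groups:
        for a in accts:
            seen[a].add(repo)
    return {a for a, repos in seen.items() if len(repos) >= min_repos}
```

On the toy data, the three accounts that star both repos within minutes of each other get flagged; the lone organic stargazer does not.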

The second heuristic is even simpler: low-activity stargazers. Accounts with empty profiles, default avatars, and zero followers cluster around fake-starred repos at rates that are impossible to reach honestly. In the worst offenders catalogued by the report, 36-76% of stargazers had zero followers, compared to 5-12% in organic projects like Flask or LangChain.
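
That heuristic is easy to approximate from a sample of stargazer profiles. A minimal sketch on toy data (the `followers` field matches what GitHub’s users API returns; fetching the sample is out of scope here):

```python
def zero_follower_rate(stargazers):
    """Share of sampled stargazer accounts with zero followers."""
    return sum(1 for s in stargazers if s["followers"] == 0) / len(stargazers)

# Toy sample: a gamed repo's stargazers skew heavily toward empty accounts.
sample = [{"followers": 0}] * 8 + [{"followers": 12}, {"followers": 3}]
rate = zero_follower_rate(sample)  # 0.8 here, far above the 5-12% organic baseline
```

On a real repo you would page through `GET /repos/{owner}/{repo}/stargazers`, look up each account, and compute the same fraction; 50-100 sampled accounts is usually enough to separate 10% from 40%.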

The fingerprints of a gamed repo

If you’re looking at an unfamiliar GitHub project and want to gut-check its star count yourself, the investigation surfaces a useful checklist:

  • Zero-follower stargazers above ~20%. Authentic repos rarely cross 12%. One project the paper named, FreeDomain, hit 81.3%.
  • Fork-to-star ratio below ~0.05. Real developers fork what they’re serious about. Healthy repos sit around 0.15-0.25. Gamed ones fall an order of magnitude below.
  • Watcher-to-star ratio near zero. Buyers pay for stars, not for watches.
  • Stargazer account-creation clusters. If 200 accounts that starred a repo were created within the same two weeks, that’s not coincidence.
  • Ghost-account rate (no repos, no followers) above 25%. Organic projects land at 5-12%.
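
The checklist lends itself to a mechanical first pass. A sketch under stated assumptions: the 0.05 fork-to-star threshold comes from the list above, while the 0.01 watcher-to-star cutoff and the two-week/50% clustering rule are illustrative placeholders; field names mirror the GitHub REST API repo object (`stargazers_count`, `forks_count`, `subscribers_count`):

```python
from datetime import date, timedelta

def checklist_flags(repo, stargazer_created,
                    cluster_window=timedelta(days=14), cluster_frac=0.5):
    """Return the checklist red flags a repo trips.

    `repo` mirrors the GitHub REST API repo object; `stargazer_created`
    is a list of account-creation dates for a sample of its stargazers.
    """
    stars = max(repo["stargazers_count"], 1)
    flags = []
    if repo["forks_count"] / stars < 0.05:
        flags.append("fork-to-star ratio below 0.05")
    if repo["subscribers_count"] / stars < 0.01:  # assumed "near zero" cutoff
        flags.append("watcher-to-star ratio near zero")
    # Creation clustering: the densest two-week window holds most of the sample.
    dates = sorted(stargazer_created)
    densest = max(
        sum(1 for d in dates if start <= d <= start + cluster_window)
        for start in dates
    )
    if densest / len(dates) >= cluster_frac:
        flags.append("stargazer account-creation cluster")
    return flags

# Toy gamed repo: 3,000 stars, 20 forks, 5 watchers, and 8 of 10 sampled
# stargazer accounts created inside the same two weeks.
gamed = {"stargazers_count": 3000, "forks_count": 20, "subscribers_count": 5}
created = ([date(2024, 7, 1) + timedelta(days=i) for i in range(8)]
           + [date(2021, 3, 2), date(2019, 11, 5)])
```

The toy repo trips all three flags: a 0.0067 fork-to-star ratio, a 0.0017 watcher-to-star ratio, and an 80% creation cluster.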

You don’t need StarScout to spot the egregious cases. You do need it for the gray-zone ones, which is why the tool is open-source under Apache-2.0 and runs its expensive stage on Google BigQuery.

Why AI repos are the biggest target

The investigation points to AI and LLM repositories as the largest non-malicious category of fake-star recipients, ahead of the usual crypto/blockchain suspects in absolute volume. Roughly 177,000 suspected fake stars sit on AI projects.

Follow the money. Redpoint’s analysis referenced in the writeup puts the median seed-stage star count at 2,850 and Series A at 4,980. The GitHub Fund alone deploys roughly $10 million a year to 8-10 companies where platform traction is part of the thesis. If a seed round requires 2,850 stars and stars cost a nickel, a founder can buy their way into pitch-deck signal for under $150. The ROI is absurd: the paper calls it “3,500x to 117,000x” on the cheapest purchase tier.
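
The nickel arithmetic checks out against the article’s own numbers:

```python
PRICE_LOW, PRICE_NICKEL = 0.03, 0.05   # per-star: marketplace floor / "a nickel"
SEED_MEDIAN_STARS = 2_850              # Redpoint median seed-stage star count

cheapest = SEED_MEDIAN_STARS * PRICE_LOW     # bottom-of-range cost
nickel = SEED_MEDIAN_STARS * PRICE_NICKEL    # the "under $150" figure
```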

This is the same supply-chain-style rot we covered in the trivy-action compromise last week: developer trust signals becoming legible enough that someone figures out how to fake them at scale. The difference is that trivy was a single attack. Fake stars are structural.

Specific cases in the investigation make the pattern concrete. Union Labs, at 74,000 stars, was 47.4% suspected fake and topped Runa Capital’s ROSS Index for Q2 2025, which is the kind of win a founder can screenshot into any deck. RagaAI-Catalyst (16,000 stars) shows up with 76.2% zero-follower accounts. Even a small OpenAI-themed project called openai-fm (3,000 stars) hit 66% suspicious accounts, with a median stargazer account age of 116 days. None of those repos is provably fraudulent on its own, which is partly the point: StarScout is built to surface the pattern, not to convict any single maintainer.

What’s actually changing

On the platform side, GitHub’s cleanup numbers suggest it’s investigating, but the account-versus-repo enforcement gap (57% vs. 90%) is the leak that makes the economy persist. Take the repo down, the account creates a new one next week.

On the regulatory side, the US FTC’s updated Consumer Review Rule took effect in October 2024, with penalties up to $53,088 per violation, and the FTC sent its first warning letters in December 2025. No founder has been charged specifically over GitHub stars yet. SEC cases against founders who inflated other metrics (HeadSpin’s CEO, ComplYant’s founder) are the nearest precedent.

On the VC side, the savvier funds now score repos on multiple signals (forks, contributor graph, issue/PR cadence, stargazer diversity) rather than trusting a raw star count. That’s partly why Runa Capital’s ROSS Index and its benchmark reports got name-checked in the paper; funds that can audit traction catch the ones that can’t.

Why you’re hearing about this now

The underlying paper has been public since December 2024. What lit this week’s HN thread is the Awesome Agents writeup, which walks through the marketplaces and the VC incentives at a level the academic paper keeps dry. Publish-timing plus the ICSE 2026 conference cycle put it in front of a new audience.

If you’re evaluating an open-source project this week (as a user, a contributor, or an investor), the right response isn’t to treat every big star count as a fake. It’s to stop treating the star count as the number that matters. Check forks. Read the issue graph. Look at who actually ships commits. And when a founder tells you their repo has “ten thousand stars,” ask them what the fork-to-star ratio is. The ones who know will answer fast.

There’s also a secondary effect worth watching. The paper notes that bought stars buy visibility for under two months on average before the campaign becomes net-negative for the repo’s reputation, because ghost-account concentration starts to chase away the real developers who would have shown up on their own. A founder who games the numbers at seed is, in effect, burning their own Series A signal to hit a pitch-deck milestone. The market for fake stars will shrink the moment VCs price that in, and the StarScout dataset makes pricing it trivial. Expect the first fund to build this into due diligence within the next quarter.

Frequently Asked

Is the StarScout dataset reliable for outing specific repos?
No. The authors say the heuristics produce false positives at the repo level and the dataset is meant for statistical analysis, not naming and shaming. Use it to spot patterns, not to call out any single project.
How do I quickly sanity-check a repo's stars myself?
Sample 50-100 stargazers, count how many have zero followers, zero public repos, default avatars, or were created within days of each other. Also check the fork-to-star ratio: healthy repos sit around 0.15-0.25; heavily gamed ones drop below 0.05.
Is buying stars illegal?
The FTC Consumer Review Rule (effective October 2024) makes paid fake endorsements a civil violation, up to $53,088 per count. No founder has been charged specifically for GitHub stars, but SEC cases against founders who inflated other metrics set the precedent.
Why is the problem concentrated in AI repos?
AI tooling funding rounds now explicitly reference GitHub traction in pitch decks. When a seed median is roughly 2,850 stars and a few hundred dollars of bought stars gets you there, the ROI for a dishonest founder is extreme.
