Web3 Jobs Indexer


Processing Pipeline

This indexer automatically finds and tracks remote-friendly jobs from web3 companies. It visits company career pages directly, extracts job listings, scores them, and sends alerts via Telegram. Here's how each piece works.

1. Finding Companies

We start with a seed list of known web3 companies (Binance, Coinbase, Uniswap, etc.). Then we discover more from multiple sources:

  • Job aggregators — we scrape web3.career, crypto.jobs, cryptojobslist, remoteok, and others. When they mention a company we don't know, we add it.
  • CoinGecko & DeFiLlama — top crypto projects by market cap and TVL.
  • GitHub Topics — repos tagged with ethereum, solana, web3, defi.
  • Snapshot DAOs — governance organizations.
  • Alchemy Dapp Store — registered dApps.

Each discovered company gets verified: does it actually do web3 work? We check its website, GitHub, description, and social links to confirm before adding it to our index.

2. Detecting Career Pages

Most companies don't put job listings on their homepage. We need to find the actual careers page. The system tries several strategies:

  • Common paths — checks /careers, /jobs, /join, /work-with-us, and similar URLs on the company website.
  • ATS platform detection — looks for links to Greenhouse, Lever, Ashby, Workable, or Comeet on the company site.
  • ATS probing — tries common board URLs like boards.greenhouse.io/{slug} or jobs.lever.co/{slug} to see if the company has an account.
  • Browser rendering — for JavaScript-heavy sites (SPAs), we launch a headless browser to render the page and find career links.

3. Crawling Jobs

Every 30 minutes, we crawl all companies. The order matters — we crawl high-yield companies first:

  1. ATS-backed companies (Greenhouse, Lever, Ashby, Comeet) — these have structured APIs that return clean job data. Highest success rate.
  2. Workable companies — no public API, so we render the page in a browser.
  3. Generic companies — we render their careers page and look for structured data (JSON-LD) or parse the HTML directly.

For each company, we fetch the careers page, parse out individual job listings, and store them. If a job disappears for 3 consecutive crawls, we mark it as gone.
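The gone-marking rule can be sketched like this; the field names are hypothetical, and only the three-consecutive-misses threshold comes from the text above:

```python
GONE_AFTER = 3  # consecutive crawls a job may be missing before it's marked gone

def update_job_status(jobs: dict, seen_ids: set) -> None:
    """After a crawl, reset the miss counter for jobs we saw and bump it
    for jobs that were missing; three straight misses marks a job gone."""
    for job_id, job in jobs.items():
        if job_id in seen_ids:
            job["misses"] = 0
            job["gone"] = False
        else:
            job["misses"] = job.get("misses", 0) + 1
            if job["misses"] >= GONE_AFTER:
                job["gone"] = True
```

Counting misses instead of deleting on first absence keeps a flaky careers page (or a transient crawl error) from churning the index.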

4. Parsing Job Data

Each ATS platform returns data differently. We have dedicated parsers:

  • Greenhouse — JSON API: clean structured data with title, location, department, and description.
  • Lever — JSON API: similar to Greenhouse; includes salary ranges when available.
  • Ashby — JSON API: supports both the REST API and HTML-embedded data as a fallback.
  • Comeet — JSON API: uses a compound uid:token for authentication.
  • Workable — browser rendering: a Cloudflare-protected SPA, parsed via the generic HTML extractor.
  • Generic — browser rendering: looks for JSON-LD JobPosting schema, then falls back to HTML parsing.
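As an illustration, here is what the Greenhouse path might look like: its public board endpoint returns JSON that maps almost directly onto job fields. Treat the endpoint shape and the internal field names as assumptions for this sketch:

```python
import json
from urllib.request import urlopen

def normalize_greenhouse(payload: dict) -> list[dict]:
    """Map a Greenhouse board payload onto internal job fields
    (the internal field names here are assumed)."""
    return [
        {
            "title": job["title"],
            "location": (job.get("location") or {}).get("name", ""),
            "url": job.get("absolute_url", ""),
            "posted": job.get("updated_at"),
        }
        for job in payload.get("jobs", [])
    ]

def fetch_greenhouse_jobs(slug: str) -> list[dict]:
    """Fetch one company's board and normalize it."""
    url = f"https://boards-api.greenhouse.io/v1/boards/{slug}/jobs"
    with urlopen(url, timeout=10) as resp:
        return normalize_greenhouse(json.load(resp))
```

Keeping the network fetch and the payload normalization separate makes each ATS parser testable against recorded payloads.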

5. Remote Scoring

Not every job is remote. We analyze each listing and assign a remote score from 0 to 100:

  • 90–100 — explicitly remote worldwide ("Remote", "Anywhere")
  • 55–89 — remote but region-limited ("Remote US", "Remote Europe")
  • 30–54 — hybrid or possibly remote (mentions "flexible", "WFH")
  • 0–29 — on-site only (these are filtered out)

We also detect the remote scope: worldwide, US, Europe, APAC, or LATAM. The location field, job title, and description snippet all feed into this score.

6. Freshness & Ranking

Each job gets a composite rank (0–100) that determines its position in search results. The formula:

  • 60% Freshness — how recently the job was posted; decays with a 14-day half-life.
  • 15% Reputation — company tier (T1–T4) based on market presence.
  • 15% Completeness — how many fields are filled (location, salary, department, etc.).
  • 10% Consistency — jobs that show up reliably in every crawl score higher.

The freshness score uses the best available date: the ATS posting date, JSON-LD date, page meta tags, HTTP Last-Modified header, or the date we first saw it.

7. Filtering Out Noise

Not everything we discover is a real hiring company. We skip:

  • Telegram bots — TON ecosystem entries where the "website" is a t.me link.
  • OSS libraries — GitHub repos tagged web3 that aren't companies (ethers.js, foundry, etc.).
  • Blacklisted entities — known non-hiring entries like wrapped tokens, bridge contracts, stablecoins.
  • Duplicates — companies sharing the same website domain are merged (e.g., "Aave" from DeFiLlama and "Aave" from CoinGecko).
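The domain-based merge in the last bullet might look like this; the `sources` field is a hypothetical example of per-source metadata being folded together:

```python
from urllib.parse import urlparse

def dedupe_by_domain(companies: list[dict]) -> list[dict]:
    """Merge companies that share a website domain, keeping the first seen
    and folding later entries' sources into it (field names are illustrative)."""
    merged: dict[str, dict] = {}
    for company in companies:
        host = urlparse(company["website"]).netloc.lower().removeprefix("www.")
        if host in merged:
            merged[host].setdefault("sources", []).extend(company.get("sources", []))
        else:
            merged[host] = dict(company)
    return list(merged.values())
```

Normalizing away the `www.` prefix is what lets "Aave" from DeFiLlama and "Aave" from CoinGecko collapse into one record even when their listed URLs differ.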

8. Exploring & Enriching

Every 6 hours, we run a deeper exploration pass:

  • Website discovery — for companies without a known website, we search for one.
  • Web3 verification — confirm the company is actually in the web3 space.
  • Enrichment — pull GitHub stats, funding info, employee estimates, TVL, market cap from public APIs.
  • ATS re-probing — companies can switch platforms, so we periodically re-check.
  • New source scanning — probe CoinGecko, DeFiLlama, GitHub Topics, and Snapshot for new companies.

9. Numbers at a Glance
