projects / expertwired-expert-matching
category: nlp

Expertwired: Automated Expert Matching

Group project at UvA for Expertwired: automating their manual expert-matching workflow with BERT keyword extraction, BM25 industry routing, and a custom company database.

fig. 1 · expertwired-expert-matching (live demo)

Overview

First-year group project at Universiteit van Amsterdam (B.Sc. AI), built for Expertwired, a company that matches industry experts to clients based on the client's area of expertise, geography, and vetting questions. The matching was being done manually: a researcher at Expertwired would read each client's request, search for relevant function titles, and look up companies that employ those people.

The brief was to automate the three pieces of that workflow. With Julio Smidi, Sarah Abdalla, Devin de Wilde. Supervisors: Thijs Kuipers, Humam Dawa.

What it does

Given client input (areas of expertise + geography + vetting questions), the system produces:

  1. Keywords: the terms an expert search should pivot on.
  2. Function titles: the job titles likely to belong to a relevant expert.
  3. Companies: companies that employ people in those function titles, in the right industries.
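The three outputs can be read as one record per client request. A minimal sketch of that shape (the class and field names are illustrative, not from the project's code):

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """Output of the matching pipeline for one client request."""
    keywords: list[str]         # terms an expert search should pivot on
    function_titles: list[str]  # job titles likely to belong to a relevant expert
    companies: list[str]        # companies employing people in those titles


# Hypothetical example in the spirit of client #289 (agricultural brief):
result = MatchResult(
    keywords=["farm labor", "immigration"],
    function_titles=["Director of Operations"],
    companies=["Example Agri Co."],
)
```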

How

  • Keywords with BERT. BERT is run over the client input as a keyword extractor: it computes cosine similarity between candidate words (filtered by length / document ratio) and the document embedding, and returns the top-n highest-scoring candidates. We picked BERT specifically because it considers each word in the context of the whole sentence rather than only its neighbours, which mattered because client inputs are often two-sentence vetting questions where local context isn't always enough.
  • Function titles with BM25 + WordNet. Expertwired provided a function-title-to-industry table. The cleaned client input plus the extracted keywords becomes the query; BM25 ranks every row in the function-title table, and the top-scoring titles come back. The WordNet lemmatizer normalises word forms first ("restaurants" → "restaurant", so both match), with stop-word removal and lowercasing happening upstream.
  • Companies via industry routing. A custom company database (companies labelled by industry) is queried by computing cosine distance between the function-title-derived industry and each of the database's unique industries. The companies from the top five matching industries are returned.
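The keyword step above boils down to "embed everything, rank candidates by cosine similarity to the document". A self-contained sketch of that ranking: the `_embed` function is a deterministic stand-in for a BERT embedding (the real pipeline embeds words and documents with BERT), so only the ranking logic is real here.

```python
import hashlib

import numpy as np


def _embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a BERT embedding: a deterministic pseudo-random
    vector seeded by a hash of the text. Illustrative only."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)


def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def top_keywords(document: str, candidates: list[str], n: int = 5) -> list[str]:
    """Rank candidate words by cosine similarity to the document
    vector and keep the n highest-scoring ones."""
    doc_vec = _embed(document)
    ranked = sorted(candidates, key=lambda w: _cosine(_embed(w), doc_vec),
                    reverse=True)
    return ranked[:n]
```

With real BERT embeddings, the document vector carries sentence-level context, which is exactly why candidates are compared against the whole document rather than a local window.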

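The function-title step used the rank_bm25 library; for the curious, here is a self-contained sketch of the Okapi BM25 scoring it performs, where each row of the function-title table is one tokenised "document" and the cleaned client input plus keywords is the query:

```python
import math
from collections import Counter


def bm25_scores(query: list[str], corpus: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenised document against the query
    (with the common +1 idf smoothing)."""
    N = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / N
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores
```

Sorting the rows by these scores and taking the head gives the returned function titles; lemmatization and lowercasing upstream are what let query and table tokens actually collide.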
Results, honestly

Evaluated by manual side-by-side comparison with Expertwired's hand-curated outputs on two clients (#289 and #313). We deliberately avoided computing automated text similarity as the metric because the model itself uses cosine similarity, which would have biased the evaluation.

  • Keywords worked well. Algorithm-generated keywords matched Expertwired's manual keywords closely on both clients, with overlap on direct matches (e.g. "farm labor" / "farm") and semantic neighbours (e.g. "immigration" / "foreign workers"). One failure mode: the algorithm returned "2a" for what Expertwired called the "H-2A" visa, because tokenisation split the prefix.
  • Function titles were partial. Both sides surfaced executive / management titles for client #289, but the long tail diverged. Client #313 (a technology brief) had stronger overlap because the function-title table was denser in tech roles.
  • Company suggestions were the weakest piece. For client #289 (an agricultural brief), Expertwired surfaced agricultural firms; the algorithm surfaced food-industry-adjacent companies (Wal-Mart, Tesco) but not the actual farms. For client #313 (robotics), it found tech companies (Epson, Celestica) but not the niche robotics firms Expertwired curated. The fix is denser, niche-tagged industry data, not a different algorithm.

What this project is honest about

There are no headline accuracy numbers because there couldn't be: the ground truth was a manually-curated reference, not labelled data. The right read on this project: a real client, a real workflow that got partially automated, and a clear-eyed report on which parts worked and which didn't.

Stack

Python, BERT (Transformers), BM25 (rank_bm25), WordNet (NLTK), pandas, scikit-learn (cosine similarity).