Wikinsert is a system for generating and displaying entity-aware highlighting of Wikipedia articles. It helps users identify sentences in a source article that are relevant to a specific target entity, enabling more efficient information discovery and knowledge integration.
demo.mp4
If the embedded player does not appear, you can download or open the video directly:
- Navigate to https://github.com/au-clan/wikinsert/releases.
- Open the latest release (or pre‑release) entry.
- Under Assets, download wikinsert-extension.zip.
After downloading, continue with the installation steps below.
Chrome / Brave / Edge:

- Download `wikinsert-extension.zip` from the “Assets” section below.
- Unzip it – you’ll see a folder that contains `dist/`.
- Open your browser and navigate to `chrome://extensions` (Chrome), `brave://extensions` (Brave), or `edge://extensions` (Edge).
- Enable Developer mode (toggle in the upper-right).
- Click Load unpacked, select the `dist` folder, and confirm.
- Pin the “Wikinsert” icon to your toolbar for easy access.
Firefox:

- Unzip the archive.
- Visit `about:debugging#/runtime/this-firefox`.
- Click Load Temporary Add-on… and pick `dist/manifest.json`.
- Firefox removes temporary add-ons on restart; for a persistent install we’ll need AMO signing.
DISCLAIMER
The Wikinsert API and MongoDB database are currently hosted on Aarhus University servers that require a VPN connection.
If you are not connected to the AU VPN, the extension will still load, but it will not be able to retrieve heat-map data.
We will migrate the service to a publicly accessible server before our EMNLP demonstration.
Browser Extension:
- Navigate to a Wikipedia article you have computed relevance scores for
- Click the Wikinsert extension icon
- Search for a target entity (which should be in the preprocessed dataset)
- Toggle highlighting to see the heatmap overlay
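Conceptually, the heatmap overlay maps each sentence's relevance score to a background color, with more relevant sentences highlighted more strongly. Below is a minimal Python sketch of that mapping — the real extension implements this in TypeScript, and the color scale and function name here are illustrative, not the extension's actual code:

```python
def score_to_rgba(score: float, max_alpha: float = 0.6) -> str:
    """Map a relevance score in [0, 1] to a translucent yellow background.

    Higher scores produce a more opaque highlight; 0 is fully transparent.
    (Illustrative only -- the real color scale lives in the extension.)
    """
    score = min(max(score, 0.0), 1.0)  # clamp out-of-range scores
    alpha = round(score * max_alpha, 3)
    return f"rgba(255, 220, 0, {alpha})"

# Example: styling three sentences from a (hypothetical) heatmap payload
scores = [0.05, 0.42, 0.97]
styles = [score_to_rgba(s) for s in scores]
```

Clamping keeps out-of-range scores from producing invalid CSS, and capping the alpha keeps even top-scoring sentences readable under the highlight.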
Web App: A way to search your preprocessed dataset and view the articles that can be highlighted, without navigating directly to the article URL. This helps with workflow simulation and user studies.
- cd into the `web` directory
- Install dependencies using `npm install`
- Start the web app using `npm run dev`
- Access the web app at http://localhost:8080
- Search for a Wikipedia article from the preprocessed dataset
- Then use the browser extension to highlight entities in the article
For development instructions, refer to the README files of the individual components.
Wikinsert is composed of three main layers:
- Data Processing Layer - Generates relevance scores for target entities
- Backend API Layer - Stores and serves those scores
- UI Layer - Overlays scores onto Wikipedia articles for interactive inspection
The overall architecture is illustrated below:
- Data Layer (scripts): Python scripts that process Wikipedia articles, tokenize sentences, and compute relevance scores using the XLocEI model.
- Backend API (backend): Kotlin/Ktor server that provides endpoints for retrieving heatmap data and searching for target entities.
- Browser Extension (extension): TypeScript/Vue.js browser extension that overlays highlighting on Wikipedia articles.
- Web App (web): Vue.js web application that serves as an entry point for user study experiments.
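To make the Data Layer's role concrete, the sketch below shows only the shape of the pipeline: split an article into sentences, score each sentence against a target entity, and collect per-sentence records. The naive splitter and the lexical-overlap scorer are stand-ins for the real tokenizer and the XLocEI model, and the record format is illustrative rather than the backend's actual schema:

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter (the real pipeline uses a proper tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score_sentence(sentence: str, target_entity: str) -> float:
    """Placeholder relevance score: fraction of the entity's tokens that
    appear in the sentence. Stands in for the XLocEI model."""
    entity_tokens = set(target_entity.lower().split())
    sentence_tokens = set(re.findall(r"\w+", sentence.lower()))
    return len(entity_tokens & sentence_tokens) / len(entity_tokens)

def build_heatmap(article: str, target_entity: str) -> list[dict]:
    """One {sentence, score} record per sentence (illustrative shape)."""
    return [
        {"sentence": s, "score": score_sentence(s, target_entity)}
        for s in split_sentences(article)
    ]

article = ("Aarhus is a city in Denmark. It hosts Aarhus University. "
           "The weather is mild.")
heatmap = build_heatmap(article, "Aarhus University")
```

The per-sentence records are what the Backend API layer would store and serve, and what the UI layer turns into highlight colors.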
You interact with the stack exclusively through `run-wikinsert.sh`.
- Docker and Docker Compose
- Access to Wikipedia dump data for the Python populator (behind the AU VPN)
- Access to the XLocEI model files (behind the AU VPN)
Pick the workflow that matches what you need:
Use this when you want to load several different samples before exposing the API:

    ./run-wikinsert.sh --populate \
      --source /path/to/revisions_en.parquet \
      --scored /path/to/en_5_scored.parquet \
      --mention /path/to/mention_map_en.parquet \
      --model-dir /path/to/models
Use this to populate the database and start the API in one step:

    ./run-wikinsert.sh --all \
      --source /path/to/revisions_en.parquet \
      --scored /path/to/en_5_scored.parquet \
      --mention /path/to/mention_map_en.parquet \
      --model-dir /path/to/models
If the database is already populated and you only want to expose the API, run:

    ./run-wikinsert.sh --api
- Clone the repository

      git clone https://github.com/au-clan/wikinsert
      cd wikinsert

- Configure MongoDB connection

  Open `docker-compose.yml` and edit the two environment variables under the `ktor-api` service:

      environment:
        - MONGODB_URI=mongodb://localhost:27017/
        - MONGODB_DATABASE=wikinsert

- Start services

  Use the `run-wikinsert.sh` script as described above to start the desired services.

Note: MongoDB connection settings (`MONGODB_URI`, `MONGODB_DATABASE`) are set in `docker-compose.yml` for now; all other paths are supplied via command-line flags.