Events Graph — Technical Reference

Semantic Ranking Methodology

How Events Graph turns raw news into ranked, personalised intelligence

01 / Overview

How it works

Events Graph continuously scrapes news sources and passes each article through GPT-4o-mini to produce a structured event record: title, summary, entities, importance score, sentiment, and category. Each event is then converted into a 1,536-dimension vector embedding using OpenAI's text-embedding-3-small model, which captures the semantic meaning of the event rather than its literal words.

At query time, the user's search string is embedded with the same model and compared against all stored event vectors via cosine similarity. When a client profile is present, a stored interest vector for that client is blended with the query vector to personalise the ranking, surfacing events that match both the topic and the client's known interests. The result is a list of events ranked by relevance, not recency.

02 / Ingestion

Event Ingestion Pipeline

Every news article passes through this two-stage pipeline before it becomes queryable:

News Article
     │
     ▼
┌─────────────────────────────────────┐
│  GPT-4o-mini Structuring            │
│  title, summary, entities,          │
│  importance (1-5), sentiment,       │
│  category, source_urls              │
└─────────────────┬───────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│  Embedding Generation               │
│  text-embedding-3-small             │
│  "title. summary. entities..."      │
│  → 1,536-dimension float32 vector   │
└─────────────────┬───────────────────┘
                  │
                  ▼
          Stored in DB
          event_embeddings
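
The embedding stage above can be sketched roughly as follows. This is a minimal illustration, not the production code: the function names (build_embedding_text, embed_event, fake_embed) are hypothetical, the exact separator used to join fields is an assumption based on the "title. summary. entities..." hint in the diagram, and a stand-in embedder replaces the real text-embedding-3-small call so the sketch runs offline.

```python
from typing import Callable, Sequence

import numpy as np

EMBEDDING_DIM = 1536  # vector size stated for text-embedding-3-small


def build_embedding_text(event: dict) -> str:
    """Join structured fields into one string: "title. summary. entities"."""
    entities = ", ".join(event.get("entities", []))
    parts = [event["title"], event["summary"], entities]
    return ". ".join(p.strip().rstrip(".") for p in parts if p and p.strip())


def embed_event(event: dict, embed_fn: Callable[[str], Sequence[float]]) -> np.ndarray:
    """Embed one event and return a float32 vector ready for storage."""
    vector = np.asarray(embed_fn(build_embedding_text(event)), dtype=np.float32)
    if vector.shape != (EMBEDDING_DIM,):
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {vector.shape}")
    return vector


def fake_embed(text: str) -> list[float]:
    """Stand-in for the real embedding call (assumption, offline use only)."""
    rng = np.random.default_rng(len(text))
    return rng.standard_normal(EMBEDDING_DIM).tolist()


# Illustrative event record, not real data.
event = {
    "title": "China tightens rare earth export controls",
    "summary": "New licensing rules apply to several rare earth products.",
    "entities": ["China", "Lynas Corporation"],
}
vector = embed_event(event, fake_embed)
```

Passing the embedder in as a function keeps the text-building step testable without network access; in a real deployment embed_fn would wrap the embedding API call.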

03 / Query

Query Pipeline

When a search request arrives, the query string undergoes the same embedding process and is compared against all pre-loaded event vectors in memory:

User Query: "China export controls rare earth"
     │
     ▼
┌─────────────────────────────────────┐
│  Query Embedding                    │
│  text-embedding-3-small             │
│  → 1,536-dimension query vector     │
└─────────────────┬───────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│  Cosine Similarity                  │
│  score = q⃗ · e⃗ / (|q⃗| × |e⃗|)       │
│                                     │
│  Computed against all ~1,000+       │
│  pre-loaded event vectors           │
│  (numpy matrix multiply, ~1ms)      │
└─────────────────┬───────────────────┘
                  │
                  ▼
          Top-K events by similarity
          → Apply filters (category, days, importance)
          → Return ranked results
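
A rough sketch of the scoring and filtering steps in the diagram, assuming the event vectors are held as rows of a NumPy matrix. The helper names, the metadata layout, and the category labels are hypothetical; the days filter is omitted for brevity and would follow the same pattern.

```python
import numpy as np


def top_k_by_cosine(query_vec, event_matrix, k):
    """Return (row_index, score) pairs for the k most similar events."""
    q = query_vec / np.linalg.norm(query_vec)
    e = event_matrix / np.linalg.norm(event_matrix, axis=1, keepdims=True)
    scores = e @ q  # one matrix-vector multiply scores every event
    order = np.argsort(-scores)[:k]  # highest similarity first
    return [(int(i), float(scores[i])) for i in order]


def apply_filters(ranked, metadata, category=None, min_importance=None):
    """Drop ranked hits whose metadata fails the requested filters."""
    kept = []
    for index, score in ranked:
        meta = metadata[index]
        if category is not None and meta["category"] != category:
            continue
        if min_importance is not None and meta["importance"] < min_importance:
            continue
        kept.append((index, score))
    return kept


# Toy 3-dimensional vectors stand in for the real 1,536-dimension embeddings.
event_matrix = np.array(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]], dtype=np.float32
)
# Made-up metadata for illustration only.
metadata = [
    {"category": "trade", "importance": 4},
    {"category": "markets", "importance": 2},
    {"category": "trade", "importance": 2},
]
query_vec = np.array([1.0, 0.0, 0.0], dtype=np.float32)

ranked = top_k_by_cosine(query_vec, event_matrix, k=2)
results = apply_filters(ranked, metadata, min_importance=3)
```

Because the filters run after the top-K cut, as in the diagram, a request can return fewer than top_k results when some of the best matches are filtered out.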

04 / Profiles

User Profiles

A profile is a stored interest vector. It is created once from a plain-text description of a client's interests using the same text-embedding-3-small model. The resulting vector captures what that client cares about semantically. Once stored, it is applied to every query that includes that client's client_id, with no embedding calls at runtime beyond the one for the query itself.

  zariff-daily           arthur-straits-signal       tom-ree
"Malaysia business     "MY+SG editorial             "REE mining processing
 macro deals AI..."     newsletter deals..."         Malaysia Lynas NdPr..."
       │                      │                           │
       ▼                      ▼                           ▼
  [vector₁]             [vector₂]                   [vector₃]
  stored in DB          stored in DB                stored in DB
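
The embed-once, reuse-many pattern can be sketched with an in-memory stand-in for the profile table. ProfileStore and counting_fake_embed are hypothetical names, the real store is a database, and normalising the vector at creation time is an assumption made here for convenience.

```python
import numpy as np


class ProfileStore:
    """In-memory stand-in for the profile table (hypothetical)."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._vectors = {}

    def create(self, client_id, description):
        """Embed the description once and keep the unit-length vector."""
        vec = np.asarray(self._embed_fn(description), dtype=np.float32)
        self._vectors[client_id] = vec / np.linalg.norm(vec)

    def get(self, client_id):
        """Look up the stored vector; no embedding call happens here."""
        return self._vectors[client_id]


calls = {"count": 0}


def counting_fake_embed(text):
    """Stand-in embedder that counts how often it is invoked."""
    calls["count"] += 1
    rng = np.random.default_rng(len(text))
    return rng.standard_normal(1536)


store = ProfileStore(counting_fake_embed)
store.create(
    "tom-ree",
    "REE mining and processing, Malaysia, Lynas Corporation, "
    "NdPr pricing, rare earth supply chains, China export controls",
)
first = store.get("tom-ree")
second = store.get("tom-ree")
```

After one create and two lookups, the embedder has been called only once, which is the property the paragraph above describes.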

05 / Blend

Personalised Ranking — The Blend

When a client_id is passed in a query, two similarity scores are computed for every candidate event and combined into a single final score:

                    QUERY VECTOR
                         │
                         │  cosine similarity
                         ▼
              ┌─────────────────────┐
  Event A ──▶ │  query_score = 0.92 │
  Event B ──▶ │  query_score = 0.87 │
  Event C ──▶ │  query_score = 0.61 │
              └─────────────────────┘

                    PROFILE VECTOR
                         │
                         │  cosine similarity
                         ▼
              ┌─────────────────────┐
  Event A ──▶ │ profile_score = 0.45│
  Event B ──▶ │ profile_score = 0.91│
  Event C ──▶ │ profile_score = 0.88│
              └─────────────────────┘

                         BLEND
              ┌─────────────────────────────────────┐
  Event A ──▶ │ 0.7 × 0.92 + 0.3 × 0.45 = 0.779   │ → Rank #2
  Event B ──▶ │ 0.7 × 0.87 + 0.3 × 0.91 = 0.882   │ → Rank #1
  Event C ──▶ │ 0.7 × 0.61 + 0.3 × 0.88 = 0.691   │ → Rank #3
              └─────────────────────────────────────┘

Event B ranks #1 even though its raw query match is lower than Event A's, because it strongly matches this client's stored interests. This is how profiles shift results toward what matters to the client, not just the terms that appeared in the query.
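
The worked example above can be reproduced in a few lines. The blend function name is hypothetical, and treating the query weight as 1 - profile_weight is an assumption inferred from the 0.7 / 0.3 split; the source does not state how the two weights are related.

```python
def blend(query_score, profile_score, profile_weight=0.3):
    """Weighted sum of the two similarity scores (query weight assumed to be 1 - profile_weight)."""
    return (1.0 - profile_weight) * query_score + profile_weight * profile_score


# (query_score, profile_score) pairs taken from the diagram.
scores = {
    "Event A": (0.92, 0.45),
    "Event B": (0.87, 0.91),
    "Event C": (0.61, 0.88),
}

ranked = sorted(
    ((name, blend(q, p)) for name, (q, p) in scores.items()),
    key=lambda item: item[1],
    reverse=True,
)

for rank, (name, score) in enumerate(ranked, start=1):
    print(f"#{rank} {name}: {score:.3f}")
# #1 Event B: 0.882
# #2 Event A: 0.779
# #3 Event C: 0.691
```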

06 / Math

The Math

Events Graph uses cosine similarity as its core relevance metric. It measures the angle between two vectors in high-dimensional space, independent of their magnitude.

similarity(A, B) = (A · B) / (‖A‖ × ‖B‖)

  A · B = dot product (sum of element-wise products)
  ‖A‖   = L2 norm (magnitude of vector A)
  ‖B‖   = L2 norm (magnitude of vector B)

Result range

   1.0 = identical direction (perfectly relevant)
   0.0 = orthogonal (unrelated)
  -1.0 = opposite direction (irrelevant)

Final score = 0.7 × query_similarity + 0.3 × profile_similarity
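
A direct transcription of the similarity formula, with one toy vector pair for each end of the result range. The function name is illustrative; real embeddings would have 1,536 components, not two or three.

```python
import numpy as np


def cosine_similarity(a, b):
    """Dot product divided by the product of the two L2 norms."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Same direction, different magnitude: similarity is (approximately) 1.0.
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])

# Perpendicular vectors: similarity is 0.0.
orthogonal = cosine_similarity([1, 0], [0, 1])

# Opposite direction: similarity is (approximately) -1.0.
opposite = cosine_similarity([1, 2], [-1, -2])
```

The first case shows the magnitude independence mentioned above: doubling every component of a vector leaves its similarity to any other vector unchanged.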

07 / Design

Why This Approach

No keyword matching required — semantic understanding means "REE processing" matches "rare earth refining" without explicit synonyms
Events are embedded once at ingestion and reused for every query — no redundant API calls
Profile vectors are computed once at creation and applied at query time with no extra API calls at runtime
Can scale to millions of events by replacing the exhaustive comparison with approximate nearest neighbour (ANN) search when needed, without changing the embedding or scoring steps
Adding a profile adds negligible query latency: it is one extra dot-product pass over vectors already in memory
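
One consequence of the last point is worth spelling out. When every vector is unit-length, cosine similarity reduces to a dot product, and the dot product is linear, so blending the two scores gives the same result as taking one dot product against a blended query vector. That is why the Overview can speak of blending vectors while the Blend section blends scores. The sketch below checks this on synthetic random vectors; whether the actual implementation uses one pass or two is not stated in this document.

```python
import numpy as np

rng = np.random.default_rng(0)


def unit(v):
    """Scale vectors to unit L2 norm along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


# Synthetic stand-ins for stored event, query, and profile embeddings.
events = unit(rng.standard_normal((1000, 1536)))
query = unit(rng.standard_normal(1536))
profile = unit(rng.standard_normal(1536))

# Two passes: score against the query and the profile, then blend the scores.
two_pass = 0.7 * (events @ query) + 0.3 * (events @ profile)

# One pass: blend the vectors first, then take a single dot product.
# The blended vector is deliberately not re-normalised, so the result is the
# weighted sum of the two cosines rather than a cosine against a new vector.
one_pass = events @ (0.7 * query + 0.3 * profile)

equivalent = bool(np.allclose(two_pass, one_pass))
```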

08 / API

API Quick Reference

Base URL: https://events.straitssignal.com

Query events — no profile

POST /events/query

{
  "query": "China export controls rare earth",
  "top_k": 10,
  "days": 30,
  "min_importance": 3
}

Query events — with personalised profile blend

POST /events/query

{
  "query": "China export controls rare earth",
  "client_id": "tom-ree",
  "top_k": 10,
  "days": 30,
  "min_importance": 3,
  "profile_weight": 0.3
}
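
A minimal client sketch for the two query calls above, using only the Python standard library. The Content-Type header, the absence of authentication, and the JSON response format are assumptions, since this reference does not specify them; build_query_request is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://events.straitssignal.com"


def build_query_request(payload):
    """Build a POST request for /events/query with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/events/query",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_query_request(
    {
        "query": "China export controls rare earth",
        "client_id": "tom-ree",  # omit this key for an unpersonalised query
        "top_k": 10,
        "days": 30,
        "min_importance": 3,
        "profile_weight": 0.3,
    }
)

# Sending the request needs network access, so it is left commented out:
# with urllib.request.urlopen(request, timeout=10) as response:
#     results = json.load(response)
```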

List all tracked entities

GET /events/entities

→ Returns a list of all entity names extracted from events,
  sorted by frequency of appearance.

Get events for a specific entity

GET /events/entity/{name}

GET /events/entity/Lynas%20Corporation

→ Returns all events where this entity appears,
  sorted by importance then recency.
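
Entity names that contain spaces or other reserved characters need percent-encoding before they go into the path, as in the Lynas%20Corporation example. A small sketch using the standard library; entity_path is a hypothetical helper name.

```python
from urllib.parse import quote


def entity_path(name):
    """Return the request path for one entity, percent-encoding the name."""
    # safe="" also encodes "/" so a name can never add extra path segments.
    return "/events/entity/" + quote(name, safe="")


path = entity_path("Lynas Corporation")
# "/events/entity/Lynas%20Corporation"
```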

Create a client profile

POST /events/profile/create

{
  "client_id": "tom-ree",
  "description": "REE mining and processing, Malaysia, Lynas Corporation, NdPr pricing, rare earth supply chains, China export controls"
}

→ Embeds the description and stores the resulting vector.
  Called once per client. No re-embedding needed unless
  interests change.

Agent onboarding briefing

GET /events/agent/onboard

→ Returns a structured briefing for AI agents:
  available categories, entity index, recent high-importance
  events, and query usage examples.
  Use as context injection before starting a research session.