Quick Start

Prerequisites

  • Go 1.23+
  • PostgreSQL
  • LLM API key (optional — enables AI features) — Groq (default), OpenAI, or any OpenAI-compatible provider

Install & Run

Terminal
$ git clone https://github.com/gokulnair2001/Velum.git
$ cd Velum
$ go mod tidy
$ cp example.config.yaml config.yaml   # edit with your Postgres credentials
$ go run cmd/velum/main.go
✓ velum listening on :8080

Docker

Terminal
$ docker network create velum-network
$ docker compose up --build
💡 Tip

Velum creates all required database tables automatically on first connection. No migrations needed.

🧪 Try It

Send a few test events to see Velum detect patterns in real time:

cURL
$ curl -X POST http://localhost:8080/api/v1/analyze \
  -H "Content-Type: application/json" \
  -H "X-Project-ID: my-app" \
  -d '{
    "events": [
      {
        "event": "checkout_page_view",
        "ts": 1707500000000,
        "user_id": "usr-101",
        "session_id": "sess-abc",
        "device": "mobile"
      },
      {
        "event": "checkout_payment_click",
        "ts": 1707500015000,
        "user_id": "usr-101",
        "session_id": "sess-abc",
        "error_code": "card_declined"
      },
      {
        "event": "checkout_payment_click",
        "ts": 1707500045000,
        "user_id": "usr-101",
        "session_id": "sess-abc",
        "error_code": "card_declined"
      }
    ]
  }'
ℹ️ Schema Freedom

Velum auto-detects which field is the event name, user ID, timestamp, etc. You don't need to configure your schema — the Context Enricher (Layer 0) handles it via AI.

Optional: Build a Baseline

Want trend comparisons ("retry storms increased 21% vs. last 28 days")? Feed historical events to /api/v1/baseline first. Without this step, /analyze still detects all patterns — you just won't get trend data.

cURL
$ curl -X POST http://localhost:8080/api/v1/baseline \
  -H "Content-Type: application/json" \
  -H "X-Project-ID: my-app" \
  -d @historical_events.json

🔬 Processing Pipeline

Events flow through an 8-layer sequential pipeline. Each layer implements a Layer interface and processes the output of the previous layer. One API call triggers the full pipeline.

| Layer | Name | Description | Type |
|-------|------|-------------|------|
| 0 | Context Enricher | Classifies event properties as dimension, target, condition, or measure via LLM (Groq default) | AI-Powered |
| 1 | Vocab Enricher | Tokenizes event names, classifies unknown words as surface / status / flow / noise | AI-Powered |
| 2 | Event Adapter | Normalizes raw events into canonical form using vocab + property lookups | Deterministic |
| 3 | Session Flow Reconstructor | Groups events by user + session. Splits at 30-min gaps. Builds flow instances with retry-cycle merging. | Deterministic |
| 4 | Behavior Analyzer | Tags flows with behavioral signals: explore, attempt, succeed, retry, abandon, hesitate | Deterministic |
| 5 | Pattern Detector | Aggregates behaviors across all users into named anti-patterns with impact ratios | Deterministic |
| 6 | Baseline Comparator | Compares current patterns against historical snapshots. Flags new, increasing, or anomalous trends. | Deterministic |
| 7 | AI Analyzer | Generates natural-language summaries with actionable hypotheses grounded in data | AI-Powered |

🧬 Layer Deep Dives

L0 Context Enricher (Property Agent) AI

Purpose: Identifies the role of each property/field in your event JSON — which field is the user ID, timestamp, event name, etc.

Different products send events in completely different schemas. The Property Agent eliminates the need for per-customer configuration by using LLM inference (Groq by default, or any configured provider) to classify fields automatically.

How it works

  1. Samples events from the batch
  2. Sends them to Groq LLM with a classification prompt
  3. LLM returns a field → role mapping (event_name, user_id, timestamp, etc.)
  4. Mapping is cached and used by Layer 2 for field extraction
Example — Schema Detection
// Product A
{"event_name": "checkout", "uid": "u1", "time": 1708500000}

// Product B
{"action": "purchase", "user": "u1", "timestamp": "2024-02-21T10:00:00Z"}

// Velum detects both automatically — zero config needed
💡 Key insight

Without the Property Agent, Velum would need a config file per customer. This layer makes Velum truly zero-config.

L1 Vocab Enricher AI

Purpose: Learns the meaning of unknown words in your event names by classifying them into semantic roles.

| Category | Examples | Meaning |
|----------|----------|---------|
| Surface | checkout, payment, ride, playback | The "what" / "where" |
| Status | failed, success, click, initiated | The "state" |
| Flow | cart, auth, booking, registration | Higher-level grouping |
| Noise | the, a, total, count | Irrelevant |

How it works

  1. Tokenizes every event name (payment_failed → ["payment", "failed"])
  2. Checks each token against PostgreSQL vocab storage
  3. Unknown tokens are batched and sent to Groq LLM for classification
  4. Classifications are stored back to PostgreSQL for future requests
ℹ️ Parsing support

Event names in snake_case, camelCase, kebab-case, and dot.notation are all tokenized and classified automatically.
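
The parsing behavior above can be sketched in Go. This is an illustrative tokenizer, not Velum's actual implementation; the function name `tokenize` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits an event name written in snake_case, camelCase,
// kebab-case, or dot.notation into lowercase tokens.
func tokenize(name string) []string {
	var b strings.Builder
	for i, r := range name {
		switch {
		case r == '_' || r == '-' || r == '.':
			b.WriteByte(' ') // delimiter-based styles
		case unicode.IsUpper(r):
			if i > 0 {
				b.WriteByte(' ') // camelCase boundary
			}
			b.WriteRune(unicode.ToLower(r))
		default:
			b.WriteRune(r)
		}
	}
	return strings.Fields(b.String())
}

func main() {
	fmt.Println(tokenize("payment_failed"))   // [payment failed]
	fmt.Println(tokenize("checkoutPageView")) // [checkout page view]
	fmt.Println(tokenize("auth.login-retry")) // [auth login retry]
}
```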

L2 Event Adapter Deterministic

Purpose: Transforms raw JSON events into canonical events — Velum's internal standardized format.

Dual-Lookup Guarantee

Tokens are resolved using a two-tier system for reliability:

  1. PostgreSQL first — AI-learned vocab from Layers 0/1
  2. Static vocabulary fallback — hardcoded common words
  3. If both miss → uncategorized (learned on next request)
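
The dual-lookup order can be sketched as follows. The maps stand in for PostgreSQL and the static vocabulary, and `resolveToken` is an assumed name, not Velum's API.

```go
package main

import "fmt"

// resolveToken applies the two-tier lookup: AI-learned vocab first,
// static fallback second, otherwise "uncategorized".
func resolveToken(tok string, learned, static map[string]string) string {
	if role, ok := learned[tok]; ok {
		return role // tier 1: vocab learned by Layers 0/1
	}
	if role, ok := static[tok]; ok {
		return role // tier 2: hardcoded common words
	}
	return "uncategorized" // learned on next request
}

func main() {
	learned := map[string]string{"playback": "surface"}
	static := map[string]string{"failed": "status"}
	fmt.Println(resolveToken("playback", learned, static)) // surface
	fmt.Println(resolveToken("failed", learned, static))   // status
	fmt.Println(resolveToken("zorp", learned, static))     // uncategorized
}
```
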
Canonical Event Structure
CanonicalEvent{
  Flow:       "payment",       // from Surface token
  Action:     "payment",       // from Surface token
  Status:     "failed",        // from Status token
  UserID:     "u1",
  SessionID:  "s1",
  Timestamp:  1708500015000,
  RawProperties: { ... }       // original JSON preserved
}
L3 Session Flow Reconstructor Deterministic

Purpose: Groups canonical events into user sessions and flow instances.

Processing Steps

  1. Group by user — collect all events per user_id
  2. Split into sessions — 30-minute inactivity gap threshold
  3. Build flow instances — group contiguous events by flow name
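
The 30-minute session split in step 2 can be sketched in Go. This is a simplified illustration over raw timestamps; the real reconstructor also builds flow instances.

```go
package main

import "fmt"

// splitSessions groups a user's timestamp-sorted events (epoch ms) into
// sessions, starting a new session whenever the gap exceeds 30 minutes.
func splitSessions(ts []int64) [][]int64 {
	const gap = 30 * 60 * 1000 // 30-minute inactivity threshold, in ms
	var sessions [][]int64
	var cur []int64
	for i, t := range ts {
		if i > 0 && t-ts[i-1] > gap {
			sessions = append(sessions, cur)
			cur = nil
		}
		cur = append(cur, t)
	}
	if len(cur) > 0 {
		sessions = append(sessions, cur)
	}
	return sessions
}

func main() {
	// the 3rd event arrives > 30 min after the 2nd → two sessions
	ts := []int64{0, 60_000, 2_000_000, 2_100_000}
	fmt.Println(len(splitSessions(ts))) // 2
}
```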

Key Rules

| Rule | Example |
|------|---------|
| Same flow contiguous → same instance | playback_started, playback_error → 1 flow |
| Entry status → new instance | booking_requested starts new booking flow |
| Lifecycle events filtered | session_end, app_closed → no flow created |
| Surface-fallback folding | driver_assigned folds into active booking flow |
| Retry cycle merging | fail → re-attempt collapses into 1 flow instance |
⚠️ Retry Cycle Detection

When a flow fails and the user re-attempts (e.g., booking → driver_cancelled → booking), Velum merges these into a single flow instance with retry evidence. This prevents inflated flow counts.

L4 Behavior Analyzer Deterministic

Purpose: Tags each flow instance with behavioral signals — what the user was doing.

| Behavior | Meaning | Detection |
|----------|---------|-----------|
| explore | User looked around | Entry/view events |
| attempt | User tried to do something | Action events (submit, pay) |
| succeed | User completed the goal | Success/complete status |
| retry | User tried again after failure | Error → same action repeated |
| abandon | User left without completing | No success + session ends |
| hesitate | User paused before acting | Long delay between events |
| progress | User moved to next step | Flow transition to deeper step |

Each flow also receives an intent classification: transact (intended to complete a transaction), browse (just looking), or unknown.
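
As a rough illustration of how deterministic tagging over one flow instance might look, here is a Go sketch. The thresholds, status names, and the function `tagBehaviors` are assumptions for illustration, not Velum's actual rules.

```go
package main

import "fmt"

// tagBehaviors derives behavioral signals for a single flow instance
// from its event statuses and the gaps (ms) between events.
func tagBehaviors(statuses []string, gapsMs []int64) []string {
	tags := []string{}
	seen := map[string]bool{}
	add := func(t string) {
		if !seen[t] {
			seen[t] = true
			tags = append(tags, t)
		}
	}
	for i, s := range statuses {
		switch s {
		case "view":
			add("explore")
		case "submit", "pay", "request":
			add("attempt")
		case "success", "completed":
			add("succeed")
		case "failed", "error":
			if i+1 < len(statuses) { // error followed by another action
				add("retry")
			}
		}
	}
	for _, g := range gapsMs {
		if g > 60_000 { // long pause before acting
			add("hesitate")
		}
	}
	if !seen["succeed"] { // flow ended without a success status
		add("abandon")
	}
	return tags
}

func main() {
	fmt.Println(tagBehaviors(
		[]string{"view", "pay", "failed", "pay", "success"},
		[]int64{5000, 10000, 30000, 15000},
	)) // [explore attempt retry succeed]
}
```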

L5 Pattern Detector Deterministic

Purpose: Detects anti-patterns across all users — systemic problems, not individual quirks.

How it works

  1. Groups all flow instances by (flowName, contextKey)
  2. Runs 5 detection algorithms across each group
  3. Each detector calculates an impact ratio against configurable thresholds
🔴 Cross-Instance Detection

Patterns are detected across multiple flow instances per user, not just within a single flow. For example: user fails booking, creates new booking → counts as retry across instances. This is critical for accuracy.

Example Output
{
  "pattern": "retry_storm",
  "flow": "booking",
  "affected_users": 3,
  "total_flows": 4,
  "impact_ratio": 0.75,
  "description": "High frequency of retry attempts in booking flow"
}
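
The detector logic behind an output like that can be sketched as follows. The type and function names are illustrative, not Velum's internals; the threshold mirrors the ≥ 30% retry-storm bar described later.

```go
package main

import "fmt"

type flowInstance struct {
	UserID  string
	Retried bool
}

// detectRetryStorm counts users whose flows show retry behavior and
// flags the pattern when the impact ratio clears the threshold.
func detectRetryStorm(flows []flowInstance, threshold float64) (float64, bool) {
	users := map[string]bool{}
	affected := map[string]bool{}
	for _, f := range flows {
		users[f.UserID] = true
		if f.Retried {
			affected[f.UserID] = true
		}
	}
	ratio := float64(len(affected)) / float64(len(users))
	return ratio, ratio >= threshold
}

func main() {
	flows := []flowInstance{
		{"u1", true}, {"u2", true}, {"u3", true}, {"u4", false},
	}
	ratio, hit := detectRetryStorm(flows, 0.30)
	fmt.Println(ratio, hit) // 0.75 true
}
```
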
L6 Baseline Comparator Deterministic

Purpose: Compares current patterns against historical baselines stored in PostgreSQL to detect trends.

Trend Classifications

  • New — first time this pattern was observed
  • Significant Increase — impact ratio increased beyond std_deviation_multiplier × σ
  • Significant Decrease — impact ratio decreased significantly
  • Stable — within normal historical range
Example — Statistical Comparison
// Current observation
retry_storm on booking: impact_ratio = 0.75

// Historical baseline (last 7 observations)
average = 0.35, σ = 0.08

// Deviation calculation
(0.75 - 0.35) / 0.08 = 5.0 standard deviations
threshold = 2.0

→ SIGNIFICANT INCREASE
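
That deviation check can be expressed compactly in Go. This is a sketch of the calculation shown above, with an assumed function name `classify`; the multiplier corresponds to std_deviation_multiplier.

```go
package main

import "fmt"

// classify measures how many standard deviations the current impact
// ratio sits from the historical average and labels the result.
func classify(current, avg, stddev, multiplier float64) string {
	dev := (current - avg) / stddev
	switch {
	case dev >= multiplier:
		return "significant_increase"
	case dev <= -multiplier:
		return "significant_decrease"
	default:
		return "stable"
	}
}

func main() {
	// retry_storm on booking: 0.75 vs baseline avg 0.35, σ 0.08, threshold 2.0
	fmt.Println(classify(0.75, 0.35, 0.08, 2.0)) // significant_increase
}
```
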
L7 AI Analyzer AI

Purpose: Generates natural language analysis of detected patterns using your configured LLM provider (Groq by default).

Enriched Prompt Includes

  • Pattern metadata (name, flow, affected users, impact ratios)
  • Error code distributions from event evidence
  • Sample user journeys (actual event sequences)
  • Baseline comparison results

Strict Quality Rules

The system prompt enforces: numbers and percentages in every detail, error codes must be cited, hypotheses must reference specific data points, no generic filler.

Example AI Output
{
  "summary": "Retry storm affecting 42% of playback users...",
  "details": ["retry_storm in playback: 2/12 users. Errors: buffer_timeout (2), drm_license_failed (3)"],
  "hypotheses": ["DRM licensing may be broken for IN region — 3 of 5 failures are drm_license_failed on mobile"],
  "confidence_note": "These are hypotheses based on observed behavioral changes."
}

🎯 Detected Patterns

Velum detects seven behavioral anti-patterns across users. Detection is cross-instance — patterns are found across multiple flow instances per user for maximum accuracy.

🔥 Retry Storm ≥ 30%

Users retrying the same action repeatedly after failures. Indicates broken UX, missing feedback, or backend errors.

usr-101: click → fail → click → fail → click → fail
🎭 Masked Failure ≥ 2 flows

Users failing but eventually succeeding — hiding real friction. The success masks the underlying problem.

usr-201: fail → fail → success (problem hidden)
😶 Silent Abandonment ≥ 2 users

Users explore but never attempt any action. They arrive, see, and leave without interacting.

usr-312: view → ............ → exit
🚪 Early Dropoff ≥ 40%

Users bounce immediately after starting a flow. No attempt, no interaction — something repelled them.

baseline: step 4 · actual: step 1 → exit
🔄 Confusion Loop ≥ 30%

Same event repeated 3+ times — users going in circles, unable to find what they need.

usr-445: A → B → A → B → A → exit
⏭ Bypass Behavior Signal

Users skip expected steps in a flow. Indicates confusing UX or users finding shortcuts around intended journeys.

usr-628: step 1 → skip step 2 → step 3 → done
📉 Funnel Dropoff Significant

Significant user loss between defined funnel steps. Indicates revenue-impacting friction in conversion flows.

step 1: 100 users → step 2: 40 users → step 3: 8 users

Severity & Significance

Pattern severity is weighted by pattern type and flow intent:

| Pattern | Weight | Notes |
|---------|--------|-------|
| Retry Storm | 1.0 | Most impactful |
| Masked Failure | 0.9 | Hidden friction |
| Funnel Dropoff | 0.8 | Revenue impact |
| Silent Abandonment | 0.7 | Lost engagement |
| Early Dropoff | 0.6 | May be expected |
| Confusion Loop | 0.5 | UX friction |
| Bypass Behavior | 0.4 | Least impactful |

Transactional flows get a 1.5× multiplier; browse flows get 0.7×.
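
Combining the weight table with the intent multiplier can be sketched in Go; the pattern keys and the function name `severity` are illustrative assumptions.

```go
package main

import "fmt"

// patternWeight mirrors the per-pattern base weights above.
var patternWeight = map[string]float64{
	"retry_storm":        1.0,
	"masked_failure":     0.9,
	"funnel_dropoff":     0.8,
	"silent_abandonment": 0.7,
	"early_dropoff":      0.6,
	"confusion_loop":     0.5,
	"bypass_behavior":    0.4,
}

// severity scales the base weight by flow intent:
// 1.5× for transactional flows, 0.7× for browse flows.
func severity(pattern, intent string) float64 {
	w := patternWeight[pattern]
	switch intent {
	case "transact":
		return w * 1.5
	case "browse":
		return w * 0.7
	default:
		return w
	}
}

func main() {
	fmt.Println(severity("retry_storm", "transact")) // 1.5
	fmt.Println(severity("bypass_behavior", "browse"))
}
```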

ℹ️ Low-Volume Guard

Baseline significance is capped at low when affected users are below min_affected_users (default: 5). The pattern is still reported with low_volume: true so dashboards can filter or display it, but it won't trigger high-priority alerts on statistically thin data.

🔌 API Reference

Endpoints

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | /health | None | Health check (DB connectivity) |
| POST | /api/v1/analyze | X-Infra-Key | Analyze events against stored baselines (read-only, no baseline writes) |
| POST | /api/v1/baseline | X-Infra-Key | Ingest events and store baseline snapshots (optional — enables trend comparison) |
💡 Two endpoints, one required

/api/v1/analyze is the core endpoint — it detects all patterns in your event batch and works standalone. /api/v1/baseline is optional — feed it historical data (on a schedule or as a one-time backfill) so that /analyze can compare current patterns against past trends and report whether things are getting better or worse.

Headers

| Header | Required | Description |
|--------|----------|-------------|
| X-Project-ID | Always | Project identifier (1–64 chars, alphanumeric/hyphens/underscores). Scopes storage. |
| X-Infra-Key | When security.enabled: true | Raw API key (server compares SHA-256 hash) |

Request Body

POST /api/v1/analyze
{
  "events": [
    {
      "event": "checkout_payment_click",     // required — event name
      "ts": 1707500000000,                  // required — epoch milliseconds
      "user_id": "usr-123",                 // required — user identifier
      "session_id": "sess-abc",             // optional — session identifier
      "device": "mobile",                   // auto-classified as dimension
      "error_code": "card_declined"          // auto-classified as condition
    }
  ]
}
ℹ️ Property Classification

Any additional properties beyond event, ts, user_id, and session_id are automatically classified into roles: dimension (device, country), target (product_id), condition (error_code), or measure (cart_value).

Response Shape

With AI enabled
{
  "success": true,
  "message": "Behavioral analysis complete",
  "request_id": "...",
  "data": {
    "ai_analysis": {
      "summary": "Booking flow shows 75% retry storm rate...",
      "details": ["..."],
      "hypotheses": ["Driver supply insufficient for long-distance rides..."],
      "confidence_note": "Hypotheses based on observed behavioral changes."
    }
  }
}

When AI is disabled, data.patterns is returned instead of data.ai_analysis, containing the raw detected patterns array.

⚙️ Configuration

Config is loaded from config.yaml, config.yml, or /etc/velum/config.yaml (first found wins), then overridden by VELUM_* environment variables.

Minimal Config

config.yaml
server:
  port: "8080"
  environment: "development"

storage:
  type: "postgres"
  postgres:
    host: "localhost"
    port: 5432
    database: "velum"
    user: "velum_user"
    password: "your_password"

security:
  enabled: false

Config Sections

| Section | Purpose |
|---------|---------|
| server | Port, host, environment, timeouts |
| storage | PostgreSQL connection details, retention days |
| security | API key auth toggle + SHA-256 hash of key |
| cors | Allowed origins, methods, headers |
| resiliency | Rate limit (req/s), circuit breaker settings |
| baseline | Window days, min days, computation mode, trend thresholds |
| ai_analyzer | Enable + provider + API key + model for Layer 7 |
| vocab_agent | Enable + provider + API key + model for Layer 1 |
| context_agent | Enable + provider + API key + model for Layer 0 |
| data_mapping | Declarative field mapping for custom schemas |

Security

config.yaml
security:
  enabled: true
  api_key_hash: "<sha256-hash-of-your-key>"

Generate a hash:

$ printf "my-secret-key" | shasum -a 256

Then pass X-Infra-Key: my-secret-key on every request. Velum hashes the presented key with SHA-256 and compares it to the stored hash in constant time.

AI Features

All three AI layers support any OpenAI-compatible API: Groq (default), OpenAI, Together, Mistral, Fireworks, and more. The API URL is auto-resolved from the provider name.

Baseline Detection

config.yaml
baseline:
  window_days: 28               # Days of history for baseline computation
  min_days: 7                   # Minimum days before baseline is valid
  min_affected_users: 5         # Below this, significance is capped at "low"
  computation_mode: "daily"     # "daily" (cached) or "always" (per-request)
  trend_threshold: 0.10         # 10% delta to flag increasing/decreasing
  high_significance_threshold: 0.15  # 15% delta for high significance
  std_deviation_multiplier: 2.0      # Multiplier for std-based significance

min_affected_users prevents low-volume patterns (e.g., 1 user with 100% impact ratio) from being flagged as high significance. Set to 1 to disable the guard.

How Baseline Works

Every analysis request:

  1. Detects patterns in the current batch (stateless, works for any time window)
  2. Compares each pattern's impact ratio against the stored historical average (last window_days)
  3. Stores the current snapshot via upsert — keyed on (date, pattern_type, flow, context_key)
| Baseline Status | Condition | Behavior |
|-----------------|-----------|----------|
| first_observation | 0 historical snapshots | Stores snapshot, returns unknown trend |
| insufficient_data | 1–6 days of history | Stores snapshot, returns unknown trend |
| sufficient | ≥7 days of history | Computes avg + stddev, returns trend + significance |
| out_of_window | Data older than 28 days | Skips storage and comparison entirely |

Trend is classified by delta percentage: ≥10% increase → increasing, ≥10% decrease → decreasing, otherwise stable.

Significance uses standard deviation when available (delta ≥ 2×stddev → high) and falls back to an absolute threshold (delta ≥ 0.15 → high). It is capped at low when affected users < min_affected_users.

Data Ingestion Guidelines

  • Consistent windows: For meaningful baseline comparisons, send the same time window each ingestion (e.g., always a full day). Inconsistent window sizes produce different denominators, making ratio comparisons noisy.
  • No overlap: Avoid sending overlapping event batches for the same day. The last batch overwrites the snapshot (upsert), so overlapping batches cause the stored ratio to reflect only the last batch.
  • Re-processing: Sending the same complete batch again is safe — the upsert overwrites with identical values.
  • Ad-hoc analysis: The /api/v1/analyze endpoint never writes to baseline history, so investigative queries with non-standard windows are always safe.

Retention & Cleanup

Snapshots are auto-deleted after retention_days (default: 90 days). A background goroutine runs cleanup on startup and every 24 hours.

config.yaml
storage:
  retention_days: 90    # Snapshots older than this are deleted
| Time Boundary | Default | Purpose |
|---------------|---------|---------|
| baseline.window_days | 28 days | How far back to look for comparison |
| baseline.min_days | 7 days | Minimum history before comparison is valid |
| storage.retention_days | 90 days | When data is permanently deleted |
💡 Retention gap

The 62-day gap between window_days and retention_days means historical snapshots are preserved in case you widen the baseline window later.

Groq (default — just set the key)
ai_analyzer:
  enabled: true
  provider: "groq"              # URL auto-resolved (default if omitted)
  api_key: "gsk_..."             # or env: VELUM_AI_API_KEY
  model: "llama-3.1-8b-instant"
OpenAI
ai_analyzer:
  enabled: true
  provider: "openai"             # URL auto-resolved
  api_key: "sk-..."              # or env: VELUM_AI_API_KEY
  model: "gpt-4o-mini"
Custom / Self-Hosted (Ollama, vLLM, LiteLLM, etc.)
ai_analyzer:
  enabled: true
  provider: "custom"
  base_url: "http://localhost:11434/v1/chat/completions"
  model: "llama3"
💡 Provider config

The same provider, api_key, model, and base_url fields are available on all three agents (ai_analyzer, vocab_agent, context_agent). The base_url field is only needed for custom endpoints. If provider is omitted, it defaults to "groq".

Data Mapping

If your events use a non-standard schema, map fields declaratively:

config.yaml
data_mapping:
  enabled: true
  mapping:
    event:
      paths: ["payload.event.action", "event_name"]
      required: true
    ts:
      paths: ["meta.time", "timestamp"]
      format: "epoch_ms"
      required: true
    user_id:
      paths: ["context.user.id", "user_id"]
      required: true
    session_id:
      paths: ["context.session.id", "session_id"]

Supports dot-notation paths with fallback order. Extra properties pass through automatically.
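
A minimal sketch of that fallback-order path resolution, assuming events are decoded into `map[string]any` (the function name `extract` is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// extract returns the value at the first dot-notation path that
// resolves in the raw event, honoring the configured fallback order.
func extract(event map[string]any, paths []string) (any, bool) {
	for _, p := range paths {
		cur := any(event)
		ok := true
		for _, key := range strings.Split(p, ".") {
			m, isMap := cur.(map[string]any)
			if !isMap {
				ok = false
				break
			}
			cur, ok = m[key]
			if !ok {
				break
			}
		}
		if ok {
			return cur, true
		}
	}
	return nil, false
}

func main() {
	raw := map[string]any{
		"payload": map[string]any{"event": map[string]any{"action": "checkout"}},
	}
	v, _ := extract(raw, []string{"payload.event.action", "event_name"})
	fmt.Println(v) // checkout
}
```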

Environment Variables

| Variable | Overrides |
|----------|-----------|
| VELUM_PORT | server.port |
| VELUM_ENV | server.environment |
| VELUM_DB_HOST | storage.postgres.host |
| VELUM_DB_USER | storage.postgres.user |
| VELUM_DB_PASSWORD | storage.postgres.password |
| VELUM_DB_NAME | storage.postgres.database |
| VELUM_DB_PORT | storage.postgres.port |
| VELUM_DB_SSL_MODE | storage.postgres.ssl_mode |
| VELUM_AI_API_KEY | ai_analyzer.api_key |
| VELUM_VOCAB_AGENT_API_KEY | vocab_agent.api_key |
| VELUM_CONTEXT_AGENT_API_KEY | context_agent.api_key |
| VELUM_API_KEY_HASH | security.api_key_hash |

📋 Full Example: Ride-Hailing App

Here's a complete walkthrough showing what each pipeline layer does with a real ride-hailing scenario.

Input Events

POST /api/v1/analyze
{
  "events": [
    {"event":"app_opened","ts":1708700000000,"user_id":"u1","session_id":"s1","device":"mobile","country":"IN"},
    {"event":"booking_requested","ts":1708700045000,"user_id":"u1","session_id":"s1","fare":790},
    {"event":"driver_assigned","ts":1708700060000,"user_id":"u1","session_id":"s1","driver_id":"d01"},
    {"event":"driver_cancelled","ts":1708700090000,"user_id":"u1","session_id":"s1","cancel_reason":"too_far"},
    {"event":"booking_requested","ts":1708700095000,"user_id":"u1","session_id":"s1","fare":790},
    {"event":"driver_assigned","ts":1708700110000,"user_id":"u1","session_id":"s1","driver_id":"d02"},
    {"event":"ride_started","ts":1708700180000,"user_id":"u1","session_id":"s1"},
    {"event":"ride_completed","ts":1708701000000,"user_id":"u1","session_id":"s1"}
  ]
}

Layer-by-Layer Trace

Layer 0 (Context Enricher):
  Schema detected: event="event", timestamp="ts", user_id="user_id"

Layer 2 (Event Adapter):
  "booking_requested" → {Flow:"booking", Status:"request"}
  "driver_assigned"   → {Flow:"driver",  Status:"assigned"}
  "driver_cancelled"  → {Flow:"driver",  Status:"cancelled"}
  "ride_completed"    → {Flow:"ride",    Status:"completed"}

Layer 3 (Session Flow):
  u1 → Flow 1: booking [booking_requested, driver_*, booking_requested, driver_*]
       ↑ driver events folded in, retry cycle merged
       Flow 2: ride [ride_started, ride_completed]

Layer 4 (Behavior):
  booking: [attempt, retry, progress]
  ride:    [attempt, succeed]

Layer 5 (Patterns — across all users; counts assume three other users with similar sessions, omitted from the input above for brevity):
  retry_storm on booking: 3/4 users (75%) → DETECTED
  masked_failure on booking: 2/4 users → DETECTED

Layer 6 (Baseline):
  retry_storm: FIRST OBSERVATION → establishing baseline
  masked_failure: FIRST OBSERVATION → establishing baseline

Layer 7 (AI):
  "75% retry storm in booking — driver_cancelled with cancel_reason=too_far
   in 2 of 3 cases. Driver supply may be insufficient for long-distance rides."

🏢 Multi-Tenancy

Each X-Project-ID gets isolated storage:

  • Pattern baselines are stored in per-project tables (pattern_snapshots_{project_id})
  • Vocabulary and property registry are shared across projects
  • A background goroutine runs daily cleanup of snapshots older than retention_days

Database Tables

| Table | Scope | Purpose |
|-------|-------|---------|
| vocabulary | Shared | AI-learned word classifications |
| property_registry | Shared | AI-learned property classifications |
| pattern_snapshots_{id} | Per-project | Historical pattern observations for baselines |

🛡️ Resiliency

Velum is built for production reliability with multiple layers of protection:

| Feature | Details |
|---------|---------|
| Circuit Breaker | Wraps all LLM calls. After N failures, circuit opens — AI layers degrade gracefully until reset timeout. |
| Rate Limiting | Configurable req/s via go-chi/httprate. Returns HTTP 429 when exceeded. |
| Body Limit | 10 MB max request body to prevent OOM attacks. |
| Graceful Shutdown | Catches SIGTERM/SIGINT, waits for active requests to finish (configurable timeout). |
| Background Cleanup | Daily goroutine deletes old snapshots per retention policy. Cancellable on shutdown. |
| Health Check | /health endpoint checks DB connectivity. Returns "degraded" if unreachable. |

🔧 Tech Stack

| Component | Choice |
|-----------|--------|
| Language | Go 1.23+ |
| HTTP Router | chi/v5 |
| Database | PostgreSQL |
| LLM Provider | Groq (default) |
| Rate Limiting | go-chi/httprate |
| Config | YAML + Env Vars |
| Auth | SHA-256 API Key |
| Logging | Go slog |
| Container | Docker + Compose |

External Dependencies

| Dependency | Purpose |
|------------|---------|
| github.com/go-chi/chi/v5 | HTTP router |
| github.com/go-chi/cors | CORS middleware |
| github.com/go-chi/httprate | Rate limiting middleware |

Build & Test

Terminal
$ go test ./...           # All tests
$ go test ./... -v        # Verbose
$ go test ./... -cover    # With coverage
$ go build -o velum cmd/velum/main.go  # Build binary