How it works
LiveScoreJob is the heartbeat of the system. It runs on a cron loop with two cadences:
- 2.5 seconds when a match is live (actively in play)
- 60 seconds when idle (pre-match, innings break, rain delay)
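The two cadences above can be sketched as a self-rescheduling loop. This is a minimal illustration, not the actual LiveScoreJob code: the `MatchPhase` values, `nextPollDelayMs`, and `startPolling` are hypothetical names, and only the 2.5 s and 60 s figures come from this document.

```typescript
// Hypothetical phase names; the real fixture model may use different values.
type MatchPhase = "live" | "pre_match" | "innings_break" | "rain_delay" | "finished";

const LIVE_INTERVAL_MS = 2_500; // cadence while a match is in play
const IDLE_INTERVAL_MS = 60_000; // cadence while nothing is in play

// Returns the delay before the next poll: fast if any tracked match is live.
function nextPollDelayMs(phases: MatchPhase[]): number {
  return phases.some((p) => p === "live") ? LIVE_INTERVAL_MS : IDLE_INTERVAL_MS;
}

// The delay is re-evaluated after every tick, so the cadence switches as soon
// as a match goes live or stops. A failed tick keeps the last known phases,
// so one upstream error during a live match does not drop to the idle cadence.
function startPolling(tick: () => Promise<MatchPhase[]>): void {
  let lastPhases: MatchPhase[] = [];
  const run = async (): Promise<void> => {
    try {
      lastPhases = await tick();
    } catch (err) {
      console.error("poll tick failed", err);
    }
    setTimeout(run, nextPollDelayMs(lastPhases));
  };
  void run();
}
```

Scheduling the next tick only after the current one finishes means a slow upstream response cannot cause overlapping polls.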
Reconciliation
LivescoreProvider reconciles the raw API response into a normalized internal state. This handles quirks in the upstream API: missing fields, delayed score updates, inconsistent innings numbering.
Deduplication via state hashing
StateHashService computes a hash of the reconciled state and compares it against the last known hash. If nothing changed, the entire downstream pipeline is skipped. This is critical for avoiding unnecessary DynamoDB writes and push notifications, especially during drinks breaks or rain delays where the API returns the same data repeatedly. The hash covers: runs, wickets, overs, batting/bowling stats, match status, and innings state.
Fixture update
When a genuine state change is detected, FixtureService writes the updated fixture record to the fixtures table in DynamoDB.
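The hash-then-write flow described above (deduplicate, then update the fixture only on a genuine change) might look roughly like the following. This is a sketch under assumptions: the `ReconciledState` fields, the SHA-256 choice, the in-memory `Map` standing in for the stored hash, and the `writeFixture` callback are all hypothetical and not taken from StateHashService or FixtureService.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape; field names are illustrative, not the real schema.
interface ReconciledState {
  matchId: string;
  status: string;
  innings: number;
  runs: number;
  wickets: number;
  overs: number;
}

// Serialize with sorted object keys so key order never changes the hash.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? "null";
}

function hashState(state: ReconciledState): string {
  return createHash("sha256").update(canonicalJson(state)).digest("hex");
}

// Returns true when the state changed and the fixture was written.
async function processTick(
  state: ReconciledState,
  lastHashes: Map<string, string>, // stand-in for the persisted hash
  writeFixture: (s: ReconciledState) => Promise<void>,
): Promise<boolean> {
  const hash = hashState(state);
  if (lastHashes.get(state.matchId) === hash) return false; // unchanged: skip pipeline
  await writeFixture(state);
  lastHashes.set(state.matchId, hash);
  return true;
}
```

Canonical serialization matters here: without sorted keys, two structurally identical states could hash differently and trigger spurious writes and pushes.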
Push pipeline
ActivityPushJob picks up the change and queries the cricket-activity-subscriptions table to find every user currently watching this match via a Live Activity.
MatchDisplayService formats the score into a team-centric perspective. If a user is following Team A, they see Team A’s score prominently. This is a display-layer concern — the underlying data is the same.
APNs Service builds the Live Activity update payload. Apple enforces a strict 4KB limit on Live Activity payloads, so the service carefully prunes the payload to fit.
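One way to prune a payload to the 4KB limit is to drop optional fields in a fixed priority order until the serialized size fits. The sketch below assumes a hypothetical `ContentState` shape and prune order; the actual fields and pruning strategy in apnsService.ts are not described in this document.

```typescript
import { Buffer } from "node:buffer";

const MAX_PAYLOAD_BYTES = 4096; // the 4KB Live Activity payload limit

// Hypothetical content state; the optional fields are candidates for pruning.
interface ContentState {
  score: string;
  overs: string;
  status: string;
  batters?: Array<{ name: string; runs: number; balls: number }>;
  recentBalls?: string[];
  commentary?: string;
}

// Assumed priority: least important first.
type OptionalField = "commentary" | "recentBalls" | "batters";
const PRUNE_ORDER: OptionalField[] = ["commentary", "recentBalls", "batters"];

// Measure the full serialized payload, envelope included, in UTF-8 bytes.
function payloadBytes(state: ContentState): number {
  const payload = {
    aps: { timestamp: Math.floor(Date.now() / 1000), event: "update", "content-state": state },
  };
  return Buffer.byteLength(JSON.stringify(payload), "utf8");
}

// Drop optional fields one at a time until the payload fits.
function pruneToFit(state: ContentState, limit: number = MAX_PAYLOAD_BYTES): ContentState {
  const pruned: ContentState = { ...state };
  for (const field of PRUNE_ORDER) {
    if (payloadBytes(pruned) <= limit) break;
    delete pruned[field];
  }
  if (payloadBytes(pruned) > limit) {
    throw new Error(`payload still exceeds ${limit} bytes after pruning`);
  }
  return pruned;
}
```

Measuring bytes rather than string length matters because player names and commentary may contain multi-byte UTF-8 characters.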
The push is sent over HTTP/2 to Apple’s APNs servers. iOS receives the push and ActivityKit updates both the Dynamic Island and the Lock Screen widget.
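For reference, a Live Activity update is an HTTP/2 POST to `/3/device/<push token>` with the `apns-push-type: liveactivity` header, a topic of the form `<bundle id>.push-type.liveactivity`, and an `aps` body carrying `timestamp`, `event`, and `content-state`. The helper below builds those headers and the body; `LiveActivityPush` and `buildLiveActivityRequest` are hypothetical names for illustration, not the apnsService.ts API.

```typescript
// Hypothetical inputs; names are illustrative.
interface LiveActivityPush {
  pushToken: string; // per-activity push token issued by ActivityKit
  bundleId: string; // app bundle identifier
  jwt: string; // provider authentication token
  contentState: Record<string, unknown>;
  nowMs?: number; // injectable clock for testing
}

interface ApnsRequest {
  headers: Record<string, string>;
  body: string;
}

function buildLiveActivityRequest(push: LiveActivityPush): ApnsRequest {
  // APNs expects the timestamp in seconds since the epoch.
  const timestamp = Math.floor((push.nowMs ?? Date.now()) / 1000);
  return {
    headers: {
      ":method": "POST",
      ":path": `/3/device/${push.pushToken}`,
      authorization: `bearer ${push.jwt}`,
      "apns-push-type": "liveactivity",
      "apns-topic": `${push.bundleId}.push-type.liveactivity`,
      "apns-priority": "10",
    },
    body: JSON.stringify({
      aps: { timestamp, event: "update", "content-state": push.contentState },
    }),
  };
}
```

With Node's built-in `node:http2` module, the headers can be passed to `session.request(headers)` on a session obtained from `http2.connect("https://api.push.apple.com")`, and the body written with `stream.end(body)`.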
Resilience
Circuit breaker: The SportMonks client uses cockatiel for circuit breaking. If the API returns repeated 5xx errors, the circuit opens and requests are short-circuited for a cooldown period. This prevents hammering a degraded upstream.
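To make the open/half-open/closed behavior concrete, here is a hand-rolled consecutive-failure breaker. It is deliberately not cockatiel's API: it is a dependency-free stand-in that illustrates the same idea, and the class name, threshold, and injectable clock are assumptions made for this sketch.

```typescript
type BreakerState = "closed" | "open" | "half_open";

class ConsecutiveFailureBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number; // consecutive failures before opening
  private readonly cooldownMs: number; // how long requests are short-circuited

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  state(now: number): BreakerState {
    if (this.openedAt === null) return "closed";
    // After the cooldown, allow a trial request through (half-open).
    return now - this.openedAt >= this.cooldownMs ? "half_open" : "open";
  }

  // Callers check this before issuing a request to the upstream.
  allowRequest(now: number): boolean {
    return this.state(now) !== "open";
  }

  // Any success closes the circuit and resets the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Reaching the threshold opens the circuit; a failed half-open trial
  // re-opens it because the count is still at or above the threshold.
  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```

A caller would wrap each upstream request: skip it when `allowRequest(Date.now())` is false, and call `recordSuccess()` or `recordFailure(Date.now())` based on the outcome.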
Key tables
| Table | Purpose |
|---|---|
| fixtures | Canonical match state, updated by FixtureService |
| cricket-activity-subscriptions | Maps users + push tokens to matches they are watching |
Key jobs
| Job | Cadence | Purpose |
|---|---|---|
| LiveScoreJob | 2.5s / 60s | Polls SportMonks for match state |
| ActivityPushJob | Triggered by state change | Pushes updates to all subscribers |
Areas of improvement
The following items were identified from a code-level audit of ScorePoller.ts, activityPushJob.ts, apnsService.ts, stateHashService.ts, lockService.ts, and SportMonksClient.ts.
1. Sequential match processing in ActivityPushJob
2. Unbounded Promise.all in APNs batch sends
3. No retry on transient APNs failures
Reliability gap: apnsService.ts sendPush (line 432) makes a single attempt per token. If APNs returns a transient error (e.g., HTTP 503, stream timeout, GOAWAY mid-request), the push is silently lost. The only retry mechanism is that the next poll cycle will compute a new hash, but if the state hasn’t changed, the hash matches and no update is sent, leaving the user’s lock screen stale until the next ball.
Consider adding a retry with backoff for 5xx/timeout errors (distinct from invalid-token errors, which should not be retried).
4. State hash does not cover key events or recent balls
5. DynamoDB write failure between hash update and APNs send
Data consistency: In activityPushJob.ts, the flow is: send APNs push, then updateStoredHash (line 699). If the APNs send succeeds but the DynamoDB hash write fails, the next tick will see the old hash, detect a “change,” and re-send the same update. This is mostly harmless because the pushes are idempotent. The reverse order would be worse: if the hash were written first and the send then failed, the next tick would see a matching hash and never deliver the update. The current ordering is therefore the safer choice. However, there is no logging or metric when updateStoredHash throws, since the error propagates up to the catch-all in runOnce, which only logs the match ID, not which step failed.
Add explicit error handling around updateStoredHash with a distinguishing log message.
6. selfHealAttempts is an in-memory Map with no size bound
Memory leak potential: activityPushJob.ts line 58 declares selfHealAttempts as an unbounded Map<string, number>. The cleanup function (cleanupSelfHealAttempts, line 64) only runs at the end of attemptSelfHealRecovery, so if self-heal is never triggered (e.g., all users have subscriptions), stale entries from previous matches accumulate. Over weeks of uptime, this map grows without bound.
Consider using an LRU cache or adding a periodic sweep independent of the self-heal path.
7. Single HTTP/2 session for all APNs traffic
Single point of failure: apnsService.ts maintains exactly one http2Session (line 289). If that session enters a degraded state (e.g., receiving GOAWAY but not yet closed), all in-flight pushes fail until the close event fires and the next request triggers reconnection. During a GOAWAY drain, new streams may be rejected but the session isn’t yet closed, so getSession() returns it as healthy.
Consider checking for GOAWAY state in getSession() and proactively reconnecting, or maintaining a small pool of sessions.
8. Lock renewal race in ActivityPushJob
Race condition: In activityPushJob.ts, lockRenewTimer (line 770) runs on a setInterval of 15 seconds (half the 30-second TTL). But runOnce can take longer than 15 seconds if many matches are being processed sequentially (see item 1). If runOnce takes 35+ seconds, the lock expires before the renewal fires, another instance acquires it, and both instances process the same matches concurrently, leading to duplicate APNs pushes.
The ScorePoller handles this better by calling extendLock at the end of each tick (line 275). Consider aligning ActivityPushJob to the same pattern, or extending the lock within the match processing loop.
9. No metrics or structured observability
Observability gap: The codebase uses console.log and console.error throughout. ScorePoller.ts emits structured JSON logs (good), but activityPushJob.ts uses unstructured template strings like [ActivityPushJob] Processing match ${match.id}. There are no metrics emitted for:
- APNs push latency (p50/p99)
- Push success/failure rates per match
- State hash hit/miss ratio
- Lock contention frequency
- Time-to-lock-screen (end-to-end latency from SportMonks response to APNs send)
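As a starting point for the gaps listed above, the sketch below shows a structured JSON log line and a nearest-rank percentile over recorded latencies. The function names, field names, and the in-process percentile calculation are assumptions for illustration; a production setup would more likely ship these values to a metrics backend.

```typescript
type LogFields = Record<string, string | number | boolean | null>;

// One JSON object per line, so log tooling can filter on event and fields.
function formatLog(
  level: "info" | "error",
  event: string,
  fields: LogFields,
  now: Date = new Date(),
): string {
  return JSON.stringify({ ts: now.toISOString(), level, event, ...fields });
}

function logEvent(level: "info" | "error", event: string, fields: LogFields): void {
  const line = formatLog(level, event, fields);
  if (level === "error") console.error(line);
  else console.log(line);
}

// Nearest-rank percentile: the smallest sample with at least p% of samples
// at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples recorded");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] as number;
}
```

For example, `logEvent("info", "apns_push", { matchId: "m1", latencyMs: 42, ok: true })` replaces the template-string log with a line that can be aggregated per match.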
10. SportMonks client has no rate-limit backoff
Resilience gap: SportMonksClient.ts detects HTTP 429 and throws HttpRateLimitError (line 124), but does not read the Retry-After header or implement any backoff. The ScorePoller circuit breaker treats 429 the same as any other error: it counts toward the consecutive failure breaker. A sustained rate limit could open the circuit breaker unnecessarily, blocking all polling for the cooldown period (30 seconds) even if the rate limit resets in 1 second.
Consider handling 429 specifically: read the Retry-After header and delay the next poll accordingly, rather than letting it trip the circuit breaker.
11. Existing TODOs in the codebase
The following open TODOs are relevant to this flow:
| Location | TODO |
|---|---|
| SportMonksClient.ts:36 | #254: Replace hardcoded SPORTMONKS_SEASON_ID with active league seasons from LeagueService |
| SportMonksClient.ts:307 | Add man_of_match_id to fixture schema so archival captures MOTM |
| detectKeyEvents.ts:52-53 | Implement significant_over and target event detectors |
| adminV2.ts:1121 | Job status endpoint returns process-local state, not cluster-wide state in multi-instance deploys |