How MemRL Works — Complete Architecture v4
TurnkeyHMS RM RL Pipeline · 6 hotels · ~365 stay dates · up to 365 DBA snapshots each · sequential retrieval with multi-Q scoring
Data & Inference · Pass 1: Context filter · Pass 2: Shape matching · Q-Learning · Offline calibration
1 Intent → 2 Pass 1: Context → 3 Pass 2: DTW → 4 Multi-Q Re-Rank → 5 Foundation LLM → 6 Action → 7 STR Outcome → 8 TD Update → 9 Backtest
SEQUENTIAL RETRIEVAL PIPELINE
query → ~300 filtered → ~20 → top 3
Tuesday STR write: q_rgi · q_mpi · q_ari
offline: replay ~140K snapshots → optimal w(dba)
📊
Live Booking State
SGI · DBA 45 · 62% occ
search intent 2× LY · comp $229
139-column state vector
source: BigQuery pull
①
🔍
Pass 1: Context Filter
runs first in pipeline
kNN: DBA, search intent, comp rate, hotel, season
22,056 → ~200-500
"what situation am I in?"
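The Pass-1 filter above can be sketched as plain kNN over a few z-normalised context features. This is an assumed implementation for illustration — the feature set, value ranges, and bank size are toy stand-ins, not the 139-column production schema:

```python
import numpy as np

def context_filter(query, bank, k=300):
    """Return indices of the k snapshots nearest to `query` in context space."""
    mu, sigma = bank.mean(axis=0), bank.std(axis=0) + 1e-9
    qz, bz = (query - mu) / sigma, (bank - mu) / sigma  # z-normalise features
    dists = np.linalg.norm(bz - qz, axis=1)             # Euclidean distance
    return np.argsort(dists)[:k]                        # k nearest first

rng = np.random.default_rng(0)
bank = np.column_stack([                  # toy bank: 1,000 snapshots
    rng.integers(0, 365, 1000),           # DBA
    rng.uniform(0.5, 3.0, 1000),          # search intent vs LY
    rng.uniform(150, 300, 1000),          # comp rate ($)
    rng.integers(0, 6, 1000),             # hotel (6 hotels, numerically encoded)
    rng.integers(0, 4, 1000),             # season bucket
]).astype(float)
query = np.array([45.0, 2.0, 229.0, 0.0, 1.0])  # the DBA-45 example state
idx = context_filter(query, bank, k=300)        # ~300 context-similar rows
```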
🗄️
Memory Bank
22,056 snapshots · Neo4j graph store
~365 dates × up to 365 DBA
SGI Mar '25 @ DBA 45
q_rgi = 0.82
q_mpi = 0.74
q_ari = 0.88
terminal: RGI 118 MPI 105 ARI 112
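The example node on this card suggests the shape of one memory-bank record; a minimal sketch, with field names that are illustrative stand-ins rather than the actual Neo4j schema:

```python
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    hotel: str                 # e.g. "SGI"
    stay_date: str             # stay date of the snapshot
    dba: int                   # days before arrival when captured
    q: dict = field(default_factory=dict)         # learned per-metric Q-values
    terminal: dict = field(default_factory=dict)  # STR outcome, known post-stay

snap = Snapshot("SGI", "2025-03", 45,
                q={"rgi": 0.82, "mpi": 0.74, "ari": 0.88},
                terminal={"RGI": 118, "MPI": 105, "ARI": 112})
```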
②
📐
Pass 2: DTW Shape
multivariate: occ + rate + velocity vs ~300 candidates → ~20 shape-similar
dependent DTW (4 dimensions)
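Dependent multivariate DTW — the variant named on this card — uses a single DP table whose per-step cost is the distance between whole feature vectors, so all dimensions share one warping path. A minimal sketch (three of the four dimensions, no warping window — both simplifications):

```python
import math

def dtw_dependent(a, b):
    """a, b: sequences of equal-width feature vectors; returns DTW distance."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # all dims scored jointly
            D[i][j] = cost + min(D[i - 1][j],     # insertion
                                 D[i][j - 1],     # deletion
                                 D[i - 1][j - 1]) # match
    return D[n][m]

# toy booking curves as (occ, rate, velocity) per step
curve   = [(0.2, 199.0, 1.0), (0.4, 209.0, 2.0), (0.6, 229.0, 3.0)]
shifted = [(0.3, 205.0, 1.5), (0.5, 215.0, 2.5), (0.7, 235.0, 3.5)]
d_same, d_shift = dtw_dependent(curve, curve), dtw_dependent(curve, shifted)
```

An identical pair scores 0; the shifted curve scores a positive distance, which is what the ~300 → ~20 ranking keys on.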
③
⚖️
Q-Value Module
Pass 3: utility re-rank
rank() — real-time
update() — Tuesday
score = w_rgi(dba)·q_rgi + w_mpi(dba)·q_mpi + w_ari(dba)·q_ari
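A sketch of the utility re-rank. The DBA-dependent weight schedule below (RGI weight rising toward arrival, remainder split between MPI and ARI) is an assumption for illustration only — the real w(dba) curves come from the offline backtest calibration:

```python
def rerank_score(qs, dba, horizon=365):
    """qs = (q_rgi, q_mpi, q_ari) → blended utility score."""
    t = min(dba, horizon) / horizon        # 0 at arrival, 1 a year out
    w_rgi = 1.0 - 0.5 * t                  # assumed: RGI dominates late
    w_mpi = w_ari = (1.0 - w_rgi) / 2.0    # weights sum to 1
    return w_rgi * qs[0] + w_mpi * qs[1] + w_ari * qs[2]

def rerank(candidates, dba, top=3):
    """Keep the top-k shape-similar snapshots by blended Q utility."""
    return sorted(candidates, key=lambda qs: rerank_score(qs, dba),
                  reverse=True)[:top]

cands = [(0.82, 0.74, 0.88), (0.91, 0.60, 0.55), (0.40, 0.95, 0.90),
         (0.70, 0.70, 0.70), (0.30, 0.30, 0.30)]
top3 = rerank(cands, dba=45)   # ~20 shape-similar in, top 3 out
```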
🧠
Foundation LLM
Gemini / Claude / Llama
weights never change
input improves as Q-values learn over weeks
vendor-interchangeable
💰
AI Rate Rec
Raise / Hold / Lower · Magnitude
decision → human review → Opera
📈
STR Outcome
Tuesday: RGI, MPI, ARI
weekly (2-8 day lag)
🔄
TD Update
(temporal difference / Bellman backup)
the learning engine
Q ← Q + α[r + γ·maxQ' − Q]
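The update rule on this card, Q ← Q + α[r + γ·max Q′ − Q], as a one-line function. α and γ are illustrative values; the reward r would be derived from the weekly STR signal (e.g. RGI relative to target):

```python
def td_update(q, r, max_q_next, alpha=0.1, gamma=0.95):
    """One TD(0) / Bellman backup step on a single Q-value."""
    return q + alpha * (r + gamma * max_q_next - q)

# nudge the DBA-45 q_rgi of 0.82 toward the TD target
q_new = td_update(q=0.82, r=0.18, max_q_next=0.88)  # ≈ 0.8396
```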
🔬
Backtest Calibration
offline weight optimization
~140K snapshots × 3 metrics · data already exists
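A toy version of the calibration loop: replay historical snapshots and grid-search metric weights (constrained to sum to 1) that minimise squared error between the blended score and the terminal outcome. The data, objective, and grid are stand-ins for the ~140K-snapshot replay:

```python
import itertools
import random

random.seed(1)
# synthetic replay set: outcome is a noisy blend with true weights (0.6, 0.3, 0.1)
snaps = [(random.random(), random.random(), random.random()) for _ in range(200)]
snaps = [(r, m, a, 0.6 * r + 0.3 * m + 0.1 * a + random.gauss(0, 0.01))
         for r, m, a in snaps]

def mse(w):
    """Mean squared error of the blended score against the terminal outcome."""
    wr, wm, wa = w
    return sum((wr * r + wm * m + wa * a - y) ** 2
               for r, m, a, y in snaps) / len(snaps)

grid = [g / 10 for g in range(11)]                  # 0.0 … 1.0 in 0.1 steps
simplex = [w for w in itertools.product(grid, repeat=3)
           if abs(sum(w) - 1.0) < 1e-9]             # keep weights summing to 1
best = min(simplex, key=mse)                        # recovers ≈ (0.6, 0.3, 0.1)
```

In production the search would be per-DBA (yielding the w(dba) curves) and the objective tied to realised STR lift rather than a squared error on a toy blend.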