How MemRL Works — Complete Architecture v4

TurnkeyHMS RM RL Pipeline · 6 hotels · ~365 stay dates · up to 365 DBA snapshots each · sequential retrieval with multi-Q scoring
Stages: Data & Inference · Pass 1: Context filter · Pass 2: Shape matching · Q-Learning · Offline calibration
SEQUENTIAL RETRIEVAL PIPELINE
query → ~300 context-filtered → ~20 shape-similar → top 3
Tuesday: STR write-back to q_rgi / q_mpi / q_ari
Offline: replay ~140K snapshots → optimal w(dba)

📊 Live Booking State
Example query: SGI · DBA 45 · 62% occ · search intent 2× LY · comp rate $229
139 state-vector columns · source: BQ pull

🔍 Pass 1: Context Filter (runs first in the pipeline)
"What situation am I in?" · kNN over DBA, search intent, comp rate, hotel, season
22,056 snapshots → ~200-500 candidates

🗄️ Memory Bank
22,056 snapshots in a Neo4j graph store · ~365 stay dates × up to 365 DBA each
Example node: SGI Mar '25 @ DBA 45 · q_rgi = 0.82 · q_mpi = 0.74 · q_ari = 0.88
Terminal outcome: RGI 118 · MPI 105 · ARI 112

📐 Pass 2: DTW Shape Matching
Dependent multivariate DTW (4 dimensions): occ + rate + velocity pacing shapes
~300 candidates → ~20 shape-similar snapshots

⚖️ Q-Value Module (Pass 3: utility re-rank)
rank() runs in real time · update() runs Tuesday
utility = w_rgi(dba)·q_rgi + w_mpi·q_mpi + w_ari·q_ari

🧠 Foundation LLM
Gemini / Claude / Llama · interchangeable vendor · weights never change
Its input improves as the Q-values learn over the weeks

💰 AI Rate Rec
Raise / Hold / Lower + magnitude · decision → human review → Opera

📈 STR Outcome
Tuesday weekly feed: RGI, MPI, ARI (2-8 day lag)

🔄 TD Update (temporal difference / Bellman backup) · the learning engine
Q ← Q + α[r + γ·max Q′ − Q]

🔬 Backtest Calibration
Offline weight optimization: ~140K snapshots × 3 metrics
The data already exists: ~140K observations
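The Pass 1 context filter ("what situation am I in?") can be sketched as plain kNN over normalized state features. This is a minimal illustration, not the production implementation: the function name `context_filter`, the z-scored Euclidean distance, and the toy 4-column state are all assumptions, while the real state vector has 139 columns and unspecified scaling.

```python
import numpy as np

def context_filter(query, bank, k=300):
    """Pass 1 sketch: kNN over z-scored state features (DBA, search
    intent, comp rate, season, ...) keeping the ~300 snapshots most
    similar to the live booking state. Feature set and scaling are
    illustrative assumptions."""
    mu = bank.mean(axis=0)
    sigma = bank.std(axis=0) + 1e-9          # avoid division by zero
    d = np.linalg.norm((bank - query) / sigma, axis=1)
    return np.argsort(d)[:k]                 # indices of the k nearest snapshots

# Toy usage: 1,000 random snapshots with 4 state features, keep nearest 300.
rng = np.random.default_rng(0)
bank = rng.normal(size=(1000, 4))
idx = context_filter(bank[0], bank, k=300)
```

Because the query here is itself a bank row, it comes back as its own nearest neighbor, which is a quick sanity check on the distance computation.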
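Pass 2's "dependent DTW" means all channels (occupancy, rate, velocity, ...) share a single warping path, with the local cost taken across channels at once. A minimal sketch, assuming Euclidean local cost and a basic O(n·m) dynamic program with no warping-window constraint:

```python
import numpy as np

def dependent_dtw(a, b):
    """Dependent multivariate DTW between two (T, D) pacing curves:
    one warping path for all D channels, Euclidean local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # across all channels
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy 3-channel pacing curves: identical curves match perfectly,
# a vertically shifted copy accumulates positive cost.
t = np.linspace(0, 1, 30)
curve = np.stack([np.sin(2 * np.pi * t), t, np.cos(2 * np.pi * t)], axis=1)
d_same = dependent_dtw(curve, curve)
d_shift = dependent_dtw(curve, curve + 0.5)
```

In production the ~300 Pass 1 candidates would each be scored this way and the ~20 lowest-distance snapshots kept.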
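The Pass 3 utility re-rank scores each shape-similar snapshot with the stated blend w_rgi(dba)·q_rgi + w_mpi·q_mpi + w_ari·q_ari and keeps the top 3. A sketch, assuming the per-DBA weights have already been resolved for the query's DBA; the weight values and the `rerank` name are illustrative:

```python
def rerank(candidates, w_rgi, w_mpi, w_ari, top=3):
    """Pass 3 sketch: sort candidates by blended Q-value utility,
    highest first, and keep the top 3 for the LLM context."""
    scored = sorted(
        candidates,
        key=lambda c: w_rgi * c["q_rgi"] + w_mpi * c["q_mpi"] + w_ari * c["q_ari"],
        reverse=True,
    )
    return scored[:top]

# Candidates including the memory-bank example (SGI Mar '25 @ DBA 45);
# the other rows and the 0.5/0.3/0.2 weights are made up for illustration.
cands = [
    {"id": "SGI Mar'25 DBA45", "q_rgi": 0.82, "q_mpi": 0.74, "q_ari": 0.88},
    {"id": "other-1",          "q_rgi": 0.40, "q_mpi": 0.90, "q_ari": 0.30},
    {"id": "other-2",          "q_rgi": 0.70, "q_mpi": 0.60, "q_ari": 0.65},
    {"id": "other-3",          "q_rgi": 0.10, "q_mpi": 0.20, "q_ari": 0.15},
]
top3 = rerank(cands, w_rgi=0.5, w_mpi=0.3, w_ari=0.2)
```

`rank()` would run this path in real time; `update()` only touches the stored q-values on Tuesdays, so ranking stays cheap.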
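The TD update Q ← Q + α[r + γ·max Q′ − Q] is a one-line learning step: nudge the stored Q-value toward the observed STR reward plus the discounted best next-state value. A sketch with illustrative α, γ, and reward; the calibrated production values are not given in this document:

```python
def td_update(q, r, q_next_max, alpha=0.1, gamma=0.9):
    """Tuesday learning step, Q <- Q + alpha * (r + gamma * max Q' - Q).
    alpha (learning rate) and gamma (discount) are assumed values."""
    return q + alpha * (r + gamma * q_next_max - q)

# e.g. updating the stored q_rgi = 0.82 after a strong RGI week
# (reward and next-state value below are made up for illustration):
q_new = td_update(0.82, r=1.0, q_next_max=0.88)
```

When the target r + γ·max Q′ equals the current estimate, the update leaves Q unchanged, which is the fixed point the weekly STR write-backs converge toward.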
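The backtest calibration replays historical snapshots offline to choose the metric weights. One way to sketch it, assuming a grid search over simplex weights for a single DBA bucket and correlation with realized STR outcomes as the objective (the real objective and optimizer are not specified in this document):

```python
import itertools
import numpy as np

def calibrate_weights(q, outcomes, grid=np.linspace(0, 1, 11)):
    """Offline calibration sketch: grid-search (w_rgi, w_mpi, w_ari)
    on the unit simplex, scoring each blend by how well the resulting
    utility correlates with realized STR outcomes on the replay set."""
    best_w, best_score = None, -np.inf
    for w_rgi, w_mpi in itertools.product(grid, grid):
        w_ari = 1.0 - w_rgi - w_mpi
        if w_ari < 0:
            continue                                  # stay on the simplex
        utility = q @ np.array([w_rgi, w_mpi, w_ari])
        score = np.corrcoef(utility, outcomes)[0, 1]
        if score > best_score:
            best_w, best_score = (w_rgi, w_mpi, w_ari), score
    return best_w, best_score

# Toy replay set (500 rows standing in for the ~140K snapshots):
# outcomes are driven mostly by q_rgi, so calibration should put
# most of the weight on the RGI column.
rng = np.random.default_rng(1)
q = rng.random((500, 3))                              # columns: q_rgi, q_mpi, q_ari
outcomes = 0.8 * q[:, 0] + 0.1 * q[:, 1] + 0.1 * q[:, 2] + rng.normal(0, 0.02, 500)
w, score = calibrate_weights(q, outcomes)
```

Run per DBA bucket, this yields the w(dba) curves the real-time `rank()` call consumes, and because the ~140K observations already exist, the whole step is a batch job rather than live exploration.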