Abstract
Large language models (LLMs) demonstrate remarkable reasoning and generative capabilities but remain fundamentally limited by fixed context windows and the absence of persistent, structured memory. As a result, they struggle with long-term personalization, continual learning, and consistency across extended interactions. In this paper, we propose a modular memory-augmented architecture that integrates episodic and semantic memory with a consistency-aware retrieval mechanism and a learned memory-gating network. Episodic memory stores recent, high-fidelity interaction traces, while semantic memory maintains compressed, abstracted knowledge derived from historical experiences. A hierarchical compression module regulates memory growth by summarizing episodic content into semantic representations when capacity thresholds are exceeded. To retrieve relevant information, we introduce a consistency-aware scoring function that jointly balances semantic similarity, temporal relevance, and contradiction avoidance. The retrieved content is fused into the input context of a transformer-based LLM to support grounded and coherent generation. A lightweight gating network predicts the utility of storing new interactions, enabling selective memory updates and reducing catastrophic interference during continual learning. We show that this design supports long-horizon dialogue coherence, improved factual consistency, and adaptive personalization without requiring model retraining. The proposed framework is model-agnostic and can be integrated with existing retrieval-augmented generation pipelines. Our results highlight a practical pathway toward scalable long-term memory for LLMs, bridging the gap between stateless neural generation and persistent, structured knowledge systems.
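
As an illustrative sketch of the consistency-aware retrieval score mentioned above, the Python snippet below combines the three signals as a weighted sum. The component functions, the weights alpha, beta, and gamma, the exponential recency decay, and the external contradiction estimate are assumptions introduced for exposition only; they are not the exact formulation of this work.

    from dataclasses import dataclass
    import math
    import time

    @dataclass
    class MemoryItem:
        embedding: list      # vector representation of the stored memory
        timestamp: float     # time (seconds) at which the memory was written

    def cosine_similarity(a, b):
        # Standard cosine similarity between two equal-length vectors.
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def retrieval_score(query_embedding, item, contradiction_prob,
                        alpha=0.6, beta=0.2, gamma=0.2, decay_rate=1e-5):
        # Hypothetical consistency-aware score: reward semantic similarity and
        # recency, penalize likely contradiction with the current context.
        # contradiction_prob is assumed to come from an external NLI-style checker.
        semantic = cosine_similarity(query_embedding, item.embedding)
        recency = math.exp(-decay_rate * (time.time() - item.timestamp))
        return alpha * semantic + beta * recency - gamma * contradiction_prob

    # Example usage with a toy two-dimensional embedding stored one hour ago.
    score = retrieval_score([0.1, 0.3],
                            MemoryItem([0.2, 0.25], time.time() - 3600),
                            contradiction_prob=0.05)

A weighted linear combination is only one simple realization of such a score; in practice the weights could be fixed by validation or learned jointly with the gating network.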


