
LLM PLAYGROUND: CONTEXT WINDOW (PT. 7)

Silas Liu - Apr. 19, 2025

Large Language Models, Graphs

Lately we have seen LLMs released with increasingly large context windows, ranging from 1 to 10 million tokens. To explore the potential of this capability, I ran a series of experiments using GPT-4.1 in the complex universe of the game Elden Ring. With a dataset of over 360k tokens, including all character dialogues and all item descriptions, I tested how the model handles dense, interconnected narratives.


Large context windows introduce a new paradigm for LLM application development, as they directly affect key concepts such as vector databases, memory and RAG techniques. However, the true impact of this evolution is still unfolding and will become clearer over time.

Recently, more and more LLMs have been released with increasingly large context windows. Currently, most widely used models support windows of 128,000 tokens, but some of the latest models handle anywhere from 1 to 10 million tokens.


Two of the newest models, Google's Gemini 2.5 and OpenAI's GPT-4.1, were released on March 25th, 2025 and April 14th, 2025, respectively. Both support context windows of up to 1 million tokens. Meta also released Llama 4 on April 5th, 2025, with an impressive 10-million-token context window. Although Llama 4 is open source, it requires a large amount of computing power to run.


Below is a comparative infographic showing the context window sizes of some of the most recent models:

[Bar chart: context window sizes of recent models]

Before we go further, let's clarify what a context window is and why it matters for LLMs. The context window is the total amount of text that a model can handle in a single request, including both input and output. It is measured in tokens.


For non-technical readers, tokens can be roughly compared to syllables or chunks of words; that is how LLMs internally represent and process text. A 128k-token window is equivalent to around 100,000 words. For comparison, The Lord of the Rings books average about 160,000 words each, while the first Game of Thrones book has around 300,000 words.
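To make this concrete, here is a small sketch of counting tokens with tiktoken, OpenAI's open-source tokenizer library. The encoding name below matches GPT-4-era models; the exact tokenizer varies by model, so treat the counts as approximations.

```python
# Count tokens the way an OpenAI-style model would see them.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era encoding

text = "The Lands Between, ruled by Queen Marika the Eternal."
tokens = enc.encode(text)
print(len(tokens))         # number of tokens in the text
print(enc.decode(tokens))  # decoding round-trips to the original string

# The window must fit input and output together, e.g. for a 128k model:
max_output_budget = 128_000 - len(tokens)
```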


Because of these context limits, working with multiple sources or large documents often requires more complex architectures. That includes the use of vector databases and RAG (Retrieval-Augmented Generation) systems, where the LLM dynamically retrieves relevant parts of a larger corpus to include in its input when generating the response.
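As a rough illustration of the retrieval step, here is a minimal sketch. The embed() function stands in for any embedding model (for example, an embeddings API) and is an assumption, not a specific library call.

```python
# Minimal RAG-style retrieval: rank pre-embedded chunks by cosine
# similarity to the query and keep only the top k for the prompt.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query, chunks, chunk_embeddings, embed, k=3):
    """Return the k chunks most similar to the query."""
    q = embed(query)  # embed() is assumed: text -> vector
    scores = [cosine(q, e) for e in chunk_embeddings]
    best = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in best]
```

The retrieved chunks are then concatenated into the prompt, so the model only ever sees the slice of the corpus that the retriever judged relevant.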


Additionally, when conversations involve memory, previous messages are also part of the context window. That means an LLM recalls past exchanges by receiving the full message history alongside the latest request.
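A minimal sketch of this pattern with the OpenAI Python SDK, with the model name and prompts as placeholders: the "memory" is just the accumulated message list, resent in full on every call.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message):
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(model="gpt-4.1", messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})  # history keeps growing
    return reply  # every past exchange counts against the context window
```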


As we can see, several key concepts in the LLM ecosystem, such as vector databases, memory and RAG, are all constrained by context window size. That is why these new models with much larger windows have the potential to change how LLM applications are designed. The real impact of this evolution should become more visible over time.


Curious to explore the performance of large context windows, I ran a few experiments using GPT-4.1 and the rich universe of the game Elden Ring. Elden Ring is a critically acclaimed fantasy RPG set in a vast and mysterious world. It was developed by FromSoftware and co-written by George R. R. Martin, the author of Game of Thrones. The game features over 200 characters and thousands of items with deep and detailed descriptions, all interwoven into multiple dense storylines.


I built a custom web scraper to extract relevant content from the game's wiki, including all character dialogues and descriptions of every item in the game: tools, ashes, crafting materials, bolstering materials, key items, sorceries, incantations, weapons, armor and talismans. From 2,400 pages, this yielded a dataset of around 1.5 million characters, which translates to roughly 360,000 tokens.
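The post doesn't include the scraper itself, so here is a hedged sketch of the general approach using requests and BeautifulSoup. The URL and CSS selector are placeholders; the real wiki's markup will differ.

```python
import requests
from bs4 import BeautifulSoup

def scrape_page(url):
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Keep only the main article body, dropping navigation and boilerplate.
    content = soup.select_one("#wiki-content-block")  # placeholder selector
    return content.get_text(separator="\n", strip=True) if content else ""

page_urls = ["https://example-wiki.com/Ranni+the+Witch"]  # placeholder list
corpus = "\n\n".join(scrape_page(url) for url in page_urls)
print(f"{len(corpus):,} characters scraped")
```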


Below is a graph representation of the full dataset. It helps illustrate the scale of the material. In each use case, the entire dataset was included in the context window along with the prompt.

[Graph visualization of the full Elden Ring dataset]

The first task was to ask the model to summarize the game's main storyline, identifying motivations, key characters and major events. This required the LLM to interpret scattered pieces of lore, connect different story arcs and reconstruct the sequence of events across different time periods in the game. The result was surprisingly detailed and coherent. A sketch of this single-request setup follows the token counts below.

Input tokens: 359,095 | Output tokens: 3,020 | Total tokens: 362,115
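For reference, a single full-context request like this can be sketched as follows with the OpenAI Python SDK. The file name, system prompt and question wording are my own illustrations, not the exact code used.

```python
from openai import OpenAI

client = OpenAI()

with open("elden_ring_corpus.txt", encoding="utf-8") as f:
    corpus = f.read()  # ~1.5M characters, roughly 360k tokens

response = client.chat.completions.create(
    model="gpt-4.1",  # 1-million-token context window
    messages=[
        {"role": "system", "content": "You are an expert on Elden Ring lore."},
        {"role": "user", "content": corpus
            + "\n\nSummarize the game's main storyline, identifying "
              "motivations, key characters and major events."},
    ],
)
print(response.choices[0].message.content)
print(response.usage)  # input/output token counts, as reported above
```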

Next, I gave the model a more complex task. This time, I asked it to generate an illustrative timeline, mapping out both the historical events and the in-game events in chronological order. So it wasn't just about extracting key information, but also about organizing it into a coherent and structured format. Once again, the result was very accurate and detailed; the model was even capable of extracting parallel events and multiple possible timelines/endings for the game. The timeline is interactive: you can zoom and pan to read it. A sketch of this structured-output approach follows the token counts below.

Input tokens: 359,242 | Output tokens: 1,530 | Total tokens: 360,772
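One way to get a timeline out of a model in a renderable form is to request structured output and parse it. The JSON schema and prompt below are illustrative assumptions, not the prompt actually used.

```python
import json

instruction = (
    "Using the lore above, list historical and in-game events in "
    "chronological order as a JSON array of objects with the fields "
    '"era", "event" and "branch" (branch marks mutually exclusive endings).'
)

# Assume `reply` holds the model's JSON answer to `corpus + instruction`:
reply = '[{"era": "The Shattering", "event": "The Rune of Death is stolen", "branch": null}]'
for e in json.loads(reply):
    suffix = f" [{e['branch']}]" if e["branch"] else ""
    print(f"{e['era']}: {e['event']}{suffix}")
```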

For the last task, I asked the model to analyze and extract the relationships between the main characters of the plot. This was a significantly more complex request, as it required the LLM to identify key interactions, alliances and conflicts between characters throughout the game. After obtaining the relevant information, I used the extracted triplets to build a visual representation of these relationships as a graph; a sketch of this step follows the token counts below. This is a great use case for graphs, showcasing how they can effectively represent complex networks of connections. The graph is also interactive: you can zoom and pan to read it.

Input tokens: 359,413 | Output tokens: 6,363 | Total tokens: 365,776
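Turning extracted (subject, relation, object) triplets into an interactive graph can be sketched with networkx plus pyvis, which is one way to get a zoomable rendering; the post doesn't say which library was actually used, and the triplets below are invented examples.

```python
import networkx as nx
from pyvis.network import Network

triplets = [
    ("Ranni", "defies", "The Greater Will"),  # illustrative examples only
    ("Blaidd", "serves", "Ranni"),
]

g = nx.DiGraph()
for subject, relation, obj in triplets:
    g.add_edge(subject, obj, label=relation)  # relation shown as edge label

net = Network(directed=True)
net.from_nx(g)
net.write_html("relationships.html")  # open in a browser: zoom and pan
```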

These three use cases demonstrate the potential of larger context windows in LLMs. With the ability to process and retain much more information in a single request, models can generate more coherent and nuanced responses, handle complex tasks, and offer more detailed insights. While techniques like RAG remain valuable for handling multiple datasets, the expanded context window opens up new possibilities for direct context processing, improving response accuracy and depth.


I intend to keep exploring more use cases and experimenting with even larger context windows.
