During the recent Lunar New Year holidays, Dhillon Andrew Kannabiran, the founder of Hack In The Box and a longtime figure in the global security community, drew attention to an experimental project that could reshape how cybersecurity skills are taught and tested.
The concept, known as LevelUp, is a self-evolving Capture The Flag platform powered by agentic AI. Rather than depending on human organizers to manually design, validate and periodically refresh challenge libraries, the system uses a chain of AI agents to govern the entire lifecycle of a challenge — from its initial design to testing, deployment, calibration, scoring and ongoing refinement.
At the time of Dhillon’s observations, the platform hosted nearly 300 active challenges across more than 30 categories, including web security, cryptography, binary exploitation, smart contracts, reversing, forensics, OSINT, API security and AI security. Each challenge runs inside its own isolated Docker container, offering participants something closer to real infrastructure than the simulated exercises that have traditionally defined much of cyber training.
That distinction matters. In offensive and defensive security alike, realism is often the difference between theory and competence.
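
To make the containerized design concrete, the snippet below sketches how a platform of this kind might launch one isolated container per player using the Docker SDK for Python; the image name, port mapping and resource limits are illustrative assumptions, not details of how LevelUp actually provisions its environments.

    # A minimal sketch of per-challenge isolation using the Docker SDK for Python
    # (docker-py). Image name, port mapping and limits are illustrative assumptions.
    import docker

    client = docker.from_env()

    def launch_challenge(image: str, player_id: str):
        """Start one isolated container per player for a single challenge."""
        return client.containers.run(
            image,
            name=f"chal-{player_id}",   # one namespaced container per player
            detach=True,                # run in the background
            ports={"80/tcp": None},     # expose the service on a random host port
            mem_limit="256m",           # cap memory so one instance cannot starve others
            pids_limit=128,             # bound the number of processes inside
        )

    # Hypothetical image name, used only for illustration.
    container = launch_challenge("levelup/web-sqli:latest", "player42")
    print(container.short_id)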
From Static Repositories to Adaptive Ecosystems
For years, most CTF platforms have followed a familiar model. Challenges are authored manually by experts, validated by organizers and released in cycles. Difficulty is curated through human judgment, and content libraries are refreshed only as fast as authors can produce them.
That system has served the cybersecurity community well, particularly in cultivating problem-solving discipline and technical depth. But it has also faced limits. The threat landscape evolves quickly, practitioner demand has grown, and maintaining fresh, high-quality training environments at scale remains labor-intensive.
The promise of an adaptive platform is that it treats training not as a fixed archive, but as a responsive ecosystem. Difficulty can shift with real-world solve times. Challenge distribution can adapt to user behavior. Environments can be generated and tested continuously rather than released in periodic batches.
One of the more notable features of the platform is its category-specific Elo rating system, which tracks a player’s growth within individual domains rather than compressing all performance into a single global score. An in-browser terminal connects directly to containerized environments, allowing hands-on interaction without leaving the platform. In practice, that design more closely resembles real-world security workflows, where specialists are judged not merely by general aptitude, but by domain-specific capability.
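
Per-category ratings lend themselves to a conventional Elo update applied within each domain. The sketch below assumes the standard Elo formula; the starting rating, K-factor and category names are placeholders for illustration, not values the platform is known to use.

    # A minimal sketch of a per-category Elo update, assuming the conventional
    # Elo formula. The K-factor, starting rating and categories are illustrative.
    def expected_score(player_rating: float, challenge_rating: float) -> float:
        """Expected probability that the player solves the challenge."""
        return 1.0 / (1.0 + 10 ** ((challenge_rating - player_rating) / 400))

    def update_category_elo(ratings: dict, category: str, challenge_rating: float,
                            solved: bool, k: float = 32.0) -> dict:
        """Adjust only the rating for the category the challenge belongs to."""
        current = ratings.get(category, 1200.0)          # assumed starting rating
        expected = expected_score(current, challenge_rating)
        actual = 1.0 if solved else 0.0
        ratings[category] = current + k * (actual - expected)
        return ratings

    # Example: a solve in "web" moves only the web rating; "crypto" is untouched.
    player = {"web": 1300.0, "crypto": 1450.0}
    update_category_elo(player, "web", challenge_rating=1400.0, solved=True)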
Inside the Agentic Pipeline
What distinguishes the platform most clearly is its orchestration model.
According to Dhillon’s description, multiple AI agents collaborate in sequence. A designer agent creates the challenge and supporting infrastructure. A validator agent builds and tests the container environment. A calibrator agent estimates difficulty using a hybrid of rule-based reasoning and large language model analysis. A smoke-test agent confirms that the challenge compiles and that the flag can in fact be extracted. A quality-scoring agent evaluates structural soundness. Finally, an evolution agent reviews player-performance data nightly.
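
A heavily simplified sketch of that kind of sequential hand-off appears below. The stage names mirror Dhillon’s description, but the stub logic, field names and example values are illustrative assumptions rather than LevelUp’s actual code.

    # A minimal sketch of a sequential agent pipeline like the one described above.
    # Stage names follow the article; the stub bodies and fields are illustrative.
    def design(challenge: dict) -> dict:
        # A real designer agent would call an LLM to draft the challenge,
        # its Dockerfile and the intended flag; placeholders stand in here.
        challenge.update(source="...", dockerfile="FROM alpine", flag="FLAG{example}")
        return challenge

    def validate(challenge: dict) -> dict:
        # Would build the container image and exercise the environment.
        challenge["validated"] = True
        return challenge

    def calibrate(challenge: dict) -> dict:
        # Would blend rule-based heuristics with LLM judgment; a fixed value here.
        challenge["difficulty"] = 1200.0
        return challenge

    def smoke_test(challenge: dict) -> dict:
        # Would solve the challenge end to end and confirm the flag is extractable.
        challenge["solvable"] = challenge["flag"].startswith("FLAG{")
        return challenge

    def score_quality(challenge: dict) -> dict:
        # Would assess structural soundness; a placeholder score here.
        challenge["quality"] = 0.9
        return challenge

    PIPELINE = [design, validate, calibrate, smoke_test, score_quality]

    def run_pipeline(name: str) -> dict:
        challenge = {"name": name}
        for stage in PIPELINE:
            challenge = stage(challenge)   # each agent hands its output to the next
        return challenge

    print(run_pipeline("intro-web-01"))    # hypothetical challenge name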
That data can then trigger mutations, recalibrations and difficulty adjustments, meaning the challenge environment changes over time in response to actual use. In theory, the more practitioners engage with the system, the more refined it becomes.
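
One plausible shape for such a nightly recalibration, using solve rate as the signal, is sketched below; the target band, step size and figures in the example are assumptions for illustration, not documented platform parameters.

    # A minimal sketch of nightly difficulty recalibration from observed solves.
    # The target band, step size and example figures are illustrative assumptions.
    def recalibrate(rating: float, attempts: int, solves: int,
                    target_low: float = 0.3, target_high: float = 0.6,
                    step: float = 50.0) -> float:
        """Nudge a challenge's difficulty rating when its solve rate drifts
        outside the target band."""
        if attempts == 0:
            return rating              # no play data yet, leave the rating alone
        solve_rate = solves / attempts
        if solve_rate > target_high:
            return rating - step       # solved more often than expected: easier than rated
        if solve_rate < target_low:
            return rating + step       # rarely solved: harder than rated
        return rating

    # Example nightly pass over (rating, attempts, solves) per challenge.
    stats = {"intro-web-01": (1200.0, 40, 30), "heap-pwn-07": (1500.0, 25, 2)}
    updated = {name: recalibrate(r, a, s) for name, (r, a, s) in stats.items()}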
This is part of a broader technological shift. Agentic AI is moving beyond assistive roles — drafting text, generating snippets, automating repetitive tasks — and into orchestration. It is beginning to govern design, testing, analytics and iteration across complex technical systems. That transition is already visible in software engineering, cloud operations and cybersecurity automation. Training environments may be next.
The Questions That Still Matter
Even so, the model raises important questions.
Cybersecurity education is not simply the production of technically solvable puzzles. At its best, it develops structured thinking, investigative discipline and decision-making under uncertainty. Whether AI-generated challenge systems can reliably embed those deeper pedagogical principles remains unresolved.
There are also quality concerns. Autonomous systems must be rigorously validated. They must prevent unintended shortcuts, avoid fragile or accidental artifacts, and ensure that vulnerabilities are deliberate teaching mechanisms rather than side effects of automated generation. A living platform may be powerful, but it also introduces the risk of drift — away from educational intention and toward technical novelty for its own sake.
Still, the experiment points toward an old problem that has never been fully solved: how to deliver scalable, high-quality, continuously updated practical training without exhausting human challenge authors. If refined responsibly, a self-evolving system could offer a partial answer.
That is why projects like LevelUp are drawing attention. They suggest that the next generation of cyber training environments may not simply be built and deployed. They may be built, observed and reshaped in real time.
For inquiries regarding LevelUp or related training and security solutions, readers may connect with Algoritha Security at triveni@algoritha.in.
