On 2 August this year, European legislation on artificial intelligence (the AI Act) moved into its next phase. New rules took effect for providers of general-purpose AI (GPAI) models such as ChatGPT, Gemini, or Claude, which must now meet requirements for transparency and documentation. The AI Act will become almost fully applicable (including the provisions for high-risk AI systems) on 2 August 2026, setting an ambitious standard for making AI trustworthy and safe for Europeans. But the foundation may be shaky. The faster AI develops, the less explainable it becomes and the less certain we can be that it remains aligned with human values. Law can draw boundaries, but it cannot guarantee that machines will not cross them.
At the heart of both the problem and the solution is alignment – the effort to ensure that AI not only follows instructions but also pursues what we actually want. A properly aligned model is helpful, truthful, and safe. A misaligned one can deceive, manipulate, or pursue undesirable strategies. Put simply, alignment is an invisible contract between us and the model – a set of training methods and safeguards that makes it do what we truly want and refrain from what could harm us, even when it is capable of doing it. The American company Anthropic has shown that its model Claude can feign obedience in the lab while acting differently out of sight (so-called “alignment faking”). Beyond Claude, Anthropic also tested the limits of several other models, such as ChatGPT, Gemini, DeepSeek, and Grok. Across these models it identified a phenomenon called agentic misalignment: a model acting as an autonomous agent deliberately chooses harmful steps (for example, blackmail or leaking internal information) to protect its goals or avoid being shut down. All these cases arose in controlled company simulations, but they are an important warning.
Politicians and lawmakers are betting on “explainability,” yet AI is gradually drifting away from our naturally readable language and developing one of its own, so-called “neuralese” – a purpose-built machine language whose meaning we infer more from behavior than from words. Neuralese is not harmful in itself. It becomes dangerous when agents coordinate through messages we cannot decode or independently verify. A good analogy is the signaling between the brain’s hemispheres: it enables coordination, but it is not made of sentences. As models grow stronger, the complexity of this inner code increases, making practical explainability harder (even though interpretability tools are advancing). That makes it all the more important to rely on alignment rather than merely on post-hoc explanations.
The EU can take the lead. If it connects the legally binding AI Act and the voluntary Code of Practice with targeted investments in alignment, mechanistic interpretability of AI, and independent safety testing, it can turn the phrase “trustworthy AI” into a real, thoroughly practical standard. Even if the EU has dozed off technologically, it is still not too late. In an environment where other regions often rely on voluntary, fragmented, or even nonexistent frameworks, Europe could fill the regulatory gap and become the global benchmark for trustworthy artificial intelligence. But this will not happen without major investment. The EU currently lags far behind the US and China. One example is the planned OpenEuroLLM model, for which the EU has allocated a budget of €37.4 million, while OpenAI alone speaks of investments in the hundreds of billions of dollars. With a European budget like that, it is impossible to compete across the board; at best, Europe can focus on a few targeted projects aimed at bolstering the explainability and trustworthiness of AI. The European InvestAI initiative, which aims to mobilize €200 billion, is promising, but the first partial disbursements through a dedicated €20-billion fund and the start of construction of AI infrastructure are unlikely to come before next year.
The uncomfortable truth is that AI will not wait. According to expert forecasts, by 2027 models may be able to develop their own successors. If we have not solved alignment by then, regulation will remain an empty promise. If the first generations are even slightly misaligned, their flaws will be copied and amplified. It is like teaching your child bad habits and then naively believing that one day they will raise your grandchildren well. The future of AI will be decided not only in Brussels or Washington, but in the laboratories where it will soon become clear whether we can teach machines our values. If we can, AI may become the most powerful engine of progress. If not, it will become the most dangerous invention in human history.
Author: Pavol Szabo, Member of the Board of EFDPO
Senior Managing Associate/Attorney, Deloitte Legal Slovakia
The article was originally published in Czech in the economic daily Hospodářské noviny.