The EU and its member states lack quantifiable data on cyber conflict that can guide EU cyber diplomacy. But data is necessary to answer even basic questions, such as whether cyber conflict is getting better or worse and whether the EU cyber posture has the desired effect of reducing cyber operations or their damaging effects. A new European research project, the European Repository of Cyber Incidents, tries to fill this gap by providing a rich database and a dashboard to visualise cyber operations against the EU.
Today, most EU countries apply a case-by-case logic to the analysis of cyber incidents: they are assessed individually without comparison to other incidents. Databases that collect all cyber operations against a country or the EU are either lacking or shrouded in secrecy. But without quantifiable data about cyber conflict, it is hard to answer questions such as: Was NotPetya more harmful than WannaCry? Was the Bundestag hack in Germany in 2015 more severe in impact than the interference in the 2017 French presidential elections? Quantifiable data provides a common yardstick to compare incidents.
A common yardstick for classifying cyber incidents is necessary to design a coherent policy response. This problem is particularly pressing for the effective application of the EU Cyber Diplomacy Toolbox. It wouldn’t make sense to activate restrictive measures in response to a cyber attack that was less severe than another cyber attack that was not answered with sanctions. Double standards, in the form of different policy responses to comparable incidents, would create incoherence. Yet this is the current state of play: the EU imposed sanctions in response to the hack of the German Bundestag of 2015 – a case of cyber espionage with no direct impact on the political process – but did not act in the same way in response to the Russian interference in the French election in 2017. Arguably, the latter incident had a much more direct effect and thus could have warranted a more substantial response, had the French government been willing to pursue this. Without a shared situational awareness based on basic data on cyber conflict, it is impossible to come to a coherent political and legal assessment of cyber incidents in the EU or a common stance on public attribution.
The state of the art on cyber conflict data
Different research groups have attempted to address the problem of data in the study of cyber conflicts. The US Cyberspace Solarium Commission recommended the creation of a US Bureau of Cyber Statistics. Researchers Brandon Valeriano and Ryan Maness created the Dyadic Cyber Incident and Campaign Data (DCID), an early and impressive data set on cyber conflict dyads, i.e. cyber interactions between parties with a conflicted history. The Council on Foreign Relations (CFR) runs the handy and easy-to-use Cyber Operations Tracker. CSIS created an expansive cyber incident list. The CyberPeace Institute drafted a timeline of cyber operations within the context of the Russian war in Ukraine. There have also been initiatives by the private sector, like the Microsoft-funded Cyber Conflict Factbook 2020 or the Kaspersky real-time cyber threat map. Having intensively studied these projects, we find that they provide valuable insights but still leave much to be desired.
First, projects by Kaspersky, the CyberPeace Institute, CSIS and the CFR offer only a glimpse into cyber operations. While these projects are visually appealing and easy to use, they had to compromise on the richness of the available data. They lack many data points that are relevant for academics and policymakers, such as variables on the impact, scale and effect of cyber operations, which in turn are relevant for the application of international law and possible countermeasures. They neither track the attribution status of these incidents nor provide information about technical characteristics, like the use of 0-day exploits or the sophistication of operations. Most data sets also lack relevant context data, for example on different kinds of attackers beyond states and Advanced Persistent Threats. This is problematic given the rise of hacktivism during the Russian war in Ukraine. Similarly, many details regarding the victim categories are not tracked. It is not always clear, for example, whether critical infrastructure was affected and which sectors the cyber operations targeted. This is highly relevant for a political assessment of and reaction to cyber incidents, given that critical infrastructure is at the core of UN-led discussions about responsible state behaviour in cyberspace.
Second, some of these projects are not clear about what data and cyber incidents they include. Some don’t offer scientific criteria for data inclusion, i.e. which incidents are added to the lists and which are omitted. In other words, what makes a significant cyber incident so significant? The methodology for the inclusion of specific incidents is not always clear, even though we know that a selection happens at some stage: there are millions of cyber incidents per year, most of which cannot be included in datasets. This is particularly problematic for policymakers, who need to react to the most critical and impactful cyber operations. For policy, it is necessary to differentiate the signal from the noise.
Third, all data sets focus only on the early stages of the lifecycle of a cyber operation, namely a description of the initial access to a target and, in some instances, statements on effects. First- and second-order effects are often lacking. Most projects remain silent on legal questions, such as which bodies of international law were violated or whether countermeasures could be warranted. Most datasets also remain silent about the process of attribution and the political reaction to cyber incidents, like the adoption of cyber sanctions or indictments. We argue that the analysis of cyber operations requires a multidisciplinary view including computer science, political science and law.
Fourth, except for Kaspersky’s threat map, these projects mostly provide “dumb data” in the form of Excel sheets or lists of text. Data is rarely interactive or computable, and not always usable in politics or science. There is little one can do with these datasets. More user-friendly projects like the CFR tracker offer only a few ways to interact with the data. The highly complex DCID allows users to do statistical calculations themselves, but it is hardly user-friendly. What is entirely lacking is trend data and the comparison between different incidents in terms of victims, attribution or impact. In other words, there is a clash between the richness of data and ease of use.
Presenting the European Repository of Cyber Incidents
To address these problems, a group of researchers from European universities and think tanks launched a new cyber conflict dataset, the European Repository of Cyber Incidents (EuRepoC). It features data on more than 1400 different cyber operations worldwide, reaching back to 2000. This makes EuRepoC one of the largest available datasets. With the help of data mining, machine learning and natural language processing, data on new cyber incidents is collected daily and added to the database. An iterative re-coding loop ensures that incidents are updated once new information, for example on attribution, arrives. The database itself includes cyber operations with political significance (e.g. in terms of damage, impact or targeted entities) that are thus relevant for a policy response.
Cyber incidents are classified and coded with over 60 different variables that reflect an interdisciplinary approach. Political categories include characteristics of targets (i.e. sector, critical infrastructure, damage and effects), attackers (i.e. states, proxies, non-state actors, and APT code names), attribution information (i.e. who attributed the attack to whom and in which form) and policy responses to attacks (i.e. sanctions or other diplomatic measures). Additionally, the database includes data on the legal dimension of cyber operations: what type of legal response followed an attack (e.g. indictments, sanctions), what areas of international law were affected and invoked by responding states, and whether or not legal countermeasures could be warranted. Lastly, the dataset includes technical variables derived from the MITRE ATT&CK framework that are useful for the IT security community: what the initial access vectors were, whether 0-days were used, what the technical impact of an attack was (disruption, destruction or physical effects) and many more. This allows for drawing holistic conclusions that are relevant to a variety of stakeholders: policymakers, the IT security community, academia and civil society.
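To make the coding scheme described above more concrete, the following is a minimal sketch of how a single coded incident could be represented, with one small group of fields for each of the political, legal and technical dimensions. The class name, field names and example values are our own illustrative assumptions and do not reproduce the actual EuRepoC codebook or its variable names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IncidentRecord:
    """Hypothetical record layout; field names are illustrative only."""

    name: str
    year: int
    # Political variables (targets, attackers, attribution, response)
    target_sectors: List[str] = field(default_factory=list)
    critical_infrastructure: bool = False
    attacker_type: Optional[str] = None     # e.g. "state", "proxy", "non-state"
    attributed_by: Optional[str] = None     # who made the attribution, if anyone
    policy_responses: List[str] = field(default_factory=list)
    # Legal variables
    legal_responses: List[str] = field(default_factory=list)
    # Technical variables
    initial_access: Optional[str] = None
    zero_day_used: Optional[bool] = None    # None means "unknown"
    technical_impact: Optional[str] = None  # e.g. "disruption", "destruction"

    def is_attributed(self) -> bool:
        """True if any party has attributed the incident."""
        return self.attributed_by is not None

    def had_response(self) -> bool:
        """True if at least one policy or legal response was recorded."""
        return bool(self.policy_responses or self.legal_responses)


# Placeholder values, not a real incident from the database.
example = IncidentRecord(
    name="Example incident A",
    year=2020,
    target_sectors=["energy"],
    critical_infrastructure=True,
    attacker_type="state",
    attributed_by="victim government",
    policy_responses=["sanctions"],
)
print(example.is_attributed(), example.had_response())
```

Keeping unknown values explicit (here as `None`) rather than defaulting them to `False` is one possible design choice: it lets a later re-coding step fill in attribution or technical details once new information arrives, without confusing "not yet known" with "did not happen".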
The variables were derived both from academic research and from a detailed analysis of the other projects discussed above. But the project significantly expands the focus. EuRepoC is designed to contribute to the EU’s cyber diplomacy: its core focus is on Europe, although it offers data on global incidents as well. Council Regulation (EU) 2019/796 includes a long list of criteria that are useful for a legal categorisation and thus comparison of cyber incidents, such as different attack types, or what determines a significant cyber operation that could warrant a diplomatic response (like the number of affected organisations, the economic loss and more). EuRepoC adopts these criteria as variables and collects data on them. In other words, it makes it easy to compare which cyber operations affect which elements of crime in the legal sense. The aim is to allow a better policy response under the EU Cyber Diplomacy Toolbox. The EuRepoC dataset is therefore one of the most expansive data collections on cyber conflict available: it covers the entire lifecycle of cyber operations, from initial access to attribution and the resulting political action.
This project aims to empower European research on cyber conflict. Beyond collecting data on cyber conflict to fill large research gaps, the research collective also aims to turn data into action. To this end, the dataset is designed to be as user-friendly and accessible as possible. Anyone can interact with the data through a dashboard that provides insights in the form of powerful graphics, including a map displaying the distribution of cyber attacks, timelines showing trends in the most frequently used types of cyber operations, and an overview of the types of targets. A table view allows the user to gain detailed insights into individual cyber operations and their characteristics. In addition, the entire raw data can be downloaded for free, which allows scholars to perform their own statistical analyses. By publishing the full data, the aim is to give policymakers and scientists the tools they need. More importantly, however, the ambition of this new initiative is also to nurture cyber resilience in civil society by promoting scientific interdisciplinarity, a whole-nation approach and civil-society expertise concerning cyber conflict. Our goal is that everybody can look at the data, use it and provide feedback as necessary.
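As an illustration of the kind of trend analysis that downloadable raw data makes possible, the sketch below counts incidents per year and per incident type using only the Python standard library. The column names (`start_year`, `incident_type`) and the sample rows are invented placeholders, not the real export format or real data; an actual analysis would read the downloaded file instead of the inline string.

```python
import csv
import io
from collections import Counter

# Placeholder rows standing in for a downloaded export; all values are invented.
SAMPLE_CSV = """incident_id,start_year,incident_type
1,2019,data theft
2,2019,disruption
3,2020,data theft
4,2020,data theft
5,2021,disruption
"""


def incidents_per_year(csv_text, year_col="start_year"):
    """Count incidents per year, returned as a dict sorted by year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(int(row[year_col]) for row in reader)
    return dict(sorted(counts.items()))


def type_counts_by_year(csv_text, year_col="start_year", type_col="incident_type"):
    """Count incidents for each (year, incident type) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((int(row[year_col]), row[type_col]) for row in reader)


yearly = incidents_per_year(SAMPLE_CSV)
by_type = type_counts_by_year(SAMPLE_CSV)
print(yearly)
print(by_type[(2020, "data theft")])
```

Simple counts like these are only a starting point. Because the same columns are available for every incident, the same approach extends to comparisons by target sector, attribution status or policy response.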
EuRepoC is a research consortium composed of the Institute for Political Science at the University of Heidelberg (Germany), the Department of Legal Theory and Future of Law at the University of Innsbruck (Austria), the German Institute for International and Security Affairs, SWP (Germany) and the Cyber Policy Institute (Estonia). The consortium is currently funded by the German and Danish Ministries of Foreign Affairs.