Contents
- Introduction
- Key trends and figures
- Consequences of data breaches for organizations
- Shadow market for data: general statistics
- Industries most frequently affected by data breaches
- Growth in the number of credential breaches
- Commercial (not) secrets: a spike in source code breaches
- Personal data: lower number, higher scale
- Better safe than sorry: government agencies under threat
- Methods used to attack organizations and steal confidential data
- Data storage locations attacked by hackers
- Weaknesses in data management processes
- How to protect data against being breached
- Conclusion
- About the report
Introduction
Confidential data breaches and leaks are the most common consequence of cyberattacks on organizations. According to our statistics, data was leaked in more than half of all successful attacks on organizations in 2023 and 2024.
Confidential data is of particular value because it gives companies competitive advantages in the market and helps organizations achieve their strategic goals. The unauthorized disclosure of information that's critical for organizations and their clients and counterparties can be considered a non-tolerable event (NTE) that in most cases leads to other NTEs, including financial losses, interruption of business processes or individual systems, and subsequent attacks on counterparties. In addition, the personal data of clients and employees is of particular importance in information security, as the vast majority of companies globally process personal data in one form or another, and data breaches can lead to many unpleasant consequences for both individuals and companies.
Attackers have a pronounced interest in confidential data because of the opportunity it presents for significant financial gain: for example, by extorting money in exchange for not disclosing stolen data, by using personal data in fraud and phishing campaigns, or by selling it on the dark web. Volatile geopolitics also fuels cyberespionage and hacktivism aimed at disrupting the stable operation of infrastructure and publishing stolen data in the public domain.
In this survey, we'll discuss in more detail the key trends in data breaches we observed in H1 2024, the consequences and non-tolerable events they result in, the methods cybercriminals use to penetrate infrastructures, and what helps attackers successfully exfiltrate sensitive data from internal systems.
Key trends and figures
The following main trends were observed in H1 2024:
- Every second successful attack on organizations in H1 2024 resulted in the breach of confidential data. Attacks on financial institutions and healthcare organizations involved data breaches most often, with data stolen in four out of five successful attacks.
- Government agencies were the most frequent victims of attacks resulting in the breach of confidential information, accounting for 13% of all such attacks, 3 percentage points more than in the same period last year. Note that some leaks of government data also stemmed from breaches in the systems of contractors and counterparties.
- The total number of confidential data breaches fell by 4% compared to the previous six months, but 2024 may be a record year for the volume of information contained in compromised databases.
- H1 2024 saw a spike in compromised account credentials. The share of credentials among all types of leaked information increased by 9 percentage points year on year to reach a record 21%. IT companies are among the leaders in breached credentials, and these leaks enable further attacks on their customers.
- Once again, successful attacks resulted most often in the leakage of personal data, which accounted for 31% of all data stolen from organizations in H1 2024. However, this share decreased significantly (by 15 percentage points) compared to the same period last year, when a surge in such incidents was driven in part by mass attacks on secure data transmission systems.
- Leaks of trade secrets and restricted information rank second among stolen data, reaching 24% in H1 2024, 3 percentage points more than in the previous six months. Government agencies, IT companies, the industrial sector, and transport companies remain at the top, with every third data leak in these sectors containing trade secrets and restricted information.
- In a number of successful attacks in H1 2024 (including attacks on IT companies), hackers sought to steal source code, which could harm both the developer and its clients.
- Attackers continued to use ransomware in almost one in three successful attacks on organizations that resulted in the leakage of confidential information. The leaks also confirm a general increase in the number of attacks using remote access malware.
Consequences of data breaches for organizations
Confidential data breaches are non-tolerable events for organizations and can lead to a number of extremely negative consequences.
Non-tolerable events are events that result from a cyberattack and prevent an organization from achieving its operational and/or strategic goals or lead to a significant disruption of its core business.
According to the SANS and CrashPlan survey, the biggest concern for organizations is the potential reputational damage from data breaches. Companies can lose customers, market share, or stock value. For example, in summer 2023, Estee Lauder fell victim to a massive ransomware attack, and by the start of 2024 the company's stock had fallen by 3% of semi-annual turnover. The second most significant risk is the legal consequences of data leaks. For example, affected individuals and companies can take legal action, and administrative or criminal proceedings can result in significant fines and penalties. In early 2024, UnitedHealth Group, a major U.S. health insurance company, was hit by a cyberattack resulting in the theft of 6 TB of data and the shutdown of several services on the company's network required for billing, claims processing, and the exchange of medical information across the United States. Subsequently, UnitedHealth Group and its subsidiaries faced multiple class action lawsuits, and the company now has to pay out over USD 2 billion in financial assistance to healthcare providers affected by the cyberattack.
Beyond reputational risks, lower profits, and loss of market share, data leaks can also lead to direct financial losses. For example, H1 2024 saw a major breach of data belonging to over 500 million TicketMaster users. As the incident developed, the attackers compromised not only personal data but also the internal SafeTix system, whose automatically refreshing tickets are designed to prevent screenshots and photocopies. Ultimately, the attackers managed to issue approximately 10 million fake tickets and barcodes for major concerts that TicketMaster can't cancel. The consequences of the attack on TicketMaster will likely continue into H2 2024, as the stolen data gave the attackers access to information and allowed them to interfere with the company's business processes.
Financial losses from data breaches can affect entire countries, not just individual companies, through fraudulent transactions using citizens' stolen personal data and accounts. For example, through untrusted free VPN services linked to 911 S5 (perhaps one of the largest botnets ever), a huge volume of personal data was compromised over several years. The U.S. Department of Justice estimates that during this time over 560,000 fraudulent unemployment insurance claims were submitted from compromised user devices, resulting in verified losses exceeding $5.9 billion.
In H1 2024, attackers continued to use ransomware in attacks on organizations (more than 40% of all malware used). As a result, organizations risk not only data breaches but also losing access to their data through encryption. Organizations are forced to either pay a ransom to the attackers, which guarantees neither the decryption of their data nor protection from its public disclosure, or suffer damages and restore business processes after losing access to valuable data. In the event of a suspected ransomware attack and data leak, companies are also often forced to shut down their information systems to prevent the attack from spreading further and to gain time to analyze the incident. System shutdowns and business process interruptions can significantly impact a company's operations and lead to lost revenue. According to Comparitech, one in five ransomware attacks involving data theft in 2023 resulted in lawsuits. Companies were fined for failing to take sufficient data protection measures, for service disruptions caused by cyberattacks, and for confidential data breaches, with the average fine totaling $2.2 million.
After attacks, companies are faced with the need to restore their infrastructure. This includes data recovery and the restoration of software and information security system configurations. The recovery process can take from several weeks to several months and severely limits the company's ability to operate, which also leads to loss of revenue and reputational risks.
To encourage organizations to develop mature personal data processing practices and take appropriate information security measures, the laws of many countries impose fines for violating personal data processing and security requirements. Personal data breaches can lead to hefty fines and to lawsuits from affected individuals claiming damages. For example, in May, a court decision was published imposing a EUR 360,000 fine on 4Finance Spain Financial Services following a breach that compromised customers' personal and financial data and allowed attackers to submit fraudulent loan applications with the stolen data. Maximum fines vary from country to country: under the GDPR and UK Data Protection Act, fines can total up to 4% of a company's annual revenue, laws in Singapore and Australia provide for fines of up to 10% of annual revenue, while in Brazil the maximum is only 2%. Other countries, such as Iran, are only starting to consider personal data laws, and it's unclear when they will come into force.
The rise in major leaks in the telecommunications sector and in the illegal transfer of personal data to third parties prompted the U.S. Federal Communications Commission to introduce a new requirement earlier this year for telecommunications and VoIP service providers to report personal data breaches. In Russia, a similar requirement to report data breaches to Roskomnadzor entered into force in September 2022.
Russian laws provide for relatively low fines for violations of the Law on Personal Data: by the end of 2023, the fines imposed totaled about RUB 4.6 million. However, in December 2023, the State Duma passed a law on stricter penalties for violations involving the processing and security of personal data, including a new fine of up to RUB 1.5 million for legal entities for the repeated violation of personal data processing and security requirements. In H1 2024, the State Duma also considered a law on the introduction of revenue-based penalties for personal data breaches. If the bill is passed, the maximum potential fine could increase significantly.
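The percentage caps above can be illustrated with a short calculation. The Python sketch below is simplified and hypothetical: it applies only the revenue shares quoted in this report to a made-up revenue figure, whereas real laws also define fixed amounts, thresholds, and which revenue counts. As one example of such a rule, GDPR Article 83(5) sets the upper cap at EUR 20 million or 4% of worldwide annual turnover, whichever is higher.

```python
# Hypothetical sketch: upper bound of a revenue-based data protection fine.
# Assumption: only the percentage caps quoted in the report are modeled; real
# laws add fixed amounts, thresholds, and rules on which revenue counts.

MAX_FINE_SHARE = {
    "GDPR / UK DPA": 0.04,
    "Singapore": 0.10,
    "Australia": 0.10,
    "Brazil": 0.02,
}


def max_revenue_based_fine(annual_revenue: float, jurisdiction: str) -> float:
    """Return the percentage-based cap for a jurisdiction listed above."""
    if annual_revenue < 0:
        raise ValueError("annual_revenue must be non-negative")
    return annual_revenue * MAX_FINE_SHARE[jurisdiction]


def gdpr_cap_eur(annual_turnover_eur: float) -> float:
    """GDPR Art. 83(5): the higher of EUR 20 million or 4% of turnover."""
    return max(20_000_000.0, 0.04 * annual_turnover_eur)


if __name__ == "__main__":
    revenue = 1_000_000_000  # hypothetical company with 1 billion in annual revenue
    for name in MAX_FINE_SHARE:
        print(f"{name}: up to {max_revenue_based_fine(revenue, name):,.0f}")
```

For a company with EUR 100 million in turnover, the fixed EUR 20 million amount, not the 4% share (EUR 4 million), determines the GDPR maximum.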
Shadow market for data: general statistics
Analysis of confidential information leaks is closely related to the study of the dark web, where criminals usually sell stolen data. By region, in H1 2024, offers to sell or freely distribute data most often involved data from Asian countries, accounting for about a third of all ads (30%). This reflects the increased activity of cybercriminals in the region: according to our research on cyberthreats in Asia in 2022–2023, almost half of all successful attacks on organizations in the region ended in information leaks.
In the ranking of individual countries by number of ads on dark web forums, Russia leads the top five with a share of 10%, followed in descending order by the United States, India, China, and Indonesia.
The most popular type of data on the dark web in H1 2024 was personal data, with the share of ads for the sale or distribution of personal data topping 83%.
Before we cover the price of data on the dark web, note that not all databases are for sale. In fact, ads for free distribution (64%) are almost twice as common as ads for the sale of data (33%). Data from Russian companies is most often distributed free of charge: 88% of ads involving Russian companies offer the data for free. The proportion of free databases is also high for Latin American countries, the U.S., India, and Indonesia, averaging over 70%. However, the databases of Chinese companies are more often sold, with the share of such ads reaching 60%.
This can be attributed to the fact that not all attackers are motivated to sell data: they often first demand a ransom in exchange for not disclosing it, but not all victims pay. Also, if there's no demand for the data being sold, sellers may publish it for free after a certain time.
More than half of ads on the dark web are priced under $1,000. The least expensive are basic personal data (full name, phone number, email, and date of birth) from companies around the world, primarily from the service, trade, online services, science, and education sectors. The total number of unique phone numbers or email addresses has little effect on price: databases of 10,000 and 10,000,000 records can cost the same. The price starts to increase as more information is added, including passport data, driver's license or insurance policy data, information about financial accounts and bank cards, and biometric data. Offers for mid-priced data in the $10,000 range are more common from IT companies, financial institutions, government agencies, healthcare organizations, and industrial companies.
However, half of all ads (53%) don't specify any price, encouraging buyers to contact the seller with their best offer. So in reality, the share of more expensive ads may be higher.
One in ten ads falls into the most expensive category, at $10,000 or more. The most expensive ads (over $50,000 for 2 TB of data) were for data from major financial institutions, retailers, and IT companies. For example, in Q2 2024, EDR developer Cylance suffered a cyberattack. Shortly after, 34 million emails and an unspecified volume of customer and employee data were leaked and put up for sale for $750,000 on a dark web forum. Another example is Advance Auto Parts, the victim of a major data breach estimated to be worth $1.5 million on the shadow market. According to the attacker, the database includes 380 million customer profiles, 140 million customer orders, 44 million loyalty card numbers, and 358,000 employee profiles.
Industries most frequently affected by data breaches
After a sharp increase in H1 2023, when 59% of successful cyberattacks on organizations resulted in confidential data breaches, this share decreased slightly to 54% in H1 2024. The largest number of breaches of confidential information in H1 2024 occurred in government agencies (13%), IT companies (12%), and industrial companies (11%).
On dark web forums, the largest number of offers with data from government agencies were from countries in Asia (33%), Latin America and the Caribbean (18%), and the Middle East (16%). This is explained by the fact that these regions are targets of APT groups that primarily attack government organizations. We wrote more about this in our research on APT groups in the Middle East and Southeast Asia.
According to H1 2024 results, the share of leaks from IT companies increased by 3 percentage points compared to the previous six months, returning to the H1 2023 level (12%). One factor behind the growth in confidential data breaches at IT companies, including credential leaks, is the spread of malware through open repositories popular among developers. For example, earlier this year, Apiiro researchers discovered a new wave of a coordinated campaign that uploaded over 100,000 malicious repositories. In March, the maintainers of the Python Package Index (PyPI) faced a massive typosquatting campaign involving Python packages, which led them to temporarily suspend new user registrations. We talk more about what happened in our report on current threats in Q1 2024. Later, in May, Check Point researchers found thousands of malicious extensions in the extension marketplace for VSCode, the free source code editor, which users had installed millions of times. Since IT companies are contractors for organizations in many industries, penetration of their infrastructure sparks a chain reaction of successful attacks on other organizations, which can be considered a non-tolerable event for them. For example, in H1 2024, Russian information security company SoftMall fell victim to a successful phishing attack resulting in the leakage of information security audit reports from several major partners. SoftMall confirmed the incident in an official announcement on the company website.
Leaks from industrial organizations (third place in H1 2024) can also lead to the compromise of client data. Given how critical the data processed by industrial organizations is, such leaks can draw attackers' attention to specific organizations and make it easier to trigger non-tolerable events. During a cyberattack on the corporate network of Schneider Electric, terabytes of confidential data were stolen. According to published information, clients of the Schneider Electric division attacked may include major international corporations such as Clorox, DHL, Hilton, PepsiCo, and Walmart. The data stolen by attackers may contain confidential information about infrastructure and energy consumption systems, as well as the industrial automation solutions implemented in the company's clients' facilities.
The share of data breaches from healthcare institutions (first place in 2023) decreased significantly, by 11 percentage points compared to H2 2023. However, the data leaked from healthcare organizations is typically large in volume and varied in content. For example, Sav-Rx, a healthcare company based in Fremont, Nebraska, reported a major data breach in H1 2024 that affected over 2.8 million people. In another incident, the data compromised during the cyberattack on Cooper Aerobics included sensitive information such as names, addresses, phone numbers, email addresses, financial data (credit and debit card numbers, expiration dates, account numbers, taxpayer identification numbers, passport numbers, usernames and passwords, Social Security numbers), and health-related data (medical records, patient account numbers, prescription information, healthcare providers, procedures, health insurance information).
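Typosquatting campaigns like the one against PyPI rely on package names that differ from popular packages by only a character or two. As a minimal sketch of how such lookalike names could be flagged before installation, the Python code below compares a name against a small hypothetical allowlist (`POPULAR`) using edit distance, with an assumed threshold of 2. It is an illustration only, not how PyPI itself detects malicious packages.

```python
from typing import Optional

# Hypothetical allowlist of popular package names (illustrative only).
POPULAR = {"requests", "numpy", "pandas", "urllib3", "django"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def likely_typosquat(name: str, max_distance: int = 2) -> Optional[str]:
    """Return the popular package that `name` closely imitates, else None."""
    name = name.lower()
    if name in POPULAR:
        return None  # exact match: this is the genuine package
    for target in sorted(POPULAR):
        if levenshtein(name, target) <= max_distance:
            return target
    return None


for candidate in ["reqeusts", "nunpy", "requests", "flask"]:
    print(candidate, "->", likely_typosquat(candidate))
```

A threshold of 2 catches swapped letters such as `reqeusts`, but for very short names it would also flag legitimate packages, so any practical check would need a much larger allowlist and additional signals.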
Financial institutions close out the top 5 industries in terms of the number of data leaks in H1 2024. The share of incidents resulting in a data breach is also the highest among financial institutions, with four out of five successful cyberattacks resulting in a leak. This trend is understandable, as sophisticated attacks on well-protected financial organizations with the goal of stealing money have become a rare occurrence amid the rise of easier-to-implement ransomware attacks and large-scale customer data breaches. In Q2 2024, attackers stole the bank account information of 30 million people, 6 million account numbers and balances, and 28 million credit card numbers of Santander bank clients in Spain, Chile, and Uruguay.
Retail companies also suffer from data breaches. In fact, the databases of companies from the trade and e-commerce sectors were the most common on dark web forums, totaling one fifth of all ads (21%). For example, in March of this year, PandaBuy, a large e-commerce platform that connects customers with Chinese suppliers, became the victim of a data leak. The data breach involved more than 1.3 million active unique email addresses, as confirmed by the creator of Have I Been Pwned (HIBP), as well as full names, phone numbers, IP addresses, order dates, home addresses, and other information. Later it was revealed that PandaBuy paid the attackers a ransom to keep the data private, but the attackers continued to extort the company afterwards.
Among retail companies, stolen data from Russian retailers is posted on shadow forums most often, accounting for 14% of all database offers from the retail industry. At the start of 2024, the Rendez-vous online clothing and footwear store fell victim to a data breach. Posted for sale were 7.6 million unique phone numbers and 4.5 million email addresses, as well as first names, last names, dates of birth, residential addresses, password hashes (MD5 with salt), purchase amounts, and gift certificate codes with activation PINs.
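The Rendez-vous leak included salted MD5 password hashes. A salt defeats precomputed lookup tables, but MD5 is fast, so attackers can still test huge numbers of guesses against each leaked hash. The Python sketch below contrasts the two approaches using only the standard library: `md5_with_salt` shows the weak pattern (the salt-then-password order is an assumption, since the store's actual scheme is unknown), and `pbkdf2_hash` uses PBKDF2-HMAC-SHA256 with an illustrative 600,000 iterations, which makes every guess far more expensive.

```python
import hashlib
import hmac
import os


def md5_with_salt(password: str, salt: bytes) -> str:
    # Weak pattern: a single fast hash per guess, so GPUs can test enormous
    # numbers of candidate passwords per second. Salt placement is assumed.
    return hashlib.md5(salt + password.encode("utf-8")).hexdigest()


def pbkdf2_hash(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    # Deliberately slow key derivation: each guess costs `iterations` rounds
    # of HMAC-SHA256. The iteration count here is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def verify(password: str, salt: bytes, stored: bytes, iterations: int = 600_000) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(pbkdf2_hash(password, salt, iterations), stored)


salt = os.urandom(16)  # unique random salt per user
stored = pbkdf2_hash("correct horse", salt)
print(verify("correct horse", salt, stored))  # True
print(verify("wrong guess", salt, stored))    # False
```

The same leaked salt and hash are available to an attacker in both cases; the difference lies only in how much computation each guess costs.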
To wrap up H1 2024, at the end of June, the Magnolia supermarket's online store fell victim to a data breach twice. In both cases, hackers gained access to database dumps containing the personal data of 253,000 customers, including full names, delivery addresses, phone numbers, email addresses, order contents, hashed passwords, and Magnolia customer discount coupons.
In general, Russian retailers are quite a popular target for attackers. According to our research in 2023, Russian online stores and marketplaces were in the top 3 in terms of the number of data theft reports.
Scientific and educational organizations were less frequent victims of data leaks, accounting for only 4% of all leaks in H1 2024, compared to 9% in the same period last year. However, they still account for a notable share of dark web ads (10%). This may be because organizations in this sector were more active targets in 2023, and dark web forums continue to sell previously stolen data.
Growth in the number of credential breaches
The first two quarters of 2024 were marked by an abnormal increase in the number of credential breaches at organizations around the world. The share of these breaches rose rapidly during H1 2024, reaching a record high of 24% by the end of Q2. For H1 as a whole, the share grew by 9 percentage points year on year to 21%. Successful cyberattacks led to the theft of various types of credentials, including web service logins and passwords, authentication data for remote access protocols (SSH, RDP, and others), local and domain operating system accounts, passwords saved in users' browsers, and email credentials.
The highest shares of credential breaches are in the telecommunications sector (25%), IT companies (24%), financial service companies (22%), and government agencies (16%). Note that the compromise of companies providing services or developing software is critical not just for them but for their clients as well. Leaked credentials from these companies can help attackers further compromise organizations in other industries. In H1 2024, clients of several cloud service providers were compromised. For example, Dropbox reported that attackers compromised the Dropbox Sign system administration tool and gained access to authentication tokens, multi-factor authentication (MFA) data, hashed passwords, and customer information. The incident affecting the Snowflake cloud provider and its clients was the most high-profile of the period and required a lengthy investigation. It's still not entirely clear whether Snowflake clients, including Santander Bank and TicketMaster (the attacks on which we mentioned earlier), were compromised specifically due to poorly configured authentication mechanisms (especially MFA) or because the credentials for accessing Snowflake and its clients were somehow compromised.
Credential breaches can also occur due to the improper configuration of websites and services. At the beginning of the year, independent researchers discovered that at least 900 websites had the Google Firebase service configured incorrectly, which creates the risk of a potential leak of more than 20 million passwords and over 27 million payment details and other confidential data.
Stealing credentials usually isn't the ultimate goal of attackers. Instead, it's an intermediate stage between penetrating the target infrastructure and developing the attack to trigger other events, such as the disruption of systems or the theft of funds or other confidential information. This makes authentication data of particular value to cybercriminals. Credentials are also a common commodity on dark web forums and one of the ways cybercriminals make money. For example, in March, a shadow forum offered access to the Emirates investment bank website for $10,000, and in Q2 there were ads selling administrator access to the infrastructure of a major telecommunications company in Latin America for $25,000. An ad for the sale of credentials to access Pertamina, an Indonesian oil and gas company, was also found on a dark web forum. The exact price was not specified in the ad; instead, it was to be negotiated with potential buyers individually. This is because the credentials are only valuable to buyers who can successfully use them in other cyberattacks, so different buyers are willing to pay different prices. According to the seller, the ad covered the credentials of 22,000 employees and 790 accounts with administrator rights.
Another example from early July 2024 is the publication of RockYou2024, a massive password compilation containing nearly 10 billion (9,948,575,739) passwords. RockYou2024 is an expanded version of RockYou2021, with an additional 1.5 billion passwords added from 2021 to 2024. Attackers can use this data for credential stuffing attacks. Combined with other leaked databases of user email addresses and credentials available on hacker forums and marketplaces, RockYou2024 may contribute to a wave of new data breaches, financial fraud, and identity theft.
In the future, we expect to see an increase in the number of ads on dark web forums selling access to compromised company infrastructure. The rapid growth in credential breaches has already had an impact on the shadow market, with some ads selling access to dozens or hundreds of companies at once. For example, according to Daily Dark Web, an ad posted in June offered credentials from more than 400 companies, including access through services and platforms such as Jira, Bamboo, Bitbucket, GitHub, GitLab, SSH, SFTP, Zabbix, AWS S3, AWS EC2, SVN, and Terraform. Given the particular value of this offer, no price was posted, and it was left to private negotiation between the seller and buyers. According to the seller, the data was obtained by compromising a contractor company. Earlier, in April, another ad was posted for the sale of access to 16 companies in various industries in Latin America, the Middle East, Europe, and Asia. In this case, access was priced from $250 to $5,000.
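Defenders can reduce the impact of compilations like RockYou2024 by rejecting passwords that already appear in breach corpora. The Python sketch below follows the k-anonymity scheme used by the Have I Been Pwned Pwned Passwords range endpoint (`https://api.pwnedpasswords.com/range/{prefix}`): only the first five hex characters of the password's SHA-1 hash would be sent, and the returned `SUFFIX:COUNT` lines are checked locally. The network request is deliberately omitted, and the sample response is hypothetical.

```python
import hashlib
from typing import Tuple


def sha1_prefix_suffix(password: str) -> Tuple[str, str]:
    """Split the uppercase SHA-1 hex digest into a 5-char prefix and the rest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]  # only the prefix would leave the machine


def breach_count(password: str, range_response: str) -> int:
    """Return the breach count listed for this password, or 0 if absent.

    `range_response` is the text returned for the password's prefix:
    one "SUFFIX:COUNT" pair per line.
    """
    _, suffix = sha1_prefix_suffix(password)
    for line in range_response.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate == suffix:
            return int(count)
    return 0


# Offline demo with a hypothetical response and made-up counts, built so that
# one line matches the suffix of "password".
prefix, suffix = sha1_prefix_suffix("password")
sample_response = f"00000000000000000000000000000000000:1\r\n{suffix}:42\r\n"
print(prefix, breach_count("password", sample_response))
```

Because only a 5-character prefix is shared, the service receiving the query never sees the full hash of the password being checked.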
Commercial (not) secrets: a spike in source code breaches
This period showed an increase in the share of breaches of commercial secrets and other restricted information, which reached 24% at the end of H1 2024, 10 percentage points higher than in the same period last year.
In H1 2024, the industrial sector (39%), government agencies (36%), and transportation companies (29%) continued to lead in the share of leaks of commercial secrets and other restricted information. For example, Kenya Airways was subject to an attack by the RansomEXX ransomware group that ended in a massive data breach. The leaked files include information about aviation accidents and other accidents in the company, investigation reports of employee misconduct (fraud, theft, policy violations), insurance policies, confidential agreements, passwords, customer complaints, and alleged cases of sexual harassment.
Leaks of this data can also lead to financial risks in addition to reputational damage, in part through the loss of competitive advantages. In April 2024, reports emerged of a breach at Volkswagen involving 19,000 documents, including important data about proprietary electric vehicle technology and manufacturing strategies. The breach directly threatens Volkswagen's competitive advantage in the fast-growing electric vehicle market and raises concerns about the potential misuse of this information. At the start of 2024, it was reported that 3 TB of data had been leaked from another carmaker, Hyundai Motor Europe, after a ransomware attack by Black Basta. The stolen data relates to different departments, including legal, sales, HR, accounting, IT, and management.
IT companies also suffered breaches of confidential information related to internal processes, developments, and products, which made up 29% of all leaks in the industry in the first half of the year. H1 2024 was also marked by leaks of software source code and confidential information about products under development. This once again confirms the growing threat of attacks on organizations in various industries through third-party software, including software from large and trusted developers. For example, one attacker claims to be selling data obtained from hacking AMD in June 2024. The data for sale reportedly includes a wide range of sensitive data, from source code and information about upcoming products to employee and customer databases. The same hacker also claims to have stolen the source code of several internal tools from Apple.
The consequences of such breaches can include:
- Loss of competitive advantage: leaked source code may contain features that are not yet available to the public.
- Reputational risks: any compromise impacts the company's reputation, but the theft of something as important as source code has an even more significant impact on trust.
- Security risks for products under development: attackers can use source code to find and exploit existing vulnerabilities, which is especially critical if the application is widely used by different organizations.
In April 2024, attackers announced the theft of the source code of software developed by 150 companies, with archives totaling upwards of 853 GB. The volume of unpacked data is almost 2 TB, confirming the large scale of the hack. Compromised organizations include big names such as Fujitsu, Dracena Smart City, and Kraken Robotics.
Personal data: lower number, higher scale
Personal data of clients and employees normally accounts for the largest share of breached data. Since the beginning of 2023, personal data breaches peaked in Q2 2023, when personal data accounted for more than half of all breaches (53%). We talked about the beginning of massive attacks on secure data transmission systems in our 2023 year-end research.
Over time, the share of personal data in leaks began to decrease, which is partly explained by the increase in breaches of other types of data discussed above. The share of incidents resulting in a personal data leak fell to 37% in Q1 2024, the 2022 level, and then decreased further to 25% in Q2 2024.
However, despite some reduction in the overall number of personal data breaches and their share among other types of data, attackers have been compromising large companies and extracting large databases. According to Data Breaches Digest, the number of records breached in the first half of the year alone exceeded the total for all of last year.
The first half of the year was marked by a number of major personal data leaks on the scale of entire countries. In April, there were headlines about the breach of personal data of over 5 million citizens of El Salvador, roughly 80% of the country's population. The attacker posted a 144 GB data dump containing 5.1 million photos of citizens along with their national ID (Documento Único de Identidad, DUI) numbers. The published data also included first names, last names, dates of birth, telephone numbers, email addresses, and residential addresses. This is one of the first known cases in which almost an entire country's population was affected by a biometric data breach. The leak is believed to be related to the compromise of the Chivo crypto wallet, which is used by the government to make cryptocurrency payments. Note that El Salvador was the first country in the world to officially accept cryptocurrency as legal tender. Later, confidential data from the Chivo wallet itself was published, including its source code and VPN credentials for accessing the ATM network.
In the first half of 2024, government organizations were often targeted by cybercriminals specifically to steal personal data. For example, DAIXINTeam announced a ransomware attack on the Dubai municipality, claiming to have stolen 60–80 GB of scans and PDF files containing lists of IDs, passports, and other documents with personal data. Another major incident occurred at France Travail, the government agency responsible for registering unemployed citizens: the personal data of 43 million people who had registered as unemployed over the past 20 years, about 60% of the country's population, was breached. The stolen data includes names, dates of birth, places of birth, social security numbers (NIR), France Travail identifiers, email addresses, residential addresses, and phone numbers.
At the very beginning of the year, telecom operators in India suffered another major personal data leak. The breach is believed to have affected around 750 million people, roughly half of India's population, and includes names, mobile phone numbers, addresses, and in some cases data from Aadhaar, India's national biometric ID system, in which the majority of the country's population is registered. Then in May, an ad on a dark web forum offered a large data dump from the Indian state-owned telecommunications company BSNL for $80,000. The dump included IMSI (International Mobile Subscriber Identity) information, SIM card data (including PIN and PUK codes), HLR (Home Location Register) information, and data compromising the organization's infrastructure.
Criminals then use citizens' personal data for various purposes, including fraudulent transactions, blackmail, and phishing campaigns to extort money or sell services. They also use specialized AI-powered services to generate fake scans of passports and national identity cards. For example, at the beginning of 2024, there were documented cases of OnlyFake being used to generate fake passports for as little as $15, which were then successfully used to pass Know Your Customer (KYC) identity checks on crypto exchanges. The image below, generated by OnlyFake, is made to look like a photograph of a passport lying on a blanket or rug.
There were also biometric data breaches. As the world continues to go digital, biometrics are increasingly used not only to authenticate on personal devices, but also to make payments. However, despite all the benefits and convenience of biometrics for users, biometric data is just as likely to be breached as any other type of personal data, and unlike credentials, it is extremely difficult to change once compromised. In Q2 2024, during elections in India, a 500 GB database of fingerprints and face scans of police officers, military personnel, and civilians was leaked, raising concerns about identity theft and election security.
Better safe than sorry: government agencies under threat
A data leak at one company can serve as a basis for attacks on another if the leak affects the company's clients as well as the company itself. In H1 2024, we noticed a trend of data leaks from government agencies caused by the compromise of their contractors and counterparties. This is not a theoretical scenario but a confirmed threat. The Washington, D.C. Department of Insurance, Securities and Banking disclosed that 800 GB of data allegedly stolen by the LockBit ransomware group was obtained through an attack on software provider Tyler Technologies. Tyler Technologies confirmed that an isolated portion of its private cloud hosting environment containing customer data had been compromised, giving attackers access to sensitive government data.
Telecom companies are just as likely as IT companies to become a source of leaked confidential data belonging to government and other organizations. In H1 2024, attackers stole 1.7 TB of data after compromising Chunghwa Telecom and put it all up for sale. An initial investigation showed that the hackers obtained confidential information from Chunghwa Telecom and its government counterparts in Taiwan, including the Ministry of Foreign Affairs, the Coast Guard, and other departments.
Leaks from service providers carry an especially high risk. For example, an unknown attacker published files presumably stolen from Acuity, a state contractor. The data is from multiple government agencies, including the U.S. State Department, Department of Defense, and National Security Agency. Acuity is a consulting firm with approximately 400 employees and an annual revenue of over $100 million that specializes in DevSecOps, IT operations and modernization, cybersecurity, data analytics, and operational support, with a focus on government agencies. The cybercriminal claims that the files contain classified information from the Five Eyes intelligence alliance of Australia, Canada, New Zealand, the United States, and the United Kingdom. The dump allegedly contains full names, work email addresses, and work and personal phone numbers of government, military, and Pentagon employees, as well as their personal email addresses.
Methods used to attack organizations and steal confidential data
The main methods of successful attacks leading to the leakage of confidential data in H1 2024 were the same as usual: malware, social engineering, and exploitation of vulnerabilities.
Ransomware is the most popular type of malware used by cybercriminals to steal confidential information. In May 2024, a LockBit ransomware attack on semiconductor solutions provider Kulicke and Soffa resulted in the leak of 20 TB of data from more than 2,000 corporate devices. The compromised data includes partner and client files, financial and accounting information, email backups, archives, personal files, source code, internal correspondence, and correspondence with clients.
Some industries are more likely to be targeted by ransomware than others, including healthcare. In 2023, one in five ransomware attacks was directed at healthcare organizations. In the first half of 2024, this trend was confirmed worldwide, with healthcare organizations accounting for 15% of successful ransomware attacks, which led to both information leaks and disruption of operations. In February 2024, a BlackCat ransomware attack on Change Healthcare caused healthcare disruptions across the U.S. and a 6 TB data leak. In June, Change Healthcare reported that sensitive patient medical data (diagnoses, medications, test results, images, and care and treatment plans) had been breached. Another notable attack hit NRS Healthcare, a UK company, with RansomHub ransomware, resulting in the leak of 578 GB of data spanning more than 600,000 private files, including contracts and financial reports.
We also noted attackers' growing use of remote access trojans (RATs) in Q1 2024. The first half of the year as a whole confirms this trend, with a notable effect on the number of leaks. The Cosmic Leopard group, presumed to be based in Pakistan, continued to target Indian organizations and individuals in the government, defense, and technology sectors in the first half of 2024. The attacks rely on social engineering, the GravityRAT remote access trojan, the HeavyLift loader, and the Gravity Admin tool for administering infected systems. Cosmic Leopard steals personal data, correspondence, and technical information about victims' devices.
Infostealers are also a popular tool for gaining access to confidential data. For example, the open-source HackBrowserData infostealer, designed to collect user credentials, cookies, and browser history, was used in a campaign against Indian government organizations responsible for IT management, national defense, and electronic communications. The attacker also targeted private Indian energy companies, stealing financial documents, employee personal data, and information about oil and gas well drilling. In total, the attacker stole 8.81 GB of data, and analysts assessed with medium confidence that it could facilitate further penetration into India's government infrastructure.
Another common way to distribute malware is to exploit critical vulnerabilities in popular software. In the first half of 2024, cybercriminals exploited CVE-2023-7028 in GitLab, which allows an unauthenticated attacker to have a password reset email delivered to an address they control and change the account password without user interaction. Compromising a CI/CD service gives attackers access to application source code and other confidential information, and lets them introduce malicious code that is automatically delivered to the servers of a company developing an application or system for a customer. In January 2024, according to the threat monitoring service Shadowserver, more than 5,000 vulnerable GitLab instances were accessible online, and as of May, more than 2,000 remained vulnerable, with the largest numbers belonging to companies in the U.S. and Russia. Access to a GitLab repository was the starting point of an incident at Sisense, a business analytics software developer: credentials for Sisense's Amazon S3 buckets were found in a compromised GitLab repository, and the attackers used that S3 access to steal several terabytes of Sisense customer data. Millions of access tokens, email account passwords, and even SSL certificates are believed to have been stolen.
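The Sisense incident started with cloud credentials committed to a repository. The following is a minimal sketch of how such credentials can be caught before attackers find them, assuming Python 3.9+ and scanning only a checked-out working tree (not Git history). It looks for strings shaped like AWS access key IDs, which start with `AKIA` (long-term keys) or `ASIA` (temporary keys) followed by 16 uppercase letters or digits. The function names `scan_text` and `scan_tree` are hypothetical; dedicated scanners such as gitleaks or TruffleHog also inspect commit history and many more secret types.

```python
import re
from pathlib import Path

# Access key IDs for long-term IAM credentials start with "AKIA";
# temporary STS credentials start with "ASIA". Both are 20 characters long.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, key_id) pairs for every candidate key ID in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits

def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Walk a checked-out repository and report files containing key IDs."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue  # skip directories and Git's internal object store
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        hits = scan_text(text)
        if hits:
            findings[str(path)] = hits
    return findings

# AWS's documented example key ID, not a real credential.
print(scan_text("aws_access_key_id = AKIAIOSFODNN7EXAMPLE"))
# [(1, 'AKIAIOSFODNN7EXAMPLE')]
```

Running a check like this in a pre-commit hook or CI job stops the key before it is pushed; a key that has already been pushed should be treated as compromised and rotated, not just deleted from the file.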
To exfiltrate information from a compromised system, attackers often use standard tools approved for use in the organization. In the first half of 2024, the Positive Technologies Expert Security Center (PT ESC) team observed Telegram being used as a command-and-control (C2) server, especially to steal user credentials. According to Recorded Future's cyberthreat analysis, attackers continue to use legitimate services and platforms (Google Drive, Microsoft OneDrive, Dropbox, Notion, Firebase, Trello, Discord, GitHub, GitLab, and Bitbucket) to store stolen data. Cloud services permitted for employee use can lead to serious data breaches. For example, earlier this year, an attacker secretly working for two competing tech companies in China extracted around 500 confidential files containing artificial intelligence trade secrets from Google's infrastructure. He first copied the files into Apple Notes, then converted the notes into PDFs and uploaded them to his personal Google Drive account.
Data storage locations attacked by hackers
According to IDC, the global volume of data will increase fourfold by 2025 due to growing digitalization across industries and countries and the introduction of new technologies into business processes, including AI and the large volumes of data it requires. For example, Alfa Leasing, a subsidiary of Alfa Bank, is implementing an AI solution to optimize insurance product sales by giving the model access to a large volume of its clients' personal data. Most valuable data in organizations is stored in SQL and NoSQL databases, S3 object storage, data lakes, and file servers, either in the organization's own infrastructure with the necessary computing and network resources (on-premises) or with public cloud providers (Yandex Cloud, VK Cloud, AWS, Microsoft Azure). However, attackers can obtain certain critical information, such as credentials, from end-user devices or company web resources accessible from the network, which then leads to further compromise of other infrastructure components.
The image below shows the main types of data storage in organizations' infrastructure.
The type of data asset depends on how the information is stored and processed. Data can be structured, semi-structured, or unstructured.
Structured data is information organized according to a clear, predefined schema that generally doesn't change during recording, storage, and processing, and that consists of fields holding predefined data types.
Relational database management systems (PostgreSQL, MySQL) and other tabular data structures (Microsoft Excel) are the primary stores of structured data. In most cases, this data includes the personal data of an organization's employees and clients, information about services, statistical data for analyzing company performance and forecasting development, and information about financial transactions (revenues and expenditures).
Structured data leaks are usually measured by the number of rows in tables and databases. Experts determine the value of such leaks based on several factors: the completeness and detail of the data, the company's region and industry, the number of unique and reliable rows, and the potential to use the data in other attacks or fraudulent operations. We've already cited several examples in this report of large-scale personal data leaks that affected major companies and entire countries. In many cases, tens to hundreds of millions of rows are compromised. Based on dark web ads in H1 2024, half of the databases offered contain fewer than 100,000 rows, and one in four contains more than 1 million.
Semi-structured data does not follow the tabular structure of data models associated with relational databases or other data tables, but still contains tags or other markers that separate semantic elements and create a hierarchy of records and fields within the data.
Not all data can be presented as a table with a fixed set of fields and data types. Many tasks require rapid processing of large volumes of information that still has a specific structure and dependencies. In these cases, specialized storage and processing systems are used, most notably non-relational (NoSQL) database management systems (MongoDB, Redis, Elasticsearch, Cassandra, Neo4j). In March, an ad was published on the dark web about the hack of an Elasticsearch database belonging to Qatar Living, a social network platform in Qatar. The leak contains 115,000 records, including user information such as names, email addresses, phone numbers, images, roles, permissions, and additional data.
Semi-structured data is often stored in JSON, XML, YAML, and syslog formats. Information systems also often have internal or external API endpoints for exchanging information, where input and output data are likewise transmitted in JSON or XML. The configuration parameters of applications and services are often stored in these formats as well. Semi-structured data can also include telemetry from various devices and event logs. In addition, many companies around the world use Atlassian Confluence to store information and collaborate on projects, which lets them keep large amounts of sensitive information in a centralized location but also makes the system an attractive target for attackers. For example, earlier this year, U.S. company Cloudflare, which provides CDN services, DDoS protection, a secure proxy server for accessing resources, and DNS servers, reported that it had fallen victim to a cyberattack. The attackers aimed to access the company's Atlassian resources (Confluence, the Jira issue database, and the Bitbucket source code management system) and then extract valuable information. However, Cloudflare detected the malicious activity in time and stopped the attack before the attackers could extract any sensitive information from the system.
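Because semi-structured configuration files carry their own key names, they can be checked for embedded credentials in a structure-aware way rather than by searching raw text. A minimal sketch, assuming the configuration has already been parsed from JSON into Python dicts and lists; the key-name pattern `SENSITIVE_KEY` and the function `find_sensitive_fields` are hypothetical and heuristic, and will produce some false positives (for example, "passport" contains "pass"). YAML files could be fed in the same way after parsing them with a YAML library such as PyYAML's `yaml.safe_load`.

```python
import json
import re

# Heuristic, hypothetical list of key-name fragments that suggest a secret.
SENSITIVE_KEY = re.compile(
    r"pass(?:word)?|secret|token|api[_-]?key|private[_-]?key", re.IGNORECASE
)

def find_sensitive_fields(node, path="$"):
    """Yield JSONPath-like paths of non-empty string fields whose key names
    look like credentials."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if SENSITIVE_KEY.search(key) and isinstance(value, str) and value:
                yield child
            # Recurse regardless, so nested objects under any key are inspected.
            yield from find_sensitive_fields(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_sensitive_fields(item, f"{path}[{index}]")

config = json.loads(
    '{"db": {"host": "10.0.0.5", "password": "s3cr3t"},'
    ' "services": [{"name": "mail", "api_key": "abc123"}], "debug": true}'
)
print(list(find_sensitive_fields(config)))
# ['$.db.password', '$.services[0].api_key']
```

Reporting paths rather than values keeps the secrets themselves out of scan logs.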
Unstructured data is information that doesn't conform to a pre-defined model or data structure and typically includes various types of objects (file directories, text, audio and video files, objects of arbitrary type).
A significant portion of all information stored digitally is made up of unstructured data. According to IDC, as of 2022, 90% of all data created and collected by organizations is unstructured. Unstructured information usually includes documents, multimedia files (photos and video and audio files, such as phone conversations), and other files of any type, such as source code, projects, blueprints, contracts, and emails.
Unstructured data is typically held in storage systems such as file servers, S3 object storage, data lakes, and email servers. These systems usually contain a lot of valuable information and also have web interfaces, which makes them appealing targets for attackers. For example, cybercriminals create and fine-tune tools to hack cloud storage, content management systems (CMS), and SaaS platforms such as Amazon Web Services (AWS), Microsoft 365, PayPal, SendGrid, and Twilio.
Unstructured and semi-structured data leaks are usually measured by data volume. The smallest leaks, under 100 MB, account for 40% of all ads on the dark web, while the largest data directories (100 GB or more) account for less than one-tenth.
Unstructured data leaks are often archives of files that may also contain database files. This happens when attackers compromise multiple resources in an organization at once, or when the organization stores all information in one place regardless of its purpose and value. For example, Pak Suzuki Motors, a subsidiary of Japanese automaker Suzuki, faced a crisis this spring when around 450 GB of its unstructured data was leaked. Early in the second quarter, an ad on the dark web offered the data for $5,000, including financial, accounting, employee, compliance, and administrative documents, source code of IT applications, mailboxes (PST files) of managers and executives, passports, payroll and tax documents, SAP and ERP data, internal databases, VoIP recordings (March 2024), contracts with other companies, and approximately 37 GB of data from Suzuki's headquarters in Japan.
Weaknesses in data management processes
The success of a cyberattack in terms of compromising and gaining initial access to an infrastructure depends on many factors, such as how well the vulnerability management process works, whether systems are in place to analyze malicious email attachments and counter phishing campaigns, and how mature authentication management policies are.
The methods that attackers use to extract valuable information also depend on the maturity of data management processes. As the volume of information grows, so does the number of users and interactions between them, making it increasingly difficult for IT and information security departments to monitor and ensure a sufficient level of data security. According to the Immuta data security report, a third of respondents state that in 2024, the issue of data management and security is an even higher priority than implementing AI in business processes. Furthermore, JupiterOne research shows that in 2023, data as a protected asset accounted for 39% of all company resources, or almost twice as much as the share of protected devices.
In terms of modern data security practices, the following main processes stand out as most significant for preventing leaks of confidential information and responding to attempts to access data in a timely manner.
Data asset management
Data asset management involves the identification and documentation of all storage locations of data processed by the company, including the identification of data types and access to them. Data asset management helps detect redundant or outdated information at an early stage, identify potentially vulnerable storage locations, and consider these facts when further planning information security measures. The absence of structured data asset management increases the risk of data breaches from cyberattacks. For example, after transitioning to a new platform, a company's infrastructure may still contain an outdated version of a database containing sensitive corporate information.
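The outdated-database scenario above is the kind of finding a data asset inventory can surface automatically. A minimal sketch, assuming each storage location is recorded with an owner, the types of data it holds, and a last-access date; the `DataAsset` fields, the asset names, and the 180-day threshold are all hypothetical and would come from an organization's own inventory and policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class DataAsset:
    name: str
    location: str            # host, bucket, or share where the data lives
    data_types: set[str]     # e.g. {"personal", "financial"}
    owner: Optional[str]     # accountable team; None if nobody claims it
    last_accessed: date

def review_findings(assets, today, stale_after=timedelta(days=180)):
    """Return (asset name, reason) pairs for assets that need attention."""
    findings = []
    for asset in assets:
        if asset.owner is None:
            findings.append((asset.name, "no owner"))
        if "personal" in asset.data_types and today - asset.last_accessed > stale_after:
            findings.append((asset.name, "unused copy of personal data"))
    return findings

# Hypothetical inventory: the current CRM database and a copy left behind
# after migrating to a new platform.
inventory = [
    DataAsset("crm", "pg-prod-01", {"personal"}, "sales-it", date(2024, 6, 1)),
    DataAsset("crm_legacy", "pg-old-03", {"personal"}, None, date(2023, 9, 1)),
]
print(review_findings(inventory, today=date(2024, 6, 30)))
# [('crm_legacy', 'no owner'), ('crm_legacy', 'unused copy of personal data')]
```

The hard part in practice is populating the inventory (discovering storage locations and data types), which is why this process benefits from automation; the review logic itself stays simple.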
Without a structured data asset management process, there is also a higher risk of data exposure when internal company information is placed on unprotected resources (for example, during technical maintenance) and specialists later forget to close access to it. Microsoft suffered a similar incident in Q1 2024: a storage server hosted in the Microsoft Azure cloud was publicly accessible and not password protected. The server stored internal information related to Microsoft Bing, along with fragments of source code, scripts, and configuration files containing passwords, keys, and credentials for accessing other internal Microsoft databases and systems.
Data classification
Classifying data by confidentiality and criticality helps organizations determine which information requires a higher level of security and whether existing processes for storing and processing it meet information security requirements. Classification lets organizations allocate resources efficiently to protect the most important data and avoid unnecessary spending on less critical data. According to the 2024 Data Threat Report by Thales, only 30% of companies said they could properly classify all the information they process. Without clear visibility of their most valuable data, organizations make serious mistakes in how they store and process it, including storing confidential data together with information intended for broader use. For example, in March 2024, the Swiss National Cyber Security Centre (NCSC) published the results of its investigation into a data breach caused by a successful cyberattack carried out more than nine months earlier (May 2023) by the Play criminal group on IT company Xplain, which provides software solutions to Swiss government agencies. The breach exposed approximately 65,000 Swiss government documents. In an announcement on its official website, the NCSC attributed the length of the investigation to the complexity of analyzing unstructured data and the large volume of the leak.
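A minimal rule-based sketch of how automated classification can assign a confidentiality level to a document, assuming four hypothetical levels and a handful of illustrative detectors: an internal-use marker, email addresses (as a proxy for personal data), payment card numbers validated with the Luhn checksum, and PEM private key headers. The levels, patterns, and the `classify` function are assumptions for illustration, not a standard; production classifiers combine many more detectors with context and manual review.

```python
import re
from enum import IntEnum

class Level(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
PRIVATE_KEY = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")

def luhn_ok(digits: str) -> bool:
    """Luhn checksum used by payment card numbers; filters out random digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify(text: str, internal_marker: str = "INTERNAL USE ONLY") -> Level:
    """Return the highest level triggered by any detector (default: PUBLIC)."""
    level = Level.PUBLIC
    if internal_marker in text:
        level = max(level, Level.INTERNAL)
    if EMAIL.search(text):
        level = max(level, Level.CONFIDENTIAL)
    for match in CARD.finditer(text):
        if luhn_ok(re.sub(r"\D", "", match.group(0))):
            level = max(level, Level.RESTRICTED)
    if PRIVATE_KEY.search(text):
        level = max(level, Level.RESTRICTED)
    return level

print(classify("Contact ivan.petrov@example.com").name)  # CONFIDENTIAL
print(classify("card 4111 1111 1111 1111").name)         # RESTRICTED
```

Taking the maximum level across detectors means a document is never downgraded because it also contains harmless content, which matches the principle of protecting data according to its most sensitive part.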
Data access management
Managing access to protected resources, especially data, is a key aspect of information security. Reducing the number of users with access to critical data reduces the attack surface, and ensuring that employees only have access to the data they need for their job reduces the risk of insider threats. The Immuta 2024 State of Data Security Report states that on average, at least one-third of data management and security tasks are delegated to the IT department (Data Platform Team). This can be attributed to the fact that information systems process large volumes of information, including customer data, business process data, and data for training and running AI algorithms, which requires tasks to be divided among several departments, including legal and audit teams in addition to IT and information security. This makes it necessary to substantially develop DataSecOps, a practice focused specifically on data security. Immuta respondents also note that in 2024, as in 2023, the main problem remains poor visibility of actual data access rights.
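Gaining visibility of actual access rights often starts with comparing what each user can access against what their role needs. A minimal least-privilege review sketch, assuming access grants and user roles have already been exported as simple mappings; the role names, dataset names, and `ROLE_NEEDS` table are hypothetical.

```python
# Hypothetical mapping of job roles to the datasets each role actually needs.
ROLE_NEEDS = {
    "support_agent": {"tickets"},
    "analyst": {"sales_aggregates"},
    "dba": {"tickets", "sales_aggregates", "customers"},
}

def excess_grants(grants, roles):
    """Compare granted datasets with role needs.

    grants: user -> set of datasets the user can currently access
    roles:  user -> job role
    Returns user -> datasets granted beyond what the role needs.
    Users with no known role are treated as needing nothing.
    """
    report = {}
    for user, datasets in grants.items():
        needed = ROLE_NEEDS.get(roles.get(user), set())
        extra = datasets - needed
        if extra:
            report[user] = extra
    return report

grants = {
    "alice": {"tickets", "customers"},  # support agent with extra access
    "bob": {"sales_aggregates"},        # analyst, exactly what the role needs
    "eve": {"customers"},               # account with no role on record
}
roles = {"alice": "support_agent", "bob": "analyst"}
print(excess_grants(grants, roles))
```

Treating users without a known role as needing nothing deliberately surfaces orphaned and forgotten accounts, which are a common entry point in real incidents.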
One common consequence of insufficient control over current data access rights is the existence of outdated, unused, and test accounts that attackers find and use to compromise systems. At the beginning of the year, Microsoft suffered yet another cyberattack, this time by the Midnight Blizzard group, which gained access to the corporate mailboxes of executives and cybersecurity specialists. The attackers obtained initial access by brute-forcing, with a password dictionary, the password of an outdated, unused account that had multifactor authentication disabled.
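The combination described above (an enabled but unused account without MFA) can be detected with a simple periodic check. A minimal sketch, assuming account data has been exported from a directory service with a last sign-in time and an MFA flag; the `Account` fields, account names, and 90-day idle threshold are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Account:
    name: str
    enabled: bool
    mfa_enabled: bool
    last_sign_in: Optional[datetime]  # None if the account has never signed in

def risky_accounts(accounts, now, max_idle=timedelta(days=90)):
    """Return names of enabled accounts that are idle (or never used) and lack MFA."""
    flagged = []
    for acc in accounts:
        if not acc.enabled:
            continue  # disabled accounts cannot be used to sign in
        idle = acc.last_sign_in is None or now - acc.last_sign_in > max_idle
        if idle and not acc.mfa_enabled:
            flagged.append(acc.name)
    return flagged

now = datetime(2024, 1, 15)
accounts = [
    Account("ceo", True, True, datetime(2024, 1, 14)),
    Account("legacy-test", True, False, datetime(2022, 3, 1)),
    Account("svc-never-used", True, False, None),
    Account("old-disabled", False, False, datetime(2021, 1, 1)),
    Account("intern", True, False, datetime(2024, 1, 10)),
]
print(risky_accounts(accounts, now))  # ['legacy-test', 'svc-never-used']
```

Flagged accounts should then be disabled or brought under MFA; active accounts without MFA, like "intern" here, are a separate finding that a fuller review would also report.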
Data access monitoring, behavioral analysis, and anomaly detection
Failing to monitor data access can allow hackers to gain access to a system and exfiltrate data undetected for a long period of time. For example, in February, the Companies and Intellectual Property Commission (CIPC) of South Africa fell victim to a cyberattack resulting in the breach of confidential data from up to 3 million legal entities and individuals. The attackers claim that they gained access to CIPC infrastructure in 2021 and remained in the shadows for years with access to the organization's information resources.
The data leak of 49 million clients from electronics manufacturer Dell, which became known at the end of April after a cybercriminal posted an ad on the dark web, is an example of the consequences of a poorly structured access rights management process and lack of data operation monitoring. The attacker infiltrated a Dell partner reseller and retail portal whose API could be used to search for order information. Access to the portal was obtained by registering multiple accounts on behalf of non-existent companies whose data was not verified in the system in any way. The status of an authorized partner was assigned immediately after registering an account. The attacker then created a program that generated around 5,000 requests per minute and collected data over several weeks, as requests for data were not monitored or blocked in any way. Ultimately, the attacker collected an impressive amount of information in a short time. Dell confirmed the leak and emailed its customers to inform them of the incident.
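In the Dell case above, around 5,000 requests per minute went unnoticed for weeks. A minimal sketch of per-client sliding-window rate monitoring that would flag such traffic, assuming request timestamps in seconds are available for each client account; the `RateMonitor` class and the limit of 100 requests per 60 seconds are hypothetical. This in-memory version does not survive restarts and only detects abuse; verifying partner registrations is a separate control.

```python
from collections import defaultdict, deque

class RateMonitor:
    """Flag API clients whose request rate exceeds a limit within a sliding window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.events = defaultdict(deque)  # client_id -> timestamps inside the window

    def record(self, client_id: str, timestamp: float) -> bool:
        """Record a request; return True if the client is now over the limit."""
        q = self.events[client_id]
        q.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q) > self.limit

monitor = RateMonitor(limit=100, window_seconds=60.0)
# A scraper at roughly 5,000 requests per minute (one every 0.012 s).
scraper_flags = [monitor.record("bot", i * 0.012) for i in range(200)]
# A normal partner sending one request per second.
partner_flags = [monitor.record("partner", float(t)) for t in range(120)]
print(scraper_flags.index(True), any(partner_flags))  # 100 False
```

A flag would typically trigger throttling, blocking, or an alert for review, depending on how much legitimate bulk traffic the API expects.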
How to protect data against being breached
Protecting an organization from data breaches requires comprehensive technical and organizational measures across the entire infrastructure. The following are our recommendations to help prevent the leakage of valuable data.
Protection against network and user device compromise
Confidential data breaches are one of the final stages of cyberattacks, so organizations first need to protect against cybercriminals penetrating their infrastructure. Results of 2023 penetration tests by the PT Security Weakness Advanced Research and Modeling team (PT SWARM) show that successful penetration of an internal network can result from weak password policies, vulnerabilities in web application code, and configuration errors in services on the network perimeter. A common configuration flaw in such systems is the lack of two-factor authentication or insufficient verification of user authorization. To protect the network perimeter, next-generation firewalls with a wide range of functionality are crucial. For example, PT NGFW includes an intrusion prevention system (IPS), user and application control, TLS inspection, and URL filtering. Web application firewalls (WAFs) are also important for protecting web interfaces. For example, PT AF detects with high accuracy and blocks attacks including the OWASP Top 10, WASC threats, Layer 7 DDoS, and zero-day attacks. It's also crucial to monitor the perimeter for new vulnerabilities and promptly eliminate them using vulnerability management solutions such as MaxPatrol VM.
IT companies that develop and deliver solutions to other organizations need to build secure development processes that prevent vulnerabilities from appearing in their software. SAST and DAST solutions used during the development and testing stages of new software projects help with this. For example, while code is being written, the PT Application Inspector static analyzer generates test exploit requests to check whether vulnerabilities can be exploited, and the PT BlackBox dynamic analysis tool simulates an attacker with no knowledge of the application's internal structure, automatically assessing web application security with no input other than the target's web address.
It's also important to ensure that endpoints are protected from compromise, malware injection, and credential leaks. The end devices of users, especially employees administering key and target systems, may contain credentials for accessing critical systems and information resources of the company and its counterparties. To protect end devices, we recommend using EDR solutions, such as MaxPatrol EDR, as well as mail gateways and sandbox solutions to scan all incoming emails and attachments for malicious content. For example, PT Sandbox uses machine learning to automatically detect malware penetrating the organization from various sources (email attachments, files, web traffic) and block its distribution.
Internal perimeter protection
If cybercriminals do penetrate the infrastructure, they must be stopped from reaching the company's critical systems and information resources. Servers and the internal network infrastructure must be hardened against the actions of an attacker who has already gained a foothold. The network should be segmented, and organizations must implement continuous processes for managing vulnerabilities and configurations of devices and of system and application software. Firewalls should also allow only the permitted information flows within the infrastructure and for interactions with external systems. Properly configured firewall policies and network traffic analysis systems (for example, PT NAD) help stop cybercriminals when they attempt to exfiltrate data from the infrastructure over C2 channels. Organizations also need continuous monitoring of information security events and incident management using SIEM systems, for example MaxPatrol SIEM.
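Enforcing permitted information flows means denying by default and allowing only listed connections. A minimal sketch of that evaluation logic, assuming flows are described by source address, destination address, and destination port; the network segments, ports, and `ALLOWED_FLOWS` table are hypothetical, and a real firewall also considers protocol, direction, connection state, and application identity.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allowlist of permitted flows: (source segment, destination segment, port).
# Anything not listed is denied (default deny).
ALLOWED_FLOWS = [
    (ip_network("10.10.0.0/24"), ip_network("10.20.0.0/24"), 5432),  # app servers -> database
    (ip_network("10.30.0.0/24"), ip_network("10.20.0.0/24"), 22),    # admin jump hosts -> database (SSH)
]

def is_permitted(src: str, dst: str, port: int) -> bool:
    """Return True only if the connection matches an allowlisted flow."""
    s, d = ip_address(src), ip_address(dst)
    return any(
        s in src_net and d in dst_net and port == p
        for src_net, dst_net, p in ALLOWED_FLOWS
    )

print(is_permitted("10.10.0.7", "10.20.0.5", 5432))    # True: app server to database
print(is_permitted("10.20.0.5", "203.0.113.50", 443))  # False: database to internet
```

Because the database segment has no outbound rule, a compromised database server cannot send data directly to an external C2 server, even over a common port such as 443.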
Today, NTA, Sandbox, and EDR solutions are also crucial for combating malware attacks, which are becoming increasingly sophisticated and trigger non-tolerable events every day for companies around the world.
Data protection
The most important thing is for organizations to manage and classify data assets. This helps determine what information is stored and processed in the system, and which data is valuable and requires additional security. If an organization doesn't know what assets it has, it can't protect them. Once it understands which resources matter most to the organization and its counterparties or clients, it can implement appropriate security measures. Another important part of data security is determining the minimum access rights that employees and external users need to data, and automatically monitoring compliance with permitted information flows and access rights. Monitoring actions involving data (reading, writing, deleting, or changing access rights) and analyzing behavioral anomalies are among the most important aspects of managing data security.
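One simple way to analyze behavioral anomalies in data operations is to compare each user's daily activity with that user's own baseline. A minimal sketch using a one-sided z-score, a deliberately basic statistical technique chosen for illustration rather than the method of any particular product; it assumes daily counts of an operation type (here, rows read) are already collected per user, and the 3.0 threshold and sample numbers are hypothetical.

```python
from statistics import mean, stdev

def is_anomalous(history, today, threshold=3.0):
    """Return True if today's count of data operations is more than `threshold`
    standard deviations above this user's historical daily mean."""
    if len(history) < 2:
        return False  # too little history to form a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today > mu  # perfectly flat baseline: any increase stands out
    return (today - mu) / sigma > threshold

# Rows read per day by one service account over the past week (illustrative numbers).
baseline = [120, 95, 130, 110, 105, 98, 125]
print(is_anomalous(baseline, 115))     # False: within normal variation
print(is_anomalous(baseline, 50_000))  # True: looks like bulk extraction
```

A practical system would keep separate baselines for reads, writes, deletions, and permission changes, account for weekly and seasonal patterns, and combine this signal with others before raising an alert.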
Information security products available today already address one or more of the problems covered here. For example, DAM/DBFW solutions protect structured data, DAG/DCAP solutions protect unstructured and semi-structured data, and DLP systems help protect information from insiders. The functionality organizations need is also built into a number of IT solutions used in corporate data storage and processing platforms and in the data governance process, including data lineage and data catalog/metadata management solutions.
As infrastructure grows and the volume of resources to be protected increases, organizations need more security solutions. However, managing these solutions becomes increasingly complicated, which lowers overall security effectiveness. Company infrastructures are turning into complex systems with a large number of internal elements and connections, and a high rate of change. To secure diverse data infrastructures, organizations need a single solution that can protect data assets regardless of their structure and storage location. We see potential in the Data Security Platform concept, which aims to protect data based on its criticality rather than on how and where it's stored. We also want to emphasize the importance of automating ongoing, labor-intensive DataSecOps tasks, primarily data asset management and data classification, because poor visibility of the data infrastructure creates blind spots in information security systems.
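The core idea of protecting data by criticality rather than by storage location can be expressed as a single policy table keyed by classification label and applied uniformly to every store. The sketch below assumes hypothetical labels, control names, and store types; it does not describe any specific Data Security Platform product. It simply shows that a restricted table in a database and a restricted folder on a file share are checked against exactly the same requirements.

```python
from dataclasses import dataclass

# Hypothetical required controls per criticality label, independent of storage type.
POLICY = {
    "public": {"encrypt_at_rest": False, "audit_access": False, "dlp_scan": False},
    "internal": {"encrypt_at_rest": True, "audit_access": False, "dlp_scan": False},
    "confidential": {"encrypt_at_rest": True, "audit_access": True, "dlp_scan": True},
    "restricted": {"encrypt_at_rest": True, "audit_access": True, "dlp_scan": True},
}


@dataclass
class DataAsset:
    name: str
    store: str  # illustrative store types: "postgres", "smb_share", "object_storage"
    label: str
    controls: dict  # controls actually in place, e.g. {"encrypt_at_rest": True}


def gaps(asset: DataAsset) -> list:
    """List required controls that are missing, based only on the asset's label."""
    required = POLICY[asset.label]
    return sorted(
        control
        for control, needed in required.items()
        if needed and not asset.controls.get(control, False)
    )
```

For example, a `DataAsset("hr.salaries", "postgres", "restricted", {"encrypt_at_rest": True})` is reported as missing `audit_access` and `dlp_scan`, and the same result would apply if that data sat on a file share instead.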
Conclusion
The first half of 2024 was marked by a number of significant data breaches that highlighted the vulnerability of organizations around the world to cyberthreats. The overall number of personal data leaks decreased, but their volume increased significantly due to attacks on cloud service providers, telecom operators, and IT companies. Cybercriminals openly sell personal data on the dark web, where it is then bought for use in fraudulent schemes, which creates risks of financial losses for both individuals and organizations. We also note an increase in the number of biometric data leaks, which will likely lead to new fraudulent attempts to bypass authentication and identity verification systems.
Malware continues to be the most common method used to carry out cyberattacks. In H1 2024, cybercriminals struck gold by distributing malware through code repositories, which significantly shaped key trends and contributed to an unprecedented increase in credential leaks. The mass distribution of malware through public repositories has led to an increase in successful attacks on IT companies around the world, many of which are contractors for government agencies. As a result, government agencies led in the number of information leaks, which were mainly carried out through communication channels between companies. In many cases, breaches were also caused by contractors storing credentials and other sensitive information insecurely.
In the first half of 2024, we also saw a significant increase in the number of compromised trade secrets, especially source code, from IT and industrial companies. This creates a risk of exposing the architecture of IT systems and technologies, which, in addition to reputational and financial damage, can help cybercriminals identify potential vulnerabilities and exploit them in future cyberattacks. Legislation in many countries requires healthcare institutions and government, financial, and critical information infrastructure organizations to maintain a comprehensive information security system, but often leaves contractors, including IT companies, unregulated. This creates additional information security risks that organizations should consider when assessing threats and implementing information security measures.
About the report
This report contains information on current global information security threats based on Positive Technologies' own expertise, investigations, and reputable sources.
We estimate that most cyberattacks are not made public because of reputational risks. As a consequence, even companies specializing in incident investigation and analysis of hacker activity are unable to quantify the precise number of threats. Our research seeks to draw the attention of companies and ordinary individuals who care about information security to the key motives and methods of cyberattacks, and to highlight the main trends in the changing cyberthreat landscape.
This report considers each mass attack (for example, phishing emails sent to multiple addresses) as one incident, not several. For explanations of terms used in this report, please refer to the Positive Technologies glossary.
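Counting a mass attack as a single incident means grouping many individual events into one campaign before counting. The report does not say how campaigns are identified, so the sketch below assumes a hypothetical grouping key: the same sender (case-insensitive), the same subject line, and the same payload hash. Any emails sharing that key count as one incident, however many recipients they reached.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    sender: str
    subject: str
    payload_sha256: str  # hash of the attachment or link payload
    recipient: str


def group_campaigns(emails):
    """Group emails by an assumed campaign key; values are the recipient sets."""
    campaigns = defaultdict(set)
    for e in emails:
        key = (e.sender.lower(), e.subject.strip(), e.payload_sha256)
        campaigns[key].add(e.recipient)
    return dict(campaigns)


def count_incidents(emails) -> int:
    """Each campaign (mass mailing) counts as one incident, not one per recipient."""
    return len(group_campaigns(emails))
```

Three phishing emails with the same sender, subject, and payload sent to three different addresses therefore count as one incident; a fourth, unrelated email adds a second.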