Network datasets

A dataset is a set of packet capture files that can be analyzed using network packet analyzers. Many network datasets are available on the Internet.

In 2019, the authors of the article “A survey of network-based intrusion detection data sets”, published in the journal “Computers & Security”, surveyed network-based datasets. The article describes the available packet-based and flow-based datasets for IDS. The discussed datasets include AWID (2016), Booters (2015), Botnet (2014), CIC DoS (2017), CICIDS 2017 (2018), CIDDS-001 (2017), CIDDS-002 (2017), CDX (2009), CTU-13 (2014), DARPA (2000), DDoS 2016 (2016), IRSC (2015), ISCX 2012 (2012), ISOT (2011), KDD CUP 99 (2018), Kent 2016 (2015), Kyoto 2006+ (2011), LBNL (2005), NDSec-1 (2017), NGIDS-DS (2017), NSL-KDD (2009), PU-IDS (2015), PUF (2018), SANTA (2014), SSENET-2011 (2011), SSENET-2014 (2014), SSHCure (2014), TRAbID (2017), TUIDS (2012), Twente (2009), UGR’16 (2018), UNIBS (2009), Unified Host and Network (2017), UNSW-NB15 (2015). The article is available under DOI 10.1016/j.cose.2019.06.005.

Some websites also host collections of publicly available PCAP files, such as the NETRESEC list of publicly available PCAP files. NETRESEC lists several publicly available datasets, separated into categories: Cyber Defence Exercises (CDX), Malware Traffic, Network Forensics, SCADA/ICS Network Captures, Capture the Flag Competitions (CTF), Packet Injection Attacks/Man-on-the-Side Attacks, Uncategorized PCAP Repositories, and Single PCAP files.

The following sections provide a detailed description of some datasets.

Canadian Institute for Cybersecurity datasets

Link: official website

The “Canadian Institute for Cybersecurity” created datasets that focus on several aspects of cybersecurity. The currently available datasets include Android malware, DoS, VPN, Tor, IPS/IDS, and DNS over HTTPS traffic. Some datasets are described in the following sections.

CIC-DDoS2019

Link: official website

The CIC-DDoS2019 dataset from the “Canadian Institute for Cybersecurity” contains benign traffic and the most up-to-date DDoS attacks. The dataset contains realistic background traffic, generated from the abstract behaviour of 25 users based on the HTTP, HTTPS, FTP, SSH, and email protocols.

The dataset contains data captured on 2 days. The first day, the training day, took place on 3.11.2018; it started at 09:40 and ended at 17:35 local time (from 12:40 UTC to 20:35 UTC). The second day, the testing day, took place on 1.12.2018; it started at 10:30 and ended at 17:15 local time (from 13:30 UTC to 20:15 UTC). The research paper that describes the dataset uses wrong dates (and, as a consequence, switches the naming of the first and the second day): it gives the first day as January 12th and the second day as March 11th. The information used in this document is based on the PCAP files and CSV files of this dataset, not on the research paper, and therefore differs from the original dataset description. This dataset includes PortScan, NetBIOS, LDAP, MSSQL, UDP, UDP-Lag, SYN, NTP, DNS, SNMP, SSDP, WebDDoS, and TFTP attacks.
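The UTC ranges quoted above correspond to adding three hours to the local times. The following is a minimal sketch of that conversion, assuming a fixed UTC-3 offset as implied by the figures in the text; the helper name `local_to_utc` is hypothetical and not part of the dataset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-3 offset, as implied by the times quoted in the text.
LOCAL_TZ = timezone(timedelta(hours=-3))


def local_to_utc(date_dmy: str, time_hm: str) -> datetime:
    """Convert a local 'D.M.YYYY' date and 'HH:MM' time to an aware UTC datetime."""
    naive = datetime.strptime(f"{date_dmy} {time_hm}", "%d.%m.%Y %H:%M")
    return naive.replace(tzinfo=LOCAL_TZ).astimezone(timezone.utc)


# Start of the first day and end of the second day, as given in the text.
print(local_to_utc("3.11.2018", "09:40").strftime("%d.%m.%Y %H:%M UTC"))
print(local_to_utc("1.12.2018", "17:15").strftime("%d.%m.%Y %H:%M UTC"))
```

A fixed offset keeps the sketch independent of any time zone database; it does not model daylight saving time changes.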

The following table contains the victim network information. The attacker network belongs to a third-party company.

   
Firewall: 205.174.165.81 (Fortinet)

Victim host                First day      Second day
Web server (Ubuntu 16.04)  192.168.50.4   192.168.50.1
Win 7 Pro                  192.168.50.9   192.168.50.8
Win Vista                  192.168.50.6   192.168.50.5
Win 8.1                    192.168.50.7   192.168.50.6
Win 10 Pro 32              192.168.50.8   192.168.50.7

The original dataset is split into multiple PCAP files. The first day contains 145 PCAPs, and the second day contains 818 PCAPs. The individual capture days of the dataset are discussed in the following sections. First, the essential time frames of some individual PCAPs of that day are described (timestamps of the start and end of attacks). Second, the annotation of the whole day is provided. All times are in UTC. The description of the attacks is based on the research paper “Developing Realistic Distributed Denial of Service (DDoS) Attack Dataset and Taxonomy”.
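Time frames like the ones listed for the individual PCAPs can be read directly from the capture files. The following is a minimal sketch, assuming the files use the classic pcap format with microsecond timestamps (pcapng and nanosecond variants are not handled); the function name `pcap_time_range` is hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

GLOBAL_HEADER_LEN = 24  # classic pcap file header
RECORD_HEADER_LEN = 16  # per-packet header: ts_sec, ts_usec, incl_len, orig_len
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def pcap_time_range(data: bytes):
    """Return the UTC timestamps of the first and last packet record."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written in little-endian byte order
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written in big-endian byte order
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    first = last = None
    offset = GLOBAL_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        stamp = EPOCH + timedelta(seconds=ts_sec, microseconds=ts_usec)
        if first is None:
            first = stamp
        last = stamp
        # Skip the record header and the captured packet bytes.
        offset += RECORD_HEADER_LEN + incl_len
    return first, last
```

For a file on disk, the bytes can be obtained with `open(path, "rb").read()`. For very large captures, a streaming reader would be preferable to loading the whole file into memory.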

First day - training day

The following table displays important time frames of some PCAP files from the first day, 3.11.2018.

PCAP filename Time range (UTC)
SAT-03-11-2018_000 12:18:16.583626 - 13:01:48.920573
SAT-03-11-2018_011 13:09:00.565557 - 13:21:56.124692
SAT-03-11-2018_068 13:29:52.072724 - 13:34:11.268896
SAT-03-11-2018_106 13:42:57.176611 - 13:54:11.631481
SAT-03-11-2018_136 14:01:43.652741 - 14:14:54.297925
SAT-03-11-2018_137 14:14:54.298079 - 14:30:25.830426
SAT-03-11-2018_145 17:51:18.675623 - 20:36:56.349321
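A table like this can be used to find the PCAP file that covers a given moment. The following is a minimal sketch that looks up a UTC time in the first-day table above; the names `FIRST_DAY_PCAPS` and `find_pcap` are hypothetical, and only the files listed in the table are covered.

```python
from datetime import time

# Rows copied from the first-day table: (filename, start UTC, end UTC).
FIRST_DAY_PCAPS = [
    ("SAT-03-11-2018_000", "12:18:16.583626", "13:01:48.920573"),
    ("SAT-03-11-2018_011", "13:09:00.565557", "13:21:56.124692"),
    ("SAT-03-11-2018_068", "13:29:52.072724", "13:34:11.268896"),
    ("SAT-03-11-2018_106", "13:42:57.176611", "13:54:11.631481"),
    ("SAT-03-11-2018_136", "14:01:43.652741", "14:14:54.297925"),
    ("SAT-03-11-2018_137", "14:14:54.298079", "14:30:25.830426"),
    ("SAT-03-11-2018_145", "17:51:18.675623", "20:36:56.349321"),
]


def find_pcap(utc_time: str, table=FIRST_DAY_PCAPS):
    """Return the listed PCAP whose time range contains utc_time, or None."""
    moment = time.fromisoformat(utc_time)
    for name, start, end in table:
        if time.fromisoformat(start) <= moment <= time.fromisoformat(end):
            return name
    return None  # the moment falls into a file that is not listed in the table


print(find_pcap("13:30:00"))
print(find_pcap("13:05:00"))
```

Because the table lists only some of the PCAP files, a result of `None` means that the moment is covered by a file that is not listed, not that no traffic was captured.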

Second day - testing day

The following table displays important time frames of some PCAP files from the second day, 1.12.2018.

PCAP filename Time range (UTC)
SAT-01-12-2018_0 13:17:10.711517 - 14:36:06.133219
SAT-01-12-2018_027 14:36:59.617966 - 14:37:02.505099
SAT-01-12-2018_0188 14:44:33.210758 - 14:46:30.026952
SAT-01-12-2018_0190 14:48:26.225518 - 14:51:39.813446
SAT-01-12-2018_0194 14:57:43.395236 - 15:00:26.604875
SAT-01-12-2018_0195 15:00:26.604876 - 15:03:06.989875
SAT-01-12-2018_0305 15:11:56.643849 - 15:12:00.253348
SAT-01-12-2018_0324 15:12:59.381993 - 15:13:02.627201
SAT-01-12-2018_0381 15:22:58.494906 - 15:23:07.641045
SAT-01-12-2018_0387 15:23:53.444988 - 15:24:02.172861
SAT-01-12-2018_0407 15:26:51.191475 - 15:27:00.259048
SAT-01-12-2018_0414 15:27:56.123811 - 15:28:05.086642
SAT-01-12-2018_0443 15:32:32.915441 - 15:37:20.477580
SAT-01-12-2018_0446 15:37:56.549979 - 15:38:15.028105
SAT-01-12-2018_0467 15:44:53.078912 - 15:45:12.275065
SAT-01-12-2018_0470 15:45:48.874827 - 15:46:07.180524
SAT-01-12-2018_0486 16:00:13.902782 - 16:13:19.200714
SAT-01-12-2018_0501 16:14:53.513548 - 16:15:00.789394
SAT-01-12-2018_0510 16:15:58.289530 - 16:16:05.415448
SAT-01-12-2018_0526 16:17:53.645195 - 16:18:00.844202
SAT-01-12-2018_0535 16:18:57.740588 - 16:19:04.830961
SAT-01-12-2018_0577 16:28:47.412567 - 16:29:26.085243
SAT-01-12-2018_0578 16:29:26.085244 - 16:30:14.334464
SAT-01-12-2018_0584 16:33:24.858564 - 16:34:12.351220
SAT-01-12-2018_0586 16:34:45.229199 - 16:35:19.639364
SAT-01-12-2018_0589 16:35:55.110452 - 16:36:11.265191
SAT-01-12-2018_0817 18:02:49.179574 - 20:59:05.159078
SAT-01-12-2018_0818 20:59:05.159081 - 21:16:39.140675

CIC-IDS2017

Link: official website

The dataset “IDS 2017” contains benign traffic and the most up-to-date common attacks. It reflects realistic background traffic, generated from the abstract behaviour of 25 users based on the HTTP, HTTPS, FTP, SSH, and email protocols.

The captured data are split into 5 PCAP files according to the day of the week on which they were captured. The data were captured from 3.7.2017 12:00 PM UTC (Monday) to 7.7.2017 8:00 PM UTC (Friday), which is from Monday 9:00 AM to Friday 5:00 PM in local time. This dataset includes Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet, and DDoS attacks.

The following table contains the network information of the dataset, including the firewall, the DNS server, the attacker network, and the victim network.

   
Role               Description           IP address(es)
Firewall           -                     205.174.165.80, 172.16.0.1
DC and DNS Server  Win server 2016       192.168.10.3
Attackers          Win                   205.174.165.69, 205.174.165.70, 205.174.165.71
Victim             Web server Ubuntu 16  192.168.10.50, 205.174.165.68
Victim             Ubuntu server 12      192.168.10.51, 205.174.165.66
Victim             Ubuntu 14.4, 32B      192.168.10.19
Victim             Ubuntu 14.4, 64B      192.168.10.17
Victim             Ubuntu 16.4, 32B      192.168.10.16
Victim             Ubuntu 16.4, 64B      192.168.10.12
Victim             Win 7 Pro, 64B        192.168.10.9
Victim             Win 8.1, 64B          192.168.10.5
Victim             Win Vista, 64B        192.168.10.8
Victim             Win 10, pro 32B       192.168.10.14
Victim             Win 10, 64B           192.168.10.15
Victim             MAC                   192.168.10.25

The following description of the individual days is based on the dataset description and the research paper “Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization”.

The timelines are given in UTC (Coordinated Universal Time). The involved hosts listed for each day include only hosts from the network information above (attackers, victims, firewall).
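Restricting the involved hosts to the attackers, victims, and firewall can be sketched as a simple lookup against the addresses from the network information. The snippet below is an illustration only: the names `ROLES`, `role_of`, and `involved_hosts` are hypothetical, and the input is assumed to be a list of (source IP, destination IP) pairs extracted from a capture.

```python
# Addresses taken from the CIC-IDS2017 network information.
ROLES = {
    "firewall": {"205.174.165.80", "172.16.0.1"},
    "attacker": {"205.174.165.69", "205.174.165.70", "205.174.165.71"},
    "victim": {
        "192.168.10.50", "205.174.165.68", "192.168.10.51", "205.174.165.66",
        "192.168.10.19", "192.168.10.17", "192.168.10.16", "192.168.10.12",
        "192.168.10.9", "192.168.10.5", "192.168.10.8", "192.168.10.14",
        "192.168.10.15", "192.168.10.25",
    },
}


def role_of(ip: str):
    """Return 'firewall', 'attacker', or 'victim', or None for any other host."""
    for role, addresses in ROLES.items():
        if ip in addresses:
            return role
    return None


def involved_hosts(flows):
    """Group the known hosts seen in (src, dst) pairs by their role."""
    found = {}
    for src, dst in flows:
        for ip in (src, dst):
            role = role_of(ip)
            if role is not None:
                found.setdefault(role, set()).add(ip)
    return {role: sorted(ips) for role, ips in found.items()}


# Hypothetical flows; 8.8.8.8 is not part of the network information.
print(involved_hosts([("205.174.165.69", "192.168.10.50"), ("8.8.8.8", "192.168.10.9")]))
```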

Monday

Tuesday

Wednesday

Thursday

Friday

Nitroba University Harassment Scenario

Link: official website

The “Nitroba University Harassment Scenario” is a hypothetical network forensic scenario created by “Digital Corpora”. The scenario consists of slides that introduce the problem (PDF, PPT, or TXT file), screenshots in PNG format that are part of the problem introduction, and a PCAP file of the captured traffic. A password-protected solution of this scenario is also available.

The background of this case is the harassment of the teacher Lily Tuckrige (lilytuckrige@yahoo.com). She thinks that the harassing emails come from one of her students (Amy Smith, Burt Greedom, Tuck Gorge, Ava Book, Johnny Coach, Jeremy Ledvkin, Nancy Colburne, Tamara Perkins, Esther Pringle, Asar Misrad, Jenny Kan). The provided information contains screenshots of the harassing emails (including the email header) and the IP address from the email (140.247.62.34), whose reverse lookup name 34.62.247.140.in-addr.arpa has a domain name pointer to G24.student.nitroba.org. This is a Nitroba dorm room that has Wi-Fi without a password and in which three women live (Alice, Barbara, Candice). The PCAP capture file contains traffic from a packet sniffer placed on the Ethernet port. The goal of this scenario is to determine who is responsible for the harassing emails.
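The reverse lookup name quoted above is formed by reversing the octets of the IPv4 address and appending in-addr.arpa. A minimal sketch using Python's standard `ipaddress` module (the wrapper name `reverse_lookup_name` is hypothetical):

```python
import ipaddress


def reverse_lookup_name(ip: str) -> str:
    """Return the name under which the PTR record for ip is looked up."""
    return ipaddress.ip_address(ip).reverse_pointer


print(reverse_lookup_name("140.247.62.34"))  # 34.62.247.140.in-addr.arpa
```

The sketch only builds the name. Resolving it to G24.student.nitroba.org would require a PTR query against the scenario's DNS data, which is not shown here.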

Case theory - report of the scenario

The Gmail user jcoachj@gmail.com logged into Gmail on 22.7.2008 at 06:01:02 UTC from the computer with IP 192.168.15.4, which runs the Apple iOS operating system. Using the same web browser, the user searched Google for “how to annoy people”, “sending anonymous mail”, and “I want to harass my teacher” at about 05:57 on 22.7.2008. The user then searched for “can I go to jail for harassing my teacher” at 05:58 on 22.7.2008.

After that, at 05:59, there was a login on mail.live.com. At 06:00, mail.google.com was visited by the user jcoachj@gmail.com (identified by the gmailchat cookie), which shows that this user used this computer (IP: 192.168.15.4).

At 06:01, the user visited www.sentanonymousemail.net. The user then sent two emails using anonymous mail delivery services. The first one was sent using www.sentanonymousemail.net on 22.7.2008 at 06:02:57 UTC. The second one was sent using willselfdestruct.com on 22.7.2008 at 06:04:24 UTC. After that, the user searched Google for “where do the cool kids go to play” and visited youtube.com. These actions indicate that Johnny Coach is the person who harassed his teacher Lily Tuckrige.

On 22.7.2008 at 06:09:59 UTC, the user amy789smith authenticated with Yahoo, but from a different web browser. Therefore, Amy Smith did not send the harassing emails.

NETRESEC Packet Injection Attacks

Link: official website

In his article “Packet Injection Attacks in the Wild”, Erik Hjelmvik focused on packet injection attacks that had been running for several months and were still active in 2016. He attempted to recreate these packet injections and provided the resulting PCAP files.

The first recreated attack was against www.02995.com. It belongs to the “hao” group described in the original research “Website-Targeted False Content Injection by Network Operators”. The second attack was against id1.cn. This injection attack was based on BroCon 2015.

The details of the performed attacks are described in the following sections, including annotations of the provided PCAP files.

Packet injection attack against www.02995.com

After the client visits the website www.02995.com, two responses are generated with the same sequence number (3820080905):

  1. “302 Found” - redirect to http://www.hao123.com/?tn=93803173_s_hao_pg; injected packet; uses only LF as the line break in the HTTP header,
  2. “302 Moved Temporarily” - redirect to http://hao.360.cn/?src=lm&ls=n4a2f6f3a91; real webserver response; uses the standard CR-LF line breaks in the HTTP response.

The user is redirected to http://www.hao123.com/ because the injected response arrived before the real webserver response.
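The two indicators described above (the same sequence number carrying different payloads, and bare LF line breaks in the HTTP header) can be checked programmatically. The following is a minimal sketch under stated assumptions: the input is a list of (sequence number, payload) pairs from one direction of a single TCP connection, already extracted from the PCAP by some other tool, and the sample payloads are shortened illustrations rather than the exact bytes from the capture. The function names are hypothetical.

```python
import re
from collections import defaultdict


def find_conflicting_segments(segments):
    """Return {seq: [payloads]} for sequence numbers seen with different payloads."""
    by_seq = defaultdict(list)
    for seq, payload in segments:
        # Identical retransmissions are ignored; only differing payloads count.
        if payload and payload not in by_seq[seq]:
            by_seq[seq].append(payload)
    return {seq: payloads for seq, payloads in by_seq.items() if len(payloads) > 1}


def uses_bare_lf(payload: bytes) -> bool:
    """True if the HTTP header section contains an LF that is not part of CR-LF."""
    head = re.split(rb"\r?\n\r?\n", payload, maxsplit=1)[0]
    return b"\n" in head.replace(b"\r\n", b"")


# Shortened, illustrative payloads for the two responses described above.
injected = b"HTTP/1.1 302 Found\nLocation: http://www.hao123.com/?tn=93803173_s_hao_pg\n\n"
real = b"HTTP/1.1 302 Moved Temporarily\r\nLocation: http://hao.360.cn/?src=lm&ls=n4a2f6f3a91\r\n\r\n"

conflicts = find_conflicting_segments([(3820080905, injected), (3820080905, real)])
for seq, payloads in conflicts.items():
    for payload in payloads:
        print(seq, "bare LF" if uses_bare_lf(payload) else "CR-LF")
```

Neither check proves an injection on its own; together they only flag connections that deserve a closer look.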

Annotation of the PCAP file

Packet injection attack against id1.cn

After visiting the website id1.cn, three responses are returned:

  1. “200 OK” - redirect to http://id1.cn/rd.s/Btc5n4unOP4UrIfE?url=http://id1.cn/; real webserver response. This is the first response to arrive, so the client follows this redirect and again receives two injected responses and one real website response:

    • “403 Forbidden” - redirect to http://batit.aliyun.com/alww.html
    • “403 Forbidden” - redirect to http://batit.aliyun.com/alww.html
    • “200 OK” - redirect to http://id1.cn/, real website response
  2. “403 Forbidden” - redirect to http://batit.aliyun.com/alww.html, injected response,
  3. “403 Forbidden” - redirect to http://batit.aliyun.com/alww.html, injected response.

Annotation of the PCAP file

ICS Cybersecurity - DoS Attacks against SCADA-based systems

Link: official website

The ICS Cybersecurity PCAP repository is a suite of PCAP captures that includes the “modbus TCP SCADA” dataset created by a team from the University of Coimbra (Portugal), as a part of the ATENA H2020 project. This dataset was generated for the article “Denial of Service Attacks: Detecting the frailties of machine learning algorithms in the Classification Process” using MODBUS/TCP equipment in the SCADA system.

The captured data is organized into three folders containing sub-folders based on the type of the attack: ARP-based Man-in-the-Middle attack, Modbus query flooding, ICMP flooding, and TCP SYN flooding. In addition, a nominal state with no attack is included. The PCAP files follow the naming convention <capture interface>dump-<attack>-<attack subtype>-<attack duration>-<capture duration>. Each attack starts 5 minutes after the first captured packet. The PCAP files with 12 hour capture duration are excluded; this project includes only 0.5 h, 1 h, 6 h, and 12 h captures. A brief overall description of each category is provided in the following sections. All times are in UTC, and the host lists for the flooding attacks do not contain all involved IP addresses (since many third-party IPs are involved in the DDoS attacks).
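The naming convention and the fixed 5-minute offset make it possible to estimate the attack window from a file name and the timestamp of the first packet. The following is a minimal sketch under several assumptions: the file name follows the convention literally with hyphen-free fields, durations are written as a number followed by `m` or `h`, and the sample name and timestamp are hypothetical illustrations rather than real files from the repository.

```python
import re
from datetime import datetime, timedelta, timezone

# <capture interface>dump-<attack>-<attack subtype>-<attack duration>-<capture duration>
NAME_RE = re.compile(
    r"^(?P<interface>.+?)dump-(?P<attack>[^-]+)-(?P<subtype>[^-]+)"
    r"-(?P<attack_duration>[^-]+)-(?P<capture_duration>[^-]+)$"
)
UNIT_SECONDS = {"m": 60, "h": 3600}  # assumed unit suffixes


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as '1m' or '0.5h' into a timedelta."""
    return timedelta(seconds=float(text[:-1]) * UNIT_SECONDS[text[-1]])


def attack_window(name: str, first_packet: datetime):
    """Return (fields, attack start, attack end) for a capture name."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    # Each attack starts 5 minutes after the first captured packet.
    start = first_packet + timedelta(minutes=5)
    end = start + parse_duration(match["attack_duration"])
    return match.groupdict(), start, end


# Hypothetical file name and first-packet timestamp, for illustration only.
fields, start, end = attack_window(
    "eth2dump-mitm-change-1m-0.5h",
    datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(fields["attack"], start.time(), end.time())
```

Real file names may deviate from the literal convention (for example, by containing extra separators or suffixes), in which case the pattern would need to be adapted.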

The following table displays the network information about this dataset.

   
Role       IP addresses
Attackers  172.27.224.50, 172.27.224.80
Victim     172.27.224.11, 172.27.224.70, 172.27.224.250, 172.27.224.251

Nominal state

ARP-based, Man-in-the-Middle attack

Modbus query flooding

ICMP flooding

TCP SYN flooding

Wireshark SampleCaptures

Link: official website

Wireshark provides many PCAP capture files on its wiki page. Some packet capture files are described in the following sections.

SSL with decryption keys

Link: official website

Wireshark provides a list of PCAP files together with their decryption keys. Some PCAPs from the list are described in the following part of this section. The descriptions and sources of the PCAP files are taken from the Wireshark wiki page.

rsasnakeoil.cap

dump.pcapng

mysql-ssl.pcapng

pop-ssl.pcapng

smtp-ssl.pcapng