Datasets

iTrust hosts world-class testbeds for experimentation aimed at the design of secure critical infrastructure. These testbeds are often run non-stop for several days, both with and without attacks being launched. Given iTrust’s commitment to improving the security of legacy and new critical infrastructure, the data so collected is made available to researchers across the world, without charge. Dataset requests can be submitted, and details of the dataset characteristics obtained, here. Please note that we may take up to three working days to process your request.

Please note that iTrust provides the datasets in good faith and on an “as is” basis. As of 1 Jul 2022, we no longer provide follow-up support such as answering queries or providing clarifications on the datasets, regardless of when they were downloaded. Feedback on bugs and erroneous information in the datasets is welcome, but iTrust is not obliged to reply to or follow up on it. Thank you.

Terms of Usage of Datasets

By requesting and receiving the dataset, you agree to:

  1. have your and your organisation’s name and the date of request published below;
  2. give explicit credit to “iTrust, Centre for Research in Cyber Security, Singapore University of Technology and Design” when the outcome of using the dataset appears in works published by you, regardless of medium;
  3. inform iTrust when such works have been published; and
  4. not share the dataset with others, whether in a private or public setting (all additional requests for the same dataset need to go through the request form).

Detailed Information on Datasets

Please read through the detailed information here before submitting a request.

Impact of datasets

The approximate number of publications (in English) that have resulted from the use of iTrust’s CPS datasets is derived from a search on Google Scholar and given below. Search results are correct as at 21 Oct 2024.

“A recent survey summarized several public datasets from the ICS domain, that can be used for cyber security research. Morris describes datasets with data collected from power systems, gas pipelines, water storage systems and energy management systems. The majority of the Morris datasets, along with others…contain only network traffic data. We have not found any evidence that the datasets described above, have been downloaded at a large scale, been applied in research competitions or have been used for educational purposes, as is the case for the iTrust datasets.”

Qi, L., Verwer, S., Kooij, R. E., & Mathur, A. P. (2019). Using Datasets from Industrial Control Systems for Cyber Security Research and Education. In Lecture Notes in Computer Science (pp. 122–133). https://doi.org/10.1007/978-3-030-37670-3_10  

“Table 4 outlines reported tools that have been used to support security research for water systems, including developing datasets for testing intrusion detection and validating mitigation techniques. The most widely known and reputable of these are the Secure Water Treatment (SWaT) testbed and water distribution testbed (WADI), both of which were implemented and deployed at iTrust.”

Tuptuk, N., Hazell, P., Watson, J., & Hailes, S. (2021). A Systematic review of the state of Cyber-Security in water Systems. Water, 13(1), 81. https://doi.org/10.3390/w13010081  


“Expensiveness in the construction and the maintenance of Physical testbeds are the first barriers a research group will encounter when deciding to build one. Suppose a research group can deal with this limitation. In that case, it is useful to share with the community datasets collected from the Physical testbed and the related documentation, as iTrust laboratory of SUTD is doing with SWaT and WADI. Furthermore, provide a simple way for other researchers to access the testbed can be an added value not only for the community, which can take advantage [sic] of it but also for the owner who can have a more critical view of the system.”

Conti, M., Donadel, D., & Turrin, F. (2021). A survey on industrial control system testbeds and datasets for security research. IEEE Communications Surveys and Tutorials, 23(4), 2248–2294. https://doi.org/10.1109/comst.2021.3094360  

“The most common datasets used to evaluate network based IDS are…not suitable for CPS security research because: (1) the collected traffic data represents generic IT networks, which lacks industrial communication protocols as well as the industrial traffic patterns, and (2) no physical system is associated with the cyber system, hence no physical data is available, which represents a key distinguishing feature of CPS security. The most widely-used datasets are generated by iTrust research center and maintained at iTrust website.”

Tantawy, A. (2022, August 17). On the Elements of Datasets for Cyber Physical Systems Security. https://arxiv.org/pdf/2208.08255