Authors: Anonymous, Arian Akhavan Niaki, Nguyen Phong Hoang, Phillipa Gill, Amir Houmansadr
Free and Open Communications on the Internet (FOCI) 2020
The open dataset contains code and datasets for the paper: Triplet Censors: Demystifying Great Firewall’s DNS Censorship Behavior.
The largest and most important dataset is ./all_more_fields.csv.
This 14 GB file contains 120 million forged responses injected by the GFW.
It was extracted from a set of pcap files spanning 9 months using:
bash -x extract_all_pcap_to_csv_more_fields.sh
To make it easier to analyze the GFW's individual injectors, we split the packets by the injector that sent them into injector1.csv, injector2.csv, and injector3.csv. To generate these files, run one of the following two commands yourself:
# This is the faster way
bash -x split_by_awk_new.sh
or
# This is a more readable but much slower way
python3 split_with.py all_more_fields.csv
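For readers who want to see the general shape of such a split, here is a minimal sketch of streaming a large CSV row by row and writing each row to a per-label file. It is not the code in split_with.py or split_by_awk_new.sh. The function name `split_csv` and the `classify` callback are hypothetical, and the rule that tells the injectors apart is deliberately left to the caller because it is not described here. The sketch also assumes the file has no header row.

```python
import csv
import os
from typing import Callable, Dict, List


def split_csv(src_path: str,
              classify: Callable[[List[str]], str],
              out_dir: str = ".") -> Dict[str, int]:
    """Stream src_path and append each row to <out_dir>/<label>.csv.

    `classify` is a caller-supplied (hypothetical) function that maps one
    CSV row to a label such as "injector1". Returns the row count per label.
    """
    files = {}
    writers = {}
    counts: Dict[str, int] = {}
    try:
        with open(src_path, newline="") as src:
            for row in csv.reader(src):
                if not row:
                    continue  # skip blank lines
                label = classify(row)
                if label not in writers:
                    # Open each output file lazily, the first time its label appears.
                    out = open(os.path.join(out_dir, label + ".csv"), "w", newline="")
                    files[label] = out
                    writers[label] = csv.writer(out)
                    counts[label] = 0
                writers[label].writerow(row)
                counts[label] += 1
    finally:
        for out in files.values():
            out.close()
    return counts
```

Reading one row at a time keeps memory use flat regardless of the input size, which matters for a file of about 14 GB. A call would look like `split_csv("all_more_fields.csv", my_rule)`, where `my_rule` stands in for whatever classification logic is used.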
The code and datasets are organized into subdirectories according to the findings in the paper that they support.
Each subdirectory has a README.md file explaining which findings its code and dataset support.
For example, ./delay_differences/README.md reads:
The code and dataset under this directory were used to support the following findings in our work:
"We also compare the time between sending our DNS query and when we receive the injected reply to get a sense of where the injectors are located. Specifically, we compare the delays of the three injectors and find that more than 90% of the time the delays are within 0.2 ms of each other. This would support the theory that these three devices are installed in the same physical location."
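The comparison described in that quotation can be illustrated with a short sketch. It is not the code in ./delay_differences. The function name `fraction_within` and the sample delays are made up for illustration, and "within 0.2 ms of each other" is interpreted here as the largest and smallest of the three delays for one query differing by at most 0.2 ms.

```python
from typing import Sequence


def fraction_within(delays_a: Sequence[float],
                    delays_b: Sequence[float],
                    delays_c: Sequence[float],
                    tolerance_ms: float = 0.2) -> float:
    """Fraction of queries whose three injector delays agree within tolerance_ms.

    The i-th entry of each sequence is assumed to be the delay (in ms) between
    sending the i-th query and receiving that injector's forged reply.
    """
    triples = list(zip(delays_a, delays_b, delays_c))
    if not triples:
        raise ValueError("no delay measurements given")
    # A query counts as "close" when the spread of its three delays is small.
    close = sum(1 for t in triples if max(t) - min(t) <= tolerance_ms)
    return close / len(triples)


# Made-up delays in milliseconds for four queries (illustration only).
inj1 = [10.00, 20.00, 30.00, 40.00]
inj2 = [10.05, 20.10, 30.50, 40.00]
inj3 = [10.10, 19.95, 30.00, 40.15]
print(fraction_within(inj1, inj2, inj3))  # prints 0.75
```

In this toy input only the third query has a spread above 0.2 ms, so three of four queries qualify. A result above 0.9 on real measurements would correspond to the "more than 90% of the time" statement quoted above.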