Authors: Anonymous, Arian Akhavan Niaki, Nguyen Phong Hoang, Phillipa Gill, Amir Houmansadr
Free and Open Communications on the Internet (FOCI) 2020
The open dataset contains code and datasets for the paper "Triplet Censors: Demystifying Great Firewall’s DNS Censorship Behavior."
The largest and most important dataset is ./all_more_fields.csv. This 14 GB file contains 120 million forged DNS responses injected by the GFW.
It was extracted from a set of pcap files collected over 9 months using:
bash -x extract_all_pcap_to_csv_more_fields.sh
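Because the file is about 14 GB, loading it into memory at once is impractical on most machines. The following is a minimal sketch, assuming all_more_fields.csv is plain comma-separated text, that streams the file with Python's csv module and counts its records. Whether the file starts with a header row is an assumption you should check; has_header is a hypothetical parameter introduced here for that purpose.

```python
import csv
from typing import Iterable


def count_records(lines: Iterable[str], has_header: bool = False) -> int:
    """Count CSV records by streaming, without loading the whole file into memory.

    has_header is an assumption to verify: inspect the first line of
    all_more_fields.csv to see whether it holds column names.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the column-name row, if any
    return sum(1 for _ in reader)
```

For example, open the file with open("all_more_fields.csv", newline="") and pass the file object to count_records; a full pass over 14 GB will take a while. Counting with the csv module rather than counting lines also treats a quoted field that contains a newline as part of a single record.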
To make it easier to analyze the GFW's different injectors, we split the packets sent by each injector into injector1.csv, injector2.csv, and injector3.csv. To generate these files, run one of the following two commands yourself:
# This is the faster way
bash -x split_by_awk_new.sh
or
# This is a more readable but much slower way
python3 split_with.py all_more_fields.csv
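The actual categorization rule lives in split_with.py and split_by_awk_new.sh and is not described in this README. The following is only a hypothetical sketch of the general pattern: read the rows once and route each one to injector1.csv, injector2.csv, or injector3.csv. The classify callable is a placeholder you would supply; it is not the fingerprint the paper uses to tell the injectors apart.

```python
import csv
from contextlib import ExitStack
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional

Row = List[str]


def split_by_injector(
    rows: Iterable[Row],
    classify: Callable[[Row], Optional[int]],
    out_dir: Path,
) -> Dict[int, int]:
    """Stream rows once and route each one to injector<N>.csv.

    classify is a hypothetical, user-supplied rule returning 1, 2 or 3 for a
    recognised injector, or None to drop the row. Returns rows written per file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {1: 0, 2: 0, 3: 0}
    with ExitStack() as stack:
        # Open all three output files up front; ExitStack closes them on exit.
        writers = {
            n: csv.writer(stack.enter_context(
                open(out_dir / f"injector{n}.csv", "w", newline="")))
            for n in counts
        }
        for row in rows:
            injector = classify(row)
            if injector in writers:
                writers[injector].writerow(row)
                counts[injector] += 1
    return counts
```

To apply it to the real data, pass a csv.reader over all_more_fields.csv (opened with newline="") as rows, together with a classify function that encodes whichever rule split_with.py actually implements. A single streaming pass keeps memory use constant regardless of the input size.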
The code and datasets supporting different findings in the paper are organized into separate subdirectories.
Each subdirectory contains a README.md file explaining which findings are supported by the code and datasets in it.
For example, ./delay_differences/README.md reads:
The code and dataset under this directory were used to support the following findings in our work:
"We also compare the time between sending our DNS query and when we receive the injected reply to get a sense of where the injectors are located. Specifically, we compare the delays of the three injectors and find that more than 90% of the time the delays are within 0.2 ms of each other. This would support the theory that these three devices are installed in the same physical location."
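The comparison quoted above can be expressed as a small calculation: for each query answered by all three injectors, take the spread (maximum minus minimum) of the three injection delays and count how often it is at most 0.2 ms. This is a hypothetical sketch; the input layout (a mapping from a query identifier to per-injector delays in milliseconds) is an assumption and not the format of the files under ./delay_differences.

```python
from typing import Mapping


def fraction_within(
    delays_ms: Mapping[str, Mapping[int, float]],
    tolerance_ms: float = 0.2,
) -> float:
    """Fraction of queries whose three injector delays lie within tolerance_ms of each other.

    delays_ms maps a (hypothetical) query identifier to {injector_number: delay_in_ms}.
    Queries not answered by all three injectors are skipped.
    """
    complete = [d for d in delays_ms.values() if set(d) == {1, 2, 3}]
    if not complete:
        return 0.0
    # "Within 0.2 ms of each other" for every pair is equivalent to max - min <= 0.2 ms.
    close = sum(1 for d in complete if max(d.values()) - min(d.values()) <= tolerance_ms)
    return close / len(complete)
```

On real measurements, a result above 0.9 would correspond to the finding quoted above.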