Triplet Censors: Demystifying Great Firewall’s DNS Censorship Behavior


Authors: Anonymous, Arian Akhavan Niaki, Nguyen Phong Hoang, Phillipa Gill, Amir Houmansadr

Free and Open Communications on the Internet (FOCI) 2020

This open release contains the code and datasets for the paper: Triplet Censors: Demystifying Great Firewall’s DNS Censorship Behavior.

Updates

  • If you have any questions, comments, or feedback, please feel free to leave them on the pad.
  • We will have a summary of our paper on this page within days. Please check back!
  • As of August 11, 2020, we have released all our code and datasets to the maximum extent that does not harm our anonymity. The released code and datasets support all major findings in our paper. We will continue anonymizing and releasing the remaining code and datasets, with the goal of making our work highly reproducible, within days.

Explanations on datasets

The largest and most important dataset is ./all_more_fields.csv. This 14 GB file contains 120 million forged responses injected by the GFW. It was extracted from a set of pcap files spanning 9 months using:

bash -x extract_all_pcap_to_csv_more_fields.sh

To make it easier to analyze the different injectors of the GFW, we split the packets by the injector that sent them into injector1.csv, injector2.csv, and injector3.csv. To generate these files, you need to run one of the following two commands yourself:

# This is the faster way
bash -x split_by_awk_new.sh

or

# This is a more readable but much slower way
python3 split_with.py all_more_fields.csv
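For readers who want to see the general shape of such a split, the following is a minimal sketch of streaming a large CSV row by row and routing each row to a per-label output file, so that the 14 GB input never has to fit in memory. It is not the logic of split_with.py or split_by_awk_new.sh: the function name split_csv and the classify callback are hypothetical, and the rules that actually attribute a packet to an injector are not reproduced here. The sketch also assumes nothing about the column layout or about whether the file has a header row.

```python
import csv
from contextlib import ExitStack


def split_csv(src_path, classify, out_paths):
    """Stream src_path and write each row to out_paths[classify(row)].

    classify is a caller-supplied (hypothetical) function that maps a row,
    given as a list of strings, to a label. Rows whose label is not a key
    of out_paths are dropped. Returns the number of rows written per label.
    """
    counts = {label: 0 for label in out_paths}
    with ExitStack() as stack:
        src = stack.enter_context(open(src_path, newline=""))
        # One writer per output file; ExitStack closes every file on exit.
        writers = {
            label: csv.writer(stack.enter_context(open(path, "w", newline="")))
            for label, path in out_paths.items()
        }
        for row in csv.reader(src):
            label = classify(row)
            if label in writers:
                writers[label].writerow(row)
                counts[label] += 1
    return counts
```

A call might look like split_csv("all_more_fields.csv", my_rule, {"1": "injector1.csv", "2": "injector2.csv", "3": "injector3.csv"}), where my_rule stands in for whatever fingerprinting rule the released scripts apply.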

README on subdirectories

We categorized the code and datasets supporting different findings in the paper into different subdirectories. There is a README.md file in each subdirectory, explaining what findings are supported by the code and dataset there.

For example, ./delay_differences/README.md reads as:

The code and dataset under this directory were used to support the following findings in our work:
    "We also compare the time between sending our DNS query and when we receive the injected reply to get a sense of where the injectors are located. Specifically, we compare the delays of the three injectors and find that more than 90% of the time the delays are within 0.2 ms of each other. This would support the theory that these three devices are installed in the same physical location."
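The comparison described in that README can be illustrated with a short sketch. It assumes the per-query delays of the three injectors have already been paired up as (d1, d2, d3) tuples in milliseconds, and it reads "within 0.2 ms of each other" as "the largest and smallest of the three delays differ by at most 0.2 ms". Both the input format and that reading are assumptions; the function name fraction_within is hypothetical and is not taken from the code under ./delay_differences/.

```python
def fraction_within(delays_ms, threshold_ms=0.2):
    """Return the fraction of queries whose injector delays agree closely.

    delays_ms: iterable of (d1, d2, d3) tuples, one per DNS query, where
    each value is the delay in milliseconds between sending the query and
    receiving that injector's forged reply (assumed input format).
    """
    total = 0
    close = 0
    for triple in delays_ms:
        total += 1
        # Spread between the slowest and fastest injector for this query.
        if max(triple) - min(triple) <= threshold_ms:
            close += 1
    return close / total if total else 0.0
```

Applied to the real measurements, a result above 0.9 would correspond to the "more than 90% of the time" figure quoted above.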
