Shadowsocks is one of the most popular circumvention tools in China. Since May 2019, there have been numerous anecdotal reports of the blocking of Shadowsocks from Chinese users. This report contains preliminary results of research into how the Great Firewall of China (GFW) detects and blocks Shadowsocks and its variants. Using measurement experiments, we find that the GFW passively monitors the network for suspicious connections that may be Shadowsocks, then actively probes the corresponding servers to test whether its guess is correct. The blocking of Shadowsocks is likely controlled by human factors that increase the severity of blocking during politically sensitive times. We suggest a workaround—changing the sizes of network packets during the Shadowsocks handshake—that (for now) effectively mitigates active probing of Shadowsocks servers. We will continue collaborating with developers to make Shadowsocks and related tools more resistant to blocking.
We set up our own Shadowsocks servers and connected to them from inside China, while capturing traffic on both sides for analysis. All experiments were conducted between July 5, 2019 and November 11, 2019. Most of the experiments were conducted since the reported large-scale blocking of Shadowsocks starting September 16, 2019.
In most of the experiments, we used shadowsocks-libev v3.3.1 as both client and server, since it is an actively maintained and representative Shadowsocks implementation. We believe the vulnerabilities we discovered applies to many Shadowsocks implementations and its variants, including OutlineVPN.
Unless explicitly specified, all clients and servers were used without any modification to their network functions, for example firewall rules. Shadowsocks can be configured with different encryption settings. We tested servers running both Stream ciphers and AEAD ciphers.
Shadowsocks is an encrypted protocol, designed not to have any static patterns in packet contents. It has two main operating modes, both keyed by a master password: Stream (deprecated) and AEAD (recommended). Both modes are meant to require the client to know the master password before using the server; however in Stream mode the client is only weakly authenticated. Both modes are susceptible to replay of previously seen authenticated packets, unless separate measures to prevent replay are taken.
We have observed 5 types of active probes:
Seemingly random (not a replay of any genuine connection that we can identify):
We suspect that the active probing system identifies Shadowsocks servers and its variants by comparing a server’s responses to several of these probes.
Shadowsocks-libev has a replay filter; however most other Shadowsocks implementations do not. The replay filter blocks only exact replay, not replay that has been modified, and is not by itself enough to prevent active probing from comparing the responses to several slightly different probes.
It appears that a certain threshold of genuine simultaneous connections are required to trigger active probing. For example, in one experiment, as few as 13 connections were enough to trigger the active probing. Initial result also shows it may require a slightly more connections for the Shadowsocks servers using AEAD ciphers to get probed.
We let a client make 16 connections to a Shadowsocks server every 5 minutes. Although our connections triggered a large number of active probes, the Shadowsocks server was never blocked, for reasons we do not fully understand.
The figure above shows that while legitimate clients attempt to connect to the server, it receives active probes; and when they stop trying to connect, the active probing mostly stops. The number of active probes sent per legitimate connection is variable and not 1:1.
The active probing system may save a genuine connection payload and replay it later, even in response to a separate, future connection. The figure below shows the variability of the delay between legitimate connections and the ensuing replay-based probes. Because one legitimate connection may cause many (up to 47 in one case) replay attacks, we present two different cases: the orange line is samples only the first replay-based probe for a particular legitimate connection; the blue line is samples all replay-based probes.
The result shows that more than 90% of the replayed probes were sent within an hour of the connection from the legitimate client. The minimum observed delay was 0.4 seconds, while the maximum was around 400 hours.
Throughout all the experiments we conducted so far, we have seen 35,477 active probes sent from 10,547 unique IP addresses which all belong to China.
Origin ASes. The two autonomous systems that account for most of the Shadowsocks probes, AS 4837 (CHINA169-BACKBONE CNCGROUP China169 Backbone,CN) and AS 4134 (CHINANET-BACKBONE No.31,Jin-rong Street, CN), are the same as have been documented in previous work.
Centralized Structures. Despite coming from thousands of unique IP addresses, it appears that all active probing behavior is centrally managed by only a small number of processes. The evidence for this observation comes from network side channels. The figure below shows the TCP timestamp value that is attached to the SYN segment of each probe. The TCP timestamp is a 32-bit counter that increases at a fixed rate. It is not an absolute timestamp, but is relative to however the TCP implementation was initialized when the operating system last booted. The figure shows that what at first seem to be thousands of independent probers actually share only a small number of linear TCP timestamp sequences. In this case there are at least nine different physical systems or processes, with one of the nine accounting for the great majority of probes. We say “at least” nine process because we can probably not distinguish two or more independent processes sharing a very close interception value. The slopes of the sequences represent a timestamp increment frequency of 250 Hz.
Detection of Shadowsocks proceeds in two steps:
Therefore, to avoid blocking, you can (1) evade the passive detector, or (2) respond to active probes in a way that does not result in blocking. We will show how to do (1) by installing software that alters the sizes of packets.
Brdgrd is software that you can run on a Shadowsocks server that causes the client to break its Shadowsocks handshake into smaller packets. It was originally intended to disrupt the detection of Tor relays by forcing the GFW to do complicated TCP reassembly, but here we take advantage of brdgrd’s shaping of packet sizes from client to server. It seems that the GFW at least partially relies on packet sizes to passively detect Shadowsocks connections. Modifying packet sizes can significantly mitigate active probing by disrupting the first step in classification.
The figure shows a Shadowsocks server undergoing active probing, and then the probing going to zero within several hours of brdgrd being activated. As soon as we disabled brdgrd, active probing resumed. The second time we enabled brdgrd, the probes completely stopped for around 40 hours, but then a few more probes came.
Another experiment shows that brdgrd may be even more effective if used from the very beginning, before the server has been probed for the first time.
Brdgrd works by rewriting the server’s TCP window size to a rarely small value. Therefore it is likely possible to detect that brdgrd is being used. So while brdgrd can effectively reduce active probing for the time being, it cannot be regarded as a permanent solution to Shadowsocks blocking.
While the fact that active probing happens is clear, it is still unclear to us how active probing affects the blocking of Shadowsocks servers. That is, we have 33 Shadowsocks servers located all over the world. While most of them experienced heavy active probing, only 3 of them were ever blocked. More interestingly, one of the servers that was blocked was used for only a very short period of time, and thus had not received as many probes as some other servers that did not get blocked.
We came up with three hypotheses, attempting to explain this interesting phenomenon:
The blocking of Shadowsocks servers is likely controlled by some human factors. That is, the GFW may maintain a list of highly suspected Shadowsocks servers and it depends on human factors whether known servers are blocked (or unblocked). This hypothesis would also partly explains why more blockings have been reported during politically sensitive periods of time.
Another hypothesis is that active probing was ineffective against the particular Shadowsocks implementations that we used for most of the experiments. Indeed, all three servers that got blocked were running a different implementation than others. This can be true if the GFW has been exploiting some unique server reactions that are characteristics of only a certain set of Shadowsocks implementations.
The third hypothesis is there exists some geolocation inconsistency in censorship. All three servers that got blocked were running in a datacenter different from others, and were connected from a different residential network. This can be true if the GFW pays special attention to address ranges belonging to certain known datacenters, and/or pays special attention to connections from residential networks.
We want to thank these people for research and helpful discussion on this topic:
We encourage you to share your questions, comments or evidence on our findings and hypotheses publicly or privately. Our private contact information can be found at the footer of GFW Report.