This is How China Detects and Blocks Shadowsocks, by GFW Report, Jan Beznazwy, and Amir Houmansadr. I'm David Fifield and I'm presenting this work on behalf of the authors, most of whom are anonymous. I have experience researching in this field, and the authors have acquainted me thoroughly with this work.

The grand summary of this research is that the Great Firewall of China detects and blocks Shadowsocks using a combination of passive traffic analysis and active probing. Let's talk about what those terms mean.

Shadowsocks is an encrypted proxy protocol, and it's designed to be difficult to detect. It's really popular in China as a means of censorship circumvention, a way of getting around the Great Firewall. And the Great Firewall, for its part, as part of its general mission of information control, tries to find and block all different types of proxy servers, Shadowsocks included. In fact, since about May 2019, there have been anecdotal reports of people's Shadowsocks servers being blocked from China, sometimes during politically sensitive times, but without a good explanation. This research helps provide an explanation for how this has been happening.

Now, in Shadowsocks, the connection between the client and the server is encrypted, and furthermore it's encrypted in a way that reveals only ciphertext to an observer. So unlike TLS, for example, which has plaintext framing bytes, there's nothing like that in Shadowsocks. If you flatten out a Shadowsocks stream, it looks like just a sequence of uniformly random bytes, and that's by design. This quality means that it's not possible to, for example, write a simple regular expression that will match all Shadowsocks traffic; you have to work a little harder than that.

Now, if you're thinking that this randomness, this lack of a fingerprint, is itself a kind of fingerprint, you're absolutely right. In fact, this research shows that the Great Firewall uses the entropy and the length of packets in a TCP stream as part of its first step in classifying Shadowsocks traffic.
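To make the idea of "looks uniformly random" concrete, here is a minimal sketch, in Python, of an entropy check of the kind a passive classifier could apply to the first payload of a connection. It is only an illustration of the concept, not the firewall's actual classifier, and the threshold is an arbitrary value chosen for the example.

    import math
    from collections import Counter

    def bits_per_byte(payload: bytes) -> float:
        """Shannon entropy of the payload, in bits per byte (at most 8.0)."""
        if not payload:
            return 0.0
        counts = Counter(payload)
        n = len(payload)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def looks_uniformly_random(payload: bytes, threshold: float = 7.2) -> bool:
        # A fully random byte stream approaches 8 bits per byte, while protocols
        # with plaintext framing (TLS record headers, HTTP request lines, ...)
        # score noticeably lower. Note that short payloads cap the estimate at
        # log2(len(payload)), so a length check would also be needed in practice.
        return bits_per_byte(payload) >= threshold

A check like this has many false positives on its own, which is exactly why, as described next, the classification does not stop at the passive step.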
Now, what do I mean by active probing? This research shows that the Great Firewall discovers Shadowsocks servers in a two-step process: the first step is passive and the second step is active. In the first step, it looks for possible or potential Shadowsocks connections; and in the second step, it connects to the servers involved in those connections from its own IP addresses, as if it were a Shadowsocks client, and watches how the server responds. You can think of step 1 as "guess" and step 2 as "confirm."

You can understand this process of active probing as a way of increasing precision or reducing cost in network classification. If you were to write a purely passive classifier for Shadowsocks, it might yield an unacceptably high rate of false positives. On the other hand, if you were to try to actively probe every single connection that passes through the firewall, that may be more probes than you can manage to send. So you can think of step one as being a sort of pre-filter for step two.
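As a toy sketch of this guess-then-confirm structure, the snippet below strings the two steps together. It is purely illustrative and is not the firewall's code: the passive heuristic is a hypothetical placeholder, and the probe payload and verdict strings are stand-ins for the real probe types described later in the talk.

    import os
    import socket

    def passive_guess(first_payload: bytes) -> bool:
        """Step 1 (cheap, imprecise): flag 'random-looking' first packets."""
        # Hypothetical placeholder heuristic: plausible length and many distinct
        # byte values. The features the firewall really uses are discussed later.
        return 50 <= len(first_payload) <= 1000 and len(set(first_payload)) > 100

    def active_confirm(host: str, port: int, timeout: float = 5.0) -> str:
        """Step 2 (expensive, precise): connect back and watch the reaction."""
        probe = os.urandom(221)  # stand-in for a real probe payload
        try:
            with socket.create_connection((host, port), timeout=timeout) as s:
                s.sendall(probe)
                s.settimeout(timeout)
                return "replied" if s.recv(4096) else "closed"
        except socket.timeout:
            return "timed out"
        except ConnectionResetError:
            return "reset"

Only connections that pass the passive guess would ever be probed, which is what keeps the number of probes manageable.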
Now, this is certainly not the first time that active probing has been documented being used in China against censorship circumvention protocols. There is research going back all the way to 2011 showing it being used against Tor, against various VPN protocols, and the like. But the level of detection now in use against Shadowsocks reaches new heights of sophistication.

How do we know all this? Well, the authors investigated it in the way you might expect: they ran an experiment. They set up their own Shadowsocks servers outside of China; they set up their own Shadowsocks clients inside China; and then they connected to their own servers through the Firewall and watched for what else connected to those same servers. They also set up some control servers and never connected to them, just to be able to distinguish the connection-triggered active probes from random Internet scanning. And they ran this experiment for about four months.

Now, there are many, many implementations of Shadowsocks out there. For this experiment, the authors chose two of the most popular, which are called Shadowsocks-libev and Outline. These are two independent implementations of the same protocol.

The main observations of the four-month server experiment are that active probers send a variety of probe types: some of them look like replay attacks and some of them do not. The ones that are replays may be stored and replayed after a surprisingly long delay. The ones that are not replays have a peculiar distribution of packet lengths. And active probes come from apparently thousands of different source IP addresses.

Let's talk about the replay-based probes. First, these are copies of the authors' own legitimate connections from their authenticated Shadowsocks clients. Specifically, they're copies of the first data packets in authenticated Shadowsocks connections. Sometimes the replay is identical; sometimes it has certain bytes changed, one or two or maybe a dozen bytes, but usually at fixed positions.

So what could be the intention behind sending replay probes? Well, potentially it's exploiting a vulnerability in the Shadowsocks protocol. You see, the protocol doesn't specify what should happen when a server gets a replay of a previous, properly authenticated client connection. Now, if an implementation doesn't do any sort of replay filtering, any prevention of replay attacks, what's likely to happen is that it will do the exact same proxy request that it did earlier for the authenticated client, and send back to the active prober a big blob of ciphertext. The active prober won't be able to decrypt that blob, because it doesn't know the password for that Shadowsocks server. But the fact that it received a large amount of ciphertext back is a giveaway that the server is in fact Shadowsocks.

Even in implementations that try to filter out or prevent replays, there are certain edge conditions, in how connections are closed for example, that can be characteristic of Shadowsocks. And the fact that certain bytes are sometimes changed in these replay-based probes may be an attempt to evade implementations that have a replay filter.
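As an illustration of what replay filtering can look like on the server side, here is a minimal sketch. It is not taken from Shadowsocks-libev or Outline; it only shows the general idea of remembering the random salt/IV of each accepted connection and rejecting repeats, and the salt length parameter is just an example value.

    class ReplayFilter:
        """Remember the salt/IV of each accepted connection and reject repeats.

        A deliberately simple sketch: a real implementation would bound memory,
        for example with a Bloom filter or a rotating window, instead of
        keeping an ever-growing set.
        """

        def __init__(self, salt_len: int = 32):
            self.salt_len = salt_len
            self.seen = set()

        def check(self, first_payload: bytes) -> bool:
            """Return True if the connection may proceed, False if it is a replay."""
            if len(first_payload) < self.salt_len:
                return True  # not enough data yet to extract the salt
            salt = first_payload[:self.salt_len]
            if salt in self.seen:
                return False  # salt seen before: treat as a replay
            self.seen.add(salt)
            return True

A filter keyed on the salt catches exact replays; probes with a few bytes changed may be an attempt to get past exactly this kind of filter, and, as noted above, how the rejected connection is then closed can itself give the server away.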
Replay-based probes are convenient for analysis, because it's easy to match the active probe with the legitimate connection that it is a replay of. That makes it possible, for example, to measure the delay between when a legitimate connection is sent and when replays based on that connection are sent. So take a look at this graph; it's a CDF. Because a probe may be replayed more than once, the darker line only considers the first replay, and the paler line considers all replays. As you can see, for first replays anyway, at least around 25 percent of replay probes come within one second, so almost immediately; but there is a surprisingly long tail, and some replay probes are sent after a delay of minutes, hours, even days.

Now, the non-replay probes: these had a payload that was, to all appearances, random, but didn't match any prior legitimate connection. And you'll notice there's a very strange distribution of packet lengths. Looking at the ones of length below 50, you'll see that they're roughly uniformly distributed in what I'll call triplets, centered on lengths 8, 12, 16, 22, 33, 41, and 49. So the triplet at 8, for example, represents a length of 7, a length of 8, and a length of 9, all being about equally likely to be sent. Besides those (and notice the different scales here), the great majority of the non-replay probes had length exactly 221 bytes. This is an interesting and thought-provoking distribution of packet lengths, and the authors think they have at least a partial explanation for why active probers send probes of these lengths.

You see, when you send random, unauthenticated data to a Shadowsocks server, the server may react differently depending on how much data you send it. If you send too little data, the server is going to wait to receive the rest of the data that it's expecting, and eventually time out. But if you send beyond a certain threshold, the server will attempt to authenticate the data that it's received, be unable to authenticate it, and close the connection.

Now, I won't get too far into the details here, but you can configure Shadowsocks with a variety of different ciphers, and initialization vectors of different lengths, and things like that. But you'll notice in this table that many of those triplets straddle what I'll call byte thresholds, between where the server times out and where it closes the connection, with a RST or otherwise. So looking at the first row here: if you send a server so configured a packet of seven bytes or eight bytes, it's going to time out; but if you send it nine bytes, you'll get an immediate RST. So that's a distinguishable difference in how the server reacts.
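Here is a minimal sketch of the kind of server read loop that produces this distinguishable difference. It is a simplification, not code from either implementation: the header length threshold is left as a parameter because it depends on the cipher and protocol variant, and the authentication step is a stub.

    import socket

    def authenticate(header: bytes) -> bool:
        # Stand-in for the real stream/AEAD authentication step. Random probe
        # data essentially never authenticates, so the stub returns False.
        return False

    def serve_connection(conn: socket.socket, header_len: int, timeout: float = 30.0) -> str:
        """Sketch of the timeout-versus-close behavior described above.

        header_len is the number of bytes the server needs before it can even
        try to authenticate (it depends on cipher and protocol variant).
        """
        conn.settimeout(timeout)
        buf = b""
        while len(buf) < header_len:
            try:
                chunk = conn.recv(4096)
            except socket.timeout:
                conn.close()          # too little data: the prober sees a timeout
                return "timed out"
            if not chunk:
                return "client closed"
            buf += chunk
        if not authenticate(buf[:header_len]):
            conn.close()              # enough data, but it doesn't authenticate:
            return "closed"           # the prober sees an immediate close instead
        return "authenticated"        # a legitimate client would be proxied from here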
This analysis doesn't fully explain the triplet distribution, because, for example, the triplet at 32, 33, 34 and the one at 40, 41, 42 don't match up with any byte thresholds, and neither does the 221.

All right, moving on to the origin of the probers. Over those four months, the authors' Shadowsocks servers received over 50,000 active probes, and those came from over 12,000 different IP addresses, all of which geolocate to China. A consequence of this observation is that it's not possible to simply enumerate all the active-prober IP addresses and ban them from your server. It also isn't surprising, because prior research studying active probing has also found large numbers of IP addresses being used to send active probes. Now, comparing the 12,000 IP addresses in this work with previously compiled lists of prober IP addresses, there is not much overlap, although there is some. However, this is not really that surprising either, because past research has found that there is a lot of churn over time in the IP addresses used for active probing.

Now, despite the fact that there seem to be these thousands and thousands of different active probers, it's likely that they are all centrally managed by a small number of processes; and the evidence for that comes from a TCP-layer side channel, namely the TCP timestamp. The TCP timestamp is a 32-bit counter that increases at a fixed rate, and it's attached to every outgoing TCP segment. Different computers will generally not have synchronized TCP timestamp sequences, because the counter is usually relative to when the computer was last rebooted, when the counter was reset to zero or initialized to a random value.

This graph shows the TCP timestamp sequences over time of a few thousand active-prober IP addresses in one sub-experiment. You can see that even though the probes come from many different IP addresses, they fall into a small number of distinct TCP timestamp sequences, and these sequences increase at typical rates, so 250 Hz or 1,000 Hz. That 1,000 Hz line goes through a cluster of about 20 data points that are very closely spaced, but within that space they're much more like 1,000 Hz than 250 Hz.
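To illustrate the side channel, here is a minimal sketch of how one might group probes by the clock behind them. It is not the authors' analysis code; it assumes each probe has been recorded as (arrival time in seconds, TCP timestamp value, source IP) and that the counter rate is known, for example 250 Hz or 1,000 Hz.

    from collections import defaultdict

    def group_by_timestamp_sequence(probes, hz, tolerance=1000):
        """probes: iterable of (arrival_time_seconds, tsval, source_ip).

        Projects each probe's TCP timestamp back to a common reference time
        (tsval - hz * arrival_time); probes generated by the same process share
        roughly the same projected value even when their source IPs differ.
        32-bit counter wraparound is ignored for brevity.
        """
        groups = defaultdict(set)
        for arrival, tsval, src_ip in probes:
            origin = tsval - hz * arrival
            key = round(origin / tolerance)   # coarse bucket to absorb jitter
            groups[key].add(src_ip)
        return groups

    # Usage sketch: many source IPs collapsing into a handful of groups is
    # evidence that the probes are centrally generated.
    # for key, ips in group_by_timestamp_sequence(probes, hz=1000).items():
    #     print(len(ips), "source IPs share one timestamp sequence")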
This TCP timestamp observation is consistent with prior work, as are most of the other network-layer fingerprints that you might think to look at. The exception is TCP source port numbers. Prior work has found a roughly uniform distribution of source port numbers, whereas in this work the authors found a marked bias towards the default ephemeral port range used by Linux.

So it's clear that active probing of Shadowsocks is a phenomenon; it happens. But what features is the Great Firewall looking for? The authors investigated this, aided by the fact that 1) replay-based probes are often sent almost immediately, and 2) they are copies only of the first data-carrying packet. So the authors designed an experiment: establish a TCP connection, and then send one TCP packet with a configurable entropy and a configurable payload length.

From this graph we can see, although there isn't a really sharp distinguishing threshold, that high-entropy packets are more likely to be replayed than low-entropy packets. And the length of the packets matters as well, so here we have another CDF: the gray line in the back is the authors' own trigger connections, and they tested packet lengths between 1 and 1,000 bytes, uniformly distributed. You can see the non-replay probes there, with the expected peak at 221. The replay probes only occur in the interval of about 160 to 700 bytes in length. Packets outside that interval were almost never replayed, and even within that interval, certain lengths are more likely to be replayed than others. You'll notice the replay line has a sort of chunky stair-step pattern, and there's actually some structure to that: between lengths of about 160 and 384, packets were more likely to be replayed if they had a length whose remainder was 9 when divided by 16. And in the interval of about 264 to 700, they were more likely to be replayed if they had a length whose remainder was 2 when divided by 16. In the area where those two intervals overlap, there was a mix of remainders 2 and 9. The authors don't have an explanation for this phenomenon; it's just an intriguing feature of the packet length distribution.
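Here is a minimal sketch of that kind of trigger measurement, run from a vantage point behind the firewall against a server you control; the address and port in the usage comment are hypothetical examples, and this is not the authors' actual measurement code.

    import os
    import socket

    def send_trigger(host: str, port: int, length: int, high_entropy: bool) -> None:
        """Open a TCP connection and send one data payload of the given length.

        high_entropy=True sends uniformly random bytes (like Shadowsocks);
        high_entropy=False sends a low-entropy payload of repeated bytes.
        """
        payload = os.urandom(length) if high_entropy else b"A" * length
        with socket.create_connection((host, port), timeout=10) as s:
            s.sendall(payload)
            # Linger briefly, as a real client waiting for a reply would,
            # before closing. Afterwards, watch the server's logs for probes.
            s.settimeout(10)
            try:
                s.recv(1)
            except socket.timeout:
                pass

    # Example sweep, as in the experiment described above:
    # for length in range(1, 1001):
    #     send_trigger("203.0.113.10", 8388, length, high_entropy=True)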
Taking active probing of Shadowsocks as a given, what can be done to mitigate it? Well, because we know that the detection process is a two-step process, it is sufficient to disrupt either of those two steps. So you can either evade the passive traffic analysis, or you can evade the active probing component.

Evading the passive traffic analysis means changing the features that the Great Firewall is looking for: entropy and packet lengths. Changing entropy in Shadowsocks is not easy without fundamentally changing how the protocol works; but with packet lengths, you have a little bit of leeway. For example, newer versions of Outline will coalesce consecutive packets: something that would have been sent as two packets may be sent as one packet instead, as a way of disguising the characteristic packet length distribution that the Firewall may be looking for.

Another interesting observation involves a tool called Brdgrd (Bridge Guard). This is software that you can install on a Shadowsocks server, and it causes clients to send smaller-than-usual packets in the early stages of their connection; it does this by rewriting the server's TCP window size. Although there are some drawbacks and caveats to using Brdgrd with Shadowsocks, it's clear in this experiment that while Brdgrd was active, the incidence of active probing was notably diminished, although not quite to zero.

The other thing you can do to avoid detection is change the way that you respond to active probes. So, I showed you this table earlier, and it was a little bit of a lie, because that table described the behavior of some older versions of Shadowsocks. Some newer versions of Shadowsocks, partially as a result of this research, try to disguise the distinction between timing out a connection and terminating the connection. So the reactions in newer versions of Shadowsocks look more like this. Now, I don't want to get into the details, but AEAD is the newer, currently recommended version of the Shadowsocks protocol. And you can see that in this version, in these two implementations at least, the server always times out, no matter the length of the unauthenticated probe.
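One common way to get that "always time out" behavior, sketched below, is for the server to stop responding rather than close when authentication fails, draining anything else the client sends until a period of inactivity expires. This is only an illustration of the general idea, not the actual code of Shadowsocks-libev or Outline.

    import socket

    def drain_and_timeout(conn: socket.socket, timeout: float = 30.0) -> None:
        """On failed authentication, behave as if still waiting for more data.

        Instead of closing immediately (which a prober can distinguish from a
        timeout), keep reading and discarding bytes, so that a failed
        authentication looks to the prober just like an incomplete connection,
        and close only after a period of inactivity.
        """
        conn.settimeout(timeout)
        try:
            while True:
                if not conn.recv(4096):   # peer closed first
                    break
        except socket.timeout:
            pass
        finally:
            conn.close()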
In the older, deprecated stream version of the protocol, for compatibility reasons, it's not possible to completely eliminate that distinction, but the implementations have narrowed it as far as possible.

In summary, the Great Firewall of China detects Shadowsocks servers using a combination of passive traffic analysis and active probing. Probing is triggered by the first packet in a data connection, and it's more likely when packets have high entropy or have certain payload lengths. There are many different types of active probe: some are replays, some are not. Probes come from many IP addresses, but they show signs of being centrally managed. And it's possible to mitigate the effects of active probing of Shadowsocks by disrupting either of the two steps in the classification process.

Thank you for your attention. If you have questions or comments, it's best to get in touch with the authors directly. Source code and data for this research are available at the URL you see.