This is How China Detects and Blocks Shadowsocks, by GFW Report, Jan Beznazwy, and Amir Houmansadr. I'm David Fifield and I'm presenting this work on behalf of the authors, most of whom are anonymous. I have experience researching in this field, and the authors have acquainted me thoroughly with this work.

The grand summary of this research is that the Great Firewall of China detects and blocks Shadowsocks using a combination of passive traffic analysis and active probing. Let's talk about what those terms mean.

Shadowsocks is an encrypted proxy protocol, and it's designed to be difficult to detect. It's really popular in China as a means of censorship circumvention, a way of getting around the Great Firewall. And the Great Firewall, for its part, as part of its general mission of information control, tries to find and block all different types of proxy servers, Shadowsocks included. In fact, since about May 2019, there have been anecdotal reports of people's Shadowsocks servers being blocked from China, sometimes during politically sensitive times, but without a good explanation. This research helps provide an explanation for how this has been happening.

Now, in Shadowsocks, the connection between the client and the server is encrypted, and furthermore it's encrypted in a way that reveals only ciphertext to an observer. So unlike TLS, for example, which has plaintext framing bytes, there's nothing like that in Shadowsocks. If you flatten out a Shadowsocks stream, it looks like just a sequence of uniformly random bytes, and that's by design. This quality means that it's not possible to, for example, write a simple regular expression that will match all Shadowsocks traffic; you have to work a little harder than that.

Now, if you're thinking that this randomness, this lack of a fingerprint, is itself a kind of fingerprint, you're absolutely right. In fact, this research shows that the Great Firewall uses the entropy and the length of packets in a TCP stream as part of its first step in classifying Shadowsocks traffic.
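To make the idea of "looks uniformly random" concrete, here is a minimal sketch, in Python, of an entropy check of the kind a passive classifier could apply to the first payload of a connection. It is only an illustration of the concept, not the firewall's actual classifier, and the threshold is an arbitrary value chosen for the example.

    import math
    from collections import Counter

    def bits_per_byte(payload: bytes) -> float:
        """Shannon entropy of the payload, in bits per byte (at most 8.0)."""
        if not payload:
            return 0.0
        counts = Counter(payload)
        n = len(payload)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def looks_uniformly_random(payload: bytes, threshold: float = 7.2) -> bool:
        # A fully random byte stream approaches 8 bits per byte, while protocols
        # with plaintext framing (TLS record headers, HTTP request lines, ...)
        # score noticeably lower. Note that short payloads cap the estimate at
        # log2(len(payload)), so a length check would also be needed in practice.
        return bits_per_byte(payload) >= threshold

A check like this has many false positives on its own, which is exactly why, as described next, the classification does not stop at the passive step.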
Now, what do I mean by active probing? This research shows that the Great Firewall discovers Shadowsocks servers in a two-step process: the first step is passive and the second step is active. In the first step, it looks for possible or potential Shadowsocks connections; and in the second step, it connects to the servers involved in those connections from its own IP addresses, as if it were a Shadowsocks client, and watches how the server responds. You can think of step 1 as "guess" and step 2 as "confirm."

You can understand this process of active probing as a way of increasing precision or reducing cost in network classification. If you were to write a purely passive classifier for Shadowsocks, it might yield an unacceptably high rate of false positives. On the other hand, if you were to try to actively probe every single connection that passes through the firewall, that may be more probes than you can manage to send. So you can think of step one as being a sort of pre-filter for step two.
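As a toy sketch of this guess-then-confirm structure, the snippet below strings the two steps together. It is purely illustrative and is not the firewall's code: the passive heuristic is a hypothetical placeholder, and the probe payload and verdict strings are stand-ins for the real probe types described later in the talk.

    import os
    import socket

    def passive_guess(first_payload: bytes) -> bool:
        """Step 1 (cheap, imprecise): flag 'random-looking' first packets."""
        # Hypothetical placeholder heuristic: plausible length and many distinct
        # byte values. The features the firewall really uses are discussed later.
        return 50 <= len(first_payload) <= 1000 and len(set(first_payload)) > 100

    def active_confirm(host: str, port: int, timeout: float = 5.0) -> str:
        """Step 2 (expensive, precise): connect back and watch the reaction."""
        probe = os.urandom(221)  # stand-in for a real probe payload
        try:
            with socket.create_connection((host, port), timeout=timeout) as s:
                s.sendall(probe)
                s.settimeout(timeout)
                return "replied" if s.recv(4096) else "closed"
        except socket.timeout:
            return "timed out"
        except ConnectionResetError:
            return "reset"

Only connections that pass the passive guess would ever be probed, which is what keeps the number of probes manageable.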
Now, this is certainly not the first time that active probing has been documented being used in China against censorship circumvention protocols. There is research going back all the way to 2011 showing it being used against Tor, against various VPN protocols, and the like. But the level of detection now in use against Shadowsocks reaches new heights of sophistication.

How do we know all this? Well, the authors investigated it in the way you might expect: they ran an experiment. They set up their own Shadowsocks servers outside of China; they set up their own Shadowsocks clients inside China; and then they connected to their own servers through the Firewall and watched for what else connected to those same servers. They also set up some control servers and never connected to them, just to be able to distinguish the connection-triggered active probes from random Internet scanning. And they ran this experiment for about four months.

Now, there are many, many implementations of Shadowsocks out there. For this experiment, the authors chose two of the most popular, which are called Shadowsocks-libev and Outline. These are two independent implementations of the same protocol.

The main observations of the four-month server experiment are that active probers send a variety of probe types: some of them look like replay attacks and some of them do not. The ones that are replays may be stored and replayed after a surprisingly long delay. The ones that are not replays have a peculiar distribution of packet lengths. And active probes come from apparently thousands of different source IP addresses.

Let's talk about the replay-based probes. First, these are copies of the authors' own legitimate connections from their authenticated Shadowsocks clients. Specifically, they're copies of the first data packets in authenticated Shadowsocks connections. Sometimes the replay is identical; sometimes it has certain bytes changed, one or two or maybe a dozen bytes, but usually at fixed positions.

So what could be the intention behind sending replay probes? Well, potentially it's exploiting a vulnerability in the Shadowsocks protocol. You see, the protocol doesn't specify what should happen when a server gets a replay of a previous, properly authenticated client connection. Now, if an implementation doesn't do any sort of replay filtering, any prevention of replay attacks, what's likely to happen is that it will do the exact same proxy request that it did earlier for the authenticated client, and send back to the active prober a big blob of ciphertext. The active prober won't be able to decrypt that blob, because it doesn't know the password for that Shadowsocks server. But the fact that it received a large amount of ciphertext back is a giveaway that the server is in fact Shadowsocks.

Even in implementations that try to filter out or prevent replays, there are certain edge conditions, in how connections are closed for example, that can be characteristic of Shadowsocks. And the fact that certain bytes are sometimes changed in these replay-based probes may be an attempt to evade implementations that have a replay filter.
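As an illustration of what replay filtering can look like on the server side, here is a minimal sketch. It is not taken from Shadowsocks-libev or Outline; it only shows the general idea of remembering the random salt/IV of each accepted connection and rejecting repeats, and the salt length parameter is just an example value.

    class ReplayFilter:
        """Remember the salt/IV of each accepted connection and reject repeats.

        A deliberately simple sketch: a real implementation would bound memory,
        for example with a Bloom filter or a rotating window, instead of
        keeping an ever-growing set.
        """

        def __init__(self, salt_len: int = 32):
            self.salt_len = salt_len
            self.seen = set()

        def check(self, first_payload: bytes) -> bool:
            """Return True if the connection may proceed, False if it is a replay."""
            if len(first_payload) < self.salt_len:
                return True  # not enough data yet to extract the salt
            salt = first_payload[:self.salt_len]
            if salt in self.seen:
                return False  # salt seen before: treat as a replay
            self.seen.add(salt)
            return True

A filter keyed on the salt catches exact replays; probes with a few bytes changed may be an attempt to get past exactly this kind of filter, and, as noted above, how the rejected connection is then closed can itself give the server away.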
Replay-based probes are convenient for analysis, because it's easy to match the active probe with the legitimate connection that it is a replay of. That makes it possible, for example, to measure the delay between when a legitimate connection is sent and when replays based on that connection are sent. So take a look at this graph; it's a CDF. Because a probe may be replayed more than once, the darker line only considers the first replay, and the paler line considers all replays. As you can see, for first replays anyway, at least around 25 percent of replay probes come within one second, so almost immediately; but there is a surprisingly long tail, and some replay probes are sent after a delay of minutes, hours, even days.

Now, the non-replay probes: these had a payload that was, to all appearances, random, but didn't match any prior legitimate connection. And you'll notice there's a very strange distribution of packet lengths. Looking at the ones of length below 50, you'll see that they're roughly uniformly distributed in what I'll call triplets, centered on lengths 8, 12, 16, 22, 33, 41, and 49. So the triplet at 8, for example, represents a length of 7, a length of 8, and a length of 9, all being about equally likely to be sent. Besides those (and notice the different scales here), the great majority of the non-replay probes had length exactly 221 bytes. This is an interesting and thought-provoking distribution of packet lengths, and the authors think they have at least a partial explanation for why active probers send probes of these lengths.

You see, when you send random, unauthenticated data to a Shadowsocks server, the server may react differently depending on how much data you send it. If you send too little data, the server is going to wait to receive the rest of the data that it's expecting, and eventually time out. But if you send beyond a certain threshold, the server will attempt to authenticate the data that it's received, be unable to authenticate it, and close the connection.

Now, I won't get too far into the details here, but you can configure Shadowsocks with a variety of different ciphers, and initialization vectors of different lengths, and things like that. But you'll notice in this table that many of those triplets straddle what I'll call byte thresholds, between where the server times out and where it closes the connection, with a RST or otherwise. So looking at the first row here: if you send a server so configured a packet of seven bytes or eight bytes, it's going to time out; but if you send it nine bytes, you'll get an immediate RST. So that's a distinguishable difference in how the server reacts.
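Here is a minimal sketch of the kind of server read loop that produces this distinguishable difference. It is a simplification, not code from either implementation: the header length threshold is left as a parameter because it depends on the cipher and protocol variant, and the authentication step is a stub.

    import socket

    def authenticate(header: bytes) -> bool:
        # Stand-in for the real stream/AEAD authentication step. Random probe
        # data essentially never authenticates, so the stub returns False.
        return False

    def serve_connection(conn: socket.socket, header_len: int, timeout: float = 30.0) -> str:
        """Sketch of the timeout-versus-close behavior described above.

        header_len is the number of bytes the server needs before it can even
        try to authenticate (it depends on cipher and protocol variant).
        """
        conn.settimeout(timeout)
        buf = b""
        while len(buf) < header_len:
            try:
                chunk = conn.recv(4096)
            except socket.timeout:
                conn.close()          # too little data: the prober sees a timeout
                return "timed out"
            if not chunk:
                return "client closed"
            buf += chunk
        if not authenticate(buf[:header_len]):
            conn.close()              # enough data, but it doesn't authenticate:
            return "closed"           # the prober sees an immediate close instead
        return "authenticated"        # a legitimate client would be proxied from here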
This analysis doesn't fully explain the triplet distribution, because, for example, the triplet at 32, 33, 34 and the one at 40, 41, 42 don't match up with any byte thresholds, and neither does the 221.

All right, moving on to the origin of the probers. Over those four months, the authors' Shadowsocks servers received over 50,000 active probes, and those came from over 12,000 different IP addresses, all of which geolocate to China. A consequence of this observation is that it's not possible to simply enumerate all the active-prober IP addresses and ban them from your server. It also isn't surprising, because prior research studying active probing has also found large numbers of IP addresses being used to send active probes. Now, comparing the 12,000 IP addresses in this work with previously compiled lists of prober IP addresses, there is not much overlap, although there is some. However, this is not really that surprising either, because past research has found that there is a lot of churn over time in the IP addresses used for active probing.

Now, despite the fact that there seem to be these thousands and thousands of different active probers, it's likely that they are all centrally managed by a small number of processes; and the evidence for that comes from a TCP-layer side channel, namely the TCP timestamp. The TCP timestamp is a 32-bit counter that increases at a fixed rate, and it's attached to every outgoing TCP segment. Different computers will generally not have synchronized TCP timestamp sequences, because the counter is usually relative to when the computer was last rebooted, when the counter was reset to zero or initialized to a random value.

This graph shows the TCP timestamp sequences over time of a few thousand active-prober IP addresses in one sub-experiment. You can see that even though the probes come from many different IP addresses, they fall into a small number of distinct TCP timestamp sequences, and these sequences increase at typical rates, so 250 Hz or 1,000 Hz. That 1,000 Hz line goes through a cluster of about 20 data points that are very closely spaced, but within that space they're much more like 1,000 Hz than 250 Hz.
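To illustrate the side channel, here is a minimal sketch of how one might group probes by the clock behind them. It is not the authors' analysis code; it assumes each probe has been recorded as (arrival time in seconds, TCP timestamp value, source IP) and that the counter rate is known, for example 250 Hz or 1,000 Hz.

    from collections import defaultdict

    def group_by_timestamp_sequence(probes, hz, tolerance=1000):
        """probes: iterable of (arrival_time_seconds, tsval, source_ip).

        Projects each probe's TCP timestamp back to a common reference time
        (tsval - hz * arrival_time); probes generated by the same process share
        roughly the same projected value even when their source IPs differ.
        32-bit counter wraparound is ignored for brevity.
        """
        groups = defaultdict(set)
        for arrival, tsval, src_ip in probes:
            origin = tsval - hz * arrival
            key = round(origin / tolerance)   # coarse bucket to absorb jitter
            groups[key].add(src_ip)
        return groups

    # Usage sketch: many source IPs collapsing into a handful of groups is
    # evidence that the probes are centrally generated.
    # for key, ips in group_by_timestamp_sequence(probes, hz=1000).items():
    #     print(len(ips), "source IPs share one timestamp sequence")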
This TCP timestamp observation is consistent with prior work, as are most of the other network-layer fingerprints that you might think to look at. The exception is TCP source port numbers. Prior work has found a roughly uniform distribution of source port numbers, whereas in this work the authors found a marked bias towards the default ephemeral port range used by Linux.

So it's clear that active probing of Shadowsocks is a phenomenon; it happens. But what features is the Great Firewall looking for? The authors investigated this, aided by the fact that 1) replay-based probes are often sent almost immediately, and 2) they are copies only of the first data-carrying packet. So the authors designed an experiment: establish a TCP connection, and then send one TCP packet with a configurable entropy and a configurable payload length.

From this graph we can see, although there isn't a really sharp distinguishing threshold, that high-entropy packets are more likely to be replayed than low-entropy packets. And the length of the packets matters as well, so here we have another CDF: the gray line in the back is the authors' own trigger connections, and they tested packet lengths between 1 and 1,000 bytes, uniformly distributed. You can see the non-replay probes there, with the expected peak at 221. The replay probes only occur in the interval of about 160 to 700 bytes in length. Packets outside that interval were almost never replayed, and even within that interval, certain lengths are more likely to be replayed than others. You'll notice the replay line has a sort of chunky stair-step pattern, and there's actually some structure to that: between lengths of about 160 and 384, packets were more likely to be replayed if they had a length whose remainder was 9 when divided by 16. And in the interval of about 264 to 700, they were more likely to be replayed if they had a length whose remainder was 2 when divided by 16. In the area where those two intervals overlap, there was a mix of remainders 2 and 9. The authors don't have an explanation for this phenomenon; it's just an intriguing feature of the packet length distribution.
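Here is a minimal sketch of that kind of trigger measurement, run from a vantage point behind the firewall against a server you control; the address and port in the usage comment are hypothetical examples, and this is not the authors' actual measurement code.

    import os
    import socket

    def send_trigger(host: str, port: int, length: int, high_entropy: bool) -> None:
        """Open a TCP connection and send one data payload of the given length.

        high_entropy=True sends uniformly random bytes (like Shadowsocks);
        high_entropy=False sends a low-entropy payload of repeated bytes.
        """
        payload = os.urandom(length) if high_entropy else b"A" * length
        with socket.create_connection((host, port), timeout=10) as s:
            s.sendall(payload)
            # Linger briefly, as a real client waiting for a reply would,
            # before closing. Afterwards, watch the server's logs for probes.
            s.settimeout(10)
            try:
                s.recv(1)
            except socket.timeout:
                pass

    # Example sweep, as in the experiment described above:
    # for length in range(1, 1001):
    #     send_trigger("203.0.113.10", 8388, length, high_entropy=True)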
Taking active probing of Shadowsocks as a given, what can be done to mitigate it? Well, because we know that the detection process is a two-step process, it is sufficient to disrupt either of those two steps. So you can either evade the passive traffic analysis, or you can evade the active probing component.

Evading the passive traffic analysis means changing the features that the Great Firewall is looking for: entropy and packet lengths. Changing entropy in Shadowsocks is not easy without fundamentally changing how the protocol works; but with packet lengths, you have a little bit of leeway. For example, newer versions of Outline will coalesce consecutive packets: something that would have been sent as two packets may be sent as one packet instead, as a way of disguising the characteristic packet length distribution that the Firewall may be looking for.

Another interesting observation involves a tool called Brdgrd (Bridge Guard). This is software that you can install on a Shadowsocks server, and it causes clients to send smaller-than-usual packets in the early stages of their connection; it does this by rewriting the server's TCP window size. Although there are some drawbacks and caveats to using Brdgrd with Shadowsocks, it's clear in this experiment that while Brdgrd was active, the incidence of active probing was notably diminished, although not quite to zero.

The other thing you can do to avoid detection is change the way that you respond to active probes. So, I showed you this table earlier, and it was a little bit of a lie, because that table described the behavior of some older versions of Shadowsocks. Some newer versions of Shadowsocks, partially as a result of this research, try to disguise the distinction between timing out a connection and terminating the connection. So the reactions in newer versions of Shadowsocks look more like this. Now, I don't want to get into the details, but AEAD is the newer, currently recommended version of the Shadowsocks protocol. And you can see that in this version, in these two implementations at least, the server always times out, no matter the length of the unauthenticated probe.
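One common way to get that "always time out" behavior, sketched below, is for the server to stop responding rather than close when authentication fails, draining anything else the client sends until a period of inactivity expires. This is only an illustration of the general idea, not the actual code of Shadowsocks-libev or Outline.

    import socket

    def drain_and_timeout(conn: socket.socket, timeout: float = 30.0) -> None:
        """On failed authentication, behave as if still waiting for more data.

        Instead of closing immediately (which a prober can distinguish from a
        timeout), keep reading and discarding bytes, so that a failed
        authentication looks to the prober just like an incomplete connection,
        and close only after a period of inactivity.
        """
        conn.settimeout(timeout)
        try:
            while True:
                if not conn.recv(4096):   # peer closed first
                    break
        except socket.timeout:
            pass
        finally:
            conn.close()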
In the older, deprecated stream version of the protocol, for compatibility reasons, it's not possible to completely eliminate that distinction, but the implementations have narrowed it as far as possible.

In summary, the Great Firewall of China detects Shadowsocks servers using a combination of passive traffic analysis and active probing. Probing is triggered by the first packet in a data connection, and it's more likely when packets have high entropy or have certain payload lengths. There are many different types of active probe: some are replays, some are not. Probes come from many IP addresses, but they show signs of being centrally managed. And it's possible to mitigate the effects of active probing of Shadowsocks by disrupting either of the two steps in the classification process.

Thank you for your attention. If you have questions or comments, it's best to get in touch with the authors directly. Source code and data for this research are available at the URL you see.