Implementing an HTTP Server That Supports the CONNECT Method in Under 100 Lines of Code

桃子 (陪她去流浪) · December 7, 2019

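Among HTTP's many methods (often called verbs), one is rarely used and little known: CONNECT. The resource that follows it is usually not a resource path, as it is with GET/POST (e.g. GET /path/to/file). It is a host:port pair instead: a TCP address, one half of a connection's 4-tuple, identifying a third party. For example: CONNECT github.com:443.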
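On the wire, such a request is just a request line plus headers. The sketch below shows the bytes a client sends. It assumes HTTP/1.1 and Basic credentials, and connectRequest is a hypothetical helper name. The Host header repeats the target, as clients usually do.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// connectRequest builds the raw bytes of an HTTP/1.1 CONNECT request.
// target is the "host:port" pair; if user is non-empty, a Basic
// Proxy-Authorization header is added. Hypothetical helper for illustration.
func connectRequest(target, user, pass string) string {
	req := "CONNECT " + target + " HTTP/1.1\r\n" +
		"Host: " + target + "\r\n"
	if user != "" {
		cred := base64.StdEncoding.EncodeToString([]byte(user + ":" + pass))
		req += "Proxy-Authorization: Basic " + cred + "\r\n"
	}
	return req + "\r\n" // an empty line ends the header section
}

func main() {
	fmt.Print(connectRequest("github.com:443", "", ""))
}
```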

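If the HTTP server supports CONNECT, it connects to that address. One example is the proxy_connect module for tengine, Taobao's modified nginx. If the connection succeeds, the server returns a 200 status code to the client.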

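**From then on, all traffic is passed through untouched.** This effectively opens a tunnel between the client and the third party. If you log in to servers at work through an SSH jump host, that jump host plays the same tunnel-building role. Why go through a jump host instead of logging in directly? For security, the servers only accept connections from a small number of IPs. This also keeps the iptables rule set small and cheap to evaluate.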

Here I borrow the sequence diagram from Taobao's proxy_connect module, which shows very clearly how the data flows:

  curl                     nginx (proxy_connect)            github.com
    |                             |                          |
(1) |-- CONNECT github.com:443 -->|                          |
    |                             |                          |
    |                             |----[ TCP connection ]--->|
    |                             |                          |
(2) |<- HTTP/1.1 200           ---|                          |
    |   Connection Established    |                          |
    |                             |                          |
    |                                                        |
    ========= CONNECT tunnel has been established. ===========
    |                                                        |
    |                             |                          |
    |                             |                          |
    |   [ SSL stream       ]      |                          |
(3) |---[ GET / HTTP/1.1   ]----->|   [ SSL stream       ]   |
    |   [ Host: github.com ]      |---[ GET / HTTP/1.1   ]-->.
    |                             |   [ Host: github.com ]   |
    |                             |                          |
    |                             |                          |
    |                             |                          |
    |                             |   [ SSL stream       ]   |
    |   [ SSL stream       ]      |<--[ HTTP/1.1 200 OK  ]---'
(4) |<--[ HTTP/1.1 200 OK  ]------|   [ < html page >    ]   |
    |   [ < html page >    ]      |                          |
    |                             |                          |

Note:

The original diagram misspelled the past participle of "establish" as "establesied". It has been corrected here.

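I wanted to test whether this works and to learn network programming in Go, so I wrote a complete program of this kind. It is under 100 lines, including comments and blank lines.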

Code: https://gist.github.com/movsb/74e9a91b07e9f76e6c78224f8158f4ee

The code is very concise: just two functions.

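The main function starts an HTTP server and routes every request to the tunnel function. tunnel is a standard HTTP handler function with two parameters: a response writer and a request.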

Let's walk through its implementation step by step.

// We handle CONNECT method only
if req.Method != http.MethodConnect {
	log.Println(req.Method, req.RequestURI)
	http.NotFound(w, req)
	return
}

This part means we only handle the CONNECT method. Every other method gets a 404 Not Found.

// Proxy-Authorization is set by client software.
// Authorization is used by req.BasicAuth().
req.Header.Set("Authorization", req.Header.Get("Proxy-Authorization"))
user, pass, ok := req.BasicAuth()
if !ok || !(user == username && pass == password) {
	log.Println("bad credential.", "user:", user, "pass:", pass)
	// Don't let them know we support CONNECT.
	http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
	return
}

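This is the credential check. Client software puts the credentials in the Proxy-Authorization header, but req.BasicAuth reads the Authorization header. So we simply copy one into the other.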

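Then we extract the username and password and compare them with the preset values. If they don't match, we return an error. The code deliberately returns 405 Method Not Allowed rather than 407 Proxy Authentication Required, so unauthenticated clients cannot tell that CONNECT is supported.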

// The host:port pair.
log.Println(req.RequestURI)

This line logs the third-party address.
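The program logs req.RequestURI but does not validate it, so an authenticated client can open a tunnel to any host and port the server can reach. A hypothetical hardening step could check the target with net.SplitHostPort before dialing. This sketch assumes you only want to allow port 443.

```go
package main

import (
	"fmt"
	"net"
)

// allowedPorts is a hypothetical policy: only tunnel to HTTPS.
var allowedPorts = map[string]bool{"443": true}

// validateTarget checks that a CONNECT target is a well-formed host:port pair
// whose port is on the allow list.
func validateTarget(target string) error {
	host, port, err := net.SplitHostPort(target)
	if err != nil {
		return err
	}
	if host == "" {
		return fmt.Errorf("empty host in %q", target)
	}
	if !allowedPorts[port] {
		return fmt.Errorf("port %s is not allowed", port)
	}
	return nil
}

func main() {
	for _, t := range []string{"github.com:443", "github.com:22", "github.com"} {
		fmt.Printf("%-16s %v\n", t, validateTarget(t))
	}
}
```

In tunnel, such a check would go right before net.Dial, for example answering 403 Forbidden when validateTarget(req.RequestURI) returns an error.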

// Connect to Remote.
dst, err := net.Dial("tcp", req.RequestURI)
if err != nil {
	http.Error(w, err.Error(), http.StatusBadRequest)
	return
}
defer dst.Close()

Next we try to connect to the third party. If that fails, we return an error; otherwise we continue.

// On success, respond to the client with a 200 status code.
w.Write(connectResponse)

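At this point the connection to the third party has succeeded, so we tell the client with an ordinary 200 response. Strictly speaking, w.Write writes to the response body rather than putting raw bytes on the wire. The 200 status line comes from the WriteHeader(http.StatusOK) call that Write makes implicitly. What happens to the buffered body bytes once the connection is hijacked is up to net/http's internals.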

// Now, Hijack the writer to get the underlying net.Conn.
// It is either a *net.TCPConn (HTTP) or a *tls.Conn (HTTPS).
src, bio, err := w.(http.Hijacker).Hijack()
if err != nil {
	http.Error(w, err.Error(), http.StatusInternalServerError)
	return
}
defer src.Close()

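This step is crucial. What does "hijack" mean? HTTP is an application-layer protocol, and the layer below it is TCP, a transport-layer protocol. Hijack lets us take the underlying TCP connection away from the ResponseWriter. After that, the HTTP server no longer touches the connection, so closing it is our job (hence defer src.Close()).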

Hijack returns three values:
- src: the net.Conn, which is a TLS connection if the server uses HTTPS.
- bio: a buffered *bufio.ReadWriter wrapping src.
- an error.

wg := &sync.WaitGroup{}
wg.Add(2)

go func() {
	defer wg.Done()

	// The returned bufio.Reader may contain unprocessed buffered data from the client.
	// Copy them to dst so we can use src directly.
	if n := bio.Reader.Buffered(); n > 0 {
		n64, err := io.CopyN(dst, bio, int64(n))
		if n64 != int64(n) || err != nil {
			log.Println("io.CopyN:", n64, err)
			return
		}
	}

	// Relay: src -> dst
	io.Copy(dst, src)
}()

go func() {
	defer wg.Done()

	// Relay: dst -> src
	io.Copy(src, dst)
}()

wg.Wait()

This is the core of the program. It starts two goroutines, each calling io.Copy, to relay data in both directions at once (full duplex).

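The src-to-dst direction first handles the buffered reader. Because bio is buffered, it may already hold client data that was read from the connection but not yet consumed. We drain that data into dst first, and after that src can be used directly.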

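Using bio instead of src would also work, though it is probably somewhat less efficient. Also, if you write to bio, remember to call bio.Flush() at the right moments. Otherwise data can sit in the buffer and the connection will appear to hang.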

Run it as a server like this:

$ go run tunnel.go

# or

$ go build -o tunnel tunnel.go
$ ./tunnel

You can test it with cURL:

$ curl -p --proxy my_username:my_password@localhost:18080 http://www.example.com

When the target is HTTP rather than HTTPS, cURL sends the proxy a plain GET request. The -p option (short for --proxytunnel) makes it always use CONNECT.

If all goes well, the server prints the requested third-party address and cURL prints the page content.

Note: to protect your data against eavesdropping and injected ads, use HTTPS rather than HTTP on the server. That is, call ListenAndServeTLS instead of ListenAndServe.

For easy reference, here is the complete code:

package main

import (
	"io"
	"log"
	"net"
	"net/http"
	"sync"
)

var (
	listen          = "localhost:18080"
	connectResponse = []byte("HTTP/1.1 200 OK\r\n\r\n")
	username        = "my_username"
	password        = "my_password"
)

func tunnel(w http.ResponseWriter, req *http.Request) {
	// We handle CONNECT method only
	if req.Method != http.MethodConnect {
		log.Println(req.Method, req.RequestURI)
		http.NotFound(w, req)
		return
	}

	// Proxy-Authorization is set by client software.
	// Authorization is used by req.BasicAuth().
	req.Header.Set("Authorization", req.Header.Get("Proxy-Authorization"))
	user, pass, ok := req.BasicAuth()
	if !ok || !(user == username && pass == password) {
		log.Println("bad credential.", "user:", user, "pass:", pass)
		// Don't let them know we support CONNECT.
		http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
		return
	}

	// The host:port pair.
	log.Println(req.RequestURI)

	// Connect to Remote.
	dst, err := net.Dial("tcp", req.RequestURI)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	defer dst.Close()

	// On success, respond to the client with a 200 status code.
	w.Write(connectResponse)

	// Now, Hijack the writer to get the underlying net.Conn.
	// It is either a *net.TCPConn (HTTP) or a *tls.Conn (HTTPS).
	src, bio, err := w.(http.Hijacker).Hijack()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer src.Close()

	wg := &sync.WaitGroup{}
	wg.Add(2)

	go func() {
		defer wg.Done()

		// The returned bufio.Reader may contain unprocessed buffered data from the client.
		// Copy them to dst so we can use src directly.
		if n := bio.Reader.Buffered(); n > 0 {
			n64, err := io.CopyN(dst, bio, int64(n))
			if n64 != int64(n) || err != nil {
				log.Println("io.CopyN:", n64, err)
				return
			}
		}

		// Relay: src -> dst
		io.Copy(dst, src)
	}()

	go func() {
		defer wg.Done()

		// Relay: dst -> src
		io.Copy(src, dst)
	}()

	wg.Wait()
}

func main() {
	handler := http.HandlerFunc(tunnel)
	err := http.ListenAndServe(listen, handler)
	if err != http.ErrServerClosed {
		panic(err)
	}
}

Tags: HTTP · Go