Go's database/sql package provides automatic database connection pooling to help developers manage connections efficiently. The usual pattern is to request a connection from the pool, run a query, and then close the result set so the connection is returned to the pool.
A common mistake is to hold a database connection for longer than necessary, creating a performance bottleneck: new requests must wait for a free connection, and the pool's effectiveness suffers.
This article looks at how to avoid that problem. By identifying the common trouble spots and learning how to address them, you can optimize a Go application for higher throughput.
A Basic Example
Consider a basic HTTP handler that returns employee records:
```go
func GetEmployeesHandler(w http.ResponseWriter, r *http.Request) {
	rows, err := db.Query(`SELECT id, name, email FROM employee`)
	if err != nil {
		http.Error(w, fmt.Sprintf("error querying database: %v", err), http.StatusInternalServerError)
		return
	}
	defer rows.Close()

	var employees []Employee
	for rows.Next() {
		var e Employee
		if err := rows.Scan(&e.ID, &e.Name, &e.Email); err != nil {
			http.Error(w, fmt.Sprintf("error scanning row: %v", err), http.StatusInternalServerError)
			return
		}
		decorateEmployee(&e)
		employees = append(employees, e)
	}
	if err = rows.Err(); err != nil {
		http.Error(w, fmt.Sprintf("error during row iteration: %v", err), http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(employees); err != nil {
		http.Error(w, "error encoding response", http.StatusInternalServerError)
		return
	}
}
```
In this handler, we:
- Query the database for employee records.
- Use defer rows.Close() to ensure the connection is released once the result set has been processed.
- Scan each row and decorate it with data fetched from an external source.
- Append each record to a slice.
- Check for any error during iteration and return the results as JSON.
At first glance, nothing looks out of place, and you might expect it to perform well under a stress test.
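The article never shows decorateEmployee. A plausible minimal sketch, assuming it enriches each record with data fetched from a slow external service (the Title field and the 50 ms sleep standing in for a network call are both assumptions made for illustration):

```go
package main

import (
	"fmt"
	"time"
)

// Employee mirrors the columns scanned from the employee table, plus a
// hypothetical Title field that decoration fills in.
type Employee struct {
	ID    int
	Name  string
	Email string
	Title string
}

// decorateEmployee is a stand-in for the real function: it simulates a
// slow external lookup (e.g. an HTTP call to another service) and
// attaches the result to the record.
func decorateEmployee(e *Employee) {
	time.Sleep(50 * time.Millisecond) // simulated network latency
	e.Title = "Engineer"
}

func main() {
	e := Employee{ID: 1, Name: "Ada", Email: "ada@example.com"}
	decorateEmployee(&e)
	fmt.Println(e.Title) // Engineer
}
```

Because each decoration call blocks for the duration of an external round trip, running it inside the rows.Next() loop keeps the database connection checked out the whole time.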
Initial Performance Results
Using a load-testing tool such as Vegeta[2], we can simulate traffic against this endpoint. At an initial rate of 10 requests per second (RPS), the application holds up reasonably well over a 30-second run:
$ echo "GET http://localhost:8080/employees" | vegeta attack -duration=30s -rate=10 | tee results.bin | vegeta report
Requests [total, rate, throughput] 300, 10.03, 5.45
Duration [total, attack, wait] 52.095s, 29.9s, 22.196s
Latencies [min, mean, 50, 90, 95, 99, max] 2.318s, 11.971s, 8.512s, 26.222s, 30.001s, 30.001s, 30.001s
Bytes In [total, mean] 2290991, 7636.64
Bytes Out [total, mean] 0, 0.00
Success [ratio] 94.67%
Status Codes [code:count] 0:16 200:284
However, when we increase the load to 50 RPS, throughput collapses and the failure rate climbs sharply:
$ echo "GET http://localhost:8080/employees" | vegeta attack -duration=30s -rate=50 | tee results.bin | vegeta report
Requests [total, rate, throughput] 1500, 50.03, 4.20
Duration [total, attack, wait] 59.981s, 29.981s, 30s
Latencies [min, mean, 50, 90, 95, 99, max] 2.208s, 27.175s, 30.001s, 30.001s, 30.001s, 30.002s, 30.002s
Bytes In [total, mean] 2032879, 1355.25
Bytes Out [total, mean] 0, 0.00
Success [ratio] 16.80%
Status Codes [code:count] 0:1248 200:252
(A status code of 0 above indicates a client-side timeout during the test run.)
Locating the Bottleneck
At 50 RPS the success rate plummets and throughput drops to just 4.2 requests per second. Why? One possibility is that 50 RPS is simply an unreasonable target given the available resources. To determine whether the code itself can be made faster, we can collect some metrics. One source would be the decoration step, but in this article we will focus on the database connection pool statistics.
Go's database/sql package exposes the pool's behavior through the Stats method, which returns a DBStats value containing the statistics we care about:
- InUse: the number of connections currently in use.
- Idle: the number of idle connections.
- WaitCount: the total number of connections waited for.
We can expose these values by adding another endpoint handler:
```go
func GetInfoHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(db.Stats()); err != nil {
		http.Error(w, "error encoding response", http.StatusInternalServerError)
		return
	}
}
```
Re-run the stress test above while monitoring the /info endpoint:
$ while true; do curl -s http://localhost:8080/info; sleep 2; done
...
{"MaxOpenConnections":15,"OpenConnections":15,"InUse":15,"Idle":0,"WaitCount":1434,"WaitDuration":1389188829869,"MaxIdleClosed":0,"MaxIdleTimeClosed":0,"MaxLifetimeClosed":0}
{"MaxOpenConnections":15,"OpenConnections":15,"InUse":15,"Idle":0,"WaitCount":1485,"WaitDuration":1582086078604,"MaxIdleClosed":0,"MaxIdleTimeClosed":0,"MaxLifetimeClosed":0}
{"MaxOpenConnections":15,"OpenConnections":15,"InUse":15,"Idle":0,"WaitCount":1485,"WaitDuration":1772844971842,"MaxIdleClosed":0,"MaxIdleTimeClosed":0,"MaxLifetimeClosed":0}
...
The output shows that the pool is maxed out (InUse: 15, Idle: 0) and every new request is forced to wait (WaitCount keeps climbing). In other words, the pool has essentially stalled, which explains the latency and timeouts observed earlier!
Optimizing Connection Usage
Looking back at the original code, the problem is either that the query itself is slow, or that each call to the decoration function inside the result-set loop takes a long time to return. To find out, we can move the decoration out of the rows.Next() loop so that it runs after the database connection has been released.
Here is the updated code:
```go
func GetEmployeesHandler(w http.ResponseWriter, r *http.Request) {
	rows, err := db.Query(`SELECT id, name, email FROM employee`)
	if err != nil {
		http.Error(w, fmt.Sprintf("error querying database: %v", err), http.StatusInternalServerError)
		return
	}
	// Deferring Close means the early error returns below cannot leak
	// the connection; Close is idempotent, so the explicit call later
	// is still safe.
	defer rows.Close()

	var employees []Employee
	for rows.Next() {
		var e Employee
		if err := rows.Scan(&e.ID, &e.Name, &e.Email); err != nil {
			http.Error(w, fmt.Sprintf("error scanning row: %v", err), http.StatusInternalServerError)
			return
		}
		employees = append(employees, e)
	}
	if err = rows.Err(); err != nil {
		http.Error(w, fmt.Sprintf("error during row iteration: %v", err), http.StatusInternalServerError)
		return
	}
	// Release the connection back to the pool before the slow
	// decoration step instead of holding it open.
	rows.Close()

	for i := range employees {
		decorateEmployee(&employees[i])
	}

	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(employees); err != nil {
		http.Error(w, "error encoding response", http.StatusInternalServerError)
		return
	}
}
```
In this refactored handler, we:
- Scan all rows into memory.
- Close the result set as soon as scanning is done, releasing the connection back to the pool.
- Decorate the employee records in memory, without holding a connection open.
Performance After Optimization
Running the same Vegeta test at 50 RPS after the optimization yields:
$ echo "GET http://localhost:8080/employees" | vegeta attack -duration=30s -rate=50 | tee results.bin | vegeta report
Requests [total, rate, throughput] 1500, 50.03, 45.78
Duration [total, attack, wait] 32.768s, 29.98s, 2.788s
Latencies [min, mean, 50, 90, 95, 99, max] 2.045s, 2.502s, 2.499s, 2.692s, 2.741s, 2.856s, 2.995s
Bytes In [total, mean] 11817000, 7878.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:1500
...
{"MaxOpenConnections":15,"OpenConnections":1,"InUse":0,"Idle":1,"WaitCount":0,"WaitDuration":0,"MaxIdleClosed":0,"MaxIdleTimeClosed":0,"MaxLifetimeClosed":0}
{"MaxOpenConnections":15,"OpenConnections":1,"InUse":0,"Idle":1,"WaitCount":0,"WaitDuration":0,"MaxIdleClosed":0,"MaxIdleTimeClosed":0,"MaxLifetimeClosed":0}
{"MaxOpenConnections":15,"OpenConnections":1,"InUse":0,"Idle":1,"WaitCount":0,"WaitDuration":0,"MaxIdleClosed":0,"MaxIdleTimeClosed":0,"MaxLifetimeClosed":0}
...
Not only have throughput and latency improved dramatically, with the success rate reaching 100%, but OpenConnections never exceeds 1, there is even an idle connection on standby, and WaitCount stays at zero!
Summary
We optimized connection handling by fetching all rows into memory and closing the result set immediately, rather than holding the connection open while performing other I/O-bound work such as decorating the records. This returns database connections to the pool as quickly as possible, freeing resources for other incoming requests and improving throughput and concurrency.
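The effect can be demonstrated without a database at all. In this toy sketch (every detail is illustrative), a buffered channel stands in for a pool of 2 connections and slowWork stands in for decoration; holding a "connection" during the slow work serializes the requests, while releasing it first lets the slow work overlap:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const poolSize = 2

// newPool returns a buffered channel standing in for a connection pool,
// pre-filled with one token per "connection".
func newPool() chan struct{} {
	pool := make(chan struct{}, poolSize)
	for i := 0; i < poolSize; i++ {
		pool <- struct{}{}
	}
	return pool
}

func slowWork() { time.Sleep(20 * time.Millisecond) } // stands in for decoration

// holdDuring mimics the original handler: the "connection" is held
// while the slow work runs.
func holdDuring(pool chan struct{}) {
	<-pool
	slowWork()
	pool <- struct{}{}
}

// releaseFirst mimics the refactored handler: the "connection" is
// returned before the slow work starts.
func releaseFirst(pool chan struct{}) {
	<-pool
	pool <- struct{}{}
	slowWork()
}

// run launches 8 concurrent "requests" against a fresh pool and times them.
func run(f func(chan struct{})) time.Duration {
	pool := newPool()
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			f(pool)
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	bad, good := run(holdDuring), run(releaseFirst)
	fmt.Println(good < bad) // releasing first avoids queuing on the pool
}
```

With 8 requests and 2 tokens, the hold-during variant queues four deep on the pool (roughly 80 ms total here), while the release-first variant finishes in about one slowWork duration, mirroring the Vegeta results above.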