一、SQL Server资源管理概述
SQL Server on Linux的资源管理是确保数据库性能稳定的关键。与Windows版SQL Server不同,Linux版本的资源管理有其独特之处,需要管理员深入了解CPU、内存、磁盘I/O和网络等系统资源的分配与监控。
SQL Server资源组件:
– 内存管理器:统一缓冲池,支持动态内存配置
– 调度器:支持多处理器调度,优化并发处理
– I/O管理器:异步I/O,优化磁盘访问
– 网络接口:支持高带宽网络传输
资源管理的重要性:
– 防止单一查询占用过多资源
– 确保多用户并发访问的公平性
– 优化整体系统性能
– 避免资源耗尽导致服务中断
二、系统资源要求
2.1 硬件要求
| 组件 | 最低要求 | 推荐配置 | 说明 |
|---|---|---|---|
| CPU | 2核心 | 8+核心 | 支持SMP和NUMA架构 |
| 内存 | 2GB | 16GB+ | SQL Server建议预留充足内存 |
| 磁盘空间 | 6GB | 100GB+ | 数据文件和日志文件 |
| 磁盘I/O | 500 IOPS | 5000+ IOPS | SSD推荐 |
2.2 内存要求详解
最低内存要求说明:
最小:2GB(仅SQL Server引擎)
推荐:4GB+(生产环境基本要求)
理想:8GB+(流畅运行)
最佳:16GB+(高性能应用)
内存分配建议:
– 预留2-4GB给操作系统
– SQL Server最大服务器内存设置:总内存 – 4GB
– 考虑其他服务(如SSMS、Agent)的内存需求
2.3 CPU要求详解
CPU架构支持:
– x64架构(推荐)
– ARM64架构(SQL Server 2019+支持)
CPU配置建议:
– 生产环境:8核心以上
– 启用SQL Server的CPU亲和性
– 考虑NUMA架构优化
三、内存管理与配置
3.1 查看内存使用
# 查看系统内存概况
free -h
# 查看SQL Server进程内存使用
ps aux | grep mssql
# 查看详细内存信息
cat /proc/meminfo
# 查看SQL Server内存占用(持续监控)
watch -n 5 'ps aux | grep mssql | grep -v grep'
SQL Server内存使用示例:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mssql 12345 45.2 62.1 16384 8000 ? Ssl 02:00 245:30 /opt/mssql/bin/sqlservr
3.2 配置最大服务器内存
# 查看当前配置
sudo /opt/mssql/bin/mssql-conf get memory.memorylimitmb
# 设置最大内存(例如32GB = 32768MB)
sudo /opt/mssql/bin/mssql-conf set memory.memorylimitmb 32768
# 设置最小内存(例如16GB = 16384MB)
sudo /opt/mssql/bin/mssql-conf set memory.memorylimitmb 16384
# 重启SQL Server生效
sudo systemctl restart mssql-server
3.3 使用T-SQL查看内存配置
-- 查看当前内存配置
SELECT
name,
value,
value_in_use,
description
FROM sys.configurations
WHERE name IN ('max server memory (MB)', 'min server memory (MB)');
-- 查看SQL Server内存使用详情
SELECT
(SELECT COUNT(*) * 8 FROM sys.dm_exec_requests) AS allocated_memory_kb,
(SELECT COUNT(*) * 8 FROM sys.dm_os_memory_clerks) AS total_memory_kb,
(SELECT physical_memory_kb FROM sys.dm_os_sys_info) AS total_physical_memory_kb;
3.4 内存压力诊断
-- 查看内存 clerk 信息
SELECT
type,
name,
pages_kb / 1024 AS pages_mb,
virtual_memory_committed_kb / 1024 AS virtual_mb
FROM sys.dm_os_memory_clerks
ORDER BY pages_kb DESC;
-- 查看_buffer_pool信息
SELECT
COUNT(*) * 8 / 1024 AS buffer_pool_size_mb,
SUM(free_pages * 8) / 1024 AS free_pages_mb
FROM sys.dm_os_memory_cache_entries;
-- 查看内存压力信号
SELECT
signal_wait_time_ms,
page_life_expectancy,
batch_requests_per_sec,
buffer_cache_hit_ratio
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR';
四、CPU管理与配置
4.1 查看CPU使用
# 查看CPU使用率(总体)
top
htop
# 查看SQL Server CPU使用
top -p $(pgrep -f sqlservr)
# 查看CPU核心数
nproc
lscpu | grep -E "^CPU\(s\)|Core|Socket"
# 查看每个核心的负载
mpstat -P ALL 1 5
# CPU使用详情
cat /proc/cpuinfo
4.2 配置CPU亲和性
# 查看当前CPU亲和性配置
sudo /opt/mssql/bin/mssql-conf get processaffinity
# 设置CPU亲和性(使用前4个核心)
sudo /opt/mssql/bin/mssql-conf set processaffinity "0-3"
# 禁用CPU亲和性(让系统自动调度)
sudo /opt/mssql/bin/mssql-conf set processaffinity 0
# 重启生效
sudo systemctl restart mssql-server
4.3 使用T-SQL查看CPU使用
-- 查看当前CPU使用情况
SELECT
scheduler_id,
cpu_id,
is_online,
is_idle,
current_tasks_count,
runnable_tasks_count,
active_workers_count,
pending_disk_io_count
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';
-- 查看最近执行的查询CPU使用
SELECT TOP 20
query_text,
total_worker_time / 1000 AS cpu_time_ms,
execution_count,
total_worker_time / execution_count AS avg_cpu_time
FROM (
SELECT
SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
((CASE statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset END - qs.statement_start_offset)/2) + 1) AS query_text,
qs.total_worker_time,
qs.execution_count,
qs.statement_start_offset,
qs.statement_end_offset
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
) AS q
ORDER BY total_worker_time DESC;
4.4 高CPU使用诊断
-- 查看正在运行的查询CPU占用
SELECT
r.session_id,
r.status,
r.cpu_time,
r.total_elapsed_time,
t.text AS query_text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.cpu_time > 10000
ORDER BY r.cpu_time DESC;
-- 查看CPU密集型查询
SELECT
SUBSTRING(t.text, (r.statement_start_offset/2) + 1,
((CASE statement_end_offset
WHEN -1 THEN DATALENGTH(t.text)
ELSE r.statement_end_offset END - r.statement_start_offset)/2) + 1) AS query_text,
r.cpu_time,
r.total_elapsed_time,
r.execution_count
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.status = 'running'
ORDER BY r.cpu_time DESC;
五、磁盘I/O监控
5.1 查看磁盘使用
# 查看磁盘空间
df -h
# 查看SQL Server数据目录
ls -lh /var/opt/mssql/data/
# 查看磁盘I/O统计
iostat -x 1 5
# 查看磁盘延迟
iostat -d -x 1 2 | awk 'NR>3 {print $1, $4, $7, $8}'
# 查看文件I/O延迟
iotop -b -n 3 | grep mssql
5.2 SQL Server I/O统计
-- 查看数据库文件I/O统计
SELECT
DB_NAME(database_id) AS DatabaseName,
file_id,
num_of_reads,
num_of_writes,
num_of_bytes_read / 1024 / 1024 AS mb_read,
num_of_bytes_written / 1024 / 1024 AS mb_written,
io_stall_read_ms / NULLIF(num_of_reads, 0) AS avg_read_latency_ms,
io_stall_write_ms / NULLIF(num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
WHERE database_id > 4
ORDER BY io_stall DESC;
-- 查看所有文件的I/O等待
SELECT
name AS file_name,
physical_name,
type_desc,
state_desc,
is_read_only,
size * 8 / 1024 AS size_mb,
growth * 8 / 1024 AS growth_mb
FROM sys.master_files
WHERE database_id > 4;
5.3 I/O瓶颈诊断
-- 查看I/O等待统计
SELECT
wait_type,
waiting_tasks_count,
wait_time_ms,
signal_wait_time_ms,
wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN (
'PAGEIOLATCH_SH', 'PAGEIOLATCH_EX', 'PAGEIOLATCH_UP',
'IO_COMPLETION', 'ASYNC_IO_COMPLETION', 'LOGMGR_FLUSH'
)
ORDER BY wait_time_ms DESC;
-- 查看Paginated读取
SELECT
database_id,
file_id,
num_of_reads,
num_of_writes,
io_stall_read_ms,
io_stall_write_ms,
CAST(num_of_bytes_read / 1048576.0 AS DECIMAL(10,2)) AS mb_read,
CAST(num_of_bytes_written / 1048576.0 AS DECIMAL(10,2)) AS mb_written
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
ORDER BY io_stall_read_ms DESC;
六、网络资源监控
6.1 查看网络使用
# 查看网络连接
netstat -an | grep 1433
# 查看SQL Server网络连接状态
ss -tlnp | grep 1433
# 查看网络流量
nload
nethogs
# 查看SQL Server端口
cat /var/opt/mssql/log/errorlog | grep "listening"
6.2 SQL Server网络配置
# 查看TCP端口配置
sudo /opt/mssql/bin/mssql-conf get network.tcpport
# 修改TCP端口(例如改为14333)
sudo /opt/mssql/bin/mssql-conf set network.tcpport 14333
# 查看网络加密配置
sudo /opt/mssql/bin/mssql-conf get network.forceencryption
# 启用强制加密
sudo /opt/mssql/bin/mssql-conf set network.forceencryption 1
# 重启生效
sudo systemctl restart mssql-server
6.3 网络性能监控
-- 查看当前连接
SELECT
session_id,
login_time,
host_name,
program_name,
client_interface_name,
client_net_address,
auth_scheme,
encrypt_option
FROM sys.dm_exec_connections;
-- 查看连接数统计
SELECT
host_name,
program_name,
COUNT(*) AS connection_count,
MIN(login_time) AS first_connect,
MAX(login_time) AS last_connect
FROM sys.dm_exec_sessions
WHERE is_user_process = 1
GROUP BY host_name, program_name
ORDER BY connection_count DESC;
-- 查看网络I/O
SELECT
session_id,
client_net_address,
num_reads,
num_writes,
bytes_sent / 1024 AS kb_sent,
bytes_received / 1024 AS kb_received
FROM sys.dm_exec_connections;
七、资源监控脚本
7.1 综合监控脚本
#!/bin/bash
# monitor_sqlserver_resources.sh - SQL Server资源综合监控脚本
OUTPUT_FILE="/tmp/mssql_resource_monitor_$(date +%Y%m%d_%H%M%S).txt"
SA_PASSWORD="YourStrong@Password"
echo "=== SQL Server资源监控报告 ===" > $OUTPUT_FILE
echo "监控时间:$(date)" >> $OUTPUT_FILE
echo "" >> $OUTPUT_FILE
# 1. 系统资源
echo "【系统资源】" >> $OUTPUT_FILE
echo "CPU核心数:$(nproc)" >> $OUTPUT_FILE
echo "总内存:$(free -h | grep Mem | awk '{print $2}')" >> $OUTPUT_FILE
echo "磁盘空间:" >> $OUTPUT_FILE
df -h /var/opt/mssql >> $OUTPUT_FILE
echo "" >> $OUTPUT_FILE
# 2. SQL Server进程
echo "【SQL Server进程】" >> $OUTPUT_FILE
ps aux | grep mssql | grep -v grep >> $OUTPUT_FILE 2>&1
echo "" >> $OUTPUT_FILE
# 3. 内存使用
echo "【内存使用】" >> $OUTPUT_FILE
sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C -Q "
SELECT
name AS config_name,
value AS current_value,
value_in_use AS running_value,
description
FROM sys.configurations
WHERE name IN ('max server memory (MB)', 'min server memory (MB)');
" >> $OUTPUT_FILE 2>&1
echo "" >> $OUTPUT_FILE
# 4. 连接数
echo "【当前连接数】" >> $OUTPUT_FILE
sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C -Q "
SELECT
COUNT(*) AS total_connections,
SUM(CASE WHEN status = 'running' THEN 1 ELSE 0 END) AS active_connections,
SUM(CASE WHEN is_user_process = 1 THEN 1 ELSE 0 END) AS user_connections
FROM sys.dm_exec_sessions;
" >> $OUTPUT_FILE 2>&1
echo "" >> $OUTPUT_FILE
# 5. 性能指标
echo "【性能指标】" >> $OUTPUT_FILE
sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C -Q "
SELECT
AVG(batch_requests_per_sec) AS batch_requests,
AVG(buffer_cache_hit_ratio) AS cache_hit_ratio,
AVG(page_life_expectancy) AS page_life_expectancy,
AVG(workfiles_created_per_sec) AS workfiles_created
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Batch Requests/sec', 'Buffer cache hit ratio', 'Page life expectancy', 'Workfiles Created/sec');
" >> $OUTPUT_FILE 2>&1
echo ""
echo "报告已保存到:$OUTPUT_FILE"
cat $OUTPUT_FILE
7.2 资源告警脚本
#!/bin/bash
# resource_alert.sh - SQL Server资源告警脚本
ALERT_EMAIL="admin@example.com"
SA_PASSWORD="YourStrong@Password"
CPU_THRESHOLD=80
MEM_THRESHOLD=90
CONN_THRESHOLD=80
log_alert() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a /tmp/mssql_alerts.log
}
# CPU检查
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
if (( $(echo "$CPU_USAGE > $CPU_THRESHOLD" | bc -l) )); then
log_alert "警告:CPU使用率超过${CPU_THRESHOLD}%,当前:${CPU_USAGE}%"
fi
# 内存检查
MEM_USAGE=$(free | grep Mem | awk '{print int($3/$2 * 100)}')
if [ $MEM_USAGE -gt $MEM_THRESHOLD ]; then
log_alert "警告:内存使用率超过${MEM_THRESHOLD}%,当前:${MEM_USAGE}%"
fi
# 连接数检查
MAX_CONN=$(sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C -h -1 -Q "SELECT value FROM sys.configurations WHERE name = 'max server connections'" 2>/dev/null | tr -d ' ')
CUR_CONN=$(sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C -h -1 -Q "SELECT COUNT(*) FROM sys.dm_exec_sessions WHERE is_user_process = 1" 2>/dev/null | tr -d ' ')
if [ ! -z "$MAX_CONN" ] && [ ! -z "$CUR_CONN" ]; then
CONN_PCT=$((CUR_CONN * 100 / MAX_CONN))
if [ $CONN_PCT -gt $CONN_THRESHOLD ]; then
log_alert "警告:数据库连接数超过${CONN_THRESHOLD}%,当前:${CONN_PCT}%(${CUR_CONN}/${MAX_CONN})"
fi
fi
八、性能优化建议
8.1 内存优化
-- 优化内存配置建议
-- 1. 设置合理的最大服务器内存
EXEC sp_configure 'max server memory (MB)', 8192;
RECONFIGURE;
-- 2. 启用锁页内存(需root权限)
sudo bash -c 'echo "LockPagesInMemory" >> /etc/security/limits.d/mssql.conf'
sudo bash -c 'echo "* - memlock unlimited" >> /etc/security/limits.d/mssql.conf'
-- 3. 监控内存压力
SELECT * FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR';
8.2 CPU优化
-- 启用SQL Server的"max degree of parallelism"
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
-- 配置"cost threshold for parallelism"
EXEC sp_configure 'cost threshold for parallelism', 25;
RECONFIGURE;
-- 查看并行查询
SELECT
query,
plan_handle,
query_plan
FROM sys.dm_exec_query_stats qs
WHERE EXISTS (
SELECT 1 FROM sys.dm_exec_query_plan(qs.plan_handle) p
WHERE p.query_plan.value('//@Parallel', 'int') = 1
);
8.3 I/O优化
-- 查看延迟高的数据文件
SELECT
DB_NAME(mf.database_id) AS database_name,
mf.name AS logical_name,
mf.type_desc,
io_stalls_read_ms / NULLIF(num_of_reads, 0) AS avg_read_latency,
io_stalls_write_ms / NULLIF(num_of_writes, 0) AS avg_write_latency
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN sys.master_files mf ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
WHERE vfs.database_id > 4
ORDER BY io_stalls_read_ms + io_stalls_write_ms DESC;
-- 创建聚集索引优化I/O
CREATE CLUSTERED INDEX IX_TableName_PrimaryKey
ON TableName(PrimaryKeyColumn)
WITH (DROP_EXISTING = ON);
九、资源限制配置
9.1 配置资源 governor
-- 创建资源池
CREATE RESOURCE POOL Pool_Production
WITH (
min_cpu_percent = 20,
max_cpu_percent = 80,
min_memory_percent = 30,
max_memory_percent = 70
);
CREATE RESOURCE POOL Pool_Development
WITH (
min_cpu_percent = 10,
max_cpu_percent = 50,
min_memory_percent = 10,
max_memory_percent = 30
);
-- 创建工作负载组
CREATE WORKLOAD GROUP Group_Production
USING Pool_Production;
CREATE WORKLOAD GROUP Group_Development
USING Pool_Development;
-- 启用Resource Governor
ALTER RESOURCE GOVERNOR RECONFIGURE;
9.2 查询限制
-- 限制查询执行时间(秒)
EXEC sp_configure 'query wait (s)', 30;
RECONFIGURE;
-- 限制最大查询内存(KB)
EXEC sp_configure 'max server memory (MB)', 8192;
RECONFIGURE;
-- 限制工作线程数
EXEC sp_configure 'max worker threads', 512;
RECONFIGURE;
十、常见问题与解决方案
Q1:SQL Server内存占用过高?
原因分析:
– 未设置最大内存限制
– 存在内存泄漏
– 配置了过大的缓冲池
解决方案:
# 设置最大内存(例如32GB)
sudo /opt/mssql/bin/mssql-conf set memory.memorylimitmb 32768
sudo systemctl restart mssql-server
# 检查内存泄漏
sqlcmd -S localhost -U sa -P "YourPassword" -Q "
SELECT type, name, pages_kb / 1024 AS mb
FROM sys.dm_os_memory_clerks
ORDER BY pages_kb DESC;
"
Q2:CPU使用率持续100%?
原因分析:
– 存在CPU密集型查询
– 统计信息过时导致错误执行计划
– 缺少索引导致全表扫描
解决方案:
-- 找出CPU密集型查询
SELECT TOP 10
qs.total_worker_time / 1000 AS cpu_time_ms,
qs.execution_count,
qp.query_plan,
SUBSTRING(st.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_worker_time DESC;
Q3:磁盘I/O成为瓶颈?
解决方案:
1. 将数据和日志文件分散到不同磁盘
2. 使用SSD提升I/O性能
3. 启用数据文件初始化
4. 调整条带大小匹配I/O模式
# 查看当前I/O统计
sqlcmd -S localhost -U sa -P "YourPassword" -Q "
SELECT * FROM sys.dm_io_virtual_file_stats(NULL, NULL);
"
十一、资源监控最佳实践
11.1 监控频率建议
| 监控项 | 频率 | 告警阈值 |
|---|---|---|
| CPU | 每1分钟 | >80% |
| 内存 | 每1分钟 | >90% |
| 磁盘I/O | 每5分钟 | >50ms |
| 连接数 | 每5分钟 | >80% |
| 查询性能 | 每15分钟 | >5秒 |
11.2 建立监控基线
-- 创建性能基线表
CREATE TABLE PerformanceBaseline (
ID INT IDENTITY PRIMARY KEY,
CollectionTime DATETIME DEFAULT GETDATE(),
CPUUsage DECIMAL(5,2),
MemoryUsageMB DECIMAL(10,2),
ActiveConnections INT,
BatchRequestsPerSec DECIMAL(10,2),
BufferCacheHitRatio DECIMAL(5,2),
PageLifeExpectancy INT
);
-- 创建定期收集作业
CREATE PROCEDURE CollectBaselineMetrics
AS
BEGIN
INSERT INTO PerformanceBaseline (CPUUsage, MemoryUsageMB, ActiveConnections, BatchRequestsPerSec, BufferCacheHitRatio, PageLifeExpectancy)
VALUES (
(SELECT cpu_busy FROM sys.dm_exec_sessions WHERE session_id = @@SPID),
(SELECT working_set_kb / 1024 FROM sys.dm_os_process_memory),
(SELECT COUNT(*) FROM sys.dm_exec_sessions WHERE is_user_process = 1),
(SELECT AVG(batch_requests_per_sec) FROM sys.dm_os_performance_counters WHERE counter_name = 'Batch Requests/sec'),
(SELECT AVG(buffer_cache_hit_ratio) FROM sys.dm_os_performance_counters WHERE counter_name = 'Buffer cache hit ratio'),
(SELECT AVG(page_life_expectancy) FROM sys.dm_os_performance_counters WHERE counter_name = 'Page life expectancy')
);
END;
11.3 自动化监控方案
- 使用Zabbix/Prometheus:系统级监控
- 使用SQL Server监控工具:数据库级监控
- 使用Grafana:可视化展示
- 设置邮件/短信告警:及时通知
总结
CentOS上SQL Server资源监控与管理的关键要点:
- 内存管理:设置最大/最小服务器内存,监控缓冲池使用
- CPU管理:配置亲和性,识别CPU密集型查询
- I/O监控:监控读写延迟,优化数据文件分布
- 网络监控:跟踪连接数,监控网络I/O
- 自动化:脚本监控、告警设置、定期基线收集
- 优化:根据监控数据调整配置,解决性能瓶颈
掌握这些资源监控技巧,可以确保SQL Server在CentOS环境下稳定高效运行。
本文基于SQL Server 2022和CentOS Stream 9编写,适用于CentOS 7+/RHEL 7+/CentOS Stream环境。