2026年CentOS MongoDB性能监控完全指南：从内置工具到生产级方案(2026)

一、为什么需要监控MongoDB性能

1.1 性能问题的常见表现

表现	可能原因	影响
查询缓慢	缺少索引、全表扫描	用户体验差
连接数飙升	连接泄漏、连接池配置不当	服务不可用
内存不足	工作集超出内存、缓存命中率低	频繁换页
磁盘IO高	写入量大、缺少索引	整体性能下降
副本集延迟	网络瓶颈、从节点负载高	数据不一致

1.2 监控的核心指标

MongoDB性能监控需关注四大类指标：

## 核心监控指标

1. 操作计数器（OPC）
   - insert/query/update/delete 每秒操作数
   - getmore/command 每秒操作数

2. 连接指标
   - 当前连接数 / 可用连接数
   - 连接创建/销毁速率

3. 资源使用
   - 内存使用量（resident/virtual）
   - 磁盘IO读写速率
   - CPU使用率

4. 复制指标
   - 副本集延迟（replication lag）
   - oplog窗口大小

二、MongoDB内置监控工具

2.1 mongostat — 实时状态监控

mongostat是MongoDB自带的实时监控工具，类似Linux的vmstat：

# 基本用法（本地）
mongostat --host 127.0.0.1 --port 27017 -u admin -p --authenticationDatabase admin

# 指定刷新间隔（每5秒）
mongostat 5

# 监控副本集所有节点
mongostat --host rs0/192.168.1.100,192.168.1.101,192.168.1.102 --port 27017

# JSON格式输出（便于脚本处理）
mongostat --json

mongostat输出字段解读：

字段	说明	告警阈值
insert/query/update/delete	每秒操作数	持续>1000需关注
conn	当前连接数	>80%最大连接数
res	常驻内存(MB)	接近系统总内存
qrw	读写队列长度	>10需关注
net_in/net_out	网络吞吐	接近带宽上限
repl	副本集状态	PRIMARY/SECONDARY

2.2 mongotop — 集合级读写时间

mongotop显示每个集合的读写耗时，帮助定位热点集合：

# 基本用法
mongotop --host 127.0.0.1 --port 27017 -u admin -p --authenticationDatabase admin

# 指定刷新间隔（每10秒）
mongotop 10

# JSON格式输出
mongotop --json

输出示例与解读：

ns                   total    read    write
users                 45ms    30ms     15ms
orders               120ms    80ms     40ms
products              30ms    25ms      5ms

orders集合读写最频繁 → 优先优化索引
users集合读多写少 → 考虑缓存策略

2.3 db.serverStatus() — 服务器状态详情

// 连接MongoDB后执行
mongosh -u admin -p --authenticationDatabase admin

// 查看完整服务器状态
db.serverStatus()

// 查看关键指标
db.serverStatus().connections    // 连接信息
db.serverStatus().opcounters     // 操作计数器
db.serverStatus().mem            // 内存使用
db.serverStatus().network        // 网络统计
db.serverStatus().repl           // 复制状态
db.serverStatus().wiredTiger     // 存储引擎详情

关键指标查询脚本：

// 保存为 check_status.js
const status = db.serverStatus();

print("=== 连接信息 ===");
print("当前连接数: " + status.connections.current);
print("可用连接数: " + status.connections.available);

print("\n=== 操作计数器 ===");
print("每秒插入: " + status.opcounters.insert);
print("每秒查询: " + status.opcounters.query);
print("每秒更新: " + status.opcounters.update);
print("每秒删除: " + status.opcounters.delete);

print("\n=== 内存使用 ===");
print("常驻内存: " + status.mem.resident + " MB");
print("虚拟内存: " + status.mem.virtual + " MB");

print("\n=== WiredTiger缓存 ===");
const wt = status.wiredTiger.cache;
print("缓存命中率: " + ((wt.hits / (wt.hits + wt.misses)) * 100).toFixed(2) + "%");
print("脏页比例: " + (wt.dirty / wt.max * 100).toFixed(2) + "%");

2.4 db.currentOp() — 当前操作监控

// 查看所有正在执行的操作
db.currentOp()

// 查看运行超过3秒的操作
db.currentOp({
  "active": true,
  "secs_running": { $gt: 3 }
})

// 查看写操作
db.currentOp({
  "active": true,
  "op": { $in: ["insert", "update", "remove"] }
})

// 终止长时间运行的操作
db.killOp(opId)

三、系统级监控

3.1 CPU监控

# 查看MongoDB进程CPU使用
top -p $(pgrep mongod)

# 或使用htop
htop -p $(pgrep mongod)

# 查看CPU核心使用情况
mpstat -P ALL 1 5

# 查看系统负载
uptime

3.2 内存监控

# 查看内存使用
free -h

# 查看MongoDB进程内存
ps aux | grep mongod

# 查看详细内存映射
pmap -x $(pgrep mongod) | head -20

# 监控页面换入换出
vmstat 1 10
# 关注 si/so 列（swap in/swap out）

3.3 磁盘IO监控

# 查看磁盘IO
iostat -xz 1 10

# 关注以下指标：
# %util > 80%  → 磁盘成为瓶颈
# await > 10ms → IO延迟过高
# svctm > 5ms  → 磁盘服务时间过长

# 查看MongoDB数据目录IO
iotop -o -p $(pgrep mongod)

3.4 网络监控

# 查看网络连接
netstat -anp | grep mongod | wc -l

# 查看网络流量
sar -n DEV 1 10

# 查看TCP连接状态
ss -s

四、日志监控与分析

4.1 慢查询日志

# 启用慢查询日志（>100ms记录）
mongosh -u admin -p --authenticationDatabase admin

# 设置慢查询阈值
db.setProfilingLevel(1, { slowms: 100 })

# 查看慢查询
db.system.profile.find().sort({ ts: -1 }).limit(10)

# 分析慢查询
db.system.profile.aggregate([
  { $group: {
      _id: "$ns",
      count: { $sum: 1 },
      avgTime: { $avg: "$millis" },
      maxTime: { $max: "$millis" }
  }},
  { $sort: { avgTime: -1 } }
])

4.2 MongoDB日志分析

# 查看MongoDB日志
tail -f /var/log/mongodb/mongod.log

# 过滤慢操作
grep "command" /var/log/mongodb/mongod.log | grep "ms"

# 统计错误
grep -c "ERROR" /var/log/mongodb/mongod.log

# 查看连接事件
grep "connection accepted" /var/log/mongodb/mongod.log | tail -20

五、生产级监控方案

5.1 Prometheus + Grafana

# 1. 安装mongodb_exporter
wget https://github.com/percona/mongodb_exporter/releases/download/v0.40.0/mongodb_exporter-0.40.0.linux-amd64.tar.gz
tar xzf mongodb_exporter-0.40.0.linux-amd64.tar.gz
sudo mv mongodb_exporter /usr/local/bin/

# 2. 创建监控用户
mongosh -u admin -p
use admin
db.createUser({
  user: "monitor",
  pwd: "MonitorPass123!",
  roles: [{ role: "clusterMonitor", db: "admin" }]
})

# 3. 启动exporter
mongodb_exporter --mongodb.uri="mongodb://monitor:MonitorPass123!@127.0.0.1:27017" --web.listen-address=":9216"

# 4. 配置systemd服务
sudo cat > /etc/systemd/system/mongodb_exporter.service << 'EOF'
[Unit]
Description=MongoDB Exporter
After=network.target

[Service]
User=mongodb
ExecStart=/usr/local/bin/mongodb_exporter \
  --mongodb.uri="mongodb://monitor:MonitorPass123!@127.0.0.1:27017" \
  --web.listen-address=":9216"
Restart=always

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable mongodb_exporter
sudo systemctl start mongodb_exporter

# 5. 配置Prometheus
# 在prometheus.yml中添加：
# - job_name: 'mongodb'
#   static_configs:
#     - targets: ['localhost:9216']

5.2 关键Grafana仪表盘指标

面板	指标	PromQL	告警条件
OPS	每秒操作数	mongodb_op_counters_total	>5000/s
连接数	当前连接	mongodb_connections	>80% max
缓存命中率	WT缓存	mongodb_wt_cache_hits/total	<95%
复制延迟	repl lag	mongodb_replset_lag	>10s
队列长度	读写队列	mongodb_global_lock_current_queue	>10
页面故障	page fault	mongodb_extra_info_page_faults	持续增长

5.3 告警规则配置

# Prometheus告警规则
groups:
  - name: mongodb_alerts
    rules:
      - alert: MongoDBHighConnections
        expr: mongodb_connections{state="current"} / mongodb_connections{state="available"} > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB连接数超过80%"

      - alert: MongoDBReplicationLag
        expr: mongodb_replset_lag > 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "MongoDB副本集延迟超过10秒"

      - alert: MongoDBLowCacheHitRate
        expr: rate(mongodb_wt_cache_hits[5m]) / (rate(mongodb_wt_cache_hits[5m]) + rate(mongodb_wt_cache_misses[5m])) < 0.95
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB缓存命中率低于95%"

六、自动化监控脚本

6.1 一键健康检查脚本

#!/bin/bash
# mongodb_health_check.sh

MONGO_HOST="127.0.0.1"
MONGO_PORT="27017"
MONGO_USER="admin"
MONGO_PASS="YourPassword"

echo "=== MongoDB Health Check ==="
echo "Time: $(date)"

# 1. 检查MongoDB进程
if pgrep mongod > /dev/null; then
    echo "✅ MongoDB进程运行中"
else
    echo "❌ MongoDB进程未运行"
    exit 1
fi

# 2. 检查连接数
CONNS=$(mongosh --quiet --host $MONGO_HOST --port $MONGO_PORT -u $MONGO_USER -p $MONGO_PASS --authenticationDatabase admin --eval "db.serverStatus().connections.current")
echo "📊 当前连接数: $CONNS"

# 3. 检查副本集状态
RS_STATUS=$(mongosh --quiet --host $MONGO_HOST --port $MONGO_PORT -u $MONGO_USER -p $MONGO_PASS --authenticationDatabase admin --eval "rs.status().myState")
if [ "$RS_STATUS" = "1" ]; then
    echo "✅ 副本集状态: PRIMARY"
elif [ "$RS_STATUS" = "2" ]; then
    echo "✅ 副本集状态: SECONDARY"
else
    echo "⚠️ 副本集状态: $RS_STATUS"
fi

# 4. 检查缓存命中率
CACHE_HIT=$(mongosh --quiet --host $MONGO_HOST --port $MONGO_PORT -u $MONGO_USER -p $MONGO_PASS --authenticationDatabase admin --eval "
var s = db.serverStatus().wiredTiger.cache;
(s.hits / (s.hits + s.misses) * 100).toFixed(2)
")
echo "📊 缓存命中率: ${CACHE_HIT}%"

# 5. 检查慢查询数量
SLOW_COUNT=$(mongosh --quiet --host $MONGO_HOST --port $MONGO_PORT -u $MONGO_USER -p $MONGO_PASS --authenticationDatabase admin --eval "db.system.profile.count({millis: {\$gt: 100}})")
echo "📊 慢查询数量(>100ms): $SLOW_COUNT"

echo "=== Health Check Complete ==="

6.2 定时执行

# 添加到crontab，每5分钟执行一次
crontab -e
# 添加：
*/5 * * * * /opt/scripts/mongodb_health_check.sh >> /var/log/mongodb_health.log 2>&1

七、常见问题与优化

Q1：缓存命中率低于95%怎么办？

原因：工作集超出内存容量

解决方案：
1. 增加服务器内存
2. 优化查询，减少扫描数据量
3. 添加合适的索引
4. 考虑分片分散数据

Q2：连接数持续增长怎么办？

排查步骤：

// 查看连接来源
db.currentOp(true).inprog.filter(op => op.active).forEach(op => {
  print(op.client + " -> " + op.op + " -> " + op.ns)
})

// 按来源IP统计连接数
db.currentOp(true).inprog.reduce((acc, op) => {
  if (op.client) {
    var ip = op.client.split(":")[0];
    acc[ip] = (acc[ip] || 0) + 1;
  }
  return acc;
}, {})

Q3：副本集延迟如何监控和恢复？

# 监控延迟
mongosh --eval "rs.printSecondaryReplicationInfo()"

# 恢复步骤
# 1. 检查从节点网络
ping <secondary_host>

# 2. 检查从节点负载
ssh <secondary_host> "top -bn1 | grep mongod"

# 3. 检查oplog窗口
mongosh --eval "db.getReplicationInfo()"

八、总结

MongoDB性能监控是保障数据库稳定运行的关键，核心要点：

内置工具：mongostat实时监控、mongotop定位热点、serverStatus详细信息
系统监控：CPU、内存、磁盘IO、网络四维度全方位监控
日志分析：慢查询日志、MongoDB日志定期分析
生产方案：Prometheus+Grafana实现自动化监控告警
自动化：编写健康检查脚本，定时执行

监控黄金法则：先建立基线 → 设置告警 → 持续优化

注：本文基于CentOS 7+/MongoDB 5.0+环境编写，具体参数请根据实际版本调整。

鲨鱼博客

2026年CentOS MongoDB性能监控完全指南：从内置工具到生产级方案(2026)

一、为什么需要监控MongoDB性能

1.1 性能问题的常见表现

1.2 监控的核心指标

二、MongoDB内置监控工具

2.1 mongostat — 实时状态监控

2.2 mongotop — 集合级读写时间

2.3 db.serverStatus() — 服务器状态详情

2.4 db.currentOp() — 当前操作监控

三、系统级监控

3.1 CPU监控

3.2 内存监控

3.3 磁盘IO监控

3.4 网络监控

四、日志监控与分析

4.1 慢查询日志

4.2 MongoDB日志分析

五、生产级监控方案

5.1 Prometheus + Grafana

5.2 关键Grafana仪表盘指标

5.3 告警规则配置

六、自动化监控脚本

6.1 一键健康检查脚本

6.2 定时执行

七、常见问题与优化

Q1：缓存命中率低于95%怎么办？

Q2：连接数持续增长怎么办？

Q3：副本集延迟如何监控和恢复？

八、总结

发表回复取消回复

一、为什么需要监控MongoDB性能

1.1 性能问题的常见表现

1.2 监控的核心指标

二、MongoDB内置监控工具

2.1 mongostat — 实时状态监控

2.2 mongotop — 集合级读写时间

2.3 db.serverStatus() — 服务器状态详情

2.4 db.currentOp() — 当前操作监控

三、系统级监控

3.1 CPU监控

3.2 内存监控

3.3 磁盘IO监控

3.4 网络监控

四、日志监控与分析

4.1 慢查询日志

4.2 MongoDB日志分析

五、生产级监控方案

5.1 Prometheus + Grafana

5.2 关键Grafana仪表盘指标

5.3 告警规则配置

六、自动化监控脚本

6.1 一键健康检查脚本

6.2 定时执行

七、常见问题与优化

Q1：缓存命中率低于95%怎么办？

Q2：连接数持续增长怎么办？

Q3：副本集延迟如何监控和恢复？

八、总结

发表回复 取消回复

发表回复取消回复