Configuring Platform Reliable Service on SCP – Sangfor Technologies (Thailand) Co., Ltd.

Introduction

Platform Reliable Service lets administrators monitor the running state of a cluster, both software and hardware, from a single place by providing a reliability center platform for operation and maintenance. Instead of checking hardware status and platform operation status machine by machine, administrators can view the overall status through the SCP reliability center. The feature comprises Overload Check, Network Check, Datastore Status, and Data Rebuilding.

Prerequisites

There are no additional prerequisites.

Configuration Steps

1. Overload Check

Checks whether hardware resource usage (CPU, memory, storage) and interface traffic are overloaded.

Log in to SCP and go to Reliability > Platform Status Check > Overload Check to view the resource usage of CPU, Memory, and Storage, and the Interface traffic status.
Click Start Now to get the latest status, click Entity Description to view the detection threshold, or click Suggestions to view the suggested solution.

Note: CPU overcommitment is the ratio of the CPU resources allocated to powered-on VMs to the total CPU resources of the server (excluding virtual network devices and platforms). Memory overcommitment is the ratio of the memory actually configured for powered-on VMs to the total server memory.
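The two ratios in the note can be illustrated with a small calculation. This is only a sketch of the arithmetic, not how SCP computes the values internally: the `VM` class, the `overcommit_ratios` function, and the sample figures are hypothetical, and the VM list is assumed to already exclude virtual network devices and platform components.

```python
from dataclasses import dataclass


@dataclass
class VM:
    # Hypothetical record for illustration; not an SCP data structure.
    name: str
    vcpus: int
    memory_gb: float
    powered_on: bool


def overcommit_ratios(vms, host_cpu_cores, host_memory_gb):
    """Return (cpu_ratio, memory_ratio), counting only powered-on VMs."""
    running = [vm for vm in vms if vm.powered_on]
    cpu_ratio = sum(vm.vcpus for vm in running) / host_cpu_cores
    memory_ratio = sum(vm.memory_gb for vm in running) / host_memory_gb
    return cpu_ratio, memory_ratio


# Sample data: the powered-off "test" VM does not count toward either ratio.
vms = [
    VM("web", 8, 16, True),
    VM("db", 16, 64, True),
    VM("test", 8, 32, False),
]
cpu, mem = overcommit_ratios(vms, host_cpu_cores=16, host_memory_gb=128)
print(f"CPU overcommitment: {cpu}, memory overcommitment: {mem}")
```

A ratio above 1 means more resources have been allocated to running VMs than the server physically has. The threshold at which SCP reports an overload is shown under Entity Description.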

2. Network Check

Checks the redundancy and connectivity status of each node's management interface, overlay interface, and virtual storage interface.

Go to Reliability > Platform Status Check > Network Check, or click Reliability > Holistic View, to view the connectivity status of the Management Interfaces, Overlay Network Interface, and Storage Network Interface.
Click Start Now to get the latest status, click Entity Description to view the detection threshold, or click Suggestions to view the suggested solution.

Note: Interface multiplexing detection checks the multiplexing state of the edge port and the management interface; multiplexing is not recommended. Reuse interface detection checks whether an interface has a redundant configuration, such as aggregation.
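The two detections in the note can be pictured as simple checks over an interface inventory. The sketch below rests on assumptions not stated in the article: "multiplexing" is taken to mean one physical port serving more than one role, and "redundant configuration" is modeled as a role backed by at least two ports (for example, an aggregation group). The function name, the role names, and the port names are all hypothetical.

```python
def check_interfaces(role_to_ports):
    """Return a list of findings for a {role: [physical ports]} mapping."""
    findings = []

    # Multiplexing (assumed meaning): one physical port is used by several roles.
    port_roles = {}
    for role, ports in role_to_ports.items():
        for port in ports:
            port_roles.setdefault(port, []).append(role)
    for port, roles in sorted(port_roles.items()):
        if len(roles) > 1:
            findings.append(f"{port} is multiplexed by: {', '.join(sorted(roles))}")

    # Redundancy (assumed model): a role needs two or more member ports.
    for role, ports in sorted(role_to_ports.items()):
        if len(set(ports)) < 2:
            findings.append(f"{role} has no redundant interface")

    return findings


# Sample inventory: management and edge share eth0; overlay and storage are aggregated.
inventory = {
    "management": ["eth0"],
    "edge": ["eth0"],
    "overlay": ["eth2", "eth3"],
    "storage": ["eth4", "eth5"],
}
findings = check_interfaces(inventory)
for line in findings:
    print(line)
```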

3. Datastore Status

Checks the health status of the multiple data copies stored across nodes, to make sure that data in the cluster has no single point of failure.

Go to Reliability > Platform Status Check > Datastore Status, or click Reliability > Holistic View > Datastore Status > View Details.
View the virtual storage Replicas, virtual datastore Status, Cache Disks, and Data Disks.
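The idea of "no single point of failure" can be made concrete with a small check: every volume should keep copies on at least two distinct healthy nodes. The following is a minimal sketch under that assumption; the function, the volume names, and the node names are hypothetical and do not reflect SCP's internal data model.

```python
def volumes_at_risk(replica_map, healthy_nodes):
    """Return volumes with fewer than two copies on distinct healthy nodes."""
    at_risk = []
    for volume, nodes in sorted(replica_map.items()):
        # Copies on the same node, or on an unhealthy node, add no protection.
        healthy_copies = set(nodes) & set(healthy_nodes)
        if len(healthy_copies) < 2:
            at_risk.append(volume)
    return at_risk


# Sample data: node3 is down, and vol-b keeps both copies on node2.
replica_map = {
    "vol-a": ["node1", "node2"],
    "vol-b": ["node2", "node2"],
    "vol-c": ["node1", "node3"],
}
at_risk = volumes_at_risk(replica_map, healthy_nodes={"node1", "node2"})
print(at_risk)
```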

4. Data Rebuilding

When a node or hard disk fails and stops responding, the system automatically rebuilds the replicas onto the remaining hard disks or nodes, provided enough storage space is left. No disk replacement or manual intervention is required.
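The space condition can be sketched as a feasibility check followed by a simple placement. This is an illustration only: the article does not describe the actual rebuild algorithm, and a real placement must also keep the copies of the same data on separate nodes, which this sketch ignores. The function name, the disk names, and the sizes are hypothetical.

```python
def plan_rebuild(data_to_rebuild_gb, free_gb_by_disk):
    """Return a {disk: gb} placement, or None if free space is insufficient."""
    if sum(free_gb_by_disk.values()) < data_to_rebuild_gb:
        return None  # not enough space: rebuilding cannot start

    plan = {}
    remaining = data_to_rebuild_gb
    # Greedy choice for illustration: fill the disks with the most free space first.
    for disk, free in sorted(free_gb_by_disk.items(), key=lambda kv: kv[1], reverse=True):
        if remaining <= 0:
            break
        take = min(free, remaining)
        if take > 0:
            plan[disk] = take
            remaining -= take
    return plan


surviving = {"node2-disk1": 400, "node3-disk1": 300}
print(plan_rebuild(500, surviving))  # fits within the 700 GB that is free
print(plan_rebuild(800, surviving))  # exceeds the free space
```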

Go to Reliability > Platform Status Check > Data Rebuilding, or click Reliability > Holistic View > Data Rebuilding > View Details.
View the data rebuilding status.

Verification

Check each tab under Reliability > Platform Status Check and confirm that every status is shown as normal. Use Holistic View for an overview of the cluster, covering both running status and hardware status.

Notes

Check Platform Status Check regularly so that problems are found early.
Storage overcommitment ratio is the ratio of the storage actually configured for VMs to the total virtual storage of the server.
Data Rebuilding runs automatically when enough storage space is available, without replacing the disk or node.
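As a worked example of the storage overcommitment ratio defined above, the sketch below divides the sum of configured VM disk sizes by the total virtual storage. The function name and the figures are hypothetical and only illustrate the arithmetic.

```python
def storage_overcommitment(configured_vm_storage_gb, total_virtual_storage_gb):
    """Ratio of configured VM storage to the server's total virtual storage."""
    if total_virtual_storage_gb <= 0:
        raise ValueError("total virtual storage must be positive")
    return sum(configured_vm_storage_gb) / total_virtual_storage_gb


# Three VMs configured with 200, 500, and 300 GB on 2000 GB of virtual storage.
ratio = storage_overcommitment([200, 500, 300], 2000)
print(ratio)
```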
