Discover Global System Analysis & Services Administration Solution.

An AI-Driven Site Reliability Engineering  Solution

our R&D Native & modular 360° Net-DevOps Solution

Contact us

WHAT YOU WILL DO WITH NKINDA GSA

NKINDA GSA is a local/external SaaS for :


1. IT Cartography like Mega Hopex
2. Service Monitoring like ServiceNow
3. Alarm collection like OpenNMS
4. Project Management like Jira
5. Work collaboration like Confluence
6. ERP like SAP

 7. Innovation new in the world
    Industrialization of System/Network
        > KPI deep checks and KPI Improvement
        > deep checks & preventive maintenance
        > Live deep troubleshooting
        > change management
        > deep checks & System/Network cartography

Learn more

1.BACKGROUND

Efficient incident and breakdown management is critical in highly regulaled and competitive sectors such as banking, finance, the electromechanical industry, IT and telecommunications. In these fields, mean-time-to-repair ( MTTR ) and meantime-between-failures ( MTBF ) are key performance indicators that have a direct impact on service reliability, customer satisfaction and, ultimately, on company profitability, and SLAs ( Service Level Agreements ).
A dedicated log collection and analysis tool can play a crucial role in optimizing the indicators, providing an in depth understanding of system and system performance and facilitating rapid incident resolution.


GSA is an advanced site reliability solution software, designed to ensure unparalleled system availability and performance.
This solution offers a competitive edge through its sophisticated alarm analysis and grading, proactive fault detection, and extensive fault resolution libraries along with root cause and reporting tools all imbedded in one box. By integrating state-of-the-art Site Reliability Engineering (SRE) / NetOps practices, the solution is tailored to meet the evolving demands of modern network operations within the Customer Network scope.
For a forward-thinking and innovation-driven company, adopting this cutting-edge software is essential to maintain its leadership in network operations and managed services. The solution not only addresses critical operational key performance indicators but also fosters a culture of continuous learning and improvement. By leveraging this software, the customer can enhance its operational efficiency, minimize downtime, and stay ahead in the competitive landscape of network technology.

Learn more

GSA objectives

The primary objectives of the GSA site reliability solution are to:

  • Ensure 100% system availability.
  • Provide advanced alarm analysis and grading.
  • Detect and resolve faults efficiently.
  • Utilize historical fault resolution libraries for fast resolution of recurrent issues.
  • Integrate modern SRE practices.
  • Address critical SRE operational KPIs such as Mean Time to Detect (MTTD), Mean Time to Resolve (MTTR), and others.
  • Ensure interoperability across all IT/Telecom-related domains.
  • Facilitate orchestrated bulk system changes with minimum to zero downtime.
  • Support network planning and optimization and performance tuning.
  • Provide performance analysis with real-time visualization and KPI tracking


Discover more

Alarm analysis and grading

The solution will incorporate an intelligent alarm analysis system capable of:
• Analyzing incoming alarms and assigning severity grades.
• Reducing false positives through machine learning algorithms.
• Prioritizing alarms based on impact and urgency.
• Help the front office do perform preliminary L1 fault resolution tasks by making use of stored root cause analysis from past experiences.

Discover more

Realtime T-Shoot and Fault Detection

The system will:
• Continuously monitor system performance.
• Allow for Realtime issue T-Shoot during incidence and problem handling.
• Identify anomalies and potential faults in real-time.
• Use predictive analytics to foresee possible system failures.


Discover more

Fault Resolution Libraries

The solution will maintain a comprehensive library of past faults and resolutions, allowing for:
• Quick reference to historical data for faster resolution.
• Automated application of proven solutions to recurring issues.
• Continuous updating and optimization of the fault resolution library.

Discover more

Task automation engine

The solution will be ideal in handling repetitive tasks and processes efficiently.
• Task Automation: The system has an AI powered engine to automate repetitive and manual tasks, improving efficiency and reducing human workload.
• Process Automation: This one is capable of automating repeated workflows, enhancing productivity, and ensuring consistency in day to day tasks


Discover more

Analytics and Reporting engine

The solution will also be able to perform
• Data Analysis: Analysis on large datasets to uncover patterns, trends, and insights that may not be evident to humans.
• Reporting: Some reports can be generated based on analyzed data trends. This can also be used to do predictive decision making and root cause determination in incident handling.


Discover more

SRE / NetOps Practices

The solution will integrate modern SRE practices, including:
• Automated incident response and resolution.
• Continuous performance monitoring and improvement.
• Reliability engineering metrics (SLAs, SLOs, SLIs).
• Post-incident reviews and blameless retrospectives.



Discover more

Interoperability

The solution will ensure ease of use and integration across all various IT domains, including it also comes along with portable executable files.
• IT INFRASTRUCTURE
• CORE NETWORK
• RAN (Radio Access Network)
• PACKET SWITCHING
• INTELLIGENT NETWORK
• VALUE ADDED SERVICES
• IP / DATACOMMS NETWORKS

This interoperability will allow seamless integration and usage across different technological environments and systems in all domains

Discover more

Orchestration

The solution will support task and process orchestration to facilitate bulk system changes, ensuring:
• Automated deployment and configuration management.
• Consistent and error-free updates across multiple systems.
• Streamlined processes for large-scale changes and migrations.


Discover more

Network planning and Optimization

The solution will assist in future network planning and optimization by:
• Providing detailed performance analytics and historical data.
• Supporting predictive modeling for capacity planning.
• Offering tools for network design and optimization based on current and projected needs.



Discover more

Performance analysis and Visualisations

The solution will include a performance analysis tool that:
• Provides real-time system monitoring.
• Tracks key performance indicators (KPIs) and generates reports.
• Includes plugins for popular visualization tools (e.g., Grafana, Kibana) to offer intuitive and customizable dashboards. All what



Discover more

DEVELOPMENT / INTEGRATION PROCESSES

Week 1

- Audit & analysis of existing
situation
- Identification & elaboration of
requirements.

Week 2

Start of dev /+ integration

weekend delivery.

Week 3

Validation tests & Delivery

###

Evaluation & observation

TREE (3) WEEKS of DEV / Core Module.

Success Stories-Telecom

-Performance analysis of web services

-Online  Charging  System  integration support

-Collect &  provide tuning LOG Tshoot Analysis

-Integration  on Open service  Gateway

- Integration   support for Backbone IP/MPLS

- Integration   support for Core  Network

Installation GSA


-Three Possibilities

- portable software

- On premise

- Cloud

Hardware specification
4 core CPU
16 GB RAM
Disk: 1 Tera

Contracts and confidentiality

NDA:Confidentiality agreement

Sale of software licenses

Technical support contract

Phased approach

The implementation will follow a phased approach:
• Phase 1: Assessment and Planning - Assess current systems in all domains, define requirements, and plan implementation. A pilot already done for IN and IP domains
• Phase 2: Integration and deployment – integration and deployment of the site reliability solution software across all domains.
• Phase 3: Testing and Validation - Conduct rigorous testing to validate functionality and performance.
• Phase 4: Training and Continues Integration and Deployment - Train staff and deploy the solution across the organization.
• Phase 5: Monitoring and Optimization - Monitor performance and optimize the solution based on feedback




Discover more

Expected Outcome

• Improved System Availability - Achieve near 100% uptime.
• Efficient Fault Management - Faster detection and resolution of issues.
• Enhanced Performance Monitoring - Real-time insights into system performance.
• Proactive Incident Management - Predict and prevent potential incidents.
• Continuous Improvement - Leverage historical data for ongoing optimization.
• Seamless Interoperability - Smooth integration across various IT domains.
• Effective Orchestration - Simplified bulk system changes.
• Optimized Network Planning - Data-driven network design and optimization.
• Comprehensive Performance Analysis - Real-time monitoring and KPI tracking with visualization tools.



Discover more

Critical GSA Solution Operational KPI

The solution will address several crucial SRE operational KPIs to ensure optimal performance and reliability, including:
• Mean Time to Detect (MTTD): The average time taken to detect an incident. Reducing MTTD helps in quicker identification of issues.
• Mean Time to Resolve (MTTR): The average time taken to resolve an incident after it has been detected. Lower MTTR indicates faster resolution capabilities.
• Service Level Indicators (SLIs): Metrics that measure the performance of a service (e.g., latency, error rate, throughput).
• Service Level Objectives (SLOs): Targets set for SLIs that define acceptable performance levels.
• Service Level Agreements (SLAs): Formal agreements that set expectations for service performance between service providers and customers.
• Change Failure Rate: The percentage of changes that result in degraded service or require remediation. Lower rates indicate more successful changes.
• Error Budget: The allowable amount of downtime or errors within a specific period, derived from SLOs.
• Incident Rate: The frequency of incidents occurring within a specific timeframe. Monitoring this helps in understanding system stability.
• Availability: The percentage of time the system is operational and available. High availability is crucial for user satisfaction.




Discover more

Evaluation Criteria

The success of the PoC will be evaluated based on:
• System Uptime - Measurement of system availability.
• Alarm Prioritization Accuracy - Reduction in false positives and accurate grading of alarms.
• Fault Resolution Time - Reduction in the time taken to resolve faults.
• User Feedback - Feedback from system administrators and users.
• Performance Metrics - Improvement in key performance indicators (KPIs) such as MTTD, MTTR, SLIs, SLOs, SLAs, Change Failure Rate, Error Budget, Incident Rate, and Availability.
• Interoperability - Successful integration and usage across various IT/ network domains.
• Orchestration Effectiveness - Efficiency and accuracy of bulk system changes with zero tolerance to downtime
• Network Planning Optimization - Quality of insights and improvements in network planning.
• Visualization and Analysis - Effectiveness of real-time monitoring and KPI tracking through visualization tools




Discover more