Incident Manager

T-Systems North America

2019-11-17 09:05:03

Job location: Albuquerque, New Mexico, United States

Job type: full-time

Job industry: Insurance & Superannuation

Job description


The Lead Technical Incident Manager is responsible for 24x7 Major Incident coordination, communication, and direction of support teams during the incident lifecycle. This role is the primary lead and is fully empowered to drive incident resolution across all technical teams and customer representatives throughout the entire incident lifecycle.

Location: Downers Grove, IL / Troy, MI / Houston, TX / Albuquerque, NM


· Lead the incident management process following ITIL principles as well as customer rules, resolving incidents in accordance with the service level agreements defined with the customer.

· Broad understanding of and familiarity with different technology platforms for Storage, Backups, Servers, OS (Windows/Linux/Unix), Databases (SQL, Oracle), Virtualization (VMware, Citrix), Networking, and Datacenter operations.

· Coordinate technical resources to restore service as quickly as possible within the Service Level Agreement

· Coordinate teams to produce workarounds or alternatives, gain approval from the account, business, and stakeholders for every major action through recovery, and provide periodic updates to the leadership team.

· Ensure adherence to specific customer and T-Systems Incident Management processes and procedures.

· Communicate incident status, responsible parties, and actions taken to resolve high-priority incidents to top management of both organizations.

· Lead internal and external relations between the different operational layers, both internally and with the customer, for example the Account Manager, Delivery Manager, Operations Manager, Technical Staff, and Customer Service Owners, as well as customer top management.

· In case of critical incidents, involve our global Incident Manager team and support our global processes and policies to ensure proper communication and resolution.

· Analyze and interpret complex information in order to make appropriate recommendations and proposals for action plans, organize work streams, and sequence troubleshooting steps during technical calls and throughout the incident lifecycle.

· Manage internal and external relationships with the customer account, customer, vendors, support teams, and management. Drive internal technical calls while participating in customer technical calls.

· Report on incidents to executive management and support the Account team on escalations.

· Coordinate with the Service Delivery Manager on customer facing bridges and technical calls.

· Support the operational teams in improving the relevant documentation used during the incident lifecycle (runbooks, work instructions, diagrams, etc.).

· Work with various technical, process, and management teams to maintain continuous 24x7 coverage throughout the incident lifecycle.

Critical Thinking and Problem Solving

· Recognizes organizational and resource problems or situations that are new or without clear precedent.

· Evaluates alternatives and finds solutions that will lead to incident reduction/resolution.

· Critically evaluates solutions on the basis of logical assumptions and factual information, and chooses the appropriate solution.

· Draws on experience to support the different technical teams in identifying solutions as quickly as possible.


· Participate in daily operational calls.

· Complete understanding of Incident management services and requirements.

· Consults and provides advice, facilitates discussion, and resolves conflict; establishes trust; builds and uses cross-functional relationships to accomplish work objectives, reducing incidents together with the SDM and Tiger team.

Communication Effectiveness

· Maintain a high level of Customer Satisfaction.

· Influence others with tact and diplomacy, persuading them in order to gain cooperation and eliminate conflict.

· Maintain a well-developed network of individuals and organizations to assist in achieving work-related goals such as reducing incidents.

· Ability to work with international staff across multiple time zones.

· Lead Incident Management calls to provide status and follow up on topics related to this function.

Decision Making

· Analyzes the risks and future impact of decisions. Makes decisions decisively, with the ability to lead the technical solution.

· Anticipates consequences of actions, potential problems and opportunities for change.

· Understanding of the customer business.

· Provide all available evidence and a professional opinion on each situation, and suggest the best way to tackle problems.

Qualifications, Knowledge, Skills and Expertise

· Minimum of 5 years of technical experience operating complex IT environments, including servers, databases, storage technologies, backup and restore, virtualization, and networks

· Minimum of 3 years working in Incident Management process and ITIL-related roles

· Deep understanding of technology principles, practices, and theories, sufficient to ask the right questions and resolve issues sooner

· Must have a good understanding of backup and restore methodologies

· Must have a clear understanding and working knowledge of server OS and platform technologies such as Red Hat Linux, AIX, Windows, and VMware, as well as cloud technologies such as Azure and AWS

· Good understanding of SRM and DR/BC (disaster recovery and business continuity)

· Solid understanding of networking (firewalls, routers, switches, VLANs) and the OSI stack

· Solid understanding of different storage technologies (Hitachi, NetApp, EMC)

· Ability to take charge and lead all levels of management, including senior levels when applicable, throughout the incident lifecycle.

· Certified in (at least) ITIL v3 and/or v4 Foundation, with deep knowledge of ITIL processes.

· Process-oriented, with the ability to follow process, procedure, and policy requirements.

· Strong communications skills.

· Participate in reviews and discussions of root cause analyses for recurring problems, and escalate to the responsible party for recovery and future prevention.

· Drawing on his/her experience during incidents, provide suggestions to Operations teams on how to improve processes or current practices.

· During long-running incident resolution, coordinate with the management team to ensure resources are available to cover all shifts.

· Coordinate and put hypercare monitoring and communications in place as needed after the incident fix is implemented.


· Bachelor of Science degree in an IT field or Computer Science (or equivalent).

We are proud to be an EEO/AA employer M/F/D/V. No person is unlawfully excluded from consideration for employment because of race, color, religious creed, national origin, ancestry, sex, age, veteran status, marital status or physical challenges. We maintain a drug-free workplace and perform pre-employment substance abuse testing.
