SCOM best practices and web resources (a different version)

Best web resources:

http://blogs.technet.com/b/kevinholman/

Recommended toolbox:

– SCOM 2007 R2 resource kit

– effective configuration viewer

– MP viewer

– Override explorer

– MPtoXML.ps1

– Remote maintenance mode scheduler

Best practices for SCOM:

General

  • Always run 64-bit hardware, a 64-bit OS and 64-bit SQL
  • Be sure to have enough bandwidth for core OpsMgr components and agents.
  • Virtualization is supported for all OpsMgr roles, but don’t cluster a virtualized Root Management Server.
  • Snapshot backup is not supported for disaster recovery

Operational Database

  • Limit the number of console sessions to fewer than 50
  • Configure the SQL OpsMgrDB to use simple recovery unless you plan to use log shipping
  • Be sure to have fast disks because of extensive I/O usage
  • When using multi clustering, be sure the connection is very fast because of disk latency
  • Database grooming: don’t increase the default of 7 days

RMS (Root Management server)

  • Never connect agents directly to the Root Management Server
  • Never connect gateway servers directly to the Root Management Server
  • RAM is the most critical resource for the Root Management Server, followed by CPU
  • Limit console connections and SDK clients (webconsole, third party tools)
  • Do not run the console on the RMS
  • The console can be used on a TS Server, but be careful with the size of each localappdata (cache file location)
  • Never put the RMS in Maintenance Mode
  • R2 improvement note: RMS can be clustered after RMS is deployed as a single server.

Management server

  • Management servers talk to the Root Management Server but write directly to the OpsMgrDB
  • Keep them close to the Root Management Server, OpsMgrDB and OpsMgrDW because of latency
  • Memory and CPU

Gateway Server (remote office)

  • Gateway servers compress data by almost 50%
  • Use a dedicated management server for all gateways when using a large number of agents (R2 will support 1500 agents per gateway server)

Console

  • Clear the console cache with /clearcache only when you have console issues; otherwise never use it
  • The console can be used on a TS Server, but be careful with the size of each localappdata (cache file location)
  • Customize views for each admin specialist (views by products or by service line)
  • Don’t run the console on the management server
  • Create a terminal server jump box where all your tooling is up to date with a single upgrade location.

Reporting Datawarehouse

  • Limit the number of users who can generate reports
  • Separate the SQL data files from the transaction logs onto different disk arrays

Get a DR plan

  • Be sure you have a Disaster Recovery plan that includes DB and encryption key backups, and test it.

Grooming:

  • Operations Database: don’t increase the default of 7 days
  • Operations Database: consider reducing performance and event data retention to 4 or 5 days
  • Data warehouse, use the data warehouse MP reports to examine your DW content.
  • ACS Database, R2 improvement note: Microsoft Data warehouse report MP will be loaded on install.

Notification:

  • Use alert aging to reduce false positives
  • Always configure an SMTP failover host for notifications
  • Ensure the RMS can communicate with the communication channel (don’t forget your RMS cluster nodes!).
  • R2 improvement note: the limit of 5 concurrent commands will be removed
  • R2 improvement note: Immediately send an alert from the management console.
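
To make the alert aging idea concrete, here is a minimal Python sketch. It is not SCOM code: the `Alert` class, the `should_notify` function and the 10-minute default are all assumptions made for illustration. The point is only that a notification is held back until an alert has stayed open for a minimum age, so alerts that resolve themselves quickly never page anyone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical stand-in for an alert record, not a SCOM type.
    name: str
    raised_at: datetime
    resolved: bool = False


def should_notify(alert, now, min_age=timedelta(minutes=10)):
    """Notify only for alerts that are still open after min_age has passed."""
    still_open = not alert.resolved
    old_enough = (now - alert.raised_at) >= min_age
    return still_open and old_enough
```

An alert that flaps and resolves within the aging window is never sent; one that is still open after the window is.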

 

source: http://itworldjd.wordpress.com/2011/10/19/scom-best-practices-and-web-resources/


SCOM 2007 Administrative and implementation best practices

The best practices were collected and created by the OpsMgr team with feedback from the community, customers and MVPs, so real-world scenarios are reflected in the following optimization tips.

Agent management:

  • Never connect agents directly to the Root Management Server.
  • Never connect gateway servers directly to the Root Management Server.
  • R2 improvement note: SCOM 2007 R2 gateway server will support up to 1500 agents.

Scalability:

  • Always run 64-bit hardware, a 64-bit OS and 64-bit SQL.
  • Always account for the bandwidth requirements of core OpsMgr components and agents.
  • R2 improvement note: RMS can be clustered after RMS is deployed as a single server.

Cross platform (R2):

  • Provision management server(s) specifically for non-Windows monitoring

Grooming:

  • Operations Database: don’t increase the default of 7 days
  • Operations Database: consider reducing performance and event data retention to 4 or 5 days
  • Data warehouse: use the data warehouse MP reports to examine your DW content.
  • ACS Database, R2 improvement note: Microsoft Data warehouse report MP will be loaded on install.

Notification:

  • Use alert aging to reduce false positives
  • Always configure an SMTP failover host for notifications
  • Ensure the RMS can communicate with the communication channel (don’t forget your RMS cluster nodes!).
  • R2 improvement note: the limit of 5 concurrent commands will be removed
  • R2 improvement note: Immediately send an alert from the management console.

Maintenance mode:

  • Never put the RMS in Maintenance Mode (MM).
  • Never keep an object in MM longer than it needs to be, because performance and availability data are lost during that time.
  • R2 improvement note: Computer, Health service and Health Service watcher will be in MM from the console and scripts.

Console Management:

  • Don’t run the console on the management server
  • Avoid using /clearcache
  • Create a terminal server jump box where all your tooling is up to date with a single upgrade location.
  • A single jump box also makes upgrade scenarios much easier.

Due to the number of best practices and the differences between environments, not all topics are covered in this article.

So, to all the SCOM administrators out there: take a good look at your environment and start optimizing your operations. The benefits will be huge!

 

Source: http://www.systemcentercentral.com/scom-2007-administrative-and-implementation-best-practices/

Course Note – SCOM 2007 R2

Operations Manager 2007 R2 Workshop

Day 1.

am.

Scripts to schedule maintenance mode – group of servers, certain objects, distribution of Run As credentials..

Root Management Server – RMS cluster, only active/passive architecture.  Connects to two DBs – operational DB (7 days, alerting) and data warehouse (400 days, where reports come from; 10 days of raw data that is not accessible).

Management Server – MS, Agents report to the Management Server, MS writes to the above two DBs at the same time.  Limit of a couple of thousand Agents per Management Server.  An Agent can point to a primary and secondary MSs for redundancy (configurable using the console, AD integration or PowerShell).
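
The primary/secondary arrangement can be pictured with a small Python sketch. This is an illustration only, not how the agent is actually implemented: the function name, the server names and the `is_reachable` callback are all hypothetical.

```python
def choose_management_server(primary, secondaries, is_reachable):
    """Return the first reachable server, trying the primary before the secondaries.

    is_reachable is a caller-supplied function (hypothetical) that takes a
    server name and returns True or False. Returns None if nothing responds.
    """
    for server in [primary, *secondaries]:
        if is_reachable(server):
            return server
    return None
```

With the primary down and one secondary up, the agent in this sketch would report to that secondary.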

Web Console, which can be installed on RMS or MS.  Health Explorer… they connect to the DBs.

Audit Collection Services (ACS) – forward security logs to a different DB for reporting… enable this on DC… it can also find where the account is locked out.

Reporting Server, connects to RMS, has its own report DB for reporting structure, and connects to above DBs.

DB servers should always be on the same physical LAN. … infrastructure-wise there could be performance issues, since there is more writing than reading.

All of the above components need Kerberos; otherwise they need Certificates and a Gateway Server (Certs on both MS and GW).  A Gateway Server holds the Certs in case there are, say, 500 servers to monitor in a DMZ. (Although the Agent and GW both still need a Cert.)

Version 2012 can monitor Network Devices.

Consoles – Operation Manager Console and Web Console.

Management Servers: pulling information from Agent

MS – Health Service…  on the client side: Agent + Management Pack… MS pings the Agent every 60 sec; after 3 failures (4 minutes), it flags the Agent as down.
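
The "three failures in a row" logic can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Health Service implementation: the names are made up, and the sketch only counts consecutive failed checks against the 60-second interval and threshold of 3 from the note above.

```python
HEARTBEAT_INTERVAL_SECONDS = 60   # from the note above
MISSED_HEARTBEAT_THRESHOLD = 3    # from the note above


def agent_is_down(heartbeat_results, threshold=MISSED_HEARTBEAT_THRESHOLD):
    """Return True once `threshold` consecutive checks have failed.

    heartbeat_results is an iterable of booleans, one per check interval,
    where True means the agent answered.
    """
    consecutive_failures = 0
    for answered in heartbeat_results:
        consecutive_failures = 0 if answered else consecutive_failures + 1
        if consecutive_failures >= threshold:
            return True
    return False
```

A single successful check resets the counter, so an agent that misses two checks and then answers is not flagged.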

Between Agent and MS: TCP 5723; report TCP 5724.  TCP 1433 between MS and SQL.  Web console 51908. App Exception Monitoring; ACS …
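
When troubleshooting firewalls it can help to test these ports directly. The following Python sketch is one possible way to do that with the standard library. The port numbers come from the note above; the dictionary keys and the function name are illustrative labels, not official names.

```python
import socket

# Ports as listed in the notes above; keys are illustrative labels only.
SCOM_PORTS = {
    "agent_to_ms": 5723,
    "report": 5724,
    "ms_to_sql": 1433,
    "web_console": 51908,
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_open("ms01.example.local", SCOM_PORTS["agent_to_ms"])` would check the agent port on a management server (the host name here is hypothetical).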

Antivirus software needs to exclude the Health Service Store, which is a JET database.

MS – DAS (Data Access Service), CS (Configuration Service).

Configuration Service update – Configuration Flow: 1, New config push and check DB-MS-Agent; 2, Every 12 hrs Agent-MS; 3, When Agent service restarts.

Agents – Agent Deployment – pushed from console; manual installation (SCCM etc).

Agents can be multi-homed, 2 in v2007 and 4 in v2012, for special org scenarios.

Agents require Kerberos v5; full trust (2 way) between domains; all data encrypted between Agent and MS.

Agent/Agentless (ie. ATM, POS; need agent proxy)

Certificates – momcertimport.exe

Gateway Servers (GW) .. GatewayApprovalTool.exe

pm.

Object – attributes, instance of Classes // Subset of Class could be a Group

Different classes have different Attributes.  Attributes don’t change very often, because discovery uses resources on Agents and DB.

MOM2005 monitored servers, SCOM2007 doesn’t, … it looks after Applications.

Object Discovery – target classes, uses methods: Registry, WMI, script, OLE DB, LDAP, custom managed code.

Monitors – just evaluate, do not store any data. They have states (healthy, warning, critical, etc.) and intervals, and changes of state are recorded in the DB.

A Monitor can create a new alert; it can also resolve the alert itself when it is back to healthy, send an email on resolution, or even close the ticket (in Remedy).

Rules – monitor and store data in the DB. …  A Rule cannot close Alerts.

Rules collect data generated by objects, but have no state and do not affect health.

Both monitor and rule can trigger alerts, but they are different.
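
The difference can be shown with a toy Python model. This is a conceptual sketch only; the class names, the two-state simplification and the in-memory lists are assumptions and do not mirror SCOM internals. The monitor keeps a state, records only state changes and resolves its own alert when it returns to healthy. The rule stores every sample and has no state at all.

```python
class Monitor:
    """Toy two-state monitor: evaluates values, stores only state changes."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.state = "healthy"
        self.state_changes = []   # (old_state, new_state) pairs
        self.alert_open = False

    def evaluate(self, value):
        new_state = "critical" if value > self.threshold else "healthy"
        if new_state != self.state:
            self.state_changes.append((self.state, new_state))
            self.state = new_state
            # The alert opens on an unhealthy state and resolves on healthy.
            self.alert_open = new_state != "healthy"
        return self.state


class Rule:
    """Toy collection rule: stores every sample, has no state or health."""

    def __init__(self):
        self.collected = []

    def collect(self, value):
        self.collected.append(value)
```

Feeding the same series of values to both shows the contrast: the monitor ends up with a handful of state changes, the rule with one stored entry per sample.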

Health model of an Entity – availability, performance, security, configuration.

Authoring auto tuning, data in operational DB are kept 14 days.

Single threshold, double threshold.

Monitor auto-reset, manual-reset
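
As a rough illustration of single versus double thresholds, the Python sketch below maps a sampled value to a health state. The function names and the "over the threshold is bad" direction are assumptions for the example; a real monitor can equally be configured to trigger when a value drops under a threshold.

```python
def single_threshold_state(value, threshold):
    """Two-state monitor: healthy until the value exceeds the threshold."""
    return "critical" if value > threshold else "healthy"


def double_threshold_state(value, warning, critical):
    """Three-state monitor: healthy, warning or critical.

    Assumes warning < critical and that higher values are worse.
    """
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "healthy"
```

With a warning threshold of 80 and a critical threshold of 95, a value of 85 lands in the warning state, which a single threshold cannot express.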

SCOM Tools: http://blogs.msdn.com/b/dmuscett/archive/2012/02/19/boris-s-tools-updated.aspx

Day 2

am.

Objects and Classes …

Rules and Monitors – Monitor auto alert resolution, Rule repeat count.

Task…

Diagnostics and Recovery tasks – single, multiple

Override: Discovery, Monitor, Rule.

Group

Do NOT use Group as a target for Rules/Monitors. (groups exist in RMS, so MS are not aware.)

View

Update Management Pack …

Override: Class – Group – Instance – enforced .. more specific wins
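
The precedence order in this note can be sketched as a small Python function. It models only what the note says (instance beats group beats class, and an enforced override beats a non-enforced one); the `Override` class, `SCOPE_RANK` table and function name are hypothetical, and any further tie-breaking done by the product is not modelled.

```python
from dataclasses import dataclass

# Higher rank means more specific, per the note above.
SCOPE_RANK = {"class": 0, "group": 1, "instance": 2}


@dataclass
class Override:
    # Hypothetical record for illustration, not a SCOM type.
    scope: str          # "class", "group" or "instance"
    value: object
    enforced: bool = False


def effective_value(default, overrides):
    """Return the winning override value, or the default if there are none."""
    if not overrides:
        return default
    # Enforced overrides win first; among equals the more specific scope wins.
    winner = max(overrides, key=lambda o: (o.enforced, SCOPE_RANK[o.scope]))
    return winner.value
```

For example, a class-level override of 80 and an instance-level override of 95 yield 95, unless the class-level one is marked enforced.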

pm.

Authoring Console: Distributed Applications

Authoring Toolset (MP Authoring)

1, Operation Console;

2, Visio MP Designer (Visio 2010 Premium) .. this is the starting point for customization;

3, Authoring Console;

4, Visual Studio Authoring Extension

groups and overrides, etc.

Targeting…

Visio dashboard?

Day 4

am.

Notification: Channel, Subscriber, Subscription.

Visio 2007 plug-in

Service level dashboard version 2.0

authoring

security

reporting

service level tracking