Sunday, 11 March 2012

Grooming in SCOM

Operation Manger Database

How Operations manager grooming works and how to check

The grooming settings are stored in the database in the PartitionandGroomingSettings table. Each table uses a different stored procedure to groom. You can run the following command which can give you the jobs and the groomingruntime  to check when it was last run.
select  ObjectName,GroomingSproc, DaysToKeep,GroomingRunTime from dbo.PartitionAndGroomingSettings
You can run the following query select * from dbo.InternalJobHistory
Which can let you know when the groom job has been executed and on failure the Status Code = 0 and for success Status Code = 1. The comments can also give use the clue why the same failed.

Operation Manger Database DW
How Operations manager DW grooming works and how to check
In the data warehouse the dataset maintenance task for each dataset (Event ,alert, performance, state). This is run every minute as a part of this the stage data is processed and then you can see that the groom job will be triggered.  To find the GUID you can run the following command. select * from StandardDataset once you get the GUID how the processing is done is the stored proc StandardDatasetMaintenance. (exec StandardDatasetMaintenance @DatasetId=' 6AF799BD-2CCC-41CF-97FD-058E2CBF9248'. This GUID was available for the alert data when we ran the query select * from StandardDataset).
Now how to check if grooming is working or not. First run the query to check the retention dates.
select ds.datasetDefaultName AS 'Dataset Name', sda.AggregationTypeId AS 'Agg Type 0=raw, 20=Hourly, 30=Daily', sda.MaxDataAgeDays AS 'Retention Time in Days'
from dataset ds, StandardDatasetAggregation sda
WHERE ds.datasetid = sda.datasetid ORDER by ds.datasetDefaultName
Now to check how much data for each data is there you can use this query
SELECT DATEDIFF(d, MIN(DWCreatedDateTime), GETDATE()) AS [Current] FROM Alert.vAlert
SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Event.vEvent
SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfRaw
SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfHourly
SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfDaily
SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateRaw
SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateHourly
SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateDaily
Once you are able to find that a particular dataset has number data that what you have configured. You need to troubleshoot like this.
The first thing you should do if you suspect the grooming jobs are not running properly is check the event logs on the RMS.
If you don't see any errors, you can increase the logging debuglevel for the grooming jobs by running the following query:
Update StandardDataSet
Set DebugLevel = 3
Where DebugLevel != 3
Note: The available levels for grooming are 0,1,2,3 with each level increasing the verbosity of information logged. The default value is 0 for all datasets. Once debuglevel is increased to 3 all grooming and aggregation jobs for the specified dataset will be logged to the DebugMessage table. And for the same you can run this
select * from DebugMessage
The last column in this table, OperationDurationms is the duration it took for the grooming job to complete. Look for any really high duration, it could mean the grooming job is timing out. Please check if there are continuous NULL values as that also means timeouts.

Many a times the failure can arise when the standarddatabase maintenance dataset is not running correctly. This can happen when lot of data is there on staging tables and you can use this link for more details
http://blogs.technet.com/b/sudheesn/archive/2009/07/28/getting-31552-exception-sqlexception-timeout-expired-very-frequently-in-scom-server.aspx


Operation Manger Audit collection Database
How Operations manager Audit collection grooming works and how to check
Every 198 seconds, a “Checkpoint” is initiated on the database. At this point, we also check if any partitions in a status of “2” are older than Number of Partitions * 86400 (both configurable). If they are, dbdeletepartition.sql is initiated against the partition GUID.

To check this you can run command
SELECT
* FROM dtconfig.
The "Partition Close Time" field in the dtpartition table to determine if a partition is elegable for grooming. select min (Partitionclosetime) from dtPartition And this should be not graeter than 17 days if you had set 15 days data period.

No comments:

Post a Comment