SQL SERVER DATABASE FILES WISE I/O LATENCY PART#2

Since the DMF sys.dm_io_virtual_file_stats shows cumulative I/O statistics value for the database files, you can’t just use the DMF to shoot a conclusion that the database files are experiencing high latency issue. The reason is the overall aggregation will mask spikes of poor performance.

It is recommended to capture the latency data over a period to reach a conclusion. In this tips, we will explore how to capture I/O latency statistics data that occurred over a period using the DMF. 

But before that I would request you to have a look on the first part of this blog, that will help you to understand what all the information you will be capturing over the period  – SQL SERVER DATABASE FILES WISE I/O LATENCY PART#1

Let’s follow a step by step approach to capturing the period data;

Step1: 

  • Create a table ‘SQL_database_file_wise_I_O_latency’ the under master database
  • Captures the output from sys.dm_io_virtual_file_stats into the table
IF NOT EXISTS (SELECT * FROM [master].[sys].[objects]
    WHERE [name] = N'SQL_database_file_wise_I_O_latency')
BEGIN
USE master  
CREATE TABLE [dbo].[SQL_database_file_wise_I_O_latency](
	[database name] [nvarchar](128) NULL,
	[avg_read_stall_ms] [numeric](10, 1) NULL,
	[avg_write_stall_ms] [numeric](10, 1) NULL,
	[avg_io_stall_ms] [numeric](10, 1) NULL,
	[file size (mb)] [decimal](18, 2) NULL,
	[physical_name] [nvarchar](260) NOT NULL,
	[type_desc] [nvarchar](60) NULL,
	[io_stall_read_ms] [bigint] NOT NULL,
	[num_of_reads] [bigint] NOT NULL,
	[io_stall_write_ms] [bigint] NOT NULL,
	[num_of_writes] [bigint] NOT NULL,
	[io_stalls] [bigint] NULL,
	[total_io] [bigint] NULL,
	[capturetime] [datetime] NOT NULL
) ON [PRIMARY]
INSERT INTO SQL_database_file_wise_I_O_latency
SELECT
DB_NAME(fs.database_id) AS [database name],
CAST(fs.io_stall_read_ms/(1.0 + fs.num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
CAST(fs.io_stall_write_ms/(1.0 + fs.num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
CAST((fs.io_stall_read_ms + fs.io_stall_write_ms)
/(1.0 + fs.num_of_reads + fs.num_of_writes) 
AS NUMERIC(10,1)) AS [avg_io_stall_ms],
CONVERT(DECIMAL(18,2), mf.size/128.0) AS [File Size (MB)],
mf.physical_name,
mf.type_desc,
fs.io_stall_read_ms,
fs.num_of_reads,
fs.io_stall_write_ms,
fs.num_of_writes,
fs.io_stall_read_ms + fs.io_stall_write_ms AS [io_stalls],
fs.num_of_reads + fs.num_of_writes AS [total_io],
getdate() as capturetime
FROM sys.dm_io_virtual_file_stats(null,null) AS fs
INNER JOIN sys.master_files AS mf WITH (NOLOCK) ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
ORDER BY avg_io_stall_ms DESC OPTION (RECOMPILE);

END
ELSE
INSERT INTO SQL_database_file_wise_I_O_latency
SELECT
DB_NAME(fs.database_id) AS [database name],
CAST(fs.io_stall_read_ms/(1.0 + fs.num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
CAST(fs.io_stall_write_ms/(1.0 + fs.num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
CAST((fs.io_stall_read_ms + fs.io_stall_write_ms)
/(1.0 + fs.num_of_reads + fs.num_of_writes) 
AS NUMERIC(10,1)) AS [avg_io_stall_ms],
CONVERT(DECIMAL(18,2), mf.size/128.0) AS [File Size (MB)],
mf.physical_name,
mf.type_desc,
fs.io_stall_read_ms,
fs.num_of_reads,
fs.io_stall_write_ms,
fs.num_of_writes,
fs.io_stall_read_ms + fs.io_stall_write_ms AS [io_stalls],
fs.num_of_reads + fs.num_of_writes AS [total_io],
getdate() as capturetime
FROM sys.dm_io_virtual_file_stats(null,null) AS fs
INNER JOIN sys.master_files AS mf WITH (NOLOCK) ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
ORDER BY avg_io_stall_ms DESC OPTION (RECOMPILE);

Step2:

  •  Archive two weeks older data from the table. You can modify the archiving period as per requirement.
DELETE FROM master.dbo.SQL_database_file_wise_I_O_latency
WHERE capturetime <=DATEADD(WEEK,-2,GETDATE());

Script to create SQL Server Job directly

  • Created a SQL Server job using the above scripts and scripted out the SQL job below. So that, you use the script to create a job directly to capture database files wise I/O latency details. Kindly run the script on the SQL Server instance where you want to capture the data.
Note: On the first successful execution of the job, the table “SQL_database_file_wise_I_O_latency” will be created in master database. If you want to keep the table in different database, you need to replace the master database with your database name.
USE [msdb]
GO

/****** Object:  Job [SQL_database_file_wise_I_O_latency]    Script Date: 24/6/2017 11:41:48 AM ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object:  JobCategory [[Uncategorized (Local)]]    Script Date: 24/6/2017 11:41:48 AM ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'SQL_database_file_wise_I_O_latency', 
		@enabled=1, 
		@notify_level_eventlog=0, 
		@notify_level_email=0, 
		@notify_level_netsend=0, 
		@notify_level_page=0, 
		@delete_level=0, 
		@description=N'No description available.', 
		@category_name=N'[Uncategorized (Local)]', 
		@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [Capture SQL_database_file_wise_I_O_latency]    Script Date: 24/6/2017 11:41:48 AM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Capture SQL_database_file_wise_I_O_latency', 
		@step_id=1, 
		@cmdexec_success_code=0, 
		@on_success_action=3, 
		@on_success_step_id=0, 
		@on_fail_action=2, 
		@on_fail_step_id=0, 
		@retry_attempts=0, 
		@retry_interval=0, 
		@os_run_priority=0, @subsystem=N'TSQL', 
		@command=N'IF NOT EXISTS (SELECT * FROM [master].[sys].[objects]
    WHERE [name] = N''SQL_database_file_wise_I_O_latency'')
BEGIN
USE master  
CREATE TABLE [dbo].[SQL_database_file_wise_I_O_latency](
	[database name] [nvarchar](128) NULL,
	[avg_read_stall_ms] [numeric](10, 1) NULL,
	[avg_write_stall_ms] [numeric](10, 1) NULL,
	[avg_io_stall_ms] [numeric](10, 1) NULL,
	[file size (mb)] [decimal](18, 2) NULL,
	[physical_name] [nvarchar](260) NOT NULL,
	[type_desc] [nvarchar](60) NULL,
	[io_stall_read_ms] [bigint] NOT NULL,
	[num_of_reads] [bigint] NOT NULL,
	[io_stall_write_ms] [bigint] NOT NULL,
	[num_of_writes] [bigint] NOT NULL,
	[io_stalls] [bigint] NULL,
	[total_io] [bigint] NULL,
	[capturetime] [datetime] NOT NULL
) ON [PRIMARY]
INSERT INTO SQL_database_file_wise_I_O_latency
SELECT
DB_NAME(fs.database_id) AS [database name],
CAST(fs.io_stall_read_ms/(1.0 + fs.num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
CAST(fs.io_stall_write_ms/(1.0 + fs.num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
CAST((fs.io_stall_read_ms + fs.io_stall_write_ms)
/(1.0 + fs.num_of_reads + fs.num_of_writes) 
AS NUMERIC(10,1)) AS [avg_io_stall_ms],
CONVERT(DECIMAL(18,2), mf.size/128.0) AS [File Size (MB)],
mf.physical_name,
mf.type_desc,
fs.io_stall_read_ms,
fs.num_of_reads,
fs.io_stall_write_ms,
fs.num_of_writes,
fs.io_stall_read_ms + fs.io_stall_write_ms AS [io_stalls],
fs.num_of_reads + fs.num_of_writes AS [total_io],
getdate() as capturetime
FROM sys.dm_io_virtual_file_stats(null,null) AS fs
INNER JOIN sys.master_files AS mf WITH (NOLOCK) ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
ORDER BY avg_io_stall_ms DESC OPTION (RECOMPILE);

END
ELSE
INSERT INTO SQL_database_file_wise_I_O_latency
SELECT
DB_NAME(fs.database_id) AS [database name],
CAST(fs.io_stall_read_ms/(1.0 + fs.num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
CAST(fs.io_stall_write_ms/(1.0 + fs.num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
CAST((fs.io_stall_read_ms + fs.io_stall_write_ms)
/(1.0 + fs.num_of_reads + fs.num_of_writes) 
AS NUMERIC(10,1)) AS [avg_io_stall_ms],
CONVERT(DECIMAL(18,2), mf.size/128.0) AS [File Size (MB)],
mf.physical_name,
mf.type_desc,
fs.io_stall_read_ms,
fs.num_of_reads,
fs.io_stall_write_ms,
fs.num_of_writes,
fs.io_stall_read_ms + fs.io_stall_write_ms AS [io_stalls],
fs.num_of_reads + fs.num_of_writes AS [total_io],
getdate() as capturetime
FROM sys.dm_io_virtual_file_stats(null,null) AS fs
INNER JOIN sys.master_files AS mf WITH (NOLOCK) ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
ORDER BY avg_io_stall_ms DESC OPTION (RECOMPILE);', 
		@database_name=N'master', 
		@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [Archieve SQL_database_file_wise_I_O_latency]    Script Date: 24/6/2017 11:41:48 AM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Archieve SQL_database_file_wise_I_O_latency', 
		@step_id=2, 
		@cmdexec_success_code=0, 
		@on_success_action=1, 
		@on_success_step_id=0, 
		@on_fail_action=2, 
		@on_fail_step_id=0, 
		@retry_attempts=0, 
		@retry_interval=0, 
		@os_run_priority=0, @subsystem=N'TSQL', 
		@command=N'DELETE FROM master.dbo.SQL_database_file_wise_I_O_latency
WHERE capturetime <=DATEADD(WEEK,-2,GETDATE());', 
		@database_name=N'master', 
		@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Schedule - SQL_database_file_wise_I_O_latency', 
		@enabled=1, 
		@freq_type=4, 
		@freq_interval=1, 
		@freq_subday_type=4, 
		@freq_subday_interval=10, 
		@freq_relative_interval=0, 
		@freq_recurrence_factor=0, 
		@active_start_date=20170624, 
		@active_end_date=99991231, 
		@active_start_time=0, 
		@active_end_time=235959, 
		@schedule_uid=N'74653a85-dca0-427d-b97f-2dc995402d3d'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:

GO
  • Once you run the above script successfully, verify the created job.

  • Verify the job steps, it will contain two steps – Capturing & archiving the data

 

  • Check the job schedule – By default, it will be capturing the data every day every 10 minutes.

Do you have any better way to capture the data? Please share via comments to help others.

Leave a comment

Your email address will not be published. Required fields are marked *