I’ve just finished watching Gems to Help You Troubleshoot Query Performance by Pedro Lopes in the GroupBy Channel on YouTube (training resources for SQL Server don’t come much better, and it’s free – go check it out). Pedro works in the Microsoft MSSQL Tiger Team and talks through the work they have done in the last year or so adding new diagnostic improvements in the SQL Server engine so that troubleshooting can be more focused and faster (as he describes it, fewer round trips). I’ll show you how you can query SQL Server Showplan XML for some of the new diagnostic information.
I’m not going to list all the improvements here, recommend you watch the video for that, but what caught my interest was the incorporation of actual IO and time statistics in the Actual Execution Plan in SQL Server 2016/SQL Server 2014 SP2.
Actual IO and Time Statistics
Using SET STATISTICS IO, TIME ON is the first thing I will use to check the performance of any newly written TSQL – it gives me the time required to parse, compile, and execute each statement and the amount of disk activity.
Even if your SQL code runs in lightning fast time, hardware these days can mask poorly written code, in particular against test data which is probably a fraction of the size of Production – check the IO messages for unnecessary table scans or round trips.
Running the statement below in the query window before executing your code will output the IO and time statistics to the Messages tab.
SET STATISTICS IO, TIME ON;
Here is the output for the sample query further down in this post.
SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. (19392 row(s) affected) Table 'FactProductInventory'. Scan count 5, logical reads 6097, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (1 row(s) affected) SQL Server Execution Times: CPU time = 269 ms, elapsed time = 105 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
The output is easy to read as the query is trivial, but the more complicated your SQL becomes, the harder it will be to make sense of the output. I cut and paste the messages into this free tool http://statisticsparser.com to produce a report. Below is the summary taken from the end of the report – much easier to read.
Statistics available on an operator and thread level
We now have statistics available on an operator and thread level in the Actual Execution Plan – a great help for anyone who thinks they have a parallelism issue and needs to investigate CXPACKET waits, as it can identify an imbalanced workload across the threads (caused by out-of-date statistics or other cardinality estimation problems).
Don’t believe everything you read on the internet and in a knee-jerk reaction switch-off parallelism, there is a difference between good and bad CXPACKET waits. The bad CXPACKET waits occur when one or more threads have finished and are waiting for remaining threads who have taken on too much work.
For further information on this wait type and others follow this link https://www.sqlskills.com/help/waits for Paul Randall’s waits library.
use [AdventureWorksDW2012] GO SET STATISTICS IO, TIME ON; GO -- Any test query that uses parallelism so we can look at the thread information in the query plan select * from [dbo].[FactProductInventory] where [MovementDate] between '01 Jun 2005' and '31 Jul 2005';
Here is the query plan for the above query and screenshots showing the persisted IO and time statistics for the Clustered Index Scan operator.
Are you sure I can’t see anything?
You may not have noticed these improvements unless you have been digging around in the Showplan XML as SQL Server Management Studio (SSMS) did not expose this information until version 16.5 released in October.
Query SQL Server Showplan XML
The query plan is just an XML document so therefore we can use XQUERY to mine and retrieve the new diagnostic information.
For this to work I saved my execution plan as testquery.sqlplan on the local filesystem at /var/opt/mssql.
NB I’m not using backslashes in the filename as I performed this demo on SQL Server running on Linux – slashes can catch me out when changing between environments
If you want to learn how to setup the same environment using Docker on your laptop, check out my previous post Installing SQL Server 2017 in a Linux container using Docker on Windows 10
declare @queryplan table ([ShowPlanXML] XML); -- bulk insert XML document into my table variable insert into @queryplan([ShowPlanXML]) select CONVERT(XML, BulkColumn) as QueryPlan from OPENROWSET(BULK '/var/opt/mssql/testquery.sqlplan', SINGLE_BLOB) as x; -- mine for the new diagnostic information using XQUERY ;with XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan') select RelOp.pln.value(N'@NodeId', N'int') as "NodeId" , RelOp.pln.value(N'@PhysicalOp', N'varchar(50)') as "OperatorName" , PerThread.c.value(N'@Thread', N'int') as "Thread" , PerThread.c.value(N'@ActualRows', N'int') as "ActualRows" , PerThread.c.value(N'@ActualRowsRead', N'int') as "ActualRowsRead" , PerThread.c.value(N'@ActualElapsedms', N'int') as "ActualElapsedms" , PerThread.c.value(N'@ActualCPUms', N'int') as "ActualCPUms" , PerThread.c.value(N'@ActualScans', N'int') as "ActualScans" , PerThread.c.value(N'@ActualLogicalReads', N'int') as "ActualLogicalReads" , PerThread.c.value(N'@ActualPhysicalReads', N'int') as "ActualPhysicalReads" , PerThread.c.value(N'@ActualReadAheads', N'int') as "ActualReadAheads" , PerThread.c.value(N'@ActualLobLogicalReads', N'int') as "ActualLobLogicalReads" , PerThread.c.value(N'@ActualLobPhysicalReads', N'int') as "ActualLobPhysicalReads" , PerThread.c.value(N'@ActualLobReadAheads', N'int') as "ActualLobReadAheads" from @queryplan AS queryplan cross apply queryplan.ShowPlanXML.nodes(N'//RelOp') RelOp(pln) cross apply RelOp.pln.nodes('RunTimeInformation/RunTimeCountersPerThread') as PerThread(c);
Feel free to edit it as you please and if you make a better version send a copy my way.