An interesting case about PVS retries…

This is a post about a couple of things I found at a customer site regarding PVS retries, and why I came to the conclusion that they do not matter as much as most people think. There is more you need to look at.

First some background information about the environment:

  • XenServer based. The pool hosting the XenApp VMs was running 6.1 and was upgraded to 6.5 SP1.
  • Two PVS Servers, virtual, with 16GB RAM each.
  • Virtual File Server hosting the vDisk on a CIFS share. 16GB RAM.

We noticed performance started to degrade once we upgraded the pool to XenServer 6.5 SP1. Users would complain about their sessions freezing for a while. After some investigation, I found the PVS retries to be on the high side. The first problem was deciding what counts as high. Some servers showed 500 or more retries, but over a period of eight to ten hours. That works out to roughly one retry per minute. This could not be the culprit. Plenty of network devices retransmit data every single minute, at much worse rates, and no one notices, even with real-time audio and video in the mix.

That is when I decided to take a look at the servers themselves. A quick search of the ‘System’ Event Log showed the following entries for ‘bnistack’:

[MIoWorkerThread] Too many retries Initiate reconnect.

And right after:

[IosReconnectHA]  HA Reconnect in progress.

So what is happening here? After a certain number of retries (I have yet to find out what that number is), the PVS target device driver on the XenApp server automatically triggers a reconnection to another PVS server, which we know is not instant. It takes a couple of seconds, and that is exactly why users would experience their sessions ‘freezing’. After asking the users to write down the times when this happened, we could clearly see it was exactly when the reconnection process was triggered.
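To line up user complaints with reconnects, it helps to pull just those two messages out of the log and sort them by time. The following is a minimal sketch, not something from the original troubleshooting: the function name Select-HaReconnectEvent is made up, the sample records are hand-made, and it assumes each record has TimeCreated and Message properties (as the objects returned by Get-WinEvent do).

```powershell
# Sketch: pick out bnistack HA reconnect events from a list of event records.
# Assumption: each record exposes TimeCreated and Message properties.
function Select-HaReconnectEvent {
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyCollection()]
        [object[]]$Events
    )
    # The two message fragments quoted above from the System log.
    $pattern = 'Too many retries|HA Reconnect in progress'
    $Events |
        Where-Object { $_.Message -match $pattern } |
        Sort-Object TimeCreated
}

# Hand-made sample records (illustrative timestamps only).
$sample = @(
    [pscustomobject]@{ TimeCreated = [datetime]'2015-06-01 10:15:02'; Message = '[IosReconnectHA]  HA Reconnect in progress.' }
    [pscustomobject]@{ TimeCreated = [datetime]'2015-06-01 10:15:01'; Message = '[MIoWorkerThread] Too many retries Initiate reconnect.' }
    [pscustomobject]@{ TimeCreated = [datetime]'2015-06-01 09:00:00'; Message = 'Some unrelated event.' }
)
$hits = @(Select-HaReconnectEvent -Events $sample)
$hits | Format-Table TimeCreated, Message -AutoSize
```

On a real target device you would feed it the System log entries instead of the sample array, then compare the timestamps with what the users wrote down.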

If you right-click a vDisk on your PVS store and select ‘Show usage’, it shows the retries but, more importantly, it shows which server each device is connected to. That is when I started monitoring whether the connection would change during the day and bingo: whenever it changed, users would complain.
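The same monitoring can be done by comparing two snapshots of the device-to-server mapping. This is only a sketch under assumptions: the snapshots are plain hashtables you fill in yourself (from ‘Show usage’ or whatever source you have), and the function name Compare-PvsServerSnapshot plus the device and server names are hypothetical.

```powershell
# Sketch: report devices whose PVS server changed between two snapshots.
# Assumption: each snapshot is a hashtable of device name -> PVS server name.
function Compare-PvsServerSnapshot {
    param(
        [Parameter(Mandatory = $true)][hashtable]$Before,
        [Parameter(Mandatory = $true)][hashtable]$After
    )
    foreach ($device in ($After.Keys | Sort-Object)) {
        # Only devices present in both snapshots can have "moved".
        if ($Before.ContainsKey($device) -and $Before[$device] -ne $After[$device]) {
            [pscustomobject]@{
                Device    = $device
                OldServer = $Before[$device]
                NewServer = $After[$device]
            }
        }
    }
}

# Hypothetical snapshots taken a few hours apart.
$morning   = @{ 'XA01' = 'PVS01'; 'XA02' = 'PVS02'; 'XA03' = 'PVS01' }
$afternoon = @{ 'XA01' = 'PVS01'; 'XA02' = 'PVS01'; 'XA03' = 'PVS01' }
$moved = @(Compare-PvsServerSnapshot -Before $morning -After $afternoon)
$moved | Format-Table -AutoSize
```

Any device that shows up in the output flipped servers between the two snapshots, which is the event that lines up with the user complaints.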

Now what? We knew what the issue was but why was this happening?

We started thinking about the XenServer 6.5 SP1 upgrade, as that was the only thing that had changed. Our PVS image had only a couple of versions (three, I think): the base one had the XenServer 6.1 tools and the latest one had the 6.5 SP1 tools. That is when I decided to merge all versions (again, just a few, under the default threshold). Once I did that, retries dropped dramatically, to under 20 per day for 90% of the devices. Even the remaining ones fell to under 50 a day. Much better, and no more HA reconnections.

The lesson learned here: if your base image has one version of the XenServer tools and one of the PVS image versions has a different one, you had better merge everything right after the upgrade is done.

The other really odd thing happened once I merged the image. I brought it back to the XenServer host as a new VM (so you can easily update the PVS Target Device software to a newer one), tried to start it and got a blue screen. Once more, thinking the upgrade could have caused the problem, I decided to get the VM UUID and change its device ID by using xe vm-param-set uuid= platform:device_id=0002. That fixed the BSOD.

I am still not sure why having different XenServer tools on different versions would cause the much higher retries, but I know for sure that merging fixed all that.

To sum up: PVS retries are something you do need to monitor, but just looking at the numbers may not tell you anything (unless you are seeing several retries per second). Also keep in mind it is all UDP based… The really important thing is the HA kicking in and flipping the PVS server the target is connected to. That is what causes the famous hangs and freezes on the devices.

And yes, ideally always merge your images after some major change like hypervisor tools. 🙂

CR


PVS Retries – Script

As PVS retries can indeed cause all sorts of degradation to the user experience (e.g. applications freezing or overall slowness) and they are not readily exposed on any of the Citrix monitoring/management consoles (even the PVS console does not show that info, or Director for that matter), I decided to write this little PowerShell script to get that information and show it in a nice graph. This is what it looks like:

[Image: PVSGraph]

Couple comments:

  • What is considered high/normal/low for retries? I have no idea if anyone ever came up with a number. Also keep in mind the number returned by PVS counts from when the machine booted up or from when someone last reset the counter. So 1000 retries over 10 days is not a big deal if you ask me, but 1000 in 5 minutes means something is indeed wrong. I would love to hear what others have to say.
  • I could (and should) calculate and show retries/min instead of just retries. It is simply a matter of retrieving the uptime of the server, converting it to minutes and dividing the retries by that.
  • I assume you know how to get the MS Chart .NET libraries/PVS stuff registered/working.
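The retries/min idea from the list above can be sketched as follows. This is an illustration only: the function name Get-RetriesPerMinute is made up, and the uptime is passed in as a TimeSpan because how you retrieve it for each target is up to you.

```powershell
# Sketch: turn a raw retry counter into retries per minute.
# Assumption: the caller already knows how long the counter has been running.
function Get-RetriesPerMinute {
    param(
        [Parameter(Mandatory = $true)][int]$Retries,
        [Parameter(Mandatory = $true)][timespan]$Uptime
    )
    # Guard against a zero or negative uptime to avoid dividing by zero.
    if ($Uptime.TotalMinutes -le 0) { return 0 }
    [math]::Round($Retries / $Uptime.TotalMinutes, 2)
}

# The two cases mentioned above: 1000 retries over 10 days vs. over 5 minutes.
$slow = Get-RetriesPerMinute -Retries 1000 -Uptime (New-TimeSpan -Days 10)
$fast = Get-RetriesPerMinute -Retries 1000 -Uptime (New-TimeSpan -Minutes 5)
"10 days:   $slow retries/min"
"5 minutes: $fast retries/min"
```

The first case comes out well under one retry per minute, while the second is 200 per minute, which makes the difference between the two much more obvious than the raw counter does.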

So here is the code:

# This function was created by Remko, another Citrix CTP
# and probably the craziest motherfucker I have ever met.
# As the PVS PowerShell sucks, not even returning proper objects
# people like Remko took matters on their own hands.
# You can see his post here.
# http://www.remkoweijnen.nl/blog/2012/02/29/convert-mcli-output-into-powershell-objects/

function ToObject {
    param(
     [Parameter(
          Position=0,
          Mandatory=$false,
          ValueFromPipeline=$true,
          ValueFromPipelineByPropertyName=$true)
    ]
    [Alias('Command')]
    [string]$cmd
    )
 
     $collection = @()
     $item = $null
 
     switch -regex (Invoke-Expression $cmd)
     {
          "^Record\s#\d+$"
          {
                if ($item) {$collection += $item}
                $item = New-Object System.Object
          }
          "^(?<name>\w+):\s(?<value>.*)"
          {
                if ($Matches.Name -ne "Executing")
                {
                     $item | Add-Member -Type NoteProperty -Name $Matches.Name -Value $Matches.Value
                }
          }
     }
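     # Add the last record too; the loop above only stores an item
     # when the next "Record #n" header shows up.
     if ($item) {$collection += $item}
     return $collection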
}


# Loads the appropriate assemblies
[void][Reflection.Assembly]::LoadWithPartialName("System.Windows.Forms")
[void][Reflection.Assembly]::LoadWithPartialName("System.Windows.Forms.DataVisualization")
Add-PSSnapin -Name McliPSSnapIn -ErrorAction SilentlyContinue
Mcli-Run SetupConnection -p server="ENTER YOUR PVS SERVER FQDN HERE (i.e. PVS01.Company.com)"
$XAServers = 'Mcli-Get DeviceInfo -p siteName="YOUR PVS SITE NAME",collectionName="DEVICE COLLECTION YOUR VMs ARE IN"' | ToObject

# Creates chart object
 $Chart = New-object System.Windows.Forms.DataVisualization.Charting.Chart
 $Chart.Width = 1000
 $Chart.Height = 600
 $Chart.Left = 10
 $Chart.Top = 10

# Creates a chartarea to draw on and add to chart
 $ChartArea = New-Object System.Windows.Forms.DataVisualization.Charting.ChartArea
 $ChartArea.AxisX.Interval = 1
 $ChartArea.AxisX.Title = "Servers"
 $ChartArea.AxisY.Interval = 50
 $ChartArea.AxisY.Title = "PVS Retries"
 $Chart.ChartAreas.Add($ChartArea)
 [void]$Chart.Series.Add("Data")
 $Chart.Series["Data"]["DrawingStyle"] = "Cylinder"

# Adds a data point for each server
 foreach ($server in $XAServers)
 {
 
 $dp1 = new-object System.Windows.Forms.DataVisualization.Charting.DataPoint(0, $server.status)
 
 # For my particular needs I assumed the retries as this:
 # Good, under 100. Attention, between 100 and 300. Bad, over 300.
 # Am I right? No clue. Please comment/contribute with your findings.
 
 If ([int]$server.status -lt 101) 
    {
     $dp1.Color = [System.Drawing.Color]::Green
    }
   Else
    {
     If ([int]$server.status -gt 100 -and [int]$server.status -lt 301)
        {
         $dp1.Color = [System.Drawing.Color]::Yellow
        }
       Else
        {
         $dp1.Color = [System.Drawing.Color]::Red
        }
    }

 $xlabel = $server.deviceName
 $dp1.AxisLabel = $xlabel
 $Chart.Series["Data"].Points.Add($dp1)
 }
 # Sets the title to the date and time
 $title = new-object System.Windows.Forms.DataVisualization.Charting.Title
 $Chart.Titles.Add( $title )
 $Chart.Titles[0].Text = (Get-Date).ToString()

# Saves the chart to a file on the server where the script runs.
# Could be anywhere, even UNC path.
 $Chart.SaveImage("C:\Graph\FarmRetries.png","png")

It can certainly be improved and I will work on that. For now, give it a try and let me know what you think.

CR


Citrix PVS Image Copy

If you built your Citrix environment properly, you should have by now at least a test environment and a production one. And if PVS is part of your deployment, the same applies to it. A development PVS and a production one.

If you do not see why you would need a test environment, separate from your production one, please stop here. This article is not for you. For sure.

That said, one of the tasks I usually have to deal with is moving images from one PVS environment to another. As mentioned previously, this usually means moving something from a test/development environment to production once it is deemed ‘good-to-go’.

To make my life easier I wrote a simple script that takes a PVS image from a particular environment/store and copies it to another one. It takes care of exporting, copying and importing the vDisk for you. Simple but effective.

Here you have it:

 

=== BEGIN ===

#
# Copies a vDisk between PVS environments.
# Cláudio Rodrigues 2014-12-24 V1.0
#

<#
.SYNOPSIS
CopyvDisk 1.0
IQBridge Inc., 2014. All Rights Reserved.
.DESCRIPTION
PowerShell script to move a vDisk from a PVS Farm to another one.
.PARAMETER vDiskName
The name of the vDisk you want to copy.
.PARAMETER SourceEnv
The PVS Environment where your vDisk is currently used.
.PARAMETER SourceStore
The PVS Store where the vDisk you want to copy is located.
.PARAMETER DestEnv
The PVS Environment that will use the vDisk.
.PARAMETER DestStore
The PVS Store where the vDisk you want to copy will be saved.
.EXAMPLE
C:\PS>
.\CopyvDisk.ps1 XenApp65V2 DEV Development PROD Production
Copies the vDisk XenApp65V2 from the DEV environment, out of the Development store
to the Production Store in PROD.

.NOTES
Author: Cláudio Rodrigues
Date:   December 24, 2014
#>

Param(
[Parameter(Mandatory=$True, HelpMessage="The vDisk to be copied")]$vDiskName,
[Parameter(Mandatory=$True, HelpMessage="Source PVS Environment")]$SourceEnv,
[Parameter(Mandatory=$True, HelpMessage="Store where the vDisk resides")]$SourceStore,
[Parameter(Mandatory=$True, HelpMessage="Destination PVS Environment")]$DestEnv,
[Parameter(Mandatory=$True, HelpMessage="Store the vDisk will be copied to")]$DestStore
)

Switch ($SourceEnv)
{
PROD { $SourceServer = "prodpvs.yourcompany.com" }
DEV  { $SourceServer = "devpvs.yourcompany.com" }
}

Switch ($DestEnv)
{
PROD { $DestServer = "prodpvs.yourcompany.com" }
DEV  { $DestServer = "devpvs.yourcompany.com" }
}

Add-PSSnapin -Name McliPSSnapIn -ErrorAction SilentlyContinue
Mcli-Run SetupConnection -p server=$SourceServer

$TempPath = Mcli-Get Store -p storeName=$SourceStore -f path
$SourcePath = $TempPath[4].SubString(6)
Mcli-Run ExportDisk -p diskLocatorName=$vDiskName, siteName=YOUR_SITE_NAME, storeName=$SourceStore
Mcli-Run SetupConnection -p server=$DestServer

$TempPath = Mcli-Get Store -p storeName=$DestStore -f path
$DestPath = $TempPath[4].SubString(6)

c:\windows\system32\robocopy $SourcePath $DestPath "$vDiskName.*" /MIR /xo /XF *.lok /XD WriteCache

Mcli-RunWithReturn ImportDisk -p diskLocatorName=$vDiskName, siteName=YOUR_SITE_NAME, storeName=$DestStore

Mcli-Run UnloadConnection

=== END ===

This is what you will need to change:
– If you have multiple environments (i.e. Development, Test, Pre-Production, Production, etc) you will need to add all of them by their name/code and the PVS server that is part of the environment. This is done where you see the ‘Switch’ statement. In this example I have two environments, named PROD and DEV and each one has its own PVS server.
– The site name. Replace YOUR_SITE_NAME with the correct name for your PVS Site. This script assumes the Site Name is the same across all environments (I see no reason for it to be different – if you have a reason please let us know in the comments).
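If you end up with more than a couple of environments, a lookup table may be easier to maintain than two identical ‘Switch’ blocks, and it lets the script fail early on a mistyped environment code. This is a sketch of that alternative, not what the script above does; the names $PvsServers and Resolve-PvsServer are hypothetical, and the server names are the same placeholders used in the script.

```powershell
# Sketch: one table of environment code -> PVS server, used for both
# the source and the destination. All names here are placeholders.
$PvsServers = @{
    PROD = 'prodpvs.yourcompany.com'
    DEV  = 'devpvs.yourcompany.com'
}

function Resolve-PvsServer {
    param([Parameter(Mandatory = $true)][string]$Environment)
    # Fail loudly on an unknown code instead of continuing with an empty server name.
    if (-not $PvsServers.ContainsKey($Environment)) {
        $known = ($PvsServers.Keys | Sort-Object) -join ', '
        throw "Unknown PVS environment '$Environment'. Known environments: $known"
    }
    $PvsServers[$Environment]
}

$SourceServer = Resolve-PvsServer -Environment 'DEV'
$DestServer   = Resolve-PvsServer -Environment 'PROD'
"Copying from $SourceServer to $DestServer"
```

Adding a new environment then means adding one line to the table instead of touching two ‘Switch’ statements.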

The script takes five (5) parameters:
– vDisk name: the name you have for the vDisk on the PVS console, like XenApp65-v1.
– The source environment: this has to match one of the names/codes you added to the ‘Switch ($SourceEnv)’ line. In this example I created one called PROD where the PVS Server for that is prodpvs.yourcompany.com and another one called DEV (with devpvs.yourcompany.com as the PVS Server). You can name these anything you want. I used PROD and DEV as these make sense to me.
– The source store: under PVS you have your stores where the vDisks reside. Here you pass the store where the vDisk you want to copy is.
– The target environment: the environment (as explained under source environment) the vDisk will be copied to.
– The target store: under which store on the target PVS environment you want the vDisk to be copied to.

Couple comments:

– You must make sure a vDisk with the same name does not exist on the target store. Otherwise it will fail. Yes, I am lazy and I could have added logic to the script to check for that and copy it somewhere else (or delete it) before doing the copy/import. I did not do it. Yes, because I am lazy and today is Christmas Eve.
– There is not much error checking on the script as the script assumes you know what you are doing and if things are passed properly it works flawlessly. So yes, I do not save your ass if you do not know shit. Keep that in mind.
– Of course the images have to be environment agnostic (meaning the database/farm settings will be dumped by GPO to allow you moving PVS images anywhere).
– The images have to be part of the same domain right?
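The missing check for an existing vDisk on the target store could look something like the sketch below. It only looks at files in the store folder, not at the PVS database, so treat it as a partial safeguard. The function name Test-VDiskFileExists is made up, and the demo uses a throwaway temp folder and a hypothetical second vDisk name (Win7Gold) instead of a real store path.

```powershell
# Sketch: check whether any file named "<vDiskName>.*" already sits in a store folder.
# Assumption: the store path is reachable from where the script runs.
function Test-VDiskFileExists {
    param(
        [Parameter(Mandatory = $true)][string]$StorePath,
        [Parameter(Mandatory = $true)][string]$VDiskName
    )
    $found = @(Get-ChildItem -Path $StorePath -Filter "$VDiskName.*" -File -ErrorAction SilentlyContinue)
    $found.Count -gt 0
}

# Demo against a temporary folder standing in for the target store.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ("pvsdemo_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $demo | Out-Null
New-Item -ItemType File -Path (Join-Path $demo 'XenApp65V2.vhd') | Out-Null

$exists  = Test-VDiskFileExists -StorePath $demo -VDiskName 'XenApp65V2'
$missing = Test-VDiskFileExists -StorePath $demo -VDiskName 'Win7Gold'
"XenApp65V2 present: $exists"
"Win7Gold present:   $missing"

Remove-Item -Path $demo -Recurse -Force
```

In the copy script, a call like this right before the robocopy line (using the destination path) would let you stop with a clear message instead of failing on the import.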

Other than that a very simple script that has helped many of my customers over the years!

Time to celebrate Christmas.

Cheers.

CR


PVS 6.1 Hell

I am currently working on a XenApp 6.5 design for a government agency here in Canada and as part of the whole thing we decided to use the latest and greatest platforms available. Yes, some may say this is a crazy approach. But my take is if I will deal with bugs, better deal with new ones than the old stuff. So we moved ahead and got VMware ESXi 5.0 as the virtualization platform and XenApp 6.5 (that is not that new) with PVS 6.1 (this is new, introduced with XenDesktop 5.6).

Everything seemed to be working fine. Creating and blowing up images, etc. All cool. The catch: the VMXNET NIC is supposed to be the best option, supported by all vendors and so on. You know the drill. So we decided to use the best performing, most supported NIC, and that is where the nightmare started.

First problem encountered, after running the PVS Imaging Wizard and restarting the VM so the VHD creation would start, no matter what we would get the stupid “The vDisk is not available”. Tried everything you can imagine, from hiring a voodoo guy from Haiti to making sure the server was facing Mecca. No luck.

Once I had the brilliant idea to use the E1000 that damn P2PVS started working. Great. One problem down, a much bigger one to go.

All VMs to be deployed using PVS would blue screen, with the exception of the one we created the image from. This was a known issue with PVS 5.6, addressed by hotfix CPVS56SP1E011. Thing is, this is PVS 6.1, so that issue should not exist, correct? Well, wrong. The same root cause still applies to PVS 6.1.

We opened the VM settings for the VMs that would BSOD when booting off the PVS image and changed the ethernet PCI value to match the one shown on the VM used as the master for the image. After that, everything started working perfectly.

So the lessons learned here:

1. Stick to the E1000 driver on your PVS images. Will probably make your life easier.
2. Make sure the ethernet PCI matches the master VM (found under Options, Advanced, General). Once that is done there will be no BSODs on your VMs.
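Checking the second lesson by hand gets tedious with many VMs, so here is a rough sketch of how the comparison could be scripted. Loud assumption: it treats the ‘ethernet PCI’ setting as the ethernet0.pciSlotNumber entry in the VM's .vmx configuration, which may not be the exact setting on your version. The function name Get-EthernetPciSlot is hypothetical and the sample lines and slot values are made up for illustration.

```powershell
# Sketch: read the PCI slot of the first NIC from the lines of a .vmx file.
# Assumption: the setting appears as  ethernet0.pciSlotNumber = "<number>".
function Get-EthernetPciSlot {
    param([Parameter(Mandatory = $true)][string[]]$VmxLines)
    foreach ($line in $VmxLines) {
        if ($line -match '^\s*ethernet0\.pciSlotNumber\s*=\s*"(?<slot>\d+)"') {
            return [int]$Matches['slot']
        }
    }
    # No such entry found.
    return $null
}

# Made-up configuration lines for a master VM and a deployed VM.
$master = @('ethernet0.virtualDev = "e1000"', 'ethernet0.pciSlotNumber = "32"')
$clone  = @('ethernet0.virtualDev = "e1000"', 'ethernet0.pciSlotNumber = "33"')

$masterSlot = Get-EthernetPciSlot -VmxLines $master
$cloneSlot  = Get-EthernetPciSlot -VmxLines $clone
$slotsMatch = $masterSlot -eq $cloneSlot
"Master slot: $masterSlot, clone slot: $cloneSlot, match: $slotsMatch"
```

With real VMs you would pass in the contents of each .vmx file (for example via Get-Content) and flag every VM whose value differs from the master.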

CR
