As part of lifecycle management (LCM), we replaced several Citrix NetScaler appliances with new ones. Although we ran thorough acceptance tests before putting them into production, we ran into an annoying issue once they were live.
Some users complained that they saw a spinning progress bar after successfully logging on to the Citrix NetScaler. Only a minority of users reported it, and refreshing the browser session resolved it. Eventually users stopped reporting the issue altogether, because it occurred infrequently and the workaround was simple: just press F5. Even so, we started an investigation to resolve the issue for good.
A brief overview of the setup: we used a Citrix NetScaler GSLB configuration with an active/passive Citrix NetScaler pair in each data center. The four Citrix StoreFront servers, divided across the data centers, were also load balanced by the Citrix NetScaler.
We decided to investigate all the components in this order:
- Citrix StoreFront;
- StoreFront Load Balancing (on Citrix NetScaler);
- NetScaler Gateway;
- The network itself.
The Citrix Delivery Services event log on the StoreFront servers wasn't very helpful: when the issue occurred, nothing was logged that could help us further. Because the Citrix StoreFront servers were load balanced, we suspected the load balancing configuration, for example session persistence. To rule this out, we made sure that only one Citrix StoreFront server was in use at all times, but the issue still occurred.
The next component we investigated was the Citrix NetScaler. We configured additional logging so we could see exactly what traffic the Citrix NetScaler sent to the Citrix StoreFront server. This is where we found the first oddity: after the user was validated, the session policy was hit and a GET request was sent to the Citrix StoreFront server. Strangely, that GET request never appeared in the IIS logs on the Citrix StoreFront server; it seemed to get lost in transit.
We asked our network team to get involved, hoping they could explain what was happening to the GET request. Luckily, we have colleagues who live and breathe network traces and can decipher almost anything from them. After we gave them several dates and timestamps of when the issue had occurred, their traces revealed the following.
The Citrix NetScaler initiates a TLS 1.0 session to the Citrix StoreFront server, the StoreFront server replies with TLS 1.2, and the Citrix NetScaler then starts transmitting data.
Next, the Citrix NetScaler sends an ACK packet to the Citrix StoreFront server, and this step suddenly takes 229 ms. A new TLS handshake then follows because of a cipher change, taking another 74 ms. After that, the exchange continues and times drop back to the usual 10-15 ms.
Our network team analyzed several reported sessions, and they all showed the same pattern: each affected session took a long time to switch from TLS 1.0 to TLS 1.2 with the preferred cipher in place.
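To make the pattern concrete, here is a minimal Python sketch of how per-step timings read off a trace could be screened for outliers. The `TraceStep` record, the 15 ms baseline and the 3x threshold are hypothetical choices for illustration, not how our network team actually worked; the 229 ms and 74 ms values come from the affected session described above, and the 12 ms step is an illustrative normal value in the 10-15 ms range.

```python
from dataclasses import dataclass


@dataclass
class TraceStep:
    """One step of a back-end TLS session, as read off a packet capture (hypothetical format)."""
    label: str
    duration_ms: float


def flag_slow_steps(steps, baseline_ms=15.0, factor=3.0):
    """Return the steps that take more than `factor` times the normal per-step time."""
    threshold_ms = baseline_ms * factor
    return [step for step in steps if step.duration_ms > threshold_ms]


def extra_latency_ms(steps, baseline_ms=15.0):
    """Sum how far each step exceeds the baseline; steps at or below it add nothing."""
    return sum(max(0.0, step.duration_ms - baseline_ms) for step in steps)


if __name__ == "__main__":
    session = [
        TraceStep("ACK to StoreFront", 229.0),
        TraceStep("TLS re-handshake (cipher change)", 74.0),
        TraceStep("application data", 12.0),  # illustrative normal step
    ]
    for step in flag_slow_steps(session):
        print(f"slow: {step.label} ({step.duration_ms:.0f} ms)")
    print(f"extra latency: {extra_latency_ms(session):.0f} ms")
```

For this session the sketch flags the ACK and the re-handshake and reports roughly 273 ms of extra latency per affected connection, which is consistent with users seeing a progress bar that hangs.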
CTX322724: How to disable the tls1.1 and tls1.0 by SSL Profile
This led us to believe that a misconfigured TLS setting could be causing the issue. We decided to bind an SSL profile with only TLS 1.2 and our supported ciphers enabled to the StoreFront service groups. This way, the Citrix NetScaler uses a TLS 1.2 session for all communication with the Citrix StoreFront servers and never attempts TLS 1.0. Since then, users have stopped reporting the spinning progress bar, and the issue is solved!
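The fix itself lives in the NetScaler configuration; CTX322724 describes creating and binding the SSL profile. To double-check from another machine that a StoreFront server negotiates TLS 1.2 with an allowed cipher, you could connect with a client restricted in the same way. The sketch below uses only Python's standard `ssl` and `socket` modules. The cipher list, hostname and CA file are placeholders, not the values from our environment. It checks the StoreFront side only, not the NetScaler's own back-end behavior.

```python
import socket
import ssl

# Hypothetical cipher list; replace it with the ciphers your SSL profile actually allows.
ALLOWED_CIPHERS = "ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256"


def tls12_only_context(cafile=None, ciphers=ALLOWED_CIPHERS):
    """Client context that offers only TLS 1.2 and the given TLS 1.2 cipher suites."""
    ctx = ssl.create_default_context(cafile=cafile)  # verifies the server certificate and hostname
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2  # so TLS 1.3 is never negotiated
    ctx.set_ciphers(ciphers)  # restricts TLS 1.2 and lower cipher suites only
    return ctx


def probe(host, port=443, cafile=None, timeout=5.0):
    """Connect once and return (protocol version, cipher name) negotiated with the server.

    Raises ssl.SSLError if the server cannot agree on TLS 1.2 with one of the allowed ciphers.
    """
    ctx = tls12_only_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

For example, `probe("storefront01.example.local", cafile="internal-ca.pem")` (hypothetical hostname and CA file) should return a tuple starting with `'TLSv1.2'` when the server accepts the restricted handshake. A handshake failure means the server and this client share no TLS 1.2 cipher.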
Lesson learned: keep your TLS and cipher configuration consistent across all components, and have someone skilled in reading network traces examine an affected session! A network trace can tell you exactly what is going on in the network.