Troubleshoot Active Directory authentication issues
Question:
How can I troubleshoot Active Directory (AD) authentication issues?
Answer:
AD authentication enables users to log in to Nexthink using their AD credentials:
The same username and password that they use to log in to their devices or services
Users must be added to the Portal in advance, either manually or through AD Provisioning
Before moving on to the troubleshooting guides, make sure that you have completed the basic prerequisite and configuration checks.
Pre-requisites
All the requirements are listed in our Documentation. In addition, a few points are worth highlighting:
The product has to be licensed: the license must be applied beforehand, and endpoint licenses allocated to the Engine(s) in the Portal Administration > License page
AD accounts must exist on the AD server and must also exist as Portal user accounts
Kerberos Configuration
The Active Directory Authentication type we use is Kerberos. Therefore, your AD server must accept this authentication type as well.
If no Kerberos configuration is defined, Portal will retrieve it from the AD Server and try the login based on all the possible Key Distribution Centers (KDCs). In big organizations, this could cause delays, since one multinational organization can have dozens of KDCs spread around the world.
A common approach is to leave the Kerberos configuration undefined on the client side (in this case, the Nexthink Portal). This generally causes timeouts during the authentication process.
Therefore, we highly recommend specifying which KDCs will be used for the Portal to authenticate users via AD.
Manual Configuration
You can configure the Portal to use specific KDC servers for AD authentication and override the DNS resolution. If several Kerberos servers are available, use this procedure to ensure that the Portal will try AD authentication on every specified server that you know Portal will have good connectivity with (e.g. lower latency).
Unlike what happens with DNS resolution, where the Portal stops if the resolved KDC does not give a response, the manual configuration makes the Portal try user authentication on the next configured KDC, in the case that the previous server fails.
In order to configure a set of KDC servers for the Portal, you have to create a configuration file called krb5.conf in /var/nexthink/portal/conf.
You can start from the template file krb5_default.conf that is located in the same directory and adapt it to your specific infrastructure.
The following is an example of a krb5.conf file:
[logging]
default = CONSOLE
kdc = CONSOLE
admin_server = CONSOLE
[libdefaults]
default_realm = ad.example.com
dns_lookup_realm = true
dns_lookup_kdc = true
ticket_lifetime = 24h
forwardable = yes
realm_try_domains = 0
kdc_timeout = 30000
max_retries = 3
[realms]
DOMAIN.COM = {
kdc = ad1.domain.com
kdc = ad2.domain.com
kdc = ad3.domain.com
admin_server = ad1.domain.com
}
[domain_realm]
domain.com = DOMAIN.COM
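To double-check which KDC servers a given realm will use, you can list the kdc entries of that realm from the file. The following is only a minimal sketch: list_kdcs is a hypothetical helper name, and it assumes the multi-line layout shown in the example above (opening brace at the end of the realm line, closing brace on its own line).

```shell
# List the kdc entries defined for one realm in a krb5.conf-style file.
# Usage: list_kdcs <path to krb5.conf> <REALM>
# Assumption: the realm block uses the multi-line layout shown in the example.
list_kdcs() {
    awk -v realm="$2" '
        # A new [section] header: remember whether we are inside [realms].
        /^[ \t]*\[/ { in_realms = ($0 ~ /^[ \t]*\[realms\]/); in_block = 0; next }
        # Start of the block for the requested realm.
        in_realms && $1 == realm && index($0, "{") { in_block = 1; next }
        # End of the current realm block.
        in_block && index($0, "}") { in_block = 0; next }
        in_block {
            line = $0
            gsub(/[ \t]/, "", line)            # "kdc = host" becomes "kdc=host"
            if (line ~ /^kdc=/) { sub(/^kdc=/, "", line); print line }
        }
    ' "$1"
}
```

For example, list_kdcs /var/nexthink/portal/conf/krb5.conf DOMAIN.COM would print one KDC host name per line, which you can then test for connectivity one by one.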
NOTE: Since V6.8, using a specific krb5.conf file requires an additional change in the Portal configuration.
Check whether the file /var/nexthink/portal/conf/krb5.conf exists.
If so, make sure that the following configuration parameter is set in /var/nexthink/portal/conf/portal.conf:
globalconfig.sso.kerberos.force-user-realm = false
If it is not set, set it and restart the Portal:
sudo systemctl restart nxportal
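The check described in this note can be scripted. The sketch below is illustrative only: check_force_user_realm is a hypothetical helper name, and it assumes that the parameter is written in portal.conf exactly in the key = value form shown above (a quoted or differently formatted value would be reported as needing a change).

```shell
# Report whether a custom krb5.conf is present and, if so, whether portal.conf
# sets force-user-realm to false.
# Usage: check_force_user_realm [conf directory]
# Prints one of: no-krb5-conf, ok, needs-change
check_force_user_realm() {
    conf_dir="${1:-/var/nexthink/portal/conf}"
    if [ ! -f "$conf_dir/krb5.conf" ]; then
        # No custom Kerberos configuration, so the extra parameter is not required.
        echo "no-krb5-conf"
    elif grep -Eq '^[[:space:]]*globalconfig\.sso\.kerberos\.force-user-realm[[:space:]]*=[[:space:]]*false[[:space:]]*$' "$conf_dir/portal.conf" 2>/dev/null; then
        echo "ok"
    else
        # Assumption: anything other than an explicit "= false" needs attention.
        echo "needs-change"
    fi
}
```

If the function prints needs-change, edit portal.conf as described and restart the Portal.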
Concurrency and timeout configuration
If a KDC server is not responding, the Portal will time out the connection and try the next KDC in the list. If several servers fail, the authentication process can take a relatively long time. Any other AD user who tries to authenticate concurrently will have to wait until the process is complete. On the other hand, local users, such as the admin account, do not have to wait for AD authentication to finish in order to log in to the Portal.
You may specify the timeout limit in the krb5.conf file:
...
[libdefaults]
...
kdc_timeout = 30000
max_retries = 3
...
Set the authentication timeout (kdc_timeout) and the number of retries (max_retries) to the values that suit your environment.
We normally suggest the values shown in the example above.
Restart the Portal for the changes to take effect:
sudo systemctl restart nxportal
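To get a feeling for how these two settings add up when servers do not respond, you can estimate a rough upper bound on the waiting time. This is a back-of-the-envelope sketch, not a description of the exact Portal behavior: worst_case_wait_seconds is a hypothetical helper, and it assumes that kdc_timeout is expressed in milliseconds and that every KDC is tried max_retries times before the next one is attempted.

```shell
# Rough upper bound, in seconds, on the wait when none of the KDCs answers.
# Usage: worst_case_wait_seconds <number of KDCs> <kdc_timeout in ms> <max_retries>
# Assumption: each KDC is tried max_retries times, waiting kdc_timeout ms per try.
worst_case_wait_seconds() {
    kdc_count=$1
    timeout_ms=$2
    retries=$3
    echo $(( kdc_count * timeout_ms * retries / 1000 ))
}
```

Under these assumptions, three KDCs with the example values (worst_case_wait_seconds 3 30000 3) give 270 seconds, which illustrates why it helps to list only servers that are known to respond.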
Alternate UPN suffixes configuration (if applicable)
We support using alternate UPN suffixes. However, many authentication issues occur because the Portal was not configured to identify them as such. Other issues can be related to the fact that the alternate UPN suffix used for authentication is not configured in the AD server, or for that particular user.
Troubleshooting Guide
The following sections help you troubleshoot Active Directory authentication issues that occur when trying to log in to the Finder, the Portal, or both.
If you have any doubts about the outputs provided by the test commands and checks in the guide, please ask the Product Support Team and provide:
Support script output from the Portal Appliance
Details on the steps done so far (incl. checks above)
The outputs of your commands
Connectivity checks
Connect to the Appliance using the Command Line Interface (CLI) or SSH.
Check the /etc/resolv.conf file to identify the DNS servers configured on the Appliance
Command:
grep nameserver /etc/resolv.conf
Sample output:
nameserver 10.10.4.2
nameserver 10.10.10.212
nameserver 10.10.22.212
Test the connectivity with the DNS servers
Command:
ping -c 4 10.10.4.2
Sample output:
PING 10.10.4.2 (10.10.4.2) 56(84) bytes of data.
64 bytes from 10.10.4.2: icmp_seq=1 ttl=64 time=2.12 ms
64 bytes from 10.10.4.2: icmp_seq=2 ttl=64 time=0.508 ms
64 bytes from 10.10.4.2: icmp_seq=3 ttl=64 time=0.406 ms
64 bytes from 10.10.4.2: icmp_seq=4 ttl=64 time=0.406 ms
--- 10.10.4.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3295ms
rtt min/avg/max/mdev = 0.406/0.861/2.126/0.731 ms
It is usually considered best practice to have at least two fully functional DNS servers for redundancy in case of an outage.
If test is successful, go to the next step...
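If several DNS servers are configured, the two steps above can be combined into a small loop. The sketch below is only an illustration: list_nameservers and check_nameservers are hypothetical helper names, and the ping options assume the Linux ping found on the Appliance. Keep in mind that a server may block ICMP and still answer DNS queries, so a failed ping is a hint rather than proof.

```shell
# Print the DNS server addresses configured in a resolv.conf-style file.
# Usage: list_nameservers [file]   (defaults to /etc/resolv.conf)
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Ping every configured DNS server twice and report the result.
# Usage: check_nameservers [file]
check_nameservers() {
    for ns in $(list_nameservers "$1"); do
        if ping -c 2 -W 2 "$ns" >/dev/null 2>&1; then
            echo "$ns reachable"
        else
            echo "$ns NOT reachable"
        fi
    done
}
```

Running check_nameservers without arguments reads /etc/resolv.conf and prints one line per DNS server.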
Check that the Appliance is capable of communicating with the DNS servers
Command:
dig 10.10.4.2
Sample output:
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.10.rc1.el6_3.6 <<>> 10.10.4.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 62152
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;10.10.4.2. IN A
;; AUTHORITY SECTION:
. 1800 IN SOA . . 2013040801 1800 900 604800 86400
;; Query time: 39 msec
;; SERVER: 10.10.4.2#53(10.10.4.2)
;; WHEN: Tue Apr 9 10:21:30 2013
;; MSG SIZE rcvd: 103
The check is successful if the DNS server returns an answer (the SERVER line shows which server replied). The NXDOMAIN status is expected in this example, because dig looks up the IP address as if it were a name.
If the check is successful, go to the next step...
Query the DNS to retrieve the domain associated records
Command:
dig <domain>
Sample output:
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.4 <<>>
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61884
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 2, ADDITIONAL: 2
;; QUESTION SECTION:
;. IN A
;; ANSWER SECTION:
. 86400 IN A 10.10.9.4
. 86400 IN A 10.10.9.5
. 86400 IN A 10.10.9.6
. 86400 IN A 10.10.9.7
. 86400 IN A 10.10.9.8
. 86400 IN A 10.10.9.9
. 86400 IN A 10.10.4.83
. 86400 IN A 10.10.9.1
. 86400 IN A 10.10.9.2
. 86400 IN A 10.10.9.3
;; AUTHORITY SECTION:
. 86400 IN NS ns.contoso.com.
. 86400 IN NS ns2.contoso.com.
;; ADDITIONAL SECTION:
. 86400 IN A 10.10.0.2
. 86400 IN A 10.10.4.2
;; Query time: 1 msec
;; SERVER: 10.10.4.2#53(10.10.4.2)
;; WHEN: Tue Apr 9 10:24:01 2013
;; MSG SIZE rcvd: 266
The test is successful if the output contains an ANSWER SECTION.
If the test is successful, go to the next step...
Query the DNS to retrieve the specific domain kerberos associated records (UDP)
Command:
dig SRV _kerberos._udp.<domain>
Sample output:
;; Truncated, retrying in TCP mode.
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.4 <<>> SRV _kerberos._udp.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19109
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 2, ADDITIONAL: 12
;; QUESTION SECTION:
;_kerberos._udp.. IN SRV
;; ANSWER SECTION:
_kerberos._udp.. 86400 IN SRV 0 0 88 qa-ad3.ad.contoso.com.
_kerberos._udp.. 86400 IN SRV 0 0 88 qa-ad4.ad.contoso.com.
_kerberos._udp.. 86400 IN SRV 0 0 88 qa-ad5.ad.contoso.com.
_kerberos._udp.. 86400 IN SRV 0 0 88 qa-ad6.ad.contoso.com.
_kerberos._udp.. 86400 IN SRV 0 0 88 qa-ad7.ad.contoso.com.
_kerberos._udp.. 86400 IN SRV 0 0 88 qa-ad8.ad.contoso.com.
_kerberos._udp.. 86400 IN SRV 0 0 88 qa-ad9.ad.contoso.com.
_kerberos._udp.. 86400 IN SRV 0 0 88 qa-ad.ad.contoso.com.
_kerberos._udp.. 86400 IN SRV 0 0 88 qa-ad1.ad.contoso.com.
_kerberos._udp.. 86400 IN SRV 0 0 88 qa-ad2.ad.contoso.com.
;; AUTHORITY SECTION:
. 86400 IN NS ns2.contoso.com.
. 86400 IN NS ns.contoso.com.
;; ADDITIONAL SECTION:
. 86400 IN A 10.10.4.83
. 86400 IN A 10.10.9.1
. 86400 IN A 10.10.9.2
. 86400 IN A 10.10.9.3
. 86400 IN A 10.10.9.4
. 86400 IN A 10.10.9.5
. 86400 IN A 10.10.9.6
. 86400 IN A 10.10.9.7
. 86400 IN A 10.10.9.8
. 86400 IN A 10.10.9.9
. 86400 IN A 10.10.0.2
. 86400 IN A 10.10.4.2
;; Query time: 1 msec
;; SERVER: 10.10.4.2#53(10.10.4.2)
;; WHEN: Tue Apr 9 10:27:54 2013
;; MSG SIZE rcvd: 760
If queries are successful and retrieving correct information, go to the next step...
Query the DNS to retrieve the specific domain kerberos associated records (TCP)
Command:
dig SRV _kerberos._tcp.<domain>
Sample output:
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.4 <<>> SRV _kerberos._tcp.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64004
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 3
;; QUESTION SECTION:
;_kerberos._tcp.. IN SRV
;; ANSWER SECTION:
_kerberos._tcp.. 86400 IN SRV 0 0 88 qa-ad.ad.contoso.com.
;; AUTHORITY SECTION:
. 86400 IN NS ns.contoso.com.
. 86400 IN NS ns2.contoso.com.
;; ADDITIONAL SECTION:
. 86400 IN A 10.10.4.83
. 86400 IN A 10.10.0.2
. 86400 IN A 10.10.4.2
;; Query time: 1 msec
;; SERVER: 10.10.4.2#53(10.10.4.2)
;; WHEN: Tue Apr 9 10:47:39 2013
;; MSG SIZE rcvd: 184
If this is successful, go to the next step...
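When many KDCs are published in DNS, it can help to turn the SRV answers into a plain list of host:port targets that you can then test individually. The following is a minimal sketch: srv_targets is a hypothetical helper name, and it assumes the short output format of dig (+short), where each SRV answer is printed as priority, weight, port and target.

```shell
# Convert SRV answers in "dig +short" format into host:port pairs.
# Input (stdin): lines such as "0 0 88 qa-ad.ad.contoso.com."
# Usage: dig +short SRV _kerberos._tcp.<domain> | srv_targets
srv_targets() {
    awk 'NF == 4 {
        host = $4
        sub(/\.$/, "", host)     # drop the trailing dot of the DNS name
        print host ":" $3        # $3 is the port advertised in the SRV record
    }'
}
```

Each printed target is a KDC that the Portal may try when no krb5.conf restricts the list, so targets that are old or unreachable are good candidates for DNS cleanup.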
Intercept the authentication queries using the Appliance CLI
Command:
sudo tcpdump -nni any port 88 or port 53 -A
Sample output:
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
By keeping the CLI open and trying to authenticate using the Portal or the Finder, you'll see in the CLI the queries sent from the Appliance and the queries received by the Appliance. By reviewing the entries displayed in real time, you'll be able to pinpoint the error messages and take appropriate actions to resolve the authentication issues.
Move on to the next sections
If everything from the previous points was successful, you can continue checking the sections below to see if there is an issue applicable to your case.
If you need additional help, or if you encountered issues when running the commands or interpreting their output, continue with the next step.
Request assistance from the Nexthink Support Team
If none of the solutions proposed in this article helped, you can request help by opening a support request.
Make sure to provide the information mentioned at the beginning of this guide, plus a traffic capture sample.
In order to capture the traffic between Portal and the AD Server, please use the following command:
sudo tcpdump -nni any port 88 or port 53 -A -w tcp.pcap
Leave the command running while you reproduce the failed authentication attempt.
Then, stop the command by pressing Ctrl + C.
A tcp.pcap file will be created. Please attach it to the support request and we'll have a look at it for you.
Login delays/timeout using Finder
Situation
When I use an Active Directory (AD) account to log in using the Finder, it takes longer than expected to authenticate or it even times out, whereas logging in using the Portal works fine.
Solution
This can happen if the DNS infrastructure contains multiple non-responding KDC servers. Nexthink Portal will interrogate each server until one responds and authenticates the user.
To resolve this, clean up your DNS entries (removing old and invalid entries) or add the KDC address directly in the Kerberos configuration file. To do that, connect to the Appliance hosting the Portal using the command-line interface (CLI) and enter the following command to edit the configuration file used by the AD authentication mechanism:
sudo vi /var/nexthink/portal/conf/krb5.conf
Add the KDC entry in this format:
[realms]
<realm part of AD username> = {
kdc = <name of kdc server>
}
If you have multiple realms/KDC servers, please add them as given in the following example:
[realms]
QA.INTRA.NEXTHINK.COM = {
kdc = qa-ad.intra.nexthink.com
kdc = qa.intra.nexthink.com
kdc = 123.nexthink.com
}
QA-LAU.INTRA.NEXTHINK.COM = {
kdc = 123.nexthink.com
kdc = qa-lau.intra.nexthink.com
}
Please note that the realm is always entered in upper case.
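If you are unsure which realm name to use in the [realms] section, you can derive it from the user name. The sketch below is only an illustration: realm_of is a hypothetical helper name, and it assumes that the user logs in with a name of the form user@domain, where the part after the @ sign is the realm.

```shell
# Derive the upper-case realm name from a user@domain login name.
# Usage: realm_of <user@domain>
realm_of() {
    # Strip everything up to and including the first "@", then upper-case the rest.
    printf '%s\n' "${1#*@}" | tr '[:lower:]' '[:upper:]'
}
```

For example, realm_of jdoe@qa.intra.nexthink.com prints QA.INTRA.NEXTHINK.COM, which is the form expected in the [realms] section (jdoe is a made-up user name).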
Login fails in Portal but not in Finder
Situation
Logging in to the Nexthink Portal with an Active Directory account fails, whereas logging in to the Nexthink Finder with the same account is successful.
Solution
The Portal may fail to connect because the realm of the account is not resolved to a responding KDC server. It is a known limitation in the Portal that realms of the AD accounts have to be resolved to a working KDC server by the DNS.
To solve this, either change the DNS configuration to ensure that the realm resolves to a valid KDC server, or add an entry in the file /etc/hosts on the Appliance hosting the Portal that associates the IP address of a valid KDC server with the realm. To do that, connect to the CLI of the Appliance hosting the Portal and enter the following command:
sudo vi /etc/hosts
Add the realm-to-KDC mapping, for example:
<IP of the KDC server> <REALM of the AD account>
192.168.0.1 AD.COMPANY.COM
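After editing the file, you may want to confirm that the realm is really mapped to the intended IP address. The following is a minimal sketch: hosts_ip_for is a hypothetical helper name, and it only reads a hosts-style file, so it does not tell you whether the KDC at that address actually responds.

```shell
# Print the IP address that a hosts-style file maps to a given name.
# The comparison ignores case, as host name lookups do.
# Usage: hosts_ip_for <name> [file]   (file defaults to /etc/hosts)
hosts_ip_for() {
    awk -v name="$1" '
        BEGIN { name = tolower(name) }
        /^[ \t]*#/ { next }                       # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if (substr($i, 1, 1) == "#") break    # trailing comment starts here
                if (tolower($i) == name) { print $1; exit }
            }
        }
    ' "${2:-/etc/hosts}"
}
```

With the example entry above, hosts_ip_for AD.COMPANY.COM prints 192.168.0.1. An empty result means that the file contains no mapping for the realm.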
Login with UpperCase username works but not with lowercase
Situation
A user has the following account in the Active Directory: User@domain.com (notice the capital letter).
To log in to the Portal, the user needs a Portal account that corresponds to that Active Directory account. Because the Portal directly interrogates Active Directory, which is case insensitive, the corresponding Portal account can be created with any combination of upper and lower case in the username, for example user@domain.com, UseR@domain.com, etc.
For the Finder, on the other hand, the case of the username needs to match the one in Active Directory. In our example, the account in the Finder can only be User@domain.com. If the user creates user@domain.com, the authentication will fail.
Solution
The solution is to ensure that the same casing as AD is used and enforced.
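When reviewing accounts, a quick comparison can tell you whether a login name differs from the AD name only by its casing. This is a small illustrative sketch: compare_username_case is a hypothetical helper name and works on two strings that you supply, it does not query Active Directory or Nexthink.

```shell
# Compare the name stored in AD with the name used in Nexthink.
# Usage: compare_username_case <name in AD> <name used for login>
# Prints one of: exact, case-only-mismatch, different
compare_username_case() {
    ad_lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    login_lower=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    if [ "$1" = "$2" ]; then
        echo "exact"
    elif [ "$ad_lower" = "$login_lower" ]; then
        # Same name, different casing: expected to fail in the Finder.
        echo "case-only-mismatch"
    else
        echo "different"
    fi
}
```

For the example above, compare_username_case User@domain.com user@domain.com prints case-only-mismatch, which matches the situation where the Portal login works but the Finder login fails.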