3606 - Performing DNSSEC algorithm change in NIOS
Scenario
You are a DNS administrator at a coffee company. The internal namespace includes an authoritative zone coffee.corp on the server NS1 (10.100.0.111) and a delegated zone kona.coffee.corp on the server NS2 (10.100.0.222).
A decision has been made to change the algorithm from RSA/SHA256 to RSA/SHA512, and you are tasked with this update. Gather the necessary information (TTL) before you begin, perform the configuration changes, and ensure that the necessary data has been uploaded to the parent zone.
Estimated Completion Time
65 to 80 minutes
Credentials
Description | Username | Password | URL or IP |
---|---|---|---|
Grid Manager UI | admin | infoblox |
Course References
1204: DNSSEC Fundamentals
3011: DNS Troubleshooting Methodology
2204: Describing DNSSEC
2026: Configuring Recursive DNSSEC in NIOS
3028: Managing DNSSEC Keys in NIOS
Lab Initiation
Access jump-desktop
Once the lab is deployed, you can access the virtual machines required to complete this lab activity. To initiate the lab, click on the jump-desktop tile and log in to the Linux UI:
Username: training
Password: infoblox
Initiate lab
To initiate the lab, double-click the Launch Lab icon on the Desktop.
Choose the lab number from the list and click OK.
After clicking OK, you will see a pop-up message with a brief description of the lab task. If the description looks correct, click Yes to continue lab initiation.
Lab initiation will take a couple of minutes to finish.
Once complete, you will see another pop-up message with the login credentials and the URL for the Grid Manager’s User Interface. Note that the credentials may differ from those from prior labs.
Tasks
Task 1: Gathering TTL information
The first step in key rollover is understanding the current TTL values.
What is the current default TTL as configured for the zone kona.coffee.corp? Can this be changed?
What is the current TTL for A for the zone kona.coffee.corp? Can this be changed?
What is the current TTL for NS for the zone kona.coffee.corp? Can this be changed?
What is the current TTL for RRSIG for the zone kona.coffee.corp? Can this be changed?
What is the current TTL for DNSKEY for the zone kona.coffee.corp? Can this be changed?
Write down the current TTL values for each record type in the table below:
Record Type | Current TTL (seconds) | Can it Be Changed? |
---|---|---|
A | ||
NS | ||
DNSKEY | ||
RRSIG | ||
TTL is an important part of key rollover planning. Being too aggressive with the timeline during key rollover may orphan signatures cached by other validating resolvers and result in bogus responses for users attempting to resolve your authoritative names. The next task takes you through such an operation, which breaks client resolution. This aggressive approach to changing the KSK is not recommended in a production environment; it is done in the lab only to illustrate its consequences.
Task 2: Changing the Algorithm
We are changing the KSK algorithm for our authoritative zone kona.coffee.corp from RSA/SHA256 to RSA/SHA512. This task walks through the process of implementing this change.
Before making changes, execute the following command on the Jump-Desktop (10.35.22.10). The command should be successful and fully validated.
dig @10.100.0.100 sweet.kona.coffee.corp. A
Log in to NS2 (10.100.0.222):
Navigate to Data Management → DNS, then on the Toolbar select Grid DNS Properties → DNSSEC
Change Key-Signing Key (KSK) from RSA/SHA-256 to RSA/SHA-512.
Click Save & Close
Read the warning message then click Yes
This message informs us that the new algorithm will only take effect after key rollover. In a live environment, it is safe to proceed, knowing that the impact on existing signed zones will come at the next key rollover event. In the lab, we speed up this process immediately and manually (described below). This speed-up is not advised for a production environment.
Navigate to Data Management → DNS → Zones, click on the authoritative zone kona.coffee.corp
Navigate to Toolbar DNSSEC → Apply Algorithm Change
Read the warning message, click Yes and restart the services if needed.
In this task we are purposely behaving like a reckless operator who did not read this warning message carefully. In production, you will most likely answer No to this step and wait for the automatic key rollover to occur (for ZSK), or wait for the notification to come in (for KSK). We also flush the cache on the recursive server to bring forward what this reckless behavior would otherwise cause down the road, once cached entries have expired.
Execute the same dig command from step 1:
dig @10.100.0.100 sweet.kona.coffee.corp. A
Is it still working? Why or why not?
Flush the DNS cache on ADA (10.100.0.100) by accessing Data Management → DNS → Members, select the member, on the Toolbar Clear → Clear DNS Cache.
Execute the same dig command from steps 1 and 6 again.
Is it still working? Why or why not?
Identify the root cause and the steps needed for remediation.
Task 3: Upload the DS record on the parent zone
The problem we faced in the previous task was caused by the key rollover we had just performed: a key rollover requires uploading new DS records to the parent zone. We will perform that upload in this task.
Export the trust anchor from the delegated child zone.
Log in to NS2, navigate to Data Management → DNS → Zones, and click on kona.coffee.corp. In the Toolbar, select DNSSEC → Export Trust Anchors.
When the Export Trust Anchors dialog appears, select DS records and click Export.
Save the exported trust anchor to a local file. The file name does not matter. In our example, the file name is ds_records.txt.
Open the file in an editor and leave the editor open. We will need this information for the next step.
Import the exported trust anchor into the parent zone.
Log in to NS1 (10.100.0.111), navigate to Data Management → DNS → Zones, and click on coffee.corp. In the Toolbar, select DNSSEC → Import Keyset.
The Import Keyset dialog appears. Copy and paste the DS record text from the text editor (ds_records.txt in our example) and click Import.
How many DS records are there now on NS1?
On the Jump-Desktop (10.35.22.10), execute the following commands:
dig @10.100.0.100 sweet.kona.coffee.corp. A
dig @10.100.0.100 kona.coffee.corp. DS +multi
Is it working? Why or why not?
Is there a TTL that we need to wait to expire? Which one is it?
Solutions
Task 1 solution: Gathering TTL information
Solution:
The TTLs for A and NS records can be changed. The DNSKEY TTL is fixed, and the RRSIG TTL cannot be set directly because it always follows the TTL of the record it signs.
The TTLs in this lab are significantly reduced from the usual NIOS defaults to ease the testing and troubleshooting done throughout the lab. For the current default TTL values, please visit: https://docs.infoblox.com/space/NAG8/22251832/About+Time+To+Live+Settings#AboutTimeToLiveSettings-bookmark1536
Record Type | Current TTL (seconds) | Can it Be Changed? |
---|---|---|
A | 199 | Yes |
NS | 199 | Yes |
DNSKEY | 172800 | No |
RRSIG | Same as the record it signs | Only indirectly, by changing the TTL of the signed record |
Detailed Analysis:
The SOA (Start of Authority) record provides many of the timing values, including the TTL. In the following figure, we query the authoritative server NS2 (10.100.0.222) for the SOA record for kona.coffee.corp, and we learn that the default TTL is set to 199 seconds, and the negative-caching TTL (or minimum TTL) is set to 234 seconds. The 199-second TTL can be confirmed by looking up other DNS records, such as A or NS records.
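As a rough illustration of which SOA fields carry these timers, the following Python sketch splits a single SOA answer line written in dig's presentation format. The sample line is hand-written, not captured from the lab: only the TTL of 199 and the minimum of 234 come from this lab, while the server names, serial, and remaining timers are placeholders.

```python
from typing import NamedTuple


class SoaTimers(NamedTuple):
    ttl: int      # TTL of the SOA record itself
    refresh: int
    retry: int
    expire: int
    minimum: int  # negative-caching TTL


def parse_soa_line(line: str) -> SoaTimers:
    """Parse 'owner TTL IN SOA mname rname serial refresh retry expire minimum'."""
    fields = line.split()
    if len(fields) != 11 or fields[3].upper() != "SOA":
        raise ValueError(f"not a single-line SOA record: {line!r}")
    ttl = int(fields[1])
    refresh, retry, expire, minimum = (int(f) for f in fields[7:11])
    return SoaTimers(ttl, refresh, retry, expire, minimum)


# Placeholder line: only 199 and 234 are values from this lab.
SAMPLE = ("kona.coffee.corp. 199 IN SOA ns2.kona.coffee.corp. "
          "admin.kona.coffee.corp. 1 10800 3600 2419200 234")

timers = parse_soa_line(SAMPLE)
print(timers.ttl, timers.minimum)
```

The parser expects the one-line form of the record; multi-line output (for example from `+multi`) would need to be joined first.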
These values can be adjusted in multiple places: in the Grid DNS Properties, Member DNS Properties, or Zone Properties. The figure below shows changing this at the zone level. We get here by navigating to Data Management → DNS → Zones, selecting the zone and clicking Edit, then selecting the Settings tab on the left:
We can get the TTLs of RRSIG and DNSKEY with one query, as shown in the following figure. The default and fixed TTL for DNSKEY records on NIOS is 172800 seconds (48 hours). The TTL for an RRSIG is the same as that of the record it signs. For example, the RRSIG TTL for an A record is 199 seconds, and the RRSIG TTL for DNSKEY is 172800 seconds.
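To make the relationship between a record's TTL and its signature's TTL easier to see, here is a small Python sketch that groups TTLs by record type from dig-style answer lines. The sample text is hand-written for illustration and is not captured lab output: the TTLs match the values in this lab, but the address and the shortened RDATA are placeholders.

```python
from collections import defaultdict


def ttls_by_type(answer_text: str) -> dict[str, set[int]]:
    """Collect the TTLs seen for each record type in dig-style answer lines."""
    seen: dict[str, set[int]] = defaultdict(set)
    for line in answer_text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and dig comments
        fields = line.split()
        if len(fields) < 5:
            continue  # not an 'owner TTL class type rdata' line
        ttl, rtype = int(fields[1]), fields[3].upper()
        if rtype == "RRSIG":
            # Label signatures by the type they cover, e.g. "RRSIG(A)".
            rtype = f"RRSIG({fields[4].upper()})"
        seen[rtype].add(ttl)
    return dict(seen)


# Illustrative lines only; RDATA is shortened to placeholders.
SAMPLE = """
; hand-written sample, not real lab output
sweet.kona.coffee.corp. 199 IN A 192.0.2.10
sweet.kona.coffee.corp. 199 IN RRSIG A 10 4 199 (placeholder)
kona.coffee.corp. 172800 IN DNSKEY 256 3 10 (placeholder)
kona.coffee.corp. 172800 IN RRSIG DNSKEY 10 3 172800 (placeholder)
"""

for rtype, ttls in ttls_by_type(SAMPLE).items():
    print(rtype, sorted(ttls))
```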
The default TTL for DNSKEY on NIOS is set to 48 hours. This setting is fixed and cannot be changed. The RRSIG validity period (different from TTL) on NIOS defaults to 4 days (this can be adjusted). When planning a key rollover, both the DNSKEY TTL and the RRSIG validity period need to be taken into consideration. The minimum amount of time to wait between key rollover events should be the higher of the two values, in this case 4 days for the RRSIG validity period.
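The rule of thumb above can be written as a one-line calculation. The Python sketch below uses the two values stated in this lab; the function name is a hypothetical helper, and a real rollover plan may add further safety margins that are not modeled here.

```python
DNSKEY_TTL_SECONDS = 172800         # fixed DNSKEY TTL on NIOS per this lab (48 hours)
RRSIG_VALIDITY_SECONDS = 4 * 86400  # default RRSIG validity per this lab (4 days)


def minimum_rollover_wait(dnskey_ttl: int, rrsig_validity: int) -> int:
    """Return the larger of the two timers, in seconds."""
    return max(dnskey_ttl, rrsig_validity)


wait = minimum_rollover_wait(DNSKEY_TTL_SECONDS, RRSIG_VALIDITY_SECONDS)
print(f"{wait} seconds = {wait / 86400:g} days")
```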
Task 2 solution: Changing the Algorithm
Root Cause:
The algorithm change resulted in the validating resolver attempting to validate new signatures with old keys, causing validation errors. After the cache is cleared, the parent and child have a mismatching DS-KSK pair, which results in SERVFAIL.
Detailed Analysis:
The lookup command is to ensure that the name works, and also to make sure records are cached by validating resolver ADA (10.100.0.100). The next figure shows the successful lookup of the name sweet.kona.coffee.corp.
Next, we change the algorithm on NS2. We do this by logging in to NS2 (10.100.0.222), navigating to Data Management → DNS, Toolbar → Grid DNS Properties → DNSSEC, and changing the KSK from RSA/SHA-256 to RSA/SHA-512, as shown in the next figure.
After clicking Save & Close, a message appears informing us that key algorithm changes only affect zones being signed for the first time. If the zone has already been signed, as is the case for kona.coffee.corp, the new algorithm is applied when the next key rollover occurs. In a live environment, it is safe to proceed, knowing that the impact on existing signed zones will come at the next key rollover event. In the lab, we speed up this process immediately and manually (described below); this is not advised for a production environment.
After clicking away the warning message and saving changes, we want to immediately apply the algorithm change to the zone. We navigate to Data Management → DNS → Zones, click on the authoritative zone kona.coffee.corp, then navigate to Toolbar → DNSSEC → Apply Algorithm Change. This is shown in the following figure.
Another warning message appears, as shown in the following figure. This warning is more serious: the operation creates new keys (with the new algorithms) and removes the old keys. This carries the risk of breaking client validation or orphaning cached signatures. While this is not advised in a production environment, we do it in the lab to speed up the key rollover process so we can see the new key in the zone, so we answer Yes and restart services. In this task we are purposely behaving like a reckless operator who did not read this warning message carefully. In production, you will most likely answer No to this step and wait for the automatic key rollover to occur (for ZSK), or wait for the notification to come in (for KSK).
Execute the same dig command querying for sweet.kona.coffee.corp. The answer may still be served from the cache, as shown in the first figure. Depending on timing, the record may already have expired from the cache; in that case, you will see the behavior in the second figure.
After the TTL (199 seconds) for the A record has run out, we see the following response when executing the same query. This error message is somewhat misleading: it is unlikely that the server suddenly became unreachable.
We execute the same lookup again for the A record of sweet.kona.coffee.corp, but this time disabling DNSSEC checks and requesting additional records. As shown below, we note that the ZSK tag is 12885.
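The exact dig options used in the figure are not reproduced here. As an assumption, the Python sketch below builds such a query with `+cd` (set the Checking Disabled bit, so the resolver skips validation) and `+dnssec` (request DNSSEC records such as RRSIG). The helper name is hypothetical, and the command is only printed, not executed.

```python
import shlex


def build_dig_command(server: str, name: str, rtype: str, *,
                      checking_disabled: bool = False,
                      dnssec: bool = False,
                      multiline: bool = False) -> list[str]:
    """Build a dig argument list; run it with subprocess.run(cmd) if desired."""
    cmd = ["dig", f"@{server}", name, rtype]
    if checking_disabled:
        cmd.append("+cd")      # ask the resolver not to validate
    if dnssec:
        cmd.append("+dnssec")  # ask for RRSIG records in the answer
    if multiline:
        cmd.append("+multi")   # verbose multi-line record output
    return cmd


cmd = build_dig_command("10.100.0.100", "sweet.kona.coffee.corp.", "A",
                        checking_disabled=True, dnssec=True)
print(shlex.join(cmd))
```

If the name resolves with `+cd` but fails without it, the failure is in validation rather than in reachability of the authoritative server.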
We can check the DNSKEY records for kona.coffee.corp as cached by ADA (10.100.0.100), as shown below. This shows that there is no ZSK with the ID 12885. This is due to the forced algorithm change we performed. The older DNSKEYs were cached before the algorithm change; however, these keys have already been deleted from NS2, and new keys have been used to create new signatures. The validating resolver is attempting to validate new signatures with old keys, causing validation errors.
We can see many validation errors in the syslog when we log in to the validating resolver ADA (10.100.0.100) and navigate to Administration → Logs → Syslog.
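For context on where a key tag such as 12885 comes from, the Python sketch below implements the key tag calculation from RFC 4034 Appendix B and uses it to pick the cached keys that could verify a signature. The DNSKEY RDATA values are synthetic byte strings invented for this example; they are not the lab's keys.

```python
def key_tag(dnskey_rdata: bytes) -> int:
    """Key tag per RFC 4034 Appendix B (does not apply to algorithm 1)."""
    acc = 0
    for i, byte in enumerate(dnskey_rdata):
        # Even-indexed bytes are the high octet, odd-indexed the low octet.
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def usable_keys(rrsig_key_tag: int, cached_dnskeys: list[bytes]) -> list[bytes]:
    """Return the cached DNSKEYs whose tag matches the signature's key tag."""
    return [k for k in cached_dnskeys if key_tag(k) == rrsig_key_tag]


# Synthetic RDATA: flags 256 (ZSK), protocol 3, algorithm, then fake key bytes.
old_zsk = bytes([0x01, 0x00, 3, 8]) + b"old-key-material"
new_zsk = bytes([0x01, 0x00, 3, 10]) + b"new-key-material"

# The resolver still caches only the old key, but the new signature
# names the tag of the new key.
matches = usable_keys(key_tag(new_zsk), [old_zsk])
print("usable cached keys:", len(matches))
```

An empty result corresponds to the situation described above: the signature references a key tag that is absent from the cached DNSKEY set.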
This, again, is because most records from NS2 have a low TTL (199 seconds), while the DNSKEY records have a longer TTL (48 hours). The other records, such as A, RRSIG, and DS, all expired after 199 seconds, and the validating resolver ADA (10.100.0.100) went out and retrieved new answers.
The new answers are signed by a different DNSKEY, but the validating resolver is using the older, cached DNSKEY to validate them. This causes numerous validation errors, which ultimately get misreported as the server being unreachable.
Clear the DNS cache on ADA (10.100.0.100) by navigating to Data Management → DNS → Members, selecting the member, and going to Toolbar → Clear → Clear DNS Cache.
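The TTL mismatch can be sketched as a small timeline check in Python. The TTLs are the values from this lab; the simplifying assumption is that all four record sets were cached at the same moment, which will not be exactly true on a real resolver.

```python
# TTLs from this lab, in seconds.
CACHED_TTLS = {"A": 199, "RRSIG(A)": 199, "DS": 199, "DNSKEY": 172800}


def cache_state(ttls: dict[str, int], elapsed_seconds: int) -> tuple[list[str], list[str]]:
    """Split record types into (expired, still cached) after elapsed_seconds."""
    expired = sorted(t for t, ttl in ttls.items() if elapsed_seconds >= ttl)
    cached = sorted(t for t, ttl in ttls.items() if elapsed_seconds < ttl)
    return expired, cached


expired, cached = cache_state(CACHED_TTLS, 200)
print("refetched from NS2:", expired)
print("still served from cache:", cached)
```

Shortly after 199 seconds, everything except the DNSKEY set is refetched, which is the window in which new signatures meet old keys.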
Wait a few seconds for the clearing to take effect, then execute the same dig command from earlier. This time, we receive a SERVFAIL as shown below.
The issue now is that we generated a new KSK on NS2 but never uploaded the corresponding DS record to NS1. We will perform that in the next task.
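To illustrate why a new KSK invalidates the DS record held by the parent, the Python sketch below derives a DS digest the way RFC 4034 section 5.1.4 describes: a hash over the owner name in wire format followed by the DNSKEY RDATA. SHA-256 (digest type 2) is assumed here, and the key bytes are synthetic; neither is taken from the lab's actual records.

```python
import hashlib


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in canonical (lowercase) DNS wire format."""
    wire = b""
    for label in name.rstrip(".").lower().split("."):
        if not label:
            continue  # tolerate the root name
        encoded = label.encode("ascii")
        wire += bytes([len(encoded)]) + encoded
    return wire + b"\x00"


def ds_digest_sha256(owner: str, dnskey_rdata: bytes) -> str:
    """DS digest, type 2: SHA-256 over owner name plus DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + dnskey_rdata).hexdigest().upper()


ZONE = "kona.coffee.corp."
# Synthetic RDATA: flags 257 (KSK), protocol 3, algorithm, then fake key bytes.
old_ksk = bytes([0x01, 0x01, 3, 8]) + b"old-ksk-material"
new_ksk = bytes([0x01, 0x01, 3, 10]) + b"new-ksk-material"

parent_ds = ds_digest_sha256(ZONE, old_ksk)  # what the parent still publishes
child_ds = ds_digest_sha256(ZONE, new_ksk)   # what the child's new KSK hashes to
print("chain of trust intact:", parent_ds == child_ds)
```

Because the digest covers the full key material, any new KSK produces a different digest, so the parent's DS record has to be replaced or supplemented.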
Task 3 solution: Upload the DS record on the parent zone
Root Cause:
The algorithm change resulted in a KSK change in the child, so a new DS record must be uploaded to the parent. While the old DS records are in cache, validation returns SERVFAIL; after the DS TTL expires and the new DS records are retrieved, validation works.
Detailed Analysis:
Follow the steps described to export the new DS record from kona.coffee.corp on NS2 (10.100.0.222) and import it into coffee.corp on NS1 (10.100.0.111).
The figure below shows the query and result for retrieving the new set of DS records available on NS1 as part of the coffee.corp authoritative data. We can see that there are DS records with two different algorithms: one using algorithm 8 (RSA/SHA-256) and another using algorithm 10 (RSA/SHA-512).
However, there is a set of DS records cached on the validating resolver ADA (10.100.0.100). The next figure shows the query and response for the cached DS records. For the duration of this TTL (199 seconds total), the validating resolver will return SERVFAIL for the domain kona.coffee.corp because of the DS-KSK mismatch. After the TTL has passed, new DS records will be cached on the validating resolver, and validation will be successful from that point on.
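The algorithm numbers in a DS record can be decoded mechanically. The Python sketch below parses DS lines in presentation format (owner, TTL, class, type, key tag, algorithm, digest type, digest) and maps the numbers to names. The sample lines are hand-written: the key tags and digests are placeholders, and only the algorithm numbers 8 and 10 and the 199-second TTL reflect this lab.

```python
# Subset of the IANA DNSSEC algorithm and DS digest type registries.
ALGORITHMS = {8: "RSA/SHA-256", 10: "RSA/SHA-512"}
DIGEST_TYPES = {1: "SHA-1", 2: "SHA-256", 4: "SHA-384"}


def describe_ds(line: str) -> dict:
    """Decode 'owner TTL IN DS key_tag algorithm digest_type digest'."""
    fields = line.split()
    if len(fields) < 8 or fields[3].upper() != "DS":
        raise ValueError(f"not a DS record: {line!r}")
    key_tag, algorithm, digest_type = (int(f) for f in fields[4:7])
    return {
        "owner": fields[0],
        "ttl": int(fields[1]),
        "key_tag": key_tag,
        "algorithm": ALGORITHMS.get(algorithm, f"unknown ({algorithm})"),
        "digest_type": DIGEST_TYPES.get(digest_type, f"unknown ({digest_type})"),
    }


# Placeholder key tags and digests, for illustration only.
SAMPLE = [
    "kona.coffee.corp. 199 IN DS 11111 8 2 AAAA0000",
    "kona.coffee.corp. 199 IN DS 22222 10 2 BBBB1111",
]

for record in map(describe_ds, SAMPLE):
    print(record["key_tag"], record["algorithm"], record["digest_type"])
```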