Certificate management for STIR/SHAKEN
This whitepaper explains the basics of creating and managing keys and certificates in the SHAKEN framework for secure caller ID. It also recommends certificate management features that enhance a SHAKEN solution’s reliability and performance.
Robocalls are rapidly overwhelming the telephone world. According to a study analyzing over 50 million calls, the number of robocalls is dramatically increasing, from 3.7% of total calls in 2017 to 29.2% in 2018. And it’s projected to reach 44.6% by early 2019.1 The U.S. has already been hit with 33 billion robocalls this year.2
The industry-led solutions STIR and SHAKEN have been officially endorsed by the Canadian Radio-television and Telecommunications Commission (CRTC)3 and Federal Communications Commission (FCC) to fight spoofed robocalls.4
While SHAKEN promises to close the door on caller ID spoofing, providers must take care to implement it correctly to ensure high standards for security, speed, and reliability. This is especially important when managing the certificates and the private and public keys used to sign and verify calls. This paper will provide an in-depth overview of this process.
Certificates and public/private key basics
In the SHAKEN framework, service providers use public and private keys and digital certificates to create and verify signatures. Service providers manage these private keys and certificates through a certificate management service (CMS) which can be software run by the service provider or a hosted solution.5
The CMS creates a pair of public and private keys. The private key is used by the service provider to sign calls. The public key is then used by other service providers to verify the signature’s owner by checking that the signature was created by the private key.
Clearly, only the true owner of the private key should be allowed to sign calls with its private key. To prevent unauthorized use of the private key, the CMS encrypts it and places it into a secure key store (SKS).
The CMS needs to make the public key available for anyone to use at any time, but if all public keys are publicly available, how does anyone know who really owns a specific public key? Someone with bad intentions could claim that they are a certain service provider and own their public key.
For others to trust that a public key belongs to a certain service provider, the CMS needs to present the public key in a certificate, a public record that contains the public key and information about its owner. The certificate is created and signed by a certificate authority, a third party trusted by all service providers.
To obtain the certificate, the CMS sends its public key to the certificate authority, which returns a signed certificate containing the public key and information about the owner.
Then the CMS stores the certificate in the certificate repository. This is the service provider’s public repository that others can access when they want to verify the signature.
How keys are generated and managed
The public and private keys are powerful tools to authorize and verify truthful caller IDs, but they must be created and managed properly for SHAKEN to remain effective.
When evaluating a SHAKEN solution, a service provider must consider how the public and private keys are generated and how the private key is kept secure.
Public and private key generation
The public and private keys are long, complex numbers that are mathematically related. The CMS uses a set of mathematical algorithms to create the public and private keys. Although the public and private keys must be mathematically related, the mathematical algorithms are carefully designed so that the private key cannot be derived, even by one who knows the matching public key and mathematical algorithms used to create it.
Here is an example of a public key:
Private keys must be kept safe
There’s nothing structurally significant about the private key—it’s simply a string of characters. But private keys must be kept secret, which requires implementing best practices for security.
There are many ways to secure the private key, but the strongest way to secure it is by using encryption, a process to secure data from unwanted access by translation into an incomprehensible string of characters.6 Strong encryption allows users to trust the data they use and entities they interact with.
Unencrypted data, called plaintext, are encrypted using a unique data key. The data key is used to encrypt data such as the private key. The encrypted data, called ciphertext, can only be decrypted with the same data key that encrypted the plaintext. The data key and the plaintext private key should only be accessed by the service provider who owns them.
Authorized users have access to the data key and can decrypt the ciphertext. Strong encryption algorithms make it computationally infeasible for unauthorized users to recover the plaintext from the ciphertext without the data key.
Security is an essential feature to implement SHAKEN effectively. A SHAKEN solution should be created by a provider who understands security best practices. Encryption as a concept is quite simple, but the reality of designing secure systems can be complicated.
Maximize security with envelope encryption
Many different methods of encryption have been developed, but the most secure method of encryption that can be implemented on hosted solutions is envelope encryption.
In all instances of encryption, access to the data key is essential for access to the plaintext private key. To keep the private key safe, the data key must be kept as secure as the private key.
With envelope encryption, the plaintext data key is generated and encrypts the plaintext private key. Then the plaintext data key is secured by encrypting it with another key, called the master key.7
The ciphertext private key and ciphertext data key are then stored in the SHAKEN solution’s secure key store (SKS).
When the authentication service needs the private key to sign a call, it takes a copy of the ciphertext private key and ciphertext data key from the SKS. The authentication service opens an encrypted Transport Layer Security (TLS) connection to send the ciphertext data key to the master key with a request to decrypt the key.8 After the master key decrypts the data key, the plaintext data key is returned to the authentication service through TLS. The data key decrypts the private key, and the authentication service uses the plaintext private key to sign a call.
After the authentication service uses the plaintext private key and data key, they are immediately destroyed, leaving the private key and data key available only as ciphertext in the SKS. The plaintext private key and data key are never stored on disk, so they exist only briefly: when the private key is created and when the authentication service must sign calls.
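The envelope encryption flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: `toy_cipher` is a deliberately simple stand-in for a real authenticated cipher such as AES-GCM from a vetted library, and `MasterKeyHolder`, `seal_private_key`, and `open_private_key` are hypothetical names invented for this example. In a real deployment the master key would live inside a hardware security module or key management service.

```python
import hashlib
import secrets
from typing import Dict, Tuple


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR `data` with a SHA-256 keystream. Toy stand-in only, NOT a secure cipher.

    XOR is its own inverse, so the same call both encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class MasterKeyHolder:
    """Hypothetical stand-in for the device that holds the master key."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)  # never leaves this object

    def encrypt_data_key(self, plaintext_data_key: bytes) -> Tuple[bytes, bytes]:
        nonce = secrets.token_bytes(16)
        return nonce, toy_cipher(self._master_key, nonce, plaintext_data_key)

    def decrypt_data_key(self, nonce: bytes, ciphertext_data_key: bytes) -> bytes:
        return toy_cipher(self._master_key, nonce, ciphertext_data_key)


def seal_private_key(private_key: bytes, holder: MasterKeyHolder) -> Dict[str, bytes]:
    """Encrypt the private key with a fresh data key, then wrap the data key."""
    data_key = secrets.token_bytes(32)
    key_nonce = secrets.token_bytes(16)
    ciphertext_private_key = toy_cipher(data_key, key_nonce, private_key)
    data_key_nonce, ciphertext_data_key = holder.encrypt_data_key(data_key)
    # Only ciphertext (plus the non-secret nonces) is stored in the SKS.
    return {
        "key_nonce": key_nonce,
        "ciphertext_private_key": ciphertext_private_key,
        "data_key_nonce": data_key_nonce,
        "ciphertext_data_key": ciphertext_data_key,
    }


def open_private_key(record: Dict[str, bytes], holder: MasterKeyHolder) -> bytes:
    """Unwrap the data key via the master key, then decrypt the private key."""
    data_key = holder.decrypt_data_key(
        record["data_key_nonce"], record["ciphertext_data_key"]
    )
    return toy_cipher(data_key, record["key_nonce"], record["ciphertext_private_key"])


holder = MasterKeyHolder()
plaintext_private_key = b"hypothetical private key bytes"
sks_record = seal_private_key(plaintext_private_key, holder)
recovered = open_private_key(sks_record, holder)
```

The point of the structure is that the SKS record alone is useless: recovering the private key requires a round trip to whatever holds the master key.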
The master key can be stored in a hardware encryption device. This is an external network-attached device that protects keys and does cryptographic operations. A type of hardware encryption device called a hardware security module is typically used to protect private keys with maximum security.9 These devices are physically tamper-proof and less vulnerable to compromise than general-purpose computers because of their security-focused operating systems, limited access via a network interface that is strictly controlled by internal rules, and active protection of cryptographic material.10
Encryption and decryption of data keys take place in the hardware security module to protect the master key. For a hosted solution, the CMS interacts with the master key through an external query to send the ciphertext data key. This can slightly increase the time needed for the authentication service to process and sign calls. For an on-premises solution, the private key can be directly secured using a hardware encryption device to bypass the envelope encryption scheme and decrease the time needed to sign calls.
Regardless of whether a SHAKEN solution operates as an on-premises or a hosted solution, the private key should be secured using strong encryption.
How certificates are obtained and managed
Service providers need their private key to remain secret, but with certificates and public keys, service providers face the opposite problem: They need to make sure anyone can access their certificate and public key at any time.
Before a service provider can obtain a certificate, the service provider must be authorized by the policy administrator.
The SHAKEN policy administrator (PA) maintains the integrity of SHAKEN certificates by evaluating and authorizing entities that would like to request or issue certificates. No one can receive or create certificates without approval from the PA. This prevents people who shouldn’t be signing calls from getting a certificate and stops certificate authorities from issuing bad certificates.
The PA acts as the trust anchor in the SHAKEN ecosystem. As long as the PA can be trusted, all entities participating in SHAKEN can be trusted.
To be authorized by the PA, the service provider must request from the PA a service provider code token. This token contains the service provider’s operating company number (OCN) or service provider identifier (SPID). If the PA validates the service provider and approves the request, the PA returns to the service provider a token with the service provider’s SPID. Now the service provider is authorized to request certificates.
To obtain the certificate, the service provider creates a certificate signing request (CSR) and sends it with the token to the certificate authority. If the certificate authority approves the request, it creates the certificate, signs it, and issues it to the service provider.
All certificate authorities must be authorized by the PA to issue SHAKEN certificates. The PA maintains an up-to-date list of all authorized certificate issuers, and this list is made available to all service providers.
During the SHAKEN verification process, the terminating service provider checks that the originating service provider’s certificate was created by a PA-approved certificate authority.
Information contained in a certificate
The SHAKEN certificate contains identifying information about the service provider that owns the certificate and the certificate authority that issued the certificate. The certificate asserts the service provider’s public key and SPID and declares a valid duration for the certificate. Here is an example with a few key fields highlighted:
"organization": "TransNexus, Inc.",
"organization": "TransNexus, Inc.",
This certificate is valid for exactly 10 minutes according to the “notBefore” and “notAfter” fields.
The “issuer” field identifies the issuing certificate authority, and the “subject” field identifies the service provider. In this example, they are the same.
The “tnAuthList” contains the service provider’s SPID. For a normal use case, the tnAuthList only contains a SPID, and the certificate can be used to verify signed phone calls with any calling number. In general, all service providers have a normal SHAKEN certificate with a tnAuthList that only contains a SPID, as in this example.
Each certificate contains the “signature” of the certificate authority. The signature indicates that the certificate authority has approved all information included in the certificate. The terminating service provider is expected to verify the certificate authority’s signature.
Finally, the certificate contains the service provider’s “publicKey”, which will be used to verify signatures created by this service provider.
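As a small illustration of the validity period, the check a verification service might apply to the “notBefore” and “notAfter” fields can be sketched as follows. The function name and the timestamps are hypothetical, chosen only to mirror a 10-minute certificate. A real verifier would also check the certificate authority’s signature and the other fields described above.

```python
from datetime import datetime, timedelta, timezone


def is_within_validity(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """Return True when `now` falls inside the certificate's validity period."""
    return not_before <= now <= not_after


# Hypothetical timestamps for a certificate that is valid for 10 minutes.
not_before = datetime(2018, 10, 30, 12, 0, tzinfo=timezone.utc)
not_after = not_before + timedelta(minutes=10)

during = is_within_validity(not_before, not_after, not_before + timedelta(minutes=5))
after = is_within_validity(not_before, not_after, not_before + timedelta(minutes=11))
```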
Proof of Possession (PoP) certificates
There are two types of certificates in the SHAKEN framework:
- The normal SHAKEN certificate, in which the tnAuthList contains the SPID, as in the example above. These certificates can be used to verify any calling number.
- The proof of possession (PoP) certificate, in which the tnAuthList contains the SPID and either a single phone number or a list of phone numbers. PoP certificates are used to verify signed calls with this list of calling numbers.
PoP certificates are used when a service provider needs to sign a call with a calling number that they do not own. For example, PoP certificates enable calls that originate from call centers to be authenticated and verified.
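The difference between the two certificate types can be sketched as a simple data structure. This is an illustrative model only: the `TnAuthList` class, its field names, the SPID value, and the phone numbers are all hypothetical and do not reflect the actual certificate encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TnAuthList:
    """Simplified, hypothetical model of a certificate's tnAuthList."""

    spid: str
    phone_numbers: frozenset = frozenset()  # empty for a normal certificate

    @property
    def is_pop(self) -> bool:
        # A PoP certificate lists one or more phone numbers next to the SPID.
        return bool(self.phone_numbers)

    def covers(self, calling_number: str) -> bool:
        # A normal certificate can verify any calling number;
        # a PoP certificate only the numbers it lists.
        if not self.is_pop:
            return True
        return calling_number in self.phone_numbers


normal_cert = TnAuthList(spid="1234")
pop_cert = TnAuthList(spid="1234", phone_numbers=frozenset({"+14045550100"}))
```

With these values, `normal_cert.covers("+14045550199")` is true, while `pop_cert.covers("+14045550199")` is false because that number is not in the PoP certificate’s list.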
Certificate access for verification
After a service provider obtains a certificate, the certificate is placed into the service provider’s certificate repository. To perform caller ID verification, the terminating service provider makes an external query to the certificate repository of the originating service provider.
The amount of time it takes to request and receive a certificate is called the latency, or round-trip time.11 The latency depends on how far the requesting server is from the server that provides the resource.
For reference, the latency (round-trip time) between any two servers in the US is typically less than 100ms. As an example, the latency between a server in Palo Alto, California and Charlottesville, Virginia was measured at 75ms.12
Decreasing latency helps to minimize post-dial delay so calls can be completed as quickly as possible.
To reduce latency during call verification, it is important to understand the verification steps that contribute to latency and solutions to minimize it.
Use certificate caching to minimize latency
Fetching a certificate from a certificate repository is an external query. This makes accessing certificates one of the largest sources of latency in the entire call verification process.
The verification service can decrease latency by saving a copy of the certificate in a cache, which is a section of the server’s memory used to store data temporarily for future use.
The first time that a terminating provider’s verification service fetches an originating provider’s certificate, the verification service requests and receives the certificate from the originating provider’s certificate repository. The verification service then stores a copy of the certificate in a local cache.
The next time that the terminating provider’s verification service needs to fetch the same originating provider’s certificate, the verification service checks the local cache for a valid certificate.13 Calls are processed much faster with certificate caching.
In the diagram, the verification service fetches a certificate from the certificate repository. The certificate repository is stored on a single server called the origin server. The blue arrows show requesting and receiving a certificate the first time, and the red arrows show subsequent certificate requests.
The verification service stores a copy of the certificate in a local cache, but that copy is only useful as long as the original certificate remains unchanged and in effect.
When a certificate changes or expires, the verification service ignores the local cache and fetches an updated certificate from the certificate repository. There are different methods the verification service uses to determine if the certificate has changed.
One method for the verification service to know when to fetch updated certificates is using expiration dates. When the certificate repository sends the certificate to the verification service, the certificate repository can also send an expiration date to indicate how long the cached certificate should be valid.14
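Expiration-based caching can be sketched as follows. This is a minimal illustration, assuming the repository response carries a lifetime in seconds. The class name, the `fetch` callback, and the injected `clock` are hypothetical, and the fake fetcher stands in for a real HTTPS request.

```python
import time
from typing import Callable, Dict, List, Tuple


class ExpiringCertificateCache:
    """Cache certificates until the lifetime supplied by the repository runs out."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[bytes, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # returns (certificate, lifetime in seconds)
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: no external query
        certificate, lifetime = self._fetch(url)  # cache miss or expired copy
        self._entries[url] = (certificate, now + lifetime)
        return certificate


# Demonstration with a fake repository and a fake clock (both hypothetical).
fetch_log: List[str] = []
fake_now = [0.0]


def fake_fetch(url: str) -> Tuple[bytes, float]:
    fetch_log.append(url)
    return b"cert-%d" % len(fetch_log), 600.0  # 10-minute lifetime


cache = ExpiringCertificateCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get("https://repository.example/cert.crt")
fake_now[0] = 599.0
second = cache.get("https://repository.example/cert.crt")  # still cached
fake_now[0] = 600.0
third = cache.get("https://repository.example/cert.crt")  # expired, fetched again
```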
Another method involves the originating provider explicitly signaling to the terminating provider when the certificate has been updated. Originating providers must include the URL of the certificate repository within the signed SIP Identity header. An originating provider can signal that their certificate has changed by modifying the URL of their certificate repository.
Old certificate: info=https://certificates.clearip.com/4a871c06-e0b5-4cb9-8347-43126bd86c85/4381.crt
Updated certificate: info=https://certificates.clearip.com/4a871c06-e0b5-4cb9-8347-43126bd86c85/1234.crt
The old copy of the certificate is erased from the local cache whenever a newer copy is received.
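URL-based signaling can be sketched with a cache that keeps one certificate per repository and replaces it when the URL in the Identity header changes. Treating everything before the final path segment as the repository key is an assumption made for this example, and `UrlSignaledCache` and `fake_fetch` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


def repository_key(info_url: str) -> str:
    """Assumption: the URL minus its final path segment identifies the repository."""
    return info_url.rsplit("/", 1)[0]


class UrlSignaledCache:
    """Keep one certificate per repository; a changed URL triggers a new fetch."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def get(self, info_url: str) -> bytes:
        key = repository_key(info_url)
        cached = self._entries.get(key)
        if cached is not None and cached[0] == info_url:
            return cached[1]  # same URL as before: reuse the local copy
        certificate = self._fetch(info_url)
        self._entries[key] = (info_url, certificate)  # overwrites the old copy
        return certificate

    def __len__(self) -> int:
        return len(self._entries)


fetched: List[str] = []


def fake_fetch(url: str) -> bytes:
    """Hypothetical stand-in for an HTTPS request to the repository."""
    fetched.append(url)
    return url.encode()


base = "https://certificates.clearip.com/4a871c06-e0b5-4cb9-8347-43126bd86c85"
old_url = base + "/4381.crt"
new_url = base + "/1234.crt"

cache = UrlSignaledCache(fake_fetch)
cache.get(old_url)  # first call: fetched from the repository
cache.get(old_url)  # same URL: served from the cache
latest = cache.get(new_url)  # changed URL: fetched, old copy replaced
```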
Certificate caching becomes less beneficial the more frequently the certificate changes, but frequent certificate updates are useful for risk management.
The certificate management service stores certificates into a service provider’s certificate repository for other service providers to access while verifying calls. But how should a service provider determine whether a certificate is valid or not?
Certificates include a detailed validity period and a verifiable certificate authority signature. That’s enough proof to validate a certificate in many cases. But not all.
In rare but severe cases there can be external factors not listed in a certificate that make a certificate invalid. If someone managed to discover another’s private key, the victim would immediately want to invalidate the matching certificate. Or if a PA-approved certificate authority is discovered to be issuing bad certificates, then it is essential to know which certificates are affected.
There are two ways to avoid potential problems with compromised certificates:
- Check the real-time status of a certificate
- Use short-lived certificates.15
World wide web approach: CRL and OCSP
The world wide web solution to this issue was to develop the certificate revocation list (CRL) and the online certificate status protocol (OCSP).16
When validating a certificate, the service provider’s verification service would request a CRL from the certificate authority. The CRL is typically a large file containing a blacklist of compromised certificates.17 After obtaining the CRL, the verification service scans through the file and searches for the specific certificate in question. If a certificate is found in the CRL, then the certificate is invalid.
Alternatively, the verification service could use OCSP to ask the certificate authority about the status of a specific certificate.18
Using CRLs and OCSP is a good idea in theory, but in practice both require significant time and resources, which can hinder performance.
Both of these approaches require communication with an external server during the verification of each call, adding latency. The median latency of OCSP queries was found to be 20 ms in 2016.19 This latency would increase call verification time and post-dial delay.
In reality, the vast majority of certificates will not become compromised, yet verification times for both compromised and uncompromised certificates would increase due to lookups for CRLs or OCSP.
Implementing CRLs or OCSP is not ideal for service providers who demand high-performance verification services with low verification times.
Fortunately, there’s a better way.
The SHAKEN approach: short-lived certificates
An alternative to CRLs and OCSP is to use short-lived certificates. These allow faster verification by removing the need for CRL or OCSP lookups. The SHAKEN standard has not decreed a specific lifespan for short-lived certificates, but the objective would be to make them short enough that compromised certificates would expire before they could be exploited.
As an example, the CMS could create short-lived certificates by generating new keys and requesting certificates every 10 minutes.
Short-lived certificates improve risk management. In the event that a bad actor discovers someone’s private key, that bad actor would have up to 10 minutes to sign calls with a forged signature before the certificate expires.
In those short 10 minutes, most service providers would not even have the opportunity to figure out that their private key had been compromised and then notify their certificate authority. Thus, short-lived certificates render CRLs and OCSP unnecessary.
Short-lived certificates shave off milliseconds from certificate verification processing times compared to CRLs and OCSP while providing security guarantees that are equivalent to or better than OCSP, according to researchers at Stanford and Carnegie Mellon.20
For high-volume service providers, short-lived certificates are essential due to their lower latency and uncompromised security standards.
But wouldn’t the costs of obtaining certificates also increase notably if certificates are generated so frequently?
Not necessarily. Without CRLs and OCSP, the costs paid by the certificate authority to create and maintain certificates decrease significantly by eliminating the need to set up hardware and software infrastructure, maintain and administer certificate management policies, manage and update certificate statuses, perform audits, and train personnel.21 By removing many supporting costs, short-lived certificates can be offered at a considerably reduced per-certificate cost compared to long-lived certificates.
Short-lived certificates are a better solution for SHAKEN certificate validation because of their lower verification latency, high security and lower cost.
Certificate repository design
The originating service provider is responsible for setting up their certificate repository so that a certificate is quickly available for terminating service providers who need to verify calls authenticated by the originating provider.
Minimize latency with a Content Delivery Network (CDN)
Content delivery networks (CDN) decrease the latency required to access the certificate repository by bringing the certificate repository closer to the users.
To do this, the originating provider stores the certificate repository on an origin server. Then the CDN distributes copies of the certificate repository across geographic locations.22 At each location, copies are stored in several dedicated cache servers.
For example, without a CDN, if a terminating service provider in Georgia needs to access the certificate repository, then the verification service must request the certificate from the only available server, which could be across the country in California.
With a CDN, if a terminating service provider in Georgia needs to access the certificate repository, then the verification service requests the certificate from the closest available cache server in nearby Alabama, reducing latency.
CDNs are most beneficial when the end user is located far from the origin server. A study by KeyCDN found that implementing CDNs decreased latency between servers in the US by a little over 50%.23 The origin server was located in Texas, and the latency was measured between the origin server and servers in New York and California.
|Server location||No CDN RTT latency (ms)||CDN RTT latency (ms)||Difference (%)|
|New York, US||36.908||18.096||-50.97%|
|San Francisco, US||39.645||18.900||-52.33%|
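The “Difference (%)” column follows directly from the two latency columns. As a quick check, the percentages can be recomputed from the figures in the table. The function name is just a label for this example.

```python
def percent_difference(without_cdn_ms: float, with_cdn_ms: float) -> float:
    """Relative change in round-trip time after adding a CDN, in percent."""
    return (with_cdn_ms - without_cdn_ms) / without_cdn_ms * 100


# (RTT without CDN, RTT with CDN) in milliseconds, taken from the table above.
measurements = {
    "New York, US": (36.908, 18.096),
    "San Francisco, US": (39.645, 18.900),
}

differences = {
    city: round(percent_difference(without_cdn, with_cdn), 2)
    for city, (without_cdn, with_cdn) in measurements.items()
}
```

Rounded to two decimals, this reproduces the table’s -50.97% and -52.33%.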
Implementing a CDN is the originating service provider’s choice, yet the higher latency from the lack of a CDN affects both the originating and terminating providers’ services, since each call must eventually be verified.
Even if the terminating provider has implemented certificate caching and short-lived certificates, latency for call verification can be compromised by the originating provider’s lack of a CDN, which is unfortunately beyond the control of the terminating provider.
To address this concern, service providers are encouraged to conduct SHAKEN interoperability testing with other providers and measure signing and verification latency.
Maximize high availability with a Content Delivery Network (CDN)
Any server can experience unintended downtime caused by a variety of factors such as large traffic volume, network unavailability, power outages, malicious attacks, etc.24 Downtime leads to unavailable services, which can cause financial losses and harm brand reputation.
Hosting the certificate repository on a highly available CDN minimizes downtime by load balancing across multiple servers and performing failover if a server becomes unavailable.
CDNs typically have servers across separate geographical locations. Each location has a data center with multiple servers. The CDN uses load balancing to distribute requests across the available servers.25 Load balancing allows the CDN to efficiently handle large spikes of traffic.
If a server becomes unavailable, then failover instantly redirects traffic to available servers in the data center to provide uninterrupted service. When the unavailable server is brought back online, it receives traffic again to remove the increased load from the other servers.
If an entire data center becomes unavailable, then certificate access requests would be sent to the nearest available data center, and the increased load would be similarly distributed. The geographic distribution of data centers offers resilience to potential problems.
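The load balancing and failover behavior described above can be sketched roughly as follows. This is a simplified model, not how any particular CDN is implemented: the `DataCenter` class, the round-robin policy, the server names, and the round-trip times are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DataCenter:
    name: str
    rtt_ms: float  # hypothetical round-trip time from the requester
    servers: Dict[str, bool]  # server name -> currently available?
    _next: int = field(default=0, repr=False)

    def pick_server(self) -> Optional[str]:
        """Round-robin over the servers that are currently available."""
        available = [name for name, up in self.servers.items() if up]
        if not available:
            return None  # the whole data center is unavailable
        server = available[self._next % len(available)]
        self._next += 1
        return server


def route(data_centers: List[DataCenter]) -> Tuple[str, str]:
    """Send the request to the nearest data center that has an available server."""
    for dc in sorted(data_centers, key=lambda d: d.rtt_ms):
        server = dc.pick_server()
        if server is not None:
            return dc.name, server
    raise RuntimeError("no server available in any data center")


alabama = DataCenter("alabama", 10.0, {"al-1": True, "al-2": True})
california = DataCenter("california", 70.0, {"ca-1": True})
centers = [california, alabama]

r1 = route(centers)  # nearest data center, first server
r2 = route(centers)  # load balanced to the second server
alabama.servers["al-1"] = False
r3 = route(centers)  # failover inside the data center
alabama.servers["al-2"] = False
r4 = route(centers)  # whole data center down: next nearest is used
```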
Redundancy from multiple servers and data centers ensures reliable service and resiliency. CDNs are designed to smoothly handle millions of server requests, real-time content delivery and thousands of concurrent downloads without putting any major stress on computing resources.26
This large capacity allows CDNs to mitigate distributed denial-of-service (DDoS) attacks, which are malicious attempts to disrupt service by overwhelming the target with a flood of traffic.27 High-capacity CDNs can absorb and diffuse DDoS attacks across servers.
STIR and SHAKEN provide many benefits to service providers and consumers and promise to rebuild trust in the telephone network. However, in achieving those benefits, service providers bear the responsibility of implementing SHAKEN solutions that not only provide basic requirements, but also minimize possible added risks such as security flaws or longer post-dial delays.
To evaluate a SHAKEN solution, a service provider should consider how the private key is kept secure, how latencies related to accessing certificate repositories or related to certificate validation are minimized, and how the certificate repository is made highly available to minimize downtime.
An effective SHAKEN solution should use envelope encryption to protect the private key, use certificate caching and short-lived certificates to reduce verification latency, and host the certificate repository on a content delivery network to further reduce verification latency and provide high availability.
1. First Orion, “Scam Call Trends and Projections Report,” September 2018, accessed October 30, 2018. https://ecfsapi.fcc.gov/file/109272058817712/FirstOrion_Scam_Trends_Report_FINAL%20(002)%20(002).pdf
2. YouMail, “Historical Robocalls by Time,” accessed October 30, 2018. https://robocallindex.com/history/time
3. CRTC, “Measures to reduce caller identification spoofing and to determine the origins of nuisance calls,” January 2018, accessed October 30, 2018. https://crtc.gc.ca/eng/archive/2018/2018-32.htm
4. Federal Communications Commission, “Chairman Pai Welcomes Call Authentication Recommendations From The North American Numbering Council,” May 14, 2018, accessed October 30, 2018. https://www.fcc.gov/document/chairman-pai-welcomes-call-authentication-framework
5. ATIS, “Joint ATIS/SIP Forum Standard – Signature-Based Handling of Asserted Information using toKENs (SHAKEN): Governance Model and Certificate Management,” July 2017, accessed October 30, 2018. http://www.atis.org/sti-ga/resources/docs/ATIS-1000080.pdf
6. Congressional Research Service, Chris Jaikaran, “Encryption: Frequently Asked Questions,” September 2016, accessed October 30, 2018. https://fas.org/sgp/crs/misc/R44642.pdf
7. Nilay Parikh, “Cloud Architecture Pattern: Envelope Encryption (or Digital Envelope) with Public Cloud Providers – Part 1,” June 2018, accessed October 30, 2018. https://blog.nilayparikh.com/security/application/cloud-architecture-patterns-envelope-encryption-or-digital-envelope-with-public-cloud-providers-part-1/
8. Amazon Web Services, Ken Beer, et al., “AWS Key Management Service Cryptographic Details,” August 2018, accessed October 30, 2018. https://d1.awsstatic.com/whitepapers/KMS-Cryptographic-Details.pdf
9. Cryptomathic, Peter Smirnoff, “Understanding Hardware Security Modules (HSMs),” September 2017, accessed October 30, 2018. https://www.cryptomathic.com/news-events/blog/understanding-hardware-security-modules-hsms
10. Thawte, Larry Seltzer, “Securing Your Private Keys as Best Practice for Code Signing Certificates,” June 2010, accessed October 30, 2018. https://www.thawte.com/code-signing/whitepaper/best-practices-for-code-signing-certificates.pdf
11. Radware, Joshua Bixby, “Latency 101: What is latency and why is it such a big deal?” April 2012, accessed October 30, 2018. http://www.webperformancetoday.com/2012/04/02/latency-101-what-is-latency-and-why-is-it-such-a-big-deal/
12. Rafay Systems, John Dilley, “How Much Does Network Latency Really Matter?” April 2018, accessed October 30, 2018. https://www.linkedin.com/pulse/how-much-does-network-latency-really-matter-john-dilley/
13. Varvy, Patrick Sexton, “Leverage browser caching,” March 2016, accessed October 30, 2018. https://varvy.com/pagespeed/leverage-browser-caching.html
14. AT&T, “Cache Expiration,” accessed October 30, 2018. https://developer.att.com/video-optimizer/docs/best-practices/cache-expiration
15. Internet Engineering Task Force, Jon Peterson, “Short-Lived Certificates for Secure Telephone Identity draft-peterson-stir-certificates-shortlived-02.txt,” March 2018, accessed October 30, 2018. https://tools.ietf.org/html/draft-peterson-stir-certificates-shortlived-02
16. Internet Engineering Task Force, Jon Peterson and Sean Turner, “OCSP Usage for Secure Telephone Identity Certificates draft-ietf-stir-certificates-ocsp-00.txt,” March 2017, accessed October 30, 2018. https://tools.ietf.org/html/draft-ietf-stir-certificates-ocsp-00
17. KeyCDN, “What Is a Certificate Revocation List (CRL)?” October 2018, accessed October 30, 2018. https://www.keycdn.com/support/certificate-revocation-list
18. Medium, Alexey Samoshkin, “SSL certificate revocation and how it is broken in practice,” January 2018, accessed October 30, 2018. https://medium.com/@alexeysamoshkin/how-ssl-certificate-revocation-is-broken-in-practice-af3b63b9cb3
19. Liang Zhu, et al., “Measuring the Latency and Pervasiveness of TLS Certificate Revocation,” Passive and Active Measurement: 17th International Conference, PAM 2016, Heraklion, Greece, April 2016, accessed October 30, 2018. https://www.springer.com/cda/content/document/cda_downloaddocument/9783319305042-c2.pdf?SGWID=0-0-45-1554680-p179872956
20. Emin Topalovic, et al., “Towards Short-Lived Certificates,” Proceedings of IEEE Oakland Web 2.0 Security and Privacy, May 2012, accessed October 30, 2018. http://www.ieee-security.org/TC/W2SP/2012/papers/w2sp12-final9.pdf
21. Symantec, “Comparing Cost of Ownership: Symantec Managed PKI Service vs. OnPremise Software,” April 2012, accessed October 30, 2018. https://www.symantec.com/content/en/us/enterprise/white_papers/b-comparing-cost-of-ownership_WP_21172470.en-us.pdf
22. Imperva Incapsula, “What is a CDN?” October 2015, accessed October 30, 2018. https://www.incapsula.com/cdn-guide/what-is-cdn-how-it-works.html
23. KeyCDN, Brian Jackson, “Website Latency With and Without a Content Delivery Network,” September 2016, accessed October 30, 2018. https://www.keycdn.com/blog/website-latency
24. phoenixNAP, Goran Jevtic, “What is High Availability? Ultimate Master Guide,” January 2018, accessed October 30, 2018. https://phoenixnap.com/blog/what-is-high-availability
25. Cloudflare, “CDN Reliability & Redundancy,” accessed October 30, 2018. https://www.cloudflare.com/learning/cdn/cdn-load-balance-reliability/
26. KeyCDN, Brian Jackson, “CDN Hosting vs Traditional Web Hosting,” February 2018, accessed October 30, 2018. https://www.keycdn.com/blog/content-delivery-networks
27. Cloudflare, “What is a DDoS Attack?” accessed October 30, 2018. https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/