# How to Troubleshoot CoreDNS Issues in DOKS Clusters

DOKS uses [CoreDNS](https://coredns.io/) for in-cluster DNS management. This article provides steps to diagnose CoreDNS issues, collect diagnostic information, and determine whether the problem is related to configuration, resource constraints, or network connectivity.

## Common Symptoms of CoreDNS Issues

Common symptoms of CoreDNS issues include:

- Pods that cannot resolve external domain names.
- Pods that cannot resolve internal Kubernetes service names.
- Intermittent DNS resolution failures or timeouts.
- Application logs showing DNS lookup errors.
- CoreDNS pods showing high CPU or memory utilization.
- CoreDNS pods restarting frequently.

## Diagnostic Checks

Run DNS resolution tests to help identify the type of CoreDNS issue. Start by testing internal DNS resolution from within your cluster:

```shell
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup kubernetes.default
```

A successful lookup returns the IP address of the Kubernetes API service. If this command times out or returns an error like `server can't find kubernetes.default`, CoreDNS cannot resolve internal service names.

Next, test external DNS resolution:

```shell
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup google.com
```

A successful lookup returns IP addresses for the external domain. If the lookup fails, CoreDNS cannot resolve external domains.

The combined results help narrow down the issue type:

- **Internal DNS fails, external succeeds:** The `kubernetes` plugin cannot query the Kubernetes API, or network policies block pod-to-CoreDNS communication.
- **Both internal and external DNS fail:** CoreDNS cannot reach upstream DNS servers, or the CoreDNS pods are not running.
- **Both tests succeed:** If you’re still experiencing DNS issues, the problem may be intermittent, specific to certain pods or services, or related to DNS performance rather than complete failure.
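The interpretation above can be sketched as a small shell helper that maps the exit codes of the two lookup tests to a likely failure category. This is a convenience sketch, not part of the product; the function name `classify_dns` is illustrative:

```shell
# classify_dns INTERNAL_RC EXTERNAL_RC
# Maps the exit codes of the internal and external nslookup tests
# (0 = success) to the likely failure category described above.
classify_dns() {
  internal_rc=$1
  external_rc=$2
  if [ "$internal_rc" -ne 0 ] && [ "$external_rc" -eq 0 ]; then
    echo "internal-only failure: check the kubernetes plugin and network policies"
  elif [ "$internal_rc" -ne 0 ] && [ "$external_rc" -ne 0 ]; then
    echo "total failure: check CoreDNS pod status and upstream DNS reachability"
  elif [ "$internal_rc" -eq 0 ] && [ "$external_rc" -ne 0 ]; then
    echo "external-only failure: check upstream DNS configuration"
  else
    echo "both tests pass: look for intermittent or performance issues"
  fi
}

# Usage: capture each test's exit code, then classify, for example:
#   kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup kubernetes.default; internal=$?
#   kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup google.com; external=$?
#   classify_dns "$internal" "$external"
```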
Proceed to gather diagnostic information with particular focus on identifying which pods are affected and when the issues occur.

## Gather Diagnostic Information

Gather detailed information about your cluster, CoreDNS pods, and configuration to help identify the root cause and provide context for troubleshooting or support requests.

### Establish the Incident Timeline and Scope

Record the times (in UTC) when DNS resolution issues began and ended. Accurate timestamps are important for correlating logs and metrics across different systems during troubleshooting.

Note the cluster’s ID, name, and datacenter region by going to the **Settings** tab in the control panel, or by running:

```shell
kubectl cluster-info
```

Document whether the issue affects all pods, specific deployments, or particular node pools.

### Gather Application Pod Information

Identify which pods are experiencing CoreDNS issues and their node locations. This helps determine whether the problem is isolated to specific nodes or affects the entire cluster.

List your application pods and note which nodes they’re running on, replacing `<namespace>` with your application’s namespace:

```shell
kubectl get pods -n <namespace> -o wide
```

List CoreDNS pods and note their node locations:

```shell
kubectl get pods -n=kube-system -o wide | grep -i coredns
```

If affected application pods are all on the same node as a specific CoreDNS pod, the issue may be node-specific rather than cluster-wide.

### Collect CoreDNS Pod Status and Logs

Verify that CoreDNS pods are running:

```shell
kubectl get pods -n=kube-system -l=k8s-app=kube-dns
```

Healthy CoreDNS pods show `STATUS: Running` and `READY: 1/1`. If pods show `CrashLoopBackOff`, `Pending`, or frequent restarts, CoreDNS cannot serve DNS queries.

Collect logs from all CoreDNS containers with timestamps:

```shell
kubectl logs --timestamps -l=k8s-app=kube-dns --all-containers=true -n=kube-system
```

When reviewing logs, note error patterns such as:

- `timeout` or `i/o timeout`: Upstream DNS servers unreachable.
- `SERVFAIL`: Upstream DNS server cannot resolve the query.
- `read: connection refused`: Cannot reach upstream DNS servers.

Check for CoreDNS-related events:

```shell
kubectl get events -n=kube-system
```

If you forward logs to external systems like OpenSearch or Loki, retrieve CoreDNS logs from those systems for the incident time frame.

### Analyze Resource Utilization

Check the current resource allocation and usage for CoreDNS pods:

```shell
kubectl top pods -n=kube-system -l=k8s-app=kube-dns
```

If usage is consistently above 80-90% of limits, CoreDNS is resource-constrained.

Check for `OOMKilled` or eviction events:

```shell
kubectl get events -n=kube-system | grep -i "oom\|evict\|memory"
```

If you have monitoring tools like Prometheus and Grafana, review metrics for CoreDNS pods covering at least 30 minutes before and after the incident.

If resource constraints are identified, see [How can I improve the performance of cluster DNS?](https://docs.digitalocean.com/support/how-can-i-improve-the-performance-of-cluster-dns/index.html.md) for scaling strategies.

### Review CoreDNS Configuration

View your CoreDNS deployment details:

```shell
kubectl describe deployment coredns -n=kube-system
```

Check for configuration issues that can cause DNS failures, such as insufficient replicas (the default is 2) or image version mismatches with your Kubernetes version.

View the CoreDNS ConfigMap:

```shell
kubectl get configmap -n=kube-system coredns -o yaml
```

Check for a custom CoreDNS configuration:

```shell
kubectl get configmap -n=kube-system coredns-custom -o yaml
```

Review these ConfigMaps for potential configuration issues, such as custom upstream DNS servers pointing to incorrect addresses. For more information, see [How to Customize CoreDNS in DOKS](https://docs.digitalocean.com/products/kubernetes/how-to/customize-coredns/index.html.md).
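For reference when reviewing the output, a `coredns-custom` ConfigMap that overrides the upstream resolvers typically looks something like the following sketch. This is an assumption-laden example, not a recommended configuration: the `upstream.override` key name follows the override-file convention described in the customization guide, and the resolver addresses are illustrative placeholders. Verify that any addresses you find in your own ConfigMap are reachable from the cluster.

```shell
# Illustrative only: a coredns-custom ConfigMap that forwards all queries
# to explicit upstream resolvers. The key name "upstream.override" and the
# 1.1.1.1/8.8.8.8 addresses are placeholders for this sketch; do not apply
# this blindly to a production cluster.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  upstream.override: |
    forward . 1.1.1.1 8.8.8.8
EOF
```

A common failure mode is an override like this pointing at a resolver that is unreachable from the cluster's VPC, which produces the `i/o timeout` log pattern described above.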
Check for pod-level DNS overrides, replacing `<namespace>` with your application’s namespace:

```shell
kubectl get pod -n <namespace> -o yaml | grep -A 10 "dnsPolicy\|dnsConfig"
```

If `dnsPolicy` is set to `None` or `Default`, the pod bypasses CoreDNS and uses the node’s DNS resolver.

### Investigate Node Conditions

Check if nodes are using shared or dedicated CPU Droplets:

```shell
kubectl get nodes --show-labels | grep node.kubernetes.io/instance-type
```

Shared Droplets can experience CPU steal during high load, causing intermittent CoreDNS slowness.

Check if nodes show pressure or have a false `Ready` condition:

```shell
kubectl describe nodes
```

## Common Issues and Solutions

Refer to the table below to match your symptoms with common CoreDNS issues and recommended solutions.

| Issue | Symptoms | Solution |
|---|---|---|
| **Resource constraints** | High CPU/memory usage, slow resolution, timeouts | Scale horizontally (add replicas) or vertically (increase limits). See the [DNS performance guide](https://docs.digitalocean.com/support/how-can-i-improve-the-performance-of-cluster-dns/index.html.md) |
| **External DNS fails** | External domains don’t resolve, internal works | Check upstream DNS configuration, verify network connectivity to DigitalOcean DNS |
| **Internal DNS fails** | Kubernetes services don’t resolve, external works | Check `kubernetes` plugin configuration, verify cluster domain (default: `cluster.local`) |
| **High query rate** | CoreDNS at resource limits, high request rate | Enable NodeLocal DNSCache. See the [DNS performance guide](https://docs.digitalocean.com/support/how-can-i-improve-the-performance-of-cluster-dns/index.html.md) |
| **Frequent restarts** | `OOMKilled` events, high restart count | Increase memory limits or enable caching |

If you’re unable to resolve the issue, [open a support ticket](https://cloud.digitalocean.com/support/tickets/new) and provide the details gathered in the previous steps:

- Incident timeline (start and end times in UTC)
- Cluster ID and region
- Symptoms observed (internal/external resolution failures)
- CoreDNS pod logs (attach the full log file)
- CoreDNS ConfigMap configuration
- Node and pod information (affected nodes, resource utilization)
- Any custom CoreDNS configuration or pod-level DNS overrides
- Monitoring data (if available)

## Related Topics

[How can I improve the performance of cluster DNS?](https://docs.digitalocean.com/support/how-can-i-improve-the-performance-of-cluster-dns/index.html.md): Enable DNS caching, use non-shared machine types for the cluster, and scale out or reduce DNS traffic.

[Why can't my VPC-native pods connect to my Droplets?](https://docs.digitalocean.com/support/why-cant-my-vpc-native-pods-connect-to-my-droplets/index.html.md): For Droplets created before 2 October 2024, you must manually add VPC peering routes to interconnect with VPC-native DOKS clusters.

[How to Troubleshoot Load Balancer Health Check Issues](https://docs.digitalocean.com/support/how-to-troubleshoot-load-balancer-health-check-issues/index.html.md): Health checks often fail due to firewalls or misconfigured backend server software.