How to Set Up and Run Jaeger With Docker and Docker Compose

Step-by-step guide on how to run Jaeger locally using Docker or Docker Compose, expose the UI, and send traces using OpenTelemetry (OTLP).

Jaeger is an open-source distributed tracing system that helps you monitor and troubleshoot requests as they flow through a microservices architecture. In simple terms, Jaeger shows you where time is spent in a request and which services were involved.

Distributed tracing is crucial in microservices environments because it allows you to:

  • Identify performance bottlenecks
  • Debug and troubleshoot issues across services
  • Understand the flow of requests through your system
  • Optimize your application’s overall performance

Jaeger is compatible with OpenTelemetry, a collection of tools, APIs, and SDKs for instrumenting, generating, collecting, and exporting telemetry data. Any application instrumented with an OpenTelemetry SDK can export its traces to Jaeger over OTLP, so you are not tied to Jaeger-specific client code.

While Jaeger does add some overhead to your application, it’s designed to be lightweight. The impact can be minimized through proper sampling strategies and configuration. In most cases, the performance impact is negligible compared to the benefits of distributed tracing.

With Jaeger you can:

  • Monitor and troubleshoot distributed workflows
  • Identify performance bottlenecks
  • Track down root causes
  • Analyze service dependencies

How is Jaeger different from other tracing systems like Zipkin?

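Jaeger and Zipkin are both open-source distributed tracing systems. Jaeger offers features such as adaptive sampling, and it separates ingestion (Collector) from querying (Query) so that each can be scaled independently. Jaeger also accepts OpenTelemetry's OTLP protocol natively and can expose a Zipkin-compatible endpoint, so existing Zipkin instrumentation can report to it.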

This guide will walk you through the process of implementing Jaeger, from setup to advanced usage.

Setting up Jaeger (all-in-one)

Jaeger provides an all-in-one container image (jaegertracing/all-in-one) which bundles the collector, query, and in-memory storage. This is perfect for local development and demos (not production).

You need to have Docker installed on your system. Confirm that it is working using this command:

$ docker --version

Docker version 27.4.1, build b9d17ea

Pull the Jaeger All-in-One image:

docker pull jaegertracing/all-in-one:1.57

Run the Jaeger All-in-One container:

docker run --rm --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 16686:16686 \
  -p 14268:14268 \
  -p 14250:14250 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.57

You can use Docker Compose to achieve the same:

version: "3.9"
services:
  jaeger:
    image: jaegertracing/all-in-one:1.57
    container_name: jaeger
    ports:
      - "16686:16686" # Jaeger UI
      - "14268:14268" # Collector HTTP (Jaeger Thrift)
      - "14250:14250" # Collector gRPC (Jaeger)
      - "4317:4317" # OpenTelemetry gRPC
      - "4318:4318" # OpenTelemetry HTTP
      - "6831:6831/udp" # Jaeger agent (compact thrift)
      - "6832:6832/udp" # Jaeger agent (binary thrift)
      - "9411:9411" # Zipkin compatibility
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411 # Zipkin endpoint stays off unless this is set
    restart: unless-stopped

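This runs the all-in-one configuration of Jaeger, which combines the collector and query components in a single process and keeps trace data in transient in-memory storage, so traces are lost when the container stops. If you use the Compose file, save it as docker-compose.yml and start it with docker compose up -d.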

  • Access the Jaeger UI by opening a web browser and navigating to http://localhost:16686.
  • Verify the installation by sending test spans. Use an OpenTelemetry SDK to instrument a simple application and generate traces; the older Jaeger client libraries are deprecated in favor of OpenTelemetry.

Jaeger User Interface

To access the Jaeger UI, navigate to http://localhost:16686 in your web browser. The UI provides a comprehensive view of the tracing data, including:

  • Search: A form for finding traces by service, operation, tags, and duration.
  • Trace view: A timeline of every span in a single trace, showing how operations nest and how long each one took.
  • Span details: The tags, logs, and process information recorded for an individual span.
  • System Architecture: A graph of your services and the calls between them.

Tip: After you send a few spans, select your service in the dropdown and click Find Traces.
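The same check can be scripted. The sketch below asks the query service which services it has seen, using the /api/services endpoint that the UI itself calls. That endpoint is the UI's internal JSON API rather than a documented, stable contract, so treat the URL and the response shape (a "data" list of names) as assumptions that may change between Jaeger versions.

```python
import json
import urllib.request

# Assumption: the query service is published on localhost:16686,
# as in the port mapping used in this guide.
JAEGER_UI = "http://localhost:16686"


def parse_services(payload: dict) -> list[str]:
    """Extract service names from an /api/services response body."""
    # Results are wrapped in a "data" field, which may be null when
    # no service has reported spans yet.
    return sorted(payload.get("data") or [])


def list_services(base_url: str = JAEGER_UI, timeout: float = 5.0) -> list[str]:
    """Ask Jaeger which services it has received spans from."""
    with urllib.request.urlopen(f"{base_url}/api/services", timeout=timeout) as resp:
        return parse_services(json.load(resp))
```

Calling list_services() against a running container should return the names that also appear in the UI's service dropdown.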

Ports exposed by the Jaeger all-in-one container

The all-in-one container exposes several ports. The most commonly used ones are:

  • 16686/tcp: Jaeger UI (Query service)
  • 4317/tcp: OTLP gRPC receiver (OpenTelemetry)
  • 4318/tcp: OTLP HTTP receiver (OpenTelemetry)
  • 14268/tcp: Jaeger collector HTTP (Thrift)
  • 14250/tcp: Jaeger collector gRPC
  • 6831/udp and 6832/udp: Jaeger agent receivers (legacy Jaeger clients)
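  • 9411/tcp: Zipkin compatibility endpoint (only active when the COLLECTOR_ZIPKIN_HOST_PORT environment variable is set, for example to :9411)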

If you’re using OpenTelemetry, you’ll typically send spans to 4317 or 4318 and view traces in the UI on 16686.
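If traces do not show up, a quick first step is to confirm that the published ports accept connections. The following is a minimal sketch using only the standard library; the function names are made up for this example, and the port table simply mirrors the TCP entries in the list above.

```python
import socket

# TCP ports from the list above and what listens on each.
# 6831/udp and 6832/udp are left out: UDP has no handshake to probe.
JAEGER_TCP_PORTS = {
    16686: "Jaeger UI (Query service)",
    4317: "OTLP gRPC receiver",
    4318: "OTLP HTTP receiver",
    14268: "Collector HTTP (Thrift)",
    14250: "Collector gRPC",
    9411: "Zipkin compatibility endpoint",
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_jaeger_ports(host: str = "localhost") -> dict[int, bool]:
    """Probe every known Jaeger TCP port and report which accept connections."""
    return {port: is_port_open(host, port) for port in JAEGER_TCP_PORTS}
```

A successful connection only shows that something is listening on the host side. Depending on how Docker publishes the port, that can be true even when the receiver inside the container is disabled, so a final check in the UI is still worthwhile.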

Jaeger Architecture

Jaeger’s architecture consists of several key components:

  • Client Libraries: These libraries instrument your application code and create spans. Jaeger's own client libraries are deprecated; new applications should use OpenTelemetry SDKs, which can send spans straight to the Collector over OTLP.
  • Agent: A network daemon that listens for spans sent over UDP by legacy Jaeger clients, batches them, and forwards them to the Collector. It is deprecated and is not needed when you use OpenTelemetry SDKs.
  • Collector: Receives traces from SDKs or from the Agent and runs them through a processing pipeline. It then stores them in a storage backend.
  • Query: A service that retrieves traces from storage and hosts a UI to display them.
  • UI: A web interface for searching and analyzing traces.

Data flows from your instrumented application to the Collector (directly over OTLP, or through the Agent for legacy clients), and then to storage. The Query service retrieves this data from storage to display in the UI.

Jaeger supports multiple storage options, including Cassandra, Elasticsearch, and in-memory storage (for development). The choice of storage depends on your scalability needs and existing infrastructure.

Sampling plays a crucial role in Jaeger’s architecture. It allows you to control the amount of tracing data you collect, which is essential for managing performance and storage costs in high-traffic systems.
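To make the idea concrete, here is a small sketch of probabilistic (ratio-based) sampling written from scratch. It is an illustration of the technique, not Jaeger's or OpenTelemetry's actual implementation, and should_sample and kept_fraction are hypothetical names. In a real application you would more likely pass a ready-made sampler, such as the OpenTelemetry Python SDK's TraceIdRatioBased, to the TracerProvider.

```python
import random


def should_sample(trace_id: int, ratio: float) -> bool:
    """Decide whether to keep a trace, using only its ID.

    Because the decision depends on the trace ID alone, every service that
    sees the same trace reaches the same verdict without coordination.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Treat the low 64 bits of the ID as a uniform number and keep the
    # trace if it falls below ratio * 2**64.
    threshold = int(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < threshold


def kept_fraction(ratio: float, n: int = 100_000, seed: int = 42) -> float:
    """Simulate n random 128-bit trace IDs and report the share that is kept."""
    rng = random.Random(seed)
    kept = sum(should_sample(rng.getrandbits(128), ratio) for _ in range(n))
    return kept / n
```

With a ratio of 0.1, kept_fraction returns a value close to 0.1, which means roughly a tenth of the tracing data reaches storage.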

To instrument your application, prefer using the OpenTelemetry SDK and export spans to Jaeger via OTLP.

Example (Python) exporting spans to Jaeger over OTLP gRPC (localhost:4317):

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

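# Name the service as it should appear in the Jaeger UI
resource = Resource.create({"service.name": "my-service"})
provider = TracerProvider(resource=resource)
# Batch spans and export them to Jaeger's OTLP gRPC receiver without TLS
exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)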

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("TestSpan") as span:
    span.set_attribute("hello", "world")
    # Your code here

You should then see my-service in the Jaeger UI.

Note: to run this example, install dependencies:

pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc
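If you want to smoke-test the OTLP HTTP port (4318) without installing any packages, you can post a hand-built span in OTLP's JSON encoding. This is a sketch under a few assumptions: the receiver accepts JSON at the standard /v1/traces path, the container publishes 4318 as shown earlier, and the helper names and the "manual-smoke-test" scope name are made up for this example. For real instrumentation, the SDK example above remains the better route.

```python
import json
import secrets
import time
import urllib.request

# Assumption: the all-in-one container from this guide is publishing 4318.
OTLP_HTTP_TRACES_URL = "http://localhost:4318/v1/traces"


def build_otlp_payload(service_name: str, span_name: str, duration_ms: int = 50) -> dict:
    """Build an OTLP/JSON export request containing a single span."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    # OTLP/JSON encodes IDs as hex: 16 bytes for the trace, 8 for the span.
                    "traceId": secrets.token_hex(16),
                    "spanId": secrets.token_hex(8),
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                    "attributes": [
                        {"key": "hello", "value": {"stringValue": "world"}}
                    ],
                }],
            }],
        }]
    }


def send_span(payload: dict, url: str = OTLP_HTTP_TRACES_URL) -> int:
    """POST the payload to the OTLP HTTP receiver and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Calling send_span(build_otlp_payload("my-service", "TestSpan")) against a running container should make a one-span trace searchable under my-service.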

Advanced Instrumentation Techniques

Once basic tracing works, you can apply more advanced instrumentation techniques:

  • Custom Samplers: Create samplers that make intelligent decisions about which traces to sample based on your specific needs.
  • Baggage: Use baggage to pass data along the entire trace, which can be useful for correlating information across services.
  • Multiple Spans: Create and manage multiple spans within a single trace to represent different operations or sub-operations.
  • Logging Integration: Integrate Jaeger with your logging system to enhance debugging capabilities.
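Of these techniques, baggage is the least visible, so a small illustration may help. Baggage travels between services as an HTTP header named baggage, defined by the W3C Baggage specification. The sketch below shows only that wire format, with hypothetical helper names and the simplifying assumption that keys are plain tokens needing no escaping. In application code you would normally let the OpenTelemetry API (for example opentelemetry.baggage.set_baggage and get_baggage in Python) and its propagators handle this for you.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries: dict[str, str]) -> str:
    """Serialize key/value pairs into a W3C `baggage` header value."""
    # Values are percent-encoded; keys are assumed to be plain tokens.
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def decode_baggage(header: str) -> dict[str, str]:
    """Parse a `baggage` header value back into a dict, ignoring properties."""
    entries: dict[str, str] = {}
    for member in header.split(","):
        # Drop optional ";property" metadata that may follow the value.
        pair = member.split(";", 1)[0].strip()
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries
```

For example, encode_baggage({"tenant.id": "acme"}) produces the header value tenant.id=acme, which a downstream service can read back with decode_baggage.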

Jaeger in a Production Environment

When deploying Jaeger in production, consider the following:

  • Scalability: Each component (Agent, Collector, Query) can be scaled independently. Use load balancers to distribute traffic.
  • Storage: Implement a production-ready storage backend like Elasticsearch or Cassandra. Ensure proper sizing and configuration for your expected data volume.
  • Security: Set up secure communication between Jaeger components using TLS. Implement authentication and authorization for the Jaeger UI.
  • Sampling: Implement appropriate sampling strategies based on your traffic patterns and tracing needs. Dynamic sampling can help balance data collection and system performance.
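As one example of the sampling point, the Jaeger 1.x Collector can serve per-service sampling rates from a JSON strategies file (passed with the --sampling.strategies-file flag) to SDKs that are configured for remote sampling. The sketch below generates such a file. The service names and rates are hypothetical, and you should check the file layout against the documentation for your Jaeger version before relying on it.

```python
import json


def build_sampling_strategies(default_rate: float, per_service: dict[str, float]) -> dict:
    """Build a probabilistic sampling configuration in the strategies-file shape."""
    for name, rate in [("default", default_rate), *per_service.items()]:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"sampling rate for {name} must be in [0, 1], got {rate}")
    return {
        "default_strategy": {"type": "probabilistic", "param": default_rate},
        "service_strategies": [
            {"service": service, "type": "probabilistic", "param": rate}
            for service, rate in sorted(per_service.items())
        ],
    }


# Hypothetical services: keep every checkout trace, 5% of search traces,
# and 10% of everything else.
strategies = build_sampling_strategies(0.1, {"checkout": 1.0, "search": 0.05})
print(json.dumps(strategies, indent=2))
```

Generating the file from code keeps the rates reviewable and validated in one place, which matters once several teams share a single Collector.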

Analyzing Traces with Jaeger UI

The Jaeger UI provides powerful tools for analyzing traces:

  • Use the search functionality to find relevant traces based on service, operation, tags, or duration.
  • Examine the trace timeline to understand the relationships between spans and identify long-running operations.
  • Inspect span details, including tags and logs, to gather context about each operation.
  • Use the comparison view to analyze multiple traces side by side and identify patterns or anomalies.
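The timeline inspection described above can also be approximated in code. This sketch pulls traces from the query service's /api/traces endpoint and ranks the spans of a trace by duration. As with any use of the UI's internal JSON API, the endpoint, its parameters, and the field names (spans, processes, operationName, and duration in microseconds) are assumptions that may differ between Jaeger versions.

```python
import json
import urllib.parse
import urllib.request


def slowest_spans(trace: dict, top: int = 3) -> list[tuple[str, str, float]]:
    """Return (service, operation, duration_ms) for the longest spans in a trace."""
    processes = trace.get("processes", {})
    rows = []
    for span in trace.get("spans", []):
        service = processes.get(span.get("processID"), {}).get("serviceName", "unknown")
        # Span durations are assumed to be reported in microseconds.
        rows.append((service, span["operationName"], span["duration"] / 1000))
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]


def find_traces(service: str, limit: int = 20,
                base_url: str = "http://localhost:16686") -> list[dict]:
    """Fetch recent traces for a service from the query service's JSON API."""
    query = urllib.parse.urlencode({"service": service, "limit": limit})
    with urllib.request.urlopen(f"{base_url}/api/traces?{query}", timeout=5) as resp:
        return json.load(resp).get("data") or []
```

Running slowest_spans over each result of find_traces("my-service") gives a rough list of long-running operations to open in the UI for closer inspection.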

Best Practices for Effective Jaeger Implementation

To get the most out of Jaeger:

  • Follow consistent naming conventions for services and operations to make searching and filtering easier.
  • Implement appropriate sampling strategies to balance data collection and system performance.
  • Ensure proper error handling and logging within instrumented code to provide context for issues.
  • Regularly review and optimize your tracing implementation to ensure it continues to meet your needs as your system evolves.

Service Performance Monitoring (SPM) with Jaeger

Service Performance Monitoring (SPM) is a feature in Jaeger that provides service-level and operation-level aggregation of key metrics like request rates, error rates, and durations (latencies). It helps identify interesting traces (e.g. high QPS, slow or erroneous requests) without needing to know the service or operation names up-front.

How does SPM work?

SPM works by aggregating span data from Jaeger to produce RED (Request, Error, Duration) metrics. The OpenTelemetry Collector’s Span Metrics Processor (superseded by the Span Metrics Connector in newer Collector releases) generates these metrics from the incoming traces.

Key features of SPM

  • Service-level and operation-level aggregation of request rates, error rates, and latencies (P95, P75, P50)
  • “Impact” metric computed as the product of latency and request rate to highlight high-impact operations
  • Pre-populated Trace search with relevant service, operation and lookback period for interesting traces
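To show what these aggregates mean, the following sketch computes them by hand for a batch of spans. It is only an arithmetic illustration: real SPM derives the values from metrics stored in a metrics backend, not from Python lists. SpanRecord and red_metrics are hypothetical names, the percentiles use the simple nearest-rank method, and the choice of P95 as the latency in the "impact" product is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class SpanRecord:
    operation: str
    duration_ms: float
    is_error: bool


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def red_metrics(spans: list[SpanRecord], window_seconds: float) -> dict[str, dict[str, float]]:
    """Aggregate spans per operation into request rate, error rate, and latency."""
    by_operation: dict[str, list[SpanRecord]] = {}
    for span in spans:
        by_operation.setdefault(span.operation, []).append(span)

    result = {}
    for operation, group in by_operation.items():
        durations = sorted(s.duration_ms for s in group)
        rate = len(group) / window_seconds
        p95 = percentile(durations, 95)
        result[operation] = {
            "request_rate": rate,  # requests per second
            "error_rate": sum(s.is_error for s in group) / len(group),
            "p50_ms": percentile(durations, 50),
            "p75_ms": percentile(durations, 75),
            "p95_ms": p95,
            # Assumption: impact = P95 latency x request rate.
            "impact": p95 * rate,
        }
    return result
```

An operation that is both slow and frequently called ends up with the highest impact value, which is why sorting by it surfaces the operations most worth investigating.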

Accessing SPM in Jaeger

The SPM feature can be accessed from the “Monitor” tab in the Jaeger UI. The Jaeger 1.57 image used in this guide includes the tab, but it only shows data once a metrics backend has been configured.

Limitations

SPM is still an experimental feature and may have further changes in the future.

It typically requires an OpenTelemetry Collector configured with the span metrics processor (or its successor, the span metrics connector), and a metrics backend to store and query the generated metrics.

Conclusion

Jaeger is a powerful open-source tool for distributed tracing in microservices architectures. Proper setup and instrumentation are crucial for effective use of Jaeger. Understanding Jaeger’s architecture helps in optimizing its deployment. Analyzing traces through the Jaeger UI can significantly improve system performance and debugging capabilities.
