OpenSearch and Fluent Bit Integration for Centralized ECS Logging

    To improve observability, reliability, and operational efficiency, we
    implemented a centralized logging solution using Amazon OpenSearch
    Service and Fluent Bit across both staging and production environments.
    This solution was aimed at aggregating logs from all ECS services in real
    time, enabling developers to analyze logs, troubleshoot issues quickly, and
    maintain security boundaries.

    Problem Statement and Existing Architecture

    The existing CloudWatch logs lacked consistency and meaningful structure, making them hard to interpret during system failures or debugging sessions. The CloudWatch query interface made it difficult to perform complex searches or filter logs efficiently across multiple ECS services. In addition, logs were scattered across different ECS tasks and services, with no unified view.

    Each developer needed an AWS account to view application logs in CloudWatch. This posed a security risk, because it meant granting AWS console access to more users than necessary.

    Objective

    The primary objective was to build a scalable and secure log aggregation
    system that:
    – Collects logs directly from ECS services.
    – Pushes logs to OpenSearch with minimal overhead.
    – Enables developers to visualize and query logs via OpenSearch
    Dashboards.
    – Supports fine-grained access control without exposing AWS infrastructure.
    – Automates index lifecycle policies in OpenSearch.

    Solution Architecture

    We deployed Fluent Bit as an ECS daemon service (DAEMON scheduling
    strategy) on EC2 launch type clusters, so each ECS node runs one Fluent Bit
    task. Fluent Bit listens for logs that application containers send through
    the `fluentd` log driver and forwards them to an OpenSearch domain hosted
    in AWS. Fluent Bit writes the log data to time-based indices using its
    Logstash-compatible index naming format.
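
    The two moving parts described above can be sketched in a few lines of Python. This is a minimal illustration, not our actual task definition: the address `localhost:24224`, the tag `orders-api`, and the prefix `ecs-logs` are hypothetical values. It assumes the Fluent Bit daemon listens on port 24224 on each node, and that index names follow the default Logstash-style pattern `<prefix>-YYYY.MM.DD` in UTC.

```python
from datetime import datetime, timezone


def fluentd_log_configuration(address: str, tag: str) -> dict:
    """Build the logConfiguration block of an ECS container definition.

    `address` and `tag` are illustrative; the real values depend on how the
    Fluent Bit daemon is exposed on each ECS node.
    """
    return {
        "logDriver": "fluentd",
        "options": {
            "fluentd-address": address,  # where the Fluent Bit daemon listens
            "tag": tag,                  # used by Fluent Bit to match records
        },
    }


def logstash_index_name(prefix: str, ts: datetime,
                        date_format: str = "%Y.%m.%d") -> str:
    """Return the daily index a Logstash-style writer would target (UTC)."""
    return f"{prefix}-{ts.astimezone(timezone.utc).strftime(date_format)}"


if __name__ == "__main__":
    print(fluentd_log_configuration("localhost:24224", "orders-api"))
    print(logstash_index_name("ecs-logs", datetime.now(timezone.utc)))
```

    Because a new index is created per day, a wildcard pattern such as `ecs-logs-*` is enough to search across all days in OpenSearch Dashboards.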

    Migrating to OpenSearch let us create users and roles inside OpenSearch itself, independent of AWS. These users can view application logs in OpenSearch Dashboards without access to the AWS console, or even an AWS account.

    We automated index lifecycle policies, which give us a framework for managing indices with minimal manual intervention. These policies manage data throughout an index’s life, which improves efficiency, reduces operational costs, and helps maintain compliance with data governance requirements.
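
    In OpenSearch, lifecycle automation of this kind is handled by Index State Management (ISM). The sketch below builds a simple two-state ISM policy document that deletes indices after a retention period. The index pattern `ecs-logs-*` and the 30-day retention are assumptions for illustration, not our production values, and the sketch covers deletion only.

```python
import json


def build_ism_policy(index_pattern: str, retention_days: int) -> dict:
    """Build an ISM policy body: keep indices 'hot', then delete them."""
    return {
        "policy": {
            "description": (
                f"Delete {index_pattern} indices after {retention_days} days"
            ),
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    # Move to 'delete' once the index is old enough.
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {
                                "min_index_age": f"{retention_days}d"
                            },
                        }
                    ],
                },
                {
                    "name": "delete",
                    "actions": [{"delete": {}}],
                    "transitions": [],
                },
            ],
            # Attach the policy automatically to newly created indices.
            "ism_template": [
                {"index_patterns": [index_pattern], "priority": 100}
            ],
        }
    }


if __name__ == "__main__":
    # Hypothetical pattern and retention period.
    print(json.dumps(build_ism_policy("ecs-logs-*", 30), indent=2))
```

    The resulting JSON can be sent as the body of `PUT _plugins/_ism/policies/<policy_id>` on the domain.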

    Implementation Steps

    1. Created a Fluent Bit Docker image with a custom configuration for the
    Amazon OpenSearch Service output.
    2. Pushed the image to Amazon ECR for use in ECS task definitions.
    3. Defined ECS services using Terraform to run Fluent Bit in DAEMON mode
    with proper CPU/memory settings.
    4. Configured IAM roles and domain policies to securely allow Fluent Bit to
    write to OpenSearch.
    5. Deployed OpenSearch domains in both staging and production with fine-grained access control enabled.
    6. Set up OpenSearch Dashboards for developer access.
    7. Created and documented the dev-team-role and admin roles for secure
    access.
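
    As an illustration of step 1, the following sketch renders a Fluent Bit configuration with a `forward` input and an `opensearch` output in the classic configuration format. The domain endpoint, region, and prefix passed in at the bottom are hypothetical placeholders, and the option set is a plausible minimum rather than the exact configuration baked into our image.

```python
def render_section(name: str, settings: dict) -> str:
    """Render one classic-format Fluent Bit section such as [INPUT]."""
    width = max(len(key) for key in settings) + 2
    body = "\n".join(
        f"    {key:<{width}}{value}" for key, value in settings.items()
    )
    return f"[{name}]\n{body}\n"


def render_fluent_bit_config(host: str, region: str, prefix: str) -> str:
    """Render a forward-input to OpenSearch-output pipeline."""
    forward_input = {
        "Name": "forward",      # receives records from the fluentd log driver
        "Listen": "0.0.0.0",
        "Port": 24224,
    }
    opensearch_output = {
        "Name": "opensearch",
        "Match": "*",
        "Host": host,
        "Port": 443,
        "tls": "On",
        "AWS_Auth": "On",        # sign requests with the task's IAM role
        "AWS_Region": region,
        "Logstash_Format": "On", # daily indices named <prefix>-YYYY.MM.DD
        "Logstash_Prefix": prefix,
        "Suppress_Type_Name": "On",
    }
    return (
        render_section("INPUT", forward_input)
        + "\n"
        + render_section("OUTPUT", opensearch_output)
    )


if __name__ == "__main__":
    # Hypothetical endpoint, region, and prefix.
    print(render_fluent_bit_config(
        "search-example.us-east-1.es.amazonaws.com", "us-east-1", "ecs-logs"
    ))
```

    Generating the file from parameters keeps the staging and production images identical apart from the domain endpoint and prefix.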

    Access and Permissions

    To ensure robust security and proper segregation of duties, we configured dedicated roles and users for the logging solution:

    Fluent Bit Role: A dedicated fluentbit-role was created to push logs to OpenSearch.
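
    A write-only IAM policy for such a role might look like the sketch below. The domain ARN is a hypothetical placeholder, and the action list is an assumption about the minimum Fluent Bit needs for bulk writes; the policy actually attached to fluentbit-role may differ.

```python
import json


def fluent_bit_write_policy(domain_arn: str) -> dict:
    """Build an IAM policy allowing HTTP writes to one OpenSearch domain."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Bulk indexing uses POST/PUT requests against the domain.
                "Action": ["es:ESHttpPost", "es:ESHttpPut"],
                "Resource": f"{domain_arn}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and domain name.
    arn = "arn:aws:es:us-east-1:123456789012:domain/example-logs"
    print(json.dumps(fluent_bit_write_policy(arn), indent=2))
```

    With fine-grained access control enabled, the IAM role also has to be mapped to an OpenSearch role that permits index writes; the IAM policy alone is not sufficient.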

    Developer Team Role: A dev-team-role was established and mapped within OpenSearch. This role gives development team members read-only access to the logs and the permissions needed to create and manage dashboards, so they can monitor application behavior without altering the underlying data or configuration. Following the principle of least privilege, team members can access only the OpenSearch functionality required for their specific tasks.

    Internal Users: For both staging and production environments, internal users were created directly within OpenSearch. These users were assigned roles that provided granular access to specific indices and functionalities, further enforcing the principle of least privilege.

    This layered approach to access and permissions ensures that while logs are seamlessly collected and accessible, the security posture of the OpenSearch domain remains strong and compliant with best practices.
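
    To make the read-only developer role concrete, here is a sketch of a role definition and role mapping in the shape expected by the OpenSearch Security REST API. The index pattern `ecs-logs-*`, the tenant choice, and the user name `alice` are assumptions for illustration; the permissions granted to the real dev-team-role may be narrower or broader.

```python
import json


def dev_team_role(index_pattern: str) -> dict:
    """Role body: read-only on log indices, write access to dashboards."""
    return {
        # Default action group needed for multi-index read requests.
        "cluster_permissions": ["cluster_composite_ops_ro"],
        "index_permissions": [
            {
                "index_patterns": [index_pattern],
                "allowed_actions": ["read"],  # no write or delete
            }
        ],
        "tenant_permissions": [
            {
                "tenant_patterns": ["global_tenant"],
                # Lets users save dashboards and visualizations.
                "allowed_actions": ["kibana_all_write"],
            }
        ],
    }


def role_mapping(users: list) -> dict:
    """Mapping body that assigns internal users to the role."""
    return {"users": users, "backend_roles": [], "hosts": []}


if __name__ == "__main__":
    # Bodies for PUT _plugins/_security/api/roles/dev-team-role
    # and PUT _plugins/_security/api/rolesmapping/dev-team-role.
    print(json.dumps(dev_team_role("ecs-logs-*"), indent=2))
    print(json.dumps(role_mapping(["alice"]), indent=2))  # hypothetical user
```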

    Benefits

    – Real-time log ingestion from ECS.
    – Centralized view of logs across all services.
    – Role-based access control for development and operations teams.
    – Improved developer productivity when debugging complex production incidents.
    – No dependency on CloudWatch Logs.
    – Automatic deletion or shrinking of old indices, which reduces search latency.
    – Scalable and cost-effective observability solution using native AWS services.
