Introduction to IDP (Internal Developer Platform)

DevOps is not about clouds, containers, Kubernetes, monitoring, service mesh, and all these tools. The main goal is to enable developers to be self-sufficient.

Managing all these tools, however, is not DevOps. SRE teams, or whatever your company calls them, are responsible for deeply understanding these tools and maintaining them.

There's no point in making an infinity of tools available to all developers because they often won't know how to use them correctly. Nor will they have the time or necessary knowledge to understand the details of Kubernetes, clouds, or to solve complex problems related to these technologies. These tools should be managed by teams specialized in operations.

The challenge: how to deal with developers without deep operational knowledge? It's impossible for everyone to know everything. Developers generally focus their efforts on writing code, while operations personnel (sometimes called DevOps) take responsibility for the next steps. The transition of these responsibilities is done with the help of pipelines, GitOps, and other techniques that integrate work between teams.

However, we should empower developers to be able to perform these tasks on their own, such as managing clusters, configuring infrastructure, deploying their applications, running continuous integrations (CI), among others. The goal is to create self-sufficient teams that combine development and operations.

Delivering a cloud account and expecting developers to solve everything on their own is not efficient. They would face a huge learning curve and make many mistakes due to lack of experience.

DevOps means creating services that other teams in the company can consume to become self-sufficient. DevOps customers are developers.

In this sense, DevOps is deeply related to creating IDPs (Internal Developer Platforms). An IDP is a platform built inside the company for its developers, who are its consumers.

The IDP is a layer that allows teams to perform necessary operations through self-service. This platform is created and maintained by operations teams, SREs, DevOps, and security.

Instead of overloading the operations team with repetitive tasks to meet the development team's demands, we can focus them on creating reusable services that increase productivity on both sides.

With an IDP, the development team doesn't need to be blocked waiting for the operations team to solve a request. Many of these requests are recurring and could be automated or standardized. If the development team can solve these issues autonomously, we eliminate blockers and reduce the workload of the operations team.

The goal is to provide a platform that enables the complete operations lifecycle, including:

Changing desired states
Performing certain operations or actions
Converging actual state to desired state
Observing the system's actual state

Actual State

The actual state represents all resources currently running in the system. It reflects what is operating at the moment to meet business needs.

We can divide the actual state into three main categories:

Providers: These are the services and platforms we use as a system base. Examples include:
- AWS
- Azure
- Google Cloud
- On Premise
- DataDog, Dynatrace, New Relic
- Elastic
- Splunk
- Among others
Infrastructure: Represents the components that support the system, such as:
- Servers
- Clusters
- Databases
Applications: Includes the software that composes and supports the business:
- Our own applications, developed internally
- Third-party applications that integrate or complement the system

To ensure the actual state meets company objectives, it's essential to define and manage the desired state, which will guide adjustments, improvements, and convergence toward the ideal system.

Desired State

The desired state is the definition of how the actual state should be. It is created through code, manifests, configuration files, and other artifacts. However, this definition can become extremely complex, as it involves different levels of knowledge and areas of expertise.

Challenges of Desired State

Application Configuration in Kubernetes
- To define an application in Kubernetes, we need to create multiple manifests to configure resources like deployments, services, ingress, etc.
- It's also necessary to build container images and manage the entire application lifecycle.
- If the same application runs on another platform, like physical servers or another cloud, configurations can be completely different.
Infrastructure Definition in AWS (e.g., EKS)
- An EKS cluster requires several pieces to be provisioned and configured: the EKS cluster itself, node groups, an Internet Gateway, a VPC, and subnets, among others.

These examples show that the desired state is composed of multiple interdependent pieces that need to be grouped to achieve the objective. Only people with technical knowledge about these building blocks can do this correctly.
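The interdependence between these pieces can be sketched in a few lines. The dependency edges below are illustrative assumptions, not a complete EKS blueprint; the point is only that the building blocks have to be created in an order that respects their dependencies, which is exactly the knowledge a developer should not need to carry.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key depends on the resources in its set.
# The names and edges are illustrative, not a complete EKS blueprint.
EKS_DEPENDENCIES = {
    "vpc": set(),
    "internet_gateway": {"vpc"},
    "subnets": {"vpc"},
    "eks_cluster": {"subnets"},
    "node_groups": {"eks_cluster"},
}


def provisioning_order(dependencies):
    """Return an order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, resource in enumerate(provisioning_order(EKS_DEPENDENCIES), start=1):
        print(f"{step}. {resource}")
```

Whatever order the sorter picks, the VPC always comes first and the node groups always come after the cluster.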

Tools

We can't avoid having tools; we need them.

There are other groups of tools, but for an IDP these are the most important types, the ones we need in order to enable self-service.

Pipelines: Tools for automating CI/CD stages.
GitOps: To synchronize desired state with actual state.
Infrastructure: To manage infrastructure.
RBAC: For access control (authorization) and IDP security.

User Interface

The IDP is the central point that connects everything and allows developers to change desired states and observe actual state. This system's interface can take different forms:

Web: A graphical interface accessible through a browser.
CLI: A command-line interface.
IDE: Extensions or plugins integrated into development tools.

The ideal is to offer a combination of these options to meet different preferences and needs.

Simplicity and Usability

The main goal of user interfaces for IDPs is to make things easier. For this, it's essential to hide operational complexity. This means:

Abstracting individual tools and interfaces behind a unified layer;
Creating simple, custom-made interfaces that anyone can understand and use.

For all this to work, we need a robust API below the user interface. We can't depend on an interface that just executes commands directly in the CLI, because that's not enough for all cases. The API must:

Execute actions consistently;
Provide information when queried;
Be a central point for system interaction.

The challenge arises when dealing with multiple tools. Having 100 different tools means dealing with 100 distinct APIs, which increases complexity. We need a single central API that the user interface can use as a base.
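As a rough sketch of what "a single central API" means, the example below routes every request through one entry point and hides the individual tools behind adapters. All class and method names (`PlatformAPI`, `Backend`, `InMemoryBackend`, `apply`, `status`) are hypothetical and exist only for illustration; a real platform would put actual tool integrations behind the adapters.

```python
from typing import Protocol


class Backend(Protocol):
    """What every tool adapter must offer to the central API (hypothetical)."""

    def apply(self, name: str, spec: dict) -> None: ...

    def status(self, name: str) -> dict: ...


class InMemoryBackend:
    """Stand-in for a real tool; it just remembers what it was asked to do."""

    def __init__(self) -> None:
        self._resources: dict[str, dict] = {}

    def apply(self, name: str, spec: dict) -> None:
        self._resources[name] = dict(spec)

    def status(self, name: str) -> dict:
        if name not in self._resources:
            return {"exists": False}
        return {"exists": True, "spec": self._resources[name]}


class PlatformAPI:
    """Single entry point that routes each request to a backend by kind."""

    def __init__(self) -> None:
        self._backends: dict[str, Backend] = {}

    def register(self, kind: str, backend: Backend) -> None:
        self._backends[kind] = backend

    def apply(self, kind: str, name: str, spec: dict) -> None:
        self._backend_for(kind).apply(name, spec)

    def status(self, kind: str, name: str) -> dict:
        return self._backend_for(kind).status(name)

    def _backend_for(self, kind: str) -> Backend:
        if kind not in self._backends:
            raise KeyError(f"no backend registered for kind {kind!r}")
        return self._backends[kind]


if __name__ == "__main__":
    api = PlatformAPI()
    api.register("Database", InMemoryBackend())
    api.apply("Database", "orders", {"engine": "postgres"})
    print(api.status("Database", "orders"))
```

The web interface, the CLI, and the IDE plugin would all talk to this one layer, so adding a tool means adding an adapter rather than teaching every interface a new API.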

Today, Kubernetes is the main candidate for this function. It offers an extensible and universal API. This doesn't mean everything needs to run on Kubernetes, but rather that it can be used as an abstraction layer to manage diverse operations.

Kubernetes is not trying to be the only platform you should use to run your applications, but it is trying to become the only API that matters. We can use it not just to run containers inside a cluster but to manage almost anything, because it was designed to be extensible and to interact with external systems.

Universal API

Kubernetes is not limited to running containers. It is a powerful, extensible API, backed by a scheduler and a set of controllers, that can be used to manage almost any type of resource. Containers represent only the first wave of adoption and are a small part of what Kubernetes can and should manage.

Additionally, with the Controller Manager, Kubernetes performs continuous reconciliation, taking on operational tasks that would normally be done manually. This ensures the actual state is always aligned with the desired state, providing system resilience and reliability.
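The idea of continuous reconciliation can be illustrated with a toy loop. This is a minimal sketch, not how Kubernetes controllers are implemented: both states are plain dictionaries keyed by resource name, and the function name `reconcile` is an assumption made for the example.

```python
def reconcile(desired: dict, actual: dict) -> list[str]:
    """Mutate `actual` until it matches `desired`; return the actions taken."""
    actions = []
    # Create what is missing and update what has drifted.
    for name, spec in desired.items():
        if name not in actual:
            actual[name] = dict(spec)
            actions.append(f"create {name}")
        elif actual[name] != spec:
            actual[name] = dict(spec)
            actions.append(f"update {name}")
    # Remove what is running but no longer declared.
    for name in list(actual):
        if name not in desired:
            del actual[name]
            actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    desired = {"web": {"replicas": 3}, "db": {"replicas": 1}}
    actual = {"web": {"replicas": 2}, "old-job": {"replicas": 1}}
    print(reconcile(desired, actual))  # ['update web', 'create db', 'delete old-job']
    print(reconcile(desired, actual))  # [] because the states already match
```

A controller runs this kind of comparison over and over, so a manual change or a failure is corrected on the next pass instead of waiting for a person to notice it.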

CR and CRD: The Foundation of Universality

Custom Resources (CR) and Custom Resource Definitions (CRD) are the key to everything and the reason this API is seeking to become universal.

CRD (Custom Resource Definition): It's the definition of how a YAML manifest should be structured to describe the desired state of a resource.
CR (Custom Resource): These are objects that follow the CRD definition, representing custom resources within the cluster.

Kubernetes controllers use these definitions to transform manifests into reality, reconciling the actual state with the desired one.
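The relationship between a definition and the resources that follow it can be sketched as a schema plus a validation step. Real CRDs express the schema as OpenAPI v3 inside a YAML manifest; the dictionary format below is a deliberately simplified stand-in, and the group `platform.example.com`, the kind `App`, and its fields are all hypothetical.

```python
# Simplified stand-in for a CRD: which fields a resource of this kind may carry.
# Real CRDs use an OpenAPI v3 schema; the group, kind, and fields are made up.
APP_DEFINITION = {
    "group": "platform.example.com",
    "version": "v1alpha1",
    "kind": "App",
    "spec_fields": {
        "image": {"type": str, "required": True},
        "replicas": {"type": int, "required": False},
    },
}


def validate(resource: dict, definition: dict) -> list[str]:
    """Return a list of problems; an empty list means the resource is valid."""
    errors = []
    expected_api_version = f"{definition['group']}/{definition['version']}"
    if resource.get("apiVersion") != expected_api_version:
        errors.append(f"apiVersion must be {expected_api_version}")
    if resource.get("kind") != definition["kind"]:
        errors.append(f"kind must be {definition['kind']}")
    spec = resource.get("spec", {})
    for field, rule in definition["spec_fields"].items():
        if field not in spec:
            if rule["required"]:
                errors.append(f"spec.{field} is required")
        elif not isinstance(spec[field], rule["type"]):
            errors.append(f"spec.{field} must be of type {rule['type'].__name__}")
    for field in spec:
        if field not in definition["spec_fields"]:
            errors.append(f"spec.{field} is not defined")
    return errors


if __name__ == "__main__":
    custom_resource = {
        "apiVersion": "platform.example.com/v1alpha1",
        "kind": "App",
        "metadata": {"name": "shop"},
        "spec": {"image": "registry.example.com/shop/web:1.4.0", "replicas": 3},
    }
    print(validate(custom_resource, APP_DEFINITION))
```

In a cluster, the API server performs this validation when the resource is submitted, so a malformed manifest is rejected before any controller acts on it.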

We can create CRDs to manage virtually anything:

Applications;
Cloud resources;
Databases;
Monitoring of specific states;
Pipelines;
Others.

Everyone should be able to write a CRD to monitor whatever state they want.

With CRDs, it's possible to:

Define what a resource is in a standardized way;
Compose and relate different building blocks;
Expose these definitions as new custom-made resources.

This offers a unique opportunity to simplify complex processes. For example, the operations team can combine various blocks into compositions, allowing developers to create manifests based on these simplified definitions. The resulting manifests are applied to the cluster as custom resources, and controllers then perform the necessary operations automatically.
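A composition can be pictured as a function that expands a short, developer-facing resource into the lower-level manifests the operations team would otherwise write by hand. The sketch below is only an illustration of that idea: the `App` claim and its fields are hypothetical, and in practice the expansion is done by a controller or a composition tool inside the cluster rather than by a script.

```python
def compose_app(claim: dict) -> list[dict]:
    """Expand a simplified App claim into Deployment and Service manifests."""
    name = claim["metadata"]["name"]
    spec = claim["spec"]
    labels = {"app": name}
    port = spec.get("port", 80)  # default chosen for the example
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": spec.get("replicas", 1),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": spec["image"],
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {"selector": labels, "ports": [{"port": port, "targetPort": port}]},
    }
    return [deployment, service]


if __name__ == "__main__":
    # What the developer writes: a handful of fields instead of two manifests.
    claim = {
        "kind": "App",
        "metadata": {"name": "shop"},
        "spec": {"image": "registry.example.com/shop/web:1.4.0", "replicas": 3},
    }
    for manifest in compose_app(claim):
        print(manifest["kind"], manifest["metadata"]["name"])
```

The developer supplies three values, while the selectors, labels, and port wiring stay under the control of whoever maintains the composition.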

Putting It All Together

CRDs and CRs are the abstraction layer we need to hide irrelevant details from the platform's users, simplifying everything so that anyone can meet their needs easily.

Integrated Pipelines

We need pipelines that execute specific actions (such as tests, image creation, etc.) automatically whenever some event occurs. The chosen CI/CD tool should be integrated with the Kubernetes API.

Ideally, the definition of these pipelines should also be based on CRDs, creating a declarative model for the workflow. Although we can build our own resources, tools like Argo Workflows and Tekton already provide these functionalities ready to use. However, any other CI/CD solution can be used, as long as it meets needs without adding unnecessary complexity. After all, pipelines are essentially just sets of actions or scripts executed in a specific order.
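To make the last point concrete, here is a minimal sketch of a pipeline as an ordered list of named actions that share a context and stop at the first failure. It is not how Argo Workflows or Tekton work internally; the step names and the `run_pipeline` helper are assumptions made for the example.

```python
from typing import Callable

# A step is a name plus an action that reads and updates a shared context.
Step = tuple[str, Callable[[dict], None]]


def run_pipeline(steps: list[Step], context: dict) -> list[str]:
    """Run steps in order, stopping at the first failure; return a log."""
    log = []
    for name, action in steps:
        try:
            action(context)
        except Exception as error:
            log.append(f"{name}: failed ({error})")
            break
        log.append(f"{name}: ok")
    return log


def run_tests(context: dict) -> None:
    context["tests_passed"] = True  # placeholder for a real test run


def build_image(context: dict) -> None:
    context["image"] = f"{context['repository']}:{context['commit']}"


def push_image(context: dict) -> None:
    if "image" not in context:
        raise RuntimeError("nothing to push")
    context["pushed"] = True  # placeholder for a real registry push


PIPELINE = [("test", run_tests), ("build", build_image), ("push", push_image)]

if __name__ == "__main__":
    context = {"repository": "registry.example.com/shop/web", "commit": "abc123"}
    print(run_pipeline(PIPELINE, context))
```

Declaring the same list of steps as a custom resource is what turns this imperative script into the declarative model described above.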

Synchronization: Desired State vs Actual State

Ensuring the actual state is always synchronized with the desired state is essential. Tools like ArgoCD and Flux, based on GitOps, already perform this role efficiently.

Infrastructure as Code in Kubernetes

To standardize infrastructure, it should also be declared using CRDs in Kubernetes. Although we can use cloud-specific SDKs, tools like Crossplane already simplify this process, allowing you to manage infrastructure resources like databases, VPCs, and much more, directly through Kubernetes manifests.

Application Orchestration

Application orchestration, especially in custom models, can be challenging. Tools like Crossplane and KubeVela allow creating custom resource definitions that represent applications or other resources in a declarative and flexible way. Most importantly, these definitions must be accessible and consumable by everyone involved in the process.

Permission Management (RBAC)

With the desired state stored in the Git repository, it's crucial to ensure only the right people have access to the correct repositories with appropriate permissions.

Most users don't need to access Kubernetes or the cloud directly, except in read-only mode.

Direct access to Kubernetes is still necessary, but only for the IDP, which needs to observe and monitor the actual state.
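The access model described here can be summarized as a small permission table. The role names, resources, and verbs below are hypothetical and only mirror the text: developers write to Git and read from the cluster, while the IDP pushes manifests and observes the actual state.

```python
# Hypothetical permission table: role -> set of (resource, verb) pairs.
ROLE_PERMISSIONS = {
    "developer": {
        ("git-repository", "read"),
        ("git-repository", "write"),
        ("cluster", "read"),
    },
    "idp": {
        ("git-repository", "read"),
        ("git-repository", "write"),
        ("cluster", "read"),
    },
    "gitops-agent": {
        ("git-repository", "read"),
        ("cluster", "read"),
        ("cluster", "write"),
    },
}


def is_allowed(role: str, resource: str, verb: str) -> bool:
    """Return True only if the role was explicitly granted the permission."""
    return (resource, verb) in ROLE_PERMISSIONS.get(role, set())


if __name__ == "__main__":
    print(is_allowed("developer", "cluster", "read"))
    print(is_allowed("developer", "cluster", "write"))
```

In this sketch only the GitOps agent may write to the cluster, which keeps Git as the single place where desired state changes; unknown roles get no permissions at all.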

Interacting with the IDP

Every interaction with an IDP (Internal Developer Platform) is based on two main actions:

Changing desired state.
Observing actual state.

Changing Desired State

Changes to desired state can be made in several ways:

Directly in the code stored in the Git repository.
Through the IDP interface.
Using the CLI.
Via IDE, with plugins or extensions.

Regardless of the chosen method, everything MUST be reflected in the Git repository to ensure consistency.

Pipelines also play an important role in updating desired state. For example, when building and pushing a new container image to a registry, the pipeline can automatically update the image tag in a YAML manifest in the repository. This manifest defines the application's desired state.
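The image tag update can be sketched as a small text transformation that a pipeline step would run before committing the manifest back to the repository. This is an illustration under assumptions: the function name `set_image_tag` and the image names are made up, images are expected in `name:tag` form (digests are not handled), and a real pipeline might use a YAML-aware tool instead of a regular expression.

```python
import re


def set_image_tag(manifest: str, image: str, new_tag: str) -> str:
    """Replace the tag of `image` on every `image:` line of a YAML manifest."""
    pattern = re.compile(
        rf"^(\s*(?:-\s+)?image:\s*{re.escape(image)}):[^\s#]+", re.MULTILINE
    )
    # Keep everything up to the image name and swap only the tag.
    return pattern.sub(lambda match: f"{match.group(1)}:{new_tag}", manifest)


if __name__ == "__main__":
    manifest = (
        "spec:\n"
        "  containers:\n"
        "    - name: web\n"
        "      image: registry.example.com/shop/web:1.4.0\n"
        "    - name: sidecar\n"
        "      image: registry.example.com/shop/proxy:0.9\n"
    )
    print(set_image_tag(manifest, "registry.example.com/shop/web", "1.5.0"))
```

Only the line for the named image changes; once the updated file is committed, the Git repository again describes the desired state.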

Additionally, developers can push application code and create, modify, or update manifests based on CRDs. These CRDs, created by operators, are the basis for defining and simplifying what each resource is.

Observing Actual State

GitOps tools automatically detect discrepancies between desired state (defined in manifests) and actual state (represented by CRs in Kubernetes). Whenever there's a deviation, these tools synchronize both states to keep the system aligned.
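Detecting a discrepancy amounts to comparing two structures field by field. The function below is a minimal sketch of that comparison, not the algorithm any particular GitOps tool uses; `find_drift` and the sample fields are assumptions for the example.

```python
def find_drift(desired, actual, path=""):
    """Return dotted paths where `actual` differs from `desired`."""
    if isinstance(desired, dict) and isinstance(actual, dict):
        drift = []
        for key in sorted(set(desired) | set(actual)):
            child = f"{path}.{key}" if path else key
            if key not in actual:
                drift.append(f"{child}: missing in actual state")
            elif key not in desired:
                drift.append(f"{child}: not declared in desired state")
            else:
                drift.extend(find_drift(desired[key], actual[key], child))
        return drift
    if desired != actual:
        return [f"{path}: desired {desired!r}, actual {actual!r}"]
    return []


if __name__ == "__main__":
    desired = {"spec": {"replicas": 3, "image": "web:1.5.0"}}
    actual = {"spec": {"replicas": 2, "image": "web:1.5.0", "debug": True}}
    for line in find_drift(desired, actual):
        print(line)
```

An empty result means the two states agree; anything else is what the GitOps tool would either report or correct by synchronizing.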

IDP Responsibilities

The IDP plays a crucial role in three main areas:

Creating Manifests:
- Facilitates the creation of manifests based on CRDs, defined by operators.
- Pushes manifests to the Git repository, ensuring desired state is updated.
Visualizing Desired State: Allows observing the system's desired state, either as a whole or specific parts.
Monitoring Actual State: Collects information about the system's current state for monitoring, log analysis, and troubleshooting.

The goal of the IDP is not just to allow developers to do what they already do, but also to empower them to perform tasks outside their areas of expertise in a simpler, more efficient, and more accessible way.

Which IDP?

Most IDPs on the market work well for small companies, startups, or organizations that are starting from scratch and can adapt to a less customizable solution. For large enterprises, however, such a solution becomes a challenge because existing processes are complex and the platform must adapt more thoroughly to the particularities of the business.

When a company reaches the point of implementing an IDP, it usually already has consolidated infrastructure, with assets, tools, and years of investment. In this context, it's more efficient and practical for the tool to adapt to the business, not the other way around. Trying to impose drastic cultural changes and alter well-established workflows can be counterproductive, generating internal resistance and productivity loss.

Therefore, large organizations often choose to develop their own IDPs to ensure they are fully customized to their needs. This approach allows creating solutions that meet specific requirements of internal processes, compliance, tools already in use, and existing workflows.

Build or Adapt?

Although developing an IDP from scratch may seem ideal to ensure complete customization, it's not always the best approach. There are IDPs in the market that offer a high degree of flexibility and can be customized to meet specific needs.

Backstage, for example, is a widely adopted solution that allows creating an adapted graphical interface, centralizing information and workflows, in addition to integrating various tools. Its plugin-based architecture is a great advantage, allowing teams to adapt and expand the IDP according to business growth and changing priorities.

Port, on the other hand, is a paid SaaS solution that is quite adaptable and feature-rich.

Additionally, some companies use a hybrid approach: they combine adopting an existing IDP, like Backstage, with developing their own tools and integrations to complement their needs. This allows accelerating implementation while still ensuring flexibility and customization.
