Static Application Security Testing
SAST, or Static Application Security Testing, represents a fundamental and proactive approach to identifying security vulnerabilities directly in source code, bytecode, or binaries of an application, even before its execution. It is a "white-box" methodology in which the analyst or analysis tool has full access to the internal structure of the software.
The central premise of SAST is early detection of security flaws in the software development lifecycle (SDLC), a concept known as "shifting left". By identifying and fixing vulnerabilities in early stages, organizations can significantly reduce the costs and time associated with remediation, as well as decrease the likelihood that these flaws reach the production environment.
- What does it analyze? Your proprietary source code (Java, Python, C#, etc.).
- What does it look for? Programming logic flaws that can lead to vulnerabilities, such as SQL Injection or Cross-Site Scripting (XSS).
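As an illustration of the kind of flaw meant here, the following minimal sketch contrasts a query built by string concatenation with a parameterized one. It uses Python's built-in `sqlite3` module; the table, the function names and the payload are made up for the example.

```python
import sqlite3

def make_db():
    # In-memory database with two example users (illustrative data only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn

def find_user_unsafe(conn, name):
    # Vulnerable (CWE-89): user input becomes part of the SQL text itself.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # Safer: the value is bound as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # the condition is always true
    print(len(find_user_safe(conn, payload)))    # no user has that literal name
```

A SAST rule for SQL Injection is, in essence, a way of telling the first function apart from the second without running either of them.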
SAST tools operate through static code analysis. They build a model of the application, mapping data flows and control flows. Based on this model, they apply a set of predefined rules to identify code patterns that correspond to known vulnerabilities, such as those listed in common weakness catalogs, like the OWASP Top Ten and the Common Weakness Enumeration (CWE).
This thorough analysis allows the identification of a wide range of vulnerabilities, including:
- SQL Injection: Flaws that allow an attacker to manipulate database queries.
- Cross-Site Scripting (XSS): Vulnerabilities that enable the injection of malicious scripts into web pages viewed by other users.
- Buffer Overflows: Errors that occur when a program tries to write data beyond the capacity of a memory buffer.
- Inadequate security configurations: Settings that may expose the application to risks.
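- Use of components with known vulnerabilities: Third-party libraries or frameworks that have documented security flaws, although this is usually the job of Software Composition Analysis (SCA) rather than of a pure SAST engine.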
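The rule-based detection described above can be sketched in a few lines. This is a toy example, not how any particular product works: it parses Python source with the standard `ast` module and flags calls to a method named `execute` whose first argument is built dynamically. The function name and the sample code are assumptions made for the illustration.

```python
import ast

def find_unsafe_execute(source):
    """Return line numbers where .execute() receives a dynamically built string."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        is_execute_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
        )
        if not is_execute_call:
            continue
        first_arg = node.args[0]
        # BinOp covers "+" concatenation and "%" formatting; JoinedStr is an f-string.
        if isinstance(first_arg, (ast.BinOp, ast.JoinedStr)):
            findings.append(node.lineno)
    return sorted(findings)

SAMPLE = '''
cur.execute("SELECT * FROM t WHERE id = " + user_id)
cur.execute(f"SELECT * FROM t WHERE id = {user_id}")
cur.execute("SELECT * FROM t WHERE id = ?", (user_id,))
'''

if __name__ == "__main__":
    print(find_unsafe_execute(SAMPLE))
```

Only the first two calls are reported; the parameterized call on the last line passes a plain string literal and is left alone. A real engine would also need to know whether `user_id` is attacker-controlled, which is where data flow analysis comes in.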
Adopting SAST in a secure development pipeline (DevSecOps) offers significant benefits, but also presents challenges that need to be considered.
Advantages:
- Early Detection: Allows finding and fixing vulnerabilities early in development, when the cost and effort for correction are lower.
- Comprehensive Code Coverage: Can analyze the entire codebase in the languages the tool supports, including paths that are rarely or never exercised during dynamic testing.
- Independence from an Execution Environment: Does not require the application to be running, facilitating its integration into development and continuous integration (CI/CD) environments.
- Root Cause Identification: Points to the exact line of code where the vulnerability is found, facilitating developers' work in fixing it.
Disadvantages:
- False Positives: Can generate a considerable number of alerts that do not represent real vulnerabilities, requiring time and effort for triage.
- Language Dependency: SAST tools are generally specific to certain programming languages and frameworks.
- Limitations in Detecting Runtime Vulnerabilities: By not executing the application, SAST cannot identify flaws that only manifest at runtime, such as environment configuration problems or complex business logic failures.
- Time-Consuming Analysis: In large-scale projects, SAST scans can be time-consuming.
SAST is an essential pillar in a DevSecOps culture, which values the integration of security in all phases of development. When integrated into CI/CD pipelines, SAST automates security analysis with each new code change, providing quick feedback to developers and ensuring that security is a shared responsibility across the entire team.
SAST should not be seen as a single solution. For a robust application security strategy, it is recommended to combine SAST with other methodologies, such as DAST (Dynamic Application Security Testing), which analyzes the application in execution ("black-box" testing), and Software Composition Analysis (SCA), which identifies vulnerabilities in third-party components. Together, these approaches offer a more complete and in-depth view of an application's security posture.
The SAST Knowledge Base
Unlike artificial intelligence systems that "learn" from data, SAST tools do not go through a continuous learning process in the strict sense of the term. Instead, their effectiveness lies in a vast and detailed pre-existing knowledge base, which is meticulously built and updated by security experts.
SAST "learns" from human knowledge structured in rules and patterns. It does not discover "zero-day" vulnerabilities on its own.
Although the core of SAST is rule-based, the most modern tools are beginning to incorporate artificial intelligence and machine learning (ML). However, their role is more about enhancement than fundamental learning. AI is used to improve triage and suggest fixes:
- Reduce False Positives: Learn from vulnerabilities that developers mark as "not being a problem" to refine future analyses.
- Prioritize Vulnerabilities: Analyze the context of a flaw to determine its real criticality more accurately.
- Suggest Fixes: Based on how similar vulnerabilities were fixed in the past, the tool can offer smarter code suggestions.
This knowledge base is the true engine of SAST and is primarily composed of two elements:
- Rulesets: They are the heart of the tool. Each rule is a precise definition of a code pattern that can lead to a security vulnerability. These rules are specific to each programming language and its frameworks. They are developed by security researchers who analyze known vulnerabilities and translate them into detectable patterns in source code.
- Known Vulnerability Patterns: The rules are based on software weakness catalogs widely recognized by the security industry. The two main ones are:
- CWE (Common Weakness Enumeration): A comprehensive and detailed list of different types of software weaknesses. SAST uses the CWE to identify and categorize found flaws, such as "CWE-79: Cross-site Scripting" or "CWE-89: SQL Injection".
- OWASP Top 10: An awareness list that highlights the ten most critical security risks for web applications. While not as granular as the CWE, the OWASP Top 10 guides the prioritization of rules in SAST tools to focus on the most common and dangerous attack vectors.
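To make the idea of "human knowledge structured in rules" concrete, here is a minimal sketch of a ruleset expressed as data, where each rule carries the CWE it maps to. The rule identifiers and the `Rule` structure are hypothetical; real tools use far richer rule formats.

```python
import ast
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    rule_id: str    # hypothetical identifier, not taken from any real tool
    cwe: str        # weakness category the rule reports
    call_name: str  # dotted name of the call the rule matches

RULESET = [
    Rule("py.eval-use", "CWE-95", "eval"),
    Rule("py.os-system", "CWE-78", "os.system"),
    Rule("py.pickle-loads", "CWE-502", "pickle.loads"),
]

def dotted_name(node):
    """Turn a Name/Attribute chain such as os.system into the string 'os.system'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None

def scan(source, ruleset=RULESET):
    """Return (line, cwe, rule_id) for every call matched by a rule."""
    by_call = {rule.call_name: rule for rule in ruleset}
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            rule = by_call.get(dotted_name(node.func))
            if rule:
                findings.append((node.lineno, rule.cwe, rule.rule_id))
    return sorted(findings)

SAMPLE = "import os\nos.system(cmd)\nvalue = eval(expr)\nprint('ok')\n"

if __name__ == "__main__":
    for line, cwe, rule_id in scan(SAMPLE):
        print(f"line {line}: {cwe} ({rule_id})")
```

The engine (`scan`) knows nothing about security by itself; everything it can detect comes from the entries in `RULESET`, which is why the quality of the knowledge base matters so much.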
There is no "general rule system" or universal standard of rules that all SAST tools use. Although the concept behind detection is the same, the implementation of these rules is highly proprietary and specific to each tool.
To make an analogy, all antivirus programs want to detect malware, but Kaspersky, Norton, and McAfee have their own research labs, their own detection engines, and their own signature databases. You cannot take Kaspersky's signature database and load it into Norton's engine.
With SAST, the exact same thing happens. Here are the main reasons why there is no universal rule system:
- Intellectual Property and Competition: The quality and breadth of the Knowledge Base are the main competitive differentiator between SAST tools. Companies like Veracode, Checkmarx, SonarSource, and Snyk invest millions in research and development to create and refine their rules. This "secret recipe" is what makes one tool better than another in detecting certain vulnerabilities, reducing false positives, or supporting new languages.
- Complexity of Languages and Frameworks: A rule to detect SQL Injection in a Java application that uses the JDBC API is completely different from a rule to detect the same flaw in a Python application using the Django framework.
- Different Analysis Philosophies: Some tools may focus on speed, optimizing their rules to run quickly in a CI/CD pipeline, even if it means missing some more complex vulnerabilities. Others may opt for a deeper and more time-consuming analysis that maps complex data flows through multiple files and libraries. These different philosophies require completely different rule implementations.
What the global security community does agree on is the definition of what a vulnerability is. Standards like CWE and OWASP Top 10 are our universal dictionary. All serious tools map their findings to these standards. For example, all of them will say: "I found a vulnerability of type CWE-79 (XSS)". However, how it was found is proprietary.
Open Source vs Paid Tools
Before listing some tools, note that many companies offer a complete ecosystem that includes SAST, DAST, SCA and much more. Some examples:
- Checkmarx
- Veracode
- Snyk Code
- GitLab
- Semgrep: Perhaps the most popular and fastest-growing open source SAST tool today.
- Extremely fast, flexible and powerful. Its great advantage is the ease of writing custom rules. Semgrep rules look like the code you are analyzing, which eliminates the need to learn a complex query language.
- Being very fast, it is perfect to run on every commit or pull request without delaying the pipeline.
- Supports user-defined automatic fixes
- Supports scanning only changed files (differential analysis) which helps gain speed in the pipeline.
- Supports dozens of popular languages like Python, JavaScript/TypeScript, Go, Java, Ruby, and more.
- Semgrep is a CLI (Command-Line Interface), and that is exactly why it is so flexible and easy to integrate into CI/CD pipelines.
- Although powerful, to have more complex vulnerability management (dashboards, reports, etc.), you generally use it with the paid version or integrate it with another platform.
- Paid version: While the open source version finds vulnerabilities, Semgrep Code (the paid platform) provides the tools to manage them at scale across the organization. The added value includes:
- Centralized Management and Dashboards: Offers a web interface (the "Semgrep App") that centralizes results from all projects and developers. This allows a unified view of risks, with filters by severity, project, rule, etc.
- Rules Managed by the Semgrep Team ("Pro-rules"): In addition to community rules, you have access to a set of high-precision rules, developed and maintained by Semgrep's security research team, ensuring fewer false positives and detection of more sophisticated vulnerabilities.
- Cross-file Data Flow Analysis (Cross-file Taint Analysis): The paid version enhances the analysis to track "tainted" data through multiple files and functions, finding complex vulnerabilities that single-file analysis would not detect.
- Vulnerability Lifecycle Management: Allows security teams to mark a discovery as "False Positive" or "Accepted Risk" directly in the interface, preventing the same alert from appearing repeatedly and focusing on what really matters.
- Governance Policies (Policy Enforcement): Allows creating and enforcing centralized security policies. For example, it is possible to automatically block a pull request if it introduces a "Critical" vulnerability, ensuring that insecure code never reaches the main base.
- Native Integrations and Reports: Facilitates integration with tools like Jira (to create tickets automatically), Slack (for notifications) and offers compliance report generation for audits (PCI-DSS, SOC 2, etc.).
- Specialized Technical Support: Direct access to Semgrep's team of experts to assist in implementation, rule creation and result analysis.
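The "Governance Policies" item above boils down to a gate that decides whether a change may be merged. The sketch below shows the idea in isolation; the severity labels, the shape of a finding and the `gate` function are assumptions for illustration and do not reflect Semgrep's actual API or data model.

```python
# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def gate(findings, block_at="critical"):
    """Return (passed, blocking_findings) for a list of finding dicts."""
    threshold = SEVERITY_ORDER[block_at]
    blocking = [
        finding for finding in findings
        if SEVERITY_ORDER[finding["severity"]] >= threshold
    ]
    return (not blocking, blocking)

if __name__ == "__main__":
    findings = [
        {"id": "weak-hash", "severity": "medium"},
        {"id": "sql-injection", "severity": "critical"},
    ]
    passed, blocking = gate(findings)
    # In a pipeline, a non-zero exit code is what actually blocks the merge.
    raise SystemExit(0 if passed else 1)
```

Lowering `block_at` to `"medium"` makes the policy stricter without touching the scanner, which is the kind of central control a managed platform adds on top of the CLI.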
- SonarQube (Community Edition): A complete Code Quality and Security platform, not just a SAST tool.
- It is a market veteran, known for its holistic approach. It not only finds security vulnerabilities (SAST), but also "Code Smells" (bad coding practices), bugs, and measures code complexity. Its web interface is robust, with detailed dashboards.
- Good for teams that want a 360-degree view of their code health, including maintainability and reliability.
- It is very good at managing large projects and tracking the evolution of "technical debt" over time.
- Offers reports and metrics for teams that need a detailed managerial view of software quality.
- The initial setup can also be more complex than Semgrep's.
- The Community Edition (free) has limitations. The most advanced security rules and support for some languages are only available in paid editions.
- What more do we get in the paid version?
- Advanced Security Analysis (The Big Differentiator). Paid editions unlock SonarQube's advanced SAST engine:
- Data Flow Analysis (Taint Analysis): The most important functionality for security. It tracks untrusted data (sources) to dangerous points (sinks), detecting complex vulnerabilities such as:
- SQL Injection
- Cross-Site Scripting (XSS)
- Path Traversal
- OS Command Injection
- Comprehensive Security Rules: Thousands of additional rules that cover OWASP Top 10, CWE Top 25 and SANS Top 25 classifications in detail.
- Integration with Development Flow ("Shift Left") from the Developer Edition.
- Branch and Pull Request Analysis: Instead of analyzing just the main branch, SonarQube analyzes feature branches and pull requests (PRs).
- Pull Request Decoration: It comments directly on the PR within GitHub, GitLab, Azure DevOps or Bitbucket, showing security and quality issues introduced in that specific code.
- Support for more languages
- Governance, Reports and Compliance (Enterprise Edition): The Enterprise Edition is aimed at large-scale security management:
- Security and Compliance Reports: Generates audit-ready PDF reports, mapping found vulnerabilities with standards like PCI-DSS, OWASP Top 10 and CWE.
- Portfolio Management: Offers an aggregated view of multiple projects and teams, allowing security managers to see the organization's risk as a whole, create application portfolios and set goals.
- Governance Automation: Allows the application of different "Quality Gates" per portfolio, ensuring that different types of applications follow required security standards.
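The source-to-sink idea behind taint analysis can be sketched with a deliberately tiny tracker. It is not SonarQube's engine: it only follows top-level assignments in a single file, and the choice of `input` as a source and `execute`/`system` as sinks is an assumption made for the example.

```python
import ast

SOURCES = {"input"}            # calls whose result is treated as untrusted (assumed)
SINKS = {"execute", "system"}  # call names treated as dangerous (assumed)

def call_name(call):
    """Return the simple name of a call: f(...) -> 'f', obj.m(...) -> 'm'."""
    func = call.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None

def is_tainted(expr, tainted):
    """True if the expression reads a tainted variable or calls a source."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if isinstance(node, ast.Call) and call_name(node) in SOURCES:
            return True
    return False

def find_taint_flows(source):
    """Return line numbers of sink calls that receive tainted arguments."""
    tainted, findings = set(), []
    for stmt in ast.parse(source).body:  # top-level statements, in order
        if isinstance(stmt, ast.Assign):
            targets = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
            if is_tainted(stmt.value, tainted):
                tainted.update(targets)             # taint propagates to the target
            else:
                tainted.difference_update(targets)  # overwritten with clean data
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and call_name(node) in SINKS:
                if any(is_tainted(arg, tainted) for arg in node.args):
                    findings.append(node.lineno)
    return findings

SAMPLE = '''
name = input()
query = "SELECT * FROM users WHERE name = '" + name + "'"
cur.execute(query)
safe = "SELECT 1"
cur.execute(safe)
'''

if __name__ == "__main__":
    print(find_taint_flows(SAMPLE))
```

Only the first `execute` call is reported, because `query` was derived from `name`, which came from a source. Production engines do the same across functions, files and framework entry points, and also model sanitizers, which this sketch ignores.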
- Horusec: A Brazilian tool that stands out for being an orchestrator of multiple scanners.
- Created by Zup (an Itaú company), Horusec is not just a SAST engine, but an open source platform that runs several other security tools (including tools for dependency analysis and leaks) under the hood and unifies the results. It uses its own and third-party engines.
- Good for teams that want a single solution that already encompasses SAST, Leaks, and dependency analysis without having to configure each tool separately.
- Supports a wide range of languages and technologies through its orchestration architecture.
- Being an orchestrator, its effectiveness depends on the quality of the tools it uses internally. The configuration and management of all sub-tools can have their own learning curve.
- Language-Specific Tools: For teams that work predominantly with a single language, specialized tools can be extremely effective and simpler.
- Bandit (for Python): Focused exclusively on finding common security problems in Python code. It is very lightweight and easy to add to any Python project.
- Brakeman (for Ruby on Rails): Is the reference SAST tool for Ruby on Rails applications, deeply knowing the framework's particularities.
- Gosec (for Go): Performs security analysis specific to Go code, looking for constructions and library uses that may be insecure.
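- Mobile Security Framework (MobSF): For Android and iOS development.
- Grype: A vulnerability scanner for container images and filesystems, focused on installed packages, so closer to dependency scanning than to classic SAST.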
- Others...
- CodeQL: One of the most powerful and advanced SAST tools on the market.
- It is the code analysis engine that GitHub uses for its "Code Scanning" functionality and was designed from the beginning to find security vulnerabilities deeply and accurately.
- While tools like Semgrep focus on matching patterns in source code, CodeQL treats code as data: it first analyzes the source code and transforms it into structured data stored in a relational database. You then interrogate this database with QL, its query language, which allows asking complex and precise questions about code behavior.
- It is the heart of GitHub Advanced Security, appearing natively in pull requests and the repository's security tab.
- Extremely Accurate: Due to its "code as data" approach, it tends to have a very low false positive rate compared to other tools.
- Learning Curve: Its main disadvantage is complexity. To write custom queries, you need to learn the QL language, which is significantly more complex than Semgrep's YAML rules.
- It is free for use in public repositories. For private repositories, it is part of the paid GitHub Advanced Security license.
- Considered by many experts one of the most technologically advanced, especially for finding variants of complex vulnerabilities in large codebases.
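The "code as data" approach can be illustrated without CodeQL itself. The sketch below is only an analogy: it extracts function calls from Python source into an in-memory SQLite table and then queries that table with plain SQL, not QL. The table layout and function names are invented for the example.

```python
import ast
import sqlite3

def build_code_db(source):
    """Store one row per direct function call: (caller, callee, line)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE calls (caller TEXT, callee TEXT, line INTEGER)")
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                db.execute(
                    "INSERT INTO calls VALUES (?, ?, ?)",
                    (func.name, node.func.id, node.lineno),
                )
    return db

def callers_of(db, callee):
    """Answer the question: which functions call `callee`, and on which line?"""
    rows = db.execute(
        "SELECT caller, line FROM calls WHERE callee = ? ORDER BY line",
        (callee,),
    )
    return rows.fetchall()

SAMPLE = '''
def handler(data):
    return render(eval(data))

def helper(x):
    return len(x)
'''

if __name__ == "__main__":
    db = build_code_db(SAMPLE)
    print(callers_of(db, "eval"))
```

Once the code lives in a database, a new security question is just a new query, with no need to re-parse the source. That is the property that makes this style of tool well suited to hunting variants of a known bug across a large codebase.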
- GitLab SAST: GitLab offers SAST (Static Application Security Testing) functionality integrated as part of its DevSecOps platform, in addition to other types of analysis such as DAST and SCA.
- GitLab adopted an orchestration strategy. Essentially, GitLab SAST is not a single scanner, but rather a "collection" of the best open source SAST scanners on the market, which it manages and executes for you behind the scenes.
- When you enable SAST in your pipeline (.gitlab-ci.yml), GitLab automatically detects your project's language and selects the most appropriate open source scanner from its collection to execute.
- Easy integration. Zero Configuration (Auto DevOps): With Auto DevOps enabled, SAST is configured and executed automatically without you having to do almost anything.
- GitLab ingests the results of all these different scanners and displays them in a unified way in the Security Dashboard and in Merge Requests.
- OWASP Does Not Have a SAST Project
Semgrep and SonarQube
Start by getting the tool working and integrating it into the culture of the company and its teams. Once that is solid and part of the development culture, you will probably see value in the complete product and consider paying for one of them to have a single, complete and integrated stack.
You can use Semgrep's CLI to do fast and flexible code scanning and then send the results to be displayed and managed within the SonarQube interface.
The secret to making this "bridge" between the two tools is a standard format called SARIF.
SARIF (Static Analysis Results Interchange Format) is an open standard, like a "universal translator" for static analysis tool results. When Semgrep exports its results as a SARIF file, it is creating a report that SonarQube (and other tools) can understand and import.
The process is relatively simple and works even with SonarQube Community Edition (the free version).
- Step 1: Generate the SARIF Report with Semgrep In your CI/CD pipeline, instead of just running Semgrep to see the result in the terminal, you add a command to save the output to a SARIF file.
# Run the scan and save the result to the semgrep-report.sarif file
semgrep scan --sarif -o semgrep-report.sarif
- Step 2: Configure SonarQube Analysis. You need to tell SonarScanner (SonarQube's command-line tool) where to find this report. You do this by adding a property to your sonar-project.properties configuration file or directly on the command line.
# Add this line to your sonar-project.properties file
sonar.sarifReportPaths=semgrep-report.sarif
- Step 3: Run SonarScanner: Now, when you run the normal SonarQube analysis, SonarScanner will:
- Execute its own native analyses.
- Look for the semgrep-report.sarif file.
- Import all vulnerabilities found by Semgrep and display them in the SonarQube interface as "issues".
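Before wiring the report into SonarQube, it can help to look inside the SARIF file. The sketch below reads the parts of a SARIF 2.1.0 log that matter here (`runs`, `results`, `ruleId`, `locations`) and counts results per rule. The helper names and the hand-written `EXAMPLE` log are assumptions for illustration; a real Semgrep report contains many more fields.

```python
import json
from collections import Counter

def summarize_sarif(sarif):
    """Count results per rule ID across all runs of a SARIF log."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            counts[result.get("ruleId", "<no ruleId>")] += 1
    return dict(counts)

def first_location(result):
    """Return (file, line) of the first reported location, or (None, None)."""
    locations = result.get("locations") or [{}]
    physical = locations[0].get("physicalLocation", {})
    uri = physical.get("artifactLocation", {}).get("uri")
    line = physical.get("region", {}).get("startLine")
    return uri, line

# Hand-written fragment that follows the SARIF 2.1.0 layout (illustrative only).
EXAMPLE = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-scanner"}},
        "results": [
            {"ruleId": "sql-injection", "level": "error",
             "message": {"text": "Query built from user input"},
             "locations": [{"physicalLocation": {
                 "artifactLocation": {"uri": "app/db.py"},
                 "region": {"startLine": 12}}}]},
            {"ruleId": "sql-injection", "level": "error",
             "message": {"text": "Query built from user input"},
             "locations": []},
        ],
    }],
}

if __name__ == "__main__":
    print(json.dumps(summarize_sarif(EXAMPLE)))
    print(first_location(EXAMPLE["runs"][0]["results"][0]))
```

To inspect a real report, load it with `json.load(open("semgrep-report.sarif"))` and pass the result to `summarize_sarif`.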
Advantages and Limitations of This Approach
- Centralization: You have a single location (the SonarQube dashboard) to view vulnerabilities found by SonarQube's native engine and Semgrep's engine.
- Leverage Semgrep's Strength: You can use Semgrep's fast, customizable rules and vast community library, which can cover cases that SonarQube Community does not cover.
- Holistic View: SonarQube can combine Semgrep's "issues" with its other metrics (test coverage, complexity, etc.) to give a more complete view of project health.
It is important to understand that this is a data import, not a deep native integration.
- Limited Management of External Issues: According to SonarQube documentation, issues imported via SARIF have some limitations. For example, it may not be possible to mark them as "False Positive" or "Accepted Risk" directly in the SonarQube interface in the same way as native issues. Rule management (enable/disable) continues to be done on the Semgrep side.
- Loss of Advanced Context: You will see the problem description, the file and the line. However, you will not have the data flow graphs (taint analysis flow) that SonarQube (paid version) displays for its own security discoveries.
- Possible Duplication: If a native SonarQube rule and a Semgrep rule find exactly the same vulnerability, it may appear as two separate issues. One way to avoid this is to create a new Quality Profile that disables all vulnerability rules and leaves only Bug and Code Smell rules active, and then associate that profile with the project. The sonar-scanner will still run, but with the native security rules disabled it will not find any vulnerabilities on its own. Its only source of security "issues" will be Semgrep's SARIF file, which avoids the redundancy.
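To see why duplicates arise, and what merging them would involve if you post-processed findings yourself, consider the sketch below. It is not something SonarQube does on import; the finding fields (`tool`, `file`, `line`, `cwe`) and the choice of merge key are assumptions made for the example.

```python
def deduplicate(findings):
    """Keep one entry per (file, line, cwe) and record every tool that reported it."""
    merged = {}
    for finding in findings:
        key = (finding["file"], finding["line"], finding["cwe"])
        entry = merged.setdefault(key, {
            "file": finding["file"],
            "line": finding["line"],
            "cwe": finding["cwe"],
            "tools": [],
        })
        if finding["tool"] not in entry["tools"]:
            entry["tools"].append(finding["tool"])
    return list(merged.values())

if __name__ == "__main__":
    findings = [
        {"tool": "sonarqube", "file": "app/db.py", "line": 12, "cwe": "CWE-89"},
        {"tool": "semgrep", "file": "app/db.py", "line": 12, "cwe": "CWE-89"},
        {"tool": "semgrep", "file": "app/views.py", "line": 40, "cwe": "CWE-79"},
    ]
    for entry in deduplicate(findings):
        print(entry)
```

In practice two tools rarely agree on the exact line or category, so a key like this misses many real duplicates. That is one reason the simpler route of disabling the overlapping native rules is attractive.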
It is a pragmatic and intelligent approach that recognizes that no tool is perfect at everything. It requires greater initial effort, but the result is an extremely powerful, flexible and very low-cost security and code quality solution.
From Semgrep, you get:
- Speed and Efficiency
- Modern Rule Engine
- Active Community: Access to a set of community rules that grows and evolves rapidly.
From SonarQube, you get:
- Robust Management Platform: A consolidated web interface, known and appreciated by developers.
- Historical Analysis and Metrics: The ability to track "technical debt" and security evolution over time, something that Semgrep's CLI alone does not offer.
- 360° Quality View: You can still use SonarQube for what it does best beyond SAST: measure test coverage, complexity, bugs and "code smells".
- Single Source of Truth: For managers and auditors, SonarQube becomes the central panel to query the health of all projects, even if part of the data comes from another tool.
- Excellent Cost-Benefit: You are effectively using the free version of SonarQube (Community Edition) to get management features that are normally found in very expensive paid SAST platforms. You combine a cutting-edge free scanner (Semgrep) with a free management platform.
Your architecture becomes modular. If tomorrow an open source SAST tool even better than Semgrep emerges, the migration process is simple: you just need to replace the step that generates the SARIF file in your pipeline. You are not "locked" to a single vendor's analysis engine, as your "dashboard" (SonarQube) is agnostic to the data source, as long as the data arrives in SARIF format. And if SonarQube ever decides to deliver this for free, the problem is solved even more quickly.
Despite being excellent, this approach is not trivial and requires awareness of some disadvantages:
- Configuration and Maintenance Complexity: You now have two tools to manage, configure and keep updated. Configuring the Quality Profile in SonarQube to disable rules, generating the SARIF in the pipeline and passing the parameter to SonarScanner require greater technical knowledge than using a single solution in a standard way.
- Less "Deep" Analysis Experience: As we discussed, the experience of analyzing an imported "issue" from Semgrep within SonarQube will be more "shallow". You will not have the interactive data flow graphs that SonarQube (paid) offers for its own discoveries. You will have the description, the file and the line, but deeper investigation may need to be done by looking at the code directly.
- Two Sources for Rule Management: Your team will need to manage security rules in Semgrep configuration files and quality rules (bugs/code smells) in the SonarQube interface. It is not a big problem, but it is a small friction point.
CodeQL and GitLab SAST vs SonarQube
We could use CodeQL instead of Semgrep. However, generally when we use CodeQL we use GitHub to host the code and there we already have a security interface, which makes things much easier.
Why would we use SonarQube if the button is one click away?
- Holistic Focus on Code Quality: This is SonarQube's killer differentiator. GitHub Advanced Security (GHAS) is focused on Security (SAST, SCA, Secret Scanning). SonarQube is focused on Code Health as a Whole. It answers questions that GHAS does not answer:
- What is our test coverage?
- Do we have a lot of duplicate code?
- How is our cyclomatic complexity?
- How is our technical debt in "Code Smells" (code bad practices that affect maintainability)?
- For an organization that has a "Clean Code" culture, SonarQube offers a 360° view that no other tool offers.
- Agnostic Platform (Multi-VCS): Many large companies do not live exclusively on GitHub. They may have legacy projects on Bitbucket, new projects on GitLab and the majority on GitHub. SonarQube can be the central platform that analyzes code from all these different sources, offering a unified quality and security view for the entire company. GHAS only works for code that is on GitHub.
- Many organizations, especially in regulated sectors like finance, have been using SonarQube for over a decade. They go through rigorous audits (like PCI-DSS, ISO 27001, etc.) and their audit processes, compliance reports and management metrics are all built on top of SonarQube. Migrating all this is a gigantic effort.
All of this also applies to GitLab SAST. We have the button one click away but it is still focused only on security and not on code quality.