CodeQL Code Scanning Implementation @ Scale

GitHub Code Scanning gives engineering teams contextualized, in-workflow feedback as they build and deliver software, helping ensure that code shipped to production is free of known security vulnerabilities. Implementing Code Scanning at enterprise scale, across thousands of applications and pipelines, is particularly complex. Large organizations often need to manage the implementation centrally so they can keep the service operational and detect the latest vulnerabilities. A frequent challenge with centralizing the AppSec workflow is the need to rebuild the application solely to perform a scan. AppDev teams are justifiably reluctant to take on the overhead of an extra build, and it is untenable for AppSec teams to manage a fleet of agents that build and scan every application in the organization's portfolio.

Let’s first explore how to externalize and centrally manage an AppSec CI integration, then how to use the CodeQL CLI with indirect build tracing to scan the same build that produces the deployable artifact, all within the AppDev agent.

Centralized Administration

Modern CI tools let you create modular, centrally managed components, such as reusable workflows (GitHub Actions) or shared library functions (Jenkins), that inject external logic into a caller workflow. Through this pipeline integration, an AppSec team can externalize the Code Scanning functionality (create, analyze, upload) and add custom logic for more granular configuration, such as referencing specific libraries or a curated subset of queries, classifying vulnerabilities, gathering performance metrics, and harvesting the CodeQL DB for future use. In a subsequent post we will explore creative zero-day strategies that this pattern enables through multi-repository variant analysis. Here is an example of what an AppDev pipeline consuming a centrally managed AppSec workflow might look like as it pulls in the library and invokes it with arguments.

AppDev Pipeline

    @Library(["appsec-workflow"]) _

    stage('Build & Scan') {
      steps {
        // Teach "appsec-workflow" how to build the application via $BUILD_ARGS.
        codescan("$REF_PARAMS") {
          sh("mvn clean package -f ./main/pom.xml")
        }
      }
    }
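On the AppSec side, the curated-query configuration mentioned above could start as a small lookup inside the shared library. The shell sketch below is only an illustration: `select_query_suite` and the profile names `default`, `extended`, and `quality` are hypothetical, while the suite file names follow the `<language>-code-scanning.qls` naming used by the standard CodeQL query packs.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the central AppSec workflow: map a language and a
# scan profile to a CodeQL query suite. The profile names are assumptions.
select_query_suite() {
  local lang="$1" profile="${2:-default}"
  case "$profile" in
    default)  echo "${lang}-code-scanning.qls" ;;
    extended) echo "${lang}-security-extended.qls" ;;
    quality)  echo "${lang}-security-and-quality.qls" ;;
    *)        echo "unknown profile: ${profile}" >&2; return 1 ;;
  esac
}
```

Calling `select_query_suite java` prints `java-code-scanning.qls`, the suite used in the analyze step later in this post. Because the mapping lives in the central library, the AppSec team can change it without touching any AppDev pipeline.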

Build once, for scan and deploy

If you have only externalized the standard CodeQL scan, the problem of building the application twice, once to scan and once to deploy, remains. To avoid rebuilding the application for analysis, use CodeQL CLI indirect build tracing within your reusable workflow. Indirect build tracing lets CodeQL observe every build step that runs between the init step and the finalize and analyze steps. For the build to be traced, the AppDev pipeline must pass its build commands to the reusable workflow. The AppSec pipeline then sandwiches those build commands between CodeQL init and analyze, so CodeQL can peer into the build process and assemble a representative CodeQL DB.
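Stripped of any CI syntax, the sandwich might be sketched in shell as follows. This is a minimal sketch under stated assumptions, not the workflow's actual implementation: `run` and `traced_build` are hypothetical helpers, `DRY_RUN` defaults to printing the commands so the sketch runs without the CodeQL CLI installed, and the `start-tracing.sh` path assumes the tracing environment script that `codeql database init --begin-tracing` reports on Linux and macOS.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: init with tracing, run the caller's build, finalize.
traced_build() {
  local lang="$1" db="$2" build_cmd="$3"
  run codeql database init --language="$lang" --source-root=. --begin-tracing "$db"
  # Load the tracer environment written by init into this shell (assumed path).
  run . "$db/temp/tracingEnvironment/start-tracing.sh"
  # The AppDev team's own build command; it still emits the deployable artifact.
  run bash -c "$build_cmd"
  run codeql database finalize "$db"
}
```

For example, `traced_build java /codeql-dbs/example-repo "mvn clean package -f ./main/pom.xml"` prints the four commands in order. Setting `DRY_RUN=0` would execute them instead, which requires the CodeQL CLI on the agent.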

Below is an example (pseudocode) of how you might initialize CodeQL with indirect build tracing, build the application via the “build command” arguments of the caller workflow, finalize the database, and then analyze the database produced by the traced build.

AppSec Pipeline

Initialize the CodeQL database using `codeql database init --begin-tracing`.

    - task: CmdLine@2
      displayName: Initialize CodeQL database
      inputs:
        # Assumes the source code is checked out to the current working directory.
        # Creates a database at /codeql-dbs/example-repo.
        # $LANG is a placeholder for the caller's language argument; in a real shell, LANG is the locale variable.
        script: "codeql database init --language $LANG --source-root . --begin-tracing /codeql-dbs/example-repo"

Build the app with the build command arguments passed in by the caller.

    - task: CmdLine@2
      displayName: Build app with build command args from caller
      inputs:
        script: $BUILD_ARGS

Use `codeql database finalize` to complete database creation after the build is done.

    - task: CmdLine@2
      displayName: Finalize CodeQL database
      inputs:
        script: 'codeql database finalize /codeql-dbs/example-repo'

Analyze the database and upload the results.

    - task: CmdLine@2
      displayName: Analyze CodeQL database
      inputs:
        script: 'codeql database analyze /codeql-dbs/example-repo java-code-scanning.qls --sarif-category=java --format=sarif-latest --output=/temp/example-repo-java.sarif'
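The analyze task writes the SARIF file but does not send it anywhere, so the upload needs its own step. One way to sketch it is with the CLI's `codeql github upload-results` command. The `upload_sarif` helper, the `DRY_RUN` switch, and the `GITHUB_TOKEN` variable name are assumptions for illustration; the repository, ref, and commit values would come from the caller workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload a SARIF file to GitHub Code Scanning.
# Usage: upload_sarif <owner/repo> <git-ref> <commit-sha> <sarif-path>
upload_sarif() {
  # Replace the positional parameters with the full command line.
  set -- codeql github upload-results \
    --repository="$1" --ref="$2" --commit="$3" --sarif="$4" --github-auth-stdin
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command, so the sketch runs without the CLI.
    echo "+ $*"
  else
    # Pass the token on stdin so it does not appear in the process list.
    printf '%s' "${GITHUB_TOKEN:?GITHUB_TOKEN must be set}" | "$@"
  fi
}
```

In the AppSec pipeline this might be called as `upload_sarif my-org/example-repo refs/heads/main "$(git rev-parse HEAD)" /temp/example-repo-java.sarif`, where `my-org/example-repo` is a placeholder repository name.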

The output of build tracing is twofold: the CodeQL DB, ready for analysis, and your standard build artifact. Once both are produced, the SARIF is uploaded to the GitHub repository, and the executable is passed to the next task in the caller workflow, you can exit the library function and resume normal AppDev pipeline execution. The AppDev team will see the scan results in their repository as soon as GitHub has processed the upload.

With this design you satisfy the requirement to centrally manage the AppSec pipeline integration while avoiding a separate rebuild of the application for the CodeQL scan, all on the AppDev agent. For any questions, please contact your GitHub account team.
