gitsearch

A powerful command-line tool for searching across private GitLab repositories with concurrent processing and detailed reporting


Features

🚀

High Performance

Concurrent processing with a configurable number of goroutines to scan many repositories in parallel

🔍

Flexible Search

Search through user projects or entire GitLab instances with customizable keywords and exclusions

📊

Detailed Reports

Comprehensive execution summaries and search-result exports in JSON or CSV

🔒

Secure Access

HTTPS support with SSL verification, plus an option to skip verification for self-signed certificates

⚙️

Config File Support

YAML configuration files for persistent settings and complex search patterns

📝

Pattern Matching

Advanced regex pattern matching with capture groups for structured data extraction

Downloads

Choose the right version for your platform from:

github.com/henrybravo/gitsearch/releases

Linux Binaries

Windows Binaries

macOS Binaries

Docker Images

Usage

% gitsearch
       _ _                           _
  __ _(_) |_ ___  ___  __ _ _ __ ___| |_
 / _` | | __/ __|/ _ \/ _` | '__/ __| '_ \
| (_| | | |_\__ \  __/ (_| | | | (__| | | |
 \__, |_|\__|___/\___|\__,_|_|  \___|_| |_|
 |___/
Search through GitLab repositories for keywords
github.com/henrybravo/gitsearch
----------------------------------------------------------
Author: Henry Bravo | Version: 1.3.1 | Date: June 2025

Required flags (or config file): -url, -token, and -keywords

Config file options:
  -config         : Path to YAML configuration file
  -generate-config: Generate a sample config file (config.yaml)

Optional flags:
  -exclude        : Comma-separated terms to exclude from project names
  -concurrent     : Maximum number of concurrent goroutines (default 10)
  -output         : Output format: 'json' (default) or 'csv'
  -log            : Enable logging to scan_[timestamp].log file
  -skip-ssl       : Only needed for HTTPS URLs with self-signed certificates
  -h              : Show help menu
  -v, -version    : Show version information

Optional branch scanning flags:
  -branch-age     : Skip branches older than N days (default: 90, 0 for all)
  -branch-names   : Branches/patterns to scan (default: main,master,release/*)
                    Supports wildcards: main, release/*, feature/*, dev/*
  -skip-merged    : Skip merged branches (default: true, use -skip-merged=false to include)
  -branch-exclude : Patterns to exclude (e.g. 'temp-*,old-*')

Examples:
  HTTP instance:  -url http://gitlab.local:8929
  HTTPS instance: -url https://gitlab.local -skip-ssl
  GitLab.com:     -url https://gitlab.com/username
  Config file:    -config my-config.yaml
  Config file with overrides: -config my-config.yaml -url https://gitlab.com/different-user

examples

# Search in GitLab.com user repositories
% gitsearch -url https://gitlab.com/username -token YOUR_TOKEN -keywords "api_key,secret"

# Search in self-hosted instance with SSL skip
% gitsearch -url https://gitlab.local -skip-ssl -token YOUR_TOKEN -keywords "password,credential"

# Customize concurrent operations
% gitsearch -url https://gitlab.com/username -token YOUR_TOKEN -keywords "key" -concurrent 1000

# Exclude specific projects
% gitsearch -url https://gitlab.com/username -token YOUR_TOKEN -keywords "api" -exclude "test,sandbox"

# Using Docker
% docker run -it --rm -v $PWD:/data henrybravo/gitsearch:latest -url https://gitlab.com/username -token YOUR_TOKEN -keywords "api_key"

# Using Docker and config file (config.yaml is in $PWD)
% docker run -it --rm -v $PWD:/data henrybravo/gitsearch:latest -config config.yaml

# Using Docker with a config file, command-line overrides, and debug logging
% docker run -it --rm -v $PWD:/data henrybravo/gitsearch:latest -config config.yaml -url https://gitlab.com/different-user -token DIFFERENT_TOKEN -debug
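
-exclude takes a comma-separated list of terms that are matched against project names. The sketch below shows how such a value could be split and applied. It assumes case-insensitive substring matching, which the documentation does not specify. `parseList` and `isExcluded` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// parseList splits a comma-separated flag value such as "test,sandbox"
// into trimmed, non-empty terms.
func parseList(value string) []string {
	var terms []string
	for _, t := range strings.Split(value, ",") {
		if t = strings.TrimSpace(t); t != "" {
			terms = append(terms, t)
		}
	}
	return terms
}

// isExcluded reports whether a project name contains any exclude term.
// Assumption: case-insensitive substring matching.
func isExcluded(project string, excludes []string) bool {
	name := strings.ToLower(project)
	for _, term := range excludes {
		if strings.Contains(name, strings.ToLower(term)) {
			return true
		}
	}
	return false
}

func main() {
	excludes := parseList("test, sandbox")
	for _, p := range []string{"api-service", "api-test-suite", "mysandbox-1"} {
		fmt.Printf("%s excluded=%v\n", p, isExcluded(p, excludes))
	}
}
```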

help menu

% gitsearch -h

Usage of gitsearch:
  -branch-age int
    	Skip branches with no activity for specified days (0 to scan all) (default 90)
  -branch-exclude string
    	Comma-separated patterns to exclude branches (e.g. 'temp-,old-')
  -branch-names string
    	Comma-separated types or patterns of branches to scan (e.g., main,release/*,feature/*) (default "main,master,release/*")
  -concurrent int
    	Maximum number of concurrent goroutines (default 10)
  -config string
    	Path to configuration file (YAML format)
  -debug
    	Enable debug logging to debug_[timestamp].log file
  -exclude string
    	Comma-separated terms to exclude from project names
  -generate-config
    	Generate a sample configuration file named config.yaml
  -keywords string
    	Comma-separated keywords to search for
  -log
    	Enable logging to scan_[timestamp].log file
  -output string
    	Output format: 'json' (default) or 'csv' (default "json")
  -skip-merged
    	Skip merged branches (default true)
  -skip-ssl
    	Skip SSL verification for self-hosted instances
  -token string
    	GitLab API token
  -url string
    	GitLab URL (e.g., https://gitlab.com/username or https://gitlab.company.com)
  -v	Print version information
  -version
    	Print version information

Output

scan_results_[date]_[time].json

A results file `scan_results_[date]_[time].json` is written to the execution directory by default. If you pass `-output csv` (or set `output_format: "csv"` in `config.yaml`), the results are also written to a `scan_results_[date]_[time].csv` file.

scan_[date]_[time].log

If `-log` is enabled (or `scan_log: true` is set in `config.yaml`), a `scan_[date]_[time].log` file is written to the execution directory.

debug_[date]_[time].log

If `-debug` is passed, a `debug_[date]_[time].log` file is written to the execution directory.

stdout

A log is printed to stdout during execution, for example:

Execution Summary
=================
Scanned URL: http://gitlab.com/henrybravo
Keyword(s): resource,bucket,token
Excluded repo(s): none
Total Duration: 2.97 seconds
Sequential Processing Time: 7.41 seconds  
Parallelization Speedup: 2.49x

Branch Scanning Configuration
=============================
Branch Age Limit: 1000 days
Branch Types: main,master,release/*
Skip Merged Branches: false
Excluded Branch Patterns: none

Scope
=====
Repositories: 4
Projects: 4
Branches: 4
Files: 18
Matches Found: 287
Concurrency: 1000000 goroutines

Performance Details
===================
Slowest Project: tf-azure-container-app (2.76 seconds)
Avg. Time per File: 0.346 seconds

config file usage

# Generate a sample config file
% gitsearch -generate-config
# Use a config file
% gitsearch -config config.yaml
# Override config file settings with command line arguments
% gitsearch -config config.yaml -url https://gitlab.com/different-user -token NEW_TOKEN

example config file with search patterns

config.yaml:

# GitLab connection settings
gitlab:
    # Base URL of the GitLab instance (required)
    base_url: "https://gitlab.com/[user]"
    
    # API token for authentication (required)
    token: "your-token-here"
    
    # Skip SSL verification for self-hosted instances with self-signed certificates
    skip_ssl: false

# Search settings
search:
    # List of keywords to search for (required if no pattern search configured)
    keywords:
        - "password"
        - "api_key"
    
    # Terms to exclude from project names
    exclude_terms:
        - "test"
    
    # Maximum number of concurrent goroutines
    max_goroutines: 10
    
    # Output format: 'json' or 'csv'
    output_format: "csv"
    
    # Scan Log
    scan_log: true

# Branch filtering
branches:
    # Skip branches with no activity for specified days (0 to scan all)
    max_age: 90
    
    # Branch names or patterns to scan
    names:
        - "main"
        - "master"
        - "release/*"
        - "feature/*"

    # Skip merged branches
    skip_merged: true
    
    # Patterns to exclude branches
    excludes:
        - "temp-*"
        - "old-*"

# Pattern definitions
patterns:
  # Single-line pattern with named capture groups
  - name: "aws_secrets"
    description: "Find AWS access keys"
    regex: "(?i)(?P<key_type>aws_access_key_id|aws_secret_access_key)\\s*=\\s*['\"]?(?P<key_value>[^'\"\n]+)['\"]?"
    capture_groups:
        - "key_type"
        - "key_value"

  # Pattern applied to all files (no file_extensions filter)
  - name: "credentials"
    description: "Find credential patterns in any file"
    regex: "(?i)(?:password|secret|key|token)\\s*[:=]\\s*['\"]([^'\"]+)['\"]"
    captures:
      - name: "credential_value"
        group: 1

  # Multi-line pattern with indexed captures
  - name: "terraform_modules"
    description: "Find Terraform module versions"
    file_extensions: [".tf"]
    regex: "module\\s+\"([^\"]+)\"\\s*{[\\s\\S]*?source\\s*=\\s*\"([^\"]+)\"[\\s\\S]*?version\\s*=\\s*\"([^\"]+)\"[\\s\\S]*?}"
    captures:
      - name: "module_name"
        group: 1
      - name: "source"
        group: 2
      - name: "version"
        group: 3
    multi_line: true
    
  # Multi-line pattern for provider blocks
  - name: "terraform_providers"
    description: "Find Terraform provider versions"
    file_extensions: [".tf"]
    regex: "required_providers\\s*{[\\s\\S]*?([^=\\s]+)\\s*=\\s*{[\\s\\S]*?source\\s*=\\s*\"([^\"]+)\"[\\s\\S]*?version\\s*=\\s*\"([^\"]+)\"[\\s\\S]*?}"
    captures:
      - name: "provider_name"
        group: 1
      - name: "source"
        group: 2
      - name: "version"
        group: 3
    multi_line: true

  # Docker image pattern
  - name: "docker_images"
    description: "Find Docker image references in YAML files and Dockerfiles"
    file_extensions: [".yml", ".yaml", ".dockerfile", "Dockerfile"]
    regex: "(?m)(?:^\\s*image:\\s*(?:name:\\s*)?\"?([^\"\\n]+)\"?|^\\s*FROM\\s+([^\\s]+))"
    captures:
      - name: "image"
        group: 1
      - name: "base_image"
        group: 2
    multi_line: true

  # Environment variables in various formats
  - name: "env_vars"
    description: "Find environment variable definitions"
    file_extensions: [".env", ".yml", ".yaml", ".sh"]
    regex: "(?m)^\\s*(?:export\\s+)?([A-Z][A-Z0-9_]*)\\s*=\\s*['\"]?(.*?)['\"]?\\s*$"
    captures:
      - name: "var_name"
        group: 1
      - name: "var_value"
        group: 2

  # Kubernetes resource pattern
  - name: "kubernetes_resources"
    description: "Find Kubernetes resource definitions"
    file_extensions: [".yml", ".yaml"]
    regex: "(?m)^kind:\\s*([^\\n]+)\\n[\\s\\S]*?^metadata:\\s*\\n[\\s\\S]*?^\\s*name:\\s*([^\\n]+)"
    captures:
      - name: "resource_type"
        group: 1
      - name: "resource_name"
        group: 2
    multi_line: true

pattern matching results example

scan_results_[date]_[time].json:

[
    {
        "project": "terraform-azure-container-app",
        "branch": "main",
        "file": "main.tf",
        "line": {
            "number": 9,
            "content": "module \"analytics_workspace\" {\n  source  = \"app.terraform.io/org/analytics-workspace/azurerm\"\n  version = \"0.0.1\"\n\n  analytics_workspace_name = var.azurerm_log_analytics_workspace_name\n  location = var.location\n  resource_group_name = module.resource_group.resource_group_name\n}",
            "keyword": ""
        },
        "pattern": {
            "name": "terraform_modules",
            "captured_data": {
                "module_name": "analytics_workspace",
                "source": "app.terraform.io/org/analytics-workspace/azurerm",
                "version": "0.0.1"
            }
        }
    },
    {
        "group": "datascience",
        "project": "aws-ds-sandbox",
        "branch": "main",
        "file": "credentials",
        "line": {
            "number": 3,
            "content": "aws_access_key_id = ABCDEFGHIJKLMNOPQRS",
            "keyword": ""
        },
        "pattern": {
            "name": "aws_secrets",
            "captured_data": {
                "key_type": "aws_access_key_id",
                "key_value": "ABCDEFGHIJKLMNOPQRS"
            }
        }
    },
    {
        "group": "sandbox",
        "project": "mysandbox-1",
        "branch": "main",
        "file": ".env",
        "line": {
            "number": 1,
            "content": "export SECRET=bla",
            "keyword": ""
        },
        "pattern": {
            "name": "env_vars",
            "captured_data": {
                "var_name": "SECRET",
                "var_value": "bla"
            }
        }
    }
]

Security

The application may be flagged as Trojan:Script/Wacatac.B!ml by Windows Defender. This is a false positive, triggered by the following behaviors:

1. Network Connectivity
  • Makes HTTP/HTTPS connections to GitLab instances
  • Handles authentication tokens
  • Scans remote repositories
2. File Operations
  • Creates log files (scan_[timestamp].log)
  • Creates result files (scan_results_[timestamp].json)
  • Reads file contents
3. Process Behavior
  • Uses multiple goroutines (parallel execution)
  • Scans through file contents looking for specific patterns
4. Compression
  • Compression, although legitimate, is often flagged because malware authors frequently use it to obfuscate malicious code.

The application has been verified by Microsoft Security Intelligence (Submission ID: b18139fb-c0bd-4a76-9aac-cc16e50b1915).

Changelog

Version 1.3.1

1.3.1 Release

  • Debug logging (-debug) with comprehensive system diagnostics
  • Improved error visibility: critical errors now always display to console
  • Enhanced configuration validation with detailed debug traces
  • Token masking in debug logs for security
Version 1.3.0

1.3.0 Release

  • YAML configuration support: settings can be kept in a structured, human-readable YAML file passed with the -config flag.
  • Sample configuration generation: the new -generate-config flag writes a sample config.yaml to help users get started.
  • Pattern-based searching: regex patterns with capture groups enable more complex, structured searches.
  • Command-line overrides: command-line arguments now override settings from the configuration file.
  • Improved error handling and logging: configuration file operations now produce more informative, actionable error messages.
Version 1.2.0

1.2.0 Release

  • Results in CSV file: this release introduces a new feature that allows users to customize the output format of search results. In addition to the default JSON format, users can now opt for a CSV format by passing the -output csv flag.
Version 1.1.0

1.1.0 Release

New branch scanning flags: several optional flags let you control which branches are scanned.

  • -branch-age: skips branches with no activity for more than the specified number of days. The default is 90; set it to 0 to scan branches of any age.
  • -branch-names: specifies the branches or patterns to scan. The default is main, master, and release/*. Wildcards are supported, so patterns such as feature/*, dev/*, or release/* can be used.
  • -skip-merged: skips merged branches. This is enabled by default; pass -skip-merged=false to include merged branches.
  • -branch-exclude: specifies patterns for branches to exclude, for example temp-* or old-*.
Version 1.0.0

Initial Release

  • Concurrent repository scanning with configurable goroutines
  • Scans 'master' and 'main' branches
  • Support for GitLab.com and self-hosted instances
  • Flexible keyword search with project exclusions
  • JSON export of search results
  • Detailed execution statistics and performance metrics
  • SSL verification skip option for self-hosted instances
  • Docker support with multi-arch images
Version 0.9.0-beta

Beta Release

  • Initial implementation of core search functionality
  • Basic GitLab API integration
  • Command-line interface implementation
  • Performance optimization and testing
  • Documentation and usage examples

Roadmap

Future Releases

  • Support for public gitlab.com repositories, making the token optional
  • Support for github.com

Disclaimer

This notice is to clarify the relationship between gitsearch, developed by Henry Bravo, and GitLab (https://gitlab.com).

Independence Statement

Key Points

Trademark Notice

GitLab is a registered trademark of GitLab Inc. The use of terms such as "GitLab" in this notice does not imply any endorsement or affiliation.

Contact Information

For any inquiries, support requests, or feedback regarding gitsearch, please contact Henry directly: