Awesome Golang.ai
Go is well suited to AI applications: it offers fast execution, straightforward debugging, built-in concurrency, and solid libraries for ML, deep learning, and reinforcement learning.
Benchmark
- ADeLe: ADeLe v1.0 is a comprehensive AI evaluation framework that combines explanatory analysis and predictive modeling capabilities to systematically assess AI system performance across multiple dimensions.
- SWELancer: The SWE-Lancer-Benchmark is designed to evaluate the capabilities of frontier LLMs in solving real-world freelance software engineering tasks, exploring their potential to generate economic value through complex software development scenarios.
Real World Challenge
- RPBench-Auto: An automated pipeline for evaluating LLMs for role-playing.
- SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation.
Text-to-Speech (TTS)
- emergenttts-eval-public: Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.
English
- ARC-AGI: The Abstraction and Reasoning Corpus.
- ARC-Challenge: AI2 Reasoning Challenge (ARC) Set.
- BBH: Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them.
- BIG-bench: Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models.
- GPQA: GPQA: A Graduate-Level Google-Proof Q&A Benchmark.
- HellaSwag: HellaSwag: Can a Machine Really Finish Your Sentence?
- IFEval: IFEval is designed to systematically evaluate the instruction-following capabilities of large language models by incorporating 25 verifiable instruction types (e.g., format constraints, keyword inclusion) and applying dual strict-loose metrics for automated, objective assessment of model compliance.
- LiveBench: A Challenging, Contamination-Free LLM Benchmark.
- MMLU: [ICLR 2021] Measuring Massive Multitask Language Understanding.
- MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark.
- MMLU-Pro: [NeurIPS 2024] A More Robust and Challenging Multi-Task Language Understanding Benchmark.
- MTEB: Massive Text Embedding Benchmark.
- PIQA: PIQA is a dataset for commonsense reasoning, and was created to investigate the physical knowledge of existing models in NLP.
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale.
Chinese
- C-Eval: [NeurIPS 2023] A Chinese evaluation suite for foundation models.
- CMMLU: Measuring massive multitask language understanding in Chinese.
- C-SimpleQA: A Chinese Factuality Evaluation for Large Language Models.
Math
- AIME: Evaluation of LLMs on latest math competitions.
- grade-school-math: The GSM8K dataset contains 8.5K grade school math word problems designed to evaluate multi-step reasoning capabilities in language models, revealing that even large transformers struggle with these conceptually simple yet procedurally complex tasks.
- MATH: The MATH dataset (NeurIPS 2021) is a benchmark for evaluating mathematical problem-solving capabilities, offering dataset loaders, evaluation code, and pre-training data.
- MathVista: MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts.
- Omni-MATH: Omni-MATH is a comprehensive and challenging benchmark specifically designed to assess LLMs’ mathematical reasoning at the Olympiad level.
- TAU-bench: TauBench is an open-source benchmark suite designed to evaluate the performance of large language models (LLMs) on complex reasoning tasks across multiple domains.
Code
- AIDER: The leaderboards page of aider presents a performance comparison of various LLMs in programming-related tasks, such as code writing and editing.
- BFCL: BFCL aims to provide a thorough study of the function-calling capability of different LLMs.
- BigCodeBench: [ICLR’25] BigCodeBench: Benchmarking Code Generation Towards AGI.
- Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques.
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation.
- HumanEval: Code for the paper “Evaluating Large Language Models Trained on Code”.
- LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code.
- MBPP: The benchmark consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry level programmers, covering programming fundamentals, standard library functionality, and so on.
- MultiPL-E: A multi-programming language benchmark for LLMs.
- multi-swe-bench: The Multi-SWE-bench project, developed by ByteDance’s Doubao team, is the first open-source multilingual dataset for evaluating and enhancing large language models’ ability to automatically debug code, covering 7 major programming languages (e.g., Java, C++, JavaScript) with real-world GitHub issues to benchmark “full-stack engineering” capabilities.
- SWE-bench: SWE-bench is a benchmark suite designed to evaluate the capabilities of large language models (LLMs) in solving real-world software engineering tasks, focusing on actual software bug-fixing challenges extracted from open-source projects.
- BFCL: Training and Evaluating LLMs for Function Calls (Tool Calls).
- T-Eval: [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step.
- WildBench: Benchmarking LLMs with Challenging Tasks from Real Users.
Open ended
- Arena-Hard: Arena-Hard-Auto: An automatic LLM benchmark.
Safety
False refusal
- Xstest: Röttger et al. (NAACL 2024): “XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models”.
Multi-modal
- DPG-Bench: The DPG benchmark tests a model’s ability to follow complex image generation prompts.
- geneval: GenEval: An object-focused framework for evaluating text-to-image alignment.
- LongVideoBench: [NeurIPS 2024 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
- MLVU: Multi-task Long Video Understanding Benchmark.
- perception_test: Perception Test: A Diagnostic Benchmark for Multimodal Video Models, designed to comprehensively evaluate the perception and reasoning skills of multimodal video models.
- TempCompass: A benchmark to evaluate the temporal perception ability of Video LLMs.
- VBench: VBench is an open-source project aiming to build a comprehensive evaluation benchmark for video generation models.
- Video-MME: [CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
Model Context Protocol (MCP)
- gateway: Universal MCP server for your databases, optimized for LLMs and AI agents.
- mcp-go: A Go implementation of the Model Context Protocol (MCP), enabling seamless integration between LLM applications and external data sources and tools.
- mcp-golang: Write Model Context Protocol servers in a few lines of Go code.
- registry: A community driven registry service for Model Context Protocol (MCP) servers.
Large Language Model
GPT
- gpt-go: Tiny GPT implemented from scratch in pure Go. Trained on Jules Verne books.
ChatGPT Apps
- feishu-openai: Feishu (Lark) integrated with (GPT-4 + GPT-4V + DALL·E-3 + Whisper) delivers an extraordinary work experience.
- chatgpt-telegram: Run your own ChatGPT Telegram bot with a single command.
Agent
- anyi: A Golang autonomous AI agent framework for assisting real work.
- AgenticGoKit: Event-driven Agentic AI framework in Go. LLM-agnostic with MCP tool discovery, built-in observability, and production patterns.
- agent-sdk-go: Build AI agents at light speed.
- code-editing-agent: A Go-based AI agent that edits code using the DeepSeek model, offering a clear example of how AI agents work.
SDKs
- anthropic-sdk-go: Access to Anthropic’s safety-first language model APIs via Go.
- cohere-go: Go Library for Accessing the Cohere API.
- deepseek-go: A DeepSeek client written for Go supporting R-1, Chat V3, and Coder. Also supports external providers like Azure, OpenRouter, and local Ollama.
- go-anthropic: Anthropic Claude API wrapper for Go.
- go-openai: OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go.
- go-genai: Google Gen AI Go SDK provides an interface for developers to integrate Google’s generative models into their Go applications.
- generative-ai-go: Go SDK for Google Generative AI.
- openai-go: The official Go library for the OpenAI API.
- volcengine-go-sdk: The Volcengine Go SDK is the official Go language SDK for ByteDance’s Volcengine cloud computing platform, providing developers with programmatic access to various cloud services through a standardized API interface.
- LocalAI: 🤖 The free, open-source alternative to OpenAI, Claude, and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many more model architectures. Features: text, audio, video, and image generation, voice cloning, and distributed P2P inference.
- ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
- go-attention: A full attention mechanism and transformer in pure Go.
- langchaingo: LangChain for Go, the easiest way to write LLM-based programs in Go.
- gpt4all-bindings: GPT4All Language Bindings provide cross-language interfaces to easily integrate and interact with GPT4All’s local LLMs, simplifying model loading and inference for developers.
- llama.go: llama.go is like llama.cpp in pure Golang.
- eino: The ultimate LLM/AI application development framework in Golang.
- fabric: fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
- genkit: An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to develop, integrate, and test AI features with observability and evaluations. Genkit works with various models and platforms.
- swarmgo: SwarmGo (agents-sdk-go) is a Go package that allows you to create AI agents capable of interacting, coordinating, and executing tasks.
- orra: The orra-dev/orra project offers resilience for AI agent workflows.
- core: A fast, agnostic, and powerful Go AI framework for one-shot workflows, building autonomous agents, and working with LLM providers.
- gollm: Unified Go interface for Language Model (LLM) providers. Simplifies LLM integration with flexible prompt management and common task functions.
RAG (Retrieval Augmented Generation)
Document Parser
- markitdown: Python tool for converting files and office documents to Markdown.
- MinerU: A high-quality tool for converting PDF to Markdown and JSON.
- docling: Get your documents ready for gen AI.
- marker: Convert PDF to markdown + JSON quickly with high accuracy.
Pipeline and Data Version
- pachyderm: Data-Centric Pipelines and Data Versioning.
Embedding
Benchmark
- MTEB: MTEB (Massive Text Embedding Benchmark) is an open-source benchmarking framework for evaluating and comparing text embedding models across 8 tasks (e.g., classification, retrieval, clustering) using 58 datasets in 112 languages, providing standardized performance metrics for model selection.
- BRIGHT: BRIGHT is a realistic, challenging benchmark for reasoning-intensive retrieval, featuring 12 diverse datasets (math, code, biology, etc.) to evaluate retrieval models across complex, context-rich queries requiring logical inference.
Vector Database
Indexer and Retriever.
- chroma: Open-source search and retrieval database for AI applications.
- cli: Work seamlessly with Pinecone from the command line.
- milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search.
- pinecone: Pinecone.io Golang Client.
- qdrant: Go client for Qdrant vector search engine.
- tidb: TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
- weaviate: Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
General Machine Learning libraries
- eaopt: Evolutionary optimization library for Go (genetic algorithm, particle swarm optimization, differential evolution).
- go-datamining: Basic collection of data mining algorithms implemented in Go.
- goml: Online machine learning in Go (and so much more).
- golearn: A simple, customizable, batteries-included ML library for Go.
- gonum: Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more.
- gorgonia: Gorgonia is a library that helps facilitate machine learning in Go.
- spago: Self-contained Machine Learning and Natural Language Processing library in Go.
- goro: A High-level Machine Learning Library for Go.
- goga: Golang Genetic Algorithm.
- hep: hep is the mono repository holding all of go-hep.org/x/hep packages and tools.
- hector: Golang machine learning lib.
- sklearn: Bits of sklearn ported to Go.
- stats: A well tested and comprehensive Golang statistics library package with no dependencies.
- tokenizer: NLP tokenizers written in Go language.
Neural Networks
- emergent: Biologically based neural network simulations of the brain written in Go with a 3D GUI powered by Cogent Core.
- gobrain: Neural networks written in Go.
- go-ctr: Go DeepLearning based Recommendation Framework.
- go-deep: Artificial Neural Network.
- go-infer: Go framework for DL model inference and API deployment.
- gomid: A simplistic Neural Network Library in Go.
- gomlx: An Accelerated Machine Learning Framework For Go.
- go-neural: Neural network implementation in Go.
- go-neural: Feedforward Neural Networks in Go.
- gonn: GoNN is an implementation of Neural Network in Go Language, which includes BPNN, RBF, PCN.
- gonn: Building a simple neural network in Go.
- gosom: Self-organizing maps in Go.
- go-perceptron-go: A single / multi layer / recurrent neural network written in Golang.
- olivia: Your new best friend powered by an artificial neural network.
- neurgo: Neural Network toolkit in Go.
- tensorflow: TensorFlow is an open source software library for numerical computation using data flow graphs.
NLP (Natural Language Processing)
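- jiagu: Jiagu is a deep learning NLP toolkit covering knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering.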
- lingua-go: The most accurate natural language detection library for Go, suitable for short text and mixed-language text.
- nlp: Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang.
Linear Algebra
- gosl: Linear algebra, eigenvalues, FFT, Bessel, elliptic, orthogonal polys, geometry, NURBS, numerical quadrature, 3D transfinite interpolation, random numbers, Mersenne twister, probability distributions, optimisation, differential equations.
- sparse: Sparse matrix formats for linear algebra supporting scientific and machine learning applications.
Probability Distributions
- godist: Probability distributions and associated methods in Go.
Decision Trees
- CloudForest: CloudForest is a fast, flexible Go library for multi-threaded decision tree ensembles (Random Forest, Gradient Boosting, etc.) designed for high-dimensional heterogeneous data with missing values, emphasizing speed and robustness for real-world machine learning tasks.
Regression
- regression: Multivariable regression library in Go.
- ridge: Ridge regression in Go.
Bayesian Classifiers
- bayesian: Naive Bayesian Classification for Golang.
- multibayes: Multiclass Naive Bayesian Classification.
Recommendation Engines
- regommend: Recommendation engine for Go.
- gorse: Gorse open source recommender system engine.
- too: Simple recommendation engine implementation built on top of Redis.
Evolutionary Algorithms
- eaopt: Evolutionary optimization library for Go (genetic algorithm, particle swarm optimization, differential evolution).
- evo: Evolutionary Algorithms in Go.
Graph
- gocv: Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO.
- go-face: Face recognition with Go.
- gogl: A graph library in Go.
- imaging: Imaging is a simple image processing package for Go.
- plot: A repository for plotting and visualizing data.
- wasmvision: wasmVision gets you going with computer vision using WebAssembly.
Cluster
- go-cluster: k-modes and k-prototypes clustering algorithms implementation in Go.
- gokmeans: K-means algorithm implemented in Go (golang).
- kmeans: k-means clustering algorithm implementation written in Go.
Anomaly Detection
- morgoth: Metric anomaly detection.
- anomalyzer: Probabilistic anomaly detection for time series data.
- goanomaly: Golang library for anomaly detection. Uses the Gaussian distribution and the probability density formula.
DataFrames
- gota: Gota: DataFrames and data wrangling in Go.
- dataframe-go: DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration.
- qframe: Immutable data frame for Go.
Explaining Model
- lime: Lime: Explaining the predictions of any machine learning classifier.
Books
Basic Knowledge
Reinforcement Learning
Datasets