awesome-golang-ai

Awesome Golang.ai

Go has strong potential for AI applications: it offers fast execution, easy debugging, built-in concurrency, and solid libraries for ML, deep learning, and reinforcement learning.

Benchmark

ADeLe: ADeLe v1.0 is a comprehensive AI evaluation framework that combines explanatory analysis and predictive modeling capabilities to systematically assess AI system performance across multiple dimensions.
SWELancer: The SWE-Lancer-Benchmark is designed to evaluate the capabilities of frontier LLMs in solving real-world freelance software engineering tasks, exploring their potential to generate economic value through complex software development scenarios.

Real World Challenge

RPBench-Auto: An automated pipeline for evaluating LLMs for role-playing.
SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation.

Text-to-Speech (TTS)

emergenttts-eval-public: Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.

English

ARC-AGI: The Abstraction and Reasoning Corpus.
ARC-Challenge: AI2 Reasoning Challenge (ARC) Set.
BBH: Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them.
BIG-bench: Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models.
GPQA: GPQA: A Graduate-Level Google-Proof Q&A Benchmark.
HellaSwag: HellaSwag: Can a Machine Really Finish Your Sentence?
IFEval: IFEval is designed to systematically evaluate the instruction-following capabilities of large language models by incorporating 25 verifiable instruction types (e.g., format constraints, keyword inclusion) and applying dual strict-loose metrics for automated, objective assessment of model compliance.
LiveBench: A Challenging, Contamination-Free LLM Benchmark.
MMLU: [ICLR 2021] Measuring Massive Multitask Language Understanding.
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark.
MMLU-Pro: [NeurIPS 2024] A More Robust and Challenging Multi-Task Language Understanding Benchmark.
MTEB: Massive Text Embedding Benchmark.
PIQA: PIQA is a dataset for commonsense reasoning, and was created to investigate the physical knowledge of existing models in NLP.
WinoGrande: An Adversarial Winograd Schema Challenge at Scale.

Chinese

C-Eval: [NeurIPS 2023] A Chinese evaluation suite for foundation models.
CMMLU: Measuring massive multitask language understanding in Chinese.
C-SimpleQA: A Chinese Factuality Evaluation for Large Language Models.

Math

AIME: Evaluation of LLMs on latest math competitions.
grade-school-math: The GSM8K dataset contains 8.5K grade school math word problems designed to evaluate multi-step reasoning capabilities in language models, revealing that even large transformers struggle with these conceptually simple yet procedurally complex tasks.
MATH: The MATH dataset (NeurIPS 2021) is a benchmark for evaluating mathematical problem-solving capabilities, offering dataset loaders, evaluation code, and pre-training data.
matharena: Evaluation of LLMs on latest math competitions.
MathVista: MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts.
Omni-MATH: Omni-MATH is a comprehensive and challenging benchmark specifically designed to assess LLMs’ mathematical reasoning at the Olympiad level.
imobench: The IMO-Bench dataset is a gold-standard suite of rigorous mathematical reasoning benchmarks derived from International Mathematical Olympiad problems, used to test the limits of AI’s ability to perform complex, robust mathematical proofs and derivations.
TAU-bench: TauBench is an open-source benchmark suite designed to evaluate the performance of large language models (LLMs) on complex reasoning tasks across multiple domains.

Code

AIDER: The leaderboards page of aider presents a performance comparison of various LLMs in programming-related tasks, such as code writing and editing.
BFCL: BFCL aims to provide a thorough study of the function-calling capability of different LLMs.
BigCodeBench: [ICLR’25] BigCodeBench: Benchmarking Code Generation Towards AGI.
Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques.
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation.
HumanEval: Code for the paper “Evaluating Large Language Models Trained on Code”.
HLE: Humanity’s Last Exam.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code.
MBPP: The benchmark consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry level programmers, covering programming fundamentals, standard library functionality, and so on.
MultiPL-E: A multi-programming language benchmark for LLMs.
multi-swe-bench: The Multi-SWE-bench project, developed by ByteDance’s Doubao team, is the first open-source multilingual dataset for evaluating and enhancing large language models’ ability to automatically debug code, covering 7 major programming languages (e.g., Java, C++, JavaScript) with real-world GitHub issues to benchmark “full-stack engineering” capabilities.

Code Agent

Multi-SWE-Bench: Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving.
SWE-Bench: SWE-bench: Can Language Models Resolve Real-world Github Issues?
Terminal‑Bench: A benchmark for LLMs on complicated tasks in the terminal.

Search Agent

BrowseComp:
BrowseComp-Plus: BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent.

Tool Use

BFCL: Training and Evaluating LLMs for Function Calls (Tool Calls).
MCP-Bench: MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers.
MCPMark: MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.
MCP-Universe: MCP-Universe is a comprehensive framework designed for developing, testing, and benchmarking AI agents.
Tool Decathlon: The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution.
T-Eval: [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step.
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users.
τ²-Bench: τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment.

Computer Use

OSWorld: OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments.

Open ended

Arena-Hard: Arena-Hard-Auto: An automatic LLM benchmark.

Visual Reasoning

MMMU: MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI.

Novel Problem Solving

ARC-AGI: The Abstraction and Reasoning Corpus.
ARC-AGI-2: ARC can be seen as a general artificial intelligence benchmark, as a program synthesis benchmark, or as a psychometric intelligence test.

Safety

False refusal

Xstest: Röttger et al. (NAACL 2024): “XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models”.

Multi-modal

DPG-Bench: The DPG benchmark tests a model’s ability to follow complex image generation prompts.
geneval: GenEval: An object-focused framework for evaluating text-to-image alignment.
LongVideoBench: [NeurIPS 2024 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
MLVU: Multi-task Long Video Understanding Benchmark.
perception_test: A Diagnostic Benchmark for Multimodal Video Models is a multimodal benchmark designed to comprehensively evaluate the perception and reasoning skills of multimodal video models.
TempCompass: A benchmark to evaluate the temporal perception ability of Video LLMs.
VBench: VBench is an open-source project aiming to build a comprehensive evaluation benchmark for video generation models.
Video-MME: [CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.

Model Context Protocol

gateway: Universal MCP-Server for your Databases optimized for LLMs and AI-Agents.
mcp-go: A Go implementation of the Model Context Protocol (MCP), enabling seamless integration between LLM applications and external data sources and tools.
mcp-golang: Write Model Context Protocol servers in a few lines of Go code.
registry: A community driven registry service for Model Context Protocol (MCP) servers.

Large Language Model

GPT

gpt-go: Tiny GPT implemented from scratch in pure Go. Trained on Jules Verne books.

ChatGPT Apps

feishu-openai: Feishu (Lark) integrated with (GPT-4 + GPT-4V + DALL·E-3 + Whisper) delivers an extraordinary work experience.
chatgpt-telegram: Run your own ChatGPT Telegram bot with a single command.

Agent

anyi: A Golang autonomous AI agent framework for assisting real work.
AgenticGoKit: Event-driven Agentic AI framework in Go. LLM-agnostic with MCP tool discovery, built-in observability, and production patterns.
agent-sdk-go: Build AI agents at light speed.
code-editing-agent: A Go-based AI agent that edits code using the DeepSeek model, offering a clear example of how AI agents work.

SDKs

anthropic-sdk-go: Access to Anthropic’s safety-first language model APIs via Go.
cohere-go: Go Library for Accessing the Cohere API.
deepseek-go: A DeepSeek client written for Go supporting R-1, Chat V3, and Coder. Also supports external providers like Azure, OpenRouter, and local Ollama.
go-anthropic: Anthropic Claude API wrapper for Go.
go-openai: OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go.
go-genai: Google Gen AI Go SDK provides an interface for developers to integrate Google’s generative models into their Go applications.
generative-ai-go: Go SDK for Google Generative AI.
openai-go: The official Go library for the OpenAI API.
volcengine-go-sdk: The Volcengine Go SDK is the official Go language SDK for ByteDance’s Volcengine cloud computing platform, providing developers with programmatic access to various cloud services through a standardized API interface.

DevTools

LocalAI: 🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference.
ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
go-attention: A full attention mechanism and transformer in pure Go.
langchaingo: LangChain for Go, the easiest way to write LLM-based programs in Go.
gpt4all-bindings: GPT4All Language Bindings provide cross-language interfaces to easily integrate and interact with GPT4All’s local LLMs, simplifying model loading and inference for developers.
go-openai: OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go.
llama.go: llama.go is like llama.cpp in pure Golang.
eino: The ultimate LLM/AI application development framework in Golang.
fabric: fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
genkit: An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to develop, integrate, and test AI features with observability and evaluations. Genkit works with various models and platforms.
swarmgo: SwarmGo (agents-sdk-go) is a Go package that allows you to create AI agents capable of interacting, coordinating, and executing tasks.
orra: The orra-dev/orra project offers resilience for AI agent workflows.
core: A fast, agnostic, and powerful Go AI framework for one-shot workflows, building autonomous agents, and working with LLM providers.
gollm: Unified Go interface for Language Model (LLM) providers. Simplifies LLM integration with flexible prompt management and common task functions.

RAG (Retrieval Augmented Generation)

Document Parser

markitdown: Python tool for converting files and office documents to Markdown.
MinerU: A high-quality tool for converting PDF to Markdown and JSON.
docling: Get your documents ready for gen AI.
marker: Convert PDF to markdown + JSON quickly with high accuracy.

Pipeline and Data Version

pachyderm: Data-Centric Pipelines and Data Versioning.

Embedding

embedding-knowledge-base: A local knowledge base built on ChatGPT embeddings and Qdrant, supporting data import and Q&A.

Benchmark

MTEB: MTEB (Massive Text Embedding Benchmark) is an open-source benchmarking framework for evaluating and comparing text embedding models across 8 tasks (e.g., classification, retrieval, clustering) using 58 datasets in 112 languages, providing standardized performance metrics for model selection.
BRIGHT: BRIGHT is a realistic, challenging benchmark for reasoning-intensive retrieval, featuring 12 diverse datasets (math, code, biology, etc.) to evaluate retrieval models across complex, context-rich queries requiring logical inference.

Vector Database

Indexer and Retriever.

chroma: Open-source search and retrieval database for AI applications.
cli: Work seamlessly with Pinecone from the command line.
milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search.
pinecone: Pinecone.io Golang Client.
qdrant: Go client for Qdrant vector search engine.
tidb: TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
weaviate: Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

General Machine Learning libraries

eaopt: Evolutionary optimization library for Go (genetic algorithm, particle swarm optimization, differential evolution).
go-datamining: Basic collection of data mining algorithms implemented in Go.
goml: On-line Machine Learning in Go (and so much more).
golearn: Simple and customizable batteries-included ML library in Go.
gonum: Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more.
gorgonia: Gorgonia is a library that helps facilitate machine learning in Go.
spago: Self-contained Machine Learning and Natural Language Processing library in Go.
goro: A High-level Machine Learning Library for Go.
goga: Golang Genetic Algorithm.
hep: hep is the mono repository holding all of go-hep.org/x/hep packages and tools.
hector: Golang machine learning lib.
sklearn: bits of sklearn ported to Go.
stats: A well tested and comprehensive Golang statistics library package with no dependencies.
tokenizer: NLP tokenizers written in Go language.

Neural Networks

emergent: Biologically based neural network simulations of the brain written in Go with a 3D GUI powered by Cogent Core.
gobrain: Neural Networks written in Go.
go-ctr: Go DeepLearning based Recommendation Framework.
go-deep: Artificial Neural Network.
go-infer: Go framework for DL model inference and API deployment.
gomid: A simplistic Neural Network Library in Go.
gomlx: An Accelerated Machine Learning Framework For Go.
go-neural: Neural network implementation in Go.
go-neural: Feedforward Neural Networks in Go.
gonn: GoNN is an implementation of Neural Network in Go Language, which includes BPNN, RBF, PCN.
gonn: Building a simple neural network in Go.
gosom: Self-organizing maps in Go.
go-perceptron-go: A single / multi layer / recurrent neural network written in Golang.
olivia: Your new best friend powered by an artificial neural network.
neurgo: Neural Network toolkit in Go.
tensorflow: TensorFlow is an open source software library for numerical computation using data flow graphs.

NLP (Natural Language Processing)

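jiagu: Jiagu is a deep-learning natural language processing toolkit covering knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering.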
lingua-go: The most accurate natural language detection library for Go, suitable for short text and mixed-language text.
nlp: Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang.

Linear Algebra

gosl: Linear algebra, eigenvalues, FFT, Bessel, elliptic, orthogonal polys, geometry, NURBS, numerical quadrature, 3D transfinite interpolation, random numbers, Mersenne twister, probability distributions, optimisation, differential equations.
sparse: Sparse matrix formats for linear algebra supporting scientific and machine learning applications.

Probability Distributions

godist: Probability distributions and associated methods in Go.

Decision Trees

CloudForest: CloudForest is a fast, flexible Go library for multi-threaded decision tree ensembles (Random Forest, Gradient Boosting, etc.) designed for high-dimensional heterogeneous data with missing values, emphasizing speed and robustness for real-world machine learning tasks.

Regression

regression: Multivariable regression library in Go.
ridge: Ridge regression in Go.

Bayesian Classifiers

bayesian: Naive Bayesian Classification for Golang.
multibayes: Multiclass Naive Bayesian Classification.

Recommendation Engines

regommend: Recommendation engine for Go.
gorse: Gorse open source recommender system engine.
too: Simple recommendation engine implementation built on top of Redis.

Evolutionary Algorithms

eaopt: Evolutionary optimization library for Go (genetic algorithm, particle swarm optimization, differential evolution).
evo: Evolutionary Algorithms in Go.

Graph

gocv: Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO.
go-face: Face recognition with Go.
gogl: A graph library in Go.
imaging: Imaging is a simple image processing package for Go.
plot: A repository for plotting and visualizing data.
wasmvision: wasmVision gets you going with computer vision using WebAssembly.

Cluster

go-cluster: k-modes and k-prototypes clustering algorithms implementation in Go.
gokmeans: K-means algorithm implemented in Go (golang).
kmeans: k-means clustering algorithm implementation written in Go.

Anomaly Detection

morgoth: Metric anomaly detection.
anomalyzer: Probabilistic anomaly detection for time series data.
goanomaly: Golang library for anomaly detection. Uses the Gaussian distribution and the probability density formula.

DataFrames

gota: Gota: DataFrames and data wrangling in Go.
dataframe-go: DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration.
qframe: Immutable data frame for Go.

Explaining Model

lime: Lime: Explaining the predictions of any machine learning classifier.

Books

Machine Learning With Go
Machine-Learning-With-Go: example code.
Go Machine Learning Projects: Go Machine Learning Projects, published by Packt.
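机器学习：Go语言实现 (Machine Learning: Implementation in Go, Chinese-language book)
GO语言机器学习实战 (Go Machine Learning in Practice, Chinese-language book)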

Basic Knowledge

Reinforcement Learning

Hands-on Reinforcement Learning

Datasets

LendingClub
