Golang AI applications have incredible potential: the language offers exceptional speed, easy debugging, built-in concurrency, and excellent libraries for ML, deep learning, and reinforcement learning.
Benchmark
ADeLe: ADeLe v1.0 is a comprehensive AI evaluation framework that combines explanatory analysis and predictive modeling capabilities to systematically assess AI system performance across multiple dimensions.
SWELancer: The SWE-Lancer-Benchmark is designed to evaluate the capabilities of frontier LLMs in solving real-world freelance software engineering tasks, exploring their potential to generate economic value through complex software development scenarios.
BBH: Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them.
BIG-bench: Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models.
GPQA: GPQA: A Graduate-Level Google-Proof Q&A Benchmark.
HellaSwag: Can a Machine Really Finish Your Sentence?
IFEval: IFEval is designed to systematically evaluate the instruction-following capabilities of large language models by incorporating 25 verifiable instruction types (e.g., format constraints, keyword inclusion) and applying dual strict-loose metrics for automated, objective assessment of model compliance.
LiveBench: A Challenging, Contamination-Free LLM Benchmark.
MMLU: Measuring Massive Multitask Language Understanding ICLR 2021.
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark.
MMLU-Pro: [NeurIPS 2024] A More Robust and Challenging Multi-Task Language Understanding Benchmark.
PIQA: PIQA is a dataset for commonsense reasoning, and was created to investigate the physical knowledge of existing models in NLP.
WinoGrande: An Adversarial Winograd Schema Challenge at Scale.
Chinese
C-Eval: [NeurIPS 2023] A Chinese evaluation suite for foundation models.
CMMLU: Measuring massive multitask language understanding in Chinese.
C-SimpleQA: A Chinese Factuality Evaluation for Large Language Models.
Math
AIME: Evaluation of LLMs on the latest math competitions.
grade-school-math: The GSM8K dataset contains 8.5K grade school math word problems designed to evaluate multi-step reasoning capabilities in language models, revealing that even large transformers struggle with these conceptually simple yet procedurally complex tasks.
MATH: The MATH dataset (NeurIPS 2021) is a benchmark for evaluating mathematical problem-solving capabilities, offering dataset loaders, evaluation code, and pre-training data.
MathVista: MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts.
Omni-MATH: Omni-MATH is a comprehensive and challenging benchmark specifically designed to assess LLMs’ mathematical reasoning at the Olympiad level.
TAU-bench: τ-bench emulates dynamic conversations between a simulated user and a language agent equipped with domain-specific API tools and policy guidelines, evaluating tool-agent-user interaction in real-world domains.
Code
AIDER: The leaderboards page of aider presents a performance comparison of various LLMs in programming-related tasks, such as code writing and editing.
BFCL: BFCL aims to provide a thorough study of the function-calling capability of different LLMs.
BigCodeBench: [ICLR’25] BigCodeBench: Benchmarking Code Generation Towards AGI.
Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques.
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation.
HumanEval: Code for the paper “Evaluating Large Language Models Trained on Code”.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code.
MBPP: The benchmark consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry-level programmers, covering programming fundamentals, standard library functionality, and more.
MultiPL-E: A multi-programming language benchmark for LLMs.
multi-swe-bench: The Multi-SWE-bench project, developed by ByteDance’s Doubao team, is the first open-source multilingual dataset for evaluating and enhancing large language models’ ability to automatically debug code, covering 7 major programming languages (e.g., Java, C++, JavaScript) with real-world GitHub issues to benchmark “full-stack engineering” capabilities.
SWE-bench: SWE-bench is a benchmark suite designed to evaluate the capabilities of large language models (LLMs) in solving real-world software engineering tasks, focusing on actual software bug-fixing challenges extracted from open-source projects.
Tool Use
BFCL: Training and Evaluating LLMs for Function Calls (Tool Calls).
T-Eval: [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step.
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users.
Open ended
Arena-Hard: Arena-Hard-Auto: An automatic LLM benchmark.
Safety
False refusal
Xstest: Röttger et al. (NAACL 2024): “XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models”.
Multi-modal
DPG-Bench: The DPG benchmark tests a model’s ability to follow complex image generation prompts.
geneval: GenEval: An object-focused framework for evaluating text-to-image alignment.
LongVideoBench: [NeurIPS 2024 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
MLVU: Multi-task Long Video Understanding Benchmark.
perception_test: A diagnostic benchmark designed to comprehensively evaluate the perception and reasoning skills of multimodal video models.
TempCompass: A benchmark to evaluate the temporal perception ability of Video LLMs.
VBench: VBench is an open-source project aiming to build a comprehensive evaluation benchmark for video generation models.
Video-MME: [CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
Model Context Protocol (MCP)
mcp-go: A Go implementation of the Model Context Protocol (MCP), enabling seamless integration between LLM applications and external data sources and tools (see the sketch after this list).
mcp-golang: Write Model Context Protocol servers in a few lines of Go code.
gateway: Universal MCP-Server for your Databases optimized for LLMs and AI-Agents.
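As a quick taste of what an MCP server looks like in Go, here is a minimal sketch modelled on the example in the mcp-go README. The tool name, argument, and greeting text are illustrative, and argument-access helpers have changed across mcp-go releases, so treat this as a sketch rather than the definitive API.

```go
package main

import (
	"context"
	"fmt"

	"github.com/mark3labs/mcp-go/mcp"
	"github.com/mark3labs/mcp-go/server"
)

func main() {
	// Create an MCP server with a name and version.
	s := server.NewMCPServer("demo-server", "1.0.0")

	// Declare a tool with a single required string argument.
	tool := mcp.NewTool("hello",
		mcp.WithDescription("Say hello to someone"),
		mcp.WithString("name",
			mcp.Required(),
			mcp.Description("Name of the person to greet"),
		),
	)

	// Register the tool with its handler. (Helpers for reading tool arguments
	// have varied across mcp-go releases; consult the project README.)
	s.AddTool(tool, func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
		return mcp.NewToolResultText("Hello from the Go MCP server!"), nil
	})

	// Serve the MCP protocol over stdio so an LLM client can connect.
	if err := server.ServeStdio(s); err != nil {
		fmt.Printf("server error: %v\n", err)
	}
}
```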
Large Language Model
GPT
gpt-go: Tiny GPT implemented from scratch in pure Go. Trained on Jules Verne books.
ChatGPT Apps
feishu-openai: Feishu (Lark) integrated with (GPT-4 + GPT-4V + DALL·E-3 + Whisper) delivers an extraordinary work experience.
chatgpt-telegram: Run your own ChatGPT Telegram bot with a single command.
SDKs
openai-go: The official Go library for the OpenAI API.
go-openai: OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go (see the sketch after this list).
anthropic-sdk-go: Access to Anthropic’s safety-first language model APIs via Go.
go-anthropic: Anthropic Claude API wrapper for Go.
deepseek-go: A DeepSeek client for Go supporting R1, Chat V3, and Coder; also supports external providers such as Azure, OpenRouter, and local Ollama.
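To show the flavour of these SDKs, here is a minimal chat-completion sketch using go-openai; the API key placeholder, model choice, and prompt are illustrative, and the other SDKs in this list expose similar request/response shapes.

```go
package main

import (
	"context"
	"fmt"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	// Create a client with your API key (placeholder shown here).
	client := openai.NewClient("your-api-key")

	// Send a single-turn chat completion request.
	resp, err := client.CreateChatCompletion(
		context.Background(),
		openai.ChatCompletionRequest{
			Model: openai.GPT4, // any model name string also works here
			Messages: []openai.ChatCompletionMessage{
				{Role: openai.ChatMessageRoleUser, Content: "Say hello in one sentence."},
			},
		},
	)
	if err != nil {
		fmt.Printf("chat completion error: %v\n", err)
		return
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```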
DevTools
ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
go-attention: A full attention mechanism and transformer in pure go.
langchaingo: LangChain for Go, the easiest way to write LLM-based programs in Go (see the sketch after this list).
gpt4all-bindings: GPT4All Language Bindings provide cross-language interfaces to easily integrate and interact with GPT4All’s local LLMs, simplifying model loading and inference for developers.
llama.go: llama.go is like llama.cpp in pure Golang.
eino: The ultimate LLM/AI application development framework in Golang.
fabric: fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
genkit: An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to develop, integrate, and test AI features with observability and evaluations. Genkit works with various models and platforms.
swarmgo: SwarmGo (agents-sdk-go) is a Go package that allows you to create AI agents capable of interacting, coordinating, and executing tasks.
orra: The orra-dev/orra project offers resilience for AI agent workflows.
core: A fast, agnostic, and powerful Go AI framework for one-shot workflows, building autonomous agents, and working with LLM providers.
gollm: Unified Go interface for Language Model (LLM) providers. Simplifies LLM integration with flexible prompt management and common task functions.
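As a taste of langchaingo, here is a minimal single-prompt sketch; it assumes OPENAI_API_KEY is set in the environment and the prompt text is illustrative. Beyond this one-shot helper, langchaingo also provides chains, agents, memory, and tool abstractions.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/openai"
)

func main() {
	ctx := context.Background()

	// Create an LLM client; by default this reads OPENAI_API_KEY from the environment.
	llm, err := openai.New()
	if err != nil {
		log.Fatal(err)
	}

	// One-shot prompt helper; swap in other providers or use chains/agents as needed.
	completion, err := llms.GenerateFromSinglePrompt(ctx, llm, "Explain goroutines in one sentence.")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(completion)
}
```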
Vector Database
milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search.
weaviate: Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
tidb: TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
Pipeline and Data Version
pachyderm: Data-Centric Pipelines and Data Versioning.
Embedding Benchmark
MTEB: MTEB (Massive Text Embedding Benchmark) is an open-source benchmarking framework for evaluating and comparing text embedding models across 8 tasks (e.g., classification, retrieval, clustering) using 58 datasets in 112 languages, providing standardized performance metrics for model selection.
BRIGHT: BRIGHT is a realistic, challenging benchmark for reasoning-intensive retrieval, featuring 12 diverse datasets (math, code, biology, etc.) to evaluate retrieval models across complex, context-rich queries requiring logical inference.
General Machine Learning libraries
goml: On-line Machine Learning in Go (and so much more).
golearn: A simple and customizable, batteries-included ML library in Go (see the sketch after this list).
gonum: Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more.
gorgonia: Gorgonia is a library that helps facilitate machine learning in Go.
spago: Self-contained Machine Learning and Natural Language Processing library in Go.
goro: A High-level Machine Learning Library for Go.
go-perceptron-go: A single / multi layer / recurrent neural network written in Golang.
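To illustrate the flavour of these libraries, here is a minimal golearn sketch adapted from its README: load a CSV dataset, train a k-nearest-neighbours classifier, and print evaluation metrics. The file name iris.csv is illustrative.

```go
package main

import (
	"fmt"
	"log"

	"github.com/sjwhitworth/golearn/base"
	"github.com/sjwhitworth/golearn/evaluation"
	"github.com/sjwhitworth/golearn/knn"
)

func main() {
	// Load a CSV dataset (path is illustrative); the last column is the class label.
	rawData, err := base.ParseCSVToInstances("iris.csv", true)
	if err != nil {
		log.Fatal(err)
	}

	// k-nearest-neighbours classifier with Euclidean distance and k=2.
	cls := knn.NewKnnClassifier("euclidean", "linear", 2)

	// Split into training and test sets, fit, and predict.
	trainData, testData := base.InstancesTrainTestSplit(rawData, 0.50)
	cls.Fit(trainData)
	predictions, err := cls.Predict(testData)
	if err != nil {
		log.Fatal(err)
	}

	// Summarise precision/recall per class via a confusion matrix.
	confusionMat, err := evaluation.GetConfusionMatrix(testData, predictions)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(evaluation.GetSummary(confusionMat))
}
```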
Linear Algebra
gosl: Linear algebra, eigenvalues, FFT, Bessel, elliptic, orthogonal polys, geometry, NURBS, numerical quadrature, 3D transfinite interpolation, random numbers, Mersenne twister, probability distributions, optimisation, differential equations.
sparse: Sparse matrix formats for linear algebra supporting scientific and machine learning applications.
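Below is a minimal sketch of sparse matrix work in Go using the sparse package together with gonum's mat (listed above under General Machine Learning libraries); the dimensions and values are illustrative, and it assumes the sparse formats interoperate with gonum's mat.Matrix interface as the sparse README describes.

```go
package main

import (
	"fmt"

	"github.com/james-bowman/sparse"
	"gonum.org/v1/gonum/mat"
)

func main() {
	// Build a 3x2 sparse matrix in DOK (dictionary-of-keys) format.
	dok := sparse.NewDOK(3, 2)
	dok.Set(0, 0, 5)
	dok.Set(2, 1, 7)

	// Convert to CSR, a format better suited to arithmetic.
	csr := dok.ToCSR()

	// Sparse types implement gonum's mat.Matrix, so they mix with dense matrices.
	dense := mat.NewDense(2, 3, []float64{1, 2, 3, 4, 5, 6})

	var product mat.Dense
	product.Mul(csr, dense) // (3x2) * (2x3) = 3x3

	fmt.Println(mat.Formatted(&product))
}
```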
Probability Distributions
godist: Probability distributions and associated methods in Go.
Decision Trees
CloudForest: CloudForest is a fast, flexible Go library for multi-threaded decision tree ensembles (Random Forest, Gradient Boosting, etc.) designed for high-dimensional heterogeneous data with missing values, emphasizing speed and robustness for real-world machine learning tasks.
Regression
regression: Multivariable regression library in Go.
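A minimal sketch using the regression package is shown below; the variable names and data points are illustrative: fit a model with one predictor, inspect the formula, and make a prediction.

```go
package main

import (
	"fmt"

	"github.com/sajari/regression"
)

func main() {
	// Set up a regression with one observed variable and one predictor.
	r := new(regression.Regression)
	r.SetObserved("house price")
	r.SetVar(0, "square metres")

	// Train on a few illustrative data points: (observed value, predictors...).
	r.Train(
		regression.DataPoint(120000, []float64{50}),
		regression.DataPoint(150000, []float64{65}),
		regression.DataPoint(200000, []float64{90}),
		regression.DataPoint(220000, []float64{100}),
	)
	r.Run()

	// Inspect the fitted formula and predict for a new input.
	fmt.Printf("Formula: %v\n", r.Formula)
	prediction, err := r.Predict([]float64{80})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("Predicted price for 80 sqm: %.0f\n", prediction)
}
```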