
Jobs

Showing 8 jobs

Full-time

Research Engineers

Tensorplex Labs

Tensorplex Labs is hiring Research Engineers focused on:

  1. Scalable synthetic data generation
  2. Open, attack-resistant human feedback collection
  3. Seamless human-AI task delegation & hallucination detection

Join our research team to push the boundaries of incentivised, decentralised RLHF platforms. From defending against reward hackers to generating the most diverse synthetic datasets, come own the hardest problems and deliver the highest-impact outcomes!

If you're keen to explore any of these topics, send over your background & publications via X DM.

Competitive
Remote
Posted 75 days ago
Full-time
Distributed Training

Senior AI Researcher, DLLMT

τemplar

Senior AI Researcher, Distributed Large Language Model Training

About Templar

Templar is a pioneering community of researchers and engineers pushing the boundaries of permissionless pretraining. We are advancing the frontier of distributed AI development by enabling collaborative training across diverse computational resources without traditional centralized constraints. Our work is detailed in our recent research: Incentivizing Permissionless Distributed Learning of LLMs.

Overview

We're seeking an exceptional AI researcher to advance the frontier of distributed training for Large Language Models. You'll lead cutting-edge research efforts exploring novel approaches to distributed LLM pretraining, with a focus on communication optimization, incentivization mechanisms, and trust frameworks in permissionless environments. Your work will directly contribute to scaling our proven approach and developing the next generation of collaboratively trained language models.

Responsibilities

Distributed LLM Training Research

  • Drive foundational research in decentralized optimization algorithms specifically for large language model pretraining
  • Develop innovative gradient aggregation approaches that maintain training stability across heterogeneous nodes during LLM training (a toy sketch follows this list)
  • Research efficient distributed learning mechanisms that minimize communication overhead for transformer architectures
  • Design novel trust and quality assessment frameworks for distributed LLM training contributions
  • Investigate scaling laws and convergence properties specific to distributed language model training
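
For flavour, here is a minimal, hypothetical sketch of the kind of trust-weighted aggregation rule this research area covers; the trust scores and weighting scheme are illustrative assumptions, not Templar's actual method:

```python
import torch

def trust_weighted_aggregate(grads: list[torch.Tensor],
                             trust: list[float]) -> torch.Tensor:
    """Average flattened gradients from several nodes, weighting each
    contribution by a trust score. Hypothetical illustration only."""
    weights = torch.tensor(trust, dtype=grads[0].dtype)
    weights = weights / weights.sum()        # normalize to sum to 1
    stacked = torch.stack(grads)             # (num_nodes, num_params)
    return (weights.unsqueeze(1) * stacked).sum(dim=0)

# Example: three nodes; the third has a low trust score, so its
# (possibly adversarial) gradient contributes very little.
grads = [torch.randn(10) for _ in range(3)]
aggregated = trust_weighted_aggregate(grads, trust=[1.0, 0.9, 0.1])
```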

System Architecture & Optimization for LLMs

  • Conceptualize scalable architectures for decentralized LLM training across the full pretraining lifecycle
  • Research compression and sparsification techniques optimized for transformer gradient patterns and network bandwidth requirements (see the sketch after this list)
  • Develop synchronization mechanisms resilient to node failures during long-running LLM training processes
  • Create adaptive optimization techniques suitable for varied computational environments in language model training
  • Design efficient data loading and preprocessing pipelines for distributed LLM training
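
As context for the compression bullet above, a minimal top-k gradient sparsification sketch in PyTorch; this is a common baseline assumed for illustration, not necessarily the technique Templar uses:

```python
import math
import torch

def topk_sparsify(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude fraction `ratio` of entries and
    return (indices, values); the dense tensor is never transmitted."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]

def densify(indices: torch.Tensor, values: torch.Tensor,
            shape: torch.Size) -> torch.Tensor:
    """Rebuild a dense gradient from the sparse representation."""
    flat = torch.zeros(math.prod(shape), dtype=values.dtype)
    flat[indices] = values
    return flat.view(shape)

g = torch.randn(1024, 1024)
idx, vals = topk_sparsify(g, ratio=0.01)   # ~100x fewer values to send
g_hat = densify(idx, vals, g.shape)
```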

Research Leadership

  • Lead a multidisciplinary team investigating distributed LLM training systems
  • Establish research roadmaps aligning theoretical advances in distributed optimization with practical LLM training implementations
  • Collaborate with academic and industry partners to advance distributed language model training
  • Publish findings in top-tier conferences and journals focused on machine learning and distributed systems
  • Mentor junior researchers and contribute to the broader research community

Required Qualifications

  • PhD/MSc in Computer Science, Machine Learning, or related field
  • Strong research background in distributed systems, machine learning optimization, or large-scale model training
  • Deep understanding of transformer architectures and large language model training dynamics
  • Expertise in modern deep learning frameworks (PyTorch, JAX) and distributed computing systems
  • Demonstrated ability to translate theoretical research into practical implementations for large-scale systems
  • Excellent written and verbal communication skills with experience presenting research findings

Preferred Experience

  • Research publications in distributed learning, federated optimization, or large language model training
  • Hands-on experience with LLM pretraining, including data preparation, training optimization, and evaluation
  • Background in information theory, compression, or signal processing applied to neural networks
  • Contributions to open-source machine learning projects, particularly those related to distributed training or LLMs
  • Experience with permissionless or incentivized distributed computing systems

Key Research Areas

  • Asynchronous and semi-synchronous distributed optimization for transformer training
  • Gradient compression and efficient communication protocols tailored to LLM parameter patterns
  • Trust mechanisms and contribution quality assessment in permissionless LLM training
  • Specialized techniques for distributed domain adaptation of language models
  • Scalable architectures for heterogeneous computing environments in LLM training
  • Incentivization mechanisms for sustainable distributed LLM pretraining

Potential Short-Term Objectives (0-6 Months)

  • Conduct comprehensive analysis of current distributed LLM training approaches, particularly in permissionless settings
  • Benchmark and evaluate existing distributed training methods on language model tasks
  • Identify key research directions for improving communication efficiency, robustness, and verification in distributed LLM training
  • Develop experimental frameworks for rigorously comparing distributed optimization strategies for transformer models
  • Design and prototype trust-based contribution evaluation mechanisms for LLM training participants
  • Collaborate with engineering teams to address practical challenges in scaling our training approach

Potential Long-Term Objectives (6+ Months)

  • Pioneer novel gradient compression and aggregation techniques optimized for transformer architectures
  • Develop scalable architectures supporting LLM training across heterogeneous node capabilities and trust levels
  • Create frameworks for domain-specific LLM adaptation in decentralized environments
  • Establish theoretical foundations for trust-weighted learning with unverified participants in LLM training
  • Lead research initiatives resulting in breakthrough publications and open-source contributions to the distributed LLM training community
  • Drive the development of Templar's next-generation distributed language models

Impact

Your research will directly advance Templar's mission to democratize AI development through permissionless, distributed LLM training. You'll help create more efficient, accessible, and scalable distributed training systems that enable collaborative development of state-of-the-art language models. Your work will contribute to a future where cutting-edge AI capabilities can be developed through community collaboration rather than concentrated computational resources, fundamentally changing how large language models are created and improved.

How to Apply

Send your application materials to: [email protected]

Competitive
Remote
Posted 60 days ago
Full-time
Distributed Training

Senior ML Engineer, DLLMT

τemplar

Senior ML Engineer, DLLM Training Infrastructure

Overview

We're seeking an exceptional ML Engineer to build and optimize the infrastructure for distributed Large Language Model training at scale. You'll architect, implement, and maintain the core systems that power our permissionless training approach, working with cutting-edge distributed training frameworks and building novel solutions for decentralized environments. Your engineering expertise will be critical to scaling our proven approach and enabling the next generation of collaboratively trained language models.

Responsibilities

Distributed Training Infrastructure

  • Design and implement scalable distributed training systems using frameworks like TorchTitan, Megatron-LM, DeepSpeed, and FairScale
  • Build and optimize data parallelism, model parallelism, and pipeline parallelism strategies for large-scale LLM training
  • Implement efficient gradient synchronization and all-reduce operations across heterogeneous clusters
  • Develop custom CUDA kernels and optimize memory management for large model training
  • Build fault-tolerant training systems with automatic checkpoint recovery and node failure handling (a minimal checkpointing sketch follows this list)
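
As one small, assumed illustration of the checkpoint-recovery bullet above (the path and checkpoint structure are placeholders, not Templar's implementation):

```python
import os
import torch

CKPT_PATH = "checkpoints/latest.pt"   # illustrative location

def save_checkpoint(model, optimizer, step: int) -> None:
    """Write to a temp file, then atomically rename, so a node crash
    mid-write never corrupts the most recent usable checkpoint."""
    os.makedirs(os.path.dirname(CKPT_PATH), exist_ok=True)
    tmp = CKPT_PATH + ".tmp"
    torch.save({"model": model.state_dict(),
                "optim": optimizer.state_dict(),
                "step": step}, tmp)
    os.replace(tmp, CKPT_PATH)

def load_checkpoint(model, optimizer) -> int:
    """Resume from the latest checkpoint if present; return the step."""
    if not os.path.exists(CKPT_PATH):
        return 0
    state = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optim"])
    return state["step"]
```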

Framework Development & Optimization

  • Extend existing distributed training frameworks to support permissionless, multi-party training scenarios
  • Implement advanced optimization techniques including gradient compression, quantization, and sparsification
  • Build efficient communication backends optimized for high-latency, unreliable network conditions
  • Develop custom schedulers and resource managers for heterogeneous computational environments
  • Create monitoring and profiling tools for distributed training performance analysis

System Architecture & Scaling

  • Architect end-to-end training pipelines from data ingestion to model deployment
  • Implement efficient data loading and preprocessing systems for multi-TB datasets
  • Build containerized and orchestrated training environments using Kubernetes, Docker, and cloud platforms
  • Design and implement sharding strategies for massive parameter spaces (100B+ parameters)
  • Develop automated testing and continuous integration pipelines for distributed training systems

Performance Engineering

  • Profile and optimize training throughput, memory usage, and communication efficiency
  • Implement mixed-precision training, gradient accumulation, and other memory optimization techniques (a short sketch follows this list)
  • Build custom operators and fused kernels for transformer architectures
  • Optimize inter-node and intra-node communication patterns for various network topologies
  • Develop benchmarking suites and performance regression testing frameworks
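
A minimal sketch of the mixed-precision and gradient-accumulation techniques named above, using stock PyTorch AMP; the model, data, and hyperparameters are placeholders:

```python
import torch

model = torch.nn.Linear(512, 512).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()
ACCUM = 8   # micro-batches per optimizer step (placeholder)

# Toy data standing in for a real loader.
loader = [(torch.randn(4, 512), torch.randn(4, 512)) for _ in range(32)]

for step, (x, y) in enumerate(loader):
    with torch.cuda.amp.autocast():          # half-precision forward
        loss = torch.nn.functional.mse_loss(model(x.cuda()), y.cuda())
    scaler.scale(loss / ACCUM).backward()    # scaled to avoid underflow
    if (step + 1) % ACCUM == 0:
        scaler.step(opt)                     # unscales grads, then steps
        scaler.update()
        opt.zero_grad(set_to_none=True)
```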

Required Qualifications

  • Bachelor's/Master's degree in Computer Science, Engineering, or related technical field
  • 5+ years of experience in large-scale distributed systems and high-performance computing
  • Deep hands-on experience with distributed training frameworks: TorchTitan, Megatron-LM, DeepSpeed, FairScale, or similar
  • Expert-level PyTorch knowledge with experience in distributed training (DDP, FSDP, RPC)
  • Strong systems programming skills in Python and C++/CUDA
  • Experience with containerization, orchestration, and cloud computing platforms (AWS, GCP, Azure)
  • Proven track record of optimizing large-scale ML training workloads

Preferred Experience

  • Experience training models with 10B+ parameters using model parallelism techniques
  • Background in CUDA programming and GPU optimization for deep learning workloads
  • Experience with networking protocols and communication libraries (NCCL, Gloo, MPI)
  • Knowledge of transformer architectures and attention mechanism implementations
  • Experience with distributed storage systems and high-throughput data pipelines
  • Familiarity with blockchain technologies, P2P networks, or decentralized systems
  • Contributions to open-source distributed training frameworks or ML infrastructure projects

Technical Stack & Tools

Core Frameworks

PyTorch Distributed: DDP, FSDP, RPC, torch.distributed (a minimal DDP example follows below)

Training Frameworks: TorchTitan, Megatron-LM, DeepSpeed, FairScale, Accelerate

Communication: NCCL, Gloo, UCX, InfiniBand

Compute: CUDA, Triton, cuDNN, multi-GPU and multi-node setups
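
For orientation, the canonical minimal DDP pattern from stock PyTorch; the toy model and hyperparameters are placeholders, not Templar's training code:

```python
# Launch with: torchrun --nproc_per_node=2 ddp_minimal.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)
    model = DDP(torch.nn.Linear(128, 128).cuda())
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(10):
        x = torch.randn(32, 128).cuda()
        loss = model(x).square().mean()
        loss.backward()                          # grads all-reduced here
        opt.step()
        opt.zero_grad()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```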

Infrastructure & DevOps

Orchestration: Kubernetes, Docker, Slurm, Ray

Cloud Platforms: AWS (EKS, EC2, S3), GCP (GKE, Compute Engine), Azure

Monitoring: Weights & Biases, TensorBoard, Prometheus, Grafana

Storage: Distributed file systems, object storage, high-performance I/O

Development Tools

  • Languages: Python, C++, CUDA, Bash
  • Version Control: Git, distributed development workflows
  • CI/CD: GitHub Actions, GitLab CI, automated testing frameworks
Key Engineering Challenges

  • Building fault-tolerant distributed training systems for unreliable, heterogeneous nodes
  • Implementing efficient gradient aggregation and synchronization across high-latency networks
  • Optimizing memory usage and computation for massive transformer models
  • Developing trust and verification mechanisms for distributed training contributions
  • Creating seamless integration between permissionless participants and training infrastructure
  • Scaling training systems to handle thousands of distributed participants

Immediate Engineering Objectives (0-6 Months)

  • Analyze and benchmark current distributed training frameworks for permissionless scenarios
  • Build proof-of-concept distributed training system extending our current architecture
  • Implement core infrastructure components: gradient compression, fault tolerance, and node management
  • Develop testing frameworks and performance benchmarking suites
  • Create monitoring and observability tools for distributed training operations
  • Collaborate with research teams to implement novel algorithms in production systems

Long-Term Engineering Goals (6+ Months)

  • Lead development of Templar's next-generation distributed training platform
  • Build production-ready systems supporting 1000+ distributed training participants
  • Implement advanced optimization techniques and custom CUDA kernels for efficiency gains
  • Create comprehensive SDK and APIs for permissionless training participation
  • Establish industry-leading performance benchmarks for distributed LLM training
  • Open-source key infrastructure components to benefit the broader ML community

Impact

Your engineering expertise will be fundamental to scaling Templar's revolutionary approach to permissionless AI training. You'll create the robust, scalable infrastructure that enables collaborative training of state-of-the-art language models across distributed, untrusted environments. Your work will directly enable the democratization of AI development by making large-scale model training accessible to researchers and organizations worldwide, regardless of their individual computational resources.

How to Apply

Send your application materials to: [email protected]

Competitive
Remote
Posted 60 days ago
Full-time
Pretraining
Inference
MANIFOLD

Extremely Talented Inference Engineer

MANIFOLD

We are a small team focused on engineering excellence.

With @Targon (@TargonCompute on X), we have aggregated over 1,000 H200s and are now optimizing model inference for enterprise-grade latency and throughput.

As we begin serving the enterprise, you will play a key role in building reliable, high-performance production systems.

You are the right fit if you:

  • Care deeply about democratizing access to AI
  • Thrive in fast-paced frontier environments
  • Have experience with any of (a minimal example follows this list):
      • vLLM
      • SGLang
      • TensorRT
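
For a sense of the first framework listed, a minimal vLLM offline-inference sketch; the model name is a placeholder, and production serving would more likely use vLLM's OpenAI-compatible server:

```python
from vllm import LLM, SamplingParams

# Model name is illustrative; any supported HF checkpoint works.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
```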

Please email [email protected] or DM him on X @jameswoodmanv if you are interested or know anyone who might be.

Competitive
Austin, TX
Posted 41 days ago
Full-time
Asset Management
Tensora Group

Junior Data Engineer (UK)

Tensora Group

What you'll do (responsibilities)

This is a full-time remote role for a Junior Data Engineer at Tensora within our innovation team. The focus of the innovation team is to push the boundaries of what's possible, allowing us to continually outcompete ourselves and others within the Bittensor ecosystem. To achieve this we need to ensure we are staying at the forefront of technological advancements, keeping up with and implementing the newest models and hardware.

This role will center on a project aimed at collecting, processing, and analysing large volumes of blockchain data across multiple networks. The Junior Data Engineer will help develop and maintain data pipelines that pull on-chain data from blockchains such as Bittensor, Ethereum, and others. They will also assist with designing systems for efficient storage and querying of blockchain data, including learning and applying graph database technologies where needed. Although prior experience with graph databases isn’t required, a willingness to learn and adapt will be key. This work will contribute to the development of a tool designed to make blockchain data more accessible for vulnerability identification and forensic chain analysis following security events such as hacks.
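
To make the data pull concrete, a minimal sketch of querying a standard Ethereum JSON-RPC endpoint; the URL is a placeholder, and Bittensor's Substrate-based chain would instead use a Substrate client such as py-substrate-interface:

```python
import requests

RPC_URL = "https://eth.example.com"   # placeholder RPC endpoint

def rpc_call(method: str, params: list):
    """Issue a single JSON-RPC 2.0 request and return its result."""
    resp = requests.post(
        RPC_URL,
        json={"jsonrpc": "2.0", "id": 1, "method": method, "params": params},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["result"]

latest = int(rpc_call("eth_blockNumber", []), 16)        # hex -> int
block = rpc_call("eth_getBlockByNumber", [hex(latest), True])  # True = full txs
print(latest, "txs:", len(block["transactions"]))
```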

If you're passionate about innovation and thrive in a dynamic environment, we invite you to join us in our goal of shaping the future of AI. We plan to hold quarterly in-person hackathon days followed by dinner in Manchester. Therefore, some travel will be expected approximately four times a year.

What we're looking for (qualifications)

- Strong programming skills in Python, with a focus on data processing or backend development.
- Solid understanding of data structures, databases, and basic data engineering concepts.
- Exposure to working with APIs and/or blockchain RPC endpoints to pull data.
- A strong willingness to learn new technologies, including graph databases (e.g., Neo4j) and blockchain frameworks like Substrate.
- Ability to work independently and collaboratively in a fast-paced, remote environment.
- Good written and verbal communication skills.
- A problem-solving mindset, with attention to detail and a drive to understand systems deeply.
- Experience or exposure to working with blockchain technologies (Ethereum, Polkadot, Bittensor, etc.) is a plus.
- Familiarity with graph databases (e.g., Neo4j, ArangoDB, or similar) is a plus.

What we offer (compensation & benefits)

We'll treat you well. If there are any other benefits that are important to you, we'd like to hear about them.

- Competitive salary and incentive scheme — multiple options based on your desire for ownership.
- Paid time off — 4 weeks paid vacation, paid sick leave, and paid parental leave.
- Hardware setup — new MacBook Air, big display, and accessories.

How to Apply

Interested in learning more about what it's like to build Tensora with us?
We'd love to talk!
Contact us directly at [email protected]

- Include this role's title in your subject line.
- Send along links that best showcase the relevant things you've built and done.
- Tell us briefly why you're interested in joining Tensora.

Competitive
United Kingdom
Posted 67 days ago
Full-time

Analysts/Collaborators

AI Factory

🚀 We're hiring collaborators and business analysts at AI Factory (subnet 80)!

Hello everyone! We're excited to announce that AI Factory is looking for new collaborators to join our growing team.

If you're passionate about AI and want to be part of an engaging and innovative community, this could be a great fit for you!

What We're Looking For:

  • Strong understanding of AI concepts, especially trending topics in the field
  • Basic presentation skills: you should be able to create slides and deliver engaging talks
  • Comfort with public presentations (your identity can remain private, and showing your face is not required)

How to Apply

If you're interested, please send a brief description of yourself and your background via the AI Factory Discord server or by DM.

Competitive
Remote
Posted 68 days ago
Full-time
Yuma

Bittensor Trading Engineer

Yuma

As a Bittensor Trading Engineer, you will be a critical member of the Yuma investment team and at the forefront of trading and portfolio execution within the Bittensor ecosystem. You will actively manage and optimize on-chain trading strategies across tokens. This role uniquely combines quantitative investment expertise with hands-on coding skills, allowing you to build, deploy, and oversee algorithmic trading models that maximize returns within the Dynamic TAO (DTAO) framework.

You will work closely with the Yuma Investment Lead to translate investment strategy into code, ensuring effective trade execution, liquidity optimization, and risk management. You will leverage smart contract interactions, trading bots, and automated portfolio rebalancing techniques to maintain a high-performing, data-driven investment approach.

KEY RESPONSIBILITIES :

Algorithmic Trading & Execution

  • Write, manage, and oversee the execution of dynamic TAO trading strategies across a portfolio of tokens.
  • Develop, deploy, and optimize algorithmic trading scripts to automate trading decisions and portfolio rebalancing (a toy rebalancing sketch follows this section).
  • Implement and maintain smart contract integrations for automated trade execution, staking, and liquidity provisioning.
  • Monitor on-chain order flow, token velocity, and market signals to adjust trading strategies in real time.
  • Ensure efficient execution of staking, validator participation, and yield-maximizing strategies.

Portfolio Construction & Investment Strategy

  • Work closely with the Yuma Investment Lead to execute on-chain portfolio construction decisions.
  • Analyze liquidity, slippage, and order book depth to optimize trade execution across multiple subnet tokens.
  • Utilize quantitative and statistical models to assess risk-adjusted returns and improve trade execution strategies.
  • Implement risk management frameworks, including dynamic hedging strategies and portfolio optimization techniques.
  • Develop custom dashboards and trading analytics tools to monitor portfolio performance, P&L, and market impact.

Research & Market Intelligence

  • Conduct deep technical and economic research on Bittensor, TAO, and subnet token markets.
  • Assess tokenomics models, staking incentives, and liquidity incentives for investment opportunities.
  • Track on-chain metrics and market data using Dune Analytics, Nansen, or custom-built data pipelines.
  • Identify emerging trends in decentralized AI, validator economies, and subnet activity that influence investment decisions.
  • Stay ahead of regulatory developments, smart contract risks, and security vulnerabilities in automated trading systems.

Technical & Risk Management

  • Ensure trade execution reliability and efficiency, maintaining uptime and accuracy in automated systems.
  • Implement automated risk controls, including circuit breakers, position limits, and execution safeguards.
  • Work with cross-functional teams to optimize system architecture, API integrations, and decentralized execution pipelines.
  • Stay ahead of Bittensor subnet developments, ensuring trading strategies remain adaptive and competitive.
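
As a toy illustration of the rebalancing logic referenced above; the assets, prices, and target weights are invented, and real execution would route orders on-chain:

```python
def rebalance_orders(holdings: dict[str, float],
                     prices: dict[str, float],
                     targets: dict[str, float]) -> dict[str, float]:
    """Return the token quantity to buy (+) or sell (-) per asset so the
    portfolio reaches its target weights. Ignores fees and slippage."""
    nav = sum(holdings[a] * prices[a] for a in holdings)   # portfolio value
    return {a: (targets[a] * nav - holdings[a] * prices[a]) / prices[a]
            for a in holdings}

orders = rebalance_orders(
    holdings={"TAO": 100.0, "ALPHA1": 5_000.0},   # invented positions
    prices={"TAO": 350.0, "ALPHA1": 0.8},          # invented prices
    targets={"TAO": 0.7, "ALPHA1": 0.3},           # weights sum to 1
)
print(orders)   # e.g. sell ~22 TAO, buy ~9,625 ALPHA1
```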

QUALIFICATIONS:

  • Proficiency in coding languages: Python, Rust, Solidity, SQL, or other relevant programming languages for on-chain trading execution.
  • Experience in developing trading algorithms, automated execution strategies, or DeFi smart contract interactions.
  • Strong understanding of blockchain protocols, validator dynamics, and decentralized exchanges (DEXs).
  • Hands-on experience with smart contract interactions, oracles, and automated trading bots.
  • Technical fluency in smart contract interactions, blockchain data, and automated trading mechanisms (Python, SQL, Solidity preferred).
  • Deep understanding of tokenomics, staking incentives, validator economics, and liquidity provisioning.
  • Experience in on-chain financial modeling, risk assessment, and portfolio optimization.
  • Strong ability to translate investment theses into executable, algorithmic trading strategies.

Preferred Qualifications

  • 3+ years of experience in digital asset trading, quantitative finance, hedge funds, or crypto investment strategies.
  • Familiarity with staking-as-a-service, perpetual futures, structured DeFi products, and liquidity mining strategies.
  • Experience with Bittensor, TAO, and subnet token staking mechanisms.
  • Understanding of DeFi, perpetual futures, options trading, and structured crypto products.
  • Prior experience developing algorithmic trading models for digital asset hedge funds or quant desks.

WHAT WE OFFER:

  • An opportunity to thrive in a dynamic, cutting-edge environment at a rapidly scaling company led by experienced industry leaders.
  • An innovative learning environment where you can immerse yourself in the latest technologies, contribute to building a transformative new industry, and make a meaningful impact.
  • Competitive base salary, bonus, and incentive compensation.

  • Unlimited PTO / Flexible time off - work with your leader to take time off when you need it
  • Professional development budget with flexibility for personal and professional growth
  • Outstanding health insurance for employee, partner and dependents
  • Life insurance, short-term & long-term disability coverage
  • 401K plan with company contribution
  • Flexible spending programs for medical and dependent care
  • Paid parental leave
Competitive
Stamford, CT
Posted 96 days ago
Full-time
AI Data Pipeline
Yuma

Yuma Investment Lead

Yuma

Join Us in Shaping the Future of Decentralized Intelligence.

DESCRIPTION:

As the Investment Lead, you will lead the end-to-end investment strategy for Bittensor’s ecosystem, TAO, and subnet alpha tokens, overseeing capital deployment, risk management, and investment execution. You will represent Yuma externally, engaging with key stakeholders, investor and partners to advance Yuma’s investment objectives and strengthen industry relationships.

This role requires deep investment expertise—combined with a technical understanding of blockchain markets, decentralized finance (DeFi), and decentralized AI (deAI). You will oversee investment selection, sizing, and structuring, ensuring an optimal balance between liquidity, risk, and upside potential.

KEY RESPONSIBILITIES:

Investment Strategy & Execution

  • Own the full investment lifecycle—from sourcing and due diligence to execution and ongoing portfolio management.
  • Develop and execute an investment framework for TAO and subnet alpha tokens, optimizing capital allocation across liquid and illiquid markets.
  • Determine investment sizing for each opportunity, evaluating potential returns, risk exposure, and liquidity constraints.
  • Structure investments with a focus on staking incentives, validator dynamics, and long-term token value accrual.
  • Assess the impact of Dynamic TAO (DTAO) on market opportunities, portfolio construction, and investment risk.
  • Oversee automated trading strategies, ensuring efficient execution of staking, liquidity provisioning, and asset rebalancing within the Bittensor network.

Leadership & Execution of Investment Strategies

  • Guide quantitative and algorithmic trading execution, leveraging smart contract interactions, automated trading systems, and investment models.
  • Ensure effective use of on-chain data analytics, helping to refine trading signals and market intelligence.
  • Provide strategic direction on TAO and subnet token trade execution, position sizing, and real-time market adjustments.

External Stakeholder Engagement & Representation

  • Represent Yuma in investment-related discussions, fostering relationships with key industry stakeholders, including investors, partners, and ecosystem participants.
  • Participate in industry events, panels, and forums to position Yuma as a leader in decentralized AI and investment strategy.
  • Develop and maintain relationships with key influencers, thought leaders, and potential collaborators to enhance Yuma's investment reach.

Research & Market Intelligence

  • Conduct in-depth technical and financial analysis of TAO, subnet tokens, and broader crypto ecosystems (DeFi, L1s, L2s, deAI).
  • Develop valuation models for token investments, integrating on-chain data, staking mechanisms, and liquidity constraints.
  • Monitor and analyze the performance of subnet alpha tokens, validator rewards, and staking dynamics to optimize returns.
  • Maintain and oversee analytics dashboards to track key on-chain metrics, investment KPIs, and market trends.
  • Stay ahead of crypto regulatory changes, evaluating potential impact on Bittensor's investment strategy.

Portfolio & Risk Management

  • Oversee and manage a portfolio of TAO and subnet token investments, ensuring risk-adjusted returns.
  • Identify and mitigate market, liquidity, and counterparty risks, particularly in decentralized trading environments.
  • Oversee portfolio rebalancing and dynamic liquidity management, adjusting exposure based on market conditions.
  • Create investment reports and dashboards, providing regular updates to executive leadership on portfolio performance and strategic adjustments.

QUALIFICATIONS:

  • Bachelor's or Master's degree in Finance, Economics, Engineering, Computer Science, or related fields.
  • 3+ years of experience in venture capital, private equity, hedge funds, or digital asset investment roles.
  • Deep expertise in blockchain investments, including DeFi, L1/L2 ecosystems, decentralized AI, and staking economies.
  • Strong understanding of Bittensor, TAO, and subnet tokenomics, with experience in validator dynamics and protocol incentives.
  • Experience overseeing algorithmic trading or automated execution strategies in digital assets.
  • Proficiency in on-chain analytics tools such as Dune Analytics, Nansen, Flipside Crypto, or similar platforms.
  • Technical fluency in smart contract interactions, blockchain data, and automated trading mechanisms (Python, SQL, Solidity preferred).
  • Proven leadership experience managing investment professionals, ideally in a crypto-native or hedge fund setting.

Preferred Qualifications

  • Experience leading quant trading teams, algo traders, or market-making desks within digital assets.
  • Familiarity with staking-as-a-service, perpetual futures, structured DeFi products, and liquidity mining strategies.
  • Prior experience working with decentralized asset management frameworks.

WHAT WE OFFER:

  • An opportunity to thrive in a dynamic, cutting-edge environment at a rapidly scaling company led by experienced industry leaders.
  • An innovative learning environment where you can immerse yourself in the latest technologies, contribute to building a transformative new industry, and make a meaningful impact.
  • Competitive base salary, bonus, and incentive compensation.

  • Unlimited PTO / Flexible time off - work with your leader to take time off when you need it
  • Professional development budget with flexibility for personal and professional growth
  • Outstanding health insurance for employee, partner and dependents
  • Life insurance, short-term & long-term disability coverage
  • 401K plan with company contribution
  • Flexible spending programs for medical and dependent care
  • Paid parental leave
Competitive
Stamford, CT
Posted 102 days ago

Popular Job Categories

Distributed Training (2 jobs)
Pretraining (1 job)
Inference (1 job)
AI Data Pipeline (1 job)
Asset Management (1 job)

Featured Companies

Yuma
MANIFOLD
Tensorplex Labs
Tensora Group