Machine Learning

machinelearning@lemmy.ml

Shamar@feddit.it • 11 days ago

A community statement supporting the Open Source Definition (OSD)

6

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 2 months ago

How ‘Embeddings’ Encode What Words Mean

7

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 2 months ago

New AI model “learns” how to simulate Super Mario Bros. from video footage

3

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 2 months ago

Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o)

8

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 2 months ago

It’s Not Intelligent If It Always Halts: A Critical Perspective on Current Approaches to AGI

8

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 2 months ago

The Difference Between Speaking and Thinking

4

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 2 months ago

Diffusion Models Are Real-Time Game Engines

4

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 3 months ago

Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduce memory usage by 60%.

2

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 3 months ago

Transformer Explainer

2

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 3 months ago

Alibaba claims no. 1 spot in AI math models with Qwen2-Math

9

yboutros@infosec.pub • 3 months ago

How to convert a positionally encoded predicted embedding from a decoder to its matching token?

2

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 3 months ago

New Open-Source AI Image Generator Beats Midjourney, SD3 and Auraflow

5

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 4 months ago

AI models collapse when trained on recursively generated data

15

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 4 months ago

RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing

4

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 4 months ago

Alibaba's Qwen LLM model leading open source rankings

4

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 5 months ago

By using the same techniques Google used to solve Go (MCTS and backprop), Llama8B gets 96.7% on math benchmark GSM8K. That’s better than GPT-4, Claude and Gemini, with 200x fewer parameters!

7

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 5 months ago

Mixture of Agents (MoA) leverages several open-source LLM agents to achieve a score of 65.1% on AlpacaEval 2.0

4

ylai@lemmy.ml • 5 months ago

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate

1

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 6 months ago

Sakuga-42M Dataset: Scaling Up Cartoon Research

3

☆ Yσɠƚԋσʂ ☆@lemmy.ml • 6 months ago

How AI 'Understands' Images (CLIP)

4
