
Ollama Gets a New Challenger in Foundry Local for Running Local LLMs


As the AI revolution and its innovations continue to move at breakneck speed, one theme keeps recurring across the board: open source, and bringing models closer to users, engineers, and smaller devices. The goal is to give engineers the power to run the full life cycle of a model locally, with little or no reliance on the cloud and the large AI hosting bills that come with testing and building solutions.

In a previous article, we introduced Ollama, a tool that allows you to run multiple open-source LLMs on your local machine. In this article, we will talk about a new kid on the block that is trying to take market share from Ollama: it also lets you run models locally, has a somewhat lighter touch, and is just as easy to use as Ollama.

Introducing Foundry Local

Foundry Local is an open-source LLM orchestration tool that allows you to download, run, and work with LLMs entirely on your local machine. It is currently maintained by Microsoft and, as of the time of writing, it does not support all open-source LLMs but rather a handful of popular models such as Phi, Qwen, and Mistral. You can access the GitHub repository of Foundry Local.


Foundry Local architecture (source: https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/concepts/foundry-local-architecture)

Benefits of Using Foundry Local

On-Device Inference

On-device inference means that your AI models run entirely on your hardware—whether that’s a laptop, a desktop, or an edge device—without any need to send requests to remote cloud servers. By eliminating network round-trips, you get dramatically lower latency and a more responsive experience, while also removing dependencies on external services and safeguarding your data against potential exposure in transit.

Data Privacy & Security

By performing all inference operations entirely on-device, Foundry Local ensures that sensitive information never leaves your hardware, greatly reducing the risk of data exposure over networks.

This local-only execution model helps you adhere to stringent privacy regulations and compliance standards, such as GDPR, HIPAA, or industry-specific guidelines, by keeping all data processing within the secure boundaries of your infrastructure.

Cost Efficiency

By leveraging your existing hardware—whether it’s CPUs, GPUs, or specialized NPUs—Foundry Local lets you perform inference without incurring ongoing cloud usage charges. This means you can scale AI workloads on-premises at a fraction of the cost of pay-as-you-go cloud services, turning capital investments in local infrastructure into long-term savings. As your usage grows, you avoid the steep operational expenses associated with high-volume API calls, making large-scale AI deployments economically sustainable.

Low Latency

By executing models locally on your hardware, Foundry Local removes the need for data to traverse the internet and eliminates the inherent delays of cloud-based inference. This direct, in-place processing slashes response times, enabling real-time or highly interactive applications—like voice assistants, AR/VR experiences, or live data analytics—to perform smoothly and predictably without network-induced lag.

Offline Operation

Foundry Local lets you keep AI capabilities up and running even when you’re off the grid or inside highly regulated environments with restricted connectivity. Since the models reside and execute entirely on-device, you don’t need an active internet connection to perform inference, making it perfect for field deployments, industrial edge systems, or secure facilities where network access is limited or prohibited. This offline operation ensures uninterrupted functionality and consistent performance regardless of external network availability.

Flexible Hardware Support

Built on the versatile ONNX Runtime with a unified hardware abstraction layer, Foundry Local intelligently detects and leverages the compute resources available on your device—whether that’s multi-core CPUs, discrete GPUs from NVIDIA, AMD, or Intel, or specialized NPUs like Qualcomm’s Hexagon or Apple’s Neural Engine.

This flexibility means you don’t have to worry about manual tuning or separate binaries for different architectures; Foundry Local optimizes model execution under the hood to deliver the best possible performance across a wide range of hardware configurations.

Scalable Deployment

With Foundry Local, you can develop and test on a single laptop or desktop and then effortlessly scale up to high-density, server-grade clusters without touching a line of integration code.

Whether you’re deploying in a small office, across multiple edge sites, or within a Kubernetes-managed datacenter, Foundry Local’s consistent API and deployment model let you expand capacity simply by adding more nodes—so your applications can grow with demand, all while maintaining the same seamless developer experience.

Model Customization & Management

Foundry Local empowers you to tailor your AI stack by choosing from a library of optimized, preset models or compiling and fine-tuning your own through the Olive framework. Once you’ve selected a model, you can effortlessly manage its lifecycle—caching, loading, updating, or retiring versions—using either the intuitive command-line interface or the OpenAI-compatible REST API, ensuring that your applications always run the exact model you need, when you need it.
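As a quick illustration, the CLI lets you browse the preset catalog and check the local service before committing to a model. The commands below reflect the Foundry Local CLI as documented at the time of writing; if they differ on your install, running foundry --help lists the current set.

# Browse the catalog of preset models Foundry Local can run on this machine
foundry model list

# Check whether the local inference service is running and where it is listening
foundry service status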

Seamless Integration

You can get started with Foundry Local immediately on any compatible device—no cloud account required. Its installer and runtime operate entirely standalone, so you can prototype, develop, and test AI-powered features on your local machine without incurring any upfront or ongoing cloud costs, giving you full control over your development environment. Integrate via an intuitive CLI, REST endpoints (OpenAI-compatible), SDKs in popular languages, or the AI Toolkit for Visual Studio Code for an end-to-end developer experience.
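To make the OpenAI-compatible surface concrete, here is a minimal sketch of a chat completion request against the local endpoint. The port and model name below are assumptions for illustration: the actual endpoint is assigned by the local service (foundry service status will report it), and the model must already be running, for example via foundry model run phi-3.5-mini.

# Replace 5273 with the port reported by "foundry service status"
curl http://localhost:5273/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-3.5-mini",
    "messages": [
      {"role": "user", "content": "Explain on-device inference in one sentence."}
    ]
  }'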

Foundry Local vs Ollama

Let us do a fair comparison between Foundry Local and Ollama and see how they stack up against each other.

(Image: comparison table of Foundry Local and Ollama.)

How to Use Foundry Local

Next is a practical guide to installing and using Foundry Local on your machine. At the moment, official installers are available only for Windows and macOS. Let us get our hands dirty!

Installing Foundry Local

To install on Windows, run this in a Windows Terminal window:

winget install Microsoft.FoundryLocal

To install on Mac, run this in a Mac Terminal window:

brew tap microsoft/foundrylocal
brew install foundrylocal

Download a Model

You can download a model from the list of models supported by Foundry Local with the following command:

foundry model download phi-3.5-mini

Run Model 

foundry model run phi-3.5-mini

You can also skip the download step and run the model directly; it will be downloaded automatically and then run on your local machine.

When the model is running, you can immediately start prompting it to get results.

There are other commands for managing the life cycle of the models. You can see the documentation for the commands here.
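For example, a typical clean-up flow after experimenting might look like the sketch below. These commands are based on the CLI documentation at the time of writing and may change; run foundry --help to confirm what your version supports.

# See which models have been downloaded to the local cache
foundry cache list

# Remove a model you no longer need to free up disk space
foundry cache remove phi-3.5-mini

# Stop the local inference service when you are done
foundry service stop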

Conclusion

Running models locally is a fast-growing part of the LLM ecosystem, and it will further encourage a structured way of testing, running, and managing the LLM life cycle. It will also improve the MLOps process, bringing ML engineers, DevOps engineers, and software developers to the same table, with a structured process to version and publish models for developers to integrate into apps and MCP servers, and for DevOps/MLOps engineers to automate both model deployment and management and the surrounding application systems through the SDLC. This cycle will only grow larger over time; perhaps the SDLC as we know it will no longer fit?

