Overview
Running a Large Language Model (LLM) locally is a great way to explore AI capabilities without relying on cloud infrastructure. Ollama makes it easy to download and run open-source models directly on your machine, whether you're on macOS, Windows, or Linux.
With AI adoption accelerating, experimenting with LLMs locally can help you or your organization begin exploring new solutions built on AI:
- Familiarize yourself with modern AI frameworks.
- Understand how LLMs work and generate language.
- Test applications and workflows before scaling to cloud or bare-metal systems.
- Get hands-on experience running and interacting with LLMs in real time.
In this guide, I’ll walk you through installing Ollama, pulling a small model, and running your first prompt.
1. Install Ollama
Ollama provides native installers for macOS, Windows, and Linux.
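On Linux, for example, you can install it with the official one-line script from ollama.com (macOS and Windows users can download the installer from the same site):
curl -fsSL https://ollama.com/install.sh | sh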
Once installed, you can verify that Ollama is available from your terminal (or Command Prompt/PowerShell on Windows):
ollama --version
If you see a version number, you’re ready to go!
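The output should look something like the following (the exact version number will vary):
ollama version is 0.x.x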
2. Pull a Lightweight 1B Parameter Model
Ollama supports a wide range of open models, from small lightweight options to multi-billion parameter models.
Assuming you’re testing locally or on a lightweight server, let’s start with a smaller model. The Llama 3.2 1B model provides a good balance of performance and capability for local experimentation.
Run the following command to download it (approximately 1-2 GB in size):
ollama pull llama3.2:1b
Ollama will handle downloading and setting up the model automatically.
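You can confirm the model is available locally by listing your installed models:
ollama list
The llama3.2:1b entry should appear in the output along with its size.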
3. Run the Model
Once the model is downloaded, you can start chatting with it directly from your terminal:
ollama run llama3.2:1b
You'll then see an interactive prompt (>>>) where you can start typing messages.
For example, ask the model a question:
>>> What is 2+2?
The model should respond with something like:
Two plus two is four.
To exit the interactive session, type /bye (or press Ctrl + D).
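You can also pass a prompt directly on the command line to get a one-off response instead of starting an interactive session, for example:
ollama run llama3.2:1b "Explain what a large language model is in one sentence."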
4. Tips for Running Local LLMs
Running LLMs locally can be resource-intensive, especially as model sizes grow.
- Use a GPU-backed system: While Ollama supports CPU inference, performance will be significantly better on systems with a dedicated GPU. At xByte Cloud, we encourage customers to use GPU-based systems when deploying an LLM in a production setting.
- Monitor system resources: If you experience slow responses or high memory usage, consider using a smaller model or upgrading your hardware (see the command below for a quick check).
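In recent Ollama versions, the ps subcommand is a quick way to see which models are currently loaded, roughly how much memory they are using, and whether they are running on CPU or GPU; treat it as a rough check rather than a full monitoring solution:
ollama ps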
As you begin your AI journey, our Sales team can connect you with a Solutions Architect to help design an optimized, scalable LLM/AI infrastructure that fits your organization's needs.
5. Next Steps
Now that you have a model running locally, try:
- Experimenting with different models from Ollama's model library
- Integrating Ollama into your own applications using the Ollama API (see the example below)
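As a minimal sketch, Ollama exposes a local REST API (listening on port 11434 by default) that applications can call. The example below assumes the llama3.2:1b model pulled earlier and uses curl to send a single, non-streaming request to the /api/generate endpoint:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
The response comes back as JSON containing the generated text, so the same call can be made from any language with an HTTP client.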
Congratulations!
You’ve just set up and run your first local LLM using Ollama.