Getting Started with a Local LLM using Ollama

Overview

Running a Large Language Model (LLM) locally is a great way to explore AI capabilities without relying on cloud infrastructure. Ollama makes it easy to download and run open-source models directly on your machine, whether you're on macOS, Windows, or Linux.

With AI adoption accelerating, experimenting with LLMs locally can help you or your organization begin exploring new solutions built on AI:

  • Familiarize yourself with modern AI frameworks.
  • Understand how LLMs work and generate language.
  • Test applications and workflows before scaling to cloud- or bare-metal-based systems.
  • Get hands-on experience running and interacting with LLMs in real time.

In this guide, I’ll walk you through installing Ollama, pulling a small model, and running your first prompt.


1. Install Ollama

Ollama provides native installers for macOS, Windows, and Linux.

Download the installer for your operating system from the Ollama website (https://ollama.com/download) and follow the setup prompts.

Once installed, verify that Ollama is available from your terminal (or Command Prompt/PowerShell on Windows):


ollama --version

If you see a version number, you’re ready to go!


2. Pull a Lightweight 1B Parameter Model

Ollama supports a wide range of open models, from small lightweight options to multi-billion parameter models.

Assuming you’re testing locally or on a lightweight server, let’s start with a smaller model. The Llama 3.2 1B model provides a good balance of performance and capability for local experimentation.

Run the following command to download it (approximately 1-2 GB in size):


ollama pull llama3.2:1b

Ollama will handle downloading and setting up the model automatically.
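
To confirm the download completed, you can list the models currently available on your machine:


ollama list

You should see llama3.2:1b in the output, along with its size.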


3. Run the Model

Once the model is downloaded, you can start chatting with it directly from your terminal:


ollama run llama3.2:1b

You’ll then see an interactive prompt (>>>) where you can start typing messages.

For example, ask the model a question:


>>> What is 2+2?

The model should respond with something like:


Two plus two is four.

To exit the chat, type /bye or press Ctrl + D.
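
If you just want a one-off answer without opening an interactive session, ollama run also accepts a prompt directly on the command line and exits after printing the response:


ollama run llama3.2:1b "What is 2+2?"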


4. Tips for Running Local LLMs

Running LLMs locally can be resource-intensive, especially as model sizes grow.

  • Use a GPU-backed system: While Ollama supports CPU inference, performance is significantly better on systems with a dedicated GPU. At xByte Cloud, we encourage customers to use GPU-based systems when deploying an LLM in a production setting.

  • Monitor system resources: If you experience slow responses or high memory usage, consider using a smaller model or upgrading your hardware. The command shown below can help you see what's currently loaded.
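
To check which models are currently loaded and roughly how much memory they're consuming, Ollama includes a built-in status command:


ollama ps

This is a quick way to decide whether a smaller model or more hardware is the right next step.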

As you begin your AI journey, our Sales team can connect you with a Solutions Architect to help design an optimized, scalable LLM infrastructure that fits your organization’s needs.

👉 Contact Sales


5. Next Steps

Now that you have a model running locally, try pulling a larger model, experimenting with your own prompts, or integrating Ollama into your applications through its local REST API (see the example below).
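
As a quick sketch, assuming a default Ollama installation listening on its standard port (11434), you can send a prompt to the model over HTTP with curl:


curl http://localhost:11434/api/generate -d '{"model": "llama3.2:1b", "prompt": "What is 2+2?", "stream": false}'

This is the same local API that client libraries and applications use to talk to Ollama, so it's a natural starting point for your own integrations.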


Congratulations!

You’ve just set up and run your first local LLM using Ollama.