Running AI Locally with Ollama Server
Artificial intelligence is transforming the way we interact with technology. From chatbots to predictive analytics, AI’s capabilities are expanding rapidly. However, many individuals and small businesses hesitate to adopt AI because of perceived barriers like expensive servers or reliance on cloud-based APIs. The good news: you can deploy AI locally with tools like the Ollama server and run 8-billion-parameter (8B) or 13-billion-parameter (13B) models for most tasks, all without heavy investment or API limits.
Why Deploy AI Locally?
Running AI locally offers several advantages:
- Cost Savings: There’s no need for expensive cloud servers or ongoing subscription fees.
- Privacy: Local deployment keeps your data within your control, reducing concerns about data breaches or sharing sensitive information with third-party providers.
- Reliability: No dependency on external APIs means your AI remains functional even during internet outages.
- Performance: Even a modest GPU can run 8B or 13B models at usable speeds, and you can move up to larger models as your hardware grows.
What Is Ollama Server?
The Ollama server is an open-source platform that simplifies deploying and managing AI models locally. It’s designed to work with various hardware setups and can handle popular AI models effectively, even on GPUs with limited memory.
Hardware Requirements
You don’t need a high-end server to get started. Many consumer-grade GPUs, such as NVIDIA’s GTX 1650 or RTX 3060, can run quantized 8B or 13B models. While higher-end GPUs will naturally perform better, Ollama’s efficient implementation means even modest setups can yield good results. Available GPU memory (VRAM) is the main constraint, since the model weights must fit into it; as a rough guide, a 4-bit quantized 8B model needs about 5 GB of VRAM, and a 13B model around 8 GB. Layers that don’t fit are offloaded to the CPU at reduced speed.
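If you have an NVIDIA card, you can check how much VRAM you have to work with before picking a model size (this assumes the NVIDIA driver and its bundled nvidia-smi utility are installed):

```bash
# Report GPU name plus total and currently used memory (NVIDIA driver required)
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
```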
Free Models You Can Use
A growing library of free and open-source AI models can be used with the Ollama server. Here are a few notable options:
- LLaMA (by Meta):
- Meta’s LLaMA family is optimized for efficiency and available in several sizes, including 7B, 13B, and 70B. These models excel at tasks like text generation and language understanding.
- Gemma (by Google):
- Google’s open-weight model family, available in lightweight sizes that run well on modest hardware. A good fit for content creation, summarization, and question answering.
- Many more are available in the Ollama library: https://ollama.com/library
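Once Ollama is installed, downloading any of these is a single command. The tags below are examples and may change, so check the library page for current names:

```bash
# Pull models from the Ollama library (tags are examples; see https://ollama.com/library)
ollama pull llama3:8b
ollama pull gemma:7b

# Show what is downloaded locally
ollama list
```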
These freely available models can be tailored to your specific use cases. You can optimize performance and minimize hardware requirements by choosing the right model for your needs.
Open WebUI: A ChatGPT-Like Experience
For those who prefer an intuitive graphical user interface (GUI), Open WebUI is an excellent option. It provides a user-friendly platform for interacting with local AI models, offering a ChatGPT-like experience. Here’s how you can set it up:
- Install Open WebUI:
- Get Open WebUI from its official repository: https://github.com/open-webui/open-webui
- Follow the installation instructions for your operating system (a Docker-based example follows this list).
- Configure the Interface:
- Point Open WebUI at your Ollama server and make sure it can access the AI models you’ve set up.
- Customize the interface settings, such as response formatting and chat history, to suit your preferences.
- Launch the UI:
- Start Open WebUI and access it via your web browser.
- You can now interact with your local AI models through an interface that mimics the functionality of ChatGPT, including real-time conversation and task execution.
- Enhance Workflows:
- Use the GUI for a wide range of tasks, such as drafting emails, brainstorming ideas, or automating responses—all powered by your locally hosted AI.
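A common way to run Open WebUI alongside a local Ollama server is via Docker. This sketch is based on the project’s Docker quickstart; the exact flags and image tag may have changed, so verify them against the repository’s README:

```bash
# Run Open WebUI in a container, connecting to an Ollama server on the host.
# Based on the project's Docker quickstart; confirm current flags in the README.
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# Then browse to http://localhost:3000
```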
Open WebUI bridges the gap between raw server setups and seamless user experiences, making local AI more accessible to non-technical users.
Step-by-Step Guide to Deploying Ollama Server
- Install Dependencies:
- If you want GPU acceleration, make sure your GPU drivers are up to date (for NVIDIA cards, a driver with CUDA support). Ollama itself ships as a standalone binary, so no separate Python environment is required.
- Install Ollama from the official website, its GitHub repository, or your package manager (see the walkthrough after this list).
- Download the AI Models:
- Browse the Ollama library to pick a pre-trained model, such as an 8B or 13B variant, and fetch it with `ollama pull`.
- Ollama manages model storage itself: downloads go to ~/.ollama/models by default, and the location can be changed with the OLLAMA_MODELS environment variable.
- Configure the Server:
- Ollama is configured through environment variables rather than a configuration file; for example, OLLAMA_HOST sets the address it listens on and OLLAMA_MODELS sets the model directory.
- GPU and CPU allocation is largely automatic: Ollama offloads as many model layers to the GPU as VRAM allows and runs the rest on the CPU.
- Start the Server:
- Launch it from the command line with `ollama serve` (desktop installs usually start a background service automatically).
- Verify it is running by querying its HTTP endpoint, which listens on port 11434 by default, or by running `ollama list`.
- Integrate AI into Your Workflow:
- Use the Ollama server’s HTTP API or the CLI to interact with the models; examples follow below.
- Connect the server to automation tools like Home Assistant for smart home management, data processing, or personal-assistant tasks.
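Put together, a minimal end-to-end setup on Linux might look like this. The install-script URL and model tag reflect Ollama’s documentation at the time of writing, so double-check them before running:

```bash
# Install Ollama via the official Linux install script (review it before piping to sh)
curl -fsSL https://ollama.com/install.sh | sh

# Pull an 8B model (example tag; check https://ollama.com/library for current names)
ollama pull llama3:8b

# Start the server if it is not already running as a background service
ollama serve &

# Confirm it is up; the API listens on port 11434 by default
curl http://localhost:11434
# Expected reply: "Ollama is running"
```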
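For programmatic use, the server exposes a /api/generate endpoint that takes a JSON payload with a model name and a prompt. A minimal request, with streaming disabled so the reply arrives as a single JSON object, looks like this:

```bash
# Send a prompt to a local model through Ollama's HTTP API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Summarize the benefits of running AI models locally in two sentences.",
  "stream": false
}'
```

This is the same API that front ends like Open WebUI connect to under the hood.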
The Power of 8B and 13B Models
8B and 13B models strike a strong balance between performance and resource requirements. While far smaller than 175B-class models, they are more than capable for tasks like:
- Language understanding and generation
- Sentiment analysis
- Summarization
- Basic code generation
The best part? These models eliminate the limits imposed by API-based solutions. With local deployment, your only constraint is your hardware’s capacity.
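You can try any of these tasks straight from the terminal, since `ollama run` accepts a one-off prompt as an argument (the model tag is an example):

```bash
# One-off summarization from the command line (model tag is an example)
ollama run llama3:8b "Summarize in one sentence: Ollama lets you run open-source LLMs on local hardware."
```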
Conclusion
With tools like Ollama server and accessible AI models, running AI locally is no longer reserved for tech giants or AI experts. Anyone with a modest GPU can harness the power of AI for automation, content creation, and much more. Say goodbye to API limits and cloud dependency, and hello to a cost-effective, reliable AI solution.