How To Get Started With Code Llama

How To Get Started With Code Llama
Image Credit: Maginative

Meta recently released Code Llama, a family of models (7, 13, and 34 billion parameters) trained on 500 billion tokens of code data. Meta fine-tuned those base models for two different flavors: a Python specialist (100 billion additional tokens) and an instruction fine-tuned version, which can understand natural language instructions.

The purpose of this article is to help readers easily get up and running with Code Llama. We'll cover the main ways to access Code Llama's capabilities both locally or via hosted services.

Update: If you are interested in codellama-70b please check the updated link below:

How To Get Started With CodeLlama-70B
CodeLlama-70B-Instruct achieves 67.8 on HumanEval, making it one of the highest performing open models available today.

Directly From the Source

If you are an experienced researcher/developer, you can submit a request to download the model weight and tokenizers directly from Meta. You can find sample code to load Code Llama models and run inference on GitHub.

N.B. Keep in mind that the links expire after 24 hours and a certain amount of downloads. If you start seeing errors such as 403: Forbidden, you can always re-request a link.

From Hugging Face

Image Credit: Hugging Face

Alternatively, you may choose to access Code Llama on the Hugging Face platform. There you can find:

  • Models on the Hub with their model cards and license
  • Transformers integration
  • Integration with Text Generation Inference for fast and efficient production-ready inference
  • Integration with Inference Endpoints
  • Code benchmarks

Read this blog post for all the details including running the small models in free Google Colab! Expect a lot more updates from Hugging Face in the coming days. They are working on sharing scripts to train models, optimizations for on-device inference, even nicer demos (and for more powerful models), and much more.

Base Model Python Instruct
7B CodeLlama-7b-hf CodeLlama-7b-Python-hf CodeLlama-7b-Instruct-hf
13B codellama/CodeLlama-13b-hf codellama/CodeLlama-13b-Python-hf codellama/CodeLlama-13b-Instruct-hf
34B codellama/CodeLlama-34b-hf codellama/CodeLlama-34b-Python-hf codellama/CodeLlama-34b-Instruct-hf

Code Llama Playground

This is a demo to generate text and code with the Code Llama model (13B). Please note that this model is not designed for instruction purposes but for code completion. Infilling is currently not supported.

Code Llama Inside a Chatbot

One of the easiest ways to try Code Llama is to use one of the instruction models within a conversational app like a chatbot. The Instruct models of Code Llama are specifically fine-tuned to understand natural language prompts so users can simply ask the chatbot to write a function or clarify a section of code.

Perplexity Llama Chat

Perplexity AI is an AI-chat-based conversational search engine that delivers answers to questions using various language models. Within 6 hours of Code Llama's release, Perplexity integrated the 34b-instruct model into its Llama Chat offering. Simply navigate to the website to get going.


Faraday has also added support for the 7b, 13b, and 34b Code Llama instruct models.

Faraday is an easy-to-use desktop app (Mac and Windows) that allows users to chat with AI "characters" offline. It features a one-click Desktop installer that "just works" out of the box (GPU & Metal acceleration included!). The AI models that power Faraday are stored 100% locally on your computer. Your chat data is saved to your computer and is never sent to a remote server

Code Llama 13B Chat on Hugging Face

You can check out this Space on Hugging Face for a quick demo of CodeLlama-13b-Instruct. You can play with it as is, or duplicate to run generations without a queue! If you want to run your own service, you can also deploy the model on Inference Endpoints.

Additionally you can access the 34B Instruct model for free with super fast inference using Hugging Chat.

Code Llama Inside Your IDE

For most developers, you may be looking to use Code Llama as a copilot inside VSCode and other IDEs. Here are some options:

CodeGPT + Ollama

  1. Install Ollama on your Mac to run various open source models locally. Ollama currently supports Code Llama 7B instruct model with support for other models coming soon.
  2. Install CodeGPT and follow these instructions to connect Ollama.

Continue + Ollama /TogetherAI/Replicate

With Continue VS Code Extension, you can use Code Llama as a drop-in replacement for GPT-4, either by running locally with Ollama or TogetherAI or through Replicate.

  1. Install the Continue VS Code extension
  2. Follow these instructions to use Ollama, TogetherAI or through Replicate

P.S. It is likely that Hugging Face's VSCode extension will be updated soon to support Code Llama.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.

Let’s stay in touch. Get the latest AI news from Maginative in your inbox.