Meta recently released Code Llama, a family of models (7, 13, and 34 billion parameters) trained on 500 billion tokens of code data. Meta fine-tuned these base models into two flavors: a Python specialist (trained on 100 billion additional tokens) and an instruction fine-tuned version that can follow natural language instructions.
The purpose of this article is to help readers easily get up and running with Code Llama. We'll cover the main ways to access Code Llama's capabilities, both locally and via hosted services.
Directly From the Source
If you are an experienced researcher/developer, you can submit a request to download the model weights and tokenizers directly from Meta. You can find sample code to load Code Llama models and run inference on GitHub.
N.B. Keep in mind that the links expire after 24 hours and a certain number of downloads. If you start seeing errors such as 403: Forbidden, you can always re-request a link.
From Hugging Face
Alternatively, you may choose to access Code Llama on the Hugging Face platform. There you can find:
- Models on the Hub with their model cards and license
- Transformers integration
- Integration with Text Generation Inference for fast and efficient production-ready inference
- Integration with Inference Endpoints
- Code benchmarks
Read this blog post for all the details, including how to run the small models in a free Google Colab! Expect a lot more updates from Hugging Face in the coming days. They are working on sharing scripts to train models, optimizations for on-device inference, even nicer demos (including for more powerful models), and much more.
The Hugging Face demo lets you generate text and code with the 13B Code Llama model. Please note that this model is designed for code completion rather than instruction following, and infilling is not currently supported in the demo.
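The distinction the demo draws is between the base models' two prompting modes: plain continuation and fill-in-the-middle infilling. A minimal sketch of both prompt layouts, following the sentinel-token scheme described in the Code Llama paper (in real tokenizers, `<PRE>`/`<SUF>`/`<MID>` are dedicated special tokens rather than literal text, so the spellings here are illustrative only):

```python
def completion_prompt(code: str) -> str:
    """Plain code completion: the prompt is just the code to continue."""
    return code

def infill_prompt(prefix: str, suffix: str) -> str:
    """Fill-in-the-middle layout: the model generates the span that fits
    between the prefix and suffix, then emits an end-of-infill token."""
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

prompt = infill_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result",
)
print(prompt)
```

With an infill prompt like this, the model's output is the missing middle of the function, not a continuation of the end.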
Code Llama Inside a Chatbot
One of the easiest ways to try Code Llama is to use one of the instruction models within a conversational app like a chatbot. The Instruct models of Code Llama are specifically fine-tuned to understand natural language prompts so users can simply ask the chatbot to write a function or clarify a section of code.
Perplexity AI is an AI-chat-based conversational search engine that delivers answers to questions using various language models. Within 6 hours of Code Llama's release, Perplexity integrated the 34b-instruct model into its Llama Chat offering. Simply navigate to the website to get going.
Faraday has also added support for the 7b, 13b, and 34b Code Llama instruct models.
Faraday is an easy-to-use desktop app (Mac and Windows) that allows users to chat with AI "characters" offline. It features a one-click Desktop installer that "just works" out of the box (GPU & Metal acceleration included!). The AI models that power Faraday are stored 100% locally on your computer. Your chat data is saved to your computer and is never sent to a remote server.
Code Llama 13B Chat on Hugging Face
You can check out this Space on Hugging Face for a quick demo of CodeLlama-13b-Instruct. You can play with it as is, or duplicate it to run generations without a queue! If you want to run your own service, you can also deploy the model on Inference Endpoints.
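Inference Endpoints serve models through Text Generation Inference, which accepts a JSON body with `inputs` and `parameters` fields. A minimal sketch of shaping such a request; the endpoint URL is a placeholder, and the actual HTTP call is left commented out since it needs a deployed endpoint and an access token:

```python
import json

# Placeholder URL; replace with your deployed Inference Endpoint.
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"

def build_request(prompt: str, max_new_tokens: int = 256) -> dict:
    """Shape a request in the inputs/parameters JSON schema used by
    Text Generation Inference."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": 0.2},
    }

payload = build_request("def fibonacci(n):")
print(json.dumps(payload, indent=2))

# To actually call the endpoint (requires a token and network access):
# import requests
# headers = {"Authorization": f"Bearer {token}"}
# requests.post(ENDPOINT_URL, headers=headers, json=payload).json()
```

The low temperature is a deliberate choice for code generation, where deterministic completions are usually preferable.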
Additionally, you can access the 34B Instruct model for free with super-fast inference using Hugging Chat.
Code Llama Inside Your IDE
Many developers will want to use Code Llama as a copilot inside VS Code and other IDEs. Here are some options:
- Install Ollama on your Mac to run various open-source models locally. Ollama currently supports the Code Llama 7B Instruct model, with support for other models coming soon.
- Install CodeGPT and follow these instructions to connect Ollama.
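Once Ollama has pulled a model, it serves a local REST API that tools like CodeGPT talk to. A minimal sketch of the request body for Ollama's `/api/generate` endpoint; the model tag assumes the 7B Instruct variant, and the HTTP call itself is commented out since it needs a running Ollama server:

```python
import json

# Ollama listens on this port locally once the daemon is running.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_ollama_request(prompt: str, model: str = "codellama:7b-instruct") -> dict:
    """Shape a non-streaming request for Ollama's generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

payload = build_ollama_request("Write a function that counts words in a file.")
print(json.dumps(payload))

# With Ollama running locally:
# import requests
# requests.post(OLLAMA_URL, json=payload).json()["response"]
```

Setting `stream` to `False` returns one complete JSON response instead of a stream of partial tokens, which is simpler for scripting.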
Continue + Ollama / TogetherAI / Replicate
With the Continue VS Code extension, you can use Code Llama as a drop-in replacement for GPT-4, either running locally with Ollama or through hosted providers such as TogetherAI and Replicate.
- Install the Continue VS Code extension
- Follow these instructions to use Ollama, TogetherAI, or Replicate
P.S. It is likely that Hugging Face's VSCode extension will be updated soon to support Code Llama.