Make sure you are using llama.cpp from commit d0cee0d36d5be95a0d9088b674dbb27354107221 or later.
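If you built from a git checkout and want to double-check that your tree includes that commit, one way (a minimal sketch, assuming you cloned the repository with git) is to ask git whether the commit is an ancestor of your current HEAD:

```bash
# Run inside your llama.cpp checkout. Exit status 0 means the required
# commit is an ancestor of HEAD, i.e. your checkout is new enough.
REQUIRED=d0cee0d36d5be95a0d9088b674dbb27354107221
if git merge-base --is-ancestor "$REQUIRED" HEAD; then
    echo "OK: checkout includes $REQUIRED"
else
    echo "Too old: pull the latest llama.cpp and rebuild"
fi
```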
Then you can run the model with a command like this:

```bash
./main -ngl 32 -m llama-2-13b-chat.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.\n<</SYS>>\n{prompt}[/INST]"
```
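For reference, here is a quick gloss of the flags used above (my reading of llama.cpp's options around that time; run `./main -h` on your own build for the authoritative list):

```bash
# -ngl 32           offload 32 layers to the GPU (only matters for GPU-enabled builds)
# -m <file>         path to the GGUF model file
# --color           colorize the output
# -c 4096           context window size, in tokens
# --temp 0.7        sampling temperature
# --repeat_penalty  penalty applied to recently repeated tokens
# -n -1             number of tokens to generate; -1 means keep going until the model stops
# -p "<prompt>"     the prompt text (here, the Llama-2 chat system prompt)
./main -h   # prints the full list of options supported by your build
```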
Alright, using OpenBLAS with Llama.cpp. If you don't know what OpenBLAS is, it's an optimized open-source linear-algebra (BLAS) library that can noticeably speed up llama.cpp's matrix math, especially prompt processing. And if you don't know what Llama.cpp is… well, let's just say it's the coolest thing to hit the open-source world since sliced bread (or maybe even since sliced cheese).
So how do we use OpenBLAS with Llama.cpp? Well, first things first: you need to download and install both of these bad boys on your computer. Don't worry, it's not as scary as it sounds! Just follow the instructions on their respective websites and before you know it, you'll be ready to roll.
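For example, on Debian or Ubuntu (assuming you're happy with the distro-packaged OpenBLAS rather than building it yourself), the setup might look roughly like this:

```bash
# Install build tools and the OpenBLAS development package
# (package names are Debian/Ubuntu specific; on macOS, `brew install openblas`
# is the rough equivalent).
sudo apt-get update
sudo apt-get install -y build-essential git libopenblas-dev pkg-config

# Grab llama.cpp itself.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```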
Once you have OpenBLAS and Llama.cpp installed, you can start using them together by running some sweet commands in your terminal window. Here’s an example:
```bash
# Build llama.cpp with the LLAMA_OPENBLAS flag set to 1,
# which links the resulting binaries against OpenBLAS.
make LLAMA_OPENBLAS=1
```
This tells the Makefile to build llama.cpp with OpenBLAS support (by default, llama.cpp builds without any BLAS backend at all). Pretty cool, right? And if you prefer CMake, you can get the same result by passing `-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS` when configuring the build.
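One quick sanity check after building (assuming a Linux system and dynamic linking) is to ask the loader whether the binary actually picked up OpenBLAS:

```bash
# If the build linked OpenBLAS dynamically, libopenblas should appear
# among the binary's shared-library dependencies.
ldd ./main | grep -i openblas
```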
If you're using Linux (or another Unix-like operating system), you can also build llama.cpp from source with CMake instead of the Makefile. Here are the steps:
1. Clone the llama.cpp repository to your computer.
2. Navigate to the directory where you cloned the repo (e.g., `cd llama.cpp`).
3. Create a build directory, then configure with CMake: `mkdir build && cd build && cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS`. This generates a build system for Llama.cpp that includes OpenBLAS support.
4. Compile and link from inside the build directory with `cmake --build . --config Release` (or just `make`).
5. Run the resulting `main` binary, passing any arguments you need (e.g., `./main -ngl 32 -m llama-2-13b-chat.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Your prompt here"`); a consolidated sketch follows this list.
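Putting those steps together, the whole CMake route might look like the sketch below. Note that the output path is an assumption: Makefile builds drop `main` in the repository root, while CMake builds typically place it under `build/bin/`, so adjust to wherever your binary actually ends up.

```bash
# From inside your llama.cpp checkout:

# Configure and build with OpenBLAS as the BLAS vendor.
mkdir -p build
cd build
cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
cmake --build . --config Release

# Run the freshly built binary (path assumes the CMake layout described above).
./bin/main -ngl 32 -m /path/to/llama-2-13b-chat.q4_K_M.gguf --color -c 4096 \
    --temp 0.7 --repeat_penalty 1.1 -n -1 -p "[INST] Write a haiku about llamas [/INST]"
```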
And that's it! You should now be using OpenBLAS with Llama.cpp, which means the heavy matrix math (especially prompt processing) should run noticeably faster. So give it a try; we promise you won't regret it!