facebook/galactica-1.3b · Hugging Face

Model card from the original repo

Following Mitchell et al. (2018), this model card provides information about the GALACTICA model, how it was trained, and the intended use cases. Full details about how the model was trained and evaluated can be found in the release paper.

Model Details

The GALACTICA models are trained on a large-scale scientific corpus. The models are designed to perform scientific tasks, including but not limited to citation prediction, scientific QA, mathematical reasoning, summarization, document generation, molecular property prediction and entity extraction. The models were developed by the Papers with Code team at Meta AI to study the use of language models for the automatic organization of science. We train models with sizes ranging from 125M to 120B parameters. Below is a summary of the released models:

Size        Parameters
mini        125 M
base        1.3 B
standard    6.7 B
large       30 B
huge        120 B
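
Each size is published as its own checkpoint on the Hugging Face Hub. The snippet below is a minimal sketch for selecting one by size; it assumes the other checkpoints follow the same facebook/galactica-<size> naming as this repository (e.g. facebook/galactica-125m), so confirm the exact repo ids on the Hub before relying on them.

# Minimal sketch: pick a GALACTICA checkpoint by size.
# Assumption: the other checkpoints use repo ids of the form
# "facebook/galactica-<size>" (verify the exact ids on the Hub).
from transformers import AutoTokenizer, OPTForCausalLM

CHECKPOINTS = {
    "mini": "facebook/galactica-125m",
    "base": "facebook/galactica-1.3b",
    "standard": "facebook/galactica-6.7b",
    "large": "facebook/galactica-30b",
    "huge": "facebook/galactica-120b",
}

repo_id = CHECKPOINTS["base"]  # this model card's checkpoint
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = OPTForCausalLM.from_pretrained(repo_id)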

Release Date

November 2022

Model Type

Transformer-based architecture in a decoder-only setup with a few modifications (see the paper for more details).
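
Because the checkpoint ships a standard transformers configuration, the main architecture hyperparameters can be inspected without downloading the weights. A minimal sketch, assuming the attribute names exposed by the OPT-style config class in transformers:

# Sketch: inspect the decoder-only architecture without loading the weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("facebook/galactica-1.3b")
print(config.model_type)               # OPT-style decoder-only configuration
print(config.num_hidden_layers)        # number of decoder layers
print(config.hidden_size)              # model width
print(config.num_attention_heads)      # attention heads per layer
print(config.max_position_embeddings)  # context length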

Paper & Demo

Paper / Demo

Model Use

The primary intended users of the GALACTICA models are researchers studying language models applied to the scientific domain. We also anticipate the model will be useful for developers who wish to build scientific tooling. However, we caution against production use without safeguards given the potential of language models to hallucinate.

The models are made available under a non-commercial CC BY-NC 4.0 license. More information about how to use the model can be found in the README.md of this repository.

Training Data

The GALACTICA models are trained on 106 billion tokens of open-access scientific text and data. This includes papers, textbooks, scientific websites, encyclopedias, reference material, knowledge bases, and more. We tokenize different modalities to provide a natural language interface for different tasks. See the README.md for more information, and the paper for full details on the training data.
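
As a rough illustration of this interface, tasks and modalities are selected with special tokens embedded in ordinary prompts. The sketch below only tokenizes two example prompts; the [START_REF] citation marker comes from this card's own examples, while the question/answer prompt format is an assumption to be checked against the README.

# Sketch: tasks and data modalities share one text interface, selected by
# special tokens in the prompt. [START_REF] appears in this card's examples;
# the full token set is documented in the README and the paper.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b")

prompts = [
    "The Transformer architecture [START_REF]",           # citation prediction
    "Question: What is the speed of light?\n\nAnswer:",   # free-form QA (assumed format)
]

for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    # If [START_REF] is registered as a special token, it maps to a single id.
    print(prompt, "->", ids.shape[-1], "tokens")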

How to use

Below are some example scripts showing how to use the model with the transformers library:

Using the PyTorch model

Running the model on a CPU

from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-1.3b")

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
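
Note that generate is called here with the library's default settings, which keep completions short. The variant below is a sketch (not part of the original card) that passes max_new_tokens explicitly to get a longer continuation:

# Sketch: the same CPU example with an explicit generation length,
# since transformers' default maximum length cuts completions short.
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-1.3b")

input_ids = tokenizer("The Transformer architecture [START_REF]", return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))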

Running the model on a GPU

# pip install accelerate
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-1.3b", device_map="auto")

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

Running the model on a GPU using different precisions

FP16

# pip install accelerate
import torch
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-1.3b", device_map="auto", torch_dtype=torch.float16)

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
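
On GPUs with native bfloat16 support, the same pattern can be used with torch.bfloat16, which tends to be more numerically robust than float16 at the same memory cost. This variant is a sketch and not part of the original card:

# Sketch: load in bfloat16 instead of float16 (requires a GPU with bf16 support).
import torch
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b")
model = OPTForCausalLM.from_pretrained(
    "facebook/galactica-1.3b", device_map="auto", torch_dtype=torch.bfloat16
)

input_ids = tokenizer("The Transformer architecture [START_REF]", return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))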

INT8

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-1.3b", device_map="auto", load_in_8bit=True)

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
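
In more recent transformers releases, 8-bit loading is configured through a BitsAndBytesConfig passed as quantization_config rather than the load_in_8bit flag. The equivalent call would look roughly like the sketch below; check which form your installed version prefers.

# Sketch: the newer quantization_config API for 8-bit loading
# (equivalent in effect to load_in_8bit=True above).
from transformers import AutoTokenizer, BitsAndBytesConfig, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b")
model = OPTForCausalLM.from_pretrained(
    "facebook/galactica-1.3b",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

input_ids = tokenizer("The Transformer architecture [START_REF]", return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))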

Performance and Limitations

The model outperforms several existing language models on a range of knowledge probes, reasoning, and knowledge-intensive scientific tasks. This also extends to general NLP tasks, where GALACTICA outperforms other open source general language models. That being said, we note a number of limitations in this section.

As with other language models, GALACTICA is often prone to hallucination, and training on a high-quality academic corpus does not prevent this, especially for less popular and less cited scientific concepts. There are no guarantees of truthful output when generating from the model. This extends to specific modalities such as citation prediction: while GALACTICA's citation behaviour approaches the ground-truth citation behaviour with scale, the model continues to exhibit a popularity bias at larger scales.

In addition, we evaluated the model on several types of benchmarks related to stereotypes and toxicity. Overall, the model exhibits substantially lower toxicity rates than other large language models. That being said, it continues to exhibit bias on certain measures (see the paper for details), so we recommend care when using the model for generation.

Broader Implications

GALACTICA can potentially be used as a new way to discover academic literature. We also expect substantial downstream use in particular domains, such as mathematics, biology, and chemistry. In the paper, we demonstrated several examples of the model acting as an alternative to standard search tools. We expect a new generation of scientific tools to be built on top of large language models such as GALACTICA.

We encourage researchers to investigate beneficial and new use cases for these models. That being said, it is important to be aware of the current limitations of large language models. Researchers should pay attention to common issues such as hallucination and biases that could emerge from using these models.

Citation

@inproceedings{GALACTICA,
    title={GALACTICA: A Large Language Model for Science},
    author={Ross Taylor and Marcin Kardas and Guillem Cucurull and Thomas Scialom and Anthony Hartshorn and Elvis Saravia and Andrew Poulton and Viktor Kerkez and Robert Stojnic},
    year={2022}
}

FAQs

What is Hugging Face doing?

Hugging Face is an AI platform and supporting community. The community uses Hugging Face to implement machine learning models; users can upload their own models to the platform.

What is a Hugging Face model?

The Hugging Face Hub hosts many models for a variety of machine learning tasks. Models are stored in repositories, so they benefit from all the features possessed by every repo on the Hugging Face Hub. Additionally, model repos have attributes that make exploring and using models as easy as possible.
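
For example, a model repository's metadata (tags, files, and so on) can be read programmatically with the huggingface_hub client. The sketch below assumes a recent version of that library, where attribute names may differ slightly from older releases:

# Sketch: inspect a model repository on the Hub with the huggingface_hub client.
# pip install huggingface_hub
from huggingface_hub import model_info

info = model_info("facebook/galactica-1.3b")
print(info.id)                                # repository id
print(info.tags)                              # task, library and license tags
print([f.rfilename for f in info.siblings])   # files stored in the repository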

Is Hugging Face safe?

Data Security/Privacy

Hugging Face does not store any customer data in terms of payloads or tokens that are passed to the Inference Endpoint. Logs are stored for 30 days. Every Inference Endpoint uses TLS/SSL to encrypt data in transit.

Why is Hugging Face so popular?

Unlike many other companies in the AI and ML space, Hugging Face provides a platform where developers can freely share code, models, and datasets. This openness has led to broad adoption by both NLP researchers and practitioners.

Does Hugging Face make money?

The vast majority of Hugging Face's ~$70M in annualized revenue today comes from the managed version of their product they're selling into companies like Amazon, Nvidia, and Microsoft with the much bigger monetization opportunity in the future as a result of being the central collaborative tool for devs building with ...

Does Hugging Face pay well?

The average Hugging Face salary ranges from approximately $46,676 per year (estimate) for a Guest Hugger to $319,089 per year (estimate) for a Chief Executive Officer.

Is Hugging Face free?

A free tier is available to everyone. For a limited number of samples, you can train your models for free.

What are the benefits of Hugging Face?

Office Perks
  • Company-sponsored outings.
  • Fitness stipend.
  • Some meals provided.
  • Home-office stipend for remote employees.
  • Meditation space.
  • Mother's room.

For whom is Hugging Face's chat tool mainly designed?

Hugging Face's chat tool is primarily designed for developers, researchers, and data scientists who want to build conversational AI applications.

How does Hugging Face make money?

At Hugging Face, we build a collaboration platform for the ML community (i.e., the Hub), and we monetize by providing simple access to compute for AI, with services like AutoTrain, Spaces and Inference Endpoints, directly accessible from the Hub, and billed by Hugging Face to the credit card on file.
