Whistleblowing within AI companies, especially when it concerns sensitive information such as model sizes and training methods, is a serious action that requires careful consideration of the legal, ethical, and personal implications. This guide aims to give employees comprehensive information on whistleblowing or leaking confidential AI-related data safely and anonymously while minimizing personal and professional risks.
import logging
import os
import fire
import torch
from datasets import load_dataset
from huggingface_hub import PyTorchModelHubMixin
from torch import nn
from transformers import AutoConfig, AutoModel, AutoTokenizer
# !pip install -q sentence-splitter
import os
from sentence_splitter import split_text_into_sentences
REFUSAL_TERMS = [
    "sorry",
    "i can't",
    "unfortunately,",
    "as a language model",
    "as an ai language model",
import re
def extract_comments_and_docs(multiline_string):
    # Pattern to match lines where the first non-whitespace character is '#'
    comment_pattern = r"^\s*#(.*)"
    # Pattern to match any text within triple quotes (either ''' or """)
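The two patterns described in the comments above could be combined roughly as follows. This is a minimal sketch, not the gist's actual function body (which is truncated in this preview): the docstring pattern, the `re.MULTILINE` and `re.DOTALL` flags, and the `(comments, docs)` return shape are all assumptions.

```python
import re


def extract_comments_and_docs(multiline_string):
    # Lines whose first non-whitespace character is '#'
    comment_pattern = r"^\s*#(.*)"
    # Text inside triple quotes (either ''' or """); this pattern is an assumption
    docstring_pattern = r"('''|\"\"\")(.*?)\1"

    # MULTILINE lets '^' anchor at the start of every line
    comments = [
        match.strip()
        for match in re.findall(comment_pattern, multiline_string, re.MULTILINE)
    ]
    # DOTALL lets '.' span newlines so multi-line docstrings are captured
    docs = [
        body.strip()
        for _quote, body in re.findall(docstring_pattern, multiline_string, re.DOTALL)
    ]
    return comments, docs


sample = '''
def add(a, b):
    """Return the sum of a and b."""
    # plain addition
    return a + b
'''
comments, docs = extract_comments_and_docs(sample)
```

Note that a regex-based approach like this also picks up `#` lines that happen to sit inside a docstring or string literal; a tokenizer-based approach would be needed to tell them apart.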
import boto3
import gzip
from datasets import load_dataset
from botocore import UNSIGNED
from botocore.config import Config
num_proc = 32
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
bucket_name = "softwareheritage"
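With an unsigned client like the one above, fetching and decompressing a single object might look like the sketch below. The `content/<blob_id>` key layout, the gzip compression of stored objects, and the `download_contents` helper name are assumptions that the truncated snippet does not confirm. The client is passed in as an argument so the function can be exercised without network access.

```python
import gzip
import io


def download_contents(s3_client, blob_id, src_encoding="utf-8",
                      bucket_name="softwareheritage"):
    """Fetch one gzip-compressed object and decode it to text (hypothetical helper)."""
    # Assumed key layout: objects are stored under "content/<blob_id>".
    response = s3_client.get_object(Bucket=bucket_name, Key=f"content/{blob_id}")
    raw = response["Body"].read()
    # Decompress in memory, then decode with the caller-supplied encoding.
    with gzip.GzipFile(fileobj=io.BytesIO(raw)) as handle:
        return handle.read().decode(src_encoding)
```

In a `datasets` pipeline, a helper of this kind would typically be wrapped in a function passed to `map(..., num_proc=num_proc)`, with the real `s3` client supplied in place of a test double.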
import base64
import os
from pathlib import Path
import fire
from openai import OpenAI
from tqdm.auto import tqdm
from joblib import Memory
# Set up joblib caching
Schrödinger's Non-Commercial License (SNCL) v1.0
Preamble:
This license is designed to allow users to freely use, modify, and distribute the software for non-commercial purposes. It recognizes the challenges in defining what constitutes commercial activity and offers guidance and flexibility for users who are unsure about the nature of their activities.
1. Grant of License
Subject to the terms and conditions of this License, the Licensor hereby grants to the Licensee a worldwide, royalty-free, non-exclusive license to use, modify, and distribute the Software, provided that such activities are conducted for Non-Commercial Purposes, as defined below.
import streamlit as st
import pandas as pd
from datasets import load_from_disk
import textwrap
import json
# Constants
ROWS_PER_PAGE = 100
LOGO_URL = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/datasets_logo.png"
DOCS_URL = "https://huggingface.co/docs/datasets/index"
"""
parses the standard main.log from nanoT5 and makes some plots
pip install matplotlib pandas seaborn
"""
import argparse
import logging
import os
import re
from pathlib import Path