Tyler Lisowski relyt0925

[root@tyler-fsdp-testing root]# ls -lh /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/checkpoint-11128/
total 101G
-rw-r--r--. 1 root root 789 Sep 4 04:36 config.json
-rw-r--r--. 1 root root 144 Sep 4 04:36 generation_config.json
-rw-r--r--. 1 root root 4.6G Sep 4 04:36 model-00001-of-00006.safetensors
-rw-r--r--. 1 root root 4.6G Sep 4 04:36 model-00002-of-00006.safetensors
-rw-r--r--. 1 root root 4.6G Sep 4 04:36 model-00003-of-00006.safetensors
-rw-r--r--. 1 root root 4.6G Sep 4 04:36 model-00004-of-00006.safetensors
-rw-r--r--. 1 root root 4.6G Sep 4 04:36 model-00005-of-00006.safetensors
-rw-r--r--. 1 root root 2.6G Sep 4 04:37 model-00006-of-00006.safetensors
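The listing above names the weight shards `model-0000K-of-00006.safetensors`. One quick sanity check after copying a checkpoint directory is to confirm that every index from 1 to N is present. The following is a minimal sketch under that assumption. The helpers `missing_shards` and `check_dir` are hypothetical and not part of InstructLab, and they only inspect file names, not file contents or sizes.

```python
import re
from pathlib import Path

# Matches names like "model-00001-of-00006.safetensors" (naming taken from the ls output above).
SHARD_RE = re.compile(r"^model-(\d+)-of-(\d+)\.safetensors$")


def missing_shards(names):
    """Return the sorted shard indices that the "-of-N" suffix expects but that are absent.

    Raises ValueError if no shard files are found or if they disagree on N.
    """
    found = set()
    totals = set()
    for name in names:
        m = SHARD_RE.match(name)
        if m:
            found.add(int(m.group(1)))
            totals.add(int(m.group(2)))
    if not totals:
        raise ValueError("no safetensors shards found")
    if len(totals) > 1:
        raise ValueError(f"inconsistent shard totals: {sorted(totals)}")
    total = totals.pop()
    return sorted(set(range(1, total + 1)) - found)


def check_dir(path):
    """Hypothetical convenience wrapper: check the file names in one checkpoint directory."""
    return missing_shards(p.name for p in Path(path).iterdir())
```

An empty result from `check_dir(".../checkpoint-11128/")` means all six shards named in the listing are present.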
[root@tyler-a100-newimage-val root]# /root/bin/ilab.sh --config /var/mnt/inststg1/instructlab/config.yaml model evaluate --model /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/hf_format/samples_25376/ --base-model /var/mnt/inststg1/instructlab/models/granite-7b-starter1.1/ --benchmark mmlu_branch --tasks-dir /var/mnt/inststg1/instructlab/generated/node_datasets_2024-08-18T15_57_14/
Using local safetensors found at '/var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/hf_format/samples_25376/' for '--model'
INFO 2024-08-18 22:00:17,135 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-08-18 22:00:17,135 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-08-18 22:00:17,135 numexpr.utils:161: NumExpr defaulting to 16 threads.
INFO 2024-08-18 22:00:17,797 datasets:58: PyTorch version 2.3.1 available.
INFO 2024-08-18 22
[root@tyler-a100-newimage-val instructlab]# nohup /root/bin/ilab.sh train --strategy lab-multiphase --phased-phase1-data /var/mnt/inststg1/instructlab/generated/knowledge_train_msgs_2024-08-18T15_57_14.jsonl --phased-phase2-data /var/mnt/inststg1/instructlab/generated/skills_train_msgs_2024-08-18T15_57_14.jsonl --phased-base-dir /var/mnt/inststg1/instructlab/phasedbasedir --phased-phase1-num-epochs 2 --phased-phase2-num-epochs 2 --phased-mt-bench-judge /var/mnt/inststg1/instructlab/models/prometheus-eval/prometheus-8x7b-v2.0/ --max-batch-len 10000 --max-seq-len 4096 --phased-phase1-effective-batch-size 128 --phased-phase2-effective-batch-size 3840 --enable-serving-output --gpus 8 --skip-user-confirm --model-path /var/mnt/inststg1/instructlab/models/granite-7b-starter1.1/ &
[root@tyler-a100-newimage-val instructlab]# cat nohup.out
time="2024-08-18T20:04:24Z" level=warning msg="The input device is not a TTY. The --tty and --interactive flags might not work properly"
[root@tyler-a100 instructlab]# cat /var/mnt/inststg1/instructlab/generated/checkpoints/compositional_skills_extraction_information_named_entities_places/data_checkpoint_34ae9efe032748c294999e937f44b437.jsonl
{"task_description":"","seed_context":"'Brian Patrick Kennedy( born 5 November 1961) is an Irish- born art museum director who has worked in Ireland and Australia, and now lives and works in the United States.\\n\\nHe is currently the director of the Peabody Essex Museum.\\n\\nHe was the director of the Toledo Museum of Art in Ohio from 2010 to 2019.\\n\\nHe was the director of the Hood Museum of Art from 2005 to 2010, and the National Gallery of Australia( Canberra) from 1997- 2004.\\nIan Barry is an Australian director of film and TV.\\nSaltwater is a 2000 Irish drama film written and directed by Conor McPherson.\\n\\nThe film stars Peter McDonald, Brian Cox, Conor Mullen, Laurence Kinlan, Brendan Gleeson and Eva Birthistle.\\n\\nThe film was released on September 29, 2000, by Buena Vista International
@relyt0925
relyt0925 / gist:fafbc33e9c8d0d77cdb8f74a3ef27ebe
Created August 18, 2024 19:30
knowledge checkpoint example
[root@tyler-a100 instructlab]# cat /var/mnt/inststg1/instructlab/generated/checkpoints/knowledge_compliance_personally-identifiable-information/data_checkpoint_0b9687e0abdd41f688fd204d84698410.jsonl
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant sear
[root@tyler-a100 instructlab]# cat /var/mnt/inststg1/instructlab/generated/knowledge_recipe_2024-08-17T15_42_00.yaml
datasets:
- path: node_datasets_2024-08-17T15_42_00/knowledge_compliance_personally-identifiable-information_p07.jsonl
sampling_size: 1.0
metadata:
sys_prompt: "I am, Red Hat\xAE Instruct Model based on Granite 7B, an AI language\
\ model developed by Red Hat and IBM Research, based on the Granite-7b-base language\
\ model. My primary function is to be a chat assistant."
[root@tyler-a100 instructlab]# cat /var/mnt/inststg1/instructlab/generated/skills_recipe_2024-08-17T15_42_00.yaml
datasets:
- path: /usr/share/instructlab/sdg/datasets/skills.jsonl
sampling_size: 1.0
- path: node_datasets_2024-08-17T15_42_00/knowledge_compliance_personally-identifiable-information_p10.jsonl
sampling_size: 1.0
- path: node_datasets_2024-08-17T15_42_00/compositional_skills_general_tables_editing_add_remove.jsonl
sampling_size: 30
- path: node_datasets_2024-08-17T15_42_00/compositional_skills_general_tables_editing_combining_altering.jsonl
sampling_size: 30
[root@tyler-a100 instructlab]# head -n 100 /var/mnt/inststg1/instructlab/generated/skills_train_msgs_2024-08-17T15_42_00.jsonl
{"messages":[{"content":"I am, Red Hat\u00ae Instruct Model based on Granite 7B, an AI language model developed by Red Hat and IBM Research, based on the Granite-7b-base language model. My primary function is to be a chat assistant.","role":"system"},{"content":"Suppose there are two gas stations, A and B, located on opposite sides of a highway. Both charge the same price, $2.50 per gallon, for gasoline. However, station A is 1 mile closer to the majority of drivers, while station B is 1 mile closer to the minority of drivers. The cost of driving 1 mile is $0.10. If station A and B both advertise their prices, which station will attract more customers and what will be their profits?","role":"user"},{"content":"To determine which station will attract more customers and their profits, we need to consider the cost of driving to each station for the majority and minority of drivers.\n\nL
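The training file previewed above is JSONL in which each line appears to hold a `messages` list of `{"content", "role"}` objects. The preview shows `system` and `user` entries before it is cut off. Here is a minimal sketch for tallying roles across such a file, assuming that structure holds for every line. `iter_messages` and `role_counts` are hypothetical helpers, not an InstructLab API.

```python
import json
import sys
from collections import Counter
from typing import Iterable, Iterator


def iter_messages(lines: Iterable[str]) -> Iterator[list]:
    """Yield the `messages` list from each non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        yield record["messages"]


def role_counts(lines: Iterable[str]) -> Counter:
    """Count how many messages of each role appear across all records."""
    counts: Counter = Counter()
    for messages in iter_messages(lines):
        counts.update(m["role"] for m in messages)
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 inspect_msgs.py <path to *_train_msgs_*.jsonl>
    with open(sys.argv[1], encoding="utf-8") as fh:
        for role, n in role_counts(fh).most_common():
            print(f"{role}\t{n}")
```

For example, `python3 inspect_msgs.py /var/mnt/inststg1/instructlab/generated/skills_train_msgs_2024-08-17T15_42_00.jsonl` (the script name is arbitrary). The file is streamed line by line, so large generated datasets do not need to fit in memory.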
[root@tyler-a100 instructlab]# cat /var/mnt/inststg1/instructlab/generated/knowledge_train_msgs_2024-08-17T15_42_00.jsonl
@relyt0925
relyt0925 / gist:1fd2ca8c1c9fc21c2108129fb6048d82
Created August 18, 2024 01:24
mmlubench_knowledge_compliance_personally-identifiable-information.jsonl
[root@tyler-a100 generated]# cat /var/mnt/inststg1/instructlab/generated//node_datasets_2024-08-17T15_42_00/mmlubench_knowledge_compliance_personally-identifiable-information.jsonl
{"icl_document":"hii","document":"# Personal Data\n\n## Overview\n\nPersonal data, also known as personal information or personally identifiable information (PII), is any information related to an identifiable person.\n\nThe abbreviation PII is widely accepted in the United States, but the phrase it abbreviates has four common variants based on personal or personally, and identifiable or identifying. Not all are equivalent, and for legal purposes the effective definitions vary depending on the jurisdiction and the purposes for which the term is being used. Under European Union and United Kingdom data protection regimes, which centre primarily on the General Data Protection Regulation (GDPR), the term \"personal data\" is significantly broader, and determines the scope of the regulatory regime.\n\nNational Institute of Standards an