@relyt0925
Created August 18, 2024 19:30
knowledge checkpoint example
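To skim a checkpoint like the one dumped here without reading the full `raw_document` and `document` text in every record, something like the following may help. It is only a sketch: it assumes each record sits on a single line as one JSON object (JSONL) and carries the `question` and `response` keys seen in the output. The helper names `iter_records` and `question_answer_pairs` and the shortened `sample` records are hypothetical, made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def question_answer_pairs(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Collect (question, response) pairs, trimming surrounding whitespace."""
    return [
        (rec["question"], rec["response"].strip())
        for rec in iter_records(lines)
    ]


# Hypothetical, shortened records that reuse the checkpoint's field names.
# With a real file, pass an open file handle instead of this list.
sample = [
    '{"domain": "personally-identifiable-information", '
    '"dataset_type": "summary_detailed", '
    '"question": "What are the two categories of Personally Identifiable Information?", '
    '"response": "Personally Identifiable Information can be categorized into '
    'direct identifiers and indirect identifiers.\\n"}',
    '{"domain": "personally-identifiable-information", '
    '"dataset_type": "summary_detailed", '
    '"question": "What is the difference between sensitive and non-sensitive PII?", '
    '"response": "Sensitive PII directly identifies an individual.\\n"}',
]

pairs = question_answer_pairs(sample)
for question, response in pairs:
    print(f"Q: {question}\nA: {response}\n")
```

Because `iter_records` accepts any iterable of strings, the same functions should work on `open(path, encoding="utf-8")` for a checkpoint file on disk, provided the records are not wrapped across lines.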
[root@tyler-a100 instructlab]# cat /var/mnt/inststg1/instructlab/generated/checkpoints/knowledge_compliance_personally-identifiable-information/data_checkpoint_0b9687e0abdd41f688fd204d84698410.jsonl
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What are the consequences of PII being stolen or leaked?","response":"The consequences of PII being stolen or leaked include identity theft, sale on the black market, and ransomware attacks, leading to significant costs for individuals and information security professionals to maintain data privacy.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What are the two categories of Personally Identifiable Information?","response":"Personally Identifiable Information can be categorized into direct identifiers and indirect identifiers.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What is the difference between sensitive and non-sensitive PII?","response":"Sensitive PII directly identifies an individual and could cause significant harm if leaked or stolen, while non-sensitive PII may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"How does the context in which PII is used affect its sensitivity?","response":"The context in which PII is used plays a crucial role in determining its sensitivity. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What is the format of a typical Western name?","response":"The format of a typical Western name is FirstName LastName.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What is an example of sensitive PII?","response":"Examples of sensitive PII include social security numbers, unique identification numbers, biometric data, financial information, and medical records.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What is an example of non-sensitive PII?","response":"Examples of non-sensitive PII include a person's full name, mother's maid"}
{"icl_document":"hiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What is the definition of Personally Identifiable Information?","response":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number.\n"}
{"icl_document":"hiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. That's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What are the two categories of Personally Identifiable Information?","response":"Personally Identifiable Information (PII) can be categorized into two types: direct identifiers and indirect identifiers.\n"}
{"icl_document":"hiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. That's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What is an example of a direct identifier?","response":"Direct identifiers are unique to a person and can determine their identity. An example of a direct identifier is a passport or driver's license number.\n"}
{"icl_document":"hiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. That's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What is an example of an indirect identifier?","response":"Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n"}
{"icl_document":"hiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. That's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What is the difference between sensitive and non-sensitive PII?","response":"Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation.\n"}
{"icl_document":"hiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and information security professionals to maintain data privacy.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose a risk to the individual's privacy and security.\n\nThe context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"How does the context in which information is used determine whether it is sensitive or non-sensitive PII?","response":"The context in which the information is used plays a crucial role in determining whether it is sensitive or non-sensitive PII. For example, a full name on its own may be non-sensitive, but a list"}
{"icl_document":"hiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"PII, or Personally Identifiable Information, refers to any information linked to a specific individual that can be used to uncover their identity. This can include social security numbers, full names, email addresses, or phone numbers. As people increasingly rely on technology, the amount of PII shared with organizations has grown, leading to an increased risk of data breaches. Hackers can use stolen PII to commit identity theft, sell it on the black market, or hold it captive via ransomware.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined. For example, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be categorized as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, can directly identify an individual and cause significant harm if leaked or stolen. Non-sensitive PII, such as a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose a risk to individuals.\n\nData privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII may or may not be publicly available and may not require protection. However, many companies still put safeguards in place to protect non-sensitive PII due to the potential risks associated with combining multiple pieces of non-sensitive PII. The classification of PII as sensitive or non-sensitive depends heavily on context.","question":"What is the definition of Personally Identifiable Information (PII)?","response":"Personally Identifiable Information (PII) refers to any information linked to a specific individual that can be used to uncover their identity. This can include social security numbers, full names, email addresses, or phone numbers.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"PII, or Personally Identifiable Information, refers to any information linked to a specific individual that can be used to uncover their identity. This can include social security numbers, full names, email addresses, or phone numbers. As people increasingly rely on technology, the amount of PII shared with organizations has grown, leading to an increased risk of data breaches. Hackers can use stolen PII to commit identity theft, sell it on the black market, or hold it captive via ransomware.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined. For example, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be categorized as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, can directly identify an individual and cause significant harm if leaked or stolen. Non-sensitive PII, such as a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose a risk to individuals.\n\nData privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII may or may not be publicly available and may not require protection. However, many companies still put safeguards in place to protect non-sensitive PII due to the potential risks associated with combining multiple pieces of non-sensitive PII. The classification of PII as sensitive or non-sensitive depends heavily on context.","question":"What are the two types of PII and how do they differ?","response":"PII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"PII, or Personally Identifiable Information, refers to any information linked to a specific individual that can be used to uncover their identity. This can include social security numbers, full names, email addresses, or phone numbers. As people increasingly rely on technology, the amount of PII shared with organizations has grown, leading to an increased risk of data breaches. Hackers can use stolen PII to commit identity theft, sell it on the black market, or hold it captive via ransomware.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined. For example, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be categorized as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, can directly identify an individual and cause significant harm if leaked or stolen. Non-sensitive PII, such as a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose a risk to individuals.\n\nData privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII may or may not be publicly available and may not require protection. However, many companies still put safeguards in place to protect non-sensitive PII due to the potential risks associated with combining multiple pieces of non-sensitive PII. The classification of PII as sensitive or non-sensitive depends heavily on context.","question":"What are examples of sensitive PII and non-sensitive PII?","response":"Sensitive PII includes social security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"PII, or Personally Identifiable Information, refers to any information linked to a specific individual that can be used to uncover their identity. This can include social security numbers, full names, email addresses, or phone numbers. As people increasingly rely on technology, the amount of PII shared with organizations has grown, leading to an increased risk of data breaches. Hackers can use stolen PII to commit identity theft, sell it on the black market, or hold it captive via ransomware.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined. For example, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be categorized as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, can directly identify an individual and cause significant harm if leaked or stolen. Non-sensitive PII, such as a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose a risk to individuals.\n\nData privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII may or may not be publicly available and may not require protection. However, many companies still put safeguards in place to protect non-sensitive PII due to the potential risks associated with combining multiple pieces of non-sensitive PII. The classification of PII as sensitive or non-sensitive depends heavily on context.","question":"What are the potential risks associated with combining multiple pieces of non-sensitive PII?","response":"When combined, multiple pieces of non-sensitive PII can still pose a risk to individuals. For example, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"PII, or Personally Identifiable Information, refers to any information linked to a specific individual that can be used to uncover their identity. This can include social security numbers, full names, email addresses, or phone numbers. As people increasingly rely on technology, the amount of PII shared with organizations has grown, leading to an increased risk of data breaches. Hackers can use stolen PII to commit identity theft, sell it on the black market, or hold it captive via ransomware.\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined. For example, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be categorized as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, can directly identify an individual and cause significant harm if leaked or stolen. Non-sensitive PII, such as a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose a risk to individuals.\n\nData privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII may or may not be publicly available and may not require protection. However, many companies still put safeguards in place to protect non-sensitive PII due to the potential risks associated with combining multiple pieces of non-sensitive PII. The classification of PII as sensitive or non-sensitive depends heavily on context.","question":"What are the typical requirements of data privacy laws regarding PII?","response":"Data privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII may or may not be publicly available and may not require protection.\n"}
{"icl_document":"hiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"PII, or Personally Identifiable Information, refers to any information linked to a specific individual that can be used to uncover their identity. This can include social security numbers, full names, email addresses, or phone numbers. As people increasingly rely on technology, the amount of PII shared with organizations has grown, leading to an increased risk of data breaches. Hackers can use stolen PII to commit identity theft, sell it on the black market, or hold it captive via ransomware.\n\nPII comes in two types: direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined. For example, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may not cause significant harm if leaked or stolen in isolation. However, when combined, criminals could use non-sensitive PII to cause trouble, such as breaking into someone's bank account app.\n\nData privacy laws require organizations to safeguard sensitive PII, while some regulations don't require the protection of non-sensitive PII. However, many companies put safeguards in place anyway due to the potential misuse of non-sensitive PII. The classification of PII as sensitive or non-sensitive depends heavily on context.","question":"What is the definition of Personally Identifiable Information (PII)?","response":"Personally Identifiable Information (PII) refers to any information linked to a specific individual that can be used to uncover their identity. This can include social security numbers, full names, email addresses, or phone numbers.\n"}
{"icl_document":"hiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"PII, or Personally Identifiable Information, refers to any information linked to a specific individual that can be used to uncover their identity. This can include social security numbers, full names, email addresses, or phone numbers. As people increasingly rely on technology, the amount of PII shared with organizations has grown, leading to an increased risk of data breaches. Hackers can use stolen PII to commit identity theft, sell it on the black market, or hold it captive via ransomware.\n\nPII comes in two types: direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined. For example, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may not cause significant harm if leaked or stolen in isolation. However, when combined, criminals could use non-sensitive PII to cause trouble, such as breaking into someone's bank account app.\n\nData privacy laws require organizations to safeguard sensitive PII, while some regulations don't require the protection of non-sensitive PII. However, many companies put safeguards in place anyway due to the potential misuse of non-sensitive PII. The classification of PII as sensitive or non-sensitive depends heavily on context.","question":"What are the two types of PII and how do they differ?","response":"PII comes in two types: direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined.\n"}
{"icl_document":"hiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"PII, or Personally Identifiable Information, refers to any information linked to a specific individual that can be used to uncover their identity. This can include social security numbers, full names, email addresses, or phone numbers. As people increasingly rely on technology, the amount of PII shared with organizations has grown, leading to an increased risk of data breaches. Hackers can use stolen PII to commit identity theft, sell it on the black market, or hold it captive via ransomware.\n\nPII comes in two types: direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined. For example, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may not cause significant harm if leaked or stolen in isolation. However, when combined, criminals could use non-sensitive PII to cause trouble, such as breaking into someone's bank account app.\n\nData privacy laws require organizations to safeguard sensitive PII, while some regulations don't require the protection of non-sensitive PII. However, many companies put safeguards in place anyway due to the potential misuse of non-sensitive PII. The classification of PII as sensitive or non-sensitive depends heavily on context.","question":"What are examples of sensitive and non-sensitive PII?","response":"Sensitive PII includes social security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and organizations.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient to determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose risks.\n\nData privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures, while non-sensitive PII may or may not be protected depending on the regulations and the organization's policies. The classification of PII as sensitive or non-sensitive depends on the context, such as the specific use case or potential harm that could result from a breach.","question":"What is the definition of Personally Identifiable Information (PII)?","response":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and organizations.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient to determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose risks.\n\nData privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures, while non-sensitive PII may or may not be protected depending on the regulations and the organization's policies. The classification of PII as sensitive or non-sensitive depends on the context, such as the specific use case or potential harm that could result from a breach.","question":"What are the two categories of PII?","response":"PII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient to determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and organizations.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient to determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose risks.\n\nData privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures, while non-sensitive PII may or may not be protected depending on the regulations and the organization's policies. The classification of PII as sensitive or non-sensitive depends on the context, such as the specific use case or potential harm that could result from a breach.","question":"What is the difference between sensitive and non-sensitive PII?","response":"Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose risks.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and organizations.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient to determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose risks.\n\nData privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures, while non-sensitive PII may or may not be protected depending on the regulations and the organization's policies. The classification of PII as sensitive or non-sensitive depends on the context, such as the specific use case or potential harm that could result from a breach.","question":"What are some examples of direct identifiers?","response":"Examples of direct identifiers include passport numbers, driver's license numbers, and social security numbers.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_detailed","document":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and organizations.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient to determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose risks.\n\nData privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures, while non-sensitive PII may or may not be protected depending on the regulations and the organization's policies. The classification of PII as sensitive or non-sensitive depends on the context, such as the specific use case or potential harm that could result from a breach.","question":"What are some examples of indirect identifiers?","response":"Examples of indirect identifiers include race, place of birth, and ZIP code.\n"}
{"icl_document":"hiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Sensitive information becomes Personally Identifiable Information (PII) when it can be used to identify a specific individual. The context in which the information is presented plays a role in determining whether it is considered PII. For instance, aggregated anonymous geolocation data is generally seen as generic personal data because the identity of a single user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A case in point is a recent Federal Trade Commission (FTC) lawsuit against the data broker Kochava. The FTC argued that Kochava was selling geolocation data that qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. Specifically, the location of a mobile device at night is likely the user's home address, which can be combined with property records to uncover their identity.\n\nTechnological advancements are also making it easier to identify individuals with fewer pieces of information, potentially lowering the threshold for what is considered PII. For example, researchers at IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"What is the definition of Personally Identifiable Information (PII)?","response":"Personally Identifiable Information (PII) refers to sensitive information that can be used to identify a specific individual. The context in which the information is presented plays a role in determining whether it is considered PII.\n"}
{"icl_document":"hiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Sensitive information becomes Personally Identifiable Information (PII) when it can be used to identify a specific individual. The context in which the information is presented plays a role in determining whether it is considered PII. For instance, aggregated anonymous geolocation data is generally seen as generic personal data because the identity of a single user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A case in point is a recent Federal Trade Commission (FTC) lawsuit against the data broker Kochava. The FTC argued that Kochava was selling geolocation data that qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. Specifically, the location of a mobile device at night is likely the user's home address, which can be combined with property records to uncover their identity.\n\nTechnological advancements are also making it easier to identify individuals with fewer pieces of information, potentially lowering the threshold for what is considered PII. For example, researchers at IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"How does the context of information presentation determine whether it is considered PII?","response":"The context in which the information is presented plays a role in determining whether it is considered PII. For instance, aggregated anonymous geolocation data is generally seen as generic personal data because the identity of a single user cannot be isolated. However, individual records of anonymous geolocation data can become PII.\n"}
{"icl_document":"hiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Sensitive information becomes Personally Identifiable Information (PII) when it can be used to identify a specific individual. The context in which the information is presented plays a role in determining whether it is considered PII. For instance, aggregated anonymous geolocation data is generally seen as generic personal data because the identity of a single user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A case in point is a recent Federal Trade Commission (FTC) lawsuit against the data broker Kochava. The FTC argued that Kochava was selling geolocation data that qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. Specifically, the location of a mobile device at night is likely the user's home address, which can be combined with property records to uncover their identity.\n\nTechnological advancements are also making it easier to identify individuals with fewer pieces of information, potentially lowering the threshold for what is considered PII. For example, researchers at IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"What is an example of how technological advancements are making it easier to identify individuals with fewer pieces of information?","response":"Researchers at IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.\n"}
{"icl_document":"hiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Sensitive information becomes Personally Identifiable Information (PII) when it can be used to identify a specific individual. The context in which the information is presented plays a role in determining whether it is considered PII. For instance, aggregated anonymous geolocation data is generally seen as generic personal data because the identity of a single user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A case in point is a recent Federal Trade Commission (FTC) lawsuit against the data broker Kochava. The FTC argued that Kochava was selling geolocation data that qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. Specifically, the location of a mobile device at night is likely the user's home address, which can be combined with property records to uncover their identity.\n\nTechnological advancements are also making it easier to identify individuals with fewer pieces of information, potentially lowering the threshold for what is considered PII. For example, researchers at IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"What are the two categories of PII?","response":"PII is classified into two categories: direct identifiers and indirect identifiers.\n"}
{"icl_document":"hiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Sensitive information becomes Personally Identifiable Information (PII) when it can be used to identify a specific individual. The context in which the information is presented plays a role in determining whether it is considered PII. For instance, aggregated anonymous geolocation data is generally seen as generic personal data because the identity of a single user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A case in point is a recent Federal Trade Commission (FTC) lawsuit against the data broker Kochava. The FTC argued that Kochava was selling geolocation data that qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. Specifically, the location of a mobile device at night is likely the user's home address, which can be combined with property records to uncover their identity.\n\nTechnological advancements are also making it easier to identify individuals with fewer pieces of information, potentially lowering the threshold for what is considered PII. For example, researchers at IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"What is an example of a direct identifier?","response":"Examples of direct identifiers include full name, social security number, and home address.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Sensitive information becomes Personally Identifiable Information (PII) when it can be used to identify a specific individual. The context in which the information is presented plays a role in determining whether it is considered PII. For instance, aggregated anonymous geolocation data is generally seen as generic personal data because the identity of a single user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A case in point is a recent Federal Trade Commission (FTC) lawsuit against the data broker Kochava. The FTC argued that Kochava was selling geolocation data that qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. This was possible by combining the location of a mobile device at night, which is likely the user's home address, with property records to uncover their identity.\n\nTechnological advancements are also lowering the threshold for what is considered PII by making it easier to identify people with fewer pieces of information. For example, researchers at IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"What is the definition of Personally Identifiable Information (PII)?","response":"Personally Identifiable Information (PII) refers to sensitive information that can be used to identify a specific individual. The context in which the information is presented plays a role in determining whether it is considered PII.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Sensitive information becomes Personally Identifiable Information (PII) when it can be used to identify a specific individual. The context in which the information is presented plays a role in determining whether it is considered PII. For instance, aggregated anonymous geolocation data is generally seen as generic personal data because the identity of a single user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A case in point is a recent Federal Trade Commission (FTC) lawsuit against the data broker Kochava. The FTC argued that Kochava was selling geolocation data that qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. This was possible by combining the location of a mobile device at night, which is likely the user's home address, with property records to uncover their identity.\n\nTechnological advancements are also lowering the threshold for what is considered PII by making it easier to identify people with fewer pieces of information. For example, researchers at IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"How does the context of information presentation determine whether it is considered PII?","response":"The context in which the information is presented plays a role in determining whether it is considered PII. For instance, aggregated anonymous geolocation data is generally seen as generic personal data because the identity of a single user cannot be isolated. However, individual records of anonymous geolocation data can become PII.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Sensitive information becomes Personally Identifiable Information (PII) when it can be used to identify a specific individual. The context in which the information is presented plays a role in determining whether it is considered PII. For instance, aggregated anonymous geolocation data is generally seen as generic personal data because the identity of a single user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A case in point is a recent Federal Trade Commission (FTC) lawsuit against the data broker Kochava. The FTC argued that Kochava was selling geolocation data that qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. This was possible by combining the location of a mobile device at night, which is likely the user's home address, with property records to uncover their identity.\n\nTechnological advancements are also lowering the threshold for what is considered PII by making it easier to identify people with fewer pieces of information. For example, researchers at IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"What is an example of how anonymous location data can become PII?","response":"An example of how anonymous location data can become PII is a recent Federal Trade Commission (FTC) lawsuit against the data broker Kochava. The FTC argued that Kochava was selling geolocation data that qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_detailed","document":"Sensitive information becomes Personally Identifiable Information (PII) when it can be used to identify a specific individual. The context in which the information is presented plays a role in determining whether it is considered PII. For instance, aggregated anonymous geolocation data is generally seen as generic personal data because the identity of a single user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A case in point is a recent Federal Trade Commission (FTC) lawsuit against the data broker Kochava. The FTC argued that Kochava was selling geolocation data that qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. This was possible by combining the location of a mobile device at night, which is likely the user's home address, with property records to uncover their identity.\n\nTechnological advancements are also lowering the threshold for what is considered PII by making it easier to identify people with fewer pieces of information. For example, researchers at IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"How can indirect identifiers be used to identify a person?","response":"Indirect identifiers cannot be used to identify a person directly but can be combined with other data to identify a person. For example, researchers at IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some instances but sensitive in others.","question":"What is the definition of Personally Identifiable Information?","response":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some instances but sensitive in others.","question":"What are the two categories of Personally Identifiable Information?","response":"Personally Identifiable Information can be categorized into direct and indirect identifiers.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some instances but sensitive in others.","question":"What is the difference between direct and indirect identifiers?","response":"Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some instances but sensitive in others.","question":"What are examples of direct identifiers?","response":"Examples of direct identifiers include passport numbers, driver's license numbers, and Social Security numbers.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some instances but sensitive in others.","question":"What are examples of indirect identifiers?","response":"Examples of indirect identifiers include race, place of birth, and ZIP code.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some instances but sensitive in others.","question":"What is the definition of sensitive PII?","response":"Sensitive PII directly identifies an individual and could cause significant harm if leaked or stolen.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some instances but sensitive in others.","question":"What are examples of sensitive PII?","response":"Examples of sensitive PII include Social Security numbers, unique identification numbers, biometric data, financial information, and medical records.\n"}
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some instances but sensitive in others.","question":"What is the definition of non-sensitive PII?","response":"Non-sensitive PII would not cause significant harm to a person if leaked or stolen.\n"}
{"icl_document":"hiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some cases but sensitive in others.","question":"What is the definition of Personally Identifiable Information?","response":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number.\n"}
{"icl_document":"hiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some cases but sensitive in others.","question":"What are the two categories of Personally Identifiable Information?","response":"Personally Identifiable Information can be categorized into direct and indirect identifiers.\n"}
{"icl_document":"hiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some cases but sensitive in others.","question":"What is the difference between direct and indirect identifiers?","response":"Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined.\n"}
{"icl_document":"hiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some cases but sensitive in others.","question":"What is sensitive PII?","response":"Sensitive PII directly identifies an individual and could cause significant harm if leaked or stolen. Examples include Social Security numbers, unique identification numbers, biometric data, financial information, and medical records.\n"}
{"icl_document":"hiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to an increased interest from cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, with the average cost of a data breach caused by a ransomware attack being USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like a passport number or driver's license number, are unique to a person and can determine someone's identity with a single identifier. 
Indirect identifiers, such as race and place of birth, are not unique but can identify a person when combined; for instance, 87% of US citizens can be identified based on their gender, ZIP code, and date of birth.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, which directly identifies an individual and could cause significant harm if leaked or stolen, includes Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, which would not cause significant harm to a person if leaked or stolen, includes a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion. While not always required by data privacy laws, many companies still safeguard non-sensitive PII to prevent criminals from causing trouble by combining multiple pieces of non-sensitive PII. The sensitivity of PII depends on the context, as a full name or phone number can be non-sensitive in some cases but sensitive in others.","question":"What is non-sensitive PII?","response":"Non-sensitive PII would not cause significant harm to a person if leaked or stolen. Examples include a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to uncover their identity, such as social security number, full name, email address, or phone number. As people increasingly rely on technology, the sharing of PII has grown, with companies collecting personal data for market understanding and consumers willingly providing information for services and online shopping. While sharing PII has benefits, like tailored products and services, it also attracts cybercriminals who steal PII for identity theft, black market sales, or ransomware attacks, with an average cost of USD 5.13 million per data breach.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient for identification. 
Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, or place and date of birth, may or may not be unique but would not cause significant harm in isolation if leaked or stolen. However, when combined, non-sensitive PII can be used for malicious purposes, such as hacking into someone's bank account with their phone number, email address, and mother's maiden name.","question":"What is the definition of Personally Identifiable Information?","response":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to uncover their identity, such as social security number, full name, email address, or phone number.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to uncover their identity, such as social security number, full name, email address, or phone number. As people increasingly rely on technology, the sharing of PII has grown, with companies collecting personal data for market understanding and consumers willingly providing information for services and online shopping. While sharing PII has benefits, like tailored products and services, it also attracts cybercriminals who steal PII for identity theft, black market sales, or ransomware attacks, with an average cost of USD 5.13 million per data breach.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient for identification. 
Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, or place and date of birth, may or may not be unique but would not cause significant harm in isolation if leaked or stolen. However, when combined, non-sensitive PII can be used for malicious purposes, such as hacking into someone's bank account with their phone number, email address, and mother's maiden name.","question":"What are the two categories of Personally Identifiable Information?","response":"Personally Identifiable Information can be categorized into direct and indirect identifiers.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to uncover their identity, such as social security number, full name, email address, or phone number. As people increasingly rely on technology, the sharing of PII has grown, with companies collecting personal data for market understanding and consumers willingly providing information for services and online shopping. While sharing PII has benefits, like tailored products and services, it also attracts cybercriminals who steal PII for identity theft, black market sales, or ransomware attacks, with an average cost of USD 5.13 million per data breach.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient for identification. 
Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, or place and date of birth, may or may not be unique but would not cause significant harm in isolation if leaked or stolen. However, when combined, non-sensitive PII can be used for malicious purposes, such as hacking into someone's bank account with their phone number, email address, and mother's maiden name.","question":"What is the difference between direct and indirect identifiers?","response":"Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient for identification. Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to uncover their identity, such as social security number, full name, email address, or phone number. As people increasingly rely on technology, the sharing of PII has grown, with companies collecting personal data for market understanding and consumers willingly providing information for services and online shopping. While sharing PII has benefits, like tailored products and services, it also attracts cybercriminals who steal PII for identity theft, black market sales, or ransomware attacks, with an average cost of USD 5.13 million per data breach.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient for identification. 
Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, or place and date of birth, may or may not be unique but would not cause significant harm in isolation if leaked or stolen. However, when combined, non-sensitive PII can be used for malicious purposes, such as hacking into someone's bank account with their phone number, email address, and mother's maiden name.","question":"What is sensitive PII?","response":"Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to uncover their identity, such as social security number, full name, email address, or phone number. As people increasingly rely on technology, the sharing of PII has grown, with companies collecting personal data for market understanding and consumers willingly providing information for services and online shopping. While sharing PII has benefits, like tailored products and services, it also attracts cybercriminals who steal PII for identity theft, black market sales, or ransomware attacks, with an average cost of USD 5.13 million per data breach.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient for identification. 
Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, or place and date of birth, may or may not be unique but would not cause significant harm in isolation if leaked or stolen. However, when combined, non-sensitive PII can be used for malicious purposes, such as hacking into someone's bank account with their phone number, email address, and mother's maiden name.","question":"What is non-sensitive PII?","response":"Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, or place and date of birth, may or may not be unique but would not cause significant harm in isolation if leaked or stolen. However, when combined, non-sensitive PII can be used for malicious purposes, such as hacking into someone's bank account with their phone number, email address, and mother's maiden name.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to uncover their identity, such as social security number, full name, email address, or phone number. As people increasingly rely on technology, the sharing of PII has grown, with companies collecting personal data for market understanding and consumers willingly providing information for services and online shopping. While sharing PII has benefits, like tailored products and services, it also attracts cybercriminals who steal PII for identity theft, black market sales, or ransomware attacks, with an average cost of USD 5.13 million per data breach.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient for identification. 
Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, or place and date of birth, may or may not be unique but would not cause significant harm in isolation if leaked or stolen. However, when combined, non-sensitive PII can be used for malicious purposes, such as hacking into someone's bank account with their phone number, email address, and mother's maiden name.","question":"What is the format of a typical Western name?","response":"The format of a typical Western name is FirstName LastName, where FirstName is the given name and LastName is the family name. For example, James Smith"}
{"icl_document":"hiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal that person's identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their professional and personal lives, the sharing of PII with organizations has escalated. Companies collect personal data for market understanding, while consumers willingly share their contact information for services and online shopping. Although PII sharing has benefits like personalized products and services, it also attracts cybercriminals who steal this data for identity theft, black-market sales, or ransomware attacks. The IBM Cost of a Data Breach 2023 report states that the average cost of a data breach caused by ransomware is USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, unique to a person, include passport or driver's license numbers. 
Indirect identifiers, like race and place of birth, aren't unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, directly identifying an individual, could cause significant harm if leaked, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, like a person's full name or email address, may not cause significant harm in isolation but could be problematic when combined with other data.\n\nData privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures. Non-sensitive PII may or may not be unique to a person and could be publicly available. While not all regulations require protection for non-sensitive PII, companies often implement safeguards to prevent criminals from causing trouble by assembling multiple pieces of non-sensitive PII.","question":"What is the definition of Personally Identifiable Information?","response":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal that person's identity, such as their Social Security number, full name, email address, or phone number.\n"}
{"icl_document":"hiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal that person's identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their professional and personal lives, the sharing of PII with organizations has escalated. Companies collect personal data for market understanding, while consumers willingly share their contact information for services and online shopping. Although PII sharing has benefits like personalized products and services, it also attracts cybercriminals who steal this data for identity theft, black-market sales, or ransomware attacks. The IBM Cost of a Data Breach 2023 report states that the average cost of a data breach caused by ransomware is USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, unique to a person, include passport or driver's license numbers. 
Indirect identifiers, like race and place of birth, aren't unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, directly identifying an individual, could cause significant harm if leaked, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, like a person's full name or email address, may not cause significant harm in isolation but could be problematic when combined with other data.\n\nData privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures. Non-sensitive PII may or may not be unique to a person and could be publicly available. While not all regulations require protection for non-sensitive PII, companies often implement safeguards to prevent criminals from causing trouble by assembling multiple pieces of non-sensitive PII.","question":"What are the two categories of Personally Identifiable Information?","response":"Personally Identifiable Information can be categorized into direct and indirect identifiers.\n"}
{"icl_document":"hiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal that person's identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their professional and personal lives, the sharing of PII with organizations has escalated. Companies collect personal data for market understanding, while consumers willingly share their contact information for services and online shopping. Although PII sharing has benefits like personalized products and services, it also attracts cybercriminals who steal this data for identity theft, black-market sales, or ransomware attacks. The IBM Cost of a Data Breach 2023 report states that the average cost of a data breach caused by ransomware is USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, unique to a person, include passport or driver's license numbers. 
Indirect identifiers, like race and place of birth, aren't unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, directly identifying an individual, could cause significant harm if leaked, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, like a person's full name or email address, may not cause significant harm in isolation but could be problematic when combined with other data.\n\nData privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures. Non-sensitive PII may or may not be unique to a person and could be publicly available. While not all regulations require protection for non-sensitive PII, companies often implement safeguards to prevent criminals from causing trouble by assembling multiple pieces of non-sensitive PII.","question":"What are examples of direct identifiers?","response":"Examples of direct identifiers include passport or driver's license numbers.\n"}
{"icl_document":"hiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal that person's identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their professional and personal lives, the sharing of PII with organizations has escalated. Companies collect personal data for market understanding, while consumers willingly share their contact information for services and online shopping. Although PII sharing has benefits like personalized products and services, it also attracts cybercriminals who steal this data for identity theft, black-market sales, or ransomware attacks. The IBM Cost of a Data Breach 2023 report states that the average cost of a data breach caused by ransomware is USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, unique to a person, include passport or driver's license numbers. 
Indirect identifiers, like race and place of birth, aren't unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, directly identifying an individual, could cause significant harm if leaked, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, like a person's full name or email address, may not cause significant harm in isolation but could be problematic when combined with other data.\n\nData privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures. Non-sensitive PII may or may not be unique to a person and could be publicly available. While not all regulations require protection for non-sensitive PII, companies often implement safeguards to prevent criminals from causing trouble by assembling multiple pieces of non-sensitive PII.","question":"What are examples of indirect identifiers?","response":"Examples of indirect identifiers include race and place of birth.\n"}
{"icl_document":"hiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal that person's identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their professional and personal lives, the sharing of PII with organizations has escalated. Companies collect personal data for market understanding, while consumers willingly share their contact information for services and online shopping. Although PII sharing has benefits like personalized products and services, it also attracts cybercriminals who steal this data for identity theft, black-market sales, or ransomware attacks. The IBM Cost of a Data Breach 2023 report states that the average cost of a data breach caused by ransomware is USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, unique to a person, include passport or driver's license numbers. 
Indirect identifiers, like race and place of birth, aren't unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, directly identifying an individual, could cause significant harm if leaked, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, like a person's full name or email address, may not cause significant harm in isolation but could be problematic when combined with other data.\n\nData privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures. Non-sensitive PII may or may not be unique to a person and could be publicly available. While not all regulations require protection for non-sensitive PII, companies often implement safeguards to prevent criminals from causing trouble by assembling multiple pieces of non-sensitive PII.","question":"What is the difference between sensitive and non-sensitive PII?","response":"Sensitive PII, directly identifying an individual, could cause significant harm if leaked, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, like a person's full name or email address, may not cause significant harm in isolation but could be problematic when combined with other data.\n"}
{"icl_document":"hiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal that person's identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their professional and personal lives, the sharing of PII with organizations has escalated. Companies collect personal data for market understanding, while consumers willingly share their contact information for services and online shopping. Although PII sharing has benefits like personalized products and services, it also attracts cybercriminals who steal this data for identity theft, black-market sales, or ransomware attacks. The IBM Cost of a Data Breach 2023 report states that the average cost of a data breach caused by ransomware is USD 5.13 million.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, unique to a person, include passport or driver's license numbers. 
Indirect identifiers, like race and place of birth, aren't unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. Sensitive PII, directly identifying an individual, could cause significant harm if leaked, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records. Non-sensitive PII, like a person's full name or email address, may not cause significant harm in isolation but could be problematic when combined with other data.\n\nData privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures. Non-sensitive PII may or may not be unique to a person and could be publicly available. While not all regulations require protection for non-sensitive PII, companies often implement safeguards to prevent criminals from causing trouble by assembling multiple pieces of non-sensitive PII.","question":"What are some examples of data privacy laws that require organizations to safeguard sensitive PII?","response":"Data privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to benefits like personalized products and services but also attracting cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it for ransom, causing significant financial losses.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose risks. Data privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII protection varies depending on the context and regulations.","question":"What is the definition of Personally Identifiable Information (PII)?","response":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to benefits like personalized products and services but also attracting cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it for ransom, causing significant financial losses.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose risks. Data privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII protection varies depending on the context and regulations.","question":"What are the two categories of Personally Identifiable Information?","response":"Personally Identifiable Information can be categorized into direct and indirect identifiers.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to benefits like personalized products and services but also attracting cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it for ransom, causing significant financial losses.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose risks. Data privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII protection varies depending on the context and regulations.","question":"What is the difference between direct and indirect identifiers?","response":"Direct identifiers, like passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to benefits like personalized products and services but also attracting cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it for ransom, causing significant financial losses.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose risks. Data privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII protection varies depending on the context and regulations.","question":"What are examples of sensitive Personally Identifiable Information?","response":"Examples of sensitive Personally Identifiable Information include Social Security numbers, unique identification numbers, biometric data, financial information, and medical records.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to benefits like personalized products and services but also attracting cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it for ransom, causing significant financial losses.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose risks. Data privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII protection varies depending on the context and regulations.","question":"What are examples of non-sensitive Personally Identifiable Information?","response":"Examples of non-sensitive Personally Identifiable Information include a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to benefits like personalized products and services but also attracting cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it for ransom, causing significant financial losses.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose risks. Data privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII protection varies depending on the context and regulations.","question":"What are the potential consequences of sensitive Personally Identifiable Information being leaked or stolen?","response":"If sensitive Personally Identifiable Information is leaked or stolen, it could directly identify an individual and cause significant harm.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_extractive","document":"Personally Identifiable Information (PII) refers to any data linked to a specific individual that can be used to reveal their identity, such as their Social Security number, full name, email address, or phone number. As people increasingly use technology in their personal and professional lives, the sharing of PII with organizations has grown, leading to benefits like personalized products and services but also attracting cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it for ransom, causing significant financial losses.\n\nPII can be categorized into direct and indirect identifiers. Direct identifiers, like passport or driver's license numbers, are unique to a person and can determine their identity alone. Indirect identifiers, like race and place of birth, are not unique but can identify someone when combined, such as gender, ZIP code, and date of birth identifying 87% of US citizens.\n\nPII can also be classified as sensitive or non-sensitive. 
Sensitive PII, such as Social Security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they could still pose risks. Data privacy laws typically require organizations to safeguard sensitive PII, while non-sensitive PII protection varies depending on the context and regulations.","question":"What are the potential consequences of non-sensitive Personally Identifiable Information being leaked or stolen?","response":"If non-sensitive Personally Identifiable Information is leaked or stolen, it may or may not be unique to a person but would not cause significant harm"}
{"icl_document":"hiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Sensitive information can be classified as Personally Identifiable Information (PII) based on context. Generally, aggregated anonymous geolocation data is considered generic personal data since the identity of a specific user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A notable example is the Federal Trade Commission (FTC) lawsuit against data broker Kochava, which sold geolocation data that counted as PII. The FTC argued that Kochava's customized data feeds enabled purchasers to identify and track specific mobile device users, making it possible to link the location of a mobile device at night (likely the user's home address) with property records to uncover their identity.\n\nTechnological advancements are lowering the threshold for what is considered PII, as researchers can now identify individuals with fewer pieces of information. IBM and the University of Maryland developed an algorithm that combines anonymous location data with publicly available information from social networking sites to identify specific individuals, highlighting the evolving nature of PII.","question":"What is the difference between direct and indirect identifiers of Personally Identifiable Information?","response":"Direct identifiers are data elements that can be used to identify an individual without additional information, such as full name, social security number, and home address. Indirect identifiers are data elements that cannot be used to identify an individual without additional information, such as date of birth, gender, and race.\n"}
{"icl_document":"hiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Sensitive information can be classified as Personally Identifiable Information (PII) based on context. Generally, aggregated anonymous geolocation data is considered generic personal data since the identity of a specific user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A notable example is the Federal Trade Commission (FTC) lawsuit against data broker Kochava, which sold geolocation data that counted as PII. The FTC argued that Kochava's customized data feeds enabled purchasers to identify and track specific mobile device users, making it possible to link the location of a mobile device at night (likely the user's home address) with property records to uncover their identity.\n\nTechnological advancements are lowering the threshold for what is considered PII, as researchers can now identify individuals with fewer pieces of information. IBM and the University of Maryland developed an algorithm that combines anonymous location data with publicly available information from social networking sites to identify specific individuals, highlighting the evolving nature of PII.","question":"How can anonymous location data become Personally Identifiable Information?","response":"Anonymous location data can become Personally Identifiable Information when it is combined with publicly available information from social networking sites, as demonstrated by IBM and the University of Maryland's algorithm. This combination can enable the identification of specific individuals, even if the location data is initially anonymous.\n"}
{"icl_document":"hiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Sensitive information can be classified as Personally Identifiable Information (PII) based on context. Generally, aggregated anonymous geolocation data is considered generic personal data since the identity of a specific user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A notable example is the Federal Trade Commission (FTC) lawsuit against data broker Kochava, which sold geolocation data that counted as PII. The FTC argued that Kochava's customized data feeds enabled purchasers to identify and track specific mobile device users, making it possible to link the location of a mobile device at night (likely the user's home address) with property records to uncover their identity.\n\nTechnological advancements are lowering the threshold for what is considered PII, as researchers can now identify individuals with fewer pieces of information. IBM and the University of Maryland developed an algorithm that combines anonymous location data with publicly available information from social networking sites to identify specific individuals, highlighting the evolving nature of PII.","question":"What is an example of a direct identifier of Personally Identifiable Information?","response":"An example of a direct identifier of Personally Identifiable Information is a full name.\n"}
{"icl_document":"hiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Sensitive information can be classified as Personally Identifiable Information (PII) based on context. Generally, aggregated anonymous geolocation data is considered generic personal data since the identity of a specific user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A notable example is the Federal Trade Commission (FTC) lawsuit against data broker Kochava, which sold geolocation data that counted as PII. The FTC argued that Kochava's customized data feeds enabled purchasers to identify and track specific mobile device users, making it possible to link the location of a mobile device at night (likely the user's home address) with property records to uncover their identity.\n\nTechnological advancements are lowering the threshold for what is considered PII, as researchers can now identify individuals with fewer pieces of information. IBM and the University of Maryland developed an algorithm that combines anonymous location data with publicly available information from social networking sites to identify specific individuals, highlighting the evolving nature of PII.","question":"What is an example of how anonymous location data can become Personally Identifiable Information?","response":"An example of how anonymous location data can become Personally Identifiable Information is when it is combined with property records to uncover the identity of a mobile device user.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Sensitive information can be classified as Personally Identifiable Information (PII) based on context. Generally, aggregated anonymous geolocation data is considered generic personal data since the identity of a specific user cannot be isolated. However, individual records of anonymous geolocation data can become PII. This was highlighted in a Federal Trade Commission (FTC) lawsuit against data broker Kochava. The FTC argued that Kochava's geolocation data qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. This was possible by linking the location of a mobile device at night, which is likely the user's home address, with property records to uncover their identity. The advancement in technology is making it easier to identify individuals with fewer pieces of information, which could potentially lower the threshold for what is considered PII. This is demonstrated by an algorithm developed by researchers at IBM and the University of Maryland, which identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"What is the difference between direct and indirect identifiers?","response":"Direct identifiers are pieces of information that can be used to identify an individual without additional context, such as full name, social security number, and home address. Indirect identifiers, on the other hand, are pieces of information that can be used to identify an individual when combined with other data points, such as a date of birth alone, which is not considered PII, but when combined with a city of birth, it can be used to identify an individual.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Sensitive information can be classified as Personally Identifiable Information (PII) based on context. Generally, aggregated anonymous geolocation data is considered generic personal data since the identity of a specific user cannot be isolated. However, individual records of anonymous geolocation data can become PII. This was highlighted in a Federal Trade Commission (FTC) lawsuit against data broker Kochava. The FTC argued that Kochava's geolocation data qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. This was possible by linking the location of a mobile device at night, which is likely the user's home address, with property records to uncover their identity. The advancement in technology is making it easier to identify individuals with fewer pieces of information, which could potentially lower the threshold for what is considered PII. This is demonstrated by an algorithm developed by researchers at IBM and the University of Maryland, which identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"How can anonymous location data become PII?","response":"Anonymous location data can become PII when it is possible to identify and track specific mobile device users by linking the location of a mobile device at night, which is likely the user's home address, with property records to uncover their identity.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Sensitive information can be classified as Personally Identifiable Information (PII) based on context. Generally, aggregated anonymous geolocation data is considered generic personal data since the identity of a specific user cannot be isolated. However, individual records of anonymous geolocation data can become PII. This was highlighted in a Federal Trade Commission (FTC) lawsuit against data broker Kochava. The FTC argued that Kochava's geolocation data qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. This was possible by linking the location of a mobile device at night, which is likely the user's home address, with property records to uncover their identity. The advancement in technology is making it easier to identify individuals with fewer pieces of information, which could potentially lower the threshold for what is considered PII. This is demonstrated by an algorithm developed by researchers at IBM and the University of Maryland, which identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"How can indirect identifiers be used to identify an individual?","response":"Indirect identifiers can be used to identify an individual when combined with other data points. For instance, a date of birth alone is not considered PII, but when combined with a city of birth, it can be used to identify an individual.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"summary_extractive","document":"Sensitive information can be classified as Personally Identifiable Information (PII) based on context. Generally, aggregated anonymous geolocation data is considered generic personal data since the identity of a specific user cannot be isolated. However, individual records of anonymous geolocation data can become PII. This was highlighted in a Federal Trade Commission (FTC) lawsuit against data broker Kochava. The FTC argued that Kochava's geolocation data qualified as PII because the company's customized data feeds enabled purchasers to identify and track specific mobile device users. This was possible by linking the location of a mobile device at night, which is likely the user's home address, with property records to uncover their identity. The advancement in technology is making it easier to identify individuals with fewer pieces of information, which could potentially lower the threshold for what is considered PII. This is demonstrated by an algorithm developed by researchers at IBM and the University of Maryland, which identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"What is an example of a direct identifier?","response":"An example of a direct identifier is a full name.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). 
This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"summary_extractive","document":"Sensitive information can be classified as Personally Identifiable Information (PII) based on context. Generally, aggregated anonymous geolocation data is considered generic personal data since the identity of a specific user cannot be isolated. However, individual records of anonymous geolocation data can become PII. A notable example is the Federal Trade Commission (FTC) lawsuit against data broker Kochava, which sold geolocation data that counted as PII. The FTC argued that Kochava's customized data feeds enabled purchasers to identify and track specific mobile device users, making it possible to link a user's location at night (likely their home address) with property records to uncover their identity.\n\nTechnological advancements are lowering the threshold for what is considered PII, as fewer pieces of information can now be used to identify individuals. 
Researchers from IBM and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"What is the difference between direct and indirect identifiers of Personally Identifiable Information?","response":"Direct identifiers are data elements that can be used to identify an individual directly, such as a name, social security number, or email address. Indirect identifiers are data elements that cannot be used to identify an individual directly but can be combined with other data elements to identify an individual. Examples of indirect identifiers include date of birth, gender, and zip code.\n"}
{"icl_document":"hiiiiiiiiiiiiiiiiii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. 
For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. 
Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. 
Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What is the acronym for Personally Identifiable Information?","icl_query_3":"What are the two types of Personally Identifiable Information?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"PII is the acronym for Personally Identifiable Information.","icl_response_3":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","route":"_B_","dataset_type":"base_document","document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.\n\nHackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. 
According to IBM\u2019s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.\n\n## Direct versus indirect identifiers\n\nPII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.\n\nIndirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. For example, 87% of US citizens (link resides outside ibm.com) could be identified based on nothing more than their gender, ZIP code and date of birth.\n\n## Sensitive PII versus non-sensitive PII\n\nNot all personal data is considered PII. For example, data about a person's streaming habits isn't PII. It's because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII only refers to information that points to a particular person, such as the kind of information you might supply to verify your identity when contacting your bank.\n\nAmong PII, some pieces of information are more sensitive than others. Sensitive PII is sensitive information that directly identifies an individual and could cause significant harm if leaked or stolen.\n\nA social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. 
Other examples of sensitive PII include:\n\nUnique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.\nBiometric data, such as fingerprints and retinal scans.\nFinancial information, including bank account numbers and credit card numbers.\nMedical records.\nSensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.\n\nNon-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. Other examples of non-sensitive PII include:\n\nA person's full name\nMother's maiden name\nTelephone number\nIP address\nPlace of birth\nDate of birth\nGeographical details (ZIP code, city, state, country, etc.)\nEmployment information\nEmail address or mailing address\nRace or ethnicity\nReligion\nNon-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.\n\nFor example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name. The email gives them a username. Spoofing the phone number allows the hackers to receive a verification code. 
The mother's maiden name provides an answer to the security question.\n\nIt's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.","question":"What is the definition of Personally Identifiable Information (PII)?","response":"Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number.\n\n\n[QUEST"}
{"icl_document":"hiiiiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"base_document","document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"What is the difference between direct and indirect identifiers?","response":"Direct identifiers are pieces of information that can be used to identify a person directly, such as a name, social security number, or phone number. Indirect identifiers, on the other hand, are pieces of information that can be used to identify a person indirectly, such as a person's job title, birthdate, or zip code.\n"}
{"icl_document":"hiiiiiiiiiii","raw_document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","document_outline":"To teach a language model facts about Personally Identifiable Information","domain":"personally-identifiable-information","icl_query_1":"What are examples of personally identifiable information?","icl_query_2":"What are the two types of Personally Identifiable Information?","icl_query_3":"What is the format of a typical Western name with an example?","icl_response_1":"Examples of Personally Identifiable Information include social security numbers, names, and addresses.","icl_response_2":"Direct identifiers and indirect identifiers are the two types of Personally Identifiable Information.","icl_response_3":"FirstName LastName. 
An example name is James Smith","route":"_B_","dataset_type":"base_document","document":"### When does sensitive information become PII?\n\nContext also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated.\n\nHowever, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit (link resides outside ibm.com).\n\nThe FTC argues that the data broker Kochava was selling geolocation data that counted as PII because \"the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity.\"\n\nAdvances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM\u00ae and the University of Maryland have devised an algorithm (link resides outside ibm.com). This algorithm identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.","question":"How can context determine whether something is considered PII?","response":"Context can determine whether something is considered PII because aggregated anonymous geolocation data is often seen as generic personal data, but individual records of anonymous geolocation data can become PII if the company's customized data feeds allow purchasers to identify and track specific mobile device users.\n"}
[root@tyler-a100 instructlab]#