Renaud Richardet renaud

PostgreSQL is Enough

Background and Cron Jobs

Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you have heard of ChatGPT, maybe played with it a little, and were impressed by it (or tried very hard not to be). And that you also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective of my thoughts on this (and similar) models, and where we stand with respect to language understanding.

Intro

Around 2014-2017, right within the rise of neural-network based methods for NLP, I was giving a semi-academic-semi-popsci lecture, revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked in an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs" to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We

	Current events of September 3, 1995 (1995-09-03) (Sunday) :
	eBay is founded.
	Current events of September 6, 1995 (1995-09-06) (Wednesday) :
	NATO air strikes against Bosnian Serb forces continue, after repeated attempts at a solution to the Bosnian War fail.
	Current events of September 19, 1995 (1995-09-19) (Tuesday) :
	The Washington Post and The New York Times publish the Unabomber's manifesto.
	Current events of September 22, 1995 (1995-09-22) (Friday) :
	American millionaire Steve Forbes announces his candidacy for the 1996 Republican presidential nomination.
	Current events of September 23, 1995 (1995-09-23) (Saturday) :

	from SPARQLWrapper import SPARQLWrapper, JSON
	sparql = SPARQLWrapper("http://localhost:8890/sparql")

	for i in range(25):
	    query = """
	select ?slabel ?olabel
	where {
	?s rdfs:subClassOf ?o.
	?s rdf:type owl:Class.
	?o rdf:type owl:Class.

	from gensim import models

	sentence = models.doc2vec.LabeledSentence(
	    words=[u'some', u'words', u'here'], tags=["SENT_0"])
	sentence1 = models.doc2vec.LabeledSentence(
	    words=[u'here', u'we', u'go'], tags=["SENT_1"])

	sentences = [sentence, sentence1]

	class LabeledLineSentence(object):

	"""A simple implementation of a greedy transition-based parser. Released under BSD license."""
	from os import path
	import os
	import sys
	from collections import defaultdict
	import random
	import time
	import pickle

	SHIFT = 0; RIGHT = 1; LEFT = 2;

	-- Two dashes start a one-line comment.

	--[[
	Adding two ['s and ]'s makes it a
	multi-line comment.
	--]]

	----------------------------------------------------
	-- 1. Variables and flow control.
	----------------------------------------------------

	package topic

	import spark.broadcast._
	import spark.SparkContext
	import spark.SparkContext._
	import spark.RDD
	import spark.storage.StorageLevel
	import scala.util.Random
	import scala.math.{ sqrt, log, pow, abs, exp, min, max }
	import scala.collection.mutable.HashMap

	"""
	Add copy to clipboard from IPython!
	To install, just copy it to your profile/startup directory, typically:

	~/.ipython/profile_default/startup/

	Example usage:

	%clip hello world
	# will store "hello world"

	% A simple test suite for PA 3
	%
	% copy the comparedata.m file from last week's test suite or from
	% http://www.mathworks.com/matlabcentral/fileexchange/1459-comparedata
	% into the directory for this week's assignment and save this file
	% as PA3Test.m
	%
	% A test can have three different results:
	% - If the test suite says "OK", your code produced exactly the same
	% output as the sample data.