@claysauruswrecks
Last active February 9, 2024 05:02
Research Proposal – Evolutionary Algorithms and Compression/Generalization of Training Data

Objective

The primary goal of this research is to harness evolutionary algorithms and genomic mappings for the development of efficient, effective, and hopefully sparse deep learning models. This entails two main areas of focus:

  1. Architecture Optimization for Data Compression
    • Identify the most computationally efficient architectures for compressing training data into the model, ensuring perfect recall. This exploration anticipates that different data types or skills might necessitate uniquely optimized architectures. A minimal sketch of such a recall check appears after this list.
  2. Application of Neural Growth and Pruning Cycles
    • The methodology involves adding layers, cross-connections, and mutations surrounding the frozen “base-model”, which contains the compressed training data. This model will undergo phased and gated training, initially focusing on data comprehension, followed by cognitive reasoning, and then extrapolation, intuition, and creativity. The phases will be designed to equip the models with capabilities to invent and derive solutions to historical problems that were out-of-distribution for human solvers at the time. Progression through phases will be contingent on the model continuing to pass the previous phase gates.
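
A rough illustration of the recall target in the first focus area: the sketch below memorizes a toy token dataset into a tiny network keyed by sequence index, then checks for exact reconstruction. It assumes PyTorch; the model shape, data sizes, and step count are illustrative placeholders, not a proposed architecture.

```python
# Hypothetical sketch: "compress" a toy dataset into a model's weights and verify perfect recall.
import torch
import torch.nn as nn

VOCAB, SEQ_LEN, N_SEQS = 128, 16, 64
data = torch.randint(0, VOCAB, (N_SEQS, SEQ_LEN))   # stand-in training data to memorize
keys = torch.arange(N_SEQS)                         # each sequence is recalled by its index

model = nn.Sequential(                              # deliberately tiny: the data lives only in the weights
    nn.Embedding(N_SEQS, 64),
    nn.Linear(64, VOCAB * SEQ_LEN),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):                            # overfit on purpose: compression into the model
    logits = model(keys).view(N_SEQS, SEQ_LEN, VOCAB)
    loss = loss_fn(logits.reshape(-1, VOCAB), data.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    recalled = model(keys).view(N_SEQS, SEQ_LEN, VOCAB).argmax(-1)
    print("perfect recall:", bool((recalled == data).all()))   # gate into the next phase
```

In the full project, an exact-match check like this would gate entry into the comprehension phase, and the evolutionary search would ask which genome reaches it with the least compute.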

Rationale

This research is inspired by the principle of energy gradient traversal (sun’s constant energy input into a semi-closed system), emulating the evolutionary efficiency refined through billions of years of:

f(a) <-> f(o)

The discovery of Rosetta Neurons in various vision models suggests that certain concepts are universally embedded in applied mathematics and can be learned across different models and modalities, transcending specific tasks or architectures. The idea is to map known performant architectures, components, settings, initializations, hyperparameters, etc. to a genomic space, with seed individuals being CNN, RNN, RWKV, LSTM, SSM, transformers, etc., and then compress data into them for perfect recall and comprehension. The generalization phase then applies biomimicry of neural growth and pruning cycles.
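
One way to make the genomic mapping concrete is a flat gene set over architecture families and hyperparameters with mutation and crossover operators. This is only a sketch: the proposal ultimately calls for genomes describing arbitrary graph structures, and the gene names, ranges, and seed families below are illustrative assumptions.

```python
# Hypothetical sketch of a genome encoding for seed individuals, with mutation and crossover.
import random

SEED_FAMILIES = ["cnn", "rnn", "lstm", "rwkv", "ssm", "transformer"]
GENE_SPACE = {
    "family":      SEED_FAMILIES,
    "depth":       list(range(2, 25)),
    "width":       [64, 128, 256, 512, 1024],
    "lr":          [1e-4, 3e-4, 1e-3, 3e-3],
    "init":        ["xavier", "kaiming", "orthogonal"],
    "cross_links": list(range(0, 8)),          # extra cross-connections added during growth
}

def random_genome():
    return {gene: random.choice(options) for gene, options in GENE_SPACE.items()}

def mutate(genome, rate=0.2):
    child = dict(genome)
    for gene, options in GENE_SPACE.items():
        if random.random() < rate:
            child[gene] = random.choice(options)
    return child

def crossover(a, b):
    return {gene: random.choice([a[gene], b[gene]]) for gene in GENE_SPACE}

population = [random_genome() for _ in range(16)]   # seed individuals
print(population[0])
print(mutate(population[0]))
```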

Research Overview

This project seeks to revolutionize deep learning architecture search through evolutionary algorithms, phased training, and data engineering driven from pedagogic first principles. The two goals are:

  1. Identify and implement the most efficient architectures for compressing diverse or narrowly scoped data types, ensuring perfect recall with the least amount of compute.
  2. Enhance the model’s capability for comprehension and generalization through cycles of neural growth and pruning.

This approach is inspired by biological evolution and pedagogy, aiming to develop neural networks that can generalize knowledge creatively.

Key components of this research include:

  1. Data Engineering: Creating or sourcing intuitive, foundational training datasets.
  2. Genomic Mapping: Translating architecture, configurations, and training procedures to a genomic space.
  3. Conceptualizing Rosetta Neurons: Leveraging these neurons for broader learning, interpretability, and possibly identifying comprehension neuron structures.
  4. Concept Library: Developing a repository of learned concepts and capabilities (a minimal data-structure sketch follows this list).
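
A possible shape for the concept library is sketched below: a keyed repository whose entries record which genome produced a concept, where its weights live, and which gates it passed. All field names and the example entry are assumptions made for illustration.

```python
# Hypothetical sketch of a concept library: named entries pointing at learned concept weights.
from dataclasses import dataclass, field

@dataclass
class ConceptEntry:
    name: str
    source_genome: dict            # the architecture/genome that produced this concept
    weights_path: str              # serialized adapter or module implementing the concept
    gate_scores: dict = field(default_factory=dict)   # comprehension / generalization gates passed

class ConceptLibrary:
    def __init__(self):
        self._entries = {}

    def add(self, entry: ConceptEntry):
        self._entries[entry.name] = entry

    def query(self, name: str) -> ConceptEntry:
        return self._entries[name]

library = ConceptLibrary()
library.add(ConceptEntry(
    name="ohms_law",
    source_genome={"family": "transformer", "depth": 6},
    weights_path="concepts/ohms_law.pt",
    gate_scores={"comprehension": 0.98, "generalization": 0.91},
))
print(library.query("ohms_law").gate_scores)
```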

Proposed Methodology

  1. Data Engineering
    • Driven by optimal pedagogic strategies, training datasets will be derived from first principles. They should be easy for children and adolescents to follow and understand, while being thorough enough to build foundational understanding on. These datasets will be open-sourced.
  2. Genomic Mapping
    • Map seed individuals, configurations, and training procedures to a genomic space that describes arbitrary graph structures of components, settings, and operations.
  3. Compression of Training Data
    • The initial step involves compressing the training data into the model to achieve perfect recall. This phase focuses on efficiently encoding information of different types and skills within the neural network’s architecture.
  4. Neural Growth – Layer Addition, Cross-Connections, and Mutations
    • Following the compression phase, the model undergoes a neural growth process. This involves adding new layers and cross-connections around the base data model. Each logical chunk of training data is synthetically expanded by capable AI models into textbook- and courseware-style questions, quizzes, and exams focused on comprehension. This may involve the models interacting with simulators for laboratory-style learning, such as hardware circuit simulations. The models will learn to intuit and extrapolate just as humans do!
  5. Phased Training for Generalization
    • After establishing a strong comprehension foundation, the model engages in further training for concept generalization. This phase simulates cognitive reasoning, enabling the model to extrapolate and apply learned concepts to new, unencountered scenarios, or intentionally omitted data.
    • Implement phased training focused on generalizing concepts, progressing to the next phase only after the validation loss on a generalization dataset decreases while the original pre-training and gate losses stay low. This indicates that the function approximation has extended to cover the unseen range without disrupting previous progress (a minimal sketch of this growth, gating, and pruning cycle follows this list).
    • This approach mirrors pedagogical strategies that lead students to develop confidence in comprehension and generalization skills, starting from basic principles and gradually introducing more complex concepts while ensuring previous progress is maintained.
    • The training data will be structured to follow the evolution of human knowledge, starting with fundamental concepts before introducing revolutionary breakthroughs. The validation data will include innovative human solutions to problems plaguing us for ages, challenging the model to generalize from known context to novel scenarios.
  6. Neural Pruning
    • After the generalization phase, neural pruning occurs. This process involves refining the model by removing redundant or less significant connections, enhancing efficiency and focus on relevant patterns and information.
  7. Cyclical Process
    • The entire process is cyclical, mimicking what happens in biology. Following pruning, the model re-enters the growth phase to adapt to new data or concepts, continually evolving its architecture, capabilities, and creativity.
  8. Visualization and Manipulation of Rosetta Neurons
    • Apply techniques to visualize and manipulate Rosetta Neurons, using them as handles for controlled edits on data and for providing interpretability.
  9. Curating a Dictionary of Concepts
    • Develop a library of concepts and capabilities that can be merged into models.
  10. Cross-Architecture Transform
    • Figure out a way to map the Rosetta Neurons into a transform so that they can be merged into models across architectures.
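
Tying steps 4–6 together, the sketch below wraps a frozen stand-in base model with newly grown trainable layers and a residual cross-connection, trains only the grown layers, applies magnitude pruning, and evaluates a toy phase gate. It assumes PyTorch; the models, data, thresholds, and gate criterion are placeholders for the real comprehension and generalization gates, not the proposal's final design.

```python
# Hypothetical sketch of one growth -> gated-training -> pruning cycle around a frozen base.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

base = nn.Linear(32, 32)                     # stands in for the frozen, data-compressed base model
for p in base.parameters():
    p.requires_grad = False                  # the base stays frozen; growth happens around it

class Grown(nn.Module):
    """Base model wrapped with newly grown trainable layers and a residual cross-connection."""
    def __init__(self, base):
        super().__init__()
        self.base = base
        self.grow_in = nn.Linear(32, 32)     # grown layer feeding the base
        self.grow_out = nn.Linear(32, 32)    # grown layer consuming the base's output
    def forward(self, x):
        h = torch.relu(self.grow_in(x))
        return self.grow_out(self.base(h)) + x   # cross-connection around the frozen base

def passes_gate(recall_loss, gen_loss, recall_max=0.05, gen_max=0.5):
    # Phase gate: generalization loss must be low while the recall/pre-training loss stays low.
    return recall_loss <= recall_max and gen_loss <= gen_max

model = Grown(base)
opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)

x = torch.randn(256, 32)                     # toy comprehension-phase inputs
y = x.roll(1, dims=-1)                       # toy targets
for _ in range(200):                         # train only the grown layers for one phase
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

# Pruning phase: drop the smallest-magnitude weights in the grown layers.
prune.l1_unstructured(model.grow_in, "weight", amount=0.5)
prune.l1_unstructured(model.grow_out, "weight", amount=0.5)

# 0.01 stands in for the frozen base's recall loss, which growth and pruning must not disturb.
print("gate passed:", passes_gate(recall_loss=0.01, gen_loss=float(loss)))
```

In the cyclical process of step 7, a failed gate would trigger another round of growth or mutation rather than progression to the next phase.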

Expected Outcomes

  1. Identification of optimal architectures for compressing different types and mixtures of data.
  2. Development and sourcing of datasets for foundational comprehension and generalization of concepts.
  3. Development of optimal architectures that wrap/adapt the compressed-data base model to exhibit further comprehension and cognitive reasoning skills and to generalize out of distribution.
  4. A library of concepts and capabilities that can be merged into models.
  5. A transform for merging Rosetta Neurons into existing models across architectures.
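
For the last outcome, one plausible starting point is to match units across two models by correlating their activations on shared probe inputs and then fitting a least-squares map between the matched activation spaces. This is a hypothetical sketch with random stand-in models, not the Rosetta Neurons procedure itself.

```python
# Hypothetical sketch: match units across two models via activation correlation, then fit a linear map.
import torch
import torch.nn as nn

probe = torch.randn(512, 64)                           # shared probe inputs
model_a = nn.Sequential(nn.Linear(64, 128), nn.ReLU())
model_b = nn.Sequential(nn.Linear(64, 96), nn.ReLU())

with torch.no_grad():
    act_a = model_a(probe)                             # (512, 128) unit activations
    act_b = model_b(probe)                             # (512, 96) unit activations

# Correlate every unit in A with every unit in B over the probe set.
a = (act_a - act_a.mean(0)) / (act_a.std(0) + 1e-8)
b = (act_b - act_b.mean(0)) / (act_b.std(0) + 1e-8)
corr = (a.T @ b) / probe.shape[0]                      # (128, 96) correlation matrix
match_for_a = corr.argmax(dim=1)                       # best-matching B unit for each A unit

# Least-squares transform from A's activation space into B's: a toy stand-in for the
# cross-architecture merging transform this outcome targets.
transform = torch.linalg.lstsq(act_a, act_b).solution  # (128, 96)
print(match_for_a[:5], transform.shape)
```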

Conclusion

This research proposes a novel approach to efficient neural network training and data engineering, drawing inspiration from evolutionary processes and educational methodologies.

By compressing the data into a base structure with perfect recall and then adding and mutating subsequent layers, we aim to find efficient structures for compression, comprehension, generalization, intuition, creativity, and causal learning.

Background

Clay has a foundation in engineering and research developed from childhood in isolated rural Texas, starting with only his imagination, Discovery/Science Channels (Aliens!), Popular Science/Mechanics magazines, programming books at 12 years old, and eventually dial-up in Junior High. Embracing autodidactism as a sport from beginnings in PBASIC, VB6, robotics, (L/B)AMP server admin, and CCDC (Collegiate Cyber Defense Competitions) has facilitated his transition into industry as a Twitter SRE, Pivotal Software Cloud Foundry Engineer, and then into scaling Loom’s infrastructure from $30M to $300M valuation. He recently intuited the vectors to jump in and surf the accelerating shockwave by delobotomizing alpaca-lora’s instruction dataset. He brings a unique perspective on learning algorithms, data processing, and the critical ability of learning how to learn and how to teach. This background has instilled a deep understanding of computational principles, path-finding, and precognition of the tech tree crucial for the acceleration of humanity through FOSS. He also hates the midwit voice oozing from AI writing and dropped out of college with a ~1.3 GPA.

License

This post is published under CC-BY.

This license enables users to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
