Notes from AAAI 2020
@wecacuee, created February 19, 2020


  • Broad trends: graph networks, neuro-symbolic methods, hierarchies

Plenary Talks

Interesting Papers

  • An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies

  • [-] Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes. Tomáš Brázdil, Krishnendu Chatterjee, Petr Novotný, Jiří Vahala

  • Few-Shot Bayesian Imitation Learning with Logical Program Policies

  • Off-Policy Evaluation in Partially Observable Environments. Guy Tennenholtz, Shie Mannor, Uri Shalit. https://arxiv.org/abs/1909.03739

    • " Our work sits at an intersection between the fields of RL and Causal Inference."
    • "In Decoupled POMDPs, observed and unobserved states are separated into two distinct processes, with a coupling between them at each time step."
    • "we demonstrate the use of a well-known approach, Importance Sampling (IS): a reweighting of rewards generated by the be- havior policy, π b , such that they are equivalent to unbiased rewards from an evaluation policy π e ."
  • Deep Conservative Policy Iteration

  • Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

  • Querying to Find a Safe Policy Under Uncertain Safety Constraints in Markov Decision Processes. Shun Zhang, Edmund H. Durfee, Satinder Singh

  • Policy Search by Target Distribution Learning for Continuous Control

    • Chuheng Zhang, Yuanqi Li, Jian Li
    • https://arxiv.org/abs/1905.11041
    • "It is known that existing policy gradient methods (such as vanilla policy gradient, PPO, A2C) may suffer from overly large gradients when the current policy is close to determin- istic, leading to an unstable training process. We show that such instability can happen even in a very simple environment."
  • Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video

    • Jie Wu, Guanbin Li, Si Liu, Liang Lin
    • http://colalab.org/media/paper/AAAI2020-Tree-Structured.pdf
    • "Inspired by hu- man’s coarse-to-fine decision-making paradigm, we design a tree-structured policy to decompose complex action poli- cies and propose a more reasonable primitive action via two- stages selection, instead of using a flat policy that maps the state feature to action directly (He et al. 2019). As shown in the right half of Figure 2, the tree-structured policy consist- s of a root policy and a leaf policy at each time step. The root policy π r (a rt |s t ) decides which semantic branch will be primarily relied on. The leaf policy π l (a lt |s t , a rt ) consists of five sub-policies, which corresponds to five high-level semantic branches."
  • Gradient-­‐Aware Model-­‐based Policy Search

    • Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli
    • https://arxiv.org/abs/1909.04115
    • learns the environment model with a loss weighted by each sample's contribution to the policy gradient, so model accuracy is concentrated where it matters for policy improvement
  • Deterministic Value-Policy Gradients

  • Safe Linear Stochastic Bandits

    • https://arxiv.org/pdf/1911.09501.pdf
    • "the learner is required to select an arm with an expected reward that is no less than a predetermined (safe) threshold with high probability"
    • P(⟨xₜ, θ*⟩ ≥ b) ≥ 1 − δ, i.e., the chosen arm xₜ must clear the threshold b with probability at least 1 − δ (a generic confidence-bound check is sketched below)
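    • A minimal sketch of one standard way to enforce such a constraint: keep only arms whose lower confidence bound on ⟨x, θ*⟩ clears b (a generic LCB rule, not necessarily the paper's exact algorithm; names and the confidence radius beta are assumptions):

      ```python
      import numpy as np

      def safe_arms(arms, theta_hat, A_inv, b, beta):
          """Keep arms whose lower confidence bound on <x, theta*> clears b.

          theta_hat: regularized least-squares estimate of theta*
          A_inv:     inverse of the design (Gram) matrix
          beta:      confidence radius, chosen so the bound holds w.p. >= 1 - delta
          """
          safe = []
          for x in arms:
              lcb = x @ theta_hat - beta * np.sqrt(x @ A_inv @ x)
              if lcb >= b:
                  safe.append(x)
          return safe
      ```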
  • Planning and Acting with Non-deterministic Events: Navigating between Safe States

  • NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations

  • PsyNet: Self-supervised Approach to Object Localization using Point Symmetric Transformation

  • Learning and Reasoning for Robot Sequential Decision Making under Uncertainty

    • https://arxiv.org/abs/1901.05322
    • "In experi- ments, a mobile robot is tasked with estimating human in- tentions using their motion trajectories, declarative contex- tual knowledge, and human-robot interaction (dialog-based and motion-based)"
  • Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

  • Specifying Weight Priors in Bayesian Deep Neural Networks with Empirical Bayes

  • Collaborative Sampling for Generative Adversarial Networks
