
Commit ce29cea

add autopasta link
1 parent e64fea6 commit ce29cea

3 files changed: +52 additions, −14 deletions

_includes/01_research.html

Lines changed: 17 additions & 1 deletion

@@ -41,7 +41,9 @@ <h2 style="text-align: center; margin-top: -150px;"> Research</h2>
 <br>
 <a href="https://arxiv.org/abs/2310.14034">tree prompting</a> - improve black-box few-shot text classification
 with decision trees<br>
-<a href="https://arxiv.org/abs/2311.02262">attention steering</a> - guide LLMs by emphasizing specific input
+<a href="https://arxiv.org/abs/2311.02262">attention steering</a> / <a
+href="https://arxiv.org/abs/2409.10790">automatic attention steering</a> - guide LLMs by
+emphasizing specific input
 spans<br>
 <a href="https://arxiv.org/abs/2210.01848">interpretable autoprompting</a> - automatically find fluent
 natural-language prompts<br>

@@ -196,6 +198,20 @@ <h2 style="text-align: center; margin-top: -150px;"> Research</h2>
 <td class="med">
 </td>
 </tr>
+
+<tr>
+<td class="center">'24</td>
+<td>Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering
+</td>
+<td>zhang*, yu*, et al.</td>
+<td class="med">🔎🌀</td>
+<td class="center"><a href="https://arxiv.org/abs/2409.10790">arxiv</a></td>
+<td class="big"><a href="https://github.com/QingruZhang/AutoPASTA"><i class="fa fa-github fa-fw"></i></a>
+</td>
+<td class="med">
+</td>
+</tr>
+
 <tr>
 <td class="center">'24</td>
 <td>Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
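The two steering papers linked in this file share one mechanism: re-weighting attention toward user-marked input spans. A minimal toy sketch of that re-weighting in pure Python (function name and the `alpha` parameter are illustrative, not taken from the PASTA/AutoPASTA code):

```python
def steer_attention(weights, emphasized, alpha=0.01):
    """Downweight attention on positions outside the emphasized span,
    then renormalize so the row still sums to 1 (illustrative sketch)."""
    scaled = [w if i in emphasized else w * alpha for i, w in enumerate(weights)]
    total = sum(scaled)
    return [w / total for w in scaled]

# uniform attention over 4 tokens; fully emphasize tokens 1 and 2 (alpha=0)
attn = [0.25, 0.25, 0.25, 0.25]
out = steer_attention(attn, {1, 2}, alpha=0.0)
print(out)  # [0.0, 0.5, 0.5, 0.0]
```

With `alpha=0` all attention mass moves onto the emphasized span; the papers instead use a small positive scale on selected heads.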

_notes/ml/nlp.md

Lines changed: 20 additions & 0 deletions

@@ -74,16 +74,36 @@ Nice repo keeping track of progress [here](https://github.com/sebastianruder/NLP
 - [QNLI](https://rajpurkar.github.io/SQuAD-explorer/) (Question-answering Natural Language Inference) - determine if the answer to a question is contained in a second sentence or not
 - [RTE](https://aclweb.org/aclwiki/Recognizing_Textual_Entailment) (Recognizing Textual Entailment) - determine if a sentence entails a given hypothesis or not
 - [WNLI](https://cs.nyu.edu/~davise/papers/WinogradSchemas/WS.html) (Winograd Natural Language Inference) - determine if a sentence with an anonymous pronoun and a sentence with this pronoun replaced are entailed or not
+
 - more NLI (natural language inference)
 - ANLI: Adversarial NLI ([nie et al. 2019](https://arxiv.org/abs/1910.14599)) - harder examples found by model failures
 - SNLI Benchmark ([bowman et al. 2015](https://arxiv.org/abs/1508.05326)) = Stanford Natural Language Inference - entailment dataset
 - 570k human-annotated sentence pairs where people ask about entailment
 - FEVER: Fact Extraction and VERification ([Thorne et al., 2018](https://aclanthology.org/N18-1074/))
 - SciTail ([khot et al. 2018](https://ojs.aaai.org/index.php/AAAI/article/view/12022)) - textual entailment derived from science-question answering
+
 - QA
 - SQuAD 2.0 ([Rajpurkar...liang, 2018](https://arxiv.org/abs/1806.03822)) - adds 50k unanswerable questions; system must know when it can't answer
 - SQuAD ([Rajpurkar...liang, 2016](https://arxiv.org/abs/1606.05250)) - Stanford Question Answering Dataset (SQuAD) - 100k questions from 23k passages in 500 wikipedia articles

+- Text classification datasets (used in [Tree-Prompt](https://arxiv.org/pdf/2310.14034) and [Aug-imodels](https://www.nature.com/articles/s41467-023-43713-1))
+
+| Dataset | Task | Classes | Example text | Label |
+| :-----: | ---- | :-----: | :----------: | :---: |
+| SST2 | Movie review sentiment | 2 | that loves its characters and communicates something rather beautiful about human nature | positive |
+| SUBJ | subjective vs objective | 2 | the script isn't very good; not even someone as gifted as hoffman ( the actor ) can make it work. | subjective |
+| MPQA | question answer sentiment | 2 | victory of democracy | positive |
+| AGNews | classify news titles | 4 | Wall St. Bears Claw Back Into the Black (Reuters). "Reuters - Short-sellers, Wall Street's dwindling band of ultra-cynics, are seeing green again." | business |
+| CB | given a text and a clause, predict how much the text commits to the clause | 3 | Premise: "Do you mind if I use your phone?" Ronni could see that Guido's brain was whirring. Hypothesis: Guido's brain was whirring | entailment |
+| CR | customer review sentiment | 2 | i didn 't have any major problems installing this software . | positive |
+| DBPedia | Categories in wikipedia | 14 | Geoffrey D. Falksen (born July 31 1982) is an American steampunk writer. | artist |
+| MR | Movie review sentiment | 2 | the film is flat . | negative |
+| RTE | Entailment | 2 | Sentence 1: No Weapons of Mass Destruction Found in Iraq Yet. Sentence 2: "Weapons of Mass Destruction Found in Iraq. | not_entailment |
+| TREC | Classifying questions | 6 | What 's known as The queen of Drinks ? | entity |
+| FPB | Financial phrase sentiment | 3 | According to Gran, the company has no plans to move all production to Russia, although that is where the company is growing. | neutral |
+| IMDB | Movie review sentiment | 2 | would put this at the top of my list of films in the category of unwatchable trash! [...] | negative |
+| Emotion | Tweet emotion classification | 6 | i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake | sadness |
+
 **common data sources**

 - WSJ
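The datasets in the table above are used as few-shot prompting benchmarks. A toy sketch of how such a dataset is typically turned into a prompt with a label verbalizer (template, helper name, and demo strings are illustrative, not from either paper):

```python
def build_prompt(demos, test_text, verbalizer):
    """Toy few-shot prompt builder for a binary sentiment task like SST2/MR."""
    blocks = [f"Review: {text}\nSentiment: {verbalizer[label]}" for text, label in demos]
    blocks.append(f"Review: {test_text}\nSentiment:")  # model completes the label word
    return "\n\n".join(blocks)

demos = [("the film is flat .", 0),
         ("loves its characters and communicates something beautiful", 1)]
prompt = build_prompt(demos, "an unwatchable mess", {0: "negative", 1: "positive"})
print(prompt)
```

The classifier's prediction is read off from which verbalizer word the model assigns higher probability after the final "Sentiment:".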

_notes/research_ovws/ovw_llms.md

Lines changed: 15 additions & 13 deletions
@@ -448,7 +448,7 @@ See related papers in the [📌 interpretability](https://csinva.io/notes/resear
 - prompting = few-shot learning = priming = in-context learning (starts with GPT)
 - prompting without changing any model parameters
 - limitation: can't exploit sets longer than the training window
-- MetaICL: Learning to Learn In Context ([min et al. 2022](https://arxiv.org/abs/2110.15943)) - tune LLM to do in-context learning on a large set of training tasks (few-show prompting and training time and at test-time)
+- MetaICL: Learning to Learn In Context ([min et al. 2022](https://arxiv.org/abs/2110.15943)) - tune LLM to do in-context learning on a large set of training tasks (few-shot prompting at training time and at test-time)
 - Visual Prompting via Image Inpainting ([bar...darrell, globerson, efros, 2022](https://arxiv.org/abs/2209.00647))
 - Pattern-Exploiting Training (PET) -- Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference ([schick & schutze, 2021](https://aclanthology.org/2021.eacl-main.20.pdf))
 - **cloze questions** - same as masked language modeling: task is to replace some missing words
@@ -485,7 +485,7 @@ See related papers in the [📌 interpretability](https://csinva.io/notes/resear

 - Teach Llamas to Talk: Recent Progress in Instruction Tuning ([gao blogpost 2023](https://gaotianyu.xyz/blog/2023/11/30/instruction-tuning/))

-- Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs ([zhang et al. 2023](https://arxiv.org/abs/2311.02262))
+- Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs, PASTA ([zhang et al. 2023](https://arxiv.org/abs/2311.02262))
 - The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ([sharma...misra, 2023](https://arxiv.org/abs/2312.13558))
 - human feedback
 - Learning to summarize with human feedback ([OpenAI, 2020](https://proceedings.neurips.cc/paper/2020/hash/1f89885d556929e98d3ef9b86448f951-Abstract.html))
@@ -581,15 +581,15 @@ Editing is generally very similar to just adaptation/finetuning. One distinction
 - then, perform generation with latent embedding
 - learn linear transformation given a dataset of examples with attributes and desired completions
 - (also regularize the model to not change *too much* on other stuff)
-- Activation Addition: Steering Language Models Without Optimization ([turner...macdiarmid, 2023](https://arxiv.org/abs/2308.10248))
-- blog post: activation engineering: Steering GPT-2-XL by adding an activation vector ([turner, ..., mini, 2023](https://www.alignmentforum.org/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector#6__The_Eiffel_Tower_is_in_Rome))
-- obtain "steering vector" by embedding a phrase (e.g. *love*) and adding that vector to the llm embedding during generation
-- they only add the embedding for some layers for some tokens
-- Extracting Latent Steering Vectors from Pretrained Language Models ([subramani, ..., peters, 2022](https://arxiv.org/abs/2205.05124)) - find latent vectors via optimization that cause an LLM to output a particular sequence
-- then, use these vectors to do things like transfer to new tasks / compute textual similarity
-- Function Vectors in LLMs ([todd...wallace, bau, 2023](https://arxiv.org/pdf/2310.15213.pdf))
-- In-Context Learning Creates Task Vectors ([hendel, geva, & globerson, 2023](https://arxiv.org/pdf/2310.15916))
-
+- Activation Addition: Steering Language Models Without Optimization ([turner...macdiarmid, 2023](https://arxiv.org/abs/2308.10248))
+- blog post: activation engineering: Steering GPT-2-XL by adding an activation vector ([turner, ..., mini, 2023](https://www.alignmentforum.org/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector#6__The_Eiffel_Tower_is_in_Rome))
+- obtain "steering vector" by embedding a phrase (e.g. *love*) and adding that vector to the llm embedding during generation
+- they only add the embedding for some layers for some tokens
+- Extracting Latent Steering Vectors from Pretrained Language Models ([subramani, ..., peters, 2022](https://arxiv.org/abs/2205.05124)) - find latent vectors via optimization that cause an LLM to output a particular sequence
+- then, use these vectors to do things like transfer to new tasks / compute textual similarity
+- Function Vectors in LLMs ([todd...wallace, bau, 2023](https://arxiv.org/pdf/2310.15213.pdf))
+- In-Context Learning Creates Task Vectors ([hendel, geva, & globerson, 2023](https://arxiv.org/pdf/2310.15916))
+- Programming Refusal with Conditional Activation Steering ([lee...dhurandhar, 2024](https://arxiv.org/abs/2409.05907))
 - PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions ([chen...sameer singh...kelvin guu, 2023](https://drive.google.com/file/d/1CXSUii4w8Y2uj-zLm8zRl63SYh45FaZL/view))
 - new datasets
 - MQUAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ([zhong...manning, potts, chen, 2023](https://www.cs.princeton.edu/~zzhong/papers/MQuAKE.pdf)) - introduces benchmark MQUAKE + method MeLLo, which stores edited facts externally while prompting the language model iteratively to generate answers that are consistent with the edited facts
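The activation-addition bullets in this hunk describe a concrete recipe: take the difference between activations for a contrast pair of phrases as a "steering vector", then add it (scaled) to hidden states at chosen layers during generation. A toy numeric sketch under those assumptions (names and vectors are hypothetical, not the ActAdd code):

```python
def steering_vector(h_pos, h_neg):
    """Difference of hidden states for a contrast pair (e.g. *love* vs *hate*)."""
    return [p - n for p, n in zip(h_pos, h_neg)]

def apply_steering(hidden, vec, coeff=1.0):
    """Add the scaled steering vector to a hidden state at one chosen layer."""
    return [h + coeff * v for h, v in zip(hidden, vec)]

h_love, h_hate = [1.0, 0.0], [0.0, 1.0]   # stand-ins for phrase activations
vec = steering_vector(h_love, h_hate)      # [1.0, -1.0]
print(apply_steering([0.5, 0.5], vec, coeff=2.0))  # [2.5, -1.5]
```

In the papers this addition is applied only at some layers and token positions, and `coeff` controls steering strength.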
@@ -1200,7 +1200,9 @@ mixture of experts models have become popular because of the need for (1) fast s
 - Jailbreaking Proprietary Large Language Models using Word Substitution Cipher ([Handa…Baral 2024](https://arxiv.org/abs/2402.10601)): short, nice paper! Just substitute unsafe words with safe words, provide the mapping to the model along with the original question rewritten using those words, and ask the LLM to reply; high ASR for ChatGPT and Gemini.
 - CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models ([Lv…Huang 2024](https://arxiv.org/abs/2402.16717)): here the malicious question is asked via code, where the input sentence is encrypted using simple coding schemes (reverse the words or sort them by length) and the code includes the decryption function. Highest ASR among all baselines, which include CipherChat and multilingual attacks.
 - MULTIVERSE: Exposing Large Language Model Alignment Problems in Diverse Worlds ([Jin…Zhang 2024](https://arxiv.org/abs/2402.01706)): not a cipher attack; it creates several layers of alternate worlds in which a malicious query can be placed to bypass model safety. The deeper the layers, the higher the attack's ASR.
-- People have used other modalities like images, voice, to attack models. One can think of other modalities as a generalization of different languages.
+- Data Contamination Can Cross Language Barriers ([feng yao, yufan zhuang, ..., jingbo shang](https://arxiv.org/html/2406.13236v1)) - LLMs can overfit to benchmarks by being trained on translations of them
+- To detect this contamination, for each question, we replace all the incorrect choices with correct choices taken from other questions
+

 # applications
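The contamination probe added in this hunk (replace every incorrect choice with correct answers borrowed from other questions, so a memorizing model still "recognizes" the original) can be sketched as a probe builder; data and helper names below are illustrative, not from the paper:

```python
import random

def build_probe(questions, i, rng):
    """Build a probe for question i: its distractors are replaced by
    correct answers sampled from the *other* questions (toy sketch)."""
    others = [q["answer"] for j, q in enumerate(questions) if j != i]
    options = rng.sample(others, k=3) + [questions[i]["answer"]]
    rng.shuffle(options)
    return {"question": questions[i]["question"], "options": options}

qs = [{"question": f"Q{k}?", "answer": f"A{k}"} for k in range(5)]
probe = build_probe(qs, 0, random.Random(0))
print(probe["options"])  # every option is correct for *some* question; "A0" is among them
```

A model that generalizes should be near chance on such probes, while a contaminated model still picks the memorized original answer.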
@@ -1297,13 +1299,13 @@ mixture of experts models have become popular because of the need for (1) fast s
 - Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression ([raventos…ganguli, 2023](https://openreview.net/forum?id=BtAz4a5xDg))
 - Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models ([fu...sharan, 2023](https://arxiv.org/abs/2310.17086))
 - How Well Can Transformers Emulate In-context Newton’s Method? ([giannou...papailiopoulos, & lee, 2024](https://arxiv.org/pdf/2403.03183v1.pdf))
-
 - Teaching Algorithmic Reasoning via In-context Learning ([zhou...sedghi, 2022](https://arxiv.org/abs/2211.09066))
 - Looped Transformers as Programmable Computers ([giannou, ..., jason lee, papailiopoulos, 2023](https://arxiv.org/abs/2301.13196)) - use transformers as universal computers by programming them with specific weights
 - Learning mathematical problems ([francois charton](https://scholar.google.com/citations?hl=en&user=1tMnd-4AAAAJ&view_op=list_works&sortby=pubdate))
 - Probing the Decision Boundaries of In-context Learning in Large Language Models ([zhao, nguyen, & grover, 2024](https://arxiv.org/pdf/2406.11233v1))
 - Theory (don't directly predict algorithm)
 - Meta-learning for Mixed Linear Regression ([kong...kakade, oh, 2020](https://proceedings.mlr.press/v119/kong20a.html)) - generalization for linear regression based on which linear tasks were seen before
+- Transformers are Universal In-context Learners ([furuya...peyre, 2024](https://arxiv.org/abs/2408.01367)) - mathematically show that transformers are universal and can approximate continuous in-context mappings to arbitrary precision
 - Limitations
 - Faith and Fate: Limits of Transformers on Compositionality ([dziri...choi, 2023](https://arxiv.org/abs/2305.18654)) - LLMs can't (easily) be trained well for multiplication (and similar tasks)
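Several papers in this hunk study transformers emulating optimization on in-context linear-regression tasks, with Newton's method on least squares as the reference algorithm. A one-dimensional toy of that reference computation (illustrative, not from any of the papers):

```python
def newton_linreg(xs, ys, w0=0.0, steps=1):
    """Newton's method on the 1-D least-squares loss sum((w*x - y)^2).
    The Hessian is constant, so a single step reaches the optimum."""
    w = w0
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys))
        hess = sum(2 * x * x for x in xs)
        w -= grad / hess
    return w

# data generated from y = 2x; one Newton step recovers the weight
print(newton_linreg([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 2.0
```

The in-context question is whether a trained transformer, given the (x, y) pairs as a prompt, reproduces this map from examples to the optimal weight.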