Awesome-Multimodal-Research
Project author:
Eurus-Holmes
Project description:
A curated list of Multimodal Related Research.
Primary language:
Python
Project homepage:
https://chenfeiyang.top/Awesome-Multimodal-Research/
Repository URL:
git://github.com/Eurus-Holmes/Awesome-Multimodal-Research.git
Created:
2019-07-31T14:15:49Z
Project community:
https://github.com/Eurus-Holmes/Awesome-Multimodal-Research
License:
MIT License
Downloads:
Unsupervised Learning of Spoken Language with Visual Context_1648624951003.pdf
WAV2PIX- SPEECH-CONDITIONED FACE GENERATION USING GENERATIVE ADVERSARIAL NETWORKS_1648624951634.pdf
Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving- Datasets, Methods, and Challenges_1648624952135.pdf
Multimodal End-to-End Autonomous Driving_1648624952706.pdf
nuScenes- A multimodal dataset for autonomous driving_1648624952904.pdf
A Logical Model for Supporting Social Commonsense Knowledge Acquisition_1648624953053.pdf
Adventures in Flatland- Perceiving Social Interactions Under Physical Dynamics_1648624953146.pdf
COMMONSENSEQA- A Question Answering Challenge Targeting Commonsense Knowledge_1648624953414.pdf
From Recognition to Cognition- Visual Commonsense Reasoning_1648624953702.pdf
Heterogeneous Graph Learning for Visual Commonsense Reasoning_1648624954110.pdf
SOCIAL IQA- Commonsense Reasoning about Social Interactions_1648624954271.pdf
Audiovisual Behavior Descriptors for Depression Assessment_1648624954381.pdf
Cross-modal Recurrent Models for Weight Objective Prediction from Multimodal Time-series Data_1648624954454.pdf
Dyadic Behavior Analysis in Depression Severity Assessment Interviews_1648624954582.pdf
Improving Hospital Mortality Prediction with Medical Named Entities and Multimodal Learning_1648624954687.pdf
Knowledge-driven generative subspaces for modeling multi-view dependencies in medical data_1648624954855.pdf
Learning the Joint Representation of Heterogeneous Temporal Events for Clinical Endpoint Prediction_1648624954946.pdf
Leveraging Medical Visual Question Answering with Supporting Facts_1648624955018.pdf
Machine Learning in Multimodal Medical Imaging_1648624955129.pdf
Multimodal Medical Image Retrieval based on Latent Topic Modeling_1648624955241.pdf
SimSensei Kiosk- A Virtual Human Interviewer for Healthcare Decision Support_1648624955401.pdf
Understanding Coagulopathy using Multi-view Data in the Presence of Sub-Cohorts- A Hierarchical Subspace Approach_1648624955516.pdf
Unsupervised Multimodal Representation Learning across Medical Images and Reports_1648624955640.pdf
Attention Based Natural Language Grounding by Navigating Virtual Environment_1648624955939.pdf
Embodied Question Answering_1648624956303.pdf
FROM LANGUAGE TO GOALS- INVERSE REINFORCEMENT LEARNING FOR VISION-BASED INSTRUCTION FOLLOWING_1648624956790.pdf
Hierarchical Decision Making by Generating and Following Natural Language Instructions_1648624957109.pdf
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web_1648624957669.pdf
Learning to Navigate Unseen Environments- Back Translation with Environmental Dropout_1648624958241.pdf
Look Before You Leap- Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation_1648624958430.pdf
Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction_1648624958739.pdf
Multi-modal Discriminative Model for Vision-and-Language Navigation_1648624959233.pdf
Read, Watch, and Move- Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos_1648624959336.pdf
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation_1648624959540.pdf
SELF-MONITORING NAVIGATION AGENT VIA AUXILIARY PROGRESS ESTIMATION_1648624960033.pdf
Stay on the Path- Instruction Fidelity in Vision-and-Language Navigation_1648624960748.pdf
TOUCHDOWN- Natural Language Navigation and Spatial Reasoning in Visual Street Environments_1648624960949.pdf
Tactical Rewind- Self-Correction via Backtracking in Vision-and-Language Navigation_1648624961462.pdf
The Regretful Agent- Heuristic-Aided Navigation through Progress Estimation_1648624962025.pdf
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training_1648624962941.pdf
VideoNavQA- Bridging the Gap between Visual and Embodied Question Answering_1648624963424.pdf
Vision-and-Dialog Navigation_1648624963572.pdf
Vision-and-Language Navigation- Interpreting visually-grounded navigation instructions in real environments_1648624963965.pdf
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)_1648624964713.pdf
Finding “It”- Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos_1648624965329.pdf
Gated-Attention Architectures for Task-Oriented Language Grounding_1648624965524.pdf
Grounded Compositional Semantics for Finding and Describing Images with Sentences_1648624965947.pdf
Grounded Language Learning Fast and Slow_1648624966103.pdf
Grounded Language Learning from Video Described with Sentences_1648624966443.pdf
Grounded Video Description_1648624966656.pdf
Grounding language acquisition by training semantic parsers using captioned videos_1648624967302.pdf
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts_1648624967714.pdf
Localizing Moments in Video with Natural Language_1648624968579.pdf
Multilevel Language and Vision Integration for Text-to-Clip Retrieval_1648624969323.pdf
SCAN- LEARNING HIERARCHICAL COMPOSITIONAL VISUAL CONCEPTS_1648624969800.pdf
Show, Control and Tell- A Framework for Generating Controllable and Grounded Captions_1648624970767.pdf
The Hateful Memes Challenge- Detecting Hate Speech in Multimodal Memes_1648624971511.pdf
Using Syntax to Ground Referring Expressions in Natural Images_1648624971748.pdf
VIOLIN- A Large-Scale Dataset for Video-and-Language Inference_1648624971981.pdf
Visual Coreference Resolution in Visual Dialog using Neural Module Networks_1648624972524.pdf
Visual Grounding in Video for Unsupervised Word Translation_1648624973055.pdf
AUDIO CAPTION- LISTEN AND TELL_1648624974118.pdf
AUDIO-LINGUISTIC EMBEDDINGS FOR SPOKEN SENTENCES_1648624974345.pdf
DEEP VOICE 3- SCALING TEXT-TO-SPEECH WITH CONVOLUTIONAL SEQUENCE LEARNING_1648624974486.pdf
Deep Voice 2- Multi-Speaker Neural Text-to-Speech_1648624974712.pdf
Deep Voice- Real-time Neural Text-to-Speech_1648624974952.pdf
Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation_1648624975012.pdf
FROM AUDIO TO SEMANTICS- APPROACHES TO END-TO-END SPOKEN LANGUAGE UNDERSTANDING_1648624975136.pdf
From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings_1648624975196.pdf
Lattice Transformer for Speech Translation_1648624975327.pdf
NATURAL TTS SYNTHESIS BY CONDITIONING WAVENET ON MEL SPECTROGRAM PREDICTIONS_1648624975860.pdf
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering_1648624976376.pdf
CLEVR- A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning_1648624977269.pdf
Don’t Just Assume; Look and Answer- Overcoming Priors for Visual Question Answering_1648624977863.pdf
Fusion of Detected Objects in Text for Visual Question Answering_1648624978529.pdf
GQA- A New Dataset for Real-World Visual Reasoning and Compositional Question Answering_1648624979017.pdf
Interactive Language Learning by Question Answering_1648624979496.pdf
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA_1648624979795.pdf
LEARNING TO COUNT OBJECTS IN NATURAL IMAGES FOR VISUAL QUESTION ANSWERING_1648624980463.pdf
Learning to Reason- End-to-End Module Networks for Visual Question Answering_1648624980712.pdf
MUREL- Multimodal Relational Reasoning for Visual Question Answering_1648624980982.pdf
MovieQA- Understanding Stories in Movies through Question-Answering_1648624981300.pdf
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding_1648624981631.pdf
Neural-Symbolic VQA- Disentangling Reasoning from Vision and Language Understanding_1648624981983.pdf
OK-VQA- A Visual Question Answering Benchmark Requiring External Knowledge_1648624982507.pdf
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization_1648624982923.pdf
Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering_1648624983172.pdf
RUBi- Reducing Unimodal Biases in Visual Question Answering_1648624983582.pdf
RecipeQA- A Challenge Dataset for Multimodal Comprehension of Cooking Recipes_1648624984130.pdf
Social-IQ- A Question Answering Benchmark for Artificial Social Intelligence_1648624984525.pdf
Stacked Latent Attention for Multimodal Reasoning_1648624984648.pdf
TVQA- Localized, Compositional Video Question Answering_1648624985280.pdf
VQA- Visual Question Answering_1648624986159.pdf
A Dataset for Movie Description_1648624986913.pdf
Charades-Ego- A Large-Scale Dataset of Paired Third and First Person Videos_1648624987395.pdf
Deep Visual-Semantic Alignments for Generating Image Descriptions_1648624987733.pdf
Generating Descriptions with Grounded and Co-Referenced People_1648624988192.pdf
Grounding Referring Expressions in Images by Variational Context_1648624988529.pdf
Hollywood in Homes- Crowdsourcing Data Collection for Activity Understanding_1648624988988.pdf
Joint Event Detection and Description in Continuous Video Streams_1648624989393.pdf