Since the early days of AI Dungeon I've been thinking about an idea: have the AI create images based on what you type, similar to text-to-image AI. If an image comes out too incomprehensible, people could upload their own image or edit whatever the AI came up with.
To keep it basic, you could target specific nouns someone uses in their story: if someone says "tavern", "house", or "grocery store", an image of it pops up.
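Here's a rough sketch of the noun-trigger part in Python, just to show how simple it could be. The list `TRIGGER_NOUNS` and the function `find_trigger_nouns` are names I made up for illustration; nothing here is how AI Dungeon actually works.

```python
import re

# Hypothetical trigger list; the three entries are the examples from this post.
TRIGGER_NOUNS = ["tavern", "house", "grocery store"]

def find_trigger_nouns(text, nouns=TRIGGER_NOUNS):
    """Return the trigger nouns found in the text, in order of first appearance."""
    lowered = text.lower()
    hits = []
    for noun in nouns:
        # \b stops "house" from matching inside "household" or "warehouse";
        # the optional "s" also catches simple plurals like "taverns".
        match = re.search(r"\b" + re.escape(noun) + r"s?\b", lowered)
        if match:
            hits.append((match.start(), noun))
    return [noun for _, noun in sorted(hits)]

print(find_trigger_nouns("You leave the tavern and walk past a grocery store."))
# prints ['tavern', 'grocery store']
```

Each noun it returns would then be the cue to show (or generate) an image.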
If DALL-E could somehow know the context the story is set in, the images for those nouns could be centered around that. For example, Context: 1950s, Context: 10,000 BC, or Context: Mars.
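One way the context idea might look, assuming the image AI just takes a text prompt: tack the context onto the noun before sending it. `build_image_prompt` is a made-up helper and the prompt wording is only a guess, not DALL-E's actual API.

```python
def build_image_prompt(noun, context=None):
    """Combine a noun with an optional story context into one prompt string."""
    # Assumption: the image model accepts a plain text prompt, so the
    # context is simply appended to the noun.
    if context:
        return f"{noun}, Context: {context}"
    return noun

# The same noun yields a different prompt for each story setting.
for context in ["1950s", "10,000 BC", "Mars"]:
    print(build_image_prompt("tavern", context))
```

With no context set, the prompt is just the noun on its own.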
Just an idea.