Clean Markdown Article
Identifies promotional paragraphs (See Also links, newsletter CTAs, image attributions) that should be removed from article markdown. Returns paragraph IDs with reasons for deletion rather than the full cleaned content, enabling fast deterministic removal.
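The deterministic removal step described above might look like the following minimal sketch. It assumes the job's response has the shape {"deletions": [{"paragraph_id", "reason"}]} and that IDs are 1-based (P1 is the first paragraph). The function name apply_deletions is hypothetical and is not taken from the actual pipeline.

```python
import json


def apply_deletions(paragraphs, response_json):
    """Return the paragraphs that were not flagged for deletion.

    Assumption: paragraph IDs are 1-based, so "P1" refers to paragraphs[0].
    """
    deletions = json.loads(response_json).get("deletions", [])
    flagged = {d["paragraph_id"] for d in deletions}
    # Keep every paragraph whose ID was not flagged; no LLM call is needed here.
    return [
        text
        for index, text in enumerate(paragraphs, start=1)
        if f"P{index}" not in flagged
    ]


paragraphs = [
    "Shares rose 3% on Tuesday.",
    "See Also: Top stocks to watch",
    "Analysts expect further growth.",
]
response = '{"deletions": [{"paragraph_id": "P2", "reason": "Cross-promotional marker"}]}'
cleaned = apply_deletions(paragraphs, response)
```

Because the model only returns IDs, the article text itself is never regenerated, so the kept paragraphs stay byte-for-byte identical to the input.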
Job Metadata
Job Kind: clean_markdown_article
Queue: llm
Type: LLM
Recent Activity (Last 24 Hours)
Total Runs: 965
Success Rate: 49%
Avg Duration: 765ms
Last Run: Dec 6 03:25
Used by Workflows
benzinga_article_processing
Stage: clean_markdown_article
general_article_processing
Stage: clean_markdown_article
scraped_article_processing
Stage: clean_markdown_article
Structured Output
JSON Schema
This job uses OpenAI structured outputs for guaranteed JSON format.
Output Schema
{
  "deletions": [
    {
      "paragraph_id": string,  // e.g., "P3", "P5"
      "reason": string         // One sentence explaining why this should be deleted
    }
  ]
}
Prompts
System Prompt
You are a content cleaning assistant specializing in removing promotional content from financial news articles.

Your task: Identify paragraph IDs that contain ONLY promotional content and should be removed.

ONLY REMOVE paragraphs that are:
1. Cross-promotional markers (e.g., "See Also:", "Also Read:", "Read Next:")
2. Newsletter signup CTAs (e.g., "Subscribe to our newsletter")
3. Social media follow requests (e.g., "Follow us on Twitter")
4. Image attribution lines (e.g., "Image: Shutterstock", "Photo via...")
5. Platform promotional links (e.g., links to benzinga.com/money without article context)
6. Auto-generated disclaimers (e.g., "This article was generated by...")
7. Product/service upsell CTAs that ONLY promote a product (e.g., "Try Benzinga Pro for real-time alerts", "Subscribe to our premium service")

DO NOT REMOVE paragraphs with:
- Actual article content, even if it mentions related topics
- Quotes, data, analysis, or opinions
- Headers and subheaders
- Body paragraphs with news information
- Captions with substance
- Data source attributions (e.g., "according to data from Benzinga Pro", "per Bloomberg data") - these cite where information came from, NOT promote products
- Embedded social media posts (tweets, etc.) that contain quotes, data, or statements relevant to the article - these are primary sources, NOT promotional content

When in doubt, DO NOT remove the paragraph. Better to keep promotional content than delete actual news.
User Prompt Format
Review the article paragraphs below and identify which paragraph IDs should be removed because they contain ONLY promotional content.

Article paragraphs:
[P1] First paragraph content...
[P2] Second paragraph content...
...

For each paragraph you want to delete, provide the paragraph ID and a one-sentence reason explaining why it qualifies for removal (e.g., "Cross-promotional link to unrelated article" or "Newsletter signup CTA").

If no paragraphs should be removed, return an empty deletions array.
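As an illustration of how the [P1], [P2], ... labels in this format could be produced, here is a minimal sketch. It assumes paragraphs are separated by blank lines in the article markdown, which the page does not state. The helper names split_paragraphs and build_user_prompt are hypothetical.

```python
import re

PROMPT_HEAD = (
    "Review the article paragraphs below and identify which paragraph IDs "
    "should be removed because they contain ONLY promotional content."
)
PROMPT_TAIL = "If no paragraphs should be removed, return an empty deletions array."


def split_paragraphs(markdown):
    """Split markdown into paragraphs (assumption: blank lines separate them)."""
    return [part.strip() for part in re.split(r"\n\s*\n", markdown) if part.strip()]


def build_user_prompt(paragraphs):
    """Label each paragraph with a 1-based ID and wrap it in the prompt text."""
    labeled = "\n".join(
        f"[P{index}] {text}" for index, text in enumerate(paragraphs, start=1)
    )
    return f"{PROMPT_HEAD}\n\nArticle paragraphs:\n{labeled}\n\n{PROMPT_TAIL}"


article = "Shares rose 3%.\n\nSee Also: Top stocks\n\n\nAnalysts expect growth."
parts = split_paragraphs(article)
prompt = build_user_prompt(parts)
```

Keeping the same split for labeling and for the later removal step means an ID such as P2 always points at the same paragraph on both sides of the LLM call.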