Real-Time News Analysis with NLP
Designed a scalable news intelligence pipeline with concurrent article ingestion, explainable sentiment scoring, and automated topic clustering across global news sources. Unlike standard scrapers, this system employs a fault-tolerant dual-strategy pipeline (Newspaper3k + Custom DOM Parsing) to ensure high data availability even on complex modern websites.
The system processes articles in bulk, categorizing them into 8+ domains (Politics, Tech, Finance, etc.) while providing granular sentiment scores coupled with "example sentences" that justify the rating, bridging the gap between raw metrics and explainable AI.
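To make the concurrent, fault-tolerant ingestion concrete, here is a minimal sketch using Python's standard thread pool. The names `ingest`, `fake_fetch`, and `MAX_BATCH` are hypothetical, and the cap of 20 mirrors the batch-size limit this document mentions. The real fetcher is not shown in the source, so it is passed in as a parameter.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_BATCH = 20  # assumption: mirrors the "up to 20 articles" batch limit


def ingest(urls, fetch, max_workers=8):
    """Fetch a capped batch of URLs concurrently.

    `fetch` is any callable taking a URL and returning article text.
    A failure on one URL is recorded and does not abort the batch.
    """
    batch = list(urls)[:MAX_BATCH]
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in batch}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # isolate per-article failures
                errors[url] = str(exc)
    return results, errors


def fake_fetch(url):
    # Stand-in for a real network fetch, for illustration only.
    if "broken" in url:
        raise ValueError("unreachable")
    return f"article text from {url}"


results, errors = ingest(
    ["https://example.com/a", "https://example.com/broken"], fake_fetch
)
```

Collecting errors per URL, instead of letting one exception propagate, is one way to keep a batch useful when a few sources fail.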
Clean, minimalist interface allowing users to input any URL or select a preset. Users can configure batch size (up to 20 articles) for deep processing.
Quick-access dashboard for major international outlets (BBC, Reuters, Al Jazeera), enabling one-click sentiment auditing of global news.
Real-time visualization layer showing the "Sentiment Distribution" (Positive/Neutral/Negative) and "Topic Categorization" across the analyzed batch.
Detailed cards for each article featuring the calculated sentiment score, auto-detected tag (e.g., "Health", "World"), and the specific "driver statement" that influenced the rating.
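The "Sentiment Distribution" view can be thought of as a simple bucketing of per-article scores. The sketch below assumes scores in the range -1 to 1 and a neutral band of ±0.05. Both are illustrative assumptions, since the source does not state the scale or the thresholds, and `label` and `sentiment_distribution` are hypothetical names.

```python
from collections import Counter


def label(score, threshold=0.05):
    """Map a numeric score to Positive/Neutral/Negative (assumed thresholds)."""
    if score > threshold:
        return "Positive"
    if score < -threshold:
        return "Negative"
    return "Neutral"


def sentiment_distribution(scores):
    """Count how many articles fall into each sentiment bucket."""
    counts = Counter(label(s) for s in scores)
    # Always report all three buckets, even when one is empty.
    return {name: counts.get(name, 0) for name in ("Positive", "Neutral", "Negative")}


distribution = sentiment_distribution([0.4, -0.3, 0.0, 0.02, 0.6])
```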
The pipeline falls back from the Newspaper3k library to custom BeautifulSoup parsers if structural extraction fails, achieving high data extraction reliability.
The core engine doesn't just return a score; it explains why. By tokenizing full articles into sentences and scoring them individually, the system isolates the most polarized statements to present as evidence.
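The sentence-level approach can be sketched as follows. The source does not name the sentiment model, so a tiny hypothetical word lexicon stands in for it here. The regex sentence splitter and the names `LEXICON`, `score_sentence`, and `explain_sentiment` are likewise assumptions for illustration.

```python
import re

# Hypothetical toy lexicon; a real pipeline would use a proper sentiment model.
LEXICON = {
    "good": 1.0, "great": 1.0, "gains": 0.5,
    "bad": -1.0, "crisis": -1.0, "losses": -0.5,
}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence):
    # Average the lexicon values of the matched words; 0.0 if none match.
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def explain_sentiment(text):
    """Return the mean sentence score plus the most polarized sentence."""
    scored = [(score_sentence(s), s) for s in split_sentences(text)]
    if not scored:
        return {"score": 0.0, "driver": None}
    overall = sum(score for score, _ in scored) / len(scored)
    # The "driver statement" is the sentence with the largest absolute score.
    driver = max(scored, key=lambda pair: abs(pair[0]))[1]
    return {"score": overall, "driver": driver}


result = explain_sentiment(
    "Markets opened flat. The banking crisis caused bad losses. Analysts expect gains."
)
```

Here `result["driver"]` is the second sentence, because its score has the largest magnitude of the three.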
When standard extraction fails, the custom parser targets semantic HTML selectors (<article>, meta[name="description"]).
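The fallback idea can be illustrated with a short sketch. To stay dependency-free, it uses the standard library's `html.parser` as a stand-in for BeautifulSoup, and the primary extractor (Newspaper3k in the project) is passed in as a plain callable. `FallbackExtractor` and `extract_text` are hypothetical names, not the project's actual code.

```python
from html.parser import HTMLParser


class FallbackExtractor(HTMLParser):
    """Collects text inside <article> and the meta description content."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # nesting depth inside <article>
        self.article_parts = []
        self.description = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self.depth += 1
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.article_parts.append(data.strip())


def extract_text(html, primary):
    """Try the primary extractor; on error or empty output, use the fallback."""
    try:
        text = primary(html)
    except Exception:
        text = ""
    if text and text.strip():
        return text.strip()
    parser = FallbackExtractor()
    parser.feed(html)
    # Prefer <article> text; otherwise settle for the meta description.
    return " ".join(parser.article_parts) or parser.description


def failing_primary(html):
    # Simulates the primary extractor failing on a complex page.
    raise ValueError("no article body found")


page = (
    '<html><head><meta name="description" content="Summary text"></head>'
    "<body><article><p>Body text.</p></article></body></html>"
)
text = extract_text(page, failing_primary)
```

Ordering the fallbacks from richest (`<article>` body) to weakest (meta description) means a page still yields some text to analyze even when the body cannot be found.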