TF-IDF Keyword Extractor

v1.0 documentation

Extract the top TF-IDF terms from one or more documents. Useful for discovering which terms define a page's content.
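To make "top TF-IDF terms" concrete, here is a minimal sketch of the scoring arithmetic using only the standard library. It is an illustration, not the tool's code: the tokenizer, the smoothed IDF formula, and the names `tokenize` and `top_tfidf_terms` are assumptions, and the tool itself relies on scikit-learn, whose tokenization and weighting may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_tfidf_terms(documents, top=30):
    """Return, for each document, its `top` highest-scoring (term, score) pairs."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        scores = {
            # tf = share of the document's tokens; the idf is smoothed so a term
            # that appears in every document still gets a non-zero weight.
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        results.append(ranked[:top])
    return results


docs = [
    "python tutorial python examples for beginners",
    "seo tutorial seo keywords for beginners",
]
for ranked in top_tfidf_terms(docs, top=2):
    print([term for term, _ in ranked])
```

Terms shared by both documents ("tutorial", "for", "beginners") get a lower weight than terms unique to one of them, which is why the distinctive words rise to the top of each list.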

URL input | File input | XLSX export
tfidf_extractor.py | 107 lines | 8 params | Python 3.8+
Quick start
1. Install

terminal
pip install -r requirements.txt

2. Run

terminal
python tfidf_extractor.py --url https://example.com/page --top 30
terminal
python tfidf_extractor.py --files page1.html page2.html --output tfidf_results.xlsx
terminal
python tfidf_extractor.py --file article.txt --ngram-min 1 --ngram-max 3

3. Export

Add --output report.xlsx to save results as a spreadsheet.
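The n-gram options in the Run step set the shortest and longest phrase the extractor considers. As a rough sketch of what a range of 1 to 3 produces, the snippet below expands a token list into every phrase of one to three consecutive words. It assumes word-level n-grams (the documentation does not say whether words or characters are used), and the helper name `ngrams` is hypothetical.

```python
def ngrams(tokens, n_min=1, n_max=3):
    """Yield every run of n_min..n_max consecutive tokens as a space-joined phrase."""
    for n in range(n_min, n_max + 1):
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])


tokens = "technical seo audit checklist".split()
phrases = list(ngrams(tokens, 1, 3))
print(phrases)
```

A wider range surfaces multi-word phrases such as "technical seo audit" alongside single words, at the cost of a larger candidate vocabulary.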

Parameters
Flag          Description
--url         Single URL to analyze
--urls        Multiple URLs to analyze. Multiple values allowed
--file        Single file to analyze
--files       Multiple files to analyze. Multiple values allowed
--top         Top N terms per document. Default: 30 (integer)
--ngram-min   Minimum n-gram size (integer)
--ngram-max   Maximum n-gram size (integer)
--output      Save results as XLSX
help
python tfidf_extractor.py --help
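For readers who want to see how such a flag set is typically wired up, the following is a hypothetical reconstruction with `argparse`. It is not taken from tfidf_extractor.py: the function name `build_parser`, the use of `nargs="+"` for the multi-value flags, and the n-gram defaults are assumptions. Only the flag names and the `--top` default of 30 come from the table above.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the eight documented flags (a sketch only)."""
    parser = argparse.ArgumentParser(
        prog="tfidf_extractor.py",
        description="Extract top TF-IDF terms from one or multiple documents.",
    )
    parser.add_argument("--url", help="Single URL to analyze")
    # nargs="+" is one common way to accept several values after a single flag.
    parser.add_argument("--urls", nargs="+", help="Multiple URLs")
    parser.add_argument("--file", help="Single file")
    parser.add_argument("--files", nargs="+", help="Multiple files")
    parser.add_argument("--top", type=int, default=30, help="Top N terms per document")
    # The real n-gram defaults are not documented; 1 is a placeholder assumption.
    parser.add_argument("--ngram-min", type=int, default=1, help="Ngram min")
    parser.add_argument("--ngram-max", type=int, default=1, help="Ngram max")
    parser.add_argument("--output", help="Save results as XLSX")
    return parser


args = build_parser().parse_args(["--files", "page1.html", "page2.html", "--top", "10"])
print(args.files, args.top, args.ngram_min)
```

Note that `argparse` converts the hyphen in `--ngram-min` to an underscore, so the value is read back as `args.ngram_min`.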
Use cases

Content quality audit

Run across all your blog posts to score quality. Sort by score in the XLSX export, then prioritize rewrites for the lowest-scoring pages.

Writer evaluation

Before publishing freelance content, run this tool to check quality signals. Use specific metrics as concrete feedback for writers.

Client deliverable

Include the analysis in your SEO audit report. Clients appreciate data-backed recommendations over subjective opinions.

Dependencies

Requires: beautifulsoup4, pandas, requests, scikit-learn. All included in requirements.txt.

AAIO Inc — aaioinc.com/tools/tfidf_extractor/