
Dochunt

An OSINT tool that runs public VK documents through an OCR system.

Install:

git clone https://github.com/anatolykopyl/dochunt.git
cd dochunt
pip install -r requirements.txt

Usage:

  1. Set up your VK credentials in config.ini.

  2. Set the interests option in config.ini to whatever you're looking for. It should be either a list of strings formatted like this:

interests = [
    "interest1",
    "interest2",
    "interest3",
    ...
  ]

If an image contains any of these strings, the script will save it.

Or the word any, to save every image that contains text:

interests = any

  3. Run the script.

This will watch the latest uploaded document:

python main.py

This will go through all available documents (VK search is capped at the 1000 latest documents):

python main.py -a
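The interests option from step 2 can be handled roughly like this. It is a minimal sketch, not Dochunt's actual code: it assumes config.ini is read with Python's configparser and that the option sits in a hypothetical `[settings]` section; the helper names `parse_interests` and `should_save`, and the case-insensitive matching, are assumptions too. Note that the closing `]` in the list example is indented, which is what lets configparser read it as a continuation of the `interests` value.

```python
import ast
import configparser


def parse_interests(raw):
    """Turn the raw `interests` value into a list of strings, or None for `any`."""
    value = raw.strip()
    if value == "any":
        return None  # None means: save every image that contains text
    # The list form is parsed as a Python literal (an assumption about the format).
    interests = ast.literal_eval(value)
    if not isinstance(interests, list) or not all(isinstance(s, str) for s in interests):
        raise ValueError("interests must be a list of strings or the word any")
    return interests


def should_save(ocr_text, interests):
    """Decide whether an image whose OCR output is `ocr_text` should be kept."""
    if not ocr_text.strip():
        return False  # OCR found no text at all
    if interests is None:
        return True  # `interests = any`: keep anything with text
    # Case-insensitive substring match (an assumed choice, since OCR case varies).
    lowered = ocr_text.lower()
    return any(interest.lower() in lowered for interest in interests)


# Example with a hypothetical [settings] section.
config = configparser.ConfigParser()
config.read_string("""
[settings]
interests = [
    "invoice",
    "passport",
  ]
""")
interests = parse_interests(config["settings"]["interests"])
print(should_save("Scanned PASSPORT page", interests))  # True
```

With `interests = any`, `parse_interests` returns `None` and `should_save` keeps any image whose OCR text is non-empty.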

Inspired by darkshot.

License:

GPL-3.0