Twitter datasets for research and archiving.
- Create your own Twitter dataset from existing datasets.
- Conforms with Twitter policies.
447,206,748 tweets available.
TweetSets is intended for academic purposes only. Users are encouraged to follow all relevant Twitter policies and
consider ethics and privacy in research and publication with Twitter data.
Steps for creating a dataset:
- Select source dataset(s). Source datasets have been previously collected.
- Limit the dataset by querying on keywords, hashtags, and other parameters. Repeat until you've created
the desired dataset.
- Create the dataset. This freezes the dataset parameters.
- Generate and download dataset derivatives, such as the list of tweet ids or mention nodes/edges (e.g., for Gephi).
See Help for step-by-step instructions.
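To illustrate what a mention nodes/edges derivative might contain, here is a minimal sketch in Python. It is not TweetSets' own code: the record layout (`user`, `mentions`), the function names, and the `Source`/`Target`/`Weight` column headers are assumptions chosen for illustration, and the sample records are made up.

```python
import csv
import io
from collections import Counter


def mention_edges(tweets):
    """Count how often each author mentions each other user.

    `tweets` is a hypothetical, simplified structure: a list of dicts
    with a "user" key (the author) and a "mentions" key (screen names).
    """
    edges = Counter()
    for tweet in tweets:
        for mentioned in tweet["mentions"]:
            edges[(tweet["user"], mentioned)] += 1
    return edges


def edges_to_csv(edges):
    """Render the edge counts as CSV text with one row per edge."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names are an assumption about what an edge-list import expects.
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Made-up sample records, for illustration only.
sample_tweets = [
    {"user": "alice", "mentions": ["bob", "carol"]},
    {"user": "bob", "mentions": ["alice"]},
    {"user": "alice", "mentions": ["bob"]},
]

edges = mention_edges(sample_tweets)
csv_text = edges_to_csv(edges)
print(csv_text)
```

Each distinct (author, mentioned user) pair becomes one weighted edge, so repeated mentions increase the weight rather than adding rows.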
- You cannot view or download the text of tweets.
- Keep track of the URLs of your datasets. A record of your datasets is stored in a cookie in your browser, but
no record of your datasets is stored on the server.
- Generating dataset derivatives may take a while.
- Datasets may change if new tweets are added to the source datasets.
About sharing Twitter datasets for research and archiving:
Twitter policies do not allow publicly posting or sharing the text of tweets retrieved from the Twitter API.
However, they do allow the sharing of tweet ids. Given a tweet id, the text of the tweet can be retrieved from
the Twitter API using a tool such as DocNow's Hydrator. Note that if a tweet has been deleted
or protected, it cannot be retrieved from the Twitter API, so some tweets from the dataset may be lost.
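The retrieval step can be sketched as follows, mainly to show how lost tweets surface. This is a minimal sketch, not how Hydrator itself works: `lookup` is a hypothetical callable standing in for whatever API client you use, the batch size of 100 is an assumption that should be checked against the endpoint's documented limit, and `fake_lookup` with its sample ids and texts exists only so the example runs without network access.

```python
def batches(ids, size=100):
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def hydrate(tweet_ids, lookup, batch_size=100):
    """Retrieve tweet text for a list of ids and report the ids that failed.

    `lookup` is a hypothetical callable: it takes a list of ids and returns
    a dict mapping each retrievable id to its text. Deleted or protected
    tweets are simply absent from the returned dict.
    """
    found = {}
    for batch in batches(tweet_ids, batch_size):
        found.update(lookup(batch))
    # Ids with no result are the tweets lost since the dataset was collected.
    missing = [tweet_id for tweet_id in tweet_ids if tweet_id not in found]
    return found, missing


def fake_lookup(batch):
    """Stand-in for a real API call; the data here is made up."""
    available = {"1": "first tweet", "3": "third tweet"}
    return {tweet_id: available[tweet_id] for tweet_id in batch if tweet_id in available}


found, missing = hydrate(["1", "2", "3"], fake_lookup)
print(found)
print(missing)
```

Comparing the requested ids against the returned ones gives a simple measure of how much of a dataset is no longer retrievable.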