TweetSets

Twitter datasets for research and archiving.

  • Create your own Twitter dataset from existing datasets.
  • Conforms with Twitter policies.



369,928,607 tweets available.

TweetSets is intended for academic purposes only. Users are encouraged to follow all relevant Twitter policies and consider ethics and privacy in research and publication with Twitter data.

Steps for creating a dataset:
  1. Select source dataset(s). Source datasets have been previously collected.
  2. Limit the dataset by querying on keywords, hashtags, and other parameters. Repeat until you've created the desired dataset.
  3. Create the dataset. This freezes the dataset parameters.
  4. Generate and download dataset derivatives, such as the list of tweet ids or mention nodes/edges (e.g., for Gephi).

See Help for step-by-step instructions.
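
As an illustration of what a mention nodes/edges derivative might contain, the following is a minimal sketch rather than TweetSets' actual export code. It assumes each tweet is a dict shaped like Twitter API v1.1 JSON (`user.screen_name`, `entities.user_mentions`), and the function names `mention_edges` and `edges_to_csv` are hypothetical. The `Source`, `Target`, `Weight` header is the column naming Gephi recognizes when importing an edge list from CSV.

```python
import csv
import io
from collections import Counter


def mention_edges(tweets):
    """Count (author, mentioned user) pairs across a list of tweet dicts.

    Assumes Twitter API v1.1-style tweet JSON; this shape is an assumption.
    """
    edges = Counter()
    for tweet in tweets:
        source = tweet["user"]["screen_name"]
        mentions = tweet.get("entities", {}).get("user_mentions", [])
        for mention in mentions:
            edges[(source, mention["screen_name"])] += 1
    return edges


def edges_to_csv(edges):
    """Serialize the edge counts as CSV text with Source/Target/Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buf.getvalue()
```

Repeated mentions between the same pair of accounts are collapsed into a single weighted edge, which keeps the file small for large datasets.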


Important information:

About sharing Twitter datasets for research and archiving:

Twitter policies do not allow publicly posting or sharing the text of tweets retrieved from the Twitter API. However, they do allow the sharing of tweet ids. Given a tweet id, the text of the tweet can be retrieved from the Twitter API using a tool such as DocNow's Hydrator. Note that if a tweet has been deleted or protected, it cannot be retrieved from the Twitter API, so some tweets from the dataset may be lost.
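
To make the loss from deleted or protected tweets concrete, here is a minimal sketch of the bookkeeping around rehydration. It does not call the Twitter API or Hydrator; the fetch step is left out. The helper names `batch_ids` and `missing_ids` are hypothetical, the batch size of 100 is an assumption about the lookup limit per request, and hydrated tweets are assumed to carry an `id_str` field as in Twitter API v1.1 JSON.

```python
def batch_ids(tweet_ids, batch_size=100):
    """Yield consecutive slices of tweet ids, at most batch_size per slice.

    The default of 100 is an assumed per-request lookup limit.
    """
    for start in range(0, len(tweet_ids), batch_size):
        yield tweet_ids[start:start + batch_size]


def missing_ids(requested_ids, hydrated_tweets):
    """Return the requested ids that are absent from the hydrated tweets.

    These are the tweets that could not be retrieved, for example because
    they were deleted or protected after the dataset was collected.
    """
    returned = {tweet["id_str"] for tweet in hydrated_tweets}
    return [tweet_id for tweet_id in requested_ids if tweet_id not in returned]
```

Comparing `len(missing_ids(...))` with the number of ids requested gives the share of the dataset that was lost, which is worth reporting alongside any results.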