Twitter datasets for research and archiving.

  • Create your own Twitter dataset from existing datasets.
  • Conforms with Twitter policies.
  • Members of the George Washington University community should use the GWU VPN for full access.

1,771,107,526 tweets available.

TweetSets is intended for academic purposes only. Users are encouraged to follow all relevant Twitter policies and consider ethics and privacy in research and publication with Twitter data.

Steps for creating a dataset:
  1. Select source dataset(s). Source datasets have been previously collected.
  2. Limit the dataset by querying on keywords, hashtags, and other parameters. Repeat until you've created the desired dataset.
  3. Create the dataset. This freezes the dataset parameters.
  4. Generate and download dataset exports, such as the list of tweet ids or mention nodes/edges (e.g., for Gephi).

See Help for step-by-step instructions.

Important information:
  • You cannot view or download the text of tweets.
  • Keep track of the URLs of your datasets. A record of your datasets is stored in a cookie in your browser, but no record of your datasets is stored on the server.
  • Generating dataset exports may take a while.
  • Datasets may change if new tweets are added to the source datasets.

About sharing Twitter datasets for research and archiving:

Twitter policies do not allow publicly posting or sharing the text of tweets retrieved from the Twitter API. However, they do allow the sharing of tweet ids. Given a tweet id, the text of the tweet can be retrieved from the Twitter API using a tool such as DocNow's Hydrator. Note that if a tweet has been deleted or protected, it cannot be retrieved from the Twitter API, so some tweets from the dataset may be lost.
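
As a rough illustration of working with a tweet id export before retrieving tweets, the Python sketch below cleans a list of ids, splits it into batches, and reports which ids were not returned. It makes no network calls and is not part of TweetSets or Hydrator. The function names are hypothetical, the export is assumed to contain one numeric tweet id per line, and the batch size of 100 is an assumption based on commonly documented lookup limits that should be checked against the current Twitter API documentation.

```python
from typing import Iterable, Iterator, List, Set


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Return unique tweet ids in first-seen order.

    Assumes one id per line; blank and non-numeric lines are skipped.
    """
    seen: Set[str] = set()
    ids: List[str] = []
    for line in lines:
        tweet_id = line.strip()
        if tweet_id.isdigit() and tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


def batches(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` ids (100 is an assumed limit)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def missing_ids(requested: Iterable[str], retrieved: Iterable[str]) -> List[str]:
    """Return ids that were requested but not returned,
    for example because the tweet was deleted or protected."""
    got = set(retrieved)
    return [tweet_id for tweet_id in requested if tweet_id not in got]
```

In use, the lines of a downloaded id file would be passed to read_tweet_ids, each batch would be handed to whatever retrieval tool is in use, and missing_ids would be called with the requested ids and the ids that actually came back to see how many tweets were lost.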