High-Probability Sentence Predictions Dataset

Dataset Description

This dataset contains sentences from tatsu-lab/alpaca for which the model Qwen/Qwen2.5-0.5B predicts the last token before the sentence-final period with a probability of at least 0.9 (90%).
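The selection criterion can be illustrated with a minimal sketch. It is not the extraction script itself: the helper names are hypothetical, and splitting on whitespace is an assumption made for readability, whereas the dataset works with the model's own tokens.

```python
from typing import Optional, Tuple

THRESHOLD = 0.9  # the probability threshold documented for this dataset


def split_before_final_period(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, last word before the final period), or None.

    Assumption: word-level splitting stands in for model tokenization.
    """
    text = sentence.strip()
    if not text.endswith("."):
        return None  # only sentences ending in a period qualify
    body = text[:-1].rstrip()
    prefix, sep, last = body.rpartition(" ")
    if not sep or not last:
        return None  # a single word leaves no prefix to condition on
    return prefix, last


def keep_sentence(probability: float, threshold: float = THRESHOLD) -> bool:
    """A sentence is kept when the model's probability meets the threshold."""
    return probability >= threshold


example = "Eventually, their collective hope paid off."
print(split_before_final_period(example))
```

In the real pipeline the probability would come from the model's next-token distribution given the prefix; here it is simply passed in as a number.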

Source Dataset Attribution

This dataset is derived from tatsu-lab/alpaca and inherits its license terms (cc-by-nc-4.0). Please cite the original dataset when using this data.

Extraction Parameters

| Parameter | Value |
| --- | --- |
| Source Dataset | tatsu-lab/alpaca |
| Model | Qwen/Qwen2.5-0.5B |
| Probability Threshold | 0.9 |
| Seed | 42 |
| Source Columns | output |
| Extraction Date | 2025-12-15 |
| Total Samples | 10,000 |

Schema

| Field | Type | Description |
| --- | --- | --- |
| model_id | string | Model used for prediction |
| dataset_id | string | Source dataset identifier |
| columns | list[string] | Source columns the sentences were extracted from |
| seed | int64 | Random seed used for reproducibility |
| sample_idx | int64 | Index of the example in the source dataset |
| sentence_prefix | string | Text before the predicted token |
| predicted_token | string | Model's top prediction |
| actual_token | string | Ground truth token |
| probability | float64 | Prediction confidence (0-1) |
| num_tokens | int32 | Token count in the sentence |
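For readers who want typed access to a row, the schema can be mirrored in a small record type. This is a sketch only: `PredictionRow` and `parse_row` are hypothetical names, and the example values are illustrative, with the probability rounded.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class PredictionRow:
    """One row of the dataset, field for field as in the schema."""
    model_id: str
    dataset_id: str
    columns: List[str]
    seed: int
    sample_idx: int
    sentence_prefix: str
    predicted_token: str
    actual_token: str
    probability: float
    num_tokens: int


def parse_row(raw: Dict[str, Any]) -> PredictionRow:
    """Build a PredictionRow and check that the probability lies in [0, 1]."""
    row = PredictionRow(**raw)
    if not 0.0 <= row.probability <= 1.0:
        raise ValueError(f"probability out of range: {row.probability}")
    return row


# Illustrative row; the probability is rounded for display.
EXAMPLE_ROW = {
    "model_id": "Qwen/Qwen2.5-0.5B",
    "dataset_id": "tatsu-lab/alpaca",
    "columns": ["output"],
    "seed": 42,
    "sample_idx": 31,
    "sentence_prefix": "ARPA stands for Advanced Research Projects",
    "predicted_token": "Agency",
    "actual_token": "Agency",
    "probability": 0.97,
    "num_tokens": 9,
}

row = parse_row(EXAMPLE_ROW)
print(row.predicted_token == row.actual_token)
```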

Usage

from datasets import load_dataset

dataset = load_dataset("ermiaazarkhalili/alpaca-high-prob-qwen-0.5b-10k")
print(dataset["train"][0])
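Once rows are loaded, a common next step is to keep only the most confident predictions and rebuild the full sentence. The sketch below works on plain dictionaries so it runs without downloading anything; `select_confident` and `rebuild_sentence` are hypothetical helpers, and joining prefix and token with a single space is an assumption that may not hold for every tokenization.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def select_confident(rows: Iterable[Row], min_probability: float = 0.99) -> List[Row]:
    """Keep rows whose prediction probability is at least min_probability."""
    return [r for r in rows if r["probability"] >= min_probability]


def rebuild_sentence(row: Row) -> str:
    """Rejoin prefix, ground-truth token and the final period.

    Assumption: a single space separates the prefix from the token.
    """
    return f"{row['sentence_prefix']} {row['actual_token']}."


# Illustrative rows with rounded probabilities.
rows = [
    {"sentence_prefix": "The Cat in the", "actual_token": "Hat", "probability": 0.98},
    {
        "sentence_prefix": "Immigration to the United States has both pros and",
        "actual_token": "cons",
        "probability": 0.99,
    },
]

for row in select_confident(rows):
    print(rebuild_sentence(row))
```

The same functions should accept rows taken from `dataset["train"]`, since iterating over a split yields dictionaries keyed by field name.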

Citation

@dataset{high_prob_sentences_2025,
    title = {High-Probability Sentence Predictions from tatsu-lab/alpaca},
    year = {2025},
    publisher = {Hugging Face},
    howpublished = {\url{https://huggingface.co/datasets/ermiaazarkhalili/alpaca-high-prob-qwen-0.5b-10k}},
    note = {Derived from tatsu-lab/alpaca, model: Qwen/Qwen2.5-0.5B}
}

License

This dataset inherits the license from the source dataset: cc-by-nc-4.0

See tatsu-lab/alpaca for full license terms.

Reproducibility

To reproduce this dataset extraction:

python scripts/extract_high_prob_sentences.py \
    --dataset "tatsu-lab/alpaca" \
    --model "Qwen/Qwen2.5-0.5B" \
    --threshold 0.9 \
    --seed 42 \
    --columns output \
    --output data/output.parquet
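
The script itself is not included in this card. As a rough guide to the command-line interface implied by the command above, the flags could be parsed as follows. This is a hypothetical reconstruction: only the flag names come from the command, while the defaults, help texts, and `build_parser` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the flags shown in the reproduction command."""
    parser = argparse.ArgumentParser(
        description="Extract sentences whose final token is predicted with high probability."
    )
    parser.add_argument("--dataset", required=True, help="Source dataset identifier")
    parser.add_argument("--model", required=True, help="Model used for prediction")
    # Defaults below are assumptions chosen to match the documented parameters.
    parser.add_argument("--threshold", type=float, default=0.9, help="Minimum probability")
    parser.add_argument("--seed", type=int, default=42, help="Random seed")
    parser.add_argument("--columns", nargs="+", default=["output"], help="Source columns")
    parser.add_argument("--output", required=True, help="Path of the Parquet file to write")
    return parser


args = build_parser().parse_args(
    [
        "--dataset", "tatsu-lab/alpaca",
        "--model", "Qwen/Qwen2.5-0.5B",
        "--threshold", "0.9",
        "--seed", "42",
        "--columns", "output",
        "--output", "data/output.parquet",
    ]
)
print(args.dataset, args.threshold, args.columns)
```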