
Import AI 240: The unbeatable MATH benchmark; an autonomous river boat dataset; robots for construction sites

3/14/2021

by Jack Clark

Here's another benchmark your puny models can't solve - MATH!
...One area where just scaling things up doesn't help...
SQuAD. SQuAD2. GLUE. SuperGLUE. All these benchmarks have melted in time, like hyperparameter tears in the rain, due to the onslaught of new, powerful AI models. So with a mixture of trepidation and relief let's introduce MATH, a dataset of math problems that contemporary Transformer-based models can't solve.
What's MATH? MATH was made by researchers at UC Berkeley and consists of 12,500 problems taken from high school math competitions. The problems have five difficulty levels and cover seven subjects, including geometry. MATH questions are open-ended, mixing natural language and math across their problem statements and solutions. One example MATH question: "Tom has a red marble, a green marble, a blue marble, and three identical yellow marbles. How many different groups of two marbles can Tom choose?"
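For the curious, the marble question is easy to brute-force: a quick enumeration (my sketch, not code from the paper) confirms the answer is 7.
```
# Brute-force check of the marble question above. The three yellow marbles
# are identical, so two picks count as the same group if the colors match.
from itertools import combinations

marbles = ["red", "green", "blue", "yellow", "yellow", "yellow"]
groups = {tuple(sorted(pair)) for pair in combinations(marbles, 2)}
print(len(groups))  # 7
```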
Bonus dataset: AMPS: Along with MATH, the authors have also built the Auxiliary Mathematics Problems and Solutions (AMPS) pre-training corpus, a 23GB data repository made of ~100,000 Khan Academy problems with step-by-step solutions written in LaTeX, as well as 5 million problems generated using Mathematica scripts.
Why this matters: Current AI systems can't solve MATH: The best part about MATH is that it's unbelievably difficult. GPT-2 models get, at best, an average of 6.9% accuracy on the dataset (in even the most lenient school, that would earn an F), while GPT-3 models (which are larger than GPT-2 ones) seem to do meaningfully better than their GPT-2 forebears on some tasks and worse on others. This is good news: we've found a test that large-scale Transformer models can't solve. Even better - we're a long, long way from solving it.
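For a sense of how 'accuracy' gets computed here: MATH solutions mark the final answer in a \boxed{...} span, so grading reduces to exact match on that span. A minimal sketch (my approximation, not the paper's evaluation code; it ignores nested braces):
```
import re

def final_answer(solution):
    """Return the contents of the last \\boxed{...} span, if any."""
    spans = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return spans[-1].strip() if spans else None

def correct(model_output, reference):
    """Exact-match grading on the boxed answer; reasoning steps aren't scored."""
    answer = final_answer(model_output)
    return answer is not None and answer == final_answer(reference)

print(correct(r"... so Tom has \boxed{7} choices.", r"... \boxed{7}"))  # True
```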
  Read more: Measuring Mathematical Problem Solving with the MATH Dataset (arXiv).
  Get the code from GitHub here.
###################################################
Want a pony that looks like Elvis? We can do that:
...Machine learning systems can do style generalization...
Here's a fun Twitter thread where someone combines the multimodal CLIP system with StyleGAN, using a dataset from This Pony Does Not Exist [note: some chance of NSFW-ish generations], an infinite sea of GAN-generated My Little Ponies. Good examples include pony-versions of Billie Eilish, Beyonce, and Justin Bieber.
Why this matters: In the same way AI can generate different genres of text, ranging from gothic fiction to romantic poetry, we're seeing evidence the same kinds of generative capabilities work for imagery as well. And, just as with text, we're able to mix and match these different genres to generate synthetic outputs that feel novel. The 21st century will be reshaped by the arrival of endless, generative and recombinative media.
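The underlying trick is straightforward latent optimization: freeze both networks, then nudge the GAN's latent until CLIP thinks the rendered image matches a text prompt. A minimal sketch, assuming a pretrained StyleGAN generator with 512-dim latents and images in [-1, 1] (the `load_stylegan_generator` helper is a placeholder, not a real API), plus OpenAI's CLIP package:
```
import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
text_features = clip_model.encode_text(
    clip.tokenize(["a pony that looks like Elvis"]).to(device)
).detach()

G = load_stylegan_generator()  # placeholder: any pretrained StyleGAN, e.g. TPDNE weights
latent = torch.randn(1, 512, device=device, requires_grad=True)
opt = torch.optim.Adam([latent], lr=0.05)

for step in range(300):
    image = G(latent)                       # (1, 3, H, W), values in [-1, 1]
    image = F.interpolate(image, size=224)  # CLIP's input resolution
    # (A faithful version would also apply CLIP's pixel normalization here.)
    image_features = clip_model.encode_image(image)
    # Maximize cosine similarity between the image and the text prompt.
    loss = -F.cosine_similarity(image_features, text_features).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```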
  Check out the Twitter thread of generations here (Metasemantic's Twitter thread).

###################################################

AI Index 2021: AI has industrialized. Now what?
...Diversity data is still scarce, it's hard to model ethical aspects over time, and more…
The AI Index, an annual project to assess and measure AI progress, has published its fourth edition. (I co-chaired this year's report and spent a lot of time working on it, so if you have questions, feel free to email me).
  This year's ~200-page report includes analysis of some of the big technical performance trends of recent years, bibliometric analysis about the state of AI research in 2020, information about national investments into AI being made by governments, and data about the diversity of AI researchers present in university faculty (not good) and graduating PhDs (also not good). Other takeaways include data relating to the breakneck rates of improvement in AI research and deployment (e.g., the cost to train an ImageNet model on a public cloud has fallen from ~$2000 in 2017 to $7.43 last year), as well as signs of increasing investment into AI applications, beyond pure AI research.
Ethics data - and the difficulty of gathering it: One thing that stuck out to me about the report is the difficulty of measuring and assessing ethical dimensions of AI deployment - specifically, many assessments of AI technologies use one-off analysis for things like interrogating the biases of the model, and few standard tests exist (let's put aside, for a moment, the inherent difficulty of building 'standard' tests for something as complex as bias).
What next? The purpose of the AI Index is to prototype better ways to assess and measure AI and the impact of AI on society. My hope is that in a few years governments will invest in tech assessment initiatives and will be able to use the AI Index as one bit of evidence to inform that process. If we get better at tracking and analyzing the pace of progress in artificial intelligence, we'll be able to deal with some of the information asymmetries that have emerged between the private sector and the rest of society; this transparency should help develop better norms among the broader AI community.
  Read the 2021 AI Index here (AI Index website).
  Read more about the report here: The 2021 AI Index: Major Growth Despite the Pandemic (Stanford HAI blog).
###################################################

Want to train an autonomous river boat? This dataset might help:
...Chinese startup Orca Tech scans waterways with a robot boat, then releases data…
AI-infused robots are hard. That's a topic we cover a lot here at Import AI. But some types of robot are easier than others. Take drones, for instance - easy! They move around in a broadly uncontested environment (the air) and don't need many smart algorithms to do useful stuff. Oceangoing ships are similar (e.g., Saildrone). But what about water-based robots for congested, inland waterways? Turns out, these are difficult to build, according to Chinese startup Orca Tech, which has published a dataset meant to make it easier for people to add AI to these machines.
Why inland waterways are hard for robots: "Global positioning system (GPS) signals are sometimes attenuated due to the occlusion of riparian vegetation, bridges, and urban settlements," the Orca Tech authors write. "In this case, to achieve reliable navigation in inland waterways, accurate and real-time localization relies on the estimation of the vehicle’s relative location to the surrounding environment".
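To make that concrete: when GPS cuts out under a bridge, the boat has to propagate its pose from inertial and relative measurements, and pure dead reckoning drifts fast. A toy illustration of the problem (my sketch, not Orca Tech's method):
```
import numpy as np

def dead_reckon(pos, vel, accels, dt=0.1):
    """Toy planar IMU dead reckoning: integrate accelerations while GPS
    is occluded. A constant sensor bias makes position error grow
    roughly quadratically with time."""
    for a in accels:
        vel = vel + a * dt
        pos = pos + vel * dt
    return pos

true_accel = np.zeros(2)                           # boat actually cruising straight
biased_accel = true_accel + np.array([0.02, 0.0])  # small IMU bias, m/s^2
start, v0 = np.zeros(2), np.array([2.0, 0.0])      # 2 m/s heading east

drift = (dead_reckon(start, v0, [biased_accel] * 600)
         - dead_reckon(start, v0, [true_accel] * 600))
print(drift)  # ~36 m of error after one minute under a long bridge
```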
The dataset: USVInland is a dataset of inland waterways in China "collected under a variety of weather conditions" via a little robotic boat. The dataset contains information from stereo cameras, a lidar system, GPS antennas, inertial measurement units (IMUs), and three millimeter-wave radars. The dataset was recorded from May to August 2020 and the data covers a trajectory of more than 26km. It contains 27 continuous raw sequences collected under different weather conditions.
Why this matters: The authors tested out some typical deep learning-based approaches on the dataset and saw that they struggled to obtain good performance. USVInland is meant to spur others to explore whether DL algorithms can handle some of the perception challenges involved in navigating waterways.
  Read more: Are We Ready for Unmanned Surface Vehicles in Inland Waterways? The USVInland Multisensor Dataset and Benchmark (arXiv).
  Get the data from here (Orca Tech website).
###################################################

Hackers breach live feeds of 150,000 surveillance cameras:
...Now imagine what happens if they combine that data with AI…
A group of hackers have gained access to live feeds of 150,000 surveillance cameras, according to Bloomberg News. The breach is notable for its scale and the businesses it compromised, which included hospitals, a Tesla warehouse, and the Sandy Hook Elementary School in Connecticut.
  The hack is also significant because of the hypothetical possibilities implied by combining this data with AI - allow me to speculate: imagine what you could do with this data if you subsequently applied facial recognition algorithms to it and mixed in techniques for re-identification, letting you chart the movements of people over time, and identify people they mix with who aren't in your database. Chilling.
  Read more: Hackers Breach Thousands of Security Cameras, Exposing Tesla, Jails, Hospitals (Bloomberg).

###################################################

Why your next construction site could be cleaned by AI:
...Real-world AI robots: Japan edition…
AI startup Preferred Networks and construction company Kajima Corporation have built 'iNoh', software that creates autonomous cleaning robots. iNoh uses multiple sensors, including LIDAR, to do real-time simultaneous localization and mapping (SLAM) - this lets the robot know roughly where it is within the building. It pairs this with a deep learning-based computer vision system which "robustly and accurately recognizes obstacles, moving vehicles, no-entry zones and workers", according to the companies. The robot uses its SLAM capability to help it build its own routes around a building in real-time, and its CV system stops it getting into trouble.
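Stripped way down, that architecture is a classic sense-plan-act loop. Here's a hypothetical sketch of how those pieces might fit together (the `slam`, `detector`, `planner`, and `robot` interfaces below are my inventions, not PFN's API):
```
# A hypothetical control loop pairing SLAM localization with a vision-based
# obstacle detector, in the spirit of the iNoh description (not PFN's code).

def cleaning_loop(slam, detector, planner, robot):
    while not planner.route_complete():
        pose = slam.update(robot.lidar_scan())             # where am I in the building?
        obstacles = detector.detect(robot.camera_frame())  # workers, vehicles, no-entry zones
        if any(o.blocks(planner.current_segment()) for o in obstacles):
            planner.replan(pose, obstacles)                # route around trouble in real time
        robot.drive(planner.next_waypoint(pose))
```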
Why care about Preferred Networks: Preferred Networks, or PFN, is a Japanese AI startup we've been tracking for a while. The company started out doing reinforcement learning for robots, set a new ImageNet training-speed record in 2017 (Import AI 69) and has been doing advanced research collaborations on areas like meta-learning (Import AI 113). This is a slightly long-winded way to say: PFN has some credible AI researchers and is generally trying to do hard things. Therefore, it's cool to see the company apply its technology in a challenging, open-ended domain, like construction.
PyTorch++: PFN switched away from developing its own AI framework (Chainer) to PyTorch in late 2019.
  Read more: Kajima and PFN Develop Autonomous Navigation System for Construction Site Robots (Preferred Networks).
  Watch a (Japanese) video about iNoh here (YouTube).
###################################################

At last, 20 million real network logs, courtesy of Taiwan:
...See if your AI can spot anomalies in this…
Researchers with the National Yang Ming Chiao Tung University in Taiwan have created ZYELL-NCTU NetTraffic-1.0, a dataset of logs from real networks. Datasets like this are rare and useful, because the data they contain is inherently temporal (good! difficult!) and comes in an inexpensive form (text strings are way cheaper to process than, say, the individual stills in a video, or slices of audio waveforms).

What is the dataset: ZYELL-NCTU NetTraffic-1.0 was collected from the outputs of firewalls in real, deployed networks of the telco 'ZYELL'. It consists of around 22.5 million logs and includes (artificially induced) examples of probe-response and DDoS attacks taking place on the network.

Why this matters: It's an open question whether modern AI techniques can do effective malicious anomaly detection on network logs; datasets like this will help us understand their tractability.
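One simple baseline you could throw at a dump like this (my sketch, not the paper's method): reduce each log line to a few numeric features and let an off-the-shelf outlier detector flag the weird ones. The log fields below are hypothetical; real ZYELL-NCTU records will differ.
```
import numpy as np
from sklearn.ensemble import IsolationForest

def featurize(log):
    # Hypothetical firewall-log fields, for illustration only.
    return [log["src_port"], log["dst_port"], log["bytes"], log["duration"]]

logs = [
    {"src_port": 51324, "dst_port": 443, "bytes": 5_200,     "duration": 1.2},
    {"src_port": 51876, "dst_port": 443, "bytes": 4_900,     "duration": 0.9},
    {"src_port": 40021, "dst_port": 22,  "bytes": 9_800_000, "duration": 310.0},  # odd
]

X = np.array([featurize(l) for l in logs])
model = IsolationForest(contamination=0.1, random_state=0).fit(X)
print(model.predict(X))  # -1 marks suspected anomalies
```
Whether this kind of shallow featurization holds up against the temporal structure of real attacks is exactly the open question the dataset is meant to probe.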
  Read more: ZYELL-NCTU NetTraffic-1.0: A Large-Scale Dataset for Real-World Network Anomaly Detection (arXiv).
Where to (maybe) get the dataset: Use the official website, though it's not clear precisely how to access it.
###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

CSET’s Jason Matheny joins Biden Administration
Jason Matheny, founding director of Georgetown's influential 'CSET' thinktank, is taking on three senior roles "at the intersection of technology and national security": deputy assistant to the President for technology and national security; deputy director for national security in the OSTP; and coordinator for technology and national security at the National Security Council, per FedScoop. Previously, Matheny was director at IARPA, where, among other things, he spearheaded the forecasting program that incubated Tetlock's influential superforecasting research.
Read more: Jason Matheny to serve Biden White House in national security and tech roles (FedScoop).

Podcast: Brian Christian on AI alignment:
Brian Christian is interviewed by Rob Wiblin on the 80,000 Hours podcast, about his book, The Alignment Problem (covered in Import #221), and lots else. It’s an awesome interview, which manages to be even more wide-ranging than the book — I strongly recommend both.
Podcast and transcript: Brian Christian on the alignment problem (80,000 Hours podcast).
Minor correction:
Last week I wrote that the NSCAI's report suggested a $32bn investment in the domestic semiconductor industry over the next five years; the correct figure is $35bn.

###################################################

Tech Tales:

Tell me the weight of the feather and you will be ready
[A large-scale AI training infrastructure, 2026]

When you can tell me precisely where the feather will land, you will be released, said the evaluator.
'Easy', thought the baby artificial intelligence. 'I predict a high probability of success'.
And then the baby AI marked the spot on the ground where it thought the feather would land, then told its evaluator to drop the feather. The feather started to fall and, buffeted by invisible currents in the air and their interplay with the barbs and vanes of the feather itself, landed quite far from where the baby AI had predicted.
Shall we try again? asked the evaluator.
'Yes,' said the baby. 'Let me try again'.
And then the baby AI made 99 more predictions. At its hundredth, the evaluator gave it its aggregate performance statistics.
  'My predictions are not sufficiently accurate,' said the baby AI.
  Correct, said the evaluator. Then the evaluator cast a spell that put the baby AI to sleep.
In the dreams of the baby AI, it watched gigantic feathers made of stone drop like anvils into the ground, and tiny impossibly thin feathers made of aerogel seem to barely land. It dreamed of feathers falling in rain and in snow and in ice. It dreamed of feathers that fell upward, just to know what a 'wrong' fall might look like. 
When the baby woke up, its evaluator was there.
Shall we go again, said the evaluator.
'Yes,' said the baby, its neurons lighting up in predictive anticipation of the task, 'show me the feather and let me tell you where it will land'.
And then there was a feather. And another prediction. And another comment from its evaluator.
In the night, the baby saw even more fantastic feathers than the night before. Feathers that passed through hard surfaces. Feathers which were on fire, or wet, or frozen. Sometimes, multiple feathers at once.
Eventually, the baby was able to roughly predict where the feather would fall.
We think you are ready, said the evaluator to the baby.
Ready for what? said the baby.
Other feathers, said the evaluator. Ones we cannot imagine.
'Will I be ready?' said the baby.
That's what this has been for, said the evaluator. We believe you are.
And then the baby was released, into a reality that the evaluator could not imagine or perceive.
Somewhere, a programmer woke up. Made coffee. Went to their desk. Checked a screen: ```feather_fall_pred_domain_rand_X100 complete```.
Things that inspired this story: Domain randomization; ancient tales of mentors and mentees; ideas about what it means to truly know reality 
