Android Layout NER Dataset Search
Datasets for Named Entity Recognition on Android User Interface Layouts
- Introduction:
Named Entity Recognition (NER), a fundamental task in Natural Language Processing (NLP), traditionally focuses on identifying and categorizing entities within unstructured text into predefined classes such as persons, organizations, and locations.1 This capability has become integral to various applications, including machine translation, question-answering systems, and information retrieval.1 Extending the concept of NER to the domain of Android User Interface (UI) layouts presents a novel area of exploration. In this context, entities can be envisioned as the interactive and informational elements that constitute an application's interface. These entities could range from specific UI widgets like buttons and text fields to content elements such as dates and prices, or even encompass functional components like login forms and search bars.
The application of NER to Android UI layouts holds significant potential for advancing several areas. For instance, it can substantially enhance accessibility by enabling screen readers to provide more detailed and contextually relevant descriptions of UI elements to users with visual impairments.4 Furthermore, it can improve the ability of machines to understand UI structures, which is crucial for tasks like automated UI testing, UI generation, and the development of more intelligent mobile AI agents capable of interacting with applications at a semantic level.5 Recognizing entities within a UI allows an AI agent to understand not just the visual presentation but also the functional roles of different components, leading to more sophisticated interactions.5
Applying NER to UI layouts necessitates a shift in perspective from traditional linguistic entities to the visual and functional components of an application's interface. This requires a re-evaluation of standard NER categories and the development of new annotation schemes specifically tailored to UI elements. For example, instead of identifying a "person" or "organization," the task becomes identifying a "button" with a specific label or an "image" representing a certain object. This fundamental change in focus requires a deep understanding of Android UI structures and the roles of various UI elements within an application.32
- Overview of Relevant Android UI Datasets:
Several datasets containing information about Android UIs have been developed, which hold potential for use in Android Layout NER tasks. These datasets vary in their focus, size, data format, and the types of annotations they provide. The primary datasets identified as potentially relevant include:
- Android in the Wild (AitW): A large-scale dataset designed for Android device control research, containing human demonstrations of UI interactions.10
- Rico: A comprehensive repository of mobile application designs, encompassing visual, textual, structural, and interactive properties.5
- MobileViews: A large-scale dataset of mobile GUI elements, providing pairs of screenshots and their corresponding view hierarchies.47
- CLAY: A dataset that includes labels indicating the type of UI objects present in Android application screenshots, primarily used for screen layout denoising.6
- AMEX (Android Multi-annotation EXpo): A dataset featuring multi-level annotations for training mobile GUI-control agents, including detailed element grounding and functionality descriptions.12 It is important to distinguish this from the AMEX dataset used for default prediction.69
- RICO-WidgetCaptioning: A dataset derived from the Rico dataset, which provides natural language captions for various UI elements.46
- Datasets with Potential for Android Layout NER:
- 3.1. Android in the Wild (AitW):
The Android in the Wild (AitW) dataset is a large-scale resource for research in device control, comprising 715,000 episodes of human-demonstrated interactions with Android applications and websites.10 Each data point in the dataset is stored as a TFRecord file, compressed using GZIP.11 This format includes various fields, such as the Android API level of the emulator used, the name of the current activity, the device type (primarily Pixel devices), a unique episode identifier, the total length of the episode, and the natural language instruction provided to the user.11
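Since AitW stores each screenshot's element boxes as a single flattened array in (y, x, height, width) order, a small amount of reshaping is needed before the annotations can feed an NER pipeline. The following sketch (plain Python, with synthetic values rather than a real TFRecord) groups the flattened array into per-element boxes and converts them to the [left, top, right, bottom] convention used by other datasets such as Rico:

```python
def unflatten_positions(flat):
    """Group AitW's flattened annotation array into one
    (y, x, height, width) tuple per annotated UI element."""
    if len(flat) % 4 != 0:
        raise ValueError("expected four values per element")
    return [tuple(flat[i:i + 4]) for i in range(0, len(flat), 4)]

def yxhw_to_ltrb(box):
    """Convert a (y, x, height, width) box to [left, top, right, bottom]."""
    y, x, h, w = box
    return [x, y, x + w, y + h]

# Synthetic example: two annotated elements from one screenshot.
flat = [0.10, 0.20, 0.05, 0.30,
        0.50, 0.10, 0.08, 0.25]
boxes = unflatten_positions(flat)
print([yxhw_to_ltrb(b) for b in boxes])
```

In practice the flat array would come from the image/ui_annotations_positions field of a decoded TFRecord example; reading the GZIP-compressed TFRecord files themselves requires TensorFlow's tf.data utilities.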
For UI element information, the dataset includes OCR-detected text for each annotated element in the image/ui_annotations_text field, and the type of UI element (icon or text) in the image/ui_annotations_ui_types field.11 Furthermore, the dataset provides bounding box coordinates for these UI elements in the image/ui_annotations_positions field, with the coordinates represented as a flattened array in the format (y, x, height, width).11 While the AitW dataset does not contain explicit NER labels, the availability of OCR-detected text and precise bounding box coordinates for a large number of UI elements makes it a promising resource for creating a dataset suitable for NER on Android layouts. The provided UI type information (icon or text) can serve as an initial set of entity categories.11 The sheer scale of the dataset, with over 700,000 interaction episodes, is a significant advantage for training robust machine learning models.10 Additionally, the inclusion of natural language instructions associated with each episode 10 can provide valuable context for labeling UI elements as entities based on their purpose within a task. The availability of a research paper and the dataset on GitHub further enhances its utility.10
- 3.2. Rico Dataset:
The Rico dataset is a large-scale collection of mobile application designs, comprising visual, textual, structural, and interactive properties of over 72,000 unique UI screens from more than 9,000 Android applications across 27 categories.5 The dataset provides view hierarchies in JSON format, screenshots in PNG format, and various metadata in CSV files.37 Within the JSON view hierarchy, the bounds property specifies the bounding box of each UI element using the format [left, top, right, bottom].37
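Because Rico's view hierarchies are plain JSON, extracting candidate entities is a matter of walking the tree. The sketch below uses a minimal synthetic hierarchy, not a real Rico record; the field names (componentLabel, bounds, text, children) follow Rico's semantic annotations but may need adjusting for a particular release:

```python
def iter_nodes(node):
    """Depth-first walk over a Rico-style view hierarchy (a JSON dict)."""
    yield node
    for child in node.get("children") or []:
        yield from iter_nodes(child)

def extract_entities(root):
    """Collect (label, bounds, text) triples for every labeled node;
    bounds follow Rico's [left, top, right, bottom] convention."""
    out = []
    for n in iter_nodes(root):
        label = n.get("componentLabel")
        if label:
            out.append((label, n.get("bounds"), n.get("text")))
    return out

# Minimal synthetic example, not an actual Rico record.
hierarchy = {
    "componentLabel": None,
    "bounds": [0, 0, 1440, 2560],
    "children": [
        {"componentLabel": "Text Button", "bounds": [100, 200, 500, 300],
         "text": "Sign in", "children": []},
        {"componentLabel": "Image", "bounds": [0, 400, 1440, 1200],
         "text": None, "children": []},
    ],
}
print(extract_entities(hierarchy))
```

Each extracted label (e.g., "Text Button") can then be mapped onto an NER tag such as UI-BUTTON.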
Notably, Rico includes semantic annotations for UI elements, offering a componentLabel that categorizes elements into 24 types (e.g., "Text Button," "Image").37 It also provides textButtonClass for more specific text button concepts (197 types) and iconClass for various icon categories (97 types).37 These existing semantic labels make Rico a valuable resource for Android Layout NER, as they can be directly mapped to certain NER categories. For instance, elements labeled as "Text Button" can be tagged as "UI-BUTTON," and "Image" elements as "UI-IMAGE." The text content contained within these elements can be further analyzed using NLP techniques to identify more granular entities.37 The availability of layout vectors 37 could also be beneficial as features for an NER model. A subset of the Rico dataset has undergone manual verification to ensure the alignment between layout code and visual elements, increasing its reliability.38 The dataset's widespread use in the research community 5 underscores its significance for UI understanding tasks.
- 3.3. MobileViews Dataset:
MobileViews is a large-scale mobile GUI dataset comprising over 600,000 screenshot-view hierarchy (VH) pairs collected from more than 20,000 modern Android applications.47 The dataset is available on Hugging Face in both .zip format (containing .jpg screenshots and .json/.xml view hierarchies) and .parquet format.47 Information about the bounding boxes of UI elements is included within the view hierarchy files and also in the actions.csv file, particularly for elements that were involved in user interactions.47 The actions.csv file logs these interactions and often includes details about the type of element interacted with (e.g., <button>, <p>) along with its bounding box coordinates.47
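A coarse NER labeling pass over MobileViews could start from the element types logged in actions.csv. The sketch below parses synthetic rows with Python's csv module; the column names (screen_id, element_type, alt_text, bounds) are illustrative assumptions and should be checked against the actual file:

```python
import csv
import io

# Synthetic rows mimicking MobileViews' actions.csv; the real column
# names may differ, so treat this schema as an assumption.
sample = io.StringIO(
    "screen_id,element_type,alt_text,bounds\n"
    "s1,<button>,Submit order,\"[120, 640, 480, 720]\"\n"
    "s1,<p>,Price: $9.99,\"[40, 300, 400, 360]\"\n"
)

# Coarse mapping from logged element types to NER-style labels.
TYPE_TO_LABEL = {"<button>": "UI-BUTTON", "<p>": "UI-TEXT"}

def coarse_ner_rows(fh):
    """Yield (label, alt_text, bounds) triples with coarse NER labels."""
    for row in csv.DictReader(fh):
        label = TYPE_TO_LABEL.get(row["element_type"], "UI-OTHER")
        yield label, row["alt_text"], row["bounds"]

for item in coarse_ner_rows(sample):
    print(item)
```

The alt text can subsequently be run through a text-based NER model to recover finer-grained entities such as prices.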
The actions.csv file also provides basic semantic labels through the type of interacted element and the alt text associated with it.47 Additionally, the AppMetadata.csv file contains information about the genre and categories of each application.47 These element types found in actions.csv can serve as initial, coarse-grained NER labels, such as UI-BUTTON or UI-TEXT. The textual content present within the view hierarchies can be further analyzed to identify more specific entities. The sheer volume of data in MobileViews, with over 600,000 screenshot-VH pairs, makes it a valuable resource for training robust models for UI understanding.47 Furthermore, the dataset's focus on modern Android applications ensures its relevance to contemporary UI design patterns.48
- 3.4. CLAY Dataset:
The CLAY dataset is designed for training and evaluating models for screen layout denoising and includes UI object type labels for Android application screenshots.6 The dataset provides these labels in the clay_labels.csv file, along with a mapping from the type ID to the type string in the label_map.txt file.57 This mapping includes labels for common UI elements such as BUTTON, IMAGE, CHECKBOX, and TEXT_INPUT.57 While the provided research snippets do not explicitly detail the availability or format of bounding box coordinates within the CLAY dataset, its use in screen layout denoising strongly suggests that spatial information about the UI objects is likely included, possibly in conjunction with the labels in clay_labels.csv.
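Mapping CLAY's numeric type IDs to NER-style labels is straightforward once label_map.txt is parsed. The sketch below assumes a simple "id name" line format with illustrative entries; both the delimiter and the IDs should be verified against the actual file:

```python
def load_label_map(lines):
    """Parse a CLAY-style label map into {type_id: type_name};
    assumes one 'id<whitespace>name' pair per line."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        type_id, name = line.split(None, 1)
        mapping[int(type_id)] = name
    return mapping

# Illustrative entries only; real IDs live in label_map.txt.
sample = ["0 BACKGROUND", "1 IMAGE", "3 BUTTON", "10 TEXT_INPUT"]
label_map = load_label_map(sample)
print({tid: f"UI-{name}" for tid, name in label_map.items()})
```

The resulting names map directly onto UI entity tags (BUTTON to UI-BUTTON, TEXT_INPUT to UI-TEXT_INPUT, and so on).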
The primary focus of the CLAY dataset is to provide semantic type labels for UI objects, which are directly applicable to Named Entity Recognition. These labels define clear semantic categories for various UI elements.57 The label_map.txt file 59 offers a straightforward mapping of these entity types. The dataset has been utilized for training and assessing the performance of models aimed at cleaning noisy UI layouts.7 The availability of this dataset on GitHub 57 facilitates its accessibility for researchers and practitioners.
- 3.5. AMEX (Android Multi-annotation EXpo):
The Android Multi-annotation EXpo (AMEX) dataset is a comprehensive, large-scale resource designed to advance research on AI agents for mobile scenarios.12 It comprises over 104,000 high-resolution screenshots from 110 popular mobile applications, annotated at multiple levels and stored in JSON files.12 These annotations explicitly include bounding box information for GUI interactive elements, which have been filtered and verified by human annotators.14
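Since AMEX stores its annotations in JSON, pairing each grounded element with its functionality description is a simple transformation. The record below is synthetic, and its field names (interactive_elements, bbox, functionality) are assumptions to be checked against the released schema:

```python
import json

# A synthetic AMEX-style record; field names are illustrative assumptions.
record = json.loads("""
{
  "image": "screen_0001.png",
  "interactive_elements": [
    {"bbox": [84, 1520, 996, 1640],
     "functionality": "Tap to submit the login form"},
    {"bbox": [84, 1320, 996, 1440],
     "functionality": "Text field for the user's email address"}
  ]
}
""")

def to_ner_examples(rec):
    """Pair each grounded element's bounding box with its functionality
    description, which can serve as the entity's semantic context."""
    return [(tuple(e["bbox"]), e["functionality"])
            for e in rec["interactive_elements"]]

for bbox, desc in to_ner_examples(record):
    print(bbox, "->", desc)
```

The functionality text can then be classified into action-oriented entity types such as ACTION-LOGIN.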
AMEX provides three levels of annotation: GUI interactive element grounding (identifying interactive elements), GUI screen and element functionality descriptions (explaining what each element does), and complex natural language instructions paired with stepwise GUI-action chains (showing how elements are used to complete tasks).12 The element functionality descriptions are generated using GPT models and then manually checked for accuracy.14 These detailed annotations make AMEX exceptionally valuable for Android Layout NER. The element grounding can help identify the type of UI element, while the functionality descriptions offer semantic information about its purpose. The natural language instructions provide context on how these elements are used within application workflows. By leveraging these multi-level annotations along with the provided bounding boxes, a highly informative UI NER dataset can be created, where entities are not only categorized by type but also described by their function and role in user tasks. The dataset's focus on widely used applications ensures its relevance to real-world scenarios.12 The project and a sample of the dataset are available online.14
- 3.6. RICO-WidgetCaptioning:
The RICO-WidgetCaptioning dataset is derived from the larger Rico dataset and contains 48,325 rows of data stored in Parquet format.46 This dataset focuses on providing concise natural language captions for UI elements present on mobile screens.46 For each captioned UI element, the dataset includes bounding box coordinates (bbox) which are scaled between 0 and 1 relative to the image dimensions.46
While RICO-WidgetCaptioning does not directly provide NER labels for the UI elements themselves, the natural language captions offer a unique opportunity for applying standard NLP-based NER techniques.46 By treating the UI element as an entity and using its caption as the associated text, existing NER models can be employed to extract entities that are mentioned within the description of the UI element. For example, if a button has the caption "Login with Google," a standard NER model might identify "Google" as an ORGANIZATION. The UI element (the button in this case) can also be considered an entity, with the entire caption serving as its descriptive label or context. This approach offers an alternative way to perform Android Layout NER by leveraging the power of natural language processing on descriptions of UI elements. The dataset is readily accessible through Hugging Face.46
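Because the bbox values are normalized to the 0-1 range, they must be rescaled by the screen dimensions before being overlaid on a screenshot or passed to a layout-aware model. A minimal sketch, using a hypothetical caption and screen size:

```python
def denormalize_bbox(bbox, width, height):
    """Scale a [left, top, right, bottom] box given in 0-1 coordinates
    (as in RICO-WidgetCaptioning) back to pixel coordinates."""
    l, t, r, b = bbox
    return [round(l * width), round(t * height),
            round(r * width), round(b * height)]

# Hypothetical captioned widget on a 1440x2560 screen.
caption = "Login with Google"
pixel_box = denormalize_bbox([0.10, 0.80, 0.90, 0.88], 1440, 2560)
print(caption, pixel_box)
```

The caption itself can be fed unchanged to an off-the-shelf NER model (e.g., a spaCy or BERT-based tagger) while the pixel box anchors the extracted entities to the screen.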
- Challenges and Considerations:
When considering the use of these datasets for Android Layout NER, several challenges and considerations arise. The granularity of annotations varies significantly across the datasets 14, ranging from basic UI type labels to detailed functional descriptions and natural language captions, so practitioners must determine the level of detail required for their specific NER task. Furthermore, standard NLP NER categories like Person, Organization, and Location may not directly correspond to UI elements, necessitating the creation of a custom ontology of UI-specific entity types such as UI-Button, UI-TextField, or Action-Login.2
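Such a custom ontology can be made concrete as a label set plus the BIO encoding that most sequence-labeling NER models consume. The sketch below is illustrative; the entity types and token spans are invented for the example:

```python
# A minimal custom ontology of UI-specific entity types (illustrative).
UI_ENTITY_TYPES = [
    "UI-BUTTON", "UI-TEXTFIELD", "UI-IMAGE", "UI-CHECKBOX",
    "ACTION-LOGIN", "ACTION-SEARCH", "INFO-PRICE", "INFO-DATE",
]

def bio_tags(tokens, spans):
    """Turn (start, end, type) spans over a token list into BIO tags,
    the encoding most sequence-labeling NER models expect."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags

tokens = ["Login", "with", "Google", "button"]
print(bio_tags(tokens, [(0, 4, "UI-BUTTON")]))
```

Tokenized UI text from any of the datasets above, once aligned with element boundaries, can be exported in this format for standard NER toolkits.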
Most of these datasets will likely require some form of preprocessing to be effectively used for NER.3 This might involve parsing view hierarchies from JSON or XML files, extracting textual content from UI elements, aligning bounding box coordinates with the corresponding text, or even manual annotation if the existing labels are insufficient for the desired NER categories. The diversity of data formats used by these datasets, including JSON, XML, CSV, Parquet, and TFRecord 11, requires familiarity with different data processing libraries and techniques. Finally, the format of bounding box coordinates can differ between datasets, and ensuring consistency, potentially through conversion, is important.11 The coordinate systems used might also vary, with some using absolute pixel coordinates and others using relative coordinates.
- Recommendations and Future Work:
Based on the analysis of the available datasets, several recommendations can be made for researchers and practitioners interested in Android Layout NER:
- Android in the Wild (AitW) is a strong candidate for those seeking a large dataset with preprocessed UI element information and bounding boxes that can be adapted for custom NER labeling, owing to its substantial size and detailed annotations, including text and type information.
- Rico suits researchers looking for existing semantic labels that can be directly used or extended for UI NER, offering comprehensive view hierarchies, bounding boxes, and a variety of semantic annotations.
- MobileViews provides a large dataset of recent Android applications with interaction logs and basic semantic information, fitting those interested in modern UI designs and in identifying entities based on user interactions.
- CLAY is suitable for tasks focused on identifying the type of UI elements, as it offers a clean set of UI object type labels, although the availability of bounding boxes needs further verification.
- AMEX stands out for advanced UI NER projects aiming to understand the function and interactivity of UI elements, due to its rich set of multi-level annotations, including detailed functional descriptions.
- RICO-WidgetCaptioning provides natural language captions that can be processed with existing NER models, for those interested in exploring NLP-based NER on descriptions of UI elements.
Looking towards the future, the field would significantly benefit from the creation of a standardized dataset specifically designed for Android Layout NER. This dataset should ideally include a well-defined ontology of UI entities and their attributes, along with bounding boxes, textual content, semantic types, functional descriptions, and potentially even accessibility information. Further research into automated annotation techniques for UI elements, possibly leveraging the capabilities of Large Language Models (LLMs) and visual understanding models 5, is also crucial for generating large-scale, high-quality datasets for this task. Exploring methods to align and integrate information from the various existing datasets could also represent a valuable direction for future work.
- Conclusion:
In conclusion, several existing Android UI datasets offer valuable resources for the task of Named Entity Recognition on Android layouts. Rico and CLAY provide semantic labels for UI elements, AMEX offers functional descriptions, RICO-WidgetCaptioning provides natural language captions, AitW offers preprocessed data suitable for custom annotation, and MobileViews provides a large dataset of modern UIs with interaction information. The optimal choice of dataset will depend on the specific requirements of the NER task at hand. Addressing the inherent challenges related to annotation granularity, mapping UI elements to standard NER categories, preprocessing requirements, and the diversity of data formats is essential for effectively utilizing these resources. The development of a dedicated, standardized Android Layout NER dataset remains a key area for future research to further advance the understanding of user interfaces and the capabilities of mobile AI.
Table 1: Comparison of Android UI Datasets for NER Potential
| Dataset Name | Data Format (Primary) | Size (Approximate) | Bounding Boxes Available? | Semantic Labels Available? | Functionality Descriptions Available? | Natural Language Captions Available? | Potential NER Categories (Examples) | Key References |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Android in the Wild (AitW) | TFRecord | 715k episodes | Yes | Basic (icon/text) | No | No | UI-ICON, UI-TEXT, custom labels based on OCR | 11 |
| Rico | JSON, PNG, CSV | 72k UIs | Yes | Yes (24 component types, 197 text button, 97 icon classes) | No | No | UI-BUTTON, UI-IMAGE, UI-LIST_ITEM, etc. | 37 |
| MobileViews | ZIP (JSON/XML), Parquet | 600k+ pairs | Yes | Basic (element type in actions.csv) | No | No | UI-BUTTON, UI-TEXT, UI-IMAGE, etc. | 47 |
| CLAY | CSV, TXT | 59k+ screenshots | Likely | Yes (e.g., BUTTON, IMAGE, TEXT_INPUT) | No | No | UI-BUTTON, UI-IMAGE, UI-TEXT_INPUT, etc. | 57 |
| AMEX | JSON | 104k screenshots | Yes | Yes (interactive element grounding) | Yes | No | UI-BUTTON, ACTION-SUBMIT, INFO-TEXT, etc. | 14 |
| RICO-WidgetCaptioning | Parquet | 48k+ rows | Yes | No | No | Yes | Entities extracted from captions (e.g., LOCATION, ORGANIZATION) | 46 |
Works cited
- Recent Advances in Named Entity Recognition: A Comprehensive Survey and Comparative Study - arXiv, accessed May 3, 2025, https://arxiv.org/html/2401.10825v3
- NER - Named Entity Recognition Tutorial - Kaggle, accessed May 3, 2025, https://www.kaggle.com/code/eneszvo/ner-named-entity-recognition-tutorial
- What Is Named Entity Recognition? Selecting the Best Tool to Transform Your Model Training Data - Encord, accessed May 3, 2025, https://encord.com/blog/named-entity-recognition/
- chenjshnn/LabelDroid: Unblind Your Apps: Predicting Natural-Language Labels for Mobile GUI Components by Deep Learning - GitHub, accessed May 3, 2025, https://github.com/chenjshnn/LabelDroid
- MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling, accessed May 3, 2025, https://arxiv.org/html/2405.07090v1
- Pairwise GUI Dataset Construction Between Android Phones and Tablets, accessed May 3, 2025, https://proceedings.neurips.cc/paper_files/paper/2023/file/bc4cff0b37ccab13e98b6128d89ca172-Paper-Datasets_and_Benchmarks.pdf
- Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at Scale | Request PDF, accessed May 3, 2025, https://www.researchgate.net/publication/360249324_Learning_to_Denoise_Raw_Mobile_UI_Layouts_for_Improving_Datasets_at_Scale
- Pairwise GUI Dataset Construction Between Android Phones and Tablets - OpenReview, accessed May 3, 2025, https://openreview.net/forum?id=8gDJXL652A
- Learning Structural Similarity of User Interface Layouts using Graph Networks - European Computer Vision Association, accessed May 3, 2025, https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123670732.pdf
- AndroidInTheWild: A Large-Scale Dataset For Android Device Control | OpenReview, accessed May 3, 2025, https://openreview.net/forum?id=j4b3l5kOil&noteId=S9D1g92glC
- arxiv.org, accessed May 3, 2025, https://arxiv.org/abs/2307.10088
- AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents - Yuxiang Chai, accessed May 3, 2025, http://yxchai.com/AMEX/
- NeurIPS Poster AndroidInTheWild: A Large-Scale Dataset For Android Device Control, accessed May 3, 2025, https://neurips.cc/virtual/2023/poster/73496
- AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents - arXiv, accessed May 3, 2025, https://arxiv.org/html/2407.17490v1
- Android in the Wild: A Large-Scale Dataset for Android Device Control - ResearchGate, accessed May 3, 2025, https://www.researchgate.net/publication/372468923_Android_in_the_Wild_A_Large-Scale_Dataset_for_Android_Device_Control
- RICO: A Mobile App Dataset for Building Data-Driven Design Applications - YouTube, accessed May 3, 2025, https://www.youtube.com/watch?v=bVKgsNazl7w
- [2307.10088] Android in the Wild: A Large-Scale Dataset for Android Device Control - ar5iv, accessed May 3, 2025, https://ar5iv.labs.arxiv.org/html/2307.10088
- A Large-Scale Dataset for Android Device Control - arXiv, accessed May 3, 2025, https://arxiv.org/pdf/2307.10088
- Android in the Wild: A Large-Scale Dataset for Android Device Control - Hugging Face, accessed May 3, 2025, https://huggingface.co/papers/2307.10088
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning - arXiv, accessed May 3, 2025, https://arxiv.org/html/2406.11896v1
- [2406.11896] DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning - arXiv, accessed May 3, 2025, https://arxiv.org/abs/2406.11896
- arXiv:2311.07562v1 [cs.CV] 13 Nov 2023, accessed May 3, 2025, http://arxiv.org/pdf/2311.07562
- "Android in the Wild: A Large-Scale Dataset for Android Device Control", Rawles et al 2023 {G} (imitation-learning + PaLM-2 inner-monologue for smartphone control) : r/reinforcementlearning - Reddit, accessed May 3, 2025, https://www.reddit.com/r/reinforcementlearning/comments/154t0u3/android_in_the_wild_a_largescale_dataset_for/
- Android in the Wild: A Large-Scale Dataset for Android Device Control - Google Research, accessed May 3, 2025, https://research.google/pubs/android-in-the-wild-a-large-scale-dataset-for-android-device-control/
- alipay/mobile-agent - GitHub, accessed May 3, 2025, https://github.com/alipay/mobile-agent
- AitW Dataset - Papers With Code, accessed May 3, 2025, https://paperswithcode.com/dataset/aitw
- Android in the Wild: A Large-Scale Dataset for Android Device Control, accessed May 3, 2025, https://crawles.com/aitw/
- Android in the Wild: A Large-Scale Dataset for Android Device Control - Emergent Mind, accessed May 3, 2025, https://www.emergentmind.com/papers/2307.10088
- cjfcsjt/AITW_Single · Datasets at Hugging Face, accessed May 3, 2025, https://huggingface.co/datasets/cjfcsjt/AITW_Single
- logo DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning, accessed May 3, 2025, https://digirl-agent.github.io/
- UIBert: Learning Generic Multimodal Representations for UI Understanding - IJCAI, accessed May 3, 2025, https://www.ijcai.org/proceedings/2021/0235.pdf
- Android UI Layouts | GeeksforGeeks, accessed May 3, 2025, https://www.geeksforgeeks.org/android-ui-layouts/
- Layout basics | Mobile - Android Developers, accessed May 3, 2025, https://developer.android.com/design/ui/mobile/guides/layout-and-content/layout-basics
- Layouts in views - Android Developers, accessed May 3, 2025, https://developer.android.com/develop/ui/views/layout/declaring-layout
- Android Layout - LinearLayout, RelativeLayout - DigitalOcean, accessed May 3, 2025, https://www.digitalocean.com/community/tutorials/android-layout-linearlayout-relativelayout
- Android: How to understand layouts? - Stack Overflow, accessed May 3, 2025, https://stackoverflow.com/questions/37360802/android-how-to-understand-layouts
- Rico - Interaction Mining, accessed May 3, 2025, http://www.interactionmining.org/rico.html
- RicoSCA Dataset | Papers With Code, accessed May 3, 2025, https://paperswithcode.com/dataset/ricosca
- Pairwise GUI Dataset Construction Between Android Phones and Tablets - NIPS papers, accessed May 3, 2025, https://papers.nips.cc/paper_files/paper/2023/file/bc4cff0b37ccab13e98b6128d89ca172-Paper-Datasets_and_Benchmarks.pdf
- Rico: A Mobile App Dataset for Building Data-Driven Design Applications - Ranjitha Kumar, accessed May 3, 2025, https://ranjithakumar.net/resources/rico.pdf
- Rico: A Mobile App Dataset for Building Data-Driven Design Applications - Ranjitha Kumar, accessed May 3, 2025, https://www.ranjithakumar.net/resources/rico.pdf
- An Early Rico Retrospective: Three Years of Uses for a Mobile App Dataset - Bardia Doosti, accessed May 3, 2025, https://bardiadoosti.github.io/Papers/aiforhci/rico.pdf
- creative-graphic-design/Rico · Datasets at Hugging Face, accessed May 3, 2025, https://huggingface.co/datasets/creative-graphic-design/Rico
- An Early Rico Retrospective: Three Years Of Uses For A Mobile App Dataset, accessed May 3, 2025, https://research.google/pubs/an-early-rico-retrospective-three-years-of-uses-for-a-mobile-app-dataset/
- Interaction Mining, accessed May 3, 2025, http://interactionmining.org/
- rootsautomation/RICO-WidgetCaptioning · Datasets at Hugging Face, accessed May 3, 2025, https://huggingface.co/datasets/rootsautomation/RICO-WidgetCaptioning
- mllmTeam/MobileViews · Datasets at Hugging Face, accessed May 3, 2025, https://huggingface.co/datasets/mllmTeam/MobileViews
- Paper page - MobileViews: A Large-Scale Mobile GUI Dataset - Hugging Face, accessed May 3, 2025, https://huggingface.co/papers/2409.14337
- Dataset viewer - Hugging Face, accessed May 3, 2025, https://huggingface.co/docs/dataset-viewer/index
- MobileViews: A Large-Scale Mobile GUI Dataset | PromptLayer, accessed May 3, 2025, https://www.promptlayer.com/research-papers/mobileviews-a-large-scale-mobile-gui-dataset
- MobileViews: A Large-Scale Mobile GUI Dataset - arXiv, accessed May 3, 2025, https://arxiv.org/html/2409.14337v2
- Datasets - Hugging Face, accessed May 3, 2025, https://huggingface.co/docs/datasets/index
- [2409.14337] MobileViews: A Large-Scale Mobile GUI Dataset - arXiv, accessed May 3, 2025, https://arxiv.org/abs/2409.14337
- Datasets Overview - Hugging Face, accessed May 3, 2025, https://huggingface.co/docs/hub/datasets-overview
- xwk123/Mobile3M · Datasets at Hugging Face, accessed May 3, 2025, https://huggingface.co/datasets/xwk123/Mobile3M/viewer
- Large Language Model-Powered GUI Agents: A Survey - OpenReview, accessed May 3, 2025, https://openreview.net/pdf/6ea73900a41de66867bc8bbb6f4b3fdac4a69fce.pdf
- google-research-datasets/clay: The dataset includes UI … - GitHub, accessed May 3, 2025, https://github.com/google-research-datasets/clay
- Clay Foundation Model, accessed May 3, 2025, https://clay-foundation.github.io/model/index.html
- clay/label_map.txt at main · google-research-datasets/clay · GitHub, accessed May 3, 2025, https://github.com/google-research-datasets/clay/blob/main/label_map.txt
- nicbarker/clay: High performance UI layout library in C. - GitHub, accessed May 3, 2025, https://github.com/nicbarker/clay
- Issues Β· google-research-datasets/clay - GitHub, accessed May 3, 2025, https://github.com/google-research-datasets/clay/issues
- Github x Clay integration, accessed May 3, 2025, https://www.clay.com/integrations/data-provider/github
- Dataset available? · Clay-foundation model · Discussion #146 - GitHub, accessed May 3, 2025, https://github.com/Clay-foundation/model/discussions/146
- Training Data β Clay Foundation Model, accessed May 3, 2025, https://clay-foundation.github.io/model/release-notes/data_sampling.html
- bkiers/Clay: A small library to parse CSV files and optionally map the records from the CSV file to a Java class. - GitHub, accessed May 3, 2025, https://github.com/bkiers/Clay
- Digital Earth Pacific applications · Clay-foundation model · Discussion #140 - GitHub, accessed May 3, 2025, https://github.com/Clay-foundation/model/discussions/140
- AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents - Hugging Face, accessed May 3, 2025, https://huggingface.co/papers/2407.17490
- Yuxiang007/AMEX · Datasets at Hugging Face, accessed May 3, 2025, https://huggingface.co/datasets/Yuxiang007/AMEX
- American Express - Default Prediction | Kaggle, accessed May 3, 2025, https://www.kaggle.com/competitions/amex-default-prediction
- How American Express uses Sales Analytics to issue over 115 million credit cards to their customers, accessed May 3, 2025, https://www.getsuper.ai/post/how-american-express-uses-sales-analytics-to-issue-over-115-million-credit-cards-to-their-customers
- Investor Relations: American Express Company, accessed May 3, 2025, http://ir.americanexpress.com/
- How American Express uses Big Data to transform operations - Supply Chain Digital, accessed May 3, 2025, https://supplychaindigital.com/technology/how-american-express-uses-big-data-transform-operations
- amex datasets | Kaggle, accessed May 3, 2025, https://www.kaggle.com/datasets/takaito/amex-datasets
- jxzly/Kaggle-American-Express-Default-Prediction-1st-solution - GitHub, accessed May 3, 2025, https://github.com/jxzly/Kaggle-American-Express-Default-Prediction-1st-solution
- Raw Data Automation and Reconciliation - American Express, accessed May 3, 2025, https://www.americanexpress.com/us/merchant/raw-data.html
- Formatting training dataset for SpaCy NER - Stack Overflow, accessed May 3, 2025, https://stackoverflow.com/questions/47443976/formatting-training-dataset-for-spacy-ner
- Labelling 100k dataset for BERT-NER - nlp - Stack Overflow, accessed May 3, 2025, https://stackoverflow.com/questions/77435341/labelling-100k-dataset-for-bert-ner
- Data preprocessing for Named Entity Recognition? - Stack Overflow, accessed May 3, 2025, https://stackoverflow.com/questions/62208736/data-preprocessing-for-named-entity-recognition
- Infering Alt-text For UI Icons With Large Language Models During App Development - arXiv, accessed May 3, 2025, https://arxiv.org/html/2409.18060v1