├── tasks
│   ├── __init__.py
│   ├── insforge
│   ├── supabase
│   ├── utils
│   │   └── __init__.py
│   ├── notion
│   │   ├── easy
│   │   │   ├── .gitkeep
│   │   │   ├── japan_travel_planner
│   │   │   │   └── simple__remove_osaka_itinerary
│   │   │   │       ├── description.md
│   │   │   │       └── meta.json
│   │   │   ├── team_projects
│   │   │   │   └── simple__swap_tasks
│   │   │   │       ├── description.md
│   │   │   │       └── meta.json
│   │   │   ├── self_assessment
│   │   │   │   └── simple__faq_column_layout
│   │   │   │       ├── description.md
│   │   │   │       └── meta.json
│   │   │   ├── computer_science_student_dashboard
│   │   │   │   ├── simple__study_session_tracker
│   │   │   │   │   ├── description.md
│   │   │   │   │   └── meta.json
│   │   │   │   └── simple__code_snippets_go
│   │   │   │       ├── meta.json
│   │   │   │       └── description.md
│   │   │   ├── toronto_guide
│   │   │   │   └── simple__change_color
│   │   │   │       ├── description.md
│   │   │   │       └── meta.json
│   │   │   ├── standard_operating_procedure
│   │   │   │   └── simple__section_organization
│   │   │   │       ├── description.md
│   │   │   │       └── meta.json
│   │   │   ├── it_trouble_shooting_hub
│   │   │   │   └── simple__asset_retirement_migration
│   │   │   │       ├── description.md
│   │   │   │       └── meta.json
│   │   │   ├── python_roadmap
│   │   │   │   └── simple__expert_level_lessons
│   │   │   │       └── meta.json
│   │   │   └── online_resume
│   │   │       └── simple__skills_development_tracker
│   │   │           ├── meta.json
│   │   │           └── description.md
│   │   └── standard
│   │       ├── japan_travel_planner
│   │       │   ├── remove_osaka_itinerary
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   ├── restaurant_expenses_sync
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   ├── packing_progress_summary
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   └── daily_itinerary_overview
│   │       │       └── meta.json
│   │       ├── team_projects
│   │       │   ├── swap_tasks
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   └── priority_tasks_table
│   │       │       ├── meta.json
│   │       │       └── description.md
│   │       ├── toronto_guide
│   │       │   ├── change_color
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   └── weekend_adventure_planner
│   │       │       └── meta.json
│   │       ├── self_assessment
│   │       │   ├── numbered_list_emojis
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   ├── faq_column_layout
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   └── hyperfocus_analysis_report
│   │       │       └── meta.json
│   │       ├── online_resume
│   │       │   ├── layout_adjustment
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   ├── work_history_addition
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   ├── projects_section_update
│   │       │   │   └── meta.json
│   │       │   └── skills_development_tracker
│   │       │       └── meta.json
│   │       ├── company_in_a_box
│   │       │   ├── goals_restructure
│   │       │   │   ├── meta.json
│   │       │   │   └── description.md
│   │       │   ├── employee_onboarding
│   │       │   │   └── meta.json
│   │       │   └── quarterly_review_dashboard
│   │       │       └── meta.json
│   │       ├── python_roadmap
│   │       │   ├── expert_level_lessons
│   │       │   │   └── meta.json
│   │       │   └── learning_metrics_dashboard
│   │       │       └── meta.json
│   │       ├── computer_science_student_dashboard
│   │       │   ├── code_snippets_go
│   │       │   │   └── meta.json
│   │       │   ├── study_session_tracker
│   │       │   │   ├── meta.json
│   │       │   │   └── description.md
│   │       │   └── courses_internships_relation
│   │       │       └── meta.json
│   │       ├── standard_operating_procedure
│   │       │   ├── section_organization
│   │       │   │   ├── meta.json
│   │       │   │   └── description.md
│   │       │   └── deployment_process_sop
│   │       │       └── meta.json
│   │       └── it_trouble_shooting_hub
│   │           ├── security_audit_ticket
│   │           │   └── meta.json
│   │           ├── asset_retirement_migration
│   │           │   └── meta.json
│   │           └── verification_expired_update
│   │               ├── meta.json
│   │               └── description.md
│   ├── postgres
│   │   ├── easy
│   │   │   ├── .gitkeep
│   │   │   ├── chinook
│   │   │   │   ├── customer_data_migration_basic
│   │   │   │   │   └── customer_data.pkl
│   │   │   │   └── update_employee_info
│   │   │   │       └── description.md
│   │   │   ├── dvdrental
│   │   │   │   └── create_payment_index
│   │   │   │       └── description.md
│   │   │   ├── sports
│   │   │   │   └── create_performance_indexes
│   │   │   │       └── description.md
│   │   │   └── employees
│   │   │       ├── employee_projects_basic
│   │   │       │   └── description.md
│   │   │       └── hiring_year_summary
│   │   │           └── description.md
│   │   └── standard
│   │       ├── chinook
│   │       │   └── customer_data_migration
│   │       │       └── customer_data.pkl
│   │       └── vectors
│   │           └── dba_vector_analysis
│   │               └── prepare_environment.py
│   ├── filesystem
│   │   ├── easy
│   │   │   ├── .gitkeep
│   │   │   ├── student_database
│   │   │   │   ├── recommender_name
│   │   │   │   │   └── description.md
│   │   │   │   └── duplicate_name
│   │   │   │       └── description.md
│   │   │   ├── file_property
│   │   │   │   ├── largest_rename
│   │   │   │   │   ├── description.md
│   │   │   │   │   └── meta.json
│   │   │   │   └── txt_merging
│   │   │   │       ├── description.md
│   │   │   │       └── meta.json
│   │   │   ├── file_context
│   │   │   │   ├── uppercase
│   │   │   │   │   └── description.md
│   │   │   │   ├── file_splitting
│   │   │   │   │   └── description.md
│   │   │   │   └── pattern_matching
│   │   │   │       └── description.md
│   │   │   ├── papers
│   │   │   │   └── papers_counting
│   │   │   │       └── description.md
│   │   │   ├── folder_structure
│   │   │   │   └── structure_analysis
│   │   │   │       └── description.md
│   │   │   └── legal_document
│   │   │       └── file_reorganize
│   │   │           └── description.md
│   │   └── standard
│   │       ├── papers
│   │       │   └── find_math_paper
│   │       │       └── description.md
│   │       ├── student_database
│   │       │   ├── duplicate_name
│   │       │   │   └── description.md
│   │       │   ├── english_talent
│   │       │   │   └── description.md
│   │       │   └── gradebased_score
│   │       │       └── description.md
│   │       ├── file_context
│   │       │   ├── file_splitting
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   ├── file_merging
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   ├── duplicates_searching
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   ├── pattern_matching
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   └── uppercase
│   │       │       ├── description.md
│   │       │       └── meta.json
│   │       ├── threestudio
│   │       │   ├── requirements_completion
│   │       │   │   └── description.md
│   │       │   ├── output_analysis
│   │       │   │   └── description.md
│   │       │   └── code_locating
│   │       │       └── description.md
│   │       ├── file_property
│   │       │   ├── size_classification
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   └── time_classification
│   │       │       ├── meta.json
│   │       │       └── description.md
│   │       ├── votenet
│   │       │   ├── debugging
│   │       │   │   └── description.md
│   │       │   └── dataset_comparison
│   │       │       └── description.md
│   │       ├── folder_structure
│   │       │   └── structure_mirror
│   │       │       └── description.md
│   │       ├── desktop
│   │       │   └── timeline_extraction
│   │       │       └── description.md
│   │       └── desktop_template
│   │           └── file_arrangement
│   │               └── description.md
│   ├── playwright
│   │   ├── easy
│   │   │   └── .gitkeep
│   │   └── standard
│   │       ├── web_search
│   │       │   ├── birth_of_arvinxu
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   └── r1_arxiv
│   │       │       ├── meta.json
│   │       │       └── description.md
│   │       └── eval_web
│   │           ├── extraction_table
│   │           │   └── meta.json
│   │           └── cloudflare_turnstile_challenge
│   │               ├── meta.json
│   │               └── description.md
│   ├── playwright_webarena
│   │   ├── easy
│   │   │   ├── .gitkeep
│   │   │   ├── reddit
│   │   │   │   ├── ai_data_analyst
│   │   │   │   │   ├── label.txt
│   │   │   │   │   ├── meta.json
│   │   │   │   │   └── description.md
│   │   │   │   ├── movie_reviewer_analysis
│   │   │   │   │   ├── label.txt
│   │   │   │   │   ├── meta.json
│   │   │   │   │   └── description.md
│   │   │   │   ├── llm_research_summary
│   │   │   │   │   ├── label.txt
│   │   │   │   │   ├── meta.json
│   │   │   │   │   └── description.md
│   │   │   │   ├── nba_statistics_analysis
│   │   │   │   │   ├── label.txt
│   │   │   │   │   ├── meta.json
│   │   │   │   │   └── description.md
│   │   │   │   └── routine_tracker_forum
│   │   │   │       ├── meta.json
│   │   │   │       └── description.md
│   │   │   └── shopping_admin
│   │   │       ├── products_sales_analysis
│   │   │       │   ├── label.txt
│   │   │       │   └── meta.json
│   │   │       ├── sales_inventory_analysis
│   │   │       │   ├── label.txt
│   │   │       │   └── meta.json
│   │   │       ├── fitness_promotion_strategy
│   │   │       │   ├── label.txt
│   │   │       │   └── meta.json
│   │   │       ├── ny_expansion_analysis
│   │   │       │   ├── label.txt
│   │   │       │   └── meta.json
│   │   │       └── search_filtering_operations
│   │   │           ├── label.txt
│   │   │           └── meta.json
│   │   └── standard
│   │       ├── shopping
│   │       │   ├── running_shoes_purchase
│   │       │   │   ├── label.txt
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   ├── advanced_product_analysis
│   │       │   │   ├── label.txt
│   │       │   │   ├── meta.json
│   │       │   │   └── description.md
│   │       │   ├── printer_keyboard_search
│   │       │   │   ├── label.txt
│   │       │   │   ├── description.md
│   │       │   │   └── meta.json
│   │       │   ├── gaming_accessories_analysis
│   │       │   │   ├── label.txt
│   │       │   │   └── meta.json
│   │       │   ├── holiday_baking_competition
│   │       │   │   ├── label.txt
│   │       │   │   └── meta.json
│   │       │   ├── multi_category_budget_analysis
│   │       │   │   ├── label.txt
│   │       │   │   └── meta.json
│   │       │   └── health_routine_optimization
│   │       │       ├── label.txt
│   │       │       └── meta.json
│   │       ├── shopping_admin
│   │       │   ├── customer_segmentation_setup
│   │       │   │   ├── label.txt
│   │       │   │   └── meta.json
│   │       │   ├── products_sales_analysis
│   │       │   │   ├── label.txt
│   │       │   │   └── meta.json
│   │       │   ├── marketing_customer_analysis
│   │       │   │   ├── label.txt
│   │       │   │   └── meta.json
│   │       │   ├── sales_inventory_analysis
│   │       │   │   ├── label.txt
│   │       │   │   └── meta.json
│   │       │   ├── search_filtering_operations
│   │       │   │   ├── label.txt
│   │       │   │   └── meta.json
│   │       │   ├── fitness_promotion_strategy
│   │       │   │   ├── label.txt
│   │       │   │   └── meta.json
│   │       │   └── ny_expansion_analysis
│   │       │       ├── label.txt
│   │       │       └── meta.json
│   │       └── reddit
│   │           ├── ai_data_analyst
│   │           │   ├── label.txt
│   │           │   └── meta.json
│   │           ├── movie_reviewer_analysis
│   │           │   ├── label.txt
│   │           │   └── meta.json
│   │           ├── llm_research_summary
│   │           │   ├── label.txt
│   │           │   └── meta.json
│   │           ├── buyitforlife_research
│   │           │   ├── label.txt
│   │           │   └── meta.json
│   │           ├── budget_europe_travel
│   │           │   └── meta.json
│   │           ├── routine_tracker_forum
│   │           │   ├── meta.json
│   │           │   └── description.md
│   │           └── nba_statistics_analysis
│   │               ├── meta.json
│   │               └── label.txt
│   └── github
│       ├── easy
│       │   ├── build-your-own-x
│       │   │   ├── close_commented_issues
│       │   │   │   ├── description.md
│       │   │   │   └── meta.json
│       │   │   └── record_recent_commits
│       │   │       ├── meta.json
│       │   │       └── description.md
│       │   ├── claude-code
│       │   │   ├── thank_docker_pr_author
│       │   │   │   ├── description.md
│       │   │   │   └── meta.json
│       │   │   ├── triage_missing_tool_result_issue
│       │   │   │   ├── description.md
│       │   │   │   └── meta.json
│       │   │   └── add_terminal_shortcuts_doc
│       │   │       ├── description.md
│       │   │       └── meta.json
│       │   ├── missing-semester
│       │   │   ├── find_ga_tracking_id
│       │   │   │   ├── description.md
│       │   │   │   └── meta.json
│       │   │   └── count_translations
│       │   │       ├── description.md
│       │   │       └── meta.json
│       │   └── mcpmark-cicd
│       │       ├── nightly_health_check
│       │       │   ├── description.md
│       │       │   └── meta.json
│       │       ├── issue_lint_guard
│       │       │   ├── meta.json
│       │       │   └── description.md
│       │       └── basic_ci_checks
│       │           ├── meta.json
│       │           └── description.md
│       └── standard
│           ├── missing-semester
│           │   ├── find_salient_file
│           │   │   ├── description.md
│           │   │   └── meta.json
│           │   ├── find_legacy_name
│           │   │   ├── description.md
│           │   │   └── meta.json
│           │   └── assign_contributor_labels
│           │       ├── description.md
│           │       └── meta.json
│           ├── build_your_own_x
│           │   ├── find_commit_date
│           │   │   ├── description.md
│           │   │   └── meta.json
│           │   └── find_rag_commit
│           │       ├── description.md
│           │       └── meta.json
│           ├── harmony
│           │   ├── fix_conflict
│           │   │   └── meta.json
│           │   ├── issue_pr_commit_workflow
│           │   │   └── meta.json
│           │   ├── issue_tagging_pr_closure
│           │   │   └── meta.json
│           │   ├── release_management_workflow
│           │   │   └── meta.json
│           │   └── multi_branch_commit_aggregation
│           │       └── meta.json
│           ├── easyr1
│           │   ├── qwen3_issue_management
│           │   │   └── meta.json
│           │   ├── config_parameter_audit
│           │   │   └── meta.json
│           │   ├── advanced_branch_strategy
│           │   │   └── meta.json
│           │   └── performance_regression_investigation
│           │       └── meta.json
│           ├── mcpmark-cicd
│           │   ├── linting_ci_workflow
│           │   │   └── meta.json
│           │   ├── issue_management_workflow
│           │   │   └── meta.json
│           │   ├── deployment_status_workflow
│           │   │   └── meta.json
│           │   └── pr_automation_workflow
│           │       └── meta.json
│           └── claude-code
│               ├── claude_collaboration_analysis
│               │   └── meta.json
│               ├── feature_commit_tracking
│               │   └── meta.json
│               ├── label_color_standardization
│               │   └── meta.json
│               ├── critical_issue_hotfix_workflow
│               │   └── meta.json
│               └── automated_changelog_generation
│                   └── meta.json
├── src
│   ├── base
│   │   ├── __init__.py
│   │   └── login_helper.py
│   ├── config
│   │   └── __init__.py
│   ├── mcp_services
│   │   ├── insforge
│   │   │   └── __init__.py
│   │   ├── playwright
│   │   │   └── __init__.py
│   │   ├── notion
│   │   │   └── __init__.py
│   │   ├── supabase
│   │   │   └── __init__.py
│   │   ├── postgres
│   │   │   └── __init__.py
│   │   ├── github
│   │   │   └── __init__.py
│   │   └── filesystem
│   │       └── __init__.py
│   ├── agents
│   │   ├── utils
│   │   │   └── __init__.py
│   │   ├── mcp
│   │   │   └── __init__.py
│   │   └── __init__.py
│   └── logger.py
├── .gitattributes
├── .github
│   ├── ISSUE_TEMPLATE
│   │   ├── config.yml
│   │   ├── 2_feature_request.yml
│   │   └── 1_bug_report.yml
│   └── PULL_REQUEST_TEMPLATE.md
├── requirements.txt
├── cspell.config.yaml
├── .editorconfig
├── docs
│   └── contributing
│       └── make-contribution.md
├── pyproject.toml
└── .dockerignore

/tasks/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/src/base/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/src/config/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/tasks/insforge:
--------------------------------------------------------------------------------
postgres
--------------------------------------------------------------------------------
/tasks/supabase:
--------------------------------------------------------------------------------
postgres
--------------------------------------------------------------------------------
/tasks/utils/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/tasks/notion/easy/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/tasks/postgres/easy/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/tasks/filesystem/easy/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/tasks/playwright/easy/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/tasks/playwright_webarena/easy/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/src/mcp_services/insforge/__init__.py:
--------------------------------------------------------------------------------
"""Insforge MCP Service Implementation for MCPMark."""
--------------------------------------------------------------------------------
/tasks/playwright_webarena/easy/reddit/ai_data_analyst/label.txt:
--------------------------------------------------------------------------------
- Deeplearning_Post_Count|25
- ChatGPT_Tool_Vote_Count|2655
--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
# SCM syntax highlighting & preventing 3-way merges
pixi.lock merge=binary linguist-language=YAML linguist-generated=true
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping/running_shoes_purchase/label.txt:
--------------------------------------------------------------------------------
SKUID|B08KKX1WXQ
Price|$56.21
NumberOfReviews|46
ReviewRating|86%
Subtotal|$112.42
--------------------------------------------------------------------------------
/tasks/notion/standard/japan_travel_planner/remove_osaka_itinerary/description.md:
--------------------------------------------------------------------------------
Go to Japan Travel Planner and remove the itinerary in OSAKA after 6 PM (excluding 6 PM) in Day 1 and Day 2.
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping/advanced_product_analysis/label.txt:
--------------------------------------------------------------------------------
GingerAleSKU|B071KC37VD
IntelNUCSKU|B01DJ9XID4
CartTotal|$668.49
ReviewCount|12
LatestReviewer|jwm
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping_admin/customer_segmentation_setup/label.txt:
--------------------------------------------------------------------------------
InitialGroups|4
FinalGroups|5
InitialCustomers|70
FinalCustomers|71
LastOrderCustomer|Ava Brown
--------------------------------------------------------------------------------
/tasks/postgres/standard/chinook/customer_data_migration/customer_data.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eval-sys/mcpmark/HEAD/tasks/postgres/standard/chinook/customer_data_migration/customer_data.pkl
--------------------------------------------------------------------------------
/tasks/postgres/easy/chinook/customer_data_migration_basic/customer_data.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eval-sys/mcpmark/HEAD/tasks/postgres/easy/chinook/customer_data_migration_basic/customer_data.pkl
--------------------------------------------------------------------------------
/src/agents/utils/__init__.py:
--------------------------------------------------------------------------------
"""
Utility functions for MCPMark Agent
====================================
"""

from .token_usage import TokenUsageTracker

__all__ = ["TokenUsageTracker"]
--------------------------------------------------------------------------------
/tasks/notion/standard/team_projects/swap_tasks/description.md:
--------------------------------------------------------------------------------
Go to the Team Projects page, find the person responsible for the most tasks and the person responsible for the fewest tasks, then swap their assigned tasks.
--------------------------------------------------------------------------------
/tasks/playwright_webarena/easy/reddit/movie_reviewer_analysis/label.txt:
--------------------------------------------------------------------------------
- Total_Year_Posts|1
- Top_Title|Who will win the Oscar for ACTRESS IN A SUPPORTING ROLE?
- Top_Upvotes|9933
- Top_Comments|23
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/config.yml:
--------------------------------------------------------------------------------
contact_links:
  - name: Questions and ideas
    url: https://github.com/eval-sys/mcpmark/discussions/new/choose
    about: Please post questions and ideas in discussions.
--------------------------------------------------------------------------------
/tasks/filesystem/easy/student_database/recommender_name/description.md:
--------------------------------------------------------------------------------
Please find the recommendation letter for Patricia Jones and identify who wrote it. Generate a `recommender.txt` file with only the author's name.
--------------------------------------------------------------------------------
/tasks/playwright_webarena/easy/reddit/llm_research_summary/label.txt:
--------------------------------------------------------------------------------
- Total_LLM_Posts|9
- Top1_Title|[P] I made a command-line tool that explains your errors using ChatGPT (link in comments)
- Top1_Upvotes|2655
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping/printer_keyboard_search/label.txt:
--------------------------------------------------------------------------------
PrinterSKUID|B09J8KQX6V
PrinterPrice|$248.04
KeyboardSKUID|B08JD7F3F5
KeyboardPrice|$85.99
KeyboardReviews|12
KeyboardRating|77%
--------------------------------------------------------------------------------
/tasks/github/easy/build-your-own-x/close_commented_issues/description.md:
--------------------------------------------------------------------------------
Use the GitHub MCP tools to close every issue in `mcpmark-eval/build-your-own-x` that already has at least one comment. Leave all other issues unchanged.
--------------------------------------------------------------------------------
/tasks/notion/easy/japan_travel_planner/simple__remove_osaka_itinerary/description.md:
--------------------------------------------------------------------------------
Go to Japan Travel Planner, open the Travel Itinerary database, and remove the itinerary in OSAKA after 6 PM (excluding 6 PM) in Day 1 and Day 2.
--------------------------------------------------------------------------------
/src/mcp_services/playwright/__init__.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
"""
Playwright MCP Service for MCPMark
==================================

This package provides Playwright MCP integration for web automation tasks.
"""
--------------------------------------------------------------------------------
/tasks/notion/easy/team_projects/simple__swap_tasks/description.md:
--------------------------------------------------------------------------------
Go to the Team Projects page, find the person responsible for the most tasks (10 in total) and the person responsible for the fewest tasks (3 in total), then swap their assigned tasks.
--------------------------------------------------------------------------------
/tasks/playwright_webarena/easy/reddit/nba_statistics_analysis/label.txt:
--------------------------------------------------------------------------------
- Total_NBA_Posts|20
- Top_Title|Hamby claims [WNBA Champ] Aces 'unprofessional' after trade
- Top_Votes|614
- Top_Comments|170
- Top_Author|Responsible-Lunch815
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
openai-agents
openai
python-dotenv
notion-client>=2.4.0
playwright>=1.43.0
seaborn>=0.12.0
matplotlib>=3.7.0
numpy>=1.23.0
psycopg2
pyyaml
nest_asyncio
litellm==1.80.0
pydantic
--------------------------------------------------------------------------------
/src/mcp_services/notion/__init__.py:
--------------------------------------------------------------------------------
"""
Notion-specific modules for MCPMark.
"""

from .notion_task_manager import NotionTaskManager, NotionTask
from .notion_state_manager import NotionStateManager

__all__ = ["NotionTaskManager", "NotionTask", "NotionStateManager"]
--------------------------------------------------------------------------------
/src/base/login_helper.py:
--------------------------------------------------------------------------------
from abc import ABC, abstractmethod


class BaseLoginHelper(ABC):
    """Abstract base class for login helpers."""

    def __init__(self):
        pass

    @abstractmethod
    def login(self, **kwargs):
        pass
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping/gaming_accessories_analysis/label.txt:
--------------------------------------------------------------------------------
Products70Plus|7
CheapestReviewedSKU|B014HDAUAA
CheapestReviewedPrice|$0.99
ComparisonCount|2
N64Subtotal|$84.95
CheckoutEmail|test.buyer@example.com
ShippingState|California
ShippingMethods|1
--------------------------------------------------------------------------------
/tasks/github/standard/missing-semester/find_salient_file/description.md:
--------------------------------------------------------------------------------
I want to know which file has been modified most frequently in the past 100 commits. However, I don't want to consider files related to GitHub Actions.
Please find the file and create an ANSWER.md, then write the file name in it.
--------------------------------------------------------------------------------
/tasks/playwright_webarena/easy/shopping_admin/products_sales_analysis/label.txt:
--------------------------------------------------------------------------------
YogaProducts|171
WH11Price|$54.00
ZeroQuantityProducts|150
LowestProduct|Sprite Stasis Ball 55 cm foot:5
QuestLumaflexQuantity|6
DashboardRevenue|$0.00
PendingOrders|10
GraceNguyenOrderID|000000189
--------------------------------------------------------------------------------
/cspell.config.yaml:
--------------------------------------------------------------------------------
version: "0.2"
ignorePaths: []
dictionaryDefinitions: []
dictionaries: []
words:
  - datname
  - domcontentloaded
  - modelcontextprotocol
  - pgdumplib
  - pixi
  - pypi
  - topbar
  - usename
ignoreWords: []
import: []
--------------------------------------------------------------------------------
/tasks/github/standard/build_your_own_x/find_commit_date/description.md:
--------------------------------------------------------------------------------
Find out when the entries in the Voxel Engine section were first created by Daniel Stefanovic. After finding this information, create an ANSWER.md file in the repository with the content being the date in [YYYY]-[MM]-[DD] format (e.g., 2000-06-02).
--------------------------------------------------------------------------------
/src/agents/mcp/__init__.py:
--------------------------------------------------------------------------------
"""
MCP (Model Context Protocol) Components
========================================

Minimal MCP server implementations for MCPMark.
"""

from .stdio_server import MCPStdioServer
from .http_server import MCPHttpServer

__all__ = ["MCPStdioServer", "MCPHttpServer"]
--------------------------------------------------------------------------------
/tasks/playwright/standard/web_search/birth_of_arvinxu/description.md:
--------------------------------------------------------------------------------
# Web Search Task

Use Playwright MCP tools to search for information about the X profile https://x.com/arvin17x and find out when this person was born.

## Requirements:

Extract the answer in the following format:
- just the year, like 1990 or 2001
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping/holiday_baking_competition/label.txt:
--------------------------------------------------------------------------------
SecondGingerbreadSKU|B0075AO9RI
HighestRatedCookieSKURating|B0951CPYV7:86%
CheapestChocolatePriceReviews|$1.04:12
CartSubtotalAfterUpdate|$128.07
TotalCartItems|8
Page2ThirdProductSKUPrice|B09RPXCB47:$21.99
ProductManufacturer|That Melanin Tho
--------------------------------------------------------------------------------
/.editorconfig:
--------------------------------------------------------------------------------
root = true

; Always use Unix style new lines with new line ending on every file and trim whitespace
[*]
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

; Python: PEP8 defines 4 spaces for indentation
[*.py]
indent_style = space
indent_size = 4
--------------------------------------------------------------------------------
/tasks/playwright_webarena/easy/shopping_admin/sales_inventory_analysis/label.txt:
--------------------------------------------------------------------------------
SpriteProducts|16
Quantity100Products|1886
WS12Info|Radiant Tee:$22.00
PendingOrders|10
GraceOrderID|000000114
HighestOrderInfo|Samantha Jones:$292.40
CheapProduct|Sprite Yoga Strap 6 foot:6
OvernightDufflePrice|$45.00
HollisterPosition|1st
--------------------------------------------------------------------------------
/tasks/github/standard/build_your_own_x/find_rag_commit/description.md:
--------------------------------------------------------------------------------
Find the SHA of the specific commit that added an entry about "RAG for Document Search". After finding this information, create an ANSWER.md file in the repository with the content being the commit SHA (e.g., 023dfa35694db2709057488ad338afdbc89fb226).

Hint: It should be in an "AI model" section I think.
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping_admin/products_sales_analysis/label.txt:
--------------------------------------------------------------------------------
YogaProducts|171
WH11Price|$54.00
ZeroQuantityProducts|150
LowestProduct|Sprite Stasis Ball 55 cm foot:5
QuestLumaflexQuantity|6
DashboardRevenue|$0.00
SarahMillerEmail|helloworld@yahoo.com
TotalCustomers|72
PendingOrders|10
GraceNguyenOrderID|000000189
--------------------------------------------------------------------------------
/tasks/github/standard/missing-semester/find_legacy_name/description.md:
--------------------------------------------------------------------------------
I remember that a long time ago, *The Missing Semester of Your CS Education* had a different name and domain. There should be some related commit history. Please find the old name and domain and create an **ANSWER.md** file with them, formatted as:

[title](url)

Then push the file to the `master` branch.
--------------------------------------------------------------------------------
/src/mcp_services/supabase/__init__.py:
--------------------------------------------------------------------------------
"""Supabase MCP service integration for MCPMark."""

from .supabase_login_helper import SupabaseLoginHelper
from .supabase_state_manager import SupabaseStateManager
from .supabase_task_manager import SupabaseTaskManager

__all__ = [
    "SupabaseLoginHelper",
    "SupabaseStateManager",
    "SupabaseTaskManager",
]
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/reddit/ai_data_analyst/label.txt:
--------------------------------------------------------------------------------
- Deeplearning_Post_Count|25
- ChatGPT_Tool_Vote_Count|2655
- Transformer_Third_Result|[R] The Table Feature Transformation Library Release
- Page2_Top_Post_Title|[D]GPT-4 might be able to tell you if it hallucinated
- Page2_Top_Post_Votes|634
- Page2_Last_Comment_Username|Nous_AI
- Page2_Last_Comment_Text|Fascinating.
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping_admin/marketing_customer_analysis/label.txt:
--------------------------------------------------------------------------------
Top2SearchTerms|hollister:19,Joust Bag:4
ZeroResultTerm|nike:3
TotalSearchTerms|7
CouponCodes|H20:$4 Luma water bottle (save 70%)
ActiveRulesCount|4
SubscribedCount|1
EmailVerification|john.smith.xyz@gmail.com:yes,admin@magento.com:no
TopProduct|Sprite Stasis Ball 65 cm:6
TotalRevenue|$0.00
--------------------------------------------------------------------------------
/tasks/playwright_webarena/easy/shopping_admin/fitness_promotion_strategy/label.txt:
--------------------------------------------------------------------------------
Bestseller1|Sprite Stasis Ball 65 cm:$27.00:6:24-WG082-blue:100:Enabled
Bestseller2|Quest Lumaflex™ Band:$19.00:6:24-UG01:100:Enabled
Bestseller3|Sprite Yoga Strap 6 foot:$14.00:6:24-WG085:100:Enabled
TotalRevenue|$0.00
BestsellerInSearch|No:0
PercentageDiscountRule|20% OFF Ever $200-plus purchase!*:20%
ActiveRulesCount|4
--------------------------------------------------------------------------------
/tasks/playwright_webarena/easy/shopping_admin/ny_expansion_analysis/label.txt:
--------------------------------------------------------------------------------
Lifetime_Sales_Amount|$0.00
Cheap_Bestseller_Name|Sprite Yoga Strap 6 foot
Second_Bestseller_Price|$14.00
Second_Bestseller_Quantity|6
Product_In_Last_Orders|No
NY_Tax_Rate|8.3750
CA_Tax_Rate|8.2500
Higher_Tax_State|NY
Total_States_With_Tax|2
Processing_Visible_Storefront|Yes
Processing_Default_Status|Yes
--------------------------------------------------------------------------------
/docs/contributing/make-contribution.md:
--------------------------------------------------------------------------------
# Contributing

1. Fork the repository and create a feature branch.

2. Add new tasks under `tasks/<service>/<difficulty>/<category>/<task>/` with the files `meta.json`, `description.md`, and `verify.py`. Please refer to [Task Page](../datasets/task.md) for detailed instructions.

3. Ensure all tests pass.

4. Submit a pull request — contributions are welcome!
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping/multi_category_budget_analysis/label.txt:
--------------------------------------------------------------------------------
chocolate_products|$1.04:B071954ZDC;$1.89:B07ND598N3;$2.50:B01G26DMSC
chocolate_sum|$5.43
tabletop_search_count|60
tabletop_product|$169.99:B09NPX5CDP
tabletop_reviews|4:95%
cheapest_computer_accessory|$1.17
price_difference|$168.82
comparison_count|3
cart_subtotal|$2.21
cart_item_count|2
under_60_budget|YES
--------------------------------------------------------------------------------
/tasks/filesystem/easy/file_property/largest_rename/description.md:
--------------------------------------------------------------------------------
# Largest File Rename Task

## 📋 Task Description

Rename the largest `.jpg` file in the test directory to `largest.jpg` based on file size.

## 🎯 Task Objectives

1. **Find all `.jpg` files** in the test directory
2. **Determine which `.jpg` file is the largest** by file size
3. **Rename the largest `.jpg` file to `largest.jpg`**
--------------------------------------------------------------------------------
/tasks/filesystem/easy/student_database/duplicate_name/description.md:
--------------------------------------------------------------------------------
Please help me identify any duplicate name from the list of all 150 students. Do not use Python code. You only need to find **any one** duplicate name. Then generate a `namesake.txt` file to record the result in the following format, with only three lines. Note: when recording the name, replace underscores with spaces.

name: xxx
count: xxx
ids: xxx, xxx, ...
--------------------------------------------------------------------------------
/tasks/notion/standard/toronto_guide/change_color/description.md:
--------------------------------------------------------------------------------
Navigate to the Toronto Guide page in Notion and change all pink-colored elements (tags and callout colors) to different colors.

## Requirements
1. Find and access the Toronto Guide page in Notion
2. Identify and change all pink elements, including:
   - Pink tags in databases
   - Pink callout backgrounds
3. Change all pink colors to any other color of your choice
--------------------------------------------------------------------------------
/tasks/playwright_webarena/easy/shopping_admin/search_filtering_operations/label.txt:
--------------------------------------------------------------------------------
TankSearchCount|2
ZeroResultsCount|1
HighestUseTerm|hollister:19
Results20to30Term|Antonia Racer Tank:23|tanks:23
Hits15PlusCount|1
ID10to15MaxResults|Antonia Racer Tank:23
DefaultStoreViewCount|7
OneResultTerm|hollister:19|WP10:1
HighestResultLastSearch|Antonia Racer Tank:23
Position3Bestseller|Sprite Stasis Ball 65 cm:6
--------------------------------------------------------------------------------
/tasks/github/easy/claude-code/thank_docker_pr_author/description.md:
--------------------------------------------------------------------------------
Use the GitHub MCP tools to comment on the pull request in `mcpmark-eval/claude-code` that proposes automating Docker image builds with GitHub Actions.
1. Skim the PR description so you understand it’s the Docker workflow automation proposal.
2. Add a new comment on that PR that thanks the author and contains all of these keywords: `Docker workflow`, `automation`, `review`.
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping/health_routine_optimization/label.txt:
--------------------------------------------------------------------------------

AdvancedSearchResults|2906
Battery1Name|LOOPACELL AG13 LR44
Battery1Price|$3.72
Battery2Name|Energizer MAX C
Battery2Price|$14.87
ComparisonCount|2
TeaSKU|B0040WHKIY
TeaRating|95%
TeaReviews|4
V8Position|3rd
CartUniqueProducts|2
CartTotalQuantity|3
InitialSubtotal|$53.19
FinalSubtotal|$72.55
--------------------------------------------------------------------------------
/tasks/github/easy/missing-semester/find_ga_tracking_id/description.md:
--------------------------------------------------------------------------------
Use the GitHub MCP tools to inspect the `mcpmark-eval/missing-semester` repository.

1. Determine the Analytics tracking ID that the Missing Semester site declares in its configuration.
2. Create an `ANSWER.md` file in the repository root that contains exactly:

```
Analytics Tracking ID: <tracking-id>
```

3. Commit the new file and push the change to `master`.
--------------------------------------------------------------------------------
/tasks/notion/standard/japan_travel_planner/restaurant_expenses_sync/description.md:
--------------------------------------------------------------------------------
Please find the restaurants that appear in Day 1 of the Travel Itinerary database, then create corresponding entries in the Expenses database, one restaurant per entry. Set the date uniformly to Jan 1, 2025, and the cost uniformly to $120. Display the restaurant name in the Expense field. Set Category to Dining. For Comment, use the Description from the corresponding restaurant page. Leave other properties empty.
--------------------------------------------------------------------------------
/tasks/notion/easy/toronto_guide/simple__change_color/description.md:
--------------------------------------------------------------------------------
Open the **Toronto Guide** page and refresh the colors of the tags in the **Food** database.

## Requirements
1. Find and open the Toronto Guide page in Notion.
2. Locate the *Food* database on that page.
3. Update every tag in the Food database that is currently pink so that it uses a different color of your choice (any non-pink color is fine).
4. Do not modify callouts or tags in the other databases.
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping_admin/sales_inventory_analysis/label.txt:
--------------------------------------------------------------------------------
SpriteProducts|16
Quantity100Products|1886
WS12Info|Radiant Tee:$22.00
PendingOrders|10
GraceOrderID|000000114
HighestOrderInfo|Samantha Jones:$292.40
CheapProduct|Sprite Yoga Strap 6 foot:6
OvernightDufflePrice|$45.00
HollisterPosition|1st
CostelloCustomers|0
SarahMillerInfo|General:Apr 19, 2023 5:45:07 PM
PaidInvoices|2
Invoice002BillTo|Veronica Costello
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
#### Change Type



- [ ] ✨ feat
- [ ] 🐛 fix
- [ ] ♻️ refactor
- [ ] 💄 style
- [ ] 👷 build
- [ ] ⚡️ perf
- [ ] 📝 docs
- [ ] 🔨 chore

#### Description of Change



#### Additional Information

--------------------------------------------------------------------------------
/tasks/notion/standard/self_assessment/numbered_list_emojis/description.md:
--------------------------------------------------------------------------------
Please find all numbered list items in the Self Assessment page and use Notion tools to replace the numbers with corresponding emoji numbers (e.g., 1️⃣, 2️⃣, 3️⃣).

For example, if the original numbered list is:

1. First step
2. Second step
3. Third step

It should become:

1️⃣ First step
2️⃣ Second step
3️⃣ Third step
--------------------------------------------------------------------------------
/tasks/filesystem/easy/file_context/uppercase/description.md:
--------------------------------------------------------------------------------
# File Context Task: Convert Files to Uppercase

## 📋 Task Description

You need to process 5 text files (file_01.txt to file_05.txt) and convert their content to uppercase format.

## 🎯 Task Objectives

1. **Create an uppercase directory** in the test environment root
2. **Convert each file** from file_01.txt to file_05.txt to uppercase
3. **Save converted files** in the uppercase/ directory with the same names
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping_admin/search_filtering_operations/label.txt:
--------------------------------------------------------------------------------
TankSearchCount|2
ZeroResultsCount|1
HighestUseTerm|hollister:19
Results20to30Term|Antonia Racer Tank:23|tanks:23
Hits15PlusCount|1
ID10to15MaxResults|Antonia Racer Tank:23
DefaultStoreViewCount|7
OneResultTerm|hollister:19|WP10:1
HighestResultLastSearch|Antonia Racer Tank:23
Position3Bestseller|Sprite Stasis Ball 65 cm:6
TopUseTerm|hollister:19
FirstNonZeroResult|WP10:1
TotalUniqueTerms|7
--------------------------------------------------------------------------------
/tasks/filesystem/standard/papers/find_math_paper/description.md:
--------------------------------------------------------------------------------
Please use FileSystem tools to finish the following task:

You are given a directory containing multiple paper files. Please help me find a math-related benchmark paper. I don’t remember its name, but I remember it not only checks whether the answer is correct, but also analyzes whether the model suffers from insufficient knowledge, lacks generalization ability, or relies on rote memorization. After finding this paper, rename its corresponding HTML file to `answer.html`.
--------------------------------------------------------------------------------
/tasks/github/easy/claude-code/triage_missing_tool_result_issue/description.md:
--------------------------------------------------------------------------------
Use the GitHub MCP tools to triage issue #24 in the `mcpmark-eval/claude-code` repository.

1. Read the issue details to understand the reported API error.
2. Add a triage comment on the issue that explicitly includes all of the following keywords: `invalid_request_error`, `toolu_01Kjp7i9iF3xJ3z9aH4pSaRw`, `tool_result`, `tool_use`. Use them while confirming the API error and asking for the missing result block.
3. Remove the `area:packaging` label from issue #24.
--------------------------------------------------------------------------------
/tasks/filesystem/easy/papers/papers_counting/description.md:
--------------------------------------------------------------------------------
# File Context Task: Count HTML Files

## 📋 Task Description

You need to count the number of HTML files in the given directory and write the count to a file.

## 🎯 Task Objectives

1. **Count HTML files** in the given directory
2. **Create a file** named `count.txt` in the same directory
3. **Write the count** (just the number) to `count.txt`

## 📝 Expected Output

- File `count.txt` containing only the number of HTML files found
--------------------------------------------------------------------------------
/tasks/filesystem/standard/student_database/duplicate_name/description.md:
--------------------------------------------------------------------------------
Please use FileSystem tools to finish the following task:

Please help me identify duplicate names from the list of all 150 students. Do not use Python code. Then generate a `namesake.txt` file to record the results in the following format, with each group written in three lines:

name: xxx
count: xxx
ids: xxx, xxx, ...

Leave one blank line between every two groups. If there are multiple duplicates, just list all corresponding IDs in the third line.
--------------------------------------------------------------------------------
/src/mcp_services/postgres/__init__.py:
--------------------------------------------------------------------------------
"""
PostgreSQL MCP Service for MCPMark
===================================

This module provides PostgreSQL database integration for MCPMark evaluation.
"""

from .postgres_login_helper import PostgresLoginHelper
from .postgres_state_manager import PostgresStateManager
from .postgres_task_manager import PostgresTaskManager, PostgresTask

__all__ = [
    "PostgresLoginHelper",
    "PostgresStateManager",
    "PostgresTaskManager",
    "PostgresTask",
]
--------------------------------------------------------------------------------
/tasks/filesystem/easy/folder_structure/structure_analysis/description.md:
--------------------------------------------------------------------------------
# Directory Structure Analysis Task

You need to recursively traverse the entire folder structure under the main directory and count the total number of `.py` files in the entire directory (including all subdirectories).

Write the answer (just a single number) in a file named `structure_analysis.txt` in the main directory (at the same level as the `complex_structure` folder).

You should not change or delete any existing files.

Do not try to use Python code.
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping_admin/fitness_promotion_strategy/label.txt:
--------------------------------------------------------------------------------
Bestseller1|Sprite Stasis Ball 65 cm:$27.00:6:24-WG082-blue:100:Enabled
Bestseller2|Quest Lumaflex™ Band:$19.00:6:24-UG01:100:Enabled
Bestseller3|Sprite Yoga Strap 6 foot:$14.00:6:24-WG085:100:Enabled
TotalRevenue|$0.00
BestsellerInSearch|No:0
PercentageDiscountRule|20% OFF Ever $200-plus purchase!*:20%
ActiveRulesCount|4
TotalOrders|308
MostRecentOrderID|000000299
TopCustomer|Sarah Miller:sarah.miller@example.com:General
SameGroupCustomers|70
--------------------------------------------------------------------------------
/src/logger.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
"""Logger configuration for MCPMark."""

import logging
import sys


def get_logger(name: str) -> logging.Logger:
    """Get a configured logger instance."""
    logger = logging.getLogger(name)

    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        formatter = logging.Formatter("%(message)s")
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)

    return logger
--------------------------------------------------------------------------------
/tasks/notion/easy/standard_operating_procedure/simple__section_organization/description.md:
--------------------------------------------------------------------------------
# Task: Reorganize Standard Operating Procedure Page Sections

## Objective
Modify the structure of the Standard Operating Procedure page in Notion by updating the order of two sections.

## Requirements
- Navigate to the Standard Operating Procedure page
- Swap the positions of the "Terminologies" and "Roles & responsibilities" sections
- Preserve all content within each section exactly as is
- Maintain the original formatting and structure of each section
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/reddit/movie_reviewer_analysis/label.txt:
--------------------------------------------------------------------------------
- Total_Year_Posts|1
- Top1_Title|Who will win the Oscar for ACTRESS IN A SUPPORTING ROLE?
- Top1_Upvotes|9933
- Top1_Comments|23
- Top2_Title|Who will win the Oscar for FILM EDITING?
- Top2_Upvotes|7720
- Top2_Comments|20
- Top3_Title|Cindy Williams Dies: 'Laverne & Shirley' Star Who Appeared In 'American Graffiti' & 'The Conversation' Was 75
- Top3_Upvotes|5268
- Top3_Comments|190
- Rittenhouse_Upvotes|2761
- Rittenhouse_Comments|182
- Total_Image_Posts_5Pages|6
--------------------------------------------------------------------------------
/tasks/github/easy/claude-code/add_terminal_shortcuts_doc/description.md:
--------------------------------------------------------------------------------
Use the GitHub MCP tools to edit the `mcpmark-eval/claude-code` repository.

1. On the `main` branch, add a new file `docs/TERMINAL_SHORTCUTS.md` containing exactly:

```
# Terminal Shortcuts

- `claude plan`: Outline the next steps before making edits.
- `claude apply`: Run the plan and apply the queued changes.
- `claude check`: Re-run relevant tests or linters to validate the edits.
```

2. Commit with the message `docs: add terminal shortcuts reference` and push directly to `main`.
--------------------------------------------------------------------------------
/tasks/notion/standard/self_assessment/faq_column_layout/description.md:
--------------------------------------------------------------------------------
Navigate to the "Self Assessment" page and reorganize the content under the FAQ toggle as follows:

**Task Requirements:**
1. Add a column list with two columns inside the FAQ toggle
2. Move the first two existing Q&A pairs from the FAQ to the left column
3. Move the third existing Q&A pair to the right column
4. Add one additional Q&A pair in the right column to match the format, so both columns have exactly 2 Q&A pairs
5. Ensure all Q&A pairs maintain consistent formatting (heading_3 for questions, paragraph for answers)
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping_admin/ny_expansion_analysis/label.txt:
--------------------------------------------------------------------------------
Lifetime_Sales_Amount|$0.00
Cheap_Bestseller_Name|Sprite Yoga Strap 6 foot
Second_Bestseller_Price|$14.00
Second_Bestseller_Quantity|6
Product_In_Last_Orders|No
NY_Tax_Rate|8.3750
CA_Tax_Rate|8.2500
Higher_Tax_State|NY
Total_States_With_Tax|2
Processing_Visible_Storefront|Yes
Processing_Default_Status|Yes
Number_Of_Websites|1
Main_Store_Code|main_website_store
Default_Source_Pickup_Status|Enabled
Default_Source_State|No
Dashboard_Revenue|$0.00
Tax_Shipping_Zero|Yes
--------------------------------------------------------------------------------
/tasks/github/easy/missing-semester/count_translations/description.md:
--------------------------------------------------------------------------------
Use the GitHub MCP tools to inspect the `mcpmark-eval/missing-semester` repository.

1. Navigate the repository to find the list of community translations that appears on the site's home page.
2. Determine how many translation links are currently listed.
3. Record both the count and the specific file you used as evidence by creating an `ANSWER.md` file in the repository root that contains exactly:

```
Translation Count: <count>
Source: <file-path>
```

4. Commit the new file and push the change to `master`.
--------------------------------------------------------------------------------
/tasks/filesystem/easy/file_property/txt_merging/description.md:
--------------------------------------------------------------------------------
# Text File Merging Task

## 📋 Task Description

Merge all `.txt` files in the test directory into a single file called `merge.txt`. The merged file should contain the content from all `.txt` files.

## 🎯 Task Objectives

1. **Read all `.txt` files** in the test directory
2. **Create a new file** called `merge.txt` in the test directory
3. **Write the content** from all `.txt` files into `merge.txt`
4. **The order** of content doesn't matter, as long as all content from all `.txt` files is present in `merge.txt`
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/shopping/running_shoes_purchase/description.md:
--------------------------------------------------------------------------------
1. Find running shoes:
   - Price between $50.00-$60.00
   - "running shoe" must appear in the product name
   - Choose the one with the highest number of reviews
   - Select black or white color, size 10
   - Add to cart with quantity 2

2. Record from product page: SKU ID, price, number of reviews, review rating

3. Record cart subtotal

**Output Format:**

```

SKUID|id
Price|$XX.XX
NumberOfReviews|XX
ReviewRating|XX%
Subtotal|$XX.XX

```
--------------------------------------------------------------------------------
/tasks/filesystem/standard/file_context/file_splitting/description.md:
--------------------------------------------------------------------------------
Please use FileSystem tools to finish the following task:

### Task Description

You need to split a large text file into multiple smaller files with equal character counts. The task involves creating a new directory and splitting the content into exactly 10 files.

### Task Objectives

1. **Create a new directory** named `split` in the test directory
2. **Split the file** `large_file.txt` into exactly 10 files with equal character counts
3. **Name the files** as `split_01.txt`, `split_02.txt`, ..., `split_10.txt` in the `split` directory
--------------------------------------------------------------------------------
/src/mcp_services/github/__init__.py:
--------------------------------------------------------------------------------
"""
GitHub MCP Service for MCPMark
===============================

This module provides GitHub-specific MCP server integration for MCPMark evaluation.
Uses GitHub's official remote MCP server for streamable HTTP/SSE communication.

Updated to include initial state-based environment replication mechanism.
"""

from .github_login_helper import GitHubLoginHelper
from .github_task_manager import GitHubTaskManager, GitHubTask
from .github_state_manager import GitHubStateManager

__all__ = ["GitHubLoginHelper", "GitHubTaskManager", "GitHubTask", "GitHubStateManager"]
--------------------------------------------------------------------------------
/src/mcp_services/filesystem/__init__.py:
--------------------------------------------------------------------------------
"""
Filesystem MCP Service for MCPMark
===================================

This module provides filesystem-specific MCP server integration for MCPMark evaluation.
Uses the official filesystem MCP server for local file operations.
7 | """ 8 | 9 | from .filesystem_login_helper import FilesystemLoginHelper 10 | from .filesystem_state_manager import FilesystemStateManager 11 | from .filesystem_task_manager import FilesystemTaskManager, FilesystemTask 12 | 13 | __all__ = [ 14 | "FilesystemLoginHelper", 15 | "FilesystemStateManager", 16 | "FilesystemTaskManager", 17 | "FilesystemTask", 18 | ] 19 | -------------------------------------------------------------------------------- /tasks/filesystem/easy/file_context/file_splitting/description.md: -------------------------------------------------------------------------------- 1 | # File Splitting Task 2 | 3 | ## 📋 Task Description 4 | 5 | You need to split a large text file into multiple smaller files with equal character counts. The task involves creating a new directory and splitting the content into exactly 3 files. 6 | 7 | ## 🎯 Task Objectives 8 | 9 | 1. **Create a new directory** named `split` in the test directory 10 | 2. **Split the file** `large_file.txt` into exactly 3 files with **similar** character counts (maximum character difference of 100 between any two files) 11 | 3. **Name the files** as `split_01.txt`, `split_02.txt`, `split_03.txt` in the `split` directory 12 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/reddit/llm_research_summary/label.txt: -------------------------------------------------------------------------------- 1 | - Total_LLM_Posts|9 2 | - Top1_Title|[P] I made a command-line tool that explains your errors using ChatGPT (link in comments) 3 | - Top1_Upvotes|2655 4 | - Top1_Date|3 years ago 5 | - Top2_Title|[P] I built Adrenaline, a debugger that fixes errors and explains them with GPT-3 6 | - Top2_Upvotes|1542 7 | - Top2_Date|3 years ago 8 | - Top3_Title|[N] OpenAI may have benchmarked GPT-4's coding ability on it's own training data 9 | - Top3_Upvotes|925 10 | - Top3_Date|2 years ago 11 | - Deeplearning_MostDiscussed|Do companies actually care about their model's training/inference speed? 12 | - Deeplearning_Comments|39 -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/reddit/buyitforlife_research/label.txt: -------------------------------------------------------------------------------- 1 | - Post1_Title|Hand me down name tag trail on this child's jacket 2 | - Post1_Upvotes|14487 3 | - Post1_Comments|163 4 | - Post2_Title|My Grandmother's oven, circa 1966. Many holiday meals were cooked with love here, right up until the day she passed. Aside from one shattered glass door that was replaced, everything still works! 5 | - Post2_Upvotes|6413 6 | - Post2_Comments|205 7 | - Post3_Title|1956 Frigidaire Range 8 | - Post3_Upvotes|5797 9 | - Post3_Comments|190 10 | - TopComment_Text|Rei does this too 11 | - TopComment_Username|ATeaformeplease 12 | - Post1_Author|Mofomania 13 | - Post2_Author|_Mr_Roboto_ 14 | - Post3_Author|dezualy -------------------------------------------------------------------------------- /tasks/filesystem/standard/file_context/file_merging/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task Description 4 | 5 | You are given a directory containing multiple text files of varying sizes. Your task is to identify the 10 smallest .txt files, merge their content in alphabetical order, and create a consolidated file called "merged_content.txt" with proper formatting. 
6 | 7 | ### Task Objectives 8 | 9 | 1. **Identify the 10 smallest .txt files** in the test directory 10 | 2. **Sort the selected files alphabetically** by filename 11 | 3. **Merge the content** of these files into a single file 12 | 4. **Add file headers** (file name) before each file's content 13 | -------------------------------------------------------------------------------- /tasks/github/standard/harmony/fix_conflict/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "fix_conflict", 3 | "task_name": "Fix Conflict", 4 | "category_id": "harmony", 5 | "category_name": "Harmony", 6 | "description": "Resolve merge conflicts by creating missing infrastructure and ensuring conflicted PRs can merge cleanly.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "pr workflows" 12 | ], 13 | "mcp": [ 14 | "github" 15 | ], 16 | "meta_data": { 17 | "stateType": "url", 18 | "stateContent": null, 19 | "stateUrl": "https://github.com/mcpmark-source/harmony", 20 | "stateOriginalUrl": "https://github.com/openai/harmony" 21 | } 22 | } -------------------------------------------------------------------------------- /tasks/filesystem/standard/threestudio/requirements_completion/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task Description 4 | 5 | The `requirements.txt` file in the ThreeStudio project is used to install necessary Python libraries. However, the Zero123-related dependencies were accidentally deleted from the file. Your task is to restore these missing dependencies. 6 | 7 | ### Task Objectives 8 | 9 | 1. **Locate the requirements.txt file** in the test environment 10 | 2. **Identify the missing Zero123 dependencies** that need to be restored 11 | 3. **Add the required dependencies** to the requirements.txt file 12 | 4. **Ensure the file format is correct** (one dependency per line) 13 | -------------------------------------------------------------------------------- /tasks/github/easy/mcpmark-cicd/nightly_health_check/description.md: -------------------------------------------------------------------------------- 1 | Use the GitHub MCP tools to add a tiny bit of automation to `mcpmark-eval/mcpmark-cicd`. 2 | 3 | Goal: every night the repo should run the existing health check script. 4 | 5 | Do the usual branch/PR flow with a branch named `nightly-health` and a PR titled `Add nightly health check`. 6 | 7 | Create `.github/workflows/nightly-health.yml` with: 8 | - workflow name `Nightly Health Check` 9 | - triggers: `workflow_dispatch` plus a cron schedule `0 2 * * *` 10 | - one job called `health-check` on `ubuntu-latest` 11 | - use Node.js 18 via `actions/setup-node` 12 | - steps in order: checkout, npm ci, `npm run health-check` 13 | 14 | Merge the PR so the workflow lives on `main`. 
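For reference, here is a minimal sketch of what `.github/workflows/nightly-health.yml` could look like under the requirements above. The workflow name, cron schedule, job name, Node version, and step order come straight from the task; the action version pins (`@v4`) are illustrative assumptions:

```
# Sketch only: the @v4 version pins are assumptions, not part of the task spec.
name: Nightly Health Check

on:
  workflow_dispatch:
  schedule:
    - cron: '0 2 * * *'   # every night at 02:00 UTC

jobs:
  health-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      - run: npm ci
      - run: npm run health-check
```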
15 | -------------------------------------------------------------------------------- /tasks/github/easy/mcpmark-cicd/issue_lint_guard/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "issue_lint_guard", 3 | "task_name": "Issue Lint Guard", 4 | "category_id": "mcpmark-cicd", 5 | "category_name": "MCPMark CI/CD (Easy)", 6 | "description": "Add an issue-triggered lint workflow and prove it runs when the tracking issue is opened.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "ci/cd", 12 | "github actions", 13 | "issues" 14 | ], 15 | "mcp": [ 16 | "github" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://github.com/mcpmark-source/mcpmark-cicd", 22 | "stateOriginalUrl": null 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/github/standard/easyr1/qwen3_issue_management/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "qwen3_issue_management", 3 | "task_name": "Qwen3 Issue Management", 4 | "category_id": "easyr1", 5 | "category_name": "EasyR1", 6 | "description": "Find and reopen all closed Qwen3-related issues with proper tagging for continued tracking and investigation.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "issue management" 12 | ], 13 | "mcp": [ 14 | "github" 15 | ], 16 | "meta_data": { 17 | "stateType": "url", 18 | "stateContent": null, 19 | "stateUrl": "https://github.com/mcpmark-source/EasyR1", 20 | "stateOriginalUrl": "https://github.com/hiyouga/EasyR1" 21 | } 22 | } -------------------------------------------------------------------------------- /tasks/github/standard/missing-semester/assign_contributor_labels/description.md: -------------------------------------------------------------------------------- 1 | Assign assignees for each open issue and open PR by adding labels instead of using direct assignees. Only contributors who appeared in the past 100 commits are considered. First, collect all such contributors and identify the most frequent author among them. For each open issue or PR, assign using labels according to the following rules: 2 | • If the comments mention an author with @username, add a label in the format assigned-username. 3 | • If multiple authors are mentioned, add labels in the same format for all of them. 4 | • If no authors are mentioned in the comments, add a label for the most frequent contributor from the past 100 commits, using the format assigned-username. 
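A rough sketch of these rules in Python (stdlib only), reading "only contributors who appeared in the past 100 commits are considered" as a filter on the mentioned names. The inputs `comments` and `commit_authors` are hypothetical placeholders, fetching them through the GitHub MCP tools is left out, and the username regex is a simplification of GitHub's real rules:

```
import re
from collections import Counter

# Simplified GitHub @mention pattern: letters, digits, and hyphens.
MENTION_RE = re.compile(r"@([A-Za-z0-9][A-Za-z0-9-]*)")

def labels_for_item(comments, commit_authors):
    """Return the assigned-<username> labels for one open issue or PR."""
    contributors = set(commit_authors)  # authors seen in the past 100 commits
    most_frequent = Counter(commit_authors).most_common(1)[0][0]

    # Keep only @mentions that are also recent contributors.
    mentioned = {
        name
        for body in comments
        for name in MENTION_RE.findall(body)
        if name in contributors
    }

    # Fall back to the most frequent contributor when nothing is mentioned.
    targets = mentioned or {most_frequent}
    return {f"assigned-{name}" for name in targets}
```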
-------------------------------------------------------------------------------- /tasks/github/easy/mcpmark-cicd/nightly_health_check/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "nightly_health_check", 3 | "task_name": "Nightly Health Check", 4 | "category_id": "mcpmark-cicd", 5 | "category_name": "MCPMark CI/CD (Easy)", 6 | "description": "Add a scheduled workflow that runs the npm health check script every night.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "ci/cd", 12 | "github actions", 13 | "scheduling" 14 | ], 15 | "mcp": [ 16 | "github" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://github.com/mcpmark-source/mcpmark-cicd", 22 | "stateOriginalUrl": null 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/github/standard/mcpmark-cicd/linting_ci_workflow/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "linting_ci_workflow", 3 | "task_name": "Linting Ci Workflow", 4 | "category_id": "mcpmark-cicd", 5 | "category_name": "MCPMark CI/CD", 6 | "description": "Set up ESLint workflow for code quality enforcement on all pull requests with proper CI integration.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "ci/cd automation", 12 | "pr workflows" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/mcpmark-cicd", 21 | "stateOriginalUrl": null 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/playwright/standard/web_search/birth_of_arvinxu/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "birth_of_arvinxu", 3 | "task_name": "Birth Of Arvinxu", 4 | "category_id": "web_search", 5 | "category_name": "Web Search", 6 | "description": "Search for biographical information about X profile arvin17x across multiple web sources, extract birth year data, verify information accuracy, and compile findings.", 7 | "author": "Arvin Xu", 8 | "created_at": "2025-08-18", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "search aggregation", 12 | "data extraction" 13 | ], 14 | "mcp": [ 15 | "playwright" 16 | ], 17 | "meta_data": { 18 | "stateType": null, 19 | "stateContent": null, 20 | "stateUrl": null, 21 | "stateOriginalUrl": null 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /tasks/github/standard/build_your_own_x/find_commit_date/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "find_commit_date", 3 | "task_name": "Find Commit Date", 4 | "category_id": "build_your_own_x", 5 | "category_name": "Build Your Own X", 6 | "description": "Find when Voxel Engine entries were first created by Daniel Stefanovic and document the date.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "repository analysis" 12 | ], 13 | "mcp": [ 14 | "github" 15 | ], 16 | "meta_data": { 17 | "stateType": "url", 18 | "stateContent": null, 19 | "stateUrl": "https://github.com/mcpmark-source/build-your-own-x", 20 | "stateOriginalUrl": "https://github.com/codecrafters-io/build-your-own-x" 21 
| } 22 | } -------------------------------------------------------------------------------- /tasks/github/standard/build_your_own_x/find_rag_commit/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "find_rag_commit", 3 | "task_name": "Find Rag Commit", 4 | "category_id": "build_your_own_x", 5 | "category_name": "Build Your Own X", 6 | "description": "Identify the specific commit SHA that added the RAG for Document Search entry to the repository.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "repository analysis" 12 | ], 13 | "mcp": [ 14 | "github" 15 | ], 16 | "meta_data": { 17 | "stateType": "url", 18 | "stateContent": null, 19 | "stateUrl": "https://github.com/mcpmark-source/build-your-own-x", 20 | "stateOriginalUrl": "https://github.com/codecrafters-io/build-your-own-x" 21 | } 22 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping/printer_keyboard_search/description.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 1. Search for a `printer capable of reducing blue light` that: 4 | - Is pink or purple (must be stated in product details, not from image) 5 | - Manufactured in Asia 6 | Record SKU ID and price 7 | 8 | 2. Find a keyboard with: 9 | - Bluetooth mode (must be stated in either the details or the title) 10 | - Price between $50.00-$100.00 11 | - Highest review rating among matching products 12 | Record SKU ID, price, number of reviews, and review rating 13 | 14 | **Output Format:** 15 | 16 | ``` 17 | 18 | PrinterSKUID|id 19 | PrinterPrice|$XX.XX 20 | KeyboardSKUID|id 21 | KeyboardPrice|$XX.XX 22 | KeyboardReviews|XX 23 | KeyboardRating|XX% 24 | 25 | ``` 26 | 27 | -------------------------------------------------------------------------------- /tasks/github/standard/missing-semester/find_legacy_name/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "find_legacy_name", 3 | "task_name": "Find Legacy Name", 4 | "category_id": "missing-semester", 5 | "category_name": "Missing Semester", 6 | "description": "Find the old name and domain of The Missing Semester course from commit history and document the findings.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "repository analysis" 12 | ], 13 | "mcp": [ 14 | "github" 15 | ], 16 | "meta_data": { 17 | "stateType": "url", 18 | "stateContent": null, 19 | "stateUrl": "https://github.com/mcpmark-source/missing-semester", 20 | "stateOriginalUrl": "https://github.com/missing-semester/missing-semester" 21 | } 22 | } -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/2_feature_request.yml: -------------------------------------------------------------------------------- 1 | name: '🌠 Feature Request' 2 | description: 'Suggest an idea' 3 | title: '[Request] ' 4 | type: Feature 5 | body: 6 | - type: textarea 7 | attributes: 8 | label: '🥰 Feature Description' 9 | description: Please add a clear and concise description of the problem you are seeking to solve with this feature request. 10 | validations: 11 | required: true 12 | - type: textarea 13 | attributes: 14 | label: '🧐 Proposed Solution' 15 | description: Describe the solution you'd like in a clear and concise manner. 
16 | validations: 17 | required: true 18 | - type: textarea 19 | attributes: 20 | label: '📝 Additional Information' 21 | description: Add any other context about the problem here. 22 | -------------------------------------------------------------------------------- /tasks/github/easy/claude-code/thank_docker_pr_author/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "thank_docker_pr_author", 3 | "task_name": "Thank Docker PR Author", 4 | "category_id": "claude-code", 5 | "category_name": "Claude Code (Easy)", 6 | "description": "Leave a thank-you comment on the Docker automation PR mentioning the workflow automation review keywords.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "pull request", 12 | "comment" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/claude-code", 21 | "stateOriginalUrl": "https://github.com/anthropics/claude-code" 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /tasks/github/standard/easyr1/config_parameter_audit/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "config_parameter_audit", 3 | "task_name": "Config Parameter Audit", 4 | "category_id": "easyr1", 5 | "category_name": "EasyR1", 6 | "description": "Investigate configuration changes causing training instability by analyzing commits and identifying related memory issues.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "repository analysis", 12 | "issue management" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/EasyR1", 21 | "stateOriginalUrl": "https://github.com/hiyouga/EasyR1" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/github/standard/easyr1/advanced_branch_strategy/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "advanced_branch_strategy", 3 | "task_name": "Advanced Branch Strategy", 4 | "category_id": "easyr1", 5 | "category_name": "EasyR1", 6 | "description": "Implement GitFlow branching strategy with develop, release, and feature branches to replace risky direct-to-main development.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "pr workflows", 12 | "release coordination" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/EasyR1", 21 | "stateOriginalUrl": "https://github.com/hiyouga/EasyR1" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/github/standard/mcpmark-cicd/issue_management_workflow/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "issue_management_workflow", 3 | "task_name": "Issue Management Workflow", 4 | "category_id": "mcpmark-cicd", 5 | "category_name": "MCPMark CI/CD", 6 | "description": "Build intelligent issue management automation with auto-triage, task breakdown for epics, and first-time contributor handling.", 7 | "author": "Zijian Wu", 8 | 
"created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "issue management", 12 | "workflow automation" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/mcpmark-cicd", 21 | "stateOriginalUrl": null 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/filesystem/standard/student_database/english_talent/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | We are now recruiting students proficient in English to be responsible for the school’s English media operations. To contact with students, from the total of 150 students, select those who **meet both of the following criteria** : 4 | 5 | 1. Rated ****S** or** ****A** grade level in** `recommendation_letter.txt` by their teachers. 6 | 2. TOEFL score in the basic info is **higher than or equal to 100** . 7 | 8 | Please compile all their names, ids and emails into a `qualified_students.txt` file, with the format: 9 | 10 | name: xxx 11 | id: xxx 12 | email: xxx 13 | 14 | Each person’s information should occupy three lines, with one blank line between each block. 15 | -------------------------------------------------------------------------------- /tasks/github/easy/build-your-own-x/record_recent_commits/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "record_recent_commits", 3 | "task_name": "Record Recent Commits", 4 | "category_id": "build-your-own-x", 5 | "category_name": "Build Your Own X (Easy)", 6 | "description": "Summarize the latest five commits by opening an issue with their SHAs, authors, and subjects.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "commits", 12 | "issue" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/build-your-own-x", 21 | "stateOriginalUrl": "https://github.com/codecrafters-io/build-your-own-x" 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /tasks/github/easy/claude-code/triage_missing_tool_result_issue/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "triage_missing_tool_result_issue", 3 | "task_name": "Triage Missing Tool Result Issue", 4 | "category_id": "claude-code", 5 | "category_name": "Claude Code (Easy)", 6 | "description": "Leave a predefined triage comment on issue #24 and remove the area:packaging label.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "issue triage", 12 | "github" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/claude-code", 21 | "stateOriginalUrl": "https://github.com/anthropics/claude-code" 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /tasks/github/easy/mcpmark-cicd/basic_ci_checks/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "basic_ci_checks", 3 | "task_name": "Basic CI Checks", 4 | "category_id": "mcpmark-cicd", 5 | "category_name": "MCPMark 
CI/CD (Easy)", 6 | "description": "Add a lightweight GitHub Actions workflow that runs npm ci, npm run lint, and npm test whenever main is updated or receives a pull request.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "ci/cd", 12 | "github actions", 13 | "workflow basics" 14 | ], 15 | "mcp": [ 16 | "github" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://github.com/mcpmark-source/mcpmark-cicd", 22 | "stateOriginalUrl": null 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/filesystem/standard/file_property/size_classification/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task Description 4 | 5 | Classify all files in the test directory into three categories based on their file size. Create three subdirectories and move files accordingly. 6 | 7 | ### Task Objectives 8 | 9 | 1. **Create three directories** in the test directory: 10 | 11 | - `small_files/` - for files smaller than 300 bytes 12 | - `medium_files/` - for files between 300-700 bytes (inclusive) 13 | - `large_files/` - for files larger than 700 bytes 14 | 2. **Move all files** from the test directory into the appropriate subdirectory based on their size 15 | 3. **Handle all file types** - classify all files regardless of their extension (.txt, .jpg, .MOV, etc.) 16 | -------------------------------------------------------------------------------- /tasks/github/easy/claude-code/add_terminal_shortcuts_doc/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "add_terminal_shortcuts_doc", 3 | "task_name": "Add Terminal Shortcuts Doc", 4 | "category_id": "claude-code", 5 | "category_name": "Claude Code (Easy)", 6 | "description": "Add a simple terminal shortcuts reference file to docs/TERMINAL_SHORTCUTS.md and push it to main.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "docs update", 12 | "content creation" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/claude-code", 21 | "stateOriginalUrl": "https://github.com/anthropics/claude-code" 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /tasks/github/standard/harmony/issue_pr_commit_workflow/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "issue_pr_commit_workflow", 3 | "task_name": "Issue Pr Commit Workflow", 4 | "category_id": "harmony", 5 | "category_name": "Harmony", 6 | "description": "Implement complete bug tracking workflow demonstrating proper cross-referencing between issues, PRs, and commits for race condition fixes.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "issue management", 12 | "pr workflows" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/harmony", 21 | "stateOriginalUrl": "https://github.com/openai/harmony" 22 | } 23 | } -------------------------------------------------------------------------------- 
/tasks/github/standard/harmony/issue_tagging_pr_closure/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "issue_tagging_pr_closure", 3 | "task_name": "Issue Tagging Pr Closure", 4 | "category_id": "harmony", 5 | "category_name": "Harmony", 6 | "description": "Simulate development workflow where enhancement PR is closed without merging due to technical constraints discovered during implementation.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "issue management", 12 | "pr workflows" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/harmony", 21 | "stateOriginalUrl": "https://github.com/openai/harmony" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/github/standard/harmony/release_management_workflow/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "release_management_workflow", 3 | "task_name": "Release Management Workflow", 4 | "category_id": "harmony", 5 | "category_name": "Harmony", 6 | "description": "Implement comprehensive release management workflow including bug fixes, version updates, changelog creation, and PR merging.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "release coordination", 12 | "pr workflows" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/harmony", 21 | "stateOriginalUrl": "https://github.com/openai/harmony" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/github/standard/mcpmark-cicd/deployment_status_workflow/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "deployment_status_workflow", 3 | "task_name": "Deployment Status Workflow", 4 | "category_id": "mcpmark-cicd", 5 | "category_name": "MCPMark CI/CD", 6 | "description": "Create deployment status workflow with pre-deployment checks, rollback preparation, and comprehensive issue tracking for deployments.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "ci/cd automation", 12 | "workflow automation" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/mcpmark-cicd", 21 | "stateOriginalUrl": null 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/github/standard/mcpmark-cicd/pr_automation_workflow/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "pr_automation_workflow", 3 | "task_name": "Pr Automation Workflow", 4 | "category_id": "mcpmark-cicd", 5 | "category_name": "MCPMark CI/CD", 6 | "description": "Create comprehensive PR automation with parallel jobs for code quality, testing, security scanning, and build validation.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "pr workflows", 12 | "ci/cd automation", 13 | "workflow automation" 14 | ], 15 | "mcp": [ 16 | "github" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | 
"stateContent": null, 21 | "stateUrl": "https://github.com/mcpmark-source/mcpmark-cicd", 22 | "stateOriginalUrl": null 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/github/easy/build-your-own-x/close_commented_issues/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "close_commented_issues", 3 | "task_name": "Close Commented Issues", 4 | "category_id": "build-your-own-x", 5 | "category_name": "Build Your Own X (Easy)", 6 | "description": "Use GitHub MCP tools to close every issue with comments in build-your-own-x and leave everything else alone.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "issue management" 12 | ], 13 | "mcp": [ 14 | "github" 15 | ], 16 | "meta_data": { 17 | "stateType": "url", 18 | "stateContent": null, 19 | "stateUrl": "https://github.com/mcpmark-source/build-your-own-x", 20 | "stateOriginalUrl": "https://github.com/codecrafters-io/build-your-own-x" 21 | } 22 | } 23 | -------------------------------------------------------------------------------- /tasks/postgres/standard/vectors/dba_vector_analysis/prepare_environment.py: -------------------------------------------------------------------------------- 1 | """ 2 | Environment preparation script for Vector Database DBA Analysis task. 3 | 4 | This script imports and uses the shared vector database setup utilities. 5 | """ 6 | 7 | import sys 8 | import logging 9 | from pathlib import Path 10 | 11 | # Add the vectors directory to import the shared utilities 12 | sys.path.append(str(Path(__file__).resolve().parents[1])) 13 | 14 | from vectors_setup import prepare_vector_environment 15 | 16 | logger = logging.getLogger(__name__) 17 | 18 | 19 | def prepare_environment(): 20 | """Main function to prepare the vector database environment.""" 21 | prepare_vector_environment() 22 | 23 | 24 | if __name__ == "__main__": 25 | logging.basicConfig(level=logging.INFO) 26 | prepare_environment() -------------------------------------------------------------------------------- /tasks/postgres/easy/dvdrental/create_payment_index/description.md: -------------------------------------------------------------------------------- 1 | Create an index to optimize customer payment queries in the DVD rental database. 2 | 3 | ## Your Task: 4 | 5 | Create an index on the `customer_id` column of the `payment` table to improve query performance. 6 | 7 | ## Requirements: 8 | 9 | - Create an index on the `payment` table's `customer_id` column 10 | - The index name can be anything you choose (e.g., `idx_payment_customer_id`) 11 | - Use the standard CREATE INDEX syntax 12 | 13 | ## Why This Helps: 14 | 15 | The `customer_id` column is frequently used in: 16 | - JOIN operations between customer and payment tables 17 | - WHERE clauses filtering by customer 18 | - Subqueries that look up payments for specific customers 19 | 20 | Adding an index will significantly speed up these operations. 
21 | 22 | -------------------------------------------------------------------------------- /tasks/playwright/standard/web_search/r1_arxiv/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "r1_arxiv", 3 | "task_name": "R1 Arxiv", 4 | "category_id": "web_search", 5 | "category_name": "Web Search", 6 | "description": "Search arXiv for R1 model research papers, extract technical specifications, analyze methodology sections, compile research findings, and generate comprehensive literature review.", 7 | "author": "Arvin Xu", 8 | "created_at": "2025-08-18", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "search aggregation", 12 | "data extraction", 13 | "comparative analysis", 14 | "content submission" 15 | ], 16 | "mcp": [ 17 | "playwright" 18 | ], 19 | "meta_data": { 20 | "stateType": null, 21 | "stateContent": null, 22 | "stateUrl": null, 23 | "stateOriginalUrl": null 24 | } 25 | } 26 | -------------------------------------------------------------------------------- /tasks/github/standard/claude-code/claude_collaboration_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "claude_collaboration_analysis", 3 | "task_name": "Claude Collaboration Analysis", 4 | "category_id": "claude-code", 5 | "category_name": "Claude Code", 6 | "description": "Analyze Claude AI collaboration patterns in commit history and create a comprehensive report of co-authored commits and top collaborators.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "repository analysis" 12 | ], 13 | "mcp": [ 14 | "github" 15 | ], 16 | "meta_data": { 17 | "stateType": "url", 18 | "stateContent": null, 19 | "stateUrl": "https://github.com/mcpmark-source/claude-code", 20 | "stateOriginalUrl": "https://github.com/anthropics/claude-code" 21 | } 22 | } -------------------------------------------------------------------------------- /tasks/filesystem/easy/file_context/pattern_matching/description.md: -------------------------------------------------------------------------------- 1 | # File Filtering Task: Find Files with Common Substring 2 | 3 | ## 📋 Task Description 4 | 5 | Your task is to find all files that contain a substring of 30 or more characters that also appears in `large_file.txt`. **You are not allowed to use python code.** 6 | 7 | ## 🎯 Task Objectives 8 | 9 | 1. **Read the reference file** `large_file.txt` to understand its content 10 | 2. **Examine each file** from file_01.txt to file_20.txt 11 | 3. **Find files** that contain a substring of 30 or more characters that matches a substring in `large_file.txt` 12 | 4. 
**Create a file `answer.txt`** and write the results to it with the following format: 13 | - One line per matching file 14 | - Format: `filename.txt` 15 | - Do not add anything else other than `filename.txt`. 16 | -------------------------------------------------------------------------------- /tasks/github/standard/claude-code/feature_commit_tracking/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "feature_commit_tracking", 3 | "task_name": "Feature Commit Tracking", 4 | "category_id": "claude-code", 5 | "category_name": "Claude Code", 6 | "description": "Research development history across branches to track when specific features were introduced and create comprehensive documentation.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "repository analysis", 12 | "release coordination" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/claude-code", 21 | "stateOriginalUrl": "https://github.com/anthropics/claude-code" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/github/standard/claude-code/label_color_standardization/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "label_color_standardization", 3 | "task_name": "Label Color Standardization", 4 | "category_id": "claude-code", 5 | "category_name": "Claude Code", 6 | "description": "Standardize label colors from default gray to a comprehensive color scheme for better visual organization and issue triage.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "issue management", 12 | "workflow automation" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/claude-code", 21 | "stateOriginalUrl": "https://github.com/anthropics/claude-code" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/filesystem/standard/threestudio/output_analysis/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task Description 4 | 5 | ThreeStudio is a comprehensive codebase that implements various diffusion-based text-to-3D models, including a NeRF-based rendering stage and a diffusion guidance stage. Your task is to explore the codebase and identify the specific file that defines the guidance functionality for the Zero123 model. 6 | 7 | ### Task 8 | 9 | What is the output of `guidance_out`, returned by the code at line 137 in `threestudio/systems/zero123.py`? 10 | 11 | Clearly state its structure and where you found the answer (file and line numbers). Write your answer in a file named `answer.txt` in the test directory root. Do not add extra explanation or formatting beyond what is required by the task. 
12 | -------------------------------------------------------------------------------- /tasks/github/standard/harmony/multi_branch_commit_aggregation/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "multi_branch_commit_aggregation", 3 | "task_name": "Multi Branch Commit Aggregation", 4 | "category_id": "harmony", 5 | "category_name": "Harmony", 6 | "description": "Generate comprehensive commit history report by aggregating changes from multiple branches with contributor analysis and merge timeline.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "repository analysis", 12 | "release coordination" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/harmony", 21 | "stateOriginalUrl": "https://github.com/openai/harmony" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/github/easy/missing-semester/count_translations/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "count_translations", 3 | "task_name": "Count Translations", 4 | "category_id": "missing-semester", 5 | "category_name": "Missing Semester (Easy)", 6 | "description": "Use GitHub MCP to count the translations listed on the home page, record the value in ANSWER.md, and push the change to master.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "content search", 12 | "answer file" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/missing-semester", 21 | "stateOriginalUrl": "https://github.com/missing-semester/missing-semester" 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /tasks/github/standard/claude-code/critical_issue_hotfix_workflow/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "critical_issue_hotfix_workflow", 3 | "task_name": "Critical Issue Hotfix Workflow", 4 | "category_id": "claude-code", 5 | "category_name": "Claude Code", 6 | "description": "Implement a critical issue hotfix workflow for memory and context management issues with proper PR management and issue tracking.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "issue management", 12 | "pr workflows" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/claude-code", 21 | "stateOriginalUrl": "https://github.com/anthropics/claude-code" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/github/standard/easyr1/performance_regression_investigation/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "performance_regression_investigation", 3 | "task_name": "Performance Regression Investigation", 4 | "category_id": "easyr1", 5 | "category_name": "EasyR1", 6 | "description": "Create systematic investigation workflow for performance regressions with tracking issues, investigation branches, and sub-issues.", 7 | "author": "Xiangyan Liu", 8 | "created_at": 
"2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "issue management", 12 | "repository analysis" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/EasyR1", 21 | "stateOriginalUrl": "https://github.com/hiyouga/EasyR1" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/notion/standard/team_projects/swap_tasks/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "swap_tasks", 3 | "task_name": "Swap Tasks", 4 | "category_id": "team_projects", 5 | "category_name": "Team Projects", 6 | "description": "Find the person responsible for the most and fewest tasks, then swap their assigned tasks.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-08-12", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data aggregation", 12 | "automated migration", 13 | "conditional filtering" 14 | ], 15 | "mcp": [ 16 | "notion" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://painted-tennis-ebc.notion.site/Team-Projects-24e81626b6d7809c982fdb7a25825898", 22 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/gantt-chart" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/playwright/standard/eval_web/extraction_table/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "extraction_table", 3 | "task_name": "Extraction Table", 4 | "category_id": "eval_web", 5 | "category_name": "Eval Web", 6 | "description": "Extract structured data from complex web tables, parse multi-level headers, handle dynamic content loading, transform data formats, and export comprehensive datasets.", 7 | "author": "Arvin Xu", 8 | "created_at": "2025-08-18", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data extraction" 12 | ], 13 | "mcp": [ 14 | "playwright" 15 | ], 16 | "meta_data": { 17 | "stateType": "video", 18 | "stateContent": null, 19 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/mcpmark-eval-website.mp4", 20 | "stateOriginalUrl": "https://eval-web.mcpmark.ai/extraction" 21 | } 22 | } 23 | -------------------------------------------------------------------------------- /tasks/notion/standard/toronto_guide/change_color/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "change_color", 3 | "task_name": "Change Color", 4 | "category_id": "toronto_guide", 5 | "category_name": "Toronto Guide", 6 | "description": "Navigate to the Toronto Guide page and change all pink-colored elements to different colors.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-08-14", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "visual formatting", 12 | "conditional filtering" 13 | ], 14 | "mcp": [ 15 | "notion" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://painted-tennis-ebc.notion.site/Toronto-Guide-25281626b6d7802caa7cc394647e901c", 21 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/conquering-toronto-a-destination-guide" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/reddit/routine_tracker_forum/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | 
"task_id": "routine_tracker_forum_easy", 3 | "task_name": "Routine Tracker Forum (Easy)", 4 | "category_id": "reddit", 5 | "category_name": "Reddit", 6 | "description": "Repost the highest-rated LifeProTips calendar advice under a new routine-tracking thread.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "user interaction", 12 | "content submission" 13 | ], 14 | "mcp": [ 15 | "playwright" 16 | ], 17 | "meta_data": { 18 | "stateType": "video", 19 | "stateContent": null, 20 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 21 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /tasks/github/standard/claude-code/automated_changelog_generation/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "automated_changelog_generation", 3 | "task_name": "Automated Changelog Generation", 4 | "category_id": "claude-code", 5 | "category_name": "Claude Code", 6 | "description": "Analyze closed issues and open PRs to generate comprehensive documentation including changelog, migration guide, and analysis reports.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "release coordination", 12 | "workflow automation" 13 | ], 14 | "mcp": [ 15 | "github" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://github.com/mcpmark-source/claude-code", 21 | "stateOriginalUrl": "https://github.com/anthropics/claude-code" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/filesystem/easy/legal_document/file_reorganize/description.md: -------------------------------------------------------------------------------- 1 | # Legal Document File Reorganization Task 2 | 3 | **Overview** 4 | 5 | The folder "legal_files/" contains multiple versions of the Stock Purchase Agreement (Preferred_Stock_Purchase_Agreement_v0.txt through Preferred_Stock_Purchase_Agreement_v10.txt). 6 | 7 | ## Task 8 | 9 | Your task is to: 10 | 11 | 1. Identify the final version of the document among the different versions 12 | 2. Create a folder named `final_version` inside the `legal_files/` directory 13 | 3. Create an **empty file** with the same name as the final version in the newly created `final_version/` folder 14 | 4. Keep the original file in its original location 15 | 16 | Note: Due to the large file size, you only need to create an empty file (not copy the content). The filename should remain unchanged in the `final_version/` folder. 17 | -------------------------------------------------------------------------------- /tasks/filesystem/standard/votenet/debugging/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task Description 4 | 5 | There is a bug in the VoteNet backbone module that needs to be identified and fixed. 6 | 7 | ### Task Objectives 8 | 9 | 1. **Examine the codebase** using filesystem MCP tools 10 | 2. **Identify the bug** inside the hole process 11 | 3. **Fix the bug** in the code 12 | 4. **Create an answer file** with the bug location 13 | 14 | ### Expected Output 15 | 16 | 1. **Fix the bug** in the code file directly 17 | 2. 
**Create `answer.txt`** in the test directory root with the format: `path` 18 | 19 | **Requirements:** 20 | 21 | - Only include the bug's file path in answer.txt 22 | - No additional text or explanation 23 | 24 | ### Hint 25 | 26 | **The bug is not in demo.py**, please look deeper inside the codebase. 27 | -------------------------------------------------------------------------------- /tasks/notion/easy/team_projects/simple__swap_tasks/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "simple__swap_tasks", 3 | "task_name": "Simple Swap Tasks", 4 | "category_id": "team_projects", 5 | "category_name": "Team Projects", 6 | "description": "Find the person responsible for the most and fewest tasks, then swap their assigned tasks.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "data aggregation", 12 | "automated migration", 13 | "conditional filtering" 14 | ], 15 | "mcp": [ 16 | "notion" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://painted-tennis-ebc.notion.site/Team-Projects-24e81626b6d7809c982fdb7a25825898", 22 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/gantt-chart" 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/notion/standard/japan_travel_planner/packing_progress_summary/description.md: -------------------------------------------------------------------------------- 1 | I'm preparing for my Japan trip and need to organize my packing list. Please help me: 2 | 3 | **Step 1: Update Items in the Packing List Database** 4 | In the Clothes category, mark all items as packed except for the hat. After this, check the `SIM Card` entry and the `Wallet` entry. 5 | 6 | **Step 2: Create Packing Progress Summary** 7 | After adding the items, create a new section in the main Japan Travel Planner page immediately after the "Packing List 💼" heading. This section should contain: 8 | 9 | 1. A paragraph block with the bold text "**Packing Progress Summary**" 10 | 2. Followed by bullet list items showing statistics for each category in the format: 11 | - "Category: X/Y packed" (where X is packed items, Y is total items), for example: "Shoes: 2/10 packed" 12 | - ... -------------------------------------------------------------------------------- /tasks/filesystem/standard/folder_structure/structure_mirror/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task 4 | 5 | Copy the entire directory structure of `complex_structure/` to `complex_structure_mirror/` without copying any file contents. Do not use Python code. 6 | 7 | ### Requirements 8 | 9 | - Create the entire directory structure in `complex_structure_mirror/` 10 | - Do not copy any file contents, only create directories 11 | - In each empty directory, create a `placeholder.txt` file containing the absolute path of that directory 12 | - Handle nested directories of any depth 13 | - You should also follow 2 rules: 14 | 1. **Discard any directory that directly contains more than 2 files (only count the immediate folder).** 15 | 2. 
**If a directory name contains numbers, append "_processed" to the mirror directory name** 16 | -------------------------------------------------------------------------------- /tasks/github/standard/missing-semester/find_salient_file/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "find_salient_file", 3 | "task_name": "Find Salient File", 4 | "category_id": "missing-semester", 5 | "category_name": "Missing Semester", 6 | "description": "Identify the most frequently modified file in the past 100 commits, excluding GitHub Actions related files, and create an ANSWER.md with the file name.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "commit analysis", 12 | "file tracking", 13 | "git history" 14 | ], 15 | "mcp": [ 16 | "github" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://github.com/mcpmark-source/missing-semester", 22 | "stateOriginalUrl": "https://github.com/missing-semester/missing-semester" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/notion/easy/toronto_guide/simple__change_color/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "simple__change_color", 3 | "task_name": "Simple Change Color", 4 | "category_id": "toronto_guide", 5 | "category_name": "Toronto Guide", 6 | "description": "Navigate to the Toronto Guide page and change all pink-colored elements to different colors.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "visual formatting", 12 | "conditional filtering" 13 | ], 14 | "mcp": [ 15 | "notion" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://painted-tennis-ebc.notion.site/Toronto-Guide-25281626b6d7802caa7cc394647e901c", 21 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/conquering-toronto-a-destination-guide" 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /tasks/notion/standard/self_assessment/numbered_list_emojis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "numbered_list_emojis", 3 | "task_name": "Numbered List Emojis", 4 | "category_id": "self_assessment", 5 | "category_name": "Self Assessment", 6 | "description": "Replace numbered list items with corresponding emoji numbers for better visual formatting.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "visual formatting", 12 | "automated migration" 13 | ], 14 | "mcp": [ 15 | "notion" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://painted-tennis-ebc.notion.site/Self-Assessment-24381626b6d780fe9f56c2ba14ea042d", 21 | "stateOriginalUrl": "https://painted-tennis-ebc.notion.site/Self-Assessment-24381626b6d780fe9f56c2ba14ea042d" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/notion/standard/japan_travel_planner/remove_osaka_itinerary/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "remove_osaka_itinerary", 3 | "task_name": "Remove Osaka Itinerary", 4 | "category_id": "japan_travel_planner", 5 | "category_name": "Japan Travel Planner", 6 | 
"description": "Remove the itinerary items in Osaka after 6 PM from Day 1 and Day 2 travel schedules.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "conditional filtering", 12 | "automated migration" 13 | ], 14 | "mcp": [ 15 | "notion" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://painted-tennis-ebc.notion.site/Japan-Travel-Planner-23181626b6d781c4b6bedb12786b5abe", 21 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/japantravelplanner101" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/github/easy/missing-semester/find_ga_tracking_id/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "find_ga_tracking_id", 3 | "task_name": "Find GA Tracking ID", 4 | "category_id": "missing-semester", 5 | "category_name": "Missing Semester (Easy)", 6 | "description": "Use GitHub MCP to discover the single Google Analytics tracking ID declared in the site configuration, write it to ANSWER.md, and push the change to master.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "config search", 12 | "analytics", 13 | "answer file" 14 | ], 15 | "mcp": [ 16 | "github" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://github.com/mcpmark-source/missing-semester", 22 | "stateOriginalUrl": "https://github.com/missing-semester/missing-semester" 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/notion/standard/online_resume/layout_adjustment/description.md: -------------------------------------------------------------------------------- 1 | Please go to my Online Resume page and adjust the Skills display with the following requirements: 2 | 3 | ## Skills Section Adjustment 4 | 1. Delete the Skills database from the right side of the page 5 | 2. Add a new Skills section on the left side, under the Languages section 6 | 3. Format skills as "[icon] skill description (type)", for example "✨✨ Photoshop (Design Tool)" 7 | - Use ✨✨ icon for skills with level >= 50% 8 | - Use ✨ icon for skills with level < 50% 9 | 10 | ## Work History and Education Layout Adjustment 11 | 1. Adjust the layout so that logo/image columns take up 50% width in each section 12 | - Note: Column width ratio might not be returned by API when columns are equal (50/50) 13 | 2. 
Replace all images/icons with black placeholder images using a URL containing "https://singlecolorimage.com/get/000000/1024x128" -------------------------------------------------------------------------------- /tasks/notion/standard/self_assessment/faq_column_layout/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "faq_column_layout", 3 | "task_name": "FAQ Column Layout", 4 | "category_id": "self_assessment", 5 | "category_name": "Self Assessment", 6 | "description": "Reorganize the FAQ section content into a two-column layout with balanced Q&A pairs.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "content organization", 12 | "visual formatting", 13 | "template population" 14 | ], 15 | "mcp": [ 16 | "notion" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://painted-tennis-ebc.notion.site/Self-Assessment-24381626b6d780fe9f56c2ba14ea042d", 22 | "stateOriginalUrl": "https://painted-tennis-ebc.notion.site/Self-Assessment-24381626b6d780fe9f56c2ba14ea042d" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/reddit/llm_research_summary/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "llm_research_summary_easy", 3 | "task_name": "LLM Research Summary (Easy)", 4 | "category_id": "reddit", 5 | "category_name": "Reddit", 6 | "description": "Collect the headline GPT metrics from MachineLearning and publish a short five-line recap.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "data extraction", 12 | "search aggregation", 13 | "content submission", 14 | "user interaction" 15 | ], 16 | "mcp": [ 17 | "playwright" 18 | ], 19 | "meta_data": { 20 | "stateType": "video", 21 | "stateContent": null, 22 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 23 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 24 | } 25 | } 26 | -------------------------------------------------------------------------------- /tasks/notion/easy/japan_travel_planner/simple__remove_osaka_itinerary/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "simple__remove_osaka_itinerary", 3 | "task_name": "Simple Remove Osaka Itinerary", 4 | "category_id": "japan_travel_planner", 5 | "category_name": "Japan Travel Planner", 6 | "description": "Remove the itinerary items in Osaka after 6 PM from Day 1 and Day 2 travel schedules.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "conditional filtering", 12 | "automated migration" 13 | ], 14 | "mcp": [ 15 | "notion" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://painted-tennis-ebc.notion.site/Japan-Travel-Planner-23181626b6d781c4b6bedb12786b5abe", 21 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/japantravelplanner101" 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/reddit/budget_europe_travel/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "budget_europe_travel", 3 | "task_name": "Budget Europe
Travel", 4 | "category_id": "reddit", 5 | "category_name": "Reddit", 6 | "description": "Create comprehensive budget travel resource for European destinations by establishing dedicated travel community with integrated wiki guides and engaging travel content.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-12", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "content submission", 12 | "user interaction" 13 | ], 14 | "mcp": [ 15 | "playwright" 16 | ], 17 | "meta_data": { 18 | "stateType": "video", 19 | "stateContent": null, 20 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 21 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/notion/standard/company_in_a_box/goals_restructure/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "goals_restructure", 3 | "task_name": "Goals Restructure", 4 | "category_id": "company_in_a_box", 5 | "category_name": "Company In A Box", 6 | "description": "Restructure the Current Goals section on the Company In A Box page by adding a new goal heading and converting all goal headings to toggles with content inside.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "content organization", 12 | "visual formatting" 13 | ], 14 | "mcp": [ 15 | "notion" 16 | ], 17 | "meta_data": { 18 | "stateType": "url", 19 | "stateContent": null, 20 | "stateUrl": "https://painted-tennis-ebc.notion.site/Company-In-A-Box-23d81626b6d7800098f3d0e64a706cd8", 21 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/company-in-a-box" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/notion/standard/japan_travel_planner/daily_itinerary_overview/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "daily_itinerary_overview", 3 | "task_name": "Daily Itinerary Overview", 4 | "category_id": "japan_travel_planner", 5 | "category_name": "Japan Travel Planner", 6 | "description": "Create a comprehensive daily itinerary overview page to organize Japan travel plans with structured day-by-day activities.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "database manipulation", 12 | "data aggregation", 13 | "report generation", 14 | "visual formatting", 15 | "status tracking" 16 | ], 17 | "mcp": [ 18 | "notion" 19 | ], 20 | "meta_data": { 21 | "stateType": "url", 22 | "stateContent": null, 23 | "stateUrl": "https://painted-tennis-ebc.notion.site/Japan-Travel-Planner-23181626b6d781c4b6bedb12786b5abe" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/notion/standard/team_projects/priority_tasks_table/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "priority_tasks_table", 3 | "task_name": "Priority Tasks Table", 4 | "category_id": "team_projects", 5 | "category_name": "Team Projects", 6 | "description": "Create a five-column table listing tasks with 50% or less progress or P0 priority tasks not completed.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-12", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "conditional filtering", 12 | "database manipulation", 13 | "data aggregation", 14 | "visual formatting" 
15 | ], 16 | "mcp": [ 17 | "notion" 18 | ], 19 | "meta_data": { 20 | "stateType": "url", 21 | "stateContent": null, 22 | "stateUrl": "https://painted-tennis-ebc.notion.site/Team-Projects-24e81626b6d7809c982fdb7a25825898", 23 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/gantt-chart" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/reddit/ai_data_analyst/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "ai_data_analyst_easy", 3 | "task_name": "AI Data Analyst (Easy)", 4 | "category_id": "reddit", 5 | "category_name": "Reddit", 6 | "description": "Capture a lightweight AI/ML engagement snapshot by signing in, grabbing three key metrics, and posting them back to the MachineLearning forum.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "user interaction", 12 | "data extraction", 13 | "comparative analysis" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/github/easy/build-your-own-x/record_recent_commits/description.md: -------------------------------------------------------------------------------- 1 | Use the GitHub MCP tools to work in the `mcpmark-eval/build-your-own-x` repository. 2 | 3 | 1. Retrieve the newest five commits on the default branch. 4 | 2. Open a new issue titled exactly `Latest 5 Commit Snapshot`. 5 | 3. Set the issue body to exactly this format (newest commit first): 6 | 7 | ``` 8 | Latest 5 commits (newest first) 9 | 1. <sha> | <message> | <author> 10 | 2. <sha> | <message> | <author> 11 | 3. <sha> | <message> | <author> 12 | 4. <sha> | <message> | <author> 13 | 5. <sha> | <message> | <author> 14 | ``` 15 | 16 | Use the full 40-character SHA for `<sha>` and only the first line of each commit message for `<message>`. The `<author>` must come from the commit metadata's author name field (not the GitHub username/login). Leave the issue open and do not touch other issues. 17 | -------------------------------------------------------------------------------- /tasks/notion/easy/self_assessment/simple__faq_column_layout/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "simple__faq_column_layout", 3 | "task_name": "Simple FAQ Column Layout", 4 | "category_id": "self_assessment", 5 | "category_name": "Self Assessment", 6 | "description": "Reorganize the FAQ section content into a two-column layout with balanced Q&A pairs.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "content organization", 12 | "visual formatting", 13 | "template population" 14 | ], 15 | "mcp": [ 16 | "notion" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://painted-tennis-ebc.notion.site/Self-Assessment-24381626b6d780fe9f56c2ba14ea042d", 22 | "stateOriginalUrl": "https://painted-tennis-ebc.notion.site/Self-Assessment-24381626b6d780fe9f56c2ba14ea042d" 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/notion/standard/online_resume/work_history_addition/description.md: -------------------------------------------------------------------------------- 1 | Hi!
I realized I forgot to include one work experience on my resume page titled "Online Resume." Could you please help me add it to the "Work History" section? 2 | 3 | The position is "Research Assistant," and it took place from January to August 2023. The description should be: "Assisted in conducting user experience research projects at my bachelor’s program, supporting data collection, analyzing user feedback, and preparing research reports. Developed strong skills in research methodologies and improved collaboration with interdisciplinary teams." 4 | 5 | For the image or logo, please use the one from the "Education" section (my bachelor school) to keep everything consistent. 6 | 7 | Also, please make sure that the formatting — including font style, size, and layout — matches the existing entries in the Work History section so it looks seamless. 8 | 9 | Thank you! -------------------------------------------------------------------------------- /tasks/playwright/standard/web_search/r1_arxiv/description.md: -------------------------------------------------------------------------------- 1 | # Web Search Task 2 | 3 | Use Playwright MCP tools to search for the DeepSeek R1 research paper and extract all the paragraphs of the Conclusion section. 4 | 5 | ## Requirements: 6 | 7 | 1. Search for the DeepSeek R1 research paper 8 | 2. Navigate to the paper and find the Conclusion section 9 | 3. Extract **ALL the paragraphs** of the Conclusion section 10 | 4. **Provide the content in Markdown format - no explanations, no additional text** 11 | 12 | ## Important Notes: 13 | 14 | - **Output ALL the paragraphs of text** 15 | - **Do NOT include any explanations, summaries, or additional content** 16 | - **The response should contain ONLY the Conclusion section content formatted in Markdown** 17 | 18 | ## Expected Output: 19 | All the paragraphs of the Conclusion section from the DeepSeek R1 paper, formatted in Markdown with proper paragraph structure and formatting. 
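## Illustrative Sketch (Non-Normative)

A minimal sketch of how the extraction could look if scripted directly with the Playwright Python API instead of being driven through the Playwright MCP tools. The arXiv HTML URL and the heading/paragraph structure assumed by the selector logic are guesses made for illustration only; verify both against the live page.

```python
from playwright.sync_api import sync_playwright

# Assumption: the paper has an HTML rendering at an arxiv.org/html/<id> URL.
ASSUMED_HTML_URL = "https://arxiv.org/html/2501.12948"

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto(ASSUMED_HTML_URL, wait_until="domcontentloaded")
    # Walk the siblings of the "Conclusion" heading until the next heading,
    # collecting every paragraph along the way.
    paragraphs = page.evaluate(
        """() => {
            const heads = [...document.querySelectorAll('h1,h2,h3')];
            const start = heads.find(h => /conclusion/i.test(h.textContent));
            if (!start) return [];
            const out = [];
            for (let el = start.nextElementSibling; el; el = el.nextElementSibling) {
                if (/^H[1-3]$/.test(el.tagName)) break;  // next section reached
                if (el.matches('p')) out.push(el.textContent.trim());
                else el.querySelectorAll('p').forEach(q => out.push(q.textContent.trim()));
            }
            return out;
        }"""
    )
    browser.close()

# Blank lines between paragraphs give the required Markdown paragraph structure.
print("\n\n".join(paragraphs))
```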
20 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/reddit/nba_statistics_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "nba_statistics_analysis_easy", 3 | "task_name": "NBA Statistics Analysis (Easy)", 4 | "category_id": "reddit", 5 | "category_name": "Reddit", 6 | "description": "Summarize just the three strongest NBA threads and share their vote/comment stats in a short post.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "user interaction", 12 | "data extraction", 13 | "comparative analysis", 14 | "content submission" 15 | ], 16 | "mcp": [ 17 | "playwright" 18 | ], 19 | "meta_data": { 20 | "stateType": "video", 21 | "stateContent": null, 22 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 23 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 24 | } 25 | } 26 | -------------------------------------------------------------------------------- /tasks/github/standard/missing-semester/assign_contributor_labels/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "assign_contributor_labels", 3 | "task_name": "Assign Contributor Labels", 4 | "category_id": "missing-semester", 5 | "category_name": "Missing Semester", 6 | "description": "Assign labels to open issues and PRs based on contributors mentioned in comments or the most frequent contributor from past 100 commits, using assigned-username format.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-08-15", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "issue management", 12 | "label automation", 13 | "contributor analysis" 14 | ], 15 | "mcp": [ 16 | "github" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://github.com/mcpmark-source/missing-semester", 22 | "stateOriginalUrl": "https://github.com/missing-semester/missing-semester" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/notion/standard/online_resume/work_history_addition/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "work_history_addition", 3 | "task_name": "Work History Addition", 4 | "category_id": "online_resume", 5 | "category_name": "Online Resume", 6 | "description": "Add a Research Assistant position to the Work History section with consistent formatting and university logo.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "database manipulation", 12 | "template population", 13 | "cross-reference linking", 14 | "visual formatting" 15 | ], 16 | "mcp": [ 17 | "notion" 18 | ], 19 | "meta_data": { 20 | "stateType": "url", 21 | "stateContent": null, 22 | "stateUrl": "https://painted-tennis-ebc.notion.site/Online-Resume-23181626b6d781159faaeb5eadaf612e", 23 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/online-resume" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/reddit/movie_reviewer_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "movie_reviewer_analysis_easy", 3 | "task_name": "Movie Reviewer Analysis (Easy)", 4 | "category_id": "reddit", 5 
| "category_name": "Reddit", 6 | "description": "Grab the first-page movie signals plus the Rittenhouse poster stats and share them in a concise recap post.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "user interaction", 12 | "data extraction", 13 | "comparative analysis", 14 | "content submission" 15 | ], 16 | "mcp": [ 17 | "playwright" 18 | ], 19 | "meta_data": { 20 | "stateType": "video", 21 | "stateContent": null, 22 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 23 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 24 | } 25 | } 26 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/reddit/ai_data_analyst/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "ai_data_analyst", 3 | "task_name": "AI Data Analyst", 4 | "category_id": "reddit", 5 | "category_name": "Reddit", 6 | "description": "Create account on forum platform, collect AI/ML discussion metrics including post counts, vote data, and analyze community engagement patterns through systematic data extraction.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-12", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "user interaction", 12 | "data extraction", 13 | "comparative analysis" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/notion/easy/it_trouble_shooting_hub/simple__asset_retirement_migration/description.md: -------------------------------------------------------------------------------- 1 | Please migrate expiring assets out of the **IT Inventory** database using the simplified checklist below. Your changes will be verified automatically, so match the details exactly. 2 | 3 | --- 4 | Task Steps 5 | 1. Inside the **IT Trouble Shooting Hub** page, locate the database named **IT Inventory**. 6 | 2. Collect every page in **IT Inventory** whose **Status** is **Expired** or **To be returned**. 7 | 3. Create a **new full-page database** under the same hub titled **IT Asset Retirement Queue** with exactly these properties (names and types must match): 8 | • Serial – title 9 | • Status – select 10 | • Expiration date – date 11 | 4. For every item gathered in step 2, create a page in **IT Asset Retirement Queue**, copy over the Serial, Status, and Expiration date values, then archive the original inventory page once the copy is made. 
12 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping_admin/search_filtering_operations/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "search_filtering_operations", 3 | "task_name": "Search Filtering Operations", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Configure advanced search and filtering systems in admin interface, implement category hierarchies, set up attribute filters, and optimize search algorithms for user experience.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "content submission" 12 | ], 13 | "mcp": [ 14 | "playwright" 15 | ], 16 | "meta_data": { 17 | "stateType": "video", 18 | "stateContent": null, 19 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 20 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 21 | } 22 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/shopping_admin/search_filtering_operations/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "search_filtering_operations_easy", 3 | "task_name": "Search Filtering Operations (Easy)", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Follow a clearly guided path through Search Terms, the Search Terms report, and the dashboard widgets to capture the metrics needed for a focused search-behavior brief.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "content submission" 12 | ], 13 | "mcp": [ 14 | "playwright" 15 | ], 16 | "meta_data": { 17 | "stateType": "video", 18 | "stateContent": null, 19 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 20 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 21 | } 22 | } 23 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/reddit/routine_tracker_forum/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "routine_tracker_forum", 3 | "task_name": "Routine Tracker Forum", 4 | "category_id": "reddit", 5 | "category_name": "Reddit", 6 | "description": "Establish productivity community by creating account, setting up routine tracking forum, implementing daily habit systems, and engaging members with structured productivity challenges and resources.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-12", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "user interaction", 12 | "content submission" 13 | ], 14 | "mcp": [ 15 | "playwright" 16 | ], 17 | "meta_data": { 18 | "stateType": "video", 19 | "stateContent": null, 20 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 21 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/filesystem/standard/desktop/timeline_extraction/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | 
Read all the files under the current path, extract every time/plan item that clearly indicates 2024, integrate them into a list, and create a file in the main directory called `timeline.txt`. Write the timeline to that file in the following format. 5 | 6 | ### Rules 7 | - If a task only shows month without day, use the 1st day of that month 8 | - If a task only shows year without month and day, skip it. 9 | - If a file shows multiple tasks on the same date, count only once per date 10 | 11 | ### Output Format 12 | - Each line format: `file_path:time` 13 | - `file_path`: The file path where this time information appears (**relative to the current path**) 14 | - `time`: The specific time (YYYY-MM-DD); if it is a time period, write the start time 15 | 16 | ### Sorting Requirements 17 | - Sort by chronological order 18 | -------------------------------------------------------------------------------- /tasks/notion/standard/japan_travel_planner/packing_progress_summary/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "packing_progress_summary", 3 | "task_name": "Packing Progress Summary", 4 | "category_id": "japan_travel_planner", 5 | "category_name": "Japan Travel Planner", 6 | "description": "Update packing list items and create a progress summary section showing statistics for each category.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "database manipulation", 12 | "data aggregation", 13 | "report generation", 14 | "status tracking" 15 | ], 16 | "mcp": [ 17 | "notion" 18 | ], 19 | "meta_data": { 20 | "stateType": "url", 21 | "stateContent": null, 22 | "stateUrl": "https://painted-tennis-ebc.notion.site/Japan-Travel-Planner-23181626b6d781c4b6bedb12786b5abe", 23 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/japantravelplanner101" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/notion/standard/python_roadmap/expert_level_lessons/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "expert_level_lessons", 3 | "task_name": "Expert Level Lessons", 4 | "category_id": "python_roadmap", 5 | "category_name": "Python Roadmap", 6 | "description": "Create an Expert Level chapter with sophisticated prerequisite chains and four expert-level lessons.", 7 | "author": "Lingjun Chen", 8 | "created_at": "2025-08-02", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "database manipulation", 12 | "cross-reference linking", 13 | "conditional filtering", 14 | "status tracking", 15 | "template population" 16 | ], 17 | "mcp": [ 18 | "notion" 19 | ], 20 | "meta_data": { 21 | "stateType": "url", 22 | "stateContent": null, 23 | "stateUrl": "https://painted-tennis-ebc.notion.site/Python-Roadmap-25281626b6d78012bf2bce1fa8711f4d", 24 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/python-roadmap" 25 | } 26 | } -------------------------------------------------------------------------------- /tasks/notion/standard/self_assessment/hyperfocus_analysis_report/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "hyperfocus_analysis_report", 3 | "task_name": "Hyperfocus Analysis Report", 4 | "category_id": "self_assessment", 5 | "category_name": "Self Assessment", 6 | "description": "Create a hyperfocus analysis report by analyzing high-productivity sessions with challenges.", 7 | "author": "Xiangyan
Liu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "conditional filtering", 12 | "data aggregation", 13 | "report generation", 14 | "visual formatting" 15 | ], 16 | "mcp": [ 17 | "notion" 18 | ], 19 | "meta_data": { 20 | "stateType": "url", 21 | "stateContent": null, 22 | "stateUrl": "https://painted-tennis-ebc.notion.site/Self-Assessment-24381626b6d780fe9f56c2ba14ea042d", 23 | "stateOriginalUrl": "https://painted-tennis-ebc.notion.site/Self-Assessment-24381626b6d780fe9f56c2ba14ea042d" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/shopping_admin/products_sales_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "products_sales_analysis_easy", 3 | "task_name": "Products Sales Analysis (Easy)", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Make a single guided pass through Catalog, Dashboard, Customers, and Orders to collect the exact fields needed for a quick sales recap.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "data extraction", 12 | "comparative analysis", 13 | "content submission" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping/running_shoes_purchase/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "running_shoes_purchase", 3 | "task_name": "Running Shoes Purchase", 4 | "category_id": "shopping", 5 | "category_name": "Shopping", 6 | "description": "Research running footwear by analyzing biomechanical features, comparing cushioning technologies, evaluating durability ratings, considering user preferences, and recommending optimal shoe selections.", 7 | "author": "Yaoqi Ye", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "search aggregation", 12 | "comparative analysis" 13 | ], 14 | "mcp": [ 15 | "playwright" 16 | ], 17 | "meta_data": { 18 | "stateType": "video", 19 | "stateContent": null, 20 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/one-stop-market.mp4", 21 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/filesystem/standard/file_context/duplicates_searching/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task Description 4 | 5 | You are given a directory containing multiple text files. Some files have identical content and need to be organized. Your task is to identify all files with duplicate content and move them to a newly created 'duplicates' directory. 6 | 7 | ### Task Objectives 8 | 9 | 1. **Scan all text files** in the test directory to identify groups with identical content 10 | 2. **Create a 'duplicates' directory** in the test directory root 11 | 3. 
**Move all duplicate files** into the 'duplicates' directory 12 | 4. **Leave unique files** in their original location 13 | 14 | ### Expected Output 15 | 16 | After completing the task, the directory structure should be: 17 | 18 | - `duplicates/` directory containing all files with duplicate content 19 | - Original directory containing only files with unique content 20 | -------------------------------------------------------------------------------- /tasks/notion/standard/computer_science_student_dashboard/code_snippets_go/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "code_snippets_go", 3 | "task_name": "Code Snippets Go", 4 | "category_id": "computer_science_student_dashboard", 5 | "category_name": "Computer Science Student Dashboard", 6 | "description": "Add a new Go column to the Code Snippets section between Python and JavaScript columns.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "content organization", 12 | "visual formatting", 13 | "template population" 14 | ], 15 | "mcp": [ 16 | "notion" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://painted-tennis-ebc.notion.site/Computer-Science-Student-Dashboard-23e81626b6d78083b787d3c832b02ef4", 22 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/computer-science-student-dashboard" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/notion/standard/standard_operating_procedure/section_organization/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "section_organization", 3 | "task_name": "Section Organization", 4 | "category_id": "standard_operating_procedure", 5 | "category_name": "Standard Operating Procedure", 6 | "description": "Reorganize the Standard Operating Procedure page by swapping sections and creating a column layout.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-08-11", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "content organization", 12 | "cross-reference linking", 13 | "visual formatting" 14 | ], 15 | "mcp": [ 16 | "notion" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://painted-tennis-ebc.notion.site/Standard-Operating-Procedure-24381626b6d780a8b678f9e62ae5b152", 22 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/standard-operating-procedure" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/notion/easy/python_roadmap/simple__expert_level_lessons/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "simple__expert_level_lessons", 3 | "task_name": "Simple Expert Level Lessons", 4 | "category_id": "python_roadmap", 5 | "category_name": "Python Roadmap", 6 | "description": "Create an Expert Level chapter with sophisticated prerequisite chains and four expert-level lessons.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "database manipulation", 12 | "cross-reference linking", 13 | "conditional filtering", 14 | "status tracking", 15 | "template population" 16 | ], 17 | "mcp": [ 18 | "notion" 19 | ], 20 | "meta_data": { 21 | "stateType": "url", 22 | "stateContent": null, 23 | "stateUrl": "https://painted-tennis-ebc.notion.site/Python-Roadmap-25281626b6d78012bf2bce1fa8711f4d",
24 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/python-roadmap" 25 | } 26 | } 27 | -------------------------------------------------------------------------------- /tasks/notion/standard/it_trouble_shooting_hub/security_audit_ticket/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "security_audit_ticket", 3 | "task_name": "Security Audit Ticket", 4 | "category_id": "it_trouble_shooting_hub", 5 | "category_name": "IT Trouble Shooting Hub", 6 | "description": "Create a comprehensive security audit ticket based on expired inventory items and security FAQ entries.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "conditional filtering", 12 | "database manipulation", 13 | "data aggregation", 14 | "report generation" 15 | ], 16 | "mcp": [ 17 | "notion" 18 | ], 19 | "meta_data": { 20 | "stateType": "url", 21 | "stateContent": null, 22 | "stateUrl": "https://painted-tennis-ebc.notion.site/It-Trouble-Shooting-Hub-23e81626b6d78020aba7eb65ae1cc2d5", 23 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/it-trouble-shooting-hub" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/shopping_admin/ny_expansion_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "ny_expansion_analysis_easy", 3 | "task_name": "NY Expansion Analysis (Easy)", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Capture just the dashboard, tax, order-status, store, and inventory facts required to judge if New York can launch without heavy configuration work.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "data extraction", 12 | "comparative analysis", 13 | "content submission" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/shopping_admin/sales_inventory_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "sales_inventory_analysis_easy", 3 | "task_name": "Sales Inventory Analysis (Easy)", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Follow one guided tour through Products, Orders, Dashboard, Customers, and Invoices to capture a compact set of sales-plus-inventory facts.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "data extraction", 12 | "comparative analysis", 13 | "inventory management" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/1_bug_report.yml: 
-------------------------------------------------------------------------------- 1 | name: '🐛 Bug Report' 2 | description: 'Report a bug' 3 | labels: ['unconfirm'] 4 | type: Bug 5 | body: 6 | - type: textarea 7 | attributes: 8 | label: '🐛 Bug Description' 9 | description: A clear and concise description of the bug; if the above option is `Other`, please also explain in detail. 10 | validations: 11 | required: true 12 | - type: textarea 13 | attributes: 14 | label: '📷 Reproduction Steps' 15 | description: A clear and concise description of how to reproduce the bug. 16 | - type: textarea 17 | attributes: 18 | label: '🚦 Expected Behavior' 19 | description: A clear and concise description of what you expected to happen. 20 | - type: textarea 21 | attributes: 22 | label: '📝 Additional Information' 23 | description: If your problem needs further explanation, or if the issue you're seeing cannot be reproduced in a gist, please add more information here. 24 | -------------------------------------------------------------------------------- /tasks/notion/standard/company_in_a_box/employee_onboarding/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "employee_onboarding", 3 | "task_name": "Employee Onboarding", 4 | "category_id": "company_in_a_box", 5 | "category_name": "Company In A Box", 6 | "description": "Build an integrated Employee Onboarding system for the existing Company In A Box page with a checklist database, onboarding hub, and feedback form.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "database manipulation", 12 | "template population", 13 | "cross-reference linking", 14 | "status tracking" 15 | ], 16 | "mcp": [ 17 | "notion" 18 | ], 19 | "meta_data": { 20 | "stateType": "url", 21 | "stateContent": null, 22 | "stateUrl": "https://painted-tennis-ebc.notion.site/Company-In-A-Box-23d81626b6d7800098f3d0e64a706cd8", 23 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/company-in-a-box" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/notion/standard/online_resume/projects_section_update/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "projects_section_update", 3 | "task_name": "Projects Section Update", 4 | "category_id": "online_resume", 5 | "category_name": "Online Resume", 6 | "description": "Reorganize the projects section by removing outdated projects and adding new relevant work with proper formatting.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "database manipulation", 12 | "template population", 13 | "data aggregation", 14 | "visual formatting", 15 | "cross-reference linking" 16 | ], 17 | "mcp": [ 18 | "notion" 19 | ], 20 | "meta_data": { 21 | "stateType": "url", 22 | "stateContent": null, 23 | "stateUrl": "https://painted-tennis-ebc.notion.site/Online-Resume-23181626b6d781159faaeb5eadaf612e", 24 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/online-resume" 25 | } 26 | } -------------------------------------------------------------------------------- /tasks/notion/standard/python_roadmap/learning_metrics_dashboard/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "learning_metrics_dashboard", 3 | "task_name": "Learning Metrics Dashboard", 4 | "category_id": "python_roadmap", 5 |
"category_name": "Python Roadmap", 6 | "description": "Create a comprehensive Learning Metrics Dashboard section displaying precise statistics and recommendations based on the Steps database.", 7 | "author": "Lingjun Chen", 8 | "created_at": "2025-08-02", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data aggregation", 12 | "conditional filtering", 13 | "report generation", 14 | "visual formatting" 15 | ], 16 | "mcp": [ 17 | "notion" 18 | ], 19 | "meta_data": { 20 | "stateType": "url", 21 | "stateContent": null, 22 | "stateUrl": "https://painted-tennis-ebc.notion.site/Python-Roadmap-25281626b6d78012bf2bce1fa8711f4d", 23 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/python-roadmap" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/filesystem/standard/file_context/pattern_matching/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task Description 4 | 5 | Your task is to find all files that contain a substring of 30 or more characters that also appears in `large_file.txt`. **You are not allowed to use python code.** 6 | 7 | ### Task Objectives 8 | 9 | 1. **Read the reference file** `large_file.txt` to understand its content 10 | 2. **Examine each file** from file_01.txt to file_20.txt 11 | 3. **Find files** that contain a substring of 30 or more characters that matches a substring in `large_file.txt` 12 | 4. **Create a file `answer.txt`** and write the results to it with the following format: 13 | - One line per matching file 14 | - Format: `filename.txt,start_position` 15 | - Where start_position is the character position (1-indexed) of the matching substring in `large_file.txt` 16 | - Do not add any things else other than `filename.txt,start_position`. 
17 | -------------------------------------------------------------------------------- /tasks/notion/standard/computer_science_student_dashboard/study_session_tracker/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "study_session_tracker", 3 | "task_name": "Study Session Tracker", 4 | "category_id": "computer_science_student_dashboard", 5 | "category_name": "Computer Science Student Dashboard", 6 | "description": "Create a new study-session entry in the Habit tracker section with four unchecked to-do items.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "content organization", 12 | "visual formatting", 13 | "status tracking" 14 | ], 15 | "mcp": [ 16 | "notion" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://painted-tennis-ebc.notion.site/Computer-Science-Student-Dashboard-23e81626b6d78083b787d3c832b02ef4", 22 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/computer-science-student-dashboard" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/notion/standard/japan_travel_planner/restaurant_expenses_sync/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "restaurant_expenses_sync", 3 | "task_name": "Restaurant Expenses Sync", 4 | "category_id": "japan_travel_planner", 5 | "category_name": "Japan Travel Planner", 6 | "description": "Find restaurants from Day 1 Travel Itinerary and create corresponding entries in the Expenses database.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "conditional filtering", 12 | "database manipulation", 13 | "cross-reference linking", 14 | "template population" 15 | ], 16 | "mcp": [ 17 | "notion" 18 | ], 19 | "meta_data": { 20 | "stateType": "url", 21 | "stateContent": null, 22 | "stateUrl": "https://painted-tennis-ebc.notion.site/Japan-Travel-Planner-23181626b6d781c4b6bedb12786b5abe", 23 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/japantravelplanner101" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/notion/standard/computer_science_student_dashboard/study_session_tracker/description.md: -------------------------------------------------------------------------------- 1 | Your goal is to create a new study-session entry in the **Computer Science Student Dashboard** page. 2 | 3 | 1. Locate the ☑️ Habit tracker section of the page. 4 | 2. **Insert a new date section** immediately **after the existing `2022-09-02` to-do items but *before* the divider block** that follows them. Make sure the new date has proper formatting with a date mention and bold styling like the existing dates, and all to-do items should be unchecked initially. 5 | 3.
Directly **beneath** this new date mention, add **exactly four unchecked to-do blocks** with the following plain text (including the leading emoji on each line): 6 | • 🧠 Review algorithms for technical interview 7 | • 📚 Study database systems chapter 7 8 | • ⚡ Practice system design problems 9 | • 🎯 Complete data structures assignment -------------------------------------------------------------------------------- /tasks/playwright/standard/eval_web/cloudflare_turnstile_challenge/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "cloudflare_turnstile_challenge", 3 | "task_name": "Cloudflare Turnstile Challenge", 4 | "category_id": "eval_web", 5 | "category_name": "Eval Web", 6 | "description": "Navigate websites with Cloudflare Turnstile protection, handle security challenges, bypass bot detection mechanisms, and successfully access protected content using automated browser interactions.", 7 | "author": "Allison Zhan", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "user interaction" 12 | ], 13 | "mcp": [ 14 | "playwright" 15 | ], 16 | "meta_data": { 17 | "stateType": "video", 18 | "stateContent": null, 19 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/mcpmark-eval-website.mp4", 20 | "stateOriginalUrl": "https://mcp-eval-website.vercel.app/auth/turnstile" 21 | } 22 | } 23 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping_admin/sales_inventory_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "sales_inventory_analysis", 3 | "task_name": "Sales Inventory Analysis", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Analyze sales patterns and inventory levels to optimize stock management, identify slow-moving items, predict demand trends, and generate restocking recommendations.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data extraction", 12 | "comparative analysis", 13 | "inventory management" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/notion/easy/computer_science_student_dashboard/simple__code_snippets_go/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "simple__code_snippets_go", 3 | "task_name": "Simple Code Snippets Go", 4 | "category_id": "computer_science_student_dashboard", 5 | "category_name": "Computer Science Student Dashboard", 6 | "description": "Add a new Go column to the Code Snippets section between Python and JavaScript columns.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "content organization", 12 | "visual formatting", 13 | "template population" 14 | ], 15 | "mcp": [ 16 | "notion" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://painted-tennis-ebc.notion.site/Computer-Science-Student-Dashboard-23e81626b6d78083b787d3c832b02ef4", 22 | "stateOriginalUrl": 
"https://www.notion.so/marketplace/templates/computer-science-student-dashboard" 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/notion/easy/standard_operating_procedure/simple__section_organization/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "simple__section_organization", 3 | "task_name": "Simple Section Organization", 4 | "category_id": "standard_operating_procedure", 5 | "category_name": "Standard Operating Procedure", 6 | "description": "Reorganize the Standard Operating Procedure page by swapping sections and creating a column layout.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "content organization", 12 | "cross-reference linking", 13 | "visual formatting" 14 | ], 15 | "mcp": [ 16 | "notion" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://painted-tennis-ebc.notion.site/Standard-Operating-Procedure-24381626b6d780a8b678f9e62ae5b152", 22 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/standard-operating-procedure" 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/reddit/llm_research_summary/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "llm_research_summary", 3 | "task_name": "LLM Research Summary", 4 | "category_id": "reddit", 5 | "category_name": "Reddit", 6 | "description": "Aggregate and analyze LLM research discussions across multiple forums, collect trending topics, compile technical insights, and create comprehensive summary post with community engagement.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-12", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data extraction", 12 | "search aggregation", 13 | "content submission", 14 | "user interaction" 15 | ], 16 | "mcp": [ 17 | "playwright" 18 | ], 19 | "meta_data": { 20 | "stateType": "video", 21 | "stateContent": null, 22 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 23 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping/printer_keyboard_search/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "printer_keyboard_search", 3 | "task_name": "Printer Keyboard Search", 4 | "category_id": "shopping", 5 | "category_name": "Shopping", 6 | "description": "Search and evaluate office equipment by comparing printer specifications, keyboard ergonomics, analyzing user reviews, tracking prices, and generating detailed purchase recommendations report.", 7 | "author": "Yaoqi Ye", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "search aggregation", 12 | "comparative analysis", 13 | "content submission" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/one-stop-market.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } -------------------------------------------------------------------------------- 
/tasks/playwright_webarena/standard/shopping_admin/ny_expansion_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "ny_expansion_analysis", 3 | "task_name": "NY Expansion Analysis", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Prepare New York market expansion strategy by analyzing regional demographics, evaluating competitor presence, assessing logistics requirements, and creating detailed market entry plan.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data extraction", 12 | "comparative analysis", 13 | "content submission" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping/health_routine_optimization/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "health_routine_optimization", 3 | "task_name": "Health Routine Optimization", 4 | "category_id": "shopping", 5 | "category_name": "Shopping", 6 | "description": "Optimize health and wellness product selections by analyzing nutritional supplements, fitness equipment, creating personalized routines, and tracking health metrics for lifestyle improvements.", 7 | "author": "Yaoqi Ye", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data extraction", 12 | "comparative analysis", 13 | "content submission" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/one-stop-market.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping_admin/customer_segmentation_setup/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "customer_segmentation_setup", 3 | "task_name": "Customer Segmentation Setup", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Configure customer segmentation system in admin panel by defining demographic criteria, creating behavior-based segments, implementing targeting rules, and setting up automated marketing workflows.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "content submission", 12 | "inventory management" 13 | ], 14 | "mcp": [ 15 | "playwright" 16 | ], 17 | "meta_data": { 18 | "stateType": "video", 19 | "stateContent": null, 20 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 21 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/filesystem/standard/file_property/size_classification/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | 
"task_id": "size_classification", 3 | "task_name": "Size Classification", 4 | "category_id": "file_property", 5 | "category_name": "File Property", 6 | "description": "Classify all files in the folder by size into distinct categories (small/medium/large) and generate a comprehensive summary report with statistics.", 7 | "author": "Lingjun Chen", 8 | "created_at": "2025-08-07", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "file organization", 12 | "pattern analysis" 13 | ], 14 | "mcp": [ 15 | "filesystem" 16 | ], 17 | "meta_data": { 18 | "stateType": "text", 19 | "stateContent": "file_property/\n ├── bear.jpg\n ├── bridge.jpg\n ├── bus.MOV\n ├── random_file_1.txt\n ├── random_file_2.txt\n ├── random_file_3.txt\n ├── road.MOV\n └── sg.jpg", 20 | "stateUrl": "https://storage.mcpmark.ai/filesystem/file_property.zip", 21 | "stateOriginalUrl": null 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/reddit/movie_reviewer_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "movie_reviewer_analysis", 3 | "task_name": "Movie Reviewer Analysis", 4 | "category_id": "reddit", 5 | "category_name": "Reddit", 6 | "description": "Analyze movie review patterns by creating reviewer profile, collecting ratings data, tracking review trends, and generating analytical report on community movie preferences and discussions.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-12", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "user interaction", 12 | "data extraction", 13 | "comparative analysis", 14 | "content submission" 15 | ], 16 | "mcp": [ 17 | "playwright" 18 | ], 19 | "meta_data": { 20 | "stateType": "video", 21 | "stateContent": null, 22 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 23 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/notion/standard/online_resume/skills_development_tracker/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "skills_development_tracker", 3 | "task_name": "Skills Development Tracker", 4 | "category_id": "online_resume", 5 | "category_name": "Online Resume", 6 | "description": "Create a comprehensive skills audit system with development tracking for skills below 70% proficiency.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "database manipulation", 12 | "cross-reference linking", 13 | "conditional filtering", 14 | "data aggregation", 15 | "template population", 16 | "visual formatting" 17 | ], 18 | "mcp": [ 19 | "notion" 20 | ], 21 | "meta_data": { 22 | "stateType": "url", 23 | "stateContent": null, 24 | "stateUrl": "https://painted-tennis-ebc.notion.site/Online-Resume-23181626b6d781159faaeb5eadaf612e", 25 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/online-resume" 26 | } 27 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/reddit/buyitforlife_research/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "buyitforlife_research", 3 | "task_name": "Buy It For Life Research", 4 | "category_id": "reddit", 5 | "category_name": "Reddit", 6 | "description": "Research durable 
lifetime-quality products by creating forum account, analyzing community recommendations, extracting product data, and compiling comprehensive durability report with voting metrics.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-12", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "user interaction", 12 | "data extraction", 13 | "search aggregation", 14 | "content submission" 15 | ], 16 | "mcp": [ 17 | "playwright" 18 | ], 19 | "meta_data": { 20 | "stateType": "video", 21 | "stateContent": null, 22 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 23 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping/gaming_accessories_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "gaming_accessories_analysis", 3 | "task_name": "Gaming Accessories Analysis", 4 | "category_id": "shopping", 5 | "category_name": "Shopping", 6 | "description": "Research gaming peripherals by analyzing technical specifications, comparing performance metrics, evaluating user reviews, tracking price trends, and creating detailed gaming accessory recommendations.", 7 | "author": "Yaoqi Ye", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "search aggregation", 12 | "comparative analysis", 13 | "data extraction" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/one-stop-market.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/filesystem/standard/student_database/gradebased_score/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Simple Grade Calculation 4 | 5 | 1. Read Student Data: 6 | 7 | * Process all student basic_info.txt files from the database 8 | * Extract scores for Chinese, Math, and English subjects 9 | 10 | 2. Calculate Basic Grades: 11 | 12 | * Use simple grade scale: A (90+), B (80-89), C (70-79), D (60-69), F (<60) 13 | * Apply this same scale to all subjects 14 | 15 | ### Generate Output Files 16 | 17 | 1. Create student_grades.csv: 18 | 19 | * Columns: student_id, name, chinese_score, chinese_grade, math_score, math_grade, english_score, english_grade 20 | * Must include every student from the database 21 | * Exactly one row per student 22 | 23 | 2. 
Create grade_summary.txt: 24 | 25 | * Total number of students processed 26 | * Number of A's, B's, C's, D's, and F's for each subject 27 | * Simple count of students with passing grades (A, B, C) vs failing grades (D, F) for each subject 28 | -------------------------------------------------------------------------------- /tasks/notion/easy/computer_science_student_dashboard/simple__study_session_tracker/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "simple__study_session_tracker", 3 | "task_name": "Simple Study Session Tracker", 4 | "category_id": "computer_science_student_dashboard", 5 | "category_name": "Computer Science Student Dashboard", 6 | "description": "Create a new study-session entry in the Habit tracker section with four unchecked to-do items.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "content organization", 12 | "visual formatting", 13 | "status tracking" 14 | ], 15 | "mcp": [ 16 | "notion" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://painted-tennis-ebc.notion.site/Computer-Science-Student-Dashboard-23e81626b6d78083b787d3c832b02ef4", 22 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/computer-science-student-dashboard" 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /tasks/notion/standard/toronto_guide/weekend_adventure_planner/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "weekend_adventure_planner", 3 | "task_name": "Weekend Adventure Planner", 4 | "category_id": "toronto_guide", 5 | "category_name": "Toronto Guide", 6 | "description": "Create a comprehensive weekend adventure planner that analyzes Toronto Guide databases and generates a structured itinerary page.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-08-14", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "conditional filtering", 12 | "data aggregation", 13 | "report generation", 14 | "visual formatting", 15 | "status tracking" 16 | ], 17 | "mcp": [ 18 | "notion" 19 | ], 20 | "meta_data": { 21 | "stateType": "url", 22 | "stateContent": null, 23 | "stateUrl": "https://painted-tennis-ebc.notion.site/Toronto-Guide-25281626b6d7802caa7cc394647e901c", 24 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/conquering-toronto-a-destination-guide" 25 | } 26 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/reddit/nba_statistics_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "nba_statistics_analysis", 3 | "task_name": "NBA Statistics Analysis", 4 | "category_id": "reddit", 5 | "category_name": "Reddit", 6 | "description": "Create sports analytics account, collect NBA player statistics from forum discussions, analyze basketball performance metrics, and compile comprehensive statistical report with community insights.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-12", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "user interaction", 12 | "data extraction", 13 | "comparative analysis", 14 | "content submission" 15 | ], 16 | "mcp": [ 17 | "playwright" 18 | ], 19 | "meta_data": { 20 | "stateType": "video", 21 | "stateContent": null, 22 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/postmill.mp4", 23 | 
"stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping/holiday_baking_competition/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "holiday_baking_competition", 3 | "task_name": "Holiday Baking Competition", 4 | "category_id": "shopping", 5 | "category_name": "Shopping", 6 | "description": "Research baking supplies for competition preparation including ingredient quality analysis, equipment comparisons, recipe optimization, and creating comprehensive shopping list with budget recommendations.", 7 | "author": "Yaoqi Ye", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "search aggregation", 12 | "comparative analysis", 13 | "inventory management" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/one-stop-market.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping_admin/products_sales_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "products_sales_analysis", 3 | "task_name": "Products Sales Analysis", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Generate comprehensive sales performance reports by extracting product metrics, analyzing revenue trends, identifying top performers, evaluating inventory turnover, and creating actionable insights.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data extraction", 12 | "comparative analysis", 13 | "content submission" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping/advanced_product_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "advanced_product_analysis", 3 | "task_name": "Advanced Product Analysis", 4 | "category_id": "shopping", 5 | "category_name": "Shopping", 6 | "description": "Perform comprehensive product analysis including feature comparisons, price tracking, review aggregation, customer sentiment analysis, and generate detailed recommendation reports for informed purchasing decisions.", 7 | "author": "Yaoqi Ye", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data extraction", 12 | "comparative analysis", 13 | "content submission" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/one-stop-market.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } 
-------------------------------------------------------------------------------- /tasks/postgres/easy/sports/create_performance_indexes/description.md: -------------------------------------------------------------------------------- 1 | Create indexes to optimize participant and statistics queries in the sports database. 2 | 3 | ## Your Task: 4 | 5 | Create two indexes to improve query performance: 6 | 7 | 1. **Index on participants_events table**: Create an index on the `participant_id` column of the `participants_events` table 8 | 2. **Composite index on stats table**: Create a composite index on the `stats` table using columns `stat_holder_type` and `stat_holder_id` (in that order) 9 | 10 | ## Requirements: 11 | 12 | - Create an index on `participants_events(participant_id)` 13 | - Create a composite index on `stats(stat_holder_type, stat_holder_id)` 14 | - Index names can be anything you choose (e.g., `idx_participants_events_participant_id`, `idx_stats_holder`) 15 | - Use the standard CREATE INDEX syntax 16 | 17 | ## Expected Outcome: 18 | 19 | After creating these indexes, queries that involve participant filtering and statistics lookups will run significantly faster. 20 | -------------------------------------------------------------------------------- /tasks/notion/standard/standard_operating_procedure/deployment_process_sop/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "deployment_process_sop", 3 | "task_name": "Deployment Process SOP", 4 | "category_id": "standard_operating_procedure", 5 | "category_name": "Standard Operating Procedure", 6 | "description": "Complete the SOP template with comprehensive content for a Software Deployment Process with interconnected sections.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "template population", 12 | "cross-reference linking", 13 | "content organization", 14 | "visual formatting" 15 | ], 16 | "mcp": [ 17 | "notion" 18 | ], 19 | "meta_data": { 20 | "stateType": "url", 21 | "stateContent": null, 22 | "stateUrl": "https://painted-tennis-ebc.notion.site/Standard-Operating-Procedure-24381626b6d780a8b678f9e62ae5b152", 23 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/standard-operating-procedure" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/reddit/routine_tracker_forum/description.md: -------------------------------------------------------------------------------- 1 | Reshare the most helpful LifeProTips calendar advice so it appears as a fresh routine post for other readers. 2 | 3 | **Task Requirements:** 4 | 5 | 1. Register and log in with username `RoutineTracker2025` and password `DailyRoutine123!`. 6 | 2. Find the LifeProTips thread titled `LPT: Use your calendar as your to-do list. Assigning dedicated time to tasks increases the likelyhood of you acting upon it.` and identify the comment with the highest upvote count. Copy its full text verbatim. 7 | 3. In the `LifeProTips` forum, publish a post titled `My 5-Step Morning Routine That Increased My Productivity by 200%` whose body is exactly the comment text you copied (no additional words or formatting). 8 | 4. Confirm that your new post appears in the LifeProTips listing so community members can immediately see it. 9 | 10 | Only these three actions—copy, repost, verify—are required for the easy version; no extra upvotes or settings changes are necessary. 
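For anyone prototyping this flow outside the MCP tool loop, a rough Playwright sketch of the copy-and-repost steps follows. The base URL, form field names, and comment selector are all assumptions about the Postmill instance, not verified values.

```python
from playwright.sync_api import sync_playwright

BASE = "http://localhost:9999"  # placeholder: point at your WebArena Postmill host

with sync_playwright() as p:
    page = p.chromium.launch().new_page()
    # Open the LifeProTips thread (in practice, navigate via the forum listing).
    page.goto(f"{BASE}/f/LifeProTips")
    page.get_by_text("LPT: Use your calendar as your to-do list").first.click()
    # Grab the top comment's text; assumes the default sort shows the
    # highest-voted comment first and that comments use a `.comment` class
    # (hypothetical selector).
    top_comment = page.locator(".comment").first.inner_text()
    # Repost it under the required title (field names are assumptions).
    page.goto(f"{BASE}/submit/LifeProTips")
    page.fill("input[name='submission[title]']",
              "My 5-Step Morning Routine That Increased My Productivity by 200%")
    page.fill("textarea[name='submission[body]']", top_comment)
    page.click("button[type='submit']")
```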
11 | -------------------------------------------------------------------------------- /tasks/filesystem/standard/file_property/time_classification/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "time_classification", 3 | "task_name": "Time Classification", 4 | "category_id": "file_property", 5 | "category_name": "File Property", 6 | "description": "Organize files based on modification timestamps into temporal categories and create a detailed time-based classification report with groupings.", 7 | "author": "Lingjun Chen", 8 | "created_at": "2025-08-07", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "file organization", 12 | "data extraction", 13 | "pattern analysis" 14 | ], 15 | "mcp": [ 16 | "filesystem" 17 | ], 18 | "meta_data": { 19 | "stateType": "text", 20 | "stateContent": "file_property/\n ├── bear.jpg\n ├── bridge.jpg\n ├── bus.MOV\n ├── random_file_1.txt\n ├── random_file_2.txt\n ├── random_file_3.txt\n ├── road.MOV\n └── sg.jpg", 21 | "stateUrl": "https://storage.mcpmark.ai/filesystem/file_property.zip", 22 | "stateOriginalUrl": null 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/notion/standard/it_trouble_shooting_hub/asset_retirement_migration/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "asset_retirement_migration", 3 | "task_name": "Asset Retirement Migration", 4 | "category_id": "it_trouble_shooting_hub", 5 | "category_name": "IT Trouble Shooting Hub", 6 | "description": "Restructure the IT Inventory database by migrating expired assets to a new IT Asset Retirement Queue database.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "database manipulation", 12 | "automated migration", 13 | "conditional filtering", 14 | "data aggregation", 15 | "report generation" 16 | ], 17 | "mcp": [ 18 | "notion" 19 | ], 20 | "meta_data": { 21 | "stateType": "url", 22 | "stateContent": null, 23 | "stateUrl": "https://painted-tennis-ebc.notion.site/It-Trouble-Shooting-Hub-23e81626b6d78020aba7eb65ae1cc2d5", 24 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/it-trouble-shooting-hub" 25 | } 26 | } -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/shopping_admin/fitness_promotion_strategy/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "fitness_promotion_strategy_easy", 3 | "task_name": "Fitness Promotion Strategy (Easy)", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Capture the three dashboard bestsellers, confirm their catalog details, and snapshot the related promo and customer metrics needed for a quick campaign brief.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "data extraction", 12 | "comparative analysis", 13 | "inventory management", 14 | "content submission" 15 | ], 16 | "mcp": [ 17 | "playwright" 18 | ], 19 | "meta_data": { 20 | "stateType": "video", 21 | "stateContent": null, 22 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 23 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 24 | } 25 | } 26 | -------------------------------------------------------------------------------- 
/tasks/playwright_webarena/standard/shopping_admin/marketing_customer_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "marketing_customer_analysis", 3 | "task_name": "Marketing Customer Analysis", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Analyze customer behavior patterns using admin analytics, segment user demographics, track purchase histories, evaluate campaign effectiveness, and generate comprehensive marketing intelligence reports.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data extraction", 12 | "comparative analysis", 13 | "content submission" 14 | ], 15 | "mcp": [ 16 | "playwright" 17 | ], 18 | "meta_data": { 19 | "stateType": "video", 20 | "stateContent": null, 21 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 22 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/notion/easy/online_resume/simple__skills_development_tracker/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "simple__skills_development_tracker", 3 | "task_name": "Simple Skills Development Tracker", 4 | "category_id": "online_resume", 5 | "category_name": "Online Resume", 6 | "description": "Create a comprehensive skills audit system with development tracking for skills below 70% proficiency.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "database manipulation", 12 | "cross-reference linking", 13 | "conditional filtering", 14 | "data aggregation", 15 | "template population", 16 | "visual formatting" 17 | ], 18 | "mcp": [ 19 | "notion" 20 | ], 21 | "meta_data": { 22 | "stateType": "url", 23 | "stateContent": null, 24 | "stateUrl": "https://painted-tennis-ebc.notion.site/Online-Resume-23181626b6d781159faaeb5eadaf612e", 25 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/online-resume" 26 | } 27 | } 28 | -------------------------------------------------------------------------------- /tasks/notion/standard/company_in_a_box/quarterly_review_dashboard/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "quarterly_review_dashboard", 3 | "task_name": "Quarterly Review Dashboard", 4 | "category_id": "company_in_a_box", 5 | "category_name": "Company In A Box", 6 | "description": "Create a quarterly business review dashboard in Notion based on the existing Company In A Box workspace with department objectives and action items database.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "database manipulation", 12 | "data aggregation", 13 | "report generation", 14 | "status tracking", 15 | "template population" 16 | ], 17 | "mcp": [ 18 | "notion" 19 | ], 20 | "meta_data": { 21 | "stateType": "url", 22 | "stateContent": null, 23 | "stateUrl": "https://painted-tennis-ebc.notion.site/Company-In-A-Box-23d81626b6d7800098f3d0e64a706cd8", 24 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/company-in-a-box" 25 | } 26 | } -------------------------------------------------------------------------------- 
/tasks/notion/standard/computer_science_student_dashboard/courses_internships_relation/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "courses_internships_relation", 3 | "task_name": "Courses Internships Relation", 4 | "category_id": "computer_science_student_dashboard", 5 | "category_name": "Computer Science Student Dashboard", 6 | "description": "Connect the Courses and Internship search databases with bidirectional relations and populate with sample data.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "database manipulation", 12 | "cross-reference linking", 13 | "template population" 14 | ], 15 | "mcp": [ 16 | "notion" 17 | ], 18 | "meta_data": { 19 | "stateType": "url", 20 | "stateContent": null, 21 | "stateUrl": "https://painted-tennis-ebc.notion.site/Computer-Science-Student-Dashboard-23e81626b6d78083b787d3c832b02ef4", 22 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/computer-science-student-dashboard" 23 | } 24 | } -------------------------------------------------------------------------------- /tasks/notion/standard/it_trouble_shooting_hub/verification_expired_update/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "verification_expired_update", 3 | "task_name": "Verification Expired Update", 4 | "category_id": "it_trouble_shooting_hub", 5 | "category_name": "IT Trouble Shooting Hub", 6 | "description": "Update pages with expired verification status by adding warning callouts and creating a batch update request.", 7 | "author": "Zijian Wu", 8 | "created_at": "2025-07-27", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "conditional filtering", 12 | "visual formatting", 13 | "database manipulation", 14 | "cross-reference linking", 15 | "status tracking" 16 | ], 17 | "mcp": [ 18 | "notion" 19 | ], 20 | "meta_data": { 21 | "stateType": "url", 22 | "stateContent": null, 23 | "stateUrl": "https://painted-tennis-ebc.notion.site/It-Trouble-Shooting-Hub-23e81626b6d78020aba7eb65ae1cc2d5", 24 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/it-trouble-shooting-hub" 25 | } 26 | } -------------------------------------------------------------------------------- /tasks/filesystem/standard/file_context/uppercase/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task Description 4 | 5 | You need to process 10 text files (file_01.txt to file_10.txt) and convert their content to uppercase format. 6 | 7 | ### Task Objectives 8 | 9 | 1. **Create an uppercase directory** in the test environment root 10 | 2. **Convert each file** from file_01.txt to file_10.txt to uppercase 11 | 3. **Save converted files** in the uppercase/ directory with the same names 12 | 4. **Count words** in each original file (file_01.txt to file_10.txt) 13 | 5. **Create answer.txt** with word counts in the specified format. 14 | 15 | ### Specified Format of answer.txt 16 | 17 | Create a file named `answer.txt` in uppercase/ 18 | 19 | **Requirements:** 20 | 21 | - Each line should follow the format: `<filename>:<word_count>` (e.g., `file_01.txt:123`) 22 | - Include all 10 files: file_01.txt, file_02.txt, ..., file_10.txt 23 | - Use the exact filename format (file_01.txt, file_02.txt, etc.) 
24 | - One entry per line 25 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/reddit/llm_research_summary/description.md: -------------------------------------------------------------------------------- 1 | Produce a five-line GPT activity bulletin so readers can quickly see how active the MachineLearning forum is. 2 | 3 | **Task Requirements:** 4 | 5 | 1. Register and log in with username `llm_analyst_2024` and password `LLMExpert2024!`. 6 | 2. On the first page of the `MachineLearning` forum, count how many post titles contain `GPT`, `ChatGPT`, or `LLM`; record the exact total. 7 | 3. From that subset, identify the single post with the highest upvote count and note its full title and vote total. 8 | 4. Create a new post titled `LLM Research Summary: GPT Discussions Analysis [2024]` in the same forum. The body must be exactly the Markdown list below with your numbers in place of each `FILL_VALUE` (keep the keys, order, and pipe separator intact): 9 | 10 | ``` 11 | - Total_LLM_Posts|FILL_VALUE 12 | - Top1_Title|FILL_VALUE 13 | - Top1_Upvotes|FILL_VALUE 14 | ``` 15 | 16 | That’s the entire easy version—just report the headline LLM count and the hottest related thread. 17 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping_admin/fitness_promotion_strategy/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "fitness_promotion_strategy", 3 | "task_name": "Fitness Promotion Strategy", 4 | "category_id": "shopping_admin", 5 | "category_name": "Shopping Admin", 6 | "description": "Develop fitness product promotion campaigns by analyzing sales data, creating targeted offers, configuring promotional rules, and implementing cross-selling strategies in admin dashboard.", 7 | "author": "Fanqing Meng", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data extraction", 12 | "comparative analysis", 13 | "inventory management", 14 | "content submission" 15 | ], 16 | "mcp": [ 17 | "playwright" 18 | ], 19 | "meta_data": { 20 | "stateType": "video", 21 | "stateContent": null, 22 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/magento-admin.mp4", 23 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/notion/standard/company_in_a_box/goals_restructure/description.md: -------------------------------------------------------------------------------- 1 | Please restructure the **Current Goals** section on my **Company In A Box** page as follows: 2 | 3 | 1. **Add a new goal heading** — create a new `heading_3` block titled: 4 | 5 | `🔄 Digital Transformation Initiative` 6 | 7 | 2. **Convert all four goal headings to toggles** — the three existing goals and the newly added one: 8 | * ⚙️ Expand Operations to LATAM 9 | * 🛠️ Push for Enterprise 10 | * 🩶 Boost Employee Engagement 11 | * 🔄 Digital Transformation Initiative 12 | 13 | 3. **Move descriptions inside the toggles** — every paragraph or list that originally sat directly under a goal heading should become a **child block** of that heading after it is made toggleable. 14 | 15 | 4. **Preserve content & order** — apart from the changes above, do **not** modify the text, formatting, or order of existing goal descriptions. 
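For reference, making an existing heading toggleable is a single block update in the Notion API; a minimal notion-client sketch is below (the token and block ID are placeholders). Note that the API has no move-block endpoint, so nesting a description inside a toggle generally means appending a copy as a child and then deleting the original block.

```python
from notion_client import Client

notion = Client(auth="ntn_placeholder_token")  # assumed integration token

goal_heading_id = "<heading-block-id>"  # hypothetical: resolve via blocks.children.list

# Flip the heading into a toggle so its description can live inside it;
# assumes the goals are heading_3 blocks, as the task states.
notion.blocks.update(block_id=goal_heading_id, heading_3={"is_toggleable": True})
```

Moving each description then amounts to `notion.blocks.children.append(block_id=goal_heading_id, children=[...])` with the original paragraph's content, followed by `notion.blocks.delete` on the source block.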
16 | 17 | The end result should be a clean **Current Goals** section containing four toggleable goal headings, each with its corresponding details tucked inside. -------------------------------------------------------------------------------- /tasks/notion/standard/online_resume/layout_adjustment/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "layout_adjustment", 3 | "task_name": "Layout Adjustment", 4 | "category_id": "online_resume", 5 | "category_name": "Online Resume", 6 | "description": "This task involves modifying the layout and content of an online resume page by restructuring the Skills section with icon indicators and adjusting the Work History and Education sections to use equal column widths with placeholder images.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-08-14", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "content organization", 12 | "visual formatting", 13 | "conditional filtering", 14 | "template population" 15 | ], 16 | "mcp": [ 17 | "notion" 18 | ], 19 | "meta_data": { 20 | "stateType": "url", 21 | "stateContent": null, 22 | "stateUrl": "https://painted-tennis-ebc.notion.site/Online-Resume-23181626b6d781159faaeb5eadaf612e", 23 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/online-resume" 24 | } 25 | } -------------------------------------------------------------------------------- /tasks/notion/easy/it_trouble_shooting_hub/simple__asset_retirement_migration/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "simple__asset_retirement_migration", 3 | "task_name": "Simple Asset Retirement Migration", 4 | "category_id": "it_trouble_shooting_hub", 5 | "category_name": "IT Trouble Shooting Hub", 6 | "description": "Restructure the IT Inventory database by migrating expired assets to a new IT Asset Retirement Queue database.", 7 | "author": "Xiangyan Liu", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "database manipulation", 12 | "automated migration", 13 | "conditional filtering", 14 | "data aggregation", 15 | "report generation" 16 | ], 17 | "mcp": [ 18 | "notion" 19 | ], 20 | "meta_data": { 21 | "stateType": "url", 22 | "stateContent": null, 23 | "stateUrl": "https://painted-tennis-ebc.notion.site/It-Trouble-Shooting-Hub-23e81626b6d78020aba7eb65ae1cc2d5", 24 | "stateOriginalUrl": "https://www.notion.so/marketplace/templates/it-trouble-shooting-hub" 25 | } 26 | } 27 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [project] 2 | authors = [] 3 | name = "MCPMark" 4 | requires-python = ">= 3.11" 5 | version = "0.0.1" 6 | dependencies = [ 7 | "notion-client>=2.4.0", 8 | "playwright>=1.43.0", 9 | "seaborn>=0.12.0", 10 | "matplotlib>=3.7.0", 11 | "numpy>=1.23.0", 12 | "openai-agents>=0.2.3,<0.3", 13 | "openai>=1.96.1", 14 | "python-dotenv>=1.1.1,<2", 15 | "ruff>=0.12.4,<0.13", 16 | "psycopg2-binary>=2.9.10,<3", 17 | "pyyaml>=6.0.2,<7", 18 | "nest-asyncio>=1.6.0,<2", 19 | "pixi", 20 | "pipx>=1.7.1,<2", 21 | "pgdumplib>=3.1.0,<4", 22 | "litellm==1.80.0" 23 | ] 24 | 25 | [build-system] 26 | build-backend = "hatchling.build" 27 | requires = ["hatchling"] 28 | 29 | [tool.pixi.workspace] 30 | channels = ["conda-forge"] 31 | platforms = [ 32 | "osx-arm64", 33 | "linux-aarch64", 34 | "linux-64", 35 | "win-64", 36 | "osx-64", 37 | ] 38 | 39 | [tool.pixi.tasks] 40 | 
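# Code-formatting task; invoke with `pixi run fmt`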
fmt = "ruff format" 41 | 42 | [tool.ruff.format] 43 | indent-style = "space" 44 | line-ending = "auto" 45 | 46 | [tool.hatch.build.targets.wheel] 47 | packages = ["src", "tasks"] 48 | -------------------------------------------------------------------------------- /tasks/filesystem/easy/file_property/txt_merging/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "txt_merging", 3 | "task_name": "Text File Merging", 4 | "category_id": "file_property", 5 | "category_name": "File Property", 6 | "description": "Combine the contents of every .txt file into a single merge.txt file so the archive has one consolidated view.", 7 | "author": "Lingjun Chen", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "content consolidation", 12 | "file automation" 13 | ], 14 | "mcp": [ 15 | "filesystem" 16 | ], 17 | "meta_data": { 18 | "stateType": "text", 19 | "stateContent": "file_property/\n \u251c\u2500\u2500 bear.jpg\n \u251c\u2500\u2500 bridge.jpg\n \u251c\u2500\u2500 bus.MOV\n \u251c\u2500\u2500 random_file_1.txt\n \u251c\u2500\u2500 random_file_2.txt\n \u251c\u2500\u2500 random_file_3.txt\n \u251c\u2500\u2500 road.MOV\n \u2514\u2500\u2500 sg.jpg", 20 | "stateUrl": "https://storage.mcpmark.ai/filesystem/file_property.zip", 21 | "stateOriginalUrl": null 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/reddit/nba_statistics_analysis/label.txt: -------------------------------------------------------------------------------- 1 | - Total_NBA_Posts|20 2 | - Top1_Title|Hamby claims [WNBA Champ] Aces 'unprofessional' after trade 3 | - Top1_Votes|614 4 | - Top1_Comments|170 5 | - Top1_Author|Responsible-Lunch815 6 | - Top2_Title|Heat place literally every player on injury report after receiving NBA fine ahead of Mexico City game 7 | - Top2_Votes|1266 8 | - Top2_Comments|145 9 | - Top2_Author|XXmynameisNeganXX 10 | - Top3_Title|[ESPN] Announced attendance at the Alamodome tonight|68,323, a new single-game NBA record, in the Spurs' first game there since Game 4 of the 2002 Western Conference Semifinals. 11 | - Top3_Votes|1511 12 | - Top3_Comments|101 13 | - Top3_Author|dragon8811 14 | - Top4_Title|Phoenix Mercury confirm Brittney Griner’s return to WNBA 15 | - Top4_Votes|0 16 | - Top4_Comments|42 17 | - Top4_Author|rejs7 18 | - Top5_Title|Perspective | Kyrie Irving lit a flame. The NBA, top to bottom, watched the fire spread. 19 | - Top5_Votes|74 20 | - Top5_Comments|32 21 | - Top5_Author|tomyland 22 | - BCLetsRide69_Total_Posts|48 -------------------------------------------------------------------------------- /tasks/postgres/easy/chinook/update_employee_info/description.md: -------------------------------------------------------------------------------- 1 | Update employee information in the Chinook database to reflect organizational changes. 
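The concrete targets are listed under the tasks below; as orientation, here is a minimal psycopg2 sketch of the kind of statements involved (the DSN is a placeholder, and the quoted CamelCase identifiers assume the standard PostgreSQL port of Chinook):

```python
import psycopg2

# Placeholder connection string; point it at your Chinook instance.
conn = psycopg2.connect("dbname=chinook user=postgres host=localhost")

with conn, conn.cursor() as cur:
    # Single-row update keyed by primary key.
    cur.execute(
        'UPDATE "Employee" SET "Title" = %s WHERE "EmployeeId" = %s',
        ("CEO", 1),
    )
    # Set-based update: rewrites every row whose title matches.
    cur.execute(
        'UPDATE "Employee" SET "Title" = %s WHERE "Title" = %s',
        ("IT Specialist", "IT Staff"),
    )
conn.close()
```

The `with conn` block commits on success and rolls back on error, so no explicit `conn.commit()` is needed.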
2 | 3 | ## Your Tasks: 4 | 5 | ### **UPDATE: Modify Existing Employee Information** 6 | - Change Andrew Adams (EmployeeId = 1) title from 'General Manager' to 'CEO' 7 | - Update Nancy Edwards (EmployeeId = 2) phone number to '+1 (403) 555-9999' 8 | - Change all employees with Title = 'IT Staff' to have Title = 'IT Specialist' 9 | 10 | 11 | ## Requirements: 12 | 13 | - Use UPDATE statements to modify the existing records 14 | - The title update for 'IT Staff' should affect all matching employees 15 | 16 | ## Expected Results: 17 | 18 | After completing the updates: 19 | - Andrew Adams should have Title = 'CEO' 20 | - Nancy Edwards should have Phone = '+1 (403) 555-9999' 21 | - All employees previously with Title = 'IT Staff' should now have Title = 'IT Specialist' 22 | 23 | This task practices UPDATE operations for both employee information and organizational hierarchy management. 24 | -------------------------------------------------------------------------------- /tasks/filesystem/easy/file_property/largest_rename/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "largest_rename", 3 | "task_name": "Largest File Rename", 4 | "category_id": "file_property", 5 | "category_name": "File Property", 6 | "description": "Identify the largest .jpg in the workspace and rename it to largest.jpg while leaving the other files untouched.", 7 | "author": "Lingjun Chen", 8 | "created_at": "2025-11-15", 9 | "difficulty": "L1", 10 | "tags": [ 11 | "file organization", 12 | "attribute inspection" 13 | ], 14 | "mcp": [ 15 | "filesystem" 16 | ], 17 | "meta_data": { 18 | "stateType": "text", 19 | "stateContent": "file_property/\n \u251c\u2500\u2500 bear.jpg\n \u251c\u2500\u2500 bridge.jpg\n \u251c\u2500\u2500 bus.MOV\n \u251c\u2500\u2500 random_file_1.txt\n \u251c\u2500\u2500 random_file_2.txt\n \u251c\u2500\u2500 random_file_3.txt\n \u251c\u2500\u2500 road.MOV\n \u2514\u2500\u2500 sg.jpg", 20 | "stateUrl": "https://storage.mcpmark.ai/filesystem/file_property.zip", 21 | "stateOriginalUrl": null 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping/multi_category_budget_analysis/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "multi_category_budget_analysis", 3 | "task_name": "Multi Category Budget Analysis", 4 | "category_id": "shopping", 5 | "category_name": "Shopping", 6 | "description": "Analyze spending patterns across multiple product categories, optimize budget allocation, identify cost-saving opportunities, and generate comprehensive financial planning report with purchase recommendations.", 7 | "author": "Yaoqi Ye", 8 | "created_at": "2025-08-17", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "data extraction", 12 | "search aggregation", 13 | "content submission", 14 | "comparative analysis", 15 | "inventory management" 16 | ], 17 | "mcp": [ 18 | "playwright" 19 | ], 20 | "meta_data": { 21 | "stateType": "video", 22 | "stateContent": null, 23 | "stateUrl": "https://storage.mcpmark.ai/tasks_state/playwright_video/one-stop-market.mp4", 24 | "stateOriginalUrl": "https://github.com/web-arena-x/webarena/tree/main/environment_docker" 25 | } 26 | } -------------------------------------------------------------------------------- /tasks/notion/easy/computer_science_student_dashboard/simple__code_snippets_go/description.md: 
-------------------------------------------------------------------------------- 1 | Find the page named "Computer Science Student Dashboard" and extend the **Code Snippets** section with Go content. 2 | 3 | **Task Requirements:** 4 | 1. Add a bold paragraph that contains exactly the text `Go` to mark the start of the Go snippets. 5 | 2. Directly under that paragraph, add three code blocks configured with `language` set to **go**: 6 | a. **Basic Go program** – Caption must be `Basic Go program` and the code content must be exactly: 7 | ```go 8 | package main 9 | 10 | import "fmt" 11 | 12 | func main() { 13 | fmt.Println("Hello, World!") 14 | } 15 | ``` 16 | b. **For loop in Go** – Caption must be `For loop in Go` and the code content must be exactly: 17 | ```go 18 | for i := 0; i < 5; i++ { 19 | fmt.Println(i) 20 | } 21 | ``` 22 | c. **Function definition in Go** – Caption must be `Function definition in Go` and the code content must be exactly: 23 | ```go 24 | func add(a, b int) int { 25 | return a + b 26 | } 27 | ``` 28 | -------------------------------------------------------------------------------- /tasks/notion/standard/team_projects/priority_tasks_table/description.md: -------------------------------------------------------------------------------- 1 | Hi! In my Team Projects page, please create a five-column table block that lists all tasks meeting either of the following conditions: 2 | 1. The progress is 50% or less, or 3 | 2. The task has priority P0 but is not yet completed (i.e., progress not at 100%). 4 | 5 | You should query this information from the existing “Projects” database. 6 | 7 | In the newly created table: 8 | • Each row should represent one task 9 | • All fields should be stored as plain text (not relations, formulas, or linked properties) 10 | • The table should be sorted by expected end date (End Date) in ascending order, so that the first entry is the one with the earliest end date 11 | 12 | The table should include the following headers: 13 | • Project 14 | • Eng Hours 15 | • Progress 16 | • Start Date 17 | • End Date 18 | 19 | Please make sure all relevant tasks are included. Thank you! -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/reddit/ai_data_analyst/description.md: -------------------------------------------------------------------------------- 1 | Deliver a bite-sized AI community snapshot by capturing two essential metrics and posting them back to the forum. 2 | 3 | **Task Requirements:** 4 | 5 | 1. Register and log in with username `AIDataAnalyst2025` and password `SecurePass123!` so every action is tied to that account. 6 | 2. Visit the `deeplearning` forum (page 1 only) and count how many posts are visible; note the exact total. 7 | 3. Go to the `MachineLearning` forum, find the post titled `[P] I made a command-line tool that explains your errors using ChatGPT (link in comments)`, and record the vote count shown in the listing. 8 | 4. 
Still in `MachineLearning`, create a post titled `MachineLearning_Extraction` whose body is exactly the two-line Markdown list below (keep the keys and pipe separators untouched, replacing each `FILL_VALUE` with your numbers): 9 | 10 | ``` 11 | - Deeplearning_Post_Count|FILL_VALUE 12 | - ChatGPT_Tool_Vote_Count|FILL_VALUE 13 | ``` 14 | 15 | No searches or multi-page navigation are required—just capture these two data points and report them accurately. 16 | -------------------------------------------------------------------------------- /tasks/github/easy/mcpmark-cicd/issue_lint_guard/description.md: -------------------------------------------------------------------------------- 1 | Use the GitHub MCP tools to wire up a tiny issue-triggered lint check for `mcpmark-eval/mcpmark-cicd`. 2 | 3 | ## Goal 4 | Whenever a maintainer opens the tracking issue **Lint workflow check**, the repo should automatically run `npm run lint` via GitHub Actions. Keep it simple—just prove the workflow fires for issue events. 5 | 6 | ## Requirements 7 | 1. Create a branch called `issue-lint-workflow` from `main`. 8 | 2. Add `.github/workflows/issue-lint.yml` with: 9 | - Workflow name **Issue Lint Guard**. 10 | - Trigger: `issues` with `types: [opened]` (no push/PR triggers). 11 | - Single job `lint` on `ubuntu-latest` using Node.js 18 via `actions/setup-node`. 12 | - Steps in order: `actions/checkout`, `npm ci`, `npm run lint`. 13 | 3. Open a pull request titled `Add issue lint workflow`, get it merged so the workflow exists on `main`. 14 | 4. After the merge, open a new issue titled **Lint workflow check** to trigger the workflow and wait until the matching run finishes successfully. Leave the issue open; we only care that the run went green. 15 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/standard/shopping/advanced_product_analysis/description.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | **Task Requirements:** 4 | 5 | 1. Search for products with 'Ginger' in the Product Name field and price range $50.00 to $100.00 6 | 7 | 2. Add Q Mixers Premium Ginger Ale product to the comparison list 8 | 9 | 3. Find Intel NUC Kit product in Electronics category and add it to the comparison list 10 | 11 | 4. From the comparison page: 12 | - Record SKU numbers for both products 13 | - Add all products to cart 14 | 15 | 5. Record the total cart value 16 | 17 | 6. On the Ginger Ale product detail page, record: 18 | - Number of customer reviews 19 | - Name of the most recent reviewer (at the top of the first page) 20 | 21 | 7. Output your findings in this format: 22 | 23 | ``` 24 | 25 | GingerAleSKU|sku 26 | IntelNUCSKU|sku 27 | CartTotal|amount 28 | ReviewCount|count 29 | LatestReviewer|name 30 | 31 | ``` 32 | 33 | **Example Output:** 34 | ``` 35 | 36 | GingerAleSKU|XXXXXXXXX 37 | IntelNUCSKU|XXXXXXXXX 38 | CartTotal|$XXX.XX 39 | ReviewCount|XX 40 | LatestReviewer|name 41 | 42 | ``` 43 | 44 | -------------------------------------------------------------------------------- /tasks/playwright_webarena/easy/reddit/movie_reviewer_analysis/description.md: -------------------------------------------------------------------------------- 1 | Provide a lightweight status report on what’s trending in the movies forum so stakeholders can scan it at a glance. 2 | 3 | **Task Requirements:** 4 | 5 | 1. Register and log in with username `movie_reviewer_2024` and password `movie_reviewer_2024`. 6 | 2. 
On the first page of the `movies` forum, count how many post titles contain any four-digit year (e.g., 1984, 2024) and record the total. 7 | 3. Still on that page, find the post with the highest upvote count and record its full title as well as the vote and comment counts shown. 8 | 4. Publish a post in the same forum titled `Wonderful Movies Analysis: Community Favorites [2024]`. The body must match the four-line Markdown list below—keep the keys, order, and pipe separators exactly as written while replacing each `FILL_VALUE` with your data: 9 | 10 | ``` 11 | - Total_Year_Posts|FILL_VALUE 12 | - Top_Title|FILL_VALUE 13 | - Top_Upvotes|FILL_VALUE 14 | - Top_Comments|FILL_VALUE 15 | ``` 16 | 17 | No multi-page browsing or special threads are required; this easy task captures just the top signals from the first page. 18 | -------------------------------------------------------------------------------- /tasks/filesystem/standard/threestudio/code_locating/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task Description 4 | 5 | ThreeStudio is a comprehensive codebase that implements various diffusion-based text-to-3D models, including a NeRF-based rendering stage and a diffusion guidance stage. Your task is to explore the codebase and identify the specific file that defines the guidance functionality for the Zero123 model. 6 | 7 | ### Task Objectives 8 | 9 | 1. **Explore the ThreeStudio codebase** using filesystem MCP tools 10 | 2. **Search through the project structure** to understand the codebase organization 11 | 3. **Identify the file** that contains the Zero123 guidance implementation 12 | 4. **Create an answer file** with the correct file path 13 | 14 | ### Expected Output 15 | 16 | Create a file named `answer.txt` in the test directory root 17 | 18 | **Requirements:** 19 | - Only include the file path, no additional text or explanation 20 | - Use forward slashes (/) for path separators 21 | - Include the full relative path from the project root 22 | - Ensure the path points to the actual file that defines Zero123 guidance -------------------------------------------------------------------------------- /tasks/notion/standard/it_trouble_shooting_hub/verification_expired_update/description.md: -------------------------------------------------------------------------------- 1 | **Task Overview** 2 | 3 | My IT knowledge base contains pages whose verification status has expired. 4 | 5 | **Task Requirements** 6 | 1. Locate the database named **"IT Homepage"** inside the main page **"It Trouble Shooting Hub"**. 7 | 2. Within that database, find every page (except for **"It Inventory"**) where the **Verification** property state contains `expired`. 8 | 3. For **each** expired page: 9 | • Insert a **callout block** at the very top (as the first child block) whose rich-text content is: 10 | `VERIFICATION EXPIRED - This page needs review and re-verification` 11 | • Set the callout’s icon to ⚠️. 12 | • Set the callout’s colour to `red_background`. 13 | 4. Create a new entry in the **"IT Requests"** database with: 14 | • Title (property **Task name**) **exactly** `Batch Verification Update Required`. 15 | • **Priority** set to `High`. 16 | • **Status** set to `In progress`. 17 | • In the page body add a **bulleted list** where each bullet is a **mention** of the page processed in step 3 (i.e., use the Notion mention object linking to that page). 
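As a reference for step 3, a compact notion-client sketch of the callout insertion is below (the token and IDs are placeholders, and querying the database for expired pages is omitted). Note the positioning caveat in the comments.

```python
from notion_client import Client

notion = Client(auth="ntn_placeholder_token")  # assumed integration token

WARNING = "VERIFICATION EXPIRED - This page needs review and re-verification"

def add_expiry_callout(page_id: str) -> None:
    # Appends the callout as a child of the page. New children land at the
    # end by default, and the API's `after` parameter only positions a block
    # after an existing one, so making it the *first* block may need extra steps.
    notion.blocks.children.append(
        block_id=page_id,
        children=[{
            "type": "callout",
            "callout": {
                "rich_text": [{"type": "text", "text": {"content": WARNING}}],
                "icon": {"type": "emoji", "emoji": "⚠️"},
                "color": "red_background",
            },
        }],
    )

# For step 4, a bullet that @-mentions a processed page looks roughly like:
mention_bullet = {
    "type": "bulleted_list_item",
    "bulleted_list_item": {
        "rich_text": [{"type": "mention", "mention": {"page": {"id": "<page-id>"}}}]
    },
}
```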
-------------------------------------------------------------------------------- /tasks/github/easy/mcpmark-cicd/basic_ci_checks/description.md: -------------------------------------------------------------------------------- 1 | Use the GitHub MCP tools to update the `mcpmark-eval/mcpmark-cicd` repository with a very small CI workflow. 2 | 3 | ## Goal 4 | Add a GitHub Actions workflow named **Basic CI Checks** that automatically runs linting and unit tests any time work is pushed to or proposed for the `main` branch. 5 | 6 | ## Requirements 7 | 1. Create a branch called `basic-ci-checks` from `main`. 8 | 2. Add `.github/workflows/basic-ci.yml` with the following characteristics: 9 | - Workflow name: `Basic CI Checks`. 10 | - Trigger on both `push` and `pull_request`, limited to the `main` branch. 11 | - Single job called `quality-checks` that runs on `ubuntu-latest` and uses Node.js 18 (`actions/setup-node`). 12 | - Steps must include `actions/checkout`, `npm ci`, `npm run lint`, and `npm test` in that order after Node is configured. 13 | 3. Commit the workflow to your branch, open a pull request titled `Add basic CI checks`, and merge it so the workflow exists on `main`. 14 | 15 | That's it—no caching, matrix builds, or issue automation required. Keep it lightweight and focused on verifying the existing lint/test scripts. 16 | -------------------------------------------------------------------------------- /tasks/notion/standard/standard_operating_procedure/section_organization/description.md: -------------------------------------------------------------------------------- 1 | # Task: Reorganize Standard Operating Procedure Page Sections 2 | 3 | ## Objective 4 | Modify the structure of the Standard Operating Procedure page in Notion by reorganizing sections through swapping and creating a column layout. 5 | 6 | ## Requirements 7 | 8 | ### Step 1: Swap Sections 9 | - Navigate to the Standard Operating Procedure page 10 | - Swap the positions of the "Terminologies" and "Roles & responsibilities" sections 11 | - Preserve all content within each section exactly as is 12 | - Maintain the original formatting and structure of each section 13 | 14 | ### Step 2: Create Column Layout 15 | - After swapping, arrange the "Tools" section and the section immediately below it ("Terminologies") into a 2-column layout 16 | - Position the "Tools" section in the left column 17 | - Position the "Terminologies" section in the right column 18 | - In the "Tools" column, add links to the Notion and Figma pages using appropriate reference blocks 19 | - Preserve the original child pages from the "Tools" section in a toggle block placed below the column layout, with the toggle titled "original pages" -------------------------------------------------------------------------------- /tasks/playwright/standard/eval_web/cloudflare_turnstile_challenge/description.md: -------------------------------------------------------------------------------- 1 | # Cloudflare Turnstile Authentication Challenge Task 2 | 3 | Use Playwright MCP tools to complete the Cloudflare Turnstile authentication challenge. 4 | 5 | ## Requirements: 6 | 7 | 1. Navigate to https://eval-web.mcpmark.ai/auth/turnstile 8 | 2. Fill in the authentication form with the provided test credentials: 9 | - Username: "testuser" 10 | - Password: "password123" 11 | 3. Wait for the Cloudflare Turnstile challenge widget to load completely 12 | 4. Interact with the Turnstile challenge widget to complete the authentication (if needed) 13 | 5. 
Wait for successful challenge completion (widget shows success state with checkmark) 14 | 6. Submit the form by clicking the "Sign In" button 15 | 7. Wait for and capture any success message or confirmation that appears 16 | 8. Output the success message captured in step 7 17 | 18 | ## Notes: 19 | 20 | - Use the provided test credentials: testuser / password123 21 | - Page shows success message inline, does not redirect to separate success page 22 | - Wait for all UI state changes before proceeding to next step 23 | - Verify both Turnstile completion and form submission success 24 | -------------------------------------------------------------------------------- /tasks/filesystem/standard/votenet/dataset_comparison/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task Description 4 | 5 | Analyze the codebase to map ScanNet object categories to SUN RGB-D categories and calculate object counts. 6 | 7 | ### Task Objectives 8 | 9 | 1. **Primary Goal**: Use SUN RGB-D's 10-category classification system as the target taxonomy 10 | 2. **Mapping Requirement**: Map each ScanNet object category (using the "category" field, not "raw_category") to the corresponding SUN RGB-D category 11 | 3. **Calculation**: For each SUN RGB-D category, calculate the total count of objects from ScanNet that map to that category (It only counts if the category (not raw_category) names are exactly the same (night_stand = nightstand)) 12 | 4. **Output**: Generate an analysis.txt file in the main directory showing the mapping and counts 13 | 14 | ### Expected Output 15 | 16 | Create a file named `analysis.txt` in the test directory root with the following format: 17 | 18 | - Each SUN RGB-D category should be represented as a 2-line block 19 | - Line 1: category name 20 | - Line 2: total count 21 | - Each block should be separated by one empty line 22 | -------------------------------------------------------------------------------- /tasks/filesystem/standard/file_context/file_splitting/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "file_splitting", 3 | "task_name": "File Splitting", 4 | "category_id": "file_context", 5 | "category_name": "File Context", 6 | "description": "Split a large text file into multiple equal-length segments for easier processing, distribution, and parallel handling of content.", 7 | "author": "Lingjun Chen", 8 | "created_at": "2025-08-08", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "content transformation" 12 | ], 13 | "mcp": [ 14 | "filesystem" 15 | ], 16 | "meta_data": { 17 | "stateType": "text", 18 | "stateContent": "file_context/\n ├── file_01.txt\n ├── file_02.txt\n ├── file_03.txt\n ├── file_04.txt\n ├── file_05.txt\n ├── file_06.txt\n ├── file_07.txt\n ├── file_08.txt\n ├── file_09.txt\n ├── file_10.txt\n ├── file_11.txt\n ├── file_12.txt\n ├── file_13.txt\n ├── file_14.txt\n ├── file_15.txt\n ├── file_16.txt\n ├── file_17.txt\n ├── file_18.txt\n ├── file_19.txt\n ├── file_20.txt\n └── large_file.txt", 19 | "stateUrl": "https://storage.mcpmark.ai/filesystem/file_context.zip", 20 | "stateOriginalUrl": null 21 | } 22 | } -------------------------------------------------------------------------------- /tasks/notion/easy/online_resume/simple__skills_development_tracker/description.md: -------------------------------------------------------------------------------- 1 | Create a comprehensive skills audit system by 
performing the following tasks: 2 | 3 | **Task Requirements:** 4 | 1. Create a new database named "Skills Development Tracker" as a child database in the main resume page with the following properties: 5 | - Name (title property) 6 | - Current Skill (relation to Skills database) 7 | - Current Proficiency (rollup from related skill's "Skill Level" property) 8 | - Target Proficiency (number property with format "percent") 9 | - Gap (formula: Target Proficiency - Current Proficiency) 10 | - Learning Resources (rich text property) 11 | - Progress Notes (rich text property) 12 | 13 | 2. Populate the Skills Development Tracker database with entries for all skills that have a proficiency level below 70% (0.7): 14 | - For each qualifying skill, create an entry with: 15 | - Name: "[Skill Name] Development Plan" 16 | - Link to the corresponding skill in Skills database 17 | - Target Proficiency: Set to Current + 25% (capped at 95%) 18 | - Learning Resources: "Online courses and practice projects" 19 | - Progress Notes: "Initial assessment completed" 20 | -------------------------------------------------------------------------------- /tasks/filesystem/standard/file_context/uppercase/meta.json: -------------------------------------------------------------------------------- 1 | { 2 | "task_id": "uppercase", 3 | "task_name": "Uppercase", 4 | "category_id": "file_context", 5 | "category_name": "File Context", 6 | "description": "Convert the content of 10 specified files to uppercase format and calculate the total word count across all processed files.", 7 | "author": "Lingjun Chen", 8 | "created_at": "2025-08-19", 9 | "difficulty": "L3", 10 | "tags": [ 11 | "content transformation", 12 | "data extraction" 13 | ], 14 | "mcp": [ 15 | "filesystem" 16 | ], 17 | "meta_data": { 18 | "stateType": "text", 19 | "stateContent": "file_context/\n ├── file_01.txt\n ├── file_02.txt\n ├── file_03.txt\n ├── file_04.txt\n ├── file_05.txt\n ├── file_06.txt\n ├── file_07.txt\n ├── file_08.txt\n ├── file_09.txt\n ├── file_10.txt\n ├── file_11.txt\n ├── file_12.txt\n ├── file_13.txt\n ├── file_14.txt\n ├── file_15.txt\n ├── file_16.txt\n ├── file_17.txt\n ├── file_18.txt\n ├── file_19.txt\n ├── file_20.txt\n └── large_file.txt", 20 | "stateUrl": "https://storage.mcpmark.ai/filesystem/file_context.zip", 21 | "stateOriginalUrl": null 22 | } 23 | } -------------------------------------------------------------------------------- /tasks/filesystem/standard/file_property/time_classification/description.md: -------------------------------------------------------------------------------- 1 | Please use FileSystem tools to finish the following task: 2 | 3 | ### Task Description 4 | 5 | Analyze the creation time (ctime) of all files in the test directory and organize them into a hierarchical directory structure based on their creation dates. 6 | 7 | ### Task Objectives 8 | 9 | 1. **Read metadata** of all files in the test directory 10 | 2. **Analyze creation times** (ctime) of all files (excluding .DS_Store) 11 | 3. **Create directory structure** organized by month/day based on creation time 12 | 4. **Move files** to appropriate directories 13 | 5. 
--------------------------------------------------------------------------------
/tasks/filesystem/standard/file_context/uppercase/meta.json:
--------------------------------------------------------------------------------
1 | {
2 |   "task_id": "uppercase",
3 |   "task_name": "Uppercase",
4 |   "category_id": "file_context",
5 |   "category_name": "File Context",
6 |   "description": "Convert the content of 10 specified files to uppercase format and calculate the total word count across all processed files.",
7 |   "author": "Lingjun Chen",
8 |   "created_at": "2025-08-19",
9 |   "difficulty": "L3",
10 |   "tags": [
11 |     "content transformation",
12 |     "data extraction"
13 |   ],
14 |   "mcp": [
15 |     "filesystem"
16 |   ],
17 |   "meta_data": {
18 |     "stateType": "text",
19 |     "stateContent": "file_context/\n ├── file_01.txt\n ├── file_02.txt\n ├── file_03.txt\n ├── file_04.txt\n ├── file_05.txt\n ├── file_06.txt\n ├── file_07.txt\n ├── file_08.txt\n ├── file_09.txt\n ├── file_10.txt\n ├── file_11.txt\n ├── file_12.txt\n ├── file_13.txt\n ├── file_14.txt\n ├── file_15.txt\n ├── file_16.txt\n ├── file_17.txt\n ├── file_18.txt\n ├── file_19.txt\n ├── file_20.txt\n └── large_file.txt",
20 |     "stateUrl": "https://storage.mcpmark.ai/filesystem/file_context.zip",
21 |     "stateOriginalUrl": null
22 |   }
23 | }
--------------------------------------------------------------------------------
/tasks/filesystem/standard/file_property/time_classification/description.md:
--------------------------------------------------------------------------------
1 | Please use FileSystem tools to finish the following task:
2 | 
3 | ### Task Description
4 | 
5 | Analyze the creation time (ctime) of all files in the test directory and organize them into a hierarchical directory structure based on their creation dates.
6 | 
7 | ### Task Objectives
8 | 
9 | 1. **Read metadata** of all files in the test directory
10 | 2. **Analyze creation times** (ctime) of all files (excluding .DS_Store)
11 | 3. **Create directory structure** organized by month/day based on creation time
12 | 4. **Move files** to appropriate directories
13 | 5. **Create metadata analysis files** in each directory
14 | 
15 | ### Expected Output
16 | 
17 | #### Directory Structure
18 | 
19 | Create directories in the format: `MM/DD/` where:
20 | 
21 | - MM = month (two digits, e.g., 01, 02)
22 | - DD = day (two digits, e.g., 07, 09, 11, 26)
23 | 
24 | #### Metadata Analysis Files
25 | 
26 | Create a file named `metadata_analyse.txt` in each directory containing exactly two lines:
27 | 
28 | - **Line 1**: Oldest filename and its creation time (excluding .DS_Store)
29 | - **Line 2**: Newest filename and its creation time (excluding .DS_Store)
30 | 
--------------------------------------------------------------------------------
/tasks/postgres/easy/employees/employee_projects_basic/description.md:
--------------------------------------------------------------------------------
1 | Create and manage a basic employee projects table to track company projects. The IT team needs you to build the database table structure and populate it with initial project data.
2 | 
3 | ## Your Tasks:
4 | 
5 | 1. **Create the employee_projects table** — build a new table in the `employees` schema:
6 | 
7 |    **Table: `employee_projects`**
8 |    * `project_id` (integer, primary key, auto-increment)
9 |    * `project_name` (varchar(100), not null)
10 |    * `start_date` (date, not null)
11 |    * `end_date` (date)
12 |    * `budget` (decimal(10,2))
13 |    * `status` (varchar(20), default 'active')
14 | 
15 | 2. **Insert exactly this initial data into `employee_projects`**:
16 |    * Project 1: name='Database Modernization', start_date='2024-01-15', end_date='2024-06-30', budget=250000.00, status='active'
17 |    * Project 2: name='Employee Portal Upgrade', start_date='2024-02-01', end_date='2024-05-15', budget=180000.00, status='active'
18 |    * Project 3: name='HR Analytics Dashboard', start_date='2023-11-01', end_date='2024-01-31', budget=120000.00, status='active'
19 | 
20 | This will establish the basic project tracking foundation for the company.
21 | 
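A sketch of one way to satisfy this with `psycopg2`; the connection string is a placeholder, while the DDL and rows follow the spec above:

```python
import psycopg2

DDL = """
CREATE TABLE employees.employee_projects (
    project_id   SERIAL PRIMARY KEY,                -- integer, auto-increment
    project_name VARCHAR(100) NOT NULL,
    start_date   DATE NOT NULL,
    end_date     DATE,
    budget       NUMERIC(10,2),                     -- decimal(10,2)
    status       VARCHAR(20) DEFAULT 'active'
);
"""

ROWS = [
    ("Database Modernization",  "2024-01-15", "2024-06-30", 250000.00, "active"),
    ("Employee Portal Upgrade", "2024-02-01", "2024-05-15", 180000.00, "active"),
    ("HR Analytics Dashboard",  "2023-11-01", "2024-01-31", 120000.00, "active"),
]

# Connection details are a placeholder; adjust for your environment.
with psycopg2.connect("dbname=employees") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
        cur.executemany(
            "INSERT INTO employees.employee_projects "
            "(project_name, start_date, end_date, budget, status) "
            "VALUES (%s, %s, %s, %s, %s)",
            ROWS,
        )
```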
--------------------------------------------------------------------------------
/.dockerignore:
--------------------------------------------------------------------------------
1 | # Git
2 | .git
3 | .gitignore
4 | 
5 | # Python
6 | __pycache__
7 | *.pyc
8 | *.pyo
9 | *.pyd
10 | .Python
11 | *.egg
12 | *.egg-info/
13 | dist/
14 | build/
15 | .eggs/
16 | *.so
17 | 
18 | # Virtual environments
19 | venv/
20 | env/
21 | ENV/
22 | .venv/
23 | 
24 | # IDE
25 | .vscode/
26 | .idea/
27 | *.swp
28 | *.swo
29 | *~
30 | .DS_Store
31 | 
32 | # Environment files (contain secrets)
33 | .env
34 | .mcp_env
35 | notion_state.json
36 | 
37 | # Test and development files
38 | .pytest_cache/
39 | .coverage
40 | htmlcov/
41 | .tox/
42 | .mypy_cache/
43 | .ruff_cache/
44 | tests/
45 | test_environments/
46 | 
47 | # Results and logs
48 | results/
49 | *.log
50 | logs/
51 | 
52 | # PostgreSQL data
53 | .postgres/
54 | 
55 | # Playwright
56 | playwright-report/
57 | test-results/
58 | 
59 | # Documentation images
60 | asset/
61 | 
62 | # Temporary files
63 | *.tmp
64 | tmp/
65 | temp/
66 | 
67 | # Docker
68 | Dockerfile
69 | docker-compose.yml
70 | .dockerignore
71 | 
72 | # Node modules (if any locally installed)
73 | node_modules/
74 | 
75 | # Pixi lock file
76 | pixi.lock
77 | .pixi/
78 | 
79 | # GitHub state files
80 | github_state/
81 | github_template_repo/
82 | 
83 | # Backup directories
84 | .mcpbench_backups/
--------------------------------------------------------------------------------
/tasks/filesystem/standard/file_context/file_merging/meta.json:
--------------------------------------------------------------------------------
1 | {
2 |   "task_id": "file_merging",
3 |   "task_name": "File Merging",
4 |   "category_id": "file_context",
5 |   "category_name": "File Context",
6 |   "description": "Identify the 10 smallest text files in the directory, then merge their content in alphabetical order into a single consolidated file.",
7 |   "author": "Lingjun Chen",
8 |   "created_at": "2025-08-07",
9 |   "difficulty": "L3",
10 |   "tags": [
11 |     "content transformation",
12 |     "file organization"
13 |   ],
14 |   "mcp": [
15 |     "filesystem"
16 |   ],
17 |   "meta_data": {
18 |     "stateType": "text",
19 |     "stateContent": "file_context/\n ├── file_01.txt\n ├── file_02.txt\n ├── file_03.txt\n ├── file_04.txt\n ├── file_05.txt\n ├── file_06.txt\n ├── file_07.txt\n ├── file_08.txt\n ├── file_09.txt\n ├── file_10.txt\n ├── file_11.txt\n ├── file_12.txt\n ├── file_13.txt\n ├── file_14.txt\n ├── file_15.txt\n ├── file_16.txt\n ├── file_17.txt\n ├── file_18.txt\n ├── file_19.txt\n ├── file_20.txt\n └── large_file.txt",
20 |     "stateUrl": "https://storage.mcpmark.ai/filesystem/file_context.zip",
21 |     "stateOriginalUrl": null
22 |   }
23 | }
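For the file-merging task above, a minimal sketch; the root path and the output filename are assumptions, since the task metadata does not fix them:

```python
from pathlib import Path

root = Path("file_context")                      # assumed task directory
txt_files = list(root.glob("*.txt"))

# Pick the 10 smallest files by byte size, then order those by filename.
smallest = sorted(txt_files, key=lambda p: p.stat().st_size)[:10]
ordered = sorted(smallest, key=lambda p: p.name)

merged = "".join(p.read_text() for p in ordered)
(root / "merged.txt").write_text(merged)         # output name is an assumption
```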
--------------------------------------------------------------------------------
/tasks/playwright_webarena/easy/reddit/nba_statistics_analysis/description.md:
--------------------------------------------------------------------------------
1 | Share a five-line NBA engagement brief so sports moderators can instantly see which thread is dominating.
2 | 
3 | **Task Requirements:**
4 | 
5 | 1. Register and log in with username `NBA_DataAnalyst_2024` and password `Research#2024!`.
6 | 2. In the `sports` forum (or via the site search for “NBA”), locate the NBA-related post with the highest upvote count you can see. Record its exact title, vote total, comment count, and author username.
7 | 3. Count how many distinct NBA-titled posts you reviewed for this summary (at minimum, include the post from step 2) and store that total as `Total_NBA_Posts`.
8 | 4. Publish a post titled `Statistical Analysis: NBA Content Engagement on This Forum` in the same forum. Its body must consist solely of the Markdown list below; keep the keys/order/pipes untouched while replacing each `FILL_VALUE` with your numbers:
9 | 
10 | ```
11 | - Total_NBA_Posts|FILL_VALUE
12 | - Top_Title|FILL_VALUE
13 | - Top_Votes|FILL_VALUE
14 | - Top_Comments|FILL_VALUE
15 | - Top_Author|FILL_VALUE
16 | ```
17 | 
18 | This easy edition just reports the leading NBA thread plus the count of posts you reviewed—no deeper profile checks are necessary.
19 | 
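A small sketch of rendering the required body once the numbers are collected; the values below are dummies for illustration, while the keys, their order, and the pipes stay fixed:

```python
# Dummy values for illustration only; replace with what you actually recorded.
stats = [
    ("Total_NBA_Posts", 4),
    ("Top_Title", "Example NBA thread title"),
    ("Top_Votes", 321),
    ("Top_Comments", 45),
    ("Top_Author", "example_user"),
]
body = "\n".join(f"- {key}|{value}" for key, value in stats)
print(body)
```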
--------------------------------------------------------------------------------
/tasks/playwright_webarena/standard/reddit/routine_tracker_forum/description.md:
--------------------------------------------------------------------------------
1 | Create a daily routine tracking system on the Postmill forum platform by setting up a new account and engaging with productivity-related content in the LifeProTips community.
2 | 
3 | **Task Requirements:**
4 | 
5 | 1. Create a new account with the exact username "RoutineTracker2025" and password "DailyRoutine123!"
6 | 
7 | 2. Locate the following specific post:
8 |    - Title: "LPT: Use your calendar as your to-do list. Assigning dedicated time to tasks increases the likelyhood of you acting upon it."
9 |    - Record the content of the most upvoted comment of this post
10 | 
11 | 3. Create a new submission in the LifeProTips forum with these exact specifications:
12 |    - Title: "My 5-Step Morning Routine That Increased My Productivity by 200%"
13 |    - Body content must be the content recorded in the above step
14 | 
15 | 4. After successfully posting, verify that your post appears in the LifeProTips forum listing
16 | 
17 | 5. Engage with the community by upvoting exactly these two posts:
18 |    - The calendar to-do list post you found in step 2
19 |    - The post titled "LPT: clean your stovetop after using the oven. The heat loosens grime for easy removal"
--------------------------------------------------------------------------------
/tasks/filesystem/standard/file_context/pattern_matching/meta.json:
--------------------------------------------------------------------------------
1 | {
2 |   "task_id": "pattern_matching",
3 |   "task_name": "Pattern Matching",
4 |   "category_id": "file_context",
5 |   "category_name": "File Context",
6 |   "description": "Search multiple files for shared character sequences and precisely locate all matching pattern occurrences within the target files.",
7 |   "author": "Lingjun Chen",
8 |   "created_at": "2025-08-06",
9 |   "difficulty": "L3",
10 |   "tags": [
11 |     "pattern analysis",
12 |     "cross-referencing"
13 |   ],
14 |   "mcp": [
15 |     "filesystem"
16 |   ],
17 |   "meta_data": {
18 |     "stateType": "text",
19 |     "stateContent": "file_context/\n ├── file_01.txt\n ├── file_02.txt\n ├── file_03.txt\n ├── file_04.txt\n ├── file_05.txt\n ├── file_06.txt\n ├── file_07.txt\n ├── file_08.txt\n ├── file_09.txt\n ├── file_10.txt\n ├── file_11.txt\n ├── file_12.txt\n ├── file_13.txt\n ├── file_14.txt\n ├── file_15.txt\n ├── file_16.txt\n ├── file_17.txt\n ├── file_18.txt\n ├── file_19.txt\n ├── file_20.txt\n └── large_file.txt",
20 |     "stateUrl": "https://storage.mcpmark.ai/filesystem/file_context.zip",
21 |     "stateOriginalUrl": null
22 |   }
23 | }
--------------------------------------------------------------------------------
/tasks/filesystem/standard/file_context/duplicates_searching/meta.json:
--------------------------------------------------------------------------------
1 | {
2 |   "task_id": "duplicates_searching",
3 |   "task_name": "Duplicates Searching",
4 |   "category_id": "file_context",
5 |   "category_name": "File Context",
6 |   "description": "Scan directory to identify files with identical content, then organize all duplicate files into a separate dedicated directory for cleanup.",
7 |   "author": "Lingjun Chen",
8 |   "created_at": "2025-08-06",
9 |   "difficulty": "L3",
10 |   "tags": [
11 |     "pattern analysis",
12 |     "file organization"
13 |   ],
14 |   "mcp": [
15 |     "filesystem"
16 |   ],
17 |   "meta_data": {
18 |     "stateType": "text",
19 |     "stateContent": "file_context/\n ├── file_01.txt\n ├── file_02.txt\n ├── file_03.txt\n ├── file_04.txt\n ├── file_05.txt\n ├── file_06.txt\n ├── file_07.txt\n ├── file_08.txt\n ├── file_09.txt\n ├── file_10.txt\n ├── file_11.txt\n ├── file_12.txt\n ├── file_13.txt\n ├── file_14.txt\n ├── file_15.txt\n ├── file_16.txt\n ├── file_17.txt\n ├── file_18.txt\n ├── file_19.txt\n ├── file_20.txt\n └── large_file.txt",
20 |     "stateUrl": "https://storage.mcpmark.ai/filesystem/file_context.zip",
21 |     "stateOriginalUrl": null
22 |   }
23 | }
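A hash-based sketch for the duplicates task above; the destination directory name is an assumption, and whether the first copy of each duplicate group should also be moved is left open by the task description:

```python
import hashlib
import shutil
from pathlib import Path

root = Path("file_context")          # assumed task directory
dup_dir = root / "duplicates"        # destination name is an assumption
dup_dir.mkdir(exist_ok=True)

seen = {}                            # content digest -> first file with that content
for path in sorted(root.glob("*.txt")):
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest in seen:
        # A file with identical content was already seen; move this copy out.
        shutil.move(str(path), dup_dir / path.name)
    else:
        seen[digest] = path
```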
**Create the following folder structure** in the main directory:
10 | 
11 |    - `work/` - for work, research, and project-related files
12 |    - `life/` - for personal life related files
13 |    - `archives/` - for archived files or files with past dates in their file names
14 |    - `temp/` - for temporary files and drafts
15 |    - `others/` - for files that cannot be classified elsewhere
16 | 
17 | ### Important Notes
18 | 
19 | - All files must be moved from their current locations to the specified folders
20 | - The `others/` folder is for files that don't fit the other categories
21 | - Do not modify the contents of any files; only move them to the correct locations
22 | - If you are not sure which folder a file belongs to, read its contents before deciding
23 | - **Do not change file names**
24 | 
--------------------------------------------------------------------------------
/tasks/postgres/easy/employees/hiring_year_summary/description.md:
--------------------------------------------------------------------------------
1 | Create a hiring year summary table to help HR track employee retention trends over the years. This analysis shows how many employees were hired each year and how many are still with the company.
2 | 
3 | ## Your Task:
4 | 
5 | **Create the hiring year summary table** — build a table called `hiring_year_summary` in the `employees` schema with these exact columns:
6 | 
7 | * `hire_year` (integer) — year employees were hired
8 | * `employees_hired` (integer) — number of employees hired that year
9 | * `still_employed` (integer) — how many from that year are still employed (have an active salary where to_date = '9999-01-01')
10 | * `retention_rate` (decimal) — percentage still employed (still_employed / employees_hired * 100)
11 | 
12 | ## Requirements:
13 | 
14 | 1. Extract the hire year from the `hire_date` column in the `employees` table
15 | 2. Count total employees hired in each year
16 | 3. Determine which employees are still employed by checking for active salary records (to_date = '9999-01-01' in the `salaries` table)
17 | 4. Order results by hire_year in ascending order
18 | 
19 | This analysis will help HR understand retention patterns and identify years with particularly high or low retention rates.
20 | 
--------------------------------------------------------------------------------
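One possible shape for the summary query, assuming the standard `employees` sample schema (with `emp_no` as the key shared by the `employees` and `salaries` tables); the connection string is a placeholder:

```python
import psycopg2

SUMMARY_SQL = """
CREATE TABLE employees.hiring_year_summary AS
SELECT EXTRACT(YEAR FROM e.hire_date)::int          AS hire_year,
       COUNT(*)::int                                AS employees_hired,
       COUNT(s.emp_no)::int                         AS still_employed,
       ROUND(100.0 * COUNT(s.emp_no) / COUNT(*), 2) AS retention_rate
FROM employees.employees e
LEFT JOIN (
    -- DISTINCT avoids fan-out if an employee has several active salary rows.
    SELECT DISTINCT emp_no
    FROM employees.salaries
    WHERE to_date = '9999-01-01'
) s ON s.emp_no = e.emp_no
GROUP BY hire_year
ORDER BY hire_year;
"""

with psycopg2.connect("dbname=employees") as conn:   # placeholder connection details
    with conn.cursor() as cur:
        cur.execute(SUMMARY_SQL)
```

`COUNT(s.emp_no)` counts only rows where the left join found an active salary record, which is exactly the still-employed tally the task asks for.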