├── Data
│   ├── column_id_test.json
│   ├── column_id_validation.json
│   ├── row_id_test.json
│   └── row_id_validation.json
├── Population
│   ├── abstract_index.py
│   ├── cat_type_index.py
│   ├── column_evaluation.py
│   ├── column_rank_label.py
│   ├── elastic.py
│   ├── elastic_cache.py
│   ├── retrieval.py
│   ├── row_evaluation.py
│   ├── row_ranking_entities.py
│   ├── scorer.py
│   ├── table_index_example.py
│   └── toy_index.py
├── README.md
└── requirements.txt

/Data/column_id_test.json:
--------------------------------------------------------------------------------
1 | ["table-0298-610", "table-0707-95", "table-0676-574", "table-1356-847", "table-0784-89", "table-1309-632", "table-0156-787", "table-1258-550", "table-1414-539", "table-0249-860", "table-0080-813", "table-1593-997", "table-0415-953", "table-1412-750", "table-1121-916", "table-1125-616", "table-1000-724", "table-0080-635", "table-0016-880", "table-0409-667", "table-1434-124", "table-0367-31", "table-1508-774", "table-0182-472", "table-0181-274", "table-0767-873", "table-1134-716", "table-1132-84", "table-1142-776", "table-1509-709", "table-1410-247", "table-0222-123", "table-0299-668", "table-1537-286", "table-0541-881", "table-0226-214", "table-1218-948", "table-0568-903", "table-1591-93", "table-0970-202", "table-0887-9", "table-1311-296", "table-0391-583", "table-0615-99", "table-0438-806", "table-0607-381", "table-1562-280", "table-0282-507", "table-1563-226", "table-1344-925", "table-1451-775", "table-0476-683", "table-0208-71", "table-0802-177", "table-0702-468", "table-1610-976", "table-1458-710", "table-0352-491", "table-0163-453", "table-1508-708", "table-0547-494", "table-0617-545", "table-0793-901", "table-0080-907", "table-1587-200", "table-0398-699", "table-0470-665", "table-0145-963", "table-1010-28", "table-0838-822", "table-1126-300", "table-0797-349", "table-0629-30", "table-0544-661", "table-1492-152", "table-0205-858", "table-0500-27", "table-0628-871", "table-1460-565", "table-1364-897", "table-0067-3", "table-0826-678",
"table-0650-392", "table-0946-284", "table-1459-438", "table-0893-634", "table-0527-288", "table-0212-900", "table-0517-46", "table-0607-346", "table-0395-622", "table-0697-648", "table-1090-708", "table-0596-31", "table-0852-538", "table-1482-720", "table-1500-890", "table-1081-552", "table-0736-268", "table-1018-21", "table-1626-720", "table-0762-322", "table-1118-363", "table-0966-378", "table-0355-968", "table-1599-823", "table-1136-293", "table-1190-814", "table-0625-113", "table-0030-139", "table-1433-425", "table-0179-57", "table-0562-558", "table-0318-594", "table-0384-484", "table-0103-724", "table-0266-90", "table-1359-567", "table-1421-398", "table-1112-238", "table-0425-994", "table-0924-90", "table-1086-230", "table-0067-989", "table-1462-926", "table-0544-324", "table-0207-501", "table-0262-53", "table-0116-757", "table-1295-490", "table-1266-993", "table-0212-396", "table-0572-80", "table-0429-397", "table-1073-863", "table-0725-291", "table-1406-157", "table-0177-609", "table-0005-498", "table-0195-374", "table-0225-154", "table-0409-271", "table-0543-806", "table-0618-489", "table-0506-564", "table-0764-257", "table-1356-841", "table-0755-781", "table-1416-310", "table-0583-952", "table-1094-544", "table-0548-315", "table-0792-398", "table-0565-445", "table-1202-458", "table-0714-366", "table-1410-323", "table-0484-425", "table-1411-809", "table-0157-875", "table-1479-709", "table-0392-522", "table-1125-577", "table-0876-2", "table-0676-494", "table-0460-705", "table-0204-968", "table-1642-239", "table-0103-789", "table-0008-694", "table-0653-383", "table-1100-950", "table-0157-927", "table-1014-616", "table-0318-503", "table-1310-166", "table-0725-332", "table-0185-272", "table-0519-197", "table-1443-782", "table-0696-62", "table-1075-394", "table-0502-14", "table-0298-406", "table-1573-683", "table-1342-325", "table-0502-558", "table-0533-482", "table-0970-910", "table-1572-266", "table-0724-862", "table-0158-359", "table-1491-266", 
"table-1173-974", "table-1147-369", "table-0430-897", "table-1147-568", "table-0647-465", "table-0558-811", "table-0393-519", "table-1583-498", "table-0210-573", "table-0303-238", "table-0104-467", "table-1413-638", "table-1405-792", "table-0747-649", "table-0626-204", "table-0695-630", "table-0949-300", "table-1020-925", "table-0823-900", "table-0193-696", "table-1044-293", "table-0647-355", "table-0953-532", "table-1001-182", "table-1019-745", "table-1311-772", "table-0157-200", "table-0248-449", "table-0831-846", "table-1506-938", "table-0915-187", "table-0404-837", "table-0556-685", "table-1265-344", "table-0315-755", "table-0894-581", "table-1526-849", "table-0585-942", "table-0797-574", "table-1416-79", "table-0448-226", "table-0769-454", "table-0158-577", "table-1651-929", "table-0884-987", "table-0754-903", "table-1245-764", "table-0157-766", "table-1326-206", "table-1283-853", "table-1185-928", "table-1026-752", "table-0288-361", "table-0562-112", "table-0707-931", "table-0306-682", "table-0597-836", "table-0278-748", "table-0912-720", "table-0396-949", "table-0889-349", "table-1558-898", "table-0887-120", "table-0267-612", "table-0460-929", "table-1231-325", "table-0049-240", "table-0663-441", "table-1155-740", "table-1637-560", "table-0960-738", "table-0346-296", "table-1111-757", "table-1626-594", "table-0083-434", "table-1397-159", "table-0249-374", "table-1607-308", "table-0340-37", "table-1610-431", "table-1622-79", "table-0837-947", "table-0198-591", "table-1044-915", "table-0243-882", "table-0951-858", "table-1603-741", "table-1186-517", "table-1504-364", "table-0540-919", "table-1504-107", "table-0527-856", "table-1147-691", "table-1342-421", "table-0431-964", "table-0157-78", "table-0158-298", "table-0295-64", "table-1106-199", "table-0157-62", "table-0408-107", "table-0166-610", "table-0531-449", "table-1274-977", "table-0157-940", "table-0959-183", "table-1573-482", "table-0481-872", "table-1375-943", "table-0126-242", "table-0899-941", 
"table-0474-851", "table-1243-880", "table-0860-796", "table-0958-182", "table-0060-993", "table-1560-145", "table-0157-273", "table-0664-210", "table-1238-114", "table-1094-366", "table-0183-581", "table-0948-853", "table-0203-480", "table-1218-829", "table-0591-993", "table-1288-103", "table-0456-895", "table-1596-173", "table-1080-779", "table-1315-779", "table-1423-105", "table-0294-490", "table-0142-976", "table-0281-966", "table-0235-968", "table-1277-861", "table-0654-349", "table-0045-191", "table-0320-121", "table-1202-464", "table-0137-521", "table-0545-311", "table-1283-769", "table-0103-838", "table-0446-339", "table-1557-434", "table-0522-511", "table-0488-978", "table-1127-404", "table-0206-703", "table-0421-327", "table-0158-367", "table-0313-222", "table-0061-512", "table-0801-129", "table-1170-540", "table-0398-790", "table-0338-176", "table-1147-586", "table-0434-566", "table-1577-215", "table-1579-719", "table-0546-213", "table-0142-983", "table-0315-131", "table-0464-149", "table-0083-127", "table-0975-995", "table-0398-822", "table-1085-65", "table-0725-449", "table-1254-120", "table-1403-736", "table-0928-898", "table-0552-59", "table-1014-487", "table-0113-922", "table-0157-353", "table-0944-582", "table-1361-135", "table-0220-834", "table-0481-408", "table-1479-346", "table-1365-68", "table-1156-8", "table-0196-114", "table-1180-191", "table-1314-69", "table-0503-918", "table-0194-966", "table-0541-422", "table-0543-817", "table-0931-943", "table-0293-434", "table-0022-942", "table-1004-372", "table-1194-985", "table-1538-639", "table-1491-757", "table-0573-259", "table-0090-366", "table-0158-823", "table-1090-726", "table-0486-524", "table-0583-950", "table-0198-249", "table-1203-356", "table-1004-373", "table-1054-258", "table-1371-611", "table-0259-832", "table-0355-191", "table-0627-199", "table-1567-54", "table-1130-572", "table-1166-326", "table-0412-764", "table-0732-901", "table-1350-899", "table-0703-988", "table-1550-852", 
"table-0768-845", "table-0161-676", "table-1340-583", "table-1025-791", "table-0126-96", "table-1546-200", "table-0506-503", "table-0755-774", "table-1068-539", "table-1356-656", "table-0105-742", "table-0177-871", "table-1088-31", "table-1246-975", "table-1373-56", "table-1351-754", "table-1217-699", "table-0459-821", "table-1132-780", "table-0663-443", "table-1575-687", "table-0158-277", "table-1277-545", "table-0502-401", "table-0586-304", "table-0675-545", "table-0894-393", "table-0897-983", "table-0236-29", "table-0442-480", "table-0623-433", "table-0896-878", "table-1248-884", "table-1142-348", "table-1587-12", "table-0696-434", "table-0459-474", "table-0289-164", "table-1361-746", "table-1213-413", "table-0548-80", "table-0184-35", "table-0151-657", "table-0624-6", "table-0355-650", "table-1490-676", "table-0451-251", "table-1219-970", "table-0175-105", "table-1395-377", "table-0091-657", "table-0425-428", "table-0279-695", "table-1371-76", "table-1257-728", "table-0926-59", "table-1082-61", "table-0914-370", "table-0301-609", "table-1271-271", "table-0250-25", "table-0734-46", "table-1244-409", "table-0091-228", "table-0656-707", "table-0530-638", "table-1560-354", "table-0787-758", "table-0428-163", "table-1537-801", "table-0770-118", "table-0278-43", "table-1061-921", "table-0886-782", "table-1086-294", "table-0635-345", "table-0158-454", "table-1569-365", "table-0876-381", "table-1293-350", "table-0781-270", "table-0067-17", "table-1424-587", "table-1157-318", "table-0108-678", "table-0739-418", "table-0474-869", "table-0633-796", "table-1063-461", "table-1420-907", "table-1580-260", "table-1271-292", "table-0298-493", "table-0278-407", "table-1214-528", "table-0395-927", "table-1433-446", "table-0380-537", "table-1569-551", "table-1136-150", "table-0726-902", "table-0976-544", "table-0935-495", "table-0991-485", "table-0213-16", "table-1415-271", "table-0252-211", "table-0512-211", "table-0075-62", "table-0901-32", "table-1132-82", "table-1566-973", 
"table-0157-410", "table-1445-790", "table-1491-773", "table-1603-476", "table-0184-952", "table-0457-421", "table-0917-867", "table-0157-75", "table-1293-556", "table-1003-954", "table-0232-3", "table-0743-182", "table-1043-814", "table-1416-801", "table-0754-998", "table-0424-759", "table-0699-185", "table-0178-972", "table-1601-337", "table-0142-254", "table-0897-813", "table-0813-831", "table-0145-695", "table-1060-788", "table-0446-206", "table-0158-873", "table-0451-909", "table-1364-679", "table-1103-255", "table-1134-660", "table-1363-571", "table-0186-535", "table-1408-657", "table-0790-807", "table-0643-330", "table-0591-992", "table-0534-30", "table-0084-431", "table-0471-388", "table-1402-649", "table-0402-49", "table-0500-304", "table-0238-349", "table-1559-56", "table-0196-8", "table-1102-177", "table-1405-83", "table-0924-732", "table-0097-324", "table-0812-763", "table-1233-887", "table-1186-682", "table-0499-888", "table-0110-476", "table-0591-991", "table-1392-643", "table-0835-200", "table-1515-588", "table-1491-555", "table-0128-443", "table-0922-180", "table-1526-502", "table-0570-719", "table-0923-977", "table-0149-324", "table-1163-335", "table-0665-726", "table-1156-110", "table-1156-402", "table-0317-268", "table-0184-503", "table-0722-932", "table-1487-693", "table-1085-77", "table-1133-594", "table-0160-777", "table-1355-931", "table-0303-70", "table-0392-36", "table-1068-838", "table-0590-127", "table-1009-437", "table-0542-8", "table-0521-328", "table-0458-308", "table-0175-128", "table-0228-861", "table-1224-772", "table-0182-651", "table-1095-167", "table-0040-79", "table-1399-521", "table-1350-610", "table-0218-636", "table-0315-750", "table-0272-937", "table-0659-353", "table-0153-954", "table-1243-920", "table-0710-596", "table-0451-119", "table-0722-558", "table-0929-689", "table-1131-266", "table-0175-828", "table-1157-922", "table-0421-359", "table-1426-93", "table-1388-101", "table-0217-927", "table-1329-199", "table-0149-912", 
"table-0998-125", "table-0374-692", "table-1006-766", "table-0474-705", "table-0044-349", "table-1602-874", "table-0545-360", "table-1287-145", "table-0482-443", "table-0071-57", "table-1526-787", "table-0145-641", "table-0158-425", "table-1324-700", "table-0237-559", "table-1435-230", "table-0596-297", "table-0106-101", "table-0411-824", "table-1560-375", "table-0614-686", "table-1179-721", "table-0180-786", "table-1549-342", "table-0532-992", "table-1255-736", "table-0201-865", "table-0635-986", "table-1275-782", "table-1518-113", "table-1648-184", "table-0180-350", "table-0067-357", "table-0034-460", "table-1215-32", "table-0587-209", "table-0672-909", "table-0968-108", "table-1145-324", "table-0543-297", "table-1230-783", "table-0799-573", "table-1319-109", "table-1518-770", "table-0592-965", "table-1035-752", "table-0428-898", "table-1394-418", "table-1379-533", "table-0459-769", "table-1312-9", "table-0428-816", "table-0297-218", "table-0246-889", "table-0927-150", "table-0129-954", "table-0920-990", "table-1185-291", "table-1063-60", "table-0767-364", "table-0624-5", "table-0193-156", "table-1131-637", "table-0725-320", "table-0246-681", "table-0717-530", "table-1319-848", "table-0067-655", "table-1103-51", "table-0385-725", "table-0541-412", "table-0591-74", "table-1242-306", "table-1055-73", "table-0435-40", "table-1453-272", "table-1121-857", "table-0158-89", "table-0301-471", "table-0766-65", "table-1583-554", "table-0246-720", "table-0256-343", "table-0439-213", "table-1578-360", "table-0967-861", "table-0156-862", "table-0275-547", "table-0544-234", "table-0182-249", "table-0483-225", "table-1536-76", "table-0838-860", "table-1265-628", "table-0156-515", "table-1230-547", "table-1396-812", "table-0299-775", "table-0289-20", "table-1234-898", "table-0192-498", "table-0409-268", "table-0158-673", "table-0998-128", "table-0227-702", "table-0002-873", "table-0395-332", "table-0136-110", "table-0359-190", "table-0479-108", "table-1225-672", 
"table-1379-623", "table-0580-99", "table-0175-707", "table-0924-5", "table-1031-190", "table-0235-697", "table-0299-595", "table-0662-910", "table-1462-141", "table-0425-923", "table-1614-441", "table-0439-156", "table-1310-955", "table-0257-143", "table-0419-913", "table-1202-457", "table-0549-10", "table-0086-782", "table-0802-388", "table-1544-22", "table-0850-110", "table-0086-781", "table-0075-249", "table-0142-68", "table-0458-879", "table-1381-897", "table-0148-21", "table-0891-118", "table-0948-788", "table-0696-164", "table-1234-147", "table-0933-190", "table-0545-275", "table-0944-981", "table-1051-310", "table-0151-893", "table-1348-652", "table-0210-609", "table-0657-533", "table-0299-426", "table-1530-104", "table-1213-869", "table-1416-799", "table-1486-366", "table-1366-637", "table-0721-375", "table-0158-568", "table-1630-715", "table-0156-571", "table-0878-543", "table-1361-140", "table-1154-200", "table-0976-542", "table-1034-919", "table-0897-926", "table-0310-271", "table-1215-643", "table-0598-961", "table-1186-536", "table-0794-843", "table-0617-99", "table-0458-523", "table-0898-773", "table-0267-766", "table-1565-85", "table-0388-629", "table-1646-811", "table-0748-763", "table-0278-132", "table-1225-63", "table-1612-521", "table-1389-87", "table-0822-259", "table-0207-410", "table-0858-124", "table-1546-823", "table-0774-360", "table-1210-473", "table-0914-382", "table-0067-199", "table-1179-865", "table-0483-208", "table-0581-743", "table-1167-247", "table-0696-74", "table-0767-286", "table-0828-284", "table-0998-668", "table-0707-946", "table-0194-351", "table-1341-44", "table-0081-18", "table-1635-254", "table-0317-330", "table-0493-74", "table-0724-19", "table-0158-107", "table-0160-441", "table-0333-592", "table-0459-375", "table-0529-802", "table-0874-694", "table-1003-696", "table-0178-56", "table-1018-616", "table-1491-947", "table-0733-917", "table-0605-948", "table-1356-479", "table-1653-730", "table-0091-963", "table-0210-394", 
"table-0278-409", "table-1268-946", "table-0920-91", "table-1288-1", "table-0157-117", "table-0612-136", "table-1218-999", "table-0474-434", "table-0285-836", "table-0766-448", "table-0664-26", "table-0665-131", "table-0874-750", "table-0187-991", "table-1076-441", "table-0269-541", "table-1025-635", "table-1019-963", "table-1646-124", "table-1109-550", "table-1249-301", "table-1086-283", "table-0296-698", "table-0750-740", "table-0988-43", "table-1225-656", "table-1647-786", "table-0208-306", "table-0502-822", "table-1096-630", "table-1505-911", "table-0731-103", "table-1509-786", "table-0499-890", "table-0997-194", "table-1570-828", "table-0727-581", "table-0132-945", "table-0685-223", "table-1090-718", "table-0541-433", "table-0750-837", "table-0834-977", "table-0067-14", "table-0020-796", "table-0941-210", "table-0478-948", "table-0158-669", "table-0156-651", "table-1139-288", "table-0173-805", "table-0512-464", "table-0428-649", "table-0889-682", "table-0553-1000", "table-0603-297", "table-0873-476", "table-0254-833", "table-0643-126", "table-0708-382", "table-1565-742", "table-1401-84", "table-0088-617", "table-1328-643", "table-1248-265", "table-0714-555", "table-1521-844", "table-0953-848", "table-1481-165", "table-0465-819", "table-0387-247", "table-1622-301", "table-0282-829", "table-1385-349", "table-1263-952", "table-1111-891", "table-0893-974", "table-1303-634", "table-0903-271", "table-0606-762", "table-1085-69", "table-0940-584", "table-1642-928", "table-0333-294", "table-1604-506", "table-0847-679", "table-1584-709", "table-1387-950", "table-0618-212", "table-0157-310", "table-1610-425", "table-1257-568", "table-0657-477", "table-0936-445", "table-0928-698", "table-1457-696", "table-1489-103", "table-1182-648", "table-1301-997", "table-0245-346", "table-0429-395", "table-1300-502", "table-1109-293", "table-1145-242", "table-0192-482", "table-1471-217", "table-0754-905", "table-0160-278", "table-1395-225", "table-0808-263", "table-1479-345", 
"table-0698-838", "table-1528-37", "table-1134-641", "table-0459-966", "table-0080-598", "table-0571-537", "table-0714-551", "table-0187-287", "table-0953-810", "table-0469-377", "table-0766-951", "table-1439-27", "table-0098-810", "table-1117-710", "table-0081-246", "table-0684-630", "table-1278-729", "table-0194-713", "table-0524-711", "table-0998-876", "table-0506-467", "table-0697-297", "table-1251-512", "table-0968-741", "table-1123-216", "table-0554-253", "table-0739-600"]
--------------------------------------------------------------------------------

/Data/column_id_validation.json:
--------------------------------------------------------------------------------
1 | ["table-0780-126", "table-0651-788", "table-1512-363", "table-0380-140", "table-1621-515", "table-0759-437", "table-0140-529", "table-0419-910", "table-0103-683", "table-0097-301", "table-0157-938", "table-0131-306", "table-1265-217", "table-0039-568", "table-1336-624", "table-1366-247", "table-0250-497", "table-1439-74", "table-1256-535", "table-0106-505", "table-0680-339", "table-1399-466", "table-1019-916", "table-0449-208", "table-0897-560", "table-0158-678", "table-1003-947", "table-1269-517", "table-0298-386", "table-1507-386", "table-1006-423", "table-0897-557", "table-0757-351", "table-1037-775", "table-1078-108", "table-0453-346", "table-0214-19", "table-0431-417", "table-0557-53", "table-1080-219", "table-1001-880", "table-0275-957", "table-0755-228", "table-0685-72", "table-1419-673", "table-1185-10", "table-0814-96", "table-0111-452", "table-1131-376", "table-0940-752", "table-1171-583", "table-0809-372", "table-0535-891", "table-0742-782", "table-1420-213", "table-0271-611", "table-0718-895", "table-1461-975", "table-1004-980", "table-0851-656", "table-1354-747", "table-0285-848", "table-1407-110", "table-0124-337", "table-0742-448", "table-0138-767", "table-0156-583", "table-0156-627", "table-0204-193", "table-0832-190", "table-0175-5", "table-0754-423", "table-0858-201",
"table-0755-766", "table-1191-461", "table-0303-398", "table-0545-552", "table-0336-736", "table-1582-551", "table-1120-214", "table-0789-60", "table-1107-281", "table-0514-648", "table-1243-504", "table-0041-694", "table-0740-507", "table-1343-451", "table-0776-77", "table-0543-360", "table-0166-642", "table-0854-67", "table-0437-953", "table-1004-754", "table-1121-205", "table-0614-436", "table-0179-123", "table-0106-282", "table-0158-176", "table-1375-275", "table-0171-227", "table-0454-841", "table-0713-242", "table-1567-764", "table-0486-222", "table-1095-319", "table-1230-4", "table-0951-177", "table-0537-408", "table-0482-461", "table-0587-119", "table-0138-26", "table-1402-303", "table-0461-422", "table-1284-383", "table-1406-881", "table-0429-406", "table-0106-127", "table-0640-318", "table-0386-192", "table-1489-497", "table-0160-700", "table-1011-412", "table-0877-225", "table-1630-554", "table-1531-11", "table-0460-805", "table-1054-808", "table-1415-891", "table-0652-762", "table-1314-367", "table-1114-115", "table-0654-136", "table-0102-751", "table-0947-219", "table-0316-169", "table-0300-835", "table-0282-587", "table-0768-134", "table-0932-745", "table-1152-544", "table-1493-116", "table-0529-559", "table-0153-692", "table-0130-665", "table-0566-981", "table-0335-567", "table-1559-140", "table-0868-150", "table-0546-808", "table-1460-242", "table-1359-298", "table-0105-888", "table-1387-392", "table-1378-661", "table-1527-187", "table-1515-978", "table-0360-699", "table-0089-147", "table-1226-618", "table-0471-192", "table-0452-395", "table-1277-128", "table-0745-368", "table-1395-433", "table-0438-517", "table-0545-170", "table-0158-615", "table-0230-349", "table-0360-748", "table-0922-184", "table-1572-510", "table-1074-980", "table-0981-889", "table-1234-208", "table-0178-504", "table-1390-704", "table-0158-509", "table-1482-884", "table-0028-481", "table-1597-166", "table-0911-887", "table-0050-52", "table-0306-271", "table-1130-636", 
"table-1421-873", "table-0545-299", "table-0725-299", "table-1560-454", "table-0978-109", "table-0998-122", "table-0761-540", "table-1270-176", "table-0173-42", "table-1236-152", "table-0160-17", "table-1125-621", "table-0315-751", "table-0156-810", "table-0133-688", "table-0865-328", "table-1519-914", "table-1254-780", "table-0860-447", "table-0092-603", "table-0207-24", "table-1286-617", "table-0904-12", "table-0072-586", "table-0020-774", "table-1202-466", "table-1145-511", "table-0178-914", "table-0653-379", "table-0298-781", "table-0592-945", "table-0010-183", "table-0814-391", "table-0184-977", "table-0325-693", "table-1533-471", "table-1105-938", "table-1359-564", "table-0446-885", "table-0848-348", "table-1412-313", "table-0298-463", "table-0112-374", "table-1106-19", "table-1153-823", "table-0903-66", "table-1557-287", "table-0091-283", "table-1497-498", "table-0107-927", "table-0780-11", "table-0164-554", "table-1500-494", "table-0646-340", "table-0953-578", "table-0299-716", "table-0083-118", "table-0889-363", "table-0569-999", "table-0853-510", "table-1094-729", "table-1552-789", "table-0546-618", "table-1066-693", "table-0901-306", "table-0301-212", "table-0228-732", "table-1139-322", "table-0784-316", "table-1009-1000", "table-0713-366", "table-0086-869", "table-0127-715", "table-0300-677", "table-0928-872", "table-0157-232", "table-0419-665", "table-0523-460", "table-0830-294", "table-0001-94", "table-0786-565", "table-1647-686", "table-0585-962", "table-1641-474", "table-0022-880", "table-1607-67", "table-1423-661", "table-0740-757", "table-0442-970", "table-0044-929", "table-0852-992", "table-0157-723", "table-0177-715", "table-1099-220", "table-0294-717", "table-0055-870", "table-0545-186", "table-1189-418", "table-1408-772", "table-1185-266", "table-0916-865", "table-0640-810", "table-0610-416", "table-0544-232", "table-0326-617", "table-0430-445", "table-0543-925", "table-0669-70", "table-0158-781", "table-0805-785", "table-0653-390", 
"table-0572-231", "table-0112-10", "table-0924-73", "table-0797-626", "table-0723-912", "table-1601-873", "table-1016-73", "table-0532-638", "table-1002-863", "table-0466-385", "table-0261-415", "table-1508-42", "table-0661-606", "table-1446-163", "table-0845-448", "table-0164-852", "table-0198-793", "table-0327-696", "table-1167-981", "table-0733-924", "table-0527-194", "table-1404-551", "table-0920-75", "table-0221-186", "table-0156-939", "table-0303-48", "table-0157-105", "table-0061-16", "table-1052-941", "table-0086-843", "table-0643-223", "table-1295-618", "table-0199-727", "table-0775-954", "table-1604-107", "table-0849-718", "table-0090-330", "table-0630-225", "table-0485-44", "table-0598-172", "table-1474-657", "table-0457-828", "table-1156-782", "table-0888-437", "table-0169-585", "table-1648-738", "table-0326-576", "table-0080-184", "table-0690-777", "table-0822-324", "table-0934-479", "table-1338-777", "table-0151-194", "table-0726-868", "table-0355-631", "table-0341-83", "table-0083-499", "table-0158-727", "table-1372-493", "table-0037-612", "table-0964-141", "table-1127-8", "table-0254-23", "table-1620-940", "table-0663-13", "table-0615-709", "table-0178-435", "table-0671-908", "table-0698-527", "table-1412-924", "table-0024-321", "table-0919-569", "table-0086-736", "table-0990-598", "table-1112-148", "table-0139-536", "table-0322-998", "table-1344-930", "table-0671-159", "table-0003-255", "table-0974-117", "table-1004-594", "table-0139-502", "table-0244-998", "table-1484-562", "table-0188-162", "table-0665-367", "table-0382-734", "table-0556-43", "table-0079-893", "table-1512-763", "table-1466-326", "table-0929-993", "table-0282-852", "table-0856-996", "table-0748-502", "table-1640-589", "table-1000-296", "table-0230-571", "table-0353-463", "table-1119-782", "table-0299-88", "table-1149-30", "table-1553-804", "table-0177-427", "table-1408-976", "table-0739-254", "table-1561-543", "table-0945-381", "table-1135-584", "table-1117-726", "table-1592-89", 
"table-0887-117", "table-1462-133", "table-0556-310", "table-0920-626", "table-0321-592", "table-0536-850", "table-1578-106", "table-0459-454", "table-1622-865", "table-0509-868", "table-0298-310", "table-0196-308", "table-1315-927", "table-1367-585", "table-1174-687", "table-1223-559", "table-0730-200", "table-1605-652", "table-0951-178", "table-0250-314", "table-1455-521", "table-0876-948", "table-1031-188", "table-0160-875", "table-0162-102", "table-0158-186", "table-0829-921", "table-0795-712", "table-0293-375", "table-0694-587", "table-0683-156", "table-0892-750", "table-1587-810", "table-0318-197", "table-1438-951", "table-0212-391", "table-1004-459", "table-0788-462", "table-1290-726", "table-0246-893", "table-0070-13", "table-0845-929", "table-0379-936", "table-0297-544", "table-0505-213", "table-0852-928", "table-1410-604", "table-1093-140", "table-1240-53", "table-0842-968", "table-0205-251", "table-1142-330", "table-1122-751", "table-0444-244", "table-0097-763", "table-1106-195", "table-1152-44", "table-0565-424", "table-1427-35", "table-0734-583", "table-0849-800", "table-0725-548", "table-0106-938", "table-0361-64", "table-0396-966", "table-0821-414", "table-0821-139", "table-0199-859", "table-0758-664", "table-1403-486", "table-0837-120", "table-1115-221", "table-1160-54", "table-0957-444", "table-0395-345", "table-1340-800", "table-0942-786", "table-0293-490", "table-1016-860", "table-0739-90", "table-0947-363", "table-0920-209", "table-0024-178", "table-0545-269", "table-0416-723", "table-0649-326", "table-0659-406", "table-1644-95", "table-0695-628", "table-1276-562", "table-0297-529", "table-0137-76", "table-1624-111", "table-1485-117", "table-0091-793", "table-1483-875", "table-1040-396", "table-0595-895", "table-1056-49", "table-1383-556", "table-0181-447", "table-1388-413", "table-1183-934", "table-1567-736", "table-0280-419", "table-0235-899", "table-0526-870", "table-1419-808", "table-0238-485", "table-0462-919", "table-0742-791", 
"table-0290-497", "table-0706-846", "table-0413-932", "table-1034-175", "table-1199-77", "table-0084-951", "table-1639-743", "table-0166-677", "table-0066-885", "table-0891-284", "table-1245-62", "table-1220-432", "table-0589-908", "table-1218-741", "table-1385-276", "table-1635-932", "table-1094-608", "table-0930-995", "table-0157-948", "table-1634-841", "table-1010-440", "table-0512-613", "table-1507-927", "table-0702-406", "table-0952-342", "table-0114-169", "table-0157-227", "table-0980-279", "table-0512-813", "table-0476-693", "table-0196-220", "table-1029-558", "table-1406-956", "table-1219-496", "table-1151-77", "table-1506-926", "table-1446-873", "table-0902-868", "table-0415-275", "table-0877-831", "table-1263-361", "table-0772-166", "table-1313-194", "table-0305-435", "table-1315-671", "table-1369-496", "table-0371-817", "table-0205-850", "table-0139-564", "table-0809-111", "table-0184-45", "table-0176-987", "table-0250-349", "table-0080-314", "table-1116-365", "table-0080-923", "table-0889-591", "table-0460-105", "table-1409-584", "table-1152-644", "table-0696-162", "table-0876-504", "table-0862-391", "table-0323-310", "table-0972-969", "table-0397-906", "table-0863-770", "table-0827-682", "table-0715-746", "table-1426-881", "table-1078-941", "table-1521-25", "table-0240-653", "table-1607-553", "table-1258-153", "table-0090-243", "table-1625-196", "table-1249-275", "table-0392-783", "table-1171-560", "table-0933-859", "table-0424-143", "table-0546-351", "table-0091-689", "table-0891-760", "table-0303-414", "table-0824-564", "table-0194-85", "table-0208-902", "table-0555-13", "table-1332-774", "table-0925-51", "table-0348-741", "table-1472-206", "table-1037-577", "table-0565-427", "table-0823-246", "table-1339-620", "table-0433-815", "table-0034-339", "table-0884-584", "table-1012-147", "table-1160-475", "table-0167-497", "table-0084-365", "table-0833-222", "table-1329-566", "table-0257-292", "table-1121-510", "table-0931-291", "table-0047-344", 
"table-1398-960", "table-0514-311", "table-0353-424", "table-1538-791", "table-1073-864", "table-1562-448", "table-0143-62", "table-0845-984", "table-0124-526", "table-1280-878", "table-0733-932", "table-1072-213", "table-1082-378", "table-1458-523", "table-0212-416", "table-1158-886", "table-1103-25", "table-0393-63", "table-0294-199", "table-1351-530", "table-0157-537", "table-1486-363", "table-0417-129", "table-0091-632", "table-0520-248", "table-0396-532", "table-0768-820", "table-1284-727", "table-0168-552", "table-1277-735", "table-0315-876", "table-0715-962", "table-0494-110", "table-1154-65", "table-1612-517", "table-0812-221", "table-0791-567", "table-0736-95", "table-0652-548", "table-1152-746", "table-1055-131", "table-0078-212", "table-0316-793", "table-0196-362", "table-0577-775", "table-1247-482", "table-0527-183", "table-1132-268", "table-0808-430", "table-0066-998", "table-1319-387", "table-0767-368", "table-1517-546", "table-0398-577", "table-0103-708", "table-1170-508", "table-0830-31", "table-1436-590", "table-1496-395", "table-1120-206", "table-1393-839", "table-0250-29", "table-0922-106", "table-1053-752", "table-0519-597", "table-0804-835", "table-0595-509", "table-0499-17", "table-0741-204", "table-0511-6", "table-0285-151", "table-0156-873", "table-0026-480", "table-0647-368", "table-0442-940", "table-0442-321", "table-0298-494", "table-1006-260", "table-0647-933", "table-0012-731", "table-1446-503", "table-0301-522", "table-1257-355", "table-0432-771", "table-1197-35", "table-1623-890", "table-1095-722", "table-0894-448", "table-1086-10", "table-1190-762", "table-0590-594", "table-0776-202", "table-0352-399", "table-0836-951", "table-0473-992", "table-0347-506", "table-0004-422", "table-1580-158", "table-0631-814", "table-1004-954", "table-1074-978", "table-1241-735", "table-1385-5", "table-0116-832", "table-0920-83", "table-0588-28", "table-0827-46", "table-0933-43", "table-0074-741", "table-1132-113", "table-0898-408", "table-1249-489", 
"table-0199-91", "table-0725-138", "table-1170-916", "table-0305-919", "table-1207-944", "table-0264-376", "table-0158-892", "table-0298-990", "table-1117-734", "table-0077-893", "table-0211-479", "table-1025-892", "table-0368-106", "table-1140-575", "table-1136-947", "table-0895-374", "table-1098-962", "table-1118-407", "table-0784-109", "table-0391-574", "table-0189-155", "table-1054-59", "table-1201-261", "table-0579-160", "table-0774-185", "table-1280-939", "table-0780-36", "table-0006-75", "table-0688-453", "table-0465-448", "table-0871-511", "table-0335-836", "table-0507-113", "table-1247-283", "table-1061-849", "table-0544-643", "table-1229-551", "table-1011-52", "table-0887-126", "table-0862-191", "table-1593-828", "table-1454-744", "table-1019-70", "table-1316-143", "table-0080-557", "table-1069-962", "table-0226-392", "table-1419-866", "table-0726-464", "table-0559-579", "table-0128-309", "table-1295-454", "table-0544-551", "table-0135-839", "table-0057-50", "table-0545-192", "table-0396-958", "table-1022-355", "table-0652-400", "table-0669-555", "table-0039-205", "table-1424-395", "table-0086-837", "table-1639-752", "table-0268-800", "table-0627-231", "table-1490-201", "table-1247-483", "table-0289-361", "table-0736-264", "table-1315-25", "table-0739-599", "table-0598-261", "table-0512-912", "table-0549-64", "table-1039-286", "table-0907-5", "table-1450-290", "table-0707-929", "table-0192-505", "table-0516-421", "table-1007-866", "table-0243-91", "table-1351-725", "table-0922-166", "table-0893-376", "table-0386-619", "table-0033-100", "table-0291-144", "table-0158-757", "table-0834-780", "table-1083-470", "table-0937-563", "table-1044-357", "table-1431-77", "table-1143-286", "table-0167-221", "table-0302-503", "table-1417-672", "table-0242-507", "table-1606-736", "table-1651-678", "table-0393-35", "table-0353-483", "table-0706-72", "table-0697-843", "table-1526-431", "table-1571-297", "table-1606-676", "table-0374-533", "table-0546-294", 
"table-0726-651", "table-0522-171", "table-1201-825", "table-0594-504", "table-0157-143", "table-0372-872", "table-0057-448", "table-0665-703", "table-1591-743", "table-1104-934", "table-0178-570", "table-0840-645", "table-0117-31", "table-1527-432", "table-1345-140", "table-0649-655", "table-0797-648", "table-0430-298", "table-1377-18", "table-0753-111", "table-1168-798", "table-0001-672", "table-1161-402", "table-1027-443", "table-0299-464", "table-1017-988", "table-0158-259", "table-0239-689", "table-0120-59", "table-1535-327", "table-0960-924", "table-1149-920", "table-0840-838", "table-0463-513", "table-0130-436", "table-0338-330", "table-1507-740", "table-1358-462", "table-0512-910", "table-1348-986", "table-0281-962", "table-0198-794", "table-1126-649", "table-0315-879", "table-0731-219", "table-1568-166", "table-0595-503", "table-1520-881", "table-0007-582", "table-0028-939", "table-0336-740", "table-0584-873", "table-0178-345", "table-0684-691", "table-1200-95", "table-0602-935", "table-0167-236", "table-0172-485", "table-1625-942", "table-1143-329", "table-0192-541", "table-0299-787", "table-0838-774", "table-0158-777", "table-0104-474", "table-0192-590", "table-0767-92", "table-0139-890", "table-1517-864", "table-1314-508", "table-1473-953", "table-0475-206", "table-1240-512", "table-0157-464", "table-1054-270", "table-0918-136", "table-0390-32", "table-0181-424", "table-0093-403", "table-0657-696", "table-1057-181", "table-1385-656", "table-1289-295", "table-0395-335", "table-0893-378", "table-0673-373", "table-1584-325", "table-0460-719", "table-0830-312", "table-1349-365", "table-1491-525", "table-0514-610", "table-0327-708", "table-0761-876", "table-0398-176", "table-0157-879", "table-0067-244", "table-1569-797", "table-1356-842", "table-1585-727", "table-0916-855", "table-1286-425", "table-1435-238", "table-0089-87", "table-0067-191", "table-1454-53", "table-0681-193", "table-0245-887", "table-0524-1", "table-1065-985", "table-0653-372", 
"table-0464-351", "table-0922-185", "table-0147-693", "table-0232-508", "table-1068-738", "table-0738-664", "table-1481-285", "table-0124-598", "table-0533-465", "table-1578-195", "table-0080-814", "table-0293-915", "table-1533-507", "table-0595-433", "table-1616-111", "table-1624-484", "table-0074-278", "table-0274-328", "table-1333-798", "table-0780-973", "table-1134-717", "table-0296-468", "table-0855-106", "table-1096-631", "table-0709-957", "table-1257-538", "table-1356-848", "table-1416-106", "table-1149-179", "table-0189-833", "table-0800-845", "table-1366-137", "table-0279-602", "table-0047-693", "table-0553-407", "table-1133-252", "table-0735-744"] -------------------------------------------------------------------------------- /Data/row_id_test.json: -------------------------------------------------------------------------------- 1 | ["table-0178-605", "table-0657-753", "table-0234-746", "table-0333-593", "table-0838-771", "table-0473-843", "table-0257-782", "table-1593-334", "table-0948-769", "table-0551-37", "table-0234-661", "table-1351-766", "table-0315-991", "table-1402-718", "table-1096-528", "table-0613-943", "table-1021-176", "table-0646-291", "table-0004-290", "table-0881-44", "table-1625-237", "table-0934-284", "table-0630-353", "table-0304-388", "table-0506-560", "table-0863-948", "table-0382-196", "table-1406-159", "table-0315-882", "table-1033-838", "table-0481-169", "table-1423-66", "table-1598-579", "table-0867-260", "table-0733-922", "table-0424-921", "table-0177-609", "table-1295-750", "table-0239-377", "table-0757-834", "table-0884-988", "table-0682-13", "table-1393-902", "table-1191-630", "table-0568-414", "table-0024-78", "table-1321-59", "table-1218-741", "table-1506-289", "table-0512-813", "table-0781-270", "table-1002-443", "table-1404-563", "table-0877-43", "table-1036-161", "table-0547-433", "table-1208-1000", "table-1647-618", "table-0427-177", "table-0303-643", "table-0914-912", "table-0261-12", "table-0664-963", 
"table-1251-693", "table-0184-312", "table-1022-388", "table-0931-833", "table-0733-930", "table-1169-554", "table-0301-617", "table-0253-14", "table-0309-396", "table-0977-879", "table-1111-637", "table-1372-692", "table-0514-217", "table-0604-821", "table-0427-484", "table-0677-106", "table-1640-125", "table-1614-750", "table-0518-248", "table-1633-278", "table-0156-733", "table-0528-34", "table-0640-77", "table-0086-763", "table-0189-158", "table-0662-560", "table-1000-421", "table-0157-255", "table-1567-506", "table-0237-562", "table-1395-410", "table-0904-679", "table-1389-607", "table-0278-409", "table-0325-451", "table-0445-493", "table-0180-43", "table-0938-552", "table-1130-97", "table-0423-209", "table-1151-329", "table-0022-544", "table-0670-46", "table-0020-610", "table-0458-526", "table-0899-415", "table-0017-751", "table-1021-800", "table-0428-884", "table-1014-467", "table-0173-808", "table-0464-559", "table-0013-297", "table-0604-200", "table-1097-996", "table-0565-401", "table-1395-785", "table-1178-237", "table-0712-105", "table-1487-944", "table-1295-852", "table-1331-404", "table-1095-128", "table-0192-604", "table-1456-272", "table-1264-188", "table-0495-172", "table-1491-497", "table-0530-159", "table-1274-122", "table-0421-327", "table-0671-158", "table-0318-593", "table-0648-368", "table-1085-317", "table-0108-684", "table-1619-623", "table-1136-214", "table-0298-226", "table-0050-612", "table-0917-132", "table-1464-271", "table-0969-955", "table-1410-910", "table-1272-602", "table-0062-783", "table-0218-653", "table-1393-16", "table-0086-841", "table-1478-910", "table-1548-56", "table-0735-553", "table-0579-649", "table-1311-300", "table-0140-394", "table-1414-517", "table-1158-727", "table-1099-52", "table-0394-130", "table-0616-845", "table-0025-327", "table-1462-140", "table-1630-271", "table-0488-85", "table-0565-425", "table-0228-729", "table-0212-391", "table-1234-862", "table-1021-274", "table-0617-387", "table-1077-683", 
"table-1450-156", "table-1510-690", "table-0632-264", "table-0001-62", "table-0451-325", "table-0430-34", "table-0503-626", "table-0426-452", "table-0645-917", "table-0981-847", "table-1571-775", "table-0626-727", "table-0515-388", "table-1081-584", "table-1296-931", "table-0039-205", "table-0676-562", "table-1258-448", "table-1099-916", "table-0357-413", "table-0156-996", "table-0548-80", "table-0904-177", "table-0301-509", "table-1445-356", "table-1137-576", "table-0656-689", "table-1550-214", "table-0206-405", "table-1371-597", "table-0802-384", "table-1158-502", "table-0097-500", "table-0461-330", "table-0299-176", "table-0546-149", "table-1078-442", "table-1069-631", "table-0814-458", "table-0425-130", "table-1652-581", "table-0067-382", "table-1547-613", "table-0013-742", "table-1387-390", "table-1045-415", "table-0136-110", "table-0281-763", "table-0028-208", "table-0392-830", "table-0838-898", "table-0013-243", "table-0285-858", "table-0985-368", "table-0198-477", "table-1088-873", "table-0616-69", "table-0796-285", "table-0837-906", "table-1019-916", "table-1109-355", "table-0552-47", "table-1446-163", "table-0285-861", "table-1242-827", "table-0890-571", "table-0135-851", "table-1312-209", "table-0636-949", "table-0889-918", "table-0542-489", "table-0022-945", "table-0566-53", "table-1178-845", "table-0336-724", "table-0456-76", "table-1281-794", "table-0320-893", "table-0072-586", "table-0792-913", "table-1558-703", "table-0803-991", "table-0943-506", "table-1048-176", "table-0007-188", "table-0990-603", "table-1347-453", "table-0184-236", "table-1274-977", "table-0379-729", "table-1546-760", "table-1053-827", "table-0901-306", "table-1026-66", "table-1457-184", "table-1426-122", "table-1558-309", "table-1628-884", "table-1526-787", "table-0329-380", "table-0986-150", "table-1082-177", "table-1211-540", "table-0111-667", "table-1545-94", "table-0744-615", "table-0564-640", "table-0719-959", "table-1449-109", "table-1625-464", "table-0154-542", 
"table-0468-565", "table-0595-433", "table-1435-238", "table-1002-491", "table-1491-803", "table-0822-298", "table-0454-230", "table-0455-284", "table-0805-207", "table-0010-164", "table-0710-841", "table-1314-69", "table-1008-842", "table-0170-676", "table-0450-446", "table-1051-730", "table-1484-682", "table-1158-885", "table-1149-942", "table-0225-319", "table-1373-465", "table-1127-865", "table-1087-858", "table-1593-721", "table-0696-67", "table-1256-725", "table-0156-927", "table-1111-875", "table-1463-533", "table-1543-316", "table-0097-899", "table-0457-631", "table-1002-561", "table-0742-251", "table-0933-302", "table-0837-967", "table-0534-404", "table-0213-836", "table-0346-92", "table-0094-859", "table-0044-155", "table-0086-857", "table-1563-509", "table-1581-494", "table-1579-493", "table-1128-924", "table-0924-469", "table-1276-650", "table-0222-957", "table-0209-755", "table-1011-769", "table-1239-257", "table-1226-320", "table-0721-992", "table-1438-253", "table-0167-234", "table-1105-180", "table-0505-484", "table-1426-284", "table-0201-962", "table-1450-210", "table-1134-644", "table-1638-144", "table-0127-24", "table-0192-602", "table-0395-56", "table-0658-379", "table-0532-206", "table-1510-341", "table-0892-744", "table-0357-852", "table-1347-756", "table-0144-612", "table-1581-451", "table-0300-802", "table-0737-582", "table-1035-913", "table-1363-990", "table-0199-727", "table-1322-178", "table-0067-14", "table-0861-199", "table-0193-164", "table-1017-573", "table-0499-663", "table-0302-543", "table-0557-864", "table-1359-535", "table-0633-968", "table-0114-90", "table-0276-185", "table-1025-671", "table-0549-69", "table-1079-600", "table-1196-348", "table-0080-692", "table-1071-754", "table-0744-1", "table-1167-980", "table-0278-207", "table-1412-889", "table-0873-459", "table-1129-753", "table-0685-976", "table-1056-861", "table-0081-271", "table-0080-601", "table-0487-475", "table-0790-144", "table-0107-73", "table-1130-939", 
"table-0503-480", "table-0871-805", "table-0167-182", "table-1283-704", "table-0157-304", "table-0998-621", "table-1572-501", "table-0884-987", "table-0226-422", "table-0548-994", "table-0542-51", "table-1012-272", "table-1119-910", "table-1356-474", "table-0635-346", "table-0243-715", "table-1450-354", "table-0873-298", "table-0547-358", "table-0869-353", "table-0631-832", "table-1132-319", "table-1414-463", "table-0643-914", "table-0733-923", "table-0222-797", "table-0247-21", "table-1580-158", "table-1630-545", "table-0359-559", "table-0931-361", "table-0916-472", "table-1606-575", "table-1194-564", "table-0365-362", "table-0614-406", "table-0498-284", "table-1607-553", "table-1080-366", "table-0150-619", "table-0212-810", "table-1256-741", "table-0545-527", "table-0280-844", "table-0365-190", "table-1328-643", "table-0211-336", "table-0743-871", "table-0306-699", "table-0975-82", "table-1336-350", "table-0627-226", "table-1485-820", "table-1203-820", "table-1652-273", "table-0447-150", "table-0597-812", "table-1098-923", "table-1103-30", "table-0159-595", "table-0683-574", "table-1143-285", "table-1270-757", "table-0949-497", "table-1356-311", "table-0275-791", "table-0498-629", "table-1517-815", "table-1379-85", "table-0065-830", "table-1242-703", "table-1274-124", "table-1088-159", "table-1094-978", "table-0923-59", "table-0390-946", "table-1061-668", "table-0684-337", "table-0113-926", "table-0322-944", "table-1651-489", "table-1000-665", "table-0856-207", "table-0180-385", "table-0963-490", "table-0710-821", "table-0251-794", "table-0088-615", "table-0286-411", "table-0627-627", "table-1051-645", "table-0424-143", "table-1193-912", "table-0921-128", "table-1216-491", "table-1302-612", "table-0979-600", "table-0509-806", "table-0141-116", "table-1391-617", "table-0187-23", "table-0835-329", "table-1002-509", "table-1597-61", "table-0891-889", "table-1025-783", "table-1410-648", "table-0801-167", "table-1479-899", "table-0616-939", "table-0081-812", 
"table-0560-334", "table-1052-262", "table-1526-471", "table-0287-90", "table-0841-650", "table-0049-860", "table-0400-601", "table-1106-385", "table-0613-197", "table-0559-164", "table-0171-874", "table-1002-822", "table-1099-912", "table-1292-366", "table-0452-844", "table-0104-474", "table-0083-129", "table-0309-240", "table-1422-603", "table-0396-966", "table-0107-183", "table-0963-888", "table-0876-3", "table-0730-631", "table-1160-312", "table-0374-983", "table-0639-777", "table-0804-813", "table-1205-338", "table-0422-909", "table-0862-125", "table-0104-284", "table-0158-675", "table-0030-140", "table-1425-220", "table-0338-340", "table-1183-730", "table-1331-670", "table-0723-898", "table-1277-106", "table-0103-874", "table-0547-766", "table-1613-383", "table-0748-934", "table-1154-344", "table-0129-72", "table-0943-653", "table-0415-281", "table-0240-378", "table-0290-217", "table-1139-523", "table-1286-383", "table-0745-463", "table-0060-584", "table-0731-353", "table-0653-785", "table-0687-500", "table-0245-887", "table-1119-892", "table-0488-753", "table-1546-994", "table-0786-559", "table-0762-325", "table-1056-50", "table-0416-81", "table-0336-738", "table-0195-550", "table-1639-410", "table-0124-263", "table-1528-996", "table-0962-663", "table-0232-472", "table-0841-170", "table-0407-876", "table-0591-559", "table-1111-757", "table-1268-806", "table-1468-983", "table-1325-471", "table-1174-769", "table-0982-633", "table-0885-868", "table-1315-671", "table-0836-340", "table-0157-5", "table-0091-622", "table-0190-29", "table-0080-925", "table-0573-257", "table-0465-672", "table-0531-340", "table-1021-230", "table-1634-614", "table-1014-590", "table-0747-997", "table-0621-730", "table-1428-781", "table-0107-376", "table-0723-235", "table-0409-851", "table-0515-89", "table-0649-335", "table-0678-389", "table-0272-22", "table-0089-933", "table-1203-139", "table-0004-422", "table-1064-287", "table-0123-995", "table-1171-560", "table-0938-130", 
"table-0544-232", "table-0508-70", "table-1056-15", "table-0637-71", "table-0290-165", "table-0546-351", "table-0146-369", "table-0491-367", "table-0485-298", "table-0581-929", "table-0913-845", "table-0091-581", "table-0708-747", "table-0544-225", "table-0070-917", "table-0408-497", "table-0086-744", "table-1348-709", "table-0127-715", "table-1600-974", "table-1127-316", "table-0587-906", "table-0836-753", "table-0107-289", "table-0707-330", "table-1095-722", "table-1055-395", "table-0277-477", "table-0590-94", "table-0923-976", "table-0412-318", "table-1132-91", "table-1482-719", "table-1252-698", "table-0578-200", "table-1564-386", "table-0100-58", "table-0520-605", "table-0177-856", "table-0482-133", "table-0573-39", "table-0554-974", "table-0664-367", "table-1574-996", "table-1295-251", "table-0156-881", "table-1612-526", "table-1521-846", "table-1267-168", "table-0778-515", "table-0152-584", "table-0792-731", "table-1301-217", "table-0870-167", "table-0054-435", "table-0406-280", "table-0526-132", "table-0165-661", "table-1337-202", "table-0088-931", "table-0991-18", "table-0320-793", "table-1097-402", "table-1024-280", "table-0974-129", "table-0651-189", "table-1231-805", "table-0105-26", "table-0926-60", "table-1070-44", "table-0582-452", "table-0744-158", "table-0145-91", "table-1259-812", "table-0230-970", "table-1256-220", "table-0419-32", "table-0927-150", "table-0387-776", "table-0210-400", "table-0258-806", "table-1143-722", "table-1438-347", "table-0144-354", "table-1004-517", "table-0275-612", "table-0755-841", "table-1128-836", "table-1470-511", "table-1211-546", "table-0299-793", "table-0725-548", "table-0157-228", "table-0250-43", "table-1199-371", "table-0593-83", "table-0149-678", "table-0181-617", "table-1466-582", "table-1083-528", "table-1648-159", "table-0156-824", "table-0520-798", "table-1093-140", "table-0019-369", "table-1291-616", "table-0410-781", "table-1129-381", "table-0232-611", "table-0715-962", "table-0831-848", "table-1132-77", 
"table-0247-467", "table-1379-338", "table-1007-334", "table-0267-302", "table-1592-89", "table-0798-376", "table-0599-282", "table-0702-840", "table-0542-919", "table-0328-856", "table-0068-1000", "table-0838-766", "table-0755-920", "table-1269-517", "table-1508-760", "table-1579-196", "table-1267-388", "table-1520-158", "table-0838-762", "table-0793-990", "table-0952-418", "table-1483-904", "table-0902-868", "table-0331-56", "table-0034-336", "table-1635-254", "table-0956-147", "table-0185-213", "table-0104-428", "table-0270-476", "table-0929-424", "table-1357-259", "table-0633-936", "table-1127-206", "table-0518-34", "table-0531-599", "table-0460-114", "table-1161-796", "table-1209-35", "table-1282-537", "table-0534-24", "table-0158-359", "table-0934-747", "table-1128-844", "table-0838-895", "table-1598-373", "table-0155-586", "table-0423-395", "table-0425-710", "table-0364-856", "table-0331-693", "table-1503-954", "table-0603-523", "table-1558-817", "table-1580-309", "table-0091-743", "table-0184-214", "table-1492-263", "table-0655-143", "table-1382-579", "table-0697-525", "table-1006-747", "table-0272-105", "table-0646-42", "table-1166-586", "table-0221-724", "table-0425-363", "table-0545-377", "table-1633-755", "table-0116-450", "table-0388-488", "table-0725-152", "table-0897-168", "table-1277-949", "table-0031-844", "table-0802-414", "table-0795-887", "table-0240-834", "table-0529-576", "table-1039-236", "table-0288-701", "table-0436-676", "table-0617-758", "table-1301-997", "table-1361-220", "table-0529-491", "table-0533-466", "table-0316-186", "table-0972-286", "table-1006-423", "table-1170-293", "table-0424-654", "table-0422-319", "table-1492-592", "table-1028-555", "table-0182-632", "table-1509-826", "table-0323-310", "table-0559-161", "table-0059-655", "table-0876-946", "table-0544-220", "table-0172-492", "table-0144-369", "table-1618-227", "table-0544-474", "table-1024-861", "table-1384-55", "table-1103-33", "table-0080-288", "table-0169-103", 
"table-1193-888", "table-0856-307", "table-1547-279", "table-0162-94", "table-0700-161", "table-0292-864", "table-0494-854", "table-0230-888", "table-1380-438", "table-1429-593", "table-0374-662", "table-0013-298", "table-1402-21", "table-1198-953", "table-1640-712", "table-0784-89", "table-0253-241", "table-1187-246", "table-1004-783", "table-0196-365", "table-1202-455", "table-0154-131", "table-0829-392", "table-0381-248", "table-1446-745", "table-0319-749", "table-0767-92", "table-0142-977", "table-0931-936", "table-1481-410", "table-0319-587", "table-0527-123", "table-1298-679", "table-0578-249", "table-1480-124", "table-1572-391", "table-1376-618", "table-0014-59", "table-0887-836", "table-1570-953", "table-0255-492", "table-0129-90", "table-0464-506", "table-0529-432", "table-0759-435", "table-0289-697", "table-1228-656", "table-0665-654", "table-0707-806", "table-0187-844", "table-0962-475", "table-0613-613", "table-1613-704", "table-1556-144", "table-0384-771", "table-0298-450", "table-0158-430", "table-0725-278", "table-0241-654", "table-1002-781", "table-1011-947", "table-0127-667", "table-0787-299", "table-0546-938", "table-0586-304", "table-0218-646", "table-1427-392", "table-0207-90", "table-1293-616", "table-0734-348", "table-0365-647", "table-1245-973", "table-0229-219", "table-0033-243", "table-0158-231", "table-1259-507", "table-0019-307", "table-0223-667", "table-1168-134", "table-0410-854", "table-1505-480", "table-1117-739", "table-0533-40", "table-0158-228", "table-0749-370", "table-0643-236", "table-1508-399", "table-1011-552", "table-1560-832", "table-1318-418", "table-0448-535", "table-0489-696", "table-0218-641", "table-0156-721", "table-1011-707", "table-0233-297", "table-0122-687", "table-0464-281", "table-1094-356", "table-0071-901", "table-1624-144", "table-0627-247", "table-0627-79", "table-1377-18", "table-1268-611", "table-0650-817", "table-1335-47", "table-0542-61", "table-0930-142", "table-1314-23", "table-1462-297", 
"table-1344-3", "table-1282-273", "table-0573-256", "table-1253-378", "table-0181-424", "table-0310-275", "table-0220-577", "table-1029-590", "table-0729-57", "table-0714-358", "table-0794-757", "table-0094-393", "table-1126-849", "table-1383-542", "table-1233-692", "table-0505-498", "table-0770-226", "table-0742-348", "table-1572-319", "table-0459-809", "table-0692-710", "table-1084-319", "table-1627-240", "table-0157-800", "table-1085-79", "table-0922-160", "table-0005-352", "table-1044-913", "table-0163-453", "table-1156-402", "table-1026-448", "table-0767-292", "table-0922-159", "table-0091-924", "table-1584-292", "table-0521-347", "table-0904-465", "table-1526-549", "table-0798-735", "table-1221-348", "table-0634-329", "table-0061-16", "table-1491-757", "table-0589-450", "table-0364-438", "table-0740-17", "table-0311-212", "table-0109-172"] -------------------------------------------------------------------------------- /Data/row_id_validation.json: -------------------------------------------------------------------------------- 1 | ["table-0009-519", "table-0769-795", "table-1591-99", "table-1125-964", "table-0455-811", "table-1231-444", "table-0739-318", "table-1126-676", "table-0018-419", "table-1126-857", "table-1518-949", "table-0323-833", "table-1613-594", "table-1002-674", "table-1178-455", "table-1424-498", "table-1526-458", "table-0225-904", "table-1376-719", "table-0034-450", "table-0895-291", "table-1274-305", "table-0158-832", "table-1436-262", "table-1036-399", "table-1014-315", "table-0984-166", "table-0021-98", "table-0802-389", "table-0081-269", "table-0232-141", "table-0743-526", "table-1211-853", "table-0454-907", "table-1394-71", "table-0979-324", "table-1622-934", "table-0586-445", "table-0857-931", "table-0301-685", "table-0595-400", "table-0158-249", "table-0064-448", "table-1244-394", "table-0762-891", "table-1550-456", "table-0603-73", "table-0190-993", "table-1487-97", "table-1094-817", "table-1151-4", "table-0636-322", 
"table-1155-481", "table-0836-782", "table-1460-247", "table-0255-504", "table-0056-643", "table-0659-405", "table-1479-740", "table-1571-877", "table-0430-986", "table-0255-986", "table-1034-175", "table-1052-652", "table-0145-580", "table-1375-114", "table-1156-157", "table-1472-178", "table-0828-82", "table-0842-404", "table-0022-649", "table-1059-863", "table-1086-596", "table-0916-974", "table-0107-361", "table-1004-555", "table-1277-153", "table-1542-228", "table-0570-128", "table-0928-966", "table-0141-995", "table-0393-889", "table-0532-159", "table-1340-800", "table-0563-249", "table-0218-708", "table-1263-957", "table-0234-530", "table-0830-431", "table-0947-220", "table-0254-357", "table-0682-769", "table-0158-543", "table-0725-332", "table-0212-900", "table-0415-284", "table-0302-224", "table-1335-955", "table-0044-572", "table-1241-876", "table-1247-846", "table-1415-542", "table-0953-810", "table-1489-500", "table-1462-761", "table-1247-451", "table-1572-419", "table-0124-337", "table-0157-84", "table-1620-121", "table-0242-228", "table-1055-524", "table-0350-834", "table-0283-874", "table-1429-481", "table-0510-905", "table-1595-158", "table-1200-313", "table-1383-301", "table-1283-877", "table-0954-785", "table-0567-21", "table-0158-615", "table-0907-465", "table-1136-935", "table-0396-477", "table-0695-953", "table-0415-277", "table-1026-438", "table-0127-622", "table-1542-349", "table-1225-653", "table-0746-620", "table-1402-452", "table-0299-542", "table-0457-168", "table-1037-957", "table-0146-370", "table-1189-420", "table-0487-67", "table-0312-34", "table-0285-929", "table-0544-354", "table-0918-745", "table-0966-159", "table-0202-282", "table-1558-440", "table-0122-751", "table-1018-852", "table-1642-928", "table-0251-975", "table-0047-771", "table-0124-161", "table-0205-279", "table-0768-203", "table-0755-763", "table-0512-684", "table-0056-64", "table-1252-608", "table-0709-922", "table-0496-696", "table-1405-700", "table-0107-805", 
"table-0212-27", "table-0444-807", "table-1512-71", "table-0378-798", "table-0257-459", "table-0426-303", "table-0416-7", "table-0647-999", "table-1416-587", "table-1002-421", "table-0838-822", "table-0975-764", "table-0216-686", "table-0568-780", "table-1338-6", "table-0422-616", "table-1523-601", "table-0448-290", "table-0719-554", "table-0362-850", "table-0808-914", "table-0976-550", "table-1636-662", "table-1634-334", "table-0564-267", "table-1100-385", "table-0487-328", "table-0254-684", "table-0143-49", "table-0210-609", "table-0976-925", "table-0243-91", "table-0859-419", "table-0546-161", "table-1523-938", "table-0760-440", "table-0408-913", "table-0944-675", "table-0912-717", "table-0359-202", "table-0490-568", "table-1141-37", "table-1013-451", "table-0280-650", "table-0742-397", "table-0550-581", "table-0080-334", "table-1642-761", "table-0348-741", "table-1411-314", "table-0437-891", "table-0727-136", "table-0236-391", "table-0119-543", "table-1291-90", "table-1494-86", "table-1278-743", "table-0001-695", "table-0554-573", "table-0001-439", "table-0918-624", "table-1585-689", "table-0881-77", "table-0617-977", "table-0326-113", "table-0605-816", "table-0183-993", "table-0698-346", "table-1277-348", "table-1450-523", "table-0455-433", "table-1120-308", "table-0460-38", "table-0096-60", "table-0439-213", "table-0988-964", "table-1067-437", "table-1330-796", "table-0529-458", "table-0101-190", "table-0161-71", "table-0636-646", "table-0158-464", "table-0462-792", "table-0575-268", "table-0789-664", "table-0201-971", "table-1492-84", "table-0595-726", "table-0004-182", "table-0887-121", "table-0420-314", "table-0560-303", "table-0428-182", "table-0217-197", "table-1198-394", "table-0639-463", "table-0275-547", "table-1535-99", "table-0166-906", "table-0695-799", "table-0540-922", "table-1392-379", "table-0745-210", "table-0570-101", "table-0783-59", "table-1371-194", "table-0646-212", "table-0797-522", "table-0358-18", "table-0796-609", "table-1568-247", 
"table-1293-617", "table-1205-984", "table-0695-273", "table-0699-272", "table-1038-21", "table-0851-476", "table-1587-822", "table-0781-52", "table-0006-75", "table-1004-756", "table-1266-465", "table-1020-603", "table-1061-39", "table-1459-132", "table-0317-261", "table-0876-613", "table-1452-345", "table-0936-603", "table-0542-490", "table-0733-915", "table-1428-371", "table-1494-31", "table-0646-211", "table-1294-613", "table-0858-939", "table-1111-989", "table-1184-9", "table-0444-439", "table-0190-360", "table-0109-244", "table-0115-262", "table-0476-683", "table-0970-215", "table-1582-559", "table-1322-263", "table-0806-543", "table-1334-743", "table-0271-502", "table-0505-125", "table-0505-611", "table-1085-78", "table-1004-378", "table-0703-936", "table-0542-471", "table-1547-412", "table-1449-760", "table-1053-341", "table-1199-862", "table-0296-49", "table-0949-423", "table-1616-533", "table-0590-595", "table-1333-144", "table-1406-74", "table-0750-59", "table-0846-535", "table-1518-173", "table-0245-944", "table-1083-527", "table-0685-183", "table-1130-804", "table-1056-996", "table-1101-613", "table-1582-525", "table-1529-601", "table-1562-748", "table-0158-807", "table-0463-513", "table-0907-393", "table-0550-584", "table-0080-612", "table-0005-824", "table-0951-858", "table-1643-190", "table-0695-599", "table-1233-928", "table-0238-34", "table-1505-479", "table-0397-510", "table-1449-415", "table-1013-125", "table-0317-691", "table-1592-88", "table-0287-86", "table-0545-313", "table-1593-829", "table-1633-265", "table-1460-979", "table-1184-321", "table-1288-516", "table-0168-791", "table-1602-116", "table-0696-559", "table-1264-189", "table-0617-583", "table-0175-734", "table-1251-892", "table-0290-174", "table-1000-252", "table-1350-170", "table-0154-1000", "table-0086-737", "table-1013-453", "table-0158-809", "table-1336-290", "table-0645-482", "table-0010-304", "table-0888-571", "table-1620-162", "table-1431-818", "table-0157-105", 
"table-0543-819", "table-1081-552", "table-1559-135", "table-0493-73", "table-0509-255", "table-1639-781", "table-0410-219", "table-1273-682", "table-0649-331", "table-0491-883", "table-1165-975", "table-0823-900", "table-1140-971", "table-1437-216", "table-0004-181", "table-0815-234", "table-1216-536", "table-0622-268", "table-0396-895", "table-1574-687", "table-0927-481", "table-0139-881", "table-1149-158", "table-1292-766", "table-0335-947", "table-1155-493", "table-0028-943", "table-1279-458", "table-0004-531", "table-0530-704", "table-1653-510", "table-0158-72", "table-0714-564", "table-0193-69", "table-0493-135", "table-1495-701", "table-0188-138", "table-1363-693", "table-0436-492", "table-0103-595", "table-0216-801", "table-1095-509", "table-0167-465", "table-0357-38", "table-1050-132", "table-0710-248", "table-1070-132", "table-0702-762", "table-0554-587", "table-0743-185", "table-1592-240", "table-0492-903", "table-0786-391", "table-0033-100", "table-1595-162", "table-0897-813", "table-1513-721", "table-0551-412", "table-1061-971", "table-0141-825", "table-1200-851", "table-1459-450", "table-0102-343", "table-1120-211", "table-1496-791", "table-0373-777", "table-1612-134", "table-1448-743", "table-0543-973", "table-1376-147", "table-0861-159", "table-1393-913", "table-1542-818", "table-0646-96", "table-0142-506", "table-0023-444", "table-0403-345", "table-0158-505", "table-1252-25", "table-1210-875", "table-1026-167", "table-1572-38", "table-0440-255", "table-0156-736", "table-1526-497", "table-1507-230", "table-0836-46", "table-0298-353", "table-0847-793", "table-0213-759", "table-0144-609", "table-1368-83", "table-0460-928", "table-0318-553", "table-1505-887", "table-0335-225", "table-1489-484", "table-1044-436", "table-0615-134", "table-0287-919", "table-0622-846", "table-0426-319", "table-0278-279", "table-0565-428", "table-1245-62", "table-0544-431", "table-0431-581", "table-0364-494", "table-0800-843", "table-1015-642", "table-0388-448", 
"table-1120-834", "table-0335-836", "table-0086-381", "table-0170-294", "table-0515-980", "table-1005-708", "table-1276-747", "table-0830-298", "table-0497-838", "table-0121-545", "table-0158-692", "table-1142-701", "table-1315-697", "table-0183-176", "table-1633-208", "table-1589-657", "table-0143-61", "table-0281-967", "table-0289-37", "table-1039-286", "table-0993-960", "table-0056-838", "table-0919-985", "table-0854-188", "table-1371-599", "table-1569-232", "table-0240-968", "table-0749-353", "table-0298-277", "table-0793-317", "table-1570-724", "table-0154-25", "table-1129-204", "table-0227-702", "table-0925-349", "table-0232-32", "table-0549-3", "table-0652-558", "table-0984-203", "table-1381-867", "table-1077-304", "table-0451-628", "table-0085-299", "table-0484-406", "table-1572-202", "table-0766-801", "table-1120-898", "table-1416-600", "table-1187-217", "table-0027-8", "table-0704-875", "table-0963-213", "table-0407-343", "table-0176-404", "table-0079-243", "table-0270-328", "table-1004-459", "table-0529-575", "table-0699-406", "table-0360-73", "table-0386-629", "table-0152-176", "table-0162-952", "table-1101-975", "table-0088-667", "table-0742-375", "table-1278-731", "table-1559-525", "table-0621-45", "table-0671-879", "table-1539-41", "table-1056-78", "table-1556-561", "table-0184-168", "table-0067-3", "table-1507-221", "table-0732-708", "table-0755-647", "table-1446-591", "table-0987-197", "table-0746-182", "table-0885-708", "table-0139-555", "table-0437-296", "table-1115-789", "table-0725-205", "table-0175-836", "table-1547-975", "table-0839-852", "table-1483-882", "table-0901-55", "table-1360-698", "table-1630-363", "table-1539-280", "table-0653-382", "table-1287-396", "table-0059-539", "table-0166-877", "table-0556-215", "table-1243-920", "table-0201-407", "table-0480-225", "table-0105-164", "table-0321-769", "table-0441-224", "table-0022-907", "table-0229-542", "table-1069-571", "table-0299-677", "table-1471-341", "table-0831-130", 
"table-1157-866", "table-1428-375", "table-0065-500", "table-0166-605", "table-0440-299", "table-1066-332", "table-0641-928", "table-1117-725", "table-1405-71", "table-1633-741", "table-0188-284", "table-0973-112", "table-0004-311", "table-1252-530", "table-0327-706", "table-1490-82", "table-0629-353", "table-1254-874", "table-1132-808", "table-0394-37", "table-1387-241", "table-1566-492", "table-0070-746", "table-0425-904", "table-0680-376", "table-1243-613", "table-1138-711", "table-0770-570", "table-1283-217", "table-1031-340", "table-0455-255", "table-0158-740", "table-0115-877", "table-1004-415", "table-0767-733", "table-0441-754", "table-1002-678", "table-0450-62", "table-0722-558", "table-0230-964", "table-1092-183", "table-0291-637", "table-0553-583", "table-1149-179", "table-0949-813", "table-0480-463", "table-1438-226", "table-1616-965", "table-0693-433", "table-0491-247", "table-1647-789", "table-0202-374", "table-0945-380", "table-0213-718", "table-1489-488", "table-0465-504", "table-0045-807", "table-0797-574", "table-0918-767", "table-0317-257", "table-1094-865", "table-1520-550", "table-1214-395", "table-0682-386", "table-0647-933", "table-1359-971", "table-1266-886", "table-0595-401", "table-1383-478", "table-1073-863", "table-0573-263", "table-1392-209", "table-0085-694", "table-0648-4", "table-1256-726", "table-1467-457", "table-1464-272", "table-0637-61", "table-0013-608", "table-0197-315", "table-0776-682", "table-1412-668", "table-0459-443", "table-0175-807", "table-1409-585", "table-1264-903", "table-0326-603", "table-0712-984", "table-0533-41", "table-1221-159", "table-1283-112", "table-0714-565", "table-0304-152", "table-1234-861", "table-0741-937", "table-1553-804", "table-0158-526", "table-0529-907", "table-1479-351", "table-0741-723", "table-0468-405", "table-1172-160", "table-0158-839", "table-0117-751", "table-1257-62", "table-1240-388", "table-0025-323", "table-0529-768", "table-0389-617", "table-0298-109", "table-0184-843", 
"table-0980-745", "table-0315-879", "table-1546-203", "table-0203-448", "table-1192-416", "table-0766-733", "table-0320-857", "table-0359-196", "table-0529-568", "table-1404-562", "table-0482-443", "table-1104-440", "table-0317-279", "table-1558-337", "table-0388-519", "table-0157-241", "table-1190-818", "table-1356-845", "table-0404-570", "table-0042-846", "table-0198-249", "table-1002-842", "table-1446-396", "table-0156-479", "table-1332-66", "table-0916-869", "table-0464-907", "table-0058-406", "table-0682-389", "table-0500-401", "table-0080-219", "table-0461-256", "table-0065-926", "table-1167-978", "table-0595-327", "table-0541-428", "table-1489-448", "table-0874-19", "table-1467-231", "table-1289-726", "table-0025-285", "table-0221-615", "table-1594-3", "table-0945-70", "table-1637-658", "table-0979-245", "table-0714-321", "table-1398-382", "table-0066-494", "table-1313-327", "table-0144-80", "table-0431-743", "table-0819-222", "table-0178-435", "table-0644-169", "table-0218-655", "table-0993-813", "table-0622-313", "table-0154-108", "table-1596-985", "table-0592-965", "table-1302-595", "table-1551-628", "table-0862-161", "table-0005-138", "table-0083-311", "table-1599-222", "table-0582-523", "table-1284-407", "table-1486-394", "table-1637-285", "table-1428-664", "table-0044-17", "table-0182-249", "table-0895-216", "table-1158-378", "table-0329-170", "table-0186-511", "table-0893-396", "table-0250-410", "table-1515-127", "table-0642-401", "table-1074-574", "table-0080-662", "table-0975-866", "table-1446-732", "table-1518-223", "table-0124-526", "table-1160-298", "table-0700-659", "table-1443-288", "table-0075-141", "table-1097-777", "table-0779-47", "table-0568-13", "table-0091-900", "table-0188-712", "table-0458-632", "table-0611-589", "table-1531-14", "table-0072-590", "table-0355-17", "table-1024-999", "table-0028-926", "table-1251-789", "table-1229-551", "table-0079-964", "table-0979-224", "table-0552-188", "table-0511-294", "table-0727-581", 
"table-1413-438", "table-0544-737", "table-0464-210", "table-1089-554", "table-0503-524", "table-0291-646", "table-1644-363", "table-1592-230", "table-0316-211", "table-0139-566", "table-0982-810", "table-1342-459", "table-1539-337", "table-0185-312", "table-0878-539", "table-1300-236", "table-0449-237", "table-0501-798", "table-0651-820", "table-0476-315", "table-0198-696", "table-1377-616", "table-0799-464", "table-0657-516", "table-0696-557", "table-0481-81", "table-0083-115", "table-1602-935", "table-0725-449", "table-0574-885", "table-0190-897", "table-0278-262", "table-1454-744", "table-1284-977", "table-1002-493", "table-1288-100", "table-0631-846", "table-0778-1", "table-1011-204", "table-0684-99", "table-1630-56", "table-0019-991", "table-0989-279", "table-1539-279", "table-1342-421", "table-0429-155", "table-1055-102", "table-0809-506", "table-0711-184", "table-0588-570", "table-0400-859", "table-0768-215", "table-1435-647", "table-1030-471", "table-0985-826", "table-0625-385", "table-0767-590", "table-0511-622", "table-1070-354", "table-0157-33", "table-0798-291", "table-0975-77", "table-1047-751", "table-0545-295", "table-1203-968", "table-1482-538", "table-1000-82", "table-0676-609", "table-1006-332", "table-0952-796", "table-1114-118", "table-1199-368", "table-0567-822", "table-0393-737", "table-1437-812", "table-0840-313", "table-0484-431", "table-0300-835", "table-0060-587", "table-0841-649", "table-1029-558", "table-1550-121", "table-0300-650", "table-0705-349", "table-0757-795", "table-1004-424", "table-0458-518", "table-0872-155", "table-0008-693", "table-0670-761", "table-0615-333", "table-0701-479", "table-1449-666", "table-0253-940", "table-1278-822", "table-0070-13", "table-0318-607", "table-0006-197", "table-1345-701", "table-0484-79", "table-0388-364", "table-0158-217", "table-1578-355", "table-0094-706", "table-1570-741", "table-0299-601", "table-1644-428", "table-0598-960", "table-1608-185", "table-0955-22", "table-1250-474", 
"table-0818-765", "table-0740-908", "table-0630-352", "table-0613-57", "table-0368-688", "table-0104-700", "table-1394-699", "table-1561-571", "table-1428-888", "table-0066-53", "table-1063-60", "table-0459-680", "table-1343-147", "table-1483-875", "table-1427-101", "table-0614-265", "table-1446-872", "table-1211-844", "table-1179-71", "table-0342-862", "table-1561-361", "table-1275-495", "table-1094-587", "table-1024-901", "table-1100-382", "table-0649-334", "table-0886-684", "table-0235-670", "table-0546-349", "table-0982-616", "table-1561-447", "table-1590-462", "table-0157-745", "table-1115-521", "table-0689-195", "table-0963-987", "table-1074-978", "table-0364-807", "table-0314-531", "table-0529-771", "table-1026-54", "table-0460-595", "table-0545-511", "table-0726-159", "table-1562-288", "table-1070-749", "table-0592-284", "table-1653-594", "table-0770-779", "table-1265-324", "table-0224-795", "table-0565-472", "table-0907-1", "table-0695-18", "table-0459-552", "table-0739-89", "table-0158-462", "table-1063-873", "table-0398-790"] -------------------------------------------------------------------------------- /Population/abstract_index.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script is used to index dbpedia abstract: 3 | id: 4 | abstract: analyzed abstract 5 | 6 | author: Shuo Zhang 7 | """ 8 | 9 | from Population.elastic import Elastic 10 | 11 | def abstract_index(ab_file = "long_abstracts_en.ttl"): 12 | 13 | index_name = "dbpedia_2015_10_abstract" 14 | mappings = { 15 | "abstract":Elastic.analyzed_field(), 16 | } 17 | elastic = Elastic(index_name) 18 | elastic.create_index(mappings, force=True) 19 | f = open(ab_file,"r") 20 | docs = {} 21 | count = 1 22 | for line in f: 23 | if "started" in line or "completed" in line: 24 | continue 25 | tmp = line.strip().split(" ",2) 26 | entity_id = tmp[0].split("")[0] 27 | abstract = tmp[2].split('"')[1] 28 | docs[entity_id] = {"abstract":abstract} 29 | if 
len(docs) == 10000: 30 | print(count, count, count) 31 | elastic.add_docs_bulk(docs) 32 | docs = {} 33 | count += 1 34 | elastic.add_docs_bulk(docs) 35 | 36 | 37 | if __name__ == "__main__": 38 | abstract_index() 39 | -------------------------------------------------------------------------------- /Population/cat_type_index.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script is used to index dbpedia category and types information 3 | id: 4 | type_n: not analyzed type 5 | type_a: analyzed type 6 | cat_n: not analyzed category 7 | cat_a: analyzed category 8 | Type only include ontology types. 9 | 10 | author: Shuo Zhang 11 | """ 12 | 13 | from elastic import Elastic 14 | import json 15 | import re 16 | 17 | def convert_from_camelcase(name): 18 | """Splits a CamelCased string into a new one, capitalized, where words are separated by blanks. 19 | 20 | :param name: 21 | :return: 22 | """ 23 | # http://stackoverflow.com/questions/1175208/elegant-python-function-to-convert-camelcase-to-snake-case 24 | s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name) 25 | return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).replace("_", " ")#.capitalize() 26 | 27 | def extrac_type(): 28 | out = open("entity_type.json", "w") 29 | dir = "/data/scratch/tmp/instance_types_transitive_en.ttl" 30 | f = open(dir, "r") 31 | entity_type = {} 32 | for line in f: 33 | if "started" in line: # first line 34 | continue 35 | tmp = line.strip().split() 36 | try: 37 | entity = tmp[0].split("resource/")[1].split(">")[0] 38 | type = tmp[2] 39 | print("entity:", entity, "...Type:", type) 40 | if entity not in entity_type.keys(): 41 | entity_type[entity] = [] 42 | entity_type[entity].append(type) 43 | else: 44 | entity_type[entity].append(type) 45 | except: # last line 46 | print(line) 47 | print(len(entity_type)) 48 | json.dump(entity_type, out, indent=2) 49 | 50 | 51 | def extrac_cat(): 52 | out = open("entity_category.json", "w") 53 | dir = 
"/data/scratch/tmp/article_categories_en.ttl" 54 | f = open(dir, "r") 55 | entity_type = {} 56 | for line in f: 57 | if "started" in line: # first line 58 | continue 59 | try: 60 | tmp = line.strip().split() 61 | entity = tmp[0].split("resource/")[1].split(">")[0] 62 | cat = tmp[2].split("/Category:")[1].split(">")[0] 63 | print("entity:", entity, "...cat:", cat) 64 | if entity not in entity_type.keys(): 65 | entity_type[entity] = [] 66 | entity_type[entity].append(cat) 67 | else: 68 | entity_type[entity].append(cat) 69 | except: # last line 70 | print(line) 71 | print(len(entity_type)) 72 | json.dump(entity_type, out, indent=2) 73 | 74 | 75 | def cat_type_index(): 76 | f1 = open("entity_category.json", "r") 77 | entity_cat = json.load(f1) 78 | f2 = open("entity_type.json", "r") 79 | entity_type = json.load(f2) 80 | index_name = "dbpedia_2015_10_type_cat" 81 | mappings = { 82 | "type_n": Elastic.notanalyzed_field(), 83 | "type_a": Elastic.analyzed_field(), 84 | "category_n": Elastic.notanalyzed_field(), 85 | "category_a": Elastic.analyzed_field() 86 | } 87 | elastic = Elastic(index_name) 88 | elastic.create_index(mappings, force=True) 89 | keys = list(set(list(entity_cat.keys()) + list(entity_type.keys()))) 90 | docs = {} 91 | count = 1 92 | for key in keys: 93 | entity = "" 94 | type_tmp = entity_type.get(key, []) 95 | type = [] 96 | for t in type_tmp: 97 | if t.startswith("")[0].rsplit("/")[-1] 99 | type.append(tmp) 100 | 101 | cat = entity_cat.get(key, []) 102 | cat_a = [] 103 | for c in cat: # prepare analyzed version 104 | cat_a.append(c.replace("_", " ")) 105 | type_a = [] 106 | for t in type: 107 | type_a.append(convert_from_camelcase(t)) # e.g., camelcase "MeanOfTransportation" => "Mean Of Transportation" 108 | 109 | # print('TTTT',type) 110 | doc = {"type_n": type, "type_a": type_a, "category_n": cat, "category_a": cat_a} 111 | docs[entity] = doc 112 | if len(docs) == 10000: 113 | print("-------", count) 114 | count += 1 115 | elastic.add_docs_bulk(docs) 
116 | docs = {} 117 | elastic.add_docs_bulk(docs) 118 | print("Finish now") 119 | 120 | def statistic(): 121 | a = 0 122 | b = 0 123 | f1 = open("entity_category.json", "r") 124 | entity_cat = json.load(f1) 125 | f2 = open("entity_type.json", "r") 126 | entity_type = json.load(f2) 127 | keys = list(set(list(entity_cat.keys()) + list(entity_type.keys()))) 128 | for key in keys: 129 | type_tmp = entity_type.get(key, []) 130 | type = [] 131 | for t in type_tmp: 132 | if t.startswith("")[0].rsplit("/")[-1] 134 | type.append(tmp) 135 | b += 1 136 | 137 | cat = entity_cat.get(key, []) 138 | cat_a = [] 139 | for c in cat: # prepare analyzed version 140 | cat_a.append(c.replace("_", " ")) 141 | a += 1 142 | print("Finish now") 143 | print(a,b,len(keys)) 144 | 145 | 146 | if __name__ == "__main__": 147 | # extrac_type() 148 | # extrac_cat() 149 | # cat_type_index() 150 | statistic() 151 | # a = "" 152 | # print(a) 153 | # tmp = a.split(">")[0].rsplit("/")[-1] 154 | # print(tmp) 155 | # ---------------Testing----------------- 156 | # index_name = "dbpedia_2015_10_type_cat" 157 | # es = Elastic(index_name) 158 | # entity_id = "" 159 | # field = "type_a" 160 | # doc = es.get_doc(entity_id, field) 161 | # cats = doc.get("_source").get(field) 162 | # print("TYPE:",cats) 163 | # for cat in cats: 164 | # cat = "Scranton/Wilkes-Barre_RailRiders_players" 165 | # cat = es.analyze_query(cat) 166 | # print(cat) 167 | # res = es.search(query=cat, field=field, num=100) 168 | # print(res.keys()) 169 | # break 170 | # name = "Ase Bsef Cfds Dfdls" 171 | # a = convert_from_camelcase(name) 172 | # print(a) 173 | # f2 = open("entity_type.json", "r") 174 | # entity_type = json.load(f2) 175 | # print(entity_type["Audi_A4"]) 176 | 177 | -------------------------------------------------------------------------------- /Population/column_evaluation.py: -------------------------------------------------------------------------------- 1 | """ 2 | Evaluation of Column Population 3 | 
------------------------------- 4 | 5 | Column population pipeline. 6 | 7 | @author: Shuo Zhang 8 | """ 9 | 10 | from elastic import Elastic 11 | 12 | 13 | 14 | class Column_evaluation(object): 15 | def __init__(self, test_tables=None): 16 | """ 17 | 18 | :param test_tables: 1000 wiki test tables 19 | """ 20 | self.test_tables = test_tables 21 | self.__tes = Elastic("table_index_frt") 22 | 23 | def rank_candidates(self, seed, c, E): 24 | """ 25 | 26 | :param cand: candidate entities 27 | :param seed: Seed entity 28 | :param c: Table caption 29 | :return: Ranked suggestions 30 | """ 31 | pass 32 | 33 | def find_candidates_c(self, c, seed, num=100): 34 | """find candidate tables complement with table caption""" 35 | res = self.__tes.search(query=c, field="caption", num=num) 36 | cand = [] 37 | for table_id in res.keys(): 38 | doc = self.__tes.get_doc(table_id) 39 | labels = doc["_source"]["headings_a"] 40 | cand += labels 41 | return set([i for i in cand if i not in seed]), list(res.keys()) 42 | 43 | 44 | def find_candidates_l(self, seed, num=100): 45 | """find candidate labels Using labels """ 46 | tables = [] 47 | cand = [] 48 | for label in seed: 49 | res = self.__tes.search(query=label, field="headings", num=num) 50 | tables += list(res.keys()) 51 | for table_id in res.keys(): 52 | doc = self.__tes.get_doc(table_id) 53 | labels = doc["_source"]["headings"] 54 | cand += labels 55 | return set([i for i in cand if i not in seed]), tables 56 | 57 | 58 | def find_candidates_e(self, E, seed, num=100): 59 | """find candidate labels Using entities""" 60 | tables = [] 61 | cand = [] 62 | for entity in E: 63 | body = self.generate_search_body(entity=entity, field="entity") 64 | res = self.__tes.search_complex(body=body, num=num) 65 | tables += list(res.keys()) 66 | for table_id in res.keys(): 67 | doc = self.__tes.get_doc(table_id) 68 | labels = doc["_source"]["headings"] 69 | cand += labels 70 | return set([i for i in cand if i not in seed]), tables 71 | 72 | def 
generate_search_body(self, entity, field): 73 | """Generate search body""" 74 | body = { 75 | "query": { 76 | "bool": { 77 | "must": { 78 | "term": {field: entity} 79 | } 80 | } 81 | } 82 | } 83 | return body 84 | 85 | def parse(self, text): 86 | """Put query into a term list for term iteration""" 87 | stopwords = [] 88 | terms = [] 89 | # Replace specific characters with space 90 | chars = ["'", ".", ":", ",", "/", "(", ")", "-", "+"] 91 | for ch in chars: 92 | if ch in text: 93 | text = text.replace(ch, " ") 94 | # Tokenization 95 | for term in text.split(): # default behavior of the split is to split on one or more whitespaces 96 | # Stopword removal 97 | if term in stopwords: 98 | continue 99 | terms.append(term) 100 | return terms 101 | 102 | 103 | 104 | 105 | 106 | -------------------------------------------------------------------------------- /Population/column_rank_label.py: -------------------------------------------------------------------------------- 1 | """ 2 | Given seed labels, entities and caption to rank candidate labels. 
3 | 4 | author: Shuo Zhang 5 | """ 6 | 7 | from column_evaluation import Column_evaluation 8 | from scorer import ScorerLM 9 | from elastic import Elastic 10 | import math 11 | 12 | 13 | class Rank_label(Column_evaluation): 14 | def __init__(self, index_name = "table_index_frt"): 15 | super().__init__() 16 | self.__tes = Elastic(index_name=index_name) 17 | self.__num = 100 18 | 19 | def rank_candidates(self, seed_label, E, c): 20 | """Ranking candidate labels""" 21 | p_all = {} 22 | labels_c, tables_c = self.find_candidates_c(c, seed=seed_label, num=self.__num) # search tables with similar caption 23 | labels_e, tables_e = self.find_candidates_e(E, seed=seed_label, num=self.__num) 24 | lables_h, tables_h = self.find_candidates_l(seed=seed_label, num=self.__num) 25 | all_tables = set(tables_c + tables_e + tables_h) # all related tables (ids) 26 | candidate_labels = set(list(labels_c) + list(labels_e) + list(lables_h)) 27 | p_t_ecl, headings = self.p_t_ecl(all_tables, seed_label, E) 28 | for label in candidate_labels: 29 | p_all[label] = 0 30 | for table in all_tables: 31 | table_label = headings.get(table,[]) 32 | if label in table_label: 33 | p_all[label] += p_t_ecl[table]/len(table_label) 34 | return p_all 35 | 36 | def p_t_ecl(self, all_table, seed_label, E): 37 | p = {} 38 | headings = {} 39 | for table in all_table: 40 | doc = self.__tes.get_doc(table) 41 | table_label = doc.get("_source").get("headings_n") 42 | headings[table] = table_label 43 | sim_l = self.overlap(table_label, seed_label) 44 | table_entity = doc.get("_source").get("entity") 45 | sim_e = self.overlap(table_entity, E) 46 | table_caption = doc.get("_source").get("caption") 47 | score = ScorerLM(self.__tes, table_caption, {}).score_doc(table) 48 | p[table] = max(sim_e, 0.000001) * max(sim_l, 0.000001) * max(math.exp(score), 0.000001) 49 | return p, headings 50 | 51 | 52 | def overlap(self, a, b): 53 | """Calculate |A and B|/|B|""" 54 | return len([i for i in a if i in b]) / len(b) 55 | 56 | 57 | 
if __name__ == "__main__": 58 | r = Rank_label() 59 | seed_label = ["episode"] 60 | E = ["Does_the_Team_Think?"] 61 | c = "Episodes" 62 | res = r.rank_candidates(seed_label=seed_label, E=E, c=c) 63 | print(res) 64 | 65 | 66 | -------------------------------------------------------------------------------- /Population/elastic.py: -------------------------------------------------------------------------------- 1 | """ 2 | elastic 3 | ------- 4 | 5 | Tools for working with Elasticsearch. 6 | This class is to be instantiated for each index. 7 | 8 | @author: Faegheh Hasibi 9 | @author: Krisztian Balog 10 | """ 11 | 12 | from pprint import pprint 13 | 14 | from elasticsearch import Elasticsearch 15 | from elasticsearch import helpers 16 | 17 | ES_config = { 18 | "hosts": [ 19 | "localhost:9200" 20 | ], 21 | "settings": { 22 | "number_of_shards": 1, 23 | "number_of_replicas": 0 24 | } 25 | } 26 | ELASTIC_HOSTS = ES_config.get("hosts") 27 | ELASTIC_SETTINGS = ES_config.get("settings") 28 | 29 | 30 | class Elastic(object): 31 | FIELD_CATCHALL = "catchall" 32 | FIELD_ELASTIC_CATCHALL = "_all" 33 | DOC_TYPE = "doc" # we don't make use of types 34 | ANALYZER_STOP_STEM = "english" 35 | ANALYZER_STOP = "stop_en" 36 | BM25 = "BM25" 37 | SIMILARITY = "sim" # Used when other similarities are used 38 | 39 | def __init__(self, index_name): 40 | self.__es = Elasticsearch(hosts=ELASTIC_HOSTS) 41 | self.__index_name = index_name 42 | 43 | @staticmethod 44 | def analyzed_field(analyzer=ANALYZER_STOP): 45 | """Returns the mapping for analyzed fields. 
46 | 47 | :param analyzer: name of the analyzer; valid options: [ANALYZER_STOP, ANALYZER_STOP_STEM] 48 | """ 49 | if analyzer not in {Elastic.ANALYZER_STOP, Elastic.ANALYZER_STOP_STEM}: 50 | print("Error: Analyzer", analyzer, "is not valid.") 51 | exit(0) 52 | return {"type": "string", 53 | "term_vector": "with_positions_offsets", 54 | "analyzer": analyzer} 55 | 56 | @staticmethod 57 | def notanalyzed_field(): 58 | """Returns the mapping for not-analyzed fields.""" 59 | return {"type": "string", 60 | "index": "not_analyzed"} 61 | # "similarity": Elastic.SIMILARITY} 62 | 63 | def __gen_similarity(self, model, params=None): 64 | """Gets the custom similarity function.""" 65 | similarity = params if params else {} 66 | similarity["type"] = model 67 | return {Elastic.SIMILARITY: similarity} 68 | 69 | def __gen_analyzers(self): 70 | """Gets custom analyzers. 71 | We include customized analyzers in the index setting, a field may or may not use it. 72 | """ 73 | analyzer = {"type": "standard", "stopwords": "_english_"} 74 | analyzers = {"analyzer": {Elastic.ANALYZER_STOP: analyzer}} 75 | return analyzers 76 | 77 | def analyze_query(self, query, analyzer=ANALYZER_STOP): 78 | """Analyzes the query. 
79 | 80 | :param query: raw query 81 | :param analyzer: name of analyzer 82 | """ 83 | tokens = self.__es.indices.analyze(index=self.__index_name, body=query, analyzer=analyzer)["tokens"] 84 | query_terms = [] 85 | for t in sorted(tokens, key=lambda x: x["position"]): 86 | query_terms.append(t["token"]) 87 | return " ".join(query_terms) 88 | 89 | def get_mapping(self): 90 | """Returns mapping definition for the index.""" 91 | mapping = self.__es.indices.get_mapping(index=self.__index_name, doc_type=self.DOC_TYPE) 92 | return mapping[self.__index_name]["mappings"][self.DOC_TYPE]["properties"] 93 | 94 | def get_settings(self): 95 | """Returns index settings.""" 96 | return self.__es.indices.get_settings(index=self.__index_name)[self.__index_name]["settings"]["index"] 97 | 98 | def __update_settings(self, settings): 99 | """Updates the index settings.""" 100 | self.__es.indices.close(index=self.__index_name) 101 | self.__es.indices.put_settings(index=self.__index_name, body=settings) 102 | self.__es.indices.open(index=self.__index_name) 103 | self.__es.indices.refresh(index=self.__index_name) 104 | 105 | def update_similarity(self, model=BM25, params=None): 106 | """Updates the similarity function "sim", which is fixed for all index fields. 107 | 108 | The method and param should match elastic settings: 109 | https://www.elastic.co/guide/en/elasticsearch/reference/2.3/index-modules-similarity.html 110 | 111 | :param model: name of the elastic model 112 | :param params: dictionary of params based on elastic 113 | """ 114 | old_similarity = self.get_settings()["similarity"] 115 | new_similarity = self.__gen_similarity(model, params) 116 | # We only update the similarity if it is different from the old one. 
117 | # this avoids unnecessary closing of the index 118 | if old_similarity != new_similarity: 119 | self.__update_settings({"similarity": new_similarity}) 120 | 121 | def delete_index(self): 122 | """Deletes an index.""" 123 | self.__es.indices.delete(index=self.__index_name) 124 | print("Index <" + self.__index_name + "> has been deleted.") 125 | 126 | def create_index(self, mappings, model=BM25, model_params=None, force=False): 127 | """Creates index (if it doesn't exist). 128 | 129 | :param mappings: field mappings 130 | :param model: name of elastic search similarity 131 | :param model_params: name of elastic search similarity 132 | :param force: forces index creation (overwrites if already exists) 133 | """ 134 | if self.__es.indices.exists(self.__index_name): 135 | if force: 136 | self.delete_index() 137 | else: 138 | print("Index already exists. No changes were made.") 139 | return 140 | 141 | # sets general elastic settings 142 | body = ELASTIC_SETTINGS 143 | 144 | # sets the global index settings 145 | # number of shards should be always set to 1; otherwise the stats would not be correct 146 | body["settings"] = {"analysis": self.__gen_analyzers(), 147 | "index": {"number_of_shards": 1, 148 | "number_of_replicas": 0}, 149 | } 150 | 151 | # sets similarity function 152 | # If model is not BM25, a similarity module with the given model and params is defined 153 | if model != Elastic.BM25: 154 | body["settings"]["similarity"] = self.__gen_similarity(model, model_params) 155 | sim = model if model == Elastic.BM25 else Elastic.SIMILARITY 156 | for mapping in mappings.values(): 157 | mapping["similarity"] = sim 158 | 159 | # sets the field mappings 160 | body["mappings"] = {self.DOC_TYPE: {"properties": mappings}} 161 | 162 | # creates the index 163 | self.__es.indices.create(index=self.__index_name, body=body) 164 | pprint(body) 165 | print("New index <" + self.__index_name + "> is created.") 166 | 167 | def add_docs_bulk(self, docs): 168 | """Adds a set of 
documents to the index in a bulk. 169 | 170 | :param docs: dictionary {doc_id: doc} 171 | """ 172 | actions = [] 173 | for doc_id, doc in docs.items(): 174 | action = { 175 | "_index": self.__index_name, 176 | "_type": self.DOC_TYPE, 177 | "_id": doc_id, 178 | "_source": doc 179 | } 180 | actions.append(action) 181 | 182 | if len(actions) > 0: 183 | helpers.bulk(self.__es, actions) 184 | 185 | def add_doc(self, doc_id, contents): 186 | """Adds a document with the specified contents to the index. 187 | 188 | :param doc_id: document ID 189 | :param contents: content of document 190 | """ 191 | self.__es.index(index=self.__index_name, doc_type=self.DOC_TYPE, id=doc_id, body=contents) 192 | 193 | def get_doc(self, doc_id, fields=None, source=True): 194 | """Gets a document from the index based on its ID. 195 | 196 | :param doc_id: document ID 197 | :param fields: list of fields to return (default: all) 198 | :param source: return document source as well (default: yes) 199 | """ 200 | return self.__es.get(index=self.__index_name, doc_type=self.DOC_TYPE, id=doc_id, _source=source) 201 | 202 | def search(self, query, field, num=100, fields_return="", start=0): 203 | """Searches in a given field using the similarity method configured in the index for that field. 
204 | 205 | :param query: query string 206 | :param field: field to search in 207 | :param num: number of hits to return (default: 100) 208 | :param fields_return: additional document fields to be returned 209 | :param start: starting offset (default: 0) 210 | :return: dictionary of document IDs with scores 211 | """ 212 | hits = self.__es.search(index=self.__index_name, q=query, df=field, _source=False, size=num, 213 | from_=start)["hits"]["hits"] 214 | results = {} 215 | for hit in hits: 216 | results[hit["_id"]] = hit["_score"] 217 | return results 218 | 219 | def estimate_number(self, query): 220 | """Search query, return the number of hits containing query""" 221 | try: 222 | return self.__es.search(index=self.__index_name, q = query, _source=False, size=1, 223 | from_=0)["hits"]["total"] 224 | except: 225 | return 0 226 | 227 | def search_complex(self, body, num=100, fields_return="", start=0): 228 | """Searches in a given field using the similarity method configured in the index for that field. 
229 | 230 | :param body: query body 231 | :param field: field to search in 232 | :param num: number of hits to return (default: 100) 233 | :param fields_return: additional document fields to be returned 234 | :param start: starting offset (default: 0) 235 | :return: dictionary of document IDs with scores 236 | """ 237 | hits = self.__es.search(index=self.__index_name, body=body, _source=False, size=num, 238 | from_=start)["hits"]["hits"] 239 | results = {} 240 | for hit in hits: 241 | results[hit["_id"]] = hit["_score"] 242 | return results 243 | 244 | def estimate_number_complex(self, body): 245 | """Search body, return the number of hits containing body""" 246 | try: 247 | return self.__es.search(index=self.__index_name, body=body, _source=False, size=1, 248 | from_=0)["hits"]["total"] 249 | except: 250 | return 0 251 | 252 | def get_ids(self, body): 253 | """Search body, return the IDs of the hits containing body""" 254 | try: 255 | return self.search_complex(body).keys() 256 | except: 257 | return 0 258 | 259 | def get_field_stats(self, field): 260 | """Returns stats of the given field.""" 261 | return self.__es.field_stats(index=self.__index_name, fields=[field])["indices"]["_all"]["fields"][field] 262 | 263 | def get_fields(self): 264 | """Returns name of fields in the index.""" 265 | return list(self.get_mapping().keys()) 266 | 267 | # ========================================= 268 | # ================= Stats ================= 269 | # ========================================= 270 | def __get_termvector(self, doc_id, field, term_stats=False): 271 | """Returns a term vector for a given document field, including global field and term statistics. 272 | Term stats can have a serious performance impact; should be set to true only if it is needed! 
273 | 274 | :param doc_id: document ID 275 | :param field: field name 276 | """ 277 | tv = self.__es.termvectors(index=self.__index_name, doc_type=self.DOC_TYPE, id=doc_id, fields=field, 278 | term_statistics=term_stats) 279 | return tv.get("term_vectors", {}).get(field, {}).get("terms", {}) 280 | 281 | def __get_coll_termvector(self, term, field): 282 | """Returns a term vector containing collection stats of a term.""" 283 | hits = self.search(term, field, num=1) 284 | doc_id = next(iter(hits.keys())) if len(hits) > 0 else None 285 | return self.__get_termvector(doc_id, field, term_stats=True) if doc_id else {} 286 | 287 | def num_docs(self): 288 | """Returns the number of documents in the index.""" 289 | return self.__es.count(index=self.__index_name, doc_type=self.DOC_TYPE)["count"] 290 | 291 | def num_fields(self): 292 | """Returns number of fields in the index.""" 293 | return len(self.get_mapping()) 294 | 295 | def doc_count(self, field): 296 | """Returns number of documents with at least one term for the given field.""" 297 | return self.get_field_stats(field)["doc_count"] 298 | 299 | def coll_length(self, field): 300 | """Returns length of field in the collection.""" 301 | return self.get_field_stats(field)["sum_total_term_freq"] 302 | 303 | def avg_len(self, field): 304 | """Returns average length of a field in the collection.""" 305 | return self.coll_length(field) / self.doc_count(field) 306 | 307 | def doc_length(self, doc_id, field): 308 | """Returns length of a field in a document.""" 309 | return sum(self.term_freqs(doc_id, field).values()) 310 | 311 | def doc_freq(self, term, field): 312 | """Returns document frequency for the given term and field.""" 313 | tv = self.__get_coll_termvector(term, field) 314 | return tv.get(term, {}).get("doc_freq", 0) 315 | 316 | def coll_term_freq(self, term, field): 317 | """ Returns collection term frequency for the given field.""" 318 | tv = self.__get_coll_termvector(term, field) 319 | return tv.get(term, 
{}).get("ttf", 0) 320 | 321 | def term_freqs(self, doc_id, field): 322 | """Returns term frequencies for a given document and field. 323 | 324 | :return dictionary of terms with their frequencies; {doc_id: freq, ...} 325 | """ 326 | tv = self.__get_termvector(doc_id, field) 327 | term_freqs = {} 328 | for term, val in tv.items(): 329 | term_freqs[term] = val["term_freq"] 330 | return term_freqs 331 | 332 | def term_freq(self, doc_id, field, term): 333 | """Returns frequency of a term in a given document and field.""" 334 | return self.term_freqs(doc_id, field).get(term, 0) 335 | 336 | 337 | if __name__ == "__main__": 338 | field = "content" 339 | term = "gonna" 340 | doc_id = 4 341 | 342 | es = Elastic("toy_index") 343 | pprint(es.search("gonna", "content")) 344 | 345 | print("================= Stats =================") 346 | print("[FIELD]: %s [TERM]: %s" % (field, term)) 347 | print("- Number of documents: %d" % es.num_docs()) 348 | print("- Number of fields: %d" % es.num_fields()) 349 | print("- Document count: %d" % es.doc_count(field)) 350 | print("- Collection length: %d" % es.coll_length(field)) 351 | print("- Average length: %.2f" % es.avg_len(field)) 352 | print("- Document length: %d" % es.doc_length(doc_id, field)) 353 | print("- Number of fields: %d" % es.num_fields()) 354 | print("- Document frequency: %d" % es.doc_freq(term, field)) 355 | print("- Collection frequency: %d" % es.coll_term_freq(term, field)) 356 | print("- Term frequencies:") 357 | pprint(es.term_freqs(doc_id, field)) 358 | person_id = "you" 359 | # search doc containing person_id 360 | body = { 361 | "query": { 362 | "bool": { 363 | "must": { 364 | "term": {"content": person_id} 365 | } 366 | } 367 | } 368 | } 369 | # search docs containing both person_id and a(analyzed) 370 | a = es.analyze_query("me") 371 | body = {"query": { 372 | "bool": { 373 | "must": [ 374 | { 375 | "match": {"content": person_id} 376 | }, 377 | { 378 | "match_phrase": {"content": a} 379 | } 380 | ] 381 | } 382 
| }} 383 | print(es.search_complex(body, "content")) 384 | print(es.term_freqs(1, "content")) 385 | 386 | # pprint.pprint(es.get_termvector("", "title")) 387 | # pprint.pprint(es.search("people", "title", fields_return="title")) 388 | -------------------------------------------------------------------------------- /Population/elastic_cache.py: -------------------------------------------------------------------------------- 1 | """ 2 | elastic_cache 3 | ------------- 4 | 5 | This is a cache for elastic index stats; a layer between an index and scorer. 6 | 7 | @author: Faegheh Hasibi 8 | """ 9 | from elastic import Elastic 10 | 11 | 12 | class ElasticCache(Elastic): 13 | def __init__(self, index_name): 14 | super(ElasticCache, self).__init__(index_name) 15 | 16 | # Cached variables 17 | self.__num_docs = None 18 | self.__num_fields = None 19 | self.__doc_count = {} 20 | self.__coll_length = {} 21 | self.__avg_len = {} 22 | self.__doc_length = {} 23 | self.__doc_freq = {} 24 | self.__coll_termfreq = {} 25 | 26 | def __check_cache(self, func, params, var): 27 | #TODO 28 | pass 29 | 30 | def num_docs(self): 31 | """Returns the number of documents in the index.""" 32 | if self.__num_docs is None: 33 | self.__num_docs = super(ElasticCache, self).num_docs() 34 | return self.__num_docs 35 | 36 | def num_fields(self): 37 | """Returns number of fields in the index.""" 38 | if self.__num_fields is None: 39 | self.__num_fields = super(ElasticCache, self).num_fields() 40 | return self.__num_fields 41 | 42 | def doc_count(self, field): 43 | """Returns number of documents with at least one term for the given field.""" 44 | if field not in self.__doc_count: 45 | self.__doc_count[field] = super(ElasticCache, self).doc_count(field) 46 | return self.__doc_count[field] 47 | 48 | def coll_length(self, field): 49 | """Returns length of field in the collection.""" 50 | if field not in self.__coll_length: 51 | self.__coll_length[field] = super(ElasticCache, self).coll_length(field) 52 | 
return self.__coll_length[field] 53 | 54 | def avg_len(self, field): 55 | """Returns average length of a field in the collection.""" 56 | if field not in self.__avg_len: 57 | self.__avg_len[field] = super(ElasticCache, self).avg_len(field) 58 | return self.__avg_len[field] 59 | 60 | def doc_length(self, doc_id, field): 61 | """Returns length of a field in a document.""" 62 | if doc_id not in self.__doc_length: 63 | self.__doc_length[doc_id] = {} 64 | if field not in self.__doc_length[doc_id]: 65 | self.__doc_length[doc_id][field] = super(ElasticCache, self).doc_length(doc_id, field) 66 | return self.__doc_length[doc_id][field] 67 | 68 | def doc_freq(self, term, field): 69 | """Returns document frequency for the given term and field.""" 70 | if field not in self.__doc_freq: 71 | self.__doc_freq[field] = {} 72 | if term not in self.__doc_freq[field]: 73 | self.__doc_freq[field][term] = super(ElasticCache, self).doc_freq(term, field) 74 | return self.__doc_freq[field][term] 75 | 76 | def coll_term_freq(self, term, field): 77 | """ Returns collection term frequency for the given field.""" 78 | if field not in self.__coll_termfreq: 79 | self.__coll_termfreq[field] = {} 80 | if term not in self.__coll_termfreq[field]: 81 | self.__coll_termfreq[field][term] = super(ElasticCache, self).coll_term_freq(term, field) 82 | return self.__coll_termfreq[field][term] 83 | -------------------------------------------------------------------------------- /Population/retrieval.py: -------------------------------------------------------------------------------- 1 | """ 2 | retrieval 3 | --------- 4 | 5 | Console application for general-purpose retrieval. 
6 | 7 | * *First pass*: get top ``N`` documents using Elastic's default retrieval method (based on the catch-all content field) 8 | * *Second pass*: perform (expensive) scoring of the top ``N`` documents using the Scorer class 9 | 10 | @author: Krisztian Balog 11 | @author: Faegheh Hasibi 12 | """ 13 | import argparse 14 | import json 15 | 16 | import sys 17 | from pprint import pprint 18 | 19 | from Population.elastic import Elastic 20 | from Population.elastic_cache import ElasticCache 21 | from Population.scorer import Scorer, ScorerLM 22 | #from file_utils import FileUtils 23 | 24 | 25 | class Retrieval(object): 26 | """Loads config file, checks params, and sets default values. 27 | 28 | :param config: retrieval config (JSON config file or a dictionary) of the shape: 29 | 30 | :: 31 | 32 | { 33 | "index_name": name of the index, 34 | "first_pass": { 35 | "num_docs": number of documents in first-pass scoring (default: 1000) 36 | "field": field used in first pass retrieval (default: Elastic.FIELD_CATCHALL) 37 | }, 38 | "second_pass": { 39 | "num_docs": number of documents to return (default: 100) 40 | "field": field name (for single field models; e.g., LM, SDM) 41 | "fields": list of fields (for multiple field models; e.g., MLM, PRMS) 42 | "field_weights": dictionary with fields and corresponding weights (for MLM and FSDM) 43 | "model": name of retrieval model; accepted values: [lm, mlm, prms, sdm, fsdm] (default: lm) 44 | "smoothing_method": accepted values: [jm, dirichlet] (default: dirichlet) 45 | "smoothing_param": value of lambda or mu; accepted values: [float or "avg_len"], 46 | (jm default: 0.1, dirichlet default: 2000) 47 | }, 48 | "query_file": name of query file (JSON), 49 | "output_file": name of output file, 50 | "run_id": run id for TREC output 51 | } 52 | 53 | """ 54 | FIELDED_MODELS = {"mlm", "prms", "fsdm"} 55 | LM_MODELS = {"lm", "mlm", "prms", "sdm", "fsdm"} 56 | 57 | def __init__(self, config): 58 | self.__check_config(config) 59 | 
pprint(config) 60 | self.__config = config 61 | self.__index_name = config["index_name"] 62 | self.__first_pass_num_docs = config["first_pass"]["num_docs"] 63 | self.__first_pass_field = config["first_pass"]["field"] 64 | self.__first_pass_model = config["first_pass"]["model"] 65 | self.__second_pass = config.get("second_pass", None) 66 | self.__second_pass_model = config.get("second_pass", {}).get("model", None) 67 | self.__second_pass_num_docs = config.get("second_pass", {}).get("num_docs", None) 68 | self.__query_file = config.get("query_file", None) 69 | self.__output_file = config.get("output_file", None) 70 | self.__run_id = config.get("run_id", self.__second_pass_model) 71 | 72 | self.__elastic = ElasticCache(self.__index_name) 73 | 74 | @staticmethod 75 | def __check_config(config): 76 | """Checks config parameters and sets default values.""" 77 | try: 78 | if "index_name" not in config: 79 | raise Exception("index_name is missing") 80 | # Checks first pass parameters 81 | if "first_pass" not in config: 82 | config["first_pass"] = {} 83 | if "num_docs" not in config["first_pass"]: 84 | config["first_pass"]["num_docs"] = 1000 85 | if "field" not in config["first_pass"]: 86 | config["first_pass"]["field"] = Elastic.FIELD_CATCHALL 87 | if "model" not in config["first_pass"]: 88 | config["first_pass"]["model"] = Elastic.BM25 89 | # todo: set default params for "params" (from elastic search) 90 | 91 | # Checks second pass parameters 92 | if "second_pass" in config: 93 | if "num_docs" not in config["second_pass"]: 94 | config["second_pass"]["num_docs"] = 100 95 | if "field" not in config["second_pass"]: 96 | config["second_pass"]["field"] = Elastic.FIELD_CATCHALL 97 | if "model" not in config["second_pass"]: 98 | config["second_pass"]["model"] = "lm" 99 | if config["second_pass"]["model"] in Retrieval.LM_MODELS: 100 | if "smoothing_method" not in config["second_pass"]: 101 | config["second_pass"]["smoothing_method"] = ScorerLM.DIRICHLET 102 | if "smoothing_param" 
not in config["second_pass"]: 103 | if config["second_pass"]["smoothing_method"] == ScorerLM.DIRICHLET: 104 | config["second_pass"]["smoothing_param"] = 2000 105 | elif config["second_pass"]["smoothing_method"] == ScorerLM.JM: 106 | config["second_pass"]["smoothing_param"] = 0.1 107 | else: 108 | raise Exception("Smoothing method is not supported.") 109 | # todo: set default params for "fields" (for MLM, PRMS, etc.) 110 | except Exception as e: 111 | print("Error in config file: ", e) 112 | sys.exit(1) 113 | 114 | def _first_pass_scoring(self, analyzed_query): 115 | """Returns first-pass scoring of documents. 116 | 117 | :param analyzed_query: analyzed query 118 | :return: RetrievalResults object 119 | """ 120 | print("\tFirst pass scoring... ", ) 121 | # todo: add support for other similarities 122 | # self.__elastic.update_similarity(self.__first_pass_model, self.__first_pass_model_params) 123 | res1 = self.__elastic.search(analyzed_query, self.__first_pass_field, num=self.__first_pass_num_docs) 124 | return res1 125 | 126 | def _second_pass_scoring(self, res1, scorer): 127 | """Returns second-pass scoring of documents. 128 | 129 | :param res1: first pass results 130 | :param scorer: scorer object 131 | :return: RetrievalResults object 132 | """ 133 | print("\tSecond pass scoring... 
", ) 134 | res2 = {} 135 | for doc_id in res1.keys(): 136 | res2[doc_id] = scorer.score_doc(doc_id) 137 | print("done") 138 | return res2 139 | 140 | def retrieve(self, query): 141 | """Scores documents for the given query.""" 142 | query = self.__elastic.analyze_query(query) 143 | # 1st pass retrieval 144 | res1 = self._first_pass_scoring(query) 145 | if not self.__second_pass: 146 | return res1 147 | 148 | # 2nd pass retrieval; the model name is read from the second-pass config by get_scorer 149 | scorer = Scorer.get_scorer(self.__elastic, query, self.__second_pass) 150 | res2 = self._second_pass_scoring(res1, scorer) 151 | return res2 152 | 153 | def batch_retrieval(self): 154 | """Scores queries in a batch and outputs results.""" 155 | queries = json.load(open(self.__query_file)) 156 | 157 | # sets the numbers of documents in the trec file 158 | max_rank = self.__second_pass_num_docs if self.__second_pass else self.__first_pass_num_docs 159 | 160 | # init output file 161 | open(self.__output_file, "w").write("") 162 | out = open(self.__output_file, "w") 163 | 164 | # retrieves documents 165 | for query_id in sorted(queries): 166 | print("scoring [" + query_id + "] " + queries[query_id]) 167 | results = self.retrieve(queries[query_id]) 168 | out.write(self.trec_format(results, query_id, max_rank)) 169 | out.close() 170 | print("Output file:", self.__output_file) 171 | 172 | def trec_format(self, results, query_id, max_rank=100): 173 | """Outputs results in TREC format""" 174 | out_str = "" 175 | rank = 1 176 | for doc_id, score in sorted(results.items(), key=lambda x: x[1], reverse=True): 177 | if rank > max_rank: 178 | break 179 | out_str += query_id + "\tQ0\t" + doc_id + "\t" + str(rank) + "\t" + str(score) + "\t" + self.__run_id + "\n" 180 | rank += 1 181 | return out_str 182 | 183 | 184 | def arg_parser(): 185 | parser = argparse.ArgumentParser() 186 | parser.add_argument("config", help="config file", type=str) 187 | args = parser.parse_args() 188 | return args 189 | 190 | 191 | def main(): 192 | 
dbpedia_config = {"index_name": "dbpedia_2015_10", 193 | "first_pass": { 194 | "num_docs": 1000 195 | }, 196 | "second_pass": { 197 | "model": "lm", 198 | "num_docs": 1000, 199 | "smoothing_method": "dirichlet", 200 | "smoothing_param": 2000, 201 | "field_weights": {"catchall": 0.4, "related_entity_names": 0.2, "categories": 0.4} 202 | }, 203 | # "query_file": "data/queries/dbpedia-entity.json", 204 | # "output_file": "output/mlm_tc.txt", 205 | "run_id": "mlm_tc" 206 | } 207 | r = Retrieval(dbpedia_config) 208 | pprint(r.retrieve("gonna")) 209 | 210 | 211 | if __name__ == "__main__": 212 | main() 213 | -------------------------------------------------------------------------------- /Population/row_evaluation.py: -------------------------------------------------------------------------------- 1 | """ 2 | Evaluation of Row Population 3 | ---------------------------- 4 | 5 | Row population performance evaluation. 6 | 7 | @author: Shuo Zhang 8 | """ 9 | 10 | from elastic import Elastic 11 | 12 | 13 | class Row_evaluation(object): 14 | def __init__(self, index_name="dbpedia_2015_10_type_cat"): 15 | """ 16 | 17 | :param index_name: name of index 18 | """ 19 | self.__index_name = index_name 20 | self.__elastic = Elastic(self.__index_name) 21 | self.__tes = Elastic("table_index_frt") 22 | 23 | def rank_candidates(self, seed , c=None, l=None): 24 | """ 25 | 26 | :param cand: candidate entities 27 | :param seed: Seed entity 28 | :param a: Attributes 29 | :param c: Table caption 30 | :return: Ranked suggestions 31 | """ 32 | pass 33 | 34 | def find_candidates_c(self, seed_E, c, num=100): 35 | """table caption to find candidate entities""" 36 | cand = [] 37 | res = self.__tes.search(query=c, field="catchall", num=num) 38 | for table_id in res.keys(): 39 | doc = self.__tes.get_doc(table_id) 40 | labels = doc["_source"]["entity"] 41 | cand += labels 42 | return set([i for i in cand if i not in seed_E]) 43 | 44 | def find_candidates_e(self, seed_E, num=None): 45 | """seed 
entities to find candidate entities""" 46 | cand = [] 47 | for entity in seed_E: 48 | body = self.generate_search_body(item=entity, field="entity") 49 | res = self.__tes.search_complex(body=body, num=num) 50 | for table_id in res.keys(): 51 | doc = self.__tes.get_doc(table_id) 52 | labels = doc["_source"]["entity"] 53 | cand += labels 54 | return set([i for i in cand if i not in seed_E]) 55 | 56 | def generate_search_body(self, item, field): 57 | """Generate search body""" 58 | body = { 59 | "query": { 60 | "bool": { 61 | "must": { 62 | "term": {field: item} 63 | } 64 | } 65 | } 66 | } 67 | return body 68 | 69 | def find_candidates_cat(self, seed_E, num=100): # only category 70 | """return seed entities' categories""" 71 | cate_candidates = [] 72 | category = [] 73 | for entity in seed_E: 74 | doc = self.__elastic.get_doc(entity) 75 | cats = doc.get("_source").get("category") 76 | category += cats 77 | 78 | for cat in set(category): 79 | body = self.generate_search_body(item=cat, field="category") 80 | res = self.__elastic.search_complex(body=body, num=num) 81 | cate_candidates += [i for i in res.keys() if i not in seed_E] 82 | return set(cate_candidates) 83 | 84 | def parse(self, text): 85 | """Put query into a term list for term iteration""" 86 | stopwords = [] 87 | terms = [] 88 | # Replace specific characters with space 89 | chars = ["'", ".", ":", ",", "/", "(", ")", "-", "+"] 90 | for ch in chars: 91 | if ch in text: 92 | text = text.replace(ch, " ") 93 | # Tokenization 94 | for term in text.split(): # default behavior of the split is to split on one or more whitespaces 95 | # Stopword removal 96 | if term in stopwords: 97 | continue 98 | terms.append(term) 99 | return terms 100 | -------------------------------------------------------------------------------- /Population/row_ranking_entities.py: -------------------------------------------------------------------------------- 1 | """ 2 | Estimate P(E|e_i+1) of ranking entities in row population 3 | 4 | 
author: Shuo Zhang 5 | """ 6 | 7 | from elastic import Elastic 8 | from row_evaluation import Row_evaluation 9 | from scorer import ScorerLM 10 | import math 11 | 12 | 13 | class P_e_e(Row_evaluation): 14 | def __init__(self, index_name="table_index_frt", lamda=0.5): 15 | """ 16 | 17 | :param index_name: name of index 18 | :param lamda: smoothing parameter 19 | """ 20 | super().__init__() 21 | self.__lambda = lamda 22 | self.index_name = index_name 23 | self.__tes = Elastic(index_name) 24 | self.__elas = Elastic("dbpedia_2015_10_abstract") 25 | self.__mu = 0.5 26 | 27 | def rank_candidates(self, seed, c=None, l=None): 28 | cand = list(self.find_candidates_e(seed_E=seed, num=1)) + list(self.find_candidates_c(seed_E=seed, c=c)) + list( 29 | self.find_candidates_cat(seed_E=seed)) 30 | p_all = {} 31 | pee = self.estimate_pee(cand, seed) 32 | pce = self.estimate_pce(cand, c) 33 | ple = self.estimate_ple(cand, l) 34 | for entity, score in pee.items(): 35 | p_all[entity] = max(0.000001, score) * max(0.000001, pce.get(entity)) * max(0.000001, ple.get(entity)) 36 | return p_all 37 | 38 | def estimate_pee(self, cand, seed): 39 | """Estimate P(E|e_i+1) for candidates""" 40 | p_all = {} 41 | body = self.generate_search_body_multi(seed) 42 | n_e = self.__tes.estimate_number_complex(body) 43 | for entity in cand: 44 | body = self.generate_search_body_multi([entity]) 45 | n_e_i = self.__tes.estimate_number_complex(body) # number of tables containing e_i+1 46 | seed_e = [] 47 | seed_e.append(entity) 48 | for en in seed: 49 | seed_e.append(en) 50 | body = self.generate_search_body_multi(seed_e) 51 | n_e_e = self.__tes.estimate_number_complex(body) # number of tables containing e_i+1 and E 52 | sim = 0 # todo 53 | if n_e_i == 0: 54 | p_all[entity] = 0 55 | elif n_e == 0: 56 | p_all[entity] = (1 - self.__lambda) * sim # /n_e_i 57 | else: 58 | p_all[entity] = ((self.__lambda * (n_e_e / n_e) + (1 - self.__lambda) * sim)) # /n_e_i 59 | return p_all 60 | 61 | def 
generate_search_body_multi(self, seed): 62 | """Generate and return search body""" 63 | body = {} 64 | if len(seed) == 1: # One constraints 65 | body = { 66 | "query": { 67 | "bool": { 68 | "must": { 69 | "term": {"entity": seed[0]} 70 | } 71 | } 72 | } 73 | } 74 | else: # Multiple constraints 75 | must = [] 76 | must.append({"match": {"entity": seed[0]}}) 77 | for item in seed[1:]: 78 | must.append({"match_phrase": {"entity": item}}) 79 | body = { 80 | "query": { 81 | "bool": { 82 | "must": must 83 | } 84 | } 85 | } 86 | return body 87 | 88 | def estimate_pce(self, cand, c): 89 | """Estimate P(c|e_i+1) for candidates""" 90 | p_all = {} 91 | caption = self.parse(c) # Put query into a list 92 | for entity_id in cand: 93 | p = 0 94 | body = self.generate_search_body(entity_id, field="entity") 95 | table_ids = self.__tes.search_complex(body).keys() # Search table containing entity 96 | 97 | kb_l = self.__elas.doc_length(entity_id, "abstract") # entity abstract length 98 | kb_c_l = self.__elas.coll_length("abstract") # entity abstract collection length 99 | collection_l = self.__tes.coll_length("caption") # caption collection length 100 | for t in caption: # Iterate term in caption 101 | term = self.__tes.analyze_query(t) 102 | c_l, tf = 0, 0 # caption length, term freq 103 | for table_id in table_ids: 104 | c_l += self.__tes.doc_length(table_id, "caption") # caption length 105 | tf += self.__tes.term_freq(table_id, "caption", term) # caption term frequency 106 | tf_c = self.__tes.coll_term_freq(term, "caption") 107 | kb_tf = self.__elas.term_freq(entity_id, "abstract", term) # n(t,kb) 108 | kb_c_tf = self.__elas.coll_term_freq(term, "abstract") # term freq in kb collection 109 | p += self.estimate_p(kb_l, kb_tf, kb_c_l, kb_c_tf, tf, c_l, tf_c, collection_l) 110 | if p != 0: 111 | p = math.exp(p) 112 | p_all[entity_id] = p 113 | return p_all 114 | 115 | def estimate_p(self, kb_l, kb_tf, kb_c_l, kb_c_tf, tf, c_l, tf_c, collection_l): 116 | """P(t_c|e_i+1)""" 117 | p_kb 
= self.__lambda * (kb_tf + self.__mu * kb_c_tf / kb_c_l) / (kb_l + self.__mu) + (1 - self.__lambda) * ( 118 | tf + self.__mu * tf_c / collection_l) / (c_l + self.__mu) 119 | if p_kb != 0: 120 | p_kb = math.log(p_kb) 121 | return p_kb 122 | 123 | def p_l_theta_lm(self, label, table_ids): 124 | """Using language modeling estimate P(l|theta)""" 125 | p_label = self.parse(label) 126 | p_l_theta = 0 127 | c_l = self.__tes.coll_length("headings") # collection length 128 | for t in p_label: 129 | a_t = self.__tes.analyze_query(t) 130 | l_l = 0 # table label length(table containing t) 131 | t_f = 0 # tf of label 132 | c_tf = self.__tes.coll_term_freq(a_t, "headings") # tf in collection 133 | for table_id in table_ids: 134 | l_l += self.__tes.doc_length(table_id, "headings") 135 | t_f += self.__tes.term_freq(table_id, "headings", a_t) 136 | if l_l + self.__mu != 0: 137 | p = (t_f + self.__mu * c_tf / c_l) / (l_l + self.__mu) 138 | p_l_theta += math.log(p) 139 | else: 140 | p_l_theta += 0 141 | 142 | if p_l_theta != 0: 143 | p_l_theta = math.exp(p_l_theta) 144 | return p_l_theta 145 | 146 | def estimate_ple(self, cand, l): 147 | """Estimate P(l|e_i+1) for candidates""" 148 | p_all = {} 149 | for entity in cand: 150 | p_all[entity] = 0 151 | for label in l: 152 | body = self.generate_search_body(entity, field="entity") 153 | n_e = self.__tes.estimate_number_complex(body) # number of tables containing e_i+1 154 | body2 = self.generate_search_body_l([entity, label]) 155 | n_l_e = self.__tes.estimate_number_complex(body2) # number of tables containing e_i+1&label 156 | table_ids = self.__tes.get_ids(body2) 157 | 158 | p_l_theta = self.p_l_theta_lm(label, table_ids) 159 | # ScorerLM(self.__tes, l, {}).score_doc(table) 160 | # self.p_l_theta_lm(label, table_ids) 161 | if n_e == 0: 162 | p_all[entity] += self.__lambda * p_l_theta 163 | else: 164 | p_all[entity] += self.__lambda * p_l_theta + (1 - self.__lambda) / len(l) * n_l_e / n_e 165 | return p_all 166 | 167 | 168 | def 
generate_search_body_l(self, query): 169 | """Generate and return search body""" 170 | body = {} 171 | if len(query) == 1: 172 | body = { 173 | "query": { 174 | "bool": { 175 | "must": { 176 | "term": {"entity": query[0]} 177 | } 178 | } 179 | } 180 | } 181 | elif len(query) == 2: 182 | body = { 183 | "query": { 184 | "bool": { 185 | "must": [ 186 | { 187 | "match": {"entity": query[0]} 188 | }, 189 | { 190 | "match_phrase": {"headings_n": query[1]} 191 | } 192 | ] 193 | } 194 | }} 195 | return body 196 | -------------------------------------------------------------------------------- /Population/scorer.py: -------------------------------------------------------------------------------- 1 | """ 2 | Scorer 3 | ====== 4 | 5 | Various retrieval models for scoring an individual document for a given query. 6 | 7 | :Authors: Faegheh Hasibi, Krisztian Balog 8 | """ 9 | import math 10 | import sys 11 | 12 | from elastic import Elastic 13 | from elastic_cache import ElasticCache 14 | 15 | 16 | class Scorer(object): 17 | """Base scorer class.""" 18 | 19 | SCORER_DEBUG = 0 20 | 21 | def __init__(self, elastic, query, params): 22 | self._elastic = elastic 23 | self._query = query 24 | self._params = params 25 | 26 | # The analyser might return terms that are not in the collection. 27 | # These terms are filtered out later in the score_doc functions. 28 | if self._query: 29 | self._query_terms = elastic.analyze_query(self._query).split() 30 | else: 31 | self._query_terms = [] 32 | 33 | # def score_doc(self, doc_id): 34 | # """Scorer method to be implemented in each subclass.""" 35 | # # should use elastic scoring 36 | # query = self._elastic.analyze_query(self._query) 37 | # field = params["first_pass"]["field"] 38 | # res = self._elastic.search(query, field, num=self.__first_pass_num_docs, start=start) 39 | # return 40 | 41 | @staticmethod 42 | def get_scorer(elastic, query, config): 43 | """Returns Scorer object (Scorer factory). 
44 | 45 | :param elastic: Elastic object 46 | :param query: raw query (to be analyzed) 47 | :param config: dict with models parameters 48 | """ 49 | model = config.get("model", "lm") 50 | if model == "lm": 51 | return ScorerLM(elastic, query, config) 52 | elif model is None: 53 | return None 54 | else: 55 | raise Exception("Unknown model " + model) 56 | 57 | 58 | # ========================================= 59 | # ================== LM ================== 60 | # ========================================= 61 | class ScorerLM(Scorer): 62 | """Language Model (LM) scorer.""" 63 | JM = "jm" 64 | DIRICHLET = "dirichlet" 65 | 66 | def __init__(self, elastic, query, params): 67 | super(ScorerLM, self).__init__(elastic, query, params) 68 | self._field = params.get("fields", "catchall") 69 | self._smoothing_method = params.get("smoothing_method", self.DIRICHLET).lower() 70 | if self._smoothing_method == self.DIRICHLET: 71 | self._smoothing_param = params.get("smoothing_param", 50) 72 | elif self._smoothing_method == ScorerLM.JM: 73 | self._smoothing_param = params.get("smoothing_param", 0.1) 74 | # self._smoothing_param = params.get("smoothing_param", None) 75 | else: 76 | sys.exit(0) 77 | 78 | self._tf = {} 79 | 80 | @staticmethod 81 | def get_jm_prob(tf_t_d, len_d, tf_t_C, len_C, lambd): 82 | """Computes JM-smoothed probability. 83 | p(t|theta_d) = [(1-lambda) tf(t, d)/|d|] + [lambda tf(t, C)/|C|] 84 | 85 | :param tf_t_d: tf(t,d) 86 | :param len_d: |d| 87 | :param tf_t_C: tf(t,C) 88 | :param len_C: |C| = \sum_{d \in C} |d| 89 | :param lambd: \lambda 90 | :return: JM-smoothed probability 91 | """ 92 | p_t_d = tf_t_d / len_d if len_d > 0 else 0 93 | p_t_C = tf_t_C / len_C if len_C > 0 else 0 94 | if Scorer.SCORER_DEBUG: 95 | print("\t\t\tp(t|d) = {}\tp(t|C) = {}".format(p_t_d, p_t_C)) 96 | return (1 - lambd) * p_t_d + lambd * p_t_C 97 | 98 | @staticmethod 99 | def get_dirichlet_prob(tf_t_d, len_d, tf_t_C, len_C, mu): 100 | """Computes Dirichlet-smoothed probability. 
101 | P(t|theta_d) = [tf(t, d) + mu P(t|C)] / [|d| + mu] 102 | 103 | :param tf_t_d: tf(t,d) 104 | :param len_d: |d| 105 | :param tf_t_C: tf(t,C) 106 | :param len_C: |C| = \sum_{d \in C} |d| 107 | :param mu: \mu 108 | :return: Dirichlet-smoothed probability 109 | """ 110 | if mu == 0: # i.e. field does not have any content in the collection 111 | return 0 112 | else: 113 | p_t_C = tf_t_C / len_C if len_C > 0 else 0 114 | return (tf_t_d + mu * p_t_C) / (len_d + mu) 115 | 116 | def __get_term_freq(self, doc_id, field, term): 117 | """Returns the (cached) term frequency.""" 118 | if doc_id not in self._tf: 119 | self._tf[doc_id] = {} 120 | if field not in self._tf[doc_id]: 121 | self._tf[doc_id][field] = self._elastic.term_freqs(doc_id, field) 122 | return self._tf[doc_id][field].get(term, 0) 123 | 124 | def get_lm_term_prob(self, doc_id, field, t, tf_t_d_f=None, tf_t_C_f=None): 125 | """Returns term probability for a document and field. 126 | 127 | :param doc_id: document ID 128 | :param field: field name 129 | :param t: term 130 | :return: P(t|d_f) 131 | """ 132 | len_d_f = self._elastic.doc_length(doc_id, field) 133 | len_C_f = self._elastic.coll_length(field) 134 | tf_t_C_f = self._elastic.coll_term_freq(t, field) if tf_t_C_f is None else tf_t_C_f 135 | tf_t_d_f = self.__get_term_freq(doc_id, field, t) if tf_t_d_f is None else tf_t_d_f 136 | if self.SCORER_DEBUG: 137 | print("\t\tt = {}\t f = {}".format(t, field)) 138 | print("\t\t\tDoc: tf(t,f) = {}\t|f| = {}".format(tf_t_d_f, len_d_f)) 139 | print("\t\t\tColl: tf(t,f) = {}\t|f| = {}".format(tf_t_C_f, len_C_f)) 140 | 141 | p_t_d_f = 0 142 | # JM smoothing: p(t|theta_d_f) = [(1-lambda) tf(t, d_f)/|d_f|] + [lambda tf(t, C_f)/|C_f|] 143 | if self._smoothing_method == self.JM: 144 | lambd = self._smoothing_param 145 | p_t_d_f = self.get_jm_prob(tf_t_d_f, len_d_f, tf_t_C_f, len_C_f, lambd) 146 | if self.SCORER_DEBUG: 147 | print("\t\t\tJM smoothing:") 148 | print("\t\t\tDoc: p(t|theta_d_f)= ", p_t_d_f) 149 | 150 | # 
Dirichlet smoothing 151 | elif self._smoothing_method == self.DIRICHLET: 152 | mu = self._smoothing_param if self._smoothing_param != "avg_len" else self._elastic.avg_len(field) 153 | p_t_d_f = self.get_dirichlet_prob(tf_t_d_f, len_d_f, tf_t_C_f, len_C_f, mu) 154 | if self.SCORER_DEBUG: 155 | print("\t\t\tDirichlet smoothing:") 156 | print("\t\t\tmu: ", mu) 157 | print("\t\t\tDoc: p(t|theta_d_f)= ", p_t_d_f) 158 | return p_t_d_f 159 | 160 | def get_lm_term_probs(self, doc_id, field): 161 | """Returns probability of all query terms for a document and field; i.e. p(t|theta_d) 162 | 163 | :param doc_id: document ID 164 | :param field: field name 165 | :return: dictionary of terms with their probabilities 166 | """ 167 | p_t_theta_d_f = {} 168 | for t in set(self._query_terms): 169 | p_t_theta_d_f[t] = self.get_lm_term_prob(doc_id, field, t) 170 | return p_t_theta_d_f 171 | 172 | def score_doc(self, doc_id): 173 | """Scores the given document using LM. 174 | p(q|theta_d) = \sum log(p(t|theta_d)) 175 | 176 | :param doc_id: document id 177 | :return: LM score 178 | """ 179 | if self.SCORER_DEBUG: 180 | print("Scoring doc ID=" + doc_id) 181 | 182 | p_t_theta_d = self.get_lm_term_probs(doc_id, self._field) 183 | if sum(p_t_theta_d.values()) == 0: # none of query terms are in the field collection 184 | if self.SCORER_DEBUG: 185 | print("\t\tP(q|{}) = None".format(self._field)) 186 | return None 187 | 188 | # p(q|theta_d) = sum log(p(t|theta_d)); we return log-scale values 189 | p_q_theta_d = 0 190 | for t in self._query_terms: 191 | # Skips the term if it is not in the field collection 192 | if p_t_theta_d[t] == 0: 193 | continue 194 | if self.SCORER_DEBUG: 195 | print("\t\tP({}|{}) = {}".format(t, self._field, p_t_theta_d[t])) 196 | p_q_theta_d += math.log(p_t_theta_d[t]) 197 | if self.SCORER_DEBUG: 198 | print("P(d|q) = {}".format(p_q_theta_d)) 199 | return p_q_theta_d 200 | 201 | 202 | if __name__ == "__main__": 203 | query = "gonna friends" 204 | doc_id = "4" 205 | es = 
ElasticCache("toy_index") 206 | params = {"fields": "content", 207 | "__fields": {"title": 0.2, "content": 0.8}, 208 | "__fields": ["content", "title"] 209 | } 210 | score = ScorerLM(es, query, params).score_doc(doc_id) 211 | print(score) 212 | -------------------------------------------------------------------------------- /Population/table_index_example.py: -------------------------------------------------------------------------------- 1 | """ 2 | An example indexer for tables. 3 | 4 | author: Shuo Zhang 5 | 6 | table example 7 | "table-0003-724": { 8 | "data": [ 9 | [ "1996", "There's a Girl in Texas", "20", "2014", "33"], 10 | [ "1996", "[Every_Light_in_the_House|Every Light in the House]", "3", "78", "10"], 11 | [ "1997", "[(This_Ain't)_No_Thinkin'_Thing|(This Ain't) No Thinkin' Thing]", "1", "2014", "1"], 12 | [ "1997", "[I_Left_Something_Turned_On_at_Home|I Left Something Turned On at Home]", "2", "2014", "1"], 13 | [ "\"\u2014\" denotes releases that did not chart", "\"\u2014\" denotes releases that did not chart", "\"\u2014\" denotes releases that did not chart", "\"\u2014\" denotes releases that did not chart", "\"\u2014\" denotes releases that did not chart"] 14 | ], 15 | "entity": ["Every_Light_in_the_House", "(This_Ain't)_No_Thinkin'_Thing", "I_Left_Something_Turned_On_at_Home"], 16 | "heading": ["Year", "Single", "Peak chart positions", "Peak chart positions", "Peak chart positions"], 17 | "caption": "Singles", 18 | "pgTitle": "xxx" 19 | }, 20 | 21 | """ 22 | from elastic import Elastic 23 | import json 24 | 25 | 26 | def table_index(): 27 | index_name = "table_index_frt" 28 | mappings = { 29 | "entity_n": Elastic.notanalyzed_field(), 30 | "entity": Elastic.analyzed_field(), 31 | "data": Elastic.analyzed_field(), 32 | "caption": Elastic.analyzed_field(), 33 | "headings_n": Elastic.notanalyzed_field(), 34 | "headings": Elastic.analyzed_field(), 35 | "pgTitle": Elastic.analyzed_field(), 36 | "catchall": Elastic.analyzed_field(), 37 | } 38 | elastic = 
Elastic(index_name) 39 | elastic.create_index(mappings, force=True) 40 | tables = {} # todo: map your data into a json; see above 41 | docs = {} 42 | for table_id, table in tables.items(): 43 | caption = table.get("caption") 44 | headings = label_replace(table.get("heading")) 45 | pgTitle = table.get("pgTitle") 46 | entity = table.get("entity") 47 | data = table.get("data") 48 | catchall = " ".join([caption, json.dumps(data), pgTitle] + headings) # headings is a list of strings 49 | docs[table_id] = { 50 | "entity_n": entity, 51 | "entity": entity, 52 | "data": data, 53 | "caption": caption, 54 | "headings_n": headings, 55 | "headings": headings, 56 | "pgTitle": pgTitle, 57 | "catchall": catchall 58 | } 59 | elastic.add_docs_bulk(docs) 60 | 61 | 62 | def parse(h): 63 | """entity [A|B]----B""" 64 | if "[" in h and "|" in h and "]" in h: 65 | return h.split("|")[1].split("]")[0] 66 | else: 67 | return h 68 | 69 | 70 | def label_replace(headings): 71 | """Only keep entity strings""" 72 | return [parse(i) for i in headings] 73 | 74 | 75 | if __name__ == "__main__": 76 | table_index() 77 | -------------------------------------------------------------------------------- /Population/toy_index.py: -------------------------------------------------------------------------------- 1 | """ 2 | Toy Indexer 3 | =========== 4 | 5 | Toy indexing example for testing purposes. 
6 | 7 | :Authors: Krisztian Balog, Faegheh Hasibi 8 | """ 9 | 10 | from elastic import Elastic 11 | 12 | def main(): 13 | index_name = "toy_index" 14 | 15 | mappings = { 16 | "title": Elastic.analyzed_field(), 17 | "content": Elastic.analyzed_field(), 18 | } 19 | 20 | docs = { 21 | 1: {"title": "Rap God", 22 | "content": "gonna, gonna, Look, I was gonna go easy on you and not to hurt your feelings" 23 | }, 24 | 2: {"title": "Lose Yourself", 25 | "content": "Yo, if you could just, for one minute Or one split second in time, forget everything Everything that bothers you, or your problems Everything, and follow me" 26 | }, 27 | 3: {"title": "Love The Way You Lie", 28 | "content": "Just gonna stand there and watch me burn But that's alright, because I like the way it hurts" 29 | }, 30 | 4: {"title": "The Monster", 31 | "content": ["gonna gonna I'm friends with the monster", "That's under my bed Get along with the voices inside of my head"] 32 | }, 33 | 5: {"title": "Beautiful", 34 | "content": "Lately I've been hard to reach I've been too long on my own Everybody has a private world Where they can be alone" 35 | } 36 | } 37 | 38 | 39 | elastic = Elastic(index_name) 40 | elastic.create_index(mappings, force=True) 41 | elastic.add_docs_bulk(docs) 42 | 43 | 44 | if __name__ == "__main__": 45 | main() 46 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # sigir2017-table 2 | 3 | This repository provides resources developed within the following paper: 4 | > S. Zhang and K. Balog. EntiTables: Smart Assistance for Entity-Focused Tables. - SIGIR'17 5 | 6 | This study is an effort aimed at reproducing the results presented in the Smart table paper. 7 | 8 | 9 | This repository is structured as follows: 10 | 11 | - Data: The table corpus is [WikiTables](http://websail-fe.cs.northwestern.edu/TabEL/), which comprises 1.6M tables extracted from Wikipedia. 
We preprocessed it and made it publicly downloadable [here](http://iai.group/downloads/smart_table/WP_tables.zip).
- Population: The core evaluation code for the population tasks is provided here.
- Output: The output files are currently available only by email request.

## Run files
You can download all the run files [here](https://gustav1.ux.uis.no/downloads/sigir2019-table2vec/runfiles.zip).


## Data
The data we used are public data sets:
- DBpedia 2015-10
- WikiTable from http://websail-fe.cs.northwestern.edu/TabEL/

## Population
[NOTE] We use Elasticsearch 2 (> 2.3); Elasticsearch 5 runs into some minor problems with the `elastic.py` wrapper.
To score the column labels, first build a table index with multiple fields using Elasticsearch.
An example indexer is provided (`Population/table_index_example.py`). Index your table corpus following this example, then start your population runs.

## Note
The run files are accessible upon request.

## Citation
```
@inproceedings{Zhang:2017:ESA,
 author = {Zhang, Shuo and Balog, Krisztian},
 title = {EntiTables: Smart Assistance for Entity-Focused Tables},
 booktitle = {Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval},
 series = {SIGIR '17},
 year = {2017},
 isbn = {978-1-4503-5022-8},
 location = {Shinjuku, Tokyo, Japan},
 pages = {255--264},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/3077136.3080796},
 doi = {10.1145/3077136.3080796},
 acmid = {3080796},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {intelligent table assistance, semantic search, table completion},
}
```


## Contact
If you have any questions, please contact Shuo Zhang at imsure318@gmail.com or Krisztian Balog at krisztian.balog@uis.no

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
elasticsearch>=2.3.0
--------------------------------------------------------------------------------